✗ CI.checkpatch: warning for Introduce Xe Uncorrectable Error Handling

public inbox for intel-xe@lists.freedesktop.org

* ✗ CI.checkpatch: warning for Introduce Xe Uncorrectable Error Handling
  In reply to: [PATCH 0/8] Introduce Xe Uncorrectable Error Handling (Riana Tauro, 2026-01-22 10:06)
From: Patchwork @ 2026-01-22  9:42 UTC
  To: Riana Tauro; +Cc: intel-xe

== Series Details ==

Series: Introduce Xe Uncorrectable Error Handling
URL   : https://patchwork.freedesktop.org/series/160482/
State : warning

== Summary ==

+ KERNEL=/kernel
+ git clone https://gitlab.freedesktop.org/drm/maintainer-tools mt
Cloning into 'mt'...
warning: redirecting to https://gitlab.freedesktop.org/drm/maintainer-tools.git/
+ git -C mt rev-list -n1 origin/master
1f57ba1afceae32108bd24770069f764d940a0e4
+ cd /kernel
+ git config --global --add safe.directory /kernel
+ git log -n1
commit bcc1ba1c41f29174a9fdb42feea194d18ecfb69a
Author: Riana Tauro <riana.tauro@intel.com>
Date:   Thu Jan 22 15:36:20 2026 +0530

    drm/xe/xe_pci_error: Process errors in mmio_enabled
    
    Query system controller when any non fatal error occurs to check
    the type of the error, contain and recover.
    
    The system controller is queried in the mmio_enabled callback.
    
    Signed-off-by: Riana Tauro <riana.tauro@intel.com>
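The head commit above says non-fatal errors are handled by querying the system controller in the `mmio_enabled` callback. The following is a simplified Python model of the order in which the generic PCI error recovery core invokes a driver's `struct pci_error_handlers` callbacks (see Documentation/PCI/pci-error-recovery.rst). It is not the Xe implementation. The names `ChannelState`, `Result`, `recovery_sequence` and the `sysctrl_says_reset_needed` flag are hypothetical. The model also assumes that the driver returns `PCI_ERS_RESULT_CAN_RECOVER` from `error_detected` for a non-fatal (`pci_channel_io_normal`) error, and that `slot_reset` always succeeds.

```python
"""Simplified model of the PCI error recovery callback order.

Hypothetical sketch: names mirror the kernel's pci_channel_state_t and
enum pci_ers_result, but this is not kernel or Xe driver code.
"""
from enum import Enum


class ChannelState(Enum):
    IO_NORMAL = "pci_channel_io_normal"              # non-fatal: MMIO still usable
    IO_FROZEN = "pci_channel_io_frozen"              # fatal: link reset required
    IO_PERM_FAILURE = "pci_channel_io_perm_failure"  # device is gone


class Result(Enum):
    CAN_RECOVER = "PCI_ERS_RESULT_CAN_RECOVER"
    NEED_RESET = "PCI_ERS_RESULT_NEED_RESET"
    DISCONNECT = "PCI_ERS_RESULT_DISCONNECT"
    RECOVERED = "PCI_ERS_RESULT_RECOVERED"


def recovery_sequence(state, sysctrl_says_reset_needed):
    """Return (callbacks invoked in order, final result)."""
    calls = ["error_detected"]
    if state is ChannelState.IO_PERM_FAILURE:
        return calls, Result.DISCONNECT

    # Assumption: the driver asks to continue via mmio_enabled for non-fatal
    # errors and requests a reset for fatal ones.
    status = Result.CAN_RECOVER if state is ChannelState.IO_NORMAL else Result.NEED_RESET

    if status is Result.CAN_RECOVER:
        calls.append("mmio_enabled")
        # Hypothetical: the driver would query the system controller here to
        # classify the error and decide whether containment needs a reset.
        status = Result.NEED_RESET if sysctrl_says_reset_needed else Result.RECOVERED

    if status is Result.NEED_RESET:
        calls.append("slot_reset")
        status = Result.RECOVERED  # assumption: the reset always succeeds

    if status is Result.RECOVERED:
        calls.append("resume")
    return calls, status
```

For example, `recovery_sequence(ChannelState.IO_NORMAL, False)` yields `["error_detected", "mmio_enabled", "resume"]`. A frozen link skips `mmio_enabled` and goes straight to `slot_reset`.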
+ /mt/dim checkpatch a3ecd278f9a05323fab7471760a7ea10081251d6 drm-intel
d51de1f66d6b drm/xe/xe_sysctrl: Add System controller patch
-:26: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#26: 
new file mode 100644

-:838: ERROR:NO_AUTHOR_SIGN_OFF: Missing Signed-off-by: line by nominal patch author 'Anoop Vijay <anoop.c.vijay@intel.com>'

total: 1 errors, 1 warnings, 0 checks, 756 lines checked
0cb5e94d1a88 drm/xe/xe_pci_error: Implement PCI error recovery callbacks
-:98: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#98: 
new file mode 100644

total: 0 errors, 1 warnings, 0 checks, 137 lines checked
5d864a6762e3 drm/xe/xe_pci_error: Group all devres to release them on PCIe slot reset
e160972b6c2b drm/xe: Skip device access during PCI error recovery
5f63d857187d drm/xe/xe_ras: Initialize Uncorrectable AER Registers
-:52: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#52: 
new file mode 100644

total: 0 errors, 1 warnings, 0 checks, 106 lines checked
798a8d3a9d74 drm/xe/xe_ras: Add structures and commands for Uncorrectable Core Compute Errors
-:80: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#80: 
new file mode 100644

total: 0 errors, 1 warnings, 0 checks, 212 lines checked
934e28278724 drm/xe/xe_ras: Add support for Uncorrectable Core-Compute errors
bcc1ba1c41f2 drm/xe/xe_pci_error: Process errors in mmio_enabled
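Two kinds of findings appear in the checkpatch output above. FILE_PATH_CHANGES is raised when a patch adds, moves or deletes files without touching MAINTAINERS. NO_AUTHOR_SIGN_OFF is raised when no Signed-off-by: line matches the patch author. Below is a rough Python approximation of these two rules, which can be used to check a `git format-patch` file before resending. It is not checkpatch.pl's actual logic, which handles more cases. The function names and the sample patch are made up for illustration.

```python
"""Rough approximations of two checkpatch.pl rules (illustrative only)."""
import re
from typing import List, Optional, Set

# Extended diff header lines that indicate a file was added, deleted or moved.
FILE_CHANGE_MARKERS = ("new file mode", "deleted file mode", "rename from", "rename to")


def touched_paths(patch: str) -> List[str]:
    """Return the destination path of every 'diff --git a/X b/Y' header."""
    return re.findall(r"^diff --git a/\S+ b/(\S+)$", patch, flags=re.MULTILINE)


def file_path_changes_warning(patch: str) -> bool:
    """True if files are added/moved/deleted but MAINTAINERS is not touched."""
    changes_layout = any(line.startswith(FILE_CHANGE_MARKERS) for line in patch.splitlines())
    return changes_layout and "MAINTAINERS" not in touched_paths(patch)


def author_email(patch: str) -> Optional[str]:
    """Email from the first 'From: Name <addr>' header, lower-cased."""
    m = re.search(r"^From: .*<([^>]+)>", patch, flags=re.MULTILINE)
    return m.group(1).lower() if m else None


def signoff_emails(patch: str) -> Set[str]:
    """All addresses from 'Signed-off-by: Name <addr>' lines, lower-cased."""
    found = re.findall(r"^Signed-off-by: .*<([^>]+)>", patch, flags=re.MULTILINE)
    return {addr.lower() for addr in found}


def no_author_sign_off_error(patch: str) -> bool:
    """True if the patch author has no matching Signed-off-by: line."""
    author = author_email(patch)
    return author is not None and author not in signoff_emails(patch)


# Hypothetical patch: authored by one person, signed off only by another,
# and adding a new file without a MAINTAINERS hunk.
sample = """\
From: A. Author <author@example.com>
Subject: [PATCH] drm/example: add a new file

Signed-off-by: S. Submitter <submitter@example.com>
---
diff --git a/drivers/gpu/drm/example/new.c b/drivers/gpu/drm/example/new.c
new file mode 100644
"""

print(file_path_changes_warning(sample), no_author_sign_off_error(sample))  # True True
```

Adding the author's own Signed-off-by: line clears the second check. Including a MAINTAINERS hunk (or confirming an existing entry already covers the new path) addresses the first.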




* ✓ CI.KUnit: success for Introduce Xe Uncorrectable Error Handling
  In reply to: [PATCH 0/8] Introduce Xe Uncorrectable Error Handling (Riana Tauro, 2026-01-22 10:06)
From: Patchwork @ 2026-01-22  9:43 UTC
  To: Riana Tauro; +Cc: intel-xe

== Series Details ==

Series: Introduce Xe Uncorrectable Error Handling
URL   : https://patchwork.freedesktop.org/series/160482/
State : success

== Summary ==

+ trap cleanup EXIT
+ /kernel/tools/testing/kunit/kunit.py run --kunitconfig /kernel/drivers/gpu/drm/xe/.kunitconfig
[09:42:12] Configuring KUnit Kernel ...
Generating .config ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
[09:42:16] Building KUnit Kernel ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
Building with:
$ make all compile_commands.json scripts_gdb ARCH=um O=.kunit --jobs=25
[09:42:55] Starting KUnit Kernel (1/1)...
[09:42:55] ============================================================
Running tests with:
$ .kunit/linux kunit.enable=1 mem=1G console=tty kunit_shutdown=halt
[09:42:55] ================== guc_buf (11 subtests) ===================
[09:42:55] [PASSED] test_smallest
[09:42:55] [PASSED] test_largest
[09:42:55] [PASSED] test_granular
[09:42:55] [PASSED] test_unique
[09:42:55] [PASSED] test_overlap
[09:42:55] [PASSED] test_reusable
[09:42:55] [PASSED] test_too_big
[09:42:55] [PASSED] test_flush
[09:42:55] [PASSED] test_lookup
[09:42:55] [PASSED] test_data
[09:42:55] [PASSED] test_class
[09:42:55] ===================== [PASSED] guc_buf =====================
[09:42:55] =================== guc_dbm (7 subtests) ===================
[09:42:55] [PASSED] test_empty
[09:42:55] [PASSED] test_default
[09:42:55] ======================== test_size  ========================
[09:42:55] [PASSED] 4
[09:42:55] [PASSED] 8
[09:42:55] [PASSED] 32
[09:42:55] [PASSED] 256
[09:42:55] ==================== [PASSED] test_size ====================
[09:42:55] ======================= test_reuse  ========================
[09:42:55] [PASSED] 4
[09:42:55] [PASSED] 8
[09:42:55] [PASSED] 32
[09:42:55] [PASSED] 256
[09:42:55] =================== [PASSED] test_reuse ====================
[09:42:55] =================== test_range_overlap  ====================
[09:42:55] [PASSED] 4
[09:42:55] [PASSED] 8
[09:42:55] [PASSED] 32
[09:42:55] [PASSED] 256
[09:42:55] =============== [PASSED] test_range_overlap ================
[09:42:55] =================== test_range_compact  ====================
[09:42:55] [PASSED] 4
[09:42:55] [PASSED] 8
[09:42:55] [PASSED] 32
[09:42:55] [PASSED] 256
[09:42:55] =============== [PASSED] test_range_compact ================
[09:42:55] ==================== test_range_spare  =====================
[09:42:55] [PASSED] 4
[09:42:55] [PASSED] 8
[09:42:55] [PASSED] 32
[09:42:55] [PASSED] 256
[09:42:55] ================ [PASSED] test_range_spare =================
[09:42:55] ===================== [PASSED] guc_dbm =====================
[09:42:55] =================== guc_idm (6 subtests) ===================
[09:42:55] [PASSED] bad_init
[09:42:55] [PASSED] no_init
[09:42:55] [PASSED] init_fini
[09:42:55] [PASSED] check_used
[09:42:55] [PASSED] check_quota
[09:42:55] [PASSED] check_all
[09:42:55] ===================== [PASSED] guc_idm =====================
[09:42:55] ================== no_relay (3 subtests) ===================
[09:42:55] [PASSED] xe_drops_guc2pf_if_not_ready
[09:42:55] [PASSED] xe_drops_guc2vf_if_not_ready
[09:42:55] [PASSED] xe_rejects_send_if_not_ready
[09:42:55] ==================== [PASSED] no_relay =====================
[09:42:55] ================== pf_relay (14 subtests) ==================
[09:42:55] [PASSED] pf_rejects_guc2pf_too_short
[09:42:55] [PASSED] pf_rejects_guc2pf_too_long
[09:42:55] [PASSED] pf_rejects_guc2pf_no_payload
[09:42:55] [PASSED] pf_fails_no_payload
[09:42:55] [PASSED] pf_fails_bad_origin
[09:42:55] [PASSED] pf_fails_bad_type
[09:42:55] [PASSED] pf_txn_reports_error
[09:42:55] [PASSED] pf_txn_sends_pf2guc
[09:42:55] [PASSED] pf_sends_pf2guc
[09:42:55] [SKIPPED] pf_loopback_nop
[09:42:55] [SKIPPED] pf_loopback_echo
[09:42:55] [SKIPPED] pf_loopback_fail
[09:42:55] [SKIPPED] pf_loopback_busy
[09:42:55] [SKIPPED] pf_loopback_retry
[09:42:55] ==================== [PASSED] pf_relay =====================
[09:42:55] ================== vf_relay (3 subtests) ===================
[09:42:55] [PASSED] vf_rejects_guc2vf_too_short
[09:42:55] [PASSED] vf_rejects_guc2vf_too_long
[09:42:55] [PASSED] vf_rejects_guc2vf_no_payload
[09:42:55] ==================== [PASSED] vf_relay =====================
[09:42:55] ================ pf_gt_config (6 subtests) =================
[09:42:55] [PASSED] fair_contexts_1vf
[09:42:55] [PASSED] fair_doorbells_1vf
[09:42:55] [PASSED] fair_ggtt_1vf
[09:42:55] ====================== fair_contexts  ======================
[09:42:55] [PASSED] 1 VF
[09:42:55] [PASSED] 2 VFs
[09:42:55] [PASSED] 3 VFs
[09:42:55] [PASSED] 4 VFs
[09:42:55] [PASSED] 5 VFs
[09:42:55] [PASSED] 6 VFs
[09:42:55] [PASSED] 7 VFs
[09:42:55] [PASSED] 8 VFs
[09:42:55] [PASSED] 9 VFs
[09:42:55] [PASSED] 10 VFs
[09:42:55] [PASSED] 11 VFs
[09:42:55] [PASSED] 12 VFs
[09:42:55] [PASSED] 13 VFs
[09:42:55] [PASSED] 14 VFs
[09:42:55] [PASSED] 15 VFs
[09:42:55] [PASSED] 16 VFs
[09:42:55] [PASSED] 17 VFs
[09:42:55] [PASSED] 18 VFs
[09:42:55] [PASSED] 19 VFs
[09:42:55] [PASSED] 20 VFs
[09:42:55] [PASSED] 21 VFs
[09:42:55] [PASSED] 22 VFs
[09:42:55] [PASSED] 23 VFs
[09:42:55] [PASSED] 24 VFs
[09:42:55] [PASSED] 25 VFs
[09:42:55] [PASSED] 26 VFs
[09:42:55] [PASSED] 27 VFs
[09:42:55] [PASSED] 28 VFs
[09:42:55] [PASSED] 29 VFs
[09:42:55] [PASSED] 30 VFs
[09:42:55] [PASSED] 31 VFs
[09:42:55] [PASSED] 32 VFs
[09:42:55] [PASSED] 33 VFs
[09:42:55] [PASSED] 34 VFs
[09:42:55] [PASSED] 35 VFs
[09:42:55] [PASSED] 36 VFs
[09:42:55] [PASSED] 37 VFs
[09:42:55] [PASSED] 38 VFs
[09:42:55] [PASSED] 39 VFs
[09:42:55] [PASSED] 40 VFs
[09:42:55] [PASSED] 41 VFs
[09:42:55] [PASSED] 42 VFs
[09:42:55] [PASSED] 43 VFs
[09:42:55] [PASSED] 44 VFs
[09:42:55] [PASSED] 45 VFs
[09:42:55] [PASSED] 46 VFs
[09:42:55] [PASSED] 47 VFs
[09:42:55] [PASSED] 48 VFs
[09:42:55] [PASSED] 49 VFs
[09:42:55] [PASSED] 50 VFs
[09:42:55] [PASSED] 51 VFs
[09:42:55] [PASSED] 52 VFs
[09:42:55] [PASSED] 53 VFs
[09:42:55] [PASSED] 54 VFs
[09:42:55] [PASSED] 55 VFs
[09:42:55] [PASSED] 56 VFs
[09:42:55] [PASSED] 57 VFs
[09:42:55] [PASSED] 58 VFs
[09:42:55] [PASSED] 59 VFs
[09:42:55] [PASSED] 60 VFs
[09:42:55] [PASSED] 61 VFs
[09:42:55] [PASSED] 62 VFs
[09:42:55] [PASSED] 63 VFs
[09:42:55] ================== [PASSED] fair_contexts ==================
[09:42:55] ===================== fair_doorbells  ======================
[09:42:55] [PASSED] 1 VF
[09:42:55] [PASSED] 2 VFs
[09:42:55] [PASSED] 3 VFs
[09:42:55] [PASSED] 4 VFs
[09:42:55] [PASSED] 5 VFs
[09:42:55] [PASSED] 6 VFs
[09:42:55] [PASSED] 7 VFs
[09:42:55] [PASSED] 8 VFs
[09:42:55] [PASSED] 9 VFs
[09:42:55] [PASSED] 10 VFs
[09:42:55] [PASSED] 11 VFs
[09:42:56] [PASSED] 12 VFs
[09:42:56] [PASSED] 13 VFs
[09:42:56] [PASSED] 14 VFs
[09:42:56] [PASSED] 15 VFs
[09:42:56] [PASSED] 16 VFs
[09:42:56] [PASSED] 17 VFs
[09:42:56] [PASSED] 18 VFs
[09:42:56] [PASSED] 19 VFs
[09:42:56] [PASSED] 20 VFs
[09:42:56] [PASSED] 21 VFs
[09:42:56] [PASSED] 22 VFs
[09:42:56] [PASSED] 23 VFs
[09:42:56] [PASSED] 24 VFs
[09:42:56] [PASSED] 25 VFs
[09:42:56] [PASSED] 26 VFs
[09:42:56] [PASSED] 27 VFs
[09:42:56] [PASSED] 28 VFs
[09:42:56] [PASSED] 29 VFs
[09:42:56] [PASSED] 30 VFs
[09:42:56] [PASSED] 31 VFs
[09:42:56] [PASSED] 32 VFs
[09:42:56] [PASSED] 33 VFs
[09:42:56] [PASSED] 34 VFs
[09:42:56] [PASSED] 35 VFs
[09:42:56] [PASSED] 36 VFs
[09:42:56] [PASSED] 37 VFs
[09:42:56] [PASSED] 38 VFs
[09:42:56] [PASSED] 39 VFs
[09:42:56] [PASSED] 40 VFs
[09:42:56] [PASSED] 41 VFs
[09:42:56] [PASSED] 42 VFs
[09:42:56] [PASSED] 43 VFs
[09:42:56] [PASSED] 44 VFs
[09:42:56] [PASSED] 45 VFs
[09:42:56] [PASSED] 46 VFs
[09:42:56] [PASSED] 47 VFs
[09:42:56] [PASSED] 48 VFs
[09:42:56] [PASSED] 49 VFs
[09:42:56] [PASSED] 50 VFs
[09:42:56] [PASSED] 51 VFs
[09:42:56] [PASSED] 52 VFs
[09:42:56] [PASSED] 53 VFs
[09:42:56] [PASSED] 54 VFs
[09:42:56] [PASSED] 55 VFs
[09:42:56] [PASSED] 56 VFs
[09:42:56] [PASSED] 57 VFs
[09:42:56] [PASSED] 58 VFs
[09:42:56] [PASSED] 59 VFs
[09:42:56] [PASSED] 60 VFs
[09:42:56] [PASSED] 61 VFs
[09:42:56] [PASSED] 62 VFs
[09:42:56] [PASSED] 63 VFs
[09:42:56] ================= [PASSED] fair_doorbells ==================
[09:42:56] ======================== fair_ggtt  ========================
[09:42:56] [PASSED] 1 VF
[09:42:56] [PASSED] 2 VFs
[09:42:56] [PASSED] 3 VFs
[09:42:56] [PASSED] 4 VFs
[09:42:56] [PASSED] 5 VFs
[09:42:56] [PASSED] 6 VFs
[09:42:56] [PASSED] 7 VFs
[09:42:56] [PASSED] 8 VFs
[09:42:56] [PASSED] 9 VFs
[09:42:56] [PASSED] 10 VFs
[09:42:56] [PASSED] 11 VFs
[09:42:56] [PASSED] 12 VFs
[09:42:56] [PASSED] 13 VFs
[09:42:56] [PASSED] 14 VFs
[09:42:56] [PASSED] 15 VFs
[09:42:56] [PASSED] 16 VFs
[09:42:56] [PASSED] 17 VFs
[09:42:56] [PASSED] 18 VFs
[09:42:56] [PASSED] 19 VFs
[09:42:56] [PASSED] 20 VFs
[09:42:56] [PASSED] 21 VFs
[09:42:56] [PASSED] 22 VFs
[09:42:56] [PASSED] 23 VFs
[09:42:56] [PASSED] 24 VFs
[09:42:56] [PASSED] 25 VFs
[09:42:56] [PASSED] 26 VFs
[09:42:56] [PASSED] 27 VFs
[09:42:56] [PASSED] 28 VFs
[09:42:56] [PASSED] 29 VFs
[09:42:56] [PASSED] 30 VFs
[09:42:56] [PASSED] 31 VFs
[09:42:56] [PASSED] 32 VFs
[09:42:56] [PASSED] 33 VFs
[09:42:56] [PASSED] 34 VFs
[09:42:56] [PASSED] 35 VFs
[09:42:56] [PASSED] 36 VFs
[09:42:56] [PASSED] 37 VFs
[09:42:56] [PASSED] 38 VFs
[09:42:56] [PASSED] 39 VFs
[09:42:56] [PASSED] 40 VFs
[09:42:56] [PASSED] 41 VFs
[09:42:56] [PASSED] 42 VFs
[09:42:56] [PASSED] 43 VFs
[09:42:56] [PASSED] 44 VFs
[09:42:56] [PASSED] 45 VFs
[09:42:56] [PASSED] 46 VFs
[09:42:56] [PASSED] 47 VFs
[09:42:56] [PASSED] 48 VFs
[09:42:56] [PASSED] 49 VFs
[09:42:56] [PASSED] 50 VFs
[09:42:56] [PASSED] 51 VFs
[09:42:56] [PASSED] 52 VFs
[09:42:56] [PASSED] 53 VFs
[09:42:56] [PASSED] 54 VFs
[09:42:56] [PASSED] 55 VFs
[09:42:56] [PASSED] 56 VFs
[09:42:56] [PASSED] 57 VFs
[09:42:56] [PASSED] 58 VFs
[09:42:56] [PASSED] 59 VFs
[09:42:56] [PASSED] 60 VFs
[09:42:56] [PASSED] 61 VFs
[09:42:56] [PASSED] 62 VFs
[09:42:56] [PASSED] 63 VFs
[09:42:56] ==================== [PASSED] fair_ggtt ====================
[09:42:56] ================== [PASSED] pf_gt_config ===================
[09:42:56] ===================== lmtt (1 subtest) =====================
[09:42:56] ======================== test_ops  =========================
[09:42:56] [PASSED] 2-level
[09:42:56] [PASSED] multi-level
[09:42:56] ==================== [PASSED] test_ops =====================
[09:42:56] ====================== [PASSED] lmtt =======================
[09:42:56] ================= pf_service (11 subtests) =================
[09:42:56] [PASSED] pf_negotiate_any
[09:42:56] [PASSED] pf_negotiate_base_match
[09:42:56] [PASSED] pf_negotiate_base_newer
[09:42:56] [PASSED] pf_negotiate_base_next
[09:42:56] [SKIPPED] pf_negotiate_base_older
[09:42:56] [PASSED] pf_negotiate_base_prev
[09:42:56] [PASSED] pf_negotiate_latest_match
[09:42:56] [PASSED] pf_negotiate_latest_newer
[09:42:56] [PASSED] pf_negotiate_latest_next
[09:42:56] [SKIPPED] pf_negotiate_latest_older
[09:42:56] [SKIPPED] pf_negotiate_latest_prev
[09:42:56] =================== [PASSED] pf_service ====================
[09:42:56] ================= xe_guc_g2g (2 subtests) ==================
[09:42:56] ============== xe_live_guc_g2g_kunit_default  ==============
[09:42:56] ========= [SKIPPED] xe_live_guc_g2g_kunit_default ==========
[09:42:56] ============== xe_live_guc_g2g_kunit_allmem  ===============
[09:42:56] ========== [SKIPPED] xe_live_guc_g2g_kunit_allmem ==========
[09:42:56] =================== [SKIPPED] xe_guc_g2g ===================
[09:42:56] =================== xe_mocs (2 subtests) ===================
[09:42:56] ================ xe_live_mocs_kernel_kunit  ================
[09:42:56] =========== [SKIPPED] xe_live_mocs_kernel_kunit ============
[09:42:56] ================ xe_live_mocs_reset_kunit  =================
[09:42:56] ============ [SKIPPED] xe_live_mocs_reset_kunit ============
[09:42:56] ==================== [SKIPPED] xe_mocs =====================
[09:42:56] ================= xe_migrate (2 subtests) ==================
[09:42:56] ================= xe_migrate_sanity_kunit  =================
[09:42:56] ============ [SKIPPED] xe_migrate_sanity_kunit =============
[09:42:56] ================== xe_validate_ccs_kunit  ==================
[09:42:56] ============= [SKIPPED] xe_validate_ccs_kunit ==============
[09:42:56] =================== [SKIPPED] xe_migrate ===================
[09:42:56] ================== xe_dma_buf (1 subtest) ==================
[09:42:56] ==================== xe_dma_buf_kunit  =====================
[09:42:56] ================ [SKIPPED] xe_dma_buf_kunit ================
[09:42:56] =================== [SKIPPED] xe_dma_buf ===================
[09:42:56] ================= xe_bo_shrink (1 subtest) =================
[09:42:56] =================== xe_bo_shrink_kunit  ====================
[09:42:56] =============== [SKIPPED] xe_bo_shrink_kunit ===============
[09:42:56] ================== [SKIPPED] xe_bo_shrink ==================
[09:42:56] ==================== xe_bo (2 subtests) ====================
[09:42:56] ================== xe_ccs_migrate_kunit  ===================
[09:42:56] ============== [SKIPPED] xe_ccs_migrate_kunit ==============
[09:42:56] ==================== xe_bo_evict_kunit  ====================
[09:42:56] =============== [SKIPPED] xe_bo_evict_kunit ================
[09:42:56] ===================== [SKIPPED] xe_bo ======================
[09:42:56] ==================== args (13 subtests) ====================
[09:42:56] [PASSED] count_args_test
[09:42:56] [PASSED] call_args_example
[09:42:56] [PASSED] call_args_test
[09:42:56] [PASSED] drop_first_arg_example
[09:42:56] [PASSED] drop_first_arg_test
[09:42:56] [PASSED] first_arg_example
[09:42:56] [PASSED] first_arg_test
[09:42:56] [PASSED] last_arg_example
[09:42:56] [PASSED] last_arg_test
[09:42:56] [PASSED] pick_arg_example
[09:42:56] [PASSED] if_args_example
[09:42:56] [PASSED] if_args_test
[09:42:56] [PASSED] sep_comma_example
[09:42:56] ====================== [PASSED] args =======================
[09:42:56] =================== xe_pci (3 subtests) ====================
[09:42:56] ==================== check_graphics_ip  ====================
[09:42:56] [PASSED] 12.00 Xe_LP
[09:42:56] [PASSED] 12.10 Xe_LP+
[09:42:56] [PASSED] 12.55 Xe_HPG
[09:42:56] [PASSED] 12.60 Xe_HPC
[09:42:56] [PASSED] 12.70 Xe_LPG
[09:42:56] [PASSED] 12.71 Xe_LPG
[09:42:56] [PASSED] 12.74 Xe_LPG+
[09:42:56] [PASSED] 20.01 Xe2_HPG
[09:42:56] [PASSED] 20.02 Xe2_HPG
[09:42:56] [PASSED] 20.04 Xe2_LPG
[09:42:56] [PASSED] 30.00 Xe3_LPG
[09:42:56] [PASSED] 30.01 Xe3_LPG
[09:42:56] [PASSED] 30.03 Xe3_LPG
[09:42:56] [PASSED] 30.04 Xe3_LPG
[09:42:56] [PASSED] 30.05 Xe3_LPG
[09:42:56] [PASSED] 35.11 Xe3p_XPC
[09:42:56] ================ [PASSED] check_graphics_ip ================
[09:42:56] ===================== check_media_ip  ======================
[09:42:56] [PASSED] 12.00 Xe_M
[09:42:56] [PASSED] 12.55 Xe_HPM
[09:42:56] [PASSED] 13.00 Xe_LPM+
[09:42:56] [PASSED] 13.01 Xe2_HPM
[09:42:56] [PASSED] 20.00 Xe2_LPM
[09:42:56] [PASSED] 30.00 Xe3_LPM
[09:42:56] [PASSED] 30.02 Xe3_LPM
[09:42:56] [PASSED] 35.00 Xe3p_LPM
[09:42:56] [PASSED] 35.03 Xe3p_HPM
[09:42:56] ================= [PASSED] check_media_ip ==================
[09:42:56] =================== check_platform_desc  ===================
[09:42:56] [PASSED] 0x9A60 (TIGERLAKE)
[09:42:56] [PASSED] 0x9A68 (TIGERLAKE)
[09:42:56] [PASSED] 0x9A70 (TIGERLAKE)
[09:42:56] [PASSED] 0x9A40 (TIGERLAKE)
[09:42:56] [PASSED] 0x9A49 (TIGERLAKE)
[09:42:56] [PASSED] 0x9A59 (TIGERLAKE)
[09:42:56] [PASSED] 0x9A78 (TIGERLAKE)
[09:42:56] [PASSED] 0x9AC0 (TIGERLAKE)
[09:42:56] [PASSED] 0x9AC9 (TIGERLAKE)
[09:42:56] [PASSED] 0x9AD9 (TIGERLAKE)
[09:42:56] [PASSED] 0x9AF8 (TIGERLAKE)
[09:42:56] [PASSED] 0x4C80 (ROCKETLAKE)
[09:42:56] [PASSED] 0x4C8A (ROCKETLAKE)
[09:42:56] [PASSED] 0x4C8B (ROCKETLAKE)
[09:42:56] [PASSED] 0x4C8C (ROCKETLAKE)
[09:42:56] [PASSED] 0x4C90 (ROCKETLAKE)
[09:42:56] [PASSED] 0x4C9A (ROCKETLAKE)
[09:42:56] [PASSED] 0x4680 (ALDERLAKE_S)
[09:42:56] [PASSED] 0x4682 (ALDERLAKE_S)
[09:42:56] [PASSED] 0x4688 (ALDERLAKE_S)
[09:42:56] [PASSED] 0x468A (ALDERLAKE_S)
[09:42:56] [PASSED] 0x468B (ALDERLAKE_S)
[09:42:56] [PASSED] 0x4690 (ALDERLAKE_S)
[09:42:56] [PASSED] 0x4692 (ALDERLAKE_S)
[09:42:56] [PASSED] 0x4693 (ALDERLAKE_S)
[09:42:56] [PASSED] 0x46A0 (ALDERLAKE_P)
[09:42:56] [PASSED] 0x46A1 (ALDERLAKE_P)
[09:42:56] [PASSED] 0x46A2 (ALDERLAKE_P)
[09:42:56] [PASSED] 0x46A3 (ALDERLAKE_P)
[09:42:56] [PASSED] 0x46A6 (ALDERLAKE_P)
[09:42:56] [PASSED] 0x46A8 (ALDERLAKE_P)
[09:42:56] [PASSED] 0x46AA (ALDERLAKE_P)
[09:42:56] [PASSED] 0x462A (ALDERLAKE_P)
[09:42:56] [PASSED] 0x4626 (ALDERLAKE_P)
[09:42:56] [PASSED] 0x4628 (ALDERLAKE_P)
[09:42:56] [PASSED] 0x46B0 (ALDERLAKE_P)
[09:42:56] [PASSED] 0x46B1 (ALDERLAKE_P)
[09:42:56] [PASSED] 0x46B2 (ALDERLAKE_P)
[09:42:56] [PASSED] 0x46B3 (ALDERLAKE_P)
[09:42:56] [PASSED] 0x46C0 (ALDERLAKE_P)
[09:42:56] [PASSED] 0x46C1 (ALDERLAKE_P)
[09:42:56] [PASSED] 0x46C2 (ALDERLAKE_P)
[09:42:56] [PASSED] 0x46C3 (ALDERLAKE_P)
[09:42:56] [PASSED] 0x46D0 (ALDERLAKE_N)
[09:42:56] [PASSED] 0x46D1 (ALDERLAKE_N)
[09:42:56] [PASSED] 0x46D2 (ALDERLAKE_N)
[09:42:56] [PASSED] 0x46D3 (ALDERLAKE_N)
[09:42:56] [PASSED] 0x46D4 (ALDERLAKE_N)
[09:42:56] [PASSED] 0xA721 (ALDERLAKE_P)
[09:42:56] [PASSED] 0xA7A1 (ALDERLAKE_P)
[09:42:56] [PASSED] 0xA7A9 (ALDERLAKE_P)
[09:42:56] [PASSED] 0xA7AC (ALDERLAKE_P)
[09:42:56] [PASSED] 0xA7AD (ALDERLAKE_P)
[09:42:56] [PASSED] 0xA720 (ALDERLAKE_P)
[09:42:56] [PASSED] 0xA7A0 (ALDERLAKE_P)
[09:42:56] [PASSED] 0xA7A8 (ALDERLAKE_P)
[09:42:56] [PASSED] 0xA7AA (ALDERLAKE_P)
[09:42:56] [PASSED] 0xA7AB (ALDERLAKE_P)
[09:42:56] [PASSED] 0xA780 (ALDERLAKE_S)
[09:42:56] [PASSED] 0xA781 (ALDERLAKE_S)
[09:42:56] [PASSED] 0xA782 (ALDERLAKE_S)
[09:42:56] [PASSED] 0xA783 (ALDERLAKE_S)
[09:42:56] [PASSED] 0xA788 (ALDERLAKE_S)
[09:42:56] [PASSED] 0xA789 (ALDERLAKE_S)
[09:42:56] [PASSED] 0xA78A (ALDERLAKE_S)
[09:42:56] [PASSED] 0xA78B (ALDERLAKE_S)
[09:42:56] [PASSED] 0x4905 (DG1)
[09:42:56] [PASSED] 0x4906 (DG1)
[09:42:56] [PASSED] 0x4907 (DG1)
[09:42:56] [PASSED] 0x4908 (DG1)
[09:42:56] [PASSED] 0x4909 (DG1)
[09:42:56] [PASSED] 0x56C0 (DG2)
[09:42:56] [PASSED] 0x56C2 (DG2)
[09:42:56] [PASSED] 0x56C1 (DG2)
[09:42:56] [PASSED] 0x7D51 (METEORLAKE)
[09:42:56] [PASSED] 0x7DD1 (METEORLAKE)
[09:42:56] [PASSED] 0x7D41 (METEORLAKE)
[09:42:56] [PASSED] 0x7D67 (METEORLAKE)
[09:42:56] [PASSED] 0xB640 (METEORLAKE)
[09:42:56] [PASSED] 0x56A0 (DG2)
[09:42:56] [PASSED] 0x56A1 (DG2)
[09:42:56] [PASSED] 0x56A2 (DG2)
[09:42:56] [PASSED] 0x56BE (DG2)
[09:42:56] [PASSED] 0x56BF (DG2)
[09:42:56] [PASSED] 0x5690 (DG2)
[09:42:56] [PASSED] 0x5691 (DG2)
[09:42:56] [PASSED] 0x5692 (DG2)
[09:42:56] [PASSED] 0x56A5 (DG2)
[09:42:56] [PASSED] 0x56A6 (DG2)
[09:42:56] [PASSED] 0x56B0 (DG2)
[09:42:56] [PASSED] 0x56B1 (DG2)
[09:42:56] [PASSED] 0x56BA (DG2)
[09:42:56] [PASSED] 0x56BB (DG2)
[09:42:56] [PASSED] 0x56BC (DG2)
[09:42:56] [PASSED] 0x56BD (DG2)
[09:42:56] [PASSED] 0x5693 (DG2)
[09:42:56] [PASSED] 0x5694 (DG2)
[09:42:56] [PASSED] 0x5695 (DG2)
[09:42:56] [PASSED] 0x56A3 (DG2)
[09:42:56] [PASSED] 0x56A4 (DG2)
[09:42:56] [PASSED] 0x56B2 (DG2)
[09:42:56] [PASSED] 0x56B3 (DG2)
[09:42:56] [PASSED] 0x5696 (DG2)
[09:42:56] [PASSED] 0x5697 (DG2)
[09:42:56] [PASSED] 0xB69 (PVC)
[09:42:56] [PASSED] 0xB6E (PVC)
[09:42:56] [PASSED] 0xBD4 (PVC)
[09:42:56] [PASSED] 0xBD5 (PVC)
[09:42:56] [PASSED] 0xBD6 (PVC)
[09:42:56] [PASSED] 0xBD7 (PVC)
[09:42:56] [PASSED] 0xBD8 (PVC)
[09:42:56] [PASSED] 0xBD9 (PVC)
[09:42:56] [PASSED] 0xBDA (PVC)
[09:42:56] [PASSED] 0xBDB (PVC)
[09:42:56] [PASSED] 0xBE0 (PVC)
[09:42:56] [PASSED] 0xBE1 (PVC)
[09:42:56] [PASSED] 0xBE5 (PVC)
[09:42:56] [PASSED] 0x7D40 (METEORLAKE)
[09:42:56] [PASSED] 0x7D45 (METEORLAKE)
[09:42:56] [PASSED] 0x7D55 (METEORLAKE)
[09:42:56] [PASSED] 0x7D60 (METEORLAKE)
[09:42:56] [PASSED] 0x7DD5 (METEORLAKE)
[09:42:56] [PASSED] 0x6420 (LUNARLAKE)
[09:42:56] [PASSED] 0x64A0 (LUNARLAKE)
[09:42:56] [PASSED] 0x64B0 (LUNARLAKE)
[09:42:56] [PASSED] 0xE202 (BATTLEMAGE)
[09:42:56] [PASSED] 0xE209 (BATTLEMAGE)
[09:42:56] [PASSED] 0xE20B (BATTLEMAGE)
[09:42:56] [PASSED] 0xE20C (BATTLEMAGE)
[09:42:56] [PASSED] 0xE20D (BATTLEMAGE)
[09:42:56] [PASSED] 0xE210 (BATTLEMAGE)
[09:42:56] [PASSED] 0xE211 (BATTLEMAGE)
[09:42:56] [PASSED] 0xE212 (BATTLEMAGE)
[09:42:56] [PASSED] 0xE216 (BATTLEMAGE)
[09:42:56] [PASSED] 0xE220 (BATTLEMAGE)
[09:42:56] [PASSED] 0xE221 (BATTLEMAGE)
[09:42:56] [PASSED] 0xE222 (BATTLEMAGE)
[09:42:56] [PASSED] 0xE223 (BATTLEMAGE)
[09:42:56] [PASSED] 0xB080 (PANTHERLAKE)
[09:42:56] [PASSED] 0xB081 (PANTHERLAKE)
[09:42:56] [PASSED] 0xB082 (PANTHERLAKE)
[09:42:56] [PASSED] 0xB083 (PANTHERLAKE)
[09:42:56] [PASSED] 0xB084 (PANTHERLAKE)
[09:42:56] [PASSED] 0xB085 (PANTHERLAKE)
[09:42:56] [PASSED] 0xB086 (PANTHERLAKE)
[09:42:56] [PASSED] 0xB087 (PANTHERLAKE)
[09:42:56] [PASSED] 0xB08F (PANTHERLAKE)
[09:42:56] [PASSED] 0xB090 (PANTHERLAKE)
[09:42:56] [PASSED] 0xB0A0 (PANTHERLAKE)
[09:42:56] [PASSED] 0xB0B0 (PANTHERLAKE)
[09:42:56] [PASSED] 0xFD80 (PANTHERLAKE)
[09:42:56] [PASSED] 0xFD81 (PANTHERLAKE)
[09:42:56] [PASSED] 0xD740 (NOVALAKE_S)
[09:42:56] [PASSED] 0xD741 (NOVALAKE_S)
[09:42:56] [PASSED] 0xD742 (NOVALAKE_S)
[09:42:56] [PASSED] 0xD743 (NOVALAKE_S)
[09:42:56] [PASSED] 0xD744 (NOVALAKE_S)
[09:42:56] [PASSED] 0xD745 (NOVALAKE_S)
[09:42:56] [PASSED] 0x674C (CRESCENTISLAND)
[09:42:56] =============== [PASSED] check_platform_desc ===============
[09:42:56] ===================== [PASSED] xe_pci ======================
[09:42:56] =================== xe_rtp (2 subtests) ====================
[09:42:56] =============== xe_rtp_process_to_sr_tests  ================
[09:42:56] [PASSED] coalesce-same-reg
[09:42:56] [PASSED] no-match-no-add
[09:42:56] [PASSED] match-or
[09:42:56] [PASSED] match-or-xfail
[09:42:56] [PASSED] no-match-no-add-multiple-rules
[09:42:56] [PASSED] two-regs-two-entries
[09:42:56] [PASSED] clr-one-set-other
[09:42:56] [PASSED] set-field
[09:42:56] [PASSED] conflict-duplicate
[09:42:56] [PASSED] conflict-not-disjoint
[09:42:56] [PASSED] conflict-reg-type
[09:42:56] =========== [PASSED] xe_rtp_process_to_sr_tests ============
[09:42:56] ================== xe_rtp_process_tests  ===================
[09:42:56] [PASSED] active1
[09:42:56] [PASSED] active2
[09:42:56] [PASSED] active-inactive
[09:42:56] [PASSED] inactive-active
[09:42:56] [PASSED] inactive-1st_or_active-inactive
[09:42:56] [PASSED] inactive-2nd_or_active-inactive
[09:42:56] [PASSED] inactive-last_or_active-inactive
[09:42:56] [PASSED] inactive-no_or_active-inactive
[09:42:56] ============== [PASSED] xe_rtp_process_tests ===============
[09:42:56] ===================== [PASSED] xe_rtp ======================
[09:42:56] ==================== xe_wa (1 subtest) =====================
[09:42:56] ======================== xe_wa_gt  =========================
[09:42:56] [PASSED] TIGERLAKE B0
[09:42:56] [PASSED] DG1 A0
[09:42:56] [PASSED] DG1 B0
[09:42:56] [PASSED] ALDERLAKE_S A0
[09:42:56] [PASSED] ALDERLAKE_S B0
[09:42:56] [PASSED] ALDERLAKE_S C0
[09:42:56] [PASSED] ALDERLAKE_S D0
[09:42:56] [PASSED] ALDERLAKE_P A0
[09:42:56] [PASSED] ALDERLAKE_P B0
[09:42:56] [PASSED] ALDERLAKE_P C0
[09:42:56] [PASSED] ALDERLAKE_S RPLS D0
[09:42:56] [PASSED] ALDERLAKE_P RPLU E0
[09:42:56] [PASSED] DG2 G10 C0
[09:42:56] [PASSED] DG2 G11 B1
[09:42:56] [PASSED] DG2 G12 A1
[09:42:56] [PASSED] METEORLAKE 12.70(Xe_LPG) A0 13.00(Xe_LPM+) A0
[09:42:56] [PASSED] METEORLAKE 12.71(Xe_LPG) A0 13.00(Xe_LPM+) A0
[09:42:56] [PASSED] METEORLAKE 12.74(Xe_LPG+) A0 13.00(Xe_LPM+) A0
[09:42:56] [PASSED] LUNARLAKE 20.04(Xe2_LPG) A0 20.00(Xe2_LPM) A0
[09:42:56] [PASSED] LUNARLAKE 20.04(Xe2_LPG) B0 20.00(Xe2_LPM) A0
[09:42:56] [PASSED] BATTLEMAGE 20.01(Xe2_HPG) A0 13.01(Xe2_HPM) A1
[09:42:56] [PASSED] PANTHERLAKE 30.00(Xe3_LPG) A0 30.00(Xe3_LPM) A0
[09:42:56] ==================== [PASSED] xe_wa_gt =====================
[09:42:56] ====================== [PASSED] xe_wa ======================
[09:42:56] ============================================================
[09:42:56] Testing complete. Ran 512 tests: passed: 494, skipped: 18
[09:42:56] Elapsed time: 43.795s total, 4.391s configuring, 38.837s building, 0.539s running
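Each kunit.py run ends with a one-line tally, such as `Ran 512 tests: passed: 494, skipped: 18` above (494 + 18 = 512). When scanning many CI logs, that line can be checked mechanically. The sketch below parses it, assuming (as the line above suggests) that kunit.py prints `category: count` pairs and omits categories whose count is zero. The function names and the set of failure category names (`failed`, `crashed`, `errors`) are assumptions for illustration.

```python
"""Parse the final tally line printed by kunit.py (illustrative sketch)."""
import re
from typing import Dict, Tuple

SUMMARY_RE = re.compile(r"Ran (\d+) tests:(.*)$")

# Assumed names of the categories that indicate a problem.
BAD_CATEGORIES = ("failed", "crashed", "errors")


def parse_kunit_summary(line: str) -> Tuple[int, Dict[str, int]]:
    """Return (total, {category: count}) from a 'Ran N tests: ...' line."""
    m = SUMMARY_RE.search(line)
    if m is None:
        raise ValueError(f"not a kunit.py summary line: {line!r}")
    total = int(m.group(1))
    counts: Dict[str, int] = {}
    for part in m.group(2).split(","):
        part = part.strip()
        if not part:
            continue  # e.g. 'Ran 0 tests:' lists no categories
        name, value = part.split(":")
        counts[name.strip()] = int(value)
    return total, counts


def run_is_clean(line: str) -> bool:
    """True if no bad category is present; also checks the counts add up."""
    total, counts = parse_kunit_summary(line)
    if sum(counts.values()) != total:
        raise ValueError(f"category counts do not add up to {total}")
    return all(counts.get(name, 0) == 0 for name in BAD_CATEGORIES)


line = "[09:42:56] Testing complete. Ran 512 tests: passed: 494, skipped: 18"
print(parse_kunit_summary(line))  # (512, {'passed': 494, 'skipped': 18})
print(run_is_clean(line))         # True
```

Skipped tests are not treated as failures here. In this run they are the `xe_live_*` suites and a few relay/negotiation cases, which report SKIPPED rather than FAILED.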

+ /kernel/tools/testing/kunit/kunit.py run --kunitconfig /kernel/drivers/gpu/drm/tests/.kunitconfig
[09:42:56] Configuring KUnit Kernel ...
Regenerating .config ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
[09:42:57] Building KUnit Kernel ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
Building with:
$ make all compile_commands.json scripts_gdb ARCH=um O=.kunit --jobs=25
[09:43:28] Starting KUnit Kernel (1/1)...
[09:43:28] ============================================================
Running tests with:
$ .kunit/linux kunit.enable=1 mem=1G console=tty kunit_shutdown=halt
[09:43:28] ============ drm_test_pick_cmdline (2 subtests) ============
[09:43:28] [PASSED] drm_test_pick_cmdline_res_1920_1080_60
[09:43:28] =============== drm_test_pick_cmdline_named  ===============
[09:43:28] [PASSED] NTSC
[09:43:28] [PASSED] NTSC-J
[09:43:28] [PASSED] PAL
[09:43:28] [PASSED] PAL-M
[09:43:28] =========== [PASSED] drm_test_pick_cmdline_named ===========
[09:43:28] ============== [PASSED] drm_test_pick_cmdline ==============
[09:43:28] == drm_test_atomic_get_connector_for_encoder (1 subtest) ===
[09:43:28] [PASSED] drm_test_drm_atomic_get_connector_for_encoder
[09:43:28] ==== [PASSED] drm_test_atomic_get_connector_for_encoder ====
[09:43:28] =========== drm_validate_clone_mode (2 subtests) ===========
[09:43:28] ============== drm_test_check_in_clone_mode  ===============
[09:43:28] [PASSED] in_clone_mode
[09:43:28] [PASSED] not_in_clone_mode
[09:43:28] ========== [PASSED] drm_test_check_in_clone_mode ===========
[09:43:28] =============== drm_test_check_valid_clones  ===============
[09:43:28] [PASSED] not_in_clone_mode
[09:43:28] [PASSED] valid_clone
[09:43:28] [PASSED] invalid_clone
[09:43:28] =========== [PASSED] drm_test_check_valid_clones ===========
[09:43:28] ============= [PASSED] drm_validate_clone_mode =============
[09:43:28] ============= drm_validate_modeset (1 subtest) =============
[09:43:28] [PASSED] drm_test_check_connector_changed_modeset
[09:43:28] ============== [PASSED] drm_validate_modeset ===============
[09:43:28] ====== drm_test_bridge_get_current_state (2 subtests) ======
[09:43:28] [PASSED] drm_test_drm_bridge_get_current_state_atomic
[09:43:28] [PASSED] drm_test_drm_bridge_get_current_state_legacy
[09:43:28] ======== [PASSED] drm_test_bridge_get_current_state ========
[09:43:28] ====== drm_test_bridge_helper_reset_crtc (3 subtests) ======
[09:43:28] [PASSED] drm_test_drm_bridge_helper_reset_crtc_atomic
[09:43:28] [PASSED] drm_test_drm_bridge_helper_reset_crtc_atomic_disabled
[09:43:28] [PASSED] drm_test_drm_bridge_helper_reset_crtc_legacy
[09:43:28] ======== [PASSED] drm_test_bridge_helper_reset_crtc ========
[09:43:28] ============== drm_bridge_alloc (2 subtests) ===============
[09:43:28] [PASSED] drm_test_drm_bridge_alloc_basic
[09:43:28] [PASSED] drm_test_drm_bridge_alloc_get_put
[09:43:28] ================ [PASSED] drm_bridge_alloc =================
[09:43:28] ================== drm_buddy (9 subtests) ==================
[09:43:28] [PASSED] drm_test_buddy_alloc_limit
[09:43:28] [PASSED] drm_test_buddy_alloc_optimistic
[09:43:28] [PASSED] drm_test_buddy_alloc_pessimistic
[09:43:28] [PASSED] drm_test_buddy_alloc_pathological
[09:43:28] [PASSED] drm_test_buddy_alloc_contiguous
[09:43:28] [PASSED] drm_test_buddy_alloc_clear
[09:43:28] [PASSED] drm_test_buddy_alloc_range_bias
[09:43:28] [PASSED] drm_test_buddy_fragmentation_performance
[09:43:28] [PASSED] drm_test_buddy_alloc_exceeds_max_order
[09:43:28] ==================== [PASSED] drm_buddy ====================
[09:43:28] ============= drm_cmdline_parser (40 subtests) =============
[09:43:28] [PASSED] drm_test_cmdline_force_d_only
[09:43:28] [PASSED] drm_test_cmdline_force_D_only_dvi
[09:43:28] [PASSED] drm_test_cmdline_force_D_only_hdmi
[09:43:28] [PASSED] drm_test_cmdline_force_D_only_not_digital
[09:43:28] [PASSED] drm_test_cmdline_force_e_only
[09:43:28] [PASSED] drm_test_cmdline_res
[09:43:28] [PASSED] drm_test_cmdline_res_vesa
[09:43:28] [PASSED] drm_test_cmdline_res_vesa_rblank
[09:43:28] [PASSED] drm_test_cmdline_res_rblank
[09:43:28] [PASSED] drm_test_cmdline_res_bpp
[09:43:28] [PASSED] drm_test_cmdline_res_refresh
[09:43:28] [PASSED] drm_test_cmdline_res_bpp_refresh
[09:43:28] [PASSED] drm_test_cmdline_res_bpp_refresh_interlaced
[09:43:28] [PASSED] drm_test_cmdline_res_bpp_refresh_margins
[09:43:28] [PASSED] drm_test_cmdline_res_bpp_refresh_force_off
[09:43:28] [PASSED] drm_test_cmdline_res_bpp_refresh_force_on
[09:43:28] [PASSED] drm_test_cmdline_res_bpp_refresh_force_on_analog
[09:43:28] [PASSED] drm_test_cmdline_res_bpp_refresh_force_on_digital
[09:43:28] [PASSED] drm_test_cmdline_res_bpp_refresh_interlaced_margins_force_on
[09:43:28] [PASSED] drm_test_cmdline_res_margins_force_on
[09:43:28] [PASSED] drm_test_cmdline_res_vesa_margins
[09:43:28] [PASSED] drm_test_cmdline_name
[09:43:28] [PASSED] drm_test_cmdline_name_bpp
[09:43:28] [PASSED] drm_test_cmdline_name_option
[09:43:28] [PASSED] drm_test_cmdline_name_bpp_option
[09:43:28] [PASSED] drm_test_cmdline_rotate_0
[09:43:28] [PASSED] drm_test_cmdline_rotate_90
[09:43:28] [PASSED] drm_test_cmdline_rotate_180
[09:43:28] [PASSED] drm_test_cmdline_rotate_270
[09:43:28] [PASSED] drm_test_cmdline_hmirror
[09:43:28] [PASSED] drm_test_cmdline_vmirror
[09:43:28] [PASSED] drm_test_cmdline_margin_options
[09:43:28] [PASSED] drm_test_cmdline_multiple_options
[09:43:28] [PASSED] drm_test_cmdline_bpp_extra_and_option
[09:43:28] [PASSED] drm_test_cmdline_extra_and_option
[09:43:28] [PASSED] drm_test_cmdline_freestanding_options
[09:43:28] [PASSED] drm_test_cmdline_freestanding_force_e_and_options
[09:43:28] [PASSED] drm_test_cmdline_panel_orientation
[09:43:28] ================ drm_test_cmdline_invalid  =================
[09:43:28] [PASSED] margin_only
[09:43:28] [PASSED] interlace_only
[09:43:28] [PASSED] res_missing_x
[09:43:28] [PASSED] res_missing_y
[09:43:28] [PASSED] res_bad_y
[09:43:28] [PASSED] res_missing_y_bpp
[09:43:28] [PASSED] res_bad_bpp
[09:43:28] [PASSED] res_bad_refresh
[09:43:28] [PASSED] res_bpp_refresh_force_on_off
[09:43:28] [PASSED] res_invalid_mode
[09:43:28] [PASSED] res_bpp_wrong_place_mode
[09:43:28] [PASSED] name_bpp_refresh
[09:43:28] [PASSED] name_refresh
[09:43:28] [PASSED] name_refresh_wrong_mode
[09:43:28] [PASSED] name_refresh_invalid_mode
[09:43:28] [PASSED] rotate_multiple
[09:43:28] [PASSED] rotate_invalid_val
[09:43:28] [PASSED] rotate_truncated
[09:43:28] [PASSED] invalid_option
[09:43:28] [PASSED] invalid_tv_option
[09:43:28] [PASSED] truncated_tv_option
[09:43:28] ============ [PASSED] drm_test_cmdline_invalid =============
[09:43:28] =============== drm_test_cmdline_tv_options  ===============
[09:43:28] [PASSED] NTSC
[09:43:28] [PASSED] NTSC_443
[09:43:28] [PASSED] NTSC_J
[09:43:28] [PASSED] PAL
[09:43:28] [PASSED] PAL_M
[09:43:28] [PASSED] PAL_N
[09:43:28] [PASSED] SECAM
[09:43:28] [PASSED] MONO_525
[09:43:28] [PASSED] MONO_625
[09:43:28] =========== [PASSED] drm_test_cmdline_tv_options ===========
[09:43:28] =============== [PASSED] drm_cmdline_parser ================
[09:43:28] ========== drmm_connector_hdmi_init (20 subtests) ==========
[09:43:28] [PASSED] drm_test_connector_hdmi_init_valid
[09:43:28] [PASSED] drm_test_connector_hdmi_init_bpc_8
[09:43:28] [PASSED] drm_test_connector_hdmi_init_bpc_10
[09:43:28] [PASSED] drm_test_connector_hdmi_init_bpc_12
[09:43:28] [PASSED] drm_test_connector_hdmi_init_bpc_invalid
[09:43:28] [PASSED] drm_test_connector_hdmi_init_bpc_null
[09:43:28] [PASSED] drm_test_connector_hdmi_init_formats_empty
[09:43:28] [PASSED] drm_test_connector_hdmi_init_formats_no_rgb
[09:43:28] === drm_test_connector_hdmi_init_formats_yuv420_allowed  ===
[09:43:28] [PASSED] supported_formats=0x9 yuv420_allowed=1
[09:43:28] [PASSED] supported_formats=0x9 yuv420_allowed=0
[09:43:28] [PASSED] supported_formats=0x3 yuv420_allowed=1
[09:43:28] [PASSED] supported_formats=0x3 yuv420_allowed=0
[09:43:28] === [PASSED] drm_test_connector_hdmi_init_formats_yuv420_allowed ===
[09:43:28] [PASSED] drm_test_connector_hdmi_init_null_ddc
[09:43:28] [PASSED] drm_test_connector_hdmi_init_null_product
[09:43:28] [PASSED] drm_test_connector_hdmi_init_null_vendor
[09:43:28] [PASSED] drm_test_connector_hdmi_init_product_length_exact
[09:43:28] [PASSED] drm_test_connector_hdmi_init_product_length_too_long
[09:43:28] [PASSED] drm_test_connector_hdmi_init_product_valid
[09:43:28] [PASSED] drm_test_connector_hdmi_init_vendor_length_exact
[09:43:28] [PASSED] drm_test_connector_hdmi_init_vendor_length_too_long
[09:43:28] [PASSED] drm_test_connector_hdmi_init_vendor_valid
[09:43:28] ========= drm_test_connector_hdmi_init_type_valid  =========
[09:43:28] [PASSED] HDMI-A
[09:43:28] [PASSED] HDMI-B
[09:43:28] ===== [PASSED] drm_test_connector_hdmi_init_type_valid =====
[09:43:28] ======== drm_test_connector_hdmi_init_type_invalid  ========
[09:43:28] [PASSED] Unknown
[09:43:28] [PASSED] VGA
[09:43:28] [PASSED] DVI-I
[09:43:28] [PASSED] DVI-D
[09:43:28] [PASSED] DVI-A
[09:43:28] [PASSED] Composite
[09:43:28] [PASSED] SVIDEO
[09:43:28] [PASSED] LVDS
[09:43:28] [PASSED] Component
[09:43:28] [PASSED] DIN
[09:43:28] [PASSED] DP
[09:43:28] [PASSED] TV
[09:43:28] [PASSED] eDP
[09:43:28] [PASSED] Virtual
[09:43:28] [PASSED] DSI
[09:43:28] [PASSED] DPI
[09:43:28] [PASSED] Writeback
[09:43:28] [PASSED] SPI
[09:43:28] [PASSED] USB
[09:43:28] ==== [PASSED] drm_test_connector_hdmi_init_type_invalid ====
[09:43:28] ============ [PASSED] drmm_connector_hdmi_init =============
[09:43:28] ============= drmm_connector_init (3 subtests) =============
[09:43:28] [PASSED] drm_test_drmm_connector_init
[09:43:28] [PASSED] drm_test_drmm_connector_init_null_ddc
[09:43:28] ========= drm_test_drmm_connector_init_type_valid  =========
[09:43:28] [PASSED] Unknown
[09:43:28] [PASSED] VGA
[09:43:28] [PASSED] DVI-I
[09:43:28] [PASSED] DVI-D
[09:43:28] [PASSED] DVI-A
[09:43:28] [PASSED] Composite
[09:43:28] [PASSED] SVIDEO
[09:43:28] [PASSED] LVDS
[09:43:28] [PASSED] Component
[09:43:28] [PASSED] DIN
[09:43:28] [PASSED] DP
[09:43:28] [PASSED] HDMI-A
[09:43:28] [PASSED] HDMI-B
[09:43:28] [PASSED] TV
[09:43:28] [PASSED] eDP
[09:43:28] [PASSED] Virtual
[09:43:28] [PASSED] DSI
[09:43:28] [PASSED] DPI
[09:43:28] [PASSED] Writeback
[09:43:28] [PASSED] SPI
[09:43:28] [PASSED] USB
[09:43:28] ===== [PASSED] drm_test_drmm_connector_init_type_valid =====
[09:43:28] =============== [PASSED] drmm_connector_init ===============
[09:43:28] ========= drm_connector_dynamic_init (6 subtests) ==========
[09:43:28] [PASSED] drm_test_drm_connector_dynamic_init
[09:43:28] [PASSED] drm_test_drm_connector_dynamic_init_null_ddc
[09:43:28] [PASSED] drm_test_drm_connector_dynamic_init_not_added
[09:43:28] [PASSED] drm_test_drm_connector_dynamic_init_properties
[09:43:28] ===== drm_test_drm_connector_dynamic_init_type_valid  ======
[09:43:28] [PASSED] Unknown
[09:43:28] [PASSED] VGA
[09:43:28] [PASSED] DVI-I
[09:43:28] [PASSED] DVI-D
[09:43:28] [PASSED] DVI-A
[09:43:28] [PASSED] Composite
[09:43:28] [PASSED] SVIDEO
[09:43:28] [PASSED] LVDS
[09:43:28] [PASSED] Component
[09:43:28] [PASSED] DIN
[09:43:28] [PASSED] DP
[09:43:28] [PASSED] HDMI-A
[09:43:28] [PASSED] HDMI-B
[09:43:28] [PASSED] TV
[09:43:28] [PASSED] eDP
[09:43:28] [PASSED] Virtual
[09:43:28] [PASSED] DSI
[09:43:28] [PASSED] DPI
[09:43:28] [PASSED] Writeback
[09:43:28] [PASSED] SPI
[09:43:28] [PASSED] USB
[09:43:28] = [PASSED] drm_test_drm_connector_dynamic_init_type_valid ==
[09:43:28] ======== drm_test_drm_connector_dynamic_init_name  =========
[09:43:28] [PASSED] Unknown
[09:43:28] [PASSED] VGA
[09:43:28] [PASSED] DVI-I
[09:43:28] [PASSED] DVI-D
[09:43:28] [PASSED] DVI-A
[09:43:28] [PASSED] Composite
[09:43:28] [PASSED] SVIDEO
[09:43:28] [PASSED] LVDS
[09:43:28] [PASSED] Component
[09:43:28] [PASSED] DIN
[09:43:28] [PASSED] DP
[09:43:28] [PASSED] HDMI-A
[09:43:28] [PASSED] HDMI-B
[09:43:28] [PASSED] TV
[09:43:28] [PASSED] eDP
[09:43:28] [PASSED] Virtual
[09:43:28] [PASSED] DSI
[09:43:28] [PASSED] DPI
[09:43:28] [PASSED] Writeback
[09:43:28] [PASSED] SPI
[09:43:28] [PASSED] USB
[09:43:28] ==== [PASSED] drm_test_drm_connector_dynamic_init_name =====
[09:43:28] =========== [PASSED] drm_connector_dynamic_init ============
[09:43:28] ==== drm_connector_dynamic_register_early (4 subtests) =====
[09:43:28] [PASSED] drm_test_drm_connector_dynamic_register_early_on_list
[09:43:28] [PASSED] drm_test_drm_connector_dynamic_register_early_defer
[09:43:28] [PASSED] drm_test_drm_connector_dynamic_register_early_no_init
[09:43:28] [PASSED] drm_test_drm_connector_dynamic_register_early_no_mode_object
[09:43:28] ====== [PASSED] drm_connector_dynamic_register_early =======
[09:43:28] ======= drm_connector_dynamic_register (7 subtests) ========
[09:43:28] [PASSED] drm_test_drm_connector_dynamic_register_on_list
[09:43:28] [PASSED] drm_test_drm_connector_dynamic_register_no_defer
[09:43:28] [PASSED] drm_test_drm_connector_dynamic_register_no_init
[09:43:28] [PASSED] drm_test_drm_connector_dynamic_register_mode_object
[09:43:28] [PASSED] drm_test_drm_connector_dynamic_register_sysfs
[09:43:28] [PASSED] drm_test_drm_connector_dynamic_register_sysfs_name
[09:43:28] [PASSED] drm_test_drm_connector_dynamic_register_debugfs
[09:43:28] ========= [PASSED] drm_connector_dynamic_register ==========
[09:43:28] = drm_connector_attach_broadcast_rgb_property (2 subtests) =
[09:43:28] [PASSED] drm_test_drm_connector_attach_broadcast_rgb_property
[09:43:28] [PASSED] drm_test_drm_connector_attach_broadcast_rgb_property_hdmi_connector
[09:43:28] === [PASSED] drm_connector_attach_broadcast_rgb_property ===
[09:43:28] ========== drm_get_tv_mode_from_name (2 subtests) ==========
[09:43:28] ========== drm_test_get_tv_mode_from_name_valid  ===========
[09:43:28] [PASSED] NTSC
[09:43:28] [PASSED] NTSC-443
[09:43:28] [PASSED] NTSC-J
[09:43:28] [PASSED] PAL
[09:43:28] [PASSED] PAL-M
[09:43:28] [PASSED] PAL-N
[09:43:28] [PASSED] SECAM
[09:43:28] [PASSED] Mono
[09:43:28] ====== [PASSED] drm_test_get_tv_mode_from_name_valid =======
[09:43:28] [PASSED] drm_test_get_tv_mode_from_name_truncated
[09:43:28] ============ [PASSED] drm_get_tv_mode_from_name ============
[09:43:28] = drm_test_connector_hdmi_compute_mode_clock (12 subtests) =
[09:43:28] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb
[09:43:28] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_10bpc
[09:43:28] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_10bpc_vic_1
[09:43:28] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_12bpc
[09:43:28] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_12bpc_vic_1
[09:43:28] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_double
[09:43:28] = drm_test_connector_hdmi_compute_mode_clock_yuv420_valid  =
[09:43:28] [PASSED] VIC 96
[09:43:28] [PASSED] VIC 97
[09:43:28] [PASSED] VIC 101
[09:43:28] [PASSED] VIC 102
[09:43:28] [PASSED] VIC 106
[09:43:28] [PASSED] VIC 107
[09:43:28] === [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv420_valid ===
[09:43:28] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv420_10_bpc
[09:43:28] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv420_12_bpc
[09:43:28] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv422_8_bpc
[09:43:28] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv422_10_bpc
[09:43:28] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv422_12_bpc
[09:43:28] === [PASSED] drm_test_connector_hdmi_compute_mode_clock ====
[09:43:28] == drm_hdmi_connector_get_broadcast_rgb_name (2 subtests) ==
[09:43:28] === drm_test_drm_hdmi_connector_get_broadcast_rgb_name  ====
[09:43:28] [PASSED] Automatic
[09:43:28] [PASSED] Full
[09:43:28] [PASSED] Limited 16:235
[09:43:28] === [PASSED] drm_test_drm_hdmi_connector_get_broadcast_rgb_name ===
[09:43:28] [PASSED] drm_test_drm_hdmi_connector_get_broadcast_rgb_name_invalid
[09:43:28] ==== [PASSED] drm_hdmi_connector_get_broadcast_rgb_name ====
[09:43:28] == drm_hdmi_connector_get_output_format_name (2 subtests) ==
[09:43:28] === drm_test_drm_hdmi_connector_get_output_format_name  ====
[09:43:28] [PASSED] RGB
[09:43:28] [PASSED] YUV 4:2:0
[09:43:28] [PASSED] YUV 4:2:2
[09:43:28] [PASSED] YUV 4:4:4
[09:43:28] === [PASSED] drm_test_drm_hdmi_connector_get_output_format_name ===
[09:43:28] [PASSED] drm_test_drm_hdmi_connector_get_output_format_name_invalid
[09:43:28] ==== [PASSED] drm_hdmi_connector_get_output_format_name ====
[09:43:28] ============= drm_damage_helper (21 subtests) ==============
[09:43:28] [PASSED] drm_test_damage_iter_no_damage
[09:43:28] [PASSED] drm_test_damage_iter_no_damage_fractional_src
[09:43:28] [PASSED] drm_test_damage_iter_no_damage_src_moved
[09:43:28] [PASSED] drm_test_damage_iter_no_damage_fractional_src_moved
[09:43:28] [PASSED] drm_test_damage_iter_no_damage_not_visible
[09:43:28] [PASSED] drm_test_damage_iter_no_damage_no_crtc
[09:43:28] [PASSED] drm_test_damage_iter_no_damage_no_fb
[09:43:28] [PASSED] drm_test_damage_iter_simple_damage
[09:43:28] [PASSED] drm_test_damage_iter_single_damage
[09:43:28] [PASSED] drm_test_damage_iter_single_damage_intersect_src
[09:43:28] [PASSED] drm_test_damage_iter_single_damage_outside_src
[09:43:28] [PASSED] drm_test_damage_iter_single_damage_fractional_src
[09:43:28] [PASSED] drm_test_damage_iter_single_damage_intersect_fractional_src
[09:43:28] [PASSED] drm_test_damage_iter_single_damage_outside_fractional_src
[09:43:28] [PASSED] drm_test_damage_iter_single_damage_src_moved
[09:43:28] [PASSED] drm_test_damage_iter_single_damage_fractional_src_moved
[09:43:28] [PASSED] drm_test_damage_iter_damage
[09:43:28] [PASSED] drm_test_damage_iter_damage_one_intersect
[09:43:28] [PASSED] drm_test_damage_iter_damage_one_outside
[09:43:28] [PASSED] drm_test_damage_iter_damage_src_moved
[09:43:28] [PASSED] drm_test_damage_iter_damage_not_visible
[09:43:28] ================ [PASSED] drm_damage_helper ================
[09:43:28] ============== drm_dp_mst_helper (3 subtests) ==============
[09:43:28] ============== drm_test_dp_mst_calc_pbn_mode  ==============
[09:43:28] [PASSED] Clock 154000 BPP 30 DSC disabled
[09:43:28] [PASSED] Clock 234000 BPP 30 DSC disabled
[09:43:28] [PASSED] Clock 297000 BPP 24 DSC disabled
[09:43:28] [PASSED] Clock 332880 BPP 24 DSC enabled
[09:43:28] [PASSED] Clock 324540 BPP 24 DSC enabled
[09:43:28] ========== [PASSED] drm_test_dp_mst_calc_pbn_mode ==========
[09:43:28] ============== drm_test_dp_mst_calc_pbn_div  ===============
[09:43:28] [PASSED] Link rate 2000000 lane count 4
[09:43:28] [PASSED] Link rate 2000000 lane count 2
[09:43:28] [PASSED] Link rate 2000000 lane count 1
[09:43:28] [PASSED] Link rate 1350000 lane count 4
[09:43:28] [PASSED] Link rate 1350000 lane count 2
[09:43:28] [PASSED] Link rate 1350000 lane count 1
[09:43:28] [PASSED] Link rate 1000000 lane count 4
[09:43:28] [PASSED] Link rate 1000000 lane count 2
[09:43:28] [PASSED] Link rate 1000000 lane count 1
[09:43:28] [PASSED] Link rate 810000 lane count 4
[09:43:28] [PASSED] Link rate 810000 lane count 2
[09:43:28] [PASSED] Link rate 810000 lane count 1
[09:43:28] [PASSED] Link rate 540000 lane count 4
[09:43:28] [PASSED] Link rate 540000 lane count 2
[09:43:28] [PASSED] Link rate 540000 lane count 1
[09:43:28] [PASSED] Link rate 270000 lane count 4
[09:43:28] [PASSED] Link rate 270000 lane count 2
[09:43:28] [PASSED] Link rate 270000 lane count 1
[09:43:28] [PASSED] Link rate 162000 lane count 4
[09:43:28] [PASSED] Link rate 162000 lane count 2
[09:43:28] [PASSED] Link rate 162000 lane count 1
[09:43:28] ========== [PASSED] drm_test_dp_mst_calc_pbn_div ===========
[09:43:28] ========= drm_test_dp_mst_sideband_msg_req_decode  =========
[09:43:28] [PASSED] DP_ENUM_PATH_RESOURCES with port number
[09:43:28] [PASSED] DP_POWER_UP_PHY with port number
[09:43:28] [PASSED] DP_POWER_DOWN_PHY with port number
[09:43:28] [PASSED] DP_ALLOCATE_PAYLOAD with SDP stream sinks
[09:43:28] [PASSED] DP_ALLOCATE_PAYLOAD with port number
[09:43:28] [PASSED] DP_ALLOCATE_PAYLOAD with VCPI
[09:43:28] [PASSED] DP_ALLOCATE_PAYLOAD with PBN
[09:43:28] [PASSED] DP_QUERY_PAYLOAD with port number
[09:43:28] [PASSED] DP_QUERY_PAYLOAD with VCPI
[09:43:28] [PASSED] DP_REMOTE_DPCD_READ with port number
[09:43:28] [PASSED] DP_REMOTE_DPCD_READ with DPCD address
[09:43:28] [PASSED] DP_REMOTE_DPCD_READ with max number of bytes
[09:43:28] [PASSED] DP_REMOTE_DPCD_WRITE with port number
[09:43:28] [PASSED] DP_REMOTE_DPCD_WRITE with DPCD address
[09:43:28] [PASSED] DP_REMOTE_DPCD_WRITE with data array
[09:43:28] [PASSED] DP_REMOTE_I2C_READ with port number
[09:43:28] [PASSED] DP_REMOTE_I2C_READ with I2C device ID
[09:43:28] [PASSED] DP_REMOTE_I2C_READ with transactions array
[09:43:28] [PASSED] DP_REMOTE_I2C_WRITE with port number
[09:43:28] [PASSED] DP_REMOTE_I2C_WRITE with I2C device ID
[09:43:28] [PASSED] DP_REMOTE_I2C_WRITE with data array
[09:43:28] [PASSED] DP_QUERY_STREAM_ENC_STATUS with stream ID
[09:43:28] [PASSED] DP_QUERY_STREAM_ENC_STATUS with client ID
[09:43:28] [PASSED] DP_QUERY_STREAM_ENC_STATUS with stream event
[09:43:28] [PASSED] DP_QUERY_STREAM_ENC_STATUS with valid stream event
[09:43:28] [PASSED] DP_QUERY_STREAM_ENC_STATUS with stream behavior
[09:43:28] [PASSED] DP_QUERY_STREAM_ENC_STATUS with a valid stream behavior
[09:43:28] ===== [PASSED] drm_test_dp_mst_sideband_msg_req_decode =====
[09:43:28] ================ [PASSED] drm_dp_mst_helper ================
[09:43:28] ================== drm_exec (7 subtests) ===================
[09:43:28] [PASSED] sanitycheck
[09:43:28] [PASSED] test_lock
[09:43:28] [PASSED] test_lock_unlock
[09:43:28] [PASSED] test_duplicates
[09:43:28] [PASSED] test_prepare
[09:43:28] [PASSED] test_prepare_array
[09:43:28] [PASSED] test_multiple_loops
[09:43:28] ==================== [PASSED] drm_exec =====================
[09:43:28] =========== drm_format_helper_test (17 subtests) ===========
[09:43:28] ============== drm_test_fb_xrgb8888_to_gray8  ==============
[09:43:28] [PASSED] single_pixel_source_buffer
[09:43:28] [PASSED] single_pixel_clip_rectangle
[09:43:28] [PASSED] well_known_colors
[09:43:28] [PASSED] destination_pitch
[09:43:28] ========== [PASSED] drm_test_fb_xrgb8888_to_gray8 ==========
[09:43:28] ============= drm_test_fb_xrgb8888_to_rgb332  ==============
[09:43:28] [PASSED] single_pixel_source_buffer
[09:43:28] [PASSED] single_pixel_clip_rectangle
[09:43:28] [PASSED] well_known_colors
[09:43:28] [PASSED] destination_pitch
[09:43:28] ========= [PASSED] drm_test_fb_xrgb8888_to_rgb332 ==========
[09:43:28] ============= drm_test_fb_xrgb8888_to_rgb565  ==============
[09:43:28] [PASSED] single_pixel_source_buffer
[09:43:28] [PASSED] single_pixel_clip_rectangle
[09:43:28] [PASSED] well_known_colors
[09:43:28] [PASSED] destination_pitch
[09:43:28] ========= [PASSED] drm_test_fb_xrgb8888_to_rgb565 ==========
[09:43:28] ============ drm_test_fb_xrgb8888_to_xrgb1555  =============
[09:43:28] [PASSED] single_pixel_source_buffer
[09:43:28] [PASSED] single_pixel_clip_rectangle
[09:43:28] [PASSED] well_known_colors
[09:43:28] [PASSED] destination_pitch
[09:43:28] ======== [PASSED] drm_test_fb_xrgb8888_to_xrgb1555 =========
[09:43:28] ============ drm_test_fb_xrgb8888_to_argb1555  =============
[09:43:28] [PASSED] single_pixel_source_buffer
[09:43:28] [PASSED] single_pixel_clip_rectangle
[09:43:28] [PASSED] well_known_colors
[09:43:28] [PASSED] destination_pitch
[09:43:28] ======== [PASSED] drm_test_fb_xrgb8888_to_argb1555 =========
[09:43:28] ============ drm_test_fb_xrgb8888_to_rgba5551  =============
[09:43:28] [PASSED] single_pixel_source_buffer
[09:43:28] [PASSED] single_pixel_clip_rectangle
[09:43:28] [PASSED] well_known_colors
[09:43:28] [PASSED] destination_pitch
[09:43:28] ======== [PASSED] drm_test_fb_xrgb8888_to_rgba5551 =========
[09:43:28] ============= drm_test_fb_xrgb8888_to_rgb888  ==============
[09:43:28] [PASSED] single_pixel_source_buffer
[09:43:28] [PASSED] single_pixel_clip_rectangle
[09:43:28] [PASSED] well_known_colors
[09:43:28] [PASSED] destination_pitch
[09:43:28] ========= [PASSED] drm_test_fb_xrgb8888_to_rgb888 ==========
[09:43:28] ============= drm_test_fb_xrgb8888_to_bgr888  ==============
[09:43:28] [PASSED] single_pixel_source_buffer
[09:43:28] [PASSED] single_pixel_clip_rectangle
[09:43:28] [PASSED] well_known_colors
[09:43:28] [PASSED] destination_pitch
[09:43:28] ========= [PASSED] drm_test_fb_xrgb8888_to_bgr888 ==========
[09:43:28] ============ drm_test_fb_xrgb8888_to_argb8888  =============
[09:43:28] [PASSED] single_pixel_source_buffer
[09:43:28] [PASSED] single_pixel_clip_rectangle
[09:43:28] [PASSED] well_known_colors
[09:43:28] [PASSED] destination_pitch
[09:43:28] ======== [PASSED] drm_test_fb_xrgb8888_to_argb8888 =========
[09:43:28] =========== drm_test_fb_xrgb8888_to_xrgb2101010  ===========
[09:43:28] [PASSED] single_pixel_source_buffer
[09:43:28] [PASSED] single_pixel_clip_rectangle
[09:43:28] [PASSED] well_known_colors
[09:43:28] [PASSED] destination_pitch
[09:43:28] ======= [PASSED] drm_test_fb_xrgb8888_to_xrgb2101010 =======
[09:43:28] =========== drm_test_fb_xrgb8888_to_argb2101010  ===========
[09:43:28] [PASSED] single_pixel_source_buffer
[09:43:28] [PASSED] single_pixel_clip_rectangle
[09:43:28] [PASSED] well_known_colors
[09:43:28] [PASSED] destination_pitch
[09:43:28] ======= [PASSED] drm_test_fb_xrgb8888_to_argb2101010 =======
[09:43:28] ============== drm_test_fb_xrgb8888_to_mono  ===============
[09:43:28] [PASSED] single_pixel_source_buffer
[09:43:28] [PASSED] single_pixel_clip_rectangle
[09:43:28] [PASSED] well_known_colors
[09:43:28] [PASSED] destination_pitch
[09:43:28] ========== [PASSED] drm_test_fb_xrgb8888_to_mono ===========
[09:43:28] ==================== drm_test_fb_swab  =====================
[09:43:28] [PASSED] single_pixel_source_buffer
[09:43:28] [PASSED] single_pixel_clip_rectangle
[09:43:28] [PASSED] well_known_colors
[09:43:28] [PASSED] destination_pitch
[09:43:28] ================ [PASSED] drm_test_fb_swab =================
[09:43:28] ============ drm_test_fb_xrgb8888_to_xbgr8888  =============
[09:43:28] [PASSED] single_pixel_source_buffer
[09:43:28] [PASSED] single_pixel_clip_rectangle
[09:43:28] [PASSED] well_known_colors
[09:43:28] [PASSED] destination_pitch
[09:43:28] ======== [PASSED] drm_test_fb_xrgb8888_to_xbgr8888 =========
[09:43:28] ============ drm_test_fb_xrgb8888_to_abgr8888  =============
[09:43:28] [PASSED] single_pixel_source_buffer
[09:43:28] [PASSED] single_pixel_clip_rectangle
[09:43:28] [PASSED] well_known_colors
[09:43:28] [PASSED] destination_pitch
[09:43:28] ======== [PASSED] drm_test_fb_xrgb8888_to_abgr8888 =========
[09:43:28] ================= drm_test_fb_clip_offset  =================
[09:43:28] [PASSED] pass through
[09:43:28] [PASSED] horizontal offset
[09:43:28] [PASSED] vertical offset
[09:43:28] [PASSED] horizontal and vertical offset
[09:43:28] [PASSED] horizontal offset (custom pitch)
[09:43:28] [PASSED] vertical offset (custom pitch)
[09:43:28] [PASSED] horizontal and vertical offset (custom pitch)
[09:43:28] ============= [PASSED] drm_test_fb_clip_offset =============
[09:43:28] =================== drm_test_fb_memcpy  ====================
[09:43:28] [PASSED] single_pixel_source_buffer: XR24 little-endian (0x34325258)
[09:43:28] [PASSED] single_pixel_source_buffer: XRA8 little-endian (0x38415258)
[09:43:28] [PASSED] single_pixel_source_buffer: YU24 little-endian (0x34325559)
[09:43:28] [PASSED] single_pixel_clip_rectangle: XB24 little-endian (0x34324258)
[09:43:28] [PASSED] single_pixel_clip_rectangle: XRA8 little-endian (0x38415258)
[09:43:28] [PASSED] single_pixel_clip_rectangle: YU24 little-endian (0x34325559)
[09:43:28] [PASSED] well_known_colors: XB24 little-endian (0x34324258)
[09:43:28] [PASSED] well_known_colors: XRA8 little-endian (0x38415258)
[09:43:28] [PASSED] well_known_colors: YU24 little-endian (0x34325559)
[09:43:28] [PASSED] destination_pitch: XB24 little-endian (0x34324258)
[09:43:28] [PASSED] destination_pitch: XRA8 little-endian (0x38415258)
[09:43:28] [PASSED] destination_pitch: YU24 little-endian (0x34325559)
[09:43:28] =============== [PASSED] drm_test_fb_memcpy ================
[09:43:28] ============= [PASSED] drm_format_helper_test ==============
[09:43:28] ================= drm_format (18 subtests) =================
[09:43:28] [PASSED] drm_test_format_block_width_invalid
[09:43:28] [PASSED] drm_test_format_block_width_one_plane
[09:43:28] [PASSED] drm_test_format_block_width_two_plane
[09:43:28] [PASSED] drm_test_format_block_width_three_plane
[09:43:28] [PASSED] drm_test_format_block_width_tiled
[09:43:28] [PASSED] drm_test_format_block_height_invalid
[09:43:28] [PASSED] drm_test_format_block_height_one_plane
[09:43:28] [PASSED] drm_test_format_block_height_two_plane
[09:43:28] [PASSED] drm_test_format_block_height_three_plane
[09:43:28] [PASSED] drm_test_format_block_height_tiled
[09:43:28] [PASSED] drm_test_format_min_pitch_invalid
[09:43:28] [PASSED] drm_test_format_min_pitch_one_plane_8bpp
[09:43:28] [PASSED] drm_test_format_min_pitch_one_plane_16bpp
[09:43:28] [PASSED] drm_test_format_min_pitch_one_plane_24bpp
[09:43:28] [PASSED] drm_test_format_min_pitch_one_plane_32bpp
[09:43:28] [PASSED] drm_test_format_min_pitch_two_plane
[09:43:28] [PASSED] drm_test_format_min_pitch_three_plane_8bpp
[09:43:28] [PASSED] drm_test_format_min_pitch_tiled
[09:43:28] =================== [PASSED] drm_format ====================
[09:43:28] ============== drm_framebuffer (10 subtests) ===============
[09:43:28] ========== drm_test_framebuffer_check_src_coords  ==========
[09:43:28] [PASSED] Success: source fits into fb
[09:43:28] [PASSED] Fail: overflowing fb with x-axis coordinate
[09:43:28] [PASSED] Fail: overflowing fb with y-axis coordinate
[09:43:28] [PASSED] Fail: overflowing fb with source width
[09:43:28] [PASSED] Fail: overflowing fb with source height
[09:43:28] ====== [PASSED] drm_test_framebuffer_check_src_coords ======
[09:43:28] [PASSED] drm_test_framebuffer_cleanup
[09:43:28] =============== drm_test_framebuffer_create  ===============
[09:43:28] [PASSED] ABGR8888 normal sizes
[09:43:28] [PASSED] ABGR8888 max sizes
[09:43:28] [PASSED] ABGR8888 pitch greater than min required
[09:43:28] [PASSED] ABGR8888 pitch less than min required
[09:43:28] [PASSED] ABGR8888 Invalid width
[09:43:28] [PASSED] ABGR8888 Invalid buffer handle
[09:43:28] [PASSED] No pixel format
[09:43:28] [PASSED] ABGR8888 Width 0
[09:43:28] [PASSED] ABGR8888 Height 0
[09:43:28] [PASSED] ABGR8888 Out of bound height * pitch combination
[09:43:28] [PASSED] ABGR8888 Large buffer offset
[09:43:28] [PASSED] ABGR8888 Buffer offset for inexistent plane
[09:43:28] [PASSED] ABGR8888 Invalid flag
[09:43:28] [PASSED] ABGR8888 Set DRM_MODE_FB_MODIFIERS without modifiers
[09:43:28] [PASSED] ABGR8888 Valid buffer modifier
[09:43:28] [PASSED] ABGR8888 Invalid buffer modifier(DRM_FORMAT_MOD_SAMSUNG_64_32_TILE)
[09:43:28] [PASSED] ABGR8888 Extra pitches without DRM_MODE_FB_MODIFIERS
[09:43:28] [PASSED] ABGR8888 Extra pitches with DRM_MODE_FB_MODIFIERS
[09:43:28] [PASSED] NV12 Normal sizes
[09:43:28] [PASSED] NV12 Max sizes
[09:43:28] [PASSED] NV12 Invalid pitch
[09:43:28] [PASSED] NV12 Invalid modifier/missing DRM_MODE_FB_MODIFIERS flag
[09:43:28] [PASSED] NV12 different  modifier per-plane
[09:43:28] [PASSED] NV12 with DRM_FORMAT_MOD_SAMSUNG_64_32_TILE
[09:43:28] [PASSED] NV12 Valid modifiers without DRM_MODE_FB_MODIFIERS
[09:43:28] [PASSED] NV12 Modifier for inexistent plane
[09:43:28] [PASSED] NV12 Handle for inexistent plane
[09:43:28] [PASSED] NV12 Handle for inexistent plane without DRM_MODE_FB_MODIFIERS
[09:43:28] [PASSED] YVU420 DRM_MODE_FB_MODIFIERS set without modifier
[09:43:28] [PASSED] YVU420 Normal sizes
[09:43:28] [PASSED] YVU420 Max sizes
[09:43:28] [PASSED] YVU420 Invalid pitch
[09:43:28] [PASSED] YVU420 Different pitches
[09:43:28] [PASSED] YVU420 Different buffer offsets/pitches
[09:43:28] [PASSED] YVU420 Modifier set just for plane 0, without DRM_MODE_FB_MODIFIERS
[09:43:28] [PASSED] YVU420 Modifier set just for planes 0, 1, without DRM_MODE_FB_MODIFIERS
[09:43:28] [PASSED] YVU420 Modifier set just for plane 0, 1, with DRM_MODE_FB_MODIFIERS
[09:43:28] [PASSED] YVU420 Valid modifier
[09:43:28] [PASSED] YVU420 Different modifiers per plane
[09:43:28] [PASSED] YVU420 Modifier for inexistent plane
[09:43:28] [PASSED] YUV420_10BIT Invalid modifier(DRM_FORMAT_MOD_LINEAR)
[09:43:28] [PASSED] X0L2 Normal sizes
[09:43:28] [PASSED] X0L2 Max sizes
[09:43:28] [PASSED] X0L2 Invalid pitch
[09:43:28] [PASSED] X0L2 Pitch greater than minimum required
[09:43:28] [PASSED] X0L2 Handle for inexistent plane
[09:43:28] [PASSED] X0L2 Offset for inexistent plane, without DRM_MODE_FB_MODIFIERS set
[09:43:28] [PASSED] X0L2 Modifier without DRM_MODE_FB_MODIFIERS set
[09:43:28] [PASSED] X0L2 Valid modifier
[09:43:28] [PASSED] X0L2 Modifier for inexistent plane
[09:43:28] =========== [PASSED] drm_test_framebuffer_create ===========
[09:43:28] [PASSED] drm_test_framebuffer_free
[09:43:28] [PASSED] drm_test_framebuffer_init
[09:43:28] [PASSED] drm_test_framebuffer_init_bad_format
[09:43:28] [PASSED] drm_test_framebuffer_init_dev_mismatch
[09:43:28] [PASSED] drm_test_framebuffer_lookup
[09:43:28] [PASSED] drm_test_framebuffer_lookup_inexistent
[09:43:28] [PASSED] drm_test_framebuffer_modifiers_not_supported
[09:43:28] ================= [PASSED] drm_framebuffer =================
[09:43:28] ================ drm_gem_shmem (8 subtests) ================
[09:43:28] [PASSED] drm_gem_shmem_test_obj_create
[09:43:28] [PASSED] drm_gem_shmem_test_obj_create_private
[09:43:28] [PASSED] drm_gem_shmem_test_pin_pages
[09:43:28] [PASSED] drm_gem_shmem_test_vmap
[09:43:28] [PASSED] drm_gem_shmem_test_get_sg_table
[09:43:28] [PASSED] drm_gem_shmem_test_get_pages_sgt
[09:43:28] [PASSED] drm_gem_shmem_test_madvise
[09:43:28] [PASSED] drm_gem_shmem_test_purge
[09:43:28] ================== [PASSED] drm_gem_shmem ==================
[09:43:28] === drm_atomic_helper_connector_hdmi_check (27 subtests) ===
[09:43:28] [PASSED] drm_test_check_broadcast_rgb_auto_cea_mode
[09:43:28] [PASSED] drm_test_check_broadcast_rgb_auto_cea_mode_vic_1
[09:43:28] [PASSED] drm_test_check_broadcast_rgb_full_cea_mode
[09:43:28] [PASSED] drm_test_check_broadcast_rgb_full_cea_mode_vic_1
[09:43:28] [PASSED] drm_test_check_broadcast_rgb_limited_cea_mode
[09:43:28] [PASSED] drm_test_check_broadcast_rgb_limited_cea_mode_vic_1
[09:43:28] ====== drm_test_check_broadcast_rgb_cea_mode_yuv420  =======
[09:43:28] [PASSED] Automatic
[09:43:28] [PASSED] Full
[09:43:28] [PASSED] Limited 16:235
[09:43:28] == [PASSED] drm_test_check_broadcast_rgb_cea_mode_yuv420 ===
[09:43:28] [PASSED] drm_test_check_broadcast_rgb_crtc_mode_changed
[09:43:28] [PASSED] drm_test_check_broadcast_rgb_crtc_mode_not_changed
[09:43:28] [PASSED] drm_test_check_disable_connector
[09:43:28] [PASSED] drm_test_check_hdmi_funcs_reject_rate
[09:43:28] [PASSED] drm_test_check_max_tmds_rate_bpc_fallback_rgb
[09:43:28] [PASSED] drm_test_check_max_tmds_rate_bpc_fallback_yuv420
[09:43:28] [PASSED] drm_test_check_max_tmds_rate_bpc_fallback_ignore_yuv422
[09:43:28] [PASSED] drm_test_check_max_tmds_rate_bpc_fallback_ignore_yuv420
[09:43:28] [PASSED] drm_test_check_driver_unsupported_fallback_yuv420
[09:43:28] [PASSED] drm_test_check_output_bpc_crtc_mode_changed
[09:43:28] [PASSED] drm_test_check_output_bpc_crtc_mode_not_changed
[09:43:28] [PASSED] drm_test_check_output_bpc_dvi
[09:43:28] [PASSED] drm_test_check_output_bpc_format_vic_1
[09:43:28] [PASSED] drm_test_check_output_bpc_format_display_8bpc_only
[09:43:28] [PASSED] drm_test_check_output_bpc_format_display_rgb_only
[09:43:28] [PASSED] drm_test_check_output_bpc_format_driver_8bpc_only
[09:43:28] [PASSED] drm_test_check_output_bpc_format_driver_rgb_only
[09:43:28] [PASSED] drm_test_check_tmds_char_rate_rgb_8bpc
[09:43:28] [PASSED] drm_test_check_tmds_char_rate_rgb_10bpc
[09:43:28] [PASSED] drm_test_check_tmds_char_rate_rgb_12bpc
[09:43:28] ===== [PASSED] drm_atomic_helper_connector_hdmi_check ======
[09:43:28] === drm_atomic_helper_connector_hdmi_reset (6 subtests) ====
[09:43:28] [PASSED] drm_test_check_broadcast_rgb_value
[09:43:28] [PASSED] drm_test_check_bpc_8_value
[09:43:28] [PASSED] drm_test_check_bpc_10_value
[09:43:28] [PASSED] drm_test_check_bpc_12_value
[09:43:28] [PASSED] drm_test_check_format_value
[09:43:28] [PASSED] drm_test_check_tmds_char_value
[09:43:28] ===== [PASSED] drm_atomic_helper_connector_hdmi_reset ======
[09:43:28] = drm_atomic_helper_connector_hdmi_mode_valid (4 subtests) =
[09:43:28] [PASSED] drm_test_check_mode_valid
[09:43:28] [PASSED] drm_test_check_mode_valid_reject
[09:43:28] [PASSED] drm_test_check_mode_valid_reject_rate
[09:43:28] [PASSED] drm_test_check_mode_valid_reject_max_clock
[09:43:28] === [PASSED] drm_atomic_helper_connector_hdmi_mode_valid ===
[09:43:28] = drm_atomic_helper_connector_hdmi_infoframes (5 subtests) =
[09:43:28] [PASSED] drm_test_check_infoframes
[09:43:28] [PASSED] drm_test_check_reject_avi_infoframe
[09:43:28] [PASSED] drm_test_check_reject_hdr_infoframe_bpc_8
[09:43:28] [PASSED] drm_test_check_reject_hdr_infoframe_bpc_10
[09:43:28] [PASSED] drm_test_check_reject_audio_infoframe
[09:43:28] === [PASSED] drm_atomic_helper_connector_hdmi_infoframes ===
[09:43:28] ================= drm_managed (2 subtests) =================
[09:43:28] [PASSED] drm_test_managed_release_action
[09:43:28] [PASSED] drm_test_managed_run_action
[09:43:28] =================== [PASSED] drm_managed ===================
[09:43:28] =================== drm_mm (6 subtests) ====================
[09:43:28] [PASSED] drm_test_mm_init
[09:43:28] [PASSED] drm_test_mm_debug
[09:43:28] [PASSED] drm_test_mm_align32
[09:43:28] [PASSED] drm_test_mm_align64
[09:43:28] [PASSED] drm_test_mm_lowest
[09:43:28] [PASSED] drm_test_mm_highest
[09:43:28] ===================== [PASSED] drm_mm ======================
[09:43:28] ============= drm_modes_analog_tv (5 subtests) =============
[09:43:28] [PASSED] drm_test_modes_analog_tv_mono_576i
[09:43:28] [PASSED] drm_test_modes_analog_tv_ntsc_480i
[09:43:28] [PASSED] drm_test_modes_analog_tv_ntsc_480i_inlined
[09:43:28] [PASSED] drm_test_modes_analog_tv_pal_576i
[09:43:28] [PASSED] drm_test_modes_analog_tv_pal_576i_inlined
[09:43:28] =============== [PASSED] drm_modes_analog_tv ===============
[09:43:28] ============== drm_plane_helper (2 subtests) ===============
[09:43:28] =============== drm_test_check_plane_state  ================
[09:43:28] [PASSED] clipping_simple
[09:43:28] [PASSED] clipping_rotate_reflect
[09:43:28] [PASSED] positioning_simple
[09:43:28] [PASSED] upscaling
[09:43:28] [PASSED] downscaling
[09:43:28] [PASSED] rounding1
[09:43:28] [PASSED] rounding2
[09:43:28] [PASSED] rounding3
[09:43:28] [PASSED] rounding4
[09:43:28] =========== [PASSED] drm_test_check_plane_state ============
[09:43:28] =========== drm_test_check_invalid_plane_state  ============
[09:43:28] [PASSED] positioning_invalid
[09:43:28] [PASSED] upscaling_invalid
[09:43:28] [PASSED] downscaling_invalid
[09:43:28] ======= [PASSED] drm_test_check_invalid_plane_state ========
[09:43:28] ================ [PASSED] drm_plane_helper =================
[09:43:28] ====== drm_connector_helper_tv_get_modes (1 subtest) =======
[09:43:28] ====== drm_test_connector_helper_tv_get_modes_check  =======
[09:43:28] [PASSED] None
[09:43:28] [PASSED] PAL
[09:43:28] [PASSED] NTSC
[09:43:28] [PASSED] Both, NTSC Default
[09:43:28] [PASSED] Both, PAL Default
[09:43:28] [PASSED] Both, NTSC Default, with PAL on command-line
[09:43:28] [PASSED] Both, PAL Default, with NTSC on command-line
[09:43:28] == [PASSED] drm_test_connector_helper_tv_get_modes_check ===
[09:43:28] ======== [PASSED] drm_connector_helper_tv_get_modes ========
[09:43:28] ================== drm_rect (9 subtests) ===================
[09:43:28] [PASSED] drm_test_rect_clip_scaled_div_by_zero
[09:43:28] [PASSED] drm_test_rect_clip_scaled_not_clipped
[09:43:28] [PASSED] drm_test_rect_clip_scaled_clipped
[09:43:28] [PASSED] drm_test_rect_clip_scaled_signed_vs_unsigned
[09:43:28] ================= drm_test_rect_intersect  =================
[09:43:28] [PASSED] top-left x bottom-right: 2x2+1+1 x 2x2+0+0
[09:43:28] [PASSED] top-right x bottom-left: 2x2+0+0 x 2x2+1-1
[09:43:28] [PASSED] bottom-left x top-right: 2x2+1-1 x 2x2+0+0
[09:43:28] [PASSED] bottom-right x top-left: 2x2+0+0 x 2x2+1+1
[09:43:28] [PASSED] right x left: 2x1+0+0 x 3x1+1+0
[09:43:28] [PASSED] left x right: 3x1+1+0 x 2x1+0+0
[09:43:28] [PASSED] up x bottom: 1x2+0+0 x 1x3+0-1
[09:43:28] [PASSED] bottom x up: 1x3+0-1 x 1x2+0+0
[09:43:28] [PASSED] touching corner: 1x1+0+0 x 2x2+1+1
[09:43:28] [PASSED] touching side: 1x1+0+0 x 1x1+1+0
[09:43:28] [PASSED] equal rects: 2x2+0+0 x 2x2+0+0
[09:43:28] [PASSED] inside another: 2x2+0+0 x 1x1+1+1
[09:43:28] [PASSED] far away: 1x1+0+0 x 1x1+3+6
[09:43:28] [PASSED] points intersecting: 0x0+5+10 x 0x0+5+10
[09:43:28] [PASSED] points not intersecting: 0x0+0+0 x 0x0+5+10
[09:43:28] ============= [PASSED] drm_test_rect_intersect =============
[09:43:28] ================ drm_test_rect_calc_hscale  ================
[09:43:28] [PASSED] normal use
[09:43:28] [PASSED] out of max range
[09:43:28] [PASSED] out of min range
[09:43:28] [PASSED] zero dst
[09:43:28] [PASSED] negative src
[09:43:28] [PASSED] negative dst
[09:43:28] ============ [PASSED] drm_test_rect_calc_hscale ============
[09:43:28] ================ drm_test_rect_calc_vscale  ================
[09:43:28] [PASSED] normal use
[09:43:28] [PASSED] out of max range
[09:43:28] [PASSED] out of min range
[09:43:28] [PASSED] zero dst
[09:43:28] [PASSED] negative src
[09:43:28] [PASSED] negative dst
[09:43:28] ============ [PASSED] drm_test_rect_calc_vscale ============
[09:43:28] ================== drm_test_rect_rotate  ===================
[09:43:28] [PASSED] reflect-x
[09:43:28] [PASSED] reflect-y
[09:43:28] [PASSED] rotate-0
[09:43:28] [PASSED] rotate-90
[09:43:28] [PASSED] rotate-180
[09:43:28] [PASSED] rotate-270
[09:43:28] ============== [PASSED] drm_test_rect_rotate ===============
[09:43:28] ================ drm_test_rect_rotate_inv  =================
[09:43:28] [PASSED] reflect-x
[09:43:28] [PASSED] reflect-y
[09:43:28] [PASSED] rotate-0
[09:43:28] [PASSED] rotate-90
[09:43:28] [PASSED] rotate-180
[09:43:28] [PASSED] rotate-270
[09:43:28] ============ [PASSED] drm_test_rect_rotate_inv =============
[09:43:28] ==================== [PASSED] drm_rect =====================
[09:43:28] ============ drm_sysfb_modeset_test (1 subtest) ============
[09:43:28] ============ drm_test_sysfb_build_fourcc_list  =============
[09:43:28] [PASSED] no native formats
[09:43:28] [PASSED] XRGB8888 as native format
[09:43:28] [PASSED] remove duplicates
[09:43:28] [PASSED] convert alpha formats
[09:43:28] [PASSED] random formats
[09:43:28] ======== [PASSED] drm_test_sysfb_build_fourcc_list =========
[09:43:28] ============= [PASSED] drm_sysfb_modeset_test ==============
[09:43:28] ================== drm_fixp (2 subtests) ===================
[09:43:28] [PASSED] drm_test_int2fixp
[09:43:28] [PASSED] drm_test_sm2fixp
[09:43:28] ==================== [PASSED] drm_fixp =====================
[09:43:28] ============================================================
[09:43:28] Testing complete. Ran 630 tests: passed: 630
[09:43:28] Elapsed time: 32.646s total, 1.642s configuring, 30.487s building, 0.460s running

+ /kernel/tools/testing/kunit/kunit.py run --kunitconfig /kernel/drivers/gpu/drm/ttm/tests/.kunitconfig
[09:43:29] Configuring KUnit Kernel ...
Regenerating .config ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
[09:43:30] Building KUnit Kernel ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
Building with:
$ make all compile_commands.json scripts_gdb ARCH=um O=.kunit --jobs=25
[09:43:40] Starting KUnit Kernel (1/1)...
[09:43:40] ============================================================
Running tests with:
$ .kunit/linux kunit.enable=1 mem=1G console=tty kunit_shutdown=halt
[09:43:40] ================= ttm_device (5 subtests) ==================
[09:43:40] [PASSED] ttm_device_init_basic
[09:43:40] [PASSED] ttm_device_init_multiple
[09:43:40] [PASSED] ttm_device_fini_basic
[09:43:40] [PASSED] ttm_device_init_no_vma_man
[09:43:40] ================== ttm_device_init_pools  ==================
[09:43:40] [PASSED] No DMA allocations, no DMA32 required
[09:43:40] [PASSED] DMA allocations, DMA32 required
[09:43:40] [PASSED] No DMA allocations, DMA32 required
[09:43:40] [PASSED] DMA allocations, no DMA32 required
[09:43:40] ============== [PASSED] ttm_device_init_pools ==============
[09:43:40] =================== [PASSED] ttm_device ====================
[09:43:40] ================== ttm_pool (8 subtests) ===================
[09:43:40] ================== ttm_pool_alloc_basic  ===================
[09:43:40] [PASSED] One page
[09:43:40] [PASSED] More than one page
[09:43:40] [PASSED] Above the allocation limit
[09:43:40] [PASSED] One page, with coherent DMA mappings enabled
[09:43:40] [PASSED] Above the allocation limit, with coherent DMA mappings enabled
[09:43:40] ============== [PASSED] ttm_pool_alloc_basic ===============
[09:43:40] ============== ttm_pool_alloc_basic_dma_addr  ==============
[09:43:40] [PASSED] One page
[09:43:40] [PASSED] More than one page
[09:43:40] [PASSED] Above the allocation limit
[09:43:40] [PASSED] One page, with coherent DMA mappings enabled
[09:43:40] [PASSED] Above the allocation limit, with coherent DMA mappings enabled
[09:43:40] ========== [PASSED] ttm_pool_alloc_basic_dma_addr ==========
[09:43:40] [PASSED] ttm_pool_alloc_order_caching_match
[09:43:40] [PASSED] ttm_pool_alloc_caching_mismatch
[09:43:40] [PASSED] ttm_pool_alloc_order_mismatch
[09:43:40] [PASSED] ttm_pool_free_dma_alloc
[09:43:40] [PASSED] ttm_pool_free_no_dma_alloc
[09:43:40] [PASSED] ttm_pool_fini_basic
[09:43:40] ==================== [PASSED] ttm_pool =====================
[09:43:40] ================ ttm_resource (8 subtests) =================
[09:43:40] ================= ttm_resource_init_basic  =================
[09:43:40] [PASSED] Init resource in TTM_PL_SYSTEM
[09:43:40] [PASSED] Init resource in TTM_PL_VRAM
[09:43:40] [PASSED] Init resource in a private placement
[09:43:40] [PASSED] Init resource in TTM_PL_SYSTEM, set placement flags
[09:43:40] ============= [PASSED] ttm_resource_init_basic =============
[09:43:40] [PASSED] ttm_resource_init_pinned
[09:43:40] [PASSED] ttm_resource_fini_basic
[09:43:40] [PASSED] ttm_resource_manager_init_basic
[09:43:40] [PASSED] ttm_resource_manager_usage_basic
[09:43:40] [PASSED] ttm_resource_manager_set_used_basic
[09:43:40] [PASSED] ttm_sys_man_alloc_basic
[09:43:40] [PASSED] ttm_sys_man_free_basic
[09:43:40] ================== [PASSED] ttm_resource ===================
[09:43:40] =================== ttm_tt (15 subtests) ===================
[09:43:40] ==================== ttm_tt_init_basic  ====================
[09:43:40] [PASSED] Page-aligned size
[09:43:40] [PASSED] Extra pages requested
[09:43:40] ================ [PASSED] ttm_tt_init_basic ================
[09:43:40] [PASSED] ttm_tt_init_misaligned
[09:43:40] [PASSED] ttm_tt_fini_basic
[09:43:40] [PASSED] ttm_tt_fini_sg
[09:43:40] [PASSED] ttm_tt_fini_shmem
[09:43:40] [PASSED] ttm_tt_create_basic
[09:43:40] [PASSED] ttm_tt_create_invalid_bo_type
[09:43:40] [PASSED] ttm_tt_create_ttm_exists
[09:43:40] [PASSED] ttm_tt_create_failed
[09:43:40] [PASSED] ttm_tt_destroy_basic
[09:43:40] [PASSED] ttm_tt_populate_null_ttm
[09:43:40] [PASSED] ttm_tt_populate_populated_ttm
[09:43:40] [PASSED] ttm_tt_unpopulate_basic
[09:43:40] [PASSED] ttm_tt_unpopulate_empty_ttm
[09:43:40] [PASSED] ttm_tt_swapin_basic
[09:43:40] ===================== [PASSED] ttm_tt ======================
[09:43:40] =================== ttm_bo (14 subtests) ===================
[09:43:40] =========== ttm_bo_reserve_optimistic_no_ticket  ===========
[09:43:40] [PASSED] Cannot be interrupted and sleeps
[09:43:40] [PASSED] Cannot be interrupted, locks straight away
[09:43:40] [PASSED] Can be interrupted, sleeps
[09:43:40] ======= [PASSED] ttm_bo_reserve_optimistic_no_ticket =======
[09:43:40] [PASSED] ttm_bo_reserve_locked_no_sleep
[09:43:40] [PASSED] ttm_bo_reserve_no_wait_ticket
[09:43:40] [PASSED] ttm_bo_reserve_double_resv
[09:43:40] [PASSED] ttm_bo_reserve_interrupted
[09:43:40] [PASSED] ttm_bo_reserve_deadlock
[09:43:40] [PASSED] ttm_bo_unreserve_basic
[09:43:40] [PASSED] ttm_bo_unreserve_pinned
[09:43:40] [PASSED] ttm_bo_unreserve_bulk
[09:43:40] [PASSED] ttm_bo_fini_basic
[09:43:40] [PASSED] ttm_bo_fini_shared_resv
[09:43:40] [PASSED] ttm_bo_pin_basic
[09:43:40] [PASSED] ttm_bo_pin_unpin_resource
[09:43:40] [PASSED] ttm_bo_multiple_pin_one_unpin
[09:43:40] ===================== [PASSED] ttm_bo ======================
[09:43:40] ============== ttm_bo_validate (21 subtests) ===============
[09:43:40] ============== ttm_bo_init_reserved_sys_man  ===============
[09:43:40] [PASSED] Buffer object for userspace
[09:43:40] [PASSED] Kernel buffer object
[09:43:40] [PASSED] Shared buffer object
[09:43:40] ========== [PASSED] ttm_bo_init_reserved_sys_man ===========
[09:43:40] ============== ttm_bo_init_reserved_mock_man  ==============
[09:43:40] [PASSED] Buffer object for userspace
[09:43:40] [PASSED] Kernel buffer object
[09:43:40] [PASSED] Shared buffer object
[09:43:40] ========== [PASSED] ttm_bo_init_reserved_mock_man ==========
[09:43:40] [PASSED] ttm_bo_init_reserved_resv
[09:43:40] ================== ttm_bo_validate_basic  ==================
[09:43:40] [PASSED] Buffer object for userspace
[09:43:40] [PASSED] Kernel buffer object
[09:43:40] [PASSED] Shared buffer object
[09:43:40] ============== [PASSED] ttm_bo_validate_basic ==============
[09:43:40] [PASSED] ttm_bo_validate_invalid_placement
[09:43:40] ============= ttm_bo_validate_same_placement  ==============
[09:43:40] [PASSED] System manager
[09:43:40] [PASSED] VRAM manager
[09:43:40] ========= [PASSED] ttm_bo_validate_same_placement ==========
[09:43:40] [PASSED] ttm_bo_validate_failed_alloc
[09:43:40] [PASSED] ttm_bo_validate_pinned
[09:43:40] [PASSED] ttm_bo_validate_busy_placement
[09:43:40] ================ ttm_bo_validate_multihop  =================
[09:43:40] [PASSED] Buffer object for userspace
[09:43:40] [PASSED] Kernel buffer object
[09:43:40] [PASSED] Shared buffer object
[09:43:40] ============ [PASSED] ttm_bo_validate_multihop =============
[09:43:40] ========== ttm_bo_validate_no_placement_signaled  ==========
[09:43:40] [PASSED] Buffer object in system domain, no page vector
[09:43:40] [PASSED] Buffer object in system domain with an existing page vector
[09:43:40] ====== [PASSED] ttm_bo_validate_no_placement_signaled ======
[09:43:40] ======== ttm_bo_validate_no_placement_not_signaled  ========
[09:43:40] [PASSED] Buffer object for userspace
[09:43:40] [PASSED] Kernel buffer object
[09:43:40] [PASSED] Shared buffer object
[09:43:40] ==== [PASSED] ttm_bo_validate_no_placement_not_signaled ====
[09:43:40] [PASSED] ttm_bo_validate_move_fence_signaled
[09:43:40] ========= ttm_bo_validate_move_fence_not_signaled  =========
[09:43:40] [PASSED] Waits for GPU
[09:43:40] [PASSED] Tries to lock straight away
[09:43:40] ===== [PASSED] ttm_bo_validate_move_fence_not_signaled =====
[09:43:40] [PASSED] ttm_bo_validate_happy_evict
[09:43:40] [PASSED] ttm_bo_validate_all_pinned_evict
[09:43:40] [PASSED] ttm_bo_validate_allowed_only_evict
[09:43:40] [PASSED] ttm_bo_validate_deleted_evict
[09:43:40] [PASSED] ttm_bo_validate_busy_domain_evict
[09:43:40] [PASSED] ttm_bo_validate_evict_gutting
[09:43:40] [PASSED] ttm_bo_validate_recrusive_evict
[09:43:40] ================= [PASSED] ttm_bo_validate =================
[09:43:40] ============================================================
[09:43:40] Testing complete. Ran 101 tests: passed: 101
[09:43:40] Elapsed time: 11.612s total, 1.644s configuring, 9.702s building, 0.239s running

+ cleanup
++ stat -c %u:%g /kernel
+ chown -R 1003:1003 /kernel




* [PATCH 0/8] Introduce Xe Uncorrectable Error Handling
@ 2026-01-22 10:06 Riana Tauro
  2026-01-22  9:42 ` ✗ CI.checkpatch: warning for " Patchwork
From: Riana Tauro @ 2026-01-22 10:06 UTC
  To: intel-xe
  Cc: riana.tauro, anshuman.gupta, rodrigo.vivi, aravind.iddamsetty,
	badal.nilawar, raag.jadav, ravi.kishore.koppuravuri,
	mallesh.koujalagi

This series adds base support for Xe uncorrectable error handling
on top of the System Controller patch [1].

The first four patches implement PCI error recovery callbacks for AER events.
On fatal errors, the device is wedged in error_detected and a Secondary
Bus Reset (SBR) is requested from the PCI core by returning
PCI_ERS_RESULT_NEED_RESET.

On non-fatal errors, the mmio_enabled callback queries the System
Controller for the error type, then contains the error and attempts
the required recovery.
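
For the non-fatal path, the mmio_enabled() step (query, contain, recover) might look roughly like the following. This is again a hypothetical userspace model: the error classes, the model_query_fn callback type and the stub query functions stand in for the System Controller mailbox command that the later patches define. Mapping each class to PCI_ERS_RESULT_RECOVERED or PCI_ERS_RESULT_NEED_RESET is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for the two pci_ers_result_t values used here. */
enum model_ers_result {
	MODEL_ERS_RESULT_NEED_RESET,
	MODEL_ERS_RESULT_RECOVERED,
};

/* Hypothetical error classes a System Controller query might report. */
enum model_err_class {
	MODEL_ERR_NONE,
	MODEL_ERR_CONTAINED,	/* error isolated; device can keep running */
	MODEL_ERR_NEEDS_RESET,	/* error could not be contained */
};

/* Hypothetical query hook standing in for a mailbox round trip. */
typedef int (*model_query_fn)(enum model_err_class *out);

static enum model_ers_result model_mmio_enabled(model_query_fn query)
{
	enum model_err_class cls;

	assert(query != NULL);

	/* If the controller cannot be queried, fall back to a reset. */
	if (query(&cls) != 0)
		return MODEL_ERS_RESULT_NEED_RESET;

	switch (cls) {
	case MODEL_ERR_NONE:
	case MODEL_ERR_CONTAINED:
		return MODEL_ERS_RESULT_RECOVERED;
	case MODEL_ERR_NEEDS_RESET:
	default:
		return MODEL_ERS_RESULT_NEED_RESET;
	}
}

/* Example query stubs used to exercise the model. */
static int query_contained(enum model_err_class *out)
{
	*out = MODEL_ERR_CONTAINED;
	return 0;
}

static int query_uncontained(enum model_err_class *out)
{
	*out = MODEL_ERR_NEEDS_RESET;
	return 0;
}

static int query_timeout(enum model_err_class *out)
{
	(void)out;
	return -1;	/* e.g. the mailbox stayed busy */
}
```

Falling back to a reset when the query itself fails keeps the model conservative: an error the driver cannot classify is treated as uncontained.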

The remaining patches add base support for handling uncorrectable
Core-Compute errors.

This series lays the foundation; it will be extended to other error
types and commands.

[1] https://patchwork.freedesktop.org/series/159554/

Anoop Vijay (1):
  drm/xe/xe_sysctrl: Add System controller patch

Riana Tauro (7):
  drm/xe/xe_pci_error: Implement PCI error recovery callbacks
  drm/xe/xe_pci_error: Group all devres to release them on PCIe slot
    reset
  drm/xe: Skip device access during PCI error recovery
  drm/xe/xe_ras: Initialize Uncorrectable AER Registers
  drm/xe/xe_ras: Add structures and commands for Uncorrectable Core
    Compute Errors
  drm/xe/xe_ras: Add support for Uncorrectable Core-Compute errors
  drm/xe/xe_pci_error: Process errors in mmio_enabled

 drivers/gpu/drm/xe/Makefile                   |   4 +
 drivers/gpu/drm/xe/regs/xe_sysctrl_regs.h     |  44 ++
 drivers/gpu/drm/xe/xe_device.c                |  17 +-
 drivers/gpu/drm/xe/xe_device.h                |  15 +
 drivers/gpu/drm/xe/xe_device_types.h          |  12 +
 drivers/gpu/drm/xe/xe_gt.c                    |   9 +-
 drivers/gpu/drm/xe/xe_guc_submit.c            |   8 +-
 drivers/gpu/drm/xe/xe_pci.c                   |   5 +
 drivers/gpu/drm/xe/xe_pci_error.c             |  93 ++++
 drivers/gpu/drm/xe/xe_pci_types.h             |   1 +
 drivers/gpu/drm/xe/xe_ras.c                   | 229 +++++++++
 drivers/gpu/drm/xe/xe_ras.h                   |  16 +
 drivers/gpu/drm/xe/xe_ras_types.h             | 131 ++++++
 drivers/gpu/drm/xe/xe_sysctrl.c               |  80 ++++
 drivers/gpu/drm/xe/xe_sysctrl.h               |  13 +
 drivers/gpu/drm/xe/xe_sysctrl_mailbox.c       | 445 ++++++++++++++++++
 drivers/gpu/drm/xe/xe_sysctrl_mailbox.h       |  35 ++
 drivers/gpu/drm/xe/xe_sysctrl_mailbox_types.h |  49 ++
 drivers/gpu/drm/xe/xe_sysctrl_types.h         |  33 ++
 19 files changed, 1232 insertions(+), 7 deletions(-)
 create mode 100644 drivers/gpu/drm/xe/regs/xe_sysctrl_regs.h
 create mode 100644 drivers/gpu/drm/xe/xe_pci_error.c
 create mode 100644 drivers/gpu/drm/xe/xe_ras.c
 create mode 100644 drivers/gpu/drm/xe/xe_ras.h
 create mode 100644 drivers/gpu/drm/xe/xe_ras_types.h
 create mode 100644 drivers/gpu/drm/xe/xe_sysctrl.c
 create mode 100644 drivers/gpu/drm/xe/xe_sysctrl.h
 create mode 100644 drivers/gpu/drm/xe/xe_sysctrl_mailbox.c
 create mode 100644 drivers/gpu/drm/xe/xe_sysctrl_mailbox.h
 create mode 100644 drivers/gpu/drm/xe/xe_sysctrl_mailbox_types.h
 create mode 100644 drivers/gpu/drm/xe/xe_sysctrl_types.h

-- 
2.47.1



* [PATCH 1/8] drm/xe/xe_sysctrl: Add System controller patch
  2026-01-22 10:06 [PATCH 0/8] Introduce Xe Uncorrectable Error Handling Riana Tauro
  2026-01-22  9:42 ` ✗ CI.checkpatch: warning for " Patchwork
  2026-01-22  9:43 ` ✓ CI.KUnit: success " Patchwork
@ 2026-01-22 10:06 ` Riana Tauro
  2026-01-22 10:06 ` [PATCH 2/8] drm/xe/xe_pci_error: Implement PCI error recovery callbacks Riana Tauro
From: Riana Tauro @ 2026-01-22 10:06 UTC
  To: intel-xe
  Cc: riana.tauro, anshuman.gupta, rodrigo.vivi, aravind.iddamsetty,
	badal.nilawar, raag.jadav, ravi.kishore.koppuravuri,
	mallesh.koujalagi, Anoop Vijay

From: Anoop Vijay <anoop.c.vijay@intel.com>

DO NOT REVIEW. COMPILATION ONLY.
This patch is taken from https://patchwork.freedesktop.org/series/159554/
and is included only so that the series compiles.

Signed-off-by: Riana Tauro <riana.tauro@intel.com>
---
 drivers/gpu/drm/xe/Makefile                   |   2 +
 drivers/gpu/drm/xe/regs/xe_sysctrl_regs.h     |  44 ++
 drivers/gpu/drm/xe/xe_device.c                |   5 +
 drivers/gpu/drm/xe/xe_device_types.h          |   6 +
 drivers/gpu/drm/xe/xe_pci.c                   |   2 +
 drivers/gpu/drm/xe/xe_pci_types.h             |   1 +
 drivers/gpu/drm/xe/xe_sysctrl.c               |  80 ++++
 drivers/gpu/drm/xe/xe_sysctrl.h               |  13 +
 drivers/gpu/drm/xe/xe_sysctrl_mailbox.c       | 445 ++++++++++++++++++
 drivers/gpu/drm/xe/xe_sysctrl_mailbox.h       |  35 ++
 drivers/gpu/drm/xe/xe_sysctrl_mailbox_types.h |  36 ++
 drivers/gpu/drm/xe/xe_sysctrl_types.h         |  33 ++
 12 files changed, 702 insertions(+)
 create mode 100644 drivers/gpu/drm/xe/regs/xe_sysctrl_regs.h
 create mode 100644 drivers/gpu/drm/xe/xe_sysctrl.c
 create mode 100644 drivers/gpu/drm/xe/xe_sysctrl.h
 create mode 100644 drivers/gpu/drm/xe/xe_sysctrl_mailbox.c
 create mode 100644 drivers/gpu/drm/xe/xe_sysctrl_mailbox.h
 create mode 100644 drivers/gpu/drm/xe/xe_sysctrl_mailbox_types.h
 create mode 100644 drivers/gpu/drm/xe/xe_sysctrl_types.h

diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
index b39cbb756232..f6650ec3ab42 100644
--- a/drivers/gpu/drm/xe/Makefile
+++ b/drivers/gpu/drm/xe/Makefile
@@ -121,6 +121,8 @@ xe-y += xe_bb.o \
 	xe_step.o \
 	xe_survivability_mode.o \
 	xe_sync.o \
+	xe_sysctrl.o \
+	xe_sysctrl_mailbox.o \
 	xe_tile.o \
 	xe_tile_sysfs.o \
 	xe_tlb_inval.o \
diff --git a/drivers/gpu/drm/xe/regs/xe_sysctrl_regs.h b/drivers/gpu/drm/xe/regs/xe_sysctrl_regs.h
new file mode 100644
index 000000000000..5875890d19dc
--- /dev/null
+++ b/drivers/gpu/drm/xe/regs/xe_sysctrl_regs.h
@@ -0,0 +1,44 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2026 Intel Corporation
+ */
+
+#ifndef _XE_SYSCTRL_REGS_H_
+#define _XE_SYSCTRL_REGS_H_
+
+#include "xe_regs.h"
+
+#define SYSCTRL_BASE_OFFSET			0xdb000
+#define SYSCTRL_BASE				(SOC_BASE + SYSCTRL_BASE_OFFSET)
+#define SYSCTRL_MAILBOX_INDEX			0x03
+#define SYSCTRL_BAR_LENGTH			0x1000
+
+#define SYSCTRL_MB_CTRL				XE_REG(0x10)
+#define   SYSCTRL_MB_CTRL_RUN_BUSY		REG_BIT(31)
+#define   SYSCTRL_MB_CTRL_IRQ			REG_BIT(30)
+#define   SYSCTRL_MB_CTRL_RUN_BUSY_OUT		REG_BIT(29)
+#define   SYSCTRL_MB_CTRL_PARAM3_MASK		REG_GENMASK(28, 24)
+#define   SYSCTRL_MB_CTRL_PARAM2_MASK		REG_GENMASK(23, 16)
+#define   SYSCTRL_MB_CTRL_PARAM1_MASK		REG_GENMASK(15, 8)
+#define   SYSCTRL_MB_CTRL_COMMAND_MASK		REG_GENMASK(7, 0)
+
+#define SYSCTRL_MB_DATA0			XE_REG(0x14)
+#define SYSCTRL_MB_DATA1			XE_REG(0x18)
+#define SYSCTRL_MB_DATA2			XE_REG(0x1C)
+#define SYSCTRL_MB_DATA3			XE_REG(0x20)
+
+#define MKHI_FRAME_PHASE			REG_BIT(24)
+#define MKHI_FRAME_CURRENT_MASK			REG_GENMASK(21, 16)
+#define MKHI_FRAME_TOTAL_MASK			REG_GENMASK(13, 8)
+#define MKHI_FRAME_COMMAND_MASK			REG_GENMASK(7, 0)
+
+#define SYSCTRL_MB_FRAME_SIZE			16
+#define SYSCTRL_MB_MAX_FRAMES			64
+#define SYSCTRL_MB_MAX_MESSAGE_SIZE		(SYSCTRL_MB_FRAME_SIZE * SYSCTRL_MB_MAX_FRAMES)
+#define SYSCTRL_MKHI_COMMAND			5
+
+#define SYSCTRL_MB_DEFAULT_TIMEOUT_MS		500
+#define SYSCTRL_MB_RETRY_TIMEOUT_MS		20
+#define SYSCTRL_MB_POLL_INTERVAL_US		100
+
+#endif /* _XE_SYSCTRL_REGS_H_ */
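
The SYSCTRL_MB_CTRL layout above packs a run/busy handshake bit, three parameter fields and an 8-bit command into one 32-bit register. As a hedged illustration only, the userspace C sketch below re-creates REG_BIT()/REG_GENMASK() and FIELD_PREP()/FIELD_GET() semantics with plain shifts and composes a control word from example values. The helper names BIT32, GENMASK32, field_prep, field_get and mb_ctrl_encode are hypothetical, not driver API, and which command and parameter values the mailbox code actually writes (and whether RUN_BUSY is set in the same write) is defined in xe_sysctrl_mailbox.c, not assumed here.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's REG_BIT()/REG_GENMASK() (32-bit). */
#define BIT32(n)		((uint32_t)1u << (n))
#define GENMASK32(h, l)		((uint32_t)((~0u >> (31 - (h))) & (~0u << (l))))

/* Same bit positions as the SYSCTRL_MB_CTRL_* definitions above. */
#define MB_CTRL_RUN_BUSY	BIT32(31)
#define MB_CTRL_PARAM3_MASK	GENMASK32(28, 24)
#define MB_CTRL_PARAM2_MASK	GENMASK32(23, 16)
#define MB_CTRL_PARAM1_MASK	GENMASK32(15, 8)
#define MB_CTRL_COMMAND_MASK	GENMASK32(7, 0)

/* Like FIELD_PREP(): shift @val into the field described by @mask.
 * __builtin_ctz() is a GCC/Clang builtin; @mask must be non-zero. */
static uint32_t field_prep(uint32_t mask, uint32_t val)
{
	unsigned int shift = (unsigned int)__builtin_ctz(mask);

	/* FIELD_PREP() rejects oversized constants at build time; check at run time. */
	assert((val & ~(mask >> shift)) == 0);
	return (val << shift) & mask;
}

/* Like FIELD_GET(): extract the field described by @mask from @reg. */
static uint32_t field_get(uint32_t mask, uint32_t reg)
{
	return (reg & mask) >> __builtin_ctz(mask);
}

/* Hypothetical helper: build a control word with RUN_BUSY set. */
static uint32_t mb_ctrl_encode(uint8_t cmd, uint8_t p1, uint8_t p2, uint8_t p3)
{
	return MB_CTRL_RUN_BUSY |
	       field_prep(MB_CTRL_PARAM3_MASK, p3) |
	       field_prep(MB_CTRL_PARAM2_MASK, p2) |
	       field_prep(MB_CTRL_PARAM1_MASK, p1) |
	       field_prep(MB_CTRL_COMMAND_MASK, cmd);
}
```

Note that PARAM3 is only 5 bits wide (28:24), so a value above 31 trips the assertion instead of silently spilling into bit 29 (RUN_BUSY_OUT).
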
diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index aad4aa53a51f..16fc6da01357 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -64,6 +64,7 @@
 #include "xe_survivability_mode.h"
 #include "xe_sriov.h"
 #include "xe_svm.h"
+#include "xe_sysctrl.h"
 #include "xe_tile.h"
 #include "xe_ttm_stolen_mgr.h"
 #include "xe_ttm_sys_mgr.h"
@@ -988,6 +989,10 @@ int xe_device_probe(struct xe_device *xe)
 	if (err)
 		goto err_unregister_display;
 
+	err = xe_sysctrl_init(xe);
+	if (err)
+		goto err_unregister_display;
+
 	err = xe_device_sysfs_init(xe);
 	if (err)
 		goto err_unregister_display;
diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
index 34feef79fa4e..944f909a86ad 100644
--- a/drivers/gpu/drm/xe/xe_device_types.h
+++ b/drivers/gpu/drm/xe/xe_device_types.h
@@ -29,6 +29,7 @@
 #include "xe_sriov_vf_ccs_types.h"
 #include "xe_step_types.h"
 #include "xe_survivability_mode_types.h"
+#include "xe_sysctrl_types.h"
 #include "xe_tile_sriov_vf_types.h"
 #include "xe_validation.h"
 
@@ -370,6 +371,8 @@ struct xe_device {
 		u8 has_soc_remapper_telem:1;
 		/** @info.has_sriov: Supports SR-IOV */
 		u8 has_sriov:1;
+		/** @info.has_sysctrl: Supports System Controller */
+		u8 has_sysctrl:1;
 		/** @info.has_usm: Device has unified shared memory support */
 		u8 has_usm:1;
 		/** @info.has_64bit_timestamp: Device supports 64-bit timestamps */
@@ -636,6 +639,9 @@ struct xe_device {
 	/** @heci_gsc: graphics security controller */
 	struct xe_heci_gsc heci_gsc;
 
+	/** @sc: System Controller */
+	struct xe_sysctrl sc;
+
 	/** @nvm: discrete graphics non-volatile memory */
 	struct intel_dg_nvm_dev *nvm;
 
diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c
index 34df063024fe..c92cc176f669 100644
--- a/drivers/gpu/drm/xe/xe_pci.c
+++ b/drivers/gpu/drm/xe/xe_pci.c
@@ -430,6 +430,7 @@ static const struct xe_device_desc cri_desc = {
 	.has_soc_remapper_sysctrl = true,
 	.has_soc_remapper_telem = true,
 	.has_sriov = true,
+	.has_sysctrl = true,
 	.max_gt_per_tile = 2,
 	.require_force_probe = true,
 	.va_bits = 57,
@@ -706,6 +707,7 @@ static int xe_info_init_early(struct xe_device *xe,
 	xe->info.has_soc_remapper_telem = desc->has_soc_remapper_telem;
 	xe->info.has_sriov = xe_configfs_primary_gt_allowed(to_pci_dev(xe->drm.dev)) &&
 		desc->has_sriov;
+	xe->info.has_sysctrl = desc->has_sysctrl;
 	xe->info.has_mem_copy_instr = desc->has_mem_copy_instr;
 	xe->info.skip_guc_pc = desc->skip_guc_pc;
 	xe->info.skip_mtcfg = desc->skip_mtcfg;
diff --git a/drivers/gpu/drm/xe/xe_pci_types.h b/drivers/gpu/drm/xe/xe_pci_types.h
index 7ccb0ab7a53b..3d943c341a14 100644
--- a/drivers/gpu/drm/xe/xe_pci_types.h
+++ b/drivers/gpu/drm/xe/xe_pci_types.h
@@ -57,6 +57,7 @@ struct xe_device_desc {
 	u8 has_soc_remapper_sysctrl:1;
 	u8 has_soc_remapper_telem:1;
 	u8 has_sriov:1;
+	u8 has_sysctrl:1;
 	u8 needs_scratch:1;
 	u8 skip_guc_pc:1;
 	u8 skip_mtcfg:1;
diff --git a/drivers/gpu/drm/xe/xe_sysctrl.c b/drivers/gpu/drm/xe/xe_sysctrl.c
new file mode 100644
index 000000000000..430bccbdc3b9
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_sysctrl.c
@@ -0,0 +1,80 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2026 Intel Corporation
+ */
+
+#include <drm/drm_managed.h>
+#include <linux/device.h>
+#include <linux/mutex.h>
+
+#include "regs/xe_sysctrl_regs.h"
+#include "xe_device.h"
+#include "xe_mmio.h"
+#include "xe_printk.h"
+#include "xe_soc_remapper.h"
+#include "xe_sysctrl.h"
+#include "xe_sysctrl_mailbox.h"
+#include "xe_sysctrl_types.h"
+
+/**
+ * DOC: System Controller (sysctrl)
+ *
+ * The System Controller (sysctrl) is an embedded microcontroller in Intel GPUs
+ * responsible for managing various low-level platform functions. Communication
+ * between the driver and the System Controller occurs via a mailbox interface,
+ * enabling the exchange of commands and responses.
+ *
+ * This module provides initialization routines and helper functions to interact
+ * with the System Controller through the mailbox.
+ */
+
+static void xe_sysctrl_fini(void *arg)
+{
+	struct xe_device *xe = arg;
+
+	xe->soc_remapper.set_sysctrl_region(xe, 0);
+}
+
+/**
+ * xe_sysctrl_init - Initialize System Controller subsystem
+ * @xe: xe device instance
+ *
+ * Entry point for System Controller initialization, called from xe_device_probe.
+ * This function checks platform support and initializes the system controller.
+ *
+ * Return: 0 on success, error code on failure
+ */
+int xe_sysctrl_init(struct xe_device *xe)
+{
+	struct xe_tile *tile = xe_device_get_root_tile(xe);
+	struct xe_sysctrl *sc = &xe->sc;
+	int ret;
+
+	if (!xe->info.has_sysctrl)
+		return 0;
+
+	if (!xe->soc_remapper.set_sysctrl_region)
+		return -ENODEV;
+
+	xe->soc_remapper.set_sysctrl_region(xe, SYSCTRL_MAILBOX_INDEX);
+
+	ret = devm_add_action_or_reset(xe->drm.dev, xe_sysctrl_fini, xe);
+	if (ret)
+		return ret;
+
+	sc->mmio = devm_kzalloc(xe->drm.dev, sizeof(*sc->mmio), GFP_KERNEL);
+	if (!sc->mmio)
+		return -ENOMEM;
+
+	xe_mmio_init(sc->mmio, tile, tile->mmio.regs, tile->mmio.regs_size);
+	sc->mmio->adj_offset = SYSCTRL_BASE;
+	sc->mmio->adj_limit = U32_MAX;
+
+	ret = drmm_mutex_init(&xe->drm, &sc->cmd_lock);
+	if (ret)
+		return ret;
+
+	xe_sysctrl_mailbox_init(sc);
+
+	return 0;
+}
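
xe_sysctrl_init() selects the sysctrl region with set_sysctrl_region(xe, SYSCTRL_MAILBOX_INDEX) and only then registers xe_sysctrl_fini() with devm_add_action_or_reset(). Per that helper's documented contract, if the action cannot be registered it is run immediately, so the region is cleared before the error is returned. Once registered, it runs at device release in reverse order of registration. The sketch below is a hypothetical userspace model of that contract; model_devres, model_add_action_or_reset, model_release_all and the other names are illustrative, not the kernel implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_ACTIONS 8

/* Minimal model of a devres-style cleanup list (NOT the kernel API). */
struct model_devres {
	void (*fn[MAX_ACTIONS])(void *);
	void *arg[MAX_ACTIONS];
	size_t count;
};

/* Mirrors the devm_add_action_or_reset() contract: if the action cannot
 * be registered, run it right away and return the error. */
static int model_add_action_or_reset(struct model_devres *dr,
				     void (*fn)(void *), void *arg)
{
	assert(dr != NULL && fn != NULL);

	if (dr->count == MAX_ACTIONS) {
		fn(arg);
		return -ENOMEM;
	}
	dr->fn[dr->count] = fn;
	dr->arg[dr->count] = arg;
	dr->count++;
	return 0;
}

/* Run registered actions in reverse order of registration (LIFO). */
static void model_release_all(struct model_devres *dr)
{
	while (dr->count > 0) {
		dr->count--;
		dr->fn[dr->count](dr->arg[dr->count]);
	}
}

/* Stand-in for the device: only the selected sysctrl region is tracked. */
struct model_xe {
	int sysctrl_region;
};

/* Same effect as xe_sysctrl_fini(): deselect the region (index 0). */
static void model_sysctrl_fini(void *arg)
{
	struct model_xe *xe = arg;

	xe->sysctrl_region = 0;
}
```
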
diff --git a/drivers/gpu/drm/xe/xe_sysctrl.h b/drivers/gpu/drm/xe/xe_sysctrl.h
new file mode 100644
index 000000000000..ee7826fe4c98
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_sysctrl.h
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2026 Intel Corporation
+ */
+
+#ifndef _XE_SYSCTRL_H_
+#define _XE_SYSCTRL_H_
+
+struct xe_device;
+
+int xe_sysctrl_init(struct xe_device *xe);
+
+#endif /* _XE_SYSCTRL_H_ */
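
The mailbox code that follows wraps each request in an MKHI header (group id in bits 7:0, a 7-bit command in bits 14:8, an is-response flag in bit 15, reserved bits 23:16 and a result code in bits 31:24, stored little-endian) and sends header plus payload in 16-byte frames, at most 64 of them. Assuming the header struct holds just that single 32-bit data word, the hypothetical userspace helpers below encode and decode the header, serialize it little-endian, and compute the frame count the same way xe_sysctrl_mailbox_prepare_command() bounds the payload and xe_sysctrl_mailbox_send_frames() rounds up with DIV_ROUND_UP().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors SYSCTRL_MB_FRAME_SIZE / SYSCTRL_MB_MAX_FRAMES. */
#define MB_FRAME_SIZE		16u
#define MB_MAX_FRAMES		64u
#define MB_MAX_MESSAGE_SIZE	(MB_FRAME_SIZE * MB_MAX_FRAMES)
/* Assumption: the MKHI header is one 32-bit little-endian word. */
#define MKHI_HDR_SIZE		4u

/* Pack group id (bits 7:0) and a 7-bit command (bits 14:8); the
 * is-response bit, reserved bits and result byte are left as zero. */
static uint32_t mkhi_hdr_encode(uint8_t group_id, uint8_t command)
{
	return (uint32_t)group_id | ((uint32_t)(command & 0x7F) << 8);
}

static int mkhi_hdr_is_response(uint32_t hdr)
{
	return (int)((hdr >> 15) & 1u);
}

static uint8_t mkhi_hdr_result(uint32_t hdr)
{
	return (uint8_t)(hdr >> 24);
}

/* Serialize least-significant byte first (the cpu_to_le32() layout). */
static void mkhi_hdr_to_le_bytes(uint32_t hdr, uint8_t out[4])
{
	assert(out != NULL);
	for (int i = 0; i < 4; i++)
		out[i] = (uint8_t)(hdr >> (8 * i));
}

/* Frames needed for header + payload, or 0 if the message is too large. */
static size_t mkhi_frames_needed(size_t payload_len)
{
	if (payload_len > MB_MAX_MESSAGE_SIZE - MKHI_HDR_SIZE)
		return 0;
	return (MKHI_HDR_SIZE + payload_len + MB_FRAME_SIZE - 1) / MB_FRAME_SIZE;
}
```

Under these assumptions the largest accepted payload is 1020 bytes, which together with the header fills exactly 64 frames.
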
diff --git a/drivers/gpu/drm/xe/xe_sysctrl_mailbox.c b/drivers/gpu/drm/xe/xe_sysctrl_mailbox.c
new file mode 100644
index 000000000000..162208f6018b
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_sysctrl_mailbox.c
@@ -0,0 +1,445 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2026 Intel Corporation
+ */
+
+#include <linux/bitfield.h>
+#include <linux/container_of.h>
+#include <linux/errno.h>
+#include <linux/minmax.h>
+#include <linux/mutex.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+#include <linux/types.h>
+
+#include "regs/xe_sysctrl_regs.h"
+#include "xe_device.h"
+#include "xe_device_types.h"
+#include "xe_mmio.h"
+#include "xe_pm.h"
+#include "xe_printk.h"
+#include "xe_sysctrl.h"
+#include "xe_sysctrl_mailbox.h"
+#include "xe_sysctrl_mailbox_types.h"
+#include "xe_sysctrl_types.h"
+
+#define MKHI_HDR_GROUP_ID_MASK		GENMASK(7, 0)
+#define MKHI_HDR_COMMAND_MASK		GENMASK(14, 8)
+#define MKHI_HDR_IS_RESPONSE		BIT(15)
+#define MKHI_HDR_RESERVED_MASK		GENMASK(23, 16)
+#define MKHI_HDR_RESULT_MASK		GENMASK(31, 24)
+
+#define XE_SYSCTRL_MKHI_HDR_GROUP_ID(hdr) \
+	FIELD_GET(MKHI_HDR_GROUP_ID_MASK, le32_to_cpu((hdr)->data))
+
+#define XE_SYSCTRL_MKHI_HDR_COMMAND(hdr) \
+	FIELD_GET(MKHI_HDR_COMMAND_MASK, le32_to_cpu((hdr)->data))
+
+#define XE_SYSCTRL_MKHI_HDR_IS_RESPONSE(hdr) \
+	FIELD_GET(MKHI_HDR_IS_RESPONSE, le32_to_cpu((hdr)->data))
+
+#define XE_SYSCTRL_MKHI_HDR_RESULT(hdr) \
+	FIELD_GET(MKHI_HDR_RESULT_MASK, le32_to_cpu((hdr)->data))
+
+static struct xe_device *sc_to_xe(struct xe_sysctrl *sc)
+{
+	return container_of(sc, struct xe_device, sc);
+}
+
+static bool xe_sysctrl_mailbox_wait_bit_clear(struct xe_sysctrl *sc, u32 bit_mask,
+					      unsigned int timeout_ms)
+{
+	int ret;
+
+	ret = xe_mmio_wait32_not(sc->mmio, SYSCTRL_MB_CTRL, bit_mask, bit_mask,
+				 timeout_ms * 1000, NULL, false);
+
+	return ret == 0;
+}
+
+static bool xe_sysctrl_mailbox_wait_bit_set(struct xe_sysctrl *sc, u32 bit_mask,
+					    unsigned int timeout_ms)
+{
+	int ret;
+
+	ret = xe_mmio_wait32(sc->mmio, SYSCTRL_MB_CTRL, bit_mask, bit_mask,
+			     timeout_ms * 1000, NULL, false);
+
+	return ret == 0;
+}
+
+static int xe_sysctrl_mailbox_write_frame(struct xe_sysctrl *sc, const void *frame,
+					  size_t len)
+{
+	static const struct xe_reg regs[] = {
+		SYSCTRL_MB_DATA0, SYSCTRL_MB_DATA1, SYSCTRL_MB_DATA2, SYSCTRL_MB_DATA3
+	};
+	u32 val[SYSCTRL_MB_FRAME_SIZE / sizeof(u32)] = {0};
+	u32 dw = DIV_ROUND_UP(len, sizeof(u32));
+	u32 i;
+
+	memcpy(val, frame, len);
+
+	for (i = 0; i < dw; i++)
+		xe_mmio_write32(sc->mmio, regs[i], val[i]);
+
+	return 0;
+}
+
+static int xe_sysctrl_mailbox_read_frame(struct xe_sysctrl *sc, void *frame,
+					 size_t len)
+{
+	static const struct xe_reg regs[] = {
+		SYSCTRL_MB_DATA0, SYSCTRL_MB_DATA1, SYSCTRL_MB_DATA2, SYSCTRL_MB_DATA3
+	};
+	u32 val[SYSCTRL_MB_FRAME_SIZE / sizeof(u32)] = {0};
+	u32 dw = DIV_ROUND_UP(len, sizeof(u32));
+	u32 i;
+
+	for (i = 0; i < dw; i++)
+		val[i] = xe_mmio_read32(sc->mmio, regs[i]);
+
+	memcpy(frame, val, len);
+
+	return 0;
+}
+
+static void xe_sysctrl_mailbox_clear_response(struct xe_sysctrl *sc)
+{
+	xe_mmio_rmw32(sc->mmio, SYSCTRL_MB_CTRL, SYSCTRL_MB_CTRL_RUN_BUSY_OUT, 0);
+}
+
+static int xe_sysctrl_mailbox_prepare_command(struct xe_device *xe,
+					      u8 group_id, u8 command,
+					      const void *data_in, size_t data_in_len,
+					      u8 **mbox_cmd, size_t *cmd_size)
+{
+	struct xe_sysctrl_mailbox_mkhi_msg_hdr *mkhi_hdr;
+	size_t size;
+	u8 *buffer;
+
+	if (data_in_len > SYSCTRL_MB_MAX_MESSAGE_SIZE - sizeof(*mkhi_hdr)) {
+		xe_err(xe, "sysctrl: Input data too large: %zu bytes\n", data_in_len);
+		return -EINVAL;
+	}
+
+	size = sizeof(*mkhi_hdr) + data_in_len;
+
+	buffer = kmalloc(size, GFP_KERNEL);
+	if (!buffer)
+		return -ENOMEM;
+
+	mkhi_hdr = (struct xe_sysctrl_mailbox_mkhi_msg_hdr *)buffer;
+	mkhi_hdr->data = cpu_to_le32(FIELD_PREP(MKHI_HDR_GROUP_ID_MASK, group_id) |
+				     FIELD_PREP(MKHI_HDR_COMMAND_MASK, command & 0x7F) |
+				     FIELD_PREP(MKHI_HDR_IS_RESPONSE, 0) |
+				     FIELD_PREP(MKHI_HDR_RESERVED_MASK, 0) |
+				     FIELD_PREP(MKHI_HDR_RESULT_MASK, 0));
+
+	if (data_in && data_in_len)
+		memcpy(buffer + sizeof(*mkhi_hdr), data_in, data_in_len);
+
+	*mbox_cmd = buffer;
+	*cmd_size = size;
+
+	return 0;
+}
+
+static int xe_sysctrl_mailbox_send_frames(struct xe_sysctrl *sc,
+					  const u8 *mbox_cmd,
+					  size_t cmd_size, unsigned int timeout_ms)
+{
+	struct xe_device *xe = sc_to_xe(sc);
+	u32 ctrl_reg, total_frames, frame;
+	size_t bytes_sent, frame_size;
+
+	total_frames = DIV_ROUND_UP(cmd_size, SYSCTRL_MB_FRAME_SIZE);
+
+	if (!xe_sysctrl_mailbox_wait_bit_clear(sc, SYSCTRL_MB_CTRL_RUN_BUSY, timeout_ms)) {
+		xe_err(xe, "sysctrl: Mailbox busy\n");
+		return -EBUSY;
+	}
+
+	sc->phase_bit ^= 1;
+	bytes_sent = 0;
+
+	for (frame = 0; frame < total_frames; frame++) {
+		frame_size = min(cmd_size - bytes_sent, (size_t)SYSCTRL_MB_FRAME_SIZE);
+
+		if (xe_sysctrl_mailbox_write_frame(sc, mbox_cmd + bytes_sent, frame_size)) {
+			xe_err(xe, "sysctrl: Failed to write frame %u\n", frame);
+			sc->phase_bit ^= 1;
+			return -EIO;
+		}
+
+		ctrl_reg = SYSCTRL_MB_CTRL_RUN_BUSY |
+			   FIELD_PREP(MKHI_FRAME_CURRENT_MASK, frame) |
+			   FIELD_PREP(MKHI_FRAME_TOTAL_MASK, total_frames - 1) |
+			   FIELD_PREP(MKHI_FRAME_COMMAND_MASK, SYSCTRL_MKHI_COMMAND) |
+			   (sc->phase_bit ? MKHI_FRAME_PHASE : 0);
+
+		xe_mmio_write32(sc->mmio, SYSCTRL_MB_CTRL, ctrl_reg);
+
+		if (!xe_sysctrl_mailbox_wait_bit_clear(sc, SYSCTRL_MB_CTRL_RUN_BUSY, timeout_ms)) {
+			xe_err(xe, "sysctrl: Frame %u acknowledgment timeout\n", frame);
+			sc->phase_bit ^= 1;
+			return -ETIMEDOUT;
+		}
+
+		bytes_sent += frame_size;
+	}
+
+	return 0;
+}
+
+static int xe_sysctrl_mailbox_process_first_frame(struct xe_sysctrl *sc,
+						  const struct xe_sysctrl_mailbox_mkhi_msg_hdr *req,
+						  void *out,
+						  size_t out_size,
+						  size_t frame_size,
+						  size_t *payload_bytes)
+{
+	struct xe_device *xe = sc_to_xe(sc);
+	struct xe_sysctrl_mailbox_mkhi_msg_hdr *resp_hdr;
+	u32 frame_data[SYSCTRL_MB_FRAME_SIZE / sizeof(u32)];
+	size_t hdr_size = sizeof(*resp_hdr);
+	size_t payload_size;
+	int ret;
+
+	if (frame_size < hdr_size) {
+		xe_err(xe, "sysctrl: Frame size %zu too small\n", frame_size);
+		return -EPROTO;
+	}
+
+	ret = xe_sysctrl_mailbox_read_frame(sc, frame_data, frame_size);
+	if (ret)
+		return ret;
+
+	resp_hdr = (struct xe_sysctrl_mailbox_mkhi_msg_hdr *)frame_data;
+
+	if (!XE_SYSCTRL_MKHI_HDR_IS_RESPONSE(resp_hdr) ||
+	    XE_SYSCTRL_MKHI_HDR_GROUP_ID(resp_hdr) != XE_SYSCTRL_MKHI_HDR_GROUP_ID(req) ||
+	    XE_SYSCTRL_MKHI_HDR_COMMAND(resp_hdr) != XE_SYSCTRL_MKHI_HDR_COMMAND(req)) {
+		xe_err(xe, "sysctrl: Response header mismatch\n");
+		return -EPROTO;
+	}
+
+	if (XE_SYSCTRL_MKHI_HDR_RESULT(resp_hdr) != 0) {
+		xe_err(xe, "sysctrl: Firmware error: 0x%02lx\n",
+		       XE_SYSCTRL_MKHI_HDR_RESULT(resp_hdr));
+		return -EIO;
+	}
+
+	payload_size = frame_size - hdr_size;
+
+	if (payload_size > out_size) {
+		xe_err(xe, "sysctrl: Payload %zu bytes exceeds buffer %zu bytes\n",
+		       payload_size, out_size);
+		return -ENOSPC;
+	}
+
+	if (payload_size > 0)
+		memcpy(out, (u8 *)frame_data + hdr_size, payload_size);
+
+	*payload_bytes = payload_size;
+
+	xe_sysctrl_mailbox_clear_response(sc);
+
+	return 0;
+}
+
+static int xe_sysctrl_mailbox_process_frame(struct xe_sysctrl *sc,
+					    void *out, size_t frame_size,
+					    unsigned int timeout_ms)
+{
+	struct xe_device *xe = sc_to_xe(sc);
+	int ret;
+
+	if (!xe_sysctrl_mailbox_wait_bit_set(sc, SYSCTRL_MB_CTRL_RUN_BUSY_OUT, timeout_ms)) {
+		xe_err(xe, "sysctrl: Response frame timeout\n");
+		return -ETIMEDOUT;
+	}
+
+	ret = xe_sysctrl_mailbox_read_frame(sc, out, frame_size);
+	if (ret)
+		return ret;
+
+	xe_sysctrl_mailbox_clear_response(sc);
+
+	return 0;
+}
+
+static int xe_sysctrl_mailbox_receive_frames(struct xe_sysctrl *sc,
+					     const struct xe_sysctrl_mailbox_mkhi_msg_hdr *req,
+					     void *data_out, size_t data_out_len,
+					     size_t *rdata_len, unsigned int timeout_ms)
+{
+	struct xe_device *xe = sc_to_xe(sc);
+	struct xe_sysctrl_mailbox_mkhi_msg_hdr *mkhi_hdr;
+	size_t hdr_size = sizeof(*mkhi_hdr);
+	u32 ctrl_reg, total_frames, frame;
+	size_t received = 0;
+	u8 *out = data_out;
+	size_t frame_size;
+	int ret = 0;
+
+	if (!xe_sysctrl_mailbox_wait_bit_set(sc, SYSCTRL_MB_CTRL_RUN_BUSY_OUT, timeout_ms)) {
+		xe_err(xe, "sysctrl: Response frame 0 timeout\n");
+		return -ETIMEDOUT;
+	}
+
+	ctrl_reg = xe_mmio_read32(sc->mmio, SYSCTRL_MB_CTRL);
+	total_frames = FIELD_GET(MKHI_FRAME_TOTAL_MASK, ctrl_reg) + 1;
+
+	if (total_frames == 1)
+		frame_size = min(hdr_size + data_out_len, (size_t)SYSCTRL_MB_FRAME_SIZE);
+	else
+		frame_size = SYSCTRL_MB_FRAME_SIZE;
+
+	ret = xe_sysctrl_mailbox_process_first_frame(sc, req, out, data_out_len,
+						     frame_size, &received);
+	if (ret)
+		return ret;
+
+	out += received;
+
+	for (frame = 1; frame < total_frames; frame++) {
+		size_t remaining;
+
+		if (received >= data_out_len) {
+			xe_err(xe, "sysctrl: Received %zu bytes exceeds buffer %zu bytes\n",
+			       received, data_out_len);
+			return -ENOSPC;
+		}
+
+		remaining = data_out_len - received;
+		frame_size = min_t(size_t, remaining, SYSCTRL_MB_FRAME_SIZE);
+
+		ret = xe_sysctrl_mailbox_process_frame(sc, out, frame_size, timeout_ms);
+		if (ret)
+			break;
+
+		received += frame_size;
+		out += frame_size;
+	}
+
+	*rdata_len = received;
+
+	return ret;
+}
+
+static int xe_sysctrl_mailbox_send_command(struct xe_sysctrl *sc,
+					   const u8 *mbox_cmd, size_t cmd_size,
+					   void *data_out, size_t data_out_len,
+					   size_t *rdata_len, unsigned int timeout_ms)
+{
+	const struct xe_sysctrl_mailbox_mkhi_msg_hdr *mkhi_hdr;
+	size_t received;
+	int ret;
+
+	ret = xe_sysctrl_mailbox_send_frames(sc, mbox_cmd, cmd_size, timeout_ms);
+	if (ret)
+		return ret;
+
+	if (!data_out || !rdata_len)
+		return 0;
+
+	mkhi_hdr = (const struct xe_sysctrl_mailbox_mkhi_msg_hdr *)mbox_cmd;
+
+	ret = xe_sysctrl_mailbox_receive_frames(sc, mkhi_hdr, data_out, data_out_len,
+						&received, timeout_ms);
+	if (ret)
+		return ret;
+
+	*rdata_len = received;
+
+	return 0;
+}
+
+/**
+ * xe_sysctrl_mailbox_init - Initialize System Controller mailbox interface
+ * @sc: System controller structure
+ *
+ * Read SYSCTRL_MB_CTRL and seed the MKHI phase bit for the next command.
+ */
+void xe_sysctrl_mailbox_init(struct xe_sysctrl *sc)
+{
+	u32 ctrl_reg;
+
+	ctrl_reg = xe_mmio_read32(sc->mmio, SYSCTRL_MB_CTRL);
+	sc->phase_bit = (ctrl_reg & MKHI_FRAME_PHASE) ? 1 : 0;
+}
+
+/**
+ * xe_sysctrl_send_command - Send command to System Controller via mailbox
+ * @xe: XE device instance
+ * @cmd: Pointer to xe_sysctrl_mailbox_command structure
+ * @rdata_len: Pointer to store response data size (NULL skips reading the response)
+ *
+ * Send a command to the System Controller using MKHI protocol. Handles
+ * command preparation, fragmentation, transmission, and response reception.
+ *
+ * Return: 0 on success, negative error code on failure
+ */
+int xe_sysctrl_send_command(struct xe_device *xe,
+			    struct xe_sysctrl_mailbox_command *cmd,
+			    size_t *rdata_len)
+{
+	struct xe_sysctrl *sc;
+	u8 group_id, command_code;
+	u8 *mbox_cmd = NULL;
+	size_t cmd_size = 0;
+	int ret = 0;
+
+	if (!xe) {
+		pr_err("sysctrl: Invalid device handle\n");
+		return -EINVAL;
+	}
+
+	if (!xe->info.has_sysctrl)
+		return -ENODEV;
+
+	sc = &xe->sc;
+
+	if (!cmd) {
+		xe_err(xe, "sysctrl: Invalid command buffer\n");
+		return -EINVAL;
+	}
+
+	group_id = XE_SYSCTRL_APP_HDR_GROUP_ID(&cmd->header);
+	command_code = XE_SYSCTRL_APP_HDR_COMMAND(&cmd->header);
+
+	if (!cmd->data_in && cmd->data_in_len) {
+		xe_err(xe, "sysctrl: Invalid input parameters\n");
+		return -EINVAL;
+	}
+
+	if (!cmd->data_out && cmd->data_out_len) {
+		xe_err(xe, "sysctrl: Invalid output parameters\n");
+		return -EINVAL;
+	}
+
+	might_sleep();
+
+	ret = xe_sysctrl_mailbox_prepare_command(xe, group_id, command_code,
+						 cmd->data_in, cmd->data_in_len,
+						 &mbox_cmd, &cmd_size);
+	if (ret) {
+		xe_err(xe, "sysctrl: Failed to prepare command: %d\n", ret);
+		return ret;
+	}
+
+	guard(xe_pm_runtime)(xe);
+
+	guard(mutex)(&sc->cmd_lock);
+
+	ret = xe_sysctrl_mailbox_send_command(sc, mbox_cmd, cmd_size,
+					      cmd->data_out, cmd->data_out_len, rdata_len,
+					      SYSCTRL_MB_DEFAULT_TIMEOUT_MS);
+	if (ret)
+		xe_err(xe, "sysctrl: Mailbox command failed: %d\n", ret);
+
+	kfree(mbox_cmd);
+
+	return ret;
+}
diff --git a/drivers/gpu/drm/xe/xe_sysctrl_mailbox.h b/drivers/gpu/drm/xe/xe_sysctrl_mailbox.h
new file mode 100644
index 000000000000..2b64165c8e76
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_sysctrl_mailbox.h
@@ -0,0 +1,35 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2026 Intel Corporation
+ */
+
+#ifndef __XE_SYSCTRL_MAILBOX_H__
+#define __XE_SYSCTRL_MAILBOX_H__
+
+#include <linux/bitfield.h>
+#include <linux/types.h>
+
+struct xe_sysctrl;
+struct xe_device;
+struct xe_sysctrl_mailbox_command;
+
+#define APP_HDR_GROUP_ID_MASK			GENMASK(7, 0)
+#define APP_HDR_COMMAND_MASK			GENMASK(15, 8)
+#define APP_HDR_VERSION_MASK			GENMASK(23, 16)
+#define APP_HDR_RESERVED_MASK			GENMASK(31, 24)
+
+#define XE_SYSCTRL_APP_HDR_GROUP_ID(hdr) \
+	FIELD_GET(APP_HDR_GROUP_ID_MASK, le32_to_cpu((hdr)->data))
+
+#define XE_SYSCTRL_APP_HDR_COMMAND(hdr) \
+	FIELD_GET(APP_HDR_COMMAND_MASK, le32_to_cpu((hdr)->data))
+
+#define XE_SYSCTRL_APP_HDR_VERSION(hdr) \
+	FIELD_GET(APP_HDR_VERSION_MASK, le32_to_cpu((hdr)->data))
+
+void xe_sysctrl_mailbox_init(struct xe_sysctrl *sc);
+int xe_sysctrl_send_command(struct xe_device *xe,
+			    struct xe_sysctrl_mailbox_command *cmd,
+			    size_t *rdata_len);
+
+#endif /* __XE_SYSCTRL_MAILBOX_H__ */
diff --git a/drivers/gpu/drm/xe/xe_sysctrl_mailbox_types.h b/drivers/gpu/drm/xe/xe_sysctrl_mailbox_types.h
new file mode 100644
index 000000000000..1f315ad1b996
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_sysctrl_mailbox_types.h
@@ -0,0 +1,36 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2026 Intel Corporation
+ */
+
+#ifndef __XE_SYSCTRL_MAILBOX_TYPES_H__
+#define __XE_SYSCTRL_MAILBOX_TYPES_H__
+
+#include <linux/types.h>
+
+struct xe_sysctrl_mailbox_mkhi_msg_hdr {
+	__le32 data;
+} __packed;
+
+struct xe_sysctrl_mailbox_app_msg_hdr {
+	__le32 data;
+} __packed;
+
+struct xe_sysctrl_mailbox_command {
+	/** @header: Application message header containing command information */
+	struct xe_sysctrl_mailbox_app_msg_hdr header;
+
+	/** @data_in: Pointer to input payload data (can be NULL if no input data) */
+	void *data_in;
+
+	/** @data_in_len: Size of input payload in bytes (0 if no input data) */
+	size_t data_in_len;
+
+	/** @data_out: Pointer to output buffer for response data (can be NULL if no response) */
+	void *data_out;
+
+	/** @data_out_len: Size of output buffer in bytes (0 if no response expected) */
+	size_t data_out_len;
+};
+
+#endif /* __XE_SYSCTRL_MAILBOX_TYPES_H__ */
diff --git a/drivers/gpu/drm/xe/xe_sysctrl_types.h b/drivers/gpu/drm/xe/xe_sysctrl_types.h
new file mode 100644
index 000000000000..d4a362564925
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_sysctrl_types.h
@@ -0,0 +1,33 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2026 Intel Corporation
+ */
+
+#ifndef _XE_SYSCTRL_TYPES_H_
+#define _XE_SYSCTRL_TYPES_H_
+
+#include <linux/mutex.h>
+#include <linux/types.h>
+
+struct xe_mmio;
+
+/**
+ * struct xe_sysctrl - System Controller driver context
+ */
+struct xe_sysctrl {
+	/** @mmio: MMIO region for system control registers */
+	struct xe_mmio *mmio;
+
+	/** @cmd_lock: Mutex protecting mailbox command operations */
+	struct mutex cmd_lock;
+
+	/**
+	 * @phase_bit: MKHI message boundary phase toggle bit
+	 *
+	 * Phase bit alternates between 0 and 1 for consecutive
+	 * messages to help distinguish message boundaries.
+	 */
+	bool phase_bit;
+};
+
+#endif /* _XE_SYSCTRL_TYPES_H_ */
-- 
2.47.1
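
The MKHI header handling in xe_sysctrl_mailbox.c above packs a 32-bit word (group id in bits 7:0, a 7-bit command in bits 14:8, the is-response flag in bit 15, the result in bits 31:24) and validates the reply against the request. The following is a minimal user-space sketch of that logic, assuming a little-endian host (so the le32 conversions are omitted) and using plain shifts in place of the kernel's GENMASK()/FIELD_PREP()/FIELD_GET(). The helper names are hypothetical and not part of the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define HDR_GROUP_ID_MASK	0x000000FFu	/* bits 7:0 */
#define HDR_COMMAND_SHIFT	8
#define HDR_COMMAND_MASK	0x7Fu		/* 7-bit field, hence "command & 0x7F" */
#define HDR_IS_RESPONSE_BIT	(1u << 15)
#define HDR_RESULT_SHIFT	24		/* bits 31:24 */

/* Hypothetical helper: request header with is-response/reserved/result zero. */
static uint32_t mkhi_hdr_pack(uint8_t group_id, uint8_t command)
{
	return ((uint32_t)group_id & HDR_GROUP_ID_MASK) |
	       (((uint32_t)command & HDR_COMMAND_MASK) << HDR_COMMAND_SHIFT);
}

static unsigned int mkhi_hdr_group_id(uint32_t hdr)
{
	return hdr & HDR_GROUP_ID_MASK;
}

static unsigned int mkhi_hdr_command(uint32_t hdr)
{
	return (hdr >> HDR_COMMAND_SHIFT) & HDR_COMMAND_MASK;
}

static int mkhi_hdr_is_response(uint32_t hdr)
{
	return (hdr & HDR_IS_RESPONSE_BIT) != 0;
}

static unsigned int mkhi_hdr_result(uint32_t hdr)
{
	return hdr >> HDR_RESULT_SHIFT;	/* top byte, no mask needed */
}

/*
 * Same order of checks as xe_sysctrl_mailbox_process_first_frame():
 * a header mismatch is -EPROTO, a non-zero firmware result is -EIO.
 */
static int mkhi_check_resp(uint32_t req, uint32_t resp)
{
	assert(!mkhi_hdr_is_response(req));	/* req is the header we sent */

	if (!mkhi_hdr_is_response(resp) ||
	    mkhi_hdr_group_id(resp) != mkhi_hdr_group_id(req) ||
	    mkhi_hdr_command(resp) != mkhi_hdr_command(req))
		return -EPROTO;
	if (mkhi_hdr_result(resp) != 0)
		return -EIO;
	return 0;
}
```

For instance, group 0x12 with command 0x05 packs to 0x00000512, and a command value above 0x7F is silently truncated to 7 bits, just as the `command & 0x7F` in xe_sysctrl_mailbox_prepare_command() does.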


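
The send and receive paths in the same file split a command into fixed-size frames. The sketch below reproduces only that arithmetic in user space. It assumes SYSCTRL_MB_FRAME_SIZE is 16 bytes (the four DATA0..DATA3 registers; the actual definition is not part of this excerpt) and a 4-byte MKHI header. The control-register frame fields (MKHI_FRAME_CURRENT_MASK, MKHI_FRAME_TOTAL_MASK) are left out because their bit positions are not shown here, and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumptions: 16-byte frames (DATA0..DATA3) and a one-dword MKHI header. */
#define MB_FRAME_SIZE	16u
#define MKHI_HDR_SIZE	4u

/* DIV_ROUND_UP(cmd_size, MB_FRAME_SIZE), as in send_frames(). */
static unsigned int total_frames(size_t cmd_size)
{
	assert(cmd_size > 0);
	return (unsigned int)((cmd_size + MB_FRAME_SIZE - 1) / MB_FRAME_SIZE);
}

/* Bytes carried by frame @frame: min(cmd_size - bytes_sent, MB_FRAME_SIZE). */
static size_t tx_frame_bytes(size_t cmd_size, unsigned int frame)
{
	size_t sent = (size_t)frame * MB_FRAME_SIZE;

	assert(sent < cmd_size);
	return cmd_size - sent < MB_FRAME_SIZE ? cmd_size - sent : MB_FRAME_SIZE;
}

/* Data registers written for a frame: DIV_ROUND_UP(len, sizeof(u32)). */
static unsigned int frame_dwords(size_t frame_bytes)
{
	return (unsigned int)((frame_bytes + 3) / 4);
}

/*
 * Payload copied out of the first response frame, following
 * receive_frames(): a single-frame reply is sized from the caller's
 * buffer, a multi-frame reply always fills the whole first frame.
 */
static size_t rx_first_payload(size_t data_out_len, unsigned int frames)
{
	size_t fs = frames == 1 ? MKHI_HDR_SIZE + data_out_len : MB_FRAME_SIZE;

	if (fs > MB_FRAME_SIZE)
		fs = MB_FRAME_SIZE;
	return fs - MKHI_HDR_SIZE;
}

/* The phase bit flips once per message, before its first frame. */
static int next_phase(int phase)
{
	return phase ^ 1;
}
```

Under these assumptions, a 30-byte payload plus the 4-byte header is 34 bytes: three frames of 16, 16 and 2 bytes, the last one written through a single data register. Toggling the phase once per message rather than per frame matches where send_frames() flips sc->phase_bit.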

* [PATCH 2/8] drm/xe/xe_pci_error: Implement PCI error recovery callbacks
  2026-01-22 10:06 [PATCH 0/8] Introduce Xe Uncorrectable Error Handling Riana Tauro
                   ` (2 preceding siblings ...)
  2026-01-22 10:06 ` [PATCH 1/8] drm/xe/xe_sysctrl: Add System controller patch Riana Tauro
@ 2026-01-22 10:06 ` Riana Tauro
  2026-01-27 22:49   ` Michal Wajdeczko
                     ` (3 more replies)
  2026-01-22 10:06 ` [PATCH 3/8] drm/xe/xe_pci_error: Group all devres to release them on PCIe slot reset Riana Tauro
                   ` (7 subsequent siblings)
  11 siblings, 4 replies; 41+ messages in thread
From: Riana Tauro @ 2026-01-22 10:06 UTC (permalink / raw)
  To: intel-xe
  Cc: riana.tauro, anshuman.gupta, rodrigo.vivi, aravind.iddamsetty,
	badal.nilawar, raag.jadav, ravi.kishore.koppuravuri,
	mallesh.koujalagi

Add error_detected, mmio_enabled, slot_reset and resume
recovery callbacks to handle PCIe Advanced Error Reporting
(AER) errors.

For fatal errors, the device is wedged and becomes
inaccessible. Return PCI_ERS_RESULT_NEED_RESET from
error_detected to request a Secondary Bus Reset (SBR).

For non-fatal errors, return PCI_ERS_RESULT_CAN_RECOVER from
error_detected to trigger the mmio_enabled callback. In this callback,
the device is queried to determine the error cause, and recovery is
attempted based on the error type.

Once the Secondary Bus Reset (SBR) completes, the slot_reset callback
cleanly removes and reprobes the device to restore functionality.
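
The callback flow described above can be modelled as a small state machine. This is a hypothetical user-space sketch: the sim_* names and enums are local stand-ins rather than the kernel's pci_channel_state_t and pci_ers_result_t, and wedging is reduced to a flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins, not the kernel's pci_channel_state_t / pci_ers_result_t. */
enum sim_channel_state { SIM_IO_NORMAL, SIM_IO_FROZEN, SIM_IO_PERM_FAILURE };
enum sim_ers_result { SIM_CAN_RECOVER, SIM_NEED_RESET, SIM_DISCONNECT, SIM_RECOVERED };

struct sim_dev {
	bool in_recovery;
	bool wedged;
};

/* error_detected: choose the next recovery step from the channel state. */
static enum sim_ers_result sim_error_detected(struct sim_dev *d,
					      enum sim_channel_state state)
{
	assert(d);

	switch (state) {
	case SIM_IO_NORMAL:
		return SIM_CAN_RECOVER;		/* non-fatal: mmio_enabled runs next */
	case SIM_IO_FROZEN:
		d->in_recovery = true;		/* fatal: wedge, then ask for SBR */
		d->wedged = true;
		return SIM_NEED_RESET;
	case SIM_IO_PERM_FAILURE:
		return SIM_DISCONNECT;
	}
	return SIM_NEED_RESET;
}

/* mmio_enabled: at this point in the series it always escalates to SBR. */
static enum sim_ers_result sim_mmio_enabled(struct sim_dev *d)
{
	(void)d;
	return SIM_NEED_RESET;
}

/* slot_reset: after SBR, remove, clear the recovery flag and reprobe. */
static enum sim_ers_result sim_slot_reset(struct sim_dev *d, bool probe_ok)
{
	d->wedged = false;		/* remove() tears down the wedged instance */
	d->in_recovery = false;
	/* Design choice of this sketch: a failed reprobe is a disconnect. */
	return probe_ok ? SIM_RECOVERED : SIM_DISCONNECT;
}
```

In this model the non-fatal path (CAN_RECOVER, then mmio_enabled) still ends in a reset request; the series' later mmio_enabled change, which queries the system controller, is where the two paths are meant to diverge.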

Signed-off-by: Riana Tauro <riana.tauro@intel.com>
---
 drivers/gpu/drm/xe/Makefile          |  1 +
 drivers/gpu/drm/xe/xe_device.h       | 15 +++++
 drivers/gpu/drm/xe/xe_device_types.h |  3 +
 drivers/gpu/drm/xe/xe_pci.c          |  3 +
 drivers/gpu/drm/xe/xe_pci_error.c    | 85 ++++++++++++++++++++++++++++
 5 files changed, 107 insertions(+)
 create mode 100644 drivers/gpu/drm/xe/xe_pci_error.c

diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
index f6650ec3ab42..5581f2180b5c 100644
--- a/drivers/gpu/drm/xe/Makefile
+++ b/drivers/gpu/drm/xe/Makefile
@@ -98,6 +98,7 @@ xe-y += xe_bb.o \
 	xe_page_reclaim.o \
 	xe_pat.o \
 	xe_pci.o \
+	xe_pci_error.o \
 	xe_pci_rebar.o \
 	xe_pcode.o \
 	xe_pm.o \
diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
index 58d7d8b2fea3..81480248eeff 100644
--- a/drivers/gpu/drm/xe/xe_device.h
+++ b/drivers/gpu/drm/xe/xe_device.h
@@ -43,6 +43,21 @@ static inline struct xe_device *ttm_to_xe_device(struct ttm_device *ttm)
 	return container_of(ttm, struct xe_device, ttm);
 }
 
+static inline bool xe_device_is_in_recovery(struct xe_device *xe)
+{
+	return atomic_read(&xe->in_recovery);
+}
+
+static inline void xe_device_set_in_recovery(struct xe_device *xe)
+{
+	atomic_set(&xe->in_recovery, 1);
+}
+
+static inline void xe_device_clear_in_recovery(struct xe_device *xe)
+{
+	atomic_set(&xe->in_recovery, 0);
+}
+
 struct xe_device *xe_device_create(struct pci_dev *pdev,
 				   const struct pci_device_id *ent);
 int xe_device_probe_early(struct xe_device *xe);
diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
index 944f909a86ad..2d140463dc5e 100644
--- a/drivers/gpu/drm/xe/xe_device_types.h
+++ b/drivers/gpu/drm/xe/xe_device_types.h
@@ -669,6 +669,9 @@ struct xe_device {
 		bool inconsistent_reset;
 	} wedged;
 
+	/** @in_recovery: Indicates if device is in recovery */
+	atomic_t in_recovery;
+
 	/** @bo_device: Struct to control async free of BOs */
 	struct xe_bo_dev {
 		/** @bo_device.async_free: Free worker */
diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c
index c92cc176f669..e1ee393b7461 100644
--- a/drivers/gpu/drm/xe/xe_pci.c
+++ b/drivers/gpu/drm/xe/xe_pci.c
@@ -1255,6 +1255,8 @@ static const struct dev_pm_ops xe_pm_ops = {
 };
 #endif
 
+extern const struct pci_error_handlers xe_pci_error_handlers;
+
 static struct pci_driver xe_pci_driver = {
 	.name = DRIVER_NAME,
 	.id_table = pciidlist,
@@ -1262,6 +1264,7 @@ static struct pci_driver xe_pci_driver = {
 	.remove = xe_pci_remove,
 	.shutdown = xe_pci_shutdown,
 	.sriov_configure = xe_pci_sriov_configure,
+	.err_handler = &xe_pci_error_handlers,
 #ifdef CONFIG_PM_SLEEP
 	.driver.pm = &xe_pm_ops,
 #endif
diff --git a/drivers/gpu/drm/xe/xe_pci_error.c b/drivers/gpu/drm/xe/xe_pci_error.c
new file mode 100644
index 000000000000..a3cc01afa179
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_pci_error.c
@@ -0,0 +1,85 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2026 Intel Corporation
+ */
+#include <drm/drm_drv.h>
+#include <linux/pci.h>
+
+#include "xe_device.h"
+#include "xe_gt.h"
+#include "xe_pci.h"
+#include "xe_uc.h"
+
+static void xe_pci_error_handling(struct pci_dev *pdev)
+{
+	struct xe_device *xe = pdev_to_xe_device(pdev);
+
+	xe_device_set_in_recovery(xe);
+	xe_device_declare_wedged(xe);
+
+	pci_disable_device(pdev);
+}
+
+static pci_ers_result_t xe_pci_error_detected(struct pci_dev *pdev, pci_channel_state_t state)
+{
+	dev_err(&pdev->dev, "PCI error detected, state %d\n", state);
+
+	switch (state) {
+	case pci_channel_io_normal:
+		return PCI_ERS_RESULT_CAN_RECOVER;
+	case pci_channel_io_frozen:
+		xe_pci_error_handling(pdev);
+		return PCI_ERS_RESULT_NEED_RESET;
+	case pci_channel_io_perm_failure:
+		return PCI_ERS_RESULT_DISCONNECT;
+	}
+
+	return PCI_ERS_RESULT_NEED_RESET;
+}
+
+static pci_ers_result_t xe_pci_error_mmio_enabled(struct pci_dev *pdev)
+{
+	dev_err(&pdev->dev, "PCI mmio enabled\n");
+
+	return PCI_ERS_RESULT_NEED_RESET;
+}
+
+static pci_ers_result_t xe_pci_error_slot_reset(struct pci_dev *pdev)
+{
+	const struct pci_device_id *ent = pci_match_id(pdev->driver->id_table, pdev);
+	struct xe_device *xe = pdev_to_xe_device(pdev);
+
+	dev_err(&pdev->dev, "PCI slot reset\n");
+
+	pci_restore_state(pdev);
+
+	if (pci_enable_device(pdev)) {
+		dev_err(&pdev->dev,
+			"Cannot re-enable PCI device after reset\n");
+		return PCI_ERS_RESULT_DISCONNECT;
+	}
+
+	/*
+	 * A Secondary Bus Reset wipes out all device memory, so the
+	 * Xe KMD must remove and reprobe the device.
+	 */
+	pdev->driver->remove(pdev);
+	xe_device_clear_in_recovery(xe);
+
+	if (!pdev->driver->probe(pdev, ent))
+		return PCI_ERS_RESULT_RECOVERED;
+
+	return PCI_ERS_RESULT_DISCONNECT;
+}
+
+static void xe_pci_error_resume(struct pci_dev *pdev)
+{
+	dev_info(&pdev->dev, "PCI error resume\n");
+}
+
+const struct pci_error_handlers xe_pci_error_handlers = {
+	.error_detected	= xe_pci_error_detected,
+	.mmio_enabled	= xe_pci_error_mmio_enabled,
+	.slot_reset	= xe_pci_error_slot_reset,
+	.resume		= xe_pci_error_resume,
+};
-- 
2.47.1



* [PATCH 3/8] drm/xe/xe_pci_error: Group all devres to release them on PCIe slot reset
  2026-01-22 10:06 [PATCH 0/8] Introduce Xe Uncorrectable Error Handling Riana Tauro
                   ` (3 preceding siblings ...)
  2026-01-22 10:06 ` [PATCH 2/8] drm/xe/xe_pci_error: Implement PCI error recovery callbacks Riana Tauro
@ 2026-01-22 10:06 ` Riana Tauro
  2026-01-27 11:23   ` Mallesh, Koujalagi
  2026-01-22 10:06 ` [PATCH 4/8] drm/xe: Skip device access during PCI error recovery Riana Tauro
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 41+ messages in thread
From: Riana Tauro @ 2026-01-22 10:06 UTC (permalink / raw)
  To: intel-xe
  Cc: riana.tauro, anshuman.gupta, rodrigo.vivi, aravind.iddamsetty,
	badal.nilawar, raag.jadav, ravi.kishore.koppuravuri,
	mallesh.koujalagi, Matthew Brost, Himal Prasad Ghimiray

Add devres grouping to handle device resource cleanup during
PCI error recovery.

Secondary Bus Reset (SBR) is triggered by the PCI core when the
error_detected/mmio_enabled callbacks return PCI_ERS_RESULT_NEED_RESET.

Once SBR completes, the slot_reset callback is invoked. SBR wipes
out all device memory, requiring the Xe KMD to remove and reprobe
the device. Calling xe_pci_remove() alone does not release the devres
allocations. Since no exported function releases all of a device's
devres, group the devres allocations and release the entire group
during slot reset to ensure proper cleanup.
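
A rough user-space model of why grouping helps: devres entries form a per-device list, devres_open_group() records a marker in it, and devres_release_group() releases the resources recorded after that marker, newest first. The struct and functions below are hypothetical and only mimic that behaviour for a group that is never closed (the hunk shown does not call devres_close_group()); the real devres code also handles closed and nested groups.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_RES 16

/* Hypothetical model of a device's devres list; 0 marks a group opening. */
struct res_list {
	int ids[MAX_RES];
	size_t n;
	int released[MAX_RES];	/* release order, kept for inspection */
	size_t n_released;
};

/* Like devres_open_group(): push a marker, return its position as the id. */
static size_t open_group(struct res_list *l)
{
	assert(l->n < MAX_RES);
	l->ids[l->n] = 0;
	return l->n++;
}

/* Like any devm_*() allocation: append a resource to the list. */
static void add_res(struct res_list *l, int id)
{
	assert(id > 0 && l->n < MAX_RES);
	l->ids[l->n++] = id;
}

/*
 * Like devres_release_group() on a group that was never closed: release
 * every resource after the marker, newest first, then drop the marker.
 */
static void release_group(struct res_list *l, size_t group)
{
	assert(group < l->n && l->ids[group] == 0);
	while (l->n > group + 1)
		l->released[l->n_released++] = l->ids[--l->n];
	l->n = group;
}
```

A resource added before the group was opened survives the release, while everything allocated afterwards (here, by the previous probe) is freed in reverse order of allocation.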

Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Signed-off-by: Riana Tauro <riana.tauro@intel.com>
---
 drivers/gpu/drm/xe/xe_device.c       | 7 +++++++
 drivers/gpu/drm/xe/xe_device_types.h | 3 +++
 drivers/gpu/drm/xe/xe_pci_error.c    | 1 +
 3 files changed, 11 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index 16fc6da01357..0cf6480b8aad 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -440,6 +440,7 @@ struct xe_device *xe_device_create(struct pci_dev *pdev,
 				   const struct pci_device_id *ent)
 {
 	struct xe_device *xe;
+	void *devres_id;
 	int err;
 
 	xe_display_driver_set_hooks(&driver);
@@ -448,10 +449,16 @@ struct xe_device *xe_device_create(struct pci_dev *pdev,
 	if (err)
 		return ERR_PTR(err);
 
+	devres_id = devres_open_group(&pdev->dev, NULL, GFP_KERNEL);
+	if (!devres_id)
+		return ERR_PTR(-ENOMEM);
+
 	xe = devm_drm_dev_alloc(&pdev->dev, &driver, struct xe_device, drm);
 	if (IS_ERR(xe))
 		return xe;
 
+	xe->devres_group_id = devres_id;
+
 	err = ttm_device_init(&xe->ttm, &xe_ttm_funcs, xe->drm.dev,
 			      xe->drm.anon_inode->i_mapping,
 			      xe->drm.vma_offset_manager, 0);
diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
index 2d140463dc5e..3a19e9b5dfae 100644
--- a/drivers/gpu/drm/xe/xe_device_types.h
+++ b/drivers/gpu/drm/xe/xe_device_types.h
@@ -672,6 +672,9 @@ struct xe_device {
 	/** @in_recovery: Indicates if device is in recovery */
 	atomic_t in_recovery;
 
+	/** @devres_group_id: id for devres group */
+	void *devres_group_id;
+
 	/** @bo_device: Struct to control async free of BOs */
 	struct xe_bo_dev {
 		/** @bo_device.async_free: Free worker */
diff --git a/drivers/gpu/drm/xe/xe_pci_error.c b/drivers/gpu/drm/xe/xe_pci_error.c
index a3cc01afa179..0960aa5861bc 100644
--- a/drivers/gpu/drm/xe/xe_pci_error.c
+++ b/drivers/gpu/drm/xe/xe_pci_error.c
@@ -65,6 +65,7 @@ static pci_ers_result_t xe_pci_error_slot_reset(struct pci_dev *pdev)
 	 */
 	pdev->driver->remove(pdev);
 	xe_device_clear_in_recovery(xe);
+	devres_release_group(&pdev->dev, xe->devres_group_id);
 
 	if (!pdev->driver->probe(pdev, ent))
 		return PCI_ERS_RESULT_RECOVERED;
-- 
2.47.1



* [PATCH 4/8] drm/xe: Skip device access during PCI error recovery
  2026-01-22 10:06 [PATCH 0/8] Introduce Xe Uncorrectable Error Handling Riana Tauro
                   ` (4 preceding siblings ...)
  2026-01-22 10:06 ` [PATCH 3/8] drm/xe/xe_pci_error: Group all devres to release them on PCIe slot reset Riana Tauro
@ 2026-01-22 10:06 ` Riana Tauro
  2026-01-22 10:06 ` [PATCH 5/8] drm/xe/xe_ras: Initialize Uncorrectable AER Registers Riana Tauro
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 41+ messages in thread
From: Riana Tauro @ 2026-01-22 10:06 UTC (permalink / raw)
  To: intel-xe
  Cc: riana.tauro, anshuman.gupta, rodrigo.vivi, aravind.iddamsetty,
	badal.nilawar, raag.jadav, ravi.kishore.koppuravuri,
	mallesh.koujalagi, Matthew Brost, Himal Prasad Ghimiray

When a fatal error occurs and the error_detected callback is
invoked, the device is inaccessible. The error_detected callback
wedges the device, causing jobs to time out.

The job timeout handler acquires forcewake to capture a devcoredump
and triggers a GT reset. Since the device is inaccessible, this causes
errors. Skip all MMIO accesses and GT resets while the device
is in recovery.
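
A minimal sketch of the gating described above, assuming a single flag checked before any hardware access. The sim_* types and counters are hypothetical stand-ins for struct xe_device, the GT reset path and the devcoredump capture; C11 atomic_load()/atomic_store() play the role of the kernel's atomic_read()/atomic_set() and are, if anything, more strongly ordered.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct xe_device. */
struct sim_xe {
	atomic_int in_recovery;
	int resets_queued;	/* counts what xe_gt_reset_async() would queue */
	int mmio_accesses;	/* counts forcewake + register captures */
};

static bool sim_is_in_recovery(struct sim_xe *xe)
{
	return atomic_load(&xe->in_recovery) != 0;
}

static void sim_set_in_recovery(struct sim_xe *xe)
{
	atomic_store(&xe->in_recovery, 1);
}

/* Like xe_gt_reset_async(): return before touching the hardware. */
static void sim_gt_reset_async(struct sim_xe *xe)
{
	if (sim_is_in_recovery(xe))
		return;
	xe->resets_queued++;
}

/* Simplified job timeout handler: capture registers, then reset. */
static void sim_timedout_job(struct sim_xe *xe)
{
	assert(xe);
	if (!sim_is_in_recovery(xe))
		xe->mmio_accesses++;
	sim_gt_reset_async(xe);
}
```

Because the check is a plain flag read, it only prevents new hardware accesses from starting once the flag is set; it does not wait for accesses that are already in flight.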

Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Signed-off-by: Riana Tauro <riana.tauro@intel.com>
---
 drivers/gpu/drm/xe/xe_device.c     | 2 +-
 drivers/gpu/drm/xe/xe_gt.c         | 9 +++++++--
 drivers/gpu/drm/xe/xe_guc_submit.c | 8 ++++----
 3 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index 0cf6480b8aad..f418ebf04f0f 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -1332,7 +1332,7 @@ void xe_device_declare_wedged(struct xe_device *xe)
 	for_each_gt(gt, xe, id)
 		xe_gt_declare_wedged(gt);
 
-	if (xe_device_wedged(xe)) {
+	if (!xe_device_is_in_recovery(xe) && xe_device_wedged(xe)) {
 		/* If no wedge recovery method is set, use default */
 		if (!xe->wedged.method)
 			xe_device_set_wedged_method(xe, DRM_WEDGE_RECOVERY_REBIND |
diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
index 9d090d0f2438..9a639223e31c 100644
--- a/drivers/gpu/drm/xe/xe_gt.c
+++ b/drivers/gpu/drm/xe/xe_gt.c
@@ -890,8 +890,13 @@ static void gt_reset_worker(struct work_struct *w)
 
 void xe_gt_reset_async(struct xe_gt *gt)
 {
+	struct xe_device *xe = gt_to_xe(gt);
+
 	xe_gt_info(gt, "trying reset from %ps\n", __builtin_return_address(0));
 
+	if (xe_device_is_in_recovery(xe))
+		return;
+
 	/* Don't do a reset while one is already in flight */
 	if (!xe_fault_inject_gt_reset() && xe_uc_reset_prepare(&gt->uc))
 		return;
@@ -899,9 +904,9 @@ void xe_gt_reset_async(struct xe_gt *gt)
 	xe_gt_info(gt, "reset queued\n");
 
 	/* Pair with put in gt_reset_worker() if work is enqueued */
-	xe_pm_runtime_get_noresume(gt_to_xe(gt));
+	xe_pm_runtime_get_noresume(xe);
 	if (!queue_work(gt->ordered_wq, &gt->reset.worker))
-		xe_pm_runtime_put(gt_to_xe(gt));
+		xe_pm_runtime_put(xe);
 }
 
 void xe_gt_suspend_prepare(struct xe_gt *gt)
diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
index 456f549c16f6..967552c39de0 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.c
+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
@@ -1526,7 +1526,7 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
 	 * If devcoredump not captured and GuC capture for the job is not ready
 	 * do manual capture first and decide later if we need to use it
 	 */
-	if (!exec_queue_killed(q) && !xe->devcoredump.captured &&
+	if (!xe_device_is_in_recovery(xe) && !exec_queue_killed(q) && !xe->devcoredump.captured &&
 	    !xe_guc_capture_get_matching_and_lock(q)) {
 		/* take force wake before engine register manual capture */
 		CLASS(xe_force_wake, fw_ref)(gt_to_fw(q->gt), XE_FORCEWAKE_ALL);
@@ -1548,8 +1548,8 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
 	set_exec_queue_banned(q);
 
 	/* Kick job / queue off hardware */
-	if (!wedged && (exec_queue_enabled(primary) ||
-			exec_queue_pending_disable(primary))) {
+	if (!xe_device_is_in_recovery(xe) && !wedged &&
+	    (exec_queue_enabled(primary) || exec_queue_pending_disable(primary))) {
 		int ret;
 
 		if (exec_queue_reset(primary))
@@ -1617,7 +1617,7 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
 
 	trace_xe_sched_job_timedout(job);
 
-	if (!exec_queue_killed(q))
+	if (!xe_device_is_in_recovery(xe) && !exec_queue_killed(q))
 		xe_devcoredump(q, job,
 			       "Timedout job - seqno=%u, lrc_seqno=%u, guc_id=%d, flags=0x%lx",
 			       xe_sched_job_seqno(job), xe_sched_job_lrc_seqno(job),
-- 
2.47.1



* [PATCH 5/8] drm/xe/xe_ras: Initialize Uncorrectable AER Registers
  2026-01-22 10:06 [PATCH 0/8] Introduce Xe Uncorrectable Error Handling Riana Tauro
                   ` (5 preceding siblings ...)
  2026-01-22 10:06 ` [PATCH 4/8] drm/xe: Skip device access during PCI error recovery Riana Tauro
@ 2026-01-22 10:06 ` Riana Tauro
  2026-01-27 12:41   ` Mallesh, Koujalagi
                     ` (2 more replies)
  2026-01-22 10:06 ` [PATCH 6/8] drm/xe/xe_ras: Add structures and commands for Uncorrectable Core Compute Errors Riana Tauro
                   ` (4 subsequent siblings)
  11 siblings, 3 replies; 41+ messages in thread
From: Riana Tauro @ 2026-01-22 10:06 UTC (permalink / raw)
  To: intel-xe
  Cc: riana.tauro, anshuman.gupta, rodrigo.vivi, aravind.iddamsetty,
	badal.nilawar, raag.jadav, ravi.kishore.koppuravuri,
	mallesh.koujalagi

Uncorrectable errors from different endpoints in the device are steered
to the Upstream Switch Port (USP), which is a PCIe Advanced Error
Reporting (AER) compliant device. Downgrade all the errors to non-fatal
to prevent the PCIe bus driver from triggering a Secondary Bus Reset
(SBR). This allows error detection, containment and recovery in the
driver.

The Uncorrectable Error Severity Register has 'Uncorrectable Internal
Error Severity' set to fatal by default. Set it to non-fatal and
unmask the error.
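
The register update described above is a read-modify-write of two 32-bit AER registers. As a hedged illustration of just the bit arithmetic (no config-space access), the sketch below clears PCI_ERR_UNC_INTN (0x00400000, bit 22, as defined in include/uapi/linux/pci_regs.h) in a severity value and a mask value. In the Uncorrectable Error Severity Register a set bit means fatal; in the Uncorrectable Error Mask Register a set bit means the error is masked. The helper name and the sample register values are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define UNC_INTN 0x00400000u	/* same value as PCI_ERR_UNC_INTN */

/*
 * Hypothetical helper: apply the two read-modify-write steps to values
 * already read from config space. Clearing the severity bit makes
 * Uncorrectable Internal Errors non-fatal; clearing the mask bit lets
 * them be reported. All other bits are preserved as read.
 */
static void downgrade_and_unmask_intn(uint32_t *sev, uint32_t *mask)
{
	assert(sev && mask);
	*sev &= ~UNC_INTN;
	*mask &= ~UNC_INTN;
}
```

In the driver the same masking sits between pci_read_config_dword() and pci_write_config_dword() at aer_cap + PCI_ERR_UNCOR_SEVER and aer_cap + PCI_ERR_UNCOR_MASK, followed by pci_save_state(), presumably so a later pci_restore_state() reapplies the downgraded values.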

Signed-off-by: Riana Tauro <riana.tauro@intel.com>
---
 drivers/gpu/drm/xe/Makefile    |  1 +
 drivers/gpu/drm/xe/xe_device.c |  3 ++
 drivers/gpu/drm/xe/xe_ras.c    | 71 ++++++++++++++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_ras.h    | 13 +++++++
 4 files changed, 88 insertions(+)
 create mode 100644 drivers/gpu/drm/xe/xe_ras.c
 create mode 100644 drivers/gpu/drm/xe/xe_ras.h

diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
index 5581f2180b5c..85ec53eb0b62 100644
--- a/drivers/gpu/drm/xe/Makefile
+++ b/drivers/gpu/drm/xe/Makefile
@@ -110,6 +110,7 @@ xe-y += xe_bb.o \
 	xe_pxp_debugfs.o \
 	xe_pxp_submit.o \
 	xe_query.o \
+	xe_ras.o \
 	xe_range_fence.o \
 	xe_reg_sr.o \
 	xe_reg_whitelist.o \
diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index f418ebf04f0f..be89ffc9eade 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -59,6 +59,7 @@
 #include "xe_psmi.h"
 #include "xe_pxp.h"
 #include "xe_query.h"
+#include "xe_ras.h"
 #include "xe_shrinker.h"
 #include "xe_soc_remapper.h"
 #include "xe_survivability_mode.h"
@@ -1019,6 +1020,8 @@ int xe_device_probe(struct xe_device *xe)
 
 	xe_vsec_init(xe);
 
+	xe_ras_init(xe);
+
 	err = xe_sriov_init_late(xe);
 	if (err)
 		goto err_unregister_display;
diff --git a/drivers/gpu/drm/xe/xe_ras.c b/drivers/gpu/drm/xe/xe_ras.c
new file mode 100644
index 000000000000..ba5ed37aed28
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_ras.c
@@ -0,0 +1,71 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2026 Intel Corporation
+ */
+#include <linux/pci.h>
+
+#include "xe_device_types.h"
+#include "xe_ras.h"
+
+#ifdef CONFIG_PCIEAER
+static void unmask_and_downgrade_internal_error(struct xe_device *xe)
+{
+	struct pci_dev *pdev = to_pci_dev(xe->drm.dev);
+	struct pci_dev *vsp, *usp;
+	u32 aer_uncorr_sev, aer_uncorr_mask;
+	u16 aer_cap;
+
+	/* Gfx device hierarchy: USP --> VSP --> SGunit */
+	vsp = pci_upstream_bridge(pdev);
+	if (!vsp)
+		return;
+
+	usp = pci_upstream_bridge(vsp);
+	if (!usp)
+		return;
+
+	aer_cap = usp->aer_cap;
+
+	if (!aer_cap)
+		return;
+
+	/*
+	 * All errors are steered to the USP, which is a PCIe AER compliant
+	 * device. Downgrade all the errors to non-fatal to prevent the PCIe
+	 * bus driver from triggering a Secondary Bus Reset (SBR). This allows
+	 * error detection, containment and recovery in the driver.
+	 *
+	 * The Uncorrectable Error Severity Register has the 'Uncorrectable
+	 * Internal Error Severity' set to fatal by default. Set this to
+	 * non-fatal and unmask the error.
+	 */
+
+	/* Initialize Uncorrectable Error Severity Register */
+	pci_read_config_dword(usp, aer_cap + PCI_ERR_UNCOR_SEVER, &aer_uncorr_sev);
+	aer_uncorr_sev &= ~PCI_ERR_UNC_INTN;
+	pci_write_config_dword(usp, aer_cap + PCI_ERR_UNCOR_SEVER, aer_uncorr_sev);
+
+	/* Initialize Uncorrectable Error Mask Register */
+	pci_read_config_dword(usp, aer_cap + PCI_ERR_UNCOR_MASK, &aer_uncorr_mask);
+	aer_uncorr_mask &= ~PCI_ERR_UNC_INTN;
+	pci_write_config_dword(usp, aer_cap + PCI_ERR_UNCOR_MASK, aer_uncorr_mask);
+
+	pci_save_state(usp);
+}
+#endif
+
+/**
+ * xe_ras_init - Initialize Xe RAS
+ * @xe: xe device instance
+ *
+ * Configure AER error reporting for devices with a system controller.
+ */
+void xe_ras_init(struct xe_device *xe)
+{
+	if (!xe->info.has_sysctrl)
+		return;
+
+#ifdef CONFIG_PCIEAER
+	unmask_and_downgrade_internal_error(xe);
+#endif
+}
diff --git a/drivers/gpu/drm/xe/xe_ras.h b/drivers/gpu/drm/xe/xe_ras.h
new file mode 100644
index 000000000000..14cb973603e7
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_ras.h
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2026 Intel Corporation
+ */
+
+#ifndef _XE_RAS_H_
+#define _XE_RAS_H_
+
+struct xe_device;
+
+void xe_ras_init(struct xe_device *xe);
+
+#endif
-- 
2.47.1

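The AER update in `unmask_and_downgrade_internal_error()` is a read-modify-write that clears a single bit, Uncorrectable Internal Error (bit 22, `PCI_ERR_UNC_INTN` = 0x00400000 in `include/uapi/linux/pci_regs.h`), in two registers. In the Severity register, clearing it makes the error non-fatal; in the Mask register, clearing it unmasks the error. The following is a minimal user-space sketch of only that bit logic. Config-space accesses are replaced by a plain struct, and `struct aer_uncor_regs` and both helpers are hypothetical.

```c
#include <stdint.h>

/* Uncorrectable Internal Error, bit 22 (same value as linux/pci_regs.h) */
#define PCI_ERR_UNC_INTN 0x00400000u

/* Hypothetical snapshot of the USP's two AER uncorrectable registers */
struct aer_uncor_regs {
	uint32_t sever;	/* Uncorrectable Error Severity: 1 = fatal */
	uint32_t mask;	/* Uncorrectable Error Mask: 1 = masked */
};

/* Clear only the Internal Error bit, leaving every other bit untouched */
static uint32_t clear_internal_error_bit(uint32_t reg)
{
	return reg & ~PCI_ERR_UNC_INTN;
}

/* The same two read-modify-write steps as the patch: downgrade, then unmask */
static void downgrade_and_unmask_intn(struct aer_uncor_regs *regs)
{
	regs->sever = clear_internal_error_bit(regs->sever);
	regs->mask = clear_internal_error_bit(regs->mask);
}
```

For example, a Severity value of 0x00462030 becomes 0x00062030, and a Mask value of 0x00400000 becomes 0. All other severity bits keep their settings. The patch then calls `pci_save_state(usp)`, presumably so that the modified values are restored along with the rest of the saved state after a reset.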

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 6/8] drm/xe/xe_ras: Add structures and commands for Uncorrectable Core Compute Errors
  2026-01-22 10:06 [PATCH 0/8] Introduce Xe Uncorrectable Error Handling Riana Tauro
                   ` (6 preceding siblings ...)
  2026-01-22 10:06 ` [PATCH 5/8] drm/xe/xe_ras: Initialize Uncorrectable AER Registers Riana Tauro
@ 2026-01-22 10:06 ` Riana Tauro
  2026-02-23 14:19   ` Mallesh, Koujalagi
  2026-01-22 10:06 ` [PATCH 7/8] drm/xe/xe_ras: Add support for Uncorrectable Core-Compute errors Riana Tauro
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 41+ messages in thread
From: Riana Tauro @ 2026-01-22 10:06 UTC (permalink / raw)
  To: intel-xe
  Cc: riana.tauro, anshuman.gupta, rodrigo.vivi, aravind.iddamsetty,
	badal.nilawar, raag.jadav, ravi.kishore.koppuravuri,
	mallesh.koujalagi

Add the sysctrl commands and response structures for Uncorrectable
Core Compute errors.

Signed-off-by: Riana Tauro <riana.tauro@intel.com>
---
 drivers/gpu/drm/xe/xe_ras.c                   |  53 +++++++
 drivers/gpu/drm/xe/xe_ras_types.h             | 131 ++++++++++++++++++
 drivers/gpu/drm/xe/xe_sysctrl_mailbox_types.h |  13 ++
 3 files changed, 197 insertions(+)
 create mode 100644 drivers/gpu/drm/xe/xe_ras_types.h

diff --git a/drivers/gpu/drm/xe/xe_ras.c b/drivers/gpu/drm/xe/xe_ras.c
index ba5ed37aed28..ace08d8d8d46 100644
--- a/drivers/gpu/drm/xe/xe_ras.c
+++ b/drivers/gpu/drm/xe/xe_ras.c
@@ -4,9 +4,62 @@
  */
 #include <linux/pci.h>
 
+#include "xe_assert.h"
 #include "xe_device_types.h"
 #include "xe_ras.h"
 
+/* Severity classification of detected errors */
+enum xe_ras_severity {
+	XE_RAS_SEVERITY_NOT_SUPPORTED = 0,
+	XE_RAS_SEVERITY_CORRECTABLE,
+	XE_RAS_SEVERITY_UNCORRECTABLE,
+	XE_RAS_SEVERITY_INFORMATIONAL,
+	XE_RAS_SEVERITY_MAX
+};
+
+/* Major IP blocks where errors can originate */
+enum xe_ras_component {
+	XE_RAS_COMPONENT_NOT_SUPPORTED = 0,
+	XE_RAS_COMPONENT_DEVICE_MEMORY,
+	XE_RAS_COMPONENT_CORE_COMPUTE,
+	XE_RAS_COMPONENT_RESERVED,
+	XE_RAS_COMPONENT_PCIE,
+	XE_RAS_COMPONENT_FABRIC,
+	XE_RAS_COMPONENT_SOC,
+	XE_RAS_COMPONENT_MAX
+};
+
+static const char * const xe_ras_severities[] = {
+	[XE_RAS_SEVERITY_NOT_SUPPORTED]		= "Not Supported",
+	[XE_RAS_SEVERITY_CORRECTABLE]		= "Correctable",
+	[XE_RAS_SEVERITY_UNCORRECTABLE]		= "Uncorrectable",
+	[XE_RAS_SEVERITY_INFORMATIONAL]		= "Informational",
+};
+
+static const char * const xe_ras_components[] = {
+	[XE_RAS_COMPONENT_NOT_SUPPORTED]	= "Not Supported",
+	[XE_RAS_COMPONENT_DEVICE_MEMORY]	= "Device Memory",
+	[XE_RAS_COMPONENT_CORE_COMPUTE]		= "Core Compute",
+	[XE_RAS_COMPONENT_RESERVED]		= "Reserved",
+	[XE_RAS_COMPONENT_PCIE]			= "PCIe",
+	[XE_RAS_COMPONENT_FABRIC]		= "Fabric",
+	[XE_RAS_COMPONENT_SOC]			= "SoC",
+};
+
+static inline const char *severity_to_str(struct xe_device *xe, u32 severity)
+{
+	xe_assert(xe, severity < XE_RAS_SEVERITY_MAX);
+
+	return xe_ras_severities[severity];
+}
+
+static inline const char *comp_to_str(struct xe_device *xe, u32 comp)
+{
+	xe_assert(xe, comp < XE_RAS_COMPONENT_MAX);
+
+	return xe_ras_components[comp];
+}
+
 #ifdef CONFIG_PCIEAER
 static void unmask_and_downgrade_internal_error(struct xe_device *xe)
 {
diff --git a/drivers/gpu/drm/xe/xe_ras_types.h b/drivers/gpu/drm/xe/xe_ras_types.h
new file mode 100644
index 000000000000..c7a930c16f68
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_ras_types.h
@@ -0,0 +1,131 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2026 Intel Corporation
+ */
+
+#ifndef _XE_RAS_TYPES_H_
+#define _XE_RAS_TYPES_H_
+
+#include <linux/types.h>
+
+#define XE_RAS_MAX_ERROR_DETAILS	16
+
+/**
+ * struct xe_ras_error_common - Common RAS error class
+ *
+ * This structure contains error severity and component information
+ * across all products
+ */
+struct xe_ras_error_common {
+	/** @severity: Error Severity */
+	u8 severity;
+	/** @component: IP where the error originated */
+	u8 component;
+} __packed;
+
+/**
+ * struct xe_ras_error_unit - Error unit information
+ */
+struct xe_ras_error_unit {
+	/** @tile: Tile identifier */
+	u8 tile;
+	/** @instance: Instance identifier within a component */
+	u32 instance;
+} __packed;
+
+/**
+ * struct xe_ras_error_cause - Error cause information
+ */
+struct xe_ras_error_cause {
+	/** @cause: Cause */
+	u32 cause;
+	/** @reserved: For future use */
+	u8 reserved;
+} __packed;
+
+/**
+ * struct xe_ras_error_product - Error fields that are specific to the product
+ */
+struct xe_ras_error_product {
+	/** @unit: Unit within IP block */
+	struct xe_ras_error_unit unit;
+	/** @error_cause: Cause/checker */
+	struct xe_ras_error_cause error_cause;
+} __packed;
+
+/**
+ * struct xe_ras_error_class - Complete RAS Error Class
+ *
+ * This structure provides the complete error classification by combining
+ * the common error class with the product-specific error class.
+ */
+struct xe_ras_error_class {
+	/** @common: Common error severity and component */
+	struct xe_ras_error_common common;
+	/** @product: Product-specific unit and cause */
+	struct xe_ras_error_product product;
+} __packed;
+
+/**
+ * struct xe_ras_error_array - Details of the error types
+ */
+struct xe_ras_error_array {
+	/** @error_class: Error class */
+	struct xe_ras_error_class error_class;
+	/** @timestamp: Timestamp */
+	u64 timestamp;
+	/** @error_details: Error details specific to the class */
+	u32 error_details[XE_RAS_MAX_ERROR_DETAILS];
+} __packed;
+
+/**
+ * struct xe_ras_get_error_response - Response for XE_SYSCTRL_CMD_GET_SOC_ERROR
+ */
+struct xe_ras_get_error_response {
+	/** @num_errors: Number of errors reported in this response */
+	u8 num_errors;
+	/** @additional_errors: Non-zero if more errors are pending */
+	u8 additional_errors;
+	/** @error_arr: Array of up to 3 errors */
+	struct xe_ras_error_array error_arr[3];
+} __packed;
+
+/**
+ * struct xe_ras_compute_error - Error details of a Core Compute error
+ */
+struct xe_ras_compute_error {
+	/** @error_log_header: Error Source and type */
+	u32 error_log_header;
+	/** @internal_error_log: Internal Error log */
+	u32 internal_error_log;
+	/** @fabric_log: Fabric Error log */
+	u32 fabric_log;
+	/** @internal_error_addr_log0: Internal Error addr log */
+	u32 internal_error_addr_log0;
+	/** @internal_error_addr_log1: Internal Error addr log */
+	u32 internal_error_addr_log1;
+	/** @packet_log0: Packet log */
+	u32 packet_log0;
+	/** @packet_log1: Packet log */
+	u32 packet_log1;
+	/** @packet_log2: Packet log */
+	u32 packet_log2;
+	/** @packet_log3: Packet log */
+	u32 packet_log3;
+	/** @packet_log4: Packet log */
+	u32 packet_log4;
+	/** @misc_log0: Misc log */
+	u32 misc_log0;
+	/** @misc_log1: Misc log */
+	u32 misc_log1;
+	/** @spare_log0: Spare log */
+	u32 spare_log0;
+	/** @spare_log1: Spare log */
+	u32 spare_log1;
+	/** @spare_log2: Spare log */
+	u32 spare_log2;
+	/** @spare_log3: Spare log */
+	u32 spare_log3;
+} __packed;
+
+#endif
diff --git a/drivers/gpu/drm/xe/xe_sysctrl_mailbox_types.h b/drivers/gpu/drm/xe/xe_sysctrl_mailbox_types.h
index 1f315ad1b996..45ef10f5cfa2 100644
--- a/drivers/gpu/drm/xe/xe_sysctrl_mailbox_types.h
+++ b/drivers/gpu/drm/xe/xe_sysctrl_mailbox_types.h
@@ -8,6 +8,19 @@
 
 #include <linux/types.h>
 
+/**
+ * enum xe_sysctrl_mailbox_command_id - RAS command IDs for the GFSP group
+ *
+ * @XE_SYSCTRL_CMD_GET_SOC_ERROR: Get basic error information
+ */
+enum xe_sysctrl_mailbox_command_id {
+	XE_SYSCTRL_CMD_GET_SOC_ERROR = 1
+};
+
+enum xe_sysctrl_group {
+	XE_SYSCTRL_GROUP_GFSP = 1
+};
+
 struct xe_sysctrl_mailbox_mkhi_msg_hdr {
 	__le32 data;
 } __packed;
-- 
2.47.1

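`xe_ras_process_errors()` rejects any reply whose length is not `sizeof(response)`, so the `__packed` layout must match the firmware's layout exactly. The following user-space sketch mirrors the structures with `<stdint.h>` types and pins the sizes at compile time. `PACKED` stands in for the kernel's `__packed`, and the size checks are illustrative additions, not part of the patch. By this arithmetic, an error class is 2 + 5 + 5 = 12 bytes, each error entry is 12 + 8 + 64 = 84 bytes, and the full response is 2 + 3 * 84 = 254 bytes. `struct xe_ras_compute_error` (16 u32 fields, 64 bytes) exactly fills `error_details[XE_RAS_MAX_ERROR_DETAILS]`.

```c
#include <stddef.h>
#include <stdint.h>

#define XE_RAS_MAX_ERROR_DETAILS	16
#define PACKED				__attribute__((packed))

struct xe_ras_error_common { uint8_t severity; uint8_t component; } PACKED;
struct xe_ras_error_unit { uint8_t tile; uint32_t instance; } PACKED;
struct xe_ras_error_cause { uint32_t cause; uint8_t reserved; } PACKED;

struct xe_ras_error_product {
	struct xe_ras_error_unit unit;
	struct xe_ras_error_cause error_cause;
} PACKED;

struct xe_ras_error_class {
	struct xe_ras_error_common common;
	struct xe_ras_error_product product;
} PACKED;

struct xe_ras_error_array {
	struct xe_ras_error_class error_class;
	uint64_t timestamp;
	uint32_t error_details[XE_RAS_MAX_ERROR_DETAILS];
} PACKED;

struct xe_ras_get_error_response {
	uint8_t num_errors;
	uint8_t additional_errors;
	struct xe_ras_error_array error_arr[3];
} PACKED;

/* The 16 u32 log fields, grouped onto fewer lines */
struct xe_ras_compute_error {
	uint32_t error_log_header, internal_error_log, fabric_log;
	uint32_t internal_error_addr_log0, internal_error_addr_log1;
	uint32_t packet_log0, packet_log1, packet_log2, packet_log3, packet_log4;
	uint32_t misc_log0, misc_log1;
	uint32_t spare_log0, spare_log1, spare_log2, spare_log3;
} PACKED;

_Static_assert(sizeof(struct xe_ras_error_class) == 12, "error class");
_Static_assert(sizeof(struct xe_ras_error_array) == 84, "error entry");
_Static_assert(sizeof(struct xe_ras_get_error_response) == 254, "response");
/* handle_compute_errors() casts error_details to this struct */
_Static_assert(sizeof(struct xe_ras_compute_error) ==
	       sizeof(((struct xe_ras_error_array *)0)->error_details),
	       "compute error must fit in error_details");
```

In the kernel, the equivalent checks would normally use `static_assert()` or `BUILD_BUG_ON()`.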

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 7/8] drm/xe/xe_ras: Add support for Uncorrectable Core-Compute errors
  2026-01-22 10:06 [PATCH 0/8] Introduce Xe Uncorrectable Error Handling Riana Tauro
                   ` (7 preceding siblings ...)
  2026-01-22 10:06 ` [PATCH 6/8] drm/xe/xe_ras: Add structures and commands for Uncorrectable Core Compute Errors Riana Tauro
@ 2026-01-22 10:06 ` Riana Tauro
  2026-01-27 11:44   ` Mallesh, Koujalagi
                     ` (2 more replies)
  2026-01-22 10:06 ` [PATCH 8/8] drm/xe/xe_pci_error: Process errors in mmio_enabled Riana Tauro
                   ` (2 subsequent siblings)
  11 siblings, 3 replies; 41+ messages in thread
From: Riana Tauro @ 2026-01-22 10:06 UTC (permalink / raw)
  To: intel-xe
  Cc: riana.tauro, anshuman.gupta, rodrigo.vivi, aravind.iddamsetty,
	badal.nilawar, raag.jadav, ravi.kishore.koppuravuri,
	mallesh.koujalagi

Uncorrectable Core-Compute errors are classified into Global and Local
errors.

A Global error affects the entire device and requires a reset; it
cannot be isolated. When an AER is reported for a Global error, the
PCI error handling callback returns PCI_ERS_RESULT_NEED_RESET.

A Local error is confined to a specific component or context, such as
an engine. It can be contained and recovered by resetting only the
affected part, without disrupting the rest of the device.

Upon detection of an Uncorrectable Local Core-Compute error, an AER is
generated and GuC is notified of the error. The KMD then marks the
context as non-runnable and initiates an engine reset
(TODO: GuC <-> KMD communication for the error).
Since the error is contained and recovered, the PCI error handling
callback returns PCI_ERS_RESULT_RECOVERED.

Signed-off-by: Riana Tauro <riana.tauro@intel.com>
---
 drivers/gpu/drm/xe/xe_ras.c | 109 +++++++++++++++++++++++++++++++++++-
 drivers/gpu/drm/xe/xe_ras.h |   3 +
 2 files changed, 110 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_ras.c b/drivers/gpu/drm/xe/xe_ras.c
index ace08d8d8d46..2a98cb116dc7 100644
--- a/drivers/gpu/drm/xe/xe_ras.c
+++ b/drivers/gpu/drm/xe/xe_ras.c
@@ -2,11 +2,16 @@
 /*
  * Copyright © 2026 Intel Corporation
  */
-#include <linux/pci.h>
-
 #include "xe_assert.h"
 #include "xe_device_types.h"
+#include "xe_printk.h"
 #include "xe_ras.h"
+#include "xe_ras_types.h"
+#include "xe_sysctrl_mailbox.h"
+#include "xe_sysctrl_mailbox_types.h"
+
+#define COMPUTE_ERROR_SEVERITY_MASK		GENMASK(26, 25)
+#define GLOBAL_UNCORR_ERROR			2
 
 /* Severity classification of detected errors */
 enum xe_ras_severity {
@@ -60,6 +65,106 @@ static inline const char *comp_to_str(struct xe_device *xe, u32 comp)
 	return xe_ras_components[comp];
 }
 
+static void log_ras_error(struct xe_device *xe, struct xe_ras_error_class *error_class)
+{
+	struct xe_ras_error_common common_info = error_class->common;
+	struct xe_ras_error_product product_info = error_class->product;
+	u8 tile = product_info.unit.tile;
+	u32 instance = product_info.unit.instance;
+	u32 cause = product_info.error_cause.cause;
+
+	xe_err(xe, "[RAS]: Tile%u, Instance %u, %s %s Error detected Cause: 0x%x\n",
+	       tile, instance, severity_to_str(xe, common_info.severity),
+	       comp_to_str(xe, common_info.component), cause);
+}
+
+static pci_ers_result_t handle_compute_errors(struct xe_device *xe, struct xe_ras_error_array *arr)
+{
+	struct xe_ras_compute_error *error_info = (struct xe_ras_compute_error *)arr->error_details;
+	u8 uncorr_type;
+
+	uncorr_type = FIELD_GET(COMPUTE_ERROR_SEVERITY_MASK, error_info->error_log_header);
+	log_ras_error(xe, &arr->error_class);
+
+	xe_err(xe, "[RAS]: Core Compute Error: timestamp %llu Uncorrected error type %u\n",
+	       arr->timestamp, uncorr_type);
+
+	/* Request a RESET if error is global */
+	if (uncorr_type == GLOBAL_UNCORR_ERROR)
+		return PCI_ERS_RESULT_NEED_RESET;
+
+	/* Local errors are recovered using an engine reset */
+	return PCI_ERS_RESULT_RECOVERED;
+}
+
+/**
+ * xe_ras_process_errors - Process and contain hardware errors
+ * @xe: xe device instance
+ *
+ * Query the system controller for error details and return the
+ * recovery action. Called only from PCI error handling callbacks.
+ *
+ * Returns: PCI_ERS_RESULT_RECOVERED if recovered or if no recovery needed,
+ * PCI_ERS_RESULT_NEED_RESET otherwise.
+ */
+pci_ers_result_t xe_ras_process_errors(struct xe_device *xe)
+{
+	struct xe_sysctrl_mailbox_command command = {0};
+	struct xe_sysctrl_mailbox_app_msg_hdr msg_hdr = {0};
+	struct xe_ras_get_error_response response;
+	u32 req_hdr;
+	size_t rlen;
+	int ret;
+
+	if (!xe->info.has_sysctrl)
+		return PCI_ERS_RESULT_NEED_RESET;
+
+	req_hdr = FIELD_PREP(APP_HDR_GROUP_ID_MASK, XE_SYSCTRL_GROUP_GFSP) |
+		  FIELD_PREP(APP_HDR_COMMAND_MASK, XE_SYSCTRL_CMD_GET_SOC_ERROR);
+
+	msg_hdr.data = req_hdr;
+	command.header = msg_hdr;
+	command.data_out = &response;
+	command.data_out_len = sizeof(response);
+
+	do {
+		memset(&response, 0, sizeof(response));
+		rlen = 0;
+
+		ret = xe_sysctrl_send_command(xe, &command, &rlen);
+		if (ret || !rlen) {
+			xe_err(xe, "[RAS]: Sysctrl error ret %d\n", ret);
+			goto err;
+		}
+
+		if (rlen != sizeof(response) ||
+		    response.num_errors > ARRAY_SIZE(response.error_arr)) {
+			xe_err(xe, "[RAS]: Invalid sysctrl response (len %zu)\n", rlen);
+			goto err;
+		}
+		for (int i = 0; i < response.num_errors; i++) {
+			struct xe_ras_error_array arr = response.error_arr[i];
+			struct xe_ras_error_class error_class;
+			u8 component;
+
+			error_class = arr.error_class;
+			component = error_class.common.component;
+
+			if (component == XE_RAS_COMPONENT_CORE_COMPUTE) {
+				if (handle_compute_errors(xe, &arr) ==
+				    PCI_ERS_RESULT_NEED_RESET)
+					goto err;
+			}
+		}
+
+	} while (response.additional_errors);
+
+	return PCI_ERS_RESULT_RECOVERED;
+
+err:
+	return PCI_ERS_RESULT_NEED_RESET;
+}
+
 #ifdef CONFIG_PCIEAER
 static void unmask_and_downgrade_internal_error(struct xe_device *xe)
 {
diff --git a/drivers/gpu/drm/xe/xe_ras.h b/drivers/gpu/drm/xe/xe_ras.h
index 14cb973603e7..28400613c9a9 100644
--- a/drivers/gpu/drm/xe/xe_ras.h
+++ b/drivers/gpu/drm/xe/xe_ras.h
@@ -6,8 +6,11 @@
 #ifndef _XE_RAS_H_
 #define _XE_RAS_H_
 
+#include <linux/pci.h>
+
 struct xe_device;
 
 void xe_ras_init(struct xe_device *xe);
+pci_ers_result_t xe_ras_process_errors(struct xe_device *xe);
 
 #endif
-- 
2.47.1

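The decision in `handle_compute_errors()` and the drain loop in `xe_ras_process_errors()` can be modelled outside the kernel. In the sketch below, `(hdr >> 25) & 0x3` is the open-coded equivalent of `FIELD_GET(GENMASK(26, 25), hdr)`. `enum recovery`, `struct batch` and the helpers are hypothetical stand-ins for `pci_ers_result_t` and the sysctrl reply, and every reported error is assumed to be a Core-Compute error. The sketch also rejects any reply whose `num_errors` exceeds the three-entry array.

```c
#include <stdint.h>

/* Same field and value as the patch: GENMASK(26, 25) and GLOBAL_UNCORR_ERROR */
#define COMPUTE_ERROR_SEVERITY_SHIFT	25
#define GLOBAL_UNCORR_ERROR		2u
#define MAX_ERRORS_PER_REPLY		3

/* Hypothetical stand-in for the two pci_ers_result_t values used */
enum recovery { RECOVERED, NEED_RESET };

/* Open-coded FIELD_GET(GENMASK(26, 25), hdr) */
static uint32_t uncorr_type(uint32_t hdr)
{
	return (hdr >> COMPUTE_ERROR_SEVERITY_SHIFT) & 0x3u;
}

/* Global errors need a device reset; any other type is handled locally */
static enum recovery classify(uint32_t hdr)
{
	return uncorr_type(hdr) == GLOBAL_UNCORR_ERROR ? NEED_RESET : RECOVERED;
}

/* Hypothetical reply: the error_log_header of each error plus a "more" flag */
struct batch {
	int num_errors;
	uint32_t hdr[MAX_ERRORS_PER_REPLY];
	int additional_errors;
};

/* Fetch replies while additional_errors is set; stop at the first Global error */
static enum recovery process(const struct batch *replies, int nreplies)
{
	const struct batch *cur;
	int next = 0;

	do {
		if (next >= nreplies)
			return NEED_RESET;	/* mailbox failure */
		cur = &replies[next++];
		if (cur->num_errors < 0 || cur->num_errors > MAX_ERRORS_PER_REPLY)
			return NEED_RESET;	/* malformed reply */
		for (int i = 0; i < cur->num_errors; i++)
			if (classify(cur->hdr[i]) == NEED_RESET)
				return NEED_RESET;
	} while (cur->additional_errors);

	return RECOVERED;
}
```

As in the patch, only type 2 (Global) requests a reset. A reply sequence that contains only Local errors ends in RECOVERED once `additional_errors` is clear.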

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 8/8] drm/xe/xe_pci_error: Process errors in mmio_enabled
  2026-01-22 10:06 [PATCH 0/8] Introduce Xe Uncorrectable Error Handling Riana Tauro
                   ` (8 preceding siblings ...)
  2026-01-22 10:06 ` [PATCH 7/8] drm/xe/xe_ras: Add support for Uncorrectable Core-Compute errors Riana Tauro
@ 2026-01-22 10:06 ` Riana Tauro
  2026-02-24 12:46   ` Mallesh, Koujalagi
  2026-01-22 10:21 ` ✓ Xe.CI.BAT: success for Introduce Xe Uncorrectable Error Handling Patchwork
  2026-01-22 20:28 ` ✗ Xe.CI.Full: failure " Patchwork
  11 siblings, 1 reply; 41+ messages in thread
From: Riana Tauro @ 2026-01-22 10:06 UTC (permalink / raw)
  To: intel-xe
  Cc: riana.tauro, anshuman.gupta, rodrigo.vivi, aravind.iddamsetty,
	badal.nilawar, raag.jadav, ravi.kishore.koppuravuri,
	mallesh.koujalagi

Query the system controller when a non-fatal error occurs to determine
the error type, then contain the error and recover.

The system controller is queried in the mmio_enabled callback.

Signed-off-by: Riana Tauro <riana.tauro@intel.com>
---
 drivers/gpu/drm/xe/xe_pci_error.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/xe/xe_pci_error.c b/drivers/gpu/drm/xe/xe_pci_error.c
index 0960aa5861bc..2af6c1f45c44 100644
--- a/drivers/gpu/drm/xe/xe_pci_error.c
+++ b/drivers/gpu/drm/xe/xe_pci_error.c
@@ -8,6 +8,7 @@
 #include "xe_device.h"
 #include "xe_gt.h"
 #include "xe_pci.h"
+#include "xe_ras.h"
 #include "xe_uc.h"
 
 static void xe_pci_error_handling(struct pci_dev *pdev)
@@ -39,9 +40,15 @@ static pci_ers_result_t xe_pci_error_detected(struct pci_dev *pdev, pci_channel_
 
 static pci_ers_result_t xe_pci_error_mmio_enabled(struct pci_dev *pdev)
 {
+	struct xe_device *xe = pdev_to_xe_device(pdev);
+	pci_ers_result_t ret;
+
 	dev_err(&pdev->dev, "PCI mmio enabled\n");
+	ret = xe_ras_process_errors(xe);
+	if (ret == PCI_ERS_RESULT_NEED_RESET)
+		xe_pci_error_handling(pdev);
 
-	return PCI_ERS_RESULT_NEED_RESET;
+	return ret;
 }
 
 static pci_ers_result_t xe_pci_error_slot_reset(struct pci_dev *pdev)
-- 
2.47.1

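With this patch, the value returned from `mmio_enabled` determines which callback the PCI core invokes next. The sketch below is a simplified single-device model of the non-fatal recovery sequence, based on our reading of `Documentation/PCI/pci-error-recovery.rst` and `pcie_do_recovery()`. It ignores how results from multiple devices are merged and the reset performed for frozen channels. The `ers_*` and `STAGE_*` names are hypothetical stand-ins for the real `pci_ers_result_t` values and callbacks.

```c
/* Hypothetical stand-ins for the pci_ers_result_t values involved */
enum ers_result {
	ERS_CAN_RECOVER,
	ERS_NEED_RESET,
	ERS_RECOVERED,
	ERS_DISCONNECT,
};

/* Driver callbacks in the order the PCI core may invoke them */
enum ers_stage {
	STAGE_ERROR_DETECTED,
	STAGE_MMIO_ENABLED,
	STAGE_SLOT_RESET,
	STAGE_RESUME,	/* terminal: recovery succeeded */
	STAGE_FAILED,	/* terminal: recovery failed */
};

/*
 * Simplified model: given the callback that just ran and its result,
 * pick the next callback for a single device on a non-fatal error.
 */
static enum ers_stage next_stage(enum ers_stage cur, enum ers_result r)
{
	switch (cur) {
	case STAGE_ERROR_DETECTED:
		if (r == ERS_CAN_RECOVER)
			return STAGE_MMIO_ENABLED;
		if (r == ERS_NEED_RESET)
			return STAGE_SLOT_RESET;
		return r == ERS_RECOVERED ? STAGE_RESUME : STAGE_FAILED;
	case STAGE_MMIO_ENABLED:
		/* xe: e.g. NEED_RESET for a Global error, RECOVERED for a Local one */
		if (r == ERS_NEED_RESET)
			return STAGE_SLOT_RESET;
		return r == ERS_RECOVERED ? STAGE_RESUME : STAGE_FAILED;
	case STAGE_SLOT_RESET:
		return r == ERS_RECOVERED ? STAGE_RESUME : STAGE_FAILED;
	default:
		return cur;	/* resume and failed are terminal */
	}
}
```

Under this model, a Local Core-Compute error (mmio_enabled returns RECOVERED) goes straight to `resume`, while a Global error (NEED_RESET) goes through `slot_reset` first.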

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* ✓ Xe.CI.BAT: success for Introduce Xe Uncorrectable Error Handling
  2026-01-22 10:06 [PATCH 0/8] Introduce Xe Uncorrectable Error Handling Riana Tauro
                   ` (9 preceding siblings ...)
  2026-01-22 10:06 ` [PATCH 8/8] drm/xe/xe_pci_error: Process errors in mmio_enabled Riana Tauro
@ 2026-01-22 10:21 ` Patchwork
  2026-01-22 20:28 ` ✗ Xe.CI.Full: failure " Patchwork
  11 siblings, 0 replies; 41+ messages in thread
From: Patchwork @ 2026-01-22 10:21 UTC (permalink / raw)
  To: Riana Tauro; +Cc: intel-xe

[-- Attachment #1: Type: text/plain, Size: 3385 bytes --]

== Series Details ==

Series: Introduce Xe Uncorrectable Error Handling
URL   : https://patchwork.freedesktop.org/series/160482/
State : success

== Summary ==

CI Bug Log - changes from xe-4433-40800011414446888105f6beae6dd3fac56516aa_BAT -> xe-pw-160482v1_BAT
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  

Participating hosts (12 -> 12)
------------------------------

  No changes in participating hosts

Known issues
------------

  Here are the changes found in xe-pw-160482v1_BAT that come from known issues:

### IGT changes ###

#### Warnings ####

  * igt@xe_evict@evict-beng-small:
    - bat-adlp-7:         [SKIP][1] ([Intel XE#261] / [Intel XE#5564] / [Intel XE#688]) -> [SKIP][2] ([Intel XE#261] / [Intel XE#688]) +9 other tests skip
   [1]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/bat-adlp-7/igt@xe_evict@evict-beng-small.html
   [2]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/bat-adlp-7/igt@xe_evict@evict-beng-small.html

  * igt@xe_evict@evict-small-external-cm:
    - bat-adlp-vm:        [SKIP][3] ([Intel XE#261] / [Intel XE#5564] / [Intel XE#688]) -> [SKIP][4] ([Intel XE#261] / [Intel XE#688]) +9 other tests skip
   [3]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/bat-adlp-vm/igt@xe_evict@evict-small-external-cm.html
   [4]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/bat-adlp-vm/igt@xe_evict@evict-small-external-cm.html

  * igt@xe_exec_fault_mode@twice-userptr-invalidate-imm:
    - bat-adlp-vm:        [SKIP][5] ([Intel XE#288] / [Intel XE#5561]) -> [SKIP][6] ([Intel XE#288]) +32 other tests skip
   [5]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/bat-adlp-vm/igt@xe_exec_fault_mode@twice-userptr-invalidate-imm.html
   [6]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/bat-adlp-vm/igt@xe_exec_fault_mode@twice-userptr-invalidate-imm.html

  * igt@xe_exec_fault_mode@twice-userptr-invalidate-prefetch:
    - bat-adlp-7:         [SKIP][7] ([Intel XE#288] / [Intel XE#5561]) -> [SKIP][8] ([Intel XE#288]) +32 other tests skip
   [7]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/bat-adlp-7/igt@xe_exec_fault_mode@twice-userptr-invalidate-prefetch.html
   [8]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/bat-adlp-7/igt@xe_exec_fault_mode@twice-userptr-invalidate-prefetch.html

  
  [Intel XE#261]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/261
  [Intel XE#288]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/288
  [Intel XE#5561]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5561
  [Intel XE#5564]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5564
  [Intel XE#688]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/688


Build changes
-------------

  * IGT: IGT_8711 -> IGT_8712
  * Linux: xe-4433-40800011414446888105f6beae6dd3fac56516aa -> xe-pw-160482v1

  IGT_8711: 38428617bae65b39b306f79217ac922ebee3b477 @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  IGT_8712: 8712
  xe-4433-40800011414446888105f6beae6dd3fac56516aa: 40800011414446888105f6beae6dd3fac56516aa
  xe-pw-160482v1: 160482v1

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/index.html

[-- Attachment #2: Type: text/html, Size: 4794 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* ✗ Xe.CI.Full: failure for Introduce Xe Uncorrectable Error Handling
  2026-01-22 10:06 [PATCH 0/8] Introduce Xe Uncorrectable Error Handling Riana Tauro
                   ` (10 preceding siblings ...)
  2026-01-22 10:21 ` ✓ Xe.CI.BAT: success for Introduce Xe Uncorrectable Error Handling Patchwork
@ 2026-01-22 20:28 ` Patchwork
  11 siblings, 0 replies; 41+ messages in thread
From: Patchwork @ 2026-01-22 20:28 UTC (permalink / raw)
  To: Riana Tauro; +Cc: intel-xe

[-- Attachment #1: Type: text/plain, Size: 66876 bytes --]

== Series Details ==

Series: Introduce Xe Uncorrectable Error Handling
URL   : https://patchwork.freedesktop.org/series/160482/
State : failure

== Summary ==

CI Bug Log - changes from xe-4433-40800011414446888105f6beae6dd3fac56516aa_FULL -> xe-pw-160482v1_FULL
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with xe-pw-160482v1_FULL absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in xe-pw-160482v1_FULL, please notify your bug team (I915-ci-infra@lists.freedesktop.org) to allow them
  to document this new failure mode, which will reduce false positives in CI.

  

Participating hosts (2 -> 2)
------------------------------

  No changes in participating hosts

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in xe-pw-160482v1_FULL:

### IGT changes ###

#### Possible regressions ####

  * igt@xe_exec_threads@threads-many-queues:
    - shard-lnl:          [PASS][1] -> [FAIL][2]
   [1]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/shard-lnl-2/igt@xe_exec_threads@threads-many-queues.html
   [2]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-lnl-3/igt@xe_exec_threads@threads-many-queues.html
    - shard-bmg:          [PASS][3] -> [FAIL][4]
   [3]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/shard-bmg-9/igt@xe_exec_threads@threads-many-queues.html
   [4]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-1/igt@xe_exec_threads@threads-many-queues.html

  * igt@xe_exec_threads@threads-multi-queue-cm-shared-vm-rebind:
    - shard-lnl:          NOTRUN -> [SKIP][5] +135 other tests skip
   [5]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-lnl-8/igt@xe_exec_threads@threads-multi-queue-cm-shared-vm-rebind.html

  * igt@xe_exec_threads@threads-multi-queue-mixed-shared-vm-userptr-rebind:
    - shard-bmg:          NOTRUN -> [SKIP][6] +129 other tests skip
   [6]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-7/igt@xe_exec_threads@threads-multi-queue-mixed-shared-vm-userptr-rebind.html

  
Known issues
------------

  Here are the changes found in xe-pw-160482v1_FULL that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@kms_addfb_basic@unused-handle:
    - shard-bmg:          [PASS][7] -> [SKIP][8] ([Intel XE#6703]) +159 other tests skip
   [7]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/shard-bmg-7/igt@kms_addfb_basic@unused-handle.html
   [8]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-2/igt@kms_addfb_basic@unused-handle.html

  * igt@kms_big_fb@4-tiled-16bpp-rotate-270:
    - shard-lnl:          NOTRUN -> [SKIP][9] ([Intel XE#1407])
   [9]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-lnl-8/igt@kms_big_fb@4-tiled-16bpp-rotate-270.html

  * igt@kms_big_fb@4-tiled-64bpp-rotate-270:
    - shard-bmg:          NOTRUN -> [SKIP][10] ([Intel XE#2327])
   [10]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-9/igt@kms_big_fb@4-tiled-64bpp-rotate-270.html

  * igt@kms_big_fb@linear-max-hw-stride-64bpp-rotate-180-hflip:
    - shard-bmg:          NOTRUN -> [SKIP][11] ([Intel XE#7059])
   [11]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-4/igt@kms_big_fb@linear-max-hw-stride-64bpp-rotate-180-hflip.html

  * igt@kms_big_fb@y-tiled-32bpp-rotate-90:
    - shard-lnl:          NOTRUN -> [SKIP][12] ([Intel XE#1124]) +3 other tests skip
   [12]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-lnl-2/igt@kms_big_fb@y-tiled-32bpp-rotate-90.html

  * igt@kms_big_fb@yf-tiled-max-hw-stride-64bpp-rotate-0-hflip:
    - shard-bmg:          NOTRUN -> [SKIP][13] ([Intel XE#1124]) +3 other tests skip
   [13]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-9/igt@kms_big_fb@yf-tiled-max-hw-stride-64bpp-rotate-0-hflip.html

  * igt@kms_bw@connected-linear-tiling-3-displays-2160x1440p:
    - shard-lnl:          NOTRUN -> [SKIP][14] ([Intel XE#2191])
   [14]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-lnl-3/igt@kms_bw@connected-linear-tiling-3-displays-2160x1440p.html

  * igt@kms_bw@linear-tiling-2-displays-2160x1440p:
    - shard-bmg:          NOTRUN -> [SKIP][15] ([Intel XE#367]) +1 other test skip
   [15]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-8/igt@kms_bw@linear-tiling-2-displays-2160x1440p.html

  * igt@kms_bw@linear-tiling-4-displays-2160x1440p:
    - shard-lnl:          NOTRUN -> [SKIP][16] ([Intel XE#1512])
   [16]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-lnl-2/igt@kms_bw@linear-tiling-4-displays-2160x1440p.html

  * igt@kms_ccs@bad-pixel-format-4-tiled-dg2-mc-ccs:
    - shard-bmg:          NOTRUN -> [SKIP][17] ([Intel XE#2887]) +5 other tests skip
   [17]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-7/igt@kms_ccs@bad-pixel-format-4-tiled-dg2-mc-ccs.html

  * igt@kms_ccs@crc-primary-suspend-y-tiled-gen12-rc-ccs:
    - shard-lnl:          NOTRUN -> [SKIP][18] ([Intel XE#3432])
   [18]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-lnl-3/igt@kms_ccs@crc-primary-suspend-y-tiled-gen12-rc-ccs.html

  * igt@kms_ccs@crc-primary-suspend-yf-tiled-ccs:
    - shard-bmg:          NOTRUN -> [SKIP][19] ([Intel XE#3432]) +1 other test skip
   [19]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-1/igt@kms_ccs@crc-primary-suspend-yf-tiled-ccs.html

  * igt@kms_ccs@crc-sprite-planes-basic-4-tiled-mtl-mc-ccs:
    - shard-lnl:          NOTRUN -> [SKIP][20] ([Intel XE#2887]) +9 other tests skip
   [20]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-lnl-3/igt@kms_ccs@crc-sprite-planes-basic-4-tiled-mtl-mc-ccs.html

  * igt@kms_chamelium_color@degamma:
    - shard-lnl:          NOTRUN -> [SKIP][21] ([Intel XE#306])
   [21]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-lnl-2/igt@kms_chamelium_color@degamma.html

  * igt@kms_chamelium_frames@hdmi-cmp-planes-random:
    - shard-lnl:          NOTRUN -> [SKIP][22] ([Intel XE#373]) +6 other tests skip
   [22]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-lnl-3/igt@kms_chamelium_frames@hdmi-cmp-planes-random.html

  * igt@kms_chamelium_hpd@common-hpd-after-suspend:
    - shard-bmg:          NOTRUN -> [SKIP][23] ([Intel XE#2252]) +7 other tests skip
   [23]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-9/igt@kms_chamelium_hpd@common-hpd-after-suspend.html

  * igt@kms_content_protection@atomic:
    - shard-bmg:          NOTRUN -> [FAIL][24] ([Intel XE#1178] / [Intel XE#3304]) +1 other test fail
   [24]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-2/igt@kms_content_protection@atomic.html

  * igt@kms_content_protection@dp-mst-lic-type-0:
    - shard-lnl:          NOTRUN -> [SKIP][25] ([Intel XE#307] / [Intel XE#6974])
   [25]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-lnl-3/igt@kms_content_protection@dp-mst-lic-type-0.html

  * igt@kms_content_protection@uevent:
    - shard-lnl:          NOTRUN -> [SKIP][26] ([Intel XE#3278]) +1 other test skip
   [26]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-lnl-5/igt@kms_content_protection@uevent.html

  * igt@kms_cursor_crc@cursor-offscreen-32x32:
    - shard-bmg:          NOTRUN -> [SKIP][27] ([Intel XE#2320]) +1 other test skip
   [27]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-2/igt@kms_cursor_crc@cursor-offscreen-32x32.html

  * igt@kms_cursor_crc@cursor-offscreen-512x170:
    - shard-lnl:          NOTRUN -> [SKIP][28] ([Intel XE#2321]) +1 other test skip
   [28]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-lnl-3/igt@kms_cursor_crc@cursor-offscreen-512x170.html

  * igt@kms_cursor_crc@cursor-sliding-512x170:
    - shard-bmg:          NOTRUN -> [SKIP][29] ([Intel XE#2321])
   [29]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-10/igt@kms_cursor_crc@cursor-sliding-512x170.html

  * igt@kms_cursor_crc@cursor-sliding-64x21:
    - shard-lnl:          NOTRUN -> [SKIP][30] ([Intel XE#1424]) +1 other test skip
   [30]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-lnl-1/igt@kms_cursor_crc@cursor-sliding-64x21.html

  * igt@kms_cursor_legacy@cursorb-vs-flipa-varying-size:
    - shard-lnl:          NOTRUN -> [SKIP][31] ([Intel XE#309]) +2 other tests skip
   [31]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-lnl-2/igt@kms_cursor_legacy@cursorb-vs-flipa-varying-size.html

  * igt@kms_cursor_legacy@flip-vs-cursor-varying-size:
    - shard-bmg:          [PASS][32] -> [DMESG-WARN][33] ([Intel XE#5354])
   [32]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/shard-bmg-8/igt@kms_cursor_legacy@flip-vs-cursor-varying-size.html
   [33]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-7/igt@kms_cursor_legacy@flip-vs-cursor-varying-size.html

  * igt@kms_dp_link_training@non-uhbr-sst:
    - shard-lnl:          NOTRUN -> [SKIP][34] ([Intel XE#4354])
   [34]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-lnl-4/igt@kms_dp_link_training@non-uhbr-sst.html

  * igt@kms_dsc@dsc-basic:
    - shard-lnl:          NOTRUN -> [SKIP][35] ([Intel XE#2244]) +1 other test skip
   [35]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-lnl-4/igt@kms_dsc@dsc-basic.html

  * igt@kms_feature_discovery@chamelium:
    - shard-bmg:          NOTRUN -> [SKIP][36] ([Intel XE#2372])
   [36]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-9/igt@kms_feature_discovery@chamelium.html
    - shard-lnl:          NOTRUN -> [SKIP][37] ([Intel XE#701])
   [37]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-lnl-5/igt@kms_feature_discovery@chamelium.html

  * igt@kms_flip@2x-flip-vs-rmfb-interruptible:
    - shard-lnl:          NOTRUN -> [SKIP][38] ([Intel XE#1421]) +4 other tests skip
   [38]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-lnl-2/igt@kms_flip@2x-flip-vs-rmfb-interruptible.html

  * igt@kms_flip@flip-vs-absolute-wf_vblank:
    - shard-bmg:          [PASS][39] -> [DMESG-FAIL][40] ([Intel XE#5545]) +2 other tests dmesg-fail
   [39]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/shard-bmg-3/igt@kms_flip@flip-vs-absolute-wf_vblank.html
   [40]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-2/igt@kms_flip@flip-vs-absolute-wf_vblank.html

  * igt@kms_flip@flip-vs-expired-vblank@a-edp1:
    - shard-lnl:          [PASS][41] -> [FAIL][42] ([Intel XE#301]) +1 other test fail
   [41]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/shard-lnl-2/igt@kms_flip@flip-vs-expired-vblank@a-edp1.html
   [42]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-lnl-1/igt@kms_flip@flip-vs-expired-vblank@a-edp1.html

  * igt@kms_flip@wf_vblank-ts-check@a-edp1:
    - shard-lnl:          [PASS][43] -> [FAIL][44] ([Intel XE#5408] / [Intel XE#6266]) +1 other test fail
   [43]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/shard-lnl-5/igt@kms_flip@wf_vblank-ts-check@a-edp1.html
   [44]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-lnl-2/igt@kms_flip@wf_vblank-ts-check@a-edp1.html

  * igt@kms_flip_scaled_crc@flip-32bpp-4tile-to-64bpp-4tile-downscaling:
    - shard-lnl:          NOTRUN -> [SKIP][45] ([Intel XE#1397] / [Intel XE#1745])
   [45]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-lnl-1/igt@kms_flip_scaled_crc@flip-32bpp-4tile-to-64bpp-4tile-downscaling.html

  * igt@kms_flip_scaled_crc@flip-32bpp-4tile-to-64bpp-4tile-downscaling@pipe-a-default-mode:
    - shard-lnl:          NOTRUN -> [SKIP][46] ([Intel XE#1397])
   [46]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-lnl-1/igt@kms_flip_scaled_crc@flip-32bpp-4tile-to-64bpp-4tile-downscaling@pipe-a-default-mode.html

  * igt@kms_flip_scaled_crc@flip-64bpp-yftile-to-32bpp-yftile-upscaling:
    - shard-lnl:          NOTRUN -> [SKIP][47] ([Intel XE#1401] / [Intel XE#1745]) +2 other tests skip
   [47]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-lnl-4/igt@kms_flip_scaled_crc@flip-64bpp-yftile-to-32bpp-yftile-upscaling.html

  * igt@kms_flip_scaled_crc@flip-64bpp-ytile-to-32bpp-ytilegen12rcccs-upscaling:
    - shard-bmg:          NOTRUN -> [SKIP][48] ([Intel XE#2293] / [Intel XE#2380]) +1 other test skip
   [48]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-1/igt@kms_flip_scaled_crc@flip-64bpp-ytile-to-32bpp-ytilegen12rcccs-upscaling.html

  * igt@kms_flip_scaled_crc@flip-64bpp-ytile-to-32bpp-ytilegen12rcccs-upscaling@pipe-a-default-mode:
    - shard-lnl:          NOTRUN -> [SKIP][49] ([Intel XE#1401]) +2 other tests skip
   [49]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-lnl-3/igt@kms_flip_scaled_crc@flip-64bpp-ytile-to-32bpp-ytilegen12rcccs-upscaling@pipe-a-default-mode.html

  * igt@kms_flip_scaled_crc@flip-64bpp-ytile-to-32bpp-ytilegen12rcccs-upscaling@pipe-a-valid-mode:
    - shard-bmg:          NOTRUN -> [SKIP][50] ([Intel XE#2293]) +1 other test skip
   [50]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-1/igt@kms_flip_scaled_crc@flip-64bpp-ytile-to-32bpp-ytilegen12rcccs-upscaling@pipe-a-valid-mode.html

  * igt@kms_frontbuffer_tracking@drrs-1p-primscrn-spr-indfb-draw-blt:
    - shard-lnl:          NOTRUN -> [SKIP][51] ([Intel XE#651]) +3 other tests skip
   [51]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-lnl-1/igt@kms_frontbuffer_tracking@drrs-1p-primscrn-spr-indfb-draw-blt.html

  * igt@kms_frontbuffer_tracking@drrs-2p-scndscrn-pri-indfb-draw-mmap-wc:
    - shard-bmg:          NOTRUN -> [SKIP][52] ([Intel XE#2311]) +12 other tests skip
   [52]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-4/igt@kms_frontbuffer_tracking@drrs-2p-scndscrn-pri-indfb-draw-mmap-wc.html

  * igt@kms_frontbuffer_tracking@fbc-1p-primscrn-spr-indfb-draw-mmap-wc:
    - shard-bmg:          NOTRUN -> [SKIP][53] ([Intel XE#4141]) +6 other tests skip
   [53]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-3/igt@kms_frontbuffer_tracking@fbc-1p-primscrn-spr-indfb-draw-mmap-wc.html

  * igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-indfb-pgflip-blt:
    - shard-lnl:          NOTRUN -> [SKIP][54] ([Intel XE#656]) +28 other tests skip
   [54]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-lnl-4/igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-indfb-pgflip-blt.html

  * igt@kms_frontbuffer_tracking@fbcdrrs-1p-offscreen-pri-shrfb-draw-mmap-wc:
    - shard-lnl:          NOTRUN -> [SKIP][55] ([Intel XE#6312]) +2 other tests skip
   [55]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-lnl-8/igt@kms_frontbuffer_tracking@fbcdrrs-1p-offscreen-pri-shrfb-draw-mmap-wc.html

  * igt@kms_frontbuffer_tracking@fbcdrrs-abgr161616f-draw-blt:
    - shard-lnl:          NOTRUN -> [SKIP][56] ([Intel XE#7061]) +2 other tests skip
   [56]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-lnl-8/igt@kms_frontbuffer_tracking@fbcdrrs-abgr161616f-draw-blt.html

  * igt@kms_frontbuffer_tracking@fbcpsr-rgb565-draw-render:
    - shard-bmg:          NOTRUN -> [SKIP][57] ([Intel XE#2313]) +19 other tests skip
   [57]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-4/igt@kms_frontbuffer_tracking@fbcpsr-rgb565-draw-render.html

  * igt@kms_frontbuffer_tracking@psr-argb161616f-draw-mmap-wc:
    - shard-bmg:          NOTRUN -> [SKIP][58] ([Intel XE#7061]) +3 other tests skip
   [58]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-1/igt@kms_frontbuffer_tracking@psr-argb161616f-draw-mmap-wc.html

  * igt@kms_hdmi_inject@inject-audio:
    - shard-lnl:          NOTRUN -> [SKIP][59] ([Intel XE#1470] / [Intel XE#2853])
   [59]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-lnl-2/igt@kms_hdmi_inject@inject-audio.html

  * igt@kms_hdr@invalid-hdr:
    - shard-bmg:          [PASS][60] -> [SKIP][61] ([Intel XE#1503])
   [60]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/shard-bmg-1/igt@kms_hdr@invalid-hdr.html
   [61]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-10/igt@kms_hdr@invalid-hdr.html

  * igt@kms_joiner@invalid-modeset-big-joiner:
    - shard-bmg:          NOTRUN -> [SKIP][62] ([Intel XE#6901])
   [62]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-2/igt@kms_joiner@invalid-modeset-big-joiner.html

  * igt@kms_joiner@invalid-modeset-ultra-joiner:
    - shard-lnl:          NOTRUN -> [SKIP][63] ([Intel XE#6900])
   [63]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-lnl-1/igt@kms_joiner@invalid-modeset-ultra-joiner.html
    - shard-bmg:          NOTRUN -> [SKIP][64] ([Intel XE#6911])
   [64]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-8/igt@kms_joiner@invalid-modeset-ultra-joiner.html

  * igt@kms_pipe_stress@stress-xrgb8888-yftiled:
    - shard-bmg:          NOTRUN -> [SKIP][65] ([Intel XE#6912])
   [65]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-3/igt@kms_pipe_stress@stress-xrgb8888-yftiled.html

  * igt@kms_pipe_stress@stress-xrgb8888-ytiled:
    - shard-lnl:          NOTRUN -> [SKIP][66] ([Intel XE#4329] / [Intel XE#6912])
   [66]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-lnl-2/igt@kms_pipe_stress@stress-xrgb8888-ytiled.html

  * igt@kms_plane_lowres@tiling-yf:
    - shard-lnl:          NOTRUN -> [SKIP][67] ([Intel XE#599])
   [67]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-lnl-1/igt@kms_plane_lowres@tiling-yf.html

  * igt@kms_pm_backlight@brightness-with-dpms:
    - shard-bmg:          NOTRUN -> [SKIP][68] ([Intel XE#2938])
   [68]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-2/igt@kms_pm_backlight@brightness-with-dpms.html

  * igt@kms_pm_dc@dc5-retention-flops:
    - shard-lnl:          NOTRUN -> [SKIP][69] ([Intel XE#3309])
   [69]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-lnl-2/igt@kms_pm_dc@dc5-retention-flops.html

  * igt@kms_pm_lpsp@kms-lpsp:
    - shard-bmg:          NOTRUN -> [SKIP][70] ([Intel XE#2499])
   [70]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-1/igt@kms_pm_lpsp@kms-lpsp.html

  * igt@kms_psr2_sf@fbc-psr2-cursor-plane-move-continuous-exceed-fully-sf:
    - shard-lnl:          NOTRUN -> [SKIP][71] ([Intel XE#1406] / [Intel XE#2893] / [Intel XE#4608]) +1 other test skip
   [71]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-lnl-8/igt@kms_psr2_sf@fbc-psr2-cursor-plane-move-continuous-exceed-fully-sf.html

  * igt@kms_psr2_sf@fbc-psr2-cursor-plane-move-continuous-exceed-fully-sf@pipe-b-edp-1:
    - shard-lnl:          NOTRUN -> [SKIP][72] ([Intel XE#1406] / [Intel XE#4608]) +3 other tests skip
   [72]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-lnl-8/igt@kms_psr2_sf@fbc-psr2-cursor-plane-move-continuous-exceed-fully-sf@pipe-b-edp-1.html

  * igt@kms_psr2_sf@pr-overlay-primary-update-sf-dmg-area:
    - shard-lnl:          NOTRUN -> [SKIP][73] ([Intel XE#1406] / [Intel XE#2893]) +2 other tests skip
   [73]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-lnl-1/igt@kms_psr2_sf@pr-overlay-primary-update-sf-dmg-area.html

  * igt@kms_psr2_sf@pr-primary-plane-update-sf-dmg-area:
    - shard-bmg:          NOTRUN -> [SKIP][74] ([Intel XE#1406] / [Intel XE#1489]) +6 other tests skip
   [74]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-2/igt@kms_psr2_sf@pr-primary-plane-update-sf-dmg-area.html

  * igt@kms_psr2_su@page_flip-nv12:
    - shard-lnl:          NOTRUN -> [SKIP][75] ([Intel XE#1128] / [Intel XE#1406])
   [75]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-lnl-8/igt@kms_psr2_su@page_flip-nv12.html

  * igt@kms_psr@fbc-psr-basic:
    - shard-bmg:          NOTRUN -> [SKIP][76] ([Intel XE#1406] / [Intel XE#2234] / [Intel XE#2850]) +5 other tests skip
   [76]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-4/igt@kms_psr@fbc-psr-basic.html

  * igt@kms_psr@pr-cursor-plane-move:
    - shard-lnl:          NOTRUN -> [SKIP][77] ([Intel XE#1406]) +1 other test skip
   [77]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-lnl-2/igt@kms_psr@pr-cursor-plane-move.html

  * igt@kms_psr_stress_test@flip-primary-invalidate-overlay:
    - shard-bmg:          NOTRUN -> [SKIP][78] ([Intel XE#1406] / [Intel XE#2414])
   [78]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-3/igt@kms_psr_stress_test@flip-primary-invalidate-overlay.html

  * igt@kms_rotation_crc@primary-rotation-270:
    - shard-lnl:          NOTRUN -> [SKIP][79] ([Intel XE#3414] / [Intel XE#3904]) +1 other test skip
   [79]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-lnl-5/igt@kms_rotation_crc@primary-rotation-270.html
    - shard-bmg:          NOTRUN -> [SKIP][80] ([Intel XE#3414] / [Intel XE#3904]) +1 other test skip
   [80]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-1/igt@kms_rotation_crc@primary-rotation-270.html

  * igt@kms_sharpness_filter@filter-basic:
    - shard-bmg:          NOTRUN -> [SKIP][81] ([Intel XE#6503])
   [81]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-1/igt@kms_sharpness_filter@filter-basic.html

  * igt@kms_vrr@flipline:
    - shard-bmg:          NOTRUN -> [SKIP][82] ([Intel XE#1499])
   [82]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-8/igt@kms_vrr@flipline.html

  * igt@xe_compute@ccs-mode-basic:
    - shard-bmg:          NOTRUN -> [SKIP][83] ([Intel XE#6599])
   [83]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-8/igt@xe_compute@ccs-mode-basic.html

  * igt@xe_compute@ccs-mode-compute-kernel:
    - shard-lnl:          NOTRUN -> [SKIP][84] ([Intel XE#1447]) +1 other test skip
   [84]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-lnl-1/igt@xe_compute@ccs-mode-compute-kernel.html

  * igt@xe_eudebug@vma-ufence-faultable:
    - shard-lnl:          NOTRUN -> [SKIP][85] ([Intel XE#4837]) +3 other tests skip
   [85]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-lnl-2/igt@xe_eudebug@vma-ufence-faultable.html
    - shard-bmg:          NOTRUN -> [SKIP][86] ([Intel XE#4837])
   [86]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-2/igt@xe_eudebug@vma-ufence-faultable.html

  * igt@xe_eudebug_online@preempt-breakpoint:
    - shard-lnl:          NOTRUN -> [SKIP][87] ([Intel XE#4837] / [Intel XE#6665]) +3 other tests skip
   [87]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-lnl-4/igt@xe_eudebug_online@preempt-breakpoint.html

  * igt@xe_eudebug_online@set-breakpoint-sigint-debugger:
    - shard-bmg:          NOTRUN -> [SKIP][88] ([Intel XE#4837] / [Intel XE#6665]) +1 other test skip
   [88]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-1/igt@xe_eudebug_online@set-breakpoint-sigint-debugger.html

  * igt@xe_evict@evict-beng-mixed-threads-large-multi-vm:
    - shard-lnl:          NOTRUN -> [SKIP][89] ([Intel XE#688]) +14 other tests skip
   [89]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-lnl-7/igt@xe_evict@evict-beng-mixed-threads-large-multi-vm.html

  * igt@xe_exec_basic@multigpu-many-execqueues-many-vm-bindexecqueue-userptr:
    - shard-bmg:          NOTRUN -> [SKIP][90] ([Intel XE#2322]) +4 other tests skip
   [90]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-3/igt@xe_exec_basic@multigpu-many-execqueues-many-vm-bindexecqueue-userptr.html

  * igt@xe_exec_basic@multigpu-no-exec-bindexecqueue-userptr:
    - shard-lnl:          NOTRUN -> [SKIP][91] ([Intel XE#1392]) +4 other tests skip
   [91]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-lnl-1/igt@xe_exec_basic@multigpu-no-exec-bindexecqueue-userptr.html

  * igt@xe_exec_multi_queue@many-execs-preempt-mode-fault-userptr-invalidate:
    - shard-bmg:          NOTRUN -> [SKIP][92] ([Intel XE#6874]) +15 other tests skip
   [92]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-1/igt@xe_exec_multi_queue@many-execs-preempt-mode-fault-userptr-invalidate.html

  * igt@xe_exec_multi_queue@max-queues-preempt-mode-dyn-priority-smem:
    - shard-lnl:          NOTRUN -> [SKIP][93] ([Intel XE#6874]) +18 other tests skip
   [93]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-lnl-7/igt@xe_exec_multi_queue@max-queues-preempt-mode-dyn-priority-smem.html

  * igt@xe_exec_system_allocator@many-64k-mmap-new-huge:
    - shard-bmg:          NOTRUN -> [SKIP][94] ([Intel XE#5007])
   [94]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-7/igt@xe_exec_system_allocator@many-64k-mmap-new-huge.html

  * igt@xe_exec_system_allocator@many-execqueues-mmap-huge-nomemset:
    - shard-bmg:          NOTRUN -> [SKIP][95] ([Intel XE#4943]) +9 other tests skip
   [95]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-2/igt@xe_exec_system_allocator@many-execqueues-mmap-huge-nomemset.html

  * igt@xe_exec_system_allocator@many-stride-new-prefetch:
    - shard-bmg:          NOTRUN -> [INCOMPLETE][96] ([Intel XE#7098])
   [96]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-2/igt@xe_exec_system_allocator@many-stride-new-prefetch.html

  * igt@xe_exec_system_allocator@once-mmap-race-nomemset:
    - shard-bmg:          [PASS][97] -> [SKIP][98] ([Intel XE#6557] / [Intel XE#6703]) +2 other tests skip
   [97]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/shard-bmg-9/igt@xe_exec_system_allocator@once-mmap-race-nomemset.html
   [98]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-2/igt@xe_exec_system_allocator@once-mmap-race-nomemset.html

  * igt@xe_exec_system_allocator@pat-index-madvise-pat-idx-wt-single-vma:
    - shard-lnl:          NOTRUN -> [SKIP][99] ([Intel XE#6196])
   [99]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-lnl-5/igt@xe_exec_system_allocator@pat-index-madvise-pat-idx-wt-single-vma.html

  * igt@xe_exec_system_allocator@process-many-mmap-huge:
    - shard-lnl:          NOTRUN -> [SKIP][100] ([Intel XE#4943]) +11 other tests skip
   [100]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-lnl-2/igt@xe_exec_system_allocator@process-many-mmap-huge.html

  * igt@xe_exec_system_allocator@threads-many-large-execqueues-malloc-busy:
    - shard-lnl:          [PASS][101] -> [DMESG-WARN][102] ([Intel XE#7063])
   [101]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/shard-lnl-3/igt@xe_exec_system_allocator@threads-many-large-execqueues-malloc-busy.html
   [102]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-lnl-1/igt@xe_exec_system_allocator@threads-many-large-execqueues-malloc-busy.html

  * igt@xe_exec_threads@threads-multi-queue-mixed-fd-userptr:
    - shard-bmg:          NOTRUN -> [SKIP][103] ([Intel XE#6703]) +24 other tests skip
   [103]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-2/igt@xe_exec_threads@threads-multi-queue-mixed-fd-userptr.html

  * igt@xe_live_ktest@xe_bo:
    - shard-bmg:          [PASS][104] -> [FAIL][105] ([Intel XE#6558]) +1 other test fail
   [104]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/shard-bmg-9/igt@xe_live_ktest@xe_bo.html
   [105]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-3/igt@xe_live_ktest@xe_bo.html

  * igt@xe_live_ktest@xe_eudebug:
    - shard-lnl:          NOTRUN -> [SKIP][106] ([Intel XE#2833])
   [106]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-lnl-4/igt@xe_live_ktest@xe_eudebug.html

  * igt@xe_mmap@pci-membarrier:
    - shard-lnl:          NOTRUN -> [SKIP][107] ([Intel XE#5100])
   [107]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-lnl-3/igt@xe_mmap@pci-membarrier.html

  * igt@xe_module_load@load:
    - shard-bmg:          ([PASS][108], [PASS][109], [PASS][110], [PASS][111], [PASS][112], [PASS][113], [PASS][114], [PASS][115], [PASS][116], [PASS][117], [PASS][118], [PASS][119], [PASS][120], [PASS][121], [PASS][122], [PASS][123], [PASS][124], [PASS][125], [PASS][126], [PASS][127], [PASS][128], [PASS][129], [PASS][130], [PASS][131]) -> ([SKIP][132], [PASS][133], [PASS][134], [PASS][135], [PASS][136], [PASS][137], [PASS][138], [PASS][139], [PASS][140], [PASS][141], [PASS][142], [PASS][143], [PASS][144], [PASS][145], [PASS][146], [PASS][147], [PASS][148], [PASS][149], [PASS][150], [PASS][151], [PASS][152], [PASS][153], [PASS][154], [PASS][155], [PASS][156], [PASS][157]) ([Intel XE#2457])
   [108]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/shard-bmg-1/igt@xe_module_load@load.html
   [109]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/shard-bmg-7/igt@xe_module_load@load.html
   [110]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/shard-bmg-7/igt@xe_module_load@load.html
   [111]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/shard-bmg-3/igt@xe_module_load@load.html
   [112]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/shard-bmg-1/igt@xe_module_load@load.html
   [113]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/shard-bmg-1/igt@xe_module_load@load.html
   [114]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/shard-bmg-8/igt@xe_module_load@load.html
   [115]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/shard-bmg-8/igt@xe_module_load@load.html
   [116]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/shard-bmg-4/igt@xe_module_load@load.html
   [117]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/shard-bmg-4/igt@xe_module_load@load.html
   [118]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/shard-bmg-1/igt@xe_module_load@load.html
   [119]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/shard-bmg-4/igt@xe_module_load@load.html
   [120]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/shard-bmg-7/igt@xe_module_load@load.html
   [121]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/shard-bmg-8/igt@xe_module_load@load.html
   [122]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/shard-bmg-4/igt@xe_module_load@load.html
   [123]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/shard-bmg-7/igt@xe_module_load@load.html
   [124]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/shard-bmg-9/igt@xe_module_load@load.html
   [125]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/shard-bmg-10/igt@xe_module_load@load.html
   [126]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/shard-bmg-10/igt@xe_module_load@load.html
   [127]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/shard-bmg-9/igt@xe_module_load@load.html
   [128]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/shard-bmg-9/igt@xe_module_load@load.html
   [129]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/shard-bmg-3/igt@xe_module_load@load.html
   [130]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/shard-bmg-3/igt@xe_module_load@load.html
   [131]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/shard-bmg-10/igt@xe_module_load@load.html
   [132]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-9/igt@xe_module_load@load.html
   [133]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-2/igt@xe_module_load@load.html
   [134]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-9/igt@xe_module_load@load.html
   [135]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-4/igt@xe_module_load@load.html
   [136]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-9/igt@xe_module_load@load.html
   [137]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-2/igt@xe_module_load@load.html
   [138]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-2/igt@xe_module_load@load.html
   [139]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-9/igt@xe_module_load@load.html
   [140]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-7/igt@xe_module_load@load.html
   [141]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-1/igt@xe_module_load@load.html
   [142]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-1/igt@xe_module_load@load.html
   [143]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-8/igt@xe_module_load@load.html
   [144]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-8/igt@xe_module_load@load.html
   [145]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-8/igt@xe_module_load@load.html
   [146]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-10/igt@xe_module_load@load.html
   [147]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-3/igt@xe_module_load@load.html
   [148]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-3/igt@xe_module_load@load.html
   [149]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-3/igt@xe_module_load@load.html
   [150]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-10/igt@xe_module_load@load.html
   [151]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-1/igt@xe_module_load@load.html
   [152]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-4/igt@xe_module_load@load.html
   [153]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-10/igt@xe_module_load@load.html
   [154]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-4/igt@xe_module_load@load.html
   [155]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-4/igt@xe_module_load@load.html
   [156]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-7/igt@xe_module_load@load.html
   [157]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-7/igt@xe_module_load@load.html

  * igt@xe_multigpu_svm@mgpu-coherency-fail-basic:
    - shard-lnl:          NOTRUN -> [SKIP][158] ([Intel XE#6964]) +2 other tests skip
   [158]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-lnl-3/igt@xe_multigpu_svm@mgpu-coherency-fail-basic.html

  * igt@xe_multigpu_svm@mgpu-pagefault-conflict:
    - shard-bmg:          NOTRUN -> [SKIP][159] ([Intel XE#6964])
   [159]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-9/igt@xe_multigpu_svm@mgpu-pagefault-conflict.html

  * igt@xe_oa@oa-tlb-invalidate:
    - shard-bmg:          NOTRUN -> [SKIP][160] ([Intel XE#2248])
   [160]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-10/igt@xe_oa@oa-tlb-invalidate.html

  * igt@xe_pat@pat-index-xehpc:
    - shard-lnl:          NOTRUN -> [SKIP][161] ([Intel XE#1420] / [Intel XE#2838])
   [161]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-lnl-4/igt@xe_pat@pat-index-xehpc.html
    - shard-bmg:          NOTRUN -> [SKIP][162] ([Intel XE#1420])
   [162]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-10/igt@xe_pat@pat-index-xehpc.html

  * igt@xe_pm@d3cold-mmap-vram:
    - shard-bmg:          NOTRUN -> [SKIP][163] ([Intel XE#2284])
   [163]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-9/igt@xe_pm@d3cold-mmap-vram.html

  * igt@xe_pm@d3hot-i2c:
    - shard-lnl:          NOTRUN -> [SKIP][164] ([Intel XE#5742])
   [164]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-lnl-7/igt@xe_pm@d3hot-i2c.html
    - shard-bmg:          NOTRUN -> [SKIP][165] ([Intel XE#5742])
   [165]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-9/igt@xe_pm@d3hot-i2c.html

  * igt@xe_pm@d3hot-mmap-vram:
    - shard-lnl:          NOTRUN -> [SKIP][166] ([Intel XE#1948])
   [166]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-lnl-1/igt@xe_pm@d3hot-mmap-vram.html

  * igt@xe_pm@s3-exec-after:
    - shard-lnl:          NOTRUN -> [SKIP][167] ([Intel XE#584])
   [167]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-lnl-2/igt@xe_pm@s3-exec-after.html

  * igt@xe_pmu@all-fn-engine-activity-load:
    - shard-lnl:          NOTRUN -> [SKIP][168] ([Intel XE#4650])
   [168]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-lnl-7/igt@xe_pmu@all-fn-engine-activity-load.html

  * igt@xe_query@multigpu-query-engines:
    - shard-lnl:          NOTRUN -> [SKIP][169] ([Intel XE#944]) +2 other tests skip
   [169]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-lnl-7/igt@xe_query@multigpu-query-engines.html

  * igt@xe_query@multigpu-query-invalid-uc-fw-version-mbz:
    - shard-bmg:          NOTRUN -> [SKIP][170] ([Intel XE#944]) +2 other tests skip
   [170]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-7/igt@xe_query@multigpu-query-invalid-uc-fw-version-mbz.html

  * igt@xe_sriov_flr@flr-vfs-parallel:
    - shard-lnl:          NOTRUN -> [SKIP][171] ([Intel XE#4273])
   [171]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-lnl-3/igt@xe_sriov_flr@flr-vfs-parallel.html

  
#### Possible fixes ####

  * igt@kms_async_flips@async-flip-with-page-flip-events-linear-atomic@pipe-c-edp-1:
    - shard-lnl:          [FAIL][172] ([Intel XE#6054]) -> [PASS][173] +3 other tests pass
   [172]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/shard-lnl-7/igt@kms_async_flips@async-flip-with-page-flip-events-linear-atomic@pipe-c-edp-1.html
   [173]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-lnl-5/igt@kms_async_flips@async-flip-with-page-flip-events-linear-atomic@pipe-c-edp-1.html

  * igt@kms_atomic@plane-invalid-params@pipe-a-edp-1:
    - shard-lnl:          [DMESG-WARN][174] ([Intel XE#7063]) -> [PASS][175] +7 other tests pass
   [174]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/shard-lnl-1/igt@kms_atomic@plane-invalid-params@pipe-a-edp-1.html
   [175]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-lnl-1/igt@kms_atomic@plane-invalid-params@pipe-a-edp-1.html

  * igt@kms_cursor_legacy@flip-vs-cursor-atomic:
    - shard-bmg:          [FAIL][176] ([Intel XE#6715]) -> [PASS][177]
   [176]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/shard-bmg-1/igt@kms_cursor_legacy@flip-vs-cursor-atomic.html
   [177]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-1/igt@kms_cursor_legacy@flip-vs-cursor-atomic.html

  * igt@kms_flip@flip-vs-suspend:
    - shard-bmg:          [INCOMPLETE][178] ([Intel XE#2049] / [Intel XE#2597]) -> [PASS][179] +1 other test pass
   [178]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/shard-bmg-4/igt@kms_flip@flip-vs-suspend.html
   [179]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-7/igt@kms_flip@flip-vs-suspend.html

  * igt@kms_pm_dc@dc5-dpms:
    - shard-lnl:          [FAIL][180] ([Intel XE#718]) -> [PASS][181]
   [180]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/shard-lnl-5/igt@kms_pm_dc@dc5-dpms.html
   [181]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-lnl-7/igt@kms_pm_dc@dc5-dpms.html

  * igt@kms_sharpness_filter@filter-formats@pipe-a-edp-1-nv12:
    - shard-lnl:          [DMESG-WARN][182] ([Intel XE#4537]) -> [PASS][183] +1 other test pass
   [182]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/shard-lnl-4/igt@kms_sharpness_filter@filter-formats@pipe-a-edp-1-nv12.html
   [183]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-lnl-8/igt@kms_sharpness_filter@filter-formats@pipe-a-edp-1-nv12.html

  * igt@kms_vrr@seamless-rr-switch-virtual@pipe-a-edp-1:
    - shard-lnl:          [FAIL][184] ([Intel XE#2142]) -> [PASS][185] +1 other test pass
   [184]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/shard-lnl-2/igt@kms_vrr@seamless-rr-switch-virtual@pipe-a-edp-1.html
   [185]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-lnl-7/igt@kms_vrr@seamless-rr-switch-virtual@pipe-a-edp-1.html

  * igt@xe_evict@evict-beng-mixed-many-threads-small:
    - shard-bmg:          [INCOMPLETE][186] ([Intel XE#6321]) -> [PASS][187]
   [186]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/shard-bmg-3/igt@xe_evict@evict-beng-mixed-many-threads-small.html
   [187]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-8/igt@xe_evict@evict-beng-mixed-many-threads-small.html

  
#### Warnings ####

  * igt@kms_big_fb@4-tiled-16bpp-rotate-90:
    - shard-bmg:          [SKIP][188] ([Intel XE#2327]) -> [SKIP][189] ([Intel XE#6703]) +1 other test skip
   [188]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/shard-bmg-3/igt@kms_big_fb@4-tiled-16bpp-rotate-90.html
   [189]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-3/igt@kms_big_fb@4-tiled-16bpp-rotate-90.html

  * igt@kms_big_fb@linear-max-hw-stride-32bpp-rotate-180-hflip:
    - shard-bmg:          [SKIP][190] ([Intel XE#7059]) -> [SKIP][191] ([Intel XE#6703])
   [190]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/shard-bmg-1/igt@kms_big_fb@linear-max-hw-stride-32bpp-rotate-180-hflip.html
   [191]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-2/igt@kms_big_fb@linear-max-hw-stride-32bpp-rotate-180-hflip.html

  * igt@kms_big_fb@y-tiled-max-hw-stride-32bpp-rotate-180-async-flip:
    - shard-bmg:          [SKIP][192] ([Intel XE#1124]) -> [SKIP][193] ([Intel XE#6703]) +3 other tests skip
   [192]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/shard-bmg-3/igt@kms_big_fb@y-tiled-max-hw-stride-32bpp-rotate-180-async-flip.html
   [193]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-2/igt@kms_big_fb@y-tiled-max-hw-stride-32bpp-rotate-180-async-flip.html

  * igt@kms_ccs@bad-aux-stride-y-tiled-gen12-mc-ccs:
    - shard-bmg:          [SKIP][194] ([Intel XE#2887]) -> [SKIP][195] ([Intel XE#6703]) +4 other tests skip
   [194]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/shard-bmg-3/igt@kms_ccs@bad-aux-stride-y-tiled-gen12-mc-ccs.html
   [195]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-2/igt@kms_ccs@bad-aux-stride-y-tiled-gen12-mc-ccs.html

  * igt@kms_ccs@crc-primary-suspend-y-tiled-ccs:
    - shard-bmg:          [SKIP][196] ([Intel XE#3432]) -> [SKIP][197] ([Intel XE#6703])
   [196]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/shard-bmg-10/igt@kms_ccs@crc-primary-suspend-y-tiled-ccs.html
   [197]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-3/igt@kms_ccs@crc-primary-suspend-y-tiled-ccs.html

  * igt@kms_chamelium_audio@hdmi-audio-edid:
    - shard-bmg:          [SKIP][198] ([Intel XE#2252]) -> [SKIP][199] ([Intel XE#6703]) +3 other tests skip
   [198]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/shard-bmg-8/igt@kms_chamelium_audio@hdmi-audio-edid.html
   [199]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-3/igt@kms_chamelium_audio@hdmi-audio-edid.html

  * igt@kms_chamelium_color@ctm-blue-to-red:
    - shard-bmg:          [SKIP][200] ([Intel XE#2325]) -> [SKIP][201] ([Intel XE#6703])
   [200]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/shard-bmg-4/igt@kms_chamelium_color@ctm-blue-to-red.html
   [201]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-2/igt@kms_chamelium_color@ctm-blue-to-red.html

  * igt@kms_content_protection@content-type-change:
    - shard-bmg:          [SKIP][202] ([Intel XE#2341]) -> [SKIP][203] ([Intel XE#6703])
   [202]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/shard-bmg-3/igt@kms_content_protection@content-type-change.html
   [203]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-2/igt@kms_content_protection@content-type-change.html

  * igt@kms_cursor_crc@cursor-offscreen-256x85:
    - shard-bmg:          [SKIP][204] ([Intel XE#2320]) -> [SKIP][205] ([Intel XE#6703]) +1 other test skip
   [204]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/shard-bmg-4/igt@kms_cursor_crc@cursor-offscreen-256x85.html
   [205]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-2/igt@kms_cursor_crc@cursor-offscreen-256x85.html

  * igt@kms_flip_scaled_crc@flip-64bpp-ytile-to-32bpp-ytile-upscaling:
    - shard-bmg:          [SKIP][206] ([Intel XE#2293] / [Intel XE#2380]) -> [SKIP][207] ([Intel XE#6703]) +1 other test skip
   [206]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/shard-bmg-1/igt@kms_flip_scaled_crc@flip-64bpp-ytile-to-32bpp-ytile-upscaling.html
   [207]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-2/igt@kms_flip_scaled_crc@flip-64bpp-ytile-to-32bpp-ytile-upscaling.html

  * igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-cur-indfb-move:
    - shard-bmg:          [SKIP][208] ([Intel XE#4141]) -> [SKIP][209] ([Intel XE#6703]) +3 other tests skip
   [208]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/shard-bmg-7/igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-cur-indfb-move.html
   [209]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-3/igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-cur-indfb-move.html

  * igt@kms_frontbuffer_tracking@fbcdrrs-2p-primscrn-spr-indfb-onoff:
    - shard-bmg:          [SKIP][210] ([Intel XE#2311]) -> [SKIP][211] ([Intel XE#6703]) +6 other tests skip
   [210]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/shard-bmg-8/igt@kms_frontbuffer_tracking@fbcdrrs-2p-primscrn-spr-indfb-onoff.html
   [211]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-2/igt@kms_frontbuffer_tracking@fbcdrrs-2p-primscrn-spr-indfb-onoff.html

  * igt@kms_frontbuffer_tracking@fbcdrrs-tiling-y:
    - shard-bmg:          [SKIP][212] ([Intel XE#2352]) -> [SKIP][213] ([Intel XE#6703])
   [212]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/shard-bmg-3/igt@kms_frontbuffer_tracking@fbcdrrs-tiling-y.html
   [213]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-2/igt@kms_frontbuffer_tracking@fbcdrrs-tiling-y.html

  * igt@kms_frontbuffer_tracking@fbcpsr-2p-primscrn-shrfb-pgflip-blt:
    - shard-bmg:          [SKIP][214] ([Intel XE#2313]) -> [SKIP][215] ([Intel XE#6703]) +5 other tests skip
   [214]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/shard-bmg-7/igt@kms_frontbuffer_tracking@fbcpsr-2p-primscrn-shrfb-pgflip-blt.html
   [215]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-2/igt@kms_frontbuffer_tracking@fbcpsr-2p-primscrn-shrfb-pgflip-blt.html

  * igt@kms_frontbuffer_tracking@psr-abgr161616f-draw-blt:
    - shard-bmg:          [SKIP][216] ([Intel XE#7061]) -> [SKIP][217] ([Intel XE#6703]) +2 other tests skip
   [216]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/shard-bmg-3/igt@kms_frontbuffer_tracking@psr-abgr161616f-draw-blt.html
   [217]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-3/igt@kms_frontbuffer_tracking@psr-abgr161616f-draw-blt.html

  * igt@kms_hdr@brightness-with-hdr:
    - shard-bmg:          [SKIP][218] ([Intel XE#3374] / [Intel XE#3544]) -> [SKIP][219] ([Intel XE#6703])
   [218]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/shard-bmg-8/igt@kms_hdr@brightness-with-hdr.html
   [219]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-3/igt@kms_hdr@brightness-with-hdr.html

  * igt@kms_plane_scaling@planes-upscale-factor-0-25-downscale-factor-0-75:
    - shard-bmg:          [SKIP][220] ([Intel XE#6886]) -> [SKIP][221] ([Intel XE#6703])
   [220]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/shard-bmg-4/igt@kms_plane_scaling@planes-upscale-factor-0-25-downscale-factor-0-75.html
   [221]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-2/igt@kms_plane_scaling@planes-upscale-factor-0-25-downscale-factor-0-75.html

  * igt@kms_pm_rpm@modeset-lpsp-stress-no-wait:
    - shard-bmg:          [SKIP][222] ([Intel XE#1439] / [Intel XE#3141] / [Intel XE#836]) -> [SKIP][223] ([Intel XE#6693])
   [222]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/shard-bmg-1/igt@kms_pm_rpm@modeset-lpsp-stress-no-wait.html
   [223]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-3/igt@kms_pm_rpm@modeset-lpsp-stress-no-wait.html

  * igt@kms_psr2_sf@pr-cursor-plane-move-continuous-exceed-sf:
    - shard-bmg:          [SKIP][224] ([Intel XE#1406] / [Intel XE#1489]) -> [SKIP][225] ([Intel XE#1406] / [Intel XE#6703]) +2 other tests skip
   [224]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/shard-bmg-10/igt@kms_psr2_sf@pr-cursor-plane-move-continuous-exceed-sf.html
   [225]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-2/igt@kms_psr2_sf@pr-cursor-plane-move-continuous-exceed-sf.html

  * igt@kms_psr@pr-primary-blt:
    - shard-bmg:          [SKIP][226] ([Intel XE#1406] / [Intel XE#2234] / [Intel XE#2850]) -> [SKIP][227] ([Intel XE#1406] / [Intel XE#6703]) +2 other tests skip
   [226]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/shard-bmg-7/igt@kms_psr@pr-primary-blt.html
   [227]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-2/igt@kms_psr@pr-primary-blt.html

  * igt@kms_rotation_crc@primary-y-tiled-reflect-x-180:
    - shard-bmg:          [SKIP][228] ([Intel XE#2330]) -> [SKIP][229] ([Intel XE#6703]) +1 other test skip
   [228]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/shard-bmg-1/igt@kms_rotation_crc@primary-y-tiled-reflect-x-180.html
   [229]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-3/igt@kms_rotation_crc@primary-y-tiled-reflect-x-180.html

  * igt@kms_scaling_modes@scaling-mode-full:
    - shard-bmg:          [SKIP][230] ([Intel XE#2413]) -> [SKIP][231] ([Intel XE#6703])
   [230]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/shard-bmg-1/igt@kms_scaling_modes@scaling-mode-full.html
   [231]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-3/igt@kms_scaling_modes@scaling-mode-full.html

  * igt@kms_tiled_display@basic-test-pattern-with-chamelium:
    - shard-bmg:          [SKIP][232] ([Intel XE#2426]) -> [SKIP][233] ([Intel XE#6703])
   [232]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/shard-bmg-9/igt@kms_tiled_display@basic-test-pattern-with-chamelium.html
   [233]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-2/igt@kms_tiled_display@basic-test-pattern-with-chamelium.html

  * igt@kms_vrr@flip-suspend:
    - shard-bmg:          [SKIP][234] ([Intel XE#1499]) -> [SKIP][235] ([Intel XE#6703])
   [234]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/shard-bmg-8/igt@kms_vrr@flip-suspend.html
   [235]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-2/igt@kms_vrr@flip-suspend.html

  * igt@xe_eudebug@basic-client:
    - shard-bmg:          [SKIP][236] ([Intel XE#4837]) -> [SKIP][237] ([Intel XE#6703])
   [236]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/shard-bmg-7/igt@xe_eudebug@basic-client.html
   [237]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-2/igt@xe_eudebug@basic-client.html

  * igt@xe_eudebug_online@pagefault-read-stress:
    - shard-bmg:          [INCOMPLETE][238] ([Intel XE#2594]) -> [SKIP][239] ([Intel XE#6665] / [Intel XE#6681])
   [238]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/shard-bmg-3/igt@xe_eudebug_online@pagefault-read-stress.html
   [239]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-7/igt@xe_eudebug_online@pagefault-read-stress.html

  * igt@xe_eudebug_online@pagefault-write:
    - shard-bmg:          [SKIP][240] ([Intel XE#4837] / [Intel XE#6665]) -> [SKIP][241] ([Intel XE#6703])
   [240]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/shard-bmg-1/igt@xe_eudebug_online@pagefault-write.html
   [241]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-2/igt@xe_eudebug_online@pagefault-write.html

  * igt@xe_exec_basic@multigpu-once-bindexecqueue-userptr:
    - shard-bmg:          [SKIP][242] ([Intel XE#2322]) -> [SKIP][243] ([Intel XE#6703]) +1 other test skip
   [242]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/shard-bmg-7/igt@xe_exec_basic@multigpu-once-bindexecqueue-userptr.html
   [243]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-2/igt@xe_exec_basic@multigpu-once-bindexecqueue-userptr.html

  * igt@xe_exec_multi_queue@max-queues-basic-smem:
    - shard-bmg:          [SKIP][244] ([Intel XE#6874]) -> [SKIP][245] ([Intel XE#6703]) +7 other tests skip
   [244]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/shard-bmg-8/igt@xe_exec_multi_queue@max-queues-basic-smem.html
   [245]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-2/igt@xe_exec_multi_queue@max-queues-basic-smem.html

  * igt@xe_exec_system_allocator@many-64k-mmap-huge-nomemset:
    - shard-bmg:          [SKIP][246] ([Intel XE#5007]) -> [SKIP][247] ([Intel XE#6703])
   [246]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/shard-bmg-4/igt@xe_exec_system_allocator@many-64k-mmap-huge-nomemset.html
   [247]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-2/igt@xe_exec_system_allocator@many-64k-mmap-huge-nomemset.html

  * igt@xe_exec_system_allocator@many-large-execqueues-mmap-huge:
    - shard-bmg:          [SKIP][248] ([Intel XE#4943]) -> [SKIP][249] ([Intel XE#6703]) +10 other tests skip
   [248]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/shard-bmg-1/igt@xe_exec_system_allocator@many-large-execqueues-mmap-huge.html
   [249]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-3/igt@xe_exec_system_allocator@many-large-execqueues-mmap-huge.html

  * igt@xe_live_ktest@xe_bo@xe_ccs_migrate_kunit:
    - shard-bmg:          [SKIP][250] ([Intel XE#2229]) -> [FAIL][251] ([Intel XE#6558])
   [250]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/shard-bmg-9/igt@xe_live_ktest@xe_bo@xe_ccs_migrate_kunit.html
   [251]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-3/igt@xe_live_ktest@xe_bo@xe_ccs_migrate_kunit.html

  * igt@xe_pxp@pxp-termination-key-update-post-suspend:
    - shard-bmg:          [SKIP][252] ([Intel XE#4733]) -> [SKIP][253] ([Intel XE#6703])
   [252]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4433-40800011414446888105f6beae6dd3fac56516aa/shard-bmg-3/igt@xe_pxp@pxp-termination-key-update-post-suspend.html
   [253]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/shard-bmg-3/igt@xe_pxp@pxp-termination-key-update-post-suspend.html

  
  [Intel XE#1124]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1124
  [Intel XE#1128]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1128
  [Intel XE#1178]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1178
  [Intel XE#1392]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1392
  [Intel XE#1397]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1397
  [Intel XE#1401]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1401
  [Intel XE#1406]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1406
  [Intel XE#1407]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1407
  [Intel XE#1420]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1420
  [Intel XE#1421]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1421
  [Intel XE#1424]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1424
  [Intel XE#1439]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1439
  [Intel XE#1447]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1447
  [Intel XE#1470]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1470
  [Intel XE#1489]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1489
  [Intel XE#1499]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1499
  [Intel XE#1503]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1503
  [Intel XE#1512]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1512
  [Intel XE#1745]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1745
  [Intel XE#1948]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1948
  [Intel XE#2049]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2049
  [Intel XE#2142]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2142
  [Intel XE#2191]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2191
  [Intel XE#2229]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2229
  [Intel XE#2234]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2234
  [Intel XE#2244]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2244
  [Intel XE#2248]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2248
  [Intel XE#2252]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2252
  [Intel XE#2284]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2284
  [Intel XE#2293]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2293
  [Intel XE#2311]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2311
  [Intel XE#2313]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2313
  [Intel XE#2320]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2320
  [Intel XE#2321]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2321
  [Intel XE#2322]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2322
  [Intel XE#2325]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2325
  [Intel XE#2327]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2327
  [Intel XE#2330]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2330
  [Intel XE#2341]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2341
  [Intel XE#2352]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2352
  [Intel XE#2372]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2372
  [Intel XE#2380]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2380
  [Intel XE#2413]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2413
  [Intel XE#2414]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2414
  [Intel XE#2426]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2426
  [Intel XE#2457]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2457
  [Intel XE#2499]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2499
  [Intel XE#2594]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2594
  [Intel XE#2597]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2597
  [Intel XE#2833]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2833
  [Intel XE#2838]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2838
  [Intel XE#2850]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2850
  [Intel XE#2853]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2853
  [Intel XE#2887]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2887
  [Intel XE#2893]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2893
  [Intel XE#2938]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2938
  [Intel XE#301]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/301
  [Intel XE#306]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/306
  [Intel XE#307]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/307
  [Intel XE#309]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/309
  [Intel XE#3141]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3141
  [Intel XE#3278]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3278
  [Intel XE#3304]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3304
  [Intel XE#3309]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3309
  [Intel XE#3374]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3374
  [Intel XE#3414]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3414
  [Intel XE#3432]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3432
  [Intel XE#3544]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3544
  [Intel XE#367]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/367
  [Intel XE#373]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/373
  [Intel XE#3904]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3904
  [Intel XE#4141]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4141
  [Intel XE#4273]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4273
  [Intel XE#4329]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4329
  [Intel XE#4354]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4354
  [Intel XE#4537]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4537
  [Intel XE#4608]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4608
  [Intel XE#4650]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4650
  [Intel XE#4733]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4733
  [Intel XE#4837]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4837
  [Intel XE#4943]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4943
  [Intel XE#5007]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5007
  [Intel XE#5100]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5100
  [Intel XE#5354]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5354
  [Intel XE#5408]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5408
  [Intel XE#5545]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5545
  [Intel XE#5742]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5742
  [Intel XE#584]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/584
  [Intel XE#599]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/599
  [Intel XE#6054]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6054
  [Intel XE#6196]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6196
  [Intel XE#6266]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6266
  [Intel XE#6312]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6312
  [Intel XE#6321]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6321
  [Intel XE#6503]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6503
  [Intel XE#651]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/651
  [Intel XE#6557]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6557
  [Intel XE#6558]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6558
  [Intel XE#656]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/656
  [Intel XE#6599]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6599
  [Intel XE#6665]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6665
  [Intel XE#6681]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6681
  [Intel XE#6693]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6693
  [Intel XE#6703]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6703
  [Intel XE#6715]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6715
  [Intel XE#6874]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6874
  [Intel XE#688]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/688
  [Intel XE#6886]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6886
  [Intel XE#6900]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6900
  [Intel XE#6901]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6901
  [Intel XE#6911]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6911
  [Intel XE#6912]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6912
  [Intel XE#6964]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6964
  [Intel XE#6974]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6974
  [Intel XE#701]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/701
  [Intel XE#7059]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7059
  [Intel XE#7061]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7061
  [Intel XE#7063]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7063
  [Intel XE#7098]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7098
  [Intel XE#718]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/718
  [Intel XE#836]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/836
  [Intel XE#944]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/944


Build changes
-------------

  * IGT: IGT_8711 -> IGT_8712
  * Linux: xe-4433-40800011414446888105f6beae6dd3fac56516aa -> xe-pw-160482v1

  IGT_8711: 38428617bae65b39b306f79217ac922ebee3b477 @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  IGT_8712: 8712
  xe-4433-40800011414446888105f6beae6dd3fac56516aa: 40800011414446888105f6beae6dd3fac56516aa
  xe-pw-160482v1: 160482v1

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v1/index.html

* Re: [PATCH 3/8] drm/xe/xe_pci_error: Group all devres to release them on PCIe slot reset
  2026-01-22 10:06 ` [PATCH 3/8] drm/xe/xe_pci_error: Group all devres to release them on PCIe slot reset Riana Tauro
@ 2026-01-27 11:23   ` Mallesh, Koujalagi
  2026-02-02  8:46     ` Riana Tauro
  0 siblings, 1 reply; 41+ messages in thread
From: Mallesh, Koujalagi @ 2026-01-27 11:23 UTC (permalink / raw)
  To: Riana Tauro, intel-xe
  Cc: anshuman.gupta, rodrigo.vivi, aravind.iddamsetty, badal.nilawar,
	raag.jadav, ravi.kishore.koppuravuri, Matthew Brost,
	Himal Prasad Ghimiray

Hi Riana,

On 22-01-2026 03:36 pm, Riana Tauro wrote:
> Add devres grouping to handle device resource cleanup during
> PCI error recovery.
>
> Secondary Bus Reset (SBR) is triggered by the PCI core when the
> error_detected/mmio_enabled callbacks return PCI_ERS_RESULT_NEED_RESET.
>
> Once SBR is complete, the slot_reset callback is triggered. SBR wipes
> out all device memory, requiring the XE KMD to perform a device removal
> and reprobe.
> Calling xe_pci_remove() alone does not free the allocated devres.
> Since there are no exported functions to release all devres, group the
> devres allocations and release the entire group during slot reset to
> ensure proper cleanup.
>
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
> Signed-off-by: Riana Tauro <riana.tauro@intel.com>
> ---
>   drivers/gpu/drm/xe/xe_device.c       | 7 +++++++
>   drivers/gpu/drm/xe/xe_device_types.h | 3 +++
>   drivers/gpu/drm/xe/xe_pci_error.c    | 1 +
>   3 files changed, 11 insertions(+)
>
> diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
> index 16fc6da01357..0cf6480b8aad 100644
> --- a/drivers/gpu/drm/xe/xe_device.c
> +++ b/drivers/gpu/drm/xe/xe_device.c
> @@ -440,6 +440,7 @@ struct xe_device *xe_device_create(struct pci_dev *pdev,
>   				   const struct pci_device_id *ent)
>   {
>   	struct xe_device *xe;
> +	void *devres_id;
>   	int err;
>   
>   	xe_display_driver_set_hooks(&driver);
> @@ -448,10 +449,16 @@ struct xe_device *xe_device_create(struct pci_dev *pdev,
>   	if (err)
>   		return ERR_PTR(err);
>   
> +	devres_id = devres_open_group(&pdev->dev, NULL, GFP_KERNEL);
> +	if (!devres_id)
> +		return ERR_PTR(-ENOMEM);
> +
>   	xe = devm_drm_dev_alloc(&pdev->dev, &driver, struct xe_device, drm);
>   	if (IS_ERR(xe))
>   		return xe;
>   
> +	xe->devres_group_id = devres_id;
> +
>   	err = ttm_device_init(&xe->ttm, &xe_ttm_funcs, xe->drm.dev,
>   			      xe->drm.anon_inode->i_mapping,
>   			      xe->drm.vma_offset_manager, 0);
> diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
> index 2d140463dc5e..3a19e9b5dfae 100644
> --- a/drivers/gpu/drm/xe/xe_device_types.h
> +++ b/drivers/gpu/drm/xe/xe_device_types.h
> @@ -672,6 +672,9 @@ struct xe_device {
>   	/** @in_recovery: Indicates if device is in recovery */
>   	atomic_t in_recovery;
>   
> +	/** @devres_group_id: id for devres group */
> +	void *devres_group_id;
> +
>   	/** @bo_device: Struct to control async free of BOs */
>   	struct xe_bo_dev {
>   		/** @bo_device.async_free: Free worker */
> diff --git a/drivers/gpu/drm/xe/xe_pci_error.c b/drivers/gpu/drm/xe/xe_pci_error.c
> index a3cc01afa179..0960aa5861bc 100644
> --- a/drivers/gpu/drm/xe/xe_pci_error.c
> +++ b/drivers/gpu/drm/xe/xe_pci_error.c
> @@ -65,6 +65,7 @@ static pci_ers_result_t xe_pci_error_slot_reset(struct pci_dev *pdev)
>   	 */
>   	pdev->driver->remove(pdev);
>   	xe_device_clear_in_recovery(xe);
> +	devres_release_group(&pdev->dev, xe->devres_group_id);

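We see a use-after-free issue here. The xe structure is freed during the
pdev->driver->remove(pdev) call, so xe->devres_group_id is read after free.
We can handle this by saving devres_group_id in a local variable before the
remove call and releasing the group with that local copy.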

Thanks,

-/Mallesh


>   
>   	if (!pdev->driver->probe(pdev, ent))
>   		return PCI_ERS_RESULT_RECOVERED;

* Re: [PATCH 7/8] drm/xe/xe_ras: Add support for Uncorrectable Core-Compute errors
  2026-01-22 10:06 ` [PATCH 7/8] drm/xe/xe_ras: Add support for Uncorrectable Core-Compute errors Riana Tauro
@ 2026-01-27 11:44   ` Mallesh, Koujalagi
  2026-02-02  8:38     ` Riana Tauro
  2026-01-27 14:03   ` Mallesh, Koujalagi
  2026-02-17 14:02   ` Raag Jadav
  2 siblings, 1 reply; 41+ messages in thread
From: Mallesh, Koujalagi @ 2026-01-27 11:44 UTC (permalink / raw)
  To: Riana Tauro, intel-xe
  Cc: anshuman.gupta, rodrigo.vivi, aravind.iddamsetty, badal.nilawar,
	raag.jadav, ravi.kishore.koppuravuri

Hi Riana,

On 22-01-2026 03:36 pm, Riana Tauro wrote:
> Uncorrectable Core-Compute errors are classified into Global and Local
> errors.
>
> A Global error is an error that affects the entire device and requires a
> reset. This type of error is not isolated. When an AER is reported and
> error_detected is invoked, PCI_ERS_RESULT_NEED_RESET is returned.
>
> A Local error is confined to a specific component or context, such as an
> engine. These errors can be contained and recovered by resetting
> only the affected part without disrupting the rest of the device.
>
> Upon detection of an Uncorrectable Local Core-Compute error, an AER is
> generated and GuC is notified of the error. The KMD then marks
> the context as non-runnable and initiates an engine reset.
> (TODO: GuC <-> KMD communication for the error).
> Since the error is contained and recovered, the PCI error handling
> callback returns PCI_ERS_RESULT_RECOVERED.
>
> Signed-off-by: Riana Tauro <riana.tauro@intel.com>
> ---
>   drivers/gpu/drm/xe/xe_ras.c | 109 +++++++++++++++++++++++++++++++++++-
>   drivers/gpu/drm/xe/xe_ras.h |   3 +
>   2 files changed, 110 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_ras.c b/drivers/gpu/drm/xe/xe_ras.c
> index ace08d8d8d46..2a98cb116dc7 100644
> --- a/drivers/gpu/drm/xe/xe_ras.c
> +++ b/drivers/gpu/drm/xe/xe_ras.c
> @@ -2,11 +2,16 @@
>   /*
>    * Copyright © 2026 Intel Corporation
>    */
> -#include <linux/pci.h>
> -
>   #include "xe_assert.h"
>   #include "xe_device_types.h"
> +#include "xe_printk.h"
>   #include "xe_ras.h"
> +#include "xe_ras_types.h"
> +#include "xe_sysctrl_mailbox.h"
> +#include "xe_sysctrl_mailbox_types.h"
> +
> +#define COMPUTE_ERROR_SEVERITY_MASK		GENMASK(26, 25)
> +#define GLOBAL_UNCORR_ERROR			2
>   
>   /* Severity classification of detected errors */
>   enum xe_ras_severity {
> @@ -60,6 +65,106 @@ static inline const char *comp_to_str(struct xe_device *xe, u32 comp)
>   	return xe_ras_components[comp];
>   }
>   
> +static void log_ras_error(struct xe_device *xe, struct xe_ras_error_class *error_class)
> +{
> +	struct xe_ras_error_common common_info = error_class->common;
> +	struct xe_ras_error_product product_info = error_class->product;
> +	u8 tile = product_info.unit.tile;
> +	u32 instance = product_info.unit.instance;
> +	u32 cause = product_info.error_cause.cause;
> +
> +	xe_err(xe, "[RAS]: Tile%u, Instance %u, %s %s Error detected Cause: 0x%x",
> +	       tile, instance, severity_to_str(xe, common_info.severity),
> +	       comp_to_str(xe, common_info.component), cause);
> +}
> +
> +static pci_ers_result_t handle_compute_errors(struct xe_device *xe, struct xe_ras_error_array *arr)
> +{
> +	struct xe_ras_compute_error *error_info = (struct xe_ras_compute_error *)arr->error_details;
> +	u8 uncorr_type;
> +
> +	uncorr_type = FIELD_GET(COMPUTE_ERROR_SEVERITY_MASK, error_info->error_log_header);
> +	log_ras_error(xe, &arr->error_class);
> +
> +	xe_err(xe, "[RAS]: Core Compute Error: timestamp %llu Uncorrected error type %u\n",
> +	       arr->timestamp, uncorr_type);
> +
> +	/* Request a RESET if error is global */
> +	if (uncorr_type == GLOBAL_UNCORR_ERROR)
> +		return PCI_ERS_RESULT_NEED_RESET;
> +
> +	/* Local errors are recovered using an engine reset */
> +	return PCI_ERS_RESULT_RECOVERED;
> +}
> +
> +/**
> + * xe_ras_process_errors - Process and contain hardware errors
> + * @xe: xe device instance
> + *
> + * Get error details from system controller and return recovery
> + * method. Called only from PCI error handling.
> + *
> + * Returns: PCI_ERS_RESULT_RECOVERED if recovered or if no recovery needed,
> + * PCI_ERS_RESULT_NEED_RESET otherwise.
> + */
> +pci_ers_result_t xe_ras_process_errors(struct xe_device *xe)
> +{
> +	struct xe_sysctrl_mailbox_command command = {0};
> +	struct xe_sysctrl_mailbox_app_msg_hdr msg_hdr = {0};
> +	struct xe_ras_get_error_response response;
> +	u32 req_hdr;
> +	size_t rlen;
> +	int ret;
> +
> +	if (!xe->info.has_sysctrl)
> +		return PCI_ERS_RESULT_NEED_RESET;
> +
> +	req_hdr = FIELD_PREP(APP_HDR_GROUP_ID_MASK, XE_SYSCTRL_GROUP_GFSP) |
> +		  FIELD_PREP(APP_HDR_COMMAND_MASK, XE_SYSCTRL_CMD_GET_SOC_ERROR);
> +
> +	msg_hdr.data = req_hdr;
> +	command.header = msg_hdr;
> +	command.data_out = &response;
> +	command.data_out_len = sizeof(response);
> +
> +	do {
> +		memset(&response, 0, sizeof(response));
> +		rlen = 0;
> +
> +		ret = xe_sysctrl_send_command(xe, &command, &rlen);
> +		if (ret || !rlen) {
> +			xe_err(xe, "[RAS]: Sysctrl error ret %d\n", ret);
> +			goto err;
> +		}
> +
> +		if (rlen != sizeof(response)) {
> +			xe_err(xe, "[RAS]: Sysctrl response does not match len!!\n");
> +			goto err;
> +		}
> +

An array bounds check is required for response.num_errors. If num_errors
is greater than 3, this is a potential security issue (accessing uninitialized
or arbitrary memory).
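For illustration only, a minimal userspace sketch of such a check. It assumes error_arr holds 3 entries, as stated above; the structs are hypothetical stand-ins, since the real xe_ras_get_error_response layout is not part of this patch, and the local ARRAY_SIZE() mimics the kernel macro of the same name.

```c
#include <stddef.h>
#include <stdint.h>

/* Local copy of the kernel's ARRAY_SIZE() for this userspace sketch. */
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical layout: only the 3-entry array size comes from the review. */
struct ras_error_entry {
	uint8_t component;
	uint64_t timestamp;
};

struct ras_error_response {
	uint32_t num_errors;		/* reported by firmware: untrusted */
	uint8_t additional_errors;
	struct ras_error_entry error_arr[3];
};

/* Number of entries that may be read, or -1 if the count is out of range. */
static int ras_checked_num_errors(const struct ras_error_response *resp)
{
	if (resp->num_errors > ARRAY_SIZE(resp->error_arr))
		return -1;
	return (int)resp->num_errors;
}
```

Rejecting an out-of-range count, rather than silently clamping it, is a design choice: the caller can then take the existing err path and request a reset instead of acting on a response it cannot trust.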

Thanks,

-/Mallesh

> +		for (int i = 0; i < response.num_errors; i++) {
> +			struct xe_ras_error_array arr = response.error_arr[i];
> +			struct xe_ras_error_class error_class;
> +			u8 component;
> +
> +			error_class = arr.error_class;
> +			component = error_class.common.component;
> +
> +			if (component == XE_RAS_COMPONENT_CORE_COMPUTE) {
> +				ret = handle_compute_errors(xe, &arr);
> +				if (ret == PCI_ERS_RESULT_NEED_RESET)
> +					goto err;
> +			}
> +		}
> +
> +	} while (response.additional_errors);
> +
> +	return PCI_ERS_RESULT_RECOVERED;
> +
> +err:
> +	return PCI_ERS_RESULT_NEED_RESET;
> +}
> +
>   #ifdef CONFIG_PCIEAER
>   static void unmask_and_downgrade_internal_error(struct xe_device *xe)
>   {
> diff --git a/drivers/gpu/drm/xe/xe_ras.h b/drivers/gpu/drm/xe/xe_ras.h
> index 14cb973603e7..28400613c9a9 100644
> --- a/drivers/gpu/drm/xe/xe_ras.h
> +++ b/drivers/gpu/drm/xe/xe_ras.h
> @@ -6,8 +6,11 @@
>   #ifndef _XE_RAS_H_
>   #define _XE_RAS_H_
>   
> +#include <linux/pci.h>
> +
>   struct xe_device;
>   
>   void xe_ras_init(struct xe_device *xe);
> +pci_ers_result_t xe_ras_process_errors(struct xe_device *xe);
>   
>   #endif

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 5/8] drm/xe/xe_ras: Initialize Uncorrectable AER Registers
  2026-01-22 10:06 ` [PATCH 5/8] drm/xe/xe_ras: Initialize Uncorrectable AER Registers Riana Tauro
@ 2026-01-27 12:41   ` Mallesh, Koujalagi
  2026-02-02  9:34     ` Riana Tauro
  2026-02-04  8:38   ` Aravind Iddamsetty
  2026-02-16 12:27   ` Mallesh, Koujalagi
  2 siblings, 1 reply; 41+ messages in thread
From: Mallesh, Koujalagi @ 2026-01-27 12:41 UTC (permalink / raw)
  To: Riana Tauro, intel-xe
  Cc: anshuman.gupta, rodrigo.vivi, aravind.iddamsetty, badal.nilawar,
	raag.jadav, ravi.kishore.koppuravuri

Hi Riana,

On 22-01-2026 03:36 pm, Riana Tauro wrote:
> Uncorrectable errors from different endpoints in the device are steered to
> the USP which is a PCI Advanced Error Reporting (AER) Compliant device.
> Downgrade all the errors to non-fatal to prevent PCIe bus driver
> from triggering a Secondary Bus Reset (SBR). This allows error
> detection, containment and recovery in the driver.
>
> The Uncorrectable Error Severity Register has the 'Uncorrectable
> Internal Error Severity' set to fatal by default. Set this to
> non-fatal and unmask the error.
>
> Signed-off-by: Riana Tauro <riana.tauro@intel.com>
> ---
>   drivers/gpu/drm/xe/Makefile    |  1 +
>   drivers/gpu/drm/xe/xe_device.c |  3 ++
>   drivers/gpu/drm/xe/xe_ras.c    | 71 ++++++++++++++++++++++++++++++++++
>   drivers/gpu/drm/xe/xe_ras.h    | 13 +++++++
>   4 files changed, 88 insertions(+)
>   create mode 100644 drivers/gpu/drm/xe/xe_ras.c
>   create mode 100644 drivers/gpu/drm/xe/xe_ras.h
>
> diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
> index 5581f2180b5c..85ec53eb0b62 100644
> --- a/drivers/gpu/drm/xe/Makefile
> +++ b/drivers/gpu/drm/xe/Makefile
> @@ -110,6 +110,7 @@ xe-y += xe_bb.o \
>   	xe_pxp_debugfs.o \
>   	xe_pxp_submit.o \
>   	xe_query.o \
> +	xe_ras.o \
>   	xe_range_fence.o \
>   	xe_reg_sr.o \
>   	xe_reg_whitelist.o \
> diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
> index f418ebf04f0f..be89ffc9eade 100644
> --- a/drivers/gpu/drm/xe/xe_device.c
> +++ b/drivers/gpu/drm/xe/xe_device.c
> @@ -59,6 +59,7 @@
>   #include "xe_psmi.h"
>   #include "xe_pxp.h"
>   #include "xe_query.h"
> +#include "xe_ras.h"
>   #include "xe_shrinker.h"
>   #include "xe_soc_remapper.h"
>   #include "xe_survivability_mode.h"
> @@ -1019,6 +1020,8 @@ int xe_device_probe(struct xe_device *xe)
>   
>   	xe_vsec_init(xe);
>   
> +	xe_ras_init(xe);
> +
>   	err = xe_sriov_init_late(xe);
>   	if (err)
>   		goto err_unregister_display;
> diff --git a/drivers/gpu/drm/xe/xe_ras.c b/drivers/gpu/drm/xe/xe_ras.c
> new file mode 100644
> index 000000000000..ba5ed37aed28
> --- /dev/null
> +++ b/drivers/gpu/drm/xe/xe_ras.c
> @@ -0,0 +1,71 @@
> +// SPDX-License-Identifier: MIT
> +/*
> + * Copyright © 2026 Intel Corporation
> + */
> +#include <linux/pci.h>
> +
> +#include "xe_device_types.h"
> +#include "xe_ras.h"
> +
> +#ifdef CONFIG_PCIEAER
> +static void unmask_and_downgrade_internal_error(struct xe_device *xe)
> +{
> +	struct pci_dev *pdev = to_pci_dev(xe->drm.dev);
> +	struct pci_dev *vsp, *usp;
> +	u32 aer_uncorr_sev, aer_uncorr_mask;
> +	u16 aer_cap;
> +
> +	/* Gfx Device Hierarchy: USP-->VSP-->SGunit */
> +	vsp = pci_upstream_bridge(pdev);
> +	if (!vsp)
> +		return;
> +
> +	usp = pci_upstream_bridge(vsp);
> +	if (!usp)
> +		return;
> +
> +	aer_cap = usp->aer_cap;
> +
> +	if (!aer_cap)
> +		return;
> +
> +	/*
> +	 * All errors are steered to USP which is a PCIe AER Compliant device.
> +	 * Downgrade all the errors to non-fatal to prevent PCIe bus driver
> +	 * from triggering a Secondary Bus Reset (SBR). This allows error
> +	 * detection, containment and recovery in the driver.
> +	 *
> +	 * The Uncorrectable Error Severity Register has the 'Uncorrectable
> +	 * Internal Error Severity' set to fatal by default. Set this to
> +	 * non-fatal and unmask the error.
> +	 */
> +
> +	/* Initialize Uncorrectable Error Severity Register */
> +	pci_read_config_dword(usp, aer_cap + PCI_ERR_UNCOR_SEVER, &aer_uncorr_sev);
> +	aer_uncorr_sev &= ~PCI_ERR_UNC_INTN;
> +	pci_write_config_dword(usp, aer_cap + PCI_ERR_UNCOR_SEVER, aer_uncorr_sev);
> +
> +	/* Initialize Uncorrectable Error Mask Register */
> +	pci_read_config_dword(usp, aer_cap + PCI_ERR_UNCOR_MASK, &aer_uncorr_mask);
> +	aer_uncorr_mask &= ~PCI_ERR_UNC_INTN;
> +	pci_write_config_dword(usp, aer_cap + PCI_ERR_UNCOR_MASK, aer_uncorr_mask);
> +
> +	pci_save_state(usp);
> +}

What happens when the upstream switch port is shared with another device
(another GPU, an NVMe drive, etc.)?
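One possible mitigation, assumed here rather than taken from the patch: change only the Internal Error bit, and record its original severity/mask so that remove() can put it back. This does not by itself settle whether other devices behind the same USP should have their Internal Errors downgraded; it only keeps the change narrow and reversible. The config space is a hypothetical in-memory model; the offsets and bit value match PCI_ERR_UNCOR_MASK, PCI_ERR_UNCOR_SEVER and PCI_ERR_UNC_INTN from linux/pci_regs.h.

```c
#include <stdint.h>

/* Values as defined in linux/pci_regs.h (AER capability). */
#define PCI_ERR_UNCOR_MASK	0x08		/* Uncorrectable Error Mask */
#define PCI_ERR_UNCOR_SEVER	0x0c		/* Uncorrectable Error Severity */
#define PCI_ERR_UNC_INTN	0x00400000u	/* Internal Error */

/* Hypothetical in-memory model of the USP's AER capability. */
struct mock_port {
	uint32_t aer[16];	/* dwords, indexed by byte offset / 4 */
};

/* Hypothetical record of the original Internal Error policy. */
struct aer_saved {
	uint32_t sev;
	uint32_t mask;
	int valid;
};

static uint32_t cfg_read(const struct mock_port *p, int off)
{
	return p->aer[off / 4];
}

static void cfg_write(struct mock_port *p, int off, uint32_t val)
{
	p->aer[off / 4] = val;
}

static void downgrade_internal_error(struct mock_port *usp, struct aer_saved *s)
{
	s->sev = cfg_read(usp, PCI_ERR_UNCOR_SEVER);
	s->mask = cfg_read(usp, PCI_ERR_UNCOR_MASK);
	s->valid = 1;

	/* Clear only the Internal Error bit: non-fatal and unmasked. */
	cfg_write(usp, PCI_ERR_UNCOR_SEVER, s->sev & ~PCI_ERR_UNC_INTN);
	cfg_write(usp, PCI_ERR_UNCOR_MASK, s->mask & ~PCI_ERR_UNC_INTN);
}

/* Put back only the Internal Error bit, keeping other bits that changed since. */
static void restore_internal_error(struct mock_port *usp, const struct aer_saved *s)
{
	uint32_t sev, mask;

	if (!s->valid)
		return;

	sev = cfg_read(usp, PCI_ERR_UNCOR_SEVER) & ~PCI_ERR_UNC_INTN;
	mask = cfg_read(usp, PCI_ERR_UNCOR_MASK) & ~PCI_ERR_UNC_INTN;
	cfg_write(usp, PCI_ERR_UNCOR_SEVER, sev | (s->sev & PCI_ERR_UNC_INTN));
	cfg_write(usp, PCI_ERR_UNCOR_MASK, mask | (s->mask & PCI_ERR_UNC_INTN));
}
```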

Thanks,

-/Mallesh

> +#endif
> +
> +/**
> + * xe_ras_init - Initialize Xe RAS
> + * @xe: xe device instance
> + *
> + * Initialize Xe RAS
> + */
> +void xe_ras_init(struct xe_device *xe)
> +{
> +	if (!xe->info.has_sysctrl)
> +		return;
> +
> +#ifdef CONFIG_PCIEAER
> +	unmask_and_downgrade_internal_error(xe);
> +#endif
> +}
> diff --git a/drivers/gpu/drm/xe/xe_ras.h b/drivers/gpu/drm/xe/xe_ras.h
> new file mode 100644
> index 000000000000..14cb973603e7
> --- /dev/null
> +++ b/drivers/gpu/drm/xe/xe_ras.h
> @@ -0,0 +1,13 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * Copyright © 2026 Intel Corporation
> + */
> +
> +#ifndef _XE_RAS_H_
> +#define _XE_RAS_H_
> +
> +struct xe_device;
> +
> +void xe_ras_init(struct xe_device *xe);
> +
> +#endif

* Re: [PATCH 7/8] drm/xe/xe_ras: Add support for Uncorrectable Core-Compute errors
  2026-01-22 10:06 ` [PATCH 7/8] drm/xe/xe_ras: Add support for Uncorrectable Core-Compute errors Riana Tauro
  2026-01-27 11:44   ` Mallesh, Koujalagi
@ 2026-01-27 14:03   ` Mallesh, Koujalagi
  2026-02-02  8:54     ` Riana Tauro
  2026-02-24 12:17     ` Mallesh, Koujalagi
  2026-02-17 14:02   ` Raag Jadav
  2 siblings, 2 replies; 41+ messages in thread
From: Mallesh, Koujalagi @ 2026-01-27 14:03 UTC (permalink / raw)
  To: Riana Tauro, intel-xe
  Cc: anshuman.gupta, rodrigo.vivi, aravind.iddamsetty, badal.nilawar,
	raag.jadav, ravi.kishore.koppuravuri


On 22-01-2026 03:36 pm, Riana Tauro wrote:
> Uncorrectable Core-Compute errors are classified into Global and Local
> errors.
>
> Global error is an error that affects the entire device requiring a
> reset. This type of error is not isolated. When an AER is reported and
> error_detected is invoked return PCI_ERS_RESULT_NEED_RESET.
>
> A Local error is confined to a specific component or context like an
> engine. These errors can be contained and recovered by resetting
> only the affected part without disrupting the rest of the device.
>
> Upon detection of an Uncorrectable Local Core-Compute error, an AER is
> generated and GuC is notified of the error. The KMD then sets
> the context as non-runnable and initiates an engine reset.
> (TODO: GuC <->KMD communication for the error).
> Since the error is contained and recovered, PCI error handling
> callback returns PCI_ERS_RESULT_RECOVERED.
>
> Signed-off-by: Riana Tauro <riana.tauro@intel.com>
> ---
>   drivers/gpu/drm/xe/xe_ras.c | 109 +++++++++++++++++++++++++++++++++++-
>   drivers/gpu/drm/xe/xe_ras.h |   3 +
>   2 files changed, 110 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_ras.c b/drivers/gpu/drm/xe/xe_ras.c
> index ace08d8d8d46..2a98cb116dc7 100644
> --- a/drivers/gpu/drm/xe/xe_ras.c
> +++ b/drivers/gpu/drm/xe/xe_ras.c
> @@ -2,11 +2,16 @@
>   /*
>    * Copyright © 2026 Intel Corporation
>    */
> -#include <linux/pci.h>
> -
>   #include "xe_assert.h"
>   #include "xe_device_types.h"
> +#include "xe_printk.h"
>   #include "xe_ras.h"
> +#include "xe_ras_types.h"
> +#include "xe_sysctrl_mailbox.h"
> +#include "xe_sysctrl_mailbox_types.h"
> +
> +#define COMPUTE_ERROR_SEVERITY_MASK		GENMASK(26, 25)
> +#define GLOBAL_UNCORR_ERROR			2
>   
>   /* Severity classification of detected errors */
>   enum xe_ras_severity {
> @@ -60,6 +65,106 @@ static inline const char *comp_to_str(struct xe_device *xe, u32 comp)
>   	return xe_ras_components[comp];
>   }
>   
> +static void log_ras_error(struct xe_device *xe, struct xe_ras_error_class *error_class)
> +{
> +	struct xe_ras_error_common common_info = error_class->common;
> +	struct xe_ras_error_product product_info = error_class->product;
> +	u8 tile = product_info.unit.tile;
> +	u32 instance = product_info.unit.instance;
> +	u32 cause = product_info.error_cause.cause;
> +
> +	xe_err(xe, "[RAS]: Tile%u, Instance %u, %s %s Error detected Cause: 0x%x",
> +	       tile, instance, severity_to_str(xe, common_info.severity),
> +	       comp_to_str(xe, common_info.component), cause);

Please fix the formatting issues ("Tile %u" with a space, and a newline at
the end of the message) and include the timestamp in the log message.
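A userspace sketch of the message with both fixes and the timestamp appended. The helper name and parameters are hypothetical (the strings stand in for severity_to_str()/comp_to_str()), and PRIu64 replaces the kernel's %llu for a u64 timestamp.

```c
#include <inttypes.h>
#include <stdio.h>

/* Hypothetical helper mirroring log_ras_error() with the timestamp added. */
static int format_ras_error(char *buf, size_t len, unsigned int tile,
			    unsigned int instance, const char *sev,
			    const char *comp, unsigned int cause, uint64_t ts)
{
	return snprintf(buf, len,
			"[RAS]: Tile %u, Instance %u, %s %s Error detected Cause: 0x%x, timestamp %" PRIu64 "\n",
			tile, instance, sev, comp, cause, ts);
}
```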

> +}
> +
> +static pci_ers_result_t handle_compute_errors(struct xe_device *xe, struct xe_ras_error_array *arr)
> +{
> +	struct xe_ras_compute_error *error_info = (struct xe_ras_compute_error *)arr->error_details;
> +	u8 uncorr_type;
> +
> +	uncorr_type = FIELD_GET(COMPUTE_ERROR_SEVERITY_MASK, error_info->error_log_header);
> +	log_ras_error(xe, &arr->error_class);
> +
> +	xe_err(xe, "[RAS]: Core Compute Error: timestamp %llu Uncorrected error type %u\n",
> +	       arr->timestamp, uncorr_type);
> +
> +	/* Request a RESET if error is global */
> +	if (uncorr_type == GLOBAL_UNCORR_ERROR)
> +		return PCI_ERS_RESULT_NEED_RESET;
> +
> +	/* Local errors are recovered using an engine reset */
> +	return PCI_ERS_RESULT_RECOVERED;
> +}
> +
> +/**
> + * xe_ras_process_errors - Process and contain hardware errors
> + * @xe: xe device instance
> + *
> + * Get error details from system controller and return recovery
> + * method. Called only from PCI error handling.
> + *
> + * Returns: PCI_ERS_RESULT_RECOVERED if recovered or if no recovery needed,
> + * PCI_ERS_RESULT_NEED_RESET otherwise.
> + */
> +pci_ers_result_t xe_ras_process_errors(struct xe_device *xe)
> +{
> +	struct xe_sysctrl_mailbox_command command = {0};
> +	struct xe_sysctrl_mailbox_app_msg_hdr msg_hdr = {0};
> +	struct xe_ras_get_error_response response;
> +	u32 req_hdr;
> +	size_t rlen;
> +	int ret;
> +
> +	if (!xe->info.has_sysctrl)
> +		return PCI_ERS_RESULT_NEED_RESET;
> +
> +	req_hdr = FIELD_PREP(APP_HDR_GROUP_ID_MASK, XE_SYSCTRL_GROUP_GFSP) |
> +		  FIELD_PREP(APP_HDR_COMMAND_MASK, XE_SYSCTRL_CMD_GET_SOC_ERROR);
> +
> +	msg_hdr.data = req_hdr;
> +	command.header = msg_hdr;
> +	command.data_out = &response;
> +	command.data_out_len = sizeof(response);
> +
> +	do {
> +		memset(&response, 0, sizeof(response));
> +		rlen = 0;
> +
> +		ret = xe_sysctrl_send_command(xe, &command, &rlen);
> +		if (ret || !rlen) {
> +			xe_err(xe, "[RAS]: Sysctrl error ret %d\n", ret);
> +			goto err;
> +		}
> +
> +		if (rlen != sizeof(response)) {
> +			xe_err(xe, "[RAS]: Sysctrl response does not match len!!\n");
> +			goto err;
> +		}
> +
> +		for (int i = 0; i < response.num_errors; i++) {
> +			struct xe_ras_error_array arr = response.error_arr[i];
> +			struct xe_ras_error_class error_class;
> +			u8 component;
> +
> +			error_class = arr.error_class;
> +			component = error_class.common.component;
> +
> +			if (component == XE_RAS_COMPONENT_CORE_COMPUTE) {
> +				ret = handle_compute_errors(xe, &arr);
> +				if (ret == PCI_ERS_RESULT_NEED_RESET)
> +					goto err;
> +			}

Need to handle non-compute errors.
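A sketch of one way to structure this, under assumptions: the component IDs are hypothetical (only CORE_COMPUTE appears in the patch), the severity extraction mirrors COMPUTE_ERROR_SEVERITY_MASK (bits 26:25, value 2 meaning global) using plain shifts instead of FIELD_GET(), and treating an unhandled component as needing a reset is a conservative policy chosen for illustration, not something the patch specifies.

```c
#include <stdint.h>

/* Bits 26:25 of the compute error log header; 2 means a global error. */
#define COMPUTE_SEV_SHIFT	25
#define COMPUTE_SEV_MASK	(0x3u << COMPUTE_SEV_SHIFT)
#define GLOBAL_UNCORR_ERROR	2

/* Hypothetical component IDs. */
enum ras_component { RAS_COMP_CORE_COMPUTE = 1, RAS_COMP_OTHER = 2 };

enum ras_action { RAS_RECOVERED, RAS_NEED_RESET };

static enum ras_action handle_compute(uint32_t log_header)
{
	uint32_t type = (log_header & COMPUTE_SEV_MASK) >> COMPUTE_SEV_SHIFT;

	return type == GLOBAL_UNCORR_ERROR ? RAS_NEED_RESET : RAS_RECOVERED;
}

static enum ras_action handle_one_error(enum ras_component comp, uint32_t log_header)
{
	switch (comp) {
	case RAS_COMP_CORE_COMPUTE:
		return handle_compute(log_header);
	default:
		/* Assumed policy: an error the driver cannot classify is not
		 * known to be contained, so request a reset. */
		return RAS_NEED_RESET;
	}
}
```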

Thanks

-/Mallesh


> +		}
> +
> +	} while (response.additional_errors);
> +
> +	return PCI_ERS_RESULT_RECOVERED;
> +
> +err:
> +	return PCI_ERS_RESULT_NEED_RESET;
> +}
> +
>   #ifdef CONFIG_PCIEAER
>   static void unmask_and_downgrade_internal_error(struct xe_device *xe)
>   {
> diff --git a/drivers/gpu/drm/xe/xe_ras.h b/drivers/gpu/drm/xe/xe_ras.h
> index 14cb973603e7..28400613c9a9 100644
> --- a/drivers/gpu/drm/xe/xe_ras.h
> +++ b/drivers/gpu/drm/xe/xe_ras.h
> @@ -6,8 +6,11 @@
>   #ifndef _XE_RAS_H_
>   #define _XE_RAS_H_
>   
> +#include <linux/pci.h>
> +
>   struct xe_device;
>   
>   void xe_ras_init(struct xe_device *xe);
> +pci_ers_result_t xe_ras_process_errors(struct xe_device *xe);
>   
>   #endif

* Re: [PATCH 2/8] drm/xe/xe_pci_error: Implement PCI error recovery callbacks
  2026-01-22 10:06 ` [PATCH 2/8] drm/xe/xe_pci_error: Implement PCI error recovery callbacks Riana Tauro
@ 2026-01-27 22:49   ` Michal Wajdeczko
  2026-02-02  9:45     ` Riana Tauro
  2026-01-29  9:09   ` Nilawar, Badal
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 41+ messages in thread
From: Michal Wajdeczko @ 2026-01-27 22:49 UTC (permalink / raw)
  To: Riana Tauro, intel-xe
  Cc: anshuman.gupta, rodrigo.vivi, aravind.iddamsetty, badal.nilawar,
	raag.jadav, ravi.kishore.koppuravuri, mallesh.koujalagi



On 1/22/2026 11:06 AM, Riana Tauro wrote:
> Add error_detected, mmio_enabled, slot_reset and resume
> recovery callbacks to handle PCIe Advanced Error Reporting
> (AER) errors.
> 
> For fatal errors, the device is wedged and becomes
> inaccessible. Return PCI_ERS_RESULT_NEED_RESET from
> error_detected to request a Secondary Bus Reset (SBR).
> 
> For non-fatal errors, return PCI_ERS_RESULT_CAN_RECOVER from
> error_detected to trigger the mmio_enabled callback. In this callback,
> the device is queried to determine the error cause and attempt
> recovery based on the error type.
> 
> Once the Secondary Bus Reset (SBR) is completed, the slot_reset callback
> cleanly removes and reprobes the device to restore functionality.
> 
> Signed-off-by: Riana Tauro <riana.tauro@intel.com>
> ---
>  drivers/gpu/drm/xe/Makefile          |  1 +
>  drivers/gpu/drm/xe/xe_device.h       | 15 +++++
>  drivers/gpu/drm/xe/xe_device_types.h |  3 +
>  drivers/gpu/drm/xe/xe_pci.c          |  3 +
>  drivers/gpu/drm/xe/xe_pci_error.c    | 85 ++++++++++++++++++++++++++++
>  5 files changed, 107 insertions(+)
>  create mode 100644 drivers/gpu/drm/xe/xe_pci_error.c
> 
> diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
> index f6650ec3ab42..5581f2180b5c 100644
> --- a/drivers/gpu/drm/xe/Makefile
> +++ b/drivers/gpu/drm/xe/Makefile
> @@ -98,6 +98,7 @@ xe-y += xe_bb.o \
>  	xe_page_reclaim.o \
>  	xe_pat.o \
>  	xe_pci.o \
> +	xe_pci_error.o \
>  	xe_pci_rebar.o \
>  	xe_pcode.o \
>  	xe_pm.o \
> diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
> index 58d7d8b2fea3..81480248eeff 100644
> --- a/drivers/gpu/drm/xe/xe_device.h
> +++ b/drivers/gpu/drm/xe/xe_device.h
> @@ -43,6 +43,21 @@ static inline struct xe_device *ttm_to_xe_device(struct ttm_device *ttm)
>  	return container_of(ttm, struct xe_device, ttm);
>  }
>  
> +static inline bool xe_device_is_in_recovery(struct xe_device *xe)
> +{
> +	return atomic_read(&xe->in_recovery);
> +}
> +
> +static inline void xe_device_set_in_recovery(struct xe_device *xe)
> +{
> +	atomic_set(&xe->in_recovery, 1);
> +}
> +
> +static inline void xe_device_clear_in_recovery(struct xe_device *xe)
> +{
> +	atomic_set(&xe->in_recovery, 0);
> +}
> +
>  struct xe_device *xe_device_create(struct pci_dev *pdev,
>  				   const struct pci_device_id *ent);
>  int xe_device_probe_early(struct xe_device *xe);
> diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
> index 944f909a86ad..2d140463dc5e 100644
> --- a/drivers/gpu/drm/xe/xe_device_types.h
> +++ b/drivers/gpu/drm/xe/xe_device_types.h
> @@ -669,6 +669,9 @@ struct xe_device {
>  		bool inconsistent_reset;
>  	} wedged;
>  
> +	/** @in_recovery: Indicates if device is in recovery */
> +	atomic_t in_recovery;
> +
>  	/** @bo_device: Struct to control async free of BOs */
>  	struct xe_bo_dev {
>  		/** @bo_device.async_free: Free worker */
> diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c
> index c92cc176f669..e1ee393b7461 100644
> --- a/drivers/gpu/drm/xe/xe_pci.c
> +++ b/drivers/gpu/drm/xe/xe_pci.c
> @@ -1255,6 +1255,8 @@ static const struct dev_pm_ops xe_pm_ops = {
>  };
>  #endif
>  
> +extern const struct pci_error_handlers xe_pci_error_handlers;
> +
>  static struct pci_driver xe_pci_driver = {
>  	.name = DRIVER_NAME,
>  	.id_table = pciidlist,
> @@ -1262,6 +1264,7 @@ static struct pci_driver xe_pci_driver = {
>  	.remove = xe_pci_remove,
>  	.shutdown = xe_pci_shutdown,
>  	.sriov_configure = xe_pci_sriov_configure,
> +	.err_handler = &xe_pci_error_handlers,
>  #ifdef CONFIG_PM_SLEEP
>  	.driver.pm = &xe_pm_ops,
>  #endif
> diff --git a/drivers/gpu/drm/xe/xe_pci_error.c b/drivers/gpu/drm/xe/xe_pci_error.c
> new file mode 100644
> index 000000000000..a3cc01afa179
> --- /dev/null
> +++ b/drivers/gpu/drm/xe/xe_pci_error.c
> @@ -0,0 +1,85 @@
> +// SPDX-License-Identifier: MIT
> +/*
> + * Copyright © 2026 Intel Corporation
> + */
> +#include <drm/drm_drv.h>
> +#include <linux/pci.h>

nit: usually all linux includes go first

> +
> +#include "xe_device.h"
> +#include "xe_gt.h"
> +#include "xe_pci.h"
> +#include "xe_uc.h"
> +
> +static void xe_pci_error_handling(struct pci_dev *pdev)
> +{
> +	struct xe_device *xe = pdev_to_xe_device(pdev);
> +
> +	xe_device_set_in_recovery(xe);
> +	xe_device_declare_wedged(xe);
> +
> +	pci_disable_device(pdev);
> +}
> +
> +static pci_ers_result_t xe_pci_error_detected(struct pci_dev *pdev, pci_channel_state_t state)
> +{
> +	dev_err(&pdev->dev, "PCI error detected, state %d\n", state);

nit: you can use pci_err()

> +
> +	switch (state) {
> +	case pci_channel_io_normal:
> +		return PCI_ERS_RESULT_CAN_RECOVER;
> +	case pci_channel_io_frozen:
> +		xe_pci_error_handling(pdev);
> +		return PCI_ERS_RESULT_NEED_RESET;
> +	case pci_channel_io_perm_failure:
> +		return PCI_ERS_RESULT_DISCONNECT;
> +	}
> +
> +	return PCI_ERS_RESULT_NEED_RESET;
> +}
> +
> +static pci_ers_result_t xe_pci_error_mmio_enabled(struct pci_dev *pdev)
> +{
> +	dev_err(&pdev->dev, "PCI mmio enabled\n");

s/mmio/MMIO

but should this still be at err level?
hmm, as we just report a static result, maybe we can drop it?
there will already be:

	pci_dbg(bridge, "broadcast mmio_enabled message\n");

> +
> +	return PCI_ERS_RESULT_NEED_RESET;
> +}
> +
> +static pci_ers_result_t xe_pci_error_slot_reset(struct pci_dev *pdev)
> +{
> +	const struct pci_device_id *ent = pci_match_id(pdev->driver->id_table, pdev);
> +	struct xe_device *xe = pdev_to_xe_device(pdev);
> +
> +	dev_err(&pdev->dev, "PCI slot reset\n");
> +
> +	pci_restore_state(pdev);
> +
> +	if (pci_enable_device(pdev)) {
> +		dev_err(&pdev->dev,
> +			"Cannot re-enable PCI device after reset\n");
> +		return PCI_ERS_RESULT_DISCONNECT;
> +	}
> +
> +	/*
> +	 * Secondary Bus Reset wipes out all device memory
> +	 * requiring XE KMD to perform a device removal and reprobe.
> +	 */
> +	pdev->driver->remove(pdev);

what will happen to all the devm/drmm resources that the previous xe_pci_probe() allocated?

> +	xe_device_clear_in_recovery(xe);

is it safe to access this xe after calling remove() ?
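A userspace model of the lifetime question, assuming the pessimistic case in which remove() releases the instance (with drmm-managed allocations the memory can outlive remove() until the last drm_device reference is dropped). The types and the remove_saw_recovery hook are hypothetical. It illustrates that nothing from the old instance needs to be touched after remove(): assuming the reprobed instance is zero-allocated, as calloc() makes it here, the flag starts cleared.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for struct xe_device and struct pci_dev. */
struct mock_xe {
	int in_recovery;
};

struct mock_pdev {
	struct mock_xe *drvdata;
	int remove_saw_recovery;	/* test hook: flag value seen by remove() */
};

static void mock_remove(struct mock_pdev *pdev)
{
	/* remove() may rely on the flag, e.g. to skip hardware access. */
	pdev->remove_saw_recovery = pdev->drvdata->in_recovery;
	free(pdev->drvdata);	/* pessimistic case: the instance is gone */
	pdev->drvdata = NULL;
}

static int mock_probe(struct mock_pdev *pdev)
{
	/* A fresh, zeroed instance: in_recovery starts at 0. */
	pdev->drvdata = calloc(1, sizeof(*pdev->drvdata));
	return pdev->drvdata ? 0 : -1;
}

static int reprobe_after_reset(struct mock_pdev *pdev)
{
	mock_remove(pdev);
	/* Nothing from the old instance is touched past this point. */
	return mock_probe(pdev);
}
```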

> +
> +	if (!pdev->driver->probe(pdev, ent))
> +		return PCI_ERS_RESULT_RECOVERED;
> +
> +	return PCI_ERS_RESULT_RECOVERED;

also recovered? maybe this should be PCI_ERS_RESULT_DISCONNECT
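A minimal sketch of mapping the reprobe result to the returned verdict instead of returning RECOVERED unconditionally. The enum values are stand-ins for PCI_ERS_RESULT_RECOVERED and PCI_ERS_RESULT_DISCONNECT; whether DISCONNECT is the right verdict on probe failure is the open question above.

```c
/* Stand-ins for PCI_ERS_RESULT_RECOVERED / PCI_ERS_RESULT_DISCONNECT. */
enum ers_result {
	ERS_RECOVERED,
	ERS_DISCONNECT,
};

/* Report success only if the reprobe actually succeeded. */
static enum ers_result slot_reset_verdict(int probe_ret)
{
	return probe_ret == 0 ? ERS_RECOVERED : ERS_DISCONNECT;
}
```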

> +}
> +
> +static void xe_pci_error_resume(struct pci_dev *pdev)
> +{
> +	dev_info(&pdev->dev, "PCI error resume\n");
> +}
> +
> +const struct pci_error_handlers xe_pci_error_handlers = {
> +	.error_detected	= xe_pci_error_detected,
> +	.mmio_enabled	= xe_pci_error_mmio_enabled,
> +	.slot_reset	= xe_pci_error_slot_reset,
> +	.resume		= xe_pci_error_resume,
> +};


* Re: [PATCH 2/8] drm/xe/xe_pci_error: Implement PCI error recovery callbacks
  2026-01-22 10:06 ` [PATCH 2/8] drm/xe/xe_pci_error: Implement PCI error recovery callbacks Riana Tauro
  2026-01-27 22:49   ` Michal Wajdeczko
@ 2026-01-29  9:09   ` Nilawar, Badal
  2026-02-02 13:19     ` Nilawar, Badal
  2026-02-03  3:41     ` Riana Tauro
  2026-02-08  8:02   ` Raag Jadav
  2026-02-16  8:53   ` Mallesh, Koujalagi
  3 siblings, 2 replies; 41+ messages in thread
From: Nilawar, Badal @ 2026-01-29  9:09 UTC (permalink / raw)
  To: Riana Tauro, intel-xe
  Cc: anshuman.gupta, rodrigo.vivi, aravind.iddamsetty, raag.jadav,
	ravi.kishore.koppuravuri, mallesh.koujalagi

Hi Riana,

On 22-01-2026 15:36, Riana Tauro wrote:
> Add error_detected, mmio_enabled, slot_reset and resume
> recovery callbacks to handle PCIe Advanced Error Reporting
> (AER) errors.
>
> For fatal errors, the device is wedged and becomes
> inaccessible. Return PCI_ERS_RESULT_NEED_RESET from
> error_detected to request a Secondary Bus Reset (SBR).
>
> For non-fatal errors, return PCI_ERS_RESULT_CAN_RECOVER from
> error_detected to trigger the mmio_enabled callback. In this callback,
> the device is queried to determine the error cause and attempt
> recovery based on the error type.
>
> Once the Secondary Bus Reset (SBR) is completed, the slot_reset callback
> cleanly removes and reprobes the device to restore functionality.
>
> Signed-off-by: Riana Tauro <riana.tauro@intel.com>
> ---
>   drivers/gpu/drm/xe/Makefile          |  1 +
>   drivers/gpu/drm/xe/xe_device.h       | 15 +++++
>   drivers/gpu/drm/xe/xe_device_types.h |  3 +
>   drivers/gpu/drm/xe/xe_pci.c          |  3 +
>   drivers/gpu/drm/xe/xe_pci_error.c    | 85 ++++++++++++++++++++++++++++
>   5 files changed, 107 insertions(+)
>   create mode 100644 drivers/gpu/drm/xe/xe_pci_error.c
>
> diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
> index f6650ec3ab42..5581f2180b5c 100644
> --- a/drivers/gpu/drm/xe/Makefile
> +++ b/drivers/gpu/drm/xe/Makefile
> @@ -98,6 +98,7 @@ xe-y += xe_bb.o \
>   	xe_page_reclaim.o \
>   	xe_pat.o \
>   	xe_pci.o \
> +	xe_pci_error.o \
>   	xe_pci_rebar.o \
>   	xe_pcode.o \
>   	xe_pm.o \
> diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
> index 58d7d8b2fea3..81480248eeff 100644
> --- a/drivers/gpu/drm/xe/xe_device.h
> +++ b/drivers/gpu/drm/xe/xe_device.h
> @@ -43,6 +43,21 @@ static inline struct xe_device *ttm_to_xe_device(struct ttm_device *ttm)
>   	return container_of(ttm, struct xe_device, ttm);
>   }
>   
> +static inline bool xe_device_is_in_recovery(struct xe_device *xe)
> +{
> +	return atomic_read(&xe->in_recovery);
> +}
> +
> +static inline void xe_device_set_in_recovery(struct xe_device *xe)
> +{
> +	atomic_set(&xe->in_recovery, 1);
> +}
> +
> +static inline void xe_device_clear_in_recovery(struct xe_device *xe)
> +{
> +	atomic_set(&xe->in_recovery, 0);
> +}
> +
>   struct xe_device *xe_device_create(struct pci_dev *pdev,
>   				   const struct pci_device_id *ent);
>   int xe_device_probe_early(struct xe_device *xe);
> diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
> index 944f909a86ad..2d140463dc5e 100644
> --- a/drivers/gpu/drm/xe/xe_device_types.h
> +++ b/drivers/gpu/drm/xe/xe_device_types.h
> @@ -669,6 +669,9 @@ struct xe_device {
>   		bool inconsistent_reset;
>   	} wedged;
>   
> +	/** @in_recovery: Indicates if device is in recovery */
> +	atomic_t in_recovery;
> +
>   	/** @bo_device: Struct to control async free of BOs */
>   	struct xe_bo_dev {
>   		/** @bo_device.async_free: Free worker */
> diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c
> index c92cc176f669..e1ee393b7461 100644
> --- a/drivers/gpu/drm/xe/xe_pci.c
> +++ b/drivers/gpu/drm/xe/xe_pci.c
> @@ -1255,6 +1255,8 @@ static const struct dev_pm_ops xe_pm_ops = {
>   };
>   #endif
>   
> +extern const struct pci_error_handlers xe_pci_error_handlers;
> +
>   static struct pci_driver xe_pci_driver = {
>   	.name = DRIVER_NAME,
>   	.id_table = pciidlist,
> @@ -1262,6 +1264,7 @@ static struct pci_driver xe_pci_driver = {
>   	.remove = xe_pci_remove,
>   	.shutdown = xe_pci_shutdown,
>   	.sriov_configure = xe_pci_sriov_configure,
> +	.err_handler = &xe_pci_error_handlers,
>   #ifdef CONFIG_PM_SLEEP
>   	.driver.pm = &xe_pm_ops,
>   #endif
> diff --git a/drivers/gpu/drm/xe/xe_pci_error.c b/drivers/gpu/drm/xe/xe_pci_error.c
> new file mode 100644
> index 000000000000..a3cc01afa179
> --- /dev/null
> +++ b/drivers/gpu/drm/xe/xe_pci_error.c
> @@ -0,0 +1,85 @@
> +// SPDX-License-Identifier: MIT
> +/*
> + * Copyright © 2026 Intel Corporation
> + */
> +#include <drm/drm_drv.h>
> +#include <linux/pci.h>
> +
> +#include "xe_device.h"
> +#include "xe_gt.h"
> +#include "xe_pci.h"
> +#include "xe_uc.h"
> +
> +static void xe_pci_error_handling(struct pci_dev *pdev)
> +{
> +	struct xe_device *xe = pdev_to_xe_device(pdev);
> +
> +	xe_device_set_in_recovery(xe);
> +	xe_device_declare_wedged(xe);
> +
> +	pci_disable_device(pdev);
> +}
> +
> +static pci_ers_result_t xe_pci_error_detected(struct pci_dev *pdev, pci_channel_state_t state)
> +{
> +	dev_err(&pdev->dev, "PCI error detected, state %d\n", state);
> +
> +	switch (state) {
> +	case pci_channel_io_normal:
> +		return PCI_ERS_RESULT_CAN_RECOVER;
> +	case pci_channel_io_frozen:
> +		xe_pci_error_handling(pdev);
> +		return PCI_ERS_RESULT_NEED_RESET;
> +	case pci_channel_io_perm_failure:
> +		return PCI_ERS_RESULT_DISCONNECT;
> +	}
> +
> +	return PCI_ERS_RESULT_NEED_RESET;
> +}
> +
> +static pci_ers_result_t xe_pci_error_mmio_enabled(struct pci_dev *pdev)
> +{
> +	dev_err(&pdev->dev, "PCI mmio enabled\n");
> +
> +	return PCI_ERS_RESULT_NEED_RESET;
> +}
> +
> +static pci_ers_result_t xe_pci_error_slot_reset(struct pci_dev *pdev)
> +{
> +	const struct pci_device_id *ent = pci_match_id(pdev->driver->id_table, pdev);
> +	struct xe_device *xe = pdev_to_xe_device(pdev);
> +
> +	dev_err(&pdev->dev, "PCI slot reset\n");
> +
> +	pci_restore_state(pdev);
What is the significance of restoring state here? Is pci_save_state() called
anywhere in the reset path?
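A userspace model of the save/restore pairing. It assumes pci_restore_state() semantics of returning early when no state has been saved and consuming the saved copy once it has been applied; the struct and the all-zeroes reset are simplifications. Under this model, restoring in slot_reset only helps if something saved the state beforehand.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a function's config space plus its saved copy. */
struct mock_cfg_dev {
	uint32_t cfg[4];
	uint32_t saved[4];
	int state_saved;
};

static void mock_save_state(struct mock_cfg_dev *d)
{
	memcpy(d->saved, d->cfg, sizeof(d->cfg));
	d->state_saved = 1;
}

/* Modeled on pci_restore_state(): nothing happens without a prior save,
 * and the saved copy is consumed once it has been applied. */
static void mock_restore_state(struct mock_cfg_dev *d)
{
	if (!d->state_saved)
		return;
	memcpy(d->cfg, d->saved, sizeof(d->cfg));
	d->state_saved = 0;
}

/* Simplification: the reset is modeled as clearing every register. */
static void mock_secondary_bus_reset(struct mock_cfg_dev *d)
{
	memset(d->cfg, 0, sizeof(d->cfg));
}
```

A common pattern in slot_reset callbacks is to call pci_save_state() again right after pci_restore_state(), so that a valid copy exists for the next error.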
> +
> +	if (pci_enable_device(pdev)) {
> +		dev_err(&pdev->dev,
> +			"Cannot re-enable PCI device after reset\n");
> +		return PCI_ERS_RESULT_DISCONNECT;
> +	}
> +
> +	/*
> +	 * Secondary Bus Reset wipes out all device memory
> +	 * requiring XE KMD to perform a device removal and reprobe.
> +	 */
> +	pdev->driver->remove(pdev);
> +	xe_device_clear_in_recovery(xe);
> +
> +	if (!pdev->driver->probe(pdev, ent))
> +		return PCI_ERS_RESULT_RECOVERED;
> +
> +	return PCI_ERS_RESULT_RECOVERED;

Is it correct to return PCI_ERS_RESULT_RECOVERED if driver probe fails?

Thanks, Badal

> +}
> +
> +static void xe_pci_error_resume(struct pci_dev *pdev)
> +{
> +	dev_info(&pdev->dev, "PCI error resume\n");
> +}
> +
> +const struct pci_error_handlers xe_pci_error_handlers = {
> +	.error_detected	= xe_pci_error_detected,
> +	.mmio_enabled	= xe_pci_error_mmio_enabled,
> +	.slot_reset	= xe_pci_error_slot_reset,
> +	.resume		= xe_pci_error_resume,
> +};

* Re: [PATCH 7/8] drm/xe/xe_ras: Add support for Uncorrectable Core-Compute errors
  2026-01-27 11:44   ` Mallesh, Koujalagi
@ 2026-02-02  8:38     ` Riana Tauro
  0 siblings, 0 replies; 41+ messages in thread
From: Riana Tauro @ 2026-02-02  8:38 UTC (permalink / raw)
  To: Mallesh, Koujalagi, intel-xe
  Cc: anshuman.gupta, rodrigo.vivi, aravind.iddamsetty, badal.nilawar,
	raag.jadav, ravi.kishore.koppuravuri



On 1/27/2026 5:14 PM, Mallesh, Koujalagi wrote:
> Hi Riana,
> 
> On 22-01-2026 03:36 pm, Riana Tauro wrote:
>> Uncorrectable Core-Compute errors are classified into Global and Local
>> errors.
>>
>> Global error is an error that affects the entire device requiring a
>> reset. This type of error is not isolated. When an AER is reported and
>> error_detected is invoked return PCI_ERS_RESULT_NEED_RESET.
>>
>> A Local error is confined to a specific component or context like an
>> engine. These errors can be contained and recovered by resetting
>> only the affected part without disrupting the rest of the device.
>>
>> Upon detection of an Uncorrectable Local Core-Compute error, an AER is
>> generated and GuC is notified of the error. The KMD then sets
>> the context as non-runnable and initiates an engine reset.
>> (TODO: GuC <->KMD communication for the error).
>> Since the error is contained and recovered, PCI error handling
>> callback returns PCI_ERS_RESULT_RECOVERED.
>>
>> Signed-off-by: Riana Tauro <riana.tauro@intel.com>
>> ---
>>   drivers/gpu/drm/xe/xe_ras.c | 109 +++++++++++++++++++++++++++++++++++-
>>   drivers/gpu/drm/xe/xe_ras.h |   3 +
>>   2 files changed, 110 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_ras.c b/drivers/gpu/drm/xe/xe_ras.c
>> index ace08d8d8d46..2a98cb116dc7 100644
>> --- a/drivers/gpu/drm/xe/xe_ras.c
>> +++ b/drivers/gpu/drm/xe/xe_ras.c
>> @@ -2,11 +2,16 @@
>>   /*
>>    * Copyright © 2026 Intel Corporation
>>    */
>> -#include <linux/pci.h>
>> -
>>   #include "xe_assert.h"
>>   #include "xe_device_types.h"
>> +#include "xe_printk.h"
>>   #include "xe_ras.h"
>> +#include "xe_ras_types.h"
>> +#include "xe_sysctrl_mailbox.h"
>> +#include "xe_sysctrl_mailbox_types.h"
>> +
>> +#define COMPUTE_ERROR_SEVERITY_MASK        GENMASK(26, 25)
>> +#define GLOBAL_UNCORR_ERROR            2
>>   /* Severity classification of detected errors */
>>   enum xe_ras_severity {
>> @@ -60,6 +65,106 @@ static inline const char *comp_to_str(struct xe_device *xe, u32 comp)
>>       return xe_ras_components[comp];
>>   }
>> +static void log_ras_error(struct xe_device *xe, struct xe_ras_error_class *error_class)
>> +{
>> +    struct xe_ras_error_common common_info = error_class->common;
>> +    struct xe_ras_error_product product_info = error_class->product;
>> +    u8 tile = product_info.unit.tile;
>> +    u32 instance = product_info.unit.instance;
>> +    u32 cause = product_info.error_cause.cause;
>> +
>> +    xe_err(xe, "[RAS]: Tile%u, Instance %u, %s %s Error detected Cause: 0x%x",
>> +           tile, instance, severity_to_str(xe, common_info.severity),
>> +           comp_to_str(xe, common_info.component), cause);
>> +}
>> +
>> +static pci_ers_result_t handle_compute_errors(struct xe_device *xe, struct xe_ras_error_array *arr)
>> +{
>> +    struct xe_ras_compute_error *error_info = (struct xe_ras_compute_error *)arr->error_details;
>> +    u8 uncorr_type;
>> +
>> +    uncorr_type = FIELD_GET(COMPUTE_ERROR_SEVERITY_MASK, error_info->error_log_header);
>> +    log_ras_error(xe, &arr->error_class);
>> +
>> +    xe_err(xe, "[RAS]: Core Compute Error: timestamp %llu Uncorrected error type %u\n",
>> +           arr->timestamp, uncorr_type);
>> +
>> +    /* Request a RESET if error is global */
>> +    if (uncorr_type == GLOBAL_UNCORR_ERROR)
>> +        return PCI_ERS_RESULT_NEED_RESET;
>> +
>> +    /* Local errors are recovered using an engine reset */
>> +    return PCI_ERS_RESULT_RECOVERED;
>> +}
>> +
>> +/**
>> + * xe_ras_process_errors - Process and contain hardware errors
>> + * @xe: xe device instance
>> + *
>> + * Get error details from system controller and return recovery
>> + * method. Called only from PCI error handling.
>> + *
>> + * Returns: PCI_ERS_RESULT_RECOVERED if recovered or if no recovery needed,
>> + * PCI_ERS_RESULT_NEED_RESET otherwise.
>> + */
>> +pci_ers_result_t xe_ras_process_errors(struct xe_device *xe)
>> +{
>> +    struct xe_sysctrl_mailbox_command command = {0};
>> +    struct xe_sysctrl_mailbox_app_msg_hdr msg_hdr = {0};
>> +    struct xe_ras_get_error_response response;
>> +    u32 req_hdr;
>> +    size_t rlen;
>> +    int ret;
>> +
>> +    if (!xe->info.has_sysctrl)
>> +        return PCI_ERS_RESULT_NEED_RESET;
>> +
>> +    req_hdr = FIELD_PREP(APP_HDR_GROUP_ID_MASK, XE_SYSCTRL_GROUP_GFSP) |
>> +          FIELD_PREP(APP_HDR_COMMAND_MASK, XE_SYSCTRL_CMD_GET_SOC_ERROR);
>> +
>> +    msg_hdr.data = req_hdr;
>> +    command.header = msg_hdr;
>> +    command.data_out = &response;
>> +    command.data_out_len = sizeof(response);
>> +
>> +    do {
>> +        memset(&response, 0, sizeof(response));
>> +        rlen = 0;
>> +
>> +        ret = xe_sysctrl_send_command(xe, &command, &rlen);
>> +        if (ret || !rlen) {
>> +            xe_err(xe, "[RAS]: Sysctrl error ret %d\n", ret);
>> +            goto err;
>> +        }
>> +
>> +        if (rlen != sizeof(response)) {
>> +            xe_err(xe, "[RAS]: Sysctrl response does not match len!! \n");
>> +            goto err;
>> +        }
>> +
> 
> An array bounds check is required for response.num_errors. If num_errors
> is more than 3, this is a potential security issue (accessing uninitialized
> or arbitrary memory).

Yeah, agreed. Will fix this in the next revision.

Thanks
Riana
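A minimal userspace sketch of the bounds check discussed above, assuming the response carries a fixed array of three entries (the limit mentioned in the review). The struct and function names are hypothetical stand-ins, not the real layouts in xe_ras_types.h. Treating an out-of-range count as a malformed response that requires a reset is one possible policy; clamping num_errors to the array size would be another.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical stand-ins; the real layouts live in xe_ras_types.h. */
struct sketch_error {
	uint8_t needs_reset;	/* outcome of the per-component handler */
};

struct sketch_response {
	uint32_t num_errors;			/* reported by firmware */
	struct sketch_error error_arr[3];	/* size assumed from the review */
};

enum sketch_result { SKETCH_RECOVERED, SKETCH_NEED_RESET };

static enum sketch_result process_response(const struct sketch_response *resp)
{
	/*
	 * num_errors comes from the system controller: validate it before
	 * using it as a loop bound, otherwise error_arr[] could be read out
	 * of bounds.
	 */
	if (resp->num_errors > ARRAY_SIZE(resp->error_arr))
		return SKETCH_NEED_RESET;

	for (uint32_t i = 0; i < resp->num_errors; i++) {
		if (resp->error_arr[i].needs_reset)
			return SKETCH_NEED_RESET;
	}

	return SKETCH_RECOVERED;
}
```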

> 
> Thanks,
> 
> -/Mallesh
> 
>> +        for (int i = 0; i < response.num_errors; i++) {
>> +            struct xe_ras_error_array arr = response.error_arr[i];
>> +            struct xe_ras_error_class error_class;
>> +            u8 component;
>> +
>> +            error_class = arr.error_class;
>> +            component = error_class.common.component;
>> +
>> +            if (component == XE_RAS_COMPONENT_CORE_COMPUTE) {
>> +                ret = handle_compute_errors(xe, &arr);
>> +                if (ret == PCI_ERS_RESULT_NEED_RESET)
>> +                    goto err;
>> +            }
>> +        }
>> +
>> +    } while (response.additional_errors);
>> +
>> +    return PCI_ERS_RESULT_RECOVERED;
>> +
>> +err:
>> +    return PCI_ERS_RESULT_NEED_RESET;
>> +}
>> +
>>   #ifdef CONFIG_PCIEAER
>>   static void unmask_and_downgrade_internal_error(struct xe_device *xe)
>>   {
>> diff --git a/drivers/gpu/drm/xe/xe_ras.h b/drivers/gpu/drm/xe/xe_ras.h
>> index 14cb973603e7..28400613c9a9 100644
>> --- a/drivers/gpu/drm/xe/xe_ras.h
>> +++ b/drivers/gpu/drm/xe/xe_ras.h
>> @@ -6,8 +6,11 @@
>>   #ifndef _XE_RAS_H_
>>   #define _XE_RAS_H_
>> +#include <linux/pci.h>
>> +
>>   struct xe_device;
>>   void xe_ras_init(struct xe_device *xe);
>> +pci_ers_result_t xe_ras_process_errors(struct xe_device *xe);
>>   #endif
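handle_compute_errors() extracts the error type with FIELD_GET(COMPUTE_ERROR_SEVERITY_MASK, ...), where the mask is GENMASK(26, 25), and requests a reset only when the value equals GLOBAL_UNCORR_ERROR (2). The userspace sketch below reproduces that decode under stated assumptions: GENMASK32() and field_get32() are hypothetical stand-ins for the kernel's GENMASK() and FIELD_GET(), __builtin_ctz() assumes GCC or Clang, and the enum only mirrors the two outcomes the function can return.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's GENMASK()/FIELD_GET() (hypothetical). */
#define GENMASK32(h, l) \
	(((~UINT32_C(0)) >> (31 - (h))) & ((~UINT32_C(0)) << (l)))

static uint32_t field_get32(uint32_t mask, uint32_t reg)
{
	assert(mask != 0);
	/* Shift the masked field down by the position of its lowest bit. */
	return (reg & mask) >> __builtin_ctz(mask);
}

#define COMPUTE_ERROR_SEVERITY_MASK	GENMASK32(26, 25)
#define GLOBAL_UNCORR_ERROR		2

/* Mirrors only the two results handle_compute_errors() can return. */
enum sketch_compute_result {
	SKETCH_COMPUTE_RECOVERED,
	SKETCH_COMPUTE_NEED_RESET,
};

static enum sketch_compute_result classify_compute_error(uint32_t error_log_header)
{
	uint32_t uncorr_type = field_get32(COMPUTE_ERROR_SEVERITY_MASK,
					   error_log_header);

	/* Global errors affect the whole device and need a reset. */
	if (uncorr_type == GLOBAL_UNCORR_ERROR)
		return SKETCH_COMPUTE_NEED_RESET;

	/* Local errors are contained by an engine reset. */
	return SKETCH_COMPUTE_RECOVERED;
}
```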



* Re: [PATCH 3/8] drm/xe/xe_pci_error: Group all devres to release them on PCIe slot reset
  2026-01-27 11:23   ` Mallesh, Koujalagi
@ 2026-02-02  8:46     ` Riana Tauro
  0 siblings, 0 replies; 41+ messages in thread
From: Riana Tauro @ 2026-02-02  8:46 UTC (permalink / raw)
  To: Mallesh, Koujalagi, intel-xe
  Cc: anshuman.gupta, rodrigo.vivi, aravind.iddamsetty, badal.nilawar,
	raag.jadav, ravi.kishore.koppuravuri, Matthew Brost,
	Himal Prasad Ghimiray



On 1/27/2026 4:53 PM, Mallesh, Koujalagi wrote:
> Hi Riana,
> 
> On 22-01-2026 03:36 pm, Riana Tauro wrote:
>> Add devres grouping to handle device resource cleanup during
>> PCI error recovery.
>>
>> Secondary Bus Reset (SBR) is triggered by PCI core when the
>> error_detected/mmio_enabled callbacks return PCI_ERS_RESULT_NEED_RESET.
>>
>> Once SBR is complete, the slot_reset callback is triggered. SBR wipes
>> out all device memory requiring XE KMD to perform a device removal and
>> reprobe.
>> Calling xe_pci_remove() alone does not free the devres allocated.
>> Since there are no exported functions to release all devres, group the
>> devres allocations and release the entire group during slot reset to
>> ensure proper cleanup.
>>
>> Cc: Matthew Brost <matthew.brost@intel.com>
>> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
>> Signed-off-by: Riana Tauro <riana.tauro@intel.com>
>> ---
>>   drivers/gpu/drm/xe/xe_device.c       | 7 +++++++
>>   drivers/gpu/drm/xe/xe_device_types.h | 3 +++
>>   drivers/gpu/drm/xe/xe_pci_error.c    | 1 +
>>   3 files changed, 11 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
>> index 16fc6da01357..0cf6480b8aad 100644
>> --- a/drivers/gpu/drm/xe/xe_device.c
>> +++ b/drivers/gpu/drm/xe/xe_device.c
>> @@ -440,6 +440,7 @@ struct xe_device *xe_device_create(struct pci_dev *pdev,
>>                      const struct pci_device_id *ent)
>>   {
>>       struct xe_device *xe;
>> +    void *devres_id;
>>       int err;
>>       xe_display_driver_set_hooks(&driver);
>> @@ -448,10 +449,16 @@ struct xe_device *xe_device_create(struct pci_dev *pdev,
>>       if (err)
>>           return ERR_PTR(err);
>> +    devres_id = devres_open_group(&pdev->dev, NULL, GFP_KERNEL);
>> +    if (!devres_id)
>> +        return ERR_PTR(-ENOMEM);
>> +
>>       xe = devm_drm_dev_alloc(&pdev->dev, &driver, struct xe_device, drm);
>>       if (IS_ERR(xe))
>>           return xe;
>> +    xe->devres_group_id = devres_id;
>> +
>>       err = ttm_device_init(&xe->ttm, &xe_ttm_funcs, xe->drm.dev,
>>                     xe->drm.anon_inode->i_mapping,
>>                     xe->drm.vma_offset_manager, 0);
>> diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
>> index 2d140463dc5e..3a19e9b5dfae 100644
>> --- a/drivers/gpu/drm/xe/xe_device_types.h
>> +++ b/drivers/gpu/drm/xe/xe_device_types.h
>> @@ -672,6 +672,9 @@ struct xe_device {
>>       /** @in_recovery: Indicates if device is in recovery */
>>       atomic_t in_recovery;
>> +    /** @devres_group_id: id for devres group */
>> +    void *devres_group_id;
>> +
>>       /** @bo_device: Struct to control async free of BOs */
>>       struct xe_bo_dev {
>>           /** @bo_device.async_free: Free worker */
>> diff --git a/drivers/gpu/drm/xe/xe_pci_error.c b/drivers/gpu/drm/xe/xe_pci_error.c
>> index a3cc01afa179..0960aa5861bc 100644
>> --- a/drivers/gpu/drm/xe/xe_pci_error.c
>> +++ b/drivers/gpu/drm/xe/xe_pci_error.c
>> @@ -65,6 +65,7 @@ static pci_ers_result_t xe_pci_error_slot_reset(struct pci_dev *pdev)
>>        */
>>       pdev->driver->remove(pdev);
>>       xe_device_clear_in_recovery(xe);
>> +    devres_release_group(&pdev->dev, xe->devres_group_id);
> 
> We see a use-after-free issue here. The xe structure is removed in the
> pdev->driver->remove(pdev) call. We can handle this by copying
> devres_group_id to a local variable and releasing the group with it.


No, xe is not freed by the remove callback; xe is itself a devres
allocation (devm_drm_dev_alloc()). In the driver core's release path,
remove() is called first and the device resources are released afterwards:

https://elixir.bootlin.com/linux/v6.18.6/source/drivers/base/dd.c#L1243

Ideally devres_release_all(dev) would be used, but it is not exported.

Grouping is the only approach I could think of. Any other suggestions are
welcome.

Thanks
Riana
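A toy userspace model of why grouping works: every resource registered after the group is opened is released, newest first, when the group is released, while resources registered earlier stay untouched. This is not the kernel's devres implementation, and all names below are hypothetical. In the patch, xe itself is allocated by devm_drm_dev_alloc() after devres_open_group(), so it belongs to the group; nothing may dereference xe once devres_release_group() has run.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: not the kernel's devres implementation. */
#define TOY_MAX_RES	16
#define TOY_GROUP_MARK	(-1)	/* marks where a group was opened */

struct toy_dev {
	int res[TOY_MAX_RES];		/* registered resources, oldest first */
	size_t n;
	int released[TOY_MAX_RES];	/* release order, for inspection */
	size_t n_released;
};

/* Register a resource (stands in for a devm_*() allocation). */
static void toy_add(struct toy_dev *d, int id)
{
	assert(d->n < TOY_MAX_RES && id != TOY_GROUP_MARK);
	d->res[d->n++] = id;
}

/* Open a group; the returned id is the position of its marker. */
static size_t toy_open_group(struct toy_dev *d)
{
	assert(d->n < TOY_MAX_RES);
	d->res[d->n] = TOY_GROUP_MARK;
	return d->n++;
}

/*
 * Release every resource registered after the group's marker, newest
 * first, then drop the marker. Nested group markers are discarded too.
 */
static void toy_release_group(struct toy_dev *d, size_t group)
{
	assert(group < d->n && d->res[group] == TOY_GROUP_MARK);

	while (d->n > group + 1) {
		int id = d->res[--d->n];

		if (id != TOY_GROUP_MARK)
			d->released[d->n_released++] = id;
	}
	d->n = group;
}
```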


> 
> Thanks,
> 
> -/Mallesh
> 
> 
>>       if (!pdev->driver->probe(pdev, ent))
>>           return PCI_ERS_RESULT_RECOVERED;



* Re: [PATCH 7/8] drm/xe/xe_ras: Add support for Uncorrectable Core-Compute errors
  2026-01-27 14:03   ` Mallesh, Koujalagi
@ 2026-02-02  8:54     ` Riana Tauro
  2026-02-24 12:17     ` Mallesh, Koujalagi
  1 sibling, 0 replies; 41+ messages in thread
From: Riana Tauro @ 2026-02-02  8:54 UTC (permalink / raw)
  To: Mallesh, Koujalagi, intel-xe
  Cc: anshuman.gupta, rodrigo.vivi, aravind.iddamsetty, badal.nilawar,
	raag.jadav, ravi.kishore.koppuravuri



On 1/27/2026 7:33 PM, Mallesh, Koujalagi wrote:
> 
> On 22-01-2026 03:36 pm, Riana Tauro wrote:
>> Uncorrectable Core-Compute errors are classified into Global and Local
>> errors.
>>
>> Global error is an error that affects the entire device requiring a
>> reset. This type of error is not isolated. When an AER is reported and
>> error_detected is invoked return PCI_ERS_RESULT_NEED_RESET.
>>
>> A Local error is confined to a specific component or context, like an
>> engine. These errors can be contained and recovered by resetting
>> only the affected part without disrupting the rest of the device.
>>
>> Upon detection of an Uncorrectable Local Core-Compute error, an AER is
>> generated and GuC is notified of the error. The KMD then sets
>> the context as non-runnable and initiates an engine reset.
>> (TODO: GuC <->KMD communication for the error).
>> Since the error is contained and recovered, PCI error handling
>> callback returns PCI_ERS_RESULT_RECOVERED.
>>
>> Signed-off-by: Riana Tauro <riana.tauro@intel.com>
>> ---
>>   drivers/gpu/drm/xe/xe_ras.c | 109 +++++++++++++++++++++++++++++++++++-
>>   drivers/gpu/drm/xe/xe_ras.h |   3 +
>>   2 files changed, 110 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_ras.c b/drivers/gpu/drm/xe/xe_ras.c
>> index ace08d8d8d46..2a98cb116dc7 100644
>> --- a/drivers/gpu/drm/xe/xe_ras.c
>> +++ b/drivers/gpu/drm/xe/xe_ras.c
>> @@ -2,11 +2,16 @@
>>   /*
>>    * Copyright © 2026 Intel Corporation
>>    */
>> -#include <linux/pci.h>
>> -
>>   #include "xe_assert.h"
>>   #include "xe_device_types.h"
>> +#include "xe_printk.h"
>>   #include "xe_ras.h"
>> +#include "xe_ras_types.h"
>> +#include "xe_sysctrl_mailbox.h"
>> +#include "xe_sysctrl_mailbox_types.h"
>> +
>> +#define COMPUTE_ERROR_SEVERITY_MASK        GENMASK(26, 25)
>> +#define GLOBAL_UNCORR_ERROR            2
>>   /* Severity classification of detected errors */
>>   enum xe_ras_severity {
>> @@ -60,6 +65,106 @@ static inline const char *comp_to_str(struct xe_device *xe, u32 comp)
>>       return xe_ras_components[comp];
>>   }
>> +static void log_ras_error(struct xe_device *xe, struct xe_ras_error_class *error_class)
>> +{
>> +    struct xe_ras_error_common common_info = error_class->common;
>> +    struct xe_ras_error_product product_info = error_class->product;
>> +    u8 tile = product_info.unit.tile;
>> +    u32 instance = product_info.unit.instance;
>> +    u32 cause = product_info.error_cause.cause;
>> +
>> +    xe_err(xe, "[RAS]: Tile%u, Instance %u, %s %s Error detected Cause: 0x%x",
>> +           tile, instance, severity_to_str(xe, common_info.severity),
>> +           comp_to_str(xe, common_info.component), cause);
> 
> Please fix the formatting issues ("Tile %u", and a newline at the end of
> the message), and include the timestamp in the log message.

The intention was to print Tile0, Tile1, and so on. Will fix the missing
newline at the end. The timestamp is component-specific, so it is logged
below.


> 
>> +}
>> +
>> +static pci_ers_result_t handle_compute_errors(struct xe_device *xe, struct xe_ras_error_array *arr)
>> +{
>> +    struct xe_ras_compute_error *error_info = (struct xe_ras_compute_error *)arr->error_details;
>> +    u8 uncorr_type;
>> +
>> +    uncorr_type = FIELD_GET(COMPUTE_ERROR_SEVERITY_MASK, error_info->error_log_header);
>> +    log_ras_error(xe, &arr->error_class);
>> +
>> +    xe_err(xe, "[RAS]: Core Compute Error: timestamp %llu Uncorrected error type %u\n",
>> +           arr->timestamp, uncorr_type);
>> +
>> +    /* Request a RESET if error is global */
>> +    if (uncorr_type == GLOBAL_UNCORR_ERROR)
>> +        return PCI_ERS_RESULT_NEED_RESET;
>> +
>> +    /* Local errors are recovered using an engine reset */
>> +    return PCI_ERS_RESULT_RECOVERED;
>> +}
>> +
>> +/**
>> + * xe_ras_process_errors - Process and contain hardware errors
>> + * @xe: xe device instance
>> + *
>> + * Get error details from system controller and return recovery
>> + * method. Called only from PCI error handling.
>> + *
>> + * Returns: PCI_ERS_RESULT_RECOVERED if recovered or if no recovery needed,
>> + * PCI_ERS_RESULT_NEED_RESET otherwise.
>> + */
>> +pci_ers_result_t xe_ras_process_errors(struct xe_device *xe)
>> +{
>> +    struct xe_sysctrl_mailbox_command command = {0};
>> +    struct xe_sysctrl_mailbox_app_msg_hdr msg_hdr = {0};
>> +    struct xe_ras_get_error_response response;
>> +    u32 req_hdr;
>> +    size_t rlen;
>> +    int ret;
>> +
>> +    if (!xe->info.has_sysctrl)
>> +        return PCI_ERS_RESULT_NEED_RESET;
>> +
>> +    req_hdr = FIELD_PREP(APP_HDR_GROUP_ID_MASK, XE_SYSCTRL_GROUP_GFSP) |
>> +          FIELD_PREP(APP_HDR_COMMAND_MASK, XE_SYSCTRL_CMD_GET_SOC_ERROR);
>> +
>> +    msg_hdr.data = req_hdr;
>> +    command.header = msg_hdr;
>> +    command.data_out = &response;
>> +    command.data_out_len = sizeof(response);
>> +
>> +    do {
>> +        memset(&response, 0, sizeof(response));
>> +        rlen = 0;
>> +
>> +        ret = xe_sysctrl_send_command(xe, &command, &rlen);
>> +        if (ret || !rlen) {
>> +            xe_err(xe, "[RAS]: Sysctrl error ret %d\n", ret);
>> +            goto err;
>> +        }
>> +
>> +        if (rlen != sizeof(response)) {
>> +            xe_err(xe, "[RAS]: Sysctrl response does not match len!! \n");
>> +            goto err;
>> +        }
>> +
>> +        for (int i = 0; i < response.num_errors; i++) {
>> +            struct xe_ras_error_array arr = response.error_arr[i];
>> +            struct xe_ras_error_class error_class;
>> +            u8 component;
>> +
>> +            error_class = arr.error_class;
>> +            component = error_class.common.component;
>> +
>> +            if (component == XE_RAS_COMPONENT_CORE_COMPUTE) {
>> +                ret = handle_compute_errors(xe, &arr);
>> +                if (ret == PCI_ERS_RESULT_NEED_RESET)
>> +                    goto err;
>> +            }
> 
> Need to handle non-compute errors.

This patch adds the base for error handling with a single component.
The remaining components will be added in subsequent patches.

Thanks
Riana


> 
> Thanks
> 
> -/Mallesh
> 
> 
>> +        }
>> +
>> +    } while (response.additional_errors);
>> +
>> +    return PCI_ERS_RESULT_RECOVERED;
>> +
>> +err:
>> +    return PCI_ERS_RESULT_NEED_RESET;
>> +}
>> +
>>   #ifdef CONFIG_PCIEAER
>>   static void unmask_and_downgrade_internal_error(struct xe_device *xe)
>>   {
>> diff --git a/drivers/gpu/drm/xe/xe_ras.h b/drivers/gpu/drm/xe/xe_ras.h
>> index 14cb973603e7..28400613c9a9 100644
>> --- a/drivers/gpu/drm/xe/xe_ras.h
>> +++ b/drivers/gpu/drm/xe/xe_ras.h
>> @@ -6,8 +6,11 @@
>>   #ifndef _XE_RAS_H_
>>   #define _XE_RAS_H_
>> +#include <linux/pci.h>
>> +
>>   struct xe_device;
>>   void xe_ras_init(struct xe_device *xe);
>> +pci_ers_result_t xe_ras_process_errors(struct xe_device *xe);
>>   #endif



* Re: [PATCH 5/8] drm/xe/xe_ras: Initialize Uncorrectable AER Registers
  2026-01-27 12:41   ` Mallesh, Koujalagi
@ 2026-02-02  9:34     ` Riana Tauro
  0 siblings, 0 replies; 41+ messages in thread
From: Riana Tauro @ 2026-02-02  9:34 UTC (permalink / raw)
  To: Mallesh, Koujalagi, intel-xe
  Cc: anshuman.gupta, rodrigo.vivi, aravind.iddamsetty, badal.nilawar,
	raag.jadav, ravi.kishore.koppuravuri



On 1/27/2026 6:11 PM, Mallesh, Koujalagi wrote:
> Hi Riana,
> 
> On 22-01-2026 03:36 pm, Riana Tauro wrote:
>> Uncorrectable errors from different endpoints in the device are steered to
>> the USP which is a PCI Advanced Error Reporting (AER) Compliant device.
>> Downgrade all the errors to non-fatal to prevent PCIe bus driver
>> from triggering a Secondary Bus Reset (SBR). This allows error
>> detection, containment and recovery in the driver.
>>
>> The Uncorrectable Error Severity Register has the 'Uncorrectable
>> Internal Error Severity' set to fatal by default. Set this to
>> non-fatal and unmask the error.
>>
>> Signed-off-by: Riana Tauro <riana.tauro@intel.com>
>> ---
>>   drivers/gpu/drm/xe/Makefile    |  1 +
>>   drivers/gpu/drm/xe/xe_device.c |  3 ++
>>   drivers/gpu/drm/xe/xe_ras.c    | 71 ++++++++++++++++++++++++++++++++++
>>   drivers/gpu/drm/xe/xe_ras.h    | 13 +++++++
>>   4 files changed, 88 insertions(+)
>>   create mode 100644 drivers/gpu/drm/xe/xe_ras.c
>>   create mode 100644 drivers/gpu/drm/xe/xe_ras.h
>>
>> diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
>> index 5581f2180b5c..85ec53eb0b62 100644
>> --- a/drivers/gpu/drm/xe/Makefile
>> +++ b/drivers/gpu/drm/xe/Makefile
>> @@ -110,6 +110,7 @@ xe-y += xe_bb.o \
>>       xe_pxp_debugfs.o \
>>       xe_pxp_submit.o \
>>       xe_query.o \
>> +    xe_ras.o \
>>       xe_range_fence.o \
>>       xe_reg_sr.o \
>>       xe_reg_whitelist.o \
>> diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
>> index f418ebf04f0f..be89ffc9eade 100644
>> --- a/drivers/gpu/drm/xe/xe_device.c
>> +++ b/drivers/gpu/drm/xe/xe_device.c
>> @@ -59,6 +59,7 @@
>>   #include "xe_psmi.h"
>>   #include "xe_pxp.h"
>>   #include "xe_query.h"
>> +#include "xe_ras.h"
>>   #include "xe_shrinker.h"
>>   #include "xe_soc_remapper.h"
>>   #include "xe_survivability_mode.h"
>> @@ -1019,6 +1020,8 @@ int xe_device_probe(struct xe_device *xe)
>>       xe_vsec_init(xe);
>> +    xe_ras_init(xe);
>> +
>>       err = xe_sriov_init_late(xe);
>>       if (err)
>>           goto err_unregister_display;
>> diff --git a/drivers/gpu/drm/xe/xe_ras.c b/drivers/gpu/drm/xe/xe_ras.c
>> new file mode 100644
>> index 000000000000..ba5ed37aed28
>> --- /dev/null
>> +++ b/drivers/gpu/drm/xe/xe_ras.c
>> @@ -0,0 +1,71 @@
>> +// SPDX-License-Identifier: MIT
>> +/*
>> + * Copyright © 2026 Intel Corporation
>> + */
>> +#include <linux/pci.h>
>> +
>> +#include "xe_device_types.h"
>> +#include "xe_ras.h"
>> +
>> +#ifdef CONFIG_PCIEAER
>> +static void unmask_and_downgrade_internal_error(struct xe_device *xe)
>> +{
>> +    struct pci_dev *pdev = to_pci_dev(xe->drm.dev);
>> +    struct pci_dev *vsp, *usp;
>> +    u32 aer_uncorr_sev, aer_uncorr_mask;
>> +    u16 aer_cap;
>> +
>> +     /* Gfx Device Hierarchy: USP-->VSP-->SGunit */
>> +    vsp = pci_upstream_bridge(pdev);
>> +    if (!vsp)
>> +        return;
>> +
>> +    usp = pci_upstream_bridge(vsp);
>> +    if (!usp)
>> +        return;
>> +
>> +    aer_cap = usp->aer_cap;
>> +
>> +    if (!aer_cap)
>> +        return;
>> +
>> +    /*
>> +     * All errors are steered to USP which is a PCIe AER Compliant device.
>> +     * Downgrade all the errors to non-fatal to prevent PCIe bus driver
>> +     * from triggering a Secondary Bus Reset (SBR). This allows error
>> +     * detection, containment and recovery in the driver.
>> +     *
>> +     * The Uncorrectable Error Severity Register has the 'Uncorrectable
>> +     * Internal Error Severity' set to fatal by default. Set this to
>> +     * non-fatal and unmask the error.
>> +     */
>> +
>> +    /* Initialize Uncorrectable Error Severity Register */
>> +    pci_read_config_dword(usp, aer_cap + PCI_ERR_UNCOR_SEVER, &aer_uncorr_sev);
>> +    aer_uncorr_sev &= ~PCI_ERR_UNC_INTN;
>> +    pci_write_config_dword(usp, aer_cap + PCI_ERR_UNCOR_SEVER, aer_uncorr_sev);
>> +
>> +    /* Initialize Uncorrectable Error Mask Register */
>> +    pci_read_config_dword(usp, aer_cap + PCI_ERR_UNCOR_MASK, &aer_uncorr_mask);
>> +    aer_uncorr_mask &= ~PCI_ERR_UNC_INTN;
>> +    pci_write_config_dword(usp, aer_cap + PCI_ERR_UNCOR_MASK, aer_uncorr_mask);
>> +
>> +    pci_save_state(usp);
>> +}
> 
> What happens when the upstream switch port is shared with another device
> (another GPU, an NVMe drive, etc.)?
> 

The USP is part of the card. If the card exposes other functions (audio,
for example), their drivers also need to implement error handling.

Thanks
Riana


> Thanks,
> 
> -/Mallesh
> 
>> +#endif
>> +
>> +/**
>> + * xe_ras_init - Initialize Xe RAS
>> + * @xe: xe device instance
>> + *
>> + * Initialize Xe RAS
>> + */
>> +void xe_ras_init(struct xe_device *xe)
>> +{
>> +    if (!xe->info.has_sysctrl)
>> +        return;
>> +
>> +#ifdef CONFIG_PCIEAER
>> +    unmask_and_downgrade_internal_error(xe);
>> +#endif
>> +}
>> diff --git a/drivers/gpu/drm/xe/xe_ras.h b/drivers/gpu/drm/xe/xe_ras.h
>> new file mode 100644
>> index 000000000000..14cb973603e7
>> --- /dev/null
>> +++ b/drivers/gpu/drm/xe/xe_ras.h
>> @@ -0,0 +1,13 @@
>> +/* SPDX-License-Identifier: MIT */
>> +/*
>> + * Copyright © 2026 Intel Corporation
>> + */
>> +
>> +#ifndef _XE_RAS_H_
>> +#define _XE_RAS_H_
>> +
>> +struct xe_device;
>> +
>> +void xe_ras_init(struct xe_device *xe);
>> +
>> +#endif
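unmask_and_downgrade_internal_error() performs two read-modify-write cycles on the USP's AER capability: clearing PCI_ERR_UNC_INTN in the Uncorrectable Error Severity register makes Uncorrectable Internal Errors non-fatal, and clearing it in the Uncorrectable Error Mask register lets them be reported. The sketch below models this against an in-memory config space; struct toy_cfg and the cfg_*() helpers are hypothetical stand-ins for a pci_dev and pci_read_config_dword()/pci_write_config_dword(), the offsets are copied from include/uapi/linux/pci_regs.h, and pci_save_state() is not modeled. Only the Internal Error bit is changed; all other severity and mask bits are preserved.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Offsets and bit as defined in include/uapi/linux/pci_regs.h. */
#define PCI_ERR_UNCOR_MASK	0x08		/* Uncorrectable Error Mask */
#define PCI_ERR_UNCOR_SEVER	0x0c		/* Uncorrectable Error Severity */
#define PCI_ERR_UNC_INTN	0x00400000u	/* Uncorrectable Internal Error */

/* Toy 4 KiB extended config space (hypothetical stand-in for a pci_dev). */
struct toy_cfg {
	uint8_t space[4096];
};

static uint32_t cfg_read32(const struct toy_cfg *c, unsigned int off)
{
	uint32_t v;

	assert(off + sizeof(v) <= sizeof(c->space));
	memcpy(&v, &c->space[off], sizeof(v));
	return v;
}

static void cfg_write32(struct toy_cfg *c, unsigned int off, uint32_t v)
{
	assert(off + sizeof(v) <= sizeof(c->space));
	memcpy(&c->space[off], &v, sizeof(v));
}

/* Same read-modify-write sequence as unmask_and_downgrade_internal_error(). */
static void downgrade_internal_error(struct toy_cfg *c, unsigned int aer_cap)
{
	uint32_t sev, mask;

	if (!aer_cap)
		return;	/* no AER capability: nothing to configure */

	/* Severity bit clear: Internal Errors are reported as non-fatal. */
	sev = cfg_read32(c, aer_cap + PCI_ERR_UNCOR_SEVER);
	cfg_write32(c, aer_cap + PCI_ERR_UNCOR_SEVER, sev & ~PCI_ERR_UNC_INTN);

	/* Mask bit clear: Internal Errors are no longer masked. */
	mask = cfg_read32(c, aer_cap + PCI_ERR_UNCOR_MASK);
	cfg_write32(c, aer_cap + PCI_ERR_UNCOR_MASK, mask & ~PCI_ERR_UNC_INTN);
}
```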



* Re: [PATCH 2/8] drm/xe/xe_pci_error: Implement PCI error recovery callbacks
  2026-01-27 22:49   ` Michal Wajdeczko
@ 2026-02-02  9:45     ` Riana Tauro
  0 siblings, 0 replies; 41+ messages in thread
From: Riana Tauro @ 2026-02-02  9:45 UTC (permalink / raw)
  To: Michal Wajdeczko, intel-xe
  Cc: anshuman.gupta, rodrigo.vivi, aravind.iddamsetty, badal.nilawar,
	raag.jadav, ravi.kishore.koppuravuri, mallesh.koujalagi



On 1/28/2026 4:19 AM, Michal Wajdeczko wrote:
> 
> 
> On 1/22/2026 11:06 AM, Riana Tauro wrote:
>> Add error_detected, mmio_enabled, slot_reset and resume
>> recovery callbacks to handle PCIe Advanced Error Reporting
>> (AER) errors.
>>
>> For fatal errors, the device is wedged and becomes
>> inaccessible. Return PCI_ERS_RESULT_NEED_RESET from
>> error_detected to request a Secondary Bus Reset (SBR).
>>
>> For non-fatal errors, return PCI_ERS_RESULT_CAN_RECOVER from
>> error_detected to trigger the mmio_enabled callback. In this callback,
>> the device is queried to determine the error cause and attempt
>> recovery based on the error type.
>>
>> Once the Secondary Bus Reset (SBR) is complete, the slot_reset callback
>> cleanly removes and reprobes the device to restore functionality.
>>
>> Signed-off-by: Riana Tauro <riana.tauro@intel.com>
>> ---
>>   drivers/gpu/drm/xe/Makefile          |  1 +
>>   drivers/gpu/drm/xe/xe_device.h       | 15 +++++
>>   drivers/gpu/drm/xe/xe_device_types.h |  3 +
>>   drivers/gpu/drm/xe/xe_pci.c          |  3 +
>>   drivers/gpu/drm/xe/xe_pci_error.c    | 85 ++++++++++++++++++++++++++++
>>   5 files changed, 107 insertions(+)
>>   create mode 100644 drivers/gpu/drm/xe/xe_pci_error.c
>>
>> diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
>> index f6650ec3ab42..5581f2180b5c 100644
>> --- a/drivers/gpu/drm/xe/Makefile
>> +++ b/drivers/gpu/drm/xe/Makefile
>> @@ -98,6 +98,7 @@ xe-y += xe_bb.o \
>>   	xe_page_reclaim.o \
>>   	xe_pat.o \
>>   	xe_pci.o \
>> +	xe_pci_error.o \
>>   	xe_pci_rebar.o \
>>   	xe_pcode.o \
>>   	xe_pm.o \
>> diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
>> index 58d7d8b2fea3..81480248eeff 100644
>> --- a/drivers/gpu/drm/xe/xe_device.h
>> +++ b/drivers/gpu/drm/xe/xe_device.h
>> @@ -43,6 +43,21 @@ static inline struct xe_device *ttm_to_xe_device(struct ttm_device *ttm)
>>   	return container_of(ttm, struct xe_device, ttm);
>>   }
>>   
>> +static inline bool xe_device_is_in_recovery(struct xe_device *xe)
>> +{
>> +	return atomic_read(&xe->in_recovery);
>> +}
>> +
>> +static inline void xe_device_set_in_recovery(struct xe_device *xe)
>> +{
>> +	atomic_set(&xe->in_recovery, 1);
>> +}
>> +
>> +static inline void xe_device_clear_in_recovery(struct xe_device *xe)
>> +{
>> +	 atomic_set(&xe->in_recovery, 0);
>> +}
>> +
>>   struct xe_device *xe_device_create(struct pci_dev *pdev,
>>   				   const struct pci_device_id *ent);
>>   int xe_device_probe_early(struct xe_device *xe);
>> diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
>> index 944f909a86ad..2d140463dc5e 100644
>> --- a/drivers/gpu/drm/xe/xe_device_types.h
>> +++ b/drivers/gpu/drm/xe/xe_device_types.h
>> @@ -669,6 +669,9 @@ struct xe_device {
>>   		bool inconsistent_reset;
>>   	} wedged;
>>   
>> +	/** @in_recovery: Indicates if device is in recovery */
>> +	atomic_t in_recovery;
>> +
>>   	/** @bo_device: Struct to control async free of BOs */
>>   	struct xe_bo_dev {
>>   		/** @bo_device.async_free: Free worker */
>> diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c
>> index c92cc176f669..e1ee393b7461 100644
>> --- a/drivers/gpu/drm/xe/xe_pci.c
>> +++ b/drivers/gpu/drm/xe/xe_pci.c
>> @@ -1255,6 +1255,8 @@ static const struct dev_pm_ops xe_pm_ops = {
>>   };
>>   #endif
>>   
>> +extern const struct pci_error_handlers xe_pci_error_handlers;
>> +
>>   static struct pci_driver xe_pci_driver = {
>>   	.name = DRIVER_NAME,
>>   	.id_table = pciidlist,
>> @@ -1262,6 +1264,7 @@ static struct pci_driver xe_pci_driver = {
>>   	.remove = xe_pci_remove,
>>   	.shutdown = xe_pci_shutdown,
>>   	.sriov_configure = xe_pci_sriov_configure,
>> +	.err_handler = &xe_pci_error_handlers,
>>   #ifdef CONFIG_PM_SLEEP
>>   	.driver.pm = &xe_pm_ops,
>>   #endif
>> diff --git a/drivers/gpu/drm/xe/xe_pci_error.c b/drivers/gpu/drm/xe/xe_pci_error.c
>> new file mode 100644
>> index 000000000000..a3cc01afa179
>> --- /dev/null
>> +++ b/drivers/gpu/drm/xe/xe_pci_error.c
>> @@ -0,0 +1,85 @@
>> +// SPDX-License-Identifier: MIT
>> +/*
>> + * Copyright © 2026 Intel Corporation
>> + */
>> +#include <drm/drm_drv.h>
>> +#include <linux/pci.h>
> 
> nit: usually all linux includes go first

I thought this was alphabetical. Will fix this in the next revision.

> 
>> +
>> +#include "xe_device.h"
>> +#include "xe_gt.h"
>> +#include "xe_pci.h"
>> +#include "xe_uc.h"
>> +
>> +static void xe_pci_error_handling(struct pci_dev *pdev)
>> +{
>> +	struct xe_device *xe = pdev_to_xe_device(pdev);
>> +
>> +	xe_device_set_in_recovery(xe);
>> +	xe_device_declare_wedged(xe);
>> +
>> +	pci_disable_device(pdev);
>> +}
>> +
>> +static pci_ers_result_t xe_pci_error_detected(struct pci_dev *pdev, pci_channel_state_t state)
>> +{
>> +	dev_err(&pdev->dev, "PCI error detected, state %d\n", state);
> 
> nit: you can use pci_err()


This is not exactly pci_err. I will modify the error message to have
"Xe pci error handling" or something similar.

> 
>> +
>> +	switch (state) {
>> +	case pci_channel_io_normal:
>> +		return PCI_ERS_RESULT_CAN_RECOVER;
>> +	case pci_channel_io_frozen:
>> +		xe_pci_error_handling(pdev);
>> +		return PCI_ERS_RESULT_NEED_RESET;
>> +	case pci_channel_io_perm_failure:
>> +		return PCI_ERS_RESULT_DISCONNECT;
>> +	}
>> +
>> +	return PCI_ERS_RESULT_NEED_RESET;
>> +}
>> +
>> +static pci_ers_result_t xe_pci_error_mmio_enabled(struct pci_dev *pdev)
>> +{
>> +	dev_err(&pdev->dev, "PCI mmio enabled\n");
> 
> s/mmio/MMIO
> 
> but should this be still an err level?
> hmm, as we just report static result, maybe we can drop it?
> there will be already:

That message is logged at the bridge level. In real error scenarios, a log
confirming that the driver's callbacks were invoked is useful.

The result is also not always static: Xe error handling is added in
subsequent patches.

> 
> 	pci_dbg(bridge, "broadcast mmio_enabled message\n");
> 
>> +
>> +	return PCI_ERS_RESULT_NEED_RESET;
>> +}
>> +
>> +static pci_ers_result_t xe_pci_error_slot_reset(struct pci_dev *pdev)
>> +{
>> +	const struct pci_device_id *ent = pci_match_id(pdev->driver->id_table, pdev);
>> +	struct xe_device *xe = pdev_to_xe_device(pdev);
>> +
>> +	dev_err(&pdev->dev, "PCI slot reset\n");
>> +
>> +	pci_restore_state(pdev);
>> +
>> +	if (pci_enable_device(pdev)) {
>> +		dev_err(&pdev->dev,
>> +			"Cannot re-enable PCI device after reset\n");
>> +		return PCI_ERS_RESULT_DISCONNECT;
>> +	}
>> +
>> +	/*
>> +	 * Secondary Bus Reset wipes out all device memory
>> +	 * requiring XE KMD to perform a device removal and reprobe.
>> +	 */
>> +	pdev->driver->remove(pdev);
> 
> what will happen to all devm/drmm resources that previous xe_pci_probe() has allocated?

There is no exported function that releases all device resources, so I
added all devres allocations to a group in patch [1]. Any suggestions
here would be helpful.

[1] https://lore.kernel.org/intel-xe/20260122100613.3631582-10-riana.tauro@intel.com/T/#mf4cfe9c4693defa9430c10a59b92800ba75b414e

> 
>> +	xe_device_clear_in_recovery(xe);
> 
> is it safe to access this xe after calling remove() ?

Will remove this

> 
>> +
>> +	if (!pdev->driver->probe(pdev, ent))
>> +		return PCI_ERS_RESULT_RECOVERED;
>> +
>> +	return PCI_ERS_RESULT_RECOVERED;
> 
> also recovered? maybe this should be PCI_ERS_RESULT_DISCONNECT

Yes, this is a mistake; it should be PCI_ERS_RESULT_DISCONNECT. Thank
you for catching this.

Thanks
Riana
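A minimal sketch of the corrected return-value logic for xe_pci_error_slot_reset(), with pci_enable_device(), remove() and probe() reduced to their integer results. The enum and function names are hypothetical stand-ins for pci_ers_result_t and the real callback; the point is only that a successful reprobe is the sole path to RECOVERED.

```c
#include <assert.h>

/* Hypothetical stand-in for the two pci_ers_result_t values used here. */
enum sketch_ers {
	SKETCH_ERS_DISCONNECT,
	SKETCH_ERS_RECOVERED,
};

/*
 * enable_err: return value of pci_enable_device() (0 on success)
 * probe_err:  return value of the driver's probe() after remove()
 */
static enum sketch_ers slot_reset_result(int enable_err, int probe_err)
{
	/* The device could not be re-enabled after the reset. */
	if (enable_err)
		return SKETCH_ERS_DISCONNECT;

	/* remove() has run; a failed reprobe leaves no usable driver state. */
	if (probe_err)
		return SKETCH_ERS_DISCONNECT;

	return SKETCH_ERS_RECOVERED;
}
```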

> 
>> +}
>> +
>> +static void xe_pci_error_resume(struct pci_dev *pdev)
>> +{
>> +	dev_info(&pdev->dev, "PCI error resume\n");
>> +}
>> +
>> +const struct pci_error_handlers xe_pci_error_handlers = {
>> +	.error_detected	= xe_pci_error_detected,
>> +	.mmio_enabled	= xe_pci_error_mmio_enabled,
>> +	.slot_reset	= xe_pci_error_slot_reset,
>> +	.resume		= xe_pci_error_resume,
>> +};
> 



* Re: [PATCH 2/8] drm/xe/xe_pci_error: Implement PCI error recovery callbacks
  2026-01-29  9:09   ` Nilawar, Badal
@ 2026-02-02 13:19     ` Nilawar, Badal
  2026-02-03  3:46       ` Riana Tauro
  2026-02-03  3:41     ` Riana Tauro
  1 sibling, 1 reply; 41+ messages in thread
From: Nilawar, Badal @ 2026-02-02 13:19 UTC (permalink / raw)
  To: Riana Tauro, intel-xe
  Cc: anshuman.gupta, rodrigo.vivi, aravind.iddamsetty, raag.jadav,
	ravi.kishore.koppuravuri, mallesh.koujalagi

Added a few more comments.

On 29-01-2026 14:39, Nilawar, Badal wrote:
> Hi Riana,
>
> On 22-01-2026 15:36, Riana Tauro wrote:
>> Add error_detected, mmio_enabled, slot_reset and resume
>> recovery callbacks to handle PCIe Advanced Error Reporting
>> (AER) errors.
>>
>> For fatal errors, the device is wedged and becomes
>> inaccessible. Return PCI_ERS_RESULT_NEED_RESET from
>> error_detected to request a Secondary Bus Reset (SBR).
>>
>> For non-fatal errors, return PCI_ERS_RESULT_CAN_RECOVER from
>> error_detected to trigger the mmio_enabled callback. In this callback,
>> the device is queried to determine the error cause and attempt
>> recovery based on the error type.
>>
>> Once the Secondary Bus Reset (SBR) is completed, the slot_reset callback
>> cleanly removes and reprobes the device to restore functionality.
>>
>> Signed-off-by: Riana Tauro <riana.tauro@intel.com>
>> ---
>>   drivers/gpu/drm/xe/Makefile          |  1 +
>>   drivers/gpu/drm/xe/xe_device.h       | 15 +++++
>>   drivers/gpu/drm/xe/xe_device_types.h |  3 +
>>   drivers/gpu/drm/xe/xe_pci.c          |  3 +
>>   drivers/gpu/drm/xe/xe_pci_error.c    | 85 ++++++++++++++++++++++++++++
>>   5 files changed, 107 insertions(+)
>>   create mode 100644 drivers/gpu/drm/xe/xe_pci_error.c
>>
>> diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
>> index f6650ec3ab42..5581f2180b5c 100644
>> --- a/drivers/gpu/drm/xe/Makefile
>> +++ b/drivers/gpu/drm/xe/Makefile
>> @@ -98,6 +98,7 @@ xe-y += xe_bb.o \
>>       xe_page_reclaim.o \
>>       xe_pat.o \
>>       xe_pci.o \
>> +    xe_pci_error.o \
>>       xe_pci_rebar.o \
>>       xe_pcode.o \
>>       xe_pm.o \
>> diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
>> index 58d7d8b2fea3..81480248eeff 100644
>> --- a/drivers/gpu/drm/xe/xe_device.h
>> +++ b/drivers/gpu/drm/xe/xe_device.h
>> @@ -43,6 +43,21 @@ static inline struct xe_device *ttm_to_xe_device(struct ttm_device *ttm)
>>       return container_of(ttm, struct xe_device, ttm);
>>   }
>>
>> +static inline bool xe_device_is_in_recovery(struct xe_device *xe)
>> +{
>> +    return atomic_read(&xe->in_recovery);
>> +}
>> +
>> +static inline void xe_device_set_in_recovery(struct xe_device *xe)
>> +{
>> +    atomic_set(&xe->in_recovery, 1);
>> +}
>> +
>> +static inline void xe_device_clear_in_recovery(struct xe_device *xe)
>> +{
>> +     atomic_set(&xe->in_recovery, 0);
>> +}
>> +
>>   struct xe_device *xe_device_create(struct pci_dev *pdev,
>>                      const struct pci_device_id *ent);
>>   int xe_device_probe_early(struct xe_device *xe);
>> diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
>> index 944f909a86ad..2d140463dc5e 100644
>> --- a/drivers/gpu/drm/xe/xe_device_types.h
>> +++ b/drivers/gpu/drm/xe/xe_device_types.h
>> @@ -669,6 +669,9 @@ struct xe_device {
>>           bool inconsistent_reset;
>>       } wedged;
>>
>> +    /** @in_recovery: Indicates if device is in recovery */
>> +    atomic_t in_recovery;
>> +
>>       /** @bo_device: Struct to control async free of BOs */
>>       struct xe_bo_dev {
>>           /** @bo_device.async_free: Free worker */
>> diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c
>> index c92cc176f669..e1ee393b7461 100644
>> --- a/drivers/gpu/drm/xe/xe_pci.c
>> +++ b/drivers/gpu/drm/xe/xe_pci.c
>> @@ -1255,6 +1255,8 @@ static const struct dev_pm_ops xe_pm_ops = {
>>   };
>>   #endif
>>
>> +extern const struct pci_error_handlers xe_pci_error_handlers;
>> +
>>   static struct pci_driver xe_pci_driver = {
>>       .name = DRIVER_NAME,
>>       .id_table = pciidlist,
>> @@ -1262,6 +1264,7 @@ static struct pci_driver xe_pci_driver = {
>>       .remove = xe_pci_remove,
>>       .shutdown = xe_pci_shutdown,
>>       .sriov_configure = xe_pci_sriov_configure,
>> +    .err_handler = &xe_pci_error_handlers,
>>   #ifdef CONFIG_PM_SLEEP
>>       .driver.pm = &xe_pm_ops,
>>   #endif
>> diff --git a/drivers/gpu/drm/xe/xe_pci_error.c b/drivers/gpu/drm/xe/xe_pci_error.c
>> new file mode 100644
>> index 000000000000..a3cc01afa179
>> --- /dev/null
>> +++ b/drivers/gpu/drm/xe/xe_pci_error.c
>> @@ -0,0 +1,85 @@
>> +// SPDX-License-Identifier: MIT
>> +/*
>> + * Copyright © 2026 Intel Corporation
>> + */
>> +#include <drm/drm_drv.h>
>> +#include <linux/pci.h>
>> +
>> +#include "xe_device.h"
>> +#include "xe_gt.h"
>> +#include "xe_pci.h"
>> +#include "xe_uc.h"
>> +
>> +static void xe_pci_error_handling(struct pci_dev *pdev)
>> +{
>> +    struct xe_device *xe = pdev_to_xe_device(pdev);
>> +
>> +    xe_device_set_in_recovery(xe);
>> +    xe_device_declare_wedged(xe);
>> +
>> +    pci_disable_device(pdev);
>> +}
>> +
>> +static pci_ers_result_t xe_pci_error_detected(struct pci_dev *pdev, pci_channel_state_t state)
>> +{
>> +    dev_err(&pdev->dev, "PCI error detected, state %d\n", state);
>> +
>> +    switch (state) {
>> +    case pci_channel_io_normal:
>> +        return PCI_ERS_RESULT_CAN_RECOVER;
>> +    case pci_channel_io_frozen:
>> +        xe_pci_error_handling(pdev);
>> +        return PCI_ERS_RESULT_NEED_RESET;
>> +    case pci_channel_io_perm_failure:
>> +        return PCI_ERS_RESULT_DISCONNECT;
>> +    }
>> +
>> +    return PCI_ERS_RESULT_NEED_RESET;
>> +}
>> +
>> +static pci_ers_result_t xe_pci_error_mmio_enabled(struct pci_dev *pdev)
>> +{
>> +    dev_err(&pdev->dev, "PCI mmio enabled\n");
>> +
>> +    return PCI_ERS_RESULT_NEED_RESET;
>> +}
>> +
>> +static pci_ers_result_t xe_pci_error_slot_reset(struct pci_dev *pdev)
>> +{
>> +    const struct pci_device_id *ent = pci_match_id(pdev->driver->id_table, pdev);
>> +    struct xe_device *xe = pdev_to_xe_device(pdev);
>> +
>> +    dev_err(&pdev->dev, "PCI slot reset\n");
>> +
>> +    pci_restore_state(pdev);
> What is the significance of restore state here? In reset path any 
> where pci_save_state() is happening?
>> +
>> +    if (pci_enable_device(pdev)) {
>> +        dev_err(&pdev->dev,
>> +            "Cannot re-enable PCI device after reset\n");
>> +        return PCI_ERS_RESULT_DISCONNECT;
>> +    }
>> +
>> +    /*
>> +     * Secondary Bus Reset wipes out all device memory
>> +     * requiring XE KMD to perform a device removal and reprobe.
>> +     */
>> +    pdev->driver->remove(pdev);
>> +    xe_device_clear_in_recovery(xe);
>> +
>> +    if (!pdev->driver->probe(pdev, ent))
>> +        return PCI_ERS_RESULT_RECOVERED;
>> +
Instead of invoking xe_pci_remove() and xe_pci_probe() for unbind-bind 
operations, we can use device_release_driver() followed by device_attach().
With this approach, the patch "Group all devres to release them on PCIe 
slot reset" becomes unnecessary.

     device_release_driver(&pdev->dev);
     xe_device_clear_in_recovery(xe);
     ret = device_attach(&pdev->dev);
     if (ret == 1)
         return PCI_ERS_RESULT_RECOVERED;

     return PCI_ERS_RESULT_DISCONNECT;

Thanks
Badal

>> +    return PCI_ERS_RESULT_RECOVERED;
>
> Is it correct to return PCI_ERS_RESULT_RECOVERED if driver probe fails?
>
> Thanks, Badal
>
>> +}
>> +
>> +static void xe_pci_error_resume(struct pci_dev *pdev)
>> +{
>> +    dev_info(&pdev->dev, "PCI error resume\n");
>> +}
>> +
>> +const struct pci_error_handlers xe_pci_error_handlers = {
>> +    .error_detected    = xe_pci_error_detected,
>> +    .mmio_enabled    = xe_pci_error_mmio_enabled,
>> +    .slot_reset    = xe_pci_error_slot_reset,
>> +    .resume        = xe_pci_error_resume,
>> +};


* Re: [PATCH 2/8] drm/xe/xe_pci_error: Implement PCI error recovery callbacks
  2026-01-29  9:09   ` Nilawar, Badal
  2026-02-02 13:19     ` Nilawar, Badal
@ 2026-02-03  3:41     ` Riana Tauro
  1 sibling, 0 replies; 41+ messages in thread
From: Riana Tauro @ 2026-02-03  3:41 UTC (permalink / raw)
  To: Nilawar, Badal, intel-xe
  Cc: anshuman.gupta, rodrigo.vivi, aravind.iddamsetty, raag.jadav,
	ravi.kishore.koppuravuri, mallesh.koujalagi

Hi Badal

On 1/29/2026 2:39 PM, Nilawar, Badal wrote:
> Hi Riana,
> 
> On 22-01-2026 15:36, Riana Tauro wrote:
>> Add error_detected, mmio_enabled, slot_reset and resume
>> recovery callbacks to handle PCIe Advanced Error Reporting
>> (AER) errors.
>>
>> For fatal errors, the device is wedged and becomes
>> inaccessible. Return PCI_ERS_RESULT_NEED_RESET from
>> error_detected to request a Secondary Bus Reset (SBR).
>>
>> For non-fatal errors, return PCI_ERS_RESULT_CAN_RECOVER from
>> error_detected to trigger the mmio_enabled callback. In this callback,
>> the device is queried to determine the error cause and attempt
>> recovery based on the error type.
>>
>> Once the Secondary Bus Reset (SBR) is completed, the slot_reset callback
>> cleanly removes and reprobes the device to restore functionality.
>>
>> Signed-off-by: Riana Tauro <riana.tauro@intel.com>
>> ---
>>   drivers/gpu/drm/xe/Makefile          |  1 +
>>   drivers/gpu/drm/xe/xe_device.h       | 15 +++++
>>   drivers/gpu/drm/xe/xe_device_types.h |  3 +
>>   drivers/gpu/drm/xe/xe_pci.c          |  3 +
>>   drivers/gpu/drm/xe/xe_pci_error.c    | 85 ++++++++++++++++++++++++++++
>>   5 files changed, 107 insertions(+)
>>   create mode 100644 drivers/gpu/drm/xe/xe_pci_error.c
>>
>> diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
>> index f6650ec3ab42..5581f2180b5c 100644
>> --- a/drivers/gpu/drm/xe/Makefile
>> +++ b/drivers/gpu/drm/xe/Makefile
>> @@ -98,6 +98,7 @@ xe-y += xe_bb.o \
>>       xe_page_reclaim.o \
>>       xe_pat.o \
>>       xe_pci.o \
>> +    xe_pci_error.o \
>>       xe_pci_rebar.o \
>>       xe_pcode.o \
>>       xe_pm.o \
>> diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
>> index 58d7d8b2fea3..81480248eeff 100644
>> --- a/drivers/gpu/drm/xe/xe_device.h
>> +++ b/drivers/gpu/drm/xe/xe_device.h
>> @@ -43,6 +43,21 @@ static inline struct xe_device *ttm_to_xe_device(struct ttm_device *ttm)
>>       return container_of(ttm, struct xe_device, ttm);
>>   }
>> +static inline bool xe_device_is_in_recovery(struct xe_device *xe)
>> +{
>> +    return atomic_read(&xe->in_recovery);
>> +}
>> +
>> +static inline void xe_device_set_in_recovery(struct xe_device *xe)
>> +{
>> +    atomic_set(&xe->in_recovery, 1);
>> +}
>> +
>> +static inline void xe_device_clear_in_recovery(struct xe_device *xe)
>> +{
>> +     atomic_set(&xe->in_recovery, 0);
>> +}
>> +
>>   struct xe_device *xe_device_create(struct pci_dev *pdev,
>>                      const struct pci_device_id *ent);
>>   int xe_device_probe_early(struct xe_device *xe);
>> diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
>> index 944f909a86ad..2d140463dc5e 100644
>> --- a/drivers/gpu/drm/xe/xe_device_types.h
>> +++ b/drivers/gpu/drm/xe/xe_device_types.h
>> @@ -669,6 +669,9 @@ struct xe_device {
>>           bool inconsistent_reset;
>>       } wedged;
>> +    /** @in_recovery: Indicates if device is in recovery */
>> +    atomic_t in_recovery;
>> +
>>       /** @bo_device: Struct to control async free of BOs */
>>       struct xe_bo_dev {
>>           /** @bo_device.async_free: Free worker */
>> diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c
>> index c92cc176f669..e1ee393b7461 100644
>> --- a/drivers/gpu/drm/xe/xe_pci.c
>> +++ b/drivers/gpu/drm/xe/xe_pci.c
>> @@ -1255,6 +1255,8 @@ static const struct dev_pm_ops xe_pm_ops = {
>>   };
>>   #endif
>> +extern const struct pci_error_handlers xe_pci_error_handlers;
>> +
>>   static struct pci_driver xe_pci_driver = {
>>       .name = DRIVER_NAME,
>>       .id_table = pciidlist,
>> @@ -1262,6 +1264,7 @@ static struct pci_driver xe_pci_driver = {
>>       .remove = xe_pci_remove,
>>       .shutdown = xe_pci_shutdown,
>>       .sriov_configure = xe_pci_sriov_configure,
>> +    .err_handler = &xe_pci_error_handlers,
>>   #ifdef CONFIG_PM_SLEEP
>>       .driver.pm = &xe_pm_ops,
>>   #endif
>> diff --git a/drivers/gpu/drm/xe/xe_pci_error.c b/drivers/gpu/drm/xe/xe_pci_error.c
>> new file mode 100644
>> index 000000000000..a3cc01afa179
>> --- /dev/null
>> +++ b/drivers/gpu/drm/xe/xe_pci_error.c
>> @@ -0,0 +1,85 @@
>> +// SPDX-License-Identifier: MIT
>> +/*
>> + * Copyright © 2026 Intel Corporation
>> + */
>> +#include <drm/drm_drv.h>
>> +#include <linux/pci.h>
>> +
>> +#include "xe_device.h"
>> +#include "xe_gt.h"
>> +#include "xe_pci.h"
>> +#include "xe_uc.h"
>> +
>> +static void xe_pci_error_handling(struct pci_dev *pdev)
>> +{
>> +    struct xe_device *xe = pdev_to_xe_device(pdev);
>> +
>> +    xe_device_set_in_recovery(xe);
>> +    xe_device_declare_wedged(xe);
>> +
>> +    pci_disable_device(pdev);
>> +}
>> +
>> +static pci_ers_result_t xe_pci_error_detected(struct pci_dev *pdev, pci_channel_state_t state)
>> +{
>> +    dev_err(&pdev->dev, "PCI error detected, state %d\n", state);
>> +
>> +    switch (state) {
>> +    case pci_channel_io_normal:
>> +        return PCI_ERS_RESULT_CAN_RECOVER;
>> +    case pci_channel_io_frozen:
>> +        xe_pci_error_handling(pdev);
>> +        return PCI_ERS_RESULT_NEED_RESET;
>> +    case pci_channel_io_perm_failure:
>> +        return PCI_ERS_RESULT_DISCONNECT;
>> +    }
>> +
>> +    return PCI_ERS_RESULT_NEED_RESET;
>> +}
>> +
>> +static pci_ers_result_t xe_pci_error_mmio_enabled(struct pci_dev *pdev)
>> +{
>> +    dev_err(&pdev->dev, "PCI mmio enabled\n");
>> +
>> +    return PCI_ERS_RESULT_NEED_RESET;
>> +}
>> +
>> +static pci_ers_result_t xe_pci_error_slot_reset(struct pci_dev *pdev)
>> +{
>> +    const struct pci_device_id *ent = pci_match_id(pdev->driver->id_table, pdev);
>> +    struct xe_device *xe = pdev_to_xe_device(pdev);
>> +
>> +    dev_err(&pdev->dev, "PCI slot reset\n");
>> +
>> +    pci_restore_state(pdev);
> What is the significance of restore state here? In reset path any where 
> pci_save_state() is happening?


This is called after the SBR, by which point the config space state has
been lost. We need to restore the card to the state it was in before the
error occurred.
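The dependency behind Badal's question (a restore only helps if a save happened before the reset) can be shown with a simplified userspace model. `struct fake_pci_dev` and the `fake_*` helpers are hypothetical. The sketch assumes, as pci_restore_state() does, that restoring is a no-op when no state was saved, and it reduces config space to a few dwords that an SBR returns to zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define CFG_DWORDS 16

/* Hypothetical model of a device's config space and its saved copy. */
struct fake_pci_dev {
	uint32_t cfg[CFG_DWORDS];
	uint32_t saved[CFG_DWORDS];
	bool state_saved;
};

static void fake_save_state(struct fake_pci_dev *d)
{
	memcpy(d->saved, d->cfg, sizeof(d->cfg));
	d->state_saved = true;
}

/* Modeled on pci_restore_state(): does nothing if no state was saved. */
static void fake_restore_state(struct fake_pci_dev *d)
{
	if (!d->state_saved)
		return;
	memcpy(d->cfg, d->saved, sizeof(d->cfg));
}

/* Simplification: an SBR returns config space to all-zero defaults. */
static void fake_secondary_bus_reset(struct fake_pci_dev *d)
{
	memset(d->cfg, 0, sizeof(d->cfg));
}
```

Without a prior save, the restore after the reset leaves the device with its reset defaults.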

>> +
>> +    if (pci_enable_device(pdev)) {
>> +        dev_err(&pdev->dev,
>> +            "Cannot re-enable PCI device after reset\n");
>> +        return PCI_ERS_RESULT_DISCONNECT;
>> +    }
>> +
>> +    /*
>> +     * Secondary Bus Reset wipes out all device memory
>> +     * requiring XE KMD to perform a device removal and reprobe.
>> +     */
>> +    pdev->driver->remove(pdev);
>> +    xe_device_clear_in_recovery(xe);
>> +
>> +    if (!pdev->driver->probe(pdev, ent))
>> +        return PCI_ERS_RESULT_RECOVERED;
>> +
>> +    return PCI_ERS_RESULT_RECOVERED;
> 
> Is it correct to return PCI_ERS_RESULT_RECOVERED if driver probe fails?

This is a mistake. Thanks for catching this. Will fix this

Thanks
Riana

> 
> Thanks, Badal
> 
>> +}
>> +
>> +static void xe_pci_error_resume(struct pci_dev *pdev)
>> +{
>> +    dev_info(&pdev->dev, "PCI error resume\n");
>> +}
>> +
>> +const struct pci_error_handlers xe_pci_error_handlers = {
>> +    .error_detected    = xe_pci_error_detected,
>> +    .mmio_enabled    = xe_pci_error_mmio_enabled,
>> +    .slot_reset    = xe_pci_error_slot_reset,
>> +    .resume        = xe_pci_error_resume,
>> +};



* Re: [PATCH 2/8] drm/xe/xe_pci_error: Implement PCI error recovery callbacks
  2026-02-02 13:19     ` Nilawar, Badal
@ 2026-02-03  3:46       ` Riana Tauro
  0 siblings, 0 replies; 41+ messages in thread
From: Riana Tauro @ 2026-02-03  3:46 UTC (permalink / raw)
  To: Nilawar, Badal, intel-xe
  Cc: anshuman.gupta, rodrigo.vivi, aravind.iddamsetty, raag.jadav,
	ravi.kishore.koppuravuri, mallesh.koujalagi



On 2/2/2026 6:49 PM, Nilawar, Badal wrote:
> Added a few more comments.
> 
> On 29-01-2026 14:39, Nilawar, Badal wrote:
>> Hi Riana,
>>
>> On 22-01-2026 15:36, Riana Tauro wrote:
>>> Add error_detected, mmio_enabled, slot_reset and resume
>>> recovery callbacks to handle PCIe Advanced Error Reporting
>>> (AER) errors.
>>>
>>> For fatal errors, the device is wedged and becomes
>>> inaccessible. Return PCI_ERS_RESULT_NEED_RESET from
>>> error_detected to request a Secondary Bus Reset (SBR).
>>>
>>> For non-fatal errors, return PCI_ERS_RESULT_CAN_RECOVER from
>>> error_detected to trigger the mmio_enabled callback. In this callback,
>>> the device is queried to determine the error cause and attempt
>>> recovery based on the error type.
>>>
>>> Once the Secondary Bus Reset (SBR) is completed, the slot_reset callback
>>> cleanly removes and reprobes the device to restore functionality.
>>>
>>> Signed-off-by: Riana Tauro <riana.tauro@intel.com>
>>> ---
>>>   drivers/gpu/drm/xe/Makefile          |  1 +
>>>   drivers/gpu/drm/xe/xe_device.h       | 15 +++++
>>>   drivers/gpu/drm/xe/xe_device_types.h |  3 +
>>>   drivers/gpu/drm/xe/xe_pci.c          |  3 +
>>>   drivers/gpu/drm/xe/xe_pci_error.c    | 85 ++++++++++++++++++++++++++++
>>>   5 files changed, 107 insertions(+)
>>>   create mode 100644 drivers/gpu/drm/xe/xe_pci_error.c
>>>
>>> diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
>>> index f6650ec3ab42..5581f2180b5c 100644
>>> --- a/drivers/gpu/drm/xe/Makefile
>>> +++ b/drivers/gpu/drm/xe/Makefile
>>> @@ -98,6 +98,7 @@ xe-y += xe_bb.o \
>>>       xe_page_reclaim.o \
>>>       xe_pat.o \
>>>       xe_pci.o \
>>> +    xe_pci_error.o \
>>>       xe_pci_rebar.o \
>>>       xe_pcode.o \
>>>       xe_pm.o \
>>> diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
>>> index 58d7d8b2fea3..81480248eeff 100644
>>> --- a/drivers/gpu/drm/xe/xe_device.h
>>> +++ b/drivers/gpu/drm/xe/xe_device.h
>>> @@ -43,6 +43,21 @@ static inline struct xe_device *ttm_to_xe_device(struct ttm_device *ttm)
>>>       return container_of(ttm, struct xe_device, ttm);
>>>   }
>>>
>>> +static inline bool xe_device_is_in_recovery(struct xe_device *xe)
>>> +{
>>> +    return atomic_read(&xe->in_recovery);
>>> +}
>>> +
>>> +static inline void xe_device_set_in_recovery(struct xe_device *xe)
>>> +{
>>> +    atomic_set(&xe->in_recovery, 1);
>>> +}
>>> +
>>> +static inline void xe_device_clear_in_recovery(struct xe_device *xe)
>>> +{
>>> +     atomic_set(&xe->in_recovery, 0);
>>> +}
>>> +
>>>   struct xe_device *xe_device_create(struct pci_dev *pdev,
>>>                      const struct pci_device_id *ent);
>>>   int xe_device_probe_early(struct xe_device *xe);
>>> diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
>>> index 944f909a86ad..2d140463dc5e 100644
>>> --- a/drivers/gpu/drm/xe/xe_device_types.h
>>> +++ b/drivers/gpu/drm/xe/xe_device_types.h
>>> @@ -669,6 +669,9 @@ struct xe_device {
>>>           bool inconsistent_reset;
>>>       } wedged;
>>>
>>> +    /** @in_recovery: Indicates if device is in recovery */
>>> +    atomic_t in_recovery;
>>> +
>>>       /** @bo_device: Struct to control async free of BOs */
>>>       struct xe_bo_dev {
>>>           /** @bo_device.async_free: Free worker */
>>> diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c
>>> index c92cc176f669..e1ee393b7461 100644
>>> --- a/drivers/gpu/drm/xe/xe_pci.c
>>> +++ b/drivers/gpu/drm/xe/xe_pci.c
>>> @@ -1255,6 +1255,8 @@ static const struct dev_pm_ops xe_pm_ops = {
>>>   };
>>>   #endif
>>>
>>> +extern const struct pci_error_handlers xe_pci_error_handlers;
>>> +
>>>   static struct pci_driver xe_pci_driver = {
>>>       .name = DRIVER_NAME,
>>>       .id_table = pciidlist,
>>> @@ -1262,6 +1264,7 @@ static struct pci_driver xe_pci_driver = {
>>>       .remove = xe_pci_remove,
>>>       .shutdown = xe_pci_shutdown,
>>>       .sriov_configure = xe_pci_sriov_configure,
>>> +    .err_handler = &xe_pci_error_handlers,
>>>   #ifdef CONFIG_PM_SLEEP
>>>       .driver.pm = &xe_pm_ops,
>>>   #endif
>>> diff --git a/drivers/gpu/drm/xe/xe_pci_error.c b/drivers/gpu/drm/xe/xe_pci_error.c
>>> new file mode 100644
>>> index 000000000000..a3cc01afa179
>>> --- /dev/null
>>> +++ b/drivers/gpu/drm/xe/xe_pci_error.c
>>> @@ -0,0 +1,85 @@
>>> +// SPDX-License-Identifier: MIT
>>> +/*
>>> + * Copyright © 2026 Intel Corporation
>>> + */
>>> +#include <drm/drm_drv.h>
>>> +#include <linux/pci.h>
>>> +
>>> +#include "xe_device.h"
>>> +#include "xe_gt.h"
>>> +#include "xe_pci.h"
>>> +#include "xe_uc.h"
>>> +
>>> +static void xe_pci_error_handling(struct pci_dev *pdev)
>>> +{
>>> +    struct xe_device *xe = pdev_to_xe_device(pdev);
>>> +
>>> +    xe_device_set_in_recovery(xe);
>>> +    xe_device_declare_wedged(xe);
>>> +
>>> +    pci_disable_device(pdev);
>>> +}
>>> +
>>> +static pci_ers_result_t xe_pci_error_detected(struct pci_dev *pdev, pci_channel_state_t state)
>>> +{
>>> +    dev_err(&pdev->dev, "PCI error detected, state %d\n", state);
>>> +
>>> +    switch (state) {
>>> +    case pci_channel_io_normal:
>>> +        return PCI_ERS_RESULT_CAN_RECOVER;
>>> +    case pci_channel_io_frozen:
>>> +        xe_pci_error_handling(pdev);
>>> +        return PCI_ERS_RESULT_NEED_RESET;
>>> +    case pci_channel_io_perm_failure:
>>> +        return PCI_ERS_RESULT_DISCONNECT;
>>> +    }
>>> +
>>> +    return PCI_ERS_RESULT_NEED_RESET;
>>> +}
>>> +
>>> +static pci_ers_result_t xe_pci_error_mmio_enabled(struct pci_dev *pdev)
>>> +{
>>> +    dev_err(&pdev->dev, "PCI mmio enabled\n");
>>> +
>>> +    return PCI_ERS_RESULT_NEED_RESET;
>>> +}
>>> +
>>> +static pci_ers_result_t xe_pci_error_slot_reset(struct pci_dev *pdev)
>>> +{
>>> +    const struct pci_device_id *ent = pci_match_id(pdev->driver->id_table, pdev);
>>> +    struct xe_device *xe = pdev_to_xe_device(pdev);
>>> +
>>> +    dev_err(&pdev->dev, "PCI slot reset\n");
>>> +
>>> +    pci_restore_state(pdev);
>> What is the significance of restore state here? In reset path any 
>> where pci_save_state() is happening?
>>> +
>>> +    if (pci_enable_device(pdev)) {
>>> +        dev_err(&pdev->dev,
>>> +            "Cannot re-enable PCI device after reset\n");
>>> +        return PCI_ERS_RESULT_DISCONNECT;
>>> +    }
>>> +
>>> +    /*
>>> +     * Secondary Bus Reset wipes out all device memory
>>> +     * requiring XE KMD to perform a device removal and reprobe.
>>> +     */
>>> +    pdev->driver->remove(pdev);
>>> +    xe_device_clear_in_recovery(xe);
>>> +
>>> +    if (!pdev->driver->probe(pdev, ent))
>>> +        return PCI_ERS_RESULT_RECOVERED;
>>> +
> Instead of invoking xe_pci_remove() and xe_pci_probe() for unbind-bind 
> operations, we can use device_release_driver() followed by device_attach().
> With this approach, the patch "Group all devres to release them on PCIe 
> slot reset" becomes unnecessary.
> 
>      device_release_driver(&pdev->dev);
>      xe_device_clear_in_recovery(xe);
>      ret = device_attach(&pdev->dev);
>      if (ret == 1)
>          return PCI_ERS_RESULT_RECOVERED;
> 
>      return PCI_ERS_RESULT_DISCONNECT;


I tried this before; it causes a deadlock. device_attach() and
device_release_driver() both take the device lock, which the AER core
already holds while it calls the driver's error-handling callbacks:

static int report_slot_reset(struct pci_dev *dev, void *data)
{
....

	device_lock(&dev->dev);


static int __device_attach(struct device *dev, bool allow_async)
{
....

	device_lock(dev);
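The self-deadlock described above can be modeled in userspace with a non-recursive lock that reports re-acquisition instead of blocking. The `fake_*` functions are hypothetical stand-ins for device_lock(), device_attach() and the AER core's report_slot_reset(); in the kernel the second device_lock() would block forever rather than return -EDEADLK. The model returns 1 on a successful bind, matching device_attach()'s documented return value.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical non-recursive "device lock"; models device_lock() only. */
struct fake_dev_lock {
	bool held;
};

/* Returns -EDEADLK where the real device_lock() would hang forever. */
static int fake_device_lock(struct fake_dev_lock *l)
{
	if (l->held)
		return -EDEADLK;
	l->held = true;
	return 0;
}

static void fake_device_unlock(struct fake_dev_lock *l)
{
	l->held = false;
}

/* Models __device_attach(): takes the device lock itself. */
static int fake_device_attach(struct fake_dev_lock *l)
{
	int ret = fake_device_lock(l);

	if (ret)
		return ret;
	/* ... match and bind the driver ... */
	fake_device_unlock(l);
	return 1;	/* a driver was bound */
}

/* Models report_slot_reset(): holds the lock across ->slot_reset(). */
static int fake_report_slot_reset(struct fake_dev_lock *l,
				  int (*slot_reset)(struct fake_dev_lock *))
{
	int locked = fake_device_lock(l);
	int ret;

	assert(locked == 0);
	ret = slot_reset(l);
	fake_device_unlock(l);
	return ret;
}

/* slot_reset() that re-binds through the driver core, as suggested. */
static int slot_reset_via_attach(struct fake_dev_lock *l)
{
	return fake_device_attach(l);
}

/* slot_reset() that calls the driver callbacks directly, as in the patch. */
static int slot_reset_direct(struct fake_dev_lock *l)
{
	(void)l;
	return 1;
}
```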

Thanks
Riana



> 
> Thanks
> Badal
> 
>>> +    return PCI_ERS_RESULT_RECOVERED;
>>
>> Is it correct to return PCI_ERS_RESULT_RECOVERED if driver probe fails?
>>
>> Thanks, Badal
>>
>>> +}
>>> +
>>> +static void xe_pci_error_resume(struct pci_dev *pdev)
>>> +{
>>> +    dev_info(&pdev->dev, "PCI error resume\n");
>>> +}
>>> +
>>> +const struct pci_error_handlers xe_pci_error_handlers = {
>>> +    .error_detected    = xe_pci_error_detected,
>>> +    .mmio_enabled    = xe_pci_error_mmio_enabled,
>>> +    .slot_reset    = xe_pci_error_slot_reset,
>>> +    .resume        = xe_pci_error_resume,
>>> +};



* Re: [PATCH 5/8] drm/xe/xe_ras: Initialize Uncorrectable AER Registers
  2026-01-22 10:06 ` [PATCH 5/8] drm/xe/xe_ras: Initialize Uncorrectable AER Registers Riana Tauro
  2026-01-27 12:41   ` Mallesh, Koujalagi
@ 2026-02-04  8:38   ` Aravind Iddamsetty
  2026-02-16 12:27   ` Mallesh, Koujalagi
  2 siblings, 0 replies; 41+ messages in thread
From: Aravind Iddamsetty @ 2026-02-04  8:38 UTC (permalink / raw)
  To: Riana Tauro, intel-xe@lists.freedesktop.org
  Cc: anshuman.gupta, rodrigo.vivi, badal.nilawar, raag.jadav,
	ravi.kishore.koppuravuri, mallesh.koujalagi


Hi Riana,

On 22-01-2026 15:36, Riana Tauro wrote:
> Uncorrectable errors from different endpoints in the device are steered to
> the USP which is a PCI Advanced Error Reporting (AER) Compliant device.
> Downgrade all the errors to non-fatal to prevent PCIe bus driver
> from triggering a Secondary Bus Reset (SBR). This allows error
> detection, containment and recovery in the driver.
>
> The Uncorrectable Error Severity Register has the 'Uncorrectable
> Internal Error Severity' set to fatal by default. Set this to
> non-fatal and unmask the error.
>
> Signed-off-by: Riana Tauro <riana.tauro@intel.com>
> ---
>  drivers/gpu/drm/xe/Makefile    |  1 +
>  drivers/gpu/drm/xe/xe_device.c |  3 ++
>  drivers/gpu/drm/xe/xe_ras.c    | 71 ++++++++++++++++++++++++++++++++++
>  drivers/gpu/drm/xe/xe_ras.h    | 13 +++++++
>  4 files changed, 88 insertions(+)
>  create mode 100644 drivers/gpu/drm/xe/xe_ras.c
>  create mode 100644 drivers/gpu/drm/xe/xe_ras.h
>
> diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
> index 5581f2180b5c..85ec53eb0b62 100644
> --- a/drivers/gpu/drm/xe/Makefile
> +++ b/drivers/gpu/drm/xe/Makefile
> @@ -110,6 +110,7 @@ xe-y += xe_bb.o \
>  	xe_pxp_debugfs.o \
>  	xe_pxp_submit.o \
>  	xe_query.o \
> +	xe_ras.o \
>  	xe_range_fence.o \
>  	xe_reg_sr.o \
>  	xe_reg_whitelist.o \
> diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
> index f418ebf04f0f..be89ffc9eade 100644
> --- a/drivers/gpu/drm/xe/xe_device.c
> +++ b/drivers/gpu/drm/xe/xe_device.c
> @@ -59,6 +59,7 @@
>  #include "xe_psmi.h"
>  #include "xe_pxp.h"
>  #include "xe_query.h"
> +#include "xe_ras.h"
>  #include "xe_shrinker.h"
>  #include "xe_soc_remapper.h"
>  #include "xe_survivability_mode.h"
> @@ -1019,6 +1020,8 @@ int xe_device_probe(struct xe_device *xe)
>  
>  	xe_vsec_init(xe);
>  
> +	xe_ras_init(xe);
> +
>  	err = xe_sriov_init_late(xe);
>  	if (err)
>  		goto err_unregister_display;
> diff --git a/drivers/gpu/drm/xe/xe_ras.c b/drivers/gpu/drm/xe/xe_ras.c
> new file mode 100644
> index 000000000000..ba5ed37aed28
> --- /dev/null
> +++ b/drivers/gpu/drm/xe/xe_ras.c
> @@ -0,0 +1,71 @@
> +// SPDX-License-Identifier: MIT
> +/*
> + * Copyright © 2026 Intel Corporation
> + */
> +#include <linux/pci.h>
> +
> +#include "xe_device_types.h"
> +#include "xe_ras.h"
> +
> +#ifdef CONFIG_PCIEAER
> +static void unmask_and_downgrade_internal_error(struct xe_device *xe)
> +{
> +	struct pci_dev *pdev = to_pci_dev(xe->drm.dev);
> +	struct pci_dev *vsp, *usp;
> +	u32 aer_uncorr_sev, aer_uncorr_mask;
> +	u16 aer_cap;
> +
> +	 /* Gfx Device Hierarchy: USP-->VSP-->SGunit */
> +	vsp = pci_upstream_bridge(pdev);
> +	if (!vsp)
> +		return;
> +
> +	usp = pci_upstream_bridge(vsp);
> +	if (!usp)
> +		return;
> +
> +	aer_cap = usp->aer_cap;
> +
> +	if (!aer_cap)
> +		return;
> +
> +	/*
> +	 * All errors are steered to USP which is a PCIe AER Compliant device.
> +	 * Downgrade all the errors to non-fatal to prevent PCIe bus driver
> +	 * from triggering a Secondary Bus Reset (SBR). This allows error
> +	 * detection, containment and recovery in the driver.
> +	 *
> +	 * The Uncorrectable Error Severity Register has the 'Uncorrectable
> +	 * Internal Error Severity' set to fatal by default. Set this to
> +	 * non-fatal and unmask the error.
> +	 */
> +

Before unmasking the PCI_ERR_UNC_INTN bit, we should clear any stale event
in the PCI_ERR_UNCOR_STATUS register, since it would be signaled as soon as
we unmask the bit (assuming the bit wasn't already unmasked).

There is a pci_aer_unmask_internal_errors() helper defined in
drivers/pci/pcie/aer.c which we could probably use by exporting it.

Also, do you think it makes more sense to move this to PCI quirks? In a
virtualized environment (passthrough model), the Xe KMD might run in a VM
while the USP stays in the host, in which case this might not work.
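The suggested ordering (clear any stale Uncorrectable Internal Error status, downgrade its severity, then unmask it last) can be sketched against a fake AER capability. The offsets and the PCI_ERR_UNC_INTN value mirror include/uapi/linux/pci_regs.h; `struct fake_aer_cap`, `aer_read()`, `aer_write()` and `downgrade_and_unmask_intn()` are hypothetical. A real driver would use pci_read_config_dword()/pci_write_config_dword() at `usp->aer_cap + offset`, as the patch does.

```c
#include <assert.h>
#include <stdint.h>

/* Values as in include/uapi/linux/pci_regs.h (AER capability offsets). */
#define PCI_ERR_UNCOR_STATUS	0x04	/* write-1-to-clear */
#define PCI_ERR_UNCOR_MASK	0x08
#define PCI_ERR_UNCOR_SEVER	0x0c
#define PCI_ERR_UNC_INTN	0x00400000	/* Uncorrectable Internal Error */

/* Hypothetical model of the first four dwords of an AER capability. */
struct fake_aer_cap {
	uint32_t regs[4];	/* indexed by offset / 4 */
};

static uint32_t aer_read(const struct fake_aer_cap *cap, int off)
{
	return cap->regs[off / 4];
}

static void aer_write(struct fake_aer_cap *cap, int off, uint32_t val)
{
	if (off == PCI_ERR_UNCOR_STATUS)
		cap->regs[off / 4] &= ~val;	/* RW1C: writing 1 clears */
	else
		cap->regs[off / 4] = val;
}

static void downgrade_and_unmask_intn(struct fake_aer_cap *cap)
{
	uint32_t v;

	/* 1. Clear a stale Internal Error so it does not fire on unmask. */
	aer_write(cap, PCI_ERR_UNCOR_STATUS, PCI_ERR_UNC_INTN);

	/* 2. Make Internal Errors non-fatal. */
	v = aer_read(cap, PCI_ERR_UNCOR_SEVER);
	aer_write(cap, PCI_ERR_UNCOR_SEVER, v & ~PCI_ERR_UNC_INTN);

	/* 3. Unmask last, leaving all other bits untouched. */
	v = aer_read(cap, PCI_ERR_UNCOR_MASK);
	aer_write(cap, PCI_ERR_UNCOR_MASK, v & ~PCI_ERR_UNC_INTN);
}
```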

> +	/* Initialize Uncorrectable Error Severity Register */
> +	pci_read_config_dword(usp, aer_cap + PCI_ERR_UNCOR_SEVER, &aer_uncorr_sev);
> +	aer_uncorr_sev &= ~PCI_ERR_UNC_INTN;
> +	pci_write_config_dword(usp, aer_cap + PCI_ERR_UNCOR_SEVER, aer_uncorr_sev);
> +
> +	/* Initialize Uncorrectable Error Mask Register */
> +	pci_read_config_dword(usp, aer_cap + PCI_ERR_UNCOR_MASK, &aer_uncorr_mask);
> +	aer_uncorr_mask &= ~PCI_ERR_UNC_INTN;
> +	pci_write_config_dword(usp, aer_cap + PCI_ERR_UNCOR_MASK, aer_uncorr_mask);
> +
> +	pci_save_state(usp);
> +}
> +#endif
> +
> +/**
> + * xe_ras_init - Initialize Xe RAS
> + * @xe: xe device instance
> + *
> + * Initialize Xe RAS
> + */
> +void xe_ras_init(struct xe_device *xe)
> +{
> +	if (!xe->info.has_sysctrl)
> +		return;
> +
> +#ifdef CONFIG_PCIEAER
> +	unmask_and_downgrade_internal_error(xe);
> +#endif
> +}
> diff --git a/drivers/gpu/drm/xe/xe_ras.h b/drivers/gpu/drm/xe/xe_ras.h
> new file mode 100644
> index 000000000000..14cb973603e7
> --- /dev/null
> +++ b/drivers/gpu/drm/xe/xe_ras.h
> @@ -0,0 +1,13 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * Copyright © 2026 Intel Corporation
> + */
> +
> +#ifndef _XE_RAS_H_
> +#define _XE_RAS_H_
> +
> +struct xe_device;
> +
> +void xe_ras_init(struct xe_device *xe);
> +
> +#endif
Thanks,
Aravind.



* Re: [PATCH 2/8] drm/xe/xe_pci_error: Implement PCI error recovery callbacks
  2026-01-22 10:06 ` [PATCH 2/8] drm/xe/xe_pci_error: Implement PCI error recovery callbacks Riana Tauro
  2026-01-27 22:49   ` Michal Wajdeczko
  2026-01-29  9:09   ` Nilawar, Badal
@ 2026-02-08  8:02   ` Raag Jadav
  2026-02-24  3:23     ` Riana Tauro
  2026-02-16  8:53   ` Mallesh, Koujalagi
  3 siblings, 1 reply; 41+ messages in thread
From: Raag Jadav @ 2026-02-08  8:02 UTC (permalink / raw)
  To: Riana Tauro
  Cc: intel-xe, anshuman.gupta, rodrigo.vivi, aravind.iddamsetty,
	badal.nilawar, ravi.kishore.koppuravuri, mallesh.koujalagi

On Thu, Jan 22, 2026 at 03:36:14PM +0530, Riana Tauro wrote:
> Add error_detected, mmio_enabled, slot_reset and resume
> recovery callbacks to handle PCIe Advanced Error Reporting
> (AER) errors.
> 
> For fatal errors, the device is wedged and becomes
> inaccessible. Return PCI_ERS_RESULT_NEED_RESET from
> error_detected to request a Secondary Bus Reset (SBR).
> 
> For non-fatal errors, return PCI_ERS_RESULT_CAN_RECOVER from
> error_detected to trigger the mmio_enabled callback. In this callback,
> the device is queried to determine the error cause and attempt
> recovery based on the error type.
> 
> Once the Secondary Bus Reset (SBR) is completed, the slot_reset callback
> cleanly removes and reprobes the device to restore functionality.

...

> +static void xe_pci_error_handling(struct pci_dev *pdev)
> +{
> +	struct xe_device *xe = pdev_to_xe_device(pdev);
> +
> +	xe_device_set_in_recovery(xe);
> +	xe_device_declare_wedged(xe);

Is this the correct usage?

Documentation/gpu/drm-uapi.rst +392

"A 'wedged' device is basically a device that is declared dead by the driver
after exhausting all possible attempts to recover it from driver context."

Raag

> +	pci_disable_device(pdev);
> +}


* Re: [PATCH 2/8] drm/xe/xe_pci_error: Implement PCI error recovery callbacks
  2026-01-22 10:06 ` [PATCH 2/8] drm/xe/xe_pci_error: Implement PCI error recovery callbacks Riana Tauro
                     ` (2 preceding siblings ...)
  2026-02-08  8:02   ` Raag Jadav
@ 2026-02-16  8:53   ` Mallesh, Koujalagi
  2026-02-24  3:26     ` Riana Tauro
  3 siblings, 1 reply; 41+ messages in thread
From: Mallesh, Koujalagi @ 2026-02-16  8:53 UTC (permalink / raw)
  To: Riana Tauro, intel-xe
  Cc: anshuman.gupta, rodrigo.vivi, aravind.iddamsetty, badal.nilawar,
	raag.jadav, ravi.kishore.koppuravuri


Hi Riana,

On 22-01-2026 03:36 pm, Riana Tauro wrote:
> Add error_detected, mmio_enabled, slot_reset and resume
> recovery callbacks to handle PCIe Advanced Error Reporting
> (AER) errors.
>
> For fatal errors, the device is wedged and becomes
> inaccessible. Return PCI_ERS_RESULT_NEED_RESET from
> error_detected to request a Secondary Bus Reset (SBR).
>
> For non-fatal errors, return PCI_ERS_RESULT_CAN_RECOVER from
> error_detected to trigger the mmio_enabled callback. In this callback,
> the device is queried to determine the error cause and attempt
> recovery based on the error type.
>
> Once the Secondary Bus Reset (SBR) completes, the slot_reset callback
> cleanly removes and reprobes the device to restore functionality.
>
> Signed-off-by: Riana Tauro<riana.tauro@intel.com>
> ---
>   drivers/gpu/drm/xe/Makefile          |  1 +
>   drivers/gpu/drm/xe/xe_device.h       | 15 +++++
>   drivers/gpu/drm/xe/xe_device_types.h |  3 +
>   drivers/gpu/drm/xe/xe_pci.c          |  3 +
>   drivers/gpu/drm/xe/xe_pci_error.c    | 85 ++++++++++++++++++++++++++++
>   5 files changed, 107 insertions(+)
>   create mode 100644 drivers/gpu/drm/xe/xe_pci_error.c
>
> diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
> index f6650ec3ab42..5581f2180b5c 100644
> --- a/drivers/gpu/drm/xe/Makefile
> +++ b/drivers/gpu/drm/xe/Makefile
> @@ -98,6 +98,7 @@ xe-y += xe_bb.o \
>   	xe_page_reclaim.o \
>   	xe_pat.o \
>   	xe_pci.o \
> +	xe_pci_error.o \
>   	xe_pci_rebar.o \
>   	xe_pcode.o \
>   	xe_pm.o \
> diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
> index 58d7d8b2fea3..81480248eeff 100644
> --- a/drivers/gpu/drm/xe/xe_device.h
> +++ b/drivers/gpu/drm/xe/xe_device.h
> @@ -43,6 +43,21 @@ static inline struct xe_device *ttm_to_xe_device(struct ttm_device *ttm)
>   	return container_of(ttm, struct xe_device, ttm);
>   }
>   
> +static inline bool xe_device_is_in_recovery(struct xe_device *xe)
> +{
> +	return atomic_read(&xe->in_recovery);
> +}
> +
> +static inline void xe_device_set_in_recovery(struct xe_device *xe)
> +{
> +	atomic_set(&xe->in_recovery, 1);
> +}
> +
> +static inline void xe_device_clear_in_recovery(struct xe_device *xe)
> +{
> +	 atomic_set(&xe->in_recovery, 0);
> +}
> +
>   struct xe_device *xe_device_create(struct pci_dev *pdev,
>   				   const struct pci_device_id *ent);
>   int xe_device_probe_early(struct xe_device *xe);
> diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
> index 944f909a86ad..2d140463dc5e 100644
> --- a/drivers/gpu/drm/xe/xe_device_types.h
> +++ b/drivers/gpu/drm/xe/xe_device_types.h
> @@ -669,6 +669,9 @@ struct xe_device {
>   		bool inconsistent_reset;
>   	} wedged;
>   
> +	/** @in_recovery: Indicates if device is in recovery */
> +	atomic_t in_recovery;
> +
>   	/** @bo_device: Struct to control async free of BOs */
>   	struct xe_bo_dev {
>   		/** @bo_device.async_free: Free worker */
> diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c
> index c92cc176f669..e1ee393b7461 100644
> --- a/drivers/gpu/drm/xe/xe_pci.c
> +++ b/drivers/gpu/drm/xe/xe_pci.c
> @@ -1255,6 +1255,8 @@ static const struct dev_pm_ops xe_pm_ops = {
>   };
>   #endif
>   
> +extern const struct pci_error_handlers xe_pci_error_handlers;
> +
>   static struct pci_driver xe_pci_driver = {
>   	.name = DRIVER_NAME,
>   	.id_table = pciidlist,
> @@ -1262,6 +1264,7 @@ static struct pci_driver xe_pci_driver = {
>   	.remove = xe_pci_remove,
>   	.shutdown = xe_pci_shutdown,
>   	.sriov_configure = xe_pci_sriov_configure,
> +	.err_handler = &xe_pci_error_handlers,
>   #ifdef CONFIG_PM_SLEEP
>   	.driver.pm = &xe_pm_ops,
>   #endif
> diff --git a/drivers/gpu/drm/xe/xe_pci_error.c b/drivers/gpu/drm/xe/xe_pci_error.c
> new file mode 100644
> index 000000000000..a3cc01afa179
> --- /dev/null
> +++ b/drivers/gpu/drm/xe/xe_pci_error.c
> @@ -0,0 +1,85 @@
> +// SPDX-License-Identifier: MIT
> +/*
> + * Copyright © 2026 Intel Corporation
> + */
> +#include <drm/drm_drv.h>
> +#include <linux/pci.h>
> +
> +#include "xe_device.h"
> +#include "xe_gt.h"
> +#include "xe_pci.h"
> +#include "xe_uc.h"
> +
> +static void xe_pci_error_handling(struct pci_dev *pdev)
> +{
> +	struct xe_device *xe = pdev_to_xe_device(pdev);
> +
> +	xe_device_set_in_recovery(xe);
> +	xe_device_declare_wedged(xe);
> +
> +	pci_disable_device(pdev);
> +}
> +
> +static pci_ers_result_t xe_pci_error_detected(struct pci_dev *pdev, pci_channel_state_t state)
> +{
> +	dev_err(&pdev->dev, "PCI error detected, state %d\n", state);
> +
Shouldn't we set the recovery flag here?
> +	switch (state) {
> +	case pci_channel_io_normal:
> +		return PCI_ERS_RESULT_CAN_RECOVER;
> +	case pci_channel_io_frozen:
> +		xe_pci_error_handling(pdev);
> +		return PCI_ERS_RESULT_NEED_RESET;
> +	case pci_channel_io_perm_failure:
> +		return PCI_ERS_RESULT_DISCONNECT;
> +	}
> +
> +	return PCI_ERS_RESULT_NEED_RESET;
Please add a default case that reports "Unknown channel state" with dev_err()
and returns PCI_ERS_RESULT_NEED_RESET.
> +}
> +
> +static pci_ers_result_t xe_pci_error_mmio_enabled(struct pci_dev *pdev)
> +{
> +	dev_err(&pdev->dev, "PCI mmio enabled\n");
> +
> +	return PCI_ERS_RESULT_NEED_RESET;
> +}
> +
> +static pci_ers_result_t xe_pci_error_slot_reset(struct pci_dev *pdev)
> +{
> +	const struct pci_device_id *ent = pci_match_id(pdev->driver->id_table, pdev);
> +	struct xe_device *xe = pdev_to_xe_device(pdev);
> +
Please check whether xe is NULL, and return PCI_ERS_RESULT_DISCONNECT if it is.
> +	dev_err(&pdev->dev, "PCI slot reset\n");
> +
> +	pci_restore_state(pdev);
> +
> +	if (pci_enable_device(pdev)) {
> +		dev_err(&pdev->dev,
> +			"Cannot re-enable PCI device after reset\n");
> +		return PCI_ERS_RESULT_DISCONNECT;
> +	}
> +
> +	/*
> +	 * Secondary Bus Reset wipes out all device memory
> +	 * requiring XE KMD to perform a device removal and reprobe.
> +	 */
> +	pdev->driver->remove(pdev);
> +	xe_device_clear_in_recovery(xe);
> +
> +	if (!pdev->driver->probe(pdev, ent))
> +		return PCI_ERS_RESULT_RECOVERED;
> +
> +	return PCI_ERS_RESULT_RECOVERED;
> +}
> +
> +static void xe_pci_error_resume(struct pci_dev *pdev)
> +{
> +	dev_info(&pdev->dev, "PCI error resume\n");

We need to clear the recovery flag here (if it is not already cleared) so
that normal operation can resume.

Thanks

-/Mallesh

> +}
> +
> +const struct pci_error_handlers xe_pci_error_handlers = {
> +	.error_detected	= xe_pci_error_detected,
> +	.mmio_enabled	= xe_pci_error_mmio_enabled,
> +	.slot_reset	= xe_pci_error_slot_reset,
> +	.resume		= xe_pci_error_resume,
> +};
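A minimal userspace sketch of the changes requested in this review, assuming they are wanted as stated. It sets the recovery flag in error_detected, adds a default case that logs and returns NEED_RESET, bails out of slot_reset when there is no driver data, and clears the flag in resume. `struct fake_xe`, the enum names and the `reprobe_ok` parameter are hypothetical stand-ins, and a C11 `atomic_int` plays the role of the patch's `atomic_t in_recovery`. The posted slot_reset returns PCI_ERS_RESULT_RECOVERED on both branches of the probe() check. This sketch returns DISCONNECT when reprobing fails, which is an assumption about the intended behaviour.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for pci_channel_state_t and pci_ers_result_t. */
enum channel_state { IO_NORMAL = 1, IO_FROZEN, IO_PERM_FAILURE };
enum ers_result { ERS_CAN_RECOVER, ERS_NEED_RESET, ERS_DISCONNECT, ERS_RECOVERED };

/* Hypothetical stand-in for struct xe_device: only the flag is modelled. */
struct fake_xe {
	atomic_int in_recovery;
};

static enum ers_result error_detected(struct fake_xe *xe, int state)
{
	/* Flag recovery before any other handling, as asked in review. */
	atomic_store(&xe->in_recovery, 1);

	switch (state) {
	case IO_NORMAL:
		return ERS_CAN_RECOVER;
	case IO_FROZEN:
		/* the patch wedges and disables the device at this point */
		return ERS_NEED_RESET;
	case IO_PERM_FAILURE:
		return ERS_DISCONNECT;
	default:
		fprintf(stderr, "Unknown channel state %d\n", state);
		return ERS_NEED_RESET;
	}
}

/* @reprobe_ok models whether remove() + probe() succeeded. */
static enum ers_result slot_reset(struct fake_xe *xe, bool reprobe_ok)
{
	if (!xe)
		return ERS_DISCONNECT;	/* no driver data to recover */

	return reprobe_ok ? ERS_RECOVERED : ERS_DISCONNECT;
}

static void resume(struct fake_xe *xe)
{
	assert(xe != NULL);
	/* Clear the flag so normal operation can continue. */
	atomic_store(&xe->in_recovery, 0);
}
```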

* Re: [PATCH 5/8] drm/xe/xe_ras: Initialize Uncorrectable AER Registers
  2026-01-22 10:06 ` [PATCH 5/8] drm/xe/xe_ras: Initialize Uncorrectable AER Registers Riana Tauro
  2026-01-27 12:41   ` Mallesh, Koujalagi
  2026-02-04  8:38   ` Aravind Iddamsetty
@ 2026-02-16 12:27   ` Mallesh, Koujalagi
  2026-02-18 14:48     ` Riana Tauro
  2 siblings, 1 reply; 41+ messages in thread
From: Mallesh, Koujalagi @ 2026-02-16 12:27 UTC (permalink / raw)
  To: Riana Tauro, intel-xe
  Cc: anshuman.gupta, rodrigo.vivi, aravind.iddamsetty, badal.nilawar,
	raag.jadav, ravi.kishore.koppuravuri

+Adding more comments

Hi Riana,

On 22-01-2026 03:36 pm, Riana Tauro wrote:
> Uncorrectable errors from different endpoints in the device are steered to
> the USP which is a PCI Advanced Error Reporting (AER) Compliant device.
> Downgrade all the errors to non-fatal to prevent PCIe bus driver
> from triggering a Secondary Bus Reset (SBR). This allows error
> detection, containment and recovery in the driver.
>
> The Uncorrectable Error Severity Register has the 'Uncorrectable
> Internal Error Severity' set to fatal by default. Set this to
> non-fatal and unmask the error.
>
> Signed-off-by: Riana Tauro <riana.tauro@intel.com>
> ---
>   drivers/gpu/drm/xe/Makefile    |  1 +
>   drivers/gpu/drm/xe/xe_device.c |  3 ++
>   drivers/gpu/drm/xe/xe_ras.c    | 71 ++++++++++++++++++++++++++++++++++
>   drivers/gpu/drm/xe/xe_ras.h    | 13 +++++++
>   4 files changed, 88 insertions(+)
>   create mode 100644 drivers/gpu/drm/xe/xe_ras.c
>   create mode 100644 drivers/gpu/drm/xe/xe_ras.h
>
> diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
> index 5581f2180b5c..85ec53eb0b62 100644
> --- a/drivers/gpu/drm/xe/Makefile
> +++ b/drivers/gpu/drm/xe/Makefile
> @@ -110,6 +110,7 @@ xe-y += xe_bb.o \
>   	xe_pxp_debugfs.o \
>   	xe_pxp_submit.o \
>   	xe_query.o \
> +	xe_ras.o \
>   	xe_range_fence.o \
>   	xe_reg_sr.o \
>   	xe_reg_whitelist.o \
> diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
> index f418ebf04f0f..be89ffc9eade 100644
> --- a/drivers/gpu/drm/xe/xe_device.c
> +++ b/drivers/gpu/drm/xe/xe_device.c
> @@ -59,6 +59,7 @@
>   #include "xe_psmi.h"
>   #include "xe_pxp.h"
>   #include "xe_query.h"
> +#include "xe_ras.h"
>   #include "xe_shrinker.h"
>   #include "xe_soc_remapper.h"
>   #include "xe_survivability_mode.h"
> @@ -1019,6 +1020,8 @@ int xe_device_probe(struct xe_device *xe)
>   
>   	xe_vsec_init(xe);
>   
> +	xe_ras_init(xe);
> +
>   	err = xe_sriov_init_late(xe);
>   	if (err)
>   		goto err_unregister_display;
> diff --git a/drivers/gpu/drm/xe/xe_ras.c b/drivers/gpu/drm/xe/xe_ras.c
> new file mode 100644
> index 000000000000..ba5ed37aed28
> --- /dev/null
> +++ b/drivers/gpu/drm/xe/xe_ras.c
> @@ -0,0 +1,71 @@
> +// SPDX-License-Identifier: MIT
> +/*
> + * Copyright © 2026 Intel Corporation
> + */
> +#include <linux/pci.h>
> +
> +#include "xe_device_types.h"
> +#include "xe_ras.h"
> +
> +#ifdef CONFIG_PCIEAER
> +static void unmask_and_downgrade_internal_error(struct xe_device *xe)
> +{
> +	struct pci_dev *pdev = to_pci_dev(xe->drm.dev);
> +	struct pci_dev *vsp, *usp;
> +	u32 aer_uncorr_sev, aer_uncorr_mask;
> +	u16 aer_cap;
> +
> +	 /* Gfx Device Hierarchy: USP-->VSP-->SGunit */
> +	vsp = pci_upstream_bridge(pdev);
> +	if (!vsp)
> +		return;
> +
> +	usp = pci_upstream_bridge(vsp);
> +	if (!usp)
> +		return;
> +
> +	aer_cap = usp->aer_cap;
> +
> +	if (!aer_cap)
> +		return;
> +
> +	/*
> +	 * All errors are steered to USP which is a PCIe AER Compliant device.
> +	 * Downgrade all the errors to non-fatal to prevent PCIe bus driver
> +	 * from triggering a Secondary Bus Reset (SBR). This allows error
> +	 * detection, containment and recovery in the driver.
> +	 *
> +	 * The Uncorrectable Error Severity Register has the 'Uncorrectable
> +	 * Internal Error Severity' set to fatal by default. Set this to
> +	 * non-fatal and unmask the error.
> +	 */
> +
> +	/* Initialize Uncorrectable Error Severity Register */
> +	pci_read_config_dword(usp, aer_cap + PCI_ERR_UNCOR_SEVER, &aer_uncorr_sev);
> +	aer_uncorr_sev &= ~PCI_ERR_UNC_INTN;
> +	pci_write_config_dword(usp, aer_cap + PCI_ERR_UNCOR_SEVER, aer_uncorr_sev);
> +
> +	/* Initialize Uncorrectable Error Mask Register */
> +	pci_read_config_dword(usp, aer_cap + PCI_ERR_UNCOR_MASK, &aer_uncorr_mask);
> +	aer_uncorr_mask &= ~PCI_ERR_UNC_INTN;
> +	pci_write_config_dword(usp, aer_cap + PCI_ERR_UNCOR_MASK, aer_uncorr_mask);

Please handle pci_read_config_dword()/pci_write_config_dword() failures
(non-zero return) for both the severity and mask registers.

Impact: a silent failure leads to a wrong assumption about the error-handling
configuration. The driver may expect non-fatal errors but still get fatal
errors, causing unexpected resets.


Thanks,

-/Mallesh

> +
> +	pci_save_state(usp);
> +}
> +#endif
> +
> +/**
> + * xe_ras_init - Initialize Xe RAS
> + * @xe: xe device instance
> + *
> + * Initialize Xe RAS
> + */
> +void xe_ras_init(struct xe_device *xe)
> +{
> +	if (!xe->info.has_sysctrl)
> +		return;
> +
> +#ifdef CONFIG_PCIEAER
> +	unmask_and_downgrade_internal_error(xe);
> +#endif
> +}
> diff --git a/drivers/gpu/drm/xe/xe_ras.h b/drivers/gpu/drm/xe/xe_ras.h
> new file mode 100644
> index 000000000000..14cb973603e7
> --- /dev/null
> +++ b/drivers/gpu/drm/xe/xe_ras.h
> @@ -0,0 +1,13 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * Copyright © 2026 Intel Corporation
> + */
> +
> +#ifndef _XE_RAS_H_
> +#define _XE_RAS_H_
> +
> +struct xe_device;
> +
> +void xe_ras_init(struct xe_device *xe);
> +
> +#endif
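A minimal userspace sketch of the checked read-modify-write requested above, with the kernel helpers replaced by stand-ins. `struct fake_usp`, `read_dword()` and `write_dword()` are hypothetical. Failures are reported as a negative errno; in the kernel, pci_read_config_dword() returns a PCIBIOS_* code that pcibios_err_to_errno() converts. The register offsets and the Uncorrectable Internal Error bit are mirrored from include/uapi/linux/pci_regs.h.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* Offsets and bit mirrored from include/uapi/linux/pci_regs.h */
#define PCI_ERR_UNCOR_MASK	0x08		/* Uncorrectable Error Mask */
#define PCI_ERR_UNCOR_SEVER	0x0c		/* Uncorrectable Error Severity */
#define PCI_ERR_UNC_INTN	0x00400000	/* Uncorrectable Internal Error */

/* Hypothetical stand-in for the USP's 4 KiB extended config space. */
struct fake_usp {
	uint32_t cfg[1024];
	int broken;	/* simulate a disconnected device */
};

static int read_dword(struct fake_usp *d, int where, uint32_t *val)
{
	if (d->broken) {
		*val = ~0u;	/* mimic the all-ones value of a failed read */
		return -EIO;
	}
	*val = d->cfg[where / 4];
	return 0;
}

static int write_dword(struct fake_usp *d, int where, uint32_t val)
{
	if (d->broken)
		return -EIO;
	d->cfg[where / 4] = val;
	return 0;
}

/* Read-modify-write that clears @bits and logs any failure. */
static int clear_bits(struct fake_usp *d, int where, uint32_t bits)
{
	uint32_t val;
	int err;

	err = read_dword(d, where, &val);
	if (!err)
		err = write_dword(d, where, val & ~bits);
	if (err)
		fprintf(stderr, "AER register 0x%x update failed: %d\n", where, err);
	return err;
}

/* Downgrade, then unmask, the internal error; stop at the first failure. */
static int unmask_and_downgrade(struct fake_usp *d, int aer_cap)
{
	int err;

	assert(aer_cap >= 0x100);	/* AER is an extended capability */
	err = clear_bits(d, aer_cap + PCI_ERR_UNCOR_SEVER, PCI_ERR_UNC_INTN);
	if (!err)
		err = clear_bits(d, aer_cap + PCI_ERR_UNCOR_MASK, PCI_ERR_UNC_INTN);
	return err;
}
```

Returning the error lets xe_ras_init() log it or record that the downgrade did not take effect. As noted later in the thread, the error handlers stay registered either way.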

* Re: [PATCH 7/8] drm/xe/xe_ras: Add support for Uncorrectable Core-Compute errors
  2026-01-22 10:06 ` [PATCH 7/8] drm/xe/xe_ras: Add support for Uncorrectable Core-Compute errors Riana Tauro
  2026-01-27 11:44   ` Mallesh, Koujalagi
  2026-01-27 14:03   ` Mallesh, Koujalagi
@ 2026-02-17 14:02   ` Raag Jadav
  2026-02-23 14:10     ` Riana Tauro
  2 siblings, 1 reply; 41+ messages in thread
From: Raag Jadav @ 2026-02-17 14:02 UTC (permalink / raw)
  To: Riana Tauro
  Cc: intel-xe, anshuman.gupta, rodrigo.vivi, aravind.iddamsetty,
	badal.nilawar, ravi.kishore.koppuravuri, mallesh.koujalagi

On Thu, Jan 22, 2026 at 03:36:19PM +0530, Riana Tauro wrote:
> Uncorrectable Core-Compute errors are classified into Global and Local
> errors.
> 
> A Global error affects the entire device and requires a
> reset. This type of error is not isolated. When an AER is reported,
> error_detected returns PCI_ERS_RESULT_NEED_RESET.
> 
> A Local error is confined to a specific component or context, such as an
> engine. These errors can be contained and recovered by resetting
> only the affected part without disrupting the rest of the device.
> 
> Upon detection of an Uncorrectable Local Core-Compute error, an AER is
> generated and GuC is notified of the error. The KMD then sets
> the context as non-runnable and initiates an engine reset.
> (TODO: GuC <->KMD communication for the error).
> Since the error is contained and recovered, PCI error handling
> callback returns PCI_ERS_RESULT_RECOVERED.

...

> +/**
> + * xe_ras_process_errors - Process and contain hardware errors
> + * @xe: xe device instance
> + *
> + * Get error details from system controller and return recovery
> + * method. Called only from PCI error handling.
> + *
> + * Returns: PCI_ERS_RESULT_RECOVERED if recovered or if no recovery needed,
> + * PCI_ERS_RESULT_NEED_RESET otherwise.

PCI error codes are unrelated to xe_ras. IMO let's use standard error
codes here and translate them to PCI ones in the callbacks.

Raag

> + */

* Re: [PATCH 5/8] drm/xe/xe_ras: Initialize Uncorrectable AER Registers
  2026-02-16 12:27   ` Mallesh, Koujalagi
@ 2026-02-18 14:48     ` Riana Tauro
  0 siblings, 0 replies; 41+ messages in thread
From: Riana Tauro @ 2026-02-18 14:48 UTC (permalink / raw)
  To: Mallesh, Koujalagi, intel-xe
  Cc: anshuman.gupta, rodrigo.vivi, aravind.iddamsetty, badal.nilawar,
	raag.jadav, ravi.kishore.koppuravuri



On 2/16/2026 5:57 PM, Mallesh, Koujalagi wrote:
> +Adding more comments
> 
> Hi Riana,
> 
> On 22-01-2026 03:36 pm, Riana Tauro wrote:
>> Uncorrectable errors from different endpoints in the device are steered to
>> the USP which is a PCI Advanced Error Reporting (AER) Compliant device.
>> Downgrade all the errors to non-fatal to prevent PCIe bus driver
>> from triggering a Secondary Bus Reset (SBR). This allows error
>> detection, containment and recovery in the driver.
>>
>> The Uncorrectable Error Severity Register has the 'Uncorrectable
>> Internal Error Severity' set to fatal by default. Set this to
>> non-fatal and unmask the error.
>>
>> Signed-off-by: Riana Tauro <riana.tauro@intel.com>
>> ---
>>   drivers/gpu/drm/xe/Makefile    |  1 +
>>   drivers/gpu/drm/xe/xe_device.c |  3 ++
>>   drivers/gpu/drm/xe/xe_ras.c    | 71 ++++++++++++++++++++++++++++++++++
>>   drivers/gpu/drm/xe/xe_ras.h    | 13 +++++++
>>   4 files changed, 88 insertions(+)
>>   create mode 100644 drivers/gpu/drm/xe/xe_ras.c
>>   create mode 100644 drivers/gpu/drm/xe/xe_ras.h
>>
>> diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
>> index 5581f2180b5c..85ec53eb0b62 100644
>> --- a/drivers/gpu/drm/xe/Makefile
>> +++ b/drivers/gpu/drm/xe/Makefile
>> @@ -110,6 +110,7 @@ xe-y += xe_bb.o \
>>       xe_pxp_debugfs.o \
>>       xe_pxp_submit.o \
>>       xe_query.o \
>> +    xe_ras.o \
>>       xe_range_fence.o \
>>       xe_reg_sr.o \
>>       xe_reg_whitelist.o \
>> diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
>> index f418ebf04f0f..be89ffc9eade 100644
>> --- a/drivers/gpu/drm/xe/xe_device.c
>> +++ b/drivers/gpu/drm/xe/xe_device.c
>> @@ -59,6 +59,7 @@
>>   #include "xe_psmi.h"
>>   #include "xe_pxp.h"
>>   #include "xe_query.h"
>> +#include "xe_ras.h"
>>   #include "xe_shrinker.h"
>>   #include "xe_soc_remapper.h"
>>   #include "xe_survivability_mode.h"
>> @@ -1019,6 +1020,8 @@ int xe_device_probe(struct xe_device *xe)
>>       xe_vsec_init(xe);
>> +    xe_ras_init(xe);
>> +
>>       err = xe_sriov_init_late(xe);
>>       if (err)
>>           goto err_unregister_display;
>> diff --git a/drivers/gpu/drm/xe/xe_ras.c b/drivers/gpu/drm/xe/xe_ras.c
>> new file mode 100644
>> index 000000000000..ba5ed37aed28
>> --- /dev/null
>> +++ b/drivers/gpu/drm/xe/xe_ras.c
>> @@ -0,0 +1,71 @@
>> +// SPDX-License-Identifier: MIT
>> +/*
>> + * Copyright © 2026 Intel Corporation
>> + */
>> +#include <linux/pci.h>
>> +
>> +#include "xe_device_types.h"
>> +#include "xe_ras.h"
>> +
>> +#ifdef CONFIG_PCIEAER
>> +static void unmask_and_downgrade_internal_error(struct xe_device *xe)
>> +{
>> +    struct pci_dev *pdev = to_pci_dev(xe->drm.dev);
>> +    struct pci_dev *vsp, *usp;
>> +    u32 aer_uncorr_sev, aer_uncorr_mask;
>> +    u16 aer_cap;
>> +
>> +     /* Gfx Device Hierarchy: USP-->VSP-->SGunit */
>> +    vsp = pci_upstream_bridge(pdev);
>> +    if (!vsp)
>> +        return;
>> +
>> +    usp = pci_upstream_bridge(vsp);
>> +    if (!usp)
>> +        return;
>> +
>> +    aer_cap = usp->aer_cap;
>> +
>> +    if (!aer_cap)
>> +        return;
>> +
>> +    /*
>> +     * All errors are steered to USP which is a PCIe AER Compliant device.
>> +     * Downgrade all the errors to non-fatal to prevent PCIe bus driver
>> +     * from triggering a Secondary Bus Reset (SBR). This allows error
>> +     * detection, containment and recovery in the driver.
>> +     *
>> +     * The Uncorrectable Error Severity Register has the 'Uncorrectable
>> +     * Internal Error Severity' set to fatal by default. Set this to
>> +     * non-fatal and unmask the error.
>> +     */
>> +
>> +    /* Initialize Uncorrectable Error Severity Register */
>> +    pci_read_config_dword(usp, aer_cap + PCI_ERR_UNCOR_SEVER, &aer_uncorr_sev);
>> +    aer_uncorr_sev &= ~PCI_ERR_UNC_INTN;
>> +    pci_write_config_dword(usp, aer_cap + PCI_ERR_UNCOR_SEVER, aer_uncorr_sev);
>> +
>> +    /* Initialize Uncorrectable Error Mask Register */
>> +    pci_read_config_dword(usp, aer_cap + PCI_ERR_UNCOR_MASK, &aer_uncorr_mask);
>> +    aer_uncorr_mask &= ~PCI_ERR_UNC_INTN;
>> +    pci_write_config_dword(usp, aer_cap + PCI_ERR_UNCOR_MASK, aer_uncorr_mask);
> 
> Please handle pci_read_config_dword()/pci_write_config_dword() failures
> (non-zero return) for both the severity and mask registers.
> 

These calls would fail only if the PCI device is disconnected or otherwise
broken, which would cause failures across the whole driver anyway.

We cannot deregister the error handlers even if there is a failure, so
the most I can do is add a log message.

I searched and didn't find many callers that handle these failures.

Thanks
Riana

> Impact: a silent failure leads to a wrong assumption about the error-handling
> configuration. The driver may expect non-fatal errors but still get fatal
> errors, causing unexpected resets.
> 
> 
> Thanks,
> 
> -/Mallesh
> 
>> +
>> +    pci_save_state(usp);
>> +}
>> +#endif
>> +
>> +/**
>> + * xe_ras_init - Initialize Xe RAS
>> + * @xe: xe device instance
>> + *
>> + * Initialize Xe RAS
>> + */
>> +void xe_ras_init(struct xe_device *xe)
>> +{
>> +    if (!xe->info.has_sysctrl)
>> +        return;
>> +
>> +#ifdef CONFIG_PCIEAER
>> +    unmask_and_downgrade_internal_error(xe);
>> +#endif
>> +}
>> diff --git a/drivers/gpu/drm/xe/xe_ras.h b/drivers/gpu/drm/xe/xe_ras.h
>> new file mode 100644
>> index 000000000000..14cb973603e7
>> --- /dev/null
>> +++ b/drivers/gpu/drm/xe/xe_ras.h
>> @@ -0,0 +1,13 @@
>> +/* SPDX-License-Identifier: MIT */
>> +/*
>> + * Copyright © 2026 Intel Corporation
>> + */
>> +
>> +#ifndef _XE_RAS_H_
>> +#define _XE_RAS_H_
>> +
>> +struct xe_device;
>> +
>> +void xe_ras_init(struct xe_device *xe);
>> +
>> +#endif


* Re: [PATCH 7/8] drm/xe/xe_ras: Add support for Uncorrectable Core-Compute errors
  2026-02-17 14:02   ` Raag Jadav
@ 2026-02-23 14:10     ` Riana Tauro
  0 siblings, 0 replies; 41+ messages in thread
From: Riana Tauro @ 2026-02-23 14:10 UTC (permalink / raw)
  To: Raag Jadav
  Cc: intel-xe, anshuman.gupta, rodrigo.vivi, aravind.iddamsetty,
	badal.nilawar, ravi.kishore.koppuravuri, mallesh.koujalagi



On 2/17/2026 7:32 PM, Raag Jadav wrote:
> On Thu, Jan 22, 2026 at 03:36:19PM +0530, Riana Tauro wrote:
>> Uncorrectable Core-Compute errors are classified into Global and Local
>> errors.
>>
>> A Global error affects the entire device and requires a
>> reset. This type of error is not isolated. When an AER is reported,
>> error_detected returns PCI_ERS_RESULT_NEED_RESET.
>>
>> A Local error is confined to a specific component or context, such as an
>> engine. These errors can be contained and recovered by resetting
>> only the affected part without disrupting the rest of the device.
>>
>> Upon detection of an Uncorrectable Local Core-Compute error, an AER is
>> generated and GuC is notified of the error. The KMD then sets
>> the context as non-runnable and initiates an engine reset.
>> (TODO: GuC <->KMD communication for the error).
>> Since the error is contained and recovered, PCI error handling
>> callback returns PCI_ERS_RESULT_RECOVERED.
> 
> ...
> 
>> +/**
>> + * xe_ras_process_errors - Process and contain hardware errors
>> + * @xe: xe device instance
>> + *
>> + * Get error details from system controller and return recovery
>> + * method. Called only from PCI error handling.
>> + *
>> + * Returns: PCI_ERS_RESULT_RECOVERED if recovered or if no recovery needed,
>> + * PCI_ERS_RESULT_NEED_RESET otherwise.
> 
> PCI error codes are unrelated to xe_ras. IMO let's use standard error
> codes here and translate them to PCI ones in the callbacks.

Agreed. I had originally implemented RAS-specific enums but then removed
them since they were the same as the PCI enums.

We can't use standard error codes here and map them. We could have an
Xe RAS-specific enum and then map it in the callbacks.

Thanks
Riana

> 
> Raag
> 
>> + */
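The Xe RAS-specific result proposed above, translated into PCI results in the callbacks, could look roughly like this userspace sketch. Every name in it is hypothetical, since the thread does not define the enum. The Global/Local split follows the PATCH 7 commit message: a Global error needs a device reset, while a contained Local error is recovered through an engine reset.

```c
#include <assert.h>

/* Hypothetical stand-ins for the PCI core's pci_ers_result_t values. */
enum ers_result { ERS_NEED_RESET, ERS_RECOVERED };

/* Hypothetical scope reported by the system controller for an error. */
enum model_ras_scope { SCOPE_NONE, SCOPE_LOCAL, SCOPE_GLOBAL };

/* Hypothetical RAS-level outcome that knows nothing about PCI. */
enum model_ras_recovery {
	RAS_RECOVERY_NONE_NEEDED,
	RAS_RECOVERY_CONTAINED,		/* e.g. engine reset done */
	RAS_RECOVERY_DEVICE_RESET,
};

/* RAS layer: decide what the error requires. */
static enum model_ras_recovery model_process_error(enum model_ras_scope scope)
{
	switch (scope) {
	case SCOPE_NONE:
		return RAS_RECOVERY_NONE_NEEDED;
	case SCOPE_LOCAL:
		/* context marked non-runnable, engine reset (not modelled) */
		return RAS_RECOVERY_CONTAINED;
	case SCOPE_GLOBAL:
	default:
		return RAS_RECOVERY_DEVICE_RESET;
	}
}

/* PCI callback layer: translate the RAS outcome into a PCI result. */
static enum ers_result model_ras_to_ers(enum model_ras_recovery r)
{
	switch (r) {
	case RAS_RECOVERY_NONE_NEEDED:
	case RAS_RECOVERY_CONTAINED:
		return ERS_RECOVERED;
	case RAS_RECOVERY_DEVICE_RESET:
	default:
		return ERS_NEED_RESET;
	}
}
```

Keeping the translation in one function on the PCI side means xe_ras never includes PCI result types, which is the separation Raag asked for.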


* Re: [PATCH 6/8] drm/xe/xe_ras: Add structures and commands for Uncorrectable Core Compute Errors
  2026-01-22 10:06 ` [PATCH 6/8] drm/xe/xe_ras: Add structures and commands for Uncorrectable Core Compute Errors Riana Tauro
@ 2026-02-23 14:19   ` Mallesh, Koujalagi
  2026-02-23 14:30     ` Riana Tauro
  0 siblings, 1 reply; 41+ messages in thread
From: Mallesh, Koujalagi @ 2026-02-23 14:19 UTC (permalink / raw)
  To: Riana Tauro, intel-xe
  Cc: anshuman.gupta, rodrigo.vivi, aravind.iddamsetty, badal.nilawar,
	raag.jadav, ravi.kishore.koppuravuri


On 22-01-2026 03:36 pm, Riana Tauro wrote:
> Add the sysctrl commands and response structures for Uncorrectable
> Core Compute errors.
>
> Signed-off-by: Riana Tauro <riana.tauro@intel.com>
> ---
>   drivers/gpu/drm/xe/xe_ras.c                   |  53 +++++++
>   drivers/gpu/drm/xe/xe_ras_types.h             | 131 ++++++++++++++++++
>   drivers/gpu/drm/xe/xe_sysctrl_mailbox_types.h |  13 ++
>   3 files changed, 197 insertions(+)
>   create mode 100644 drivers/gpu/drm/xe/xe_ras_types.h
>
> diff --git a/drivers/gpu/drm/xe/xe_ras.c b/drivers/gpu/drm/xe/xe_ras.c
> index ba5ed37aed28..ace08d8d8d46 100644
> --- a/drivers/gpu/drm/xe/xe_ras.c
> +++ b/drivers/gpu/drm/xe/xe_ras.c
> @@ -4,9 +4,62 @@
>    */
>   #include <linux/pci.h>
>   
> +#include "xe_assert.h"
>   #include "xe_device_types.h"
>   #include "xe_ras.h"
>   
> +/* Severity classification of detected errors */
> +enum xe_ras_severity {
> +	XE_RAS_SEVERITY_NOT_SUPPORTED = 0,
> +	XE_RAS_SEVERITY_CORRECTABLE,
> +	XE_RAS_SEVERITY_UNCORRECTABLE,
> +	XE_RAS_SEVERITY_INFORMATIONAL,
> +	XE_RAS_SEVERITY_MAX
> +};
> +
> +/* major IP blocks where errors can originate */
> +enum xe_ras_component {
> +	XE_RAS_COMPONENT_NOT_SUPPORTED = 0,
> +	XE_RAS_COMPONENT_DEVICE_MEMORY,
> +	XE_RAS_COMPONENT_CORE_COMPUTE,
> +	XE_RAS_COMPONENT_RESERVED,
> +	XE_RAS_COMPONENT_PCIE,
> +	XE_RAS_COMPONENT_FABRIC,
> +	XE_RAS_COMPONENT_SOC,
> +	XE_RAS_COMPONENT_MAX
> +};
> +
> +static const char * const xe_ras_severities[] = {
> +	[XE_RAS_SEVERITY_NOT_SUPPORTED]		= "Not Supported",
> +	[XE_RAS_SEVERITY_CORRECTABLE]		= "Correctable",
> +	[XE_RAS_SEVERITY_UNCORRECTABLE]		= "Uncorrectable",
> +	[XE_RAS_SEVERITY_INFORMATIONAL]		= "Informational",
> +};
> +
> +static const char * const xe_ras_components[] = {
> +	[XE_RAS_COMPONENT_NOT_SUPPORTED]	= "Not Supported",
> +	[XE_RAS_COMPONENT_DEVICE_MEMORY]	= "Device Memory",
> +	[XE_RAS_COMPONENT_CORE_COMPUTE]		= "Core Compute",
> +	[XE_RAS_COMPONENT_RESERVED]		= "Reserved",
> +	[XE_RAS_COMPONENT_PCIE]			= "PCIe",
> +	[XE_RAS_COMPONENT_FABRIC]		= "Fabric",
> +	[XE_RAS_COMPONENT_SOC]			= "SoC",
> +};
> +
> +static inline const char *severity_to_str(struct xe_device *xe, u32 severity)
> +{
> +	xe_assert(xe, severity < XE_RAS_SEVERITY_MAX);
> +
> +	return xe_ras_severities[severity];
> +}
> +
> +static inline const char *comp_to_str(struct xe_device *xe, u32 comp)
> +{
> +	xe_assert(xe, comp < XE_RAS_COMPONENT_MAX);
> +
> +	return xe_ras_components[comp];
> +}
> +
>   #ifdef CONFIG_PCIEAER
>   static void unmask_and_downgrade_internal_error(struct xe_device *xe)
>   {
> diff --git a/drivers/gpu/drm/xe/xe_ras_types.h b/drivers/gpu/drm/xe/xe_ras_types.h
> new file mode 100644
> index 000000000000..c7a930c16f68
> --- /dev/null
> +++ b/drivers/gpu/drm/xe/xe_ras_types.h
> @@ -0,0 +1,131 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * Copyright © 2026 Intel Corporation
> + */
> +
> +#ifndef _XE_RAS_TYPES_H_
> +#define _XE_RAS_TYPES_H_
> +
> +#include <linux/types.h>
> +
> +#define XE_RAS_MAX_ERROR_DETAILS	16
> +
> +/**
> + * struct xe_ras_error_common - Common RAS error class
> + *
> + * This structure contains error severity and component information
> + * across all products
> + */
> +struct xe_ras_error_common {
> +	/** @severity: Error Severity */
> +	u8 severity;
> +	/** @component: IP where the error originated */
> +	u8 component;
> +} __packed;
> +
> +/**
> + * struct xe_ras_error_unit - Error unit information
> + */
> +struct xe_ras_error_unit {
> +	/** @tile: Tile identifier */
> +	u8 tile;
> +	/** @instance: Instance identifier within a component */
> +	u32 instance;
> +} __packed;
The packed layout leaves @instance as an unaligned u32, and accessing it carries a performance penalty.
> +
> +/**
> + * struct xe_ras_error_cause - Error cause information
> + */
> +struct xe_ras_error_cause {
> +	/** @cause: Cause */
> +	u32 cause;
> +	/** @reserved: For future use */
> +	u8 reserved;
> +} __packed;
> +
> +/**
> + * struct xe_ras_error_product - Error fields that are specific to the product
> + */
> +struct xe_ras_error_product {
> +	/** @unit: Unit within IP block */
> +	struct xe_ras_error_unit unit;
> +	/** @error_cause: Cause/checker */
> +	struct xe_ras_error_cause error_cause;
> +} __packed;
> +
> +/**
> + * struct xe_ras_error_class - Complete RAS Error Class
> + *
> + * This structure provides the complete error classification by combining
> + * the common error class with the product-specific error class.
> + */
> +struct xe_ras_error_class {
> +	/** @common: Common error severity and component */
> +	struct xe_ras_error_common common;
> +	/** @product: Product-specific unit and cause */
> +	struct xe_ras_error_product product;
> +} __packed;
> +
> +/**
> + * struct xe_ras_error_array - Details of the error types
> + */
> +struct xe_ras_error_array {
> +	/** @error_class: Error class */
> +	struct xe_ras_error_class error_class;
> +	/** @timestamp: Timestamp */
> +	u64 timestamp;
> +	/** @error_details: Error details specific to the class */
> +	u32 error_details[XE_RAS_MAX_ERROR_DETAILS];
> +} __packed;
> +
> +/**
> + * struct xe_ras_get_error_response - Response for XE_SYSCTRL_GET_SOC_ERROR
> + */
> +struct xe_ras_get_error_response {
> +	/** @num_errors: No of errors reported in this response */
> +	u8 num_errors;
> +	/** @additional_errors: Indicates if the errors are pending */
> +	u8 additional_errors;
> +	/** @error_arr: Array of up to 3 errors */
> +	struct xe_ras_error_array error_arr[3];

Please use a macro instead of the magic number 3.

Thanks

-/Mallesh

> +} __packed;
> +
> +/**
> + * struct xe_ras_compute_error: Error details of Compute error
> + */
> +struct xe_ras_compute_error {
> +	/** @error_log_header: Error Source and type */
> +	u32 error_log_header;
> +	/** @internal_error_log: Internal Error log */
> +	u32 internal_error_log;
> +	/** @fabric_log: Fabric Error log */
> +	u32 fabric_log;
> +	/** @internal_error_addr_log0: Internal Error addr log */
> +	u32 internal_error_addr_log0;
> +	/** @internal_error_addr_log1: Internal Error addr log */
> +	u32 internal_error_addr_log1;
> +	/** @packet_log0: Packet log */
> +	u32 packet_log0;
> +	/** @packet_log1: Packet log */
> +	u32 packet_log1;
> +	/** @packet_log2: Packet log */
> +	u32 packet_log2;
> +	/** @packet_log3: Packet log */
> +	u32 packet_log3;
> +	/** @packet_log4: Packet log */
> +	u32 packet_log4;
> +	/** @misc_log0: Misc log */
> +	u32 misc_log0;
> +	/** @misc_log1: Misc log */
> +	u32 misc_log1;
> +	/** @spare_log0: Spare log */
> +	u32 spare_log0;
> +	/** @spare_log1: Spare log */
> +	u32 spare_log1;
> +	/** @spare_log2: Spare log */
> +	u32 spare_log2;
> +	/** @spare_log3: Spare log */
> +	u32 spare_log3;
> +} __packed;
> +
> +#endif
> diff --git a/drivers/gpu/drm/xe/xe_sysctrl_mailbox_types.h b/drivers/gpu/drm/xe/xe_sysctrl_mailbox_types.h
> index 1f315ad1b996..45ef10f5cfa2 100644
> --- a/drivers/gpu/drm/xe/xe_sysctrl_mailbox_types.h
> +++ b/drivers/gpu/drm/xe/xe_sysctrl_mailbox_types.h
> @@ -8,6 +8,19 @@
>   
>   #include <linux/types.h>
>   
> +/**
> + * enum xe_sysctrl_mailbox_command_id - RAS Command ID's for GFSP group
> + *
> + * @XE_SYSCTRL_CMD_GET_SOC_ERROR: Get basic error information
> + */
> +enum xe_sysctrl_mailbox_command_id {
> +	XE_SYSCTRL_CMD_GET_SOC_ERROR = 1
> +};
> +
> +enum xe_sysctrl_group {
> +	XE_SYSCTRL_GROUP_GFSP = 1
> +};
> +
>   struct xe_sysctrl_mailbox_mkhi_msg_hdr {
>   	__le32 data;
>   } __packed;
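The alignment cost and the magic number raised in this review can both be checked from userspace by mirroring the posted structures. The sketch below copies the field layout from the patch, using `__attribute__((__packed__))`, which is what the kernel's `__packed` expands to on GCC and Clang, and replaces the literal 3 with a macro. `XE_RAS_MAX_ERRORS_PER_RESPONSE` is a hypothetical name. Whether the layout can actually change depends on the system controller's mailbox format, which the thread does not specify.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PACKED __attribute__((__packed__))

#define XE_RAS_MAX_ERROR_DETAILS	16
/* Hypothetical name replacing the magic number 3. */
#define XE_RAS_MAX_ERRORS_PER_RESPONSE	3

struct xe_ras_error_common { uint8_t severity; uint8_t component; } PACKED;
struct xe_ras_error_unit { uint8_t tile; uint32_t instance; } PACKED;
struct xe_ras_error_cause { uint32_t cause; uint8_t reserved; } PACKED;

struct xe_ras_error_product {
	struct xe_ras_error_unit unit;
	struct xe_ras_error_cause error_cause;
} PACKED;

struct xe_ras_error_class {
	struct xe_ras_error_common common;
	struct xe_ras_error_product product;
} PACKED;

struct xe_ras_error_array {
	struct xe_ras_error_class error_class;
	uint64_t timestamp;
	uint32_t error_details[XE_RAS_MAX_ERROR_DETAILS];
} PACKED;

struct xe_ras_get_error_response {
	uint8_t num_errors;
	uint8_t additional_errors;
	struct xe_ras_error_array error_arr[XE_RAS_MAX_ERRORS_PER_RESPONSE];
} PACKED;

/* Compile-time checks showing where the unaligned fields land. */
static_assert(offsetof(struct xe_ras_error_unit, instance) == 1,
	      "u32 instance starts at byte 1");
static_assert(offsetof(struct xe_ras_error_array, timestamp) == 12,
	      "u64 timestamp starts at byte 12");
static_assert(sizeof(struct xe_ras_get_error_response) ==
	      2 + XE_RAS_MAX_ERRORS_PER_RESPONSE * 84,
	      "response size follows the per-response error count");
```

In the kernel the same checks could be written with static_assert() or BUILD_BUG_ON(), so that any change to the mailbox layout is caught at build time.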

* Re: [PATCH 6/8] drm/xe/xe_ras: Add structures and commands for Uncorrectable Core Compute Errors
  2026-02-23 14:19   ` Mallesh, Koujalagi
@ 2026-02-23 14:30     ` Riana Tauro
  0 siblings, 0 replies; 41+ messages in thread
From: Riana Tauro @ 2026-02-23 14:30 UTC (permalink / raw)
  To: Mallesh, Koujalagi, intel-xe
  Cc: anshuman.gupta, rodrigo.vivi, aravind.iddamsetty, badal.nilawar,
	raag.jadav, ravi.kishore.koppuravuri



On 2/23/2026 7:49 PM, Mallesh, Koujalagi wrote:
> 
> On 22-01-2026 03:36 pm, Riana Tauro wrote:
>> Add the sysctrl commands and response structures for Uncorrectable
>> Core Compute errors.
>>
>> Signed-off-by: Riana Tauro <riana.tauro@intel.com>
>> ---
>>   drivers/gpu/drm/xe/xe_ras.c                   |  53 +++++++
>>   drivers/gpu/drm/xe/xe_ras_types.h             | 131 ++++++++++++++++++
>>   drivers/gpu/drm/xe/xe_sysctrl_mailbox_types.h |  13 ++
>>   3 files changed, 197 insertions(+)
>>   create mode 100644 drivers/gpu/drm/xe/xe_ras_types.h
>>
>> diff --git a/drivers/gpu/drm/xe/xe_ras.c b/drivers/gpu/drm/xe/xe_ras.c
>> index ba5ed37aed28..ace08d8d8d46 100644
>> --- a/drivers/gpu/drm/xe/xe_ras.c
>> +++ b/drivers/gpu/drm/xe/xe_ras.c
>> @@ -4,9 +4,62 @@
>>    */
>>   #include <linux/pci.h>
>> +#include "xe_assert.h"
>>   #include "xe_device_types.h"
>>   #include "xe_ras.h"
>> +/* Severity classification of detected errors */
>> +enum xe_ras_severity {
>> +    XE_RAS_SEVERITY_NOT_SUPPORTED = 0,
>> +    XE_RAS_SEVERITY_CORRECTABLE,
>> +    XE_RAS_SEVERITY_UNCORRECTABLE,
>> +    XE_RAS_SEVERITY_INFORMATIONAL,
>> +    XE_RAS_SEVERITY_MAX
>> +};
>> +
>> +/* major IP blocks where errors can originate */
>> +enum xe_ras_component {
>> +    XE_RAS_COMPONENT_NOT_SUPPORTED = 0,
>> +    XE_RAS_COMPONENT_DEVICE_MEMORY,
>> +    XE_RAS_COMPONENT_CORE_COMPUTE,
>> +    XE_RAS_COMPONENT_RESERVED,
>> +    XE_RAS_COMPONENT_PCIE,
>> +    XE_RAS_COMPONENT_FABRIC,
>> +    XE_RAS_COMPONENT_SOC,
>> +    XE_RAS_COMPONENT_MAX
>> +};
>> +
>> +static const char * const xe_ras_severities[] = {
>> +    [XE_RAS_SEVERITY_NOT_SUPPORTED]        = "Not Supported",
>> +    [XE_RAS_SEVERITY_CORRECTABLE]        = "Correctable",
>> +    [XE_RAS_SEVERITY_UNCORRECTABLE]        = "Uncorrectable",
>> +    [XE_RAS_SEVERITY_INFORMATIONAL]        = "Informational",
>> +};
>> +
>> +static const char * const xe_ras_components[] = {
>> +    [XE_RAS_COMPONENT_NOT_SUPPORTED]    = "Not Supported",
>> +    [XE_RAS_COMPONENT_DEVICE_MEMORY]    = "Device Memory",
>> +    [XE_RAS_COMPONENT_CORE_COMPUTE]        = "Core Compute",
>> +    [XE_RAS_COMPONENT_RESERVED]        = "Reserved",
>> +    [XE_RAS_COMPONENT_PCIE]            = "PCIe",
>> +    [XE_RAS_COMPONENT_FABRIC]        = "Fabric",
>> +    [XE_RAS_COMPONENT_SOC]            = "SoC",
>> +};
>> +
>> +static inline const char *severity_to_str(struct xe_device *xe, u32 severity)
>> +{
>> +    xe_assert(xe, severity < XE_RAS_SEVERITY_MAX);
>> +
>> +    return xe_ras_severities[severity];
>> +}
>> +
>> +static inline const char *comp_to_str(struct xe_device *xe, u32 comp)
>> +{
>> +    xe_assert(xe, comp < XE_RAS_COMPONENT_MAX);
>> +
>> +    return xe_ras_components[comp];
>> +}
>> +
>>   #ifdef CONFIG_PCIEAER
>>   static void unmask_and_downgrade_internal_error(struct xe_device *xe)
>>   {
>> diff --git a/drivers/gpu/drm/xe/xe_ras_types.h b/drivers/gpu/drm/xe/xe_ras_types.h
>> new file mode 100644
>> index 000000000000..c7a930c16f68
>> --- /dev/null
>> +++ b/drivers/gpu/drm/xe/xe_ras_types.h
>> @@ -0,0 +1,131 @@
>> +/* SPDX-License-Identifier: MIT */
>> +/*
>> + * Copyright © 2026 Intel Corporation
>> + */
>> +
>> +#ifndef _XE_RAS_TYPES_H_
>> +#define _XE_RAS_TYPES_H_
>> +
>> +#include <linux/types.h>
>> +
>> +#define XE_RAS_MAX_ERROR_DETAILS    16
>> +
>> +/**
>> + * struct xe_ras_error_common - Common RAS error class
>> + *
>> + * This structure contains error severity and component information
>> + * across all products
>> + */
>> +struct xe_ras_error_common {
>> +    /** @severity: Error Severity */
>> +    u8 severity;
>> +    /** @component: IP where the error originated */
>> +    u8 component;
>> +} __packed;
>> +
>> +/**
>> + * struct xe_ras_error_unit - Error unit information
>> + */
>> +struct xe_ras_error_unit {
>> +    /** @tile: Tile identifier */
>> +    u8 tile;
>> +    /** @instance: Instance identifier within a component */
>> +    u32 instance;
>> +} __packed;
> Performance penalty for accessing unaligned u32.

These are response structures from firmware.
Will check.
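A user-space sketch of the layout question raised above. It assumes (this is not stated in the thread) that firmware sends the five-byte xe_ras_error_unit record little-endian with no padding. The compile-time checks pin that ABI layout so padding cannot creep in, and `instance` is decoded byte by byte so no unaligned pointer is ever dereferenced. In the kernel the same checks would typically be static_assert()/BUILD_BUG_ON() and get_unaligned_le32(); the struct and helper names below are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* User-space mirror of struct xe_ras_error_unit from the patch (renamed). */
struct ras_error_unit {
	uint8_t tile;
	uint32_t instance;
} __attribute__((packed));

/* Assumed firmware ABI: 1 + 4 bytes, no padding. */
static_assert(sizeof(struct ras_error_unit) == 5, "unexpected size");
static_assert(offsetof(struct ras_error_unit, instance) == 1, "unexpected offset");

/* Decode a little-endian u32 from a possibly unaligned byte buffer. */
static uint32_t load_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Parse a raw 5-byte record without casting the buffer to the struct. */
static struct ras_error_unit parse_error_unit(const uint8_t raw[5])
{
	struct ras_error_unit u;

	u.tile = raw[0];
	u.instance = load_le32(raw + 1);
	return u;
}
```

Because the structure is `__packed`, the compiler already emits safe accesses for its members; the cost is at most a few extra loads on strict-alignment architectures, which is unlikely to matter on an error path that runs once per AER event.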

>> +
>> +/**
>> + * struct xe_ras_error_cause - Error cause information
>> + */
>> +struct xe_ras_error_cause {
>> +    /** @cause: Cause */
>> +    u32 cause;
>> +    /** @reserved: For future use */
>> +    u8 reserved;
>> +} __packed;
>> +
>> +/**
>> + * struct xe_ras_error_product - Error fields that are specific to the product
>> + */
>> +struct xe_ras_error_product {
>> +    /** @unit: Unit within IP block */
>> +    struct xe_ras_error_unit unit;
>> +    /** @error_cause: Cause/checker */
>> +    struct xe_ras_error_cause error_cause;
>> +} __packed;
>> +
>> +/**
>> + * struct xe_ras_error_class - Complete RAS Error Class
>> + *
>> + * This structure provides the complete error classification by combining
>> + * the common error class with the product-specific error class.
>> + */
>> +struct xe_ras_error_class {
>> +    /** @common: Common error severity and component */
>> +    struct xe_ras_error_common common;
>> +    /** @product: Product-specific unit and cause */
>> +    struct xe_ras_error_product product;
>> +} __packed;
>> +
>> +/**
>> + * struct xe_ras_error_array - Details of the error types
>> + */
>> +struct xe_ras_error_array {
>> +    /** @error_class: Error class */
>> +    struct xe_ras_error_class error_class;
>> +    /** @timestamp: Timestamp */
>> +    u64 timestamp;
>> +    /** @error_details: Error details specific to the class */
>> +    u32 error_details[XE_RAS_MAX_ERROR_DETAILS];
>> +} __packed;
>> +
>> +/**
>> + * struct xe_ras_get_error_response - Response for XE_SYSCTRL_GET_SOC_ERROR
>> + */
>> +struct xe_ras_get_error_response {
>> +    /** @num_errors: No of errors reported in this response */
>> +    u8 num_errors;
>> +    /** @additional_errors: Indicates if the errors are pending */
>> +    u8 additional_errors;
>> +    /** @error_arr: Array of up to 3 errors */
>> +    struct xe_ras_error_array error_arr[3];
> 
> Use a Macro for a magic number 3.

Sure. Will fix

Thanks
Riana
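One way to act on the magic-number comment, sketched in user space: name the limit (RAS_MAX_ERRORS_PER_RESPONSE is a hypothetical name) and pin the packed sizes, which follow directly from the field widths in the patch, so any future change to the firmware-facing layout fails to compile. The element count is derived from the type, in the spirit of the kernel's ARRAY_SIZE(). Struct names are shortened stand-ins for the xe_ras_* types.

```c
#include <assert.h>
#include <stdint.h>

#define RAS_MAX_ERROR_DETAILS		16
/* Hypothetical name replacing the bare 3 in error_arr[3]. */
#define RAS_MAX_ERRORS_PER_RESPONSE	3

struct ras_error_common {
	uint8_t severity;
	uint8_t component;
} __attribute__((packed));

struct ras_error_unit {
	uint8_t tile;
	uint32_t instance;
} __attribute__((packed));

struct ras_error_cause {
	uint32_t cause;
	uint8_t reserved;
} __attribute__((packed));

struct ras_error_product {
	struct ras_error_unit unit;
	struct ras_error_cause error_cause;
} __attribute__((packed));

struct ras_error_class {
	struct ras_error_common common;
	struct ras_error_product product;
} __attribute__((packed));

struct ras_error_array {
	struct ras_error_class error_class;
	uint64_t timestamp;
	uint32_t error_details[RAS_MAX_ERROR_DETAILS];
} __attribute__((packed));

struct ras_get_error_response {
	uint8_t num_errors;
	uint8_t additional_errors;
	struct ras_error_array error_arr[RAS_MAX_ERRORS_PER_RESPONSE];
} __attribute__((packed));

/* Packed sizes: 2 + (5 + 5) = 12; 12 + 8 + 64 = 84; 2 + 3 * 84 = 254. */
static_assert(sizeof(struct ras_error_class) == 12, "error_class size");
static_assert(sizeof(struct ras_error_array) == 84, "error_array size");
static_assert(sizeof(struct ras_get_error_response) == 254, "response size");

/* Element count derived from the array itself, like ARRAY_SIZE(). */
#define RESP_ARR_LEN(r) (sizeof((r).error_arr) / sizeof((r).error_arr[0]))
```

The same count is what a bounds check on num_errors would compare against.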


> 
> Thanks
> 
> -/Mallesh
> 
>> +} __packed;
>> +
>> +/**
>> + * struct xe_ras_compute_error: Error details of Compute error
>> + */
>> +struct xe_ras_compute_error {
>> +    /** @error_log_header: Error Source and type */
>> +    u32 error_log_header;
>> +    /** @internal_error_log: Internal Error log */
>> +    u32 internal_error_log;
>> +    /** @fabric_log: Fabric Error log */
>> +    u32 fabric_log;
>> +    /** @internal_error_addr_log0: Internal Error addr log */
>> +    u32 internal_error_addr_log0;
>> +    /** @internal_error_addr_log1: Internal Error addr log */
>> +    u32 internal_error_addr_log1;
>> +    /** @packet_log0: Packet log */
>> +    u32 packet_log0;
>> +    /** @packet_log1: Packet log */
>> +    u32 packet_log1;
>> +    /** @packet_log2: Packet log */
>> +    u32 packet_log2;
>> +    /** @packet_log3: Packet log */
>> +    u32 packet_log3;
>> +    /** @packet_log4: Packet log */
>> +    u32 packet_log4;
>> +    /** @misc_log0: Misc log */
>> +    u32 misc_log0;
>> +    /** @misc_log1: Misc log */
>> +    u32 misc_log1;
>> +    /** @spare_log0: Spare log */
>> +    u32 spare_log0;
>> +    /** @spare_log1: Spare log */
>> +    u32 spare_log1;
>> +    /** @spare_log2: Spare log */
>> +    u32 spare_log2;
>> +    /** @spare_log3: Spare log */
>> +    u32 spare_log3;
>> +} __packed;
>> +
>> +#endif
>> diff --git a/drivers/gpu/drm/xe/xe_sysctrl_mailbox_types.h b/drivers/gpu/drm/xe/xe_sysctrl_mailbox_types.h
>> index 1f315ad1b996..45ef10f5cfa2 100644
>> --- a/drivers/gpu/drm/xe/xe_sysctrl_mailbox_types.h
>> +++ b/drivers/gpu/drm/xe/xe_sysctrl_mailbox_types.h
>> @@ -8,6 +8,19 @@
>>   #include <linux/types.h>
>> +/**
>> + * enum xe_sysctrl_mailbox_command_id - RAS Command ID's for GFSP group
>> + *
>> + * @XE_SYSCTRL_CMD_GET_SOC_ERROR: Get basic error information
>> + */
>> +enum xe_sysctrl_mailbox_command_id {
>> +    XE_SYSCTRL_CMD_GET_SOC_ERROR = 1
>> +};
>> +
>> +enum xe_sysctrl_group {
>> +    XE_SYSCTRL_GROUP_GFSP = 1
>> +};
>> +
>>   struct xe_sysctrl_mailbox_mkhi_msg_hdr {
>>       __le32 data;
>>   } __packed;
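severity_to_str() and comp_to_str() index their tables with values that come from firmware, guarded only by xe_assert(), which as far as I know is checked only in debug (CONFIG_DRM_XE_DEBUG) builds. A user-space sketch with renamed identifiers of a lookup that also range-checks at runtime and keeps the table in sync with the enum at compile time; the "Unknown" fallback string is an assumption. comp_to_str() would follow the same pattern.

```c
#include <assert.h>
#include <string.h>

enum ras_severity {
	RAS_SEVERITY_NOT_SUPPORTED = 0,
	RAS_SEVERITY_CORRECTABLE,
	RAS_SEVERITY_UNCORRECTABLE,
	RAS_SEVERITY_INFORMATIONAL,
	RAS_SEVERITY_MAX
};

static const char * const ras_severities[] = {
	[RAS_SEVERITY_NOT_SUPPORTED]	= "Not Supported",
	[RAS_SEVERITY_CORRECTABLE]	= "Correctable",
	[RAS_SEVERITY_UNCORRECTABLE]	= "Uncorrectable",
	[RAS_SEVERITY_INFORMATIONAL]	= "Informational",
};

/* Fails to compile if an enum value is added without a table entry. */
static_assert(sizeof(ras_severities) / sizeof(ras_severities[0]) ==
	      RAS_SEVERITY_MAX, "severity table out of sync");

/* The value comes from firmware, so check it at runtime as well. */
static const char *severity_to_str(unsigned int severity)
{
	if (severity >= RAS_SEVERITY_MAX || !ras_severities[severity])
		return "Unknown";
	return ras_severities[severity];
}
```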


* Re: [PATCH 2/8] drm/xe/xe_pci_error: Implement PCI error recovery callbacks
  2026-02-08  8:02   ` Raag Jadav
@ 2026-02-24  3:23     ` Riana Tauro
  2026-02-24  5:33       ` Raag Jadav
  0 siblings, 1 reply; 41+ messages in thread
From: Riana Tauro @ 2026-02-24  3:23 UTC (permalink / raw)
  To: Raag Jadav
  Cc: intel-xe, anshuman.gupta, rodrigo.vivi, aravind.iddamsetty,
	badal.nilawar, ravi.kishore.koppuravuri, mallesh.koujalagi



On 2/8/2026 1:32 PM, Raag Jadav wrote:
> On Thu, Jan 22, 2026 at 03:36:14PM +0530, Riana Tauro wrote:
>> Add error_detected, mmio_enabled, slot_reset and resume
>> recovery callbacks to handle PCIe Advanced Error Reporting
>> (AER) errors.
>>
>> For fatal errors, the device is wedged and becomes
>> inaccessible. Return PCI_ERS_RESULT_SLOT_RESET from
>> error_detected to request a Secondary Bus Reset (SBR).
>>
>> For non-fatal errors, return PCI_ERS_RESULT_CAN_RECOVER from
>> error_detected to trigger the mmio_enabled callback. In this callback,
>> the device is queried to determine the error cause and attempt
>> recovery based on the error type.
>>
>> Once the secondary bus reset(SBR) is completed the slot_reset callback
>> cleanly removes and reprobe the device to restore functionality.
> 
> ...
> 
>> +static void xe_pci_error_handling(struct pci_dev *pdev)
>> +{
>> +	struct xe_device *xe = pdev_to_xe_device(pdev);
>> +
>> +	xe_device_set_in_recovery(xe);
>> +	xe_device_declare_wedged(xe);
> 
> Is this the correct usage?
> 
> Documentation/gpu/drm-uapi.rst +392
> 
> "A 'wedged' device is basically a device that is declared dead by the driver
> after exhausting all possible attempts to recover it from driver context."

Can't this be used?

"The only exception to this is WEDGED=none, which signifies that the 
device was temporarily ‘wedged’ at some point but was recovered from 
driver context using device specific methods like reset. No explicit 
recovery is expected from the consumer in this case, but it can still 
take additional steps like gathering telemetry information (devcoredump, 
syslog). "

If not will replace it with the gt_wedged function and block ioctls 
using recovery flag

Thanks
Riana



> 
> Raag
> 
>> +	pci_disable_device(pdev);
>> +}


* Re: [PATCH 2/8] drm/xe/xe_pci_error: Implement PCI error recovery callbacks
  2026-02-16  8:53   ` Mallesh, Koujalagi
@ 2026-02-24  3:26     ` Riana Tauro
  0 siblings, 0 replies; 41+ messages in thread
From: Riana Tauro @ 2026-02-24  3:26 UTC (permalink / raw)
  To: Mallesh, Koujalagi, intel-xe
  Cc: anshuman.gupta, rodrigo.vivi, aravind.iddamsetty, badal.nilawar,
	raag.jadav, ravi.kishore.koppuravuri



On 2/16/2026 2:23 PM, Mallesh, Koujalagi wrote:
> Hi Riana,
> 
> On 22-01-2026 03:36 pm, Riana Tauro wrote:
>> Add error_detected, mmio_enabled, slot_reset and resume
>> recovery callbacks to handle PCIe Advanced Error Reporting
>> (AER) errors.
>>
>> For fatal errors, the device is wedged and becomes
>> inaccessible. Return PCI_ERS_RESULT_SLOT_RESET from
>> error_detected to request a Secondary Bus Reset (SBR).
>>
>> For non-fatal errors, return PCI_ERS_RESULT_CAN_RECOVER from
>> error_detected to trigger the mmio_enabled callback. In this callback,
>> the device is queried to determine the error cause and attempt
>> recovery based on the error type.
>>
>> Once the secondary bus reset(SBR) is completed the slot_reset callback
>> cleanly removes and reprobe the device to restore functionality.
>>
>> Signed-off-by: Riana Tauro<riana.tauro@intel.com>
>> ---
>>   drivers/gpu/drm/xe/Makefile          |  1 +
>>   drivers/gpu/drm/xe/xe_device.h       | 15 +++++
>>   drivers/gpu/drm/xe/xe_device_types.h |  3 +
>>   drivers/gpu/drm/xe/xe_pci.c          |  3 +
>>   drivers/gpu/drm/xe/xe_pci_error.c    | 85 ++++++++++++++++++++++++++++
>>   5 files changed, 107 insertions(+)
>>   create mode 100644 drivers/gpu/drm/xe/xe_pci_error.c
>>
>> diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
>> index f6650ec3ab42..5581f2180b5c 100644
>> --- a/drivers/gpu/drm/xe/Makefile
>> +++ b/drivers/gpu/drm/xe/Makefile
>> @@ -98,6 +98,7 @@ xe-y += xe_bb.o \
>>       xe_page_reclaim.o \
>>       xe_pat.o \
>>       xe_pci.o \
>> +    xe_pci_error.o \
>>       xe_pci_rebar.o \
>>       xe_pcode.o \
>>       xe_pm.o \
>> diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
>> index 58d7d8b2fea3..81480248eeff 100644
>> --- a/drivers/gpu/drm/xe/xe_device.h
>> +++ b/drivers/gpu/drm/xe/xe_device.h
>> @@ -43,6 +43,21 @@ static inline struct xe_device *ttm_to_xe_device(struct ttm_device *ttm)
>>       return container_of(ttm, struct xe_device, ttm);
>>   }
>> +static inline bool xe_device_is_in_recovery(struct xe_device *xe)
>> +{
>> +    return atomic_read(&xe->in_recovery);
>> +}
>> +
>> +static inline void xe_device_set_in_recovery(struct xe_device *xe)
>> +{
>> +    atomic_set(&xe->in_recovery, 1);
>> +}
>> +
>> +static inline void xe_device_clear_in_recovery(struct xe_device *xe)
>> +{
>> +     atomic_set(&xe->in_recovery, 0);
>> +}
>> +
>>   struct xe_device *xe_device_create(struct pci_dev *pdev,
>>                      const struct pci_device_id *ent);
>>   int xe_device_probe_early(struct xe_device *xe);
>> diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
>> index 944f909a86ad..2d140463dc5e 100644
>> --- a/drivers/gpu/drm/xe/xe_device_types.h
>> +++ b/drivers/gpu/drm/xe/xe_device_types.h
>> @@ -669,6 +669,9 @@ struct xe_device {
>>           bool inconsistent_reset;
>>       } wedged;
>> +    /** @in_recovery: Indicates if device is in recovery */
>> +    atomic_t in_recovery;
>> +
>>       /** @bo_device: Struct to control async free of BOs */
>>       struct xe_bo_dev {
>>           /** @bo_device.async_free: Free worker */
>> diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c
>> index c92cc176f669..e1ee393b7461 100644
>> --- a/drivers/gpu/drm/xe/xe_pci.c
>> +++ b/drivers/gpu/drm/xe/xe_pci.c
>> @@ -1255,6 +1255,8 @@ static const struct dev_pm_ops xe_pm_ops = {
>>   };
>>   #endif
>> +extern const struct pci_error_handlers xe_pci_error_handlers;
>> +
>>   static struct pci_driver xe_pci_driver = {
>>       .name = DRIVER_NAME,
>>       .id_table = pciidlist,
>> @@ -1262,6 +1264,7 @@ static struct pci_driver xe_pci_driver = {
>>       .remove = xe_pci_remove,
>>       .shutdown = xe_pci_shutdown,
>>       .sriov_configure = xe_pci_sriov_configure,
>> +    .err_handler = &xe_pci_error_handlers,
>>   #ifdef CONFIG_PM_SLEEP
>>       .driver.pm = &xe_pm_ops,
>>   #endif
>> diff --git a/drivers/gpu/drm/xe/xe_pci_error.c b/drivers/gpu/drm/xe/xe_pci_error.c
>> new file mode 100644
>> index 000000000000..a3cc01afa179
>> --- /dev/null
>> +++ b/drivers/gpu/drm/xe/xe_pci_error.c
>> @@ -0,0 +1,85 @@
>> +// SPDX-License-Identifier: MIT
>> +/*
>> + * Copyright © 2026 Intel Corporation
>> + */
>> +#include <drm/drm_drv.h>
>> +#include <linux/pci.h>
>> +
>> +#include "xe_device.h"
>> +#include "xe_gt.h"
>> +#include "xe_pci.h"
>> +#include "xe_uc.h"
>> +
>> +static void xe_pci_error_handling(struct pci_dev *pdev)
>> +{
>> +    struct xe_device *xe = pdev_to_xe_device(pdev);
>> +
>> +    xe_device_set_in_recovery(xe);
>> +    xe_device_declare_wedged(xe);
>> +
>> +    pci_disable_device(pdev);
>> +}
>> +
>> +static pci_ers_result_t xe_pci_error_detected(struct pci_dev *pdev, pci_channel_state_t state)
>> +{
>> +    dev_err(&pdev->dev, "PCI error detected, state %d\n", state);
>> +
> We need to set recovery flag here right?

Yeah we have to set this for all errors. Will fix this

>> +    switch (state) {
>> +    case pci_channel_io_normal:
>> +        return PCI_ERS_RESULT_CAN_RECOVER;
>> +    case pci_channel_io_frozen:
>> +        xe_pci_error_handling(pdev);
>> +        return PCI_ERS_RESULT_NEED_RESET;
>> +    case pci_channel_io_perm_failure:
>> +        return PCI_ERS_RESULT_DISCONNECT;
>> +    }
>> +
>> +    return PCI_ERS_RESULT_NEED_RESET;
> Please add a default case that logs "Unknown channel state" with dev_err
> and returns PCI_ERS_RESULT_NEED_RESET.

Sure

>> +}
>> +
>> +static pci_ers_result_t xe_pci_error_mmio_enabled(struct pci_dev *pdev)
>> +{
>> +    dev_err(&pdev->dev, "PCI mmio enabled\n");
>> +
>> +    return PCI_ERS_RESULT_NEED_RESET;
>> +}
>> +
>> +static pci_ers_result_t xe_pci_error_slot_reset(struct pci_dev *pdev)
>> +{
>> +    const struct pci_device_id *ent = pci_match_id(pdev->driver->id_table, pdev);
>> +    struct xe_device *xe = pdev_to_xe_device(pdev);
>> +
> Check whether xe is NULL; if it is, return PCI_ERS_RESULT_DISCONNECT.
>> +    dev_err(&pdev->dev, "PCI slot reset\n");
>> +
>> +    pci_restore_state(pdev);
>> +
>> +    if (pci_enable_device(pdev)) {
>> +        dev_err(&pdev->dev,
>> +            "Cannot re-enable PCI device after reset\n");
>> +        return PCI_ERS_RESULT_DISCONNECT;
>> +    }
>> +
>> +    /*
>> +     * Secondary Bus Reset wipes out all device memory
>> +     * requiring XE KMD to perform a device removal and reprobe.
>> +     */
>> +    pdev->driver->remove(pdev);
>> +    xe_device_clear_in_recovery(xe);
>> +
>> +    if (!pdev->driver->probe(pdev, ent))
>> +        return PCI_ERS_RESULT_RECOVERED;
>> +
>> +    return PCI_ERS_RESULT_RECOVERED;
>> +}
>> +
>> +static void xe_pci_error_resume(struct pci_dev *pdev)
>> +{
>> +    dev_info(&pdev->dev, "PCI error resume\n");
> 
> We need to clear the recovery flag (if not already cleared) to resume
> normal operation.

Yes. If I set in_recovery for all errors and we don't trigger a reset,
we need to clear it here.

Will add this

Thanks
Riana


> 
> Thanks
> 
> -/Mallesh
> 
>> +}
>> +
>> +const struct pci_error_handlers xe_pci_error_handlers = {
>> +    .error_detected    = xe_pci_error_detected,
>> +    .mmio_enabled    = xe_pci_error_mmio_enabled,
>> +    .slot_reset    = xe_pci_error_slot_reset,
>> +    .resume        = xe_pci_error_resume,
>> +};
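A user-space model of the callback flow with the review feedback folded in: the recovery flag is set for every channel state, error_detected gains a default case, slot_reset checks for NULL, and resume clears the flag when no reset happened. The model also reports failure when reprobe fails, since the quoted slot_reset returns PCI_ERS_RESULT_RECOVERED on both paths. Note that the commit message mentions PCI_ERS_RESULT_SLOT_RESET, while the code returns PCI_ERS_RESULT_NEED_RESET, which is the member of the kernel's enum pci_ers_result used to request a reset. All types and names below are stand-ins, not kernel APIs, and returning DISCONNECT on probe failure is one possible choice rather than the patch's behavior.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for pci_channel_state_t values and pci_ers_result_t. */
enum channel_state { IO_NORMAL = 1, IO_FROZEN = 2, IO_PERM_FAILURE = 3 };
enum ers_result { ERS_CAN_RECOVER, ERS_NEED_RESET, ERS_DISCONNECT, ERS_RECOVERED };

struct model_dev {
	atomic_int in_recovery;	/* mirrors xe->in_recovery */
	bool wedged;		/* set by handle_fatal() below */
};

/* Models xe_pci_error_handling(): the patch wedges and disables here. */
static void handle_fatal(struct model_dev *d)
{
	d->wedged = true;
}

static enum ers_result error_detected(struct model_dev *d, int state)
{
	/* Review feedback: set the recovery flag for every error type. */
	atomic_store(&d->in_recovery, 1);

	switch (state) {
	case IO_NORMAL:
		return ERS_CAN_RECOVER;
	case IO_FROZEN:
		handle_fatal(d);
		return ERS_NEED_RESET;
	case IO_PERM_FAILURE:
		return ERS_DISCONNECT;
	default:
		/* The driver would dev_err("Unknown channel state") here. */
		return ERS_NEED_RESET;
	}
}

/* probe_ok models the return value of pdev->driver->probe(). */
static enum ers_result slot_reset(struct model_dev *d, bool probe_ok)
{
	if (!d)
		return ERS_DISCONNECT;	/* review: bail out if xe is NULL */

	atomic_store(&d->in_recovery, 0);
	return probe_ok ? ERS_RECOVERED : ERS_DISCONNECT;
}

/* Non-fatal path that ends without a reset: clear the flag here. */
static void resume(struct model_dev *d)
{
	atomic_store(&d->in_recovery, 0);
}
```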


* Re: [PATCH 2/8] drm/xe/xe_pci_error: Implement PCI error recovery callbacks
  2026-02-24  3:23     ` Riana Tauro
@ 2026-02-24  5:33       ` Raag Jadav
  0 siblings, 0 replies; 41+ messages in thread
From: Raag Jadav @ 2026-02-24  5:33 UTC (permalink / raw)
  To: Riana Tauro
  Cc: intel-xe, anshuman.gupta, rodrigo.vivi, aravind.iddamsetty,
	badal.nilawar, ravi.kishore.koppuravuri, mallesh.koujalagi

On Tue, Feb 24, 2026 at 08:53:06AM +0530, Riana Tauro wrote:
> On 2/8/2026 1:32 PM, Raag Jadav wrote:
> > On Thu, Jan 22, 2026 at 03:36:14PM +0530, Riana Tauro wrote:
> > > Add error_detected, mmio_enabled, slot_reset and resume
> > > recovery callbacks to handle PCIe Advanced Error Reporting
> > > (AER) errors.
> > > 
> > > For fatal errors, the device is wedged and becomes
> > > inaccessible. Return PCI_ERS_RESULT_SLOT_RESET from
> > > error_detected to request a Secondary Bus Reset (SBR).
> > > 
> > > For non-fatal errors, return PCI_ERS_RESULT_CAN_RECOVER from
> > > error_detected to trigger the mmio_enabled callback. In this callback,
> > > the device is queried to determine the error cause and attempt
> > > recovery based on the error type.
> > > 
> > > Once the secondary bus reset(SBR) is completed the slot_reset callback
> > > cleanly removes and reprobe the device to restore functionality.
> > 
> > ...
> > 
> > > +static void xe_pci_error_handling(struct pci_dev *pdev)
> > > +{
> > > +	struct xe_device *xe = pdev_to_xe_device(pdev);
> > > +
> > > +	xe_device_set_in_recovery(xe);
> > > +	xe_device_declare_wedged(xe);
> > 
> > Is this the correct usage?
> > 
> > Documentation/gpu/drm-uapi.rst +392
> > 
> > "A 'wedged' device is basically a device that is declared dead by the driver
> > after exhausting all possible attempts to recover it from driver context."
> 
> Can't this be used?
> 
> "The only exception to this is WEDGED=none, which signifies that the device
> was temporarily ‘wedged’ at some point but was recovered from driver context
> using device specific methods like reset. No explicit recovery is expected
> from the consumer in this case, but it can still take additional steps like
> gathering telemetry information (devcoredump, syslog). "
> 
> If not will replace it with the gt_wedged function and block ioctls using
> recovery flag

You can block ioctls by setting wedged.flag to prevent userspace access
while the PCI core performs bus reset, but the event itself depends on the
result of it.

If it succeeds then WEDGED=none would be more appropriate, as the user will
need to reload context and recreate buffers without recovering the device
(similar to AMD usecase). If it fails then we're officially 'wedged' and
tell the user to recover the device with explicit method.

Raag
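The sequence suggested above can be modeled as follows: block access (the role of wedged.flag) before the bus reset, then pick the WEDGED= value from the reset outcome. This is a sketch under assumptions. The recovery method reported on failure is left as a parameter because the thread does not say which one xe would advertise, and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

struct wedge_state {
	bool blocked;		/* models wedged.flag: ioctls rejected while set */
	const char *event;	/* WEDGED= value that would be reported */
};

/* Before the PCI core resets the bus: only block userspace access. */
static void begin_bus_reset(struct wedge_state *s)
{
	s->blocked = true;
	s->event = NULL;	/* nothing is reported yet */
}

/*
 * After the reset: success means the device is usable again, so report
 * WEDGED=none (userspace still recreates contexts and buffers). Failure
 * means the device is really wedged and stays blocked; recovery_method is
 * the explicit method the driver would advertise.
 */
static void end_bus_reset(struct wedge_state *s, bool reset_ok,
			  const char *recovery_method)
{
	if (reset_ok) {
		s->blocked = false;
		s->event = "none";
	} else {
		s->event = recovery_method;
	}
}
```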

* Re: [PATCH 7/8] drm/xe/xe_ras: Add support for Uncorrectable Core-Compute errors
  2026-01-27 14:03   ` Mallesh, Koujalagi
  2026-02-02  8:54     ` Riana Tauro
@ 2026-02-24 12:17     ` Mallesh, Koujalagi
  1 sibling, 0 replies; 41+ messages in thread
From: Mallesh, Koujalagi @ 2026-02-24 12:17 UTC (permalink / raw)
  To: Riana Tauro, intel-xe
  Cc: anshuman.gupta, rodrigo.vivi, aravind.iddamsetty, badal.nilawar,
	raag.jadav, ravi.kishore.koppuravuri

+ Adding more comments

On 27-01-2026 07:33 pm, Mallesh, Koujalagi wrote:
>
> On 22-01-2026 03:36 pm, Riana Tauro wrote:
>> Uncorrectable Core-Compute errors are classified into Global and Local
>> errors.
>>
>> Global error is an error that affects the entire device requiring a
>> reset. This type of error is not isolated. When an AER is reported and
>> error_detected is invoked return PCI_ERS_RESULT_NEED_RESET.
>>
>> A Local error is confined to a specific component or context like an
>> engine. These errors can be contained and recovered by resetting
>> only the affected part without disrupting the rest of the device.
>>
>> Upon detection of an Uncorrectable Local Core-Compute error, an AER is
>> generated and GuC is notified of the error. The KMD then sets
>> the context as non-runnable and initiates an engine reset.
>> (TODO: GuC <->KMD communication for the error).
>> Since the error is contained and recovered, PCI error handling
>> callback returns PCI_ERS_RESULT_RECOVERED.
>>
>> Signed-off-by: Riana Tauro <riana.tauro@intel.com>
>> ---
>>   drivers/gpu/drm/xe/xe_ras.c | 109 +++++++++++++++++++++++++++++++++++-
>>   drivers/gpu/drm/xe/xe_ras.h |   3 +
>>   2 files changed, 110 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_ras.c b/drivers/gpu/drm/xe/xe_ras.c
>> index ace08d8d8d46..2a98cb116dc7 100644
>> --- a/drivers/gpu/drm/xe/xe_ras.c
>> +++ b/drivers/gpu/drm/xe/xe_ras.c
>> @@ -2,11 +2,16 @@
>>   /*
>>    * Copyright © 2026 Intel Corporation
>>    */
>> -#include <linux/pci.h>
>> -
>>   #include "xe_assert.h"
>>   #include "xe_device_types.h"
>> +#include "xe_printk.h"
>>   #include "xe_ras.h"
>> +#include "xe_ras_types.h"
>> +#include "xe_sysctrl_mailbox.h"
>> +#include "xe_sysctrl_mailbox_types.h"
>> +
>> +#define COMPUTE_ERROR_SEVERITY_MASK        GENMASK(26, 25)
>> +#define GLOBAL_UNCORR_ERROR            2
Need to handle Local uncorrectable errors (which affect a specific component).
>>
>>   /* Severity classification of detected errors */
>>   enum xe_ras_severity {
>> @@ -60,6 +65,106 @@ static inline const char *comp_to_str(struct xe_device *xe, u32 comp)
>>       return xe_ras_components[comp];
>>   }
>>
>> +static void log_ras_error(struct xe_device *xe, struct xe_ras_error_class *error_class)
>> +{
>> +    struct xe_ras_error_common common_info = error_class->common;
>> +    struct xe_ras_error_product product_info = error_class->product;
>> +    u8 tile = product_info.unit.tile;
>> +    u32 instance = product_info.unit.instance;
>> +    u32 cause = product_info.error_cause.cause;
>> +
>> +    xe_err(xe, "[RAS]: Tile%u, Instance %u, %s %s Error detected Cause: 0x%x",
>> +           tile, instance, severity_to_str(xe, common_info.severity),
>> +           comp_to_str(xe, common_info.component), cause);
>
> Please fix the formatting issues ("Tile %u" instead of "Tile%u", and a newline
> at the end of the message) and include the timestamp in the log message.
>
>> +}
>> +
>> +static pci_ers_result_t handle_compute_errors(struct xe_device *xe, struct xe_ras_error_array *arr)
>> +{
>> +    struct xe_ras_compute_error *error_info = (struct xe_ras_compute_error *)arr->error_details;
>> +    u8 uncorr_type;
>> +
>> +    uncorr_type = FIELD_GET(COMPUTE_ERROR_SEVERITY_MASK, error_info->error_log_header);
>> +    log_ras_error(xe, &arr->error_class);
>> +
>> +    xe_err(xe, "[RAS]: Core Compute Error: timestamp %llu Uncorrected error type %u\n",
>> +           arr->timestamp, uncorr_type);
>> +
>> +    /* Request a RESET if error is global */
>> +    if (uncorr_type == GLOBAL_UNCORR_ERROR)
>> +        return PCI_ERS_RESULT_NEED_RESET;
>> +
>> +    /* Local errors are recovered using a engine reset */
>> +    return PCI_ERS_RESULT_RECOVERED;
>> +}
>> +
>> +/**
>> + * xe_ras_process_errors - Process and contain hardware errors
>> + * @xe: xe device instance
>> + *
>> + * Get error details from system controller and return recovery
>> + * method. Called only from PCI error handling.
>> + *
>> + * Returns: PCI_ERS_RESULT_RECOVERED if recovered or if no recovery needed,
>> + * PCI_ERS_RESULT_NEED_RESET otherwise.
>> + */
>> +pci_ers_result_t xe_ras_process_errors(struct xe_device *xe)
>> +{
>> +    struct xe_sysctrl_mailbox_command command = {0};
>> +    struct xe_sysctrl_mailbox_app_msg_hdr msg_hdr = {0};
>> +    struct xe_ras_get_error_response response;
>> +    u32 req_hdr;
>> +    size_t rlen;
>> +    int ret;
>> +
>> +    if (!xe->info.has_sysctrl)
>> +        return PCI_ERS_RESULT_NEED_RESET;
>> +
>> +    req_hdr = FIELD_PREP(APP_HDR_GROUP_ID_MASK, XE_SYSCTRL_GROUP_GFSP) |
>> +          FIELD_PREP(APP_HDR_COMMAND_MASK, XE_SYSCTRL_CMD_GET_SOC_ERROR);
>> +
>> +    msg_hdr.data = req_hdr;
>> +    command.header = msg_hdr;
>> +    command.data_out = &response;
>> +    command.data_out_len = sizeof(response);
>> +
>> +    do {
>> +        memset(&response, 0, sizeof(response));
>> +        rlen = 0;
>> +
>> +        ret = xe_sysctrl_send_command(xe, &command, &rlen);
>> +        if (ret || !rlen) {
>> +            xe_err(xe, "[RAS]: Sysctrl error ret %d\n", ret);
>> +            goto err;
>> +        }
>> +
>> +        if (rlen != sizeof(response)) {
>> +            xe_err(xe, "[RAS]: Sysctrl response does not match len!!\n");
>> +            goto err;
>> +        }
>> +
A bounds check is required: if the firmware reports more than 3 errors,
the loop overruns error_arr[3].
>> +        for (int i = 0; i < response.num_errors; i++) {
>> +            struct xe_ras_error_array arr = response.error_arr[i];
>> +            struct xe_ras_error_class error_class;
>> +            u8 component;
>> +
>> +            error_class = arr.error_class;
>> +            component = error_class.common.component;
>> +
>> +            if (component == XE_RAS_COMPONENT_CORE_COMPUTE) {
>> +                ret = handle_compute_errors(xe, &arr);

Please fix the return type: ret should be pci_ers_result_t.

Thanks

-/Mallesh

>> +                if (ret == PCI_ERS_RESULT_NEED_RESET)
>> +                    goto err;
>> +            }
>
> Need to handle non-compute errors.
>
> Thanks
>
> -/Mallesh
>
>
>> +        }
>> +
>> +    } while (response.additional_errors);
>> +
>> +    return PCI_ERS_RESULT_RECOVERED;
>> +
>> +err:
>> +    return PCI_ERS_RESULT_NEED_RESET;
>> +}
>> +
>>   #ifdef CONFIG_PCIEAER
>>   static void unmask_and_downgrade_internal_error(struct xe_device *xe)
>>   {
>> diff --git a/drivers/gpu/drm/xe/xe_ras.h b/drivers/gpu/drm/xe/xe_ras.h
>> index 14cb973603e7..28400613c9a9 100644
>> --- a/drivers/gpu/drm/xe/xe_ras.h
>> +++ b/drivers/gpu/drm/xe/xe_ras.h
>> @@ -6,8 +6,11 @@
>>   #ifndef _XE_RAS_H_
>>   #define _XE_RAS_H_
>>
>> +#include <linux/pci.h>
>> +
>>   struct xe_device;
>>
>>   void xe_ras_init(struct xe_device *xe);
>> +pci_ers_result_t xe_ras_process_errors(struct xe_device *xe);
>>
>>   #endif
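A user-space sketch of xe_ras_process_errors() with three review points applied: num_errors is checked against the array size, the result travels as the result enum rather than int, and non-compute components get an explicit policy. Rejecting an oversized count and requesting a reset for unhandled components are conservative assumptions, not behavior stated in the thread. The patch defines only the global value (2), so every other uncorrectable type is treated as local, as the patch does. Constant values are copied from the patch; all names are stand-ins, and the additional_errors re-query loop is omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for pci_ers_result_t. */
typedef enum { ERS_RECOVERED, ERS_NEED_RESET } ers_result_t;

#define MAX_ERRORS		3	/* hypothetical name for error_arr[3] */
#define COMPONENT_CORE_COMPUTE	2	/* XE_RAS_COMPONENT_CORE_COMPUTE */
#define SEVERITY_SHIFT		25	/* COMPUTE_ERROR_SEVERITY_MASK is GENMASK(26, 25) */
#define SEVERITY_MASK		(0x3u << SEVERITY_SHIFT)
#define GLOBAL_UNCORR_ERROR	2

struct model_error {
	uint8_t component;
	uint32_t error_log_header;	/* first dword of error_details */
};

struct model_response {
	uint8_t num_errors;
	struct model_error errs[MAX_ERRORS];
};

/* Equivalent of FIELD_GET(COMPUTE_ERROR_SEVERITY_MASK, hdr). */
static unsigned int uncorr_type(uint32_t hdr)
{
	return (hdr & SEVERITY_MASK) >> SEVERITY_SHIFT;
}

static ers_result_t process_response(const struct model_response *r)
{
	unsigned int n = r->num_errors;

	/* Never index past errs[]: treat an oversized count as malformed. */
	if (n > MAX_ERRORS)
		return ERS_NEED_RESET;

	for (unsigned int i = 0; i < n; i++) {
		const struct model_error *e = &r->errs[i];

		/* No handler for other components yet: assume the worst. */
		if (e->component != COMPONENT_CORE_COMPUTE)
			return ERS_NEED_RESET;
		if (uncorr_type(e->error_log_header) == GLOBAL_UNCORR_ERROR)
			return ERS_NEED_RESET;
	}
	return ERS_RECOVERED;	/* only non-global (local) errors seen */
}
```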

* Re: [PATCH 8/8] drm/xe/xe_pci_error: Process errors in mmio_enabled
  2026-01-22 10:06 ` [PATCH 8/8] drm/xe/xe_pci_error: Process errors in mmio_enabled Riana Tauro
@ 2026-02-24 12:46   ` Mallesh, Koujalagi
  0 siblings, 0 replies; 41+ messages in thread
From: Mallesh, Koujalagi @ 2026-02-24 12:46 UTC (permalink / raw)
  To: Riana Tauro, intel-xe
  Cc: anshuman.gupta, rodrigo.vivi, aravind.iddamsetty, badal.nilawar,
	raag.jadav, ravi.kishore.koppuravuri


On 22-01-2026 03:36 pm, Riana Tauro wrote:
> Query system controller when any non fatal error occurs to check
> the type of the error, contain and recover.
>
> The system controller is queried in the mmio_enabled callback.
>
> Signed-off-by: Riana Tauro <riana.tauro@intel.com>
> ---
>   drivers/gpu/drm/xe/xe_pci_error.c | 9 ++++++++-
>   1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_pci_error.c b/drivers/gpu/drm/xe/xe_pci_error.c
> index 0960aa5861bc..2af6c1f45c44 100644
> --- a/drivers/gpu/drm/xe/xe_pci_error.c
> +++ b/drivers/gpu/drm/xe/xe_pci_error.c
> @@ -8,6 +8,7 @@
>   #include "xe_device.h"
>   #include "xe_gt.h"
>   #include "xe_pci.h"
> +#include "xe_ras.h"
>   #include "xe_uc.h"
>   
>   static void xe_pci_error_handling(struct pci_dev *pdev)
> @@ -39,9 +40,15 @@ static pci_ers_result_t xe_pci_error_detected(struct pci_dev *pdev, pci_channel_
>   
>   static pci_ers_result_t xe_pci_error_mmio_enabled(struct pci_dev *pdev)
>   {
> +	struct xe_device *xe = pdev_to_xe_device(pdev);
> +	int ret;
> +
Type mismatch, ret should be pci_ers_result_t
>   	dev_err(&pdev->dev, "PCI mmio enabled\n");
mmio_enabled is called as a normal part of recovery, so we should use
dev_info() here, right?
> +	ret = xe_ras_process_errors(xe);

Need to handle local (PCI_ERS_RESULT_RECOVERED), global and unexpected
errors in a switch.

Thanks

-/Mallesh

> +	if (ret == PCI_ERS_RESULT_NEED_RESET)
> +		xe_pci_error_handling(pdev);
>   
> -	return PCI_ERS_RESULT_NEED_RESET;
> +	return ret;
>   }
>   
>   static pci_ers_result_t xe_pci_error_slot_reset(struct pci_dev *pdev)
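For the mmio_enabled comments, a sketch of the dispatch on the value returned by xe_ras_process_errors(), typed as the result enum and switching over local, global and unexpected outcomes. Treating an unexpected value like a global error is an assumption. The model records where the patch would call xe_pci_error_handling() instead of calling it; all names are stand-ins.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for pci_ers_result_t. */
typedef enum {
	ERS_CAN_RECOVER, ERS_NEED_RESET, ERS_DISCONNECT, ERS_RECOVERED
} ers_result_t;

struct model_dev {
	bool wedged;	/* set where the patch calls xe_pci_error_handling() */
};

/* ras_result models the value returned by xe_ras_process_errors(). */
static ers_result_t mmio_enabled(struct model_dev *d, ers_result_t ras_result)
{
	ers_result_t ret = ras_result;	/* result type, not int */

	switch (ret) {
	case ERS_RECOVERED:
		/* Local error, contained by an engine reset. */
		return ERS_RECOVERED;
	case ERS_NEED_RESET:	/* global error */
	default:		/* unexpected value: fail safe */
		d->wedged = true;
		return ERS_NEED_RESET;
	}
}
```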

end of thread, other threads:[~2026-02-24 12:46 UTC | newest]

Thread overview: 41+ messages
2026-01-22 10:06 [PATCH 0/8] Introduce Xe Uncorrectable Error Handling Riana Tauro
2026-01-22  9:42 ` ✗ CI.checkpatch: warning for " Patchwork
2026-01-22  9:43 ` ✓ CI.KUnit: success " Patchwork
2026-01-22 10:06 ` [PATCH 1/8] drm/xe/xe_sysctrl: Add System controller patch Riana Tauro
2026-01-22 10:06 ` [PATCH 2/8] drm/xe/xe_pci_error: Implement PCI error recovery callbacks Riana Tauro
2026-01-27 22:49   ` Michal Wajdeczko
2026-02-02  9:45     ` Riana Tauro
2026-01-29  9:09   ` Nilawar, Badal
2026-02-02 13:19     ` Nilawar, Badal
2026-02-03  3:46       ` Riana Tauro
2026-02-03  3:41     ` Riana Tauro
2026-02-08  8:02   ` Raag Jadav
2026-02-24  3:23     ` Riana Tauro
2026-02-24  5:33       ` Raag Jadav
2026-02-16  8:53   ` Mallesh, Koujalagi
2026-02-24  3:26     ` Riana Tauro
2026-01-22 10:06 ` [PATCH 3/8] drm/xe/xe_pci_error: Group all devres to release them on PCIe slot reset Riana Tauro
2026-01-27 11:23   ` Mallesh, Koujalagi
2026-02-02  8:46     ` Riana Tauro
2026-01-22 10:06 ` [PATCH 4/8] drm/xe: Skip device access during PCI error recovery Riana Tauro
2026-01-22 10:06 ` [PATCH 5/8] drm/xe/xe_ras: Initialize Uncorrectable AER Registers Riana Tauro
2026-01-27 12:41   ` Mallesh, Koujalagi
2026-02-02  9:34     ` Riana Tauro
2026-02-04  8:38   ` Aravind Iddamsetty
2026-02-16 12:27   ` Mallesh, Koujalagi
2026-02-18 14:48     ` Riana Tauro
2026-01-22 10:06 ` [PATCH 6/8] drm/xe/xe_ras: Add structures and commands for Uncorrectable Core Compute Errors Riana Tauro
2026-02-23 14:19   ` Mallesh, Koujalagi
2026-02-23 14:30     ` Riana Tauro
2026-01-22 10:06 ` [PATCH 7/8] drm/xe/xe_ras: Add support for Uncorrectable Core-Compute errors Riana Tauro
2026-01-27 11:44   ` Mallesh, Koujalagi
2026-02-02  8:38     ` Riana Tauro
2026-01-27 14:03   ` Mallesh, Koujalagi
2026-02-02  8:54     ` Riana Tauro
2026-02-24 12:17     ` Mallesh, Koujalagi
2026-02-17 14:02   ` Raag Jadav
2026-02-23 14:10     ` Riana Tauro
2026-01-22 10:06 ` [PATCH 8/8] drm/xe/xe_pci_error: Process errors in mmio_enabled Riana Tauro
2026-02-24 12:46   ` Mallesh, Koujalagi
2026-01-22 10:21 ` ✓ Xe.CI.BAT: success for Introduce Xe Uncorrectable Error Handling Patchwork
2026-01-22 20:28 ` ✗ Xe.CI.Full: failure " Patchwork
