From: Raag Jadav <raag.jadav@intel.com>
To: intel-xe@lists.freedesktop.org
Cc: matthew.brost@intel.com, rodrigo.vivi@intel.com,
	riana.tauro@intel.com, michal.wajdeczko@intel.com,
	matthew.d.roper@intel.com, umesh.nerlige.ramappa@intel.com,
	mallesh.koujalagi@intel.com, Raag Jadav <raag.jadav@intel.com>
Subject: [PATCH v2 1/4] drm/xe/hw_error: Setup hardware error routing
Date: Thu, 2 Apr 2026 23:12:26 +0530
Message-ID: <20260402174229.1062874-2-raag.jadav@intel.com>
In-Reply-To: <20260402174229.1062874-1-raag.jadav@intel.com>

Hardware errors are reported through the System Controller on platforms
that support it. This requires routing them as PCIe errors instead of as
direct IRQs to the SGUnit, but that routing prevents the user from
inspecting those errors when debugging.

Set up hardware error routing based on XE_WEDGED_MODE_UPON_ANY_HANG_NO_RESET.
When that mode is selected, errors are routed as IRQs, which bypasses the
System Controller and lets the user investigate them without System
Controller involvement.

             default    with NO_RESET    w/o NO_RESET
             -------    -------------    ------------
fatal        PCIe       IRQ              PCIe
non-fatal    IRQ        IRQ              PCIe
correctable  IRQ        IRQ              PCIe

Signed-off-by: Raag Jadav <raag.jadav@intel.com>
---
v2: Explain the usecase (Matt Roper)
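
The routing table in the commit message can be sketched as a small userspace model. This is only an illustration under stated assumptions, not driver code: the register is a plain variable, sim_rmw32(), sim_route() and sim_dest() are hypothetical names, and "a set bit selects IRQ routing" is inferred from the patch passing `irq ? mask : 0`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions copied from the patch. Assumption: a set bit selects
 * direct IRQ routing, a clear bit selects PCIe routing. */
#define FATAL_ERR_ROUTING	(1u << 2)
#define NONFATAL_ERR_ROUTING	(1u << 1)
#define CORR_ERR_ROUTING	(1u << 0)
#define ALL_ERR_ROUTING \
	(FATAL_ERR_ROUTING | NONFATAL_ERR_ROUTING | CORR_ERR_ROUTING)

/* Hypothetical stand-in for a read-modify-write helper: clear the bits
 * in `clr`, then set the bits in `set`, and return the previous value. */
static uint32_t sim_rmw32(uint32_t *reg, uint32_t clr, uint32_t set)
{
	uint32_t old = *reg;

	*reg = (old & ~clr) | set;
	return old;
}

/* Mirrors the decision in the patch: all three routing bits are set in
 * NO_RESET mode and cleared otherwise. Other bits are left untouched. */
static void sim_route(uint32_t *reg, bool no_reset)
{
	sim_rmw32(reg, ALL_ERR_ROUTING, no_reset ? ALL_ERR_ROUTING : 0);
}

/* Names the destination for one error severity in the simulated register. */
static const char *sim_dest(uint32_t reg, uint32_t bit)
{
	return (reg & bit) ? "IRQ" : "PCIe";
}
```

Starting from the "default" column (non-fatal and correctable bits set, fatal clear), sim_route(&reg, true) yields the "with NO_RESET" column and sim_route(&reg, false) yields the "w/o NO_RESET" column, while bits outside the mask keep their values.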
---
drivers/gpu/drm/xe/regs/xe_hw_error_regs.h | 5 +++++
drivers/gpu/drm/xe/xe_hw_error.c | 21 +++++++++++++++++++++
drivers/gpu/drm/xe/xe_hw_error.h | 1 +
3 files changed, 27 insertions(+)
diff --git a/drivers/gpu/drm/xe/regs/xe_hw_error_regs.h b/drivers/gpu/drm/xe/regs/xe_hw_error_regs.h
index 046e1756c698..247906ae2227 100644
--- a/drivers/gpu/drm/xe/regs/xe_hw_error_regs.h
+++ b/drivers/gpu/drm/xe/regs/xe_hw_error_regs.h
@@ -44,6 +44,11 @@
 #define XE_SOC_ERROR 16
 #define XE_GT_ERROR 0
 
+#define DEV_ERR_ROUTING_CTRL XE_REG(0x100170)
+#define FATAL_ERR_ROUTING BIT(2)
+#define NONFATAL_ERR_ROUTING BIT(1)
+#define CORR_ERR_ROUTING BIT(0)
+
 #define ERR_STAT_GT_FATAL_VECTOR_0 0x100260
 #define ERR_STAT_GT_FATAL_VECTOR_1 0x100264
diff --git a/drivers/gpu/drm/xe/xe_hw_error.c b/drivers/gpu/drm/xe/xe_hw_error.c
index 2a31b430570e..28ad5c15685e 100644
--- a/drivers/gpu/drm/xe/xe_hw_error.c
+++ b/drivers/gpu/drm/xe/xe_hw_error.c
@@ -526,6 +526,26 @@ static int hw_error_info_init(struct xe_device *xe)
 	return xe_drm_ras_init(xe);
 }
 
+/**
+ * xe_hw_error_irq_route - irq routing of hw errors
+ * @xe: xe device instance
+ *
+ * Set irq routing of hw errors.
+ */
+void xe_hw_error_irq_route(struct xe_device *xe)
+{
+	u32 mask = CORR_ERR_ROUTING | NONFATAL_ERR_ROUTING | FATAL_ERR_ROUTING;
+	bool irq = xe->wedged.mode == XE_WEDGED_MODE_UPON_ANY_HANG_NO_RESET;
+	struct xe_tile *tile;
+	u8 id;
+
+	if (!xe->info.has_sysctrl)
+		return;
+
+	for_each_tile(tile, xe, id)
+		xe_mmio_rmw32(&tile->mmio, DEV_ERR_ROUTING_CTRL, mask, irq ? mask : 0);
+}
+
 /*
  * Process hardware errors during boot
  */
@@ -563,5 +583,6 @@ void xe_hw_error_init(struct xe_device *xe)
 	if (ret)
 		drm_err(&xe->drm, "Failed to initialize XE DRM RAS (%pe)\n", ERR_PTR(ret));
 
+	xe_hw_error_irq_route(xe);
 	process_hw_errors(xe);
 }
diff --git a/drivers/gpu/drm/xe/xe_hw_error.h b/drivers/gpu/drm/xe/xe_hw_error.h
index d86e28c5180c..8f155b66b5f1 100644
--- a/drivers/gpu/drm/xe/xe_hw_error.h
+++ b/drivers/gpu/drm/xe/xe_hw_error.h
@@ -11,5 +11,6 @@ struct xe_tile;
struct xe_device;
void xe_hw_error_irq_handler(struct xe_tile *tile, const u32 master_ctl);
+void xe_hw_error_irq_route(struct xe_device *xe);
void xe_hw_error_init(struct xe_device *xe);
#endif
--
2.43.0

Thread overview: 8+ messages
2026-04-02 17:42 [PATCH v2 0/4] Hardware Error handling for Crescent Island Raag Jadav
2026-04-02 17:42 ` Raag Jadav [this message]
2026-04-02 17:42 ` [PATCH v2 2/4] drm/xe/hw_error: Reuse CSC worker for generic error handling Raag Jadav
2026-04-02 17:42 ` [PATCH v2 3/4] drm/xe/hw_error: Allow debugging hardware errors Raag Jadav
2026-04-02 17:42 ` [PATCH v2 4/4] drm/xe/debugfs: Update hardware error routing with wedged.mode Raag Jadav
2026-04-02 17:53 ` ✓ CI.KUnit: success for Hardware Error handling for Crescent Island (rev2) Patchwork
2026-04-02 18:43 ` ✓ Xe.CI.BAT: " Patchwork
2026-04-03 6:37 ` ✓ Xe.CI.FULL: " Patchwork