From: Aravind Iddamsetty <aravind.iddamsetty@linux.intel.com>
To: intel-xe@lists.freedesktop.org
Cc: riana.tauro@intel.com, rodrigo.vivi@intel.com,
himal.prasad.ghimiray@intel.com, anshuman.gupta@intel.com
Subject: [PATCH 08/10] drm/xe: Handle MDFI error severity.
Date: Wed, 30 Jul 2025 11:18:12 +0530
Message-ID: <20250730054814.1376770-9-aravind.iddamsetty@linux.intel.com>
In-Reply-To: <20250730054814.1376770-1-aravind.iddamsetty@linux.intel.com>
From: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
NONFATAL and FATAL MDFI (T2T/T2C) errors are reported by the same IEH
register and bits (bit 4 and bit 6 of 0x282280). To determine the
severity, read the local first error header log register (0x2822b0).
A value of 0x00330000 indicates a FATAL error and 0x00310000 a NONFATAL
error. This register doesn't need explicit clearing; clearing the MDFI
bit in the IEH register clears it too. If the value read indicates
NONFATAL while in the FATAL flow, don't clear the MDFI IEH bit; skip it
and continue. The same applies when the value read indicates FATAL
while in the NONFATAL flow.
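
The severity check described above can be sketched as a small standalone
C helper. This is only an illustration, not the driver code: the two
register values come from this commit message, while the enum
hw_error_flow and the function mdfi_severity_matches() are hypothetical
names invented for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Header log values quoted in the commit message. */
#define MDFI_SEVERITY_FATAL    0x00330000u
#define MDFI_SEVERITY_NONFATAL 0x00310000u

/* Hypothetical stand-in for the flow (severity) currently being handled. */
enum hw_error_flow { FLOW_NONFATAL, FLOW_FATAL };

/*
 * Return true when the value read from the first error header log
 * register matches the flow being handled. A false result means the
 * MDFI bit belongs to the other flow and should be left uncleared.
 */
static bool mdfi_severity_matches(uint32_t ieh_header, enum hw_error_flow flow)
{
	uint32_t expected = (flow == FLOW_FATAL) ? MDFI_SEVERITY_FATAL
						 : MDFI_SEVERITY_NONFATAL;

	return ieh_header == expected;
}
```

With this helper, a header value of 0x00310000 seen in the FATAL flow
returns false, so the handler would skip the MDFI bit and leave it for
the NONFATAL flow.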
v2
- Add commit message.
Cc: Aravind Iddamsetty <aravind.iddamsetty@linux.intel.com>
Reviewed-by: Aravind Iddamsetty <aravind.iddamsetty@linux.intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
---
drivers/gpu/drm/xe/regs/xe_tile_error_regs.h | 9 +++++++++
drivers/gpu/drm/xe/xe_hw_error.c | 16 ++++++++++++++--
2 files changed, 23 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/xe/regs/xe_tile_error_regs.h b/drivers/gpu/drm/xe/regs/xe_tile_error_regs.h
index 31604138d511..77d397e650e5 100644
--- a/drivers/gpu/drm/xe/regs/xe_tile_error_regs.h
+++ b/drivers/gpu/drm/xe/regs/xe_tile_error_regs.h
@@ -38,6 +38,8 @@
#define SOC_LOCAL_ERR_STAT_MASTER_REG(base, x) XE_REG((x) > HARDWARE_ERROR_CORRECTABLE ? \
(base) + _SOC_LERRUNCSTS : \
(base) + _SOC_LERRCORSTS)
+#define MDFI_T2T 4
+#define MDFI_T2C 6
#define _DEV_ERR_STAT_NONFATAL 0x100178
@@ -50,6 +52,13 @@
#define XE_SOC_ERROR 16
#define SOC_PVC_BASE 0x282000
+
+#define LOCAL_FIRST_IEH_HEADER_LOG_REG XE_REG(0x2822b0)
+#define MDFI_SEVERITY_FATAL 0x00330000
+#define MDFI_SEVERITY_NONFATAL 0x00310000
+#define MDFI_SEVERITY(x) ((x) == HARDWARE_ERROR_FATAL ? \
+ MDFI_SEVERITY_FATAL : \
+ MDFI_SEVERITY_NONFATAL)
#define SOC_PVC_SLAVE_BASE 0x283000
#define PVC_GSC_HECI1_BASE 0x284000
diff --git a/drivers/gpu/drm/xe/xe_hw_error.c b/drivers/gpu/drm/xe/xe_hw_error.c
index 705a670f01fc..690b7df7ccba 100644
--- a/drivers/gpu/drm/xe/xe_hw_error.c
+++ b/drivers/gpu/drm/xe/xe_hw_error.c
@@ -614,7 +614,7 @@ xe_soc_hw_error_handler(struct xe_tile *tile, const enum hardware_error hw_err)
{
unsigned long mst_glb_errstat, slv_glb_errstat, lcl_errstat;
struct hardware_errors_regs *err_regs;
- u32 errbit, base, slave_base;
+ u32 errbit, base, slave_base, ieh_header;
int i;
struct xe_gt *gt = tile->primary_gt;
@@ -682,9 +682,21 @@ xe_soc_hw_error_handler(struct xe_tile *tile, const enum hardware_error hw_err)
"Tile%d reported SOC_LOCAL_ERR_STAT_MASTER_REG_FATAL:0x%08lx\n",
tile->id, lcl_errstat);
- for_each_set_bit(errbit, &lcl_errstat, XE_RAS_REG_SIZE)
+ for_each_set_bit(errbit, &lcl_errstat, XE_RAS_REG_SIZE) {
+ if (errbit == MDFI_T2T || errbit == MDFI_T2C) {
+ ieh_header = xe_mmio_read32(&gt->tile->mmio, LOCAL_FIRST_IEH_HEADER_LOG_REG);
+ drm_info(&tile_to_xe(tile)->drm, HW_ERR "Tile%d LOCAL_FIRST_IEH_HEADER_LOG_REG:0x%08x\n",
+ tile->id, ieh_header);
+
+ if (ieh_header != MDFI_SEVERITY(hw_err)) {
+ lcl_errstat &= ~REG_BIT(errbit);
+ continue;
+ }
+ }
+
xe_soc_log_err_update_cntr(tile, hw_err, errbit,
err_regs->soc_mstr_lcl[hw_err]);
+ }
xe_mmio_write32(&gt->tile->mmio, SOC_LOCAL_ERR_STAT_MASTER_REG(base, hw_err), lcl_errstat);
}
--
2.25.1
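
For readers following the loop change, here is a user-space sketch of
how the status value written back is filtered. It assumes, as the patch
implies, that writing a set bit back to the local error status register
clears that error. The helper name mdfi_filter_clear_mask() and the
fixed 32-bit width are assumptions made for the example; the bit
positions and severity values are those defined in the patch.

```c
#include <stdint.h>

/* Bit positions and header values as defined in the patch. */
#define MDFI_T2T 4
#define MDFI_T2C 6
#define MDFI_SEVERITY_FATAL    0x00330000u
#define MDFI_SEVERITY_NONFATAL 0x00310000u

/*
 * Hypothetical helper: return the subset of lcl_errstat that the
 * current flow should write back. MDFI bits are dropped from the mask
 * when the header log value does not match the expected severity, so
 * they stay set for the other flow to handle.
 */
static uint32_t mdfi_filter_clear_mask(uint32_t lcl_errstat,
				       uint32_t ieh_header,
				       uint32_t expected)
{
	uint32_t mask = lcl_errstat;
	unsigned int bit;

	for (bit = 0; bit < 32; bit++) {
		if (!(lcl_errstat & (1u << bit)))
			continue;

		if ((bit == MDFI_T2T || bit == MDFI_T2C) &&
		    ieh_header != expected)
			mask &= ~(1u << bit);	/* leave for the other flow */
	}

	return mask;
}
```

Non-MDFI bits always remain in the returned mask; only bits 4 and 6 are
subject to the severity comparison.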
Thread overview: 18+ messages
2025-07-30 5:48 [PATCH 00/10] Supporting RAS on XE Aravind Iddamsetty
2025-07-30 5:48 ` [PATCH 01/10] drm/xe: Handle errors from various components Aravind Iddamsetty
2025-07-30 9:08 ` Michal Wajdeczko
2025-07-30 19:59 ` Rodrigo Vivi
2025-07-30 5:48 ` [PATCH 02/10] drm/xe: Add new helpers to log hardware errrors Aravind Iddamsetty
2025-07-30 8:55 ` Michal Wajdeczko
2025-07-30 5:48 ` [PATCH 03/10] drm/xe: Log and count the GT hardware errors Aravind Iddamsetty
2025-07-30 5:48 ` [PATCH 04/10] drm/xe: Support GT hardware error reporting for PVC Aravind Iddamsetty
2025-07-30 5:48 ` [PATCH 05/10] drm/xe: Support GSC " Aravind Iddamsetty
2025-07-30 5:48 ` [PATCH 06/10] drm/xe: Support SOC FATAL error handling " Aravind Iddamsetty
2025-07-30 5:48 ` [PATCH 07/10] drm/xe: Support SOC NONFATAL " Aravind Iddamsetty
2025-07-30 5:48 ` Aravind Iddamsetty [this message]
2025-07-30 5:48 ` [PATCH 09/10] drm/xe: Clear SOC CORRECTABLE error registers Aravind Iddamsetty
2025-07-30 5:48 ` [PATCH 10/10] drm/xe: Clear all SoC errors post warm reset Aravind Iddamsetty
2025-07-30 5:57 ` ✗ CI.checkpatch: warning for Supporting RAS on XE Patchwork
2025-07-30 5:58 ` ✓ CI.KUnit: success " Patchwork
2025-07-30 6:59 ` ✗ Xe.CI.BAT: failure " Patchwork
2025-07-30 8:03 ` ✗ Xe.CI.Full: " Patchwork