From: Aravind Iddamsetty <aravind.iddamsetty@linux.intel.com>
To: intel-xe@lists.freedesktop.org
Cc: riana.tauro@intel.com, rodrigo.vivi@intel.com,
	himal.prasad.ghimiray@intel.com, anshuman.gupta@intel.com
Subject: [PATCH 10/10] drm/xe: Clear all SoC errors post warm reset.
Date: Wed, 30 Jul 2025 11:18:14 +0530
Message-ID: <20250730054814.1376770-11-aravind.iddamsetty@linux.intel.com>
In-Reply-To: <20250730054814.1376770-1-aravind.iddamsetty@linux.intel.com>
From: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
In some scenarios the SoC uncore reports errors to the IEH that are not
propagated to the SG unit. Because these errors never reach the SG unit,
the driver cannot clear them as part of xe_process_hw_errors(). Hence
clear all SoC error registers after xe_process_hw_errors() during driver
load.
v2
- Fix commit message.
v3
- Limit check to PVC.
v4
- Fix check
Cc: Aravind Iddamsetty <aravind.iddamsetty@linux.intel.com>
Reviewed-by: Aravind Iddamsetty <aravind.iddamsetty@linux.intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
---
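Note for readers: the sequence described in the commit message (mask each
severity in every IEH, clear the status registers, then re-enable reporting)
can be sketched in user space as below. This is only an illustrative model,
not driver code. The names mock_tile, w1c_write, clear_tile_errors and
clear_all are hypothetical, the register file is a plain array rather than
MMIO, and write-1-to-clear behaviour of the status registers is an assumption
inferred from the all-ones writes in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for enum hardware_error and XE_SOC_NUM_IEH. */
enum severity { SEV_CORRECTABLE, SEV_NONFATAL, SEV_FATAL, SEV_MAX };
#define NUM_IEH  2
#define NUM_STAT 4 /* global/local x master/slave */

/* Mock register file for one tile (assumption: not the real layout). */
struct mock_tile {
	uint32_t evtctl[NUM_IEH];             /* per-IEH event control */
	uint32_t err_stat[SEV_MAX][NUM_STAT]; /* per-severity status */
};

/* Assumed write-1-to-clear semantics: set bits in val clear bits in reg. */
static void w1c_write(uint32_t *reg, uint32_t val)
{
	*reg &= ~val;
}

static void clear_tile_errors(struct mock_tile *t)
{
	int sev, ieh, r;

	for (sev = SEV_CORRECTABLE; sev < SEV_MAX; sev++) {
		/* Same value the patch writes: all bits set except this severity. */
		for (ieh = 0; ieh < NUM_IEH; ieh++)
			t->evtctl[ieh] = ~(1u << sev);

		/* Clear every status register for this severity. */
		for (r = 0; r < NUM_STAT; r++)
			w1c_write(&t->err_stat[sev][r], 0xffffffffu);
	}

	/* Final control value, mirroring (HARDWARE_ERROR_MAX << 1) + 1. */
	for (ieh = 0; ieh < NUM_IEH; ieh++)
		t->evtctl[ieh] = (SEV_MAX << 1) + 1;
}

/* Walk all tiles with a dedicated index; returns the number of tiles visited. */
static unsigned int clear_all(struct mock_tile *tiles, unsigned int count)
{
	unsigned int i;

	assert(tiles);
	for (i = 0; i < count; i++)
		clear_tile_errors(&tiles[i]);
	return i;
}
```

The per-tile work sits in its own function with its own loop variables, so
the tile index cannot be overwritten by the IEH loop and the severity walk
always restarts at the lowest level for each tile.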
drivers/gpu/drm/xe/xe_hw_error.c | 41 ++++++++++++++++++++++++++++++++
1 file changed, 41 insertions(+)
diff --git a/drivers/gpu/drm/xe/xe_hw_error.c b/drivers/gpu/drm/xe/xe_hw_error.c
index a77779eb6ce8..6a7cd59caac1 100644
--- a/drivers/gpu/drm/xe/xe_hw_error.c
+++ b/drivers/gpu/drm/xe/xe_hw_error.c
@@ -510,6 +510,46 @@ xe_gt_hw_error_log_vector_reg(struct xe_gt *gt, const enum hardware_error hw_err
}
}
+static void xe_clear_all_soc_errors(struct xe_device *xe)
+{
+	enum hardware_error hw_err;
+	u32 base, slave_base;
+	struct xe_tile *tile;
+	struct xe_gt *gt;
+	unsigned int i, j;
+
+	if (xe->info.platform != XE_PVC)
+		return;
+
+	base = SOC_PVC_BASE;
+	slave_base = SOC_PVC_SLAVE_BASE;
+
+	for_each_tile(tile, xe, i) {
+		gt = tile->primary_gt;
+		/* Restart from the lowest severity on every tile. */
+		hw_err = HARDWARE_ERROR_CORRECTABLE;
+
+		while (hw_err < HARDWARE_ERROR_MAX) {
+			for (j = 0; j < XE_SOC_NUM_IEH; j++)
+				xe_mmio_write32(&gt->tile->mmio, SOC_GSYSEVTCTL_REG(base, slave_base, j),
+						~REG_BIT(hw_err));
+
+			xe_mmio_write32(&gt->tile->mmio, SOC_GLOBAL_ERR_STAT_MASTER_REG(base, hw_err),
+					REG_GENMASK(31, 0));
+			xe_mmio_write32(&gt->tile->mmio, SOC_LOCAL_ERR_STAT_MASTER_REG(base, hw_err),
+					REG_GENMASK(31, 0));
+			xe_mmio_write32(&gt->tile->mmio, SOC_GLOBAL_ERR_STAT_SLAVE_REG(slave_base, hw_err),
+					REG_GENMASK(31, 0));
+			xe_mmio_write32(&gt->tile->mmio, SOC_LOCAL_ERR_STAT_SLAVE_REG(slave_base, hw_err),
+					REG_GENMASK(31, 0));
+			hw_err++;
+		}
+		for (j = 0; j < XE_SOC_NUM_IEH; j++)
+			xe_mmio_write32(&gt->tile->mmio, SOC_GSYSEVTCTL_REG(base, slave_base, j),
+					(HARDWARE_ERROR_MAX << 1) + 1);
+	}
+}
+
static void
xe_gt_hw_error_handler(struct xe_gt *gt, const enum hardware_error hw_err)
{
@@ -852,4 +892,5 @@ void xe_init_hw_errors(struct xe_device *xe)
{
 	xe_assign_hw_err_regs(xe);
 	xe_process_hw_errors(xe);
+	xe_clear_all_soc_errors(xe);
}
--
2.25.1