From: "Ghimiray, Himal Prasad" <himal.prasad.ghimiray@intel.com>
To: Aravind Iddamsetty <aravind.iddamsetty@linux.intel.com>,
<intel-xe@lists.freedesktop.org>
Subject: Re: [Intel-xe] [PATCH 11/11] drm/xe: Clear all SoC errors post warm reset.
Date: Wed, 11 Oct 2023 12:29:39 +0530
Message-ID: <dd3e373e-fa9e-4832-b861-fe6aceb55814@intel.com>
In-Reply-To: <afffc88e-6c18-75b7-a968-67574573fe9b@linux.intel.com>
On 11-10-2023 12:26, Aravind Iddamsetty wrote:
> On 27/09/23 17:16, Himal Prasad Ghimiray wrote:
>> There are scenarios where there are no fatal errors reported
>> but Non-fatal/correctable errors being reported from the SoC
>> uncore to IEH and not propagated to SG unit. Clear all previous
>> SoC errors post warm reset.
> The commit message is not very clear: how is fatal error reporting related to the other errors?
Will rephrase it as:
There are scenarios where errors are reported from the SoC uncore to
IEH but are not propagated to the SG unit. Since these errors are not
propagated to the SG unit, the driver won't be able to clear them as part
of xe_process_hw_errors. Hence clear all SoC error registers after
xe_process_hw_errors.
Is that OK?
>
> Thanks,
> Aravind.
>> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
>> ---
>> drivers/gpu/drm/xe/xe_hw_error.c | 37 ++++++++++++++++++++++++++++++++
>> drivers/gpu/drm/xe/xe_hw_error.h | 1 +
>> drivers/gpu/drm/xe/xe_irq.c | 1 +
>> 3 files changed, 39 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_hw_error.c b/drivers/gpu/drm/xe/xe_hw_error.c
>> index 0bcb1bea7ffb..a777c887a7be 100644
>> --- a/drivers/gpu/drm/xe/xe_hw_error.c
>> +++ b/drivers/gpu/drm/xe/xe_hw_error.c
>> @@ -366,6 +366,43 @@ static void xe_assign_hw_err_regs(struct xe_device *xe)
>> }
>> }
>>
>> +void xe_clear_all_soc_errors(struct xe_device *xe)
>> +{
>> + enum hardware_error hw_err;
>> + u32 base, slave_base;
>> + struct xe_tile *tile;
>> + struct xe_gt *gt;
>> + unsigned int i;
>> +
>> + base = SOC_PVC_BASE;
>> + slave_base = SOC_PVC_SLAVE_BASE;
>> +
>> + hw_err = HARDWARE_ERROR_CORRECTABLE;
>> +
>> + for_each_tile(tile, xe, i) {
>> + gt = tile->primary_gt;
>> +
>> + while (hw_err < HARDWARE_ERROR_MAX) {
>> + for (i = 0; i < PVC_NUM_IEH; i++)
>> + xe_mmio_write32(gt, SOC_GSYSEVTCTL_REG(base, slave_base, i),
>> + ~REG_BIT(hw_err));
>> +
>> + xe_mmio_write32(gt, SOC_GLOBAL_ERR_STAT_MASTER_REG(base, hw_err),
>> + REG_GENMASK(31, 0));
>> + xe_mmio_write32(gt, SOC_LOCAL_ERR_STAT_MASTER_REG(base, hw_err),
>> + REG_GENMASK(31, 0));
>> + xe_mmio_write32(gt, SOC_GLOBAL_ERR_STAT_SLAVE_REG(slave_base, hw_err),
>> + REG_GENMASK(31, 0));
>> + xe_mmio_write32(gt, SOC_LOCAL_ERR_STAT_SLAVE_REG(slave_base, hw_err),
>> + REG_GENMASK(31, 0));
>> + hw_err++;
>> + }
>> + for (i = 0; i < PVC_NUM_IEH; i++)
>> + xe_mmio_write32(gt, SOC_GSYSEVTCTL_REG(base, slave_base, i),
>> + (HARDWARE_ERROR_MAX << 1) + 1);
>> + }
>> +}
>> +
>> static void
>> xe_gt_hw_error_status_reg_handler(struct xe_gt *gt, const enum hardware_error hw_err)
>> {
>> diff --git a/drivers/gpu/drm/xe/xe_hw_error.h b/drivers/gpu/drm/xe/xe_hw_error.h
>> index a458a90b34a2..7ada7c97c939 100644
>> --- a/drivers/gpu/drm/xe/xe_hw_error.h
>> +++ b/drivers/gpu/drm/xe/xe_hw_error.h
>> @@ -219,4 +219,5 @@ struct xe_tile;
>> void xe_hw_error_irq_handler(struct xe_tile *tile, const u32 master_ctl);
>> void xe_process_hw_errors(struct xe_device *xe);
>> void xe_gsc_hw_error_work(struct work_struct *work);
>> +void xe_clear_all_soc_errors(struct xe_device *xe);
>> #endif
>> diff --git a/drivers/gpu/drm/xe/xe_irq.c b/drivers/gpu/drm/xe/xe_irq.c
>> index 285c657cc789..42a6bb45acba 100644
>> --- a/drivers/gpu/drm/xe/xe_irq.c
>> +++ b/drivers/gpu/drm/xe/xe_irq.c
>> @@ -597,6 +597,7 @@ int xe_irq_install(struct xe_device *xe)
>> }
>>
>> xe_process_hw_errors(xe);
>> + xe_clear_all_soc_errors(xe);
>>
>> xe->irq.enabled = true;
>>
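For reference, the clear sequence in xe_clear_all_soc_errors() can be walked through with a small userspace sketch. This is only an illustration under stated assumptions, not the driver code: mock_write32() stands in for xe_mmio_write32(), NUM_TILES and NUM_IEH are made-up counts, and the enum values (CORRECTABLE = 0, NONFATAL = 1, FATAL = 2) are assumed since the enum definition is not part of this mail. In the quoted hunk, `i` appears to index both the tile loop and the inner IEH loops, and hw_err is initialised once before the tile loop; the sketch uses separate indices and restarts the severity walk per tile so that every tile sees the full sequence.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed values; the real enum lives in xe_hw_error.h and is not shown here. */
enum hardware_error {
	HARDWARE_ERROR_CORRECTABLE = 0,
	HARDWARE_ERROR_NONFATAL = 1,
	HARDWARE_ERROR_FATAL = 2,
	HARDWARE_ERROR_MAX,
};

#define NUM_TILES 2		/* hypothetical tile count */
#define NUM_IEH 2		/* stands in for PVC_NUM_IEH, value assumed */
#define REG_BIT(n) (1u << (n))

static unsigned int writes_per_tile[NUM_TILES];

/* Mock of xe_mmio_write32(): counts and prints the write instead of doing MMIO. */
static void mock_write32(unsigned int tile, const char *reg, uint32_t val)
{
	writes_per_tile[tile]++;
	printf("tile%u %-24s <- 0x%08x\n", tile, reg, (unsigned int)val);
}

static void clear_all_soc_errors(void)
{
	unsigned int tile, ieh, hw_err;

	for (tile = 0; tile < NUM_TILES; tile++) {
		/* Restart from the lowest severity on every tile. */
		for (hw_err = HARDWARE_ERROR_CORRECTABLE;
		     hw_err < HARDWARE_ERROR_MAX; hw_err++) {
			assert(hw_err < 32);

			/* Same value as the patch: all bits set except this severity. */
			for (ieh = 0; ieh < NUM_IEH; ieh++)
				mock_write32(tile, "GSYSEVTCTL", ~REG_BIT(hw_err));

			/* Write all ones to each status register, as the patch does. */
			mock_write32(tile, "GLOBAL_ERR_STAT_MASTER", UINT32_MAX);
			mock_write32(tile, "LOCAL_ERR_STAT_MASTER", UINT32_MAX);
			mock_write32(tile, "GLOBAL_ERR_STAT_SLAVE", UINT32_MAX);
			mock_write32(tile, "LOCAL_ERR_STAT_SLAVE", UINT32_MAX);
		}

		/* Final value from the patch: (MAX << 1) + 1, i.e. bits 0..2 set. */
		for (ieh = 0; ieh < NUM_IEH; ieh++)
			mock_write32(tile, "GSYSEVTCTL",
				     (HARDWARE_ERROR_MAX << 1) + 1);
	}
}
```

Calling clear_all_soc_errors() from a main() prints one line per register write. With the assumed enum values the final GSYSEVTCTL value works out to 0x7.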