From: Aravind Iddamsetty <aravind.iddamsetty@linux.intel.com>
To: "Ghimiray, Himal Prasad" <himal.prasad.ghimiray@intel.com>,
intel-xe@lists.freedesktop.org
Subject: Re: [Intel-xe] [PATCH 10/11] drm/xe: Clear SOC CORRECTABLE error registers.
Date: Thu, 12 Oct 2023 08:29:49 +0530
Message-ID: <ce44d9f3-deb6-33a6-a955-db3d8ec60da7@linux.intel.com>
In-Reply-To: <ba352668-c77f-4b65-b487-ac18edaf26e8@intel.com>
On 11/10/23 12:22, Ghimiray, Himal Prasad wrote:
>
> On 11-10-2023 12:18, Aravind Iddamsetty wrote:
>> On 27/09/23 17:16, Himal Prasad Ghimiray wrote:
>>> PVC doesn't support correctable SOC errors, if we receive MSI due to
>> statement looks incomplete/inappropriate,
>>
>> better rephrase to "PVC doesn't support correctable SOC error reporting"
> ok.
>>
>> Thanks,
>> Aravind.
>>> correctable error, classify them as Undefined and clear the registers.
>>>
>>> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
>>> ---
>>> drivers/gpu/drm/xe/xe_hw_error.c | 24 +++++++++++++++++++++++-
>>> 1 file changed, 23 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/xe/xe_hw_error.c b/drivers/gpu/drm/xe/xe_hw_error.c
>>> index dcf395bd985f..0bcb1bea7ffb 100644
>>> --- a/drivers/gpu/drm/xe/xe_hw_error.c
>>> +++ b/drivers/gpu/drm/xe/xe_hw_error.c
>>> @@ -616,9 +616,30 @@ xe_soc_hw_error_handler(struct xe_tile *tile, const enum hardware_error hw_err)
>>> lockdep_assert_held(&tile_to_xe(tile)->irq.lock);
>>> - if ((tile_to_xe(tile)->info.platform != XE_PVC) && hw_err == HARDWARE_ERROR_CORRECTABLE)
>>> + if ((tile_to_xe(tile)->info.platform != XE_PVC))
>>> return;
>>> + if (hw_err == HARDWARE_ERROR_CORRECTABLE) {
>>> + for (i = 0; i < PVC_NUM_IEH; i++)
>>> + xe_mmio_write32(gt, SOC_GSYSEVTCTL_REG(base, slave_base, i),
>>> + ~REG_BIT(hw_err));
>>> +
>>> + xe_mmio_write32(gt, SOC_GLOBAL_ERR_STAT_MASTER_REG(base, hw_err),
>>> + REG_GENMASK(31, 0));
>>> + xe_mmio_write32(gt, SOC_LOCAL_ERR_STAT_MASTER_REG(base, hw_err),
>>> + REG_GENMASK(31, 0));
>>> + xe_mmio_write32(gt, SOC_GLOBAL_ERR_STAT_SLAVE_REG(slave_base, hw_err),
>>> + REG_GENMASK(31, 0));
>>> + xe_mmio_write32(gt, SOC_LOCAL_ERR_STAT_SLAVE_REG(slave_base, hw_err),
>>> + REG_GENMASK(31, 0));
>>> +
>>> + drm_info(&tile_to_xe(tile)->drm, HW_ERR
>>> + "Tile%d Undefine SOC %s error.",
>>> + tile->id, hwerr_to_str);
>> I still feel that in this scenario we should at least flag this with drm_err. Even though
>> the error is correctable and was corrected by HW, isn't it spurious, since we don't expect
>> to receive it, and therefore a sign of HW misbehaviour? Thoughts?
>
> Agreed. IMO this change should be part of the low-level driver error reporting. Not only SOC: we need to report other GT and tile errors
The category will be added as part of the low-level driver error reporting, but since you are adding the print here, I suggest changing it to drm_err.
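
To illustrate the intent, here is a minimal standalone sketch. It is not driver code and makes several assumptions: the enum values are assumed, and log_level, soc_error_is_spurious() and soc_undefined_error_log_level() are hypothetical names standing in for the drm_info()/drm_err() choice. It only models the case discussed here, a correctable SOC error on PVC being treated as spurious.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the error classes named in the patch; the values are assumed. */
enum hardware_error {
	HARDWARE_ERROR_CORRECTABLE = 0,
	HARDWARE_ERROR_NONFATAL,
	HARDWARE_ERROR_FATAL,
	HARDWARE_ERROR_MAX,
};

/* Hypothetical stand-ins for choosing between drm_info() and drm_err(). */
enum log_level {
	LOG_LEVEL_INFO,
	LOG_LEVEL_ERR,
};

/*
 * Hypothetical helper: PVC does not support correctable SOC error
 * reporting, so an interrupt of that class is unexpected (spurious).
 */
static bool soc_error_is_spurious(enum hardware_error hw_err)
{
	assert(hw_err < HARDWARE_ERROR_MAX);
	return hw_err == HARDWARE_ERROR_CORRECTABLE;
}

/*
 * Severity for the "Undefined SOC error" print. A spurious interrupt
 * points at HW misbehaviour, so it is raised to error level. Other
 * classes take the regular reporting path, which this sketch does not
 * model; INFO is only a placeholder for them.
 */
static enum log_level soc_undefined_error_log_level(enum hardware_error hw_err)
{
	return soc_error_is_spurious(hw_err) ? LOG_LEVEL_ERR : LOG_LEVEL_INFO;
}
```

In the patch itself this amounts to replacing drm_info with drm_err in the correctable branch.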
Thanks,
Aravind.
>
> too as spurious interrupt errors when they are undefined, irrespective of error class (correctable/uncorrectable).
>
>>
>>
>> Thanks,
>> Aravind.
>>> +
>>> + goto unmask_gsysevtctl;
>>> + }
>>> +
>>> if (hw_err == HARDWARE_ERROR_FATAL) {
>>> soc_mstr_glbl_err_reg = soc_mstr_glbl_err_reg_fatal;
>>> soc_mstr_lcl_err_reg = soc_mstr_lcl_err_reg_fatal;
>>> @@ -709,6 +730,7 @@ xe_soc_hw_error_handler(struct xe_tile *tile, const enum hardware_error hw_err)
>>> xe_mmio_write32(gt, SOC_GLOBAL_ERR_STAT_MASTER_REG(base, hw_err),
>>> mst_glb_errstat);
>>> +unmask_gsysevtctl:
>>> for (i = 0; i < PVC_NUM_IEH; i++)
>>> xe_mmio_write32(gt, SOC_GSYSEVTCTL_REG(base, slave_base, i),
>>> (HARDWARE_ERROR_MAX << 1) + 1);