From: Aravind Iddamsetty <aravind.iddamsetty@linux.intel.com>
To: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>,
intel-xe@lists.freedesktop.org
Cc: Jani Nikula <jani.nikula@intel.com>,
Matt Roper <matthew.d.roper@intel.com>,
Rodrigo Vivi <rodrigo.vivi@intel.com>
Subject: Re: [Intel-xe] [PATCH v5 1/4] drm/xe: Handle errors from various components.
Date: Thu, 5 Oct 2023 09:31:44 +0530
Message-ID: <1629324f-63a2-1272-5814-97b6c2869d41@linux.intel.com>
In-Reply-To: <01b351e5-0141-20ac-f11a-f663f611dc07@linux.intel.com>
On 04/10/23 17:37, Aravind Iddamsetty wrote:
> On 23/08/23 14:28, Himal Prasad Ghimiray wrote:
>> The GFX device can generate three classes of errors under the new
>> infrastructure: correctable, non-fatal, and fatal.
>>
>> The non-fatal and fatal classes distinguish levels of severity for
>> uncorrectable errors. The driver only logs the errors and updates
>> per-component counters within the graphics device. Anything more is
>> handled at the system level.
>>
>> For errors that are routed as interrupts, three bits in the Master
>> Interrupt Register convey the class of error.
>>
>> For each class of error, determine the source of the error (IP block)
>> by reading the Device Error Source Register (RW1C) that corresponds
>> to the class being serviced.
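
To make the flow described above concrete, here is a rough userspace
sketch. It is not code from the patch. The bit positions, the names
(hw_err_class, ERR_IRQ_BIT, fake_err_src, handle_master_irq) and the
per-bit counters are all assumptions made for illustration, and an
in-memory array stands in for the real MMIO registers.

```c
#include <stdint.h>

/* Hypothetical error classes, mirroring the three named in the commit message. */
enum hw_err_class {
	HW_ERR_CORRECTABLE,
	HW_ERR_NONFATAL,
	HW_ERR_FATAL,
	HW_ERR_CLASS_MAX,
};

/* Assumption: three adjacent master interrupt bits, one per class (positions invented). */
#define ERR_IRQ_BIT(cls)	(UINT32_C(1) << (26 + (cls)))

/* Stand-in for one Device Error Source Register per class; real code would use MMIO. */
static uint32_t fake_err_src[HW_ERR_CLASS_MAX];

/* Hypothetical counters: one per class and source bit. */
static unsigned long counters[HW_ERR_CLASS_MAX][32];

static uint32_t err_src_read(enum hw_err_class cls)
{
	return fake_err_src[cls];
}

/* RW1C semantics: writing 1 to a bit clears it, writing 0 leaves it unchanged. */
static void err_src_write(enum hw_err_class cls, uint32_t val)
{
	fake_err_src[cls] &= ~val;
}

/* Service one class: read the source register, count each set bit, then ack. */
static void handle_class(enum hw_err_class cls)
{
	uint32_t src = err_src_read(cls);
	unsigned int bit;

	for (bit = 0; bit < 32; bit++)
		if (src & (UINT32_C(1) << bit))
			counters[cls][bit]++;

	/* Write back only the bits that were observed, so later errors are not lost. */
	err_src_write(cls, src);
}

/* Dispatch on the class bits found in the master interrupt value. */
static void handle_master_irq(uint32_t master)
{
	int cls;

	for (cls = 0; cls < HW_ERR_CLASS_MAX; cls++)
		if (master & ERR_IRQ_BIT(cls))
			handle_class((enum hw_err_class)cls);
}
```

Writing back the value that was read, rather than all ones, is the
usual way to acknowledge an RW1C register without dropping a bit that
was set between the read and the write.
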
>>
>> Bspec: 50875, 53073, 53074, 53075
>>
>> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
>> Cc: Aravind Iddamsetty <aravind.iddamsetty@intel.com>
>> Cc: Matthew Brost <matthew.brost@intel.com>
>> Cc: Matt Roper <matthew.d.roper@intel.com>
>> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>> Cc: Jani Nikula <jani.nikula@intel.com>
>> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
>> ---
>> drivers/gpu/drm/xe/Makefile                  |   1 +
>> drivers/gpu/drm/xe/regs/xe_regs.h            |   2 +-
>> drivers/gpu/drm/xe/regs/xe_tile_error_regs.h |  15 ++
>> drivers/gpu/drm/xe/xe_device_types.h         |  11 +
>> drivers/gpu/drm/xe/xe_hw_error.c             | 211 +++++++++++++++++++
>> drivers/gpu/drm/xe/xe_hw_error.h             |  64 ++++++
>> drivers/gpu/drm/xe/xe_irq.c                  |   3 +
>> 7 files changed, 306 insertions(+), 1 deletion(-)
>> create mode 100644 drivers/gpu/drm/xe/regs/xe_tile_error_regs.h
>> create mode 100644 drivers/gpu/drm/xe/xe_hw_error.c
>> create mode 100644 drivers/gpu/drm/xe/xe_hw_error.h
> <snip>
>
>> +/* Count of Correctable and Uncorrectable errors reported on tile */
>> +enum xe_tile_hw_errors {
>> + XE_TILE_HW_ERR_GT_FATAL = 0,
>> + XE_TILE_HW_ERR_SGGI_FATAL,
>> + XE_TILE_HW_ERR_DISPLAY_FATAL,
>> + XE_TILE_HW_ERR_SGDI_FATAL,
>> + XE_TILE_HW_ERR_SGLI_FATAL,
>> + XE_TILE_HW_ERR_SGUNIT_FATAL,
>> + XE_TILE_HW_ERR_SGCI_FATAL,
>> + XE_TILE_HW_ERR_GSC_FATAL,
>> + XE_TILE_HW_ERR_SOC_FATAL,
>> + XE_TILE_HW_ERR_MERT_FATAL,
>> + XE_TILE_HW_ERR_SGMI_FATAL,
>> + XE_TILE_HW_ERR_UNKNOWN_FATAL,
>> + XE_TILE_HW_ERR_SGGI_NONFATAL,
>> + XE_TILE_HW_ERR_DISPLAY_NONFATAL,
>> + XE_TILE_HW_ERR_SGDI_NONFATAL,
>> + XE_TILE_HW_ERR_SGLI_NONFATAL,
>> + XE_TILE_HW_ERR_GT_NONFATAL,
>> + XE_TILE_HW_ERR_SGUNIT_NONFATAL,
>> + XE_TILE_HW_ERR_SGCI_NONFATAL,
>> + XE_TILE_HW_ERR_GSC_NONFATAL,
>> + XE_TILE_HW_ERR_SOC_NONFATAL,
>> + XE_TILE_HW_ERR_MERT_NONFATAL,
>> + XE_TILE_HW_ERR_SGMI_NONFATAL,
>> + XE_TILE_HW_ERR_UNKNOWN_NONFATAL,
>> + XE_TILE_HW_ERR_GT_CORR,
>> + XE_TILE_HW_ERR_DISPLAY_CORR,
>> + XE_TILE_HW_ERR_SGUNIT_CORR,
>> + XE_TILE_HW_ERR_GSC_CORR,
>> + XE_TILE_HW_ERR_SOC_CORR,
>> + XE_TILE_HW_ERR_UNKNOWN_CORR,
>> + XE_TILE_HW_ERROR_MAX,
>> +};
>> +
> Some of these defines are not used in any platform-specific structure.
> Let's clean those up and, for now, add support only for DG2 and PVC.
> Errors specific to another platform can be added later, together with
> that platform. This applies to all errors in the series.
Also, let's remove the display entries, as there are no known RAS errors from display.
Thanks,
Aravind.
>
> Thanks,
> Aravind.
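
One possible way to spot such unused defines, sketched in plain C. This
is only an illustration and not code from the series: the shortened
enum, the demo_err_map structure, the two platform tables and every bit
position are hypothetical and do not describe DG2, PVC or any other
real platform.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_ARRAY_LEN(a)	(sizeof(a) / sizeof((a)[0]))

/* Hypothetical, shortened counter enum; not the list from the patch. */
enum demo_hw_err {
	DEMO_ERR_GT_FATAL,
	DEMO_ERR_SOC_FATAL,
	DEMO_ERR_DISPLAY_FATAL,
	DEMO_ERR_GT_CORR,
	DEMO_ERR_MAX,
};

/* One entry maps an error-source bit to the counter it increments. */
struct demo_err_map {
	unsigned int bit;
	enum demo_hw_err counter;
};

/* Hypothetical per-platform tables; the bit positions are invented. */
static const struct demo_err_map demo_platform_a[] = {
	{ 0, DEMO_ERR_GT_FATAL },
	{ 16, DEMO_ERR_SOC_FATAL },
};

static const struct demo_err_map demo_platform_b[] = {
	{ 0, DEMO_ERR_GT_FATAL },
	{ 0, DEMO_ERR_GT_CORR },
};

/* Flag every counter that the given platform table references. */
static void demo_mark_used(const struct demo_err_map *map, size_t n, bool *used)
{
	size_t i;

	for (i = 0; i < n; i++)
		used[map[i].counter] = true;
}

/*
 * Collect the enum values that no platform table references.
 * 'out' must have room for DEMO_ERR_MAX entries; returns how many were written.
 */
static size_t demo_find_unused(enum demo_hw_err *out)
{
	bool used[DEMO_ERR_MAX] = { false };
	size_t n = 0;
	int i;

	demo_mark_used(demo_platform_a, DEMO_ARRAY_LEN(demo_platform_a), used);
	demo_mark_used(demo_platform_b, DEMO_ARRAY_LEN(demo_platform_b), used);

	for (i = 0; i < DEMO_ERR_MAX; i++)
		if (!used[i])
			out[n++] = (enum demo_hw_err)i;

	return n;
}
```

With the tables above, only DEMO_ERR_DISPLAY_FATAL is reported, since
neither table references it. A define that shows up in this kind of
check is a candidate for removal until a platform actually needs it.
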