Intel-XE Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: Aravind Iddamsetty <aravind.iddamsetty@linux.intel.com>
To: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>,
	intel-xe@lists.freedesktop.org
Subject: Re: [Intel-xe] [PATCH 11/11] drm/xe: Clear all SoC errors post warm reset.
Date: Wed, 11 Oct 2023 12:26:13 +0530	[thread overview]
Message-ID: <afffc88e-6c18-75b7-a968-67574573fe9b@linux.intel.com> (raw)
In-Reply-To: <20230927114627.136925-12-himal.prasad.ghimiray@intel.com>


On 27/09/23 17:16, Himal Prasad Ghimiray wrote:
> There are scenarios where there are no fatal errors reported
> but Non-fatal/correctable errors being reported from the SoC
> uncore to IEH and not propogated to SG unit. Clear all previous
> SoC errors post warm reset.

the commit msg is not very clear, how fatal error reporting is related to other errors.

Thanks,
Aravind.
>
> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_hw_error.c | 37 ++++++++++++++++++++++++++++++++
>  drivers/gpu/drm/xe/xe_hw_error.h |  1 +
>  drivers/gpu/drm/xe/xe_irq.c      |  1 +
>  3 files changed, 39 insertions(+)
>
> diff --git a/drivers/gpu/drm/xe/xe_hw_error.c b/drivers/gpu/drm/xe/xe_hw_error.c
> index 0bcb1bea7ffb..a777c887a7be 100644
> --- a/drivers/gpu/drm/xe/xe_hw_error.c
> +++ b/drivers/gpu/drm/xe/xe_hw_error.c
> @@ -366,6 +366,43 @@ static void xe_assign_hw_err_regs(struct xe_device *xe)
>  	}
>  }
>  
> +void xe_clear_all_soc_errors(struct xe_device *xe)
> +{
> +	enum hardware_error hw_err;
> +	u32 base, slave_base;
> +	struct xe_tile *tile;
> +	struct xe_gt *gt;
> +	unsigned int i;
> +
> +	base = SOC_PVC_BASE;
> +	slave_base = SOC_PVC_SLAVE_BASE;
> +
> +	hw_err = HARDWARE_ERROR_CORRECTABLE;
> +
> +	for_each_tile(tile, xe, i) {
> +		gt = tile->primary_gt;
> +
> +		while (hw_err < HARDWARE_ERROR_MAX) {
> +			for (i = 0; i < PVC_NUM_IEH; i++)
> +				xe_mmio_write32(gt, SOC_GSYSEVTCTL_REG(base, slave_base, i),
> +						~REG_BIT(hw_err));
> +
> +			xe_mmio_write32(gt, SOC_GLOBAL_ERR_STAT_MASTER_REG(base, hw_err),
> +					REG_GENMASK(31, 0));
> +			xe_mmio_write32(gt, SOC_LOCAL_ERR_STAT_MASTER_REG(base, hw_err),
> +					REG_GENMASK(31, 0));
> +			xe_mmio_write32(gt, SOC_GLOBAL_ERR_STAT_SLAVE_REG(slave_base, hw_err),
> +					REG_GENMASK(31, 0));
> +			xe_mmio_write32(gt, SOC_LOCAL_ERR_STAT_SLAVE_REG(slave_base, hw_err),
> +					REG_GENMASK(31, 0));
> +			hw_err++;
> +		}
> +		for (i = 0; i < PVC_NUM_IEH; i++)
> +			xe_mmio_write32(gt, SOC_GSYSEVTCTL_REG(base, slave_base, i),
> +					(HARDWARE_ERROR_MAX << 1) + 1);
> +	}
> +}
> +
>  static void
>  xe_gt_hw_error_status_reg_handler(struct xe_gt *gt, const enum hardware_error hw_err)
>  {
> diff --git a/drivers/gpu/drm/xe/xe_hw_error.h b/drivers/gpu/drm/xe/xe_hw_error.h
> index a458a90b34a2..7ada7c97c939 100644
> --- a/drivers/gpu/drm/xe/xe_hw_error.h
> +++ b/drivers/gpu/drm/xe/xe_hw_error.h
> @@ -219,4 +219,5 @@ struct xe_tile;
>  void xe_hw_error_irq_handler(struct xe_tile *tile, const u32 master_ctl);
>  void xe_process_hw_errors(struct xe_device *xe);
>  void xe_gsc_hw_error_work(struct work_struct *work);
> +void xe_clear_all_soc_errors(struct xe_device *xe);
>  #endif
> diff --git a/drivers/gpu/drm/xe/xe_irq.c b/drivers/gpu/drm/xe/xe_irq.c
> index 285c657cc789..42a6bb45acba 100644
> --- a/drivers/gpu/drm/xe/xe_irq.c
> +++ b/drivers/gpu/drm/xe/xe_irq.c
> @@ -597,6 +597,7 @@ int xe_irq_install(struct xe_device *xe)
>  	}
>  
>  	xe_process_hw_errors(xe);
> +	xe_clear_all_soc_errors(xe);
>  
>  	xe->irq.enabled = true;
>  

  reply	other threads:[~2023-10-11  6:53 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-09-27 11:46 [Intel-xe] [PATCH 00/11] Supporting CSC and SOC HARDWARE ERROR HANDLING on PVC Himal Prasad Ghimiray
2023-09-27 11:43 ` [Intel-xe] ✓ CI.Patch_applied: success for " Patchwork
2023-09-27 11:43 ` [Intel-xe] ✗ CI.checkpatch: warning " Patchwork
2023-09-27 11:44 ` [Intel-xe] ✓ CI.KUnit: success " Patchwork
2023-09-27 11:46 ` [Intel-xe] [PATCH 01/11] drm/xe: Handle errors from various components Himal Prasad Ghimiray
2023-09-27 11:46 ` [Intel-xe] [PATCH 02/11] drm/xe: Log and count the GT hardware errors Himal Prasad Ghimiray
2023-09-27 11:46 ` [Intel-xe] [PATCH 03/11] drm/xe: Support GT hardware error reporting for PVC Himal Prasad Ghimiray
2023-09-27 11:46 ` [Intel-xe] [PATCH 04/11] drm/xe: Process fatal hardware errors Himal Prasad Ghimiray
2023-09-27 11:46 ` [Intel-xe] [PATCH 05/11] drm/xe: Support GSC hardware error reporting for PVC Himal Prasad Ghimiray
2023-10-11  7:18   ` Aravind Iddamsetty
2023-09-27 11:46 ` [Intel-xe] [PATCH 06/11] drm/xe: Notify userspace about GSC HW errors Himal Prasad Ghimiray
2023-10-11  7:23   ` Aravind Iddamsetty
2023-10-11  7:25     ` Ghimiray, Himal Prasad
2023-10-12  3:12       ` Aravind Iddamsetty
2023-09-27 11:46 ` [Intel-xe] [PATCH 07/11] drm/xe: Support SOC FATAL error handling for PVC Himal Prasad Ghimiray
2023-10-04  6:38   ` Aravind Iddamsetty
2023-10-04  6:50     ` Ghimiray, Himal Prasad
2023-10-08  9:32       ` Aravind Iddamsetty
2023-10-09  4:11         ` Ghimiray, Himal Prasad
2023-10-09  9:00           ` Aravind Iddamsetty
2023-10-09  9:15             ` Ghimiray, Himal Prasad
2023-10-10  6:27               ` Aravind Iddamsetty
2023-10-09  9:52   ` Aravind Iddamsetty
2023-10-09 10:14     ` Ghimiray, Himal Prasad
2023-09-27 11:46 ` [Intel-xe] [PATCH 08/11] drm/xe: Support SOC NONFATAL " Himal Prasad Ghimiray
2023-10-11  6:07   ` Aravind Iddamsetty
2023-09-27 11:46 ` [Intel-xe] [PATCH 09/11] drm/xe: Handle MDFI error severity Himal Prasad Ghimiray
2023-10-04 12:11   ` Aravind Iddamsetty
2023-09-27 11:46 ` [Intel-xe] [PATCH 10/11] drm/xe: Clear SOC CORRECTABLE error registers Himal Prasad Ghimiray
2023-10-09  9:58   ` Aravind Iddamsetty
2023-10-11  6:48   ` Aravind Iddamsetty
2023-10-11  6:52     ` Ghimiray, Himal Prasad
2023-10-12  2:59       ` Aravind Iddamsetty
2023-10-12  4:01         ` Ghimiray, Himal Prasad
2023-09-27 11:46 ` [Intel-xe] [PATCH 11/11] drm/xe: Clear all SoC errors post warm reset Himal Prasad Ghimiray
2023-10-11  6:56   ` Aravind Iddamsetty [this message]
2023-10-11  6:59     ` Ghimiray, Himal Prasad
2023-10-12  3:05       ` Aravind Iddamsetty
2023-09-27 11:51 ` [Intel-xe] ✓ CI.Build: success for Supporting CSC and SOC HARDWARE ERROR HANDLING on PVC Patchwork
2023-09-27 11:52 ` [Intel-xe] ✗ CI.Hooks: failure " Patchwork
2023-09-27 11:53 ` [Intel-xe] ✓ CI.checksparse: success " Patchwork
2023-09-27 12:28 ` [Intel-xe] ✗ CI.BAT: failure " Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=afffc88e-6c18-75b7-a968-67574573fe9b@linux.intel.com \
    --to=aravind.iddamsetty@linux.intel.com \
    --cc=himal.prasad.ghimiray@intel.com \
    --cc=intel-xe@lists.freedesktop.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox