All of lore.kernel.org
 help / color / mirror / Atom feed
From: sashiko-bot@kernel.org
To: "Riana Tauro" <riana.tauro@intel.com>
Cc: dri-devel@lists.freedesktop.org
Subject: Re: [PATCH v3 2/3] drm/xe/xe_drm_ras: Add error-event support for PVC
Date: Tue, 23 Jun 2026 20:58:15 +0000	[thread overview]
Message-ID: <20260623205816.20B471F00A3A@smtp.kernel.org> (raw)
In-Reply-To: <20260622101716.3313496-7-riana.tauro@intel.com>

Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
- [High] Potential console flood due to unratelimited `drm_err` in atomic context when netlink multicast fails.
- [Medium] Emitting RAS events with unchanged/zero counter values for uncounted errors.
- [Medium] Pointer arithmetic is performed on a potentially NULL pointer before checking if the DRM RAS feature is enabled.
--

commit 948141f8cea37215cdae8a5d0e089b0c6b181396
Author: Riana Tauro <riana.tauro@intel.com>

drm/xe/xe_drm_ras: Add error-event support for PVC

Report drm_ras error event to userspace when an error occurs.
Add support for core-compute and SoC errors in PVC.

> diff --git a/drivers/gpu/drm/xe/xe_drm_ras.c b/drivers/gpu/drm/xe/xe_drm_ras.c
> index 7937d8ba0ed9c..36afdfb5e4120 100644
> --- a/drivers/gpu/drm/xe/xe_drm_ras.c
> +++ b/drivers/gpu/drm/xe/xe_drm_ras.c

[ ... ]

> @@ -185,6 +185,36 @@ static int register_nodes(struct xe_device *xe)
>  	return ret;
>  }
>  
> +/**
> + * xe_drm_ras_event() - Report drm-ras error event to userspace
> + * @xe: xe device structure
> + * @component: error component (see &enum drm_xe_ras_error_component)
> + * @severity: error severity (see &enum drm_xe_ras_error_severity)
> + * @value: value of error counter
> + * @flags: flags for allocation
> + *
> + * Report an error-event to userspace.
> + */
> +void xe_drm_ras_event(struct xe_device *xe, u32 component, u32 severity, u32 value, gfp_t flags)
> +{
> +	struct xe_drm_ras *ras = &xe->ras;
> +	struct xe_drm_ras_counter *info = ras->info[severity];
> +	struct drm_ras_node *node = &ras->node[severity];

[Severity: Medium]
Is it safe to compute the node pointer here before checking if the DRM
RAS feature is enabled?

If xe->info.has_drm_ras is false, xe_drm_ras_init() might not have been
called, leaving ras->node as NULL. Evaluating &ras->node[severity] on a
NULL pointer results in undefined behavior which could be flagged by UBSAN
or compiler optimizations.

> +	int ret;
> +
> +	/* Event is supported only if drm_ras is enabled */
> +	if (!xe->info.has_drm_ras)
> +		return;
> +
> +	if (!info || !info[component].name)
> +		return;
> +
> +	ret = drm_ras_nl_error_event(node, component, info[component].name, value, flags);
> +	if (ret)
> +		drm_err(&xe->drm, "RAS error-event failed: %d for %s %s\n", ret,
> +			info[component].name, error_severity[severity]);

[Severity: High]
Could this unratelimited drm_err() cause a console flood?

Since xe_drm_ras_event() is called from hw_error_source_handler() in
hardware error interrupt context, an interrupt storm could occur.
If drm_ras_nl_error_event() fails (e.g., due to GFP_ATOMIC allocation
failures under memory pressure), this will rapidly print errors to
the console.

Should this use drm_err_ratelimited() similar to how the caller
handles its logging?

> +}
> +
>  /**
>   * xe_drm_ras_init() - Initialize DRM RAS
>   * @xe: xe device instance

> diff --git a/drivers/gpu/drm/xe/xe_hw_error.c b/drivers/gpu/drm/xe/xe_hw_error.c
> index 4a4b363fc844b..a833cecc74ec5 100644
> --- a/drivers/gpu/drm/xe/xe_hw_error.c
> +++ b/drivers/gpu/drm/xe/xe_hw_error.c

[ ... ]

> @@ -495,6 +495,9 @@ static void hw_error_source_handler(struct xe_tile *tile, const enum hardware_er
>  			gt_hw_error_handler(tile, hw_err, error_id);
>  		if (err_bit == XE_SOC_ERROR)
>  			soc_hw_error_handler(tile, hw_err, error_id);
> +
> +		value = atomic_read(&info[error_id].counter);
> +		xe_drm_ras_event(xe, error_id, severity, value, GFP_ATOMIC);

[Severity: Medium]
Will this emit an event with an unchanged or zero counter value for
correctable SOC errors?

In soc_hw_error_handler(), if the error is HARDWARE_ERROR_CORRECTABLE,
it clears the registers and jumps to unmask_gsysevtctl without
incrementing the error counter:

soc_hw_error_handler() {
    ...
    if (hw_err == HARDWARE_ERROR_CORRECTABLE) {
        ...
        goto unmask_gsysevtctl;
    }
    ...
}

However, here in hw_error_source_handler(), xe_drm_ras_event() is
called unconditionally after soc_hw_error_handler() returns.
Does this result in userspace receiving an error event where the
counter value hasn't actually changed?

>  	}
>  
>  clear_reg:

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260622101716.3313496-5-riana.tauro@intel.com?part=2

  reply	other threads:[~2026-06-23 20:58 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-22 10:17 [PATCH v3 0/3] Add drm_ras netlink error event support Riana Tauro
2026-06-22 10:17 ` [PATCH v3 1/3] drm/drm_ras: Add drm_ras netlink error event Riana Tauro
2026-06-23 20:46   ` sashiko-bot
2026-06-22 10:17 ` [PATCH v3 2/3] drm/xe/xe_drm_ras: Add error-event support for PVC Riana Tauro
2026-06-23 20:58   ` sashiko-bot [this message]
2026-06-22 10:17 ` [PATCH v3 3/3] drm/xe/xe_ras: Add error-event support for CRI Riana Tauro
2026-06-23 21:07   ` sashiko-bot
2026-06-22 14:48 ` ✓ CI.KUnit: success for Add drm_ras netlink error event support (rev3) Patchwork
2026-06-22 15:35 ` ✓ Xe.CI.BAT: " Patchwork
2026-06-22 17:58 ` ✓ Xe.CI.FULL: " Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260623205816.20B471F00A3A@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=riana.tauro@intel.com \
    --cc=sashiko-reviews@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.