Re: [PATCH v9 6/6] drm/xe/ras: Use fault-inject to trigger punit error handler

All of lore.kernel.org
 help / color / mirror / Atom feed

From: sashiko-bot@kernel.org
To: "Mallesh Koujalagi" <mallesh.koujalagi@intel.com>
Cc: dri-devel@lists.freedesktop.org
Subject: Re: [PATCH v9 6/6] drm/xe/ras: Use fault-inject to trigger punit error handler
Date: Tue, 23 Jun 2026 22:47:45 +0000	[thread overview]
Message-ID: <20260623224745.EBE521F000E9@smtp.kernel.org> (raw)
In-Reply-To: <20260622131458.144873-14-mallesh.koujalagi@intel.com>

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [Medium] Global struct fault_attr causes dentry reference leak and cross-device state contamination
- [Medium] Potential NULL pointer dereference during early probe if wedged event is sent before DRM device registration
--

commit f49d10330bc9238666f6e80cf2c7887368a6612b
Author: Mallesh Koujalagi <mallesh.koujalagi@intel.com>

drm/xe/ras: Use fault-inject to trigger punit error handler

Use fault-inject framework to trigger punit_error_handler()
for testing.

> diff --git a/drivers/gpu/drm/xe/xe_debugfs.c b/drivers/gpu/drm/xe/xe_debugfs.c
> index 22b471303984a..3a90deb5596ce 100644
> --- a/drivers/gpu/drm/xe/xe_debugfs.c
> +++ b/drivers/gpu/drm/xe/xe_debugfs.c
> @@ -40,6 +40,7 @@
>  
>  DECLARE_FAULT_ATTR(gt_reset_failure);
>  DECLARE_FAULT_ATTR(inject_csc_hw_error);
> +DECLARE_FAULT_ATTR(inject_punit_error);
>  
>  static void read_residency_counter(struct xe_device *xe, struct xe_mmio *mmio,
>  				   u32 offset, const char *name, struct drm_printer *p)

[ ... ]

> @@ -612,6 +613,8 @@ void xe_debugfs_register(struct xe_device *xe)
>  
>  	fault_create_debugfs_attr("fail_gt_reset", root, &gt_reset_failure);
>  
> +	fault_create_debugfs_attr("inject_punit_error", root, &inject_punit_error);

[Severity: Medium]
Since inject_punit_error is declared as a global attribute, will calling
fault_create_debugfs_attr() for each device overwrite attr->dname?

Looking at fault_create_debugfs_attr(), it unconditionally assigns
attr->dname = dget(dir). If the system has multiple XE GPUs, each
subsequent device's debugfs registration would overwrite attr->dname without
calling dput() on the previous reference, leaking the dentry.

Would this also expose a single global fault injection state across all
devices, meaning a write to one device's debugfs file unintentionally alters
the behavior for all other XE devices?

> +
>  	if (IS_SRIOV_PF(xe))
>  		xe_sriov_pf_debugfs_register(xe, root);
>  	else if (IS_SRIOV_VF(xe))
> diff --git a/drivers/gpu/drm/xe/xe_ras.c b/drivers/gpu/drm/xe/xe_ras.c
> index 5a2fee5a13083..bb8502472f738 100644
> --- a/drivers/gpu/drm/xe/xe_ras.c
> +++ b/drivers/gpu/drm/xe/xe_ras.c

[ ... ]

> @@ -708,6 +717,11 @@ enum xe_ras_recovery_action xe_ras_process_errors(struct xe_device *xe)
>  	size_t rlen;
>  	int ret;
>  
> +	if (fault_inject_punit_error()) {
> +		punit_error_handler(xe);
> +		return XE_RAS_RECOVERY_ACTION_DISCONNECT;
> +	}

[Severity: Medium]
Could this trigger a NULL pointer dereference during early probe?

xe_ras_init() is called during driver probe and executes
xe_ras_process_errors(). If the injected fault triggers here, it executes
punit_error_handler() which in turn calls xe_device_declare_wedged().

If xe_device_declare_wedged() invokes drm_dev_wedged_event() to notify
userspace before the DRM device interface is fully registered (before
drm_dev_register() initializes the minor's kdev), it might evaluate
&dev->primary->kdev->kobj and dereference NULL.

Because the inject_punit_error fault attribute is global, if a user sets
the probability to 100% on the first probed device, could it inadvertently
trigger during the probe of a subsequently attached device and crash the
system?

> +
>  	if (!xe->info.has_sysctrl)
>  		return XE_RAS_RECOVERY_ACTION_RESET;
>

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260622131458.144873-8-mallesh.koujalagi@intel.com?part=6

next prev parent reply	other threads:[~2026-06-23 22:47 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-22 13:14 [PATCH v9 0/6] Introduce cold reset recovery method Mallesh Koujalagi
2026-06-22 13:15 ` [PATCH v9 1/6] Introduce Xe Uncorrectable Error Handling Mallesh Koujalagi
2026-06-23 20:54   ` sashiko-bot
2026-06-22 13:15 ` [PATCH v9 2/6] drm: Add DRM_WEDGE_RECOVERY_COLD_RESET recovery method Mallesh Koujalagi
2026-06-23 21:04   ` sashiko-bot
2026-06-22 13:15 ` [PATCH v9 3/6] drm/doc: Document " Mallesh Koujalagi
2026-06-23 21:11   ` sashiko-bot
2026-06-22 13:15 ` [PATCH v9 4/6] drm/xe: Handle PUNIT errors by requesting cold-reset recovery Mallesh Koujalagi
2026-06-22 13:15 ` [PATCH v9 5/6] drm/xe: Suppress Surprise Link Down on device Mallesh Koujalagi
2026-06-23 22:47   ` sashiko-bot
2026-06-22 13:15 ` [PATCH v9 6/6] drm/xe/ras: Use fault-inject to trigger punit error handler Mallesh Koujalagi
2026-06-23 22:47   ` sashiko-bot [this message]
2026-06-22 17:20 ` ✗ CI.checkpatch: warning for Introduce cold reset recovery method (rev9) Patchwork
2026-06-22 17:22 ` ✓ CI.KUnit: success " Patchwork
2026-06-22 18:09 ` ✓ Xe.CI.BAT: " Patchwork
2026-06-22 22:03 ` ✗ Xe.CI.FULL: failure " Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260623224745.EBE521F000E9@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=mallesh.koujalagi@intel.com \
    --cc=sashiko-reviews@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.