From: sashiko-bot@kernel.org
To: "Mallesh Koujalagi" <mallesh.koujalagi@intel.com>
Cc: dri-devel@lists.freedesktop.org
Subject: Re: [PATCH v9 2/6] drm: Add DRM_WEDGE_RECOVERY_COLD_RESET recovery method
Date: Tue, 23 Jun 2026 21:04:47 +0000 [thread overview]
Message-ID: <20260623210448.60C771F000E9@smtp.kernel.org> (raw)
In-Reply-To: <20260622131458.144873-10-mallesh.koujalagi@intel.com>
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
Pre-existing issues:
- [High] Directly invoking driver `remove()` and `probe()` callbacks from the PCI error handler bypasses the Linux device model, leading to driver core state corruption and subsequent crashes.
- [High] WEDGE_STR_LEN buffer size is too small to hold all recovery methods, leading to silently truncated uevent strings that userspace cannot parse.
--
commit e0ad0ce6dec9b17be989295e8a1391f07c6154a5
Author: Mallesh Koujalagi <mallesh.koujalagi@intel.com>
drm: Add DRM_WEDGE_RECOVERY_COLD_RESET recovery method
Introduce DRM_WEDGE_RECOVERY_COLD_RESET (BIT(4)) recovery method to handle
scenarios requiring device power cycle. This method addresses cases where other
recovery mechanisms are insufficient to restore device functionality.
> diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
> index 675675480da49..19c25366635e9 100644
> --- a/drivers/gpu/drm/drm_drv.c
> +++ b/drivers/gpu/drm/drm_drv.c
> @@ -535,6 +535,8 @@ static const char *drm_get_wedge_recovery(unsigned int opt)
> return "bus-reset";
> case DRM_WEDGE_RECOVERY_VENDOR:
> return "vendor-specific";
> + case DRM_WEDGE_RECOVERY_COLD_RESET:
> + return "cold-reset";
[Severity: High]
This is a pre-existing issue, but does adding the cold-reset method make the
WEDGE_STR_LEN buffer too small to hold all recovery methods?
Could this lead to silently truncated uevent strings that userspace cannot
parse?
In drm_dev_wedged_event(), the event_string buffer is hardcoded to 32 bytes:
drivers/gpu/drm/drm_drv.c:drm_dev_wedged_event() {
char event_string[WEDGE_STR_LEN];
...
len = scnprintf(event_string, sizeof(event_string), "%s", "WEDGED=");
for_each_set_bit(opt, &method, BITS_PER_TYPE(method)) {
...
len += scnprintf(event_string + len, sizeof(event_string) - len, "%s,", recovery);
}
if (recovery)
/* Get rid of trailing comma */
event_string[len - 1] = '\0';
...
}
When formatting multiple methods (e.g. "WEDGED=rebind,bus-reset,cold-reset"),
the string length easily exceeds 32 bytes. If scnprintf() hits the buffer limit,
it returns 0 on the last iteration. Decrementing len then chops off the last
valid character of the truncated string, resulting in a corrupted uevent that
breaks userspace parsing.
> default:
> return NULL;
> }
[ ... ]
> diff --git a/drivers/gpu/drm/xe/xe_pci_error.c b/drivers/gpu/drm/xe/xe_pci_error.c
> new file mode 100644
> index 0000000000000..b08601f470d63
> --- /dev/null
> +++ b/drivers/gpu/drm/xe/xe_pci_error.c
> @@ -0,0 +1,135 @@
[ ... ]
> +static pci_ers_result_t xe_pci_error_slot_reset(struct pci_dev *pdev)
> +{
> + const struct pci_device_id *ent = pci_match_id(pdev->driver->id_table, pdev);
> + struct xe_device *xe = pdev_to_xe_device(pdev);
[ ... ]
> + xe_device_clear_in_reset(xe);
> + pdev->driver->remove(pdev);
> + devres_release_group(&pdev->dev, xe->devres_group_id);
> +
> + if (pdev->driver->probe(pdev, ent))
[Severity: High]
This isn't a bug introduced by this patch, but does directly invoking the driver
remove() and probe() callbacks bypass the Linux device model?
The driver core still considers the device 'bound'. If the manual probe()
fails, it returns PCI_ERS_RESULT_DISCONNECT, causing the PCI core to formally
unbind the device later.
This invokes xe_pci_remove() on a partially initialized or already
manually-removed state, which could lead to resource double-frees and NULL
pointer dereferences.
> + return PCI_ERS_RESULT_DISCONNECT;
> +
> + xe = pdev_to_xe_device(pdev);
> +
> + /* Wedge the device to prevent I/O operations till the resume callback */
> + xe_device_wedged_get(xe);
> +
> + return PCI_ERS_RESULT_RECOVERED;
> +}
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260622131458.144873-8-mallesh.koujalagi@intel.com?part=2
next prev parent reply other threads:[~2026-06-23 21:04 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-22 13:14 [PATCH v9 0/6] Introduce cold reset recovery method Mallesh Koujalagi
2026-06-22 13:15 ` [PATCH v9 1/6] Introduce Xe Uncorrectable Error Handling Mallesh Koujalagi
2026-06-23 20:54 ` sashiko-bot
2026-06-22 13:15 ` [PATCH v9 2/6] drm: Add DRM_WEDGE_RECOVERY_COLD_RESET recovery method Mallesh Koujalagi
2026-06-23 21:04 ` sashiko-bot [this message]
2026-06-22 13:15 ` [PATCH v9 3/6] drm/doc: Document " Mallesh Koujalagi
2026-06-23 21:11 ` sashiko-bot
2026-06-22 13:15 ` [PATCH v9 4/6] drm/xe: Handle PUNIT errors by requesting cold-reset recovery Mallesh Koujalagi
2026-06-22 13:15 ` [PATCH v9 5/6] drm/xe: Suppress Surprise Link Down on device Mallesh Koujalagi
2026-06-23 22:47 ` sashiko-bot
2026-06-22 13:15 ` [PATCH v9 6/6] drm/xe/ras: Use fault-inject to trigger punit error handler Mallesh Koujalagi
2026-06-23 22:47 ` sashiko-bot
2026-06-22 17:20 ` ✗ CI.checkpatch: warning for Introduce cold reset recovery method (rev9) Patchwork
2026-06-22 17:22 ` ✓ CI.KUnit: success " Patchwork
2026-06-22 18:09 ` ✓ Xe.CI.BAT: " Patchwork
2026-06-22 22:03 ` ✗ Xe.CI.FULL: failure " Patchwork
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260623210448.60C771F000E9@smtp.kernel.org \
--to=sashiko-bot@kernel.org \
--cc=dri-devel@lists.freedesktop.org \
--cc=mallesh.koujalagi@intel.com \
--cc=sashiko-reviews@lists.linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.