From: Mallesh Koujalagi <mallesh.koujalagi@intel.com>
To: intel-xe@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
rodrigo.vivi@intel.com
Cc: andrealmeid@igalia.com, christian.koenig@amd.com,
airlied@gmail.com, simona.vetter@ffwll.ch, mripard@kernel.org,
anshuman.gupta@intel.com, badal.nilawar@intel.com,
riana.tauro@intel.com, karthik.poosa@intel.com,
sk.anirban@intel.com, raag.jadav@intel.com,
Mallesh Koujalagi <mallesh.koujalagi@intel.com>
Subject: [PATCH 4/4] drm/xe/debugfs: Add interface to trigger critical error handler
Date: Wed, 11 Feb 2026 17:29:51 +0530
Message-ID: <20260211115946.2014051-10-mallesh.koujalagi@intel.com>
In-Reply-To: <20260211115946.2014051-6-mallesh.koujalagi@intel.com>

Add a debugfs interface to manually trigger the critical error handler
for testing cold reset recovery paths. This is useful for validating
the error recovery mechanism.

The new debugfs entry 'trigger_critical_error' is located at:

  /sys/kernel/debug/dri/N/trigger_critical_error

Reading the file displays usage instructions. Writing '1' invokes
xe_critical_error_handler(), which marks the device as wedged with
the DRM_WEDGE_RECOVERY_COLD_RESET method and sends a uevent to userspace
indicating that a complete device power cycle is required for recovery.
Writing '0' or any other false value has no effect.

This interface is intended for development, testing, and validation
of critical error recovery code.

Signed-off-by: Mallesh Koujalagi <mallesh.koujalagi@intel.com>
---
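A minimal userspace sketch of how a test might drive the new node is shown
below. It is an illustration only, not part of the patch: the helper names
critical_error_path() and trigger_critical_error() are hypothetical, and it
assumes debugfs is mounted at /sys/kernel/debug and that the caller has
permission to write the node (it is created with mode 0600).

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: build the debugfs path for DRM minor `minor`.
 * Returns 0 on success or -ENAMETOOLONG if `buf` is too small.
 */
static int critical_error_path(char *buf, size_t len, int minor)
{
	int n = snprintf(buf, len,
			 "/sys/kernel/debug/dri/%d/trigger_critical_error",
			 minor);

	return (n < 0 || (size_t)n >= len) ? -ENAMETOOLONG : 0;
}

/*
 * Hypothetical helper: write "1" to the node at `path`.
 * Returns 0 on success or a negative errno value.
 */
static int trigger_critical_error(const char *path)
{
	int fd = open(path, O_WRONLY);
	ssize_t n;
	int err = 0;

	if (fd < 0)
		return -errno;

	n = write(fd, "1\n", 2);
	if (n < 0)
		err = -errno;
	else if (n != 2)
		err = -EIO;	/* short write, treated as a failure here */

	close(fd);
	return err;
}
```

A test would call critical_error_path() with the DRM minor of the xe device
and pass the resulting path to trigger_critical_error(). After a successful
write the device is expected to be wedged, so later operations on it are
expected to fail until the device has been power cycled.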
drivers/gpu/drm/xe/xe_debugfs.c | 38 +++++++++++++++++++++++++++++++++
1 file changed, 38 insertions(+)
diff --git a/drivers/gpu/drm/xe/xe_debugfs.c b/drivers/gpu/drm/xe/xe_debugfs.c
index 844cfafe1ec7..61c76e5e617e 100644
--- a/drivers/gpu/drm/xe/xe_debugfs.c
+++ b/drivers/gpu/drm/xe/xe_debugfs.c
@@ -18,6 +18,7 @@
 #include "xe_gt_debugfs.h"
 #include "xe_gt_printk.h"
 #include "xe_guc_ads.h"
+#include "xe_hw_error.h"
 #include "xe_mmio.h"
 #include "xe_pm.h"
 #include "xe_psmi.h"
@@ -509,6 +510,40 @@ static const struct file_operations disable_late_binding_fops = {
 	.write = disable_late_binding_set,
 };
 
+static ssize_t trigger_critical_error_show(struct file *f, char __user *ubuf,
+					   size_t size, loff_t *pos)
+{
+	const char *msg = "Write 1 to trigger critical error handler\n";
+
+	return simple_read_from_buffer(ubuf, size, pos, msg, strlen(msg));
+}
+
+static ssize_t trigger_critical_error_set(struct file *f,
+					  const char __user *ubuf,
+					  size_t size, loff_t *pos)
+{
+	struct xe_device *xe = file_inode(f)->i_private;
+	bool trigger;
+	ssize_t ret;
+
+	ret = kstrtobool_from_user(ubuf, size, &trigger);
+	if (ret)
+		return ret;
+
+	if (trigger) {
+		xe_critical_error_handler(xe);
+		drm_info(&xe->drm, "Critical error handler triggered via debugfs\n");
+	}
+
+	return size;
+}
+
+static const struct file_operations trigger_critical_error_fops = {
+	.owner = THIS_MODULE,
+	.read = trigger_critical_error_show,
+	.write = trigger_critical_error_set,
+};
+
 void xe_debugfs_register(struct xe_device *xe)
 {
 	struct ttm_device *bdev = &xe->ttm;
@@ -550,6 +585,9 @@ void xe_debugfs_register(struct xe_device *xe)
 	debugfs_create_file("disable_late_binding", 0600, root, xe,
 			    &disable_late_binding_fops);
 
+	debugfs_create_file("trigger_critical_error", 0600, root, xe,
+			    &trigger_critical_error_fops);
+
 	/*
 	 * Don't expose page reclaim configuration file if not supported by the
 	 * hardware initially.
--
2.34.1