* [PATCH i-g-t] tests/intel/xe_wedged: Introduce a new test for Xe device wedged state
From: Rodrigo Vivi @ 2024-03-13 19:55 UTC
To: igt-dev; +Cc: intel-xe, Rodrigo Vivi, Himal Prasad Ghimiray

Let's inject a gt_reset failure that will put Xe device in the
new wedged state, then we confirm the IOCTL is blocked and we
reload the driver to get back to a clean state for other test
execution, since wedged state in Xe is a final state that can only
be cleared with a module reload.

This new test case is entirely based on xe_uevent provided by
Himal.

Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 tests/intel/xe_wedged.c | 91 +++++++++++++++++++++++++++++++++++++++++
 tests/meson.build       |  1 +
 2 files changed, 92 insertions(+)
 create mode 100644 tests/intel/xe_wedged.c

diff --git a/tests/intel/xe_wedged.c b/tests/intel/xe_wedged.c
new file mode 100644
index 000000000..f767e2511
--- /dev/null
+++ b/tests/intel/xe_wedged.c
@@ -0,0 +1,91 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2024 Intel Corporation
+ */
+
+/**
+ * TEST: cause fake gt reset failure which put Xe device in wedged state
+ * Category: Software building block
+ * Sub-category: driver
+ * Functionality: wedged
+ * Test category: functionality test
+ */
+
+#include "igt.h"
+#include "igt_kmod.h"
+
+#include "xe/xe_ioctl.h"
+
+static void force_wedged(int fd)
+{
+	igt_debugfs_write(fd, "fail_gt_reset/probability", "100");
+	igt_debugfs_write(fd, "fail_gt_reset/times", "2");
+
+	xe_force_gt_reset(fd, 0);
+	sleep(1);
+}
+
+static int reload_xe(int fd)
+{
+	int error;
+
+	drm_close_driver(fd);
+	igt_xe_driver_unload();
+
+	error = igt_xe_driver_load(NULL);
+
+	igt_assert_eq(error, 0);
+
+	/* driver is ready, check if it's bound */
+	fd = __drm_open_driver(DRIVER_XE);
+	igt_fail_on_f(fd < 0, "Cannot open the xe DRM driver while reloading xe after wedged\n");
+	return fd;
+}
+
+static int simple_ioctl(int fd)
+{
+	int ret;
+
+	struct drm_xe_vm_create create = {
+		.extensions = 0,
+		.flags = 0,
+	};
+
+	ret = igt_ioctl(fd, DRM_IOCTL_XE_VM_CREATE, &create);
+
+	if (ret == 0)
+		xe_vm_destroy(fd, create.vm_id);
+
+	return ret;
+}
+
+/**
+ * SUBTEST: basic-wedged
+ * Description: Force Xe device wedged after injecting a failure in GT reset
+ */
+igt_main
+{
+	int fd;
+
+	igt_fixture {
+		fd = drm_open_driver(DRIVER_XE);
+		igt_require(igt_debugfs_exists(fd, "fail_gt_reset/probability",
+					       O_RDWR));
+	}
+
+	igt_subtest("basic-wedged") {
+		igt_assert_eq(simple_ioctl(fd), 0);
+		force_wedged(fd);
+		igt_assert_neq(simple_ioctl(fd), 0);
+		fd = reload_xe(fd);
+		igt_assert_eq(simple_ioctl(fd), 0);
+	}
+
+	igt_fixture {
+		if (igt_debugfs_exists(fd, "fail_gt_reset/probability", O_RDWR)) {
+			igt_debugfs_write(fd, "fail_gt_reset/probability", "0");
+			igt_debugfs_write(fd, "fail_gt_reset/times", "1");
+		}
+		drm_close_driver(fd);
+	}
+}
diff --git a/tests/meson.build b/tests/meson.build
index a856510fc..e590d4348 100644
--- a/tests/meson.build
+++ b/tests/meson.build
@@ -312,6 +312,7 @@ intel_xe_progs = [
 	'xe_render_copy',
 	'xe_vm',
 	'xe_waitfence',
+	'xe_wedged',
 	'xe_spin_batch',
 	'xe_sysfs_defaults',
 	'xe_sysfs_scheduler',
-- 
2.44.0
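A side note on simple_ioctl() in the patch above: it returns whatever igt_ioctl() returns. If igt_ioctl() follows the usual ioctl(2) convention (an assumption here, not checked against the IGT sources), a failure shows up as -1 with the reason left in errno, so a test that wants to assert on a specific error code would have to convert it first. A minimal sketch with plain ioctl(2); ioctl_err() and ioctl_err_example() are hypothetical names, not IGT helpers:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/ioctl.h>

/*
 * Hypothetical wrapper (not an IGT helper): issue an ioctl and return its
 * non-negative result on success, or the negated errno on failure, so the
 * caller can compare against a specific error code.
 */
static int ioctl_err(int fd, unsigned long request, void *arg)
{
	int ret = ioctl(fd, request, arg);

	return ret < 0 ? -errno : ret;
}

/* Usage: an invalid descriptor is reported as -EBADF rather than a bare -1. */
static void ioctl_err_example(void)
{
	assert(ioctl_err(-1, 0, NULL) == -EBADF);
}
```

With a wrapper of this shape, a check such as "the ioctl failed with one particular errno" becomes a plain integer comparison instead of a non-zero test.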
* Re: [PATCH i-g-t] tests/intel/xe_wedged: Introduce a new test for Xe device wedged state
From: Lucas De Marchi @ 2024-03-13 20:07 UTC
To: Rodrigo Vivi; +Cc: igt-dev, intel-xe, Himal Prasad Ghimiray

On Wed, Mar 13, 2024 at 03:55:28PM -0400, Rodrigo Vivi wrote:
>Let's inject a gt_reset failure that will put Xe device in the
>new wedged state, then we confirm the IOCTL is blocked and we
>reload the driver to get back to a clean state for other test
>execution, since wedged state in Xe is a final state that can only
>be cleared with a module reload.
>
>This new test case is entirely based on xe_uevent provided by
>Himal.

/me confused... I don't see any uevent handling here.

>
>Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
>Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
>---
> tests/intel/xe_wedged.c | 91 +++++++++++++++++++++++++++++++++++++++++
> tests/meson.build       |  1 +
> 2 files changed, 92 insertions(+)
> create mode 100644 tests/intel/xe_wedged.c
>
>diff --git a/tests/intel/xe_wedged.c b/tests/intel/xe_wedged.c
>new file mode 100644
>index 000000000..f767e2511
>--- /dev/null
>+++ b/tests/intel/xe_wedged.c
>@@ -0,0 +1,91 @@
>+// SPDX-License-Identifier: MIT
>+/*
>+ * Copyright © 2024 Intel Corporation
>+ */
>+
>+/**
>+ * TEST: cause fake gt reset failure which put Xe device in wedged state
>+ * Category: Software building block
>+ * Sub-category: driver
>+ * Functionality: wedged
>+ * Test category: functionality test
>+ */
>+
>+#include "igt.h"
>+#include "igt_kmod.h"
>+
>+#include "xe/xe_ioctl.h"
>+
>+static void force_wedged(int fd)
>+{
>+	igt_debugfs_write(fd, "fail_gt_reset/probability", "100");
>+	igt_debugfs_write(fd, "fail_gt_reset/times", "2");
>+
>+	xe_force_gt_reset(fd, 0);

humn... do we have to check the writes above did anything? I also don't
see the kernel side, but if it just resets normally, the test would
still pass afaics.

>+	sleep(1);
>+}
>+
>+static int reload_xe(int fd)
>+{
>+	int error;
>+
>+	drm_close_driver(fd);
>+	igt_xe_driver_unload();

what if we are running on e.g. MTL with a DG2 and want to debug one of
them? Rather than re-loading the module and possibly causing unrelated
issues (if e.g. module removal from the other card crashes), why not
just unbind the module from the card under test?

i.e. the equivalent in C of:

	rebind() {
		pci_slot=$1
		echo -n "0000:$pci_slot" > /sys/bus/pci/drivers/$driver/unbind
		echo -n "0000:$pci_slot" > /sys/bus/pci/drivers/$driver/bind
	}

Lucas De Marchi

>+
>+	error = igt_xe_driver_load(NULL);
>+
>+	igt_assert_eq(error, 0);
>+
>+	/* driver is ready, check if it's bound */
>+	fd = __drm_open_driver(DRIVER_XE);
>+	igt_fail_on_f(fd < 0, "Cannot open the xe DRM driver while reloading xe after wedged\n");
>+	return fd;
>+}
>+
>+static int simple_ioctl(int fd)
>+{
>+	int ret;
>+
>+	struct drm_xe_vm_create create = {
>+		.extensions = 0,
>+		.flags = 0,
>+	};
>+
>+	ret = igt_ioctl(fd, DRM_IOCTL_XE_VM_CREATE, &create);
>+
>+	if (ret == 0)
>+		xe_vm_destroy(fd, create.vm_id);
>+
>+	return ret;
>+}
>+
>+/**
>+ * SUBTEST: basic-wedged
>+ * Description: Force Xe device wedged after injecting a failure in GT reset
>+ */
>+igt_main
>+{
>+	int fd;
>+
>+	igt_fixture {
>+		fd = drm_open_driver(DRIVER_XE);
>+		igt_require(igt_debugfs_exists(fd, "fail_gt_reset/probability",
>+					       O_RDWR));
>+	}
>+
>+	igt_subtest("basic-wedged") {
>+		igt_assert_eq(simple_ioctl(fd), 0);
>+		force_wedged(fd);
>+		igt_assert_neq(simple_ioctl(fd), 0);
>+		fd = reload_xe(fd);
>+		igt_assert_eq(simple_ioctl(fd), 0);
>+	}
>+
>+	igt_fixture {
>+		if (igt_debugfs_exists(fd, "fail_gt_reset/probability", O_RDWR)) {
>+			igt_debugfs_write(fd, "fail_gt_reset/probability", "0");
>+			igt_debugfs_write(fd, "fail_gt_reset/times", "1");
>+		}
>+		drm_close_driver(fd);
>+	}
>+}
>diff --git a/tests/meson.build b/tests/meson.build
>index a856510fc..e590d4348 100644
>--- a/tests/meson.build
>+++ b/tests/meson.build
>@@ -312,6 +312,7 @@ intel_xe_progs = [
> 	'xe_render_copy',
> 	'xe_vm',
> 	'xe_waitfence',
>+	'xe_wedged',
> 	'xe_spin_batch',
> 	'xe_sysfs_defaults',
> 	'xe_sysfs_scheduler',
>-- 
>2.44.0
>
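The rebind() shell function suggested in the reply above could look roughly like this in C. This is only a sketch using plain POSIX calls: write_attr() and pci_rebind() are hypothetical names, not existing IGT helpers, and the driver directory and PCI address in the comments are illustrative. Any file descriptor still open on the device would presumably have to be closed by the caller before unbinding.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write a string to an existing file; returns 0 or a negated errno. */
static int write_attr(const char *path, const char *value)
{
	size_t len;
	ssize_t n;
	int fd, err = 0;

	assert(path && value);
	len = strlen(value);

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	n = write(fd, value, len);
	if (n < 0)
		err = -errno;
	else if ((size_t)n != len)
		err = -EIO; /* short write */

	close(fd);
	return err;
}

/*
 * Hypothetical helper: unbind one PCI device from its driver, then bind it
 * again. driver_dir would be something like "/sys/bus/pci/drivers/xe" and
 * pci_addr a full address such as "0000:03:00.0" (both illustrative).
 */
static int pci_rebind(const char *driver_dir, const char *pci_addr)
{
	char path[512];
	int err;

	snprintf(path, sizeof(path), "%s/unbind", driver_dir);
	err = write_attr(path, pci_addr);
	if (err)
		return err;

	snprintf(path, sizeof(path), "%s/bind", driver_dir);
	return write_attr(path, pci_addr);
}
```

Taking the driver directory as a parameter keeps the helper independent of the driver name and lets it be exercised against an ordinary directory instead of sysfs.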
* Re: [PATCH i-g-t] tests/intel/xe_wedged: Introduce a new test for Xe device wedged state
From: Rodrigo Vivi @ 2024-03-14 17:57 UTC
To: Lucas De Marchi; +Cc: igt-dev, intel-xe, Himal Prasad Ghimiray

On Wed, Mar 13, 2024 at 03:07:44PM -0500, Lucas De Marchi wrote:
> On Wed, Mar 13, 2024 at 03:55:28PM -0400, Rodrigo Vivi wrote:
> > Let's inject a gt_reset failure that will put Xe device in the
> > new wedged state, then we confirm the IOCTL is blocked and we
> > reload the driver to get back to a clean state for other test
> > execution, since wedged state in Xe is a final state that can only
> > be cleared with a module reload.
> >
> > This new test case is entirely based on xe_uevent provided by
> > Himal.
>
> /me confused... I don't see any uevent handling here.

the uevent part is gone, but the failure injection came from there.

>
> >
> > Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
> > Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
> > ---
> >  tests/intel/xe_wedged.c | 91 +++++++++++++++++++++++++++++++++++++++++
> >  tests/meson.build       |  1 +
> >  2 files changed, 92 insertions(+)
> >  create mode 100644 tests/intel/xe_wedged.c
> >
> > diff --git a/tests/intel/xe_wedged.c b/tests/intel/xe_wedged.c
> > new file mode 100644
> > index 000000000..f767e2511
> > --- /dev/null
> > +++ b/tests/intel/xe_wedged.c
> > @@ -0,0 +1,91 @@
> > +// SPDX-License-Identifier: MIT
> > +/*
> > + * Copyright © 2024 Intel Corporation
> > + */
> > +
> > +/**
> > + * TEST: cause fake gt reset failure which put Xe device in wedged state
> > + * Category: Software building block
> > + * Sub-category: driver
> > + * Functionality: wedged
> > + * Test category: functionality test
> > + */
> > +
> > +#include "igt.h"
> > +#include "igt_kmod.h"
> > +
> > +#include "xe/xe_ioctl.h"
> > +
> > +static void force_wedged(int fd)
> > +{
> > +	igt_debugfs_write(fd, "fail_gt_reset/probability", "100");
> > +	igt_debugfs_write(fd, "fail_gt_reset/times", "2");
> > +
> > +	xe_force_gt_reset(fd, 0);
>
> humn... do we have to check the writes above did anything?

unfortunately the debugfs_write is a void return...
we could read it back, but I don't believe it brings anything...

> I also don't
> see the kernel side, but if it just resets normally, the test would
> still pass afaics.

nope, if the reset works normally without injecting the failure
and declaring the gt busted, then we would fail below

	igt_assert_eq(simple_ioctl(fd), 0);
	force_busted(fd);
	igt_assert_neq(simple_ioctl(fd), 0);
	fd = rebind_xe(fd);
	igt_assert_eq(simple_ioctl(fd), 0);

notice that the middle one is != 0, but I'm considering to change that
to igt_assert_eq(simple_ioctl(fd), -ECANCELED); for clarity.

>
> > +	sleep(1);
> > +}
> > +
> > +static int reload_xe(int fd)
> > +{
> > +	int error;
> > +
> > +	drm_close_driver(fd);
> > +	igt_xe_driver_unload();
>
>
> what if we are running on e.g. MTL with a DG2 and want to debug one of
> them? Rather than re-loading the module and possibly causing unrelated
> issues (if e.g. module removal from the other card crashes), why not
> just unbind the module from the card under test?
>
> i.e. the equivalent in C of:
>
> 	rebind() {
> 		pci_slot=$1
> 		echo -n "0000:$pci_slot" > /sys/bus/pci/drivers/$driver/unbind
> 		echo -n "0000:$pci_slot" > /sys/bus/pci/drivers/$driver/bind
> 	}

Thanks, that indeed is a better choice.
the only caveat is that I need to close the main fd client for a proper
exit before we can rebind cleanly. But I could finally get that working.

>
> Lucas De Marchi
>
> > +
> > +	error = igt_xe_driver_load(NULL);
> > +
> > +	igt_assert_eq(error, 0);
> > +
> > +	/* driver is ready, check if it's bound */
> > +	fd = __drm_open_driver(DRIVER_XE);
> > +	igt_fail_on_f(fd < 0, "Cannot open the xe DRM driver while reloading xe after wedged\n");
> > +	return fd;
> > +}
> > +
> > +static int simple_ioctl(int fd)
> > +{
> > +	int ret;
> > +
> > +	struct drm_xe_vm_create create = {
> > +		.extensions = 0,
> > +		.flags = 0,
> > +	};
> > +
> > +	ret = igt_ioctl(fd, DRM_IOCTL_XE_VM_CREATE, &create);
> > +
> > +	if (ret == 0)
> > +		xe_vm_destroy(fd, create.vm_id);
> > +
> > +	return ret;
> > +}
> > +
> > +/**
> > + * SUBTEST: basic-wedged
> > + * Description: Force Xe device wedged after injecting a failure in GT reset
> > + */
> > +igt_main
> > +{
> > +	int fd;
> > +
> > +	igt_fixture {
> > +		fd = drm_open_driver(DRIVER_XE);
> > +		igt_require(igt_debugfs_exists(fd, "fail_gt_reset/probability",
> > +					       O_RDWR));
> > +	}
> > +
> > +	igt_subtest("basic-wedged") {
> > +		igt_assert_eq(simple_ioctl(fd), 0);
> > +		force_wedged(fd);
> > +		igt_assert_neq(simple_ioctl(fd), 0);
> > +		fd = reload_xe(fd);
> > +		igt_assert_eq(simple_ioctl(fd), 0);
> > +	}
> > +
> > +	igt_fixture {
> > +		if (igt_debugfs_exists(fd, "fail_gt_reset/probability", O_RDWR)) {
> > +			igt_debugfs_write(fd, "fail_gt_reset/probability", "0");
> > +			igt_debugfs_write(fd, "fail_gt_reset/times", "1");
> > +		}
> > +		drm_close_driver(fd);
> > +	}
> > +}
> > diff --git a/tests/meson.build b/tests/meson.build
> > index a856510fc..e590d4348 100644
> > --- a/tests/meson.build
> > +++ b/tests/meson.build
> > @@ -312,6 +312,7 @@ intel_xe_progs = [
> >  	'xe_render_copy',
> >  	'xe_vm',
> >  	'xe_waitfence',
> > +	'xe_wedged',
> >  	'xe_spin_batch',
> >  	'xe_sysfs_defaults',
> >  	'xe_sysfs_scheduler',
> > -- 
> > 2.44.0
> >
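For reference, the read-back idea mentioned in this reply (write the fault-injection knob, then read it back to confirm the value took) could be sketched as below. It uses plain POSIX file I/O rather than the IGT debugfs helpers; write_and_verify() is a hypothetical name, and the assumption that the attribute reads back as the written value followed by an optional newline is exactly that, an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: write value to the file at path, read the file back
 * and compare, ignoring anything from the first newline on.
 * Returns 0 on match, -EINVAL on mismatch, or a negated errno on I/O error.
 */
static int write_and_verify(const char *path, const char *value)
{
	char buf[64] = { 0 };
	size_t len;
	ssize_t n;
	int fd, err;

	assert(path && value);
	len = strlen(value);

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;
	n = write(fd, value, len);
	err = n < 0 ? -errno : 0; /* save errno before close() can clobber it */
	close(fd);
	if (err)
		return err;
	if ((size_t)n != len)
		return -EIO; /* short write */

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -errno;
	n = read(fd, buf, sizeof(buf) - 1);
	err = n < 0 ? -errno : 0;
	close(fd);
	if (err)
		return err;

	/* Cut the buffer at the first newline, if any, before comparing. */
	buf[strcspn(buf, "\n")] = '\0';

	return strcmp(buf, value) == 0 ? 0 : -EINVAL;
}
```

Whether such a check adds value is the open question in the thread; the sketch only shows that it costs a few lines.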
* ✗ CI.Patch_applied: failure for tests/intel/xe_wedged: Introduce a new test for Xe device wedged state
From: Patchwork @ 2024-03-13 20:45 UTC
To: Rodrigo Vivi; +Cc: intel-xe

== Series Details ==

Series: tests/intel/xe_wedged: Introduce a new test for Xe device wedged state
URL   : https://patchwork.freedesktop.org/series/131099/
State : failure

== Summary ==

=== Applying kernel patches on branch 'drm-tip' with base: ===
Base commit: 790a1d4e546a drm-tip: 2024y-03m-13d-20h-00m-39s UTC integration manifest
=== git am output follows ===
error: tests/meson.build: does not exist in index
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Applying: tests/intel/xe_wedged: Introduce a new test for Xe device wedged state
Patch failed at 0001 tests/intel/xe_wedged: Introduce a new test for Xe device wedged state
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".