* [PATCH i-g-t 2/3] lib/igt_kmod: drop devcoredump before a PCI module unload
2024-04-03 15:14 [PATCH i-g-t 1/3] tests/intel/xe_pm: Fix runtime_pm tests Rodrigo Vivi
@ 2024-04-03 15:14 ` Rodrigo Vivi
2024-04-05 17:36 ` Lucas De Marchi
2024-04-03 15:14 ` [PATCH i-g-t 3/3] tests/intel/xe_wedged: Introduce a new test for Xe device wedged state Rodrigo Vivi
2024-04-03 17:21 ` ✗ CI.Patch_applied: failure for series starting with [i-g-t,1/3] tests/intel/xe_pm: Fix runtime_pm tests Patchwork
2 siblings, 1 reply; 6+ messages in thread
From: Rodrigo Vivi @ 2024-04-03 15:14 UTC (permalink / raw)
To: igt-dev
Cc: intel-xe, Rodrigo Vivi, Lucas De Marchi, Maarten Lankhorst,
José Roberto de Souza

devcoredump holds a module reference, blocking the module removal.

It is intentional from the devcoredump perspective to keep the
log available even after the unbind/unprobe. However it blocks
our module removal here.

v2: Accepting many suggestions from Lucas.

Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
lib/igt_kmod.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 55 insertions(+)
diff --git a/lib/igt_kmod.c b/lib/igt_kmod.c
index cc242838f..14d51f4f6 100644
--- a/lib/igt_kmod.c
+++ b/lib/igt_kmod.c
@@ -323,6 +323,59 @@ static int igt_kmod_unload_r(struct kmod_module *kmod, unsigned int flags)
 	return err;
 }

+static void igt_drop_devcoredump(const char *driver)
+{
+	char sysfspath[PATH_MAX];
+	DIR *dir;
+	char *devcoredump;
+	FILE *data;
+	struct dirent *entry;
+	int len, ret;
+
+	len = snprintf(sysfspath, sizeof(sysfspath),
+		       "/sys/bus/pci/drivers/%s", driver);
+
+	igt_assert(len < sizeof(sysfspath));
+
+	/* Not a PCI module */
+	if (access(sysfspath, F_OK))
+		return;
+
+	devcoredump = sysfspath + len;
+
+	dir = opendir(sysfspath);
+	igt_assert(dir);
+
+	while ((entry = readdir(dir)) != NULL) {
+		if (entry->d_type != DT_LNK ||
+		    strcmp(entry->d_name, ".") == 0 ||
+		    strcmp(entry->d_name, "..") == 0)
+			continue;
+
+		ret = snprintf(devcoredump, sizeof(sysfspath) - len,
+			       "/%s/devcoredump", entry->d_name);
+
+		igt_assert(ret < sizeof(sysfspath) - len);
+
+		if (access(sysfspath, F_OK) != -1) {
+			igt_info("Removing devcoredump before module unload: %s\n",
+				 sysfspath);
+
+			strcat(sysfspath, "/data");
+			data = fopen(sysfspath, "w");
+			igt_assert(data);
+
+			/*
+			 * Write anything to devcoredump/data to
+			 * force its deletion
+			 */
+			fprintf(data, "1\n");
+			fclose(data);
+		}
+	}
+	closedir(dir);
+}
+
 /**
  * igt_kmod_unload:
  * @mod_name: Module name.
@@ -341,6 +394,8 @@ igt_kmod_unload(const char *mod_name, unsigned int flags)
 	struct kmod_module *kmod;
 	int err;

+	igt_drop_devcoredump(mod_name);
+
 	err = kmod_module_new_from_name(ctx, mod_name, &kmod);
 	if (err < 0) {
 		igt_debug("Could not use module %s (%s)\n", mod_name,
--
2.44.0
* Re: [PATCH i-g-t 2/3] lib/igt_kmod: drop devcoredump before a PCI module unload
2024-04-03 15:14 ` [PATCH i-g-t 2/3] lib/igt_kmod: drop devcoredump before a PCI module unload Rodrigo Vivi
@ 2024-04-05 17:36 ` Lucas De Marchi
2024-04-05 17:42 ` Souza, Jose
0 siblings, 1 reply; 6+ messages in thread
From: Lucas De Marchi @ 2024-04-05 17:36 UTC (permalink / raw)
To: Rodrigo Vivi
Cc: igt-dev, intel-xe, Maarten Lankhorst, José Roberto de Souza
On Wed, Apr 03, 2024 at 11:14:08AM -0400, Rodrigo Vivi wrote:
>devcoredump holds a module reference, blocking the module removal.
>
>It is intentional from the devcoredump perspective to keep the
>log available even after the unbind/unprobe. However it blocks
>our module removal here.
>
>v2: Accepting many suggestions from Lucas.
>
>Cc: Lucas De Marchi <lucas.demarchi@intel.com>
>Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
>Cc: José Roberto de Souza <jose.souza@intel.com>
>Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
>---
> lib/igt_kmod.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 55 insertions(+)
>
>diff --git a/lib/igt_kmod.c b/lib/igt_kmod.c
>index cc242838f..14d51f4f6 100644
>--- a/lib/igt_kmod.c
>+++ b/lib/igt_kmod.c
>@@ -323,6 +323,59 @@ static int igt_kmod_unload_r(struct kmod_module *kmod, unsigned int flags)
> return err;
> }
>
>+static void igt_drop_devcoredump(const char *driver)
>+{
>+ char sysfspath[PATH_MAX];
>+ DIR *dir;
>+ char *devcoredump;
>+ FILE *data;
>+ struct dirent *entry;
>+ int len, ret;
>+
>+ len = snprintf(sysfspath, sizeof(sysfspath),
>+ "/sys/bus/pci/drivers/%s", driver);
>+
>+ igt_assert(len < sizeof(sysfspath));
>+
>+ /* Not a PCI module */
>+ if (access(sysfspath, F_OK))
>+ return;
>+
>+ devcoredump = sysfspath + len;
>+
>+ dir = opendir(sysfspath);
>+ igt_assert(dir);
>+
>+ while ((entry = readdir(dir)) != NULL) {
>+ if (entry->d_type != DT_LNK ||
>+ strcmp(entry->d_name, ".") == 0 ||
>+ strcmp(entry->d_name, "..") == 0)
>+ continue;
>+
>+ ret = snprintf(devcoredump, sizeof(sysfspath) - len,
>+ "/%s/devcoredump", entry->d_name);

I think this could be simplified a little bit further:

		ret = snprintf(devcoredump, sizeof(sysfspath) - len,
			       "/%s/devcoredump/data", entry->d_name);
		igt_assert(ret < sizeof(sysfspath) - len);

		data = fopen(sysfspath, "w");
		if (data) {
			igt_info("Removing devcoredump before module unload: %s\n",
				 sysfspath);

			/*
			 * Write anything to devcoredump/data to
			 * force its deletion
			 */
			fprintf(data, "1\n");
			fclose(data);
		}

That drops the TOCTOU between access() and fopen() and makes it shorter
(but totally optional, and untested).

Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Lucas De Marchi
* Re: [PATCH i-g-t 2/3] lib/igt_kmod: drop devcoredump before a PCI module unload
2024-04-05 17:36 ` Lucas De Marchi
@ 2024-04-05 17:42 ` Souza, Jose
0 siblings, 0 replies; 6+ messages in thread
From: Souza, Jose @ 2024-04-05 17:42 UTC (permalink / raw)
To: Vivi, Rodrigo, De Marchi, Lucas
Cc: intel-xe@lists.freedesktop.org, igt-dev@lists.freedesktop.org,
maarten.lankhorst@linux.intel.com
On Fri, 2024-04-05 at 12:36 -0500, Lucas De Marchi wrote:
> On Wed, Apr 03, 2024 at 11:14:08AM -0400, Rodrigo Vivi wrote:
> > devcoredump holds a module reference, blocking the module removal.
> >
> > It is intentional from the devcoredump perspective to keep the
> > log available even after the unbind/unprobe. However it blocks
> > our module removal here.
'devcoredump: Add dev_coredump_put()' was reviewed by the devcoredump maintainers, so we can remove the devcoredump before unloading Xe.
So I don't think we will need this patch.
It is still waiting to be pushed, but we could add it to the topic branches to unblock CI if needed.
> >
> > v2: Accepting many suggestions from Lucas.
> >
> > Cc: Lucas De Marchi <lucas.demarchi@intel.com>
> > Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> > Cc: José Roberto de Souza <jose.souza@intel.com>
> > Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
> > ---
> > lib/igt_kmod.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++++++
> > 1 file changed, 55 insertions(+)
> >
> > diff --git a/lib/igt_kmod.c b/lib/igt_kmod.c
> > index cc242838f..14d51f4f6 100644
> > --- a/lib/igt_kmod.c
> > +++ b/lib/igt_kmod.c
> > @@ -323,6 +323,59 @@ static int igt_kmod_unload_r(struct kmod_module *kmod, unsigned int flags)
> > return err;
> > }
> >
> > +static void igt_drop_devcoredump(const char *driver)
> > +{
> > + char sysfspath[PATH_MAX];
> > + DIR *dir;
> > + char *devcoredump;
> > + FILE *data;
> > + struct dirent *entry;
> > + int len, ret;
> > +
> > + len = snprintf(sysfspath, sizeof(sysfspath),
> > + "/sys/bus/pci/drivers/%s", driver);
> > +
> > + igt_assert(len < sizeof(sysfspath));
> > +
> > + /* Not a PCI module */
> > + if (access(sysfspath, F_OK))
> > + return;
> > +
> > + devcoredump = sysfspath + len;
> > +
> > + dir = opendir(sysfspath);
> > + igt_assert(dir);
> > +
> > + while ((entry = readdir(dir)) != NULL) {
> > + if (entry->d_type != DT_LNK ||
> > + strcmp(entry->d_name, ".") == 0 ||
> > + strcmp(entry->d_name, "..") == 0)
> > + continue;
> > +
> > + ret = snprintf(devcoredump, sizeof(sysfspath) - len,
> > + "/%s/devcoredump", entry->d_name);
>
> I think this could be simplified a little bit further
>
> ret = snprintf(devcoredump, sizeof(sysfspath) - len,
> "/%s/devcoredump/data", entry->d_name);
> igt_assert(ret < sizeof(sysfspath) - len);
>
> data = fopen(sysfspath, "w");
> if (data) {
> igt_info("Removing devcoredump before module unload: %s\n",
> sysfspath);
>
> /*
> * Write anything to devcoredump/data to
> * force its deletion
> */
> fprintf(data, "1\n");
> fclose(data);
> }
>
> That drops the TOCTOU between access() and fopen() and makes it shorter
> (but totally optional, and untested).
>
>
> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
>
> Lucas De Marchi
* [PATCH i-g-t 3/3] tests/intel/xe_wedged: Introduce a new test for Xe device wedged state
2024-04-03 15:14 [PATCH i-g-t 1/3] tests/intel/xe_pm: Fix runtime_pm tests Rodrigo Vivi
2024-04-03 15:14 ` [PATCH i-g-t 2/3] lib/igt_kmod: drop devcoredump before a PCI module unload Rodrigo Vivi
@ 2024-04-03 15:14 ` Rodrigo Vivi
2024-04-03 17:21 ` ✗ CI.Patch_applied: failure for series starting with [i-g-t,1/3] tests/intel/xe_pm: Fix runtime_pm tests Patchwork
2 siblings, 0 replies; 6+ messages in thread
From: Rodrigo Vivi @ 2024-04-03 15:14 UTC (permalink / raw)
To: igt-dev; +Cc: intel-xe, Rodrigo Vivi, Himal Prasad Ghimiray

Inject a gt_reset failure that puts the Xe device in the new wedged
state, confirm that the IOCTL is blocked, and then reload the driver
to get back to a clean state for the tests that follow. The reload is
needed because the wedged state in Xe is final and can only be cleared
with a module reload.

This new test case is entirely based on xe_uevent provided by
Himal.

Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
tests/intel/xe_wedged.c | 91 +++++++++++++++++++++++++++++++++++++++++
tests/meson.build | 1 +
2 files changed, 92 insertions(+)
create mode 100644 tests/intel/xe_wedged.c
diff --git a/tests/intel/xe_wedged.c b/tests/intel/xe_wedged.c
new file mode 100644
index 000000000..f767e2511
--- /dev/null
+++ b/tests/intel/xe_wedged.c
@@ -0,0 +1,91 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2024 Intel Corporation
+ */
+
+/**
+ * TEST: cause fake gt reset failure which put Xe device in wedged state
+ * Category: Software building block
+ * Sub-category: driver
+ * Functionality: wedged
+ * Test category: functionality test
+ */
+
+#include "igt.h"
+#include "igt_kmod.h"
+
+#include "xe/xe_ioctl.h"
+
+static void force_wedged(int fd)
+{
+	igt_debugfs_write(fd, "fail_gt_reset/probability", "100");
+	igt_debugfs_write(fd, "fail_gt_reset/times", "2");
+
+	xe_force_gt_reset(fd, 0);
+	sleep(1);
+}
+
+static int reload_xe(int fd)
+{
+	int error;
+
+	drm_close_driver(fd);
+	igt_xe_driver_unload();
+
+	error = igt_xe_driver_load(NULL);
+
+	igt_assert_eq(error, 0);
+
+	/* driver is ready, check if it's bound */
+	fd = __drm_open_driver(DRIVER_XE);
+	igt_fail_on_f(fd < 0, "Cannot open the xe DRM driver while reloading xe after wedged\n");
+	return fd;
+}
+
+static int simple_ioctl(int fd)
+{
+	int ret;
+
+	struct drm_xe_vm_create create = {
+		.extensions = 0,
+		.flags = 0,
+	};
+
+	ret = igt_ioctl(fd, DRM_IOCTL_XE_VM_CREATE, &create);
+
+	if (ret == 0)
+		xe_vm_destroy(fd, create.vm_id);
+
+	return ret;
+}
+
+/**
+ * SUBTEST: basic-wedged
+ * Description: Force Xe device wedged after injecting a failure in GT reset
+ */
+igt_main
+{
+	int fd;
+
+	igt_fixture {
+		fd = drm_open_driver(DRIVER_XE);
+		igt_require(igt_debugfs_exists(fd, "fail_gt_reset/probability",
+					       O_RDWR));
+	}
+
+	igt_subtest("basic-wedged") {
+		igt_assert_eq(simple_ioctl(fd), 0);
+		force_wedged(fd);
+		igt_assert_neq(simple_ioctl(fd), 0);
+		fd = reload_xe(fd);
+		igt_assert_eq(simple_ioctl(fd), 0);
+	}
+
+	igt_fixture {
+		if (igt_debugfs_exists(fd, "fail_gt_reset/probability", O_RDWR)) {
+			igt_debugfs_write(fd, "fail_gt_reset/probability", "0");
+			igt_debugfs_write(fd, "fail_gt_reset/times", "1");
+		}
+		drm_close_driver(fd);
+	}
+}
diff --git a/tests/meson.build b/tests/meson.build
index 02cbc3780..12dd2c16e 100644
--- a/tests/meson.build
+++ b/tests/meson.build
@@ -311,6 +311,7 @@ intel_xe_progs = [
 	'xe_query',
 	'xe_vm',
 	'xe_waitfence',
+	'xe_wedged',
 	'xe_spin_batch',
 	'xe_sysfs_defaults',
 	'xe_sysfs_scheduler',
--
2.44.0
* ✗ CI.Patch_applied: failure for series starting with [i-g-t,1/3] tests/intel/xe_pm: Fix runtime_pm tests
2024-04-03 15:14 [PATCH i-g-t 1/3] tests/intel/xe_pm: Fix runtime_pm tests Rodrigo Vivi
2024-04-03 15:14 ` [PATCH i-g-t 2/3] lib/igt_kmod: drop devcoredump before a PCI module unload Rodrigo Vivi
2024-04-03 15:14 ` [PATCH i-g-t 3/3] tests/intel/xe_wedged: Introduce a new test for Xe device wedged state Rodrigo Vivi
@ 2024-04-03 17:21 ` Patchwork
2 siblings, 0 replies; 6+ messages in thread
From: Patchwork @ 2024-04-03 17:21 UTC (permalink / raw)
To: Rodrigo Vivi; +Cc: intel-xe
== Series Details ==
Series: series starting with [i-g-t,1/3] tests/intel/xe_pm: Fix runtime_pm tests
URL : https://patchwork.freedesktop.org/series/131999/
State : failure
== Summary ==
=== Applying kernel patches on branch 'drm-tip' with base: ===
Base commit: c8dc2a19ae71 drm-tip: 2024y-04m-03d-14h-13m-00s UTC integration manifest
=== git am output follows ===
error: tests/intel/xe_pm.c: does not exist in index
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Applying: tests/intel/xe_pm: Fix runtime_pm tests
Patch failed at 0001 tests/intel/xe_pm: Fix runtime_pm tests
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".