Intel-XE Archive on lore.kernel.org
* [PATCH i-g-t 1/3] tests/intel/xe_pm: Fix runtime_pm tests
@ 2024-04-03 15:14 Rodrigo Vivi
  2024-04-03 15:14 ` [PATCH i-g-t 2/3] lib/igt_kmod: drop devcoredump before a PCI module unload Rodrigo Vivi
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Rodrigo Vivi @ 2024-04-03 15:14 UTC (permalink / raw)
  To: igt-dev; +Cc: intel-xe, Rodrigo Vivi, Anshuman Gupta

Since kernel commit
23cf006beac3 ("drm/xe: Runtime PM wake on every IOCTL"),
the many ioctls called during dpms_on_off might force
runtime PM to bounce back and forth between active and suspended.

If d3cold_allowed is then set while the runtime state is in
transition, runtime PM might decide to keep the device awake
for a very long time, even with runtime_usage == 0.

Our tests would then fail like this:

(xe_pm:29453) igt_pm-WARNING: timeout: pm_status expected:suspended, got:active
(xe_pm:29453) CRITICAL: Test assertion failure function __igt_unique____real_main473, file ../tests/intel/xe_pm.c:556:
(xe_pm:29453) CRITICAL: Failed assertion: in_d3(device, d->state)
Stack trace:
  #0 ../lib/igt_core.c:1989 __igt_fail_assert()
  #1 ../tests/intel/xe_pm.c:432 __igt_unique____real_main473()
  #2 ../tests/intel/xe_pm.c:473 main()
  #3 [__libc_start_call_main+0x7a]
  #4 [__libc_start_main+0x8b]
  #5 [_start+0x25]
Subtest d3hot-basic failed.
**** DEBUG ****
(xe_pm:29453) igt_pm-WARNING: timeout: pm_status expected:suspended, got:active

By simply waiting for the suspended state before we touch d3cold_allowed,
we get our tests back to a sane state.

Cc: Anshuman Gupta <anshuman.gupta@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Anshuman Gupta <anshuman.gupta@intel.com>
---
 tests/intel/xe_pm.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/tests/intel/xe_pm.c b/tests/intel/xe_pm.c
index a0045da0b..fcbed6249 100644
--- a/tests/intel/xe_pm.c
+++ b/tests/intel/xe_pm.c
@@ -121,6 +121,16 @@ static bool setup_d3(device_t device, enum igt_acpi_d_state state)
 {
 	dpms_on_off(device, DRM_MODE_DPMS_OFF);
 
+	/*
+	 * The drm calls used for dpms status above will result in IOCTLs
+	 * that might wake up the device. Let's ensure the device is back
+	 * to a stable suspended state before we can proceed with the
+	 * configuration below, since some strange failures were seen
+	 * when d3cold_allowed is toggled while runtime PM is in a
+	 * transition state.
+	 */
+	igt_wait_for_pm_status(IGT_RUNTIME_PM_STATUS_SUSPENDED);
+
 	switch (state) {
 	case IGT_ACPI_D3Cold:
 		igt_require(igt_pm_acpi_d3cold_supported(device.pci_root));
-- 
2.44.0
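
For readers who want to see the idea outside of IGT, here is a minimal
sketch of "wait until runtime PM reports suspended" written against plain
libc. It is not the implementation of igt_wait_for_pm_status(); the helper
name wait_for_runtime_status(), the 10 ms poll interval and the timeout
handling are all assumptions made for illustration. The path argument would
typically point at a device's power/runtime_status attribute in sysfs.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper: poll a runtime PM status file until it reads
 * `expected` or roughly `timeout_ms` milliseconds have passed.
 * The path is a parameter so the sketch can be tried on any file.
 */
static bool wait_for_runtime_status(const char *path, const char *expected,
				    int timeout_ms)
{
	const struct timespec poll_interval = {
		.tv_sec = 0,
		.tv_nsec = 10 * 1000 * 1000,	/* 10 ms, an arbitrary choice */
	};
	int waited_ms = 0;

	assert(path && expected);

	for (;;) {
		char buf[32] = "";
		FILE *f = fopen(path, "r");

		if (f) {
			/* Strip the trailing newline sysfs attributes carry. */
			if (fgets(buf, sizeof(buf), f))
				buf[strcspn(buf, "\n")] = '\0';
			fclose(f);
		}

		if (strcmp(buf, expected) == 0)
			return true;

		if (waited_ms >= timeout_ms)
			return false;

		nanosleep(&poll_interval, NULL);
		waited_ms += 10;
	}
}
```

A caller would only touch d3cold_allowed after
wait_for_runtime_status(path, "suspended", timeout) returns true, which
mirrors the ordering this patch introduces in setup_d3().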



* [PATCH i-g-t 2/3] lib/igt_kmod: drop devcoredump before a PCI module unload
  2024-04-03 15:14 [PATCH i-g-t 1/3] tests/intel/xe_pm: Fix runtime_pm tests Rodrigo Vivi
@ 2024-04-03 15:14 ` Rodrigo Vivi
  2024-04-05 17:36   ` Lucas De Marchi
  2024-04-03 15:14 ` [PATCH i-g-t 3/3] tests/intel/xe_wedged: Introduce a new test for Xe device wedged state Rodrigo Vivi
  2024-04-03 17:21 ` ✗ CI.Patch_applied: failure for series starting with [i-g-t,1/3] tests/intel/xe_pm: Fix runtime_pm tests Patchwork
  2 siblings, 1 reply; 6+ messages in thread
From: Rodrigo Vivi @ 2024-04-03 15:14 UTC (permalink / raw)
  To: igt-dev
  Cc: intel-xe, Rodrigo Vivi, Lucas De Marchi, Maarten Lankhorst,
	José Roberto de Souza

devcoredump holds a module reference, blocking the module removal.

It is intentional from the devcoredump perspective to keep the
log available even after the unbind/unprobe. However it blocks
our module removal here.

v2: Accepting many suggestions from Lucas.

Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 lib/igt_kmod.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 55 insertions(+)

diff --git a/lib/igt_kmod.c b/lib/igt_kmod.c
index cc242838f..14d51f4f6 100644
--- a/lib/igt_kmod.c
+++ b/lib/igt_kmod.c
@@ -323,6 +323,59 @@ static int igt_kmod_unload_r(struct kmod_module *kmod, unsigned int flags)
 	return err;
 }
 
+static void igt_drop_devcoredump(const char *driver)
+{
+	char sysfspath[PATH_MAX];
+	DIR *dir;
+	char *devcoredump;
+	FILE *data;
+	struct dirent *entry;
+	int len, ret;
+
+	len = snprintf(sysfspath, sizeof(sysfspath),
+		       "/sys/bus/pci/drivers/%s", driver);
+
+	igt_assert(len < sizeof(sysfspath));
+
+	/* Not a PCI module */
+	if (access(sysfspath, F_OK))
+		return;
+
+	devcoredump = sysfspath + len;
+
+	dir = opendir(sysfspath);
+	igt_assert(dir);
+
+	while ((entry = readdir(dir)) != NULL) {
+		if (entry->d_type != DT_LNK ||
+		    strcmp(entry->d_name, ".") == 0 ||
+		    strcmp(entry->d_name, "..") == 0)
+			continue;
+
+		ret = snprintf(devcoredump, sizeof(sysfspath) - len,
+			       "/%s/devcoredump", entry->d_name);
+
+		igt_assert(ret < sizeof(sysfspath) - len);
+
+		if (access(sysfspath, F_OK) != -1) {
+			igt_info("Removing devcoredump before module unload: %s\n",
+				 sysfspath);
+
+			strcat(sysfspath, "/data");
+			data = fopen(sysfspath, "w");
+			igt_assert(data);
+
+			/*
+			 * Write anything to devcoredump/data to
+			 * force its deletion
+			 */
+			fprintf(data, "1\n");
+			fclose(data);
+		}
+	}
+	closedir(dir);
+}
+
 /**
  * igt_kmod_unload:
  * @mod_name: Module name.
@@ -341,6 +394,8 @@ igt_kmod_unload(const char *mod_name, unsigned int flags)
 	struct kmod_module *kmod;
 	int err;
 
+	igt_drop_devcoredump(mod_name);
+
 	err = kmod_module_new_from_name(ctx, mod_name, &kmod);
 	if (err < 0) {
 		igt_debug("Could not use module %s (%s)\n", mod_name,
-- 
2.44.0



* [PATCH i-g-t 3/3] tests/intel/xe_wedged: Introduce a new test for Xe device wedged state
  2024-04-03 15:14 [PATCH i-g-t 1/3] tests/intel/xe_pm: Fix runtime_pm tests Rodrigo Vivi
  2024-04-03 15:14 ` [PATCH i-g-t 2/3] lib/igt_kmod: drop devcoredump before a PCI module unload Rodrigo Vivi
@ 2024-04-03 15:14 ` Rodrigo Vivi
  2024-04-03 17:21 ` ✗ CI.Patch_applied: failure for series starting with [i-g-t,1/3] tests/intel/xe_pm: Fix runtime_pm tests Patchwork
  2 siblings, 0 replies; 6+ messages in thread
From: Rodrigo Vivi @ 2024-04-03 15:14 UTC (permalink / raw)
  To: igt-dev; +Cc: intel-xe, Rodrigo Vivi, Himal Prasad Ghimiray

Inject a gt_reset failure that puts the Xe device in the new
wedged state, confirm that IOCTLs are blocked, and then reload
the driver to get back to a clean state for the tests that
follow. The wedged state in Xe is final and can only be
cleared with a module reload.

This new test case is entirely based on xe_uevent provided by
Himal.

Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 tests/intel/xe_wedged.c | 91 +++++++++++++++++++++++++++++++++++++++++
 tests/meson.build       |  1 +
 2 files changed, 92 insertions(+)
 create mode 100644 tests/intel/xe_wedged.c

diff --git a/tests/intel/xe_wedged.c b/tests/intel/xe_wedged.c
new file mode 100644
index 000000000..f767e2511
--- /dev/null
+++ b/tests/intel/xe_wedged.c
@@ -0,0 +1,91 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2024 Intel Corporation
+ */
+
+/**
+ * TEST: cause fake gt reset failure which put Xe device in wedged state
+ * Category: Software building block
+ * Sub-category: driver
+ * Functionality: wedged
+ * Test category: functionality test
+ */
+
+#include "igt.h"
+#include "igt_kmod.h"
+
+#include "xe/xe_ioctl.h"
+
+static void force_wedged(int fd)
+{
+	igt_debugfs_write(fd, "fail_gt_reset/probability", "100");
+	igt_debugfs_write(fd, "fail_gt_reset/times", "2");
+
+	xe_force_gt_reset(fd, 0);
+	sleep(1);
+}
+
+static int reload_xe(int fd)
+{
+	int error;
+
+	drm_close_driver(fd);
+	igt_xe_driver_unload();
+
+	error = igt_xe_driver_load(NULL);
+
+	igt_assert_eq(error, 0);
+
+	/* driver is ready, check if it's bound */
+	fd = __drm_open_driver(DRIVER_XE);
+	igt_fail_on_f(fd < 0, "Cannot open the xe DRM driver while reloading xe after wedged\n");
+	return fd;
+}
+
+static int simple_ioctl(int fd)
+{
+	int ret;
+
+	struct drm_xe_vm_create create = {
+		.extensions = 0,
+		.flags = 0,
+	};
+
+	ret = igt_ioctl(fd, DRM_IOCTL_XE_VM_CREATE, &create);
+
+	if (ret == 0)
+		xe_vm_destroy(fd, create.vm_id);
+
+	return ret;
+}
+
+/**
+ * SUBTEST: basic-wedged
+ * Description: Force Xe device wedged after injecting a failure in GT reset
+ */
+igt_main
+{
+	int fd;
+
+	igt_fixture {
+		fd = drm_open_driver(DRIVER_XE);
+		igt_require(igt_debugfs_exists(fd, "fail_gt_reset/probability",
+					       O_RDWR));
+	}
+
+	igt_subtest("basic-wedged") {
+		igt_assert_eq(simple_ioctl(fd), 0);
+		force_wedged(fd);
+		igt_assert_neq(simple_ioctl(fd), 0);
+		fd = reload_xe(fd);
+		igt_assert_eq(simple_ioctl(fd), 0);
+	}
+
+	igt_fixture {
+		if (igt_debugfs_exists(fd, "fail_gt_reset/probability", O_RDWR)) {
+			igt_debugfs_write(fd, "fail_gt_reset/probability", "0");
+			igt_debugfs_write(fd, "fail_gt_reset/times", "1");
+		}
+		drm_close_driver(fd);
+	}
+}
diff --git a/tests/meson.build b/tests/meson.build
index 02cbc3780..12dd2c16e 100644
--- a/tests/meson.build
+++ b/tests/meson.build
@@ -311,6 +311,7 @@ intel_xe_progs = [
 	'xe_query',
 	'xe_vm',
 	'xe_waitfence',
+	'xe_wedged',
 	'xe_spin_batch',
 	'xe_sysfs_defaults',
 	'xe_sysfs_scheduler',
-- 
2.44.0
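
The sequence the subtest checks (ioctl works, failed GT reset wedges the
device, ioctl is refused, module reload recovers) can be pictured with a
toy state model. This is only an illustration of the expected behaviour and
not Xe driver code: struct toy_device and the toy_* functions are
hypothetical, and the specific error value returned while wedged is an
assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy stand-in for a device; not the Xe driver's real data structure. */
struct toy_device {
	bool wedged;
};

/* A failed GT reset is modelled as the event that wedges the device. */
static void toy_gt_reset(struct toy_device *dev, bool inject_failure)
{
	assert(dev);
	if (inject_failure)
		dev->wedged = true;
}

/* Every "ioctl" is refused while wedged; the errno value is an assumption. */
static int toy_ioctl(const struct toy_device *dev)
{
	assert(dev);
	return dev->wedged ? -ECANCELED : 0;
}

/* Only a module reload, modelled here as re-initialisation, clears it. */
static void toy_module_reload(struct toy_device *dev)
{
	assert(dev);
	dev->wedged = false;
}
```

Note that in this model a later successful reset does not clear the wedged
flag, which matches the description of wedged as a final state.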



* ✗ CI.Patch_applied: failure for series starting with [i-g-t,1/3] tests/intel/xe_pm: Fix runtime_pm tests
  2024-04-03 15:14 [PATCH i-g-t 1/3] tests/intel/xe_pm: Fix runtime_pm tests Rodrigo Vivi
  2024-04-03 15:14 ` [PATCH i-g-t 2/3] lib/igt_kmod: drop devcoredump before a PCI module unload Rodrigo Vivi
  2024-04-03 15:14 ` [PATCH i-g-t 3/3] tests/intel/xe_wedged: Introduce a new test for Xe device wedged state Rodrigo Vivi
@ 2024-04-03 17:21 ` Patchwork
  2 siblings, 0 replies; 6+ messages in thread
From: Patchwork @ 2024-04-03 17:21 UTC (permalink / raw)
  To: Rodrigo Vivi; +Cc: intel-xe

== Series Details ==

Series: series starting with [i-g-t,1/3] tests/intel/xe_pm: Fix runtime_pm tests
URL   : https://patchwork.freedesktop.org/series/131999/
State : failure

== Summary ==

=== Applying kernel patches on branch 'drm-tip' with base: ===
Base commit: c8dc2a19ae71 drm-tip: 2024y-04m-03d-14h-13m-00s UTC integration manifest
=== git am output follows ===
error: tests/intel/xe_pm.c: does not exist in index
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Applying: tests/intel/xe_pm: Fix runtime_pm tests
Patch failed at 0001 tests/intel/xe_pm: Fix runtime_pm tests
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".




* Re: [PATCH i-g-t 2/3] lib/igt_kmod: drop devcoredump before a PCI module unload
  2024-04-03 15:14 ` [PATCH i-g-t 2/3] lib/igt_kmod: drop devcoredump before a PCI module unload Rodrigo Vivi
@ 2024-04-05 17:36   ` Lucas De Marchi
  2024-04-05 17:42     ` Souza, Jose
  0 siblings, 1 reply; 6+ messages in thread
From: Lucas De Marchi @ 2024-04-05 17:36 UTC (permalink / raw)
  To: Rodrigo Vivi
  Cc: igt-dev, intel-xe, Maarten Lankhorst, José Roberto de Souza

On Wed, Apr 03, 2024 at 11:14:08AM -0400, Rodrigo Vivi wrote:
>devcoredump holds a module reference, blocking the module removal.
>
>It is intentional from the devcoredump perspective to keep the
>log available even after the unbind/unprobe. However it blocks
>our module removal here.
>
>v2: Accepting many suggestions from Lucas.
>
>Cc: Lucas De Marchi <lucas.demarchi@intel.com>
>Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
>Cc: José Roberto de Souza <jose.souza@intel.com>
>Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
>---
> lib/igt_kmod.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 55 insertions(+)
>
>diff --git a/lib/igt_kmod.c b/lib/igt_kmod.c
>index cc242838f..14d51f4f6 100644
>--- a/lib/igt_kmod.c
>+++ b/lib/igt_kmod.c
>@@ -323,6 +323,59 @@ static int igt_kmod_unload_r(struct kmod_module *kmod, unsigned int flags)
> 	return err;
> }
>
>+static void igt_drop_devcoredump(const char *driver)
>+{
>+	char sysfspath[PATH_MAX];
>+	DIR *dir;
>+	char *devcoredump;
>+	FILE *data;
>+	struct dirent *entry;
>+	int len, ret;
>+
>+	len = snprintf(sysfspath, sizeof(sysfspath),
>+		       "/sys/bus/pci/drivers/%s", driver);
>+
>+	igt_assert(len < sizeof(sysfspath));
>+
>+	 /* Not a PCI module */
>+	if (access(sysfspath, F_OK))
>+		return;
>+
>+	devcoredump = sysfspath + len;
>+
>+	dir = opendir(sysfspath);
>+	igt_assert(dir);
>+
>+	while ((entry = readdir(dir)) != NULL) {
>+		if (entry->d_type != DT_LNK ||
>+		    strcmp(entry->d_name, ".") == 0 ||
>+		    strcmp(entry->d_name, "..") == 0)
>+			continue;
>+
>+		ret = snprintf(devcoredump, sizeof(sysfspath) - len,
>+			       "/%s/devcoredump", entry->d_name);

I think this could be simplified a little bit further:

		ret = snprintf(devcoredump, sizeof(sysfspath) - len,
			       "/%s/devcoredump/data", entry->d_name);
		igt_assert(ret < sizeof(sysfspath) - len);

		data = fopen(sysfspath, "w");
		if (data) {
			igt_info("Removing devcoredump before module unload: %s\n",
				 sysfspath);

			/*
			 * Write anything to devcoredump/data to
			 * force its deletion
			 */
			fprintf(data, "1\n");
			fclose(data);
		}

so it drops the TOCTOU of access()/open() and makes it shorter.
... but totally optional (and untested).


Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>

Lucas De Marchi


* Re: [PATCH i-g-t 2/3] lib/igt_kmod: drop devcoredump before a PCI module unload
  2024-04-05 17:36   ` Lucas De Marchi
@ 2024-04-05 17:42     ` Souza, Jose
  0 siblings, 0 replies; 6+ messages in thread
From: Souza, Jose @ 2024-04-05 17:42 UTC (permalink / raw)
  To: Vivi, Rodrigo, De Marchi, Lucas
  Cc: intel-xe@lists.freedesktop.org, igt-dev@lists.freedesktop.org,
	maarten.lankhorst@linux.intel.com

On Fri, 2024-04-05 at 12:36 -0500, Lucas De Marchi wrote:
> On Wed, Apr 03, 2024 at 11:14:08AM -0400, Rodrigo Vivi wrote:
> > devcoredump holds a module reference, blocking the module removal.
> > 
> > It is intentional from the devcoredump perspective to keep the
> > log available even after the unbind/unprobe. However it blocks
> > our module removal here.

'devcoredump: Add dev_coredump_put()' was reviewed by the devcoredump maintainers, so we can remove the devcoredump before unloading Xe.
So I don't think we will need this patch.

It is still pending a push, but we could add it to the topic branches to unblock CI if needed.



