Intel-GFX Archive on lore.kernel.org
* [Intel-gfx] [PATCH i-g-t v5 00/21] tests/core_hotunplug: Fixes and enhancements
@ 2020-08-28  7:59 Janusz Krzysztofik
  2020-08-28  7:59 ` [Intel-gfx] [PATCH i-g-t v5 01/21] tests/core_hotunplug: Use igt_assert_fd() Janusz Krzysztofik
                   ` (21 more replies)
  0 siblings, 22 replies; 23+ messages in thread
From: Janusz Krzysztofik @ 2020-08-28  7:59 UTC (permalink / raw)
  To: igt-dev; +Cc: Michał Winiarski, intel-gfx

Clean up the test code, add some new basic subtests, then unblock
unbind test variants.

No incompletes / aborts have been reported by Trybot this time.

Series changelog:
v2: New patch "Un-blocklist *bind* subtests" added.
v3: Patch "Follow failed subtests with healthcheck" renamed to "Recover
    from subtest failures".
  - a new patch "Clean up device open error handling" added; the old
    patch "Fix missing newline", obsoleted by the new one, dropped,
  - other new patches added:
    - "Let the driver time out essential sysfs operations",
    - "More thorough i915 healthcheck and recovery",
  - a patch "Add 'lateclose before restore' variants" from another
    series included.
v4: Optional patch "Duplicate debug messages in dmesg" from another
    series included.
v5: New patch added with Haswell audio related kernel warning worked
    around and replaced with an IGT warning to preserve visibility of
    the issue.

@Michał: Since some patch updates are trivial, I've preserved your
v1/v2 Reviewed-by: except for patches with non-trivial changes, where I
marked your R-b as v1/v2 applicable.  Please have a look and confirm if
you are still OK with them.

@Tvrtko: As I already asked before, please support my attempt to remove
the unbind test variants from the blocklist.

@Petri, @Martin: Assuming CI results will be as good as those obtained
on Trybot, please give me your green light for merging this series if
you have no objections.

Thanks,
Janusz


Janusz Krzysztofik (21):
  tests/core_hotunplug: Use igt_assert_fd()
  tests/core_hotunplug: Constify dev_bus_addr string
  tests/core_hotunplug: Clean up device open error handling
  tests/core_hotunplug: Consolidate duplicated debug messages
  tests/core_hotunplug: Assert successful device filter application
  tests/core_hotunplug: Maintain a single data structure instance
  tests/core_hotunplug: Pass errors via a data structure field
  tests/core_hotunplug: Handle device close errors
  tests/core_hotunplug: Prepare invariant data once per test run
  tests/core_hotunplug: Skip selectively on sysfs close errors
  tests/core_hotunplug: Recover from subtest failures
  tests/core_hotunplug: Fail subtests on device close errors
  tests/core_hotunplug: Let the driver time out essential sysfs
    operations
  tests/core_hotunplug: Process return values of sysfs operations
  tests/core_hotunplug: Assert expected device presence/absence
  tests/core_hotunplug: Explicitly ignore unused return values
  tests/core_hotunplug: More thorough i915 healthcheck and recovery
  tests/core_hotunplug: Add 'lateclose before restore' variants
  tests/core_hotunplug: Duplicate debug messages in dmesg
  tests/core_hotunplug: HSW audio issue workaround
  tests/core_hotunplug: Un-blocklist *bind* subtests

 tests/core_hotunplug.c       | 542 ++++++++++++++++++++++++++---------
 tests/intel-ci/blacklist.txt |   2 +-
 2 files changed, 410 insertions(+), 134 deletions(-)

-- 
2.21.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* [Intel-gfx] [PATCH i-g-t v5 01/21] tests/core_hotunplug: Use igt_assert_fd()
  2020-08-28  7:59 [Intel-gfx] [PATCH i-g-t v5 00/21] tests/core_hotunplug: Fixes and enhancements Janusz Krzysztofik
@ 2020-08-28  7:59 ` Janusz Krzysztofik
  2020-08-28  7:59 ` [Intel-gfx] [PATCH i-g-t v5 02/21] tests/core_hotunplug: Constify dev_bus_addr string Janusz Krzysztofik
                   ` (20 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Janusz Krzysztofik @ 2020-08-28  7:59 UTC (permalink / raw)
  To: igt-dev; +Cc: Michał Winiarski, intel-gfx

There is a new library helper that asserts validity of open file
descriptors.  Use it instead of open coding.

Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
---
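Editorial note, not part of the commit: a minimal sketch, in plain C,
of replacing an open-coded descriptor check with a dedicated assertion
helper.  fd_is_valid(), ASSERT_FD() and open_checked() are hypothetical
stand-ins and not the IGT implementation; the real igt_assert_fd()
reports failures through the IGT framework rather than via assert().

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical stand-in: a descriptor is valid when it is non-negative. */
static int fd_is_valid(int fd)
{
	return fd >= 0;
}

/* Hypothetical stand-in for a dedicated fd assertion helper. */
#define ASSERT_FD(fd) assert(fd_is_valid(fd))

/* Open a path read-only and assert that the descriptor is usable. */
static int open_checked(const char *path)
{
	int fd = open(path, O_RDONLY);

	ASSERT_FD(fd);	/* instead of open coding assert(fd >= 0) */

	return fd;
}
```

A caller would then write, e.g., fd = open_checked("/dev/null") and
close the descriptor when done.  The benefit is that the intent of the
check is visible at the call site.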
 tests/core_hotunplug.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/tests/core_hotunplug.c b/tests/core_hotunplug.c
index e03f3b945..7431346b1 100644
--- a/tests/core_hotunplug.c
+++ b/tests/core_hotunplug.c
@@ -57,7 +57,7 @@ static void prepare_for_unbind(struct hotunplug *priv, char *buf, int buflen)
 
 	priv->fd.sysfs_drv = openat(priv->fd.sysfs_dev, "device/driver",
 				    O_DIRECTORY);
-	igt_assert(priv->fd.sysfs_drv >= 0);
+	igt_assert_fd(priv->fd.sysfs_drv);
 
 	len = readlinkat(priv->fd.sysfs_dev, "device", buf, buflen - 1);
 	buf[len] = '\0';
@@ -72,10 +72,10 @@ static void prepare(struct hotunplug *priv, char *buf, int buflen)
 {
 	igt_debug("opening device\n");
 	priv->fd.drm = __drm_open_driver(DRIVER_ANY);
-	igt_assert(priv->fd.drm >= 0);
+	igt_assert_fd(priv->fd.drm);
 
 	priv->fd.sysfs_dev = igt_sysfs_open(priv->fd.drm);
-	igt_assert(priv->fd.sysfs_dev >= 0);
+	igt_assert_fd(priv->fd.sysfs_dev);
 
 	if (buf) {
 		prepare_for_unbind(priv, buf, buflen);
@@ -83,7 +83,7 @@ static void prepare(struct hotunplug *priv, char *buf, int buflen)
 		/* prepare for bus rescan */
 		priv->fd.sysfs_bus = openat(priv->fd.sysfs_dev,
 					    "device/subsystem", O_DIRECTORY);
-		igt_assert(priv->fd.sysfs_bus >= 0);
+		igt_assert_fd(priv->fd.sysfs_bus);
 	}
 }
 
@@ -261,7 +261,7 @@ igt_main
 		 * a device file descriptor open for exit handler use.
 		 */
 		fd_drm = __drm_open_driver(DRIVER_ANY);
-		igt_assert(fd_drm >= 0);
+		igt_assert_fd(fd_drm);
 
 		if (is_i915_device(fd_drm))
 			igt_require_gem(fd_drm);
-- 
2.21.1

* [Intel-gfx] [PATCH i-g-t v5 02/21] tests/core_hotunplug: Constify dev_bus_addr string
  2020-08-28  7:59 [Intel-gfx] [PATCH i-g-t v5 00/21] tests/core_hotunplug: Fixes and enhancements Janusz Krzysztofik
  2020-08-28  7:59 ` [Intel-gfx] [PATCH i-g-t v5 01/21] tests/core_hotunplug: Use igt_assert_fd() Janusz Krzysztofik
@ 2020-08-28  7:59 ` Janusz Krzysztofik
  2020-08-28  7:59 ` [Intel-gfx] [PATCH i-g-t v5 03/21] tests/core_hotunplug: Clean up device open error handling Janusz Krzysztofik
                   ` (19 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Janusz Krzysztofik @ 2020-08-28  7:59 UTC (permalink / raw)
  To: igt-dev; +Cc: Michał Winiarski, intel-gfx

Device bus address structure field is always initialized with a pointer
to a substring of the device sysfs path and never used for its
modification.  Declare it as a constant string.

Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
---
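Editorial note, not part of the commit: a minimal sketch of why a
pointer into a larger path buffer is naturally a const char *.  The
helper name last_path_component() and the use of strrchr() to locate
the substring are assumptions made for illustration; the path in the
usage note below is a made-up example.

```c
#include <assert.h>
#include <string.h>

/*
 * Return a pointer to the last component of a sysfs-like path.
 * The result points into the caller's buffer and is only read,
 * never written through, hence the const qualifier.
 */
static const char *last_path_component(const char *path)
{
	const char *slash;

	assert(path != NULL);

	slash = strrchr(path, '/');

	return slash ? slash + 1 : path;
}
```

For a path such as "../../../0000:00:02.0" the helper returns a pointer
to "0000:00:02.0" inside the same buffer, so the buffer must outlive
every use of the returned pointer.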
 tests/core_hotunplug.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/core_hotunplug.c b/tests/core_hotunplug.c
index 7431346b1..a4071f51e 100644
--- a/tests/core_hotunplug.c
+++ b/tests/core_hotunplug.c
@@ -44,7 +44,7 @@ struct hotunplug {
 		int sysfs_bus;
 		int sysfs_drv;
 	} fd;
-	char *dev_bus_addr;
+	const char *dev_bus_addr;
 };
 
 /* Helpers */
-- 
2.21.1

* [Intel-gfx] [PATCH i-g-t v5 03/21] tests/core_hotunplug: Clean up device open error handling
  2020-08-28  7:59 [Intel-gfx] [PATCH i-g-t v5 00/21] tests/core_hotunplug: Fixes and enhancements Janusz Krzysztofik
  2020-08-28  7:59 ` [Intel-gfx] [PATCH i-g-t v5 01/21] tests/core_hotunplug: Use igt_assert_fd() Janusz Krzysztofik
  2020-08-28  7:59 ` [Intel-gfx] [PATCH i-g-t v5 02/21] tests/core_hotunplug: Constify dev_bus_addr string Janusz Krzysztofik
@ 2020-08-28  7:59 ` Janusz Krzysztofik
  2020-08-28  7:59 ` [Intel-gfx] [PATCH i-g-t v5 04/21] tests/core_hotunplug: Consolidate duplicated debug messages Janusz Krzysztofik
                   ` (18 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Janusz Krzysztofik @ 2020-08-28  7:59 UTC (permalink / raw)
  To: igt-dev; +Cc: Michał Winiarski, intel-gfx

We don't use drm_open_driver() since in case of an i915 device it keeps
an extra file descriptor of the exercised device open for exit handler
use, while we would like to be able to close the device completely
before running certain test operations.  Instead, we call
__drm_open_driver() and handle its result ourselves.  Unlike
drm_open_driver(), which skips on device open errors, we always fail or
abort the test in such a case.  Moreover, we don't ensure that the i915
driver is idle before starting subtests like drm_open_driver() does.

Skip instead of failing on initial device open error.  Also, call
gem_quiescent_gpu() if an i915 device is detected.  For subsequent
device opens, define a local helper that fails on error and use it.  If
we think we need to abort the test execution on device open error, set
our failure marker first to trigger the abort from a follow up
igt_fixture section.

Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
---
 tests/core_hotunplug.c | 34 +++++++++++++++++++++++-----------
 1 file changed, 23 insertions(+), 11 deletions(-)

diff --git a/tests/core_hotunplug.c b/tests/core_hotunplug.c
index a4071f51e..e576a6c6c 100644
--- a/tests/core_hotunplug.c
+++ b/tests/core_hotunplug.c
@@ -49,6 +49,21 @@ struct hotunplug {
 
 /* Helpers */
 
+/**
+ * Subtests must be able to close examined devices completely.  Don't
+ * use drm_open_driver() since in case of an i915 device it opens it
+ * twice and keeps a second file descriptor open for exit handler use.
+ */
+static int local_drm_open_driver(void)
+{
+	int fd_drm;
+
+	fd_drm = __drm_open_driver(DRIVER_ANY);
+	igt_assert_fd(fd_drm);
+
+	return fd_drm;
+}
+
 static void prepare_for_unbind(struct hotunplug *priv, char *buf, int buflen)
 {
 	int len;
@@ -71,8 +86,7 @@ static void prepare_for_unbind(struct hotunplug *priv, char *buf, int buflen)
 static void prepare(struct hotunplug *priv, char *buf, int buflen)
 {
 	igt_debug("opening device\n");
-	priv->fd.drm = __drm_open_driver(DRIVER_ANY);
-	igt_assert_fd(priv->fd.drm);
+	priv->fd.drm = local_drm_open_driver();
 
 	priv->fd.sysfs_dev = igt_sysfs_open(priv->fd.drm);
 	igt_assert_fd(priv->fd.sysfs_dev);
@@ -145,8 +159,9 @@ static void healthcheck(void)
 	igt_devices_scan(true);
 
 	igt_debug("reopening the device\n");
-	fd_drm = __drm_open_driver(DRIVER_ANY);
-	igt_abort_on_f(fd_drm < 0, "Device reopen failure");
+	failure = "Device reopen failure!";
+	fd_drm = local_drm_open_driver();
+	failure = NULL;
 
 	if (is_i915_device(fd_drm)) {
 		failure = "GEM failure";
@@ -255,16 +270,13 @@ igt_main
 	igt_fixture {
 		int fd_drm;
 
-		/**
-		 * As subtests must be able to close examined devices
-		 * completely, don't use drm_open_driver() as it keeps
-		 * a device file descriptor open for exit handler use.
-		 */
 		fd_drm = __drm_open_driver(DRIVER_ANY);
-		igt_assert_fd(fd_drm);
+		igt_skip_on_f(fd_drm < 0, "No known DRM device found\n");
 
-		if (is_i915_device(fd_drm))
+		if (is_i915_device(fd_drm)) {
+			gem_quiescent_gpu(fd_drm);
 			igt_require_gem(fd_drm);
+		}
 
 		/* Make sure subtests always reopen the same device */
 		set_filter_from_device(fd_drm);
-- 
2.21.1

* [Intel-gfx] [PATCH i-g-t v5 04/21] tests/core_hotunplug: Consolidate duplicated debug messages
  2020-08-28  7:59 [Intel-gfx] [PATCH i-g-t v5 00/21] tests/core_hotunplug: Fixes and enhancements Janusz Krzysztofik
                   ` (2 preceding siblings ...)
  2020-08-28  7:59 ` [Intel-gfx] [PATCH i-g-t v5 03/21] tests/core_hotunplug: Clean up device open error handling Janusz Krzysztofik
@ 2020-08-28  7:59 ` Janusz Krzysztofik
  2020-08-28  7:59 ` [Intel-gfx] [PATCH i-g-t v5 05/21] tests/core_hotunplug: Assert successful device filter application Janusz Krzysztofik
                   ` (17 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Janusz Krzysztofik @ 2020-08-28  7:59 UTC (permalink / raw)
  To: igt-dev; +Cc: Michał Winiarski, intel-gfx

Some debug messages that designate specific test operations, or at
least their major parts, always read the same no matter which subtest
emits them.  Emit them, optionally updated with subtest specific
modifiers, from inside the respective helpers instead of duplicating
them in subtest bodies.

v2: Rebase only.
v3: Refresh and extend over new case (local_drm_open_driver),
  - allow callers to specify a message suffix as well where applicable.

Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> # v1
---
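Editorial note, not part of the commit: a minimal sketch of the
prefix/suffix idea in plain C.  format_open_msg() is a hypothetical
helper that only builds the message text; the patch itself passes the
same kind of modifiers straight to igt_debug().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build the message a shared helper would emit,
 * letting each caller supply only the parts that differ.
 */
static int format_open_msg(char *out, size_t len,
			   const char *prefix, const char *suffix)
{
	assert(out && prefix && suffix);

	return snprintf(out, len, "%sopening device%s", prefix, suffix);
}
```

With an empty prefix and the suffix " for subtest" the result reads
"opening device for subtest"; with the prefix "re" and the suffix
" for healthcheck" it reads "reopening device for healthcheck".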
 tests/core_hotunplug.c | 39 ++++++++++++++++++++-------------------
 1 file changed, 20 insertions(+), 19 deletions(-)

diff --git a/tests/core_hotunplug.c b/tests/core_hotunplug.c
index e576a6c6c..5093233d7 100644
--- a/tests/core_hotunplug.c
+++ b/tests/core_hotunplug.c
@@ -54,10 +54,12 @@ struct hotunplug {
  * use drm_open_driver() since in case of an i915 device it opens it
  * twice and keeps a second file descriptor open for exit handler use.
  */
-static int local_drm_open_driver(void)
+static int local_drm_open_driver(const char *prefix, const char *suffix)
 {
 	int fd_drm;
 
+	igt_debug("%sopening device%s\n", prefix, suffix);
+
 	fd_drm = __drm_open_driver(DRIVER_ANY);
 	igt_assert_fd(fd_drm);
 
@@ -85,8 +87,7 @@ static void prepare_for_unbind(struct hotunplug *priv, char *buf, int buflen)
 
 static void prepare(struct hotunplug *priv, char *buf, int buflen)
 {
-	igt_debug("opening device\n");
-	priv->fd.drm = local_drm_open_driver();
+	priv->fd.drm = local_drm_open_driver("", " for subtest");
 
 	priv->fd.sysfs_dev = igt_sysfs_open(priv->fd.drm);
 	igt_assert_fd(priv->fd.sysfs_dev);
@@ -104,8 +105,11 @@ static void prepare(struct hotunplug *priv, char *buf, int buflen)
 static const char *failure;
 
 /* Unbind the driver from the device */
-static void driver_unbind(int fd_sysfs_drv, const char *dev_bus_addr)
+static void driver_unbind(int fd_sysfs_drv, const char *dev_bus_addr,
+			  const char *prefix)
 {
+	igt_debug("%sunbinding the driver from the device\n", prefix);
+
 	failure = "Driver unbind timeout!";
 	igt_set_timeout(60, failure);
 	igt_sysfs_set(fd_sysfs_drv, "unbind", dev_bus_addr);
@@ -118,6 +122,8 @@ static void driver_unbind(int fd_sysfs_drv, const char *dev_bus_addr)
 /* Re-bind the driver to the device */
 static void driver_bind(int fd_sysfs_drv, const char *dev_bus_addr)
 {
+	igt_debug("rebinding the driver to the device\n");
+
 	failure = "Driver re-bind timeout!";
 	igt_set_timeout(60, failure);
 	igt_sysfs_set(fd_sysfs_drv, "bind", dev_bus_addr);
@@ -128,8 +134,10 @@ static void driver_bind(int fd_sysfs_drv, const char *dev_bus_addr)
 }
 
 /* Remove (virtually unplug) the device from its bus */
-static void device_unplug(int fd_sysfs_dev)
+static void device_unplug(int fd_sysfs_dev, const char *prefix)
 {
+	igt_debug("%sunplugging the device\n", prefix);
+
 	failure = "Device unplug timeout!";
 	igt_set_timeout(60, failure);
 	igt_sysfs_set(fd_sysfs_dev, "device/remove", "1");
@@ -142,6 +150,8 @@ static void device_unplug(int fd_sysfs_dev)
 /* Re-discover the device by rescanning its bus */
 static void bus_rescan(int fd_sysfs_bus)
 {
+	igt_debug("rediscovering the device\n");
+
 	failure = "Bus rescan timeout!";
 	igt_set_timeout(60, failure);
 	igt_sysfs_set(fd_sysfs_bus, "rescan", "1");
@@ -158,9 +168,8 @@ static void healthcheck(void)
 	/* device name may have changed, rebuild IGT device list */
 	igt_devices_scan(true);
 
-	igt_debug("reopening the device\n");
 	failure = "Device reopen failure!";
-	fd_drm = local_drm_open_driver();
+	fd_drm = local_drm_open_driver("re", " for healthcheck");
 	failure = NULL;
 
 	if (is_i915_device(fd_drm)) {
@@ -199,10 +208,8 @@ static void unbind_rebind(void)
 	igt_debug("closing the device\n");
 	close(priv.fd.drm);
 
-	igt_debug("unbinding the driver from the device\n");
-	driver_unbind(priv.fd.sysfs_drv, priv.dev_bus_addr);
+	driver_unbind(priv.fd.sysfs_drv, priv.dev_bus_addr, "");
 
-	igt_debug("rebinding the driver to the device\n");
 	driver_bind(priv.fd.sysfs_drv, priv.dev_bus_addr);
 
 	healthcheck();
@@ -217,10 +224,8 @@ static void unplug_rescan(void)
 	igt_debug("closing the device\n");
 	close(priv.fd.drm);
 
-	igt_debug("unplugging the device\n");
-	device_unplug(priv.fd.sysfs_dev);
+	device_unplug(priv.fd.sysfs_dev, "");
 
-	igt_debug("recovering the device\n");
 	bus_rescan(priv.fd.sysfs_bus);
 
 	healthcheck();
@@ -233,10 +238,8 @@ static void hotunbind_lateclose(void)
 
 	prepare(&priv, buf, sizeof(buf));
 
-	igt_debug("hot unbinding the driver from the device\n");
-	driver_unbind(priv.fd.sysfs_drv, priv.dev_bus_addr);
+	driver_unbind(priv.fd.sysfs_drv, priv.dev_bus_addr, "hot ");
 
-	igt_debug("rebinding the driver to the device\n");
 	driver_bind(priv.fd.sysfs_drv, priv.dev_bus_addr);
 
 	igt_debug("late closing the unbound device instance\n");
@@ -251,10 +254,8 @@ static void hotunplug_lateclose(void)
 
 	prepare(&priv, NULL, 0);
 
-	igt_debug("hot unplugging the device\n");
-	device_unplug(priv.fd.sysfs_dev);
+	device_unplug(priv.fd.sysfs_dev, "hot ");
 
-	igt_debug("recovering the device\n");
 	bus_rescan(priv.fd.sysfs_bus);
 
 	igt_debug("late closing the removed device instance\n");
-- 
2.21.1

* [Intel-gfx] [PATCH i-g-t v5 05/21] tests/core_hotunplug: Assert successful device filter application
  2020-08-28  7:59 [Intel-gfx] [PATCH i-g-t v5 00/21] tests/core_hotunplug: Fixes and enhancements Janusz Krzysztofik
                   ` (3 preceding siblings ...)
  2020-08-28  7:59 ` [Intel-gfx] [PATCH i-g-t v5 04/21] tests/core_hotunplug: Consolidate duplicated debug messages Janusz Krzysztofik
@ 2020-08-28  7:59 ` Janusz Krzysztofik
  2020-08-28  7:59 ` [Intel-gfx] [PATCH i-g-t v5 06/21] tests/core_hotunplug: Maintain a single data structure instance Janusz Krzysztofik
                   ` (16 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Janusz Krzysztofik @ 2020-08-28  7:59 UTC (permalink / raw)
  To: igt-dev; +Cc: Michał Winiarski, intel-gfx

The return value of igt_device_filter_add(), which represents the number
of successfully installed device filters, is currently ignored.  Fail if
it is not 1.

Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
---
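Editorial note, not part of the commit: a minimal sketch of checking a
"number of items installed" return value instead of discarding it.
add_filters() is a hypothetical stand-in, not igt_device_filter_add(),
and the filter string in the usage note is made up.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for a function that installs filters and
 * returns how many of them were accepted.  Here a filter is accepted
 * when it is a non-empty string.
 */
static int add_filters(const char *const *filters, size_t n)
{
	int installed = 0;

	assert(filters != NULL);

	for (size_t i = 0; i < n; i++)
		if (filters[i] && filters[i][0] != '\0')
			installed++;

	return installed;
}
```

A caller that passes exactly one filter, e.g. "sys:/sys/devices/example",
should compare the result with 1 and treat any other value as an error,
since a silently rejected filter would leave later device opens
unconstrained.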
 tests/core_hotunplug.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/core_hotunplug.c b/tests/core_hotunplug.c
index 5093233d7..46f9ad118 100644
--- a/tests/core_hotunplug.c
+++ b/tests/core_hotunplug.c
@@ -193,7 +193,7 @@ static void set_filter_from_device(int fd)
 	igt_assert(realpath(path, dst));
 
 	igt_device_filter_free_all();
-	igt_device_filter_add(filter);
+	igt_assert_eq(igt_device_filter_add(filter), 1);
 }
 
 /* Subtests */
-- 
2.21.1

* [Intel-gfx] [PATCH i-g-t v5 06/21] tests/core_hotunplug: Maintain a single data structure instance
  2020-08-28  7:59 [Intel-gfx] [PATCH i-g-t v5 00/21] tests/core_hotunplug: Fixes and enhancements Janusz Krzysztofik
                   ` (4 preceding siblings ...)
  2020-08-28  7:59 ` [Intel-gfx] [PATCH i-g-t v5 05/21] tests/core_hotunplug: Assert successful device filter application Janusz Krzysztofik
@ 2020-08-28  7:59 ` Janusz Krzysztofik
  2020-08-28  7:59 ` [Intel-gfx] [PATCH i-g-t v5 07/21] tests/core_hotunplug: Pass errors via a data structure field Janusz Krzysztofik
                   ` (15 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Janusz Krzysztofik @ 2020-08-28  7:59 UTC (permalink / raw)
  To: igt-dev; +Cc: Michał Winiarski, intel-gfx

The following changes to the test are planned:
- avoid global variables if possible,
- prepare invariant data only once per test run,
- skip subsequent subtests after device close errors,
- allow subtests to fail on errors and try to recover from those
  failures in follow up igt_fixture sections instead of aborting.
For that to be possible, maintain a single instance of hotunplug
structure at igt_main level and pass it down to subtests.

v2: Commit description refreshed.

Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
---
 tests/core_hotunplug.c | 56 ++++++++++++++++++++----------------------
 1 file changed, 26 insertions(+), 30 deletions(-)

diff --git a/tests/core_hotunplug.c b/tests/core_hotunplug.c
index 46f9ad118..95d326ee9 100644
--- a/tests/core_hotunplug.c
+++ b/tests/core_hotunplug.c
@@ -198,68 +198,62 @@ static void set_filter_from_device(int fd)
 
 /* Subtests */
 
-static void unbind_rebind(void)
+static void unbind_rebind(struct hotunplug *priv)
 {
-	struct hotunplug priv;
 	char buf[PATH_MAX];
 
-	prepare(&priv, buf, sizeof(buf));
+	prepare(priv, buf, sizeof(buf));
 
 	igt_debug("closing the device\n");
-	close(priv.fd.drm);
+	close(priv->fd.drm);
 
-	driver_unbind(priv.fd.sysfs_drv, priv.dev_bus_addr, "");
+	driver_unbind(priv->fd.sysfs_drv, priv->dev_bus_addr, "");
 
-	driver_bind(priv.fd.sysfs_drv, priv.dev_bus_addr);
+	driver_bind(priv->fd.sysfs_drv, priv->dev_bus_addr);
 
 	healthcheck();
 }
 
-static void unplug_rescan(void)
+static void unplug_rescan(struct hotunplug *priv)
 {
-	struct hotunplug priv;
-
-	prepare(&priv, NULL, 0);
+	prepare(priv, NULL, 0);
 
 	igt_debug("closing the device\n");
-	close(priv.fd.drm);
+	close(priv->fd.drm);
 
-	device_unplug(priv.fd.sysfs_dev, "");
+	device_unplug(priv->fd.sysfs_dev, "");
 
-	bus_rescan(priv.fd.sysfs_bus);
+	bus_rescan(priv->fd.sysfs_bus);
 
 	healthcheck();
 }
 
-static void hotunbind_lateclose(void)
+static void hotunbind_lateclose(struct hotunplug *priv)
 {
-	struct hotunplug priv;
 	char buf[PATH_MAX];
 
-	prepare(&priv, buf, sizeof(buf));
+	prepare(priv, buf, sizeof(buf));
 
-	driver_unbind(priv.fd.sysfs_drv, priv.dev_bus_addr, "hot ");
+	driver_unbind(priv->fd.sysfs_drv, priv->dev_bus_addr, "hot ");
 
-	driver_bind(priv.fd.sysfs_drv, priv.dev_bus_addr);
+	driver_bind(priv->fd.sysfs_drv, priv->dev_bus_addr);
 
 	igt_debug("late closing the unbound device instance\n");
-	close(priv.fd.drm);
+	close(priv->fd.drm);
 
 	healthcheck();
 }
 
-static void hotunplug_lateclose(void)
+static void hotunplug_lateclose(struct hotunplug *priv)
 {
-	struct hotunplug priv;
-
-	prepare(&priv, NULL, 0);
+	prepare(priv, NULL, 0);
 
-	device_unplug(priv.fd.sysfs_dev, "hot ");
+	device_unplug(priv->fd.sysfs_dev, "hot ");
 
-	bus_rescan(priv.fd.sysfs_bus);
+	bus_rescan(priv->fd.sysfs_bus);
 
 	igt_debug("late closing the removed device instance\n");
-	close(priv.fd.drm);
+	close(priv->fd.drm);
 
 	healthcheck();
 }
@@ -268,6 +262,8 @@ static void hotunplug_lateclose(void)
 
 igt_main
 {
+	struct hotunplug priv;
+
 	igt_fixture {
 		int fd_drm;
 
@@ -287,28 +283,28 @@ igt_main
 
 	igt_describe("Check if the driver can be cleanly unbound from a device believed to be closed");
 	igt_subtest("unbind-rebind")
-		unbind_rebind();
+		unbind_rebind(&priv);
 
 	igt_fixture
 		igt_abort_on_f(failure, "%s\n", failure);
 
 	igt_describe("Check if a device believed to be closed can be cleanly unplugged");
 	igt_subtest("unplug-rescan")
-		unplug_rescan();
+		unplug_rescan(&priv);
 
 	igt_fixture
 		igt_abort_on_f(failure, "%s\n", failure);
 
 	igt_describe("Check if the driver can be cleanly unbound from a still open device, then released");
 	igt_subtest("hotunbind-lateclose")
-		hotunbind_lateclose();
+		hotunbind_lateclose(&priv);
 
 	igt_fixture
 		igt_abort_on_f(failure, "%s\n", failure);
 
 	igt_describe("Check if a still open device can be cleanly unplugged, then released");
 	igt_subtest("hotunplug-lateclose")
-		hotunplug_lateclose();
+		hotunplug_lateclose(&priv);
 
 	igt_fixture
 		igt_abort_on_f(failure, "%s\n", failure);
-- 
2.21.1

* [Intel-gfx] [PATCH i-g-t v5 07/21] tests/core_hotunplug: Pass errors via a data structure field
  2020-08-28  7:59 [Intel-gfx] [PATCH i-g-t v5 00/21] tests/core_hotunplug: Fixes and enhancements Janusz Krzysztofik
                   ` (5 preceding siblings ...)
  2020-08-28  7:59 ` [Intel-gfx] [PATCH i-g-t v5 06/21] tests/core_hotunplug: Maintain a single data structure instance Janusz Krzysztofik
@ 2020-08-28  7:59 ` Janusz Krzysztofik
  2020-08-28  7:59 ` [Intel-gfx] [PATCH i-g-t v5 08/21] tests/core_hotunplug: Handle device close errors Janusz Krzysztofik
                   ` (14 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Janusz Krzysztofik @ 2020-08-28  7:59 UTC (permalink / raw)
  To: igt-dev; +Cc: Michał Winiarski, intel-gfx

A pointer to fatal error messages can be passed around via hotunplug
structure, no need to declare it as global.

v2: Rebase only.
v3: Refresh.

Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
---
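Editorial note, not part of the commit: a minimal sketch of carrying a
failure marker in a structure field instead of a global variable.
struct state, run_step(), step_ok() and step_bad() are hypothetical.
One deliberate difference from the patch: there the marker stays set
only when control never returns from the guarded operation (for
example on a timeout), while this sketch models that case with a
non-zero return code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reduced version of the test's state structure. */
struct state {
	const char *failure;	/* non-NULL while a critical step is pending */
};

/*
 * Run one critical step.  The marker is set before the step and
 * cleared only if the step reports success, so a caller inspecting
 * the structure afterwards can tell which step did not complete.
 */
static int run_step(struct state *st, const char *what, int (*step)(void))
{
	int ret;

	assert(st && what && step);

	st->failure = what;
	ret = step();
	if (ret == 0)
		st->failure = NULL;

	return ret;
}

/* Hypothetical steps used to exercise both outcomes. */
static int step_ok(void)  { return 0; }
static int step_bad(void) { return -1; }
```

After run_step(&st, "Driver unbind timeout!", step_bad) the field
st.failure still points at the message, which a later fixture-like
section can test to decide whether to abort.  Keeping the marker in
the structure means every helper that receives the structure can set
or inspect it without relying on file scope state.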
 tests/core_hotunplug.c | 96 +++++++++++++++++++++---------------------
 1 file changed, 47 insertions(+), 49 deletions(-)

diff --git a/tests/core_hotunplug.c b/tests/core_hotunplug.c
index 95d326ee9..4f7e89c95 100644
--- a/tests/core_hotunplug.c
+++ b/tests/core_hotunplug.c
@@ -45,6 +45,7 @@ struct hotunplug {
 		int sysfs_drv;
 	} fd;
 	const char *dev_bus_addr;
+	const char *failure;
 };
 
 /* Helpers */
@@ -102,80 +103,77 @@ static void prepare(struct hotunplug *priv, char *buf, int buflen)
 	}
 }
 
-static const char *failure;
-
 /* Unbind the driver from the device */
-static void driver_unbind(int fd_sysfs_drv, const char *dev_bus_addr,
-			  const char *prefix)
+static void driver_unbind(struct hotunplug *priv, const char *prefix)
 {
 	igt_debug("%sunbinding the driver from the device\n", prefix);
 
-	failure = "Driver unbind timeout!";
-	igt_set_timeout(60, failure);
-	igt_sysfs_set(fd_sysfs_drv, "unbind", dev_bus_addr);
+	priv->failure = "Driver unbind timeout!";
+	igt_set_timeout(60, priv->failure);
+	igt_sysfs_set(priv->fd.sysfs_drv, "unbind", priv->dev_bus_addr);
 	igt_reset_timeout();
-	failure = NULL;
+	priv->failure = NULL;
 
-	/* don't close fd_sysfs_drv, it will be used for driver rebinding */
+	/* don't close fd.sysfs_drv, it will be used for driver rebinding */
 }
 
 /* Re-bind the driver to the device */
-static void driver_bind(int fd_sysfs_drv, const char *dev_bus_addr)
+static void driver_bind(struct hotunplug *priv)
 {
 	igt_debug("rebinding the driver to the device\n");
 
-	failure = "Driver re-bind timeout!";
-	igt_set_timeout(60, failure);
-	igt_sysfs_set(fd_sysfs_drv, "bind", dev_bus_addr);
+	priv->failure = "Driver re-bind timeout!";
+	igt_set_timeout(60, priv->failure);
+	igt_sysfs_set(priv->fd.sysfs_drv, "bind", priv->dev_bus_addr);
 	igt_reset_timeout();
-	failure = NULL;
+	priv->failure = NULL;
 
-	close(fd_sysfs_drv);
+	close(priv->fd.sysfs_drv);
 }
 
 /* Remove (virtually unplug) the device from its bus */
-static void device_unplug(int fd_sysfs_dev, const char *prefix)
+static void device_unplug(struct hotunplug *priv, const char *prefix)
 {
 	igt_debug("%sunplugging the device\n", prefix);
 
-	failure = "Device unplug timeout!";
-	igt_set_timeout(60, failure);
-	igt_sysfs_set(fd_sysfs_dev, "device/remove", "1");
+	priv->failure = "Device unplug timeout!";
+	igt_set_timeout(60, priv->failure);
+	igt_sysfs_set(priv->fd.sysfs_dev, "device/remove", "1");
 	igt_reset_timeout();
-	failure = NULL;
+	priv->failure = NULL;
 
-	close(fd_sysfs_dev);
+	close(priv->fd.sysfs_dev);
 }
 
 /* Re-discover the device by rescanning its bus */
-static void bus_rescan(int fd_sysfs_bus)
+static void bus_rescan(struct hotunplug *priv)
 {
 	igt_debug("rediscovering the device\n");
 
-	failure = "Bus rescan timeout!";
-	igt_set_timeout(60, failure);
-	igt_sysfs_set(fd_sysfs_bus, "rescan", "1");
+	priv->failure = "Bus rescan timeout!";
+	igt_set_timeout(60, priv->failure);
+	igt_sysfs_set(priv->fd.sysfs_bus, "rescan", "1");
 	igt_reset_timeout();
-	failure = NULL;
+	priv->failure = NULL;
 
-	close(fd_sysfs_bus);
+	close(priv->fd.sysfs_bus);
 }
 
-static void healthcheck(void)
+static void healthcheck(struct hotunplug *priv)
 {
 	int fd_drm;
 
 	/* device name may have changed, rebuild IGT device list */
 	igt_devices_scan(true);
 
-	failure = "Device reopen failure!";
+	priv->failure = "Device reopen failure!";
 	fd_drm = local_drm_open_driver("re", " for healthcheck");
-	failure = NULL;
+	priv->failure = NULL;
 
 	if (is_i915_device(fd_drm)) {
-		failure = "GEM failure";
+		priv->failure = "GEM failure";
 		igt_require_gem(fd_drm);
-		failure = NULL;
+		priv->failure = NULL;
 	}
 
 	close(fd_drm);
@@ -207,11 +205,11 @@ static void unbind_rebind(struct hotunplug *priv)
 	igt_debug("closing the device\n");
 	close(priv->fd.drm);
 
-	driver_unbind(priv->fd.sysfs_drv, priv->dev_bus_addr, "");
+	driver_unbind(priv, "");
 
-	driver_bind(priv->fd.sysfs_drv, priv->dev_bus_addr);
+	driver_bind(priv);
 
-	healthcheck();
+	healthcheck(priv);
 }
 
 static void unplug_rescan(struct hotunplug *priv)
@@ -221,11 +219,11 @@ static void unplug_rescan(struct hotunplug *priv)
 	igt_debug("closing the device\n");
 	close(priv->fd.drm);
 
-	device_unplug(priv->fd.sysfs_dev, "");
+	device_unplug(priv, "");
 
-	bus_rescan(priv->fd.sysfs_bus);
+	bus_rescan(priv);
 
-	healthcheck();
+	healthcheck(priv);
 }
 
 static void hotunbind_lateclose(struct hotunplug *priv)
@@ -234,35 +232,35 @@ static void hotunbind_lateclose(struct hotunplug *priv)
 
 	prepare(priv, buf, sizeof(buf));
 
-	driver_unbind(priv->fd.sysfs_drv, priv->dev_bus_addr, "hot ");
+	driver_unbind(priv, "hot ");
 
-	driver_bind(priv->fd.sysfs_drv, priv->dev_bus_addr);
+	driver_bind(priv);
 
 	igt_debug("late closing the unbound device instance\n");
 	close(priv->fd.drm);
 
-	healthcheck();
+	healthcheck(priv);
 }
 
 static void hotunplug_lateclose(struct hotunplug *priv)
 {
 	prepare(priv, NULL, 0);
 
-	device_unplug(priv->fd.sysfs_dev, "hot ");
+	device_unplug(priv, "hot ");
 
-	bus_rescan(priv->fd.sysfs_bus);
+	bus_rescan(priv);
 
 	igt_debug("late closing the removed device instance\n");
 	close(priv->fd.drm);
 
-	healthcheck();
+	healthcheck(priv);
 }
 
 /* Main */
 
 igt_main
 {
-	struct hotunplug priv;
+	struct hotunplug priv = { .failure = NULL, };
 
 	igt_fixture {
 		int fd_drm;
@@ -286,26 +284,26 @@ igt_main
 		unbind_rebind(&priv);
 
 	igt_fixture
-		igt_abort_on_f(failure, "%s\n", failure);
+		igt_abort_on_f(priv.failure, "%s\n", priv.failure);
 
 	igt_describe("Check if a device believed to be closed can be cleanly unplugged");
 	igt_subtest("unplug-rescan")
 		unplug_rescan(&priv);
 
 	igt_fixture
-		igt_abort_on_f(failure, "%s\n", failure);
+		igt_abort_on_f(priv.failure, "%s\n", priv.failure);
 
 	igt_describe("Check if the driver can be cleanly unbound from a still open device, then released");
 	igt_subtest("hotunbind-lateclose")
 		hotunbind_lateclose(&priv);
 
 	igt_fixture
-		igt_abort_on_f(failure, "%s\n", failure);
+		igt_abort_on_f(priv.failure, "%s\n", priv.failure);
 
 	igt_describe("Check if a still open device can be cleanly unplugged, then released");
 	igt_subtest("hotunplug-lateclose")
 		hotunplug_lateclose(&priv);
 
 	igt_fixture
-		igt_abort_on_f(failure, "%s\n", failure);
+		igt_abort_on_f(priv.failure, "%s\n", priv.failure);
 }
-- 
2.21.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

* [Intel-gfx] [PATCH i-g-t v5 08/21] tests/core_hotunplug: Handle device close errors
  2020-08-28  7:59 [Intel-gfx] [PATCH i-g-t v5 00/21] tests/core_hotunplug: Fixes and enhancements Janusz Krzysztofik
                   ` (6 preceding siblings ...)
  2020-08-28  7:59 ` [Intel-gfx] [PATCH i-g-t v5 07/21] tests/core_hotunplug: Pass errors via a data structure field Janusz Krzysztofik
@ 2020-08-28  7:59 ` Janusz Krzysztofik
  2020-08-28  7:59 ` [Intel-gfx] [PATCH i-g-t v5 09/21] tests/core_hotunplug: Prepare invariant data once per test run Janusz Krzysztofik
                   ` (13 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Janusz Krzysztofik @ 2020-08-28  7:59 UTC (permalink / raw)
  To: igt-dev; +Cc: Michał Winiarski, intel-gfx

The test now ignores device close errors.  Those errors are believed to
have no influence on device health so there is no need to process them
the same way as we mostly do on errors, i.e., notify CI about a problem
via igt_abort.  However, those errors may indicate issues with the test
itself.  Moreover, impact of those errors on operations performed by
subtests, like driver unbind or device remove, should be perceived as
undefined.  Then, we should fail as soon as a device or device sysfs
node close error occurs in a subtest and also skip subsequent subtests.
However, once a driver unbind or device unplug operation has been
attempted by a subtest, we would still like to check the device health.

When in a subtest, store results of device close operations for future
reference.  Reuse file descriptor fields of the hotunplug structure for
that.  Fail the current test section right after a device close error
occurs, unless we are between a driver remove or device unplug
operation and a successful completion of a device health check, in
which case only warn.  If still running, examine device file descriptor
fields in subsequent igt_fixture sections and skip on errors.
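
The reuse of a file descriptor field as a status slot can be sketched
outside of IGT as follows.  This is only an illustration, not the code
of the patch: close_and_encode() and fd_close_failed() are hypothetical
names, and the sketch assumes that close() never fails with errno 1
(EPERM), so a failure code can never be mistaken for -1.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Close fd and fold the result into a value that can be stored back
 * in the field which used to hold the descriptor:
 *   -1    closed cleanly,
 *   < -1  close() failed, the value is -errno.
 */
static int close_and_encode(int fd)
{
	errno = 0;
	if (close(fd))
		return -errno;	/* assumed never to be -1 */

	return -1;	/* success, report 'closed' */
}

/* Tell a stored close error apart from 'closed' and from a valid fd. */
static int fd_close_failed(int fd_state)
{
	return fd_state < -1;
}
```

A caller would write something like
"state = close_and_encode(state);" and later compare the field with -1
to decide whether to proceed, fail or skip.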

v2: Fix a typo in post_healthcheck function name.
v3: Don't fail on close error after successful health check, warn only,
  - move duplicated messages to helpers.
v4: On start of each subtest assert device file descriptors closed
    cleanly.

Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> # v1
---
 tests/core_hotunplug.c | 68 +++++++++++++++++++++++++++++++++---------
 1 file changed, 54 insertions(+), 14 deletions(-)

diff --git a/tests/core_hotunplug.c b/tests/core_hotunplug.c
index 4f7e89c95..2884c3f77 100644
--- a/tests/core_hotunplug.c
+++ b/tests/core_hotunplug.c
@@ -43,7 +43,7 @@ struct hotunplug {
 		int sysfs_dev;
 		int sysfs_bus;
 		int sysfs_drv;
-	} fd;
+	} fd;	/* >= 0: valid fd, == -1: closed, < -1: close failed */
 	const char *dev_bus_addr;
 	const char *failure;
 };
@@ -67,6 +67,25 @@ static int local_drm_open_driver(const char *prefix, const char *suffix)
 	return fd_drm;
 }
 
+static int local_close(int fd, const char *message)
+{
+	errno = 0;
+	if (igt_warn_on_f(close(fd), "%s\n", message))
+		return -errno;	/* (never -1) */
+
+	return -1;	/* success - return 'closed' */
+}
+
+static int close_device(int fd_drm)
+{
+	return local_close(fd_drm, "Device close failed");
+}
+
+static int close_sysfs(int fd_sysfs_dev)
+{
+	return local_close(fd_sysfs_dev, "Device sysfs node close failed");
+}
+
 static void prepare_for_unbind(struct hotunplug *priv, char *buf, int buflen)
 {
 	int len;
@@ -83,11 +102,16 @@ static void prepare_for_unbind(struct hotunplug *priv, char *buf, int buflen)
 	igt_assert(priv->dev_bus_addr++);
 
 	/* sysfs_dev no longer needed */
-	close(priv->fd.sysfs_dev);
+	priv->fd.sysfs_dev = close_sysfs(priv->fd.sysfs_dev);
+	igt_assert_eq(priv->fd.sysfs_dev, -1);
 }
 
 static void prepare(struct hotunplug *priv, char *buf, int buflen)
 {
+	/* assert device file descriptors closed cleanly on subtest start */
+	igt_assert_eq(priv->fd.drm, -1);
+	igt_assert_eq(priv->fd.sysfs_dev, -1);
+
 	priv->fd.drm = local_drm_open_driver("", " for subtest");
 
 	priv->fd.sysfs_dev = igt_sysfs_open(priv->fd.drm);
@@ -142,7 +166,7 @@ static void device_unplug(struct hotunplug *priv, const char *prefix)
 	igt_reset_timeout();
 	priv->failure = NULL;
 
-	close(priv->fd.sysfs_dev);
+	priv->fd.sysfs_dev = close_sysfs(priv->fd.sysfs_dev);
 }
 
 /* Re-discover the device by rescanning its bus */
@@ -161,6 +185,7 @@ static void bus_rescan(struct hotunplug *priv)
 
 static void healthcheck(struct hotunplug *priv)
 {
+	/* preserve error code potentially stored before in priv->fd.drm */
 	int fd_drm;
 
 	/* device name may have changed, rebuild IGT device list */
@@ -176,7 +201,17 @@ static void healthcheck(struct hotunplug *priv)
 		priv->failure = NULL;
 	}
 
-	close(fd_drm);
+	fd_drm = close_device(fd_drm);
+	if (priv->fd.drm == -1)	/* store result if no error code to preserve */
+		priv->fd.drm = fd_drm;
+}
+
+static void post_healthcheck(struct hotunplug *priv)
+{
+	igt_abort_on_f(priv->failure, "%s\n", priv->failure);
+
+	igt_require(priv->fd.drm == -1);
+	igt_require(priv->fd.sysfs_dev == -1);
 }
 
 static void set_filter_from_device(int fd)
@@ -203,7 +238,8 @@ static void unbind_rebind(struct hotunplug *priv)
 	prepare(priv, buf, sizeof(buf));
 
 	igt_debug("closing the device\n");
-	close(priv->fd.drm);
+	priv->fd.drm = close_device(priv->fd.drm);
+	igt_assert_eq(priv->fd.drm, -1);
 
 	driver_unbind(priv, "");
 
@@ -217,7 +253,8 @@ static void unplug_rescan(struct hotunplug *priv)
 	prepare(priv, NULL, 0);
 
 	igt_debug("closing the device\n");
-	close(priv->fd.drm);
+	priv->fd.drm = close_device(priv->fd.drm);
+	igt_assert_eq(priv->fd.drm, -1);
 
 	device_unplug(priv, "");
 
@@ -237,7 +274,7 @@ static void hotunbind_lateclose(struct hotunplug *priv)
 	driver_bind(priv);
 
 	igt_debug("late closing the unbound device instance\n");
-	close(priv->fd.drm);
+	priv->fd.drm = close_device(priv->fd.drm);
 
 	healthcheck(priv);
 }
@@ -251,7 +288,7 @@ static void hotunplug_lateclose(struct hotunplug *priv)
 	bus_rescan(priv);
 
 	igt_debug("late closing the removed device instance\n");
-	close(priv->fd.drm);
+	priv->fd.drm = close_device(priv->fd.drm);
 
 	healthcheck(priv);
 }
@@ -260,7 +297,10 @@ static void hotunplug_lateclose(struct hotunplug *priv)
 
 igt_main
 {
-	struct hotunplug priv = { .failure = NULL, };
+	struct hotunplug priv = {
+		.fd		= { .drm = -1, .sysfs_dev = -1, },
+		.failure	= NULL,
+	};
 
 	igt_fixture {
 		int fd_drm;
@@ -276,7 +316,7 @@ igt_main
 		/* Make sure subtests always reopen the same device */
 		set_filter_from_device(fd_drm);
 
-		close(fd_drm);
+		igt_assert_eq(close_device(fd_drm), -1);
 	}
 
 	igt_describe("Check if the driver can be cleanly unbound from a device believed to be closed");
@@ -284,26 +324,26 @@ igt_main
 		unbind_rebind(&priv);
 
 	igt_fixture
-		igt_abort_on_f(priv.failure, "%s\n", priv.failure);
+		post_healthcheck(&priv);
 
 	igt_describe("Check if a device believed to be closed can be cleanly unplugged");
 	igt_subtest("unplug-rescan")
 		unplug_rescan(&priv);
 
 	igt_fixture
-		igt_abort_on_f(priv.failure, "%s\n", priv.failure);
+		post_healthcheck(&priv);
 
 	igt_describe("Check if the driver can be cleanly unbound from a still open device, then released");
 	igt_subtest("hotunbind-lateclose")
 		hotunbind_lateclose(&priv);
 
 	igt_fixture
-		igt_abort_on_f(priv.failure, "%s\n", priv.failure);
+		post_healthcheck(&priv);
 
 	igt_describe("Check if a still open device can be cleanly unplugged, then released");
 	igt_subtest("hotunplug-lateclose")
 		hotunplug_lateclose(&priv);
 
 	igt_fixture
-		igt_abort_on_f(priv.failure, "%s\n", priv.failure);
+		post_healthcheck(&priv);
 }
-- 
2.21.1

* [Intel-gfx] [PATCH i-g-t v5 09/21] tests/core_hotunplug: Prepare invariant data once per test run
  2020-08-28  7:59 [Intel-gfx] [PATCH i-g-t v5 00/21] tests/core_hotunplug: Fixes and enhancements Janusz Krzysztofik
                   ` (7 preceding siblings ...)
  2020-08-28  7:59 ` [Intel-gfx] [PATCH i-g-t v5 08/21] tests/core_hotunplug: Handle device close errors Janusz Krzysztofik
@ 2020-08-28  7:59 ` Janusz Krzysztofik
  2020-08-28  7:59 ` [Intel-gfx] [PATCH i-g-t v5 10/21] tests/core_hotunplug: Skip selectively on sysfs close errors Janusz Krzysztofik
                   ` (12 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Janusz Krzysztofik @ 2020-08-28  7:59 UTC (permalink / raw)
  To: igt-dev; +Cc: Michał Winiarski, intel-gfx

Each subtest now calls a prepare() helper which opens a couple of files
required by that subtest.  Those files are then closed after use,
either directly from the subtest body, or indirectly from inside one of
helper functions called during the subtest execution.  That approach
not only makes the life cycle of individual file descriptors difficult
to follow but also prevents us from re-running health checks on subtest
failures from follow-up igt_fixture sections since we may need to retry
bus rescan or driver rebind operations.

Two of those files - device bus and driver sysfs nodes - are neither
affected by nor interfere with driver unbind / device unplug operations
performed by subtests.  Then, there is not much sense in closing and
reopening those nodes.  Open them once at the beginning of a test run,
then close them as late as on test completion.

The prepare() helper also populates a device bus address string used by
driver unbind / rebind operations.  Since the bus address of an
exercised device never changes, also prepare that string only once at
the beginning of a test run.  Note that it is the same as the last
component of a device filter string which is already resolved and
installed from an initial igt_fixture section of the test.  Then,
initialize the device bus address field of a hotunplug structure
instance with a pointer to the respective substring of that filter
rather than resolving it again from the device sysfs node pathname.
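
For illustration only, the two substrings taken from the filter can be
extracted as sketched below.  The helper names are hypothetical, and
the sketch assumes a filter of the form "sys:<sysfs path>" whose last
path component is the device bus address, as the description above
implies.

```c
#include <stddef.h>
#include <string.h>

/* Return the text after the last '/' in filter, or NULL if none. */
static const char *bus_addr_from_filter(const char *filter)
{
	const char *p = strrchr(filter, '/');

	return p ? p + 1 : NULL;
}

/* Return the text after the first ':' in filter, or NULL if none. */
static const char *sysfs_path_from_filter(const char *filter)
{
	const char *p = strchr(filter, ':');

	return p ? p + 1 : NULL;
}
```

Both helpers return pointers into the filter string itself, so the
results stay valid only as long as the filter string does.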

There is one more sysfs node - a DRM device node - now opened by the
prepare() helper for subtests which perform device remove operations.
That node can't be opened only once at the beginning of a test run
because its open file descriptor is no longer usable as soon as a
driver unbind operation is performed.  On the other hand, it can't be
opened easily from inside the device_unplug() helper since some
subtests just don't open the device so its file descriptor used by
igt_sysfs_open() may just not be available.  However, note that only a
PCI sysfs node of the device, not necessarily the DRM one, is actually
required for a successful device remove operation, and that node can be
opened easily from a bus file descriptor using a device bus address
string, both already available.  Then, change the semantics of a
.fd.sysfs_dev field of the hotunplug structure from DRM to PCI device
sysfs file descriptor, then let the device_unplug() helper open the
device PCI node by itself and store its file descriptor in that field.
Also, for even easier access to the device PCI node, use a
'subsystem/devices' sub-node of the PCI device as its bus sysfs
location instead of just 'subsystem', then adjust a relative path to
the bus 'rescan' function accordingly.
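
A minimal sketch of addressing nodes relative to an open directory file
descriptor, using plain POSIX calls instead of the IGT sysfs helpers.
The function names are hypothetical; with a 'subsystem/devices'
descriptor the device node would be opened by its bus address and the
bus attribute reached as "../rescan".

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Open a child directory "name" of an already open directory fd. */
static int open_child_dir(int fd_parent, const char *name)
{
	return openat(fd_parent, name, O_RDONLY | O_DIRECTORY);
}

/*
 * Write a string to a file addressed relative to a directory fd.
 * Returns 0 if the whole string was written, -1 otherwise.
 */
static int write_attr_at(int fd_dir, const char *relpath, const char *value)
{
	size_t len = strlen(value);
	ssize_t written;
	int fd;

	fd = openat(fd_dir, relpath, O_WRONLY);
	if (fd < 0)
		return -1;

	written = write(fd, value, len);
	close(fd);

	return written == (ssize_t)len ? 0 : -1;
}
```

Unlike igt_sysfs_set(), this sketch applies no timeout and does not
retry on short writes.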

A side benefit of using the PCI device sysfs node, not the DRM one,
while removing the device is that a future subtest may now easily
perform both driver unbind and device remove operations in a row.

v2: Rebase only.
v3: Refresh.
v4: Still assert a device file descriptor closed cleanly on subtest
    start, a device sysfs file descriptor still before open.

Suggested-by: Michał Winiarski <michal.winiarski@intel.com>
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> # v1
---
 tests/core_hotunplug.c | 85 ++++++++++++++++--------------------------
 1 file changed, 33 insertions(+), 52 deletions(-)

diff --git a/tests/core_hotunplug.c b/tests/core_hotunplug.c
index 2884c3f77..1da0e5a9f 100644
--- a/tests/core_hotunplug.c
+++ b/tests/core_hotunplug.c
@@ -86,45 +86,31 @@ static int close_sysfs(int fd_sysfs_dev)
 	return local_close(fd_sysfs_dev, "Device sysfs node close failed");
 }
 
-static void prepare_for_unbind(struct hotunplug *priv, char *buf, int buflen)
+static void prepare(struct hotunplug *priv)
 {
-	int len;
+	const char *filter = igt_device_filter_get(0), *sysfs_path;
 
-	igt_assert(buflen);
+	igt_assert(filter);
 
-	priv->fd.sysfs_drv = openat(priv->fd.sysfs_dev, "device/driver",
-				    O_DIRECTORY);
-	igt_assert_fd(priv->fd.sysfs_drv);
-
-	len = readlinkat(priv->fd.sysfs_dev, "device", buf, buflen - 1);
-	buf[len] = '\0';
-	priv->dev_bus_addr = strrchr(buf, '/');
+	priv->dev_bus_addr = strrchr(filter, '/');
 	igt_assert(priv->dev_bus_addr++);
 
-	/* sysfs_dev no longer needed */
-	priv->fd.sysfs_dev = close_sysfs(priv->fd.sysfs_dev);
-	igt_assert_eq(priv->fd.sysfs_dev, -1);
-}
+	sysfs_path = strchr(filter, ':');
+	igt_assert(sysfs_path++);
 
-static void prepare(struct hotunplug *priv, char *buf, int buflen)
-{
-	/* assert device file descriptors closed cleanly on subtest start */
-	igt_assert_eq(priv->fd.drm, -1);
 	igt_assert_eq(priv->fd.sysfs_dev, -1);
+	priv->fd.sysfs_dev = open(sysfs_path, O_DIRECTORY);
+	igt_assert_fd(priv->fd.sysfs_dev);
 
-	priv->fd.drm = local_drm_open_driver("", " for subtest");
+	priv->fd.sysfs_drv = openat(priv->fd.sysfs_dev, "driver", O_DIRECTORY);
+	igt_assert_fd(priv->fd.sysfs_drv);
 
-	priv->fd.sysfs_dev = igt_sysfs_open(priv->fd.drm);
-	igt_assert_fd(priv->fd.sysfs_dev);
+	priv->fd.sysfs_bus = openat(priv->fd.sysfs_dev, "subsystem/devices",
+				    O_DIRECTORY);
+	igt_assert_fd(priv->fd.sysfs_bus);
 
-	if (buf) {
-		prepare_for_unbind(priv, buf, buflen);
-	} else {
-		/* prepare for bus rescan */
-		priv->fd.sysfs_bus = openat(priv->fd.sysfs_dev,
-					    "device/subsystem", O_DIRECTORY);
-		igt_assert_fd(priv->fd.sysfs_bus);
-	}
+	priv->fd.sysfs_dev = close_sysfs(priv->fd.sysfs_dev);
+	igt_assert_eq(priv->fd.sysfs_dev, -1);
 }
 
 /* Unbind the driver from the device */
@@ -137,8 +123,6 @@ static void driver_unbind(struct hotunplug *priv, const char *prefix)
 	igt_sysfs_set(priv->fd.sysfs_drv, "unbind", priv->dev_bus_addr);
 	igt_reset_timeout();
 	priv->failure = NULL;
-
-	/* don't close fd.sysfs_drv, it will be used for driver rebinding */
 }
 
 /* Re-bind the driver to the device */
@@ -151,18 +135,21 @@ static void driver_bind(struct hotunplug *priv)
 	igt_sysfs_set(priv->fd.sysfs_drv, "bind", priv->dev_bus_addr);
 	igt_reset_timeout();
 	priv->failure = NULL;
-
-	close(priv->fd.sysfs_drv);
 }
 
 /* Remove (virtually unplug) the device from its bus */
 static void device_unplug(struct hotunplug *priv, const char *prefix)
 {
+	igt_assert_eq(priv->fd.sysfs_dev, -1);
+	priv->fd.sysfs_dev = openat(priv->fd.sysfs_bus, priv->dev_bus_addr,
+				    O_DIRECTORY);
+	igt_assert_fd(priv->fd.sysfs_dev);
+
 	igt_debug("%sunplugging the device\n", prefix);
 
 	priv->failure = "Device unplug timeout!";
 	igt_set_timeout(60, priv->failure);
-	igt_sysfs_set(priv->fd.sysfs_dev, "device/remove", "1");
+	igt_sysfs_set(priv->fd.sysfs_dev, "remove", "1");
 	igt_reset_timeout();
 	priv->failure = NULL;
 
@@ -176,11 +163,9 @@ static void bus_rescan(struct hotunplug *priv)
 
 	priv->failure = "Bus rescan timeout!";
 	igt_set_timeout(60, priv->failure);
-	igt_sysfs_set(priv->fd.sysfs_bus, "rescan", "1");
+	igt_sysfs_set(priv->fd.sysfs_bus, "../rescan", "1");
 	igt_reset_timeout();
 	priv->failure = NULL;
-
-	close(priv->fd.sysfs_bus);
 }
 
 static void healthcheck(struct hotunplug *priv)
@@ -233,12 +218,6 @@ static void set_filter_from_device(int fd)
 
 static void unbind_rebind(struct hotunplug *priv)
 {
-	char buf[PATH_MAX];
-
-	prepare(priv, buf, sizeof(buf));
-
-	igt_debug("closing the device\n");
-	priv->fd.drm = close_device(priv->fd.drm);
 	igt_assert_eq(priv->fd.drm, -1);
 
 	driver_unbind(priv, "");
@@ -250,10 +229,6 @@ static void unbind_rebind(struct hotunplug *priv)
 
 static void unplug_rescan(struct hotunplug *priv)
 {
-	prepare(priv, NULL, 0);
-
-	igt_debug("closing the device\n");
-	priv->fd.drm = close_device(priv->fd.drm);
 	igt_assert_eq(priv->fd.drm, -1);
 
 	device_unplug(priv, "");
@@ -265,9 +240,8 @@ static void unplug_rescan(struct hotunplug *priv)
 
 static void hotunbind_lateclose(struct hotunplug *priv)
 {
-	char buf[PATH_MAX];
-
-	prepare(priv, buf, sizeof(buf));
+	igt_assert_eq(priv->fd.drm, -1);
+	priv->fd.drm = local_drm_open_driver("", " for hotunbind");
 
 	driver_unbind(priv, "hot ");
 
@@ -281,7 +255,8 @@ static void hotunbind_lateclose(struct hotunplug *priv)
 
 static void hotunplug_lateclose(struct hotunplug *priv)
 {
-	prepare(priv, NULL, 0);
+	igt_assert_eq(priv->fd.drm, -1);
+	priv->fd.drm = local_drm_open_driver("", " for hotunplug");
 
 	device_unplug(priv, "hot ");
 
@@ -317,6 +292,8 @@ igt_main
 		set_filter_from_device(fd_drm);
 
 		igt_assert_eq(close_device(fd_drm), -1);
+
+		prepare(&priv);
 	}
 
 	igt_describe("Check if the driver can be cleanly unbound from a device believed to be closed");
@@ -344,6 +321,10 @@ igt_main
 	igt_subtest("hotunplug-lateclose")
 		hotunplug_lateclose(&priv);
 
-	igt_fixture
+	igt_fixture {
 		post_healthcheck(&priv);
+
+		close(priv.fd.sysfs_bus);
+		close(priv.fd.sysfs_drv);
+	}
 }
-- 
2.21.1

* [Intel-gfx] [PATCH i-g-t v5 10/21] tests/core_hotunplug: Skip selectively on sysfs close errors
  2020-08-28  7:59 [Intel-gfx] [PATCH i-g-t v5 00/21] tests/core_hotunplug: Fixes and enhancements Janusz Krzysztofik
                   ` (8 preceding siblings ...)
  2020-08-28  7:59 ` [Intel-gfx] [PATCH i-g-t v5 09/21] tests/core_hotunplug: Prepare invariant data once per test run Janusz Krzysztofik
@ 2020-08-28  7:59 ` Janusz Krzysztofik
  2020-08-28  7:59 ` [Intel-gfx] [PATCH i-g-t v5 11/21] tests/core_hotunplug: Recover from subtest failures Janusz Krzysztofik
                   ` (11 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Janusz Krzysztofik @ 2020-08-28  7:59 UTC (permalink / raw)
  To: igt-dev; +Cc: Michał Winiarski, intel-gfx

Since we no longer open a device DRM sysfs node, only a PCI one, driver
unbind operations are no longer affected by missed or unsuccessful
sysfs file close attempts.  Skip only affected subtests if that
happens.

v2: Rebase only.
v3: Refresh.
v4: Refresh.

Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
---
 tests/core_hotunplug.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/tests/core_hotunplug.c b/tests/core_hotunplug.c
index 1da0e5a9f..25508db85 100644
--- a/tests/core_hotunplug.c
+++ b/tests/core_hotunplug.c
@@ -110,7 +110,6 @@ static void prepare(struct hotunplug *priv)
 	igt_assert_fd(priv->fd.sysfs_bus);
 
 	priv->fd.sysfs_dev = close_sysfs(priv->fd.sysfs_dev);
-	igt_assert_eq(priv->fd.sysfs_dev, -1);
 }
 
 /* Unbind the driver from the device */
@@ -140,7 +139,8 @@ static void driver_bind(struct hotunplug *priv)
 /* Remove (virtually unplug) the device from its bus */
 static void device_unplug(struct hotunplug *priv, const char *prefix)
 {
-	igt_assert_eq(priv->fd.sysfs_dev, -1);
+	igt_require(priv->fd.sysfs_dev == -1);
+
 	priv->fd.sysfs_dev = openat(priv->fd.sysfs_bus, priv->dev_bus_addr,
 				    O_DIRECTORY);
 	igt_assert_fd(priv->fd.sysfs_dev);
@@ -196,7 +196,6 @@ static void post_healthcheck(struct hotunplug *priv)
 	igt_abort_on_f(priv->failure, "%s\n", priv->failure);
 
 	igt_require(priv->fd.drm == -1);
-	igt_require(priv->fd.sysfs_dev == -1);
 }
 
 static void set_filter_from_device(int fd)
-- 
2.21.1

* [Intel-gfx] [PATCH i-g-t v5 11/21] tests/core_hotunplug: Recover from subtest failures
  2020-08-28  7:59 [Intel-gfx] [PATCH i-g-t v5 00/21] tests/core_hotunplug: Fixes and enhancements Janusz Krzysztofik
                   ` (9 preceding siblings ...)
  2020-08-28  7:59 ` [Intel-gfx] [PATCH i-g-t v5 10/21] tests/core_hotunplug: Skip selectively on sysfs close errors Janusz Krzysztofik
@ 2020-08-28  7:59 ` Janusz Krzysztofik
  2020-08-28  7:59 ` [Intel-gfx] [PATCH i-g-t v5 12/21] tests/core_hotunplug: Fail subtests on device close errors Janusz Krzysztofik
                   ` (10 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Janusz Krzysztofik @ 2020-08-28  7:59 UTC (permalink / raw)
  To: igt-dev; +Cc: Michał Winiarski, intel-gfx

Subtests now forcibly call or request igt_abort on failures in order to
avoid silently leaving an exercised device in an unusable state.
However, a failure inside a subtest doesn't always mean the device is
no longer working correctly and reboot is needed.  On the other hand,
if a subtest just fails without aborting, that doesn't mean in turn the
device is healthy.  We should still perform a device health check
in that case before deciding on next steps.

Reuse the 'failure' structure field as a mark.  Set it before each
critical operation which must be followed by a successful health check
in order to avoid aborting the test.  Then, follow each subtest with
its individual igt_fixture section, from where device file descriptors
potentially left open are closed, a device rediscover or driver rebind
operation is run as needed, and finally the health check is run again
if the preceding igt_subtest section has exited with the mark set.
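
The choice between device rediscovery and driver rebind can be sketched
as a pure decision function.  This is an illustration with hypothetical
names, assuming two open directory descriptors, one for the bus
'devices' node and one for the driver node, each of which contains an
entry named after the device bus address when the device is present or
bound, respectively.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

enum recovery_step {
	RECOVER_NONE,		/* device present and bound */
	RECOVER_BUS_RESCAN,	/* device missing from its bus */
	RECOVER_DRIVER_BIND,	/* device present but not bound */
};

static enum recovery_step pick_recovery_step(int fd_bus, int fd_drv,
					     const char *bus_addr)
{
	/* faccessat() returns non-zero if the entry does not exist */
	if (faccessat(fd_bus, bus_addr, F_OK, 0))
		return RECOVER_BUS_RESCAN;

	if (faccessat(fd_drv, bus_addr, F_OK, 0))
		return RECOVER_DRIVER_BIND;

	return RECOVER_NONE;
}
```

The bus check comes first because a device missing from its bus cannot
have an entry under the driver node either.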

v2: Start each recovery phase from unconditionally closing file
    descriptors potentially left open by a subtest before it entered
    its critical section,
  - replace igt_require() with 'if() return;' construct in recover() to
    reduce noise,
  - replace "subtest failure" message used as a request for healthcheck
    with a more appropriate "need healthcheck" for clarity,
  - rebase on current upstream master.
v3: Refresh,
  - move bus_rescan() and driver_bind() function calls back from
    healthcheck() to recover() so a pure health check can still be
    called from a subtest if essential,
  - move failure mark assignments back from subtests to helpers for
    more adequate abort reason reporting but clean the mark only on
    health check success,
  - call cleanup() also from post_healthcheck() in order to close a
    device file descriptor potentially left open by a failed health
    check,
  - reword commit message and update description.
v4: Close exercised device fd before failing a health check run,
  - don't drop health checks from subtest bodies, their results should
    always matter.
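
The life cycle of the failure mark can be sketched as follows.  This is
a simplified model, not the test code: all names are hypothetical, and
the result of the health check is passed in as a flag instead of being
obtained from a device.

```c
#include <stdbool.h>
#include <stddef.h>

struct health_state {
	const char *failure;	/* non-NULL: a health check is still owed */
};

/* Set the mark before a critical operation is attempted. */
static void enter_critical(struct health_state *s, const char *reason)
{
	s->failure = reason;
}

/* Clear the mark only if the health check has succeeded. */
static bool run_healthcheck(struct health_state *s, bool device_ok)
{
	if (device_ok)
		s->failure = NULL;
	else
		s->failure = "Health check failure!";

	return device_ok;
}

/* A mark still set after recovery means the test run should abort. */
static bool abort_requested(const struct health_state *s)
{
	return s->failure != NULL;
}
```

Since the mark is cleared in one place only, a subtest which fails at
any point after a critical operation leaves it set for the follow-up
fixture section to act upon.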

Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> # v1
---
 tests/core_hotunplug.c | 100 ++++++++++++++++++++++++++++++-----------
 1 file changed, 74 insertions(+), 26 deletions(-)

diff --git a/tests/core_hotunplug.c b/tests/core_hotunplug.c
index 25508db85..b72361900 100644
--- a/tests/core_hotunplug.c
+++ b/tests/core_hotunplug.c
@@ -69,6 +69,9 @@ static int local_drm_open_driver(const char *prefix, const char *suffix)
 
 static int local_close(int fd, const char *message)
 {
+	if (fd < 0)	/* not open - return current status */
+		return fd;
+
 	errno = 0;
 	if (igt_warn_on_f(close(fd), "%s\n", message))
 		return -errno;	/* (never -1) */
@@ -116,24 +119,22 @@ static void prepare(struct hotunplug *priv)
 static void driver_unbind(struct hotunplug *priv, const char *prefix)
 {
 	igt_debug("%sunbinding the driver from the device\n", prefix);
+	priv->failure = "Driver unbind failure!";
 
-	priv->failure = "Driver unbind timeout!";
-	igt_set_timeout(60, priv->failure);
+	igt_set_timeout(60, "Driver unbind timeout!");
 	igt_sysfs_set(priv->fd.sysfs_drv, "unbind", priv->dev_bus_addr);
 	igt_reset_timeout();
-	priv->failure = NULL;
 }
 
 /* Re-bind the driver to the device */
 static void driver_bind(struct hotunplug *priv)
 {
 	igt_debug("rebinding the driver to the device\n");
+	priv->failure = "Driver re-bind failure!";
 
-	priv->failure = "Driver re-bind timeout!";
-	igt_set_timeout(60, priv->failure);
+	igt_set_timeout(60, "Driver re-bind timeout!");
 	igt_sysfs_set(priv->fd.sysfs_drv, "bind", priv->dev_bus_addr);
 	igt_reset_timeout();
-	priv->failure = NULL;
 }
 
 /* Remove (virtually unplug) the device from its bus */
@@ -146,12 +147,11 @@ static void device_unplug(struct hotunplug *priv, const char *prefix)
 	igt_assert_fd(priv->fd.sysfs_dev);
 
 	igt_debug("%sunplugging the device\n", prefix);
+	priv->failure = "Device unplug failure!";
 
-	priv->failure = "Device unplug timeout!";
-	igt_set_timeout(60, priv->failure);
+	igt_set_timeout(60, "Device unplug timeout!");
 	igt_sysfs_set(priv->fd.sysfs_dev, "remove", "1");
 	igt_reset_timeout();
-	priv->failure = NULL;
 
 	priv->fd.sysfs_dev = close_sysfs(priv->fd.sysfs_dev);
 }
@@ -160,17 +160,23 @@ static void device_unplug(struct hotunplug *priv, const char *prefix)
 static void bus_rescan(struct hotunplug *priv)
 {
 	igt_debug("rediscovering the device\n");
+	priv->failure = "Bus rescan failure!";
 
-	priv->failure = "Bus rescan timeout!";
-	igt_set_timeout(60, priv->failure);
+	igt_set_timeout(60, "Bus rescan timeout!");
 	igt_sysfs_set(priv->fd.sysfs_bus, "../rescan", "1");
 	igt_reset_timeout();
-	priv->failure = NULL;
+}
+
+static void cleanup(struct hotunplug *priv)
+{
+	priv->fd.drm = close_device(priv->fd.drm);
+	priv->fd.sysfs_dev = close_sysfs(priv->fd.sysfs_dev);
 }
 
 static void healthcheck(struct hotunplug *priv)
 {
 	/* preserve error code potentially stored before in priv->fd.drm */
+	bool closed = priv->fd.drm == -1;
 	int fd_drm;
 
 	/* device name may have changed, rebuild IGT device list */
@@ -178,23 +184,45 @@ static void healthcheck(struct hotunplug *priv)
 
 	priv->failure = "Device reopen failure!";
 	fd_drm = local_drm_open_driver("re", " for healthcheck");
-	priv->failure = NULL;
+	if (closed)	/* store for cleanup if no error code to preserve */
+		priv->fd.drm = fd_drm;
 
 	if (is_i915_device(fd_drm)) {
 		priv->failure = "GEM failure";
 		igt_require_gem(fd_drm);
 		priv->failure = NULL;
+	} else {
+		/* no device specific healthcheck, rely on reopen result */
+		priv->failure = NULL;
 	}
 
 	fd_drm = close_device(fd_drm);
-	if (priv->fd.drm == -1)	/* store result if no error code to preserve */
+	if (closed)	/* store result if no error code to preserve */
 		priv->fd.drm = fd_drm;
+
+	/* not only request igt_abort on failure, also fail the health check */
+	igt_fail_on_f(priv->failure, "%s\n", priv->failure);
+}
+
+static void recover(struct hotunplug *priv)
+{
+	cleanup(priv);
+
+	if (faccessat(priv->fd.sysfs_bus, priv->dev_bus_addr, F_OK, 0))
+		bus_rescan(priv);
+
+	else if (faccessat(priv->fd.sysfs_drv, priv->dev_bus_addr, F_OK, 0))
+		driver_bind(priv);
+
+	if (priv->failure)
+		healthcheck(priv);
 }
 
 static void post_healthcheck(struct hotunplug *priv)
 {
 	igt_abort_on_f(priv->failure, "%s\n", priv->failure);
 
+	cleanup(priv);
 	igt_require(priv->fd.drm == -1);
 }
 
@@ -295,30 +323,50 @@ igt_main
 		prepare(&priv);
 	}
 
-	igt_describe("Check if the driver can be cleanly unbound from a device believed to be closed");
-	igt_subtest("unbind-rebind")
-		unbind_rebind(&priv);
+	igt_subtest_group {
+		igt_describe("Check if the driver can be cleanly unbound from a device believed to be closed");
+		igt_subtest("unbind-rebind")
+			unbind_rebind(&priv);
+
+		igt_fixture
+			recover(&priv);
+	}
 
 	igt_fixture
 		post_healthcheck(&priv);
 
-	igt_describe("Check if a device believed to be closed can be cleanly unplugged");
-	igt_subtest("unplug-rescan")
-		unplug_rescan(&priv);
+	igt_subtest_group {
+		igt_describe("Check if a device believed to be closed can be cleanly unplugged");
+		igt_subtest("unplug-rescan")
+			unplug_rescan(&priv);
+
+		igt_fixture
+			recover(&priv);
+	}
 
 	igt_fixture
 		post_healthcheck(&priv);
 
-	igt_describe("Check if the driver can be cleanly unbound from a still open device, then released");
-	igt_subtest("hotunbind-lateclose")
-		hotunbind_lateclose(&priv);
+	igt_subtest_group {
+		igt_describe("Check if the driver can be cleanly unbound from a still open device, then released");
+		igt_subtest("hotunbind-lateclose")
+			hotunbind_lateclose(&priv);
+
+		igt_fixture
+			recover(&priv);
+	}
 
 	igt_fixture
 		post_healthcheck(&priv);
 
-	igt_describe("Check if a still open device can be cleanly unplugged, then released");
-	igt_subtest("hotunplug-lateclose")
-		hotunplug_lateclose(&priv);
+	igt_subtest_group {
+		igt_describe("Check if a still open device can be cleanly unplugged, then released");
+		igt_subtest("hotunplug-lateclose")
+			hotunplug_lateclose(&priv);
+
+		igt_fixture
+			recover(&priv);
+	}
 
 	igt_fixture {
 		post_healthcheck(&priv);
-- 
2.21.1

* [Intel-gfx] [PATCH i-g-t v5 12/21] tests/core_hotunplug: Fail subtests on device close errors
  2020-08-28  7:59 [Intel-gfx] [PATCH i-g-t v5 00/21] tests/core_hotunplug: Fixes and enhancements Janusz Krzysztofik
                   ` (10 preceding siblings ...)
  2020-08-28  7:59 ` [Intel-gfx] [PATCH i-g-t v5 11/21] tests/core_hotunplug: Recover from subtest failures Janusz Krzysztofik
@ 2020-08-28  7:59 ` Janusz Krzysztofik
  2020-08-28  7:59 ` [Intel-gfx] [PATCH i-g-t v5 13/21] tests/core_hotunplug: Let the driver time out essential sysfs operations Janusz Krzysztofik
                   ` (9 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Janusz Krzysztofik @ 2020-08-28  7:59 UTC (permalink / raw)
  To: igt-dev; +Cc: Michał Winiarski, intel-gfx

Since health checks are now run from follow-up fixture sections, it is
safe to fail subtests without the need to abort the test execution.  Do
that on device close errors instead of just emitting warnings.

v2: Rebase only.
v3: Refresh.
v4: Refresh.

Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
---
 tests/core_hotunplug.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tests/core_hotunplug.c b/tests/core_hotunplug.c
index b72361900..dd1dc1fe0 100644
--- a/tests/core_hotunplug.c
+++ b/tests/core_hotunplug.c
@@ -154,6 +154,7 @@ static void device_unplug(struct hotunplug *priv, const char *prefix)
 	igt_reset_timeout();
 
 	priv->fd.sysfs_dev = close_sysfs(priv->fd.sysfs_dev);
+	igt_assert_eq(priv->fd.sysfs_dev, -1);
 }
 
 /* Re-discover the device by rescanning its bus */
@@ -276,6 +277,7 @@ static void hotunbind_lateclose(struct hotunplug *priv)
 
 	igt_debug("late closing the unbound device instance\n");
 	priv->fd.drm = close_device(priv->fd.drm);
+	igt_assert_eq(priv->fd.drm, -1);
 
 	healthcheck(priv);
 }
@@ -291,6 +293,7 @@ static void hotunplug_lateclose(struct hotunplug *priv)
 
 	igt_debug("late closing the removed device instance\n");
 	priv->fd.drm = close_device(priv->fd.drm);
+	igt_assert_eq(priv->fd.drm, -1);
 
 	healthcheck(priv);
 }
-- 
2.21.1


* [Intel-gfx] [PATCH i-g-t v5 13/21] tests/core_hotunplug: Let the driver time out essential sysfs operations
  2020-08-28  7:59 [Intel-gfx] [PATCH i-g-t v5 00/21] tests/core_hotunplug: Fixes and enhancements Janusz Krzysztofik
                   ` (11 preceding siblings ...)
  2020-08-28  7:59 ` [Intel-gfx] [PATCH i-g-t v5 12/21] tests/core_hotunplug: Fail subtests on device close errors Janusz Krzysztofik
@ 2020-08-28  7:59 ` Janusz Krzysztofik
  2020-08-28  7:59 ` [Intel-gfx] [PATCH i-g-t v5 14/21] tests/core_hotunplug: Process return values of " Janusz Krzysztofik
                   ` (8 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Janusz Krzysztofik @ 2020-08-28  7:59 UTC (permalink / raw)
  To: igt-dev; +Cc: Michał Winiarski, intel-gfx

The test currently arms a timer before performing each driver unbind /
rebind or device unplug / bus rescan sysfs operation.  If an operation
stalls, the timer may expire first and prevent the driver from showing
us if and how it can handle the issue.

Don't arm the timer before sysfs operations which are essential for a
subtest.
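
As a rough sketch of the idea, assuming a plain POSIX alarm() based
watchdog rather than IGT's own igt_set_timeout(), a timeout of 0 can be
used to mean "don't arm the timer".  run_with_timeout(), on_alarm() and
noop_ok() are hypothetical names used only for this illustration:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <unistd.h>

static volatile sig_atomic_t timed_out;

static void on_alarm(int sig)
{
	(void)sig;
	timed_out = 1;	/* only record the event */
}

/*
 * Hypothetical helper: run op() under an optional watchdog.
 * timeout == 0 arms no timer, so the operation decides for itself how
 * long it takes; timeout > 0 marks the operation as timed out if
 * SIGALRM fires before it returns.
 */
static int run_with_timeout(int (*op)(void), unsigned int timeout)
{
	struct sigaction sa;
	int ret;

	assert(op != NULL);

	sigemptyset(&sa.sa_mask);
	sa.sa_flags = 0;	/* no SA_RESTART: blocking calls may be interrupted */
	sa.sa_handler = on_alarm;
	sigaction(SIGALRM, &sa, NULL);

	timed_out = 0;
	alarm(timeout);		/* alarm(0) schedules nothing */
	ret = op();
	alarm(0);		/* cancel whatever is still pending */

	return timed_out ? -1 : ret;
}

/* Stand-in for a sysfs operation that completes immediately. */
static int noop_ok(void)
{
	return 0;
}
```

In this sketch the recovery path would pass 60 and the operation under
test would pass 0, matching the values used by the callers in the diff
below.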

v2: Refresh,
  - don't time out on hot driver rebind / hot device restore in
    *-lateclose variants, those operations haven't been covered by
    other subtests.

Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
---
 tests/core_hotunplug.c | 38 ++++++++++++++++++++------------------
 1 file changed, 20 insertions(+), 18 deletions(-)

diff --git a/tests/core_hotunplug.c b/tests/core_hotunplug.c
index dd1dc1fe0..1fdbd9b4c 100644
--- a/tests/core_hotunplug.c
+++ b/tests/core_hotunplug.c
@@ -116,29 +116,31 @@ static void prepare(struct hotunplug *priv)
 }
 
 /* Unbind the driver from the device */
-static void driver_unbind(struct hotunplug *priv, const char *prefix)
+static void driver_unbind(struct hotunplug *priv, const char *prefix,
+			  int timeout)
 {
 	igt_debug("%sunbinding the driver from the device\n", prefix);
 	priv->failure = "Driver unbind failure!";
 
-	igt_set_timeout(60, "Driver unbind timeout!");
+	igt_set_timeout(timeout, "Driver unbind timeout!");
 	igt_sysfs_set(priv->fd.sysfs_drv, "unbind", priv->dev_bus_addr);
 	igt_reset_timeout();
 }
 
 /* Re-bind the driver to the device */
-static void driver_bind(struct hotunplug *priv)
+static void driver_bind(struct hotunplug *priv, int timeout)
 {
 	igt_debug("rebinding the driver to the device\n");
 	priv->failure = "Driver re-bind failure!";
 
-	igt_set_timeout(60, "Driver re-bind timeout!");
+	igt_set_timeout(timeout, "Driver re-bind timeout!");
 	igt_sysfs_set(priv->fd.sysfs_drv, "bind", priv->dev_bus_addr);
 	igt_reset_timeout();
 }
 
 /* Remove (virtually unplug) the device from its bus */
-static void device_unplug(struct hotunplug *priv, const char *prefix)
+static void device_unplug(struct hotunplug *priv, const char *prefix,
+			  int timeout)
 {
 	igt_require(priv->fd.sysfs_dev == -1);
 
@@ -149,7 +151,7 @@ static void device_unplug(struct hotunplug *priv, const char *prefix)
 	igt_debug("%sunplugging the device\n", prefix);
 	priv->failure = "Device unplug failure!";
 
-	igt_set_timeout(60, "Device unplug timeout!");
+	igt_set_timeout(timeout, "Device unplug timeout!");
 	igt_sysfs_set(priv->fd.sysfs_dev, "remove", "1");
 	igt_reset_timeout();
 
@@ -158,12 +160,12 @@ static void device_unplug(struct hotunplug *priv, const char *prefix)
 }
 
 /* Re-discover the device by rescanning its bus */
-static void bus_rescan(struct hotunplug *priv)
+static void bus_rescan(struct hotunplug *priv, int timeout)
 {
 	igt_debug("rediscovering the device\n");
 	priv->failure = "Bus rescan failure!";
 
-	igt_set_timeout(60, "Bus rescan timeout!");
+	igt_set_timeout(timeout, "Bus rescan timeout!");
 	igt_sysfs_set(priv->fd.sysfs_bus, "../rescan", "1");
 	igt_reset_timeout();
 }
@@ -210,10 +212,10 @@ static void recover(struct hotunplug *priv)
 	cleanup(priv);
 
 	if (faccessat(priv->fd.sysfs_bus, priv->dev_bus_addr, F_OK, 0))
-		bus_rescan(priv);
+		bus_rescan(priv, 60);
 
 	else if (faccessat(priv->fd.sysfs_drv, priv->dev_bus_addr, F_OK, 0))
-		driver_bind(priv);
+		driver_bind(priv, 60);
 
 	if (priv->failure)
 		healthcheck(priv);
@@ -248,9 +250,9 @@ static void unbind_rebind(struct hotunplug *priv)
 {
 	igt_assert_eq(priv->fd.drm, -1);
 
-	driver_unbind(priv, "");
+	driver_unbind(priv, "", 0);
 
-	driver_bind(priv);
+	driver_bind(priv, 0);
 
 	healthcheck(priv);
 }
@@ -259,9 +261,9 @@ static void unplug_rescan(struct hotunplug *priv)
 {
 	igt_assert_eq(priv->fd.drm, -1);
 
-	device_unplug(priv, "");
+	device_unplug(priv, "", 0);
 
-	bus_rescan(priv);
+	bus_rescan(priv, 0);
 
 	healthcheck(priv);
 }
@@ -271,9 +273,9 @@ static void hotunbind_lateclose(struct hotunplug *priv)
 	igt_assert_eq(priv->fd.drm, -1);
 	priv->fd.drm = local_drm_open_driver("", " for hotunbind");
 
-	driver_unbind(priv, "hot ");
+	driver_unbind(priv, "hot ", 0);
 
-	driver_bind(priv);
+	driver_bind(priv, 0);
 
 	igt_debug("late closing the unbound device instance\n");
 	priv->fd.drm = close_device(priv->fd.drm);
@@ -287,9 +289,9 @@ static void hotunplug_lateclose(struct hotunplug *priv)
 	igt_assert_eq(priv->fd.drm, -1);
 	priv->fd.drm = local_drm_open_driver("", " for hotunplug");
 
-	device_unplug(priv, "hot ");
+	device_unplug(priv, "hot ", 0);
 
-	bus_rescan(priv);
+	bus_rescan(priv, 0);
 
 	igt_debug("late closing the removed device instance\n");
 	priv->fd.drm = close_device(priv->fd.drm);
-- 
2.21.1


* [Intel-gfx] [PATCH i-g-t v5 14/21] tests/core_hotunplug: Process return values of sysfs operations
  2020-08-28  7:59 [Intel-gfx] [PATCH i-g-t v5 00/21] tests/core_hotunplug: Fixes and enhancements Janusz Krzysztofik
                   ` (12 preceding siblings ...)
  2020-08-28  7:59 ` [Intel-gfx] [PATCH i-g-t v5 13/21] tests/core_hotunplug: Let the driver time out essential sysfs operations Janusz Krzysztofik
@ 2020-08-28  7:59 ` Janusz Krzysztofik
  2020-08-28  7:59 ` [Intel-gfx] [PATCH i-g-t v5 15/21] tests/core_hotunplug: Assert expected device presence/absence Janusz Krzysztofik
                   ` (7 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Janusz Krzysztofik @ 2020-08-28  7:59 UTC (permalink / raw)
  To: igt-dev; +Cc: Michał Winiarski, intel-gfx

Return values of driver bind/unbind / device remove/recover sysfs
operations are currently ignored.  Assert that the operations succeed.
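
For readers unfamiliar with the pattern, here is a minimal sketch of a
sysfs-style attribute write that reports success as a boolean, so the
caller can assert on it.  It assumes plain POSIX calls; write_attr() is
a hypothetical stand-in and not the igt_sysfs_set() implementation:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: write value to the attribute file attr located
 * in the directory referred to by dirfd.  Returns true only if the file
 * could be opened, the whole value was written and close() succeeded.
 */
static bool write_attr(int dirfd, const char *attr, const char *value)
{
	size_t len;
	ssize_t written;
	int fd;

	assert(attr != NULL && value != NULL);

	fd = openat(dirfd, attr, O_WRONLY);
	if (fd < 0)
		return false;

	len = strlen(value);
	written = write(fd, value, len);

	/* treat a failed close as a failed write */
	if (close(fd) != 0)
		return false;

	return written == (ssize_t)len;
}
```

A caller that ignores the result cannot tell a rejected write from an
accepted one, which is the gap this patch closes.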

v2: Add trailing newlines missing from igt_assert messages.

Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
---
 tests/core_hotunplug.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/tests/core_hotunplug.c b/tests/core_hotunplug.c
index 1fdbd9b4c..bbc9d30b5 100644
--- a/tests/core_hotunplug.c
+++ b/tests/core_hotunplug.c
@@ -123,7 +123,9 @@ static void driver_unbind(struct hotunplug *priv, const char *prefix,
 	priv->failure = "Driver unbind failure!";
 
 	igt_set_timeout(timeout, "Driver unbind timeout!");
-	igt_sysfs_set(priv->fd.sysfs_drv, "unbind", priv->dev_bus_addr);
+	igt_assert_f(igt_sysfs_set(priv->fd.sysfs_drv, "unbind",
+				   priv->dev_bus_addr),
+		     "Driver unbind failure!\n");
 	igt_reset_timeout();
 }
 
@@ -134,7 +136,9 @@ static void driver_bind(struct hotunplug *priv, int timeout)
 	priv->failure = "Driver re-bind failure!";
 
 	igt_set_timeout(timeout, "Driver re-bind timeout!");
-	igt_sysfs_set(priv->fd.sysfs_drv, "bind", priv->dev_bus_addr);
+	igt_assert_f(igt_sysfs_set(priv->fd.sysfs_drv, "bind",
+				   priv->dev_bus_addr),
+		     "Driver re-bind failure!\n");
 	igt_reset_timeout();
 }
 
@@ -152,7 +156,8 @@ static void device_unplug(struct hotunplug *priv, const char *prefix,
 	priv->failure = "Device unplug failure!";
 
 	igt_set_timeout(timeout, "Device unplug timeout!");
-	igt_sysfs_set(priv->fd.sysfs_dev, "remove", "1");
+	igt_assert_f(igt_sysfs_set(priv->fd.sysfs_dev, "remove", "1"),
+		     "Device unplug failure!\n");
 	igt_reset_timeout();
 
 	priv->fd.sysfs_dev = close_sysfs(priv->fd.sysfs_dev);
@@ -166,7 +171,8 @@ static void bus_rescan(struct hotunplug *priv, int timeout)
 	priv->failure = "Bus rescan failure!";
 
 	igt_set_timeout(timeout, "Bus rescan timeout!");
-	igt_sysfs_set(priv->fd.sysfs_bus, "../rescan", "1");
+	igt_assert_f(igt_sysfs_set(priv->fd.sysfs_bus, "../rescan", "1"),
+		       "Bus rescan failure!\n");
 	igt_reset_timeout();
 }
 
-- 
2.21.1


* [Intel-gfx] [PATCH i-g-t v5 15/21] tests/core_hotunplug: Assert expected device presence/absence
  2020-08-28  7:59 [Intel-gfx] [PATCH i-g-t v5 00/21] tests/core_hotunplug: Fixes and enhancements Janusz Krzysztofik
                   ` (13 preceding siblings ...)
  2020-08-28  7:59 ` [Intel-gfx] [PATCH i-g-t v5 14/21] tests/core_hotunplug: Process return values of " Janusz Krzysztofik
@ 2020-08-28  7:59 ` Janusz Krzysztofik
  2020-08-28  7:59 ` [Intel-gfx] [PATCH i-g-t v5 16/21] tests/core_hotunplug: Explicitly ignore unused return values Janusz Krzysztofik
                   ` (6 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Janusz Krzysztofik @ 2020-08-28  7:59 UTC (permalink / raw)
  To: igt-dev; +Cc: Michał Winiarski, intel-gfx

Don't rely solely on a successful write to sysfs control files; also
assert the existence / non-existence of the respective device sysfs
node.
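
A short sketch of the presence check, assuming only POSIX faccessat().
Since faccessat() returns 0 when the node is accessible and -1
otherwise, a non-zero result is what we expect after unbind / unplug
and a zero result after rebind / rescan.  node_present() is a
hypothetical wrapper added here for readability only:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/*
 * Hypothetical wrapper: true if an entry called name exists in the
 * directory referred to by dirfd.
 */
static bool node_present(int dirfd, const char *name)
{
	assert(name != NULL);

	/* F_OK tests for existence only, not for read/write permission */
	return faccessat(dirfd, name, F_OK, 0) == 0;
}
```

This is why the diff below wraps the raw faccessat() result in
igt_assert_f() where the device must be gone and in igt_fail_on_f()
where it must be back.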

Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
---
 tests/core_hotunplug.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/tests/core_hotunplug.c b/tests/core_hotunplug.c
index bbc9d30b5..b53c9ecde 100644
--- a/tests/core_hotunplug.c
+++ b/tests/core_hotunplug.c
@@ -127,6 +127,9 @@ static void driver_unbind(struct hotunplug *priv, const char *prefix,
 				   priv->dev_bus_addr),
 		     "Driver unbind failure!\n");
 	igt_reset_timeout();
+
+	igt_assert_f(faccessat(priv->fd.sysfs_drv, priv->dev_bus_addr, F_OK, 0),
+		     "Unbound device still present\n");
 }
 
 /* Re-bind the driver to the device */
@@ -140,6 +143,10 @@ static void driver_bind(struct hotunplug *priv, int timeout)
 				   priv->dev_bus_addr),
 		     "Driver re-bind failure!\n");
 	igt_reset_timeout();
+
+	igt_fail_on_f(faccessat(priv->fd.sysfs_drv, priv->dev_bus_addr,
+				F_OK, 0),
+		      "Rebound device not present!\n");
 }
 
 /* Remove (virtually unplug) the device from its bus */
@@ -162,6 +169,9 @@ static void device_unplug(struct hotunplug *priv, const char *prefix,
 
 	priv->fd.sysfs_dev = close_sysfs(priv->fd.sysfs_dev);
 	igt_assert_eq(priv->fd.sysfs_dev, -1);
+
+	igt_assert_f(faccessat(priv->fd.sysfs_bus, priv->dev_bus_addr, F_OK, 0),
+		     "Unplugged device still present\n");
 }
 
 /* Re-discover the device by rescanning its bus */
@@ -174,6 +184,10 @@ static void bus_rescan(struct hotunplug *priv, int timeout)
 	igt_assert_f(igt_sysfs_set(priv->fd.sysfs_bus, "../rescan", "1"),
 		       "Bus rescan failure!\n");
 	igt_reset_timeout();
+
+	igt_fail_on_f(faccessat(priv->fd.sysfs_bus, priv->dev_bus_addr,
+				F_OK, 0),
+		      "Fakely unplugged device not rediscovered!\n");
 }
 
 static void cleanup(struct hotunplug *priv)
-- 
2.21.1


* [Intel-gfx] [PATCH i-g-t v5 16/21] tests/core_hotunplug: Explicitly ignore unused return values
  2020-08-28  7:59 [Intel-gfx] [PATCH i-g-t v5 00/21] tests/core_hotunplug: Fixes and enhancements Janusz Krzysztofik
                   ` (14 preceding siblings ...)
  2020-08-28  7:59 ` [Intel-gfx] [PATCH i-g-t v5 15/21] tests/core_hotunplug: Assert expected device presence/absence Janusz Krzysztofik
@ 2020-08-28  7:59 ` Janusz Krzysztofik
  2020-08-28  7:59 ` [Intel-gfx] [PATCH i-g-t v5 17/21] tests/core_hotunplug: More thorough i915 healthcheck and recovery Janusz Krzysztofik
                   ` (5 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Janusz Krzysztofik @ 2020-08-28  7:59 UTC (permalink / raw)
  To: igt-dev; +Cc: Michał Winiarski, intel-gfx

Some return values are not useful and can be ignored.  Wrap those cases
inside igt_ignore_warn(), not only to make sure compilers are happy but
also to clearly document our decisions.

Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
---
 tests/core_hotunplug.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tests/core_hotunplug.c b/tests/core_hotunplug.c
index b53c9ecde..923b8cdfd 100644
--- a/tests/core_hotunplug.c
+++ b/tests/core_hotunplug.c
@@ -257,7 +257,7 @@ static void set_filter_from_device(int fd)
 	char path[PATH_MAX + 1];
 
 	igt_assert(igt_sysfs_path(fd, path, PATH_MAX));
-	strncat(path, "/device", PATH_MAX - strlen(path));
+	igt_ignore_warn(strncat(path, "/device", PATH_MAX - strlen(path)));
 	igt_assert(realpath(path, dst));
 
 	igt_device_filter_free_all();
@@ -396,7 +396,7 @@ igt_main
 	igt_fixture {
 		post_healthcheck(&priv);
 
-		close(priv.fd.sysfs_bus);
-		close(priv.fd.sysfs_drv);
+		igt_ignore_warn(close(priv.fd.sysfs_bus));
+		igt_ignore_warn(close(priv.fd.sysfs_drv));
 	}
 }
-- 
2.21.1


* [Intel-gfx] [PATCH i-g-t v5 17/21] tests/core_hotunplug: More thorough i915 healthcheck and recovery
  2020-08-28  7:59 [Intel-gfx] [PATCH i-g-t v5 00/21] tests/core_hotunplug: Fixes and enhancements Janusz Krzysztofik
                   ` (15 preceding siblings ...)
  2020-08-28  7:59 ` [Intel-gfx] [PATCH i-g-t v5 16/21] tests/core_hotunplug: Explicitly ignore unused return values Janusz Krzysztofik
@ 2020-08-28  7:59 ` Janusz Krzysztofik
  2020-08-28  7:59 ` [Intel-gfx] [PATCH i-g-t v5 18/21] tests/core_hotunplug: Add 'lateclose before restore' variants Janusz Krzysztofik
                   ` (4 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Janusz Krzysztofik @ 2020-08-28  7:59 UTC (permalink / raw)
  To: igt-dev; +Cc: Michał Winiarski, intel-gfx

The test currently assumes the i915 driver is able to identify potential
hardware or driver issues while rebinding to a device and indicate them
by marking the GPU wedged.  Should that assumption prove wrong, the
health check phase of the test would happily succeed while potentially
leaving the device in an unusable state.  That would not only give us
false positive test results but could also affect subsequently run
applications.  Therefore, we should examine the health of the exercised
device more thoroughly and try harder to recover it from any detected
stalls.

We could use a gem_test_engine() library function which submits and
asserts successful execution of a NOP batch on each physical engine.
Unfortunately, on failure this function jumps out of the IGT test
section it is called from, while we would like to continue with
recovery steps, preferably without adding another level of test section
group nesting.  Moreover, the function opens the device again and
doesn't close the extra file descriptor before the jump, while we care
about being able to close the exercised device completely before
running certain subtest operations.  Therefore, reimplement the function
locally with those issues fixed and use it as an i915 health check.
Also call it on test startup so operations performed by the test are
never blamed for driver or hardware issues which may already exist and
be detectable on test start.

Should the i915 GPU be found unresponsive by the health check called
from a recovery section, try harder to recover it to a usable state
with a global GPU reset.
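
The intended control flow can be sketched with a fake device, assuming
a check that returns 0 when healthy and -EIO otherwise.  struct
fake_gpu and the fake_*() functions are hypothetical and only model the
ordering of steps, not any i915 behaviour:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Fake device standing in for an i915 fd; all fields are illustrative. */
struct fake_gpu {
	bool hung;		/* current state */
	bool reset_helps;	/* does a reset clear the hang? */
	int resets;		/* number of resets performed */
};

static int fake_check(struct fake_gpu *gpu)
{
	return gpu->hung ? -EIO : 0;
}

static void fake_reset(struct fake_gpu *gpu)
{
	gpu->resets++;
	if (gpu->reset_helps)
		gpu->hung = false;
}

/* Re-check first; use the reset only as a last resort. */
static int fake_recover(struct fake_gpu *gpu)
{
	if (!fake_check(gpu))
		return 0;

	fake_reset(gpu);
	return fake_check(gpu);
}

/* Returns NULL when healthy (possibly after recovery), else a message. */
static const char *fake_healthcheck(struct fake_gpu *gpu, bool recover)
{
	assert(gpu != NULL);

	if (fake_check(gpu) && (!recover || fake_recover(gpu)))
		return "Healthcheck failure!";

	return NULL;
}
```

Note that a subtest calling the check with recover == false reports the
failure as is, while the recovery section calling it with recover ==
true only reports a failure if the reset did not help.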

For still more effective detection of GPU hangs, use a hang detector
provided by IGT library.  However, replace the library signal handler
with our own implementation that doesn't jump out of the current IGT
test section on GPU hang so we are still able to perform the reset and
retry.

v2: Skip i915 health check if a GPU hang has been already detected by a
    previous health check run and not yet recovered with a GPU reset,
  - take care of stopping a hang detector instance possibly left
    running by a failed health check attempt.
v3: Re-run i915 health check as a first step of i915 recovery (use full
    GPU reset as a last resort),
  - prefix i915 health check debug messages with step indicators,
  - fix spelling error in a comment.
v4: Unbind the driver from an unhealthy device before recovery,
  - drop caches on i915 health check completion.

Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
---
 tests/core_hotunplug.c | 114 +++++++++++++++++++++++++++++++++++++----
 1 file changed, 104 insertions(+), 10 deletions(-)

diff --git a/tests/core_hotunplug.c b/tests/core_hotunplug.c
index 923b8cdfd..1f211a820 100644
--- a/tests/core_hotunplug.c
+++ b/tests/core_hotunplug.c
@@ -23,8 +23,10 @@
 
 #include <fcntl.h>
 #include <limits.h>
+#include <signal.h>
 #include <stdlib.h>
 #include <string.h>
+#include <sys/ioctl.h>
 #include <sys/stat.h>
 #include <sys/types.h>
 #include <unistd.h>
@@ -196,7 +198,83 @@ static void cleanup(struct hotunplug *priv)
 	priv->fd.sysfs_dev = close_sysfs(priv->fd.sysfs_dev);
 }
 
-static void healthcheck(struct hotunplug *priv)
+static bool local_i915_is_wedged(int i915)
+{
+	int err = 0;
+
+	if (ioctl(i915, DRM_IOCTL_I915_GEM_THROTTLE))
+		err = -errno;
+	return err == -EIO;
+}
+
+static bool hang_detected = false;
+
+static void local_sig_abort(int sig)
+{
+	errno = 0; /* inside a signal, last errno reporting is confusing */
+	hang_detected = true;
+}
+
+static int local_i915_healthcheck(int i915, const char *prefix)
+{
+	const uint32_t bbe = MI_BATCH_BUFFER_END;
+	struct drm_i915_gem_exec_object2 obj = { };
+	struct drm_i915_gem_execbuffer2 execbuf = {
+		.buffers_ptr = to_user_pointer(&obj),
+		.buffer_count = 1,
+	};
+	const struct intel_execution_engine2 *engine;
+
+	/* stop our hang detector possibly still running if we failed before */
+	igt_stop_hang_detector();
+
+	/* don't run again before GPU reset if hang has been already detected */
+	if (hang_detected)
+		return -EIO;
+
+	igt_debug("%srunning i915 GPU healthcheck\n", prefix);
+
+	if (local_i915_is_wedged(i915))
+		return -EIO;
+
+	obj.handle = gem_create(i915, 4096);
+	gem_write(i915, obj.handle, 0, &bbe, sizeof(bbe));
+
+	igt_fork_hang_detector(i915);
+	signal(SIGIO, local_sig_abort);
+
+	__for_each_physical_engine(i915, engine) {
+		execbuf.flags = engine->flags;
+		gem_execbuf(i915, &execbuf);
+	}
+
+	gem_sync(i915, obj.handle);
+	gem_close(i915, obj.handle);
+
+	igt_stop_hang_detector();
+	if (hang_detected)
+		return -EIO;
+
+	if (local_i915_is_wedged(i915))
+		return -EIO;
+
+	return 0;
+}
+
+static int local_i915_recover(int i915)
+{
+	hang_detected = false;
+	if (!local_i915_healthcheck(i915, "re-"))
+		return 0;
+
+	igt_debug("forcing i915 GPU reset\n");
+	igt_force_gpu_reset(i915);
+
+	hang_detected = false;
+	return local_i915_healthcheck(i915, "post-");
+}
+
+static void healthcheck(struct hotunplug *priv, bool recover)
 {
 	/* preserve error code potentially stored before in priv->fd.drm */
 	bool closed = priv->fd.drm == -1;
@@ -211,9 +289,19 @@ static void healthcheck(struct hotunplug *priv)
 		priv->fd.drm = fd_drm;
 
 	if (is_i915_device(fd_drm)) {
-		priv->failure = "GEM failure";
-		igt_require_gem(fd_drm);
-		priv->failure = NULL;
+		const char *failure = NULL;
+
+		/* don't report library failed asserts as healthcheck failure */
+		priv->failure = "Unrecoverable test failure";
+
+		if (local_i915_healthcheck(fd_drm, "") &&
+		    (!recover || local_i915_recover(fd_drm)))
+			failure = "Healthcheck failure!";
+
+		gem_quiescent_gpu(fd_drm);
+
+		priv->failure = failure;
+
 	} else {
 		/* no device specific healthcheck, rely on reopen result */
 		priv->failure = NULL;
@@ -231,6 +319,11 @@ static void recover(struct hotunplug *priv)
 {
 	cleanup(priv);
 
+	/* unbind the driver from a possibly hot rebound unhealthy device */
+	if (priv->failure && priv->fd.drm == -1 &&
+	    !faccessat(priv->fd.sysfs_drv, priv->dev_bus_addr, F_OK, 0))
+		driver_unbind(priv, "post ", 60);
+
 	if (faccessat(priv->fd.sysfs_bus, priv->dev_bus_addr, F_OK, 0))
 		bus_rescan(priv, 60);
 
@@ -238,7 +331,7 @@ static void recover(struct hotunplug *priv)
 		driver_bind(priv, 60);
 
 	if (priv->failure)
-		healthcheck(priv);
+		healthcheck(priv, true);
 }
 
 static void post_healthcheck(struct hotunplug *priv)
@@ -274,7 +367,7 @@ static void unbind_rebind(struct hotunplug *priv)
 
 	driver_bind(priv, 0);
 
-	healthcheck(priv);
+	healthcheck(priv, false);
 }
 
 static void unplug_rescan(struct hotunplug *priv)
@@ -285,7 +378,7 @@ static void unplug_rescan(struct hotunplug *priv)
 
 	bus_rescan(priv, 0);
 
-	healthcheck(priv);
+	healthcheck(priv, false);
 }
 
 static void hotunbind_lateclose(struct hotunplug *priv)
@@ -301,7 +394,7 @@ static void hotunbind_lateclose(struct hotunplug *priv)
 	priv->fd.drm = close_device(priv->fd.drm);
 	igt_assert_eq(priv->fd.drm, -1);
 
-	healthcheck(priv);
+	healthcheck(priv, false);
 }
 
 static void hotunplug_lateclose(struct hotunplug *priv)
@@ -317,7 +410,7 @@ static void hotunplug_lateclose(struct hotunplug *priv)
 	priv->fd.drm = close_device(priv->fd.drm);
 	igt_assert_eq(priv->fd.drm, -1);
 
-	healthcheck(priv);
+	healthcheck(priv, false);
 }
 
 /* Main */
@@ -337,7 +430,8 @@ igt_main
 
 		if (is_i915_device(fd_drm)) {
 			gem_quiescent_gpu(fd_drm);
-			igt_require_gem(fd_drm);
+			igt_skip_on_f(local_i915_healthcheck(fd_drm, "pre-"),
+				      "i915 device not healthy on test start\n");
 		}
 
 		/* Make sure subtests always reopen the same device */
-- 
2.21.1


* [Intel-gfx] [PATCH i-g-t v5 18/21] tests/core_hotunplug: Add 'lateclose before restore' variants
  2020-08-28  7:59 [Intel-gfx] [PATCH i-g-t v5 00/21] tests/core_hotunplug: Fixes and enhancements Janusz Krzysztofik
                   ` (16 preceding siblings ...)
  2020-08-28  7:59 ` [Intel-gfx] [PATCH i-g-t v5 17/21] tests/core_hotunplug: More thorough i915 healthcheck and recovery Janusz Krzysztofik
@ 2020-08-28  7:59 ` Janusz Krzysztofik
  2020-08-28  7:59 ` [Intel-gfx] [PATCH i-g-t v5 19/21] tests/core_hotunplug: Duplicate debug messages in dmesg Janusz Krzysztofik
                   ` (3 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Janusz Krzysztofik @ 2020-08-28  7:59 UTC (permalink / raw)
  To: igt-dev; +Cc: Michał Winiarski, intel-gfx

If a GPU gets wedged during driver rebind or device re-plug for some
reason, current hotunbind/hotunplug test variants may time out before
lateclose phase, resulting in incomplete CI reports.

Add new test variants which close the device before restoring it.  Also
rename old variants to more adequate hotrebind/hotreplug-lateclose and
perform health checks both before and after late close.

v2: Rebase on upstream.
v3: Refresh,
  - further rename hotunbind/hotunplug-lateclose to hotunbind-rebind
    and hotunplug-rescan respectively, then add two more variants under
    the old names which only exercise late close, leaving rebind /
    rescan to be taken care of in the post-subtest recovery phase,
  - also update descriptions of unmodified subtests for consistency.
v4: Refresh,
  - drop subtests with no health checks, adjust timeouts in successors,
  - perform health checks of hot restored devices also before late
    close,
  - in order to be able to safely run a health check while still
    keeping an unbound / unplugged device instance open, also preserve
    the open device fd, not only a close error,
  - adjust subtest descriptions.

Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> # v2
---
 tests/core_hotunplug.c | 98 ++++++++++++++++++++++++++++++++++--------
 1 file changed, 80 insertions(+), 18 deletions(-)

diff --git a/tests/core_hotunplug.c b/tests/core_hotunplug.c
index 1f211a820..305c57a3f 100644
--- a/tests/core_hotunplug.c
+++ b/tests/core_hotunplug.c
@@ -276,17 +276,19 @@ static int local_i915_recover(int i915)
 
 static void healthcheck(struct hotunplug *priv, bool recover)
 {
-	/* preserve error code potentially stored before in priv->fd.drm */
+	/* preserve device fd / close status stored in priv->fd.drm */
+	int fd_drm, saved_fd_drm = priv->fd.drm;
 	bool closed = priv->fd.drm == -1;
-	int fd_drm;
 
 	/* device name may have changed, rebuild IGT device list */
 	igt_devices_scan(true);
 
 	priv->failure = "Device reopen failure!";
 	fd_drm = local_drm_open_driver("re", " for healthcheck");
-	if (closed)	/* store for cleanup if no error code to preserve */
+	if (closed)	/* store for cleanup if not dirty */
 		priv->fd.drm = fd_drm;
+	else		/* force close error should we fail prematurely */
+		priv->fd.drm = -EBADF;
 
 	if (is_i915_device(fd_drm)) {
 		const char *failure = NULL;
@@ -308,8 +310,10 @@ static void healthcheck(struct hotunplug *priv, bool recover)
 	}
 
 	fd_drm = close_device(fd_drm);
-	if (closed)	/* store result if no error code to preserve */
+	if (closed)	/* store result if no dirty status to preserve */
 		priv->fd.drm = fd_drm;
+	else if (fd_drm == -1)	/* cancel fake error, restore saved status */
+		priv->fd.drm = saved_fd_drm;
 
 	/* not only request igt_abort on failure, also fail the health check */
 	igt_fail_on_f(priv->failure, "%s\n", priv->failure);
@@ -381,31 +385,65 @@ static void unplug_rescan(struct hotunplug *priv)
 	healthcheck(priv, false);
 }
 
-static void hotunbind_lateclose(struct hotunplug *priv)
+static void hotunbind_rebind(struct hotunplug *priv)
 {
 	igt_assert_eq(priv->fd.drm, -1);
-	priv->fd.drm = local_drm_open_driver("", " for hotunbind");
+	priv->fd.drm = local_drm_open_driver("", " for hotrebind");
 
 	driver_unbind(priv, "hot ", 0);
 
-	driver_bind(priv, 0);
-
 	igt_debug("late closing the unbound device instance\n");
 	priv->fd.drm = close_device(priv->fd.drm);
 	igt_assert_eq(priv->fd.drm, -1);
 
+	driver_bind(priv, 0);
+
 	healthcheck(priv, false);
 }
 
-static void hotunplug_lateclose(struct hotunplug *priv)
+static void hotunplug_rescan(struct hotunplug *priv)
 {
 	igt_assert_eq(priv->fd.drm, -1);
-	priv->fd.drm = local_drm_open_driver("", " for hotunplug");
+	priv->fd.drm = local_drm_open_driver("", " for hotreplug");
 
 	device_unplug(priv, "hot ", 0);
 
+	igt_debug("late closing the removed device instance\n");
+	priv->fd.drm = close_device(priv->fd.drm);
+	igt_assert_eq(priv->fd.drm, -1);
+
 	bus_rescan(priv, 0);
 
+	healthcheck(priv, false);
+}
+
+static void hotrebind_lateclose(struct hotunplug *priv)
+{
+	priv->fd.drm = local_drm_open_driver("", " for hotrebind");
+
+	driver_unbind(priv, "hot ", 60);
+
+	driver_bind(priv, 0);
+
+	healthcheck(priv, false);
+
+	igt_debug("late closing the unbound device instance\n");
+	priv->fd.drm = close_device(priv->fd.drm);
+	igt_assert_eq(priv->fd.drm, -1);
+
+	healthcheck(priv, false);
+}
+
+static void hotreplug_lateclose(struct hotunplug *priv)
+{
+	priv->fd.drm = local_drm_open_driver("", " for hotreplug");
+
+	device_unplug(priv, "hot ", 60);
+
+	bus_rescan(priv, 0);
+
+	healthcheck(priv, false);
+
 	igt_debug("late closing the removed device instance\n");
 	priv->fd.drm = close_device(priv->fd.drm);
 	igt_assert_eq(priv->fd.drm, -1);
@@ -443,7 +481,7 @@ igt_main
 	}
 
 	igt_subtest_group {
-		igt_describe("Check if the driver can be cleanly unbound from a device believed to be closed");
+		igt_describe("Check if the driver can be cleanly unbound from a device believed to be closed, then rebound");
 		igt_subtest("unbind-rebind")
 			unbind_rebind(&priv);
 
@@ -455,7 +493,7 @@ igt_main
 		post_healthcheck(&priv);
 
 	igt_subtest_group {
-		igt_describe("Check if a device believed to be closed can be cleanly unplugged");
+		igt_describe("Check if a device believed to be closed can be cleanly unplugged, then restored");
 		igt_subtest("unplug-rescan")
 			unplug_rescan(&priv);
 
@@ -467,9 +505,33 @@ igt_main
 		post_healthcheck(&priv);
 
 	igt_subtest_group {
-		igt_describe("Check if the driver can be cleanly unbound from a still open device, then released");
-		igt_subtest("hotunbind-lateclose")
-			hotunbind_lateclose(&priv);
+		igt_describe("Check if the driver can be cleanly unbound from an open device, then released and rebound");
+		igt_subtest("hotunbind-rebind")
+			hotunbind_rebind(&priv);
+
+		igt_fixture
+			recover(&priv);
+	}
+
+	igt_fixture
+		post_healthcheck(&priv);
+
+	igt_subtest_group {
+		igt_describe("Check if an open device can be cleanly unplugged, then released and restored");
+		igt_subtest("hotunplug-rescan")
+			hotunplug_rescan(&priv);
+
+		igt_fixture
+			recover(&priv);
+	}
+
+	igt_fixture
+		post_healthcheck(&priv);
+
+	igt_subtest_group {
+		igt_describe("Check if the driver hot unbound from a still open device can be cleanly rebound, then the old instance released");
+		igt_subtest("hotrebind-lateclose")
+			hotrebind_lateclose(&priv);
 
 		igt_fixture
 			recover(&priv);
@@ -479,9 +541,9 @@ igt_main
 		post_healthcheck(&priv);
 
 	igt_subtest_group {
-		igt_describe("Check if a still open device can be cleanly unplugged, then released");
-		igt_subtest("hotunplug-lateclose")
-			hotunplug_lateclose(&priv);
+		igt_describe("Check if a still open while hot unplugged device can be cleanly restored, then the old instance released");
+		igt_subtest("hotreplug-lateclose")
+			hotreplug_lateclose(&priv);
 
 		igt_fixture
 			recover(&priv);
-- 
2.21.1


* [Intel-gfx] [PATCH i-g-t v5 19/21] tests/core_hotunplug: Duplicate debug messages in dmesg
  2020-08-28  7:59 [Intel-gfx] [PATCH i-g-t v5 00/21] tests/core_hotunplug: Fixes and enhancements Janusz Krzysztofik
                   ` (17 preceding siblings ...)
  2020-08-28  7:59 ` [Intel-gfx] [PATCH i-g-t v5 18/21] tests/core_hotunplug: Add 'lateclose before restore' variants Janusz Krzysztofik
@ 2020-08-28  7:59 ` Janusz Krzysztofik
  2020-08-28  7:59 ` [Intel-gfx] [PATCH i-g-t v5 20/21] tests/core_hotunplug: HSW audio issue workaround Janusz Krzysztofik
                   ` (2 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Janusz Krzysztofik @ 2020-08-28  7:59 UTC (permalink / raw)
  To: igt-dev; +Cc: Michał Winiarski, intel-gfx

The purpose of the debug messages displayed by the test is to make it
easier to identify the subtest phase that fails.  Since issues exhibited
by the test are mostly reported to dmesg, print those debug messages to
/dev/kmsg as well.
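
As a rough standalone illustration of the idea (not the IGT
implementation), the sketch below writes one formatted message to two
sinks.  The helper name dup_debug() and the plain FILE * sinks are
assumptions made for this example; the patch itself relies on
igt_debug() and igt_kmsg().  On a real system the second sink could be
a stream opened on /dev/kmsg, which usually requires root.

```c
#include <stdarg.h>
#include <stdio.h>

/*
 * Hypothetical helper: format one message and write it to both sinks,
 * prefixing the second copy with a test name so it can be told apart
 * from other kernel log lines.
 */
static void dup_debug(FILE *log, FILE *kmsg, const char *test,
		      const char *fmt, ...)
{
	va_list ap;

	/* First copy: the regular debug log. */
	va_start(ap, fmt);
	vfprintf(log, fmt, ap);
	va_end(ap);

	/* Second copy: the kernel log sink, tagged with the test name. */
	fprintf(kmsg, "%s: ", test);
	va_start(ap, fmt);
	vfprintf(kmsg, fmt, ap);
	va_end(ap);
	fflush(kmsg);
}
```

Note that the local_debug() macro in the patch always passes at least
one argument after the format string, which is presumably why messages
without arguments are converted to the "%s\n" form.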

v2: Rebase on upstream.
v3: Refresh.
v4: Refresh.

Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
---
 tests/core_hotunplug.c | 28 +++++++++++++++++-----------
 1 file changed, 17 insertions(+), 11 deletions(-)

diff --git a/tests/core_hotunplug.c b/tests/core_hotunplug.c
index 305c57a3f..361d601af 100644
--- a/tests/core_hotunplug.c
+++ b/tests/core_hotunplug.c
@@ -52,6 +52,12 @@ struct hotunplug {
 
 /* Helpers */
 
+#define local_debug(fmt, msg...)			       \
+({							       \
+	igt_debug(fmt, msg);				       \
+	igt_kmsg(KMSG_DEBUG "%s: " fmt, igt_test_name(), msg); \
+})
+
 /**
  * Subtests must be able to close examined devices completely.  Don't
  * use drm_open_driver() since in case of an i915 device it opens it
@@ -61,7 +67,7 @@ static int local_drm_open_driver(const char *prefix, const char *suffix)
 {
 	int fd_drm;
 
-	igt_debug("%sopening device%s\n", prefix, suffix);
+	local_debug("%sopening device%s\n", prefix, suffix);
 
 	fd_drm = __drm_open_driver(DRIVER_ANY);
 	igt_assert_fd(fd_drm);
@@ -121,7 +127,7 @@ static void prepare(struct hotunplug *priv)
 static void driver_unbind(struct hotunplug *priv, const char *prefix,
 			  int timeout)
 {
-	igt_debug("%sunbinding the driver from the device\n", prefix);
+	local_debug("%sunbinding the driver from the device\n", prefix);
 	priv->failure = "Driver unbind failure!";
 
 	igt_set_timeout(timeout, "Driver unbind timeout!");
@@ -137,7 +143,7 @@ static void driver_unbind(struct hotunplug *priv, const char *prefix,
 /* Re-bind the driver to the device */
 static void driver_bind(struct hotunplug *priv, int timeout)
 {
-	igt_debug("rebinding the driver to the device\n");
+	local_debug("%s\n", "rebinding the driver to the device");
 	priv->failure = "Driver re-bind failure!";
 
 	igt_set_timeout(timeout, "Driver re-bind timeout!");
@@ -161,7 +167,7 @@ static void device_unplug(struct hotunplug *priv, const char *prefix,
 				    O_DIRECTORY);
 	igt_assert_fd(priv->fd.sysfs_dev);
 
-	igt_debug("%sunplugging the device\n", prefix);
+	local_debug("%sunplugging the device\n", prefix);
 	priv->failure = "Device unplug failure!";
 
 	igt_set_timeout(timeout, "Device unplug timeout!");
@@ -179,7 +185,7 @@ static void device_unplug(struct hotunplug *priv, const char *prefix,
 /* Re-discover the device by rescanning its bus */
 static void bus_rescan(struct hotunplug *priv, int timeout)
 {
-	igt_debug("rediscovering the device\n");
+	local_debug("%s\n", "rediscovering the device");
 	priv->failure = "Bus rescan failure!";
 
 	igt_set_timeout(timeout, "Bus rescan timeout!");
@@ -232,7 +238,7 @@ static int local_i915_healthcheck(int i915, const char *prefix)
 	if (hang_detected)
 		return -EIO;
 
-	igt_debug("%srunning i915 GPU healthcheck\n", prefix);
+	local_debug("%s%s\n", prefix, "running i915 GPU healthcheck");
 
 	if (local_i915_is_wedged(i915))
 		return -EIO;
@@ -267,7 +273,7 @@ static int local_i915_recover(int i915)
 	if (!local_i915_healthcheck(i915, "re-"))
 		return 0;
 
-	igt_debug("forcing i915 GPU reset\n");
+	local_debug("%s\n", "forcing i915 GPU reset");
 	igt_force_gpu_reset(i915);
 
 	hang_detected = false;
@@ -392,7 +398,7 @@ static void hotunbind_rebind(struct hotunplug *priv)
 
 	driver_unbind(priv, "hot ", 0);
 
-	igt_debug("late closing the unbound device instance\n");
+	local_debug("%s\n", "late closing the unbound device instance");
 	priv->fd.drm = close_device(priv->fd.drm);
 	igt_assert_eq(priv->fd.drm, -1);
 
@@ -408,7 +414,7 @@ static void hotunplug_rescan(struct hotunplug *priv)
 
 	device_unplug(priv, "hot ", 0);
 
-	igt_debug("late closing the removed device instance\n");
+	local_debug("%s\n", "late closing the removed device instance");
 	priv->fd.drm = close_device(priv->fd.drm);
 	igt_assert_eq(priv->fd.drm, -1);
 
@@ -427,7 +433,7 @@ static void hotrebind_lateclose(struct hotunplug *priv)
 
 	healthcheck(priv, false);
 
-	igt_debug("late closing the unbound device instance\n");
+	local_debug("%s\n", "late closing the unbound device instance");
 	priv->fd.drm = close_device(priv->fd.drm);
 	igt_assert_eq(priv->fd.drm, -1);
 
@@ -444,7 +450,7 @@ static void hotreplug_lateclose(struct hotunplug *priv)
 
 	healthcheck(priv, false);
 
-	igt_debug("late closing the removed device instance\n");
+	local_debug("%s\n", "late closing the removed device instance");
 	priv->fd.drm = close_device(priv->fd.drm);
 	igt_assert_eq(priv->fd.drm, -1);
 
-- 
2.21.1


* [Intel-gfx] [PATCH i-g-t v5 20/21] tests/core_hotunplug: HSW audio issue workaround
  2020-08-28  7:59 [Intel-gfx] [PATCH i-g-t v5 00/21] tests/core_hotunplug: Fixes and enhancements Janusz Krzysztofik
                   ` (18 preceding siblings ...)
  2020-08-28  7:59 ` [Intel-gfx] [PATCH i-g-t v5 19/21] tests/core_hotunplug: Duplicate debug messages in dmesg Janusz Krzysztofik
@ 2020-08-28  7:59 ` Janusz Krzysztofik
  2020-08-28  7:59 ` [Intel-gfx] [PATCH i-g-t v5 21/21] tests/core_hotunplug: Un-blocklist *bind* subtests Janusz Krzysztofik
       [not found] ` <159861558801.4239.9465794510007938455@emeril.freedesktop.org>
  21 siblings, 0 replies; 23+ messages in thread
From: Janusz Krzysztofik @ 2020-08-28  7:59 UTC (permalink / raw)
  To: igt-dev; +Cc: Michał Winiarski, intel-gfx

Unbinding the i915 driver on some Haswell platforms with Azalia audio
results in a kernel WARNING on "i915 raw-wakerefs=1 wakelocks=1 on
cleanup".  The issue can be worked around by manually enabling runtime
power management for the conflicting audio adapter.  Use that method
but also display a warning to preserve visibility of the issue.  Also
tag the workaround with a FIXME comment.
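
The "warn, then apply the workaround" pattern described above can be
sketched in isolation as follows.  The names warn_on(), setup() and
workaround_applied are hypothetical and exist only for this example;
the patch uses igt_warn_on_f() and igt_pm_enable_audio_runtime_pm()
instead.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for a "warn if condition holds" helper: print the
 * message when cond is true and hand cond back to the caller, so the call
 * can be used directly inside an if () statement.
 */
static bool warn_on(bool cond, const char *msg)
{
	if (cond)
		fprintf(stderr, "WARNING: %s", msg);
	return cond;
}

/* Counts how many times the (simulated) workaround has been applied. */
static int workaround_applied;

static void setup(bool affected_platform)
{
	/* The warning keeps the issue visible; the body applies the hack. */
	if (warn_on(affected_platform,
		    "enabling audio PM to work around a kernel WARN\n"))
		workaround_applied++;
}
```

Because the helper returns the condition it was given, the workaround
runs only on affected platforms, and every run on such a platform
leaves a warning behind.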

Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
---
 tests/core_hotunplug.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/tests/core_hotunplug.c b/tests/core_hotunplug.c
index 361d601af..a3d2a04ed 100644
--- a/tests/core_hotunplug.c
+++ b/tests/core_hotunplug.c
@@ -473,9 +473,23 @@ igt_main
 		igt_skip_on_f(fd_drm < 0, "No known DRM device found\n");
 
 		if (is_i915_device(fd_drm)) {
+			uint32_t devid = intel_get_drm_devid(fd_drm);
+
 			gem_quiescent_gpu(fd_drm);
 			igt_skip_on_f(local_i915_healthcheck(fd_drm, "pre-"),
 				      "i915 device not healthy on test start\n");
+
+			/**
+			 * FIXME: Unbinding the i915 driver on some Haswell
+			 * platforms with Azalia audio results in a kernel WARN
+			 * on "i915 raw-wakerefs=1 wakelocks=1 on cleanup".  The
+			 * below CI friendly user level workaround prevents the
+			 * warning from appearing.  Drop this hack as soon as
+			 * this is fixed in the kernel.
+			 */
+			if (igt_warn_on_f((bool) IS_HASWELL(devid),
+			    "Manually enabling audio PM to work around a kernel WARN\n"))
+				igt_pm_enable_audio_runtime_pm();
 		}
 
 		/* Make sure subtests always reopen the same device */
-- 
2.21.1


* [Intel-gfx] [PATCH i-g-t v5 21/21] tests/core_hotunplug: Un-blocklist *bind* subtests
  2020-08-28  7:59 [Intel-gfx] [PATCH i-g-t v5 00/21] tests/core_hotunplug: Fixes and enhancements Janusz Krzysztofik
                   ` (19 preceding siblings ...)
  2020-08-28  7:59 ` [Intel-gfx] [PATCH i-g-t v5 20/21] tests/core_hotunplug: HSW audio issue workaround Janusz Krzysztofik
@ 2020-08-28  7:59 ` Janusz Krzysztofik
       [not found] ` <159861558801.4239.9465794510007938455@emeril.freedesktop.org>
  21 siblings, 0 replies; 23+ messages in thread
From: Janusz Krzysztofik @ 2020-08-28  7:59 UTC (permalink / raw)
  To: igt-dev; +Cc: Michał Winiarski, intel-gfx

Subtests which don't remove the device, only unbind the driver from it,
seem relatively safe and harmless for CI.  Remove them from the CI
blocklist.
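
To check which subtests the narrowed pattern still covers, a small
sketch like the one below may help.  It assumes blocklist entries
behave like POSIX extended regular expressions searched within the
full test name; the CI runner may apply them differently, and the
helper name blocklisted() is made up for this example.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true if name matches pattern.  POSIX extended regex search is an
 * assumption here; the CI runner may apply blocklist entries differently.
 */
static bool blocklisted(const char *pattern, const char *name)
{
	regex_t re;
	bool hit;

	/* Treat a pattern that fails to compile as "no match". */
	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB))
		return false;

	hit = regexec(&re, name, 0, NULL, 0) == 0;
	regfree(&re);
	return hit;
}
```

Under that assumption, "igt@core_hotunplug@.*plug.*" still matches
names such as igt@core_hotunplug@hotunplug-rescan, while
igt@core_hotunplug@hotunbind-rebind no longer matches, since "plug"
has to appear after the second "@".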

Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
---
 tests/intel-ci/blacklist.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/intel-ci/blacklist.txt b/tests/intel-ci/blacklist.txt
index f9a57cb54..25b567038 100644
--- a/tests/intel-ci/blacklist.txt
+++ b/tests/intel-ci/blacklist.txt
@@ -120,7 +120,7 @@ igt@perf_pmu@cpu-hotplug
 
 # Currently fails and leaves the machine in a very bad state, and
 # causes coverage loss for other tests.
-igt@core_hotunplug@.*
+igt@core_hotunplug@.*plug.*
 
 # hangs several gens of hosts, and has no immediate fix
 igt@device_reset@reset-bound
\ No newline at end of file
-- 
2.21.1


* Re: [Intel-gfx]  ✗ Fi.CI.IGT: failure for tests/core_hotunplug: Fixes and enhancements (rev5)
       [not found] ` <159861558801.4239.9465794510007938455@emeril.freedesktop.org>
@ 2020-08-28 13:05   ` Janusz Krzysztofik
  0 siblings, 0 replies; 23+ messages in thread
From: Janusz Krzysztofik @ 2020-08-28 13:05 UTC (permalink / raw)
  To: igt-dev; +Cc: Michał Winiarski, intel-gfx

On Fri, 2020-08-28 at 11:53 +0000, Patchwork wrote:
> Patch Details
> Series:	tests/core_hotunplug: Fixes and enhancements (rev5)
> URL:	https://patchwork.freedesktop.org/series/79671/
> State:	failure
> Details:	https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_4914/index.html
> CI Bug Log - changes from IGT_5774_full -> IGTPW_4914_full
> Summary
> FAILURE
> 
> Serious unknown changes coming with IGTPW_4914_full absolutely need to be
> verified manually.
> 
> If you think the reported changes have nothing to do with the changes
> introduced in IGTPW_4914_full, please notify your bug team to allow them
> to document this new failure mode, which will reduce false positives in CI.
> 
> External URL: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_4914/index.html
> 
> Possible new issues
> Here are the unknown changes that may have been introduced in IGTPW_4914_full:
> 
> IGT changes
> Possible regressions
> {igt@core_hotunplug@hotrebind-lateclose} (NEW):
> 
> shard-snb: NOTRUN -> FAIL
> 
> shard-iclb: NOTRUN -> FAIL
> 
> shard-tglb: NOTRUN -> DMESG-WARN
> 
> shard-glk: NOTRUN -> FAIL
> 
> shard-hsw: NOTRUN -> FAIL
> 
> shard-kbl: NOTRUN -> FAIL

As before (rev4), this is an existing but previously unreported GPU
hang driver issue exhibited by the test, not a regression.  The issue
needs to be fixed in the driver for the test to succeed.  As one can
see from CI reports, the test successfully recovers from that condition
and subsequent tests don't report GPU hangs.

> 
> {igt@core_hotunplug@unbind-rebind} (NEW):
> 
> shard-hsw: NOTRUN -> WARN +1 similar issue

This is an IGT warning that replaces a former (rev4) DMESG-WARN ->
INCOMPLETE caused by a known driver issue already reported by
igt@device_reset@unbind-reset-rebind.  The issue has nothing to do with
device reset, only with driver unbind on Haswell with Azalia audio.
The kernel side needs to be fixed for the WARN not to be triggered and
the tests to succeed.  Meanwhile, the IGT warning workaround keeps the
issue visible in CI while not affecting CI runs.

> igt@gem_render_copy@linear:
> 
> shard-tglb: PASS -> FAIL +2 similar issues

This is a strange issue of an inaccessible "i915_gem_drop_caches"
debugfs entry for the render device node of the device just exercised
with igt@core_hotunplug@hotrebind-lateclose on a GuC platform.  Not
reported by Trybot unfortunately, but here evidently affecting
subsequent tests.  Looks like the health check and recovery phase of
the test still needs more work, sorry.

Thanks,
Janusz


> New tests
> New tests have been introduced between IGT_5774_full and IGTPW_4914_full:
> 
> New IGT tests (3)
> igt@core_hotunplug@hotrebind-lateclose:
> 
> Statuses : 1 dmesg-warn(s) 6 fail(s)
> Exec time: [6.13, 17.39] s
> igt@core_hotunplug@hotunbind-rebind:
> 
> Statuses : 6 pass(s) 1 warn(s)
> Exec time: [0.39, 1.96] s
> igt@core_hotunplug@unbind-rebind:
> 
> Statuses : 6 pass(s) 1 warn(s)
> Exec time: [0.38, 1.91] s
> Known issues
> Here are the changes found in IGTPW_4914_full that come from known issues:
> 
> IGT changes
> Issues hit
> igt@gem_exec_reloc@basic-concurrent0:
> 
> shard-tglb: PASS -> TIMEOUT (i915#1958)
> 
> shard-kbl: PASS -> TIMEOUT (i915#1958) +1 similar issue
> 
> igt@gem_exec_whisper@basic-forked:
> 
> shard-iclb: PASS -> TIMEOUT (i915#1958)
> igt@gem_exec_whisper@basic-forked-all:
> 
> shard-glk: PASS -> DMESG-WARN (i915#118 / i915#95)
> igt@gem_exec_whisper@basic-queues-forked-all:
> 
> shard-glk: PASS -> TIMEOUT (i915#1958) +4 similar issues
> 
> shard-apl: PASS -> TIMEOUT (i915#1635 / i915#1958) +1 similar issue
> 
> igt@gen9_exec_parse@allowed-all:
> 
> shard-apl: PASS -> DMESG-WARN (i915#1436 / i915#1635 / i915#716)
> igt@i915_pm_dc@dc6-psr:
> 
> shard-iclb: PASS -> FAIL (i915#1899)
> igt@i915_pm_rpm@reg-read-ioctl:
> 
> shard-kbl: PASS -> DMESG-WARN (i915#165)
> igt@i915_selftest@mock@contexts:
> 
> shard-hsw: PASS -> INCOMPLETE (i915#2278)
> igt@kms_frontbuffer_tracking@fbc-1p-primscrn-shrfb-pgflip-blt:
> 
> shard-tglb: PASS -> DMESG-WARN (i915#1982) +2 similar issues
> igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-pri-shrfb-draw-mmap-wc:
> 
> shard-glk: PASS -> FAIL (i915#49)
> igt@kms_frontbuffer_tracking@fbc-badstride:
> 
> shard-glk: PASS -> DMESG-WARN (i915#1982)
> igt@kms_frontbuffer_tracking@fbc-indfb-scaledprimary:
> 
> shard-kbl: PASS -> FAIL (i915#49)
> 
> shard-apl: PASS -> FAIL (i915#1635 / i915#49)
> 
> igt@kms_hdmi_inject@inject-audio:
> 
> shard-tglb: PASS -> SKIP (i915#433)
> igt@kms_plane@plane-panning-bottom-right-suspend-pipe-a-planes:
> 
> shard-kbl: PASS -> DMESG-WARN (i915#180)
> igt@kms_psr2_su@frontbuffer:
> 
> shard-iclb: PASS -> SKIP (fdo#109642 / fdo#111068)
> igt@kms_psr@psr2_primary_mmap_cpu:
> 
> shard-iclb: PASS -> SKIP (fdo#109441) +1 similar issue
> igt@kms_universal_plane@universal-plane-gen9-features-pipe-a:
> 
> shard-kbl: PASS -> DMESG-WARN (i915#1982) +1 similar issue
> igt@kms_vblank@pipe-a-query-busy-hang:
> 
> shard-apl: PASS -> DMESG-WARN (i915#1635 / i915#1982)
> Possible fixes
> igt@gem_exec_reloc@basic-many-active@rcs0:
> 
> shard-apl: FAIL (i915#1635 / i915#2389) -> PASS
> 
> shard-hsw: FAIL (i915#2389) -> PASS
> 
> igt@gem_exec_whisper@basic-contexts-priority:
> 
> shard-apl: TIMEOUT (i915#1635 / i915#1958) -> PASS
> igt@gem_exec_whisper@basic-fds:
> 
> shard-iclb: TIMEOUT (i915#1958) -> PASS +1 similar issue
> igt@gem_exec_whisper@basic-normal:
> 
> shard-glk: TIMEOUT (i915#1958) -> PASS
> igt@i915_selftest@mock@contexts:
> 
> shard-apl: INCOMPLETE (i915#1635 / i915#2278) -> PASS
> igt@i915_suspend@fence-restore-tiled2untiled:
> 
> shard-kbl: INCOMPLETE (i915#155) -> PASS
> igt@kms_big_fb@x-tiled-64bpp-rotate-0:
> 
> shard-glk: DMESG-FAIL (i915#118 / i915#95) -> PASS
> igt@kms_cursor_crc@pipe-b-cursor-64x21-onscreen:
> 
> shard-kbl: FAIL (i915#54) -> PASS
> 
> shard-apl: FAIL (i915#1635 / i915#54) -> PASS
> 
> shard-glk: FAIL (i915#54) -> PASS
> 
> igt@kms_flip@2x-blocking-absolute-wf_vblank-interruptible@ab-vga1-hdmi-a1:
> 
> shard-hsw: DMESG-WARN (i915#1982) -> PASS +1 similar issue
> igt@kms_flip@dpms-vs-vblank-race-interruptible@a-dp1:
> 
> shard-kbl: DMESG-WARN (i915#1982) -> PASS +1 similar issue
> igt@kms_flip@flip-vs-expired-vblank@a-hdmi-a1:
> 
> shard-glk: FAIL (i915#79) -> PASS
> igt@kms_flip@flip-vs-suspend-interruptible@a-dp1:
> 
> shard-kbl: DMESG-WARN (i915#180) -> PASS +12 similar issues
> igt@kms_psr@psr2_sprite_mmap_gtt:
> 
> shard-iclb: SKIP (fdo#109441) -> PASS +2 similar issues
> igt@kms_universal_plane@universal-plane-pipe-c-sanity:
> 
> shard-tglb: DMESG-WARN (i915#1982) -> PASS +1 similar issue
> Warnings
> igt@runner@aborted:
> 
> shard-hsw: FAIL (i915#2283) -> (FAIL, FAIL) (i915#1436 / i915#2283)
> 
> shard-apl: FAIL (i915#1635) -> FAIL (fdo#109271 / i915#1635 / i915#716)
> 
> {name}: This element is suppressed. This means it is ignored when computing
> the status of the difference (SUCCESS, WARNING, or FAILURE).
> 
> Participating hosts (8 -> 8)
> No changes in participating hosts
> 
> Build changes
> CI: CI-20190529 -> None
> IGT: IGT_5774 -> IGTPW_4914
> CI-20190529: 20190529
> CI_DRM_8937: 78b090a913c972368c81f05352a532590200cc89 @ git://anongit.freedesktop.org/gfx-ci/linux
> IGTPW_4914: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_4914/index.html
> IGT_5774: 2a5db9f60241383272aeec176e1b97b3f487209f @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools


Thread overview: 23+ messages
2020-08-28  7:59 [Intel-gfx] [PATCH i-g-t v5 00/21] tests/core_hotunplug: Fixes and enhancements Janusz Krzysztofik
2020-08-28  7:59 ` [Intel-gfx] [PATCH i-g-t v5 01/21] tests/core_hotunplug: Use igt_assert_fd() Janusz Krzysztofik
2020-08-28  7:59 ` [Intel-gfx] [PATCH i-g-t v5 02/21] tests/core_hotunplug: Constify dev_bus_addr string Janusz Krzysztofik
2020-08-28  7:59 ` [Intel-gfx] [PATCH i-g-t v5 03/21] tests/core_hotunplug: Clean up device open error handling Janusz Krzysztofik
2020-08-28  7:59 ` [Intel-gfx] [PATCH i-g-t v5 04/21] tests/core_hotunplug: Consolidate duplicated debug messages Janusz Krzysztofik
2020-08-28  7:59 ` [Intel-gfx] [PATCH i-g-t v5 05/21] tests/core_hotunplug: Assert successful device filter application Janusz Krzysztofik
2020-08-28  7:59 ` [Intel-gfx] [PATCH i-g-t v5 06/21] tests/core_hotunplug: Maintain a single data structure instance Janusz Krzysztofik
2020-08-28  7:59 ` [Intel-gfx] [PATCH i-g-t v5 07/21] tests/core_hotunplug: Pass errors via a data structure field Janusz Krzysztofik
2020-08-28  7:59 ` [Intel-gfx] [PATCH i-g-t v5 08/21] tests/core_hotunplug: Handle device close errors Janusz Krzysztofik
2020-08-28  7:59 ` [Intel-gfx] [PATCH i-g-t v5 09/21] tests/core_hotunplug: Prepare invariant data once per test run Janusz Krzysztofik
2020-08-28  7:59 ` [Intel-gfx] [PATCH i-g-t v5 10/21] tests/core_hotunplug: Skip selectively on sysfs close errors Janusz Krzysztofik
2020-08-28  7:59 ` [Intel-gfx] [PATCH i-g-t v5 11/21] tests/core_hotunplug: Recover from subtest failures Janusz Krzysztofik
2020-08-28  7:59 ` [Intel-gfx] [PATCH i-g-t v5 12/21] tests/core_hotunplug: Fail subtests on device close errors Janusz Krzysztofik
2020-08-28  7:59 ` [Intel-gfx] [PATCH i-g-t v5 13/21] tests/core_hotunplug: Let the driver time out essential sysfs operations Janusz Krzysztofik
2020-08-28  7:59 ` [Intel-gfx] [PATCH i-g-t v5 14/21] tests/core_hotunplug: Process return values of " Janusz Krzysztofik
2020-08-28  7:59 ` [Intel-gfx] [PATCH i-g-t v5 15/21] tests/core_hotunplug: Assert expected device presence/absence Janusz Krzysztofik
2020-08-28  7:59 ` [Intel-gfx] [PATCH i-g-t v5 16/21] tests/core_hotunplug: Explicitly ignore unused return values Janusz Krzysztofik
2020-08-28  7:59 ` [Intel-gfx] [PATCH i-g-t v5 17/21] tests/core_hotunplug: More thorough i915 healthcheck and recovery Janusz Krzysztofik
2020-08-28  7:59 ` [Intel-gfx] [PATCH i-g-t v5 18/21] tests/core_hotunplug: Add 'lateclose before restore' variants Janusz Krzysztofik
2020-08-28  7:59 ` [Intel-gfx] [PATCH i-g-t v5 19/21] tests/core_hotunplug: Duplicate debug messages in dmesg Janusz Krzysztofik
2020-08-28  7:59 ` [Intel-gfx] [PATCH i-g-t v5 20/21] tests/core_hotunplug: HSW audio issue workaround Janusz Krzysztofik
2020-08-28  7:59 ` [Intel-gfx] [PATCH i-g-t v5 21/21] tests/core_hotunplug: Un-blocklist *bind* subtests Janusz Krzysztofik
     [not found] ` <159861558801.4239.9465794510007938455@emeril.freedesktop.org>
2020-08-28 13:05   ` [Intel-gfx] ✗ Fi.CI.IGT: failure for tests/core_hotunplug: Fixes and enhancements (rev5) Janusz Krzysztofik
