* [PATCH i-g-t v4 1/2] lib: move i915_wedged_set to igt_gt.c
@ 2016-03-01 11:01 daniele.ceraolospurio
  2016-03-01 11:01 ` [PATCH i-g-t v4 2/2] tests/drv_hangman: test for acthd increasing through invalid VM space daniele.ceraolospurio
  0 siblings, 1 reply; 3+ messages in thread
From: daniele.ceraolospurio @ 2016-03-01 11:01 UTC
  To: intel-gfx

From: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>

Upcoming tests will call it to recover from bad states caused by
hangcheck bugs. The function is renamed to igt_force_gpu_reset so that
its name is closer to the other hang-related functions in the same file.

The value written to debugfs is also changed to -1; this makes no
difference with the current implementation but copes with upcoming
TDR changes (still under discussion) that should allow resetting a
mask of rings.

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
---
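A note on the -1 value, as a minimal sketch rather than the driver's actual
behaviour. It assumes the string written to i915_wedged is parsed as a signed
decimal and stored in an unsigned 64-bit variable, and that a future TDR
implementation would treat that variable as a ring mask (the TDR interface was
still under discussion). parse_wedged_value() and ring_selected() are
hypothetical names for illustration, not i915 or IGT API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Convert the string written to debugfs into an unsigned 64-bit value.
 * Assumption: the kernel side parses a signed decimal and stores it as u64. */
static uint64_t parse_wedged_value(const char *buf)
{
	return (uint64_t)strtoll(buf, NULL, 10);
}

/* Hypothetical mask test: is ring number `ring` selected by `mask`? */
static int ring_selected(uint64_t mask, unsigned int ring)
{
	assert(ring < 64); /* shifting by 64 or more is undefined */
	return (int)((mask >> ring) & 1);
}
```

Under these assumptions "1\n" selects only ring 0 while "-1\n" sets every bit
and so selects every ring. Both values are non-zero, so a plain "reset if
non-zero" check behaves the same for either.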
 lib/igt_gt.c    | 23 +++++++++++++++++++++++
 lib/igt_gt.h    |  2 ++
 tests/gem_eio.c | 17 +----------------
 3 files changed, 26 insertions(+), 16 deletions(-)

diff --git a/lib/igt_gt.c b/lib/igt_gt.c
index 9f8634b..7235519 100644
--- a/lib/igt_gt.c
+++ b/lib/igt_gt.c
@@ -269,6 +269,29 @@ void igt_post_hang_ring(int fd, struct igt_hang_ring arg)
 	}
 }
 
+/**
+ * igt_force_gpu_reset:
+ *
+ * Forces a GPU reset using the i915_wedged debugfs interface. To be used to
+ * recover from situations where the hangcheck didn't trigger and/or the GPU is
+ * stuck, either because the test manually disabled GPU resets or because the
+ * test hit a hangcheck bug.
+ */
+void igt_force_gpu_reset(void)
+{
+	int fd, ret;
+
+	igt_debug("Triggering GPU reset\n");
+
+	fd = igt_debugfs_open("i915_wedged", O_RDWR);
+	igt_require(fd >= 0);
+
+	ret = write(fd, "-1\n", 3);
+	close(fd);
+
+	igt_assert_eq(ret, 3);
+}
+
 /* GPU abusers */
 static struct igt_helper_process hang_helper;
 static void __attribute__((noreturn))
diff --git a/lib/igt_gt.h b/lib/igt_gt.h
index b7c5c4a..ad993c1 100644
--- a/lib/igt_gt.h
+++ b/lib/igt_gt.h
@@ -48,6 +48,8 @@ struct igt_hang_ring igt_hang_ctx(int fd,
 struct igt_hang_ring igt_hang_ring(int fd, int ring);
 void igt_post_hang_ring(int fd, struct igt_hang_ring arg);
 
+void igt_force_gpu_reset(void);
+
 void igt_fork_hang_helper(void);
 void igt_stop_hang_helper(void);
 
diff --git a/tests/gem_eio.c b/tests/gem_eio.c
index d209816..ab3facc 100644
--- a/tests/gem_eio.c
+++ b/tests/gem_eio.c
@@ -58,24 +58,9 @@ static bool i915_reset_control(bool enable)
 	return ret;
 }
 
-static bool i915_wedged_set(void)
-{
-	int fd, ret;
-
-	igt_debug("Triggering GPU reset\n");
-
-	fd = igt_debugfs_open("i915_wedged", O_RDWR);
-	igt_require(fd >= 0);
-
-	ret = write(fd, "1\n", 2) == 2;
-	close(fd);
-
-	return ret;
-}
-
 static void trigger_reset(int fd)
 {
-	igt_assert(i915_wedged_set());
+	igt_force_gpu_reset();
 
 	/* And just check the gpu is indeed running again */
 	igt_debug("Checking that the GPU recovered\n");
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* [PATCH i-g-t v4 2/2] tests/drv_hangman: test for acthd increasing through invalid VM space
  2016-03-01 11:01 [PATCH i-g-t v4 1/2] lib: move i915_wedged_set to igt_gt.c daniele.ceraolospurio
@ 2016-03-01 11:01 ` daniele.ceraolospurio
  2016-03-01 14:59   ` Mika Kuoppala
  0 siblings, 1 reply; 3+ messages in thread
From: daniele.ceraolospurio @ 2016-03-01 11:01 UTC
  To: intel-gfx

From: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>

The hangcheck logic will not flag a hang if acthd keeps increasing.
As a result, a malformed batch that jumps to an invalid offset in the
ppgtt can potentially continue executing through the whole address
space without triggering the hangcheck mechanism.

This patch adds a test to simulate the issue. On a BDW I kept the test
running for more than 10 minutes before killing it, and no hang was
detected. I sampled i915_hangcheck_info a few times during the run and
got the following:

Hangcheck active, fires in 468ms
render ring:
	seqno = fffff55e [current fffff55e]
	ACTHD = 0x47df685ecc [current 0x4926b81d90]
	max ACTHD = 0x47df685ecc
	score = 0
	action = 2
	instdone read = 0xffd7ffff 0xffffffff 0xffffffff 0xffffffff
	instdone accu = 0x00000000 0x00000000 0x00000000 0x00000000

Hangcheck active, fires in 424ms
render ring:
	seqno = fffff55e [current fffff55e]
	ACTHD = 0x6c953d3a34 [current 0x6de5e76fa4]
	max ACTHD = 0x6c953d3a34
	score = 0
	action = 2
	instdone read = 0xffd7ffff 0xffffffff 0xffffffff 0xffffffff
	instdone accu = 0x00000000 0x00000000 0x00000000 0x00000000

Hangcheck active, fires in 1692ms
render ring:
	seqno = fffff55e [current fffff55e]
	ACTHD = 0x1f49b0366dc [current 0x1f4dcbd88ec]
	max ACTHD = 0x1f49b0366dc
	score = 0
	action = 2
	instdone read = 0xffd7ffff 0xffffffff 0xffffffff 0xffffffff
	instdone accu = 0x00000000 0x00000000 0x00000000 0x00000000

v2: use the new gem_wait() function (Chris)

v3: switch to unterminated batch and rename test, remove redundant
    check, update test requirements (Chris), update top comment

v4: force gpu reset if the hang detection fails (Mika)

Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Arun Siluvery <arun.siluvery@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
---
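To illustrate why an ever-increasing ACTHD defeats the check, here is a toy
model of the heuristic described above. It is a simplified sketch, not the
i915 implementation: struct toy_hangcheck, toy_hangcheck_sample() and
TOY_HUNG_SCORE are hypothetical, and the real scoring is more involved. The
field names only mirror the i915_hangcheck_info dump.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy per-ring hangcheck state; field names mirror the debugfs dump. */
struct toy_hangcheck {
	uint32_t seqno;
	uint64_t max_acthd;
	int score;
};

#define TOY_HUNG_SCORE 3 /* hypothetical threshold, not the driver's value */

/* Feed one periodic sample; returns true once the ring is declared hung. */
static bool toy_hangcheck_sample(struct toy_hangcheck *hc,
				 uint32_t seqno, uint64_t acthd)
{
	if (seqno != hc->seqno) {
		/* A request completed: real progress, start over. */
		hc->seqno = seqno;
		hc->max_acthd = 0;
		hc->score = 0;
	} else if (acthd > hc->max_acthd) {
		/* Same seqno, but ACTHD moved forward: treated as progress. */
		hc->max_acthd = acthd;
	} else {
		/* No sign of progress: accumulate score. */
		hc->score++;
	}

	return hc->score >= TOY_HUNG_SCORE;
}
```

Feeding this model the three ACTHD samples from the commit message with an
unchanged seqno leaves the score at 0 every time, which matches the
"score = 0" lines in the dump. Only a sample whose ACTHD fails to advance
starts counting towards a hang.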
 tests/drv_hangman.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/tests/drv_hangman.c b/tests/drv_hangman.c
index a4f187a..a1a3faa 100644
--- a/tests/drv_hangman.c
+++ b/tests/drv_hangman.c
@@ -284,6 +284,47 @@ static void test_error_state_capture(unsigned ring_id,
 	check_error_state(gen, cmd_parser, ring_name, offset);
 }
 
+
+/* This test covers the case where we end up in an uninitialised area of the
+ * ppgtt and keep executing through it. This is particularly relevant if 48b
+ * ppgtt is enabled because the ppgtt is massively bigger compared to the 32b
+ * case and it takes a lot more time to wrap, so the acthd can potentially keep
+ * increasing for a long time
+ */
+#define NSEC_PER_SEC	1000000000LL
+static void hangcheck_unterminated(void)
+{
+	int fd;
+	/* timeout needs to be greater than ~5*hangcheck */
+	int64_t timeout_ns = 100 * NSEC_PER_SEC; /* 100 seconds */
+	struct drm_i915_gem_execbuffer2 execbuf;
+	struct drm_i915_gem_exec_object2 gem_exec;
+	uint32_t handle;
+
+	fd = drm_open_driver(DRIVER_INTEL);
+	igt_require(gem_uses_full_ppgtt(fd));
+	igt_require_hang_ring(fd, 0);
+
+	handle = gem_create(fd, 4096);
+
+	memset(&gem_exec, 0, sizeof(gem_exec));
+	gem_exec.handle = handle;
+
+	memset(&execbuf, 0, sizeof(execbuf));
+	execbuf.buffers_ptr = (uintptr_t)&gem_exec;
+	execbuf.buffer_count = 1;
+	execbuf.batch_len = 8;
+
+	gem_execbuf(fd, &execbuf);
+	if (gem_wait(fd, handle, &timeout_ns) != 0) {
+		/* need to manually trigger a reset to clean up before failing */
+		igt_force_gpu_reset();
+		igt_assert_f(0, "unterminated batch did not trigger a hang!\n");
+	}
+
+	close(fd);
+}
+
 igt_main
 {
 	const struct intel_execution_engine *e;
@@ -310,4 +351,7 @@ igt_main
 			test_error_state_capture(e->exec_id | e->flags,
 						 e->full_name);
 	}
+
+	igt_subtest("hangcheck-unterminated")
+		hangcheck_unterminated();
 }
-- 
1.9.1


* Re: [PATCH i-g-t v4 2/2] tests/drv_hangman: test for acthd increasing through invalid VM space
  2016-03-01 11:01 ` [PATCH i-g-t v4 2/2] tests/drv_hangman: test for acthd increasing through invalid VM space daniele.ceraolospurio
@ 2016-03-01 14:59   ` Mika Kuoppala
  0 siblings, 0 replies; 3+ messages in thread
From: Mika Kuoppala @ 2016-03-01 14:59 UTC
  To: daniele.ceraolospurio, intel-gfx

daniele.ceraolospurio@intel.com writes:

> From: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>
> The hangcheck logic will not flag a hang if acthd keeps increasing.
> However, if a malformed batch jumps to an invalid offset in the ppgtt it
> can potentially continue executing through the whole address space
> without triggering the hangcheck mechanism.
>
> This patch adds a test to simulate the issue. I've kept the test running
> for more than 10 minutes before killing it on a BDW and no hang occurred.
> I've sampled i915_hangcheck_info a few times during the run and got the
> following:
>
> Hangcheck active, fires in 468ms
> render ring:
> 	seqno = fffff55e [current fffff55e]
> 	ACTHD = 0x47df685ecc [current 0x4926b81d90]
> 	max ACTHD = 0x47df685ecc
> 	score = 0
> 	action = 2
> 	instdone read = 0xffd7ffff 0xffffffff 0xffffffff 0xffffffff
> 	instdone accu = 0x00000000 0x00000000 0x00000000 0x00000000
>
> Hangcheck active, fires in 424ms
> render ring:
> 	seqno = fffff55e [current fffff55e]
> 	ACTHD = 0x6c953d3a34 [current 0x6de5e76fa4]
> 	max ACTHD = 0x6c953d3a34
> 	score = 0
> 	action = 2
> 	instdone read = 0xffd7ffff 0xffffffff 0xffffffff 0xffffffff
> 	instdone accu = 0x00000000 0x00000000 0x00000000 0x00000000
>
> Hangcheck active, fires in 1692ms
> render ring:
> 	seqno = fffff55e [current fffff55e]
> 	ACTHD = 0x1f49b0366dc [current 0x1f4dcbd88ec]
> 	max ACTHD = 0x1f49b0366dc
> 	score = 0
> 	action = 2
> 	instdone read = 0xffd7ffff 0xffffffff 0xffffffff 0xffffffff
> 	instdone accu = 0x00000000 0x00000000 0x00000000 0x00000000
>
> v2: use the new gem_wait() function (Chris)
>
> v3: switch to unterminated batch and rename test, remove redundant
>     check, update test requirements (Chris), update top comment
>
> v4: force gpu reset if the hang detection fails (Mika)
>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Cc: Arun Siluvery <arun.siluvery@linux.intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Daniele Ceraolo Spurio
> <daniele.ceraolospurio@intel.com>

Both patches pushed. Thank you all.
-Mika

