* [PATCH i-g-t v2] tests/drv_hangman: test for acthd increasing through invalid VM space
@ 2016-02-25 10:32 daniele.ceraolospurio
2016-02-25 10:41 ` Chris Wilson
2016-02-25 15:19 ` [PATCH i-g-t v3] " daniele.ceraolospurio
0 siblings, 2 replies; 7+ messages in thread
From: daniele.ceraolospurio @ 2016-02-25 10:32 UTC
To: intel-gfx
From: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
The hangcheck logic will not flag a hang as long as acthd keeps increasing.
As a result, a malformed batch that jumps to an invalid offset in the ppgtt
can potentially continue executing through the whole address space
without triggering the hangcheck mechanism.
This patch adds a test to simulate the issue. On a BDW I kept the test
running for more than 10 minutes before killing it, and no hang was
detected. I sampled i915_hangcheck_info a few times during the run and
got the following:
Hangcheck active, fires in 468ms
render ring:
seqno = fffff55e [current fffff55e]
ACTHD = 0x47df685ecc [current 0x4926b81d90]
max ACTHD = 0x47df685ecc
score = 0
action = 2
instdone read = 0xffd7ffff 0xffffffff 0xffffffff 0xffffffff
instdone accu = 0x00000000 0x00000000 0x00000000 0x00000000
Hangcheck active, fires in 424ms
render ring:
seqno = fffff55e [current fffff55e]
ACTHD = 0x6c953d3a34 [current 0x6de5e76fa4]
max ACTHD = 0x6c953d3a34
score = 0
action = 2
instdone read = 0xffd7ffff 0xffffffff 0xffffffff 0xffffffff
instdone accu = 0x00000000 0x00000000 0x00000000 0x00000000
Hangcheck active, fires in 1692ms
render ring:
seqno = fffff55e [current fffff55e]
ACTHD = 0x1f49b0366dc [current 0x1f4dcbd88ec]
max ACTHD = 0x1f49b0366dc
score = 0
action = 2
instdone read = 0xffd7ffff 0xffffffff 0xffffffff 0xffffffff
instdone accu = 0x00000000 0x00000000 0x00000000 0x00000000
v2: use the new gem_wait() function (Chris)
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Arun Siluvery <arun.siluvery@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
---
tests/drv_hangman.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 50 insertions(+)
diff --git a/tests/drv_hangman.c b/tests/drv_hangman.c
index 8a465cf..4f396b9 100644
--- a/tests/drv_hangman.c
+++ b/tests/drv_hangman.c
@@ -288,6 +288,53 @@ static void test_error_state_capture(unsigned ring_id,
check_error_state(gen, cmd_parser, ring_name, offset);
}
+/* This test covers the case where we end up in an uninitialised area of the
+ * ppgtt at an offset greater than the one where the last buffer is mapped. This
+ * is particularly relevant if 48b ppgtt is enabled because the ppgtt is
+ * massively bigger compared to the 32b case and it takes a lot more time to
+ * wrap, so the acthd can potentially keep increasing for a long time
+ */
+#define NSEC_PER_SEC 1000000000L
+static void ppgtt_walking(void)
+{
+ int fd;
+ int64_t timeout_ns = 100 * NSEC_PER_SEC; /* 100 seconds */
+ struct drm_i915_gem_execbuffer2 execbuf;
+ struct drm_i915_gem_exec_object2 gem_exec;
+ uint32_t handle;
+ uint32_t batch[4];
+
+ fd = drm_open_driver(DRIVER_INTEL);
+ igt_require(gem_gtt_type(fd) > 2);
+
+ /* the batch will be mapped to an offset < 4GB because the flag to allow
+ * 48b offsets is not specified, so jump to address 0x00000001 00000000
+ */
+ batch[0] = MI_BATCH_BUFFER_START | 1;
+ batch[1] = 0;
+ batch[2] = 1;
+ batch[3] = MI_BATCH_BUFFER_END;
+
+ handle = gem_create(fd, 4096);
+ gem_write(fd, handle, 0, batch, sizeof(batch));
+
+ memset(&gem_exec, 0, sizeof(gem_exec));
+ gem_exec.handle = handle;
+
+ memset(&execbuf, 0, sizeof(execbuf));
+ execbuf.buffers_ptr = (uintptr_t)&gem_exec;
+ execbuf.buffer_count = 1;
+ execbuf.batch_len = 16;
+
+ gem_execbuf(fd, &execbuf);
+
+ igt_assert(gem_wait(fd, handle, &timeout_ns) == 0);
+ igt_assert(timeout_ns > 0);
+
+ gem_close(fd, handle);
+ close(fd);
+}
+
igt_main
{
const struct intel_execution_engine *e;
@@ -314,4 +361,7 @@ igt_main
test_error_state_capture(e->exec_id | e->flags,
e->full_name);
}
+
+ igt_subtest("ppgtt-walking")
+ ppgtt_walking();
}
--
1.9.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
* Re: [PATCH i-g-t v2] tests/drv_hangman: test for acthd increasing through invalid VM space
2016-02-25 10:32 [PATCH i-g-t v2] tests/drv_hangman: test for acthd increasing through invalid VM space daniele.ceraolospurio
@ 2016-02-25 10:41 ` Chris Wilson
2016-02-25 11:12 ` Daniele Ceraolo Spurio
2016-02-25 15:19 ` [PATCH i-g-t v3] " daniele.ceraolospurio
1 sibling, 1 reply; 7+ messages in thread
From: Chris Wilson @ 2016-02-25 10:41 UTC
To: daniele.ceraolospurio; +Cc: intel-gfx
On Thu, Feb 25, 2016 at 10:32:11AM +0000, daniele.ceraolospurio@intel.com wrote:
> +/* This test covers the case where we end up in an uninitialised area of the
> + * ppgtt at an offset greater than the one where the last buffer is mapped. This
> + * is particularly relevant if 48b ppgtt is enabled because the ppgtt is
> + * massively bigger compared to the 32b case and it takes a lot more time to
> + * wrap, so the acthd can potentially keep increasing for a long time
> + */
> +#define NSEC_PER_SEC 1000000000L
> +static void ppgtt_walking(void)
> +{
> + int fd;
> + int64_t timeout_ns = 100 * NSEC_PER_SEC; /* 100 seconds */
This needs a note that this has to be greater than ~5*hangcheck.
> + struct drm_i915_gem_execbuffer2 execbuf;
> + struct drm_i915_gem_exec_object2 gem_exec;
> + uint32_t handle;
> + uint32_t batch[4];
> +
> + fd = drm_open_driver(DRIVER_INTEL);
> + igt_require(gem_gtt_type(fd) > 2);
Nope, just full-ppgtt is required (and provides a sensible hangcheck
test if !48bit as well).
Note this does require that the hangcheck is enabled, so igt_require().
> +
> + /* the batch will be mapped to an offset < 4GB because the flag to allow
> + * 48b offsets is not specified, so jump to address 0x00000001 00000000
> + */
> + batch[0] = MI_BATCH_BUFFER_START | 1;
> + batch[1] = 0;
> + batch[2] = 1;
> + batch[3] = MI_BATCH_BUFFER_END;
The vm is entirely empty. Just submit an unterminated (empty) batch, and
it will walk from 0 to 1<<48bit and around and around and around and
around...
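As a rough illustration of why the 48-bit case matters, the sketch below
compares how long one pass over the address space takes for 32 and 48
address bits. The function names are hypothetical, and the walking rate is
an invented parameter, not something measured in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Size in bytes of an address space with the given number of address bits. */
uint64_t vm_size_bytes(unsigned int address_bits)
{
	assert(address_bits < 64);
	return UINT64_C(1) << address_bits;
}

/* Seconds needed to walk the whole space once at bytes_per_sec (hypothetical rate). */
uint64_t seconds_to_wrap(unsigned int address_bits, uint64_t bytes_per_sec)
{
	assert(bytes_per_sec > 0);
	return vm_size_bytes(address_bits) / bytes_per_sec;
}
```

With a made-up rate of 1 GiB/s, seconds_to_wrap(32, 1 << 30) is 4 seconds,
while seconds_to_wrap(48, 1 << 30) is 262144 seconds, roughly three days.
The ratio between the two is always 65536 regardless of the rate chosen.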
> +
> + handle = gem_create(fd, 4096);
> + gem_write(fd, handle, 0, batch, sizeof(batch));
> +
> + memset(&gem_exec, 0, sizeof(gem_exec));
> + gem_exec.handle = handle;
> +
> + memset(&execbuf, 0, sizeof(execbuf));
> + execbuf.buffers_ptr = (uintptr_t)&gem_exec;
> + execbuf.buffer_count = 1;
> + execbuf.batch_len = 16;
> +
> + gem_execbuf(fd, &execbuf);
> +
> + igt_assert(gem_wait(fd, handle, &timeout_ns) == 0);
igt_assert_eq(gem_wait(), 0); so you get the information about the
failure.
> + igt_assert(timeout_ns > 0);
Redundant. gem_wait() returns ETIME if we wait for timeout_ns without
completion.
> +
> + gem_close(fd, handle);
Irrelevant, it will be closed with close(fd).
> + close(fd);
> +}
> +
> igt_main
> {
> const struct intel_execution_engine *e;
> @@ -314,4 +361,7 @@ igt_main
> test_error_state_capture(e->exec_id | e->flags,
> e->full_name);
> }
> +
> + igt_subtest("ppgtt-walking")
> + ppgtt_walking();
This is a hangcheck test, "hangcheck-unterminated"
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
* Re: [PATCH i-g-t v2] tests/drv_hangman: test for acthd increasing through invalid VM space
2016-02-25 10:41 ` Chris Wilson
@ 2016-02-25 11:12 ` Daniele Ceraolo Spurio
2016-02-25 11:32 ` Chris Wilson
0 siblings, 1 reply; 7+ messages in thread
From: Daniele Ceraolo Spurio @ 2016-02-25 11:12 UTC
To: Chris Wilson, intel-gfx, Mika Kuoppala, Arun Siluvery
On 25/02/16 10:41, Chris Wilson wrote:
> On Thu, Feb 25, 2016 at 10:32:11AM +0000, daniele.ceraolospurio@intel.com wrote:
>> +/* This test covers the case where we end up in an uninitialised area of the
>> + * ppgtt at an offset greater than the one where the last buffer is mapped. This
>> + * is particularly relevant if 48b ppgtt is enabled because the ppgtt is
>> + * massively bigger compared to the 32b case and it takes a lot more time to
>> + * wrap, so the acthd can potentially keep increasing for a long time
>> + */
>> +#define NSEC_PER_SEC 1000000000L
>> +static void ppgtt_walking(void)
>> +{
>> + int fd;
>> + int64_t timeout_ns = 100 * NSEC_PER_SEC; /* 100 seconds */
> This needs a note that this has to be greater than ~5*hangcheck.
>
>> + struct drm_i915_gem_execbuffer2 execbuf;
>> + struct drm_i915_gem_exec_object2 gem_exec;
>> + uint32_t handle;
>> + uint32_t batch[4];
>> +
>> + fd = drm_open_driver(DRIVER_INTEL);
>> + igt_require(gem_gtt_type(fd) > 2);
> Nope, just full-ppgtt is required (and provides a sensible hangcheck
> test if !48bit as well).
>
> Note this does require that the hangcheck is enabled, so igt_require().
>
>> +
>> + /* the batch will be mapped to an offset < 4GB because the flag to allow
>> + * 48b offsets is not specified, so jump to address 0x00000001 00000000
>> + */
>> + batch[0] = MI_BATCH_BUFFER_START | 1;
>> + batch[1] = 0;
>> + batch[2] = 1;
>> + batch[3] = MI_BATCH_BUFFER_END;
> The vm is entirely empty. Just submit an unterminated (empty) batch, and
> it will walk from 0 to 1<<48bit and around and around and around and
> around...
I chose to jump instead of just leaving the batch unterminated to cover
the (rare) case where the rest of the allocated 4k of the batch contains
some random values, which could cause a hang and thus falsely pass the
test. I'll respin with a memset to 0 of the batch (plus all the other
suggested changes).
Thanks,
Daniele
>
>> +
>> + handle = gem_create(fd, 4096);
>> + gem_write(fd, handle, 0, batch, sizeof(batch));
>> +
>> + memset(&gem_exec, 0, sizeof(gem_exec));
>> + gem_exec.handle = handle;
>> +
>> + memset(&execbuf, 0, sizeof(execbuf));
>> + execbuf.buffers_ptr = (uintptr_t)&gem_exec;
>> + execbuf.buffer_count = 1;
>> + execbuf.batch_len = 16;
>> +
>> + gem_execbuf(fd, &execbuf);
>> +
>> + igt_assert(gem_wait(fd, handle, &timeout_ns) == 0);
> igt_assert_eq(gem_wait(), 0); so you get the information about the
> failure.
>
>> + igt_assert(timeout_ns > 0);
> Redundant. gem_wait() returns ETIME if we wait for timeout_ns without
> completion.
>
>> +
>> + gem_close(fd, handle);
> Irrelevant, it will be closed with close(fd).
>
>> + close(fd);
>> +}
>> +
>> igt_main
>> {
>> const struct intel_execution_engine *e;
>> @@ -314,4 +361,7 @@ igt_main
>> test_error_state_capture(e->exec_id | e->flags,
>> e->full_name);
>> }
>> +
>> + igt_subtest("ppgtt-walking")
>> + ppgtt_walking();
> This is a hangcheck test, "hangcheck-unterminated"
> -Chris
>
* Re: [PATCH i-g-t v2] tests/drv_hangman: test for acthd increasing through invalid VM space
2016-02-25 11:12 ` Daniele Ceraolo Spurio
@ 2016-02-25 11:32 ` Chris Wilson
2016-02-25 12:04 ` Daniele Ceraolo Spurio
0 siblings, 1 reply; 7+ messages in thread
From: Chris Wilson @ 2016-02-25 11:32 UTC
To: Daniele Ceraolo Spurio; +Cc: intel-gfx
On Thu, Feb 25, 2016 at 11:12:06AM +0000, Daniele Ceraolo Spurio wrote:
>
>
> On 25/02/16 10:41, Chris Wilson wrote:
> >On Thu, Feb 25, 2016 at 10:32:11AM +0000, daniele.ceraolospurio@intel.com wrote:
> >>+/* This test covers the case where we end up in an uninitialised area of the
> >>+ * ppgtt at an offset greater than the one where the last buffer is mapped. This
> >>+ * is particularly relevant if 48b ppgtt is enabled because the ppgtt is
> >>+ * massively bigger compared to the 32b case and it takes a lot more time to
> >>+ * wrap, so the acthd can potentially keep increasing for a long time
> >>+ */
> >>+#define NSEC_PER_SEC 1000000000L
> >>+static void ppgtt_walking(void)
> >>+{
> >>+ int fd;
> >>+ int64_t timeout_ns = 100 * NSEC_PER_SEC; /* 100 seconds */
> >This needs a note that this has to be greater than ~5*hangcheck.
> >
> >>+ struct drm_i915_gem_execbuffer2 execbuf;
> >>+ struct drm_i915_gem_exec_object2 gem_exec;
> >>+ uint32_t handle;
> >>+ uint32_t batch[4];
> >>+
> >>+ fd = drm_open_driver(DRIVER_INTEL);
> >>+ igt_require(gem_gtt_type(fd) > 2);
> >Nope, just full-ppgtt is required (and provides a sensible hangcheck
> >test if !48bit as well).
> >
> >Note this does require that the hangcheck is enabled, so igt_require().
> >
> >>+
> >>+ /* the batch will be mapped to an offset < 4GB because the flag to allow
> >>+ * 48b offsets is not specified, so jump to address 0x00000001 00000000
> >>+ */
> >>+ batch[0] = MI_BATCH_BUFFER_START | 1;
> >>+ batch[1] = 0;
> >>+ batch[2] = 1;
> >>+ batch[3] = MI_BATCH_BUFFER_END;
> >The vm is entirely empty. Just submit an unterminated (empty) batch, and
> >it will walk from 0 to 1<<48bit and around and around and around and
> >around...
>
> I chose to jump instead of just leaving the batch unterminated to
> cover the (rare) case where the rest of the allocated 4k of the
> batch contain some random values, which could cause a hang and thus
> falsely pass the test.
That would be a huge kernel bug. Freshly allocated buffers have to be
zero to avoid information leaks. I hope you are confusing allocating
from the userspace buffer cache with a fresh kernel allocation...
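A small sketch of why a zero-filled buffer is already an unterminated
batch. It assumes the encodings used in the i915 headers, where MI_NOOP is
0 and MI_BATCH_BUFFER_END is opcode 0x0a shifted left by 23; calloc() merely
stands in for a fresh kernel allocation, and the helper names are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* MI command encodings: opcode shifted left by 23 (assumed from i915 headers). */
#define MI_NOOP             (0x00u << 23)
#define MI_BATCH_BUFFER_END (0x0au << 23)

/* Returns true if any dword in the buffer is MI_BATCH_BUFFER_END. */
bool batch_is_terminated(const uint32_t *batch, size_t ndwords)
{
	assert(batch != NULL);

	for (size_t i = 0; i < ndwords; i++)
		if (batch[i] == MI_BATCH_BUFFER_END)
			return true;

	return false;
}

/* calloc() stands in for a freshly allocated, zero-filled 4 KiB object. */
uint32_t *alloc_zeroed_batch(size_t *ndwords)
{
	assert(ndwords != NULL);

	*ndwords = 4096 / sizeof(uint32_t);
	return calloc(*ndwords, sizeof(uint32_t));
}
```

Every dword of the zeroed buffer equals MI_NOOP, so batch_is_terminated()
returns false for it and the command streamer would simply run off the end.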
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
* Re: [PATCH i-g-t v2] tests/drv_hangman: test for acthd increasing through invalid VM space
2016-02-25 11:32 ` Chris Wilson
@ 2016-02-25 12:04 ` Daniele Ceraolo Spurio
0 siblings, 0 replies; 7+ messages in thread
From: Daniele Ceraolo Spurio @ 2016-02-25 12:04 UTC
To: Chris Wilson, intel-gfx, Mika Kuoppala, Arun Siluvery
On 25/02/16 11:32, Chris Wilson wrote:
> On Thu, Feb 25, 2016 at 11:12:06AM +0000, Daniele Ceraolo Spurio wrote:
>>
>> On 25/02/16 10:41, Chris Wilson wrote:
>>> On Thu, Feb 25, 2016 at 10:32:11AM +0000, daniele.ceraolospurio@intel.com wrote:
>>>> +/* This test covers the case where we end up in an uninitialised area of the
>>>> + * ppgtt at an offset greater than the one where the last buffer is mapped. This
>>>> + * is particularly relevant if 48b ppgtt is enabled because the ppgtt is
>>>> + * massively bigger compared to the 32b case and it takes a lot more time to
>>>> + * wrap, so the acthd can potentially keep increasing for a long time
>>>> + */
>>>> +#define NSEC_PER_SEC 1000000000L
>>>> +static void ppgtt_walking(void)
>>>> +{
>>>> + int fd;
>>>> + int64_t timeout_ns = 100 * NSEC_PER_SEC; /* 100 seconds */
>>> This needs a note that this has to be greater than ~5*hangcheck.
>>>
>>>> + struct drm_i915_gem_execbuffer2 execbuf;
>>>> + struct drm_i915_gem_exec_object2 gem_exec;
>>>> + uint32_t handle;
>>>> + uint32_t batch[4];
>>>> +
>>>> + fd = drm_open_driver(DRIVER_INTEL);
>>>> + igt_require(gem_gtt_type(fd) > 2);
>>> Nope, just full-ppgtt is required (and provides a sensible hangcheck
>>> test if !48bit as well).
>>>
>>> Note this does require that the hangcheck is enabled, so igt_require().
>>>
>>>> +
>>>> + /* the batch will be mapped to an offset < 4GB because the flag to allow
>>>> + * 48b offsets is not specified, so jump to address 0x00000001 00000000
>>>> + */
>>>> + batch[0] = MI_BATCH_BUFFER_START | 1;
>>>> + batch[1] = 0;
>>>> + batch[2] = 1;
>>>> + batch[3] = MI_BATCH_BUFFER_END;
>>> The vm is entirely empty. Just submit an unterminated (empty) batch, and
>>> it will walk from 0 to 1<<48bit and around and around and around and
>>> around...
>> I chose to jump instead of just leaving the batch unterminated to
>> cover the (rare) case where the rest of the allocated 4k of the
>> batch contain some random values, which could cause a hang and thus
>> falsely pass the test.
> That would be a huge kernel bug. Freshly allocated buffers have to be
> zero to avoid information leaks. I hope you are confusing allocating
> from the userspace buffer cache with a fresh kernel allocation...
> -Chris
>
Apologies for the confusion; you're correct, I was thinking about it at
the libdrm level rather than at the bare kernel level.
Daniele
* [PATCH i-g-t v3] tests/drv_hangman: test for acthd increasing through invalid VM space
2016-02-25 10:32 [PATCH i-g-t v2] tests/drv_hangman: test for acthd increasing through invalid VM space daniele.ceraolospurio
2016-02-25 10:41 ` Chris Wilson
@ 2016-02-25 15:19 ` daniele.ceraolospurio
2016-02-26 10:24 ` Mika Kuoppala
1 sibling, 1 reply; 7+ messages in thread
From: daniele.ceraolospurio @ 2016-02-25 15:19 UTC
To: intel-gfx
From: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
The hangcheck logic will not flag a hang as long as acthd keeps increasing.
As a result, a malformed batch that jumps to an invalid offset in the ppgtt
can potentially continue executing through the whole address space
without triggering the hangcheck mechanism.
This patch adds a test to simulate the issue. On a BDW I kept the test
running for more than 10 minutes before killing it, and no hang was
detected. I sampled i915_hangcheck_info a few times during the run and
got the following:
Hangcheck active, fires in 468ms
render ring:
seqno = fffff55e [current fffff55e]
ACTHD = 0x47df685ecc [current 0x4926b81d90]
max ACTHD = 0x47df685ecc
score = 0
action = 2
instdone read = 0xffd7ffff 0xffffffff 0xffffffff 0xffffffff
instdone accu = 0x00000000 0x00000000 0x00000000 0x00000000
Hangcheck active, fires in 424ms
render ring:
seqno = fffff55e [current fffff55e]
ACTHD = 0x6c953d3a34 [current 0x6de5e76fa4]
max ACTHD = 0x6c953d3a34
score = 0
action = 2
instdone read = 0xffd7ffff 0xffffffff 0xffffffff 0xffffffff
instdone accu = 0x00000000 0x00000000 0x00000000 0x00000000
Hangcheck active, fires in 1692ms
render ring:
seqno = fffff55e [current fffff55e]
ACTHD = 0x1f49b0366dc [current 0x1f4dcbd88ec]
max ACTHD = 0x1f49b0366dc
score = 0
action = 2
instdone read = 0xffd7ffff 0xffffffff 0xffffffff 0xffffffff
instdone accu = 0x00000000 0x00000000 0x00000000 0x00000000
v2: use the new gem_wait() function (Chris)
v3: switch to unterminated batch and rename test, remove redundant
check, update test requirements (Chris), update top comment
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Arun Siluvery <arun.siluvery@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
---
tests/drv_hangman.c | 39 +++++++++++++++++++++++++++++++++++++++
1 file changed, 39 insertions(+)
diff --git a/tests/drv_hangman.c b/tests/drv_hangman.c
index 8a465cf..2360f26 100644
--- a/tests/drv_hangman.c
+++ b/tests/drv_hangman.c
@@ -288,6 +288,42 @@ static void test_error_state_capture(unsigned ring_id,
check_error_state(gen, cmd_parser, ring_name, offset);
}
+/* This test covers the case where we end up in an uninitialised area of the
+ * ppgtt and keep executing through it. This is particularly relevant if 48b
+ * ppgtt is enabled because the ppgtt is massively bigger compared to the 32b
+ * case and it takes a lot more time to wrap, so the acthd can potentially keep
+ * increasing for a long time
+ */
+#define NSEC_PER_SEC 1000000000L
+static void hangcheck_unterminated(void)
+{
+ int fd;
+ /* timeout needs to be greater than ~5*hangcheck */
+ int64_t timeout_ns = 100 * NSEC_PER_SEC; /* 100 seconds */
+ struct drm_i915_gem_execbuffer2 execbuf;
+ struct drm_i915_gem_exec_object2 gem_exec;
+ uint32_t handle;
+
+ fd = drm_open_driver(DRIVER_INTEL);
+ igt_require(gem_uses_full_ppgtt(fd));
+ igt_require_hang_ring(fd, 0);
+
+ handle = gem_create(fd, 4096);
+
+ memset(&gem_exec, 0, sizeof(gem_exec));
+ gem_exec.handle = handle;
+
+ memset(&execbuf, 0, sizeof(execbuf));
+ execbuf.buffers_ptr = (uintptr_t)&gem_exec;
+ execbuf.buffer_count = 1;
+ execbuf.batch_len = 8;
+
+ gem_execbuf(fd, &execbuf);
+ igt_assert_eq(gem_wait(fd, handle, &timeout_ns), 0);
+
+ close(fd);
+}
+
igt_main
{
const struct intel_execution_engine *e;
@@ -314,4 +350,7 @@ igt_main
test_error_state_capture(e->exec_id | e->flags,
e->full_name);
}
+
+ igt_subtest("hangcheck-unterminated")
+ hangcheck_unterminated();
}
--
1.9.1
* Re: [PATCH i-g-t v3] tests/drv_hangman: test for acthd increasing through invalid VM space
2016-02-25 15:19 ` [PATCH i-g-t v3] " daniele.ceraolospurio
@ 2016-02-26 10:24 ` Mika Kuoppala
0 siblings, 0 replies; 7+ messages in thread
From: Mika Kuoppala @ 2016-02-26 10:24 UTC
To: daniele.ceraolospurio, intel-gfx
daniele.ceraolospurio@intel.com writes:
> From: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>
> The hangcheck logic will not flag an hang if acthd keeps increasing.
> However, if a malformed batch jumps to an invalid offset in the ppgtt it
> can potentially continue executing through the whole address space
> without triggering the hangcheck mechanism.
>
> This patch adds a test to simulate the issue. I've kept the test running
> for more than 10 minutes before killing it on a BDW and no hang occurred.
> I've sampled i915_hangcheck_info a few times during the run and got the
> following:
>
> Hangcheck active, fires in 468ms
> render ring:
> seqno = fffff55e [current fffff55e]
> ACTHD = 0x47df685ecc [current 0x4926b81d90]
> max ACTHD = 0x47df685ecc
> score = 0
> action = 2
> instdone read = 0xffd7ffff 0xffffffff 0xffffffff 0xffffffff
> instdone accu = 0x00000000 0x00000000 0x00000000 0x00000000
>
> Hangcheck active, fires in 424ms
> render ring:
> seqno = fffff55e [current fffff55e]
> ACTHD = 0x6c953d3a34 [current 0x6de5e76fa4]
> max ACTHD = 0x6c953d3a34
> score = 0
> action = 2
> instdone read = 0xffd7ffff 0xffffffff 0xffffffff 0xffffffff
> instdone accu = 0x00000000 0x00000000 0x00000000 0x00000000
>
> Hangcheck active, fires in 1692ms
> render ring:
> seqno = fffff55e [current fffff55e]
> ACTHD = 0x1f49b0366dc [current 0x1f4dcbd88ec]
> max ACTHD = 0x1f49b0366dc
> score = 0
> action = 2
> instdone read = 0xffd7ffff 0xffffffff 0xffffffff 0xffffffff
> instdone accu = 0x00000000 0x00000000 0x00000000 0x00000000
>
> v2: use the new gem_wait() function (Chris)
>
> v3: switch to unterminated batch and rename test, remove redundant
> check, update test requirements (Chris), update top comment
>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Cc: Arun Siluvery <arun.siluvery@linux.intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> ---
> tests/drv_hangman.c | 39 +++++++++++++++++++++++++++++++++++++++
> 1 file changed, 39 insertions(+)
>
> diff --git a/tests/drv_hangman.c b/tests/drv_hangman.c
> index 8a465cf..2360f26 100644
> --- a/tests/drv_hangman.c
> +++ b/tests/drv_hangman.c
> @@ -288,6 +288,42 @@ static void test_error_state_capture(unsigned ring_id,
> check_error_state(gen, cmd_parser, ring_name, offset);
> }
>
> +/* This test covers the case where we end up in an uninitialised area of the
> + * ppgtt and keep executing through it. This is particularly relevant if 48b
> + * ppgtt is enabled because the ppgtt is massively bigger compared to the 32b
> + * case and it takes a lot more time to wrap, so the acthd can potentially keep
> + * increasing for a long time
> + */
> +#define NSEC_PER_SEC 1000000000L
> +static void hangcheck_unterminated(void)
> +{
> + int fd;
> + /* timeout needs to be greater than ~5*hangcheck */
> + int64_t timeout_ns = 100 * NSEC_PER_SEC; /* 100 seconds */
> + struct drm_i915_gem_execbuffer2 execbuf;
> + struct drm_i915_gem_exec_object2 gem_exec;
> + uint32_t handle;
> +
> + fd = drm_open_driver(DRIVER_INTEL);
> + igt_require(gem_uses_full_ppgtt(fd));
> + igt_require_hang_ring(fd, 0);
> +
> + handle = gem_create(fd, 4096);
> +
> + memset(&gem_exec, 0, sizeof(gem_exec));
> + gem_exec.handle = handle;
> +
> + memset(&execbuf, 0, sizeof(execbuf));
> + execbuf.buffers_ptr = (uintptr_t)&gem_exec;
> + execbuf.buffer_count = 1;
> + execbuf.batch_len = 8;
> +
> + gem_execbuf(fd, &execbuf);
> + igt_assert_eq(gem_wait(fd, handle, &timeout_ns), 0);
Chris pointed out on IRC that if we end up timing out, the runaway head
is still progressing.
To make the GPU usable again, we need to force a reset of the GPU.
Stopping the rings doesn't help at that point, so forcing it with
'echo 1 > i915_wedged_set' should do the trick.
-Mika
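The manual step above could be done from C roughly as follows. This is only
a sketch: the helper name is hypothetical, and the path of the debugfs
control file is left as a parameter because its exact name and mount point
are assumptions that depend on the kernel and the system.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Write value to the file at path; returns 0 on success or -errno on failure. */
int write_control_file(const char *path, const char *value)
{
	assert(path != NULL && value != NULL);

	int fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	size_t len = strlen(value);
	ssize_t written = write(fd, value, len);
	int ret;

	if (written < 0)
		ret = -errno;          /* write failed outright */
	else if ((size_t)written != len)
		ret = -EIO;            /* short write: report as an I/O error */
	else
		ret = 0;

	close(fd);
	return ret;
}
```

A test would call this only on the failure path, after the wait has timed
out, so that later subtests start from a usable GPU.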
> +
> + close(fd);
> +}
> +
> igt_main
> {
> const struct intel_execution_engine *e;
> @@ -314,4 +350,7 @@ igt_main
> test_error_state_capture(e->exec_id | e->flags,
> e->full_name);
> }
> +
> + igt_subtest("hangcheck-unterminated")
> + hangcheck_unterminated();
> }
> --
> 1.9.1