* [PATCH] drm/i915: Allow unready gpu to be reset on gen8
@ 2015-10-30 14:43 Mika Kuoppala
2015-10-30 14:58 ` Chris Wilson
0 siblings, 1 reply; 13+ messages in thread
From: Mika Kuoppala @ 2015-10-30 14:43 UTC (permalink / raw)
To: intel-gfx; +Cc: Daniel Vetter, Tomi Sarvela
Gen9 has had demonstrated cases where forcing a not ready gpu
into reset has caused system hang [1].
Gen8 has never to this date demonstrated such behaviour.
In our CI tests bsw sometimes ends up in a state where it claims it
is not ready for reset, based on reset request, after gpu hang.
Allow gen8 to reset even after claims of nonreadiness in order
to keep the gpu accessible. Enhance logging so that it will be
clear what conditions led to decision of proceeding or bailing out,
so that we will spot if this way of forcing our will against gpu turns
out to be foolhardy.
References [1]: https://bugs.freedesktop.org/show_bug.cgi?id=89959
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Tomi Sarvela <tomix.p.sarvela@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
---
drivers/gpu/drm/i915/intel_uncore.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
index f0f97b2..47c17f2 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -1504,7 +1504,14 @@ not_ready:
I915_WRITE(RING_RESET_CTL(engine->mmio_base),
_MASKED_BIT_DISABLE(RESET_CTL_REQUEST_RESET));
- return -EIO;
+ if (INTEL_INFO(dev)->gen == 9) {
+ DRM_ERROR("Reset would risk system stability, bailing out\n");
+ return -EIO;
+ }
+
+ DRM_ERROR("Forcing non ready gpu into reset\n");
+
+ return gen6_do_reset(dev);
}
static int (*intel_get_gpu_reset(struct drm_device *dev))(struct drm_device *)
--
2.5.0
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH] drm/i915: Allow unready gpu to be reset on gen8
2015-10-30 14:43 [PATCH] drm/i915: Allow unready gpu to be reset on gen8 Mika Kuoppala
@ 2015-10-30 14:58 ` Chris Wilson
2015-10-30 15:18 ` Mika Kuoppala
0 siblings, 1 reply; 13+ messages in thread
From: Chris Wilson @ 2015-10-30 14:58 UTC (permalink / raw)
To: Mika Kuoppala; +Cc: Daniel Vetter, intel-gfx, Tomi Sarvela
On Fri, Oct 30, 2015 at 04:43:49PM +0200, Mika Kuoppala wrote:
> Gen9 has had demonstrated cases where forcing a not ready gpu
> into reset has caused system hang [1].
>
> Gen8 has never to this date demonstrated such behaviour.
>
> In our CI tests bsw sometimes ends up in a state where it claims it
> is not ready for reset, based on reset request, after gpu hang.
>
> Allow gen8 to reset even after claims of nonreadiness in order
> to keep the gpu accessible. Enhance logging so that it will be
> clear what conditions led to decision of proceeding or bailing out,
> so that we will spot if this way of forcing our will against gpu turns
> out to be foolhardy.
>
> References [1]: https://bugs.freedesktop.org/show_bug.cgi?id=89959
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Tomi Sarvela <tomix.p.sarvela@intel.com>
> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
> ---
> drivers/gpu/drm/i915/intel_uncore.c | 9 ++++++++-
> 1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
> index f0f97b2..47c17f2 100644
> --- a/drivers/gpu/drm/i915/intel_uncore.c
> +++ b/drivers/gpu/drm/i915/intel_uncore.c
> @@ -1504,7 +1504,14 @@ not_ready:
> I915_WRITE(RING_RESET_CTL(engine->mmio_base),
> _MASKED_BIT_DISABLE(RESET_CTL_REQUEST_RESET));
>
> - return -EIO;
Where's the reference for where we hit this EIO on gen8?
> + if (INTEL_INFO(dev)->gen == 9) {
> + DRM_ERROR("Reset would risk system stability, bailing out\n");
> + return -EIO;
> + }
> +
> + DRM_ERROR("Forcing non ready gpu into reset\n");
> +
> + return gen6_do_reset(dev);
> }
>
> static int (*intel_get_gpu_reset(struct drm_device *dev))(struct drm_device *)
> --
> 2.5.0
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
--
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] drm/i915: Allow unready gpu to be reset on gen8
2015-10-30 14:58 ` Chris Wilson
@ 2015-10-30 15:18 ` Mika Kuoppala
2015-10-30 15:28 ` Chris Wilson
0 siblings, 1 reply; 13+ messages in thread
From: Mika Kuoppala @ 2015-10-30 15:18 UTC (permalink / raw)
To: Chris Wilson; +Cc: Daniel Vetter, intel-gfx, Tomi Sarvela
Chris Wilson <chris@chris-wilson.co.uk> writes:
> On Fri, Oct 30, 2015 at 04:43:49PM +0200, Mika Kuoppala wrote:
>> Gen9 has had demonstrated cases where forcing a not ready gpu
>> into reset has caused system hang [1].
>>
>> Gen8 has never to this date demonstrated such behaviour.
>>
>> In our CI tests bsw sometimes ends up in a state where it claims it
>> is not ready for reset, based on reset request, after gpu hang.
>>
>> Allow gen8 to reset even after claims of nonreadiness in order
>> to keep the gpu accessible. Enhance logging so that it will be
>> clear what conditions led to decision of proceeding or bailing out,
>> so that we will spot if this way of forcing our will against gpu turns
>> out to be foolhardy.
>>
>> References [1]: https://bugs.freedesktop.org/show_bug.cgi?id=89959
>> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
>> Cc: Tomi Sarvela <tomix.p.sarvela@intel.com>
>> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
>> ---
>> drivers/gpu/drm/i915/intel_uncore.c | 9 ++++++++-
>> 1 file changed, 8 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
>> index f0f97b2..47c17f2 100644
>> --- a/drivers/gpu/drm/i915/intel_uncore.c
>> +++ b/drivers/gpu/drm/i915/intel_uncore.c
>> @@ -1504,7 +1504,14 @@ not_ready:
>> I915_WRITE(RING_RESET_CTL(engine->mmio_base),
>> _MASKED_BIT_DISABLE(RESET_CTL_REQUEST_RESET));
>>
>> - return -EIO;
>
> Where's the reference for where we hit this EIO on gen8?
>
Internal CI logs, relevant part cutpasted below. If you want
full log holler me in irc.
[ 119.147727] kms_pipe_crc_basic: starting subtest hang-read-crc-pipe-A
[ 124.785063] [drm] stuck on render ring
[ 124.800850] [drm] GPU HANG: ecode 8:0:0xfffffffe, in kms_pipe_crc_ba
[5590], reason: Ring hung, action: reset
[ 124.801154] [drm] GPU hangs can indicate a bug anywhere in the entire
gfx stack, including userspace.
[ 124.801161] [drm] Please file a _new_ bug report on
bugs.freedesktop.org against DRI -> DRM/Intel
[ 124.801167] [drm] drm/i915 developers can then reassign to the right
component if it's not a kernel issue.
[ 124.801173] [drm] The gpu crash dump is required to analyze gpu
hangs, so please always attach it.
[ 124.801179] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[ 124.801785] kobject: 'card0' (ffff880174ad92a0): kobject_uevent_env
[ 124.801940] kobject: 'card0' (ffff880174ad92a0): fill_kobj_path: path
= '/devices/pci0000:00/0000:00:02.0/drm/card0'
[ 124.805032] kobject: 'card0' (ffff880174ad92a0): kobject_uevent_env
[ 124.805089] kobject: 'card0' (ffff880174ad92a0): fill_kobj_path: path
= '/devices/pci0000:00/0000:00:02.0/drm/card0'
[ 125.511744] [drm:gen8_do_reset [i915]] *ERROR* render ring: reset
request timeout
[ 125.511922] [drm] Simulated gpu hang, resetting stop_rings
[ 125.511927] drm/i915: Resetting chip after gpu hang
[ 125.511954] [drm:i915_reset [i915]] *ERROR* Failed to reset chip: -5
[ 125.637612] kms_pipe_crc_basic: exiting, ret=0
[ 125.653608] [drm:intel_lr_context_deferred_alloc [i915]] *ERROR* ring
create req: -5
[ 125.847695] gem_ctx_param_basic: executing
[ 125.850086] [drm:intel_lr_context_deferred_alloc [i915]] *ERROR* ring
create req: -5
[ 125.854482] gem_ctx_param_basic: exiting, ret=99
[ 126.038693] kms_addfb_basic: executing
[ 126.041754] [drm:intel_lr_context_deferred_alloc [i915]] *ERROR* ring
create req: -5
-Mika
>> + if (INTEL_INFO(dev)->gen == 9) {
>> + DRM_ERROR("Reset would risk system stability, bailing out\n");
>> + return -EIO;
>> + }
>> +
>> + DRM_ERROR("Forcing non ready gpu into reset\n");
>> +
>> + return gen6_do_reset(dev);
>> }
>>
>> static int (*intel_get_gpu_reset(struct drm_device *dev))(struct drm_device *)
>> --
>> 2.5.0
>>
>> _______________________________________________
>> Intel-gfx mailing list
>> Intel-gfx@lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
>
> --
> Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] drm/i915: Allow unready gpu to be reset on gen8
2015-10-30 15:18 ` Mika Kuoppala
@ 2015-10-30 15:28 ` Chris Wilson
2015-10-30 16:07 ` Mika Kuoppala
2015-11-02 9:25 ` Mika Kuoppala
0 siblings, 2 replies; 13+ messages in thread
From: Chris Wilson @ 2015-10-30 15:28 UTC (permalink / raw)
To: Mika Kuoppala; +Cc: Daniel Vetter, intel-gfx, Tomi Sarvela
On Fri, Oct 30, 2015 at 05:18:18PM +0200, Mika Kuoppala wrote:
> Chris Wilson <chris@chris-wilson.co.uk> writes:
>
> > On Fri, Oct 30, 2015 at 04:43:49PM +0200, Mika Kuoppala wrote:
> >> Gen9 has had demonstrated cases where forcing a not ready gpu
> >> into reset has caused system hang [1].
> >>
> >> Gen8 has never to this date demonstrated such behaviour.
> >>
> >> In our CI tests bsw sometimes ends up in a state where it claims it
> >> is not ready for reset, based on reset request, after gpu hang.
> >>
> >> Allow gen8 to reset even after claims of nonreadiness in order
> >> to keep the gpu accessible. Enhance logging so that it will be
> >> clear what conditions led to decision of proceeding or bailing out,
> >> so that we will spot if this way of forcing our will against gpu turns
> >> out to be foolhardy.
> >>
> >> References [1]: https://bugs.freedesktop.org/show_bug.cgi?id=89959
> >> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> >> Cc: Tomi Sarvela <tomix.p.sarvela@intel.com>
> >> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
> >> ---
> >> drivers/gpu/drm/i915/intel_uncore.c | 9 ++++++++-
> >> 1 file changed, 8 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
> >> index f0f97b2..47c17f2 100644
> >> --- a/drivers/gpu/drm/i915/intel_uncore.c
> >> +++ b/drivers/gpu/drm/i915/intel_uncore.c
> >> @@ -1504,7 +1504,14 @@ not_ready:
> >> I915_WRITE(RING_RESET_CTL(engine->mmio_base),
> >> _MASKED_BIT_DISABLE(RESET_CTL_REQUEST_RESET));
> >>
> >> - return -EIO;
> >
> > Where's the reference for where we hit this EIO on gen8?
> >
>
> Internal CI logs, relevant part cutpasted below. If you want
> full log holler me in irc.
So you are saying that's there no bugzilla for this... :-p
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] drm/i915: Allow unready gpu to be reset on gen8
2015-10-30 15:28 ` Chris Wilson
@ 2015-10-30 16:07 ` Mika Kuoppala
2015-11-02 8:35 ` Sarvela, TomiX P
2015-11-02 9:25 ` Mika Kuoppala
1 sibling, 1 reply; 13+ messages in thread
From: Mika Kuoppala @ 2015-10-30 16:07 UTC (permalink / raw)
To: Chris Wilson; +Cc: Daniel Vetter, intel-gfx, Tomi Sarvela
Chris Wilson <chris@chris-wilson.co.uk> writes:
> On Fri, Oct 30, 2015 at 05:18:18PM +0200, Mika Kuoppala wrote:
>> Chris Wilson <chris@chris-wilson.co.uk> writes:
>>
>> > On Fri, Oct 30, 2015 at 04:43:49PM +0200, Mika Kuoppala wrote:
>> >> Gen9 has had demonstrated cases where forcing a not ready gpu
>> >> into reset has caused system hang [1].
>> >>
>> >> Gen8 has never to this date demonstrated such behaviour.
>> >>
>> >> In our CI tests bsw sometimes ends up in a state where it claims it
>> >> is not ready for reset, based on reset request, after gpu hang.
>> >>
>> >> Allow gen8 to reset even after claims of nonreadiness in order
>> >> to keep the gpu accessible. Enhance logging so that it will be
>> >> clear what conditions led to decision of proceeding or bailing out,
>> >> so that we will spot if this way of forcing our will against gpu turns
>> >> out to be foolhardy.
>> >>
>> >> References [1]: https://bugs.freedesktop.org/show_bug.cgi?id=89959
>> >> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
>> >> Cc: Tomi Sarvela <tomix.p.sarvela@intel.com>
>> >> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
>> >> ---
>> >> drivers/gpu/drm/i915/intel_uncore.c | 9 ++++++++-
>> >> 1 file changed, 8 insertions(+), 1 deletion(-)
>> >>
>> >> diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
>> >> index f0f97b2..47c17f2 100644
>> >> --- a/drivers/gpu/drm/i915/intel_uncore.c
>> >> +++ b/drivers/gpu/drm/i915/intel_uncore.c
>> >> @@ -1504,7 +1504,14 @@ not_ready:
>> >> I915_WRITE(RING_RESET_CTL(engine->mmio_base),
>> >> _MASKED_BIT_DISABLE(RESET_CTL_REQUEST_RESET));
>> >>
>> >> - return -EIO;
>> >
>> > Where's the reference for where we hit this EIO on gen8?
>> >
>>
>> Internal CI logs, relevant part cutpasted below. If you want
>> full log holler me in irc.
>
> So you are saying that's there no bugzilla for this... :-p
Bugzilla fairy might surprise us after a good weekend rest.
-Mika
> -Chris
>
> --
> Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] drm/i915: Allow unready gpu to be reset on gen8
2015-10-30 16:07 ` Mika Kuoppala
@ 2015-11-02 8:35 ` Sarvela, TomiX P
0 siblings, 0 replies; 13+ messages in thread
From: Sarvela, TomiX P @ 2015-11-02 8:35 UTC (permalink / raw)
To: Mika Kuoppala, Chris Wilson
Cc: Daniel Vetter, intel-gfx@lists.freedesktop.org
> From: Mika Kuoppala [mailto:mika.kuoppala@linux.intel.com]
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> > So you are saying that's there no bugzilla for this... :-p
>
> Bugzilla fairy might surprise us after a good weekend rest.
https://bugs.freedesktop.org/show_bug.cgi?id=92774
Regards,
Tomi
---------------------------------------------------------------------
Intel Finland Oy
Registered Address: PL 281, 00181 Helsinki
Business Identity Code: 0357606 - 4
Domiciled in Helsinki
This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH] drm/i915: Allow unready gpu to be reset on gen8
2015-10-30 15:28 ` Chris Wilson
2015-10-30 16:07 ` Mika Kuoppala
@ 2015-11-02 9:25 ` Mika Kuoppala
2015-11-17 17:40 ` Daniel Vetter
1 sibling, 1 reply; 13+ messages in thread
From: Mika Kuoppala @ 2015-11-02 9:25 UTC (permalink / raw)
To: intel-gfx; +Cc: Daniel Vetter, Tomi Sarvela
Gen9 has had demonstrated cases where forcing a not ready gpu
into reset has caused system hang [1].
Gen8 has never to this date demonstrated such behaviour.
In our CI tests there have been two cases of bsw ending in a state
where it claims it is not ready for reset, based on reset request,
after gpu hang [2]. Both of these cases have happened with kernel
where there are lots of debugs enabled. So it is possible that
something timing related is at play here like that wait_for_register()
usleep wakeups collide badly with forcewake.
If we assume that gen8 is safe to reset even with claims of nonreadiness,
we can alleviate this situations by forcing a reset and revive the gpu.
Enhance logging so that it will be clear what conditions led to decision
of proceeding or bailing out, so that we will spot if this way of forcing
our will against gpu turns out to be foolhardy.
v2: - add bugzilla entry for bsw behaviour (Chris)
- improve commit message
References [1]: https://bugs.freedesktop.org/show_bug.cgi?id=89959
References [2]: https://bugs.freedesktop.org/show_bug.cgi?id=92774
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Tomi Sarvela <tomix.p.sarvela@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
---
drivers/gpu/drm/i915/intel_uncore.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
index f0f97b2..47c17f2 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -1504,7 +1504,14 @@ not_ready:
I915_WRITE(RING_RESET_CTL(engine->mmio_base),
_MASKED_BIT_DISABLE(RESET_CTL_REQUEST_RESET));
- return -EIO;
+ if (INTEL_INFO(dev)->gen == 9) {
+ DRM_ERROR("Reset would risk system stability, bailing out\n");
+ return -EIO;
+ }
+
+ DRM_ERROR("Forcing non ready gpu into reset\n");
+
+ return gen6_do_reset(dev);
}
static int (*intel_get_gpu_reset(struct drm_device *dev))(struct drm_device *)
--
2.5.0
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH] drm/i915: Allow unready gpu to be reset on gen8
2015-11-02 9:25 ` Mika Kuoppala
@ 2015-11-17 17:40 ` Daniel Vetter
2015-11-19 9:41 ` Mika Kuoppala
0 siblings, 1 reply; 13+ messages in thread
From: Daniel Vetter @ 2015-11-17 17:40 UTC (permalink / raw)
To: Mika Kuoppala; +Cc: Daniel Vetter, intel-gfx, Tomi Sarvela
On Mon, Nov 02, 2015 at 11:25:08AM +0200, Mika Kuoppala wrote:
> Gen9 has had demonstrated cases where forcing a not ready gpu
> into reset has caused system hang [1].
>
> Gen8 has never to this date demonstrated such behaviour.
>
> In our CI tests there have been two cases of bsw ending in a state
> where it claims it is not ready for reset, based on reset request,
> after gpu hang [2]. Both of these cases have happened with kernel
> where there are lots of debugs enabled. So it is possible that
> something timing related is at play here like that wait_for_register()
> usleep wakeups collide badly with forcewake.
>
> If we assume that gen8 is safe to reset even with claims of nonreadiness,
> we can alleviate this situations by forcing a reset and revive the gpu.
> Enhance logging so that it will be clear what conditions led to decision
> of proceeding or bailing out, so that we will spot if this way of forcing
> our will against gpu turns out to be foolhardy.
>
> v2: - add bugzilla entry for bsw behaviour (Chris)
> - improve commit message
>
> References [1]: https://bugs.freedesktop.org/show_bug.cgi?id=89959
> References [2]: https://bugs.freedesktop.org/show_bug.cgi?id=92774
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Tomi Sarvela <tomix.p.sarvela@intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
> ---
> drivers/gpu/drm/i915/intel_uncore.c | 9 ++++++++-
> 1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
> index f0f97b2..47c17f2 100644
> --- a/drivers/gpu/drm/i915/intel_uncore.c
> +++ b/drivers/gpu/drm/i915/intel_uncore.c
> @@ -1504,7 +1504,14 @@ not_ready:
> I915_WRITE(RING_RESET_CTL(engine->mmio_base),
> _MASKED_BIT_DISABLE(RESET_CTL_REQUEST_RESET));
>
> - return -EIO;
> + if (INTEL_INFO(dev)->gen == 9) {
> + DRM_ERROR("Reset would risk system stability, bailing out\n");
> + return -EIO;
> + }
> +
> + DRM_ERROR("Forcing non ready gpu into reset\n");
DRM_ERROR will cause dmes-warn failures when running igt. If we agree that
this is ok, it should be at most DRM_INFO. I'd got with DRM_DEBUG_DRIVER.
Ack with that change.
-Daniel
> +
> + return gen6_do_reset(dev);
> }
>
> static int (*intel_get_gpu_reset(struct drm_device *dev))(struct drm_device *)
> --
> 2.5.0
>
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] drm/i915: Allow unready gpu to be reset on gen8
2015-11-17 17:40 ` Daniel Vetter
@ 2015-11-19 9:41 ` Mika Kuoppala
2015-11-19 15:33 ` Daniel Vetter
0 siblings, 1 reply; 13+ messages in thread
From: Mika Kuoppala @ 2015-11-19 9:41 UTC (permalink / raw)
To: Daniel Vetter; +Cc: Daniel Vetter, intel-gfx, Tomi Sarvela
Daniel Vetter <daniel@ffwll.ch> writes:
> On Mon, Nov 02, 2015 at 11:25:08AM +0200, Mika Kuoppala wrote:
>> Gen9 has had demonstrated cases where forcing a not ready gpu
>> into reset has caused system hang [1].
>>
>> Gen8 has never to this date demonstrated such behaviour.
>>
>> In our CI tests there have been two cases of bsw ending in a state
>> where it claims it is not ready for reset, based on reset request,
>> after gpu hang [2]. Both of these cases have happened with kernel
>> where there are lots of debugs enabled. So it is possible that
>> something timing related is at play here like that wait_for_register()
>> usleep wakeups collide badly with forcewake.
>>
>> If we assume that gen8 is safe to reset even with claims of nonreadiness,
>> we can alleviate this situations by forcing a reset and revive the gpu.
>> Enhance logging so that it will be clear what conditions led to decision
>> of proceeding or bailing out, so that we will spot if this way of forcing
>> our will against gpu turns out to be foolhardy.
>>
>> v2: - add bugzilla entry for bsw behaviour (Chris)
>> - improve commit message
>>
>> References [1]: https://bugs.freedesktop.org/show_bug.cgi?id=89959
>> References [2]: https://bugs.freedesktop.org/show_bug.cgi?id=92774
>> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
>> Cc: Tomi Sarvela <tomix.p.sarvela@intel.com>
>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
>> ---
>> drivers/gpu/drm/i915/intel_uncore.c | 9 ++++++++-
>> 1 file changed, 8 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
>> index f0f97b2..47c17f2 100644
>> --- a/drivers/gpu/drm/i915/intel_uncore.c
>> +++ b/drivers/gpu/drm/i915/intel_uncore.c
>> @@ -1504,7 +1504,14 @@ not_ready:
>> I915_WRITE(RING_RESET_CTL(engine->mmio_base),
>> _MASKED_BIT_DISABLE(RESET_CTL_REQUEST_RESET));
>>
>> - return -EIO;
>> + if (INTEL_INFO(dev)->gen == 9) {
>> + DRM_ERROR("Reset would risk system stability, bailing out\n");
>> + return -EIO;
>> + }
>> +
>> + DRM_ERROR("Forcing non ready gpu into reset\n");
>
> DRM_ERROR will cause dmes-warn failures when running igt. If we agree that
> this is ok, it should be at most DRM_INFO. I'd got with DRM_DEBUG_DRIVER.
> Ack with that change.
This is in the category 'shouldn't never happen' so I would prefer
igt to complain loudly if it can't request reset.
The real culprit was forcewakes missing on reset path, so
this patch can be forgotten and only resurrected if,
even with forcewake fix in, gen8 fails to request reset.
Here is the patch that fixed the underlying issue:
commit 99106bc17e667989b4c0af0a6afcbd6ddbada8fb
Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Date: Thu Nov 5 13:11:38 2015 +0200
drm/i915: Do graphics device reset under forcewake
Thanks,
-Mika
> -Daniel
>
>> +
>> + return gen6_do_reset(dev);
>> }
>>
>> static int (*intel_get_gpu_reset(struct drm_device *dev))(struct drm_device *)
>> --
>> 2.5.0
>>
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] drm/i915: Allow unready gpu to be reset on gen8
2015-11-19 9:41 ` Mika Kuoppala
@ 2015-11-19 15:33 ` Daniel Vetter
2015-12-30 9:48 ` Jani Nikula
0 siblings, 1 reply; 13+ messages in thread
From: Daniel Vetter @ 2015-11-19 15:33 UTC (permalink / raw)
To: Mika Kuoppala; +Cc: Daniel Vetter, intel-gfx, Tomi Sarvela
On Thu, Nov 19, 2015 at 11:41:07AM +0200, Mika Kuoppala wrote:
> Daniel Vetter <daniel@ffwll.ch> writes:
>
> > On Mon, Nov 02, 2015 at 11:25:08AM +0200, Mika Kuoppala wrote:
> >> Gen9 has had demonstrated cases where forcing a not ready gpu
> >> into reset has caused system hang [1].
> >>
> >> Gen8 has never to this date demonstrated such behaviour.
> >>
> >> In our CI tests there have been two cases of bsw ending in a state
> >> where it claims it is not ready for reset, based on reset request,
> >> after gpu hang [2]. Both of these cases have happened with kernel
> >> where there are lots of debugs enabled. So it is possible that
> >> something timing related is at play here like that wait_for_register()
> >> usleep wakeups collide badly with forcewake.
> >>
> >> If we assume that gen8 is safe to reset even with claims of nonreadiness,
> >> we can alleviate this situations by forcing a reset and revive the gpu.
> >> Enhance logging so that it will be clear what conditions led to decision
> >> of proceeding or bailing out, so that we will spot if this way of forcing
> >> our will against gpu turns out to be foolhardy.
> >>
> >> v2: - add bugzilla entry for bsw behaviour (Chris)
> >> - improve commit message
> >>
> >> References [1]: https://bugs.freedesktop.org/show_bug.cgi?id=89959
> >> References [2]: https://bugs.freedesktop.org/show_bug.cgi?id=92774
> >> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> >> Cc: Tomi Sarvela <tomix.p.sarvela@intel.com>
> >> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> >> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
> >> ---
> >> drivers/gpu/drm/i915/intel_uncore.c | 9 ++++++++-
> >> 1 file changed, 8 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
> >> index f0f97b2..47c17f2 100644
> >> --- a/drivers/gpu/drm/i915/intel_uncore.c
> >> +++ b/drivers/gpu/drm/i915/intel_uncore.c
> >> @@ -1504,7 +1504,14 @@ not_ready:
> >> I915_WRITE(RING_RESET_CTL(engine->mmio_base),
> >> _MASKED_BIT_DISABLE(RESET_CTL_REQUEST_RESET));
> >>
> >> - return -EIO;
> >> + if (INTEL_INFO(dev)->gen == 9) {
> >> + DRM_ERROR("Reset would risk system stability, bailing out\n");
> >> + return -EIO;
> >> + }
> >> +
> >> + DRM_ERROR("Forcing non ready gpu into reset\n");
> >
> > DRM_ERROR will cause dmes-warn failures when running igt. If we agree that
> > this is ok, it should be at most DRM_INFO. I'd got with DRM_DEBUG_DRIVER.
> > Ack with that change.
>
> This is in the category 'shouldn't never happen' so I would prefer
> igt to complain loudly if it can't request reset.
>
> The real culprit was forcewakes missing on reset path, so
> this patch can be forgotten and only resurrected if,
> even with forcewake fix in, gen8 fails to request reset.
>
> Here is the patch that fixed the underlying issue:
>
> commit 99106bc17e667989b4c0af0a6afcbd6ddbada8fb
> Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Date: Thu Nov 5 13:11:38 2015 +0200
>
> drm/i915: Do graphics device reset under forcewake
Oh that story. With the above explanation added to the commit message:
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
>
>
> Thanks,
> -Mika
>
> > -Daniel
> >
> >> +
> >> + return gen6_do_reset(dev);
> >> }
> >>
> >> static int (*intel_get_gpu_reset(struct drm_device *dev))(struct drm_device *)
> >> --
> >> 2.5.0
> >>
> >
> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] drm/i915: Allow unready gpu to be reset on gen8
2015-11-19 15:33 ` Daniel Vetter
@ 2015-12-30 9:48 ` Jani Nikula
2016-01-08 16:31 ` Mika Kuoppala
0 siblings, 1 reply; 13+ messages in thread
From: Jani Nikula @ 2015-12-30 9:48 UTC (permalink / raw)
To: Daniel Vetter, Mika Kuoppala; +Cc: Daniel Vetter, intel-gfx, Tomi Sarvela
On Thu, 19 Nov 2015, Daniel Vetter <daniel@ffwll.ch> wrote:
> On Thu, Nov 19, 2015 at 11:41:07AM +0200, Mika Kuoppala wrote:
>> Daniel Vetter <daniel@ffwll.ch> writes:
>>
>> > On Mon, Nov 02, 2015 at 11:25:08AM +0200, Mika Kuoppala wrote:
>> >> Gen9 has had demonstrated cases where forcing a not ready gpu
>> >> into reset has caused system hang [1].
>> >>
>> >> Gen8 has never to this date demonstrated such behaviour.
>> >>
>> >> In our CI tests there have been two cases of bsw ending in a state
>> >> where it claims it is not ready for reset, based on reset request,
>> >> after gpu hang [2]. Both of these cases have happened with kernel
>> >> where there are lots of debugs enabled. So it is possible that
>> >> something timing related is at play here like that wait_for_register()
>> >> usleep wakeups collide badly with forcewake.
>> >>
>> >> If we assume that gen8 is safe to reset even with claims of nonreadiness,
>> >> we can alleviate this situations by forcing a reset and revive the gpu.
>> >> Enhance logging so that it will be clear what conditions led to decision
>> >> of proceeding or bailing out, so that we will spot if this way of forcing
>> >> our will against gpu turns out to be foolhardy.
>> >>
>> >> v2: - add bugzilla entry for bsw behaviour (Chris)
>> >> - improve commit message
>> >>
>> >> References [1]: https://bugs.freedesktop.org/show_bug.cgi?id=89959
>> >> References [2]: https://bugs.freedesktop.org/show_bug.cgi?id=92774
>> >> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
>> >> Cc: Tomi Sarvela <tomix.p.sarvela@intel.com>
>> >> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>> >> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
>> >> ---
>> >> drivers/gpu/drm/i915/intel_uncore.c | 9 ++++++++-
>> >> 1 file changed, 8 insertions(+), 1 deletion(-)
>> >>
>> >> diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
>> >> index f0f97b2..47c17f2 100644
>> >> --- a/drivers/gpu/drm/i915/intel_uncore.c
>> >> +++ b/drivers/gpu/drm/i915/intel_uncore.c
>> >> @@ -1504,7 +1504,14 @@ not_ready:
>> >> I915_WRITE(RING_RESET_CTL(engine->mmio_base),
>> >> _MASKED_BIT_DISABLE(RESET_CTL_REQUEST_RESET));
>> >>
>> >> - return -EIO;
>> >> + if (INTEL_INFO(dev)->gen == 9) {
>> >> + DRM_ERROR("Reset would risk system stability, bailing out\n");
>> >> + return -EIO;
>> >> + }
>> >> +
>> >> + DRM_ERROR("Forcing non ready gpu into reset\n");
>> >
>> > DRM_ERROR will cause dmes-warn failures when running igt. If we agree that
>> > this is ok, it should be at most DRM_INFO. I'd got with DRM_DEBUG_DRIVER.
>> > Ack with that change.
>>
>> This is in the category 'shouldn't never happen' so I would prefer
>> igt to complain loudly if it can't request reset.
>>
>> The real culprit was forcewakes missing on reset path, so
>> this patch can be forgotten and only resurrected if,
>> even with forcewake fix in, gen8 fails to request reset.
>>
>> Here is the patch that fixed the underlying issue:
>>
>> commit 99106bc17e667989b4c0af0a6afcbd6ddbada8fb
>> Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
>> Date: Thu Nov 5 13:11:38 2015 +0200
>>
>> drm/i915: Do graphics device reset under forcewake
>
> Oh that story. With the above explanation added to the commit message:
>
> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Mika, is this patch still relevant? Please repost with the above fixed
if it is.
BR,
Jani.
>
>>
>>
>> Thanks,
>> -Mika
>>
>> > -Daniel
>> >
>> >> +
>> >> + return gen6_do_reset(dev);
>> >> }
>> >>
>> >> static int (*intel_get_gpu_reset(struct drm_device *dev))(struct drm_device *)
>> >> --
>> >> 2.5.0
>> >>
>> >
>> > --
>> > Daniel Vetter
>> > Software Engineer, Intel Corporation
>> > http://blog.ffwll.ch
--
Jani Nikula, Intel Open Source Technology Center
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] drm/i915: Allow unready gpu to be reset on gen8
2015-12-30 9:48 ` Jani Nikula
@ 2016-01-08 16:31 ` Mika Kuoppala
2016-01-08 16:52 ` Chris Wilson
0 siblings, 1 reply; 13+ messages in thread
From: Mika Kuoppala @ 2016-01-08 16:31 UTC (permalink / raw)
To: Jani Nikula, Daniel Vetter; +Cc: Daniel Vetter, intel-gfx, Tomi Sarvela
Jani Nikula <jani.nikula@linux.intel.com> writes:
> On Thu, 19 Nov 2015, Daniel Vetter <daniel@ffwll.ch> wrote:
>> On Thu, Nov 19, 2015 at 11:41:07AM +0200, Mika Kuoppala wrote:
>>> Daniel Vetter <daniel@ffwll.ch> writes:
>>>
>>> > On Mon, Nov 02, 2015 at 11:25:08AM +0200, Mika Kuoppala wrote:
>>> >> Gen9 has had demonstrated cases where forcing a not ready gpu
>>> >> into reset has caused system hang [1].
>>> >>
>>> >> Gen8 has never to this date demonstrated such behaviour.
>>> >>
>>> >> In our CI tests there have been two cases of bsw ending in a state
>>> >> where it claims it is not ready for reset, based on reset request,
>>> >> after gpu hang [2]. Both of these cases have happened with kernel
>>> >> where there are lots of debugs enabled. So it is possible that
>>> >> something timing related is at play here like that wait_for_register()
>>> >> usleep wakeups collide badly with forcewake.
>>> >>
>>> >> If we assume that gen8 is safe to reset even with claims of nonreadiness,
>>> >> we can alleviate this situations by forcing a reset and revive the gpu.
>>> >> Enhance logging so that it will be clear what conditions led to decision
>>> >> of proceeding or bailing out, so that we will spot if this way of forcing
>>> >> our will against gpu turns out to be foolhardy.
>>> >>
>>> >> v2: - add bugzilla entry for bsw behaviour (Chris)
>>> >> - improve commit message
>>> >>
>>> >> References [1]: https://bugs.freedesktop.org/show_bug.cgi?id=89959
>>> >> References [2]: https://bugs.freedesktop.org/show_bug.cgi?id=92774
>>> >> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
>>> >> Cc: Tomi Sarvela <tomix.p.sarvela@intel.com>
>>> >> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>>> >> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
>>> >> ---
>>> >> drivers/gpu/drm/i915/intel_uncore.c | 9 ++++++++-
>>> >> 1 file changed, 8 insertions(+), 1 deletion(-)
>>> >>
>>> >> diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
>>> >> index f0f97b2..47c17f2 100644
>>> >> --- a/drivers/gpu/drm/i915/intel_uncore.c
>>> >> +++ b/drivers/gpu/drm/i915/intel_uncore.c
>>> >> @@ -1504,7 +1504,14 @@ not_ready:
>>> >> I915_WRITE(RING_RESET_CTL(engine->mmio_base),
>>> >> _MASKED_BIT_DISABLE(RESET_CTL_REQUEST_RESET));
>>> >>
>>> >> - return -EIO;
>>> >> + if (INTEL_INFO(dev)->gen == 9) {
>>> >> + DRM_ERROR("Reset would risk system stability, bailing out\n");
>>> >> + return -EIO;
>>> >> + }
>>> >> +
>>> >> + DRM_ERROR("Forcing non ready gpu into reset\n");
>>> >
>>> > DRM_ERROR will cause dmes-warn failures when running igt. If we agree that
>>> > this is ok, it should be at most DRM_INFO. I'd got with DRM_DEBUG_DRIVER.
>>> > Ack with that change.
>>>
>>> This is in the category 'shouldn't never happen' so I would prefer
>>> igt to complain loudly if it can't request reset.
>>>
>>> The real culprit was forcewakes missing on reset path, so
>>> this patch can be forgotten and only resurrected if,
>>> even with forcewake fix in, gen8 fails to request reset.
>>>
>>> Here is the patch that fixed the underlying issue:
>>>
>>> commit 99106bc17e667989b4c0af0a6afcbd6ddbada8fb
>>> Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
>>> Date: Thu Nov 5 13:11:38 2015 +0200
>>>
>>> drm/i915: Do graphics device reset under forcewake
>>
>> Oh that story. With the above explanation added to the commit message:
>>
>> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
>
> Mika, is this patch still relevant? Please repost with the above fixed
> if it is.
>
I think we should forget this and only resurrect it
if we ever again see failed resets. As
commit 99106bc17e667989b4c0af0a6afcbd6ddbada8fb
Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Date: Thu Nov 5 13:11:38 2015 +0200
drm/i915: Do graphics device reset under forcewake
should have fixed atleast those that we saw on CI.
-Mika
> BR,
> Jani.
>
>
>>
>>>
>>>
>>> Thanks,
>>> -Mika
>>>
>>> > -Daniel
>>> >
>>> >> +
>>> >> + return gen6_do_reset(dev);
>>> >> }
>>> >>
>>> >> static int (*intel_get_gpu_reset(struct drm_device *dev))(struct drm_device *)
>>> >> --
>>> >> 2.5.0
>>> >>
>>> >
>>> > --
>>> > Daniel Vetter
>>> > Software Engineer, Intel Corporation
>>> > http://blog.ffwll.ch
>
> --
> Jani Nikula, Intel Open Source Technology Center
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] drm/i915: Allow unready gpu to be reset on gen8
2016-01-08 16:31 ` Mika Kuoppala
@ 2016-01-08 16:52 ` Chris Wilson
0 siblings, 0 replies; 13+ messages in thread
From: Chris Wilson @ 2016-01-08 16:52 UTC (permalink / raw)
To: Mika Kuoppala; +Cc: Daniel Vetter, intel-gfx, Tomi Sarvela
On Fri, Jan 08, 2016 at 06:31:44PM +0200, Mika Kuoppala wrote:
> I think we should forget this and only resurrect it
> if we ever again see failed resets.
Bingo! I have been able to wedge the machine (bdw and bsw) recently so badly that
the resets failed with the "reset request timeout".
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2016-01-08 16:52 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-10-30 14:43 [PATCH] drm/i915: Allow unready gpu to be reset on gen8 Mika Kuoppala
2015-10-30 14:58 ` Chris Wilson
2015-10-30 15:18 ` Mika Kuoppala
2015-10-30 15:28 ` Chris Wilson
2015-10-30 16:07 ` Mika Kuoppala
2015-11-02 8:35 ` Sarvela, TomiX P
2015-11-02 9:25 ` Mika Kuoppala
2015-11-17 17:40 ` Daniel Vetter
2015-11-19 9:41 ` Mika Kuoppala
2015-11-19 15:33 ` Daniel Vetter
2015-12-30 9:48 ` Jani Nikula
2016-01-08 16:31 ` Mika Kuoppala
2016-01-08 16:52 ` Chris Wilson
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox