* [PATCH] drm/i915: Don't accumate hangcheck score on forward progress
@ 2014-08-05 11:10 Mika Kuoppala
2014-08-05 13:29 ` Chris Wilson
0 siblings, 1 reply; 5+ messages in thread
From: Mika Kuoppala @ 2014-08-05 11:10 UTC (permalink / raw)
To: intel-gfx
If the actual head has progressed forward inside a batch (request),
don't accumulate hangcheck score.
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
---
drivers/gpu/drm/i915/i915_gpu_error.c | 2 ++
drivers/gpu/drm/i915/i915_irq.c | 15 ++++++++++++---
drivers/gpu/drm/i915/intel_ringbuffer.h | 2 ++
3 files changed, 16 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 0b3f694..0eb4470 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -229,6 +229,8 @@ static const char *hangcheck_action_to_str(enum intel_ring_hangcheck_action a)
return "wait";
case HANGCHECK_ACTIVE:
return "active";
+ case HANGCHECK_ACTIVE_LOOP:
+ return "active_loop";
case HANGCHECK_KICK:
return "kick";
case HANGCHECK_HUNG:
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index cdab0b4..008632e 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -3187,8 +3187,14 @@ ring_stuck(struct intel_engine_cs *ring, u64 acthd)
struct drm_i915_private *dev_priv = dev->dev_private;
u32 tmp;
- if (ring->hangcheck.acthd != acthd)
- return HANGCHECK_ACTIVE;
+ if (acthd != ring->hangcheck.acthd) {
+ if (acthd > ring->hangcheck.max_acthd) {
+ ring->hangcheck.max_acthd = acthd;
+ return HANGCHECK_ACTIVE;
+ }
+
+ return HANGCHECK_ACTIVE_LOOP;
+ }
if (IS_GEN2(dev))
return HANGCHECK_HUNG;
@@ -3299,8 +3305,9 @@ static void i915_hangcheck_elapsed(unsigned long data)
switch (ring->hangcheck.action) {
case HANGCHECK_IDLE:
case HANGCHECK_WAIT:
- break;
case HANGCHECK_ACTIVE:
+ break;
+ case HANGCHECK_ACTIVE_LOOP:
ring->hangcheck.score += BUSY;
break;
case HANGCHECK_KICK:
@@ -3320,6 +3327,8 @@ static void i915_hangcheck_elapsed(unsigned long data)
*/
if (ring->hangcheck.score > 0)
ring->hangcheck.score--;
+
+ ring->hangcheck.acthd = ring->hangcheck.max_acthd = 0;
}
ring->hangcheck.seqno = seqno;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index ed59410..70525d0 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -70,6 +70,7 @@ enum intel_ring_hangcheck_action {
HANGCHECK_IDLE = 0,
HANGCHECK_WAIT,
HANGCHECK_ACTIVE,
+ HANGCHECK_ACTIVE_LOOP,
HANGCHECK_KICK,
HANGCHECK_HUNG,
};
@@ -78,6 +79,7 @@ enum intel_ring_hangcheck_action {
struct intel_ring_hangcheck {
u64 acthd;
+ u64 max_acthd;
u32 seqno;
int score;
enum intel_ring_hangcheck_action action;
--
1.7.9.5
^ permalink raw reply related [flat|nested] 5+ messages in thread* Re: [PATCH] drm/i915: Don't accumate hangcheck score on forward progress
2014-08-05 11:10 [PATCH] drm/i915: Don't accumate hangcheck score on forward progress Mika Kuoppala
@ 2014-08-05 13:29 ` Chris Wilson
2014-08-05 14:16 ` [PATCH] drm/i915: Don't accumulate " Mika Kuoppala
0 siblings, 1 reply; 5+ messages in thread
From: Chris Wilson @ 2014-08-05 13:29 UTC (permalink / raw)
To: Mika Kuoppala; +Cc: intel-gfx
On Tue, Aug 05, 2014 at 02:10:55PM +0300, Mika Kuoppala wrote:
> If the actual head has progressed forward inside a batch (request),
> don't accumulate hangcheck score.
Maybe add something like:
The result should be that we only declare an active batch as stuck if it
is trapped inside a loop. Or at least has lots of loops that manage to
overcome the bonus given to forward progress.
--
Chris Wilson, Intel Open Source Technology Centre
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH] drm/i915: Don't accumulate hangcheck score on forward progress
2014-08-05 13:29 ` Chris Wilson
@ 2014-08-05 14:16 ` Mika Kuoppala
2014-08-05 14:53 ` Chris Wilson
0 siblings, 1 reply; 5+ messages in thread
From: Mika Kuoppala @ 2014-08-05 14:16 UTC (permalink / raw)
To: intel-gfx
If the actual head has progressed forward inside a batch (request),
don't accumulate hangcheck score.
As the hangcheck score in increased only by acthd jumping backwards,
the result is that we only declare an active batch as stuck if it is
trapped inside a loop. Or that the looping will dominate the batch
progression so that it overcomes the bonus that forward progress gives.
v2: Improved commit message (Chris Wilson)
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
---
drivers/gpu/drm/i915/i915_gpu_error.c | 2 ++
drivers/gpu/drm/i915/i915_irq.c | 15 ++++++++++++---
drivers/gpu/drm/i915/intel_ringbuffer.h | 2 ++
3 files changed, 16 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 9a13eed..a422f96 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -229,6 +229,8 @@ static const char *hangcheck_action_to_str(enum intel_ring_hangcheck_action a)
return "wait";
case HANGCHECK_ACTIVE:
return "active";
+ case HANGCHECK_ACTIVE_LOOP:
+ return "active_loop";
case HANGCHECK_KICK:
return "kick";
case HANGCHECK_HUNG:
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index db577c7..87abe86 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -3184,8 +3184,14 @@ ring_stuck(struct intel_engine_cs *ring, u64 acthd)
struct drm_i915_private *dev_priv = dev->dev_private;
u32 tmp;
- if (ring->hangcheck.acthd != acthd)
- return HANGCHECK_ACTIVE;
+ if (acthd != ring->hangcheck.acthd) {
+ if (acthd > ring->hangcheck.max_acthd) {
+ ring->hangcheck.max_acthd = acthd;
+ return HANGCHECK_ACTIVE;
+ }
+
+ return HANGCHECK_ACTIVE_LOOP;
+ }
if (IS_GEN2(dev))
return HANGCHECK_HUNG;
@@ -3296,8 +3302,9 @@ static void i915_hangcheck_elapsed(unsigned long data)
switch (ring->hangcheck.action) {
case HANGCHECK_IDLE:
case HANGCHECK_WAIT:
- break;
case HANGCHECK_ACTIVE:
+ break;
+ case HANGCHECK_ACTIVE_LOOP:
ring->hangcheck.score += BUSY;
break;
case HANGCHECK_KICK:
@@ -3317,6 +3324,8 @@ static void i915_hangcheck_elapsed(unsigned long data)
*/
if (ring->hangcheck.score > 0)
ring->hangcheck.score--;
+
+ ring->hangcheck.acthd = ring->hangcheck.max_acthd = 0;
}
ring->hangcheck.seqno = seqno;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index ed59410..70525d0 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -70,6 +70,7 @@ enum intel_ring_hangcheck_action {
HANGCHECK_IDLE = 0,
HANGCHECK_WAIT,
HANGCHECK_ACTIVE,
+ HANGCHECK_ACTIVE_LOOP,
HANGCHECK_KICK,
HANGCHECK_HUNG,
};
@@ -78,6 +79,7 @@ enum intel_ring_hangcheck_action {
struct intel_ring_hangcheck {
u64 acthd;
+ u64 max_acthd;
u32 seqno;
int score;
enum intel_ring_hangcheck_action action;
--
1.7.9.5
^ permalink raw reply related [flat|nested] 5+ messages in thread* Re: [PATCH] drm/i915: Don't accumulate hangcheck score on forward progress
2014-08-05 14:16 ` [PATCH] drm/i915: Don't accumulate " Mika Kuoppala
@ 2014-08-05 14:53 ` Chris Wilson
2014-08-05 16:37 ` Daniel Vetter
0 siblings, 1 reply; 5+ messages in thread
From: Chris Wilson @ 2014-08-05 14:53 UTC (permalink / raw)
To: Mika Kuoppala; +Cc: intel-gfx
On Tue, Aug 05, 2014 at 05:16:26PM +0300, Mika Kuoppala wrote:
> If the actual head has progressed forward inside a batch (request),
> don't accumulate hangcheck score.
>
> As the hangcheck score in increased only by acthd jumping backwards,
> the result is that we only declare an active batch as stuck if it is
> trapped inside a loop. Or that the looping will dominate the batch
> progression so that it overcomes the bonus that forward progress gives.
>
> v2: Improved commit message (Chris Wilson)
>
> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
> ---
> drivers/gpu/drm/i915/i915_gpu_error.c | 2 ++
> drivers/gpu/drm/i915/i915_irq.c | 15 ++++++++++++---
> drivers/gpu/drm/i915/intel_ringbuffer.h | 2 ++
> 3 files changed, 16 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> index 9a13eed..a422f96 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> @@ -229,6 +229,8 @@ static const char *hangcheck_action_to_str(enum intel_ring_hangcheck_action a)
> return "wait";
> case HANGCHECK_ACTIVE:
> return "active";
> + case HANGCHECK_ACTIVE_LOOP:
> + return "active_loop";
This string is for the user, i.e. me, and I'd prefer it to say
"active (loop)" rather than have the underscore.
Other than that,
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] drm/i915: Don't accumulate hangcheck score on forward progress
2014-08-05 14:53 ` Chris Wilson
@ 2014-08-05 16:37 ` Daniel Vetter
0 siblings, 0 replies; 5+ messages in thread
From: Daniel Vetter @ 2014-08-05 16:37 UTC (permalink / raw)
To: Chris Wilson, Mika Kuoppala, intel-gfx, zhenyuw
On Tue, Aug 05, 2014 at 03:53:08PM +0100, Chris Wilson wrote:
> On Tue, Aug 05, 2014 at 05:16:26PM +0300, Mika Kuoppala wrote:
> > If the actual head has progressed forward inside a batch (request),
> > don't accumulate hangcheck score.
> >
> > As the hangcheck score in increased only by acthd jumping backwards,
> > the result is that we only declare an active batch as stuck if it is
> > trapped inside a loop. Or that the looping will dominate the batch
> > progression so that it overcomes the bonus that forward progress gives.
> >
> > v2: Improved commit message (Chris Wilson)
> >
> > Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
> > ---
> > drivers/gpu/drm/i915/i915_gpu_error.c | 2 ++
> > drivers/gpu/drm/i915/i915_irq.c | 15 ++++++++++++---
> > drivers/gpu/drm/i915/intel_ringbuffer.h | 2 ++
> > 3 files changed, 16 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> > index 9a13eed..a422f96 100644
> > --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> > +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> > @@ -229,6 +229,8 @@ static const char *hangcheck_action_to_str(enum intel_ring_hangcheck_action a)
> > return "wait";
> > case HANGCHECK_ACTIVE:
> > return "active";
> > + case HANGCHECK_ACTIVE_LOOP:
> > + return "active_loop";
>
> This string is for the user, i.e. me, and I'd prefer it to say
> "active (loop)" rather than have the underscore.
Frobbed and
>
> Other than that,
> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
queued for -next, thanks for the patch.
-Daniel
> -Chris
>
> --
> Chris Wilson, Intel Open Source Technology Centre
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2014-08-05 16:37 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-08-05 11:10 [PATCH] drm/i915: Don't accumate hangcheck score on forward progress Mika Kuoppala
2014-08-05 13:29 ` Chris Wilson
2014-08-05 14:16 ` [PATCH] drm/i915: Don't accumulate " Mika Kuoppala
2014-08-05 14:53 ` Chris Wilson
2014-08-05 16:37 ` Daniel Vetter
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox