All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] drm/i915: Double check ring is idle before declaring the GPU wedged
@ 2014-08-11  8:21 Chris Wilson
  2014-08-11  9:30 ` Damien Lespiau
  0 siblings, 1 reply; 6+ messages in thread
From: Chris Wilson @ 2014-08-11  8:21 UTC (permalink / raw)
  To: intel-gfx

During ring initialisation, sometimes we observe, though not in
production hardware, that the idle flag is not set even though the ring
is empty. Double check before giving up.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Damien Lespiau <damien.lespiau@intel.com>
---
 drivers/gpu/drm/i915/intel_ringbuffer.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index a0831c309eab..d72d5e0e693d 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -467,7 +467,12 @@ static bool stop_ring(struct intel_engine_cs *ring)
 		I915_WRITE_MODE(ring, _MASKED_BIT_ENABLE(STOP_RING));
 		if (wait_for((I915_READ_MODE(ring) & MODE_IDLE) != 0, 1000)) {
 			DRM_ERROR("%s : timed out trying to stop ring\n", ring->name);
-			return false;
+			/* Sometimes we observe that the idle flag is not
+			 * set even though the ring is empty. So double
+			 * check before giving up.
+			 */
+			if (I915_READ_HEAD(ring) != I915_READ_TAIL(ring))
+				return false;
 		}
 	}
 
-- 
2.1.0.rc1

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] drm/i915: Double check ring is idle before declaring the GPU wedged
  2014-08-11  8:21 [PATCH] drm/i915: Double check ring is idle before declaring the GPU wedged Chris Wilson
@ 2014-08-11  9:30 ` Damien Lespiau
  2014-08-11  9:35   ` Chris Wilson
  0 siblings, 1 reply; 6+ messages in thread
From: Damien Lespiau @ 2014-08-11  9:30 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Mon, Aug 11, 2014 at 09:21:35AM +0100, Chris Wilson wrote:
> During ring initialisation, sometimes we observe, though not in
> production hardware, that the idle flag is not set even though the ring
> is empty. Double check before giving up.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Damien Lespiau <damien.lespiau@intel.com>
> ---
>  drivers/gpu/drm/i915/intel_ringbuffer.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index a0831c309eab..d72d5e0e693d 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -467,7 +467,12 @@ static bool stop_ring(struct intel_engine_cs *ring)
>  		I915_WRITE_MODE(ring, _MASKED_BIT_ENABLE(STOP_RING));
>  		if (wait_for((I915_READ_MODE(ring) & MODE_IDLE) != 0, 1000)) {
>  			DRM_ERROR("%s : timed out trying to stop ring\n", ring->name);
> -			return false;
> +			/* Sometimes we observe that the idle flag is not
> +			 * set even though the ring is empty. So double
> +			 * check before giving up.
> +			 */
> +			if (I915_READ_HEAD(ring) != I915_READ_TAIL(ring))
> +				return false;

That means we propably want to just put the user visible error message
there as well?

-- 
Damien

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] drm/i915: Double check ring is idle before declaring the GPU wedged
  2014-08-11  9:30 ` Damien Lespiau
@ 2014-08-11  9:35   ` Chris Wilson
  2014-08-11 10:07     ` Damien Lespiau
  0 siblings, 1 reply; 6+ messages in thread
From: Chris Wilson @ 2014-08-11  9:35 UTC (permalink / raw)
  To: Damien Lespiau; +Cc: intel-gfx

On Mon, Aug 11, 2014 at 10:30:09AM +0100, Damien Lespiau wrote:
> On Mon, Aug 11, 2014 at 09:21:35AM +0100, Chris Wilson wrote:
> > During ring initialisation, sometimes we observe, though not in
> > production hardware, that the idle flag is not set even though the ring
> > is empty. Double check before giving up.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Damien Lespiau <damien.lespiau@intel.com>
> > ---
> >  drivers/gpu/drm/i915/intel_ringbuffer.c | 7 ++++++-
> >  1 file changed, 6 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > index a0831c309eab..d72d5e0e693d 100644
> > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > @@ -467,7 +467,12 @@ static bool stop_ring(struct intel_engine_cs *ring)
> >  		I915_WRITE_MODE(ring, _MASKED_BIT_ENABLE(STOP_RING));
> >  		if (wait_for((I915_READ_MODE(ring) & MODE_IDLE) != 0, 1000)) {
> >  			DRM_ERROR("%s : timed out trying to stop ring\n", ring->name);
> > -			return false;
> > +			/* Sometimes we observe that the idle flag is not
> > +			 * set even though the ring is empty. So double
> > +			 * check before giving up.
> > +			 */
> > +			if (I915_READ_HEAD(ring) != I915_READ_TAIL(ring))
> > +				return false;
> 
> That means we propably want to just put the user visible error message
> there as well?

It is still a 1 second timeout, so having a warning there that something
is wrong is important I thought.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] drm/i915: Double check ring is idle before declaring the GPU wedged
  2014-08-11  9:35   ` Chris Wilson
@ 2014-08-11 10:07     ` Damien Lespiau
  2014-08-11 10:14       ` Chris Wilson
  2014-08-11 11:34       ` Daniel Vetter
  0 siblings, 2 replies; 6+ messages in thread
From: Damien Lespiau @ 2014-08-11 10:07 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On Mon, Aug 11, 2014 at 10:35:25AM +0100, Chris Wilson wrote:
> On Mon, Aug 11, 2014 at 10:30:09AM +0100, Damien Lespiau wrote:
> > On Mon, Aug 11, 2014 at 09:21:35AM +0100, Chris Wilson wrote:
> > > During ring initialisation, sometimes we observe, though not in
> > > production hardware, that the idle flag is not set even though the ring
> > > is empty. Double check before giving up.
> > > 
> > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > Cc: Damien Lespiau <damien.lespiau@intel.com>
> > > ---
> > >  drivers/gpu/drm/i915/intel_ringbuffer.c | 7 ++++++-
> > >  1 file changed, 6 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > index a0831c309eab..d72d5e0e693d 100644
> > > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > @@ -467,7 +467,12 @@ static bool stop_ring(struct intel_engine_cs *ring)
> > >  		I915_WRITE_MODE(ring, _MASKED_BIT_ENABLE(STOP_RING));
> > >  		if (wait_for((I915_READ_MODE(ring) & MODE_IDLE) != 0, 1000)) {
> > >  			DRM_ERROR("%s : timed out trying to stop ring\n", ring->name);
> > > -			return false;
> > > +			/* Sometimes we observe that the idle flag is not
> > > +			 * set even though the ring is empty. So double
> > > +			 * check before giving up.
> > > +			 */
> > > +			if (I915_READ_HEAD(ring) != I915_READ_TAIL(ring))
> > > +				return false;
> > 
> > That means we propably want to just put the user visible error message
> > there as well?
> 
> It is still a 1 second timeout, so having a warning there that something
> is wrong is important I thought.

Ah, I missed the "not in production hw" bits of the commit message, It
may mean simulation and then the guess is that flag is not implemented.
If that makes us not totally give up, I guess that's something.

Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>

-- 
Damien

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] drm/i915: Double check ring is idle before declaring the GPU wedged
  2014-08-11 10:07     ` Damien Lespiau
@ 2014-08-11 10:14       ` Chris Wilson
  2014-08-11 11:34       ` Daniel Vetter
  1 sibling, 0 replies; 6+ messages in thread
From: Chris Wilson @ 2014-08-11 10:14 UTC (permalink / raw)
  To: Damien Lespiau; +Cc: intel-gfx

On Mon, Aug 11, 2014 at 11:07:10AM +0100, Damien Lespiau wrote:
> On Mon, Aug 11, 2014 at 10:35:25AM +0100, Chris Wilson wrote:
> > On Mon, Aug 11, 2014 at 10:30:09AM +0100, Damien Lespiau wrote:
> > > On Mon, Aug 11, 2014 at 09:21:35AM +0100, Chris Wilson wrote:
> > > > During ring initialisation, sometimes we observe, though not in
> > > > production hardware, that the idle flag is not set even though the ring
> > > > is empty. Double check before giving up.
> > > > 
> > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > > Cc: Damien Lespiau <damien.lespiau@intel.com>
> > > > ---
> > > >  drivers/gpu/drm/i915/intel_ringbuffer.c | 7 ++++++-
> > > >  1 file changed, 6 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > index a0831c309eab..d72d5e0e693d 100644
> > > > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > @@ -467,7 +467,12 @@ static bool stop_ring(struct intel_engine_cs *ring)
> > > >  		I915_WRITE_MODE(ring, _MASKED_BIT_ENABLE(STOP_RING));
> > > >  		if (wait_for((I915_READ_MODE(ring) & MODE_IDLE) != 0, 1000)) {
> > > >  			DRM_ERROR("%s : timed out trying to stop ring\n", ring->name);
> > > > -			return false;
> > > > +			/* Sometimes we observe that the idle flag is not
> > > > +			 * set even though the ring is empty. So double
> > > > +			 * check before giving up.
> > > > +			 */
> > > > +			if (I915_READ_HEAD(ring) != I915_READ_TAIL(ring))
> > > > +				return false;
> > > 
> > > That means we propably want to just put the user visible error message
> > > there as well?
> > 
> > It is still a 1 second timeout, so having a warning there that something
> > is wrong is important I thought.
> 
> Ah, I missed the "not in production hw" bits of the commit message, It
> may mean simulation and then the guess is that flag is not implemented.
> If that makes us not totally give up, I guess that's something.

I was trying to avoid saying that it was only to work around sim bugs :)
Otherwise, it looks to be a useful piece of paranoia and logging against
hw trying to screw us over.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] drm/i915: Double check ring is idle before declaring the GPU wedged
  2014-08-11 10:07     ` Damien Lespiau
  2014-08-11 10:14       ` Chris Wilson
@ 2014-08-11 11:34       ` Daniel Vetter
  1 sibling, 0 replies; 6+ messages in thread
From: Daniel Vetter @ 2014-08-11 11:34 UTC (permalink / raw)
  To: Damien Lespiau; +Cc: intel-gfx

On Mon, Aug 11, 2014 at 11:07:10AM +0100, Damien Lespiau wrote:
> On Mon, Aug 11, 2014 at 10:35:25AM +0100, Chris Wilson wrote:
> > On Mon, Aug 11, 2014 at 10:30:09AM +0100, Damien Lespiau wrote:
> > > On Mon, Aug 11, 2014 at 09:21:35AM +0100, Chris Wilson wrote:
> > > > During ring initialisation, sometimes we observe, though not in
> > > > production hardware, that the idle flag is not set even though the ring
> > > > is empty. Double check before giving up.
> > > > 
> > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > > Cc: Damien Lespiau <damien.lespiau@intel.com>
> > > > ---
> > > >  drivers/gpu/drm/i915/intel_ringbuffer.c | 7 ++++++-
> > > >  1 file changed, 6 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > index a0831c309eab..d72d5e0e693d 100644
> > > > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > @@ -467,7 +467,12 @@ static bool stop_ring(struct intel_engine_cs *ring)
> > > >  		I915_WRITE_MODE(ring, _MASKED_BIT_ENABLE(STOP_RING));
> > > >  		if (wait_for((I915_READ_MODE(ring) & MODE_IDLE) != 0, 1000)) {
> > > >  			DRM_ERROR("%s : timed out trying to stop ring\n", ring->name);
> > > > -			return false;
> > > > +			/* Sometimes we observe that the idle flag is not
> > > > +			 * set even though the ring is empty. So double
> > > > +			 * check before giving up.
> > > > +			 */
> > > > +			if (I915_READ_HEAD(ring) != I915_READ_TAIL(ring))
> > > > +				return false;
> > > 
> > > That means we propably want to just put the user visible error message
> > > there as well?
> > 
> > It is still a 1 second timeout, so having a warning there that something
> > is wrong is important I thought.
> 
> Ah, I missed the "not in production hw" bits of the commit message, It
> may mean simulation and then the guess is that flag is not implemented.
> If that makes us not totally give up, I guess that's something.
> 
> Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>

Queued for -next, thanks for the patch.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2014-08-11 11:33 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-08-11  8:21 [PATCH] drm/i915: Double check ring is idle before declaring the GPU wedged Chris Wilson
2014-08-11  9:30 ` Damien Lespiau
2014-08-11  9:35   ` Chris Wilson
2014-08-11 10:07     ` Damien Lespiau
2014-08-11 10:14       ` Chris Wilson
2014-08-11 11:34       ` Daniel Vetter

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.