public inbox for intel-gfx@lists.freedesktop.org
 help / color / mirror / Atom feed
* [PATCH] drm/i915: Ignore SURFLIVE and flip counter when the GPU gets reset
@ 2014-05-27 18:33 ville.syrjala
  2014-05-27 18:33 ` [PATCH igt] tests/kms_flip: Make flip-vs-panning-vs-hang change DSPSURF ville.syrjala
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: ville.syrjala @ 2014-05-27 18:33 UTC (permalink / raw)
  To: intel-gfx

From: Ville Syrjälä <ville.syrjala@linux.intel.com>

During a GPU reset we need to get pending page flip cleared out
since the ring contents are gone and flip will never complete
on its own. This used to work until the mmio vs. CS flip race
detection came about. That piece of code is looking for a
specific surface address in the SURFLIVE register, but as
a flip to that address may never happen the check may never
pass. So we should just skip the SURFLIVE and flip counter
checks when the GPU gets reset.

intel_display_handle_reset() tries to effectively complete
the flip anyway by calling .update_primary_plane(). But that
may not satisfy the conditions of the mmio vs. CS race
detection since there's no guarantee that a modeset didn't
sneak in between the GPU reset and intel_display_handle_reset().
When that happens the primary plane updates performed by
intel_display_handle_reset() will already use the new surface
address, and thus the surface address the flip is waiting for
might never appear in SURFLIVE. The result is that the flip
will never complete and attempts to perform further page flips
will fail with -EBUSY.

During the GPU reset intel_crtc_has_pending_flip() will return
false regardless, so the deadlock with a modeset vs. the error
work acquiring crtc->mutex was avoided. And the reset_counter
check in intel_crtc_has_pending_flip() actually made this bug
even less severe since it allowed normal modesets to go through
even though there's a pending flip.

This is a regression introduced by me here:
 commit 75f7f3ec600524c9544cc31695155f1a9ddbe1d9
 Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
 Date:   Tue Apr 15 21:41:34 2014 +0300

    drm/i915: Fix mmio vs. CS flip race on ILK+

Testcase: igt/kms_flip/flip-vs-panning-vs-hang
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_display.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 94ac51f..cb9dd8e 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -8895,6 +8895,10 @@ static bool page_flip_finished(struct intel_crtc *crtc)
 	struct drm_device *dev = crtc->base.dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 
+	if (i915_reset_in_progress(&dev_priv->gpu_error) ||
+	    crtc->reset_counter != atomic_read(&dev_priv->gpu_error.reset_counter))
+		return true;
+
 	/*
 	 * The relevant registers doen't exist on pre-ctg.
 	 * As the flip done interrupt doesn't trigger for mmio
-- 
1.8.5.5

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH igt] tests/kms_flip: Make flip-vs-panning-vs-hang change DSPSURF
  2014-05-27 18:33 [PATCH] drm/i915: Ignore SURFLIVE and flip counter when the GPU gets reset ville.syrjala
@ 2014-05-27 18:33 ` ville.syrjala
  2014-05-28 12:10 ` [PATCH] drm/i915: Ignore SURFLIVE and flip counter when the GPU gets reset Chris Wilson
  2014-10-24 14:28 ` Daniel Vetter
  2 siblings, 0 replies; 8+ messages in thread
From: ville.syrjala @ 2014-05-27 18:33 UTC (permalink / raw)
  To: intel-gfx

From: Ville Syrjälä <ville.syrjala@linux.intel.com>

Make sure DSPSURF will change during the panning operation
in flip-vs-panning-vs-hang.

This will now test agains bugs between the kernel's mmio vs.
CS flip race handling and GPU resets. If the kernel is buggy
if will fail to notice that the panning operation changed the
base address before the GPU reset had a chance to deal with the
pending page flips, and so the flip would never complete due to
DSPSURFLIVE not matching the expected value.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
---
 tests/kms_flip.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/tests/kms_flip.c b/tests/kms_flip.c
index dd64182..6711f09 100644
--- a/tests/kms_flip.c
+++ b/tests/kms_flip.c
@@ -923,6 +923,10 @@ static unsigned int run_test_step(struct test_output *o)
 			o->flip_state.count : o->vblank_state.count;
 		int x_ofs = count * 10 > o->fb_width - o->kmode[0].hdisplay ? o->fb_width - o->kmode[0].hdisplay : count * 10;
 
+		/* Make sure DSPSURF changes value */
+		if (o->flags & TEST_HANG)
+			o->current_fb_id = !o->current_fb_id;
+
 		igt_assert_f(set_mode(o, o->fb_ids[o->current_fb_id], x_ofs, 0) == 0,
 			     "failed to pan (%dx%d@%dHz)+%d: %s\n",
 			     o->kmode[0].hdisplay, o->kmode[0].vdisplay, o->kmode[0].vrefresh,
-- 
1.8.5.5

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] drm/i915: Ignore SURFLIVE and flip counter when the GPU gets reset
  2014-05-27 18:33 [PATCH] drm/i915: Ignore SURFLIVE and flip counter when the GPU gets reset ville.syrjala
  2014-05-27 18:33 ` [PATCH igt] tests/kms_flip: Make flip-vs-panning-vs-hang change DSPSURF ville.syrjala
@ 2014-05-28 12:10 ` Chris Wilson
  2014-10-24 12:40   ` Chris Wilson
  2014-10-24 14:28 ` Daniel Vetter
  2 siblings, 1 reply; 8+ messages in thread
From: Chris Wilson @ 2014-05-28 12:10 UTC (permalink / raw)
  To: ville.syrjala; +Cc: intel-gfx

On Tue, May 27, 2014 at 09:33:09PM +0300, ville.syrjala@linux.intel.com wrote:
> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> index 94ac51f..cb9dd8e 100644
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> @@ -8895,6 +8895,10 @@ static bool page_flip_finished(struct intel_crtc *crtc)
>  	struct drm_device *dev = crtc->base.dev;
>  	struct drm_i915_private *dev_priv = dev->dev_private;
>  
> +	if (i915_reset_in_progress(&dev_priv->gpu_error) ||
> +	    crtc->reset_counter != atomic_read(&dev_priv->gpu_error.reset_counter))
> +		return true;

I really don't like this. The reset_count is incremented when the reset
starts, so we shouldn't get here with
crtc->reset_counter == gpu_error->reset_counter && reset_in_progress().

I'd prefer this to be
 if (i915_has_reset(dev_priv, crtc->reset_counter)) return true;

with a guard when reading the gpu reset_counter:

 ret = i915_get_reset_counter(dev_priv, &intel_crtc->reset_counter);
 if (ret)
	goto cleanup;

that does something like

 static inline int i915_get_reset_counter(struct drm_i915_private *dev_priv,
                                          int *value)
 {
     *value = atomic_read(dev_priv->gpu_error.reset_counter);
     if (*value & I915_WEDGED)
     	return -EIO;
     if (*value & I915_RESET_IN_PROGRESS_FLAG)
        return -EAGAIN;
     return 0;
  }
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] drm/i915: Ignore SURFLIVE and flip counter when the GPU gets reset
  2014-05-28 12:10 ` [PATCH] drm/i915: Ignore SURFLIVE and flip counter when the GPU gets reset Chris Wilson
@ 2014-10-24 12:40   ` Chris Wilson
  2014-10-24 12:50     ` Ville Syrjälä
  0 siblings, 1 reply; 8+ messages in thread
From: Chris Wilson @ 2014-10-24 12:40 UTC (permalink / raw)
  To: ville.syrjala, intel-gfx

On Wed, May 28, 2014 at 01:10:55PM +0100, Chris Wilson wrote:
> On Tue, May 27, 2014 at 09:33:09PM +0300, ville.syrjala@linux.intel.com wrote:
> > diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> > index 94ac51f..cb9dd8e 100644
> > --- a/drivers/gpu/drm/i915/intel_display.c
> > +++ b/drivers/gpu/drm/i915/intel_display.c
> > @@ -8895,6 +8895,10 @@ static bool page_flip_finished(struct intel_crtc *crtc)
> >  	struct drm_device *dev = crtc->base.dev;
> >  	struct drm_i915_private *dev_priv = dev->dev_private;
> >  
> > +	if (i915_reset_in_progress(&dev_priv->gpu_error) ||
> > +	    crtc->reset_counter != atomic_read(&dev_priv->gpu_error.reset_counter))
> > +		return true;
> 
> I really don't like this. The reset_count is incremented when the reset
> starts, so we shouldn't get here with
> crtc->reset_counter == gpu_error->reset_counter && reset_in_progress().
> 
> I'd prefer this to be
>  if (i915_has_reset(dev_priv, crtc->reset_counter)) return true;
> 
> with a guard when reading the gpu reset_counter:
> 
>  ret = i915_get_reset_counter(dev_priv, &intel_crtc->reset_counter);
>  if (ret)
> 	goto cleanup;
> 
> that does something like
> 
>  static inline int i915_get_reset_counter(struct drm_i915_private *dev_priv,
>                                           int *value)
>  {
>      *value = atomic_read(dev_priv->gpu_error.reset_counter);
>      if (*value & I915_WEDGED)
>      	return -EIO;
>      if (*value & I915_RESET_IN_PROGRESS_FLAG)
>         return -EAGAIN;
>      return 0;
>   }

Bleh, I've seen the light and this is overly complicated and doesn't
actually help make the code more readable than

if (intel_crtc->reset_counter != atomic_read(&dev_priv->gpu_error.reset_counter))
	return true;

The original patch is
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] drm/i915: Ignore SURFLIVE and flip counter when the GPU gets reset
  2014-10-24 12:40   ` Chris Wilson
@ 2014-10-24 12:50     ` Ville Syrjälä
  0 siblings, 0 replies; 8+ messages in thread
From: Ville Syrjälä @ 2014-10-24 12:50 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On Fri, Oct 24, 2014 at 01:40:35PM +0100, Chris Wilson wrote:
> On Wed, May 28, 2014 at 01:10:55PM +0100, Chris Wilson wrote:
> > On Tue, May 27, 2014 at 09:33:09PM +0300, ville.syrjala@linux.intel.com wrote:
> > > diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> > > index 94ac51f..cb9dd8e 100644
> > > --- a/drivers/gpu/drm/i915/intel_display.c
> > > +++ b/drivers/gpu/drm/i915/intel_display.c
> > > @@ -8895,6 +8895,10 @@ static bool page_flip_finished(struct intel_crtc *crtc)
> > >  	struct drm_device *dev = crtc->base.dev;
> > >  	struct drm_i915_private *dev_priv = dev->dev_private;
> > >  
> > > +	if (i915_reset_in_progress(&dev_priv->gpu_error) ||
> > > +	    crtc->reset_counter != atomic_read(&dev_priv->gpu_error.reset_counter))
> > > +		return true;
> > 
> > I really don't like this. The reset_count is incremented when the reset
> > starts, so we shouldn't get here with
> > crtc->reset_counter == gpu_error->reset_counter && reset_in_progress().
> > 
> > I'd prefer this to be
> >  if (i915_has_reset(dev_priv, crtc->reset_counter)) return true;
> > 
> > with a guard when reading the gpu reset_counter:
> > 
> >  ret = i915_get_reset_counter(dev_priv, &intel_crtc->reset_counter);
> >  if (ret)
> > 	goto cleanup;
> > 
> > that does something like
> > 
> >  static inline int i915_get_reset_counter(struct drm_i915_private *dev_priv,
> >                                           int *value)
> >  {
> >      *value = atomic_read(dev_priv->gpu_error.reset_counter);
> >      if (*value & I915_WEDGED)
> >      	return -EIO;
> >      if (*value & I915_RESET_IN_PROGRESS_FLAG)
> >         return -EAGAIN;
> >      return 0;
> >   }
> 
> Bleh, I've seen the light and this is overly complicated and doesn't
> actually help make the code more readable than
> 
> if (intel_crtc->reset_counter != atomic_read(&dev_priv->gpu_error.reset_counter))
> 	return true;
> 
> The original patch is
> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>

Thanks. I think we need cc:stable on this now.

-- 
Ville Syrjälä
Intel OTC

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] drm/i915: Ignore SURFLIVE and flip counter when the GPU gets reset
  2014-05-27 18:33 [PATCH] drm/i915: Ignore SURFLIVE and flip counter when the GPU gets reset ville.syrjala
  2014-05-27 18:33 ` [PATCH igt] tests/kms_flip: Make flip-vs-panning-vs-hang change DSPSURF ville.syrjala
  2014-05-28 12:10 ` [PATCH] drm/i915: Ignore SURFLIVE and flip counter when the GPU gets reset Chris Wilson
@ 2014-10-24 14:28 ` Daniel Vetter
  2014-10-24 14:39   ` Ville Syrjälä
  2014-11-21  7:43   ` Jani Nikula
  2 siblings, 2 replies; 8+ messages in thread
From: Daniel Vetter @ 2014-10-24 14:28 UTC (permalink / raw)
  To: ville.syrjala; +Cc: intel-gfx

On Tue, May 27, 2014 at 09:33:09PM +0300, ville.syrjala@linux.intel.com wrote:
> From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> 
> During a GPU reset we need to get pending page flip cleared out
> since the ring contents are gone and flip will never complete
> on its own. This used to work until the mmio vs. CS flip race
> detection came about. That piece of code is looking for a
> specific surface address in the SURFLIVE register, but as
> a flip to that address may never happen the check may never
> pass. So we should just skip the SURFLIVE and flip counter
> checks when the GPU gets reset.
> 
> intel_display_handle_reset() tries to effectively complete
> the flip anyway by calling .update_primary_plane(). But that
> may not satisfy the conditions of the mmio vs. CS race
> detection since there's no guarantee that a modeset didn't
> sneak in between the GPU reset and intel_display_handle_reset().
> When that happens the primary plane updates performed by
> intel_display_handle_reset() will already use the new surface
> address, and thus the surface address the flip is waiting for
> might never appear in SURFLIVE. The result is that the flip
> will never complete and attempts to perform further page flips
> will fail with -EBUSY.
> 
> During the GPU reset intel_crtc_has_pending_flip() will return
> false regardless, so the deadlock with a modeset vs. the error
> work acquiring crtc->mutex was avoided. And the reset_counter
> check in intel_crtc_has_pending_flip() actually made this bug
> even less severe since it allowed normal modesets to go through
> even though there's a pending flip.
> 
> This is a regression introduced by me here:
>  commit 75f7f3ec600524c9544cc31695155f1a9ddbe1d9
>  Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
>  Date:   Tue Apr 15 21:41:34 2014 +0300
> 
>     drm/i915: Fix mmio vs. CS flip race on ILK+
> 
> Testcase: igt/kms_flip/flip-vs-panning-vs-hang
> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>

So I wondered first how we'll get there since the set_base code waits for
pageflips before updating anything. Except it doesn't when there's a
pending gpu reset. I don't think we need Chris' additional paranoi since
we must complete the pageflip anyway, and we do it with hacks when there's
a gpu reset. So if the commit message gets augmented with some note that
the pageflip wait in set_base is short-circuited for resets this is

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: stable@vger.kernel.org

Cheers, Daniel

> ---
>  drivers/gpu/drm/i915/intel_display.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> index 94ac51f..cb9dd8e 100644
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> @@ -8895,6 +8895,10 @@ static bool page_flip_finished(struct intel_crtc *crtc)
>  	struct drm_device *dev = crtc->base.dev;
>  	struct drm_i915_private *dev_priv = dev->dev_private;
>  
> +	if (i915_reset_in_progress(&dev_priv->gpu_error) ||
> +	    crtc->reset_counter != atomic_read(&dev_priv->gpu_error.reset_counter))
> +		return true;
> +
>  	/*
>  	 * The relevant registers doen't exist on pre-ctg.
>  	 * As the flip done interrupt doesn't trigger for mmio
> -- 
> 1.8.5.5
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] drm/i915: Ignore SURFLIVE and flip counter when the GPU gets reset
  2014-10-24 14:28 ` Daniel Vetter
@ 2014-10-24 14:39   ` Ville Syrjälä
  2014-11-21  7:43   ` Jani Nikula
  1 sibling, 0 replies; 8+ messages in thread
From: Ville Syrjälä @ 2014-10-24 14:39 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Fri, Oct 24, 2014 at 04:28:50PM +0200, Daniel Vetter wrote:
> On Tue, May 27, 2014 at 09:33:09PM +0300, ville.syrjala@linux.intel.com wrote:
> > From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > 
> > During a GPU reset we need to get pending page flip cleared out
> > since the ring contents are gone and flip will never complete
> > on its own. This used to work until the mmio vs. CS flip race
> > detection came about. That piece of code is looking for a
> > specific surface address in the SURFLIVE register, but as
> > a flip to that address may never happen the check may never
> > pass. So we should just skip the SURFLIVE and flip counter
> > checks when the GPU gets reset.
> > 
> > intel_display_handle_reset() tries to effectively complete
> > the flip anyway by calling .update_primary_plane(). But that
> > may not satisfy the conditions of the mmio vs. CS race
> > detection since there's no guarantee that a modeset didn't
> > sneak in between the GPU reset and intel_display_handle_reset().

Such a modeset will not wait for pending flips due to the ongoing GPU
reset, and then the primary plane updates ...

> > When that happens the primary plane updates performed by
> > intel_display_handle_reset() will already use the new surface
> > address, and thus the surface address the flip is waiting for
> > might never appear in SURFLIVE. The result is that the flip
> > will never complete and attempts to perform further page flips
> > will fail with -EBUSY.
> > 
> > During the GPU reset intel_crtc_has_pending_flip() will return
> > false regardless, so the deadlock with a modeset vs. the error
> > work acquiring crtc->mutex was avoided. And the reset_counter
> > check in intel_crtc_has_pending_flip() actually made this bug
> > even less severe since it allowed normal modesets to go through
> > even though there's a pending flip.
> > 
> > This is a regression introduced by me here:
> >  commit 75f7f3ec600524c9544cc31695155f1a9ddbe1d9
> >  Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
> >  Date:   Tue Apr 15 21:41:34 2014 +0300
> > 
> >     drm/i915: Fix mmio vs. CS flip race on ILK+
> > 
> > Testcase: igt/kms_flip/flip-vs-panning-vs-hang
> > Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> 
> So I wondered first how we'll get there since the set_base code waits for
> pageflips before updating anything. Except it doesn't when there's a
> pending gpu reset. I don't think we need Chris' additional paranoi since
> we must complete the pageflip anyway, and we do it with hacks when there's
> a gpu reset. So if the commit message gets augmented with some note that
> the pageflip wait in set_base is short-circuited for resets this is
> 
> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: stable@vger.kernel.org
> 
> Cheers, Daniel
> 
> > ---
> >  drivers/gpu/drm/i915/intel_display.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> > index 94ac51f..cb9dd8e 100644
> > --- a/drivers/gpu/drm/i915/intel_display.c
> > +++ b/drivers/gpu/drm/i915/intel_display.c
> > @@ -8895,6 +8895,10 @@ static bool page_flip_finished(struct intel_crtc *crtc)
> >  	struct drm_device *dev = crtc->base.dev;
> >  	struct drm_i915_private *dev_priv = dev->dev_private;
> >  
> > +	if (i915_reset_in_progress(&dev_priv->gpu_error) ||
> > +	    crtc->reset_counter != atomic_read(&dev_priv->gpu_error.reset_counter))
> > +		return true;
> > +
> >  	/*
> >  	 * The relevant registers doen't exist on pre-ctg.
> >  	 * As the flip done interrupt doesn't trigger for mmio
> > -- 
> > 1.8.5.5
> > 
> > _______________________________________________
> > Intel-gfx mailing list
> > Intel-gfx@lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/intel-gfx
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch

-- 
Ville Syrjälä
Intel OTC

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] drm/i915: Ignore SURFLIVE and flip counter when the GPU gets reset
  2014-10-24 14:28 ` Daniel Vetter
  2014-10-24 14:39   ` Ville Syrjälä
@ 2014-11-21  7:43   ` Jani Nikula
  1 sibling, 0 replies; 8+ messages in thread
From: Jani Nikula @ 2014-11-21  7:43 UTC (permalink / raw)
  To: Daniel Vetter, ville.syrjala; +Cc: intel-gfx

On Fri, 24 Oct 2014, Daniel Vetter <daniel@ffwll.ch> wrote:
> On Tue, May 27, 2014 at 09:33:09PM +0300, ville.syrjala@linux.intel.com wrote:
>> From: Ville Syrjälä <ville.syrjala@linux.intel.com>
>> 
>> During a GPU reset we need to get pending page flip cleared out
>> since the ring contents are gone and flip will never complete
>> on its own. This used to work until the mmio vs. CS flip race
>> detection came about. That piece of code is looking for a
>> specific surface address in the SURFLIVE register, but as
>> a flip to that address may never happen the check may never
>> pass. So we should just skip the SURFLIVE and flip counter
>> checks when the GPU gets reset.
>> 
>> intel_display_handle_reset() tries to effectively complete
>> the flip anyway by calling .update_primary_plane(). But that
>> may not satisfy the conditions of the mmio vs. CS race
>> detection since there's no guarantee that a modeset didn't
>> sneak in between the GPU reset and intel_display_handle_reset().
>> When that happens the primary plane updates performed by
>> intel_display_handle_reset() will already use the new surface
>> address, and thus the surface address the flip is waiting for
>> might never appear in SURFLIVE. The result is that the flip
>> will never complete and attempts to perform further page flips
>> will fail with -EBUSY.
>> 
>> During the GPU reset intel_crtc_has_pending_flip() will return
>> false regardless, so the deadlock with a modeset vs. the error
>> work acquiring crtc->mutex was avoided. And the reset_counter
>> check in intel_crtc_has_pending_flip() actually made this bug
>> even less severe since it allowed normal modesets to go through
>> even though there's a pending flip.
>> 
>> This is a regression introduced by me here:
>>  commit 75f7f3ec600524c9544cc31695155f1a9ddbe1d9
>>  Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
>>  Date:   Tue Apr 15 21:41:34 2014 +0300
>> 
>>     drm/i915: Fix mmio vs. CS flip race on ILK+
>> 
>> Testcase: igt/kms_flip/flip-vs-panning-vs-hang
>> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
>
> So I wondered first how we'll get there since the set_base code waits for
> pageflips before updating anything. Except it doesn't when there's a
> pending gpu reset. I don't think we need Chris' additional paranoi since
> we must complete the pageflip anyway, and we do it with hacks when there's
> a gpu reset. So if the commit message gets augmented with some note that
> the pageflip wait in set_base is short-circuited for resets this is
>
> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: stable@vger.kernel.org

Finally pushed to drm-intel-fixes. Thanks for the patch and review. And
the reminder.

I guess I ignored this because there was a highly negative review when
the patch was posted, and I didn't pay enough attention to notice the
change of mind that happened months later. Sorry.


BR,
Jani.



>
> Cheers, Daniel
>
>> ---
>>  drivers/gpu/drm/i915/intel_display.c | 4 ++++
>>  1 file changed, 4 insertions(+)
>> 
>> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
>> index 94ac51f..cb9dd8e 100644
>> --- a/drivers/gpu/drm/i915/intel_display.c
>> +++ b/drivers/gpu/drm/i915/intel_display.c
>> @@ -8895,6 +8895,10 @@ static bool page_flip_finished(struct intel_crtc *crtc)
>>  	struct drm_device *dev = crtc->base.dev;
>>  	struct drm_i915_private *dev_priv = dev->dev_private;
>>  
>> +	if (i915_reset_in_progress(&dev_priv->gpu_error) ||
>> +	    crtc->reset_counter != atomic_read(&dev_priv->gpu_error.reset_counter))
>> +		return true;
>> +
>>  	/*
>>  	 * The relevant registers doen't exist on pre-ctg.
>>  	 * As the flip done interrupt doesn't trigger for mmio
>> -- 
>> 1.8.5.5
>> 
>> _______________________________________________
>> Intel-gfx mailing list
>> Intel-gfx@lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
>
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch

-- 
Jani Nikula, Intel Open Source Technology Center
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2014-11-21  7:43 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-05-27 18:33 [PATCH] drm/i915: Ignore SURFLIVE and flip counter when the GPU gets reset ville.syrjala
2014-05-27 18:33 ` [PATCH igt] tests/kms_flip: Make flip-vs-panning-vs-hang change DSPSURF ville.syrjala
2014-05-28 12:10 ` [PATCH] drm/i915: Ignore SURFLIVE and flip counter when the GPU gets reset Chris Wilson
2014-10-24 12:40   ` Chris Wilson
2014-10-24 12:50     ` Ville Syrjälä
2014-10-24 14:28 ` Daniel Vetter
2014-10-24 14:39   ` Ville Syrjälä
2014-11-21  7:43   ` Jani Nikula

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox