public inbox for intel-gfx@lists.freedesktop.org
* [PATCH] drm/i915: Skip HW reinitialisation on resume if still wedged
@ 2017-10-15 14:30 Chris Wilson
  2017-10-15 14:37 ` [PATCH v2] " Chris Wilson
                   ` (4 more replies)
  0 siblings, 5 replies; 11+ messages in thread
From: Chris Wilson @ 2017-10-15 14:30 UTC (permalink / raw)
  To: intel-gfx

If we fail to recover the HW state upon resume (i.e. our attempt to
clear the wedged bit and reset during i915_gem_sanitize() fails), then
skip the HW restart inside i915_gem_init_hw(). We will ultimately do the
HW restart when successfully unwedging and resetting the HW later,
but attempting to restore a wedged device upon resume is risky as the HW
is in an unknown state.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index d9d39b309ce8..5993222c81ae 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4835,6 +4835,10 @@ int i915_gem_init_hw(struct drm_i915_private *dev_priv)
 	init_unused_rings(dev_priv);
 
 	BUG_ON(!dev_priv->kernel_context);
+	if (i915_terminally_wedged(&dev_priv->gpu_error)) {
+		ret = -EIO;
+		goto out;
+	}
 
 	ret = i915_ppgtt_init_hw(dev_priv);
 	if (ret) {
-- 
2.15.0.rc0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v2] drm/i915: Skip HW reinitialisation on resume if still wedged
  2017-10-15 14:30 [PATCH] drm/i915: Skip HW reinitialisation on resume if still wedged Chris Wilson
@ 2017-10-15 14:37 ` Chris Wilson
  2017-10-15 20:31   ` Chris Wilson
                     ` (2 more replies)
  2017-10-15 14:53 ` ✓ Fi.CI.BAT: success for " Patchwork
                   ` (3 subsequent siblings)
  4 siblings, 3 replies; 11+ messages in thread
From: Chris Wilson @ 2017-10-15 14:37 UTC (permalink / raw)
  To: intel-gfx

If we fail to recover the HW state upon resume (i.e. our attempt to
clear the wedged bit and reset during i915_gem_sanitize() fails), then
skip the HW restart inside i915_gem_init_hw(). We will ultimately do the
HW restart when successfully unwedging and resetting the HW later,
but attempting to restore a wedged device upon resume is risky as the HW
is in an unknown state.

v2: Suppress the error message when detecting the already wedged HW.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index d9d39b309ce8..449f8c3788b1 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4835,6 +4835,10 @@ int i915_gem_init_hw(struct drm_i915_private *dev_priv)
 	init_unused_rings(dev_priv);
 
 	BUG_ON(!dev_priv->kernel_context);
+	if (i915_terminally_wedged(&dev_priv->gpu_error)) {
+		ret = -EIO;
+		goto out;
+	}
 
 	ret = i915_ppgtt_init_hw(dev_priv);
 	if (ret) {
@@ -4933,8 +4937,10 @@ int i915_gem_init(struct drm_i915_private *dev_priv)
 		 * wedged. But we only want to do this where the GPU is angry,
 		 * for all other failure, such as an allocation failure, bail.
 		 */
-		DRM_ERROR("Failed to initialize GPU, declaring it wedged\n");
-		i915_gem_set_wedged(dev_priv);
+		if (!i915_terminally_wedged(&dev_priv->gpu_error)) {
+			DRM_ERROR("Failed to initialize GPU, declaring it wedged\n");
+			i915_gem_set_wedged(dev_priv);
+		}
 		ret = 0;
 	}
 
-- 
2.15.0.rc0
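
The combined effect of the two hunks above can be modelled outside the kernel. What follows is a minimal userspace sketch, not driver code: struct fake_gpu, fake_init_hw() and fake_gem_init() are hypothetical stand-ins, and the `ret == -EIO` test is an assumption inferred from the context comment ("for all other failure, such as an allocation failure, bail"), since the hunk does not show the condition itself.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the driver state; not the real drm_i915_private. */
struct fake_gpu {
	bool wedged;        /* models what i915_terminally_wedged() reports */
	int errors_logged;  /* counts DRM_ERROR-style messages */
};

/* Models i915_gem_init_hw(): refuse to restart the HW while still wedged. */
static int fake_init_hw(const struct fake_gpu *gpu, int hw_result)
{
	if (gpu->wedged)
		return -EIO;
	return hw_result; /* outcome of the simulated HW restart */
}

/* Models the tail of i915_gem_init(): only -EIO is absorbed (assumed). */
static int fake_gem_init(struct fake_gpu *gpu, int hw_result)
{
	int ret;

	assert(gpu != NULL);
	ret = fake_init_hw(gpu, hw_result);

	if (ret == -EIO) {
		/* v2: stay quiet if the device was already marked wedged. */
		if (!gpu->wedged) {
			fprintf(stderr,
				"Failed to initialize GPU, declaring it wedged\n");
			gpu->errors_logged++;
			gpu->wedged = true;
		}
		ret = 0;
	}
	return ret;
}
```

In this model an already wedged device makes fake_gem_init() return 0 without logging, a fresh -EIO logs once and marks the device wedged, and any other error (for example -ENOMEM) is passed back to the caller unchanged.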


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* ✓ Fi.CI.BAT: success for drm/i915: Skip HW reinitialisation on resume if still wedged
  2017-10-15 14:30 [PATCH] drm/i915: Skip HW reinitialisation on resume if still wedged Chris Wilson
  2017-10-15 14:37 ` [PATCH v2] " Chris Wilson
@ 2017-10-15 14:53 ` Patchwork
  2017-10-15 15:12 ` ✓ Fi.CI.BAT: success for drm/i915: Skip HW reinitialisation on resume if still wedged (rev2) Patchwork
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 11+ messages in thread
From: Patchwork @ 2017-10-15 14:53 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: Skip HW reinitialisation on resume if still wedged
URL   : https://patchwork.freedesktop.org/series/31987/
State : success

== Summary ==

Series 31987v1 drm/i915: Skip HW reinitialisation on resume if still wedged
https://patchwork.freedesktop.org/api/1.0/series/31987/revisions/1/mbox/

Test chamelium:
        Subgroup dp-edid-read:
                pass       -> FAIL       (fi-kbl-7500u) fdo#102672

fdo#102672 https://bugs.freedesktop.org/show_bug.cgi?id=102672

fi-bdw-5557u     total:289  pass:268  dwarn:0   dfail:0   fail:0   skip:21  time:460s
fi-bdw-gvtdvm    total:289  pass:265  dwarn:0   dfail:0   fail:0   skip:24  time:472s
fi-blb-e6850     total:289  pass:223  dwarn:1   dfail:0   fail:0   skip:65  time:389s
fi-bsw-n3050     total:289  pass:243  dwarn:0   dfail:0   fail:0   skip:46  time:582s
fi-bwr-2160      total:289  pass:183  dwarn:0   dfail:0   fail:0   skip:106 time:288s
fi-bxt-dsi       total:289  pass:259  dwarn:0   dfail:0   fail:0   skip:30  time:530s
fi-bxt-j4205     total:289  pass:260  dwarn:0   dfail:0   fail:0   skip:29  time:516s
fi-byt-j1900     total:289  pass:253  dwarn:1   dfail:0   fail:0   skip:35  time:542s
fi-byt-n2820     total:289  pass:249  dwarn:1   dfail:0   fail:0   skip:39  time:525s
fi-cfl-s         total:289  pass:253  dwarn:4   dfail:0   fail:0   skip:32  time:567s
fi-elk-e7500     total:289  pass:229  dwarn:0   dfail:0   fail:0   skip:60  time:440s
fi-gdg-551       total:289  pass:178  dwarn:1   dfail:0   fail:1   skip:109 time:275s
fi-glk-1         total:289  pass:261  dwarn:0   dfail:0   fail:0   skip:28  time:607s
fi-hsw-4770r     total:289  pass:262  dwarn:0   dfail:0   fail:0   skip:27  time:445s
fi-ilk-650       total:289  pass:228  dwarn:0   dfail:0   fail:0   skip:61  time:465s
fi-ivb-3520m     total:289  pass:260  dwarn:0   dfail:0   fail:0   skip:29  time:512s
fi-ivb-3770      total:289  pass:260  dwarn:0   dfail:0   fail:0   skip:29  time:481s
fi-kbl-7500u     total:289  pass:263  dwarn:1   dfail:0   fail:1   skip:24  time:506s
fi-kbl-7567u     total:289  pass:269  dwarn:0   dfail:0   fail:0   skip:20  time:490s
fi-kbl-r         total:289  pass:262  dwarn:0   dfail:0   fail:0   skip:27  time:600s
fi-pnv-d510      total:289  pass:222  dwarn:1   dfail:0   fail:0   skip:66  time:653s
fi-skl-6260u     total:289  pass:269  dwarn:0   dfail:0   fail:0   skip:20  time:471s
fi-skl-6700hq    total:289  pass:263  dwarn:0   dfail:0   fail:0   skip:26  time:663s
fi-skl-6700k     total:289  pass:265  dwarn:0   dfail:0   fail:0   skip:24  time:536s
fi-skl-6770hq    total:289  pass:269  dwarn:0   dfail:0   fail:0   skip:20  time:511s
fi-skl-gvtdvm    total:289  pass:266  dwarn:0   dfail:0   fail:0   skip:23  time:477s
fi-snb-2520m     total:289  pass:250  dwarn:0   dfail:0   fail:0   skip:39  time:591s
fi-snb-2600      total:289  pass:249  dwarn:0   dfail:0   fail:0   skip:40  time:430s

3d7ee91be487380ef6cad329fafbe424f6885372 drm-tip: 2017y-10m-14d-00h-14m-47s UTC integration manifest
bac6521dc7aa drm/i915: Skip HW reinitialisation on resume if still wedged

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_6041/

^ permalink raw reply	[flat|nested] 11+ messages in thread

* ✓ Fi.CI.BAT: success for drm/i915: Skip HW reinitialisation on resume if still wedged (rev2)
  2017-10-15 14:30 [PATCH] drm/i915: Skip HW reinitialisation on resume if still wedged Chris Wilson
  2017-10-15 14:37 ` [PATCH v2] " Chris Wilson
  2017-10-15 14:53 ` ✓ Fi.CI.BAT: success for " Patchwork
@ 2017-10-15 15:12 ` Patchwork
  2017-10-15 16:10 ` ✓ Fi.CI.IGT: " Patchwork
  2017-10-16 15:29 ` [PATCH] drm/i915: Skip HW reinitialisation on resume if still wedged Mika Kuoppala
  4 siblings, 0 replies; 11+ messages in thread
From: Patchwork @ 2017-10-15 15:12 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: Skip HW reinitialisation on resume if still wedged (rev2)
URL   : https://patchwork.freedesktop.org/series/31987/
State : success

== Summary ==

Series 31987v2 drm/i915: Skip HW reinitialisation on resume if still wedged
https://patchwork.freedesktop.org/api/1.0/series/31987/revisions/2/mbox/

Test gem_exec_suspend:
        Subgroup basic-s3:
                dmesg-warn -> PASS       (fi-cfl-s) fdo#103186
Test drv_module_reload:
        Subgroup basic-reload-inject:
                dmesg-warn -> INCOMPLETE (fi-cfl-s) fdo#103206

fdo#103186 https://bugs.freedesktop.org/show_bug.cgi?id=103186
fdo#103206 https://bugs.freedesktop.org/show_bug.cgi?id=103206

fi-bdw-5557u     total:289  pass:268  dwarn:0   dfail:0   fail:0   skip:21  time:463s
fi-bdw-gvtdvm    total:289  pass:265  dwarn:0   dfail:0   fail:0   skip:24  time:480s
fi-blb-e6850     total:289  pass:223  dwarn:1   dfail:0   fail:0   skip:65  time:387s
fi-bsw-n3050     total:289  pass:243  dwarn:0   dfail:0   fail:0   skip:46  time:570s
fi-bwr-2160      total:289  pass:183  dwarn:0   dfail:0   fail:0   skip:106 time:287s
fi-bxt-dsi       total:289  pass:259  dwarn:0   dfail:0   fail:0   skip:30  time:520s
fi-bxt-j4205     total:289  pass:260  dwarn:0   dfail:0   fail:0   skip:29  time:521s
fi-byt-j1900     total:289  pass:253  dwarn:1   dfail:0   fail:0   skip:35  time:531s
fi-byt-n2820     total:289  pass:249  dwarn:1   dfail:0   fail:0   skip:39  time:528s
fi-cfl-s         total:288  pass:254  dwarn:2   dfail:0   fail:0   skip:31 
fi-elk-e7500     total:289  pass:229  dwarn:0   dfail:0   fail:0   skip:60  time:442s
fi-gdg-551       total:289  pass:178  dwarn:1   dfail:0   fail:1   skip:109 time:273s
fi-glk-1         total:289  pass:261  dwarn:0   dfail:0   fail:0   skip:28  time:602s
fi-hsw-4770r     total:289  pass:262  dwarn:0   dfail:0   fail:0   skip:27  time:438s
fi-ilk-650       total:289  pass:228  dwarn:0   dfail:0   fail:0   skip:61  time:456s
fi-ivb-3520m     total:289  pass:260  dwarn:0   dfail:0   fail:0   skip:29  time:501s
fi-ivb-3770      total:289  pass:260  dwarn:0   dfail:0   fail:0   skip:29  time:471s
fi-kbl-7500u     total:289  pass:264  dwarn:1   dfail:0   fail:0   skip:24  time:502s
fi-kbl-7567u     total:289  pass:269  dwarn:0   dfail:0   fail:0   skip:20  time:485s
fi-kbl-r         total:289  pass:262  dwarn:0   dfail:0   fail:0   skip:27  time:596s
fi-pnv-d510      total:289  pass:222  dwarn:1   dfail:0   fail:0   skip:66  time:651s
fi-skl-6260u     total:289  pass:269  dwarn:0   dfail:0   fail:0   skip:20  time:469s
fi-skl-6700hq    total:289  pass:263  dwarn:0   dfail:0   fail:0   skip:26  time:656s
fi-skl-6700k     total:289  pass:265  dwarn:0   dfail:0   fail:0   skip:24  time:531s
fi-skl-6770hq    total:289  pass:269  dwarn:0   dfail:0   fail:0   skip:20  time:572s
fi-skl-gvtdvm    total:289  pass:266  dwarn:0   dfail:0   fail:0   skip:23  time:471s
fi-snb-2520m     total:289  pass:250  dwarn:0   dfail:0   fail:0   skip:39  time:582s
fi-snb-2600      total:289  pass:249  dwarn:0   dfail:0   fail:0   skip:40  time:429s

3d7ee91be487380ef6cad329fafbe424f6885372 drm-tip: 2017y-10m-14d-00h-14m-47s UTC integration manifest
fd6af56a8f26 drm/i915: Skip HW reinitialisation on resume if still wedged

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_6042/

^ permalink raw reply	[flat|nested] 11+ messages in thread

* ✓ Fi.CI.IGT: success for drm/i915: Skip HW reinitialisation on resume if still wedged (rev2)
  2017-10-15 14:30 [PATCH] drm/i915: Skip HW reinitialisation on resume if still wedged Chris Wilson
                   ` (2 preceding siblings ...)
  2017-10-15 15:12 ` ✓ Fi.CI.BAT: success for drm/i915: Skip HW reinitialisation on resume if still wedged (rev2) Patchwork
@ 2017-10-15 16:10 ` Patchwork
  2017-10-16 15:29 ` [PATCH] drm/i915: Skip HW reinitialisation on resume if still wedged Mika Kuoppala
  4 siblings, 0 replies; 11+ messages in thread
From: Patchwork @ 2017-10-15 16:10 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: Skip HW reinitialisation on resume if still wedged (rev2)
URL   : https://patchwork.freedesktop.org/series/31987/
State : success

== Summary ==

Test kms_plane:
        Subgroup plane-panning-bottom-right-suspend-pipe-C-planes:
                skip       -> PASS       (shard-hsw)
Test kms_frontbuffer_tracking:
        Subgroup fbc-rgb101010-draw-mmap-gtt:
                skip       -> PASS       (shard-hsw)

shard-hsw        total:2553 pass:1441 dwarn:0   dfail:0   fail:9   skip:1103 time:9681s

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_6042/shards.html

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] drm/i915: Skip HW reinitialisation on resume if still wedged
  2017-10-15 14:37 ` [PATCH v2] " Chris Wilson
@ 2017-10-15 20:31   ` Chris Wilson
  2017-10-16 14:24   ` Mika Kuoppala
  2017-10-16 15:30   ` Mika Kuoppala
  2 siblings, 0 replies; 11+ messages in thread
From: Chris Wilson @ 2017-10-15 20:31 UTC (permalink / raw)
  To: intel-gfx

Quoting Chris Wilson (2017-10-15 15:37:25)
> If we fail to recover the HW state upon resume (i.e. our attempt to
> clear the wedged bit and reset during i915_gem_sanitize() fails), then
> skip the HW restart inside i915_gem_init_hw(). We will ultimately do the
> HW restart when successfully unwedging and resetting the HW later,
> but attempting to restore a wedged device upon resume is risky as the HW
> is in an unknown state.
> 
> v2: Suppress the error message when detecting the already wedged HW.
 
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103240
Testcase: igt/gem_eio/in-flight-suspend
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
-Chris

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] drm/i915: Skip HW reinitialisation on resume if still wedged
  2017-10-15 14:37 ` [PATCH v2] " Chris Wilson
  2017-10-15 20:31   ` Chris Wilson
@ 2017-10-16 14:24   ` Mika Kuoppala
  2017-10-16 14:28     ` Chris Wilson
  2017-10-16 15:30   ` Mika Kuoppala
  2 siblings, 1 reply; 11+ messages in thread
From: Mika Kuoppala @ 2017-10-16 14:24 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Chris Wilson <chris@chris-wilson.co.uk> writes:

> If we fail to recover the HW state upon resume (i.e. our attempt to
> clear the wedged bit and reset during i915_gem_sanitize() fails), then
> skip the HW restart inside i915_gem_init_hw(). We will ultimately do the
> HW restart when successfully unwedging and resetting the HW later,
> but attempting to restore a wedged device upon resume is risky as the HW
> is in an unknown state.
>
> v2: Suppress the error message when detecting the already wedged HW.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gem.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index d9d39b309ce8..449f8c3788b1 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -4835,6 +4835,10 @@ int i915_gem_init_hw(struct drm_i915_private *dev_priv)
>  	init_unused_rings(dev_priv);
>  
>  	BUG_ON(!dev_priv->kernel_context);
> +	if (i915_terminally_wedged(&dev_priv->gpu_error)) {
> +		ret = -EIO;
> +		goto out;
> +	}
>

You have done some hw initialization already before this point.
Is there a reason for not moving this right before acquiring
forcewake?

-Mika


>  	ret = i915_ppgtt_init_hw(dev_priv);
>  	if (ret) {
> @@ -4933,8 +4937,10 @@ int i915_gem_init(struct drm_i915_private *dev_priv)
>  		 * wedged. But we only want to do this where the GPU is angry,
>  		 * for all other failure, such as an allocation failure, bail.
>  		 */
> -		DRM_ERROR("Failed to initialize GPU, declaring it wedged\n");
> -		i915_gem_set_wedged(dev_priv);
> +		if (!i915_terminally_wedged(&dev_priv->gpu_error)) {
> +			DRM_ERROR("Failed to initialize GPU, declaring it wedged\n");
> +			i915_gem_set_wedged(dev_priv);
> +		}
>  		ret = 0;
>  	}
>  
> -- 
> 2.15.0.rc0

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] drm/i915: Skip HW reinitialisation on resume if still wedged
  2017-10-16 14:24   ` Mika Kuoppala
@ 2017-10-16 14:28     ` Chris Wilson
  0 siblings, 0 replies; 11+ messages in thread
From: Chris Wilson @ 2017-10-16 14:28 UTC (permalink / raw)
  To: Mika Kuoppala, intel-gfx

Quoting Mika Kuoppala (2017-10-16 15:24:33)
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > If we fail to recover the HW state upon resume (i.e. our attempt to
> > clear the wedged bit and reset during i915_gem_sanitize() fails), then
> > skip the HW restart inside i915_gem_init_hw(). We will ultimately do the
> > HW restart when successfully unwedging and resetting the HW later,
> > but attempting to restore a wedged device upon resume is risky as the HW
> > is in an unknown state.
> >
> > v2: Suppress the error message when detecting the already wedged HW.
> >
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > ---
> >  drivers/gpu/drm/i915/i915_gem.c | 10 ++++++++--
> >  1 file changed, 8 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index d9d39b309ce8..449f8c3788b1 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -4835,6 +4835,10 @@ int i915_gem_init_hw(struct drm_i915_private *dev_priv)
> >       init_unused_rings(dev_priv);
> >  
> >       BUG_ON(!dev_priv->kernel_context);
> > +     if (i915_terminally_wedged(&dev_priv->gpu_error)) {
> > +             ret = -EIO;
> > +             goto out;
> > +     }
> >
> 
> You have done some hw initialization already before this point.
> Is there a reason for not moving this right before acquiring
> forcewake?

init_unused_rings() is part of the sanitisation I wanted to keep. The
other mmio writes we need to sort out in the right w/a category; if they
are display related we need to keep them. Hence, being chicken and
sticking the escape clause here, right before we commit to restarting
the engines.
-Chris
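
The placement argument in the reply above can be illustrated with a small trace. This is a rough userspace sketch under stated assumptions: the step names, struct trace and sketch_init_hw() are hypothetical, and the single "sanitise mmio" step merely stands in for the earlier register writes that the reply says still need sorting into the right category.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for the phases discussed in the reply. */
enum step {
	STEP_SANITIZE_MMIO,      /* early workaround/display mmio writes */
	STEP_INIT_UNUSED_RINGS,  /* models init_unused_rings() */
	STEP_PPGTT,              /* models i915_ppgtt_init_hw() */
	STEP_ENGINES             /* models restarting the engines */
};

struct trace {
	enum step steps[8];
	size_t count;
};

static void record(struct trace *t, enum step s)
{
	assert(t->count < sizeof(t->steps) / sizeof(t->steps[0]));
	t->steps[t->count++] = s;
}

/* Sanitisation always runs; the escape clause sits just before the restart. */
static int sketch_init_hw(bool wedged, struct trace *t)
{
	record(t, STEP_SANITIZE_MMIO);
	record(t, STEP_INIT_UNUSED_RINGS);

	if (wedged)
		return -EIO; /* bail before committing to the engine restart */

	record(t, STEP_PPGTT);
	record(t, STEP_ENGINES);
	return 0;
}
```

With wedged set, the trace holds only the two sanitisation steps and the call returns -EIO. Moving the check ahead of the first record() call would correspond to the alternative raised in the review, where nothing is written before bailing.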

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] drm/i915: Skip HW reinitialisation on resume if still wedged
  2017-10-15 14:30 [PATCH] drm/i915: Skip HW reinitialisation on resume if still wedged Chris Wilson
                   ` (3 preceding siblings ...)
  2017-10-15 16:10 ` ✓ Fi.CI.IGT: " Patchwork
@ 2017-10-16 15:29 ` Mika Kuoppala
  4 siblings, 0 replies; 11+ messages in thread
From: Mika Kuoppala @ 2017-10-16 15:29 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Chris Wilson <chris@chris-wilson.co.uk> writes:

> If we fail to recover the HW state upon resume (i.e. our attempt to
> clear the wedged bit and reset during i915_gem_sanitize() fails), then
> skip the HW restart inside i915_gem_init_hw(). We will ultimate do the
> the HW restart when sucessfully unwedgeding and reseting the HW later,

successfully unwedging

> but attempting to restore a wedged device upon resume is risky as the HW
> is in an unknown state.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>

Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>

> ---
>  drivers/gpu/drm/i915/i915_gem.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index d9d39b309ce8..5993222c81ae 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -4835,6 +4835,10 @@ int i915_gem_init_hw(struct drm_i915_private *dev_priv)
>  	init_unused_rings(dev_priv);
>  
>  	BUG_ON(!dev_priv->kernel_context);
> +	if (i915_terminally_wedged(&dev_priv->gpu_error)) {
> +		ret = -EIO;
> +		goto out;
> +	}
>  
>  	ret = i915_ppgtt_init_hw(dev_priv);
>  	if (ret) {
> -- 
> 2.15.0.rc0

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] drm/i915: Skip HW reinitialisation on resume if still wedged
  2017-10-15 14:37 ` [PATCH v2] " Chris Wilson
  2017-10-15 20:31   ` Chris Wilson
  2017-10-16 14:24   ` Mika Kuoppala
@ 2017-10-16 15:30   ` Mika Kuoppala
  2017-10-16 20:15     ` Chris Wilson
  2 siblings, 1 reply; 11+ messages in thread
From: Mika Kuoppala @ 2017-10-16 15:30 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Chris Wilson <chris@chris-wilson.co.uk> writes:

> If we fail to recover the HW state upon resume (i.e. our attempt to
> clear the wedged bit and reset during i915_gem_sanitize() fails), then
> skip the HW restart inside i915_gem_init_hw(). We will ultimately do the
> HW restart when successfully unwedging and resetting the HW later,
> but attempting to restore a wedged device upon resume is risky as the HW
> is in an unknown state.
>
> v2: Suppress the error message when detecting the already wedged HW.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>

Stamping the right version is also helpful.

Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>

> ---
>  drivers/gpu/drm/i915/i915_gem.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index d9d39b309ce8..449f8c3788b1 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -4835,6 +4835,10 @@ int i915_gem_init_hw(struct drm_i915_private *dev_priv)
>  	init_unused_rings(dev_priv);
>  
>  	BUG_ON(!dev_priv->kernel_context);
> +	if (i915_terminally_wedged(&dev_priv->gpu_error)) {
> +		ret = -EIO;
> +		goto out;
> +	}
>  
>  	ret = i915_ppgtt_init_hw(dev_priv);
>  	if (ret) {
> @@ -4933,8 +4937,10 @@ int i915_gem_init(struct drm_i915_private *dev_priv)
>  		 * wedged. But we only want to do this where the GPU is angry,
>  		 * for all other failure, such as an allocation failure, bail.
>  		 */
> -		DRM_ERROR("Failed to initialize GPU, declaring it wedged\n");
> -		i915_gem_set_wedged(dev_priv);
> +		if (!i915_terminally_wedged(&dev_priv->gpu_error)) {
> +			DRM_ERROR("Failed to initialize GPU, declaring it wedged\n");
> +			i915_gem_set_wedged(dev_priv);
> +		}
>  		ret = 0;
>  	}
>  
> -- 
> 2.15.0.rc0

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] drm/i915: Skip HW reinitialisation on resume if still wedged
  2017-10-16 15:30   ` Mika Kuoppala
@ 2017-10-16 20:15     ` Chris Wilson
  0 siblings, 0 replies; 11+ messages in thread
From: Chris Wilson @ 2017-10-16 20:15 UTC (permalink / raw)
  To: Mika Kuoppala, intel-gfx

Quoting Mika Kuoppala (2017-10-16 16:30:33)
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > If we fail to recover the HW state upon resume (i.e. our attempt to
> > clear the wedged bit and reset during i915_gem_sanitize() fails), then
> > skip the HW restart inside i915_gem_init_hw(). We will ultimately do the
> > HW restart when successfully unwedging and resetting the HW later,
> > but attempting to restore a wedged device upon resume is risky as the HW
> > is in an unknown state.
> >
> > v2: Suppress the error message when detecting the already wedged HW.
> >
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> 
> Stamping the right version is also helpful.
> 
> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>

Thanks for taking the time to question the code carefully. This clears
up CI, but I am still able to kill snb at the moment with
gem_exec_whisper/hang-normal (though since we have reset enabled, it
looks to be a different problem).

One step forward, and pushed,
-Chris

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2017-10-16 20:15 UTC | newest]

Thread overview: 11+ messages
2017-10-15 14:30 [PATCH] drm/i915: Skip HW reinitialisation on resume if still wedged Chris Wilson
2017-10-15 14:37 ` [PATCH v2] " Chris Wilson
2017-10-15 20:31   ` Chris Wilson
2017-10-16 14:24   ` Mika Kuoppala
2017-10-16 14:28     ` Chris Wilson
2017-10-16 15:30   ` Mika Kuoppala
2017-10-16 20:15     ` Chris Wilson
2017-10-15 14:53 ` ✓ Fi.CI.BAT: success for " Patchwork
2017-10-15 15:12 ` ✓ Fi.CI.BAT: success for drm/i915: Skip HW reinitialisation on resume if still wedged (rev2) Patchwork
2017-10-15 16:10 ` ✓ Fi.CI.IGT: " Patchwork
2017-10-16 15:29 ` [PATCH] drm/i915: Skip HW reinitialisation on resume if still wedged Mika Kuoppala
