* [PATCH] drm/i915: Skip HW reinitialisation on resume if still wedged
@ 2017-10-15 14:30 Chris Wilson
  2017-10-15 14:37 ` [PATCH v2] " Chris Wilson
                   ` (4 more replies)
  0 siblings, 5 replies; 11+ messages in thread
From: Chris Wilson @ 2017-10-15 14:30 UTC (permalink / raw)
  To: intel-gfx

If we fail to recover the HW state upon resume (i.e. our attempt to
clear the wedged bit and reset during i915_gem_sanitize() fails), then
skip the HW restart inside i915_gem_init_hw(). We will ultimate do the
the HW restart when sucessfully unwedgeding and reseting the HW later,
but attempting to restore a wedged device upon resume is risky as the HW
is in an unknown state.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index d9d39b309ce8..5993222c81ae 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4835,6 +4835,10 @@ int i915_gem_init_hw(struct drm_i915_private *dev_priv)
 	init_unused_rings(dev_priv);
 
 	BUG_ON(!dev_priv->kernel_context);
+	if (i915_terminally_wedged(&dev_priv->gpu_error)) {
+		ret = -EIO;
+		goto out;
+	}
 
 	ret = i915_ppgtt_init_hw(dev_priv);
 	if (ret) {
-- 
2.15.0.rc0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* [PATCH v2] drm/i915: Skip HW reinitialisation on resume if still wedged
  2017-10-15 14:30 [PATCH] drm/i915: Skip HW reinitialisation on resume if still wedged Chris Wilson
@ 2017-10-15 14:37 ` Chris Wilson
  2017-10-15 20:31   ` Chris Wilson
                     ` (2 more replies)
  2017-10-15 14:53 ` ✓ Fi.CI.BAT: success for " Patchwork
                   ` (3 subsequent siblings)
  4 siblings, 3 replies; 11+ messages in thread
From: Chris Wilson @ 2017-10-15 14:37 UTC (permalink / raw)
  To: intel-gfx

If we fail to recover the HW state upon resume (i.e. our attempt to
clear the wedged bit and reset during i915_gem_sanitize() fails), then
skip the HW restart inside i915_gem_init_hw(). We will ultimately do the
HW restart when successfully unwedging and resetting the HW later,
but attempting to restore a wedged device upon resume is risky as the HW
is in an unknown state.

v2: Suppress the error message when detecting the already wedged HW.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index d9d39b309ce8..449f8c3788b1 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4835,6 +4835,10 @@ int i915_gem_init_hw(struct drm_i915_private *dev_priv)
 	init_unused_rings(dev_priv);
 
 	BUG_ON(!dev_priv->kernel_context);
+	if (i915_terminally_wedged(&dev_priv->gpu_error)) {
+		ret = -EIO;
+		goto out;
+	}
 
 	ret = i915_ppgtt_init_hw(dev_priv);
 	if (ret) {
@@ -4933,8 +4937,10 @@ int i915_gem_init(struct drm_i915_private *dev_priv)
 		 * wedged. But we only want to do this where the GPU is angry,
 		 * for all other failure, such as an allocation failure, bail.
 		 */
-		DRM_ERROR("Failed to initialize GPU, declaring it wedged\n");
-		i915_gem_set_wedged(dev_priv);
+		if (!i915_terminally_wedged(&dev_priv->gpu_error)) {
+			DRM_ERROR("Failed to initialize GPU, declaring it wedged\n");
+			i915_gem_set_wedged(dev_priv);
+		}
 		ret = 0;
 	}
 
-- 
2.15.0.rc0
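
The two hunks above work together: i915_gem_init_hw() now returns -EIO early when the device is already wedged, and the -EIO handler in i915_gem_init() only logs and sets the wedged bit when that has not already happened. The following is a minimal user-space sketch of that control flow, not the driver code. struct fake_gpu, fake_init_hw(), fake_init() and the restart_result parameter are hypothetical names introduced for illustration, and the `ret == -EIO` test is an assumption inferred from the comment in the second hunk, since the condition itself is outside the quoted context.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the driver's private state. */
struct fake_gpu {
	bool wedged;          /* models i915_terminally_wedged() being true */
	bool engines_started; /* set once the simulated HW restart has run */
	int errors_logged;    /* counts DRM_ERROR-style messages */
};

/*
 * Models the first hunk: refuse to restart the engines while wedged.
 * restart_result is a hypothetical knob that simulates the outcome of
 * the real restart (0 on success, negative errno on failure).
 */
static int fake_init_hw(struct fake_gpu *gpu, int restart_result)
{
	if (gpu->wedged)
		return -EIO;
	if (restart_result)
		return restart_result;
	gpu->engines_started = true;
	return 0;
}

/*
 * Models the second hunk: on -EIO, declare the GPU wedged and log an
 * error only if it was not wedged already, then carry on with ret = 0.
 * Any other error is passed back to the caller unchanged.
 */
static int fake_init(struct fake_gpu *gpu, int restart_result)
{
	int ret = fake_init_hw(gpu, restart_result);

	if (ret == -EIO) { /* assumed condition, see lead-in */
		if (!gpu->wedged) {
			fprintf(stderr, "Failed to initialize GPU, declaring it wedged\n");
			gpu->errors_logged++;
			gpu->wedged = true;
		}
		ret = 0;
	}
	return ret;
}
```

In this model, calling fake_init() on a state that is already wedged returns 0 without logging and without starting the engines, which is the behaviour the v2 changelog describes.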


* ✓ Fi.CI.BAT: success for drm/i915: Skip HW reinitialisation on resume if still wedged
  2017-10-15 14:30 [PATCH] drm/i915: Skip HW reinitialisation on resume if still wedged Chris Wilson
  2017-10-15 14:37 ` [PATCH v2] " Chris Wilson
@ 2017-10-15 14:53 ` Patchwork
  2017-10-15 15:12 ` ✓ Fi.CI.BAT: success for drm/i915: Skip HW reinitialisation on resume if still wedged (rev2) Patchwork
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 11+ messages in thread
From: Patchwork @ 2017-10-15 14:53 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: Skip HW reinitialisation on resume if still wedged
URL   : https://patchwork.freedesktop.org/series/31987/
State : success

== Summary ==

Series 31987v1 drm/i915: Skip HW reinitialisation on resume if still wedged
https://patchwork.freedesktop.org/api/1.0/series/31987/revisions/1/mbox/

Test chamelium:
        Subgroup dp-edid-read:
                pass       -> FAIL       (fi-kbl-7500u) fdo#102672

fdo#102672 https://bugs.freedesktop.org/show_bug.cgi?id=102672

fi-bdw-5557u     total:289  pass:268  dwarn:0   dfail:0   fail:0   skip:21  time:460s
fi-bdw-gvtdvm    total:289  pass:265  dwarn:0   dfail:0   fail:0   skip:24  time:472s
fi-blb-e6850     total:289  pass:223  dwarn:1   dfail:0   fail:0   skip:65  time:389s
fi-bsw-n3050     total:289  pass:243  dwarn:0   dfail:0   fail:0   skip:46  time:582s
fi-bwr-2160      total:289  pass:183  dwarn:0   dfail:0   fail:0   skip:106 time:288s
fi-bxt-dsi       total:289  pass:259  dwarn:0   dfail:0   fail:0   skip:30  time:530s
fi-bxt-j4205     total:289  pass:260  dwarn:0   dfail:0   fail:0   skip:29  time:516s
fi-byt-j1900     total:289  pass:253  dwarn:1   dfail:0   fail:0   skip:35  time:542s
fi-byt-n2820     total:289  pass:249  dwarn:1   dfail:0   fail:0   skip:39  time:525s
fi-cfl-s         total:289  pass:253  dwarn:4   dfail:0   fail:0   skip:32  time:567s
fi-elk-e7500     total:289  pass:229  dwarn:0   dfail:0   fail:0   skip:60  time:440s
fi-gdg-551       total:289  pass:178  dwarn:1   dfail:0   fail:1   skip:109 time:275s
fi-glk-1         total:289  pass:261  dwarn:0   dfail:0   fail:0   skip:28  time:607s
fi-hsw-4770r     total:289  pass:262  dwarn:0   dfail:0   fail:0   skip:27  time:445s
fi-ilk-650       total:289  pass:228  dwarn:0   dfail:0   fail:0   skip:61  time:465s
fi-ivb-3520m     total:289  pass:260  dwarn:0   dfail:0   fail:0   skip:29  time:512s
fi-ivb-3770      total:289  pass:260  dwarn:0   dfail:0   fail:0   skip:29  time:481s
fi-kbl-7500u     total:289  pass:263  dwarn:1   dfail:0   fail:1   skip:24  time:506s
fi-kbl-7567u     total:289  pass:269  dwarn:0   dfail:0   fail:0   skip:20  time:490s
fi-kbl-r         total:289  pass:262  dwarn:0   dfail:0   fail:0   skip:27  time:600s
fi-pnv-d510      total:289  pass:222  dwarn:1   dfail:0   fail:0   skip:66  time:653s
fi-skl-6260u     total:289  pass:269  dwarn:0   dfail:0   fail:0   skip:20  time:471s
fi-skl-6700hq    total:289  pass:263  dwarn:0   dfail:0   fail:0   skip:26  time:663s
fi-skl-6700k     total:289  pass:265  dwarn:0   dfail:0   fail:0   skip:24  time:536s
fi-skl-6770hq    total:289  pass:269  dwarn:0   dfail:0   fail:0   skip:20  time:511s
fi-skl-gvtdvm    total:289  pass:266  dwarn:0   dfail:0   fail:0   skip:23  time:477s
fi-snb-2520m     total:289  pass:250  dwarn:0   dfail:0   fail:0   skip:39  time:591s
fi-snb-2600      total:289  pass:249  dwarn:0   dfail:0   fail:0   skip:40  time:430s

3d7ee91be487380ef6cad329fafbe424f6885372 drm-tip: 2017y-10m-14d-00h-14m-47s UTC integration manifest
bac6521dc7aa drm/i915: Skip HW reinitialisation on resume if still wedged

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_6041/

* ✓ Fi.CI.BAT: success for drm/i915: Skip HW reinitialisation on resume if still wedged (rev2)
  2017-10-15 14:30 [PATCH] drm/i915: Skip HW reinitialisation on resume if still wedged Chris Wilson
  2017-10-15 14:37 ` [PATCH v2] " Chris Wilson
  2017-10-15 14:53 ` ✓ Fi.CI.BAT: success for " Patchwork
@ 2017-10-15 15:12 ` Patchwork
  2017-10-15 16:10 ` ✓ Fi.CI.IGT: " Patchwork
  2017-10-16 15:29 ` [PATCH] drm/i915: Skip HW reinitialisation on resume if still wedged Mika Kuoppala
  4 siblings, 0 replies; 11+ messages in thread
From: Patchwork @ 2017-10-15 15:12 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: Skip HW reinitialisation on resume if still wedged (rev2)
URL   : https://patchwork.freedesktop.org/series/31987/
State : success

== Summary ==

Series 31987v2 drm/i915: Skip HW reinitialisation on resume if still wedged
https://patchwork.freedesktop.org/api/1.0/series/31987/revisions/2/mbox/

Test gem_exec_suspend:
        Subgroup basic-s3:
                dmesg-warn -> PASS       (fi-cfl-s) fdo#103186
Test drv_module_reload:
        Subgroup basic-reload-inject:
                dmesg-warn -> INCOMPLETE (fi-cfl-s) fdo#103206

fdo#103186 https://bugs.freedesktop.org/show_bug.cgi?id=103186
fdo#103206 https://bugs.freedesktop.org/show_bug.cgi?id=103206

fi-bdw-5557u     total:289  pass:268  dwarn:0   dfail:0   fail:0   skip:21  time:463s
fi-bdw-gvtdvm    total:289  pass:265  dwarn:0   dfail:0   fail:0   skip:24  time:480s
fi-blb-e6850     total:289  pass:223  dwarn:1   dfail:0   fail:0   skip:65  time:387s
fi-bsw-n3050     total:289  pass:243  dwarn:0   dfail:0   fail:0   skip:46  time:570s
fi-bwr-2160      total:289  pass:183  dwarn:0   dfail:0   fail:0   skip:106 time:287s
fi-bxt-dsi       total:289  pass:259  dwarn:0   dfail:0   fail:0   skip:30  time:520s
fi-bxt-j4205     total:289  pass:260  dwarn:0   dfail:0   fail:0   skip:29  time:521s
fi-byt-j1900     total:289  pass:253  dwarn:1   dfail:0   fail:0   skip:35  time:531s
fi-byt-n2820     total:289  pass:249  dwarn:1   dfail:0   fail:0   skip:39  time:528s
fi-cfl-s         total:288  pass:254  dwarn:2   dfail:0   fail:0   skip:31 
fi-elk-e7500     total:289  pass:229  dwarn:0   dfail:0   fail:0   skip:60  time:442s
fi-gdg-551       total:289  pass:178  dwarn:1   dfail:0   fail:1   skip:109 time:273s
fi-glk-1         total:289  pass:261  dwarn:0   dfail:0   fail:0   skip:28  time:602s
fi-hsw-4770r     total:289  pass:262  dwarn:0   dfail:0   fail:0   skip:27  time:438s
fi-ilk-650       total:289  pass:228  dwarn:0   dfail:0   fail:0   skip:61  time:456s
fi-ivb-3520m     total:289  pass:260  dwarn:0   dfail:0   fail:0   skip:29  time:501s
fi-ivb-3770      total:289  pass:260  dwarn:0   dfail:0   fail:0   skip:29  time:471s
fi-kbl-7500u     total:289  pass:264  dwarn:1   dfail:0   fail:0   skip:24  time:502s
fi-kbl-7567u     total:289  pass:269  dwarn:0   dfail:0   fail:0   skip:20  time:485s
fi-kbl-r         total:289  pass:262  dwarn:0   dfail:0   fail:0   skip:27  time:596s
fi-pnv-d510      total:289  pass:222  dwarn:1   dfail:0   fail:0   skip:66  time:651s
fi-skl-6260u     total:289  pass:269  dwarn:0   dfail:0   fail:0   skip:20  time:469s
fi-skl-6700hq    total:289  pass:263  dwarn:0   dfail:0   fail:0   skip:26  time:656s
fi-skl-6700k     total:289  pass:265  dwarn:0   dfail:0   fail:0   skip:24  time:531s
fi-skl-6770hq    total:289  pass:269  dwarn:0   dfail:0   fail:0   skip:20  time:572s
fi-skl-gvtdvm    total:289  pass:266  dwarn:0   dfail:0   fail:0   skip:23  time:471s
fi-snb-2520m     total:289  pass:250  dwarn:0   dfail:0   fail:0   skip:39  time:582s
fi-snb-2600      total:289  pass:249  dwarn:0   dfail:0   fail:0   skip:40  time:429s

3d7ee91be487380ef6cad329fafbe424f6885372 drm-tip: 2017y-10m-14d-00h-14m-47s UTC integration manifest
fd6af56a8f26 drm/i915: Skip HW reinitialisation on resume if still wedged

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_6042/

* ✓ Fi.CI.IGT: success for drm/i915: Skip HW reinitialisation on resume if still wedged (rev2)
  2017-10-15 14:30 [PATCH] drm/i915: Skip HW reinitialisation on resume if still wedged Chris Wilson
                   ` (2 preceding siblings ...)
  2017-10-15 15:12 ` ✓ Fi.CI.BAT: success for drm/i915: Skip HW reinitialisation on resume if still wedged (rev2) Patchwork
@ 2017-10-15 16:10 ` Patchwork
  2017-10-16 15:29 ` [PATCH] drm/i915: Skip HW reinitialisation on resume if still wedged Mika Kuoppala
  4 siblings, 0 replies; 11+ messages in thread
From: Patchwork @ 2017-10-15 16:10 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: Skip HW reinitialisation on resume if still wedged (rev2)
URL   : https://patchwork.freedesktop.org/series/31987/
State : success

== Summary ==

Test kms_plane:
        Subgroup plane-panning-bottom-right-suspend-pipe-C-planes:
                skip       -> PASS       (shard-hsw)
Test kms_frontbuffer_tracking:
        Subgroup fbc-rgb101010-draw-mmap-gtt:
                skip       -> PASS       (shard-hsw)

shard-hsw        total:2553 pass:1441 dwarn:0   dfail:0   fail:9   skip:1103 time:9681s

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_6042/shards.html

* Re: [PATCH v2] drm/i915: Skip HW reinitialisation on resume if still wedged
  2017-10-15 14:37 ` [PATCH v2] " Chris Wilson
@ 2017-10-15 20:31   ` Chris Wilson
  2017-10-16 14:24   ` Mika Kuoppala
  2017-10-16 15:30   ` Mika Kuoppala
  2 siblings, 0 replies; 11+ messages in thread
From: Chris Wilson @ 2017-10-15 20:31 UTC (permalink / raw)
  To: intel-gfx

Quoting Chris Wilson (2017-10-15 15:37:25)
> If we fail to recover the HW state upon resume (i.e. our attempt to
> clear the wedged bit and reset during i915_gem_sanitize() fails), then
> skip the HW restart inside i915_gem_init_hw(). We will ultimately do the
> HW restart when successfully unwedging and resetting the HW later,
> but attempting to restore a wedged device upon resume is risky as the HW
> is in an unknown state.
> 
> v2: Suppress the error message when detecting the already wedged HW.
 
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103240
Testcase: igt/gem_eio/in-flight-suspend
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
-Chris

* Re: [PATCH v2] drm/i915: Skip HW reinitialisation on resume if still wedged
  2017-10-15 14:37 ` [PATCH v2] " Chris Wilson
  2017-10-15 20:31   ` Chris Wilson
@ 2017-10-16 14:24   ` Mika Kuoppala
  2017-10-16 14:28     ` Chris Wilson
  2017-10-16 15:30   ` Mika Kuoppala
  2 siblings, 1 reply; 11+ messages in thread
From: Mika Kuoppala @ 2017-10-16 14:24 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Chris Wilson <chris@chris-wilson.co.uk> writes:

> If we fail to recover the HW state upon resume (i.e. our attempt to
> clear the wedged bit and reset during i915_gem_sanitize() fails), then
> skip the HW restart inside i915_gem_init_hw(). We will ultimately do the
> HW restart when successfully unwedging and resetting the HW later,
> but attempting to restore a wedged device upon resume is risky as the HW
> is in an unknown state.
>
> v2: Suppress the error message when detecting the already wedged HW.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gem.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index d9d39b309ce8..449f8c3788b1 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -4835,6 +4835,10 @@ int i915_gem_init_hw(struct drm_i915_private *dev_priv)
>  	init_unused_rings(dev_priv);
>  
>  	BUG_ON(!dev_priv->kernel_context);
> +	if (i915_terminally_wedged(&dev_priv->gpu_error)) {
> +		ret = -EIO;
> +		goto out;
> +	}
>

You have done some hw initialization already before this point.
Is there a reason for not moving this right before acquiring
forcewake?

-Mika


>  	ret = i915_ppgtt_init_hw(dev_priv);
>  	if (ret) {
> @@ -4933,8 +4937,10 @@ int i915_gem_init(struct drm_i915_private *dev_priv)
>  		 * wedged. But we only want to do this where the GPU is angry,
>  		 * for all other failure, such as an allocation failure, bail.
>  		 */
> -		DRM_ERROR("Failed to initialize GPU, declaring it wedged\n");
> -		i915_gem_set_wedged(dev_priv);
> +		if (!i915_terminally_wedged(&dev_priv->gpu_error)) {
> +			DRM_ERROR("Failed to initialize GPU, declaring it wedged\n");
> +			i915_gem_set_wedged(dev_priv);
> +		}
>  		ret = 0;
>  	}
>  
> -- 
> 2.15.0.rc0

* Re: [PATCH v2] drm/i915: Skip HW reinitialisation on resume if still wedged
  2017-10-16 14:24   ` Mika Kuoppala
@ 2017-10-16 14:28     ` Chris Wilson
  0 siblings, 0 replies; 11+ messages in thread
From: Chris Wilson @ 2017-10-16 14:28 UTC (permalink / raw)
  To: Mika Kuoppala, intel-gfx

Quoting Mika Kuoppala (2017-10-16 15:24:33)
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > If we fail to recover the HW state upon resume (i.e. our attempt to
> > clear the wedged bit and reset during i915_gem_sanitize() fails), then
> > skip the HW restart inside i915_gem_init_hw(). We will ultimately do the
> > HW restart when successfully unwedging and resetting the HW later,
> > but attempting to restore a wedged device upon resume is risky as the HW
> > is in an unknown state.
> >
> > v2: Suppress the error message when detecting the already wedged HW.
> >
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > ---
> >  drivers/gpu/drm/i915/i915_gem.c | 10 ++++++++--
> >  1 file changed, 8 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index d9d39b309ce8..449f8c3788b1 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -4835,6 +4835,10 @@ int i915_gem_init_hw(struct drm_i915_private *dev_priv)
> >       init_unused_rings(dev_priv);
> >  
> >       BUG_ON(!dev_priv->kernel_context);
> > +     if (i915_terminally_wedged(&dev_priv->gpu_error)) {
> > +             ret = -EIO;
> > +             goto out;
> > +     }
> >
> 
> You have done some hw initialization already before this point.
> Is there a reason for not moving this right before acquiring
> forcewake?

init_unused_rings() is part of the sanitisation I wanted to keep. The
other mmio writes we need to sort out in the right w/a category; if they
are display related we need to keep them. Hence, being chicken and
sticking the escape clause here, right before we commit to restarting
the engines.
-Chris
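
The placement argument above can be sketched as an ordering of steps: the sanitisation and the early mmio writes still run on a wedged device, and only the engine restart is skipped. This is a simplified illustration under stated assumptions, not the body of i915_gem_init_hw(). sketch_init_hw(), struct step_log, record() and the step names are hypothetical labels chosen for this sketch.

```c
#include <errno.h>
#include <stdbool.h>

#define MAX_STEPS 8

/* Records which (hypothetically named) steps ran, in order. */
struct step_log {
	const char *steps[MAX_STEPS];
	int count;
};

static void record(struct step_log *log, const char *name)
{
	if (log->count < MAX_STEPS)
		log->steps[log->count++] = name;
}

/*
 * Simplified ordering: everything before the wedged check is kept even
 * on a wedged device; everything after it commits to restarting the HW.
 */
static int sketch_init_hw(bool wedged, struct step_log *log)
{
	record(log, "workaround_mmio_writes"); /* kept: may be display related */
	record(log, "init_unused_rings");      /* kept: part of sanitisation */

	if (wedged)
		return -EIO;                   /* the escape clause */

	record(log, "ppgtt_init_hw");
	record(log, "restart_engines");
	return 0;
}
```

Moving the check earlier, as suggested in the review question, would correspond to returning before the first two record() calls, which would also skip the sanitisation step.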

* Re: [PATCH] drm/i915: Skip HW reinitialisation on resume if still wedged
  2017-10-15 14:30 [PATCH] drm/i915: Skip HW reinitialisation on resume if still wedged Chris Wilson
                   ` (3 preceding siblings ...)
  2017-10-15 16:10 ` ✓ Fi.CI.IGT: " Patchwork
@ 2017-10-16 15:29 ` Mika Kuoppala
  4 siblings, 0 replies; 11+ messages in thread
From: Mika Kuoppala @ 2017-10-16 15:29 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Chris Wilson <chris@chris-wilson.co.uk> writes:

> If we fail to recover the HW state upon resume (i.e. our attempt to
> clear the wedged bit and reset during i915_gem_sanitize() fails), then
> skip the HW restart inside i915_gem_init_hw(). We will ultimate do the
> the HW restart when sucessfully unwedgeding and reseting the HW later,

successfully unwedging

> but attempting to restore a wedged device upon resume is risky as the HW
> is in an unknown state.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>

Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>

> ---
>  drivers/gpu/drm/i915/i915_gem.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index d9d39b309ce8..5993222c81ae 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -4835,6 +4835,10 @@ int i915_gem_init_hw(struct drm_i915_private *dev_priv)
>  	init_unused_rings(dev_priv);
>  
>  	BUG_ON(!dev_priv->kernel_context);
> +	if (i915_terminally_wedged(&dev_priv->gpu_error)) {
> +		ret = -EIO;
> +		goto out;
> +	}
>  
>  	ret = i915_ppgtt_init_hw(dev_priv);
>  	if (ret) {
> -- 
> 2.15.0.rc0

* Re: [PATCH v2] drm/i915: Skip HW reinitialisation on resume if still wedged
  2017-10-15 14:37 ` [PATCH v2] " Chris Wilson
  2017-10-15 20:31   ` Chris Wilson
  2017-10-16 14:24   ` Mika Kuoppala
@ 2017-10-16 15:30   ` Mika Kuoppala
  2017-10-16 20:15     ` Chris Wilson
  2 siblings, 1 reply; 11+ messages in thread
From: Mika Kuoppala @ 2017-10-16 15:30 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Chris Wilson <chris@chris-wilson.co.uk> writes:

> If we fail to recover the HW state upon resume (i.e. our attempt to
> clear the wedged bit and reset during i915_gem_sanitize() fails), then
> skip the HW restart inside i915_gem_init_hw(). We will ultimately do the
> HW restart when successfully unwedging and resetting the HW later,
> but attempting to restore a wedged device upon resume is risky as the HW
> is in an unknown state.
>
> v2: Suppress the error message when detecting the already wedged HW.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>

Stamping the right version is also helpful.

Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>

> ---
>  drivers/gpu/drm/i915/i915_gem.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index d9d39b309ce8..449f8c3788b1 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -4835,6 +4835,10 @@ int i915_gem_init_hw(struct drm_i915_private *dev_priv)
>  	init_unused_rings(dev_priv);
>  
>  	BUG_ON(!dev_priv->kernel_context);
> +	if (i915_terminally_wedged(&dev_priv->gpu_error)) {
> +		ret = -EIO;
> +		goto out;
> +	}
>  
>  	ret = i915_ppgtt_init_hw(dev_priv);
>  	if (ret) {
> @@ -4933,8 +4937,10 @@ int i915_gem_init(struct drm_i915_private *dev_priv)
>  		 * wedged. But we only want to do this where the GPU is angry,
>  		 * for all other failure, such as an allocation failure, bail.
>  		 */
> -		DRM_ERROR("Failed to initialize GPU, declaring it wedged\n");
> -		i915_gem_set_wedged(dev_priv);
> +		if (!i915_terminally_wedged(&dev_priv->gpu_error)) {
> +			DRM_ERROR("Failed to initialize GPU, declaring it wedged\n");
> +			i915_gem_set_wedged(dev_priv);
> +		}
>  		ret = 0;
>  	}
>  
> -- 
> 2.15.0.rc0

* Re: [PATCH v2] drm/i915: Skip HW reinitialisation on resume if still wedged
  2017-10-16 15:30   ` Mika Kuoppala
@ 2017-10-16 20:15     ` Chris Wilson
  0 siblings, 0 replies; 11+ messages in thread
From: Chris Wilson @ 2017-10-16 20:15 UTC (permalink / raw)
  To: Mika Kuoppala, intel-gfx

Quoting Mika Kuoppala (2017-10-16 16:30:33)
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > If we fail to recover the HW state upon resume (i.e. our attempt to
> > clear the wedged bit and reset during i915_gem_sanitize() fails), then
> > skip the HW restart inside i915_gem_init_hw(). We will ultimately do the
> > HW restart when successfully unwedging and resetting the HW later,
> > but attempting to restore a wedged device upon resume is risky as the HW
> > is in an unknown state.
> >
> > v2: Suppress the error message when detecting the already wedged HW.
> >
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> 
> Stamping the right version is also helpful.
> 
> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>

Thanks for taking the time to question the code carefully. This clears
up CI, but I am still able to kill snb at the moment with
gem_exec_whisper/hang-normal (though since we have reset enabled, it
looks to be a different problem).

One step forward, and pushed,
-Chris
