[PATCH 1/3] drm/i915: Never fully mask the the EI up rps interrupt on SNB/IVB

public inbox for intel-gfx@lists.freedesktop.org
 help / color / mirror / Atom feed

* [PATCH 1/3] drm/i915: Never fully mask the the EI up rps interrupt on SNB/IVB
@ 2016-05-23 14:09 ville.syrjala
  2016-05-23 14:09 ` [PATCH 2/3] drm/i915: Don't frob with RPS around GPU reset ville.syrjala
                   ` (4 more replies)
  0 siblings, 5 replies; 14+ messages in thread
From: ville.syrjala @ 2016-05-23 14:09 UTC (permalink / raw)
  To: intel-gfx

From: Ville Syrjälä <ville.syrjala@linux.intel.com>

SNB (and IVB too I suppose) starts to misbehave if the GPU gets stuck
in an infinite batch buffer loop. The GPU apparently hogs something
critical and CPUs start to lose interrupts and whatnot. We can keep
the system limping along by unmasking some interrupts in
GEN6_PMINTRMSK. The EI up interrupt has been previously chosen for
that task, so let's never mask it.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_pm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 29bdd79d9039..576e98744a2d 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -4719,7 +4719,7 @@ void gen6_rps_idle(struct drm_i915_private *dev_priv)
 		else
 			gen6_set_rps(dev_priv, dev_priv->rps.idle_freq);
 		dev_priv->rps.last_adj = 0;
-		I915_WRITE(GEN6_PMINTRMSK, 0xffffffff);
+		I915_WRITE(GEN6_PMINTRMSK, gen6_rps_pm_mask(dev_priv, ~0));
 	}
 	mutex_unlock(&dev_priv->rps.hw_lock);
 
-- 
2.7.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/3] drm/i915: Don't frob with RPS around GPU reset
  2016-05-23 14:09 [PATCH 1/3] drm/i915: Never fully mask the the EI up rps interrupt on SNB/IVB ville.syrjala
@ 2016-05-23 14:09 ` ville.syrjala
  2016-05-23 14:32   ` Chris Wilson
  2016-05-23 14:09 ` [RFC][PATCH 3/3] drm/i915: Make RPS enable immediate ville.syrjala
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: ville.syrjala @ 2016-05-23 14:09 UTC (permalink / raw)
  To: intel-gfx

From: Ville Syrjälä <ville.syrjala@linux.intel.com>

Based on my observations GPU reset doesn't clobber the RPS state, so
frobbing with RPS around GPU reset seems rather pointless. Just get
rid of it.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_drv.c  | 11 -----------
 drivers/gpu/drm/i915/intel_drv.h |  1 -
 drivers/gpu/drm/i915/intel_pm.c  |  9 ---------
 3 files changed, 21 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index dba03c026151..3794dc67caa4 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -930,8 +930,6 @@ int i915_reset(struct drm_i915_private *dev_priv)
 	unsigned reset_counter;
 	int ret;
 
-	intel_reset_gt_powersave(dev_priv);
-
 	mutex_lock(&dev->struct_mutex);
 
 	/* Clear any previous failed attempts at recovery. Time to try again. */
@@ -994,15 +992,6 @@ int i915_reset(struct drm_i915_private *dev_priv)
 
 	mutex_unlock(&dev->struct_mutex);
 
-	/*
-	 * rps/rc6 re-init is necessary to restore state lost after the
-	 * reset and the re-install of gt irqs. Skip for ironlake per
-	 * previous concerns that it doesn't respond well to some forms
-	 * of re-init after reset.
-	 */
-	if (INTEL_INFO(dev)->gen > 5)
-		intel_enable_gt_powersave(dev_priv);
-
 	return 0;
 
 error:
diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
index 0741b2d3aa65..ca3b69b61f06 100644
--- a/drivers/gpu/drm/i915/intel_drv.h
+++ b/drivers/gpu/drm/i915/intel_drv.h
@@ -1652,7 +1652,6 @@ void intel_cleanup_gt_powersave(struct drm_i915_private *dev_priv);
 void intel_enable_gt_powersave(struct drm_i915_private *dev_priv);
 void intel_disable_gt_powersave(struct drm_i915_private *dev_priv);
 void intel_suspend_gt_powersave(struct drm_i915_private *dev_priv);
-void intel_reset_gt_powersave(struct drm_i915_private *dev_priv);
 void gen6_update_ring_freq(struct drm_i915_private *dev_priv);
 void gen6_rps_busy(struct drm_i915_private *dev_priv);
 void gen6_rps_reset_ei(struct drm_i915_private *dev_priv);
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 576e98744a2d..817c84c22782 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -6508,15 +6508,6 @@ void intel_enable_gt_powersave(struct drm_i915_private *dev_priv)
 	}
 }
 
-void intel_reset_gt_powersave(struct drm_i915_private *dev_priv)
-{
-	if (INTEL_INFO(dev_priv)->gen < 6)
-		return;
-
-	gen6_suspend_rps(dev_priv);
-	dev_priv->rps.enabled = false;
-}
-
 static void ibx_init_clock_gating(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
-- 
2.7.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC][PATCH 3/3] drm/i915: Make RPS enable immediate
  2016-05-23 14:09 [PATCH 1/3] drm/i915: Never fully mask the the EI up rps interrupt on SNB/IVB ville.syrjala
  2016-05-23 14:09 ` [PATCH 2/3] drm/i915: Don't frob with RPS around GPU reset ville.syrjala
@ 2016-05-23 14:09 ` ville.syrjala
  2016-05-23 14:33   ` Chris Wilson
  2016-05-24  8:27   ` Daniel Vetter
  2016-05-23 14:30 ` [PATCH 1/3] drm/i915: Never fully mask the the EI up rps interrupt on SNB/IVB Chris Wilson
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 14+ messages in thread
From: ville.syrjala @ 2016-05-23 14:09 UTC (permalink / raw)
  To: intel-gfx

From: Ville Syrjälä <ville.syrjala@linux.intel.com>

On SNB (at least) it's dangeruopus to hang the GPU with an infinite
batch buffer loop while RPS is disabled. The only thing that keeps
the system going in such circumstances are the internal RPS timers,
so we should never feed the GPU with RPS disabled unless we want to
risk a total system hang.

Previously we didn't fully disable RPS, but that changes in
commit 2030d684f7b3 ("drm/i915/bxt: Explicitly clear the Turbo control register")
so we probably didn't see the problem so often previously. But
even before that we were at the mercy of the BIOS for the initial
RPS state, so if the BIOS didn't enable RPS a GPU hang immediately
upon boot could have been fatal.

To avoid the problems let's just make the RPS enable immediate.
This renders the delayed_resume_work useless actually. We could perhaps
just move the ring freq table initialization to the work and do the
other stull synchronously?

Fixes: 2030d684f7b3 ("drm/i915/bxt: Explicitly clear the Turbo control register")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
---
This is more and RFC at this point. Perhaps we might want to keep the delayed
work but just for the ring freq table update (which is the main reson this whole
thing exists in the first place). Another factor is that wait_for() is not
exactly optiomal currently, so it makes the operation slower than it needs to
be. Would need some hard numbers to see if it's worth keeping the delayed work
or not I suppose.

 drivers/gpu/drm/i915/intel_pm.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 817c84c22782..e65c5c2b9b4e 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -6505,6 +6505,7 @@ void intel_enable_gt_powersave(struct drm_i915_private *dev_priv)
 		if (schedule_delayed_work(&dev_priv->rps.delayed_resume_work,
 					   round_jiffies_up_relative(HZ)))
 			intel_runtime_pm_get_noresume(dev_priv);
+		flush_delayed_work(&dev_priv->rps.delayed_resume_work);
 	}
 }
 
-- 
2.7.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/3] drm/i915: Never fully mask the the EI up rps interrupt on SNB/IVB
  2016-05-23 14:09 [PATCH 1/3] drm/i915: Never fully mask the the EI up rps interrupt on SNB/IVB ville.syrjala
  2016-05-23 14:09 ` [PATCH 2/3] drm/i915: Don't frob with RPS around GPU reset ville.syrjala
  2016-05-23 14:09 ` [RFC][PATCH 3/3] drm/i915: Make RPS enable immediate ville.syrjala
@ 2016-05-23 14:30 ` Chris Wilson
  2016-05-23 14:34   ` Ville Syrjälä
  2016-05-23 14:42 ` [PATCH v2 " ville.syrjala
  2016-05-23 16:29 ` ✗ Ro.CI.BAT: warning for series starting with [v2,1/3] drm/i915: Never fully mask the the EI up rps interrupt on SNB/IVB (rev2) Patchwork
  4 siblings, 1 reply; 14+ messages in thread
From: Chris Wilson @ 2016-05-23 14:30 UTC (permalink / raw)
  To: ville.syrjala; +Cc: intel-gfx

On Mon, May 23, 2016 at 05:09:41PM +0300, ville.syrjala@linux.intel.com wrote:
> From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> 
> SNB (and IVB too I suppose) starts to misbehave if the GPU gets stuck
> in an infinite batch buffer loop. The GPU apparently hogs something
> critical and CPUs start to lose interrupts and whatnot. We can keep
> the system limping along by unmasking some interrupts in
> GEN6_PMINTRMSK. The EI up interrupt has been previously chosen for
> that task, so let's never mask it.
> 
> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/intel_pm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> index 29bdd79d9039..576e98744a2d 100644
> --- a/drivers/gpu/drm/i915/intel_pm.c
> +++ b/drivers/gpu/drm/i915/intel_pm.c
> @@ -4719,7 +4719,7 @@ void gen6_rps_idle(struct drm_i915_private *dev_priv)
>  		else
>  			gen6_set_rps(dev_priv, dev_priv->rps.idle_freq);
>  		dev_priv->rps.last_adj = 0;
> -		I915_WRITE(GEN6_PMINTRMSK, 0xffffffff);
> +		I915_WRITE(GEN6_PMINTRMSK, gen6_rps_pm_mask(dev_priv, ~0));

gen6_sanitize_rps_pm_mask()

With that,
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/3] drm/i915: Don't frob with RPS around GPU reset
  2016-05-23 14:09 ` [PATCH 2/3] drm/i915: Don't frob with RPS around GPU reset ville.syrjala
@ 2016-05-23 14:32   ` Chris Wilson
  2016-05-24  8:24     ` Daniel Vetter
  0 siblings, 1 reply; 14+ messages in thread
From: Chris Wilson @ 2016-05-23 14:32 UTC (permalink / raw)
  To: ville.syrjala; +Cc: intel-gfx

On Mon, May 23, 2016 at 05:09:42PM +0300, ville.syrjala@linux.intel.com wrote:
> From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> 
> Based on my observations GPU reset doesn't clobber the RPS state, so
> frobbing with RPS around GPU reset seems rather pointless. Just get
> rid of it.
> 
> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>

Testcase: igt/pm_rpm/reset ?

Looks like that is the desired test.

Maybe make it basic and see what CI says :)

Actually it probably should be a basic test considering how often we
hang.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC][PATCH 3/3] drm/i915: Make RPS enable immediate
  2016-05-23 14:09 ` [RFC][PATCH 3/3] drm/i915: Make RPS enable immediate ville.syrjala
@ 2016-05-23 14:33   ` Chris Wilson
  2016-05-24  8:27   ` Daniel Vetter
  1 sibling, 0 replies; 14+ messages in thread
From: Chris Wilson @ 2016-05-23 14:33 UTC (permalink / raw)
  To: ville.syrjala; +Cc: intel-gfx

On Mon, May 23, 2016 at 05:09:43PM +0300, ville.syrjala@linux.intel.com wrote:
> From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> 
> On SNB (at least) it's dangeruopus to hang the GPU with an infinite
> batch buffer loop while RPS is disabled. The only thing that keeps
> the system going in such circumstances are the internal RPS timers,
> so we should never feed the GPU with RPS disabled unless we want to
> risk a total system hang.
> 
> Previously we didn't fully disable RPS, but that changes in
> commit 2030d684f7b3 ("drm/i915/bxt: Explicitly clear the Turbo control register")
> so we probably didn't see the problem so often previously. But
> even before that we were at the mercy of the BIOS for the initial
> RPS state, so if the BIOS didn't enable RPS a GPU hang immediately
> upon boot could have been fatal.
> 
> To avoid the problems let's just make the RPS enable immediate.
> This renders the delayed_resume_work useless actually. We could perhaps
> just move the ring freq table initialization to the work and do the
> other stull synchronously?
> 
> Fixes: 2030d684f7b3 ("drm/i915/bxt: Explicitly clear the Turbo control register")
> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> ---
> This is more and RFC at this point. Perhaps we might want to keep the delayed
> work but just for the ring freq table update (which is the main reson this whole
> thing exists in the first place). Another factor is that wait_for() is not
> exactly optiomal currently, so it makes the operation slower than it needs to
> be. Would need some hard numbers to see if it's worth keeping the delayed work
> or not I suppose.

I am in favour - the entire chain should be async, not just this little
step to setup the ring freq.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/3] drm/i915: Never fully mask the the EI up rps interrupt on SNB/IVB
  2016-05-23 14:30 ` [PATCH 1/3] drm/i915: Never fully mask the the EI up rps interrupt on SNB/IVB Chris Wilson
@ 2016-05-23 14:34   ` Ville Syrjälä
  0 siblings, 0 replies; 14+ messages in thread
From: Ville Syrjälä @ 2016-05-23 14:34 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On Mon, May 23, 2016 at 03:30:04PM +0100, Chris Wilson wrote:
> On Mon, May 23, 2016 at 05:09:41PM +0300, ville.syrjala@linux.intel.com wrote:
> > From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > 
> > SNB (and IVB too I suppose) starts to misbehave if the GPU gets stuck
> > in an infinite batch buffer loop. The GPU apparently hogs something
> > critical and CPUs start to lose interrupts and whatnot. We can keep
> > the system limping along by unmasking some interrupts in
> > GEN6_PMINTRMSK. The EI up interrupt has been previously chosen for
> > that task, so let's never mask it.
> > 
> > Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > ---
> >  drivers/gpu/drm/i915/intel_pm.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> > index 29bdd79d9039..576e98744a2d 100644
> > --- a/drivers/gpu/drm/i915/intel_pm.c
> > +++ b/drivers/gpu/drm/i915/intel_pm.c
> > @@ -4719,7 +4719,7 @@ void gen6_rps_idle(struct drm_i915_private *dev_priv)
> >  		else
> >  			gen6_set_rps(dev_priv, dev_priv->rps.idle_freq);
> >  		dev_priv->rps.last_adj = 0;
> > -		I915_WRITE(GEN6_PMINTRMSK, 0xffffffff);
> > +		I915_WRITE(GEN6_PMINTRMSK, gen6_rps_pm_mask(dev_priv, ~0));
> 
> gen6_sanitize_rps_pm_mask()

Doh. That's what I had on the other machine. Shouldn't copy patches
by retyping them by hand.

> 
> With that,
> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
> -Chris
> 
> -- 
> Chris Wilson, Intel Open Source Technology Centre

-- 
Ville Syrjälä
Intel OTC
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v2 1/3] drm/i915: Never fully mask the the EI up rps interrupt on SNB/IVB
  2016-05-23 14:09 [PATCH 1/3] drm/i915: Never fully mask the the EI up rps interrupt on SNB/IVB ville.syrjala
                   ` (2 preceding siblings ...)
  2016-05-23 14:30 ` [PATCH 1/3] drm/i915: Never fully mask the the EI up rps interrupt on SNB/IVB Chris Wilson
@ 2016-05-23 14:42 ` ville.syrjala
  2016-05-23 16:29 ` ✗ Ro.CI.BAT: warning for series starting with [v2,1/3] drm/i915: Never fully mask the the EI up rps interrupt on SNB/IVB (rev2) Patchwork
  4 siblings, 0 replies; 14+ messages in thread
From: ville.syrjala @ 2016-05-23 14:42 UTC (permalink / raw)
  To: intel-gfx

From: Ville Syrjälä <ville.syrjala@linux.intel.com>

SNB (and IVB too I suppose) starts to misbehave if the GPU gets stuck
in an infinite batch buffer loop. The GPU apparently hogs something
critical and CPUs start to lose interrupts and whatnot. We can keep
the system limping along by unmasking some interrupts in
GEN6_PMINTRMSK. The EI up interrupt has been previously chosen for
that task, so let's never mask it.

v2: s/gen6_rps_pm_mask/gen6_sanitize_rps_pm_mask/ (Chris)

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_pm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 29bdd79d9039..96be0b6f5712 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -4719,7 +4719,7 @@ void gen6_rps_idle(struct drm_i915_private *dev_priv)
 		else
 			gen6_set_rps(dev_priv, dev_priv->rps.idle_freq);
 		dev_priv->rps.last_adj = 0;
-		I915_WRITE(GEN6_PMINTRMSK, 0xffffffff);
+		I915_WRITE(GEN6_PMINTRMSK, gen6_sanitize_rps_pm_mask(dev_priv, ~0));
 	}
 	mutex_unlock(&dev_priv->rps.hw_lock);
 
-- 
2.7.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* ✗ Ro.CI.BAT: warning for series starting with [v2,1/3] drm/i915: Never fully mask the the EI up rps interrupt on SNB/IVB (rev2)
  2016-05-23 14:09 [PATCH 1/3] drm/i915: Never fully mask the the EI up rps interrupt on SNB/IVB ville.syrjala
                   ` (3 preceding siblings ...)
  2016-05-23 14:42 ` [PATCH v2 " ville.syrjala
@ 2016-05-23 16:29 ` Patchwork
  4 siblings, 0 replies; 14+ messages in thread
From: Patchwork @ 2016-05-23 16:29 UTC (permalink / raw)
  To: ville.syrjala; +Cc: intel-gfx

== Series Details ==

Series: series starting with [v2,1/3] drm/i915: Never fully mask the the EI up rps interrupt on SNB/IVB (rev2)
URL   : https://patchwork.freedesktop.org/series/7572/
State : warning

== Summary ==

Series 7572v2 Series without cover letter
http://patchwork.freedesktop.org/api/1.0/series/7572/revisions/2/mbox

Test gem_exec_flush:
        Subgroup basic-batch-kernel-default-cmd:
                fail       -> PASS       (ro-byt-n2820)
Test kms_pipe_crc_basic:
        Subgroup bad-nb-words-1:
                pass       -> DMESG-WARN (ro-skl-i7-6700hq)

fi-bdw-i7-5557u  total:209  pass:197  dwarn:0   dfail:0   fail:0   skip:12 
fi-bsw-n3050     total:209  pass:167  dwarn:0   dfail:0   fail:2   skip:40 
fi-hsw-i7-4770r  total:209  pass:186  dwarn:0   dfail:0   fail:0   skip:23 
fi-skl-i5-6260u  total:209  pass:196  dwarn:2   dfail:0   fail:0   skip:11 
fi-skl-i7-6700k  total:209  pass:182  dwarn:2   dfail:0   fail:0   skip:25 
fi-snb-i7-2600   total:209  pass:170  dwarn:0   dfail:0   fail:0   skip:39 
ro-bdw-i5-5250u  total:209  pass:172  dwarn:0   dfail:0   fail:0   skip:37 
ro-bdw-i7-5557U  total:209  pass:197  dwarn:0   dfail:0   fail:0   skip:12 
ro-bdw-i7-5600u  total:209  pass:180  dwarn:0   dfail:0   fail:1   skip:28 
ro-bsw-n3050     total:209  pass:168  dwarn:0   dfail:0   fail:2   skip:39 
ro-byt-n2820     total:209  pass:170  dwarn:0   dfail:0   fail:2   skip:37 
ro-hsw-i3-4010u  total:209  pass:186  dwarn:0   dfail:0   fail:0   skip:23 
ro-hsw-i7-4770r  total:209  pass:186  dwarn:0   dfail:0   fail:0   skip:23 
ro-ilk-i7-620lm  total:209  pass:146  dwarn:0   dfail:0   fail:1   skip:62 
ro-ilk1-i5-650   total:204  pass:146  dwarn:0   dfail:0   fail:1   skip:57 
ro-ivb2-i7-3770  total:209  pass:181  dwarn:0   dfail:0   fail:0   skip:28 
ro-skl-i7-6700hq total:204  pass:180  dwarn:3   dfail:0   fail:0   skip:21 
ro-snb-i7-2620M  total:209  pass:170  dwarn:0   dfail:0   fail:1   skip:38 
fi-byt-n2820 failed to connect after reboot
fi-hsw-i7-4770k failed to connect after reboot
ro-ivb-i7-3770 failed to connect after reboot

Results at /archive/results/CI_IGT_test/RO_Patchwork_978/

facf329 drm-intel-nightly: 2016y-05m-23d-15h-31m-43s UTC integration manifest
c2b85aa drm/i915: Make RPS enable immediate
ec7c1f6 drm/i915: Don't frob with RPS around GPU reset
7311f7d drm/i915: Never fully mask the the EI up rps interrupt on SNB/IVB

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/3] drm/i915: Don't frob with RPS around GPU reset
  2016-05-23 14:32   ` Chris Wilson
@ 2016-05-24  8:24     ` Daniel Vetter
  2016-05-24  8:36       ` Chris Wilson
  0 siblings, 1 reply; 14+ messages in thread
From: Daniel Vetter @ 2016-05-24  8:24 UTC (permalink / raw)
  To: Chris Wilson, ville.syrjala, intel-gfx

On Mon, May 23, 2016 at 03:32:40PM +0100, Chris Wilson wrote:
> On Mon, May 23, 2016 at 05:09:42PM +0300, ville.syrjala@linux.intel.com wrote:
> > From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > 
> > Based on my observations GPU reset doesn't clobber the RPS state, so
> > frobbing with RPS around GPU reset seems rather pointless. Just get
> > rid of it.
> > 
> > Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> 
> Testcase: igt/pm_rpm/reset ?

pm_rps&rc6_residency seem more appropriate. pm_rpm just outright yanks the
power cable for the GT. Unfortunately those 2 tests aren't particularly
nasty ...
-Daniel

> 
> Looks like that is the desired test.
> 
> Maybe make it basic and see what CI says :)
> 
> Actually it probably should be a basic test considering how often we
> hang.
> -Chris
> 
> -- 
> Chris Wilson, Intel Open Source Technology Centre
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC][PATCH 3/3] drm/i915: Make RPS enable immediate
  2016-05-23 14:09 ` [RFC][PATCH 3/3] drm/i915: Make RPS enable immediate ville.syrjala
  2016-05-23 14:33   ` Chris Wilson
@ 2016-05-24  8:27   ` Daniel Vetter
  2016-05-24 10:14     ` Ville Syrjälä
  1 sibling, 1 reply; 14+ messages in thread
From: Daniel Vetter @ 2016-05-24  8:27 UTC (permalink / raw)
  To: ville.syrjala; +Cc: intel-gfx

On Mon, May 23, 2016 at 05:09:43PM +0300, ville.syrjala@linux.intel.com wrote:
> From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> 
> On SNB (at least) it's dangeruopus to hang the GPU with an infinite
> batch buffer loop while RPS is disabled. The only thing that keeps
> the system going in such circumstances are the internal RPS timers,
> so we should never feed the GPU with RPS disabled unless we want to
> risk a total system hang.
> 
> Previously we didn't fully disable RPS, but that changes in
> commit 2030d684f7b3 ("drm/i915/bxt: Explicitly clear the Turbo control register")
> so we probably didn't see the problem so often previously. But
> even before that we were at the mercy of the BIOS for the initial
> RPS state, so if the BIOS didn't enable RPS a GPU hang immediately
> upon boot could have been fatal.
> 
> To avoid the problems let's just make the RPS enable immediate.
> This renders the delayed_resume_work useless actually. We could perhaps
> just move the ring freq table initialization to the work and do the
> other stull synchronously?
> 
> Fixes: 2030d684f7b3 ("drm/i915/bxt: Explicitly clear the Turbo control register")
> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> ---
> This is more and RFC at this point. Perhaps we might want to keep the delayed
> work but just for the ring freq table update (which is the main reson this whole
> thing exists in the first place). Another factor is that wait_for() is not
> exactly optiomal currently, so it makes the operation slower than it needs to
> be. Would need some hard numbers to see if it's worth keeping the delayed work
> or not I suppose.

Loading the ring freq tables takes forever, so definitely want to keep
that. I'd just split rps setup in two parts and push the schedule_work
down a bit. Long term we can go overboard with async (and maybey only
stall for all the GT setup in the very first batchbuffer).
-Daniel

> 
>  drivers/gpu/drm/i915/intel_pm.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> index 817c84c22782..e65c5c2b9b4e 100644
> --- a/drivers/gpu/drm/i915/intel_pm.c
> +++ b/drivers/gpu/drm/i915/intel_pm.c
> @@ -6505,6 +6505,7 @@ void intel_enable_gt_powersave(struct drm_i915_private *dev_priv)
>  		if (schedule_delayed_work(&dev_priv->rps.delayed_resume_work,
>  					   round_jiffies_up_relative(HZ)))
>  			intel_runtime_pm_get_noresume(dev_priv);
> +		flush_delayed_work(&dev_priv->rps.delayed_resume_work);
>  	}
>  }
>  
> -- 
> 2.7.4
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/3] drm/i915: Don't frob with RPS around GPU reset
  2016-05-24  8:24     ` Daniel Vetter
@ 2016-05-24  8:36       ` Chris Wilson
  0 siblings, 0 replies; 14+ messages in thread
From: Chris Wilson @ 2016-05-24  8:36 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Tue, May 24, 2016 at 10:24:58AM +0200, Daniel Vetter wrote:
> On Mon, May 23, 2016 at 03:32:40PM +0100, Chris Wilson wrote:
> > On Mon, May 23, 2016 at 05:09:42PM +0300, ville.syrjala@linux.intel.com wrote:
> > > From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > > 
> > > Based on my observations GPU reset doesn't clobber the RPS state, so
> > > frobbing with RPS around GPU reset seems rather pointless. Just get
> > > rid of it.
> > > 
> > > Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > 
> > Testcase: igt/pm_rpm/reset ?
> 
> pm_rps&rc6_residency seem more appropriate. pm_rpm just outright yanks the
> power cable for the GT. Unfortunately those 2 tests aren't particularly
> nasty ...

Typo, I was looking at pm_rps since it has the reset test that should be
checking whether RPS works after hang/reset.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC][PATCH 3/3] drm/i915: Make RPS enable immediate
  2016-05-24  8:27   ` Daniel Vetter
@ 2016-05-24 10:14     ` Ville Syrjälä
  2016-05-24 10:43       ` Chris Wilson
  0 siblings, 1 reply; 14+ messages in thread
From: Ville Syrjälä @ 2016-05-24 10:14 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Tue, May 24, 2016 at 10:27:19AM +0200, Daniel Vetter wrote:
> On Mon, May 23, 2016 at 05:09:43PM +0300, ville.syrjala@linux.intel.com wrote:
> > From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > 
> > On SNB (at least) it's dangeruopus to hang the GPU with an infinite
> > batch buffer loop while RPS is disabled. The only thing that keeps
> > the system going in such circumstances are the internal RPS timers,
> > so we should never feed the GPU with RPS disabled unless we want to
> > risk a total system hang.
> > 
> > Previously we didn't fully disable RPS, but that changes in
> > commit 2030d684f7b3 ("drm/i915/bxt: Explicitly clear the Turbo control register")
> > so we probably didn't see the problem so often previously. But
> > even before that we were at the mercy of the BIOS for the initial
> > RPS state, so if the BIOS didn't enable RPS a GPU hang immediately
> > upon boot could have been fatal.
> > 
> > To avoid the problems let's just make the RPS enable immediate.
> > This renders the delayed_resume_work useless actually. We could perhaps
> > just move the ring freq table initialization to the work and do the
> > other stull synchronously?
> > 
> > Fixes: 2030d684f7b3 ("drm/i915/bxt: Explicitly clear the Turbo control register")
> > Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > ---
> > This is more and RFC at this point. Perhaps we might want to keep the delayed
> > work but just for the ring freq table update (which is the main reson this whole
> > thing exists in the first place). Another factor is that wait_for() is not
> > exactly optiomal currently, so it makes the operation slower than it needs to
> > be. Would need some hard numbers to see if it's worth keeping the delayed work
> > or not I suppose.
> 
> Loading the ring freq tables takes forever, so definitely want to keep
> that.

It only takes forever becasue wait_for() sucks.

with current sleep duration of 1000-2000 us:
[  308.231364] rps init took 5533 us
[  308.266322] ring freq init took 34952 us

sleep duration reduced to 100-200 us:
[  155.367588] rps init took 679 us
[  155.371100] ring freq init took 3509 us

So looks like someone just failed to root cause the slowness, and then
went on to optimize the wrong thing.

> I'd just split rps setup in two parts and push the schedule_work
> down a bit. Long term we can go overboard with async (and maybey only
> stall for all the GT setup in the very first batchbuffer).
> -Daniel
> 
> > 
> >  drivers/gpu/drm/i915/intel_pm.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> > index 817c84c22782..e65c5c2b9b4e 100644
> > --- a/drivers/gpu/drm/i915/intel_pm.c
> > +++ b/drivers/gpu/drm/i915/intel_pm.c
> > @@ -6505,6 +6505,7 @@ void intel_enable_gt_powersave(struct drm_i915_private *dev_priv)
> >  		if (schedule_delayed_work(&dev_priv->rps.delayed_resume_work,
> >  					   round_jiffies_up_relative(HZ)))
> >  			intel_runtime_pm_get_noresume(dev_priv);
> > +		flush_delayed_work(&dev_priv->rps.delayed_resume_work);
> >  	}
> >  }
> >  
> > -- 
> > 2.7.4
> > 
> > _______________________________________________
> > Intel-gfx mailing list
> > Intel-gfx@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/intel-gfx
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch

-- 
Ville Syrjälä
Intel OTC
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC][PATCH 3/3] drm/i915: Make RPS enable immediate
  2016-05-24 10:14     ` Ville Syrjälä
@ 2016-05-24 10:43       ` Chris Wilson
  0 siblings, 0 replies; 14+ messages in thread
From: Chris Wilson @ 2016-05-24 10:43 UTC (permalink / raw)
  To: Ville Syrjälä; +Cc: intel-gfx

On Tue, May 24, 2016 at 01:14:53PM +0300, Ville Syrjälä wrote:
> On Tue, May 24, 2016 at 10:27:19AM +0200, Daniel Vetter wrote:
> > On Mon, May 23, 2016 at 05:09:43PM +0300, ville.syrjala@linux.intel.com wrote:
> > > From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > > 
> > > On SNB (at least) it's dangeruopus to hang the GPU with an infinite
> > > batch buffer loop while RPS is disabled. The only thing that keeps
> > > the system going in such circumstances are the internal RPS timers,
> > > so we should never feed the GPU with RPS disabled unless we want to
> > > risk a total system hang.
> > > 
> > > Previously we didn't fully disable RPS, but that changes in
> > > commit 2030d684f7b3 ("drm/i915/bxt: Explicitly clear the Turbo control register")
> > > so we probably didn't see the problem so often previously. But
> > > even before that we were at the mercy of the BIOS for the initial
> > > RPS state, so if the BIOS didn't enable RPS a GPU hang immediately
> > > upon boot could have been fatal.
> > > 
> > > To avoid the problems let's just make the RPS enable immediate.
> > > This renders the delayed_resume_work useless actually. We could perhaps
> > > just move the ring freq table initialization to the work and do the
> > > other stull synchronously?
> > > 
> > > Fixes: 2030d684f7b3 ("drm/i915/bxt: Explicitly clear the Turbo control register")
> > > Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > > ---
> > > This is more and RFC at this point. Perhaps we might want to keep the delayed
> > > work but just for the ring freq table update (which is the main reson this whole
> > > thing exists in the first place). Another factor is that wait_for() is not
> > > exactly optiomal currently, so it makes the operation slower than it needs to
> > > be. Would need some hard numbers to see if it's worth keeping the delayed work
> > > or not I suppose.
> > 
> > Loading the ring freq tables takes forever, so definitely want to keep
> > that.
> 
> It only takes forever becasue wait_for() sucks.
> 
> with current sleep duration of 1000-2000 us:
> [  308.231364] rps init took 5533 us
> [  308.266322] ring freq init took 34952 us
> 
> sleep duration reduced to 100-200 us:
> [  155.367588] rps init took 679 us
> [  155.371100] ring freq init took 3509 us
> 
> So looks like someone just failed to root cause the slowness, and then
> went on to optimize the wrong thing.

It's still 4ms that can be done in parallel to userspace starting :)
(And looks can be reduced further with an smarter wait_for).

So we encourage Mika to send his updated patches...
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2016-05-24 10:43 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-05-23 14:09 [PATCH 1/3] drm/i915: Never fully mask the the EI up rps interrupt on SNB/IVB ville.syrjala
2016-05-23 14:09 ` [PATCH 2/3] drm/i915: Don't frob with RPS around GPU reset ville.syrjala
2016-05-23 14:32   ` Chris Wilson
2016-05-24  8:24     ` Daniel Vetter
2016-05-24  8:36       ` Chris Wilson
2016-05-23 14:09 ` [RFC][PATCH 3/3] drm/i915: Make RPS enable immediate ville.syrjala
2016-05-23 14:33   ` Chris Wilson
2016-05-24  8:27   ` Daniel Vetter
2016-05-24 10:14     ` Ville Syrjälä
2016-05-24 10:43       ` Chris Wilson
2016-05-23 14:30 ` [PATCH 1/3] drm/i915: Never fully mask the the EI up rps interrupt on SNB/IVB Chris Wilson
2016-05-23 14:34   ` Ville Syrjälä
2016-05-23 14:42 ` [PATCH v2 " ville.syrjala
2016-05-23 16:29 ` ✗ Ro.CI.BAT: warning for series starting with [v2,1/3] drm/i915: Never fully mask the the EI up rps interrupt on SNB/IVB (rev2) Patchwork

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox