stable.vger.kernel.org archive mirror
* [PATCH] drm/i915: Exit cherryview_irq_handler() after one pass
@ 2016-03-10 11:44 Chris Wilson
  2016-03-10 12:01 ` Ville Syrjälä
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Chris Wilson @ 2016-03-10 11:44 UTC
  To: intel-gfx
  Cc: Chris Wilson, Ville Syrjälä, Antti Koskipää,
	Tvrtko Ursulin, stable

This effectively reverts

commit 8e5fd599eb219f1054e39b40d18b217af669eea9
Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
Date:   Wed Apr 9 13:28:50 2014 +0300

    drm/i915/chv: Make CHV irq handler loop until all interrupts are consumed

as under continuous execlists load we can saturate the IRQ handler,
destabilising the tsc clock and triggering the NMI watchdog to declare a hung
CPU.

[  552.756051] clocksource: timekeeping watchdog on CPU0: Marking clocksource 'tsc' as unstable because the skew is too large:
[  552.756080] clocksource:                       'refined-jiffies' wd_now: 10003b480 wd_last: 10003b28c mask: ffffffff
[  552.756091] clocksource:                       'tsc' cs_now: d55d31aa50 cs_last: d17446166c mask: ffffffffffffffff
[  552.756210] clocksource: Switched to clocksource refined-jiffies
[  575.217870] NMI watchdog: Watchdog detected hard LOCKUP on cpu 1
[  575.217893] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.5.0-rc7+ #18
[  575.217905] Hardware name:                  /NUC5CPYB, BIOS PYBSWCEL.86A.0027.2015.0507.1758 05/07/2015
[  575.217915]  0000000000000000 ffff88027fd05bc0 ffffffff81288c6d 0000000000000000
[  575.217935]  0000000000000001 ffff88027fd05be0 ffffffff810e72d1 0000000000000000
[  575.217951]  ffff88027fd05c80 ffff88027fd05c20 ffffffff81114b60 0000000181015f1e
[  575.217967] Call Trace:
[  575.217973]  <NMI>  [<ffffffff81288c6d>] dump_stack+0x4f/0x72
[  575.217994]  [<ffffffff810e72d1>] watchdog_overflow_callback+0x151/0x160
[  575.218003]  [<ffffffff81114b60>] __perf_event_overflow+0xa0/0x1e0
[  575.218016]  [<ffffffff811154c4>] perf_event_overflow+0x14/0x20
[  575.218028]  [<ffffffff8101d2ca>] intel_pmu_handle_irq+0x1da/0x460
[  575.218042]  [<ffffffff814a8aae>] ? poll_idle+0x3e/0x70
[  575.218052]  [<ffffffff814a8aae>] ? poll_idle+0x3e/0x70
[  575.218064]  [<ffffffff81014ae8>] perf_event_nmi_handler+0x28/0x50
[  575.218075]  [<ffffffff81007540>] nmi_handle+0x60/0x130
[  575.218086]  [<ffffffff814a8aae>] ? poll_idle+0x3e/0x70
[  575.218096]  [<ffffffff810079c0>] do_nmi+0x140/0x470
[  575.218108]  [<ffffffff81559ec7>] end_repeat_nmi+0x1a/0x1e
[  575.218119]  [<ffffffff814a8aae>] ? poll_idle+0x3e/0x70
[  575.218129]  [<ffffffff814a8aae>] ? poll_idle+0x3e/0x70
[  575.218139]  [<ffffffff814a8aae>] ? poll_idle+0x3e/0x70
[  575.218148]  <<EOE>>  [<ffffffff814a8353>] cpuidle_enter_state+0xf3/0x2f0
[  575.218164]  [<ffffffff814a8587>] cpuidle_enter+0x17/0x20
[  575.218175]  [<ffffffff810aaa3a>] call_cpuidle+0x2a/0x40
[  575.218185]  [<ffffffff810aade3>] cpu_startup_entry+0x273/0x330
[  575.218196]  [<ffffffff81033a1e>] start_secondary+0x10e/0x130

However, not servicing all available IIR within the handler does hurt the
throughput of pathological nop execbuf by about 20%, with a similar effect
upon the dispatch latency of a series of execbuf.
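
The trade-off described above can be illustrated with a toy user-space model.
This is only a sketch, not driver code: the names fake_hw, service_once,
handler_loop and handler_one_pass are hypothetical, and the refill field is
an assumed stand-in for continuous execlists load.

```c
#include <stdbool.h>

/* Toy device: every serviced interrupt is followed by 'refill' new ones. */
struct fake_hw {
	unsigned int pending;   /* interrupts waiting to be serviced */
	unsigned int refill;    /* new interrupts arriving per service */
	unsigned int serviced;  /* total serviced so far */
};

static bool service_once(struct fake_hw *hw)
{
	if (hw->pending == 0)
		return false;
	hw->pending--;
	hw->serviced++;
	/* Under sustained load, more work arrives while we are busy. */
	hw->pending += hw->refill;
	return true;
}

/*
 * Looping shape: keep servicing until nothing is pending. 'cap' exists
 * only so that this model terminates. Returns the number of passes made.
 */
static unsigned int handler_loop(struct fake_hw *hw, unsigned int cap)
{
	unsigned int passes = 0;

	while (passes < cap && service_once(hw))
		passes++;
	return passes;
}

/* Single-pass shape: service at most once, then return to the caller. */
static unsigned int handler_one_pass(struct fake_hw *hw)
{
	return service_once(hw) ? 1 : 0;
}
```

With refill set to 1, handler_loop() only stops when it hits the cap, while
handler_one_pass() returns after one pass and leaves the remaining work for
the next invocation.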

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93467
Testcase: igt/gem_exec_nop/basic # requires NMI watchdog
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Antti Koskipää <antti.koskipaa@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: stable@vger.kernel.org
---
 drivers/gpu/drm/i915/i915_irq.c | 40 +++++++++++++++++++---------------------
 1 file changed, 19 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 53e5104964b3..8a3230427884 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1829,35 +1829,33 @@ static irqreturn_t cherryview_irq_handler(int irq, void *arg)
 	/* IRQs are synced during runtime_suspend, we don't require a wakeref */
 	disable_rpm_wakeref_asserts(dev_priv);
 
-	for (;;) {
-		master_ctl = I915_READ(GEN8_MASTER_IRQ) & ~GEN8_MASTER_IRQ_CONTROL;
-		iir = I915_READ(VLV_IIR);
+	master_ctl = I915_READ(GEN8_MASTER_IRQ) & ~GEN8_MASTER_IRQ_CONTROL;
+	iir = I915_READ(VLV_IIR);
 
-		if (master_ctl == 0 && iir == 0)
-			break;
+	if (master_ctl == 0 && iir == 0)
+		break;
 
-		ret = IRQ_HANDLED;
+	ret = IRQ_HANDLED;
 
-		I915_WRITE(GEN8_MASTER_IRQ, 0);
+	I915_WRITE(GEN8_MASTER_IRQ, 0);
 
-		/* Find, clear, then process each source of interrupt */
+	/* Find, clear, then process each source of interrupt */
 
-		if (iir) {
-			/* Consume port before clearing IIR or we'll miss events */
-			if (iir & I915_DISPLAY_PORT_INTERRUPT)
-				i9xx_hpd_irq_handler(dev);
-			I915_WRITE(VLV_IIR, iir);
-		}
+	if (iir) {
+		/* Consume port before clearing IIR or we'll miss events */
+		if (iir & I915_DISPLAY_PORT_INTERRUPT)
+			i9xx_hpd_irq_handler(dev);
+		I915_WRITE(VLV_IIR, iir);
+	}
 
-		gen8_gt_irq_handler(dev_priv, master_ctl);
+	gen8_gt_irq_handler(dev_priv, master_ctl);
 
-		/* Call regardless, as some status bits might not be
-		 * signalled in iir */
-		valleyview_pipestat_irq_handler(dev, iir);
+	/* Call regardless, as some status bits might not be
+	 * signalled in iir */
+	valleyview_pipestat_irq_handler(dev, iir);
 
-		I915_WRITE(GEN8_MASTER_IRQ, DE_MASTER_IRQ_CONTROL);
-		POSTING_READ(GEN8_MASTER_IRQ);
-	}
+	I915_WRITE(GEN8_MASTER_IRQ, DE_MASTER_IRQ_CONTROL);
+	POSTING_READ(GEN8_MASTER_IRQ);
 
 	enable_rpm_wakeref_asserts(dev_priv);
 
-- 
2.7.0



* Re: [PATCH] drm/i915: Exit cherryview_irq_handler() after one pass
  2016-03-10 11:44 [PATCH] drm/i915: Exit cherryview_irq_handler() after one pass Chris Wilson
@ 2016-03-10 12:01 ` Ville Syrjälä
  2016-03-10 12:10   ` Chris Wilson
  2016-03-10 12:01 ` [Intel-gfx] " Tvrtko Ursulin
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 11+ messages in thread
From: Ville Syrjälä @ 2016-03-10 12:01 UTC
  To: Chris Wilson; +Cc: intel-gfx, Antti Koskipää, Tvrtko Ursulin, stable

On Thu, Mar 10, 2016 at 11:44:28AM +0000, Chris Wilson wrote:
> This effectively reverts
> 
> commit 8e5fd599eb219f1054e39b40d18b217af669eea9
> Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
> Date:   Wed Apr 9 13:28:50 2014 +0300
> 
>     drm/i915/chv: Make CHV irq handler loop until all interrupts are consumed
> 
> as under continuous execlists load we can saturate the IRQ handler,
> destabilising the tsc clock and triggering the NMI watchdog to declare a hung
> CPU.
> 
> [  552.756051] clocksource: timekeeping watchdog on CPU0: Marking clocksource 'tsc' as unstable because the skew is too large:
> [  552.756080] clocksource:                       'refined-jiffies' wd_now: 10003b480 wd_last: 10003b28c mask: ffffffff
> [  552.756091] clocksource:                       'tsc' cs_now: d55d31aa50 cs_last: d17446166c mask: ffffffffffffffff
> [  552.756210] clocksource: Switched to clocksource refined-jiffies
> [  575.217870] NMI watchdog: Watchdog detected hard LOCKUP on cpu 1
> [  575.217893] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.5.0-rc7+ #18
> [  575.217905] Hardware name:                  /NUC5CPYB, BIOS PYBSWCEL.86A.0027.2015.0507.1758 05/07/2015
> [  575.217915]  0000000000000000 ffff88027fd05bc0 ffffffff81288c6d 0000000000000000
> [  575.217935]  0000000000000001 ffff88027fd05be0 ffffffff810e72d1 0000000000000000
> [  575.217951]  ffff88027fd05c80 ffff88027fd05c20 ffffffff81114b60 0000000181015f1e
> [  575.217967] Call Trace:
> [  575.217973]  <NMI>  [<ffffffff81288c6d>] dump_stack+0x4f/0x72
> [  575.217994]  [<ffffffff810e72d1>] watchdog_overflow_callback+0x151/0x160
> [  575.218003]  [<ffffffff81114b60>] __perf_event_overflow+0xa0/0x1e0
> [  575.218016]  [<ffffffff811154c4>] perf_event_overflow+0x14/0x20
> [  575.218028]  [<ffffffff8101d2ca>] intel_pmu_handle_irq+0x1da/0x460
> [  575.218042]  [<ffffffff814a8aae>] ? poll_idle+0x3e/0x70
> [  575.218052]  [<ffffffff814a8aae>] ? poll_idle+0x3e/0x70
> [  575.218064]  [<ffffffff81014ae8>] perf_event_nmi_handler+0x28/0x50
> [  575.218075]  [<ffffffff81007540>] nmi_handle+0x60/0x130
> [  575.218086]  [<ffffffff814a8aae>] ? poll_idle+0x3e/0x70
> [  575.218096]  [<ffffffff810079c0>] do_nmi+0x140/0x470
> [  575.218108]  [<ffffffff81559ec7>] end_repeat_nmi+0x1a/0x1e
> [  575.218119]  [<ffffffff814a8aae>] ? poll_idle+0x3e/0x70
> [  575.218129]  [<ffffffff814a8aae>] ? poll_idle+0x3e/0x70
> [  575.218139]  [<ffffffff814a8aae>] ? poll_idle+0x3e/0x70
> [  575.218148]  <<EOE>>  [<ffffffff814a8353>] cpuidle_enter_state+0xf3/0x2f0
> [  575.218164]  [<ffffffff814a8587>] cpuidle_enter+0x17/0x20
> [  575.218175]  [<ffffffff810aaa3a>] call_cpuidle+0x2a/0x40
> [  575.218185]  [<ffffffff810aade3>] cpu_startup_entry+0x273/0x330
> [  575.218196]  [<ffffffff81033a1e>] start_secondary+0x10e/0x130
> 
> However, not servicing all available IIR within the handler does hurt the
> throughput of pathological nop execbuf by about 20%, with a similar effect
> upon the dispatch latency of a series of execbuf.
> 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93467
> Testcase: igt/gem_exec_nop/basic # requires NMI watchdog
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> Cc: Antti Koskipää <antti.koskipaa@linux.intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: stable@vger.kernel.org
> ---
>  drivers/gpu/drm/i915/i915_irq.c | 40 +++++++++++++++++++---------------------
>  1 file changed, 19 insertions(+), 21 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index 53e5104964b3..8a3230427884 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -1829,35 +1829,33 @@ static irqreturn_t cherryview_irq_handler(int irq, void *arg)
>  	/* IRQs are synced during runtime_suspend, we don't require a wakeref */
>  	disable_rpm_wakeref_asserts(dev_priv);
>  
> -	for (;;) {
> -		master_ctl = I915_READ(GEN8_MASTER_IRQ) & ~GEN8_MASTER_IRQ_CONTROL;
> -		iir = I915_READ(VLV_IIR);
> +	master_ctl = I915_READ(GEN8_MASTER_IRQ) & ~GEN8_MASTER_IRQ_CONTROL;
> +	iir = I915_READ(VLV_IIR);
>  
> -		if (master_ctl == 0 && iir == 0)
> -			break;
> +	if (master_ctl == 0 && iir == 0)
> +		break;

goto something?

Apart from that I have no objections if it doesn't cause problems
with interrupts getting lost and whatnot. That was the original reason
for it I think, but at least I myself never really looked into it. IIRC
Rafael just told me they needed to do it to get the thing working, so
I just put the patch in. And that was before I had even seen any silicon.
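
One way to read the "goto something?" remark: with the loop gone, the early
exit needs a label to jump to so that the cleanup after the old loop body
still runs. A minimal user-space sketch follows. It is not the driver code:
struct fake_dev, the FAKE_IRQ_* values and the asserts_enabled flag are
hypothetical stand-ins for the register reads and the rpm wakeref assert
calls visible in the diff context.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the driver state. */
struct fake_dev {
	unsigned int master_ctl;
	unsigned int iir;
	bool asserts_enabled;
	unsigned int handled;
};

enum fake_irqreturn { FAKE_IRQ_NONE, FAKE_IRQ_HANDLED };

static enum fake_irqreturn one_pass_handler(struct fake_dev *dev)
{
	enum fake_irqreturn ret = FAKE_IRQ_NONE;

	dev->asserts_enabled = false;   /* models disable_rpm_wakeref_asserts() */

	if (dev->master_ctl == 0 && dev->iir == 0)
		goto out;               /* nothing pending: skip to cleanup */

	ret = FAKE_IRQ_HANDLED;
	dev->handled++;
	dev->master_ctl = 0;            /* models acking the interrupt sources */
	dev->iir = 0;

out:
	dev->asserts_enabled = true;    /* models enable_rpm_wakeref_asserts() */
	return ret;
}
```

In this model the label keeps the cleanup on every path, whether or not
anything was pending.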

>  
> -		ret = IRQ_HANDLED;
> +	ret = IRQ_HANDLED;
>  
> -		I915_WRITE(GEN8_MASTER_IRQ, 0);
> +	I915_WRITE(GEN8_MASTER_IRQ, 0);
>  
> -		/* Find, clear, then process each source of interrupt */
> +	/* Find, clear, then process each source of interrupt */
>  
> -		if (iir) {
> -			/* Consume port before clearing IIR or we'll miss events */
> -			if (iir & I915_DISPLAY_PORT_INTERRUPT)
> -				i9xx_hpd_irq_handler(dev);
> -			I915_WRITE(VLV_IIR, iir);
> -		}
> +	if (iir) {
> +		/* Consume port before clearing IIR or we'll miss events */
> +		if (iir & I915_DISPLAY_PORT_INTERRUPT)
> +			i9xx_hpd_irq_handler(dev);
> +		I915_WRITE(VLV_IIR, iir);
> +	}
>  
> -		gen8_gt_irq_handler(dev_priv, master_ctl);
> +	gen8_gt_irq_handler(dev_priv, master_ctl);
>  
> -		/* Call regardless, as some status bits might not be
> -		 * signalled in iir */
> -		valleyview_pipestat_irq_handler(dev, iir);
> +	/* Call regardless, as some status bits might not be
> +	 * signalled in iir */
> +	valleyview_pipestat_irq_handler(dev, iir);
>  
> -		I915_WRITE(GEN8_MASTER_IRQ, DE_MASTER_IRQ_CONTROL);
> -		POSTING_READ(GEN8_MASTER_IRQ);
> -	}
> +	I915_WRITE(GEN8_MASTER_IRQ, DE_MASTER_IRQ_CONTROL);
> +	POSTING_READ(GEN8_MASTER_IRQ);
>  
>  	enable_rpm_wakeref_asserts(dev_priv);
>  
> -- 
> 2.7.0

-- 
Ville Syrjälä
Intel OTC


* Re: [Intel-gfx] [PATCH] drm/i915: Exit cherryview_irq_handler() after one pass
  2016-03-10 11:44 [PATCH] drm/i915: Exit cherryview_irq_handler() after one pass Chris Wilson
  2016-03-10 12:01 ` Ville Syrjälä
@ 2016-03-10 12:01 ` Tvrtko Ursulin
  2016-03-10 12:12 ` kbuild test robot
  2016-03-10 12:18 ` [PATCH v2] " Chris Wilson
  3 siblings, 0 replies; 11+ messages in thread
From: Tvrtko Ursulin @ 2016-03-10 12:01 UTC
  To: Chris Wilson, intel-gfx; +Cc: Antti Koskipää, stable


On 10/03/16 11:44, Chris Wilson wrote:
> This effectively reverts
>
> commit 8e5fd599eb219f1054e39b40d18b217af669eea9
> Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
> Date:   Wed Apr 9 13:28:50 2014 +0300
>
>      drm/i915/chv: Make CHV irq handler loop until all interrupts are consumed
>
> as under continuous execlists load we can saturate the IRQ handler,
> destabilising the tsc clock and triggering the NMI watchdog to declare a hung
> CPU.
>
> [  552.756051] clocksource: timekeeping watchdog on CPU0: Marking clocksource 'tsc' as unstable because the skew is too large:
> [  552.756080] clocksource:                       'refined-jiffies' wd_now: 10003b480 wd_last: 10003b28c mask: ffffffff
> [  552.756091] clocksource:                       'tsc' cs_now: d55d31aa50 cs_last: d17446166c mask: ffffffffffffffff
> [  552.756210] clocksource: Switched to clocksource refined-jiffies
> [  575.217870] NMI watchdog: Watchdog detected hard LOCKUP on cpu 1
> [  575.217893] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.5.0-rc7+ #18
> [  575.217905] Hardware name:                  /NUC5CPYB, BIOS PYBSWCEL.86A.0027.2015.0507.1758 05/07/2015
> [  575.217915]  0000000000000000 ffff88027fd05bc0 ffffffff81288c6d 0000000000000000
> [  575.217935]  0000000000000001 ffff88027fd05be0 ffffffff810e72d1 0000000000000000
> [  575.217951]  ffff88027fd05c80 ffff88027fd05c20 ffffffff81114b60 0000000181015f1e
> [  575.217967] Call Trace:
> [  575.217973]  <NMI>  [<ffffffff81288c6d>] dump_stack+0x4f/0x72
> [  575.217994]  [<ffffffff810e72d1>] watchdog_overflow_callback+0x151/0x160
> [  575.218003]  [<ffffffff81114b60>] __perf_event_overflow+0xa0/0x1e0
> [  575.218016]  [<ffffffff811154c4>] perf_event_overflow+0x14/0x20
> [  575.218028]  [<ffffffff8101d2ca>] intel_pmu_handle_irq+0x1da/0x460
> [  575.218042]  [<ffffffff814a8aae>] ? poll_idle+0x3e/0x70
> [  575.218052]  [<ffffffff814a8aae>] ? poll_idle+0x3e/0x70
> [  575.218064]  [<ffffffff81014ae8>] perf_event_nmi_handler+0x28/0x50
> [  575.218075]  [<ffffffff81007540>] nmi_handle+0x60/0x130
> [  575.218086]  [<ffffffff814a8aae>] ? poll_idle+0x3e/0x70
> [  575.218096]  [<ffffffff810079c0>] do_nmi+0x140/0x470
> [  575.218108]  [<ffffffff81559ec7>] end_repeat_nmi+0x1a/0x1e
> [  575.218119]  [<ffffffff814a8aae>] ? poll_idle+0x3e/0x70
> [  575.218129]  [<ffffffff814a8aae>] ? poll_idle+0x3e/0x70
> [  575.218139]  [<ffffffff814a8aae>] ? poll_idle+0x3e/0x70
> [  575.218148]  <<EOE>>  [<ffffffff814a8353>] cpuidle_enter_state+0xf3/0x2f0
> [  575.218164]  [<ffffffff814a8587>] cpuidle_enter+0x17/0x20
> [  575.218175]  [<ffffffff810aaa3a>] call_cpuidle+0x2a/0x40
> [  575.218185]  [<ffffffff810aade3>] cpu_startup_entry+0x273/0x330
> [  575.218196]  [<ffffffff81033a1e>] start_secondary+0x10e/0x130
>
> However, not servicing all available IIR within the handler does hurt the
> throughput of pathological nop execbuf by about 20%, with a similar effect
> upon the dispatch latency of a series of execbuf.
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93467
> Testcase: igt/gem_exec_nop/basic # requires NMI watchdog
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> Cc: Antti Koskipää <antti.koskipaa@linux.intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: stable@vger.kernel.org
> ---
>   drivers/gpu/drm/i915/i915_irq.c | 40 +++++++++++++++++++---------------------
>   1 file changed, 19 insertions(+), 21 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index 53e5104964b3..8a3230427884 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -1829,35 +1829,33 @@ static irqreturn_t cherryview_irq_handler(int irq, void *arg)
>   	/* IRQs are synced during runtime_suspend, we don't require a wakeref */
>   	disable_rpm_wakeref_asserts(dev_priv);
>
> -	for (;;) {
> -		master_ctl = I915_READ(GEN8_MASTER_IRQ) & ~GEN8_MASTER_IRQ_CONTROL;
> -		iir = I915_READ(VLV_IIR);
> +	master_ctl = I915_READ(GEN8_MASTER_IRQ) & ~GEN8_MASTER_IRQ_CONTROL;
> +	iir = I915_READ(VLV_IIR);
>
> -		if (master_ctl == 0 && iir == 0)
> -			break;
> +	if (master_ctl == 0 && iir == 0)
> +		break;

return ret; ?

>
> -		ret = IRQ_HANDLED;
> +	ret = IRQ_HANDLED;
>
> -		I915_WRITE(GEN8_MASTER_IRQ, 0);
> +	I915_WRITE(GEN8_MASTER_IRQ, 0);
>
> -		/* Find, clear, then process each source of interrupt */
> +	/* Find, clear, then process each source of interrupt */
>
> -		if (iir) {
> -			/* Consume port before clearing IIR or we'll miss events */
> -			if (iir & I915_DISPLAY_PORT_INTERRUPT)
> -				i9xx_hpd_irq_handler(dev);
> -			I915_WRITE(VLV_IIR, iir);
> -		}
> +	if (iir) {
> +		/* Consume port before clearing IIR or we'll miss events */
> +		if (iir & I915_DISPLAY_PORT_INTERRUPT)
> +			i9xx_hpd_irq_handler(dev);
> +		I915_WRITE(VLV_IIR, iir);
> +	}
>
> -		gen8_gt_irq_handler(dev_priv, master_ctl);
> +	gen8_gt_irq_handler(dev_priv, master_ctl);
>
> -		/* Call regardless, as some status bits might not be
> -		 * signalled in iir */
> -		valleyview_pipestat_irq_handler(dev, iir);
> +	/* Call regardless, as some status bits might not be
> +	 * signalled in iir */
> +	valleyview_pipestat_irq_handler(dev, iir);
>
> -		I915_WRITE(GEN8_MASTER_IRQ, DE_MASTER_IRQ_CONTROL);
> -		POSTING_READ(GEN8_MASTER_IRQ);
> -	}
> +	I915_WRITE(GEN8_MASTER_IRQ, DE_MASTER_IRQ_CONTROL);
> +	POSTING_READ(GEN8_MASTER_IRQ);
>
>   	enable_rpm_wakeref_asserts(dev_priv);
>
>

Ack on this from me since it looks obviously immensely dangerous to loop 
like that. I can't test it unfortunately.

Regards,

Tvrtko


* Re: [PATCH] drm/i915: Exit cherryview_irq_handler() after one pass
  2016-03-10 12:01 ` Ville Syrjälä
@ 2016-03-10 12:10   ` Chris Wilson
  2016-03-10 12:24     ` Ville Syrjälä
  0 siblings, 1 reply; 11+ messages in thread
From: Chris Wilson @ 2016-03-10 12:10 UTC
  To: Ville Syrjälä
  Cc: intel-gfx, Antti Koskipää, Tvrtko Ursulin, stable

On Thu, Mar 10, 2016 at 02:01:27PM +0200, Ville Syrjälä wrote:
> On Thu, Mar 10, 2016 at 11:44:28AM +0000, Chris Wilson wrote:
> > This effectively reverts
> > 
> > commit 8e5fd599eb219f1054e39b40d18b217af669eea9
> > Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > Date:   Wed Apr 9 13:28:50 2014 +0300
> > 
> >     drm/i915/chv: Make CHV irq handler loop until all interrupts are consumed
> > 
> > as under continuous execlists load we can saturate the IRQ handler,
> > destabilising the tsc clock and triggering the NMI watchdog to declare a hung
> > CPU.
> > 
> > [  552.756051] clocksource: timekeeping watchdog on CPU0: Marking clocksource 'tsc' as unstable because the skew is too large:
> > [  552.756080] clocksource:                       'refined-jiffies' wd_now: 10003b480 wd_last: 10003b28c mask: ffffffff
> > [  552.756091] clocksource:                       'tsc' cs_now: d55d31aa50 cs_last: d17446166c mask: ffffffffffffffff
> > [  552.756210] clocksource: Switched to clocksource refined-jiffies
> > [  575.217870] NMI watchdog: Watchdog detected hard LOCKUP on cpu 1
> > [  575.217893] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.5.0-rc7+ #18
> > [  575.217905] Hardware name:                  /NUC5CPYB, BIOS PYBSWCEL.86A.0027.2015.0507.1758 05/07/2015
> > [  575.217915]  0000000000000000 ffff88027fd05bc0 ffffffff81288c6d 0000000000000000
> > [  575.217935]  0000000000000001 ffff88027fd05be0 ffffffff810e72d1 0000000000000000
> > [  575.217951]  ffff88027fd05c80 ffff88027fd05c20 ffffffff81114b60 0000000181015f1e
> > [  575.217967] Call Trace:
> > [  575.217973]  <NMI>  [<ffffffff81288c6d>] dump_stack+0x4f/0x72
> > [  575.217994]  [<ffffffff810e72d1>] watchdog_overflow_callback+0x151/0x160
> > [  575.218003]  [<ffffffff81114b60>] __perf_event_overflow+0xa0/0x1e0
> > [  575.218016]  [<ffffffff811154c4>] perf_event_overflow+0x14/0x20
> > [  575.218028]  [<ffffffff8101d2ca>] intel_pmu_handle_irq+0x1da/0x460
> > [  575.218042]  [<ffffffff814a8aae>] ? poll_idle+0x3e/0x70
> > [  575.218052]  [<ffffffff814a8aae>] ? poll_idle+0x3e/0x70
> > [  575.218064]  [<ffffffff81014ae8>] perf_event_nmi_handler+0x28/0x50
> > [  575.218075]  [<ffffffff81007540>] nmi_handle+0x60/0x130
> > [  575.218086]  [<ffffffff814a8aae>] ? poll_idle+0x3e/0x70
> > [  575.218096]  [<ffffffff810079c0>] do_nmi+0x140/0x470
> > [  575.218108]  [<ffffffff81559ec7>] end_repeat_nmi+0x1a/0x1e
> > [  575.218119]  [<ffffffff814a8aae>] ? poll_idle+0x3e/0x70
> > [  575.218129]  [<ffffffff814a8aae>] ? poll_idle+0x3e/0x70
> > [  575.218139]  [<ffffffff814a8aae>] ? poll_idle+0x3e/0x70
> > [  575.218148]  <<EOE>>  [<ffffffff814a8353>] cpuidle_enter_state+0xf3/0x2f0
> > [  575.218164]  [<ffffffff814a8587>] cpuidle_enter+0x17/0x20
> > [  575.218175]  [<ffffffff810aaa3a>] call_cpuidle+0x2a/0x40
> > [  575.218185]  [<ffffffff810aade3>] cpu_startup_entry+0x273/0x330
> > [  575.218196]  [<ffffffff81033a1e>] start_secondary+0x10e/0x130
> > 
> > However, not servicing all available IIR within the handler does hurt the
> > throughput of pathological nop execbuf by about 20%, with a similar effect
> > upon the dispatch latency of a series of execbuf.
> > 
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93467
> > Testcase: igt/gem_exec_nop/basic # requires NMI watchdog
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > Cc: Antti Koskipää <antti.koskipaa@linux.intel.com>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > Cc: stable@vger.kernel.org
> > ---
> >  drivers/gpu/drm/i915/i915_irq.c | 40 +++++++++++++++++++---------------------
> >  1 file changed, 19 insertions(+), 21 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> > index 53e5104964b3..8a3230427884 100644
> > --- a/drivers/gpu/drm/i915/i915_irq.c
> > +++ b/drivers/gpu/drm/i915/i915_irq.c
> > @@ -1829,35 +1829,33 @@ static irqreturn_t cherryview_irq_handler(int irq, void *arg)
> >  	/* IRQs are synced during runtime_suspend, we don't require a wakeref */
> >  	disable_rpm_wakeref_asserts(dev_priv);
> >  
> > -	for (;;) {
> > -		master_ctl = I915_READ(GEN8_MASTER_IRQ) & ~GEN8_MASTER_IRQ_CONTROL;
> > -		iir = I915_READ(VLV_IIR);
> > +	master_ctl = I915_READ(GEN8_MASTER_IRQ) & ~GEN8_MASTER_IRQ_CONTROL;
> > +	iir = I915_READ(VLV_IIR);
> >  
> > -		if (master_ctl == 0 && iir == 0)
> > -			break;
> > +	if (master_ctl == 0 && iir == 0)
> > +		break;
> 
> goto something?

Sigh. The problem of rewriting the "obvious" patch against -nightly. I
just changed the for(;;) into do {} while(0) for testing. Perhaps I
should stick with that in case we need to flip flop again.
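
As a small illustration of why the do {} while(0) form sidesteps the stray
break: do/while is still a loop statement, so break inside it is legal, but
the body can only run once. The helper below is a hypothetical user-space
sketch, with the pending flag standing in for the "master_ctl or iir is
non-zero" check.

```c
#include <stdbool.h>

/* Returns how many times the loop body did real work. */
static unsigned int run_do_while_0(bool pending)
{
	unsigned int body_runs = 0;

	do {
		if (!pending)
			break;          /* legal: do/while is a loop statement */
		body_runs++;
	} while (0);                    /* condition is false, so no repeat */

	return body_runs;
}
```

Swapping while (0) back to an unconditional loop would restore the repeat
behaviour without touching the body, which is what makes this form easy to
revert.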

> Apart from that I have no objections if it doesn't cause problems
> with interrupts getting lost and whatnot. That was the original reason
> for it I think, but at least I myself never really looked into it. IIRC
> Rafael just told me they needed to do it to get the thing working, so
> I just put the patch in. And that was before I had even seen any silicon.

My testing only looks at the GT side, and we do stress that pretty hard
because of execlists and have reasonable methods of detection if we stop
processing execbuf. I'm more worried about the display and pipe interrupts.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


* Re: [Intel-gfx] [PATCH] drm/i915: Exit cherryview_irq_handler() after one pass
  2016-03-10 11:44 [PATCH] drm/i915: Exit cherryview_irq_handler() after one pass Chris Wilson
  2016-03-10 12:01 ` Ville Syrjälä
  2016-03-10 12:01 ` [Intel-gfx] " Tvrtko Ursulin
@ 2016-03-10 12:12 ` kbuild test robot
  2016-03-10 12:18 ` [PATCH v2] " Chris Wilson
  3 siblings, 0 replies; 11+ messages in thread
From: kbuild test robot @ 2016-03-10 12:12 UTC
  To: Chris Wilson; +Cc: kbuild-all, intel-gfx, Antti Koskipää, stable

[-- Attachment #1: Type: text/plain, Size: 1919 bytes --]

Hi Chris,

[auto build test ERROR on drm-intel/for-linux-next]
[also build test ERROR on v4.5-rc7 next-20160310]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Chris-Wilson/drm-i915-Exit-cherryview_irq_handler-after-one-pass/20160310-194801
base:   git://anongit.freedesktop.org/drm-intel for-linux-next
config: x86_64-allyesconfig (attached as .config)
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All errors (new ones prefixed by >>):

   drivers/gpu/drm/i915/i915_irq.c: In function 'cherryview_irq_handler':
>> drivers/gpu/drm/i915/i915_irq.c:1836:3: error: break statement not within loop or switch
      break;
      ^

vim +1836 drivers/gpu/drm/i915/i915_irq.c

1f814dac Imre Deak     2015-12-16  1830  	disable_rpm_wakeref_asserts(dev_priv);
1f814dac Imre Deak     2015-12-16  1831  
8e5fd599 Ville Syrjälä 2014-04-09  1832  	master_ctl = I915_READ(GEN8_MASTER_IRQ) & ~GEN8_MASTER_IRQ_CONTROL;
43f328d7 Ville Syrjälä 2014-04-09  1833  	iir = I915_READ(VLV_IIR);
43f328d7 Ville Syrjälä 2014-04-09  1834  
3278f67f Ville Syrjälä 2014-04-09  1835  	if (master_ctl == 0 && iir == 0)
8e5fd599 Ville Syrjälä 2014-04-09 @1836  		break;
43f328d7 Ville Syrjälä 2014-04-09  1837  
27b6c122 Oscar Mateo   2014-06-16  1838  	ret = IRQ_HANDLED;
43f328d7 Ville Syrjälä 2014-04-09  1839  

:::::: The code at line 1836 was first introduced by commit
:::::: 8e5fd599eb219f1054e39b40d18b217af669eea9 drm/i915/chv: Make CHV irq handler loop until all interrupts are consumed

:::::: TO: Ville Syrjälä <ville.syrjala@linux.intel.com>
:::::: CC: Daniel Vetter <daniel.vetter@ffwll.ch>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 52462 bytes --]


* [PATCH v2] drm/i915: Exit cherryview_irq_handler() after one pass
  2016-03-10 11:44 [PATCH] drm/i915: Exit cherryview_irq_handler() after one pass Chris Wilson
                   ` (2 preceding siblings ...)
  2016-03-10 12:12 ` kbuild test robot
@ 2016-03-10 12:18 ` Chris Wilson
  2016-03-10 12:25   ` Ville Syrjälä
  2016-03-10 12:38   ` [Intel-gfx] " Tvrtko Ursulin
  3 siblings, 2 replies; 11+ messages in thread
From: Chris Wilson @ 2016-03-10 12:18 UTC
  To: intel-gfx
  Cc: Chris Wilson, Ville Syrjälä, Antti Koskipää,
	Tvrtko Ursulin, stable

This effectively reverts

commit 8e5fd599eb219f1054e39b40d18b217af669eea9
Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
Date:   Wed Apr 9 13:28:50 2014 +0300

    drm/i915/chv: Make CHV irq handler loop until all interrupts are consumed

as under continuous execlists load we can saturate the IRQ handler,
destabilising the tsc clock and triggering the NMI watchdog to declare a hung
CPU.

[  552.756051] clocksource: timekeeping watchdog on CPU0: Marking clocksource 'tsc' as unstable because the skew is too large:
[  552.756080] clocksource:                       'refined-jiffies' wd_now: 10003b480 wd_last: 10003b28c mask: ffffffff
[  552.756091] clocksource:                       'tsc' cs_now: d55d31aa50 cs_last: d17446166c mask: ffffffffffffffff
[  552.756210] clocksource: Switched to clocksource refined-jiffies
[  575.217870] NMI watchdog: Watchdog detected hard LOCKUP on cpu 1
[  575.217893] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.5.0-rc7+ #18
[  575.217905] Hardware name:                  /NUC5CPYB, BIOS PYBSWCEL.86A.0027.2015.0507.1758 05/07/2015
[  575.217915]  0000000000000000 ffff88027fd05bc0 ffffffff81288c6d 0000000000000000
[  575.217935]  0000000000000001 ffff88027fd05be0 ffffffff810e72d1 0000000000000000
[  575.217951]  ffff88027fd05c80 ffff88027fd05c20 ffffffff81114b60 0000000181015f1e
[  575.217967] Call Trace:
[  575.217973]  <NMI>  [<ffffffff81288c6d>] dump_stack+0x4f/0x72
[  575.217994]  [<ffffffff810e72d1>] watchdog_overflow_callback+0x151/0x160
[  575.218003]  [<ffffffff81114b60>] __perf_event_overflow+0xa0/0x1e0
[  575.218016]  [<ffffffff811154c4>] perf_event_overflow+0x14/0x20
[  575.218028]  [<ffffffff8101d2ca>] intel_pmu_handle_irq+0x1da/0x460
[  575.218042]  [<ffffffff814a8aae>] ? poll_idle+0x3e/0x70
[  575.218052]  [<ffffffff814a8aae>] ? poll_idle+0x3e/0x70
[  575.218064]  [<ffffffff81014ae8>] perf_event_nmi_handler+0x28/0x50
[  575.218075]  [<ffffffff81007540>] nmi_handle+0x60/0x130
[  575.218086]  [<ffffffff814a8aae>] ? poll_idle+0x3e/0x70
[  575.218096]  [<ffffffff810079c0>] do_nmi+0x140/0x470
[  575.218108]  [<ffffffff81559ec7>] end_repeat_nmi+0x1a/0x1e
[  575.218119]  [<ffffffff814a8aae>] ? poll_idle+0x3e/0x70
[  575.218129]  [<ffffffff814a8aae>] ? poll_idle+0x3e/0x70
[  575.218139]  [<ffffffff814a8aae>] ? poll_idle+0x3e/0x70
[  575.218148]  <<EOE>>  [<ffffffff814a8353>] cpuidle_enter_state+0xf3/0x2f0
[  575.218164]  [<ffffffff814a8587>] cpuidle_enter+0x17/0x20
[  575.218175]  [<ffffffff810aaa3a>] call_cpuidle+0x2a/0x40
[  575.218185]  [<ffffffff810aade3>] cpu_startup_entry+0x273/0x330
[  575.218196]  [<ffffffff81033a1e>] start_secondary+0x10e/0x130

However, not servicing all available IIR within the handler does hurt the
throughput of pathological nop execbuf by about 20%, with a similar effect
upon the dispatch latency of a series of execbuf.

v2: use do {} while(0) for a smaller patch, and easier to revert again

We have reasonable confidence that we do not miss GT interrupts (as
execlists provides a stress case with a failure mechanism easily
detected by igt), however we have less confidence about all the other
sources of interrupts and worry that we may lose a display hotplug
interrupt, for example.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93467
Testcase: igt/gem_exec_nop/basic # requires NMI watchdog
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Antti Koskipää <antti.koskipaa@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: stable@vger.kernel.org
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_irq.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 53e5104964b3..30d8bb7bf078 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1829,7 +1829,7 @@ static irqreturn_t cherryview_irq_handler(int irq, void *arg)
 	/* IRQs are synced during runtime_suspend, we don't require a wakeref */
 	disable_rpm_wakeref_asserts(dev_priv);
 
-	for (;;) {
+	do {
 		master_ctl = I915_READ(GEN8_MASTER_IRQ) & ~GEN8_MASTER_IRQ_CONTROL;
 		iir = I915_READ(VLV_IIR);
 
@@ -1857,7 +1857,7 @@ static irqreturn_t cherryview_irq_handler(int irq, void *arg)
 
 		I915_WRITE(GEN8_MASTER_IRQ, DE_MASTER_IRQ_CONTROL);
 		POSTING_READ(GEN8_MASTER_IRQ);
-	}
+	} while (0);
 
 	enable_rpm_wakeref_asserts(dev_priv);
 
-- 
2.7.0



* Re: [PATCH] drm/i915: Exit cherryview_irq_handler() after one pass
  2016-03-10 12:10   ` Chris Wilson
@ 2016-03-10 12:24     ` Ville Syrjälä
  2016-03-10 12:42       ` Chris Wilson
  0 siblings, 1 reply; 11+ messages in thread
From: Ville Syrjälä @ 2016-03-10 12:24 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx, Antti Koskipää, Tvrtko Ursulin,
	stable

On Thu, Mar 10, 2016 at 12:10:46PM +0000, Chris Wilson wrote:
> On Thu, Mar 10, 2016 at 02:01:27PM +0200, Ville Syrjälä wrote:
> > On Thu, Mar 10, 2016 at 11:44:28AM +0000, Chris Wilson wrote:
> > > This effectively reverts
> > > 
> > > commit 8e5fd599eb219f1054e39b40d18b217af669eea9
> > > Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > > Date:   Wed Apr 9 13:28:50 2014 +0300
> > > 
> > >     drm/i915/chv: Make CHV irq handler loop until all interrupts are consumed
> > > 
> > > as under continuous execlists load we can saturate the IRQ handler,
> > > destabilising the tsc clock and triggering the NMI watchdog to declare a hung
> > > CPU.
> > > 
> > > [  552.756051] clocksource: timekeeping watchdog on CPU0: Marking clocksource 'tsc' as unstable because the skew is too large:
> > > [  552.756080] clocksource:                       'refined-jiffies' wd_now: 10003b480 wd_last: 10003b28c mask: ffffffff
> > > [  552.756091] clocksource:                       'tsc' cs_now: d55d31aa50 cs_last: d17446166c mask: ffffffffffffffff
> > > [  552.756210] clocksource: Switched to clocksource refined-jiffies
> > > [  575.217870] NMI watchdog: Watchdog detected hard LOCKUP on cpu 1
> > > [  575.217893] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.5.0-rc7+ #18
> > > [  575.217905] Hardware name:                  /NUC5CPYB, BIOS PYBSWCEL.86A.0027.2015.0507.1758 05/07/2015
> > > [  575.217915]  0000000000000000 ffff88027fd05bc0 ffffffff81288c6d 0000000000000000
> > > [  575.217935]  0000000000000001 ffff88027fd05be0 ffffffff810e72d1 0000000000000000
> > > [  575.217951]  ffff88027fd05c80 ffff88027fd05c20 ffffffff81114b60 0000000181015f1e
> > > [  575.217967] Call Trace:
> > > [  575.217973]  <NMI>  [<ffffffff81288c6d>] dump_stack+0x4f/0x72
> > > [  575.217994]  [<ffffffff810e72d1>] watchdog_overflow_callback+0x151/0x160
> > > [  575.218003]  [<ffffffff81114b60>] __perf_event_overflow+0xa0/0x1e0
> > > [  575.218016]  [<ffffffff811154c4>] perf_event_overflow+0x14/0x20
> > > [  575.218028]  [<ffffffff8101d2ca>] intel_pmu_handle_irq+0x1da/0x460
> > > [  575.218042]  [<ffffffff814a8aae>] ? poll_idle+0x3e/0x70
> > > [  575.218052]  [<ffffffff814a8aae>] ? poll_idle+0x3e/0x70
> > > [  575.218064]  [<ffffffff81014ae8>] perf_event_nmi_handler+0x28/0x50
> > > [  575.218075]  [<ffffffff81007540>] nmi_handle+0x60/0x130
> > > [  575.218086]  [<ffffffff814a8aae>] ? poll_idle+0x3e/0x70
> > > [  575.218096]  [<ffffffff810079c0>] do_nmi+0x140/0x470
> > > [  575.218108]  [<ffffffff81559ec7>] end_repeat_nmi+0x1a/0x1e
> > > [  575.218119]  [<ffffffff814a8aae>] ? poll_idle+0x3e/0x70
> > > [  575.218129]  [<ffffffff814a8aae>] ? poll_idle+0x3e/0x70
> > > [  575.218139]  [<ffffffff814a8aae>] ? poll_idle+0x3e/0x70
> > > [  575.218148]  <<EOE>>  [<ffffffff814a8353>] cpuidle_enter_state+0xf3/0x2f0
> > > [  575.218164]  [<ffffffff814a8587>] cpuidle_enter+0x17/0x20
> > > [  575.218175]  [<ffffffff810aaa3a>] call_cpuidle+0x2a/0x40
> > > [  575.218185]  [<ffffffff810aade3>] cpu_startup_entry+0x273/0x330
> > > [  575.218196]  [<ffffffff81033a1e>] start_secondary+0x10e/0x130
> > > 
> > > However, not servicing all available IIR within the handler does hurt the
> > > throughput of pathological nop execbuf by about 20%, with a similar effect
> > > upon the dispatch latency of a series of execbuf.
> > > 
> > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93467
> > > Testcase: igt/gem_exec_nop/basic # requires NMI watchdog
> > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > > Cc: Antti Koskipää <antti.koskipaa@linux.intel.com>
> > > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > > Cc: stable@vger.kernel.org
> > > ---
> > >  drivers/gpu/drm/i915/i915_irq.c | 40 +++++++++++++++++++---------------------
> > >  1 file changed, 19 insertions(+), 21 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> > > index 53e5104964b3..8a3230427884 100644
> > > --- a/drivers/gpu/drm/i915/i915_irq.c
> > > +++ b/drivers/gpu/drm/i915/i915_irq.c
> > > @@ -1829,35 +1829,33 @@ static irqreturn_t cherryview_irq_handler(int irq, void *arg)
> > >  	/* IRQs are synced during runtime_suspend, we don't require a wakeref */
> > >  	disable_rpm_wakeref_asserts(dev_priv);
> > >  
> > > -	for (;;) {
> > > -		master_ctl = I915_READ(GEN8_MASTER_IRQ) & ~GEN8_MASTER_IRQ_CONTROL;
> > > -		iir = I915_READ(VLV_IIR);
> > > +	master_ctl = I915_READ(GEN8_MASTER_IRQ) & ~GEN8_MASTER_IRQ_CONTROL;
> > > +	iir = I915_READ(VLV_IIR);
> > >  
> > > -		if (master_ctl == 0 && iir == 0)
> > > -			break;
> > > +	if (master_ctl == 0 && iir == 0)
> > > +		break;
> > 
> > goto something?
> 
> Sigh. The problem of rewriting the "obvious" patch against -nightly. I
> just changed the for(;;) into do {} while(0) for testing. Perhaps I
> should stick with that in case we need to flip flop again.
> 
> > Apart from that I have no objections if it doesn't cause problems
> > with interrupts getting lost and whatnot. That was the original reason
> > for it I think, but at least I myself never really looked into it. IIRC
> > Rafael just told me they needed to do it to get the thing working, so
> > I just put the patch in. And that was before I had even seen any silicon.
> 
> My testing only looks at the GT side, and we do stress that pretty hard
> because of execlists and have reasonable methods of detection if we stop
> processing execbuf. I'm more worried about the display and pipe interrupts.

IIRC GT was where the problem was originally.

And just as a side note, I do have a branch somewhere that rewrites all
the gmch irq handlers to not loop. Just never actually found the time to
really run it on anything :) So I like moving towards that direction in
any case.

-- 
Ville Syrjälä
Intel OTC

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] drm/i915: Exit cherryview_irq_handler() after one pass
  2016-03-10 12:18 ` [PATCH v2] " Chris Wilson
@ 2016-03-10 12:25   ` Ville Syrjälä
  2016-03-10 12:38   ` [Intel-gfx] " Tvrtko Ursulin
  1 sibling, 0 replies; 11+ messages in thread
From: Ville Syrjälä @ 2016-03-10 12:25 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx, Antti Koskipää, Tvrtko Ursulin, stable

On Thu, Mar 10, 2016 at 12:18:49PM +0000, Chris Wilson wrote:
> This effectively reverts
> 
> commit 8e5fd599eb219f1054e39b40d18b217af669eea9
> Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
> Date:   Wed Apr 9 13:28:50 2014 +0300
> 
>     drm/i915/chv: Make CHV irq handler loop until all interrupts are consumed
> 
> as under continuous execlists load we can saturate the IRQ handler,
> destabilising the tsc clock and triggering the NMI watchdog to declare a hung
> CPU.
> 
> [  552.756051] clocksource: timekeeping watchdog on CPU0: Marking clocksource 'tsc' as unstable because the skew is too large:
> [  552.756080] clocksource:                       'refined-jiffies' wd_now: 10003b480 wd_last: 10003b28c mask: ffffffff
> [  552.756091] clocksource:                       'tsc' cs_now: d55d31aa50 cs_last: d17446166c mask: ffffffffffffffff
> [  552.756210] clocksource: Switched to clocksource refined-jiffies
> [  575.217870] NMI watchdog: Watchdog detected hard LOCKUP on cpu 1
> [  575.217893] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.5.0-rc7+ #18
> [  575.217905] Hardware name:                  /NUC5CPYB, BIOS PYBSWCEL.86A.0027.2015.0507.1758 05/07/2015
> [  575.217915]  0000000000000000 ffff88027fd05bc0 ffffffff81288c6d 0000000000000000
> [  575.217935]  0000000000000001 ffff88027fd05be0 ffffffff810e72d1 0000000000000000
> [  575.217951]  ffff88027fd05c80 ffff88027fd05c20 ffffffff81114b60 0000000181015f1e
> [  575.217967] Call Trace:
> [  575.217973]  <NMI>  [<ffffffff81288c6d>] dump_stack+0x4f/0x72
> [  575.217994]  [<ffffffff810e72d1>] watchdog_overflow_callback+0x151/0x160
> [  575.218003]  [<ffffffff81114b60>] __perf_event_overflow+0xa0/0x1e0
> [  575.218016]  [<ffffffff811154c4>] perf_event_overflow+0x14/0x20
> [  575.218028]  [<ffffffff8101d2ca>] intel_pmu_handle_irq+0x1da/0x460
> [  575.218042]  [<ffffffff814a8aae>] ? poll_idle+0x3e/0x70
> [  575.218052]  [<ffffffff814a8aae>] ? poll_idle+0x3e/0x70
> [  575.218064]  [<ffffffff81014ae8>] perf_event_nmi_handler+0x28/0x50
> [  575.218075]  [<ffffffff81007540>] nmi_handle+0x60/0x130
> [  575.218086]  [<ffffffff814a8aae>] ? poll_idle+0x3e/0x70
> [  575.218096]  [<ffffffff810079c0>] do_nmi+0x140/0x470
> [  575.218108]  [<ffffffff81559ec7>] end_repeat_nmi+0x1a/0x1e
> [  575.218119]  [<ffffffff814a8aae>] ? poll_idle+0x3e/0x70
> [  575.218129]  [<ffffffff814a8aae>] ? poll_idle+0x3e/0x70
> [  575.218139]  [<ffffffff814a8aae>] ? poll_idle+0x3e/0x70
> [  575.218148]  <<EOE>>  [<ffffffff814a8353>] cpuidle_enter_state+0xf3/0x2f0
> [  575.218164]  [<ffffffff814a8587>] cpuidle_enter+0x17/0x20
> [  575.218175]  [<ffffffff810aaa3a>] call_cpuidle+0x2a/0x40
> [  575.218185]  [<ffffffff810aade3>] cpu_startup_entry+0x273/0x330
> [  575.218196]  [<ffffffff81033a1e>] start_secondary+0x10e/0x130
> 
> However, not servicing all available IIR within the handler does hurt the
> throughput of pathological nop execbuf by about 20%, with a similar effect
> upon the dispatch latency of a series of execbuf.
> 
> v2: use do {} while(0) for a smaller patch, and easier to revert again
> 
> We have reasonable confidence that we do not miss GT interrupts (as
> execlists provides a stress case with a failure mechanism easily
> detected by igt), however we have less confidence about all the other
> sources of interrupts and worry that may lose a display hotplug
> interrupt, for example.
> 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93467
> Testcase: igt/gem_exec_nop/basic # requires NMI watchdog
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> Cc: Antti Koskipää <antti.koskipaa@linux.intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: stable@vger.kernel.org
> Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Acked-by: Ville Syrjälä <ville.syrjala@linux.intel.com>

> ---
>  drivers/gpu/drm/i915/i915_irq.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index 53e5104964b3..30d8bb7bf078 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -1829,7 +1829,7 @@ static irqreturn_t cherryview_irq_handler(int irq, void *arg)
>  	/* IRQs are synced during runtime_suspend, we don't require a wakeref */
>  	disable_rpm_wakeref_asserts(dev_priv);
>  
> -	for (;;) {
> +	do {
>  		master_ctl = I915_READ(GEN8_MASTER_IRQ) & ~GEN8_MASTER_IRQ_CONTROL;
>  		iir = I915_READ(VLV_IIR);
>  
> @@ -1857,7 +1857,7 @@ static irqreturn_t cherryview_irq_handler(int irq, void *arg)
>  
>  		I915_WRITE(GEN8_MASTER_IRQ, DE_MASTER_IRQ_CONTROL);
>  		POSTING_READ(GEN8_MASTER_IRQ);
> -	}
> +	} while (0);
>  
>  	enable_rpm_wakeref_asserts(dev_priv);
>  
> -- 
> 2.7.0

-- 
Ville Syrjälä
Intel OTC

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Intel-gfx] [PATCH v2] drm/i915: Exit cherryview_irq_handler() after one pass
  2016-03-10 12:18 ` [PATCH v2] " Chris Wilson
  2016-03-10 12:25   ` Ville Syrjälä
@ 2016-03-10 12:38   ` Tvrtko Ursulin
  1 sibling, 0 replies; 11+ messages in thread
From: Tvrtko Ursulin @ 2016-03-10 12:38 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: Antti Koskipää, stable


On 10/03/16 12:18, Chris Wilson wrote:
> This effectively reverts
>
> commit 8e5fd599eb219f1054e39b40d18b217af669eea9
> Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
> Date:   Wed Apr 9 13:28:50 2014 +0300
>
>      drm/i915/chv: Make CHV irq handler loop until all interrupts are consumed
>
> as under continuous execlists load we can saturate the IRQ handler,
> destabilising the tsc clock and triggering the NMI watchdog to declare a hung
> CPU.
>
> [  552.756051] clocksource: timekeeping watchdog on CPU0: Marking clocksource 'tsc' as unstable because the skew is too large:
> [  552.756080] clocksource:                       'refined-jiffies' wd_now: 10003b480 wd_last: 10003b28c mask: ffffffff
> [  552.756091] clocksource:                       'tsc' cs_now: d55d31aa50 cs_last: d17446166c mask: ffffffffffffffff
> [  552.756210] clocksource: Switched to clocksource refined-jiffies
> [  575.217870] NMI watchdog: Watchdog detected hard LOCKUP on cpu 1
> [  575.217893] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.5.0-rc7+ #18
> [  575.217905] Hardware name:                  /NUC5CPYB, BIOS PYBSWCEL.86A.0027.2015.0507.1758 05/07/2015
> [  575.217915]  0000000000000000 ffff88027fd05bc0 ffffffff81288c6d 0000000000000000
> [  575.217935]  0000000000000001 ffff88027fd05be0 ffffffff810e72d1 0000000000000000
> [  575.217951]  ffff88027fd05c80 ffff88027fd05c20 ffffffff81114b60 0000000181015f1e
> [  575.217967] Call Trace:
> [  575.217973]  <NMI>  [<ffffffff81288c6d>] dump_stack+0x4f/0x72
> [  575.217994]  [<ffffffff810e72d1>] watchdog_overflow_callback+0x151/0x160
> [  575.218003]  [<ffffffff81114b60>] __perf_event_overflow+0xa0/0x1e0
> [  575.218016]  [<ffffffff811154c4>] perf_event_overflow+0x14/0x20
> [  575.218028]  [<ffffffff8101d2ca>] intel_pmu_handle_irq+0x1da/0x460
> [  575.218042]  [<ffffffff814a8aae>] ? poll_idle+0x3e/0x70
> [  575.218052]  [<ffffffff814a8aae>] ? poll_idle+0x3e/0x70
> [  575.218064]  [<ffffffff81014ae8>] perf_event_nmi_handler+0x28/0x50
> [  575.218075]  [<ffffffff81007540>] nmi_handle+0x60/0x130
> [  575.218086]  [<ffffffff814a8aae>] ? poll_idle+0x3e/0x70
> [  575.218096]  [<ffffffff810079c0>] do_nmi+0x140/0x470
> [  575.218108]  [<ffffffff81559ec7>] end_repeat_nmi+0x1a/0x1e
> [  575.218119]  [<ffffffff814a8aae>] ? poll_idle+0x3e/0x70
> [  575.218129]  [<ffffffff814a8aae>] ? poll_idle+0x3e/0x70
> [  575.218139]  [<ffffffff814a8aae>] ? poll_idle+0x3e/0x70
> [  575.218148]  <<EOE>>  [<ffffffff814a8353>] cpuidle_enter_state+0xf3/0x2f0
> [  575.218164]  [<ffffffff814a8587>] cpuidle_enter+0x17/0x20
> [  575.218175]  [<ffffffff810aaa3a>] call_cpuidle+0x2a/0x40
> [  575.218185]  [<ffffffff810aade3>] cpu_startup_entry+0x273/0x330
> [  575.218196]  [<ffffffff81033a1e>] start_secondary+0x10e/0x130
>
> However, not servicing all available IIR within the handler does hurt the
> throughput of pathological nop execbuf by about 20%, with a similar effect
> upon the dispatch latency of a series of execbuf.
>
> v2: use do {} while(0) for a smaller patch, and easier to revert again
>
> We have reasonable confidence that we do not miss GT interrupts (as
> execlists provides a stress case with a failure mechanism easily
> detected by igt), however we have less confidence about all the other
> sources of interrupts and worry that may lose a display hotplug
> interrupt, for example.
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93467
> Testcase: igt/gem_exec_nop/basic # requires NMI watchdog
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> Cc: Antti Koskipää <antti.koskipaa@linux.intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: stable@vger.kernel.org
> Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_irq.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index 53e5104964b3..30d8bb7bf078 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -1829,7 +1829,7 @@ static irqreturn_t cherryview_irq_handler(int irq, void *arg)
>   	/* IRQs are synced during runtime_suspend, we don't require a wakeref */
>   	disable_rpm_wakeref_asserts(dev_priv);
>
> -	for (;;) {
> +	do {
>   		master_ctl = I915_READ(GEN8_MASTER_IRQ) & ~GEN8_MASTER_IRQ_CONTROL;
>   		iir = I915_READ(VLV_IIR);
>
> @@ -1857,7 +1857,7 @@ static irqreturn_t cherryview_irq_handler(int irq, void *arg)
>
>   		I915_WRITE(GEN8_MASTER_IRQ, DE_MASTER_IRQ_CONTROL);
>   		POSTING_READ(GEN8_MASTER_IRQ);
> -	}
> +	} while (0);
>
>   	enable_rpm_wakeref_asserts(dev_priv);
>
>

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] drm/i915: Exit cherryview_irq_handler() after one pass
  2016-03-10 12:24     ` Ville Syrjälä
@ 2016-03-10 12:42       ` Chris Wilson
  2016-03-10 13:01         ` Ville Syrjälä
  0 siblings, 1 reply; 11+ messages in thread
From: Chris Wilson @ 2016-03-10 12:42 UTC (permalink / raw)
  To: Ville Syrjälä
  Cc: intel-gfx, Antti Koskipää, Tvrtko Ursulin, stable

On Thu, Mar 10, 2016 at 02:24:39PM +0200, Ville Syrjälä wrote:
> On Thu, Mar 10, 2016 at 12:10:46PM +0000, Chris Wilson wrote:
> > My testing only looks at the GT side, and we do stress that pretty hard
> > because of execlists and have reasonable methods of detection if we stop
> > processing execbuf. I'm more worried about the display and pipe interrupts.
> 
> IIRC GT was where the problem was originally.

Before execlists, the only source of GT interrupts would be
user-interrupts. There the problem is usually not so much that we miss
the GT interrupt, but that the seqno write is not completed by the time
the interrupt is asserted. I hope that was the problem you saw.

Anyway, my confidence is improved if GT was the source of the worries.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] drm/i915: Exit cherryview_irq_handler() after one pass
  2016-03-10 12:42       ` Chris Wilson
@ 2016-03-10 13:01         ` Ville Syrjälä
  0 siblings, 0 replies; 11+ messages in thread
From: Ville Syrjälä @ 2016-03-10 13:01 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx, Antti Koskipää, Tvrtko Ursulin,
	stable

On Thu, Mar 10, 2016 at 12:42:48PM +0000, Chris Wilson wrote:
> On Thu, Mar 10, 2016 at 02:24:39PM +0200, Ville Syrjälä wrote:
> > On Thu, Mar 10, 2016 at 12:10:46PM +0000, Chris Wilson wrote:
> > > My testing only looks at the GT side, and we do stress that pretty hard
> > > because of execlists and have reasonable methods of detection if we stop
> > > processing execbuf. I'm more worried about the display and pipe interrupts.
> > 
> > IIRC GT was where the problem was originally.
> 
> Before execlists, the only source of GT interrupts would be
> user-interrupts. There the problem is usually not so much that we miss
> the GT interrupt, but that the seqno write is not completed by the time
> the interrupt is asserted. I hope that was the problem you saw.

IIRC those guys already had execlists in use with their Android stuff.
But we did have that whole snooping mess still unresolved at the time
so who knows what was really going on.

-- 
Ville Syrjälä
Intel OTC

^ permalink raw reply	[flat|nested] 11+ messages in thread

Thread overview: 11+ messages
2016-03-10 11:44 [PATCH] drm/i915: Exit cherryview_irq_handler() after one pass Chris Wilson
2016-03-10 12:01 ` Ville Syrjälä
2016-03-10 12:10   ` Chris Wilson
2016-03-10 12:24     ` Ville Syrjälä
2016-03-10 12:42       ` Chris Wilson
2016-03-10 13:01         ` Ville Syrjälä
2016-03-10 12:01 ` [Intel-gfx] " Tvrtko Ursulin
2016-03-10 12:12 ` kbuild test robot
2016-03-10 12:18 ` [PATCH v2] " Chris Wilson
2016-03-10 12:25   ` Ville Syrjälä
2016-03-10 12:38   ` [Intel-gfx] " Tvrtko Ursulin
