* [PATCH] drm/i915: Fix system hang with EI UP masked on Haswell
@ 2017-04-13 11:15 Mika Kuoppala
  2017-04-13 11:37 ` Chris Wilson
  2017-04-13 11:39 ` ✗ Fi.CI.BAT: failure for " Patchwork
  0 siblings, 2 replies; 8+ messages in thread
From: Mika Kuoppala @ 2017-04-13 11:15 UTC (permalink / raw)
  To: intel-gfx; +Cc: Mika Kuoppala, Chris Wilson, stable

Previously, with commit a9c1f90c8e17
("drm/i915: Don't mask EI UP interrupt on IVB|SNB"), a certain
seemingly unrelated bit (GEN6_PM_RP_UP_EI_EXPIRED) needed
to be unmasked on IVB and SNB in order to prevent a system hang
with chained batchbuffers.

Our CI was seeing incomplete results with tests that used
chained batches and it was found out that HSW needs to have this
same bit unmasked to reliably survive chained batches.

Always unmask GEN6_PM_RP_UP_EI_EXPIRED on Haswell to
prevent system hang with batch chaining.

Testcase: igt/gem_exec_fence/nb-await-default
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100672
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@vger.kernel.org
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/gpu/drm/i915/i915_irq.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index d9d1969..fd97fe0 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -4252,12 +4252,12 @@ void intel_irq_init(struct drm_i915_private *dev_priv)
 	dev_priv->rps.pm_intrmsk_mbz = 0;
 
 	/*
-	 * SNB,IVB can while VLV,CHV may hard hang on looping batchbuffer
+	 * SNB,IVB,HSW can while VLV,CHV may hard hang on looping batchbuffer
 	 * if GEN6_PM_UP_EI_EXPIRED is masked.
 	 *
 	 * TODO: verify if this can be reproduced on VLV,CHV.
 	 */
-	if (INTEL_INFO(dev_priv)->gen <= 7 && !IS_HASWELL(dev_priv))
+	if (INTEL_INFO(dev_priv)->gen <= 7)
 		dev_priv->rps.pm_intrmsk_mbz |= GEN6_PM_RP_UP_EI_EXPIRED;
 
 	if (INTEL_INFO(dev_priv)->gen >= 8)
-- 
2.7.4
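
For readers following the one-line condition change above, here is a minimal
standalone sketch of its effect. It is not the driver code: struct fake_gpu,
mbz_before(), mbz_after() and sanitize_mask() are hypothetical names, the bit
position of PM_RP_UP_EI_EXPIRED is assumed for illustration, and the way the
must-be-zero bits are cleared from a requested mask is an assumption based on
the field name pm_intrmsk_mbz.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit position assumed for illustration; the real define is in i915_reg.h. */
#define PM_RP_UP_EI_EXPIRED (1u << 2)

/* Hypothetical stand-in for the two properties the patch consults. */
struct fake_gpu {
	int gen;
	bool is_haswell;
};

/* Condition before the patch: gen <= 7, but Haswell excluded. */
uint32_t mbz_before(const struct fake_gpu *gpu)
{
	uint32_t mbz = 0;

	if (gpu->gen <= 7 && !gpu->is_haswell)
		mbz |= PM_RP_UP_EI_EXPIRED;
	return mbz;
}

/* Condition after the patch: every gen <= 7 part, Haswell included. */
uint32_t mbz_after(const struct fake_gpu *gpu)
{
	uint32_t mbz = 0;

	if (gpu->gen <= 7)
		mbz |= PM_RP_UP_EI_EXPIRED;
	return mbz;
}

/*
 * Assumed consumer of the must-be-zero bits: clear them from whatever
 * interrupt mask was requested, so those interrupts stay unmasked.
 */
uint32_t sanitize_mask(uint32_t requested, uint32_t mbz)
{
	return requested & ~mbz;
}
```

With a gen 7 Haswell description, mbz_before() returns 0 while mbz_after()
returns PM_RP_UP_EI_EXPIRED; for a non-Haswell gen 7 part the two agree, and
for gen 8 both return 0.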


* Re: [PATCH] drm/i915: Fix system hang with EI UP masked on Haswell
  2017-04-13 11:15 [PATCH] drm/i915: Fix system hang with EI UP masked on Haswell Mika Kuoppala
@ 2017-04-13 11:37 ` Chris Wilson
  2017-04-13 11:58     ` Mika Kuoppala
  2017-04-18 12:26   ` Mika Kuoppala
  2017-04-13 11:39 ` ✗ Fi.CI.BAT: failure for " Patchwork
  1 sibling, 2 replies; 8+ messages in thread
From: Chris Wilson @ 2017-04-13 11:37 UTC (permalink / raw)
  To: Mika Kuoppala; +Cc: intel-gfx, stable

On Thu, Apr 13, 2017 at 02:15:27PM +0300, Mika Kuoppala wrote:
> Previously with commit a9c1f90c8e17
> ("drm/i915: Don't mask EI UP interrupt on IVB|SNB") certain,
> seemingly unrelated bit (GEN6_PM_RP_UP_EI_EXPIRED) was needed
> to be unmasked for IVB and SNB in order to prevent system hang
> with chained batchbuffers.
> 
> Our CI was seeing incomplete results with tests that used
> chained batches and it was found out that HSW needs to have this
> same bit unmasked to reliably survive chained batches.
> 
> Always unmask GEN6_PM_RP_UP_EI_EXPIRED on Haswell to
> prevent system hang with batch chaining.
> 
> Testcase: igt/gem_exec_fence/nb-await-default
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100672
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: stable@vger.kernel.org
> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>

* facepalm.

I am amazed that took so long for us to notice.
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>

Did we ever get a w/a identifier for this?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


* ✗ Fi.CI.BAT: failure for drm/i915: Fix system hang with EI UP masked on Haswell
  2017-04-13 11:15 [PATCH] drm/i915: Fix system hang with EI UP masked on Haswell Mika Kuoppala
  2017-04-13 11:37 ` Chris Wilson
@ 2017-04-13 11:39 ` Patchwork
  2017-04-13 11:51   ` Chris Wilson
  2017-04-18 12:06   ` Mika Kuoppala
  1 sibling, 2 replies; 8+ messages in thread
From: Patchwork @ 2017-04-13 11:39 UTC (permalink / raw)
  To: Mika Kuoppala; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: Fix system hang with EI UP masked on Haswell
URL   : https://patchwork.freedesktop.org/series/22991/
State : failure

== Summary ==

Series 22991v1 drm/i915: Fix system hang with EI UP masked on Haswell
https://patchwork.freedesktop.org/api/1.0/series/22991/revisions/1/mbox/

Test gem_exec_flush:
        Subgroup basic-batch-kernel-default-uc:
                pass       -> FAIL       (fi-snb-2600) fdo#100007
Test gem_exec_suspend:
        Subgroup basic-s4-devices:
                pass       -> DMESG-WARN (fi-kbl-7560u) fdo#100125
Test kms_cursor_legacy:
        Subgroup basic-flip-before-cursor-varying-size:
                pass       -> INCOMPLETE (fi-bxt-t5700)

fdo#100007 https://bugs.freedesktop.org/show_bug.cgi?id=100007
fdo#100125 https://bugs.freedesktop.org/show_bug.cgi?id=100125

fi-bdw-5557u     total:278  pass:267  dwarn:0   dfail:0   fail:0   skip:11  time:434s
fi-bdw-gvtdvm    total:278  pass:256  dwarn:8   dfail:0   fail:0   skip:14  time:429s
fi-bsw-n3050     total:278  pass:242  dwarn:0   dfail:0   fail:0   skip:36  time:580s
fi-bxt-j4205     total:278  pass:259  dwarn:0   dfail:0   fail:0   skip:19  time:507s
fi-bxt-t5700     total:206  pass:192  dwarn:0   dfail:0   fail:0   skip:13 
fi-byt-j1900     total:278  pass:254  dwarn:0   dfail:0   fail:0   skip:24  time:487s
fi-byt-n2820     total:278  pass:250  dwarn:0   dfail:0   fail:0   skip:28  time:482s
fi-hsw-4770      total:278  pass:262  dwarn:0   dfail:0   fail:0   skip:16  time:414s
fi-hsw-4770r     total:278  pass:262  dwarn:0   dfail:0   fail:0   skip:16  time:405s
fi-ilk-650       total:278  pass:228  dwarn:0   dfail:0   fail:0   skip:50  time:418s
fi-ivb-3520m     total:278  pass:260  dwarn:0   dfail:0   fail:0   skip:18  time:488s
fi-ivb-3770      total:278  pass:260  dwarn:0   dfail:0   fail:0   skip:18  time:463s
fi-kbl-7500u     total:278  pass:260  dwarn:0   dfail:0   fail:0   skip:18  time:455s
fi-kbl-7560u     total:278  pass:267  dwarn:1   dfail:0   fail:0   skip:10  time:565s
fi-skl-6260u     total:278  pass:268  dwarn:0   dfail:0   fail:0   skip:10  time:459s
fi-skl-6700hq    total:278  pass:261  dwarn:0   dfail:0   fail:0   skip:17  time:574s
fi-skl-6700k     total:278  pass:256  dwarn:4   dfail:0   fail:0   skip:18  time:460s
fi-skl-6770hq    total:278  pass:268  dwarn:0   dfail:0   fail:0   skip:10  time:491s
fi-skl-gvtdvm    total:278  pass:265  dwarn:0   dfail:0   fail:0   skip:13  time:441s
fi-snb-2520m     total:278  pass:250  dwarn:0   dfail:0   fail:0   skip:28  time:533s
fi-snb-2600      total:278  pass:248  dwarn:0   dfail:0   fail:1   skip:29  time:402s

6184edce6665aee9c9131149a7b9314a1313eaf9 drm-tip: 2017y-04m-13d-08h-27m-10s UTC integration manifest
aee691a drm/i915: Fix system hang with EI UP masked on Haswell

== Logs ==

For more details see: https://intel-gfx-ci.01.org/CI/Patchwork_4501/
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* Re: ✗ Fi.CI.BAT: failure for drm/i915: Fix system hang with EI UP masked on Haswell
  2017-04-13 11:39 ` ✗ Fi.CI.BAT: failure for " Patchwork
@ 2017-04-13 11:51   ` Chris Wilson
  2017-04-18 12:06   ` Mika Kuoppala
  1 sibling, 0 replies; 8+ messages in thread
From: Chris Wilson @ 2017-04-13 11:51 UTC (permalink / raw)
  To: intel-gfx

On Thu, Apr 13, 2017 at 11:39:40AM -0000, Patchwork wrote:
> == Series Details ==
> 
> Series: drm/i915: Fix system hang with EI UP masked on Haswell
> URL   : https://patchwork.freedesktop.org/series/22991/
> State : failure
> 
> == Summary ==
> 
> Series 22991v1 drm/i915: Fix system hang with EI UP masked on Haswell
> https://patchwork.freedesktop.org/api/1.0/series/22991/revisions/1/mbox/
> 
> Test gem_exec_flush:
>         Subgroup basic-batch-kernel-default-uc:
>                 pass       -> FAIL       (fi-snb-2600) fdo#100007
> Test gem_exec_suspend:
>         Subgroup basic-s4-devices:
>                 pass       -> DMESG-WARN (fi-kbl-7560u) fdo#100125
> Test kms_cursor_legacy:
>         Subgroup basic-flip-before-cursor-varying-size:
>                 pass       -> INCOMPLETE (fi-bxt-t5700)
> 
> fdo#100007 https://bugs.freedesktop.org/show_bug.cgi?id=100007
> fdo#100125 https://bugs.freedesktop.org/show_bug.cgi?id=100125

It passes the irony test ;)
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH] drm/i915: Fix system hang with EI UP masked on Haswell
  2017-04-13 11:37 ` Chris Wilson
@ 2017-04-13 11:58     ` Mika Kuoppala
  2017-04-18 12:26   ` Mika Kuoppala
  1 sibling, 0 replies; 8+ messages in thread
From: Mika Kuoppala @ 2017-04-13 11:58 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx, stable

Chris Wilson <chris@chris-wilson.co.uk> writes:

> On Thu, Apr 13, 2017 at 02:15:27PM +0300, Mika Kuoppala wrote:
>> Previously with commit a9c1f90c8e17
>> ("drm/i915: Don't mask EI UP interrupt on IVB|SNB") certain,
>> seemingly unrelated bit (GEN6_PM_RP_UP_EI_EXPIRED) was needed
>> to be unmasked for IVB and SNB in order to prevent system hang
>> with chained batchbuffers.
>> 
>> Our CI was seeing incomplete results with tests that used
>> chained batches and it was found out that HSW needs to have this
>> same bit unmasked to reliably survive chained batches.
>> 
>> Always unmask GEN6_PM_RP_UP_EI_EXPIRED on Haswell to
>> prevent system hang with batch chaining.
>> 
>> Testcase: igt/gem_exec_fence/nb-await-default
>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100672
>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
>
> * facepalm.
>
> I am amazed that took so long for us to notice.

It could be that we don't run chained batches much in CI.
Also it seems to be more subtle than with IVB. With
a spin batch it didn't surface, but with nb-await-default
the store/spin and possibly(?) the CPU-side sleep
lured it out.

> Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Thanks.

>
> Did we ever get a w/a identifier for this?

Not that I know of. And in retrospect, excluding
HSW was not wise in the original patch. It was v3
where it was excluded, but I didn't find the trail that
led there. Trusting it not to inherit the peculiarities...

I like to think that we tested and it never hung with
straight up busy chaining. nb-await-default is
more sophisticated.

-Mika

* Re: ✗ Fi.CI.BAT: failure for drm/i915: Fix system hang with EI UP masked on Haswell
  2017-04-13 11:39 ` ✗ Fi.CI.BAT: failure for " Patchwork
  2017-04-13 11:51   ` Chris Wilson
@ 2017-04-18 12:06   ` Mika Kuoppala
  1 sibling, 0 replies; 8+ messages in thread
From: Mika Kuoppala @ 2017-04-18 12:06 UTC (permalink / raw)
  To: Patchwork; +Cc: intel-gfx

Patchwork <patchwork@emeril.freedesktop.org> writes:

> == Series Details ==
>
> Series: drm/i915: Fix system hang with EI UP masked on Haswell
> URL   : https://patchwork.freedesktop.org/series/22991/
> State : failure
>
> == Summary ==
>
> Series 22991v1 drm/i915: Fix system hang with EI UP masked on Haswell
> https://patchwork.freedesktop.org/api/1.0/series/22991/revisions/1/mbox/
>
> Test gem_exec_flush:
>         Subgroup basic-batch-kernel-default-uc:
>                 pass       -> FAIL       (fi-snb-2600) fdo#100007
> Test gem_exec_suspend:
>         Subgroup basic-s4-devices:
>                 pass       -> DMESG-WARN (fi-kbl-7560u) fdo#100125
> Test kms_cursor_legacy:
>         Subgroup basic-flip-before-cursor-varying-size:
>                 pass       -> INCOMPLETE (fi-bxt-t5700)

The patch does not affect BXT, so it has to be a bug in drm-tip:
https://bugs.freedesktop.org/show_bug.cgi?id=100706

-Mika


>
> fdo#100007 https://bugs.freedesktop.org/show_bug.cgi?id=100007
> fdo#100125 https://bugs.freedesktop.org/show_bug.cgi?id=100125
>
> fi-bdw-5557u     total:278  pass:267  dwarn:0   dfail:0   fail:0   skip:11  time:434s
> fi-bdw-gvtdvm    total:278  pass:256  dwarn:8   dfail:0   fail:0   skip:14  time:429s
> fi-bsw-n3050     total:278  pass:242  dwarn:0   dfail:0   fail:0   skip:36  time:580s
> fi-bxt-j4205     total:278  pass:259  dwarn:0   dfail:0   fail:0   skip:19  time:507s
> fi-bxt-t5700     total:206  pass:192  dwarn:0   dfail:0   fail:0   skip:13 
> fi-byt-j1900     total:278  pass:254  dwarn:0   dfail:0   fail:0   skip:24  time:487s
> fi-byt-n2820     total:278  pass:250  dwarn:0   dfail:0   fail:0   skip:28  time:482s
> fi-hsw-4770      total:278  pass:262  dwarn:0   dfail:0   fail:0   skip:16  time:414s
> fi-hsw-4770r     total:278  pass:262  dwarn:0   dfail:0   fail:0   skip:16  time:405s
> fi-ilk-650       total:278  pass:228  dwarn:0   dfail:0   fail:0   skip:50  time:418s
> fi-ivb-3520m     total:278  pass:260  dwarn:0   dfail:0   fail:0   skip:18  time:488s
> fi-ivb-3770      total:278  pass:260  dwarn:0   dfail:0   fail:0   skip:18  time:463s
> fi-kbl-7500u     total:278  pass:260  dwarn:0   dfail:0   fail:0   skip:18  time:455s
> fi-kbl-7560u     total:278  pass:267  dwarn:1   dfail:0   fail:0   skip:10  time:565s
> fi-skl-6260u     total:278  pass:268  dwarn:0   dfail:0   fail:0   skip:10  time:459s
> fi-skl-6700hq    total:278  pass:261  dwarn:0   dfail:0   fail:0   skip:17  time:574s
> fi-skl-6700k     total:278  pass:256  dwarn:4   dfail:0   fail:0   skip:18  time:460s
> fi-skl-6770hq    total:278  pass:268  dwarn:0   dfail:0   fail:0   skip:10  time:491s
> fi-skl-gvtdvm    total:278  pass:265  dwarn:0   dfail:0   fail:0   skip:13  time:441s
> fi-snb-2520m     total:278  pass:250  dwarn:0   dfail:0   fail:0   skip:28  time:533s
> fi-snb-2600      total:278  pass:248  dwarn:0   dfail:0   fail:1   skip:29  time:402s
>
> 6184edce6665aee9c9131149a7b9314a1313eaf9 drm-tip: 2017y-04m-13d-08h-27m-10s UTC integration manifest
> aee691a drm/i915: Fix system hang with EI UP masked on Haswell
>
> == Logs ==
>
> For more details see: https://intel-gfx-ci.01.org/CI/Patchwork_4501/

* Re: [PATCH] drm/i915: Fix system hang with EI UP masked on Haswell
  2017-04-13 11:37 ` Chris Wilson
  2017-04-13 11:58     ` Mika Kuoppala
@ 2017-04-18 12:26   ` Mika Kuoppala
  1 sibling, 0 replies; 8+ messages in thread
From: Mika Kuoppala @ 2017-04-18 12:26 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx, stable

Chris Wilson <chris@chris-wilson.co.uk> writes:

> On Thu, Apr 13, 2017 at 02:15:27PM +0300, Mika Kuoppala wrote:
>> Previously with commit a9c1f90c8e17
>> ("drm/i915: Don't mask EI UP interrupt on IVB|SNB") certain,
>> seemingly unrelated bit (GEN6_PM_RP_UP_EI_EXPIRED) was needed
>> to be unmasked for IVB and SNB in order to prevent system hang
>> with chained batchbuffers.
>> 
>> Our CI was seeing incomplete results with tests that used
>> chained batches and it was found out that HSW needs to have this
>> same bit unmasked to reliably survive chained batches.
>> 
>> Always unmask GEN6_PM_RP_UP_EI_EXPIRED on Haswell to
>> prevent system hang with batch chaining.
>> 
>> Testcase: igt/gem_exec_fence/nb-await-default
>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100672
>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
>
> * facepalm.
>
> I am amazed that took so long for us to notice.
> Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
>

Pushed to drm-intel-next-queued. Thanks.
-Mika
