intel-gfx.lists.freedesktop.org archive mirror
* [PATCH] e1000e: Taint a HW lockup
@ 2017-12-05 18:00 Chris Wilson
  2017-12-05 18:05 ` Chris Wilson
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Chris Wilson @ 2017-12-05 18:00 UTC
  To: intel-gfx; +Cc: Tomi Sarvela, Daniel Vetter

When we see an e1000e HW lockup in CI, it is typically fatal with the
hang repeating until the host is forcibly rebooted. Speed up that
process by tainting the kernel, which CI can trivially detect (and is
being used to detect similarly fatal CI conditions) and reboot soon
after.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Tomi Sarvela <tomi.p.sarvela@intel.com>
---
 drivers/net/ethernet/intel/e1000e/netdev.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index 9f18d39bdc8f..bcc4b226a184 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -1170,6 +1170,8 @@ static void e1000_print_hw_hang(struct work_struct *work)
 	/* Suggest workaround for known h/w issue */
 	if ((hw->mac.type == e1000_pchlan) && (er32(CTRL) & E1000_CTRL_TFCE))
 		e_err("Try turning off Tx pause (flow control) via ethtool\n");
+
+	add_taint(TAINT_WARN, LOCKDEP_STILL_OK);
 }
 
 /**
-- 
2.15.1
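
For context, the detection that the commit message relies on can be done from userspace by reading the taint mask the kernel exposes in /proc/sys/kernel/tainted and testing the TAINT_WARN bit (bit 9, value 512). The following is a minimal sketch of such a check, not the code the CI actually runs; the helper names taint_mask_has_warn() and read_taint_mask() are hypothetical and chosen only for illustration.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Bit index of TAINT_WARN in the kernel taint mask (value 1 << 9 == 512). */
#define TAINT_WARN_BIT 9

/* Hypothetical helper: true if the TAINT_WARN bit is set in the mask. */
static bool taint_mask_has_warn(unsigned long mask)
{
	return (mask >> TAINT_WARN_BIT) & 1UL;
}

/*
 * Hypothetical helper: read the decimal taint mask from a file such as
 * /proc/sys/kernel/tainted. Returns 0 on success, -1 on any failure.
 */
static int read_taint_mask(const char *path, unsigned long *mask)
{
	char buf[32];
	char *end;
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	errno = 0;
	*mask = strtoul(buf, &end, 10);
	if (errno || end == buf)
		return -1;
	return 0;
}
```

A harness could call read_taint_mask("/proc/sys/kernel/tainted", &mask) between tests and schedule a reboot when taint_mask_has_warn(mask) returns true. Note that TAINT_WARN is also set by any WARN_ON(), so a check like this cannot tell an e1000e lockup apart from other warnings without also looking at dmesg.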

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

* Re: [PATCH] e1000e: Taint a HW lockup
  2017-12-05 18:00 [PATCH] e1000e: Taint a HW lockup Chris Wilson
@ 2017-12-05 18:05 ` Chris Wilson
  2017-12-06  9:47   ` Daniel Vetter
  2017-12-05 18:52 ` ✓ Fi.CI.BAT: success for " Patchwork
  2017-12-05 21:13 ` ✓ Fi.CI.IGT: " Patchwork
  2 siblings, 1 reply; 6+ messages in thread
From: Chris Wilson @ 2017-12-05 18:05 UTC
  To: intel-gfx; +Cc: Tomi Sarvela, Daniel Vetter

Quoting Chris Wilson (2017-12-05 18:00:00)
> When we see an e1000e HW lockup in CI, it is typically fatal with the
> hang repeating until the host is forcibly rebooted. Speed up that
> process by tainting the kernel, which CI can trivially detect (and is
> being used to detect similarly fatal CI conditions) and reboot soon
> after.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Tomi Sarvela <tomi.p.sarvela@intel.com>

I'm not concerned with selling this to e1000e, but if it helps improve
CI robustness, then topic/core-for-CI. Or maybe we should create a new
topic, Daniel? topic/taints-for-CI?
-Chris

* ✓ Fi.CI.BAT: success for e1000e: Taint a HW lockup
  2017-12-05 18:00 [PATCH] e1000e: Taint a HW lockup Chris Wilson
  2017-12-05 18:05 ` Chris Wilson
@ 2017-12-05 18:52 ` Patchwork
  2017-12-05 21:13 ` ✓ Fi.CI.IGT: " Patchwork
  2 siblings, 0 replies; 6+ messages in thread
From: Patchwork @ 2017-12-05 18:52 UTC
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: e1000e: Taint a HW lockup
URL   : https://patchwork.freedesktop.org/series/34931/
State : success

== Summary ==

Series 34931v1 e1000e: Taint a HW lockup
https://patchwork.freedesktop.org/api/1.0/series/34931/revisions/1/mbox/

fi-bdw-5557u     total:288  pass:267  dwarn:0   dfail:0   fail:0   skip:21  time:442s
fi-blb-e6850     total:288  pass:223  dwarn:1   dfail:0   fail:0   skip:64  time:385s
fi-bsw-n3050     total:288  pass:242  dwarn:0   dfail:0   fail:0   skip:46  time:515s
fi-bwr-2160      total:288  pass:183  dwarn:0   dfail:0   fail:0   skip:105 time:281s
fi-bxt-dsi       total:288  pass:258  dwarn:0   dfail:0   fail:0   skip:30  time:504s
fi-bxt-j4205     total:288  pass:259  dwarn:0   dfail:0   fail:0   skip:29  time:504s
fi-byt-j1900     total:288  pass:253  dwarn:0   dfail:0   fail:0   skip:35  time:484s
fi-byt-n2820     total:288  pass:249  dwarn:0   dfail:0   fail:0   skip:39  time:472s
fi-elk-e7500     total:224  pass:163  dwarn:15  dfail:0   fail:0   skip:45 
fi-gdg-551       total:288  pass:178  dwarn:1   dfail:0   fail:1   skip:108 time:267s
fi-glk-1         total:288  pass:260  dwarn:0   dfail:0   fail:0   skip:28  time:539s
fi-hsw-4770      total:288  pass:261  dwarn:0   dfail:0   fail:0   skip:27  time:358s
fi-hsw-4770r     total:288  pass:224  dwarn:0   dfail:0   fail:0   skip:64  time:261s
fi-ivb-3520m     total:288  pass:259  dwarn:0   dfail:0   fail:0   skip:29  time:485s
fi-ivb-3770      total:288  pass:259  dwarn:0   dfail:0   fail:0   skip:29  time:445s
fi-kbl-7560u     total:288  pass:269  dwarn:0   dfail:0   fail:0   skip:19  time:530s
fi-kbl-7567u     total:288  pass:268  dwarn:0   dfail:0   fail:0   skip:20  time:475s
fi-kbl-r         total:288  pass:261  dwarn:0   dfail:0   fail:0   skip:27  time:537s
fi-pnv-d510      total:288  pass:222  dwarn:1   dfail:0   fail:0   skip:65  time:586s
fi-skl-6260u     total:288  pass:268  dwarn:0   dfail:0   fail:0   skip:20  time:456s
fi-skl-6600u     total:288  pass:261  dwarn:0   dfail:0   fail:0   skip:27  time:541s
fi-skl-6700hq    total:288  pass:262  dwarn:0   dfail:0   fail:0   skip:26  time:572s
fi-skl-6700k     total:288  pass:264  dwarn:0   dfail:0   fail:0   skip:24  time:515s
fi-skl-6770hq    total:288  pass:268  dwarn:0   dfail:0   fail:0   skip:20  time:498s
fi-snb-2520m     total:288  pass:249  dwarn:0   dfail:0   fail:0   skip:39  time:548s
fi-snb-2600      total:288  pass:248  dwarn:0   dfail:0   fail:0   skip:40  time:411s
Blacklisted hosts:
fi-cfl-s2        total:288  pass:262  dwarn:0   dfail:0   fail:0   skip:26  time:600s
fi-cnl-y         total:288  pass:262  dwarn:0   dfail:0   fail:0   skip:26  time:616s
fi-glk-dsi       total:288  pass:258  dwarn:0   dfail:0   fail:0   skip:30  time:489s

0d0fe916f52ad8f05dddab384ae7c90bb62ebac4 drm-tip: 2017y-12m-05d-14h-52m-17s UTC integration manifest
f0ee3df4e66c e1000e: Taint a HW lockup

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_7417/

* ✓ Fi.CI.IGT: success for e1000e: Taint a HW lockup
  2017-12-05 18:00 [PATCH] e1000e: Taint a HW lockup Chris Wilson
  2017-12-05 18:05 ` Chris Wilson
  2017-12-05 18:52 ` ✓ Fi.CI.BAT: success for " Patchwork
@ 2017-12-05 21:13 ` Patchwork
  2 siblings, 0 replies; 6+ messages in thread
From: Patchwork @ 2017-12-05 21:13 UTC
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: e1000e: Taint a HW lockup
URL   : https://patchwork.freedesktop.org/series/34931/
State : success

== Summary ==

Test kms_flip:
        Subgroup vblank-vs-modeset-suspend:
                pass       -> SKIP       (shard-snb) fdo#102365
        Subgroup modeset-vs-vblank-race-interruptible:
                pass       -> FAIL       (shard-hsw) fdo#103060
        Subgroup vblank-vs-modeset-suspend-interruptible:
                skip       -> PASS       (shard-snb)
Test kms_frontbuffer_tracking:
        Subgroup fbc-1p-offscren-pri-shrfb-draw-blt:
                fail       -> PASS       (shard-snb) fdo#101623
        Subgroup fbc-rgb101010-draw-render:
                skip       -> PASS       (shard-snb) fdo#103167 +1
Test drv_module_reload:
        Subgroup basic-no-display:
                dmesg-warn -> PASS       (shard-hsw) fdo#102707
Test kms_chv_cursor_fail:
        Subgroup pipe-b-128x128-top-edge:
                incomplete -> PASS       (shard-hsw)
Test prime_mmap_kms:
        Subgroup buffer-sharing:
                skip       -> PASS       (shard-snb)

fdo#102365 https://bugs.freedesktop.org/show_bug.cgi?id=102365
fdo#103060 https://bugs.freedesktop.org/show_bug.cgi?id=103060
fdo#101623 https://bugs.freedesktop.org/show_bug.cgi?id=101623
fdo#103167 https://bugs.freedesktop.org/show_bug.cgi?id=103167
fdo#102707 https://bugs.freedesktop.org/show_bug.cgi?id=102707

shard-hsw        total:2679 pass:1535 dwarn:1   dfail:0   fail:11  skip:1132 time:9438s
shard-snb        total:2679 pass:1306 dwarn:2   dfail:0   fail:11  skip:1360 time:8041s
Blacklisted hosts:
shard-apl        total:2636 pass:1636 dwarn:0   dfail:0   fail:23  skip:977 time:13356s
shard-kbl        total:2545 pass:1694 dwarn:5   dfail:1   fail:22  skip:822 time:10261s

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_7417/shards.html

* Re: [PATCH] e1000e: Taint a HW lockup
  2017-12-05 18:05 ` Chris Wilson
@ 2017-12-06  9:47   ` Daniel Vetter
  2017-12-06 19:27     ` Jeff Kirsher
  0 siblings, 1 reply; 6+ messages in thread
From: Daniel Vetter @ 2017-12-06  9:47 UTC
  To: Chris Wilson, Saarinen, Jani, Jeff Kirsher, intel-wired-lan
  Cc: Tomi Sarvela, intel-gfx

On Tue, Dec 5, 2017 at 7:05 PM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> Quoting Chris Wilson (2017-12-05 18:00:00)
>> When we see an e1000e HW lockup in CI, it is typically fatal with the
>> hang repeating until the host is forcibly rebooted. Speed up that
>> process by tainting the kernel, which CI can trivially detect (and is
>> being used to detect similarly fatal CI conditions) and reboot soon
>> after.
>>
>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
>> Cc: Tomi Sarvela <tomi.p.sarvela@intel.com>
>
> I'm not concerned with selling this to e1000e, but if it helps improve
> CI robustness, then topic/core-for-CI. Or maybe we should create a new
> topic, Daniel? topic/taints-for-CI?

Sounds like a usable idea for CI. Would be especially interesting
because despite applying the suggested w/a, we still hit lockups.
Before we do that, though, I think we should get an ack from the e1000e
team. Jani S., maybe something you can drive?

Adding more folks to cc.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

* Re: [PATCH] e1000e: Taint a HW lockup
  2017-12-06  9:47   ` Daniel Vetter
@ 2017-12-06 19:27     ` Jeff Kirsher
  0 siblings, 0 replies; 6+ messages in thread
From: Jeff Kirsher @ 2017-12-06 19:27 UTC
  To: Daniel Vetter, Chris Wilson, Saarinen, Jani, intel-wired-lan
  Cc: Tomi Sarvela, intel-gfx


On Wed, 2017-12-06 at 10:47 +0100, Daniel Vetter wrote:
> On Tue, Dec 5, 2017 at 7:05 PM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > Quoting Chris Wilson (2017-12-05 18:00:00)
> > > When we see an e1000e HW lockup in CI, it is typically fatal with the
> > > hang repeating until the host is forcibly rebooted. Speed up that
> > > process by tainting the kernel, which CI can trivially detect (and is
> > > being used to detect similarly fatal CI conditions) and reboot soon
> > > after.
> > >
> > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > Cc: Tomi Sarvela <tomi.p.sarvela@intel.com>
> >
> > I'm not concerned with selling this to e1000e, but if it helps improve
> > CI robustness, then topic/core-for-CI. Or maybe we should create a new
> > topic, Daniel? topic/taints-for-CI?
>
> Sounds like a usable idea for CI. Would be especially interesting
> because despite applying the suggested w/a, we still hit lockups.
> Before we do that, though, I think we should get an ack from the e1000e
> team. Jani S., maybe something you can drive?
>
> Adding more folks to cc.
> -Daniel

Please send any e1000e patches to the intel-wired-lan mailing list and
make sure to CC Sasha Neftin <sasha.neftin@intel.com>, since he is the
e1000e driver maintainer.
