From: bugzilla-daemon@freedesktop.org
To: dri-devel@lists.freedesktop.org
Subject: [Bug 107154] [drm] GPU recovery disabled.
Date: Sun, 08 Jul 2018 09:24:30 +0000

https://bugs.freedesktop.org/show_bug.cgi?id=107154

Bug ID:    107154
Summary:   [drm] GPU recovery disabled.
Product:   DRI
Version:   unspecified
Hardware:  x86-64 (AMD64)
OS:        Linux (All)
Status:    NEW
Severity:  normal
Priority:  medium
Component: DRM/AMDgpu
Assignee:  dri-devel@lists.freedesktop.org
Reporter:  freedesktop.org@nentwig.biz

Hi!

This is a surprisingly long-standing problem with an RX 460: it has been
present since 4.15, all the way up to the 4.18 AMD staging DRM next branch [1].
After resuming from sleep (echo -n mem > /sys/power/state), amdgpu is dead
(always, reliably).
Here's what dmesg has to say about it:

[Sun Jul  8 11:01:17 2018] PM: suspend exit
[Sun Jul  8 11:01:19 2018] [drm:gfx_v8_0_ring_test_ib [amdgpu]] *ERROR* amdgpu: IB test timed out.
[Sun Jul  8 11:01:19 2018] [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* amdgpu: failed testing IB on GFX ring (-110).
[Sun Jul  8 11:01:19 2018] [drm:process_one_work] *ERROR* ib ring test failed (-110).
[Sun Jul  8 11:01:28 2018] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=864, last emitted seq=868
[Sun Jul  8 11:01:28 2018] [drm] GPU recovery disabled.
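
For anyone triaging similar logs, here is a minimal Python sketch that pulls the key numbers out of the lines above. It is only an illustration: the function name `summarize`, the regular expressions, and the small errno table are my own assumptions and not part of amdgpu or any kernel tool. The table hard-codes the Linux value 110 = ETIMEDOUT so the result does not depend on the platform running the script.

```python
import re

# Sample lines copied from the resume log above (mail line wrapping undone).
LOG = """\
[Sun Jul  8 11:01:19 2018] [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* amdgpu: failed testing IB on GFX ring (-110).
[Sun Jul  8 11:01:28 2018] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=864, last emitted seq=868
"""

# Linux errno values seen in the log; hard-coded because errno numbers differ per OS.
LINUX_ERRNO = {110: "ETIMEDOUT"}

TIMEOUT_RE = re.compile(
    r"ring (?P<ring>\S+) timeout, last signaled seq=(?P<signaled>\d+), "
    r"last emitted seq=(?P<emitted>\d+)"
)
ERRNO_RE = re.compile(r"\((-\d+)\)")


def summarize(log: str) -> dict:
    """Return the hung ring, the emitted-minus-signaled gap, and error names."""
    summary = {"errors": []}
    for line in log.splitlines():
        match = TIMEOUT_RE.search(line)
        if match:
            summary["ring"] = match.group("ring")
            # Sequence numbers emitted to the ring but never signaled back.
            summary["pending"] = int(match.group("emitted")) - int(match.group("signaled"))
        for code in ERRNO_RE.findall(line):
            # Kernel functions report errors as negative errno values.
            summary["errors"].append(LINUX_ERRNO.get(-int(code), "unknown"))
    return summary


if __name__ == "__main__":
    print(summarize(LOG))
```

Read this way, the (-110) in the IB test messages is a timeout, and the gfx ring had emitted four sequence numbers beyond the last one that signaled when the job timeout fired.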

From earlier versions:

[   42.802559] PM: suspend exit
[   42.824332] amdgpu 0000:41:00.0: GPU fault detected: 147 0x0bd84802
[   42.824338] amdgpu 0000:41:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x0034F97B
[   42.824341] amdgpu 0000:41:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0C048002
[   42.824345] amdgpu 0000:41:00.0: VM fault (0x02, vmid 6) at page 3471739, read from 'TC0' (0x54433000) (72)
[   52.956306] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=1287, last emitted seq=1289
[   52.956316] [drm] IP block:gfx_v8_0 is hung!
[   52.956362] [drm] GPU recovery disabled.
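
As a cross-check of the older log, the "VM fault" summary line can be reproduced from the two register values printed just before it. The Python sketch below is a hedged illustration: the helper name `decode_fault` is hypothetical, and the bit positions are assumptions chosen because they reproduce the values in the log, not something taken from the driver source in this report.

```python
# Values copied from the older log above.
FAULT_ADDR = 0x0034F97B    # VM_CONTEXT1_PROTECTION_FAULT_ADDR
FAULT_STATUS = 0x0C048002  # VM_CONTEXT1_PROTECTION_FAULT_STATUS
MC_CLIENT = 0x54433000     # client tag printed as 'TC0'


def decode_fault(status: int, addr: int, client: int) -> dict:
    """Split the fault registers into the fields shown on the 'VM fault' line.

    The bit layout is an assumption that matches this particular log.
    """
    return {
        "protections": status & 0xFF,          # printed as 0x02
        "client_id": (status >> 12) & 0xFF,    # printed as (72)
        "access": "write" if (status >> 24) & 0x1 else "read",
        "vmid": (status >> 25) & 0xF,          # printed as vmid 6
        "page": addr,                          # printed in decimal
        # The client tag is four ASCII bytes, NUL-padded on the right.
        "client_name": client.to_bytes(4, "big").rstrip(b"\x00").decode("ascii"),
    }


if __name__ == "__main__":
    print(decode_fault(FAULT_STATUS, FAULT_ADDR, MC_CLIENT))
```

Under these assumptions the fault address 0x0034F97B is the same number as page 3471739, and 0x54433000 is the ASCII string 'TC0', so the summary line carries no information beyond the two registers.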

I've also seen fault 146, but other than that it mostly looks the same.
4.14-lts (with dc=0) works fine.

RX 460, Zenith Extreme, 1950X.

[1] Arch Linux AUR; the versioning is a bit confusing, and it may actually
already be the 4.19 branch. The latest commit is
3838e387fd1eb17bfcf6ff7d443d931adb5cb41b.

