From mboxrd@z Thu Jan 1 00:00:00 1970
From: bugzilla-daemon@freedesktop.org
Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
Date: Tue, 21 Aug 2018 08:41:52 +0000
Message-ID:
References:
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="===============0062121752=="
Return-path:
Received: from culpepper.freedesktop.org (culpepper.freedesktop.org
[IPv6:2610:10:20:722:a800:ff:fe98:4b55])
by gabe.freedesktop.org (Postfix) with ESMTP id 3FD556E293
for ; Tue, 21 Aug 2018 08:41:53 +0000 (UTC)
In-Reply-To:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel"
To: dri-devel@lists.freedesktop.org
List-Id: dri-devel@lists.freedesktop.org
--===============0062121752==
Content-Type: multipart/alternative; boundary="15348409131.6Ebc.15406"
Content-Transfer-Encoding: 7bit
--15348409131.6Ebc.15406
Date: Tue, 21 Aug 2018 08:41:53 +0000
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://bugs.freedesktop.org/
Auto-Submitted: auto-generated
https://bugs.freedesktop.org/show_bug.cgi?id=102322
--- Comment #54 from dwagner ---
(In reply to Andrey Grodzovsky from comment #53)
> Created attachment 141198 [details] [review]
> add_debug_info2.patch
>
> Try this patch instead, I might be missing some prints in the first one.
I can try that this evening.
> In the last log you attached I haven't seen any UMR dumps or GPU fault
> prints in dmesg. The GPU fault has to be in the log to compare the faulty
> address against the debug prints in the patch.
The attached file "xz-compressed output of gpu_debug3.sh" (above) does contain
umr output from the time of the crash (238 seconds after the reboot):
----------------------------------------------
...
mpv/vo-897 [005] .... 235.191542: dma_fence_wait_start: driver=drm_sched timeline=gfx context=162 seqno=87
mpv/vo-897 [005] d... 235.191548: dma_fence_enable_signal: driver=drm_sched timeline=gfx context=162 seqno=87
kworker/0:2-92 [000] .... 238.275988: dma_fence_signaled: driver=amdgpu timeline=sdma1 context=11 seqno=210
kworker/0:2-92 [000] .... 238.276004: dma_fence_signaled: driver=amdgpu timeline=sdma1 context=11 seqno=211
[ 238.180634] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=32624, emitted seq=32626
[ 238.180641] amdgpu 0000:0a:00.0: GPU reset begin!
[ 238.180641] amdgpu 0000:0a:00.0: GPU reset begin!
crash detected!
executing umr -O halt_waves -wa
No active waves!
executing umr -O verbose -R gfx[.]
polaris11.gfx.rptr == 1792
polaris11.gfx.wptr == 1792
polaris11.gfx.drv_wptr == 1792
polaris11.gfx.ring[1761] == 0xffff1000 ...
polaris11.gfx.ring[1762] == 0xffff1000 ...
polaris11.gfx.ring[1763] == 0xffff1000 ...
polaris11.gfx.ring[1764] == 0xffff1000 ...
polaris11.gfx.ring[1765] == 0xffff1000 ...
polaris11.gfx.ring[1766] == 0xffff1000 ...
polaris11.gfx.ring[1767] == 0xffff1000 ...
polaris11.gfx.ring[1768] == 0xffff1000 ...
polaris11.gfx.ring[1769] == 0xffff1000 ...
polaris11.gfx.ring[1770] == 0xffff1000 ...
polaris11.gfx.ring[1771] == 0xffff1000 ...
polaris11.gfx.ring[1772] == 0xffff1000 ...
polaris11.gfx.ring[1773] == 0xffff1000 ...
polaris11.gfx.ring[1774] == 0xffff1000 ...
polaris11.gfx.ring[1775] == 0xffff1000 ...
polaris11.gfx.ring[1776] == 0xffff1000 ...
polaris11.gfx.ring[1777] == 0xffff1000 ...
polaris11.gfx.ring[1778] == 0xffff1000 ...
polaris11.gfx.ring[1779] == 0xffff1000 ...
polaris11.gfx.ring[1780] == 0xffff1000 ...
polaris11.gfx.ring[1781] == 0xffff1000 ...
polaris11.gfx.ring[1782] == 0xffff1000 ...
polaris11.gfx.ring[1783] == 0xffff1000 ...
polaris11.gfx.ring[1784] == 0xffff1000 ...
polaris11.gfx.ring[1785] == 0xffff1000 ...
polaris11.gfx.ring[1786] == 0xffff1000 ...
polaris11.gfx.ring[1787] == 0xffff1000 ...
polaris11.gfx.ring[1788] == 0xffff1000 ...
polaris11.gfx.ring[1789] == 0xffff1000 ...
polaris11.gfx.ring[1790] == 0xffff1000 ...
polaris11.gfx.ring[1791] == 0xffff1000 ...
polaris11.gfx.ring[1792] == 0xc0032200 rwD
trying to get ADR from dmesg output for 'umr -O verbose -vm ...'
trying to get VMID from dmesg output for 'umr -O verbose -vm ...'
done after crash, flashing NUMLOCK LED.
amdgpu_cs:0-799 [001] .... 286.852838: amdgpu_bo_list_set: list=0000000099c16b5c, bo=000000001771c26f, bo_size=131072
amdgpu_cs:0-799 [001] .... 286.852846: amdgpu_bo_list_set: list=0000000099c16b5c, bo=0000000046bfd439, bo_size=131072
...
----------------------------------------------
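To make the numbers in this excerpt easier to check, here is a small Python sketch (my own, not part of gpu_debug3.sh or umr; the function names are made up for illustration). It computes the gap between the emitted and signaled fence sequence numbers on the timed-out ring (32626 - 32624 = 2 fences still pending on sdma0) and splits ring dwords into PM4 type-3 header fields. The field layout assumed here (packet type in bits 31:30, count in bits 29:16, opcode in bits 15:8) is my reading of the PACKET3() macro in the amdgpu headers; which command each opcode number stands for should be checked against those headers rather than taken from this sketch.

```python
import re

# The sdma0 timeout line from the excerpt above, re-joined onto one line.
TIMEOUT_LINE = ("[ 238.180634] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* "
                "ring sdma0 timeout, signaled seq=32624, emitted seq=32626")

TIMEOUT_RE = re.compile(
    r"ring (?P<ring>\S+) timeout, signaled seq=(?P<signaled>\d+), "
    r"emitted seq=(?P<emitted>\d+)")


def outstanding_fences(line):
    """Return (ring name, fences emitted but not yet signaled)."""
    m = TIMEOUT_RE.search(line)
    if m is None:
        raise ValueError("not an amdgpu_job_timedout line")
    return m["ring"], int(m["emitted"]) - int(m["signaled"])


def decode_pm4_header(dword):
    """Split a ring dword into (type, count, opcode).

    Assumed type-3 layout: bits 31:30 packet type,
    bits 29:16 count, bits 15:8 opcode.
    """
    pkt_type = (dword >> 30) & 0x3
    count = (dword >> 16) & 0x3FFF
    opcode = (dword >> 8) & 0xFF
    return pkt_type, count, opcode


if __name__ == "__main__":
    print(outstanding_fences(TIMEOUT_LINE))  # ('sdma0', 2)
    # The filler value seen at ring[1761..1791] and the value at ring[1792].
    for dw in (0xFFFF1000, 0xC0032200):
        t, c, op = decode_pm4_header(dw)
        print(f"0x{dw:08x}: type={t} count=0x{c:x} opcode=0x{op:02x}")
```

Both dwords come out as type-3 headers: count 0x3fff with opcode 0x10 for the repeated filler entries, and count 0x3 with opcode 0x22 for the entry at ring[1792].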
Admittedly, there were no "VM_CONTEXT1_PROTECTION_FAULT_ADDR" error messages
this time. Sometimes they are emitted, sometimes not.
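Since the fault report only shows up some of the time, the "trying to get ADR from dmesg output" step has to cope with it being absent. A minimal Python sketch of that step, assuming the fault report appears in dmesg as a line containing "VM_CONTEXT1_PROTECTION_FAULT_ADDR" followed by whitespace and a hex value (the regex, the function names and that format are my assumptions, not taken from gpu_debug3.sh):

```python
import re
import subprocess

# Assumed dmesg format of the VM fault report: the register name,
# whitespace, then a hex value such as "0x0001abcd".
FAULT_ADDR_RE = re.compile(
    r"VM_CONTEXT1_PROTECTION_FAULT_ADDR\s+0x([0-9A-Fa-f]+)")


def fault_addresses(dmesg_text):
    """Return all faulting addresses found in dmesg_text (may be empty)."""
    return [int(h, 16) for h in FAULT_ADDR_RE.findall(dmesg_text)]


def read_dmesg():
    """Hypothetical helper: capture the kernel log (may need root)."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True,
                            check=True)
    return result.stdout


if __name__ == "__main__":
    # Lines from this crash: a timeout and a reset, but no fault report.
    crash_log = (
        "[ 238.180634] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 "
        "timeout, signaled seq=32624, emitted seq=32626\n"
        "[ 238.180641] amdgpu 0000:0a:00.0: GPU reset begin!\n"
    )
    addrs = fault_addresses(crash_log)
    if addrs:
        for a in addrs:
            print(f"fault address 0x{a:08x}")
    else:
        print("no VM_CONTEXT1_PROTECTION_FAULT_ADDR in log, skipping umr -vm")
```

With an empty result the script can report that no fault was logged and skip the 'umr -O verbose -vm ...' step, instead of calling it without an address.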
-- 
You are receiving this mail because:
You are the assignee for the bug.
--15348409131.6Ebc.15406--
--===============0062121752==--