From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Tue, 21 Aug 2018 21:16:52 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============1052334486==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [IPv6:2610:10:20:722:a800:ff:fe98:4b55]) by gabe.freedesktop.org (Postfix) with ESMTP id 7469789FE6 for ; Tue, 21 Aug 2018 21:16:52 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============1052334486== Content-Type: multipart/alternative; boundary="15348862121.DdADd.15550" Content-Transfer-Encoding: 7bit --15348862121.DdADd.15550 Date: Tue, 21 Aug 2018 21:16:52 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #56 from dwagner --- (In reply to Andrey Grodzovsky from comment #55) > > In above attached file "xz-compressed output of gpu_debug3.sh" there is= umr > > output at the time of the crash (238 seconds after the reboot): > >=20 > > ---------------------------------------------- > > ... > > mpv/vo-897 [005] .... 235.191542: dma_fence_wait_start: > > driver=3Ddrm_sched timeline=3Dgfx context=3D162 seqno=3D87 > > mpv/vo-897 [005] d... 235.191548: dma_fence_enable_signal: > > driver=3Ddrm_sched timeline=3Dgfx context=3D162 seqno=3D87 > > kworker/0:2-92 [000] .... 238.275988: dma_fence_signaled: > > driver=3Damdgpu timeline=3Dsdma1 context=3D11 seqno=3D210 > > kworker/0:2-92 [000] .... 
238.276004: dma_fence_signaled: > > driver=3Damdgpu timeline=3Dsdma1 context=3D11 seqno=3D211 > > [ 238.180634] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 > > timeout, signaled seq=3D32624, emitted seq=3D32626 > > [ 238.180641] amdgpu 0000:0a:00.0: GPU reset begin! > > [ 238.180641] amdgpu 0000:0a:00.0: GPU reset begin! > >=20 > > crash detected! > >=20 > > executing umr -O halt_waves -wa > > No active waves! >=20 > Did you use amdgpu.vm_fault_stop=3D2 parameter ? In case a fault happened= that > should have froze GPUs compute units and hence the above command would > produce a lot of wave info. Yes I did, as can be seen from the kernel command line at the very beginnin= g of the file I attached: [ 0.000000] Command line: BOOT_IMAGE=3D/vmlinuz-linux_amd root=3DUUID=3Db5d56e15-18f3-4783-af84-bbff3bbff3ef rw cryptdevice=3D/dev/nvme0n1p2:root:allow-discards libata.force=3D1.5 video= =3DDP-1:d video=3DDVI-D-1:d video=3DHDMI-A-1:1024x768 amdgpu.dc=3D1 amdgpu.vm_update_= mode=3D0 amdgpu.dpm=3D-1 amdgpu.ppfeaturemask=3D0xffffffff amdgpu.vm_fault_stop=3D2 amdgpu.vm_debug=3D1 Could the "amdgpu 0000:0a:00.0: GPU reset begin!" message indicate a proced= ure that discards whatever has been in thoses "waves" before? If yes, could amdgpu.gpu_recovery=3D0 prevent that from happening? --=20 You are receiving this mail because: You are the assignee for the bug.= --15348862121.DdADd.15550 Date: Tue, 21 Aug 2018 21:16:52 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment # 56 on bug 102322 from dwagner
(In reply to Andrey Grodzovsky from comment #55)
> > In above attached file "xz-compressed output of gpu_debug3.sh" there is umr
> > output at the time of the crash (238 seconds after the reboot):
> >
> > ----------------------------------------------
> > ...
> >           mpv/vo-897   [005] ....   235.191542: dma_fence_wait_start:
> > driver=drm_sched timeline=gfx context=162 seqno=87
> >           mpv/vo-897   [005] d...   235.191548: dma_fence_enable_signal:
> > driver=drm_sched timeline=gfx context=162 seqno=87
> >      kworker/0:2-92    [000] ....   238.275988: dma_fence_signaled:
> > driver=amdgpu timeline=sdma1 context=11 seqno=210
> >      kworker/0:2-92    [000] ....   238.276004: dma_fence_signaled:
> > driver=amdgpu timeline=sdma1 context=11 seqno=211
> > [  238.180634] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0
> > timeout, signaled seq=32624, emitted seq=32626
> > [  238.180641] amdgpu 0000:0a:00.0: GPU reset begin!
> > [  238.180641] amdgpu 0000:0a:00.0: GPU reset begin!
> >
> > crash detected!
> >
> > executing umr -O halt_waves -wa
> > No active waves!
>
> Did you use amdgpu.vm_fault_stop=2 parameter ? In case a fault happened that
> should have froze GPUs compute units and hence the above command would
> produce a lot of wave info.

Yes I did, as can be seen from the kernel command line at the very beginning of
the file I attached:
[    0.000000] Command line: BOOT_IMAGE=/vmlinuz-linux_amd
root=UUID=b5d56e15-18f3-4783-af84-bbff3bbff3ef rw
cryptdevice=/dev/nvme0n1p2:root:allow-discards libata.force=1.5 video=DP-1:d
video=DVI-D-1:d video=HDMI-A-1:1024x768 amdgpu.dc=1 amdgpu.vm_update_mode=0
amdgpu.dpm=-1 amdgpu.ppfeaturemask=0xffffffff amdgpu.vm_fault_stop=2
amdgpu.vm_debug=1

Could the "amdgpu 0000:0a:00.0: GPU reset begin!" message indicate a procedure
that discards whatever has been in those "waves" before? If yes, could
amdgpu.gpu_recovery=0 prevent that from happening?


= --15348862121.DdADd.15550-- --===============1052334486== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============1052334486==--