dri-devel.lists.freedesktop.org archive mirror
From: bugzilla-daemon@freedesktop.org
To: dri-devel@lists.freedesktop.org
Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
Date: Tue, 21 Aug 2018 14:43:24 +0000
Message-ID: <bug-102322-502-lJbfjD64rn@http.bugs.freedesktop.org/>
In-Reply-To: <bug-102322-502@http.bugs.freedesktop.org/>

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #55 from Andrey Grodzovsky <andrey.grodzovsky@amd.com> ---
(In reply to dwagner from comment #54)
> (In reply to Andrey Grodzovsky from comment #53)
> > Created attachment 141198 [details] [review]
> > add_debug_info2.patch
> > 
> > Try this patch instead, I might be missing some prints in the first one.
> 
> Can try that this evening.
> 
> > In the last log you attached I haven't seen any UMR dumps or GPU fault
> > prints in dmesg. The GPU fault has to be in the log to compare the faulty
> > address against the debug prints in the patch.
> 
> In the file attached above, "xz-compressed output of gpu_debug3.sh", there is
> umr output at the time of the crash (238 seconds after the reboot):
> 
> ----------------------------------------------
> ...
>           mpv/vo-897   [005] ....   235.191542: dma_fence_wait_start:
> driver=drm_sched timeline=gfx context=162 seqno=87
>           mpv/vo-897   [005] d...   235.191548: dma_fence_enable_signal:
> driver=drm_sched timeline=gfx context=162 seqno=87
>      kworker/0:2-92    [000] ....   238.275988: dma_fence_signaled:
> driver=amdgpu timeline=sdma1 context=11 seqno=210
>      kworker/0:2-92    [000] ....   238.276004: dma_fence_signaled:
> driver=amdgpu timeline=sdma1 context=11 seqno=211
> [  238.180634] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0
> timeout, signaled seq=32624, emitted seq=32626
> [  238.180641] amdgpu 0000:0a:00.0: GPU reset begin!
> [  238.180641] amdgpu 0000:0a:00.0: GPU reset begin!
> 
> crash detected!
> 
> executing umr -O halt_waves -wa
> No active waves!

Did you use the amdgpu.vm_fault_stop=2 parameter? If a fault had happened, that
should have frozen the GPU's compute units, and the above command would then
have produced a lot of wave info.
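
One way to confirm whether the parameter was actually in effect is sketched
below. This is only a minimal sketch under stated assumptions: the function
names are hypothetical, and the sysfs path is assumed to be where the loaded
amdgpu module exposes the value on the test machine. The parameter could also
have been set through modprobe configuration, in which case it would not show
up on the kernel command line.

```python
import re
from pathlib import Path
from typing import Optional

# Assumption: the loaded amdgpu module exposes the parameter at this path.
SYSFS_PARAM = Path("/sys/module/amdgpu/parameters/vm_fault_stop")


def vm_fault_stop_from_cmdline(cmdline: str) -> Optional[int]:
    """Return the amdgpu.vm_fault_stop value in a kernel command line, or None."""
    match = re.search(r"(?:^|\s)amdgpu\.vm_fault_stop=(\d+)(?=\s|$)", cmdline)
    return int(match.group(1)) if match else None


def vm_fault_stop_from_sysfs(path: Path = SYSFS_PARAM) -> Optional[int]:
    """Return the value reported by the loaded module, or None if unreadable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    proc_cmdline = Path("/proc/cmdline")
    cmdline = proc_cmdline.read_text() if proc_cmdline.exists() else ""
    print("kernel command line:", vm_fault_stop_from_cmdline(cmdline))
    print("loaded module:      ", vm_fault_stop_from_sysfs())
```

If both lookups return None, the parameter was most likely not set for that
boot, which would be consistent with "No active waves!" in the umr output.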

> 
> 
> executing umr -O verbose -R gfx[.]
> 
> polaris11.gfx.rptr == 1792
> polaris11.gfx.wptr == 1792
> polaris11.gfx.drv_wptr == 1792
> polaris11.gfx.ring[1761] == 0xffff1000    ... 
> polaris11.gfx.ring[1762] == 0xffff1000    ... 
> polaris11.gfx.ring[1763] == 0xffff1000    ... 
> polaris11.gfx.ring[1764] == 0xffff1000    ... 
> polaris11.gfx.ring[1765] == 0xffff1000    ... 
> polaris11.gfx.ring[1766] == 0xffff1000    ... 
> polaris11.gfx.ring[1767] == 0xffff1000    ... 
> polaris11.gfx.ring[1768] == 0xffff1000    ... 
> polaris11.gfx.ring[1769] == 0xffff1000    ... 
> polaris11.gfx.ring[1770] == 0xffff1000    ... 
> polaris11.gfx.ring[1771] == 0xffff1000    ... 
> polaris11.gfx.ring[1772] == 0xffff1000    ... 
> polaris11.gfx.ring[1773] == 0xffff1000    ... 
> polaris11.gfx.ring[1774] == 0xffff1000    ... 
> polaris11.gfx.ring[1775] == 0xffff1000    ... 
> polaris11.gfx.ring[1776] == 0xffff1000    ... 
> polaris11.gfx.ring[1777] == 0xffff1000    ... 
> polaris11.gfx.ring[1778] == 0xffff1000    ... 
> polaris11.gfx.ring[1779] == 0xffff1000    ... 
> polaris11.gfx.ring[1780] == 0xffff1000    ... 
> polaris11.gfx.ring[1781] == 0xffff1000    ... 
> polaris11.gfx.ring[1782] == 0xffff1000    ... 
> polaris11.gfx.ring[1783] == 0xffff1000    ... 
> polaris11.gfx.ring[1784] == 0xffff1000    ... 
> polaris11.gfx.ring[1785] == 0xffff1000    ... 
> polaris11.gfx.ring[1786] == 0xffff1000    ... 
> polaris11.gfx.ring[1787] == 0xffff1000    ... 
> polaris11.gfx.ring[1788] == 0xffff1000    ... 
> polaris11.gfx.ring[1789] == 0xffff1000    ... 
> polaris11.gfx.ring[1790] == 0xffff1000    ... 
> polaris11.gfx.ring[1791] == 0xffff1000    ... 
> polaris11.gfx.ring[1792] == 0xc0032200    rwD 
> 
> trying to get ADR from dmesg output for 'umr -O verbose -vm ...'
> trying to get VMID from dmesg output for 'umr -O verbose -vm ...'
> 
> done after crash, flashing NUMLOCK LED.
>      amdgpu_cs:0-799   [001] ....   286.852838: amdgpu_bo_list_set:
> list=0000000099c16b5c, bo=000000001771c26f, bo_size=131072
>      amdgpu_cs:0-799   [001] ....   286.852846: amdgpu_bo_list_set:
> list=0000000099c16b5c, bo=0000000046bfd439, bo_size=131072
> ...
> ----------------------------------------------
> 
> But sure, there were no "VM_CONTEXT1_PROTECTION_FAULT_ADDR" error messages
> this time. Sometimes such are emitted, sometimes not.
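
For reference, the sdma0 timeout line and the gfx ring pointers quoted above
can be read mechanically. The following is a minimal sketch, assuming the
dmesg and umr text formats exactly as they appear in this message (with the
mail line wrapping undone); the function names are hypothetical and not part
of umr or the kernel.

```python
import re
from typing import Dict, Optional, Tuple

# Matches the amdgpu_job_timedout message format seen in the quoted dmesg.
TIMEOUT_RE = re.compile(
    r"ring (?P<ring>\S+)\s+timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)
# Matches the ring pointer lines of the quoted "umr -R gfx[.]" output.
POINTER_RE = re.compile(r"\.(?P<name>rptr|wptr|drv_wptr) == (?P<value>\d+)")


def parse_timeout(log: str) -> Optional[Tuple[str, int]]:
    """Return (ring name, emitted minus signaled fence count), or None."""
    match = TIMEOUT_RE.search(log)
    if match is None:
        return None
    pending = int(match.group("emitted")) - int(match.group("signaled"))
    return match.group("ring"), pending


def ring_pointers(umr_output: str) -> Dict[str, int]:
    """Collect the rptr, wptr and drv_wptr values from the umr ring dump."""
    return {
        m.group("name"): int(m.group("value"))
        for m in POINTER_RE.finditer(umr_output)
    }


def ring_is_drained(umr_output: str) -> bool:
    """True when all three pointers are present and equal."""
    pointers = ring_pointers(umr_output)
    return len(pointers) == 3 and len(set(pointers.values())) == 1
```

Applied to the log above, parse_timeout reports two fences emitted but not
yet signaled on sdma0, and ring_is_drained is true for the gfx ring, since
rptr, wptr and drv_wptr are all 1792. As far as I can tell, that would mean
the gfx ring had nothing left to fetch when the dump was taken.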
