From: bugzilla-daemon@freedesktop.org
To: dri-devel@lists.freedesktop.org
Subject: [Bug 107572] Unrecoverable GPU hang with IP block:gfx_v8_0 is hung
Date: Tue, 14 Aug 2018 23:45:33 +0000
List-Id: dri-devel@lists.freedesktop.org

https://bugs.freedesktop.org/show_bug.cgi?id=107572

Bug ID:    107572
Summary:   Unrecoverable GPU hang with IP block:gfx_v8_0 is hung
Product:   DRI
Version:   unspecified
Hardware:  x86-64 (AMD64)
OS:        Linux (All)
Status:    NEW
Severity:  normal
Priority:  medium
Component: DRM/AMDgpu
Assignee:  dri-devel@lists.freedesktop.org
Reporter:  madcatx@atlas.cz

Hello,

I have been experiencing a worrying number of these hangs ever since I got my
RX 570 a few months ago. I can reproduce the hang quite reliably with some 3D
workloads; for instance, Unigine Superposition run on High quality or
Witcher 3 (through WINE) crashes the GPU within minutes.

Once that happens I can always SSH into the machine and try to get at least
some debugging information. Unfortunately, there does not seem to be much to
go on.

dmesg does not tell me more than this:
[  254.704581] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=103742, last emitted seq=103745
[  254.704586] [drm] IP block:gfx_v8_0 is hung!
[  254.704629] [drm] GPU recovery disabled.
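
To make the timeout line easier to read, here is a minimal Python sketch that pulls out the two sequence numbers and the gap between them. It assumes that "signaled" is the last fence the ring completed and "emitted" is the last one submitted, so the difference is the number of jobs still outstanding when the timeout fired. The regular expression is derived from the single sample line above and may not match other kernel versions; `parse_ring_timeout` is a hypothetical helper name.

```python
import re

# Pattern for the amdgpu ring timeout message as it appears in the log above.
# The field layout is taken from that one sample and may differ between kernels.
TIMEOUT_RE = re.compile(
    r"ring (?P<ring>\S+) timeout, "
    r"last signaled seq=(?P<signaled>\d+), "
    r"last emitted seq=(?P<emitted>\d+)"
)


def parse_ring_timeout(line):
    """Return (ring, signaled, emitted, pending), or None if the line does not match."""
    match = TIMEOUT_RE.search(line)
    if match is None:
        return None
    signaled = int(match.group("signaled"))
    emitted = int(match.group("emitted"))
    # pending = fences emitted but not yet signaled (assumed interpretation)
    return match.group("ring"), signaled, emitted, emitted - signaled


if __name__ == "__main__":
    sample = (
        "[  254.704581] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, "
        "last signaled seq=103742, last emitted seq=103745"
    )
    print(parse_ring_timeout(sample))  # prints ('gfx', 103742, 103745, 3)
```

For the line in this report the gap is 3, which under the assumption above would mean three submissions on the gfx ring had not completed.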

Here are a few things I have tried so far:
- Boot with amdgpu.dc=0
- Boot with amdgpu.vm_update_mode=3
- Force the GPU to max power state
- Disable IOMMU (both by iommu=off and by disabling VT-d in BIOS)
- Boot with amdgpu.gpu_recovery=1 (does not produce any additional info)
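
One way to confirm that the boot options in the list above actually took effect is to read the module parameters back from sysfs. The following Python sketch assumes the amdgpu module exposes them under /sys/module/amdgpu/parameters, which is the usual sysfs location for module parameters. `read_module_params` is a hypothetical helper, and the parameter names are only the ones mentioned in the list.

```python
from pathlib import Path

# Parameter names are the ones mentioned in the list above.
PARAMS = ("dc", "vm_update_mode", "gpu_recovery")


def read_module_params(names=PARAMS, base="/sys/module/amdgpu/parameters"):
    """Map each parameter name to its current value, or None if it cannot be read."""
    values = {}
    for name in names:
        try:
            values[name] = (Path(base) / name).read_text().strip()
        except OSError:
            # Missing file (module not loaded, parameter absent) or no permission.
            values[name] = None
    return values


if __name__ == "__main__":
    for name, value in read_module_params().items():
        shown = value if value is not None else "unavailable"
        print(f"amdgpu.{name} = {shown}")
```

The `base` argument exists only so the function can be pointed at a different directory for testing.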

I grabbed the umr tool to try to get the state of the GPU when it crashes, but
it does not seem to be able to read anything. Running:

umr -R gfx[.]

Leaves me with:

[ERROR]: Could not open ring debugfs file#

I checked that the entries in /sys/kernel/debug/amdgpu that look relevant are
there, but cat'ing them gives me "Operation not permitted". Yes, I am doing it
as root.
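
The manual check described above can be repeated for every entry at once. This Python sketch tries to read one byte from each file in a directory and records the symbolic errno on failure, which shows whether the error is EPERM ("Operation not permitted") or EACCES ("Permission denied"). `probe_readable` is a hypothetical helper. The directory is the one named in this report, and where the amdgpu debugfs entries live on a given system is an assumption.

```python
import errno
import os


def probe_readable(directory):
    """Try to read one byte from every regular file directly inside `directory`.

    Returns a dict mapping file name to "ok" or the symbolic errno name
    (for example "EPERM" for "Operation not permitted").
    """
    results = {}
    for entry in sorted(os.listdir(directory)):
        path = os.path.join(directory, entry)
        if not os.path.isfile(path):
            continue  # skip subdirectories and other non-regular entries
        try:
            with open(path, "rb") as handle:
                handle.read(1)
            results[entry] = "ok"
        except OSError as exc:
            results[entry] = errno.errorcode.get(exc.errno, str(exc.errno))
    return results


if __name__ == "__main__":
    # Path as given in the report; it may differ on other systems.
    target = "/sys/kernel/debug/amdgpu"
    try:
        for name, status in probe_readable(target).items():
            print(f"{name}: {status}")
    except OSError as exc:
        print(f"cannot list {target}: {exc}")
```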

Once this happens the only way out is a hard reboot.

I am running up-to-date Fedora 28, kernel 4.17.2, Mesa 18.0 series, LLVM 6.0.1.

Is there anything else I can do?

Thanks.


You are receiving this mail because:
  • You are the assignee for the bug.
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel