From: bugzilla-daemon@freedesktop.org
To: dri-devel@lists.freedesktop.org
Subject: [Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.
Date: Sun, 04 Nov 2018 01:19:18 +0000

https://bugs.freedesktop.org/show_bug.cgi?id=105733

Comment #40 on bug 105733 from John W.
Is there any resolution or work being done on this issue?
I've tried the frequency hack, and it only slightly delayed the issue.
I also tried the latest AMD staging kernel with the latest firmware and XF86
driver, and the same issue still happened, though somewhat less often.
Reading my journalctl logs, I found that when it occurs the driver sometimes
attempts to recover, but in the process it loses VRAM and the screen freezes,
covered in odd colors.
At least when this happens the machine is otherwise functional, and I can
change TTYs and kill X11.
I'm using an RX 580, and I've added the relevant logs of the attempted recovery.

Nov 02 15:31:26 Towering-DG kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout, signaled seq=59193, emitted seq=59194
Nov 02 15:31:27 Towering-DG kernel: amdgpu 0000:01:00.0: GPU reset begin!
Nov 02 15:31:27 Towering-DG kernel: amdgpu 0000:01:00.0: GPU pci config reset
Nov 02 15:31:27 Towering-DG kernel: amdgpu 0000:01:00.0: GPU reset succeeded, trying to resume
Nov 02 15:31:27 Towering-DG kernel: [drm] PCIE GART of 256M enabled (table at 0x000000F400300000).
Nov 02 15:31:27 Towering-DG kernel: [drm:amdgpu_device_gpu_recover [amdgpu]] *ERROR* VRAM is lost!
Nov 02 15:31:27 Towering-DG kernel: amdgpu 0000:01:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring comp_1.2.1 test failed (-110)

(Note: the ring that times out is usually SDMA0 rather than SDMA1, and occasionally GFX.)
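To tally which ring times out across several hangs, a minimal Python sketch like the one below could scan saved kernel logs. It assumes the timeout lines keep the exact format shown in the log above. The helper names `parse_timeouts` and `summarize` are hypothetical and are not part of any amdgpu tooling.

```python
import re
from collections import Counter

# Assumes the amdgpu job-timeout format seen in the log above, e.g.
# "[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout, signaled seq=59193, emitted seq=59194"
TIMEOUT_RE = re.compile(
    r"amdgpu_job_timedout.*?ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)


def parse_timeouts(lines):
    """Yield (ring, signaled_seq, emitted_seq) for every ring-timeout line."""
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            yield m.group("ring"), int(m.group("signaled")), int(m.group("emitted"))


def summarize(lines):
    """Return (timeouts per ring as a Counter, whether 'VRAM is lost' appeared)."""
    lines = list(lines)  # materialize so a single-pass file object can be scanned twice
    per_ring = Counter(ring for ring, _, _ in parse_timeouts(lines))
    vram_lost = any("VRAM is lost" in line for line in lines)
    return per_ring, vram_lost


if __name__ == "__main__":
    # Sample input: two lines copied from the log in this comment.
    sample = [
        "Nov 02 15:31:26 Towering-DG kernel: [drm:amdgpu_job_timedout [amdgpu]] "
        "*ERROR* ring sdma1 timeout, signaled seq=59193, emitted seq=59194",
        "Nov 02 15:31:27 Towering-DG kernel: [drm:amdgpu_device_gpu_recover "
        "[amdgpu]] *ERROR* VRAM is lost!",
    ]
    per_ring, vram_lost = summarize(sample)
    for ring, count in per_ring.most_common():
        print(f"{ring}: {count} timeout(s)")
    print("VRAM lost during recovery:", vram_lost)
```

To use it on a real hang, save the previous boot's kernel log with `journalctl -k -b -1 > kernel.log`, then call `summarize(open("kernel.log"))`.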


You are receiving this mail because:
  • You are the assignee for the bug.