From: bugzilla-daemon@freedesktop.org
Subject: [Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.
Date: Sun, 25 Mar 2018 16:52:20 +0000
To: dri-devel@lists.freedesktop.org

https://bugs.freedesktop.org/show_bug.cgi?id=105733

Comment #2 on bug 105733 from Allan
I tried getting all the binaries available here:
https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/amdgpu

Even though I included the polaris binaries in the kernel, some binaries were
missing (exactly the ones that were required...).

I had seen that before, but since it sometimes worked, I just assumed that
some other binary was being used instead.

Well... I launched Unigine Valley as a test and now the problem is even worse:

[From dmesg]
```
[  517.630633] amdgpu 0000:0e:00.0: GPU fault detected: 147 0x00004802
[  517.630636] amdgpu 0000:0e:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
[  517.630638] amdgpu 0000:0e:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x08048002
[  517.630640] amdgpu 0000:0e:00.0: VM fault (0x02, vmid 4) at page 0, read from 'TC4' (0x54433400) (72)
[  517.630644] amdgpu 0000:0e:00.0: GPU fault detected: 147 0x00004802
[  517.630645] amdgpu 0000:0e:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
[  517.630646] amdgpu 0000:0e:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x08084002
[  517.630648] amdgpu 0000:0e:00.0: VM fault (0x02, vmid 4) at page 0, read from 'TC7' (0x54433700) (132)
```
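
For anyone cross-checking the log above, here is a minimal Python sketch that splits the two `VM_CONTEXT1_PROTECTION_FAULT_STATUS` values into the fields printed on the `VM fault` lines. The bit positions used (protections in bits 0-7, memory client ID in bits 12-19, read/write flag in bit 24, VMID in bits 25-28) are an assumption and are not stated anywhere in this report; they are used here only because they reproduce the numbers in the log. The helper names `decode_fault_status` and `client_name` are hypothetical.

```python
# Assumed bit layout of VM_CONTEXT1_PROTECTION_FAULT_STATUS (not stated in
# this report; chosen because it reproduces the values in the log above).
PROTECTIONS_MASK, PROTECTIONS_SHIFT = 0x000000FF, 0
CLIENT_ID_MASK, CLIENT_ID_SHIFT = 0x000FF000, 12
CLIENT_RW_MASK, CLIENT_RW_SHIFT = 0x01000000, 24
VMID_MASK, VMID_SHIFT = 0x1E000000, 25


def decode_fault_status(status: int) -> dict:
    """Split a fault status word into its (assumed) fields."""
    is_write = (status & CLIENT_RW_MASK) >> CLIENT_RW_SHIFT
    return {
        "protections": (status & PROTECTIONS_MASK) >> PROTECTIONS_SHIFT,
        "client_id": (status & CLIENT_ID_MASK) >> CLIENT_ID_SHIFT,
        "access": "write" if is_write else "read",
        "vmid": (status & VMID_MASK) >> VMID_SHIFT,
    }


def client_name(mc_client: int) -> str:
    """Read the 32-bit client tag as big-endian ASCII, dropping NUL padding."""
    return mc_client.to_bytes(4, "big").rstrip(b"\x00").decode("ascii")


if __name__ == "__main__":
    # (status, client tag) pairs copied from the two faults in the log.
    for status, mc_client in [(0x08048002, 0x54433400), (0x08084002, 0x54433700)]:
        fields = decode_fault_status(status)
        print(f"0x{status:08x}: {fields} from '{client_name(mc_client)}'")
```

Under that assumed layout, both status words decode to protections 0x02, vmid 4 and a read access, with client IDs 72 and 132, and the tags 0x54433400 and 0x54433700 are simply the ASCII strings `TC4` and `TC7`. That matches the two `VM fault` lines, so the log is at least self-consistent.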

The symptoms and reactions are the same as above. I got the output over ssh
because only the cursor was moving and nothing else was working.

So ... did my card die or is it a bug?

By the way ... I also have an RX580, and the problem I first described was
happening with it too. (I had not tried forcing the binaries before.)

