From mboxrd@z Thu Jan 1 00:00:00 1970
From: bugzilla-daemon@freedesktop.org
Subject: [Bug 107152] GPU fault detected: 146 / VM_CONTEXT1_PROTECTION_FAULT / ring gfx timeout
Date: Sat, 07 Jul 2018 20:03:25 +0000
To: dri-devel@lists.freedesktop.org

https://bugs.freedesktop.org/show_bug.cgi?id=107152

Bug ID: 107152
Summary: GPU fault detected: 146 / VM_CONTEXT1_PROTECTION_FAULT / ring gfx timeout
Product: DRI
Version: DRI git
Hardware: x86-64 (AMD64)
OS: Linux (All)
Status: NEW
Severity: normal
Priority: medium
Component: DRM/AMDgpu
Assignee: dri-devel@lists.freedesktop.org
Reporter: jb5sgc1n.nya@20mm.eu

While I was just browsing with Firefox, amdgpu and then the whole system
crashed on me, with the following messages emitted to the journal:

Jul 07 01:08:20 ryzen kernel: amdgpu 0000:0a:00.0: GPU fault detected: 146 0x0c80440c
Jul 07 01:08:20 ryzen kernel: amdgpu 0000:0a:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00100190
Jul 07 01:08:20 ryzen kernel: amdgpu 0000:0a:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0E04400C
Jul 07 01:08:20 ryzen kernel: amdgpu 0000:0a:00.0: VM fault (0x0c, vmid 7, pasid 32768) at page 1048976, read from 'TC1' (0x54433100) (68)
Jul 07 01:08:25 ryzen kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=75244, last emitted seq=75245
Jul 07 01:08:25 ryzen kernel: amdgpu 0000:0a:00.0: GPU reset begin!
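
The hex word printed after the client name in the "VM fault" line appears to be nothing more than the name's ASCII bytes packed into 32 bits (0x54 = 'T', 0x43 = 'C', 0x31 = '1', followed by a zero pad byte). A minimal Python sketch to check that reading; the helper name `decode_client_tag` is made up for illustration and is not part of any driver or tool:

```python
def decode_client_tag(word: int) -> str:
    """Unpack a 32-bit word into up to four ASCII characters.

    The most significant byte is taken as the first character and
    trailing NUL padding is dropped.
    """
    raw = word.to_bytes(4, byteorder="big")
    return raw.rstrip(b"\x00").decode("ascii")


if __name__ == "__main__":
    # Value copied from the VM fault line in the journal excerpt.
    print(decode_client_tag(0x54433100))  # TC1
```

So the hex word carries no information beyond the quoted name; the trailing number in parentheses (68 here) is the separate numeric client id.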

Kernel version used: amd-staging-drm-next current as of commit
bb2e406ba66c2573b68e609e148cab57b1447095, with patch
https://bugs.freedesktop.org/attachment.cgi?id=140418 applied on top of it.

Mesa version: 18.1.3-1 (current from Arch Linux)

(This report was separated from
https://bugs.freedesktop.org/show_bug.cgi?id=102322)


You are receiving this mail because:
  • You are the assignee for the bug.
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

From mboxrd@z Thu Jan 1 00:00:00 1970
From: bugzilla-daemon@freedesktop.org
Subject: [Bug 107152] GPU fault detected: 146 / VM_CONTEXT1_PROTECTION_FAULT / ring gfx timeout
Date: Sat, 07 Jul 2018 20:04:35 +0000
To: dri-devel@lists.freedesktop.org

Comment # 1 on bug 107152 from dwagner
Created attachment 140497 --> https://bugs.freedesktop.org/attachment.cgi?id=140497&action=edit
dmesg from booting until the gpu fault a few minutes later


From mboxrd@z Thu Jan 1 00:00:00 1970
From: bugzilla-daemon@freedesktop.org
Subject: [Bug 107152] GPU fault detected: 146 / VM_CONTEXT1_PROTECTION_FAULT / ring gfx timeout
Date: Sat, 07 Jul 2018 20:05:22 +0000
To: dri-devel@lists.freedesktop.org

Comment # 2 on bug 107152 from dwagner
Created attachment 140498 --> https://bugs.freedesktop.org/attachment.cgi?id=140498&action=edit
Xorg.log from the session ending in the "gpu fault"


From mboxrd@z Thu Jan 1 00:00:00 1970
From: bugzilla-daemon@freedesktop.org
Subject: [Bug 107152] GPU fault detected: 146 / VM_CONTEXT1_PROTECTION_FAULT / ring gfx timeout
Date: Mon, 30 Jul 2018 21:12:38 +0000
To: dri-devel@lists.freedesktop.org

Comment # 3 on bug 107152 from krzysiek@cybulski.info
Hi,

I get this GPU hang once or twice a day, mostly when using PHPStorm (a
Java-based PHP editor).

The system is KDE Neon (Ubuntu 18.04 + latest KDE); I use the padoka PPA
(currently 1:18.2~git180730133900.0ea243d~b~padoka0).

GPU POLARIS11 0x1002:0x67EF 0x1458:0x230A 0xE5
[ 8004.993577] amdgpu 0000:01:00.0: GPU fault detected: 146 0x0000620c
[ 8004.993584] amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
[ 8004.993587] amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0806200C
[ 8004.993591] amdgpu 0000:01:00.0: VM fault (0x0c, vmid 4, pasid 32771) at page 0, read from 'CBC0' (0x43424330) (98)
[ 8263.966497] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=194416, last emitted seq=194418
[ 8263.966510] [drm] IP block:gfx_v8_0 is hung!
[ 8263.966562] amdgpu 0000:01:00.0: GPU reset begin!
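
The two sequence numbers in the timeout line can be compared directly. Assuming "emitted" is the last fence submitted to the ring and "signaled" is the last one the GPU completed (an interpretation, not something stated in this thread), their difference is the number of jobs still outstanding when the timeout fired. A small Python sketch; the regular expression and the function name are illustrative only:

```python
import re
from typing import Optional, Tuple

# Matches both spellings seen in this thread:
#   "last signaled seq=N, last emitted seq=M" and "signaled seq=N, emitted seq=M"
TIMEOUT_RE = re.compile(
    r"ring (?P<ring>\w+) timeout, (?:last )?signaled seq=(?P<signaled>\d+), "
    r"(?:last )?emitted seq=(?P<emitted>\d+)"
)


def pending_jobs(line: str) -> Optional[Tuple[str, int]]:
    """Return (ring name, emitted - signaled), or None if the line does not match."""
    match = TIMEOUT_RE.search(line)
    if match is None:
        return None
    pending = int(match.group("emitted")) - int(match.group("signaled"))
    return match.group("ring"), pending


if __name__ == "__main__":
    line = ("[ 8263.966497] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx "
            "timeout, last signaled seq=194416, last emitted seq=194418")
    print(pending_jobs(line))  # ('gfx', 2)
```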

If you need more info, please let me know.
Krzysiek


From mboxrd@z Thu Jan 1 00:00:00 1970
From: bugzilla-daemon@freedesktop.org
Subject: [Bug 107152] GPU fault detected: 146 / VM_CONTEXT1_PROTECTION_FAULT / ring gfx timeout
Date: Tue, 31 Jul 2018 21:41:14 +0000
To: dri-devel@lists.freedesktop.org

Comment # 4 on bug 107152 from dwagner
Saw this kind of crash (still with the latest amd-staging-drm-next kernel)
three times in a row today, just by playing a specific video with mpv
immediately after rebooting and starting X11. Each time, the crash came
before the 10-minute video ended.
The video (which just shows a static cover image) can be obtained via:

youtube-dl -f 248+251 'https://www.youtube.com/watch?v=kYKE78Pcjog'

The log messages were just like those reported above. I guess the additional
"hw_done or flip_done timed out" after the "GPU reset begin!" is not really
relevant:

Jul 31 22:20:21 ryzen kernel: amdgpu 0000:0a:00.0: GPU fault detected: 147 0x0f580402 for process Xorg pid 793 thread amdgpu_cs:0 pid 794
Jul 31 22:20:21 ryzen kernel: amdgpu 0000:0a:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x0010C3EB
Jul 31 22:20:21 ryzen kernel: amdgpu 0000:0a:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x02004002
Jul 31 22:20:21 ryzen kernel: amdgpu 0000:0a:00.0: VM fault (0x02, vmid 1, pasid 32768) at page 1098731, read from 'TC3' (0x54433300) (4)
Jul 31 22:20:21 ryzen kernel: amdgpu 0000:0a:00.0: GPU fault detected: 146 0x0c984424 for process Xorg pid 793 thread amdgpu_cs:0 pid 794
Jul 31 22:20:21 ryzen kernel: amdgpu 0000:0a:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00100193
Jul 31 22:20:21 ryzen kernel: amdgpu 0000:0a:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x04044024
Jul 31 22:20:21 ryzen kernel: amdgpu 0000:0a:00.0: VM fault (0x24, vmid 2, pasid 32768) at page 1048979, read from 'TC1' (0x54433100) (68)
Jul 31 22:20:25 ryzen kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=26570, emitted seq=26573
Jul 31 22:20:25 ryzen kernel: amdgpu 0000:0a:00.0: GPU reset begin!
Jul 31 22:20:35 ryzen kernel: [drm:amdgpu_dm_atomic_check [amdgpu]] *ERROR* [CRTC:44:crtc-0] hw_done or flip_done timed out
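
The values in each "VM fault (...)" line can be recovered from the raw VM_CONTEXT1_PROTECTION_FAULT_STATUS word printed just before it. The bit positions in this Python sketch are an assumption: they were inferred by matching the status words in this thread against the decoded lines, not taken from the register headers, so they should be checked against the driver source before being relied on. The type and function names are illustrative.

```python
from typing import NamedTuple


class FaultStatus(NamedTuple):
    protections: int  # first value in "VM fault (0x.., ...)"
    client_id: int    # trailing number in parentheses, e.g. (68)
    is_write: bool    # False corresponds to "read from"
    vmid: int


def decode_fault_status(status: int) -> FaultStatus:
    """Split a fault status word into fields (assumed bit layout, see above)."""
    return FaultStatus(
        protections=status & 0xFF,          # assumed bits 0-7
        client_id=(status >> 12) & 0x1FF,   # assumed bits 12-20
        is_write=bool((status >> 24) & 1),  # assumed bit 24
        vmid=(status >> 25) & 0xF,          # assumed bits 25-28
    )


if __name__ == "__main__":
    # Second fault from the log above: VM fault (0x24, vmid 2, ...) ... (68)
    print(decode_fault_status(0x04044024))
```

With this layout, all four status words posted so far (0x0E04400C, 0x0806200C, 0x02004002, 0x04044024) decode to the protections, vmid and client id that the driver printed for them.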


From mboxrd@z Thu Jan 1 00:00:00 1970
From: bugzilla-daemon@freedesktop.org
Subject: [Bug 107152] GPU fault detected: 146 / VM_CONTEXT1_PROTECTION_FAULT / ring gfx timeout
Date: Tue, 31 Jul 2018 21:45:29 +0000
To: dri-devel@lists.freedesktop.org

Comment # 5 on bug 107152 from dwagner
Just in case somebody aims at reproducing this with mpv, this is the content
of the .config/mpv/mpv.conf file in use:

audio-device='alsa/iec958:CARD=Generic,DEV=0'
audio-delay=0.2
fs=no
vo=gpu
gpu-api=auto
profile=gpu-hq
fbo-format=rgba16f
hwdec=no
video-align-y=-1
hidpi-window-scale=no
target-prim=bt.709
tone-mapping=hable
cache=65536
cache-initial=2048
cache-secs=20

So the amdgpu HDMI audio output was not used, and neither was video decoding
hardware acceleration.


From mboxrd@z Thu Jan 1 00:00:00 1970
From: bugzilla-daemon@freedesktop.org
Subject: [Bug 107152] GPU fault detected: 146 / VM_CONTEXT1_PROTECTION_FAULT / ring gfx timeout
Date: Thu, 02 Aug 2018 21:25:55 +0000
To: dri-devel@lists.freedesktop.org

Comment # 6 on bug 107152 from Andrey Grodzovsky
dwagner, how quickly is this reproducible?
A wild guess: what if you boot the kernel with the IOMMU disabled? Add
iommu=off to the grub command line.
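
After rebooting, it is worth confirming that the parameter actually reached the kernel. A minimal Python sketch that checks a kernel command line string for a given token; `/proc/cmdline` is the usual place to read it on Linux, and the function name is illustrative:

```python
from pathlib import Path


def has_kernel_param(cmdline: str, param: str) -> bool:
    """Return True if `param` appears as a whole whitespace-separated token."""
    return param in cmdline.split()


if __name__ == "__main__":
    proc_cmdline = Path("/proc/cmdline")
    if proc_cmdline.exists():
        active = has_kernel_param(proc_cmdline.read_text(), "iommu=off")
        print("iommu=off active:", active)
    else:
        print("/proc/cmdline not available on this system")
```

Matching whole tokens avoids a false positive from a longer parameter that merely contains the same text, such as a hypothetical `foo.iommu=off`.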


From mboxrd@z Thu Jan 1 00:00:00 1970
From: bugzilla-daemon@freedesktop.org
Subject: [Bug 107152] GPU fault detected: 146 / VM_CONTEXT1_PROTECTION_FAULT / ring gfx timeout
Date: Thu, 02 Aug 2018 21:54:20 +0000
To: dri-devel@lists.freedesktop.org

Comment # 7 on bug 107152 from dwagner
(In reply to Andrey Grodzovsky from comment #6)
> dwagner, how quickly is this reproducible?

With the above video playback test (which I should refer to as the "Othan"
test, because that is the name of the song in the video), actually quite
fast: it never took more than 10 minutes so far to get to the crash.

> A wild guess: what if you boot the kernel with the IOMMU disabled? Add
> iommu=off to the grub command line.

Tried this: no difference. Two attempts with current amd-staging-drm-next,
one with hw_update_mode=0 and one with hw_update_mode=3, both crashed in
< 1 minute of replay.

Interestingly, the "Othan test" can even crash the 4.13 kernel quicker than
the usual one or two days of uptime I can get with that old kernel.

There isn't really anything special about the video other than it being
encoded at only 6 frames per second.

And btw., the video replay crashes even with --vo=xv, so without mpv making
use of opengl. Replay does not crash with --vo=null.

In contrast, when I replay videos with the usual 24fps, this runs much
longer without crashing.
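
The comparison above amounts to playing the same file with different video outputs. A sketch of how those runs could be scripted in Python; the file name `othan.webm` is a placeholder, and the only mpv options used are the `--vo` values named in this comment plus the `vo=gpu` setting from the posted mpv.conf:

```python
import shutil
import subprocess
from typing import List

# gpu is the configured default; xv and null are the two overrides tried above.
VIDEO_OUTPUTS = ["gpu", "xv", "null"]


def mpv_command(video: str, vo: str) -> List[str]:
    """Build the argument list for one playback run with the given video output."""
    return ["mpv", f"--vo={vo}", video]


def run_all(video: str) -> None:
    """Play the file once per video output, or just list the commands if mpv is missing."""
    have_mpv = shutil.which("mpv") is not None
    for vo in VIDEO_OUTPUTS:
        cmd = mpv_command(video, vo)
        if have_mpv:
            subprocess.run(cmd, check=False)
        else:
            print("would run:", " ".join(cmd))


if __name__ == "__main__":
    run_all("othan.webm")  # placeholder file name
```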


From mboxrd@z Thu Jan 1 00:00:00 1970
From: bugzilla-daemon@freedesktop.org
Subject: [Bug 107152] GPU fault detected: 146 / VM_CONTEXT1_PROTECTION_FAULT / ring gfx timeout
Date: Fri, 03 Aug 2018 16:54:28 +0000
To: dri-devel@lists.freedesktop.org

Comment # 8 on bug 107152 from Andrey Grodzovsky
dwagner, I think you already have all the trace tools installed from previous
debug sessions, so this should be quick for you -

Update to latest kernel from
https://cgit.freedesktop.org/~agd5f/linux/log/?h=amd-staging-drm-next

Load the system and, before starting to reproduce the crash, run the following
trace command -

sudo trace-cmd start -e dma_fence -e gpu_scheduler -e amdgpu -v -e "amdgpu:amdgpu_mm_rreg" -e "amdgpu:amdgpu_mm_wreg" -e "amdgpu:amdgpu_iv"


After the VM fault has happened, extract the log from /sys/kernel/debug/tracing
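
A minimal sketch of that extraction step, assuming the trace buffer is exposed as the usual ftrace `trace` file inside the directory named above. The helper name `save_trace` and the default output file `amdgpu_trace.txt` are made up for illustration; they are not part of trace-cmd or umr.

```shell
# Sketch only: save_trace is a hypothetical helper, not a trace-cmd feature.
# $1 = tracing directory (assumed default: /sys/kernel/debug/tracing)
# $2 = output file       (assumed default: amdgpu_trace.txt)
save_trace() {
    tracing_dir="${1:-/sys/kernel/debug/tracing}"
    out="${2:-amdgpu_trace.txt}"
    # Copy the text form of the trace buffer, then flush it to disk right
    # away, since the machine may not stay up for long after the fault.
    cat "$tracing_dir/trace" > "$out" && sync
}
```

It would be run as root right after the fault shows up in the journal, e.g. `save_trace` with no arguments to use the defaults.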

Also run
sudo umr -O verbose -R gfx[.]
sudo umr -O halt_waves -wa

Now let's say this is your crash log:

Jul 07 01:08:20 ryzen kernel: amdgpu 0000:0a:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00100190
Jul 07 01:08:20 ryzen kernel: amdgpu 0000:0a:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0E04400C
Jul 07 01:08:20 ryzen kernel: amdgpu 0000:0a:00.0: VM fault (0x0c, vmid 7, pasid 32768) at page 1048976, read from 'TC1' (0x54433100) (68)

Then run

umr -O verbose -vm 7@100190000 1

where 7 is the vmid value and 100190000 is the VM_CONTEXT1_PROTECTION_FAULT_ADDR
value with an extra '000' appended, which converts the virtual page number to the
actual virtual address (a left shift by 12 bits, since a page is 4096 bytes).
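
The conversion from the logged page number to the address part of the -vm argument can be sketched in shell as follows. The function name `fault_to_umr_target` is hypothetical; the only thing it relies on is the 12-bit shift described in this comment.

```shell
# Sketch only: fault_to_umr_target is a hypothetical helper name.
# $1 = vmid from the "VM fault" line
# $2 = VM_CONTEXT1_PROTECTION_FAULT_ADDR value (a page number, hex or decimal)
fault_to_umr_target() {
    vmid="$1"
    page="$2"
    # Pages are 4096 (2^12) bytes, so shifting left by 12 bits turns the
    # page number into a byte address (this appends hex '000').
    printf '%d@%x\n' "$vmid" $(( $page << 12 ))
}

# Values taken from the log above.
fault_to_umr_target 7 0x00100190   # prints 7@100190000
```

The decimal page number from the "VM fault" line (1048976) gives the same result, because it is the same value as 0x00100190.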

I can look at the log then and also run it by our Mesa/LLVM experts to try and
figure out what's going on.


= --15333152681.6DBd0B.30931-- --===============0642734398== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============0642734398==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 107152] GPU fault detected: 146 / VM_CONTEXT1_PROTECTION_FAULT / ring gfx timeout Date: Fri, 03 Aug 2018 23:42:04 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============0657972113==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [131.252.210.165]) by gabe.freedesktop.org (Postfix) with ESMTP id D68C76E7D6 for ; Fri, 3 Aug 2018 23:42:04 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============0657972113== Content-Type: multipart/alternative; boundary="15333397242.de1f9e.30083" Content-Transfer-Encoding: 7bit --15333397242.de1f9e.30083 Date: Fri, 3 Aug 2018 23:42:04 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D107152 --- Comment #9 from dwagner --- Created attachment 140959 --> https://bugs.freedesktop.org/attachment.cgi?id=3D140959&action=3Dedit test script that attempted to catch useful output after crashes - but failed --=20 You are receiving this mail because: You are the assignee for the bug.= --15333397242.de1f9e.30083 Date: Fri, 3 Aug 2018 23:42:04 +0000 MIME-Version: 1.0 
Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment # 9 on bug 107152 from dwagner
Created attachment 140959 [details]
test script that attempted to catch useful output after crashes - but failed


= --15333397242.de1f9e.30083-- --===============0657972113== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============0657972113==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 107152] GPU fault detected: 146 / VM_CONTEXT1_PROTECTION_FAULT / ring gfx timeout Date: Fri, 03 Aug 2018 23:48:14 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============1676118392==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [IPv6:2610:10:20:722:a800:ff:fe98:4b55]) by gabe.freedesktop.org (Postfix) with ESMTP id D996689C08 for ; Fri, 3 Aug 2018 23:48:13 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============1676118392== Content-Type: multipart/alternative; boundary="15333400932.97c2FEeC.30083" Content-Transfer-Encoding: 7bit --15333400932.97c2FEeC.30083 Date: Fri, 3 Aug 2018 23:48:13 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D107152 --- Comment #10 from dwagner --- (In reply to Andrey Grodzovsky from comment #8) > dwanger, i think you already have all the trace tools installed from > previous debug sessions so this should be quick for you Yes, and I tried really hard (with above attached script run as "root" on a text console while the "othan_test.sh" script played the 
video on the scree= n) to catch any useful output, but that failed for the same reason I mentioned= in https://bugs.freedesktop.org/show_bug.cgi?id=3D102322#c20 - the system simply crashes too hard to quickly to be able to do anything a= fter amdgpu.ko crashes. The output I get in gpu_result.txt stops at the "waiting= for the crash" line. It is only in about 1 out of 10 crashes that the syslog at least contains t= he error messages from the amdgpu crash, in the other 90% of cases the same cr= ash occurs with no message being recorded at all. If there was any method to let other processes survive for a while after am= dgpu crashes, please let me know. --=20 You are receiving this mail because: You are the assignee for the bug.= --15333400932.97c2FEeC.30083 Date: Fri, 3 Aug 2018 23:48:13 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment # 10 on bug 107152 from dwagner
(In reply to Andrey Grodzovsky from comment #8)
> dwagner, I think you already have all the trace tools installed from
> previous debug sessions so this should be quick for you
Yes, and I tried really hard (with the script attached above, run as "root" on a
text console while the "othan_test.sh" script played the video on the screen)
to catch any useful output, but that failed for the same reason I mentioned in
https://bugs.freedesktop.org/show_bug.cgi?id=102322#c20
- the system simply crashes too hard and too quickly to be able to do anything
after amdgpu.ko crashes. The output I get in gpu_result.txt stops at the
"waiting for the crash" line.

Only in about 1 out of 10 crashes does the syslog at least contain the
error messages from the amdgpu crash; in the other 90% of cases the same crash
occurs with no message being recorded at all.

If there is any method to let other processes survive for a while after amdgpu
crashes, please let me know.


= --15333400932.97c2FEeC.30083-- --===============1676118392== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============1676118392==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 107152] GPU fault detected: 146 / VM_CONTEXT1_PROTECTION_FAULT / ring gfx timeout Date: Sun, 05 Aug 2018 19:59:27 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============0190645660==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [IPv6:2610:10:20:722:a800:ff:fe98:4b55]) by gabe.freedesktop.org (Postfix) with ESMTP id 228906E0FC for ; Sun, 5 Aug 2018 19:59:27 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============0190645660== Content-Type: multipart/alternative; boundary="15334991670.71c10DF.14671" Content-Transfer-Encoding: 7bit --15334991670.71c10DF.14671 Date: Sun, 5 Aug 2018 19:59:27 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D107152 --- Comment #11 from dwagner --- I did some additional experiments to understand what is so special about the "Othan" video that playing it causes amdgpu to crash relatively fast. 
Since the only "odd" parameter of it is its "6 fps" frame rate, I tried replaying other videos, first at their normal rate (like 24 fps), which did= not cause quick crashes, then at an artificially lower set rate - and indeed, t= hat causes fast crashing regardless of what video I play. The framerate that caused the "quickest" crashing seemed to be 3 fps, runni= ng > mpv --no-correct-pts --fps=3D3 --ao=3Dnull some_arbitrary_video.webm was usually crashing amd-staging-drm-next within < 1 minute for me. Just some random thought: Could the reason be some timed hysteresis in power management of the GPU? Would there be some possibility to lock the GPU on a specific power level to then try if those crashes still occur? --=20 You are receiving this mail because: You are the assignee for the bug.= --15334991670.71c10DF.14671 Date: Sun, 5 Aug 2018 19:59:27 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment # 11 on bug 107152 from dwagner
I did some additional experiments to understand what is so special about the
"Othan" video that playing it causes amdgpu to crash relatively fast.

Since its only "odd" parameter is its "6 fps" frame rate, I tried
replaying other videos, first at their normal rate (like 24 fps), which did not
cause quick crashes, then at an artificially lowered rate - and indeed, that
causes fast crashing regardless of what video I play.

The frame rate that caused the "quickest" crashing seemed to be 3 fps; running
> mpv --no-correct-pts --fps=3 --ao=null some_arbitrary_video.webm
usually crashed amd-staging-drm-next within < 1 minute for me.


Just a random thought: Could the reason be some timed hysteresis in the power
management of the GPU?
Would there be some possibility to lock the GPU to a specific power level, to
then check whether those crashes still occur?


= --15334991670.71c10DF.14671-- --===============0190645660== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============0190645660==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 107152] GPU fault detected: 146 / VM_CONTEXT1_PROTECTION_FAULT / ring gfx timeout Date: Wed, 08 Aug 2018 23:13:20 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============2096985535==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [131.252.210.165]) by gabe.freedesktop.org (Postfix) with ESMTP id 7967D89B0B for ; Wed, 8 Aug 2018 23:13:20 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============2096985535== Content-Type: multipart/alternative; boundary="15337700001.13d1E44aC.8584" Content-Transfer-Encoding: 7bit --15337700001.13d1E44aC.8584 Date: Wed, 8 Aug 2018 23:13:20 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D107152 --- Comment #12 from dwagner --- Indeed, I found my theory confirmed by many experiments: If I use a script = like > #!/bin/bash > cd /sys/class/drm/card0/device > echo manual >power_dpm_force_performance_level > # low > echo 0 >pp_dpm_mclk=20 > echo 0 >pp_dpm_sclk > # medium > #echo 1 >pp_dpm_mclk=20 > #echo 1 >pp_dpm_sclk > # high > #echo 1 >pp_dpm_mclk=20 > #echo 
6 >pp_dpm_sclk to enforce just any performance level, then the crashes do not occur anymor= e - also with the "low frame rate video test". So it seems that the transition from one "dpm" performance level to another, with a certain probability, causes these crashes. And the more often the transitions occur, the sooner one will experience them. The dynamic power management issue can now be pursued with the original bug report https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 for the vm_update_mode=3D0 case - there is probably not much sense in keeping this = bug report open just because errors also occur with wm_update_mode=3D3, just le= ss often. --=20 You are receiving this mail because: You are the assignee for the bug.= --15337700001.13d1E44aC.8584 Date: Wed, 8 Aug 2018 23:13:20 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment # 12 on bug 107152 from dwagner
Indeed, I found my theory confirmed by many experiments: If I use a script like
> #!/bin/bash
> cd /sys/class/drm/card0/device
> echo manual >power_dpm_force_performance_level
> # low
> echo 0 >pp_dpm_mclk
> echo 0 >pp_dpm_sclk
> # medium
> #echo 1 >pp_dpm_mclk
> #echo 1 >pp_dpm_sclk
> # high
> #echo 1 >pp_dpm_mclk
> #echo 6 >pp_dpm_sclk
to enforce just any performance level, then the crashes do not occur anymore -
also with the "low frame rate video test".
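
As a side note on the script above, here is a small sketch for checking which level is currently active and for undoing the manual setting afterwards. It assumes the pp_dpm_* files list one level per line and mark the active one with a trailing '*' (for example "1: 608Mhz *"), and that writing "auto" to power_dpm_force_performance_level hands control back to the driver. Both helper names are hypothetical.

```shell
# Sketch only: both helpers are hypothetical names, not amdgpu tools.

# Print the index of the level marked with '*' in a pp_dpm_* file.
# $1 = path to the file, e.g. /sys/class/drm/card0/device/pp_dpm_sclk
active_dpm_level() {
    # Assumed line format: "<index>: <clock> *" for the active level.
    sed -n 's/^\([0-9][0-9]*\):.*\*[[:space:]]*$/\1/p' "$1"
}

# Return to automatic level selection after testing with "manual".
# $1 = device directory (assumed default: /sys/class/drm/card0/device)
restore_auto_dpm() {
    dev="${1:-/sys/class/drm/card0/device}"
    echo auto > "$dev/power_dpm_force_performance_level"
}
```

Calling `active_dpm_level` on pp_dpm_sclk and pp_dpm_mclk before and during the video test would show whether the forced level really stays fixed.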

So it seems that the transition from one "dpm" performance level to another,
with a certain probability, causes these crashes. And the more often the
transitions occur, the sooner one will experience them.

The dynamic power management issue can now be pursued with the original bug
report https://bugs.freedesktop.org/show_bug.cgi?id=102322 for the
vm_update_mode=0 case - there is probably not much sense in keeping this bug
report open just because errors also occur with vm_update_mode=3, just less
often.


= --15337700001.13d1E44aC.8584-- --===============2096985535== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============2096985535==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 107152] GPU fault detected: 146 / VM_CONTEXT1_PROTECTION_FAULT / ring gfx timeout Date: Thu, 09 Aug 2018 16:25:40 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============1969862724==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [131.252.210.165]) by gabe.freedesktop.org (Postfix) with ESMTP id EE8818940E for ; Thu, 9 Aug 2018 16:25:39 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============1969862724== Content-Type: multipart/alternative; boundary="15338319390.eF0DCdfF2.10424" Content-Transfer-Encoding: 7bit --15338319390.eF0DCdfF2.10424 Date: Thu, 9 Aug 2018 16:25:39 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D107152 --- Comment #13 from Andrey Grodzovsky --- (In reply to dwagner from comment #12) > Indeed, I found my theory confirmed by many experiments: If I use a script > like > > #!/bin/bash > > cd /sys/class/drm/card0/device > > echo manual >power_dpm_force_performance_level > > # low > > echo 0 >pp_dpm_mclk=20 > > echo 0 >pp_dpm_sclk > > # medium > > #echo 1 
>pp_dpm_mclk=20 > > #echo 1 >pp_dpm_sclk > > # high > > #echo 1 >pp_dpm_mclk=20 > > #echo 6 >pp_dpm_sclk > to enforce just any performance level, then the crashes do not occur anym= ore > - also with the "low frame rate video test". >=20 > So it seems that the transition from one "dpm" performance level to anoth= er, > with a certain probability, causes these crashes. And the more often the > transitions occur, the sooner one will experience them. >=20 > The dynamic power management issue can now be pursued with the original b= ug > report https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 for the > vm_update_mode=3D0 case - there is probably not much sense in keeping thi= s bug > report open just because errors also occur with wm_update_mode=3D3, just = less > often. Agreed. --=20 You are receiving this mail because: You are the assignee for the bug.= --15338319390.eF0DCdfF2.10424 Date: Thu, 9 Aug 2018 16:25:39 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment # 13 on bug 107152 from Andrey Grodzovsky
(In reply to dwagner from comment #12)
> Indeed, I found my theory confirmed by many experiments: If I use a script
> like
> > #!/bin/bash
> > cd /sys/class/drm/card0/device
> > echo manual >power_dpm_force_performance_level
> > # low
> > echo 0 >pp_dpm_mclk
> > echo 0 >pp_dpm_sclk
> > # medium
> > #echo 1 >pp_dpm_mclk
> > #echo 1 >pp_dpm_sclk
> > # high
> > #echo 1 >pp_dpm_mclk
> > #echo 6 >pp_dpm_sclk
> to enforce just any performance level, then the crashes do not occur anymore
> - also with the "low frame rate video test".
>
> So it seems that the transition from one "dpm" performance level to another,
> with a certain probability, causes these crashes. And the more often the
> transitions occur, the sooner one will experience them.
>
> The dynamic power management issue can now be pursued with the original bug
> report https://bugs.freedesktop.org/show_bug.cgi?id=102322 for the
> vm_update_mode=0 case - there is probably not much sense in keeping this bug
> report open just because errors also occur with vm_update_mode=3, just less
> often.

Agreed.


= --15338319390.eF0DCdfF2.10424-- --===============1969862724== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============1969862724==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 107152] GPU fault detected: 146 / VM_CONTEXT1_PROTECTION_FAULT / ring gfx timeout Date: Thu, 09 Aug 2018 20:56:06 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============1652286283==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [IPv6:2610:10:20:722:a800:ff:fe98:4b55]) by gabe.freedesktop.org (Postfix) with ESMTP id B8B0F6E82E for ; Thu, 9 Aug 2018 20:56:06 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============1652286283== Content-Type: multipart/alternative; boundary="15338481662.3CBC8e.6075" Content-Transfer-Encoding: 7bit --15338481662.3CBC8e.6075 Date: Thu, 9 Aug 2018 20:56:06 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D107152 dwagner changed: What |Removed |Added ---------------------------------------------------------------------------- Resolution|--- |DUPLICATE Status|NEW |RESOLVED --- Comment #14 from dwagner --- *** This bug has been marked as a duplicate of bug 102322 *** --=20 You are receiving this mail because: You are the assignee for the bug.= 
--15338481662.3CBC8e.6075 Date: Thu, 9 Aug 2018 20:56:06 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated dwagner changed bug 10715= 2
           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |DUPLICATE
             Status|NEW                         |RESOLVED

Comment # 14 on bug 107152 from dwagner

*** This bug has been marked as a duplicate of bug 102322 ***


= --15338481662.3CBC8e.6075-- --===============1652286283== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============1652286283==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 107152] GPU fault detected: 146 / VM_CONTEXT1_PROTECTION_FAULT / ring gfx timeout Date: Thu, 24 Jan 2019 06:45:47 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============0882369937==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [IPv6:2610:10:20:722:a800:ff:fe98:4b55]) by gabe.freedesktop.org (Postfix) with ESMTP id 45C326F0A5 for ; Thu, 24 Jan 2019 06:45:47 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============0882369937== Content-Type: multipart/alternative; boundary="15483123471.A424CD.17347" Content-Transfer-Encoding: 7bit --15483123471.A424CD.17347 Date: Thu, 24 Jan 2019 06:45:47 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D107152 --- Comment #15 from Ida Wallace --- Thanks for letting us know about the duplicate bug of GPU fault and System crashes, so solution seekers can refer both references to understand the bug and try to solve it easily. 
Ida, http://www.assignmenthelpfolks.com/ --=20 You are receiving this mail because: You are the assignee for the bug.= --15483123471.A424CD.17347 Date: Thu, 24 Jan 2019 06:45:47 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment # 15 on bug 107152 from Ida Wallace
Thanks for letting us know about the duplicate bug of GPU fault and system
crashes, so solution seekers can refer to both reports to understand the bug
and try to solve it easily.

Ida,
http://www.assignmenthelpfolks.com/


= --15483123471.A424CD.17347-- --===============0882369937== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============0882369937==--