From mboxrd@z Thu Jan 1 00:00:00 1970
From: bugzilla-daemon@freedesktop.org
Subject: [Bug 110413] GPU crash and failed reset leading to deadlock on
Polaris 22 XL [Radeon RX Vega M GL]
Date: Fri, 12 Apr 2019 14:44:42 +0000
Message-ID:
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="===============0017851197=="
Return-path:
Received: from culpepper.freedesktop.org (culpepper.freedesktop.org
[IPv6:2610:10:20:722:a800:ff:fe98:4b55])
by gabe.freedesktop.org (Postfix) with ESMTP id 8945A891B1
for ; Fri, 12 Apr 2019 14:44:43 +0000 (UTC)
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel"
To: dri-devel@lists.freedesktop.org
List-Id: dri-devel@lists.freedesktop.org
--===============0017851197==
Content-Type: multipart/alternative; boundary="15550802831.b5b8.18035"
Content-Transfer-Encoding: 7bit
--15550802831.b5b8.18035
Date: Fri, 12 Apr 2019 14:44:43 +0000
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://bugs.freedesktop.org/
Auto-Submitted: auto-generated
https://bugs.freedesktop.org/show_bug.cgi?id=110413
Bug ID: 110413
Summary: GPU crash and failed reset leading to deadlock on
Polaris 22 XL [Radeon RX Vega M GL]
Product: DRI
Version: unspecified
Hardware: x86-64 (AMD64)
OS: Linux (All)
Status: NEW
Severity: major
Priority: medium
Component: DRM/AMDgpu
Assignee: dri-devel@lists.freedesktop.org
Reporter: rverschelde@gmail.com
Created attachment 143950
--> https://bugs.freedesktop.org/attachment.cgi?id=143950&action=edit
lspci -vvv output for HP Spectre x360
My HP Spectre x360 laptop, bought in March 2019, comes with Kaby Lake G HD
Graphics 630 and a discrete AMD Radeon RX Vega M GL GPU.
I only enable the Radeon GPU when needed to play graphics-intensive games with
`DRI_PRIME=1`, and so far I have experienced a lot of GPU deadlocks with the
following symptoms:
- Temperatures rise and the CPUs are throttled. Framerate drops when this
  happens.
- Later on, GPU faults are reported in dmesg and the game's rendering freezes
  (but music continues playing). I am still able to alt+tab back to the desktop
  or open a terminal, but the game's process can't be killed. If I'm monitoring
  temperatures, lm_sensors always reports a bogus 511°C temperature for the AMD
  dGPU at this point, before breaking.
- Any subsequent attempt at using the AMD GPU will cause a system deadlock, and
  I need to force a shutdown with the power button.
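The bogus 511°C reading described above can be watched for with a small script.
This is only a minimal sketch, assuming the dGPU shows up under the standard
hwmon sysfs layout (`/sys/class/hwmon/hwmon*/name` equal to `amdgpu`, with
`temp1_input` in millidegrees Celsius). The function names and the 150°C
plausibility limit are hypothetical choices for illustration and do not come
from the report.

```python
from pathlib import Path

# Assumption: anything above this is treated as an implausible reading.
# Note that 511 == 2**9 - 1, which is consistent with (but does not prove)
# a 9-bit field being read back as all ones from an unresponsive GPU.
PLAUSIBLE_MAX_C = 150.0


def read_gpu_temps(hwmon_root="/sys/class/hwmon", driver="amdgpu"):
    """Return {hwmon_dir_name: temp_celsius} for hwmon devices named `driver`."""
    temps = {}
    for dev in sorted(Path(hwmon_root).glob("hwmon*")):
        name_file = dev / "name"
        temp_file = dev / "temp1_input"
        try:
            if name_file.read_text().strip() != driver:
                continue
            # hwmon exposes temperatures in millidegrees Celsius.
            temps[dev.name] = int(temp_file.read_text().strip()) / 1000.0
        except (OSError, ValueError):
            # Missing files or failed reads (e.g. a hung device) are skipped.
            continue
    return temps


def is_bogus(temp_c, limit=PLAUSIBLE_MAX_C):
    """Flag readings outside a physically plausible range."""
    return temp_c < 0 or temp_c > limit
```

Calling `read_gpu_temps()` in a loop while a game is running and logging the
first reading for which `is_bogus()` returns true would give a timestamp to
compare against the GPU faults in dmesg.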
My testing so far has covered:
- Unity3D games like For The King or StarCrawlers. The crash happens mid-game,
  not in a strictly reproducible manner, but seems related to CPU
  temperature/throttling.
  * I could also reproduce the crash with SuperTuxKart, not in-game but when
    alt-tabbing back to the desktop.
  * I could not get the crash yet with glmark2. With For The King, I can
    reliably get a crash within 1 to 10 minutes in-game when playing with
    "High" or "Dream" graphics quality.
- Kernel 5.0.x (up to 5.0.7) from Mageia 7 (Cauldron), e.g.
  5.0.7-desktop-4.mga7.
  * I also tried `git://people.freedesktop.org/~agd5f/linux -b
    amd-staging-drm-next` at b07c394a327fc9e435ee03288584c111fa73d963, but I
    still got the same symptoms. The dmesg output was partly different though,
    and more spammy.
  * Following discussions in bug 109692, I tried the patches provided by
    Andrey Grodzovsky in bug 109692 comment 34, but they did not solve the
    issue for me.
- Mesa 19.0.0 to 19.0.2 built against LLVM 7.0.1.
- Suspecting the CPU temperature/throttling as a trigger, I'm using
  https://github.com/kitsunyan/intel-undervolt to undervolt the CPU cache by
  -100 mV and set the CPU limit temperature to 80°C instead of 100°C. This has
  helped with throttling issues I had during code compilation, but made no
  visible change to my GPU crashes that I can tell. I can disable this
  undervolting when doing tests if required.
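All of the test runs listed above depend on offloading a program to the
discrete GPU with `DRI_PRIME=1`. For anyone trying to reproduce this, a minimal
sketch of a launcher that sets the variable for a single child process is shown
here. The helper name `run_on_dgpu` is hypothetical, and capturing the output
is an assumption that suits short diagnostic commands better than games.

```python
import os
import subprocess


def run_on_dgpu(cmd, timeout=None):
    """Run `cmd` (a list of arguments) with DRI_PRIME=1 in its environment.

    The parent environment is copied, so only the child process is affected.
    """
    env = dict(os.environ, DRI_PRIME="1")
    return subprocess.run(cmd, env=env, timeout=timeout,
                          capture_output=True, text=True)
```

For example, `run_on_dgpu(["glxinfo", "-B"]).stdout` can be inspected to check
which renderer was selected, assuming `glxinfo` is installed.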
I found various bug reports which might well be duplicates, but I'm opening my
own to avoid hijacking discussions on what may or may not be the same root
cause: bug 109461, bug 109466, bug 109692 (I installed Shadow of the Tomb
Raider but haven't checked if I can reproduce this one's symptoms yet), bug
109819.
I am attaching some relevant logs about the system and the bug. Please ask for
anything else you may need.
-- 
You are receiving this mail because:
You are the assignee for the bug.
--15550802831.b5b8.18035--
--===============0017851197==--