From mboxrd@z Thu Jan 1 00:00:00 1970
From: bugzilla-daemon@freedesktop.org
Subject: [Bug 112226] [HadesCanyon] GPU hangs don't anymore recover (although
kernel still claims that they do)
Date: Thu, 07 Nov 2019 13:53:19 +0000
Message-ID:
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="===============1608487009=="
Return-path:
Received: from culpepper.freedesktop.org (culpepper.freedesktop.org
[131.252.210.165])
by gabe.freedesktop.org (Postfix) with ESMTP id 168CE6F6A6
for ; Thu, 7 Nov 2019 13:53:19 +0000 (UTC)
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel"
To: dri-devel@lists.freedesktop.org
List-Id: dri-devel@lists.freedesktop.org
--===============1608487009==
Content-Type: multipart/alternative; boundary="15731347990.f6fF7A6.2411"
Content-Transfer-Encoding: 7bit
--15731347990.f6fF7A6.2411
Date: Thu, 7 Nov 2019 13:53:19 +0000
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://bugs.freedesktop.org/
Auto-Submitted: auto-generated
https://bugs.freedesktop.org/show_bug.cgi?id=3D112226
Bug ID: 112226
Summary: [HadesCanyon] GPU hangs don't anymore recover
(although kernel still claims that they do)
Product: DRI
Version: DRI git
Hardware: x86-64 (AMD64)
OS: Linux (All)
Status: NEW
Severity: critical
Priority: not set
Component: DRM/AMDgpu
Assignee: dri-devel@lists.freedesktop.org
Reporter: eero.t.tamminen@intel.com
Setup:
* HW: KBL HadesCanyon (i7-8809G with Radeon RX Vega M GH)
* OS: Ubuntu 18.04 with Unity desktop (compiz)
* SW: Git builds of drm-tip kernel, Mesa and X server
Issue:
* AMD GPU driver stopped recovering from bug 108898 KBL HadesCanyon GPU han=
gs.
It still claims to recover from the bug:
-------------------------------------------------------
[ 1057.512690] Iteration 2/3: bin/testfw_app --gfx glfw --gl_api desktop_co=
re
--width 1920 --height 1080 --fullscreen 1 --test_id gl_manhattan
[ 1119.867403] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting =
for
fences timed out!
[ 1124.987449] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout,=
but
soft recovered
-------------------------------------------------------
But now all 3D tests run after this error will fail.
This started to happen between following (drm-tip) kernel commits:
* 2019-10-28 16:01:46: 912b87256c: drm-tip: 2019y-10m-28d-16h-00m-10s UTC
integration manifest
* 2019-10-29 17:58:05: a2c9f8ce2a: drm-tip: 2019y-10m-29d-17h-57m-39s UTC
integration manifest
And following Mesa commits:
* 2019-10-28 17:47:06: d298740a1c: iris: Disallow incomplete resource creat=
ion
* 2019-10-29 16:19:34: ff6e148a3d: freedreno/a6xx: add a618 support
Note:
* I'm not seeing the same issue by using few months old Mesa with latest
drm-tip kernel, so some change in Mesa triggers this kernel issue
* If latest Mesa is used with drm-tip kernel 5.3, 4/5 times X fails to star=
t.=20
This started to happen with Mesa version within couple of days of the GPU h=
ang
recovery issue, so potentially there are more issue in Mesa (HadesCanyon) A=
MD
support
--=20
You are receiving this mail because:
You are the assignee for the bug.=
--15731347990.f6fF7A6.2411
Date: Thu, 7 Nov 2019 13:53:19 +0000
MIME-Version: 1.0
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://bugs.freedesktop.org/
Auto-Submitted: auto-generated
| Bug ID |
112226
|
| Summary |
[HadesCanyon] GPU hangs don't anymore recover (although kerne=
l still claims that they do)
|
| Product |
DRI
|
| Version |
DRI git
|
| Hardware |
x86-64 (AMD64)
|
| OS |
Linux (All)
|
| Status |
NEW
|
| Severity |
critical
|
| Priority |
not set
|
| Component |
DRM/AMDgpu
|
| Assignee |
dri-devel@lists.freedesktop.org
|
| Reporter |
eero.t.tamminen@intel.com
|
Setup:
* HW: KBL HadesCanyon (i7-8809G with Radeon RX Vega M GH)
* OS: Ubuntu 18.04 with Unity desktop (compiz)
* SW: Git builds of drm-tip kernel, Mesa and X server
Issue:
* AMD GPU driver stopped recovering from bug 108898 KBL HadesCanyon GPU han=
gs.
It still claims to recover from the bug:
-------------------------------------------------------
[ 1057.512690] Iteration 2/3: bin/testfw_app --gfx glfw --gl_api desktop_co=
re
--width 1920 --height 1080 --fullscreen 1 --test_id gl_manhattan
[ 1119.867403] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting =
for
fences timed out!
[ 1124.987449] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout,=
but
soft recovered
-------------------------------------------------------
But now all 3D tests run after this error will fail.
This started to happen between following (drm-tip) kernel commits:
* 2019-10-28 16:01:46: 912b87256c: drm-tip: 2019y-10m-28d-16h-00m-10s UTC
integration manifest
* 2019-10-29 17:58:05: a2c9f8ce2a: drm-tip: 2019y-10m-29d-17h-57m-39s UTC
integration manifest
And following Mesa commits:
* 2019-10-28 17:47:06: d298740a1c: iris: Disallow incomplete resource creat=
ion
* 2019-10-29 16:19:34: ff6e148a3d: freedreno/a6xx: add a618 support
Note:
* I'm not seeing the same issue by using few months old Mesa with latest
drm-tip kernel, so some change in Mesa triggers this kernel issue
* If latest Mesa is used with drm-tip kernel 5.3, 4/5 times X fails to star=
t.=20
This started to happen with Mesa version within couple of days of the GPU h=
ang
recovery issue, so potentially there are more issue in Mesa (HadesCanyon) A=
MD
support
You are receiving this mail because:
- You are the assignee for the bug.
=
--15731347990.f6fF7A6.2411--
--===============1608487009==
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: base64
Content-Disposition: inline
X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs
IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz
dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVs
--===============1608487009==--