From: bugzilla-daemon@freedesktop.org
To: dri-devel@lists.freedesktop.org
Subject: [Bug 112226] [HadesCanyon] GPU hangs don't anymore recover (although kernel still claims that they do)
Date: Thu, 07 Nov 2019 13:53:19 +0000

https://bugs.freedesktop.org/show_bug.cgi?id=112226

Bug ID:    112226
Summary:   [HadesCanyon] GPU hangs don't anymore recover (although kernel still claims that they do)
Product:   DRI
Version:   DRI git
Hardware:  x86-64 (AMD64)
OS:        Linux (All)
Status:    NEW
Severity:  critical
Priority:  not set
Component: DRM/AMDgpu
Assignee:  dri-devel@lists.freedesktop.org
Reporter:  eero.t.tamminen@intel.com

Setup:
* HW: KBL HadesCanyon (i7-8809G with Radeon RX Vega M GH)
* OS: Ubuntu 18.04 with Unity desktop (compiz)
* SW: Git builds of drm-tip kernel, Mesa and X server

Issue:
* The AMD GPU driver stopped recovering from the KBL HadesCanyon GPU hangs described in bug 108898.

It still claims to recover from the bug:
-------------------------------------------------------
[ 1057.512690] Iteration 2/3: bin/testfw_app --gfx glfw --gl_api desktop_core --width 1920 --height 1080 --fullscreen 1 --test_id gl_manhattan
[ 1119.867403] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
[ 1124.987449] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
-------------------------------------------------------
But now every 3D test run after this error fails.
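
For anyone trying to reproduce this, the log excerpt above can be checked for mechanically. The following is only a minimal sketch, assuming the kernel log has been saved as plain text (for example from dmesg output); the helper names `find_amdgpu_errors` and `claimed_soft_recoveries` are hypothetical and not part of any existing tool. It reports when the kernel claimed a soft recovery, so that test failures after that timestamp can be correlated with it.

```python
import re

# Matches amdgpu error lines in the format shown in the log excerpt, e.g.
# [ 1124.987449] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
LINE_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+"
    r"\[drm:(?P<func>\w+) \[amdgpu\]\]\s+"
    r"\*ERROR\*\s+(?P<msg>.*)$"
)


def find_amdgpu_errors(log_text):
    """Return (timestamp, function, message) for each amdgpu *ERROR* line."""
    events = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            events.append(
                (float(match.group("ts")), match.group("func"), match.group("msg"))
            )
    return events


def claimed_soft_recoveries(events):
    """Timestamps at which the kernel claimed to have soft recovered a ring."""
    return [
        ts
        for ts, func, msg in events
        if func == "amdgpu_job_timedout" and "soft recovered" in msg
    ]


# Log lines copied from this report.
SAMPLE = """\
[ 1057.512690] Iteration 2/3: bin/testfw_app --gfx glfw --gl_api desktop_core --width 1920 --height 1080 --fullscreen 1 --test_id gl_manhattan
[ 1119.867403] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
[ 1124.987449] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
"""

if __name__ == "__main__":
    for ts in claimed_soft_recoveries(find_amdgpu_errors(SAMPLE)):
        print(f"kernel claimed soft recovery at {ts:.6f}s")
```

This only detects the claim of recovery; whether rendering actually works afterwards still has to be verified by running a 3D test.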

This started happening between the following (drm-tip) kernel commits:
* 2019-10-28 16:01:46: 912b87256c: drm-tip: 2019y-10m-28d-16h-00m-10s UTC integration manifest
* 2019-10-29 17:58:05: a2c9f8ce2a: drm-tip: 2019y-10m-29d-17h-57m-39s UTC integration manifest

And between the following Mesa commits:
* 2019-10-28 17:47:06: d298740a1c: iris: Disallow incomplete resource creation
* 2019-10-29 16:19:34: ff6e148a3d: freedreno/a6xx: add a618 support


Note:
* I'm not seeing the same issue when using a few-months-old Mesa with the latest drm-tip kernel, so some change in Mesa triggers this kernel issue
* If the latest Mesa is used with drm-tip kernel 5.3, X fails to start 4 times out of 5. This started happening with a Mesa version from within a couple of days of the GPU hang recovery issue, so potentially there are more issues in Mesa (HadesCanyon) AMD support


You are receiving this mail because:
  • You are the assignee for the bug.