From: bugzilla-daemon@freedesktop.org
To: dri-devel@lists.freedesktop.org
Subject: [Bug 111807] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout cause process into Disk sleep state
Date: Wed, 25 Sep 2019 02:38:57 +0000
List-Id: dri-devel@lists.freedesktop.org

https://bugs.freedesktop.org/show_bug.cgi?id=111807
Attachment 145506 (timeoutlog): https://bugs.freedesktop.org/attachment.cgi?id=145506&action=edit

Bug ID: 111807
Summary: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout cause process into Disk sleep state
Product: DRI
Version: DRI git
Hardware: ARM
OS: Linux (All)
Status: NEW
Severity: major
Priority: not set
Component: DRM/AMDgpu
Assignee: dri-devel@lists.freedesktop.org
Reporter: liansz@fzcyjh.com

Created attachment 145506 [details]
timeoutlog

We are seeing gfx ring timeouts.
We currently run kernel 4.19.36 with some GPU patches merged from the
community. Each server has multiple GPUs, and each GPU runs rendering
programs. We see two different failure cases.
In the first case, one graphics card on a server fails, but the rendering
program does not enter D state. A test through
/sys/kernel/debug/dri/1/amdgpu_test_ib reports error code 110, and a second
test then passes. See tmp-618-2.zip for details.
In the second case, one graphics card on a server fails, and the whole
rendering program running on the server fails and enters D state. It fails at
drm_release. See tmp-619.zip for details.
Could you please help us out?
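
For anyone triaging similar symptoms, the two observations in this report (error code 110 and processes in D state) can be checked from user space. The following is only a sketch and is not part of the original report. It assumes Linux and assumes the reported 110 is an errno value, in which case it names ETIMEDOUT. The helper names describe_errno, parse_stat_state and d_state_processes are hypothetical. It reads the state field of /proc/<pid>/stat, where "D" means uninterruptible sleep.

```python
import errno
import os
from typing import List, Tuple


def describe_errno(code: int) -> str:
    """Return the symbolic name and message for an errno value."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


def parse_stat_state(stat_line: str) -> Tuple[str, str]:
    """Extract (comm, state) from the contents of /proc/<pid>/stat.

    comm is wrapped in parentheses and may itself contain spaces or
    parentheses, so split on the last ')' rather than on whitespace.
    """
    start = stat_line.index("(")
    end = stat_line.rindex(")")
    comm = stat_line[start + 1:end]
    state = stat_line[end + 1:].split()[0]
    return comm, state


def d_state_processes(proc_root: str = "/proc") -> List[Tuple[int, str]]:
    """List (pid, comm) for processes currently in uninterruptible sleep."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                comm, state = parse_stat_state(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited meanwhile, or the entry is unreadable
        if state == "D":
            found.append((int(entry), comm))
    return sorted(found)


if __name__ == "__main__":
    print(describe_errno(110))
    for pid, comm in d_state_processes():
        print(pid, comm)
```

Run on an affected server, this lists the stuck processes by PID and command name, which can then be matched against the rendering programs bound to the failing GPU.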


You are receiving this mail because:
  • You are the assignee for the bug.
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel