From: bugzilla-daemon@freedesktop.org
To: dri-devel@lists.freedesktop.org
Subject: [Bug 112242] amdgpu [RX Vega 56]: ring sdma0 timeout
Date: Mon, 11 Nov 2019 09:33:46 +0000

https://bugs.freedesktop.org/show_bug.cgi?id=112242

Bug ID:    112242
Summary:   amdgpu [RX Vega 56]: ring sdma0 timeout
Product:   DRI
Version:   unspecified
Hardware:  x86-64 (AMD64)
OS:        Linux (All)
Status:    NEW
Severity:  major
Priority:  not set
Component: DRM/AMDgpu
Assignee:  dri-devel@lists.freedesktop.org
Reporter:  mh@familie-heinz.name

Hi,

I've reported this over at bugzilla.kernel.org but didn't get any help there.
Maybe because nobody is expecting bug reports about the amdgpu driver over on
the kernel's bug tracker?

So this started a while ago, when I updated from 5.0.0 to a newer kernel. I'm
currently at 5.3.0 and for almost any game I play I run into this problem:

Aug 24 11:13:33 egalite kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=368056, emitted seq=368057
Aug 24 11:13:33 egalite kernel: [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:47:crtc-0] flip_done timed out
Aug 24 11:13:33 egalite kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process 7DaysToDie.x86_ pid 8108 thread 7DaysToDie:cs0
Aug 24 11:13:33 egalite kernel: amdgpu 0000:0c:00.0: GPU reset begin!
Aug 24 11:13:33 egalite kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered

Only a hard reset made me recover from that.
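
When comparing many such logs, it can help to pull the ring name and the two sequence numbers out of each timeout line. The following is a minimal sketch in Python, not part of the original report: the regular expression is derived only from the lines quoted above, and the names TIMEOUT_RE and parse_ring_timeouts are hypothetical. Other kernel versions may word these messages differently.

```python
import re

# Pattern derived only from the log lines quoted in this report (an assumption:
# other kernel versions may format these messages differently).
TIMEOUT_RE = re.compile(
    r"\*ERROR\*\s+ring\s+(?P<ring>\S+)\s+timeout"
    r"(?:, signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+))?"
)


def parse_ring_timeouts(log_text):
    """Return one dict per 'ring <name> timeout' line found in log_text."""
    events = []
    for line in log_text.splitlines():
        match = TIMEOUT_RE.search(line)
        if match is None:
            continue  # not a ring timeout line
        signaled = match.group("signaled")
        emitted = match.group("emitted")
        events.append({
            "ring": match.group("ring"),
            # Sequence numbers are absent on the "soft recovered" variant.
            "signaled": int(signaled) if signaled else None,
            "emitted": int(emitted) if emitted else None,
            "soft_recovered": "soft recovered" in line,
        })
    return events


# The two timeout lines from the report, used as sample input.
SAMPLE_LOG = """\
Aug 24 11:13:33 egalite kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=368056, emitted seq=368057
Aug 24 11:13:33 egalite kernel: amdgpu 0000:0c:00.0: GPU reset begin!
Aug 24 11:13:33 egalite kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
"""

if __name__ == "__main__":
    for event in parse_ring_timeouts(SAMPLE_LOG):
        print(event)
```

For the sdma0 line above, the emitted sequence number is exactly one ahead of the signaled one, i.e. a single outstanding job on that ring at the time of the timeout.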

I did some kernel traces which I will copy over to this report, if necessary,
but for now you can download them here:
https://bugzilla.kernel.org/show_bug.cgi?id=204683

It also looks a bit like this bug:
https://bugzilla.kernel.org/show_bug.cgi?id=201957, because I also get the
"ring gfx timeout". And there are lots and lots of people having this issue.

I tried bisecting it, but failed: either I missed the commit that causes
this, or there are multiple reasons why this happens, or this really goes way
back to the time when 4.18 was the base for drm-next (which doesn't compile
with modern compilers anymore. Also, Steam doesn't want to run on those old
kernels, so even when I was able to compile an older kernel, there was no way
to test it).

I even tried debugging it over Ethernet (KGDBoE is a nice thing if you need
performance), but somehow this slowed everything down enough that the bug no
longer triggered.

I also tried the suggestions from
https://bugs.freedesktop.org/show_bug.cgi?id=109955, but forbidding the lowest
clock mode doesn't help either. (It fixes my RocketLeague problems, though).
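
For readers who want to reproduce that step: "forbidding the lowest clock mode" is assumed here to mean restricting the DPM levels through the amdgpu sysfs interface, that is, setting power_dpm_force_performance_level to manual and then writing the allowed level indices to pp_dpm_sclk or pp_dpm_mclk. Which of those files the linked bug recommends is not restated in this report, so treat that as an assumption. The Python sketch below only computes the string of level indices to write; the function name and the sample clock values are made up for illustration.

```python
import re


def levels_without_lowest(pp_dpm_text):
    """Return all DPM level indices except the lowest-clocked one.

    pp_dpm_text is the content of a pp_dpm_sclk/pp_dpm_mclk style file,
    one "<index>: <freq>Mhz" entry per line, with "*" marking the active level.
    The result is a space-separated string of the remaining indices.
    """
    levels = []  # (frequency in MHz, level index)
    for line in pp_dpm_text.splitlines():
        match = re.match(r"\s*(\d+):\s*(\d+)\s*MHz", line, re.IGNORECASE)
        if match:
            levels.append((int(match.group(2)), int(match.group(1))))
    if len(levels) < 2:
        raise ValueError("need at least two DPM levels to drop the lowest")
    lowest_index = min(levels)[1]  # index of the lowest frequency
    kept = sorted(idx for _, idx in levels if idx != lowest_index)
    return " ".join(str(idx) for idx in kept)


# Hypothetical file content; real values depend on the card.
SAMPLE_PP_DPM = """\
0: 852Mhz
1: 991Mhz
2: 1138Mhz *
"""

if __name__ == "__main__":
    # The resulting string is what would be written (as root) to the sysfs
    # file after switching power_dpm_force_performance_level to "manual".
    print(levels_without_lowest(SAMPLE_PP_DPM))
```

The sketch deliberately stops short of writing to sysfs, since the card path (for example card0 versus card1) differs between systems.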

Please advise what I should try next.

Best regards
Matthias

