From: bugzilla-daemon@freedesktop.org
To: dri-devel@lists.freedesktop.org
Subject: [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
Date: Sat, 17 Aug 2019 02:37:53 +0000

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #113 from ReddestDream ---
4.

> Given that two different versions of the code produce the same result, my
> hunch is that the problem is B. The card is not in a state where it's able
> to receive power changes.

Something to consider: In pretty much all the dmesg logs we see, amdgpu
attempts to reset the GPU, sometimes successfully, and yet it still can't
properly message the GPU afterward and we see the same sequence of failures
starting with "amdgpu: [powerplay] Failed to send message 0x28, response 0x0
amdgpu: [powerplay] [SetUclkToHightestDpmLevel] Set hard min uclk failed!"

Eventually we start to see: "[drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to
initialize parser -125!"

This comes from:

https://github.com/torvalds/linux/commits/master/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c

I'm not sure what the -125 error code indicates. My guess is ECANCELED
(Operation Cancelled) as the negated error code 125.

https://github.com/torvalds/linux/blob/master/include/uapi/asm-generic/errno.h
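
One way to check the ECANCELED guess is to look the number up in the errno table of a Linux machine. The snippet below is only a sketch and is not part of the original comment: `describe_kernel_error` is a hypothetical helper name, and it assumes the Python interpreter runs on a common Linux architecture such as x86-64, where the C library's errno values match asm-generic/errno.h. Other platforms number ECANCELED differently.

```python
import errno
import os


def describe_kernel_error(ret: int) -> str:
    """Map a kernel-style negated return value (e.g. -125) to an errno name.

    Hypothetical helper for illustration; the lookup uses the errno table
    of the platform Python is running on, not the kernel headers directly.
    """
    if ret >= 0:
        return "no error"
    code = -ret  # kernel functions return -errno on failure
    name = errno.errorcode.get(code, "unknown")
    return f"{name} ({os.strerror(code)})"


# The value reported by "Failed to initialize parser -125!"
print(describe_kernel_error(-125))
```

If the name printed for -125 is ECANCELED, that agrees with the guess above for this platform's numbering.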
