From: bugzilla-daemon@freedesktop.org
To: dri-devel@lists.freedesktop.org
Subject: [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
Date: Mon, 26 Aug 2019 03:47:41 +0000

https://bugs.freedesktop.org/show_bug.cgi?id=110674

Comment #121 on bug 110674 from ReddestDream
Some observations:

1. Nothing at all seems to be up with cur_speed and cur_width. They get set
several times in a row in both runs, but the values are all the same in both.

2. I can't really see anything up with msg/parameter either. When I compare
the two runs to each other, nothing seems particularly wacky. And we also have
an instance in my AMD+iGPU run where we see msg/parameter after
"[drm] Initialized amdgpu", so the theory that all messages have to be sent
before initialization is complete must be wrong.
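
For anyone who wants to repeat the msg/parameter comparison mechanically rather than by eye, here is a minimal sketch. It assumes the debug lines look roughly like "msg = 0x.. parameter = 0x.." (a hypothetical format, since the printk was added locally); PAIR_RE, extract_pairs and diff_pairs are names made up for this example, so adjust the pattern to whatever your own log prints.

```python
import difflib
import re

# Hypothetical debug-line format: "... msg = 0x5 ... parameter = 0x10".
# Change this pattern to match the printk you actually added.
PAIR_RE = re.compile(
    r"msg\s*=\s*(0x[0-9a-fA-F]+|\d+).*?parameter\s*=\s*(0x[0-9a-fA-F]+|\d+)"
)


def _to_int(token):
    """Convert a hex ("0x..") or decimal token to an integer."""
    if token.lower().startswith("0x"):
        return int(token, 16)
    return int(token, 10)


def extract_pairs(log_text):
    """Return the ordered list of (msg, parameter) pairs found in a log."""
    pairs = []
    for line in log_text.splitlines():
        match = PAIR_RE.search(line)
        if match:
            pairs.append((_to_int(match.group(1)), _to_int(match.group(2))))
    return pairs


def diff_pairs(run_a, run_b):
    """Return only the places where the two sequences of pairs differ."""
    matcher = difflib.SequenceMatcher(a=run_a, b=run_b, autojunk=False)
    return [
        (tag, run_a[i1:i2], run_b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
```

Feeding it the dmesg text of the working and the broken run, an empty result from diff_pairs would back up the observation that the message streams match. The comparison ignores timestamps, so it only checks order and values.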

Now the real question is whether we can decode what these msg/parameter values
mean. But it looks more likely to me that vega20_hwmgr.c and vega20_ppt.c are
just bugged somewhere (probably in the same way, since they seem to be alternate
versions of each other) and that the rest of the amdgpu code is (relatively)
fine.
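
On decoding the msg values: a rough sketch, assuming the message IDs are plain "#define PPSMC_MSG_<name> <value>" lines in a powerplay header (I believe vega20_ppsmc.h has this shape, but check your own tree). load_msg_names and decode are names made up for this example, and it does not attempt to interpret the parameter, which depends on the message.

```python
import re

# Assumed header shape: "#define PPSMC_MSG_<name> <hex or decimal value>".
DEFINE_RE = re.compile(
    r"^\s*#define\s+(PPSMC_MSG_\w+)\s+(0x[0-9a-fA-F]+|\d+)\b", re.MULTILINE
)


def load_msg_names(header_text):
    """Build a {message id: macro name} table from the header's text."""
    names = {}
    for name, value in DEFINE_RE.findall(header_text):
        base = 16 if value.lower().startswith("0x") else 10
        # Keep the first name seen if two macros share a value.
        names.setdefault(int(value, base), name)
    return names


def decode(msg, names):
    """Return the macro name for a message id, or a marked fallback."""
    return names.get(msg, f"UNKNOWN(0x{msg:x})")
```

Usage would be along the lines of names = load_msg_names(open(path_to_header).read()), then decode(msg, names) for each logged msg. Anything that comes back as UNKNOWN is either not defined in that header or defined in a form this pattern does not cover.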

I'm thinking we'll have to go through and knock out/debug pretty much
everything in those files until we figure out where the breakage is. That's
about 3000-4000 lines of code in each of those two files tho. So any thoughts
anyone has about where we should start would be helpful. My focus will probably
be on UCLK (since it seems to break first), SCLK (since it gets set to 0 MHz
when there's multiple displays), DCEFCLK, and basically anything else that
smells like it might control the memory clock and/or be affected by multiple
monitors.
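
To watch those clocks while testing, something like the sketch below might help. It assumes the amdgpu sysfs files pp_dpm_sclk, pp_dpm_mclk and pp_dpm_dcefclk under /sys/class/drm/card0/device, with lines such as "1: 801Mhz *" where the asterisk marks the active level. The card0 path and the idea that UCLK shows up as pp_dpm_mclk are assumptions to verify on your system; the function names are made up for this example.

```python
import re
from pathlib import Path

# One DPM level per line: "<index>: <freq>Mhz", with a trailing "*" if active.
LEVEL_RE = re.compile(r"^\s*(\d+):\s*(\d+)\s*mhz\s*(\*)?", re.IGNORECASE)


def parse_dpm_levels(text):
    """Return a list of (level index, MHz, is_active) tuples."""
    levels = []
    for line in text.splitlines():
        match = LEVEL_RE.match(line)
        if match:
            levels.append(
                (int(match.group(1)), int(match.group(2)), match.group(3) is not None)
            )
    return levels


def active_mhz(text):
    """Return the frequency of the level marked active, or None if none is."""
    for _, mhz, active in parse_dpm_levels(text):
        if active:
            return mhz
    return None


def read_clocks(device_dir="/sys/class/drm/card0/device",
                names=("pp_dpm_sclk", "pp_dpm_mclk", "pp_dpm_dcefclk")):
    """Read the active clock from each sysfs file; None if it is unreadable."""
    result = {}
    for name in names:
        try:
            result[name] = active_mhz((Path(device_dir) / name).read_text())
        except OSError:
            result[name] = None
    return result
```

Calling read_clocks() in a loop while plugging in the second monitor should show whether a clock drops or no level is marked active. A None value only means the file was missing, unreadable, or had no starred level, so it is not proof of a 0 MHz clock by itself.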

Thanks!

