From mboxrd@z Thu Jan 1 00:00:00 1970
From: bugzilla-daemon@freedesktop.org
Subject: [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
Date: Fri, 16 Aug 2019 23:19:25 +0000
Message-ID:
References:
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="===============0725341565=="
Return-path:
Received: from culpepper.freedesktop.org (culpepper.freedesktop.org
[IPv6:2610:10:20:722:a800:ff:fe98:4b55])
by gabe.freedesktop.org (Postfix) with ESMTP id BD2BF6E9BD
for ; Fri, 16 Aug 2019 23:19:25 +0000 (UTC)
In-Reply-To:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel"
To: dri-devel@lists.freedesktop.org
List-Id: dri-devel@lists.freedesktop.org
--===============0725341565==
Content-Type: multipart/alternative; boundary="15659975654.7B625ca4E.3273"
Content-Transfer-Encoding: 7bit
--15659975654.7B625ca4E.3273
Date: Fri, 16 Aug 2019 23:19:25 +0000
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://bugs.freedesktop.org/
Auto-Submitted: auto-generated
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #110 from ReddestDream ---
> 1. The functions in vega20_ppt.c are used with this new patch so that
> answers my question from earlier, that's what this file is for and why it
> contains similar/identical functions.
I was hoping this was the case as the duplicated functions were confusing me
too. Glad we got this figured out! :)
> I tried it, it didn't help the crashing issue and I was stuck at 30w. As
> soon as I started sddm the system froze. I've attached my dmesg from
> amdgpu.dpm=2 boot. It doesn't fix the issue but it does help answer a few
> questions I had:
This is disappointing tho. I was hoping that setting amdgpu.dpm=2 would use
the more "actively developed" path and that would fix the issue. :/
> Given that two different versions of the code produce the same result, my
> hunch is that the problem is B. The card is not in a state where it's able
> to receive power changes.
I tend to agree, but it's still not clear why or how the card ends up in a
bad state in which commands sent to it via smu_send_smc_msg_with_param seem
to just suddenly stop working. And given the number of identical/similar
functions in vega20_hwmgr.c and vega20_ppt.c, it's hard to rule out A
entirely.
Since amdgpu.dpm=0 resolves the issue (albeit at the cost of being stuck at
minimum clocks inherited from the VBIOS/GOP/UEFI/firmware), it seems that the
card is starting out in a reasonable state and then being thrown into a bad
state later by bad driver code. And that code is part of the DPM (Dynamic
Power Management) system. We are pretty confident that
dpm_state.hard_min_level is stable the whole time, so that's probably not
what's throwing the card into a bad state. But perhaps another value in the
DPM table is ...
It doesn't make intuitive sense that the soft min/max values would be
problematic, since they are presumably "more flexible," but it's possible
that they get calculated out of spec. Logging them should be possible in the
same way dpm_state.hard_min_level was logged.
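
To make the logging idea concrete, here is a rough userspace sketch of the
kind of check I have in mind. It is not driver code. The struct layout, the
flag names, and the rule that soft limits should sit inside the hard limits
are all my assumptions for illustration; only dpm_state.hard_min_level is a
name taken from this thread. In the kernel, the fprintf call would be
replaced by whatever logging call was used for hard_min_level.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for one DPM table's state. Only hard_min_level is
 * named in this thread; the other fields are assumed for illustration. */
struct dpm_state_sketch {
	uint32_t soft_min_level;
	uint32_t soft_max_level;
	uint32_t hard_min_level;
	uint32_t hard_max_level;
};

/* Hypothetical flags describing how the soft limits look "out of spec". */
enum {
	DPM_SOFT_MIN_ABOVE_SOFT_MAX = 1 << 0,
	DPM_SOFT_MIN_BELOW_HARD_MIN = 1 << 1,
	DPM_SOFT_MAX_ABOVE_HARD_MAX = 1 << 2,
};

/* Returns 0 if the soft limits are ordered and lie within the hard limits
 * (an assumed invariant), otherwise a bitmask of the flags above. */
unsigned int dpm_state_check(const struct dpm_state_sketch *s)
{
	unsigned int flags = 0;

	assert(s != NULL);
	if (s->soft_min_level > s->soft_max_level)
		flags |= DPM_SOFT_MIN_ABOVE_SOFT_MAX;
	if (s->soft_min_level < s->hard_min_level)
		flags |= DPM_SOFT_MIN_BELOW_HARD_MIN;
	if (s->soft_max_level > s->hard_max_level)
		flags |= DPM_SOFT_MAX_ABOVE_HARD_MAX;
	return flags;
}

/* Logs all four levels for one table and returns the check result, so a
 * caller can log every table each time the limits are recomputed. */
unsigned int dpm_state_log(const char *table, const struct dpm_state_sketch *s)
{
	unsigned int flags = dpm_state_check(s);

	fprintf(stderr,
		"%s: soft_min=%" PRIu32 " soft_max=%" PRIu32
		" hard_min=%" PRIu32 " hard_max=%" PRIu32 " flags=0x%x\n",
		table, s->soft_min_level, s->soft_max_level,
		s->hard_min_level, s->hard_max_level, flags);
	return flags;
}
```

Calling dpm_state_log for each table right before the limits are sent to the
card would show whether the soft values drift while hard_min_level stays
put. If the real driver allows soft limits outside the hard ones, the flags
are meaningless and only the raw values matter.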
-- 
You are receiving this mail because:
You are the assignee for the bug.
--15659975654.7B625ca4E.3273--
--===============0725341565==--