From mboxrd@z Thu Jan 1 00:00:00 1970
From: bugzilla-daemon@freedesktop.org
Subject: [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
Date: Mon, 12 Aug 2019 14:34:52 +0000
Message-ID:
References:
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="===============1632874178=="
Return-path:
Received: from culpepper.freedesktop.org (culpepper.freedesktop.org
[131.252.210.165])
by gabe.freedesktop.org (Postfix) with ESMTP id C9D0F89AD2
for ; Mon, 12 Aug 2019 14:34:52 +0000 (UTC)
In-Reply-To:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel"
To: dri-devel@lists.freedesktop.org
List-Id: dri-devel@lists.freedesktop.org
--===============1632874178==
Content-Type: multipart/alternative; boundary="15656204922.b67c4D0be.2646"
Content-Transfer-Encoding: 7bit
--15656204922.b67c4D0be.2646
Date: Mon, 12 Aug 2019 14:34:52 +0000
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://bugs.freedesktop.org/
Auto-Submitted: auto-generated
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #81 from Tom B ---
Created attachment 145038
--> https://bugs.freedesktop.org/attachment.cgi?id=145038&action=edit
5.2.7 dmesg with hard_min_level logged
As mentioned in the previous post, I started logging the value of
hard_min_level. I hadn't realised that vega20_set_uclk_to_highest_dpm_level
would be called so many times.
Here's what I found: The value of hard_min_level is 1001 in both 5.0.13 and
5.2.7 so the issue is not the value from the dpm table. The dpm table is
probably correct. Something prevents smum_send_msg_to_smc_with_parameter
accepting the value.
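
For context, the patched call quoted further down passes the value to the SMC as `(PPCLK_UCLK << 16) | hard_min_level`. The following is only a minimal sketch of that packing, not driver code: `EXAMPLE_CLK_ID` is a hypothetical stand-in for `PPCLK_UCLK` (its real value is defined in the driver headers and is not given in this comment), and the helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for PPCLK_UCLK; the real value comes from the
 * driver's SMU interface header and is not stated in this comment. */
#define EXAMPLE_CLK_ID 5u

/* Pack a clock ID (upper 16 bits) and a level (lower 16 bits) into one
 * 32-bit message parameter, mirroring (clk << 16) | hard_min_level. */
static uint32_t pack_hard_min_param(uint32_t clk_id, uint32_t level)
{
    assert(clk_id <= 0xFFFFu && level <= 0xFFFFu);
    return (clk_id << 16) | level;
}

/* Recover the two fields, e.g. to decode a logged parameter. */
static uint32_t param_clk_id(uint32_t param) { return param >> 16; }
static uint32_t param_level(uint32_t param)  { return param & 0xFFFFu; }
```

With the logged value 1001 (0x3E9), the low half of the parameter is 0x03E9 regardless of which clock ID sits in the upper half, so an identical hard_min_level in both kernels implies an identical parameter as long as the clock ID is unchanged.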
However, what is interesting is that it doesn't always fail.
[ 4.082105] amdgpu: [powerplay] hard_min_level: 1001
[ 4.372684] [drm] Initialized amdgpu 3.32.0 20150101 for 0000:44:00.0 on minor 0
[ 4.517204] amdgpu: [powerplay] Failed to send message 0x28, response 0x0
[ 4.517205] amdgpu: [powerplay] [SetUclkToHightestDpmLevel] Set hard min uclk failed!
Each hard_min_level line in the log is from
vega20_set_uclk_to_highest_dpm_level and there are multiple calls to it, which
don't fail, before the card is initialised.
This is from 5.2.7:
[ 3.698907] amdgpu 0000:44:00.0: ring vce2 uses VM inv eng 14 on hub 1
[ 4.082105] amdgpu: [powerplay] hard_min_level: 1001
[ 4.372684] [drm] Initialized amdgpu 3.32.0 20150101 for 0000:44:00.0 on minor 0
[ 4.517204] amdgpu: [powerplay] Failed to send message 0x28, response 0x0
[ 4.517205] amdgpu: [powerplay] [SetUclkToHightestDpmLevel] Set hard min uclk failed!
[ 5.361482] amdgpu: [powerplay] Failed to send message 0x28, response 0x0
And the same from 5.0.13:
[ 3.352380] amdgpu 0000:44:00.0: ring vce2 uses VM inv eng 14 on hub 1
[ 3.722422] amdgpu: [powerplay] hard_min_level: 1001
[ 3.766269] amdgpu: [powerplay] hard_min_level: 1001
[ 4.029679] [drm] Initialized amdgpu 3.27.0 20150101 for 0000:44:00.0 on minor 0
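
The comparison between the two excerpts above can be checked mechanically. This is a rough sketch, not part of the patch: it counts how many hard_min_level debug lines appear before the "Initialized amdgpu" line in each excerpt. The helper name `count_before` and the array names are made up for illustration; the log strings are copied from this comment.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count lines containing `needle` that appear before the first line
 * containing `stop`. Lines at or after `stop` are ignored. */
static size_t count_before(const char *const *lines, size_t n,
                           const char *needle, const char *stop)
{
    size_t count = 0;

    assert(lines != NULL && needle != NULL && stop != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strstr(lines[i], stop) != NULL)
            break;
        if (strstr(lines[i], needle) != NULL)
            count++;
    }
    return count;
}

/* Excerpts from this comment (wrapped lines rejoined). */
static const char *const log_5_2_7[] = {
    "[ 3.698907] amdgpu 0000:44:00.0: ring vce2 uses VM inv eng 14 on hub 1",
    "[ 4.082105] amdgpu: [powerplay] hard_min_level: 1001",
    "[ 4.372684] [drm] Initialized amdgpu 3.32.0 20150101 for 0000:44:00.0 on minor 0",
    "[ 4.517204] amdgpu: [powerplay] Failed to send message 0x28, response 0x0",
};

static const char *const log_5_0_13[] = {
    "[ 3.352380] amdgpu 0000:44:00.0: ring vce2 uses VM inv eng 14 on hub 1",
    "[ 3.722422] amdgpu: [powerplay] hard_min_level: 1001",
    "[ 3.766269] amdgpu: [powerplay] hard_min_level: 1001",
    "[ 4.029679] [drm] Initialized amdgpu 3.27.0 20150101 for 0000:44:00.0 on minor 0",
};
```

Calling `count_before(log, n, "hard_min_level", "Initialized amdgpu")` on each array gives one debug line before initialization for the 5.2.7 excerpt and two for the 5.0.13 excerpt.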
There are a couple of things here:
1. vega20_set_uclk_to_highest_dpm_level is called twice between the "ring vce2"
line and "Initialized" in 5.0.13, but only once in 5.2.7.
2. My patched code looks like this:
    pr_err("hard_min_level: %d\n",
            dpm_table->dpm_state.hard_min_level);
    PP_ASSERT_WITH_CODE(!(ret = smum_send_msg_to_smc_with_parameter(hwmgr,
            PPSMC_MSG_SetHardMinByFreq,
            (PPCLK_UCLK << 16 ) | dpm_table->dpm_state.hard_min_level)),
            "[SetUclkToHightestDpmLevel] Set hard min uclk failed!",
            return ret);
Yet the log shows:
- My debug line
- Initialized amdgpu 3.32.0 20150101 for 0000:44:00.0 on minor 0
- [SetUclkToHightestDpmLevel] Set hard min uclk failed!
So initialization is happening between (and possibly a result of) sending the
message and getting the response.
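
The ordering can be read straight off the dmesg timestamps. Below is a small sketch, assuming each line starts with a "[ seconds.micros]" prefix as in the excerpts above; `parse_dmesg_ts` and `dmesg_gap` are illustrative helper names, not existing tools.

```c
#include <assert.h>
#include <stdio.h>

/* Parse the leading "[ seconds.micros]" dmesg timestamp.
 * Returns 1 on success and stores the value in *out, otherwise 0. */
static int parse_dmesg_ts(const char *line, double *out)
{
    assert(line != NULL && out != NULL);
    return sscanf(line, " [ %lf ]", out) == 1;
}

/* Compute the time in seconds between two dmesg lines.
 * Returns 1 on success, 0 if either timestamp cannot be parsed. */
static int dmesg_gap(const char *from, const char *to, double *gap)
{
    double t0, t1;

    assert(gap != NULL);
    if (!parse_dmesg_ts(from, &t0) || !parse_dmesg_ts(to, &t1))
        return 0;
    *gap = t1 - t0;
    return 1;
}
```

Applied to the 5.2.7 excerpt, the debug line (4.082105) is followed by "Initialized" about 0.29 s later (4.372684), and the failure is logged about 0.435 s after the debug line (4.517204). These are only log timestamps, so they bound when the failure was reported, not when the SMC actually handled the message.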
-- 
You are receiving this mail because:
You are the assignee for the bug.
--15656204922.b67c4D0be.2646--
--===============1632874178==--