From mboxrd@z Thu Jan 1 00:00:00 1970
From: bugzilla-daemon@freedesktop.org
Subject: [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
Date: Mon, 12 Aug 2019 14:34:52 +0000
Message-ID:
References:
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="===============1632874178=="
Return-path:
Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [131.252.210.165]) by gabe.freedesktop.org (Postfix) with ESMTP id C9D0F89AD2 for ; Mon, 12 Aug 2019 14:34:52 +0000 (UTC)
In-Reply-To:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel"
To: dri-devel@lists.freedesktop.org
List-Id: dri-devel@lists.freedesktop.org

--===============1632874178==
Content-Type: multipart/alternative; boundary="15656204922.b67c4D0be.2646"
Content-Transfer-Encoding: 7bit

--15656204922.b67c4D0be.2646
Date: Mon, 12 Aug 2019 14:34:52 +0000
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://bugs.freedesktop.org/
Auto-Submitted: auto-generated

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #81 from Tom B ---
Created attachment 145038
  --> https://bugs.freedesktop.org/attachment.cgi?id=145038&action=edit
5.2.7 dmesg with hard_min_level logged

--15656204922.b67c4D0be.2646
Date: Mon, 12 Aug 2019 14:34:52 +0000
MIME-Version: 1.0
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://bugs.freedesktop.org/
Auto-Submitted: auto-generated

Comment #81 on bug 110674 from Tom B
Created attachment 145038
5.2.7 dmesg with hard_min_level logged

As mentioned in the previous post, I started logging the value of
hard_min_level. I hadn't realised that vega20_set_uclk_to_highest_dpm_level
would be called so many times.

Here's what I found: The value of hard_min_level is 1001 in both 5.0.13 and
5.2.7, so the value read from the dpm table is not the issue and the table
itself is probably correct. Something prevents
smum_send_msg_to_smc_with_parameter from accepting the value.

However, what is interesting is that it doesn't always fail.


[    4.082105] amdgpu: [powerplay] hard_min_level: 1001
[    4.372684] [drm] Initialized amdgpu 3.32.0 20150101 for 0000:44:00.0 on minor 0
[    4.517204] amdgpu: [powerplay] Failed to send message 0x28, response 0x0
[    4.517205] amdgpu: [powerplay] [SetUclkToHightestDpmLevel] Set hard min uclk failed!





Each hard_min_level line in the log is from
vega20_set_uclk_to_highest_dpm_level and there are multiple calls to it,
which don't fail, before the card is initialised.


This is from 5.2.7:

[    3.698907] amdgpu 0000:44:00.0: ring vce2 uses VM inv eng 14 on hub 1
[    4.082105] amdgpu: [powerplay] hard_min_level: 1001
[    4.372684] [drm] Initialized amdgpu 3.32.0 20150101 for 0000:44:00.0 on minor 0
[    4.517204] amdgpu: [powerplay] Failed to send message 0x28, response 0x0
[    4.517205] amdgpu: [powerplay] [SetUclkToHightestDpmLevel] Set hard min uclk failed!
[    5.361482] amdgpu: [powerplay] Failed to send message 0x28, response 0x0


And the same from 5.0.13:

[    3.352380] amdgpu 0000:44:00.0: ring vce2 uses VM inv eng 14 on hub 1
[    3.722422] amdgpu: [powerplay] hard_min_level: 1001
[    3.766269] amdgpu: [powerplay] hard_min_level: 1001
[    4.029679] [drm] Initialized amdgpu 3.27.0 20150101 for 0000:44:00.0 on minor 0


There are a couple of things here:

1. vega20_set_uclk_to_highest_dpm_level is called twice between the
"ring vce2" line and "Initialized" in the 5.0.13 log above, but only once
in the 5.2.7 log.

2. My patched code looks like this:

                pr_err("hard_min_level: %d\n",
                                dpm_table->dpm_state.hard_min_level);

                PP_ASSERT_WITH_CODE(!(ret = smum_send_msg_to_smc_with_parameter(hwmgr,
                                PPSMC_MSG_SetHardMinByFreq,
                                (PPCLK_UCLK << 16) | dpm_table->dpm_state.hard_min_level)),
                                "[SetUclkToHightestDpmLevel] Set hard min uclk failed!",
                                return ret);

Yet the log shows:

- My debug line
- Initialized amdgpu 3.32.0 20150101 for 0000:44:00.0 on minor 0
- [SetUclkToHightestDpmLevel] Set hard min uclk failed!

So initialization is happening between (and possibly a result of) sending
the message and getting the response.


You are receiving this mail because:
  • You are the assignee for the bug.
--15656204922.b67c4D0be.2646--

--===============1632874178==
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: base64
Content-Disposition: inline

X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs
IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz
dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVs

--===============1632874178==--