From: bugzilla-daemon@freedesktop.org
To: dri-devel@lists.freedesktop.org
Subject: [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
Date: Sun, 11 Aug 2019 18:43:48 +0000

https://bugs.freedesktop.org/show_bug.cgi?id=110674

Comment #72 on bug 110674 from Tom B
> The nasty displayport mst thingy? I would always set this to false.

I don't believe MST is being used here; it's two monitors, each on its own
cable.


Here's some additional investigation.

"[SetUclkToHightestDpmLevel] Set hard min uclk failed!" appears as one of the
first errors in dmesg. It comes from vega20_hwmgr.c:3354 and is triggered by:


                PP_ASSERT_WITH_CODE(!(ret = smum_send_msg_to_smc_with_parameter(hwmgr,
                                PPSMC_MSG_SetHardMinByFreq,
                                (PPCLK_UCLK << 16 ) | dpm_table->dpm_state.hard_min_level)),
                                "[SetUclkToHightestDpmLevel] Set hard min uclk failed!",
                                return ret);
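
To make the message parameter in that call easier to read, here is a minimal sketch of the packing it performs: the clock ID is shifted into the upper 16 bits and hard_min_level is OR'd into the lower bits. The numeric clock ID (EXAMPLE_PPCLK_UCLK), the helper names, and the assumption that the level is a MHz value fitting in 16 bits are mine for illustration; none of them come from the driver source quoted here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical clock ID for illustration only: the real PPCLK_UCLK value
 * is defined in the SMU driver interface header, not in this thread. */
#define EXAMPLE_PPCLK_UCLK 5u

/* Pack a clock ID and a frequency the same way the quoted call does:
 * (clk_id << 16) | freq. Assumes the frequency fits in the low 16 bits. */
static uint32_t pack_hard_min_param(uint32_t clk_id, uint32_t freq_mhz)
{
	assert(freq_mhz <= 0xFFFFu);
	return (clk_id << 16) | freq_mhz;
}

/* Split a packed parameter back into its two halves. */
static uint32_t param_clk_id(uint32_t param)
{
	return param >> 16;
}

static uint32_t param_freq(uint32_t param)
{
	return param & 0xFFFFu;
}
```

With these assumed values, a hard minimum of 1001 packs to 0x000503E9, so a bogus hard_min_level (for example one read from an empty DPM table) would go to the SMC unchanged in the low half of the word.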




hard_min_level is adjusted if disable_mclk_switching is set on line 3497.


        disable_mclk_switching = ((1 < hwmgr->display_config->num_display) &&
                           !hwmgr->display_config->multi_monitor_in_sync) ||
                            vblank_too_short;


        /* Hardmin is dependent on displayconfig */
        if (disable_mclk_switching) {
                dpm_table->dpm_state.hard_min_level =
                        dpm_table->dpm_levels[dpm_table->count - 1].value;
                for (i = 0; i < data->mclk_latency_table.count - 1; i++) {
                        if (data->mclk_latency_table.entries[i].latency <= latency) {
                                if (dpm_table->dpm_levels[i].value >=
                                    (hwmgr->display_config->min_mem_set_clock / 100)) {
                                        dpm_table->dpm_state.hard_min_level =
                                                dpm_table->dpm_levels[i].value;
                                        break;
                                }
                        }
                }
        }
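
The two snippets above can be exercised outside the kernel with a small stand-alone sketch. It is a simplified model rather than the driver code: the struct, the function names, and the use of a single count for both the level table and the latency table are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-in for the display_config fields used above. */
struct example_display_config {
	uint32_t num_display;
	bool multi_monitor_in_sync;
	uint32_t min_mem_set_clock; /* divided by 100 before comparing */
};

/* Same predicate as the quoted code: more than one display that is not
 * in sync, or a vblank period that is too short. */
static bool mclk_switching_disabled(const struct example_display_config *cfg,
				    bool vblank_too_short)
{
	return (cfg->num_display > 1 && !cfg->multi_monitor_in_sync) ||
	       vblank_too_short;
}

/* Model of the quoted loop: default to the highest level, then take the
 * first level (excluding the last) whose latency is acceptable and whose
 * clock meets the display's minimum. */
static uint32_t pick_hard_min(const uint32_t *levels, const uint32_t *latencies,
			      uint32_t count, uint32_t max_latency,
			      uint32_t min_mem_set_clock)
{
	uint32_t hard_min;
	uint32_t i;

	assert(count > 0);
	hard_min = levels[count - 1];
	for (i = 0; i + 1 < count; i++) {
		if (latencies[i] <= max_latency &&
		    levels[i] >= min_mem_set_clock / 100) {
			hard_min = levels[i];
			break;
		}
	}
	return hard_min;
}
```

In this model, if no lower level passes both checks, the hard minimum stays at the top level, which is what "forcing the memory to max clock" means here.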


Interestingly, this also checks for the presence of multiple displays, so we
at least have a connection between the code, the error message and the cause
of the bug (multiple displays). As a very crude test, I tried forcing it on
and compiling with

disable_mclk_switching = true;

No difference, so I also tried:

disable_mclk_switching = false;

Again, it didn't help. I will note that this code is identical in 5.0.13, so
my test was really only checking for an incorrect value being set elsewhere in
hwmgr->display_config->multi_monitor_in_sync or
hwmgr->display_config->num_display. In 5.0.13 I do get mclk boosting: it idles
at 351 MHz and boosts to 1001 MHz, so I don't think that forcing the memory to
max clock all the time is the correct solution.


I also diff'd vega20_hwmgr.c from 5.0.13 and 5.2.7 (I'll attach it). Here are
a few things I noticed:


in vega20_init_smc_table, this line has been added in this commit
https://github.com/torvalds/linux/commit/f5e79735cab448981e245a41ee6cbebf0e334f61
:

+       data->vbios_boot_state.fclock = boot_up_values.ulFClk;

I don't know what fclock is, but this was never set in 5.0.13.


in vega20_setup_default_dpm_tables:

@@ -710,8 +729,10 @@ static int vega20_setup_default_dpm_tables(struct pp_hwmgr *hwmgr)
                PP_ASSERT_WITH_CODE(!ret,
                                "[SetupDefaultDpmTable] failed to get fclk dpm levels!",
                                return ret);
-       } else
-               dpm_table->count = 0;
+       } else {
+               dpm_table->count = 1;
+               dpm_table->dpm_levels[0].value = data->vbios_boot_state.fclock / 100;
+       }


In 5.0.13, dpm_table->count is set to 0; in 5.2.7 it's set to 1 and a
dpm_level is added based on fclock. fclock appears throughout as a new
addition. I don't think this is the cause, but the addition of fclock may be
worth exploring.
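
The difference between the two else branches can be shown with a stand-alone sketch. The table type and function names are made up for illustration, and the condition that selects the else branch is not visible in the quoted hunk, so it is left out. The division by 100 is copied from the diff; what units it converts between is not stated in the thread.

```c
#include <assert.h>
#include <stdint.h>

#define EXAMPLE_MAX_LEVELS 16

/* Illustrative stand-in for a single-clock DPM table. */
struct example_dpm_table {
	uint32_t count;
	uint32_t levels[EXAMPLE_MAX_LEVELS];
};

/* 5.0.13 behaviour per the diff: the fallback leaves the table empty. */
static void fclk_fallback_old(struct example_dpm_table *t)
{
	assert(t);
	t->count = 0;
}

/* 5.2.7 behaviour per the diff: the fallback creates one level derived
 * from the boot-time fclock value. */
static void fclk_fallback_new(struct example_dpm_table *t, uint32_t boot_fclock)
{
	assert(t);
	t->count = 1;
	t->levels[0] = boot_fclock / 100;
}
```

Code that later indexes levels[count - 1] sees one valid entry after the 5.2.7 fallback, whereas with count == 0 the same unsigned expression would wrap around and index out of range.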

