From: bugzilla-daemon@freedesktop.org
To: dri-devel@lists.freedesktop.org
Subject: [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
Date: Mon, 12 Aug 2019 17:40:12 +0000

https://bugs.freedesktop.org/show_bug.cgi?id=110674

Comment #90 on bug 110674 from Tom B
I'm not sure this is helpful, but I managed to somewhat test the race
condition theory.

If you follow the callstack:

vega20_set_fclk_to_highest_dpm_level -> smum_send_msg_to_smc_with_parameter ->
vega20_send_msg_to_smc_with_parameter -> vega20_wait_for_response ->
phm_wait_for_register_unequal

you find this code in smu_helper.c:

int phm_wait_on_register(struct pp_hwmgr *hwmgr, uint32_t index,
                         uint32_t value, uint32_t mask)
{
        uint32_t i;
        uint32_t cur_value;

        if (hwmgr == NULL || hwmgr->device == NULL) {
                pr_err("Invalid Hardware Manager!");
                return -EINVAL;
        }

        for (i = 0; i < hwmgr->usec_timeout; i++) {
                cur_value = cgs_read_register(hwmgr->device, index);
                if ((cur_value & mask) == (value & mask))
                        break;
                udelay(1);
        }

        /* timeout means wrong logic*/
        if (i == hwmgr->usec_timeout)
                return -1;
        return 0;
}
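
The exit condition in that loop only compares the bits selected by mask. As a minimal user-space sketch of that comparison (masked_match is a hypothetical helper name for illustration, not something in smu_helper.c):

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring the loop's exit test:
 * only the bits set in `mask` take part in the comparison.
 */
static bool masked_match(uint32_t cur_value, uint32_t value, uint32_t mask)
{
    return (cur_value & mask) == (value & mask);
}
```

With a mask of 0xFFFF, for example, the upper 16 bits of the register can hold anything and the wait still ends as soon as the lower 16 bits equal the expected value.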


The timeout there is interesting. I increased it tenfold:


        for (i = 0; i < hwmgr->usec_timeout * 10; i++) {
                cur_value = cgs_read_register(hwmgr->device, index);
                if ((cur_value & mask) == (value & mask))
                        break;
                udelay(1);
        }
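
One thing to watch with a change like this: the check after the loop in the function quoted earlier compares i against hwmgr->usec_timeout. If only the loop bound is scaled, a real timeout leaves i at usec_timeout * 10, the check no longer matches, and the function falls through to return 0. Only the modified loop is shown here, so whether the check was scaled too is not known. A user-space sketch that computes the limit once so the two cannot drift apart (wait_scaled, read_fn and read_stuck are hypothetical names, and the read callback stands in for cgs_read_register):

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical read callback standing in for cgs_read_register(). */
typedef uint32_t (*read_fn)(void *ctx);

/*
 * Poll until the masked register value matches or the scaled budget
 * runs out. Returns 0 on a match and -ETIMEDOUT on timeout.
 */
static int wait_scaled(read_fn read, void *ctx, uint32_t value, uint32_t mask,
                       uint32_t usec_timeout, uint32_t scale)
{
    /* Single source of truth for both the loop bound and the check. */
    const uint64_t limit = (uint64_t)usec_timeout * scale;
    uint64_t i;

    for (i = 0; i < limit; i++) {
        if ((read(ctx) & mask) == (value & mask))
            break;
        /* The kernel loop calls udelay(1) here; omitted in this sketch. */
    }

    if (i == limit)
        return -ETIMEDOUT;
    return 0;
}

/* Test helper: a register that always reads zero. */
static uint32_t read_stuck(void *ctx)
{
    (void)ctx;
    return 0;
}
```

Calling wait_scaled(read_stuck, NULL, 1, 1, 100, 10) polls 1000 times and reports a timeout, while waiting for the value 0 returns success on the first read.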


The PC takes significantly longer to boot (10 or so seconds when it's usually
instant) and the error still occurs. So I'm not sure it's just a matter of
waiting.
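
As a rough cross-check of that delay, the worst case of one polling loop is about one microsecond per iteration, ignoring the time spent reading the register. The sketch below does the arithmetic; the base timeout of 100000 microseconds used in the example is an assumption, not a value taken from this report, and both function names are hypothetical.

```c
#include <stdint.h>

/*
 * Worst-case busy-wait of one polling loop, in microseconds, assuming
 * one udelay(1) per iteration and ignoring register-read time.
 */
static uint64_t worst_case_wait_us(uint32_t usec_timeout, uint32_t multiplier)
{
    return (uint64_t)usec_timeout * multiplier;
}

/*
 * How many fully timed-out waits would account for an observed delay.
 * Rounds down; returns 0 if the per-wait budget is zero.
 */
static uint64_t timed_out_waits(uint64_t observed_delay_us,
                                uint32_t usec_timeout, uint32_t multiplier)
{
    uint64_t per_wait = worst_case_wait_us(usec_timeout, multiplier);

    return per_wait ? observed_delay_us / per_wait : 0;
}
```

Under that assumed base timeout, a tenfold budget is about one second per failed wait, so a delay of roughly 10 seconds would correspond to about ten waits that ran to the limit. That would fit the observation that extra waiting alone does not clear the error, but it is an estimate, not a measurement.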

