From mboxrd@z Thu Jan 1 00:00:00 1970
From: bugzilla-daemon@freedesktop.org
Subject: [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
Date: Mon, 12 Aug 2019 18:37:40 +0000
Message-ID:
References:
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="===============1000069588=="
Return-path:
Received: from culpepper.freedesktop.org (culpepper.freedesktop.org
[131.252.210.165])
by gabe.freedesktop.org (Postfix) with ESMTP id 4226D6E5BE
for ; Mon, 12 Aug 2019 18:37:40 +0000 (UTC)
In-Reply-To:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel"
To: dri-devel@lists.freedesktop.org
List-Id: dri-devel@lists.freedesktop.org
--===============1000069588==
Content-Type: multipart/alternative; boundary="15656350603.B7cD.7919"
Content-Transfer-Encoding: 7bit
--15656350603.B7cD.7919
Date: Mon, 12 Aug 2019 18:37:40 +0000
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://bugs.freedesktop.org/
Auto-Submitted: auto-generated
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #91 from ReddestDream ---
>It returns 0 on success and -EIO on failure, which is then in turn
>returned from vega20_set_fclk_to_highest_dpm_leve. Where did you see the
>check/retry on EINVAL? Perhaps -EIO should be -EINVAL?
I didn't find any check/retry code. It was more just a thought that maybe we
could keep vega20_set_uclk_to_highest_dpm_level from just returning on the
error and instead allow further initialization to proceed. Even if it
crashed, that might even be helpful, since it's not clear whether it's the
initialization (drm_dev_register) or something else that is silent in the
logs that is changing something and causing
vega20_set_uclk_to_highest_dpm_level to fail where we know it succeeded so
many times before.
>I'm not sure this is helpful but I managed to somewhat test the race
>condition theory.
If there is a race, I'm not sure it's in the time the driver waits for the
hardware registers to respond and/or for the value to be set. But it's still
enlightening.
At this point it seems more likely that something else we aren't seeing in
the logs is breaking vega20_set_uclk_to_highest_dpm_level in the last
moments (it is unlikely to be due to the dpm_state.hard_min_level value). It
then falls through, drm_dev_register runs, and the initialization message
prints. amdgpu doesn't consider the
"[SetUclkToHightestDpmLevel] Set hard min uclk failed!" message to be a
significant enough error to stop initialization. But maybe it should . . .
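
To make the two options concrete, here is a minimal userspace sketch in C.
Everything in it (struct fake_dev and the fake_* functions) is hypothetical
and only models the control flow discussed above: a step that returns 0 or
-EIO, one init path that logs the failure and still registers, and one that
stops. It is not the amdgpu code and does not talk to any hardware.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for device state; not a real amdgpu structure. */
struct fake_dev {
    bool smc_accepts_hard_min; /* simulates whether the firmware request succeeds */
    bool registered;           /* set once the simulated registration step has run */
};

/*
 * Simulated "set uclk to highest level" step.
 * Returns 0 on success and -EIO on failure, the convention described in
 * the quoted text. The real function and its message path are not modeled.
 */
static int fake_set_uclk_to_highest(const struct fake_dev *dev)
{
    assert(dev != NULL);
    if (!dev->smc_accepts_hard_min) {
        fprintf(stderr, "fake: set hard min uclk failed\n");
        return -EIO;
    }
    return 0;
}

/* Option A: the failure is only logged, and registration still happens. */
static int fake_init_continue_on_error(struct fake_dev *dev)
{
    assert(dev != NULL);
    (void)fake_set_uclk_to_highest(dev); /* error deliberately ignored */
    dev->registered = true;              /* stands in for drm_dev_register */
    return 0;
}

/* Option B: the failure is treated as fatal, and registration is skipped. */
static int fake_init_stop_on_error(struct fake_dev *dev)
{
    int ret;

    assert(dev != NULL);
    ret = fake_set_uclk_to_highest(dev);
    if (ret)
        return ret;                      /* propagate -EIO to the caller */
    dev->registered = true;
    return 0;
}
```

With smc_accepts_hard_min set to false, option A returns 0 and leaves
registered set, which is roughly what the logs suggest happens today, while
option B returns -EIO and never registers.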
-- 
You are receiving this mail because:
You are the assignee for the bug.
--15656350603.B7cD.7919--
--===============1000069588==--