From: bugzilla-daemon@freedesktop.org
Subject: [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
Date: Sun, 11 Aug 2019 01:15:48 +0000
To: dri-devel@lists.freedesktop.org

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #69 from ReddestDream ---

> The inconsistent nature of this bug and the fact that it sometimes doesn't
> appear suggests a race condition. I'd assume something else on the system
> happens before or after amdgpu is expecting.

> Is there any way to delay loading the amdgpu driver and manually loading it
> after everything else?

Based on all the data you (Tom B) and others have provided, as well as my own
tests, my current suspicion is that there is a bug in the display mode/type
detection and enumeration, which causes the driver to lose state consistency
and, eventually, contact with the hardware entirely.

I think the clock dysregulation and excessive voltage/wattage are symptoms of
the underlying disease rather than the cause. If what the driver thinks the
hardware state is differs from what the hardware state actually is, it's only
a matter of time before this inconsistency leads to dysregulation,
instability, and crashing. For this reason, I'm not convinced there is any
better workaround than "just use one monitor." Pushing up the clocks seems, at
best, to only delay the inevitable. :(

I'm also not convinced there is one commit in particular to point to here.
Rather, something was probably restructured between 5.0 and 5.1 that left it
fundamentally broken, whereas before it was always somewhat flawed.

Unfortunately, Radeon VII probably isn't really being tested by kernel
developers anymore, and it's likely that multimonitor with this card on Linux
was never fully tested at all. It also seems like AMD's kernel development has
moved on to Navi, and the upcoming new Vega card, Arcturus, won't have display
outputs at all, so work on that can't fix this issue.

As this card is fairly uncommon and expensive, the only real hope for a fix
seems to be to get the card into the hands of someone who has the skill to fix
graphics drivers and a willingness/need to test multimonitor.

Perhaps someone like gnif, who was able to solve the infamous Vega Reset Bug
on Vega 10 cards, might be able to fix it. It's likely he will encounter our
issue while testing Radeon VII with Looking Glass and such. Someone has
already offered to lend him a Radeon VII, as he states in the video, so
there's some hope that his work will lead to a solution.

https://www.youtube.com/watch?v=1ShkjXoG0O0
