From: bugzilla-daemon@freedesktop.org
To: dri-devel@lists.freedesktop.org
Subject: [Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.
Date: Sun, 10 Mar 2019 09:37:56 +0000

https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #75 from Allan ---
Well, after a long time I'm back to report what happened.

A very helpful AMD staff member was following up with me because of the
CPU, and that ended up solving the problems I had with the video card (or
so it seems).

1. Regarding the kernel timing

(In reply to fin4478 from comment #52)
> To prevent random kernel lock ups with Ryzen, fix this with bios, set to
> Typical Current Idle in the bios Advanced/AMD CBS menu.
>
> Use latest AMD wip kernel and Oibaf ppa Mesa. Disable display composting and
> vsync with Xfce. Use 300Hz kernel timer.
>
> Working kernel config file for my system as attachment.

Yes, I tried it a lot, believe me, in every combination I could think of:
300 Hz, 250 Hz, 1000 Hz, your config, the linux-firmware drivers.
At least 10 attempts with variations of your config, including an otherwise
unmodified one that only adds dm-crypt, which is not enabled in yours.

2. Regarding the PSU profile

As fin4478 suggested and AMD requested, I asked BIOSTAR for a BIOS that
allows changing it. They sent me a beta version to test.

No luck at all; it was unrelated.

3. The madness

Nothing worked, but the CPU was already OK. The motherboard was already OK,
and the video card was still hanging sometimes, now even on Windows.

So I took a shot in the dark and suspected some nonsensical incompatibility
with the RAM.

And that was it. Even after sending it in under warranty, even after
running 100+ tests, the RAM was the issue.

It was a Corsair Vengeance kit: 2x4GB DDR4 CL15, 2133MHz SPD (JEDEC),
3000MHz XMP2.

Even at JEDEC specifications it caused the system to fail.

Even when I loosened the latencies considerably, it still failed.

That is what was causing the amdgpu driver to fail, along with any heavy
application. Since data passes through system RAM before being sent to
VRAM, it makes sense that the driver/device would end up processing
something unexpected.

I warn everyone who uses Corsair memory, especially kits that are not part
of their "Ryzen ready" line. Even though there is a standard called JEDEC,
they simply don't implement it very well.

This is why I could sometimes use the system for 1-2 hours and sometimes
not even 5 minutes before it crashed. There is some kind of instability
there.

I sold it to a guy who uses an 8700K or something; I explained the
situation and he agreed. So far (more than 2 months) there has not been a
single issue related to the memory chips. They must have done something to
optimize for Intel beyond the XMP profile and compromised the whole
product. Along with 1 year of my life and a bunch of money spent.

However, the amdgpu fixes made over time did prove useful, so it was not
only the RAM's fault.
With the same RAM chips, I had far fewer problems than when I first
reported this bug.

Now I'm using G.Skill Trident Z 3200MHz at 2666MHz, which is the speed AMD
guarantees the 1800X to work with. It is stable, without a single problem
related to it.

4. To confirm that I won the raffle for a non-working system, my RX480 died
a month ago, probably because of a BGA problem.

Then I found a label on the card, looked it up, and discovered that a
seller had sold me a refurbished product as new.

Now I'm deciding whether to sue him or just fix the card.

I mention this because it is why I can't test again until I get another AMD
card. In the meantime I'm using the nvidia card that I couldn't sell.

5. The funny part.

The nvidia driver, which seemed a lot more stable at first, started to fail
like hell after I replaced the truly problematic CPU.

And the amdgpu driver became more stable, more than any other driver on
Linux or Windows.

Well, I think that's it. I'll return when I'm able to test amdgpu again.

But the verdict for now is:

I tested the RX480 without a single problem while using amdgpu. I did not
use it intensively, just ran common tests and played a little Left 4 Dead 2
without any issue (a good sign, since it always crashed before).

The card showed the BGA problem while I was using a variant of the
Adrenalin driver for Windows, doing some verifications requested by AMD.

Cheers to all.
Prefer G.Skill over Corsair.

--
You are receiving this mail because:
You are the assignee for the bug.
