public inbox for linux-kernel@vger.kernel.org
* [FYI] GCC segfaults under heavy multithreaded compilation with AMD Ryzen
@ 2017-07-25 21:54 Satoru Takeuchi
  2017-07-31 12:04 ` Alan Cox
  2017-08-11 10:07 ` Borislav Petkov
  0 siblings, 2 replies; 7+ messages in thread
From: Satoru Takeuchi @ 2017-07-25 21:54 UTC (permalink / raw)
  To: LKML; +Cc: x86

# I'm an LKML subscriber, but not an x86 list subscriber

I found the following new Linux kernel bugzilla entry about a
Ryzen-related problem. Since many developers don't check this bugzilla,
and I've also encountered this problem myself, I decided to bring it up
here.

https://bugzilla.kernel.org/show_bug.cgi?id=196481:
> I am running Ubuntu and installed the mainline kernel from the mainline PPA.
> It seems like the Ryzen processor has some bug that leads to gcc crashing
> when compiling a very large program under heavy load. This is easily reproduced
> in my system using the script from
>
> https://github.com/suaefar/ryzen-test
>
> (It assumes that you are running Ubuntu, maybe Debian also works. Just clone it and run the
> script kill_ryzen.sh. It downloads the gcc 7.1 code and starts multiple compilations of it. If any
> compilation fails, it warns the user, giving the time it took to detect the failure).
>
> There is already a bug report about this in the FreeBSD bugzilla
> (https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=219399#c89).
> There is also a thread on the subject in AMD community forum
> (https://community.amd.com/thread/215773?start=300&tstart=0)
> and Phoronix (https://www.phoronix.com/forums/forum/hardware/processors-memory/955368-some-ryzen-linux-users-are-facing-issues-with-heavy-compilation-loads).
>
> This is probably a processor bug. But I thought that I should try to call the attention of
> the kernel developers to this issue as it may be possible to work around it in the kernel.
>
> Obs: If I disable SMT in the BIOS, the problem gets much better, moving from failures
> after a couple of minutes to one failure in 3 to 4 hours.
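
The approach described in the quoted report (several compilations in
parallel, stop at the first failure, report how long it took) can be
sketched roughly as follows. This is only a minimal Python sketch under
stated assumptions, not the actual kill_ryzen.sh: the function name
stress_until_failure, its parameters, and the returned fields are all
hypothetical, and any non-zero exit status is simply treated as a failure.

```python
import subprocess
import sys
import threading
import time
from typing import Optional, Sequence


def stress_until_failure(cmd: Sequence[str], workers: int,
                         max_runs: int) -> Optional[dict]:
    """Run `cmd` repeatedly in `workers` parallel loops (hypothetical helper).

    Returns a dict describing the first failure observed, or None if every
    loop completed `max_runs` runs with exit status 0.
    """
    start = time.monotonic()
    stop = threading.Event()      # set once any worker sees a failure
    lock = threading.Lock()       # protects first_failure
    first_failure = []            # holds at most one result dict

    def loop(worker_id: int) -> None:
        for run in range(max_runs):
            if stop.is_set():
                return
            proc = subprocess.run(cmd,
                                  stdout=subprocess.DEVNULL,
                                  stderr=subprocess.DEVNULL)
            if proc.returncode != 0:
                # On POSIX, a negative returncode means the child itself
                # was killed by that signal number.
                with lock:
                    if not first_failure:
                        first_failure.append({
                            "worker": worker_id,
                            "run": run,
                            "returncode": proc.returncode,
                            "elapsed_s": time.monotonic() - start,
                        })
                stop.set()
                return

    threads = [threading.Thread(target=loop, args=(i,))
               for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return first_failure[0] if first_failure else None
```

For example, stress_until_failure(["make", "-C", "gcc-build"], workers=16,
max_runs=100) would loop a build in a hypothetical gcc-build directory. A
real test would also have to clean the build tree between runs so that
each iteration actually recompiles something.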

What I want is for this problem to become known to many people,
especially x86 experts, so that we can get hints toward finding the
root cause and write a reliable workaround patch.

Summary of this problem from my point of view:
- gcc sometimes fails with SEGV at random
- at least part of this problem is caused by the CPU running
instructions at "RIP - 0x40" (64 bytes before the expected address)
- tens of people have encountered this problem
- it is probably a hardware problem: other OSes and environments (WSL,
NetBSD, and FreeBSD) have encountered a very similar problem. In
addition, it happens with ECC memory and with memory that passes
memtest86
- the root cause has not been found yet. AMD seems to have been looking
for it for several months, but there has been no update from AMD yet
- there is a workaround patch in FreeBSD, but it is not clear that it is
reliable, since the root cause is unknown

For more details, please refer to the links in the above-mentioned bugzilla entry.

Regards,
Satoru


Thread overview: 7+ messages
2017-07-25 21:54 [FYI] GCC segfaults under heavy multithreaded compilation with AMD Ryzen Satoru Takeuchi
2017-07-31 12:04 ` Alan Cox
2017-07-31 12:22   ` Markus Trippelsdorf
2017-08-11 10:07 ` Borislav Petkov
2017-09-16 12:25   ` Satoru Takeuchi
     [not found] <u7g1Z-8nQ-33@gated-at.bofh.it>
2017-07-26  1:42 ` Andreas Hartmann
     [not found] ` <u9hGi-686-19@gated-at.bofh.it>
2017-07-31 17:00   ` Andreas Hartmann
