* Current state of the Linux kernel on SPARC
From: John Paul Adrian Glaubitz @ 2025-08-24 21:09 UTC
To: sparclinux
Cc: <debian-sparc@lists.debian.org>, gentoo-sparc,
Michael Karcher, Anthony Yznaga
Hello,
since there has been a lot of recent activity around the Linux kernel on SPARC
and there are also a lot of issues to be dealt with and unmerged patches, I have
decided to summarize the current state of the Linux kernel on SPARC to bring
anyone interested up to date.
First, let's start with the bugs. It has been known for a while that recent kernel
versions can be very unstable on certain SPARC machines. This has been observed in
particular with UltraSPARC III CPUs, but also on certain newer CPUs such as the SPARC T1.
After I started bisecting the issue, I ran into multiple false positives until I
identified d53d2f78cead, which makes use of a new vmalloc flag called
VM_FLUSH_RESET_PERMS, as the culprit.
However, this particular change is not actually broken; it merely uncovered the
original bug. The introduction of VM_FLUSH_RESET_PERMS allowed the kernel to flush
TLBs earlier after booting and more often. And since the original problem was suspected
to lie in the TLB flush management on SPARC, it was only natural that the change in
d53d2f78cead turned up in the bisect.
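As an illustration of why an intermittent failure produces false positives during
a bisect, here is a minimal Python sketch. It assumes a hypothetical reproducer that
only crashes on some boots; the commit names c0..c7, the helper make_flaky() and the
runs_per_commit parameter are made up for the example and do not describe the actual
test setup used here.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str],
              crashes: Callable[[str], bool],
              runs_per_commit: int = 5) -> str:
    """Binary-search for the first commit at which the failure reproduces.

    commits is ordered oldest to newest; commits[0] is assumed good and
    commits[-1] is assumed bad, as with `git bisect start`.
    """
    def is_bad(commit: str) -> bool:
        # A single crash marks the commit bad, but it is only called good
        # after runs_per_commit clean runs in a row.
        return any(crashes(commit) for _ in range(runs_per_commit))

    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]


def make_flaky(first_exposed: int, every: int = 3) -> Callable[[str], bool]:
    """Hypothetical reproducer: affected commits crash on every `every`-th boot."""
    boots = {"n": 0}

    def crashes(commit: str) -> bool:
        boots["n"] += 1
        exposed = int(commit[1:]) >= first_exposed  # commits are named "c<N>"
        return exposed and boots["n"] % every == 0

    return crashes


if __name__ == "__main__":
    commits = [f"c{i}" for i in range(8)]
    print(first_bad(commits, make_flaky(5), runs_per_commit=1))
    print(first_bad(commits, make_flaky(5), runs_per_commit=5))
```

In this toy setup, one boot per step blames c6, while five boots per step find c5.
Even a correct result only names the commit at which the failure becomes
reproducible, which may be a change that exposes an older bug rather than the bug itself.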
Further investigation showed that the actual culprits are the CPU-specific implementations
of copy_{from,to}_user, which can be found in arch/sparc/lib. These are broken to different
degrees on different CPU types, which explains why recent kernels are more
unstable on certain CPU types than on others.
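For readers unfamiliar with the interface, the following Python model sketches the
general contract of these routines: they return the number of bytes that could NOT
be copied, so zero means success. This message does not describe what exactly is
wrong in the SPARC routines, so the buggy variant below is purely hypothetical; the
`readable` parameter and the read_exactly() helper are likewise assumptions made for
the example.

```python
import errno


def copy_from_user(dst: bytearray, src: bytes, n: int, readable: int) -> int:
    """Model of the contract: copy up to n bytes, return the bytes NOT copied.

    `readable` is a hypothetical stand-in for how many bytes of src can be
    read before a fault occurs. Assumes len(dst) >= n and len(src) >= n.
    """
    copied = min(n, readable)
    dst[:copied] = src[:copied]
    return n - copied


def buggy_copy_from_user(dst: bytearray, src: bytes, n: int, readable: int) -> int:
    """Hypothetical defect: the partial copy happens, but success is reported."""
    copy_from_user(dst, src, n, readable)
    return 0


def read_exactly(src: bytes, n: int, readable: int, copy=copy_from_user) -> bytes:
    """Caller that trusts the residual count, as kernel code does."""
    buf = bytearray(n)
    if copy(buf, src, n, readable) != 0:
        raise OSError(errno.EFAULT, "Bad address")
    return bytes(buf)
```

With the correct model, a fault after five of eight bytes makes read_exactly() raise
EFAULT. With the hypothetical buggy variant, the caller silently continues with three
bytes that were never filled in, which is the kind of error that can surface far away
from the copy routine itself.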
Luckily, Michael Karcher has already made good progress in investigating and fixing these
bugs. For example, a trial patch for the UltraSPARC III showed that a simple
one-line change would fix all the stability issues currently seen on these CPUs.
It is expected that a series of patches will follow shortly that will address the bugs in
the copy_{from,to}_user implementations on all affected CPU types. In the meantime, it
should be possible to switch the kernel to the generic code for copy_{from,to}_user, which
can be found in the same source directory, to get a stable system on any CPU type.
Another issue that was discovered was that support for HugeTLB was broken on sun4u. A patch
addressing the problem has been posted by Anthony Yznaga in [1]. Additional pending patches
fix the error handling in scan_one_device() [2] and switch sparc64 to the generic vDSO
library [3].
Once the stability issues have been fixed, the focus should be on upstreaming feature patches
that Oracle engineers developed but never sent in for review. These can be found in Oracle's
GitHub repository for the UEK kernel in the uek4/qu7 branch [4].
These feature patches include useful additions such as support for kexec [5], 5-level page
table support [6], EFI support for newer servers [7], support for SPARC M8 [8], fixes for
SPARC M7 [9] and even support for running the Linux kernel as a primary LDOM [10] and many
other improvements.
So there is definitely a lot of work to be done on Linux for SPARC, and we're going to
be busy for some more years. Hopefully, some folks from Oracle can step in and help upstream
some of the patches from Oracle's UEK kernel. Primary domain support in particular would be
very nice to have on Linux as this would allow creating logical domains on sun4v hardware
without having to install Solaris.
Cheers,
Adrian
> [1] https://lore.kernel.org/all/20250716012446.10357-1-anthony.yznaga@oracle.com/
> [2] https://lore.kernel.org/all/20250718093205.3403010-1-make24@iscas.ac.cn/
> [3] https://lore.kernel.org/all/20250815-vdso-sparc64-generic-2-v2-0-b5ff80672347@linutronix.de/
> [4] https://github.com/oracle/linux-uek/tree/uek4/qu7/
> [5] https://github.com/oracle/linux-uek/commit/6fa4477f7e671b4882517a0862d3ee3f65ff4bde (there are multiple patches for kexec)
> [6] https://github.com/oracle/linux-uek/commit/9783abbe2d19da0d36a2b1caa4b15d965ee68384
> [7] https://github.com/oracle/linux-uek/commit/127ca6582a90567ded4fa6168c1582d2d5ac37f0
> [8] https://github.com/oracle/linux-uek/commit/5fe100ac31a6f977ebb64ce4eea7b0e3de7dbe04
> [9] https://github.com/oracle/linux-uek/commit/efcafbab1b123d615c1f2683c98fccc5ccee1527
> [10] https://github.com/oracle/linux-uek/commit/6c87154b63230bc5e35c5df133e7ecfadf47b828
--
.''`. John Paul Adrian Glaubitz
: :' : Debian Developer
`. `' Physicist
`- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913
* Re: Current state of the Linux kernel on SPARC
From: John Paul Adrian Glaubitz @ 2025-08-26 16:08 UTC
To: sparclinux
Cc: <debian-sparc@lists.debian.org>, gentoo-sparc,
Michael Karcher, Anthony Yznaga
Hello,
On Sun, 2025-08-24 at 23:09 +0200, John Paul Adrian Glaubitz wrote:
> First, let's start with the bugs. It has been known for a while that recent kernel
> versions can be very unstable on certain SPARC machines. This has been observed in
> particular with UltraSPARC III CPUs, but also on certain newer CPUs such as the SPARC T1.
>
> After I started bisecting the issue, I ran into multiple false positives until I
> identified d53d2f78cead, which makes use of a new vmalloc flag called
> VM_FLUSH_RESET_PERMS, as the culprit.
>
> However, this particular change is not actually broken; it merely uncovered the
> original bug. The introduction of VM_FLUSH_RESET_PERMS allowed the kernel to flush
> TLBs earlier after booting and more often. And since the original problem was suspected
> to lie in the TLB flush management on SPARC, it was only natural that the change in
> d53d2f78cead turned up in the bisect.
>
> Further investigation showed that the actual culprits are the CPU-specific implementations
> of copy_{from,to}_user, which can be found in arch/sparc/lib. These are broken to different
> degrees on different CPU types, which explains why recent kernels are more
> unstable on certain CPU types than on others.
>
> Luckily, Michael Karcher has already made good progress in investigating and fixing these
> bugs. For example, a trial patch for the UltraSPARC III showed that a simple
> one-line change would fix all the stability issues currently seen on these CPUs.
>
> It is expected that a series of patches will follow shortly that will address the bugs in
> the copy_{from,to}_user implementations on all affected CPU types. In the meantime, it
> should be possible to switch the kernel to the generic code for copy_{from,to}_user, which
> can be found in the same source directory, to get a stable system on any CPU type.
A patch series to address these bugs was just posted:
https://lore.kernel.org/all/20250826160312.2070-1-kernel@mkarcher.dialup.fu-berlin.de/
Please test and report back!
Adrian
--
.''`. John Paul Adrian Glaubitz
: :' : Debian Developer
`. `' Physicist
`- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913