From: Thomas Gleixner <tglx@kernel.org>
To: Mathias Stearn <mathias@mongodb.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
Catalin Marinas <catalin.marinas@arm.com>,
Will Deacon <will@kernel.org>, Boqun Feng <boqun.feng@gmail.com>,
"Paul E. McKenney" <paulmck@kernel.org>,
Chris Kennelly <ckennelly@google.com>,
Dmitry Vyukov <dvyukov@google.com>,
regressions@lists.linux.dev, linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org,
Ingo Molnar <mingo@kernel.org>,
Mark Rutland <mark.rutland@arm.com>,
Jinjie Ruan <ruanjinjie@huawei.com>,
Blake Oler <blake.oler@mongodb.com>,
Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [REGRESSION] rseq: refactoring in v6.19 broke everyone on arm64 and tcmalloc everywhere
Date: Thu, 23 Apr 2026 19:19:41 +0200
Message-ID: <87cxzp1tn6.ffs@tglx>
In-Reply-To: <CAHnCjA07ER=9xBqXq4jf5yn052W-E9ZXD86HxnRGtai6bpXbbQ@mail.gmail.com>
On Thu, Apr 23 2026 at 14:11, Mathias Stearn wrote:
Cc+ Linus
> Of course, even if we make that change, it will only apply to _future_
> binaries. That's why we prefer a kernel fix so that users will be able
> to run our existing releases (or any containers that use them) on a
> modern kernel.
I understand that and, like everyone else, I would be happy to do that,
but the price everyone pays for proliferating the tcmalloc insanity is
not cheap either.
So let me recap the whole situation and how we got here:
1) The original RSEQ implementation updated the rseq::cpu_id_start
field in user space more or less unconditionally on every exit to
user, whether the CPU/MMCID had changed or not.
That went unnoticed for years because nothing used rseq aside from
Google and tcmalloc. Once glibc registered rseq, this resulted in a
performance penalty of up to 15% for syscall-heavy workloads.
2) The rseq::cpu_id_start field is documented as read only for user
space in the ABI contract and guaranteed to be updated by the
kernel when a task is migrated to a different CPU.
3) The RO for userspace property has been enforced by RSEQ debugging
mode since day one. If such a debug enabled kernel detects user
space changing the field it kills the task/application.
4) tcmalloc abused the suboptimal implementation (see #1) and
scribbled over rseq::cpu_id_start for their own nefarious purposes.
5) As a consequence of #4 tcmalloc cannot be used on an RSEQ debug
enabled kernel. Which means a developer cannot validate his RSEQ
code against a debug kernel when tcmalloc is in use on the system
as that would crash the tcmalloc dependent applications due to #3.
6) As a consequence of #4 tcmalloc cannot be used together with any
other facility/library which wants to utilize the ABI guaranteed
properties of rseq::cpu_id_start in the same application.
7) tcmalloc has violated the ABI from day one and has since refused to
address the problem despite being offered a kernel side rseq
extension to solve it many years ago.
8) When addressing the performance issues of RSEQ the unconditional
update was removed under the valid assumption that the kernel only
has to satisfy the guaranteed ABI properties, especially when
they are enforceable by RSEQ debug.
As a consequence this exposed the tcmalloc ABI violation because
the unconditional pointless overwriting of something which did not
change stopped happening.
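For illustration only, here is a minimal user-space model of #1, #3 and #8. It is a sketch, not kernel code: the types and functions (model_rseq, model_task, exit_to_user_unconditional, exit_to_user_on_change, debug_field_intact) are hypothetical names, and the "write only when the CPU changed" rule is a simplified assumption standing in for the refactored behaviour.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the user-space rseq area (one field only). */
struct model_rseq {
	uint32_t cpu_id_start;
};

/* Hypothetical stand-in for per-task kernel state. */
struct model_task {
	uint32_t cpu;		/* CPU the task runs on now */
	uint32_t last_cpu;	/* CPU last published to user space */
};

/* Model of #1: store the field on every exit to user. */
static bool exit_to_user_unconditional(struct model_task *t,
				       struct model_rseq *rs)
{
	rs->cpu_id_start = t->cpu;
	t->last_cpu = t->cpu;
	return true;		/* a store always happened */
}

/* Model of #8 (assumed simplification): store only if the CPU changed. */
static bool exit_to_user_on_change(struct model_task *t,
				   struct model_rseq *rs)
{
	if (t->cpu == t->last_cpu)
		return false;	/* nothing changed, no store */
	rs->cpu_id_start = t->cpu;
	t->last_cpu = t->cpu;
	return true;
}

/* Model of #3: a debug check that notices user-space writes. */
static bool debug_field_intact(const struct model_task *t,
			       const struct model_rseq *rs)
{
	return rs->cpu_id_start == t->last_cpu;
}
```

In this model a value scribbled into cpu_id_start by user space is silently repaired by the unconditional variant on the next exit to user, while the on-change variant leaves it in place until the task migrates. The debug check flags the scribble in either case.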
Due to #4 everyone is in a hard place and up a creek without a paddle.
Here are the possible solutions:
A) Mathias suggested to force overwrite rseq::cpu_id_start every time
the rseq::rseq_cs field is cleared by the kernel under the not yet
validated theoretical assumption that this cures the problem for
tcmalloc.
If that's sufficient, it would be harmless performance-wise
because the write would be inside the already existing STAC/CLAC
section and just add some more noise to the rseq critical section
operations.
That would allow existing tcmalloc usage to continue, but
obviously would solve neither #5 nor #6 above, nor provide an
incentive for tcmalloc to actually fix their crap.
B) If that's not sufficient then keeping tcmalloc alive would require
going back to the previous state and letting everyone else pay the
price in terms of performance overhead.
C) Declare that this is not a regression because the ABI guarantee is
not violated and the RO property has been enforceable by RSEQ
debugging since day one.
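As a rough sketch of what option #A amounts to, the following user-space model couples the clearing of rseq_cs with a forced store to cpu_id_start. It is not the kernel implementation: model_rseq_area and model_clear_rseq_cs are hypothetical names, the two-field struct is not the real layout, and whether this behaviour is enough for tcmalloc is exactly the unvalidated assumption mentioned in #A.

```c
#include <stdint.h>

/* Hypothetical two-field stand-in for the user-space rseq area. */
struct model_rseq_area {
	uint32_t cpu_id_start;
	uint64_t rseq_cs;	/* non-zero while a critical section is registered */
};

/*
 * Model of option A: whenever rseq_cs is cleared, cpu_id_start is also
 * rewritten with the current CPU, whether or not the CPU changed.
 * Returns 1 if rseq_cs was cleared, 0 if nothing was registered.
 */
static int model_clear_rseq_cs(struct model_rseq_area *rs, uint32_t cur_cpu)
{
	if (rs->rseq_cs == 0)
		return 0;
	rs->rseq_cs = 0;
	rs->cpu_id_start = cur_cpu;	/* the extra store proposed in #A */
	return 1;
}
```

The point of the model is that the extra store rides along with a write that already happens, which is why #A is described as cheap. When no critical section is registered, nothing is written at all.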
In my opinion #C is the right thing to do, but I can see a case being
made for the lightweight fix Mathias suggested (#A) _if_ and only _if_
that is sufficient. Picking #A would also mean that user space people
have to take up the fight against tcmalloc when they want to use the
RSEQ guaranteed ABI along with tcmalloc in the same application or use an
RSEQ debug kernel to validate their own code.
Going back to the full unconditional nightmare (#B) is not an option at
all as everybody else would have to take the massive performance hit.
Oh well...
Thanks,
tglx