public inbox for linux-arm-kernel@lists.infradead.org
From: Thomas Gleixner <tglx@kernel.org>
To: Mathias Stearn <mathias@mongodb.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>, Boqun Feng <boqun.feng@gmail.com>,
	"Paul E. McKenney" <paulmck@kernel.org>,
	Chris Kennelly <ckennelly@google.com>,
	Dmitry Vyukov <dvyukov@google.com>,
	regressions@lists.linux.dev, linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	Ingo Molnar <mingo@kernel.org>,
	Mark Rutland <mark.rutland@arm.com>,
	Jinjie Ruan <ruanjinjie@huawei.com>,
	Blake Oler <blake.oler@mongodb.com>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [REGRESSION] rseq: refactoring in v6.19 broke everyone on arm64 and tcmalloc everywhere
Date: Thu, 23 Apr 2026 19:19:41 +0200
Message-ID: <87cxzp1tn6.ffs@tglx>
In-Reply-To: <CAHnCjA07ER=9xBqXq4jf5yn052W-E9ZXD86HxnRGtai6bpXbbQ@mail.gmail.com>

On Thu, Apr 23 2026 at 14:11, Mathias Stearn wrote:

Cc+ Linus

> Of course, even if we make that change, it will only apply to _future_
> binaries. That's why we prefer a kernel fix so that users will be able
> to run our existing releases (or any containers that use them) on a
> modern kernel.

I understand that, and like everyone else I would be happy to do that,
but the price everyone pays for proliferating the tcmalloc insanity is
not cheap either.

So let me recap the whole situation and how we got there:

  1) The original RSEQ implementation updates the rseq::cpu_id_start
     field in user space more or less unconditionally on every exit to
     user, whether the CPU/MMCID have been changed or not.

     That went unnoticed for years because nothing used rseq aside from
     Google and tcmalloc. Once glibc registered rseq, this resulted in a
     performance penalty of up to 15% for syscall-heavy workloads.

  2) The rseq::cpu_id_start field is documented as read only for user
     space in the ABI contract and guaranteed to be updated by the
     kernel when a task is migrated to a different CPU.

  3) The RO for userspace property has been enforced by RSEQ debugging
     mode since day one. If such a debug enabled kernel detects user
     space changing the field it kills the task/application.

  4) tcmalloc abused the suboptimal implementation (see #1) and
     scribbled over rseq::cpu_id_start for their own nefarious purposes.

  5) As a consequence of #4 tcmalloc cannot be used on a RSEQ debug
     enabled kernel. Which means a developer cannot validate his RSEQ
     code against a debug kernel when tcmalloc is in use on the system
     as that would crash the tcmalloc dependent applications due to #3.

  6) As a consequence of #4 tcmalloc cannot be used together with any
     other facility/library which wants to utilize the ABI guaranteed
     properties of rseq::cpu_id_start in the same application.

  7) tcmalloc has violated the ABI since day one and has refused to
     address the problem despite being offered a kernel side rseq
     extension to solve it many years ago.

  8) When addressing the performance issues of RSEQ the unconditional
     update was removed under the valid assumption that the kernel
     only has to satisfy the guaranteed ABI properties, especially when
     they are enforceable by RSEQ debug.

     As a consequence this exposed the tcmalloc ABI violation because
     the unconditional pointless overwriting of something which did not
     change stopped happening.
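
To make the interaction between #1, #3, #4 and #8 concrete, here is a
toy user-space model. It is not kernel code: the names toy_rseq,
toy_task and the toy_* functions are made up for illustration, only the
cpu_id_start field is modelled, and the real implementation tracks far
more state. Treat it as a sketch under those assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for the user-space rseq area; only one field is modelled. */
struct toy_rseq {
	uint32_t cpu_id_start;
};

/* Toy stand-in for per-task kernel state (hypothetical layout). */
struct toy_task {
	uint32_t cpu;		/* CPU the task runs on now */
	uint32_t written;	/* value the kernel last stored in cpu_id_start */
};

/* #1: old behaviour, store on every exit to user space. */
static void toy_exit_unconditional(struct toy_task *t, struct toy_rseq *r)
{
	r->cpu_id_start = t->cpu;
	t->written = t->cpu;
}

/* #8: new behaviour, store only when the CPU actually changed. */
static void toy_exit_on_change(struct toy_task *t, struct toy_rseq *r)
{
	if (t->cpu != t->written) {
		r->cpu_id_start = t->cpu;
		t->written = t->cpu;
	}
}

/* #3: debug check, true when user space modified the read-only field. */
static bool toy_debug_violation(const struct toy_task *t,
				const struct toy_rseq *r)
{
	return r->cpu_id_start != t->written;
}
```

In this model a user-space store to cpu_id_start (#4) is silently
repaired by toy_exit_unconditional() on the next exit, but survives
toy_exit_on_change() until the task migrates. toy_debug_violation()
flags the store in either case.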

Due to #4 everyone is in a hard place and up a creek without a paddle.

Here are the possible solutions:

  A) Mathias suggested force-overwriting rseq::cpu_id_start every time
     the rseq::rseq_cs field is cleared by the kernel, under the not yet
     validated theoretical assumption that this cures the problem for
     tcmalloc.

     If that's sufficient that would be harmless performance wise
     because the write would be inside the already existing STAC/CLAC
     section and just add some more noise to the rseq critical section
     operations.

     That would allow existing tcmalloc usage to continue, but
     obviously would solve neither #5 nor #6 above, nor provide an
     incentive for tcmalloc to actually fix their crap.

  B) If that's not sufficient then keeping tcmalloc alive would require
     going back to the previous state and letting everyone else pay the
     price in terms of performance overhead.

  C) Declare that this is not a regression because the ABI guarantee is
     not violated and the RO property has been enforceable by RSEQ
     debugging since day one.
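
For illustration, option #A can be modelled in a few lines of
user-space C. This is only a sketch: toy_rseq and toy_clear_rseq_cs()
are hypothetical names, the real kernel would go through its user
access helpers inside the existing STAC/CLAC section, and whether the
extra store is actually sufficient for tcmalloc is exactly the
unvalidated assumption mentioned under #A.

```c
#include <stdint.h>

/* Toy stand-in for the user-space rseq area (two fields only). */
struct toy_rseq {
	uint32_t cpu_id_start;
	uint64_t rseq_cs;	/* non-zero while a critical section is set */
};

/*
 * Option #A, modelled: the clearing of rseq_cs already writes to the
 * rseq area, so the store to cpu_id_start piggybacks on that write
 * instead of happening on every exit to user space.
 */
static void toy_clear_rseq_cs(struct toy_rseq *r, uint32_t cpu)
{
	r->rseq_cs = 0;
	r->cpu_id_start = cpu;	/* the additional store proposed in #A */
}
```

The extra store happens only on the path that already touches the rseq
area, which is why it would not bring back the cost described in #1.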

In my opinion #C is the right thing to do, but I can see a case being
made for the lightweight fix Mathias suggested (#A) _if_ and only _if_
that is sufficient. Picking #A would also mean that user space people
have to take up the fight against tcmalloc when they want to use the
RSEQ guaranteed ABI along with tcmalloc in the same application or use a
RSEQ debug kernel to validate their own code.

Going back to the full unconditional nightmare (#B) is not an option at
all, as everybody else would have to take the massive performance hit.

Oh well...

Thanks,

        tglx


