From: Mark Rutland <mark.rutland@arm.com>
To: Mathias Stearn <mathias@mongodb.com>
Cc: Thomas Gleixner <tglx@linutronix.de>,
Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
Catalin Marinas <catalin.marinas@arm.com>,
Will Deacon <will@kernel.org>, Boqun Feng <boqun.feng@gmail.com>,
"Paul E. McKenney" <paulmck@kernel.org>,
Chris Kennelly <ckennelly@google.com>,
Dmitry Vyukov <dvyukov@google.com>,
regressions@lists.linux.dev, linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org,
Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@kernel.org>,
Jinjie Ruan <ruanjinjie@huawei.com>,
Blake Oler <blake.oler@mongodb.com>
Subject: Re: [REGRESSION] rseq: refactoring in v6.19 broke everyone on arm64 and tcmalloc everywhere
Date: Wed, 22 Apr 2026 14:09:09 +0100 [thread overview]
Message-ID: <aejCaG6n9s7ak5TO@J2N7QTR9R3.cambridge.arm.com> (raw)
In-Reply-To: <CAHnCjA25b+nO2n5CeifknSKHssJpPrjnf+dtr7UgzRw4Zgu=oA@mail.gmail.com>
Hi Mathias,
On Wed, Apr 22, 2026 at 11:50:26AM +0200, Mathias Stearn wrote:
> TL;DR: As of 6.19, rseq no longer provides the documented atomicity
> guarantees on arm64 by failing to abort the critical section on same-core
> preemption/resumption. Additionally, it breaks tcmalloc specifically by
> failing to overwrite the cpu_id_start field at points where it was relied
> on for correctness.
Thanks for the report, and the test case.
As a holding reply, I'm looking into this now from the arm64 side.
I'll leave it to Thomas/Peter/Mathieu to comment w.r.t. the issue you
raise with cpu_id_start.
For some reason, this mail didn't make it to my inbox, and I had to grab
it from lore using b4. That might be a problem with my local mail
server; I'm just noting that in case others also didn't receive this.
Mark.
> This is a SEVERE breakage for MongoDB. We received several user reports of
> crashes on 6.19. I made a stress test that showed that 6.19 can cause
> malloc to return the same pointer twice without it being freed. Because
> that can cause arbitrary corruption, our latest releases have all been
> patched to refuse to start at all on 6.19+.
>
> TCMalloc uses rseq in a "creative" way described at
> https://github.com/google/tcmalloc/blob/master/docs/rseq.md. In particular,
> the "Current CPU Slabs Pointer Caching" section describes an optimization
> that relies on an undocumented fact that the kernel was always overwriting
> cpu_id_start (even when it wouldn't change) to invalidate a user-space
> cache. Since the change to stop writing cpu_id_start seemed to be
> intentional as part of a refactoring merged in 2b09f480f0a1, I started
> working on a userspace patch to stop relying on that. Unfortunately when
> that was complete I ran into a wall that is impossible to work around from
> userspace.
>
> On arm64, the kernel no longer meets the documented guarantee that rseq
> critical sections are atomic with respect to preemption. It seems to only
> abort the critical section when the thread is migrated to a different core.
> The attached test proves it and passes on x86 both before and after 6.19,
> and on arm before 6.19, but fails on arm with 6.19. It pins the process to
> a single core and then has an rseq critical section that observes a change
> made by another thread which is supposed to be impossible. I think this
> will break basically any real usage of rseq, other than just reading the
> current cpu_id.
>
> An LLM pointed to these two specific commits in the refactor as causing
> this (oldest first):
> - 39a167560a61 rseq: Optimize event setting
> This assumed that user_irq would be set on preemption but it wasn't on
> arm64, so TIF_NOTIFY_RESUME isn't raised on same cpu preemption.
> - 566d8015f7ee rseq: Avoid CPU/MM CID updates when no event pending
> This broke TCMalloc slab caching trick by not overwriting cpu_id_start on
> every return to userspace
>
> (I have a lot more analysis and suggested fixes from LLMs since I used them
> heavily in this testing and analysis, but I won't spam you with the slop
> unless requested)
>
> The arm64 change is a clear breakage and I'm sure it will be
> uncontroversial to fix. I can imagine more resistance to reverting to the
> old behavior of always overwriting the cpu_id_start field since that seems
> to have been an intentional optimization choice. I have reached out to the
> TCMalloc maintainers (CC'd) and believe there is a solution that gets the
> vast majority of the optimization while still preserving the behavior that
> TCMalloc currently relies on[1].
>
> Any time a critical section might be aborted (migration, preemption, signal
> delivery, and membarrier IPI), the kernel already must (but doesn't on
> arm64 at the moment) check the rseq_cs field to see if the thread is in a
> critical section, and is documented as nulling the pointer after (I assume
> to make later checks cheaper). It would be sufficient for tcmalloc's
> internal usage if every time the kernel nulled out rseq_cs, it also wrote
> the cpu id to cpu_id_start. That should be essentially free since you are
> already writing to the same cache line. It was pointed out that that could
> be an issue if another rseq user in the same thread nulled rseq_cs after
> its critical section, which would require the kernel to update cpu_id_start
> each time it checks rseq_cs, regardless of whether it nulls it. We aren't
> aware of any processes that mix tcmalloc with other rseq usages that null
> out the field from userspace, but we can't rule them out since it is open
> source. Either way, this preserves the property of not updating
> cpu_id_start on every syscall return and non-membarrier interrupts, which I
> assume is where the majority of the optimization win was from.
>
> All testing of problematic versions was performed on x86_64 and
> aarch64 Ubuntu 24.04.4 with the kernel manually upgraded to
> 6.19.8-061908-generic. Source analysis was performed on the v6.19 tag. I
> had a few AI agents confirm that nothing in the relevant changes to master
> should have solved this, but I have not yet tested there.
>
> $ cat /proc/version
> Linux version 6.19.8-061908-generic (kernel@balboa)
> (aarch64-linux-gnu-gcc-15 (Ubuntu 15.2.0-15ubuntu1) 15.2.0, GNU ld (GNU
> Binutils for Ubuntu) 2.46) #202603131837 SMP PREEMPT_DYNAMIC Sat Mar 14
> 00:00:07 UTC 2026
>
> [1] There is also an exploration of some options to make tcmalloc not rely
> on the cpu_id_start overwriting. However we would strongly prefer that
> existing binaries continue to work on 6.19 kernels, even if newer binaries
> don't need that. At least for a good while.
next prev parent reply other threads:[~2026-04-22 13:09 UTC|newest]
Thread overview: 56+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <CAHnCjA25b+nO2n5CeifknSKHssJpPrjnf+dtr7UgzRw4Zgu=oA@mail.gmail.com>
2026-04-22 12:56 ` [REGRESSION] rseq: refactoring in v6.19 broke everyone on arm64 and tcmalloc everywhere Peter Zijlstra
2026-04-22 13:13 ` Peter Zijlstra
2026-04-23 10:38 ` Mathias Stearn
[not found] ` <CAHnCjA2fa+dP1+yCYNQrTXQaW-JdtfMj7wMikwMeeCRg-3NhiA@mail.gmail.com>
2026-04-23 11:48 ` Thomas Gleixner
2026-04-23 12:11 ` Mathias Stearn
2026-04-23 17:19 ` Thomas Gleixner
2026-04-23 17:38 ` Chris Kennelly
2026-04-23 17:47 ` Mathieu Desnoyers
2026-04-23 19:39 ` Thomas Gleixner
2026-04-23 17:41 ` Linus Torvalds
2026-04-23 18:35 ` Mathias Stearn
2026-04-23 18:53 ` Mark Rutland
2026-04-23 21:03 ` Thomas Gleixner
2026-04-23 21:28 ` Linus Torvalds
2026-04-23 23:08 ` Linus Torvalds
2026-04-27 7:06 ` Florian Weimer
2026-04-27 16:12 ` Linus Torvalds
2026-04-22 13:09 ` Mark Rutland [this message]
2026-04-22 17:49 ` Thomas Gleixner
2026-04-22 18:11 ` Mark Rutland
2026-04-22 19:47 ` Thomas Gleixner
2026-04-23 1:48 ` Jinjie Ruan
2026-04-23 5:53 ` Dmitry Vyukov
2026-04-23 10:39 ` Thomas Gleixner
2026-04-23 10:51 ` Mathias Stearn
2026-04-23 12:24 ` David Laight
2026-04-23 19:31 ` Thomas Gleixner
2026-04-24 7:56 ` Dmitry Vyukov
2026-04-24 8:32 ` Mathias Stearn
2026-04-24 9:30 ` Dmitry Vyukov
2026-04-24 14:16 ` Thomas Gleixner
2026-04-24 15:03 ` Peter Zijlstra
2026-04-24 19:44 ` Thomas Gleixner
2026-04-26 22:04 ` Thomas Gleixner
2026-04-27 7:40 ` Florian Weimer
2026-04-27 11:03 ` Thomas Gleixner
2026-04-27 18:35 ` Mathieu Desnoyers
2026-04-27 21:06 ` Thomas Gleixner
2026-04-28 6:11 ` Dmitry Vyukov
2026-04-28 8:07 ` Thomas Gleixner
2026-04-28 8:18 ` Thomas Gleixner
[not found] ` <CACT4Y+b_RH2eZMuh1YUyqnoK-5KUpdWW4z1q2ZQWkY_GcBqmNw@mail.gmail.com>
[not found] ` <CAHnCjA2sCwOumOjWm=wW=Kj0C83KVW5zS+51=9=YSeAzuEaVQA@mail.gmail.com>
2026-04-28 15:46 ` Thomas Gleixner
2026-04-28 7:39 ` Peter Zijlstra
2026-04-28 8:13 ` Peter Zijlstra
2026-04-28 8:51 ` Thomas Gleixner
2026-04-28 8:03 ` Peter Zijlstra
2026-04-28 8:36 ` Thomas Gleixner
2026-04-23 12:11 ` Alejandro Colomar
2026-04-23 12:54 ` Mathieu Desnoyers
2026-04-23 12:29 ` Mathieu Desnoyers
2026-04-23 12:36 ` Dmitry Vyukov
2026-04-23 12:53 ` Mathieu Desnoyers
2026-04-23 12:58 ` Dmitry Vyukov
2026-04-24 16:45 ` [PATCH] arm64/entry: Fix arm64-specific rseq brokenness (was: Re: [REGRESSION] rseq: refactoring in v6.19 broke everyone on arm64) " Mark Rutland
2026-04-28 1:39 ` [PATCH] arm64/entry: Fix arm64-specific rseq brokenness Jinjie Ruan
2026-04-28 13:40 ` Mark Rutland
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aejCaG6n9s7ak5TO@J2N7QTR9R3.cambridge.arm.com \
--to=mark.rutland@arm.com \
--cc=blake.oler@mongodb.com \
--cc=boqun.feng@gmail.com \
--cc=catalin.marinas@arm.com \
--cc=ckennelly@google.com \
--cc=dvyukov@google.com \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mathias@mongodb.com \
--cc=mathieu.desnoyers@efficios.com \
--cc=mingo@kernel.org \
--cc=paulmck@kernel.org \
--cc=peterz@infradead.org \
--cc=regressions@lists.linux.dev \
--cc=ruanjinjie@huawei.com \
--cc=tglx@linutronix.de \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox