public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [REGRESSION] rseq: refactoring in v6.19 broke everyone on arm64 and tcmalloc everywhere
@ 2026-04-22  9:50 Mathias Stearn
  2026-04-22 12:56 ` Peter Zijlstra
  2026-04-22 13:09 ` Mark Rutland
  0 siblings, 2 replies; 32+ messages in thread
From: Mathias Stearn @ 2026-04-22  9:50 UTC (permalink / raw)
  To: Thomas Gleixner, Mathieu Desnoyers, Catalin Marinas, Will Deacon,
	Boqun Feng, Paul E. McKenney
  Cc: Chris Kennelly, Dmitry Vyukov, regressions, linux-kernel,
	linux-arm-kernel, Peter Zijlstra, Ingo Molnar, Mark Rutland,
	Jinjie Ruan, Blake Oler


[-- Attachment #1.1: Type: text/plain, Size: 4968 bytes --]

TL;DR: As of 6.19, rseq no longer provides the documented atomicity
guarantees on arm64 by failing to abort the critical section on same-core
preemption/resumption. Additionally, it breaks tcmalloc specifically by
failing to overwrite the cpu_id_start field at points where it was relied
on for correctness.

This is a SEVERE breakage for MongoDB. We received several user reports of
crashes on 6.19. I made a stress test that showed that 6.19 can cause
malloc to return the same pointer twice without it being freed. Because
that can cause arbitrary corruption, our latest releases have all been
patched to refuse to start at all on 6.19+.

TCMalloc uses rseq in a "creative" way described at
https://github.com/google/tcmalloc/blob/master/docs/rseq.md. In particular,
the "Current CPU Slabs Pointer Caching" section describes an optimization
that relies on the undocumented fact that the kernel always overwrote
cpu_id_start (even when it wouldn't change) to invalidate a user-space
cache. Since the change to stop writing cpu_id_start seemed to be
intentional as part of a refactoring merged in 2b09f480f0a1, I started
working on a userspace patch to stop relying on that. Unfortunately when
that was complete I ran into a wall that is impossible to work around from
userspace.
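
For anyone following along who hasn't read the tcmalloc doc, the caching
trick works roughly like this (a minimal simulation using plain variables
rather than the real TLS rseq area; the sentinel encoding and all names
here are mine for illustration, not tcmalloc's):

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the kernel-updated field; the real one is rseq::cpu_id_start
// in the thread's registered rseq area.
struct FakeRseq {
    uint32_t cpu_id_start;
};

// Userspace parks a sentinel value in cpu_id_start after refreshing its
// cached per-CPU slabs pointer. Any kernel write to the field (which pre-6.19
// kernels performed on every preemption/migration/signal) clobbers the
// sentinel, which the fast path detects as "cache may be stale, re-derive".
constexpr uint32_t kCachedSentinel = 0x80000000u;  // hypothetical encoding

struct ThreadCache {
    void* cached_slabs = nullptr;
};

void* slabs_for_cpu(uint32_t cpu) {
    static char fake_slabs[8][64];  // stand-in for per-CPU slab metadata
    return &fake_slabs[cpu];
}

void* get_slabs(FakeRseq* rs, ThreadCache* tc) {
    if (rs->cpu_id_start == kCachedSentinel)
        return tc->cached_slabs;         // fast path: no kernel write since we cached
    uint32_t cpu = rs->cpu_id_start;     // slow path: kernel wrote a real cpu id
    tc->cached_slabs = slabs_for_cpu(cpu);
    rs->cpu_id_start = kCachedSentinel;  // re-arm; the next kernel write invalidates
    return tc->cached_slabs;
}
```

On 6.19 the kernel can leave cpu_id_start untouched across events that
previously forced the slow path, so the sentinel survives and the fast path
keeps using a cache it should have dropped.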

On arm64, the kernel no longer meets the documented guarantee that rseq
critical sections are atomic with respect to preemption. It seems to only
abort the critical section when the thread is migrated to a different core.
The attached test demonstrates this: it passes on x86 both before and after
6.19, and on arm64 before 6.19, but fails on arm64 with 6.19. It pins the process to
a single core and then has an rseq critical section that observes a change
made by another thread which is supposed to be impossible. I think this
will break basically any real usage of rseq, other than just reading the
current cpu_id.

An LLM pointed to these two specific commits in the refactor as causing
this (oldest first):
- 39a167560a61 rseq: Optimize event setting
  This assumed that user_irq would be set on preemption, but it isn't on
  arm64, so TIF_NOTIFY_RESUME is not raised on same-CPU preemption.
- 566d8015f7ee rseq: Avoid CPU/MM CID updates when no event pending
  This broke TCMalloc's slab-caching trick by no longer overwriting
  cpu_id_start on every return to userspace.

(I have a lot more analysis and suggested fixes from LLMs since I used them
heavily in this testing and analysis, but I won't spam you with the slop
unless requested)

The arm64 change is a clear breakage and I'm sure it will be
uncontroversial to fix. I can imagine more resistance to reverting to the
old behavior of always overwriting the cpu_id_start field since that seems
to have been an intentional optimization choice. I have reached out to the
TCMalloc maintainers (CC'd) and believe there is a solution that gets the
vast majority of the optimization while still preserving the behavior that
TCMalloc currently relies on[1].

Any time a critical section might be aborted (migration, preemption, signal
delivery, and membarrier IPI), the kernel already must (but doesn't on
arm64 at the moment) check the rseq_cs field to see if the thread is in a
critical section, and is documented as nulling the pointer after (I assume
to make later checks cheaper). It would be sufficient for tcmalloc's
internal usage if every time the kernel nulled out rseq_cs, it also wrote
the cpu id to cpu_id_start. That should be essentially free since you are
already writing to the same cache line. It was pointed out that that could
be an issue if another rseq user in the same thread nulled rseq_cs after
its critical section, which would require the kernel to update cpu_id_start
each time it checks rseq_cs, regardless of whether it nulls it. We aren't
aware of any processes that mix tcmalloc with other rseq usages that null
out the field from userspace, but we can't rule them out since it is open
source. Either way, this preserves the property of not updating
cpu_id_start on every syscall return and non-membarrier interrupts, which I
assume is where the majority of the optimization win was from.
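
To make that concrete, the exit-to-user handling would look something like
the following (a userspace-side simulation of the proposed kernel logic, not
actual kernel code; the function and field names are approximate and the
real kernel accesses rseq_cs through uaccess helpers):

```cpp
#include <cassert>
#include <cstdint>

// Simulated userspace-visible rseq fields (the real struct rseq has more).
struct RseqArea {
    uint64_t rseq_cs;
    uint32_t cpu_id_start;
};

// Sketch of the proposal: keep the 6.19 fast path (no write at all when no
// event is pending), but whenever an event forces the kernel to inspect
// rseq_cs, also refresh cpu_id_start. Both fields sit on the same cache
// line, so the extra store should be essentially free.
void rseq_exit_to_user(RseqArea* ra, uint32_t cur_cpu, bool event_pending) {
    if (!event_pending)
        return;                  // unchanged: plain syscall returns stay write-free
    if (ra->rseq_cs != 0) {
        // ... the real kernel would check the user IP against
        // [start_ip, start_ip + post_commit_offset) and redirect to
        // abort_ip if inside the critical section ...
        ra->rseq_cs = 0;         // documented behavior: kernel clears the pointer
    }
    ra->cpu_id_start = cur_cpu;  // proposed: written on every rseq_cs check,
                                 // whether or not the pointer was non-null
}
```

Writing cpu_id_start on every check, rather than only when nulling rseq_cs,
also covers the case where another rseq user in the same thread nulled the
pointer from userspace between the event and the return.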

All testing of problematic versions was performed on x86_64 and
aarch64 Ubuntu 24.04.4 with the kernel manually upgraded to
6.19.8-061908-generic. Source analysis was performed on the v6.19 tag. I
had a few AI agents confirm that nothing in the relevant changes to master
should have solved this, but I have not yet tested there.

$ cat /proc/version
Linux version 6.19.8-061908-generic (kernel@balboa)
(aarch64-linux-gnu-gcc-15 (Ubuntu 15.2.0-15ubuntu1) 15.2.0, GNU ld (GNU
Binutils for Ubuntu) 2.46) #202603131837 SMP PREEMPT_DYNAMIC Sat Mar 14
00:00:07 UTC 2026

[1]  There is also an exploration of some options to make tcmalloc not rely
on the cpu_id_start overwriting. However we would strongly prefer that
existing binaries continue to work on 6.19 kernels, even if newer binaries
don't need that. At least for a good while.

[-- Attachment #1.2: Type: text/html, Size: 5426 bytes --]

[-- Attachment #2: rseq_same_cpu_preempt_test.cc --]
[-- Type: application/octet-stream, Size: 8419 bytes --]

// Minimal single-file rseq repro for same-CPU preemption handling.
//
// Build:
//   g++ -O2 -std=c++20 -pthread rseq_same_cpu_preempt_test.cc -o rseq_same_cpu_preempt_test
//
// The main thread pins itself and a writer thread to one CPU. It then enters an
// rseq critical section that stores 0 to a shared flag and spins until it sees
// the flag become 1. If the critical section resumes after a preemption without
// being aborted, it will eventually observe the writer's 1 and abort.
//
// The writer thread wakes every 10 usec and stores 1 to the shared flag.
//
// Expected behavior if rseq preemption aborts work correctly:
//   the program runs for 10 seconds and exits 0.
//
// Expected behavior if same-CPU preemption can resume inside the CS (the bug):
//   the main thread eventually reads 1 inside the CS and the test exits non-zero.
//
// Note to readers: the top of this file is boring setup code. The interesting
// code starts at run_one_rseq_attempt() so you should skip down there first.

#include <errno.h>
#include <linux/rseq.h>
#include <sched.h>
#include <sys/rseq.h>
#include <unistd.h>

#include <chrono>
#include <cstdarg>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <thread>

#if !defined(__aarch64__) && !defined(__x86_64__)
#error "This repro is currently implemented for aarch64 and x86_64 only."
#endif

namespace {

constexpr std::chrono::seconds kRuntime{10};
constexpr long kWriterSleepNs = 10'000;  // 10 usec

alignas(4) uint32_t g_shared_flag = 0;

struct rseq* current_rseq_abi() {
    auto* thread_ptr = reinterpret_cast<char*>(__builtin_thread_pointer());
    return reinterpret_cast<struct rseq*>(thread_ptr + __rseq_offset);
}

[[noreturn, gnu::format(printf, 1, 2)]] void die(const char* fmt, ...) {
    va_list args;
    va_start(args, fmt);
    std::vfprintf(stderr, fmt, args);
    va_end(args);
    std::fprintf(stderr, "\n");
    _Exit(1);
}

[[noreturn]] void die_errno(const char* what) {
    die("%s failed: %s", what, std::strerror(errno));
}

int pick_and_pin_first_allowed_cpu() {
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0) {
        die_errno("sched_getaffinity");
    }

    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (CPU_ISSET(cpu, &set)) {
            CPU_ZERO(&set);
            CPU_SET(cpu, &set);
            if (sched_setaffinity(0, sizeof(set), &set) != 0) {
                die_errno("sched_setaffinity");
            }
            return cpu;
        }
    }
    die("No allowed CPU found");
}

#define RSEQ_STR_1(x) #x
#define RSEQ_STR(x) RSEQ_STR_1(x)

#define RSEQ_ASM_DEFINE_TABLE(label, start_ip, post_commit_ip, abort_ip) \
    ".pushsection __rseq_cs, \"aw\"  \n\t"                                              \
    ".balign 32  \n\t"                                                                  \
    RSEQ_STR(label) ":  \n\t"                                                           \
    ".long 0  \n\t"  /* version */                                                      \
    ".long 0  \n\t"  /* flags */                                                        \
    ".quad " RSEQ_STR(start_ip) "  \n\t"  /* start_ip */                                \
    ".quad " RSEQ_STR((post_commit_ip) - (start_ip)) "  \n\t"  /* post_commit_offset */ \
    ".quad " RSEQ_STR(abort_ip) "  \n\t"  /* abort_ip */                                \
    ".popsection  \n\t"

int run_one_rseq_attempt(struct rseq* abi, uint32_t* shared_flag) {
    int result = 0;

#ifdef C_EQUIVALENT  // C equivalent (illustrative only; never compiled):
    // Critical section: store 0, then spin until flag becomes 1
    *shared_flag = 0;
    while (*shared_flag == 0) {
        // spin
    }
    result = 1;   // Observed flag == 1
    goto done;    // Skip the abort handler on the commit path
abort:            // Abort handler (kernel jumps here if preempted inside CS)
    result = -1;  // We correctly observed a preemption inside the CS
done:;
#elif defined(__aarch64__)
    __asm__ __volatile__(
        RSEQ_ASM_DEFINE_TABLE(1, 2f, 3f, 4f)
        //  Store address of rseq_cs descriptor into abi->rseq_cs
        "  adrp x15, 1b  \n"
        "  add x15, x15, :lo12:1b  \n"
        "  str x15, %[rseq_cs]  \n"
        "2:  \n"                         //  Critical section start (label 2)
        "  str wzr, %[shared_flag]  \n"  //  *shared_flag = 0
        "5:  \n"                         //  Spin loop: while (*shared_flag == 0) {}
        "  ldr w15, %[shared_flag]  \n"  //  w15 = *shared_flag
        "  cbz w15, 5b  \n"              //  if (w15 == 0) goto 5 (spin)
        "  mov %w[result], #1  \n"       //  result = 1 (observed flag == 1)
        "3:  \n"                         //  Critical section end - fall through
        "  b 99f  \n"                    //  Jump past abort handler
        "  .long %c[sig]  \n"            //  RSEQ signature (magic bytes required by kernel)
        "4:  \n"                         //  Abort handler entry (label 4)
        "  mov %w[result], #-1  \n"      //  result = -1
        "99:  \n"                        //  End of abort handler
        : [result] "+r"(result), [rseq_cs] "=m"(abi->rseq_cs), [shared_flag] "+Q"(*shared_flag)
        : [sig] "i"(RSEQ_SIG)
        : "memory", "x15");
#elif defined(__x86_64__)
    __asm__ __volatile__(
        RSEQ_ASM_DEFINE_TABLE(1, 2f, 3f, 4f)
        //  Store address of rseq_cs descriptor into abi->rseq_cs
        "  leaq 1b(%%rip), %%rax  \n\t"
        "  movq %%rax, %[rseq_cs]  \n\t"
        "2:  \n\t"                            //  Critical section start (label 2)
        "  movl $0, %[shared_flag]  \n\t"     //  *shared_flag = 0
        "5:  \n\t"                            //  Spin loop: while (*shared_flag == 0) {}
        "  movl %[shared_flag], %%eax  \n\t"  //  eax = *shared_flag
        "  testl %%eax, %%eax  \n\t"          //  test eax == 0
        "  jz 5b  \n\t"                       //  if (eax == 0) goto 5 (spin)
        "  movl $1, %[result]  \n\t"          //  result = 1 (observed flag == 1)
        "3:  \n\t"                            //  Critical section end - fall through
        "  jmp 99f  \n\t"                     //  Jump past abort handler
        "  .long %c[sig]  \n\t"               //  RSEQ signature (magic bytes required by kernel)
        "4:  \n\t"                            //  Abort handler entry (label 4)
        "  movl $-1, %[result]  \n\t"         //  result = -1
        "99:  \n\t"                           //  End of abort handler
        : [result] "+r"(result), [rseq_cs] "=m"(abi->rseq_cs), [shared_flag] "+m"(*shared_flag)
        : [sig] "i"(RSEQ_SIG)
        : "memory", "cc", "rax");
#endif

    return result;
}

void writer_thread_main() {
    while (true) {
        std::this_thread::sleep_for(std::chrono::nanoseconds(kWriterSleepNs));
        __atomic_store_n(&g_shared_flag, 1u, __ATOMIC_RELAXED);
    }
}

}  // namespace

int main() {
    if (__rseq_size == 0) {
        die("rseq is not registered for this thread (glibc __rseq_size == 0); "
            "need glibc >= 2.35 with rseq support and a kernel that supports rseq");
    }
    const int cpu = pick_and_pin_first_allowed_cpu();
    if ((int)current_rseq_abi()->cpu_id != cpu) {
        die("rseq abi cpu_id is %d after pinning rather than %d",
            current_rseq_abi()->cpu_id,
            cpu);
    }

    std::thread(writer_thread_main).detach();

    const auto deadline = std::chrono::steady_clock::now() + kRuntime;
    uint64_t attempts = 0;
    uint64_t abort_retries = 0;

    while (std::chrono::steady_clock::now() < deadline) {
        ++attempts;
        const int rc = run_one_rseq_attempt(current_rseq_abi(), &g_shared_flag);
        if (rc == 1) {
            die("Observed shared_flag == 1 inside the rseq critical section "
                "after %llu attempts on cpu %d",
                static_cast<unsigned long long>(attempts),
                cpu);
        } else if (rc != -1) {
            die("Unexpected return value from rseq: %d after %llu attempts",
                rc,
                static_cast<unsigned long long>(attempts));
        }
        ++abort_retries;
    }

    std::fprintf(stderr,
                 "PASS: ran for %lld seconds on cpu %d, attempts=%llu abort_retries=%llu\n",
                 static_cast<long long>(kRuntime.count()),
                 cpu,
                 static_cast<unsigned long long>(attempts),
                 static_cast<unsigned long long>(abort_retries));
    _Exit(0);
}

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [REGRESSION] rseq: refactoring in v6.19 broke everyone on arm64 and tcmalloc everywhere
  2026-04-22  9:50 [REGRESSION] rseq: refactoring in v6.19 broke everyone on arm64 and tcmalloc everywhere Mathias Stearn
@ 2026-04-22 12:56 ` Peter Zijlstra
  2026-04-22 13:13   ` Peter Zijlstra
  2026-04-22 13:09 ` Mark Rutland
  1 sibling, 1 reply; 32+ messages in thread
From: Peter Zijlstra @ 2026-04-22 12:56 UTC (permalink / raw)
  To: Mathias Stearn
  Cc: Thomas Gleixner, Mathieu Desnoyers, Catalin Marinas, Will Deacon,
	Boqun Feng, Paul E. McKenney, Chris Kennelly, Dmitry Vyukov,
	regressions, linux-kernel, linux-arm-kernel, Ingo Molnar,
	Mark Rutland, Jinjie Ruan, Blake Oler

On Wed, Apr 22, 2026 at 11:50:26AM +0200, Mathias Stearn wrote:

> Additionally, it breaks tcmalloc specifically by failing to overwrite
> the cpu_id_start field at points where it was relied on for
> correctness.

This specific behaviour was documented as being wrong and running with
DEBUG_RSEQ would have flagged it.

The tcmalloc issue has been contentious for a long time. The tcmalloc
folks relied on something that was documented to be wrong. It has been
reported to the tcmalloc people many years ago and if you were to run
tcmalloc on most any kernel (very much including 6.19) with
DEBUG_RSEQ=y, it would have yelled.

The tcmalloc people didn't care. There was a proposal for an RSEQ
extension for what they need, and they didn't care. All this should be
in their bugzilla or whatever.

The RSEQ rework improved performance significantly for everyone, and
kept all the documented behaviour (+- arm64 bug). Tcmalloc got screwed
over because they relied on implementation behaviour that was
specifically documented to be broken. And they didn't care. Google was
very much aware of this. And hasn't lifted a finger to remedy it.



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [REGRESSION] rseq: refactoring in v6.19 broke everyone on arm64 and tcmalloc everywhere
  2026-04-22  9:50 [REGRESSION] rseq: refactoring in v6.19 broke everyone on arm64 and tcmalloc everywhere Mathias Stearn
  2026-04-22 12:56 ` Peter Zijlstra
@ 2026-04-22 13:09 ` Mark Rutland
  2026-04-22 17:49   ` Thomas Gleixner
  1 sibling, 1 reply; 32+ messages in thread
From: Mark Rutland @ 2026-04-22 13:09 UTC (permalink / raw)
  To: Mathias Stearn
  Cc: Thomas Gleixner, Mathieu Desnoyers, Catalin Marinas, Will Deacon,
	Boqun Feng, Paul E. McKenney, Chris Kennelly, Dmitry Vyukov,
	regressions, linux-kernel, linux-arm-kernel, Peter Zijlstra,
	Ingo Molnar, Jinjie Ruan, Blake Oler

Hi Mathias,

On Wed, Apr 22, 2026 at 11:50:26AM +0200, Mathias Stearn wrote:
> TL;DR: As of 6.19, rseq no longer provides the documented atomicity
> guarantees on arm64 by failing to abort the critical section on same-core
> preemption/resumption. Additionally, it breaks tcmalloc specifically by
> failing to overwrite the cpu_id_start field at points where it was relied
> on for correctness.

Thanks for the report, and the test case.

As a holding reply, I'm looking into this now from the arm64 side.

I'll leave it to Thomas/Peter/Mathieu to comment w.r.t. the issue you
raise with cpu_id_start.

For some reason, this mail didn't make it to my inbox, and I had to grab
it from lore using b4. That might be a problem with my local mail
server; I'm just noting that in case others also didn't receive this.

Mark.

> This is a SEVERE breakage for MongoDB. We received several user reports of
> crashes on 6.19. I made a stress test that showed that 6.19 can cause
> malloc to return the same pointer twice without it being freed. Because
> that can cause arbitrary corruption, our latest releases have all been
> patched to refuse to start at all on 6.19+.
> 
> TCMalloc uses rseq in a "creative" way described at
> https://github.com/google/tcmalloc/blob/master/docs/rseq.md. In particular,
> the "Current CPU Slabs Pointer Caching" section describes an optimization
> that relies on the undocumented fact that the kernel always overwrote
> cpu_id_start (even when it wouldn't change) to invalidate a user-space
> cache. Since the change to stop writing cpu_id_start seemed to be
> intentional as part of a refactoring merged in 2b09f480f0a1, I started
> working on a userspace patch to stop relying on that. Unfortunately when
> that was complete I ran into a wall that is impossible to work around from
> userspace.
> 
> On arm64, the kernel no longer meets the documented guarantee that rseq
> critical sections are atomic with respect to preemption. It seems to only
> abort the critical section when the thread is migrated to a different core.
> The attached test demonstrates this: it passes on x86 both before and after
> 6.19, and on arm64 before 6.19, but fails on arm64 with 6.19. It pins the process to
> a single core and then has an rseq critical section that observes a change
> made by another thread which is supposed to be impossible. I think this
> will break basically any real usage of rseq, other than just reading the
> current cpu_id.
> 
> An LLM pointed to these two specific commits in the refactor as causing
> this (oldest first):
> - 39a167560a61 rseq: Optimize event setting
>   This assumed that user_irq would be set on preemption, but it isn't on
>   arm64, so TIF_NOTIFY_RESUME is not raised on same-CPU preemption.
> - 566d8015f7ee rseq: Avoid CPU/MM CID updates when no event pending
>   This broke TCMalloc's slab-caching trick by no longer overwriting
>   cpu_id_start on every return to userspace.
> 
> (I have a lot more analysis and suggested fixes from LLMs since I used them
> heavily in this testing and analysis, but I won't spam you with the slop
> unless requested)
> 
> The arm64 change is a clear breakage and I'm sure it will be
> uncontroversial to fix. I can imagine more resistance to reverting to the
> old behavior of always overwriting the cpu_id_start field since that seems
> to have been an intentional optimization choice. I have reached out to the
> TCMalloc maintainers (CC'd) and believe there is a solution that gets the
> vast majority of the optimization while still preserving the behavior that
> TCMalloc currently relies on[1].
> 
> Any time a critical section might be aborted (migration, preemption, signal
> delivery, and membarrier IPI), the kernel already must (but doesn't on
> arm64 at the moment) check the rseq_cs field to see if the thread is in a
> critical section, and is documented as nulling the pointer after (I assume
> to make later checks cheaper). It would be sufficient for tcmalloc's
> internal usage if every time the kernel nulled out rseq_cs, it also wrote
> the cpu id to cpu_id_start. That should be essentially free since you are
> already writing to the same cache line. It was pointed out that that could
> be an issue if another rseq user in the same thread nulled rseq_cs after
> its critical section, which would require the kernel to update cpu_id_start
> each time it checks rseq_cs, regardless of whether it nulls it. We aren't
> aware of any processes that mix tcmalloc with other rseq usages that null
> out the field from userspace, but we can't rule them out since it is open
> source. Either way, this preserves the property of not updating
> cpu_id_start on every syscall return and non-membarrier interrupts, which I
> assume is where the majority of the optimization win was from.
> 
> All testing of problematic versions was performed on x86_64 and
> aarch64 Ubuntu 24.04.4 with the kernel manually upgraded to
> 6.19.8-061908-generic. Source analysis was performed on the v6.19 tag. I
> had a few AI agents confirm that nothing in the relevant changes to master
> should have solved this, but I have not yet tested there.
> 
> $ cat /proc/version
> Linux version 6.19.8-061908-generic (kernel@balboa)
> (aarch64-linux-gnu-gcc-15 (Ubuntu 15.2.0-15ubuntu1) 15.2.0, GNU ld (GNU
> Binutils for Ubuntu) 2.46) #202603131837 SMP PREEMPT_DYNAMIC Sat Mar 14
> 00:00:07 UTC 2026
> 
> [1]  There is also an exploration of some options to make tcmalloc not rely
> on the cpu_id_start overwriting. However we would strongly prefer that
> existing binaries continue to work on 6.19 kernels, even if newer binaries
> don't need that. At least for a good while.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [REGRESSION] rseq: refactoring in v6.19 broke everyone on arm64 and tcmalloc everywhere
  2026-04-22 12:56 ` Peter Zijlstra
@ 2026-04-22 13:13   ` Peter Zijlstra
  2026-04-23 10:38     ` Mathias Stearn
       [not found]     ` <CAHnCjA2fa+dP1+yCYNQrTXQaW-JdtfMj7wMikwMeeCRg-3NhiA@mail.gmail.com>
  0 siblings, 2 replies; 32+ messages in thread
From: Peter Zijlstra @ 2026-04-22 13:13 UTC (permalink / raw)
  To: Mathias Stearn
  Cc: Thomas Gleixner, Mathieu Desnoyers, Catalin Marinas, Will Deacon,
	Boqun Feng, Paul E. McKenney, Chris Kennelly, Dmitry Vyukov,
	regressions, linux-kernel, linux-arm-kernel, Ingo Molnar,
	Mark Rutland, Jinjie Ruan, Blake Oler

On Wed, Apr 22, 2026 at 02:56:47PM +0200, Peter Zijlstra wrote:
> On Wed, Apr 22, 2026 at 11:50:26AM +0200, Mathias Stearn wrote:
> 
> > Additionally, it breaks tcmalloc specifically by failing to overwrite
> > the cpu_id_start field at points where it was relied on for
> > correctness.
> 
> This specific behaviour was documented as being wrong and running with
> DEBUG_RSEQ would have flagged it.
> 
> The tcmalloc issue has been contentious for a long time. The tcmalloc
> folks relied on something that was documented to be wrong. It has been
> reported to the tcmalloc people many years ago and if you were to run
> tcmalloc on most any kernel (very much including 6.19) with
> DEBUG_RSEQ=y, it would have yelled.
> 
> The tcmalloc people didn't care. There was a proposal for an RSEQ
> extension for what they need, and they didn't care. All this should be
> in their bugzilla or whatever.
> 
> The RSEQ rework improved performance significantly for everyone, and
> kept all the documented behaviour (+- arm64 bug). Tcmalloc got screwed
> over because they relied on implementation behaviour that was
> specifically documented to be broken. And they didn't care. Google was
> very much aware of this. And hasn't lifted a finger to remedy it.

Also: https://lore.kernel.org/all/874io5andc.ffs@tglx/ 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [REGRESSION] rseq: refactoring in v6.19 broke everyone on arm64 and tcmalloc everywhere
  2026-04-22 13:09 ` Mark Rutland
@ 2026-04-22 17:49   ` Thomas Gleixner
  2026-04-22 18:11     ` Mark Rutland
  0 siblings, 1 reply; 32+ messages in thread
From: Thomas Gleixner @ 2026-04-22 17:49 UTC (permalink / raw)
  To: Mark Rutland, Mathias Stearn
  Cc: Mathieu Desnoyers, Catalin Marinas, Will Deacon, Boqun Feng,
	Paul E. McKenney, Chris Kennelly, Dmitry Vyukov, regressions,
	linux-kernel, linux-arm-kernel, Peter Zijlstra, Ingo Molnar,
	Jinjie Ruan, Blake Oler

On Wed, Apr 22 2026 at 14:09, Mark Rutland wrote:
> On Wed, Apr 22, 2026 at 11:50:26AM +0200, Mathias Stearn wrote:
>> TL;DR: As of 6.19, rseq no longer provides the documented atomicity
>> guarantees on arm64 by failing to abort the critical section on same-core
>> preemption/resumption. Additionally, it breaks tcmalloc specifically by
>> failing to overwrite the cpu_id_start field at points where it was relied
>> on for correctness.
>
> Thanks for the report, and the test case.
>
> As a holding reply, I'm looking into this now from the arm64 side.

I assume it's the partial conversion to the generic entry code which
screws that up. The problem reproduces with rseq selftests nicely.

The patch below fixes it as it puts ARM64 back to the non-optimized code
for now. Once ARM64 is fully converted it gets all the nice improvements.

Thanks,

        tglx
---
diff --git a/include/linux/rseq.h b/include/linux/rseq.h
index 2266f4dc77b6..d55476e2a336 100644
--- a/include/linux/rseq.h
+++ b/include/linux/rseq.h
@@ -30,7 +30,7 @@ void __rseq_signal_deliver(int sig, struct pt_regs *regs);
  */
 static inline void rseq_signal_deliver(struct ksignal *ksig, struct pt_regs *regs)
 {
-	if (IS_ENABLED(CONFIG_GENERIC_IRQ_ENTRY)) {
+	if (IS_ENABLED(CONFIG_GENERIC_ENTRY)) {
 		/* '&' is intentional to spare one conditional branch */
 		if (current->rseq.event.has_rseq & current->rseq.event.user_irq)
 			__rseq_signal_deliver(ksig->sig, regs);
@@ -50,7 +50,7 @@ static __always_inline void rseq_sched_switch_event(struct task_struct *t)
 {
 	struct rseq_event *ev = &t->rseq.event;
 
-	if (IS_ENABLED(CONFIG_GENERIC_IRQ_ENTRY)) {
+	if (IS_ENABLED(CONFIG_GENERIC_ENTRY)) {
 		/*
 		 * Avoid a boat load of conditionals by using simple logic
 		 * to determine whether NOTIFY_RESUME needs to be raised.
diff --git a/include/linux/rseq_entry.h b/include/linux/rseq_entry.h
index a36b472627de..8ccd464a108d 100644
--- a/include/linux/rseq_entry.h
+++ b/include/linux/rseq_entry.h
@@ -80,7 +80,7 @@ bool rseq_debug_validate_ids(struct task_struct *t);
 
 static __always_inline void rseq_note_user_irq_entry(void)
 {
-	if (IS_ENABLED(CONFIG_GENERIC_IRQ_ENTRY))
+	if (IS_ENABLED(CONFIG_GENERIC_ENTRY))
 		current->rseq.event.user_irq = true;
 }
 
@@ -171,8 +171,8 @@ bool rseq_debug_update_user_cs(struct task_struct *t, struct pt_regs *regs,
 		if (unlikely(usig != t->rseq.sig))
 			goto die;
 
-		/* rseq_event.user_irq is only valid if CONFIG_GENERIC_IRQ_ENTRY=y */
-		if (IS_ENABLED(CONFIG_GENERIC_IRQ_ENTRY)) {
+		/* rseq_event.user_irq is only valid if CONFIG_GENERIC_ENTRY=y */
+		if (IS_ENABLED(CONFIG_GENERIC_ENTRY)) {
 			/* If not in interrupt from user context, let it die */
 			if (unlikely(!t->rseq.event.user_irq))
 				goto die;
@@ -387,7 +387,7 @@ static rseq_inline bool rseq_update_usr(struct task_struct *t, struct pt_regs *r
 	 * allows to skip the critical section when the entry was not from
 	 * a user space interrupt, unless debug mode is enabled.
 	 */
-	if (IS_ENABLED(CONFIG_GENERIC_IRQ_ENTRY)) {
+	if (IS_ENABLED(CONFIG_GENERIC_ENTRY)) {
 		if (!static_branch_unlikely(&rseq_debug_enabled)) {
 			if (likely(!t->rseq.event.user_irq))
 				return true;

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [REGRESSION] rseq: refactoring in v6.19 broke everyone on arm64 and tcmalloc everywhere
  2026-04-22 17:49   ` Thomas Gleixner
@ 2026-04-22 18:11     ` Mark Rutland
  2026-04-22 19:47       ` Thomas Gleixner
  0 siblings, 1 reply; 32+ messages in thread
From: Mark Rutland @ 2026-04-22 18:11 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Mathias Stearn, Mathieu Desnoyers, Catalin Marinas, Will Deacon,
	Boqun Feng, Paul E. McKenney, Chris Kennelly, Dmitry Vyukov,
	regressions, linux-kernel, linux-arm-kernel, Peter Zijlstra,
	Ingo Molnar, Jinjie Ruan, Blake Oler

On Wed, Apr 22, 2026 at 07:49:30PM +0200, Thomas Gleixner wrote:
> On Wed, Apr 22 2026 at 14:09, Mark Rutland wrote:
> > On Wed, Apr 22, 2026 at 11:50:26AM +0200, Mathias Stearn wrote:
> >> TL;DR: As of 6.19, rseq no longer provides the documented atomicity
> >> guarantees on arm64 by failing to abort the critical section on same-core
> >> preemption/resumption. Additionally, it breaks tcmalloc specifically by
> >> failing to overwrite the cpu_id_start field at points where it was relied
> >> on for correctness.
> >
> > Thanks for the report, and the test case.
> >
> > As a holding reply, I'm looking into this now from the arm64 side.
> 
> I assume it's the partial conversion to the generic entry code which
> screws that up. 

It's slightly more than that, but in a sense, yes. ;)

The fix is conceptually simple, but I'll need to do some refactoring.

Conceptually we just need to use syscall_enter_from_user_mode() and
irqentry_enter_from_user_mode() appropriately.

In practice, I can't use those as-is without introducing the exception
masking problems I just fixed up for irqentry_enter_from_kernel_mode(),
so I'll need to do some similar refactoring first.

That and I *think* a couple of the current checks for CONFIG_GENERIC_ENTRY
should be checking CONFIG_GENERIC_IRQ_ENTRY, since all of the relevant
bits are in the generic irqentry code rather than the GENERIC_SYSCALL
code (and GENERIC_ENTRY is just GENERIC_IRQ_ENTRY + GENERIC_SYSCALL).

> The problem reproduces with rseq selftests nicely.

Ah; that's both good to know, and worrying that we've never had a report
from all the automated testing people are supposedly running. :/

> The patch below fixes it as it puts ARM64 back to the non-optimized code
> for now. Once ARM64 is fully converted it gets all the nice improvements.

Thanks; I'll give that a test tomorrow.

I haven't paged everything in yet, so just to check, is there anything
that would behave incorrectly if current->rseq.event.user_irq were set
for syscall entry? IIUC it means we'll effectively do the slow path, and
I was wondering if that might be acceptable as a one-line bodge for
stable.

As above, I'd like if the actual fix could make this work for
GENERIC_IRQ_ENTRY rather than GENERIC_ENTRY, since that way we can make
this work as it was supposed to *before* moving to GENERIC_SYSCALL
(which has a whole lot more ABI impact to worry about).

I think that just needs a small amount of refactoring that arm64 will
need regardless.

Mark.

> 
> Thanks,
> 
>         tglx
> ---
> diff --git a/include/linux/rseq.h b/include/linux/rseq.h
> index 2266f4dc77b6..d55476e2a336 100644
> --- a/include/linux/rseq.h
> +++ b/include/linux/rseq.h
> @@ -30,7 +30,7 @@ void __rseq_signal_deliver(int sig, struct pt_regs *regs);
>   */
>  static inline void rseq_signal_deliver(struct ksignal *ksig, struct pt_regs *regs)
>  {
> -	if (IS_ENABLED(CONFIG_GENERIC_IRQ_ENTRY)) {
> +	if (IS_ENABLED(CONFIG_GENERIC_ENTRY)) {
>  		/* '&' is intentional to spare one conditional branch */
>  		if (current->rseq.event.has_rseq & current->rseq.event.user_irq)
>  			__rseq_signal_deliver(ksig->sig, regs);
> @@ -50,7 +50,7 @@ static __always_inline void rseq_sched_switch_event(struct task_struct *t)
>  {
>  	struct rseq_event *ev = &t->rseq.event;
>  
> -	if (IS_ENABLED(CONFIG_GENERIC_IRQ_ENTRY)) {
> +	if (IS_ENABLED(CONFIG_GENERIC_ENTRY)) {
>  		/*
>  		 * Avoid a boat load of conditionals by using simple logic
>  		 * to determine whether NOTIFY_RESUME needs to be raised.
> diff --git a/include/linux/rseq_entry.h b/include/linux/rseq_entry.h
> index a36b472627de..8ccd464a108d 100644
> --- a/include/linux/rseq_entry.h
> +++ b/include/linux/rseq_entry.h
> @@ -80,7 +80,7 @@ bool rseq_debug_validate_ids(struct task_struct *t);
>  
>  static __always_inline void rseq_note_user_irq_entry(void)
>  {
> -	if (IS_ENABLED(CONFIG_GENERIC_IRQ_ENTRY))
> +	if (IS_ENABLED(CONFIG_GENERIC_ENTRY))
>  		current->rseq.event.user_irq = true;
>  }
>  
> @@ -171,8 +171,8 @@ bool rseq_debug_update_user_cs(struct task_struct *t, struct pt_regs *regs,
>  		if (unlikely(usig != t->rseq.sig))
>  			goto die;
>  
> -		/* rseq_event.user_irq is only valid if CONFIG_GENERIC_IRQ_ENTRY=y */
> -		if (IS_ENABLED(CONFIG_GENERIC_IRQ_ENTRY)) {
> +		/* rseq_event.user_irq is only valid if CONFIG_GENERIC_ENTRY=y */
> +		if (IS_ENABLED(CONFIG_GENERIC_ENTRY)) {
>  			/* If not in interrupt from user context, let it die */
>  			if (unlikely(!t->rseq.event.user_irq))
>  				goto die;
> @@ -387,7 +387,7 @@ static rseq_inline bool rseq_update_usr(struct task_struct *t, struct pt_regs *r
>  	 * allows to skip the critical section when the entry was not from
>  	 * a user space interrupt, unless debug mode is enabled.
>  	 */
> -	if (IS_ENABLED(CONFIG_GENERIC_IRQ_ENTRY)) {
> +	if (IS_ENABLED(CONFIG_GENERIC_ENTRY)) {
>  		if (!static_branch_unlikely(&rseq_debug_enabled)) {
>  			if (likely(!t->rseq.event.user_irq))
>  				return true;

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [REGRESSION] rseq: refactoring in v6.19 broke everyone on arm64 and tcmalloc everywhere
  2026-04-22 18:11     ` Mark Rutland
@ 2026-04-22 19:47       ` Thomas Gleixner
  2026-04-23  1:48         ` Jinjie Ruan
  0 siblings, 1 reply; 32+ messages in thread
From: Thomas Gleixner @ 2026-04-22 19:47 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Mathias Stearn, Mathieu Desnoyers, Catalin Marinas, Will Deacon,
	Boqun Feng, Paul E. McKenney, Chris Kennelly, Dmitry Vyukov,
	regressions, linux-kernel, linux-arm-kernel, Peter Zijlstra,
	Ingo Molnar, Jinjie Ruan, Blake Oler

On Wed, Apr 22 2026 at 19:11, Mark Rutland wrote:
> On Wed, Apr 22, 2026 at 07:49:30PM +0200, Thomas Gleixner wrote:
> Conceptually we just need to use syscall_enter_from_user_mode() and
> irqentry_enter_from_user_mode() appropriately.

Right. I figured that out.

> In practice, I can't use those as-is without introducing the exception
> masking problems I just fixed up for irqentry_enter_from_kernel_mode(),
> so I'll need to do some similar refactoring first.

See below.

> I haven't paged everything in yet, so just to check, is there anything
> that would behave incorrectly if current->rseq.event.user_irq were set
> for syscall entry? IIUC it means we'll effectively do the slow path, and
> I was wondering if that might be acceptable as a one-line bodge for
> stable.

It might work, but it's trivial enough to avoid that. See below. That on
top of 6.19.y makes the selftests pass too.

Thanks,

        tglx
---
 arch/arm64/kernel/entry-common.c |   14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

--- a/arch/arm64/kernel/entry-common.c
+++ b/arch/arm64/kernel/entry-common.c
@@ -58,6 +58,12 @@ static void noinstr exit_to_kernel_mode(
 	irqentry_exit(regs, state);
 }
 
+static __always_inline void arm64_enter_from_user_mode_syscall(struct pt_regs *regs)
+{
+	enter_from_user_mode(regs);
+	mte_disable_tco_entry(current);
+}
+
 /*
  * Handle IRQ/context state management when entering from user mode.
  * Before this function is called it is not safe to call regular kernel code,
@@ -65,8 +71,8 @@ static void noinstr exit_to_kernel_mode(
  */
 static __always_inline void arm64_enter_from_user_mode(struct pt_regs *regs)
 {
-	enter_from_user_mode(regs);
-	mte_disable_tco_entry(current);
+	arm64_enter_from_user_mode_syscall(regs);
+	rseq_note_user_irq_entry();
 }
 
 /*
@@ -717,7 +723,7 @@ static void noinstr el0_brk64(struct pt_
 
 static void noinstr el0_svc(struct pt_regs *regs)
 {
-	arm64_enter_from_user_mode(regs);
+	arm64_enter_from_user_mode_syscall(regs);
 	cortex_a76_erratum_1463225_svc_handler();
 	fpsimd_syscall_enter();
 	local_daif_restore(DAIF_PROCCTX);
@@ -869,7 +875,7 @@ static void noinstr el0_cp15(struct pt_r
 
 static void noinstr el0_svc_compat(struct pt_regs *regs)
 {
-	arm64_enter_from_user_mode(regs);
+	arm64_enter_from_user_mode_syscall(regs);
 	cortex_a76_erratum_1463225_svc_handler();
 	local_daif_restore(DAIF_PROCCTX);
 	do_el0_svc_compat(regs);

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [REGRESSION] rseq: refactoring in v6.19 broke everyone on arm64 and tcmalloc everywhere
  2026-04-22 19:47       ` Thomas Gleixner
@ 2026-04-23  1:48         ` Jinjie Ruan
  2026-04-23  5:53           ` Dmitry Vyukov
  0 siblings, 1 reply; 32+ messages in thread
From: Jinjie Ruan @ 2026-04-23  1:48 UTC (permalink / raw)
  To: Thomas Gleixner, Mark Rutland
  Cc: Mathias Stearn, Mathieu Desnoyers, Catalin Marinas, Will Deacon,
	Boqun Feng, Paul E. McKenney, Chris Kennelly, Dmitry Vyukov,
	regressions, linux-kernel, linux-arm-kernel, Peter Zijlstra,
	Ingo Molnar, Blake Oler



On 4/23/2026 3:47 AM, Thomas Gleixner wrote:
> On Wed, Apr 22 2026 at 19:11, Mark Rutland wrote:
>> On Wed, Apr 22, 2026 at 07:49:30PM +0200, Thomas Gleixner wrote:
>> Conceptually we just need to use syscall_enter_from_user_mode() and
>> irqentry_enter_from_user_mode() appropriately.
> 
> Right. I figured that out.
> 
>> In practice, I can't use those as-is without introducing the exception
>> masking problems I just fixed up for irqentry_enter_from_kernel_mode(),
>> so I'll need to do some similar refactoring first.
> 
> See below.
> 
>> I haven't paged everything in yet, so just to check, is there anything
>> that would behave incorrectly if current->rseq.event.user_irq were set
>> for syscall entry? IIUC it means we'll effectively do the slow path, and
>> I was wondering if that might be acceptable as a one-line bodge for
>> stable.
> 
> It might work, but it's trivial enough to avoid that. See below. That on
> top of 6.19.y makes the selftests pass too.

This aligns with my thoughts when converting arm64 to the generic syscall
entry. Currently, the arm64 entry code does not distinguish between IRQ
and syscall entries. It fails to call rseq_note_user_irq_entry() for IRQ
entries as the generic entry framework does, because arm64 uses
enter_from_user_mode() exclusively instead of
irqentry_enter_from_user_mode().

https://lore.kernel.org/all/20260320102620.1336796-10-ruanjinjie@huawei.com/

> 
> Thanks,
> 
>         tglx
> ---
>  arch/arm64/kernel/entry-common.c |   14 ++++++++++----
>  1 file changed, 10 insertions(+), 4 deletions(-)
> 
> --- a/arch/arm64/kernel/entry-common.c
> +++ b/arch/arm64/kernel/entry-common.c
> @@ -58,6 +58,12 @@ static void noinstr exit_to_kernel_mode(
>  	irqentry_exit(regs, state);
>  }
>  
> +static __always_inline void arm64_enter_from_user_mode_syscall(struct pt_regs *regs)
> +{
> +	enter_from_user_mode(regs);
> +	mte_disable_tco_entry(current);
> +}
> +
>  /*
>   * Handle IRQ/context state management when entering from user mode.
>   * Before this function is called it is not safe to call regular kernel code,
> @@ -65,8 +71,8 @@ static void noinstr exit_to_kernel_mode(
>   */
>  static __always_inline void arm64_enter_from_user_mode(struct pt_regs *regs)
>  {
> -	enter_from_user_mode(regs);
> -	mte_disable_tco_entry(current);
> +	arm64_enter_from_user_mode_syscall(regs);
> +	rseq_note_user_irq_entry();
>  }
>  
>  /*
> @@ -717,7 +723,7 @@ static void noinstr el0_brk64(struct pt_
>  
>  static void noinstr el0_svc(struct pt_regs *regs)
>  {
> -	arm64_enter_from_user_mode(regs);
> +	arm64_enter_from_user_mode_syscall(regs);
>  	cortex_a76_erratum_1463225_svc_handler();
>  	fpsimd_syscall_enter();
>  	local_daif_restore(DAIF_PROCCTX);
> @@ -869,7 +875,7 @@ static void noinstr el0_cp15(struct pt_r
>  
>  static void noinstr el0_svc_compat(struct pt_regs *regs)
>  {
> -	arm64_enter_from_user_mode(regs);
> +	arm64_enter_from_user_mode_syscall(regs);
>  	cortex_a76_erratum_1463225_svc_handler();
>  	local_daif_restore(DAIF_PROCCTX);
>  	do_el0_svc_compat(regs);


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [REGRESSION] rseq: refactoring in v6.19 broke everyone on arm64 and tcmalloc everywhere
  2026-04-23  1:48         ` Jinjie Ruan
@ 2026-04-23  5:53           ` Dmitry Vyukov
  2026-04-23 10:39             ` Thomas Gleixner
                               ` (2 more replies)
  0 siblings, 3 replies; 32+ messages in thread
From: Dmitry Vyukov @ 2026-04-23  5:53 UTC (permalink / raw)
  To: Jinjie Ruan, linux-man
  Cc: Thomas Gleixner, Mark Rutland, Mathias Stearn, Mathieu Desnoyers,
	Catalin Marinas, Will Deacon, Boqun Feng, Paul E. McKenney,
	Chris Kennelly, regressions, linux-kernel, linux-arm-kernel,
	Peter Zijlstra, Ingo Molnar, Blake Oler

On Thu, 23 Apr 2026 at 03:48, Jinjie Ruan <ruanjinjie@huawei.com> wrote:
>
> On 4/23/2026 3:47 AM, Thomas Gleixner wrote:
> > On Wed, Apr 22 2026 at 19:11, Mark Rutland wrote:
> >> On Wed, Apr 22, 2026 at 07:49:30PM +0200, Thomas Gleixner wrote:
> >> Conceptually we just need to use syscall_enter_from_user_mode() and
> >> irqentry_enter_from_user_mode() appropriately.
> >
> > Right. I figured that out.
> >
> >> In practice, I can't use those as-is without introducing the exception
> >> masking problems I just fixed up for irqentry_enter_from_kernel_mode(),
> >> so I'll need to do some similar refactoring first.
> >
> > See below.
> >
> >> I haven't paged everything in yet, so just to check, is there anything
> >> that would behave incorrectly if current->rseq.event.user_irq were set
> >> for syscall entry? IIUC it means we'll effectively do the slow path, and
> >> I was wondering if that might be acceptable as a one-line bodge for
> >> stable.
> >
> > It might work, but it's trivial enough to avoid that. See below. That on
> > top of 6.19.y makes the selftests pass too.
>
> This aligns with my thoughts when converting arm64 to the generic syscall
> entry. Currently, the arm64 entry code does not distinguish between IRQ
> and syscall entries. It fails to call rseq_note_user_irq_entry() for IRQ
> entries as the generic entry framework does, because arm64 uses
> enter_from_user_mode() exclusively instead of
> irqentry_enter_from_user_mode().
>
> https://lore.kernel.org/all/20260320102620.1336796-10-ruanjinjie@huawei.com/
>
> >
> > Thanks,
> >
> >         tglx
> > ---
> >  arch/arm64/kernel/entry-common.c |   14 ++++++++++----
> >  1 file changed, 10 insertions(+), 4 deletions(-)
> >
> > --- a/arch/arm64/kernel/entry-common.c
> > +++ b/arch/arm64/kernel/entry-common.c
> > @@ -58,6 +58,12 @@ static void noinstr exit_to_kernel_mode(
> >       irqentry_exit(regs, state);
> >  }
> >
> > +static __always_inline void arm64_enter_from_user_mode_syscall(struct pt_regs *regs)
> > +{
> > +     enter_from_user_mode(regs);
> > +     mte_disable_tco_entry(current);
> > +}
> > +
> >  /*
> >   * Handle IRQ/context state management when entering from user mode.
> >   * Before this function is called it is not safe to call regular kernel code,
> > @@ -65,8 +71,8 @@ static void noinstr exit_to_kernel_mode(
> >   */
> >  static __always_inline void arm64_enter_from_user_mode(struct pt_regs *regs)
> >  {
> > -     enter_from_user_mode(regs);
> > -     mte_disable_tco_entry(current);
> > +     arm64_enter_from_user_mode_syscall(regs);
> > +     rseq_note_user_irq_entry();
> >  }
> >
> >  /*
> > @@ -717,7 +723,7 @@ static void noinstr el0_brk64(struct pt_
> >
> >  static void noinstr el0_svc(struct pt_regs *regs)
> >  {
> > -     arm64_enter_from_user_mode(regs);
> > +     arm64_enter_from_user_mode_syscall(regs);
> >       cortex_a76_erratum_1463225_svc_handler();
> >       fpsimd_syscall_enter();
> >       local_daif_restore(DAIF_PROCCTX);
> > @@ -869,7 +875,7 @@ static void noinstr el0_cp15(struct pt_r
> >
> >  static void noinstr el0_svc_compat(struct pt_regs *regs)
> >  {
> > -     arm64_enter_from_user_mode(regs);
> > +     arm64_enter_from_user_mode_syscall(regs);
> >       cortex_a76_erratum_1463225_svc_handler();
> >       local_daif_restore(DAIF_PROCCTX);
> >       do_el0_svc_compat(regs);


+linux-man

This part of the rseq man page needs to be fixed as well, I think. The
kernel no longer reliably provides clearing of rseq_cs on preemption,
right?

https://git.kernel.org/pub/scm/libs/librseq/librseq.git/tree/doc/man/rseq.2#n241

"and set to NULL by the kernel when it restarts an assembly
instruction sequence block,
as well as when the kernel detects that it is preempting or delivering
a signal outside of the range targeted by the rseq_cs."

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [REGRESSION] rseq: refactoring in v6.19 broke everyone on arm64 and tcmalloc everywhere
  2026-04-22 13:13   ` Peter Zijlstra
@ 2026-04-23 10:38     ` Mathias Stearn
       [not found]     ` <CAHnCjA2fa+dP1+yCYNQrTXQaW-JdtfMj7wMikwMeeCRg-3NhiA@mail.gmail.com>
  1 sibling, 0 replies; 32+ messages in thread
From: Mathias Stearn @ 2026-04-23 10:38 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Mathieu Desnoyers, Catalin Marinas, Will Deacon,
	Boqun Feng, Paul E. McKenney, Chris Kennelly, Dmitry Vyukov,
	regressions, linux-kernel, linux-arm-kernel, Ingo Molnar,
	Mark Rutland, Jinjie Ruan, Blake Oler

On Wed, Apr 22, 2026 at 3:13 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Wed, Apr 22, 2026 at 02:56:47PM +0200, Peter Zijlstra wrote:
> > On Wed, Apr 22, 2026 at 11:50:26AM +0200, Mathias Stearn wrote:
> >
> > > Additionally, it breaks tcmalloc specifically by failing to overwrite
> > > the cpu_id_start field at points where it was relied on for
> > > correctness.
> >
> > This specific behaviour was documented as being wrong and running with
> > DEBUG_RSEQ would have flagged it.
> >
> > The tcmalloc issue has been contentious for a long time. The tcmalloc
> > folks relied on something that was documented to be wrong. It has been
> > reported to the tcmalloc people many years ago and if you were to run
> > tcmalloc on most any kernel (very much including 6.19) with
> > DEBUG_RSEQ=y, it would have yelled.
> >
> > The tcmalloc people didn't care. There was a proposal for an RSEQ
> > extension for what they need, and they didn't care. All this should be
> > in their bugzilla or whatever.
> >
> > The RSEQ rework improved performance significantly for everyone, and
> > kept all the documented behaviour (+- arm64 bug). Tcmalloc got screwed
> > over because they relied on implementation behaviour that was
> > specifically documented to be broken. And they didn't care. Google was
> > very much aware of this. And hasn't lifted a finger to remedy it.
>
> Also: https://lore.kernel.org/all/874io5andc.ffs@tglx/

(Sorry for the resend to folks who got this already - I got an alert
that it was rejected by the mailing lists because it contained HTML, so
I'm attempting to resend as plain text.)

I won't claim that tcmalloc _should_ be abusing cpu_id_start as it is.
I agree that it seems questionable at best. However, I will strongly
disagree with the following comment in that message:

> What it no longer does is update the
> CPU number for the preemption case on the same CPU
> because that's just a massive waste of CPU cycles.

I don't think it will cost _any_ cycles to implement what I proposed.
And it especially should have no impact from just enabling rseq on a
thread as glibc now does. It should only result in different
instructions being executed when the program actually _uses_ rseq by
setting the rseq_cs variable to a non-null pointer. I will repeat the
proposal with a bit more commentary in case you missed some of the
details that make it free:

Any time a critical section might be aborted (migration, preemption,
signal delivery, and membarrier IPI), the kernel _already_ must check
the rseq_cs field to see if the thread is in a critical section [and
if it is null because the program isn't using rseq critical sections,
no further action is taken]. This is documented as nulling the pointer
after (I assume to make later checks cheaper) [if this changed, then
it *is* a change in _documented behavior_, not just an implementation
detail]. It would be sufficient for tcmalloc's internal usage if every
time the kernel nulled out rseq_cs, it also wrote the cpu id to
cpu_id_start. [This is one additional store to a cacheline you are
already writing to so it should be ~free on modern OoO CPUs and cheap
on others. There might be a small cost to loading the current cpu, but
since nothing depends on that other than the store, I still expect it
to be ~free]

To make this more concrete, I am proposing adding

unsafe_put_user((u32)task_cpu(t), &t->rseq.usrptr->cpu_id_start, efault);

after each place where you currently do

unsafe_put_user(0ULL, &t->rseq.usrptr->rseq_cs, efault);

in rseq_update_user_cs. Is that something that you would expect to
cause a performance issue?

Again, I'm not claiming that it is "good" that this needs to be done.
But it does seem like a small price to pay to keep existing binaries
working on new kernels. Quoting the first paragraph of
https://docs.kernel.org/admin-guide/reporting-regressions.html:

> “We don’t cause regressions” is the first rule of Linux kernel development; Linux founder and lead developer Linus Torvalds established it himself and ensures it’s obeyed.

I don't see anything on that page that says it doesn't count as a
regression if the userspace program "relied on implementation
behaviour that was specifically documented to be broken".

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [REGRESSION] rseq: refactoring in v6.19 broke everyone on arm64 and tcmalloc everywhere
  2026-04-23  5:53           ` Dmitry Vyukov
@ 2026-04-23 10:39             ` Thomas Gleixner
  2026-04-23 10:51               ` Mathias Stearn
  2026-04-23 12:11             ` Alejandro Colomar
  2026-04-23 12:29             ` Mathieu Desnoyers
  2 siblings, 1 reply; 32+ messages in thread
From: Thomas Gleixner @ 2026-04-23 10:39 UTC (permalink / raw)
  To: Dmitry Vyukov, Jinjie Ruan, linux-man
  Cc: Mark Rutland, Mathias Stearn, Mathieu Desnoyers, Catalin Marinas,
	Will Deacon, Boqun Feng, Paul E. McKenney, Chris Kennelly,
	regressions, linux-kernel, linux-arm-kernel, Peter Zijlstra,
	Ingo Molnar, Blake Oler

On Thu, Apr 23 2026 at 07:53, Dmitry Vyukov wrote:
> On Thu, 23 Apr 2026 at 03:48, Jinjie Ruan <ruanjinjie@huawei.com> wrote:
>
> This part of the rseq man page needs to be fixed as well, I think. The
> kernel no longer reliably provides clearing of rseq_cs on preemption,
> right?
>
> https://git.kernel.org/pub/scm/libs/librseq/librseq.git/tree/doc/man/rseq.2#n241
>
> "and set to NULL by the kernel when it restarts an assembly
> instruction sequence block,
> as well as when the kernel detects that it is preempting or delivering
> a signal outside of the range targeted by the rseq_cs."

The kernel clears rseq_cs reliably when user space was interrupted and:

    the task was preempted
or
    the return from interrupt delivers a signal

If the task invoked a syscall then there is absolutely no reason to do
either of these, because syscalls from within a critical section are a
bug and are caught when rseq debugging is enabled.

The original code did this along with unconditionally updating CPU/MMCID
which resulted in ~15% performance regression on a syscall heavy
database benchmark once glibc started to register rseq.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [REGRESSION] rseq: refactoring in v6.19 broke everyone on arm64 and tcmalloc everywhere
  2026-04-23 10:39             ` Thomas Gleixner
@ 2026-04-23 10:51               ` Mathias Stearn
  2026-04-23 12:24                 ` David Laight
  2026-04-23 19:31                 ` Thomas Gleixner
  0 siblings, 2 replies; 32+ messages in thread
From: Mathias Stearn @ 2026-04-23 10:51 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Dmitry Vyukov, Jinjie Ruan, linux-man, Mark Rutland,
	Mathieu Desnoyers, Catalin Marinas, Will Deacon, Boqun Feng,
	Paul E. McKenney, Chris Kennelly, regressions, linux-kernel,
	linux-arm-kernel, Peter Zijlstra, Ingo Molnar, Blake Oler

On Thu, Apr 23, 2026 at 12:39 PM Thomas Gleixner <tglx@linutronix.de> wrote:
> The kernel clears rseq_cs reliably when user space was interrupted and:
>
>     the task was preempted
> or
>     the return from interrupt delivers a signal
>
> If the task invoked a syscall then there is absolutely no reason to do
> either of this because syscalls from within a critical section are a
> bug and catched when enabling rseq debugging.
>
> The original code did this along with unconditionally updating CPU/MMCID
> which resulted in ~15% performance regression on a syscall heavy
> database benchmark once glibc started to register rseq.

Just to be clear, TCMalloc does not need either rseq_cs to be cleared
or cpu_id_start to be written to on syscalls because it doesn't do
syscalls from critical sections. It will actually benefit (slightly)
from not updating cpu_id_start on syscalls.

It is specifically in the cases where an rseq would need to be aborted
(preemption, signals, migration, and membarrier IPI with the rseq
flag) that TCMalloc relies on cpu_id_start being written. It does rely
on that write even when not inside the critical section, because it
effectively uses that to detect if there were any would-cause-abort
events in between two critical sections. But because it leaves the
rseq_cs pointer non-null between critical sections, you don't need to
add _any_ overhead for programs that never make use of rseq after
registration, or any overhead to syscalls even for those that do.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [REGRESSION] rseq: refactoring in v6.19 broke everyone on arm64 and tcmalloc everywhere
       [not found]     ` <CAHnCjA2fa+dP1+yCYNQrTXQaW-JdtfMj7wMikwMeeCRg-3NhiA@mail.gmail.com>
@ 2026-04-23 11:48       ` Thomas Gleixner
  2026-04-23 12:11         ` Mathias Stearn
  0 siblings, 1 reply; 32+ messages in thread
From: Thomas Gleixner @ 2026-04-23 11:48 UTC (permalink / raw)
  To: Mathias Stearn, Peter Zijlstra
  Cc: Mathieu Desnoyers, Catalin Marinas, Will Deacon, Boqun Feng,
	Paul E. McKenney, Chris Kennelly, Dmitry Vyukov, regressions,
	linux-kernel, linux-arm-kernel, Ingo Molnar, Mark Rutland,
	Jinjie Ruan, Blake Oler

On Thu, Apr 23 2026 at 11:24, Mathias Stearn wrote:
> On Wed, Apr 22, 2026 at 3:13 PM Peter Zijlstra <peterz@infradead.org> wrote:
> To make this more concrete, I am proposing adding
>
> unsafe_put_user((u32)task_cpu(t), &t->rseq.usrptr->cpu_id_start, efault);
>
> after each place where you currently do
>
> unsafe_put_user(0ULL, &t->rseq.usrptr->rseq_cs, efault);
>
> in rseq_update_user_cs. Is that something that you would expect to cause a
> performance issue?

That would work and not bring the performance issues back, but:

  1) Did you validate that adding the reset into rseq_update_user_cs() is
     actually sufficient?

     If adding it to rseq_update_user_cs() is not sufficient, then we
     have a really serious problem. Because we'd need to go back and do
     it unconditionally, which then makes the 15% performance
     regression, which happened when glibc enabled rseq, come back
     instantaneously. And in that case the damage for tcmalloc() is the
     lesser of two evils.

  2) The tcmalloc abuse breaks the documented and guaranteed user space
     ABI and therefore it makes it impossible for any other library in
     an application which uses tcmalloc to rely on the documented and
     guaranteed rseq::cpu_id_start/rseq::cpu_id semantics.

     Which means, that tcmalloc is holding everybody else hostage.
     That's just not acceptable. Not even under the no regression rule.

  3) The fact that tcmalloc prevents a user from enabling rseq debugging
     is equally unacceptable as it does not allow me to validate my own
     rseq magic code in my mongodb client because enabling it will make
     the DB I want to test against go away.

     Again tcmalloc holds everybody else hostage for no reason at all.

The most amazing part is that tcmalloc uses this to spare two
instruction cycles, but nobody noticed in 8 years how much performance
the unconditional rseq nonsense in the kernel left on the table.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [REGRESSION] rseq: refactoring in v6.19 broke everyone on arm64 and tcmalloc everywhere
  2026-04-23 11:48       ` Thomas Gleixner
@ 2026-04-23 12:11         ` Mathias Stearn
  2026-04-23 17:19           ` Thomas Gleixner
  0 siblings, 1 reply; 32+ messages in thread
From: Mathias Stearn @ 2026-04-23 12:11 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Peter Zijlstra, Mathieu Desnoyers, Catalin Marinas, Will Deacon,
	Boqun Feng, Paul E. McKenney, Chris Kennelly, Dmitry Vyukov,
	regressions, linux-kernel, linux-arm-kernel, Ingo Molnar,
	Mark Rutland, Jinjie Ruan, Blake Oler

On Thu, Apr 23, 2026 at 1:48 PM Thomas Gleixner <tglx@linutronix.de> wrote:
> That would work and not bring the performance issues back, but:
>
>   1) Did you validate that adding the reset into rseq_update_user_cs() is
>      actually sufficient?

Not yet, although I confirmed with the tcmalloc maintainers that they
thought it would be sufficient before suggesting it. I'm currently
building your patch from upthread to test that out. I can try this
after, although I don't think I'll be able to get to that today. I'll
try to get a coworker to test it though.

>      Which means, that tcmalloc is holding everybody else hostage.
>      That's just not acceptable. Not even under the no regression rule.

Agree. I don't love the situation either. Or that we need to advise
setting the environment variable to tell glibc not to use rseq. But I
also want our users to be able to use existing mongo binaries on new
kernels.

>   3) The fact that tcmalloc prevents a user from enabling rseq debugging
>      is equally unacceptable as it does not allow me to validate my own
>      rseq magic code in my mongodb client because enabling it will make
>      the DB I want to test against go away.

Glad to hear you use mongodb :)

> The most amazing part is that tcmalloc uses this to spare two
> instruction cycles, but nobody noticed in 8 years how much performance
> the unconditional rseq nonsense in the kernel left on the table.

I am looking into a change to our copy of tcmalloc to have it stop
squatting on cpu_id_start, and will run that through our correctness
and performance tests. I can't promise anything (and I certainly can't
speak for what Google may choose to do), but I share your expectation
that it should be possible with minimal impact. It _is_ more than 2
cycles though, since it extends the load dependency chain by one or
two pointer chases and a few ALU ops. I'd guesstimate it will
likely cost on the order of 5-10 cycles per call to malloc or free. I
think we can absorb that, but will need to test.

Of course, even if we make that change, it will only apply to _future_
binaries. That's why we prefer a kernel fix so that users will be able
to run our existing releases (or any containers that use them) on a
modern kernel.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [REGRESSION] rseq: refactoring in v6.19 broke everyone on arm64 and tcmalloc everywhere
  2026-04-23  5:53           ` Dmitry Vyukov
  2026-04-23 10:39             ` Thomas Gleixner
@ 2026-04-23 12:11             ` Alejandro Colomar
  2026-04-23 12:54               ` Mathieu Desnoyers
  2026-04-23 12:29             ` Mathieu Desnoyers
  2 siblings, 1 reply; 32+ messages in thread
From: Alejandro Colomar @ 2026-04-23 12:11 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Jinjie Ruan, linux-man, Thomas Gleixner, Mark Rutland,
	Mathias Stearn, Mathieu Desnoyers, Catalin Marinas, Will Deacon,
	Boqun Feng, Paul E. McKenney, Chris Kennelly, regressions,
	linux-kernel, linux-arm-kernel, Peter Zijlstra, Ingo Molnar,
	Blake Oler, Michael Jeanson

[-- Attachment #1: Type: text/plain, Size: 4335 bytes --]

Hello Dmitry,

On 2026-04-23T07:53:55+0200, Dmitry Vyukov wrote:
> On Thu, 23 Apr 2026 at 03:48, Jinjie Ruan <ruanjinjie@huawei.com> wrote:
> >
> > On 4/23/2026 3:47 AM, Thomas Gleixner wrote:
> > > On Wed, Apr 22 2026 at 19:11, Mark Rutland wrote:
> > >> On Wed, Apr 22, 2026 at 07:49:30PM +0200, Thomas Gleixner wrote:
> > >> Conceptually we just need to use syscall_enter_from_user_mode() and
> > >> irqentry_enter_from_user_mode() appropriately.
> > >
> > > Right. I figured that out.
> > >
> > >> In practice, I can't use those as-is without introducing the exception
> > >> masking problems I just fixed up for irqentry_enter_from_kernel_mode(),
> > >> so I'll need to do some similar refactoring first.
> > >
> > > See below.
> > >
> > >> I haven't paged everything in yet, so just to check, is there anything
> > >> that would behave incorrectly if current->rseq.event.user_irq were set
> > >> for syscall entry? IIUC it means we'll effectively do the slow path, and
> > >> I was wondering if that might be acceptable as a one-line bodge for
> > >> stable.
> > >
> > > It might work, but it's trivial enough to avoid that. See below. That on
> > > top of 6.19.y makes the selftests pass too.
> >
> > This aligns with my thoughts when converting arm64 to the generic syscall
> > entry. Currently, the arm64 entry code does not distinguish between IRQ
> > and syscall entries. It fails to call rseq_note_user_irq_entry() for IRQ
> > entries as the generic entry framework does, because arm64 uses
> > enter_from_user_mode() exclusively instead of
> > irqentry_enter_from_user_mode().
> >
> > https://lore.kernel.org/all/20260320102620.1336796-10-ruanjinjie@huawei.com/
> >
> > >
> > > Thanks,
> > >
> > >         tglx
> > > ---
> > >  arch/arm64/kernel/entry-common.c |   14 ++++++++++----
> > >  1 file changed, 10 insertions(+), 4 deletions(-)
> > >
> > > --- a/arch/arm64/kernel/entry-common.c
> > > +++ b/arch/arm64/kernel/entry-common.c
> > > @@ -58,6 +58,12 @@ static void noinstr exit_to_kernel_mode(
> > >       irqentry_exit(regs, state);
> > >  }
> > >
> > > +static __always_inline void arm64_enter_from_user_mode_syscall(struct pt_regs *regs)
> > > +{
> > > +     enter_from_user_mode(regs);
> > > +     mte_disable_tco_entry(current);
> > > +}
> > > +
> > >  /*
> > >   * Handle IRQ/context state management when entering from user mode.
> > >   * Before this function is called it is not safe to call regular kernel code,
> > > @@ -65,8 +71,8 @@ static void noinstr exit_to_kernel_mode(
> > >   */
> > >  static __always_inline void arm64_enter_from_user_mode(struct pt_regs *regs)
> > >  {
> > > -     enter_from_user_mode(regs);
> > > -     mte_disable_tco_entry(current);
> > > +     arm64_enter_from_user_mode_syscall(regs);
> > > +     rseq_note_user_irq_entry();
> > >  }
> > >
> > >  /*
> > > @@ -717,7 +723,7 @@ static void noinstr el0_brk64(struct pt_
> > >
> > >  static void noinstr el0_svc(struct pt_regs *regs)
> > >  {
> > > -     arm64_enter_from_user_mode(regs);
> > > +     arm64_enter_from_user_mode_syscall(regs);
> > >       cortex_a76_erratum_1463225_svc_handler();
> > >       fpsimd_syscall_enter();
> > >       local_daif_restore(DAIF_PROCCTX);
> > > @@ -869,7 +875,7 @@ static void noinstr el0_cp15(struct pt_r
> > >
> > >  static void noinstr el0_svc_compat(struct pt_regs *regs)
> > >  {
> > > -     arm64_enter_from_user_mode(regs);
> > > +     arm64_enter_from_user_mode_syscall(regs);
> > >       cortex_a76_erratum_1463225_svc_handler();
> > >       local_daif_restore(DAIF_PROCCTX);
> > >       do_el0_svc_compat(regs);
> 
> 
> +linux-man
> 
> This part of the rseq man page needs to be fixed as well I think. The
> kernel no longer reliably provides clearing of rseq_cs on preemption,
> right?
> 
> https://git.kernel.org/pub/scm/libs/librseq/librseq.git/tree/doc/man/rseq.2#n241

+Michael Jeanson

That page seems to be maintained separately, as part of the librseq
project.


Have a lovely day!
Alex

> 
> "and set to NULL by the kernel when it restarts an assembly
> instruction sequence block,
> as well as when the kernel detects that it is preempting or delivering
> a signal outside of the range targeted by the rseq_cs."
> 

-- 
<https://www.alejandro-colomar.es>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [REGRESSION] rseq: refactoring in v6.19 broke everyone on arm64 and tcmalloc everywhere
  2026-04-23 10:51               ` Mathias Stearn
@ 2026-04-23 12:24                 ` David Laight
  2026-04-23 19:31                 ` Thomas Gleixner
  1 sibling, 0 replies; 32+ messages in thread
From: David Laight @ 2026-04-23 12:24 UTC (permalink / raw)
  To: Mathias Stearn
  Cc: Thomas Gleixner, Dmitry Vyukov, Jinjie Ruan, linux-man,
	Mark Rutland, Mathieu Desnoyers, Catalin Marinas, Will Deacon,
	Boqun Feng, Paul E. McKenney, Chris Kennelly, regressions,
	linux-kernel, linux-arm-kernel, Peter Zijlstra, Ingo Molnar,
	Blake Oler

On Thu, 23 Apr 2026 12:51:22 +0200
Mathias Stearn <mathias@mongodb.com> wrote:

> On Thu, Apr 23, 2026 at 12:39 PM Thomas Gleixner <tglx@linutronix.de> wrote:
> > The kernel clears rseq_cs reliably when user space was interrupted and:
> >
> >     the task was preempted
> > or
> >     the return from interrupt delivers a signal
> >
> > If the task invoked a syscall then there is absolutely no reason to do
> > either of these, because syscalls from within a critical section are a
> > bug and caught when rseq debugging is enabled.
> >
> > The original code did this along with unconditionally updating CPU/MMCID
> > which resulted in ~15% performance regression on a syscall heavy
> > database benchmark once glibc started to register rseq.  
> 
> Just to be clear, TCMalloc does not need either rseq_cs to be cleared
> or cpu_id_start to be written to on syscalls because it doesn't do
> syscalls from critical sections. It will actually benefit (slightly)
> from not updating cpu_id_start on syscalls.
> 
> It is specifically in the cases where an rseq would need to be aborted
> (preemption, signals, migration, and membarrier IPI with the rseq
> flag) that TCMalloc relies on cpu_id_start being written. It does rely
> on that write even when not inside the critical section, because it
> effectively uses that to detect if there were any would-cause-abort
> events in between two critical sections. But since it leaves the
> rseq_cs pointer non-null between critical sections, you don't need
> to add _any_ overhead for programs that never make use of rseq after
> registration, or add any overhead to syscalls even for those who do.
> 

That sounds like one long rseq sequence where the 'restart' path
detects that some of the operations have already been done.

	David

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [REGRESSION] rseq: refactoring in v6.19 broke everyone on arm64 and tcmalloc everywhere
  2026-04-23  5:53           ` Dmitry Vyukov
  2026-04-23 10:39             ` Thomas Gleixner
  2026-04-23 12:11             ` Alejandro Colomar
@ 2026-04-23 12:29             ` Mathieu Desnoyers
  2026-04-23 12:36               ` Dmitry Vyukov
  2 siblings, 1 reply; 32+ messages in thread
From: Mathieu Desnoyers @ 2026-04-23 12:29 UTC (permalink / raw)
  To: Dmitry Vyukov, Jinjie Ruan, linux-man
  Cc: Thomas Gleixner, Mark Rutland, Mathias Stearn, Catalin Marinas,
	Will Deacon, Boqun Feng, Paul E. McKenney, Chris Kennelly,
	regressions, linux-kernel, linux-arm-kernel, Peter Zijlstra,
	Ingo Molnar, Blake Oler

On 2026-04-23 01:53, Dmitry Vyukov wrote:
[...]
> +linux-man
> 
> This part of the rseq man page needs to be fixed as well I think. The
> kernel no longer reliably provides clearing of rseq_cs on preemption,
> right?
> 
> https://git.kernel.org/pub/scm/libs/librseq/librseq.git/tree/doc/man/rseq.2#n241

I'm maintaining this manual page in librseq.

> 
> "and set to NULL by the kernel when it restarts an assembly
> instruction sequence block,
> as well as when the kernel detects that it is preempting or delivering
> a signal outside of the range targeted by the rseq_cs."

I think you got two things confused here.

1) There is currently a bug on arm64 where it fails to honor the
    rseq ABI contract wrt critical section abort. AFAIU there is a
    fix proposed for this.

2) Thomas relaxed the implementation of cpu_id_start field updates
    so it only stores to the rseq area when the current cpu actually
    changes (migration).

So AFAIU the statement in the man page is still fine. It's just arm64
that needs fixing.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [REGRESSION] rseq: refactoring in v6.19 broke everyone on arm64 and tcmalloc everywhere
  2026-04-23 12:29             ` Mathieu Desnoyers
@ 2026-04-23 12:36               ` Dmitry Vyukov
  2026-04-23 12:53                 ` Mathieu Desnoyers
  0 siblings, 1 reply; 32+ messages in thread
From: Dmitry Vyukov @ 2026-04-23 12:36 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Jinjie Ruan, linux-man, Thomas Gleixner, Mark Rutland,
	Mathias Stearn, Catalin Marinas, Will Deacon, Boqun Feng,
	Paul E. McKenney, Chris Kennelly, regressions, linux-kernel,
	linux-arm-kernel, Peter Zijlstra, Ingo Molnar, Blake Oler

On Thu, 23 Apr 2026 at 14:29, Mathieu Desnoyers
<mathieu.desnoyers@efficios.com> wrote:
>
> On 2026-04-23 01:53, Dmitry Vyukov wrote:
> [...]
> > +linux-man
> >
> > This part of the rseq man page needs to be fixed as well I think. The
> > kernel no longer reliably provides clearing of rseq_cs on preemption,
> > right?
> >
> > https://git.kernel.org/pub/scm/libs/librseq/librseq.git/tree/doc/man/rseq.2#n241
>
> I'm maintaining this manual page in librseq.
>
> >
> > "and set to NULL by the kernel when it restarts an assembly
> > instruction sequence block,
> > as well as when the kernel detects that it is preempting or delivering
> > a signal outside of the range targeted by the rseq_cs."
>
> I think you got two things confused here.
>
> 1) There is currently a bug on arm64 where it fails to honor the
>     rseq ABI contract wrt critical section abort. AFAIU there is a
>     fix proposed for this.
>
> 2) Thomas relaxed the implementation of cpu_id_start field updates
>     so it only stores to the rseq area when the current cpu actually
>     changes (migration).
>
> So AFAIU the statement in the man page is still fine. It's just arm64
> that needs fixing.


My understanding was that due to the ev->user_irq check here:

+static __always_inline void rseq_sched_switch_event(struct task_struct *t)
...
+               bool raise = (ev->user_irq | ev->ids_changed) & ev->has_rseq;
+
+               if (raise) {
+                       ev->sched_switch = true;
+                       rseq_raise_notify_resume(t);
+               }

There won't be any rseq-related processing for threads preempted in
syscalls, which means that rseq_cs won't be NULLed for threads
preempted inside of syscalls.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [REGRESSION] rseq: refactoring in v6.19 broke everyone on arm64 and tcmalloc everywhere
  2026-04-23 12:36               ` Dmitry Vyukov
@ 2026-04-23 12:53                 ` Mathieu Desnoyers
  2026-04-23 12:58                   ` Dmitry Vyukov
  0 siblings, 1 reply; 32+ messages in thread
From: Mathieu Desnoyers @ 2026-04-23 12:53 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Jinjie Ruan, linux-man, Thomas Gleixner, Mark Rutland,
	Mathias Stearn, Catalin Marinas, Will Deacon, Boqun Feng,
	Paul E. McKenney, Chris Kennelly, regressions, linux-kernel,
	linux-arm-kernel, Peter Zijlstra, Ingo Molnar, Blake Oler,
	Michael Jeanson

On 2026-04-23 08:36, Dmitry Vyukov wrote:
> On Thu, 23 Apr 2026 at 14:29, Mathieu Desnoyers
> <mathieu.desnoyers@efficios.com> wrote:
>>
>> On 2026-04-23 01:53, Dmitry Vyukov wrote:
>> [...]
>>> +linux-man
>>>
>>> This part of the rseq man page needs to be fixed as well I think. The
>>> kernel no longer reliably provides clearing of rseq_cs on preemption,
>>> right?
>>>
>>> https://git.kernel.org/pub/scm/libs/librseq/librseq.git/tree/doc/man/rseq.2#n241
>>
>> I'm maintaining this manual page in librseq.
>>
>>>
>>> "and set to NULL by the kernel when it restarts an assembly
>>> instruction sequence block,
>>> as well as when the kernel detects that it is preempting or delivering
>>> a signal outside of the range targeted by the rseq_cs."
>>
>> I think you got two things confused here.
>>
>> 1) There is currently a bug on arm64 where it fails to honor the
>>      rseq ABI contract wrt critical section abort. AFAIU there is a
>>      fix proposed for this.
>>
>> 2) Thomas relaxed the implementation of cpu_id_start field updates
>>      so it only stores to the rseq area when the current cpu actually
>>      changes (migration).
>>
>> So AFAIU the statement in the man page is still fine. It's just arm64
>> that needs fixing.
> 
> 
> My understanding was that due to the ev->user_irq check here:
> 
> +static __always_inline void rseq_sched_switch_event(struct task_struct *t)
> ...
> +               bool raise = (ev->user_irq | ev->ids_changed) & ev->has_rseq;
> +
> +               if (raise) {
> +                       ev->sched_switch = true;
> +                       rseq_raise_notify_resume(t);
> +               }
> 
> There won't be any rseq-related processing for threads preempted in
> syscalls, which means that rseq_cs won't be NULLed for threads
> preempted inside of syscalls.

Let's see if I understand your concern correctly. Scenario:

A thread is within a rseq critical section. It exits the critical
section without clearing the rseq_cs pointer, expecting the kernel
to lazily clear the rseq_cs pointer eventually when it detects that
it's not nested on top of the userspace critical section anymore.
It then calls a system call _outside_ of the rseq critical section,
but with rseq_cs pointer set. Based on the rseq man page wording,
it would then expect the preemption within the system call to guarantee
clearing that pointer.

Here is the relevant comment block in the man page:

                      Updated by user-space, which sets the address of  the  cur‐
                      rently active rseq_cs at the beginning of assembly instruc‐
                      tion sequence block, and set to NULL by the kernel when  it
                      restarts an assembly instruction sequence block, as well as
>>>>>>>>>
                      when the kernel detects that it is preempting or delivering
                      a  signal  outside  of  the  range targeted by the rseq_cs.
>>>>>>>>>
                           ^^^ this

The whole point about lazy-clearing of rseq_cs is that it _may_ happen when
the kernel preempts or delivers a signal (or at any point really), but it's
just an optimization.

Updating the manual page with this wording would match the intent:

                      Updated by user-space, which sets the address of  the  cur‐
                      rently active rseq_cs at the beginning of assembly instruc‐
                      tion sequence block, and set to NULL by the kernel when  it
                      restarts an assembly instruction sequence block. May be set
                      to NULL by the kernel when it detects that the current
                      instruction pointer is outside of the range targeted by
                      the rseq_cs.
                      Also needs to be set to NULL by user-space before  reclaim‐
                      ing memory that contains the targeted struct rseq_cs.

Thoughts ?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [REGRESSION] rseq: refactoring in v6.19 broke everyone on arm64 and tcmalloc everywhere
  2026-04-23 12:11             ` Alejandro Colomar
@ 2026-04-23 12:54               ` Mathieu Desnoyers
  0 siblings, 0 replies; 32+ messages in thread
From: Mathieu Desnoyers @ 2026-04-23 12:54 UTC (permalink / raw)
  To: Alejandro Colomar, Dmitry Vyukov
  Cc: Jinjie Ruan, linux-man, Thomas Gleixner, Mark Rutland,
	Mathias Stearn, Catalin Marinas, Will Deacon, Boqun Feng,
	Paul E. McKenney, Chris Kennelly, regressions, linux-kernel,
	linux-arm-kernel, Peter Zijlstra, Ingo Molnar, Blake Oler,
	Michael Jeanson

On 2026-04-23 08:11, Alejandro Colomar wrote:
[...]
>>
>> +linux-man
>>
>> This part of the rseq man page needs to be fixed as well I think. The
>> kernel no longer reliably provides clearing of rseq_cs on preemption,
>> right?
>>
>> https://git.kernel.org/pub/scm/libs/librseq/librseq.git/tree/doc/man/rseq.2#n241
> 
> +Michael Jeanson
> 
> That page seems to be maintained separately, as part of the librseq
> project.

Yes, I maintain the librseq project, thanks Alejandro!

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [REGRESSION] rseq: refactoring in v6.19 broke everyone on arm64 and tcmalloc everywhere
  2026-04-23 12:53                 ` Mathieu Desnoyers
@ 2026-04-23 12:58                   ` Dmitry Vyukov
  0 siblings, 0 replies; 32+ messages in thread
From: Dmitry Vyukov @ 2026-04-23 12:58 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Jinjie Ruan, linux-man, Thomas Gleixner, Mark Rutland,
	Mathias Stearn, Catalin Marinas, Will Deacon, Boqun Feng,
	Paul E. McKenney, Chris Kennelly, regressions, linux-kernel,
	linux-arm-kernel, Peter Zijlstra, Ingo Molnar, Blake Oler,
	Michael Jeanson

On Thu, 23 Apr 2026 at 14:53, Mathieu Desnoyers
<mathieu.desnoyers@efficios.com> wrote:
>
> On 2026-04-23 08:36, Dmitry Vyukov wrote:
> > On Thu, 23 Apr 2026 at 14:29, Mathieu Desnoyers
> > <mathieu.desnoyers@efficios.com> wrote:
> >>
> >> On 2026-04-23 01:53, Dmitry Vyukov wrote:
> >> [...]
> >>> +linux-man
> >>>
> >>> This part of the rseq man page needs to be fixed as well I think. The
> >>> kernel no longer reliably provides clearing of rseq_cs on preemption,
> >>> right?
> >>>
> >>> https://git.kernel.org/pub/scm/libs/librseq/librseq.git/tree/doc/man/rseq.2#n241
> >>
> >> I'm maintaining this manual page in librseq.
> >>
> >>>
> >>> "and set to NULL by the kernel when it restarts an assembly
> >>> instruction sequence block,
> >>> as well as when the kernel detects that it is preempting or delivering
> >>> a signal outside of the range targeted by the rseq_cs."
> >>
> >> I think you got two things confused here.
> >>
> >> 1) There is currently a bug on arm64 where it fails to honor the
> >>      rseq ABI contract wrt critical section abort. AFAIU there is a
> >>      fix proposed for this.
> >>
> >> 2) Thomas relaxed the implementation of cpu_id_start field updates
> >>      so it only stores to the rseq area when the current cpu actually
> >>      changes (migration).
> >>
> >> So AFAIU the statement in the man page is still fine. It's just arm64
> >> that needs fixing.
> >
> >
> > My understanding was that due to the ev->user_irq check here:
> >
> > +static __always_inline void rseq_sched_switch_event(struct task_struct *t)
> > ...
> > +               bool raise = (ev->user_irq | ev->ids_changed) & ev->has_rseq;
> > +
> > +               if (raise) {
> > +                       ev->sched_switch = true;
> > +                       rseq_raise_notify_resume(t);
> > +               }
> >
> > There won't be any rseq-related processing for threads preempted in
> > syscalls, which means that rseq_cs won't be NULLed for threads
> > preempted inside of syscalls.
>
> Let's see if I understand your concern correctly. Scenario:
>
> A thread is within a rseq critical section. It exits the critical
> section without clearing the rseq_cs pointer, expecting the kernel
> to lazily clear the rseq_cs pointer eventually when it detects that
> it's not nested on top of the userspace critical section anymore.
> It then calls a system call _outside_ of the rseq critical section,
> but with rseq_cs pointer set. Based on the rseq man page wording,
> it would then expect the preemption within the system call to guarantee
> clearing that pointer.

Yes, this is the scenario I had in mind.

> Here is the relevant comment block in the man page:
>
>                       Updated by user-space, which sets the address of  the  cur‐
>                       rently active rseq_cs at the beginning of assembly instruc‐
>                       tion sequence block, and set to NULL by the kernel when  it
>                       restarts an assembly instruction sequence block, as well as
> >>>>>>>>>
>                       when the kernel detects that it is preempting or delivering
>                       a  signal  outside  of  the  range targeted by the rseq_cs.
> >>>>>>>>>
>                            ^^^ this
>
> The whole point about lazy-clearing of rseq_cs is that it _may_ happen when
> the kernel preempts or delivers a signal (or at any point really), but it's
> just an optimization.
>
> Updating the manual page with this wording would match the intent:
>
>                       Updated by user-space, which sets the address of  the  cur‐
>                       rently active rseq_cs at the beginning of assembly instruc‐
>                       tion sequence block, and set to NULL by the kernel when  it
>                       restarts an assembly instruction sequence block. May be set
>                       to NULL by the kernel when it detects that the current
>                       instruction pointer is outside of the range targeted by
>                       the rseq_cs.
>                       Also needs to be set to NULL by user-space before  reclaim‐
>                       ing memory that contains the targeted struct rseq_cs.
>
> Thoughts ?
>
> Thanks,
>
> Mathieu
>
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> https://www.efficios.com

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [REGRESSION] rseq: refactoring in v6.19 broke everyone on arm64 and tcmalloc everywhere
  2026-04-23 12:11         ` Mathias Stearn
@ 2026-04-23 17:19           ` Thomas Gleixner
  2026-04-23 17:38             ` Chris Kennelly
  2026-04-23 17:41             ` Linus Torvalds
  0 siblings, 2 replies; 32+ messages in thread
From: Thomas Gleixner @ 2026-04-23 17:19 UTC (permalink / raw)
  To: Mathias Stearn
  Cc: Peter Zijlstra, Mathieu Desnoyers, Catalin Marinas, Will Deacon,
	Boqun Feng, Paul E. McKenney, Chris Kennelly, Dmitry Vyukov,
	regressions, linux-kernel, linux-arm-kernel, Ingo Molnar,
	Mark Rutland, Jinjie Ruan, Blake Oler, Linus Torvalds

On Thu, Apr 23 2026 at 14:11, Mathias Stearn wrote:

Cc+ Linus

> Of course, even if we make that change, it will only apply to _future_
> binaries. That's why we prefer a kernel fix so that users will be able
> to run our existing releases (or any containers that use them) on a
> modern kernel.

I understand that and as everyone else I would be happy to do that, but
the price everyone pays for proliferating the tcmalloc insanity is not
cheap either.

So let me recap the whole situation and how we got there:

  1) The original RSEQ implementation updates the rseq::cpu_id_start
     field in user space more or less unconditionally on every exit to
     user, whether the CPU/MMCID have been changed or not.

     That went unnoticed for years because nothing used rseq aside from
     Google and tcmalloc. Once glibc registered rseq, this resulted in
     an up to 15% performance penalty for syscall-heavy workloads.

  2) The rseq::cpu_id_start field is documented as read only for user
     space in the ABI contract and guaranteed to be updated by the
     kernel when a task is migrated to a different CPU.

  3) The RO-for-userspace property has been enforced by RSEQ debugging
     mode since day one. If such a debug-enabled kernel detects user
     space changing the field, it kills the task/application.

  4) tcmalloc abused the suboptimal implementation (see #1) and
     scribbled over rseq::cpu_id_start for their own nefarious purposes.

  5) As a consequence of #4 tcmalloc cannot be used on a RSEQ debug
     enabled kernel. Which means a developer cannot validate his RSEQ
     code against a debug kernel when tcmalloc is in use on the system
     as that would crash the tcmalloc dependent applications due to #3.

  6) As a consequence of #4 tcmalloc cannot be used together with any
     other facility/library which wants to utilize the ABI guaranteed
     properties of rseq::cpu_id_start in the same application.

  7) tcmalloc violates the ABI from day one and has since refused to
     address the problem despite being offered a kernel side rseq
     extension to solve it many years ago.

  8) When addressing the performance issues of RSEQ, the unconditional
     update was removed under the valid assumption that the kernel
     only has to satisfy the guaranteed ABI properties, especially when
     they are enforceable by RSEQ debug.

     As a consequence this exposed the tcmalloc ABI violation, because
     the unconditional, pointless overwriting of something which did not
     change stopped happening.

Due to #4 everyone is in a hard place and up a creek without a paddle.

Here are the possible solutions:

  A) Mathias suggested force-overwriting rseq::cpu_id_start every time
     the rseq::rseq_cs field is cleared by the kernel under the not yet
     validated theoretical assumption that this cures the problem for
     tcmalloc.

     If that's sufficient, that would be harmless performance-wise
     because the write would be inside the already existing STAC/CLAC
     section and just add some more noise to the rseq critical section
     operations.

     That would allow existing tcmalloc usage to continue, but
     obviously would neither solve #5 and #6 above nor provide an
     incentive for tcmalloc to actually fix their crap.

  B) If that's not sufficient, then keeping tcmalloc alive would require
     going back to the previous state and letting everyone else pay the
     price in terms of performance overhead.

  C) Declare that this is not a regression because the ABI guarantee is
     not violated and the RO property has been enforceable by RSEQ
     debugging since day one.

In my opinion #C is the right thing to do, but I can see a case being
made for the lightweight fix Mathias suggested (#A) _if_ and only _if_
that is sufficient. Picking #A would also mean that user space people
have to take up the fight against tcmalloc when they want to use the
RSEQ guaranteed ABI along with tcmalloc in the same application or use a
RSEQ debug kernel to validate their own code.

Going back to the full unconditional nightmare (#B) is not an option at
all, as everybody else would have to take the massive performance hit.

Oh well...

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [REGRESSION] rseq: refactoring in v6.19 broke everyone on arm64 and tcmalloc everywhere
  2026-04-23 17:19           ` Thomas Gleixner
@ 2026-04-23 17:38             ` Chris Kennelly
  2026-04-23 17:47               ` Mathieu Desnoyers
  2026-04-23 19:39               ` Thomas Gleixner
  2026-04-23 17:41             ` Linus Torvalds
  1 sibling, 2 replies; 32+ messages in thread
From: Chris Kennelly @ 2026-04-23 17:38 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Mathias Stearn, Peter Zijlstra, Mathieu Desnoyers,
	Catalin Marinas, Will Deacon, Boqun Feng, Paul E. McKenney,
	Dmitry Vyukov, regressions, linux-kernel, linux-arm-kernel,
	Ingo Molnar, Mark Rutland, Jinjie Ruan, Blake Oler,
	Linus Torvalds

On Thu, Apr 23, 2026 at 1:19 PM Thomas Gleixner <tglx@kernel.org> wrote:
>
> On Thu, Apr 23 2026 at 14:11, Mathias Stearn wrote:
>
> Cc+ Linus
>
> > Of course, even if we make that change, it will only apply to _future_
> > binaries. That's why we prefer a kernel fix so that users will be able
> > to run our existing releases (or any containers that use them) on a
> > modern kernel.
>
> I understand that and as everyone else I would be happy to do that, but
> the price everyone pays for proliferating the tcmalloc insanity is not
> cheap either.
>
> So let me recap the whole situation and how we got there:
>
>   1) The original RSEQ implementation updates the rseq::cpu_id_start
>      field in user space more or less unconditionally on every exit to
>      user, whether the CPU/MMCID have been changed or not.
>
>      That went unnoticed for years because nothing used rseq aside from
>      Google and tcmalloc. Once glibc registered rseq, this resulted in
>      an up to 15% performance penalty for syscall-heavy workloads.
>
>   2) The rseq::cpu_id_start field is documented as read only for user
>      space in the ABI contract and guaranteed to be updated by the
>      kernel when a task is migrated to a different CPU.
>
>   3) The RO-for-userspace property has been enforced by RSEQ debugging
>      mode since day one. If such a debug-enabled kernel detects user
>      space changing the field, it kills the task/application.

The optimization in TCMalloc that you're describing has been available
since September 2023:
https://github.com/google/tcmalloc/commit/aaa4fbf6fcdce1b7f86fcadd659874645c75ddb9

I thought the RSEQ debug checks were added in December 2024:
https://github.com/torvalds/linux/commit/7d5265ffcd8b41da5e09066360540d6e0716e9cd,
but perhaps I misidentified the ones in question.

>
>   4) tcmalloc abused the suboptimal implementation (see #1) and
>      scribbled over rseq::cpu_id_start for their own nefarious purposes.
>
>   5) As a consequence of #4 tcmalloc cannot be used on a RSEQ debug
>      enabled kernel. Which means a developer cannot validate his RSEQ
>      code against a debug kernel when tcmalloc is in use on the system
>      as that would crash the tcmalloc dependent applications due to #3.
>
>   6) As a consequence of #4 tcmalloc cannot be used together with any
>      other facility/library which wants to utilize the ABI guaranteed
>      properties of rseq::cpu_id_start in the same application.
>
>   7) tcmalloc violates the ABI from day one and has since refused to
>      address the problem despite being offered a kernel side rseq
>      extension to solve it many years ago.

I know there was some discussion around a preemption notification
scheme, rseq_sched_state, but I thought the discussion moved in favor
of the timeslice extension interface that recently landed. Timeslice
extension solves some use cases, but I'm not sure it addresses this
one.

>
>   8) When addressing the performance issues of RSEQ, the unconditional
>      update was removed under the valid assumption that the kernel
>      only has to satisfy the guaranteed ABI properties, especially when
>      they are enforceable by RSEQ debug.
>
>      As a consequence this exposed the tcmalloc ABI violation, because
>      the unconditional, pointless overwriting of something which did not
>      change stopped happening.
>
> Due to #4 everyone is in a hard place and up a creek without a paddle.
>
> Here are the possible solutions:
>
>   A) Mathias suggested force-overwriting rseq::cpu_id_start every time
>      the rseq::rseq_cs field is cleared by the kernel under the not yet
>      validated theoretical assumption that this cures the problem for
>      tcmalloc.
>
>      If that's sufficient, that would be harmless performance-wise
>      because the write would be inside the already existing STAC/CLAC
>      section and just add some more noise to the rseq critical section
>      operations.
>
>      That would allow existing tcmalloc usage to continue, but
>      obviously would neither solve #5 and #6 above nor provide an
>      incentive for tcmalloc to actually fix their crap.
>
>   B) If that's not sufficient, then keeping tcmalloc alive would require
>      going back to the previous state and letting everyone else pay the
>      price in terms of performance overhead.
>
>   C) Declare that this is not a regression because the ABI guarantee is
>      not violated and the RO property has been enforceable by RSEQ
>      debugging since day one.
>
> In my opinion #C is the right thing to do, but I can see a case being
> made for the lightweight fix Mathias suggested (#A) _if_ and only _if_
> that is sufficient. Picking #A would also mean that user space people
> have to take up the fight against tcmalloc when they want to use the
> RSEQ guaranteed ABI along with tcmalloc in the same application or use a
> RSEQ debug kernel to validate their own code.
>
> Going back to the full unconditional nightmare (#B) is not an option at
> all, as everybody else would have to take the massive performance hit.
>
> Oh well...
>
> Thanks,
>
>         tglx

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [REGRESSION] rseq: refactoring in v6.19 broke everyone on arm64 and tcmalloc everywhere
  2026-04-23 17:19           ` Thomas Gleixner
  2026-04-23 17:38             ` Chris Kennelly
@ 2026-04-23 17:41             ` Linus Torvalds
  2026-04-23 18:35               ` Mathias Stearn
                                 ` (2 more replies)
  1 sibling, 3 replies; 32+ messages in thread
From: Linus Torvalds @ 2026-04-23 17:41 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Mathias Stearn, Peter Zijlstra, Mathieu Desnoyers,
	Catalin Marinas, Will Deacon, Boqun Feng, Paul E. McKenney,
	Chris Kennelly, Dmitry Vyukov, regressions, linux-kernel,
	linux-arm-kernel, Ingo Molnar, Mark Rutland, Jinjie Ruan,
	Blake Oler

On Thu, 23 Apr 2026 at 10:19, Thomas Gleixner <tglx@kernel.org> wrote:
>
>   C) Declare that this is not a regression because the ABI guarantee is
>      not violated and the RO property has been enforcable by RSEQ
>      debugging since day one.

No, if this actually hits real users, that is not an option. If real
users never used RSEQ debugging options, those options are simply
irrelevant.

Regression rules have never been about "it wouldn't have worked in
some other configuration". That's like saying "that code would never
have worked on another architecture". It may be true, but it's
irrelevant for the people whose binaries no longer work.

We will have to fix this.

This is not some kind of gray area. It clearly violates our regression rules.

The only "ABI guarantee" is what people actually see and use, not some
debug option that wasn't enabled.

And I just checked - it's not enabled in at least the Fedora distro
kernels. Presumably other distros don't enable it either. So no actual
non-kernel developer would *ever* have hit it, and claiming it is
relevant is just garbage.

IOW, that debug option was always a complete no-op except for kernel developers.

In fact, that debug option is actively *hidden* - you have to enable
EXPERT to even see it. So it really is not a real option for normal
people AT ALL.

Christ, even *I* don't enable EXPERT except for build testing. It's
literally something that only embedded people doing odd things should
do.

If that rule was actually an important part of the ABI, it shouldn't
have been a debug thing.

So:

 (a) the debug code in question needs to just be removed, since it's
now actively detrimental, and means that any kernel developer who
*does* enable it can't actually test this case any more. It's checking
for something that has been shown to not be true.

 (b) we need to fix this (revert if it can't be fixed otherwise)

I see some patches flying around, but am not clear on whether there
was an actual patch that makes this work again?

             Linus

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [REGRESSION] rseq: refactoring in v6.19 broke everyone on arm64 and tcmalloc everywhere
  2026-04-23 17:38             ` Chris Kennelly
@ 2026-04-23 17:47               ` Mathieu Desnoyers
  2026-04-23 19:39               ` Thomas Gleixner
  1 sibling, 0 replies; 32+ messages in thread
From: Mathieu Desnoyers @ 2026-04-23 17:47 UTC (permalink / raw)
  To: Chris Kennelly, Thomas Gleixner
  Cc: Mathias Stearn, Peter Zijlstra, Catalin Marinas, Will Deacon,
	Boqun Feng, Paul E. McKenney, Dmitry Vyukov, regressions,
	linux-kernel, linux-arm-kernel, Ingo Molnar, Mark Rutland,
	Jinjie Ruan, Blake Oler, Linus Torvalds

On 2026-04-23 13:38, Chris Kennelly wrote:
> On Thu, Apr 23, 2026 at 1:19 PM Thomas Gleixner <tglx@kernel.org> wrote:

[...]

>>
>>    3) The RO for userspace property has been enforced by RSEQ debugging
>>       mode since day one. If such a debug enabled kernel detects user
>>       space changing the field it kills the task/application.
> 
> The optimization in TCMalloc that you're describing has been available
> since September 2023:
> https://github.com/google/tcmalloc/commit/aaa4fbf6fcdce1b7f86fcadd659874645c75ddb9
> 
> I thought the RSEQ debug checks were added in December 2024:
> https://github.com/torvalds/linux/commit/7d5265ffcd8b41da5e09066360540d6e0716e9cd,
> but perhaps I misidentified the ones in question.

You are correct, I added the RSEQ field corruption validation under
debug config in Nov. 2024 when I noticed the world of pain we were
heading towards with incompatible tcmalloc vs glibc (and general) use
due to tcmalloc not respecting the ABI contract. RSEQ was upstreamed
in 2018, so that's not exactly day-one enforcement.
The ABI contract was clear about this being an invalid use from
day one though.

[...]

>>    7) tcmalloc violates the ABI from day one and has since refused to
>>       address the problem despite being offered a kernel side rseq
>>       extension to solve it many years ago.
> 
> I know there was some discussion around a preemption notification
> scheme, rseq_sched_state; but I thought the discussion moved in favor
> of the timeslice extension interface that recently landed. Timeslice
> extension solves some use cases, but I'm not sure it addresses this
> one.

I have actively engaged with the tcmalloc developers to
understand their needs and figure out a proper solution for the
past ~3-4 years, without success.

I have done a POC branch extending rseq with a "reset a linked list of
userspace areas on preemption" back in 2024 which would have solved
tcmalloc's issues cleanly. I never posted it publicly because the
tcmalloc devs told me they could not justify to their managers
spending time even trying it out.

I still have that feature branch gathering dust somewhere.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [REGRESSION] rseq: refactoring in v6.19 broke everyone on arm64 and tcmalloc everywhere
  2026-04-23 17:41             ` Linus Torvalds
@ 2026-04-23 18:35               ` Mathias Stearn
  2026-04-23 18:53               ` Mark Rutland
  2026-04-23 21:03               ` Thomas Gleixner
  2 siblings, 0 replies; 32+ messages in thread
From: Mathias Stearn @ 2026-04-23 18:35 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Thomas Gleixner, Peter Zijlstra, Mathieu Desnoyers,
	Catalin Marinas, Will Deacon, Boqun Feng, Paul E. McKenney,
	Chris Kennelly, Dmitry Vyukov, regressions, linux-kernel,
	linux-arm-kernel, Ingo Molnar, Mark Rutland, Jinjie Ruan,
	Blake Oler

On Thu, Apr 23, 2026 at 7:48 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> I see some patches flying around, but am not clear on whether there
> was an actual patch that makes this work again?

Thomas's patch from upthread appears in initial testing to address the
arm64 preemption breakage. Thanks! I'm currently building with the
following patch on top of that and will test it once it is ready.

---

diff --git a/include/linux/rseq_entry.h b/include/linux/rseq_entry.h
index a36b472627de..e26bf249bbd8 100644
--- a/include/linux/rseq_entry.h
+++ b/include/linux/rseq_entry.h
@@ -300,12 +300,15 @@ rseq_update_user_cs(struct task_struct *t, struct pt_regs *regs, unsigned long c

         /* Invalidate the critical section */
         unsafe_put_user(0ULL, &t->rseq.usrptr->rseq_cs, efault);
+        /* TCMalloc kludge - it relies on cpu_id_start being overwritten */
+        unsafe_put_user((u32)task_cpu(t), &t->rseq.usrptr->cpu_id_start, efault);
         /* Update the instruction pointer */
         instruction_pointer_set(regs, (unsigned long)abort_ip);
         rseq_stat_inc(rseq_stats.fixup);
         break;
     clear:
         unsafe_put_user(0ULL, &t->rseq.usrptr->rseq_cs, efault);
+        unsafe_put_user((u32)task_cpu(t), &t->rseq.usrptr->cpu_id_start, efault);
         rseq_stat_inc(rseq_stats.clear);
         abort_ip = 0ULL;
     }

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [REGRESSION] rseq: refactoring in v6.19 broke everyone on arm64 and tcmalloc everywhere
  2026-04-23 17:41             ` Linus Torvalds
  2026-04-23 18:35               ` Mathias Stearn
@ 2026-04-23 18:53               ` Mark Rutland
  2026-04-23 21:03               ` Thomas Gleixner
  2 siblings, 0 replies; 32+ messages in thread
From: Mark Rutland @ 2026-04-23 18:53 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Thomas Gleixner, Mathias Stearn, Peter Zijlstra,
	Mathieu Desnoyers, Catalin Marinas, Will Deacon, Boqun Feng,
	Paul E. McKenney, Chris Kennelly, Dmitry Vyukov, regressions,
	linux-kernel, linux-arm-kernel, Ingo Molnar, Jinjie Ruan,
	Blake Oler

On Thu, Apr 23, 2026 at 10:41:02AM -0700, Linus Torvalds wrote:
> On Thu, 23 Apr 2026 at 10:19, Thomas Gleixner <tglx@kernel.org> wrote:
> I see some patches flying around, but am not clear on whether there
> was an actual patch that makes this work again?

There's not a patch yet.

The diffs sent so far were options for fixing the arm64-specific issue
(missing aborts on preemption), NOT the generic issue (missing
clobbering of cpu_id_start that tcmalloc was depending upon).

For the arm64 issue, I think we can have a fix tomorrow (as it's end of
day here in the UK). Now that I've pored over the entry code and the
rseq code, I think a variant of one of Thomas's proposed fixes will
work, but
I'd like to make the naming/layering crystal clear so that it's harder
to break this by accident in future.

For the generic issue, hopefully the option Mathias proposed (clearing
cpu_id_start when rseq_cs is cleared) is sufficient. I'll work with
Mathias and Thomas for that.

I've also poked folk to make sure that CI systems run the rseq selftests
(which they evidently weren't), so that we catch this sort of thing
earlier.

Mark.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [REGRESSION] rseq: refactoring in v6.19 broke everyone on arm64 and tcmalloc everywhere
  2026-04-23 10:51               ` Mathias Stearn
  2026-04-23 12:24                 ` David Laight
@ 2026-04-23 19:31                 ` Thomas Gleixner
  1 sibling, 0 replies; 32+ messages in thread
From: Thomas Gleixner @ 2026-04-23 19:31 UTC (permalink / raw)
  To: Mathias Stearn
  Cc: Dmitry Vyukov, Jinjie Ruan, linux-man, Mark Rutland,
	Mathieu Desnoyers, Catalin Marinas, Will Deacon, Boqun Feng,
	Paul E. McKenney, Chris Kennelly, regressions, linux-kernel,
	linux-arm-kernel, Peter Zijlstra, Ingo Molnar, Blake Oler

On Thu, Apr 23 2026 at 12:51, Mathias Stearn wrote:
> On Thu, Apr 23, 2026 at 12:39 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>> The kernel clears rseq_cs reliably when user space was interrupted and:
>>
>>     the task was preempted
>> or
>>     the return from interrupt delivers a signal
>>
>> If the task invoked a syscall then there is absolutely no reason to do
>> either of these, because syscalls from within a critical section are a
>> bug and are caught when rseq debugging is enabled.
>>
>> The original code did this along with unconditionally updating CPU/MMCID
>> which resulted in ~15% performance regression on a syscall heavy
>> database benchmark once glibc started to register rseq.
>
> Just to be clear TCMalloc does not need either rseq_cs to be cleared
> or cpu_id_start to be written to on syscalls because it doesn't do
> syscalls from critical sections. It will actually benefit (slightly)
> from not updating cpu_id_start on syscalls.

I know that it does not do syscalls from within critical sections, but
it relies on cpu_id_start being unconditionally updated in one way or
the other.

> It is specifically in the cases where an rseq would need to be aborted
> (preemption, signals, migration, and membarrier IPI with the rseq
> flag) that TCMalloc relies on cpu_id_start being written. It does rely
> on that write even when not inside the critical section, because it
> effectively uses that to detect if there were any would-cause-abort
> events in between two critical sections. But since it leaves the
> rseq_cs pointer non-null between critical sections, you don't need
> to add _any_ overhead for programs that never make use of rseq after
> registration, or add any overhead to syscalls even for those that do.

Well. According to the comment in the tcmalloc code:

// Calculation of the address of the current CPU slabs region is needed for
// allocation/deallocation fast paths, but is quite expensive. Due to variable
// shift and experimental support for "virtual CPUs", the calculation involves
// several additional loads and dependent calculations. Pseudo-code for the
// address calculation is as follows:
//
//   cpu_offset = TcmallocSlab.virtual_cpu_id_offset_;
//   cpu = *(&__rseq_abi + virtual_cpu_id_offset_);
//   slabs_and_shift = TcmallocSlab.slabs_and_shift_;
//   shift = slabs_and_shift & kShiftMask;
//   shifted_cpu = cpu << shift;
//   slabs = slabs_and_shift & kSlabsMask;
//   slabs += shifted_cpu;
//
// To remove this calculation from fast paths, we cache the slabs address
// for the current CPU in thread local storage. However, when a thread is
// rescheduled to another CPU, we somehow need to understand that the cached

                  ^^^^^^^^^^^

// address is not valid anymore. To achieve this, we overlap the top 4 bytes
// of the cached address with __rseq_abi.cpu_id_start. When a thread is
// rescheduled the kernel overwrites cpu_id_start with the current CPU number,
// which gives us the signal that the cached address is not valid anymore.

The kernel still as of today (the arm64 bug aside) updates the
cpu_id_start and cpu_id fields in rseq when a task is rescheduled to
another CPU.

So if the code only needs to know when it got rescheduled to another
CPU, then it should still work, no?

But it does not, which makes it clear that it relies on this
undocumented behaviour of the kernel to rewrite rseq::cpu_id_start
unconditionally. I'm not yet convinced that it relies on it only when
interrupted between two subsequent critical sections. We'll see.

....

Now we come to the best part of this comment:

// Note: this makes __rseq_abi.cpu_id_start unusable for its original purpose.

So any code sequence which ends up in:

   x = tcmalloc();
   dostuff(x)
     evaluate(rseq::cpu_id_start, rseq::cpu_id)

is doomed. This might be acceptable for Google internal usage where they
control the full stack and can prevent anyone else from utilizing rseq, but
in an open ecosystem that's obviously a non-starter.

And they definitely forgot to add this to the comment:

// Never enable CONFIG_RSEQ_DEBUG in the kernel when you use tcmalloc as
// it will expose the blatant ABI abuse and therefore will kill your
// application.

If your assumption that the rewrite is only required when rseq::rseq_cs
is non NULL and user space was interrupted is correct, then the obvious
no-brainer would have been to add:

        __u64	rseq_usr_data;

to struct rseq and clear that unconditionally when rseq::rseq_cs is
cleared.

But that would have been too simple, would work independently of
endianness, and would not get in the way of anybody else.

But I know that's incompatible with the "features first, correctness
later, and we own the world anyway" mindset.

Just for giggles I asked Google Gemini about the implications of
tcmalloc's rseq abuse. The answer is pretty clear:

   "In short, TCMalloc treats RSEQ as a private optimization rather than
    a shared system resource, which compromises the stability and
    extensibility of any application that needs RSEQ for anything other
    than memory allocation."

It's also very clear about the wilful ignorance of the tcmalloc people:

   "In summary, the developers have known for at least 6 years that the
    implementation was non-standard and conflicting with other rseq
    usage. The github issue which requested glibc compatibility was
    opened in 2022 and has been unresolved since then."

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [REGRESSION] rseq: refactoring in v6.19 broke everyone on arm64 and tcmalloc everywhere
  2026-04-23 17:38             ` Chris Kennelly
  2026-04-23 17:47               ` Mathieu Desnoyers
@ 2026-04-23 19:39               ` Thomas Gleixner
  1 sibling, 0 replies; 32+ messages in thread
From: Thomas Gleixner @ 2026-04-23 19:39 UTC (permalink / raw)
  To: Chris Kennelly
  Cc: Mathias Stearn, Peter Zijlstra, Mathieu Desnoyers,
	Catalin Marinas, Will Deacon, Boqun Feng, Paul E. McKenney,
	Dmitry Vyukov, regressions, linux-kernel, linux-arm-kernel,
	Ingo Molnar, Mark Rutland, Jinjie Ruan, Blake Oler,
	Linus Torvalds

On Thu, Apr 23 2026 at 13:38, Chris Kennelly wrote:
> On Thu, Apr 23, 2026 at 1:19 PM Thomas Gleixner <tglx@kernel.org> wrote:
>>   3) The RO for userspace property has been enforced by RSEQ debugging
>>      mode since day one. If such a debug enabled kernel detects user
>>      space changing the field it kills the task/application.
>
> The optimization in TCMalloc that you're describing has been available
> since September 2023:
> https://github.com/google/tcmalloc/commit/aaa4fbf6fcdce1b7f86fcadd659874645c75ddb9

And the github issue which requested glibc compatibility was opened in
Sept. 2022:

      https://github.com/google/tcmalloc/issues/144

> I thought the RSEQ debug checks were added in December 2024:
> https://github.com/torvalds/linux/commit/7d5265ffcd8b41da5e09066360540d6e0716e9cd,
> but perhaps I misidentified the ones in question.

I might have misread the git log. But that still does not justify
violating a documented ABI at the price that nobody else can use it
once tcmalloc is in play:

   x = tcmalloc();
   dostuff(x)
     evaluate(rseq::cpu_id_start, rseq::cpu_id) <- FAIL

>>   7) tcmalloc violates the ABI from day one and has since refused to
>>      address the problem despite being offered a kernel side rseq
>>      extension to solve it many years ago.
>
> I know there was some discussion around a preemption notification
> scheme, rseq_sched_state; but I thought the discussion moved in favor
> of the timeslice extension interface that recently landed. Timeslice
> extension solves some use cases, but I'm not sure it addresses this
> one.

No it does not. That's an orthogonal optimization.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [REGRESSION] rseq: refactoring in v6.19 broke everyone on arm64 and tcmalloc everywhere
  2026-04-23 17:41             ` Linus Torvalds
  2026-04-23 18:35               ` Mathias Stearn
  2026-04-23 18:53               ` Mark Rutland
@ 2026-04-23 21:03               ` Thomas Gleixner
  2026-04-23 21:28                 ` Linus Torvalds
  2 siblings, 1 reply; 32+ messages in thread
From: Thomas Gleixner @ 2026-04-23 21:03 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Mathias Stearn, Peter Zijlstra, Mathieu Desnoyers,
	Catalin Marinas, Will Deacon, Boqun Feng, Paul E. McKenney,
	Chris Kennelly, Dmitry Vyukov, regressions, linux-kernel,
	linux-arm-kernel, Ingo Molnar, Mark Rutland, Jinjie Ruan,
	Blake Oler

On Thu, Apr 23 2026 at 10:41, Linus Torvalds wrote:
> If that rule was actually an important part of the ABI, it shouldn't
> have been a debug thing.

It's a debug thing because it's too expensive to be enabled by
default. And it's actually valuable for validating the ABI correctness
of RSEQ critical sections, which can't be single-stepped with a
debugger because the breakpoint interruption would immediately cancel
them.

> So:
>
>  (a) the debug code in question needs to just be removed, since it's
> now actively detrimental, and means that any kernel developer who
> *does* enable it can't actually test this case any more. It's checking
> for something that has been shown to not be true.
>
>  (b) we need to fix this (revert if it can't be fixed otherwise)
>
> I see some patches flying around, but am not clear on whether there
> was an actual patch that makes this work again?

There are two issues:

  1) ARM64

     On ARM64 RSEQ got broken completely with the partial move to the
     generic entry code. There are patches flying around which "fix" it
     and Mark is working on a more complete solution as there are other
     subtle issues with that aside from the obvious RSEQ wreckage. The
     latter could have been detected with the existing RSEQ selftests if
     any CI had actually run them on -next.

     That's uninteresting and unrelated to the tcmalloc issue. It's just
     a boring bug which will be fixed in the next couple of days.


  2) The tcmalloc problem

     That's a known problem for at least 6 years. tcmalloc assumes that
     it "owns" rseq and can do whatever it wants with it.

     In 2022 the glibc people requested that tcmalloc become
     interoperable, with the reasonable expectation that glibc would
     utilize rseq as well:

          https://github.com/google/tcmalloc/issues/144

     Status unresolved.

     That means that using tcmalloc requires telling glibc to _NOT_ use
     rseq, and at the same time precludes any other library from using
     it for the documented purposes. So this code sequence blows
     up in your face:

        x = tcmalloc();
        dostuff(x)
          evaluate(rseq::cpu_id_start, rseq::cpu_id)

     because tcmalloc overwrites rseq::cpu_id_start and thereby breaks
     the ABI which evaluate() is rightfully depending on.

     That has absolutely nothing to do with the kernel as there is no
     kernel interaction between tcmalloc's abuse and the subsequent
     evaluation of rseq::cpu_id_start. The kernel has no way to fix that
     problem at all.

     Now back to your generally correct and agreed on "observed
     behaviour" rule.

     Feel free to enforce it, but be aware that you thereby set a
     precedent that a single abuser can then rightfully own a general
     shared interface of the kernel forever and force everybody else to
     give up.

     The tcmalloc developers actually documented that they own the
     world:

     // Note: this makes __rseq_abi.cpu_id_start unusable for its original purpose.

     Do you seriously want to proliferate that?

Thanks,

        tglx




^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [REGRESSION] rseq: refactoring in v6.19 broke everyone on arm64 and tcmalloc everywhere
  2026-04-23 21:03               ` Thomas Gleixner
@ 2026-04-23 21:28                 ` Linus Torvalds
  2026-04-23 23:08                   ` Linus Torvalds
  0 siblings, 1 reply; 32+ messages in thread
From: Linus Torvalds @ 2026-04-23 21:28 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Mathias Stearn, Peter Zijlstra, Mathieu Desnoyers,
	Catalin Marinas, Will Deacon, Boqun Feng, Paul E. McKenney,
	Chris Kennelly, Dmitry Vyukov, regressions, linux-kernel,
	linux-arm-kernel, Ingo Molnar, Mark Rutland, Jinjie Ruan,
	Blake Oler

On Thu, 23 Apr 2026 at 14:03, Thomas Gleixner <tglx@kernel.org> wrote:
>
>      Feel free to enforce it, but be aware that you thereby set a
>      precedent that a single abuser can then rightfully own a general
>      shared interface of the kernel forever and force everybody else to
>      give up.

That's not a new precedent. That is *literally* the rule we have always had.

This is why system calls and ABI's need to have hard rules that they
actually check, because if they don't, they are stuck with the
semantics that people assume.

And no, "documented behavior" is BS. It has absolutely no relevance.
All that matters is hard harsh reality.

Yes, this has led to issues before.

Most new system calls have learnt their lesson, and they check for
unused bits in flags etc, and error out on bits that the kernel
doesn't really care about being randomly set - so that one day we
*can* extend on things and start caring about them.

But they do it because we've been burnt so many times before because
we haven't checked those bits, and then we were forced to just live
with the fact that people passed in random values.

>     // Note: this makes __rseq_abi.cpu_id_start unusable for its original purpose.
>
>     Do you seriously want to proliferate that?

Absolutely.

That's how clever hacks work - they take advantage of things past
their design parameters. "If it works, it's not stupid".

We don't then turn around and say "you were clever, and we did
something stupid, so now we'll hurt you".

This is all 100% on the RSEQ kernel code, not on users who took advantage of it.

                Linus

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [REGRESSION] rseq: refactoring in v6.19 broke everyone on arm64 and tcmalloc everywhere
  2026-04-23 21:28                 ` Linus Torvalds
@ 2026-04-23 23:08                   ` Linus Torvalds
  0 siblings, 0 replies; 32+ messages in thread
From: Linus Torvalds @ 2026-04-23 23:08 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Mathias Stearn, Peter Zijlstra, Mathieu Desnoyers,
	Catalin Marinas, Will Deacon, Boqun Feng, Paul E. McKenney,
	Chris Kennelly, Dmitry Vyukov, regressions, linux-kernel,
	linux-arm-kernel, Ingo Molnar, Mark Rutland, Jinjie Ruan,
	Blake Oler

On Thu, 23 Apr 2026 at 14:28, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> This is all 100% on the RSEQ kernel code, not on users who took advantage of it.

Side note: when RSEQ was merged, the *primary* documented use case was
literally user space allocators with percpu caches. That's what I was
told at the time.

Now I think it was jemalloc(), not tcmalloc, but it's not like
tcmalloc is some odd minor use-case.

We are pretty much talking about the raison d'être of the whole rseq
feature, not some odd small corner case.

               Linus

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2026-04-23 23:08 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-04-22  9:50 [REGRESSION] rseq: refactoring in v6.19 broke everyone on arm64 and tcmalloc everywhere Mathias Stearn
2026-04-22 12:56 ` Peter Zijlstra
2026-04-22 13:13   ` Peter Zijlstra
2026-04-23 10:38     ` Mathias Stearn
     [not found]     ` <CAHnCjA2fa+dP1+yCYNQrTXQaW-JdtfMj7wMikwMeeCRg-3NhiA@mail.gmail.com>
2026-04-23 11:48       ` Thomas Gleixner
2026-04-23 12:11         ` Mathias Stearn
2026-04-23 17:19           ` Thomas Gleixner
2026-04-23 17:38             ` Chris Kennelly
2026-04-23 17:47               ` Mathieu Desnoyers
2026-04-23 19:39               ` Thomas Gleixner
2026-04-23 17:41             ` Linus Torvalds
2026-04-23 18:35               ` Mathias Stearn
2026-04-23 18:53               ` Mark Rutland
2026-04-23 21:03               ` Thomas Gleixner
2026-04-23 21:28                 ` Linus Torvalds
2026-04-23 23:08                   ` Linus Torvalds
2026-04-22 13:09 ` Mark Rutland
2026-04-22 17:49   ` Thomas Gleixner
2026-04-22 18:11     ` Mark Rutland
2026-04-22 19:47       ` Thomas Gleixner
2026-04-23  1:48         ` Jinjie Ruan
2026-04-23  5:53           ` Dmitry Vyukov
2026-04-23 10:39             ` Thomas Gleixner
2026-04-23 10:51               ` Mathias Stearn
2026-04-23 12:24                 ` David Laight
2026-04-23 19:31                 ` Thomas Gleixner
2026-04-23 12:11             ` Alejandro Colomar
2026-04-23 12:54               ` Mathieu Desnoyers
2026-04-23 12:29             ` Mathieu Desnoyers
2026-04-23 12:36               ` Dmitry Vyukov
2026-04-23 12:53                 ` Mathieu Desnoyers
2026-04-23 12:58                   ` Dmitry Vyukov
