* Re: [REGRESSION] rseq: refactoring in v6.19 broke everyone on arm64 and tcmalloc everywhere
From: Florian Weimer @ 2026-04-27 7:40 UTC
To: Thomas Gleixner
Cc: Peter Zijlstra, Mathias Stearn, Dmitry Vyukov, Jinjie Ruan,
linux-man, Mark Rutland, Mathieu Desnoyers, Catalin Marinas,
Will Deacon, Boqun Feng, Paul E. McKenney, Chris Kennelly,
regressions, linux-kernel, linux-arm-kernel, Ingo Molnar,
Blake Oler, Rich Felker, Matthew Wilcox, Greg Kroah-Hartman,
Linus Torvalds, criu
* Thomas Gleixner:
> The real question is how to differentiate between the legacy and the
> optimized mode. I have two working variants to achieve that:
>
> 1) The fully safe option requires a new flag for RSEQ
> registration. It obviously requires a glibc update. (Suggested by
> PeterZ)
Without glibc changes, RSEQ would keep working, but with the old,
problematic performance, right?
If we don't have a notification in the auxiliary vector, we'd have to do
two system calls at process start, which isn't ideal, but is probably
not a significant issue, either.
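For illustration only, the two-system-call startup sequence could look roughly like the sketch below. It assumes the kernel rejects an unknown registration flag with EINVAL. The flag name RSEQ_FLAG_OPTIMIZED_SKETCH and its value are placeholders, and the rseq system call is hidden behind a caller-supplied function pointer (also an assumption of this sketch) so that the retry logic can be read and exercised on its own with the two fake kernels.

```c
#include <errno.h>
#include <stdint.h>

/* Placeholder only: no name or value for the new flag is given at this point. */
#define RSEQ_FLAG_OPTIMIZED_SKETCH (1U << 1)

/* Stand-in for the rseq system call: 0 on success, negative errno on failure. */
typedef int (*rseq_register_fn)(void *area, uint32_t len, uint32_t flags,
                                uint32_t sig);

/*
 * Returns 1 if the new behavior was obtained, 0 if the plain registration
 * was used instead, or a negative errno value if registration failed.
 */
int rseq_register_with_fallback(rseq_register_fn reg, void *area,
                                uint32_t len, uint32_t sig)
{
    /* First system call: ask for the new behavior. */
    int ret = reg(area, len, RSEQ_FLAG_OPTIMIZED_SKETCH, sig);
    if (ret == 0)
        return 1;
    if (ret != -EINVAL)
        return ret;

    /* Second system call: retry without the flag. */
    ret = reg(area, len, 0, sig);
    return ret == 0 ? 0 : ret;
}

/* Fake kernel that predates the flag: any nonzero flag is rejected. */
int fake_old_kernel(void *area, uint32_t len, uint32_t flags, uint32_t sig)
{
    (void)area; (void)len; (void)sig;
    return flags != 0 ? -EINVAL : 0;
}

/* Fake kernel that accepts the placeholder flag and nothing else. */
int fake_new_kernel(void *area, uint32_t len, uint32_t flags, uint32_t sig)
{
    (void)area; (void)len; (void)sig;
    return (flags & ~RSEQ_FLAG_OPTIMIZED_SKETCH) != 0 ? -EINVAL : 0;
}
```

An auxiliary vector entry advertising the flag would let the caller skip the first, failing attempt on older kernels.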
I haven't verified this, but it looks like introducing the flag breaks
CRIU? In dump_thread_rseq, we have this:
    if (rseqc.flags != 0) {
        pr_err("something wrong with ptrace(PTRACE_GET_RSEQ_CONFIGURATION, %d) flags = 0x%x\n", tid,
               rseqc.flags);
        return -1;
    }
I suppose a workaround would be to turn this behavior flag into a prctl
flag. CRIU wouldn't dump and restore that until taught about it. If the
new behavior is switched on explicitly by the flag, it would be
backwards-compatible, except that restoring with unpatched CRIU would
lead to a performance loss.
> 2) Determine the requirements of the registering task via the size of
>    the registered RSEQ area.
>
>    The original implementation, which TCMalloc depends on, registers
>    a 32-byte region (ORIG_RSEQ_SIZE). This region has a 32-byte
>    alignment requirement.
>
>    The extension-safe newer variant exposes the kernel RSEQ feature
>    size via getauxval(AT_RSEQ_FEATURE_SIZE) and the alignment
>    requirement via getauxval(AT_RSEQ_ALIGN). The alignment
>    requirement is that the registered rseq region is aligned to the
>    next power of two of the feature size. The kernel currently has a
>    feature size of 33 bytes, which means the alignment requirement is
>    64 bytes.
There are still glibc builds in use that do not use AT_RSEQ_ALIGN, and
instead unconditionally reserve a size of 32. In some builds, the RSEQ
area is not aligned to a multiple of 64, which makes glibc
indistinguishable from tcmalloc. You could look at the location of the
thread pointer relative to the RSEQ area at registration to tell them
apart, but that is perhaps too nasty.
Switching to the new extensible RSEQ allocation code in older glibc
builds is not entirely trivial, and I would prefer not doing that.
Registering with a new flag is comparatively simple, and we could
backport it, except that it might not be compatible with CRIU.
Thanks,
Florian
* Re: [REGRESSION] rseq: refactoring in v6.19 broke everyone on arm64 and tcmalloc everywhere
From: Thomas Gleixner @ 2026-04-27 11:03 UTC
To: Florian Weimer
Cc: Peter Zijlstra, Mathias Stearn, Dmitry Vyukov, Jinjie Ruan,
linux-man, Mark Rutland, Mathieu Desnoyers, Catalin Marinas,
Will Deacon, Boqun Feng, Paul E. McKenney, Chris Kennelly,
regressions, linux-kernel, linux-arm-kernel, Ingo Molnar,
Blake Oler, Rich Felker, Matthew Wilcox, Greg Kroah-Hartman,
Linus Torvalds, criu
On Mon, Apr 27 2026 at 09:40, Florian Weimer wrote:
> * Thomas Gleixner:
>> The real question is how to differentiate between the legacy and the
>> optimized mode. I have two working variants to achieve that:
>>
>> 1) The fully safe option requires a new flag for RSEQ
>> registration. It obviously requires a glibc update. (Suggested by
>> PeterZ)
>
> Without glibc changes, RSEQ would keep working, but with the old,
> problematic performance, right?
Correct.
> If we don't have a notification in the auxiliary vector, we'd have to do
> two system calls at process start, which isn't ideal, but is probably
> not a significant issue, either.
>
> I haven't verified this, but it looks like introducing the flag breaks
> CRIU? In dump_thread_rseq, we have this:
>
>     if (rseqc.flags != 0) {
>         pr_err("something wrong with ptrace(PTRACE_GET_RSEQ_CONFIGURATION, %d) flags = 0x%x\n", tid,
>                rseqc.flags);
>         return -1;
>     }
Yeah. That'd need to be fixed or worked around.
> I suppose a workaround would be to turn this behavior flag into a prctl
> flag. CRIU wouldn't dump and restore that until taught about it. If the
> new behavior is switched on explicitly by the flag, it would be
> backwards-compatible, except that restoring with unpatched CRIU would
> lead to a performance loss.
It's worse. The flag will also enable extended RSEQ features beyond
mm_cid and requires that the registered rseq size is
>= offsetof(struct rseq, end).
>> 2) Determine the requirements of the registering task via the size of
>>    the registered RSEQ area.
>>
>>    The original implementation, which TCMalloc depends on, registers
>>    a 32-byte region (ORIG_RSEQ_SIZE). This region has a 32-byte
>>    alignment requirement.
>>
>>    The extension-safe newer variant exposes the kernel RSEQ feature
>>    size via getauxval(AT_RSEQ_FEATURE_SIZE) and the alignment
>>    requirement via getauxval(AT_RSEQ_ALIGN). The alignment
>>    requirement is that the registered rseq region is aligned to the
>>    next power of two of the feature size. The kernel currently has a
>>    feature size of 33 bytes, which means the alignment requirement is
>>    64 bytes.
>
> There are still glibc builds in use that do not use AT_RSEQ_ALIGN, and
> instead unconditionally reserve a size of 32. In some builds, the RSEQ
> area is not aligned to a multiple of 64, which makes glibc
> indistinguishable from tcmalloc.
That's how it is. So with a size of 32 this will fall back to legacy mode
and not unlock the extended features, independent of the alignment. The
alignment requirements are:

    Size 32:   32 bytes
    Size > 32: 64 bytes
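
As a rough sketch of how these two alignment cases follow from the rule quoted above (alignment is the next power of two of the kernel feature size, except for the original 32-byte registration), assuming a feature size of 33 bytes as stated in the thread. The helper names are illustrative and are not taken from the kernel sources.

```c
#include <stdbool.h>
#include <stdint.h>

#define ORIG_RSEQ_SIZE 32U

/* Round v up to the next power of two; valid for 1 <= v <= 2^31. */
uint32_t next_pow2_u32(uint32_t v)
{
    uint32_t p = 1;
    while (p < v)
        p <<= 1;
    return p;
}

/*
 * Alignment a registration of rseq_len bytes must satisfy, given the
 * feature size the kernel reports via AT_RSEQ_FEATURE_SIZE.
 */
uint32_t rseq_required_align(uint32_t rseq_len, uint32_t kernel_feature_size)
{
    if (rseq_len == ORIG_RSEQ_SIZE)
        return ORIG_RSEQ_SIZE;
    return next_pow2_u32(kernel_feature_size);
}

/* True if addr is a multiple of align (align must be a power of two). */
bool rseq_area_is_aligned(uintptr_t addr, uint32_t align)
{
    return (addr & ((uintptr_t)align - 1)) == 0;
}
```

With a feature size of 33, an area at an address such as 96 satisfies the 32-byte rule but not the 64-byte one, which is the case described for some older glibc builds.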
> You could look at the location of the thread pointer relative to the
> RSEQ area at registration to tell them apart, but that is perhaps too
> nasty.
*Blink*
> Switching to the new extensible RSEQ allocation code in older glibc
> builds is not entirely trivial, and I would prefer not doing that.
> Registering with a new flag is comparatively simple, and we could
> backport it, except that it might not be compatible with CRIU.
Neither with CRIU nor with the requirement to support additional
features which require the registered rseq memory size to be at least as
large as the kernel requires. That's why we have AT_RSEQ_FEATURE_SIZE.
Otherwise we'd end up with runtime conditionals for every single
feature, which just adds more gunk into the hotpaths and ends up in an
ever-growing compatibility nightmare.
So if a process runs on a newer kernel with, let's say, a 40-byte rseq
size, then it can't be safely migrated with CRIU to an older kernel with
a 32-byte rseq size, as you don't know whether the process uses some of
the extended features in the newer kernel already. But that's not any
different from extended syscall features etc.
So with the size based detection we end up with the following:

    Size 32:             legacy mode, no matter whether that's TCMalloc or
                         glibc. Does not support extended features.
    Size >= kernel size: optimized mode with support for extended features.
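
A minimal sketch of this size-based classification, written as a plain C function. The enum and function names are made up for illustration. The summary above does not say what happens for lengths other than 32 that are smaller than the kernel size, so the sketch reports those as unspecified rather than guessing.

```c
#include <stdint.h>

#define ORIG_RSEQ_SIZE 32U

enum rseq_mode_sketch {
    RSEQ_MODE_LEGACY,      /* size 32: TCMalloc or older glibc */
    RSEQ_MODE_OPTIMIZED,   /* size >= kernel size: extended features */
    RSEQ_MODE_UNSPECIFIED, /* not covered by the summary above */
};

/*
 * Classify a registration by the length of the registered rseq area.
 * kernel_feature_size is what the kernel reports via
 * getauxval(AT_RSEQ_FEATURE_SIZE), e.g. 33 in the thread.
 */
enum rseq_mode_sketch rseq_mode_for_len(uint32_t rseq_len,
                                        uint32_t kernel_feature_size)
{
    if (rseq_len == ORIG_RSEQ_SIZE)
        return RSEQ_MODE_LEGACY;
    if (rseq_len >= kernel_feature_size)
        return RSEQ_MODE_OPTIMIZED;
    return RSEQ_MODE_UNSPECIFIED;
}
```

The point of the scheme is that the mode is decided once at registration, from a single value user space already has to get right.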
Thanks,
tglx
* Re: [REGRESSION] rseq: refactoring in v6.19 broke everyone on arm64 and tcmalloc everywhere
From: Mathieu Desnoyers @ 2026-04-27 18:35 UTC
To: Florian Weimer, Thomas Gleixner
Cc: Peter Zijlstra, Mathias Stearn, Dmitry Vyukov, Jinjie Ruan,
linux-man, Mark Rutland, Catalin Marinas, Will Deacon, Boqun Feng,
Paul E. McKenney, Chris Kennelly, regressions, linux-kernel,
linux-arm-kernel, Ingo Molnar, Blake Oler, Rich Felker,
Matthew Wilcox, Greg Kroah-Hartman, Linus Torvalds, criu,
Michael Jeanson
On 2026-04-27 03:40, Florian Weimer wrote:
> * Thomas Gleixner:
>
>> The real question is how to differentiate between the legacy and the
>> optimized mode. I have two working variants to achieve that:
[...]
>
> Switching to the new extensible RSEQ allocation code in older glibc
> builds is not entirely trivial, and I would prefer not doing that.
> Registering with a new flag is comparatively simple, and we could
> backport it, except that it might not be compatible with CRIU.
A third option would allow the entire range of older libc versions to
benefit from rseq optimizations, gating the "v2" behavior on:
rseq_len > 32 || (flags & RSEQ_FLAG_V2)
As a result:
- restore compatibility with existing tcmalloc binaries.
- glibc 2.41+ would benefit from optimization without changes.
- glibc 2.35-2.40 would be able to easily backport minimal changes [*]
to benefit from kernel optimizations (flags & RSEQ_FLAG_V2).
Likewise for RHEL glibc 2.34 with backported rseq support.
[*] Minimal changes to allow older libc to use the optimized mode
involve implementing a new query for getauxval(AT_RSEQ_V2),
which would return nonzero when the kernel supports the v2
flag, and when supported pass a new RSEQ_FLAG_V2 flag to rseq
on registration.
That v2 behavior would:

A) Enforce the ABI contract:
   - RO fields corruption -> kill process,
   - System call within rseq critical section -> kill process.
B) Allow optimization of the rseq field updates (only update relevant
   fields on migration).

This entirely decouples the feature enablement concern (rseq_len) from
the strictness/optimization mode (v2).
This keeps compatibility with current tcmalloc binaries because
tcmalloc always registers a 32-byte rseq_len without the v2
flag set. tcmalloc already has its own internal fields at fixed
offsets from the rseq structure, which conflict with extended rseq
fields, so limiting the tcmalloc work-around behavior to
rseq_len == 32 seems to align well with the tcmalloc project's
approach towards extensibility and ecosystem inter-compatibility.
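
The proposed gate can be sketched as a small predicate. The numeric value given to RSEQ_FLAG_V2 below is hypothetical (the proposal only names the flag), and the function name is illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

#define ORIG_RSEQ_SIZE 32U
#define RSEQ_FLAG_V2   (1U << 1) /* hypothetical value, for illustration */

/* Proposed gate: rseq_len > 32 || (flags & RSEQ_FLAG_V2) */
bool rseq_wants_v2(uint32_t rseq_len, uint32_t flags)
{
    return rseq_len > ORIG_RSEQ_SIZE || (flags & RSEQ_FLAG_V2) != 0;
}
```

Under this predicate a 32-byte registration without the flag (tcmalloc) stays in the compatible mode, a 32-byte registration with the flag (an older glibc with the minimal backport) gets v2, and any larger registration gets v2 without further changes.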
Thoughts ?
Thanks,
Mathieu
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
* Re: [REGRESSION] rseq: refactoring in v6.19 broke everyone on arm64 and tcmalloc everywhere
From: Thomas Gleixner @ 2026-04-27 21:06 UTC
To: Mathieu Desnoyers, Florian Weimer
Cc: Peter Zijlstra, Mathias Stearn, Dmitry Vyukov, Jinjie Ruan,
linux-man, Mark Rutland, Catalin Marinas, Will Deacon, Boqun Feng,
Paul E. McKenney, Chris Kennelly, regressions, linux-kernel,
linux-arm-kernel, Ingo Molnar, Blake Oler, Rich Felker,
Matthew Wilcox, Greg Kroah-Hartman, Linus Torvalds, criu,
Michael Jeanson
On Mon, Apr 27 2026 at 14:35, Mathieu Desnoyers wrote:
> On 2026-04-27 03:40, Florian Weimer wrote:
>> Switching to the new extensible RSEQ allocation code in older glibc
>> builds is not entirely trivial, and I would prefer not doing that.
>> Registering with a new flag is comparatively simple, and we could
>> backport it, except that it might not be compatible with CRIU.
> A third option would allow the entire range of older libc versions to
> benefit from rseq optimizations, gating the "v2" behavior on:
>
> rseq_len > 32 || (flags & RSEQ_FLAG_V2)
No. Features beyond mm_cid require optimized mode and a larger rseq
area. That's not negotiable. See below.
> That v2 behavior would:
>
> A) Enforce the ABI contract:
>
> - RO fields corruption -> kill process,
My patch does that already and the time slice extension muck does so too
from day one.
> - System call within rseq critical section -> kill process,
No. That's overkill for syscall heavy workloads.
Also it's not a functional correctness problem which affects multiple
RSEQ users in an application. User space can do even worse things.
    cs_start
        call foo    // foo uses rseq too ....
    cs_end
Invoking a syscall from within the critical section is stupid, but at
least harmless vs. other usage in the same thread as the syscall needs
to return before anything else can go and use RSEQ in that thread, no?
People who develop RSEQ critical sections can enable debug mode via the
sysfs knob if they want to prove that their code is correct. That's a
debug aid, not more.
> B) Allow optimization of the rseq field updates (only update relevant
> fields on migration),
That's part of the whole combo. Optimized behaviour and new features.
> This entirely decouples the feature enablement concern (rseq_len) from
> the strictness/optimization mode (v2).
Which causes us to sprinkle more conditionals into the hot paths for
individual features instead of simply doing unconditional stores and
being done with it. It's bad enough that we have one, we don't need more.
User space knows the size the kernel expects and if it insists on using
the original size, so be it. Keep it simple.
Thanks,
tglx