public inbox for linux-arm-kernel@lists.infradead.org
From: Thomas Gleixner <tglx@kernel.org>
To: Florian Weimer <fweimer@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Mathias Stearn <mathias@mongodb.com>,
	Dmitry Vyukov <dvyukov@google.com>,
	Jinjie Ruan <ruanjinjie@huawei.com>,
	linux-man@vger.kernel.org, Mark Rutland <mark.rutland@arm.com>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>, Boqun Feng <boqun.feng@gmail.com>,
	"Paul E. McKenney" <paulmck@kernel.org>,
	Chris Kennelly <ckennelly@google.com>,
	regressions@lists.linux.dev, linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	Ingo Molnar <mingo@kernel.org>,
	Blake Oler <blake.oler@mongodb.com>,
	Rich Felker <dalias@libc.org>,
	Matthew Wilcox <willy@infradead.org>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Linus Torvalds <torvalds@linuxfoundation.org>,
	criu@lists.linux.dev
Subject: Re: [REGRESSION] rseq: refactoring in v6.19 broke everyone on arm64 and tcmalloc everywhere
Date: Mon, 27 Apr 2026 13:03:32 +0200
Message-ID: <87h5owzmuz.ffs@tglx>
In-Reply-To: <lhujyts4zr8.fsf@oldenburg.str.redhat.com>

On Mon, Apr 27 2026 at 09:40, Florian Weimer wrote:
> * Thomas Gleixner:
>> The real question is how to differentiate between the legacy and the
>> optimized mode. I have two working variants to achieve that:
>>
>>    1) The fully safe option requires a new flag for RSEQ
>>       registration. It obviously requires a glibc update. (Suggested by
>>       PeterZ)
>
> Without glibc changes, RSEQ would keep working, but with the old,
> problematic performance, right?

Correct.

> If we don't have a notification in the auxiliary vector, we'd have to do
> two system calls at process start, which isn't ideal, but is probably
> not a significant issue, either.
>
> I haven't verified this, but it looks like introducing the flag breaks
> CRIU?  In dump_thread_rseq, we have this:
>
>         if (rseqc.flags != 0) {
>                 pr_err("something wrong with ptrace(PTRACE_GET_RSEQ_CONFIGURATION, %d) flags = 0x%x\n", tid,
>                        rseqc.flags);
>                 return -1;
>         }

Yeah. That'd need to be fixed or worked around.

> I suppose a workaround could make this behavior flag a prctl flag.  CRIU
> wouldn't dump and restore that until taught about it.  If the new
> behavior is switched on explicitly by the flag, it would be
> backwards-compatible, except that restoring with unpatched CRIU would
> lead to a performance loss.

It's worse. The flag will also enable extended RSEQ features beyond
mmcid and requires that the registered rseq size is >= offsetof(struct
rseq, end).

>>    2) Determine the requirements of the registering task via the size of
>>       the registered RSEQ area.
>>
>>       The original implementation, which TCMalloc depends on, registers
>>       a 32 byte region (ORIG_RSEQ_SIZE). This region has a 32 byte
>>       alignment requirement.
>>
>>       The extension safe newer variant exposes the kernel RSEQ feature
>>       size via getauxval(AT_RSEQ_FEATURE_SIZE) and the alignment
>>       requirement via getauxval(AT_RSEQ_ALIGN). The alignment
>>       requirement is that the registered rseq region is aligned to the
>>       next power of two of the feature size. The kernel currently has a
>>       feature size of 33 bytes, which means the alignment requirement is
>>       64 bytes.
>
> There are still glibc builds in use that do not use AT_RSEQ_ALIGN, and
> instead unconditionally reserve a size of 32.  In some builds, the RSEQ
> area is not aligned to a multiple of 64, which makes glibc
> indistinguishable from tcmalloc.

That's how it is. So with a size of 32 this will fall back to legacy mode
and not unlock the extended features, independent of the alignment. The
alignment requirements are:

          Size 32:     32 bytes
          Size >32:    64 bytes
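
To make the alignment rule concrete, here is a minimal C sketch of the
"next power of two of the feature size" calculation and the resulting
alignment check. The helper names and LEGACY_RSEQ_SIZE are hypothetical
and exist only for illustration; they are not kernel identifiers. Mapping
every size up to 32 to a 32 byte alignment is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name for the original 32 byte rseq area size. */
#define LEGACY_RSEQ_SIZE 32u

/* Round v (>= 1) up to the next power of two; a power of two maps to itself. */
static size_t next_pow2(size_t v)
{
	size_t p = 1;

	assert(v >= 1);
	while (p < v)
		p <<= 1;
	return p;
}

/*
 * Alignment required for a registered area of the given size.
 * Assumption: anything up to the original 32 bytes uses 32 byte alignment.
 */
static size_t rseq_required_align(size_t feature_size)
{
	if (feature_size <= LEGACY_RSEQ_SIZE)
		return LEGACY_RSEQ_SIZE;
	return next_pow2(feature_size);
}

/* Non-zero if the area address satisfies the alignment for that size. */
static int rseq_area_is_aligned(const void *area, size_t feature_size)
{
	return ((uintptr_t)area % rseq_required_align(feature_size)) == 0;
}
```

With the current feature size of 33 this yields 64, matching the table.
The sketch would return 128 once the feature size exceeds 64, which is
simply what the power-of-two rule implies and is not something stated
above.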

> You could look at the location of the thread pointer relative to the
> RSEQ area at registration to tell them apart, but that is perhaps too
> nasty.

*Blink*

> Switching to the new extensible RSEQ allocation code in older glibc
> builds is not entirely trivial, and I would prefer not doing that.
> Registering with a new flag is comparatively simple, and we could
> backport it, except that it might not be compatible with CRIU.

Neither with CRIU nor with the requirement to support additional
features which require the registered rseq memory size to be at least as
large as the kernel requires. That's why we have AT_RSEQ_FEATURE_SIZE.

Otherwise we'd end up with runtime conditionals for every single
feature, which just adds more gunk into the hotpaths and ends up in an
ever-growing compatibility nightmare.

So if a process runs on a newer kernel with, let's say, a 40 byte rseq
size, then it can't be safely migrated with CRIU to an older kernel with
a 32 byte rseq size, as you don't know whether the process already uses
some of the extended features of the newer kernel. But that's not any
different from extended syscall features etc.

So with the size based detection we end up with the following:

  Size 32:             legacy mode no matter whether that's TCMalloc or
                       glibc. Does not support extended features
  
  Size >= kernel size: optimized mode with support for extended features
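
The size based detection can be sketched in C as follows. This is an
illustration under stated assumptions and not the kernel implementation:
the enum, the function names, and the RSEQ_MODE_UNSPECIFIED outcome for
sizes that are neither 32 nor at least the kernel size are all
hypothetical, since the table above does not cover that case. The
userspace helper reads the kernel feature size via
getauxval(AT_RSEQ_FEATURE_SIZE).

```c
#include <stddef.h>
#include <sys/auxv.h>

#ifndef AT_RSEQ_FEATURE_SIZE
#define AT_RSEQ_FEATURE_SIZE 27	/* value from linux/auxvec.h */
#endif

/* Hypothetical names, for illustration only. */
enum rseq_mode {
	RSEQ_MODE_UNSPECIFIED,	/* not covered by the table above */
	RSEQ_MODE_LEGACY,	/* size 32: no extended features */
	RSEQ_MODE_OPTIMIZED,	/* size >= kernel size: extended features */
};

/* Classify a registration by its size alone, as in the table above. */
static enum rseq_mode rseq_classify(size_t registered_size,
				    size_t kernel_size)
{
	if (registered_size == 32)
		return RSEQ_MODE_LEGACY;
	if (registered_size >= kernel_size)
		return RSEQ_MODE_OPTIMIZED;
	return RSEQ_MODE_UNSPECIFIED;
}

/*
 * Feature size advertised by the running kernel. getauxval() returns 0
 * when the auxiliary vector has no such entry.
 */
static size_t rseq_kernel_feature_size(void)
{
	return (size_t)getauxval(AT_RSEQ_FEATURE_SIZE);
}
```

A caller would pass the size it intends to register together with
rseq_kernel_feature_size(), and handle a result of 0 from the latter as
"not advertised" before classifying.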

Thanks,

        tglx



