Date: Tue, 28 Apr 2026 10:03:59 +0200
From: Peter Zijlstra
To: Thomas Gleixner
Cc: Mathias Stearn, Dmitry Vyukov, Jinjie Ruan, linux-man@vger.kernel.org,
    Mark Rutland, Mathieu Desnoyers, Catalin Marinas, Will Deacon,
    Boqun Feng, "Paul E. McKenney", Chris Kennelly,
    regressions@lists.linux.dev, linux-kernel@vger.kernel.org,
    linux-arm-kernel@lists.infradead.org, Ingo Molnar, Blake Oler,
    Florian Weimer, Rich Felker, Matthew Wilcox, Greg Kroah-Hartman,
    Linus Torvalds
Subject: Re: [REGRESSION] rseq: refactoring in v6.19 broke everyone on arm64
    and tcmalloc everywhere
Message-ID: <20260428080359.GI3126523@noisy.programming.kicks-ass.net>
References: <87ik9i0xlj.ffs@tglx> <87a4ut1njh.ffs@tglx> <87v7dgzbo7.ffs@tglx>
    <20260424150318.GE641209@noisy.programming.kicks-ass.net>
    <87se8kywhb.ffs@tglx> <87jyttz8cf.ffs@tglx>
In-Reply-To: <87jyttz8cf.ffs@tglx>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Apr 27, 2026 at 12:04:48AM +0200, Thomas Gleixner wrote:

> +Optimized RSEQ V2
> +-----------------
> +
> +On architectures which utilize the generic entry code and generic TIF bits
> +the kernel supports runtime optimizations for RSEQ, which also enable
> +enhanced features like scheduler time slice extensions.
> +
> +To enable them a task has to register the RSEQ region with at least the
> +length advertised by getauxval(AT_RSEQ_FEATURE_SIZE).
> +
> +If existing binaries register with RSEQ_ORIG_SIZE (32 bytes), the kernel
> +keeps the legacy low performance mode enabled to fulfil the expectations
> +of existing users regarding the original RSEQ implementation behaviour.
> +
> +The following table documents the ABI and behavioral guarantees of the
> +legacy and the optimized V2 mode.
> +
> +.. list-table:: RSEQ modes
> +   :header-rows: 1
> +
> +   * - Nr
> +     - What
> +     - Legacy
> +     - Optimized V2
> +   * - 1
> +     - The cpu_id_start, cpu_id, node_id and mm_cid fields (User mode read
> +       only)
> +     - Updated by the kernel unconditionally after each context switch and
> +       before signal delivery
> +     - Updated by the kernel if and only if they change, i.e. if the task
> +       is migrated or mm_cid changes
> +   * - 2
> +     - The rseq_cs critical section field
> +     - Evaluated and handled unconditionally after each context switch and
> +       before signal delivery
> +     - Evaluated and handled conditionally only when user space was
> +       interrupted. Either after being preempted or before signal delivery
> +       in the interrupted context.
> +   * - 3
> +     - Read only fields
> +     - No strict enforcement except in debug mode
> +     - Strict enforcement
> +   * - 4
> +     - membarrier(...RSEQ)
> +     - All running threads of the process are interrupted and the ID fields
> +       are rewritten and eventually active critical sections are aborted
> +       before they return to user space. All threads which are scheduled
> +       out whether voluntary or not are covered by #1/#2 above.
> +     - All running threads of the process are interrupted and eventually
> +       active critical sections are aborted before these threads return to
> +       user space. The ID fields are only updated if changed as a
> +       consequence of the interrupt. All threads which are scheduled out
> +       whether voluntary or not are covered by #1/#2 above.
> +   * - 5
> +     - Time slice extensions
> +     - Not supported
> +     - Supported

I'm sure it's cute when rendered, but when read as text this is nigh on
unreadable.

> +The legacy mode is obviously less performant as it does unconditional
> +updates and critical section checks even if not strictly required by the
> +ABI contract. That can't be changed anymore as some users depend on that
> +observed behavior, which in turn enables them to violate the ABI and
> +overwrite the cpu_id_start field for their own purposes. This is obviously
> +discouraged as it renders RSEQ incompatible with the intended usage and
> +breaks the expectation of other libraries in the same application.
> +
> +The ABI compliant optimized mode, which respects the read only fields, does
> +not require unconditional updates and therefore is way more performant. The
> +kernel validates the read only fields for compliance. If user space
> +modifies them, the process is killed. Compliant usage allows multiple
> +libraries in the same application to benefit from the RSEQ functionality
> +without disturbing each other.
> +
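The registration-length rule in the quoted documentation can be sketched in C roughly as follows. This is only an illustration of the rule as the proposed text states it, not kernel code and not a statement of merged behaviour. The `rseq_sketch_*` names and the `RSEQ_SKETCH_ORIG_SIZE` constant are hypothetical; only `getauxval()` and `AT_RSEQ_FEATURE_SIZE` are existing interfaces. The sketch does not register anything itself, since the C library may already have registered an rseq area for the thread.

```c
#include <sys/auxv.h>	/* getauxval() (glibc, Linux) */

/*
 * Value from the Linux uapi <linux/auxvec.h>; defined here only in case
 * the installed headers predate it.
 */
#ifndef AT_RSEQ_FEATURE_SIZE
#define AT_RSEQ_FEATURE_SIZE 27
#endif

/* Original registration size named in the quoted text (32 bytes). */
#define RSEQ_SKETCH_ORIG_SIZE 32UL

/* Hypothetical names, for illustration only. */
enum rseq_sketch_mode {
	RSEQ_SKETCH_LEGACY,		/* registered with the original 32 bytes */
	RSEQ_SKETCH_OPTIMIZED_V2,	/* registered with >= advertised size */
	RSEQ_SKETCH_UNSPECIFIED,	/* case the quoted text does not cover */
};

/*
 * Classify a registration length against the advertised feature size.
 *
 * Assumption: the quoted text does not say what happens if the advertised
 * size is itself 32 bytes or less. This sketch follows the explicit
 * "32 bytes keeps legacy mode" statement first.
 */
enum rseq_sketch_mode rseq_sketch_classify(unsigned long reg_len,
					   unsigned long feature_size)
{
	if (reg_len == RSEQ_SKETCH_ORIG_SIZE)
		return RSEQ_SKETCH_LEGACY;
	if (feature_size != 0 && reg_len >= feature_size)
		return RSEQ_SKETCH_OPTIMIZED_V2;
	return RSEQ_SKETCH_UNSPECIFIED;
}

/*
 * Size advertised by the running kernel. getauxval() returns 0 when the
 * auxiliary vector has no such entry.
 */
unsigned long rseq_sketch_feature_size(void)
{
	return getauxval(AT_RSEQ_FEATURE_SIZE);
}
```

A caller would pass the length it intends to register together with `rseq_sketch_feature_size()`. A result of `RSEQ_SKETCH_UNSPECIFIED` means the quoted text alone does not say which mode applies.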