From: Thomas Gleixner
To: Peter Zijlstra, David Matlack
Cc: LKML, Michael Jeanson, Jens Axboe, Mathieu Desnoyers, "Paul E. McKenney", X86 ML, Sean Christopherson, Wei Liu
Subject: Re: SIGSEGVs after 39a167560a61 ("rseq: Optimize event setting")
In-Reply-To: <20260126204745.GP171111@noisy.programming.kicks-ass.net>
Date: Mon, 26 Jan 2026 22:50:58 +0100
Message-ID: <87zf60avr1.ffs@tglx>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Jan 26 2026 at 21:47, Peter Zijlstra wrote:
> On Mon, Jan 26, 2026 at 11:46:27AM -0800, David Matlack wrote:
>> I started seeing SIGSEGVs in Google's remote test executor when
>> running on hosts at v6.19-rc6. Bisecting led me to this commit:
>>
>>   39a167560a61 ("rseq: Optimize event setting")
>>
>> I discovered this issue while running VFIO selftests against v6.19-rc6,
>> but realized the issue has nothing to do with the selftests themselves.
>> Even running "sleep" as the test is enough to trigger this issue in the
>> executor.
>>
>> I know that Google uses rseq in its userspace software stack, so I
>> assume this is some bad interaction between that implementation and
>> commit 39a167560a61.
>>
>> Unfortunately, the remote test executor that is receiving the SIGSEGV is
>> not open source so I don't have a repro I can share. But I can easily
>> reproduce the issue with my setup so I'd be happy to help with testing
>> any fixes or debug patches.
>>
>> I've attached the .config that I used when reproducing this issue. The
>> host I am using is an Intel server with EMR CPUs in case that matters.
>
> Is this using tcmalloc?
> If so, that is somewhat expected because
> tcmalloc is known to violate upstream rseq ABI. IIRC you should get a
> nice splat if you enable rseq debug mode (echo 1 > /debug/rseq/debug).

The correctness of these changes has been validated by the rseq
selftests, and I don't see how that commit would violate the guaranteed
ABI.

> Perhaps this is the nudge Google needs to go fix this.

The real question is whether the segfault is triggered by the rseq
sanity checks or whether the application segfaults because it relies on
something which is not guaranteed by the ABI. As this is secret sauce,
I can't tell.

Just for the record: I tried to build tcmalloc and get some tests done
with it, but the documentation is abysmal and I have no intention of
debugging that bazel insanity.

Thanks,

        tglx