From: Thomas Gleixner
To: Rich Felker, André Almeida
Cc: LKML, Mathieu Desnoyers, Sebastian Andrzej Siewior, Carlos O'Donell,
    Peter Zijlstra, Florian Weimer, Torvald Riegel, Darren Hart, Ingo Molnar,
    Davidlohr Bueso, Arnd Bergmann, "Liam R . Howlett", Uros Bizjak,
    Thomas Weißschuh
Subject: Re: [patch v2 00/11] futex: Address the robust futex unlock race for real
In-Reply-To: <20260327165018.GF18807@brightrain.aerifal.cx>
References: <20260319225224.853416463@kernel.org> <87bjgackw7.ffs@tglx>
    <20260326220815.GE18807@brightrain.aerifal.cx>
    <20260327165018.GF18807@brightrain.aerifal.cx>
Date: Sat, 28 Mar 2026 13:41:20 +0100
Message-ID: <87se9kazyn.ffs@tglx>

On Fri, Mar 27 2026 at 12:50, Rich Felker wrote:
> On Fri, Mar 27, 2026 at 12:42:35AM -0300, André Almeida wrote:
>> So you call the vDSO first. If it fails, it means that the lock is contended
>> and you need to call futex(). It will wake a waiter, release the lock and
>> clean list_op_pending.
>
> So would we use the vdso function presence as signal that this
> functionality is available? In that case, I think what we would do is:
>
> 1. Try an uncontended unlock using the vdso.
> 2. If it fails, attempt FUTEX_ROBUST_UNLOCK.
> 3. If that fails (note: this could be due to seccomp!), fallback to
>    the old kernel code path, holding off any munmap/etc. while we perform
>    the userspace unlock.

FUTEX_ROBUST_UNLOCK is a flag similar to FUTEX_PRIVATE which is or'ed on
FUTEX_WAKE, FUTEX_WAKE_BITSET and FUTEX_UNLOCK_PI to tell the kernel that
it should do the unlock for FUTEX_WAKE* and the pointer clearing for all
three variants. UNLOCK_PI already does the contended unlock today.

So yeah, seccomp might refuse, but then it might refuse plain FUTEX_WAKE*
too, which leaves you up a creek without a paddle.

If the kernel supports ROBUST UNLOCK, but does not expose the VDSO function
or lacks VDSO at all, you can still use the syscall for the contended
unlock case and limit your user space workaround to the successful
uncontended unlock case by using try_cmpxchg() in the library code, which
is obviously not covered by the fixup as the kernel does not know about it.

I briefly pondered allowing user space to register the critical section
(that's how I evaluated the approach in the first place). But that's a can
of worms we should not open at all because the kernel needs to know the
registers used (to retrieve the pending op pointer) and the condition for
a successful uncontended unlock. Keeping that in sync would be a nightmare.

With the VDSO that's not an issue as the kernel can keep the changes
synchronized and validate with selftests that it actually is correct.

Thanks,

        tglx
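
For illustration only, the fallback order discussed in this mail could be
sketched roughly as below. This is a user space simulation, not library or
kernel code: the mail gives neither the vDSO function name nor the numeric
value of FUTEX_ROBUST_UNLOCK, so the sketch issues no real syscalls. Every
name in it (unlock_caps, robust_unlock, sim_kernel_robust_unlock,
SKETCH_WAITERS, the VIA_* values) is hypothetical, and the same cmpxchg
helper stands in for both the vDSO function and the library's own
try_cmpxchg() path.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the waiters bit in the futex lock word. */
#define SKETCH_WAITERS 0x80000000u

/* Which path ended up performing the unlock (hypothetical names). */
enum unlock_path { VIA_VDSO, VIA_SYSCALL, VIA_LEGACY };

/* Hypothetical capability flags a library might probe once at startup. */
struct unlock_caps {
    bool have_vdso;     /* a vDSO unlock function was found */
    bool have_syscall;  /* the kernel accepts the robust unlock flag */
};

/*
 * Uncontended unlock: succeeds only if the word holds exactly our TID,
 * i.e. no waiters bit is set. With a real vDSO function the kernel can
 * fix up a task interrupted between the cmpxchg and the pointer clear;
 * the same sequence in library code is not covered by that fixup.
 */
static bool try_uncontended_unlock(_Atomic uint32_t *word, uint32_t tid,
                                   void **pending)
{
    uint32_t expected = tid;

    if (!atomic_compare_exchange_strong(word, &expected, 0))
        return false;
    *pending = NULL;
    return true;
}

/* Simulates the kernel doing the unlock and clearing the pending op. */
static void sim_kernel_robust_unlock(_Atomic uint32_t *word, void **pending)
{
    atomic_store(word, 0);
    *pending = NULL;
    /* A real kernel would also wake a waiter here. */
}

static enum unlock_path robust_unlock(const struct unlock_caps *caps,
                                      _Atomic uint32_t *word, uint32_t tid,
                                      void **pending)
{
    assert(caps && word && pending);

    /* Step 1: uncontended unlock, via vDSO if present, else library code. */
    if (try_uncontended_unlock(word, tid, pending))
        return caps->have_vdso ? VIA_VDSO : VIA_LEGACY;

    /* Step 2: contended, let the kernel unlock and clear the pointer. */
    if (caps->have_syscall) {
        sim_kernel_robust_unlock(word, pending);
        return VIA_SYSCALL;
    }

    /* Step 3: old kernel, unlock in user space (a wake would follow). */
    atomic_store(word, 0);
    *pending = NULL;
    return VIA_LEGACY;
}
```

A caller would fill in unlock_caps once and then call robust_unlock() with
the lock word, its own TID and the pending op pointer. Whenever VIA_LEGACY
is returned, the unlock ran in user space without kernel fixup, so the
existing workaround (holding off munmap etc.) still applies to that case.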