public inbox for linux-kernel@vger.kernel.org
* [patch v2 00/11] futex: Address the robust futex unlock race for real
@ 2026-03-19 23:24 Thomas Gleixner
  2026-03-19 23:24 ` [patch v2 01/11] futex: Move futex task related data into a struct Thomas Gleixner
                   ` (11 more replies)
  0 siblings, 12 replies; 35+ messages in thread
From: Thomas Gleixner @ 2026-03-19 23:24 UTC (permalink / raw)
  To: LKML
  Cc: Mathieu Desnoyers, André Almeida, Sebastian Andrzej Siewior,
	Carlos O'Donell, Peter Zijlstra, Florian Weimer, Rich Felker,
	Torvald Riegel, Darren Hart, Ingo Molnar, Davidlohr Bueso,
	Arnd Bergmann, Liam R . Howlett, Uros Bizjak,
	Thomas Weißschuh

This is a follow up to v1 which can be found here:

     https://lore.kernel.org/20260316162316.356674433@kernel.org

The v1 cover letter contains a detailed analysis of the underlying
problem. TLDR:

The robust futex unlock mechanism is racy with respect to the clearing of
the robust_list_head::list_op_pending pointer because the unlock and the
clearing of the pointer are not atomic. The race window is between the
unlock and the clearing of the pending op pointer. If the task is forced to
exit in this window, the exit code will access a potentially invalid
pending op pointer when cleaning up the robust list. That happens if
another task manages to unmap the object containing the lock before the
cleanup, which results in a use-after-free (UAF). In the worst case this
UAF can lead to memory corruption when unrelated content has been mapped to
the same address by the time the access happens.
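To make the window concrete, here is a minimal sketch of the racy unlock
sequence. This is not the actual glibc code; plain globals stand in for
the pthread mutex internals and the registered robust list head, and the
function name is made up for illustration:

```c
#include <stdint.h>
#include <stddef.h>

/* Stand-ins for the real mutex lock word and robust_list_head member */
static uint32_t lock_word;		/* futex word: holds the owner TID */
static void *list_op_pending;		/* robust_list_head::list_op_pending */

static void racy_robust_unlock(uint32_t tid)
{
	/* 1) Announce the pending unlock for the kernel's exit cleanup */
	list_op_pending = &lock_word;

	/* 2) Release the lock */
	__atomic_store_n(&lock_word, 0, __ATOMIC_RELEASE);

	/*
	 * RACE WINDOW: if the task is forced to exit right here, the
	 * kernel still sees list_op_pending != NULL and dereferences it
	 * during robust list cleanup, even though the lock is already
	 * free and the mapping may be gone by then.
	 */

	/* 3) Retract the pending op announcement */
	list_op_pending = NULL;
}
```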

User space can't solve this problem without help from the kernel. This
series provides the kernel side infrastructure to help it along:

  1) Combined unlock, pointer clearing, wake-up for the contended case

  2) VDSO based unlock and pointer clearing helpers with a fix-up function
     in the kernel when user space was interrupted within the critical
     section.

Both ensure that the pointer clearing happens _before_ a task exits and the
kernel cleans up the robust list during the exit procedure.
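For illustration, a sketch of the intended user space fast path. A mock
stands in for the VDSO helper here so the logic is self-contained; in real
use the helper is resolved via dlsym() as in the test case below, and the
syscall fallback (shown only as a comment) uses the flag names from this
series:

```c
#include <stdint.h>

#define FUTEX_WAITERS	0x80000000u

typedef uint32_t (*frtu64_t)(uint32_t *, uint32_t, uint64_t *);

/*
 * Mock of the documented helper semantics: cmpxchg() the lock word from
 * TID to 0 and, only on success, clear the 64-bit pending op pointer.
 * Returns the observed lock content, which equals TID on success.
 */
static uint32_t mock_try_unlock(uint32_t *lock, uint32_t tid, uint64_t *pop)
{
	uint32_t expected = tid;

	if (__atomic_compare_exchange_n(lock, &expected, 0, 0,
					__ATOMIC_RELEASE, __ATOMIC_RELAXED))
		*pop = 0;
	return expected;
}

/* Intended usage pattern: VDSO fast path first, syscall on contention */
static int robust_unlock(frtu64_t try_unlock, uint32_t *lock, uint32_t tid,
			 uint64_t *pop)
{
	if (try_unlock(lock, tid, pop) == tid)
		return 0;	/* Uncontended: unlocked, pending op cleared */

	/*
	 * Contended (FUTEX_WAITERS set): the combined syscall op unlocks,
	 * clears the pending op pointer and wakes waiters in one atomic
	 * kernel operation, along the lines of:
	 *
	 *   syscall(SYS_futex, lock, FUTEX_WAKE | FUTEX_UNLOCK_ROBUST, 1,
	 *	     NULL, pop, 0);
	 */
	return 1;
}
```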

Changes since v1:

   - Use a dedicated modifier flag FUTEX_ROBUST_LIST32 for the futex
     syscall ops to indicate the size of the pending op pointer.  This
     replaces the V1 pointer mangling where bit0 indicated 32-bit size.
     The flag is mandatory for 32-bit applications. - Florian

   - Add a new unsafe_atomic_store_release_user() helper and use it for the
     unlock. It defaults to 'smp_mb(); unsafe_put_user()'. Architectures
     where a store implies release semantics can avoid the smp_mb() by
     selecting a config option (done for x86). The helper can be overridden
     by architecture implementations. - André, Peter

   - Remove the global exposure of the critical section labels. They are
     now only available through vdsoXX.so.dbg, which is sufficient for
     vdso2c to find them. - Thomas, Mathieu

   - Replace the combined mixed size VDSO unlock helper with dedicated
     functions for 64-bit and 32-bit pointers. This requires evaluating up
     to two ranges in the critical section IP check, but it removes the
     complexity from both the unlock helpers and the architecture specific
     decision and pointer retrieval functions.

   - Use the proper ASM modifiers and reuse the zeroed register that is
     used for the cmpxchg() to also zero the pending op pointer, instead
     of using a $0 immediate. - Uros

   - Address coding style, documentation and naming feedback - André

Thanks to everyone for feedback and discussion!
   
The modified test case and the delta patch against the previous version are
below.

The series applies on v7.0-rc3 and is also available via git:

    git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git locking-futex-v2

According to my basic testing this still works correctly with both sizes,
but it obviously needs more scrutiny, especially from the libc people and
André.

Once the functionality itself is agreed on, we only need to agree on the
names and signatures of the functions exposed through the VDSO before we
set them in stone. That will hopefully not take another 15 years :)

That said, after 20+ years of working on futexes I'm still amazed at how
much code is required to interact with one or two memory locations in user
space. I stated long ago that the futex code consists of 5% functionality
and 95% corner case handling. After this work episode I'm convinced that
futexes are nothing but an infinite Rube Goldberg machine in disguise.

Thanks,

	tglx
---
#define _GNU_SOURCE
#include <dlfcn.h>
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

#include <linux/futex.h>
#include <linux/prctl.h>

#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>

typedef uint32_t (*frtu64_t)(uint32_t *, uint32_t, uint64_t *);
typedef uint32_t (*frtu32_t)(uint32_t *, uint32_t, uint32_t *);

static frtu64_t frtu64;
static frtu32_t frtu32;

static void get_vdso(void)
{
	void *vdso = dlopen("linux-vdso.so.1",
			    RTLD_LAZY | RTLD_LOCAL | RTLD_NOLOAD);
	if (!vdso)
		vdso = dlopen("linux-gate.so.1",
			      RTLD_LAZY | RTLD_LOCAL | RTLD_NOLOAD);
	if (!vdso)
		vdso = dlopen("linux-vdso32.so.1",
			      RTLD_LAZY | RTLD_LOCAL | RTLD_NOLOAD);
	if (!vdso)
		vdso = dlopen("linux-vdso64.so.1",
			      RTLD_LAZY | RTLD_LOCAL | RTLD_NOLOAD);
	if (!vdso) {
		printf("Failed to find vDSO\n");
		exit(1);
	}

	frtu64 = (frtu64_t)dlsym(vdso, "__vdso_futex_robust_list64_try_unlock");
	frtu32 = (frtu32_t)dlsym(vdso, "__vdso_futex_robust_list32_try_unlock");
}

#define FUTEX_ROBUST_UNLOCK	 512
#define FUTEX_ROBUST_LIST32	1024

#define UNLOCK64		(FUTEX_WAKE | FUTEX_ROBUST_UNLOCK)
#define UNLOCK64_PI		(FUTEX_UNLOCK_PI | FUTEX_ROBUST_UNLOCK)

#define UNLOCK32		(UNLOCK64 | FUTEX_ROBUST_LIST32)
#define UNLOCK32_PI		(UNLOCK64_PI | FUTEX_ROBUST_LIST32)

static void set_pop(struct robust_list_head *rhead, pthread_mutex_t *mutex)
{
	rhead->list_op_pending = (struct robust_list *)&mutex->__data.__list.__next;
}

static void *pop_exp(bool sz32, pthread_mutex_t *mutex)
{
	uint64_t exp = (uint64_t)(unsigned long)&mutex->__data.__list.__next;

	if (!sz32)
		return NULL;

	exp &= ~0xFFFFFFFFULL;
	return (void *)(unsigned long)exp;
}

static void unlock_uncontended(bool sz32, struct robust_list_head *rhead,
			       pthread_mutex_t *mutex, pid_t tid)
{
	uint32_t *lock = (uint32_t *)&mutex->__data.__lock;
	void *exp = pop_exp(sz32, mutex);
	pid_t lock_tid;

	set_pop(rhead, mutex);
	*lock = tid;

	if (sz32)
		lock_tid = frtu32(lock, tid, (uint32_t *)&rhead->list_op_pending);
	else
		lock_tid = frtu64(lock, tid, (uint64_t *)&rhead->list_op_pending);

	if (lock_tid != tid)
		printf("Non contended unlock failed. Return: %08x\n", lock_tid);
	else if (rhead->list_op_pending != exp)
		printf("List op not cleared: %16lx\n", (unsigned long) rhead->list_op_pending);
	else if (*lock)
		printf("Non contended unlock failed: LOCK %08x\n", *lock);
	else
		printf("Non contended unlock succeeded\n");
}

static void unlock_syscall(bool sz32, bool pi, struct robust_list_head *rhead,
			   pthread_mutex_t *mutex, pid_t tid)
{
	uint32_t *lock = (uint32_t *)&mutex->__data.__lock;
	void *exp = pop_exp(sz32, mutex);
	unsigned int op;
	int ret;

	if (sz32)
		op = pi ? UNLOCK32_PI : UNLOCK32;
	else
		op = pi ? UNLOCK64_PI : UNLOCK64;

	ret = syscall(SYS_futex, lock, op, !!pi, NULL, (uint32_t *)&rhead->list_op_pending, 0);

	if (ret < 0)
		printf("syscall unlock failed %d\n", errno);
	else if (rhead->list_op_pending != exp)
		printf("List op not cleared in syscall: %16lx\n", (unsigned long) rhead->list_op_pending);
	else if (*lock)
		printf("Contended syscall unlock failed: LOCK %08x\n", *lock);
	else
		printf("Contended unlock syscall succeeded\n");
}

static void unlock_contended(bool sz32, bool pi, struct robust_list_head *rhead,
			     pthread_mutex_t *mutex, pid_t tid)
{
	uint32_t *lock = (uint32_t *)&mutex->__data.__lock;
	pid_t lock_tid;

	set_pop(rhead, mutex);
	*lock = tid | FUTEX_WAITERS;

	if (sz32)
		lock_tid = frtu32(lock, tid, (uint32_t *)&rhead->list_op_pending);
	else
		lock_tid = frtu64(lock, tid, (uint64_t *)&rhead->list_op_pending);

	if (lock_tid == tid)
		printf("Contended unlock succeeded %08x\n", lock_tid);
	else if (rhead->list_op_pending != (struct robust_list *)&mutex->__data.__list.__next)
		printf("List op cleared: %16lx\n", (unsigned long) rhead->list_op_pending);
	else if (!*lock)
		printf("Contended unlock cleared LOCK %08x\n", *lock);
	else
		unlock_syscall(sz32, pi, rhead, mutex, tid);
}

static void test(bool sz32)
{
	pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
	struct robust_list_head *rhead;
	pid_t tid = gettid();
	size_t sz;

	if (sz32 && !frtu32)
		return;
	if (!sz32 && !frtu64)
		return;

	syscall(SYS_get_robust_list, 0, &rhead, &sz);

	printf("Testing non contended unlock %s\n", sz32 ? "POP32" : "POP64");
	unlock_uncontended(sz32, rhead, &mutex, tid);

	printf("Testing contended FUTEX_WAKE unlock %s\n", sz32 ? "POP32" : "POP64");
	unlock_contended(sz32, false, rhead, &mutex, tid);

	printf("Testing contended FUTEX_UNLOCK_PI %s\n", sz32 ? "POP32" : "POP64");
	unlock_contended(sz32, true, rhead, &mutex, tid);
}

int main(int argc, char * const argv[])
{
	get_vdso();

	test(false);
	test(true);

	return 0;
}

---

Delta diff:
---

diff --git a/arch/Kconfig b/arch/Kconfig
index 102ddbd4298e..0c1e6cc101ff 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -403,6 +403,10 @@ config ARCH_32BIT_OFF_T
 config ARCH_32BIT_USTAT_F_TINODE
 	bool
 
+# Selected by architectures when plain stores have release semantics
+config ARCH_STORE_IMPLIES_RELEASE
+	bool
+
 config HAVE_ASM_MODVERSIONS
 	bool
 	help
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 4be953f0516b..e9437efae787 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -119,6 +119,7 @@ config X86
 	select ARCH_MIGHT_HAVE_PC_PARPORT
 	select ARCH_MIGHT_HAVE_PC_SERIO
 	select ARCH_STACKWALK
+	select ARCH_STORE_IMPLIES_RELEASE
 	select ARCH_SUPPORTS_ACPI
 	select ARCH_SUPPORTS_ATOMIC_RMW
 	select ARCH_SUPPORTS_DEBUG_PAGEALLOC
diff --git a/arch/x86/entry/vdso/common/vfutex.c b/arch/x86/entry/vdso/common/vfutex.c
index 19d8ef130b63..8df8fd6c759d 100644
--- a/arch/x86/entry/vdso/common/vfutex.c
+++ b/arch/x86/entry/vdso/common/vfutex.c
@@ -2,71 +2,75 @@
 #include <vdso/futex.h>
 
 /*
- * Compat enabled kernels have to take the size bit into account to support the
- * mixed size use case of gaming emulators. Contrary to the kernel robust unlock
- * mechanism all of this does not test for the 32-bit modifier in 32-bit VDSOs
- * and in compat disabled kernels. User space can keep the pieces.
+ * Assembly template for the try unlock functions. The basic functionality is:
+ *
+ *		movl		%esi, %eax	Move the TID into EAX
+ *		xorl		%ecx, %ecx	Clear ECX
+ *		lock cmpxchgl	%ecx, (%rdi)	Attempt the TID -> 0 transition
+ * .Lcs_start:					Start of the critical section
+ *		jnz		.Lcs_end	If the cmpxchgl failed jump to the end
+ * .Lcs_success:				Start of the success section
+ *		movq		%rcx, (%rdx)	Set the pending op pointer to 0
+ * .Lcs_end:					End of the critical section
+ *
+ * .Lcs_start and .Lcs_end establish the critical section range. .Lcs_success is
+ * technically not required, but there for illustration, debugging and testing.
+ *
+ * When CONFIG_COMPAT is enabled then the 64-bit VDSO provides two functions.
+ * One for the regular 64-bit sized pending operation pointer and one for a
+ * 32-bit sized pointer to support gaming emulators.
+ *
+ * The 32-bit VDSO provides only the one for 32-bit sized pointers.
  */
-#if defined(CONFIG_X86_64) && !defined(BUILD_VDSO32_64)
+#define __stringify_1(x...)	#x
+#define __stringify(x...)	__stringify_1(x)
 
-#ifdef CONFIG_COMPAT
-
-# define ASM_CLEAR_PTR								\
-		"	testl	$1, (%[pop])				\n"	\
-		"	jz	.Lop64					\n"	\
-		"	movl	$0, (%[pad])				\n"	\
-		"	jmp	__vdso_futex_robust_try_unlock_cs_end	\n"	\
-		".Lop64:						\n"	\
-		"	movq	$0, (%[pad])				\n"
-
-# define ASM_PAD_CONSTRAINT	,[pad] "S" (((unsigned long)pop) & ~0x1UL)
-
-#else /* CONFIG_COMPAT */
-
-# define ASM_CLEAR_PTR								\
-		"	movq	$0, (%[pop])				\n"
-
-# define ASM_PAD_CONSTRAINT
-
-#endif /* !CONFIG_COMPAT */
+#define LABEL(name, which)	__stringify(name##_futex_try_unlock_cs_##which:)
 
-#else /* CONFIG_X86_64 && !BUILD_VDSO32_64 */
+#define JNZ_END(name)		"jnz " __stringify(name) "_futex_try_unlock_cs_end\n"
 
-# define ASM_CLEAR_PTR								\
-		"	movl	$0, (%[pad])				\n"
+#define CLEAR_POPQ		"movq	%[zero],  %a[pop]\n"
+#define CLEAR_POPL		"movl	%k[zero], %a[pop]\n"
 
-# define ASM_PAD_CONSTRAINT	,[pad] "S" (((unsigned long)pop) & ~0x1UL)
+#define futex_robust_try_unlock(name, clear_pop, __lock, __tid, __pop)	\
+({									\
+	asm volatile (							\
+		"						\n"	\
+		"	lock cmpxchgl	%k[zero], %a[lock]	\n"	\
+		"						\n"	\
+		LABEL(name, start)					\
+		"						\n"	\
+		JNZ_END(name)						\
+		"						\n"	\
+		LABEL(name, success)					\
+		"						\n"	\
+			clear_pop					\
+		"						\n"	\
+		LABEL(name, end)					\
+		: [tid]   "+&a" (__tid)					\
+		: [lock]  "D"   (__lock),				\
+		  [pop]   "d"   (__pop),				\
+		  [zero]  "S"   (0UL)					\
+		: "memory"						\
+	);								\
+	__tid;								\
+})
 
-#endif /* !CONFIG_X86_64 || BUILD_VDSO32_64 */
-
-uint32_t __vdso_futex_robust_try_unlock(uint32_t *lock, uint32_t tid, void *pop)
+#ifdef __x86_64__
+__u32 __vdso_futex_robust_list64_try_unlock(__u32 *lock, __u32 tid, __u64 *pop)
 {
-	asm volatile (
-		".global __vdso_futex_robust_try_unlock_cs_start		\n"
-		".global __vdso_futex_robust_try_unlock_cs_success		\n"
-		".global __vdso_futex_robust_try_unlock_cs_end			\n"
-		"								\n"
-		"       lock cmpxchgl	%[val], (%[ptr])			\n"
-		"								\n"
-		"__vdso_futex_robust_try_unlock_cs_start:			\n"
-		"								\n"
-		"	jnz		__vdso_futex_robust_try_unlock_cs_end	\n"
-		"								\n"
-		"__vdso_futex_robust_try_unlock_cs_success:			\n"
-		"								\n"
-			ASM_CLEAR_PTR
-		"								\n"
-		"__vdso_futex_robust_try_unlock_cs_end:				\n"
-		: [tid] "+a" (tid)
-		: [ptr] "D"  (lock),
-		  [pop] "d" (pop),
-		  [val] "r"  (0)
-		  ASM_PAD_CONSTRAINT
-		: "memory"
-	);
-
-	return tid;
+	return futex_robust_try_unlock(x86_64, CLEAR_POPQ, lock, tid, pop);
 }
 
-uint32_t futex_robust_try_unlock(uint32_t *, uint32_t, void **)
-	__attribute__((weak, alias("__vdso_futex_robust_try_unlock")));
+#ifdef CONFIG_COMPAT
+__u32 __vdso_futex_robust_list32_try_unlock(__u32 *lock, __u32 tid, __u32 *pop)
+{
+	return futex_robust_try_unlock(x86_64_compat, CLEAR_POPL, lock, tid, pop);
+}
+#endif /* CONFIG_COMPAT */
+#else  /* __x86_64__ */
+__u32 __vdso_futex_robust_list32_try_unlock(__u32 *lock, __u32 tid, __u32 *pop)
+{
+	return futex_robust_try_unlock(x86_32, CLEAR_POPL, lock, tid, pop);
+}
+#endif /* !__x86_64__ */
diff --git a/arch/x86/entry/vdso/vdso32/vdso32.lds.S b/arch/x86/entry/vdso/vdso32/vdso32.lds.S
index b027d2f98bd0..cee8f7f9fe80 100644
--- a/arch/x86/entry/vdso/vdso32/vdso32.lds.S
+++ b/arch/x86/entry/vdso/vdso32/vdso32.lds.S
@@ -31,10 +31,7 @@ VERSION
 		__vdso_clock_getres_time64;
 		__vdso_getcpu;
 #ifdef CONFIG_FUTEX_ROBUST_UNLOCK
-		__vdso_futex_robust_try_unlock;
-		__vdso_futex_robust_try_unlock_cs_start;
-		__vdso_futex_robust_try_unlock_cs_success;
-		__vdso_futex_robust_try_unlock_cs_end;
+		__vdso_futex_robust_list32_try_unlock;
 #endif
 	};
 
diff --git a/arch/x86/entry/vdso/vdso64/vdso64.lds.S b/arch/x86/entry/vdso/vdso64/vdso64.lds.S
index e5c0ca9664e1..11dae35358a2 100644
--- a/arch/x86/entry/vdso/vdso64/vdso64.lds.S
+++ b/arch/x86/entry/vdso/vdso64/vdso64.lds.S
@@ -33,10 +33,8 @@ VERSION {
 		getrandom;
 		__vdso_getrandom;
 #ifdef CONFIG_FUTEX_ROBUST_UNLOCK
-		__vdso_futex_robust_try_unlock;
-		__vdso_futex_robust_try_unlock_cs_start;
-		__vdso_futex_robust_try_unlock_cs_success;
-		__vdso_futex_robust_try_unlock_cs_end;
+		__vdso_futex_robust_list64_try_unlock;
+		__vdso_futex_robust_list32_try_unlock;
 #endif
 	local: *;
 	};
diff --git a/arch/x86/entry/vdso/vdso64/vdsox32.lds.S b/arch/x86/entry/vdso/vdso64/vdsox32.lds.S
index 4409d97e7ef6..0e844af63304 100644
--- a/arch/x86/entry/vdso/vdso64/vdsox32.lds.S
+++ b/arch/x86/entry/vdso/vdso64/vdsox32.lds.S
@@ -23,10 +23,8 @@ VERSION {
 		__vdso_time;
 		__vdso_clock_getres;
 #ifdef CONFIG_FUTEX_ROBUST_UNLOCK
-		__vdso_futex_robust_try_unlock;
-		__vdso_futex_robust_try_unlock_cs_start;
-		__vdso_futex_robust_try_unlock_cs_success;
-		__vdso_futex_robust_try_unlock_cs_end;
+		__vdso_futex_robust_list64_try_unlock;
+		__vdso_futex_robust_list32_try_unlock;
 #endif
 	local: *;
 	};
diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index 5ccb45840f79..ad87818d42a0 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -78,13 +78,28 @@ static void vdso_futex_robust_unlock_update_ips(void)
 {
 	const struct vdso_image *image = current->mm->context.vdso_image;
 	unsigned long vdso = (unsigned long) current->mm->context.vdso;
+	struct futex_mm_data *fd = &current->mm->futex;
+	struct futex_unlock_cs_range *csr = fd->unlock_cs_ranges;
 
-	current->mm->futex.unlock_cs_start_ip =
-		vdso + image->sym___vdso_futex_robust_try_unlock_cs_start;
-	current->mm->futex.unlock_cs_success_ip =
-		vdso + image->sym___vdso_futex_robust_try_unlock_cs_success;
-	current->mm->futex.unlock_cs_end_ip =
-		vdso + image->sym___vdso_futex_robust_try_unlock_cs_end;
+	fd->unlock_cs_num_ranges = 0;
+#ifdef CONFIG_X86_64
+	if (image->sym_x86_64_futex_try_unlock_cs_start) {
+		csr->start_ip = vdso + image->sym_x86_64_futex_try_unlock_cs_start;
+		csr->end_ip = vdso + image->sym_x86_64_futex_try_unlock_cs_end;
+		csr->pop_size32 = 0;
+		csr++;
+		fd->unlock_cs_num_ranges++;
+	}
+#endif /* CONFIG_X86_64 */
+
+#if defined(CONFIG_X86_32) || defined(CONFIG_COMPAT)
+	if (image->sym_x86_32_futex_try_unlock_cs_start) {
+		csr->start_ip = vdso + image->sym_x86_32_futex_try_unlock_cs_start;
+		csr->end_ip = vdso + image->sym_x86_32_futex_try_unlock_cs_end;
+		csr->pop_size32 = 1;
+		fd->unlock_cs_num_ranges++;
+	}
+#endif /* CONFIG_X86_32 || CONFIG_COMPAT */
 }
 #else
 static inline void vdso_futex_robust_unlock_update_ips(void) { }
diff --git a/arch/x86/include/asm/futex_robust.h b/arch/x86/include/asm/futex_robust.h
index bcabb289d18e..e87954703ae2 100644
--- a/arch/x86/include/asm/futex_robust.h
+++ b/arch/x86/include/asm/futex_robust.h
@@ -4,41 +4,16 @@
 
 #include <asm/ptrace.h>
 
-static __always_inline bool x86_futex_needs_robust_unlock_fixup(struct pt_regs *regs)
+static __always_inline void __user *x86_futex_robust_unlock_get_pop(struct pt_regs *regs)
 {
 	/*
-	 * This is tricky in the compat case as it has to take the size check
-	 * into account. See the ASM magic in the VDSO vfutex code. If compat is
-	 * disabled or this is a 32-bit kernel then ZF is authoritive no matter
-	 * what.
-	 */
-	if (!IS_ENABLED(CONFIG_X86_64) || !IS_ENABLED(CONFIG_IA32_EMULATION))
-		return !!(regs->flags & X86_EFLAGS_ZF);
-
-	/*
-	 * For the compat case, the core code already established that regs->ip
-	 * is >= cs_start and < cs_end. Now check whether it is at the
-	 * conditional jump which checks the cmpxchg() or if it succeeded and
-	 * does the size check, which obviously modifies ZF too.
-	 */
-	if (regs->ip >= current->mm->futex.unlock_cs_success_ip)
-		return true;
-	/*
-	 * It's at the jnz right after the cmpxchg(). ZF tells whether this
-	 * succeeded or not.
+	 * If ZF is set then the cmpxchg succeeded and the pending op pointer
+	 * needs to be cleared.
 	 */
-	return !!(regs->flags & X86_EFLAGS_ZF);
-}
-
-#define arch_futex_needs_robust_unlock_fixup(regs)		\
-	x86_futex_needs_robust_unlock_fixup(regs)
-
-static __always_inline void __user *x86_futex_robust_unlock_get_pop(struct pt_regs *regs)
-{
-	return (void __user *)regs->dx;
+	return regs->flags & X86_EFLAGS_ZF ? (void __user *)regs->dx : NULL;
 }
 
-#define arch_futex_robust_unlock_get_pop(regs)			\
+#define arch_futex_robust_unlock_get_pop(regs)	\
 	x86_futex_robust_unlock_get_pop(regs)
 
 #endif /* _ASM_X86_FUTEX_ROBUST_H */
diff --git a/arch/x86/include/asm/vdso.h b/arch/x86/include/asm/vdso.h
index 1ed7a2ae600d..b96a6f04d677 100644
--- a/arch/x86/include/asm/vdso.h
+++ b/arch/x86/include/asm/vdso.h
@@ -25,9 +25,12 @@ struct vdso_image {
 	long sym_int80_landing_pad;
 	long sym_vdso32_sigreturn_landing_pad;
 	long sym_vdso32_rt_sigreturn_landing_pad;
-	long sym___vdso_futex_robust_try_unlock_cs_start;
-	long sym___vdso_futex_robust_try_unlock_cs_success;
-	long sym___vdso_futex_robust_try_unlock_cs_end;
+	long sym_x86_64_futex_try_unlock_cs_start;
+	long sym_x86_64_futex_try_unlock_cs_end;
+	long sym_x86_64_compat_futex_try_unlock_cs_start;
+	long sym_x86_64_compat_futex_try_unlock_cs_end;
+	long sym_x86_32_futex_try_unlock_cs_start;
+	long sym_x86_32_futex_try_unlock_cs_end;
 };
 
 extern const struct vdso_image vdso64_image;
diff --git a/arch/x86/tools/vdso2c.c b/arch/x86/tools/vdso2c.c
index 47012474ccc4..2d01e511ca8a 100644
--- a/arch/x86/tools/vdso2c.c
+++ b/arch/x86/tools/vdso2c.c
@@ -82,9 +82,12 @@ struct vdso_sym required_syms[] = {
 	{"int80_landing_pad",				true},
 	{"vdso32_rt_sigreturn_landing_pad",		true},
 	{"vdso32_sigreturn_landing_pad",		true},
-	{"__vdso_futex_robust_try_unlock_cs_start",	true},
-	{"__vdso_futex_robust_try_unlock_cs_success",	true},
-	{"__vdso_futex_robust_try_unlock_cs_end",	true},
+	{"x86_64_futex_try_unlock_cs_start",		true},
+	{"x86_64_futex_try_unlock_cs_end",		true},
+	{"x86_64_compat_futex_try_unlock_cs_start",	true},
+	{"x86_64_compat_futex_try_unlock_cs_end",	true},
+	{"x86_32_futex_try_unlock_cs_start",		true},
+	{"x86_32_futex_try_unlock_cs_end",		true},
 };
 
 __attribute__((format(printf, 1, 2))) __attribute__((noreturn))
diff --git a/include/linux/futex.h b/include/linux/futex.h
index cd347a4b3fac..8e3d46737b03 100644
--- a/include/linux/futex.h
+++ b/include/linux/futex.h
@@ -115,18 +115,20 @@ static inline int futex_mm_init(struct mm_struct *mm) { return 0; }
 #ifdef CONFIG_FUTEX_ROBUST_UNLOCK
 #include <asm/futex_robust.h>
 
-void __futex_fixup_robust_unlock(struct pt_regs *regs);
+void __futex_fixup_robust_unlock(struct pt_regs *regs, struct futex_unlock_cs_range *csr);
 
-static inline bool futex_within_robust_unlock(struct pt_regs *regs)
+static inline bool futex_within_robust_unlock(struct pt_regs *regs,
+					      struct futex_unlock_cs_range *csr)
 {
 	unsigned long ip = instruction_pointer(regs);
 
-	return ip >= current->mm->futex.unlock_cs_start_ip &&
-		ip < current->mm->futex.unlock_cs_end_ip;
+	return ip >= csr->start_ip && ip < csr->end_ip;
 }
 
 static inline void futex_fixup_robust_unlock(struct pt_regs *regs)
 {
+	struct futex_unlock_cs_range *csr;
+
 	/*
 	 * Avoid dereferencing current->mm if not returning from interrupt.
 	 * current->rseq.event is going to be used anyway in the exit to user
@@ -135,8 +137,17 @@ static inline void futex_fixup_robust_unlock(struct pt_regs *regs)
 	if (!current->rseq.event.user_irq)
 		return;
 
-	if (unlikely(futex_within_robust_unlock(regs)))
-		__futex_fixup_robust_unlock(regs);
+	csr = current->mm->futex.unlock_cs_ranges;
+	if (unlikely(futex_within_robust_unlock(regs, csr))) {
+		__futex_fixup_robust_unlock(regs, csr);
+		return;
+	}
+
+	/* Multi sized robust lists are only supported with CONFIG_COMPAT */
+	if (IS_ENABLED(CONFIG_COMPAT) && current->mm->futex.unlock_cs_num_ranges == 2) {
+		if (unlikely(futex_within_robust_unlock(regs, ++csr)))
+			__futex_fixup_robust_unlock(regs, csr);
+	}
 }
 #else /* CONFIG_FUTEX_ROBUST_UNLOCK */
 static inline void futex_fixup_robust_unlock(struct pt_regs *regs) {}
diff --git a/include/linux/futex_types.h b/include/linux/futex_types.h
index 223f469789c5..90e24a10ed08 100644
--- a/include/linux/futex_types.h
+++ b/include/linux/futex_types.h
@@ -11,13 +11,15 @@ struct futex_pi_state;
 struct robust_list_head;
 
 /**
- * struct futex_ctrl - Futex related per task data
+ * struct futex_sched_data - Futex related per task data
  * @robust_list:	User space registered robust list pointer
  * @compat_robust_list:	User space registered robust list pointer for compat tasks
+ * @pi_state_list:	List head for Priority Inheritance (PI) state management
+ * @pi_state_cache:	Pointer to cache one PI state object per task
  * @exit_mutex:		Mutex for serializing exit
  * @state:		Futex handling state to handle exit races correctly
  */
-struct futex_ctrl {
+struct futex_sched_data {
 	struct robust_list_head __user		*robust_list;
 #ifdef CONFIG_COMPAT
 	struct compat_robust_list_head __user	*compat_robust_list;
@@ -27,9 +29,20 @@ struct futex_ctrl {
 	struct mutex				exit_mutex;
 	unsigned int				state;
 };
-#else
-struct futex_ctrl { };
-#endif /* !CONFIG_FUTEX */
+
+/**
+ * struct futex_unlock_cs_range - Range for the VDSO unlock critical section
+ * @start_ip:	The start IP of the robust futex unlock critical section (inclusive)
+ * @end_ip:	The end IP of the robust futex unlock critical section (exclusive)
+ * @pop_size32:	Pending OP pointer size indicator. 0 == 64-bit, 1 == 32-bit
+ */
+struct futex_unlock_cs_range {
+	unsigned long	       start_ip;
+	unsigned long	       end_ip;
+	unsigned int	       pop_size32;
+};
+
+#define FUTEX_ROBUST_MAX_CS_RANGES	2
 
 /**
  * struct futex_mm_data - Futex related per MM data
@@ -41,18 +54,9 @@ struct futex_ctrl { };
  * @phash_atomic:		Aggregate value for @phash_ref
  * @phash_ref:			Per CPU reference counter for a private hash
  *
- * @unlock_cs_start_ip:		The start IP of the robust futex unlock critical section
- *
- * @unlock_cs_success_ip:	The IP of the robust futex unlock critical section which
- *				indicates that the unlock (cmpxchg) was successful
- *				Required to handle the compat size insanity for mixed mode
- *				game emulators.
- *
- *				Not evaluated by the core code as that only
- *				evaluates the start/end range. Can therefore be 0 if the
- *				architecture does not care.
- *
- * @unlock_cs_end_ip:		The end IP of the robust futex unlock critical section
+ * @unlock_cs_num_ranges:	The number of critical section ranges for VDSO assisted unlock
+ *				of robust futexes.
+ * @unlock_cs_ranges:		The critical section ranges for VDSO assisted unlock
  */
 struct futex_mm_data {
 #ifdef CONFIG_FUTEX_PRIVATE_HASH
@@ -65,10 +69,14 @@ struct futex_mm_data {
 	unsigned int			__percpu *phash_ref;
 #endif
 #ifdef CONFIG_FUTEX_ROBUST_UNLOCK
-	unsigned long			unlock_cs_start_ip;
-	unsigned long			unlock_cs_success_ip;
-	unsigned long			unlock_cs_end_ip;
+	unsigned int			unlock_cs_num_ranges;
+	struct futex_unlock_cs_range	unlock_cs_ranges[FUTEX_ROBUST_MAX_CS_RANGES];
 #endif
 };
 
+#else
+struct futex_sched_data { };
+struct futex_mm_data { };
+#endif /* !CONFIG_FUTEX */
+
 #endif /* _LINUX_FUTEX_TYPES_H */
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 266d4859e322..a5d5c0ec3c64 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1329,7 +1329,7 @@ struct task_struct {
 	u32				rmid;
 #endif
 
-	struct futex_ctrl		futex;
+	struct futex_sched_data		futex;
 
 #ifdef CONFIG_PERF_EVENTS
 	u8				perf_recursion[PERF_NR_CONTEXTS];
diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index 809e4f7dfdbd..bc41d619f9a3 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -644,6 +644,15 @@ static inline void user_access_restore(unsigned long flags) { }
 #define user_read_access_end user_access_end
 #endif
 
+#ifndef unsafe_atomic_store_release_user
+# define unsafe_atomic_store_release_user(val, uptr, elbl)		\
+	do {								\
+		if (!IS_ENABLED(CONFIG_ARCH_STORE_IMPLIES_RELEASE))	\
+			smp_mb();					\
+		unsafe_put_user(val, uptr, elbl);			\
+	} while (0)
+#endif
+
 /* Define RW variant so the below _mode macro expansion works */
 #define masked_user_rw_access_begin(u)	masked_user_access_begin(u)
 #define user_rw_access_begin(u, s)	user_access_begin(u, s)
diff --git a/include/uapi/linux/futex.h b/include/uapi/linux/futex.h
index ab9d89748595..9a0f564f1737 100644
--- a/include/uapi/linux/futex.h
+++ b/include/uapi/linux/futex.h
@@ -26,23 +26,48 @@
 #define FUTEX_PRIVATE_FLAG	128
 #define FUTEX_CLOCK_REALTIME	256
 #define FUTEX_UNLOCK_ROBUST	512
-#define FUTEX_CMD_MASK		~(FUTEX_PRIVATE_FLAG | FUTEX_CLOCK_REALTIME | FUTEX_UNLOCK_ROBUST)
-
-#define FUTEX_WAIT_PRIVATE	(FUTEX_WAIT | FUTEX_PRIVATE_FLAG)
-#define FUTEX_WAKE_PRIVATE	(FUTEX_WAKE | FUTEX_PRIVATE_FLAG)
-#define FUTEX_REQUEUE_PRIVATE	(FUTEX_REQUEUE | FUTEX_PRIVATE_FLAG)
-#define FUTEX_CMP_REQUEUE_PRIVATE (FUTEX_CMP_REQUEUE | FUTEX_PRIVATE_FLAG)
-#define FUTEX_WAKE_OP_PRIVATE	(FUTEX_WAKE_OP | FUTEX_PRIVATE_FLAG)
-#define FUTEX_LOCK_PI_PRIVATE	(FUTEX_LOCK_PI | FUTEX_PRIVATE_FLAG)
-#define FUTEX_LOCK_PI2_PRIVATE	(FUTEX_LOCK_PI2 | FUTEX_PRIVATE_FLAG)
-#define FUTEX_UNLOCK_PI_PRIVATE	(FUTEX_UNLOCK_PI | FUTEX_PRIVATE_FLAG)
-#define FUTEX_TRYLOCK_PI_PRIVATE (FUTEX_TRYLOCK_PI | FUTEX_PRIVATE_FLAG)
+#define FUTEX_ROBUST_LIST32	1024
+
+#define FUTEX_CMD_MASK			~(FUTEX_PRIVATE_FLAG | FUTEX_CLOCK_REALTIME | \
+					  FUTEX_UNLOCK_ROBUST | FUTEX_ROBUST_LIST32)
+
+#define FUTEX_WAIT_PRIVATE		(FUTEX_WAIT | FUTEX_PRIVATE_FLAG)
+#define FUTEX_WAKE_PRIVATE		(FUTEX_WAKE | FUTEX_PRIVATE_FLAG)
+#define FUTEX_REQUEUE_PRIVATE		(FUTEX_REQUEUE | FUTEX_PRIVATE_FLAG)
+#define FUTEX_CMP_REQUEUE_PRIVATE	(FUTEX_CMP_REQUEUE | FUTEX_PRIVATE_FLAG)
+#define FUTEX_WAKE_OP_PRIVATE		(FUTEX_WAKE_OP | FUTEX_PRIVATE_FLAG)
+#define FUTEX_LOCK_PI_PRIVATE		(FUTEX_LOCK_PI | FUTEX_PRIVATE_FLAG)
+#define FUTEX_LOCK_PI2_PRIVATE		(FUTEX_LOCK_PI2 | FUTEX_PRIVATE_FLAG)
+#define FUTEX_UNLOCK_PI_PRIVATE		(FUTEX_UNLOCK_PI | FUTEX_PRIVATE_FLAG)
+#define FUTEX_TRYLOCK_PI_PRIVATE	(FUTEX_TRYLOCK_PI | FUTEX_PRIVATE_FLAG)
 #define FUTEX_WAIT_BITSET_PRIVATE	(FUTEX_WAIT_BITSET | FUTEX_PRIVATE_FLAG)
 #define FUTEX_WAKE_BITSET_PRIVATE	(FUTEX_WAKE_BITSET | FUTEX_PRIVATE_FLAG)
-#define FUTEX_WAIT_REQUEUE_PI_PRIVATE	(FUTEX_WAIT_REQUEUE_PI | \
-					 FUTEX_PRIVATE_FLAG)
-#define FUTEX_CMP_REQUEUE_PI_PRIVATE	(FUTEX_CMP_REQUEUE_PI | \
-					 FUTEX_PRIVATE_FLAG)
+#define FUTEX_WAIT_REQUEUE_PI_PRIVATE	(FUTEX_WAIT_REQUEUE_PI | FUTEX_PRIVATE_FLAG)
+#define FUTEX_CMP_REQUEUE_PI_PRIVATE	(FUTEX_CMP_REQUEUE_PI | FUTEX_PRIVATE_FLAG)
+
+/*
+ * Operations to unlock a futex, clear the robust list pending op pointer and
+ * wake waiters.
+ */
+#define FUTEX_UNLOCK_PI_LIST64			(FUTEX_UNLOCK_PI | FUTEX_UNLOCK_ROBUST)
+#define FUTEX_UNLOCK_PI_LIST64_PRIVATE		(FUTEX_UNLOCK_PI_LIST64 | FUTEX_PRIVATE_FLAG)
+#define FUTEX_UNLOCK_PI_LIST32			(FUTEX_UNLOCK_PI | FUTEX_UNLOCK_ROBUST | \
+						 FUTEX_ROBUST_LIST32)
+#define FUTEX_UNLOCK_PI_LIST32_PRIVATE		(FUTEX_UNLOCK_PI_LIST32 | FUTEX_PRIVATE_FLAG)
+
+#define FUTEX_UNLOCK_WAKE_LIST64		(FUTEX_WAKE | FUTEX_UNLOCK_ROBUST)
+#define FUTEX_UNLOCK_WAKE_LIST64_PRIVATE	(FUTEX_UNLOCK_WAKE_LIST64 | FUTEX_PRIVATE_FLAG)
+
+#define FUTEX_UNLOCK_WAKE_LIST32		(FUTEX_WAKE | FUTEX_UNLOCK_ROBUST | \
+						 FUTEX_ROBUST_LIST32)
+#define FUTEX_UNLOCK_WAKE_LIST32_PRIVATE	(FUTEX_UNLOCK_WAKE_LIST32 | FUTEX_PRIVATE_FLAG)
+
+#define FUTEX_UNLOCK_BITSET_LIST64		(FUTEX_WAKE_BITSET | FUTEX_UNLOCK_ROBUST)
+#define FUTEX_UNLOCK_BITSET_LIST64_PRIVATE	(FUTEX_UNLOCK_BITSET_LIST64 | FUTEX_PRIVATE_FLAG)
+
+#define FUTEX_UNLOCK_BITSET_LIST32		(FUTEX_WAKE_BITSET | FUTEX_UNLOCK_ROBUST | \
+						 FUTEX_ROBUST_LIST32)
+#define FUTEX_UNLOCK_BITSET_LIST32_PRIVATE	(FUTEX_UNLOCK_BITSET_LIST32 | FUTEX_PRIVATE_FLAG)
 
 /*
  * Flags for futex2 syscalls.
@@ -182,23 +207,6 @@ struct robust_list_head {
 #define FUTEX_ROBUST_MOD_PI		(0x1UL)
 #define FUTEX_ROBUST_MOD_MASK		(FUTEX_ROBUST_MOD_PI)
 
-/*
- * Modifier for FUTEX_ROBUST_UNLOCK uaddr2. Required to distinguish the storage
- * size for the robust_list_head::list_pending_op. This solves two problems:
- *
- *	1) COMPAT tasks
- *
- *	2) The mixed mode magic gaming use case which has both 32-bit and 64-bit
- *	   robust lists. Oh well....
- *
- * Long story short: 32-bit userspace must set this bit unconditionally to
- * ensure that it can run on a 64-bit kernel in compat mode. If user space
- * screws that up a 64-bit kernel will happily clear the full 64-bits. 32-bit
- * kernels return an error code if the bit is not set.
- */
-#define FUTEX_ROBUST_UNLOCK_MOD_32BIT	(0x1UL)
-#define FUTEX_ROBUST_UNLOCK_MOD_MASK	(FUTEX_ROBUST_UNLOCK_MOD_32BIT)
-
 /*
  * bitset with all bits set for the FUTEX_xxx_BITSET OPs to request a
  * match of any bit.
diff --git a/include/vdso/futex.h b/include/vdso/futex.h
index 8061bfcb6b92..3cd175eefe64 100644
--- a/include/vdso/futex.h
+++ b/include/vdso/futex.h
@@ -2,15 +2,14 @@
 #ifndef _VDSO_FUTEX_H
 #define _VDSO_FUTEX_H
 
-#include <linux/types.h>
-
-struct robust_list;
+#include <uapi/linux/types.h>
 
 /**
- * __vdso_futex_robust_try_unlock - Try to unlock an uncontended robust futex
+ * __vdso_futex_robust_list64_try_unlock - Try to unlock an uncontended robust futex
+ *					   with a 64-bit pending op pointer
  * @lock:	Pointer to the futex lock object
  * @tid:	The TID of the calling task
- * @op:		Pointer to the task's robust_list_head::list_pending_op
+ * @pop:	Pointer to the task's robust_list_head::list_pending_op
  *
  * Return: The content of *@lock. On success this is the same as @tid.
  *
@@ -28,17 +27,26 @@ struct robust_list;
  *
  * User space uses it in the following way:
  *
- * if (__vdso_futex_robust_try_unlock(lock, tid, &pending_op) != tid)
+ * if (__vdso_futex_robust_list64_try_unlock(lock, tid, &pending_op) != tid)
  *	err = sys_futex($OP | FUTEX_ROBUST_UNLOCK,....);
  *
  * If the unlock attempt fails due to the FUTEX_WAITERS bit set in the lock,
  * then the syscall does the unlock, clears the pending op pointer and wakes the
  * requested number of waiters.
+ */
+__u32 __vdso_futex_robust_list64_try_unlock(__u32 *lock, __u32 tid, __u64 *pop);
+
+/**
+ * __vdso_futex_robust_list32_try_unlock - Try to unlock an uncontended robust futex
+ *					   with a 32-bit pending op pointer
+ * @lock:	Pointer to the futex lock object
+ * @tid:	The TID of the calling task
+ * @pop:	Pointer to the task's robust_list_head::list_pending_op
+ *
+ * Return: The content of *@lock. On success this is the same as @tid.
  *
- * The @op pointer is intentionally void. It has the same requirements as the
- * @uaddr2 argument for sys_futex(FUTEX_ROBUST_UNLOCK) operations. See the
- * modifier and the related documentation in include/uapi/linux/futex.h
+ * Same as __vdso_futex_robust_list64_try_unlock(), except with a 32-bit @pop pointer.
  */
-uint32_t __vdso_futex_robust_try_unlock(uint32_t *lock, uint32_t tid, void *op);
+__u32 __vdso_futex_robust_list32_try_unlock(__u32 *lock, __u32 tid, __u32 *pop);
 
 #endif
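The documented try-unlock protocol can be sketched as a plain user-space emulation. This is not the actual VDSO implementation, only a model of its documented semantics; the function name and the use of GCC `__atomic` builtins are illustrative assumptions, while FUTEX_WAITERS is the standard kernel bit:

```c
#include <assert.h>
#include <stdint.h>

#define FUTEX_WAITERS	0x80000000u

/*
 * User-space sketch (NOT the real VDSO code): atomically replace the lock
 * value with 0 only if it is exactly @tid, i.e. held by the caller and
 * uncontended, then clear the pending op pointer. Returns the observed
 * lock value, which equals @tid on success.
 */
static uint32_t robust_list64_try_unlock(uint32_t *lock, uint32_t tid, uint64_t *pop)
{
	uint32_t expected = tid;

	if (__atomic_compare_exchange_n(lock, &expected, 0, 0,
					__ATOMIC_RELEASE, __ATOMIC_RELAXED)) {
		/* Uncontended fast path: clear list_pending_op and succeed */
		__atomic_store_n(pop, 0, __ATOMIC_RELAXED);
		return tid;
	}
	/*
	 * FUTEX_WAITERS is set or the caller is not the owner: the caller
	 * must fall back to sys_futex($OP | FUTEX_ROBUST_UNLOCK, ...),
	 * which unlocks, clears the pending op and wakes waiters.
	 */
	return expected;
}
```

On failure the compare-exchange leaves the lock untouched and reports the observed value, exactly matching the "!= tid means go to the kernel" check in the documented usage above.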
diff --git a/kernel/futex/core.c b/kernel/futex/core.c
index 7957edd46b89..6a9c04471c44 100644
--- a/kernel/futex/core.c
+++ b/kernel/futex/core.c
@@ -46,6 +46,8 @@
 #include <linux/slab.h>
 #include <linux/vmalloc.h>
 
+#include <vdso/futex.h>
+
 #include "futex.h"
 #include "../locking/rtmutex_common.h"
 
@@ -1434,17 +1436,9 @@ static void exit_pi_state_list(struct task_struct *curr)
 static inline void exit_pi_state_list(struct task_struct *curr) { }
 #endif
 
-static inline bool mask_pop_addr(void __user **pop)
-{
-	unsigned long addr = (unsigned long)*pop;
-
-	*pop = (void __user *) (addr & ~FUTEX_ROBUST_UNLOCK_MOD_MASK);
-	return !!(addr & FUTEX_ROBUST_UNLOCK_MOD_32BIT);
-}
-
-bool futex_robust_list_clear_pending(void __user *pop)
+bool futex_robust_list_clear_pending(void __user *pop, unsigned int flags)
 {
-	bool size32bit = mask_pop_addr(&pop);
+	bool size32bit = !!(flags & FLAGS_ROBUST_LIST32);
 
 	if (!IS_ENABLED(CONFIG_64BIT) && !size32bit)
 		return false;
@@ -1456,15 +1450,14 @@ bool futex_robust_list_clear_pending(void __user *pop)
 }
 
 #ifdef CONFIG_FUTEX_ROBUST_UNLOCK
-void __futex_fixup_robust_unlock(struct pt_regs *regs)
+void __futex_fixup_robust_unlock(struct pt_regs *regs, struct futex_unlock_cs_range *csr)
 {
-	void __user *pop;
+	void __user *pop = arch_futex_robust_unlock_get_pop(regs);
 
-	if (!arch_futex_needs_robust_unlock_fixup(regs))
+	if (!pop)
 		return;
 
-	pop = arch_futex_robust_unlock_get_pop(regs);
-	futex_robust_list_clear_pending(pop);
+	futex_robust_list_clear_pending(pop, csr->cs_pop_size32 ? FLAGS_ROBUST_LIST32 : 0);
 }
 #endif /* CONFIG_FUTEX_ROBUST_UNLOCK */
 
diff --git a/kernel/futex/futex.h b/kernel/futex/futex.h
index b1aaa90f1779..31a5bae8b470 100644
--- a/kernel/futex/futex.h
+++ b/kernel/futex/futex.h
@@ -41,6 +41,7 @@
 #define FLAGS_STRICT		0x0100
 #define FLAGS_MPOL		0x0200
 #define FLAGS_UNLOCK_ROBUST	0x0400
+#define FLAGS_ROBUST_LIST32	0x0800
 
 /* FUTEX_ to FLAGS_ */
 static inline unsigned int futex_to_flags(unsigned int op)
@@ -56,6 +57,9 @@ static inline unsigned int futex_to_flags(unsigned int op)
 	if (op & FUTEX_UNLOCK_ROBUST)
 		flags |= FLAGS_UNLOCK_ROBUST;
 
+	if (op & FUTEX_ROBUST_LIST32)
+		flags |= FLAGS_ROBUST_LIST32;
+
 	return flags;
 }
 
@@ -452,6 +456,6 @@ extern int futex_unlock_pi(u32 __user *uaddr, unsigned int flags, void __user *p
 
 extern int futex_lock_pi(u32 __user *uaddr, unsigned int flags, ktime_t *time, int trylock);
 
-bool futex_robust_list_clear_pending(void __user *pop);
+bool futex_robust_list_clear_pending(void __user *pop, unsigned int flags);
 
 #endif /* _FUTEX_H */
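The futex_to_flags() additions above translate the user-visible op bits into the kernel-internal FLAGS_* representation. A minimal standalone sketch of that mapping follows; the FUTEX_UNLOCK_ROBUST and FUTEX_ROBUST_LIST32 encodings are hypothetical placeholders, since the quoted hunks do not show their actual values, while the FLAGS_* values match the ones added to kernel/futex/futex.h:

```c
#include <assert.h>

/* Hypothetical op-bit encodings, for illustration only. */
#define FUTEX_UNLOCK_ROBUST	0x1000
#define FUTEX_ROBUST_LIST32	0x2000

/* Internal flag values as added in kernel/futex/futex.h. */
#define FLAGS_UNLOCK_ROBUST	0x0400
#define FLAGS_ROBUST_LIST32	0x0800

/* Mirror of the futex_to_flags() additions: each op modifier bit maps
 * 1:1 to an internal flag bit. */
static unsigned int robust_op_to_flags(unsigned int op)
{
	unsigned int flags = 0;

	if (op & FUTEX_UNLOCK_ROBUST)
		flags |= FLAGS_UNLOCK_ROBUST;

	if (op & FUTEX_ROBUST_LIST32)
		flags |= FLAGS_ROBUST_LIST32;

	return flags;
}
```

With this mapping in place, futex_robust_list_clear_pending() only needs to test FLAGS_ROBUST_LIST32 to pick the 32-bit or 64-bit store size, instead of decoding a modifier bit smuggled into the uaddr2 pointer as v1 did.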
diff --git a/kernel/futex/pi.c b/kernel/futex/pi.c
index b8c76b6242e4..05ca360a7a30 100644
--- a/kernel/futex/pi.c
+++ b/kernel/futex/pi.c
@@ -1298,7 +1298,7 @@ int futex_unlock_pi(u32 __user *uaddr, unsigned int flags, void __user *pop)
 	if (ret || !(flags & FLAGS_UNLOCK_ROBUST))
 		return ret;
 
-	if (!futex_robust_list_clear_pending(pop))
+	if (!futex_robust_list_clear_pending(pop, flags))
 		return -EFAULT;
 
 	return 0;
diff --git a/kernel/futex/waitwake.c b/kernel/futex/waitwake.c
index 45effcf42961..c9723d3d17f1 100644
--- a/kernel/futex/waitwake.c
+++ b/kernel/futex/waitwake.c
@@ -157,16 +157,19 @@ static bool futex_robust_unlock(u32 __user *uaddr, unsigned int flags, void __us
 	if (!(flags & FLAGS_UNLOCK_ROBUST))
 		return true;
 
-	/* First unlock the futex. */
-	if (put_user(0U, uaddr))
-		return false;
+	/* First unlock the futex, which requires release semantics. */
+	scoped_user_write_access(uaddr, efault)
+		unsafe_atomic_store_release_user(0, uaddr, efault);
 
 	/*
 	 * Clear the pending list op now. If that fails, then the task is in
-	 * deeper trouble as the robust list head is usually part of TLS. The
-	 * chance of survival is close to zero.
+	 * deeper trouble as the robust list head is usually part of the TLS.
+	 * The chance of survival is close to zero.
 	 */
-	return futex_robust_list_clear_pending(pop);
+	return futex_robust_list_clear_pending(pop, flags);
+
+efault:
+	return false;
 }
 
 /*

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [patch v2 01/11] futex: Move futex task related data into a struct
  2026-03-19 23:24 [patch v2 00/11] futex: Address the robust futex unlock race for real Thomas Gleixner
@ 2026-03-19 23:24 ` Thomas Gleixner
  2026-03-20 14:59   ` André Almeida
  2026-03-19 23:24 ` [patch v2 02/11] futex: Move futex related mm_struct " Thomas Gleixner
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 35+ messages in thread
From: Thomas Gleixner @ 2026-03-19 23:24 UTC (permalink / raw)
  To: LKML
  Cc: Mathieu Desnoyers, André Almeida, Sebastian Andrzej Siewior,
	Carlos O'Donell, Peter Zijlstra, Florian Weimer, Rich Felker,
	Torvald Riegel, Darren Hart, Ingo Molnar, Davidlohr Bueso,
	Arnd Bergmann, Liam R . Howlett, Uros Bizjak,
	Thomas Weißschuh

Having all these members in task_struct along with the required #ifdeffery
is annoying, prevents efficient initialization of the data with memset()
and makes extending it tedious.

Move it into a data structure and fix up all usage sites.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
---
V2: Rename the struct and add the missing kernel doc - Andre
---
 Documentation/locking/robust-futexes.rst |    8 ++--
 include/linux/futex.h                    |   12 ++----
 include/linux/futex_types.h              |   36 ++++++++++++++++++++
 include/linux/sched.h                    |   16 ++-------
 kernel/exit.c                            |    4 +-
 kernel/futex/core.c                      |   55 +++++++++++++++----------------
 kernel/futex/pi.c                        |   26 +++++++-------
 kernel/futex/syscalls.c                  |   23 ++++--------
 8 files changed, 99 insertions(+), 81 deletions(-)

--- a/Documentation/locking/robust-futexes.rst
+++ b/Documentation/locking/robust-futexes.rst
@@ -94,7 +94,7 @@ time, the kernel checks this user-space
 locks to be cleaned up?
 
 In the common case, at do_exit() time, there is no list registered, so
-the cost of robust futexes is just a simple current->robust_list != NULL
+the cost of robust futexes is just a current->futex.robust_list != NULL
 comparison. If the thread has registered a list, then normally the list
 is empty. If the thread/process crashed or terminated in some incorrect
 way then the list might be non-empty: in this case the kernel carefully
@@ -178,9 +178,9 @@ The patch adds two new syscalls: one to
                      size_t __user *len_ptr);
 
 List registration is very fast: the pointer is simply stored in
-current->robust_list. [Note that in the future, if robust futexes become
-widespread, we could extend sys_clone() to register a robust-list head
-for new threads, without the need of another syscall.]
+current->futex.robust_list. [Note that in the future, if robust futexes
+become widespread, we could extend sys_clone() to register a robust-list
+head for new threads, without the need of another syscall.]
 
 So there is virtually zero overhead for tasks not using robust futexes,
 and even for robust futex users, there is only one extra syscall per
--- a/include/linux/futex.h
+++ b/include/linux/futex.h
@@ -64,14 +64,10 @@ enum {
 
 static inline void futex_init_task(struct task_struct *tsk)
 {
-	tsk->robust_list = NULL;
-#ifdef CONFIG_COMPAT
-	tsk->compat_robust_list = NULL;
-#endif
-	INIT_LIST_HEAD(&tsk->pi_state_list);
-	tsk->pi_state_cache = NULL;
-	tsk->futex_state = FUTEX_STATE_OK;
-	mutex_init(&tsk->futex_exit_mutex);
+	memset(&tsk->futex, 0, sizeof(tsk->futex));
+	INIT_LIST_HEAD(&tsk->futex.pi_state_list);
+	tsk->futex.state = FUTEX_STATE_OK;
+	mutex_init(&tsk->futex.exit_mutex);
 }
 
 void futex_exit_recursive(struct task_struct *tsk);
--- /dev/null
+++ b/include/linux/futex_types.h
@@ -0,0 +1,36 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_FUTEX_TYPES_H
+#define _LINUX_FUTEX_TYPES_H
+
+#ifdef CONFIG_FUTEX
+#include <linux/mutex_types.h>
+#include <linux/types.h>
+
+struct compat_robust_list_head;
+struct futex_pi_state;
+struct robust_list_head;
+
+/**
+ * struct futex_sched_data - Futex related per task data
+ * @robust_list:	User space registered robust list pointer
+ * @compat_robust_list:	User space registered robust list pointer for compat tasks
+ * @pi_state_list:	List head for Priority Inheritance (PI) state management
+ * @pi_state_cache:	Pointer to cache one PI state object per task
+ * @exit_mutex:		Mutex for serializing exit
+ * @state:		Futex handling state to handle exit races correctly
+ */
+struct futex_sched_data {
+	struct robust_list_head __user		*robust_list;
+#ifdef CONFIG_COMPAT
+	struct compat_robust_list_head __user	*compat_robust_list;
+#endif
+	struct list_head			pi_state_list;
+	struct futex_pi_state			*pi_state_cache;
+	struct mutex				exit_mutex;
+	unsigned int				state;
+};
+#else
+struct futex_sched_data { };
+#endif /* !CONFIG_FUTEX */
+
+#endif /* _LINUX_FUTEX_TYPES_H */
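The memset()-based initialization the commit message refers to can be sketched with a user-space analog of the new futex_init_task(): zero the whole struct in one go, then fix up the few members that need non-zero initial state. The struct and helper names mirror the kernel ones, but the list type is a stand-in and the FUTEX_STATE_OK value of 0 is an assumption for illustration:

```c
#include <assert.h>
#include <string.h>

/* Stand-in for the kernel's struct list_head. */
struct list_head { struct list_head *next, *prev; };

/* Simplified user-space analog of struct futex_sched_data. */
struct futex_sched_data {
	void			*robust_list;
	struct list_head	pi_state_list;
	void			*pi_state_cache;
	unsigned int		state;
};

#define FUTEX_STATE_OK	0	/* assumed encoding, for illustration */

static void futex_init_data(struct futex_sched_data *f)
{
	/* One memset() covers all members that start out as zero/NULL. */
	memset(f, 0, sizeof(*f));

	/* INIT_LIST_HEAD(): an empty list points at itself. */
	f->pi_state_list.next = &f->pi_state_list;
	f->pi_state_list.prev = &f->pi_state_list;

	f->state = FUTEX_STATE_OK;
}
```

This is exactly why grouping the members pays off: new zero-initialized fields can be added to the struct without touching the init path at all.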
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -16,6 +16,7 @@
 #include <linux/cpumask_types.h>
 
 #include <linux/cache.h>
+#include <linux/futex_types.h>
 #include <linux/irqflags_types.h>
 #include <linux/smp_types.h>
 #include <linux/pid_types.h>
@@ -64,7 +65,6 @@ struct bpf_net_context;
 struct capture_control;
 struct cfs_rq;
 struct fs_struct;
-struct futex_pi_state;
 struct io_context;
 struct io_uring_task;
 struct mempolicy;
@@ -76,7 +76,6 @@ struct pid_namespace;
 struct pipe_inode_info;
 struct rcu_node;
 struct reclaim_state;
-struct robust_list_head;
 struct root_domain;
 struct rq;
 struct sched_attr;
@@ -1329,16 +1328,9 @@ struct task_struct {
 	u32				closid;
 	u32				rmid;
 #endif
-#ifdef CONFIG_FUTEX
-	struct robust_list_head __user	*robust_list;
-#ifdef CONFIG_COMPAT
-	struct compat_robust_list_head __user *compat_robust_list;
-#endif
-	struct list_head		pi_state_list;
-	struct futex_pi_state		*pi_state_cache;
-	struct mutex			futex_exit_mutex;
-	unsigned int			futex_state;
-#endif
+
+	struct futex_sched_data		futex;
+
 #ifdef CONFIG_PERF_EVENTS
 	u8				perf_recursion[PERF_NR_CONTEXTS];
 	struct perf_event_context	*perf_event_ctxp;
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -989,8 +989,8 @@ void __noreturn do_exit(long code)
 	proc_exit_connector(tsk);
 	mpol_put_task_policy(tsk);
 #ifdef CONFIG_FUTEX
-	if (unlikely(current->pi_state_cache))
-		kfree(current->pi_state_cache);
+	if (unlikely(current->futex.pi_state_cache))
+		kfree(current->futex.pi_state_cache);
 #endif
 	/*
 	 * Make sure we are holding no locks:
--- a/kernel/futex/core.c
+++ b/kernel/futex/core.c
@@ -32,18 +32,19 @@
  *  "But they come in a choice of three flavours!"
  */
 #include <linux/compat.h>
-#include <linux/jhash.h>
-#include <linux/pagemap.h>
 #include <linux/debugfs.h>
-#include <linux/plist.h>
+#include <linux/fault-inject.h>
 #include <linux/gfp.h>
-#include <linux/vmalloc.h>
+#include <linux/jhash.h>
 #include <linux/memblock.h>
-#include <linux/fault-inject.h>
-#include <linux/slab.h>
-#include <linux/prctl.h>
 #include <linux/mempolicy.h>
 #include <linux/mmap_lock.h>
+#include <linux/pagemap.h>
+#include <linux/plist.h>
+#include <linux/prctl.h>
+#include <linux/rseq.h>
+#include <linux/slab.h>
+#include <linux/vmalloc.h>
 
 #include "futex.h"
 #include "../locking/rtmutex_common.h"
@@ -829,7 +830,7 @@ void wait_for_owner_exiting(int ret, str
 	if (WARN_ON_ONCE(ret == -EBUSY && !exiting))
 		return;
 
-	mutex_lock(&exiting->futex_exit_mutex);
+	mutex_lock(&exiting->futex.exit_mutex);
 	/*
 	 * No point in doing state checking here. If the waiter got here
 	 * while the task was in exec()->exec_futex_release() then it can
@@ -838,7 +839,7 @@ void wait_for_owner_exiting(int ret, str
 	 * already. Highly unlikely and not a problem. Just one more round
 	 * through the futex maze.
 	 */
-	mutex_unlock(&exiting->futex_exit_mutex);
+	mutex_unlock(&exiting->futex.exit_mutex);
 
 	put_task_struct(exiting);
 }
@@ -1048,7 +1049,7 @@ static int handle_futex_death(u32 __user
 	 *
 	 * In both cases the following conditions are met:
 	 *
-	 *	1) task->robust_list->list_op_pending != NULL
+	 *	1) task->futex.robust_list->list_op_pending != NULL
 	 *	   @pending_op == true
 	 *	2) The owner part of user space futex value == 0
 	 *	3) Regular futex: @pi == false
@@ -1153,7 +1154,7 @@ static inline int fetch_robust_entry(str
  */
 static void exit_robust_list(struct task_struct *curr)
 {
-	struct robust_list_head __user *head = curr->robust_list;
+	struct robust_list_head __user *head = curr->futex.robust_list;
 	struct robust_list __user *entry, *next_entry, *pending;
 	unsigned int limit = ROBUST_LIST_LIMIT, pi, pip;
 	unsigned int next_pi;
@@ -1247,7 +1248,7 @@ compat_fetch_robust_entry(compat_uptr_t
  */
 static void compat_exit_robust_list(struct task_struct *curr)
 {
-	struct compat_robust_list_head __user *head = curr->compat_robust_list;
+	struct compat_robust_list_head __user *head = curr->futex.compat_robust_list;
 	struct robust_list __user *entry, *next_entry, *pending;
 	unsigned int limit = ROBUST_LIST_LIMIT, pi, pip;
 	unsigned int next_pi;
@@ -1323,7 +1324,7 @@ static void compat_exit_robust_list(stru
  */
 static void exit_pi_state_list(struct task_struct *curr)
 {
-	struct list_head *next, *head = &curr->pi_state_list;
+	struct list_head *next, *head = &curr->futex.pi_state_list;
 	struct futex_pi_state *pi_state;
 	union futex_key key = FUTEX_KEY_INIT;
 
@@ -1407,19 +1408,19 @@ static inline void exit_pi_state_list(st
 
 static void futex_cleanup(struct task_struct *tsk)
 {
-	if (unlikely(tsk->robust_list)) {
+	if (unlikely(tsk->futex.robust_list)) {
 		exit_robust_list(tsk);
-		tsk->robust_list = NULL;
+		tsk->futex.robust_list = NULL;
 	}
 
 #ifdef CONFIG_COMPAT
-	if (unlikely(tsk->compat_robust_list)) {
+	if (unlikely(tsk->futex.compat_robust_list)) {
 		compat_exit_robust_list(tsk);
-		tsk->compat_robust_list = NULL;
+		tsk->futex.compat_robust_list = NULL;
 	}
 #endif
 
-	if (unlikely(!list_empty(&tsk->pi_state_list)))
+	if (unlikely(!list_empty(&tsk->futex.pi_state_list)))
 		exit_pi_state_list(tsk);
 }
 
@@ -1442,10 +1443,10 @@ static void futex_cleanup(struct task_st
  */
 void futex_exit_recursive(struct task_struct *tsk)
 {
-	/* If the state is FUTEX_STATE_EXITING then futex_exit_mutex is held */
-	if (tsk->futex_state == FUTEX_STATE_EXITING)
-		mutex_unlock(&tsk->futex_exit_mutex);
-	tsk->futex_state = FUTEX_STATE_DEAD;
+	/* If the state is FUTEX_STATE_EXITING then futex.exit_mutex is held */
+	if (tsk->futex.state == FUTEX_STATE_EXITING)
+		mutex_unlock(&tsk->futex.exit_mutex);
+	tsk->futex.state = FUTEX_STATE_DEAD;
 }
 
 static void futex_cleanup_begin(struct task_struct *tsk)
@@ -1453,10 +1454,10 @@ static void futex_cleanup_begin(struct t
 	/*
 	 * Prevent various race issues against a concurrent incoming waiter
 	 * including live locks by forcing the waiter to block on
-	 * tsk->futex_exit_mutex when it observes FUTEX_STATE_EXITING in
+	 * tsk->futex.exit_mutex when it observes FUTEX_STATE_EXITING in
 	 * attach_to_pi_owner().
 	 */
-	mutex_lock(&tsk->futex_exit_mutex);
+	mutex_lock(&tsk->futex.exit_mutex);
 
 	/*
 	 * Switch the state to FUTEX_STATE_EXITING under tsk->pi_lock.
@@ -1470,7 +1471,7 @@ static void futex_cleanup_begin(struct t
 	 * be observed in exit_pi_state_list().
 	 */
 	raw_spin_lock_irq(&tsk->pi_lock);
-	tsk->futex_state = FUTEX_STATE_EXITING;
+	tsk->futex.state = FUTEX_STATE_EXITING;
 	raw_spin_unlock_irq(&tsk->pi_lock);
 }
 
@@ -1480,12 +1481,12 @@ static void futex_cleanup_end(struct tas
 	 * Lockless store. The only side effect is that an observer might
 	 * take another loop until it becomes visible.
 	 */
-	tsk->futex_state = state;
+	tsk->futex.state = state;
 	/*
 	 * Drop the exit protection. This unblocks waiters which observed
 	 * FUTEX_STATE_EXITING to reevaluate the state.
 	 */
-	mutex_unlock(&tsk->futex_exit_mutex);
+	mutex_unlock(&tsk->futex.exit_mutex);
 }
 
 void futex_exec_release(struct task_struct *tsk)
--- a/kernel/futex/pi.c
+++ b/kernel/futex/pi.c
@@ -14,7 +14,7 @@ int refill_pi_state_cache(void)
 {
 	struct futex_pi_state *pi_state;
 
-	if (likely(current->pi_state_cache))
+	if (likely(current->futex.pi_state_cache))
 		return 0;
 
 	pi_state = kzalloc_obj(*pi_state);
@@ -28,17 +28,17 @@ int refill_pi_state_cache(void)
 	refcount_set(&pi_state->refcount, 1);
 	pi_state->key = FUTEX_KEY_INIT;
 
-	current->pi_state_cache = pi_state;
+	current->futex.pi_state_cache = pi_state;
 
 	return 0;
 }
 
 static struct futex_pi_state *alloc_pi_state(void)
 {
-	struct futex_pi_state *pi_state = current->pi_state_cache;
+	struct futex_pi_state *pi_state = current->futex.pi_state_cache;
 
 	WARN_ON(!pi_state);
-	current->pi_state_cache = NULL;
+	current->futex.pi_state_cache = NULL;
 
 	return pi_state;
 }
@@ -60,7 +60,7 @@ static void pi_state_update_owner(struct
 	if (new_owner) {
 		raw_spin_lock(&new_owner->pi_lock);
 		WARN_ON(!list_empty(&pi_state->list));
-		list_add(&pi_state->list, &new_owner->pi_state_list);
+		list_add(&pi_state->list, &new_owner->futex.pi_state_list);
 		pi_state->owner = new_owner;
 		raw_spin_unlock(&new_owner->pi_lock);
 	}
@@ -96,7 +96,7 @@ void put_pi_state(struct futex_pi_state
 		raw_spin_unlock_irqrestore(&pi_state->pi_mutex.wait_lock, flags);
 	}
 
-	if (current->pi_state_cache) {
+	if (current->futex.pi_state_cache) {
 		kfree(pi_state);
 	} else {
 		/*
@@ -106,7 +106,7 @@ void put_pi_state(struct futex_pi_state
 		 */
 		pi_state->owner = NULL;
 		refcount_set(&pi_state->refcount, 1);
-		current->pi_state_cache = pi_state;
+		current->futex.pi_state_cache = pi_state;
 	}
 }
 
@@ -179,7 +179,7 @@ void put_pi_state(struct futex_pi_state
  *
  * p->pi_lock:
  *
- *	p->pi_state_list -> pi_state->list, relation
+ *	p->futex.pi_state_list -> pi_state->list, relation
  *	pi_mutex->owner -> pi_state->owner, relation
  *
  * pi_state->refcount:
@@ -327,7 +327,7 @@ static int handle_exit_race(u32 __user *
 	 * If the futex exit state is not yet FUTEX_STATE_DEAD, tell the
 	 * caller that the alleged owner is busy.
 	 */
-	if (tsk && tsk->futex_state != FUTEX_STATE_DEAD)
+	if (tsk && tsk->futex.state != FUTEX_STATE_DEAD)
 		return -EBUSY;
 
 	/*
@@ -346,8 +346,8 @@ static int handle_exit_race(u32 __user *
 	 *    *uaddr = 0xC0000000;	     tsk = get_task(PID);
 	 *   }				     if (!tsk->flags & PF_EXITING) {
 	 *  ...				       attach();
-	 *  tsk->futex_state =               } else {
-	 *	FUTEX_STATE_DEAD;              if (tsk->futex_state !=
+	 *  tsk->futex.state =               } else {
+	 *	FUTEX_STATE_DEAD;              if (tsk->futex.state !=
 	 *					  FUTEX_STATE_DEAD)
 	 *				         return -EAGAIN;
 	 *				       return -ESRCH; <--- FAIL
@@ -395,7 +395,7 @@ static void __attach_to_pi_owner(struct
 	pi_state->key = *key;
 
 	WARN_ON(!list_empty(&pi_state->list));
-	list_add(&pi_state->list, &p->pi_state_list);
+	list_add(&pi_state->list, &p->futex.pi_state_list);
 	/*
 	 * Assignment without holding pi_state->pi_mutex.wait_lock is safe
 	 * because there is no concurrency as the object is not published yet.
@@ -439,7 +439,7 @@ static int attach_to_pi_owner(u32 __user
 	 * in futex_exit_release(), we do this protected by p->pi_lock:
 	 */
 	raw_spin_lock_irq(&p->pi_lock);
-	if (unlikely(p->futex_state != FUTEX_STATE_OK)) {
+	if (unlikely(p->futex.state != FUTEX_STATE_OK)) {
 		/*
 		 * The task is on the way out. When the futex state is
 		 * FUTEX_STATE_DEAD, we know that the task has finished
--- a/kernel/futex/syscalls.c
+++ b/kernel/futex/syscalls.c
@@ -25,17 +25,13 @@
  * @head:	pointer to the list-head
  * @len:	length of the list-head, as userspace expects
  */
-SYSCALL_DEFINE2(set_robust_list, struct robust_list_head __user *, head,
-		size_t, len)
+SYSCALL_DEFINE2(set_robust_list, struct robust_list_head __user *, head, size_t, len)
 {
-	/*
-	 * The kernel knows only one size for now:
-	 */
+	/* The kernel knows only one size for now. */
 	if (unlikely(len != sizeof(*head)))
 		return -EINVAL;
 
-	current->robust_list = head;
-
+	current->futex.robust_list = head;
 	return 0;
 }
 
@@ -43,9 +39,9 @@ static inline void __user *futex_task_ro
 {
 #ifdef CONFIG_COMPAT
 	if (compat)
-		return p->compat_robust_list;
+		return p->futex.compat_robust_list;
 #endif
-	return p->robust_list;
+	return p->futex.robust_list;
 }
 
 static void __user *futex_get_robust_list_common(int pid, bool compat)
@@ -467,15 +463,13 @@ SYSCALL_DEFINE4(futex_requeue,
 }
 
 #ifdef CONFIG_COMPAT
-COMPAT_SYSCALL_DEFINE2(set_robust_list,
-		struct compat_robust_list_head __user *, head,
-		compat_size_t, len)
+COMPAT_SYSCALL_DEFINE2(set_robust_list, struct compat_robust_list_head __user *, head,
+		       compat_size_t, len)
 {
 	if (unlikely(len != sizeof(*head)))
 		return -EINVAL;
 
-	current->compat_robust_list = head;
-
+	current->futex.compat_robust_list = head;
 	return 0;
 }
 
@@ -515,4 +509,3 @@ SYSCALL_DEFINE6(futex_time32, u32 __user
 	return do_futex(uaddr, op, val, tp, uaddr2, (unsigned long)utime, val3);
 }
 #endif /* CONFIG_COMPAT_32BIT_TIME */
-


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [patch v2 02/11] futex: Move futex related mm_struct data into a struct
  2026-03-19 23:24 [patch v2 00/11] futex: Address the robust futex unlock race for real Thomas Gleixner
  2026-03-19 23:24 ` [patch v2 01/11] futex: Move futex task related data into a struct Thomas Gleixner
@ 2026-03-19 23:24 ` Thomas Gleixner
  2026-03-20 15:00   ` André Almeida
  2026-03-19 23:24 ` [patch v2 03/11] futex: Provide UABI defines for robust list entry modifiers Thomas Gleixner
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 35+ messages in thread
From: Thomas Gleixner @ 2026-03-19 23:24 UTC (permalink / raw)
  To: LKML
  Cc: Mathieu Desnoyers, André Almeida, Sebastian Andrzej Siewior,
	Carlos O'Donell, Peter Zijlstra, Florian Weimer, Rich Felker,
	Torvald Riegel, Darren Hart, Ingo Molnar, Davidlohr Bueso,
	Arnd Bergmann, Liam R . Howlett, Uros Bizjak,
	Thomas Weißschuh

Having all these members in mm_struct along with the required #ifdeffery is
annoying, prevents efficient initialization of the data with memset()
and makes extending it tedious.

Move it into a data structure and fix up all usage sites.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
---
 include/linux/futex_types.h |   24 ++++++++
 include/linux/mm_types.h    |   11 ---
 kernel/futex/core.c         |  123 ++++++++++++++++++++------------------------
 3 files changed, 82 insertions(+), 76 deletions(-)

--- a/include/linux/futex_types.h
+++ b/include/linux/futex_types.h
@@ -29,8 +29,32 @@ struct futex_sched_data {
 	struct mutex				exit_mutex;
 	unsigned int				state;
 };
+
+/**
+ * struct futex_mm_data - Futex related per MM data
+ * @phash_lock:			Mutex to protect the private hash operations
+ * @phash:			RCU managed pointer to the private hash
+ * @phash_new:			Pointer to a newly allocated private hash
+ * @phash_batches:		Batch state for RCU synchronization
+ * @phash_rcu:			RCU head for call_rcu()
+ * @phash_atomic:		Aggregate value for @phash_ref
+ * @phash_ref:			Per CPU reference counter for a private hash
+ */
+struct futex_mm_data {
+#ifdef CONFIG_FUTEX_PRIVATE_HASH
+	struct mutex			phash_lock;
+	struct futex_private_hash	__rcu *phash;
+	struct futex_private_hash	*phash_new;
+	unsigned long			phash_batches;
+	struct rcu_head			phash_rcu;
+	atomic_long_t			phash_atomic;
+	unsigned int			__percpu *phash_ref;
+#endif
+};
+
 #else
 struct futex_sched_data { };
+struct futex_mm_data { };
 #endif /* !CONFIG_FUTEX */
 
 #endif /* _LINUX_FUTEX_TYPES_H */
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -1221,16 +1221,7 @@ struct mm_struct {
 		 */
 		seqcount_t mm_lock_seq;
 #endif
-#ifdef CONFIG_FUTEX_PRIVATE_HASH
-		struct mutex			futex_hash_lock;
-		struct futex_private_hash	__rcu *futex_phash;
-		struct futex_private_hash	*futex_phash_new;
-		/* futex-ref */
-		unsigned long			futex_batches;
-		struct rcu_head			futex_rcu;
-		atomic_long_t			futex_atomic;
-		unsigned int			__percpu *futex_ref;
-#endif
+		struct futex_mm_data	futex;
 
 		unsigned long hiwater_rss; /* High-watermark of RSS usage */
 		unsigned long hiwater_vm;  /* High-water virtual memory usage */
--- a/kernel/futex/core.c
+++ b/kernel/futex/core.c
@@ -188,13 +188,13 @@ static struct futex_hash_bucket *
 		return NULL;
 
 	if (!fph)
-		fph = rcu_dereference(key->private.mm->futex_phash);
+		fph = rcu_dereference(key->private.mm->futex.phash);
 	if (!fph || !fph->hash_mask)
 		return NULL;
 
-	hash = jhash2((void *)&key->private.address,
-		      sizeof(key->private.address) / 4,
+	hash = jhash2((void *)&key->private.address, sizeof(key->private.address) / 4,
 		      key->both.offset);
+
 	return &fph->queues[hash & fph->hash_mask];
 }
 
@@ -238,13 +238,12 @@ static bool __futex_pivot_hash(struct mm
 {
 	struct futex_private_hash *fph;
 
-	WARN_ON_ONCE(mm->futex_phash_new);
+	WARN_ON_ONCE(mm->futex.phash_new);
 
-	fph = rcu_dereference_protected(mm->futex_phash,
-					lockdep_is_held(&mm->futex_hash_lock));
+	fph = rcu_dereference_protected(mm->futex.phash, lockdep_is_held(&mm->futex.phash_lock));
 	if (fph) {
 		if (!futex_ref_is_dead(fph)) {
-			mm->futex_phash_new = new;
+			mm->futex.phash_new = new;
 			return false;
 		}
 
@@ -252,8 +251,8 @@ static bool __futex_pivot_hash(struct mm
 	}
 	new->state = FR_PERCPU;
 	scoped_guard(rcu) {
-		mm->futex_batches = get_state_synchronize_rcu();
-		rcu_assign_pointer(mm->futex_phash, new);
+		mm->futex.phash_batches = get_state_synchronize_rcu();
+		rcu_assign_pointer(mm->futex.phash, new);
 	}
 	kvfree_rcu(fph, rcu);
 	return true;
@@ -261,12 +260,12 @@ static bool __futex_pivot_hash(struct mm
 
 static void futex_pivot_hash(struct mm_struct *mm)
 {
-	scoped_guard(mutex, &mm->futex_hash_lock) {
+	scoped_guard(mutex, &mm->futex.phash_lock) {
 		struct futex_private_hash *fph;
 
-		fph = mm->futex_phash_new;
+		fph = mm->futex.phash_new;
 		if (fph) {
-			mm->futex_phash_new = NULL;
+			mm->futex.phash_new = NULL;
 			__futex_pivot_hash(mm, fph);
 		}
 	}
@@ -289,7 +288,7 @@ struct futex_private_hash *futex_private
 	scoped_guard(rcu) {
 		struct futex_private_hash *fph;
 
-		fph = rcu_dereference(mm->futex_phash);
+		fph = rcu_dereference(mm->futex.phash);
 		if (!fph)
 			return NULL;
 
@@ -412,8 +411,7 @@ static int futex_mpol(struct mm_struct *
  * private hash) is returned if existing. Otherwise a hash bucket from the
  * global hash is returned.
  */
-static struct futex_hash_bucket *
-__futex_hash(union futex_key *key, struct futex_private_hash *fph)
+static struct futex_hash_bucket *__futex_hash(union futex_key *key, struct futex_private_hash *fph)
 {
 	int node = key->both.node;
 	u32 hash;
@@ -426,8 +424,7 @@ static struct futex_hash_bucket *
 			return hb;
 	}
 
-	hash = jhash2((u32 *)key,
-		      offsetof(typeof(*key), both.offset) / sizeof(u32),
+	hash = jhash2((u32 *)key, offsetof(typeof(*key), both.offset) / sizeof(u32),
 		      key->both.offset);
 
 	if (node == FUTEX_NO_NODE) {
@@ -442,8 +439,7 @@ static struct futex_hash_bucket *
 		 */
 		node = (hash >> futex_hashshift) % nr_node_ids;
 		if (!node_possible(node)) {
-			node = find_next_bit_wrap(node_possible_map.bits,
-						  nr_node_ids, node);
+			node = find_next_bit_wrap(node_possible_map.bits, nr_node_ids, node);
 		}
 	}
 
@@ -460,9 +456,8 @@ static struct futex_hash_bucket *
  * Return: Initialized hrtimer_sleeper structure or NULL if no timeout
  *	   value given
  */
-struct hrtimer_sleeper *
-futex_setup_timer(ktime_t *time, struct hrtimer_sleeper *timeout,
-		  int flags, u64 range_ns)
+struct hrtimer_sleeper *futex_setup_timer(ktime_t *time, struct hrtimer_sleeper *timeout,
+					  int flags, u64 range_ns)
 {
 	if (!time)
 		return NULL;
@@ -1551,17 +1546,17 @@ static void __futex_ref_atomic_begin(str
 	 * otherwise it would be impossible for it to have reported success
 	 * from futex_ref_is_dead().
 	 */
-	WARN_ON_ONCE(atomic_long_read(&mm->futex_atomic) != 0);
+	WARN_ON_ONCE(atomic_long_read(&mm->futex.phash_atomic) != 0);
 
 	/*
 	 * Set the atomic to the bias value such that futex_ref_{get,put}()
 	 * will never observe 0. Will be fixed up in __futex_ref_atomic_end()
 	 * when folding in the percpu count.
 	 */
-	atomic_long_set(&mm->futex_atomic, LONG_MAX);
+	atomic_long_set(&mm->futex.phash_atomic, LONG_MAX);
 	smp_store_release(&fph->state, FR_ATOMIC);
 
-	call_rcu_hurry(&mm->futex_rcu, futex_ref_rcu);
+	call_rcu_hurry(&mm->futex.phash_rcu, futex_ref_rcu);
 }
 
 static void __futex_ref_atomic_end(struct futex_private_hash *fph)
@@ -1582,7 +1577,7 @@ static void __futex_ref_atomic_end(struc
 	 * Therefore the per-cpu counter is now stable, sum and reset.
 	 */
 	for_each_possible_cpu(cpu) {
-		unsigned int *ptr = per_cpu_ptr(mm->futex_ref, cpu);
+		unsigned int *ptr = per_cpu_ptr(mm->futex.phash_ref, cpu);
 		count += *ptr;
 		*ptr = 0;
 	}
@@ -1590,7 +1585,7 @@ static void __futex_ref_atomic_end(struc
 	/*
 	 * Re-init for the next cycle.
 	 */
-	this_cpu_inc(*mm->futex_ref); /* 0 -> 1 */
+	this_cpu_inc(*mm->futex.phash_ref); /* 0 -> 1 */
 
 	/*
 	 * Add actual count, subtract bias and initial refcount.
@@ -1598,7 +1593,7 @@ static void __futex_ref_atomic_end(struc
 	 * The moment this atomic operation happens, futex_ref_is_dead() can
 	 * become true.
 	 */
-	ret = atomic_long_add_return(count - LONG_MAX - 1, &mm->futex_atomic);
+	ret = atomic_long_add_return(count - LONG_MAX - 1, &mm->futex.phash_atomic);
 	if (!ret)
 		wake_up_var(mm);
 
@@ -1608,8 +1603,8 @@ static void __futex_ref_atomic_end(struc
 
 static void futex_ref_rcu(struct rcu_head *head)
 {
-	struct mm_struct *mm = container_of(head, struct mm_struct, futex_rcu);
-	struct futex_private_hash *fph = rcu_dereference_raw(mm->futex_phash);
+	struct mm_struct *mm = container_of(head, struct mm_struct, futex.phash_rcu);
+	struct futex_private_hash *fph = rcu_dereference_raw(mm->futex.phash);
 
 	if (fph->state == FR_PERCPU) {
 		/*
@@ -1638,7 +1633,7 @@ static void futex_ref_drop(struct futex_
 	/*
 	 * Can only transition the current fph;
 	 */
-	WARN_ON_ONCE(rcu_dereference_raw(mm->futex_phash) != fph);
+	WARN_ON_ONCE(rcu_dereference_raw(mm->futex.phash) != fph);
 	/*
 	 * We enqueue at least one RCU callback. Ensure mm stays if the task
 	 * exits before the transition is completed.
@@ -1650,8 +1645,8 @@ static void futex_ref_drop(struct futex_
 	 *
 	 * futex_hash()			__futex_pivot_hash()
 	 *   guard(rcu);		  guard(mm->futex_hash_lock);
-	 *   fph = mm->futex_phash;
-	 *				  rcu_assign_pointer(&mm->futex_phash, new);
+	 *   fph = mm->futex.phash;
+	 *				  rcu_assign_pointer(&mm->futex.phash, new);
 	 *				futex_hash_allocate()
 	 *				  futex_ref_drop()
 	 *				    fph->state = FR_ATOMIC;
@@ -1666,7 +1661,7 @@ static void futex_ref_drop(struct futex_
 	 * There must be at least one full grace-period between publishing a
 	 * new fph and trying to replace it.
 	 */
-	if (poll_state_synchronize_rcu(mm->futex_batches)) {
+	if (poll_state_synchronize_rcu(mm->futex.phash_batches)) {
 		/*
 		 * There was a grace-period, we can begin now.
 		 */
@@ -1674,7 +1669,7 @@ static void futex_ref_drop(struct futex_
 		return;
 	}
 
-	call_rcu_hurry(&mm->futex_rcu, futex_ref_rcu);
+	call_rcu_hurry(&mm->futex.phash_rcu, futex_ref_rcu);
 }
 
 static bool futex_ref_get(struct futex_private_hash *fph)
@@ -1684,11 +1679,11 @@ static bool futex_ref_get(struct futex_p
 	guard(preempt)();
 
 	if (READ_ONCE(fph->state) == FR_PERCPU) {
-		__this_cpu_inc(*mm->futex_ref);
+		__this_cpu_inc(*mm->futex.phash_ref);
 		return true;
 	}
 
-	return atomic_long_inc_not_zero(&mm->futex_atomic);
+	return atomic_long_inc_not_zero(&mm->futex.phash_atomic);
 }
 
 static bool futex_ref_put(struct futex_private_hash *fph)
@@ -1698,11 +1693,11 @@ static bool futex_ref_put(struct futex_p
 	guard(preempt)();
 
 	if (READ_ONCE(fph->state) == FR_PERCPU) {
-		__this_cpu_dec(*mm->futex_ref);
+		__this_cpu_dec(*mm->futex.phash_ref);
 		return false;
 	}
 
-	return atomic_long_dec_and_test(&mm->futex_atomic);
+	return atomic_long_dec_and_test(&mm->futex.phash_atomic);
 }
 
 static bool futex_ref_is_dead(struct futex_private_hash *fph)
@@ -1714,18 +1709,14 @@ static bool futex_ref_is_dead(struct fut
 	if (smp_load_acquire(&fph->state) == FR_PERCPU)
 		return false;
 
-	return atomic_long_read(&mm->futex_atomic) == 0;
+	return atomic_long_read(&mm->futex.phash_atomic) == 0;
 }
 
 int futex_mm_init(struct mm_struct *mm)
 {
-	mutex_init(&mm->futex_hash_lock);
-	RCU_INIT_POINTER(mm->futex_phash, NULL);
-	mm->futex_phash_new = NULL;
-	/* futex-ref */
-	mm->futex_ref = NULL;
-	atomic_long_set(&mm->futex_atomic, 0);
-	mm->futex_batches = get_state_synchronize_rcu();
+	memset(&mm->futex, 0, sizeof(mm->futex));
+	mutex_init(&mm->futex.phash_lock);
+	mm->futex.phash_batches = get_state_synchronize_rcu();
 	return 0;
 }
 
@@ -1733,9 +1724,9 @@ void futex_hash_free(struct mm_struct *m
 {
 	struct futex_private_hash *fph;
 
-	free_percpu(mm->futex_ref);
-	kvfree(mm->futex_phash_new);
-	fph = rcu_dereference_raw(mm->futex_phash);
+	free_percpu(mm->futex.phash_ref);
+	kvfree(mm->futex.phash_new);
+	fph = rcu_dereference_raw(mm->futex.phash);
 	if (fph)
 		kvfree(fph);
 }
@@ -1746,10 +1737,10 @@ static bool futex_pivot_pending(struct m
 
 	guard(rcu)();
 
-	if (!mm->futex_phash_new)
+	if (!mm->futex.phash_new)
 		return true;
 
-	fph = rcu_dereference(mm->futex_phash);
+	fph = rcu_dereference(mm->futex.phash);
 	return futex_ref_is_dead(fph);
 }
 
@@ -1791,7 +1782,7 @@ static int futex_hash_allocate(unsigned
 	 * Once we've disabled the global hash there is no way back.
 	 */
 	scoped_guard(rcu) {
-		fph = rcu_dereference(mm->futex_phash);
+		fph = rcu_dereference(mm->futex.phash);
 		if (fph && !fph->hash_mask) {
 			if (custom)
 				return -EBUSY;
@@ -1799,15 +1790,15 @@ static int futex_hash_allocate(unsigned
 		}
 	}
 
-	if (!mm->futex_ref) {
+	if (!mm->futex.phash_ref) {
 		/*
 		 * This will always be allocated by the first thread and
 		 * therefore requires no locking.
 		 */
-		mm->futex_ref = alloc_percpu(unsigned int);
-		if (!mm->futex_ref)
+		mm->futex.phash_ref = alloc_percpu(unsigned int);
+		if (!mm->futex.phash_ref)
 			return -ENOMEM;
-		this_cpu_inc(*mm->futex_ref); /* 0 -> 1 */
+		this_cpu_inc(*mm->futex.phash_ref); /* 0 -> 1 */
 	}
 
 	fph = kvzalloc(struct_size(fph, queues, hash_slots),
@@ -1830,14 +1821,14 @@ static int futex_hash_allocate(unsigned
 		wait_var_event(mm, futex_pivot_pending(mm));
 	}
 
-	scoped_guard(mutex, &mm->futex_hash_lock) {
+	scoped_guard(mutex, &mm->futex.phash_lock) {
 		struct futex_private_hash *free __free(kvfree) = NULL;
 		struct futex_private_hash *cur, *new;
 
-		cur = rcu_dereference_protected(mm->futex_phash,
-						lockdep_is_held(&mm->futex_hash_lock));
-		new = mm->futex_phash_new;
-		mm->futex_phash_new = NULL;
+		cur = rcu_dereference_protected(mm->futex.phash,
+						lockdep_is_held(&mm->futex.phash_lock));
+		new = mm->futex.phash_new;
+		mm->futex.phash_new = NULL;
 
 		if (fph) {
 			if (cur && !cur->hash_mask) {
@@ -1847,7 +1838,7 @@ static int futex_hash_allocate(unsigned
 				 * the second one returns here.
 				 */
 				free = fph;
-				mm->futex_phash_new = new;
+				mm->futex.phash_new = new;
 				return -EBUSY;
 			}
 			if (cur && !new) {
@@ -1877,7 +1868,7 @@ static int futex_hash_allocate(unsigned
 
 		if (new) {
 			/*
-			 * Will set mm->futex_phash_new on failure;
+			 * Will set mm->futex.phash_new on failure;
 			 * futex_private_hash_get() will try again.
 			 */
 			if (!__futex_pivot_hash(mm, new) && custom)
@@ -1900,7 +1891,7 @@ int futex_hash_allocate_default(void)
 				get_nr_threads(current),
 				num_online_cpus());
 
-		fph = rcu_dereference(current->mm->futex_phash);
+		fph = rcu_dereference(current->mm->futex.phash);
 		if (fph) {
 			if (fph->custom)
 				return 0;
@@ -1927,7 +1918,7 @@ static int futex_hash_get_slots(void)
 	struct futex_private_hash *fph;
 
 	guard(rcu)();
-	fph = rcu_dereference(current->mm->futex_phash);
+	fph = rcu_dereference(current->mm->futex.phash);
 	if (fph && fph->hash_mask)
 		return fph->hash_mask + 1;
 	return 0;


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [patch v2 03/11] futex: Provide UABI defines for robust list entry modifiers
  2026-03-19 23:24 [patch v2 00/11] futex: Address the robust futex unlock race for real Thomas Gleixner
  2026-03-19 23:24 ` [patch v2 01/11] futex: Move futex task related data into a struct Thomas Gleixner
  2026-03-19 23:24 ` [patch v2 02/11] futex: Move futex related mm_struct " Thomas Gleixner
@ 2026-03-19 23:24 ` Thomas Gleixner
  2026-03-20 15:01   ` André Almeida
  2026-03-19 23:24 ` [patch v2 04/11] uaccess: Provide unsafe_atomic_store_release_user() Thomas Gleixner
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 35+ messages in thread
From: Thomas Gleixner @ 2026-03-19 23:24 UTC (permalink / raw)
  To: LKML
  Cc: Mathieu Desnoyers, André Almeida, Sebastian Andrzej Siewior,
	Carlos O'Donell, Peter Zijlstra, Florian Weimer, Rich Felker,
	Torvald Riegel, Darren Hart, Ingo Molnar, Davidlohr Bueso,
	Arnd Bergmann, Liam R . Howlett, Uros Bizjak,
	Thomas Weißschuh

The marker for PI futexes in the robust list is a hardcoded 0x1 which lacks
any sensible form of documentation.

Provide proper defines for the bit and the mask and fix up the usage
sites. Thereby convert the boolean pi argument into a modifier argument,
which allows new modifier bits to be trivially added and conveyed.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
---
 include/uapi/linux/futex.h |    4 +++
 kernel/futex/core.c        |   53 +++++++++++++++++++++------------------------
 2 files changed, 29 insertions(+), 28 deletions(-)

--- a/include/uapi/linux/futex.h
+++ b/include/uapi/linux/futex.h
@@ -177,6 +177,10 @@ struct robust_list_head {
  */
 #define ROBUST_LIST_LIMIT	2048
 
+/* Modifiers for robust_list_head::list_op_pending */
+#define FUTEX_ROBUST_MOD_PI		(0x1UL)
+#define FUTEX_ROBUST_MOD_MASK		(FUTEX_ROBUST_MOD_PI)
+
 /*
  * bitset with all bits set for the FUTEX_xxx_BITSET OPs to request a
  * match of any bit.
--- a/kernel/futex/core.c
+++ b/kernel/futex/core.c
@@ -1009,8 +1009,9 @@ void futex_unqueue_pi(struct futex_q *q)
  * dying task, and do notification if so:
  */
 static int handle_futex_death(u32 __user *uaddr, struct task_struct *curr,
-			      bool pi, bool pending_op)
+			      unsigned int mod, bool pending_op)
 {
+	bool pi = !!(mod & FUTEX_ROBUST_MOD_PI);
 	u32 uval, nval, mval;
 	pid_t owner;
 	int err;
@@ -1128,21 +1129,21 @@ static int handle_futex_death(u32 __user
  */
 static inline int fetch_robust_entry(struct robust_list __user **entry,
 				     struct robust_list __user * __user *head,
-				     unsigned int *pi)
+				     unsigned int *mod)
 {
 	unsigned long uentry;
 
 	if (get_user(uentry, (unsigned long __user *)head))
 		return -EFAULT;
 
-	*entry = (void __user *)(uentry & ~1UL);
-	*pi = uentry & 1;
+	*entry = (void __user *)(uentry & ~FUTEX_ROBUST_MOD_MASK);
+	*mod = uentry & FUTEX_ROBUST_MOD_MASK;
 
 	return 0;
 }
 
 /*
- * Walk curr->robust_list (very carefully, it's a userspace list!)
+ * Walk curr->futex.robust_list (very carefully, it's a userspace list!)
  * and mark any locks found there dead, and notify any waiters.
  *
  * We silently return on any sign of list-walking problem.
@@ -1150,9 +1151,8 @@ static inline int fetch_robust_entry(str
 static void exit_robust_list(struct task_struct *curr)
 {
 	struct robust_list_head __user *head = curr->futex.robust_list;
+	unsigned int limit = ROBUST_LIST_LIMIT, cur_mod, next_mod, pend_mod;
 	struct robust_list __user *entry, *next_entry, *pending;
-	unsigned int limit = ROBUST_LIST_LIMIT, pi, pip;
-	unsigned int next_pi;
 	unsigned long futex_offset;
 	int rc;
 
@@ -1160,7 +1160,7 @@ static void exit_robust_list(struct task
 	 * Fetch the list head (which was registered earlier, via
 	 * sys_set_robust_list()):
 	 */
-	if (fetch_robust_entry(&entry, &head->list.next, &pi))
+	if (fetch_robust_entry(&entry, &head->list.next, &cur_mod))
 		return;
 	/*
 	 * Fetch the relative futex offset:
@@ -1171,7 +1171,7 @@ static void exit_robust_list(struct task
 	 * Fetch any possibly pending lock-add first, and handle it
 	 * if it exists:
 	 */
-	if (fetch_robust_entry(&pending, &head->list_op_pending, &pip))
+	if (fetch_robust_entry(&pending, &head->list_op_pending, &pend_mod))
 		return;
 
 	next_entry = NULL;	/* avoid warning with gcc */
@@ -1180,20 +1180,20 @@ static void exit_robust_list(struct task
 		 * Fetch the next entry in the list before calling
 		 * handle_futex_death:
 		 */
-		rc = fetch_robust_entry(&next_entry, &entry->next, &next_pi);
+		rc = fetch_robust_entry(&next_entry, &entry->next, &next_mod);
 		/*
 		 * A pending lock might already be on the list, so
 		 * don't process it twice:
 		 */
 		if (entry != pending) {
 			if (handle_futex_death((void __user *)entry + futex_offset,
-						curr, pi, HANDLE_DEATH_LIST))
+						curr, cur_mod, HANDLE_DEATH_LIST))
 				return;
 		}
 		if (rc)
 			return;
 		entry = next_entry;
-		pi = next_pi;
+		cur_mod = next_mod;
 		/*
 		 * Avoid excessively long or circular lists:
 		 */
@@ -1205,7 +1205,7 @@ static void exit_robust_list(struct task
 
 	if (pending) {
 		handle_futex_death((void __user *)pending + futex_offset,
-				   curr, pip, HANDLE_DEATH_PENDING);
+				   curr, pend_mod, HANDLE_DEATH_PENDING);
 	}
 }
 
@@ -1224,29 +1224,28 @@ static void __user *futex_uaddr(struct r
  */
 static inline int
 compat_fetch_robust_entry(compat_uptr_t *uentry, struct robust_list __user **entry,
-		   compat_uptr_t __user *head, unsigned int *pi)
+		   compat_uptr_t __user *head, unsigned int *pflags)
 {
 	if (get_user(*uentry, head))
 		return -EFAULT;
 
-	*entry = compat_ptr((*uentry) & ~1);
-	*pi = (unsigned int)(*uentry) & 1;
+	*entry = compat_ptr((*uentry) & ~FUTEX_ROBUST_MOD_MASK);
+	*pflags = (unsigned int)(*uentry) & FUTEX_ROBUST_MOD_MASK;
 
 	return 0;
 }
 
 /*
- * Walk curr->robust_list (very carefully, it's a userspace list!)
+ * Walk curr->futex.robust_list (very carefully, it's a userspace list!)
  * and mark any locks found there dead, and notify any waiters.
  *
  * We silently return on any sign of list-walking problem.
  */
 static void compat_exit_robust_list(struct task_struct *curr)
 {
-	struct compat_robust_list_head __user *head = curr->futex.compat_robust_list;
+	struct compat_robust_list_head __user *head = curr->futex.compat_robust_list;
+	unsigned int limit = ROBUST_LIST_LIMIT, cur_mod, next_mod, pend_mod;
 	struct robust_list __user *entry, *next_entry, *pending;
-	unsigned int limit = ROBUST_LIST_LIMIT, pi, pip;
-	unsigned int next_pi;
 	compat_uptr_t uentry, next_uentry, upending;
 	compat_long_t futex_offset;
 	int rc;
@@ -1255,7 +1254,7 @@ static void compat_exit_robust_list(stru
 	 * Fetch the list head (which was registered earlier, via
 	 * sys_set_robust_list()):
 	 */
-	if (compat_fetch_robust_entry(&uentry, &entry, &head->list.next, &pi))
+	if (compat_fetch_robust_entry(&uentry, &entry, &head->list.next, &cur_mod))
 		return;
 	/*
 	 * Fetch the relative futex offset:
@@ -1266,8 +1265,7 @@ static void compat_exit_robust_list(stru
 	 * Fetch any possibly pending lock-add first, and handle it
 	 * if it exists:
 	 */
-	if (compat_fetch_robust_entry(&upending, &pending,
-			       &head->list_op_pending, &pip))
+	if (compat_fetch_robust_entry(&upending, &pending, &head->list_op_pending, &pend_mod))
 		return;
 
 	next_entry = NULL;	/* avoid warning with gcc */
@@ -1277,7 +1275,7 @@ static void compat_exit_robust_list(stru
 		 * handle_futex_death:
 		 */
 		rc = compat_fetch_robust_entry(&next_uentry, &next_entry,
-			(compat_uptr_t __user *)&entry->next, &next_pi);
+			(compat_uptr_t __user *)&entry->next, &next_mod);
 		/*
 		 * A pending lock might already be on the list, so
 		 * dont process it twice:
@@ -1285,15 +1283,14 @@ static void compat_exit_robust_list(stru
 		if (entry != pending) {
 			void __user *uaddr = futex_uaddr(entry, futex_offset);
 
-			if (handle_futex_death(uaddr, curr, pi,
-					       HANDLE_DEATH_LIST))
+			if (handle_futex_death(uaddr, curr, cur_mod, HANDLE_DEATH_LIST))
 				return;
 		}
 		if (rc)
 			return;
 		uentry = next_uentry;
 		entry = next_entry;
-		pi = next_pi;
+		cur_mod = next_mod;
 		/*
 		 * Avoid excessively long or circular lists:
 		 */
@@ -1305,7 +1302,7 @@ static void compat_exit_robust_list(stru
 	if (pending) {
 		void __user *uaddr = futex_uaddr(pending, futex_offset);
 
-		handle_futex_death(uaddr, curr, pip, HANDLE_DEATH_PENDING);
+		handle_futex_death(uaddr, curr, pend_mod, HANDLE_DEATH_PENDING);
 	}
 }
 #endif


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [patch v2 04/11] uaccess: Provide unsafe_atomic_store_release_user()
  2026-03-19 23:24 [patch v2 00/11] futex: Address the robust futex unlock race for real Thomas Gleixner
                   ` (2 preceding siblings ...)
  2026-03-19 23:24 ` [patch v2 03/11] futex: Provide UABI defines for robust list entry modifiers Thomas Gleixner
@ 2026-03-19 23:24 ` Thomas Gleixner
  2026-03-20  9:11   ` Peter Zijlstra
  2026-03-20 16:07   ` André Almeida
  2026-03-19 23:24 ` [patch v2 05/11] x86: Select ARCH_STORE_IMPLIES_RELEASE Thomas Gleixner
                   ` (7 subsequent siblings)
  11 siblings, 2 replies; 35+ messages in thread
From: Thomas Gleixner @ 2026-03-19 23:24 UTC (permalink / raw)
  To: LKML
  Cc: Mathieu Desnoyers, André Almeida, Sebastian Andrzej Siewior,
	Carlos O'Donell, Peter Zijlstra, Florian Weimer, Rich Felker,
	Torvald Riegel, Darren Hart, Ingo Molnar, Davidlohr Bueso,
	Arnd Bergmann, Liam R . Howlett, Uros Bizjak,
	Thomas Weißschuh

The upcoming support for unlocking robust futexes in the kernel requires
store release semantics. Syscalls do not imply memory ordering on all
architectures, so the unlock operation requires a barrier.

This barrier can be avoided when stores imply release like on x86.

Provide a generic version with a smp_mb() before the unsafe_put_user(),
which can be overridden by architectures.

Also provide an ARCH_STORE_IMPLIES_RELEASE Kconfig option, which can be
selected by architectures where stores imply release semantics, so that the
smp_mb() in the generic implementation can be avoided.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
---
V2: New patch
---
 arch/Kconfig            |    4 ++++
 include/linux/uaccess.h |    9 +++++++++
 2 files changed, 13 insertions(+)

--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -403,6 +403,10 @@ config ARCH_32BIT_OFF_T
 config ARCH_32BIT_USTAT_F_TINODE
 	bool
 
+# Selected by architectures when plain stores have release semantics
+config ARCH_STORE_IMPLIES_RELEASE
+	bool
+
 config HAVE_ASM_MODVERSIONS
 	bool
 	help
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -644,6 +644,15 @@ static inline void user_access_restore(u
 #define user_read_access_end user_access_end
 #endif
 
+#ifndef unsafe_atomic_store_release_user
+# define unsafe_atomic_store_release_user(val, uptr, elbl)		\
+	do {								\
+		if (!IS_ENABLED(CONFIG_ARCH_STORE_IMPLIES_RELEASE))	\
+			smp_mb();					\
+		unsafe_put_user(val, uptr, elbl);			\
+	} while (0)
+#endif
+
 /* Define RW variant so the below _mode macro expansion works */
 #define masked_user_rw_access_begin(u)	masked_user_access_begin(u)
 #define user_rw_access_begin(u, s)	user_access_begin(u, s)


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [patch v2 05/11] x86: Select ARCH_STORE_IMPLIES_RELEASE
  2026-03-19 23:24 [patch v2 00/11] futex: Address the robust futex unlock race for real Thomas Gleixner
                   ` (3 preceding siblings ...)
  2026-03-19 23:24 ` [patch v2 04/11] uaccess: Provide unsafe_atomic_store_release_user() Thomas Gleixner
@ 2026-03-19 23:24 ` Thomas Gleixner
  2026-03-20 16:08   ` André Almeida
  2026-03-19 23:24 ` [patch v2 06/11] futex: Cleanup UAPI defines Thomas Gleixner
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 35+ messages in thread
From: Thomas Gleixner @ 2026-03-19 23:24 UTC (permalink / raw)
  To: LKML
  Cc: Mathieu Desnoyers, André Almeida, Sebastian Andrzej Siewior,
	Carlos O'Donell, Peter Zijlstra, Florian Weimer, Rich Felker,
	Torvald Riegel, Darren Hart, Ingo Molnar, Davidlohr Bueso,
	Arnd Bergmann, Liam R . Howlett, Uros Bizjak,
	Thomas Weißschuh

The generic unsafe_atomic_store_release_user() implementation does:

    if (!IS_ENABLED(CONFIG_ARCH_STORE_IMPLIES_RELEASE))
        smp_mb();
    unsafe_put_user();

As stores on x86 imply release, select ARCH_STORE_IMPLIES_RELEASE to avoid
the unnecessary smp_mb().

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
---
V2: New patch
---
 arch/x86/Kconfig |    1 +
 1 file changed, 1 insertion(+)

--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -119,6 +119,7 @@ config X86
 	select ARCH_MIGHT_HAVE_PC_PARPORT
 	select ARCH_MIGHT_HAVE_PC_SERIO
 	select ARCH_STACKWALK
+	select ARCH_STORE_IMPLIES_RELEASE
 	select ARCH_SUPPORTS_ACPI
 	select ARCH_SUPPORTS_ATOMIC_RMW
 	select ARCH_SUPPORTS_DEBUG_PAGEALLOC


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [patch v2 06/11] futex: Cleanup UAPI defines
  2026-03-19 23:24 [patch v2 00/11] futex: Address the robust futex unlock race for real Thomas Gleixner
                   ` (4 preceding siblings ...)
  2026-03-19 23:24 ` [patch v2 05/11] x86: Select ARCH_STORE_IMPLIES_RELEASE Thomas Gleixner
@ 2026-03-19 23:24 ` Thomas Gleixner
  2026-03-20 16:09   ` André Almeida
  2026-03-19 23:24 ` [patch v2 07/11] futex: Add support for unlocking robust futexes Thomas Gleixner
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 35+ messages in thread
From: Thomas Gleixner @ 2026-03-19 23:24 UTC (permalink / raw)
  To: LKML
  Cc: Mathieu Desnoyers, André Almeida, Sebastian Andrzej Siewior,
	Carlos O'Donell, Peter Zijlstra, Florian Weimer, Rich Felker,
	Torvald Riegel, Darren Hart, Ingo Molnar, Davidlohr Bueso,
	Arnd Bergmann, Liam R . Howlett, Uros Bizjak,
	Thomas Weißschuh

Make the operand defines tabular for readability's sake.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
---
V2: New patch
---
 include/uapi/linux/futex.h |   27 +++++++++++++--------------
 1 file changed, 13 insertions(+), 14 deletions(-)

--- a/include/uapi/linux/futex.h
+++ b/include/uapi/linux/futex.h
@@ -25,23 +25,22 @@
 
 #define FUTEX_PRIVATE_FLAG	128
 #define FUTEX_CLOCK_REALTIME	256
-#define FUTEX_CMD_MASK		~(FUTEX_PRIVATE_FLAG | FUTEX_CLOCK_REALTIME)
 
-#define FUTEX_WAIT_PRIVATE	(FUTEX_WAIT | FUTEX_PRIVATE_FLAG)
-#define FUTEX_WAKE_PRIVATE	(FUTEX_WAKE | FUTEX_PRIVATE_FLAG)
-#define FUTEX_REQUEUE_PRIVATE	(FUTEX_REQUEUE | FUTEX_PRIVATE_FLAG)
-#define FUTEX_CMP_REQUEUE_PRIVATE (FUTEX_CMP_REQUEUE | FUTEX_PRIVATE_FLAG)
-#define FUTEX_WAKE_OP_PRIVATE	(FUTEX_WAKE_OP | FUTEX_PRIVATE_FLAG)
-#define FUTEX_LOCK_PI_PRIVATE	(FUTEX_LOCK_PI | FUTEX_PRIVATE_FLAG)
-#define FUTEX_LOCK_PI2_PRIVATE	(FUTEX_LOCK_PI2 | FUTEX_PRIVATE_FLAG)
-#define FUTEX_UNLOCK_PI_PRIVATE	(FUTEX_UNLOCK_PI | FUTEX_PRIVATE_FLAG)
-#define FUTEX_TRYLOCK_PI_PRIVATE (FUTEX_TRYLOCK_PI | FUTEX_PRIVATE_FLAG)
+#define FUTEX_CMD_MASK			~(FUTEX_PRIVATE_FLAG | FUTEX_CLOCK_REALTIME)
+
+#define FUTEX_WAIT_PRIVATE		(FUTEX_WAIT | FUTEX_PRIVATE_FLAG)
+#define FUTEX_WAKE_PRIVATE		(FUTEX_WAKE | FUTEX_PRIVATE_FLAG)
+#define FUTEX_REQUEUE_PRIVATE		(FUTEX_REQUEUE | FUTEX_PRIVATE_FLAG)
+#define FUTEX_CMP_REQUEUE_PRIVATE	(FUTEX_CMP_REQUEUE | FUTEX_PRIVATE_FLAG)
+#define FUTEX_WAKE_OP_PRIVATE		(FUTEX_WAKE_OP | FUTEX_PRIVATE_FLAG)
+#define FUTEX_LOCK_PI_PRIVATE		(FUTEX_LOCK_PI | FUTEX_PRIVATE_FLAG)
+#define FUTEX_LOCK_PI2_PRIVATE		(FUTEX_LOCK_PI2 | FUTEX_PRIVATE_FLAG)
+#define FUTEX_UNLOCK_PI_PRIVATE		(FUTEX_UNLOCK_PI | FUTEX_PRIVATE_FLAG)
+#define FUTEX_TRYLOCK_PI_PRIVATE	(FUTEX_TRYLOCK_PI | FUTEX_PRIVATE_FLAG)
 #define FUTEX_WAIT_BITSET_PRIVATE	(FUTEX_WAIT_BITSET | FUTEX_PRIVATE_FLAG)
 #define FUTEX_WAKE_BITSET_PRIVATE	(FUTEX_WAKE_BITSET | FUTEX_PRIVATE_FLAG)
-#define FUTEX_WAIT_REQUEUE_PI_PRIVATE	(FUTEX_WAIT_REQUEUE_PI | \
-					 FUTEX_PRIVATE_FLAG)
-#define FUTEX_CMP_REQUEUE_PI_PRIVATE	(FUTEX_CMP_REQUEUE_PI | \
-					 FUTEX_PRIVATE_FLAG)
+#define FUTEX_WAIT_REQUEUE_PI_PRIVATE	(FUTEX_WAIT_REQUEUE_PI | FUTEX_PRIVATE_FLAG)
+#define FUTEX_CMP_REQUEUE_PI_PRIVATE	(FUTEX_CMP_REQUEUE_PI | FUTEX_PRIVATE_FLAG)
 
 /*
  * Flags for futex2 syscalls.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [patch v2 07/11] futex: Add support for unlocking robust futexes
  2026-03-19 23:24 [patch v2 00/11] futex: Address the robust futex unlock race for real Thomas Gleixner
                   ` (5 preceding siblings ...)
  2026-03-19 23:24 ` [patch v2 06/11] futex: Cleanup UAPI defines Thomas Gleixner
@ 2026-03-19 23:24 ` Thomas Gleixner
  2026-03-20 17:14   ` André Almeida
  2026-03-19 23:24 ` [patch v2 08/11] futex: Add robust futex unlock IP range Thomas Gleixner
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 35+ messages in thread
From: Thomas Gleixner @ 2026-03-19 23:24 UTC (permalink / raw)
  To: LKML
  Cc: Mathieu Desnoyers, André Almeida, Sebastian Andrzej Siewior,
	Carlos O'Donell, Peter Zijlstra, Florian Weimer, Rich Felker,
	Torvald Riegel, Darren Hart, Ingo Molnar, Davidlohr Bueso,
	Arnd Bergmann, Liam R . Howlett, Uros Bizjak,
	Thomas Weißschuh

Unlocking robust non-PI futexes happens in user space with the following
sequence:

  1)	robust_list_set_op_pending(mutex);
  2)	robust_list_remove(mutex);
	
  	lval = 0;
  3)	lval = atomic_xchg(lock, lval);
  4)	if (lval & WAITERS)
  5)		sys_futex(WAKE,....);
  6)	robust_list_clear_op_pending();

That opens a window between #3 and #6 where the mutex could be acquired by
some other task which observes that it is the last user and:

  A) unmaps the mutex memory
  B) maps a different file, which ends up covering the same address

When the original task exits before reaching #6, the kernel robust list
handling observes the pending op entry and tries to fix up user space.

If the newly mapped data contains the TID of the exiting thread at the
address of the mutex/futex, the kernel will set the owner died bit in
that memory, thereby corrupting unrelated data.

PI futexes have a similar problem, both for the non-contended user space
unlock and the in-kernel unlock:

  1)	robust_list_set_op_pending(mutex);
  2)	robust_list_remove(mutex);
	
  	lval = gettid();
  3)	if (!atomic_try_cmpxchg(lock, lval, 0))
  4)		sys_futex(UNLOCK_PI,....);
  5)	robust_list_clear_op_pending();

Address the first part of the problem, where the futexes have waiters and
need to enter the kernel anyway. Add a new FUTEX_UNLOCK_ROBUST flag, which
is valid for the sys_futex() FUTEX_UNLOCK_PI, FUTEX_WAKE and
FUTEX_WAKE_BITSET operations.

FUTEX_WAKE_OP is deliberately omitted from this treatment as it's unclear
whether it is needed, and there is no usage of it in glibc to investigate
either.

For the futex2 syscall family this needs to be implemented with a new
syscall.

The sys_futex() case [ab]uses the @uaddr2 argument to hand the pointer to
the kernel. This argument is only evaluated when the FUTEX_UNLOCK_ROBUST
bit is set and is therefore backward compatible.

This requires a second flag, FUTEX_ROBUST_LIST32, which indicates that the
robust list pointer points to a u32 and not to a u64. This is required
for two reasons:

    1) sys_futex() has no compat variant

    2) The gaming emulators use both 64-bit and compat 32-bit robust
       lists in the same 64-bit application

As a consequence 32-bit applications have to set this flag unconditionally
so they can run on a 64-bit kernel in compat mode unmodified. 32-bit
kernels return an error code when the flag is not set. 64-bit kernels will
happily clear the full 64 bits if user space fails to set it.

In case of FUTEX_UNLOCK_PI this clears the robust list pending op when the
unlock succeeded. In case of errors, the user space value is still locked
by the caller and therefore the above cannot happen.

In case of FUTEX_WAKE* this does the unlock of the futex in the kernel and
clears the robust list pending op when the unlock was successful. If not,
the user space value is still locked and user space has to deal with the
returned error. That means that the unlocking of non-PI robust futexes has
to use the same try_cmpxchg() unlock scheme as PI futexes.

If the clearing of the pending list op fails (fault), then the kernel clears
the registered robust list pointer, if it matches, to prevent exit() from
trying to handle invalid data. That's a valid paranoid decision because
the robust list head usually sits in the TLS, and if the TLS is no longer
accessible then the chance of fixing up the resulting mess is very close
to zero.

The problem of non-contended unlocks still exists and will be addressed
separately.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
---
V2: Use store release for unlock	- Andre, Peter
    Use a separate FLAG for 32bit lists	- Florian
    Add command defines
---
 include/uapi/linux/futex.h |   29 +++++++++++++++++++++++-
 io_uring/futex.c           |    2 -
 kernel/futex/core.c        |   53 +++++++++++++++++++++++++++++++++++++++++++--
 kernel/futex/futex.h       |   15 +++++++++++-
 kernel/futex/pi.c          |   15 +++++++++++-
 kernel/futex/syscalls.c    |   13 ++++++++---
 kernel/futex/waitwake.c    |   30 +++++++++++++++++++++++--
 7 files changed, 144 insertions(+), 13 deletions(-)

--- a/include/uapi/linux/futex.h
+++ b/include/uapi/linux/futex.h
@@ -25,8 +25,11 @@
 
 #define FUTEX_PRIVATE_FLAG	128
 #define FUTEX_CLOCK_REALTIME	256
+#define FUTEX_UNLOCK_ROBUST	512
+#define FUTEX_ROBUST_LIST32	1024
 
-#define FUTEX_CMD_MASK			~(FUTEX_PRIVATE_FLAG | FUTEX_CLOCK_REALTIME)
+#define FUTEX_CMD_MASK			~(FUTEX_PRIVATE_FLAG | FUTEX_CLOCK_REALTIME | \
+					  FUTEX_UNLOCK_ROBUST | FUTEX_ROBUST_LIST32)
 
 #define FUTEX_WAIT_PRIVATE		(FUTEX_WAIT | FUTEX_PRIVATE_FLAG)
 #define FUTEX_WAKE_PRIVATE		(FUTEX_WAKE | FUTEX_PRIVATE_FLAG)
@@ -43,6 +46,30 @@
 #define FUTEX_CMP_REQUEUE_PI_PRIVATE	(FUTEX_CMP_REQUEUE_PI | FUTEX_PRIVATE_FLAG)
 
 /*
+ * Operations to unlock a futex, clear the robust list pending op pointer and
+ * wake waiters.
+ */
+#define FUTEX_UNLOCK_PI_LIST64			(FUTEX_UNLOCK_PI | FUTEX_UNLOCK_ROBUST)
+#define FUTEX_UNLOCK_PI_LIST64_PRIVATE		(FUTEX_UNLOCK_PI_LIST64 | FUTEX_PRIVATE_FLAG)
+#define FUTEX_UNLOCK_PI_LIST32			(FUTEX_UNLOCK_PI | FUTEX_UNLOCK_ROBUST | \
+						 FUTEX_ROBUST_LIST32)
+#define FUTEX_UNLOCK_PI_LIST32_PRIVATE		(FUTEX_UNLOCK_PI_LIST32 | FUTEX_PRIVATE_FLAG)
+
+#define FUTEX_UNLOCK_WAKE_LIST64		(FUTEX_WAKE | FUTEX_UNLOCK_ROBUST)
+#define FUTEX_UNLOCK_WAKE_LIST64_PRIVATE	(FUTEX_UNLOCK_WAKE_LIST64 | FUTEX_PRIVATE_FLAG)
+
+#define FUTEX_UNLOCK_WAKE_LIST32		(FUTEX_WAKE | FUTEX_UNLOCK_ROBUST | \
+						 FUTEX_ROBUST_LIST32)
+#define FUTEX_UNLOCK_WAKE_LIST32_PRIVATE	(FUTEX_UNLOCK_WAKE_LIST32 | FUTEX_PRIVATE_FLAG)
+
+#define FUTEX_UNLOCK_BITSET_LIST64		(FUTEX_WAKE_BITSET | FUTEX_UNLOCK_ROBUST)
+#define FUTEX_UNLOCK_BITSET_LIST64_PRIVATE	(FUTEX_UNLOCK_BITSET_LIST64 | FUTEX_PRIVATE_FLAG)
+
+#define FUTEX_UNLOCK_BITSET_LIST32		(FUTEX_WAKE_BITSET | FUTEX_UNLOCK_ROBUST | \
+						 FUTEX_ROBUST_LIST32)
+#define FUTEX_UNLOCK_BITSET_LIST32_PRIVATE	(FUTEX_UNLOCK_BITSET_LIST32 | FUTEX_PRIVATE_FLAG)
+
+/*
  * Flags for futex2 syscalls.
  *
  * NOTE: these are not pure flags, they can also be seen as:
--- a/io_uring/futex.c
+++ b/io_uring/futex.c
@@ -325,7 +325,7 @@ int io_futex_wake(struct io_kiocb *req,
 	 * Strict flags - ensure that waking 0 futexes yields a 0 result.
 	 * See commit 43adf8449510 ("futex: FLAGS_STRICT") for details.
 	 */
-	ret = futex_wake(iof->uaddr, FLAGS_STRICT | iof->futex_flags,
+	ret = futex_wake(iof->uaddr, FLAGS_STRICT | iof->futex_flags, NULL,
 			 iof->futex_val, iof->futex_mask);
 	if (ret < 0)
 		req_set_fail(req);
--- a/kernel/futex/core.c
+++ b/kernel/futex/core.c
@@ -1063,7 +1063,7 @@ static int handle_futex_death(u32 __user
 	owner = uval & FUTEX_TID_MASK;
 
 	if (pending_op && !pi && !owner) {
-		futex_wake(uaddr, FLAGS_SIZE_32 | FLAGS_SHARED, 1,
+		futex_wake(uaddr, FLAGS_SIZE_32 | FLAGS_SHARED, NULL, 1,
 			   FUTEX_BITSET_MATCH_ANY);
 		return 0;
 	}
@@ -1117,7 +1117,7 @@ static int handle_futex_death(u32 __user
 	 * PI futexes happens in exit_pi_state():
 	 */
 	if (!pi && (uval & FUTEX_WAITERS)) {
-		futex_wake(uaddr, FLAGS_SIZE_32 | FLAGS_SHARED, 1,
+		futex_wake(uaddr, FLAGS_SIZE_32 | FLAGS_SHARED, NULL, 1,
 			   FUTEX_BITSET_MATCH_ANY);
 	}
 
@@ -1209,6 +1209,27 @@ static void exit_robust_list(struct task
 	}
 }
 
+static bool robust_list_clear_pending(unsigned long __user *pop)
+{
+	struct robust_list_head __user *head = current->futex.robust_list;
+
+	if (!put_user(0UL, pop))
+		return true;
+
+	/*
+	 * Just give up. The robust list head is usually part of TLS, so the
+	 * chance that this gets resolved is close to zero.
+	 *
+	 * If @pop is the robust_list_head::list_op_pending pointer then
+	 * clear the robust list head pointer to prevent further damage when the
+	 * task exits.  Better a few stale futexes than corrupted memory. But
+	 * that's mostly an academic exercise.
+	 */
+	if (pop == (unsigned long __user *)&head->list_op_pending)
+		current->futex.robust_list = NULL;
+	return false;
+}
+
 #ifdef CONFIG_COMPAT
 static void __user *futex_uaddr(struct robust_list __user *entry,
 				compat_long_t futex_offset)
@@ -1305,6 +1326,21 @@ static void compat_exit_robust_list(stru
 		handle_futex_death(uaddr, curr, pend_mod, HANDLE_DEATH_PENDING);
 	}
 }
+
+static bool compat_robust_list_clear_pending(u32 __user *pop)
+{
+	struct compat_robust_list_head __user *head = current->futex.compat_robust_list;
+
+	if (!put_user(0U, pop))
+		return true;
+
+	/* See comment in robust_list_clear_pending(). */
+	if (pop == &head->list_op_pending)
+		current->futex.compat_robust_list = NULL;
+	return false;
+}
+#else
+static bool compat_robust_list_clear_pending(u32 __user *pop) { return false; }
 #endif
 
 #ifdef CONFIG_FUTEX_PI
@@ -1398,6 +1434,19 @@ static void exit_pi_state_list(struct ta
 static inline void exit_pi_state_list(struct task_struct *curr) { }
 #endif
 
+bool futex_robust_list_clear_pending(void __user *pop, unsigned int flags)
+{
+	bool size32bit = !!(flags & FLAGS_ROBUST_LIST32);
+
+	if (!IS_ENABLED(CONFIG_64BIT) && !size32bit)
+		return false;
+
+	if (IS_ENABLED(CONFIG_64BIT) && size32bit)
+		return compat_robust_list_clear_pending(pop);
+
+	return robust_list_clear_pending(pop);
+}
+
 static void futex_cleanup(struct task_struct *tsk)
 {
 	if (unlikely(tsk->futex.robust_list)) {
--- a/kernel/futex/futex.h
+++ b/kernel/futex/futex.h
@@ -40,6 +40,8 @@
 #define FLAGS_NUMA		0x0080
 #define FLAGS_STRICT		0x0100
 #define FLAGS_MPOL		0x0200
+#define FLAGS_UNLOCK_ROBUST	0x0400
+#define FLAGS_ROBUST_LIST32	0x0800
 
 /* FUTEX_ to FLAGS_ */
 static inline unsigned int futex_to_flags(unsigned int op)
@@ -52,6 +54,12 @@ static inline unsigned int futex_to_flag
 	if (op & FUTEX_CLOCK_REALTIME)
 		flags |= FLAGS_CLOCKRT;
 
+	if (op & FUTEX_UNLOCK_ROBUST)
+		flags |= FLAGS_UNLOCK_ROBUST;
+
+	if (op & FUTEX_ROBUST_LIST32)
+		flags |= FLAGS_ROBUST_LIST32;
+
 	return flags;
 }
 
@@ -438,13 +446,16 @@ extern int futex_unqueue_multiple(struct
 extern int futex_wait_multiple(struct futex_vector *vs, unsigned int count,
 			       struct hrtimer_sleeper *to);
 
-extern int futex_wake(u32 __user *uaddr, unsigned int flags, int nr_wake, u32 bitset);
+extern int futex_wake(u32 __user *uaddr, unsigned int flags, void __user *pop,
+		      int nr_wake, u32 bitset);
 
 extern int futex_wake_op(u32 __user *uaddr1, unsigned int flags,
 			 u32 __user *uaddr2, int nr_wake, int nr_wake2, int op);
 
-extern int futex_unlock_pi(u32 __user *uaddr, unsigned int flags);
+extern int futex_unlock_pi(u32 __user *uaddr, unsigned int flags, void __user *pop);
 
 extern int futex_lock_pi(u32 __user *uaddr, unsigned int flags, ktime_t *time, int trylock);
 
+bool futex_robust_list_clear_pending(void __user *pop, unsigned int flags);
+
 #endif /* _FUTEX_H */
--- a/kernel/futex/pi.c
+++ b/kernel/futex/pi.c
@@ -1129,7 +1129,7 @@ int futex_lock_pi(u32 __user *uaddr, uns
  * This is the in-kernel slowpath: we look up the PI state (if any),
  * and do the rt-mutex unlock.
  */
-int futex_unlock_pi(u32 __user *uaddr, unsigned int flags)
+static int __futex_unlock_pi(u32 __user *uaddr, unsigned int flags)
 {
 	u32 curval, uval, vpid = task_pid_vnr(current);
 	union futex_key key = FUTEX_KEY_INIT;
@@ -1138,7 +1138,6 @@ int futex_unlock_pi(u32 __user *uaddr, u
 
 	if (!IS_ENABLED(CONFIG_FUTEX_PI))
 		return -ENOSYS;
-
 retry:
 	if (get_user(uval, uaddr))
 		return -EFAULT;
@@ -1292,3 +1291,15 @@ int futex_unlock_pi(u32 __user *uaddr, u
 	return ret;
 }
 
+int futex_unlock_pi(u32 __user *uaddr, unsigned int flags, void __user *pop)
+{
+	int ret = __futex_unlock_pi(uaddr, flags);
+
+	if (ret || !(flags & FLAGS_UNLOCK_ROBUST))
+		return ret;
+
+	if (!futex_robust_list_clear_pending(pop, flags))
+		return -EFAULT;
+
+	return 0;
+}
--- a/kernel/futex/syscalls.c
+++ b/kernel/futex/syscalls.c
@@ -118,6 +118,13 @@ long do_futex(u32 __user *uaddr, int op,
 			return -ENOSYS;
 	}
 
+	if (flags & FLAGS_UNLOCK_ROBUST) {
+		if (cmd != FUTEX_WAKE &&
+		    cmd != FUTEX_WAKE_BITSET &&
+		    cmd != FUTEX_UNLOCK_PI)
+			return -ENOSYS;
+	}
+
 	switch (cmd) {
 	case FUTEX_WAIT:
 		val3 = FUTEX_BITSET_MATCH_ANY;
@@ -128,7 +135,7 @@ long do_futex(u32 __user *uaddr, int op,
 		val3 = FUTEX_BITSET_MATCH_ANY;
 		fallthrough;
 	case FUTEX_WAKE_BITSET:
-		return futex_wake(uaddr, flags, val, val3);
+		return futex_wake(uaddr, flags, uaddr2, val, val3);
 	case FUTEX_REQUEUE:
 		return futex_requeue(uaddr, flags, uaddr2, flags, val, val2, NULL, 0);
 	case FUTEX_CMP_REQUEUE:
@@ -141,7 +148,7 @@ long do_futex(u32 __user *uaddr, int op,
 	case FUTEX_LOCK_PI2:
 		return futex_lock_pi(uaddr, flags, timeout, 0);
 	case FUTEX_UNLOCK_PI:
-		return futex_unlock_pi(uaddr, flags);
+		return futex_unlock_pi(uaddr, flags, uaddr2);
 	case FUTEX_TRYLOCK_PI:
 		return futex_lock_pi(uaddr, flags, NULL, 1);
 	case FUTEX_WAIT_REQUEUE_PI:
@@ -375,7 +382,7 @@ SYSCALL_DEFINE4(futex_wake,
 	if (!futex_validate_input(flags, mask))
 		return -EINVAL;
 
-	return futex_wake(uaddr, FLAGS_STRICT | flags, nr, mask);
+	return futex_wake(uaddr, FLAGS_STRICT | flags, NULL, nr, mask);
 }
 
 /*
--- a/kernel/futex/waitwake.c
+++ b/kernel/futex/waitwake.c
@@ -150,12 +150,35 @@ void futex_wake_mark(struct wake_q_head
 }
 
 /*
+ * If requested, clear the robust list pending op and unlock the futex
+ */
+static bool futex_robust_unlock(u32 __user *uaddr, unsigned int flags, void __user *pop)
+{
+	if (!(flags & FLAGS_UNLOCK_ROBUST))
+		return true;
+
+	/* First unlock the futex, which requires release semantics. */
+	scoped_user_write_access(uaddr, efault)
+		unsafe_atomic_store_release_user(0, uaddr, efault);
+
+	/*
+	 * Clear the pending list op now. If that fails, then the task is in
+	 * deeper trouble as the robust list head is usually part of the TLS.
+	 * The chance of survival is close to zero.
+	 */
+	return futex_robust_list_clear_pending(pop, flags);
+
+efault:
+	return false;
+}
+
+/*
  * Wake up waiters matching bitset queued on this futex (uaddr).
  */
-int futex_wake(u32 __user *uaddr, unsigned int flags, int nr_wake, u32 bitset)
+int futex_wake(u32 __user *uaddr, unsigned int flags, void __user *pop, int nr_wake, u32 bitset)
 {
-	struct futex_q *this, *next;
 	union futex_key key = FUTEX_KEY_INIT;
+	struct futex_q *this, *next;
 	DEFINE_WAKE_Q(wake_q);
 	int ret;
 
@@ -166,6 +189,9 @@ int futex_wake(u32 __user *uaddr, unsign
 	if (unlikely(ret != 0))
 		return ret;
 
+	if (!futex_robust_unlock(uaddr, flags, pop))
+		return -EFAULT;
+
 	if ((flags & FLAGS_STRICT) && !nr_wake)
 		return 0;
 


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [patch v2 08/11] futex: Add robust futex unlock IP range
  2026-03-19 23:24 [patch v2 00/11] futex: Address the robust futex unlock race for real Thomas Gleixner
                   ` (6 preceding siblings ...)
  2026-03-19 23:24 ` [patch v2 07/11] futex: Add support for unlocking robust futexes Thomas Gleixner
@ 2026-03-19 23:24 ` Thomas Gleixner
  2026-03-20  9:07   ` Peter Zijlstra
  2026-03-27 13:24   ` Sebastian Andrzej Siewior
  2026-03-19 23:24 ` [patch v2 09/11] futex: Provide infrastructure to plug the non contended robust futex unlock race Thomas Gleixner
                   ` (3 subsequent siblings)
  11 siblings, 2 replies; 35+ messages in thread
From: Thomas Gleixner @ 2026-03-19 23:24 UTC (permalink / raw)
  To: LKML
  Cc: Mathieu Desnoyers, André Almeida, Sebastian Andrzej Siewior,
	Carlos O'Donell, Peter Zijlstra, Florian Weimer, Rich Felker,
	Torvald Riegel, Darren Hart, Ingo Molnar, Davidlohr Bueso,
	Arnd Bergmann, Liam R . Howlett, Uros Bizjak,
	Thomas Weißschuh

There will be a VDSO function to unlock robust futexes in user space. The
unlock sequence is racy vs. clearing the list_op_pending pointer in the
task's robust list head. To plug this race the kernel needs to know the
instruction window. As the VDSO is per MM the addresses are stored in
mm_struct::futex.

Architectures which implement support for this have to update these
addresses when the VDSO is (re)mapped and indicate the pending op pointer
size matching the IP range.

Arguably this could be resolved by chasing mm->context->vdso->image, but
that's architecture specific and requires touching quite a few cache
lines. Having it in mm::futex reduces the cache line impact and avoids
having yet another set of architecture specific functionality.

To support multi-size robust list applications (gaming) this provides two
ranges.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
---
V2: Store ranges in a struct with size information and allow up to two ranges.
---
 include/linux/futex_types.h |   22 ++++++++++++++++++++++
 include/linux/mm_types.h    |    1 +
 init/Kconfig                |    6 ++++++
 3 files changed, 29 insertions(+)

--- a/include/linux/futex_types.h
+++ b/include/linux/futex_types.h
@@ -31,6 +31,20 @@ struct futex_sched_data {
 };
 
 /**
+ * struct futex_unlock_cs_range - Range for the VDSO unlock critical section
+ * @start_ip:	The start IP of the robust futex unlock critical section (inclusive)
+ * @end_ip:	The end IP of the robust futex unlock critical section (exclusive)
+ * @pop_size32:	Pending OP pointer size indicator. 0 == 64-bit, 1 == 32-bit
+ */
+struct futex_unlock_cs_range {
+	unsigned long	       start_ip;
+	unsigned long	       end_ip;
+	unsigned int	       pop_size32;
+};
+
+#define FUTEX_ROBUST_MAX_CS_RANGES	2
+
+/**
  * struct futex_mm_data - Futex related per MM data
  * @phash_lock:			Mutex to protect the private hash operations
  * @phash:			RCU managed pointer to the private hash
@@ -39,6 +53,10 @@ struct futex_sched_data {
  * @phash_rcu:			RCU head for call_rcu()
  * @phash_atomic:		Aggregate value for @phash_ref
  * @phash_ref:			Per CPU reference counter for a private hash
+ *
+ * @unlock_cs_num_ranges:	The number of critical section ranges for VDSO assisted unlock
+ *				of robust futexes.
+ * @unlock_cs_ranges:		The critical section ranges for VDSO assisted unlock
  */
 struct futex_mm_data {
 #ifdef CONFIG_FUTEX_PRIVATE_HASH
@@ -50,6 +68,10 @@ struct futex_mm_data {
 	atomic_long_t			phash_atomic;
 	unsigned int			__percpu *phash_ref;
 #endif
+#ifdef CONFIG_FUTEX_ROBUST_UNLOCK
+	unsigned int			unlock_cs_num_ranges;
+	struct futex_unlock_cs_range	unlock_cs_ranges[FUTEX_ROBUST_MAX_CS_RANGES];
+#endif
 };
 
 #else
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -22,6 +22,7 @@
 #include <linux/types.h>
 #include <linux/rseq_types.h>
 #include <linux/bitmap.h>
+#include <linux/futex_types.h>
 
 #include <asm/mmu.h>
 
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1822,6 +1822,12 @@ config FUTEX_MPOL
 	depends on FUTEX && NUMA
 	default y
 
+config HAVE_FUTEX_ROBUST_UNLOCK
+	bool
+
+config FUTEX_ROBUST_UNLOCK
+	def_bool FUTEX && HAVE_GENERIC_VDSO && GENERIC_IRQ_ENTRY && RSEQ && HAVE_FUTEX_ROBUST_UNLOCK
+
 config EPOLL
 	bool "Enable eventpoll support" if EXPERT
 	default y


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [patch v2 09/11] futex: Provide infrastructure to plug the non contended robust futex unlock race
  2026-03-19 23:24 [patch v2 00/11] futex: Address the robust futex unlock race for real Thomas Gleixner
                   ` (7 preceding siblings ...)
  2026-03-19 23:24 ` [patch v2 08/11] futex: Add robust futex unlock IP range Thomas Gleixner
@ 2026-03-19 23:24 ` Thomas Gleixner
  2026-03-20 13:35   ` Thomas Gleixner
  2026-03-19 23:24 ` [patch v2 10/11] x86/vdso: Prepare for robust futex unlock support Thomas Gleixner
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 35+ messages in thread
From: Thomas Gleixner @ 2026-03-19 23:24 UTC (permalink / raw)
  To: LKML
  Cc: Mathieu Desnoyers, André Almeida, Sebastian Andrzej Siewior,
	Carlos O'Donell, Peter Zijlstra, Florian Weimer, Rich Felker,
	Torvald Riegel, Darren Hart, Ingo Molnar, Davidlohr Bueso,
	Arnd Bergmann, Liam R . Howlett, Uros Bizjak,
	Thomas Weißschuh

When the FUTEX_ROBUST_UNLOCK mechanism is used for unlocking (PI-)futexes,
then the unlock sequence in user space looks like this:

  1)	robust_list_set_op_pending(mutex);
  2)	robust_list_remove(mutex);
	
  	lval = gettid();
  3)	if (atomic_try_cmpxchg(&mutex->lock, lval, 0))
  4)		robust_list_clear_op_pending();
  	else
  5)		sys_futex(OP | FUTEX_ROBUST_UNLOCK, ....);

That still leaves a minimal race window between #3 and #4 where the mutex
could be acquired by some other task, which observes that it is the last
user and:

  1) unmaps the mutex memory
  2) maps a different file, which ends up covering the same address

If the original task then exits before reaching #5, the kernel robust
list handling observes the pending op entry and tries to fix up user space.

If the newly mapped data contains the TID of the exiting thread at the
address of the mutex/futex, the kernel will set the owner-died bit in
that memory and therefore corrupt unrelated data.

On X86 this boils down to this simplified assembly sequence:

		mov		%esi,%eax	// Load TID into EAX
        	xor		%ecx,%ecx	// Set ECX to 0
   #3		lock cmpxchg	%ecx,(%rdi)	// Try the TID -> 0 transition
	.Lstart:
		jnz    		.Lend
   #4 		movq		%rcx,(%rdx)	// Clear list_op_pending
	.Lend:

If the cmpxchg() succeeds and the task is interrupted before it can clear
list_op_pending in the robust list head (#4) and the task crashes in a
signal handler or gets killed then it ends up in do_exit() and subsequently
in the robust list handling, which then might run into the unmap/map issue
described above.

This is only relevant when user space was interrupted and a signal is
pending. The fix-up has to be done before signal delivery is attempted
because:

   1) The signal might be fatal so get_signal() ends up in do_exit()

   2) The signal handler might crash or the task is killed before returning
      from the handler. At that point the instruction pointer in pt_regs is
      no longer the instruction pointer of the initially interrupted unlock
      sequence.

The right place to handle this is in __exit_to_user_mode_loop() before
invoking arch_do_signal_or_restart() as this obviously covers both
scenarios.

As this is only relevant when the task was interrupted in user space, this
is tied to RSEQ and the generic entry code as RSEQ keeps track of user
space interrupts unconditionally even if the task does not have an RSEQ
region installed. That makes the decision very lightweight:

       if (current->rseq.user_irq && within(regs, csr->unlock_ip_range))
       		futex_fixup_robust_unlock(regs, csr);

futex_fixup_robust_unlock() then invokes an architecture specific function
to return the pending op pointer or NULL. The function evaluates the
register content to decide whether the pending ops pointer in the robust
list head needs to be cleared.

Assuming the above unlock sequence, then on x86 this decision is the
trivial evaluation of the zero flag:

	return regs->flags & X86_EFLAGS_ZF ? regs->dx : NULL;

Other architectures might need to do more complex evaluations due to LLSC,
but the approach is valid in general. The size of the pointer is determined
from the matching range struct, which covers both 32-bit and 64-bit builds
including COMPAT.

The unlock sequence is going to be placed in the VDSO so that the kernel
can keep everything synchronized, especially the register usage. The
resulting code sequence for user space is:

   if (__vdso_futex_robust_list$SZ_try_unlock(lock, tid, &pending_op) != tid)
 	err = sys_futex($OP | FUTEX_ROBUST_UNLOCK,....);

Both the VDSO unlock and the kernel side unlock ensure that the pending_op
pointer is always cleared when the lock becomes unlocked.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
---
V2: Convert to the struct range storage and simplify the fixup logic
---
 include/linux/futex.h |   42 +++++++++++++++++++++++++++++++++++++++-
 include/vdso/futex.h  |   52 ++++++++++++++++++++++++++++++++++++++++++++++++++
 kernel/entry/common.c |    9 +++++---
 kernel/futex/core.c   |   14 +++++++++++++
 4 files changed, 113 insertions(+), 4 deletions(-)

--- a/include/linux/futex.h
+++ b/include/linux/futex.h
@@ -110,7 +110,47 @@ static inline int futex_hash_allocate_de
 }
 static inline int futex_hash_free(struct mm_struct *mm) { return 0; }
 static inline int futex_mm_init(struct mm_struct *mm) { return 0; }
+#endif /* !CONFIG_FUTEX */
 
-#endif
+#ifdef CONFIG_FUTEX_ROBUST_UNLOCK
+#include <asm/futex_robust.h>
+
+void __futex_fixup_robust_unlock(struct pt_regs *regs, struct futex_unlock_cs_range *csr);
+
+static inline bool futex_within_robust_unlock(struct pt_regs *regs,
+					      struct futex_unlock_cs_range *csr)
+{
+	unsigned long ip = instruction_pointer(regs);
+
+	return ip >= csr->start_ip && ip < csr->end_ip;
+}
+
+static inline void futex_fixup_robust_unlock(struct pt_regs *regs)
+{
+	struct futex_unlock_cs_range *csr;
+
+	/*
+	 * Avoid dereferencing current->mm if not returning from interrupt.
+	 * current->rseq.event is going to be used anyway in the exit to user
+	 * code, so bringing it in is not a big deal.
+	 */
+	if (!current->rseq.event.user_irq)
+		return;
+
+	csr = current->mm->futex.unlock_cs_ranges;
+	if (unlikely(futex_within_robust_unlock(regs, csr))) {
+		__futex_fixup_robust_unlock(regs, csr);
+		return;
+	}
+
+	/* Multi sized robust lists are only supported with CONFIG_COMPAT */
+	if (IS_ENABLED(CONFIG_COMPAT) && current->mm->futex.unlock_cs_num_ranges == 2) {
+		if (unlikely(futex_within_robust_unlock(regs, ++csr)))
+			__futex_fixup_robust_unlock(regs, csr);
+	}
+}
+#else /* CONFIG_FUTEX_ROBUST_UNLOCK */
+static inline void futex_fixup_robust_unlock(struct pt_regs *regs) {}
+#endif /* !CONFIG_FUTEX_ROBUST_UNLOCK */
 
 #endif
--- /dev/null
+++ b/include/vdso/futex.h
@@ -0,0 +1,52 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _VDSO_FUTEX_H
+#define _VDSO_FUTEX_H
+
+#include <uapi/linux/types.h>
+
+/**
+ * __vdso_futex_robust_list64_try_unlock - Try to unlock an uncontended robust futex
+ *					   with a 64-bit pending op pointer
+ * @lock:	Pointer to the futex lock object
+ * @tid:	The TID of the calling task
+ * @pop:	Pointer to the task's robust_list_head::list_op_pending
+ *
+ * Return: The content of *@lock. On success this is the same as @tid.
+ *
+ * The function implements:
+ *	if (atomic_try_cmpxchg(lock, &tid, 0))
+ *		*pop = NULL;
+ *	return tid;
+ *
+ * There is a race between a successful unlock and clearing the pending op
+ * pointer in the robust list head. If the calling task is interrupted in the
+ * race window and has to handle a (fatal) signal on return to user space then
+ * the kernel handles the clearing of @pop before attempting to deliver
+ * the signal. That ensures that a task cannot exit with a potentially invalid
+ * pending op pointer.
+ *
+ * User space uses it in the following way:
+ *
+ * if (__vdso_futex_robust_list64_try_unlock(lock, tid, &pending_op) != tid)
+ *	err = sys_futex($OP | FUTEX_ROBUST_UNLOCK,....);
+ *
+ * If the unlock attempt fails due to the FUTEX_WAITERS bit set in the lock,
+ * then the syscall does the unlock, clears the pending op pointer and wakes the
+ * requested number of waiters.
+ */
+__u32 __vdso_futex_robust_list64_try_unlock(__u32 *lock, __u32 tid, __u64 *pop);
+
+/**
+ * __vdso_futex_robust_list32_try_unlock - Try to unlock an uncontended robust futex
+ *					   with a 32-bit pending op pointer
+ * @lock:	Pointer to the futex lock object
+ * @tid:	The TID of the calling task
+ * @pop:	Pointer to the task's robust_list_head::list_op_pending
+ *
+ * Return: The content of *@lock. On success this is the same as @tid.
+ *
+ * Same as __vdso_futex_robust_list64_try_unlock() just with a 32-bit @pop pointer.
+ */
+__u32 __vdso_futex_robust_list32_try_unlock(__u32 *lock, __u32 tid, __u32 *pop);
+
+#endif
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -1,11 +1,12 @@
 // SPDX-License-Identifier: GPL-2.0
 
-#include <linux/irq-entry-common.h>
-#include <linux/resume_user_mode.h>
+#include <linux/futex.h>
 #include <linux/highmem.h>
+#include <linux/irq-entry-common.h>
 #include <linux/jump_label.h>
 #include <linux/kmsan.h>
 #include <linux/livepatch.h>
+#include <linux/resume_user_mode.h>
 #include <linux/tick.h>
 
 /* Workaround to allow gradual conversion of architecture code */
@@ -60,8 +61,10 @@ static __always_inline unsigned long __e
 		if (ti_work & _TIF_PATCH_PENDING)
 			klp_update_patch_state(current);
 
-		if (ti_work & (_TIF_SIGPENDING | _TIF_NOTIFY_SIGNAL))
+		if (ti_work & (_TIF_SIGPENDING | _TIF_NOTIFY_SIGNAL)) {
+			futex_fixup_robust_unlock(regs);
 			arch_do_signal_or_restart(regs);
+		}
 
 		if (ti_work & _TIF_NOTIFY_RESUME)
 			resume_user_mode_work(regs);
--- a/kernel/futex/core.c
+++ b/kernel/futex/core.c
@@ -46,6 +46,8 @@
 #include <linux/slab.h>
 #include <linux/vmalloc.h>
 
+#include <vdso/futex.h>
+
 #include "futex.h"
 #include "../locking/rtmutex_common.h"
 
@@ -1447,6 +1449,18 @@ bool futex_robust_list_clear_pending(voi
 	return robust_list_clear_pending(pop);
 }
 
+#ifdef CONFIG_FUTEX_ROBUST_UNLOCK
+void __futex_fixup_robust_unlock(struct pt_regs *regs, struct futex_unlock_cs_range *csr)
+{
+	void __user *pop = arch_futex_robust_unlock_get_pop(regs);
+
+	if (!pop)
+		return;
+
+	futex_robust_list_clear_pending(pop, csr->pop_size32 ? FLAGS_ROBUST_LIST32 : 0);
+}
+#endif /* CONFIG_FUTEX_ROBUST_UNLOCK */
+
 static void futex_cleanup(struct task_struct *tsk)
 {
 	if (unlikely(tsk->futex.robust_list)) {


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [patch v2 10/11] x86/vdso: Prepare for robust futex unlock support
  2026-03-19 23:24 [patch v2 00/11] futex: Address the robust futex unlock race for real Thomas Gleixner
                   ` (8 preceding siblings ...)
  2026-03-19 23:24 ` [patch v2 09/11] futex: Provide infrastructure to plug the non contended robust futex unlock race Thomas Gleixner
@ 2026-03-19 23:24 ` Thomas Gleixner
  2026-03-19 23:25 ` [patch v2 11/11] x86/vdso: Implement __vdso_futex_robust_try_unlock() Thomas Gleixner
  2026-03-26 21:59 ` [patch v2 00/11] futex: Address the robust futex unlock race for real Thomas Gleixner
  11 siblings, 0 replies; 35+ messages in thread
From: Thomas Gleixner @ 2026-03-19 23:24 UTC (permalink / raw)
  To: LKML
  Cc: Mathieu Desnoyers, André Almeida, Sebastian Andrzej Siewior,
	Carlos O'Donell, Peter Zijlstra, Florian Weimer, Rich Felker,
	Torvald Riegel, Darren Hart, Ingo Molnar, Davidlohr Bueso,
	Arnd Bergmann, Liam R . Howlett, Uros Bizjak,
	Thomas Weißschuh

There will be a VDSO function to unlock non-contended robust futexes in
user space. The unlock sequence is racy vs. clearing the list_op_pending
pointer in the task's robust list head. To plug this race the kernel needs
to know the critical section window so it can clear the pointer when the
task is interrupted within that race window. The window is determined by
labels in the inline assembly.

Add these symbols to the vdso2c generator and use them in the VDSO VMA code
to update the critical section addresses in mm_struct::futex on (re)map().

The symbols are not exported to user space, but available in the debug
version of the vDSO.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
---
V2: Rename the symbols
---
 arch/x86/entry/vdso/vma.c   |   35 +++++++++++++++++++++++++++++++++++
 arch/x86/include/asm/vdso.h |    6 ++++++
 arch/x86/tools/vdso2c.c     |   20 +++++++++++++-------
 3 files changed, 54 insertions(+), 7 deletions(-)

--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -73,6 +73,38 @@ static void vdso_fix_landing(const struc
 		regs->ip = new_vma->vm_start + ipoffset;
 }
 
+#ifdef CONFIG_FUTEX_ROBUST_UNLOCK
+static void vdso_futex_robust_unlock_update_ips(void)
+{
+	const struct vdso_image *image = current->mm->context.vdso_image;
+	unsigned long vdso = (unsigned long) current->mm->context.vdso;
+	struct futex_mm_data *fd = &current->mm->futex;
+	struct futex_unlock_cs_range *csr = fd->unlock_cs_ranges;
+
+	fd->unlock_cs_num_ranges = 0;
+#ifdef CONFIG_X86_64
+	if (image->sym_x86_64_futex_try_unlock_cs_start) {
+		csr->start_ip = vdso + image->sym_x86_64_futex_try_unlock_cs_start;
+		csr->end_ip = vdso + image->sym_x86_64_futex_try_unlock_cs_end;
+		csr->pop_size32 = 0;
+		csr++;
+		fd->unlock_cs_num_ranges++;
+	}
+#endif /* CONFIG_X86_64 */
+
+#if defined(CONFIG_X86_32) || defined(CONFIG_COMPAT)
+	if (image->sym_x86_32_futex_try_unlock_cs_start) {
+		csr->start_ip = vdso + image->sym_x86_32_futex_try_unlock_cs_start;
+		csr->end_ip = vdso + image->sym_x86_32_futex_try_unlock_cs_end;
+		csr->pop_size32 = 1;
+		fd->unlock_cs_num_ranges++;
+	}
+#endif /* CONFIG_X86_32 || CONFIG_COMPAT */
+}
+#else
+static inline void vdso_futex_robust_unlock_update_ips(void) { }
+#endif
+
 static int vdso_mremap(const struct vm_special_mapping *sm,
 		struct vm_area_struct *new_vma)
 {
@@ -80,6 +112,7 @@ static int vdso_mremap(const struct vm_s
 
 	vdso_fix_landing(image, new_vma);
 	current->mm->context.vdso = (void __user *)new_vma->vm_start;
+	vdso_futex_robust_unlock_update_ips();
 
 	return 0;
 }
@@ -189,6 +222,8 @@ static int map_vdso(const struct vdso_im
 	current->mm->context.vdso = (void __user *)text_start;
 	current->mm->context.vdso_image = image;
 
+	vdso_futex_robust_unlock_update_ips();
+
 up_fail:
 	mmap_write_unlock(mm);
 	return ret;
--- a/arch/x86/include/asm/vdso.h
+++ b/arch/x86/include/asm/vdso.h
@@ -25,6 +25,12 @@ struct vdso_image {
 	long sym_int80_landing_pad;
 	long sym_vdso32_sigreturn_landing_pad;
 	long sym_vdso32_rt_sigreturn_landing_pad;
+	long sym_x86_64_futex_try_unlock_cs_start;
+	long sym_x86_64_futex_try_unlock_cs_end;
+	long sym_x86_64_compat_futex_try_unlock_cs_start;
+	long sym_x86_64_compat_futex_try_unlock_cs_end;
+	long sym_x86_32_futex_try_unlock_cs_start;
+	long sym_x86_32_futex_try_unlock_cs_end;
 };
 
 extern const struct vdso_image vdso64_image;
--- a/arch/x86/tools/vdso2c.c
+++ b/arch/x86/tools/vdso2c.c
@@ -75,13 +75,19 @@ struct vdso_sym {
 };
 
 struct vdso_sym required_syms[] = {
-	{"VDSO32_NOTE_MASK", true},
-	{"__kernel_vsyscall", true},
-	{"__kernel_sigreturn", true},
-	{"__kernel_rt_sigreturn", true},
-	{"int80_landing_pad", true},
-	{"vdso32_rt_sigreturn_landing_pad", true},
-	{"vdso32_sigreturn_landing_pad", true},
+	{"VDSO32_NOTE_MASK",				true},
+	{"__kernel_vsyscall",				true},
+	{"__kernel_sigreturn",				true},
+	{"__kernel_rt_sigreturn",			true},
+	{"int80_landing_pad",				true},
+	{"vdso32_rt_sigreturn_landing_pad",		true},
+	{"vdso32_sigreturn_landing_pad",		true},
+	{"x86_64_futex_try_unlock_cs_start",		true},
+	{"x86_64_futex_try_unlock_cs_end",		true},
+	{"x86_64_compat_futex_try_unlock_cs_start",	true},
+	{"x86_64_compat_futex_try_unlock_cs_end",	true},
+	{"x86_32_futex_try_unlock_cs_start",		true},
+	{"x86_32_futex_try_unlock_cs_end",		true},
 };
 
 __attribute__((format(printf, 1, 2))) __attribute__((noreturn))


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [patch v2 11/11] x86/vdso: Implement __vdso_futex_robust_try_unlock()
  2026-03-19 23:24 [patch v2 00/11] futex: Address the robust futex unlock race for real Thomas Gleixner
                   ` (9 preceding siblings ...)
  2026-03-19 23:24 ` [patch v2 10/11] x86/vdso: Prepare for robust futex unlock support Thomas Gleixner
@ 2026-03-19 23:25 ` Thomas Gleixner
  2026-03-20  7:14   ` Uros Bizjak
  2026-03-26 21:59 ` [patch v2 00/11] futex: Address the robust futex unlock race for real Thomas Gleixner
  11 siblings, 1 reply; 35+ messages in thread
From: Thomas Gleixner @ 2026-03-19 23:25 UTC (permalink / raw)
  To: LKML
  Cc: Mathieu Desnoyers, André Almeida, Sebastian Andrzej Siewior,
	Carlos O'Donell, Peter Zijlstra, Florian Weimer, Rich Felker,
	Torvald Riegel, Darren Hart, Ingo Molnar, Davidlohr Bueso,
	Arnd Bergmann, Liam R . Howlett, Uros Bizjak,
	Thomas Weißschuh

When the FUTEX_ROBUST_UNLOCK mechanism is used for unlocking (PI-)futexes,
then the unlock sequence in userspace looks like this:

  1)	robust_list_set_op_pending(mutex);
  2)	robust_list_remove(mutex);
	
  	lval = gettid();
  3)	if (atomic_try_cmpxchg(&mutex->lock, lval, 0))
  4)		robust_list_clear_op_pending();
  	else
  5)		sys_futex(OP | FUTEX_ROBUST_UNLOCK, ...);

That still leaves a minimal race window between #3 and #4 where the mutex
could be acquired by some other task which observes that it is the last
user and:

  1) unmaps the mutex memory
  2) maps a different file, which ends up covering the same address

If the original task then exits before reaching #5, the kernel robust
list handling observes the pending op entry and tries to fix up user space.

If the newly mapped data contains the TID of the exiting thread at the
address of the mutex/futex, the kernel will set the owner-died bit in
that memory and therefore corrupt unrelated data.

Provide a VDSO function which exposes the critical section window in the
VDSO symbol table. The resulting addresses are updated in the task's mm
when the VDSO is (re)map()'ed.

The core code detects when a task was interrupted within the critical
section and is about to deliver a signal. It then invokes an architecture
specific function which determines whether the pending op pointer has to be
cleared or not. The unlock assembly sequence on 64-bit is:

	mov		%esi,%eax	// Load TID into EAX
       	xor		%ecx,%ecx	// Set ECX to 0
	lock cmpxchg	%ecx,(%rdi)	// Try the TID -> 0 transition
  .Lstart:
	jnz    		.Lend
	movq		%rcx,(%rdx)	// Clear list_op_pending
  .Lend:
	ret

So the decision can be simply based on the ZF state in regs->flags. The
pending op pointer is always in DX independent of the build mode
(32/64-bit) to make the pending op pointer retrieval uniform. The size of
the pointer is stored in the matching critical section range struct and
the core code retrieves it from there. So the pointer retrieval function
does not have to care. It is bit-size independent:

     return regs->flags & X86_EFLAGS_ZF ? regs->dx : NULL;

There are two entry points to handle the different robust list pending op
pointer size:

	__vdso_futex_robust_list64_try_unlock()
	__vdso_futex_robust_list32_try_unlock()

The 32-bit VDSO provides only __vdso_futex_robust_list32_try_unlock().

The 64-bit VDSO provides always __vdso_futex_robust_list64_try_unlock() and
when COMPAT is enabled also the list32 variant, which is required to
support multi-size robust list pointers used by gaming emulators.

The unlock function is inspired by an idea from Mathieu Desnoyers.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://lore.kernel.org/20260311185409.1988269-1-mathieu.desnoyers@efficios.com
--
V2: Provide different entry points	- Florian
    Use __u32 and __x86_64__		- Thomas
    Use private labels			- Thomas
    Optimize assembly		   	- Uros
    
    Split the functions up now that ranges are supported in the core and
    document the actual assembly.
---
 arch/x86/Kconfig                         |    1 
 arch/x86/entry/vdso/common/vfutex.c      |   76 +++++++++++++++++++++++++++++++
 arch/x86/entry/vdso/vdso32/Makefile      |    5 +-
 arch/x86/entry/vdso/vdso32/vdso32.lds.S  |    3 +
 arch/x86/entry/vdso/vdso32/vfutex.c      |    1 
 arch/x86/entry/vdso/vdso64/Makefile      |    7 +-
 arch/x86/entry/vdso/vdso64/vdso64.lds.S  |    4 +
 arch/x86/entry/vdso/vdso64/vdsox32.lds.S |    4 +
 arch/x86/entry/vdso/vdso64/vfutex.c      |    1 
 arch/x86/include/asm/futex_robust.h      |   19 +++++++
 10 files changed, 116 insertions(+), 5 deletions(-)

--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -238,6 +238,7 @@ config X86
 	select HAVE_EFFICIENT_UNALIGNED_ACCESS
 	select HAVE_EISA			if X86_32
 	select HAVE_EXIT_THREAD
+	select HAVE_FUTEX_ROBUST_UNLOCK
 	select HAVE_GENERIC_TIF_BITS
 	select HAVE_GUP_FAST
 	select HAVE_FENTRY			if X86_64 || DYNAMIC_FTRACE
--- /dev/null
+++ b/arch/x86/entry/vdso/common/vfutex.c
@@ -0,0 +1,76 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include <vdso/futex.h>
+
+/*
+ * Assembly template for the try unlock functions. The basic functionality is:
+ *
+ *		mov		%esi, %eax	Move the TID into EAX
+ *		xor		%ecx, %ecx	Clear ECX
+ *		lock cmpxchgl	%ecx, (%rdi)	Attempt the TID -> 0 transition
+ * .Lcs_start:					Start of the critical section
+ *		jnz		.Lcs_end	If cmpxchg failed, jump to the end
+ * .Lcs_success:				Start of the success section
+ *		movq		%rcx, (%rdx)	Set the pending op pointer to 0
+ * .Lcs_end:					End of the critical section
+ *
+ * .Lcs_start and .Lcs_end establish the critical section range. .Lcs_success is
+ * technically not required, but is there for illustration, debugging and testing.
+ *
+ * When CONFIG_COMPAT is enabled, the 64-bit VDSO provides two functions.
+ * One for the regular 64-bit sized pending operation pointer and one for a
+ * 32-bit sized pointer to support gaming emulators.
+ *
+ * The 32-bit VDSO provides only the one for 32-bit sized pointers.
+ */
+#define __stringify_1(x...)	#x
+#define __stringify(x...)	__stringify_1(x)
+
+#define LABEL(name, which)	__stringify(name##_futex_try_unlock_cs_##which:)
+
+#define JNZ_END(name)		"jnz " __stringify(name) "_futex_try_unlock_cs_end\n"
+
+#define CLEAR_POPQ		"movq	%[zero],  %a[pop]\n"
+#define CLEAR_POPL		"movl	%k[zero], %a[pop]\n"
+
+#define futex_robust_try_unlock(name, clear_pop, __lock, __tid, __pop)	\
+({									\
+	asm volatile (							\
+		"						\n"	\
+		"	lock cmpxchgl	%k[zero], %a[lock]	\n"	\
+		"						\n"	\
+		LABEL(name, start)					\
+		"						\n"	\
+		JNZ_END(name)						\
+		"						\n"	\
+		LABEL(name, success)					\
+		"						\n"	\
+			clear_pop					\
+		"						\n"	\
+		LABEL(name, end)					\
+		: [tid]   "+&a" (__tid)					\
+		: [lock]  "D"   (__lock),				\
+		  [pop]   "d"   (__pop),				\
+		  [zero]  "S"   (0UL)					\
+		: "memory"						\
+	);								\
+	__tid;								\
+})
+
+#ifdef __x86_64__
+__u32 __vdso_futex_robust_list64_try_unlock(__u32 *lock, __u32 tid, __u64 *pop)
+{
+	return futex_robust_try_unlock(x86_64, CLEAR_POPQ, lock, tid, pop);
+}
+
+#ifdef CONFIG_COMPAT
+__u32 __vdso_futex_robust_list32_try_unlock(__u32 *lock, __u32 tid, __u32 *pop)
+{
+	return futex_robust_try_unlock(x86_64_compat, CLEAR_POPL, lock, tid, pop);
+}
+#endif /* CONFIG_COMPAT */
+#else  /* __x86_64__ */
+__u32 __vdso_futex_robust_list32_try_unlock(__u32 *lock, __u32 tid, __u32 *pop)
+{
+	return futex_robust_try_unlock(x86_32, CLEAR_POPL, lock, tid, pop);
+}
+#endif /* !__x86_64__ */
--- a/arch/x86/entry/vdso/vdso32/Makefile
+++ b/arch/x86/entry/vdso/vdso32/Makefile
@@ -7,8 +7,9 @@
 vdsos-y			:= 32
 
 # Files to link into the vDSO:
-vobjs-y			:= note.o vclock_gettime.o vgetcpu.o
-vobjs-y			+= system_call.o sigreturn.o
+vobjs-y					:= note.o vclock_gettime.o vgetcpu.o
+vobjs-y					+= system_call.o sigreturn.o
+vobjs-$(CONFIG_FUTEX_ROBUST_UNLOCK)	+= vfutex.o
 
 # Compilation flags
 flags-y			:= -DBUILD_VDSO32 -m32 -mregparm=0
--- a/arch/x86/entry/vdso/vdso32/vdso32.lds.S
+++ b/arch/x86/entry/vdso/vdso32/vdso32.lds.S
@@ -30,6 +30,9 @@ VERSION
 		__vdso_clock_gettime64;
 		__vdso_clock_getres_time64;
 		__vdso_getcpu;
+#ifdef CONFIG_FUTEX_ROBUST_UNLOCK
+		__vdso_futex_robust_list32_try_unlock;
+#endif
 	};
 
 	LINUX_2.5 {
--- /dev/null
+++ b/arch/x86/entry/vdso/vdso32/vfutex.c
@@ -0,0 +1 @@
+#include "common/vfutex.c"
--- a/arch/x86/entry/vdso/vdso64/Makefile
+++ b/arch/x86/entry/vdso/vdso64/Makefile
@@ -8,9 +8,10 @@ vdsos-y				:= 64
 vdsos-$(CONFIG_X86_X32_ABI)	+= x32
 
 # Files to link into the vDSO:
-vobjs-y				:= note.o vclock_gettime.o vgetcpu.o
-vobjs-y				+= vgetrandom.o vgetrandom-chacha.o
-vobjs-$(CONFIG_X86_SGX)		+= vsgx.o
+vobjs-y					:= note.o vclock_gettime.o vgetcpu.o
+vobjs-y					+= vgetrandom.o vgetrandom-chacha.o
+vobjs-$(CONFIG_X86_SGX)			+= vsgx.o
+vobjs-$(CONFIG_FUTEX_ROBUST_UNLOCK)	+= vfutex.o
 
 # Compilation flags
 flags-y				:= -DBUILD_VDSO64 -m64 -mcmodel=small
--- a/arch/x86/entry/vdso/vdso64/vdso64.lds.S
+++ b/arch/x86/entry/vdso/vdso64/vdso64.lds.S
@@ -32,6 +32,10 @@ VERSION {
 #endif
 		getrandom;
 		__vdso_getrandom;
+#ifdef CONFIG_FUTEX_ROBUST_UNLOCK
+		__vdso_futex_robust_list64_try_unlock;
+		__vdso_futex_robust_list32_try_unlock;
+#endif
 	local: *;
 	};
 }
--- a/arch/x86/entry/vdso/vdso64/vdsox32.lds.S
+++ b/arch/x86/entry/vdso/vdso64/vdsox32.lds.S
@@ -22,6 +22,10 @@ VERSION {
 		__vdso_getcpu;
 		__vdso_time;
 		__vdso_clock_getres;
+#ifdef CONFIG_FUTEX_ROBUST_UNLOCK
+		__vdso_futex_robust_list64_try_unlock;
+		__vdso_futex_robust_list32_try_unlock;
+#endif
 	local: *;
 	};
 }
--- /dev/null
+++ b/arch/x86/entry/vdso/vdso64/vfutex.c
@@ -0,0 +1 @@
+#include "common/vfutex.c"
--- /dev/null
+++ b/arch/x86/include/asm/futex_robust.h
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_FUTEX_ROBUST_H
+#define _ASM_X86_FUTEX_ROBUST_H
+
+#include <asm/ptrace.h>
+
+static __always_inline void __user *x86_futex_robust_unlock_get_pop(struct pt_regs *regs)
+{
+	/*
+	 * If ZF is set then the cmpxchg succeeded and the pending op pointer
+	 * needs to be cleared.
+	 */
+	return regs->flags & X86_EFLAGS_ZF ? (void __user *)regs->dx : NULL;
+}
+
+#define arch_futex_robust_unlock_get_pop(regs)	\
+	x86_futex_robust_unlock_get_pop(regs)
+
+#endif /* _ASM_X86_FUTEX_ROBUST_H */


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [patch v2 11/11] x86/vdso: Implement __vdso_futex_robust_try_unlock()
  2026-03-19 23:25 ` [patch v2 11/11] x86/vdso: Implement __vdso_futex_robust_try_unlock() Thomas Gleixner
@ 2026-03-20  7:14   ` Uros Bizjak
  2026-03-20 12:48     ` Thomas Gleixner
  0 siblings, 1 reply; 35+ messages in thread
From: Uros Bizjak @ 2026-03-20  7:14 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Mathieu Desnoyers, André Almeida,
	Sebastian Andrzej Siewior, Carlos O'Donell, Peter Zijlstra,
	Florian Weimer, Rich Felker, Torvald Riegel, Darren Hart,
	Ingo Molnar, Davidlohr Bueso, Arnd Bergmann, Liam R . Howlett,
	Thomas Weißschuh

On Fri, Mar 20, 2026 at 12:25 AM Thomas Gleixner <tglx@kernel.org> wrote:

>         mov             %esi,%eax       // Load TID into EAX
>         xor             %ecx,%ecx       // Set ECX to 0
>         lock cmpxchg    %ecx,(%rdi)     // Try the TID -> 0 transition
>   .Lstart:
>         jnz             .Lend
>         movq            %rcx,(%rdx)     // Clear list_op_pending
>   .Lend:
>         ret

[...]

> + * Assembly template for the try unlock functions. The basic functionality is:
> + *
> + *             mov             %esi, %eax      Move the TID into EAX
> + *             xor             %ecx, %ecx      Clear ECX
> + *             lock cmpxchgl   %ecx, (%rdi)    Attempt the TID -> 0 transition
> + * .Lcs_start:                                 Start of the critical section
> + *             jnz             .Lcs_end        If cmpxchgl failed jump to the end
> + * .Lcs_success:                               Start of the success section
> + *             movq            %rcx, (%rdx)    Set the pending op pointer to 0
> + * .Lcs_end:                                   End of the critical section
> + *
> + * .Lcs_start and .Lcs_end establish the critical section range. .Lcs_success is
> + * technically not required, but there for illustration, debugging and testing.
> + *
> + * When CONFIG_COMPAT is enabled then the 64-bit VDSO provides two functions.
> + * One for the regular 64-bit sized pending operation pointer and one for a
> + * 32-bit sized pointer to support gaming emulators.
> + *
> + * The 32-bit VDSO provides only the one for 32-bit sized pointers.
> + */
> +#define __stringify_1(x...)    #x
> +#define __stringify(x...)      __stringify_1(x)
> +
> +#define LABEL(name, which)     __stringify(name##_futex_try_unlock_cs_##which:)
> +
> +#define JNZ_END(name)          "jnz " __stringify(name) "_futex_try_unlock_cs_end\n"
> +
> +#define CLEAR_POPQ             "movq   %[zero],  %a[pop]\n"
> +#define CLEAR_POPL             "movl   %k[zero], %a[pop]\n"
> +
> +#define futex_robust_try_unlock(name, clear_pop, __lock, __tid, __pop) \
> +({                                                                     \
> +       asm volatile (                                                  \
> +               "                                               \n"     \
> +               "       lock cmpxchgl   %k[zero], %a[lock]      \n"     \
> +               "                                               \n"     \
> +               LABEL(name, start)                                      \
> +               "                                               \n"     \
> +               JNZ_END(name)                                           \
> +               "                                               \n"     \
> +               LABEL(name, success)                                    \
> +               "                                               \n"     \
> +                       clear_pop                                       \
> +               "                                               \n"     \
> +               LABEL(name, end)                                        \
> +               : [tid]   "+&a" (__tid)                                 \
> +               : [lock]  "D"   (__lock),                               \
> +                 [pop]   "d"   (__pop),                                \
> +                 [zero]  "S"   (0UL)                                   \

[zero] represents an internal register, so the above constraint can be
"r" (*). If it remains a hard register constraint (%rsi) for some
reason then the above two comments should be updated to reflect the
new constraint.

With the constraint changed to "r":

Acked-by: Uros Bizjak <ubizjak@gmail.com> (for asm template)

(*) "r" allows the compiler some more freedom. The compiler tracks the
values in registers, so it can reuse zero from an unrelated register
without moving it to the %rsi and without clobbering the source
register. In non-trivial functions, there is a high chance that needed
value is already available in some register.

Uros.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [patch v2 08/11] futex: Add robust futex unlock IP range
  2026-03-19 23:24 ` [patch v2 08/11] futex: Add robust futex unlock IP range Thomas Gleixner
@ 2026-03-20  9:07   ` Peter Zijlstra
  2026-03-20 12:07     ` Thomas Gleixner
  2026-03-27 13:24   ` Sebastian Andrzej Siewior
  1 sibling, 1 reply; 35+ messages in thread
From: Peter Zijlstra @ 2026-03-20  9:07 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Mathieu Desnoyers, André Almeida,
	Sebastian Andrzej Siewior, Carlos O'Donell, Florian Weimer,
	Rich Felker, Torvald Riegel, Darren Hart, Ingo Molnar,
	Davidlohr Bueso, Arnd Bergmann, Liam R . Howlett, Uros Bizjak,
	Thomas Weißschuh

On Fri, Mar 20, 2026 at 12:24:46AM +0100, Thomas Gleixner wrote:
>  /**
> + * struct futex_unlock_cs_range - Range for the VDSO unlock critical section
> + * @start_ip:	The start IP of the robust futex unlock critical section (inclusive)
> + * @end_ip:	The end IP of the robust futex unlock critical section (exclusive)
> + * @pop_size32:	Pending OP pointer size indicator. 0 == 64-bit, 1 == 32-bit
> + */
> +struct futex_unlock_cs_range {
> +	unsigned long	       start_ip;
> +	unsigned long	       end_ip;
> +	unsigned int	       pop_size32;
> +};
> +
> +#define FUTEX_ROBUST_MAX_CS_RANGES	2

Would it make sense to write that like:

#define FUTEX_ROBUST_MAX_CS_RANGES (1+IS_ENABLED(CONFIG_COMPAT))

Given you only ever use that second entry when COMPAT?

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [patch v2 04/11] uaccess: Provide unsafe_atomic_store_release_user()
  2026-03-19 23:24 ` [patch v2 04/11] uaccess: Provide unsafe_atomic_store_release_user() Thomas Gleixner
@ 2026-03-20  9:11   ` Peter Zijlstra
  2026-03-20 12:38     ` Thomas Gleixner
  2026-03-20 16:07   ` André Almeida
  1 sibling, 1 reply; 35+ messages in thread
From: Peter Zijlstra @ 2026-03-20  9:11 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Mathieu Desnoyers, André Almeida,
	Sebastian Andrzej Siewior, Carlos O'Donell, Florian Weimer,
	Rich Felker, Torvald Riegel, Darren Hart, Ingo Molnar,
	Davidlohr Bueso, Arnd Bergmann, Liam R . Howlett, Uros Bizjak,
	Thomas Weißschuh

On Fri, Mar 20, 2026 at 12:24:25AM +0100, Thomas Gleixner wrote:

> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -403,6 +403,10 @@ config ARCH_32BIT_OFF_T
>  config ARCH_32BIT_USTAT_F_TINODE
>  	bool
>  
> +# Selected by architectures when plain stores have release semantics
> +config ARCH_STORE_IMPLIES_RELEASE
> +	bool
> +
>  config HAVE_ASM_MODVERSIONS
>  	bool
>  	help
> --- a/include/linux/uaccess.h
> +++ b/include/linux/uaccess.h
> @@ -644,6 +644,15 @@ static inline void user_access_restore(u
>  #define user_read_access_end user_access_end
>  #endif
>  
> +#ifndef unsafe_atomic_store_release_user
> +# define unsafe_atomic_store_release_user(val, uptr, elbl)		\
> +	do {								\
> +		if (!IS_ENABLED(CONFIG_ARCH_STORE_IMPLIES_RELEASE))	\
> +			smp_mb();					\
> +		unsafe_put_user(val, uptr, elbl);			\
> +	} while (0)
> +#endif
> +
>  /* Define RW variant so the below _mode macro expansion works */
>  #define masked_user_rw_access_begin(u)	masked_user_access_begin(u)
>  #define user_rw_access_begin(u, s)	user_access_begin(u, s)

Looking at this again after a sleep; does it make sense to rename this
config symbol to something like ARCH_MEMORY_ORDER_TSO or somesuch?

I mean, this is only going to be the 3 TSO architectures (x86, s390 and
sparc64) setting this anyway, might as well make a little more generic
config symbol for this.

OTOH, its easy enough to rename the config thing if it ever is needed
elsewhere I suppose.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [patch v2 08/11] futex: Add robust futex unlock IP range
  2026-03-20  9:07   ` Peter Zijlstra
@ 2026-03-20 12:07     ` Thomas Gleixner
  0 siblings, 0 replies; 35+ messages in thread
From: Thomas Gleixner @ 2026-03-20 12:07 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Mathieu Desnoyers, André Almeida,
	Sebastian Andrzej Siewior, Carlos O'Donell, Florian Weimer,
	Rich Felker, Torvald Riegel, Darren Hart, Ingo Molnar,
	Davidlohr Bueso, Arnd Bergmann, Liam R . Howlett, Uros Bizjak,
	Thomas Weißschuh

On Fri, Mar 20 2026 at 10:07, Peter Zijlstra wrote:

> On Fri, Mar 20, 2026 at 12:24:46AM +0100, Thomas Gleixner wrote:
>>  /**
>> + * struct futex_unlock_cs_range - Range for the VDSO unlock critical section
>> + * @start_ip:	The start IP of the robust futex unlock critical section (inclusive)
>> + * @end_ip:	The end IP of the robust futex unlock critical section (exclusive)
>> + * @pop_size32:	Pending OP pointer size indicator. 0 == 64-bit, 1 == 32-bit
>> + */
>> +struct futex_unlock_cs_range {
>> +	unsigned long	       start_ip;
>> +	unsigned long	       end_ip;
>> +	unsigned int	       pop_size32;
>> +};
>> +
>> +#define FUTEX_ROBUST_MAX_CS_RANGES	2
>
> Would it make sense to write that like:
>
> #define FUTEX_ROBUST_MAX_CS_RANGES (1+IS_ENABLED(CONFIG_COMPAT))
>
> Given you only ever use that second entry when COMPAT?

Indeed!

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [patch v2 04/11] uaccess: Provide unsafe_atomic_store_release_user()
  2026-03-20  9:11   ` Peter Zijlstra
@ 2026-03-20 12:38     ` Thomas Gleixner
  0 siblings, 0 replies; 35+ messages in thread
From: Thomas Gleixner @ 2026-03-20 12:38 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Mathieu Desnoyers, André Almeida,
	Sebastian Andrzej Siewior, Carlos O'Donell, Florian Weimer,
	Rich Felker, Torvald Riegel, Darren Hart, Ingo Molnar,
	Davidlohr Bueso, Arnd Bergmann, Liam R . Howlett, Uros Bizjak,
	Thomas Weißschuh

On Fri, Mar 20 2026 at 10:11, Peter Zijlstra wrote:
> On Fri, Mar 20, 2026 at 12:24:25AM +0100, Thomas Gleixner wrote:
>>  /* Define RW variant so the below _mode macro expansion works */
>>  #define masked_user_rw_access_begin(u)	masked_user_access_begin(u)
>>  #define user_rw_access_begin(u, s)	user_access_begin(u, s)
>
> Looking at this again after a sleep; does it make sense to rename this
> config symbol to something like ARCH_MEMORY_ORDER_TSO or somesuch?
>
> I mean, this is only going to be the 3 TSO architectures (x86, s390 and
> sparc64) setting this anyway, might as well make a little more generic
> config symbol for this.
>
> OTOH, its easy enough to rename the config thing if it ever is needed
> elsewhere I suppose.

Nah. TSO makes sense and is more useful.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [patch v2 11/11] x86/vdso: Implement __vdso_futex_robust_try_unlock()
  2026-03-20  7:14   ` Uros Bizjak
@ 2026-03-20 12:48     ` Thomas Gleixner
  0 siblings, 0 replies; 35+ messages in thread
From: Thomas Gleixner @ 2026-03-20 12:48 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: LKML, Mathieu Desnoyers, André Almeida,
	Sebastian Andrzej Siewior, Carlos O'Donell, Peter Zijlstra,
	Florian Weimer, Rich Felker, Torvald Riegel, Darren Hart,
	Ingo Molnar, Davidlohr Bueso, Arnd Bergmann, Liam R . Howlett,
	Thomas Weißschuh

On Fri, Mar 20 2026 at 08:14, Uros Bizjak wrote:
> On Fri, Mar 20, 2026 at 12:25 AM Thomas Gleixner <tglx@kernel.org> wrote:
>> +                 [zero]  "S"   (0UL)                                   \
>
> [zero] represents an internal register, so the above constraint can be
> "r" (*). If it remains a hard register constraint (%rsi) for some
> reason then the above two comments should be updated to reflect the
> new constraint.

Right. That's a leftover from some earlier experiment where I needed the
'zero' register at a fixed place, but that's all gone.

> With the constraint changed to "r":
>
> Acked-by: Uros Bizjak <ubizjak@gmail.com> (for asm template)
>
> (*) "r" allows the compiler some more freedom. The compiler tracks the
> values in registers, so it can reuse zero from an unrelated register
> without moving it to the %rsi and without clobbering the source
> register. In non-trivial functions, there is a high chance that needed
> value is already available in some register.

I'm aware of that.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [patch v2 09/11] futex: Provide infrastructure to plug the non contended robust futex unlock race
  2026-03-19 23:24 ` [patch v2 09/11] futex: Provide infrastructure to plug the non contended robust futex unlock race Thomas Gleixner
@ 2026-03-20 13:35   ` Thomas Gleixner
  0 siblings, 0 replies; 35+ messages in thread
From: Thomas Gleixner @ 2026-03-20 13:35 UTC (permalink / raw)
  To: LKML
  Cc: Mathieu Desnoyers, André Almeida, Sebastian Andrzej Siewior,
	Carlos O'Donell, Peter Zijlstra, Florian Weimer, Rich Felker,
	Torvald Riegel, Darren Hart, Ingo Molnar, Davidlohr Bueso,
	Arnd Bergmann, Liam R . Howlett, Uros Bizjak,
	Thomas Weißschuh

On Fri, Mar 20 2026 at 00:24, Thomas Gleixner wrote:
> +#ifdef CONFIG_FUTEX_ROBUST_UNLOCK
> +void __futex_fixup_robust_unlock(struct pt_regs *regs, struct futex_unlock_cs_range *csr)
> +{
> +	void __user *pop = arch_futex_robust_unlock_get_pop(regs);
> +
> +	if (!pop)
> +		return;
> +
> +	futex_robust_list_clear_pending(pop, csr->cs_pop_size32 ? FLAGS_ROBUST_LIST32 : 0);

That needs to be 

	futex_robust_list_clear_pending(pop, csr->pop_size32 ? FLAGS_ROBUST_LIST32 : 0);

Somehow that fixup was lost on the devel machine. Fixed in the git tree.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [patch v2 01/11] futex: Move futex task related data into a struct
  2026-03-19 23:24 ` [patch v2 01/11] futex: Move futex task related data into a struct Thomas Gleixner
@ 2026-03-20 14:59   ` André Almeida
  0 siblings, 0 replies; 35+ messages in thread
From: André Almeida @ 2026-03-20 14:59 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Mathieu Desnoyers, Sebastian Andrzej Siewior, Carlos O'Donell,
	LKML, Peter Zijlstra, Florian Weimer, Rich Felker, Torvald Riegel,
	Darren Hart, Ingo Molnar, Davidlohr Bueso, Arnd Bergmann,
	Liam R . Howlett, Uros Bizjak, Thomas Weißschuh

Em 19/03/2026 20:24, Thomas Gleixner escreveu:
> Having all these members in task_struct along with the required #ifdeffery
> is annoying, does not allow efficient initializing of the data with
> memset() and makes extending it tedious.
> 
> Move it into a data structure and fix up all usage sites.
> 
> Signed-off-by: Thomas Gleixner <tglx@kernel.org>

Reviewed-by: André Almeida <andrealmeid@igalia.com>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [patch v2 02/11] futex: Move futex related mm_struct data into a struct
  2026-03-19 23:24 ` [patch v2 02/11] futex: Move futex related mm_struct " Thomas Gleixner
@ 2026-03-20 15:00   ` André Almeida
  0 siblings, 0 replies; 35+ messages in thread
From: André Almeida @ 2026-03-20 15:00 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Mathieu Desnoyers, Sebastian Andrzej Siewior, Carlos O'Donell,
	Peter Zijlstra, Florian Weimer, Rich Felker, LKML, Torvald Riegel,
	Darren Hart, Ingo Molnar, Davidlohr Bueso, Arnd Bergmann,
	Liam R . Howlett, Uros Bizjak, Thomas Weißschuh

Em 19/03/2026 20:24, Thomas Gleixner escreveu:
> Having all these members in mm_struct along with the required #ifdeffery is
> annoying, does not allow efficient initializing of the data with
> memset() and makes extending it tedious.
> 
> Move it into a data structure and fix up all usage sites.
> 
> Signed-off-by: Thomas Gleixner <tglx@kernel.org>

Reviewed-by: André Almeida <andrealmeid@igalia.com>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [patch v2 03/11] futex: Provide UABI defines for robust list entry modifiers
  2026-03-19 23:24 ` [patch v2 03/11] futex: Provide UABI defines for robust list entry modifiers Thomas Gleixner
@ 2026-03-20 15:01   ` André Almeida
  0 siblings, 0 replies; 35+ messages in thread
From: André Almeida @ 2026-03-20 15:01 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Mathieu Desnoyers, Sebastian Andrzej Siewior, Carlos O'Donell,
	Peter Zijlstra, Florian Weimer, Rich Felker, Torvald Riegel,
	Darren Hart, Ingo Molnar, LKML, Davidlohr Bueso, Arnd Bergmann,
	Liam R . Howlett, Uros Bizjak, Thomas Weißschuh

Em 19/03/2026 20:24, Thomas Gleixner escreveu:
> The marker for PI futexes in the robust list is a hardcoded 0x1 which lacks
> any sensible form of documentation.
> 
> Provide proper defines for the bit and the mask and fix up the usage
> sites. Thereby convert the boolean pi argument into a modifier argument,
> which allows new modifier bits to be trivially added and conveyed.
> 
> Signed-off-by: Thomas Gleixner <tglx@kernel.org>

Reviewed-by: André Almeida <andrealmeid@igalia.com>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [patch v2 04/11] uaccess: Provide unsafe_atomic_store_release_user()
  2026-03-19 23:24 ` [patch v2 04/11] uaccess: Provide unsafe_atomic_store_release_user() Thomas Gleixner
  2026-03-20  9:11   ` Peter Zijlstra
@ 2026-03-20 16:07   ` André Almeida
  1 sibling, 0 replies; 35+ messages in thread
From: André Almeida @ 2026-03-20 16:07 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Mathieu Desnoyers, Sebastian Andrzej Siewior, Carlos O'Donell,
	Peter Zijlstra, Florian Weimer, Rich Felker, LKML, Torvald Riegel,
	Darren Hart, Ingo Molnar, Davidlohr Bueso, Arnd Bergmann,
	Liam R . Howlett, Uros Bizjak, Thomas Weißschuh

Em 19/03/2026 20:24, Thomas Gleixner escreveu:
> The upcoming support for unlocking robust futexes in the kernel requires
> store release semantics. Syscalls do not imply memory ordering on all
> architectures so the unlock operation requires a barrier.
> 
> This barrier can be avoided when stores imply release like on x86.
> 
> Provide a generic version with a smp_mb() before the unsafe_put_user(),
> which can be overridden by architectures.
> 
> Provide also a ARCH_STORE_IMPLIES_RELEASE Kconfig option, which can be
> selected by architectures where store implies release, so that the smp_mb()
> in the generic implementation can be avoided.
> 
> Signed-off-by: Thomas Gleixner <tglx@kernel.org>

Reviewed-by: André Almeida <andrealmeid@igalia.com>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [patch v2 05/11] x86: Select ARCH_STORE_IMPLIES_RELEASE
  2026-03-19 23:24 ` [patch v2 05/11] x86: Select ARCH_STORE_IMPLIES_RELEASE Thomas Gleixner
@ 2026-03-20 16:08   ` André Almeida
  0 siblings, 0 replies; 35+ messages in thread
From: André Almeida @ 2026-03-20 16:08 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Mathieu Desnoyers, Sebastian Andrzej Siewior, Carlos O'Donell,
	Peter Zijlstra, Florian Weimer, Rich Felker, LKML, Torvald Riegel,
	Darren Hart, Ingo Molnar, Davidlohr Bueso, Arnd Bergmann,
	Liam R . Howlett, Uros Bizjak, Thomas Weißschuh

Em 19/03/2026 20:24, Thomas Gleixner escreveu:
> The generic unsafe_atomic_store_release_user() implementation does:
> 
>      if (!IS_ENABLED(CONFIG_ARCH_STORE_IMPLIES_RELEASE))
>          smp_mb();
>      unsafe_put_user();
> 
> As stores on x86 imply release, select ARCH_STORE_IMPLIES_RELEASE to avoid
> the unnecessary smp_mb().
> 
> Signed-off-by: Thomas Gleixner <tglx@kernel.org>

Reviewed-by: André Almeida <andrealmeid@igalia.com>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [patch v2 06/11] futex: Cleanup UAPI defines
  2026-03-19 23:24 ` [patch v2 06/11] futex: Cleanup UAPI defines Thomas Gleixner
@ 2026-03-20 16:09   ` André Almeida
  0 siblings, 0 replies; 35+ messages in thread
From: André Almeida @ 2026-03-20 16:09 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Mathieu Desnoyers, Sebastian Andrzej Siewior, Carlos O'Donell,
	Peter Zijlstra, Florian Weimer, Rich Felker, LKML, Torvald Riegel,
	Darren Hart, Ingo Molnar, Davidlohr Bueso, Arnd Bergmann,
	Liam R . Howlett, Uros Bizjak, Thomas Weißschuh

Em 19/03/2026 20:24, Thomas Gleixner escreveu:
> Make the operand defines tabular for readability sake.
> 
> Signed-off-by: Thomas Gleixner <tglx@kernel.org>

Reviewed-by: André Almeida <andrealmeid@igalia.com>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [patch v2 07/11] futex: Add support for unlocking robust futexes
  2026-03-19 23:24 ` [patch v2 07/11] futex: Add support for unlocking robust futexes Thomas Gleixner
@ 2026-03-20 17:14   ` André Almeida
  2026-03-26 22:23     ` Thomas Gleixner
  0 siblings, 1 reply; 35+ messages in thread
From: André Almeida @ 2026-03-20 17:14 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Mathieu Desnoyers, Sebastian Andrzej Siewior, LKML,
	Carlos O'Donell, Peter Zijlstra, Florian Weimer, Rich Felker,
	Torvald Riegel, Darren Hart, Ingo Molnar, Davidlohr Bueso,
	Arnd Bergmann, Liam R . Howlett, Uros Bizjak,
	Thomas Weißschuh

Em 19/03/2026 20:24, Thomas Gleixner escreveu:
> 
> The sys_futex() case [ab]uses the @uaddr2 argument to hand the pointer to
> the kernel. This argument is only evaluated when the FUTEX_ROBUST_UNLOCK
> bit is set and is therefore backward compatible.
> 
I didn't find anything in the commit message that says what this 
pointer points to, so I would add:

"@uaddr2 argument to hand the address of robust list pending op to the 
kernel"

and also explain why we can't use 
current->futex.robust_list->list_op_pending (if I understood correctly 
why):

"Instead of using the list_op_pending address found at 
current->futex.robust_list, use the address explicitly set by the user 
in the syscall arguments to avoid racing with set_robust_list()"


Anyway, this is

Reviewed-by: André Almeida <andrealmeid@igalia.com>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [patch v2 00/11] futex: Address the robust futex unlock race for real
  2026-03-19 23:24 [patch v2 00/11] futex: Address the robust futex unlock race for real Thomas Gleixner
                   ` (10 preceding siblings ...)
  2026-03-19 23:25 ` [patch v2 11/11] x86/vdso: Implement __vdso_futex_robust_try_unlock() Thomas Gleixner
@ 2026-03-26 21:59 ` Thomas Gleixner
  2026-03-26 22:08   ` Rich Felker
  11 siblings, 1 reply; 35+ messages in thread
From: Thomas Gleixner @ 2026-03-26 21:59 UTC (permalink / raw)
  To: LKML
  Cc: Mathieu Desnoyers, André Almeida, Sebastian Andrzej Siewior,
	Carlos O'Donell, Peter Zijlstra, Florian Weimer, Rich Felker,
	Torvald Riegel, Darren Hart, Ingo Molnar, Davidlohr Bueso,
	Arnd Bergmann, Liam R . Howlett, Uros Bizjak,
	Thomas Weißschuh

On Fri, Mar 20 2026 at 00:24, Thomas Gleixner wrote:
> If the functionality itself is agreed on we only need to agree on the names
> and signatures of the functions exposed through the VDSO before we set them
> in stone. That will hopefully not take another 15 years :)

Have the libc folks any further opinion on the syscall and the vDSO part
before I prepare v3?

Thanks,

        tglx


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [patch v2 00/11] futex: Address the robust futex unlock race for real
  2026-03-26 21:59 ` [patch v2 00/11] futex: Address the robust futex unlock race for real Thomas Gleixner
@ 2026-03-26 22:08   ` Rich Felker
  2026-03-27  3:42     ` André Almeida
  0 siblings, 1 reply; 35+ messages in thread
From: Rich Felker @ 2026-03-26 22:08 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Mathieu Desnoyers, André Almeida,
	Sebastian Andrzej Siewior, Carlos O'Donell, Peter Zijlstra,
	Florian Weimer, Torvald Riegel, Darren Hart, Ingo Molnar,
	Davidlohr Bueso, Arnd Bergmann, Liam R . Howlett, Uros Bizjak,
	Thomas Weißschuh

On Thu, Mar 26, 2026 at 10:59:20PM +0100, Thomas Gleixner wrote:
> On Fri, Mar 20 2026 at 00:24, Thomas Gleixner wrote:
> > If the functionality itself is agreed on we only need to agree on the names
> > and signatures of the functions exposed through the VDSO before we set them
> > in stone. That will hopefully not take another 15 years :)
> 
> Have the libc folks any further opinion on the syscall and the vDSO part
> before I prepare v3?

This whole conversation has been way too much for me to keep up with,
so I'm not sure where it's at right now.

From musl's perspective, the way we make robust mutex unlocking safe
right now is by inhibiting munmap/mremap/MAP_FIXED and
pthread_mutex_destroy while there are any in-flight robust unlocks. It
will be nice to be able to conditionally stop doing that if vdso is
available, but I can't see using a fallback that requires a syscall,
as that would just be a lot more expensive than what we're doing right
now and still not work on older kernels. So I think the only part
we're interested in is the fully-userspace approach in vdso.

If it sounds like I have a misconception of the current state of this
proposal from what I said above, let me know and I'll try to figure
out what I'm missing and catch up.

Rich

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [patch v2 07/11] futex: Add support for unlocking robust futexes
  2026-03-20 17:14   ` André Almeida
@ 2026-03-26 22:23     ` Thomas Gleixner
  2026-03-27  0:48       ` André Almeida
  0 siblings, 1 reply; 35+ messages in thread
From: Thomas Gleixner @ 2026-03-26 22:23 UTC (permalink / raw)
  To: André Almeida
  Cc: Mathieu Desnoyers, Sebastian Andrzej Siewior, LKML,
	Carlos O'Donell, Peter Zijlstra, Florian Weimer, Rich Felker,
	Torvald Riegel, Darren Hart, Ingo Molnar, Davidlohr Bueso,
	Arnd Bergmann, Liam R . Howlett, Uros Bizjak,
	Thomas Weißschuh

On Fri, Mar 20 2026 at 14:14, André Almeida wrote:
> Em 19/03/2026 20:24, Thomas Gleixner escreveu:
>> 
>> The sys_futex() case [ab]uses the @uaddr2 argument to hand the pointer to
>> the kernel. This argument is only evaluated when the FUTEX_ROBUST_UNLOCK
>> bit is set and is therefore backward compatible.
>> 
> I didn't find anywhere in the commit message that says what this 
> pointers points to, so I would add:
>
> "@uaddr2 argument to hand the address of robust list pending op to the 
> kernel"

Right.

> and also explain why we can't use 
> current->futex.robust_list->list_op_pending (if I understood it 
> correctly why):
>
> "Instead of using the list_op_pending address found at 
> current->futex.robust_list, use the address explicitly set by the user 
> in the syscall arguments to avoid racing with set_robust_list()"

No. The task can't be in the futex syscall and update the robust list
pointer concurrently. :)

The reason is to avoid the lookup of the robust list pointer and
retrieving the pending op pointer. User space has the pointer already
there so it can just put it into the @uaddr2 argument. Aside from that
this allows the usage of multiple robust lists in the future w/o any
changes to the internal functions as they just operate on the supplied
pointer. Only in the unresolvable fault case does the kernel check whether
there is a matching robust list registered and clear it to avoid
further trouble.

I'll amend the change log.

Thanks,

        tglx


* Re: [patch v2 07/11] futex: Add support for unlocking robust futexes
  2026-03-26 22:23     ` Thomas Gleixner
@ 2026-03-27  0:48       ` André Almeida
  0 siblings, 0 replies; 35+ messages in thread
From: André Almeida @ 2026-03-27  0:48 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Mathieu Desnoyers, Sebastian Andrzej Siewior, LKML,
	Carlos O'Donell, Peter Zijlstra, Florian Weimer, Rich Felker,
	Torvald Riegel, Darren Hart, Ingo Molnar, Davidlohr Bueso,
	Arnd Bergmann, Liam R . Howlett, Uros Bizjak,
	Thomas Weißschuh

Em 26/03/2026 19:23, Thomas Gleixner escreveu:
> On Fri, Mar 20 2026 at 14:14, André Almeida wrote:
>> Em 19/03/2026 20:24, Thomas Gleixner escreveu:
>>>
>>> The sys_futex() case [ab]uses the @uaddr2 argument to hand the pointer to
>>> the kernel. This argument is only evaluated when the FUTEX_ROBUST_UNLOCK
>>> bit is set and is therefore backward compatible.
>>>
>> I didn't find anywhere in the commit message that says what this
>> pointer points to, so I would add:
>>
>> "@uaddr2 argument to hand the address of robust list pending op to the
>> kernel"
> 
> Right.
> 
>> and also explain why we can't use
>> current->futex.robust_list->list_op_pending (if I understood it
>> correctly why):
>>
>> "Instead of using the list_op_pending address found at
>> current->futex.robust_list, use the address explicitly set by the user
>> in the syscall arguments to avoid racing with set_robust_list()"
> 
> No. The task can't be in the futex syscall and update the robust list
> pointer concurrently. :)
> 

Oh, that's right...

> The reason is to avoid the lookup of the robust list pointer and
> retrieving the pending op pointer. User space has the pointer already
> there so it can just put it into the @uaddr2 argument. Aside from that
> this allows the usage of multiple robust lists in the future w/o any
> changes to the internal functions as they just operate on the supplied
> pointer. Only in the unresolvable fault case does the kernel check whether
> there is a matching robust list registered and clear it to avoid
> further trouble.
> 

Ok, makes sense!

> I'll amend the change log.
> 

Thanks for the clarification.

> Thanks,
> 
>          tglx



* Re: [patch v2 00/11] futex: Address the robust futex unlock race for real
  2026-03-26 22:08   ` Rich Felker
@ 2026-03-27  3:42     ` André Almeida
  2026-03-27 10:08       ` Thomas Gleixner
  2026-03-27 16:50       ` Rich Felker
  0 siblings, 2 replies; 35+ messages in thread
From: André Almeida @ 2026-03-27  3:42 UTC (permalink / raw)
  To: Rich Felker
  Cc: LKML, Mathieu Desnoyers, Sebastian Andrzej Siewior,
	Carlos O'Donell, Peter Zijlstra, Florian Weimer,
	Torvald Riegel, Darren Hart, Thomas Gleixner, Ingo Molnar,
	Davidlohr Bueso, Arnd Bergmann, Liam R . Howlett, Uros Bizjak,
	Thomas Weißschuh

Em 26/03/2026 19:08, Rich Felker escreveu:
> On Thu, Mar 26, 2026 at 10:59:20PM +0100, Thomas Gleixner wrote:
>> On Fri, Mar 20 2026 at 00:24, Thomas Gleixner wrote:
>>> If the functionality itself is agreed on we only need to agree on the names
>>> and signatures of the functions exposed through the VDSO before we set them
>>> in stone. That will hopefully not take another 15 years :)
>>
>> Have the libc folks any further opinion on the syscall and the vDSO part
>> before I prepare v3?
> 
> This whole conversation has been way too much for me to keep up with,
> so I'm not sure where it's at right now.
> 
>  From musl's perspective, the way we make robust mutex unlocking safe
> right now is by inhibiting munmap/mremap/MAP_FIXED and
> pthread_mutex_destroy while there are any in-flight robust unlocks. It
> will be nice to be able to conditionally stop doing that if vdso is
> available, but I can't see using a fallback that requires a syscall,
> as that would just be a lot more expensive than what we're doing right
> now and still not work on older kernels. So I think the only part
> we're interested in is the fully-userspace approach in vdso.
> 

You just need the syscall for the contended case (where you would need a
syscall anyway for a FUTEX_WAKE).

As Thomas wrote in patch 09/11:

   The resulting code sequence for user space is:

   if (__vdso_futex_robust_list$SZ_try_unlock(lock, tid, &pending_op) != tid)
  	err = sys_futex($OP | FUTEX_ROBUST_UNLOCK,....);

   Both the VDSO unlock and the kernel side unlock ensure that the
   pending_op pointer is always cleared when the lock becomes unlocked.


So you call the vDSO first. If it fails, it means that the lock is
contended and you need to call futex(). It will wake a waiter, release
the lock and clear list_op_pending.

> If it sounds like I have a misconception of the current state of this
> proposal from what I said above, let me know and I'll try to figure
> out what I'm missing and catch up.
> 
> Rich



* Re: [patch v2 00/11] futex: Address the robust futex unlock race for real
  2026-03-27  3:42     ` André Almeida
@ 2026-03-27 10:08       ` Thomas Gleixner
  2026-03-27 16:50       ` Rich Felker
  1 sibling, 0 replies; 35+ messages in thread
From: Thomas Gleixner @ 2026-03-27 10:08 UTC (permalink / raw)
  To: André Almeida, Rich Felker
  Cc: LKML, Mathieu Desnoyers, Sebastian Andrzej Siewior,
	Carlos O'Donell, Peter Zijlstra, Florian Weimer,
	Torvald Riegel, Darren Hart, Ingo Molnar, Davidlohr Bueso,
	Arnd Bergmann, Liam R . Howlett, Uros Bizjak,
	Thomas Weißschuh

On Fri, Mar 27 2026 at 00:42, André Almeida wrote:
> Em 26/03/2026 19:08, Rich Felker escreveu:
>> On Thu, Mar 26, 2026 at 10:59:20PM +0100, Thomas Gleixner wrote:
>>> On Fri, Mar 20 2026 at 00:24, Thomas Gleixner wrote:
>>>> If the functionality itself is agreed on we only need to agree on the names
>>>> and signatures of the functions exposed through the VDSO before we set them
>>>> in stone. That will hopefully not take another 15 years :)
>>>
>>> Have the libc folks any further opinion on the syscall and the vDSO part
>>> before I prepare v3?
>> 
>> This whole conversation has been way too much for me to keep up with,
>> so I'm not sure where it's at right now.
>> 
>>  From musl's perspective, the way we make robust mutex unlocking safe
>> right now is by inhibiting munmap/mremap/MAP_FIXED and
>> pthread_mutex_destroy while there are any in-flight robust unlocks. It
>> will be nice to be able to conditionally stop doing that if vdso is
>> available, but I can't see using a fallback that requires a syscall,
>> as that would just be a lot more expensive than what we're doing right
>> now and still not work on older kernels. So I think the only part
>> we're interested in is the fully-userspace approach in vdso.
>> 
>
> You just need the syscall for the contended case (where you would need a
> syscall anyway for a FUTEX_WAKE).
>
> As Thomas wrote in patch 09/11:
>
>    The resulting code sequence for user space is:
>
>    if (__vdso_futex_robust_list$SZ_try_unlock(lock, tid, &pending_op) != tid)
>   	err = sys_futex($OP | FUTEX_ROBUST_UNLOCK,....);
>
>    Both the VDSO unlock and the kernel side unlock ensure that the 
> pending_op pointer is always cleared when the lock becomes unlocked.
>
>
> So you call the vDSO first. If it fails, it means that the lock is
> contended and you need to call futex(). It will wake a waiter, release
> the lock and clear list_op_pending.

See also the V1 cover letter which has a full deep dive:

     https://lore.kernel.org/20260316162316.356674433@kernel.org

TLDR:

The problem can be split into two issues:

    1) Contended unlock

    2) Uncontended unlock

#1 is solved by moving the unlock into the kernel instead of unlocking
   first and then invoking the syscall to wake waiters. The syscall
   takes the list_op_pending pointer as an argument and after unlocking,
   i.e. *lock = 0, it clears the list_op_pending pointer.

   For this to work, it needs to use try_cmpxchg() like PI unlock does.

#2 The race is between the successful try_cmpxchg() and the clearing of
   the list_op_pending pointer.

   That's where the VDSO comes into play. Instead of having the
   try_cmpxchg() in the library code the library invokes the VDSO
   provided variant. That allows the kernel to check in the signal
   delivery path whether a successful unlock requires a helping hand to
   clear the list pending op pointer. If the interrupted IP is in the
   critical section _and_ the try_cmpxchg() succeeded then the kernel
   clears the pointer.

   In x86 ASM:

   0000000000001590 <__vdso_futex_robust_list64_try_unlock@@LINUX_2.6>:
    1590:  mov    %esi,%eax
    1592:  xor    %ecx,%ecx
    1594:  lock cmpxchg %ecx,(%rdi)    // Result goes into ZF
    1598:  jne    159d               <- CS start    
    159a:  mov    %rcx,(%rdx)          // Clear list_op_pending
    159d:  ret                       <- CS end
    159e:  xchg   %ax,%ax

   So if the kernel observes

         IP >= CS start && IP < CS end

   then it checks the ZF flag in pt_regs and if set it clears the
   list_op_pending pointer.

Obviously #1 depends on #2 to close all holes.

Thanks,

        tglx


* Re: [patch v2 08/11] futex: Add robust futex unlock IP range
  2026-03-19 23:24 ` [patch v2 08/11] futex: Add robust futex unlock IP range Thomas Gleixner
  2026-03-20  9:07   ` Peter Zijlstra
@ 2026-03-27 13:24   ` Sebastian Andrzej Siewior
  2026-03-27 16:19     ` Thomas Gleixner
  1 sibling, 1 reply; 35+ messages in thread
From: Sebastian Andrzej Siewior @ 2026-03-27 13:24 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Mathieu Desnoyers, André Almeida, Carlos O'Donell,
	Peter Zijlstra, Florian Weimer, Rich Felker, Torvald Riegel,
	Darren Hart, Ingo Molnar, Davidlohr Bueso, Arnd Bergmann,
	Liam R . Howlett, Uros Bizjak, Thomas Weißschuh

On 2026-03-20 00:24:46 [+0100], Thomas Gleixner wrote:
> --- a/include/linux/futex_types.h
> +++ b/include/linux/futex_types.h
> @@ -31,6 +31,20 @@ struct futex_sched_data {
>  
> +struct futex_unlock_cs_range {
> +	unsigned long	       start_ip;
> +	unsigned long	       end_ip;
> +	unsigned int	       pop_size32;
> +};
> +
> +#define FUTEX_ROBUST_MAX_CS_RANGES	2
> @@ -50,6 +68,10 @@ struct futex_mm_data {
>  	atomic_long_t			phash_atomic;
>  	unsigned int			__percpu *phash_ref;
>  #endif
> +#ifdef CONFIG_FUTEX_ROBUST_UNLOCK
> +	unsigned int			unlock_cs_num_ranges;
> +	struct futex_unlock_cs_range	unlock_cs_ranges[FUTEX_ROBUST_MAX_CS_RANGES];
> +#endif
>  };

While looking at this from an economic point of view, we get:
|        unsigned int *             phash_ref;            /*    80     8 */
|        unsigned int               unlock_cs_num_ranges; /*    88     4 */
|
|        /* XXX 4 bytes hole, try to pack */
|
|        struct futex_unlock_cs_range unlock_cs_ranges[2]; /*    96    48 */
|}
|struct futex_unlock_cs_range {
|        long unsigned int          start_ip;             /*     0     8 */
|        long unsigned int          end_ip;               /*     8     8 */
|        unsigned int               pop_size32;           /*    16     4 */
|
|        /* size: 24, cachelines: 1, members: 3 */
|        /* padding: 4 */
|        /* last cacheline: 24 bytes */
|};

end_ip could be replaced with a u16 size. There is no need to have
pop_size32 as u32, it could be a u16 filling the gap.
On the other hand, pop_size32 could be passed by the caller since it is
known whether it is the first or the second member, i.e. the 64bit or 32bit case.

unlock_cs_num_ranges could probably go because if start_ip == NULL then
there is no mapping since it can't be mapped at 0x0. Worst case would be
to check two variables vs NULL.

And if we replace end_ip with size then we could remove it because the
vdso is known at compile time, so we should know the size up front.

Sebastian


* Re: [patch v2 08/11] futex: Add robust futex unlock IP range
  2026-03-27 13:24   ` Sebastian Andrzej Siewior
@ 2026-03-27 16:19     ` Thomas Gleixner
  0 siblings, 0 replies; 35+ messages in thread
From: Thomas Gleixner @ 2026-03-27 16:19 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: LKML, Mathieu Desnoyers, André Almeida, Carlos O'Donell,
	Peter Zijlstra, Florian Weimer, Rich Felker, Torvald Riegel,
	Darren Hart, Ingo Molnar, Davidlohr Bueso, Arnd Bergmann,
	Liam R . Howlett, Uros Bizjak, Thomas Weißschuh

On Fri, Mar 27 2026 at 14:24, Sebastian Andrzej Siewior wrote:
> On 2026-03-20 00:24:46 [+0100], Thomas Gleixner wrote:
>
> end_ip could be replaced with a u16 size. There is no need to have
> pop_size32 as u32, it could be a u16 filling the gap.
> On the other hand, pop_size32 could be passed by the caller since it is
> known if it is the first or the second member / the 64bit or 32bit case.

That's not a win because
    {
        unsigned long	start;
        u??		len;
    }

will always end up with a hole between the array entries.

> unlock_cs_num_ranges could probably go because if start_ip == NULL then
> there is no mapping since it can't be mapped at 0x0. Worst case would be
> to check two variables vs NULL.

That's correct, but it increases the costs for COMPAT as

       if (ip >= r[0].start && ip < r[0].start + r[0].end)
          // not taken
          return .....;
       if (ip >= r[1].start && ip < r[1].start + r[1].end)
          // not taken
          return .....;
        
IP is > 0, so it needs to do both checks with full evaluation. That can
be avoided by initializing the r[N].start with ~0UL, which means the first
check

        ip >= r[0].start

will be false.

> And if we replace end_ip with size then we could remove it because vdso
> is known at compile so we should know the size at compile time.

No, it's not because the VDSO is a user space build and the kernel
relies on vdso2c to convert it to a binary blob, which is mapped to user
space and the pre-/ap-pended container for extable, alternatives and
symbols the kernel needs. And no, the build dependency mess is big
enough already, no need to make it worse.

Thanks,

        tglx




* Re: [patch v2 00/11] futex: Address the robust futex unlock race for real
  2026-03-27  3:42     ` André Almeida
  2026-03-27 10:08       ` Thomas Gleixner
@ 2026-03-27 16:50       ` Rich Felker
  1 sibling, 0 replies; 35+ messages in thread
From: Rich Felker @ 2026-03-27 16:50 UTC (permalink / raw)
  To: André Almeida
  Cc: LKML, Mathieu Desnoyers, Sebastian Andrzej Siewior,
	Carlos O'Donell, Peter Zijlstra, Florian Weimer,
	Torvald Riegel, Darren Hart, Thomas Gleixner, Ingo Molnar,
	Davidlohr Bueso, Arnd Bergmann, Liam R . Howlett, Uros Bizjak,
	Thomas Weißschuh

On Fri, Mar 27, 2026 at 12:42:35AM -0300, André Almeida wrote:
> Em 26/03/2026 19:08, Rich Felker escreveu:
> > On Thu, Mar 26, 2026 at 10:59:20PM +0100, Thomas Gleixner wrote:
> > > On Fri, Mar 20 2026 at 00:24, Thomas Gleixner wrote:
> > > > If the functionality itself is agreed on we only need to agree on the names
> > > > and signatures of the functions exposed through the VDSO before we set them
> > > > in stone. That will hopefully not take another 15 years :)
> > > 
> > > Have the libc folks any further opinion on the syscall and the vDSO part
> > > before I prepare v3?
> > 
> > This whole conversation has been way too much for me to keep up with,
> > so I'm not sure where it's at right now.
> > 
> >  From musl's perspective, the way we make robust mutex unlocking safe
> > right now is by inhibiting munmap/mremap/MAP_FIXED and
> > pthread_mutex_destroy while there are any in-flight robust unlocks. It
> > will be nice to be able to conditionally stop doing that if vdso is
> > available, but I can't see using a fallback that requires a syscall,
> > as that would just be a lot more expensive than what we're doing right
> > now and still not work on older kernels. So I think the only part
> > we're interested in is the fully-userspace approach in vdso.
> > 
> 
> You just need the syscall for the contended case (where you would need a
> syscall anyway for a FUTEX_WAKE).
> 
> As Thomas wrote in patch 09/11:
> 
>   The resulting code sequence for user space is:
> 
>   if (__vdso_futex_robust_list$SZ_try_unlock(lock, tid, &pending_op) != tid)
>  	err = sys_futex($OP | FUTEX_ROBUST_UNLOCK,....);
> 
>   Both the VDSO unlock and the kernel side unlock ensure that the pending_op
> pointer is always cleared when the lock becomes unlocked.
> 
> 
> So you call the vDSO first. If it fails, it means that the lock is contended
> and you need to call futex(). It will wake a waiter, release the lock and
> clear list_op_pending.

So would we use the vdso function presence as signal that this
functionality is available? In that case, I think what we would do is:

1. Try an uncontended unlock using the vdso.
2. If it fails, attempt FUTEX_ROBUST_UNLOCK.
3. If that fails (note: this could be due to seccomp!), fall back to
the old kernel code path, holding off any munmap/etc. while we perform
the userspace unlock.

The path where the vdso function is missing would go straight to 3.

Does this sound right?

Rich


end of thread, other threads:[~2026-03-27 16:50 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-19 23:24 [patch v2 00/11] futex: Address the robust futex unlock race for real Thomas Gleixner
2026-03-19 23:24 ` [patch v2 01/11] futex: Move futex task related data into a struct Thomas Gleixner
2026-03-20 14:59   ` André Almeida
2026-03-19 23:24 ` [patch v2 02/11] futex: Move futex related mm_struct " Thomas Gleixner
2026-03-20 15:00   ` André Almeida
2026-03-19 23:24 ` [patch v2 03/11] futex: Provide UABI defines for robust list entry modifiers Thomas Gleixner
2026-03-20 15:01   ` André Almeida
2026-03-19 23:24 ` [patch v2 04/11] uaccess: Provide unsafe_atomic_store_release_user() Thomas Gleixner
2026-03-20  9:11   ` Peter Zijlstra
2026-03-20 12:38     ` Thomas Gleixner
2026-03-20 16:07   ` André Almeida
2026-03-19 23:24 ` [patch v2 05/11] x86: Select ARCH_STORE_IMPLIES_RELEASE Thomas Gleixner
2026-03-20 16:08   ` André Almeida
2026-03-19 23:24 ` [patch v2 06/11] futex: Cleanup UAPI defines Thomas Gleixner
2026-03-20 16:09   ` André Almeida
2026-03-19 23:24 ` [patch v2 07/11] futex: Add support for unlocking robust futexes Thomas Gleixner
2026-03-20 17:14   ` André Almeida
2026-03-26 22:23     ` Thomas Gleixner
2026-03-27  0:48       ` André Almeida
2026-03-19 23:24 ` [patch v2 08/11] futex: Add robust futex unlock IP range Thomas Gleixner
2026-03-20  9:07   ` Peter Zijlstra
2026-03-20 12:07     ` Thomas Gleixner
2026-03-27 13:24   ` Sebastian Andrzej Siewior
2026-03-27 16:19     ` Thomas Gleixner
2026-03-19 23:24 ` [patch v2 09/11] futex: Provide infrastructure to plug the non contended robust futex unlock race Thomas Gleixner
2026-03-20 13:35   ` Thomas Gleixner
2026-03-19 23:24 ` [patch v2 10/11] x86/vdso: Prepare for robust futex unlock support Thomas Gleixner
2026-03-19 23:25 ` [patch v2 11/11] x86/vdso: Implement __vdso_futex_robust_try_unlock() Thomas Gleixner
2026-03-20  7:14   ` Uros Bizjak
2026-03-20 12:48     ` Thomas Gleixner
2026-03-26 21:59 ` [patch v2 00/11] futex: Address the robust futex unlock race for real Thomas Gleixner
2026-03-26 22:08   ` Rich Felker
2026-03-27  3:42     ` André Almeida
2026-03-27 10:08       ` Thomas Gleixner
2026-03-27 16:50       ` Rich Felker

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox