public inbox for linux-kernel@vger.kernel.org
From: Thomas Gleixner <tglx@kernel.org>
To: Florian Weimer <fweimer@redhat.com>
Cc: LKML <linux-kernel@vger.kernel.org>,
	"Mathieu Desnoyers" <mathieu.desnoyers@efficios.com>,
	"André Almeida" <andrealmeid@igalia.com>,
	"Sebastian Andrzej Siewior" <bigeasy@linutronix.de>,
	"Carlos O'Donell" <carlos@redhat.com>,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Rich Felker" <dalias@aerifal.cx>,
	"Torvald Riegel" <triegel@redhat.com>,
	"Darren Hart" <dvhart@infradead.org>,
	"Ingo Molnar" <mingo@kernel.org>,
	"Davidlohr Bueso" <dave@stgolabs.net>,
	"Arnd Bergmann" <arnd@arndb.de>,
	"Liam R . Howlett" <Liam.Howlett@oracle.com>
Subject: Re: [patch 8/8] x86/vdso: Implement __vdso_futex_robust_try_unlock()
Date: Tue, 17 Mar 2026 23:32:18 +0100	[thread overview]
Message-ID: <87zf46m6j1.ffs@tglx>
In-Reply-To: <lhuse9ylp1s.fsf@oldenburg.str.redhat.com>

On Tue, Mar 17 2026 at 11:37, Florian Weimer wrote:
> * Thomas Gleixner:
>> Right, I know that no libc implementation supports such an insanity, but
>> the kernel unfortunately allows to do so and it's used in the wild :(
>>
>> So we have to deal with it somehow and the size modifier was the most
>> straight forward solution I could come up with. I'm all ears if someone
>> has a better idea.
>
> Maybe a separate futex op?  And the vDSO would have the futex call,
> mangle uaddr2 as required for the shared code section that handles both
> ops?
>
> As far as I can tell at this point, the current proposal should work.
> We'd probably start with using the syscall-based unlock.

Something like the below compiled, but untested, delta diff, which also
includes the other unrelated feedback fixups?

Thanks,

        tglx
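
As a plain C illustration of the fixup decision described in the template
comment further down (not part of the patch; the helper name and the label
addresses are made up for the sketch):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define X86_EFLAGS_ZF	(1UL << 6)	/* Zero flag bit in EFLAGS */

struct fake_regs {
	uintptr_t	ip;
	unsigned long	eflags;
};

/* Placeholder addresses standing in for the linker-provided labels */
static const uintptr_t cs_start   = 0x1000;	/* .Lcs_start */
static const uintptr_t cs_success = 0x1003;	/* .Lcs_success */

/*
 * Decide whether the interrupted critical section needs the fixup.
 * Assumes the caller already established cs_start <= ip < cs_end.
 */
static bool needs_unlock_fixup(const struct fake_regs *regs)
{
	/*
	 * Before .Lcs_success the ZF in the saved flags still reflects the
	 * cmpxchg result. At or past .Lcs_success the flags may have been
	 * clobbered by the size check, but the pending op store has not
	 * completed, so the fixup is required unconditionally.
	 */
	return regs->ip >= cs_success || (regs->eflags & X86_EFLAGS_ZF);
}
```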
---
diff --git a/arch/x86/entry/vdso/common/vfutex.c b/arch/x86/entry/vdso/common/vfutex.c
index 19d8ef130b63..491ed141622d 100644
--- a/arch/x86/entry/vdso/common/vfutex.c
+++ b/arch/x86/entry/vdso/common/vfutex.c
@@ -1,72 +1,218 @@
 // SPDX-License-Identifier: GPL-2.0-only
 #include <vdso/futex.h>
 
+/*
+ * Assembly template for the try unlock functions. The basic functionality for
+ * 64-bit is:
+ *
+ *	At the call site:
+ *		mov		&lock, %rdi	Store the lock pointer in RDI
+ *		mov		&pop, %rdx	Store the pending op pointer in RDX
+ *		mov		TID, %esi	Store the thread's TID in ESI
+ *
+ * 64-bit unlock function:
+ *		mov		%esi, %eax	Move the TID into EAX
+ *		xor		%ecx, %ecx	Clear ECX
+ *		lock cmpxchgl	%ecx, (%rdi)	Attempt the TID -> 0 transition
+ * .Lcs_start:					Start of the critical section
+ *		jnz		.Lcs_end	If the cmpxchg failed jump to the end
+ * .Lcs_success:				Start of the success section
+ *		movq		$0, (%rdx)	Set the pending op pointer to 0
+ * .Lcs_end:					End of the critical section
+ *
+ * For COMPAT enabled 64-bit kernels this is a bit more complex because the size
+ * of the @pop pointer has to be determined in the success section:
+ *
+ *	At the 64-bit call site:
+ *		mov		&lock, %rdi	Store the lock pointer in RDI
+ *		mov		&pop, %rdx	Store the pending op pointer in RDX
+ *		mov		TID, %esi	Store the thread's TID in ESI
+ *
+ *	At the 32-bit call site:
+ *		mov		&lock, %edi	Store the lock pointer in EDI
+ *		mov		&pop, %edx	Store the pending op pointer in EDX
+ *		mov		TID, %esi	Store the thread's TID in ESI
+ *
+ *	The 32-bit entry point:
+ *		or		$0x1, %edx	Mark the op pointer 32-bit
+ *
+ * Common unlock function:
+ *		mov		%esi, %eax	Move the TID into EAX
+ *		xor		%ecx, %ecx	Clear ECX
+ *		mov		%rdx, %rsi	Store the op pointer in RSI
+ *		and		$~0x1, %rsi	Clear the size bit in RSI
+ *		lock cmpxchgl	%ecx, (%rdi)	Attempt the TID -> 0 transition
+ * .Lcs_start:					Start of the critical section
+ *		jnz		.Lcs_end	If the cmpxchg failed jump to the end
+ * .Lcs_success:				Start of the success section
+ *		test		$0x1, %rdx	Test the 32-bit size bit in the original pointer
+ *		jz		.Lop64		If not set, do the 64-bit clear
+ *		movl		$0, (%rsi)	Set the 32-bit pending op pointer to 0
+ *		jmp		.Lcs_end	Leave the critical section
+ * .Lop64:	movq		$0, (%rsi)	Set the 64-bit pending op pointer to 0
+ * .Lcs_end:					End of the critical section
+ *
+ * The 32-bit VDSO needs to set the 32-bit size bit as well to keep the code
+ * compatible for the kernel side fixup function, but it does not require the
+ * size evaluation in the success path.
+ *
+ *	At the 32-bit call site:
+ *		mov		&lock, %edi	Store the lock pointer in EDI
+ *		mov		&pop, %edx	Store the pending op pointer in EDX
+ *		mov		TID, %esi	Store the thread's TID in ESI
+ *
+ *	The 32-bit entry point does:
+ *		or		$0x1, %edx	Mark the op pointer 32-bit
+ *
+ * 32-bit unlock function:
+ *		mov		%esi, %eax	Move the TID into EAX
+ *		xor		%ecx, %ecx	Clear ECX
+ *		mov		%edx, %esi	Store the op pointer in ESI
+ *		and		$~0x1, %esi	Clear the size bit in ESI
+ *		lock cmpxchgl	%ecx, (%edi)	Attempt the TID -> 0 transition
+ * .Lcs_start:					Start of the critical section
+ *		jnz		.Lcs_end	If the cmpxchg failed jump to the end
+ * .Lcs_success:				Start of the success section
+ *		movl		$0, (%esi)	Set the 32-bit pending op pointer to 0
+ * .Lcs_end:					End of the critical section
+ *
+ * The pointer modification makes sure that the unlock function can determine
+ * the pending op pointer size correctly and clear either 32 or 64 bit.
+ *
+ * The intermediate storage of the unmangled pointer (bit 0 cleared) in [ER]SI
+ * makes sure that the store hits the right address.
+ *
+ * The mangled pointer (bit 0 set for 32-bit) stays in [ER]DX so that the kernel
+ * side fixup function can determine the storage size correctly and always
+ * retrieve regs->rdx without any extra knowledge of the actual code path taken
+ * or checking the compat mode of the task.
+ *
+ * The .Lcs_success label is technically not required for a pure 64-bit and the
+ * 32-bit VDSO but is kept there for simplicity. In those cases the ZF flag in
+ * regs->eflags is authoritative for the whole critical section and no further
+ * evaluation is required.
+ *
+ * In the 64-bit compat case the .Lcs_success label is required because the
+ * pointer size check modifies the ZF flag, which means it is only valid for the
+ * case where .Lcs_start <= regs->ip < .Lcs_success, which is obviously the
+ * same as .Lcs_start == regs->ip for x86.
+ *
+ * That's still a valuable distinction for clarity to keep the ASM template the
+ * same for all cases. This is also a template for other architectures which
+ * might have different requirements even for the non-COMPAT case.
+ *
+ * That means in the 64-bit compat case the decision to do the fixup is:
+ *
+ *	if (regs->ip >= .Lcs_start && regs->ip < .Lcs_success)
+ *		return (regs->eflags & ZF);
+ *	return regs->ip < .Lcs_end;
+ *
+ * As the initial critical section check in the return to user space code
+ * already established that:
+ *
+ *	.Lcs_start <= regs->ip < .Lcs_end
+ *
+ * that decision can be simplified to:
+ *
+ *	return regs->ip >= .Lcs_success || regs->eflags & ZF;
+ *
+ */
+#define robust_try_unlock_asm(__lock, __tid, __pop)					\
+	asm volatile (									\
+		".global __kernel_futex_robust_try_unlock_cs_start		\n"	\
+		".global __kernel_futex_robust_try_unlock_cs_success		\n"	\
+		".global __kernel_futex_robust_try_unlock_cs_end		\n"	\
+		"								\n"	\
+		"       lock cmpxchgl	%[val], (%[ptr])			\n"	\
+		"								\n"	\
+		"__kernel_futex_robust_try_unlock_cs_start:			\n"	\
+		"								\n"	\
+		"	jnz		__kernel_futex_robust_try_unlock_cs_end	\n"	\
+		"								\n"	\
+		"__kernel_futex_robust_try_unlock_cs_success:			\n"	\
+		"								\n"	\
+			ASM_CLEAR_PTR							\
+		"								\n"	\
+		"__kernel_futex_robust_try_unlock_cs_end:			\n"	\
+		: [tid] "+a" (__tid)							\
+		: [ptr] "D"  (__lock),							\
+		  [pop] "d"  (__pop),							\
+		  [val] "r"  (0)							\
+		  ASM_PAD_CONSTRAINT(__pop)						\
+		: "memory"								\
+	)
+
 /*
  * Compat enabled kernels have to take the size bit into account to support the
  * mixed size use case of gaming emulators. Contrary to the kernel robust unlock
  * mechanism all of this does not test for the 32-bit modifier in 32-bit VDSOs
  * and in compat disabled kernels. User space can keep the pieces.
  */
-#if defined(CONFIG_X86_64) && !defined(BUILD_VDSO32_64)
-
+#ifdef __x86_64__
 #ifdef CONFIG_COMPAT
 
 # define ASM_CLEAR_PTR								\
 		"	test	$1, %[pop]				\n"	\
 		"	jz	.Lop64					\n"	\
 		"	movl	$0, (%[pad])				\n"	\
-		"	jmp	__vdso_futex_robust_try_unlock_cs_end	\n"	\
+		"	jmp	__kernel_futex_robust_try_unlock_cs_end	\n"	\
 		".Lop64:						\n"	\
 		"	movq	$0, (%[pad])				\n"
 
-# define ASM_PAD_CONSTRAINT	,[pad] "S" (((unsigned long)pop) & ~0x1UL)
+# define ASM_PAD_CONSTRAINT(__pop)	,[pad] "S" (((unsigned long)__pop) & ~0x1UL)
+
+__u32 noinline __vdso_futex_robust_try_unlock_64(__u32 *lock, __u32 tid, __u64 *pop)
+{
+	robust_try_unlock_asm(lock, tid, pop);
+	return tid;
+}
+
+__u32 noinline __vdso_futex_robust_try_unlock_32(__u32 *lock, __u32 tid, __u32 *pop)
+{
+	__u64 pop_addr = ((__u64) pop) | FUTEX_ROBUST_UNLOCK_MOD_32BIT;
+
+	return __vdso_futex_robust_try_unlock_64(lock, tid, (__u64 *)pop_addr);
+}
+
+__u32 futex_robust_try_unlock_64(__u32 *, __u32, __u64 *)
+       __attribute__((weak, alias("__vdso_futex_robust_try_unlock_64")));
+
+__u32 futex_robust_try_unlock_32(__u32 *, __u32, __u32 *)
+       __attribute__((weak, alias("__vdso_futex_robust_try_unlock_32")));
 
 #else /* CONFIG_COMPAT */
 
 # define ASM_CLEAR_PTR								\
 		"	movq	$0, (%[pop])				\n"
 
-# define ASM_PAD_CONSTRAINT
+# define ASM_PAD_CONSTRAINT(__pop)
+
+__u32 noinline __vdso_futex_robust_try_unlock_64(__u32 *lock, __u32 tid, __u64 *pop)
+{
+	robust_try_unlock_asm(lock, tid, pop);
+	return tid;
+}
+
+__u32 futex_robust_try_unlock_64(__u32 *, __u32, __u64 *)
+       __attribute__((weak, alias("__vdso_futex_robust_try_unlock_64")));
 
 #endif /* !CONFIG_COMPAT */
 
-#else /* CONFIG_X86_64 && !BUILD_VDSO32_64 */
+#else /* __x86_64__ */
 
 # define ASM_CLEAR_PTR								\
 		"	movl	$0, (%[pad])				\n"
 
-# define ASM_PAD_CONSTRAINT	,[pad] "S" (((unsigned long)pop) & ~0x1UL)
-
-#endif /* !CONFIG_X86_64 || BUILD_VDSO32_64 */
+# define ASM_PAD_CONSTRAINT(__pop)	,[pad] "S" (((unsigned long)__pop) & ~0x1UL)
 
-uint32_t __vdso_futex_robust_try_unlock(uint32_t *lock, uint32_t tid, void *pop)
+__u32 noinline __vdso_futex_robust_try_unlock_32(__u32 *lock, __u32 tid, __u32 *pop)
 {
-	asm volatile (
-		".global __vdso_futex_robust_try_unlock_cs_start		\n"
-		".global __vdso_futex_robust_try_unlock_cs_success		\n"
-		".global __vdso_futex_robust_try_unlock_cs_end			\n"
-		"								\n"
-		"       lock cmpxchgl	%[val], (%[ptr])			\n"
-		"								\n"
-		"__vdso_futex_robust_try_unlock_cs_start:			\n"
-		"								\n"
-		"	jnz		__vdso_futex_robust_try_unlock_cs_end	\n"
-		"								\n"
-		"__vdso_futex_robust_try_unlock_cs_success:			\n"
-		"								\n"
-			ASM_CLEAR_PTR
-		"								\n"
-		"__vdso_futex_robust_try_unlock_cs_end:				\n"
-		: [tid] "+a" (tid)
-		: [ptr] "D"  (lock),
-		  [pop] "d" (pop),
-		  [val] "r"  (0)
-		  ASM_PAD_CONSTRAINT
-		: "memory"
-	);
+	__u32 pop_addr = ((__u32) pop) | FUTEX_ROBUST_UNLOCK_MOD_32BIT;
 
+	robust_try_unlock_asm(lock, tid, (__u32 *)pop_addr);
 	return tid;
 }
 
-uint32_t futex_robust_try_unlock(uint32_t *, uint32_t, void **)
-	__attribute__((weak, alias("__vdso_futex_robust_try_unlock")));
+__u32 futex_robust_try_unlock_32(__u32 *, __u32, __u32 *)
+       __attribute__((weak, alias("__vdso_futex_robust_try_unlock_32")));
+#endif /* !__x86_64__ */
diff --git a/arch/x86/entry/vdso/vdso32/vdso32.lds.S b/arch/x86/entry/vdso/vdso32/vdso32.lds.S
index b027d2f98bd0..cb7b8de8009c 100644
--- a/arch/x86/entry/vdso/vdso32/vdso32.lds.S
+++ b/arch/x86/entry/vdso/vdso32/vdso32.lds.S
@@ -31,10 +31,10 @@ VERSION
 		__vdso_clock_getres_time64;
 		__vdso_getcpu;
 #ifdef CONFIG_FUTEX_ROBUST_UNLOCK
-		__vdso_futex_robust_try_unlock;
-		__vdso_futex_robust_try_unlock_cs_start;
-		__vdso_futex_robust_try_unlock_cs_success;
-		__vdso_futex_robust_try_unlock_cs_end;
+		__vdso_futex_robust_try_unlock_32;
+		__kernel_futex_robust_try_unlock_cs_start;
+		__kernel_futex_robust_try_unlock_cs_success;
+		__kernel_futex_robust_try_unlock_cs_end;
 #endif
 	};
 
diff --git a/arch/x86/entry/vdso/vdso64/vdso64.lds.S b/arch/x86/entry/vdso/vdso64/vdso64.lds.S
index e5c0ca9664e1..6dd36ae2ab79 100644
--- a/arch/x86/entry/vdso/vdso64/vdso64.lds.S
+++ b/arch/x86/entry/vdso/vdso64/vdso64.lds.S
@@ -33,10 +33,11 @@ VERSION {
 		getrandom;
 		__vdso_getrandom;
 #ifdef CONFIG_FUTEX_ROBUST_UNLOCK
-		__vdso_futex_robust_try_unlock;
-		__vdso_futex_robust_try_unlock_cs_start;
-		__vdso_futex_robust_try_unlock_cs_success;
-		__vdso_futex_robust_try_unlock_cs_end;
+		__vdso_futex_robust_try_unlock_64;
+		__vdso_futex_robust_try_unlock_32;
+		__kernel_futex_robust_try_unlock_cs_start;
+		__kernel_futex_robust_try_unlock_cs_success;
+		__kernel_futex_robust_try_unlock_cs_end;
 #endif
 	local: *;
 	};
diff --git a/arch/x86/entry/vdso/vdso64/vdsox32.lds.S b/arch/x86/entry/vdso/vdso64/vdsox32.lds.S
index 4409d97e7ef6..a456f184c937 100644
--- a/arch/x86/entry/vdso/vdso64/vdsox32.lds.S
+++ b/arch/x86/entry/vdso/vdso64/vdsox32.lds.S
@@ -23,7 +23,8 @@ VERSION {
 		__vdso_time;
 		__vdso_clock_getres;
 #ifdef CONFIG_FUTEX_ROBUST_UNLOCK
-		__vdso_futex_robust_try_unlock;
+		__vdso_futex_robust_try_unlock_64;
+		__vdso_futex_robust_try_unlock_32;
-		__vdso_futex_robust_try_unlock_cs_start;
-		__vdso_futex_robust_try_unlock_cs_success;
-		__vdso_futex_robust_try_unlock_cs_end;
+		__kernel_futex_robust_try_unlock_cs_start;
+		__kernel_futex_robust_try_unlock_cs_success;
+		__kernel_futex_robust_try_unlock_cs_end;
diff --git a/include/linux/futex_types.h b/include/linux/futex_types.h
index 223f469789c5..a96293050bf4 100644
--- a/include/linux/futex_types.h
+++ b/include/linux/futex_types.h
@@ -11,13 +11,15 @@ struct futex_pi_state;
 struct robust_list_head;
 
 /**
- * struct futex_ctrl - Futex related per task data
+ * struct futex_sched_data - Futex related per task data
  * @robust_list:	User space registered robust list pointer
  * @compat_robust_list:	User space registered robust list pointer for compat tasks
+ * @pi_state_list:	List head for Priority Inheritance (PI) state management
+ * @pi_state_cache:	Pointer to cache one PI state object per task
  * @exit_mutex:		Mutex for serializing exit
  * @state:		Futex handling state to handle exit races correctly
  */
-struct futex_ctrl {
+struct futex_sched_data {
 	struct robust_list_head __user		*robust_list;
 #ifdef CONFIG_COMPAT
 	struct compat_robust_list_head __user	*compat_robust_list;
@@ -27,9 +29,6 @@ struct futex_ctrl {
 	struct mutex				exit_mutex;
 	unsigned int				state;
 };
-#else
-struct futex_ctrl { };
-#endif /* !CONFIG_FUTEX */
 
 /**
  * struct futex_mm_data - Futex related per MM data
@@ -71,4 +70,9 @@ struct futex_mm_data {
 #endif
 };
 
+#else
+struct futex_sched_data { };
+struct futex_mm_data { };
+#endif /* !CONFIG_FUTEX */
+
 #endif /* _LINUX_FUTEX_TYPES_H */
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 266d4859e322..a5d5c0ec3c64 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1329,7 +1329,7 @@ struct task_struct {
 	u32				rmid;
 #endif
 
-	struct futex_ctrl		futex;
+	struct futex_sched_data		futex;
 
 #ifdef CONFIG_PERF_EVENTS
 	u8				perf_recursion[PERF_NR_CONTEXTS];
diff --git a/include/uapi/linux/futex.h b/include/uapi/linux/futex.h
index ab9d89748595..e447eaea63f4 100644
--- a/include/uapi/linux/futex.h
+++ b/include/uapi/linux/futex.h
@@ -26,6 +26,7 @@
 #define FUTEX_PRIVATE_FLAG	128
 #define FUTEX_CLOCK_REALTIME	256
 #define FUTEX_UNLOCK_ROBUST	512
+#define FUTEX_ROBUST_LIST32	1024
 #define FUTEX_CMD_MASK		~(FUTEX_PRIVATE_FLAG | FUTEX_CLOCK_REALTIME | FUTEX_UNLOCK_ROBUST)
 
 #define FUTEX_WAIT_PRIVATE	(FUTEX_WAIT | FUTEX_PRIVATE_FLAG)
@@ -182,23 +183,6 @@ struct robust_list_head {
 #define FUTEX_ROBUST_MOD_PI		(0x1UL)
 #define FUTEX_ROBUST_MOD_MASK		(FUTEX_ROBUST_MOD_PI)
 
-/*
- * Modifier for FUTEX_ROBUST_UNLOCK uaddr2. Required to distinguish the storage
- * size for the robust_list_head::list_pending_op. This solves two problems:
- *
- *	1) COMPAT tasks
- *
- *	2) The mixed mode magic gaming use case which has both 32-bit and 64-bit
- *	   robust lists. Oh well....
- *
- * Long story short: 32-bit userspace must set this bit unconditionally to
- * ensure that it can run on a 64-bit kernel in compat mode. If user space
- * screws that up a 64-bit kernel will happily clear the full 64-bits. 32-bit
- * kernels return an error code if the bit is not set.
- */
-#define FUTEX_ROBUST_UNLOCK_MOD_32BIT	(0x1UL)
-#define FUTEX_ROBUST_UNLOCK_MOD_MASK	(FUTEX_ROBUST_UNLOCK_MOD_32BIT)
-
 /*
  * bitset with all bits set for the FUTEX_xxx_BITSET OPs to request a
  * match of any bit.
diff --git a/include/vdso/futex.h b/include/vdso/futex.h
index 8061bfcb6b92..a768c00b0ada 100644
--- a/include/vdso/futex.h
+++ b/include/vdso/futex.h
@@ -2,12 +2,11 @@
 #ifndef _VDSO_FUTEX_H
 #define _VDSO_FUTEX_H
 
-#include <linux/types.h>
-
-struct robust_list;
+#include <uapi/linux/types.h>
 
 /**
- * __vdso_futex_robust_try_unlock - Try to unlock an uncontended robust futex
+ * __vdso_futex_robust_try_unlock_64 - Try to unlock an uncontended robust futex
+ *				       with a 64-bit op pointer
  * @lock:	Pointer to the futex lock object
  * @tid:	The TID of the calling task
  * @op:		Pointer to the task's robust_list_head::list_pending_op
@@ -39,6 +38,23 @@ struct robust_list;
  * @uaddr2 argument for sys_futex(FUTEX_ROBUST_UNLOCK) operations. See the
  * modifier and the related documentation in include/uapi/linux/futex.h
  */
-uint32_t __vdso_futex_robust_try_unlock(uint32_t *lock, uint32_t tid, void *op);
+__u32 __vdso_futex_robust_try_unlock_64(__u32 *lock, __u32 tid, __u64 *op);
+
+/**
+ * __vdso_futex_robust_try_unlock_32 - Try to unlock an uncontended robust futex
+ *				       with a 32-bit op pointer
+ * @lock:	Pointer to the futex lock object
+ * @tid:	The TID of the calling task
+ * @op:		Pointer to the task's robust_list_head::list_pending_op
+ *
+ * Return: The content of *@lock. On success this is the same as @tid.
+ *
+ * Same as __vdso_futex_robust_try_unlock_64() just with a 32-bit @op pointer.
+ */
+__u32 __vdso_futex_robust_try_unlock_32(__u32 *lock, __u32 tid, __u32 *op);
+
+/* Modifier to convey the size of the op pointer */
+#define FUTEX_ROBUST_UNLOCK_MOD_32BIT  (0x1UL)
+#define FUTEX_ROBUST_UNLOCK_MOD_MASK   (FUTEX_ROBUST_UNLOCK_MOD_32BIT)
 
 #endif
diff --git a/kernel/futex/core.c b/kernel/futex/core.c
index 7957edd46b89..39041cf94522 100644
--- a/kernel/futex/core.c
+++ b/kernel/futex/core.c
@@ -46,6 +46,8 @@
 #include <linux/slab.h>
 #include <linux/vmalloc.h>
 
+#include <vdso/futex.h>
+
 #include "futex.h"
 #include "../locking/rtmutex_common.h"
 
@@ -1434,17 +1436,9 @@ static void exit_pi_state_list(struct task_struct *curr)
 static inline void exit_pi_state_list(struct task_struct *curr) { }
 #endif
 
-static inline bool mask_pop_addr(void __user **pop)
-{
-	unsigned long addr = (unsigned long)*pop;
-
-	*pop = (void __user *) (addr & ~FUTEX_ROBUST_UNLOCK_MOD_MASK);
-	return !!(addr & FUTEX_ROBUST_UNLOCK_MOD_32BIT);
-}
-
-bool futex_robust_list_clear_pending(void __user *pop)
+bool futex_robust_list_clear_pending(void __user *pop, unsigned int flags)
 {
-	bool size32bit = mask_pop_addr(&pop);
+	bool size32bit = !!(flags & FLAGS_ROBUST_LIST32);
 
 	if (!IS_ENABLED(CONFIG_64BIT) && !size32bit)
 		return false;
@@ -1456,15 +1450,28 @@ bool futex_robust_list_clear_pending(void __user *pop)
 }
 
 #ifdef CONFIG_FUTEX_ROBUST_UNLOCK
+static inline bool mask_pop_addr(void __user **pop)
+{
+	unsigned long addr = (unsigned long)*pop;
+
+	*pop = (void __user *) (addr & ~FUTEX_ROBUST_UNLOCK_MOD_MASK);
+	return !!(addr & FUTEX_ROBUST_UNLOCK_MOD_32BIT);
+}
+
 void __futex_fixup_robust_unlock(struct pt_regs *regs)
 {
+	unsigned int flags = 0;
 	void __user *pop;
 
 	if (!arch_futex_needs_robust_unlock_fixup(regs))
 		return;
 
 	pop = arch_futex_robust_unlock_get_pop(regs);
-	futex_robust_list_clear_pending(pop);
+
+	if (mask_pop_addr(&pop))
+		flags = FUTEX_ROBUST_UNLOCK_MOD_32BIT;
+
+	futex_robust_list_clear_pending(pop, flags);
 }
 #endif /* CONFIG_FUTEX_ROBUST_UNLOCK */
 
diff --git a/kernel/futex/futex.h b/kernel/futex/futex.h
index b1aaa90f1779..31a5bae8b470 100644
--- a/kernel/futex/futex.h
+++ b/kernel/futex/futex.h
@@ -41,6 +41,7 @@
 #define FLAGS_STRICT		0x0100
 #define FLAGS_MPOL		0x0200
 #define FLAGS_UNLOCK_ROBUST	0x0400
+#define FLAGS_ROBUST_LIST32	0x0800
 
 /* FUTEX_ to FLAGS_ */
 static inline unsigned int futex_to_flags(unsigned int op)
@@ -56,6 +57,9 @@ static inline unsigned int futex_to_flags(unsigned int op)
 	if (op & FUTEX_UNLOCK_ROBUST)
 		flags |= FLAGS_UNLOCK_ROBUST;
 
+	if (op & FUTEX_ROBUST_LIST32)
+		flags |= FLAGS_ROBUST_LIST32;
+
 	return flags;
 }
 
@@ -452,6 +456,6 @@ extern int futex_unlock_pi(u32 __user *uaddr, unsigned int flags, void __user *p
 
 extern int futex_lock_pi(u32 __user *uaddr, unsigned int flags, ktime_t *time, int trylock);
 
-bool futex_robust_list_clear_pending(void __user *pop);
+bool futex_robust_list_clear_pending(void __user *pop, unsigned int flags);
 
 #endif /* _FUTEX_H */
diff --git a/kernel/futex/pi.c b/kernel/futex/pi.c
index b8c76b6242e4..05ca360a7a30 100644
--- a/kernel/futex/pi.c
+++ b/kernel/futex/pi.c
@@ -1298,7 +1298,7 @@ int futex_unlock_pi(u32 __user *uaddr, unsigned int flags, void __user *pop)
 	if (ret || !(flags & FLAGS_UNLOCK_ROBUST))
 		return ret;
 
-	if (!futex_robust_list_clear_pending(pop))
+	if (!futex_robust_list_clear_pending(pop, flags))
 		return -EFAULT;
 
 	return 0;
diff --git a/kernel/futex/waitwake.c b/kernel/futex/waitwake.c
index 45effcf42961..233f38b1f52e 100644
--- a/kernel/futex/waitwake.c
+++ b/kernel/futex/waitwake.c
@@ -166,7 +166,7 @@ static bool futex_robust_unlock(u32 __user *uaddr, unsigned int flags, void __us
 	 * deeper trouble as the robust list head is usually part of TLS. The
 	 * chance of survival is close to zero.
 	 */
-	return futex_robust_list_clear_pending(pop);
+	return futex_robust_list_clear_pending(pop, flags);
 }
 
 /*

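For completeness, the bit 0 tagging of the pending op pointer that the 32-bit
entry points perform, and its kernel-side counterpart (mask_pop_addr() in the
diff above), can be sketched in plain C. The helper names here are hypothetical
and purely illustrative:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FUTEX_ROBUST_UNLOCK_MOD_32BIT	0x1UL
#define FUTEX_ROBUST_UNLOCK_MOD_MASK	(FUTEX_ROBUST_UNLOCK_MOD_32BIT)

/* User space side: tag a 32-bit pending op pointer for the common path */
static uint64_t tag_pop32(uint32_t *pop)
{
	/* Bit 0 is free because the pointer is at least 4-byte aligned */
	return (uint64_t)(uintptr_t)pop | FUTEX_ROBUST_UNLOCK_MOD_32BIT;
}

/* Kernel side: strip the modifier bits and report the storage size */
static bool pop_is_32bit(uint64_t *addr)
{
	bool size32 = *addr & FUTEX_ROBUST_UNLOCK_MOD_32BIT;

	*addr &= ~FUTEX_ROBUST_UNLOCK_MOD_MASK;
	return size32;
}
```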
