The Linux Kernel Mailing List
* [patch V5 00/16] futex: Address the robust futex unlock race for real
@ 2026-06-02  9:09 Thomas Gleixner
  2026-06-02  9:09 ` [patch V5 01/16] percpu: Sanitize __percpu_qual include hell Thomas Gleixner
                   ` (15 more replies)
  0 siblings, 16 replies; 43+ messages in thread
From: Thomas Gleixner @ 2026-06-02  9:09 UTC (permalink / raw)
  To: LKML
  Cc: Mathieu Desnoyers, André Almeida, Sebastian Andrzej Siewior,
	Carlos O'Donell, Peter Zijlstra, Florian Weimer, Rich Felker,
	Torvald Riegel, Darren Hart, Ingo Molnar, Davidlohr Bueso,
	Arnd Bergmann, Liam R . Howlett, Uros Bizjak,
	Thomas Weißschuh, Mark Brown, Richard Weinberger

This is a follow up to v4 which can be found here:

     https://lore.kernel.org/20260402151131.876492985@kernel.org

The v1 cover letter contains a detailed analysis of the underlying
problem:

    https://lore.kernel.org/20260316162316.356674433@kernel.org

TLDR:

The robust futex unlock mechanism is racy with respect to the clearing of the
robust_list_head::list_op_pending pointer because the unlock and the clearing
of the pointer are not atomic. The race window lies between these two steps.
If the task is forced to exit in this window, exit will access a potentially
invalid pending op pointer when cleaning up the robust list. That happens if
another task manages to unmap the object containing the lock before the
cleanup, which results in a use-after-free (UAF). In the worst case this UAF
can lead to memory corruption when unrelated content has been mapped to the
same address by the time the access happens.
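
The window can be illustrated with a minimal user-space sketch of an
uncontended robust unlock. This is neither the glibc nor the kernel
implementation: sketch_robust_head, sketch_lock and sketch_unlock() are
hypothetical names, and only the list_op_pending member mirrors the real
robust_list_head.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct robust_list_head; only the pending op pointer matters here. */
struct sketch_robust_head {
	void *list_op_pending;
};

/* Hypothetical lock word: 0 means unlocked, otherwise it holds the owner TID. */
struct sketch_lock {
	_Atomic uint32_t futex;
};

/*
 * Uncontended unlock as user space has to do it without kernel help.
 * Returns 0 on success and -1 if the lock word was not simply "owned by tid"
 * (e.g. waiters present). In the latter case list_op_pending stays set and
 * the caller has to take a slow path which is not part of this sketch.
 */
static int sketch_unlock(struct sketch_robust_head *head,
			 struct sketch_lock *lock, uint32_t tid)
{
	uint32_t expected = tid;

	assert(head && lock);

	/* Step 1: announce which lock is being operated on. */
	head->list_op_pending = lock;

	/* Step 2: release the lock. Another task may now acquire, free or unmap it. */
	if (!atomic_compare_exchange_strong(&lock->futex, &expected, 0))
		return -1;

	/*
	 * Race window: a forced exit at this point leaves list_op_pending
	 * pointing at memory which is no longer guaranteed to be valid.
	 */

	/* Step 3: clear the pending op pointer, which is not atomic with step 2. */
	head->list_op_pending = NULL;
	return 0;
}
```

Steps 2 and 3 are separate stores, so no user-space-only ordering of them
closes the gap.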

User space can't solve this problem without help from the kernel. This
series provides the kernel side infrastructure to help it along:

  1) Combined unlock, pointer clearing, wake-up for the contended case

  2) VDSO based unlock and pointer clearing helpers with a fix-up function
     in the kernel when user space was interrupted within the critical
     section.

Both ensure that the pointer clearing happens _before_ a task exits and the
kernel cleans up the robust list during the exit procedure.
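
For the VDSO variant the kernel has to decide whether an interrupted task was
inside the critical section. A rough sketch of such a check follows, assuming
a half-open range [start_ip, start_ip + len). The member names follow the
cs_ranges[] members visible in the delta patch; the sketch_ prefixed helpers
are hypothetical and the real fix-up logic is not part of this excerpt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Member names follow the cs_ranges[] entries in the delta patch. */
struct sketch_cs_range {
	unsigned long	start_ip;	/* absolute address of the first instruction */
	unsigned int	len;		/* length of the critical section in bytes */
	bool		pop_size32;	/* pending op pointer is 32 bits wide */
};

/* Takes full addresses for start and end, as the V5 range initializer does. */
static void sketch_set_cs_range(struct sketch_cs_range *r, unsigned long start,
				unsigned long end, bool sz32)
{
	assert(r && end >= start);
	r->start_ip = start;
	r->len = (unsigned int)(end - start);
	r->pop_size32 = sz32;
}

/*
 * True if start_ip <= ip < start_ip + len. The unsigned subtraction wraps to
 * a huge value for ip < start_ip, so a single comparison covers both bounds.
 */
static bool sketch_ip_in_cs(const struct sketch_cs_range *r, unsigned long ip)
{
	return ip - r->start_ip < r->len;
}
```

Storing the start address plus a length keeps the check to one subtraction and
one comparison per range.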

Changes since v4:

   - Fixed the build failures caused by using __percpu in the types
     header, which unearthed a nasty header inclusion order problem that
     only worked by chance. Reported by 0-day and Nam

   - Changed the range initializer to take the full address - Andre

   - Picked up tags where appropriate

   - Added the debug vdso image to sysfs. This allows CI and general
     debugging to find the debug.so in a uniform place. The patch is marked
     RFC and can be discussed and applied/rejected separately from the
     rest. The images are small enough and only one instance of each
     variant is required, so the memory overhead is minimal.

The delta patch (w/o the RFC VDSO sysfs change) against the previous
version can be found below.

The series applies on v7.1-rc2 and is also available via git:

    git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git locking-futex-v5

Thanks,

	tglx
---
diff --git a/arch/um/Makefile b/arch/um/Makefile
index 721b652ffb65..937639edc295 100644
--- a/arch/um/Makefile
+++ b/arch/um/Makefile
@@ -71,7 +71,8 @@ USER_CFLAGS = $(patsubst $(KERNEL_DEFINES),,$(patsubst -I%,,$(KBUILD_CFLAGS))) \
 		-D_FILE_OFFSET_BITS=64 -idirafter $(srctree)/include \
 		-idirafter $(objtree)/include -D__KERNEL__ -D__UM_HOST__ \
 		-include $(srctree)/include/linux/compiler-version.h \
-		-include $(srctree)/include/linux/kconfig.h
+		-include $(srctree)/include/linux/kconfig.h \
+		-idirafter $(ARCH_DIR)/include/generated
 
 #This will adjust *FLAGS accordingly to the platform.
 include $(srctree)/$(ARCH_DIR)/Makefile-os-Linux
diff --git a/arch/um/include/asm/Kbuild b/arch/um/include/asm/Kbuild
index 1b9b82bbe322..e91ba12b7ffc 100644
--- a/arch/um/include/asm/Kbuild
+++ b/arch/um/include/asm/Kbuild
@@ -16,6 +16,7 @@ generic-y += module.h
 generic-y += module.lds.h
 generic-y += parport.h
 generic-y += percpu.h
+generic-y += percpu_types.h
 generic-y += preempt.h
 generic-y += runtime-const.h
 generic-y += softirq_stack.h
diff --git a/arch/x86/entry/vdso/common/vfutex.c b/arch/x86/entry/vdso/common/vfutex.c
index dba54745b355..454f059278e4 100644
--- a/arch/x86/entry/vdso/common/vfutex.c
+++ b/arch/x86/entry/vdso/common/vfutex.c
@@ -32,7 +32,7 @@
 #define CLEAR_POPQ		"movq	%[zero],  %a[pop]\n"
 #define CLEAR_POPL		"movl	%k[zero], %a[pop]\n"
 
-#define futex_robust_try_unlock(prefix, clear_pop, __lock, __tid, __pop)	\
+#define futex_robust_try_unlock(prefix, clear_pop, __lock, __tid, __pop)\
 ({									\
 	asm volatile (							\
 		"						\n"	\
diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index 5b835f6b0f53..9a953e7c76db 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -85,14 +85,14 @@ static void vdso_futex_robust_unlock_update_ips(void)
 	futex_reset_cs_ranges(fd);
 
 #ifdef CONFIG_X86_64
-	futex_set_vdso_cs_range(fd, idx, vdso, image->sym___futex_list64_try_unlock_cs_start,
-				image->sym___futex_list64_try_unlock_cs_end, false);
+	futex_set_vdso_cs_range(fd, idx, vdso + image->sym___futex_list64_try_unlock_cs_start,
+				vdso + image->sym___futex_list64_try_unlock_cs_end, false);
 	idx++;
 #endif /* CONFIG_X86_64 */
 
 #if defined(CONFIG_X86_32) || defined(CONFIG_COMPAT)
-	futex_set_vdso_cs_range(fd, idx, vdso, image->sym___futex_list32_try_unlock_cs_start,
-				image->sym___futex_list32_try_unlock_cs_end, true);
+	futex_set_vdso_cs_range(fd, idx, vdso + image->sym___futex_list32_try_unlock_cs_start,
+				vdso + image->sym___futex_list32_try_unlock_cs_end, true);
 #endif /* CONFIG_X86_32 || CONFIG_COMPAT */
 }
 #else
diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index 409981468cba..cef9a4ca9841 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -40,12 +40,10 @@
 #endif
 
 #define __percpu_prefix
-#define __percpu_seg_override	CONCATENATE(__seg_, __percpu_seg)
 
 #else /* !CONFIG_CC_HAS_NAMED_AS: */
 
 #define __percpu_prefix		__force_percpu_prefix
-#define __percpu_seg_override
 
 #endif /* CONFIG_CC_HAS_NAMED_AS */
 
@@ -82,7 +80,6 @@
 
 #define __force_percpu_prefix
 #define __percpu_prefix
-#define __percpu_seg_override
 
 #define PER_CPU_VAR(var)	(var)__percpu_rel
 
@@ -92,8 +89,6 @@
 # define __my_cpu_type(var)	typeof(var)
 # define __my_cpu_ptr(ptr)	(ptr)
 # define __my_cpu_var(var)	(var)
-
-# define __percpu_qual		__percpu_seg_override
 #else
 # define __my_cpu_type(var)	typeof(var) __percpu_seg_override
 # define __my_cpu_ptr(ptr)	(__my_cpu_type(*(ptr))*)(__force uintptr_t)(ptr)
diff --git a/arch/x86/include/asm/percpu_types.h b/arch/x86/include/asm/percpu_types.h
new file mode 100644
index 000000000000..0aa3e47a3643
--- /dev/null
+++ b/arch/x86/include/asm/percpu_types.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_PERCPU_TYPES_H
+#define _ASM_X86_PERCPU_TYPES_H
+
+#if defined(CONFIG_SMP) && defined(CONFIG_CC_HAS_NAMED_AS)
+#define __percpu_seg_override	CONCATENATE(__seg_, __percpu_seg)
+#else /* !CONFIG_CC_HAS_NAMED_AS: */
+#define __percpu_seg_override
+#endif
+
+#if defined(CONFIG_USE_X86_SEG_SUPPORT) && defined(USE_TYPEOF_UNQUAL)
+#define __percpu_qual		__percpu_seg_override
+#endif
+
+#include <asm-generic/percpu_types.h>
+
+#endif
diff --git a/include/asm-generic/Kbuild b/include/asm-generic/Kbuild
index 2c53a1e0b760..15df9dcb42a5 100644
--- a/include/asm-generic/Kbuild
+++ b/include/asm-generic/Kbuild
@@ -44,6 +44,7 @@ mandatory-y += module.lds.h
 mandatory-y += msi.h
 mandatory-y += pci.h
 mandatory-y += percpu.h
+mandatory-y += percpu_types.h
 mandatory-y += pgalloc.h
 mandatory-y += preempt.h
 mandatory-y += rqspinlock.h
diff --git a/include/asm-generic/percpu_types.h b/include/asm-generic/percpu_types.h
new file mode 100644
index 000000000000..a095cea7fa20
--- /dev/null
+++ b/include/asm-generic/percpu_types.h
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_GENERIC_PERCPU_TYPES_H_
+#define _ASM_GENERIC_PERCPU_TYPES_H_
+
+#ifndef __ASSEMBLER__
+/*
+ * __percpu_qual is the qualifier for the percpu named address space.
+ *
+ * Most architectures use generic named address space for percpu variables but
+ * some architectures define percpu variables in different named address space.
+ * E.g. on x86, percpu variable may be declared as being relative to the %fs or
+ * %gs segments using __seg_fs or __seg_gs named address space qualifier.
+ */
+#ifndef __percpu_qual
+# define __percpu_qual
+#endif
+
+#endif /* __ASSEMBLER__ */
+#endif /* _ASM_GENERIC_PERCPU_TYPES_H_ */
diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h
index e8fd77593b68..7ad37adda1dd 100644
--- a/include/linux/compiler_types.h
+++ b/include/linux/compiler_types.h
@@ -634,6 +634,9 @@ struct ftrace_likely_data {
 #else
 #define __unqual_scalar_typeof(x) __typeof_unqual__(x)
 #endif
+
+#include <asm/percpu_types.h>
+
 #endif /* !__ASSEMBLY__ */
 
 /*
diff --git a/include/linux/futex.h b/include/linux/futex.h
index 33524dfb3fe4..51f4ccdc9092 100644
--- a/include/linux/futex.h
+++ b/include/linux/futex.h
@@ -142,10 +142,9 @@ static inline void futex_fixup_robust_unlock(struct pt_regs *regs)
 }
 
 static inline void futex_set_vdso_cs_range(struct futex_mm_data *fd, unsigned int idx,
-					   unsigned long vdso, unsigned long start,
-					   unsigned long end, bool sz32)
+					   unsigned long start, unsigned long end, bool sz32)
 {
-	fd->unlock.cs_ranges[idx].start_ip = vdso + start;
+	fd->unlock.cs_ranges[idx].start_ip = start;
 	fd->unlock.cs_ranges[idx].len = end - start;
 	fd->unlock.cs_ranges[idx].pop_size32 = sz32;
 }
diff --git a/include/linux/futex_types.h b/include/linux/futex_types.h
index 288666fb37b6..d320c0571f0c 100644
--- a/include/linux/futex_types.h
+++ b/include/linux/futex_types.h
@@ -3,6 +3,7 @@
 #define _LINUX_FUTEX_TYPES_H
 
 #ifdef CONFIG_FUTEX
+#include <linux/compiler_types.h>
 #include <linux/mutex_types.h>
 #include <linux/types.h>
 
diff --git a/include/linux/percpu.h b/include/linux/percpu.h
index 85bf8dd9f087..2f5a889aa50d 100644
--- a/include/linux/percpu.h
+++ b/include/linux/percpu.h
@@ -3,13 +3,14 @@
 #define __LINUX_PERCPU_H
 
 #include <linux/alloc_tag.h>
+#include <linux/cleanup.h>
+#include <linux/compiler_types.h>
+#include <linux/init.h>
 #include <linux/mmdebug.h>
-#include <linux/preempt.h>
-#include <linux/smp.h>
 #include <linux/pfn.h>
-#include <linux/init.h>
-#include <linux/cleanup.h>
+#include <linux/preempt.h>
 #include <linux/sched.h>
+#include <linux/smp.h>
 
 #include <asm/percpu.h>
 
diff --git a/kernel/futex/core.c b/kernel/futex/core.c
index d4080e7834dc..6ea4a97796a1 100644
--- a/kernel/futex/core.c
+++ b/kernel/futex/core.c
@@ -1510,7 +1510,7 @@ void futex_exit_recursive(struct task_struct *tsk)
 }
 
 static void futex_cleanup_begin(struct task_struct *tsk)
-	__acquires(&tsk->futex_exit_mutex)
+	__acquires(&tsk->futex.exit_mutex)
 {
 	/*
 	 * Prevent various race issues against a concurrent incoming waiter
@@ -1537,7 +1537,7 @@ static void futex_cleanup_begin(struct task_struct *tsk)
 }
 
 static void futex_cleanup_end(struct task_struct *tsk, int state)
-	__releases(&tsk->futex_exit_mutex)
+	__releases(&tsk->futex.exit_mutex)
 {
 	/*
 	 * Lockless store. The only side effect is that an observer might
diff --git a/tools/testing/selftests/futex/functional/robust_list.c b/tools/testing/selftests/futex/functional/robust_list.c
index 43059f6dbc40..b3fab60181d5 100644
--- a/tools/testing/selftests/futex/functional/robust_list.c
+++ b/tools/testing/selftests/futex/functional/robust_list.c
@@ -45,7 +45,7 @@
 
 #define SLEEP_US 100
 
-#if UINTPTR_MAX == 0xffffffffffffffff
+#if __SIZEOF_LONG__ == 8
 # define BUILD_64
 #endif
 


* [patch V5 01/16] percpu: Sanitize __percpu_qual include hell
  2026-06-02  9:09 [patch V5 00/16] futex: Address the robust futex unlock race for real Thomas Gleixner
@ 2026-06-02  9:09 ` Thomas Gleixner
  2026-06-03 14:25   ` [tip: locking/core] " tip-bot2 for Thomas Gleixner
  2026-06-02  9:09 ` [patch V5 02/16] futex: Move futex task related data into a struct Thomas Gleixner
                   ` (14 subsequent siblings)
  15 siblings, 1 reply; 43+ messages in thread
From: Thomas Gleixner @ 2026-06-02  9:09 UTC (permalink / raw)
  To: LKML
  Cc: Mathieu Desnoyers, André Almeida, Sebastian Andrzej Siewior,
	Carlos O'Donell, Peter Zijlstra, Florian Weimer, Rich Felker,
	Torvald Riegel, Darren Hart, Ingo Molnar, Davidlohr Bueso,
	Arnd Bergmann, Liam R . Howlett, Uros Bizjak,
	Thomas Weißschuh, Mark Brown, Richard Weinberger

Slapping __percpu_qual into the next available header is sloppy at best.

It's required by __percpu which is defined in compiler_types.h and that is
meant to be included without requiring a boatload of other headers so that
a struct or function declaration can contain a __percpu qualifier w/o
further prerequisites.

This implicit dependency on linux/percpu.h makes that impossible and causes
a major problem when trying to separate headers.

Create asm/percpu_types.h and move it there. Include that from
compiler_types.h and the whole recursion problem goes away.
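
The override-with-fallback pattern behind this can be sketched in plain C.
Everything prefixed with sketch_ or SKETCH_ is hypothetical, and volatile
merely stands in for an architecture specific qualifier such as __seg_gs:

```c
#include <assert.h>
#include <stddef.h>

/*
 * "Architecture" part: an architecture which needs a qualifier defines it
 * before the generic fallback is evaluated. SKETCH_ARCH_HAS_QUAL is a
 * hypothetical switch which is not set in the default build.
 */
#ifdef SKETCH_ARCH_HAS_QUAL
# define sketch_percpu_qual	volatile
#endif

/* "Generic" part: provide an empty default so that users need no #ifdef. */
#ifndef sketch_percpu_qual
# define sketch_percpu_qual
#endif

#define SKETCH_STR(x)	#x
#define SKETCH_XSTR(x)	SKETCH_STR(x)

/* A declaration can use the qualifier without further prerequisites. */
struct sketch_stats {
	sketch_percpu_qual int *counter;
};

static int sketch_read(const struct sketch_stats *s)
{
	assert(s && s->counter);
	return *s->counter;
}

/* Returns the qualifier in effect as text; an empty string for the default. */
static const char *sketch_qual_name(void)
{
	return SKETCH_XSTR(sketch_percpu_qual);
}
```

Because the fallback lives in one small header, a declaration which uses the
qualifier only depends on that header and not on the full percpu machinery.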

Fix up UM so it uses the generic header and includes it in the UM_HOST
build, which pulls in compiler_types.h. The USER_CFLAGS fix was suggested
by Richard.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Richard Weinberger <richard@nod.at>
---
V5: New. Address 0-day __percpu fallout
---
 arch/um/Makefile                    |    3 ++-
 arch/um/include/asm/Kbuild          |    1 +
 arch/x86/include/asm/percpu.h       |    5 -----
 arch/x86/include/asm/percpu_types.h |   17 +++++++++++++++++
 include/asm-generic/Kbuild          |    1 +
 include/asm-generic/percpu_types.h  |   19 +++++++++++++++++++
 include/linux/compiler_types.h      |    3 +++
 include/linux/percpu.h              |    9 +++++----
 8 files changed, 48 insertions(+), 10 deletions(-)

--- a/arch/um/Makefile
+++ b/arch/um/Makefile
@@ -71,7 +71,8 @@ USER_CFLAGS = $(patsubst $(KERNEL_DEFINE
 		-D_FILE_OFFSET_BITS=64 -idirafter $(srctree)/include \
 		-idirafter $(objtree)/include -D__KERNEL__ -D__UM_HOST__ \
 		-include $(srctree)/include/linux/compiler-version.h \
-		-include $(srctree)/include/linux/kconfig.h
+		-include $(srctree)/include/linux/kconfig.h \
+		-idirafter $(ARCH_DIR)/include/generated
 
 #This will adjust *FLAGS accordingly to the platform.
 include $(srctree)/$(ARCH_DIR)/Makefile-os-Linux
--- a/arch/um/include/asm/Kbuild
+++ b/arch/um/include/asm/Kbuild
@@ -16,6 +16,7 @@ generic-y += module.h
 generic-y += module.lds.h
 generic-y += parport.h
 generic-y += percpu.h
+generic-y += percpu_types.h
 generic-y += preempt.h
 generic-y += runtime-const.h
 generic-y += softirq_stack.h
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -40,12 +40,10 @@
 #endif
 
 #define __percpu_prefix
-#define __percpu_seg_override	CONCATENATE(__seg_, __percpu_seg)
 
 #else /* !CONFIG_CC_HAS_NAMED_AS: */
 
 #define __percpu_prefix		__force_percpu_prefix
-#define __percpu_seg_override
 
 #endif /* CONFIG_CC_HAS_NAMED_AS */
 
@@ -82,7 +80,6 @@
 
 #define __force_percpu_prefix
 #define __percpu_prefix
-#define __percpu_seg_override
 
 #define PER_CPU_VAR(var)	(var)__percpu_rel
 
@@ -92,8 +89,6 @@
 # define __my_cpu_type(var)	typeof(var)
 # define __my_cpu_ptr(ptr)	(ptr)
 # define __my_cpu_var(var)	(var)
-
-# define __percpu_qual		__percpu_seg_override
 #else
 # define __my_cpu_type(var)	typeof(var) __percpu_seg_override
 # define __my_cpu_ptr(ptr)	(__my_cpu_type(*(ptr))*)(__force uintptr_t)(ptr)
--- /dev/null
+++ b/arch/x86/include/asm/percpu_types.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_PERCPU_TYPES_H
+#define _ASM_X86_PERCPU_TYPES_H
+
+#if defined(CONFIG_SMP) && defined(CONFIG_CC_HAS_NAMED_AS)
+#define __percpu_seg_override	CONCATENATE(__seg_, __percpu_seg)
+#else /* !CONFIG_CC_HAS_NAMED_AS: */
+#define __percpu_seg_override
+#endif
+
+#if defined(CONFIG_USE_X86_SEG_SUPPORT) && defined(USE_TYPEOF_UNQUAL)
+#define __percpu_qual		__percpu_seg_override
+#endif
+
+#include <asm-generic/percpu_types.h>
+
+#endif
--- a/include/asm-generic/Kbuild
+++ b/include/asm-generic/Kbuild
@@ -44,6 +44,7 @@ mandatory-y += module.lds.h
 mandatory-y += msi.h
 mandatory-y += pci.h
 mandatory-y += percpu.h
+mandatory-y += percpu_types.h
 mandatory-y += pgalloc.h
 mandatory-y += preempt.h
 mandatory-y += rqspinlock.h
--- /dev/null
+++ b/include/asm-generic/percpu_types.h
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_GENERIC_PERCPU_TYPES_H_
+#define _ASM_GENERIC_PERCPU_TYPES_H_
+
+#ifndef __ASSEMBLER__
+/*
+ * __percpu_qual is the qualifier for the percpu named address space.
+ *
+ * Most architectures use generic named address space for percpu variables but
+ * some architectures define percpu variables in different named address space.
+ * E.g. on x86, percpu variable may be declared as being relative to the %fs or
+ * %gs segments using __seg_fs or __seg_gs named address space qualifier.
+ */
+#ifndef __percpu_qual
+# define __percpu_qual
+#endif
+
+#endif /* __ASSEMBLER__ */
+#endif /* _ASM_GENERIC_PERCPU_TYPES_H_ */
--- a/include/linux/compiler_types.h
+++ b/include/linux/compiler_types.h
@@ -634,6 +634,9 @@ struct ftrace_likely_data {
 #else
 #define __unqual_scalar_typeof(x) __typeof_unqual__(x)
 #endif
+
+#include <asm/percpu_types.h>
+
 #endif /* !__ASSEMBLY__ */
 
 /*
--- a/include/linux/percpu.h
+++ b/include/linux/percpu.h
@@ -3,13 +3,14 @@
 #define __LINUX_PERCPU_H
 
 #include <linux/alloc_tag.h>
+#include <linux/cleanup.h>
+#include <linux/compiler_types.h>
+#include <linux/init.h>
 #include <linux/mmdebug.h>
-#include <linux/preempt.h>
-#include <linux/smp.h>
 #include <linux/pfn.h>
-#include <linux/init.h>
-#include <linux/cleanup.h>
+#include <linux/preempt.h>
 #include <linux/sched.h>
+#include <linux/smp.h>
 
 #include <asm/percpu.h>
 



* [patch V5 02/16] futex: Move futex task related data into a struct
  2026-06-02  9:09 [patch V5 00/16] futex: Address the robust futex unlock race for real Thomas Gleixner
  2026-06-02  9:09 ` [patch V5 01/16] percpu: Sanitize __percpu_qual include hell Thomas Gleixner
@ 2026-06-02  9:09 ` Thomas Gleixner
  2026-06-03 14:25   ` [tip: locking/core] " tip-bot2 for Thomas Gleixner
  2026-06-02  9:09 ` [patch V5 03/16] futex: Make futex_mm_init() void Thomas Gleixner
                   ` (13 subsequent siblings)
  15 siblings, 1 reply; 43+ messages in thread
From: Thomas Gleixner @ 2026-06-02  9:09 UTC (permalink / raw)
  To: LKML
  Cc: Mathieu Desnoyers, André Almeida, Sebastian Andrzej Siewior,
	Carlos O'Donell, Peter Zijlstra, Florian Weimer, Rich Felker,
	Torvald Riegel, Darren Hart, Ingo Molnar, Davidlohr Bueso,
	Arnd Bergmann, Liam R . Howlett, Uros Bizjak,
	Thomas Weißschuh, Mark Brown, Richard Weinberger

Having all these members in task_struct along with the required #ifdeffery
is annoying, prevents efficient initialization of the data with memset()
and makes extending it tedious.

Move it into a data structure and fix up all usage sites.
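
The initialization benefit can be shown with a small user-space sketch. The
sketch_ prefixed types are hypothetical, trimmed-down stand-ins for
task_struct and the new futex data structure, without the mutex and the
compat member:

```c
#include <assert.h>
#include <string.h>

/* Minimal stand-in for the kernel's struct list_head. */
struct sketch_list_head {
	struct sketch_list_head *next, *prev;
};

enum { SKETCH_STATE_OK = 0, SKETCH_STATE_EXITING, SKETCH_STATE_DEAD };

/* Hypothetical, trimmed-down counterpart of the per task futex data. */
struct sketch_futex_data {
	void			*robust_list;
	struct sketch_list_head	pi_state_list;
	void			*pi_state_cache;
	unsigned int		state;
};

struct sketch_task {
	int				pid;
	struct sketch_futex_data	futex;
};

static void sketch_futex_init_task(struct sketch_task *tsk)
{
	assert(tsk);

	/* One memset() replaces the per member NULL assignments and #ifdefs. */
	memset(&tsk->futex, 0, sizeof(tsk->futex));

	/* Members whose initial state is not all-zero still need explicit setup. */
	tsk->futex.pi_state_list.next = &tsk->futex.pi_state_list;
	tsk->futex.pi_state_list.prev = &tsk->futex.pi_state_list;
	tsk->futex.state = SKETCH_STATE_OK;
}
```

A member added to the structure later is zeroed automatically, without
touching the init function.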

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: André Almeida <andrealmeid@igalia.com>
---
V2: Rename the struct and add the missing kernel doc - Andre
---
 Documentation/locking/robust-futexes.rst |    8 ++--
 include/linux/futex.h                    |   12 ++----
 include/linux/futex_types.h              |   36 ++++++++++++++++++
 include/linux/sched.h                    |   16 ++------
 kernel/exit.c                            |    4 +-
 kernel/futex/core.c                      |   59 +++++++++++++++----------------
 kernel/futex/pi.c                        |   26 ++++++-------
 kernel/futex/syscalls.c                  |   23 ++++--------
 8 files changed, 101 insertions(+), 83 deletions(-)

--- a/Documentation/locking/robust-futexes.rst
+++ b/Documentation/locking/robust-futexes.rst
@@ -94,7 +94,7 @@ time, the kernel checks this user-space
 locks to be cleaned up?
 
 In the common case, at do_exit() time, there is no list registered, so
-the cost of robust futexes is just a simple current->robust_list != NULL
+the cost of robust futexes is just a current->futex.robust_list != NULL
 comparison. If the thread has registered a list, then normally the list
 is empty. If the thread/process crashed or terminated in some incorrect
 way then the list might be non-empty: in this case the kernel carefully
@@ -178,9 +178,9 @@ The patch adds two new syscalls: one to
                      size_t __user *len_ptr);
 
 List registration is very fast: the pointer is simply stored in
-current->robust_list. [Note that in the future, if robust futexes become
-widespread, we could extend sys_clone() to register a robust-list head
-for new threads, without the need of another syscall.]
+current->futex.robust_list. [Note that in the future, if robust futexes
+become widespread, we could extend sys_clone() to register a robust-list
+head for new threads, without the need of another syscall.]
 
 So there is virtually zero overhead for tasks not using robust futexes,
 and even for robust futex users, there is only one extra syscall per
--- a/include/linux/futex.h
+++ b/include/linux/futex.h
@@ -64,14 +64,10 @@ enum {
 
 static inline void futex_init_task(struct task_struct *tsk)
 {
-	tsk->robust_list = NULL;
-#ifdef CONFIG_COMPAT
-	tsk->compat_robust_list = NULL;
-#endif
-	INIT_LIST_HEAD(&tsk->pi_state_list);
-	tsk->pi_state_cache = NULL;
-	tsk->futex_state = FUTEX_STATE_OK;
-	mutex_init(&tsk->futex_exit_mutex);
+	memset(&tsk->futex, 0, sizeof(tsk->futex));
+	INIT_LIST_HEAD(&tsk->futex.pi_state_list);
+	tsk->futex.state = FUTEX_STATE_OK;
+	mutex_init(&tsk->futex.exit_mutex);
 }
 
 void futex_exit_recursive(struct task_struct *tsk);
--- /dev/null
+++ b/include/linux/futex_types.h
@@ -0,0 +1,36 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_FUTEX_TYPES_H
+#define _LINUX_FUTEX_TYPES_H
+
+#ifdef CONFIG_FUTEX
+#include <linux/mutex_types.h>
+#include <linux/types.h>
+
+struct compat_robust_list_head;
+struct futex_pi_state;
+struct robust_list_head;
+
+/**
+ * struct futex_sched_data - Futex related per task data
+ * @robust_list:	User space registered robust list pointer
+ * @compat_robust_list:	User space registered robust list pointer for compat tasks
+ * @pi_state_list:	List head for Priority Inheritance (PI) state management
+ * @pi_state_cache:	Pointer to cache one PI state object per task
+ * @exit_mutex:		Mutex for serializing exit
+ * @state:		Futex handling state to handle exit races correctly
+ */
+struct futex_sched_data {
+	struct robust_list_head __user		*robust_list;
+#ifdef CONFIG_COMPAT
+	struct compat_robust_list_head __user	*compat_robust_list;
+#endif
+	struct list_head			pi_state_list;
+	struct futex_pi_state			*pi_state_cache;
+	struct mutex				exit_mutex;
+	unsigned int				state;
+};
+#else
+struct futex_sched_data { };
+#endif /* !CONFIG_FUTEX */
+
+#endif /* _LINUX_FUTEX_TYPES_H */
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -16,6 +16,7 @@
 #include <linux/cpumask_types.h>
 
 #include <linux/cache.h>
+#include <linux/futex_types.h>
 #include <linux/irqflags_types.h>
 #include <linux/smp_types.h>
 #include <linux/pid_types.h>
@@ -64,7 +65,6 @@ struct bpf_net_context;
 struct capture_control;
 struct cfs_rq;
 struct fs_struct;
-struct futex_pi_state;
 struct io_context;
 struct io_uring_task;
 struct mempolicy;
@@ -76,7 +76,6 @@ struct pid_namespace;
 struct pipe_inode_info;
 struct rcu_node;
 struct reclaim_state;
-struct robust_list_head;
 struct root_domain;
 struct rq;
 struct sched_attr;
@@ -1331,16 +1330,9 @@ struct task_struct {
 	u32				closid;
 	u32				rmid;
 #endif
-#ifdef CONFIG_FUTEX
-	struct robust_list_head __user	*robust_list;
-#ifdef CONFIG_COMPAT
-	struct compat_robust_list_head __user *compat_robust_list;
-#endif
-	struct list_head		pi_state_list;
-	struct futex_pi_state		*pi_state_cache;
-	struct mutex			futex_exit_mutex;
-	unsigned int			futex_state;
-#endif
+
+	struct futex_sched_data		futex;
+
 #ifdef CONFIG_PERF_EVENTS
 	u8				perf_recursion[PERF_NR_CONTEXTS];
 	struct perf_event_context	*perf_event_ctxp;
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -988,8 +988,8 @@ void __noreturn do_exit(long code)
 	proc_exit_connector(tsk);
 	mpol_put_task_policy(tsk);
 #ifdef CONFIG_FUTEX
-	if (unlikely(current->pi_state_cache))
-		kfree(current->pi_state_cache);
+	if (unlikely(current->futex.pi_state_cache))
+		kfree(current->futex.pi_state_cache);
 #endif
 	/*
 	 * Make sure we are holding no locks:
--- a/kernel/futex/core.c
+++ b/kernel/futex/core.c
@@ -32,18 +32,19 @@
  *  "But they come in a choice of three flavours!"
  */
 #include <linux/compat.h>
-#include <linux/jhash.h>
-#include <linux/pagemap.h>
 #include <linux/debugfs.h>
-#include <linux/plist.h>
+#include <linux/fault-inject.h>
 #include <linux/gfp.h>
-#include <linux/vmalloc.h>
+#include <linux/jhash.h>
 #include <linux/memblock.h>
-#include <linux/fault-inject.h>
-#include <linux/slab.h>
-#include <linux/prctl.h>
 #include <linux/mempolicy.h>
 #include <linux/mmap_lock.h>
+#include <linux/pagemap.h>
+#include <linux/plist.h>
+#include <linux/prctl.h>
+#include <linux/rseq.h>
+#include <linux/slab.h>
+#include <linux/vmalloc.h>
 
 #include "futex.h"
 #include "../locking/rtmutex_common.h"
@@ -829,7 +830,7 @@ void wait_for_owner_exiting(int ret, str
 	if (WARN_ON_ONCE(ret == -EBUSY && !exiting))
 		return;
 
-	mutex_lock(&exiting->futex_exit_mutex);
+	mutex_lock(&exiting->futex.exit_mutex);
 	/*
 	 * No point in doing state checking here. If the waiter got here
 	 * while the task was in exec()->exec_futex_release() then it can
@@ -838,7 +839,7 @@ void wait_for_owner_exiting(int ret, str
 	 * already. Highly unlikely and not a problem. Just one more round
 	 * through the futex maze.
 	 */
-	mutex_unlock(&exiting->futex_exit_mutex);
+	mutex_unlock(&exiting->futex.exit_mutex);
 
 	put_task_struct(exiting);
 }
@@ -1047,7 +1048,7 @@ static int handle_futex_death(u32 __user
 	 *
 	 * In both cases the following conditions are met:
 	 *
-	 *	1) task->robust_list->list_op_pending != NULL
+	 *	1) task->futex.robust_list->list_op_pending != NULL
 	 *	   @pending_op == true
 	 *	2) The owner part of user space futex value == 0
 	 *	3) Regular futex: @pi == false
@@ -1152,7 +1153,7 @@ static inline int fetch_robust_entry(str
  */
 static void exit_robust_list(struct task_struct *curr)
 {
-	struct robust_list_head __user *head = curr->robust_list;
+	struct robust_list_head __user *head = curr->futex.robust_list;
 	struct robust_list __user *entry, *next_entry, *pending;
 	unsigned int limit = ROBUST_LIST_LIMIT, pi, pip;
 	unsigned int next_pi;
@@ -1246,7 +1247,7 @@ compat_fetch_robust_entry(compat_uptr_t
  */
 static void compat_exit_robust_list(struct task_struct *curr)
 {
-	struct compat_robust_list_head __user *head = curr->compat_robust_list;
+	struct compat_robust_list_head __user *head = curr->futex.compat_robust_list;
 	struct robust_list __user *entry, *next_entry, *pending;
 	unsigned int limit = ROBUST_LIST_LIMIT, pi, pip;
 	unsigned int next_pi;
@@ -1322,7 +1323,7 @@ static void compat_exit_robust_list(stru
  */
 static void exit_pi_state_list(struct task_struct *curr)
 {
-	struct list_head *next, *head = &curr->pi_state_list;
+	struct list_head *next, *head = &curr->futex.pi_state_list;
 	struct futex_pi_state *pi_state;
 	union futex_key key = FUTEX_KEY_INIT;
 
@@ -1406,19 +1407,19 @@ static inline void exit_pi_state_list(st
 
 static void futex_cleanup(struct task_struct *tsk)
 {
-	if (unlikely(tsk->robust_list)) {
+	if (unlikely(tsk->futex.robust_list)) {
 		exit_robust_list(tsk);
-		tsk->robust_list = NULL;
+		tsk->futex.robust_list = NULL;
 	}
 
 #ifdef CONFIG_COMPAT
-	if (unlikely(tsk->compat_robust_list)) {
+	if (unlikely(tsk->futex.compat_robust_list)) {
 		compat_exit_robust_list(tsk);
-		tsk->compat_robust_list = NULL;
+		tsk->futex.compat_robust_list = NULL;
 	}
 #endif
 
-	if (unlikely(!list_empty(&tsk->pi_state_list)))
+	if (unlikely(!list_empty(&tsk->futex.pi_state_list)))
 		exit_pi_state_list(tsk);
 }
 
@@ -1442,23 +1443,23 @@ static void futex_cleanup(struct task_st
 void futex_exit_recursive(struct task_struct *tsk)
 {
 	/* If the state is FUTEX_STATE_EXITING then futex_exit_mutex is held */
-	if (tsk->futex_state == FUTEX_STATE_EXITING) {
-		__assume_ctx_lock(&tsk->futex_exit_mutex);
-		mutex_unlock(&tsk->futex_exit_mutex);
+	if (tsk->futex.state == FUTEX_STATE_EXITING) {
+		__assume_ctx_lock(&tsk->futex.exit_mutex);
+		mutex_unlock(&tsk->futex.exit_mutex);
 	}
-	tsk->futex_state = FUTEX_STATE_DEAD;
+	tsk->futex.state = FUTEX_STATE_DEAD;
 }
 
 static void futex_cleanup_begin(struct task_struct *tsk)
-	__acquires(&tsk->futex_exit_mutex)
+	__acquires(&tsk->futex.exit_mutex)
 {
 	/*
 	 * Prevent various race issues against a concurrent incoming waiter
 	 * including live locks by forcing the waiter to block on
-	 * tsk->futex_exit_mutex when it observes FUTEX_STATE_EXITING in
+	 * tsk->futex.exit_mutex when it observes FUTEX_STATE_EXITING in
 	 * attach_to_pi_owner().
 	 */
-	mutex_lock(&tsk->futex_exit_mutex);
+	mutex_lock(&tsk->futex.exit_mutex);
 
 	/*
 	 * Switch the state to FUTEX_STATE_EXITING under tsk->pi_lock.
@@ -1472,23 +1473,23 @@ static void futex_cleanup_begin(struct t
 	 * be observed in exit_pi_state_list().
 	 */
 	raw_spin_lock_irq(&tsk->pi_lock);
-	tsk->futex_state = FUTEX_STATE_EXITING;
+	tsk->futex.state = FUTEX_STATE_EXITING;
 	raw_spin_unlock_irq(&tsk->pi_lock);
 }
 
 static void futex_cleanup_end(struct task_struct *tsk, int state)
-	__releases(&tsk->futex_exit_mutex)
+	__releases(&tsk->futex.exit_mutex)
 {
 	/*
 	 * Lockless store. The only side effect is that an observer might
 	 * take another loop until it becomes visible.
 	 */
-	tsk->futex_state = state;
+	tsk->futex.state = state;
 	/*
 	 * Drop the exit protection. This unblocks waiters which observed
 	 * FUTEX_STATE_EXITING to reevaluate the state.
 	 */
-	mutex_unlock(&tsk->futex_exit_mutex);
+	mutex_unlock(&tsk->futex.exit_mutex);
 }
 
 void futex_exec_release(struct task_struct *tsk)
--- a/kernel/futex/pi.c
+++ b/kernel/futex/pi.c
@@ -14,7 +14,7 @@ int refill_pi_state_cache(void)
 {
 	struct futex_pi_state *pi_state;
 
-	if (likely(current->pi_state_cache))
+	if (likely(current->futex.pi_state_cache))
 		return 0;
 
 	pi_state = kzalloc_obj(*pi_state);
@@ -28,17 +28,17 @@ int refill_pi_state_cache(void)
 	refcount_set(&pi_state->refcount, 1);
 	pi_state->key = FUTEX_KEY_INIT;
 
-	current->pi_state_cache = pi_state;
+	current->futex.pi_state_cache = pi_state;
 
 	return 0;
 }
 
 static struct futex_pi_state *alloc_pi_state(void)
 {
-	struct futex_pi_state *pi_state = current->pi_state_cache;
+	struct futex_pi_state *pi_state = current->futex.pi_state_cache;
 
 	WARN_ON(!pi_state);
-	current->pi_state_cache = NULL;
+	current->futex.pi_state_cache = NULL;
 
 	return pi_state;
 }
@@ -60,7 +60,7 @@ static void pi_state_update_owner(struct
 	if (new_owner) {
 		raw_spin_lock(&new_owner->pi_lock);
 		WARN_ON(!list_empty(&pi_state->list));
-		list_add(&pi_state->list, &new_owner->pi_state_list);
+		list_add(&pi_state->list, &new_owner->futex.pi_state_list);
 		pi_state->owner = new_owner;
 		raw_spin_unlock(&new_owner->pi_lock);
 	}
@@ -96,7 +96,7 @@ void put_pi_state(struct futex_pi_state
 		raw_spin_unlock_irqrestore(&pi_state->pi_mutex.wait_lock, flags);
 	}
 
-	if (current->pi_state_cache) {
+	if (current->futex.pi_state_cache) {
 		kfree(pi_state);
 	} else {
 		/*
@@ -106,7 +106,7 @@ void put_pi_state(struct futex_pi_state
 		 */
 		pi_state->owner = NULL;
 		refcount_set(&pi_state->refcount, 1);
-		current->pi_state_cache = pi_state;
+		current->futex.pi_state_cache = pi_state;
 	}
 }
 
@@ -179,7 +179,7 @@ void put_pi_state(struct futex_pi_state
  *
  * p->pi_lock:
  *
- *	p->pi_state_list -> pi_state->list, relation
+ *	p->futex.pi_state_list -> pi_state->list, relation
  *	pi_mutex->owner -> pi_state->owner, relation
  *
  * pi_state->refcount:
@@ -327,7 +327,7 @@ static int handle_exit_race(u32 __user *
 	 * If the futex exit state is not yet FUTEX_STATE_DEAD, tell the
 	 * caller that the alleged owner is busy.
 	 */
-	if (tsk && tsk->futex_state != FUTEX_STATE_DEAD)
+	if (tsk && tsk->futex.state != FUTEX_STATE_DEAD)
 		return -EBUSY;
 
 	/*
@@ -346,8 +346,8 @@ static int handle_exit_race(u32 __user *
 	 *    *uaddr = 0xC0000000;	     tsk = get_task(PID);
 	 *   }				     if (!tsk->flags & PF_EXITING) {
 	 *  ...				       attach();
-	 *  tsk->futex_state =               } else {
-	 *	FUTEX_STATE_DEAD;              if (tsk->futex_state !=
+	 *  tsk->futex.state =               } else {
+	 *	FUTEX_STATE_DEAD;              if (tsk->futex.state !=
 	 *					  FUTEX_STATE_DEAD)
 	 *				         return -EAGAIN;
 	 *				       return -ESRCH; <--- FAIL
@@ -396,7 +396,7 @@ static void __attach_to_pi_owner(struct
 	pi_state->key = *key;
 
 	WARN_ON(!list_empty(&pi_state->list));
-	list_add(&pi_state->list, &p->pi_state_list);
+	list_add(&pi_state->list, &p->futex.pi_state_list);
 	/*
 	 * Assignment without holding pi_state->pi_mutex.wait_lock is safe
 	 * because there is no concurrency as the object is not published yet.
@@ -440,7 +440,7 @@ static int attach_to_pi_owner(u32 __user
 	 * in futex_exit_release(), we do this protected by p->pi_lock:
 	 */
 	raw_spin_lock_irq(&p->pi_lock);
-	if (unlikely(p->futex_state != FUTEX_STATE_OK)) {
+	if (unlikely(p->futex.state != FUTEX_STATE_OK)) {
 		/*
 		 * The task is on the way out. When the futex state is
 		 * FUTEX_STATE_DEAD, we know that the task has finished
--- a/kernel/futex/syscalls.c
+++ b/kernel/futex/syscalls.c
@@ -25,17 +25,13 @@
  * @head:	pointer to the list-head
  * @len:	length of the list-head, as userspace expects
  */
-SYSCALL_DEFINE2(set_robust_list, struct robust_list_head __user *, head,
-		size_t, len)
+SYSCALL_DEFINE2(set_robust_list, struct robust_list_head __user *, head, size_t, len)
 {
-	/*
-	 * The kernel knows only one size for now:
-	 */
+	/* The kernel knows only one size for now. */
 	if (unlikely(len != sizeof(*head)))
 		return -EINVAL;
 
-	current->robust_list = head;
-
+	current->futex.robust_list = head;
 	return 0;
 }
 
@@ -43,9 +39,9 @@ static inline void __user *futex_task_ro
 {
 #ifdef CONFIG_COMPAT
 	if (compat)
-		return p->compat_robust_list;
+		return p->futex.compat_robust_list;
 #endif
-	return p->robust_list;
+	return p->futex.robust_list;
 }
 
 static void __user *futex_get_robust_list_common(int pid, bool compat)
@@ -475,15 +471,13 @@ SYSCALL_DEFINE4(futex_requeue,
 }
 
 #ifdef CONFIG_COMPAT
-COMPAT_SYSCALL_DEFINE2(set_robust_list,
-		struct compat_robust_list_head __user *, head,
-		compat_size_t, len)
+COMPAT_SYSCALL_DEFINE2(set_robust_list, struct compat_robust_list_head __user *, head,
+		       compat_size_t, len)
 {
 	if (unlikely(len != sizeof(*head)))
 		return -EINVAL;
 
-	current->compat_robust_list = head;
-
+	current->futex.compat_robust_list = head;
 	return 0;
 }
 
@@ -523,4 +517,3 @@ SYSCALL_DEFINE6(futex_time32, u32 __user
 	return do_futex(uaddr, op, val, tp, uaddr2, (unsigned long)utime, val3);
 }
 #endif /* CONFIG_COMPAT_32BIT_TIME */
-



* [patch V5 03/16] futex: Make futex_mm_init() void
  2026-06-02  9:09 [patch V5 00/16] futex: Address the robust futex unlock race for real Thomas Gleixner
  2026-06-02  9:09 ` [patch V5 01/16] percpu: Sanitize __percpu_qual include hell Thomas Gleixner
  2026-06-02  9:09 ` [patch V5 02/16] futex: Move futex task related data into a struct Thomas Gleixner
@ 2026-06-02  9:09 ` Thomas Gleixner
  2026-06-03 14:25   ` [tip: locking/core] " tip-bot2 for Thomas Gleixner
  2026-06-02  9:09 ` [patch V5 04/16] futex: Move futex related mm_struct data into a struct Thomas Gleixner
                   ` (12 subsequent siblings)
  15 siblings, 1 reply; 43+ messages in thread
From: Thomas Gleixner @ 2026-06-02  9:09 UTC (permalink / raw)
  To: LKML
  Cc: Mathieu Desnoyers, André Almeida, Sebastian Andrzej Siewior,
	Carlos O'Donell, Peter Zijlstra, Florian Weimer, Rich Felker,
	Torvald Riegel, Darren Hart, Ingo Molnar, Davidlohr Bueso,
	Arnd Bergmann, Liam R . Howlett, Uros Bizjak,
	Thomas Weißschuh, Mark Brown, Richard Weinberger

Nothing in futex_mm_init() can fail. Mop up the leftovers of the early
version of it, which did an allocation.

While at it, clean up the stubs and the #ifdef comments to make the header
file readable.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
---
 include/linux/futex.h |   28 +++++++++++-----------------
 kernel/fork.c         |    8 ++------
 kernel/futex/core.c   |    3 +--
 3 files changed, 14 insertions(+), 25 deletions(-)
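
For illustration only, a minimal user space sketch of why a void init
function simplifies its caller. All demo_* names are hypothetical stand-ins
and not kernel code. The assumption is merely that an init step which does
nothing but plain stores needs neither a return value check nor its own
unwind label:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins, not the kernel data structures */
struct demo_futex_state {
	void		*hash;
	long		batches;
};

struct demo_mm {
	struct demo_futex_state	futex;
	void			*pgd;
};

/* Only plain stores, no allocation: cannot fail, hence void */
static void demo_futex_init(struct demo_mm *mm)
{
	mm->futex.hash = NULL;
	mm->futex.batches = 1;
}

/* The only step in this sketch which can fail */
static int demo_alloc_pgd(struct demo_mm *mm, int simulate_failure)
{
	if (simulate_failure)
		return -1;
	mm->pgd = malloc(64);
	return mm->pgd ? 0 : -1;
}

struct demo_mm *demo_mm_init(int simulate_failure)
{
	struct demo_mm *mm = calloc(1, sizeof(*mm));

	if (!mm)
		return NULL;

	/* No return value to check and no unwind label required */
	demo_futex_init(mm);

	if (demo_alloc_pgd(mm, simulate_failure))
		goto fail_mm_init;
	return mm;

fail_mm_init:
	free(mm);
	return NULL;
}

void demo_mm_free(struct demo_mm *mm)
{
	free(mm->pgd);
	free(mm);
}
```

With demo_futex_init() unable to fail, the only error path left in
demo_mm_init() is the allocation step. That mirrors the fork.c hunk in this
patch, where the fail_nopgd label goes away.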

--- a/include/linux/futex.h
+++ b/include/linux/futex.h
@@ -81,22 +81,20 @@ int futex_hash_prctl(unsigned long arg2,
 #ifdef CONFIG_FUTEX_PRIVATE_HASH
 int futex_hash_allocate_default(void);
 void futex_hash_free(struct mm_struct *mm);
-int futex_mm_init(struct mm_struct *mm);
-
-#else /* !CONFIG_FUTEX_PRIVATE_HASH */
+void futex_mm_init(struct mm_struct *mm);
+#else  /* CONFIG_FUTEX_PRIVATE_HASH */
 static inline int futex_hash_allocate_default(void) { return 0; }
 static inline int futex_hash_free(struct mm_struct *mm) { return 0; }
-static inline int futex_mm_init(struct mm_struct *mm) { return 0; }
-#endif /* CONFIG_FUTEX_PRIVATE_HASH */
+static inline void futex_mm_init(struct mm_struct *mm) { }
+#endif /* !CONFIG_FUTEX_PRIVATE_HASH */
 
-#else /* !CONFIG_FUTEX */
+#else  /* CONFIG_FUTEX */
 static inline void futex_init_task(struct task_struct *tsk) { }
 static inline void futex_exit_recursive(struct task_struct *tsk) { }
 static inline void futex_exit_release(struct task_struct *tsk) { }
 static inline void futex_exec_release(struct task_struct *tsk) { }
-static inline long do_futex(u32 __user *uaddr, int op, u32 val,
-			    ktime_t *timeout, u32 __user *uaddr2,
-			    u32 val2, u32 val3)
+static inline long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
+			    u32 __user *uaddr2, u32 val2, u32 val3)
 {
 	return -EINVAL;
 }
@@ -104,13 +102,9 @@ static inline int futex_hash_prctl(unsig
 {
 	return -EINVAL;
 }
-static inline int futex_hash_allocate_default(void)
-{
-	return 0;
-}
+static inline int futex_hash_allocate_default(void) { return 0; }
 static inline int futex_hash_free(struct mm_struct *mm) { return 0; }
-static inline int futex_mm_init(struct mm_struct *mm) { return 0; }
-
-#endif
+static inline void futex_mm_init(struct mm_struct *mm) { }
+#endif /* !CONFIG_FUTEX */
 
-#endif
+#endif /* _LINUX_FUTEX_H */
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1101,6 +1101,7 @@ static struct mm_struct *mm_init(struct
 #endif
 	mm_init_uprobes_state(mm);
 	hugetlb_count_init(mm);
+	futex_mm_init(mm);
 
 	mm_flags_clear_all(mm);
 	if (current->mm) {
@@ -1113,11 +1114,8 @@ static struct mm_struct *mm_init(struct
 		mm->def_flags = 0;
 	}
 
-	if (futex_mm_init(mm))
-		goto fail_mm_init;
-
 	if (mm_alloc_pgd(mm))
-		goto fail_nopgd;
+		goto fail_mm_init;
 
 	if (mm_alloc_id(mm))
 		goto fail_noid;
@@ -1144,8 +1142,6 @@ static struct mm_struct *mm_init(struct
 	mm_free_id(mm);
 fail_noid:
 	mm_free_pgd(mm);
-fail_nopgd:
-	futex_hash_free(mm);
 fail_mm_init:
 	free_mm(mm);
 	return NULL;
--- a/kernel/futex/core.c
+++ b/kernel/futex/core.c
@@ -1720,7 +1720,7 @@ static bool futex_ref_is_dead(struct fut
 	return atomic_long_read(&mm->futex_atomic) == 0;
 }
 
-int futex_mm_init(struct mm_struct *mm)
+void futex_mm_init(struct mm_struct *mm)
 {
 	mutex_init(&mm->futex_hash_lock);
 	RCU_INIT_POINTER(mm->futex_phash, NULL);
@@ -1729,7 +1729,6 @@ int futex_mm_init(struct mm_struct *mm)
 	mm->futex_ref = NULL;
 	atomic_long_set(&mm->futex_atomic, 0);
 	mm->futex_batches = get_state_synchronize_rcu();
-	return 0;
 }
 
 void futex_hash_free(struct mm_struct *mm)



* [patch V5 04/16] futex: Move futex related mm_struct data into a struct
  2026-06-02  9:09 [patch V5 00/16] futex: Address the robust futex unlock race for real Thomas Gleixner
                   ` (2 preceding siblings ...)
  2026-06-02  9:09 ` [patch V5 03/16] futex: Make futex_mm_init() void Thomas Gleixner
@ 2026-06-02  9:09 ` Thomas Gleixner
  2026-06-03 14:25   ` [tip: locking/core] " tip-bot2 for Thomas Gleixner
  2026-06-02  9:09 ` [patch V5 05/16] futex: Provide UABI defines for robust list entry modifiers Thomas Gleixner
                   ` (11 subsequent siblings)
  15 siblings, 1 reply; 43+ messages in thread
From: Thomas Gleixner @ 2026-06-02  9:09 UTC (permalink / raw)
  To: LKML
  Cc: Mathieu Desnoyers, André Almeida, Sebastian Andrzej Siewior,
	Carlos O'Donell, Peter Zijlstra, Florian Weimer, Rich Felker,
	Torvald Riegel, Darren Hart, Ingo Molnar, Davidlohr Bueso,
	Arnd Bergmann, Liam R . Howlett, Uros Bizjak,
	Thomas Weißschuh, Mark Brown, Richard Weinberger

Having all these members in mm_struct along with the required #ifdeffery is
annoying, prevents efficient initialization of the data with memset() and
makes extending it tedious.

Move it into a data structure and fix up all usage sites.

The extra struct for the private hash is intentional to make integration of
other conditional mechanisms easier in terms of initialization and separation.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
---
V5: Include compiler_types.h so that __percpu is defined - Nam
V3: Split out the private hash data
V2: Use an empty stub struct as for the others - Mathieu
---
 include/linux/futex_types.h |   36 +++++++++++
 include/linux/mm_types.h    |   12 ---
 kernel/futex/core.c         |  133 ++++++++++++++++++++------------------------
 3 files changed, 98 insertions(+), 83 deletions(-)
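
For illustration only, a minimal user space sketch of the initialization
pattern which the grouping enables. The demo_* names and the
DEMO_PRIVATE_HASH switch are hypothetical stand-ins for the kernel
structures and the config option, and the mutex is left out. It assumes
nothing beyond what the changelog states: once the members live in one
struct, a single memset() covers them and only the non-zero members need an
explicit store:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the config option */
#define DEMO_PRIVATE_HASH	1

#if DEMO_PRIVATE_HASH
struct demo_phash {
	void		*hash;
	void		*hash_new;
	unsigned long	batches;
	long		atomic;
};
#else
/* Empty stub struct, a GNU C extension which the kernel relies on */
struct demo_phash { };
#endif

/* Wrapper struct, so further conditional parts can be added later */
struct demo_futex_mm_data {
	struct demo_phash	phash;
};

struct demo_mm {
	int				other;
	struct demo_futex_mm_data	futex;
	int				more;
};

void demo_futex_mm_init(struct demo_mm *mm, unsigned long cookie)
{
	/* One memset() instead of one store per member */
	memset(&mm->futex, 0, sizeof(mm->futex));
#if DEMO_PRIVATE_HASH
	/* Only members with a non-zero initial value need a store */
	mm->futex.phash.batches = cookie;
#else
	(void)cookie;
#endif
}
```

The memset() is confined to the embedded struct, so neighbouring members of
the containing structure stay untouched.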

--- a/include/linux/futex_types.h
+++ b/include/linux/futex_types.h
@@ -3,6 +3,7 @@
 #define _LINUX_FUTEX_TYPES_H
 
 #ifdef CONFIG_FUTEX
+#include <linux/compiler_types.h>
 #include <linux/mutex_types.h>
 #include <linux/types.h>
 
@@ -29,8 +30,41 @@ struct futex_sched_data {
 	struct mutex				exit_mutex;
 	unsigned int				state;
 };
-#else
+
+#ifdef CONFIG_FUTEX_PRIVATE_HASH
+/**
+ * struct futex_mm_phash - Futex private hash related per MM data
+ * @lock:	Mutex to protect the private hash operations
+ * @hash:	RCU managed pointer to the private hash
+ * @hash_new:	Pointer to a newly allocated private hash
+ * @batches:	Batch state for RCU synchronization
+ * @rcu:	RCU head for call_rcu()
+ * @atomic:	Aggregate value for @ref
+ * @ref:	Per CPU reference counter for a private hash
+ */
+struct futex_mm_phash {
+	struct mutex			lock;
+	struct futex_private_hash	__rcu *hash;
+	struct futex_private_hash	*hash_new;
+	unsigned long			batches;
+	struct rcu_head			rcu;
+	atomic_long_t			atomic;
+	unsigned int			__percpu *ref;
+};
+#else  /* CONFIG_FUTEX_PRIVATE_HASH */
+struct futex_mm_phash { };
+#endif /* !CONFIG_FUTEX_PRIVATE_HASH */
+
+/**
+ * struct futex_mm_data - Futex related per MM data
+ * @phash:	Futex private hash related data
+ */
+struct futex_mm_data {
+	struct futex_mm_phash		phash;
+};
+#else  /* CONFIG_FUTEX */
 struct futex_sched_data { };
+struct futex_mm_data { };
 #endif /* !CONFIG_FUTEX */
 
 #endif /* _LINUX_FUTEX_TYPES_H */
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -20,6 +20,7 @@
 #include <linux/seqlock.h>
 #include <linux/percpu_counter.h>
 #include <linux/types.h>
+#include <linux/futex_types.h>
 #include <linux/rseq_types.h>
 #include <linux/bitmap.h>
 
@@ -1270,16 +1271,7 @@ struct mm_struct {
 		 */
 		seqcount_t mm_lock_seq;
 #endif
-#ifdef CONFIG_FUTEX_PRIVATE_HASH
-		struct mutex			futex_hash_lock;
-		struct futex_private_hash	__rcu *futex_phash;
-		struct futex_private_hash	*futex_phash_new;
-		/* futex-ref */
-		unsigned long			futex_batches;
-		struct rcu_head			futex_rcu;
-		atomic_long_t			futex_atomic;
-		unsigned int			__percpu *futex_ref;
-#endif
+		struct futex_mm_data	futex;
 
 		unsigned long hiwater_rss; /* High-watermark of RSS usage */
 		unsigned long hiwater_vm;  /* High-water virtual memory usage */
--- a/kernel/futex/core.c
+++ b/kernel/futex/core.c
@@ -188,13 +188,13 @@ static struct futex_hash_bucket *
 		return NULL;
 
 	if (!fph)
-		fph = rcu_dereference(key->private.mm->futex_phash);
+		fph = rcu_dereference(key->private.mm->futex.phash.hash);
 	if (!fph || !fph->hash_mask)
 		return NULL;
 
-	hash = jhash2((void *)&key->private.address,
-		      sizeof(key->private.address) / 4,
+	hash = jhash2((void *)&key->private.address, sizeof(key->private.address) / 4,
 		      key->both.offset);
+
 	return &fph->queues[hash & fph->hash_mask];
 }
 
@@ -233,18 +233,17 @@ static void futex_rehash_private(struct
 	}
 }
 
-static bool __futex_pivot_hash(struct mm_struct *mm,
-			       struct futex_private_hash *new)
+static bool __futex_pivot_hash(struct mm_struct *mm, struct futex_private_hash *new)
 {
+	struct futex_mm_phash *mmph = &mm->futex.phash;
 	struct futex_private_hash *fph;
 
-	WARN_ON_ONCE(mm->futex_phash_new);
+	WARN_ON_ONCE(mmph->hash_new);
 
-	fph = rcu_dereference_protected(mm->futex_phash,
-					lockdep_is_held(&mm->futex_hash_lock));
+	fph = rcu_dereference_protected(mmph->hash, lockdep_is_held(&mmph->lock));
 	if (fph) {
 		if (!futex_ref_is_dead(fph)) {
-			mm->futex_phash_new = new;
+			mmph->hash_new = new;
 			return false;
 		}
 
@@ -252,8 +251,8 @@ static bool __futex_pivot_hash(struct mm
 	}
 	new->state = FR_PERCPU;
 	scoped_guard(rcu) {
-		mm->futex_batches = get_state_synchronize_rcu();
-		rcu_assign_pointer(mm->futex_phash, new);
+		mmph->batches = get_state_synchronize_rcu();
+		rcu_assign_pointer(mmph->hash, new);
 	}
 	kvfree_rcu(fph, rcu);
 	return true;
@@ -261,12 +260,12 @@ static bool __futex_pivot_hash(struct mm
 
 static void futex_pivot_hash(struct mm_struct *mm)
 {
-	scoped_guard(mutex, &mm->futex_hash_lock) {
+	scoped_guard(mutex, &mm->futex.phash.lock) {
 		struct futex_private_hash *fph;
 
-		fph = mm->futex_phash_new;
+		fph = mm->futex.phash.hash_new;
 		if (fph) {
-			mm->futex_phash_new = NULL;
+			mm->futex.phash.hash_new = NULL;
 			__futex_pivot_hash(mm, fph);
 		}
 	}
@@ -289,7 +288,7 @@ struct futex_private_hash *futex_private
 	scoped_guard(rcu) {
 		struct futex_private_hash *fph;
 
-		fph = rcu_dereference(mm->futex_phash);
+		fph = rcu_dereference(mm->futex.phash.hash);
 		if (!fph)
 			return NULL;
 
@@ -412,8 +411,7 @@ static int futex_mpol(struct mm_struct *
  * private hash) is returned if existing. Otherwise a hash bucket from the
  * global hash is returned.
  */
-static struct futex_hash_bucket *
-__futex_hash(union futex_key *key, struct futex_private_hash *fph)
+static struct futex_hash_bucket *__futex_hash(union futex_key *key, struct futex_private_hash *fph)
 {
 	int node = key->both.node;
 	u32 hash;
@@ -426,8 +424,7 @@ static struct futex_hash_bucket *
 			return hb;
 	}
 
-	hash = jhash2((u32 *)key,
-		      offsetof(typeof(*key), both.offset) / sizeof(u32),
+	hash = jhash2((u32 *)key, offsetof(typeof(*key), both.offset) / sizeof(u32),
 		      key->both.offset);
 
 	if (node == FUTEX_NO_NODE) {
@@ -442,8 +439,7 @@ static struct futex_hash_bucket *
 		 */
 		node = (hash >> futex_hashshift) % nr_node_ids;
 		if (!node_possible(node)) {
-			node = find_next_bit_wrap(node_possible_map.bits,
-						  nr_node_ids, node);
+			node = find_next_bit_wrap(node_possible_map.bits, nr_node_ids, node);
 		}
 	}
 
@@ -460,9 +456,8 @@ static struct futex_hash_bucket *
  * Return: Initialized hrtimer_sleeper structure or NULL if no timeout
  *	   value given
  */
-struct hrtimer_sleeper *
-futex_setup_timer(ktime_t *time, struct hrtimer_sleeper *timeout,
-		  int flags, u64 range_ns)
+struct hrtimer_sleeper *futex_setup_timer(ktime_t *time, struct hrtimer_sleeper *timeout,
+					  int flags, u64 range_ns)
 {
 	if (!time)
 		return NULL;
@@ -1554,17 +1549,17 @@ static void __futex_ref_atomic_begin(str
 	 * otherwise it would be impossible for it to have reported success
 	 * from futex_ref_is_dead().
 	 */
-	WARN_ON_ONCE(atomic_long_read(&mm->futex_atomic) != 0);
+	WARN_ON_ONCE(atomic_long_read(&mm->futex.phash.atomic) != 0);
 
 	/*
 	 * Set the atomic to the bias value such that futex_ref_{get,put}()
 	 * will never observe 0. Will be fixed up in __futex_ref_atomic_end()
 	 * when folding in the percpu count.
 	 */
-	atomic_long_set(&mm->futex_atomic, LONG_MAX);
+	atomic_long_set(&mm->futex.phash.atomic, LONG_MAX);
 	smp_store_release(&fph->state, FR_ATOMIC);
 
-	call_rcu_hurry(&mm->futex_rcu, futex_ref_rcu);
+	call_rcu_hurry(&mm->futex.phash.rcu, futex_ref_rcu);
 }
 
 static void __futex_ref_atomic_end(struct futex_private_hash *fph)
@@ -1585,7 +1580,7 @@ static void __futex_ref_atomic_end(struc
 	 * Therefore the per-cpu counter is now stable, sum and reset.
 	 */
 	for_each_possible_cpu(cpu) {
-		unsigned int *ptr = per_cpu_ptr(mm->futex_ref, cpu);
+		unsigned int *ptr = per_cpu_ptr(mm->futex.phash.ref, cpu);
 		count += *ptr;
 		*ptr = 0;
 	}
@@ -1593,7 +1588,7 @@ static void __futex_ref_atomic_end(struc
 	/*
 	 * Re-init for the next cycle.
 	 */
-	this_cpu_inc(*mm->futex_ref); /* 0 -> 1 */
+	this_cpu_inc(*mm->futex.phash.ref); /* 0 -> 1 */
 
 	/*
 	 * Add actual count, subtract bias and initial refcount.
@@ -1601,7 +1596,7 @@ static void __futex_ref_atomic_end(struc
 	 * The moment this atomic operation happens, futex_ref_is_dead() can
 	 * become true.
 	 */
-	ret = atomic_long_add_return(count - LONG_MAX - 1, &mm->futex_atomic);
+	ret = atomic_long_add_return(count - LONG_MAX - 1, &mm->futex.phash.atomic);
 	if (!ret)
 		wake_up_var(mm);
 
@@ -1611,8 +1606,8 @@ static void __futex_ref_atomic_end(struc
 
 static void futex_ref_rcu(struct rcu_head *head)
 {
-	struct mm_struct *mm = container_of(head, struct mm_struct, futex_rcu);
-	struct futex_private_hash *fph = rcu_dereference_raw(mm->futex_phash);
+	struct mm_struct *mm = container_of(head, struct mm_struct, futex.phash.rcu);
+	struct futex_private_hash *fph = rcu_dereference_raw(mm->futex.phash.hash);
 
 	if (fph->state == FR_PERCPU) {
 		/*
@@ -1641,7 +1636,7 @@ static void futex_ref_drop(struct futex_
 	/*
 	 * Can only transition the current fph;
 	 */
-	WARN_ON_ONCE(rcu_dereference_raw(mm->futex_phash) != fph);
+	WARN_ON_ONCE(rcu_dereference_raw(mm->futex.phash.hash) != fph);
 	/*
 	 * We enqueue at least one RCU callback. Ensure mm stays if the task
 	 * exits before the transition is completed.
@@ -1652,9 +1647,9 @@ static void futex_ref_drop(struct futex_
 	 * In order to avoid the following scenario:
 	 *
 	 * futex_hash()			__futex_pivot_hash()
-	 *   guard(rcu);		  guard(mm->futex_hash_lock);
-	 *   fph = mm->futex_phash;
-	 *				  rcu_assign_pointer(&mm->futex_phash, new);
+	 *   guard(rcu);		  guard(mm->futex.phash.lock);
+	 *   fph = mm->futex.phash.hash;
+	 *				  rcu_assign_pointer(&mm->futex.phash.hash, new);
 	 *				futex_hash_allocate()
 	 *				  futex_ref_drop()
 	 *				    fph->state = FR_ATOMIC;
@@ -1669,7 +1664,7 @@ static void futex_ref_drop(struct futex_
 	 * There must be at least one full grace-period between publishing a
 	 * new fph and trying to replace it.
 	 */
-	if (poll_state_synchronize_rcu(mm->futex_batches)) {
+	if (poll_state_synchronize_rcu(mm->futex.phash.batches)) {
 		/*
 		 * There was a grace-period, we can begin now.
 		 */
@@ -1677,7 +1672,7 @@ static void futex_ref_drop(struct futex_
 		return;
 	}
 
-	call_rcu_hurry(&mm->futex_rcu, futex_ref_rcu);
+	call_rcu_hurry(&mm->futex.phash.rcu, futex_ref_rcu);
 }
 
 static bool futex_ref_get(struct futex_private_hash *fph)
@@ -1687,11 +1682,11 @@ static bool futex_ref_get(struct futex_p
 	guard(preempt)();
 
 	if (READ_ONCE(fph->state) == FR_PERCPU) {
-		__this_cpu_inc(*mm->futex_ref);
+		__this_cpu_inc(*mm->futex.phash.ref);
 		return true;
 	}
 
-	return atomic_long_inc_not_zero(&mm->futex_atomic);
+	return atomic_long_inc_not_zero(&mm->futex.phash.atomic);
 }
 
 static bool futex_ref_put(struct futex_private_hash *fph)
@@ -1701,11 +1696,11 @@ static bool futex_ref_put(struct futex_p
 	guard(preempt)();
 
 	if (READ_ONCE(fph->state) == FR_PERCPU) {
-		__this_cpu_dec(*mm->futex_ref);
+		__this_cpu_dec(*mm->futex.phash.ref);
 		return false;
 	}
 
-	return atomic_long_dec_and_test(&mm->futex_atomic);
+	return atomic_long_dec_and_test(&mm->futex.phash.atomic);
 }
 
 static bool futex_ref_is_dead(struct futex_private_hash *fph)
@@ -1717,27 +1712,23 @@ static bool futex_ref_is_dead(struct fut
 	if (smp_load_acquire(&fph->state) == FR_PERCPU)
 		return false;
 
-	return atomic_long_read(&mm->futex_atomic) == 0;
+	return atomic_long_read(&mm->futex.phash.atomic) == 0;
 }
 
 void futex_mm_init(struct mm_struct *mm)
 {
-	mutex_init(&mm->futex_hash_lock);
-	RCU_INIT_POINTER(mm->futex_phash, NULL);
-	mm->futex_phash_new = NULL;
-	/* futex-ref */
-	mm->futex_ref = NULL;
-	atomic_long_set(&mm->futex_atomic, 0);
-	mm->futex_batches = get_state_synchronize_rcu();
+	memset(&mm->futex, 0, sizeof(mm->futex));
+	mutex_init(&mm->futex.phash.lock);
+	mm->futex.phash.batches = get_state_synchronize_rcu();
 }
 
 void futex_hash_free(struct mm_struct *mm)
 {
 	struct futex_private_hash *fph;
 
-	free_percpu(mm->futex_ref);
-	kvfree(mm->futex_phash_new);
-	fph = rcu_dereference_raw(mm->futex_phash);
+	free_percpu(mm->futex.phash.ref);
+	kvfree(mm->futex.phash.hash_new);
+	fph = rcu_dereference_raw(mm->futex.phash.hash);
 	if (fph)
 		kvfree(fph);
 }
@@ -1748,10 +1739,10 @@ static bool futex_pivot_pending(struct m
 
 	guard(rcu)();
 
-	if (!mm->futex_phash_new)
+	if (!mm->futex.phash.hash_new)
 		return true;
 
-	fph = rcu_dereference(mm->futex_phash);
+	fph = rcu_dereference(mm->futex.phash.hash);
 	return futex_ref_is_dead(fph);
 }
 
@@ -1793,7 +1784,7 @@ static int futex_hash_allocate(unsigned
 	 * Once we've disabled the global hash there is no way back.
 	 */
 	scoped_guard(rcu) {
-		fph = rcu_dereference(mm->futex_phash);
+		fph = rcu_dereference(mm->futex.phash.hash);
 		if (fph && !fph->hash_mask) {
 			if (custom)
 				return -EBUSY;
@@ -1801,15 +1792,15 @@ static int futex_hash_allocate(unsigned
 		}
 	}
 
-	if (!mm->futex_ref) {
+	if (!mm->futex.phash.ref) {
 		/*
 		 * This will always be allocated by the first thread and
 		 * therefore requires no locking.
 		 */
-		mm->futex_ref = alloc_percpu(unsigned int);
-		if (!mm->futex_ref)
+		mm->futex.phash.ref = alloc_percpu(unsigned int);
+		if (!mm->futex.phash.ref)
 			return -ENOMEM;
-		this_cpu_inc(*mm->futex_ref); /* 0 -> 1 */
+		this_cpu_inc(*mm->futex.phash.ref); /* 0 -> 1 */
 	}
 
 	fph = kvzalloc(struct_size(fph, queues, hash_slots),
@@ -1832,14 +1823,14 @@ static int futex_hash_allocate(unsigned
 		wait_var_event(mm, futex_pivot_pending(mm));
 	}
 
-	scoped_guard(mutex, &mm->futex_hash_lock) {
+	scoped_guard(mutex, &mm->futex.phash.lock) {
 		struct futex_private_hash *free __free(kvfree) = NULL;
 		struct futex_private_hash *cur, *new;
 
-		cur = rcu_dereference_protected(mm->futex_phash,
-						lockdep_is_held(&mm->futex_hash_lock));
-		new = mm->futex_phash_new;
-		mm->futex_phash_new = NULL;
+		cur = rcu_dereference_protected(mm->futex.phash.hash,
+						lockdep_is_held(&mm->futex.phash.lock));
+		new = mm->futex.phash.hash_new;
+		mm->futex.phash.hash_new = NULL;
 
 		if (fph) {
 			if (cur && !cur->hash_mask) {
@@ -1849,7 +1840,7 @@ static int futex_hash_allocate(unsigned
 				 * the second one returns here.
 				 */
 				free = fph;
-				mm->futex_phash_new = new;
+				mm->futex.phash.hash_new = new;
 				return -EBUSY;
 			}
 			if (cur && !new) {
@@ -1879,7 +1870,7 @@ static int futex_hash_allocate(unsigned
 
 		if (new) {
 			/*
-			 * Will set mm->futex_phash_new on failure;
+			 * Will set mm->futex.phash.hash_new on failure;
 			 * futex_private_hash_get() will try again.
 			 */
 			if (!__futex_pivot_hash(mm, new) && custom)
@@ -1898,11 +1889,9 @@ int futex_hash_allocate_default(void)
 		return 0;
 
 	scoped_guard(rcu) {
-		threads = min_t(unsigned int,
-				get_nr_threads(current),
-				num_online_cpus());
+		threads = min_t(unsigned int, get_nr_threads(current), num_online_cpus());
 
-		fph = rcu_dereference(current->mm->futex_phash);
+		fph = rcu_dereference(current->mm->futex.phash.hash);
 		if (fph) {
 			if (fph->custom)
 				return 0;
@@ -1929,7 +1918,7 @@ static int futex_hash_get_slots(void)
 	struct futex_private_hash *fph;
 
 	guard(rcu)();
-	fph = rcu_dereference(current->mm->futex_phash);
+	fph = rcu_dereference(current->mm->futex.phash.hash);
 	if (fph && fph->hash_mask)
 		return fph->hash_mask + 1;
 	return 0;



* [patch V5 05/16] futex: Provide UABI defines for robust list entry modifiers
  2026-06-02  9:09 [patch V5 00/16] futex: Address the robust futex unlock race for real Thomas Gleixner
                   ` (3 preceding siblings ...)
  2026-06-02  9:09 ` [patch V5 04/16] futex: Move futex related mm_struct data into a struct Thomas Gleixner
@ 2026-06-02  9:09 ` Thomas Gleixner
  2026-06-03 14:25   ` [tip: locking/core] " tip-bot2 for Thomas Gleixner
  2026-06-02  9:09 ` [patch V5 06/16] uaccess: Provide unsafe_atomic_store_release_user() Thomas Gleixner
                   ` (10 subsequent siblings)
  15 siblings, 1 reply; 43+ messages in thread
From: Thomas Gleixner @ 2026-06-02  9:09 UTC (permalink / raw)
  To: LKML
  Cc: Mathieu Desnoyers, André Almeida, Sebastian Andrzej Siewior,
	Carlos O'Donell, Peter Zijlstra, Florian Weimer, Rich Felker,
	Torvald Riegel, Darren Hart, Ingo Molnar, Davidlohr Bueso,
	Arnd Bergmann, Liam R . Howlett, Uros Bizjak,
	Thomas Weißschuh, Mark Brown, Richard Weinberger

The marker for PI futexes in the robust list is a hardcoded 0x1 which lacks
any sensible form of documentation.

Provide proper defines for the bit and the mask and fix up the usage
sites. Thereby convert the boolean pi argument into a modifier argument,
which allows new modifier bits to be trivially added and conveyed.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: André Almeida <andrealmeid@igalia.com>
---
V2: Explain the code shuffling - Andre
---
 include/uapi/linux/futex.h |    4 +++
 kernel/futex/core.c        |   53 +++++++++++++++++++++------------------------
 2 files changed, 29 insertions(+), 28 deletions(-)
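
For illustration only, a minimal user space sketch of how a modifier is
carried in the low bits of a robust list entry pointer. The two defines are
the ones added by this patch. The demo_* helpers are hypothetical and exist
neither in the kernel nor in any libc; they merely mirror the masking which
fetch_robust_entry() does:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values as added to include/uapi/linux/futex.h by this patch */
#define FUTEX_ROBUST_MOD_PI		(0x1UL)
#define FUTEX_ROBUST_MOD_MASK		(FUTEX_ROBUST_MOD_PI)

/* Hypothetical helper: combine an aligned pointer with modifier bits */
uintptr_t demo_encode_entry(void *entry, unsigned int mod)
{
	/* The low bits must be free, which requires an aligned entry */
	assert(((uintptr_t)entry & FUTEX_ROBUST_MOD_MASK) == 0);
	return (uintptr_t)entry | (mod & FUTEX_ROBUST_MOD_MASK);
}

/* Hypothetical helper: split the raw value the way the kernel side does */
void demo_decode_entry(uintptr_t uentry, void **entry, unsigned int *mod)
{
	*entry = (void *)(uentry & ~(uintptr_t)FUTEX_ROBUST_MOD_MASK);
	*mod = (unsigned int)(uentry & FUTEX_ROBUST_MOD_MASK);
}

/* Same test as the one this patch adds to handle_futex_death() */
bool demo_is_pi(unsigned int mod)
{
	return !!(mod & FUTEX_ROBUST_MOD_PI);
}
```

Because decoding goes through FUTEX_ROBUST_MOD_MASK, a further modifier bit
would only have to be added to the mask to be stripped from the pointer and
handed on to the consumer.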

--- a/include/uapi/linux/futex.h
+++ b/include/uapi/linux/futex.h
@@ -177,6 +177,10 @@ struct robust_list_head {
  */
 #define ROBUST_LIST_LIMIT	2048
 
+/* Modifiers for robust_list_head::list_op_pending */
+#define FUTEX_ROBUST_MOD_PI		(0x1UL)
+#define FUTEX_ROBUST_MOD_MASK		(FUTEX_ROBUST_MOD_PI)
+
 /*
  * bitset with all bits set for the FUTEX_xxx_BITSET OPs to request a
  * match of any bit.
--- a/kernel/futex/core.c
+++ b/kernel/futex/core.c
@@ -1008,8 +1008,9 @@ void futex_unqueue_pi(struct futex_q *q)
  * dying task, and do notification if so:
  */
 static int handle_futex_death(u32 __user *uaddr, struct task_struct *curr,
-			      bool pi, bool pending_op)
+			      unsigned int mod, bool pending_op)
 {
+	bool pi = !!(mod & FUTEX_ROBUST_MOD_PI);
 	u32 uval, nval, mval;
 	pid_t owner;
 	int err;
@@ -1127,21 +1128,21 @@ static int handle_futex_death(u32 __user
  */
 static inline int fetch_robust_entry(struct robust_list __user **entry,
 				     struct robust_list __user * __user *head,
-				     unsigned int *pi)
+				     unsigned int *mod)
 {
 	unsigned long uentry;
 
 	if (get_user(uentry, (unsigned long __user *)head))
 		return -EFAULT;
 
-	*entry = (void __user *)(uentry & ~1UL);
-	*pi = uentry & 1;
+	*entry = (void __user *)(uentry & ~FUTEX_ROBUST_MOD_MASK);
+	*mod = uentry & FUTEX_ROBUST_MOD_MASK;
 
 	return 0;
 }
 
 /*
- * Walk curr->robust_list (very carefully, it's a userspace list!)
+ * Walk curr->futex.robust_list (very carefully, it's a userspace list!)
  * and mark any locks found there dead, and notify any waiters.
  *
  * We silently return on any sign of list-walking problem.
@@ -1149,9 +1150,8 @@ static inline int fetch_robust_entry(str
 static void exit_robust_list(struct task_struct *curr)
 {
 	struct robust_list_head __user *head = curr->futex.robust_list;
+	unsigned int limit = ROBUST_LIST_LIMIT, cur_mod, next_mod, pend_mod;
 	struct robust_list __user *entry, *next_entry, *pending;
-	unsigned int limit = ROBUST_LIST_LIMIT, pi, pip;
-	unsigned int next_pi;
 	unsigned long futex_offset;
 	int rc;
 
@@ -1159,7 +1159,7 @@ static void exit_robust_list(struct task
 	 * Fetch the list head (which was registered earlier, via
 	 * sys_set_robust_list()):
 	 */
-	if (fetch_robust_entry(&entry, &head->list.next, &pi))
+	if (fetch_robust_entry(&entry, &head->list.next, &cur_mod))
 		return;
 	/*
 	 * Fetch the relative futex offset:
@@ -1170,7 +1170,7 @@ static void exit_robust_list(struct task
 	 * Fetch any possibly pending lock-add first, and handle it
 	 * if it exists:
 	 */
-	if (fetch_robust_entry(&pending, &head->list_op_pending, &pip))
+	if (fetch_robust_entry(&pending, &head->list_op_pending, &pend_mod))
 		return;
 
 	next_entry = NULL;	/* avoid warning with gcc */
@@ -1179,20 +1179,20 @@ static void exit_robust_list(struct task
 		 * Fetch the next entry in the list before calling
 		 * handle_futex_death:
 		 */
-		rc = fetch_robust_entry(&next_entry, &entry->next, &next_pi);
+		rc = fetch_robust_entry(&next_entry, &entry->next, &next_mod);
 		/*
 		 * A pending lock might already be on the list, so
 		 * don't process it twice:
 		 */
 		if (entry != pending) {
 			if (handle_futex_death((void __user *)entry + futex_offset,
-						curr, pi, HANDLE_DEATH_LIST))
+						curr, cur_mod, HANDLE_DEATH_LIST))
 				return;
 		}
 		if (rc)
 			return;
 		entry = next_entry;
-		pi = next_pi;
+		cur_mod = next_mod;
 		/*
 		 * Avoid excessively long or circular lists:
 		 */
@@ -1204,7 +1204,7 @@ static void exit_robust_list(struct task
 
 	if (pending) {
 		handle_futex_death((void __user *)pending + futex_offset,
-				   curr, pip, HANDLE_DEATH_PENDING);
+				   curr, pend_mod, HANDLE_DEATH_PENDING);
 	}
 }
 
@@ -1223,29 +1223,28 @@ static void __user *futex_uaddr(struct r
  */
 static inline int
 compat_fetch_robust_entry(compat_uptr_t *uentry, struct robust_list __user **entry,
-		   compat_uptr_t __user *head, unsigned int *pi)
+		   compat_uptr_t __user *head, unsigned int *pflags)
 {
 	if (get_user(*uentry, head))
 		return -EFAULT;
 
-	*entry = compat_ptr((*uentry) & ~1);
-	*pi = (unsigned int)(*uentry) & 1;
+	*entry = compat_ptr((*uentry) & ~FUTEX_ROBUST_MOD_MASK);
+	*pflags = (unsigned int)(*uentry) & FUTEX_ROBUST_MOD_MASK;
 
 	return 0;
 }
 
 /*
- * Walk curr->robust_list (very carefully, it's a userspace list!)
+ * Walk curr->futex.robust_list (very carefully, it's a userspace list!)
  * and mark any locks found there dead, and notify any waiters.
  *
  * We silently return on any sign of list-walking problem.
  */
 static void compat_exit_robust_list(struct task_struct *curr)
 {
 	struct compat_robust_list_head __user *head = curr->futex.compat_robust_list;
+	unsigned int limit = ROBUST_LIST_LIMIT, cur_mod, next_mod, pend_mod;
 	struct robust_list __user *entry, *next_entry, *pending;
-	unsigned int limit = ROBUST_LIST_LIMIT, pi, pip;
-	unsigned int next_pi;
 	compat_uptr_t uentry, next_uentry, upending;
 	compat_long_t futex_offset;
 	int rc;
@@ -1254,7 +1253,7 @@ static void compat_exit_robust_list(stru
 	 * Fetch the list head (which was registered earlier, via
 	 * sys_set_robust_list()):
 	 */
-	if (compat_fetch_robust_entry(&uentry, &entry, &head->list.next, &pi))
+	if (compat_fetch_robust_entry(&uentry, &entry, &head->list.next, &cur_mod))
 		return;
 	/*
 	 * Fetch the relative futex offset:
@@ -1265,8 +1264,7 @@ static void compat_exit_robust_list(stru
 	 * Fetch any possibly pending lock-add first, and handle it
 	 * if it exists:
 	 */
-	if (compat_fetch_robust_entry(&upending, &pending,
-			       &head->list_op_pending, &pip))
+	if (compat_fetch_robust_entry(&upending, &pending, &head->list_op_pending, &pend_mod))
 		return;
 
 	next_entry = NULL;	/* avoid warning with gcc */
@@ -1276,7 +1274,7 @@ static void compat_exit_robust_list(stru
 		 * handle_futex_death:
 		 */
 		rc = compat_fetch_robust_entry(&next_uentry, &next_entry,
-			(compat_uptr_t __user *)&entry->next, &next_pi);
+			(compat_uptr_t __user *)&entry->next, &next_mod);
 		/*
 		 * A pending lock might already be on the list, so
 		 * dont process it twice:
@@ -1284,15 +1282,14 @@ static void compat_exit_robust_list(stru
 		if (entry != pending) {
 			void __user *uaddr = futex_uaddr(entry, futex_offset);
 
-			if (handle_futex_death(uaddr, curr, pi,
-					       HANDLE_DEATH_LIST))
+			if (handle_futex_death(uaddr, curr, cur_mod, HANDLE_DEATH_LIST))
 				return;
 		}
 		if (rc)
 			return;
 		uentry = next_uentry;
 		entry = next_entry;
-		pi = next_pi;
+		cur_mod = next_mod;
 		/*
 		 * Avoid excessively long or circular lists:
 		 */
@@ -1304,7 +1301,7 @@ static void compat_exit_robust_list(stru
 	if (pending) {
 		void __user *uaddr = futex_uaddr(pending, futex_offset);
 
-		handle_futex_death(uaddr, curr, pip, HANDLE_DEATH_PENDING);
+		handle_futex_death(uaddr, curr, pend_mod, HANDLE_DEATH_PENDING);
 	}
 }
 #endif



* [patch V5 06/16] uaccess: Provide unsafe_atomic_store_release_user()
  2026-06-02  9:09 [patch V5 00/16] futex: Address the robust futex unlock race for real Thomas Gleixner
                   ` (4 preceding siblings ...)
  2026-06-02  9:09 ` [patch V5 05/16] futex: Provide UABI defines for robust list entry modifiers Thomas Gleixner
@ 2026-06-02  9:09 ` Thomas Gleixner
  2026-06-03 14:24   ` [tip: locking/core] " tip-bot2 for Thomas Gleixner
  2026-06-02  9:09 ` [patch V5 07/16] x86: Select ARCH_MEMORY_ORDER_TSO Thomas Gleixner
                   ` (9 subsequent siblings)
  15 siblings, 1 reply; 43+ messages in thread
From: Thomas Gleixner @ 2026-06-02  9:09 UTC (permalink / raw)
  To: LKML
  Cc: Mathieu Desnoyers, André Almeida, Sebastian Andrzej Siewior,
	Carlos O'Donell, Peter Zijlstra, Florian Weimer, Rich Felker,
	Torvald Riegel, Darren Hart, Ingo Molnar, Davidlohr Bueso,
	Arnd Bergmann, Liam R . Howlett, Uros Bizjak,
	Thomas Weißschuh, Mark Brown, Richard Weinberger

The upcoming support for unlocking robust futexes in the kernel requires
store release semantics. Syscalls do not imply memory ordering on all
architectures so the unlock operation requires a barrier.

This barrier can be avoided when stores imply release like on x86.

Provide a generic version with a smp_mb() before the unsafe_put_user(),
which can be overridden by architectures.

Also provide an ARCH_MEMORY_ORDER_TSO Kconfig option, which can be selected
by architectures with Total Store Order (TSO), where a store implies release,
so that the smp_mb() in the generic implementation can be avoided.

If that option is set, a barrier() is used instead of smp_mb(). The barrier()
is not required for the use case at hand, but it prevents the compiler from
reordering and thereby makes the helper future proof for other usage.
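
For illustration only, the barrier selection described above can be modelled
in user space with C11 atomics. This is a sketch and not the kernel macro:
MODEL_ARCH_TSO, model_store_release() and model_unlock() are hypothetical
names, atomic_thread_fence() stands in for smp_mb() and atomic_signal_fence()
stands in for the compiler-only barrier().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for CONFIG_ARCH_MEMORY_ORDER_TSO, not a kernel symbol */
#ifndef MODEL_ARCH_TSO
# define MODEL_ARCH_TSO 0
#endif

/* User-space model of the generic helper: order earlier accesses, then store */
static inline void model_store_release(_Atomic uint32_t *ptr, uint32_t val)
{
	assert(ptr != NULL);

	if (!MODEL_ARCH_TSO)
		atomic_thread_fence(memory_order_seq_cst);	/* role of smp_mb() */
	else
		atomic_signal_fence(memory_order_seq_cst);	/* role of barrier() */

	/* Plain store, playing the role of unsafe_put_user() */
	atomic_store_explicit(ptr, val, memory_order_relaxed);
}

/* Write protected data, then drop the lock with release ordering */
static inline uint32_t model_unlock(_Atomic uint32_t *lock, uint32_t *data)
{
	*data = 42;			/* critical section write */
	model_store_release(lock, 0);	/* must not move before the write */
	return atomic_load_explicit(lock, memory_order_acquire);
}
```

Building the sketch with -DMODEL_ARCH_TSO=1 selects the compiler-only fence,
which mirrors what selecting the Kconfig option does for the generic helper.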

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Reviewed-by: André Almeida <andrealmeid@igalia.com>
---
V4: Rename it really ....
    Add a barrier when TSO=y
V3: Rename to CONFIG_ARCH_MEMORY_ORDER_TSO - Peter
V2: New patch
---
 arch/Kconfig            |    4 ++++
 include/linux/uaccess.h |   11 +++++++++++
 2 files changed, 15 insertions(+)

--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -403,6 +403,10 @@ config ARCH_32BIT_OFF_T
 config ARCH_32BIT_USTAT_F_TINODE
 	bool
 
+# Selected by architectures with Total Store Order (TSO)
+config ARCH_MEMORY_ORDER_TSO
+	bool
+
 config HAVE_ASM_MODVERSIONS
 	bool
 	help
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -649,6 +649,17 @@ static inline void user_access_restore(u
 #define user_read_access_end user_access_end
 #endif
 
+#ifndef unsafe_atomic_store_release_user
+# define unsafe_atomic_store_release_user(val, uptr, elbl)	\
+	do {							\
+		if (!IS_ENABLED(CONFIG_ARCH_MEMORY_ORDER_TSO))	\
+			smp_mb();				\
+		else						\
+			barrier();				\
+		unsafe_put_user(val, uptr, elbl);		\
+	} while (0)
+#endif
+
 /* Define RW variant so the below _mode macro expansion works */
 #define masked_user_rw_access_begin(u)	masked_user_access_begin(u)
 #define user_rw_access_begin(u, s)	user_access_begin(u, s)



* [patch V5 07/16] x86: Select ARCH_MEMORY_ORDER_TSO
  2026-06-02  9:09 [patch V5 00/16] futex: Address the robust futex unlock race for real Thomas Gleixner
                   ` (5 preceding siblings ...)
  2026-06-02  9:09 ` [patch V5 06/16] uaccess: Provide unsafe_atomic_store_release_user() Thomas Gleixner
@ 2026-06-02  9:09 ` Thomas Gleixner
  2026-06-03 14:24   ` [tip: locking/core] " tip-bot2 for Thomas Gleixner
  2026-06-02  9:09 ` [patch V5 08/16] futex: Cleanup UAPI defines Thomas Gleixner
                   ` (8 subsequent siblings)
  15 siblings, 1 reply; 43+ messages in thread
From: Thomas Gleixner @ 2026-06-02  9:09 UTC (permalink / raw)
  To: LKML
  Cc: Mathieu Desnoyers, André Almeida, Sebastian Andrzej Siewior,
	Carlos O'Donell, Peter Zijlstra, Florian Weimer, Rich Felker,
	Torvald Riegel, Darren Hart, Ingo Molnar, Davidlohr Bueso,
	Arnd Bergmann, Liam R . Howlett, Uros Bizjak,
	Thomas Weißschuh, Mark Brown, Richard Weinberger

The generic unsafe_atomic_store_release_user() implementation does:

    if (!IS_ENABLED(CONFIG_ARCH_MEMORY_ORDER_TSO))
        smp_mb();
    else
        barrier();
    unsafe_put_user();

As x86 implements Total Store Order (TSO), which means that stores imply
release, select ARCH_MEMORY_ORDER_TSO to avoid the unnecessary smp_mb().

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Reviewed-by: André Almeida <andrealmeid@igalia.com>
---
V4: Rename it correctly
V3: Rename to TOS - Peter
V2: New patch
---
 arch/x86/Kconfig |    1 +
 1 file changed, 1 insertion(+)

--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -113,6 +113,7 @@ config X86
 	select ARCH_HAS_ZONE_DMA_SET if EXPERT
 	select ARCH_HAVE_NMI_SAFE_CMPXCHG
 	select ARCH_HAVE_EXTRA_ELF_NOTES
+	select ARCH_MEMORY_ORDER_TSO
 	select ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
 	select ARCH_MIGHT_HAVE_ACPI_PDC		if ACPI
 	select ARCH_MIGHT_HAVE_PC_PARPORT



* [patch V5 08/16] futex: Cleanup UAPI defines
  2026-06-02  9:09 [patch V5 00/16] futex: Address the robust futex unlock race for real Thomas Gleixner
                   ` (6 preceding siblings ...)
  2026-06-02  9:09 ` [patch V5 07/16] x86: Select ARCH_MEMORY_ORDER_TSO Thomas Gleixner
@ 2026-06-02  9:09 ` Thomas Gleixner
  2026-06-03 14:24   ` [tip: locking/core] " tip-bot2 for Thomas Gleixner
  2026-06-02  9:09 ` [patch V5 09/16] futex: Add support for unlocking robust futexes Thomas Gleixner
                   ` (7 subsequent siblings)
  15 siblings, 1 reply; 43+ messages in thread
From: Thomas Gleixner @ 2026-06-02  9:09 UTC (permalink / raw)
  To: LKML
  Cc: Mathieu Desnoyers, André Almeida, Sebastian Andrzej Siewior,
	Carlos O'Donell, Peter Zijlstra, Florian Weimer, Rich Felker,
	Torvald Riegel, Darren Hart, Ingo Molnar, Davidlohr Bueso,
	Arnd Bergmann, Liam R . Howlett, Uros Bizjak,
	Thomas Weißschuh, Mark Brown, Richard Weinberger

Make the operand defines tabular for readability's sake.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Reviewed-by: André Almeida <andrealmeid@igalia.com>
---
V2: New patch
---
 include/uapi/linux/futex.h |   27 +++++++++++++--------------
 1 file changed, 13 insertions(+), 14 deletions(-)

--- a/include/uapi/linux/futex.h
+++ b/include/uapi/linux/futex.h
@@ -25,23 +25,22 @@
 
 #define FUTEX_PRIVATE_FLAG	128
 #define FUTEX_CLOCK_REALTIME	256
-#define FUTEX_CMD_MASK		~(FUTEX_PRIVATE_FLAG | FUTEX_CLOCK_REALTIME)
 
-#define FUTEX_WAIT_PRIVATE	(FUTEX_WAIT | FUTEX_PRIVATE_FLAG)
-#define FUTEX_WAKE_PRIVATE	(FUTEX_WAKE | FUTEX_PRIVATE_FLAG)
-#define FUTEX_REQUEUE_PRIVATE	(FUTEX_REQUEUE | FUTEX_PRIVATE_FLAG)
-#define FUTEX_CMP_REQUEUE_PRIVATE (FUTEX_CMP_REQUEUE | FUTEX_PRIVATE_FLAG)
-#define FUTEX_WAKE_OP_PRIVATE	(FUTEX_WAKE_OP | FUTEX_PRIVATE_FLAG)
-#define FUTEX_LOCK_PI_PRIVATE	(FUTEX_LOCK_PI | FUTEX_PRIVATE_FLAG)
-#define FUTEX_LOCK_PI2_PRIVATE	(FUTEX_LOCK_PI2 | FUTEX_PRIVATE_FLAG)
-#define FUTEX_UNLOCK_PI_PRIVATE	(FUTEX_UNLOCK_PI | FUTEX_PRIVATE_FLAG)
-#define FUTEX_TRYLOCK_PI_PRIVATE (FUTEX_TRYLOCK_PI | FUTEX_PRIVATE_FLAG)
+#define FUTEX_CMD_MASK			~(FUTEX_PRIVATE_FLAG | FUTEX_CLOCK_REALTIME)
+
+#define FUTEX_WAIT_PRIVATE		(FUTEX_WAIT | FUTEX_PRIVATE_FLAG)
+#define FUTEX_WAKE_PRIVATE		(FUTEX_WAKE | FUTEX_PRIVATE_FLAG)
+#define FUTEX_REQUEUE_PRIVATE		(FUTEX_REQUEUE | FUTEX_PRIVATE_FLAG)
+#define FUTEX_CMP_REQUEUE_PRIVATE	(FUTEX_CMP_REQUEUE | FUTEX_PRIVATE_FLAG)
+#define FUTEX_WAKE_OP_PRIVATE		(FUTEX_WAKE_OP | FUTEX_PRIVATE_FLAG)
+#define FUTEX_LOCK_PI_PRIVATE		(FUTEX_LOCK_PI | FUTEX_PRIVATE_FLAG)
+#define FUTEX_LOCK_PI2_PRIVATE		(FUTEX_LOCK_PI2 | FUTEX_PRIVATE_FLAG)
+#define FUTEX_UNLOCK_PI_PRIVATE		(FUTEX_UNLOCK_PI | FUTEX_PRIVATE_FLAG)
+#define FUTEX_TRYLOCK_PI_PRIVATE	(FUTEX_TRYLOCK_PI | FUTEX_PRIVATE_FLAG)
 #define FUTEX_WAIT_BITSET_PRIVATE	(FUTEX_WAIT_BITSET | FUTEX_PRIVATE_FLAG)
 #define FUTEX_WAKE_BITSET_PRIVATE	(FUTEX_WAKE_BITSET | FUTEX_PRIVATE_FLAG)
-#define FUTEX_WAIT_REQUEUE_PI_PRIVATE	(FUTEX_WAIT_REQUEUE_PI | \
-					 FUTEX_PRIVATE_FLAG)
-#define FUTEX_CMP_REQUEUE_PI_PRIVATE	(FUTEX_CMP_REQUEUE_PI | \
-					 FUTEX_PRIVATE_FLAG)
+#define FUTEX_WAIT_REQUEUE_PI_PRIVATE	(FUTEX_WAIT_REQUEUE_PI | FUTEX_PRIVATE_FLAG)
+#define FUTEX_CMP_REQUEUE_PI_PRIVATE	(FUTEX_CMP_REQUEUE_PI | FUTEX_PRIVATE_FLAG)
 
 /*
  * Flags for futex2 syscalls.



* [patch V5 09/16] futex: Add support for unlocking robust futexes
  2026-06-02  9:09 [patch V5 00/16] futex: Address the robust futex unlock race for real Thomas Gleixner
                   ` (7 preceding siblings ...)
  2026-06-02  9:09 ` [patch V5 08/16] futex: Cleanup UAPI defines Thomas Gleixner
@ 2026-06-02  9:09 ` Thomas Gleixner
  2026-06-03  8:22   ` Peter Zijlstra
                     ` (2 more replies)
  2026-06-02  9:09 ` [patch V5 10/16] futex: Add robust futex unlock IP range Thomas Gleixner
                   ` (6 subsequent siblings)
  15 siblings, 3 replies; 43+ messages in thread
From: Thomas Gleixner @ 2026-06-02  9:09 UTC (permalink / raw)
  To: LKML
  Cc: Mathieu Desnoyers, André Almeida, Sebastian Andrzej Siewior,
	Carlos O'Donell, Peter Zijlstra, Florian Weimer, Rich Felker,
	Torvald Riegel, Darren Hart, Ingo Molnar, Davidlohr Bueso,
	Arnd Bergmann, Liam R . Howlett, Uros Bizjak,
	Thomas Weißschuh, Mark Brown, Richard Weinberger

Unlocking robust non-PI futexes happens in user space with the following
sequence:

  1)	robust_list_set_op_pending(mutex);
  2)	robust_list_remove(mutex);
	
  	lval = 0;
  3)	lval = atomic_xchg(lock, lval);
  4)	if (lval & WAITERS)
  5)		sys_futex(WAKE,....);
  6)	robust_list_clear_op_pending();

That opens a window between #3 and #6 where the mutex could be acquired by
some other task which observes that it is the last user and:

  A) unmaps the mutex memory
  B) maps a different file, which ends up covering the same address

When the original task exits before reaching #6 then the kernel robust list
handling observes the pending op entry and tries to fix up user space.

If the newly mapped data contains the TID of the exiting thread at the
address of the mutex/futex, the kernel will set the owner died bit in that
memory, thereby corrupting unrelated data.

PI futexes have a similar problem both for the non-contended user space
unlock and the in kernel unlock:

  1)	robust_list_set_op_pending(mutex);
  2)	robust_list_remove(mutex);
	
  	lval = gettid();
  3)	if (!atomic_try_cmpxchg(lock, lval, 0))
  4)		sys_futex(UNLOCK_PI,....);
  5)	robust_list_clear_op_pending();

Address the first part of the problem where the futexes have waiters and
need to enter the kernel anyway. Add a new FUTEX_UNLOCK_ROBUST flag, which
is valid for the sys_futex() FUTEX_UNLOCK_PI, FUTEX_WAKE, FUTEX_WAKE_BITSET
operations.

This deliberately omits FUTEX_WAKE_OP from this treatment as it's unclear
whether this is needed and there is no usage of it in glibc either to
investigate.

For the futex2 syscall family this needs to be implemented with a new
syscall.

The sys_futex() case [ab]uses the @uaddr2 argument to hand the pointer to
robust_list_head::list_op_pending into the kernel. This argument is only
evaluated when the FUTEX_UNLOCK_ROBUST bit is set and is therefore backward
compatible.

This is an explicit argument to avoid the lookup of the robust list pointer
and retrieving the pending op pointer from there. User space has the
pointer already available so it can just put it into the @uaddr2
argument. Aside from that, this allows the usage of multiple robust lists in
the future without any changes to the internal functions as they just operate
on the provided pointer.

This requires a second flag FUTEX_ROBUST_LIST32 which indicates that the
robust list pointer points to a u32 and not to a u64. This is required
for two reasons:

    1) sys_futex() has no compat variant

    2) The gaming emulators use both 64-bit and compat 32-bit robust
       lists in the same 64-bit application

As a consequence 32-bit applications have to set this flag unconditionally
so they can run on a 64-bit kernel in compat mode unmodified. 32-bit
kernels return an error code when the flag is not set. 64-bit kernels will
happily clear the full 64 bits if user space fails to set it.
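
As a rough illustration of the flag handling described above, the following
user-space sketch models the op validation and the choice of the store width
for the pending op pointer. The numeric values are copied from the UAPI hunk
of this patch and the existing futex header. model_op_valid(),
model_pop_width() and enum pop_width are hypothetical names which do not
exist in the kernel, and only the checks added by this patch are modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Values as in include/uapi/linux/futex.h plus the two flags added here */
#define FUTEX_WAKE		1
#define FUTEX_UNLOCK_PI		7
#define FUTEX_WAKE_BITSET	10
#define FUTEX_PRIVATE_FLAG	128
#define FUTEX_CLOCK_REALTIME	256
#define FUTEX_UNLOCK_ROBUST	512
#define FUTEX_ROBUST_LIST32	1024
#define FUTEX_CMD_MASK		~(FUTEX_PRIVATE_FLAG | FUTEX_CLOCK_REALTIME | \
				  FUTEX_UNLOCK_ROBUST | FUTEX_ROBUST_LIST32)

/* Hypothetical result type: how many bits of the pending op get cleared */
enum pop_width { POP_REJECT = 0, POP_32 = 32, POP_64 = 64 };

/* Models the new check in do_futex(): the flag is valid for three commands */
static bool model_op_valid(unsigned int op)
{
	unsigned int cmd = op & FUTEX_CMD_MASK;

	if (!(op & FUTEX_UNLOCK_ROBUST))
		return true;	/* Other validation is not modelled */

	return cmd == FUTEX_WAKE || cmd == FUTEX_WAKE_BITSET ||
	       cmd == FUTEX_UNLOCK_PI;
}

/* Models the size dispatch in futex_robust_list_clear_pending() */
static enum pop_width model_pop_width(bool kernel_64bit, unsigned int op)
{
	bool size32 = op & FUTEX_ROBUST_LIST32;

	assert(op & FUTEX_UNLOCK_ROBUST);

	if (!kernel_64bit && !size32)
		return POP_REJECT;	/* 32-bit kernel requires the flag */
	if (kernel_64bit && size32)
		return POP_32;		/* compat sized pointer */
	return kernel_64bit ? POP_64 : POP_32;
}
```

A 32-bit application which always sets FUTEX_ROBUST_LIST32 gets POP_32 on both
kernel flavours in this model, which is the reason for setting the flag
unconditionally.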

In case of FUTEX_UNLOCK_PI this clears the robust list pending op when the
unlock succeeded. In case of errors, the user space value is still locked
by the caller and therefore the above cannot happen.

In case of FUTEX_WAKE* this does the unlock of the futex in the kernel and
clears the robust list pending op when the unlock was successful. If not,
the user space value is still locked and user space has to deal with the
returned error. That means that the unlocking of non-PI robust futexes has
to use the same try_cmpxchg() unlock scheme as PI futexes.
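
A user-space unlock path following this scheme might look roughly like the
sketch below. It is an assumption-laden model, not glibc or kernel code:
struct model_robust_head is a cut-down stand-in for robust_list_head,
model_kernel_unlock_wake() is a hypothetical stub for the
sys_futex(FUTEX_WAKE | FUTEX_UNLOCK_ROBUST, ...) call, the robust list removal
and the wake-up are omitted, and the pending op is simplified to point at the
lock word itself.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define FUTEX_WAITERS	0x80000000u	/* As in the futex UAPI header */

/* Hypothetical stand-in for robust_list_head, reduced to the field used */
struct model_robust_head {
	uintptr_t list_op_pending;
};

/*
 * Hypothetical stub for the contended syscall path. It models the order
 * used in the kernel: unlock the futex first, then clear the pending op
 * which user space handed in via @uaddr2.
 */
static int model_kernel_unlock_wake(_Atomic uint32_t *lock, uintptr_t *pop)
{
	atomic_store_explicit(lock, 0, memory_order_release);
	*pop = 0;
	return 0;
}

static int model_robust_unlock(_Atomic uint32_t *lock,
			       struct model_robust_head *head, uint32_t tid)
{
	uint32_t expected = tid;

	assert(!(tid & FUTEX_WAITERS));

	/* Announce the pending operation (simplified, see lead-in) */
	head->list_op_pending = (uintptr_t)lock;

	/* Uncontended: the lock word holds just the TID of the owner */
	if (atomic_compare_exchange_strong_explicit(lock, &expected, 0,
						    memory_order_release,
						    memory_order_relaxed)) {
		/* This window is still racy without the later patches */
		head->list_op_pending = 0;
		return 0;
	}

	/* Contended: let the kernel unlock, clear the pending op and wake */
	return model_kernel_unlock_wake(lock, &head->list_op_pending);
}
```

The point of the try_cmpxchg() is that user space never writes 0 to a
contended lock word itself, so the unlock and the clearing of the pending op
both happen inside the kernel for that case.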

If the clearing of the pending list op fails (fault) then the kernel clears
the registered robust list pointer if it matches to prevent that exit()
will try to handle invalid data. That's a valid paranoid decision because
the robust list head usually sits in the TLS and if the TLS is no longer
accessible then the chance for fixing up the resulting mess is very close
to zero.

The problem of non-contended unlocks still exists and will be addressed
separately.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Reviewed-by: André Almeida <andrealmeid@igalia.com>
---
V4: Fix the uapi defines
V3: Expand changelog to explain @uaddr2 - Andre
V2: Use store release for unlock	- Andre, Peter
    Use a separate FLAG for 32bit lists	- Florian
    Add command defines
---
 include/uapi/linux/futex.h |   29 +++++++++++++++++++++++-
 io_uring/futex.c           |    2 -
 kernel/futex/core.c        |   53 +++++++++++++++++++++++++++++++++++++++++++--
 kernel/futex/futex.h       |   15 +++++++++++-
 kernel/futex/pi.c          |   15 +++++++++++-
 kernel/futex/syscalls.c    |   13 ++++++++---
 kernel/futex/waitwake.c    |   30 +++++++++++++++++++++++--
 7 files changed, 144 insertions(+), 13 deletions(-)

--- a/include/uapi/linux/futex.h
+++ b/include/uapi/linux/futex.h
@@ -25,8 +25,11 @@
 
 #define FUTEX_PRIVATE_FLAG	128
 #define FUTEX_CLOCK_REALTIME	256
+#define FUTEX_UNLOCK_ROBUST	512
+#define FUTEX_ROBUST_LIST32	1024
 
-#define FUTEX_CMD_MASK			~(FUTEX_PRIVATE_FLAG | FUTEX_CLOCK_REALTIME)
+#define FUTEX_CMD_MASK			~(FUTEX_PRIVATE_FLAG | FUTEX_CLOCK_REALTIME | \
+					  FUTEX_UNLOCK_ROBUST | FUTEX_ROBUST_LIST32)
 
 #define FUTEX_WAIT_PRIVATE		(FUTEX_WAIT | FUTEX_PRIVATE_FLAG)
 #define FUTEX_WAKE_PRIVATE		(FUTEX_WAKE | FUTEX_PRIVATE_FLAG)
@@ -43,6 +46,30 @@
 #define FUTEX_CMP_REQUEUE_PI_PRIVATE	(FUTEX_CMP_REQUEUE_PI | FUTEX_PRIVATE_FLAG)
 
 /*
+ * Operations to unlock a futex, clear the robust list pending op pointer and
+ * wake waiters.
+ */
+#define FUTEX_UNLOCK_PI_LIST64			(FUTEX_UNLOCK_PI | FUTEX_UNLOCK_ROBUST)
+#define FUTEX_UNLOCK_PI_LIST64_PRIVATE		(FUTEX_UNLOCK_PI_LIST64 | FUTEX_PRIVATE_FLAG)
+#define FUTEX_UNLOCK_PI_LIST32			(FUTEX_UNLOCK_PI | FUTEX_UNLOCK_ROBUST | \
+						 FUTEX_ROBUST_LIST32)
+#define FUTEX_UNLOCK_PI_LIST32_PRIVATE		(FUTEX_UNLOCK_PI_LIST32 | FUTEX_PRIVATE_FLAG)
+
+#define FUTEX_UNLOCK_WAKE_LIST64		(FUTEX_WAKE | FUTEX_UNLOCK_ROBUST)
+#define FUTEX_UNLOCK_WAKE_LIST64_PRIVATE	(FUTEX_UNLOCK_WAKE_LIST64 | FUTEX_PRIVATE_FLAG)
+
+#define FUTEX_UNLOCK_WAKE_LIST32		(FUTEX_WAKE | FUTEX_UNLOCK_ROBUST | \
+						 FUTEX_ROBUST_LIST32)
+#define FUTEX_UNLOCK_WAKE_LIST32_PRIVATE	(FUTEX_UNLOCK_WAKE_LIST32 | FUTEX_PRIVATE_FLAG)
+
+#define FUTEX_UNLOCK_BITSET_LIST64		(FUTEX_WAKE_BITSET | FUTEX_UNLOCK_ROBUST)
+#define FUTEX_UNLOCK_BITSET_LIST64_PRIVATE	(FUTEX_UNLOCK_BITSET_LIST64 | FUTEX_PRIVATE_FLAG)
+
+#define FUTEX_UNLOCK_BITSET_LIST32		(FUTEX_WAKE_BITSET | FUTEX_UNLOCK_ROBUST | \
+						 FUTEX_ROBUST_LIST32)
+#define FUTEX_UNLOCK_BITSET_LIST32_PRIVATE	(FUTEX_UNLOCK_BITSET_LIST32 | FUTEX_PRIVATE_FLAG)
+
+/*
  * Flags for futex2 syscalls.
  *
  * NOTE: these are not pure flags, they can also be seen as:
--- a/io_uring/futex.c
+++ b/io_uring/futex.c
@@ -327,7 +327,7 @@ int io_futex_wake(struct io_kiocb *req,
 	 * Strict flags - ensure that waking 0 futexes yields a 0 result.
 	 * See commit 43adf8449510 ("futex: FLAGS_STRICT") for details.
 	 */
-	ret = futex_wake(iof->uaddr, FLAGS_STRICT | iof->futex_flags,
+	ret = futex_wake(iof->uaddr, FLAGS_STRICT | iof->futex_flags, NULL,
 			 iof->futex_val, iof->futex_mask);
 	if (ret < 0)
 		req_set_fail(req);
--- a/kernel/futex/core.c
+++ b/kernel/futex/core.c
@@ -1062,7 +1062,7 @@ static int handle_futex_death(u32 __user
 	owner = uval & FUTEX_TID_MASK;
 
 	if (pending_op && !pi && !owner) {
-		futex_wake(uaddr, FLAGS_SIZE_32 | FLAGS_SHARED, 1,
+		futex_wake(uaddr, FLAGS_SIZE_32 | FLAGS_SHARED, NULL, 1,
 			   FUTEX_BITSET_MATCH_ANY);
 		return 0;
 	}
@@ -1116,7 +1116,7 @@ static int handle_futex_death(u32 __user
 	 * PI futexes happens in exit_pi_state():
 	 */
 	if (!pi && (uval & FUTEX_WAITERS)) {
-		futex_wake(uaddr, FLAGS_SIZE_32 | FLAGS_SHARED, 1,
+		futex_wake(uaddr, FLAGS_SIZE_32 | FLAGS_SHARED, NULL, 1,
 			   FUTEX_BITSET_MATCH_ANY);
 	}
 
@@ -1208,6 +1208,27 @@ static void exit_robust_list(struct task
 	}
 }
 
+static bool robust_list_clear_pending(unsigned long __user *pop)
+{
+	struct robust_list_head __user *head = current->futex.robust_list;
+
+	if (!put_user(0UL, pop))
+		return true;
+
+	/*
+	 * Just give up. The robust list head is usually part of TLS, so the
+	 * chance that this gets resolved is close to zero.
+	 *
+	 * If @pop is the robust_list_head::list_op_pending pointer then
+	 * clear the robust list head pointer to prevent further damage when the
+	 * task exits.  Better a few stale futexes than corrupted memory. But
+	 * that's mostly an academic exercise.
+	 */
+	if (pop == (unsigned long __user *)&head->list_op_pending)
+		current->futex.robust_list = NULL;
+	return false;
+}
+
 #ifdef CONFIG_COMPAT
 static void __user *futex_uaddr(struct robust_list __user *entry,
 				compat_long_t futex_offset)
@@ -1304,6 +1325,21 @@ static void compat_exit_robust_list(stru
 		handle_futex_death(uaddr, curr, pend_mod, HANDLE_DEATH_PENDING);
 	}
 }
+
+static bool compat_robust_list_clear_pending(u32 __user *pop)
+{
+	struct compat_robust_list_head __user *head = current->futex.compat_robust_list;
+
+	if (!put_user(0U, pop))
+		return true;
+
+	/* See comment in robust_list_clear_pending(). */
+	if (pop == &head->list_op_pending)
+		current->futex.compat_robust_list = NULL;
+	return false;
+}
+#else
+static bool compat_robust_list_clear_pending(u32 __user *pop_addr) { return false; }
 #endif
 
 #ifdef CONFIG_FUTEX_PI
@@ -1397,6 +1433,19 @@ static void exit_pi_state_list(struct ta
 static inline void exit_pi_state_list(struct task_struct *curr) { }
 #endif
 
+bool futex_robust_list_clear_pending(void __user *pop, unsigned int flags)
+{
+	bool size32bit = !!(flags & FLAGS_ROBUST_LIST32);
+
+	if (!IS_ENABLED(CONFIG_64BIT) && !size32bit)
+		return false;
+
+	if (IS_ENABLED(CONFIG_64BIT) && size32bit)
+		return compat_robust_list_clear_pending(pop);
+
+	return robust_list_clear_pending(pop);
+}
+
 static void futex_cleanup(struct task_struct *tsk)
 {
 	if (unlikely(tsk->futex.robust_list)) {
--- a/kernel/futex/futex.h
+++ b/kernel/futex/futex.h
@@ -40,6 +40,8 @@
 #define FLAGS_NUMA		0x0080
 #define FLAGS_STRICT		0x0100
 #define FLAGS_MPOL		0x0200
+#define FLAGS_UNLOCK_ROBUST	0x0400
+#define FLAGS_ROBUST_LIST32	0x0800
 
 /* FUTEX_ to FLAGS_ */
 static inline unsigned int futex_to_flags(unsigned int op)
@@ -52,6 +54,12 @@ static inline unsigned int futex_to_flag
 	if (op & FUTEX_CLOCK_REALTIME)
 		flags |= FLAGS_CLOCKRT;
 
+	if (op & FUTEX_UNLOCK_ROBUST)
+		flags |= FLAGS_UNLOCK_ROBUST;
+
+	if (op & FUTEX_ROBUST_LIST32)
+		flags |= FLAGS_ROBUST_LIST32;
+
 	return flags;
 }
 
@@ -449,13 +457,16 @@ extern int futex_unqueue_multiple(struct
 extern int futex_wait_multiple(struct futex_vector *vs, unsigned int count,
 			       struct hrtimer_sleeper *to);
 
-extern int futex_wake(u32 __user *uaddr, unsigned int flags, int nr_wake, u32 bitset);
+extern int futex_wake(u32 __user *uaddr, unsigned int flags, void __user *pop,
+		      int nr_wake, u32 bitset);
 
 extern int futex_wake_op(u32 __user *uaddr1, unsigned int flags,
 			 u32 __user *uaddr2, int nr_wake, int nr_wake2, int op);
 
-extern int futex_unlock_pi(u32 __user *uaddr, unsigned int flags);
+extern int futex_unlock_pi(u32 __user *uaddr, unsigned int flags, void __user *pop);
 
 extern int futex_lock_pi(u32 __user *uaddr, unsigned int flags, ktime_t *time, int trylock);
 
+bool futex_robust_list_clear_pending(void __user *pop, unsigned int flags);
+
 #endif /* _FUTEX_H */
--- a/kernel/futex/pi.c
+++ b/kernel/futex/pi.c
@@ -1139,7 +1139,7 @@ int futex_lock_pi(u32 __user *uaddr, uns
  * This is the in-kernel slowpath: we look up the PI state (if any),
  * and do the rt-mutex unlock.
  */
-int futex_unlock_pi(u32 __user *uaddr, unsigned int flags)
+static int __futex_unlock_pi(u32 __user *uaddr, unsigned int flags)
 {
 	u32 curval, uval, vpid = task_pid_vnr(current);
 	union futex_key key = FUTEX_KEY_INIT;
@@ -1148,7 +1148,6 @@ int futex_unlock_pi(u32 __user *uaddr, u
 
 	if (!IS_ENABLED(CONFIG_FUTEX_PI))
 		return -ENOSYS;
-
 retry:
 	if (get_user(uval, uaddr))
 		return -EFAULT;
@@ -1302,3 +1301,15 @@ int futex_unlock_pi(u32 __user *uaddr, u
 	return ret;
 }
 
+int futex_unlock_pi(u32 __user *uaddr, unsigned int flags, void __user *pop)
+{
+	int ret = __futex_unlock_pi(uaddr, flags);
+
+	if (ret || !(flags & FLAGS_UNLOCK_ROBUST))
+		return ret;
+
+	if (!futex_robust_list_clear_pending(pop, flags))
+		return -EFAULT;
+
+	return 0;
+}
--- a/kernel/futex/syscalls.c
+++ b/kernel/futex/syscalls.c
@@ -118,6 +118,13 @@ long do_futex(u32 __user *uaddr, int op,
 			return -ENOSYS;
 	}
 
+	if (flags & FLAGS_UNLOCK_ROBUST) {
+		if (cmd != FUTEX_WAKE &&
+		    cmd != FUTEX_WAKE_BITSET &&
+		    cmd != FUTEX_UNLOCK_PI)
+			return -ENOSYS;
+	}
+
 	switch (cmd) {
 	case FUTEX_WAIT:
 		val3 = FUTEX_BITSET_MATCH_ANY;
@@ -128,7 +135,7 @@ long do_futex(u32 __user *uaddr, int op,
 		val3 = FUTEX_BITSET_MATCH_ANY;
 		fallthrough;
 	case FUTEX_WAKE_BITSET:
-		return futex_wake(uaddr, flags, val, val3);
+		return futex_wake(uaddr, flags, uaddr2, val, val3);
 	case FUTEX_REQUEUE:
 		return futex_requeue(uaddr, flags, uaddr2, flags, val, val2, NULL, 0);
 	case FUTEX_CMP_REQUEUE:
@@ -141,7 +148,7 @@ long do_futex(u32 __user *uaddr, int op,
 	case FUTEX_LOCK_PI2:
 		return futex_lock_pi(uaddr, flags, timeout, 0);
 	case FUTEX_UNLOCK_PI:
-		return futex_unlock_pi(uaddr, flags);
+		return futex_unlock_pi(uaddr, flags, uaddr2);
 	case FUTEX_TRYLOCK_PI:
 		return futex_lock_pi(uaddr, flags, NULL, 1);
 	case FUTEX_WAIT_REQUEUE_PI:
@@ -375,7 +382,7 @@ SYSCALL_DEFINE4(futex_wake,
 	if (!futex_validate_input(flags, mask))
 		return -EINVAL;
 
-	return futex_wake(uaddr, FLAGS_STRICT | flags, nr, mask);
+	return futex_wake(uaddr, FLAGS_STRICT | flags, NULL, nr, mask);
 }
 
 /*
--- a/kernel/futex/waitwake.c
+++ b/kernel/futex/waitwake.c
@@ -150,12 +150,35 @@ void futex_wake_mark(struct wake_q_head
 }
 
 /*
+ * If requested, unlock the futex and clear the robust list pending op
+ */
+static bool futex_robust_unlock(u32 __user *uaddr, unsigned int flags, void __user *pop)
+{
+	if (!(flags & FLAGS_UNLOCK_ROBUST))
+		return true;
+
+	/* First unlock the futex, which requires release semantics. */
+	scoped_user_write_access(uaddr, efault)
+		unsafe_atomic_store_release_user(0, uaddr, efault);
+
+	/*
+	 * Clear the pending list op now. If that fails, then the task is in
+	 * deeper trouble as the robust list head is usually part of the TLS.
+	 * The chance of survival is close to zero.
+	 */
+	return futex_robust_list_clear_pending(pop, flags);
+
+efault:
+	return false;
+}
+
+/*
  * Wake up waiters matching bitset queued on this futex (uaddr).
  */
-int futex_wake(u32 __user *uaddr, unsigned int flags, int nr_wake, u32 bitset)
+int futex_wake(u32 __user *uaddr, unsigned int flags, void __user *pop, int nr_wake, u32 bitset)
 {
-	struct futex_q *this, *next;
 	union futex_key key = FUTEX_KEY_INIT;
+	struct futex_q *this, *next;
 	DEFINE_WAKE_Q(wake_q);
 	int ret;
 
@@ -166,6 +189,9 @@ int futex_wake(u32 __user *uaddr, unsign
 	if (unlikely(ret != 0))
 		return ret;
 
+	if (!futex_robust_unlock(uaddr, flags, pop))
+		return -EFAULT;
+
 	if ((flags & FLAGS_STRICT) && !nr_wake)
 		return 0;
 



* [patch V5 10/16] futex: Add robust futex unlock IP range
  2026-06-02  9:09 [patch V5 00/16] futex: Address the robust futex unlock race for real Thomas Gleixner
                   ` (8 preceding siblings ...)
  2026-06-02  9:09 ` [patch V5 09/16] futex: Add support for unlocking robust futexes Thomas Gleixner
@ 2026-06-02  9:09 ` Thomas Gleixner
  2026-06-03 14:24   ` [tip: locking/core] " tip-bot2 for Thomas Gleixner
  2026-06-02  9:10 ` [patch V5 11/16] futex: Provide infrastructure to plug the non contended robust futex unlock race Thomas Gleixner
                   ` (5 subsequent siblings)
  15 siblings, 1 reply; 43+ messages in thread
From: Thomas Gleixner @ 2026-06-02  9:09 UTC (permalink / raw)
  To: LKML
  Cc: Mathieu Desnoyers, André Almeida, Sebastian Andrzej Siewior,
	Carlos O'Donell, Peter Zijlstra, Florian Weimer, Rich Felker,
	Torvald Riegel, Darren Hart, Ingo Molnar, Davidlohr Bueso,
	Arnd Bergmann, Liam R . Howlett, Uros Bizjak,
	Thomas Weißschuh, Mark Brown, Richard Weinberger

There will be a VDSO function to unlock robust futexes in user space. The
unlock sequence is racy vs. clearing the list_op_pending pointer in the
task's robust list head. To plug this race the kernel needs to know the
instruction window. As the VDSO is per MM the addresses are stored in
mm_struct::futex.

Architectures which implement support for this have to update these
addresses when the VDSO is (re)mapped and indicate the size of the pending
op pointer which is associated with each IP range.

Arguably this could be resolved by chasing mm->context->vdso->image, but
that's architecture specific and requires touching quite a few cache
lines. Having it in mm::futex reduces the cache line impact and avoids
having yet another set of architecture specific functionality.

To support multi size robust list applications (gaming) this provides two
ranges when COMPAT is enabled.
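
To make the range bookkeeping more concrete, here is a user-space sketch of
how such a range could be set, invalidated and checked against an interrupted
instruction pointer. The lookup itself is not part of this patch, so
model_ip_in_cs() is an assumption about how a later patch might use the data.
All model_* names are hypothetical and @end is assumed to be exclusive.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the layout of struct futex_unlock_cs_range from this patch */
struct model_cs_range {
	unsigned long	start_ip;
	unsigned int	len;
	unsigned int	pop_size32;
};

/* Same arithmetic as futex_set_vdso_cs_range(): store start and length */
static void model_set_range(struct model_cs_range *r, unsigned long start,
			    unsigned long end, bool sz32)
{
	assert(end >= start);
	r->start_ip = start;
	r->len = end - start;
	r->pop_size32 = sz32;
}

/* Same idea as futex_reset_cs_ranges(): zero the slot, poison start_ip */
static void model_invalidate(struct model_cs_range *r)
{
	r->start_ip = ~0UL;
	r->len = 0;
	r->pop_size32 = 0;
}

/* Assumed lookup: quick reject on ip < start_ip, then a length compare */
static bool model_ip_in_cs(const struct model_cs_range *r, unsigned long ip)
{
	if (ip < r->start_ip)
		return false;
	return ip - r->start_ip < r->len;
}
```

With start_ip poisoned to ~0UL almost every IP fails the first comparison, and
the zero length rejects the single remaining value, so an unmapped VDSO never
matches in this model.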

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Reviewed-by: André Almeida <andrealmeid@igalia.com>
---
V5: Let the caller provide the real start/end IP - Andre
V4: Guard futex_mm_init() for builds w/o HASH and UNLOCK
V3: Make the number of ranges depend on COMPAT - Peter
V2: Store ranges in a struct with size information and allow up to two ranges.
---
 include/linux/futex.h       |   21 +++++++++++++++++---
 include/linux/futex_types.h |   28 ++++++++++++++++++++++++++
 init/Kconfig                |    6 +++++
 kernel/futex/core.c         |   46 +++++++++++++++++++++++++++++++++++---------
 4 files changed, 89 insertions(+), 12 deletions(-)

--- a/include/linux/futex.h
+++ b/include/linux/futex.h
@@ -81,11 +81,9 @@ int futex_hash_prctl(unsigned long arg2,
 #ifdef CONFIG_FUTEX_PRIVATE_HASH
 int futex_hash_allocate_default(void);
 void futex_hash_free(struct mm_struct *mm);
-void futex_mm_init(struct mm_struct *mm);
 #else  /* CONFIG_FUTEX_PRIVATE_HASH */
 static inline int futex_hash_allocate_default(void) { return 0; }
 static inline int futex_hash_free(struct mm_struct *mm) { return 0; }
-static inline void futex_mm_init(struct mm_struct *mm) { }
 #endif /* !CONFIG_FUTEX_PRIVATE_HASH */
 
 #else  /* CONFIG_FUTEX */
@@ -104,7 +102,24 @@ static inline int futex_hash_prctl(unsig
 }
 static inline int futex_hash_allocate_default(void) { return 0; }
 static inline int futex_hash_free(struct mm_struct *mm) { return 0; }
-static inline void futex_mm_init(struct mm_struct *mm) { }
 #endif /* !CONFIG_FUTEX */
 
+#ifdef CONFIG_FUTEX_ROBUST_UNLOCK
+void futex_reset_cs_ranges(struct futex_mm_data *fd);
+
+static inline void futex_set_vdso_cs_range(struct futex_mm_data *fd, unsigned int idx,
+					   unsigned long start, unsigned long end, bool sz32)
+{
+	fd->unlock.cs_ranges[idx].start_ip = start;
+	fd->unlock.cs_ranges[idx].len = end - start;
+	fd->unlock.cs_ranges[idx].pop_size32 = sz32;
+}
+#endif /* CONFIG_FUTEX_ROBUST_UNLOCK */
+
+#if defined(CONFIG_FUTEX_PRIVATE_HASH) || defined(CONFIG_FUTEX_ROBUST_UNLOCK)
+void futex_mm_init(struct mm_struct *mm);
+#else
+static inline void futex_mm_init(struct mm_struct *mm) { }
+#endif
+
 #endif /* _LINUX_FUTEX_H */
--- a/include/linux/futex_types.h
+++ b/include/linux/futex_types.h
@@ -55,12 +55,40 @@ struct futex_mm_phash {
 struct futex_mm_phash { };
 #endif /* !CONFIG_FUTEX_ROBUST_UNLOCK */
 
+#ifdef CONFIG_FUTEX_ROBUST_UNLOCK
+/**
+ * struct futex_unlock_cs_range - Range for the VDSO unlock critical section
+ * @start_ip:	The start IP of the robust futex unlock critical section (inclusive)
+ * @len:	The length of the robust futex unlock critical section
+ * @pop_size32:	Pending OP pointer size indicator. 0 == 64-bit, 1 == 32-bit
+ */
+struct futex_unlock_cs_range {
+	unsigned long	       start_ip;
+	unsigned int	       len;
+	unsigned int	       pop_size32;
+};
+
+#define FUTEX_ROBUST_MAX_CS_RANGES	(1 + IS_ENABLED(CONFIG_COMPAT))
+
+/**
+ * struct futex_unlock_cs_ranges - Futex unlock VDSO critical sections
+ * @cs_ranges:	Array of critical section ranges
+ */
+struct futex_unlock_cs_ranges {
+	struct futex_unlock_cs_range	cs_ranges[FUTEX_ROBUST_MAX_CS_RANGES];
+};
+#else  /* CONFIG_FUTEX_ROBUST_UNLOCK */
+struct futex_unlock_cs_ranges { };
+#endif /* !CONFIG_FUTEX_ROBUST_UNLOCK */
+
 /**
  * struct futex_mm_data - Futex related per MM data
  * @phash:	Futex private hash related data
+ * @unlock:	Futex unlock VDSO critical sections
  */
 struct futex_mm_data {
 	struct futex_mm_phash		phash;
+	struct futex_unlock_cs_ranges	unlock;
 };
 #else  /* CONFIG_FUTEX */
 struct futex_sched_data { };
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1842,6 +1842,12 @@ config FUTEX_MPOL
 	depends on FUTEX && NUMA
 	default y
 
+config HAVE_FUTEX_ROBUST_UNLOCK
+	bool
+
+config FUTEX_ROBUST_UNLOCK
+	def_bool FUTEX && HAVE_GENERIC_VDSO && GENERIC_IRQ_ENTRY && RSEQ && HAVE_FUTEX_ROBUST_UNLOCK
+
 config EPOLL
 	bool "Enable eventpoll support" if EXPERT
 	default y
--- a/kernel/futex/core.c
+++ b/kernel/futex/core.c
@@ -1761,11 +1761,11 @@ static bool futex_ref_is_dead(struct fut
 	return atomic_long_read(&mm->futex.phash.atomic) == 0;
 }
 
-void futex_mm_init(struct mm_struct *mm)
+static void futex_hash_init_mm(struct futex_mm_data *fd)
 {
-	memset(&mm->futex, 0, sizeof(mm->futex));
-	mutex_init(&mm->futex.phash.lock);
-	mm->futex.phash.batches = get_state_synchronize_rcu();
+	memset(&fd->phash, 0, sizeof(fd->phash));
+	mutex_init(&fd->phash.lock);
+	fd->phash.batches = get_state_synchronize_rcu();
 }
 
 void futex_hash_free(struct mm_struct *mm)
@@ -1969,19 +1969,47 @@ static int futex_hash_get_slots(void)
 		return fph->hash_mask + 1;
 	return 0;
 }
+#else  /* CONFIG_FUTEX_PRIVATE_HASH */
+static inline int futex_hash_allocate(unsigned int hslots, unsigned int flags) { return -EINVAL; }
+static inline int futex_hash_get_slots(void) { return 0; }
+static inline void futex_hash_init_mm(struct futex_mm_data *fd) { }
+#endif /* !CONFIG_FUTEX_PRIVATE_HASH */
 
-#else
+#ifdef CONFIG_FUTEX_ROBUST_UNLOCK
+static void futex_invalidate_cs_ranges(struct futex_mm_data *fd)
+{
+	/*
+	 * Invalidate start_ip so that the quick check fails for ip >= start_ip
+	 * if VDSO is not mapped or the second slot is not available for compat
+	 * tasks as they use VDSO32 which does not provide the 64-bit pointer
+	 * variant.
+	 */
+	for (int i = 0; i < FUTEX_ROBUST_MAX_CS_RANGES; i++)
+		fd->unlock.cs_ranges[i].start_ip = ~0UL;
+}
 
-static int futex_hash_allocate(unsigned int hash_slots, unsigned int flags)
+void futex_reset_cs_ranges(struct futex_mm_data *fd)
 {
-	return -EINVAL;
+	memset(fd->unlock.cs_ranges, 0, sizeof(fd->unlock.cs_ranges));
+	futex_invalidate_cs_ranges(fd);
 }
 
-static int futex_hash_get_slots(void)
+static void futex_robust_unlock_init_mm(struct futex_mm_data *fd)
 {
-	return 0;
+	/* mm_dup() preserves the range, mm_alloc() clears it */
+	if (!fd->unlock.cs_ranges[0].start_ip)
+		futex_invalidate_cs_ranges(fd);
 }
+#else  /* CONFIG_FUTEX_ROBUST_UNLOCK */
+static inline void futex_robust_unlock_init_mm(struct futex_mm_data *fd) { }
+#endif /* !CONFIG_FUTEX_ROBUST_UNLOCK */
 
+#if defined(CONFIG_FUTEX_PRIVATE_HASH) || defined(CONFIG_FUTEX_ROBUST_UNLOCK)
+void futex_mm_init(struct mm_struct *mm)
+{
+	futex_hash_init_mm(&mm->futex);
+	futex_robust_unlock_init_mm(&mm->futex);
+}
 #endif
 
 int futex_hash_prctl(unsigned long arg2, unsigned long arg3, unsigned long arg4)



* [patch V5 11/16] futex: Provide infrastructure to plug the non contended robust futex unlock race
  2026-06-02  9:09 [patch V5 00/16] futex: Address the robust futex unlock race for real Thomas Gleixner
                   ` (9 preceding siblings ...)
  2026-06-02  9:09 ` [patch V5 10/16] futex: Add robust futex unlock IP range Thomas Gleixner
@ 2026-06-02  9:10 ` Thomas Gleixner
  2026-06-03  8:42   ` Peter Zijlstra
                     ` (3 more replies)
  2026-06-02  9:10 ` [patch V5 12/16] x86/vdso: Prepare for robust futex unlock support Thomas Gleixner
                   ` (4 subsequent siblings)
  15 siblings, 4 replies; 43+ messages in thread
From: Thomas Gleixner @ 2026-06-02  9:10 UTC (permalink / raw)
  To: LKML
  Cc: Mathieu Desnoyers, André Almeida, Sebastian Andrzej Siewior,
	Carlos O'Donell, Peter Zijlstra, Florian Weimer, Rich Felker,
	Torvald Riegel, Darren Hart, Ingo Molnar, Davidlohr Bueso,
	Arnd Bergmann, Liam R . Howlett, Uros Bizjak,
	Thomas Weißschuh, Mark Brown, Richard Weinberger

When the FUTEX_ROBUST_UNLOCK mechanism is used for unlocking (PI-)futexes,
then the unlock sequence in user space looks like this:

  1)	robust_list_set_op_pending(mutex);
  2)	robust_list_remove(mutex);
	
  	lval = gettid();
  3)	if (atomic_try_cmpxchg(&mutex->lock, lval, 0))
  4)		robust_list_clear_op_pending();
  	else
  5)		sys_futex(OP | FUTEX_ROBUST_UNLOCK, ....);

That still leaves a minimal race window between #3 and #4 where the mutex
could be acquired by some other task, which observes that it is the last
user and:

  1) unmaps the mutex memory
  2) maps a different file, which ends up covering the same address

When the original task then exits before reaching #4, the kernel robust
list handling observes the pending op entry and tries to fix up user space.

If the newly mapped data contains the TID of the exiting thread at the
address of the mutex/futex, the kernel will set the owner died bit in
that memory and therefore corrupt unrelated data.
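
The sequence above can be modelled in plain C11 to show where the window
sits. This is only an illustrative sketch: struct demo_mutex, struct
demo_robust_head and demo_unlock() are hypothetical names, the robust list
removal is omitted, and the fallback syscall is represented by a false
return value.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the user space mutex and robust list head. */
struct demo_mutex {
	_Atomic uint32_t lock;		/* 0 == unlocked, otherwise owner TID */
};

struct demo_robust_head {
	struct demo_mutex *list_op_pending;
};

/*
 * Returns true if the uncontended unlock succeeded. A false return means
 * the caller has to fall back to sys_futex(OP | FUTEX_ROBUST_UNLOCK, ...).
 */
static bool demo_unlock(struct demo_robust_head *head, struct demo_mutex *m,
			uint32_t tid)
{
	uint32_t lval = tid;

	head->list_op_pending = m;				/* step 1 */
	/* step 2, the removal from the robust list, is omitted here */

	if (!atomic_compare_exchange_strong(&m->lock, &lval, 0))	/* step 3 */
		return false;					/* step 5 */

	/*
	 * Race window: the lock word is already 0, but list_op_pending
	 * still points at the mutex. Another task can take the lock,
	 * release it and unmap the memory right here.
	 */
	head->list_op_pending = NULL;				/* step 4 */
	return true;
}
```

If the task dies between the compare-exchange and the final store, the
stale list_op_pending value is what the exit path would later inspect.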

On X86 this boils down to this simplified assembly sequence:

		mov		%esi,%eax	// Load TID into EAX
        	xor		%ecx,%ecx	// Set ECX to 0
   #3		lock cmpxchg	%ecx,(%rdi)	// Try the TID -> 0 transition
	.Lstart:
		jnz    		.Lend
   #4 		movq		%rcx,(%rdx)	// Clear list_op_pending
	.Lend:

If the cmpxchg() succeeds and the task is interrupted before it can clear
list_op_pending in the robust list head (#4) and the task crashes in a
signal handler or gets killed then it ends up in do_exit() and subsequently
in the robust list handling, which then might run into the unmap/map issue
described above.

This is only relevant when user space was interrupted and a signal is
pending. The fix-up has to be done before signal delivery is attempted
because:

   1) The signal might be fatal so get_signal() ends up in do_exit()

   2) The signal handler might crash or the task is killed before returning
      from the handler. At that point the instruction pointer in pt_regs is
      no longer the instruction pointer of the initially interrupted unlock
      sequence.

The right place to handle this is in __exit_to_user_mode_loop() before
invoking arch_do_signal_or_restart() as this obviously covers both
scenarios.

As this is only relevant when the task was interrupted in user space, this
is tied to RSEQ and the generic entry code as RSEQ keeps track of user
space interrupts unconditionally even if the task does not have an RSEQ
region installed. That makes the decision very lightweight:

       if (current->rseq.user_irq && within(regs, csr->unlock_ip_range))
       		futex_fixup_robust_unlock(regs, csr);

futex_fixup_robust_unlock() then invokes an architecture specific function
to return the pending op pointer or NULL. The function evaluates the
register content to decide whether the pending op pointer in the robust
list head needs to be cleared.

Assuming the above unlock sequence, then on x86 this decision is the
trivial evaluation of the zero flag:

	return regs->flags & X86_EFLAGS_ZF ? regs->dx : NULL;

Other architectures might need to do more complex evaluations due to LLSC,
but the approach is valid in general. The size of the pointer is determined
from the matching range struct, which covers both 32-bit and 64-bit builds
including COMPAT.
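
The decision chain described above (user interrupt, IP within the critical
section, zero flag set) can be modelled in user space as follows. This is a
sketch under assumptions: struct demo_regs is a reduced, hypothetical
register snapshot and not the kernel's pt_regs, and all demo_ names are
invented for illustration. The value 0x40 is bit 6 of EFLAGS, the zero
flag.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_EFLAGS_ZF	0x40UL	/* Bit 6 of EFLAGS is the zero flag */

/* Hypothetical, reduced register snapshot; not the kernel's pt_regs. */
struct demo_regs {
	unsigned long ip;
	unsigned long flags;
	unsigned long dx;
};

struct demo_cs_range {
	unsigned long start_ip;	/* Inclusive start of the critical section */
	unsigned int  len;	/* Length of the critical section */
};

static bool demo_within(const struct demo_regs *regs,
			const struct demo_cs_range *csr)
{
	return regs->ip >= csr->start_ip && regs->ip < csr->start_ip + csr->len;
}

/* Returns the address to clear, or 0 if nothing has to be done. */
static unsigned long demo_pop_to_clear(const struct demo_regs *regs,
				       const struct demo_cs_range *csr,
				       bool user_irq)
{
	/* Cheap checks first: not interrupted in user space or outside range */
	if (!user_irq || !demo_within(regs, csr))
		return 0;

	/* ZF set means the cmpxchg succeeded, so DX holds the pointer to clear */
	return (regs->flags & DEMO_EFLAGS_ZF) ? regs->dx : 0;
}
```

With start_ip set to ~0UL and len set to 0 the range check fails for every
IP, which matches the idea of invalidating a range when no VDSO is mapped.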

The unlock sequence is going to be placed in the VDSO so that the kernel
can keep everything synchronized, especially the register usage. The
resulting code sequence for user space is:

   if (__vdso_futex_robust_list$SZ_try_unlock(lock, tid, &pending_op) != tid)
 	err = sys_futex($OP | FUTEX_ROBUST_UNLOCK,....);

Both the VDSO unlock and the kernel side unlock ensure that the pending_op
pointer is always cleared when the lock becomes unlocked.
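
The calling convention can be illustrated with a user space stand-in. The
sketch below is an assumption-laden model: demo_try_unlock() only mirrors
the documented return value (the content of *lock, which equals tid on
success) and is NOT protected against the race the way the real VDSO
function is. demo_futex_robust_unlock() is a hypothetical placeholder for
the sys_futex($OP | FUTEX_ROBUST_UNLOCK, ...) slow path.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the VDSO function; mirrors only its return value contract */
static uint32_t demo_try_unlock(_Atomic uint32_t *lock, uint32_t tid,
				uint64_t *pop)
{
	uint32_t val = tid;

	/* On failure val is updated to the current content of *lock */
	if (atomic_compare_exchange_strong(lock, &val, 0))
		*pop = 0;
	return val;
}

/* Hypothetical placeholder for the syscall based slow path */
static int demo_slow_path_calls;

static int demo_futex_robust_unlock(_Atomic uint32_t *lock, uint64_t *pop)
{
	demo_slow_path_calls++;
	atomic_store(lock, 0);	/* Unlock */
	*pop = 0;		/* Clear the pending op pointer */
	return 0;		/* Waking waiters is not modelled */
}

static int demo_mutex_unlock(_Atomic uint32_t *lock, uint32_t tid,
			     uint64_t *pop)
{
	/* Anything other than tid means the fast path did not unlock */
	if (demo_try_unlock(lock, tid, pop) != tid)
		return demo_futex_robust_unlock(lock, pop);
	return 0;
}
```

The comparison against tid is what lets the caller distinguish a completed
fast path from a lock word that carries additional bits such as
FUTEX_WAITERS.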

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Reviewed-by: André Almeida <andrealmeid@igalia.com>
---
V3: Fixup conversion leftover which was lost on the devel machine
V2: Convert to the struct range storage and simplify the fixup logic
---
 include/linux/futex.h |   39 ++++++++++++++++++++++++++++++++++++-
 include/vdso/futex.h  |   52 ++++++++++++++++++++++++++++++++++++++++++++++++++
 kernel/entry/common.c |    9 +++++---
 kernel/futex/core.c   |   18 +++++++++++++++++
 4 files changed, 114 insertions(+), 4 deletions(-)

--- a/include/linux/futex.h
+++ b/include/linux/futex.h
@@ -105,7 +105,41 @@ static inline int futex_hash_free(struct
 #endif /* !CONFIG_FUTEX */
 
 #ifdef CONFIG_FUTEX_ROBUST_UNLOCK
+#include <asm/futex_robust.h>
+
 void futex_reset_cs_ranges(struct futex_mm_data *fd);
+void __futex_fixup_robust_unlock(struct pt_regs *regs, struct futex_unlock_cs_range *csr);
+
+static inline bool futex_within_robust_unlock(struct pt_regs *regs,
+					      struct futex_unlock_cs_range *csr)
+{
+	unsigned long ip = instruction_pointer(regs);
+
+	return ip >= csr->start_ip && ip < csr->start_ip + csr->len;
+}
+
+static inline void futex_fixup_robust_unlock(struct pt_regs *regs)
+{
+	struct futex_unlock_cs_range *csr;
+
+	/*
+	 * Avoid dereferencing current->mm if not returning from interrupt.
+	 * current->rseq.event is going to be used subsequently, so bringing the
+	 * cache line in is not a big deal.
+	 */
+	if (!current->rseq.event.user_irq)
+		return;
+
+	csr = current->mm->futex.unlock.cs_ranges;
+
+	/* The loop is optimized out for !COMPAT */
+	for (int r = 0; r < FUTEX_ROBUST_MAX_CS_RANGES; r++, csr++) {
+		if (unlikely(futex_within_robust_unlock(regs, csr))) {
+			__futex_fixup_robust_unlock(regs, csr);
+			return;
+		}
+	}
+}
 
 static inline void futex_set_vdso_cs_range(struct futex_mm_data *fd, unsigned int idx,
 					   unsigned long start, unsigned long end, bool sz32)
@@ -114,7 +148,10 @@ static inline void futex_set_vdso_cs_ran
 	fd->unlock.cs_ranges[idx].len = end - start;
 	fd->unlock.cs_ranges[idx].pop_size32 = sz32;
 }
-#endif /* CONFIG_FUTEX_ROBUST_UNLOCK */
+#else /* CONFIG_FUTEX_ROBUST_UNLOCK */
+static inline void futex_fixup_robust_unlock(struct pt_regs *regs) { }
+#endif /* !CONFIG_FUTEX_ROBUST_UNLOCK */
+
 
 #if defined(CONFIG_FUTEX_PRIVATE_HASH) || defined(CONFIG_FUTEX_ROBUST_UNLOCK)
 void futex_mm_init(struct mm_struct *mm);
--- /dev/null
+++ b/include/vdso/futex.h
@@ -0,0 +1,52 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _VDSO_FUTEX_H
+#define _VDSO_FUTEX_H
+
+#include <uapi/linux/types.h>
+
+/**
+ * __vdso_futex_robust_list64_try_unlock - Try to unlock an uncontended robust futex
+ *					   with a 64-bit pending op pointer
+ * @lock:	Pointer to the futex lock object
+ * @tid:	The TID of the calling task
+ * @pop:	Pointer to the task's robust_list_head::list_op_pending
+ *
+ * Return: The content of *@lock. On success this is the same as @tid.
+ *
+ * The function implements:
+ *	if (atomic_try_cmpxchg(lock, &tid, 0))
+ *		*pop = NULL;
+ *	return tid;
+ *
+ * There is a race between a successful unlock and clearing the pending op
+ * pointer in the robust list head. If the calling task is interrupted in the
+ * race window and has to handle a (fatal) signal on return to user space then
+ * the kernel handles the clearing of *@pop before attempting to deliver
+ * the signal. That ensures that a task cannot exit with a potentially invalid
+ * pending op pointer.
+ *
+ * User space uses it in the following way:
+ *
+ * if (__vdso_futex_robust_list64_try_unlock(lock, tid, &pending_op) != tid)
+ *	err = sys_futex($OP | FUTEX_ROBUST_UNLOCK,....);
+ *
+ * If the unlock attempt fails due to the FUTEX_WAITERS bit set in the lock,
+ * then the syscall does the unlock, clears the pending op pointer and wakes the
+ * requested number of waiters.
+ */
+__u32 __vdso_futex_robust_list64_try_unlock(__u32 *lock, __u32 tid, __u64 *pop);
+
+/**
+ * __vdso_futex_robust_list32_try_unlock - Try to unlock an uncontended robust futex
+ *					   with a 32-bit pending op pointer
+ * @lock:	Pointer to the futex lock object
+ * @tid:	The TID of the calling task
+ * @pop:	Pointer to the task's robust_list_head::list_op_pending
+ *
+ * Return: The content of *@lock. On success this is the same as @tid.
+ *
+ * Same as __vdso_futex_robust_list64_try_unlock() just with a 32-bit @pop pointer.
+ */
+__u32 __vdso_futex_robust_list32_try_unlock(__u32 *lock, __u32 tid, __u32 *pop);
+
+#endif
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -1,11 +1,12 @@
 // SPDX-License-Identifier: GPL-2.0
 
-#include <linux/irq-entry-common.h>
-#include <linux/resume_user_mode.h>
+#include <linux/futex.h>
 #include <linux/highmem.h>
+#include <linux/irq-entry-common.h>
 #include <linux/jump_label.h>
 #include <linux/kmsan.h>
 #include <linux/livepatch.h>
+#include <linux/resume_user_mode.h>
 #include <linux/tick.h>
 
 /* Workaround to allow gradual conversion of architecture code */
@@ -60,8 +61,10 @@ static __always_inline unsigned long __e
 		if (ti_work & _TIF_PATCH_PENDING)
 			klp_update_patch_state(current);
 
-		if (ti_work & (_TIF_SIGPENDING | _TIF_NOTIFY_SIGNAL))
+		if (ti_work & (_TIF_SIGPENDING | _TIF_NOTIFY_SIGNAL)) {
+			futex_fixup_robust_unlock(regs);
 			arch_do_signal_or_restart(regs);
+		}
 
 		if (ti_work & _TIF_NOTIFY_RESUME)
 			resume_user_mode_work(regs);
--- a/kernel/futex/core.c
+++ b/kernel/futex/core.c
@@ -46,6 +46,8 @@
 #include <linux/slab.h>
 #include <linux/vmalloc.h>
 
+#include <vdso/futex.h>
+
 #include "futex.h"
 #include "../locking/rtmutex_common.h"
 
@@ -1446,6 +1448,22 @@ bool futex_robust_list_clear_pending(voi
 	return robust_list_clear_pending(pop);
 }
 
+#ifdef CONFIG_FUTEX_ROBUST_UNLOCK
+void __futex_fixup_robust_unlock(struct pt_regs *regs, struct futex_unlock_cs_range *csr)
+{
+	/*
+	 * arch_futex_robust_unlock_get_pop() returns the list pending op pointer from
+	 * @regs if the try_cmpxchg() succeeded.
+	 */
+	void __user *pop = arch_futex_robust_unlock_get_pop(regs);
+
+	if (!pop)
+		return;
+
+	futex_robust_list_clear_pending(pop, csr->pop_size32 ? FLAGS_ROBUST_LIST32 : 0);
+}
+#endif /* CONFIG_FUTEX_ROBUST_UNLOCK */
+
 static void futex_cleanup(struct task_struct *tsk)
 {
 	if (unlikely(tsk->futex.robust_list)) {



* [patch V5 12/16] x86/vdso: Prepare for robust futex unlock support
  2026-06-02  9:09 [patch V5 00/16] futex: Address the robust futex unlock race for real Thomas Gleixner
                   ` (10 preceding siblings ...)
  2026-06-02  9:10 ` [patch V5 11/16] futex: Provide infrastructure to plug the non contended robust futex unlock race Thomas Gleixner
@ 2026-06-02  9:10 ` Thomas Gleixner
  2026-06-03 14:24   ` [tip: locking/core] " tip-bot2 for Thomas Gleixner
  2026-06-02  9:10 ` [patch V5 13/16] x86/vdso: Implement __vdso_futex_robust_try_unlock() Thomas Gleixner
                   ` (3 subsequent siblings)
  15 siblings, 1 reply; 43+ messages in thread
From: Thomas Gleixner @ 2026-06-02  9:10 UTC (permalink / raw)
  To: LKML
  Cc: Mathieu Desnoyers, André Almeida, Sebastian Andrzej Siewior,
	Carlos O'Donell, Peter Zijlstra, Florian Weimer, Rich Felker,
	Torvald Riegel, Darren Hart, Ingo Molnar, Davidlohr Bueso,
	Arnd Bergmann, Liam R . Howlett, Uros Bizjak,
	Thomas Weißschuh, Mark Brown, Richard Weinberger

There will be a VDSO function to unlock non-contended robust futexes in
user space. The unlock sequence is racy vs. clearing the list_op_pending
pointer in the task's robust list head. To plug this race the kernel needs
to know the critical section window so it can clear the pointer when the
task is interrupted within that race window. The window is determined by
labels in the inline assembly.

Add these symbols to the vdso2c generator and use them in the VDSO VMA code
to update the critical section addresses in mm_struct::futex on (re)map().

The symbols are not exported to user space, but available in the debug
version of the vDSO.
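
How the critical section addresses follow the VDSO mapping can be shown
with a small model. This is a sketch, assuming the symbol values are
offsets relative to the VDSO base as they are for the other vdso_image
symbols; struct demo_cs_range and demo_set_cs_range() are hypothetical
names and the offsets used in the usage note are made up.

```c
#include <stdbool.h>

struct demo_cs_range {
	unsigned long start_ip;		/* Absolute start address */
	unsigned int  len;		/* Length of the critical section */
	unsigned int  pop_size32;	/* 1 == 32-bit pending op pointer */
};

/* Convert base relative symbol offsets into an absolute start plus length */
static void demo_set_cs_range(struct demo_cs_range *csr, unsigned long vdso_base,
			      long sym_start, long sym_end, bool sz32)
{
	unsigned long start = vdso_base + sym_start;
	unsigned long end = vdso_base + sym_end;

	csr->start_ip = start;
	csr->len = end - start;
	csr->pop_size32 = sz32;
}
```

Calling demo_set_cs_range() again with a new base, as would happen after a
mremap() of the VDSO, moves start_ip while the length stays the same.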

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Reviewed-by: André Almeida <andrealmeid@igalia.com>
---
V3: Rename the symbols once more
V2: Rename the symbols
---
 arch/x86/entry/vdso/vma.c   |   29 +++++++++++++++++++++++++++++
 arch/x86/include/asm/vdso.h |    4 ++++
 arch/x86/tools/vdso2c.c     |   16 ++++++++++------
 3 files changed, 43 insertions(+), 6 deletions(-)

--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -6,6 +6,7 @@
  */
 #include <linux/mm.h>
 #include <linux/err.h>
+#include <linux/futex.h>
 #include <linux/sched.h>
 #include <linux/sched/task_stack.h>
 #include <linux/slab.h>
@@ -73,6 +74,31 @@ static void vdso_fix_landing(const struc
 		regs->ip = new_vma->vm_start + ipoffset;
 }
 
+#ifdef CONFIG_FUTEX_ROBUST_UNLOCK
+static void vdso_futex_robust_unlock_update_ips(void)
+{
+	const struct vdso_image *image = current->mm->context.vdso_image;
+	unsigned long vdso = (unsigned long) current->mm->context.vdso;
+	struct futex_mm_data *fd = &current->mm->futex;
+	unsigned int idx = 0;
+
+	futex_reset_cs_ranges(fd);
+
+#ifdef CONFIG_X86_64
+	futex_set_vdso_cs_range(fd, idx, vdso + image->sym___futex_list64_try_unlock_cs_start,
+				vdso + image->sym___futex_list64_try_unlock_cs_end, false);
+	idx++;
+#endif /* CONFIG_X86_64 */
+
+#if defined(CONFIG_X86_32) || defined(CONFIG_COMPAT)
+	futex_set_vdso_cs_range(fd, idx, vdso + image->sym___futex_list32_try_unlock_cs_start,
+				vdso + image->sym___futex_list32_try_unlock_cs_end, true);
+#endif /* CONFIG_X86_32 || CONFIG_COMPAT */
+}
+#else
+static inline void vdso_futex_robust_unlock_update_ips(void) { }
+#endif
+
 static int vdso_mremap(const struct vm_special_mapping *sm,
 		struct vm_area_struct *new_vma)
 {
@@ -80,6 +106,7 @@ static int vdso_mremap(const struct vm_s
 
 	vdso_fix_landing(image, new_vma);
 	current->mm->context.vdso = (void __user *)new_vma->vm_start;
+	vdso_futex_robust_unlock_update_ips();
 
 	return 0;
 }
@@ -185,6 +212,8 @@ static int map_vdso(const struct vdso_im
 	current->mm->context.vdso = (void __user *)text_start;
 	current->mm->context.vdso_image = image;
 
+	vdso_futex_robust_unlock_update_ips();
+
 up_fail:
 	mmap_write_unlock(mm);
 	return ret;
--- a/arch/x86/include/asm/vdso.h
+++ b/arch/x86/include/asm/vdso.h
@@ -24,6 +24,10 @@ struct vdso_image {
 	long sym_int80_landing_pad;
 	long sym_vdso32_sigreturn_landing_pad;
 	long sym_vdso32_rt_sigreturn_landing_pad;
+	long sym___futex_list64_try_unlock_cs_start;
+	long sym___futex_list64_try_unlock_cs_end;
+	long sym___futex_list32_try_unlock_cs_start;
+	long sym___futex_list32_try_unlock_cs_end;
 };
 
 extern const struct vdso_image vdso64_image;
--- a/arch/x86/tools/vdso2c.c
+++ b/arch/x86/tools/vdso2c.c
@@ -75,12 +75,16 @@ struct vdso_sym {
 };
 
 struct vdso_sym required_syms[] = {
-	{"__kernel_vsyscall", true},
-	{"__kernel_sigreturn", true},
-	{"__kernel_rt_sigreturn", true},
-	{"int80_landing_pad", true},
-	{"vdso32_rt_sigreturn_landing_pad", true},
-	{"vdso32_sigreturn_landing_pad", true},
+	{"__kernel_vsyscall",				true},
+	{"__kernel_sigreturn",				true},
+	{"__kernel_rt_sigreturn",			true},
+	{"int80_landing_pad",				true},
+	{"vdso32_rt_sigreturn_landing_pad",		true},
+	{"vdso32_sigreturn_landing_pad",		true},
+	{"__futex_list64_try_unlock_cs_start",		true},
+	{"__futex_list64_try_unlock_cs_end",		true},
+	{"__futex_list32_try_unlock_cs_start",		true},
+	{"__futex_list32_try_unlock_cs_end",		true},
 };
 
 __attribute__((format(printf, 1, 2))) __attribute__((noreturn))



* [patch V5 13/16] x86/vdso: Implement __vdso_futex_robust_try_unlock()
  2026-06-02  9:09 [patch V5 00/16] futex: Address the robust futex unlock race for real Thomas Gleixner
                   ` (11 preceding siblings ...)
  2026-06-02  9:10 ` [patch V5 12/16] x86/vdso: Prepare for robust futex unlock support Thomas Gleixner
@ 2026-06-02  9:10 ` Thomas Gleixner
  2026-06-03 14:24   ` [tip: locking/core] " tip-bot2 for Thomas Gleixner
  2026-06-02  9:10 ` [patch V5 14/16] Documentation: futex: Add a note about robust list race condition Thomas Gleixner
                   ` (2 subsequent siblings)
  15 siblings, 1 reply; 43+ messages in thread
From: Thomas Gleixner @ 2026-06-02  9:10 UTC (permalink / raw)
  To: LKML
  Cc: Mathieu Desnoyers, André Almeida, Sebastian Andrzej Siewior,
	Carlos O'Donell, Peter Zijlstra, Florian Weimer, Rich Felker,
	Torvald Riegel, Darren Hart, Ingo Molnar, Davidlohr Bueso,
	Arnd Bergmann, Liam R . Howlett, Uros Bizjak,
	Thomas Weißschuh, Mark Brown, Richard Weinberger

When the FUTEX_ROBUST_UNLOCK mechanism is used for unlocking (PI-)futexes,
then the unlock sequence in userspace looks like this:

  1)	robust_list_set_op_pending(mutex);
  2)	robust_list_remove(mutex);
	
  	lval = gettid();
  3)	if (atomic_try_cmpxchg(&mutex->lock, lval, 0))
  4)		robust_list_clear_op_pending();
  	else
  5)		sys_futex(OP | FUTEX_ROBUST_UNLOCK, ....);

That still leaves a minimal race window between #3 and #4 where the mutex
could be acquired by some other task which observes that it is the last
user and:

  1) unmaps the mutex memory
  2) maps a different file, which ends up covering the same address

When the original task then exits before reaching #4, the kernel robust
list handling observes the pending op entry and tries to fix up user space.

If the newly mapped data contains the TID of the exiting thread at the
address of the mutex/futex, the kernel will set the owner died bit in
that memory and therefore corrupt unrelated data.

Provide a VDSO function which exposes the critical section window in the
VDSO symbol table. The resulting addresses are updated in the task's mm
when the VDSO is (re)map()'ed.

The core code detects when a task was interrupted within the critical
section and is about to deliver a signal. It then invokes an architecture
specific function which determines whether the pending op pointer has to be
cleared or not. The unlock assembly sequence on 64-bit is:

	mov		%esi,%eax	// Load TID into EAX
       	xor		%ecx,%ecx	// Set ECX to 0
	lock cmpxchg	%ecx,(%rdi)	// Try the TID -> 0 transition
  .Lstart:
	jnz    		.Lend
	movq		%rcx,(%rdx)	// Clear list_op_pending
  .Lend:
	ret

So the decision can be simply based on the ZF state in regs->flags. The
pending op pointer is always in DX independent of the build mode
(32/64-bit) to make the pending op pointer retrieval uniform. The size of
the pointer is stored in the matching critical section range struct and
the core code retrieves it from there. So the pointer retrieval function
does not have to care. It is bit-size independent:

     return regs->flags & X86_EFLAGS_ZF ? regs->dx : NULL;
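
Why the pointer size has to travel with the range can be seen from a small
model of the clearing step. This is a hypothetical sketch: demo_clear_pop()
is an invented name which operates on ordinary memory, while the kernel
has to use its user access helpers instead.

```c
#include <stdint.h>
#include <string.h>

/* Clear a pending op pointer slot of either 4 or 8 bytes */
static void demo_clear_pop(void *pop, unsigned int pop_size32)
{
	if (pop_size32) {
		uint32_t zero = 0;

		/* 32-bit slot: the four bytes following it stay untouched */
		memcpy(pop, &zero, sizeof(zero));
	} else {
		uint64_t zero = 0;

		memcpy(pop, &zero, sizeof(zero));
	}
}
```

Clearing eight bytes for a 32-bit robust list head would overwrite whatever
follows the pointer, so the size must match the entry point which was used.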

There are two entry points to handle the different robust list pending op
pointer size:

	__vdso_futex_robust_list64_try_unlock()
	__vdso_futex_robust_list32_try_unlock()

The 32-bit VDSO provides only __vdso_futex_robust_list32_try_unlock().

The 64-bit VDSO always provides __vdso_futex_robust_list64_try_unlock() and
when COMPAT is enabled also the list32 variant, which is required to
support multi-size robust list pointers used by gaming emulators.

The unlock function is inspired by an idea from Mathieu Desnoyers.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Reviewed-by: André Almeida <andrealmeid@igalia.com>
Acked-by: Uros Bizjak <ubizjak@gmail.com>
Link: https://lore.kernel.org/20260311185409.1988269-1-mathieu.desnoyers@efficios.com
---
V3: Use 'r' for the zero register       - Uros
V2: Provide different entry points	- Florian
    Use __u32 and __x86_64__		- Thomas
    Use private labels			- Thomas
    Optimize assembly		   	- Uros
    
    Split the functions up now that ranges are supported in the core and
    document the actual assembly.
---
 arch/x86/Kconfig                         |    1 
 arch/x86/entry/vdso/common/vfutex.c      |   71 +++++++++++++++++++++++++++++++
 arch/x86/entry/vdso/vdso32/Makefile      |    5 +-
 arch/x86/entry/vdso/vdso32/vdso32.lds.S  |    3 +
 arch/x86/entry/vdso/vdso32/vfutex.c      |    1 
 arch/x86/entry/vdso/vdso64/Makefile      |    7 +--
 arch/x86/entry/vdso/vdso64/vdso64.lds.S  |    7 +++
 arch/x86/entry/vdso/vdso64/vdsox32.lds.S |    7 +++
 arch/x86/entry/vdso/vdso64/vfutex.c      |    1 
 arch/x86/include/asm/futex_robust.h      |   19 ++++++++
 10 files changed, 117 insertions(+), 5 deletions(-)

--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -239,6 +239,7 @@ config X86
 	select HAVE_EFFICIENT_UNALIGNED_ACCESS
 	select HAVE_EISA			if X86_32
 	select HAVE_EXIT_THREAD
+	select HAVE_FUTEX_ROBUST_UNLOCK
 	select HAVE_GENERIC_TIF_BITS
 	select HAVE_GUP_FAST
 	select HAVE_FENTRY			if X86_64 || DYNAMIC_FTRACE
--- /dev/null
+++ b/arch/x86/entry/vdso/common/vfutex.c
@@ -0,0 +1,71 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include <vdso/futex.h>
+
+/*
+ * Assembly template for the try unlock functions. The basic functionality is:
+ *
+ *		mov		%esi, %eax	Move the TID into EAX
+ *		xor		%ecx, %ecx	Clear ECX
+ *		lock cmpxchgl	%ecx, (%rdi)	Attempt the TID -> 0 transition
+ * .Lcs_start:					Start of the critical section
+ *		jnz		.Lcs_end	If cmpxchgl failed jump to the end
+ * .Lcs_success:				Start of the success section
+ *		movq		%rcx, (%rdx)	Set the pending op pointer to 0
+ * .Lcs_end:					End of the critical section
+ *
+ * .Lcs_start and .Lcs_end establish the critical section range. .Lcs_success is
+ * technically not required, but there for illustration, debugging and testing.
+ *
+ * When CONFIG_COMPAT is enabled then the 64-bit VDSO provides two functions.
+ * One for the regular 64-bit sized pending operation pointer and one for a
+ * 32-bit sized pointer to support gaming emulators.
+ *
+ * The 32-bit VDSO provides only the one for 32-bit sized pointers.
+ */
+#define __stringify_1(x...)	#x
+#define __stringify(x...)	__stringify_1(x)
+
+#define LABEL(prefix, which)	__stringify(prefix##_try_unlock_cs_##which:)
+
+#define JNZ_END(prefix)		"jnz " __stringify(prefix) "_try_unlock_cs_end\n"
+
+#define CLEAR_POPQ		"movq	%[zero],  %a[pop]\n"
+#define CLEAR_POPL		"movl	%k[zero], %a[pop]\n"
+
+#define futex_robust_try_unlock(prefix, clear_pop, __lock, __tid, __pop)\
+({									\
+	asm volatile (							\
+		"						\n"	\
+		"	lock cmpxchgl	%k[zero], %a[lock]	\n"	\
+		"						\n"	\
+		LABEL(prefix, start)					\
+		"						\n"	\
+		JNZ_END(prefix)						\
+		"						\n"	\
+		LABEL(prefix, success)					\
+		"						\n"	\
+			clear_pop					\
+		"						\n"	\
+		LABEL(prefix, end)					\
+		: [tid]   "+&a" (__tid)					\
+		: [lock]  "D"   (__lock),				\
+		  [pop]   "d"   (__pop),				\
+		  [zero]  "r"   (0UL)					\
+		: "memory"						\
+	);								\
+	__tid;								\
+})
+
+#ifdef __x86_64__
+__u32 __vdso_futex_robust_list64_try_unlock(__u32 *lock, __u32 tid, __u64 *pop)
+{
+	return futex_robust_try_unlock(__futex_list64, CLEAR_POPQ, lock, tid, pop);
+}
+#endif /* __x86_64__ */
+
+#if defined(CONFIG_X86_32) || defined(CONFIG_COMPAT)
+__u32 __vdso_futex_robust_list32_try_unlock(__u32 *lock, __u32 tid, __u32 *pop)
+{
+	return futex_robust_try_unlock(__futex_list32, CLEAR_POPL, lock, tid, pop);
+}
+#endif /* CONFIG_X86_32 || CONFIG_COMPAT */
--- a/arch/x86/entry/vdso/vdso32/Makefile
+++ b/arch/x86/entry/vdso/vdso32/Makefile
@@ -7,8 +7,9 @@
 vdsos-y			:= 32
 
 # Files to link into the vDSO:
-vobjs-y			:= note.o vclock_gettime.o vgetcpu.o
-vobjs-y			+= system_call.o sigreturn.o
+vobjs-y					:= note.o vclock_gettime.o vgetcpu.o
+vobjs-y					+= system_call.o sigreturn.o
+vobjs-$(CONFIG_FUTEX_ROBUST_UNLOCK)	+= vfutex.o
 
 # Compilation flags
 flags-y			:= -DBUILD_VDSO32 -m32 -mregparm=0
--- a/arch/x86/entry/vdso/vdso32/vdso32.lds.S
+++ b/arch/x86/entry/vdso/vdso32/vdso32.lds.S
@@ -30,6 +30,9 @@ VERSION
 		__vdso_clock_gettime64;
 		__vdso_clock_getres_time64;
 		__vdso_getcpu;
+#ifdef CONFIG_FUTEX_ROBUST_UNLOCK
+		__vdso_futex_robust_list32_try_unlock;
+#endif
 	};
 
 	LINUX_2.5 {
--- /dev/null
+++ b/arch/x86/entry/vdso/vdso32/vfutex.c
@@ -0,0 +1 @@
+#include "common/vfutex.c"
--- a/arch/x86/entry/vdso/vdso64/Makefile
+++ b/arch/x86/entry/vdso/vdso64/Makefile
@@ -8,9 +8,10 @@ vdsos-y				:= 64
 vdsos-$(CONFIG_X86_X32_ABI)	+= x32
 
 # Files to link into the vDSO:
-vobjs-y				:= note.o vclock_gettime.o vgetcpu.o
-vobjs-y				+= vgetrandom.o vgetrandom-chacha.o
-vobjs-$(CONFIG_X86_SGX)		+= vsgx.o
+vobjs-y					:= note.o vclock_gettime.o vgetcpu.o
+vobjs-y					+= vgetrandom.o vgetrandom-chacha.o
+vobjs-$(CONFIG_X86_SGX)			+= vsgx.o
+vobjs-$(CONFIG_FUTEX_ROBUST_UNLOCK)	+= vfutex.o
 
 # Compilation flags
 flags-y				:= -DBUILD_VDSO64 -m64 -mcmodel=small
--- a/arch/x86/entry/vdso/vdso64/vdso64.lds.S
+++ b/arch/x86/entry/vdso/vdso64/vdso64.lds.S
@@ -32,6 +32,13 @@ VERSION {
 #endif
 		getrandom;
 		__vdso_getrandom;
+
+#ifdef CONFIG_FUTEX_ROBUST_UNLOCK
+		__vdso_futex_robust_list64_try_unlock;
+#ifdef CONFIG_COMPAT
+		__vdso_futex_robust_list32_try_unlock;
+#endif
+#endif
 	local: *;
 	};
 }
--- a/arch/x86/entry/vdso/vdso64/vdsox32.lds.S
+++ b/arch/x86/entry/vdso/vdso64/vdsox32.lds.S
@@ -22,6 +22,13 @@ VERSION {
 		__vdso_getcpu;
 		__vdso_time;
 		__vdso_clock_getres;
+
+#ifdef CONFIG_FUTEX_ROBUST_UNLOCK
+		__vdso_futex_robust_list64_try_unlock;
+#ifdef CONFIG_COMPAT
+		__vdso_futex_robust_list32_try_unlock;
+#endif
+#endif
 	local: *;
 	};
 }
--- /dev/null
+++ b/arch/x86/entry/vdso/vdso64/vfutex.c
@@ -0,0 +1 @@
+#include "common/vfutex.c"
--- /dev/null
+++ b/arch/x86/include/asm/futex_robust.h
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_FUTEX_ROBUST_H
+#define _ASM_X86_FUTEX_ROBUST_H
+
+#include <asm/ptrace.h>
+
+static __always_inline void __user *x86_futex_robust_unlock_get_pop(struct pt_regs *regs)
+{
+	/*
+	 * If ZF is set then the cmpxchg succeeded and the pending op pointer
+	 * needs to be cleared.
+	 */
+	return regs->flags & X86_EFLAGS_ZF ? (void __user *)regs->dx : NULL;
+}
+
+#define arch_futex_robust_unlock_get_pop(regs)	\
+	x86_futex_robust_unlock_get_pop(regs)
+
+#endif /* _ASM_X86_FUTEX_ROBUST_H */



* [patch V5 14/16] Documentation: futex: Add a note about robust list race condition
  2026-06-02  9:09 [patch V5 00/16] futex: Address the robust futex unlock race for real Thomas Gleixner
                   ` (12 preceding siblings ...)
  2026-06-02  9:10 ` [patch V5 13/16] x86/vdso: Implement __vdso_futex_robust_try_unlock() Thomas Gleixner
@ 2026-06-02  9:10 ` Thomas Gleixner
  2026-06-03 14:24   ` [tip: locking/core] " tip-bot2 for André Almeida
  2026-06-02  9:10 ` [patch V5 15/16] selftests: futex: Add tests for robust release operations Thomas Gleixner
  2026-06-02  9:10 ` [patch V5 16/16] [RFC] vdso, x86: Expose vdso.so.dbg through sysfs Thomas Gleixner
  15 siblings, 1 reply; 43+ messages in thread
From: Thomas Gleixner @ 2026-06-02  9:10 UTC (permalink / raw)
  To: LKML
  Cc: Mathieu Desnoyers, André Almeida, Sebastian Andrzej Siewior,
	Carlos O'Donell, Peter Zijlstra, Florian Weimer, Rich Felker,
	Torvald Riegel, Darren Hart, Ingo Molnar, Davidlohr Bueso,
	Arnd Bergmann, Liam R . Howlett, Uros Bizjak,
	Thomas Weißschuh, Mark Brown, Richard Weinberger

From: André Almeida <andrealmeid@igalia.com>

Add a note to the documentation giving a brief explanation why doing a
robust futex release in userspace is racy, what should be done to avoid
it and provide links to read more.

[ tglx: Fixed a few typos ]

Signed-off-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260329-tonyk-vdso_test-v2-1-b7db810e44a1@igalia.com

---
 Documentation/locking/robust-futex-ABI.rst |   44 +++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)
--- a/Documentation/locking/robust-futex-ABI.rst
+++ b/Documentation/locking/robust-futex-ABI.rst
@@ -153,6 +153,9 @@ manipulating this list), the user code m
  3) release the futex lock, and
  4) clear the 'lock_op_pending' word.
 
+Please note that the removal of a robust futex purely in userspace is
+racy. Refer to the next chapter to learn more and how to avoid this.
+
 On exit, the kernel will consider the address stored in
 'list_op_pending' and the address of each 'lock word' found by walking
 the list starting at 'head'.  For each such address, if the bottom 30
@@ -182,3 +185,44 @@ The kernel exit code will silently stop
 When the kernel sees a list entry whose 'lock word' doesn't have the
 current threads TID in the lower 30 bits, it does nothing with that
 entry, and goes on to the next entry.
+
+Robust release is racy
+----------------------
+
+The removal of a robust futex from the list is racy when doing it solely in
+userspace. Quoting Thomas Gleixner for the explanation:
+
+  The robust futex unlock mechanism is racy in respect to the clearing of the
+  robust_list_head::list_op_pending pointer because unlock and clearing the
+  pointer are not atomic. The race window is between the unlock and clearing
+  the pending op pointer. If the task is forced to exit in this window, exit
+  will access a potentially invalid pending op pointer when cleaning up the
+  robust list. That happens if another task manages to unmap the object
+  containing the lock before the cleanup, which results in an UAF. In the
+  worst case this UAF can lead to memory corruption when unrelated content
+  has been mapped to the same address by the time the access happens.
+
+A full in-depth analysis can be read at
+https://lore.kernel.org/lkml/20260316162316.356674433@kernel.org/
+
+To overcome that, the kernel needs to participate in the lock release operation.
+This ensures that the release happens "atomically" with regard to releasing
+the lock and removing the address from ``list_op_pending``. If the task is
+interrupted by a signal, the kernel checks whether the release operation was
+interrupted and, if so, finishes clearing ``list_op_pending``.
+
+For the contended unlock case, where other threads are waiting for the lock
+release, there's the ``FUTEX_ROBUST_UNLOCK`` operation feature flag for the
+``futex()`` system call, which must be used with one of the following
+operations: ``FUTEX_WAKE``, ``FUTEX_WAKE_BITSET`` or ``FUTEX_UNLOCK_PI``.
+The kernel will release the lock (set the futex word to zero) and clear the
+``list_op_pending`` field. Then, it will proceed with the normal wake path.
+
+For the non-contended path, there's still a race between checking the futex word
+and clearing the ``list_op_pending`` field. To solve this without the need of a
+complete system call, userspace should call the virtual syscall
+``__vdso_futex_robust_listXX_try_unlock()`` (where XX is either 32 or 64,
+depending on the size of the pointer). If the vDSO call succeeds, it means that
+it released the lock and cleared ``list_op_pending``. If it fails, that means
+that there are waiters for this lock and a call to ``futex()`` syscall with
+``FUTEX_ROBUST_UNLOCK`` is needed.


^ permalink raw reply	[flat|nested] 43+ messages in thread

* [patch V5 15/16] selftests: futex: Add tests for robust release operations
  2026-06-02  9:09 [patch V5 00/16] futex: Address the robust futex unlock race for real Thomas Gleixner
                   ` (13 preceding siblings ...)
  2026-06-02  9:10 ` [patch V5 14/16] Documentation: futex: Add a note about robust list race condition Thomas Gleixner
@ 2026-06-02  9:10 ` Thomas Gleixner
  2026-06-03 14:24   ` [tip: locking/core] " tip-bot2 for André Almeida
  2026-06-02  9:10 ` [patch V5 16/16] [RFC] vdso, x86: Expose vdso.so.dbg through sysfs Thomas Gleixner
  15 siblings, 1 reply; 43+ messages in thread
From: Thomas Gleixner @ 2026-06-02  9:10 UTC (permalink / raw)
  To: LKML
  Cc: Mathieu Desnoyers, André Almeida, Sebastian Andrzej Siewior,
	Carlos O'Donell, Peter Zijlstra, Florian Weimer, Rich Felker,
	Torvald Riegel, Darren Hart, Ingo Molnar, Davidlohr Bueso,
	Arnd Bergmann, Liam R . Howlett, Uros Bizjak,
	Thomas Weißschuh, Mark Brown, Richard Weinberger

From: André Almeida <andrealmeid@igalia.com>

Add tests for __vdso_futex_robust_listXX_try_unlock() and for the futex()
op FUTEX_ROBUST_UNLOCK.

Test the contended and uncontended cases for the vDSO functions and all
ops combinations for FUTEX_ROBUST_UNLOCK.
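
As a rough illustration of what the 32-bit variants are checked against: the
tests below expect the 32-bit entry points to zero only the lower 32 bits of
the pending-op slot, while the 64-bit ones clear all of it. The helper name is
hypothetical and not part of the patch; it merely restates that expectation
with the slot modelled as a 64-bit value.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: the value the tests expect
 * to find in the pending-op slot after a successful unlock. The slot is
 * modelled as a 64-bit quantity holding the previous pointer value.
 */
static uint64_t expected_pending_after_unlock(uint64_t slot, bool is_32)
{
	if (is_32)
		return slot & ~0xFFFFFFFFULL;	/* only the lower half is zeroed */

	return 0;				/* the whole slot is cleared */
}
```

On a 64-bit build a stack address such as 0x00007ffd12345678 would therefore
be expected to read back as 0x00007ffd00000000 after a 32-bit unlock.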

[ tglx: Replace the VDSO function lookup ]

Signed-off-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260329-tonyk-vdso_test-v2-2-b7db810e44a1@igalia.com

---
V3:
  Replaced the VDSO lookup

Change from v2:
 - Add test variants for FUTEX_ROBUST_LIST32
 - Skip 64 bit tests for 32 bit builds
---
 tools/testing/selftests/futex/functional/robust_list.c |  239 +++++++++++++++++
 tools/testing/selftests/futex/include/futextest.h      |    6 
 2 files changed, 245 insertions(+)
--- a/tools/testing/selftests/futex/functional/robust_list.c
+++ b/tools/testing/selftests/futex/functional/robust_list.c
@@ -27,12 +27,15 @@
 #include "futextest.h"
 #include "../../kselftest_harness.h"
 
+#include <dlfcn.h>
 #include <errno.h>
 #include <pthread.h>
 #include <signal.h>
+#include <stdint.h>
 #include <stdatomic.h>
 #include <stdbool.h>
 #include <stddef.h>
+#include <sys/auxv.h>
 #include <sys/mman.h>
 #include <sys/wait.h>
 
@@ -42,6 +45,10 @@
 
 #define SLEEP_US 100
 
+#if __SIZEOF_LONG__ == 8
+# define BUILD_64
+#endif
+
 static pthread_barrier_t barrier, barrier2;
 
 static int set_robust_list(struct robust_list_head *head, size_t len)
@@ -54,6 +61,12 @@ static int get_robust_list(int pid, stru
 	return syscall(SYS_get_robust_list, pid, head, len_ptr);
 }
 
+static int sys_futex_robust_unlock(_Atomic(uint32_t) *uaddr, unsigned int op, int val,
+				   void *list_op_pending, unsigned int val3)
+{
+	return syscall(SYS_futex, uaddr, op, val, NULL, list_op_pending, val3, 0);
+}
+
 /*
  * Basic lock struct, contains just the futex word and the robust list element
  * Real implementations have also a *prev to easily walk in the list
@@ -549,4 +562,230 @@ TEST(test_circular_list)
 		ksft_test_result_pass("%s\n", __func__);
 }
 
+/*
+ * Below are tests for the fix of robust release race condition. Please read the following
+ * thread to learn more about the issue in the first place and why the following functions fix it:
+ * https://lore.kernel.org/lkml/20260316162316.356674433@kernel.org/
+ */
+
+/*
+ * Auxiliary code for binding the vDSO functions
+ */
+static void *get_vdso_func_addr(const char *function)
+{
+	const char *vdso_names[] = {
+		"linux-vdso.so.1", "linux-gate.so.1", "linux-vdso32.so.1", "linux-vdso64.so.1",
+	};
+
+	for (int i = 0; i < ARRAY_SIZE(vdso_names); i++) {
+		void *vdso = dlopen(vdso_names[i], RTLD_LAZY | RTLD_LOCAL | RTLD_NOLOAD);
+
+		if (vdso)
+			return dlsym(vdso, function);
+	}
+	return NULL;
+}
+
+/*
+ * These are the real vDSO function signatures:
+ *
+ *	__vdso_futex_robust_list64_try_unlock(__u32 *lock, __u32 tid, __u64 *pop)
+ *	__vdso_futex_robust_list32_try_unlock(__u32 *lock, __u32 tid, __u32 *pop)
+ *
+ * So for the generic entry point we need to use a void pointer as the last argument
+ */
+FIXTURE(vdso_unlock)
+{
+	uint32_t (*vdso)(_Atomic(uint32_t) *lock, uint32_t tid, void *pop);
+};
+
+FIXTURE_VARIANT(vdso_unlock)
+{
+	bool is_32;
+	char func_name[];
+};
+
+FIXTURE_SETUP(vdso_unlock)
+{
+	self->vdso = get_vdso_func_addr(variant->func_name);
+}
+
+FIXTURE_TEARDOWN(vdso_unlock) {}
+
+FIXTURE_VARIANT_ADD(vdso_unlock, 32)
+{
+	.func_name = "__vdso_futex_robust_list32_try_unlock",
+	.is_32 = true,
+};
+
+FIXTURE_VARIANT_ADD(vdso_unlock, 64)
+{
+	.func_name = "__vdso_futex_robust_list64_try_unlock",
+	.is_32 = false,
+};
+
+/*
+ * Test the vDSO robust_listXX_try_unlock() for the uncontended case. The virtual syscall should
+ * return the thread ID of the lock owner, the lock word must be 0 and the list_op_pending should
+ * be NULL.
+ */
+TEST_F(vdso_unlock, test_robust_try_unlock_uncontended)
+{
+	struct lock_struct lock = { .futex = 0 };
+	_Atomic(unsigned int) *futex = &lock.futex;
+	struct robust_list_head head;
+	uintptr_t exp = (uintptr_t) NULL;
+	pid_t tid = gettid();
+	int ret;
+
+	if (!self->vdso) {
+		ksft_test_result_skip("%s not found\n", variant->func_name);
+		return;
+	}
+
+	*futex = tid;
+
+	ret = set_list(&head);
+	if (ret)
+		ksft_test_result_fail("set_robust_list error\n");
+
+	head.list_op_pending = &lock.list;
+
+	ret = self->vdso(futex, tid, &head.list_op_pending);
+
+	ASSERT_EQ(ret, tid);
+	ASSERT_EQ(*futex, 0);
+
+	/* Check only the lower 32 bits for the 32-bit entry point */
+	if (variant->is_32) {
+		exp = (uintptr_t)(unsigned long)&lock.list;
+		exp &= ~0xFFFFFFFFULL;
+	}
+
+	ASSERT_EQ((uintptr_t)(unsigned long)head.list_op_pending, exp);
+}
+
+/*
+ * If the lock is contended, the operation fails. The return value is the value found at the
+ * futex word (tid | FUTEX_WAITERS), the futex word is not modified and the list_op_pending is
+ * not cleared.
+ */
+TEST_F(vdso_unlock, test_robust_try_unlock_contended)
+{
+	struct lock_struct lock = { .futex = 0 };
+	_Atomic(unsigned int) *futex = &lock.futex;
+	struct robust_list_head head;
+	pid_t tid = gettid();
+	int ret;
+
+	if (!self->vdso) {
+		ksft_test_result_skip("%s not found\n", variant->func_name);
+		return;
+	}
+
+	*futex = tid | FUTEX_WAITERS;
+
+	ret = set_list(&head);
+	if (ret)
+		ksft_test_result_fail("set_robust_list error\n");
+
+	head.list_op_pending = &lock.list;
+
+	ret = self->vdso(futex, tid, &head.list_op_pending);
+
+	ASSERT_EQ(ret, tid | FUTEX_WAITERS);
+	ASSERT_EQ(*futex, tid | FUTEX_WAITERS);
+	ASSERT_EQ(head.list_op_pending, &lock.list);
+}
+
+FIXTURE(futex_op) {};
+
+FIXTURE_VARIANT(futex_op)
+{
+	unsigned int op;
+	unsigned int val3;
+};
+
+FIXTURE_SETUP(futex_op) {}
+
+FIXTURE_TEARDOWN(futex_op) {}
+
+FIXTURE_VARIANT_ADD(futex_op, wake)
+{
+	.op = FUTEX_WAKE,
+	.val3 = 0,
+};
+
+FIXTURE_VARIANT_ADD(futex_op, wake_bitset)
+{
+	.op = FUTEX_WAKE_BITSET,
+	.val3 = FUTEX_BITSET_MATCH_ANY,
+};
+
+FIXTURE_VARIANT_ADD(futex_op, unlock_pi)
+{
+	.op = FUTEX_UNLOCK_PI,
+	.val3 = 0,
+};
+
+FIXTURE_VARIANT_ADD(futex_op, wake32)
+{
+	.op = FUTEX_WAKE | FUTEX_ROBUST_LIST32,
+	.val3 = 0,
+};
+
+FIXTURE_VARIANT_ADD(futex_op, wake_bitset32)
+{
+	.op = FUTEX_WAKE_BITSET | FUTEX_ROBUST_LIST32,
+	.val3 = FUTEX_BITSET_MATCH_ANY,
+};
+
+FIXTURE_VARIANT_ADD(futex_op, unlock_pi32)
+{
+	.op = FUTEX_UNLOCK_PI | FUTEX_ROBUST_LIST32,
+	.val3 = 0,
+};
+
+/*
+ * The syscall should return the number of tasks woken (for this test, 0), clear the futex word and
+ * clear list_op_pending
+ */
+TEST_F(futex_op, test_futex_robust_unlock)
+{
+	struct lock_struct lock = { .futex = 0 };
+	_Atomic(unsigned int) *futex = &lock.futex;
+	uintptr_t exp = (uintptr_t) NULL;
+	struct robust_list_head head;
+	pid_t tid = gettid();
+	int ret;
+
+#ifndef BUILD_64
+	if (!(variant->op & FUTEX_ROBUST_LIST32)) {
+		ksft_test_result_skip("Not supported for 32 bit build\n");
+		return;
+	}
+#endif
+
+	*futex = tid | FUTEX_WAITERS;
+
+	ret = set_list(&head);
+	if (ret)
+		ksft_test_result_fail("set_robust_list error\n");
+
+	head.list_op_pending = &lock.list;
+
+	ret = sys_futex_robust_unlock(futex, FUTEX_ROBUST_UNLOCK | variant->op, tid,
+				      &head.list_op_pending, variant->val3);
+
+	ASSERT_EQ(ret, 0);
+	ASSERT_EQ(*futex, 0);
+
+	if (variant->op & FUTEX_ROBUST_LIST32) {
+		exp = (uint64_t)(unsigned long)&lock.list;
+		exp &= ~0xFFFFFFFFULL;
+	}
+
+	ASSERT_EQ((uintptr_t)(unsigned long)head.list_op_pending, exp);
+}
+
 TEST_HARNESS_MAIN
--- a/tools/testing/selftests/futex/include/futextest.h
+++ b/tools/testing/selftests/futex/include/futextest.h
@@ -38,6 +38,12 @@ typedef volatile u_int32_t futex_t;
 #ifndef FUTEX_CMP_REQUEUE_PI
 #define FUTEX_CMP_REQUEUE_PI		12
 #endif
+#ifndef FUTEX_ROBUST_UNLOCK
+#define FUTEX_ROBUST_UNLOCK		512
+#endif
+#ifndef FUTEX_ROBUST_LIST32
+#define FUTEX_ROBUST_LIST32		1024
+#endif
 #ifndef FUTEX_WAIT_REQUEUE_PI_PRIVATE
 #define FUTEX_WAIT_REQUEUE_PI_PRIVATE	(FUTEX_WAIT_REQUEUE_PI | \
 					 FUTEX_PRIVATE_FLAG)


^ permalink raw reply	[flat|nested] 43+ messages in thread

* [patch V5 16/16] [RFC] vdso, x86: Expose vdso.so.dbg through sysfs
  2026-06-02  9:09 [patch V5 00/16] futex: Address the robust futex unlock race for real Thomas Gleixner
                   ` (14 preceding siblings ...)
  2026-06-02  9:10 ` [patch V5 15/16] selftests: futex: Add tests for robust release operations Thomas Gleixner
@ 2026-06-02  9:10 ` Thomas Gleixner
  2026-06-02 10:39   ` Thomas Weißschuh
  15 siblings, 1 reply; 43+ messages in thread
From: Thomas Gleixner @ 2026-06-02  9:10 UTC (permalink / raw)
  To: LKML
  Cc: Mathieu Desnoyers, André Almeida, Sebastian Andrzej Siewior,
	Carlos O'Donell, Peter Zijlstra, Florian Weimer, Rich Felker,
	Torvald Riegel, Darren Hart, Ingo Molnar, Davidlohr Bueso,
	Arnd Bergmann, Liam R . Howlett, Uros Bizjak,
	Thomas Weißschuh, Mark Brown, Richard Weinberger

Finding the debug version of the VDSO is not trivial as there is no common
scheme where it is placed. That's especially problematic for CI testing.

The VDSO futex unlock mechanism requires for testing to have access to the
inner labels of the unlock assembly, which are only accessible via the
debug so.

Also for general debugging purposes it's convenient to have access to the
debug VDSO at a well defined place.

The files are placed in /sys/kernel/vdso/ and named vdso32.so.dbg,
vdso64.so.dbg, vdsox32.so.dbg.
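
A CI script could sanity check one of these files before handing it to a
symbol parser. The following is only a minimal sketch: the helper name is made
up, and the sysfs path only exists on a kernel carrying this RFC.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 if the file starts with the ELF magic,
 * 0 if it does not (or is shorter than four bytes), -1 if it cannot be
 * opened.
 */
static int starts_with_elf_magic(const char *path)
{
	static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };
	unsigned char buf[4];
	FILE *f = fopen(path, "rb");
	size_t n;

	if (!f)
		return -1;

	n = fread(buf, 1, sizeof(buf), f);
	fclose(f);

	if (n != sizeof(buf))
		return 0;

	return memcmp(buf, magic, sizeof(magic)) == 0;
}
```

With the layout described above, the call would be
starts_with_elf_magic("/sys/kernel/vdso/vdso64.so.dbg").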

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/Kconfig            |    1 +
 arch/x86/include/asm/vdso.h |    3 +++
 arch/x86/tools/vdso2c.c     |   15 ++++++++++-----
 arch/x86/tools/vdso2c.h     |   32 ++++++++++++++++++++++++++++++--
 include/vdso/sysfs.h        |    7 +++++++
 lib/vdso/Kconfig            |    6 ++++++
 lib/vdso/Makefile           |    3 ++-
 lib/vdso/sysfs.c            |   44 ++++++++++++++++++++++++++++++++++++++++++++
 8 files changed, 103 insertions(+), 8 deletions(-)

--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -306,6 +306,7 @@ config X86
 	select HAVE_UNWIND_USER_FP		if X86_64
 	select HAVE_USER_RETURN_NOTIFIER
 	select HAVE_GENERIC_VDSO
+	select HAVE_VDSO_DEBUG_SYSFS
 	select VDSO_GETRANDOM			if X86_64
 	select HOTPLUG_PARALLEL			if SMP && X86_64
 	select HOTPLUG_SMT			if SMP
--- a/arch/x86/include/asm/vdso.h
+++ b/arch/x86/include/asm/vdso.h
@@ -14,6 +14,9 @@ struct vdso_image {
 	void *data;
 	unsigned long size;   /* Always a multiple of PAGE_SIZE */
 
+	void *dbg_data;
+	unsigned long dbg_size;   /* Always a multiple of PAGE_SIZE */
+
 	unsigned long alt, alt_len;
 	unsigned long extable_base, extable_len;
 	const void *extable;
--- a/arch/x86/tools/vdso2c.c
+++ b/arch/x86/tools/vdso2c.c
@@ -150,16 +150,16 @@ extern void bad_put_le(void);
 
 static void go(void *raw_addr, size_t raw_len,
 	       void *stripped_addr, size_t stripped_len,
-	       FILE *outfile, const char *name)
+	       FILE *outfile, const char *name, const char *dbg_name)
 {
 	Elf64_Ehdr *hdr = (Elf64_Ehdr *)raw_addr;
 
 	if (hdr->e_ident[EI_CLASS] == ELFCLASS64) {
 		go64(raw_addr, raw_len, stripped_addr, stripped_len,
-		     outfile, name);
+		     outfile, name, dbg_name);
 	} else if (hdr->e_ident[EI_CLASS] == ELFCLASS32) {
 		go32(raw_addr, raw_len, stripped_addr, stripped_len,
-		     outfile, name);
+		     outfile, name, dbg_name);
 	} else {
 		fail("unknown ELF class\n");
 	}
@@ -189,8 +189,8 @@ int main(int argc, char **argv)
 {
 	size_t raw_len, stripped_len;
 	void *raw_addr, *stripped_addr;
+	char *name, *tmp, *dbg_name;
 	FILE *outfile;
-	char *name, *tmp;
 	int namelen;
 
 	if (argc != 4) {
@@ -226,7 +226,12 @@ int main(int argc, char **argv)
 	if (!outfile)
 		err(1, "fopen(%s)", outfilename);
 
-	go(raw_addr, raw_len, stripped_addr, stripped_len, outfile, name);
+	dbg_name = strdup(argv[1]);
+	tmp = strrchr(dbg_name, '/');
+	if (tmp)
+		dbg_name = tmp + 1;
+
+	go(raw_addr, raw_len, stripped_addr, stripped_len, outfile, name, dbg_name);
 
 	munmap(raw_addr, raw_len);
 	munmap(stripped_addr, stripped_len);
--- a/arch/x86/tools/vdso2c.h
+++ b/arch/x86/tools/vdso2c.h
@@ -42,11 +42,12 @@ static void BITSFUNC(extract)(const unsi
 
 static void BITSFUNC(go)(void *raw_addr, size_t raw_len,
 			 void *stripped_addr, size_t stripped_len,
-			 FILE *outfile, const char *image_name)
+			 FILE *outfile, const char *image_name,
+			 const char *dbg_name)
 {
 	int found_load = 0;
 	unsigned long load_size = -1;  /* Work around bogus warning */
-	unsigned long mapping_size;
+	unsigned long mapping_size, dbg_size;
 	ELF(Ehdr) *hdr = (ELF(Ehdr) *)raw_addr;
 	unsigned long i, syms_nr;
 	ELF(Shdr) *symtab_hdr = NULL, *strtab_hdr, *secstrings_hdr,
@@ -160,6 +161,7 @@ static void BITSFUNC(go)(void *raw_addr,
 	fprintf(outfile, "/* AUTOMATICALLY GENERATED -- DO NOT EDIT */\n\n");
 	fprintf(outfile, "#include <linux/linkage.h>\n");
 	fprintf(outfile, "#include <linux/init.h>\n");
+	fprintf(outfile, "#include <vdso/sysfs.h>\n");
 	fprintf(outfile, "#include <asm/page_types.h>\n");
 	fprintf(outfile, "#include <asm/vdso.h>\n");
 	fprintf(outfile, "\n");
@@ -173,6 +175,21 @@ static void BITSFUNC(go)(void *raw_addr,
 			(int)((unsigned char *)stripped_addr)[i]);
 	}
 	fprintf(outfile, "\n};\n\n");
+
+	dbg_size = (raw_len + 4095) / 4096 * 4096;
+
+	fprintf(outfile, "#ifdef CONFIG_VDSO_DEBUG_SYSFS\n");
+	fprintf(outfile,
+		"static unsigned char dbg_data[%lu] __ro_after_init __aligned(PAGE_SIZE) = {",
+		dbg_size);
+	for (i = 0; i < raw_len; i++) {
+		if (i % 10 == 0)
+			fprintf(outfile, "\n\t");
+		fprintf(outfile, "0x%02X, ", (int)((unsigned char *)raw_addr)[i]);
+	}
+	fprintf(outfile, "\n};\n");
+	fprintf(outfile, "#endif\n\n");
+
 	if (extable_sec)
 		BITSFUNC(extract)(raw_addr, raw_len, outfile,
 				  extable_sec, "extable");
@@ -180,6 +197,10 @@ static void BITSFUNC(go)(void *raw_addr,
 	fprintf(outfile, "const struct vdso_image %s = {\n", image_name);
 	fprintf(outfile, "\t.data = raw_data,\n");
 	fprintf(outfile, "\t.size = %lu,\n", mapping_size);
+	fprintf(outfile, "#ifdef CONFIG_VDSO_DEBUG_SYSFS\n");
+	fprintf(outfile, "\t.dbg_data = dbg_data,\n");
+	fprintf(outfile, "\t.dbg_size = %lu,\n", dbg_size);
+	fprintf(outfile, "#endif\n");
 	if (alt_sec) {
 		fprintf(outfile, "\t.alt = %lu,\n",
 			(unsigned long)GET_LE(&alt_sec->sh_offset));
@@ -205,4 +226,11 @@ static void BITSFUNC(go)(void *raw_addr,
 	fprintf(outfile, "};\n");
 	fprintf(outfile, "subsys_initcall(init_%s);\n", image_name);
 
+	fprintf(outfile, "\n#ifdef CONFIG_VDSO_DEBUG_SYSFS\n");
+	fprintf(outfile, "static __init int sysfs_init_%s(void) {\n", image_name);
+	fprintf(outfile, "\treturn vdso_sysfs_init_image(\"%s\", (void *)%s.dbg_data, %lu);\n",
+		dbg_name, image_name, raw_len);
+	fprintf(outfile, "};\n");
+	fprintf(outfile, "late_initcall(sysfs_init_%s);\n", image_name);
+	fprintf(outfile, "#endif\n");
 }
--- /dev/null
+++ b/include/vdso/sysfs.h
@@ -0,0 +1,7 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __VDSO_SYSFS_H
+#define __VDSO_SYSFS_H
+
+int vdso_sysfs_init_image(const char *name, void *addr, unsigned int size);
+
+#endif	/* __VDSO_SYSFS_H */
--- a/lib/vdso/Kconfig
+++ b/lib/vdso/Kconfig
@@ -3,6 +3,9 @@
 config HAVE_GENERIC_VDSO
 	bool
 
+config HAVE_VDSO_DEBUG_SYSFS
+	bool
+
 if HAVE_GENERIC_VDSO
 
 config GENERIC_GETTIMEOFDAY
@@ -24,4 +27,7 @@ config VDSO_GETRANDOM
 	help
 	  Selected by architectures that support vDSO getrandom().
 
+config VDSO_DEBUG_SYSFS
+	def_bool y if SYSFS && HAVE_VDSO_DEBUG_SYSFS
+
 endif
--- a/lib/vdso/Makefile
+++ b/lib/vdso/Makefile
@@ -1,3 +1,4 @@
 # SPDX-License-Identifier: GPL-2.0-only
 
-obj-$(CONFIG_HAVE_GENERIC_VDSO) += datastore.o
+obj-$(CONFIG_HAVE_GENERIC_VDSO)	+= datastore.o
+obj-$(CONFIG_VDSO_DEBUG_SYSFS)	+= sysfs.o
--- /dev/null
+++ b/lib/vdso/sysfs.c
@@ -0,0 +1,44 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include <linux/kobject.h>
+#include <linux/mutex.h>
+#include <linux/slab.h>
+#include <linux/sysfs.h>
+#include <vdso/sysfs.h>
+
+static struct kobject *vdso_kobj __ro_after_init;
+static DEFINE_MUTEX(sysfs_mutex);
+
+int __init vdso_sysfs_init_image(const char *name, void *addr, unsigned int size)
+{
+	struct bin_attribute *attr = NULL;
+	int ret = -ENOMEM;
+
+	guard(mutex)(&sysfs_mutex);
+	if (!vdso_kobj) {
+		vdso_kobj = kobject_create_and_add("vdso", kernel_kobj);
+		if (!vdso_kobj)
+			return -ENOMEM;
+	}
+
+	attr = kzalloc_obj(*attr);
+	if (!attr)
+		goto out;
+
+	sysfs_bin_attr_init(attr);
+	attr->attr.name = name;
+	attr->attr.mode = 0444;
+	attr->private = addr;
+	attr->size = size;
+	attr->read = sysfs_bin_attr_simple_read;
+
+	ret = sysfs_create_bin_file(vdso_kobj, attr);
+	if (ret) {
+		pr_warn("Failed to register %s in sysfs: %d\n", name, ret);
+		goto out;
+	}
+	return 0;
+out:
+	kobject_put(vdso_kobj);
+	kfree(attr);
+	return ret;
+}


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch V5 16/16] [RFC] vdso, x86: Expose vdso.so.dbg through sysfs
  2026-06-02  9:10 ` [patch V5 16/16] [RFC] vdso, x86: Expose vdso.so.dbg through sysfs Thomas Gleixner
@ 2026-06-02 10:39   ` Thomas Weißschuh
  2026-06-02 20:02     ` Thomas Gleixner
  0 siblings, 1 reply; 43+ messages in thread
From: Thomas Weißschuh @ 2026-06-02 10:39 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Mathieu Desnoyers, André Almeida,
	Sebastian Andrzej Siewior, Carlos O'Donell, Peter Zijlstra,
	Florian Weimer, Rich Felker, Torvald Riegel, Darren Hart,
	Ingo Molnar, Davidlohr Bueso, Arnd Bergmann, Liam R . Howlett,
	Uros Bizjak, Mark Brown, Richard Weinberger

On 2026-06-02 11:10:25+0200, Thomas Gleixner wrote:
> Finding the debug version of the VDSO is not trivial as there is no common
> scheme where it is placed. That's especially problematic for CI testing.
> 
> The VDSO futex unlock mechanism requires for testing to have access to the
> inner labels of the unlock assembly, which are only accessible via the
> debug so.
> 
> Also for general debugging purposes it's convenient to have access to the
> debug VDSO at a well defined place.
> 
> The files are placed in /sys/kernel/vdso/ and named vdso32.so.dbg,
> vdso64.so.dbg, vdsox32.so.dbg.

How is a user supposed to find the correct one for a given task?
As currently proposed that requires architecture-specific logic.

What about mirroring CONFIG_IKCONFIG and CONFIG_IKHEADERS, packaging
the output of 'make vdso_install' as an archive and embedding that
into the kernel.

It has the following advantages:
* Can be loaded on-demand from a module.
* Does not require additional (per-architecture) code.
* Contains build-id symlinks which can be followed automatically to find
  a task's debug vDSO.

Currently CONFIG_IKHEADERS only provides the compressed archives, and not
a directly usable directory, though. That could be fine here, too.

> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
>  arch/x86/Kconfig            |    1 +
>  arch/x86/include/asm/vdso.h |    3 +++
>  arch/x86/tools/vdso2c.c     |   15 ++++++++++-----
>  arch/x86/tools/vdso2c.h     |   32 ++++++++++++++++++++++++++++++--
>  include/vdso/sysfs.h        |    7 +++++++
>  lib/vdso/Kconfig            |    6 ++++++
>  lib/vdso/Makefile           |    3 ++-
>  lib/vdso/sysfs.c            |   44 ++++++++++++++++++++++++++++++++++++++++++++
>  8 files changed, 103 insertions(+), 8 deletions(-)

(...)

> --- /dev/null
> +++ b/include/vdso/sysfs.h
> @@ -0,0 +1,7 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef __VDSO_SYSFS_H
> +#define __VDSO_SYSFS_H
> +
> +int vdso_sysfs_init_image(const char *name, void *addr, unsigned int size);
> +
> +#endif	/* __VDSO_SYSFS_H */

This breaks the vdso/ header namespace.
For the datastore I went with include/linux/vdso_datastore.h.
Maybe we can use include/linux/vdso/ ?

(...)


Thomas

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch V5 16/16] [RFC] vdso, x86: Expose vdso.so.dbg through sysfs
  2026-06-02 10:39   ` Thomas Weißschuh
@ 2026-06-02 20:02     ` Thomas Gleixner
  0 siblings, 0 replies; 43+ messages in thread
From: Thomas Gleixner @ 2026-06-02 20:02 UTC (permalink / raw)
  To: Thomas Weißschuh
  Cc: LKML, Mathieu Desnoyers, André Almeida,
	Sebastian Andrzej Siewior, Carlos O'Donell, Peter Zijlstra,
	Florian Weimer, Rich Felker, Torvald Riegel, Darren Hart,
	Ingo Molnar, Davidlohr Bueso, Arnd Bergmann, Liam R . Howlett,
	Uros Bizjak, Mark Brown, Richard Weinberger

On Tue, Jun 02 2026 at 12:39, Thomas Weißschuh wrote:
> On 2026-06-02 11:10:25+0200, Thomas Gleixner wrote:
>> Finding the debug version of the VDSO is not trivial as there is no common
>> scheme where it is placed. That's especially problematic for CI testing.
>> 
>> The VDSO futex unlock mechanism requires for testing to have access to the
>> inner labels of the unlock assembly, which are only accessible via the
>> debug so.
>> 
>> Also for general debugging purposes it's convenient to have access to the
>> debug VDSO at a well defined place.
>> 
>> The files are placed in /sys/kernel/vdso/ and named vdso32.so.dbg,
>> vdso64.so.dbg, vdsox32.so.dbg.
>
> How is a user supposed to find the correct one for a given task?
> As currently proposed that requires architecture-specific logic.
>
> What about mirroring CONFIG_IKCONFIG and CONFIG_IKHEADERS, packaging
> the output of 'make vdso_install' as an archive and embedding that
> into the kernel.
>
> It has the following advantages:
> * Can be loaded on-demand from a module.
> * Does not require additional (per-architecture) code.
> * Contains build-id symlinks which can be followed automatically to find
>   a task's debug vDSO.
>
> Currently CONFIG_IKHEADERS only provide the compressed archives, and not
> a directly usable directory, though. That could be fine here, too.

I'm fine with any sensible solution. This one just worked for my
nefarious purposes and that's why it's marked RFC. :)

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch V5 09/16] futex: Add support for unlocking robust futexes
  2026-06-02  9:09 ` [patch V5 09/16] futex: Add support for unlocking robust futexes Thomas Gleixner
@ 2026-06-03  8:22   ` Peter Zijlstra
  2026-06-03  9:30     ` Peter Zijlstra
  2026-06-03 14:40     ` Thomas Gleixner
  2026-06-03  8:35   ` Peter Zijlstra
  2026-06-03 14:24   ` [tip: locking/core] " tip-bot2 for Thomas Gleixner
  2 siblings, 2 replies; 43+ messages in thread
From: Peter Zijlstra @ 2026-06-03  8:22 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Mathieu Desnoyers, André Almeida,
	Sebastian Andrzej Siewior, Carlos O'Donell, Florian Weimer,
	Rich Felker, Torvald Riegel, Darren Hart, Ingo Molnar,
	Davidlohr Bueso, Arnd Bergmann, Liam R . Howlett, Uros Bizjak,
	Thomas Weißschuh, Mark Brown, Richard Weinberger

On Tue, Jun 02, 2026 at 11:09:55AM +0200, Thomas Gleixner wrote:
> --- a/kernel/futex/futex.h
> +++ b/kernel/futex/futex.h
> @@ -40,6 +40,8 @@
>  #define FLAGS_NUMA		0x0080
>  #define FLAGS_STRICT		0x0100
>  #define FLAGS_MPOL		0x0200
> +#define FLAGS_UNLOCK_ROBUST	0x0400
> +#define FLAGS_ROBUST_LIST32	0x0800
>  
>  /* FUTEX_ to FLAGS_ */
>  static inline unsigned int futex_to_flags(unsigned int op)
> @@ -52,6 +54,12 @@ static inline unsigned int futex_to_flag
>  	if (op & FUTEX_CLOCK_REALTIME)
>  		flags |= FLAGS_CLOCKRT;
>  
> +	if (op & FUTEX_UNLOCK_ROBUST)
> +		flags |= FLAGS_UNLOCK_ROBUST;
> +
> +	if (op & FUTEX_ROBUST_LIST32)
> +		flags |= FLAGS_ROBUST_LIST32;
> +
>  	return flags;
>  }
>  

Would you mind terribly if I did: 's/UNLOCK_ROBUST/ROBUST_UNLOCK/g' on
the whole series?

Then we get:

FUTEX_ROBUST_UNLOCK
FUTEX_ROBUST_LIST32

FLAGS_ROBUST_UNLOCK
FLAGS_ROBUST_LIST32

which to me looks just a tad better.

Anyway, let me continue staring at things.


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch V5 09/16] futex: Add support for unlocking robust futexes
  2026-06-02  9:09 ` [patch V5 09/16] futex: Add support for unlocking robust futexes Thomas Gleixner
  2026-06-03  8:22   ` Peter Zijlstra
@ 2026-06-03  8:35   ` Peter Zijlstra
  2026-06-03 14:24   ` [tip: locking/core] " tip-bot2 for Thomas Gleixner
  2 siblings, 0 replies; 43+ messages in thread
From: Peter Zijlstra @ 2026-06-03  8:35 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Mathieu Desnoyers, André Almeida,
	Sebastian Andrzej Siewior, Carlos O'Donell, Florian Weimer,
	Rich Felker, Torvald Riegel, Darren Hart, Ingo Molnar,
	Davidlohr Bueso, Arnd Bergmann, Liam R . Howlett, Uros Bizjak,
	Thomas Weißschuh, Mark Brown, Richard Weinberger

On Tue, Jun 02, 2026 at 11:09:55AM +0200, Thomas Gleixner wrote:

> This deliberately omits FUTEX_WAKE_OP from this treatment as it's unclear
> whether this is needed and there is no usage of it in glibc either to
> investigate.

Well, that and because that already consumes uaddr2 it's just not doable
as-is.

> For the futex2 syscall family this needs to be implemented with a new
> syscall.

Yeah, this would need to be part of futex_lock() / futex_unlock(), which
we don't yet have.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch V5 11/16] futex: Provide infrastructure to plug the non contended robust futex unlock race
  2026-06-02  9:10 ` [patch V5 11/16] futex: Provide infrastructure to plug the non contended robust futex unlock race Thomas Gleixner
@ 2026-06-03  8:42   ` Peter Zijlstra
  2026-06-03  9:14   ` Peter Zijlstra
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 43+ messages in thread
From: Peter Zijlstra @ 2026-06-03  8:42 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Mathieu Desnoyers, André Almeida,
	Sebastian Andrzej Siewior, Carlos O'Donell, Florian Weimer,
	Rich Felker, Torvald Riegel, Darren Hart, Ingo Molnar,
	Davidlohr Bueso, Arnd Bergmann, Liam R . Howlett, Uros Bizjak,
	Thomas Weißschuh, Mark Brown, Richard Weinberger

On Tue, Jun 02, 2026 at 11:10:04AM +0200, Thomas Gleixner wrote:
> When the FUTEX_ROBUST_UNLOCK mechanism is used for unlocking (PI-)futexes,
> then the unlock sequence in user space looks like this:
> 
>   1)	robust_list_set_op_pending(mutex);
>   2)	robust_list_remove(mutex);
> 	
>   	lval = gettid();
>   3)	if (atomic_try_cmpxchg(&mutex->lock, lval, 0))
>   4)		robust_list_clear_op_pending();
>   	else
>   5)		sys_futex(OP | FUTEX_ROBUST_UNLOCK, ....);

Ah!, see, your uapi patch earlier called that FUTEX_UNLOCK_ROBUST.

I'll fix it all up if I don't find real issues.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch V5 11/16] futex: Provide infrastructure to plug the non contended robust futex unlock race
  2026-06-02  9:10 ` [patch V5 11/16] futex: Provide infrastructure to plug the non contended robust futex unlock race Thomas Gleixner
  2026-06-03  8:42   ` Peter Zijlstra
@ 2026-06-03  9:14   ` Peter Zijlstra
  2026-06-03 14:47     ` Thomas Gleixner
  2026-06-03  9:23   ` Peter Zijlstra
  2026-06-03 14:24   ` [tip: locking/core] " tip-bot2 for Thomas Gleixner
  3 siblings, 1 reply; 43+ messages in thread
From: Peter Zijlstra @ 2026-06-03  9:14 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Mathieu Desnoyers, André Almeida,
	Sebastian Andrzej Siewior, Carlos O'Donell, Florian Weimer,
	Rich Felker, Torvald Riegel, Darren Hart, Ingo Molnar,
	Davidlohr Bueso, Arnd Bergmann, Liam R . Howlett, Uros Bizjak,
	Thomas Weißschuh, Mark Brown, Richard Weinberger

On Tue, Jun 02, 2026 at 11:10:04AM +0200, Thomas Gleixner wrote:

> On X86 this boils down to this simplified assembly sequence:
> 
> 		mov		%esi,%eax	// Load TID into EAX
>         	xor		%ecx,%ecx	// Set ECX to 0
>    #3		lock cmpxchg	%ecx,(%rdi)	// Try the TID -> 0 transition
> 	.Lstart:
> 		jnz    		.Lend
>    #4 		movq		%rcx,(%rdx)	// Clear list_op_pending
> 	.Lend:
> 
> If the cmpxchg() succeeds and the task is interrupted before it can clear
> list_op_pending in the robust list head (#4) and the task crashes in a
> signal handler or gets killed then it ends up in do_exit() and subsequently
> in the robust list handling, which then might run into the unmap/map issue
> described above.
> 
> This is only relevant when user space was interrupted and a signal is
> pending. The fix-up has to be done before signal delivery is attempted
> because:
> 
>    1) The signal might be fatal so get_signal() ends up in do_exit()
> 
>    2) The signal handler might crash or the task is killed before returning
>       from the handler. At that point the instruction pointer in pt_regs is
>       no longer the instruction pointer of the initially interrupted unlock
>       sequence.

However, due to the pending field being strictly per thread (thread
local storage and all that), the whole construct of futex robust unlock
is not signal safe in the sense that signal handlers must not use it.

A signal handler trying to use this would result in nested use of the
pending field, and that leads to corrupted state.

> The right place to handle this is in __exit_to_user_mode_loop() before
> invoking arch_do_signal_or_restart() as this covers obviously both
> scenarios.
> 
> As this is only relevant when the task was interrupted in user space, this
> is tied to RSEQ and the generic entry code as RSEQ keeps track of user
> space interrupts unconditionally even if the task does not have a RSEQ
> region installed. That makes the decision very lightweight:
> 
>        if (current->rseq.user_irq && within(regs, csr->unlock_ip_range))
>        		futex_fixup_robust_unlock(regs, csr);
> 
> futex_fixup_robust_unlock() then invokes an architecture specific function
> to return the pending op pointer or NULL. The function evaluates the
> register content to decide whether the pending ops pointer in the robust
> list head needs to be cleared.
> 
> Assuming the above unlock sequence, then on x86 this decision is the
> trivial evaluation of the zero flag:
> 
> 	return regs->eflags & X86_EFLAGS_ZF ? regs->dx : NULL;
> 
> Other architectures might need to do more complex evaluations due to LLSC,
> but the approach is valid in general. The size of the pointer is determined
> from the matching range struct, which covers both 32-bit and 64-bit builds
> including COMPAT.

So my initial thoughts today were that we should probably also move the
IP to .Lend, to prevent userspace from writing to that location again.

However, due to the above mentioned restrictions vs signals, there
cannot be a situation where this matters, and so the point is moot.

A double store is harmless and it makes the kernel just this little bit
simpler.

The only reason I'm sending this email is to have this more explicitly
documented for posterity I suppose ;-)

> The unlock sequence is going to be placed in the VDSO so that the kernel
> can keep everything synchronized, especially the register usage. The
> resulting code sequence for user space is:
> 
>    if (__vdso_futex_robust_list$SZ_try_unlock(lock, tid, &pending_op) != tid)
>  	err = sys_futex($OP | FUTEX_ROBUST_UNLOCK,....);
> 
> Both the VDSO unlock and the kernel side unlock ensure that the pending_op
> pointer is always cleared when the lock becomes unlocked.
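
For readers who want to see the shape of that calling sequence, here is a
rough user space sketch in C11. It is only an emulation under stated
assumptions: toy_try_unlock() stands in for the VDSO helper and
toy_slow_unlock() for the sys_futex() call, both names are hypothetical, and
plain C like this does NOT get the kernel fix-up, so it still has the race.
It only shows the return value convention and the fallback.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Stand-in for the VDSO helper: returns the value found in the lock word.
 * On success that is 'tid' and the pending op pointer has been cleared.
 */
static uint32_t toy_try_unlock(_Atomic uint32_t *lock, uint32_t tid, void **pop)
{
	uint32_t expected = tid;

	if (atomic_compare_exchange_strong_explicit(lock, &expected, 0,
						    memory_order_release,
						    memory_order_relaxed)) {
		*pop = NULL;	/* Clear list_op_pending */
		return tid;
	}
	return expected;	/* e.g. tid | FUTEX_WAITERS */
}

/* Stand-in for sys_futex($OP | FUTEX_ROBUST_UNLOCK, ...). */
static int toy_slow_unlock(_Atomic uint32_t *lock, void **pop)
{
	atomic_store_explicit(lock, 0, memory_order_release);
	*pop = NULL;
	return 0;		/* No waiters are woken in this toy */
}

/* The two-step sequence: fast path first, slow path when contended. */
static int toy_robust_unlock(_Atomic uint32_t *lock, uint32_t tid, void **pop)
{
	if (toy_try_unlock(lock, tid, pop) != tid)
		return toy_slow_unlock(lock, pop);
	return 0;
}
```

In both paths the lock word ends up 0 and the pending op pointer ends up
NULL, which is the invariant stated in the quoted text.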




* Re: [patch V5 11/16] futex: Provide infrastructure to plug the non contended robust futex unlock race
  2026-06-02  9:10 ` [patch V5 11/16] futex: Provide infrastructure to plug the non contended robust futex unlock race Thomas Gleixner
  2026-06-03  8:42   ` Peter Zijlstra
  2026-06-03  9:14   ` Peter Zijlstra
@ 2026-06-03  9:23   ` Peter Zijlstra
  2026-06-03 14:42     ` Thomas Gleixner
  2026-06-03 14:24   ` [tip: locking/core] " tip-bot2 for Thomas Gleixner
  3 siblings, 1 reply; 43+ messages in thread
From: Peter Zijlstra @ 2026-06-03  9:23 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Mathieu Desnoyers, André Almeida,
	Sebastian Andrzej Siewior, Carlos O'Donell, Florian Weimer,
	Rich Felker, Torvald Riegel, Darren Hart, Ingo Molnar,
	Davidlohr Bueso, Arnd Bergmann, Liam R . Howlett, Uros Bizjak,
	Thomas Weißschuh, Mark Brown, Richard Weinberger

On Tue, Jun 02, 2026 at 11:10:04AM +0200, Thomas Gleixner wrote:
> When the FUTEX_ROBUST_UNLOCK mechanism is used for unlocking (PI-)futexes,
> then the unlock sequence in user space looks like this:
> 
>   1)	robust_list_set_op_pending(mutex);
>   2)	robust_list_remove(mutex);
> 	
>   	lval = gettid();
>   3)	if (atomic_try_cmpxchg(&mutex->lock, lval, 0))
>   4)		robust_list_clear_op_pending();
>   	else
>   5)		sys_futex(OP | FUTEX_ROBUST_UNLOCK, ....);
> 
> That still leaves a minimal race window between #3 and #4 where the mutex
> could be acquired by some other task, which observes that it is the last
> user and:
> 
>   1) unmaps the mutex memory
>   2) maps a different file, which ends up covering the same address
> 
> When then the original task exits before reaching #5 then the kernel robust
> list handling observes the pending op entry and tries to fix up user space.

This #5 reference should be #4, yeah? Same bit of Changelog is
replicated in a later patch and has the same issue.


* Re: [patch V5 09/16] futex: Add support for unlocking robust futexes
  2026-06-03  8:22   ` Peter Zijlstra
@ 2026-06-03  9:30     ` Peter Zijlstra
  2026-06-03 14:40     ` Thomas Gleixner
  1 sibling, 0 replies; 43+ messages in thread
From: Peter Zijlstra @ 2026-06-03  9:30 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Mathieu Desnoyers, André Almeida,
	Sebastian Andrzej Siewior, Carlos O'Donell, Florian Weimer,
	Rich Felker, Torvald Riegel, Darren Hart, Ingo Molnar,
	Davidlohr Bueso, Arnd Bergmann, Liam R . Howlett, Uros Bizjak,
	Thomas Weißschuh, Mark Brown, Richard Weinberger

On Wed, Jun 03, 2026 at 10:22:20AM +0200, Peter Zijlstra wrote:

> Would you mind terribly if I did: 's/UNLOCK_ROBUST/ROBUST_UNLOCK/g' on
> the whole series?

--- a/include/uapi/linux/futex.h
+++ b/include/uapi/linux/futex.h
@@ -25,11 +25,11 @@
 
 #define FUTEX_PRIVATE_FLAG	128
 #define FUTEX_CLOCK_REALTIME	256
-#define FUTEX_UNLOCK_ROBUST	512
+#define FUTEX_ROBUST_UNLOCK	512
 #define FUTEX_ROBUST_LIST32	1024
 
 #define FUTEX_CMD_MASK			~(FUTEX_PRIVATE_FLAG | FUTEX_CLOCK_REALTIME | \
-					  FUTEX_UNLOCK_ROBUST | FUTEX_ROBUST_LIST32)
+					  FUTEX_ROBUST_UNLOCK | FUTEX_ROBUST_LIST32)
 
 #define FUTEX_WAIT_PRIVATE		(FUTEX_WAIT | FUTEX_PRIVATE_FLAG)
 #define FUTEX_WAKE_PRIVATE		(FUTEX_WAKE | FUTEX_PRIVATE_FLAG)
@@ -49,23 +49,23 @@
  * Operations to unlock a futex, clear the robust list pending op pointer and
  * wake waiters.
  */
-#define FUTEX_UNLOCK_PI_LIST64			(FUTEX_UNLOCK_PI | FUTEX_UNLOCK_ROBUST)
+#define FUTEX_UNLOCK_PI_LIST64			(FUTEX_UNLOCK_PI | FUTEX_ROBUST_UNLOCK)
 #define FUTEX_UNLOCK_PI_LIST64_PRIVATE		(FUTEX_UNLOCK_PI_LIST64 | FUTEX_PRIVATE_FLAG)
-#define FUTEX_UNLOCK_PI_LIST32			(FUTEX_UNLOCK_PI | FUTEX_UNLOCK_ROBUST | \
+#define FUTEX_UNLOCK_PI_LIST32			(FUTEX_UNLOCK_PI | FUTEX_ROBUST_UNLOCK | \
 						 FUTEX_ROBUST_LIST32)
 #define FUTEX_UNLOCK_PI_LIST32_PRIVATE		(FUTEX_UNLOCK_PI_LIST32 | FUTEX_PRIVATE_FLAG)
 
-#define FUTEX_UNLOCK_WAKE_LIST64		(FUTEX_WAKE | FUTEX_UNLOCK_ROBUST)
+#define FUTEX_UNLOCK_WAKE_LIST64		(FUTEX_WAKE | FUTEX_ROBUST_UNLOCK)
 #define FUTEX_UNLOCK_WAKE_LIST64_PRIVATE	(FUTEX_UNLOCK_WAKE_LIST64 | FUTEX_PRIVATE_FLAG)
 
-#define FUTEX_UNLOCK_WAKE_LIST32		(FUTEX_WAKE | FUTEX_UNLOCK_ROBUST | \
+#define FUTEX_UNLOCK_WAKE_LIST32		(FUTEX_WAKE | FUTEX_ROBUST_UNLOCK | \
 						 FUTEX_ROBUST_LIST32)
 #define FUTEX_UNLOCK_WAKE_LIST32_PRIVATE	(FUTEX_UNLOCK_WAKE_LIST32 | FUTEX_PRIVATE_FLAG)
 
-#define FUTEX_UNLOCK_BITSET_LIST64		(FUTEX_WAKE_BITSET | FUTEX_UNLOCK_ROBUST)
+#define FUTEX_UNLOCK_BITSET_LIST64		(FUTEX_WAKE_BITSET | FUTEX_ROBUST_UNLOCK)
 #define FUTEX_UNLOCK_BITSET_LIST64_PRIVATE	(FUTEX_UNLOCK_BITSET_LIST64 | FUTEX_PRIVATE_FLAG)
 
-#define FUTEX_UNLOCK_BITSET_LIST32		(FUTEX_WAKE_BITSET | FUTEX_UNLOCK_ROBUST | \
+#define FUTEX_UNLOCK_BITSET_LIST32		(FUTEX_WAKE_BITSET | FUTEX_ROBUST_UNLOCK | \
 						 FUTEX_ROBUST_LIST32)
 #define FUTEX_UNLOCK_BITSET_LIST32_PRIVATE	(FUTEX_UNLOCK_BITSET_LIST32 | FUTEX_PRIVATE_FLAG)
 
--- a/kernel/futex/futex.h
+++ b/kernel/futex/futex.h
@@ -40,7 +40,7 @@
 #define FLAGS_NUMA		0x0080
 #define FLAGS_STRICT		0x0100
 #define FLAGS_MPOL		0x0200
-#define FLAGS_UNLOCK_ROBUST	0x0400
+#define FLAGS_ROBUST_UNLOCK	0x0400
 #define FLAGS_ROBUST_LIST32	0x0800
 
 /* FUTEX_ to FLAGS_ */
@@ -54,8 +54,8 @@ static inline unsigned int futex_to_flag
 	if (op & FUTEX_CLOCK_REALTIME)
 		flags |= FLAGS_CLOCKRT;
 
-	if (op & FUTEX_UNLOCK_ROBUST)
-		flags |= FLAGS_UNLOCK_ROBUST;
+	if (op & FUTEX_ROBUST_UNLOCK)
+		flags |= FLAGS_ROBUST_UNLOCK;
 
 	if (op & FUTEX_ROBUST_LIST32)
 		flags |= FLAGS_ROBUST_LIST32;
--- a/kernel/futex/pi.c
+++ b/kernel/futex/pi.c
@@ -1305,7 +1305,7 @@ int futex_unlock_pi(u32 __user *uaddr, u
 {
 	int ret = __futex_unlock_pi(uaddr, flags);
 
-	if (ret || !(flags & FLAGS_UNLOCK_ROBUST))
+	if (ret || !(flags & FLAGS_ROBUST_UNLOCK))
 		return ret;
 
 	if (!futex_robust_list_clear_pending(pop, flags))
--- a/kernel/futex/syscalls.c
+++ b/kernel/futex/syscalls.c
@@ -118,7 +118,7 @@ long do_futex(u32 __user *uaddr, int op,
 			return -ENOSYS;
 	}
 
-	if (flags & FLAGS_UNLOCK_ROBUST) {
+	if (flags & FLAGS_ROBUST_UNLOCK) {
 		if (cmd != FUTEX_WAKE &&
 		    cmd != FUTEX_WAKE_BITSET &&
 		    cmd != FUTEX_UNLOCK_PI)
--- a/kernel/futex/waitwake.c
+++ b/kernel/futex/waitwake.c
@@ -154,7 +154,7 @@ void futex_wake_mark(struct wake_q_head
  */
 static bool futex_robust_unlock(u32 __user *uaddr, unsigned int flags, void __user *pop)
 {
-	if (!(flags & FLAGS_UNLOCK_ROBUST))
+	if (!(flags & FLAGS_ROBUST_UNLOCK))
 		return true;
 
 	/* First unlock the futex, which requires release semantics. */


* [tip: locking/core] selftests: futex: Add tests for robust release operations
  2026-06-02  9:10 ` [patch V5 15/16] selftests: futex: Add tests for robust release operations Thomas Gleixner
@ 2026-06-03 14:24   ` tip-bot2 for André Almeida
  0 siblings, 0 replies; 43+ messages in thread
From: tip-bot2 for André Almeida @ 2026-06-03 14:24 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: andrealmeid, Thomas Gleixner, Peter Zijlstra (Intel), x86,
	linux-kernel

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     608323bf7bb85bbb647eca4373acef247f105e67
Gitweb:        https://git.kernel.org/tip/608323bf7bb85bbb647eca4373acef247f105e67
Author:        André Almeida <andrealmeid@igalia.com>
AuthorDate:    Tue, 02 Jun 2026 11:10:21 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 03 Jun 2026 11:38:53 +02:00

selftests: futex: Add tests for robust release operations

Add tests for __vdso_futex_robust_listXX_try_unlock() and for the futex()
op FUTEX_ROBUST_UNLOCK.

Test the contended and uncontended cases for the vDSO functions and all
ops combinations for FUTEX_ROBUST_UNLOCK.

[ tglx: Replace the VDSO function lookup ]

Signed-off-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260329-tonyk-vdso_test-v2-2-b7db810e44a1@igalia.com
Link: https://patch.msgid.link/20260602090535.988101541@kernel.org
---
 tools/testing/selftests/futex/functional/robust_list.c | 239 ++++++++-
 tools/testing/selftests/futex/include/futextest.h      |   6 +-
 2 files changed, 245 insertions(+)

diff --git a/tools/testing/selftests/futex/functional/robust_list.c b/tools/testing/selftests/futex/functional/robust_list.c
index e7d1254..b3fab60 100644
--- a/tools/testing/selftests/futex/functional/robust_list.c
+++ b/tools/testing/selftests/futex/functional/robust_list.c
@@ -27,12 +27,15 @@
 #include "futextest.h"
 #include "../../kselftest_harness.h"
 
+#include <dlfcn.h>
 #include <errno.h>
 #include <pthread.h>
 #include <signal.h>
+#include <stdint.h>
 #include <stdatomic.h>
 #include <stdbool.h>
 #include <stddef.h>
+#include <sys/auxv.h>
 #include <sys/mman.h>
 #include <sys/wait.h>
 
@@ -42,6 +45,10 @@
 
 #define SLEEP_US 100
 
+#if __SIZEOF_LONG__ == 8
+# define BUILD_64
+#endif
+
 static pthread_barrier_t barrier, barrier2;
 
 static int set_robust_list(struct robust_list_head *head, size_t len)
@@ -54,6 +61,12 @@ static int get_robust_list(int pid, struct robust_list_head **head, size_t *len_
 	return syscall(SYS_get_robust_list, pid, head, len_ptr);
 }
 
+static int sys_futex_robust_unlock(_Atomic(uint32_t) *uaddr, unsigned int op, int val,
+				   void *list_op_pending, unsigned int val3)
+{
+	return syscall(SYS_futex, uaddr, op, val, NULL, list_op_pending, val3, 0);
+}
+
 /*
  * Basic lock struct, contains just the futex word and the robust list element
  * Real implementations have also a *prev to easily walk in the list
@@ -549,4 +562,230 @@ TEST(test_circular_list)
 		ksft_test_result_pass("%s\n", __func__);
 }
 
+/*
+ * Below are tests for the fix of robust release race condition. Please read the following
+ * thread to learn more about the issue in the first place and why the following functions fix it:
+ * https://lore.kernel.org/lkml/20260316162316.356674433@kernel.org/
+ */
+
+/*
+ * Auxiliary code for binding the vDSO functions
+ */
+static void *get_vdso_func_addr(const char *function)
+{
+	const char *vdso_names[] = {
+		"linux-vdso.so.1", "linux-gate.so.1", "linux-vdso32.so.1", "linux-vdso64.so.1",
+	};
+
+	for (int i = 0; i < ARRAY_SIZE(vdso_names); i++) {
+		void *vdso = dlopen(vdso_names[i], RTLD_LAZY | RTLD_LOCAL | RTLD_NOLOAD);
+
+		if (vdso)
+			return dlsym(vdso, function);
+	}
+	return NULL;
+}
+
+/*
+ * These are the real vDSO function signatures:
+ *
+ *	__vdso_futex_robust_list64_try_unlock(__u32 *lock, __u32 tid, __u64 *pop)
+ *	__vdso_futex_robust_list32_try_unlock(__u32 *lock, __u32 tid, __u32 *pop)
+ *
+ * So for the generic entry point we need to use a void pointer as the last argument
+ */
+FIXTURE(vdso_unlock)
+{
+	uint32_t (*vdso)(_Atomic(uint32_t) *lock, uint32_t tid, void *pop);
+};
+
+FIXTURE_VARIANT(vdso_unlock)
+{
+	bool is_32;
+	char func_name[];
+};
+
+FIXTURE_SETUP(vdso_unlock)
+{
+	self->vdso = get_vdso_func_addr(variant->func_name);
+}
+
+FIXTURE_TEARDOWN(vdso_unlock) {}
+
+FIXTURE_VARIANT_ADD(vdso_unlock, 32)
+{
+	.func_name = "__vdso_futex_robust_list32_try_unlock",
+	.is_32 = true,
+};
+
+FIXTURE_VARIANT_ADD(vdso_unlock, 64)
+{
+	.func_name = "__vdso_futex_robust_list64_try_unlock",
+	.is_32 = false,
+};
+
+/*
+ * Test the vDSO robust_listXX_try_unlock() for the uncontended case. The virtual syscall should
+ * return the thread ID of the lock owner, the lock word must be 0 and the list_op_pending should
+ * be NULL.
+ */
+TEST_F(vdso_unlock, test_robust_try_unlock_uncontended)
+{
+	struct lock_struct lock = { .futex = 0 };
+	_Atomic(unsigned int) *futex = &lock.futex;
+	struct robust_list_head head;
+	uintptr_t exp = (uintptr_t) NULL;
+	pid_t tid = gettid();
+	int ret;
+
+	if (!self->vdso) {
+		ksft_test_result_skip("%s not found\n", variant->func_name);
+		return;
+	}
+
+	*futex = tid;
+
+	ret = set_list(&head);
+	if (ret)
+		ksft_test_result_fail("set_robust_list error\n");
+
+	head.list_op_pending = &lock.list;
+
+	ret = self->vdso(futex, tid, &head.list_op_pending);
+
+	ASSERT_EQ(ret, tid);
+	ASSERT_EQ(*futex, 0);
+
+	/* Check only the lower 32 bits for the 32-bit entry point */
+	if (variant->is_32) {
+		exp = (uintptr_t)(unsigned long)&lock.list;
+		exp &= ~0xFFFFFFFFULL;
+	}
+
+	ASSERT_EQ((uintptr_t)(unsigned long)head.list_op_pending, exp);
+}
+
+/*
+ * If the lock is contended, the operation fails. The return value is the value found at the
+ * futex word (tid | FUTEX_WAITERS), the futex word is not modified and the list_op_pending is
+ * not cleared.
+ */
+TEST_F(vdso_unlock, test_robust_try_unlock_contended)
+{
+	struct lock_struct lock = { .futex = 0 };
+	_Atomic(unsigned int) *futex = &lock.futex;
+	struct robust_list_head head;
+	pid_t tid = gettid();
+	int ret;
+
+	if (!self->vdso) {
+		ksft_test_result_skip("%s not found\n", variant->func_name);
+		return;
+	}
+
+	*futex = tid | FUTEX_WAITERS;
+
+	ret = set_list(&head);
+	if (ret)
+		ksft_test_result_fail("set_robust_list error\n");
+
+	head.list_op_pending = &lock.list;
+
+	ret = self->vdso(futex, tid, &head.list_op_pending);
+
+	ASSERT_EQ(ret, tid | FUTEX_WAITERS);
+	ASSERT_EQ(*futex, tid | FUTEX_WAITERS);
+	ASSERT_EQ(head.list_op_pending, &lock.list);
+}
+
+FIXTURE(futex_op) {};
+
+FIXTURE_VARIANT(futex_op)
+{
+	unsigned int op;
+	unsigned int val3;
+};
+
+FIXTURE_SETUP(futex_op) {}
+
+FIXTURE_TEARDOWN(futex_op) {}
+
+FIXTURE_VARIANT_ADD(futex_op, wake)
+{
+	.op = FUTEX_WAKE,
+	.val3 = 0,
+};
+
+FIXTURE_VARIANT_ADD(futex_op, wake_bitset)
+{
+	.op = FUTEX_WAKE_BITSET,
+	.val3 = FUTEX_BITSET_MATCH_ANY,
+};
+
+FIXTURE_VARIANT_ADD(futex_op, unlock_pi)
+{
+	.op = FUTEX_UNLOCK_PI,
+	.val3 = 0,
+};
+
+FIXTURE_VARIANT_ADD(futex_op, wake32)
+{
+	.op = FUTEX_WAKE | FUTEX_ROBUST_LIST32,
+	.val3 = 0,
+};
+
+FIXTURE_VARIANT_ADD(futex_op, wake_bitset32)
+{
+	.op = FUTEX_WAKE_BITSET | FUTEX_ROBUST_LIST32,
+	.val3 = FUTEX_BITSET_MATCH_ANY,
+};
+
+FIXTURE_VARIANT_ADD(futex_op, unlock_pi32)
+{
+	.op = FUTEX_UNLOCK_PI | FUTEX_ROBUST_LIST32,
+	.val3 = 0,
+};
+
+/*
+ * The syscall should return the number of tasks woken (for this test, 0), clear the futex word and
+ * clear list_op_pending
+ */
+TEST_F(futex_op, test_futex_robust_unlock)
+{
+	struct lock_struct lock = { .futex = 0 };
+	_Atomic(unsigned int) *futex = &lock.futex;
+	uintptr_t exp = (uintptr_t) NULL;
+	struct robust_list_head head;
+	pid_t tid = gettid();
+	int ret;
+
+#ifndef BUILD_64
+	if (!(variant->op & FUTEX_ROBUST_LIST32)) {
+		ksft_test_result_skip("Not supported for 32 bit build\n");
+		return;
+	}
+#endif
+
+	*futex = tid | FUTEX_WAITERS;
+
+	ret = set_list(&head);
+	if (ret)
+		ksft_test_result_fail("set_robust_list error\n");
+
+	head.list_op_pending = &lock.list;
+
+	ret = sys_futex_robust_unlock(futex, FUTEX_ROBUST_UNLOCK | variant->op, tid,
+				      &head.list_op_pending, variant->val3);
+
+	ASSERT_EQ(ret, 0);
+	ASSERT_EQ(*futex, 0);
+
+	if (variant->op & FUTEX_ROBUST_LIST32) {
+		exp = (uint64_t)(unsigned long)&lock.list;
+		exp &= ~0xFFFFFFFFULL;
+	}
+
+	ASSERT_EQ((uintptr_t)(unsigned long)head.list_op_pending, exp);
+}
+
 TEST_HARNESS_MAIN
diff --git a/tools/testing/selftests/futex/include/futextest.h b/tools/testing/selftests/futex/include/futextest.h
index 3d48e97..df33f31 100644
--- a/tools/testing/selftests/futex/include/futextest.h
+++ b/tools/testing/selftests/futex/include/futextest.h
@@ -38,6 +38,12 @@ typedef volatile u_int32_t futex_t;
 #ifndef FUTEX_CMP_REQUEUE_PI
 #define FUTEX_CMP_REQUEUE_PI		12
 #endif
+#ifndef FUTEX_ROBUST_UNLOCK
+#define FUTEX_ROBUST_UNLOCK		512
+#endif
+#ifndef FUTEX_ROBUST_LIST32
+#define FUTEX_ROBUST_LIST32		1024
+#endif
 #ifndef FUTEX_WAIT_REQUEUE_PI_PRIVATE
 #define FUTEX_WAIT_REQUEUE_PI_PRIVATE	(FUTEX_WAIT_REQUEUE_PI | \
 					 FUTEX_PRIVATE_FLAG)


* [tip: locking/core] Documentation: futex: Add a note about robust list race condition
  2026-06-02  9:10 ` [patch V5 14/16] Documentation: futex: Add a note about robust list race condition Thomas Gleixner
@ 2026-06-03 14:24   ` tip-bot2 for André Almeida
  0 siblings, 0 replies; 43+ messages in thread
From: tip-bot2 for André Almeida @ 2026-06-03 14:24 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: andrealmeid, Thomas Gleixner, Peter Zijlstra (Intel), x86,
	linux-kernel

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     3f63e2545978abda58f2cf7ff0d7a2942965e8cb
Gitweb:        https://git.kernel.org/tip/3f63e2545978abda58f2cf7ff0d7a2942965e8cb
Author:        André Almeida <andrealmeid@igalia.com>
AuthorDate:    Tue, 02 Jun 2026 11:10:16 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 03 Jun 2026 11:38:52 +02:00

Documentation: futex: Add a note about robust list race condition

Add a note to the documentation briefly explaining why doing a
robust futex release in userspace is racy and what should be done to avoid
it, and provide links to read more.

[ tglx: Fixed a few typos ]

Signed-off-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260329-tonyk-vdso_test-v2-1-b7db810e44a1@igalia.com
Link: https://patch.msgid.link/20260602090535.936286833@kernel.org
---
 Documentation/locking/robust-futex-ABI.rst | 44 +++++++++++++++++++++-
 1 file changed, 44 insertions(+)

diff --git a/Documentation/locking/robust-futex-ABI.rst b/Documentation/locking/robust-futex-ABI.rst
index f24904f..5e6a066 100644
--- a/Documentation/locking/robust-futex-ABI.rst
+++ b/Documentation/locking/robust-futex-ABI.rst
@@ -153,6 +153,9 @@ On removal:
  3) release the futex lock, and
  4) clear the 'lock_op_pending' word.
 
+Please note that the removal of a robust futex purely in userspace is
+racy. Refer to the next chapter to learn more and how to avoid this.
+
 On exit, the kernel will consider the address stored in
 'list_op_pending' and the address of each 'lock word' found by walking
 the list starting at 'head'.  For each such address, if the bottom 30
@@ -182,3 +185,44 @@ any point:
 When the kernel sees a list entry whose 'lock word' doesn't have the
 current threads TID in the lower 30 bits, it does nothing with that
 entry, and goes on to the next entry.
+
+Robust release is racy
+----------------------
+
+The removal of a robust futex from the list is racy when doing it solely in
+userspace. Quoting Thomas Gleixner for the explanation:
+
+  The robust futex unlock mechanism is racy in respect to the clearing of the
+  robust_list_head::list_op_pending pointer because unlock and clearing the
+  pointer are not atomic. The race window is between the unlock and clearing
+  the pending op pointer. If the task is forced to exit in this window, exit
+  will access a potentially invalid pending op pointer when cleaning up the
+  robust list. That happens if another task manages to unmap the object
+  containing the lock before the cleanup, which results in an UAF. In the
+  worst case this UAF can lead to memory corruption when unrelated content
+  has been mapped to the same address by the time the access happens.
+
+A full in-depth analysis can be read at
+https://lore.kernel.org/lkml/20260316162316.356674433@kernel.org/
+
+To overcome that, the kernel needs to participate in the lock release operation.
+This ensures that the release happens "atomically" with regard to releasing
+the lock and removing the address from ``list_op_pending``. If the release is
+interrupted by a signal, the kernel will also verify if it interrupted the
+release operation.
+
+For the contended unlock case, where other threads are waiting for the lock
+release, there's the ``FUTEX_ROBUST_UNLOCK`` operation feature flag for the
+``futex()`` system call, which must be used with one of the following
+operations: ``FUTEX_WAKE``, ``FUTEX_WAKE_BITSET`` or ``FUTEX_UNLOCK_PI``.
+The kernel will release the lock (set the futex word to zero) and clear the
+``list_op_pending`` field. Then, it will proceed with the normal wake path.
+
+For the non-contended path, there's still a race between checking the futex word
+and clearing the ``list_op_pending`` field. To solve this without the need of a
+complete system call, userspace should call the virtual syscall
+``__vdso_futex_robust_listXX_try_unlock()`` (where XX is either 32 or 64,
+depending on the size of the pointer). If the vDSO call succeeds, it means that
+it released the lock and cleared ``list_op_pending``. If it fails, that means
+that there are waiters for this lock and a call to ``futex()`` syscall with
+``FUTEX_ROBUST_UNLOCK`` is needed.


* [tip: locking/core] x86/vdso: Implement __vdso_futex_robust_try_unlock()
  2026-06-02  9:10 ` [patch V5 13/16] x86/vdso: Implement __vdso_futex_robust_try_unlock() Thomas Gleixner
@ 2026-06-03 14:24   ` tip-bot2 for Thomas Gleixner
  0 siblings, 0 replies; 43+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2026-06-03 14:24 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Peter Zijlstra (Intel), andrealmeid, Uros Bizjak,
	x86, linux-kernel

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     a2274cc0091ed4fdce10fad68d08c529b8d3e7dd
Gitweb:        https://git.kernel.org/tip/a2274cc0091ed4fdce10fad68d08c529b8d3e7dd
Author:        Thomas Gleixner <tglx@kernel.org>
AuthorDate:    Tue, 02 Jun 2026 11:10:12 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 03 Jun 2026 11:38:52 +02:00

x86/vdso: Implement __vdso_futex_robust_try_unlock()

When the FUTEX_ROBUST_UNLOCK mechanism is used for unlocking (PI-)futexes,
then the unlock sequence in userspace looks like this:

  1)	robust_list_set_op_pending(mutex);
  2)	robust_list_remove(mutex);

  	lval = gettid();
  3)	if (atomic_try_cmpxchg(&mutex->lock, lval, 0))
  4)		robust_list_clear_op_pending();
  	else
  5)		sys_futex(OP | FUTEX_ROBUST_UNLOCK, ....);

That still leaves a minimal race window between #3 and #4 where the mutex
could be acquired by some other task which observes that it is the last
user and:

  1) unmaps the mutex memory
  2) maps a different file, which ends up covering the same address

When the original task then exits before reaching #4, the kernel robust
list handling observes the pending op entry and tries to fix up user space.

In case that the newly mapped data contains the TID of the exiting thread
at the address of the mutex/futex the kernel will set the owner died bit in
that memory and therefore corrupt unrelated data.
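
As a rough illustration of that corruption, here is a simplified C sketch of
the exit-time check. It is an assumption-laden toy, not the kernel code: it
leaves out the atomic update, the wake-up and the pending op special cases,
and toy_handle_death() is a hypothetical name. The TOY_ constants mirror the
values of FUTEX_WAITERS, FUTEX_OWNER_DIED and FUTEX_TID_MASK in the futex
UAPI header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_FUTEX_WAITERS	0x80000000u
#define TOY_FUTEX_OWNER_DIED	0x40000000u
#define TOY_FUTEX_TID_MASK	0x3fffffffu

/*
 * If the word at the pending op address carries the exiting thread's TID,
 * mark it as "owner died". Returns true when the word was modified.
 */
static bool toy_handle_death(uint32_t *word, uint32_t tid)
{
	if ((*word & TOY_FUTEX_TID_MASK) != tid)
		return false;

	/* Keep the waiters bit, drop the TID, set the owner died bit. */
	*word = (*word & TOY_FUTEX_WAITERS) | TOY_FUTEX_OWNER_DIED;
	return true;
}
```

If the remapped memory happens to hold the value 1234 at that address and
the exiting thread's TID is 1234, the word is rewritten to 0x40000000 even
though it was never a futex.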

Provide a VDSO function which exposes the critical section window in the
VDSO symbol table. The resulting addresses are updated in the task's mm
when the VDSO is (re)map()'ed.

The core code detects when a task was interrupted within the critical
section and is about to deliver a signal. It then invokes an architecture
specific function which determines whether the pending op pointer has to be
cleared or not. The unlock assembly sequence on 64-bit is:

	mov		%esi,%eax	// Load TID into EAX
	xor		%ecx,%ecx	// Set ECX to 0
	lock cmpxchg	%ecx,(%rdi)	// Try the TID -> 0 transition
  .Lstart:
	jnz    		.Lend
	movq		%rcx,(%rdx)	// Clear list_op_pending
  .Lend:
	ret

So the decision can be simply based on the ZF state in regs->flags. The
pending op pointer is always in DX independent of the build mode
(32/64-bit) to make the pending op pointer retrieval uniform. The size of
the pointer is stored in the matching critical section range struct and
the core code retrieves it from there. So the pointer retrieval function
does not have to care. It is bit-size independent:

     return regs->flags & X86_EFLAGS_ZF ? regs->dx : NULL;
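
A standalone C sketch of that decision, for illustration only. The types
toy_regs and toy_cs_range and the function toy_fixup_pop() are hypothetical
stand-ins for pt_regs, the critical section range struct and the real
helpers. Treating the range as half-open, [start, end), is an assumption of
this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_EFLAGS_ZF	0x40UL	/* Zero flag, bit 6 of EFLAGS */

/* Hypothetical cut-down register snapshot of the interrupted task. */
struct toy_regs {
	unsigned long ip;
	unsigned long flags;
	unsigned long dx;
};

/* Hypothetical critical section range, .Lstart to .Lend. */
struct toy_cs_range {
	unsigned long start;
	unsigned long end;
};

/*
 * Returns the pending op pointer which has to be cleared, or NULL when the
 * task was interrupted outside the critical section or the cmpxchg failed.
 */
static void *toy_fixup_pop(const struct toy_regs *regs,
			   const struct toy_cs_range *cs)
{
	if (regs->ip < cs->start || regs->ip >= cs->end)
		return NULL;

	/* ZF set means the TID -> 0 transition succeeded. */
	return (regs->flags & TOY_EFLAGS_ZF) ? (void *)regs->dx : NULL;
}
```

The caller would then store a zero of the matching pointer size at the
returned address before signal delivery.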

There are two entry points to handle the different robust list pending op
pointer size:

	__vdso_futex_robust_list64_try_unlock()
	__vdso_futex_robust_list32_try_unlock()

The 32-bit VDSO provides only __vdso_futex_robust_list32_try_unlock().

The 64-bit VDSO always provides __vdso_futex_robust_list64_try_unlock() and,
when COMPAT is enabled, also the list32 variant, which is required to
support multi-size robust list pointers used by gaming emulators.

The unlock function is inspired by an idea from Mathieu Desnoyers.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: André Almeida <andrealmeid@igalia.com>
Acked-by: Uros Bizjak <ubizjak@gmail.com>
Link: https://lore.kernel.org/20260311185409.1988269-1-mathieu.desnoyers@efficios.com
Link: https://patch.msgid.link/20260602090535.883796247@kernel.org
---
 arch/x86/Kconfig                         |  1 +-
 arch/x86/entry/vdso/common/vfutex.c      | 71 +++++++++++++++++++++++-
 arch/x86/entry/vdso/vdso32/Makefile      |  5 +-
 arch/x86/entry/vdso/vdso32/vdso32.lds.S  |  3 +-
 arch/x86/entry/vdso/vdso32/vfutex.c      |  1 +-
 arch/x86/entry/vdso/vdso64/Makefile      |  7 +-
 arch/x86/entry/vdso/vdso64/vdso64.lds.S  |  7 ++-
 arch/x86/entry/vdso/vdso64/vdsox32.lds.S |  7 ++-
 arch/x86/entry/vdso/vdso64/vfutex.c      |  1 +-
 arch/x86/include/asm/futex_robust.h      | 19 ++++++-
 10 files changed, 117 insertions(+), 5 deletions(-)
 create mode 100644 arch/x86/entry/vdso/common/vfutex.c
 create mode 100644 arch/x86/entry/vdso/vdso32/vfutex.c
 create mode 100644 arch/x86/entry/vdso/vdso64/vfutex.c
 create mode 100644 arch/x86/include/asm/futex_robust.h

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 1ce62a9..fdaef60 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -239,6 +239,7 @@ config X86
 	select HAVE_EFFICIENT_UNALIGNED_ACCESS
 	select HAVE_EISA			if X86_32
 	select HAVE_EXIT_THREAD
+	select HAVE_FUTEX_ROBUST_UNLOCK
 	select HAVE_GENERIC_TIF_BITS
 	select HAVE_GUP_FAST
 	select HAVE_FENTRY			if X86_64 || DYNAMIC_FTRACE
diff --git a/arch/x86/entry/vdso/common/vfutex.c b/arch/x86/entry/vdso/common/vfutex.c
new file mode 100644
index 0000000..454f059
--- /dev/null
+++ b/arch/x86/entry/vdso/common/vfutex.c
@@ -0,0 +1,71 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include <vdso/futex.h>
+
+/*
+ * Assembly template for the try unlock functions. The basic functionality is:
+ *
+ *		mov		%esi, %eax	Move the TID into EAX
+ *		xor		%ecx, %ecx	Clear ECX
+ *		lock cmpxchgl	%ecx, (%rdi)	Attempt the TID -> 0 transition
+ * .Lcs_start:					Start of the critical section
+ *		jnz		.Lcs_end	If cmpxchgl failed jump to the end
+ * .Lcs_success:				Start of the success section
+ *		movq		%rcx, (%rdx)	Set the pending op pointer to 0
+ * .Lcs_end:					End of the critical section
+ *
+ * .Lcs_start and .Lcs_end establish the critical section range. .Lcs_success is
+ * technically not required, but there for illustration, debugging and testing.
+ *
+ * When CONFIG_COMPAT is enabled then the 64-bit VDSO provides two functions.
+ * One for the regular 64-bit sized pending operation pointer and one for a
+ * 32-bit sized pointer to support gaming emulators.
+ *
+ * The 32-bit VDSO provides only the one for 32-bit sized pointers.
+ */
+#define __stringify_1(x...)	#x
+#define __stringify(x...)	__stringify_1(x)
+
+#define LABEL(prefix, which)	__stringify(prefix##_try_unlock_cs_##which:)
+
+#define JNZ_END(prefix)		"jnz " __stringify(prefix) "_try_unlock_cs_end\n"
+
+#define CLEAR_POPQ		"movq	%[zero],  %a[pop]\n"
+#define CLEAR_POPL		"movl	%k[zero], %a[pop]\n"
+
+#define futex_robust_try_unlock(prefix, clear_pop, __lock, __tid, __pop)\
+({									\
+	asm volatile (							\
+		"						\n"	\
+		"	lock cmpxchgl	%k[zero], %a[lock]	\n"	\
+		"						\n"	\
+		LABEL(prefix, start)					\
+		"						\n"	\
+		JNZ_END(prefix)						\
+		"						\n"	\
+		LABEL(prefix, success)					\
+		"						\n"	\
+			clear_pop					\
+		"						\n"	\
+		LABEL(prefix, end)					\
+		: [tid]   "+&a" (__tid)					\
+		: [lock]  "D"   (__lock),				\
+		  [pop]   "d"   (__pop),				\
+		  [zero]  "r"   (0UL)					\
+		: "memory"						\
+	);								\
+	__tid;								\
+})
+
+#ifdef __x86_64__
+__u32 __vdso_futex_robust_list64_try_unlock(__u32 *lock, __u32 tid, __u64 *pop)
+{
+	return futex_robust_try_unlock(__futex_list64, CLEAR_POPQ, lock, tid, pop);
+}
+#endif /* __x86_64__ */
+
+#if defined(CONFIG_X86_32) || defined(CONFIG_COMPAT)
+__u32 __vdso_futex_robust_list32_try_unlock(__u32 *lock, __u32 tid, __u32 *pop)
+{
+	return futex_robust_try_unlock(__futex_list32, CLEAR_POPL, lock, tid, pop);
+}
+#endif /* CONFIG_X86_32 || CONFIG_COMPAT */
diff --git a/arch/x86/entry/vdso/vdso32/Makefile b/arch/x86/entry/vdso/vdso32/Makefile
index ded4fc6..ab4b1f6 100644
--- a/arch/x86/entry/vdso/vdso32/Makefile
+++ b/arch/x86/entry/vdso/vdso32/Makefile
@@ -7,8 +7,9 @@
 vdsos-y			:= 32
 
 # Files to link into the vDSO:
-vobjs-y			:= note.o vclock_gettime.o vgetcpu.o
-vobjs-y			+= system_call.o sigreturn.o
+vobjs-y					:= note.o vclock_gettime.o vgetcpu.o
+vobjs-y					+= system_call.o sigreturn.o
+vobjs-$(CONFIG_FUTEX_ROBUST_UNLOCK)	+= vfutex.o
 
 # Compilation flags
 flags-y			:= -DBUILD_VDSO32 -m32 -mregparm=0
diff --git a/arch/x86/entry/vdso/vdso32/vdso32.lds.S b/arch/x86/entry/vdso/vdso32/vdso32.lds.S
index 55554f8..cee8f7f 100644
--- a/arch/x86/entry/vdso/vdso32/vdso32.lds.S
+++ b/arch/x86/entry/vdso/vdso32/vdso32.lds.S
@@ -30,6 +30,9 @@ VERSION
 		__vdso_clock_gettime64;
 		__vdso_clock_getres_time64;
 		__vdso_getcpu;
+#ifdef CONFIG_FUTEX_ROBUST_UNLOCK
+		__vdso_futex_robust_list32_try_unlock;
+#endif
 	};
 
 	LINUX_2.5 {
diff --git a/arch/x86/entry/vdso/vdso32/vfutex.c b/arch/x86/entry/vdso/vdso32/vfutex.c
new file mode 100644
index 0000000..940a6ee
--- /dev/null
+++ b/arch/x86/entry/vdso/vdso32/vfutex.c
@@ -0,0 +1 @@
+#include "common/vfutex.c"
diff --git a/arch/x86/entry/vdso/vdso64/Makefile b/arch/x86/entry/vdso/vdso64/Makefile
index bfffaf1..7c07900 100644
--- a/arch/x86/entry/vdso/vdso64/Makefile
+++ b/arch/x86/entry/vdso/vdso64/Makefile
@@ -8,9 +8,10 @@ vdsos-y				:= 64
 vdsos-$(CONFIG_X86_X32_ABI)	+= x32
 
 # Files to link into the vDSO:
-vobjs-y				:= note.o vclock_gettime.o vgetcpu.o
-vobjs-y				+= vgetrandom.o vgetrandom-chacha.o
-vobjs-$(CONFIG_X86_SGX)		+= vsgx.o
+vobjs-y					:= note.o vclock_gettime.o vgetcpu.o
+vobjs-y					+= vgetrandom.o vgetrandom-chacha.o
+vobjs-$(CONFIG_X86_SGX)			+= vsgx.o
+vobjs-$(CONFIG_FUTEX_ROBUST_UNLOCK)	+= vfutex.o
 
 # Compilation flags
 flags-y				:= -DBUILD_VDSO64 -m64 -mcmodel=small
diff --git a/arch/x86/entry/vdso/vdso64/vdso64.lds.S b/arch/x86/entry/vdso/vdso64/vdso64.lds.S
index 5ce3f2b..4a72122 100644
--- a/arch/x86/entry/vdso/vdso64/vdso64.lds.S
+++ b/arch/x86/entry/vdso/vdso64/vdso64.lds.S
@@ -32,6 +32,13 @@ VERSION {
 #endif
 		getrandom;
 		__vdso_getrandom;
+
+#ifdef CONFIG_FUTEX_ROBUST_UNLOCK
+		__vdso_futex_robust_list64_try_unlock;
+#ifdef CONFIG_COMPAT
+		__vdso_futex_robust_list32_try_unlock;
+#endif
+#endif
 	local: *;
 	};
 }
diff --git a/arch/x86/entry/vdso/vdso64/vdsox32.lds.S b/arch/x86/entry/vdso/vdso64/vdsox32.lds.S
index 3dbd20c..b917dc6 100644
--- a/arch/x86/entry/vdso/vdso64/vdsox32.lds.S
+++ b/arch/x86/entry/vdso/vdso64/vdsox32.lds.S
@@ -22,6 +22,13 @@ VERSION {
 		__vdso_getcpu;
 		__vdso_time;
 		__vdso_clock_getres;
+
+#ifdef CONFIG_FUTEX_ROBUST_UNLOCK
+		__vdso_futex_robust_list64_try_unlock;
+#ifdef CONFIG_COMPAT
+		__vdso_futex_robust_list32_try_unlock;
+#endif
+#endif
 	local: *;
 	};
 }
diff --git a/arch/x86/entry/vdso/vdso64/vfutex.c b/arch/x86/entry/vdso/vdso64/vfutex.c
new file mode 100644
index 0000000..940a6ee
--- /dev/null
+++ b/arch/x86/entry/vdso/vdso64/vfutex.c
@@ -0,0 +1 @@
+#include "common/vfutex.c"
diff --git a/arch/x86/include/asm/futex_robust.h b/arch/x86/include/asm/futex_robust.h
new file mode 100644
index 0000000..e879547
--- /dev/null
+++ b/arch/x86/include/asm/futex_robust.h
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_FUTEX_ROBUST_H
+#define _ASM_X86_FUTEX_ROBUST_H
+
+#include <asm/ptrace.h>
+
+static __always_inline void __user *x86_futex_robust_unlock_get_pop(struct pt_regs *regs)
+{
+	/*
+	 * If ZF is set then the cmpxchg succeeded and the pending op pointer
+	 * needs to be cleared.
+	 */
+	return regs->flags & X86_EFLAGS_ZF ? (void __user *)regs->dx : NULL;
+}
+
+#define arch_futex_robust_unlock_get_pop(regs)	\
+	x86_futex_robust_unlock_get_pop(regs)
+
+#endif /* _ASM_X86_FUTEX_ROBUST_H */

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [tip: locking/core] x86/vdso: Prepare for robust futex unlock support
  2026-06-02  9:10 ` [patch V5 12/16] x86/vdso: Prepare for robust futex unlock support Thomas Gleixner
@ 2026-06-03 14:24   ` tip-bot2 for Thomas Gleixner
  0 siblings, 0 replies; 43+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2026-06-03 14:24 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Peter Zijlstra (Intel), andrealmeid, x86,
	linux-kernel

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     61cfc8e372d1971e0a96d3f1f8b5ee29916b3385
Gitweb:        https://git.kernel.org/tip/61cfc8e372d1971e0a96d3f1f8b5ee29916b3385
Author:        Thomas Gleixner <tglx@kernel.org>
AuthorDate:    Tue, 02 Jun 2026 11:10:08 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 03 Jun 2026 11:38:52 +02:00

x86/vdso: Prepare for robust futex unlock support

There will be a VDSO function to unlock non-contended robust futexes in
user space. The unlock sequence is racy vs. clearing the list_op_pending
pointer in the task's robust list head. To plug this race the kernel needs
to know the critical section window so it can clear the pointer when the
task is interrupted within that race window. The window is determined by
labels in the inline assembly.

Add these symbols to the vdso2c generator and use them in the VDSO VMA code
to update the critical section addresses in mm_struct::futex on (re)map().

The symbols are not exported to user space, but available in the debug
version of the vDSO.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: André Almeida <andrealmeid@igalia.com>
Link: https://patch.msgid.link/20260602090535.828312645@kernel.org
---
 arch/x86/entry/vdso/vma.c   | 29 +++++++++++++++++++++++++++++
 arch/x86/include/asm/vdso.h |  4 ++++
 arch/x86/tools/vdso2c.c     | 16 ++++++++++------
 3 files changed, 43 insertions(+), 6 deletions(-)

diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index a6bfcc8..9a953e7 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -6,6 +6,7 @@
  */
 #include <linux/mm.h>
 #include <linux/err.h>
+#include <linux/futex.h>
 #include <linux/sched.h>
 #include <linux/sched/task_stack.h>
 #include <linux/slab.h>
@@ -73,6 +74,31 @@ static void vdso_fix_landing(const struct vdso_image *image,
 		regs->ip = new_vma->vm_start + ipoffset;
 }
 
+#ifdef CONFIG_FUTEX_ROBUST_UNLOCK
+static void vdso_futex_robust_unlock_update_ips(void)
+{
+	const struct vdso_image *image = current->mm->context.vdso_image;
+	unsigned long vdso = (unsigned long) current->mm->context.vdso;
+	struct futex_mm_data *fd = &current->mm->futex;
+	unsigned int idx = 0;
+
+	futex_reset_cs_ranges(fd);
+
+#ifdef CONFIG_X86_64
+	futex_set_vdso_cs_range(fd, idx, vdso + image->sym___futex_list64_try_unlock_cs_start,
+				vdso + image->sym___futex_list64_try_unlock_cs_end, false);
+	idx++;
+#endif /* CONFIG_X86_64 */
+
+#if defined(CONFIG_X86_32) || defined(CONFIG_COMPAT)
+	futex_set_vdso_cs_range(fd, idx, vdso + image->sym___futex_list32_try_unlock_cs_start,
+				vdso + image->sym___futex_list32_try_unlock_cs_end, true);
+#endif /* CONFIG_X86_32 || CONFIG_COMPAT */
+}
+#else
+static inline void vdso_futex_robust_unlock_update_ips(void) { }
+#endif
+
 static int vdso_mremap(const struct vm_special_mapping *sm,
 		struct vm_area_struct *new_vma)
 {
@@ -80,6 +106,7 @@ static int vdso_mremap(const struct vm_special_mapping *sm,
 
 	vdso_fix_landing(image, new_vma);
 	current->mm->context.vdso = (void __user *)new_vma->vm_start;
+	vdso_futex_robust_unlock_update_ips();
 
 	return 0;
 }
@@ -185,6 +212,8 @@ static int map_vdso(const struct vdso_image *image, unsigned long addr)
 	current->mm->context.vdso = (void __user *)text_start;
 	current->mm->context.vdso_image = image;
 
+	vdso_futex_robust_unlock_update_ips();
+
 up_fail:
 	mmap_write_unlock(mm);
 	return ret;
diff --git a/arch/x86/include/asm/vdso.h b/arch/x86/include/asm/vdso.h
index f2d4921..4e73515 100644
--- a/arch/x86/include/asm/vdso.h
+++ b/arch/x86/include/asm/vdso.h
@@ -24,6 +24,10 @@ struct vdso_image {
 	long sym_int80_landing_pad;
 	long sym_vdso32_sigreturn_landing_pad;
 	long sym_vdso32_rt_sigreturn_landing_pad;
+	long sym___futex_list64_try_unlock_cs_start;
+	long sym___futex_list64_try_unlock_cs_end;
+	long sym___futex_list32_try_unlock_cs_start;
+	long sym___futex_list32_try_unlock_cs_end;
 };
 
 extern const struct vdso_image vdso64_image;
diff --git a/arch/x86/tools/vdso2c.c b/arch/x86/tools/vdso2c.c
index b8a5557..64a636b 100644
--- a/arch/x86/tools/vdso2c.c
+++ b/arch/x86/tools/vdso2c.c
@@ -75,12 +75,16 @@ struct vdso_sym {
 };
 
 struct vdso_sym required_syms[] = {
-	{"__kernel_vsyscall", true},
-	{"__kernel_sigreturn", true},
-	{"__kernel_rt_sigreturn", true},
-	{"int80_landing_pad", true},
-	{"vdso32_rt_sigreturn_landing_pad", true},
-	{"vdso32_sigreturn_landing_pad", true},
+	{"__kernel_vsyscall",				true},
+	{"__kernel_sigreturn",				true},
+	{"__kernel_rt_sigreturn",			true},
+	{"int80_landing_pad",				true},
+	{"vdso32_rt_sigreturn_landing_pad",		true},
+	{"vdso32_sigreturn_landing_pad",		true},
+	{"__futex_list64_try_unlock_cs_start",		true},
+	{"__futex_list64_try_unlock_cs_end",		true},
+	{"__futex_list32_try_unlock_cs_start",		true},
+	{"__futex_list32_try_unlock_cs_end",		true},
 };
 
 __attribute__((format(printf, 1, 2))) __attribute__((noreturn))

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [tip: locking/core] futex: Provide infrastructure to plug the non contended robust futex unlock race
  2026-06-02  9:10 ` [patch V5 11/16] futex: Provide infrastructure to plug the non contended robust futex unlock race Thomas Gleixner
                     ` (2 preceding siblings ...)
  2026-06-03  9:23   ` Peter Zijlstra
@ 2026-06-03 14:24   ` tip-bot2 for Thomas Gleixner
  3 siblings, 0 replies; 43+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2026-06-03 14:24 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Peter Zijlstra (Intel), andrealmeid, x86,
	linux-kernel

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     7010c39d8fc5063af69ee63f905e592e046f8e5d
Gitweb:        https://git.kernel.org/tip/7010c39d8fc5063af69ee63f905e592e046f8e5d
Author:        Thomas Gleixner <tglx@kernel.org>
AuthorDate:    Tue, 02 Jun 2026 11:10:04 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 03 Jun 2026 11:38:52 +02:00

futex: Provide infrastructure to plug the non contended robust futex unlock race

When the FUTEX_ROBUST_UNLOCK mechanism is used for unlocking (PI-)futexes,
then the unlock sequence in user space looks like this:

  1)	robust_list_set_op_pending(mutex);
  2)	robust_list_remove(mutex);

  	lval = gettid();
  3)	if (atomic_try_cmpxchg(&mutex->lock, lval, 0))
  4)		robust_list_clear_op_pending();
  	else
  5)		sys_futex(OP | FUTEX_ROBUST_UNLOCK, ....);

That still leaves a minimal race window between #3 and #4 where the mutex
could be acquired by some other task, which observes that it is the last
user and:

  1) unmaps the mutex memory
  2) maps a different file, which ends up covering the same address

When the original task then exits before reaching #4, the kernel robust
list handling observes the pending op entry and tries to fix up user space.

In case that the newly mapped data contains the TID of the exiting thread
at the address of the mutex/futex the kernel will set the owner died bit in
that memory and therefore corrupt unrelated data.

On X86 this boils down to this simplified assembly sequence:

		mov		%esi,%eax	// Load TID into EAX
        	xor		%ecx,%ecx	// Set ECX to 0
   #3		lock cmpxchg	%ecx,(%rdi)	// Try the TID -> 0 transition
	.Lstart:
		jnz    		.Lend
   #4 		movq		%rcx,(%rdx)	// Clear list_op_pending
	.Lend:

If the cmpxchg() succeeds and the task is interrupted before it can clear
list_op_pending in the robust list head (#4) and the task crashes in a
signal handler or gets killed then it ends up in do_exit() and subsequently
in the robust list handling, which then might run into the unmap/map issue
described above.

This is only relevant when user space was interrupted and a signal is
pending. The fix-up has to be done before signal delivery is attempted
because:

   1) The signal might be fatal so get_signal() ends up in do_exit()

   2) The signal handler might crash or the task is killed before returning
      from the handler. At that point the instruction pointer in pt_regs is
      no longer the instruction pointer of the initially interrupted unlock
      sequence.

The right place to handle this is in __exit_to_user_mode_loop() before
invoking arch_do_signal_or_restart() as this obviously covers both
scenarios.

As this is only relevant when the task was interrupted in user space, this
is tied to RSEQ and the generic entry code as RSEQ keeps track of user
space interrupts unconditionally even if the task does not have a RSEQ
region installed. That makes the decision very lightweight:

       if (current->rseq.user_irq && within(regs, csr->unlock_ip_range))
       		futex_fixup_robust_unlock(regs, csr);

futex_fixup_robust_unlock() then invokes an architecture-specific function
to return the pending op pointer or NULL. The function evaluates the
register content to decide whether the pending ops pointer in the robust
list head needs to be cleared.

Assuming the above unlock sequence, then on x86 this decision is the
trivial evaluation of the zero flag:

	return regs->flags & X86_EFLAGS_ZF ? regs->dx : NULL;

Other architectures might need to do more complex evaluations due to LLSC,
but the approach is valid in general. The size of the pointer is determined
from the matching range struct, which covers both 32-bit and 64-bit builds
including COMPAT.
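
As a rough user-space model of the decision described above (not the kernel
code itself), the check can be sketched as follows. The model_* names and the
reduced register struct are hypothetical stand-ins for pt_regs and
struct futex_unlock_cs_range; only the zero flag bit value (0x40) and the
range comparison are taken as given:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_EFLAGS_ZF	0x40UL	/* bit 6 of EFLAGS: the zero flag */

/* Hypothetical stand-in for the few pt_regs fields the decision needs */
struct model_regs {
	unsigned long	ip;	/* interrupted instruction pointer */
	unsigned long	flags;	/* EFLAGS at the time of the interrupt */
	unsigned long	dx;	/* holds the pending op pointer argument */
};

/* Mirrors the fields of struct futex_unlock_cs_range from this series */
struct model_cs_range {
	unsigned long	start_ip;
	unsigned int	len;
	unsigned int	pop_size32;
};

/* True if the interrupted IP lies in [start_ip, start_ip + len) */
static inline bool model_within(const struct model_regs *regs,
				const struct model_cs_range *csr)
{
	return regs->ip >= csr->start_ip && regs->ip < csr->start_ip + csr->len;
}

/*
 * Returns the address of the pending op pointer which needs to be
 * cleared, or 0 if no fix-up is required.
 */
static inline unsigned long model_get_pop(const struct model_regs *regs,
					  const struct model_cs_range *csr)
{
	if (!model_within(regs, csr))
		return 0;
	/* ZF set means the cmpxchg succeeded but the store has not run yet */
	return (regs->flags & MODEL_EFLAGS_ZF) ? regs->dx : 0;
}
```

The model only illustrates why the check is cheap: one range comparison and
one flag test. The width of the store which clears the pointer would then be
selected by pop_size32 of the matching range.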

The unlock sequence is going to be placed in the VDSO so that the kernel
can keep everything synchronized, especially the register usage. The
resulting code sequence for user space is:

   if (__vdso_futex_robust_list$SZ_try_unlock(lock, tid, &pending_op) != tid)
 	err = sys_futex($OP | FUTEX_ROBUST_UNLOCK,....);

Both the VDSO unlock and the kernel side unlock ensure that the pending_op
pointer is always cleared when the lock becomes unlocked.
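
The semantics of the try_unlock helper can be modelled with C11 atomics as
below. This is an illustration of the return value contract only, assuming a
64-bit pending op pointer; model_try_unlock() is a hypothetical name and a
plain C version like this does not close the race, because the kernel has no
knowledge of the instruction range between the compare-exchange and the store:

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Model of the documented behaviour: try the TID -> 0 transition and
 * clear the pending op pointer on success. Returns the lock value
 * observed by the compare-exchange, which equals @tid on success.
 */
static inline uint32_t model_try_unlock(_Atomic uint32_t *lock, uint32_t tid,
					uint64_t *pop)
{
	uint32_t expected = tid;

	if (atomic_compare_exchange_strong(lock, &expected, 0))
		*pop = 0;	/* uncontended: clear list_op_pending */

	/* On failure @expected was updated to the observed lock value */
	return expected;
}
```

A caller compares the return value against its TID and falls back to
sys_futex() with FUTEX_ROBUST_UNLOCK set when they differ, for example
because the FUTEX_WAITERS bit is set in the lock word.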

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: André Almeida <andrealmeid@igalia.com>
Link: https://patch.msgid.link/20260602090535.773669210@kernel.org
---
 include/linux/futex.h | 39 +++++++++++++++++++++++++++++++-
 include/vdso/futex.h  | 52 ++++++++++++++++++++++++++++++++++++++++++-
 kernel/entry/common.c |  9 ++++---
 kernel/futex/core.c   | 18 +++++++++++++++-
 4 files changed, 114 insertions(+), 4 deletions(-)
 create mode 100644 include/vdso/futex.h

diff --git a/include/linux/futex.h b/include/linux/futex.h
index cb2a182..51f4ccd 100644
--- a/include/linux/futex.h
+++ b/include/linux/futex.h
@@ -105,7 +105,41 @@ static inline int futex_hash_free(struct mm_struct *mm) { return 0; }
 #endif /* !CONFIG_FUTEX */
 
 #ifdef CONFIG_FUTEX_ROBUST_UNLOCK
+#include <asm/futex_robust.h>
+
 void futex_reset_cs_ranges(struct futex_mm_data *fd);
+void __futex_fixup_robust_unlock(struct pt_regs *regs, struct futex_unlock_cs_range *csr);
+
+static inline bool futex_within_robust_unlock(struct pt_regs *regs,
+					      struct futex_unlock_cs_range *csr)
+{
+	unsigned long ip = instruction_pointer(regs);
+
+	return ip >= csr->start_ip && ip < csr->start_ip + csr->len;
+}
+
+static inline void futex_fixup_robust_unlock(struct pt_regs *regs)
+{
+	struct futex_unlock_cs_range *csr;
+
+	/*
+	 * Avoid dereferencing current->mm if not returning from interrupt.
+	 * current->rseq.event is going to be used subsequently, so bringing the
+	 * cache line in is not a big deal.
+	 */
+	if (!current->rseq.event.user_irq)
+		return;
+
+	csr = current->mm->futex.unlock.cs_ranges;
+
+	/* The loop is optimized out for !COMPAT */
+	for (int r = 0; r < FUTEX_ROBUST_MAX_CS_RANGES; r++, csr++) {
+		if (unlikely(futex_within_robust_unlock(regs, csr))) {
+			__futex_fixup_robust_unlock(regs, csr);
+			return;
+		}
+	}
+}
 
 static inline void futex_set_vdso_cs_range(struct futex_mm_data *fd, unsigned int idx,
 					   unsigned long start, unsigned long end, bool sz32)
@@ -114,7 +148,10 @@ static inline void futex_set_vdso_cs_range(struct futex_mm_data *fd, unsigned in
 	fd->unlock.cs_ranges[idx].len = end - start;
 	fd->unlock.cs_ranges[idx].pop_size32 = sz32;
 }
-#endif /* CONFIG_FUTEX_ROBUST_UNLOCK */
+#else /* CONFIG_FUTEX_ROBUST_UNLOCK */
+static inline void futex_fixup_robust_unlock(struct pt_regs *regs) { }
+#endif /* !CONFIG_FUTEX_ROBUST_UNLOCK */
+
 
 #if defined(CONFIG_FUTEX_PRIVATE_HASH) || defined(CONFIG_FUTEX_ROBUST_UNLOCK)
 void futex_mm_init(struct mm_struct *mm);
diff --git a/include/vdso/futex.h b/include/vdso/futex.h
new file mode 100644
index 0000000..3cd175e
--- /dev/null
+++ b/include/vdso/futex.h
@@ -0,0 +1,52 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _VDSO_FUTEX_H
+#define _VDSO_FUTEX_H
+
+#include <uapi/linux/types.h>
+
+/**
+ * __vdso_futex_robust_list64_try_unlock - Try to unlock an uncontended robust futex
+ *					   with a 64-bit pending op pointer
+ * @lock:	Pointer to the futex lock object
+ * @tid:	The TID of the calling task
+ * @pop:	Pointer to the task's robust_list_head::list_pending_op
+ *
+ * Return: The content of *@lock. On success this is the same as @tid.
+ *
+ * The function implements:
+ *	if (atomic_try_cmpxchg(lock, &tid, 0))
+ *		*pop = NULL;
+ *	return tid;
+ *
+ * There is a race between a successful unlock and clearing the pending op
+ * pointer in the robust list head. If the calling task is interrupted in the
+ * race window and has to handle a (fatal) signal on return to user space then
+ * the kernel handles the clearing of @pending_op before attempting to deliver
+ * the signal. That ensures that a task cannot exit with a potentially invalid
+ * pending op pointer.
+ *
+ * User space uses it in the following way:
+ *
+ * if (__vdso_futex_robust_list64_try_unlock(lock, tid, &pending_op) != tid)
+ *	err = sys_futex($OP | FUTEX_ROBUST_UNLOCK,....);
+ *
+ * If the unlock attempt fails due to the FUTEX_WAITERS bit set in the lock,
+ * then the syscall does the unlock, clears the pending op pointer and wakes the
+ * requested number of waiters.
+ */
+__u32 __vdso_futex_robust_list64_try_unlock(__u32 *lock, __u32 tid, __u64 *pop);
+
+/**
+ * __vdso_futex_robust_list32_try_unlock - Try to unlock an uncontended robust futex
+ *					   with a 32-bit pending op pointer
+ * @lock:	Pointer to the futex lock object
+ * @tid:	The TID of the calling task
+ * @pop:	Pointer to the task's robust_list_head::list_pending_op
+ *
+ * Return: The content of *@lock. On success this is the same as @tid.
+ *
+ * Same as __vdso_futex_robust_list64_try_unlock() just with a 32-bit @pop pointer.
+ */
+__u32 __vdso_futex_robust_list32_try_unlock(__u32 *lock, __u32 tid, __u32 *pop);
+
+#endif
diff --git a/kernel/entry/common.c b/kernel/entry/common.c
index 19d2244..e3d381f 100644
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -1,11 +1,12 @@
 // SPDX-License-Identifier: GPL-2.0
 
-#include <linux/irq-entry-common.h>
-#include <linux/resume_user_mode.h>
+#include <linux/futex.h>
 #include <linux/highmem.h>
+#include <linux/irq-entry-common.h>
 #include <linux/jump_label.h>
 #include <linux/kmsan.h>
 #include <linux/livepatch.h>
+#include <linux/resume_user_mode.h>
 #include <linux/tick.h>
 
 /* Workaround to allow gradual conversion of architecture code */
@@ -60,8 +61,10 @@ static __always_inline unsigned long __exit_to_user_mode_loop(struct pt_regs *re
 		if (ti_work & _TIF_PATCH_PENDING)
 			klp_update_patch_state(current);
 
-		if (ti_work & (_TIF_SIGPENDING | _TIF_NOTIFY_SIGNAL))
+		if (ti_work & (_TIF_SIGPENDING | _TIF_NOTIFY_SIGNAL)) {
+			futex_fixup_robust_unlock(regs);
 			arch_do_signal_or_restart(regs);
+		}
 
 		if (ti_work & _TIF_NOTIFY_RESUME)
 			resume_user_mode_work(regs);
diff --git a/kernel/futex/core.c b/kernel/futex/core.c
index aad6e50..6ea4a97 100644
--- a/kernel/futex/core.c
+++ b/kernel/futex/core.c
@@ -46,6 +46,8 @@
 #include <linux/slab.h>
 #include <linux/vmalloc.h>
 
+#include <vdso/futex.h>
+
 #include "futex.h"
 #include "../locking/rtmutex_common.h"
 
@@ -1446,6 +1448,22 @@ bool futex_robust_list_clear_pending(void __user *pop, unsigned int flags)
 	return robust_list_clear_pending(pop);
 }
 
+#ifdef CONFIG_FUTEX_ROBUST_UNLOCK
+void __futex_fixup_robust_unlock(struct pt_regs *regs, struct futex_unlock_cs_range *csr)
+{
+	/*
+	 * arch_futex_robust_unlock_get_pop() returns the list pending op pointer from
+	 * @regs if the try_cmpxchg() succeeded.
+	 */
+	void __user *pop = arch_futex_robust_unlock_get_pop(regs);
+
+	if (!pop)
+		return;
+
+	futex_robust_list_clear_pending(pop, csr->pop_size32 ? FLAGS_ROBUST_LIST32 : 0);
+}
+#endif /* CONFIG_FUTEX_ROBUST_UNLOCK */
+
 static void futex_cleanup(struct task_struct *tsk)
 {
 	if (unlikely(tsk->futex.robust_list)) {

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [tip: locking/core] futex: Add robust futex unlock IP range
  2026-06-02  9:09 ` [patch V5 10/16] futex: Add robust futex unlock IP range Thomas Gleixner
@ 2026-06-03 14:24   ` tip-bot2 for Thomas Gleixner
  0 siblings, 0 replies; 43+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2026-06-03 14:24 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Peter Zijlstra (Intel), andrealmeid, x86,
	linux-kernel

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     042df0c1d48609a85580dcbaff498c95ced20a5f
Gitweb:        https://git.kernel.org/tip/042df0c1d48609a85580dcbaff498c95ced20a5f
Author:        Thomas Gleixner <tglx@kernel.org>
AuthorDate:    Tue, 02 Jun 2026 11:09:59 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 03 Jun 2026 11:38:51 +02:00

futex: Add robust futex unlock IP range

There will be a VDSO function to unlock robust futexes in user space. The
unlock sequence is racy vs. clearing the list_op_pending pointer in the
task's robust list head. To plug this race the kernel needs to know the
instruction window. As the VDSO is per MM the addresses are stored in
mm_struct::futex.

Architectures which implement support for this have to update these
addresses when the VDSO is (re)mapped and indicate the pending op pointer
size that matches the IP range.

Arguably this could be resolved by chasing mm->context->vdso->image, but
that's architecture specific and requires touching quite a few cache
lines. Having it in mm::futex reduces the cache line impact and avoids
having yet another set of architecture specific functionality.

To support multi size robust list applications (gaming) this provides two
ranges when COMPAT is enabled.
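
A minimal user-space sketch of the per-MM range table follows, assuming
COMPAT is enabled so that two slots exist. The sketch_* names are
hypothetical; the layout and the ~0UL invalidation mirror what the patch
below does:

```c
#include <stdbool.h>
#include <string.h>

#define SKETCH_MAX_CS_RANGES	2	/* assumption: COMPAT enabled */

struct sketch_cs_range {
	unsigned long	start_ip;
	unsigned int	len;
	unsigned int	pop_size32;
};

struct sketch_mm_futex {
	struct sketch_cs_range	cs_ranges[SKETCH_MAX_CS_RANGES];
};

/* Zero all slots, then make start_ip unreachable for the range check */
static inline void sketch_reset_cs_ranges(struct sketch_mm_futex *fd)
{
	memset(fd->cs_ranges, 0, sizeof(fd->cs_ranges));
	for (int i = 0; i < SKETCH_MAX_CS_RANGES; i++)
		fd->cs_ranges[i].start_ip = ~0UL;
}

/* Record one critical section, as done when the VDSO is (re)mapped */
static inline void sketch_set_cs_range(struct sketch_mm_futex *fd, unsigned int idx,
				       unsigned long start, unsigned long end, bool sz32)
{
	fd->cs_ranges[idx].start_ip = start;
	fd->cs_ranges[idx].len = end - start;
	fd->cs_ranges[idx].pop_size32 = sz32;
}

/* Returns the index of the range containing @ip, or -1 if none matches */
static inline int sketch_find_range(const struct sketch_mm_futex *fd, unsigned long ip)
{
	for (int i = 0; i < SKETCH_MAX_CS_RANGES; i++) {
		const struct sketch_cs_range *csr = &fd->cs_ranges[i];

		if (ip >= csr->start_ip && ip < csr->start_ip + csr->len)
			return i;
	}
	return -1;
}
```

With start_ip set to ~0UL and len zero, no instruction pointer can satisfy
both comparisons, so an unmapped VDSO or an unused second slot never matches.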

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: André Almeida <andrealmeid@igalia.com>
Link: https://patch.msgid.link/20260602090535.718926819@kernel.org
---
 include/linux/futex.h       | 21 +++++++++++++---
 include/linux/futex_types.h | 28 ++++++++++++++++++++++-
 init/Kconfig                |  6 +++++-
 kernel/futex/core.c         | 46 ++++++++++++++++++++++++++++--------
 4 files changed, 89 insertions(+), 12 deletions(-)

diff --git a/include/linux/futex.h b/include/linux/futex.h
index 9e6218c..cb2a182 100644
--- a/include/linux/futex.h
+++ b/include/linux/futex.h
@@ -81,11 +81,9 @@ int futex_hash_prctl(unsigned long arg2, unsigned long arg3, unsigned long arg4)
 #ifdef CONFIG_FUTEX_PRIVATE_HASH
 int futex_hash_allocate_default(void);
 void futex_hash_free(struct mm_struct *mm);
-void futex_mm_init(struct mm_struct *mm);
 #else  /* CONFIG_FUTEX_PRIVATE_HASH */
 static inline int futex_hash_allocate_default(void) { return 0; }
 static inline int futex_hash_free(struct mm_struct *mm) { return 0; }
-static inline void futex_mm_init(struct mm_struct *mm) { }
 #endif /* !CONFIG_FUTEX_PRIVATE_HASH */
 
 #else  /* CONFIG_FUTEX */
@@ -104,7 +102,24 @@ static inline int futex_hash_prctl(unsigned long arg2, unsigned long arg3, unsig
 }
 static inline int futex_hash_allocate_default(void) { return 0; }
 static inline int futex_hash_free(struct mm_struct *mm) { return 0; }
-static inline void futex_mm_init(struct mm_struct *mm) { }
 #endif /* !CONFIG_FUTEX */
 
+#ifdef CONFIG_FUTEX_ROBUST_UNLOCK
+void futex_reset_cs_ranges(struct futex_mm_data *fd);
+
+static inline void futex_set_vdso_cs_range(struct futex_mm_data *fd, unsigned int idx,
+					   unsigned long start, unsigned long end, bool sz32)
+{
+	fd->unlock.cs_ranges[idx].start_ip = start;
+	fd->unlock.cs_ranges[idx].len = end - start;
+	fd->unlock.cs_ranges[idx].pop_size32 = sz32;
+}
+#endif /* CONFIG_FUTEX_ROBUST_UNLOCK */
+
+#if defined(CONFIG_FUTEX_PRIVATE_HASH) || defined(CONFIG_FUTEX_ROBUST_UNLOCK)
+void futex_mm_init(struct mm_struct *mm);
+#else
+static inline void futex_mm_init(struct mm_struct *mm) { }
+#endif
+
 #endif /* _LINUX_FUTEX_H */
diff --git a/include/linux/futex_types.h b/include/linux/futex_types.h
index d41557d..d320c05 100644
--- a/include/linux/futex_types.h
+++ b/include/linux/futex_types.h
@@ -55,12 +55,40 @@ struct futex_mm_phash {
 struct futex_mm_phash { };
 #endif /* !CONFIG_FUTEX_ROBUST_UNLOCK */
 
+#ifdef CONFIG_FUTEX_ROBUST_UNLOCK
+/**
+ * struct futex_unlock_cs_range - Range for the VDSO unlock critical section
+ * @start_ip:	The start IP of the robust futex unlock critical section (inclusive)
+ * @len:	The length of the robust futex unlock critical section
+ * @pop_size32:	Pending OP pointer size indicator. 0 == 64-bit, 1 == 32-bit
+ */
+struct futex_unlock_cs_range {
+	unsigned long	       start_ip;
+	unsigned int	       len;
+	unsigned int	       pop_size32;
+};
+
+#define FUTEX_ROBUST_MAX_CS_RANGES	(1 + IS_ENABLED(CONFIG_COMPAT))
+
+/**
+ * struct futex_unlock_cs_ranges - Futex unlock VDSO critical sections
+ * @cs_ranges:	Array of critical section ranges
+ */
+struct futex_unlock_cs_ranges {
+	struct futex_unlock_cs_range	cs_ranges[FUTEX_ROBUST_MAX_CS_RANGES];
+};
+#else  /* CONFIG_FUTEX_ROBUST_UNLOCK */
+struct futex_unlock_cs_ranges { };
+#endif /* !CONFIG_FUTEX_ROBUST_UNLOCK */
+
 /**
  * struct futex_mm_data - Futex related per MM data
  * @phash:	Futex private hash related data
+ * @unlock:	Futex unlock VDSO critical sections
  */
 struct futex_mm_data {
 	struct futex_mm_phash		phash;
+	struct futex_unlock_cs_ranges	unlock;
 };
 #else  /* CONFIG_FUTEX */
 struct futex_sched_data { };
diff --git a/init/Kconfig b/init/Kconfig
index 2937c4d..165b08e 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1842,6 +1842,12 @@ config FUTEX_MPOL
 	depends on FUTEX && NUMA
 	default y
 
+config HAVE_FUTEX_ROBUST_UNLOCK
+	bool
+
+config FUTEX_ROBUST_UNLOCK
+	def_bool FUTEX && HAVE_GENERIC_VDSO && GENERIC_IRQ_ENTRY && RSEQ && HAVE_FUTEX_ROBUST_UNLOCK
+
 config EPOLL
 	bool "Enable eventpoll support" if EXPERT
 	default y
diff --git a/kernel/futex/core.c b/kernel/futex/core.c
index 77ccb77..aad6e50 100644
--- a/kernel/futex/core.c
+++ b/kernel/futex/core.c
@@ -1761,11 +1761,11 @@ static bool futex_ref_is_dead(struct futex_private_hash *fph)
 	return atomic_long_read(&mm->futex.phash.atomic) == 0;
 }
 
-void futex_mm_init(struct mm_struct *mm)
+static void futex_hash_init_mm(struct futex_mm_data *fd)
 {
-	memset(&mm->futex, 0, sizeof(mm->futex));
-	mutex_init(&mm->futex.phash.lock);
-	mm->futex.phash.batches = get_state_synchronize_rcu();
+	memset(&fd->phash, 0, sizeof(fd->phash));
+	mutex_init(&fd->phash.lock);
+	fd->phash.batches = get_state_synchronize_rcu();
 }
 
 void futex_hash_free(struct mm_struct *mm)
@@ -1969,19 +1969,47 @@ static int futex_hash_get_slots(void)
 		return fph->hash_mask + 1;
 	return 0;
 }
+#else  /* CONFIG_FUTEX_PRIVATE_HASH */
+static inline int futex_hash_allocate(unsigned int hslots, unsigned int flags) { return -EINVAL; }
+static inline int futex_hash_get_slots(void) { return 0; }
+static inline void futex_hash_init_mm(struct futex_mm_data *fd) { }
+#endif /* !CONFIG_FUTEX_PRIVATE_HASH */
 
-#else
+#ifdef CONFIG_FUTEX_ROBUST_UNLOCK
+static void futex_invalidate_cs_ranges(struct futex_mm_data *fd)
+{
+	/*
+	 * Invalidate start_ip so that the quick check fails for ip >= start_ip
+	 * if VDSO is not mapped or the second slot is not available for compat
+	 * tasks as they use VDSO32 which does not provide the 64-bit pointer
+	 * variant.
+	 */
+	for (int i = 0; i < FUTEX_ROBUST_MAX_CS_RANGES; i++)
+		fd->unlock.cs_ranges[i].start_ip = ~0UL;
+}
 
-static int futex_hash_allocate(unsigned int hash_slots, unsigned int flags)
+void futex_reset_cs_ranges(struct futex_mm_data *fd)
 {
-	return -EINVAL;
+	memset(fd->unlock.cs_ranges, 0, sizeof(fd->unlock.cs_ranges));
+	futex_invalidate_cs_ranges(fd);
 }
 
-static int futex_hash_get_slots(void)
+static void futex_robust_unlock_init_mm(struct futex_mm_data *fd)
 {
-	return 0;
+	/* mm_dup() preserves the range, mm_alloc() clears it */
+	if (!fd->unlock.cs_ranges[0].start_ip)
+		futex_invalidate_cs_ranges(fd);
 }
+#else  /* CONFIG_FUTEX_ROBUST_UNLOCK */
+static inline void futex_robust_unlock_init_mm(struct futex_mm_data *fd) { }
+#endif /* !CONFIG_FUTEX_ROBUST_UNLOCK */
 
+#if defined(CONFIG_FUTEX_PRIVATE_HASH) || defined(CONFIG_FUTEX_ROBUST_UNLOCK)
+void futex_mm_init(struct mm_struct *mm)
+{
+	futex_hash_init_mm(&mm->futex);
+	futex_robust_unlock_init_mm(&mm->futex);
+}
 #endif
 
 int futex_hash_prctl(unsigned long arg2, unsigned long arg3, unsigned long arg4)

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [tip: locking/core] futex: Add support for unlocking robust futexes
  2026-06-02  9:09 ` [patch V5 09/16] futex: Add support for unlocking robust futexes Thomas Gleixner
  2026-06-03  8:22   ` Peter Zijlstra
  2026-06-03  8:35   ` Peter Zijlstra
@ 2026-06-03 14:24   ` tip-bot2 for Thomas Gleixner
  2 siblings, 0 replies; 43+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2026-06-03 14:24 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Peter Zijlstra (Intel), andrealmeid, x86,
	linux-kernel

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     3ca9595d9fb6cce6633a5b03d98c2aecb5499838
Gitweb:        https://git.kernel.org/tip/3ca9595d9fb6cce6633a5b03d98c2aecb5499838
Author:        Thomas Gleixner <tglx@kernel.org>
AuthorDate:    Tue, 02 Jun 2026 11:09:55 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 03 Jun 2026 11:38:51 +02:00

futex: Add support for unlocking robust futexes

Unlocking robust non-PI futexes happens in user space with the following
sequence:

  1)	robust_list_set_op_pending(mutex);
  2)	robust_list_remove(mutex);

  	lval = 0;
  3)	lval = atomic_xchg(lock, lval);
  4)	if (lval & WAITERS)
  5)		sys_futex(WAKE,....);
  6)	robust_list_clear_op_pending();

That opens a window between #3 and #6 where the mutex could be acquired by
some other task which observes that it is the last user and:

  A) unmaps the mutex memory
  B) maps a different file, which ends up covering the same address

When the original task exits before reaching #6 then the kernel robust list
handling observes the pending op entry and tries to fix up user space.

In case that the newly mapped data contains the TID of the exiting thread
at the address of the mutex/futex the kernel will set the owner died bit in
that memory and therefore corrupt unrelated data.

PI futexes have a similar problem both for the non-contended user space
unlock and the in kernel unlock:

  1)	robust_list_set_op_pending(mutex);
  2)	robust_list_remove(mutex);

  	lval = gettid();
  3)	if (!atomic_try_cmpxchg(lock, lval, 0))
  4)		sys_futex(UNLOCK_PI,....);
  5)	robust_list_clear_op_pending();

Address the first part of the problem where the futexes have waiters and
need to enter the kernel anyway. Add a new FUTEX_ROBUST_UNLOCK flag, which
is valid for the sys_futex() FUTEX_UNLOCK_PI, FUTEX_WAKE, FUTEX_WAKE_BITSET
operations.

This deliberately omits FUTEX_WAKE_OP from this treatment as it's unclear
whether this is needed and there is no usage of it in glibc either to
investigate.

For the futex2 syscall family this needs to be implemented with a new
syscall.

The sys_futex() case [ab]uses the @uaddr2 argument to hand the pointer to
robust_list_head::list_op_pending into the kernel. This argument is only
evaluated when the FUTEX_ROBUST_UNLOCK bit is set and is therefore backward
compatible.

This is an explicit argument to avoid the lookup of the robust list pointer
and retrieving the pending op pointer from there. User space has the
pointer already available so it can just put it into the @uaddr2
argument. Aside from that this allows the usage of multiple robust lists in
the future without any changes to the internal functions as they just operate
on the provided pointer.

This requires a second flag FUTEX_ROBUST_LIST32 which indicates that the
robust list pointer points to a u32 and not to a u64. This is required
for two reasons:

    1) sys_futex() has no compat variant

    2) The gaming emulators use both 64-bit and compat 32-bit robust
       lists in the same 64-bit application

As a consequence 32-bit applications have to set this flag unconditionally
so they can run on a 64-bit kernel in compat mode unmodified. 32-bit
kernels return an error code when the flag is not set. 64-bit kernels will
happily clear the full 64 bits if user space fails to set it.
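
How user space might compose the operation word can be sketched as below.
The two new flag values are taken from the UAPI hunk of this patch, the other
constants are the long-standing values from <linux/futex.h>. The SK_ prefix
and sketch_robust_unlock_op() are hypothetical names used for illustration
only:

```c
#include <stdbool.h>

/* Existing UAPI values, redefined locally to keep the sketch standalone */
#define SK_FUTEX_WAKE		1
#define SK_FUTEX_UNLOCK_PI	7
#define SK_FUTEX_PRIVATE_FLAG	128

/* New flag values as added by this patch */
#define SK_FUTEX_ROBUST_UNLOCK	512
#define SK_FUTEX_ROBUST_LIST32	1024

/*
 * Build the futex op for a robust unlock. @pop_is_32bit has to be true
 * when the pending op pointer handed in via @uaddr2 is 32 bits wide.
 */
static inline int sketch_robust_unlock_op(int cmd, bool is_private, bool pop_is_32bit)
{
	int op = cmd | SK_FUTEX_ROBUST_UNLOCK;

	if (is_private)
		op |= SK_FUTEX_PRIVATE_FLAG;
	if (pop_is_32bit)
		op |= SK_FUTEX_ROBUST_LIST32;
	return op;
}
```

A native 32-bit build would pass true for @pop_is_32bit unconditionally, for
example by evaluating sizeof(void *) == 4, which matches the requirement
stated above.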

In case of FUTEX_UNLOCK_PI this clears the robust list pending op when the
unlock succeeded. In case of errors, the user space value is still locked
by the caller and therefore the above cannot happen.

In case of FUTEX_WAKE* this does the unlock of the futex in the kernel and
clears the robust list pending op when the unlock was successful. If not,
the user space value is still locked and user space has to deal with the
returned error. That means that the unlocking of non-PI robust futexes has
to use the same try_cmpxchg() unlock scheme as PI futexes.
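
A rough user space sketch of that unlock scheme with C11 atomics. All
names are hypothetical: robust_unlock() is not a glibc function, the
slow path is passed in as a callback so the example stays self-contained,
and model_kernel_unlock() merely imitates what the kernel does for
FUTEX_WAKE with FUTEX_ROBUST_UNLOCK set (unlock, then clear the pending
op). A real implementation would issue the sys_futex() call there:

```c
#include <stdatomic.h>
#include <stdint.h>

#define FUTEX_WAITERS	0x80000000U

typedef int (*slowpath_fn)(_Atomic uint32_t *lock, uintptr_t *pop);

static int slow_calls;	/* Counts slow path invocations for illustration */

/* Stand-in for the kernel side: unlock and clear the pending op */
static inline int model_kernel_unlock(_Atomic uint32_t *lock, uintptr_t *pop)
{
	slow_calls++;
	atomic_store_explicit(lock, 0, memory_order_release);
	*pop = 0;
	return 0;
}

static inline int robust_unlock(_Atomic uint32_t *lock, uint32_t tid,
				uintptr_t *pop, slowpath_fn slow)
{
	uint32_t expected = tid;

	/* Succeeds only if the lock word is exactly TID, i.e. no waiters */
	if (atomic_compare_exchange_strong_explicit(lock, &expected, 0,
						    memory_order_release,
						    memory_order_relaxed)) {
		/*
		 * Non-contended case: this clearing is still not atomic
		 * with the unlock and is not addressed by this patch.
		 */
		*pop = 0;
		return 0;
	}

	/* Contended: let the kernel unlock, clear and wake */
	return slow(lock, pop);
}
```

The point of the sketch is that the non-PI path no longer stores 0
unconditionally in user space. It only unlocks in user space when the
cmpxchg observes the uncontended value and otherwise leaves the unlock
to the kernel.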

If the clearing of the pending list op fails (fault) then the kernel clears
the registered robust list pointer if it matches, to prevent exit() from
trying to handle invalid data. That's a valid paranoid decision because
the robust list head usually sits in the TLS and if the TLS is no longer
accessible then the chance for fixing up the resulting mess is very close
to zero.
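
The fault handling can be illustrated with a simplified user space model.
This is not the kernel code: struct list_head_model and struct task_model
are hypothetical stand-ins for robust_list_head and the per task futex
data, and the fault of put_user() is simulated with a boolean parameter:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct robust_list_head */
struct list_head_model {
	uintptr_t	next;
	long		futex_offset;
	uintptr_t	list_op_pending;
};

/* Hypothetical stand-in for the per task registration */
struct task_model {
	struct list_head_model	*robust_list;
};

static inline bool clear_pending_model(struct task_model *t, uintptr_t *pop,
				       bool write_faults)
{
	struct list_head_model *head = t->robust_list;

	/* Normal case: the store succeeds */
	if (!write_faults) {
		*pop = 0;
		return true;
	}

	/*
	 * The store faulted. If @pop is the pending op slot of the
	 * registered list, drop the registration so that exit does not
	 * walk stale data.
	 */
	if (head && pop == &head->list_op_pending)
		t->robust_list = NULL;
	return false;
}
```

A pointer which does not belong to the registered head leaves the
registration untouched in the fault case, which is what keeps the
explicit @uaddr2 argument usable with additional robust lists.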

The problem of non-contended unlocks still exists and will be addressed
separately.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: André Almeida <andrealmeid@igalia.com>
Link: https://patch.msgid.link/20260602090535.670514505@kernel.org
---
 include/uapi/linux/futex.h | 29 +++++++++++++++++++-
 io_uring/futex.c           |  2 +-
 kernel/futex/core.c        | 53 +++++++++++++++++++++++++++++++++++--
 kernel/futex/futex.h       | 15 ++++++++--
 kernel/futex/pi.c          | 15 ++++++++--
 kernel/futex/syscalls.c    | 13 ++++++---
 kernel/futex/waitwake.c    | 30 +++++++++++++++++++--
 7 files changed, 144 insertions(+), 13 deletions(-)

diff --git a/include/uapi/linux/futex.h b/include/uapi/linux/futex.h
index 75df1ea..10a36c5 100644
--- a/include/uapi/linux/futex.h
+++ b/include/uapi/linux/futex.h
@@ -25,8 +25,11 @@
 
 #define FUTEX_PRIVATE_FLAG	128
 #define FUTEX_CLOCK_REALTIME	256
+#define FUTEX_ROBUST_UNLOCK	512
+#define FUTEX_ROBUST_LIST32	1024
 
-#define FUTEX_CMD_MASK			~(FUTEX_PRIVATE_FLAG | FUTEX_CLOCK_REALTIME)
+#define FUTEX_CMD_MASK			~(FUTEX_PRIVATE_FLAG | FUTEX_CLOCK_REALTIME | \
+					  FUTEX_ROBUST_UNLOCK | FUTEX_ROBUST_LIST32)
 
 #define FUTEX_WAIT_PRIVATE		(FUTEX_WAIT | FUTEX_PRIVATE_FLAG)
 #define FUTEX_WAKE_PRIVATE		(FUTEX_WAKE | FUTEX_PRIVATE_FLAG)
@@ -43,6 +46,30 @@
 #define FUTEX_CMP_REQUEUE_PI_PRIVATE	(FUTEX_CMP_REQUEUE_PI | FUTEX_PRIVATE_FLAG)
 
 /*
+ * Operations to unlock a futex, clear the robust list pending op pointer and
+ * wake waiters.
+ */
+#define FUTEX_UNLOCK_PI_LIST64			(FUTEX_UNLOCK_PI | FUTEX_ROBUST_UNLOCK)
+#define FUTEX_UNLOCK_PI_LIST64_PRIVATE		(FUTEX_UNLOCK_PI_LIST64 | FUTEX_PRIVATE_FLAG)
+#define FUTEX_UNLOCK_PI_LIST32			(FUTEX_UNLOCK_PI | FUTEX_ROBUST_UNLOCK | \
+						 FUTEX_ROBUST_LIST32)
+#define FUTEX_UNLOCK_PI_LIST32_PRIVATE		(FUTEX_UNLOCK_PI_LIST32 | FUTEX_PRIVATE_FLAG)
+
+#define FUTEX_UNLOCK_WAKE_LIST64		(FUTEX_WAKE | FUTEX_ROBUST_UNLOCK)
+#define FUTEX_UNLOCK_WAKE_LIST64_PRIVATE	(FUTEX_UNLOCK_WAKE_LIST64 | FUTEX_PRIVATE_FLAG)
+
+#define FUTEX_UNLOCK_WAKE_LIST32		(FUTEX_WAKE | FUTEX_ROBUST_UNLOCK | \
+						 FUTEX_ROBUST_LIST32)
+#define FUTEX_UNLOCK_WAKE_LIST32_PRIVATE	(FUTEX_UNLOCK_WAKE_LIST32 | FUTEX_PRIVATE_FLAG)
+
+#define FUTEX_UNLOCK_BITSET_LIST64		(FUTEX_WAKE_BITSET | FUTEX_ROBUST_UNLOCK)
+#define FUTEX_UNLOCK_BITSET_LIST64_PRIVATE	(FUTEX_UNLOCK_BITSET_LIST64 | FUTEX_PRIVATE_FLAG)
+
+#define FUTEX_UNLOCK_BITSET_LIST32		(FUTEX_WAKE_BITSET | FUTEX_ROBUST_UNLOCK | \
+						 FUTEX_ROBUST_LIST32)
+#define FUTEX_UNLOCK_BITSET_LIST32_PRIVATE	(FUTEX_UNLOCK_BITSET_LIST32 | FUTEX_PRIVATE_FLAG)
+
+/*
  * Flags for futex2 syscalls.
  *
  * NOTE: these are not pure flags, they can also be seen as:
diff --git a/io_uring/futex.c b/io_uring/futex.c
index 9cc1788..906701b 100644
--- a/io_uring/futex.c
+++ b/io_uring/futex.c
@@ -327,7 +327,7 @@ int io_futex_wake(struct io_kiocb *req, unsigned int issue_flags)
 	 * Strict flags - ensure that waking 0 futexes yields a 0 result.
 	 * See commit 43adf8449510 ("futex: FLAGS_STRICT") for details.
 	 */
-	ret = futex_wake(iof->uaddr, FLAGS_STRICT | iof->futex_flags,
+	ret = futex_wake(iof->uaddr, FLAGS_STRICT | iof->futex_flags, NULL,
 			 iof->futex_val, iof->futex_mask);
 	if (ret < 0)
 		req_set_fail(req);
diff --git a/kernel/futex/core.c b/kernel/futex/core.c
index 61f4f55..77ccb77 100644
--- a/kernel/futex/core.c
+++ b/kernel/futex/core.c
@@ -1062,7 +1062,7 @@ retry:
 	owner = uval & FUTEX_TID_MASK;
 
 	if (pending_op && !pi && !owner) {
-		futex_wake(uaddr, FLAGS_SIZE_32 | FLAGS_SHARED, 1,
+		futex_wake(uaddr, FLAGS_SIZE_32 | FLAGS_SHARED, NULL, 1,
 			   FUTEX_BITSET_MATCH_ANY);
 		return 0;
 	}
@@ -1116,7 +1116,7 @@ retry:
 	 * PI futexes happens in exit_pi_state():
 	 */
 	if (!pi && (uval & FUTEX_WAITERS)) {
-		futex_wake(uaddr, FLAGS_SIZE_32 | FLAGS_SHARED, 1,
+		futex_wake(uaddr, FLAGS_SIZE_32 | FLAGS_SHARED, NULL, 1,
 			   FUTEX_BITSET_MATCH_ANY);
 	}
 
@@ -1208,6 +1208,27 @@ static void exit_robust_list(struct task_struct *curr)
 	}
 }
 
+static bool robust_list_clear_pending(unsigned long __user *pop)
+{
+	struct robust_list_head __user *head = current->futex.robust_list;
+
+	if (!put_user(0UL, pop))
+		return true;
+
+	/*
+	 * Just give up. The robust list head is usually part of TLS, so the
+	 * chance that this gets resolved is close to zero.
+	 *
+	 * If @pop_addr is the robust_list_head::list_op_pending pointer then
+	 * clear the robust list head pointer to prevent further damage when the
+	 * task exits.  Better a few stale futexes than corrupted memory. But
+	 * that's mostly an academic exercise.
+	 */
+	if (pop == (unsigned long __user *)&head->list_op_pending)
+		current->futex.robust_list = NULL;
+	return false;
+}
+
 #ifdef CONFIG_COMPAT
 static void __user *futex_uaddr(struct robust_list __user *entry,
 				compat_long_t futex_offset)
@@ -1304,6 +1325,21 @@ static void compat_exit_robust_list(struct task_struct *curr)
 		handle_futex_death(uaddr, curr, pend_mod, HANDLE_DEATH_PENDING);
 	}
 }
+
+static bool compat_robust_list_clear_pending(u32 __user *pop)
+{
+	struct compat_robust_list_head __user *head = current->futex.compat_robust_list;
+
+	if (!put_user(0U, pop))
+		return true;
+
+	/* See comment in robust_list_clear_pending(). */
+	if (pop == &head->list_op_pending)
+		current->futex.compat_robust_list = NULL;
+	return false;
+}
+#else
+static bool compat_robust_list_clear_pending(u32 __user *pop_addr) { return false; }
 #endif
 
 #ifdef CONFIG_FUTEX_PI
@@ -1397,6 +1433,19 @@ static void exit_pi_state_list(struct task_struct *curr)
 static inline void exit_pi_state_list(struct task_struct *curr) { }
 #endif
 
+bool futex_robust_list_clear_pending(void __user *pop, unsigned int flags)
+{
+	bool size32bit = !!(flags & FLAGS_ROBUST_LIST32);
+
+	if (!IS_ENABLED(CONFIG_64BIT) && !size32bit)
+		return false;
+
+	if (IS_ENABLED(CONFIG_64BIT) && size32bit)
+		return compat_robust_list_clear_pending(pop);
+
+	return robust_list_clear_pending(pop);
+}
+
 static void futex_cleanup(struct task_struct *tsk)
 {
 	if (unlikely(tsk->futex.robust_list)) {
diff --git a/kernel/futex/futex.h b/kernel/futex/futex.h
index 9f6bf6f..79ef2c7 100644
--- a/kernel/futex/futex.h
+++ b/kernel/futex/futex.h
@@ -40,6 +40,8 @@
 #define FLAGS_NUMA		0x0080
 #define FLAGS_STRICT		0x0100
 #define FLAGS_MPOL		0x0200
+#define FLAGS_ROBUST_UNLOCK	0x0400
+#define FLAGS_ROBUST_LIST32	0x0800
 
 /* FUTEX_ to FLAGS_ */
 static inline unsigned int futex_to_flags(unsigned int op)
@@ -52,6 +54,12 @@ static inline unsigned int futex_to_flags(unsigned int op)
 	if (op & FUTEX_CLOCK_REALTIME)
 		flags |= FLAGS_CLOCKRT;
 
+	if (op & FUTEX_ROBUST_UNLOCK)
+		flags |= FLAGS_ROBUST_UNLOCK;
+
+	if (op & FUTEX_ROBUST_LIST32)
+		flags |= FLAGS_ROBUST_LIST32;
+
 	return flags;
 }
 
@@ -449,13 +457,16 @@ extern int futex_unqueue_multiple(struct futex_vector *v, int count);
 extern int futex_wait_multiple(struct futex_vector *vs, unsigned int count,
 			       struct hrtimer_sleeper *to);
 
-extern int futex_wake(u32 __user *uaddr, unsigned int flags, int nr_wake, u32 bitset);
+extern int futex_wake(u32 __user *uaddr, unsigned int flags, void __user *pop,
+		      int nr_wake, u32 bitset);
 
 extern int futex_wake_op(u32 __user *uaddr1, unsigned int flags,
 			 u32 __user *uaddr2, int nr_wake, int nr_wake2, int op);
 
-extern int futex_unlock_pi(u32 __user *uaddr, unsigned int flags);
+extern int futex_unlock_pi(u32 __user *uaddr, unsigned int flags, void __user *pop);
 
 extern int futex_lock_pi(u32 __user *uaddr, unsigned int flags, ktime_t *time, int trylock);
 
+bool futex_robust_list_clear_pending(void __user *pop, unsigned int flags);
+
 #endif /* _FUTEX_H */
diff --git a/kernel/futex/pi.c b/kernel/futex/pi.c
index e037a97..9dd5c0b 100644
--- a/kernel/futex/pi.c
+++ b/kernel/futex/pi.c
@@ -1139,7 +1139,7 @@ out:
  * This is the in-kernel slowpath: we look up the PI state (if any),
  * and do the rt-mutex unlock.
  */
-int futex_unlock_pi(u32 __user *uaddr, unsigned int flags)
+static int __futex_unlock_pi(u32 __user *uaddr, unsigned int flags)
 {
 	u32 curval, uval, vpid = task_pid_vnr(current);
 	union futex_key key = FUTEX_KEY_INIT;
@@ -1148,7 +1148,6 @@ int futex_unlock_pi(u32 __user *uaddr, unsigned int flags)
 
 	if (!IS_ENABLED(CONFIG_FUTEX_PI))
 		return -ENOSYS;
-
 retry:
 	if (get_user(uval, uaddr))
 		return -EFAULT;
@@ -1302,3 +1301,15 @@ pi_faulted:
 	return ret;
 }
 
+int futex_unlock_pi(u32 __user *uaddr, unsigned int flags, void __user *pop)
+{
+	int ret = __futex_unlock_pi(uaddr, flags);
+
+	if (ret || !(flags & FLAGS_ROBUST_UNLOCK))
+		return ret;
+
+	if (!futex_robust_list_clear_pending(pop, flags))
+		return -EFAULT;
+
+	return 0;
+}
diff --git a/kernel/futex/syscalls.c b/kernel/futex/syscalls.c
index 8944ff4..2fa19d9 100644
--- a/kernel/futex/syscalls.c
+++ b/kernel/futex/syscalls.c
@@ -118,6 +118,13 @@ long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
 			return -ENOSYS;
 	}
 
+	if (flags & FLAGS_ROBUST_UNLOCK) {
+		if (cmd != FUTEX_WAKE &&
+		    cmd != FUTEX_WAKE_BITSET &&
+		    cmd != FUTEX_UNLOCK_PI)
+			return -ENOSYS;
+	}
+
 	switch (cmd) {
 	case FUTEX_WAIT:
 		val3 = FUTEX_BITSET_MATCH_ANY;
@@ -128,7 +135,7 @@ long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
 		val3 = FUTEX_BITSET_MATCH_ANY;
 		fallthrough;
 	case FUTEX_WAKE_BITSET:
-		return futex_wake(uaddr, flags, val, val3);
+		return futex_wake(uaddr, flags, uaddr2, val, val3);
 	case FUTEX_REQUEUE:
 		return futex_requeue(uaddr, flags, uaddr2, flags, val, val2, NULL, 0);
 	case FUTEX_CMP_REQUEUE:
@@ -141,7 +148,7 @@ long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
 	case FUTEX_LOCK_PI2:
 		return futex_lock_pi(uaddr, flags, timeout, 0);
 	case FUTEX_UNLOCK_PI:
-		return futex_unlock_pi(uaddr, flags);
+		return futex_unlock_pi(uaddr, flags, uaddr2);
 	case FUTEX_TRYLOCK_PI:
 		return futex_lock_pi(uaddr, flags, NULL, 1);
 	case FUTEX_WAIT_REQUEUE_PI:
@@ -375,7 +382,7 @@ SYSCALL_DEFINE4(futex_wake,
 	if (!futex_validate_input(flags, mask))
 		return -EINVAL;
 
-	return futex_wake(uaddr, FLAGS_STRICT | flags, nr, mask);
+	return futex_wake(uaddr, FLAGS_STRICT | flags, NULL, nr, mask);
 }
 
 /*
diff --git a/kernel/futex/waitwake.c b/kernel/futex/waitwake.c
index ceed9d8..8f5e5d3 100644
--- a/kernel/futex/waitwake.c
+++ b/kernel/futex/waitwake.c
@@ -150,12 +150,35 @@ void futex_wake_mark(struct wake_q_head *wake_q, struct futex_q *q)
 }
 
 /*
+ * If requested, clear the robust list pending op and unlock the futex
+ */
+static bool futex_robust_unlock(u32 __user *uaddr, unsigned int flags, void __user *pop)
+{
+	if (!(flags & FLAGS_ROBUST_UNLOCK))
+		return true;
+
+	/* First unlock the futex, which requires release semantics. */
+	scoped_user_write_access(uaddr, efault)
+		unsafe_atomic_store_release_user(0, uaddr, efault);
+
+	/*
+	 * Clear the pending list op now. If that fails, then the task is in
+	 * deeper trouble as the robust list head is usually part of the TLS.
+	 * The chance of survival is close to zero.
+	 */
+	return futex_robust_list_clear_pending(pop, flags);
+
+efault:
+	return false;
+}
+
+/*
  * Wake up waiters matching bitset queued on this futex (uaddr).
  */
-int futex_wake(u32 __user *uaddr, unsigned int flags, int nr_wake, u32 bitset)
+int futex_wake(u32 __user *uaddr, unsigned int flags, void __user *pop, int nr_wake, u32 bitset)
 {
-	struct futex_q *this, *next;
 	union futex_key key = FUTEX_KEY_INIT;
+	struct futex_q *this, *next;
 	DEFINE_WAKE_Q(wake_q);
 	int ret;
 
@@ -166,6 +189,9 @@ int futex_wake(u32 __user *uaddr, unsigned int flags, int nr_wake, u32 bitset)
 	if (unlikely(ret != 0))
 		return ret;
 
+	if (!futex_robust_unlock(uaddr, flags, pop))
+		return -EFAULT;
+
 	if ((flags & FLAGS_STRICT) && !nr_wake)
 		return 0;
 

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [tip: locking/core] futex: Cleanup UAPI defines
  2026-06-02  9:09 ` [patch V5 08/16] futex: Cleanup UAPI defines Thomas Gleixner
@ 2026-06-03 14:24   ` tip-bot2 for Thomas Gleixner
  0 siblings, 0 replies; 43+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2026-06-03 14:24 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Peter Zijlstra (Intel), andrealmeid, x86,
	linux-kernel

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     1fd053d26f0333485cdbaa9d6e7b8cb53f54de95
Gitweb:        https://git.kernel.org/tip/1fd053d26f0333485cdbaa9d6e7b8cb53f54de95
Author:        Thomas Gleixner <tglx@kernel.org>
AuthorDate:    Tue, 02 Jun 2026 11:09:51 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 03 Jun 2026 11:38:51 +02:00

futex: Cleanup UAPI defines

Make the operand defines tabular for readability's sake.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: André Almeida <andrealmeid@igalia.com>
Link: https://patch.msgid.link/20260602090535.615600933@kernel.org
---
 include/uapi/linux/futex.h | 29 ++++++++++++++---------------
 1 file changed, 14 insertions(+), 15 deletions(-)

diff --git a/include/uapi/linux/futex.h b/include/uapi/linux/futex.h
index 29bf2f6..75df1ea 100644
--- a/include/uapi/linux/futex.h
+++ b/include/uapi/linux/futex.h
@@ -25,23 +25,22 @@
 
 #define FUTEX_PRIVATE_FLAG	128
 #define FUTEX_CLOCK_REALTIME	256
-#define FUTEX_CMD_MASK		~(FUTEX_PRIVATE_FLAG | FUTEX_CLOCK_REALTIME)
-
-#define FUTEX_WAIT_PRIVATE	(FUTEX_WAIT | FUTEX_PRIVATE_FLAG)
-#define FUTEX_WAKE_PRIVATE	(FUTEX_WAKE | FUTEX_PRIVATE_FLAG)
-#define FUTEX_REQUEUE_PRIVATE	(FUTEX_REQUEUE | FUTEX_PRIVATE_FLAG)
-#define FUTEX_CMP_REQUEUE_PRIVATE (FUTEX_CMP_REQUEUE | FUTEX_PRIVATE_FLAG)
-#define FUTEX_WAKE_OP_PRIVATE	(FUTEX_WAKE_OP | FUTEX_PRIVATE_FLAG)
-#define FUTEX_LOCK_PI_PRIVATE	(FUTEX_LOCK_PI | FUTEX_PRIVATE_FLAG)
-#define FUTEX_LOCK_PI2_PRIVATE	(FUTEX_LOCK_PI2 | FUTEX_PRIVATE_FLAG)
-#define FUTEX_UNLOCK_PI_PRIVATE	(FUTEX_UNLOCK_PI | FUTEX_PRIVATE_FLAG)
-#define FUTEX_TRYLOCK_PI_PRIVATE (FUTEX_TRYLOCK_PI | FUTEX_PRIVATE_FLAG)
+
+#define FUTEX_CMD_MASK			~(FUTEX_PRIVATE_FLAG | FUTEX_CLOCK_REALTIME)
+
+#define FUTEX_WAIT_PRIVATE		(FUTEX_WAIT | FUTEX_PRIVATE_FLAG)
+#define FUTEX_WAKE_PRIVATE		(FUTEX_WAKE | FUTEX_PRIVATE_FLAG)
+#define FUTEX_REQUEUE_PRIVATE		(FUTEX_REQUEUE | FUTEX_PRIVATE_FLAG)
+#define FUTEX_CMP_REQUEUE_PRIVATE	(FUTEX_CMP_REQUEUE | FUTEX_PRIVATE_FLAG)
+#define FUTEX_WAKE_OP_PRIVATE		(FUTEX_WAKE_OP | FUTEX_PRIVATE_FLAG)
+#define FUTEX_LOCK_PI_PRIVATE		(FUTEX_LOCK_PI | FUTEX_PRIVATE_FLAG)
+#define FUTEX_LOCK_PI2_PRIVATE		(FUTEX_LOCK_PI2 | FUTEX_PRIVATE_FLAG)
+#define FUTEX_UNLOCK_PI_PRIVATE		(FUTEX_UNLOCK_PI | FUTEX_PRIVATE_FLAG)
+#define FUTEX_TRYLOCK_PI_PRIVATE	(FUTEX_TRYLOCK_PI | FUTEX_PRIVATE_FLAG)
 #define FUTEX_WAIT_BITSET_PRIVATE	(FUTEX_WAIT_BITSET | FUTEX_PRIVATE_FLAG)
 #define FUTEX_WAKE_BITSET_PRIVATE	(FUTEX_WAKE_BITSET | FUTEX_PRIVATE_FLAG)
-#define FUTEX_WAIT_REQUEUE_PI_PRIVATE	(FUTEX_WAIT_REQUEUE_PI | \
-					 FUTEX_PRIVATE_FLAG)
-#define FUTEX_CMP_REQUEUE_PI_PRIVATE	(FUTEX_CMP_REQUEUE_PI | \
-					 FUTEX_PRIVATE_FLAG)
+#define FUTEX_WAIT_REQUEUE_PI_PRIVATE	(FUTEX_WAIT_REQUEUE_PI | FUTEX_PRIVATE_FLAG)
+#define FUTEX_CMP_REQUEUE_PI_PRIVATE	(FUTEX_CMP_REQUEUE_PI | FUTEX_PRIVATE_FLAG)
 
 /*
  * Flags for futex2 syscalls.

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [tip: locking/core] x86: Select ARCH_MEMORY_ORDER_TSO
  2026-06-02  9:09 ` [patch V5 07/16] x86: Select ARCH_MEMORY_ORDER_TSO Thomas Gleixner
@ 2026-06-03 14:24   ` tip-bot2 for Thomas Gleixner
  0 siblings, 0 replies; 43+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2026-06-03 14:24 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Peter Zijlstra (Intel), andrealmeid, x86,
	linux-kernel

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     7b125c44d0b7f617ee81dffd14ce116149d03cb6
Gitweb:        https://git.kernel.org/tip/7b125c44d0b7f617ee81dffd14ce116149d03cb6
Author:        Thomas Gleixner <tglx@kernel.org>
AuthorDate:    Tue, 02 Jun 2026 11:09:47 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 03 Jun 2026 11:38:50 +02:00

x86: Select ARCH_MEMORY_ORDER_TSO

The generic unsafe_atomic_store_release_user() implementation does:

    if (!IS_ENABLED(CONFIG_ARCH_MEMORY_ORDER_TSO))
        smp_mb();
    unsafe_put_user();

As x86 implements Total Store Order (TSO) which means stores imply release,
select ARCH_MEMORY_ORDER_TSO to avoid the unnecessary smp_mb().

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: André Almeida <andrealmeid@igalia.com>
Link: https://patch.msgid.link/20260602090535.564499644@kernel.org
---
 arch/x86/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index f3f7cb0..1ce62a9 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -113,6 +113,7 @@ config X86
 	select ARCH_HAS_ZONE_DMA_SET if EXPERT
 	select ARCH_HAVE_NMI_SAFE_CMPXCHG
 	select ARCH_HAVE_EXTRA_ELF_NOTES
+	select ARCH_MEMORY_ORDER_TSO
 	select ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
 	select ARCH_MIGHT_HAVE_ACPI_PDC		if ACPI
 	select ARCH_MIGHT_HAVE_PC_PARPORT

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [tip: locking/core] uaccess: Provide unsafe_atomic_store_release_user()
  2026-06-02  9:09 ` [patch V5 06/16] uaccess: Provide unsafe_atomic_store_release_user() Thomas Gleixner
@ 2026-06-03 14:24   ` tip-bot2 for Thomas Gleixner
  0 siblings, 0 replies; 43+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2026-06-03 14:24 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Peter Zijlstra (Intel), andrealmeid, x86,
	linux-kernel

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     6149fc36c09b91050b62e8e68a91027df8df7345
Gitweb:        https://git.kernel.org/tip/6149fc36c09b91050b62e8e68a91027df8df7345
Author:        Thomas Gleixner <tglx@kernel.org>
AuthorDate:    Tue, 02 Jun 2026 11:09:42 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 03 Jun 2026 11:38:50 +02:00

uaccess: Provide unsafe_atomic_store_release_user()

The upcoming support for unlocking robust futexes in the kernel requires
store release semantics. Syscalls do not imply memory ordering on all
architectures so the unlock operation requires a barrier.

This barrier can be avoided when stores imply release like on x86.

Provide a generic version with a smp_mb() before the unsafe_put_user(),
which can be overridden by architectures.

Also provide an ARCH_MEMORY_ORDER_TSO Kconfig option, which can be selected
by architectures with Total Store Order (TSO), where store implies release,
so that the smp_mb() in the generic implementation can be avoided.

If that option is set, a barrier() is used instead of smp_mb(). The
barrier() is not required for the use case at hand, but it prevents the
compiler from reordering and keeps the helper future proof for other usage.
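
A user space analogue of that scheme, written with C11 atomics purely for
illustration. store_release_u32() is a hypothetical name and the TSO
property is a runtime parameter here, while the kernel resolves it at
build time via IS_ENABLED(). atomic_thread_fence() stands in for smp_mb()
and atomic_signal_fence() for barrier(); the mapping is an approximation,
not a statement about the kernel memory model:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

static inline void store_release_u32(_Atomic uint32_t *p, uint32_t val,
				     bool arch_is_tso)
{
	if (!arch_is_tso) {
		/* Full barrier: orders all prior accesses before the store */
		atomic_thread_fence(memory_order_seq_cst);
	} else {
		/* Compiler only barrier: the hardware orders the stores */
		atomic_signal_fence(memory_order_seq_cst);
	}

	/* The plain store, analogous to unsafe_put_user() */
	atomic_store_explicit(p, val, memory_order_relaxed);
}
```

Portable user space code would simply use atomic_store_explicit() with
memory_order_release. The split above only mirrors the structure of the
generic kernel helper.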

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: André Almeida <andrealmeid@igalia.com>
Link: https://patch.msgid.link/20260602090535.513181528@kernel.org
---
 arch/Kconfig            |  4 ++++
 include/linux/uaccess.h | 11 +++++++++++
 2 files changed, 15 insertions(+)

diff --git a/arch/Kconfig b/arch/Kconfig
index e868800..83d362f 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -403,6 +403,10 @@ config ARCH_32BIT_OFF_T
 config ARCH_32BIT_USTAT_F_TINODE
 	bool
 
+# Selected by architectures with Total Store Order (TSO)
+config ARCH_MEMORY_ORDER_TSO
+	bool
+
 config HAVE_ASM_MODVERSIONS
 	bool
 	help
diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index 5632860..c6bd200 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -649,6 +649,17 @@ static inline void user_access_restore(unsigned long flags) { }
 #define user_read_access_end user_access_end
 #endif
 
+#ifndef unsafe_atomic_store_release_user
+# define unsafe_atomic_store_release_user(val, uptr, elbl)	\
+	do {							\
+		if (!IS_ENABLED(CONFIG_ARCH_MEMORY_ORDER_TSO))	\
+			smp_mb();				\
+		else						\
+			barrier();				\
+		unsafe_put_user(val, uptr, elbl);		\
+	} while (0)
+#endif
+
 /* Define RW variant so the below _mode macro expansion works */
 #define masked_user_rw_access_begin(u)	masked_user_access_begin(u)
 #define user_rw_access_begin(u, s)	user_access_begin(u, s)

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [tip: locking/core] futex: Provide UABI defines for robust list entry modifiers
  2026-06-02  9:09 ` [patch V5 05/16] futex: Provide UABI defines for robust list entry modifiers Thomas Gleixner
@ 2026-06-03 14:25   ` tip-bot2 for Thomas Gleixner
  0 siblings, 0 replies; 43+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2026-06-03 14:25 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Peter Zijlstra (Intel), Mathieu Desnoyers,
	andrealmeid, x86, linux-kernel

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     2cb5251d3d64d57c172185b9b608f704b3015f26
Gitweb:        https://git.kernel.org/tip/2cb5251d3d64d57c172185b9b608f704b3015f26
Author:        Thomas Gleixner <tglx@kernel.org>
AuthorDate:    Tue, 02 Jun 2026 11:09:38 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 03 Jun 2026 11:38:50 +02:00

futex: Provide UABI defines for robust list entry modifiers

The marker for PI futexes in the robust list is a hardcoded 0x1 which lacks
any sensible form of documentation.

Provide proper defines for the bit and the mask and fix up the usage
sites. Thereby convert the boolean pi argument into a modifier argument,
which allows new modifier bits to be trivially added and conveyed.
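
The encoding can be sketched in user space as follows. The two defines
are the ones added by this patch. struct robust_entry and
split_robust_entry() are hypothetical names which only illustrate what
fetch_robust_entry() does with the value read from user memory:

```c
#include <assert.h>
#include <stdint.h>

/* Modifiers for robust_list_head::list_op_pending (from this patch) */
#define FUTEX_ROBUST_MOD_PI	(0x1UL)
#define FUTEX_ROBUST_MOD_MASK	(FUTEX_ROBUST_MOD_PI)

/* Hypothetical result type: entry address plus modifier bits */
struct robust_entry {
	uintptr_t	addr;
	unsigned int	mod;
};

static inline struct robust_entry split_robust_entry(uintptr_t uentry)
{
	struct robust_entry e;

	/* The low bits carry the modifiers, the rest is the pointer */
	e.addr = uentry & ~(uintptr_t)FUTEX_ROBUST_MOD_MASK;
	e.mod = (unsigned int)(uentry & FUTEX_ROBUST_MOD_MASK);

	/* The two parts must recombine to the original value */
	assert((e.addr | e.mod) == uentry);
	return e;
}
```

A future modifier bit would only have to be added to
FUTEX_ROBUST_MOD_MASK for the split to pick it up, provided the entries
are aligned accordingly.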

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: André Almeida <andrealmeid@igalia.com>
Link: https://patch.msgid.link/20260602090535.458758556@kernel.org
---
 include/uapi/linux/futex.h |  4 +++-
 kernel/futex/core.c        | 53 +++++++++++++++++--------------------
 2 files changed, 29 insertions(+), 28 deletions(-)

diff --git a/include/uapi/linux/futex.h b/include/uapi/linux/futex.h
index 7e2744e..29bf2f6 100644
--- a/include/uapi/linux/futex.h
+++ b/include/uapi/linux/futex.h
@@ -177,6 +177,10 @@ struct robust_list_head {
  */
 #define ROBUST_LIST_LIMIT	2048
 
+/* Modifiers for robust_list_head::list_op_pending */
+#define FUTEX_ROBUST_MOD_PI		(0x1UL)
+#define FUTEX_ROBUST_MOD_MASK		(FUTEX_ROBUST_MOD_PI)
+
 /*
  * bitset with all bits set for the FUTEX_xxx_BITSET OPs to request a
  * match of any bit.
diff --git a/kernel/futex/core.c b/kernel/futex/core.c
index 79456b0..61f4f55 100644
--- a/kernel/futex/core.c
+++ b/kernel/futex/core.c
@@ -1008,8 +1008,9 @@ void futex_unqueue_pi(struct futex_q *q)
  * dying task, and do notification if so:
  */
 static int handle_futex_death(u32 __user *uaddr, struct task_struct *curr,
-			      bool pi, bool pending_op)
+			      unsigned int mod, bool pending_op)
 {
+	bool pi = !!(mod & FUTEX_ROBUST_MOD_PI);
 	u32 uval, nval, mval;
 	pid_t owner;
 	int err;
@@ -1127,21 +1128,21 @@ retry:
  */
 static inline int fetch_robust_entry(struct robust_list __user **entry,
 				     struct robust_list __user * __user *head,
-				     unsigned int *pi)
+				     unsigned int *mod)
 {
 	unsigned long uentry;
 
 	if (get_user(uentry, (unsigned long __user *)head))
 		return -EFAULT;
 
-	*entry = (void __user *)(uentry & ~1UL);
-	*pi = uentry & 1;
+	*entry = (void __user *)(uentry & ~FUTEX_ROBUST_MOD_MASK);
+	*mod = uentry & FUTEX_ROBUST_MOD_MASK;
 
 	return 0;
 }
 
 /*
- * Walk curr->robust_list (very carefully, it's a userspace list!)
+ * Walk curr->futex.robust_list (very carefully, it's a userspace list!)
  * and mark any locks found there dead, and notify any waiters.
  *
  * We silently return on any sign of list-walking problem.
@@ -1149,9 +1150,8 @@ static inline int fetch_robust_entry(struct robust_list __user **entry,
 static void exit_robust_list(struct task_struct *curr)
 {
 	struct robust_list_head __user *head = curr->futex.robust_list;
+	unsigned int limit = ROBUST_LIST_LIMIT, cur_mod, next_mod, pend_mod;
 	struct robust_list __user *entry, *next_entry, *pending;
-	unsigned int limit = ROBUST_LIST_LIMIT, pi, pip;
-	unsigned int next_pi;
 	unsigned long futex_offset;
 	int rc;
 
@@ -1159,7 +1159,7 @@ static void exit_robust_list(struct task_struct *curr)
 	 * Fetch the list head (which was registered earlier, via
 	 * sys_set_robust_list()):
 	 */
-	if (fetch_robust_entry(&entry, &head->list.next, &pi))
+	if (fetch_robust_entry(&entry, &head->list.next, &cur_mod))
 		return;
 	/*
 	 * Fetch the relative futex offset:
@@ -1170,7 +1170,7 @@ static void exit_robust_list(struct task_struct *curr)
 	 * Fetch any possibly pending lock-add first, and handle it
 	 * if it exists:
 	 */
-	if (fetch_robust_entry(&pending, &head->list_op_pending, &pip))
+	if (fetch_robust_entry(&pending, &head->list_op_pending, &pend_mod))
 		return;
 
 	next_entry = NULL;	/* avoid warning with gcc */
@@ -1179,20 +1179,20 @@ static void exit_robust_list(struct task_struct *curr)
 		 * Fetch the next entry in the list before calling
 		 * handle_futex_death:
 		 */
-		rc = fetch_robust_entry(&next_entry, &entry->next, &next_pi);
+		rc = fetch_robust_entry(&next_entry, &entry->next, &next_mod);
 		/*
 		 * A pending lock might already be on the list, so
 		 * don't process it twice:
 		 */
 		if (entry != pending) {
 			if (handle_futex_death((void __user *)entry + futex_offset,
-						curr, pi, HANDLE_DEATH_LIST))
+						curr, cur_mod, HANDLE_DEATH_LIST))
 				return;
 		}
 		if (rc)
 			return;
 		entry = next_entry;
-		pi = next_pi;
+		cur_mod = next_mod;
 		/*
 		 * Avoid excessively long or circular lists:
 		 */
@@ -1204,7 +1204,7 @@ static void exit_robust_list(struct task_struct *curr)
 
 	if (pending) {
 		handle_futex_death((void __user *)pending + futex_offset,
-				   curr, pip, HANDLE_DEATH_PENDING);
+				   curr, pend_mod, HANDLE_DEATH_PENDING);
 	}
 }
 
@@ -1223,29 +1223,28 @@ static void __user *futex_uaddr(struct robust_list __user *entry,
  */
 static inline int
 compat_fetch_robust_entry(compat_uptr_t *uentry, struct robust_list __user **entry,
-		   compat_uptr_t __user *head, unsigned int *pi)
+		   compat_uptr_t __user *head, unsigned int *pflags)
 {
 	if (get_user(*uentry, head))
 		return -EFAULT;
 
-	*entry = compat_ptr((*uentry) & ~1);
-	*pi = (unsigned int)(*uentry) & 1;
+	*entry = compat_ptr((*uentry) & ~FUTEX_ROBUST_MOD_MASK);
+	*pflags = (unsigned int)(*uentry) & FUTEX_ROBUST_MOD_MASK;
 
 	return 0;
 }
 
 /*
- * Walk curr->robust_list (very carefully, it's a userspace list!)
+ * Walk curr->futex.robust_list (very carefully, it's a userspace list!)
  * and mark any locks found there dead, and notify any waiters.
  *
  * We silently return on any sign of list-walking problem.
  */
 static void compat_exit_robust_list(struct task_struct *curr)
 {
-	struct compat_robust_list_head __user *head = curr->futex.compat_robust_list;
+	struct compat_robust_list_head __user *head = current->futex.compat_robust_list;
+	unsigned int limit = ROBUST_LIST_LIMIT, cur_mod, next_mod, pend_mod;
 	struct robust_list __user *entry, *next_entry, *pending;
-	unsigned int limit = ROBUST_LIST_LIMIT, pi, pip;
-	unsigned int next_pi;
 	compat_uptr_t uentry, next_uentry, upending;
 	compat_long_t futex_offset;
 	int rc;
@@ -1254,7 +1253,7 @@ static void compat_exit_robust_list(struct task_struct *curr)
 	 * Fetch the list head (which was registered earlier, via
 	 * sys_set_robust_list()):
 	 */
-	if (compat_fetch_robust_entry(&uentry, &entry, &head->list.next, &pi))
+	if (compat_fetch_robust_entry(&uentry, &entry, &head->list.next, &cur_mod))
 		return;
 	/*
 	 * Fetch the relative futex offset:
@@ -1265,8 +1264,7 @@ static void compat_exit_robust_list(struct task_struct *curr)
 	 * Fetch any possibly pending lock-add first, and handle it
 	 * if it exists:
 	 */
-	if (compat_fetch_robust_entry(&upending, &pending,
-			       &head->list_op_pending, &pip))
+	if (compat_fetch_robust_entry(&upending, &pending, &head->list_op_pending, &pend_mod))
 		return;
 
 	next_entry = NULL;	/* avoid warning with gcc */
@@ -1276,7 +1274,7 @@ static void compat_exit_robust_list(struct task_struct *curr)
 		 * handle_futex_death:
 		 */
 		rc = compat_fetch_robust_entry(&next_uentry, &next_entry,
-			(compat_uptr_t __user *)&entry->next, &next_pi);
+			(compat_uptr_t __user *)&entry->next, &next_mod);
 		/*
 		 * A pending lock might already be on the list, so
 		 * dont process it twice:
@@ -1284,15 +1282,14 @@ static void compat_exit_robust_list(struct task_struct *curr)
 		if (entry != pending) {
 			void __user *uaddr = futex_uaddr(entry, futex_offset);
 
-			if (handle_futex_death(uaddr, curr, pi,
-					       HANDLE_DEATH_LIST))
+			if (handle_futex_death(uaddr, curr, cur_mod, HANDLE_DEATH_LIST))
 				return;
 		}
 		if (rc)
 			return;
 		uentry = next_uentry;
 		entry = next_entry;
-		pi = next_pi;
+		cur_mod = next_mod;
 		/*
 		 * Avoid excessively long or circular lists:
 		 */
@@ -1304,7 +1301,7 @@ static void compat_exit_robust_list(struct task_struct *curr)
 	if (pending) {
 		void __user *uaddr = futex_uaddr(pending, futex_offset);
 
-		handle_futex_death(uaddr, curr, pip, HANDLE_DEATH_PENDING);
+		handle_futex_death(uaddr, curr, pend_mod, HANDLE_DEATH_PENDING);
 	}
 }
 #endif

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [tip: locking/core] futex: Move futex related mm_struct data into a struct
  2026-06-02  9:09 ` [patch V5 04/16] futex: Move futex related mm_struct data into a struct Thomas Gleixner
@ 2026-06-03 14:25   ` tip-bot2 for Thomas Gleixner
  0 siblings, 0 replies; 43+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2026-06-03 14:25 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     1f7f4816b9b05e5110bc1c8a05c3c478e2dae11b
Gitweb:        https://git.kernel.org/tip/1f7f4816b9b05e5110bc1c8a05c3c478e2dae11b
Author:        Thomas Gleixner <tglx@kernel.org>
AuthorDate:    Tue, 02 Jun 2026 11:09:34 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 03 Jun 2026 11:38:49 +02:00

futex: Move futex related mm_struct data into a struct

Having all these members in mm_struct along with the required #ifdeffery is
annoying, does not allow efficient initializing of the data with
memset() and makes extending it tedious.

Move it into a data structure and fix up all usage sites.

The extra struct for the private hash is intentional to make integration of
other conditional mechanisms easier in terms of initialization and separation.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260602090535.407756793@kernel.org
---
 include/linux/futex_types.h |  36 ++++++++-
 include/linux/mm_types.h    |  12 +---
 kernel/futex/core.c         | 133 ++++++++++++++++-------------------
 3 files changed, 98 insertions(+), 83 deletions(-)
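
The commit message above motivates the move with memset() based initialization and easier extension. A minimal user space sketch of that pattern follows. All demo_* names and the DEMO_PRIVATE_HASH switch are hypothetical stand-ins for the kernel types and the Kconfig option, not the kernel code itself. Note that the empty stub struct relies on a GNU C extension.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for a Kconfig option; set to 0 to build the stub. */
#define DEMO_PRIVATE_HASH 1

#if DEMO_PRIVATE_HASH
struct demo_mm_phash {
	void		*hash;		/* currently installed private hash */
	void		*hash_new;	/* replacement waiting to be installed */
	unsigned long	batches;	/* grace period cookie */
};
#else
/* Empty struct (GNU C extension): takes no space, keeps users #ifdef free. */
struct demo_mm_phash { };
#endif

/* Wrapper which can grow further conditional sub structs later. */
struct demo_mm_data {
	struct demo_mm_phash	phash;
};

struct demo_mm {
	unsigned long		hiwater_rss;
	struct demo_mm_data	futex;	/* Embedded without any #ifdef */
};

static void demo_mm_init(struct demo_mm *mm, unsigned long cookie)
{
	/* One memset() replaces the per member NULL/0 assignments. */
	memset(&mm->futex, 0, sizeof(mm->futex));
#if DEMO_PRIVATE_HASH
	/* Only members with a non-zero initial value need explicit code. */
	mm->futex.phash.batches = cookie;
#endif
}
```

The embedding structure carries a single member in both configurations, so adding another conditional mechanism only touches the types header and the init function.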

diff --git a/include/linux/futex_types.h b/include/linux/futex_types.h
index 9c6c0dc..d41557d 100644
--- a/include/linux/futex_types.h
+++ b/include/linux/futex_types.h
@@ -3,6 +3,7 @@
 #define _LINUX_FUTEX_TYPES_H
 
 #ifdef CONFIG_FUTEX
+#include <linux/compiler_types.h>
 #include <linux/mutex_types.h>
 #include <linux/types.h>
 
@@ -29,8 +30,41 @@ struct futex_sched_data {
 	struct mutex				exit_mutex;
 	unsigned int				state;
 };
-#else
+
+#ifdef CONFIG_FUTEX_PRIVATE_HASH
+/**
+ * struct futex_mm_phash - Futex private hash related per MM data
+ * @lock:	Mutex to protect the private hash operations
+ * @hash:	RCU managed pointer to the private hash
+ * @hash_new:	Pointer to a newly allocated private hash
+ * @batches:	Batch state for RCU synchronization
+ * @rcu:	RCU head for call_rcu()
+ * @atomic:	Aggregate value for @ref
+ * @ref:	Per CPU reference counter for a private hash
+ */
+struct futex_mm_phash {
+	struct mutex			lock;
+	struct futex_private_hash	__rcu *hash;
+	struct futex_private_hash	*hash_new;
+	unsigned long			batches;
+	struct rcu_head			rcu;
+	atomic_long_t			atomic;
+	unsigned int			__percpu *ref;
+};
+#else  /* CONFIG_FUTEX_PRIVATE_HASH */
+struct futex_mm_phash { };
+#endif /* !CONFIG_FUTEX_PRIVATE_HASH */
+
+/**
+ * struct futex_mm_data - Futex related per MM data
+ * @phash:	Futex private hash related data
+ */
+struct futex_mm_data {
+	struct futex_mm_phash		phash;
+};
+#else  /* CONFIG_FUTEX */
 struct futex_sched_data { };
+struct futex_mm_data { };
 #endif /* !CONFIG_FUTEX */
 
 #endif /* _LINUX_FUTEX_TYPES_H */
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index a308e2c..1d0c8d8 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -20,6 +20,7 @@
 #include <linux/seqlock.h>
 #include <linux/percpu_counter.h>
 #include <linux/types.h>
+#include <linux/futex_types.h>
 #include <linux/rseq_types.h>
 #include <linux/bitmap.h>
 
@@ -1270,16 +1271,7 @@ struct mm_struct {
 		 */
 		seqcount_t mm_lock_seq;
 #endif
-#ifdef CONFIG_FUTEX_PRIVATE_HASH
-		struct mutex			futex_hash_lock;
-		struct futex_private_hash	__rcu *futex_phash;
-		struct futex_private_hash	*futex_phash_new;
-		/* futex-ref */
-		unsigned long			futex_batches;
-		struct rcu_head			futex_rcu;
-		atomic_long_t			futex_atomic;
-		unsigned int			__percpu *futex_ref;
-#endif
+		struct futex_mm_data	futex;
 
 		unsigned long hiwater_rss; /* High-watermark of RSS usage */
 		unsigned long hiwater_vm;  /* High-water virtual memory usage */
diff --git a/kernel/futex/core.c b/kernel/futex/core.c
index ec23de4..79456b0 100644
--- a/kernel/futex/core.c
+++ b/kernel/futex/core.c
@@ -188,13 +188,13 @@ __futex_hash_private(union futex_key *key, struct futex_private_hash *fph)
 		return NULL;
 
 	if (!fph)
-		fph = rcu_dereference(key->private.mm->futex_phash);
+		fph = rcu_dereference(key->private.mm->futex.phash.hash);
 	if (!fph || !fph->hash_mask)
 		return NULL;
 
-	hash = jhash2((void *)&key->private.address,
-		      sizeof(key->private.address) / 4,
+	hash = jhash2((void *)&key->private.address, sizeof(key->private.address) / 4,
 		      key->both.offset);
+
 	return &fph->queues[hash & fph->hash_mask];
 }
 
@@ -233,18 +233,17 @@ static void futex_rehash_private(struct futex_private_hash *old,
 	}
 }
 
-static bool __futex_pivot_hash(struct mm_struct *mm,
-			       struct futex_private_hash *new)
+static bool __futex_pivot_hash(struct mm_struct *mm, struct futex_private_hash *new)
 {
+	struct futex_mm_phash *mmph = &mm->futex.phash;
 	struct futex_private_hash *fph;
 
-	WARN_ON_ONCE(mm->futex_phash_new);
+	WARN_ON_ONCE(mmph->hash_new);
 
-	fph = rcu_dereference_protected(mm->futex_phash,
-					lockdep_is_held(&mm->futex_hash_lock));
+	fph = rcu_dereference_protected(mmph->hash, lockdep_is_held(&mmph->lock));
 	if (fph) {
 		if (!futex_ref_is_dead(fph)) {
-			mm->futex_phash_new = new;
+			mmph->hash_new = new;
 			return false;
 		}
 
@@ -252,8 +251,8 @@ static bool __futex_pivot_hash(struct mm_struct *mm,
 	}
 	new->state = FR_PERCPU;
 	scoped_guard(rcu) {
-		mm->futex_batches = get_state_synchronize_rcu();
-		rcu_assign_pointer(mm->futex_phash, new);
+		mmph->batches = get_state_synchronize_rcu();
+		rcu_assign_pointer(mmph->hash, new);
 	}
 	kvfree_rcu(fph, rcu);
 	return true;
@@ -261,12 +260,12 @@ static bool __futex_pivot_hash(struct mm_struct *mm,
 
 static void futex_pivot_hash(struct mm_struct *mm)
 {
-	scoped_guard(mutex, &mm->futex_hash_lock) {
+	scoped_guard(mutex, &mm->futex.phash.lock) {
 		struct futex_private_hash *fph;
 
-		fph = mm->futex_phash_new;
+		fph = mm->futex.phash.hash_new;
 		if (fph) {
-			mm->futex_phash_new = NULL;
+			mm->futex.phash.hash_new = NULL;
 			__futex_pivot_hash(mm, fph);
 		}
 	}
@@ -289,7 +288,7 @@ again:
 	scoped_guard(rcu) {
 		struct futex_private_hash *fph;
 
-		fph = rcu_dereference(mm->futex_phash);
+		fph = rcu_dereference(mm->futex.phash.hash);
 		if (!fph)
 			return NULL;
 
@@ -412,8 +411,7 @@ static int futex_mpol(struct mm_struct *mm, unsigned long addr)
  * private hash) is returned if existing. Otherwise a hash bucket from the
  * global hash is returned.
  */
-static struct futex_hash_bucket *
-__futex_hash(union futex_key *key, struct futex_private_hash *fph)
+static struct futex_hash_bucket *__futex_hash(union futex_key *key, struct futex_private_hash *fph)
 {
 	int node = key->both.node;
 	u32 hash;
@@ -426,8 +424,7 @@ __futex_hash(union futex_key *key, struct futex_private_hash *fph)
 			return hb;
 	}
 
-	hash = jhash2((u32 *)key,
-		      offsetof(typeof(*key), both.offset) / sizeof(u32),
+	hash = jhash2((u32 *)key, offsetof(typeof(*key), both.offset) / sizeof(u32),
 		      key->both.offset);
 
 	if (node == FUTEX_NO_NODE) {
@@ -442,8 +439,7 @@ __futex_hash(union futex_key *key, struct futex_private_hash *fph)
 		 */
 		node = (hash >> futex_hashshift) % nr_node_ids;
 		if (!node_possible(node)) {
-			node = find_next_bit_wrap(node_possible_map.bits,
-						  nr_node_ids, node);
+			node = find_next_bit_wrap(node_possible_map.bits, nr_node_ids, node);
 		}
 	}
 
@@ -460,9 +456,8 @@ __futex_hash(union futex_key *key, struct futex_private_hash *fph)
  * Return: Initialized hrtimer_sleeper structure or NULL if no timeout
  *	   value given
  */
-struct hrtimer_sleeper *
-futex_setup_timer(ktime_t *time, struct hrtimer_sleeper *timeout,
-		  int flags, u64 range_ns)
+struct hrtimer_sleeper *futex_setup_timer(ktime_t *time, struct hrtimer_sleeper *timeout,
+					  int flags, u64 range_ns)
 {
 	if (!time)
 		return NULL;
@@ -1554,17 +1549,17 @@ static void __futex_ref_atomic_begin(struct futex_private_hash *fph)
 	 * otherwise it would be impossible for it to have reported success
 	 * from futex_ref_is_dead().
 	 */
-	WARN_ON_ONCE(atomic_long_read(&mm->futex_atomic) != 0);
+	WARN_ON_ONCE(atomic_long_read(&mm->futex.phash.atomic) != 0);
 
 	/*
 	 * Set the atomic to the bias value such that futex_ref_{get,put}()
 	 * will never observe 0. Will be fixed up in __futex_ref_atomic_end()
 	 * when folding in the percpu count.
 	 */
-	atomic_long_set(&mm->futex_atomic, LONG_MAX);
+	atomic_long_set(&mm->futex.phash.atomic, LONG_MAX);
 	smp_store_release(&fph->state, FR_ATOMIC);
 
-	call_rcu_hurry(&mm->futex_rcu, futex_ref_rcu);
+	call_rcu_hurry(&mm->futex.phash.rcu, futex_ref_rcu);
 }
 
 static void __futex_ref_atomic_end(struct futex_private_hash *fph)
@@ -1585,7 +1580,7 @@ static void __futex_ref_atomic_end(struct futex_private_hash *fph)
 	 * Therefore the per-cpu counter is now stable, sum and reset.
 	 */
 	for_each_possible_cpu(cpu) {
-		unsigned int *ptr = per_cpu_ptr(mm->futex_ref, cpu);
+		unsigned int *ptr = per_cpu_ptr(mm->futex.phash.ref, cpu);
 		count += *ptr;
 		*ptr = 0;
 	}
@@ -1593,7 +1588,7 @@ static void __futex_ref_atomic_end(struct futex_private_hash *fph)
 	/*
 	 * Re-init for the next cycle.
 	 */
-	this_cpu_inc(*mm->futex_ref); /* 0 -> 1 */
+	this_cpu_inc(*mm->futex.phash.ref); /* 0 -> 1 */
 
 	/*
 	 * Add actual count, subtract bias and initial refcount.
@@ -1601,7 +1596,7 @@ static void __futex_ref_atomic_end(struct futex_private_hash *fph)
 	 * The moment this atomic operation happens, futex_ref_is_dead() can
 	 * become true.
 	 */
-	ret = atomic_long_add_return(count - LONG_MAX - 1, &mm->futex_atomic);
+	ret = atomic_long_add_return(count - LONG_MAX - 1, &mm->futex.phash.atomic);
 	if (!ret)
 		wake_up_var(mm);
 
@@ -1611,8 +1606,8 @@ static void __futex_ref_atomic_end(struct futex_private_hash *fph)
 
 static void futex_ref_rcu(struct rcu_head *head)
 {
-	struct mm_struct *mm = container_of(head, struct mm_struct, futex_rcu);
-	struct futex_private_hash *fph = rcu_dereference_raw(mm->futex_phash);
+	struct mm_struct *mm = container_of(head, struct mm_struct, futex.phash.rcu);
+	struct futex_private_hash *fph = rcu_dereference_raw(mm->futex.phash.hash);
 
 	if (fph->state == FR_PERCPU) {
 		/*
@@ -1641,7 +1636,7 @@ static void futex_ref_drop(struct futex_private_hash *fph)
 	/*
 	 * Can only transition the current fph;
 	 */
-	WARN_ON_ONCE(rcu_dereference_raw(mm->futex_phash) != fph);
+	WARN_ON_ONCE(rcu_dereference_raw(mm->futex.phash.hash) != fph);
 	/*
 	 * We enqueue at least one RCU callback. Ensure mm stays if the task
 	 * exits before the transition is completed.
@@ -1652,9 +1647,9 @@ static void futex_ref_drop(struct futex_private_hash *fph)
 	 * In order to avoid the following scenario:
 	 *
 	 * futex_hash()			__futex_pivot_hash()
-	 *   guard(rcu);		  guard(mm->futex_hash_lock);
-	 *   fph = mm->futex_phash;
-	 *				  rcu_assign_pointer(&mm->futex_phash, new);
+	 *   guard(rcu);		  guard(mm->futex.phash.lock);
+	 *   fph = mm->futex.phash.hash;
+	 *				  rcu_assign_pointer(&mm->futex.phash.hash, new);
 	 *				futex_hash_allocate()
 	 *				  futex_ref_drop()
 	 *				    fph->state = FR_ATOMIC;
@@ -1669,7 +1664,7 @@ static void futex_ref_drop(struct futex_private_hash *fph)
 	 * There must be at least one full grace-period between publishing a
 	 * new fph and trying to replace it.
 	 */
-	if (poll_state_synchronize_rcu(mm->futex_batches)) {
+	if (poll_state_synchronize_rcu(mm->futex.phash.batches)) {
 		/*
 		 * There was a grace-period, we can begin now.
 		 */
@@ -1677,7 +1672,7 @@ static void futex_ref_drop(struct futex_private_hash *fph)
 		return;
 	}
 
-	call_rcu_hurry(&mm->futex_rcu, futex_ref_rcu);
+	call_rcu_hurry(&mm->futex.phash.rcu, futex_ref_rcu);
 }
 
 static bool futex_ref_get(struct futex_private_hash *fph)
@@ -1687,11 +1682,11 @@ static bool futex_ref_get(struct futex_private_hash *fph)
 	guard(preempt)();
 
 	if (READ_ONCE(fph->state) == FR_PERCPU) {
-		__this_cpu_inc(*mm->futex_ref);
+		__this_cpu_inc(*mm->futex.phash.ref);
 		return true;
 	}
 
-	return atomic_long_inc_not_zero(&mm->futex_atomic);
+	return atomic_long_inc_not_zero(&mm->futex.phash.atomic);
 }
 
 static bool futex_ref_put(struct futex_private_hash *fph)
@@ -1701,11 +1696,11 @@ static bool futex_ref_put(struct futex_private_hash *fph)
 	guard(preempt)();
 
 	if (READ_ONCE(fph->state) == FR_PERCPU) {
-		__this_cpu_dec(*mm->futex_ref);
+		__this_cpu_dec(*mm->futex.phash.ref);
 		return false;
 	}
 
-	return atomic_long_dec_and_test(&mm->futex_atomic);
+	return atomic_long_dec_and_test(&mm->futex.phash.atomic);
 }
 
 static bool futex_ref_is_dead(struct futex_private_hash *fph)
@@ -1717,27 +1712,23 @@ static bool futex_ref_is_dead(struct futex_private_hash *fph)
 	if (smp_load_acquire(&fph->state) == FR_PERCPU)
 		return false;
 
-	return atomic_long_read(&mm->futex_atomic) == 0;
+	return atomic_long_read(&mm->futex.phash.atomic) == 0;
 }
 
 void futex_mm_init(struct mm_struct *mm)
 {
-	mutex_init(&mm->futex_hash_lock);
-	RCU_INIT_POINTER(mm->futex_phash, NULL);
-	mm->futex_phash_new = NULL;
-	/* futex-ref */
-	mm->futex_ref = NULL;
-	atomic_long_set(&mm->futex_atomic, 0);
-	mm->futex_batches = get_state_synchronize_rcu();
+	memset(&mm->futex, 0, sizeof(mm->futex));
+	mutex_init(&mm->futex.phash.lock);
+	mm->futex.phash.batches = get_state_synchronize_rcu();
 }
 
 void futex_hash_free(struct mm_struct *mm)
 {
 	struct futex_private_hash *fph;
 
-	free_percpu(mm->futex_ref);
-	kvfree(mm->futex_phash_new);
-	fph = rcu_dereference_raw(mm->futex_phash);
+	free_percpu(mm->futex.phash.ref);
+	kvfree(mm->futex.phash.hash_new);
+	fph = rcu_dereference_raw(mm->futex.phash.hash);
 	if (fph)
 		kvfree(fph);
 }
@@ -1748,10 +1739,10 @@ static bool futex_pivot_pending(struct mm_struct *mm)
 
 	guard(rcu)();
 
-	if (!mm->futex_phash_new)
+	if (!mm->futex.phash.hash_new)
 		return true;
 
-	fph = rcu_dereference(mm->futex_phash);
+	fph = rcu_dereference(mm->futex.phash.hash);
 	return futex_ref_is_dead(fph);
 }
 
@@ -1793,7 +1784,7 @@ static int futex_hash_allocate(unsigned int hash_slots, unsigned int flags)
 	 * Once we've disabled the global hash there is no way back.
 	 */
 	scoped_guard(rcu) {
-		fph = rcu_dereference(mm->futex_phash);
+		fph = rcu_dereference(mm->futex.phash.hash);
 		if (fph && !fph->hash_mask) {
 			if (custom)
 				return -EBUSY;
@@ -1801,15 +1792,15 @@ static int futex_hash_allocate(unsigned int hash_slots, unsigned int flags)
 		}
 	}
 
-	if (!mm->futex_ref) {
+	if (!mm->futex.phash.ref) {
 		/*
 		 * This will always be allocated by the first thread and
 		 * therefore requires no locking.
 		 */
-		mm->futex_ref = alloc_percpu(unsigned int);
-		if (!mm->futex_ref)
+		mm->futex.phash.ref = alloc_percpu(unsigned int);
+		if (!mm->futex.phash.ref)
 			return -ENOMEM;
-		this_cpu_inc(*mm->futex_ref); /* 0 -> 1 */
+		this_cpu_inc(*mm->futex.phash.ref); /* 0 -> 1 */
 	}
 
 	fph = kvzalloc(struct_size(fph, queues, hash_slots),
@@ -1832,14 +1823,14 @@ again:
 		wait_var_event(mm, futex_pivot_pending(mm));
 	}
 
-	scoped_guard(mutex, &mm->futex_hash_lock) {
+	scoped_guard(mutex, &mm->futex.phash.lock) {
 		struct futex_private_hash *free __free(kvfree) = NULL;
 		struct futex_private_hash *cur, *new;
 
-		cur = rcu_dereference_protected(mm->futex_phash,
-						lockdep_is_held(&mm->futex_hash_lock));
-		new = mm->futex_phash_new;
-		mm->futex_phash_new = NULL;
+		cur = rcu_dereference_protected(mm->futex.phash.hash,
+						lockdep_is_held(&mm->futex.phash.lock));
+		new = mm->futex.phash.hash_new;
+		mm->futex.phash.hash_new = NULL;
 
 		if (fph) {
 			if (cur && !cur->hash_mask) {
@@ -1849,7 +1840,7 @@ again:
 				 * the second one returns here.
 				 */
 				free = fph;
-				mm->futex_phash_new = new;
+				mm->futex.phash.hash_new = new;
 				return -EBUSY;
 			}
 			if (cur && !new) {
@@ -1879,7 +1870,7 @@ again:
 
 		if (new) {
 			/*
-			 * Will set mm->futex_phash_new on failure;
+			 * Will set mm->futex.phash.hash_new on failure;
 			 * futex_private_hash_get() will try again.
 			 */
 			if (!__futex_pivot_hash(mm, new) && custom)
@@ -1898,11 +1889,9 @@ int futex_hash_allocate_default(void)
 		return 0;
 
 	scoped_guard(rcu) {
-		threads = min_t(unsigned int,
-				get_nr_threads(current),
-				num_online_cpus());
+		threads = min_t(unsigned int, get_nr_threads(current), num_online_cpus());
 
-		fph = rcu_dereference(current->mm->futex_phash);
+		fph = rcu_dereference(current->mm->futex.phash.hash);
 		if (fph) {
 			if (fph->custom)
 				return 0;
@@ -1929,7 +1918,7 @@ static int futex_hash_get_slots(void)
 	struct futex_private_hash *fph;
 
 	guard(rcu)();
-	fph = rcu_dereference(current->mm->futex_phash);
+	fph = rcu_dereference(current->mm->futex.phash.hash);
 	if (fph && fph->hash_mask)
 		return fph->hash_mask + 1;
 	return 0;


* [tip: locking/core] futex: Make futex_mm_init() void
  2026-06-02  9:09 ` [patch V5 03/16] futex: Make futex_mm_init() void Thomas Gleixner
@ 2026-06-03 14:25   ` tip-bot2 for Thomas Gleixner
  0 siblings, 0 replies; 43+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2026-06-03 14:25 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     d7b3f52c861f54ba2fff15696d3798277fb4c19f
Gitweb:        https://git.kernel.org/tip/d7b3f52c861f54ba2fff15696d3798277fb4c19f
Author:        Thomas Gleixner <tglx@kernel.org>
AuthorDate:    Tue, 02 Jun 2026 11:09:29 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 03 Jun 2026 11:38:49 +02:00

futex: Make futex_mm_init() void

Nothing fails there. Mop up the leftovers of the early version of this,
which did an allocation.

While at it clean up the stubs and the #ifdef comments to make the header
file readable.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260602090535.356789395@kernel.org
---
 include/linux/futex.h | 28 +++++++++++-----------------
 kernel/fork.c         |  8 ++------
 kernel/futex/core.c   |  3 +--
 3 files changed, 14 insertions(+), 25 deletions(-)
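
The effect on the error handling in mm_init() can be illustrated with a small user space sketch. It assumes a simplified setup sequence; every demo_* name is hypothetical and only mirrors the shape of the change: an initializer which cannot fail returns void, so the caller needs neither a return value check nor a dedicated unwind label for it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct demo_mm {
	int	futex_ready;
	void	*pgd;
};

/* Cannot fail, so it returns void and requires no unwind step. */
static void demo_futex_init(struct demo_mm *mm)
{
	mm->futex_ready = 1;
}

/* Hypothetical fallible step; @fail forces the error path. */
static int demo_alloc_pgd(struct demo_mm *mm, bool fail)
{
	mm->pgd = fail ? NULL : malloc(64);
	return mm->pgd ? 0 : -1;
}

static struct demo_mm *demo_mm_init(bool fail_pgd)
{
	struct demo_mm *mm = calloc(1, sizeof(*mm));

	if (!mm)
		return NULL;

	/* Infallible: no check and no label of its own */
	demo_futex_init(mm);

	/* The first fallible step jumps straight to the final label */
	if (demo_alloc_pgd(mm, fail_pgd))
		goto fail_mm_init;

	return mm;

fail_mm_init:
	free(mm);
	return NULL;
}

static void demo_mm_free(struct demo_mm *mm)
{
	free(mm->pgd);
	free(mm);
}
```

Because the void initializer allocates nothing, it can run early among the other unconditional setup calls and the unwind chain shrinks by one label.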

diff --git a/include/linux/futex.h b/include/linux/futex.h
index 563e8dd..9e6218c 100644
--- a/include/linux/futex.h
+++ b/include/linux/futex.h
@@ -81,22 +81,20 @@ int futex_hash_prctl(unsigned long arg2, unsigned long arg3, unsigned long arg4)
 #ifdef CONFIG_FUTEX_PRIVATE_HASH
 int futex_hash_allocate_default(void);
 void futex_hash_free(struct mm_struct *mm);
-int futex_mm_init(struct mm_struct *mm);
-
-#else /* !CONFIG_FUTEX_PRIVATE_HASH */
+void futex_mm_init(struct mm_struct *mm);
+#else  /* CONFIG_FUTEX_PRIVATE_HASH */
 static inline int futex_hash_allocate_default(void) { return 0; }
 static inline int futex_hash_free(struct mm_struct *mm) { return 0; }
-static inline int futex_mm_init(struct mm_struct *mm) { return 0; }
-#endif /* CONFIG_FUTEX_PRIVATE_HASH */
+static inline void futex_mm_init(struct mm_struct *mm) { }
+#endif /* !CONFIG_FUTEX_PRIVATE_HASH */
 
-#else /* !CONFIG_FUTEX */
+#else  /* CONFIG_FUTEX */
 static inline void futex_init_task(struct task_struct *tsk) { }
 static inline void futex_exit_recursive(struct task_struct *tsk) { }
 static inline void futex_exit_release(struct task_struct *tsk) { }
 static inline void futex_exec_release(struct task_struct *tsk) { }
-static inline long do_futex(u32 __user *uaddr, int op, u32 val,
-			    ktime_t *timeout, u32 __user *uaddr2,
-			    u32 val2, u32 val3)
+static inline long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
+			    u32 __user *uaddr2, u32 val2, u32 val3)
 {
 	return -EINVAL;
 }
@@ -104,13 +102,9 @@ static inline int futex_hash_prctl(unsigned long arg2, unsigned long arg3, unsig
 {
 	return -EINVAL;
 }
-static inline int futex_hash_allocate_default(void)
-{
-	return 0;
-}
+static inline int futex_hash_allocate_default(void) { return 0; }
 static inline int futex_hash_free(struct mm_struct *mm) { return 0; }
-static inline int futex_mm_init(struct mm_struct *mm) { return 0; }
-
-#endif
+static inline void futex_mm_init(struct mm_struct *mm) { }
+#endif /* !CONFIG_FUTEX */
 
-#endif
+#endif /* _LINUX_FUTEX_H */
diff --git a/kernel/fork.c b/kernel/fork.c
index 5f3fdfd..bb490d9 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1101,6 +1101,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
 #endif
 	mm_init_uprobes_state(mm);
 	hugetlb_count_init(mm);
+	futex_mm_init(mm);
 
 	mm_flags_clear_all(mm);
 	if (current->mm) {
@@ -1113,11 +1114,8 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
 		mm->def_flags = 0;
 	}
 
-	if (futex_mm_init(mm))
-		goto fail_mm_init;
-
 	if (mm_alloc_pgd(mm))
-		goto fail_nopgd;
+		goto fail_mm_init;
 
 	if (mm_alloc_id(mm))
 		goto fail_noid;
@@ -1144,8 +1142,6 @@ fail_nocontext:
 	mm_free_id(mm);
 fail_noid:
 	mm_free_pgd(mm);
-fail_nopgd:
-	futex_hash_free(mm);
 fail_mm_init:
 	free_mm(mm);
 	return NULL;
diff --git a/kernel/futex/core.c b/kernel/futex/core.c
index e7d33d2..ec23de4 100644
--- a/kernel/futex/core.c
+++ b/kernel/futex/core.c
@@ -1720,7 +1720,7 @@ static bool futex_ref_is_dead(struct futex_private_hash *fph)
 	return atomic_long_read(&mm->futex_atomic) == 0;
 }
 
-int futex_mm_init(struct mm_struct *mm)
+void futex_mm_init(struct mm_struct *mm)
 {
 	mutex_init(&mm->futex_hash_lock);
 	RCU_INIT_POINTER(mm->futex_phash, NULL);
@@ -1729,7 +1729,6 @@ int futex_mm_init(struct mm_struct *mm)
 	mm->futex_ref = NULL;
 	atomic_long_set(&mm->futex_atomic, 0);
 	mm->futex_batches = get_state_synchronize_rcu();
-	return 0;
 }
 
 void futex_hash_free(struct mm_struct *mm)


* [tip: locking/core] futex: Move futex task related data into a struct
  2026-06-02  9:09 ` [patch V5 02/16] futex: Move futex task related data into a struct Thomas Gleixner
@ 2026-06-03 14:25   ` tip-bot2 for Thomas Gleixner
  0 siblings, 0 replies; 43+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2026-06-03 14:25 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Peter Zijlstra (Intel), Mathieu Desnoyers,
	andrealmeid, x86, linux-kernel

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     c1ffc9c6e4f8a13dd68e97920c9a24d095c6e41a
Gitweb:        https://git.kernel.org/tip/c1ffc9c6e4f8a13dd68e97920c9a24d095c6e41a
Author:        Thomas Gleixner <tglx@kernel.org>
AuthorDate:    Tue, 02 Jun 2026 11:09:25 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 03 Jun 2026 11:38:49 +02:00

futex: Move futex task related data into a struct

Having all these members in task_struct along with the required #ifdeffery
is annoying, does not allow efficient initializing of the data with
memset() and makes extending it tedious.

Move it into a data structure and fix up all usage sites.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: André Almeida <andrealmeid@igalia.com>
Link: https://patch.msgid.link/20260602090535.308220888@kernel.org
---
 Documentation/locking/robust-futexes.rst |  8 +--
 include/linux/futex.h                    | 12 +----
 include/linux/futex_types.h              | 36 ++++++++++++++-
 include/linux/sched.h                    | 16 +-----
 kernel/exit.c                            |  4 +-
 kernel/futex/core.c                      | 59 +++++++++++------------
 kernel/futex/pi.c                        | 26 +++++-----
 kernel/futex/syscalls.c                  | 23 +++------
 8 files changed, 101 insertions(+), 83 deletions(-)
 create mode 100644 include/linux/futex_types.h
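
The initialization pattern which futex_init_task() adopts in this patch can be sketched in user space as follows. The demo_* types are hypothetical simplifications (no mutex, no compat pointer) and only show the idea: zero the whole embedded struct once, then set up the members for which all zero bits is not the right initial state.

```c
#include <assert.h>
#include <string.h>

struct demo_list_head {
	struct demo_list_head *next, *prev;
};

enum { DEMO_STATE_OK, DEMO_STATE_EXITING, DEMO_STATE_DEAD };

/* Hypothetical, reduced counterpart of the per task futex data. */
struct demo_futex_task_data {
	void			*robust_list;
	struct demo_list_head	pi_state_list;
	void			*pi_state_cache;
	unsigned int		state;
};

struct demo_task {
	int				pid;
	struct demo_futex_task_data	futex;
};

static void demo_futex_init_task(struct demo_task *tsk)
{
	/* Zero the whole embedded struct: the pointers become NULL. */
	memset(&tsk->futex, 0, sizeof(tsk->futex));

	/* An empty list head points at itself, not at NULL. */
	tsk->futex.pi_state_list.next = &tsk->futex.pi_state_list;
	tsk->futex.pi_state_list.prev = &tsk->futex.pi_state_list;

	/* Stated explicitly for readability even though the value is 0. */
	tsk->futex.state = DEMO_STATE_OK;
}
```

A member added to the struct later starts out zeroed without any change to the init function, which is the extension benefit the commit message refers to.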

diff --git a/Documentation/locking/robust-futexes.rst b/Documentation/locking/robust-futexes.rst
index 6361fb0..1423f53 100644
--- a/Documentation/locking/robust-futexes.rst
+++ b/Documentation/locking/robust-futexes.rst
@@ -94,7 +94,7 @@ time, the kernel checks this user-space list: are there any robust futex
 locks to be cleaned up?
 
 In the common case, at do_exit() time, there is no list registered, so
-the cost of robust futexes is just a simple current->robust_list != NULL
+the cost of robust futexes is just a current->futex.robust_list != NULL
 comparison. If the thread has registered a list, then normally the list
 is empty. If the thread/process crashed or terminated in some incorrect
 way then the list might be non-empty: in this case the kernel carefully
@@ -178,9 +178,9 @@ one to query the registered list pointer::
                      size_t __user *len_ptr);
 
 List registration is very fast: the pointer is simply stored in
-current->robust_list. [Note that in the future, if robust futexes become
-widespread, we could extend sys_clone() to register a robust-list head
-for new threads, without the need of another syscall.]
+current->futex.robust_list. [Note that in the future, if robust futexes
+become widespread, we could extend sys_clone() to register a robust-list
+head for new threads, without the need of another syscall.]
 
 So there is virtually zero overhead for tasks not using robust futexes,
 and even for robust futex users, there is only one extra syscall per
diff --git a/include/linux/futex.h b/include/linux/futex.h
index 9e9750f..563e8dd 100644
--- a/include/linux/futex.h
+++ b/include/linux/futex.h
@@ -64,14 +64,10 @@ enum {
 
 static inline void futex_init_task(struct task_struct *tsk)
 {
-	tsk->robust_list = NULL;
-#ifdef CONFIG_COMPAT
-	tsk->compat_robust_list = NULL;
-#endif
-	INIT_LIST_HEAD(&tsk->pi_state_list);
-	tsk->pi_state_cache = NULL;
-	tsk->futex_state = FUTEX_STATE_OK;
-	mutex_init(&tsk->futex_exit_mutex);
+	memset(&tsk->futex, 0, sizeof(tsk->futex));
+	INIT_LIST_HEAD(&tsk->futex.pi_state_list);
+	tsk->futex.state = FUTEX_STATE_OK;
+	mutex_init(&tsk->futex.exit_mutex);
 }
 
 void futex_exit_recursive(struct task_struct *tsk);
diff --git a/include/linux/futex_types.h b/include/linux/futex_types.h
new file mode 100644
index 0000000..9c6c0dc
--- /dev/null
+++ b/include/linux/futex_types.h
@@ -0,0 +1,36 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_FUTEX_TYPES_H
+#define _LINUX_FUTEX_TYPES_H
+
+#ifdef CONFIG_FUTEX
+#include <linux/mutex_types.h>
+#include <linux/types.h>
+
+struct compat_robust_list_head;
+struct futex_pi_state;
+struct robust_list_head;
+
+/**
+ * struct futex_sched_data - Futex related per task data
+ * @robust_list:	User space registered robust list pointer
+ * @compat_robust_list:	User space registered robust list pointer for compat tasks
+ * @pi_state_list:	List head for Priority Inheritance (PI) state management
+ * @pi_state_cache:	Pointer to cache one PI state object per task
+ * @exit_mutex:		Mutex for serializing exit
+ * @state:		Futex handling state to handle exit races correctly
+ */
+struct futex_sched_data {
+	struct robust_list_head __user		*robust_list;
+#ifdef CONFIG_COMPAT
+	struct compat_robust_list_head __user	*compat_robust_list;
+#endif
+	struct list_head			pi_state_list;
+	struct futex_pi_state			*pi_state_cache;
+	struct mutex				exit_mutex;
+	unsigned int				state;
+};
+#else
+struct futex_sched_data { };
+#endif /* !CONFIG_FUTEX */
+
+#endif /* _LINUX_FUTEX_TYPES_H */
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 368c7b4..c88fc10 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -16,6 +16,7 @@
 #include <linux/cpumask_types.h>
 
 #include <linux/cache.h>
+#include <linux/futex_types.h>
 #include <linux/irqflags_types.h>
 #include <linux/smp_types.h>
 #include <linux/pid_types.h>
@@ -64,7 +65,6 @@ struct bpf_net_context;
 struct capture_control;
 struct cfs_rq;
 struct fs_struct;
-struct futex_pi_state;
 struct io_context;
 struct io_uring_task;
 struct mempolicy;
@@ -76,7 +76,6 @@ struct pid_namespace;
 struct pipe_inode_info;
 struct rcu_node;
 struct reclaim_state;
-struct robust_list_head;
 struct root_domain;
 struct rq;
 struct sched_attr;
@@ -1331,16 +1330,9 @@ struct task_struct {
 	u32				closid;
 	u32				rmid;
 #endif
-#ifdef CONFIG_FUTEX
-	struct robust_list_head __user	*robust_list;
-#ifdef CONFIG_COMPAT
-	struct compat_robust_list_head __user *compat_robust_list;
-#endif
-	struct list_head		pi_state_list;
-	struct futex_pi_state		*pi_state_cache;
-	struct mutex			futex_exit_mutex;
-	unsigned int			futex_state;
-#endif
+
+	struct futex_sched_data		futex;
+
 #ifdef CONFIG_PERF_EVENTS
 	u8				perf_recursion[PERF_NR_CONTEXTS];
 	struct perf_event_context	*perf_event_ctxp;
diff --git a/kernel/exit.c b/kernel/exit.c
index 25e9cb6..1b4e55b 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -988,8 +988,8 @@ void __noreturn do_exit(long code)
 	proc_exit_connector(tsk);
 	mpol_put_task_policy(tsk);
 #ifdef CONFIG_FUTEX
-	if (unlikely(current->pi_state_cache))
-		kfree(current->pi_state_cache);
+	if (unlikely(current->futex.pi_state_cache))
+		kfree(current->futex.pi_state_cache);
 #endif
 	/*
 	 * Make sure we are holding no locks:
diff --git a/kernel/futex/core.c b/kernel/futex/core.c
index ff2a4fb..e7d33d2 100644
--- a/kernel/futex/core.c
+++ b/kernel/futex/core.c
@@ -32,18 +32,19 @@
  *  "But they come in a choice of three flavours!"
  */
 #include <linux/compat.h>
-#include <linux/jhash.h>
-#include <linux/pagemap.h>
 #include <linux/debugfs.h>
-#include <linux/plist.h>
+#include <linux/fault-inject.h>
 #include <linux/gfp.h>
-#include <linux/vmalloc.h>
+#include <linux/jhash.h>
 #include <linux/memblock.h>
-#include <linux/fault-inject.h>
-#include <linux/slab.h>
-#include <linux/prctl.h>
 #include <linux/mempolicy.h>
 #include <linux/mmap_lock.h>
+#include <linux/pagemap.h>
+#include <linux/plist.h>
+#include <linux/prctl.h>
+#include <linux/rseq.h>
+#include <linux/slab.h>
+#include <linux/vmalloc.h>
 
 #include "futex.h"
 #include "../locking/rtmutex_common.h"
@@ -829,7 +830,7 @@ void wait_for_owner_exiting(int ret, struct task_struct *exiting)
 	if (WARN_ON_ONCE(ret == -EBUSY && !exiting))
 		return;
 
-	mutex_lock(&exiting->futex_exit_mutex);
+	mutex_lock(&exiting->futex.exit_mutex);
 	/*
 	 * No point in doing state checking here. If the waiter got here
 	 * while the task was in exec()->exec_futex_release() then it can
@@ -838,7 +839,7 @@ void wait_for_owner_exiting(int ret, struct task_struct *exiting)
 	 * already. Highly unlikely and not a problem. Just one more round
 	 * through the futex maze.
 	 */
-	mutex_unlock(&exiting->futex_exit_mutex);
+	mutex_unlock(&exiting->futex.exit_mutex);
 
 	put_task_struct(exiting);
 }
@@ -1047,7 +1048,7 @@ retry:
 	 *
 	 * In both cases the following conditions are met:
 	 *
-	 *	1) task->robust_list->list_op_pending != NULL
+	 *	1) task->futex.robust_list->list_op_pending != NULL
 	 *	   @pending_op == true
 	 *	2) The owner part of user space futex value == 0
 	 *	3) Regular futex: @pi == false
@@ -1152,7 +1153,7 @@ static inline int fetch_robust_entry(struct robust_list __user **entry,
  */
 static void exit_robust_list(struct task_struct *curr)
 {
-	struct robust_list_head __user *head = curr->robust_list;
+	struct robust_list_head __user *head = curr->futex.robust_list;
 	struct robust_list __user *entry, *next_entry, *pending;
 	unsigned int limit = ROBUST_LIST_LIMIT, pi, pip;
 	unsigned int next_pi;
@@ -1246,7 +1247,7 @@ compat_fetch_robust_entry(compat_uptr_t *uentry, struct robust_list __user **ent
  */
 static void compat_exit_robust_list(struct task_struct *curr)
 {
-	struct compat_robust_list_head __user *head = curr->compat_robust_list;
+	struct compat_robust_list_head __user *head = curr->futex.compat_robust_list;
 	struct robust_list __user *entry, *next_entry, *pending;
 	unsigned int limit = ROBUST_LIST_LIMIT, pi, pip;
 	unsigned int next_pi;
@@ -1322,7 +1323,7 @@ static void compat_exit_robust_list(struct task_struct *curr)
  */
 static void exit_pi_state_list(struct task_struct *curr)
 {
-	struct list_head *next, *head = &curr->pi_state_list;
+	struct list_head *next, *head = &curr->futex.pi_state_list;
 	struct futex_pi_state *pi_state;
 	union futex_key key = FUTEX_KEY_INIT;
 
@@ -1406,19 +1407,19 @@ static inline void exit_pi_state_list(struct task_struct *curr) { }
 
 static void futex_cleanup(struct task_struct *tsk)
 {
-	if (unlikely(tsk->robust_list)) {
+	if (unlikely(tsk->futex.robust_list)) {
 		exit_robust_list(tsk);
-		tsk->robust_list = NULL;
+		tsk->futex.robust_list = NULL;
 	}
 
 #ifdef CONFIG_COMPAT
-	if (unlikely(tsk->compat_robust_list)) {
+	if (unlikely(tsk->futex.compat_robust_list)) {
 		compat_exit_robust_list(tsk);
-		tsk->compat_robust_list = NULL;
+		tsk->futex.compat_robust_list = NULL;
 	}
 #endif
 
-	if (unlikely(!list_empty(&tsk->pi_state_list)))
+	if (unlikely(!list_empty(&tsk->futex.pi_state_list)))
 		exit_pi_state_list(tsk);
 }
 
@@ -1442,23 +1443,23 @@ static void futex_cleanup(struct task_struct *tsk)
 void futex_exit_recursive(struct task_struct *tsk)
 {
 	/* If the state is FUTEX_STATE_EXITING then futex_exit_mutex is held */
-	if (tsk->futex_state == FUTEX_STATE_EXITING) {
-		__assume_ctx_lock(&tsk->futex_exit_mutex);
-		mutex_unlock(&tsk->futex_exit_mutex);
+	if (tsk->futex.state == FUTEX_STATE_EXITING) {
+		__assume_ctx_lock(&tsk->futex.exit_mutex);
+		mutex_unlock(&tsk->futex.exit_mutex);
 	}
-	tsk->futex_state = FUTEX_STATE_DEAD;
+	tsk->futex.state = FUTEX_STATE_DEAD;
 }
 
 static void futex_cleanup_begin(struct task_struct *tsk)
-	__acquires(&tsk->futex_exit_mutex)
+	__acquires(&tsk->futex.exit_mutex)
 {
 	/*
 	 * Prevent various race issues against a concurrent incoming waiter
 	 * including live locks by forcing the waiter to block on
-	 * tsk->futex_exit_mutex when it observes FUTEX_STATE_EXITING in
+	 * tsk->futex.exit_mutex when it observes FUTEX_STATE_EXITING in
 	 * attach_to_pi_owner().
 	 */
-	mutex_lock(&tsk->futex_exit_mutex);
+	mutex_lock(&tsk->futex.exit_mutex);
 
 	/*
 	 * Switch the state to FUTEX_STATE_EXITING under tsk->pi_lock.
@@ -1472,23 +1473,23 @@ static void futex_cleanup_begin(struct task_struct *tsk)
 	 * be observed in exit_pi_state_list().
 	 */
 	raw_spin_lock_irq(&tsk->pi_lock);
-	tsk->futex_state = FUTEX_STATE_EXITING;
+	tsk->futex.state = FUTEX_STATE_EXITING;
 	raw_spin_unlock_irq(&tsk->pi_lock);
 }
 
 static void futex_cleanup_end(struct task_struct *tsk, int state)
-	__releases(&tsk->futex_exit_mutex)
+	__releases(&tsk->futex.exit_mutex)
 {
 	/*
 	 * Lockless store. The only side effect is that an observer might
 	 * take another loop until it becomes visible.
 	 */
-	tsk->futex_state = state;
+	tsk->futex.state = state;
 	/*
 	 * Drop the exit protection. This unblocks waiters which observed
 	 * FUTEX_STATE_EXITING to reevaluate the state.
 	 */
-	mutex_unlock(&tsk->futex_exit_mutex);
+	mutex_unlock(&tsk->futex.exit_mutex);
 }
 
 void futex_exec_release(struct task_struct *tsk)
diff --git a/kernel/futex/pi.c b/kernel/futex/pi.c
index 643199f..e037a97 100644
--- a/kernel/futex/pi.c
+++ b/kernel/futex/pi.c
@@ -14,7 +14,7 @@ int refill_pi_state_cache(void)
 {
 	struct futex_pi_state *pi_state;
 
-	if (likely(current->pi_state_cache))
+	if (likely(current->futex.pi_state_cache))
 		return 0;
 
 	pi_state = kzalloc_obj(*pi_state);
@@ -28,17 +28,17 @@ int refill_pi_state_cache(void)
 	refcount_set(&pi_state->refcount, 1);
 	pi_state->key = FUTEX_KEY_INIT;
 
-	current->pi_state_cache = pi_state;
+	current->futex.pi_state_cache = pi_state;
 
 	return 0;
 }
 
 static struct futex_pi_state *alloc_pi_state(void)
 {
-	struct futex_pi_state *pi_state = current->pi_state_cache;
+	struct futex_pi_state *pi_state = current->futex.pi_state_cache;
 
 	WARN_ON(!pi_state);
-	current->pi_state_cache = NULL;
+	current->futex.pi_state_cache = NULL;
 
 	return pi_state;
 }
@@ -60,7 +60,7 @@ static void pi_state_update_owner(struct futex_pi_state *pi_state,
 	if (new_owner) {
 		raw_spin_lock(&new_owner->pi_lock);
 		WARN_ON(!list_empty(&pi_state->list));
-		list_add(&pi_state->list, &new_owner->pi_state_list);
+		list_add(&pi_state->list, &new_owner->futex.pi_state_list);
 		pi_state->owner = new_owner;
 		raw_spin_unlock(&new_owner->pi_lock);
 	}
@@ -96,7 +96,7 @@ void put_pi_state(struct futex_pi_state *pi_state)
 		raw_spin_unlock_irqrestore(&pi_state->pi_mutex.wait_lock, flags);
 	}
 
-	if (current->pi_state_cache) {
+	if (current->futex.pi_state_cache) {
 		kfree(pi_state);
 	} else {
 		/*
@@ -106,7 +106,7 @@ void put_pi_state(struct futex_pi_state *pi_state)
 		 */
 		pi_state->owner = NULL;
 		refcount_set(&pi_state->refcount, 1);
-		current->pi_state_cache = pi_state;
+		current->futex.pi_state_cache = pi_state;
 	}
 }
 
@@ -179,7 +179,7 @@ void put_pi_state(struct futex_pi_state *pi_state)
  *
  * p->pi_lock:
  *
- *	p->pi_state_list -> pi_state->list, relation
+ *	p->futex.pi_state_list -> pi_state->list, relation
  *	pi_mutex->owner -> pi_state->owner, relation
  *
  * pi_state->refcount:
@@ -327,7 +327,7 @@ static int handle_exit_race(u32 __user *uaddr, u32 uval,
 	 * If the futex exit state is not yet FUTEX_STATE_DEAD, tell the
 	 * caller that the alleged owner is busy.
 	 */
-	if (tsk && tsk->futex_state != FUTEX_STATE_DEAD)
+	if (tsk && tsk->futex.state != FUTEX_STATE_DEAD)
 		return -EBUSY;
 
 	/*
@@ -346,8 +346,8 @@ static int handle_exit_race(u32 __user *uaddr, u32 uval,
 	 *    *uaddr = 0xC0000000;	     tsk = get_task(PID);
 	 *   }				     if (!tsk->flags & PF_EXITING) {
 	 *  ...				       attach();
-	 *  tsk->futex_state =               } else {
-	 *	FUTEX_STATE_DEAD;              if (tsk->futex_state !=
+	 *  tsk->futex.state =               } else {
+	 *	FUTEX_STATE_DEAD;              if (tsk->futex.state !=
 	 *					  FUTEX_STATE_DEAD)
 	 *				         return -EAGAIN;
 	 *				       return -ESRCH; <--- FAIL
@@ -396,7 +396,7 @@ static void __attach_to_pi_owner(struct task_struct *p, union futex_key *key,
 	pi_state->key = *key;
 
 	WARN_ON(!list_empty(&pi_state->list));
-	list_add(&pi_state->list, &p->pi_state_list);
+	list_add(&pi_state->list, &p->futex.pi_state_list);
 	/*
 	 * Assignment without holding pi_state->pi_mutex.wait_lock is safe
 	 * because there is no concurrency as the object is not published yet.
@@ -440,7 +440,7 @@ static int attach_to_pi_owner(u32 __user *uaddr, u32 uval, union futex_key *key,
 	 * in futex_exit_release(), we do this protected by p->pi_lock:
 	 */
 	raw_spin_lock_irq(&p->pi_lock);
-	if (unlikely(p->futex_state != FUTEX_STATE_OK)) {
+	if (unlikely(p->futex.state != FUTEX_STATE_OK)) {
 		/*
 		 * The task is on the way out. When the futex state is
 		 * FUTEX_STATE_DEAD, we know that the task has finished
diff --git a/kernel/futex/syscalls.c b/kernel/futex/syscalls.c
index 77ad969..8944ff4 100644
--- a/kernel/futex/syscalls.c
+++ b/kernel/futex/syscalls.c
@@ -25,17 +25,13 @@
  * @head:	pointer to the list-head
  * @len:	length of the list-head, as userspace expects
  */
-SYSCALL_DEFINE2(set_robust_list, struct robust_list_head __user *, head,
-		size_t, len)
+SYSCALL_DEFINE2(set_robust_list, struct robust_list_head __user *, head, size_t, len)
 {
-	/*
-	 * The kernel knows only one size for now:
-	 */
+	/* The kernel knows only one size for now. */
 	if (unlikely(len != sizeof(*head)))
 		return -EINVAL;
 
-	current->robust_list = head;
-
+	current->futex.robust_list = head;
 	return 0;
 }
 
@@ -43,9 +39,9 @@ static inline void __user *futex_task_robust_list(struct task_struct *p, bool co
 {
 #ifdef CONFIG_COMPAT
 	if (compat)
-		return p->compat_robust_list;
+		return p->futex.compat_robust_list;
 #endif
-	return p->robust_list;
+	return p->futex.robust_list;
 }
 
 static void __user *futex_get_robust_list_common(int pid, bool compat)
@@ -475,15 +471,13 @@ SYSCALL_DEFINE4(futex_requeue,
 }
 
 #ifdef CONFIG_COMPAT
-COMPAT_SYSCALL_DEFINE2(set_robust_list,
-		struct compat_robust_list_head __user *, head,
-		compat_size_t, len)
+COMPAT_SYSCALL_DEFINE2(set_robust_list, struct compat_robust_list_head __user *, head,
+		       compat_size_t, len)
 {
 	if (unlikely(len != sizeof(*head)))
 		return -EINVAL;
 
-	current->compat_robust_list = head;
-
+	current->futex.compat_robust_list = head;
 	return 0;
 }
 
@@ -523,4 +517,3 @@ SYSCALL_DEFINE6(futex_time32, u32 __user *, uaddr, int, op, u32, val,
 	return do_futex(uaddr, op, val, tp, uaddr2, (unsigned long)utime, val3);
 }
 #endif /* CONFIG_COMPAT_32BIT_TIME */
-


* [tip: locking/core] percpu: Sanitize __percpu_qual include hell
  2026-06-02  9:09 ` [patch V5 01/16] percpu: Sanitize __percpu_qual include hell Thomas Gleixner
@ 2026-06-03 14:25   ` tip-bot2 for Thomas Gleixner
  0 siblings, 0 replies; 43+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2026-06-03 14:25 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     c06cd66387da92e6cdac44e16c7b5ef9219c53ac
Gitweb:        https://git.kernel.org/tip/c06cd66387da92e6cdac44e16c7b5ef9219c53ac
Author:        Thomas Gleixner <tglx@kernel.org>
AuthorDate:    Tue, 02 Jun 2026 11:09:21 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 03 Jun 2026 11:38:48 +02:00

percpu: Sanitize __percpu_qual include hell

Slapping __percpu_qual into the next available header is sloppy at best.

It's required by __percpu which is defined in compiler_types.h and that is
meant to be included without requiring a boatload of other headers so that
a struct or function declaration can contain a __percpu qualifier w/o
further prerequisites.

This implicit dependency on linux/percpu.h makes that impossible and causes
a major problem when trying to separate headers.

Create asm/percpu_types.h and move it there. Include that from
compiler_types.h and the whole recursion problem goes away.

Fix up UM so it uses the generic header and includes it in the UM_HOST
build, which pulls in compiler_types.h. The USER_CFLAGS fix was suggested
by Richard.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260602090535.254874125@kernel.org
---
 arch/um/Makefile                    |  3 ++-
 arch/um/include/asm/Kbuild          |  1 +
 arch/x86/include/asm/percpu.h       |  5 -----
 arch/x86/include/asm/percpu_types.h | 17 +++++++++++++++++
 include/asm-generic/Kbuild          |  1 +
 include/asm-generic/percpu_types.h  | 19 +++++++++++++++++++
 include/linux/compiler_types.h      |  3 +++
 include/linux/percpu.h              |  9 +++++----
 8 files changed, 48 insertions(+), 10 deletions(-)
 create mode 100644 arch/x86/include/asm/percpu_types.h
 create mode 100644 include/asm-generic/percpu_types.h

diff --git a/arch/um/Makefile b/arch/um/Makefile
index 721b652..937639e 100644
--- a/arch/um/Makefile
+++ b/arch/um/Makefile
@@ -71,7 +71,8 @@ USER_CFLAGS = $(patsubst $(KERNEL_DEFINES),,$(patsubst -I%,,$(KBUILD_CFLAGS))) \
 		-D_FILE_OFFSET_BITS=64 -idirafter $(srctree)/include \
 		-idirafter $(objtree)/include -D__KERNEL__ -D__UM_HOST__ \
 		-include $(srctree)/include/linux/compiler-version.h \
-		-include $(srctree)/include/linux/kconfig.h
+		-include $(srctree)/include/linux/kconfig.h \
+		-idirafter $(ARCH_DIR)/include/generated
 
 #This will adjust *FLAGS accordingly to the platform.
 include $(srctree)/$(ARCH_DIR)/Makefile-os-Linux
diff --git a/arch/um/include/asm/Kbuild b/arch/um/include/asm/Kbuild
index 1b9b82b..e91ba12 100644
--- a/arch/um/include/asm/Kbuild
+++ b/arch/um/include/asm/Kbuild
@@ -16,6 +16,7 @@ generic-y += module.h
 generic-y += module.lds.h
 generic-y += parport.h
 generic-y += percpu.h
+generic-y += percpu_types.h
 generic-y += preempt.h
 generic-y += runtime-const.h
 generic-y += softirq_stack.h
diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index 4099814..cef9a4c 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -40,12 +40,10 @@
 #endif
 
 #define __percpu_prefix
-#define __percpu_seg_override	CONCATENATE(__seg_, __percpu_seg)
 
 #else /* !CONFIG_CC_HAS_NAMED_AS: */
 
 #define __percpu_prefix		__force_percpu_prefix
-#define __percpu_seg_override
 
 #endif /* CONFIG_CC_HAS_NAMED_AS */
 
@@ -82,7 +80,6 @@
 
 #define __force_percpu_prefix
 #define __percpu_prefix
-#define __percpu_seg_override
 
 #define PER_CPU_VAR(var)	(var)__percpu_rel
 
@@ -92,8 +89,6 @@
 # define __my_cpu_type(var)	typeof(var)
 # define __my_cpu_ptr(ptr)	(ptr)
 # define __my_cpu_var(var)	(var)
-
-# define __percpu_qual		__percpu_seg_override
 #else
 # define __my_cpu_type(var)	typeof(var) __percpu_seg_override
 # define __my_cpu_ptr(ptr)	(__my_cpu_type(*(ptr))*)(__force uintptr_t)(ptr)
diff --git a/arch/x86/include/asm/percpu_types.h b/arch/x86/include/asm/percpu_types.h
new file mode 100644
index 0000000..0aa3e47
--- /dev/null
+++ b/arch/x86/include/asm/percpu_types.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_PERCPU_TYPES_H
+#define _ASM_X86_PERCPU_TYPES_H
+
+#if defined(CONFIG_SMP) && defined(CONFIG_CC_HAS_NAMED_AS)
+#define __percpu_seg_override	CONCATENATE(__seg_, __percpu_seg)
+#else /* !CONFIG_CC_HAS_NAMED_AS: */
+#define __percpu_seg_override
+#endif
+
+#if defined(CONFIG_USE_X86_SEG_SUPPORT) && defined(USE_TYPEOF_UNQUAL)
+#define __percpu_qual		__percpu_seg_override
+#endif
+
+#include <asm-generic/percpu_types.h>
+
+#endif
diff --git a/include/asm-generic/Kbuild b/include/asm-generic/Kbuild
index 2c53a1e..15df9dc 100644
--- a/include/asm-generic/Kbuild
+++ b/include/asm-generic/Kbuild
@@ -44,6 +44,7 @@ mandatory-y += module.lds.h
 mandatory-y += msi.h
 mandatory-y += pci.h
 mandatory-y += percpu.h
+mandatory-y += percpu_types.h
 mandatory-y += pgalloc.h
 mandatory-y += preempt.h
 mandatory-y += rqspinlock.h
diff --git a/include/asm-generic/percpu_types.h b/include/asm-generic/percpu_types.h
new file mode 100644
index 0000000..a095cea
--- /dev/null
+++ b/include/asm-generic/percpu_types.h
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_GENERIC_PERCPU_TYPES_H_
+#define _ASM_GENERIC_PERCPU_TYPES_H_
+
+#ifndef __ASSEMBLER__
+/*
+ * __percpu_qual is the qualifier for the percpu named address space.
+ *
+ * Most architectures use generic named address space for percpu variables but
+ * some architectures define percpu variables in different named address space.
+ * E.g. on x86, percpu variable may be declared as being relative to the %fs or
+ * %gs segments using __seg_fs or __seg_gs named address space qualifier.
+ */
+#ifndef __percpu_qual
+# define __percpu_qual
+#endif
+
+#endif /* __ASSEMBLER__ */
+#endif /* _ASM_GENERIC_PERCPU_TYPES_H_ */
diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h
index e8fd775..7ad37ad 100644
--- a/include/linux/compiler_types.h
+++ b/include/linux/compiler_types.h
@@ -634,6 +634,9 @@ struct ftrace_likely_data {
 #else
 #define __unqual_scalar_typeof(x) __typeof_unqual__(x)
 #endif
+
+#include <asm/percpu_types.h>
+
 #endif /* !__ASSEMBLY__ */
 
 /*
diff --git a/include/linux/percpu.h b/include/linux/percpu.h
index 85bf8dd..2f5a889 100644
--- a/include/linux/percpu.h
+++ b/include/linux/percpu.h
@@ -3,13 +3,14 @@
 #define __LINUX_PERCPU_H
 
 #include <linux/alloc_tag.h>
+#include <linux/cleanup.h>
+#include <linux/compiler_types.h>
+#include <linux/init.h>
 #include <linux/mmdebug.h>
-#include <linux/preempt.h>
-#include <linux/smp.h>
 #include <linux/pfn.h>
-#include <linux/init.h>
-#include <linux/cleanup.h>
+#include <linux/preempt.h>
 #include <linux/sched.h>
+#include <linux/smp.h>
 
 #include <asm/percpu.h>
 


* Re: [patch V5 09/16] futex: Add support for unlocking robust futexes
  2026-06-03  8:22   ` Peter Zijlstra
  2026-06-03  9:30     ` Peter Zijlstra
@ 2026-06-03 14:40     ` Thomas Gleixner
  1 sibling, 0 replies; 43+ messages in thread
From: Thomas Gleixner @ 2026-06-03 14:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Mathieu Desnoyers, André Almeida,
	Sebastian Andrzej Siewior, Carlos O'Donell, Florian Weimer,
	Rich Felker, Torvald Riegel, Darren Hart, Ingo Molnar,
	Davidlohr Bueso, Arnd Bergmann, Liam R . Howlett, Uros Bizjak,
	Thomas Weißschuh, Mark Brown, Richard Weinberger

On Wed, Jun 03 2026 at 10:22, Peter Zijlstra wrote:
> On Tue, Jun 02, 2026 at 11:09:55AM +0200, Thomas Gleixner wrote:
>> --- a/kernel/futex/futex.h
>> +++ b/kernel/futex/futex.h
>> @@ -40,6 +40,8 @@
>>  #define FLAGS_NUMA		0x0080
>>  #define FLAGS_STRICT		0x0100
>>  #define FLAGS_MPOL		0x0200
>> +#define FLAGS_UNLOCK_ROBUST	0x0400
>> +#define FLAGS_ROBUST_LIST32	0x0800
>>  
>>  /* FUTEX_ to FLAGS_ */
>>  static inline unsigned int futex_to_flags(unsigned int op)
>> @@ -52,6 +54,12 @@ static inline unsigned int futex_to_flag
>>  	if (op & FUTEX_CLOCK_REALTIME)
>>  		flags |= FLAGS_CLOCKRT;
>>  
>> +	if (op & FUTEX_UNLOCK_ROBUST)
>> +		flags |= FLAGS_UNLOCK_ROBUST;
>> +
>> +	if (op & FUTEX_ROBUST_LIST32)
>> +		flags |= FLAGS_ROBUST_LIST32;
>> +
>>  	return flags;
>>  }
>>  
>
> Would you mind terribly if I did: 's/UNLOCK_ROBUST/ROBUST_UNLOCK/g' on
> the whole series?

No objections.
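
For illustration only, a minimal user space sketch of how the quoted
translation would read after the agreed rename. The FLAGS_ values are the
ones from the quoted hunk. The DEMO_FUTEX_ op bit values and the
demo_futex_to_flags() name are assumptions made up for this sketch, as the
real UAPI values are not part of this mail:

```c
/* Internal flag values from the quoted hunk, with the rename applied. */
#define FLAGS_ROBUST_UNLOCK		0x0400
#define FLAGS_ROBUST_LIST32		0x0800

/* Hypothetical op modifier bits. The real UAPI values are not shown here. */
#define DEMO_FUTEX_ROBUST_UNLOCK	0x1000
#define DEMO_FUTEX_ROBUST_LIST32	0x2000

/* Translate the (hypothetical) op modifier bits into internal flags. */
static unsigned int demo_futex_to_flags(unsigned int op)
{
	unsigned int flags = 0;

	if (op & DEMO_FUTEX_ROBUST_UNLOCK)
		flags |= FLAGS_ROBUST_UNLOCK;

	if (op & DEMO_FUTEX_ROBUST_LIST32)
		flags |= FLAGS_ROBUST_LIST32;

	return flags;
}
```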


* Re: [patch V5 11/16] futex: Provide infrastructure to plug the non contended robust futex unlock race
  2026-06-03  9:23   ` Peter Zijlstra
@ 2026-06-03 14:42     ` Thomas Gleixner
  0 siblings, 0 replies; 43+ messages in thread
From: Thomas Gleixner @ 2026-06-03 14:42 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Mathieu Desnoyers, André Almeida,
	Sebastian Andrzej Siewior, Carlos O'Donell, Florian Weimer,
	Rich Felker, Torvald Riegel, Darren Hart, Ingo Molnar,
	Davidlohr Bueso, Arnd Bergmann, Liam R . Howlett, Uros Bizjak,
	Thomas Weißschuh, Mark Brown, Richard Weinberger

On Wed, Jun 03 2026 at 11:23, Peter Zijlstra wrote:

> On Tue, Jun 02, 2026 at 11:10:04AM +0200, Thomas Gleixner wrote:
>> When the FUTEX_ROBUST_UNLOCK mechanism is used for unlocking (PI-)futexes,
>> then the unlock sequence in user space looks like this:
>> 
>>   1)	robust_list_set_op_pending(mutex);
>>   2)	robust_list_remove(mutex);
>> 	
>>   	lval = gettid();
>>   3)	if (atomic_try_cmpxchg(&mutex->lock, lval, 0))
>>   4)		robust_list_clear_op_pending();
>>   	else
>>   5)		sys_futex(OP | FUTEX_ROBUST_UNLOCK, ....);
>> 
>> That still leaves a minimal race window between #3 and #4 where the mutex
>> could be acquired by some other task, which observes that it is the last
>> user and:
>> 
>>   1) unmaps the mutex memory
>>   2) maps a different file, which ends up covering the same address
>> 
>> When then the original task exits before reaching #5 then the kernel robust
>> list handling observes the pending op entry and tries to fix up user space.
>
> This #5 reference, should be #4, yeah? Same bit of Changelog is
> replicated in a later patch and has the same issue.

Yes.
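
For readers following along, a minimal user space sketch of the uncontended
path of the quoted sequence, using C11 atomics. All demo_ names and the
struct layouts are assumptions for illustration and do not match the real
robust list ABI. The race window discussed above sits between step #3 and
step #4:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for a robust mutex and the per-thread list head. */
struct demo_mutex {
	_Atomic uint32_t lock;		/* 0 == unlocked, otherwise owner TID */
};

struct demo_robust_head {
	struct demo_mutex *list_op_pending;
};

static _Thread_local struct demo_robust_head demo_head;

/*
 * Returns true when the lock word was released in user space. Returns
 * false when the caller would have to enter the kernel (step #5), in
 * which case the pending op pointer is intentionally left set.
 */
static bool demo_robust_unlock(struct demo_mutex *m, uint32_t tid)
{
	uint32_t expected = tid;

	demo_head.list_op_pending = m;			/* step #1 */
	/* step #2: removal from the robust list is omitted in this sketch */

	if (atomic_compare_exchange_strong_explicit(&m->lock, &expected, 0,
						    memory_order_release,
						    memory_order_relaxed)) {	/* step #3 */
		/*
		 * Race window: the lock is free, but the pending op pointer
		 * still references the mutex. If the task exits here after
		 * another task unmapped the mutex, the exit cleanup follows
		 * a stale pointer.
		 */
		demo_head.list_op_pending = NULL;	/* step #4 */
		return true;
	}
	return false;
}
```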


* Re: [patch V5 11/16] futex: Provide infrastructure to plug the non contended robust futex unlock race
  2026-06-03  9:14   ` Peter Zijlstra
@ 2026-06-03 14:47     ` Thomas Gleixner
  0 siblings, 0 replies; 43+ messages in thread
From: Thomas Gleixner @ 2026-06-03 14:47 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Mathieu Desnoyers, André Almeida,
	Sebastian Andrzej Siewior, Carlos O'Donell, Florian Weimer,
	Rich Felker, Torvald Riegel, Darren Hart, Ingo Molnar,
	Davidlohr Bueso, Arnd Bergmann, Liam R . Howlett, Uros Bizjak,
	Thomas Weißschuh, Mark Brown, Richard Weinberger

On Wed, Jun 03 2026 at 11:14, Peter Zijlstra wrote:
> On Tue, Jun 02, 2026 at 11:10:04AM +0200, Thomas Gleixner wrote:
>> This is only relevant when user space was interrupted and a signal is
>> pending. The fix-up has to be done before signal delivery is attempted
>> because:
>> 
>>    1) The signal might be fatal so get_signal() ends up in do_exit()
>> 
>>    2) The signal handler might crash or the task is killed before returning
>>       from the handler. At that point the instruction pointer in pt_regs is
>>       no longer the instruction pointer of the initially interrupted unlock
>>       sequence.
>
> However, due to the pending field being strictly per thread (thread
> local storage and all that), the whole construct of futex robust unlock
> is not signal safe in the sense that signal handlers must not use it.
>
> A signal handler trying to use this would result in nested use of the
> pending field, and that leads to corrupted state.

That's true already today and this unlock magic does not change it.
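
A minimal sketch of why nested use corrupts the state, assuming a single
pending slot per thread. The demo_ names are made up for illustration and
the signal handler is simulated by a plain nested call:

```c
#include <stddef.h>

/* Hypothetical per-thread robust list head with its single pending slot. */
struct demo_head {
	void *list_op_pending;
};

static struct demo_head demo_head;

static void demo_set_op_pending(void *mutex)
{
	demo_head.list_op_pending = mutex;
}

static void demo_clear_op_pending(void)
{
	demo_head.list_op_pending = NULL;
}

/*
 * Simulates an unlock of @outer which is interrupted by a handler that
 * unlocks @inner. Returns the pending slot as the interrupted code finds
 * it when the handler has returned.
 */
static void *demo_nested_unlock(void *outer, void *inner)
{
	demo_set_op_pending(outer);	/* outer unlock started */

	/* "Signal handler" runs here and uses the same slot */
	demo_set_op_pending(inner);	/* overwrites the outer entry */
	demo_clear_op_pending();	/* handler finished its unlock */

	/*
	 * Back in the interrupted code: the slot is empty although the
	 * outer unlock has not completed, so an exit at this point would
	 * not see the outer mutex as pending.
	 */
	return demo_head.list_op_pending;
}
```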

>> Other architectures might need to do more complex evaluations due to LLSC,
>> but the approach is valid in general. The size of the pointer is determined
>> from the matching range struct, which covers both 32-bit and 64-bit builds
>> including COMPAT.
>
> So my initial thoughts today were that we should probably also move the
> IP to .Lend, to prevent userspace from writing to that location again.
>
> However, due to the above mentioned restrictions vs signals, there
> cannot be a situation where this matters, and so the point is moot.
>
> A double store is harmless and it makes the kernel just this little bit
> simpler.
>
> The only reason I'm sending this email is to have this more explicitly
> documented for posterity I suppose ;-)

Indeed :)


end of thread, other threads:[~2026-06-03 14:47 UTC | newest]

Thread overview: 43+ messages
2026-06-02  9:09 [patch V5 00/16] futex: Address the robust futex unlock race for real Thomas Gleixner
2026-06-02  9:09 ` [patch V5 01/16] percpu: Sanitize __percpu_qual include hell Thomas Gleixner
2026-06-03 14:25   ` [tip: locking/core] " tip-bot2 for Thomas Gleixner
2026-06-02  9:09 ` [patch V5 02/16] futex: Move futex task related data into a struct Thomas Gleixner
2026-06-03 14:25   ` [tip: locking/core] " tip-bot2 for Thomas Gleixner
2026-06-02  9:09 ` [patch V5 03/16] futex: Make futex_mm_init() void Thomas Gleixner
2026-06-03 14:25   ` [tip: locking/core] " tip-bot2 for Thomas Gleixner
2026-06-02  9:09 ` [patch V5 04/16] futex: Move futex related mm_struct data into a struct Thomas Gleixner
2026-06-03 14:25   ` [tip: locking/core] " tip-bot2 for Thomas Gleixner
2026-06-02  9:09 ` [patch V5 05/16] futex: Provide UABI defines for robust list entry modifiers Thomas Gleixner
2026-06-03 14:25   ` [tip: locking/core] " tip-bot2 for Thomas Gleixner
2026-06-02  9:09 ` [patch V5 06/16] uaccess: Provide unsafe_atomic_store_release_user() Thomas Gleixner
2026-06-03 14:24   ` [tip: locking/core] " tip-bot2 for Thomas Gleixner
2026-06-02  9:09 ` [patch V5 07/16] x86: Select ARCH_MEMORY_ORDER_TSO Thomas Gleixner
2026-06-03 14:24   ` [tip: locking/core] " tip-bot2 for Thomas Gleixner
2026-06-02  9:09 ` [patch V5 08/16] futex: Cleanup UAPI defines Thomas Gleixner
2026-06-03 14:24   ` [tip: locking/core] " tip-bot2 for Thomas Gleixner
2026-06-02  9:09 ` [patch V5 09/16] futex: Add support for unlocking robust futexes Thomas Gleixner
2026-06-03  8:22   ` Peter Zijlstra
2026-06-03  9:30     ` Peter Zijlstra
2026-06-03 14:40     ` Thomas Gleixner
2026-06-03  8:35   ` Peter Zijlstra
2026-06-03 14:24   ` [tip: locking/core] " tip-bot2 for Thomas Gleixner
2026-06-02  9:09 ` [patch V5 10/16] futex: Add robust futex unlock IP range Thomas Gleixner
2026-06-03 14:24   ` [tip: locking/core] " tip-bot2 for Thomas Gleixner
2026-06-02  9:10 ` [patch V5 11/16] futex: Provide infrastructure to plug the non contended robust futex unlock race Thomas Gleixner
2026-06-03  8:42   ` Peter Zijlstra
2026-06-03  9:14   ` Peter Zijlstra
2026-06-03 14:47     ` Thomas Gleixner
2026-06-03  9:23   ` Peter Zijlstra
2026-06-03 14:42     ` Thomas Gleixner
2026-06-03 14:24   ` [tip: locking/core] " tip-bot2 for Thomas Gleixner
2026-06-02  9:10 ` [patch V5 12/16] x86/vdso: Prepare for robust futex unlock support Thomas Gleixner
2026-06-03 14:24   ` [tip: locking/core] " tip-bot2 for Thomas Gleixner
2026-06-02  9:10 ` [patch V5 13/16] x86/vdso: Implement __vdso_futex_robust_try_unlock() Thomas Gleixner
2026-06-03 14:24   ` [tip: locking/core] " tip-bot2 for Thomas Gleixner
2026-06-02  9:10 ` [patch V5 14/16] Documentation: futex: Add a note about robust list race condition Thomas Gleixner
2026-06-03 14:24   ` [tip: locking/core] " tip-bot2 for André Almeida
2026-06-02  9:10 ` [patch V5 15/16] selftests: futex: Add tests for robust release operations Thomas Gleixner
2026-06-03 14:24   ` [tip: locking/core] " tip-bot2 for André Almeida
2026-06-02  9:10 ` [patch V5 16/16] [RFC] vdso, x86: Expose vdso.so.dbg through sysfs Thomas Gleixner
2026-06-02 10:39   ` Thomas Weißschuh
2026-06-02 20:02     ` Thomas Gleixner
