The Linux Kernel Mailing List
* [patch V5 00/16] futex: Address the robust futex unlock race for real
@ 2026-06-02  9:09 Thomas Gleixner
  2026-06-02  9:09 ` [patch V5 01/16] percpu: Sanitize __percpu_qual include hell Thomas Gleixner
                   ` (15 more replies)
  0 siblings, 16 replies; 43+ messages in thread
From: Thomas Gleixner @ 2026-06-02  9:09 UTC
  To: LKML
  Cc: Mathieu Desnoyers, André Almeida, Sebastian Andrzej Siewior,
	Carlos O'Donell, Peter Zijlstra, Florian Weimer, Rich Felker,
	Torvald Riegel, Darren Hart, Ingo Molnar, Davidlohr Bueso,
	Arnd Bergmann, Liam R . Howlett, Uros Bizjak,
	Thomas Weißschuh, Mark Brown, Richard Weinberger

This is a follow-up to v4, which can be found here:

     https://lore.kernel.org/20260402151131.876492985@kernel.org

The v1 cover letter contains a detailed analysis of the underlying
problem:

    https://lore.kernel.org/20260316162316.356674433@kernel.org

TLDR:

The robust futex unlock mechanism is racy with respect to clearing the
robust_list_head::list_op_pending pointer because the unlock and the
clearing of the pointer are not atomic. The race window is between the
unlock and the clearing of the pending op pointer. If the task is forced to
exit in this window, exit will access a potentially invalid pending op
pointer when cleaning up the robust list. That happens if another task
manages to unmap the object containing the lock before the cleanup, which
results in a use-after-free (UAF). In the worst case this UAF can lead to
memory corruption when unrelated content has been mapped to the same
address by the time the access happens.
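
As a rough illustration of the window described above, the following is a
simplified user space model, not code from this series or from any libc.
All demo_* names are hypothetical, and the forced exit is simulated by an
early return instead of a real task exit:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-thread robust list head */
struct demo_robust_head {
	void *list_op_pending;
};

/* Hypothetical lock word holding the owner TID, 0 means unlocked */
struct demo_lock {
	_Atomic uint32_t owner_tid;
};

enum demo_exit_point { DEMO_EXIT_NONE, DEMO_EXIT_IN_WINDOW };

/*
 * Models the non-atomic unlock sequence. Returns the pointer which the
 * exit time robust list cleanup would look at if the task went away at
 * @where.
 */
static void *demo_racy_unlock(struct demo_robust_head *head,
			      struct demo_lock *lock,
			      enum demo_exit_point where)
{
	assert(head && lock);

	/* Step 1: announce the pending operation */
	head->list_op_pending = lock;

	/* Step 2: release the lock. Another task can now take and unmap it */
	atomic_store_explicit(&lock->owner_tid, 0, memory_order_release);

	/*
	 * Race window: the lock is released, but the pending pointer still
	 * references it. A forced exit here leaves a stale pointer behind.
	 */
	if (where == DEMO_EXIT_IN_WINDOW)
		return head->list_op_pending;

	/* Step 3: clear the pending operation, which closes the window */
	head->list_op_pending = NULL;
	return head->list_op_pending;
}
```

With DEMO_EXIT_IN_WINDOW the function returns the address of the already
released lock, which is the pointer the cleanup would dereference. With
DEMO_EXIT_NONE it returns NULL.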

User space can't solve this problem without help from the kernel. This
series provides the kernel side infrastructure to help it along:

  1) Combined unlock, pointer clearing, wake-up for the contended case

  2) VDSO based unlock and pointer clearing helpers with a fix-up function
     in the kernel when user space was interrupted within the critical
     section.

Both ensure that the pointer clearing happens _before_ a task exits and the
kernel cleans up the robust list during the exit procedure.
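
To make the idea behind 2) a bit more concrete, here is a minimal user
space sketch of an instruction pointer range check with a subsequent
pointer clear. It only borrows the start_ip/len/pop_size32 field names from
the delta patch below. The demo_* types and functions are hypothetical, and
the actual fix-up in the series is not reproduced here, so treat this as a
model of the range check only:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one critical section range */
struct demo_cs_range {
	unsigned long start_ip;
	unsigned int len;
	bool pop_size32;
};

/* Takes full addresses, i.e. the caller already added the vdso base */
static void demo_set_cs_range(struct demo_cs_range *r, unsigned long start,
			      unsigned long end, bool sz32)
{
	assert(end >= start);
	r->start_ip = start;
	r->len = (unsigned int)(end - start);
	r->pop_size32 = sz32;
}

/* True if @ip is within [start_ip, start_ip + len) */
static bool demo_ip_in_cs(const struct demo_cs_range *r, unsigned long ip)
{
	/* Unsigned subtraction folds both bounds checks into one compare */
	return ip - r->start_ip < r->len;
}

/*
 * Clears the modeled pending op pointer if the interrupted @ip was inside
 * the critical section. Returns true if a fix-up was applied.
 */
static bool demo_fixup_unlock(const struct demo_cs_range *r, unsigned long ip,
			      void **pending_op)
{
	if (!demo_ip_in_cs(r, ip))
		return false;

	*pending_op = NULL;
	return true;
}
```

An instruction pointer outside of the range leaves the pending pointer
untouched. One inside the range gets it cleared before the exit time
cleanup could observe it.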

Changes since v4:

   - Fixed the build failures caused by using __percpu in the types
     header, which unearthed a nasty header inclusion order problem that
     only worked by chance. Reported by 0-day and Nam.

   - Changed the range initializer to take the full address - Andre

   - Picked up tags where appropriate

   - Added the debug vdso image to sysfs. This allows CI and general
     debugging to find the debug.so in a uniform place. The patch is marked
     RFC and can be discussed and applied/rejected separately from the
     rest. The images are small enough and only one instance of each
     variant is required, so the memory overhead is minimal.

The delta patch (w/o the RFC VDSO sysfs change) against the previous
version can be found below.

The series applies on v7.1-rc2 and is also available via git:

    git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git locking-futex-v5

Thanks,

	tglx
---
diff --git a/arch/um/Makefile b/arch/um/Makefile
index 721b652ffb65..937639edc295 100644
--- a/arch/um/Makefile
+++ b/arch/um/Makefile
@@ -71,7 +71,8 @@ USER_CFLAGS = $(patsubst $(KERNEL_DEFINES),,$(patsubst -I%,,$(KBUILD_CFLAGS))) \
 		-D_FILE_OFFSET_BITS=64 -idirafter $(srctree)/include \
 		-idirafter $(objtree)/include -D__KERNEL__ -D__UM_HOST__ \
 		-include $(srctree)/include/linux/compiler-version.h \
-		-include $(srctree)/include/linux/kconfig.h
+		-include $(srctree)/include/linux/kconfig.h \
+		-idirafter $(ARCH_DIR)/include/generated
 
 #This will adjust *FLAGS accordingly to the platform.
 include $(srctree)/$(ARCH_DIR)/Makefile-os-Linux
diff --git a/arch/um/include/asm/Kbuild b/arch/um/include/asm/Kbuild
index 1b9b82bbe322..e91ba12b7ffc 100644
--- a/arch/um/include/asm/Kbuild
+++ b/arch/um/include/asm/Kbuild
@@ -16,6 +16,7 @@ generic-y += module.h
 generic-y += module.lds.h
 generic-y += parport.h
 generic-y += percpu.h
+generic-y += percpu_types.h
 generic-y += preempt.h
 generic-y += runtime-const.h
 generic-y += softirq_stack.h
diff --git a/arch/x86/entry/vdso/common/vfutex.c b/arch/x86/entry/vdso/common/vfutex.c
index dba54745b355..454f059278e4 100644
--- a/arch/x86/entry/vdso/common/vfutex.c
+++ b/arch/x86/entry/vdso/common/vfutex.c
@@ -32,7 +32,7 @@
 #define CLEAR_POPQ		"movq	%[zero],  %a[pop]\n"
 #define CLEAR_POPL		"movl	%k[zero], %a[pop]\n"
 
-#define futex_robust_try_unlock(prefix, clear_pop, __lock, __tid, __pop)	\
+#define futex_robust_try_unlock(prefix, clear_pop, __lock, __tid, __pop)\
 ({									\
 	asm volatile (							\
 		"						\n"	\
diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index 5b835f6b0f53..9a953e7c76db 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -85,14 +85,14 @@ static void vdso_futex_robust_unlock_update_ips(void)
 	futex_reset_cs_ranges(fd);
 
 #ifdef CONFIG_X86_64
-	futex_set_vdso_cs_range(fd, idx, vdso, image->sym___futex_list64_try_unlock_cs_start,
-				image->sym___futex_list64_try_unlock_cs_end, false);
+	futex_set_vdso_cs_range(fd, idx, vdso + image->sym___futex_list64_try_unlock_cs_start,
+				vdso + image->sym___futex_list64_try_unlock_cs_end, false);
 	idx++;
 #endif /* CONFIG_X86_64 */
 
 #if defined(CONFIG_X86_32) || defined(CONFIG_COMPAT)
-	futex_set_vdso_cs_range(fd, idx, vdso, image->sym___futex_list32_try_unlock_cs_start,
-				image->sym___futex_list32_try_unlock_cs_end, true);
+	futex_set_vdso_cs_range(fd, idx, vdso + image->sym___futex_list32_try_unlock_cs_start,
+				vdso + image->sym___futex_list32_try_unlock_cs_end, true);
 #endif /* CONFIG_X86_32 || CONFIG_COMPAT */
 }
 #else
diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index 409981468cba..cef9a4ca9841 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -40,12 +40,10 @@
 #endif
 
 #define __percpu_prefix
-#define __percpu_seg_override	CONCATENATE(__seg_, __percpu_seg)
 
 #else /* !CONFIG_CC_HAS_NAMED_AS: */
 
 #define __percpu_prefix		__force_percpu_prefix
-#define __percpu_seg_override
 
 #endif /* CONFIG_CC_HAS_NAMED_AS */
 
@@ -82,7 +80,6 @@
 
 #define __force_percpu_prefix
 #define __percpu_prefix
-#define __percpu_seg_override
 
 #define PER_CPU_VAR(var)	(var)__percpu_rel
 
@@ -92,8 +89,6 @@
 # define __my_cpu_type(var)	typeof(var)
 # define __my_cpu_ptr(ptr)	(ptr)
 # define __my_cpu_var(var)	(var)
-
-# define __percpu_qual		__percpu_seg_override
 #else
 # define __my_cpu_type(var)	typeof(var) __percpu_seg_override
 # define __my_cpu_ptr(ptr)	(__my_cpu_type(*(ptr))*)(__force uintptr_t)(ptr)
diff --git a/arch/x86/include/asm/percpu_types.h b/arch/x86/include/asm/percpu_types.h
new file mode 100644
index 000000000000..0aa3e47a3643
--- /dev/null
+++ b/arch/x86/include/asm/percpu_types.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_PERCPU_TYPES_H
+#define _ASM_X86_PERCPU_TYPES_H
+
+#if defined(CONFIG_SMP) && defined(CONFIG_CC_HAS_NAMED_AS)
+#define __percpu_seg_override	CONCATENATE(__seg_, __percpu_seg)
+#else /* !CONFIG_CC_HAS_NAMED_AS: */
+#define __percpu_seg_override
+#endif
+
+#if defined(CONFIG_USE_X86_SEG_SUPPORT) && defined(USE_TYPEOF_UNQUAL)
+#define __percpu_qual		__percpu_seg_override
+#endif
+
+#include <asm-generic/percpu_types.h>
+
+#endif
diff --git a/include/asm-generic/Kbuild b/include/asm-generic/Kbuild
index 2c53a1e0b760..15df9dcb42a5 100644
--- a/include/asm-generic/Kbuild
+++ b/include/asm-generic/Kbuild
@@ -44,6 +44,7 @@ mandatory-y += module.lds.h
 mandatory-y += msi.h
 mandatory-y += pci.h
 mandatory-y += percpu.h
+mandatory-y += percpu_types.h
 mandatory-y += pgalloc.h
 mandatory-y += preempt.h
 mandatory-y += rqspinlock.h
diff --git a/include/asm-generic/percpu_types.h b/include/asm-generic/percpu_types.h
new file mode 100644
index 000000000000..a095cea7fa20
--- /dev/null
+++ b/include/asm-generic/percpu_types.h
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_GENERIC_PERCPU_TYPES_H_
+#define _ASM_GENERIC_PERCPU_TYPES_H_
+
+#ifndef __ASSEMBLER__
+/*
+ * __percpu_qual is the qualifier for the percpu named address space.
+ *
+ * Most architectures use generic named address space for percpu variables but
+ * some architectures define percpu variables in different named address space.
+ * E.g. on x86, percpu variable may be declared as being relative to the %fs or
+ * %gs segments using __seg_fs or __seg_gs named address space qualifier.
+ */
+#ifndef __percpu_qual
+# define __percpu_qual
+#endif
+
+#endif /* __ASSEMBLER__ */
+#endif /* _ASM_GENERIC_PERCPU_TYPES_H_ */
diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h
index e8fd77593b68..7ad37adda1dd 100644
--- a/include/linux/compiler_types.h
+++ b/include/linux/compiler_types.h
@@ -634,6 +634,9 @@ struct ftrace_likely_data {
 #else
 #define __unqual_scalar_typeof(x) __typeof_unqual__(x)
 #endif
+
+#include <asm/percpu_types.h>
+
 #endif /* !__ASSEMBLY__ */
 
 /*
diff --git a/include/linux/futex.h b/include/linux/futex.h
index 33524dfb3fe4..51f4ccdc9092 100644
--- a/include/linux/futex.h
+++ b/include/linux/futex.h
@@ -142,10 +142,9 @@ static inline void futex_fixup_robust_unlock(struct pt_regs *regs)
 }
 
 static inline void futex_set_vdso_cs_range(struct futex_mm_data *fd, unsigned int idx,
-					   unsigned long vdso, unsigned long start,
-					   unsigned long end, bool sz32)
+					   unsigned long start, unsigned long end, bool sz32)
 {
-	fd->unlock.cs_ranges[idx].start_ip = vdso + start;
+	fd->unlock.cs_ranges[idx].start_ip = start;
 	fd->unlock.cs_ranges[idx].len = end - start;
 	fd->unlock.cs_ranges[idx].pop_size32 = sz32;
 }
diff --git a/include/linux/futex_types.h b/include/linux/futex_types.h
index 288666fb37b6..d320c0571f0c 100644
--- a/include/linux/futex_types.h
+++ b/include/linux/futex_types.h
@@ -3,6 +3,7 @@
 #define _LINUX_FUTEX_TYPES_H
 
 #ifdef CONFIG_FUTEX
+#include <linux/compiler_types.h>
 #include <linux/mutex_types.h>
 #include <linux/types.h>
 
diff --git a/include/linux/percpu.h b/include/linux/percpu.h
index 85bf8dd9f087..2f5a889aa50d 100644
--- a/include/linux/percpu.h
+++ b/include/linux/percpu.h
@@ -3,13 +3,14 @@
 #define __LINUX_PERCPU_H
 
 #include <linux/alloc_tag.h>
+#include <linux/cleanup.h>
+#include <linux/compiler_types.h>
+#include <linux/init.h>
 #include <linux/mmdebug.h>
-#include <linux/preempt.h>
-#include <linux/smp.h>
 #include <linux/pfn.h>
-#include <linux/init.h>
-#include <linux/cleanup.h>
+#include <linux/preempt.h>
 #include <linux/sched.h>
+#include <linux/smp.h>
 
 #include <asm/percpu.h>
 
diff --git a/kernel/futex/core.c b/kernel/futex/core.c
index d4080e7834dc..6ea4a97796a1 100644
--- a/kernel/futex/core.c
+++ b/kernel/futex/core.c
@@ -1510,7 +1510,7 @@ void futex_exit_recursive(struct task_struct *tsk)
 }
 
 static void futex_cleanup_begin(struct task_struct *tsk)
-	__acquires(&tsk->futex_exit_mutex)
+	__acquires(&tsk->futex.exit_mutex)
 {
 	/*
 	 * Prevent various race issues against a concurrent incoming waiter
@@ -1537,7 +1537,7 @@ static void futex_cleanup_begin(struct task_struct *tsk)
 }
 
 static void futex_cleanup_end(struct task_struct *tsk, int state)
-	__releases(&tsk->futex_exit_mutex)
+	__releases(&tsk->futex.exit_mutex)
 {
 	/*
 	 * Lockless store. The only side effect is that an observer might
diff --git a/tools/testing/selftests/futex/functional/robust_list.c b/tools/testing/selftests/futex/functional/robust_list.c
index 43059f6dbc40..b3fab60181d5 100644
--- a/tools/testing/selftests/futex/functional/robust_list.c
+++ b/tools/testing/selftests/futex/functional/robust_list.c
@@ -45,7 +45,7 @@
 
 #define SLEEP_US 100
 
-#if UINTPTR_MAX == 0xffffffffffffffff
+#if __SIZEOF_LONG__ == 8
 # define BUILD_64
 #endif
 

end of thread, other threads:[~2026-06-03 14:47 UTC | newest]

Thread overview: 43+ messages
2026-06-02  9:09 [patch V5 00/16] futex: Address the robust futex unlock race for real Thomas Gleixner
2026-06-02  9:09 ` [patch V5 01/16] percpu: Sanitize __percpu_qual include hell Thomas Gleixner
2026-06-03 14:25   ` [tip: locking/core] " tip-bot2 for Thomas Gleixner
2026-06-02  9:09 ` [patch V5 02/16] futex: Move futex task related data into a struct Thomas Gleixner
2026-06-03 14:25   ` [tip: locking/core] " tip-bot2 for Thomas Gleixner
2026-06-02  9:09 ` [patch V5 03/16] futex: Make futex_mm_init() void Thomas Gleixner
2026-06-03 14:25   ` [tip: locking/core] " tip-bot2 for Thomas Gleixner
2026-06-02  9:09 ` [patch V5 04/16] futex: Move futex related mm_struct data into a struct Thomas Gleixner
2026-06-03 14:25   ` [tip: locking/core] " tip-bot2 for Thomas Gleixner
2026-06-02  9:09 ` [patch V5 05/16] futex: Provide UABI defines for robust list entry modifiers Thomas Gleixner
2026-06-03 14:25   ` [tip: locking/core] " tip-bot2 for Thomas Gleixner
2026-06-02  9:09 ` [patch V5 06/16] uaccess: Provide unsafe_atomic_store_release_user() Thomas Gleixner
2026-06-03 14:24   ` [tip: locking/core] " tip-bot2 for Thomas Gleixner
2026-06-02  9:09 ` [patch V5 07/16] x86: Select ARCH_MEMORY_ORDER_TSO Thomas Gleixner
2026-06-03 14:24   ` [tip: locking/core] " tip-bot2 for Thomas Gleixner
2026-06-02  9:09 ` [patch V5 08/16] futex: Cleanup UAPI defines Thomas Gleixner
2026-06-03 14:24   ` [tip: locking/core] " tip-bot2 for Thomas Gleixner
2026-06-02  9:09 ` [patch V5 09/16] futex: Add support for unlocking robust futexes Thomas Gleixner
2026-06-03  8:22   ` Peter Zijlstra
2026-06-03  9:30     ` Peter Zijlstra
2026-06-03 14:40     ` Thomas Gleixner
2026-06-03  8:35   ` Peter Zijlstra
2026-06-03 14:24   ` [tip: locking/core] " tip-bot2 for Thomas Gleixner
2026-06-02  9:09 ` [patch V5 10/16] futex: Add robust futex unlock IP range Thomas Gleixner
2026-06-03 14:24   ` [tip: locking/core] " tip-bot2 for Thomas Gleixner
2026-06-02  9:10 ` [patch V5 11/16] futex: Provide infrastructure to plug the non contended robust futex unlock race Thomas Gleixner
2026-06-03  8:42   ` Peter Zijlstra
2026-06-03  9:14   ` Peter Zijlstra
2026-06-03 14:47     ` Thomas Gleixner
2026-06-03  9:23   ` Peter Zijlstra
2026-06-03 14:42     ` Thomas Gleixner
2026-06-03 14:24   ` [tip: locking/core] " tip-bot2 for Thomas Gleixner
2026-06-02  9:10 ` [patch V5 12/16] x86/vdso: Prepare for robust futex unlock support Thomas Gleixner
2026-06-03 14:24   ` [tip: locking/core] " tip-bot2 for Thomas Gleixner
2026-06-02  9:10 ` [patch V5 13/16] x86/vdso: Implement __vdso_futex_robust_try_unlock() Thomas Gleixner
2026-06-03 14:24   ` [tip: locking/core] " tip-bot2 for Thomas Gleixner
2026-06-02  9:10 ` [patch V5 14/16] Documentation: futex: Add a note about robust list race condition Thomas Gleixner
2026-06-03 14:24   ` [tip: locking/core] " tip-bot2 for André Almeida
2026-06-02  9:10 ` [patch V5 15/16] selftests: futex: Add tests for robust release operations Thomas Gleixner
2026-06-03 14:24   ` [tip: locking/core] " tip-bot2 for André Almeida
2026-06-02  9:10 ` [patch V5 16/16] [RFC] vdso, x86: Expose vdso.so.dbg through sysfs Thomas Gleixner
2026-06-02 10:39   ` Thomas Weißschuh
2026-06-02 20:02     ` Thomas Gleixner
