public inbox for linux-kernel@vger.kernel.org
* [RFC PATCH] futex: Introduce __vdso_robust_futex_unlock
@ 2026-03-11 18:54 Mathieu Desnoyers
  2026-03-11 20:11 ` Mathieu Desnoyers
                   ` (3 more replies)
  0 siblings, 4 replies; 32+ messages in thread
From: Mathieu Desnoyers @ 2026-03-11 18:54 UTC (permalink / raw)
  To: André Almeida
  Cc: linux-kernel, Mathieu Desnoyers, Carlos O'Donell,
	Sebastian Andrzej Siewior, Peter Zijlstra, Florian Weimer,
	Rich Felker, Torvald Riegel, Darren Hart, Thomas Gleixner,
	Ingo Molnar, Davidlohr Bueso, Arnd Bergmann, Liam R . Howlett

This vDSO function unlocks the robust futex by exchanging the content
of *uaddr with 0 using store-release semantics. If the futex has
waiters, it sets bit 1 of *op_pending_addr; otherwise it clears
*op_pending_addr. These operations are within a code region known to
the kernel, making them safe with respect to asynchronous program
termination either from thread context or from a nested signal
handler.

Expected use of this vDSO:

if ((__vdso_robust_futex_unlock((u32 *) &mutex->__data.__lock, &pd->robust_head.list_op_pending)
    & FUTEX_WAITERS) != 0)
        futex_wake((u32 *) &mutex->__data.__lock, 1, private);
WRITE_ONCE(pd->robust_head.list_op_pending, 0);

This fixes a long-standing data corruption race condition with robust
futexes, as pointed out here:

  "File corruption race condition in robust mutex unlocking"
  https://sourceware.org/bugzilla/show_bug.cgi?id=14485

Known limitation: this only takes care of non-PI futexes.

The approach taken by this vDSO is to extend the x86 vDSO exception
table to track the relevant ip range. The two kernel execution paths
impacted by this change are:

  1) Process exit
  2) Signal delivery

[ This patch is only lightly compile-tested, submitted for feedback. ]

Link: https://lore.kernel.org/lkml/20260220202620.139584-1-andrealmeid@igalia.com/
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: "André Almeida" <andrealmeid@igalia.com>
Cc: Carlos O'Donell <carlos@redhat.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Rich Felker <dalias@aerifal.cx>
Cc: Torvald Riegel <triegel@redhat.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Thomas Gleixner <tglx@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Liam R . Howlett" <Liam.Howlett@oracle.com>
---
 arch/x86/entry/vdso/common/vfutex.c |  51 ++++++++++
 arch/x86/entry/vdso/extable.c       |  54 +++++++++-
 arch/x86/entry/vdso/extable.h       |  26 +++--
 arch/x86/entry/vdso/vdso64/Makefile |   1 +
 arch/x86/entry/vdso/vdso64/vfutex.c |   1 +
 arch/x86/entry/vdso/vdso64/vsgx.S   |   2 +-
 arch/x86/include/asm/vdso.h         |   3 +
 arch/x86/kernel/signal.c            |   4 +
 include/linux/futex.h               |   1 +
 include/vdso/futex.h                |  35 +++++++
 kernel/futex/core.c                 | 151 ++++++++++++++++++++++++----
 11 files changed, 296 insertions(+), 33 deletions(-)
 create mode 100644 arch/x86/entry/vdso/common/vfutex.c
 create mode 100644 arch/x86/entry/vdso/vdso64/vfutex.c
 create mode 100644 include/vdso/futex.h

diff --git a/arch/x86/entry/vdso/common/vfutex.c b/arch/x86/entry/vdso/common/vfutex.c
new file mode 100644
index 000000000000..fe730e0d3dfa
--- /dev/null
+++ b/arch/x86/entry/vdso/common/vfutex.c
@@ -0,0 +1,51 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2026 Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ */
+#include <linux/types.h>
+#include <vdso/futex.h>
+#include "extable.h"
+
+#ifdef CONFIG_X86_64
+# define ASM_PTR_BIT_SET	"btsq "
+# define ASM_PTR_SET		"movq "
+#else
+# define ASM_PTR_BIT_SET	"btsl "
+# define ASM_PTR_SET		"movl "
+#endif
+
+u32 __vdso_robust_futex_unlock(u32 *uaddr, uintptr_t *op_pending_addr)
+{
+	u32 val = 0;
+
+	/*
+	 * Within the ip range identified by the futex exception table,
+	 * the register "eax" contains the value loaded by xchg. This is
+	 * expected by futex_vdso_exception() to check whether waiters
+	 * need to be woken up. This register state is transferred to
+	 * bit 1 (NEED_WAKEUP) of *op_pending_addr before the ip range
+	 * ends.
+	 */
+	asm volatile (	_ASM_VDSO_EXTABLE_FUTEX_HANDLE(1f, 3f)
+			/* Exchange uaddr (store-release). */
+			"xchg %[uaddr], %[val]\n\t"
+			"1:\n\t"
+			/* Test if FUTEX_WAITERS (0x80000000) is set. */
+			"test %[val], %[val]\n\t"
+			"js 2f\n\t"
+			/* Clear *op_pending_addr if there are no waiters. */
+			ASM_PTR_SET "$0, %[op_pending_addr]\n\t"
+			"jmp 3f\n\t"
+			"2:\n\t"
+			/* Set bit 1 (NEED_WAKEUP) in *op_pending_addr. */
+			ASM_PTR_BIT_SET "$1, %[op_pending_addr]\n\t"
+			"3:\n\t"
+			: [val] "+a" (val),
+			  [uaddr] "+m" (*uaddr),
+			  [op_pending_addr] "+m" (*op_pending_addr)
+			: : "memory");
+	return val;
+}
+
+u32 robust_futex_unlock(u32 *, uintptr_t *)
+	__attribute__((weak, alias("__vdso_robust_futex_unlock")));
diff --git a/arch/x86/entry/vdso/extable.c b/arch/x86/entry/vdso/extable.c
index afcf5b65beef..a668fc2c93dd 100644
--- a/arch/x86/entry/vdso/extable.c
+++ b/arch/x86/entry/vdso/extable.c
@@ -1,12 +1,26 @@
 // SPDX-License-Identifier: GPL-2.0
 #include <linux/err.h>
 #include <linux/mm.h>
+#include <linux/futex.h>
 #include <asm/current.h>
 #include <asm/traps.h>
 #include <asm/vdso.h>
 
+enum vdso_extable_entry_type {
+	VDSO_EXTABLE_ENTRY_FIXUP = 0,
+	VDSO_EXTABLE_ENTRY_FUTEX = 1,
+};
+
 struct vdso_exception_table_entry {
-	int insn, fixup;
+	int type;	/* enum vdso_extable_entry_type */
+	union {
+		struct {
+			int insn, fixup_insn;
+		} fixup;
+		struct {
+			int start, end;
+		} futex;
+	};
 };
 
 bool fixup_vdso_exception(struct pt_regs *regs, int trapnr,
@@ -33,8 +47,10 @@ bool fixup_vdso_exception(struct pt_regs *regs, int trapnr,
 	extable = image->extable;
 
 	for (i = 0; i < nr_entries; i++) {
-		if (regs->ip == base + extable[i].insn) {
-			regs->ip = base + extable[i].fixup;
+		if (extable[i].type != VDSO_EXTABLE_ENTRY_FIXUP)
+			continue;
+		if (regs->ip == base + extable[i].fixup.insn) {
+			regs->ip = base + extable[i].fixup.fixup_insn;
 			regs->di = trapnr;
 			regs->si = error_code;
 			regs->dx = fault_addr;
@@ -44,3 +60,35 @@ bool fixup_vdso_exception(struct pt_regs *regs, int trapnr,
 
 	return false;
 }
+
+void futex_vdso_exception(struct pt_regs *regs,
+			  bool *_in_futex_vdso,
+			  bool *_need_wakeup)
+{
+	const struct vdso_image *image = current->mm->context.vdso_image;
+	const struct vdso_exception_table_entry *extable;
+	bool in_futex_vdso = false, need_wakeup = false;
+	unsigned int nr_entries, i;
+	unsigned long base;
+
+	if (!current->mm->context.vdso)
+		goto end;
+
+	base = (unsigned long)current->mm->context.vdso + image->extable_base;
+	nr_entries = image->extable_len / (sizeof(*extable));
+	extable = image->extable;
+
+	for (i = 0; i < nr_entries; i++) {
+		if (extable[i].type != VDSO_EXTABLE_ENTRY_FUTEX)
+			continue;
+		if (regs->ip >= base + extable[i].futex.start &&
+		    regs->ip < base + extable[i].futex.end) {
+			in_futex_vdso = true;
+			if (regs->ax & FUTEX_WAITERS)
+				need_wakeup = true;
+		}
+	}
+end:
+	*_in_futex_vdso = in_futex_vdso;
+	*_need_wakeup = need_wakeup;
+}
diff --git a/arch/x86/entry/vdso/extable.h b/arch/x86/entry/vdso/extable.h
index baba612b832c..7251467ad210 100644
--- a/arch/x86/entry/vdso/extable.h
+++ b/arch/x86/entry/vdso/extable.h
@@ -8,20 +8,32 @@
  * exception table, not each individual entry.
  */
 #ifdef __ASSEMBLER__
-#define _ASM_VDSO_EXTABLE_HANDLE(from, to)	\
-	ASM_VDSO_EXTABLE_HANDLE from to
+#define _ASM_VDSO_EXTABLE_FIXUP_HANDLE(from, to)	\
+	ASM_VDSO_EXTABLE_FIXUP_HANDLE from to
 
-.macro ASM_VDSO_EXTABLE_HANDLE from:req to:req
+.macro ASM_VDSO_EXTABLE_FIXUP_HANDLE from:req to:req
 	.pushsection __ex_table, "a"
+	.long 0		/* type: fixup */
 	.long (\from) - __ex_table
 	.long (\to) - __ex_table
 	.popsection
 .endm
 #else
-#define _ASM_VDSO_EXTABLE_HANDLE(from, to)	\
-	".pushsection __ex_table, \"a\"\n"      \
-	".long (" #from ") - __ex_table\n"      \
-	".long (" #to ") - __ex_table\n"        \
+#define _ASM_VDSO_EXTABLE_FIXUP_HANDLE(from, to)	\
+	".pushsection __ex_table, \"a\"\n"      	\
+	".long 0\n"	/* type: fixup */		\
+	".long (" #from ") - __ex_table\n"      	\
+	".long (" #to ") - __ex_table\n"        	\
+	".popsection\n"
+
+/*
+ * Identify robust futex unlock critical section.
+ */
+#define _ASM_VDSO_EXTABLE_FUTEX_HANDLE(start, end)	\
+	".pushsection __ex_table, \"a\"\n"      	\
+	".long 1\n"	/* type: futex */		\
+	".long (" #start ") - __ex_table\n"      	\
+	".long (" #end ") - __ex_table\n"        	\
 	".popsection\n"
 #endif
 
diff --git a/arch/x86/entry/vdso/vdso64/Makefile b/arch/x86/entry/vdso/vdso64/Makefile
index bfffaf1aeecc..df53c2d0037d 100644
--- a/arch/x86/entry/vdso/vdso64/Makefile
+++ b/arch/x86/entry/vdso/vdso64/Makefile
@@ -10,6 +10,7 @@ vdsos-$(CONFIG_X86_X32_ABI)	+= x32
 # Files to link into the vDSO:
 vobjs-y				:= note.o vclock_gettime.o vgetcpu.o
 vobjs-y				+= vgetrandom.o vgetrandom-chacha.o
+vobjs-y				+= vfutex.o
 vobjs-$(CONFIG_X86_SGX)		+= vsgx.o
 
 # Compilation flags
diff --git a/arch/x86/entry/vdso/vdso64/vfutex.c b/arch/x86/entry/vdso/vdso64/vfutex.c
new file mode 100644
index 000000000000..940a6ee30026
--- /dev/null
+++ b/arch/x86/entry/vdso/vdso64/vfutex.c
@@ -0,0 +1 @@
+#include "common/vfutex.c"
diff --git a/arch/x86/entry/vdso/vdso64/vsgx.S b/arch/x86/entry/vdso/vdso64/vsgx.S
index 37a3d4c02366..0ea5a1ebd455 100644
--- a/arch/x86/entry/vdso/vdso64/vsgx.S
+++ b/arch/x86/entry/vdso/vdso64/vsgx.S
@@ -145,6 +145,6 @@ SYM_FUNC_START(__vdso_sgx_enter_enclave)
 
 	.cfi_endproc
 
-_ASM_VDSO_EXTABLE_HANDLE(.Lenclu_eenter_eresume, .Lhandle_exception)
+_ASM_VDSO_EXTABLE_FIXUP_HANDLE(.Lenclu_eenter_eresume, .Lhandle_exception)
 
 SYM_FUNC_END(__vdso_sgx_enter_enclave)
diff --git a/arch/x86/include/asm/vdso.h b/arch/x86/include/asm/vdso.h
index e8afbe9faa5b..77e465fb373c 100644
--- a/arch/x86/include/asm/vdso.h
+++ b/arch/x86/include/asm/vdso.h
@@ -38,6 +38,9 @@ extern int map_vdso_once(const struct vdso_image *image, unsigned long addr);
 extern bool fixup_vdso_exception(struct pt_regs *regs, int trapnr,
 				 unsigned long error_code,
 				 unsigned long fault_addr);
+extern void futex_vdso_exception(struct pt_regs *regs,
+				 bool *in_futex_vdso,
+				 bool *need_wakeup);
 #endif /* __ASSEMBLER__ */
 
 #endif /* _ASM_X86_VDSO_H */
diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
index 2404233336ab..c2e4db89f16d 100644
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -28,6 +28,7 @@
 #include <linux/entry-common.h>
 #include <linux/syscalls.h>
 #include <linux/rseq.h>
+#include <linux/futex.h>
 
 #include <asm/processor.h>
 #include <asm/ucontext.h>
@@ -235,6 +236,9 @@ unsigned long get_sigframe_size(void)
 static int
 setup_rt_frame(struct ksignal *ksig, struct pt_regs *regs)
 {
+	/* Handle futex robust list fixup. */
+	futex_signal_deliver(ksig, regs);
+
 	/* Perform fixup for the pre-signal frame. */
 	rseq_signal_deliver(ksig, regs);
 
diff --git a/include/linux/futex.h b/include/linux/futex.h
index 9e9750f04980..6c274c79e176 100644
--- a/include/linux/futex.h
+++ b/include/linux/futex.h
@@ -81,6 +81,7 @@ void futex_exec_release(struct task_struct *tsk);
 long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
 	      u32 __user *uaddr2, u32 val2, u32 val3);
 int futex_hash_prctl(unsigned long arg2, unsigned long arg3, unsigned long arg4);
+void futex_signal_deliver(struct ksignal *ksig, struct pt_regs *regs);
 
 #ifdef CONFIG_FUTEX_PRIVATE_HASH
 int futex_hash_allocate_default(void);
diff --git a/include/vdso/futex.h b/include/vdso/futex.h
new file mode 100644
index 000000000000..1e949ac1ed85
--- /dev/null
+++ b/include/vdso/futex.h
@@ -0,0 +1,35 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2026 Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ */
+
+#ifndef _VDSO_FUTEX_H
+#define _VDSO_FUTEX_H
+
+#include <linux/types.h>
+
+/**
+ * __vdso_robust_futex_unlock - Architecture-specific vDSO implementation of robust futex unlock.
+ * @uaddr:		Lock address (points to a 32-bit unsigned integer type).
+ * @op_pending_addr:	Robust list operation pending address (points to a pointer type).
+ *
+ * This vDSO unlocks the robust futex by exchanging the content of
+ * *uaddr with 0 with a store-release semantic. If the futex has
+ * waiters, it sets bit 1 of *op_pending_addr, else it clears
+ * *op_pending_addr. Those operations are within a code region
+ * known by the kernel, making them safe with respect to asynchronous
+ * program termination either from thread context or from a nested
+ * signal handler.
+ *
+ * Expected use of this vDSO:
+ *
+ * if ((__vdso_robust_futex_unlock((u32 *) &mutex->__data.__lock, &pd->robust_head.list_op_pending)
+ *     & FUTEX_WAITERS) != 0)
+ *         futex_wake((u32 *) &mutex->__data.__lock, 1, private);
+ * WRITE_ONCE(pd->robust_head.list_op_pending, 0);
+ *
+ * Returns:	The old value present at *uaddr.
+ */
+extern u32 __vdso_robust_futex_unlock(u32 *uaddr, uintptr_t *op_pending_addr);
+
+#endif /* _VDSO_FUTEX_H */
diff --git a/kernel/futex/core.c b/kernel/futex/core.c
index cf7e610eac42..92c0f94c8077 100644
--- a/kernel/futex/core.c
+++ b/kernel/futex/core.c
@@ -48,6 +48,10 @@
 #include "futex.h"
 #include "../locking/rtmutex_common.h"
 
+#define FUTEX_UADDR_PI		(1UL << 0)
+#define FUTEX_UADDR_NEED_WAKEUP	(1UL << 1)
+#define FUTEX_UADDR_MASK	(~(FUTEX_UADDR_PI | FUTEX_UADDR_NEED_WAKEUP))
+
 /*
  * The base of the bucket array and its size are always used together
  * (after initialization only in futex_hash()), so ensure that they
@@ -1004,6 +1008,77 @@ void futex_unqueue_pi(struct futex_q *q)
 	q->pi_state = NULL;
 }
 
+/*
+ * Transfer the need wakeup state from vDSO stack to the
+ * FUTEX_UADDR_NEED_WAKEUP list_op_pending bit so it's observed if the
+ * program is terminated while executing the signal handler.
+ */
+static void signal_delivery_fixup_robust_list(struct task_struct *curr, struct pt_regs *regs)
+{
+	struct robust_list_head __user *head = curr->robust_list;
+	bool in_futex_vdso, need_wakeup;
+	unsigned long pending;
+
+	if (!head)
+		return;
+	futex_vdso_exception(regs, &in_futex_vdso, &need_wakeup);
+	if (!in_futex_vdso)
+		return;
+	if (need_wakeup) {
+		if (get_user(pending, (unsigned long __user *)&head->list_op_pending))
+			goto fault;
+		pending |= FUTEX_UADDR_NEED_WAKEUP;
+		if (put_user(pending, (unsigned long __user *)&head->list_op_pending))
+			goto fault;
+	} else {
+		if (put_user(0UL, (unsigned long __user *)&head->list_op_pending))
+			goto fault;
+	}
+	return;
+fault:
+	force_sig(SIGSEGV);
+}
+
+#ifdef CONFIG_COMPAT
+static void compat_signal_delivery_fixup_robust_list(struct task_struct *curr, struct pt_regs *regs)
+{
+	struct compat_robust_list_head __user *head = curr->compat_robust_list;
+	bool in_futex_vdso, need_wakeup;
+	unsigned int pending;
+
+	if (!head)
+		return;
+	futex_vdso_exception(regs, &in_futex_vdso, &need_wakeup);
+	if (!in_futex_vdso)
+		return;
+	if (need_wakeup) {
+		if (get_user(pending, (compat_uptr_t __user *)&head->list_op_pending))
+			goto fault;
+		pending |= FUTEX_UADDR_NEED_WAKEUP;
+		if (put_user(pending, (compat_uptr_t __user *)&head->list_op_pending))
+			goto fault;
+	} else {
+		if (put_user(0U, (compat_uptr_t __user *)&head->list_op_pending))
+			goto fault;
+	}
+	return;
+fault:
+	force_sig(SIGSEGV);
+}
+#endif
+
+void futex_signal_deliver(struct ksignal *ksig, struct pt_regs *regs)
+{
+	struct task_struct *tsk = current;
+
+	if (unlikely(tsk->robust_list))
+		signal_delivery_fixup_robust_list(tsk, regs);
+#ifdef CONFIG_COMPAT
+	if (unlikely(tsk->compat_robust_list))
+		compat_signal_delivery_fixup_robust_list(tsk, regs);
+#endif
+}
+
 /* Constants for the pending_op argument of handle_futex_death */
 #define HANDLE_DEATH_PENDING	true
 #define HANDLE_DEATH_LIST	false
@@ -1013,12 +1088,31 @@ void futex_unqueue_pi(struct futex_q *q)
  * dying task, and do notification if so:
  */
 static int handle_futex_death(u32 __user *uaddr, struct task_struct *curr,
-			      bool pi, bool pending_op)
+			      bool pi, bool pending_op, bool need_wakeup)
 {
+	bool unlock_store_done = false;
 	u32 uval, nval, mval;
 	pid_t owner;
 	int err;
 
+	/*
+	 * Process dies after the store unlocking futex, before clearing
+	 * the pending ops. Wake up one waiter if needed. Prevent
+	 * storing to the futex after it was unlocked. Only handle
+	 * non-PI futex.
+	 */
+	if (pending_op && !pi) {
+		bool in_futex_vdso, vdso_need_wakeup;
+
+		futex_vdso_exception(task_pt_regs(curr), &in_futex_vdso, &vdso_need_wakeup);
+		if (need_wakeup || vdso_need_wakeup) {
+			futex_wake(uaddr, FLAGS_SIZE_32 | FLAGS_SHARED, 1,
+				   FUTEX_BITSET_MATCH_ANY);
+		}
+		if (need_wakeup || in_futex_vdso)
+			return 0;
+	}
+
 	/* Futex address must be 32bit aligned */
 	if ((((unsigned long)uaddr) % sizeof(*uaddr)) != 0)
 		return -1;
@@ -1071,6 +1165,13 @@ static int handle_futex_death(u32 __user *uaddr, struct task_struct *curr,
 		return 0;
 	}
 
+	/*
+	 * Terminated after the unlock store is done. Wake up waiters,
+	 * but do not change the lock state.
+	 */
+	if (unlock_store_done)
+		return 0;
+
 	if (owner != task_pid_vnr(curr))
 		return 0;
 
@@ -1128,19 +1229,23 @@ static int handle_futex_death(u32 __user *uaddr, struct task_struct *curr,
 }
 
 /*
- * Fetch a robust-list pointer. Bit 0 signals PI futexes:
+ * Fetch a robust-list pointer. Bit 0 signals PI futexes, bit 1 signals
+ * need wakeup:
  */
 static inline int fetch_robust_entry(struct robust_list __user **entry,
 				     struct robust_list __user * __user *head,
-				     unsigned int *pi)
+				     unsigned int *pi,
+				     unsigned int *need_wakeup)
 {
 	unsigned long uentry;
 
 	if (get_user(uentry, (unsigned long __user *)head))
 		return -EFAULT;
 
-	*entry = (void __user *)(uentry & ~1UL);
-	*pi = uentry & 1;
+	*entry = (void __user *)(uentry & FUTEX_UADDR_MASK);
+	*pi = uentry & FUTEX_UADDR_PI;
+	if (need_wakeup)
+		*need_wakeup = uentry & FUTEX_UADDR_NEED_WAKEUP;
 
 	return 0;
 }
@@ -1155,7 +1260,7 @@ static void exit_robust_list(struct task_struct *curr)
 {
 	struct robust_list_head __user *head = curr->robust_list;
 	struct robust_list __user *entry, *next_entry, *pending;
-	unsigned int limit = ROBUST_LIST_LIMIT, pi, pip;
+	unsigned int limit = ROBUST_LIST_LIMIT, pi, pip, need_wakeup;
 	unsigned int next_pi;
 	unsigned long futex_offset;
 	int rc;
@@ -1164,7 +1269,7 @@ static void exit_robust_list(struct task_struct *curr)
 	 * Fetch the list head (which was registered earlier, via
 	 * sys_set_robust_list()):
 	 */
-	if (fetch_robust_entry(&entry, &head->list.next, &pi))
+	if (fetch_robust_entry(&entry, &head->list.next, &pi, NULL))
 		return;
 	/*
 	 * Fetch the relative futex offset:
@@ -1175,7 +1280,7 @@ static void exit_robust_list(struct task_struct *curr)
 	 * Fetch any possibly pending lock-add first, and handle it
 	 * if it exists:
 	 */
-	if (fetch_robust_entry(&pending, &head->list_op_pending, &pip))
+	if (fetch_robust_entry(&pending, &head->list_op_pending, &pip, &need_wakeup))
 		return;
 
 	next_entry = NULL;	/* avoid warning with gcc */
@@ -1184,14 +1289,14 @@ static void exit_robust_list(struct task_struct *curr)
 		 * Fetch the next entry in the list before calling
 		 * handle_futex_death:
 		 */
-		rc = fetch_robust_entry(&next_entry, &entry->next, &next_pi);
+		rc = fetch_robust_entry(&next_entry, &entry->next, &next_pi, NULL);
 		/*
 		 * A pending lock might already be on the list, so
 		 * don't process it twice:
 		 */
 		if (entry != pending) {
 			if (handle_futex_death((void __user *)entry + futex_offset,
-						curr, pi, HANDLE_DEATH_LIST))
+						curr, pi, HANDLE_DEATH_LIST, false))
 				return;
 		}
 		if (rc)
@@ -1209,7 +1314,7 @@ static void exit_robust_list(struct task_struct *curr)
 
 	if (pending) {
 		handle_futex_death((void __user *)pending + futex_offset,
-				   curr, pip, HANDLE_DEATH_PENDING);
+				   curr, pip, HANDLE_DEATH_PENDING, need_wakeup);
 	}
 }
 
@@ -1224,17 +1329,20 @@ static void __user *futex_uaddr(struct robust_list __user *entry,
 }
 
 /*
- * Fetch a robust-list pointer. Bit 0 signals PI futexes:
+ * Fetch a robust-list pointer. Bit 0 signals PI futexes, bit 1 signals
+ * need wakeup:
  */
 static inline int
 compat_fetch_robust_entry(compat_uptr_t *uentry, struct robust_list __user **entry,
-		   compat_uptr_t __user *head, unsigned int *pi)
+		   compat_uptr_t __user *head, unsigned int *pi, unsigned int *need_wakeup)
 {
 	if (get_user(*uentry, head))
 		return -EFAULT;
 
-	*entry = compat_ptr((*uentry) & ~1);
-	*pi = (unsigned int)(*uentry) & 1;
+	*entry = compat_ptr((*uentry) & FUTEX_UADDR_MASK);
+	*pi = (unsigned int)(*uentry) & FUTEX_UADDR_PI;
+	if (need_wakeup)
+		*need_wakeup = (unsigned int)(*uentry) & FUTEX_UADDR_NEED_WAKEUP;
 
 	return 0;
 }
@@ -1249,7 +1357,7 @@ static void compat_exit_robust_list(struct task_struct *curr)
 {
 	struct compat_robust_list_head __user *head = curr->compat_robust_list;
 	struct robust_list __user *entry, *next_entry, *pending;
-	unsigned int limit = ROBUST_LIST_LIMIT, pi, pip;
+	unsigned int limit = ROBUST_LIST_LIMIT, pi, pip, need_wakeup;
 	unsigned int next_pi;
 	compat_uptr_t uentry, next_uentry, upending;
 	compat_long_t futex_offset;
@@ -1259,7 +1367,7 @@ static void compat_exit_robust_list(struct task_struct *curr)
 	 * Fetch the list head (which was registered earlier, via
 	 * sys_set_robust_list()):
 	 */
-	if (compat_fetch_robust_entry(&uentry, &entry, &head->list.next, &pi))
+	if (compat_fetch_robust_entry(&uentry, &entry, &head->list.next, &pi, NULL))
 		return;
 	/*
 	 * Fetch the relative futex offset:
@@ -1271,7 +1379,7 @@ static void compat_exit_robust_list(struct task_struct *curr)
 	 * if it exists:
 	 */
 	if (compat_fetch_robust_entry(&upending, &pending,
-			       &head->list_op_pending, &pip))
+			       &head->list_op_pending, &pip, &need_wakeup))
 		return;
 
 	next_entry = NULL;	/* avoid warning with gcc */
@@ -1281,7 +1389,7 @@ static void compat_exit_robust_list(struct task_struct *curr)
 		 * handle_futex_death:
 		 */
 		rc = compat_fetch_robust_entry(&next_uentry, &next_entry,
-			(compat_uptr_t __user *)&entry->next, &next_pi);
+			(compat_uptr_t __user *)&entry->next, &next_pi, NULL);
 		/*
 		 * A pending lock might already be on the list, so
 		 * dont process it twice:
@@ -1289,8 +1397,7 @@ static void compat_exit_robust_list(struct task_struct *curr)
 		if (entry != pending) {
 			void __user *uaddr = futex_uaddr(entry, futex_offset);
 
-			if (handle_futex_death(uaddr, curr, pi,
-					       HANDLE_DEATH_LIST))
+			if (handle_futex_death(uaddr, curr, pi, HANDLE_DEATH_LIST, false))
 				return;
 		}
 		if (rc)
@@ -1309,7 +1416,7 @@ static void compat_exit_robust_list(struct task_struct *curr)
 	if (pending) {
 		void __user *uaddr = futex_uaddr(pending, futex_offset);
 
-		handle_futex_death(uaddr, curr, pip, HANDLE_DEATH_PENDING);
+		handle_futex_death(uaddr, curr, pip, HANDLE_DEATH_PENDING, need_wakeup);
 	}
 }
 #endif
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH] futex: Introduce __vdso_robust_futex_unlock
  2026-03-11 18:54 [RFC PATCH] futex: Introduce __vdso_robust_futex_unlock Mathieu Desnoyers
@ 2026-03-11 20:11 ` Mathieu Desnoyers
  2026-03-12  8:49 ` Florian Weimer
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 32+ messages in thread
From: Mathieu Desnoyers @ 2026-03-11 20:11 UTC (permalink / raw)
  To: André Almeida
  Cc: linux-kernel, Carlos O'Donell, Sebastian Andrzej Siewior,
	Peter Zijlstra, Florian Weimer, Rich Felker, Torvald Riegel,
	Darren Hart, Thomas Gleixner, Ingo Molnar, Davidlohr Bueso,
	Arnd Bergmann, Liam R . Howlett

On 2026-03-11 14:54, Mathieu Desnoyers wrote:
[...]
>   {
> +	bool unlock_store_done = false;
>   	u32 uval, nval, mval;
>   	pid_t owner;
>   	int err;

Dead code. Will remove.

[...]
> @@ -1071,6 +1165,13 @@ static int handle_futex_death(u32 __user *uaddr, struct task_struct *curr,
>   		return 0;
>   	}
>   
> +	/*
> +	 * Terminated after the unlock store is done. Wake up waiters,
> +	 * but do not change the lock state.
> +	 */
> +	if (unlock_store_done)
> +		return 0;

Sorry, this is leftover dead code, will remove in next version.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com


* Re: [RFC PATCH] futex: Introduce __vdso_robust_futex_unlock
  2026-03-11 18:54 [RFC PATCH] futex: Introduce __vdso_robust_futex_unlock Mathieu Desnoyers
  2026-03-11 20:11 ` Mathieu Desnoyers
@ 2026-03-12  8:49 ` Florian Weimer
  2026-03-12 13:13   ` Mathieu Desnoyers
  2026-03-12 13:46 ` André Almeida
  2026-03-12 22:23 ` Thomas Gleixner
  3 siblings, 1 reply; 32+ messages in thread
From: Florian Weimer @ 2026-03-12  8:49 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: André Almeida, linux-kernel, Carlos O'Donell,
	Sebastian Andrzej Siewior, Peter Zijlstra, Rich Felker,
	Torvald Riegel, Darren Hart, Thomas Gleixner, Ingo Molnar,
	Davidlohr Bueso, Arnd Bergmann, Liam R . Howlett

* Mathieu Desnoyers:

> + * This vDSO unlocks the robust futex by exchanging the content of
> + * *uaddr with 0 with a store-release semantic. If the futex has
> + * waiters, it sets bit 1 of *op_pending_addr, else it clears
> + * *op_pending_addr. Those operations are within a code region
> + * known by the kernel, making them safe with respect to asynchronous
> + * program termination either from thread context or from a nested
> + * signal handler.
> + *
> + * Expected use of this vDSO:
> + *
> + * if ((__vdso_robust_futex_unlock((u32 *) &mutex->__data.__lock, &pd->robust_head.list_op_pending)
> + *     & FUTEX_WAITERS) != 0)
> + *         futex_wake((u32 *) &mutex->__data.__lock, 1, private);
> + * WRITE_ONCE(pd->robust_head.list_op_pending, 0);

The comment could perhaps say that pd->robust_head is the
thread-specific robust list that has been registered with
set_robust_list.

Thanks,
Florian



* Re: [RFC PATCH] futex: Introduce __vdso_robust_futex_unlock
  2026-03-12  8:49 ` Florian Weimer
@ 2026-03-12 13:13   ` Mathieu Desnoyers
  2026-03-12 14:12     ` Florian Weimer
  0 siblings, 1 reply; 32+ messages in thread
From: Mathieu Desnoyers @ 2026-03-12 13:13 UTC (permalink / raw)
  To: Florian Weimer
  Cc: André Almeida, linux-kernel, Carlos O'Donell,
	Sebastian Andrzej Siewior, Peter Zijlstra, Rich Felker,
	Torvald Riegel, Darren Hart, Thomas Gleixner, Ingo Molnar,
	Davidlohr Bueso, Arnd Bergmann, Liam R . Howlett

On 2026-03-12 04:49, Florian Weimer wrote:
> * Mathieu Desnoyers:
> 
>> + * This vDSO unlocks the robust futex by exchanging the content of
>> + * *uaddr with 0 with a store-release semantic. If the futex has
>> + * waiters, it sets bit 1 of *op_pending_addr, else it clears
>> + * *op_pending_addr. Those operations are within a code region
>> + * known by the kernel, making them safe with respect to asynchronous
>> + * program termination either from thread context or from a nested
>> + * signal handler.
>> + *
>> + * Expected use of this vDSO:
>> + *
>> + * if ((__vdso_robust_futex_unlock((u32 *) &mutex->__data.__lock, &pd->robust_head.list_op_pending)
>> + *     & FUTEX_WAITERS) != 0)
>> + *         futex_wake((u32 *) &mutex->__data.__lock, 1, private);
>> + * WRITE_ONCE(pd->robust_head.list_op_pending, 0);
> 
> The comment could perhaps say that pd->robust_head is the
> thread-specific robust list that has been registered with
> set_robust_list.

Good point. Considering that "robust_head" is the thread-specific
robust list registered with set_robust_list, I wonder if passing
&robust_head->list_op_pending is the right ABI choice there,
or if we should rather pass the robust_head pointer and offset it
within the vDSO.

Thoughts ?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com


* Re: [RFC PATCH] futex: Introduce __vdso_robust_futex_unlock
  2026-03-11 18:54 [RFC PATCH] futex: Introduce __vdso_robust_futex_unlock Mathieu Desnoyers
  2026-03-11 20:11 ` Mathieu Desnoyers
  2026-03-12  8:49 ` Florian Weimer
@ 2026-03-12 13:46 ` André Almeida
  2026-03-12 14:04   ` Mathieu Desnoyers
  2026-03-12 20:19   ` Thomas Gleixner
  2026-03-12 22:23 ` Thomas Gleixner
  3 siblings, 2 replies; 32+ messages in thread
From: André Almeida @ 2026-03-12 13:46 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: linux-kernel, Carlos O'Donell, Sebastian Andrzej Siewior,
	Peter Zijlstra, Florian Weimer, Rich Felker, Torvald Riegel,
	Darren Hart, Thomas Gleixner, Ingo Molnar, Davidlohr Bueso,
	Arnd Bergmann, Liam R . Howlett

Hi Mathieu,

Thanks for your patch!

Em 11/03/2026 15:54, Mathieu Desnoyers escreveu:
> This vDSO unlocks the robust futex by exchanging the content of
> *uaddr with 0 with a store-release semantic. If the futex has
> waiters, it sets bit 1 of *op_pending_addr, else it clears
> *op_pending_addr. Those operations are within a code region
> known by the kernel, making them safe with respect to asynchronous
> program termination either from thread context or from a nested
> signal handler.
> 
> Expected use of this vDSO:
> 
> if ((__vdso_robust_futex_unlock((u32 *) &mutex->__data.__lock, &pd->robust_head.list_op_pending)
>      & FUTEX_WAITERS) != 0)
>          futex_wake((u32 *) &mutex->__data.__lock, 1, private);
> WRITE_ONCE(pd->robust_head.list_op_pending, 0);
> 

[...]

> +
> +u32 __vdso_robust_futex_unlock(u32 *uaddr, uintptr_t *op_pending_addr)
> +{

The interface that I would propose here would be a bit more "generic" or 
"flexible":

__vdso_robust_futex_unlock(void *uaddr, int uval, struct 
robust_list_head *head, unsigned int flags)

First because we have FUTEX2_SIZE's, so uaddr could have a different size 
here. And we need `flags` to determine the size. Also `flags` is a great 
way to expand this function in the future without the need to create 
__vdso_robust_futex_unlock2().

I would also have `uval` instead of `val = 0`, because even though the 
most common semantics for futex is that (LOCK_FREE == 0), futex has no 
predetermined semantics of what each value means, and userspace is 
free to choose whatever value it wants for a free lock.

And as you already highlighted in the other thread, I agree that `struct 
robust_list_head` instead of just the op_pending address is more 
flexible as well.

Thanks!


* Re: [RFC PATCH] futex: Introduce __vdso_robust_futex_unlock
  2026-03-12 13:46 ` André Almeida
@ 2026-03-12 14:04   ` Mathieu Desnoyers
  2026-03-12 18:40     ` Mathieu Desnoyers
  2026-03-12 19:10     ` Thomas Gleixner
  2026-03-12 20:19   ` Thomas Gleixner
  1 sibling, 2 replies; 32+ messages in thread
From: Mathieu Desnoyers @ 2026-03-12 14:04 UTC (permalink / raw)
  To: André Almeida
  Cc: linux-kernel, Carlos O'Donell, Sebastian Andrzej Siewior,
	Peter Zijlstra, Florian Weimer, Rich Felker, Torvald Riegel,
	Darren Hart, Thomas Gleixner, Ingo Molnar, Davidlohr Bueso,
	Arnd Bergmann, Liam R . Howlett

On 2026-03-12 09:46, André Almeida wrote:
> Hi Mathieu,
> 
> Thanks for your patch!
> 
> Em 11/03/2026 15:54, Mathieu Desnoyers escreveu:
>> This vDSO unlocks the robust futex by exchanging the content of
>> *uaddr with 0 with a store-release semantic. If the futex has
>> waiters, it sets bit 1 of *op_pending_addr, else it clears
>> *op_pending_addr. Those operations are within a code region
>> known by the kernel, making them safe with respect to asynchronous
>> program termination either from thread context or from a nested
>> signal handler.
>>
>> Expected use of this vDSO:
>>
>> if ((__vdso_robust_futex_unlock((u32 *) &mutex->__data.__lock, &pd->robust_head.list_op_pending)
>>      & FUTEX_WAITERS) != 0)
>>          futex_wake((u32 *) &mutex->__data.__lock, 1, private);
>> WRITE_ONCE(pd->robust_head.list_op_pending, 0);
>>
> 
> [...]
> 
>> +
>> +u32 __vdso_robust_futex_unlock(u32 *uaddr, uintptr_t *op_pending_addr)
>> +{
> 
> The interface that I would propose here would be a bit more "generic" or 
> "flexible":
> 
> __vdso_robust_futex_unlock(void *uaddr, int uval, struct 
> robust_list_head *head, unsigned int flags)

I agree on adding explicit "uval" and pointer to robust list head,
I'm not convinced that the rest is an improvement.

This would require the caller to deal with errors, making it
more complex than a simple replacement for atomic xchg/cmpxchg.

"flags" could be unsupported, so the handler would have to deal with
-EINVAL.

The "size" could be unsupported (e.g. 64-bit on a 32-bit arch), which
would also require the caller to deal with -EINVAL.

> First because we have FUTEX2_SIZEs, so uaddr could have a different size 
> here.

Even in your proposal, "int uval" would be limited to 32-bit and would
not cover the 64-bit size case. Making this input parameter a void
pointer would remove type validation and add complexity.

> And we need `flags` to determine the size. Also `flags` is a great 
> way to expand this function in the future without the need to create 
> __vdso_robust_futex_unlock2().

But adding flags leaves additional error handling burden for the caller.
I'm not sure it's a win.

> I would also have `uval` instead of `val = 0`, because even though the 
> most common semantics for futex is that (LOCK_FREE == 0), futex has no 
> predetermined semantics of what each value means, and userspace is 
> free to choose whatever value it wants for a free lock.

Agreed on adding a u32 val parameter.

We can specialise the vdso for the size, e.g.:

extern u32 __vdso_robust_futex_unlock_u32(u32 *uaddr, u32 val, struct robust_list_head *robust_list_head);

and eventually add vdsos for u8, u16, u64.

Thoughts?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com


* Re: [RFC PATCH] futex: Introduce __vdso_robust_futex_unlock
  2026-03-12 13:13   ` Mathieu Desnoyers
@ 2026-03-12 14:12     ` Florian Weimer
  2026-03-12 14:14       ` André Almeida
  0 siblings, 1 reply; 32+ messages in thread
From: Florian Weimer @ 2026-03-12 14:12 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: André Almeida, linux-kernel, Carlos O'Donell,
	Sebastian Andrzej Siewior, Peter Zijlstra, Rich Felker,
	Torvald Riegel, Darren Hart, Thomas Gleixner, Ingo Molnar,
	Davidlohr Bueso, Arnd Bergmann, Liam R . Howlett

* Mathieu Desnoyers:

> On 2026-03-12 04:49, Florian Weimer wrote:
>> * Mathieu Desnoyers:
>> 
>>> + * This vDSO unlocks the robust futex by exchanging the content of
>>> + * *uaddr with 0 with a store-release semantic. If the futex has
>>> + * waiters, it sets bit 1 of *op_pending_addr, else it clears
>>> + * *op_pending_addr. Those operations are within a code region
>>> + * known by the kernel, making them safe with respect to asynchronous
>>> + * program termination either from thread context or from a nested
>>> + * signal handler.
>>> + *
>>> + * Expected use of this vDSO:
>>> + *
>>> + * if ((__vdso_robust_futex_unlock((u32 *) &mutex->__data.__lock, &pd->robust_head.list_op_pending)
>>> + *     & FUTEX_WAITERS) != 0)
>>> + *         futex_wake((u32 *) &mutex->__data.__lock, 1, private);
>>> + * WRITE_ONCE(pd->robust_head.list_op_pending, 0);
>> The comment could perhaps say that pd->robust_head is the
>> thread-specific robust list that has been registered with
>> set_robust_list.

> Good point. Considering that "robust_head" is the thread-specific
> robust list registered with set_robust_list, I wonder if passing
> &robust_head->list_op_pending is the right ABI choice there,
> or if we should rather pass the robust_head pointer and offset it
> within the vDSO.

I think set_robust_list has pointer and size arguments, so we should
pass those two at least.

Thanks,
Florian



* Re: [RFC PATCH] futex: Introduce __vdso_robust_futex_unlock
  2026-03-12 14:12     ` Florian Weimer
@ 2026-03-12 14:14       ` André Almeida
  2026-03-12 16:09         ` Mathieu Desnoyers
  0 siblings, 1 reply; 32+ messages in thread
From: André Almeida @ 2026-03-12 14:14 UTC (permalink / raw)
  To: Florian Weimer, Mathieu Desnoyers
  Cc: linux-kernel, Carlos O'Donell, Sebastian Andrzej Siewior,
	Peter Zijlstra, Rich Felker, Torvald Riegel, Darren Hart,
	Thomas Gleixner, Ingo Molnar, Davidlohr Bueso, Arnd Bergmann,
	Liam R . Howlett



Em 12/03/2026 11:12, Florian Weimer escreveu:
> * Mathieu Desnoyers:
> 
>> On 2026-03-12 04:49, Florian Weimer wrote:
>>> * Mathieu Desnoyers:
>>>
>>>> + * This vDSO unlocks the robust futex by exchanging the content of
>>>> + * *uaddr with 0 with a store-release semantic. If the futex has
>>>> + * waiters, it sets bit 1 of *op_pending_addr, else it clears
>>>> + * *op_pending_addr. Those operations are within a code region
>>>> + * known by the kernel, making them safe with respect to asynchronous
>>>> + * program termination either from thread context or from a nested
>>>> + * signal handler.
>>>> + *
>>>> + * Expected use of this vDSO:
>>>> + *
>>>> + * if ((__vdso_robust_futex_unlock((u32 *) &mutex->__data.__lock, &pd->robust_head.list_op_pending)
>>>> + *     & FUTEX_WAITERS) != 0)
>>>> + *         futex_wake((u32 *) &mutex->__data.__lock, 1, private);
>>>> + * WRITE_ONCE(pd->robust_head.list_op_pending, 0);
>>> The comment could perhaps say that pd->robust_head is the
>>> thread-specific robust list that has been registered with
>>> set_robust_list.
> 
>> Good point. Considering that "robust_head" is the thread-specific
>> robust list registered with set_robust_list, I wonder if passing
>> &robust_head->list_op_pending is the right ABI choice there,
>> or if we should rather pass the robust_head pointer and offset it
>> within the vDSO.
> 
> I think set_robust_list has pointer and size arguments, so we should
> pass those two at least.
> 

The size argument for set_robust_list() has never been useful, it seems; 
it just checks that (size == sizeof(*head)). I believe it was added in 
case the struct would ever be expanded, but that never happened, and with 
set_robust_list2() on the horizon this is even less likely to ever happen.


* Re: [RFC PATCH] futex: Introduce __vdso_robust_futex_unlock
  2026-03-12 14:14       ` André Almeida
@ 2026-03-12 16:09         ` Mathieu Desnoyers
  0 siblings, 0 replies; 32+ messages in thread
From: Mathieu Desnoyers @ 2026-03-12 16:09 UTC (permalink / raw)
  To: André Almeida, Florian Weimer
  Cc: linux-kernel, Carlos O'Donell, Sebastian Andrzej Siewior,
	Peter Zijlstra, Rich Felker, Torvald Riegel, Darren Hart,
	Thomas Gleixner, Ingo Molnar, Davidlohr Bueso, Arnd Bergmann,
	Liam R . Howlett

On 2026-03-12 10:14, André Almeida wrote:
> 
> 
> Em 12/03/2026 11:12, Florian Weimer escreveu:
>> * Mathieu Desnoyers:
>>
>>> On 2026-03-12 04:49, Florian Weimer wrote:
>>>> * Mathieu Desnoyers:
>>>>
>>>>> + * This vDSO unlocks the robust futex by exchanging the content of
>>>>> + * *uaddr with 0 with a store-release semantic. If the futex has
>>>>> + * waiters, it sets bit 1 of *op_pending_addr, else it clears
>>>>> + * *op_pending_addr. Those operations are within a code region
>>>>> + * known by the kernel, making them safe with respect to asynchronous
>>>>> + * program termination either from thread context or from a nested
>>>>> + * signal handler.
>>>>> + *
>>>>> + * Expected use of this vDSO:
>>>>> + *
>>>>> + * if ((__vdso_robust_futex_unlock((u32 *) &mutex->__data.__lock, 
>>>>> &pd->robust_head.list_op_pending)
>>>>> + *     & FUTEX_WAITERS) != 0)
>>>>> + *         futex_wake((u32 *) &mutex->__data.__lock, 1, private);
>>>>> + * WRITE_ONCE(pd->robust_head.list_op_pending, 0);
>>>> The comment could perhaps say that pd->robust_head is the
>>>> thread-specific robust list that has been registered with
>>>> set_robust_list.
>>
>>> Good point. Considering that "robust_head" is the thread-specific
>>> robust list registered with set_robust_list, I wonder if passing
>>> &robust_head->list_op_pending is the right ABI choice there,
>>> or if we should rather pass the robust_head pointer and offset it
>>> within the vDSO.
>>
>> I think set_robust_list has pointer and size arguments, so we should
>> pass those two at least.
>>
> 
> The size argument for set_robust_list() has never been useful, it seems; 
> it just checks that (size == sizeof(*head)). I believe it was added in 
> case the struct would ever be expanded, but that never happened, and with 
> set_robust_list2() on the horizon this is even less likely to ever happen.

I'd prefer not to pass an extra parameter which would require the caller
to perform error validation, unless it's really necessary.

The field we need to access (list_op_pending) is part of the original
structure, so it would be there even if this structure is extended in
the future.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com


* Re: [RFC PATCH] futex: Introduce __vdso_robust_futex_unlock
  2026-03-12 14:04   ` Mathieu Desnoyers
@ 2026-03-12 18:40     ` Mathieu Desnoyers
  2026-03-12 18:58       ` André Almeida
  2026-03-12 19:10     ` Thomas Gleixner
  1 sibling, 1 reply; 32+ messages in thread
From: Mathieu Desnoyers @ 2026-03-12 18:40 UTC (permalink / raw)
  To: André Almeida
  Cc: linux-kernel, Carlos O'Donell, Sebastian Andrzej Siewior,
	Peter Zijlstra, Florian Weimer, Rich Felker, Torvald Riegel,
	Darren Hart, Thomas Gleixner, Ingo Molnar, Davidlohr Bueso,
	Arnd Bergmann, Liam R . Howlett

On 2026-03-12 10:04, Mathieu Desnoyers wrote:
> On 2026-03-12 09:46, André Almeida wrote:
[...]
>> First because we have FUTEX2_SIZEs, so uaddr could have a different 
>> size here.

The robust futex ABI defines:

#define FUTEX_OWNER_DIED        0x40000000
#define FUTEX_WAITERS           0x80000000

So how can a robust futex use a uaddr smaller than 32-bit ?

And if the uaddr is 64-bit, I would expect those two values
to be bits 63 and 62 (indexed from 0), but this contradicts
the defines.

What am I missing ?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com


* Re: [RFC PATCH] futex: Introduce __vdso_robust_futex_unlock
  2026-03-12 18:40     ` Mathieu Desnoyers
@ 2026-03-12 18:58       ` André Almeida
  0 siblings, 0 replies; 32+ messages in thread
From: André Almeida @ 2026-03-12 18:58 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: linux-kernel, Carlos O'Donell, Sebastian Andrzej Siewior,
	Peter Zijlstra, Florian Weimer, Rich Felker, Torvald Riegel,
	Darren Hart, Thomas Gleixner, Ingo Molnar, Davidlohr Bueso,
	Arnd Bergmann, Liam R . Howlett

Em 12/03/2026 15:40, Mathieu Desnoyers escreveu:
> On 2026-03-12 10:04, Mathieu Desnoyers wrote:
>> On 2026-03-12 09:46, André Almeida wrote:
> [...]
>>> First because we have FUTEX2_SIZEs, so uaddr could have a different 
>>> size here.
> 
> The robust futex ABI defines:
> 
> #define FUTEX_OWNER_DIED        0x40000000
> #define FUTEX_WAITERS           0x80000000
> 
> So how can a robust futex use a uaddr smaller than 32-bit ?
> 
> And if the uaddr is 64-bit, I would expect those two values
> to be bits 63 and 62 (indexed from 0), but this contradicts
> the defines.
> 
> What am I missing ?

Oh, I didn't realize that. So, for the current API, robust mutexes need 
to be 32-bit integers. If everyone agrees on that, we can document this 
and let it be.

In the future we could add something like FUTEX_OWNER_DIED_U64 and 
FUTEX_WAITERS_U64 to make it work correctly. But right now, robust 
futexes can only work with u32.


* Re: [RFC PATCH] futex: Introduce __vdso_robust_futex_unlock
  2026-03-12 14:04   ` Mathieu Desnoyers
  2026-03-12 18:40     ` Mathieu Desnoyers
@ 2026-03-12 19:10     ` Thomas Gleixner
  2026-03-12 19:16       ` Mathieu Desnoyers
  1 sibling, 1 reply; 32+ messages in thread
From: Thomas Gleixner @ 2026-03-12 19:10 UTC (permalink / raw)
  To: Mathieu Desnoyers, André Almeida
  Cc: linux-kernel, Carlos O'Donell, Sebastian Andrzej Siewior,
	Peter Zijlstra, Florian Weimer, Rich Felker, Torvald Riegel,
	Darren Hart, Ingo Molnar, Davidlohr Bueso, Arnd Bergmann,
	Liam R . Howlett

On Thu, Mar 12 2026 at 10:04, Mathieu Desnoyers wrote:
> On 2026-03-12 09:46, André Almeida wrote:
>> The interface that I would propose here would be a bit more "generic" or 
>> "flexible":
>> 
>> __vdso_robust_futex_unlock(void *uaddr, int uval, struct 
>> robust_list_head *head, unsigned int flags)
>
> I agree on adding explicit "uval" and pointer to robust list head,
> I'm not convinced that the rest is an improvement.
>
> This would require the caller to deal with errors, making it
> more complex than a simple replacement for atomic xchg/cmpxchg.
>
> "flags" could be unsupported, so the handler would have to deal with
> -EINVAL.

What's the problem with that? pthread_mutex_unlock() has a return value
too.


* Re: [RFC PATCH] futex: Introduce __vdso_robust_futex_unlock
  2026-03-12 19:10     ` Thomas Gleixner
@ 2026-03-12 19:16       ` Mathieu Desnoyers
  2026-03-13  8:20         ` Florian Weimer
  0 siblings, 1 reply; 32+ messages in thread
From: Mathieu Desnoyers @ 2026-03-12 19:16 UTC (permalink / raw)
  To: Thomas Gleixner, André Almeida
  Cc: linux-kernel, Carlos O'Donell, Sebastian Andrzej Siewior,
	Peter Zijlstra, Florian Weimer, Rich Felker, Torvald Riegel,
	Darren Hart, Ingo Molnar, Davidlohr Bueso, Arnd Bergmann,
	Liam R . Howlett

On 2026-03-12 15:10, Thomas Gleixner wrote:
> On Thu, Mar 12 2026 at 10:04, Mathieu Desnoyers wrote:
>> On 2026-03-12 09:46, André Almeida wrote:
>>> The interface that I would propose here would be a bit more "generic" or
>>> "flexible":
>>>
>>> __vdso_robust_futex_unlock(void *uaddr, int uval, struct
>>> robust_list_head *head, unsigned int flags)
>>
>> I agree on adding explicit "uval" and pointer to robust list head,
>> I'm not convinced that the rest is an improvement.
>>
>> This would require the caller to deal with errors, making it
>> more complex than a simple replacement for atomic xchg/cmpxchg.
>>
>> "flags" could be unsupported, so the handler would have to deal with
>> -EINVAL.
> 
> What's the problem with that? pthread_mutex_unlock() has a return value
> too.

My aim is to use this vDSO as a replacement for atomic xchg and atomic
cmpxchg within library code. I am trying to make the transition as
straightforward as possible considering that this is a design bug
fix.

If adding error handling at that precise point of the libc robust mutex
unlock code is straightforward, I don't mind internally checking flags
and returning -EINVAL, but I'd want to hear about preference from the
libc people on this topic beforehand.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com


* Re: [RFC PATCH] futex: Introduce __vdso_robust_futex_unlock
  2026-03-12 13:46 ` André Almeida
  2026-03-12 14:04   ` Mathieu Desnoyers
@ 2026-03-12 20:19   ` Thomas Gleixner
  2026-03-12 21:28     ` Mathieu Desnoyers
  1 sibling, 1 reply; 32+ messages in thread
From: Thomas Gleixner @ 2026-03-12 20:19 UTC (permalink / raw)
  To: André Almeida, Mathieu Desnoyers
  Cc: linux-kernel, Carlos O'Donell, Sebastian Andrzej Siewior,
	Peter Zijlstra, Florian Weimer, Rich Felker, Torvald Riegel,
	Darren Hart, Ingo Molnar, Davidlohr Bueso, Arnd Bergmann,
	Liam R . Howlett

On Thu, Mar 12 2026 at 10:46, André Almeida wrote:
> Em 11/03/2026 15:54, Mathieu Desnoyers escreveu:
> I would also have `uval` instead of `val = 0`, because even though the 
> most common semantics for futex is that (LOCK_FREE == 0), futex has no 
> predetermined semantics of what each value means, and userspace is 
> free to choose whatever value it wants for a free lock.

That's true for non-robust futexes, but robust futexes have clearly
defined semantics vs. the userspace value:

        0:	unlocked
        != 0:	PID of the owner

So uval is useless as the unlock value can't be anything else than 0,
no?

Thanks,

        tglx


* Re: [RFC PATCH] futex: Introduce __vdso_robust_futex_unlock
  2026-03-12 20:19   ` Thomas Gleixner
@ 2026-03-12 21:28     ` Mathieu Desnoyers
  0 siblings, 0 replies; 32+ messages in thread
From: Mathieu Desnoyers @ 2026-03-12 21:28 UTC (permalink / raw)
  To: Thomas Gleixner, André Almeida
  Cc: linux-kernel, Carlos O'Donell, Sebastian Andrzej Siewior,
	Peter Zijlstra, Florian Weimer, Rich Felker, Torvald Riegel,
	Darren Hart, Ingo Molnar, Davidlohr Bueso, Arnd Bergmann,
	Liam R . Howlett

On 2026-03-12 16:19, Thomas Gleixner wrote:
> On Thu, Mar 12 2026 at 10:46, André Almeida wrote:
>> Em 11/03/2026 15:54, Mathieu Desnoyers escreveu:
>> I would also have `uval` instead of `val = 0`, because even though the
>> most common semantics for futex is that (LOCK_FREE == 0), futex has no
>> predetermined semantics of what each value means, and userspace is
>> free to choose whatever value it wants for a free lock.
> 
> That's true for non-robust futexes, but robust futexes have clearly
> defined semantics vs. the userspace value:
> 
>          0:	unlocked
>          != 0:	PID of the owner
> 
> So uval is useless as the unlock value can't be anything else than 0,
> no?

Yes, that was my original assumption, which is why I did not have a
"val" parameter (and internally used 0 within the vDSO). I can
go back to that behavior in a following version.

Just to confirm: for robust PI futexes, the value used for
the cmpxchg store would always be 0 as well, right ?
If that's the case, having an input "val" is useless there
too, and we just need a pointer to an expected/loaded value.

I'm also not convinced that adding a _u32 suffix to the
vDSO symbol is useful if the robust futexes already define
the futex value to be 32-bit wide.

Thanks,

Mathieu

> 
> Thanks,
> 
>          tglx


-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com


* Re: [RFC PATCH] futex: Introduce __vdso_robust_futex_unlock
  2026-03-11 18:54 [RFC PATCH] futex: Introduce __vdso_robust_futex_unlock Mathieu Desnoyers
                   ` (2 preceding siblings ...)
  2026-03-12 13:46 ` André Almeida
@ 2026-03-12 22:23 ` Thomas Gleixner
  2026-03-12 22:52   ` Mathieu Desnoyers
  3 siblings, 1 reply; 32+ messages in thread
From: Thomas Gleixner @ 2026-03-12 22:23 UTC (permalink / raw)
  To: Mathieu Desnoyers, André Almeida
  Cc: linux-kernel, Mathieu Desnoyers, Carlos O'Donell,
	Sebastian Andrzej Siewior, Peter Zijlstra, Florian Weimer,
	Rich Felker, Torvald Riegel, Darren Hart, Ingo Molnar,
	Davidlohr Bueso, Arnd Bergmann, Liam R . Howlett

On Wed, Mar 11 2026 at 14:54, Mathieu Desnoyers wrote:
> +u32 __vdso_robust_futex_unlock(u32 *uaddr, uintptr_t *op_pending_addr)
> +{
> +	u32 val = 0;
> +
> +	/*
> +	 * Within the ip range identified by the futex exception table,
> +	 * the register "eax" contains the value loaded by xchg. This is
> +	 * expected by futex_vdso_exception() to check whether waiters
> +	 * need to be woken up. This register state is transferred to
> +	 * bit 1 (NEED_WAKEUP) of *op_pending_addr before the ip range
> +	 * ends.
> +	 */
> +	asm volatile (	_ASM_VDSO_EXTABLE_FUTEX_HANDLE(1f, 3f)
> +			/* Exchange uaddr (store-release). */
> +			"xchg %[uaddr], %[val]\n\t"
> +			"1:\n\t"
> +			/* Test if FUTEX_WAITERS (0x80000000) is set. */
> +			"test %[val], %[val]\n\t"
> +			"js 2f\n\t"
> +			/* Clear *op_pending_addr if there are no waiters. */
> +			ASM_PTR_SET "$0, %[op_pending_addr]\n\t"
> +			"jmp 3f\n\t"
> +			"2:\n\t"
> +			/* Set bit 1 (NEED_WAKEUP) in *op_pending_addr. */
> +			ASM_PTR_BIT_SET "$1, %[op_pending_addr]\n\t"
> +			"3:\n\t"
> +			: [val] "+a" (val),
> +			  [uaddr] "+m" (*uaddr)
> +			: [op_pending_addr] "m" (*op_pending_addr)
> +			: "memory");

TBH, all of this is completely overengineered and tasteless bloat.

The exact same thing can be achieved by doing the obvious:

struct robust_list_head2 {
	struct robust_list_head		rhead;
        u32				unlock_val;
};

// User space
unlock(futex)
{
        struct robust_list_head2 *h = ....;

        h->unlock_val = 0;
        h->rhead.list_op_pending = .... | FUTEX_ROBUST_UNLOCK;

        xchg(futex->uval, h->unlock_val);

        if (h->unlock_val & FUTEX_WAITERS)
        	syscall(FUTEX, &futex->uval, FUTEX_WAKE, ....);

	h->rhead.list_op_pending = NULL;
}

And then the kernel robust list code does:

    	if (fetch_robust_entry(&pending, &head->list_op_pending, &pip))
        	return;

        if (pending & FUTEX_ROBUST_UNLOCK_PENDING) {
        	if (get_user(unlock_val, &head_v2->unlock_val))
                	return;
        }

        .....

        if (!pending)
        	return;

        /*
         * If userspace unlocked the futex already, but did not manage
         * to clear the pending pointer, then the futex is no longer
         * owned by the task and might have been freed already.
         *
         * As the dying task is not the owner anymore there is no need
         * to access the futex and to set the OWNERDEAD bit, just wake
         * up a waiter in case the task died before doing so.
         *
         * That wakeup might be spurious, but that's harmless as all
         * futex users must be able to handle spurious wake ups
         * correctly.
         */
        if (unlock_val) {
         	if (unlock_val & FUTEX_WAITERS)
                	futex_wake(pending + offset,....);
		return;
        }

No?

If you do it clever you can extend the existing code with minimally
intrusive changes.

But yeah, no ASM, no VDSO, no signal magic, no architecture EXTABLE
mess, no architecture specific hackery, too generic and not convoluted
enough, seriously?

And replying to your other mail right here:

> My aim is to use this vDSO as a replacement for atomic xchg and atomic
> cmpxchg within library code. I am trying to make the transition as
> straightforward as possible considering that this is a design bug
> fix.

Absolutely not for the price of creating a completely incomprehensible
and unjustified mess in the kernel when it can be done with a trivial
new interface, which just extends the existing one by the missing
functionality in a generic way.

Thanks,

        tglx


* Re: [RFC PATCH] futex: Introduce __vdso_robust_futex_unlock
  2026-03-12 22:23 ` Thomas Gleixner
@ 2026-03-12 22:52   ` Mathieu Desnoyers
  2026-03-13 12:12     ` Sebastian Andrzej Siewior
  2026-03-16 17:12     ` Thomas Gleixner
  0 siblings, 2 replies; 32+ messages in thread
From: Mathieu Desnoyers @ 2026-03-12 22:52 UTC (permalink / raw)
  To: Thomas Gleixner, André Almeida
  Cc: linux-kernel, Carlos O'Donell, Sebastian Andrzej Siewior,
	Peter Zijlstra, Florian Weimer, Rich Felker, Torvald Riegel,
	Darren Hart, Ingo Molnar, Davidlohr Bueso, Arnd Bergmann,
	Liam R . Howlett

On 2026-03-12 18:23, Thomas Gleixner wrote:
> On Wed, Mar 11 2026 at 14:54, Mathieu Desnoyers wrote:
[...]
> 
> TBH, all of this is completely overengineered and tasteless bloat.
> 
> The exact same thing can be achieved by doing the obvious:
> 
> struct robust_list_head2 {
> 	struct robust_list_head		rhead;
>          u32				unlock_val;
> };
> 
> // User space
> unlock(futex)
> {
>          struct robust_list_head2 *h = ....;
> 
>          h->unlock_val = 0;
>          h->rhead.list_op_pending = .... | FUTEX_ROBUST_UNLOCK;
> 
>          xchg(futex->uval, h->unlock_val);

Here is the problem with your proposed approach:

   "XCHG — Exchange Register/Memory With Register"
                                         ^^^^^^^^

So only one of the xchg arguments can be a memory location.
Therefore, you will end up needing an extra store after xchg
to store the content of the result register into h->unlock_val.

If the process dies between those two instructions, your proposed
robust list code will be fooled and fall into the same bug that's
been lingering for 14 years.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com


* Re: [RFC PATCH] futex: Introduce __vdso_robust_futex_unlock
  2026-03-12 19:16       ` Mathieu Desnoyers
@ 2026-03-13  8:20         ` Florian Weimer
  0 siblings, 0 replies; 32+ messages in thread
From: Florian Weimer @ 2026-03-13  8:20 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Thomas Gleixner, André Almeida, linux-kernel,
	Carlos O'Donell, Sebastian Andrzej Siewior, Peter Zijlstra,
	Rich Felker, Torvald Riegel, Darren Hart, Ingo Molnar,
	Davidlohr Bueso, Arnd Bergmann, Liam R . Howlett

* Mathieu Desnoyers:

> On 2026-03-12 15:10, Thomas Gleixner wrote:
>> On Thu, Mar 12 2026 at 10:04, Mathieu Desnoyers wrote:
>>> On 2026-03-12 09:46, André Almeida wrote:
>>>> The interface that I would propose here would be a bit more "generic" or
>>>> "flexible":
>>>>
>>>> __vdso_robust_futex_unlock(void *uaddr, int uval, struct
>>>> robust_list_head *head, unsigned int flags)
>>>
>>> I agree on adding explicit "uval" and pointer to robust list head,
>>> I'm not convinced that the rest is an improvement.
>>>
>>> This would require the caller to deal with errors, making it
>>> more complex than a simple replacement for atomic xchg/cmpxchg.
>>>
>>> "flags" could be unsupported, so the handler would have to deal with
>>> -EINVAL.
>> What's the problem with that? pthread_mutex_unlock() has a return
>> value
>> too.
>
> My aim is to use this vDSO as a replacement for atomic xchg and atomic
> cmpxchg within library code. I am trying to make the transition as
> straightforward as possible considering that this is a design bug
> fix.
>
> If adding error handling at that precise point of the libc robust mutex
> unlock code is straightforward, I don't mind internally checking flags
> and returning -EINVAL, but I'd want to hear about preference from the
> libc people on this topic beforehand.

As a deallocation operation, unlock must not fail.  We would therefore
ignore the error return value (because EINVAL can only happen for
invalid arguments), or we'd terminate the process on failure.

Thanks,
Florian



* Re: [RFC PATCH] futex: Introduce __vdso_robust_futex_unlock
  2026-03-12 22:52   ` Mathieu Desnoyers
@ 2026-03-13 12:12     ` Sebastian Andrzej Siewior
  2026-03-13 12:17       ` Mathieu Desnoyers
  2026-03-16 17:12     ` Thomas Gleixner
  1 sibling, 1 reply; 32+ messages in thread
From: Sebastian Andrzej Siewior @ 2026-03-13 12:12 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Thomas Gleixner, André Almeida, linux-kernel,
	Carlos O'Donell, Peter Zijlstra, Florian Weimer, Rich Felker,
	Torvald Riegel, Darren Hart, Ingo Molnar, Davidlohr Bueso,
	Arnd Bergmann, Liam R . Howlett

On 2026-03-12 18:52:43 [-0400], Mathieu Desnoyers wrote:
> On 2026-03-12 18:23, Thomas Gleixner wrote:
> > On Wed, Mar 11 2026 at 14:54, Mathieu Desnoyers wrote:
> [...]
> > 
> > TBH, all of this is completely overengineered and tasteless bloat.
> > 
> > The exact same thing can be achieved by doing the obvious:
> > 
> > struct robust_list_head2 {
> > 	struct robust_list_head		rhead;
> >          u32				unlock_val;
> > };
> > 
> > // User space
> > unlock(futex)
> > {
> >          struct robust_list_head2 *h = ....;
> > 
> >          h->unlock_val = 0;
> >          h->rhead.list_op_pending = .... | FUTEX_ROBUST_UNLOCK;
> > 
> >          xchg(futex->uval, h->unlock_val);
> 
> Here is the problem with your proposed approach:
> 
>   "XCHG — Exchange Register/Memory With Register"
>                                         ^^^^^^^^
> 
> So only one of the xchg arguments can be a memory location.
> Therefore, you will end up needing an extra store after xchg
> to store the content of the result register into h->unlock_val.

But can't we also assign a role to pthread_mutex_destroy() here? It
would ensure that the futex death cleanup has run for every task having
access to this memory, so that the value is either 0 or the pid of a
dead task before this memory location can be used again?

> Thanks,
> 
> Mathieu

Sebastian


* Re: [RFC PATCH] futex: Introduce __vdso_robust_futex_unlock
  2026-03-13 12:12     ` Sebastian Andrzej Siewior
@ 2026-03-13 12:17       ` Mathieu Desnoyers
  2026-03-13 13:29         ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 32+ messages in thread
From: Mathieu Desnoyers @ 2026-03-13 12:17 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Thomas Gleixner, André Almeida, linux-kernel,
	Carlos O'Donell, Peter Zijlstra, Florian Weimer, Rich Felker,
	Torvald Riegel, Darren Hart, Ingo Molnar, Davidlohr Bueso,
	Arnd Bergmann, Liam R . Howlett

On 2026-03-13 08:12, Sebastian Andrzej Siewior wrote:
> On 2026-03-12 18:52:43 [-0400], Mathieu Desnoyers wrote:
>> On 2026-03-12 18:23, Thomas Gleixner wrote:
>>> On Wed, Mar 11 2026 at 14:54, Mathieu Desnoyers wrote:
>> [...]
>>>
>>> TBH, all of this is completely overengineered and tasteless bloat.
>>>
>>> The exact same thing can be achieved by doing the obvious:
>>>
>>> struct robust_list_head2 {
>>> 	struct robust_list_head		rhead;
>>>           u32				unlock_val;
>>> };
>>>
>>> // User space
>>> unlock(futex)
>>> {
>>>           struct robust_list_head2 *h = ....;
>>>
>>>           h->unlock_val = 0;
>>>           h->rhead.list_op_pending = .... | FUTEX_ROBUST_UNLOCK;
>>>
>>>           xchg(futex->uval, h->unlock_val);
>>
>> Here is the problem with your proposed approach:
>>
>>    "XCHG — Exchange Register/Memory With Register"
>>                                          ^^^^^^^^
>>
>> So only one of the xchg arguments can be a memory location.
>> Therefore, you will end up needing an extra store after xchg
>> to store the content of the result register into h->unlock_val.
> 
> But can't we also assign a role to pthread_mutex_destroy() here? So it
> would ensure that the futex death cleanup did run for every task having
> access to this memory? So it is either 0 or pid-of-dead-task before this
> memory location can be used again?

I did propose this exact approach recently:

https://lore.kernel.org/lkml/bd7a8dd3-8dee-4886-abe6-bdda25fe4a0d@efficios.com/

but it's a far reaching change. Then I thought of using rseq to identify the
critical section:

https://lore.kernel.org/lkml/694424f4-20d1-4473-8955-859acbad466f@efficios.com/

And then Florian proposed to hide this under a vDSO:

https://lore.kernel.org/lkml/lhufr6ihelv.fsf@oldenburg.str.redhat.com/

and here we are.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com


* Re: [RFC PATCH] futex: Introduce __vdso_robust_futex_unlock
  2026-03-13 12:17       ` Mathieu Desnoyers
@ 2026-03-13 13:29         ` Sebastian Andrzej Siewior
  2026-03-13 13:35           ` Mathieu Desnoyers
  0 siblings, 1 reply; 32+ messages in thread
From: Sebastian Andrzej Siewior @ 2026-03-13 13:29 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Thomas Gleixner, André Almeida, linux-kernel,
	Carlos O'Donell, Peter Zijlstra, Florian Weimer, Rich Felker,
	Torvald Riegel, Darren Hart, Ingo Molnar, Davidlohr Bueso,
	Arnd Bergmann, Liam R . Howlett

On 2026-03-13 08:17:57 [-0400], Mathieu Desnoyers wrote:
> and here we are.

I would still prefer a "small" solution which is expensive in the
unlikely case if it has to be.
Your vdso version includes asm code which needs to be implemented by
every architecture. The exception table did not look like something that
is universally available. Also, if we take this seriously it needs to be
backported to every stable kernel.

> Thanks,
> 
> Mathieu
> 

Sebastian

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH] futex: Introduce __vdso_robust_futex_unlock
  2026-03-13 13:29         ` Sebastian Andrzej Siewior
@ 2026-03-13 13:35           ` Mathieu Desnoyers
  0 siblings, 0 replies; 32+ messages in thread
From: Mathieu Desnoyers @ 2026-03-13 13:35 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Thomas Gleixner, André Almeida, linux-kernel,
	Carlos O'Donell, Peter Zijlstra, Florian Weimer, Rich Felker,
	Torvald Riegel, Darren Hart, Ingo Molnar, Davidlohr Bueso,
	Arnd Bergmann, Liam R . Howlett

On 2026-03-13 09:29, Sebastian Andrzej Siewior wrote:
> On 2026-03-13 08:17:57 [-0400], Mathieu Desnoyers wrote:
>> and here we are.
> 
> I would still prefer a "small" solution which is expensive in the
> unlikely case if it has to be.

I'm not sure I would call the new robust futex destroy hook solution
"small". Also, I don't think it would be sufficient to fix the robust
PI futex races (my approach handles this as well, see follow up
patches).

> Your vdso version includes asm code which needs to be implemented by
> every architecture.

Correct.

> The exception table did not look like something that
> is universally available.

Indeed, we'd need to port the vDSO exception table code from x86 to
other archs.

> Also, if we take this seriously it needs to be
> backported to every stable kernel.

Yes. The same can be said about the hypothetical robust futex destroy
hook.

That being said, I do not expect any "easy" fix for a design
bug that's been unsolved for 16+ years.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH] futex: Introduce __vdso_robust_futex_unlock
  2026-03-12 22:52   ` Mathieu Desnoyers
  2026-03-13 12:12     ` Sebastian Andrzej Siewior
@ 2026-03-16 17:12     ` Thomas Gleixner
  2026-03-16 19:36       ` Mathieu Desnoyers
  1 sibling, 1 reply; 32+ messages in thread
From: Thomas Gleixner @ 2026-03-16 17:12 UTC (permalink / raw)
  To: Mathieu Desnoyers, André Almeida
  Cc: linux-kernel, Carlos O'Donell, Sebastian Andrzej Siewior,
	Peter Zijlstra, Florian Weimer, Rich Felker, Torvald Riegel,
	Darren Hart, Ingo Molnar, Davidlohr Bueso, Arnd Bergmann,
	Liam R . Howlett

On Thu, Mar 12 2026 at 18:52, Mathieu Desnoyers wrote:
> On 2026-03-12 18:23, Thomas Gleixner wrote:
>>          xchg(futex->uval, h->unlock_val);
>
> Here is the problem with your proposed approach:
>
>    "XCHG — Exchange Register/Memory With Register"
>                                          ^^^^^^^^
>
> So only one of the xchg arguments can be a memory location.
> Therefore, you will end up needing an extra store after xchg
> to store the content of the result register into h->unlock_val.

Indeed.
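
To make the window concrete, here is a user-space sketch (hypothetical
structure and field names, C11 atomics) of what such an unlock compiles to:

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical robust-list bookkeeping structure; the real layout in
 * the proposed series may differ. */
struct unlock_hint {
	uint32_t unlock_val;
};

/* On x86-64 the atomic_exchange() compiles to a single XCHG with one
 * memory operand; the old value lands in a register. The following
 * plain store is a second instruction, so a crash between the two
 * leaves unlock_val stale while the futex is already unlocked. */
static uint32_t unlock_with_window(_Atomic uint32_t *futex,
				   struct unlock_hint *h)
{
	uint32_t old = atomic_exchange(futex, 0); /* insn 1: XCHG */
	h->unlock_val = old;                      /* insn 2: MOV  */
	return old;
}
```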

> If the process dies between those two instructions, your proposed
> robust list code will be fooled and fall into the same bug that's
> been lingering for 14 years.

s/lingering/ignored/

To fix this for correctness sake it needs more than a hack in the kernel
without even looking at the overall larger picture. I sat down and did a
full analysis and here are the most important questions:

Q: Have non-PI and PI to be treated differently?

A: No.

   That's just historical evolution. While PI can't use XCHG because that
   would create inconsistent state, there is absolutely no reason why
   non-PI can't use try_cmpxchg().


Q: Is it required to unlock in user space first and then go into the kernel
   to wake up waiters?

A: No.

   That's again a historical leftover from the 1st generation futexes which
   preceded both robust and PI. There is no technical reason to keep it
   this way.

   So both can do:

       if (cmpxchg(lock, tid, 0) != tid)
       		sys_futex(UNLOCK,....);

   which then allows for both non-PI and PI to hand the pending op pointer
   into the syscall and let the kernel deal with the unlock, the op pointer
   and the wake up in one go.

   That reduces the problem space to take care of the non-contended unlock
   case, where the pending op is cleared after the cmpxchg() succeeded.

   And yes, that part can be done in the VDSO and a fixup mechanism in the
   kernel.


Q: Are robust list pointers guaranteed to be 64-bit when running as a
   64-bit task?

A: No.

   The gaming emulators use both the native 64-bit robust list and the
   32-bit robust list from the same 64-bit application to make the
   emulation work.

   So both the UNLOCK syscall and the fixup need to have means to figure
   out the to-be-cleared size for that pointer.

   Sure, this can be done with a boatload of different functions and flags
   and whatever, but that makes the actual fixup handling in the kernel
   more complicated than necessary.


Q: Have regular signal delivery and process exit in case of crash or being
   killed by an external signal to be treated differently?

A: No.

   A task always goes through the same signal code path for both cases so
   all of this can be handled in _one_ place without even touching the
   robust list cleanup code.

   sys_exit() is different because there a task voluntarily exits and if
   it does so between the unlock and the clearing of the op pointer,
   then so be it. That'd be willful ignorance or malice and not any
   different from the task doing the corruption itself in user space
   right away.


Q: Are exception tables a good idea?

A: No.

   This is not an exception handling case. It's a fixup similar to RSEQ
   critical section fixups and so it has to be handled with dedicated
   mechanisms which are performant and not glued onto something which has a
   completely different purpose.


>> This fixes a long standing data corruption race condition with robust
>>  futexes, as pointed out here:
>>
>>  "File corruption race condition in robust mutex unlocking"
>>  https://sourceware.org/bugzilla/show_bug.cgi?id=14485
  
No comment.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH] futex: Introduce __vdso_robust_futex_unlock
  2026-03-16 17:12     ` Thomas Gleixner
@ 2026-03-16 19:36       ` Mathieu Desnoyers
  2026-03-16 20:27         ` Thomas Gleixner
  0 siblings, 1 reply; 32+ messages in thread
From: Mathieu Desnoyers @ 2026-03-16 19:36 UTC (permalink / raw)
  To: Thomas Gleixner, André Almeida
  Cc: linux-kernel, Carlos O'Donell, Sebastian Andrzej Siewior,
	Peter Zijlstra, Florian Weimer, Rich Felker, Torvald Riegel,
	Darren Hart, Ingo Molnar, Davidlohr Bueso, Arnd Bergmann,
	Liam R . Howlett

On 2026-03-16 13:12, Thomas Gleixner wrote:
> On Thu, Mar 12 2026 at 18:52, Mathieu Desnoyers wrote:
[...]

> To fix this for correctness sake it needs more than a hack in the kernel
> without even looking at the overall larger picture.

If my POC helped move the discussion forward, then it has achieved
its purpose. :)

> I sat down and did a
> full analysis and here are the most important questions:
> 
> Q: Have non-PI and PI to be treated differently?
> 
> A: No.
> 
>     That's just historical evolution. While PI can't use XCHG because that
>     would create inconsistent state, there is absolutely no reason why
>     non-PI can't use try_cmpxchg().

Agreed.

> 
> 
> Q: Is it required to unlock in user space first and then go into the kernel
>     to wake up waiters?
> 
> A: No.
> 
>     That's again a historical leftover from the 1st generation futexes which
>     preceded both robust and PI. There is no technical reason to keep it
>     this way.
> 
>     So both can do:
> 
>         if (cmpxchg(lock, tid, 0) != tid)
>         		sys_futex(UNLOCK,....);
> 
>     which then allows for both non-PI and PI to hand the pending op pointer
>     into the syscall and let the kernel deal with the unlock, the op pointer
>     and the wake up in one go.

Yes, that's a nice simplification.

> 
>     That reduces the problem space to take care of the non-contended unlock
>     case, where the pending op is cleared after the cmpxchg() succeeded.
> 
>     And yes, that part can be done in the VDSO and a fixup mechanism in the
>     kernel.

Yes.

> 
> 
> Q: Are robust list pointers guaranteed to be 64-bit when running as a
>     64-bit task?
> 
> A: No.
> 
>     The gaming emulators use both the native 64-bit robust list and the
>     32-bit robust list from the same 64-bit application to make the
>     emulation work.
> 
>     So both the UNLOCK syscall and the fixup need to have means to figure
>     out the to-be-cleared size for that pointer.
> 
>     Sure, this can be done with a boatload of different functions and flags
>     and whatever, but that makes the actual fixup handling in the kernel
>     more complicated than necessary.

Good point, this is a requirement I did not know about. I notice you
are dealing with it in your series.

> 
> 
> Q: Have regular signal delivery and process exit in case of crash or being
>     killed by an external signal to be treated differently?
> 
> A: No.
> 
>     A task always goes through the same signal code path for both cases so
>     all of this can be handled in _one_ place without even touching the
>     robust list cleanup code.

So far, yes.

> 
>     sys_exit() is different because there a task voluntarily exits and if
>     it does so between the unlock and the clearing of the op pointer,
>     then so be it. That'd be willful ignorance or malice and not any
>     different from the task doing the corruption itself in user space
>     right away.

I'm not sure about this one. How about the following scenario:
A concurrent thread calls sys_exit concurrently with the vdso. Is this
something we should handle or consider it "willful ignorance/malice" ?

> Q: Are exception tables a good idea?
> 
> A: No.
> 
>     This is not an exception handling case. It's a fixup similar to RSEQ
>     critical section fixups and so it has to be handled with dedicated
>     mechanisms which are performant and not glued onto something which has a
>     completely different purpose.

I agree with your kernel-level approach. I've proposed a few changes to
the vdso itself and vdso2c script to increase robustness in my review.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH] futex: Introduce __vdso_robust_futex_unlock
  2026-03-16 19:36       ` Mathieu Desnoyers
@ 2026-03-16 20:27         ` Thomas Gleixner
  2026-03-16 21:01           ` Mathieu Desnoyers
  0 siblings, 1 reply; 32+ messages in thread
From: Thomas Gleixner @ 2026-03-16 20:27 UTC (permalink / raw)
  To: Mathieu Desnoyers, André Almeida
  Cc: linux-kernel, Carlos O'Donell, Sebastian Andrzej Siewior,
	Peter Zijlstra, Florian Weimer, Rich Felker, Torvald Riegel,
	Darren Hart, Ingo Molnar, Davidlohr Bueso, Arnd Bergmann,
	Liam R . Howlett

On Mon, Mar 16 2026 at 15:36, Mathieu Desnoyers wrote:
> On 2026-03-16 13:12, Thomas Gleixner wrote:
>>     sys_exit() is different because there a task voluntarily exits and if
>>     it does so between the unlock and the clearing of the op pointer,
>>     then so be it. That'd be willful ignorance or malice and not any
>>     different from the task doing the corruption itself in user space
>>     right away.
>
> I'm not sure about this one. How about the following scenario:
> A concurrent thread calls sys_exit concurrently with the vdso. Is this
> something we should handle or consider it "willful ignorance/malice" ?

I don't understand your question. What has the exit to do with the VDSO?


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH] futex: Introduce __vdso_robust_futex_unlock
  2026-03-16 20:27         ` Thomas Gleixner
@ 2026-03-16 21:01           ` Mathieu Desnoyers
  2026-03-16 22:19             ` Thomas Gleixner
  0 siblings, 1 reply; 32+ messages in thread
From: Mathieu Desnoyers @ 2026-03-16 21:01 UTC (permalink / raw)
  To: Thomas Gleixner, André Almeida
  Cc: linux-kernel, Carlos O'Donell, Sebastian Andrzej Siewior,
	Peter Zijlstra, Florian Weimer, Rich Felker, Torvald Riegel,
	Darren Hart, Ingo Molnar, Davidlohr Bueso, Arnd Bergmann,
	Liam R . Howlett

On 2026-03-16 16:27, Thomas Gleixner wrote:
> On Mon, Mar 16 2026 at 15:36, Mathieu Desnoyers wrote:
>> On 2026-03-16 13:12, Thomas Gleixner wrote:
>>>      sys_exit() is different because there a task voluntarily exits and if
>>>      it does so between the unlock and the clearing of the op pointer,
>>>      then so be it. That'd be willful ignorance or malice and not any
>>>      different from the task doing the corruption itself in user space
>>>      right away.
>>
>> I'm not sure about this one. How about the following scenario:
>> A concurrent thread calls sys_exit concurrently with the vdso. Is this
>> something we should handle or consider it "willful ignorance/malice" ?
> 
> I don't understand your question. What has the exit to do with the VDSO?

You mentioned that "if a task exits between unlock and clearing of the op
pointer, then so be it".

But that exit could be issued by another thread, not necessarily by the
thread doing the unlock + pointer clear.

But I understand that your series takes care of this by:

- clearing the op pointer within the futex syscall,
- tracking the insn range and ZF state within the vDSO.

I'm fine with your approach, I was just not sure about your comment
about it being "different" for sys_exit.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH] futex: Introduce __vdso_robust_futex_unlock
  2026-03-16 21:01           ` Mathieu Desnoyers
@ 2026-03-16 22:19             ` Thomas Gleixner
  2026-03-16 22:30               ` Mathieu Desnoyers
  0 siblings, 1 reply; 32+ messages in thread
From: Thomas Gleixner @ 2026-03-16 22:19 UTC (permalink / raw)
  To: Mathieu Desnoyers, André Almeida
  Cc: linux-kernel, Carlos O'Donell, Sebastian Andrzej Siewior,
	Peter Zijlstra, Florian Weimer, Rich Felker, Torvald Riegel,
	Darren Hart, Ingo Molnar, Davidlohr Bueso, Arnd Bergmann,
	Liam R . Howlett

On Mon, Mar 16 2026 at 17:01, Mathieu Desnoyers wrote:
> On 2026-03-16 16:27, Thomas Gleixner wrote:
>> On Mon, Mar 16 2026 at 15:36, Mathieu Desnoyers wrote:
>>> On 2026-03-16 13:12, Thomas Gleixner wrote:
>>>>      sys_exit() is different because there a task voluntarily exits and if
>>>>      it does so between the unlock and the clearing of the op pointer,
>>>>      then so be it. That'd be willful ignorance or malice and not any
>>>>      different from the task doing the corruption itself in user space
>>>>      right away.
>>>
>>> I'm not sure about this one. How about the following scenario:
>>> A concurrent thread calls sys_exit concurrently with the vdso. Is this
>>> something we should handle or consider it "willful ignorance/malice" ?
>> 
>> I don't understand your question. What has the exit to do with the VDSO?
>
> You mentioned that "if a task exits between unlock and clearing of the op
> pointer, then so be it".
>
> But that exit could be issued by another thread, not necessarily by the
> thread doing the unlock + pointer clear.
>
> But I understand that your series takes care of this by:
>
> - clearing the op pointer within the futex syscall,
> - tracking the insn range and ZF state within the vDSO.
>
> I'm fine with your approach, I was just not sure about your comment
> about it being "different" for sys_exit.

What I clearly described is the sequence:

   set_pointer();
   unlock();
   sys_exit();

The kernel does not care about that at all as that's what user space
asked for. That is clearly in the category of "I want to shoot myself
in the foot".

The only case where the kernel has to provide help to user space is the
involuntary exit caused by a crash or external signal between unlock()
and clear_pointer(). Simply because there is no way that user space can
solve that problem on its own.

If you want to prevent user space from shooting itself in the foot
then the above crude scenario is the least of your problems.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH] futex: Introduce __vdso_robust_futex_unlock
  2026-03-16 22:19             ` Thomas Gleixner
@ 2026-03-16 22:30               ` Mathieu Desnoyers
  2026-03-16 23:29                 ` Thomas Gleixner
  0 siblings, 1 reply; 32+ messages in thread
From: Mathieu Desnoyers @ 2026-03-16 22:30 UTC (permalink / raw)
  To: Thomas Gleixner, André Almeida
  Cc: linux-kernel, Carlos O'Donell, Sebastian Andrzej Siewior,
	Peter Zijlstra, Florian Weimer, Rich Felker, Torvald Riegel,
	Darren Hart, Ingo Molnar, Davidlohr Bueso, Arnd Bergmann,
	Liam R . Howlett

On 2026-03-16 18:19, Thomas Gleixner wrote:
> On Mon, Mar 16 2026 at 17:01, Mathieu Desnoyers wrote:
>> On 2026-03-16 16:27, Thomas Gleixner wrote:
>>> On Mon, Mar 16 2026 at 15:36, Mathieu Desnoyers wrote:
>>>> On 2026-03-16 13:12, Thomas Gleixner wrote:
>>>>>       sys_exit() is different because there a task voluntarily exits and if
>>>>>       it does so between the unlock and the clearing of the op pointer,
>>>>>       then so be it. That'd be willful ignorance or malice and not any
>>>>>       different from the task doing the corruption itself in user space
>>>>>       right away.
>>>>
>>>> I'm not sure about this one. How about the following scenario:
>>>> A concurrent thread calls sys_exit concurrently with the vdso. Is this
>>>> something we should handle or consider it "willful ignorance/malice" ?
>>>
>>> I don't understand your question. What has the exit to do with the VDSO?
>>
>> You mentioned that "if a task exits between unlock and clearing of the op
>> pointer, then so be it".
>>
>> But that exit could be issued by another thread, not necessarily by the
>> thread doing the unlock + pointer clear.
>>
>> But I understand that your series takes care of this by:
>>
>> - clearing the op pointer within the futex syscall,
>> - tracking the insn range and ZF state within the vDSO.
>>
>> I'm fine with your approach, I was just not sure about your comment
>> about it being "different" for sys_exit.
> 
> What I clearly described is the sequence:
> 
>     set_pointer();
>     unlock();
>     sys_exit();
> 
> The kernel does not care about that at all as that's what user space
> asked for. That is clearly in the category of "I want to shoot myself
> in the foot".
> 
> The only case where the kernel has to provide help to user space is the
> involuntary exit caused by a crash or external signal between unlock()
> and clear_pointer(). Simply because there is no way that user space can
> solve that problem on its own.
> 
> If you want to prevent user space from shooting itself in the foot
> then the above crude scenario is the least of your problems.

So the extra scenario I am concerned about is:

Thread A                Thread B
----------------------------------------
set_pointer();
unlock();
                         syscall exit_group(2)

This does not quite fall under "async" program termination, because it is
issued voluntarily by Thread B, yet from Thread A's perspective it is not
the result of its own "exit(2)" call.

Is this scenario too far fetched, or something we should care about ?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH] futex: Introduce __vdso_robust_futex_unlock
  2026-03-16 22:30               ` Mathieu Desnoyers
@ 2026-03-16 23:29                 ` Thomas Gleixner
  2026-03-20 18:13                   ` Mathieu Desnoyers
  0 siblings, 1 reply; 32+ messages in thread
From: Thomas Gleixner @ 2026-03-16 23:29 UTC (permalink / raw)
  To: Mathieu Desnoyers, André Almeida
  Cc: linux-kernel, Carlos O'Donell, Sebastian Andrzej Siewior,
	Peter Zijlstra, Florian Weimer, Rich Felker, Torvald Riegel,
	Darren Hart, Ingo Molnar, Davidlohr Bueso, Arnd Bergmann,
	Liam R . Howlett

On Mon, Mar 16 2026 at 18:30, Mathieu Desnoyers wrote:
> On 2026-03-16 18:19, Thomas Gleixner wrote:
>> What I clearly described is the sequence:
>> 
>>     set_pointer();
>>     unlock();
>>     sys_exit();
>> 
>> The kernel does not care about that at all as that's what user space
>> asked for. That is clearly in the category of "I want to shoot myself
>> in the foot".
>> 
>> The only case where the kernel has to provide help to user space is the
>> involuntary exit caused by a crash or external signal between unlock()
>> and clear_pointer(). Simply because there is no way that user space can
>> solve that problem on its own.
>> 
>> If you want to prevent user space from shooting itself in the foot
>> then the above crude scenario is the least of your problems.
>
> So the extra scenario I am concerned about is:
>
> Thread A                Thread B
> ----------------------------------------
> set_pointer();
> unlock();
>                          syscall exit_group(2)
>
> This does not fall under the "async" program termination per se, because
> it is issued by Thread B, but it's not the result of an "exit(2)" call
> from Thread A.
>
> Is this scenario too far fetched, or something we should care about ?

It's a legit scenario, but you still fail to try to look at the code and
understand how all of this works even after I gave you enough hints.

I'm truly amazed that you even failed to ask any AI agent the obvious
question:

   "When a task invokes the exit_group syscall on Linux how does the
    Linux kernel manage to tear down all tasks which belong to the same
    process?"

Both agents which https://arena.ai randomly picked out for me provided
very comprehensive explanations. Let me paste you one of them:

   "3. Terminate All Other Threads

    do_group_exit calls zap_other_threads, which:

    Iterates over all tasks in the thread group using for_each_thread
    (traversing the thread group list in task_struct).  Sends an
    uncatchable SIGKILL signal to every thread except the current one
    (using SEND_SIG_FORCED to bypass any signal blocking). Since SIGKILL
    cannot be caught or ignored, these threads will terminate
    immediately."

If that's not a sufficient answer for you, may I recommend looking at:

  https://training.linuxfoundation.org/

Thanks,

        tglx


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH] futex: Introduce __vdso_robust_futex_unlock
  2026-03-16 23:29                 ` Thomas Gleixner
@ 2026-03-20 18:13                   ` Mathieu Desnoyers
  2026-03-24 21:35                     ` Thomas Gleixner
  0 siblings, 1 reply; 32+ messages in thread
From: Mathieu Desnoyers @ 2026-03-20 18:13 UTC (permalink / raw)
  To: Thomas Gleixner, André Almeida
  Cc: linux-kernel, Carlos O'Donell, Sebastian Andrzej Siewior,
	Peter Zijlstra, Florian Weimer, Rich Felker, Torvald Riegel,
	Darren Hart, Ingo Molnar, Davidlohr Bueso, Arnd Bergmann,
	Liam R . Howlett

On 2026-03-16 19:29, Thomas Gleixner wrote:
[...]
> If that's not a sufficient answer for you, may I recommend looking at:
> 
>    https://training.linuxfoundation.org/

Thomas,

Can you have a conversation without constantly belittling those you speak with?

The scenario I raised regarding thread group termination was an act of caution.
While I didn't have the specific implementation details 'paged in' at that exact
moment, asking for clarification on how a proposed solution interacts with complex
kernel behaviors is how we avoid regressions. Suggesting that I lack basic
knowledge or recommending introductory training does nothing to advance the
technical discussion; it is simply an attempt to undermine my expertise.

We have discussed the nature of these interactions privately in the past, but the
behavior persists. In order for any developer to truly engage in these discussions,
there needs to be a baseline of professional trust. When feedback shifts from
technical merits to personal assessments of an interlocutor’s knowledge, the
discussion ceases to be technical.

On a public forum like LKML, the effect of your behavior is to make people unwilling
to provide feedback for fear of being the target of snarky comments. By doing so, you
also prevent people who have useful feedback from communicating it. This makes it
difficult for contributors like me to remain productive members of the community.

The kernel relies on diverse areas of expertise that complement each other. I’d like
us to return to a level of discourse that focuses on the code and the technical
problems we are trying to solve, rather than the person behind the keyboard.

Sincerely,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH] futex: Introduce __vdso_robust_futex_unlock
  2026-03-20 18:13                   ` Mathieu Desnoyers
@ 2026-03-24 21:35                     ` Thomas Gleixner
  2026-03-25 14:12                       ` Mathieu Desnoyers
  0 siblings, 1 reply; 32+ messages in thread
From: Thomas Gleixner @ 2026-03-24 21:35 UTC (permalink / raw)
  To: Mathieu Desnoyers, André Almeida
  Cc: linux-kernel, Carlos O'Donell, Sebastian Andrzej Siewior,
	Peter Zijlstra, Florian Weimer, Rich Felker, Torvald Riegel,
	Darren Hart, Ingo Molnar, Davidlohr Bueso, Arnd Bergmann,
	Liam R . Howlett

Mathieu!

On Fri, Mar 20 2026 at 14:13, Mathieu Desnoyers wrote:

Thanks for bringing this up. I reflected on my behavior and, first of all, I
want to offer my sincere apologies.

Allow me to share my reflections – neither to defend or excuse my behavior
nor to deflect it back to you – but as a form of honest root cause analysis.

I realized that I underestimated how differently the usage of hyperbolic
exaggeration – beyond the point of absurdity – is perceived across
different cultures and individuals.

It is generally known that the German communication style can be very
direct. As a thought experiment, I tried to imagine how I would receive the
same reply if sent to me. My initial unspoken thought would have probably
been, “Point taken. You bastard.” then, my reaction would have been either
to go silent and do more research or, if it annoyed me enough, to dryly
serve it back with, “Were you referring to the training about
professionalism and multicultural awareness? Was it any good?”

But I acknowledge that communication styles, expectations and perception
are very different across cultures and individuals. So I'll try to avoid it
in the future.

Honestly, my response was a signal that my patience was exhausted and my
frustration had crossed a threshold. I have publicly admitted in the past
that I have difficulties with this. I'm working on this, but every now and
then, my filter fails and I end up being too stingingly direct for my own
good.


Where does this frustration originate from?

 1) The respect for other people's time is deeply cultural and ingrained in
    me.

    It's highly valued when people show up for technical discussions with
    the details 'paged in'. Unpreparedness can be perceived as disregarding
    others’ time.

    I care deeply about my time, which is a finite, precious resource,
    and I'm very sensitive and protective of it so that I'm able to
    distribute and re-prioritize it between bugs, regressions, submissions
    and my own work.


 2) In German engineering culture 'solved' is a binary state which implies
    that something is complete, tested and well-explained.

    Claiming it to be solved while noting that it's incomplete and untested
    is not only a logical contradiction, but it can be perceived as a
    breach of professional integrity or as an act of ‘marketing’.

    Seeking feedback on an idea or proof of concept is certainly welcome and
    encouraged, provided it's clearly communicated that way – along with an
    explanation of the analysis which led to the approach.


 3) Methodological rigor is a key expectation of my engineering culture.

    Due to that, I spend significant amounts of time and effort to provide
    full technical analysis with detailed breakdowns. These write-ups are
    useful for me to self-check my conclusions and work, but are also aimed
    at making it easy for the recipient to follow my thought process. I also
    take care of documenting the expectations for patch submissions in order
    to make the process smooth and less time consuming for everyone
    involved.

    I become deeply frustrated when I perceive that these efforts aren't
    valued, not met with reciprocal rigor, and/or when documentation or
    feedback is ignored – especially when it comes from people I respect
    and know to be capable.


I respect you as a person and I respect your technical expertise. I
absolutely had no intention to belittle you or to undermine your expertise.

I offer you my sincere apology once again.

     Thomas

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH] futex: Introduce __vdso_robust_futex_unlock
  2026-03-24 21:35                     ` Thomas Gleixner
@ 2026-03-25 14:12                       ` Mathieu Desnoyers
  0 siblings, 0 replies; 32+ messages in thread
From: Mathieu Desnoyers @ 2026-03-25 14:12 UTC (permalink / raw)
  To: Thomas Gleixner, André Almeida
  Cc: linux-kernel, Carlos O'Donell, Sebastian Andrzej Siewior,
	Peter Zijlstra, Florian Weimer, Rich Felker, Torvald Riegel,
	Darren Hart, Ingo Molnar, Davidlohr Bueso, Arnd Bergmann,
	Liam R . Howlett

Thomas,

On 2026-03-24 17:35, Thomas Gleixner wrote:

Thank you for your thoughtful and candid reply. I appreciate both the
apology and the cultural self-analysis, which is a rare and valuable
thing to receive.

I want to share one additional reflection, because I think it situates
your cultural self-analysis within the structure of authority and earned
standing that governs LKML.

You approached our exchange as peer sparring, two people with equal
standing who could joust and move on. You are not wrong that I could
have responded in kind. But I don't, and not because I lack the
vocabulary for it. I don't, because I am aware that these exchanges
happen in public, in front of contributors who do not have that
standing. When two senior figures trade barbed remarks on LKML, the
message received by a less experienced observer is not "these two peers
are sparring", but rather "this is how things work here." That is a
precedent I am unwilling to set, regardless of whether the immediate
exchange between us would have resolved cleanly.

This also means the problem does not become smaller when the target is a
junior contributor, it becomes larger. The assumption of reciprocity
that makes peer sparring tolerable in your frame collapses entirely when
there is no reciprocity possible. A newer contributor on the receiving
end of the same remark cannot fire back, cannot absorb it
professionally, and is most likely to simply go quiet or step away.

I raise this not to reopen the grievance — your apology was genuine and
I accept it fully — but because I think this framing is useful for the
broader question of what norms we want to establish on this list. The
behavior is most costly precisely where it is least visible: in the
people who say nothing and disengage.

Sincerely,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2026-03-25 14:12 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-11 18:54 [RFC PATCH] futex: Introduce __vdso_robust_futex_unlock Mathieu Desnoyers
2026-03-11 20:11 ` Mathieu Desnoyers
2026-03-12  8:49 ` Florian Weimer
2026-03-12 13:13   ` Mathieu Desnoyers
2026-03-12 14:12     ` Florian Weimer
2026-03-12 14:14       ` André Almeida
2026-03-12 16:09         ` Mathieu Desnoyers
2026-03-12 13:46 ` André Almeida
2026-03-12 14:04   ` Mathieu Desnoyers
2026-03-12 18:40     ` Mathieu Desnoyers
2026-03-12 18:58       ` André Almeida
2026-03-12 19:10     ` Thomas Gleixner
2026-03-12 19:16       ` Mathieu Desnoyers
2026-03-13  8:20         ` Florian Weimer
2026-03-12 20:19   ` Thomas Gleixner
2026-03-12 21:28     ` Mathieu Desnoyers
2026-03-12 22:23 ` Thomas Gleixner
2026-03-12 22:52   ` Mathieu Desnoyers
2026-03-13 12:12     ` Sebastian Andrzej Siewior
2026-03-13 12:17       ` Mathieu Desnoyers
2026-03-13 13:29         ` Sebastian Andrzej Siewior
2026-03-13 13:35           ` Mathieu Desnoyers
2026-03-16 17:12     ` Thomas Gleixner
2026-03-16 19:36       ` Mathieu Desnoyers
2026-03-16 20:27         ` Thomas Gleixner
2026-03-16 21:01           ` Mathieu Desnoyers
2026-03-16 22:19             ` Thomas Gleixner
2026-03-16 22:30               ` Mathieu Desnoyers
2026-03-16 23:29                 ` Thomas Gleixner
2026-03-20 18:13                   ` Mathieu Desnoyers
2026-03-24 21:35                     ` Thomas Gleixner
2026-03-25 14:12                       ` Mathieu Desnoyers

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox