* [PATCH 1/2] Documentation: futex: Add a note about robust list race condition
2026-03-26 14:31 [PATCH 0/2] futex: Docs and tests for robust futex unlock race André Almeida
@ 2026-03-26 14:31 ` André Almeida
2026-03-26 14:31 ` [PATCH 2/2] selftests: futex: Add tests for robust release operations André Almeida
2026-03-26 16:38 ` [PATCH 0/2] futex: Docs and tests for robust futex unlock race Thomas Gleixner
2 siblings, 0 replies; 6+ messages in thread
From: André Almeida @ 2026-03-26 14:31 UTC (permalink / raw)
To: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Darren Hart,
Davidlohr Bueso, Mathieu Desnoyers, Sebastian Andrzej Siewior,
Carlos O'Donell, Florian Weimer, Darren Hart, Arnd Bergmann,
Thomas Weißschuh
Cc: linux-kernel, kernel-dev, André Almeida
Add a note to the documentation briefly explaining why a robust futex
release done purely in userspace is racy, what should be done to avoid
it, and where to read more about it.
Signed-off-by: André Almeida <andrealmeid@igalia.com>
---
Documentation/locking/robust-futex-ABI.rst | 44 ++++++++++++++++++++++++++++++
1 file changed, 44 insertions(+)
diff --git a/Documentation/locking/robust-futex-ABI.rst b/Documentation/locking/robust-futex-ABI.rst
index f24904f1c16f..1808b108a58e 100644
--- a/Documentation/locking/robust-futex-ABI.rst
+++ b/Documentation/locking/robust-futex-ABI.rst
@@ -153,6 +153,9 @@ On removal:
3) release the futex lock, and
4) clear the 'lock_op_pending' word.
+Please note that removing a robust futex from the list purely in
+userspace is racy. See the next section for why and how to avoid it.
+
On exit, the kernel will consider the address stored in
'list_op_pending' and the address of each 'lock word' found by walking
the list starting at 'head'. For each such address, if the bottom 30
@@ -182,3 +185,44 @@ any point:
When the kernel sees a list entry whose 'lock word' doesn't have the
current threads TID in the lower 30 bits, it does nothing with that
entry, and goes on to the next entry.
+
+Robust release is racy
+----------------------
+
+The removal of a robust futex from the list is racy when done solely in
+userspace. Quoting Thomas Gleixner's explanation:
+
+ The robust futex unlock mechanism is racy in respect to the clearing of the
+ robust_list_head::list_op_pending pointer because unlock and clearing the
+ pointer are not atomic. The race window is between the unlock and clearing
+ the pending op pointer. If the task is forced to exit in this window, exit
+ will access a potentially invalid pending op pointer when cleaning up the
+ robust list. That happens if another task manages to unmap the object
+ containing the lock before the cleanup, which results in an UAF. In the
+ worst case this UAF can lead to memory corruption when unrelated content
+ has been mapped to the same address by the time the access happens.
+
+A full in-depth analysis can be found at
+https://lore.kernel.org/lkml/20260316162316.356674433@kernel.org/
+
+To overcome this, the kernel needs to participate in the lock release
+operation. This ensures that releasing the lock and clearing the address in
+``list_op_pending`` happen "atomically" with respect to each other. If the
+release is interrupted by a signal, the kernel will also check whether it
+interrupted the release operation.
+
+For the contended unlock case, where other threads are waiting for the lock
+release, there's the ``FUTEX_ROBUST_UNLOCK`` operation for the ``futex()``
+system call, which must be combined with one of the following operations:
+``FUTEX_WAKE``, ``FUTEX_WAKE_BITSET`` or ``FUTEX_UNLOCK_PI``. The kernel will
+release the lock (set the futex word to zero) and clear the
+``list_op_pending`` field, then proceed with the normal wake path.
+
+For the non-contended path, there's still a race between checking the futex
+word and clearing the ``list_op_pending`` field. To solve this without a full
+system call, userspace should call the virtual syscall
+``__vdso_futex_robust_listXX_try_unlock()`` (where XX is either 32 or 64,
+depending on the pointer size). If the vDSO call succeeds, it has released the
+lock and cleared ``list_op_pending``. If it fails, there are waiters on this
+lock and a call to the ``futex()`` syscall with ``FUTEX_ROBUST_UNLOCK`` is
+needed.
--
2.53.0
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH 2/2] selftests: futex: Add tests for robust release operations
2026-03-26 14:31 [PATCH 0/2] futex: Docs and tests for robust futex unlock race André Almeida
2026-03-26 14:31 ` [PATCH 1/2] Documentation: futex: Add a note about robust list race condition André Almeida
@ 2026-03-26 14:31 ` André Almeida
2026-03-27 16:04 ` Thomas Gleixner
2026-03-26 16:38 ` [PATCH 0/2] futex: Docs and tests for robust futex unlock race Thomas Gleixner
2 siblings, 1 reply; 6+ messages in thread
From: André Almeida @ 2026-03-26 14:31 UTC (permalink / raw)
To: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Darren Hart,
Davidlohr Bueso, Mathieu Desnoyers, Sebastian Andrzej Siewior,
Carlos O'Donell, Florian Weimer, Darren Hart, Arnd Bergmann,
Thomas Weißschuh
Cc: linux-kernel, kernel-dev, André Almeida
Add tests for __vdso_futex_robust_listXX_try_unlock() and for the futex()
op FUTEX_ROBUST_UNLOCK.
Test the contended and uncontended cases for the vDSO functions and all
op combinations for FUTEX_ROBUST_UNLOCK.
Signed-off-by: André Almeida <andrealmeid@igalia.com>
---
.../selftests/futex/functional/robust_list.c | 203 +++++++++++++++++++++
tools/testing/selftests/futex/include/futextest.h | 3 +
2 files changed, 206 insertions(+)
diff --git a/tools/testing/selftests/futex/functional/robust_list.c b/tools/testing/selftests/futex/functional/robust_list.c
index e7d1254e18ca..38a3f9e9efc2 100644
--- a/tools/testing/selftests/futex/functional/robust_list.c
+++ b/tools/testing/selftests/futex/functional/robust_list.c
@@ -27,12 +27,14 @@
#include "futextest.h"
#include "../../kselftest_harness.h"
+#include <dlfcn.h>
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
+#include <sys/auxv.h>
#include <sys/mman.h>
#include <sys/wait.h>
@@ -54,6 +56,12 @@ static int get_robust_list(int pid, struct robust_list_head **head, size_t *len_
return syscall(SYS_get_robust_list, pid, head, len_ptr);
}
+static int sys_futex_robust_unlock(_Atomic(uint32_t) *uaddr, unsigned int op, int val,
+ void *list_op_pending, unsigned int val3)
+{
+ return syscall(SYS_futex, uaddr, op, val, NULL, list_op_pending, val3, 0);
+}
+
/*
* Basic lock struct, contains just the futex word and the robust list element
* Real implementations have also a *prev to easily walk in the list
@@ -549,4 +557,199 @@ TEST(test_circular_list)
ksft_test_result_pass("%s\n", __func__);
}
+/*
+ * Below are tests for the fix of the robust release race condition. Please read the following
+ * thread to learn about the issue and why the functions below fix it:
+ * https://lore.kernel.org/lkml/20260316162316.356674433@kernel.org/
+ */
+
+/*
+ * Auxiliary code for loading the vDSO functions
+ */
+#define VDSO_SIZE 0x4000
+
+void *get_vdso_func_addr(const char *str)
+{
+ void *vdso_base = (void *) getauxval(AT_SYSINFO_EHDR), *addr;
+ Dl_info info;
+
+ if (!vdso_base) {
+ perror("Failed to get AT_SYSINFO_EHDR");
+ return NULL;
+ }
+
+ for (addr = vdso_base; addr < vdso_base + VDSO_SIZE; addr += sizeof(addr)) {
+ if (dladdr(addr, &info) == 0 || !info.dli_sname)
+ continue;
+
+ if (!strcmp(info.dli_sname, str))
+ return info.dli_saddr;
+ }
+
+ return NULL;
+}
+
+/*
+ * These are the real vDSO function signatures:
+ *
+ * __vdso_futex_robust_list64_try_unlock(__u32 *lock, __u32 tid, __u64 *pop)
+ * __vdso_futex_robust_list32_try_unlock(__u32 *lock, __u32 tid, __u32 *pop)
+ *
+ * So for the generic entry point we need to use a void pointer as the last argument
+ */
+FIXTURE(vdso_unlock)
+{
+ uint32_t (*vdso)(_Atomic(uint32_t) *lock, uint32_t tid, void *pop);
+};
+
+FIXTURE_VARIANT(vdso_unlock)
+{
+ bool is_32;
+ char func_name[];
+};
+
+FIXTURE_SETUP(vdso_unlock)
+{
+ self->vdso = get_vdso_func_addr(variant->func_name);
+
+ if (!self->vdso)
+ ksft_test_result_skip("%s not found\n", variant->func_name);
+}
+
+FIXTURE_TEARDOWN(vdso_unlock) {}
+
+FIXTURE_VARIANT_ADD(vdso_unlock, 32)
+{
+ .func_name = "__vdso_futex_robust_list32_try_unlock",
+ .is_32 = true,
+};
+
+FIXTURE_VARIANT_ADD(vdso_unlock, 64)
+{
+ .func_name = "__vdso_futex_robust_list64_try_unlock",
+ .is_32 = false,
+};
+
+/*
+ * Test the vDSO robust_listXX_try_unlock() for the uncontended case. The virtual syscall should
+ * return the thread ID of the lock owner, the lock word must be 0 and the list_op_pending should
+ * be NULL.
+ */
+TEST_F(vdso_unlock, test_robust_try_unlock_uncontended)
+{
+ struct lock_struct lock = { .futex = 0 };
+ _Atomic(unsigned int) *futex = &lock.futex;
+ struct robust_list_head head;
+ uint64_t exp = (uint64_t) NULL;
+ pid_t tid = gettid();
+ int ret;
+
+ *futex = tid;
+
+ ret = set_list(&head);
+ if (ret)
+ ksft_test_result_fail("set_robust_list error\n");
+
+ head.list_op_pending = &lock.list;
+
+ ret = self->vdso(futex, tid, &head.list_op_pending);
+
+ ASSERT_EQ(ret, tid);
+ ASSERT_EQ(*futex, 0);
+
+ /* The 32-bit entry point clears only the lower 32 bits; the upper bits of the pointer survive */
+ if (variant->is_32) {
+ exp = (uint64_t)(unsigned long)&lock.list;
+ exp &= ~0xFFFFFFFFULL;
+ }
+
+ ASSERT_EQ((uint64_t)(unsigned long)head.list_op_pending, exp);
+}
+
+/*
+ * If the lock is contended, the operation fails. The return value is the value found at the
+ * futex word (tid | FUTEX_WAITERS), the futex word is not modified and the list_op_pending is
+ * not cleared.
+ */
+TEST_F(vdso_unlock, test_robust_try_unlock_contended)
+{
+ struct lock_struct lock = { .futex = 0 };
+ _Atomic(unsigned int) *futex = &lock.futex;
+ struct robust_list_head head;
+ pid_t tid = gettid();
+ int ret;
+
+ *futex = tid | FUTEX_WAITERS;
+
+ ret = set_list(&head);
+ if (ret)
+ ksft_test_result_fail("set_robust_list error\n");
+
+ head.list_op_pending = &lock.list;
+
+ ret = self->vdso(futex, tid, &head.list_op_pending);
+
+ ASSERT_EQ(ret, tid | FUTEX_WAITERS);
+ ASSERT_EQ(*futex, tid | FUTEX_WAITERS);
+ ASSERT_EQ(head.list_op_pending, &lock.list);
+}
+
+FIXTURE(futex_op) {};
+
+FIXTURE_VARIANT(futex_op)
+{
+ unsigned int op;
+ unsigned int val3;
+};
+
+FIXTURE_SETUP(futex_op) {}
+
+FIXTURE_TEARDOWN(futex_op) {}
+
+FIXTURE_VARIANT_ADD(futex_op, wake)
+{
+ .op = FUTEX_WAKE,
+ .val3 = 0,
+};
+
+FIXTURE_VARIANT_ADD(futex_op, wake_bitset)
+{
+ .op = FUTEX_WAKE_BITSET,
+ .val3 = FUTEX_BITSET_MATCH_ANY,
+};
+
+FIXTURE_VARIANT_ADD(futex_op, unlock_pi)
+{
+ .op = FUTEX_UNLOCK_PI,
+ .val3 = 0,
+};
+
+/*
+ * The syscall should return the number of tasks woken (for this test, 0), clear the futex word and
+ * clear list_op_pending
+ */
+TEST_F(futex_op, test_futex_robust_unlock)
+{
+ struct lock_struct lock = { .futex = 0 };
+ _Atomic(unsigned int) *futex = &lock.futex;
+ struct robust_list_head head;
+ pid_t tid = gettid();
+ int ret;
+
+ *futex = tid | FUTEX_WAITERS;
+
+ ret = set_list(&head);
+ if (ret)
+ ksft_test_result_fail("set_robust_list error\n");
+
+ head.list_op_pending = &lock.list;
+
+ ret = sys_futex_robust_unlock(futex, FUTEX_ROBUST_UNLOCK | variant->op, tid,
+ &head.list_op_pending, variant->val3);
+
+ ASSERT_EQ(ret, 0);
+ ASSERT_EQ(*futex, 0);
+ ASSERT_EQ(head.list_op_pending, NULL);
+}
+
TEST_HARNESS_MAIN
diff --git a/tools/testing/selftests/futex/include/futextest.h b/tools/testing/selftests/futex/include/futextest.h
index 3d48e9789d9f..f4d880b8e795 100644
--- a/tools/testing/selftests/futex/include/futextest.h
+++ b/tools/testing/selftests/futex/include/futextest.h
@@ -38,6 +38,9 @@ typedef volatile u_int32_t futex_t;
#ifndef FUTEX_CMP_REQUEUE_PI
#define FUTEX_CMP_REQUEUE_PI 12
#endif
+#ifndef FUTEX_ROBUST_UNLOCK
+#define FUTEX_ROBUST_UNLOCK 512
+#endif
#ifndef FUTEX_WAIT_REQUEUE_PI_PRIVATE
#define FUTEX_WAIT_REQUEUE_PI_PRIVATE (FUTEX_WAIT_REQUEUE_PI | \
FUTEX_PRIVATE_FLAG)
--
2.53.0
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH 0/2] futex: Docs and tests for robust futex unlock race
2026-03-26 14:31 [PATCH 0/2] futex: Docs and tests for robust futex unlock race André Almeida
2026-03-26 14:31 ` [PATCH 1/2] Documentation: futex: Add a note about robust list race condition André Almeida
2026-03-26 14:31 ` [PATCH 2/2] selftests: futex: Add tests for robust release operations André Almeida
@ 2026-03-26 16:38 ` Thomas Gleixner
2026-03-27 7:18 ` Sebastian Andrzej Siewior
2 siblings, 1 reply; 6+ messages in thread
From: Thomas Gleixner @ 2026-03-26 16:38 UTC (permalink / raw)
To: André Almeida, Ingo Molnar, Peter Zijlstra, Darren Hart,
Davidlohr Bueso, Mathieu Desnoyers, Sebastian Andrzej Siewior,
Carlos O'Donell, Florian Weimer, Darren Hart, Arnd Bergmann,
Thomas Weißschuh
Cc: linux-kernel, kernel-dev, André Almeida
Andre!
On Thu, Mar 26 2026 at 11:31, André Almeida wrote:
> I have created two patches to be added to our series, please have a look. The
> first one is a note for the robust list docs about the need to use a
> vDSO/syscall to avoid the race condition (this should be added to the man page
> in the future). The second one adds the new syscalls to the robust list
> selftests, testing all available combinations.
Very appreciated. I'll pick them up for the V3 posting I'm working on.
> Ideally I would like to inject some error to the vDSO test, so we can really
> test the mechanism of it being interrupted and the kernel cleaning the
> list_op_pending, using ASM injection or ptrace(), but I have left that for a
> future patch.
Please sync with Sebastian. He wanted to look into this ptrace thing
too.
Thanks,
tglx
^ permalink raw reply [flat|nested] 6+ messages in thread