linux-api.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v5 0/7] futex: Create set_robust_list2
@ 2025-06-26 17:11 André Almeida
  2025-06-26 17:11 ` [PATCH v5 1/7] selftests/futex: Add ASSERT_ macros André Almeida
                   ` (6 more replies)
  0 siblings, 7 replies; 18+ messages in thread
From: André Almeida @ 2025-06-26 17:11 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Darren Hart,
	Davidlohr Bueso, Shuah Khan, Arnd Bergmann,
	Sebastian Andrzej Siewior, Waiman Long
  Cc: linux-kernel, linux-kselftest, linux-api, kernel-dev,
	André Almeida

This patch adds a new robust_list() syscall. The current syscall
can't be expanded to cover the following use case, so a new one is
needed. This new syscall allows users to set multiple robust lists per
process and to have either 32bit or 64bit pointers in the list.

* Use case

FEX-Emu[1] is an application that runs x86 and x86-64 binaries on an
AArch64 Linux host. One of the tasks of FEX-Emu is to translate syscalls
from one platform to another. Existing set_robust_list() can't be easily
translated because of two limitations:

1) x86 apps can have 32bit pointers robust lists. For a x86-64 kernel
   this is not a problem, because of the compat entry point. But there's
   no such compat entry point for AArch64, so the kernel would do the
   pointer arithmetic wrongly. Is also unviable to userspace to keep
   track every addition/removal to the robust list and keep a 64bit
   version of it somewhere else to feed the kernel. Thus, the new
   interface has an option of telling the kernel if the list is filled
   with 32bit or 64bit pointers.

2) Apps can set just one robust list (in theory, x86-64 can set two if
   they also use the compat entry point). That means that when a x86 app
   asks FEX-Emu to call set_robust_list(), FEX have two options: to
   overwrite their own robust list pointer and make the app robust, or
   to ignore the app robust list and keep the emulator robust. The new
   interface allows for multiple robust lists per application, solving
   this.

* Interface

This is the proposed interface:

	long set_robust_list2(void *head, int index, unsigned int flags)

`head` is the head of the userspace struct robust_list_head, just as old
set_robust_list(). It needs to be a void pointer since it can point to a normal
robust_list_head or a compat_robust_list_head.

`flags` can be used for defining the list type:

	enum robust_list_type {
	 	ROBUST_LIST_32BIT,
		ROBUST_LIST_64BIT,
	 };

`index` is the index in the internal robust_list's linked list (the naming
starts to get confusing, I reckon). If `index == -1`, that means that user wants
to set a new robust_list, and the kernel will append it in the end of the list,
assign a new index and return this index to the user. If `index >= 0`, that
means that user wants to re-set `*head` of an already existing list (similarly
to what happens when you call set_robust_list() twice with different `*head`).

If `index` is out of range, or it points to a non-existing robust_list, or if
the internal list is full, an error is returned.

* Implementation

The implementation re-uses most of the existing robust list interface as
possible. The new task_struct member `struct list_head robust_list2` is just a
linked list where new lists are appended as the user requests more lists, and by
futex_cleanup(), the kernel walks through the internal list feeding
exit_robust_list() with the robust_list's.

This implementation supports up to 10 lists (defined at ROBUST_LISTS_PER_TASK),
but it was an arbitrary number for this RFC. For the described use case above, 4
should be enough, I'm not sure which should be the limit.

It doesn't support list removal (should it support?). It doesn't have a proper
get_robust_list2() yet as well, but I can add it in a next revision. We could
also have a generic robust_list() syscall that can be used to set/get and be
controlled by flags.

The new interface has a `unsigned int flags` argument, making it
extensible for future use cases as well.

It refuses unaligned `head` addresses. It doesn't have a limit for elements in a
single list (like ROBUST_LIST_LIMIT), it destroys the list as it is parsed to be
safe against circular lists.

* Testing

This patcheset has a selftest patch that expands this one:
https://lore.kernel.org/lkml/20250212131123.37431-1-andrealmeid@igalia.com/

Also, FEX-Emu added support for this interface to validate it:
https://github.com/FEX-Emu/FEX/pull/3966

Feedback is very welcomed!

Thanks,
	André

[1] https://github.com/FEX-Emu/FEX

Changelog:
- Fixed compilation issues when CONFIG_COMPAT or CONFIG_FUTEX are not
  set
- Rebased on top of new futex work (private hash)
v4: https://lore.kernel.org/lkml/20250225183531.682556-1-andrealmeid@igalia.com/

- Refuse unaligned head pointers
- Ignore ROBUST_LIST_LIMIT for lists created with this interface and make it
  robust against circular lists
- Fix a get_robust_list() syscall bug for getting the list from another thread
- Adapt selftest to use the new interface
v3: https://lore.kernel.org/lkml/20241217174958.477692-1-andrealmeid@igalia.com/

- Old syscall set_robust_list() adds new head to the internal linked list of
  robust lists pointers, instead of having a field just for them. Remove
  tsk->robust_list and use only tsk->robust_list2
v2: https://lore.kernel.org/lkml/20241101162147.284993-1-andrealmeid@igalia.com/

- Added a patch to properly deal with exit_robust_list() in 64bit vs 32bit
- Wired-up syscall for all archs
- Added more of the cover letter to the commit message
v1: https://lore.kernel.org/lkml/20241024145735.162090-1-andrealmeid@igalia.com/

---
André Almeida (7):
      selftests/futex: Add ASSERT_ macros
      selftests/futex: Create test for robust list
      futex: Use explicit sizes for compat_exit_robust_list
      futex: Create set_robust_list2
      futex: Remove the limit of elements for sys_set_robust_list2 lists
      futex: Wire up set_robust_list2 syscall
      selftests: futex: Expand robust list test for the new interface

 arch/alpha/kernel/syscalls/syscall.tbl             |   1 +
 arch/arm/tools/syscall.tbl                         |   1 +
 arch/m68k/kernel/syscalls/syscall.tbl              |   1 +
 arch/microblaze/kernel/syscalls/syscall.tbl        |   1 +
 arch/mips/kernel/syscalls/syscall_n32.tbl          |   1 +
 arch/mips/kernel/syscalls/syscall_n64.tbl          |   1 +
 arch/mips/kernel/syscalls/syscall_o32.tbl          |   1 +
 arch/parisc/kernel/syscalls/syscall.tbl            |   1 +
 arch/powerpc/kernel/syscalls/syscall.tbl           |   1 +
 arch/s390/kernel/syscalls/syscall.tbl              |   1 +
 arch/sh/kernel/syscalls/syscall.tbl                |   1 +
 arch/sparc/kernel/syscalls/syscall.tbl             |   1 +
 arch/x86/entry/syscalls/syscall_32.tbl             |   1 +
 arch/x86/entry/syscalls/syscall_64.tbl             |   1 +
 arch/xtensa/kernel/syscalls/syscall.tbl            |   1 +
 include/linux/compat.h                             |  12 +-
 include/linux/futex.h                              |  30 +-
 include/linux/sched.h                              |   5 +-
 include/uapi/asm-generic/unistd.h                  |   2 +
 include/uapi/linux/futex.h                         |  10 +
 kernel/futex/core.c                                | 156 ++++-
 kernel/futex/futex.h                               |   5 +
 kernel/futex/syscalls.c                            |  85 ++-
 kernel/sys_ni.c                                    |   1 +
 scripts/syscall.tbl                                |   1 +
 .../testing/selftests/futex/functional/.gitignore  |   1 +
 tools/testing/selftests/futex/functional/Makefile  |   3 +-
 .../selftests/futex/functional/robust_list.c       | 706 +++++++++++++++++++++
 tools/testing/selftests/futex/include/logging.h    |  38 ++
 29 files changed, 1021 insertions(+), 49 deletions(-)
---
base-commit: a24cc6ce1933eade12aa2b9859de0fcd2dac2c06
change-id: 20250225-tonyk-robust_futex-60adeedac695

Best regards,
-- 
André Almeida <andrealmeid@igalia.com>


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v5 1/7] selftests/futex: Add ASSERT_ macros
  2025-06-26 17:11 [PATCH v5 0/7] futex: Create set_robust_list2 André Almeida
@ 2025-06-26 17:11 ` André Almeida
  2025-06-26 22:07   ` Thomas Gleixner
  2025-06-26 17:11 ` [PATCH v5 2/7] selftests/futex: Create test for robust list André Almeida
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 18+ messages in thread
From: André Almeida @ 2025-06-26 17:11 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Darren Hart,
	Davidlohr Bueso, Shuah Khan, Arnd Bergmann,
	Sebastian Andrzej Siewior, Waiman Long
  Cc: linux-kernel, linux-kselftest, linux-api, kernel-dev,
	André Almeida

Create ASSERT_{EQ, NE, TRUE, FALSE} macros to make test creation easier.

Signed-off-by: André Almeida <andrealmeid@igalia.com>
---
 tools/testing/selftests/futex/include/logging.h | 38 +++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/tools/testing/selftests/futex/include/logging.h b/tools/testing/selftests/futex/include/logging.h
index 874c69ce5cce9efa3a9d6de246f5972a75437dbf..a19755622a877932884570c8f58aaee7371d5f8f 100644
--- a/tools/testing/selftests/futex/include/logging.h
+++ b/tools/testing/selftests/futex/include/logging.h
@@ -23,6 +23,44 @@
 #include <linux/futex.h>
 #include "kselftest.h"
 
+#define ASSERT_EQ(var, value)	\
+do {				\
+	if (var != value) {	\
+		ksft_test_result_fail("%s: expected %ld, but %s has %ld\n", \
+				      __func__, (long) value, #var, \
+				      (long) var); \
+		return;		\
+	}			\
+} while (0)
+
+#define ASSERT_NE(var, value)	\
+do {				\
+	if (var == value) {	\
+		ksft_test_result_fail("%s: expected not %ld, but %s has %ld\n", \
+				      __func__, (long) value, #var, \
+				      (long) var); \
+		return; \
+	}		\
+} while (0)
+
+#define ASSERT_TRUE(var)	\
+do {				\
+	if ((var) == 0) {	\
+		ksft_test_result_fail("%s: expected %s to be true\n", \
+				      __func__, #var); \
+		return;		\
+	}			\
+} while (0)
+
+#define ASSERT_FALSE(var)	\
+do {				\
+	if (var) {		\
+		ksft_test_result_fail("%s: expected %s to be false\n", \
+				      __func__, #var); \
+		return;		\
+	}			\
+} while (0)
+
 /*
  * Define PASS, ERROR, and FAIL strings with and without color escape
  * sequences, default to no color.

-- 
2.49.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v5 2/7] selftests/futex: Create test for robust list
  2025-06-26 17:11 [PATCH v5 0/7] futex: Create set_robust_list2 André Almeida
  2025-06-26 17:11 ` [PATCH v5 1/7] selftests/futex: Add ASSERT_ macros André Almeida
@ 2025-06-26 17:11 ` André Almeida
  2025-06-26 22:36   ` Thomas Gleixner
  2025-06-26 17:11 ` [PATCH v5 3/7] futex: Use explicit sizes for compat_exit_robust_list André Almeida
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 18+ messages in thread
From: André Almeida @ 2025-06-26 17:11 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Darren Hart,
	Davidlohr Bueso, Shuah Khan, Arnd Bergmann,
	Sebastian Andrzej Siewior, Waiman Long
  Cc: linux-kernel, linux-kselftest, linux-api, kernel-dev,
	André Almeida

Create a test for the robust list mechanism. Test the following uAPI
operations:

- Creating a robust mutex where the lock waiter is wake by the kernel
  when the lock owner died
- Setting a robust list to the current task
- Getting a robust list from the current task
- Getting a robust list from another task
- Using the list_op_pending field from robust_list_head struct to test
  robustness when the lock owner dies before completing the locking
- Setting a invalid size for syscall argument `len`
- Adding multiple elements to a robust list wait waiting for each of
  them
- Creating a circular list and checking that the kernel does not get
  stuck in an infinity loop

This is the expected output:

 TAP version 13
 1..7
 ok 1 test_robustness
 ok 2 test_set_robust_list_invalid_size
 ok 3 test_get_robust_list_self
 ok 4 test_get_robust_list_child
 ok 5 test_set_list_op_pending
 ok 6 test_robust_list_multiple_elements
 ok 7 test_circular_list
 # Totals: pass:7 fail:0 xfail:0 xpass:0 skip:0 error:0

Signed-off-by: André Almeida <andrealmeid@igalia.com>
---
 .../testing/selftests/futex/functional/.gitignore  |   1 +
 tools/testing/selftests/futex/functional/Makefile  |   3 +-
 .../selftests/futex/functional/robust_list.c       | 554 +++++++++++++++++++++
 3 files changed, 557 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/futex/functional/.gitignore b/tools/testing/selftests/futex/functional/.gitignore
index 7b24ae89594a9db211d4b8469ebcef8d1f7012d8..7f447ebfbc62bbad9add0dc86a75abcdb8a4d9a7 100644
--- a/tools/testing/selftests/futex/functional/.gitignore
+++ b/tools/testing/selftests/futex/functional/.gitignore
@@ -11,3 +11,4 @@ futex_wait_timeout
 futex_wait_uninitialized_heap
 futex_wait_wouldblock
 futex_waitv
+robust_list
diff --git a/tools/testing/selftests/futex/functional/Makefile b/tools/testing/selftests/futex/functional/Makefile
index 8cfb87f7f7c5059c82f1e6290c076d3f13f5ea41..e6fa66e622dee4de74c31c8b9b486ca01de35737 100644
--- a/tools/testing/selftests/futex/functional/Makefile
+++ b/tools/testing/selftests/futex/functional/Makefile
@@ -20,7 +20,8 @@ TEST_GEN_PROGS := \
 	futex_priv_hash \
 	futex_numa_mpol \
 	futex_waitv \
-	futex_numa
+	futex_numa \
+	robust_list
 
 TEST_PROGS := run.sh
 
diff --git a/tools/testing/selftests/futex/functional/robust_list.c b/tools/testing/selftests/futex/functional/robust_list.c
new file mode 100644
index 0000000000000000000000000000000000000000..42690b2440fd29a9b12c46f67f9645ccc93d1147
--- /dev/null
+++ b/tools/testing/selftests/futex/functional/robust_list.c
@@ -0,0 +1,554 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (C) 2024 Igalia S.L.
+ *
+ * Robust list test by André Almeida <andrealmeid@igalia.com>
+ *
+ * The robust list uAPI allows userspace to create "robust" locks, in the sense
+ * that if the lock holder thread dies, the remaining threads that are waiting
+ * for the lock won't block forever, waiting for a lock that will never be
+ * released.
+ *
+ * This is achieve by userspace setting a list where a thread can enter all the
+ * locks (futexes) that it is holding. The robust list is a linked list, and
+ * userspace register the start of the list with the syscall set_robust_list().
+ * If such thread eventually dies, the kernel will walk this list, waking up one
+ * thread waiting for each futex and marking the futex word with the flag
+ * FUTEX_OWNER_DIED.
+ *
+ * See also
+ *	man set_robust_list
+ *	Documententation/locking/robust-futex-ABI.rst
+ *	Documententation/locking/robust-futexes.rst
+ */
+
+#define _GNU_SOURCE
+
+#include "futextest.h"
+#include "logging.h"
+
+#include <errno.h>
+#include <pthread.h>
+#include <signal.h>
+#include <stdatomic.h>
+#include <stdbool.h>
+#include <stddef.h>
+#include <sys/mman.h>
+#include <sys/wait.h>
+
+#define STACK_SIZE (1024 * 1024)
+
+#define FUTEX_TIMEOUT 3
+
+static pthread_barrier_t barrier, barrier2;
+
+int set_robust_list(struct robust_list_head *head, size_t len)
+{
+	return syscall(SYS_set_robust_list, head, len);
+}
+
+int get_robust_list(int pid, struct robust_list_head **head, size_t *len_ptr)
+{
+	return syscall(SYS_get_robust_list, pid, head, len_ptr);
+}
+
+/*
+ * Basic lock struct, contains just the futex word and the robust list element
+ * Real implementations have also a *prev to easily walk in the list
+ */
+struct lock_struct {
+	_Atomic(unsigned int) futex;
+	struct robust_list list;
+};
+
+/*
+ * Helper function to spawn a child thread. Returns -1 on error, pid on success
+ */
+static int create_child(int (*fn)(void *arg), void *arg)
+{
+	char *stack;
+	pid_t pid;
+
+	stack = mmap(NULL, STACK_SIZE, PROT_READ | PROT_WRITE,
+		     MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
+	if (stack == MAP_FAILED)
+		return -1;
+
+	stack += STACK_SIZE;
+
+	pid = clone(fn, stack, CLONE_VM | SIGCHLD, arg);
+
+	if (pid == -1)
+		return -1;
+
+	return pid;
+}
+
+/*
+ * Helper function to prepare and register a robust list
+ */
+static int set_list(struct robust_list_head *head)
+{
+	int ret;
+
+	ret = set_robust_list(head, sizeof(struct robust_list_head));
+	if (ret)
+		return ret;
+
+	head->futex_offset = (size_t) offsetof(struct lock_struct, futex) -
+			     (size_t) offsetof(struct lock_struct, list);
+	head->list.next = &head->list;
+	head->list_op_pending = NULL;
+
+	return 0;
+}
+
+/*
+ * A basic (and incomplete) mutex lock function with robustness
+ */
+static int mutex_lock(struct lock_struct *lock, struct robust_list_head *head, bool error_inject)
+{
+	_Atomic(unsigned int) *futex = &lock->futex;
+	unsigned int zero = 0;
+	int ret = -1;
+	pid_t tid = gettid();
+
+	/*
+	 * Set list_op_pending before starting the lock, so the kernel can catch
+	 * the case where the thread died during the lock operation
+	 */
+	head->list_op_pending = &lock->list;
+
+	if (atomic_compare_exchange_strong(futex, &zero, tid)) {
+		/*
+		 * We took the lock, insert it in the robust list
+		 */
+		struct robust_list *list = &head->list;
+
+		/* Error injection to test list_op_pending */
+		if (error_inject)
+			return 0;
+
+		while (list->next != &head->list)
+			list = list->next;
+
+		list->next = &lock->list;
+		lock->list.next = &head->list;
+
+		ret = 0;
+	} else {
+		/*
+		 * We didn't take the lock, wait until the owner wakes (or dies)
+		 */
+		struct timespec to;
+
+		to.tv_sec = FUTEX_TIMEOUT;
+		to.tv_nsec = 0;
+
+		tid = atomic_load(futex);
+		/* Kernel ignores futexes without the waiters flag */
+		tid |= FUTEX_WAITERS;
+		atomic_store(futex, tid);
+
+		ret = futex_wait((futex_t *) futex, tid, &to, 0);
+
+		/*
+		 * A real mutex_lock() implementation would loop here to finally
+		 * take the lock. We don't care about that, so we stop here.
+		 */
+	}
+
+	head->list_op_pending = NULL;
+
+	return ret;
+}
+
+/*
+ * This child thread will succeed taking the lock, and then will exit holding it
+ */
+static int child_fn_lock(void *arg)
+{
+	struct lock_struct *lock = (struct lock_struct *) arg;
+	struct robust_list_head head;
+	int ret;
+
+	ret = set_list(&head);
+	if (ret)
+		ksft_test_result_fail("set_robust_list error\n");
+
+	ret = mutex_lock(lock, &head, false);
+	if (ret)
+		ksft_test_result_fail("mutex_lock error\n");
+
+	pthread_barrier_wait(&barrier);
+
+	/*
+	 * There's a race here: the parent thread needs to be inside
+	 * futex_wait() before the child thread dies, otherwise it will miss the
+	 * wakeup from handle_futex_death() that this child will emit. We wait a
+	 * little bit just to make sure that this happens.
+	 */
+	sleep(1);
+
+	return 0;
+}
+
+/*
+ * Spawns a child thread that will set a robust list, take the lock, register it
+ * in the robust list and die. The parent thread will wait on this futex, and
+ * should be waken up when the child exits.
+ */
+static void test_robustness(void)
+{
+	struct lock_struct lock = { .futex = 0 };
+	struct robust_list_head head;
+	_Atomic(unsigned int) *futex = &lock.futex;
+	int ret;
+
+	ret = set_list(&head);
+	ASSERT_EQ(ret, 0);
+
+	/*
+	 * Lets use a barrier to ensure that the child thread takes the lock
+	 * before the parent
+	 */
+	ret = pthread_barrier_init(&barrier, NULL, 2);
+	ASSERT_EQ(ret, 0);
+
+	ret = create_child(&child_fn_lock, &lock);
+	ASSERT_NE(ret, -1);
+
+	pthread_barrier_wait(&barrier);
+	ret = mutex_lock(&lock, &head, false);
+
+	/*
+	 * futex_wait() should return 0 and the futex word should be marked with
+	 * FUTEX_OWNER_DIED
+	 */
+	ASSERT_EQ(ret, 0);
+	if (ret != 0)
+		printf("futex wait returned %d", errno);
+
+	ASSERT_TRUE(*futex | FUTEX_OWNER_DIED);
+
+	wait(NULL);
+	pthread_barrier_destroy(&barrier);
+
+	ksft_test_result_pass("%s\n", __func__);
+}
+
+/*
+ * The only valid value for len is sizeof(*head)
+ */
+static void test_set_robust_list_invalid_size(void)
+{
+	struct robust_list_head head;
+	size_t head_size = sizeof(struct robust_list_head);
+	int ret;
+
+	ret = set_robust_list(&head, head_size);
+	ASSERT_EQ(ret, 0);
+
+	ret = set_robust_list(&head, head_size * 2);
+	ASSERT_EQ(ret, -1);
+	ASSERT_EQ(errno, EINVAL);
+
+	ret = set_robust_list(&head, head_size - 1);
+	ASSERT_EQ(ret, -1);
+	ASSERT_EQ(errno, EINVAL);
+
+	ret = set_robust_list(&head, 0);
+	ASSERT_EQ(ret, -1);
+	ASSERT_EQ(errno, EINVAL);
+
+	ksft_test_result_pass("%s\n", __func__);
+}
+
+/*
+ * Test get_robust_list with pid = 0, getting the list of the running thread
+ */
+static void test_get_robust_list_self(void)
+{
+	struct robust_list_head head, head2, *get_head;
+	size_t head_size = sizeof(struct robust_list_head), len_ptr;
+	int ret;
+
+	ret = set_robust_list(&head, head_size);
+	ASSERT_EQ(ret, 0);
+
+	ret = get_robust_list(0, &get_head, &len_ptr);
+	ASSERT_EQ(ret, 0);
+	ASSERT_EQ(get_head, &head);
+	ASSERT_EQ(head_size, len_ptr);
+
+	ret = set_robust_list(&head2, head_size);
+	ASSERT_EQ(ret, 0);
+
+	ret = get_robust_list(0, &get_head, &len_ptr);
+	ASSERT_EQ(ret, 0);
+	ASSERT_EQ(get_head, &head2);
+	ASSERT_EQ(head_size, len_ptr);
+
+	ksft_test_result_pass("%s\n", __func__);
+}
+
+static int child_list(void *arg)
+{
+	struct robust_list_head *head = (struct robust_list_head *) arg;
+	int ret;
+
+	ret = set_robust_list(head, sizeof(struct robust_list_head));
+	if (ret)
+		ksft_test_result_fail("set_robust_list error\n");
+
+	pthread_barrier_wait(&barrier);
+	pthread_barrier_wait(&barrier2);
+
+	return 0;
+}
+
+/*
+ * Test get_robust_list from another thread. We use two barriers here to ensure
+ * that:
+ *   1) the child thread set the list before we try to get it from the
+ * parent
+ *   2) the child thread still alive when we try to get the list from it
+ */
+static void test_get_robust_list_child(void)
+{
+	pid_t tid;
+	int ret;
+	struct robust_list_head head, *get_head;
+	size_t len_ptr;
+
+	ret = pthread_barrier_init(&barrier, NULL, 2);
+	ret = pthread_barrier_init(&barrier2, NULL, 2);
+	ASSERT_EQ(ret, 0);
+
+	tid = create_child(&child_list, &head);
+	ASSERT_NE(tid, -1);
+
+	pthread_barrier_wait(&barrier);
+
+	ret = get_robust_list(tid, &get_head, &len_ptr);
+	ASSERT_EQ(ret, 0);
+	ASSERT_EQ(&head, get_head);
+
+	pthread_barrier_wait(&barrier2);
+
+	wait(NULL);
+	pthread_barrier_destroy(&barrier);
+	pthread_barrier_destroy(&barrier2);
+
+	ksft_test_result_pass("%s\n", __func__);
+}
+
+static int child_fn_lock_with_error(void *arg)
+{
+	struct lock_struct *lock = (struct lock_struct *) arg;
+	struct robust_list_head head;
+	int ret;
+
+	ret = set_list(&head);
+	if (ret)
+		ksft_test_result_fail("set_robust_list error\n");
+
+	ret = mutex_lock(lock, &head, true);
+	if (ret)
+		ksft_test_result_fail("mutex_lock error\n");
+
+	pthread_barrier_wait(&barrier);
+
+	sleep(1);
+
+	return 0;
+}
+
+/*
+ * Same as robustness test, but inject an error where the mutex_lock() exits
+ * earlier, just after setting list_op_pending and taking the lock, to test the
+ * list_op_pending mechanism
+ */
+static void test_set_list_op_pending(void)
+{
+	struct lock_struct lock = { .futex = 0 };
+	struct robust_list_head head;
+	_Atomic(unsigned int) *futex = &lock.futex;
+	int ret;
+
+	ret = set_list(&head);
+	ASSERT_EQ(ret, 0);
+
+	ret = pthread_barrier_init(&barrier, NULL, 2);
+	ASSERT_EQ(ret, 0);
+
+	ret = create_child(&child_fn_lock_with_error, &lock);
+	ASSERT_NE(ret, -1);
+
+	pthread_barrier_wait(&barrier);
+	ret = mutex_lock(&lock, &head, false);
+
+	ASSERT_EQ(ret, 0);
+	if (ret != 0)
+		printf("futex wait returned %d", errno);
+
+	ASSERT_TRUE(*futex | FUTEX_OWNER_DIED);
+
+	wait(NULL);
+	pthread_barrier_destroy(&barrier);
+
+	ksft_test_result_pass("%s\n", __func__);
+}
+
+#define CHILD_NR 10
+
+static int child_lock_holder(void *arg)
+{
+	struct lock_struct *locks = (struct lock_struct *) arg;
+	struct robust_list_head head;
+	int i;
+
+	set_list(&head);
+
+	for (i = 0; i < CHILD_NR; i++) {
+		locks[i].futex = 0;
+		mutex_lock(&locks[i], &head, false);
+	}
+
+	pthread_barrier_wait(&barrier);
+	pthread_barrier_wait(&barrier2);
+
+	sleep(1);
+	return 0;
+}
+
+static int child_wait_lock(void *arg)
+{
+	struct lock_struct *lock = (struct lock_struct *) arg;
+	struct robust_list_head head;
+	int ret;
+
+	pthread_barrier_wait(&barrier2);
+	ret = mutex_lock(lock, &head, false);
+
+	if (ret)
+		ksft_test_result_fail("mutex_lock error\n");
+
+	if (!(lock->futex | FUTEX_OWNER_DIED))
+		ksft_test_result_fail("futex not marked with FUTEX_OWNER_DIED\n");
+
+	return 0;
+}
+
+/*
+ * Test a robust list of more than one element. All the waiters should wake when
+ * the holder dies
+ */
+static void test_robust_list_multiple_elements(void)
+{
+	struct lock_struct locks[CHILD_NR];
+	int i, ret;
+
+	ret = pthread_barrier_init(&barrier, NULL, 2);
+	ASSERT_EQ(ret, 0);
+	ret = pthread_barrier_init(&barrier2, NULL, CHILD_NR + 1);
+	ASSERT_EQ(ret, 0);
+
+	create_child(&child_lock_holder, &locks);
+
+	/* Wait until the locker thread takes the look */
+	pthread_barrier_wait(&barrier);
+
+	for (i = 0; i < CHILD_NR; i++)
+		create_child(&child_wait_lock, &locks[i]);
+
+	/* Wait for all children to return */
+	while (wait(NULL) > 0);
+
+	pthread_barrier_destroy(&barrier);
+	pthread_barrier_destroy(&barrier2);
+
+	ksft_test_result_pass("%s\n", __func__);
+}
+
+static int child_circular_list(void *arg)
+{
+	static struct robust_list_head head;
+	struct lock_struct a, b, c;
+	int ret;
+
+	ret = set_list(&head);
+	if (ret)
+		ksft_test_result_fail("set_list error\n");
+
+	head.list.next = &a.list;
+
+	/*
+	 * The last element should point to head list, but we short circuit it
+	 */
+	a.list.next = &b.list;
+	b.list.next = &c.list;
+	c.list.next = &a.list;
+
+	return 0;
+}
+
+/*
+ * Create a circular robust list. The kernel should be able to destroy the list
+ * while processing it so it won't be trapped in an infinite loop while handling
+ * a process exit
+ */
+static void test_circular_list(void)
+{
+	create_child(child_circular_list, NULL);
+
+	wait(NULL);
+
+	ksft_test_result_pass("%s\n", __func__);
+}
+
+void usage(char *prog)
+{
+	printf("Usage: %s\n", prog);
+	printf("  -c	Use color\n");
+	printf("  -h	Display this help message\n");
+	printf("  -v L	Verbosity level: %d=QUIET %d=CRITICAL %d=INFO\n",
+	       VQUIET, VCRITICAL, VINFO);
+}
+
+int main(int argc, char *argv[])
+{
+	int c;
+
+	while ((c = getopt(argc, argv, "cht:v:")) != -1) {
+		switch (c) {
+		case 'c':
+			log_color(1);
+			break;
+		case 'h':
+			usage(basename(argv[0]));
+			exit(0);
+		case 'v':
+			log_verbosity(atoi(optarg));
+			break;
+		default:
+			usage(basename(argv[0]));
+			exit(1);
+		}
+	}
+
+	ksft_print_header();
+	ksft_set_plan(7);
+
+	test_robustness();
+
+	test_set_robust_list_invalid_size();
+	test_get_robust_list_self();
+	test_get_robust_list_child();
+	test_set_list_op_pending();
+	test_robust_list_multiple_elements();
+	test_circular_list();
+
+	ksft_print_cnts();
+	return 0;
+}

-- 
2.49.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v5 3/7] futex: Use explicit sizes for compat_exit_robust_list
  2025-06-26 17:11 [PATCH v5 0/7] futex: Create set_robust_list2 André Almeida
  2025-06-26 17:11 ` [PATCH v5 1/7] selftests/futex: Add ASSERT_ macros André Almeida
  2025-06-26 17:11 ` [PATCH v5 2/7] selftests/futex: Create test for robust list André Almeida
@ 2025-06-26 17:11 ` André Almeida
  2025-06-26 22:56   ` Thomas Gleixner
  2025-06-28 14:27   ` kernel test robot
  2025-06-26 17:11 ` [PATCH v5 4/7] futex: Create set_robust_list2 André Almeida
                   ` (3 subsequent siblings)
  6 siblings, 2 replies; 18+ messages in thread
From: André Almeida @ 2025-06-26 17:11 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Darren Hart,
	Davidlohr Bueso, Shuah Khan, Arnd Bergmann,
	Sebastian Andrzej Siewior, Waiman Long
  Cc: linux-kernel, linux-kselftest, linux-api, kernel-dev,
	André Almeida

There are two functions for handling robust lists during the task
exit: exit_robust_list() and compat_exit_robust_list(). The first one
handles either 64bit or 32bit lists, depending if it's a 64bit or 32bit
kernel. The compat_exit_robust_list() only exists in 64bit kernels that
supports 32bit syscalls, and handles 32bit lists.

For the new syscall set_robust_list2(), 64bit kernels need to be able to
handle 32bit lists despite having or not support for 32bit syscalls, so
make compat_exit_robust_list() exist regardless of compat_ config.

Also, use explicitly sizing, otherwise in a 32bit kernel both
exit_robust_list() and compat_exit_robust_list() would be the exactly
same function, with none of them dealing with 64bit robust lists.

Signed-off-by: André Almeida <andrealmeid@igalia.com>
---
 include/linux/compat.h  | 12 +-----------
 include/linux/futex.h   | 10 ++++++++++
 include/linux/sched.h   |  2 +-
 kernel/futex/core.c     | 48 ++++++++++++++++++++++++++++++++----------------
 kernel/futex/syscalls.c |  4 ++--
 5 files changed, 46 insertions(+), 30 deletions(-)

diff --git a/include/linux/compat.h b/include/linux/compat.h
index 56cebaff0c910fda853a0e2b3d6d0517e55f8b38..968a9135ff486cf9c8be2a18b80cd4c46e890236 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -385,16 +385,6 @@ struct compat_ifconf {
 	compat_caddr_t  ifcbuf;
 };
 
-struct compat_robust_list {
-	compat_uptr_t			next;
-};
-
-struct compat_robust_list_head {
-	struct compat_robust_list	list;
-	compat_long_t			futex_offset;
-	compat_uptr_t			list_op_pending;
-};
-
 #ifdef CONFIG_COMPAT_OLD_SIGACTION
 struct compat_old_sigaction {
 	compat_uptr_t			sa_handler;
@@ -672,7 +662,7 @@ asmlinkage long compat_sys_waitid(int, compat_pid_t,
 		struct compat_siginfo __user *, int,
 		struct compat_rusage __user *);
 asmlinkage long
-compat_sys_set_robust_list(struct compat_robust_list_head __user *head,
+compat_sys_set_robust_list(struct robust_list_head32 __user *head,
 			   compat_size_t len);
 asmlinkage long
 compat_sys_get_robust_list(int pid, compat_uptr_t __user *head_ptr,
diff --git a/include/linux/futex.h b/include/linux/futex.h
index b37193653e6b5d7d562ac08f93d4ee41eb8e72a2..c5b6976909c7d51360e1831e017f4d1544c258fa 100644
--- a/include/linux/futex.h
+++ b/include/linux/futex.h
@@ -55,6 +55,16 @@ union futex_key {
 
 #define FUTEX_KEY_INIT (union futex_key) { .both = { .ptr = 0ULL } }
 
+struct robust_list32 {
+	u32 next;
+};
+
+struct robust_list_head32 {
+	struct robust_list32	list;
+	s32			futex_offset;
+	u32			list_op_pending;
+};
+
 #ifdef CONFIG_FUTEX
 enum {
 	FUTEX_STATE_OK,
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 4f78a64beb52c425e46c66427b2f608ba30b3d99..b0f64029d53e8b2ab56f24f77054aa7645a185c5 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1333,7 +1333,7 @@ struct task_struct {
 #ifdef CONFIG_FUTEX
 	struct robust_list_head __user	*robust_list;
 #ifdef CONFIG_COMPAT
-	struct compat_robust_list_head __user *compat_robust_list;
+	struct robust_list_head32 __user *compat_robust_list;
 #endif
 	struct list_head		pi_state_list;
 	struct futex_pi_state		*pi_state_cache;
diff --git a/kernel/futex/core.c b/kernel/futex/core.c
index 90d53fb0ee9e1563c355ef0499df441367e1a46c..f0ac4b897ddbb667fa85daf5853cb47c749d512d 100644
--- a/kernel/futex/core.c
+++ b/kernel/futex/core.c
@@ -1144,15 +1144,16 @@ static inline int fetch_robust_entry(struct robust_list __user **entry,
 	return 0;
 }
 
+#ifdef CONFIG_64BIT
 /*
  * Walk curr->robust_list (very carefully, it's a userspace list!)
  * and mark any locks found there dead, and notify any waiters.
  *
  * We silently return on any sign of list-walking problem.
  */
-static void exit_robust_list(struct task_struct *curr)
+static void exit_robust_list64(struct task_struct *curr,
+			       struct robust_list_head __user *head)
 {
-	struct robust_list_head __user *head = curr->robust_list;
 	struct robust_list __user *entry, *next_entry, *pending;
 	unsigned int limit = ROBUST_LIST_LIMIT, pi, pip;
 	unsigned int next_pi;
@@ -1211,8 +1212,15 @@ static void exit_robust_list(struct task_struct *curr)
 				   curr, pip, HANDLE_DEATH_PENDING);
 	}
 }
+#else
+static void exit_robust_list64(struct task_struct *curr,
+			       struct robust_list_head __user *head)
+{
+	pr_warn("32bit kernel should not allow ROBUST_LIST_64BIT");
+}
+#endif
 
-#ifdef CONFIG_COMPAT
+#if defined(CONFIG_COMPAT) || !defined(CONFIG_64BIT)
 static void __user *futex_uaddr(struct robust_list __user *entry,
 				compat_long_t futex_offset)
 {
@@ -1226,13 +1234,13 @@ static void __user *futex_uaddr(struct robust_list __user *entry,
  * Fetch a robust-list pointer. Bit 0 signals PI futexes:
  */
 static inline int
-compat_fetch_robust_entry(compat_uptr_t *uentry, struct robust_list __user **entry,
-		   compat_uptr_t __user *head, unsigned int *pi)
+fetch_robust_entry32(u32 *uentry, struct robust_list __user **entry,
+		     u32 __user *head, unsigned int *pi)
 {
 	if (get_user(*uentry, head))
 		return -EFAULT;
 
-	*entry = compat_ptr((*uentry) & ~1);
+	*entry = (void __user *)(unsigned long)((*uentry) & ~1);
 	*pi = (unsigned int)(*uentry) & 1;
 
 	return 0;
@@ -1244,21 +1252,21 @@ compat_fetch_robust_entry(compat_uptr_t *uentry, struct robust_list __user **ent
  *
  * We silently return on any sign of list-walking problem.
  */
-static void compat_exit_robust_list(struct task_struct *curr)
+static void exit_robust_list32(struct task_struct *curr,
+			       struct robust_list_head32 __user *head)
 {
-	struct compat_robust_list_head __user *head = curr->compat_robust_list;
 	struct robust_list __user *entry, *next_entry, *pending;
 	unsigned int limit = ROBUST_LIST_LIMIT, pi, pip;
 	unsigned int next_pi;
-	compat_uptr_t uentry, next_uentry, upending;
-	compat_long_t futex_offset;
+	u32 uentry, next_uentry, upending;
+	s32 futex_offset;
 	int rc;
 
 	/*
 	 * Fetch the list head (which was registered earlier, via
 	 * sys_set_robust_list()):
 	 */
-	if (compat_fetch_robust_entry(&uentry, &entry, &head->list.next, &pi))
+	if (fetch_robust_entry32((u32 *)&uentry, &entry, (u32 *)&head->list.next, &pi))
 		return;
 	/*
 	 * Fetch the relative futex offset:
@@ -1269,7 +1277,7 @@ static void compat_exit_robust_list(struct task_struct *curr)
 	 * Fetch any possibly pending lock-add first, and handle it
 	 * if it exists:
 	 */
-	if (compat_fetch_robust_entry(&upending, &pending,
+	if (fetch_robust_entry32(&upending, &pending,
 			       &head->list_op_pending, &pip))
 		return;
 
@@ -1279,8 +1287,8 @@ static void compat_exit_robust_list(struct task_struct *curr)
 		 * Fetch the next entry in the list before calling
 		 * handle_futex_death:
 		 */
-		rc = compat_fetch_robust_entry(&next_uentry, &next_entry,
-			(compat_uptr_t __user *)&entry->next, &next_pi);
+		rc = fetch_robust_entry32(&next_uentry, &next_entry,
+			(u32 __user *)&entry->next, &next_pi);
 		/*
 		 * A pending lock might already be on the list, so
 		 * dont process it twice:
@@ -1406,14 +1414,22 @@ static inline void exit_pi_state_list(struct task_struct *curr) { }
 
 static void futex_cleanup(struct task_struct *tsk)
 {
+#ifdef CONFIG_64BIT
 	if (unlikely(tsk->robust_list)) {
-		exit_robust_list(tsk);
+		exit_robust_list64(tsk, tsk->robust_list);
 		tsk->robust_list = NULL;
 	}
+#else
+	if (unlikely(tsk->robust_list)) {
+		exit_robust_list32(tsk,
+				  (struct robust_list_head32 __user *) tsk->robust_list);
+		tsk->robust_list = NULL;
+	}
+#endif
 
 #ifdef CONFIG_COMPAT
 	if (unlikely(tsk->compat_robust_list)) {
-		compat_exit_robust_list(tsk);
+		exit_robust_list32(tsk, tsk->compat_robust_list);
 		tsk->compat_robust_list = NULL;
 	}
 #endif
diff --git a/kernel/futex/syscalls.c b/kernel/futex/syscalls.c
index 4b6da9116aa6c33db9796e3055ce0c90b02d7b91..dba193dfd216cc929c8f4d979aa2bcd99237e2d8 100644
--- a/kernel/futex/syscalls.c
+++ b/kernel/futex/syscalls.c
@@ -440,7 +440,7 @@ SYSCALL_DEFINE4(futex_requeue,
 
 #ifdef CONFIG_COMPAT
 COMPAT_SYSCALL_DEFINE2(set_robust_list,
-		struct compat_robust_list_head __user *, head,
+		struct robust_list_head32 __user *, head,
 		compat_size_t, len)
 {
 	if (unlikely(len != sizeof(*head)))
@@ -455,7 +455,7 @@ COMPAT_SYSCALL_DEFINE3(get_robust_list, int, pid,
 			compat_uptr_t __user *, head_ptr,
 			compat_size_t __user *, len_ptr)
 {
-	struct compat_robust_list_head __user *head;
+	struct robust_list_head32 __user *head;
 	unsigned long ret;
 	struct task_struct *p;
 

-- 
2.49.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v5 4/7] futex: Create set_robust_list2
  2025-06-26 17:11 [PATCH v5 0/7] futex: Create set_robust_list2 André Almeida
                   ` (2 preceding siblings ...)
  2025-06-26 17:11 ` [PATCH v5 3/7] futex: Use explicit sizes for compat_exit_robust_list André Almeida
@ 2025-06-26 17:11 ` André Almeida
  2025-06-27 12:06   ` Thomas Gleixner
  2025-06-26 17:11 ` [PATCH v5 5/7] futex: Remove the limit of elements for sys_set_robust_list2 lists André Almeida
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 18+ messages in thread
From: André Almeida @ 2025-06-26 17:11 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Darren Hart,
	Davidlohr Bueso, Shuah Khan, Arnd Bergmann,
	Sebastian Andrzej Siewior, Waiman Long
  Cc: linux-kernel, linux-kselftest, linux-api, kernel-dev,
	André Almeida

Create a new robust_list() syscall. The current syscall can't be
expanded to cover the following use case, so a new one is needed. This
new syscall allows users to set multiple robust lists per process and to
have either 32bit or 64bit pointers in the list.

* Interface

This is the proposed interface:

	long set_robust_list2(void *head, int index, unsigned int flags)

`head` is the head of the userspace struct robust_list_head, just as old
set_robust_list(). It needs to be a void pointer since it can point to a
normal robust_list_head or a compat_robust_list_head.

`flags` can be used for defining the list type:

	enum robust_list_type {
	 	ROBUST_LIST_32BIT,
		ROBUST_LIST_64BIT,
	 };

`index` is the index in the internal robust_list's linked list (the
naming starts to get confusing, I reckon). If `index == -1`, that means
that user wants to set a new robust_list, and the kernel will append it
in the end of the list, assign a new index and return this index to the
user. If `index >= 0`, that means that user wants to re-set `*head` of
an already existing list (similarly to what happens when you call
set_robust_list() twice with different `*head`).

If `index` is out of range, or it points to a non-existing robust_list,
or if the internal list is full, an error is returned.

Unaligned `head` addresses are refused by the kernel with -EINVAL.

User cannot remove lists.

* Implementation

The old syscall's set/get_robust_list() are converted to use the linked
list as well. When using only the old syscalls user shouldn't any
difference as the internal code will handle the linked list insertion as
usual. When mixing old and new interfaces users should be aware that one
of the elements of the list was created by another syscall and they
should have special care handling this element index.

On exit, the linked list is parsed and all robust lists regardless of
which interface it was used to create them are handled.

Signed-off-by: André Almeida <andrealmeid@igalia.com>
---
 include/linux/futex.h             |  20 +++++++-
 include/linux/sched.h             |   5 +-
 include/uapi/asm-generic/unistd.h |   2 +
 include/uapi/linux/futex.h        |  10 ++++
 kernel/futex/core.c               | 103 +++++++++++++++++++++++++++++++-------
 kernel/futex/futex.h              |   5 ++
 kernel/futex/syscalls.c           |  81 ++++++++++++++++++++++++++++--
 7 files changed, 199 insertions(+), 27 deletions(-)

diff --git a/include/linux/futex.h b/include/linux/futex.h
index c5b6976909c7d51360e1831e017f4d1544c258fa..a8db1ed27373083330ae37d429ddf343f8a0d130 100644
--- a/include/linux/futex.h
+++ b/include/linux/futex.h
@@ -66,6 +66,21 @@ struct robust_list_head32 {
 };
 
 #ifdef CONFIG_FUTEX
+/*
+ * This is an entry of a linked list of robust lists.
+ *
+ * @head: can point to a 64bit list or a 32bit list
+ * @list_type: determine the size of the futex pointers in the list
+ * @index: the index of this entry in the list
+ * @list: linked list element
+ */
+struct robust_list2_entry {
+	void __user *head;
+	enum robust_list2_type list_type;
+	unsigned int index;
+	struct list_head list;
+};
+
 enum {
 	FUTEX_STATE_OK,
 	FUTEX_STATE_EXITING,
@@ -74,10 +89,11 @@ enum {
 
 static inline void futex_init_task(struct task_struct *tsk)
 {
-	tsk->robust_list = NULL;
+	tsk->robust_list_index = -1;
 #ifdef CONFIG_COMPAT
-	tsk->compat_robust_list = NULL;
+	tsk->compat_robust_list_index = -1;
 #endif
+	INIT_LIST_HEAD(&tsk->robust_list2);
 	INIT_LIST_HEAD(&tsk->pi_state_list);
 	tsk->pi_state_cache = NULL;
 	tsk->futex_state = FUTEX_STATE_OK;
diff --git a/include/linux/sched.h b/include/linux/sched.h
index b0f64029d53e8b2ab56f24f77054aa7645a185c5..e97d1ade05211e7e02b8e36326f41c6a8a7f99b8 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1331,10 +1331,11 @@ struct task_struct {
 	u32				rmid;
 #endif
 #ifdef CONFIG_FUTEX
-	struct robust_list_head __user	*robust_list;
+	int				robust_list_index;
 #ifdef CONFIG_COMPAT
-	struct robust_list_head32 __user *compat_robust_list;
+	int				compat_robust_list_index;
 #endif
+	struct list_head		robust_list2;
 	struct list_head		pi_state_list;
 	struct futex_pi_state		*pi_state_cache;
 	struct mutex			futex_exit_mutex;
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index 2892a45023af6d3eb941623d4fed04841ab07e02..ebe68c2c88eb5390dda184ce9268a8d3a606c9e5 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -852,6 +852,8 @@ __SYSCALL(__NR_removexattrat, sys_removexattrat)
 #define __NR_open_tree_attr 467
 __SYSCALL(__NR_open_tree_attr, sys_open_tree_attr)
 
+#define __NR_set_robust_list2 467
+
 #undef __NR_syscalls
 #define __NR_syscalls 468
 
diff --git a/include/uapi/linux/futex.h b/include/uapi/linux/futex.h
index 7e2744ec89336a260e89883e95222eda199eeb7f..a8d8d4b1306ba56041046647bfb12146cf1ae4cb 100644
--- a/include/uapi/linux/futex.h
+++ b/include/uapi/linux/futex.h
@@ -153,6 +153,16 @@ struct robust_list_head {
 	struct robust_list __user *list_op_pending;
 };
 
+#define ROBUST_LISTS_PER_TASK 10
+
+enum robust_list2_type {
+	ROBUST_LIST_32BIT,
+	ROBUST_LIST_64BIT,
+};
+
+#define ROBUST_LIST_TYPE_MASK (ROBUST_LIST_32BIT | ROBUST_LIST_64BIT)
+
+
 /*
  * Are there any waiters for this robust futex:
  */
diff --git a/kernel/futex/core.c b/kernel/futex/core.c
index f0ac4b897ddbb667fa85daf5853cb47c749d512d..1049f8ef3ce3c611b3be0ca12df34a98f710121d 100644
--- a/kernel/futex/core.c
+++ b/kernel/futex/core.c
@@ -1220,7 +1220,6 @@ static void exit_robust_list64(struct task_struct *curr,
 }
 #endif
 
-#if defined(CONFIG_COMPAT) || !defined(CONFIG_64BIT)
 static void __user *futex_uaddr(struct robust_list __user *entry,
 				compat_long_t futex_offset)
 {
@@ -1319,7 +1318,70 @@ static void exit_robust_list32(struct task_struct *curr,
 		handle_futex_death(uaddr, curr, pip, HANDLE_DEATH_PENDING);
 	}
 }
-#endif
+
+long do_set_robust_list2(struct robust_list_head __user *head,
+			 int index, unsigned int type)
+{
+	struct list_head *list2 = &current->robust_list2;
+	struct robust_list2_entry *prev, *new = NULL;
+
+	if (index == -1) {
+		if (list_empty(list2)) {
+			index = 0;
+		} else {
+			prev = list_last_entry(list2, struct robust_list2_entry, list);
+			index = prev->index + 1;
+		}
+
+		if (index >= ROBUST_LISTS_PER_TASK)
+			return -EINVAL;
+
+		new = kmalloc(sizeof(struct robust_list2_entry), GFP_KERNEL);
+		if (!new)
+			return -ENOMEM;
+
+		list_add_tail(&new->list, list2);
+		new->index = index;
+
+	} else if (index >= 0) {
+		struct robust_list2_entry *curr;
+
+		if (list_empty(list2))
+			return -ENOENT;
+
+		list_for_each_entry(curr, list2, list) {
+			if (index == curr->index) {
+				new = curr;
+				break;
+			}
+		}
+
+		if (!new)
+			return -ENOENT;
+	}
+
+	BUG_ON(!new);
+	new->head = head;
+	new->list_type = type;
+
+	return index;
+}
+
+struct robust_list_head __user *get_robust_list2(int index, struct task_struct *task)
+{
+	struct list_head *list2 = &task->robust_list2;
+	struct robust_list2_entry *curr;
+
+	if (list_empty(list2) || index == -1)
+		return NULL;
+
+	list_for_each_entry(curr, list2, list) {
+		if (index == curr->index)
+			return curr->head;
+	}
+
+	return NULL;
+}
 
 #ifdef CONFIG_FUTEX_PI
 
@@ -1414,25 +1476,28 @@ static inline void exit_pi_state_list(struct task_struct *curr) { }
 
 static void futex_cleanup(struct task_struct *tsk)
 {
-#ifdef CONFIG_64BIT
-	if (unlikely(tsk->robust_list)) {
-		exit_robust_list64(tsk, tsk->robust_list);
-		tsk->robust_list = NULL;
-	}
-#else
-	if (unlikely(tsk->robust_list)) {
-		exit_robust_list32(tsk,
-				  (struct robust_list_head32 __user *) tsk->robust_list);
-		tsk->robust_list = NULL;
-	}
-#endif
+	struct robust_list2_entry *curr, *n;
+	struct list_head *list2 = &tsk->robust_list2;
 
-#ifdef CONFIG_COMPAT
-	if (unlikely(tsk->compat_robust_list)) {
-		exit_robust_list32(tsk, tsk->compat_robust_list);
-		tsk->compat_robust_list = NULL;
+	/*
+	 * Walk through the linked list, parsing robust lists and freeing the
+	 * allocated lists
+	 */
+	if (unlikely(!list_empty(list2))) {
+		list_for_each_entry_safe(curr, n, list2, list) {
+			if (curr->head != NULL) {
+				if (curr->list_type == ROBUST_LIST_64BIT)
+					exit_robust_list64(tsk, curr->head);
+				else if (curr->list_type == ROBUST_LIST_32BIT)
+					exit_robust_list32(tsk, curr->head);
+				curr->head = NULL;
+			}
+			list_del_init(&curr->list);
+			kfree(curr);
+		}
 	}
-#endif
+
+	tsk->robust_list_index = -1;
 
 	if (unlikely(!list_empty(&tsk->pi_state_list)))
 		exit_pi_state_list(tsk);
diff --git a/kernel/futex/futex.h b/kernel/futex/futex.h
index fcd1617212eed0e3c2367d2b463a0e019eda6d13..67201e51fa1798a21ff68f60b1e35977b9bd267b 100644
--- a/kernel/futex/futex.h
+++ b/kernel/futex/futex.h
@@ -467,6 +467,11 @@ extern int __futex_wait(u32 __user *uaddr, unsigned int flags, u32 val,
 extern int futex_wait(u32 __user *uaddr, unsigned int flags, u32 val,
 		      ktime_t *abs_time, u32 bitset);
 
+extern long do_set_robust_list2(struct robust_list_head __user *head,
+			 int index, unsigned int type);
+
+extern struct robust_list_head __user *get_robust_list2(int index, struct task_struct *task);
+
 /**
  * struct futex_vector - Auxiliary struct for futex_waitv()
  * @w: Userspace provided data
diff --git a/kernel/futex/syscalls.c b/kernel/futex/syscalls.c
index dba193dfd216cc929c8f4d979aa2bcd99237e2d8..56ee1123cbd8ea26c8d22aa74e5faed2974ec577 100644
--- a/kernel/futex/syscalls.c
+++ b/kernel/futex/syscalls.c
@@ -20,6 +20,18 @@
  * the list. There can only be one such pending lock.
  */
 
+#ifdef CONFIG_64BIT
+static inline int robust_list_native_type(void)
+{
+	return ROBUST_LIST_64BIT;
+}
+#else
+static inline int robust_list_native_type(void)
+{
+	return ROBUST_LIST_32BIT;
+}
+#endif
+
 /**
  * sys_set_robust_list() - Set the robust-futex list head of a task
  * @head:	pointer to the list-head
@@ -28,17 +40,63 @@
 SYSCALL_DEFINE2(set_robust_list, struct robust_list_head __user *, head,
 		size_t, len)
 {
+	unsigned int type = robust_list_native_type();
+	int ret;
+
 	/*
 	 * The kernel knows only one size for now:
 	 */
 	if (unlikely(len != sizeof(*head)))
 		return -EINVAL;
 
-	current->robust_list = head;
+	ret = do_set_robust_list2(head, current->robust_list_index, type);
+	if (ret < 0)
+		return ret;
+
+	current->robust_list_index = ret;
 
 	return 0;
 }
 
+#define ROBUST_LIST_FLAGS ROBUST_LIST_TYPE_MASK
+
+/*
+ * sys_set_robust_list2()
+ *
+ * When index == -1, create a new list for user. When index >= 0, try to find
+ * the corresponding list and re-set the head there.
+ *
+ * Return values:
+ *  >= 0: success, index of the robust list
+ *  -EINVAL: invalid flags, invalid index
+ *  -ENOENT: requested index no where to be found
+ *  -ENOMEM: error allocating new list
+ *  -ESRCH: too many allocated lists
+ */
+SYSCALL_DEFINE3(set_robust_list2, struct robust_list_head __user *, head,
+		int, index, unsigned int, flags)
+{
+	unsigned int type;
+
+	type = flags & ROBUST_LIST_TYPE_MASK;
+
+	if (index < -1 || index >= ROBUST_LISTS_PER_TASK)
+		return -EINVAL;
+
+	if ((flags & ~ROBUST_LIST_FLAGS) != 0)
+		return -EINVAL;
+
+	if (((uintptr_t) head % sizeof(u32)) != 0)
+		return -EINVAL;
+
+#ifndef CONFIG_64BIT
+	if (type == ROBUST_LIST_64BIT)
+		return -EINVAL;
+#endif
+
+	return do_set_robust_list2(head, index, type);
+}
+
 /**
  * sys_get_robust_list() - Get the robust-futex list head of a task
  * @pid:	pid of the process [zero for current task]
@@ -52,6 +110,7 @@ SYSCALL_DEFINE3(get_robust_list, int, pid,
 	struct robust_list_head __user *head;
 	unsigned long ret;
 	struct task_struct *p;
+	int index;
 
 	rcu_read_lock();
 
@@ -68,9 +127,11 @@ SYSCALL_DEFINE3(get_robust_list, int, pid,
 	if (!ptrace_may_access(p, PTRACE_MODE_READ_REALCREDS))
 		goto err_unlock;
 
-	head = p->robust_list;
+	index = p->robust_list_index;
 	rcu_read_unlock();
 
+	head = get_robust_list2(index, p);
+
 	if (put_user(sizeof(*head), len_ptr))
 		return -EFAULT;
 	return put_user(head, head_ptr);
@@ -443,10 +504,19 @@ COMPAT_SYSCALL_DEFINE2(set_robust_list,
 		struct robust_list_head32 __user *, head,
 		compat_size_t, len)
 {
+	unsigned int type = ROBUST_LIST_32BIT;
+	int ret;
+
 	if (unlikely(len != sizeof(*head)))
 		return -EINVAL;
 
-	current->compat_robust_list = head;
+	ret = do_set_robust_list2((struct robust_list_head __user *) head,
+				  current->robust_list_index, type);
+	if (ret < 0)
+		return ret;
+
+	current->robust_list_index = ret;
+
 
 	return 0;
 }
@@ -458,6 +528,7 @@ COMPAT_SYSCALL_DEFINE3(get_robust_list, int, pid,
 	struct robust_list_head32 __user *head;
 	unsigned long ret;
 	struct task_struct *p;
+	int index;
 
 	rcu_read_lock();
 
@@ -474,9 +545,11 @@ COMPAT_SYSCALL_DEFINE3(get_robust_list, int, pid,
 	if (!ptrace_may_access(p, PTRACE_MODE_READ_REALCREDS))
 		goto err_unlock;
 
-	head = p->compat_robust_list;
+	index = p->compat_robust_list_index;
 	rcu_read_unlock();
 
+	head = (struct robust_list_head32 __user *) get_robust_list2(index, p);
+
 	if (put_user(sizeof(*head), len_ptr))
 		return -EFAULT;
 	return put_user(ptr_to_compat(head), head_ptr);

-- 
2.49.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v5 5/7] futex: Remove the limit of elements for sys_set_robust_list2 lists
  2025-06-26 17:11 [PATCH v5 0/7] futex: Create set_robust_list2 André Almeida
                   ` (3 preceding siblings ...)
  2025-06-26 17:11 ` [PATCH v5 4/7] futex: Create set_robust_list2 André Almeida
@ 2025-06-26 17:11 ` André Almeida
  2025-06-27 12:22   ` Thomas Gleixner
  2025-06-26 17:11 ` [PATCH v5 6/7] futex: Wire up set_robust_list2 syscall André Almeida
  2025-06-26 17:11 ` [PATCH v5 7/7] selftests: futex: Expand robust list test for the new interface André Almeida
  6 siblings, 1 reply; 18+ messages in thread
From: André Almeida @ 2025-06-26 17:11 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Darren Hart,
	Davidlohr Bueso, Shuah Khan, Arnd Bergmann,
	Sebastian Andrzej Siewior, Waiman Long
  Cc: linux-kernel, linux-kselftest, linux-api, kernel-dev,
	André Almeida

Remove the limit of ROBUST_LIST_LIMIT elements that a robust list can
have, for the ones created with the new interface. This is done by
overwritten the list as it's proceeded in a way that we avoid circular
lists.

For the old interface, we keep the limited behavior to avoid changing
the API.

Signed-off-by: André Almeida <andrealmeid@igalia.com>
---
 kernel/futex/core.c | 37 +++++++++++++++++++++++++++++--------
 1 file changed, 29 insertions(+), 8 deletions(-)

diff --git a/kernel/futex/core.c b/kernel/futex/core.c
index 1049f8ef3ce3c611b3be0ca12df34a98f710121d..942b66facdea16cd7be2235d95c2bbbae8d7cc63 100644
--- a/kernel/futex/core.c
+++ b/kernel/futex/core.c
@@ -1152,7 +1152,8 @@ static inline int fetch_robust_entry(struct robust_list __user **entry,
  * We silently return on any sign of list-walking problem.
  */
 static void exit_robust_list64(struct task_struct *curr,
-			       struct robust_list_head __user *head)
+			       struct robust_list_head __user *head,
+			       bool destroyable)
 {
 	struct robust_list __user *entry, *next_entry, *pending;
 	unsigned int limit = ROBUST_LIST_LIMIT, pi, pip;
@@ -1196,13 +1197,17 @@ static void exit_robust_list64(struct task_struct *curr,
 		}
 		if (rc)
 			return;
-		entry = next_entry;
-		pi = next_pi;
+
 		/*
 		 * Avoid excessively long or circular lists:
 		 */
-		if (!--limit)
+		if (!destroyable && !--limit)
 			break;
+		else
+			put_user(&head->list, &entry->next);
+
+		entry = next_entry;
+		pi = next_pi;
 
 		cond_resched();
 	}
@@ -1214,7 +1219,8 @@ static void exit_robust_list64(struct task_struct *curr,
 }
 #else
 static void exit_robust_list64(struct task_struct *curr,
-			       struct robust_list_head __user *head)
+			       struct robust_list_head __user *head,
+			       bool destroyable)
 {
 	pr_warn("32bit kernel should not allow ROBUST_LIST_64BIT");
 }
@@ -1252,7 +1258,8 @@ fetch_robust_entry32(u32 *uentry, struct robust_list __user **entry,
  * We silently return on any sign of list-walking problem.
  */
 static void exit_robust_list32(struct task_struct *curr,
-			       struct robust_list_head32 __user *head)
+			       struct robust_list_head32 __user *head,
+			       bool destroyable)
 {
 	struct robust_list __user *entry, *next_entry, *pending;
 	unsigned int limit = ROBUST_LIST_LIMIT, pi, pip;
@@ -1474,10 +1481,19 @@ static void exit_pi_state_list(struct task_struct *curr)
 static inline void exit_pi_state_list(struct task_struct *curr) { }
 #endif
 
+/*
+ * futex_cleanup - After the task exists, process the robust lists
+ *
+ * Walk through the linked list, parsing robust lists and freeing the
+ * allocated lists. Lists created with the set_robust_list2 don't have a limit
+ * for sizing and can be destroyed while we walk on it to avoid circular list.
+ */
 static void futex_cleanup(struct task_struct *tsk)
 {
 	struct robust_list2_entry *curr, *n;
 	struct list_head *list2 = &tsk->robust_list2;
+	bool destroyable = true;
+	int i = 0;
 
 	/*
 	 * Walk through the linked list, parsing robust lists and freeing the
@@ -1485,15 +1501,20 @@ static void futex_cleanup(struct task_struct *tsk)
 	 */
 	if (unlikely(!list_empty(list2))) {
 		list_for_each_entry_safe(curr, n, list2, list) {
+			destroyable = true;
+			if (tsk->robust_list_index == i)
+				destroyable = false;
+
 			if (curr->head != NULL) {
 				if (curr->list_type == ROBUST_LIST_64BIT)
-					exit_robust_list64(tsk, curr->head);
+					exit_robust_list64(tsk, curr->head, destroyable);
 				else if (curr->list_type == ROBUST_LIST_32BIT)
-					exit_robust_list32(tsk, curr->head);
+					exit_robust_list32(tsk, curr->head, destroyable);
 				curr->head = NULL;
 			}
 			list_del_init(&curr->list);
 			kfree(curr);
+			i++;
 		}
 	}
 

-- 
2.49.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v5 6/7] futex: Wire up set_robust_list2 syscall
  2025-06-26 17:11 [PATCH v5 0/7] futex: Create set_robust_list2 André Almeida
                   ` (4 preceding siblings ...)
  2025-06-26 17:11 ` [PATCH v5 5/7] futex: Remove the limit of elements for sys_set_robust_list2 lists André Almeida
@ 2025-06-26 17:11 ` André Almeida
  2025-06-26 17:11 ` [PATCH v5 7/7] selftests: futex: Expand robust list test for the new interface André Almeida
  6 siblings, 0 replies; 18+ messages in thread
From: André Almeida @ 2025-06-26 17:11 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Darren Hart,
	Davidlohr Bueso, Shuah Khan, Arnd Bergmann,
	Sebastian Andrzej Siewior, Waiman Long
  Cc: linux-kernel, linux-kselftest, linux-api, kernel-dev,
	André Almeida

Wire up the new set_robust_list2 syscall in all available architectures.

Signed-off-by: André Almeida <andrealmeid@igalia.com>
---
 arch/alpha/kernel/syscalls/syscall.tbl      | 1 +
 arch/arm/tools/syscall.tbl                  | 1 +
 arch/m68k/kernel/syscalls/syscall.tbl       | 1 +
 arch/microblaze/kernel/syscalls/syscall.tbl | 1 +
 arch/mips/kernel/syscalls/syscall_n32.tbl   | 1 +
 arch/mips/kernel/syscalls/syscall_n64.tbl   | 1 +
 arch/mips/kernel/syscalls/syscall_o32.tbl   | 1 +
 arch/parisc/kernel/syscalls/syscall.tbl     | 1 +
 arch/powerpc/kernel/syscalls/syscall.tbl    | 1 +
 arch/s390/kernel/syscalls/syscall.tbl       | 1 +
 arch/sh/kernel/syscalls/syscall.tbl         | 1 +
 arch/sparc/kernel/syscalls/syscall.tbl      | 1 +
 arch/x86/entry/syscalls/syscall_32.tbl      | 1 +
 arch/x86/entry/syscalls/syscall_64.tbl      | 1 +
 arch/xtensa/kernel/syscalls/syscall.tbl     | 1 +
 kernel/sys_ni.c                             | 1 +
 scripts/syscall.tbl                         | 1 +
 17 files changed, 17 insertions(+)

diff --git a/arch/alpha/kernel/syscalls/syscall.tbl b/arch/alpha/kernel/syscalls/syscall.tbl
index 2dd6340de6b4efddc406f0c235701c15cf02f650..aecc167ac7706d25da73db8099f0813e268b820c 100644
--- a/arch/alpha/kernel/syscalls/syscall.tbl
+++ b/arch/alpha/kernel/syscalls/syscall.tbl
@@ -507,3 +507,4 @@
 575	common	listxattrat			sys_listxattrat
 576	common	removexattrat			sys_removexattrat
 577	common	open_tree_attr			sys_open_tree_attr
+578	common	set_robust_list2		sys_robust_list2
diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl
index 27c1d5ebcd91c8c296dc6676307f66bfdf4ab78d..2e47ae5dc9a426d8e5e9dacf29caa54223cf2f5a 100644
--- a/arch/arm/tools/syscall.tbl
+++ b/arch/arm/tools/syscall.tbl
@@ -482,3 +482,4 @@
 465	common	listxattrat			sys_listxattrat
 466	common	removexattrat			sys_removexattrat
 467	common	open_tree_attr			sys_open_tree_attr
+468	common	set_robust_list2		sys_set_robust_list2
diff --git a/arch/m68k/kernel/syscalls/syscall.tbl b/arch/m68k/kernel/syscalls/syscall.tbl
index 9fe47112c586f152662af38a9a7f90957cb96cf8..7bcc8cc628c80a44fea2b53d5c69ab5e5f10a1d2 100644
--- a/arch/m68k/kernel/syscalls/syscall.tbl
+++ b/arch/m68k/kernel/syscalls/syscall.tbl
@@ -467,3 +467,4 @@
 465	common	listxattrat			sys_listxattrat
 466	common	removexattrat			sys_removexattrat
 467	common	open_tree_attr			sys_open_tree_attr
+468	common  set_robust_list2		sys_set_robust_list2
diff --git a/arch/microblaze/kernel/syscalls/syscall.tbl b/arch/microblaze/kernel/syscalls/syscall.tbl
index 7b6e97828e552d4da90046ddfcd4a55723e522bb..cd23608afe7e7dadfbf8e21df0486b85bfcb99ce 100644
--- a/arch/microblaze/kernel/syscalls/syscall.tbl
+++ b/arch/microblaze/kernel/syscalls/syscall.tbl
@@ -473,3 +473,4 @@
 465	common	listxattrat			sys_listxattrat
 466	common	removexattrat			sys_removexattrat
 467	common	open_tree_attr			sys_open_tree_attr
+468	common	set_robust_list2		sys_set_robust_list2
diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl
index aa70e371bb54ab5d9c8dd8923b6ecf9693ee914d..0a31452ef6ed8fee8f1e2ead5d44acfbbe275fe9 100644
--- a/arch/mips/kernel/syscalls/syscall_n32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
@@ -406,3 +406,4 @@
 465	n32	listxattrat			sys_listxattrat
 466	n32	removexattrat			sys_removexattrat
 467	n32	open_tree_attr			sys_open_tree_attr
+468	n32	set_robust_list2		sys_set_robust_list2
diff --git a/arch/mips/kernel/syscalls/syscall_n64.tbl b/arch/mips/kernel/syscalls/syscall_n64.tbl
index 1e8c44c7b61492eabf00c777831e457a7a6e579c..4cb5a72256338f6fb407f940f1883d523113d609 100644
--- a/arch/mips/kernel/syscalls/syscall_n64.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n64.tbl
@@ -382,3 +382,4 @@
 465	n64	listxattrat			sys_listxattrat
 466	n64	removexattrat			sys_removexattrat
 467	n64	open_tree_attr			sys_open_tree_attr
+468	n64	set_robust_list2		sys_set_robust_list2
diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl
index 114a5a1a62302e32dd74d1679ff423a2d57c3c6b..c46238e9edd00d2861edcfa87c5ce7a62bfdc3d4 100644
--- a/arch/mips/kernel/syscalls/syscall_o32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_o32.tbl
@@ -455,3 +455,4 @@
 465	o32	listxattrat			sys_listxattrat
 466	o32	removexattrat			sys_removexattrat
 467	o32	open_tree_attr			sys_open_tree_attr
+468	o32	set_robust_list2		sys_set_robust_list2
diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl
index 94df3cb957e9d547d192e8732c0cf23ef2b5ce5d..71071489a18375013bbfbe26578a634283c1e07b 100644
--- a/arch/parisc/kernel/syscalls/syscall.tbl
+++ b/arch/parisc/kernel/syscalls/syscall.tbl
@@ -466,3 +466,4 @@
 465	common	listxattrat			sys_listxattrat
 466	common	removexattrat			sys_removexattrat
 467	common	open_tree_attr			sys_open_tree_attr
+468	common	set_robust_list2		sys_set_robust_list2
diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl
index 9a084bdb892694bc562f514b55212d167cbac12f..edc4d0bef3f1c7ab826ea8180e7f5ceba4774c07 100644
--- a/arch/powerpc/kernel/syscalls/syscall.tbl
+++ b/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -558,3 +558,4 @@
 465	common	listxattrat			sys_listxattrat
 466	common	removexattrat			sys_removexattrat
 467	common	open_tree_attr			sys_open_tree_attr
+468	common	set_robust_list2		sys_set_robust_list2
diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl
index a4569b96ef06c54ce7aa795d039541c90a38284f..ff8c594073ec8c3486cc61544d14a338d3f3a906 100644
--- a/arch/s390/kernel/syscalls/syscall.tbl
+++ b/arch/s390/kernel/syscalls/syscall.tbl
@@ -470,3 +470,4 @@
 465  common	listxattrat		sys_listxattrat			sys_listxattrat
 466  common	removexattrat		sys_removexattrat		sys_removexattrat
 467  common	open_tree_attr		sys_open_tree_attr		sys_open_tree_attr
+468  common	set_robust_list2	sys_set_robust_list2		sys_set_robust_list2
diff --git a/arch/sh/kernel/syscalls/syscall.tbl b/arch/sh/kernel/syscalls/syscall.tbl
index 52a7652fcff6394b96ace1f3b0ed72250ee5e669..507789194570a9e7b492b210be30bb41021be289 100644
--- a/arch/sh/kernel/syscalls/syscall.tbl
+++ b/arch/sh/kernel/syscalls/syscall.tbl
@@ -471,3 +471,4 @@
 465	common	listxattrat			sys_listxattrat
 466	common	removexattrat			sys_removexattrat
 467	common	open_tree_attr			sys_open_tree_attr
+468	common	set_robust_list2		sys_set_robust_list2
diff --git a/arch/sparc/kernel/syscalls/syscall.tbl b/arch/sparc/kernel/syscalls/syscall.tbl
index 83e45eb6c095a36baaf749927628e6052fe900e6..8d1122c2235b8d5082a11392e68787efe55f58be 100644
--- a/arch/sparc/kernel/syscalls/syscall.tbl
+++ b/arch/sparc/kernel/syscalls/syscall.tbl
@@ -513,3 +513,4 @@
 465	common	listxattrat			sys_listxattrat
 466	common	removexattrat			sys_removexattrat
 467	common	open_tree_attr			sys_open_tree_attr
+468	common	set_robust_list2		sys_set_robust_list2
diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index ac007ea00979dc28b0ef7c002a0615ce86dd3101..cbc0c469e66ecf7b8a61e82c38b07ecc63f6fe23 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -473,3 +473,4 @@
 465	i386	listxattrat		sys_listxattrat
 466	i386	removexattrat		sys_removexattrat
 467	i386	open_tree_attr		sys_open_tree_attr
+468	i386	set_robust_list2	sys_set_robust_list2
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index cfb5ca41e30de1a4e073750096f5b51a2ec137d2..b420217c72fc50ad90f291812972019606c5ff69 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -391,6 +391,7 @@
 465	common	listxattrat		sys_listxattrat
 466	common	removexattrat		sys_removexattrat
 467	common	open_tree_attr		sys_open_tree_attr
+468	common	set_robust_list2	sys_set_robust_list2
 
 #
 # Due to a historical design error, certain syscalls are numbered differently
diff --git a/arch/xtensa/kernel/syscalls/syscall.tbl b/arch/xtensa/kernel/syscalls/syscall.tbl
index f657a77314f8667fa019a01e10c84ea270024adc..6b852ee8a1621c7dd24f6cd37fd990f5ff8d8527 100644
--- a/arch/xtensa/kernel/syscalls/syscall.tbl
+++ b/arch/xtensa/kernel/syscalls/syscall.tbl
@@ -438,3 +438,4 @@
 465	common	listxattrat			sys_listxattrat
 466	common	removexattrat			sys_removexattrat
 467	common	open_tree_attr			sys_open_tree_attr
+468	common	set_robust_list2		sys_set_robust_list2
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index c00a86931f8c6cb30d35a9d56cbcc5994add90e1..71fbac6176c8886f4fa8dd437b0aedd5f14e9f74 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -195,6 +195,7 @@ COND_SYSCALL(move_pages);
 COND_SYSCALL(set_mempolicy_home_node);
 COND_SYSCALL(cachestat);
 COND_SYSCALL(mseal);
+COND_SYSCALL(set_robust_list2);
 
 COND_SYSCALL(perf_event_open);
 COND_SYSCALL(accept4);
diff --git a/scripts/syscall.tbl b/scripts/syscall.tbl
index 580b4e246aecd5f07d542943ba68fc4ed5961660..07d7e776d0329659e70a9a55ffff7ac18eb3ff87 100644
--- a/scripts/syscall.tbl
+++ b/scripts/syscall.tbl
@@ -408,3 +408,4 @@
 465	common	listxattrat			sys_listxattrat
 466	common	removexattrat			sys_removexattrat
 467	common	open_tree_attr			sys_open_tree_attr
+468	common	set_robust_list2		sys_set_robust_list2

-- 
2.49.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v5 7/7] selftests: futex: Expand robust list test for the new interface
  2025-06-26 17:11 [PATCH v5 0/7] futex: Create set_robust_list2 André Almeida
                   ` (5 preceding siblings ...)
  2025-06-26 17:11 ` [PATCH v5 6/7] futex: Wire up set_robust_list2 syscall André Almeida
@ 2025-06-26 17:11 ` André Almeida
  2025-06-27 12:48   ` Thomas Gleixner
  6 siblings, 1 reply; 18+ messages in thread
From: André Almeida @ 2025-06-26 17:11 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Darren Hart,
	Davidlohr Bueso, Shuah Khan, Arnd Bergmann,
	Sebastian Andrzej Siewior, Waiman Long
  Cc: linux-kernel, linux-kselftest, linux-api, kernel-dev,
	André Almeida

Expand the current robust list test for the new set_robust_list2
syscall. Create an option to make it possible to run the same tests
using the new syscall, and also add two new relevant test: test long
lists (bigger than ROBUST_LIST_LIMIT) and for unaligned addresses.

Signed-off-by: André Almeida <andrealmeid@igalia.com>
---
 .../selftests/futex/functional/robust_list.c       | 160 ++++++++++++++++++++-
 1 file changed, 156 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/futex/functional/robust_list.c b/tools/testing/selftests/futex/functional/robust_list.c
index 42690b2440fd29a9b12c46f67f9645ccc93d1147..004ad79ff6171c411fd47e699e3c38889544218e 100644
--- a/tools/testing/selftests/futex/functional/robust_list.c
+++ b/tools/testing/selftests/futex/functional/robust_list.c
@@ -35,16 +35,45 @@
 #include <stddef.h>
 #include <sys/mman.h>
 #include <sys/wait.h>
+#include <stdint.h>
 
 #define STACK_SIZE (1024 * 1024)
 
 #define FUTEX_TIMEOUT 3
 
+#define SYS_set_robust_list2 468
+
+enum robust_list2_type {
+        ROBUST_LIST_32BIT,
+        ROBUST_LIST_64BIT,
+};
+
 static pthread_barrier_t barrier, barrier2;
 
+bool robust2 = false;
+
 int set_robust_list(struct robust_list_head *head, size_t len)
 {
-	return syscall(SYS_set_robust_list, head, len);
+	int ret, flags;
+
+	if (!robust2) {
+		return syscall(SYS_set_robust_list, head, len);
+	}
+
+	if (sizeof(head) == 8)
+		flags = ROBUST_LIST_64BIT;
+	else
+		flags = ROBUST_LIST_32BIT;
+
+	/*
+	 * We act as we have just one list here. We try to use the first slot,
+	 * but if it hasn't been alocated yet we allocate it.
+	 */
+	ret = syscall(SYS_set_robust_list2, head, 0, flags);
+	if (ret == -1 && errno == ENOENT)
+		ret = syscall(SYS_set_robust_list2, head, -1, flags);
+
+	return ret;
 }
 
 int get_robust_list(int pid, struct robust_list_head **head, size_t *len_ptr)
@@ -246,6 +275,11 @@ static void test_set_robust_list_invalid_size(void)
 	size_t head_size = sizeof(struct robust_list_head);
 	int ret;
 
+	if (robust2) {
+		ksft_test_result_skip("This test is only for old robust interface\n");
+		return;
+	}
+
 	ret = set_robust_list(&head, head_size);
 	ASSERT_EQ(ret, 0);
 
@@ -321,6 +355,11 @@ static void test_get_robust_list_child(void)
 	struct robust_list_head head, *get_head;
 	size_t len_ptr;
 
+	if (robust2) {
+		ksft_test_result_skip("Not implemented in the new robust interface\n");
+		return;
+	}
+
 	ret = pthread_barrier_init(&barrier, NULL, 2);
 	ret = pthread_barrier_init(&barrier2, NULL, 2);
 	ASSERT_EQ(ret, 0);
@@ -332,7 +371,7 @@ static void test_get_robust_list_child(void)
 
 	ret = get_robust_list(tid, &get_head, &len_ptr);
 	ASSERT_EQ(ret, 0);
-	ASSERT_EQ(&head, get_head);
+	ASSERT_EQ(get_head, &head);
 
 	pthread_barrier_wait(&barrier2);
 
@@ -507,11 +546,119 @@ static void test_circular_list(void)
 	ksft_test_result_pass("%s\n", __func__);
 }
 
+#define ROBUST_LIST_LIMIT	2048
+#define CHILD_LIST_LIMIT (ROBUST_LIST_LIMIT + 10)
+
+static int child_robust_list_limit(void *arg)
+{
+	struct lock_struct *locks;
+	struct robust_list *list;
+	struct robust_list_head head;
+	int ret, i;
+
+	locks = (struct lock_struct *) arg;
+
+	ret = set_list(&head);
+	if (ret)
+		ksft_test_result_fail("set_list error\n");
+
+	/*
+	 * Create a very long list of locks
+	 */
+	head.list.next = &locks[0].list;
+
+	list = head.list.next;
+	for (i = 0; i < CHILD_LIST_LIMIT - 1; i++) {
+		list->next = &locks[i+1].list;
+		list = list->next;
+	}
+	list->next = &head.list;
+
+	/*
+	 * Grab the lock in the last one, and die without releasing it
+	 */
+	mutex_lock(&locks[CHILD_LIST_LIMIT], &head, false);
+	pthread_barrier_wait(&barrier);
+
+	sleep(1);
+
+	return 0;
+}
+
+/*
+ * The old robust list used to have a limit of 2048 items from the kernel side.
+ * After this limit the kernel stops walking the list and ignore the other
+ * futexes, causing deadlocks.
+ *
+ * For the new interface, test if we can wait for a list of more than 2048
+ * elements.
+ */
+static void test_robust_list_limit(void)
+{
+	struct lock_struct locks[CHILD_LIST_LIMIT + 1];
+	_Atomic(unsigned int) *futex = &locks[CHILD_LIST_LIMIT].futex;
+	struct robust_list_head head;
+	int ret;
+
+	if (!robust2) {
+		ksft_test_result_skip("This test is only for new robust interface\n");
+		return;
+	}
+
+	*futex = 0;
+
+	ret = set_list(&head);
+	ASSERT_EQ(ret, 0);
+
+	ret = pthread_barrier_init(&barrier, NULL, 2);
+	ASSERT_EQ(ret, 0);
+
+	create_child(child_robust_list_limit, locks);
+
+	/*
+	 * After the child thread creates the very long list of locks, wait on
+	 * the last one.
+	 */
+	pthread_barrier_wait(&barrier);
+	ret = mutex_lock(&locks[CHILD_LIST_LIMIT], &head, false);
+
+	if (ret != 0)
+		printf("futex wait returned %d\n", errno);
+	ASSERT_EQ(ret, 0);
+
+	ASSERT_TRUE(*futex | FUTEX_OWNER_DIED);
+
+	wait(NULL);
+	pthread_barrier_destroy(&barrier);
+
+	ksft_test_result_pass("%s\n", __func__);
+}
+
+/*
+ * The kernel should refuse an unaligned head pointer
+ */
+static void test_unaligned_address(void)
+{
+	struct robust_list_head head, *h;
+	int ret;
+
+	if (!robust2) {
+		ksft_test_result_skip("This test is only for new robust interface\n");
+		return;
+	}
+
+	h = (struct robust_list_head *) ((uintptr_t) &head + 1);
+	ret = set_list(h);
+	ASSERT_EQ(ret, -1);
+	ASSERT_EQ(errno, EINVAL);
+}
+
 void usage(char *prog)
 {
 	printf("Usage: %s\n", prog);
 	printf("  -c	Use color\n");
 	printf("  -h	Display this help message\n");
+	printf("  -n	Use robust2 syscall\n");
 	printf("  -v L	Verbosity level: %d=QUIET %d=CRITICAL %d=INFO\n",
 	       VQUIET, VCRITICAL, VINFO);
 }
@@ -520,7 +667,7 @@ int main(int argc, char *argv[])
 {
 	int c;
 
-	while ((c = getopt(argc, argv, "cht:v:")) != -1) {
+	while ((c = getopt(argc, argv, "chnt:v:")) != -1) {
 		switch (c) {
 		case 'c':
 			log_color(1);
@@ -531,6 +678,9 @@ int main(int argc, char *argv[])
 		case 'v':
 			log_verbosity(atoi(optarg));
 			break;
+		case 'n':
+			robust2 = true;
+			break;
 		default:
 			usage(basename(argv[0]));
 			exit(1);
@@ -538,7 +688,7 @@ int main(int argc, char *argv[])
 	}
 
 	ksft_print_header();
-	ksft_set_plan(7);
+	ksft_set_plan(8);
 
 	test_robustness();
 
@@ -548,6 +698,8 @@ int main(int argc, char *argv[])
 	test_set_list_op_pending();
 	test_robust_list_multiple_elements();
 	test_circular_list();
+	test_robust_list_limit();
+	test_unaligned_address();
 
 	ksft_print_cnts();
 	return 0;

-- 
2.49.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH v5 1/7] selftests/futex: Add ASSERT_ macros
  2025-06-26 17:11 ` [PATCH v5 1/7] selftests/futex: Add ASSERT_ macros André Almeida
@ 2025-06-26 22:07   ` Thomas Gleixner
  2025-06-26 22:09     ` Thomas Gleixner
  2025-06-27 20:23     ` André Almeida
  0 siblings, 2 replies; 18+ messages in thread
From: Thomas Gleixner @ 2025-06-26 22:07 UTC (permalink / raw)
  To: André Almeida, Ingo Molnar, Peter Zijlstra, Darren Hart,
	Davidlohr Bueso, Shuah Khan, Arnd Bergmann,
	Sebastian Andrzej Siewior, Waiman Long
  Cc: linux-kernel, linux-kselftest, linux-api, kernel-dev,
	André Almeida

On Thu, Jun 26 2025 at 14:11, André Almeida wrote:

> Create ASSERT_{EQ, NE, TRUE, FALSE} macros to make test creation easier.

What's so futex special about this that it can't use the same muck in

tools/testing/selftests/kselftest_harness.h

or at least share the implementation in some way?

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v5 1/7] selftests/futex: Add ASSERT_ macros
  2025-06-26 22:07   ` Thomas Gleixner
@ 2025-06-26 22:09     ` Thomas Gleixner
  2025-06-27 20:23     ` André Almeida
  1 sibling, 0 replies; 18+ messages in thread
From: Thomas Gleixner @ 2025-06-26 22:09 UTC (permalink / raw)
  To: André Almeida, Ingo Molnar, Peter Zijlstra, Darren Hart,
	Davidlohr Bueso, Shuah Khan, Arnd Bergmann,
	Sebastian Andrzej Siewior, Waiman Long
  Cc: linux-kernel, linux-kselftest, linux-api, kernel-dev,
	André Almeida

On Fri, Jun 27 2025 at 00:07, Thomas Gleixner wrote:

> On Thu, Jun 26 2025 at 14:11, André Almeida wrote:
>
>> Create ASSERT_{EQ, NE, TRUE, FALSE} macros to make test creation easier.
>
> What's so futex special about this that it can't use the same muck in
>
> tools/testing/selftests/kselftest_harness.h
>
> or at least share the implementation in some way?

BPF has it's own set as well. Sigh...

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v5 2/7] selftests/futex: Create test for robust list
  2025-06-26 17:11 ` [PATCH v5 2/7] selftests/futex: Create test for robust list André Almeida
@ 2025-06-26 22:36   ` Thomas Gleixner
  0 siblings, 0 replies; 18+ messages in thread
From: Thomas Gleixner @ 2025-06-26 22:36 UTC (permalink / raw)
  To: André Almeida, Ingo Molnar, Peter Zijlstra, Darren Hart,
	Davidlohr Bueso, Shuah Khan, Arnd Bergmann,
	Sebastian Andrzej Siewior, Waiman Long
  Cc: linux-kernel, linux-kselftest, linux-api, kernel-dev,
	André Almeida

On Thu, Jun 26 2025 at 14:11, André Almeida wrote:
> +
> +int set_robust_list(struct robust_list_head *head, size_t len)

This function and the get() counterpart are global because they can?

> +{
> +	return syscall(SYS_set_robust_list, head, len);
> +}
> +/*
> + * Basic lock struct, contains just the futex word and the robust list element
> + * Real implementations have also a *prev to easily walk in the list
> + */
> +struct lock_struct {
> +	_Atomic(unsigned int) futex;
> +	struct robust_list list;

tabular arrangement please.

> +	pthread_barrier_wait(&barrier);
> +
> +	/*
> +	 * There's a race here: the parent thread needs to be inside
> +	 * futex_wait() before the child thread dies, otherwise it will miss the
> +	 * wakeup from handle_futex_death() that this child will emit. We wait a
> +	 * little bit just to make sure that this happens.
> +	 */
> +	sleep(1);

One second is quite a little bit. :)

> +	/*
> +	 * futex_wait() should return 0 and the futex word should be marked with
> +	 * FUTEX_OWNER_DIED
> +	 */
> +	ASSERT_EQ(ret, 0);
> +	if (ret != 0)
> +		printf("futex wait returned %d", errno);

What's the purpose of the extra printf() after the assert here? This
code is not even reached when ret != 0, no?

> +	ASSERT_TRUE(*futex | FUTEX_OWNER_DIED);

That's always true no matter what the content of the futex variable is, no?

> +/*
> + * The only valid value for len is sizeof(*head)
> + */
> +static void test_set_robust_list_invalid_size(void)
> +{
> +	struct robust_list_head head;
> +	size_t head_size = sizeof(struct robust_list_head);

Groan. You already define the robust_list_head variable ahead of
head_size and violate the reverse fir tree ordering, so why don't you
use the obvious and actually robust 'sizeof(head)'?

> +/*
> + * Test get_robust_list with pid = 0, getting the list of the running thread
> + */
> +static void test_get_robust_list_self(void)
> +{
> +	struct robust_list_head head, head2, *get_head;
> +	size_t head_size = sizeof(struct robust_list_head), len_ptr;

Ditto.

> +static int child_list(void *arg)
> +{
> +	struct robust_list_head *head = (struct robust_list_head *) arg;

void pointers really don't require type casts

> +	int ret;
> +
> +	ret = set_robust_list(head, sizeof(struct robust_list_head));

sizeof(*head)

> +	if (ret)
> +		ksft_test_result_fail("set_robust_list error\n");
> +
> +	pthread_barrier_wait(&barrier);
> +	pthread_barrier_wait(&barrier2);

Lacks a comment what this waits for

> +	return 0;
> +}
> +
> +/*
> + * Test get_robust_list from another thread. We use two barriers here to ensure
> + * that:
> + *   1) the child thread set the list before we try to get it from the
> + * parent
> + *   2) the child thread still alive when we try to get the list from it
> + */
> +static void test_get_robust_list_child(void)
> +{
> +	pid_t tid;
> +	int ret;
> +	struct robust_list_head head, *get_head;
> +	size_t len_ptr;

Reverse fir tree ordering please.

> +	ret = pthread_barrier_init(&barrier, NULL, 2);
> +	ret = pthread_barrier_init(&barrier2, NULL, 2);
> +	ASSERT_EQ(ret, 0);
> +
> +	tid = create_child(&child_list, &head);
> +	ASSERT_NE(tid, -1);
> +
> +	pthread_barrier_wait(&barrier);
> +
> +	ret = get_robust_list(tid, &get_head, &len_ptr);
> +	ASSERT_EQ(ret, 0);
> +	ASSERT_EQ(&head, get_head);
> +
> +	pthread_barrier_wait(&barrier2);
> +
> +	wait(NULL);
> +	pthread_barrier_destroy(&barrier);
> +	pthread_barrier_destroy(&barrier2);
> +
> +	ksft_test_result_pass("%s\n", __func__);
> +}
> +
> +static int child_fn_lock_with_error(void *arg)
> +{
> +	struct lock_struct *lock = (struct lock_struct *) arg;

See above

> +	struct robust_list_head head;
> +	int ret;
> +
> +	ret = set_list(&head);
> +	if (ret)
> +		ksft_test_result_fail("set_robust_list error\n");

So you fail the test and continue to produce more fails or what? Why
does this not use one of these ASSERT thingies or return?

> +	ret = mutex_lock(lock, &head, true);
> +	if (ret)
> +		ksft_test_result_fail("mutex_lock error\n");
> +
> +	pthread_barrier_wait(&barrier);
> +
> +	sleep(1);
> +
> +	return 0;
> +}
> +
> +/*
> + * Same as robustness test, but inject an error where the mutex_lock() exits
> + * earlier, just after setting list_op_pending and taking the lock, to test the
> + * list_op_pending mechanism
> + */
> +static void test_set_list_op_pending(void)
> +{
> +	struct lock_struct lock = { .futex = 0 };
> +	struct robust_list_head head;
> +	_Atomic(unsigned int) *futex = &lock.futex;
> +	int ret;

See above

> +	ASSERT_EQ(ret, 0);
> +	if (ret != 0)
> +		printf("futex wait returned %d", errno);

The random insertion of completely pointless printf()'s is stunning.

> +	ASSERT_TRUE(*futex | FUTEX_OWNER_DIED);

Yet another always true assert which is happily optimized out by the
compiler.

> +	wait(NULL);
> +	pthread_barrier_destroy(&barrier);
> +
> +	ksft_test_result_pass("%s\n", __func__);
> +}

> +static int child_wait_lock(void *arg)
> +{
> +	struct lock_struct *lock = (struct lock_struct *) arg;
> +	struct robust_list_head head;
> +	int ret;
> +
> +	pthread_barrier_wait(&barrier2);
> +	ret = mutex_lock(lock, &head, false);
> +
> +	if (ret)
> +		ksft_test_result_fail("mutex_lock error\n");
> +
> +	if (!(lock->futex | FUTEX_OWNER_DIED))
> +		ksft_test_result_fail("futex not marked with FUTEX_OWNER_DIED\n");

Now I kinda understand this insanity. The child emits a fail and
exits. Then the parent ...

> +	for (i = 0; i < CHILD_NR; i++)
> +		create_child(&child_wait_lock, &locks[i]);
> +
> +	/* Wait for all children to return */
> +	while (wait(NULL) > 0);
> +
> +	pthread_barrier_destroy(&barrier);
> +	pthread_barrier_destroy(&barrier2);
> +
> +	ksft_test_result_pass("%s\n", __func__);

... happily claims that the test passed.

Seriously?

Thread functions have a return value for a reason and wait(2) has a
wstatus argument for the very same reason.

> +static int child_circular_list(void *arg)
> +{
> +	static struct robust_list_head head;
> +	struct lock_struct a, b, c;
> +	int ret;
> +
> +	ret = set_list(&head);
> +	if (ret)
> +		ksft_test_result_fail("set_list error\n");

Yet another instance of the same ....

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v5 3/7] futex: Use explicit sizes for compat_exit_robust_list
  2025-06-26 17:11 ` [PATCH v5 3/7] futex: Use explicit sizes for compat_exit_robust_list André Almeida
@ 2025-06-26 22:56   ` Thomas Gleixner
  2025-06-28 14:27   ` kernel test robot
  1 sibling, 0 replies; 18+ messages in thread
From: Thomas Gleixner @ 2025-06-26 22:56 UTC (permalink / raw)
  To: André Almeida, Ingo Molnar, Peter Zijlstra, Darren Hart,
	Davidlohr Bueso, Shuah Khan, Arnd Bergmann,
	Sebastian Andrzej Siewior, Waiman Long
  Cc: linux-kernel, linux-kselftest, linux-api, kernel-dev,
	André Almeida

On Thu, Jun 26 2025 at 14:11, André Almeida wrote:

$subject lacks a () function notation ....

> There are two functions for handling robust lists during the task

during a tasks exit

> exit: exit_robust_list() and compat_exit_robust_list(). The first one
> handles either 64bit or 32bit lists, depending if it's a 64bit or 32bit
> kernel. The compat_exit_robust_list() only exists in 64bit kernels that

s/The//

> supports 32bit syscalls, and handles 32bit lists.

32-bit 64-bit all over the place

> For the new syscall set_robust_list2(), 64bit kernels need to be able to
> handle 32bit lists despite having or not support for 32bit syscalls, so
> make compat_exit_robust_list() exist regardless of compat_ config.

What new syscall and what are the requirements here? You really want to
add some rationale and background here.

> Also, use explicitly sizing, otherwise in a 32bit kernel both
> exit_robust_list() and compat_exit_robust_list() would be the exactly
> same function, with none of them dealing with 64bit robust lists.

Explicit sizing of what? The changelog should give information which
allows me to verify the implementation and not some blurb which makes me
to oracle the meaning of the changelog out of the actual implementation.

What is the actual gist of this patch? The subject says:

     Use explicit sizes for compat_exit_robust_list

Now you say 'Also,' which means aside of the above actual statement to
make compat_exit_robust_list() unconditional this is now a side effect
or what?

The subject line is misleading because the real purpose of this patch is
to make compat_exit_robust_list() unconditionally available independent
of bitness.

Now the obvious question is why this patch isn't split into two pieces:

    1) The patch matching the above subject line and does the
       struct/argument rename

    2) A subsequent patch which makes the function unconditionally
       available

That's not done because obfuscating changes makes everyones life easier,
right?

> +++ b/include/linux/compat.h
> @@ -385,16 +385,6 @@ struct compat_ifconf {
>  	compat_caddr_t  ifcbuf;
>  };
>  
> -struct compat_robust_list {
> -	compat_uptr_t			next;
> -};
> -
> -struct compat_robust_list_head {
> -	struct compat_robust_list	list;
> -	compat_long_t			futex_offset;
> -	compat_uptr_t			list_op_pending;
> -};
> -
>  #ifdef CONFIG_COMPAT_OLD_SIGACTION
>  struct compat_old_sigaction {
>  	compat_uptr_t			sa_handler;
> @@ -672,7 +662,7 @@ asmlinkage long compat_sys_waitid(int, compat_pid_t,
>  		struct compat_siginfo __user *, int,
>  		struct compat_rusage __user *);
>  asmlinkage long
> -compat_sys_set_robust_list(struct compat_robust_list_head __user *head,
> +compat_sys_set_robust_list(struct robust_list_head32 __user *head,
>  			   compat_size_t len);

How does this even survive a full kernel build without a forward
declaration of struct robust_list_head32?

Not everything which includes compat.h includes futex.h first. There is
a reason why the structs were define here. Sure you can move them, but
not without a forward declaration.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v5 4/7] futex: Create set_robust_list2
  2025-06-26 17:11 ` [PATCH v5 4/7] futex: Create set_robust_list2 André Almeida
@ 2025-06-27 12:06   ` Thomas Gleixner
  0 siblings, 0 replies; 18+ messages in thread
From: Thomas Gleixner @ 2025-06-27 12:06 UTC (permalink / raw)
  To: André Almeida, Ingo Molnar, Peter Zijlstra, Darren Hart,
	Davidlohr Bueso, Shuah Khan, Arnd Bergmann,
	Sebastian Andrzej Siewior, Waiman Long
  Cc: linux-kernel, linux-kselftest, linux-api, kernel-dev,
	André Almeida

On Thu, Jun 26 2025 at 14:11, André Almeida wrote:
> Create a new robust_list() syscall. The current syscall can't be
> expanded to cover the following use case, so a new one is needed. This
> new syscall allows users to set multiple robust lists per process and to
> have either 32bit or 64bit pointers in the list.
>
> * Interface
>
> This is the proposed interface:

What's proposed about this interface once it hits the tree? Just get rid
of this line completely. The paragraph decribes the interface
implemented by this patch, no?

If you want to make proposals, then do so in a RFC cover letter or
somewhere else.

> 	long set_robust_list2(void *head, int index, unsigned int flags)
>
> `head` is the head of the userspace struct robust_list_head, just as old
> set_robust_list(). It needs to be a void pointer since it can point to a
> normal robust_list_head or a compat_robust_list_head.
>
> `flags` can be used for defining the list type:
>
> 	enum robust_list_type {
> 	 	ROBUST_LIST_32BIT,
> 		ROBUST_LIST_64BIT,
> 	 };
>
> `index` is the index in the internal robust_list's linked list (the
> naming starts to get confusing, I reckon).

Yes

> If `index == -1`, that means that user wants to set a new robust_list,
> and the kernel will append it in the end of the list, assign a new
> index and return this index to the user. If `index >= 0`, that means
> that user wants to re-set `*head` of an already existing list
> (similarly to what happens when you call set_robust_list() twice with
> different `*head`).

I'm lost in this word salad by now.

> If `index` is out of range, or it points to a non-existing robust_list,
> or if the internal list is full, an error is returned.
>
> Unaligned `head` addresses are refused by the kernel with -EINVAL.
>
> User cannot remove lists.

  A registered list cannot be removed.

But wording aside, I see that a once created list entry cannot be
removed, but it can be set to NULL, which means it is invalidated, no?

> * Implementation
>
> The old syscall's set/get_robust_list() are converted to use the linked

s/old/original/

> list as well. When using only the old syscalls user shouldn't any

user space

also 'shouldn't any difference' does not qualify as a parseable
sentence.

Aside of that this patch doing five things at once is unreviewable.

> difference as the internal code will handle the linked list insertion as
> usual. When mixing old and new interfaces users should be aware that one
> of the elements of the list was created by another syscall and they
> should have special care handling this element index.

That's a guaranteed recipe for disaster.
  
>  #ifdef CONFIG_FUTEX
> +/*
> + * This is an entry of a linked list of robust lists.
> + *
> + * @head: can point to a 64bit list or a 32bit list
> + * @list_type: determine the size of the futex pointers in the list
> + * @index: the index of this entry in the list
> + * @list: linked list element
> + */
> +struct robust_list2_entry {
> +	void __user *head;
> +	enum robust_list2_type list_type;
> +	unsigned int index;
> +	struct list_head list;
> +};

I'm quite sure that I pointed you to

  https://www.kernel.org/doc/html/latest/process/maintainer-tip.html

before, no?

> +
>  enum {
>  	FUTEX_STATE_OK,
>  	FUTEX_STATE_EXITING,
> @@ -74,10 +89,11 @@ enum {
>  
>  static inline void futex_init_task(struct task_struct *tsk)
>  {
> -	tsk->robust_list = NULL;
> +	tsk->robust_list_index = -1;

Can we have proper defines for this instead of '-1'. NULL for a pointer
makes obviously sense as it is well defined, but '-1' is just a number
pulled out of thin air and has no meaning at all.

>  #ifdef CONFIG_COMPAT
> -	tsk->compat_robust_list = NULL;
> +	tsk->compat_robust_list_index = -1;
>  #endif

> diff --git a/include/uapi/linux/futex.h b/include/uapi/linux/futex.h
> index 7e2744ec89336a260e89883e95222eda199eeb7f..a8d8d4b1306ba56041046647bfb12146cf1ae4cb 100644
> --- a/include/uapi/linux/futex.h
> +++ b/include/uapi/linux/futex.h
> @@ -153,6 +153,16 @@ struct robust_list_head {
>  	struct robust_list __user *list_op_pending;
>  };
>  
> +#define ROBUST_LISTS_PER_TASK 10
> +
> +enum robust_list2_type {
> +	ROBUST_LIST_32BIT,
> +	ROBUST_LIST_64BIT,
> +};
> +
> +#define ROBUST_LIST_TYPE_MASK (ROBUST_LIST_32BIT | ROBUST_LIST_64BIT)

enum is a numerical value and ROBUST_LIST_32BIT is obviously 0.
Defining the mask out if these two values is surely creative and works
by chance, but seriously?

> +long do_set_robust_list2(struct robust_list_head __user *head,
> +			 int index, unsigned int type)

This expects a type defined in an named enum, So why is @type unsigned int?

> +{
> +	struct list_head *list2 = &current->robust_list2;
> +	struct robust_list2_entry *prev, *new = NULL;
> +
> +	if (index == -1) {
> +		if (list_empty(list2)) {
> +			index = 0;
> +		} else {
> +			prev = list_last_entry(list2, struct robust_list2_entry, list);
> +			index = prev->index + 1;
> +		}
> +
> +		if (index >= ROBUST_LISTS_PER_TASK)
> +			return -EINVAL;
> +
> +		new = kmalloc(sizeof(struct robust_list2_entry), GFP_KERNEL);
> +		if (!new)
> +			return -ENOMEM;
> +
> +		list_add_tail(&new->list, list2);
> +		new->index = index;

Right. Make the entry visible _before_ initializing it.

What's worse is that the list operation is done without any form of
serialization against a concurrent invocation of get_robust_list(),
which can operate on a task which is not current. IOW, you created an
inviting unpriviledged security hole.

Aside of that, this list magic is completely overengineered. Why?

  - The amount of lists is limited to 10 for whatever reason as the
    changelog is silent about it.

  - The memory required for storing 10 robust list heads is 10 *
    sizeof(unsigned long), i.e. 80 bytes on a 64bit machine

  - The magic list head and the two index variables for the legacy
    interfaces consume 20 bytes unconditionally in each task struct.

  - Each allocated robust_list2_entry consumes 32 bytes of memory

So the first robust list setup amounts already to a total of 20 + 32 =
52 bytes, right? Any subsequent robust list installment exceeds 80 bytes
obviously.

This list handling is just overly complex for absolutely zero
value. Worse, it results in incomprehensible code and bugs like the
above.

It's not rocket science to make this non-obfuscated for both the user
side API and the kernel implementation. Something like this:

enum robust_list_cmd {
	FUTEX_ROBUST_LIST_CMD_CREATE_64,
	FUTEX_ROBUST_LIST_CMD_CREATE_32,
        FUTEX_ROBUST_LIST_CMD_MODIFY_64,
        FUTEX_ROBUST_LIST_CMD_MODIFY_32,
        FUTEX_ROBUST_LIST_CMD_USER_MAX,

        // Kernel internal, rejected for user space
        FUTEX_ROBUST_LIST_SET_NATIVE = 128,
        FUTEX_ROBUST_LIST_SET_COMPAT,
};

#define FUTEX_ROBUST_LIST_NATIVE_IDX	0
#define FUTEX_ROBUST_LIST_COMPAT_IDX	1
#define FUTEX_ROBUST_LIST2_IDX		2
#define FUTEX_ROBUST_LIST_MAX_IDX	8

#define FUTEX_ROBUST_LIST_ENTRY_INUSE	0x1UL
#define FUTEX_ROBUST_LIST_ENTRY_64BIT	0x2UL
#define FUTEX_ROBUST_LIST_ENTRY_MASK	(~0x3UL)

static long futex_robust_list_create(uintptr_t head, enum robust_list_cmd cmd)
{
	uintptr_t entry = FUTEX_ROBUST_LIST_ENTRY_VALID;                                     
        uintptr_t *rl = current->futex_robust_lists;
	unsigned int index;

        if (!rl) {
                rl = kcalloc(ROBUST_LISTS_PER_TASK, sizeof(*rl), ....);
		....
                scoped_guard(mutex, &current->futex_exit_mutex)
                	current->futex_robust_lists = rl;
        }

	switch (cmd) {
        case FUTEX_ROBUST_LIST_CREATE_64:
        	if (!IS_ENABLED(CONFIG_64BIT))
                	return -EINVAL;
        	entry |= FUTEX_ROBUST_LIST_ENTRY_64BIT;
                fallthrough;
        case FUTEX_ROBUST_LIST_CREATE_32:
        	// Search for a empty slot
		for (index = FUTEX_ROBUST_LIST2_IDX; index < FUTEX_ROBUST_LIST_MAX_IDX; index++) {
			if (!rl[index])
                        	break;
		}
                return -ENOSPC;

	case FUTEX_ROBUST_LIST_SET_NATIVE:
        	index = FUTEX_ROBUST_LIST_LEGACY_IDX;
        	if (IS_ENABLED(CONFIG_64BIT))
                       	entry |= FUTEX_ROBUST_LIST_ENTRY_64BIT;
                break;
	case FUTEX_ROBUST_LIST_SET_COMPAT:
        	if (!IS_ENABLED(CONFIG_64BIT))
                	return -EINVAL;
        	index = FUTEX_ROBUST_LIST_COMPAT_IDX;
                break;
        default:
        	return -EINVAL;
        }

        entry |= head;	
        scoped_guard(mutex, &current->futex_exit_mutex)
        	rl[index] = entry;
        return index;
}

static long futex_robust_list_modify(uintptr_t head, enum robust_list_cmd cmd, unsigned int index)
{
        uintptr_t *rl = current->futex_robust_lists;
	unsigned long entry;

        if (index >= FUTEX_ROBUST_LIST_MAX_IDX)
        	return -EINVAL;

	if (!rl || !rl[index])
        	return -ENOENT;

	entry = rl[index];
        switch (cmd) {
        case FUTEX_ROBUST_LIST_MODIFY_32:
		if (entry & FUTEX_ROBUST_LIST_ENTRY_64BIT)
                	return -EINVAL;
                break;
        case FUTEX_ROBUST_LIST_MODIFY_64:
        	if (!IS_ENABLED(CONFIG_64BIT))
                	return -EINVAL;
		if (!(entry & FUTEX_ROBUST_LIST_ENTRY_64BIT))
                	return -EINVAL;
                break;
        default:
        	return -EINVAL;
        }

	entry &= FUTEX_ROBUST_LIST_ENTRY_MASK;
        entry |= head;	
        scoped_guard(mutex, &current->futex_exit_mutex)
        	rl[index] = entry;
        return index;
}

There is no confusion of index numbers and user space does not have to
care about a situation where an older libc uses the original syscall and
the application itself uses the new API. The two reserved entries are
just worth that.

The per task struct footprint is exactly one pointer for everything and
the allocation of 28 extra bytes for installing it once is definitely
not a problem at all.

get_robust_list() can safely access the array with the mutex held...

See?

> @@ -1414,25 +1476,28 @@ static inline void exit_pi_state_list(struct task_struct *curr) { }
>  
>  static void futex_cleanup(struct task_struct *tsk)
>  {
> -#ifdef CONFIG_64BIT
> -	if (unlikely(tsk->robust_list)) {
> -		exit_robust_list64(tsk, tsk->robust_list);
> -		tsk->robust_list = NULL;
> -	}
> -#else
> -	if (unlikely(tsk->robust_list)) {
> -		exit_robust_list32(tsk,
> -				  (struct robust_list_head32 __user *) tsk->robust_list);
> -		tsk->robust_list = NULL;
> -	}
> -#endif
> +	struct robust_list2_entry *curr, *n;
> +	struct list_head *list2 = &tsk->robust_list2;
>  
> -#ifdef CONFIG_COMPAT
> -	if (unlikely(tsk->compat_robust_list)) {
> -		exit_robust_list32(tsk, tsk->compat_robust_list);
> -		tsk->compat_robust_list = NULL;
> +	/*
> +	 * Walk through the linked list, parsing robust lists and freeing the
> +	 * allocated lists
> +	 */
> +	if (unlikely(!list_empty(list2))) {
> +		list_for_each_entry_safe(curr, n, list2, list) {
> +			if (curr->head != NULL) {
> +				if (curr->list_type == ROBUST_LIST_64BIT)
> +					exit_robust_list64(tsk, curr->head);
> +				else if (curr->list_type == ROBUST_LIST_32BIT)
> +					exit_robust_list32(tsk, curr->head);

Unknown type is silently ignored?

> +				curr->head = NULL;
> +			}
> +			list_del_init(&curr->list);
> +			kfree(curr);
> +		}
>  	}
> -#endif
> +
> +	tsk->robust_list_index = -1;

Also why has this to be crammed into futex_cleanup()? Just because it
can and makes it more unreabable by deep indentation levels?

     if (unlikely(tsk->futex_robust_lists))
        exit_robust_lists(tsk);

and

static void exit_robust_lists(struct task_struct *tsk)
{
	uintptr_t *rl = tsk->futex_robust_lists;

        tsk->futex_robust_lists = NULL;

	for (unsigned int idx = 0; idx < ROBUST_LISTS_PER_TASK; idx++) {
        	uintptr_t entry = rl[idx];
        
                if (!(entry & FUTEX_ROBUST_LIST_ENTRY_MASK))
                	continue;

		if (entry & FUTEX_ROBUST_LIST_ENTRY_32BIT) {
			struct robust_list_head32 __user *head;

			entry &= FUTEX_ROBUST_LIST_ENTRY_MASK;
                        head = (__force struct robust_list_head32 __user *)entry;
                 	exit_robust_list32(tsk, head);
                } else {
			struct robust_list_head __user *head;

			entry &= FUTEX_ROBUST_LIST_ENTRY_MASK;
                        head = (__force struct robust_list_head __user *)entry;
                	exit_robust_list64(tsk, head);
	}
        kfree(rl);
}

Is too readable, right?

> +#ifdef CONFIG_64BIT
> +static inline int robust_list_native_type(void)
> +{
> +	return ROBUST_LIST_64BIT;
> +}
> +#else
> +static inline int robust_list_native_type(void)
> +{
> +	return ROBUST_LIST_32BIT;
> +}
> +#endif

More ifdeffery is better

           return IS_ENABLED(CONFIG_64BIT) ? ....;

>  /**
>   * sys_set_robust_list() - Set the robust-futex list head of a task
>   * @head:	pointer to the list-head
> @@ -28,17 +40,63 @@
>  SYSCALL_DEFINE2(set_robust_list, struct robust_list_head __user *, head,
>  		size_t, len)
>  {
> +	unsigned int type = robust_list_native_type();

Right. enum returned from an inline as int and then assigned to an
unsigned int for consistency....

> +	int ret;
> +
>  	/*
>  	 * The kernel knows only one size for now:
>  	 */
>  	if (unlikely(len != sizeof(*head)))
>  		return -EINVAL;
>  
> -	current->robust_list = head;
> +	ret = do_set_robust_list2(head, current->robust_list_index, type);
> +	if (ret < 0)
> +		return ret;
> +
> +	current->robust_list_index = ret;

	return futex_robust_list_create((__force uintptr)head, FUTEX_ROBUST_LIST_SET_LEGACY);

>  	return 0;
>  }
>  
> +#define ROBUST_LIST_FLAGS ROBUST_LIST_TYPE_MASK
> +
> +/*
> + * sys_set_robust_list2()
> + *
> + * When index == -1, create a new list for user. When index >= 0, try to find
> + * the corresponding list and re-set the head there.
> + *
> + * Return values:
> + *  >= 0: success, index of the robust list
> + *  -EINVAL: invalid flags, invalid index
> + *  -ENOENT: requested index no where to be found
> + *  -ENOMEM: error allocating new list
> + *  -ESRCH: too many allocated lists

-ESRCH really makes no sense. -ENOSPC is what you want, no?

Aside of that, the comment is not valid kernel doc format even if it
pretends to be,

> + */
> +SYSCALL_DEFINE3(set_robust_list2, struct robust_list_head __user *, head,
> +		int, index, unsigned int, flags)
> +{
> +	unsigned int type;
> +
> +	type = flags & ROBUST_LIST_TYPE_MASK;
> +
> +	if (index < -1 || index >= ROBUST_LISTS_PER_TASK)
> +		return -EINVAL;
> +
> +	if ((flags & ~ROBUST_LIST_FLAGS) != 0)
> +		return -EINVAL;
> +
> +	if (((uintptr_t) head % sizeof(u32)) != 0)
> +		return -EINVAL;

A 64-bit robust list head should be 64-bit aligned, no?

> +#ifndef CONFIG_64BIT

Sigh.

> +	if (type == ROBUST_LIST_64BIT)
> +		return -EINVAL;
> +#endif

> +
> +	return do_set_robust_list2(head, index, type);

	uintptr entry = (__force uintptr_t)head;
        unsigned int align = sizeof(u32);

	switch (cmd) {
        case FUTEX_ROBUST_LIST_CREATE_64:
        	if (!IS_ENABLED(CONFIG_64BIT))
                	return -ENOTSUPP;
		align = sizeof(u64);
                fallthrough;
        case FUTEX_ROBUST_LIST_CREATE_32:
        	if (entry % align)
                	return -EINVAL;
       		return futex_robust_list_create(entry, cmd);

        case FUTEX_ROBUST_LIST_MODIFY_64:
        	if (!IS_ENABLED(CONFIG_64BIT))
                	return -ENOTSUPP;
		align = sizeof(u64);
                fallthrough;
        case FUTEX_ROBUST_LIST_MODIFY_32:
        	if (entry % align)
                	return -EINVAL;
                return futex_robust_list_modify(entry, cmd, index);
        default:
        	return -EINVAL;

Or something like that.

> +}
> +
>  /**
>   * sys_get_robust_list() - Get the robust-futex list head of a task
>   * @pid:	pid of the process [zero for current task]
> @@ -52,6 +110,7 @@ SYSCALL_DEFINE3(get_robust_list, int, pid,
>  	struct robust_list_head __user *head;
>  	unsigned long ret;
>  	struct task_struct *p;
> +	int index;
>  
>  	rcu_read_lock();
>  
> @@ -68,9 +127,11 @@ SYSCALL_DEFINE3(get_robust_list, int, pid,
>  	if (!ptrace_may_access(p, PTRACE_MODE_READ_REALCREDS))
>  		goto err_unlock;
>  
> -	head = p->robust_list;
> +	index = p->robust_list_index;
>  	rcu_read_unlock();
>  
> +	head = get_robust_list2(index, p);

Aside of the unprotected list walk, this adds a user exploitable exit
race on top. @p can go away immediately after rcu_read_unlock(), no?

Impressive...

With the above this becomes:

uintptr_t get_robust_list_at(pid_t pid, unsigned int idx)
{
        struct task_struct *p;
	uintptr_t entry;

	scoped_guard(rcu) {
		p = !pid ? current : find_task_by_vpid(pid);
		if (!p)
			return -ESRCH;

		if (!ptrace_may_access(p, PTRACE_MODE_READ_REALCREDS))
                	return -EPERM;
                get_task_struct(p);
	}

        scoped_guard(mutex, &p->futex_exit_mutex) {
        	uintptr_t *rl = p->futex_robust_lists;

		entry = rl ? rl[idx] : NULL;
		entry &= FUTEX_ROBUST_LIST_ENTRY_MASK;
        }

	put_task_struct(p);
        return entry;

get_robust_list()
{
	uintptr_t entry = get_robust_list_at(pid, FUTEX_ROBUST_LIST_NATIVE_IDX);
	struct robust_list_head __user *head;

	head = (__force struct robust_list_head __user *)entry;
	if (put_user(sizeof(*head), len_ptr))
		return -EFAULT;
	return put_user(head, head_ptr);
}

Or something along those lines. The scoped_guard(rcu) cleanup want's to
be in a preparatory patch.

Also why does this new syscall come without a corresponding
sys_get_robust_list2() implementation? CRIU definitely depends on
get_robust_list(), so it will need get_robust_list2() as well...

> @@ -458,6 +528,7 @@ COMPAT_SYSCALL_DEFINE3(get_robust_list, int, pid,
>  	struct robust_list_head32 __user *head;
>  	unsigned long ret;
>  	struct task_struct *p;
> +	int index;
>  
>  	rcu_read_lock();
>  
> @@ -474,9 +545,11 @@ COMPAT_SYSCALL_DEFINE3(get_robust_list, int, pid,
>  	if (!ptrace_may_access(p, PTRACE_MODE_READ_REALCREDS))
>  		goto err_unlock;
>  
> -	head = p->compat_robust_list;
> +	index = p->compat_robust_list_index;
>  	rcu_read_unlock();
>  
> +	head = (struct robust_list_head32 __user *) get_robust_list2(index, p);
> +

See above....

This all wants to be properly split up into:

     1) Preparatory cleanups

     2) Implementation of the new mechanism without touching the existing
        syscalls.

     3) Conversion of the existing syscalls

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v5 5/7] futex: Remove the limit of elements for sys_set_robust_list2 lists
  2025-06-26 17:11 ` [PATCH v5 5/7] futex: Remove the limit of elements for sys_set_robust_list2 lists André Almeida
@ 2025-06-27 12:22   ` Thomas Gleixner
  0 siblings, 0 replies; 18+ messages in thread
From: Thomas Gleixner @ 2025-06-27 12:22 UTC (permalink / raw)
  To: André Almeida, Ingo Molnar, Peter Zijlstra, Darren Hart,
	Davidlohr Bueso, Shuah Khan, Arnd Bergmann,
	Sebastian Andrzej Siewior, Waiman Long
  Cc: linux-kernel, linux-kselftest, linux-api, kernel-dev,
	André Almeida

On Thu, Jun 26 2025 at 14:11, André Almeida wrote:
> Remove the limit of ROBUST_LIST_LIMIT elements that a robust list can
> have, for the ones created with the new interface. This is done by

With which new interface?

> overwritten the list as it's proceeded in a way that we avoid circular

overwriting each processed list entry to point at ...., which eliminates
a potential circular list.


> lists.
>
> For the old interface, we keep the limited behavior to avoid changing

s/we//

> the API.

Which API would be violated?

Overwriting the dying tasks robust list entries is not violating any
ABI. The task's memory is on the way to be destroyed.

> Signed-off-by: André Almeida <andrealmeid@igalia.com>
> ---
>  kernel/futex/core.c | 37 +++++++++++++++++++++++++++++--------
>  1 file changed, 29 insertions(+), 8 deletions(-)
>
> diff --git a/kernel/futex/core.c b/kernel/futex/core.c
> index 1049f8ef3ce3c611b3be0ca12df34a98f710121d..942b66facdea16cd7be2235d95c2bbbae8d7cc63 100644
> --- a/kernel/futex/core.c
> +++ b/kernel/futex/core.c
> @@ -1152,7 +1152,8 @@ static inline int fetch_robust_entry(struct robust_list __user **entry,
>   * We silently return on any sign of list-walking problem.
>   */
>  static void exit_robust_list64(struct task_struct *curr,
> -			       struct robust_list_head __user *head)
> +			       struct robust_list_head __user *head,
> +			       bool destroyable)
>  {
>  	struct robust_list __user *entry, *next_entry, *pending;
>  	unsigned int limit = ROBUST_LIST_LIMIT, pi, pip;
> @@ -1196,13 +1197,17 @@ static void exit_robust_list64(struct task_struct *curr,
>  		}
>  		if (rc)
>  			return;
> -		entry = next_entry;
> -		pi = next_pi;
> +
>  		/*
>  		 * Avoid excessively long or circular lists:
>  		 */
> -		if (!--limit)
> +		if (!destroyable && !--limit)
>  			break;
> +		else
> +			put_user(&head->list, &entry->next);

Unchecked put_user() with zero explanation what it actually does.

> +
> +		entry = next_entry;
> +		pi = next_pi;
>  
>  		cond_resched();
>  	}
> @@ -1214,7 +1219,8 @@ static void exit_robust_list64(struct task_struct *curr,
>  }
>  #else
>  static void exit_robust_list64(struct task_struct *curr,
> -			       struct robust_list_head __user *head)
> +			       struct robust_list_head __user *head,
> +			       bool destroyable)
>  {
>  	pr_warn("32bit kernel should not allow ROBUST_LIST_64BIT");
>  }
> @@ -1252,7 +1258,8 @@ fetch_robust_entry32(u32 *uentry, struct robust_list __user **entry,
>   * We silently return on any sign of list-walking problem.
>   */
>  static void exit_robust_list32(struct task_struct *curr,
> -			       struct robust_list_head32 __user *head)
> +			       struct robust_list_head32 __user *head,
> +			       bool destroyable)
>  {
>  	struct robust_list __user *entry, *next_entry, *pending;
>  	unsigned int limit = ROBUST_LIST_LIMIT, pi, pip;

So this get's a destroyable argument as well, but no implementation?

> @@ -1474,10 +1481,19 @@ static void exit_pi_state_list(struct task_struct *curr)
>  static inline void exit_pi_state_list(struct task_struct *curr) { }
>  #endif
>  
> +/*
> + * futex_cleanup - After the task exists, process the robust lists
> + *
> + * Walk through the linked list, parsing robust lists and freeing the
> + * allocated lists. Lists created with the set_robust_list2 don't have a limit
> + * for sizing and can be destroyed while we walk on it to avoid circular list.
> + */
>  static void futex_cleanup(struct task_struct *tsk)
>  {
>  	struct robust_list2_entry *curr, *n;
>  	struct list_head *list2 = &tsk->robust_list2;
> +	bool destroyable = true;
> +	int i = 0;
>  
>  	/*
>  	 * Walk through the linked list, parsing robust lists and freeing the
> @@ -1485,15 +1501,20 @@ static void futex_cleanup(struct task_struct *tsk)
>  	 */
>  	if (unlikely(!list_empty(list2))) {
>  		list_for_each_entry_safe(curr, n, list2, list) {
> +			destroyable = true;
> +			if (tsk->robust_list_index == i)
> +				destroyable = false;

Oh well.....

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v5 7/7] selftests: futex: Expand robust list test for the new interface
  2025-06-26 17:11 ` [PATCH v5 7/7] selftests: futex: Expand robust list test for the new interface André Almeida
@ 2025-06-27 12:48   ` Thomas Gleixner
  0 siblings, 0 replies; 18+ messages in thread
From: Thomas Gleixner @ 2025-06-27 12:48 UTC (permalink / raw)
  To: André Almeida, Ingo Molnar, Peter Zijlstra, Darren Hart,
	Davidlohr Bueso, Shuah Khan, Arnd Bergmann,
	Sebastian Andrzej Siewior, Waiman Long
  Cc: linux-kernel, linux-kselftest, linux-api, kernel-dev,
	André Almeida

On Thu, Jun 26 2025 at 14:11, André Almeida wrote:

> Expand the current robust list test for the new set_robust_list2
> syscall. Create an option to make it possible to run the same tests
> using the new syscall, and also add two new relevant test: test long
> lists (bigger than ROBUST_LIST_LIMIT) and for unaligned addresses.
>
> Signed-off-by: André Almeida <andrealmeid@igalia.com>
> ---
>  .../selftests/futex/functional/robust_list.c       | 160 ++++++++++++++++++++-
>  1 file changed, 156 insertions(+), 4 deletions(-)
>
> diff --git a/tools/testing/selftests/futex/functional/robust_list.c b/tools/testing/selftests/futex/functional/robust_list.c
> index 42690b2440fd29a9b12c46f67f9645ccc93d1147..004ad79ff6171c411fd47e699e3c38889544218e 100644
> --- a/tools/testing/selftests/futex/functional/robust_list.c
> +++ b/tools/testing/selftests/futex/functional/robust_list.c
> @@ -35,16 +35,45 @@
>  #include <stddef.h>
>  #include <sys/mman.h>
>  #include <sys/wait.h>
> +#include <stdint.h>
>  
>  #define STACK_SIZE (1024 * 1024)
>  
>  #define FUTEX_TIMEOUT 3
>  
> +#define SYS_set_robust_list2 468
> +
> +enum robust_list2_type {
> +        ROBUST_LIST_32BIT,
> +        ROBUST_LIST_64BIT,
> +};

Why can't this use an updated header?

> +
>  static pthread_barrier_t barrier, barrier2;
>  
> +bool robust2 = false;

global because ....

>  int set_robust_list(struct robust_list_head *head, size_t len)
>  {
> -	return syscall(SYS_set_robust_list, head, len);
> +	int ret, flags;
> +
> +	if (!robust2) {
> +		return syscall(SYS_set_robust_list, head, len);
> +	}

Pointless brackets.

> +	if (sizeof(head) == 8)
> +		flags = ROBUST_LIST_64BIT;
> +	else
> +		flags = ROBUST_LIST_32BIT;
> +
> +	/*
> +	 * We act as we have just one list here. We try to use the first slot,
> +	 * but if it hasn't been alocated yet we allocate it.
> +	 */
> +	ret = syscall(SYS_set_robust_list2, head, 0, flags);
> +	if (ret == -1 && errno == ENOENT)
> +		ret = syscall(SYS_set_robust_list2, head, -1, flags);

What the heck is this?

> +	return ret;
>  }
>  
>  int get_robust_list(int pid, struct robust_list_head **head, size_t *len_ptr)
> @@ -246,6 +275,11 @@ static void test_set_robust_list_invalid_size(void)
>  	size_t head_size = sizeof(struct robust_list_head);
>  	int ret;
>  
> +	if (robust2) {
> +		ksft_test_result_skip("This test is only for old robust interface\n");

Why is it invoked in the first place?

> +		return;
> +	}
> +
>  	ret = set_robust_list(&head, head_size);
>  	ASSERT_EQ(ret, 0);
>  
> @@ -321,6 +355,11 @@ static void test_get_robust_list_child(void)
>  	struct robust_list_head head, *get_head;
>  	size_t len_ptr;
>  
> +	if (robust2) {
> +		ksft_test_result_skip("Not implemented in the new robust interface\n");

For the very wrong reasons.

> +		return;
> +	}
> +
>  	ret = pthread_barrier_init(&barrier, NULL, 2);
>  	ret = pthread_barrier_init(&barrier2, NULL, 2);
>  	ASSERT_EQ(ret, 0);
> @@ -332,7 +371,7 @@ static void test_get_robust_list_child(void)
>  
>  	ret = get_robust_list(tid, &get_head, &len_ptr);
>  	ASSERT_EQ(ret, 0);
> -	ASSERT_EQ(&head, get_head);
> +	ASSERT_EQ(get_head, &head);

ROTFL

>  
>  	pthread_barrier_wait(&barrier2);
>  
> @@ -507,11 +546,119 @@ static void test_circular_list(void)
>  	ksft_test_result_pass("%s\n", __func__);
>  }
>  
> +#define ROBUST_LIST_LIMIT	2048
> +#define CHILD_LIST_LIMIT (ROBUST_LIST_LIMIT + 10)
> +
> +static int child_robust_list_limit(void *arg)
> +{
> +	struct lock_struct *locks;
> +	struct robust_list *list;
> +	struct robust_list_head head;
> +	int ret, i;
> +
> +	locks = (struct lock_struct *) arg;
> +
> +	ret = set_list(&head);
> +	if (ret)
> +		ksft_test_result_fail("set_list error\n");

Yet again the same broken crap.

> +	/*
> +	 * Create a very long list of locks
> +	 */
> +	head.list.next = &locks[0].list;
> +
> +	list = head.list.next;
> +	for (i = 0; i < CHILD_LIST_LIMIT - 1; i++) {
> +		list->next = &locks[i+1].list;
> +		list = list->next;
> +	}
> +	list->next = &head.list;
> +
> +	/*
> +	 * Grab the lock in the last one, and die without releasing it
> +	 */
> +	mutex_lock(&locks[CHILD_LIST_LIMIT], &head, false);
> +	pthread_barrier_wait(&barrier);
> +
> +	sleep(1);
> +
> +	return 0;
> +}
> +
> +/*
> + * The old robust list used to have a limit of 2048 items from the kernel side.
> + * After this limit the kernel stops walking the list and ignore the other

ignores

> + * futexes, causing deadlocks.
> + *
> + * For the new interface, test if we can wait for a list of more than 2048
> + * elements.
> + */
> +static void test_robust_list_limit(void)
> +{
> +	struct lock_struct locks[CHILD_LIST_LIMIT + 1];
> +	_Atomic(unsigned int) *futex = &locks[CHILD_LIST_LIMIT].futex;
> +	struct robust_list_head head;
> +	int ret;
> +
> +	if (!robust2) {
> +		ksft_test_result_skip("This test is only for new robust interface\n");
> +		return;
> +	}
> +
> +	*futex = 0;
> +
> +	ret = set_list(&head);
> +	ASSERT_EQ(ret, 0);
> +
> +	ret = pthread_barrier_init(&barrier, NULL, 2);
> +	ASSERT_EQ(ret, 0);
> +
> +	create_child(child_robust_list_limit, locks);
> +
> +	/*
> +	 * After the child thread creates the very long list of locks, wait on
> +	 * the last one.
> +	 */
> +	pthread_barrier_wait(&barrier);
> +	ret = mutex_lock(&locks[CHILD_LIST_LIMIT], &head, false);
> +
> +	if (ret != 0)
> +		printf("futex wait returned %d\n", errno);
> +	ASSERT_EQ(ret, 0);

lalala.

> +
> +	ASSERT_TRUE(*futex | FUTEX_OWNER_DIED);

Copy and pasta does not make it more correct.

> +	wait(NULL);
> +	pthread_barrier_destroy(&barrier);
> +
> +	ksft_test_result_pass("%s\n", __func__);
> +}
> +
> +/*
> + * The kernel should refuse an unaligned head pointer
> + */
> +static void test_unaligned_address(void)
> +{
> +	struct robust_list_head head, *h;
> +	int ret;
> +
> +	if (!robust2) {
> +		ksft_test_result_skip("This test is only for new robust interface\n");
> +		return;
> +	}
> +
> +	h = (struct robust_list_head *) ((uintptr_t) &head + 1);
> +	ret = set_list(h);
> +	ASSERT_EQ(ret, -1);
> +	ASSERT_EQ(errno, EINVAL);
> +}
> +
>  void usage(char *prog)
>  {
>  	printf("Usage: %s\n", prog);
>  	printf("  -c	Use color\n");
>  	printf("  -h	Display this help message\n");
> +	printf("  -n	Use robust2 syscall\n");

Right. We need a command line option to guarantee that the test is not
executed by bots...

>  	printf("  -v L	Verbosity level: %d=QUIET %d=CRITICAL %d=INFO\n",
>  	       VQUIET, VCRITICAL, VINFO);
>  }
> @@ -520,7 +667,7 @@ int main(int argc, char *argv[])
>  {
>  	int c;
>  
> -	while ((c = getopt(argc, argv, "cht:v:")) != -1) {
> +	while ((c = getopt(argc, argv, "chnt:v:")) != -1) {
>  		switch (c) {
>  		case 'c':
>  			log_color(1);
> @@ -531,6 +678,9 @@ int main(int argc, char *argv[])
>  		case 'v':
>  			log_verbosity(atoi(optarg));
>  			break;
> +		case 'n':
> +			robust2 = true;
> +			break;
>  		default:
>  			usage(basename(argv[0]));
>  			exit(1);
> @@ -538,7 +688,7 @@ int main(int argc, char *argv[])
>  	}
>  
>  	ksft_print_header();
> -	ksft_set_plan(7);
> +	ksft_set_plan(8);
>

Just check whether the new syscall is implemented and then set the
number of tests accordingly.

>  	test_robustness();
>  
> @@ -548,6 +698,8 @@ int main(int argc, char *argv[])
>  	test_set_list_op_pending();
>  	test_robust_list_multiple_elements();
>  	test_circular_list();
> +	test_robust_list_limit();
> +	test_unaligned_address();

and then do:

	test_robustness();
        ....
	test_circular_list();

        if (has_robust) {
        	robust2 = true;
                
                test_robustness();
                ...
                test_circular_list();
		test_robust_list_limit();
		test_unaligned_address();
                
        }        

or something like that.

Time for a stiff drink....

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v5 1/7] selftests/futex: Add ASSERT_ macros
  2025-06-26 22:07   ` Thomas Gleixner
  2025-06-26 22:09     ` Thomas Gleixner
@ 2025-06-27 20:23     ` André Almeida
  2025-07-01  9:20       ` Thomas Gleixner
  1 sibling, 1 reply; 18+ messages in thread
From: André Almeida @ 2025-06-27 20:23 UTC (permalink / raw)
  To: Thomas Gleixner, Shuah Khan
  Cc: Davidlohr Bueso, Peter Zijlstra, Arnd Bergmann, linux-kernel,
	linux-kselftest, Sebastian Andrzej Siewior, linux-api, kernel-dev,
	Darren Hart, Ingo Molnar, Waiman Long

Em 26/06/2025 19:07, Thomas Gleixner escreveu:
> On Thu, Jun 26 2025 at 14:11, André Almeida wrote:
> 
>> Create ASSERT_{EQ, NE, TRUE, FALSE} macros to make test creation easier.
> 
> What's so futex special about this that it can't use the same muck in
> 
> tools/testing/selftests/kselftest_harness.h
> 

My previous version of this test used kselftest_harness.h, but Shuah 
request to keep consistency and don't use this header, giving that the 
rest of futex test doesn't use it:

https://lore.kernel.org/lkml/fe02f42b-7ba8-4a3b-a86c-2a4a7942fd3b@linuxfoundation.org/

> or at least share the implementation in some way?
> 
> Thanks,
> 
>          tglx


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v5 3/7] futex: Use explicit sizes for compat_exit_robust_list
  2025-06-26 17:11 ` [PATCH v5 3/7] futex: Use explicit sizes for compat_exit_robust_list André Almeida
  2025-06-26 22:56   ` Thomas Gleixner
@ 2025-06-28 14:27   ` kernel test robot
  1 sibling, 0 replies; 18+ messages in thread
From: kernel test robot @ 2025-06-28 14:27 UTC (permalink / raw)
  To: André Almeida, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
	Darren Hart, Davidlohr Bueso, Shuah Khan, Arnd Bergmann,
	Sebastian Andrzej Siewior, Waiman Long
  Cc: llvm, oe-kbuild-all, linux-kernel, linux-kselftest, linux-api,
	kernel-dev, André Almeida

Hi André,

kernel test robot noticed the following build warnings:

[auto build test WARNING on a24cc6ce1933eade12aa2b9859de0fcd2dac2c06]

url:    https://github.com/intel-lab-lkp/linux/commits/Andr-Almeida/selftests-futex-Add-ASSERT_-macros/20250627-011636
base:   a24cc6ce1933eade12aa2b9859de0fcd2dac2c06
patch link:    https://lore.kernel.org/r/20250626-tonyk-robust_futex-v5-3-179194dbde8f%40igalia.com
patch subject: [PATCH v5 3/7] futex: Use explicit sizes for compat_exit_robust_list
config: arm-randconfig-003-20250627 (https://download.01.org/0day-ci/archive/20250628/202506282104.ThReVuLD-lkp@intel.com/config)
compiler: clang version 21.0.0git (https://github.com/llvm/llvm-project e04c938cc08a90ae60440ce22d072ebc69d67ee8)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250628/202506282104.ThReVuLD-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202506282104.ThReVuLD-lkp@intel.com/

All warnings (new ones prefixed by >>):

   In file included from net/sched/sch_qfq.c:13:
   In file included from include/linux/netdevice.h:44:
   In file included from include/uapi/linux/neighbour.h:6:
   In file included from include/linux/netlink.h:9:
   In file included from include/net/scm.h:13:
   In file included from include/net/compat.h:8:
>> include/linux/compat.h:665:35: warning: declaration of 'struct robust_list_head32' will not be visible outside of this function [-Wvisibility]
     665 | compat_sys_set_robust_list(struct robust_list_head32 __user *head,
         |                                   ^
   1 warning generated.
--
   In file included from net/sched/sch_multiq.c:15:
   In file included from include/net/netlink.h:6:
   In file included from include/linux/netlink.h:9:
   In file included from include/net/scm.h:13:
   In file included from include/net/compat.h:8:
>> include/linux/compat.h:665:35: warning: declaration of 'struct robust_list_head32' will not be visible outside of this function [-Wvisibility]
     665 | compat_sys_set_robust_list(struct robust_list_head32 __user *head,
         |                                   ^
   net/sched/sch_multiq.c:36:6: warning: variable 'err' set but not used [-Wunused-but-set-variable]
      36 |         int err;
         |             ^
   2 warnings generated.


vim +665 include/linux/compat.h

   621	
   622	#ifdef __ARCH_WANT_COMPAT_SYS_PWRITEV64
   623	asmlinkage long compat_sys_pwritev64(unsigned long fd,
   624			const struct iovec __user *vec,
   625			unsigned long vlen, loff_t pos);
   626	#endif
   627	asmlinkage long compat_sys_sendfile(int out_fd, int in_fd,
   628					    compat_off_t __user *offset, compat_size_t count);
   629	asmlinkage long compat_sys_sendfile64(int out_fd, int in_fd,
   630					    compat_loff_t __user *offset, compat_size_t count);
   631	asmlinkage long compat_sys_pselect6_time32(int n, compat_ulong_t __user *inp,
   632					    compat_ulong_t __user *outp,
   633					    compat_ulong_t __user *exp,
   634					    struct old_timespec32 __user *tsp,
   635					    void __user *sig);
   636	asmlinkage long compat_sys_pselect6_time64(int n, compat_ulong_t __user *inp,
   637					    compat_ulong_t __user *outp,
   638					    compat_ulong_t __user *exp,
   639					    struct __kernel_timespec __user *tsp,
   640					    void __user *sig);
   641	asmlinkage long compat_sys_ppoll_time32(struct pollfd __user *ufds,
   642					 unsigned int nfds,
   643					 struct old_timespec32 __user *tsp,
   644					 const compat_sigset_t __user *sigmask,
   645					 compat_size_t sigsetsize);
   646	asmlinkage long compat_sys_ppoll_time64(struct pollfd __user *ufds,
   647					 unsigned int nfds,
   648					 struct __kernel_timespec __user *tsp,
   649					 const compat_sigset_t __user *sigmask,
   650					 compat_size_t sigsetsize);
   651	asmlinkage long compat_sys_signalfd4(int ufd,
   652					     const compat_sigset_t __user *sigmask,
   653					     compat_size_t sigsetsize, int flags);
   654	asmlinkage long compat_sys_newfstatat(unsigned int dfd,
   655					      const char __user *filename,
   656					      struct compat_stat __user *statbuf,
   657					      int flag);
   658	asmlinkage long compat_sys_newfstat(unsigned int fd,
   659					    struct compat_stat __user *statbuf);
   660	/* No generic prototype for sync_file_range and sync_file_range2 */
   661	asmlinkage long compat_sys_waitid(int, compat_pid_t,
   662			struct compat_siginfo __user *, int,
   663			struct compat_rusage __user *);
   664	asmlinkage long
 > 665	compat_sys_set_robust_list(struct robust_list_head32 __user *head,
   666				   compat_size_t len);
   667	asmlinkage long
   668	compat_sys_get_robust_list(int pid, compat_uptr_t __user *head_ptr,
   669				   compat_size_t __user *len_ptr);
   670	asmlinkage long compat_sys_getitimer(int which,
   671					     struct old_itimerval32 __user *it);
   672	asmlinkage long compat_sys_setitimer(int which,
   673					     struct old_itimerval32 __user *in,
   674					     struct old_itimerval32 __user *out);
   675	asmlinkage long compat_sys_kexec_load(compat_ulong_t entry,
   676					      compat_ulong_t nr_segments,
   677					      struct compat_kexec_segment __user *,
   678					      compat_ulong_t flags);
   679	asmlinkage long compat_sys_timer_create(clockid_t which_clock,
   680				struct compat_sigevent __user *timer_event_spec,
   681				timer_t __user *created_timer_id);
   682	asmlinkage long compat_sys_ptrace(compat_long_t request, compat_long_t pid,
   683					  compat_long_t addr, compat_long_t data);
   684	asmlinkage long compat_sys_sched_setaffinity(compat_pid_t pid,
   685					     unsigned int len,
   686					     compat_ulong_t __user *user_mask_ptr);
   687	asmlinkage long compat_sys_sched_getaffinity(compat_pid_t pid,
   688					     unsigned int len,
   689					     compat_ulong_t __user *user_mask_ptr);
   690	asmlinkage long compat_sys_sigaltstack(const compat_stack_t __user *uss_ptr,
   691					       compat_stack_t __user *uoss_ptr);
   692	asmlinkage long compat_sys_rt_sigsuspend(compat_sigset_t __user *unewset,
   693						 compat_size_t sigsetsize);
   694	#ifndef CONFIG_ODD_RT_SIGACTION
   695	asmlinkage long compat_sys_rt_sigaction(int,
   696					 const struct compat_sigaction __user *,
   697					 struct compat_sigaction __user *,
   698					 compat_size_t);
   699	#endif
   700	asmlinkage long compat_sys_rt_sigprocmask(int how, compat_sigset_t __user *set,
   701						  compat_sigset_t __user *oset,
   702						  compat_size_t sigsetsize);
   703	asmlinkage long compat_sys_rt_sigpending(compat_sigset_t __user *uset,
   704						 compat_size_t sigsetsize);
   705	asmlinkage long compat_sys_rt_sigtimedwait_time32(compat_sigset_t __user *uthese,
   706			struct compat_siginfo __user *uinfo,
   707			struct old_timespec32 __user *uts, compat_size_t sigsetsize);
   708	asmlinkage long compat_sys_rt_sigtimedwait_time64(compat_sigset_t __user *uthese,
   709			struct compat_siginfo __user *uinfo,
   710			struct __kernel_timespec __user *uts, compat_size_t sigsetsize);
   711	asmlinkage long compat_sys_rt_sigqueueinfo(compat_pid_t pid, int sig,
   712					struct compat_siginfo __user *uinfo);
   713	/* No generic prototype for rt_sigreturn */
   714	asmlinkage long compat_sys_times(struct compat_tms __user *tbuf);
   715	asmlinkage long compat_sys_getrlimit(unsigned int resource,
   716					     struct compat_rlimit __user *rlim);
   717	asmlinkage long compat_sys_setrlimit(unsigned int resource,
   718					     struct compat_rlimit __user *rlim);
   719	asmlinkage long compat_sys_getrusage(int who, struct compat_rusage __user *ru);
   720	asmlinkage long compat_sys_gettimeofday(struct old_timeval32 __user *tv,
   721			struct timezone __user *tz);
   722	asmlinkage long compat_sys_settimeofday(struct old_timeval32 __user *tv,
   723			struct timezone __user *tz);
   724	asmlinkage long compat_sys_sysinfo(struct compat_sysinfo __user *info);
   725	asmlinkage long compat_sys_mq_open(const char __user *u_name,
   726				int oflag, compat_mode_t mode,
   727				struct compat_mq_attr __user *u_attr);
   728	asmlinkage long compat_sys_mq_notify(mqd_t mqdes,
   729				const struct compat_sigevent __user *u_notification);
   730	asmlinkage long compat_sys_mq_getsetattr(mqd_t mqdes,
   731				const struct compat_mq_attr __user *u_mqstat,
   732				struct compat_mq_attr __user *u_omqstat);
   733	asmlinkage long compat_sys_msgctl(int first, int second, void __user *uptr);
   734	asmlinkage long compat_sys_msgrcv(int msqid, compat_uptr_t msgp,
   735			compat_ssize_t msgsz, compat_long_t msgtyp, int msgflg);
   736	asmlinkage long compat_sys_msgsnd(int msqid, compat_uptr_t msgp,
   737			compat_ssize_t msgsz, int msgflg);
   738	asmlinkage long compat_sys_semctl(int semid, int semnum, int cmd, int arg);
   739	asmlinkage long compat_sys_shmctl(int first, int second, void __user *uptr);
   740	asmlinkage long compat_sys_shmat(int shmid, compat_uptr_t shmaddr, int shmflg);
   741	asmlinkage long compat_sys_recvfrom(int fd, void __user *buf, compat_size_t len,
   742				    unsigned flags, struct sockaddr __user *addr,
   743				    int __user *addrlen);
   744	asmlinkage long compat_sys_sendmsg(int fd, struct compat_msghdr __user *msg,
   745					   unsigned flags);
   746	asmlinkage long compat_sys_recvmsg(int fd, struct compat_msghdr __user *msg,
   747					   unsigned int flags);
   748	/* No generic prototype for readahead */
   749	asmlinkage long compat_sys_keyctl(u32 option,
   750				      u32 arg2, u32 arg3, u32 arg4, u32 arg5);
   751	asmlinkage long compat_sys_execve(const char __user *filename, const compat_uptr_t __user *argv,
   752			     const compat_uptr_t __user *envp);
   753	/* No generic prototype for fadvise64_64 */
   754	/* CONFIG_MMU only */
   755	asmlinkage long compat_sys_rt_tgsigqueueinfo(compat_pid_t tgid,
   756						compat_pid_t pid, int sig,
   757						struct compat_siginfo __user *uinfo);
   758	asmlinkage long compat_sys_recvmmsg_time64(int fd, struct compat_mmsghdr __user *mmsg,
   759					    unsigned vlen, unsigned int flags,
   760					    struct __kernel_timespec __user *timeout);
   761	asmlinkage long compat_sys_recvmmsg_time32(int fd, struct compat_mmsghdr __user *mmsg,
   762					    unsigned vlen, unsigned int flags,
   763					    struct old_timespec32 __user *timeout);
   764	asmlinkage long compat_sys_wait4(compat_pid_t pid,
   765					 compat_uint_t __user *stat_addr, int options,
   766					 struct compat_rusage __user *ru);
   767	asmlinkage long compat_sys_fanotify_mark(int, unsigned int, __u32, __u32,
   768						    int, const char __user *);
   769	asmlinkage long compat_sys_open_by_handle_at(int mountdirfd,
   770						     struct file_handle __user *handle,
   771						     int flags);
   772	asmlinkage long compat_sys_sendmmsg(int fd, struct compat_mmsghdr __user *mmsg,
   773					    unsigned vlen, unsigned int flags);
   774	asmlinkage long compat_sys_execveat(int dfd, const char __user *filename,
   775			     const compat_uptr_t __user *argv,
   776			     const compat_uptr_t __user *envp, int flags);
   777	asmlinkage ssize_t compat_sys_preadv2(compat_ulong_t fd,
   778			const struct iovec __user *vec,
   779			compat_ulong_t vlen, u32 pos_low, u32 pos_high, rwf_t flags);
   780	asmlinkage ssize_t compat_sys_pwritev2(compat_ulong_t fd,
   781			const struct iovec __user *vec,
   782			compat_ulong_t vlen, u32 pos_low, u32 pos_high, rwf_t flags);
   783	#ifdef __ARCH_WANT_COMPAT_SYS_PREADV64V2
   784	asmlinkage long  compat_sys_preadv64v2(unsigned long fd,
   785			const struct iovec __user *vec,
   786			unsigned long vlen, loff_t pos, rwf_t flags);
   787	#endif
   788	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v5 1/7] selftests/futex: Add ASSERT_ macros
  2025-06-27 20:23     ` André Almeida
@ 2025-07-01  9:20       ` Thomas Gleixner
  0 siblings, 0 replies; 18+ messages in thread
From: Thomas Gleixner @ 2025-07-01  9:20 UTC (permalink / raw)
  To: André Almeida, Shuah Khan
  Cc: Davidlohr Bueso, Peter Zijlstra, Arnd Bergmann, linux-kernel,
	linux-kselftest, Sebastian Andrzej Siewior, linux-api, kernel-dev,
	Darren Hart, Ingo Molnar, Waiman Long

On Fri, Jun 27 2025 at 17:23, André Almeida wrote:
> Em 26/06/2025 19:07, Thomas Gleixner escreveu:
>> On Thu, Jun 26 2025 at 14:11, André Almeida wrote:
>> 
>>> Create ASSERT_{EQ, NE, TRUE, FALSE} macros to make test creation easier.
>> 
>> What's so futex special about this that it can't use the same muck in
>> 
>> tools/testing/selftests/kselftest_harness.h
>> 
>
> My previous version of this test used kselftest_harness.h, but Shuah 
> request to keep consistency and don't use this header, giving that the 
> rest of futex test doesn't use it:
>
> https://lore.kernel.org/lkml/fe02f42b-7ba8-4a3b-a86c-2a4a7942fd3b@linuxfoundation.org/

So proliferating duplicate and pointlessly different code is the
preferred option here?

Cleaning up the existing mess first before adding more would be too
sensible, right?

I'm lost for words, which is an achievement.

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2025-07-01  9:20 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-06-26 17:11 [PATCH v5 0/7] futex: Create set_robust_list2 André Almeida
2025-06-26 17:11 ` [PATCH v5 1/7] selftests/futex: Add ASSERT_ macros André Almeida
2025-06-26 22:07   ` Thomas Gleixner
2025-06-26 22:09     ` Thomas Gleixner
2025-06-27 20:23     ` André Almeida
2025-07-01  9:20       ` Thomas Gleixner
2025-06-26 17:11 ` [PATCH v5 2/7] selftests/futex: Create test for robust list André Almeida
2025-06-26 22:36   ` Thomas Gleixner
2025-06-26 17:11 ` [PATCH v5 3/7] futex: Use explicit sizes for compat_exit_robust_list André Almeida
2025-06-26 22:56   ` Thomas Gleixner
2025-06-28 14:27   ` kernel test robot
2025-06-26 17:11 ` [PATCH v5 4/7] futex: Create set_robust_list2 André Almeida
2025-06-27 12:06   ` Thomas Gleixner
2025-06-26 17:11 ` [PATCH v5 5/7] futex: Remove the limit of elements for sys_set_robust_list2 lists André Almeida
2025-06-27 12:22   ` Thomas Gleixner
2025-06-26 17:11 ` [PATCH v5 6/7] futex: Wire up set_robust_list2 syscall André Almeida
2025-06-26 17:11 ` [PATCH v5 7/7] selftests: futex: Expand robust list test for the new interface André Almeida
2025-06-27 12:48   ` Thomas Gleixner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).