* [PATCH v2 0/3] futex: Create set_robust_list2
@ 2024-11-01 16:21 André Almeida
  2024-11-01 16:21 ` [PATCH v2 1/3] futex: Use explicit sizes for compat_exit_robust_list André Almeida
  ` (3 more replies)
  0 siblings, 4 replies; 18+ messages in thread

From: André Almeida @ 2024-11-01 16:21 UTC (permalink / raw)
To: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Darren Hart, Davidlohr Bueso, Arnd Bergmann, sonicadvance1
Cc: linux-kernel, kernel-dev, linux-api, Nathan Chancellor, André Almeida

This patchset adds a new set_robust_list2() syscall. The current syscall can't be extended to cover the following use case, so a new one is needed. The new syscall allows users to set multiple robust lists per process and to have either 32-bit or 64-bit pointers in the list.

* Use case

FEX-Emu[1] is an application that runs x86 and x86-64 binaries on an AArch64 Linux host. One of the tasks of FEX-Emu is to translate syscalls from one platform to the other. The existing set_robust_list() can't be easily translated because of two limitations:

1) x86 apps can have robust lists with 32-bit pointers. For an x86-64 kernel this is not a problem, because of the compat entry point. But there is no such compat entry point for AArch64, so the kernel would do the pointer arithmetic wrongly. It is also unviable for userspace to track every addition/removal to the robust list and keep a 64-bit version of it somewhere else to feed the kernel. Thus, the new interface has an option for telling the kernel whether the list is filled with 32-bit or 64-bit pointers.

2) Apps can set just one robust list (in theory, x86-64 apps can set two if they also use the compat entry point). That means that when an x86 app asks FEX-Emu to call set_robust_list(), FEX has two options: overwrite its own robust list pointer and make the app robust, or ignore the app's robust list and keep the emulator robust. The new interface allows multiple robust lists per application, solving this.
* Interface

This is the proposed interface:

	long set_robust_list2(void *head, int index, unsigned int flags)

`head` is the head of the userspace struct robust_list_head, just as in the old set_robust_list(). It needs to be a void pointer since it can point to either a normal robust_list_head or a compat_robust_list_head.

`flags` can be used for defining the list type:

	enum robust_list_type {
		ROBUST_LIST_32BIT,
		ROBUST_LIST_64BIT,
	};

`index` is the index into the kernel's internal linked list of robust lists (the naming starts to get confusing, I reckon). If `index == -1`, the user wants to set a new robust_list: the kernel appends it to the end of the list, assigns a new index and returns this index to the user. If `index >= 0`, the user wants to re-set the `*head` of an already existing list (similarly to what happens when you call set_robust_list() twice with different `*head`). If `index` is out of range, points to a non-existing robust_list, or the internal list is full, an error is returned.

* Implementation

The implementation reuses as much of the existing robust list interface as possible. The new task_struct member `struct list_head robust_list2` is just a linked list where new lists are appended as the user requests more of them; during futex_cleanup(), the kernel walks through the internal list, feeding exit_robust_list() with each robust_list. This implementation supports up to 10 lists (defined by ROBUST_LISTS_PER_TASK), but that is an arbitrary number for this RFC. For the use case described above, 4 should be enough; I'm not sure what the limit should be.

It doesn't support list removal (should it?). It doesn't have a proper get_robust_list2() yet either, but I can add one in a next revision. We could also have a generic robust_list() syscall that can be used to set/get and be controlled by flags. The new interface has an `unsigned int flags` argument, making it extensible for future use cases as well.
* Testing

I will provide a selftest similar to the one I proposed for the current interface here:
https://lore.kernel.org/lkml/20241010011142.905297-1-andrealmeid@igalia.com/

Also, FEX-Emu added support for this interface to validate it:
https://github.com/FEX-Emu/FEX/pull/3966

Feedback is very welcome!

Thanks,
	André

[1] https://github.com/FEX-Emu/FEX

Changelog:
- Added a patch to properly deal with exit_robust_list() in 64bit vs 32bit
- Wired up the syscall for all archs
- Added more of the cover letter to the commit message

v1: https://lore.kernel.org/lkml/20241024145735.162090-1-andrealmeid@igalia.com/

André Almeida (3):
  futex: Use explicit sizes for compat_exit_robust_list
  futex: Create set_robust_list2
  futex: Wire up set_robust_list2 syscall

 arch/alpha/kernel/syscalls/syscall.tbl      |   1 +
 arch/arm/tools/syscall.tbl                  |   1 +
 arch/m68k/kernel/syscalls/syscall.tbl       |   1 +
 arch/microblaze/kernel/syscalls/syscall.tbl |   1 +
 arch/mips/kernel/syscalls/syscall_n32.tbl   |   1 +
 arch/mips/kernel/syscalls/syscall_n64.tbl   |   1 +
 arch/mips/kernel/syscalls/syscall_o32.tbl   |   1 +
 arch/parisc/kernel/syscalls/syscall.tbl     |   1 +
 arch/powerpc/kernel/syscalls/syscall.tbl    |   1 +
 arch/s390/kernel/syscalls/syscall.tbl       |   1 +
 arch/sh/kernel/syscalls/syscall.tbl         |   1 +
 arch/sparc/kernel/syscalls/syscall.tbl      |   1 +
 arch/x86/entry/syscalls/syscall_32.tbl      |   1 +
 arch/x86/entry/syscalls/syscall_64.tbl      |   1 +
 arch/xtensa/kernel/syscalls/syscall.tbl     |   1 +
 include/linux/compat.h                      |  12 +-
 include/linux/futex.h                       |  12 ++
 include/linux/sched.h                       |   3 +-
 include/uapi/asm-generic/unistd.h           |   5 +-
 include/uapi/linux/futex.h                  |  24 ++++
 init/init_task.c                            |   3 +
 kernel/futex/core.c                         | 116 +++++++++++++++++---
 kernel/futex/futex.h                        |   3 +
 kernel/futex/syscalls.c                     |  40 ++++++-
 kernel/sys_ni.c                             |   1 +
 scripts/syscall.tbl                         |   1 +
 26 files changed, 203 insertions(+), 32 deletions(-)

-- 
2.47.0

^ permalink raw reply	[flat|nested] 18+ messages in thread
* [PATCH v2 1/3] futex: Use explicit sizes for compat_exit_robust_list
  2024-11-01 16:21 [PATCH v2 0/3] futex: Create set_robust_list2 André Almeida
@ 2024-11-01 16:21 ` André Almeida
  2024-11-02  5:44   ` kernel test robot
  2024-11-02 14:57   ` kernel test robot
  2024-11-01 16:21 ` [PATCH v2 2/3] futex: Create set_robust_list2 André Almeida
  ` (2 subsequent siblings)
  3 siblings, 2 replies; 18+ messages in thread

From: André Almeida @ 2024-11-01 16:21 UTC (permalink / raw)
To: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Darren Hart, Davidlohr Bueso, Arnd Bergmann, sonicadvance1
Cc: linux-kernel, kernel-dev, linux-api, Nathan Chancellor, André Almeida

There are two functions for handling robust lists during task exit: exit_robust_list() and compat_exit_robust_list(). The first one handles either 64-bit or 32-bit lists, depending on whether it's a 64-bit or 32-bit kernel. compat_exit_robust_list() only exists in 64-bit kernels that support 32-bit syscalls, and handles 32-bit lists.

For the new syscall set_robust_list2(), 64-bit kernels need to be able to handle 32-bit lists regardless of whether they have 32-bit syscall support, so make compat_exit_robust_list() exist regardless of the compat_ config. Also, use explicit sizing; otherwise, in a 32-bit kernel both exit_robust_list() and compat_exit_robust_list() would be exactly the same function, with neither of them dealing with 64-bit robust lists.
Signed-off-by: André Almeida <andrealmeid@igalia.com> --- This code was tested in 3 different setups: - 64bit binary in 64bit kernel - 32bit binary in 64bit kernel - 32bit binary in 32bit kernel Using this selftest: https://lore.kernel.org/lkml/20241010011142.905297-1-andrealmeid@igalia.com/ include/linux/compat.h | 12 +---------- include/linux/futex.h | 11 ++++++++++ include/linux/sched.h | 2 +- kernel/futex/core.c | 45 ++++++++++++++++++++++++++--------------- kernel/futex/syscalls.c | 4 ++-- 5 files changed, 44 insertions(+), 30 deletions(-) diff --git a/include/linux/compat.h b/include/linux/compat.h index 56cebaff0c91..968a9135ff48 100644 --- a/include/linux/compat.h +++ b/include/linux/compat.h @@ -385,16 +385,6 @@ struct compat_ifconf { compat_caddr_t ifcbuf; }; -struct compat_robust_list { - compat_uptr_t next; -}; - -struct compat_robust_list_head { - struct compat_robust_list list; - compat_long_t futex_offset; - compat_uptr_t list_op_pending; -}; - #ifdef CONFIG_COMPAT_OLD_SIGACTION struct compat_old_sigaction { compat_uptr_t sa_handler; @@ -672,7 +662,7 @@ asmlinkage long compat_sys_waitid(int, compat_pid_t, struct compat_siginfo __user *, int, struct compat_rusage __user *); asmlinkage long -compat_sys_set_robust_list(struct compat_robust_list_head __user *head, +compat_sys_set_robust_list(struct robust_list_head32 __user *head, compat_size_t len); asmlinkage long compat_sys_get_robust_list(int pid, compat_uptr_t __user *head_ptr, diff --git a/include/linux/futex.h b/include/linux/futex.h index b70df27d7e85..8217b5ebdd9c 100644 --- a/include/linux/futex.h +++ b/include/linux/futex.h @@ -53,6 +53,17 @@ union futex_key { #define FUTEX_KEY_INIT (union futex_key) { .both = { .ptr = 0ULL } } #ifdef CONFIG_FUTEX + +struct robust_list32 { + u32 next; +}; + +struct robust_list_head32 { + struct robust_list32 list; + s32 futex_offset; + u32 list_op_pending; +}; + enum { FUTEX_STATE_OK, FUTEX_STATE_EXITING, diff --git a/include/linux/sched.h 
b/include/linux/sched.h index bb343136ddd0..8f20b703557d 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1282,7 +1282,7 @@ struct task_struct { #ifdef CONFIG_FUTEX struct robust_list_head __user *robust_list; #ifdef CONFIG_COMPAT - struct compat_robust_list_head __user *compat_robust_list; + struct robust_list_head32 __user *compat_robust_list; #endif struct list_head pi_state_list; struct futex_pi_state *pi_state_cache; diff --git a/kernel/futex/core.c b/kernel/futex/core.c index 136768ae2637..bcd0e2a7ba65 100644 --- a/kernel/futex/core.c +++ b/kernel/futex/core.c @@ -790,13 +790,14 @@ static inline int fetch_robust_entry(struct robust_list __user **entry, return 0; } +#ifdef CONFIG_64BIT /* * Walk curr->robust_list (very carefully, it's a userspace list!) * and mark any locks found there dead, and notify any waiters. * * We silently return on any sign of list-walking problem. */ -static void exit_robust_list(struct task_struct *curr) +static void exit_robust_list64(struct task_struct *curr) { struct robust_list_head __user *head = curr->robust_list; struct robust_list __user *entry, *next_entry, *pending; @@ -857,8 +858,14 @@ static void exit_robust_list(struct task_struct *curr) curr, pip, HANDLE_DEATH_PENDING); } } +#else +static void exit_robust_list64(struct task_struct *curr) +{ + pr_warn("32bit kernel should not allow ROBUST_LIST_64BIT"); + return; +} +#endif -#ifdef CONFIG_COMPAT static void __user *futex_uaddr(struct robust_list __user *entry, compat_long_t futex_offset) { @@ -872,13 +879,13 @@ static void __user *futex_uaddr(struct robust_list __user *entry, * Fetch a robust-list pointer. 
Bit 0 signals PI futexes: */ static inline int -compat_fetch_robust_entry(compat_uptr_t *uentry, struct robust_list __user **entry, - compat_uptr_t __user *head, unsigned int *pi) +fetch_robust_entry32(u32 *uentry, struct robust_list __user **entry, + u32 __user *head, unsigned int *pi) { if (get_user(*uentry, head)) return -EFAULT; - *entry = compat_ptr((*uentry) & ~1); + *entry = (void __user *)(unsigned long)((*uentry) & ~1); *pi = (unsigned int)(*uentry) & 1; return 0; @@ -890,21 +897,21 @@ compat_fetch_robust_entry(compat_uptr_t *uentry, struct robust_list __user **ent * * We silently return on any sign of list-walking problem. */ -static void compat_exit_robust_list(struct task_struct *curr) +static void exit_robust_list32(struct task_struct *curr) { - struct compat_robust_list_head __user *head = curr->compat_robust_list; + struct robust_list_head32 __user *head = curr->compat_robust_list; struct robust_list __user *entry, *next_entry, *pending; unsigned int limit = ROBUST_LIST_LIMIT, pi, pip; unsigned int next_pi; - compat_uptr_t uentry, next_uentry, upending; - compat_long_t futex_offset; + u32 uentry, next_uentry, upending; + s32 futex_offset; int rc; /* * Fetch the list head (which was registered earlier, via * sys_set_robust_list()): */ - if (compat_fetch_robust_entry(&uentry, &entry, &head->list.next, &pi)) + if (fetch_robust_entry32((u32 *)&uentry, &entry, (u32 *)&head->list.next, &pi)) return; /* * Fetch the relative futex offset: @@ -915,7 +922,7 @@ static void compat_exit_robust_list(struct task_struct *curr) * Fetch any possibly pending lock-add first, and handle it * if it exists: */ - if (compat_fetch_robust_entry(&upending, &pending, + if (fetch_robust_entry32(&upending, &pending, &head->list_op_pending, &pip)) return; @@ -925,8 +932,8 @@ static void compat_exit_robust_list(struct task_struct *curr) * Fetch the next entry in the list before calling * handle_futex_death: */ - rc = compat_fetch_robust_entry(&next_uentry, &next_entry, - 
(compat_uptr_t __user *)&entry->next, &next_pi); + rc = fetch_robust_entry32(&next_uentry, &next_entry, + (u32 __user *)&entry->next, &next_pi); /* * A pending lock might already be on the list, so * dont process it twice: @@ -957,7 +964,6 @@ static void compat_exit_robust_list(struct task_struct *curr) handle_futex_death(uaddr, curr, pip, HANDLE_DEATH_PENDING); } } -#endif #ifdef CONFIG_FUTEX_PI @@ -1040,14 +1046,21 @@ static inline void exit_pi_state_list(struct task_struct *curr) { } static void futex_cleanup(struct task_struct *tsk) { +#ifdef CONFIG_64BIT if (unlikely(tsk->robust_list)) { - exit_robust_list(tsk); + exit_robust_list64(tsk); tsk->robust_list = NULL; } +#else + if (unlikely(tsk->robust_list)) { + exit_robust_list32(tsk); + tsk->robust_list = NULL; + } +#endif #ifdef CONFIG_COMPAT if (unlikely(tsk->compat_robust_list)) { - compat_exit_robust_list(tsk); + exit_robust_list32(tsk); tsk->compat_robust_list = NULL; } #endif diff --git a/kernel/futex/syscalls.c b/kernel/futex/syscalls.c index 4b6da9116aa6..dba193dfd216 100644 --- a/kernel/futex/syscalls.c +++ b/kernel/futex/syscalls.c @@ -440,7 +440,7 @@ SYSCALL_DEFINE4(futex_requeue, #ifdef CONFIG_COMPAT COMPAT_SYSCALL_DEFINE2(set_robust_list, - struct compat_robust_list_head __user *, head, + struct robust_list_head32 __user *, head, compat_size_t, len) { if (unlikely(len != sizeof(*head))) @@ -455,7 +455,7 @@ COMPAT_SYSCALL_DEFINE3(get_robust_list, int, pid, compat_uptr_t __user *, head_ptr, compat_size_t __user *, len_ptr) { - struct compat_robust_list_head __user *head; + struct robust_list_head32 __user *head; unsigned long ret; struct task_struct *p; -- 2.47.0 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH v2 1/3] futex: Use explicit sizes for compat_exit_robust_list 2024-11-01 16:21 ` [PATCH v2 1/3] futex: Use explicit sizes for compat_exit_robust_list André Almeida @ 2024-11-02 5:44 ` kernel test robot 2024-11-02 14:57 ` kernel test robot 1 sibling, 0 replies; 18+ messages in thread From: kernel test robot @ 2024-11-02 5:44 UTC (permalink / raw) To: André Almeida, Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Darren Hart, Davidlohr Bueso, Arnd Bergmann, sonicadvance1 Cc: oe-kbuild-all, linux-kernel, kernel-dev, linux-api, Nathan Chancellor, André Almeida Hi André, kernel test robot noticed the following build warnings: [auto build test WARNING on tip/locking/core] [also build test WARNING on tip/sched/core linus/master tip/x86/asm v6.12-rc5 next-20241101] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Andr-Almeida/futex-Use-explicit-sizes-for-compat_exit_robust_list/20241102-002419 base: tip/locking/core patch link: https://lore.kernel.org/r/20241101162147.284993-2-andrealmeid%40igalia.com patch subject: [PATCH v2 1/3] futex: Use explicit sizes for compat_exit_robust_list config: alpha-allnoconfig (https://download.01.org/0day-ci/archive/20241102/202411021349.tqg42lGq-lkp@intel.com/config) compiler: alpha-linux-gcc (GCC) 13.3.0 reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241102/202411021349.tqg42lGq-lkp@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. 
not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202411021349.tqg42lGq-lkp@intel.com/ All warnings (new ones prefixed by >>): In file included from include/net/compat.h:8, from include/net/scm.h:13, from include/linux/netlink.h:9, from include/uapi/linux/neighbour.h:6, from include/linux/netdevice.h:44, from include/net/sock.h:46, from include/linux/tcp.h:19, from include/linux/ipv6.h:102, from include/net/ipv6.h:12, from include/linux/sunrpc/clnt.h:29, from include/linux/nfs_fs.h:32, from init/do_mounts.c:23: >> include/linux/compat.h:665:35: warning: 'struct robust_list_head32' declared inside parameter list will not be visible outside of this definition or declaration 665 | compat_sys_set_robust_list(struct robust_list_head32 __user *head, | ^~~~~~~~~~~~~~~~~~ -- In file included from include/net/compat.h:8, from include/net/scm.h:13, from include/linux/netlink.h:9, from include/uapi/linux/neighbour.h:6, from include/linux/netdevice.h:44, from include/net/sock.h:46, from include/linux/tcp.h:19, from include/linux/ipv6.h:102, from include/net/addrconf.h:65, from lib/vsprintf.c:41: >> include/linux/compat.h:665:35: warning: 'struct robust_list_head32' declared inside parameter list will not be visible outside of this definition or declaration 665 | compat_sys_set_robust_list(struct robust_list_head32 __user *head, | ^~~~~~~~~~~~~~~~~~ lib/vsprintf.c: In function 'va_format': lib/vsprintf.c:1683:9: warning: function 'va_format' might be a candidate for 'gnu_printf' format attribute [-Wsuggest-attribute=format] 1683 | buf += vsnprintf(buf, end > buf ? 
end - buf : 0, va_fmt->fmt, va); | ^~~ -- In file included from kernel/time/hrtimer.c:44: >> include/linux/compat.h:665:35: warning: 'struct robust_list_head32' declared inside parameter list will not be visible outside of this definition or declaration 665 | compat_sys_set_robust_list(struct robust_list_head32 __user *head, | ^~~~~~~~~~~~~~~~~~ kernel/time/hrtimer.c:121:35: warning: initialized field overwritten [-Woverride-init] 121 | [CLOCK_REALTIME] = HRTIMER_BASE_REALTIME, | ^~~~~~~~~~~~~~~~~~~~~ kernel/time/hrtimer.c:121:35: note: (near initialization for 'hrtimer_clock_to_base_table[0]') kernel/time/hrtimer.c:122:35: warning: initialized field overwritten [-Woverride-init] 122 | [CLOCK_MONOTONIC] = HRTIMER_BASE_MONOTONIC, | ^~~~~~~~~~~~~~~~~~~~~~ kernel/time/hrtimer.c:122:35: note: (near initialization for 'hrtimer_clock_to_base_table[1]') kernel/time/hrtimer.c:123:35: warning: initialized field overwritten [-Woverride-init] 123 | [CLOCK_BOOTTIME] = HRTIMER_BASE_BOOTTIME, | ^~~~~~~~~~~~~~~~~~~~~ kernel/time/hrtimer.c:123:35: note: (near initialization for 'hrtimer_clock_to_base_table[7]') kernel/time/hrtimer.c:124:35: warning: initialized field overwritten [-Woverride-init] 124 | [CLOCK_TAI] = HRTIMER_BASE_TAI, | ^~~~~~~~~~~~~~~~ kernel/time/hrtimer.c:124:35: note: (near initialization for 'hrtimer_clock_to_base_table[11]') -- In file included from kernel/futex/core.c:34: >> include/linux/compat.h:665:35: warning: 'struct robust_list_head32' declared inside parameter list will not be visible outside of this definition or declaration 665 | compat_sys_set_robust_list(struct robust_list_head32 __user *head, | ^~~~~~~~~~~~~~~~~~ kernel/futex/core.c: In function 'exit_robust_list32': kernel/futex/core.c:902:54: error: 'struct task_struct' has no member named 'compat_robust_list' 902 | struct robust_list_head32 __user *head = curr->compat_robust_list; | ^~ kernel/futex/core.c: At top level: kernel/futex/core.c:900:13: warning: 'exit_robust_list32' defined but not 
used [-Wunused-function] 900 | static void exit_robust_list32(struct task_struct *curr) | ^~~~~~~~~~~~~~~~~~ vim +665 include/linux/compat.h 621 622 #ifdef __ARCH_WANT_COMPAT_SYS_PWRITEV64 623 asmlinkage long compat_sys_pwritev64(unsigned long fd, 624 const struct iovec __user *vec, 625 unsigned long vlen, loff_t pos); 626 #endif 627 asmlinkage long compat_sys_sendfile(int out_fd, int in_fd, 628 compat_off_t __user *offset, compat_size_t count); 629 asmlinkage long compat_sys_sendfile64(int out_fd, int in_fd, 630 compat_loff_t __user *offset, compat_size_t count); 631 asmlinkage long compat_sys_pselect6_time32(int n, compat_ulong_t __user *inp, 632 compat_ulong_t __user *outp, 633 compat_ulong_t __user *exp, 634 struct old_timespec32 __user *tsp, 635 void __user *sig); 636 asmlinkage long compat_sys_pselect6_time64(int n, compat_ulong_t __user *inp, 637 compat_ulong_t __user *outp, 638 compat_ulong_t __user *exp, 639 struct __kernel_timespec __user *tsp, 640 void __user *sig); 641 asmlinkage long compat_sys_ppoll_time32(struct pollfd __user *ufds, 642 unsigned int nfds, 643 struct old_timespec32 __user *tsp, 644 const compat_sigset_t __user *sigmask, 645 compat_size_t sigsetsize); 646 asmlinkage long compat_sys_ppoll_time64(struct pollfd __user *ufds, 647 unsigned int nfds, 648 struct __kernel_timespec __user *tsp, 649 const compat_sigset_t __user *sigmask, 650 compat_size_t sigsetsize); 651 asmlinkage long compat_sys_signalfd4(int ufd, 652 const compat_sigset_t __user *sigmask, 653 compat_size_t sigsetsize, int flags); 654 asmlinkage long compat_sys_newfstatat(unsigned int dfd, 655 const char __user *filename, 656 struct compat_stat __user *statbuf, 657 int flag); 658 asmlinkage long compat_sys_newfstat(unsigned int fd, 659 struct compat_stat __user *statbuf); 660 /* No generic prototype for sync_file_range and sync_file_range2 */ 661 asmlinkage long compat_sys_waitid(int, compat_pid_t, 662 struct compat_siginfo __user *, int, 663 struct compat_rusage __user *); 
664 asmlinkage long > 665 compat_sys_set_robust_list(struct robust_list_head32 __user *head, 666 compat_size_t len); 667 asmlinkage long 668 compat_sys_get_robust_list(int pid, compat_uptr_t __user *head_ptr, 669 compat_size_t __user *len_ptr); 670 asmlinkage long compat_sys_getitimer(int which, 671 struct old_itimerval32 __user *it); 672 asmlinkage long compat_sys_setitimer(int which, 673 struct old_itimerval32 __user *in, 674 struct old_itimerval32 __user *out); 675 asmlinkage long compat_sys_kexec_load(compat_ulong_t entry, 676 compat_ulong_t nr_segments, 677 struct compat_kexec_segment __user *, 678 compat_ulong_t flags); 679 asmlinkage long compat_sys_timer_create(clockid_t which_clock, 680 struct compat_sigevent __user *timer_event_spec, 681 timer_t __user *created_timer_id); 682 asmlinkage long compat_sys_ptrace(compat_long_t request, compat_long_t pid, 683 compat_long_t addr, compat_long_t data); 684 asmlinkage long compat_sys_sched_setaffinity(compat_pid_t pid, 685 unsigned int len, 686 compat_ulong_t __user *user_mask_ptr); 687 asmlinkage long compat_sys_sched_getaffinity(compat_pid_t pid, 688 unsigned int len, 689 compat_ulong_t __user *user_mask_ptr); 690 asmlinkage long compat_sys_sigaltstack(const compat_stack_t __user *uss_ptr, 691 compat_stack_t __user *uoss_ptr); 692 asmlinkage long compat_sys_rt_sigsuspend(compat_sigset_t __user *unewset, 693 compat_size_t sigsetsize); 694 #ifndef CONFIG_ODD_RT_SIGACTION 695 asmlinkage long compat_sys_rt_sigaction(int, 696 const struct compat_sigaction __user *, 697 struct compat_sigaction __user *, 698 compat_size_t); 699 #endif 700 asmlinkage long compat_sys_rt_sigprocmask(int how, compat_sigset_t __user *set, 701 compat_sigset_t __user *oset, 702 compat_size_t sigsetsize); 703 asmlinkage long compat_sys_rt_sigpending(compat_sigset_t __user *uset, 704 compat_size_t sigsetsize); 705 asmlinkage long compat_sys_rt_sigtimedwait_time32(compat_sigset_t __user *uthese, 706 struct compat_siginfo __user *uinfo, 707 
struct old_timespec32 __user *uts, compat_size_t sigsetsize); 708 asmlinkage long compat_sys_rt_sigtimedwait_time64(compat_sigset_t __user *uthese, 709 struct compat_siginfo __user *uinfo, 710 struct __kernel_timespec __user *uts, compat_size_t sigsetsize); 711 asmlinkage long compat_sys_rt_sigqueueinfo(compat_pid_t pid, int sig, 712 struct compat_siginfo __user *uinfo); 713 /* No generic prototype for rt_sigreturn */ 714 asmlinkage long compat_sys_times(struct compat_tms __user *tbuf); 715 asmlinkage long compat_sys_getrlimit(unsigned int resource, 716 struct compat_rlimit __user *rlim); 717 asmlinkage long compat_sys_setrlimit(unsigned int resource, 718 struct compat_rlimit __user *rlim); 719 asmlinkage long compat_sys_getrusage(int who, struct compat_rusage __user *ru); 720 asmlinkage long compat_sys_gettimeofday(struct old_timeval32 __user *tv, 721 struct timezone __user *tz); 722 asmlinkage long compat_sys_settimeofday(struct old_timeval32 __user *tv, 723 struct timezone __user *tz); 724 asmlinkage long compat_sys_sysinfo(struct compat_sysinfo __user *info); 725 asmlinkage long compat_sys_mq_open(const char __user *u_name, 726 int oflag, compat_mode_t mode, 727 struct compat_mq_attr __user *u_attr); 728 asmlinkage long compat_sys_mq_notify(mqd_t mqdes, 729 const struct compat_sigevent __user *u_notification); 730 asmlinkage long compat_sys_mq_getsetattr(mqd_t mqdes, 731 const struct compat_mq_attr __user *u_mqstat, 732 struct compat_mq_attr __user *u_omqstat); 733 asmlinkage long compat_sys_msgctl(int first, int second, void __user *uptr); 734 asmlinkage long compat_sys_msgrcv(int msqid, compat_uptr_t msgp, 735 compat_ssize_t msgsz, compat_long_t msgtyp, int msgflg); 736 asmlinkage long compat_sys_msgsnd(int msqid, compat_uptr_t msgp, 737 compat_ssize_t msgsz, int msgflg); 738 asmlinkage long compat_sys_semctl(int semid, int semnum, int cmd, int arg); 739 asmlinkage long compat_sys_shmctl(int first, int second, void __user *uptr); 740 asmlinkage long 
compat_sys_shmat(int shmid, compat_uptr_t shmaddr, int shmflg); 741 asmlinkage long compat_sys_recvfrom(int fd, void __user *buf, compat_size_t len, 742 unsigned flags, struct sockaddr __user *addr, 743 int __user *addrlen); 744 asmlinkage long compat_sys_sendmsg(int fd, struct compat_msghdr __user *msg, 745 unsigned flags); 746 asmlinkage long compat_sys_recvmsg(int fd, struct compat_msghdr __user *msg, 747 unsigned int flags); 748 /* No generic prototype for readahead */ 749 asmlinkage long compat_sys_keyctl(u32 option, 750 u32 arg2, u32 arg3, u32 arg4, u32 arg5); 751 asmlinkage long compat_sys_execve(const char __user *filename, const compat_uptr_t __user *argv, 752 const compat_uptr_t __user *envp); 753 /* No generic prototype for fadvise64_64 */ 754 /* CONFIG_MMU only */ 755 asmlinkage long compat_sys_rt_tgsigqueueinfo(compat_pid_t tgid, 756 compat_pid_t pid, int sig, 757 struct compat_siginfo __user *uinfo); 758 asmlinkage long compat_sys_recvmmsg_time64(int fd, struct compat_mmsghdr __user *mmsg, 759 unsigned vlen, unsigned int flags, 760 struct __kernel_timespec __user *timeout); 761 asmlinkage long compat_sys_recvmmsg_time32(int fd, struct compat_mmsghdr __user *mmsg, 762 unsigned vlen, unsigned int flags, 763 struct old_timespec32 __user *timeout); 764 asmlinkage long compat_sys_wait4(compat_pid_t pid, 765 compat_uint_t __user *stat_addr, int options, 766 struct compat_rusage __user *ru); 767 asmlinkage long compat_sys_fanotify_mark(int, unsigned int, __u32, __u32, 768 int, const char __user *); 769 asmlinkage long compat_sys_open_by_handle_at(int mountdirfd, 770 struct file_handle __user *handle, 771 int flags); 772 asmlinkage long compat_sys_sendmmsg(int fd, struct compat_mmsghdr __user *mmsg, 773 unsigned vlen, unsigned int flags); 774 asmlinkage long compat_sys_execveat(int dfd, const char __user *filename, 775 const compat_uptr_t __user *argv, 776 const compat_uptr_t __user *envp, int flags); 777 asmlinkage ssize_t compat_sys_preadv2(compat_ulong_t 
fd, 778 const struct iovec __user *vec, 779 compat_ulong_t vlen, u32 pos_low, u32 pos_high, rwf_t flags); 780 asmlinkage ssize_t compat_sys_pwritev2(compat_ulong_t fd, 781 const struct iovec __user *vec, 782 compat_ulong_t vlen, u32 pos_low, u32 pos_high, rwf_t flags); 783 #ifdef __ARCH_WANT_COMPAT_SYS_PREADV64V2 784 asmlinkage long compat_sys_preadv64v2(unsigned long fd, 785 const struct iovec __user *vec, 786 unsigned long vlen, loff_t pos, rwf_t flags); 787 #endif 788 -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 1/3] futex: Use explicit sizes for compat_exit_robust_list 2024-11-01 16:21 ` [PATCH v2 1/3] futex: Use explicit sizes for compat_exit_robust_list André Almeida 2024-11-02 5:44 ` kernel test robot @ 2024-11-02 14:57 ` kernel test robot 1 sibling, 0 replies; 18+ messages in thread From: kernel test robot @ 2024-11-02 14:57 UTC (permalink / raw) To: André Almeida, Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Darren Hart, Davidlohr Bueso, Arnd Bergmann, sonicadvance1 Cc: oe-kbuild-all, linux-kernel, kernel-dev, linux-api, Nathan Chancellor, André Almeida Hi André, kernel test robot noticed the following build warnings: [auto build test WARNING on tip/locking/core] [also build test WARNING on tip/sched/core linus/master tip/x86/asm v6.12-rc5 next-20241101] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Andr-Almeida/futex-Use-explicit-sizes-for-compat_exit_robust_list/20241102-002419 base: tip/locking/core patch link: https://lore.kernel.org/r/20241101162147.284993-2-andrealmeid%40igalia.com patch subject: [PATCH v2 1/3] futex: Use explicit sizes for compat_exit_robust_list config: x86_64-randconfig-123-20241102 (https://download.01.org/0day-ci/archive/20241102/202411022242.XCJECOCz-lkp@intel.com/config) compiler: gcc-12 (Debian 12.2.0-14) 12.2.0 reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241102/202411022242.XCJECOCz-lkp@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. 
not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202411022242.XCJECOCz-lkp@intel.com/ sparse warnings: (new ones prefixed by >>) >> kernel/futex/core.c:914:59: sparse: sparse: cast removes address space '__user' of expression >> kernel/futex/core.c:914:59: sparse: sparse: incorrect type in argument 3 (different address spaces) @@ expected unsigned int [noderef] [usertype] __user *head @@ got unsigned int [usertype] * @@ kernel/futex/core.c:914:59: sparse: expected unsigned int [noderef] [usertype] __user *head kernel/futex/core.c:914:59: sparse: got unsigned int [usertype] * vim +/__user +914 kernel/futex/core.c 893 894 /* 895 * Walk curr->robust_list (very carefully, it's a userspace list!) 896 * and mark any locks found there dead, and notify any waiters. 897 * 898 * We silently return on any sign of list-walking problem. 899 */ 900 static void exit_robust_list32(struct task_struct *curr) 901 { 902 struct robust_list_head32 __user *head = curr->compat_robust_list; 903 struct robust_list __user *entry, *next_entry, *pending; 904 unsigned int limit = ROBUST_LIST_LIMIT, pi, pip; 905 unsigned int next_pi; 906 u32 uentry, next_uentry, upending; 907 s32 futex_offset; 908 int rc; 909 910 /* 911 * Fetch the list head (which was registered earlier, via 912 * sys_set_robust_list()): 913 */ > 914 if (fetch_robust_entry32((u32 *)&uentry, &entry, (u32 *)&head->list.next, &pi)) 915 return; 916 /* 917 * Fetch the relative futex offset: 918 */ 919 if (get_user(futex_offset, &head->futex_offset)) 920 return; 921 /* 922 * Fetch any possibly pending lock-add first, and handle it 923 * if it exists: 924 */ 925 if (fetch_robust_entry32(&upending, &pending, 926 &head->list_op_pending, &pip)) 927 return; 928 929 next_entry = NULL; /* avoid warning with gcc */ 930 while (entry != (struct robust_list __user *) &head->list) { 931 /* 932 * Fetch the next entry in the 
list before calling 933 * handle_futex_death: 934 */ 935 rc = fetch_robust_entry32(&next_uentry, &next_entry, 936 (u32 __user *)&entry->next, &next_pi); 937 /* 938 * A pending lock might already be on the list, so 939 * dont process it twice: 940 */ 941 if (entry != pending) { 942 void __user *uaddr = futex_uaddr(entry, futex_offset); 943 944 if (handle_futex_death(uaddr, curr, pi, 945 HANDLE_DEATH_LIST)) 946 return; 947 } 948 if (rc) 949 return; 950 uentry = next_uentry; 951 entry = next_entry; 952 pi = next_pi; 953 /* 954 * Avoid excessively long or circular lists: 955 */ 956 if (!--limit) 957 break; 958 959 cond_resched(); 960 } 961 if (pending) { 962 void __user *uaddr = futex_uaddr(pending, futex_offset); 963 964 handle_futex_death(uaddr, curr, pip, HANDLE_DEATH_PENDING); 965 } 966 } 967 -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v2 2/3] futex: Create set_robust_list2 2024-11-01 16:21 [PATCH v2 0/3] futex: Create set_robust_list2 André Almeida 2024-11-01 16:21 ` [PATCH v2 1/3] futex: Use explicit sizes for compat_exit_robust_list André Almeida @ 2024-11-01 16:21 ` André Almeida 2024-11-02 16:40 ` kernel test robot 2024-11-04 11:22 ` Peter Zijlstra 2024-11-01 16:21 ` [PATCH v2 3/3] futex: Wire up set_robust_list2 syscall André Almeida 2024-11-02 21:58 ` [PATCH v2 0/3] futex: Create set_robust_list2 Florian Weimer 3 siblings, 2 replies; 18+ messages in thread From: André Almeida @ 2024-11-01 16:21 UTC (permalink / raw) To: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Darren Hart, Davidlohr Bueso, Arnd Bergmann, sonicadvance1 Cc: linux-kernel, kernel-dev, linux-api, Nathan Chancellor, André Almeida Create a new set_robust_list2() syscall. The current syscall can't be expanded to cover the following use case, so a new one is needed. This new syscall allows users to set multiple robust lists per process and to have either 32bit or 64bit pointers in the list. * Interface This is the proposed interface: long set_robust_list2(void *head, int index, unsigned int flags) `head` is the head of the userspace struct robust_list_head, just as in the old set_robust_list(). It needs to be a void pointer since it can point to a normal robust_list_head or a compat_robust_list_head. `flags` can be used for defining the list type: enum robust_list2_type { ROBUST_LIST_32BIT, ROBUST_LIST_64BIT, }; `index` is the index in the internal robust_list's linked list (the naming starts to get confusing, I reckon). If `index == -1`, the user wants to set a new robust_list, and the kernel will append it at the end of the list, assign a new index, and return this index to the user. If `index >= 0`, the user wants to re-set `*head` of an already existing list (similar to what happens when set_robust_list() is called twice with different `*head`).
If `index` is out of range, or it points to a non-existing robust_list, or if the internal list is full, an error is returned. User cannot remove lists. Signed-off-by: André Almeida <andrealmeid@igalia.com> --- include/linux/futex.h | 1 + include/linux/sched.h | 1 + include/uapi/asm-generic/unistd.h | 5 +- include/uapi/linux/futex.h | 24 +++++++++ init/init_task.c | 3 ++ kernel/futex/core.c | 85 ++++++++++++++++++++++++++++--- kernel/futex/futex.h | 3 ++ kernel/futex/syscalls.c | 36 +++++++++++++ 8 files changed, 149 insertions(+), 9 deletions(-) diff --git a/include/linux/futex.h b/include/linux/futex.h index 8217b5ebdd9c..997fe0013bc0 100644 --- a/include/linux/futex.h +++ b/include/linux/futex.h @@ -76,6 +76,7 @@ static inline void futex_init_task(struct task_struct *tsk) #ifdef CONFIG_COMPAT tsk->compat_robust_list = NULL; #endif + INIT_LIST_HEAD(&tsk->robust_list2); INIT_LIST_HEAD(&tsk->pi_state_list); tsk->pi_state_cache = NULL; tsk->futex_state = FUTEX_STATE_OK; diff --git a/include/linux/sched.h b/include/linux/sched.h index 8f20b703557d..4a2455f1b07c 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1284,6 +1284,7 @@ struct task_struct { #ifdef CONFIG_COMPAT struct robust_list_head32 __user *compat_robust_list; #endif + struct list_head robust_list2; struct list_head pi_state_list; struct futex_pi_state *pi_state_cache; struct mutex futex_exit_mutex; diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h index 5bf6148cac2b..c1f5c9635c07 100644 --- a/include/uapi/asm-generic/unistd.h +++ b/include/uapi/asm-generic/unistd.h @@ -841,8 +841,11 @@ __SYSCALL(__NR_lsm_list_modules, sys_lsm_list_modules) #define __NR_mseal 462 __SYSCALL(__NR_mseal, sys_mseal) +#define __NR_set_robust_list2 463 +__SYSCALL(__NR_set_robust_list2, sys_set_robust_list2) + #undef __NR_syscalls -#define __NR_syscalls 463 +#define __NR_syscalls 464 /* * 32 bit systems traditionally used different diff --git a/include/uapi/linux/futex.h 
b/include/uapi/linux/futex.h index d2ee625ea189..13903a278b71 100644 --- a/include/uapi/linux/futex.h +++ b/include/uapi/linux/futex.h @@ -146,6 +146,30 @@ struct robust_list_head { struct robust_list __user *list_op_pending; }; +#define ROBUST_LISTS_PER_TASK 10 + +enum robust_list2_type { + ROBUST_LIST_32BIT, + ROBUST_LIST_64BIT, +}; + +#define ROBUST_LIST_TYPE_MASK (ROBUST_LIST_32BIT | ROBUST_LIST_64BIT) + +/* + * This is an entry of a linked list of robust lists. + * + * @head: can point to a 64bit list or a 32bit list + * @list_type: determine the size of the futex pointers in the list + * @index: the index of this entry in the list + * @list: linked list element + */ +struct robust_list2_entry { + void __user *head; + enum robust_list2_type list_type; + unsigned int index; + struct list_head list; +}; + /* * Are there any waiters for this robust futex: */ diff --git a/init/init_task.c b/init/init_task.c index 136a8231355a..1b08e745c47d 100644 --- a/init/init_task.c +++ b/init/init_task.c @@ -219,6 +219,9 @@ struct task_struct init_task __aligned(L1_CACHE_BYTES) = { #ifdef CONFIG_SECCOMP_FILTER .seccomp = { .filter_count = ATOMIC_INIT(0) }, #endif +#ifdef CONFIG_FUTEX + .robust_list2 = LIST_HEAD_INIT(init_task.robust_list2), +#endif }; EXPORT_SYMBOL(init_task); diff --git a/kernel/futex/core.c b/kernel/futex/core.c index bcd0e2a7ba65..f74476d0bcc1 100644 --- a/kernel/futex/core.c +++ b/kernel/futex/core.c @@ -797,9 +797,9 @@ static inline int fetch_robust_entry(struct robust_list __user **entry, * * We silently return on any sign of list-walking problem. 
*/ -static void exit_robust_list64(struct task_struct *curr) +static void exit_robust_list64(struct task_struct *curr, + struct robust_list_head __user *head) { - struct robust_list_head __user *head = curr->robust_list; struct robust_list __user *entry, *next_entry, *pending; unsigned int limit = ROBUST_LIST_LIMIT, pi, pip; unsigned int next_pi; @@ -859,7 +859,8 @@ static void exit_robust_list64(struct task_struct *curr) } } #else -static void exit_robust_list64(struct task_struct *curr) +static void exit_robust_list64(struct task_struct *curr, + struct robust_list_head __user *head) { pr_warn("32bit kernel should not allow ROBUST_LIST_64BIT"); return; @@ -897,9 +898,9 @@ fetch_robust_entry32(u32 *uentry, struct robust_list __user **entry, * * We silently return on any sign of list-walking problem. */ -static void exit_robust_list32(struct task_struct *curr) +static void exit_robust_list32(struct task_struct *curr, + struct robust_list_head32 __user *head) { - struct robust_list_head32 __user *head = curr->compat_robust_list; struct robust_list __user *entry, *next_entry, *pending; unsigned int limit = ROBUST_LIST_LIMIT, pi, pip; unsigned int next_pi; @@ -965,6 +966,54 @@ static void exit_robust_list32(struct task_struct *curr) } } +long do_set_robust_list2(struct robust_list_head __user *head, + int index, unsigned int type) +{ + struct list_head *list2 = &current->robust_list2; + struct robust_list2_entry *prev, *new = NULL; + + if (index == -1) { + if (list_empty(list2)) { + index = 0; + } else { + prev = list_last_entry(list2, struct robust_list2_entry, list); + index = prev->index + 1; + } + + if (index >= ROBUST_LISTS_PER_TASK) + return -EINVAL; + + new = kmalloc(sizeof(struct robust_list2_entry), GFP_KERNEL); + if (!new) + return -ENOMEM; + + list_add_tail(&new->list, list2); + new->index = index; + + } else if (index >= 0) { + struct robust_list2_entry *curr; + + if (list_empty(list2)) + return -ENOENT; + + list_for_each_entry(curr, list2, list) { + if (index ==
curr->index) { + new = curr; + break; + } + } + + if (!new) + return -ENOENT; + } + + BUG_ON(!new); + new->head = head; + new->list_type = type; + + return index; +} + #ifdef CONFIG_FUTEX_PI /* @@ -1046,24 +1095,44 @@ static inline void exit_pi_state_list(struct task_struct *curr) { } static void futex_cleanup(struct task_struct *tsk) { + struct robust_list2_entry *curr, *n; + struct list_head *list2 = &tsk->robust_list2; + #ifdef CONFIG_64BIT if (unlikely(tsk->robust_list)) { - exit_robust_list64(tsk); + exit_robust_list64(tsk, tsk->robust_list); tsk->robust_list = NULL; } #else if (unlikely(tsk->robust_list)) { - exit_robust_list32(tsk); + exit_robust_list32(tsk, (struct robust_list_head32 *) tsk->robust_list); tsk->robust_list = NULL; } #endif #ifdef CONFIG_COMPAT if (unlikely(tsk->compat_robust_list)) { - exit_robust_list32(tsk); + exit_robust_list32(tsk, tsk->compat_robust_list); tsk->compat_robust_list = NULL; } #endif + /* + * Walk through the linked list, parsing robust lists and freeing the + * allocated lists + */ + if (unlikely(!list_empty(list2))) { + list_for_each_entry_safe(curr, n, list2, list) { + if (curr->head != NULL) { + if (curr->list_type == ROBUST_LIST_64BIT) + exit_robust_list64(tsk, curr->head); + else if (curr->list_type == ROBUST_LIST_32BIT) + exit_robust_list32(tsk, curr->head); + curr->head = NULL; + } + list_del_init(&curr->list); + kfree(curr); + } + } if (unlikely(!list_empty(&tsk->pi_state_list))) exit_pi_state_list(tsk); diff --git a/kernel/futex/futex.h b/kernel/futex/futex.h index 8b195d06f4e8..7247d5c583d5 100644 --- a/kernel/futex/futex.h +++ b/kernel/futex/futex.h @@ -349,6 +349,9 @@ extern int __futex_wait(u32 __user *uaddr, unsigned int flags, u32 val, extern int futex_wait(u32 __user *uaddr, unsigned int flags, u32 val, ktime_t *abs_time, u32 bitset); +extern long do_set_robust_list2(struct robust_list_head __user *head, + int index, unsigned int type); + /** * struct futex_vector - Auxiliary struct for futex_waitv() * @w: 
Userspace provided data diff --git a/kernel/futex/syscalls.c b/kernel/futex/syscalls.c index dba193dfd216..ff61570bb9c8 100644 --- a/kernel/futex/syscalls.c +++ b/kernel/futex/syscalls.c @@ -39,6 +39,42 @@ SYSCALL_DEFINE2(set_robust_list, struct robust_list_head __user *, head, return 0; } +#define ROBUST_LIST_FLAGS ROBUST_LIST_TYPE_MASK + +/* + * sys_set_robust_list2() + * + * When index == -1, create a new list for the user. When index >= 0, try to find + * the corresponding list and re-set the head there. + * + * Return values: + * >= 0: success, index of the robust list + * -EINVAL: invalid flags, invalid index, or too many allocated lists + * -ENOENT: requested index nowhere to be found + * -ENOMEM: error allocating new list + */ +SYSCALL_DEFINE3(set_robust_list2, struct robust_list_head __user *, head, + int, index, unsigned int, flags) +{ + unsigned int type; + + type = flags & ROBUST_LIST_TYPE_MASK; + + if (index < -1 || index >= ROBUST_LISTS_PER_TASK) + return -EINVAL; + + if ((flags & ~ROBUST_LIST_FLAGS) != 0) + return -EINVAL; + +#ifndef CONFIG_64BIT + if (type == ROBUST_LIST_64BIT) + return -EINVAL; +#endif + + return do_set_robust_list2(head, index, type); +} + /** * sys_get_robust_list() - Get the robust-futex list head of a task * @pid: pid of the process [zero for current task] -- 2.47.0 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH v2 2/3] futex: Create set_robust_list2 2024-11-01 16:21 ` [PATCH v2 2/3] futex: Create set_robust_list2 André Almeida @ 2024-11-02 16:40 ` kernel test robot 2024-11-04 11:22 ` Peter Zijlstra 1 sibling, 0 replies; 18+ messages in thread From: kernel test robot @ 2024-11-02 16:40 UTC (permalink / raw) To: André Almeida, Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Darren Hart, Davidlohr Bueso, Arnd Bergmann, sonicadvance1 Cc: oe-kbuild-all, linux-kernel, kernel-dev, linux-api, Nathan Chancellor, André Almeida Hi André, kernel test robot noticed the following build warnings: [auto build test WARNING on tip/locking/core] [also build test WARNING on tip/sched/core linus/master v6.12-rc5 next-20241101] [cannot apply to tip/x86/asm] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Andr-Almeida/futex-Use-explicit-sizes-for-compat_exit_robust_list/20241102-002419 base: tip/locking/core patch link: https://lore.kernel.org/r/20241101162147.284993-3-andrealmeid%40igalia.com patch subject: [PATCH v2 2/3] futex: Create set_robust_list2 config: i386-randconfig-062-20241102 (https://download.01.org/0day-ci/archive/20241103/202411030038.W5DgvCYP-lkp@intel.com/config) compiler: gcc-12 (Debian 12.2.0-14) 12.2.0 reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241103/202411030038.W5DgvCYP-lkp@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. 
not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202411030038.W5DgvCYP-lkp@intel.com/ sparse warnings: (new ones prefixed by >>) kernel/futex/core.c:915:59: sparse: sparse: cast removes address space '__user' of expression kernel/futex/core.c:915:59: sparse: sparse: incorrect type in argument 3 (different address spaces) @@ expected unsigned int [noderef] [usertype] __user *head @@ got unsigned int [usertype] * @@ kernel/futex/core.c:915:59: sparse: expected unsigned int [noderef] [usertype] __user *head kernel/futex/core.c:915:59: sparse: got unsigned int [usertype] * kernel/futex/core.c:1108:42: sparse: sparse: cast removes address space '__user' of expression >> kernel/futex/core.c:1108:42: sparse: sparse: incorrect type in argument 2 (different address spaces) @@ expected struct robust_list_head32 [noderef] __user *head @@ got struct robust_list_head32 * @@ kernel/futex/core.c:1108:42: sparse: expected struct robust_list_head32 [noderef] __user *head kernel/futex/core.c:1108:42: sparse: got struct robust_list_head32 * kernel/futex/core.c: note: in included file (through include/linux/smp.h, include/linux/alloc_tag.h, include/linux/percpu.h, ...): include/linux/list.h:83:21: sparse: sparse: self-comparison always evaluates to true vim +1108 kernel/futex/core.c 1095 1096 static void futex_cleanup(struct task_struct *tsk) 1097 { 1098 struct robust_list2_entry *curr, *n; 1099 struct list_head *list2 = &tsk->robust_list2; 1100 1101 #ifdef CONFIG_64BIT 1102 if (unlikely(tsk->robust_list)) { 1103 exit_robust_list64(tsk, tsk->robust_list); 1104 tsk->robust_list = NULL; 1105 } 1106 #else 1107 if (unlikely(tsk->robust_list)) { > 1108 exit_robust_list32(tsk, (struct robust_list_head32 *) tsk->robust_list); 1109 tsk->robust_list = NULL; 1110 } 1111 #endif 1112 -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki ^ permalink raw 
reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 2/3] futex: Create set_robust_list2 2024-11-01 16:21 ` [PATCH v2 2/3] futex: Create set_robust_list2 André Almeida 2024-11-02 16:40 ` kernel test robot @ 2024-11-04 11:22 ` Peter Zijlstra 2024-11-04 21:55 ` André Almeida 1 sibling, 1 reply; 18+ messages in thread From: Peter Zijlstra @ 2024-11-04 11:22 UTC (permalink / raw) To: André Almeida Cc: Thomas Gleixner, Ingo Molnar, Darren Hart, Davidlohr Bueso, Arnd Bergmann, sonicadvance1, linux-kernel, kernel-dev, linux-api, Nathan Chancellor On Fri, Nov 01, 2024 at 01:21:46PM -0300, André Almeida wrote: > @@ -1046,24 +1095,44 @@ static inline void exit_pi_state_list(struct task_struct *curr) { } > > static void futex_cleanup(struct task_struct *tsk) > { > + struct robust_list2_entry *curr, *n; > + struct list_head *list2 = &tsk->robust_list2; > + > #ifdef CONFIG_64BIT > if (unlikely(tsk->robust_list)) { > - exit_robust_list64(tsk); > + exit_robust_list64(tsk, tsk->robust_list); > tsk->robust_list = NULL; > } > #else > if (unlikely(tsk->robust_list)) { > - exit_robust_list32(tsk); > + exit_robust_list32(tsk, (struct robust_list_head32 *) tsk->robust_list); > tsk->robust_list = NULL; > } > #endif > > #ifdef CONFIG_COMPAT > if (unlikely(tsk->compat_robust_list)) { > - exit_robust_list32(tsk); > + exit_robust_list32(tsk, tsk->compat_robust_list); > tsk->compat_robust_list = NULL; > } > #endif > + /* > + * Walk through the linked list, parsing robust lists and freeing the > + * allocated lists > + */ > + if (unlikely(!list_empty(list2))) { > + list_for_each_entry_safe(curr, n, list2, list) { > + if (curr->head != NULL) { > + if (curr->list_type == ROBUST_LIST_64BIT) > + exit_robust_list64(tsk, curr->head); > + else if (curr->list_type == ROBUST_LIST_32BIT) > + exit_robust_list32(tsk, curr->head); > + curr->head = NULL; > + } > + list_del_init(&curr->list); > + kfree(curr); > + } > + } > > if (unlikely(!list_empty(&tsk->pi_state_list))) > exit_pi_state_list(tsk); I'm still digesting this, but the above seems 
particularly silly. Should not the legacy lists also be on the list of lists? I mean, it makes no sense to have two completely separate means of tracking lists. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 2/3] futex: Create set_robust_list2 2024-11-04 11:22 ` Peter Zijlstra @ 2024-11-04 21:55 ` André Almeida 2024-11-05 12:10 ` Peter Zijlstra 0 siblings, 1 reply; 18+ messages in thread From: André Almeida @ 2024-11-04 21:55 UTC (permalink / raw) To: Peter Zijlstra Cc: Thomas Gleixner, Ingo Molnar, Darren Hart, Davidlohr Bueso, Arnd Bergmann, sonicadvance1, linux-kernel, kernel-dev, linux-api, Nathan Chancellor Hi Peter, Em 04/11/2024 08:22, Peter Zijlstra escreveu: > On Fri, Nov 01, 2024 at 01:21:46PM -0300, André Almeida wrote: >> @@ -1046,24 +1095,44 @@ static inline void exit_pi_state_list(struct task_struct *curr) { } >> >> static void futex_cleanup(struct task_struct *tsk) >> { >> + struct robust_list2_entry *curr, *n; >> + struct list_head *list2 = &tsk->robust_list2; >> + >> #ifdef CONFIG_64BIT >> if (unlikely(tsk->robust_list)) { >> - exit_robust_list64(tsk); >> + exit_robust_list64(tsk, tsk->robust_list); >> tsk->robust_list = NULL; >> } >> #else >> if (unlikely(tsk->robust_list)) { >> - exit_robust_list32(tsk); >> + exit_robust_list32(tsk, (struct robust_list_head32 *) tsk->robust_list); >> tsk->robust_list = NULL; >> } >> #endif >> >> #ifdef CONFIG_COMPAT >> if (unlikely(tsk->compat_robust_list)) { >> - exit_robust_list32(tsk); >> + exit_robust_list32(tsk, tsk->compat_robust_list); >> tsk->compat_robust_list = NULL; >> } >> #endif >> + /* >> + * Walk through the linked list, parsing robust lists and freeing the >> + * allocated lists >> + */ >> + if (unlikely(!list_empty(list2))) { >> + list_for_each_entry_safe(curr, n, list2, list) { >> + if (curr->head != NULL) { >> + if (curr->list_type == ROBUST_LIST_64BIT) >> + exit_robust_list64(tsk, curr->head); >> + else if (curr->list_type == ROBUST_LIST_32BIT) >> + exit_robust_list32(tsk, curr->head); >> + curr->head = NULL; >> + } >> + list_del_init(&curr->list); >> + kfree(curr); >> + } >> + } >> >> if (unlikely(!list_empty(&tsk->pi_state_list))) >> exit_pi_state_list(tsk); > > I'm still 
digesting this, but the above seems particularly silly. > > Should not the legacy lists also be on the list of lists? I mean, it > makes no sense to have two completely separate means of tracking lists. > You are asking whether every set_robust_list() or compat_set_robust_list() call should insert into &current->robust_list2 instead of using tsk->robust_list and tsk->compat_robust_list? I was thinking of doing that, but my current implementation has a kmalloc() call for every insertion, and I wasn't sure about adding this new latency to the old set_robust_list() syscall. Given that it is usually called just once during thread initialization, perhaps it wouldn't cause much harm. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 2/3] futex: Create set_robust_list2 2024-11-04 21:55 ` André Almeida @ 2024-11-05 12:10 ` Peter Zijlstra 0 siblings, 0 replies; 18+ messages in thread From: Peter Zijlstra @ 2024-11-05 12:10 UTC (permalink / raw) To: André Almeida Cc: Thomas Gleixner, Ingo Molnar, Darren Hart, Davidlohr Bueso, Arnd Bergmann, sonicadvance1, linux-kernel, kernel-dev, linux-api, Nathan Chancellor On Mon, Nov 04, 2024 at 06:55:45PM -0300, André Almeida wrote: > Hi Peter, > > Em 04/11/2024 08:22, Peter Zijlstra escreveu: > > On Fri, Nov 01, 2024 at 01:21:46PM -0300, André Almeida wrote: > > > @@ -1046,24 +1095,44 @@ static inline void exit_pi_state_list(struct task_struct *curr) { } > > > static void futex_cleanup(struct task_struct *tsk) > > > { > > > + struct robust_list2_entry *curr, *n; > > > + struct list_head *list2 = &tsk->robust_list2; > > > + > > > #ifdef CONFIG_64BIT > > > if (unlikely(tsk->robust_list)) { > > > - exit_robust_list64(tsk); > > > + exit_robust_list64(tsk, tsk->robust_list); > > > tsk->robust_list = NULL; > > > } > > > #else > > > if (unlikely(tsk->robust_list)) { > > > - exit_robust_list32(tsk); > > > + exit_robust_list32(tsk, (struct robust_list_head32 *) tsk->robust_list); > > > tsk->robust_list = NULL; > > > } > > > #endif > > > #ifdef CONFIG_COMPAT > > > if (unlikely(tsk->compat_robust_list)) { > > > - exit_robust_list32(tsk); > > > + exit_robust_list32(tsk, tsk->compat_robust_list); > > > tsk->compat_robust_list = NULL; > > > } > > > #endif > > > + /* > > > + * Walk through the linked list, parsing robust lists and freeing the > > > + * allocated lists > > > + */ > > > + if (unlikely(!list_empty(list2))) { > > > + list_for_each_entry_safe(curr, n, list2, list) { > > > + if (curr->head != NULL) { > > > + if (curr->list_type == ROBUST_LIST_64BIT) > > > + exit_robust_list64(tsk, curr->head); > > > + else if (curr->list_type == ROBUST_LIST_32BIT) > > > + exit_robust_list32(tsk, curr->head); > > > + curr->head = NULL; > > > + } > > > + 
list_del_init(&curr->list); > > > + kfree(curr); > > > + } > > > + } > > > if (unlikely(!list_empty(&tsk->pi_state_list))) > > > exit_pi_state_list(tsk); > > > > I'm still digesting this, but the above seems particularly silly. > > > > Should not the legacy lists also be on the list of lists? I mean, it > > makes no sense to have two completely separate means of tracking lists. > > > > You are asking if, whenever someone calls set_robust_list() or > compat_set_robust_list() to be inserted into ¤t->robust_list2 instead > of using tsk->robust_list and tsk->compat_robust_list? Yes, that. ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v2 3/3] futex: Wire up set_robust_list2 syscall 2024-11-01 16:21 [PATCH v2 0/3] futex: Create set_robust_list2 André Almeida 2024-11-01 16:21 ` [PATCH v2 1/3] futex: Use explicit sizes for compat_exit_robust_list André Almeida 2024-11-01 16:21 ` [PATCH v2 2/3] futex: Create set_robust_list2 André Almeida @ 2024-11-01 16:21 ` André Almeida 2024-11-02 5:13 ` kernel test robot 2024-11-02 6:05 ` kernel test robot 2024-11-02 21:58 ` [PATCH v2 0/3] futex: Create set_robust_list2 Florian Weimer 3 siblings, 2 replies; 18+ messages in thread From: André Almeida @ 2024-11-01 16:21 UTC (permalink / raw) To: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Darren Hart, Davidlohr Bueso, Arnd Bergmann, sonicadvance1 Cc: linux-kernel, kernel-dev, linux-api, Nathan Chancellor, André Almeida Wire up the new set_robust_list2 syscall in all available architectures. Signed-off-by: André Almeida <andrealmeid@igalia.com> --- arch/alpha/kernel/syscalls/syscall.tbl | 1 + arch/arm/tools/syscall.tbl | 1 + arch/m68k/kernel/syscalls/syscall.tbl | 1 + arch/microblaze/kernel/syscalls/syscall.tbl | 1 + arch/mips/kernel/syscalls/syscall_n32.tbl | 1 + arch/mips/kernel/syscalls/syscall_n64.tbl | 1 + arch/mips/kernel/syscalls/syscall_o32.tbl | 1 + arch/parisc/kernel/syscalls/syscall.tbl | 1 + arch/powerpc/kernel/syscalls/syscall.tbl | 1 + arch/s390/kernel/syscalls/syscall.tbl | 1 + arch/sh/kernel/syscalls/syscall.tbl | 1 + arch/sparc/kernel/syscalls/syscall.tbl | 1 + arch/x86/entry/syscalls/syscall_32.tbl | 1 + arch/x86/entry/syscalls/syscall_64.tbl | 1 + arch/xtensa/kernel/syscalls/syscall.tbl | 1 + kernel/sys_ni.c | 1 + scripts/syscall.tbl | 1 + 17 files changed, 17 insertions(+) diff --git a/arch/alpha/kernel/syscalls/syscall.tbl b/arch/alpha/kernel/syscalls/syscall.tbl index 74720667fe09..e7f9d9befdd5 100644 --- a/arch/alpha/kernel/syscalls/syscall.tbl +++ b/arch/alpha/kernel/syscalls/syscall.tbl @@ -502,3 +502,4 @@ 570 common lsm_set_self_attr sys_lsm_set_self_attr 571 common 
lsm_list_modules sys_lsm_list_modules 572 common mseal sys_mseal +573 common set_robust_list2 sys_set_robust_list2 diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl index 23c98203c40f..31070d427ea2 100644 --- a/arch/arm/tools/syscall.tbl +++ b/arch/arm/tools/syscall.tbl @@ -477,3 +477,4 @@ 460 common lsm_set_self_attr sys_lsm_set_self_attr 461 common lsm_list_modules sys_lsm_list_modules 462 common mseal sys_mseal +463 common set_robust_list2 sys_set_robust_list2 diff --git a/arch/m68k/kernel/syscalls/syscall.tbl b/arch/m68k/kernel/syscalls/syscall.tbl index 22a3cbd4c602..9c5d1fa7ca54 100644 --- a/arch/m68k/kernel/syscalls/syscall.tbl +++ b/arch/m68k/kernel/syscalls/syscall.tbl @@ -462,3 +462,4 @@ 460 common lsm_set_self_attr sys_lsm_set_self_attr 461 common lsm_list_modules sys_lsm_list_modules 462 common mseal sys_mseal +463 common set_robust_list2 sys_set_robust_list2 diff --git a/arch/microblaze/kernel/syscalls/syscall.tbl b/arch/microblaze/kernel/syscalls/syscall.tbl index 2b81a6bd78b2..c03933182b4d 100644 --- a/arch/microblaze/kernel/syscalls/syscall.tbl +++ b/arch/microblaze/kernel/syscalls/syscall.tbl @@ -468,3 +468,4 @@ 460 common lsm_set_self_attr sys_lsm_set_self_attr 461 common lsm_list_modules sys_lsm_list_modules 462 common mseal sys_mseal +463 common set_robust_list2 sys_set_robust_list2 diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl index 953f5b7dc723..8f12bcd55d26 100644 --- a/arch/mips/kernel/syscalls/syscall_n32.tbl +++ b/arch/mips/kernel/syscalls/syscall_n32.tbl @@ -401,3 +401,4 @@ 460 n32 lsm_set_self_attr sys_lsm_set_self_attr 461 n32 lsm_list_modules sys_lsm_list_modules 462 n32 mseal sys_mseal +463 n32 set_robust_list2 sys_set_robust_list2 diff --git a/arch/mips/kernel/syscalls/syscall_n64.tbl b/arch/mips/kernel/syscalls/syscall_n64.tbl index 1464c6be6eb3..5e500a32c980 100644 --- a/arch/mips/kernel/syscalls/syscall_n64.tbl +++ b/arch/mips/kernel/syscalls/syscall_n64.tbl @@
-377,3 +377,4 @@ 460 n64 lsm_set_self_attr sys_lsm_set_self_attr 461 n64 lsm_list_modules sys_lsm_list_modules 462 n64 mseal sys_mseal +463 n64 set_robust_list2 sys_set_robust_list2 diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl index 2439a2491cff..ea5be2805b3f 100644 --- a/arch/mips/kernel/syscalls/syscall_o32.tbl +++ b/arch/mips/kernel/syscalls/syscall_o32.tbl @@ -450,3 +450,4 @@ 460 o32 lsm_set_self_attr sys_lsm_set_self_attr 461 o32 lsm_list_modules sys_lsm_list_modules 462 o32 mseal sys_mseal +463 o32 set_robust_list2 sys_set_robust_list2 diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl index 66dc406b12e4..49adcdb392a2 100644 --- a/arch/parisc/kernel/syscalls/syscall.tbl +++ b/arch/parisc/kernel/syscalls/syscall.tbl @@ -461,3 +461,4 @@ 460 common lsm_set_self_attr sys_lsm_set_self_attr 461 common lsm_list_modules sys_lsm_list_modules 462 common mseal sys_mseal +463 common set_robust_list2 sys_set_robust_list2 diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl index ebae8415dfbb..eff6641f35e6 100644 --- a/arch/powerpc/kernel/syscalls/syscall.tbl +++ b/arch/powerpc/kernel/syscalls/syscall.tbl @@ -553,3 +553,4 @@ 460 common lsm_set_self_attr sys_lsm_set_self_attr 461 common lsm_list_modules sys_lsm_list_modules 462 common mseal sys_mseal +463 common set_robust_list2 sys_set_robust_list2 diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl index 01071182763e..a2366aa6791e 100644 --- a/arch/s390/kernel/syscalls/syscall.tbl +++ b/arch/s390/kernel/syscalls/syscall.tbl @@ -465,3 +465,4 @@ 460 common lsm_set_self_attr sys_lsm_set_self_attr sys_lsm_set_self_attr 461 common lsm_list_modules sys_lsm_list_modules sys_lsm_list_modules 462 common mseal sys_mseal sys_mseal +463 common set_robust_list2 sys_set_robust_list2 sys_set_robust_list2 diff --git a/arch/sh/kernel/syscalls/syscall.tbl 
b/arch/sh/kernel/syscalls/syscall.tbl index c55fd7696d40..e6d7e565b942 100644 --- a/arch/sh/kernel/syscalls/syscall.tbl +++ b/arch/sh/kernel/syscalls/syscall.tbl @@ -466,3 +466,4 @@ 460 common lsm_set_self_attr sys_lsm_set_self_attr 461 common lsm_list_modules sys_lsm_list_modules 462 common mseal sys_mseal +463 common set_robust_list2 sys_set_robust_list2 diff --git a/arch/sparc/kernel/syscalls/syscall.tbl b/arch/sparc/kernel/syscalls/syscall.tbl index cfdfb3707c16..176022f9a236 100644 --- a/arch/sparc/kernel/syscalls/syscall.tbl +++ b/arch/sparc/kernel/syscalls/syscall.tbl @@ -508,3 +508,4 @@ 460 common lsm_set_self_attr sys_lsm_set_self_attr 461 common lsm_list_modules sys_lsm_list_modules 462 common mseal sys_mseal +463 common set_robust_list2 sys_set_robust_list2 diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl index 534c74b14fab..8607563a5510 100644 --- a/arch/x86/entry/syscalls/syscall_32.tbl +++ b/arch/x86/entry/syscalls/syscall_32.tbl @@ -468,3 +468,4 @@ 460 i386 lsm_set_self_attr sys_lsm_set_self_attr 461 i386 lsm_list_modules sys_lsm_list_modules 462 i386 mseal sys_mseal +463 i386 set_robust_list2 sys_set_robust_list2 diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl index 7093ee21c0d1..fbc0cef1a97c 100644 --- a/arch/x86/entry/syscalls/syscall_64.tbl +++ b/arch/x86/entry/syscalls/syscall_64.tbl @@ -386,6 +386,7 @@ 460 common lsm_set_self_attr sys_lsm_set_self_attr 461 common lsm_list_modules sys_lsm_list_modules 462 common mseal sys_mseal +463 common set_robust_list2 sys_set_robust_list2 # # Due to a historical design error, certain syscalls are numbered differently diff --git a/arch/xtensa/kernel/syscalls/syscall.tbl b/arch/xtensa/kernel/syscalls/syscall.tbl index 67083fc1b2f5..9081b3bf8272 100644 --- a/arch/xtensa/kernel/syscalls/syscall.tbl +++ b/arch/xtensa/kernel/syscalls/syscall.tbl @@ -433,3 +433,4 @@ 460 common lsm_set_self_attr sys_lsm_set_self_attr 461 common 
lsm_list_modules sys_lsm_list_modules 462 common mseal sys_mseal +463 common set_robust_list2 sys_set_robust_list2 diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c index c00a86931f8c..71fbac6176c8 100644 --- a/kernel/sys_ni.c +++ b/kernel/sys_ni.c @@ -195,6 +195,7 @@ COND_SYSCALL(move_pages); COND_SYSCALL(set_mempolicy_home_node); COND_SYSCALL(cachestat); COND_SYSCALL(mseal); +COND_SYSCALL(set_robust_list2); COND_SYSCALL(perf_event_open); COND_SYSCALL(accept4); diff --git a/scripts/syscall.tbl b/scripts/syscall.tbl index 845e24eb372e..e174f6e2d521 100644 --- a/scripts/syscall.tbl +++ b/scripts/syscall.tbl @@ -403,3 +403,4 @@ 460 common lsm_set_self_attr sys_lsm_set_self_attr 461 common lsm_list_modules sys_lsm_list_modules 462 common mseal sys_mseal +463 common set_robust_list2 sys_set_robust_list2 -- 2.47.0 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH v2 3/3] futex: Wire up set_robust_list2 syscall
  2024-11-01 16:21 ` [PATCH v2 3/3] futex: Wire up set_robust_list2 syscall André Almeida
@ 2024-11-02  5:13   ` kernel test robot
  2024-11-02  6:05   ` kernel test robot
  1 sibling, 0 replies; 18+ messages in thread
From: kernel test robot @ 2024-11-02  5:13 UTC (permalink / raw)
  To: André Almeida, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
	Darren Hart, Davidlohr Bueso, Arnd Bergmann, sonicadvance1
  Cc: oe-kbuild-all, linux-kernel, kernel-dev, linux-api,
	Nathan Chancellor, André Almeida

Hi André,

kernel test robot noticed the following build errors:

[auto build test ERROR on tip/locking/core]
[also build test ERROR on tip/sched/core linus/master v6.12-rc5 next-20241101]
[cannot apply to tip/x86/asm]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Andr-Almeida/futex-Use-explicit-sizes-for-compat_exit_robust_list/20241102-002419
base:   tip/locking/core
patch link:    https://lore.kernel.org/r/20241101162147.284993-4-andrealmeid%40igalia.com
patch subject: [PATCH v2 3/3] futex: Wire up set_robust_list2 syscall
config: arc-allnoconfig (https://download.01.org/0day-ci/archive/20241102/202411021208.9UQz3ahR-lkp@intel.com/config)
compiler: arc-elf-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241102/202411021208.9UQz3ahR-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202411021208.9UQz3ahR-lkp@intel.com/

All errors (new ones prefixed by >>):

   arch/arc/kernel/sys.c:13:36: warning: initialized field overwritten [-Woverride-init]
      13 | #define __SYSCALL(nr, call) [nr] = (call),
         |                                    ^
   ./arch/arc/include/generated/asm/syscall_table_32.h:455:1: note: in expansion of macro '__SYSCALL'
     455 | __SYSCALL(454, sys_futex_wake)
         | ^~~~~~~~~
   [the same -Woverride-init warning repeats for sys_call_table entries 455-462:
    sys_futex_wait, sys_futex_requeue, sys_statmount, sys_listmount,
    sys_lsm_get_self_attr, sys_lsm_set_self_attr, sys_lsm_list_modules, sys_mseal]
>> ./arch/arc/include/generated/asm/syscall_table_32.h:464:16: error: 'sys_set_robust_list2' undeclared here (not in a function); did you mean 'sys_set_robust_list'?
     464 | __SYSCALL(463, sys_set_robust_list2)
         |                ^~~~~~~~~~~~~~~~~~~~
   arch/arc/kernel/sys.c:13:37: note: in definition of macro '__SYSCALL'
      13 | #define __SYSCALL(nr, call) [nr] = (call),
         |                             ^~~~

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: [PATCH v2 3/3] futex: Wire up set_robust_list2 syscall
  2024-11-01 16:21 ` [PATCH v2 3/3] futex: Wire up set_robust_list2 syscall André Almeida
  2024-11-02  5:13   ` kernel test robot
@ 2024-11-02  6:05   ` kernel test robot
  1 sibling, 0 replies; 18+ messages in thread
From: kernel test robot @ 2024-11-02  6:05 UTC (permalink / raw)
  To: André Almeida, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
	Darren Hart, Davidlohr Bueso, Arnd Bergmann, sonicadvance1
  Cc: oe-kbuild-all, linux-kernel, kernel-dev, linux-api,
	Nathan Chancellor, André Almeida

Hi André,

kernel test robot noticed the following build errors:

[auto build test ERROR on tip/locking/core]
[also build test ERROR on tip/sched/core linus/master v6.12-rc5 next-20241101]
[cannot apply to tip/x86/asm]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Andr-Almeida/futex-Use-explicit-sizes-for-compat_exit_robust_list/20241102-002419
base:   tip/locking/core
patch link:    https://lore.kernel.org/r/20241101162147.284993-4-andrealmeid%40igalia.com
patch subject: [PATCH v2 3/3] futex: Wire up set_robust_list2 syscall
config: csky-allnoconfig (https://download.01.org/0day-ci/archive/20241102/202411021323.fazJ8GOs-lkp@intel.com/config)
compiler: csky-linux-gcc (GCC) 14.1.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241102/202411021323.fazJ8GOs-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202411021323.fazJ8GOs-lkp@intel.com/

All errors (new ones prefixed by >>):

   arch/csky/kernel/syscall_table.c:8:35: warning: initialized field overwritten [-Woverride-init]
       8 | #define __SYSCALL(nr, call)[nr] = (call),
         |                                   ^
   ./arch/csky/include/generated/asm/syscall_table_32.h:455:1: note: in expansion of macro '__SYSCALL'
     455 | __SYSCALL(454, sys_futex_wake)
         | ^~~~~~~~~
   [the same -Woverride-init warning repeats for sys_call_table entries 455-462:
    sys_futex_wait, sys_futex_requeue, sys_statmount, sys_listmount,
    sys_lsm_get_self_attr, sys_lsm_set_self_attr, sys_lsm_list_modules, sys_mseal]
>> ./arch/csky/include/generated/asm/syscall_table_32.h:464:16: error: 'sys_set_robust_list2' undeclared here (not in a function); did you mean 'sys_set_robust_list'?
     464 | __SYSCALL(463, sys_set_robust_list2)
         |                ^~~~~~~~~~~~~~~~~~~~
   arch/csky/kernel/syscall_table.c:8:36: note: in definition of macro '__SYSCALL'
       8 | #define __SYSCALL(nr, call)[nr] = (call),
         |                             ^~~~

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: [PATCH v2 0/3] futex: Create set_robust_list2
  2024-11-01 16:21 [PATCH v2 0/3] futex: Create set_robust_list2 André Almeida
                   ` (2 preceding siblings ...)
  2024-11-01 16:21 ` [PATCH v2 3/3] futex: Wire up set_robust_list2 syscall André Almeida
@ 2024-11-02 21:58 ` Florian Weimer
  2024-11-04 11:32   ` Peter Zijlstra
  2024-11-04 21:49   ` André Almeida
  3 siblings, 2 replies; 18+ messages in thread
From: Florian Weimer @ 2024-11-02 21:58 UTC (permalink / raw)
  To: André Almeida
  Cc: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Darren Hart,
	Davidlohr Bueso, Arnd Bergmann, sonicadvance1, linux-kernel,
	kernel-dev, linux-api, Nathan Chancellor

* André Almeida:

> 1) x86 apps can have 32bit pointers robust lists. For a x86-64 kernel
>    this is not a problem, because of the compat entry point. But there's
>    no such compat entry point for AArch64, so the kernel would do the
>    pointer arithmetic wrongly. Is also unviable to userspace to keep
>    track every addition/removal to the robust list and keep a 64bit
>    version of it somewhere else to feed the kernel. Thus, the new
>    interface has an option of telling the kernel if the list is filled
>    with 32bit or 64bit pointers.

The size is typically different for 32-bit and 64-bit mode (12 vs 24
bytes).  Why isn't this enough to disambiguate?

> 2) Apps can set just one robust list (in theory, x86-64 can set two if
>    they also use the compat entry point). That means that when a x86 app
>    asks FEX-Emu to call set_robust_list(), FEX have two options: to
>    overwrite their own robust list pointer and make the app robust, or
>    to ignore the app robust list and keep the emulator robust. The new
>    interface allows for multiple robust lists per application, solving
>    this.

Can't you avoid mixing emulated and general userspace code on the same
thread?  On emulator threads, you have full control over the TCB.

QEMU hints towards further problems (in linux-user/syscall.c):

    case TARGET_NR_set_robust_list:
    case TARGET_NR_get_robust_list:
        /* The ABI for supporting robust futexes has userspace pass
         * the kernel a pointer to a linked list which is updated by
         * userspace after the syscall; the list is walked by the kernel
         * when the thread exits. Since the linked list in QEMU guest
         * memory isn't a valid linked list for the host and we have
         * no way to reliably intercept the thread-death event, we can't
         * support these. Silently return ENOSYS so that guest userspace
         * falls back to a non-robust futex implementation (which should
         * be OK except in the corner case of the guest crashing while
         * holding a mutex that is shared with another process via
         * shared memory).
         */
        return -TARGET_ENOSYS;

The glibc implementation is not really prepared for this
(__ASSUME_SET_ROBUST_LIST is defined for most architectures).  But a
couple of years ago, we had a bunch of kernels that regressed robust
list support on POWER, and I think we found out only when we tested an
unrelated glibc update and saw unexpected glibc test suite failures …

Thanks,
Florian

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: [PATCH v2 0/3] futex: Create set_robust_list2
  2024-11-02 21:58 ` [PATCH v2 0/3] futex: Create set_robust_list2 Florian Weimer
@ 2024-11-04 11:32   ` Peter Zijlstra
  2024-11-04 11:56     ` Peter Zijlstra
  2024-11-04 12:36     ` Florian Weimer
  1 sibling, 2 replies; 18+ messages in thread
From: Peter Zijlstra @ 2024-11-04 11:32 UTC (permalink / raw)
  To: Florian Weimer
  Cc: André Almeida, Thomas Gleixner, Ingo Molnar, Darren Hart,
	Davidlohr Bueso, Arnd Bergmann, sonicadvance1, linux-kernel,
	kernel-dev, linux-api, Nathan Chancellor

On Sat, Nov 02, 2024 at 10:58:42PM +0100, Florian Weimer wrote:

> QEMU hints towards further problems (in linux-user/syscall.c):
>
>     case TARGET_NR_set_robust_list:
>     case TARGET_NR_get_robust_list:
>         /* The ABI for supporting robust futexes has userspace pass
>          * the kernel a pointer to a linked list which is updated by
>          * userspace after the syscall; the list is walked by the kernel
>          * when the thread exits. Since the linked list in QEMU guest
>          * memory isn't a valid linked list for the host and we have
>          * no way to reliably intercept the thread-death event, we can't
>          * support these. Silently return ENOSYS so that guest userspace
>          * falls back to a non-robust futex implementation (which should
>          * be OK except in the corner case of the guest crashing while
>          * holding a mutex that is shared with another process via
>          * shared memory).
>          */
>         return -TARGET_ENOSYS;

I don't think we can sanely fix that. Can't QEMU track the robust thing
itself and use waitpid() to discover the thread is gone and fudge things
from there?

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: [PATCH v2 0/3] futex: Create set_robust_list2
  2024-11-04 11:32 ` Peter Zijlstra
@ 2024-11-04 11:56   ` Peter Zijlstra
  0 siblings, 0 replies; 18+ messages in thread
From: Peter Zijlstra @ 2024-11-04 11:56 UTC (permalink / raw)
  To: Florian Weimer
  Cc: André Almeida, Thomas Gleixner, Ingo Molnar, Darren Hart,
	Davidlohr Bueso, Arnd Bergmann, sonicadvance1, linux-kernel,
	kernel-dev, linux-api, Nathan Chancellor

On Mon, Nov 04, 2024 at 12:32:40PM +0100, Peter Zijlstra wrote:
> On Sat, Nov 02, 2024 at 10:58:42PM +0100, Florian Weimer wrote:
>
> > QEMU hints towards further problems (in linux-user/syscall.c):
> > [...]
> >         return -TARGET_ENOSYS;
>
> I don't think we can sanely fix that. Can't QEMU track the robust thing
> itself and use waitpid() to discover the thread is gone and fudge things
> from there?

Hmm, what about we mandate 'natural' alignment of the structure such
that it is always inside a single page, then QEMU can do the translation
here and hand the kernel the 'real' address.

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: [PATCH v2 0/3] futex: Create set_robust_list2
  2024-11-04 11:32 ` Peter Zijlstra
  2024-11-04 11:56   ` Peter Zijlstra
@ 2024-11-04 12:36   ` Florian Weimer
  2024-11-05 12:18     ` Peter Zijlstra
  1 sibling, 1 reply; 18+ messages in thread
From: Florian Weimer @ 2024-11-04 12:36 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: André Almeida, Thomas Gleixner, Ingo Molnar, Darren Hart,
	Davidlohr Bueso, Arnd Bergmann, sonicadvance1, linux-kernel,
	kernel-dev, linux-api, Nathan Chancellor

* Peter Zijlstra:

> On Sat, Nov 02, 2024 at 10:58:42PM +0100, Florian Weimer wrote:
>
>> QEMU hints towards further problems (in linux-user/syscall.c):
>>
>>     case TARGET_NR_set_robust_list:
>>     case TARGET_NR_get_robust_list:
>>         /* The ABI for supporting robust futexes has userspace pass
>>          * the kernel a pointer to a linked list which is updated by
>>          * userspace after the syscall; the list is walked by the kernel
>>          * when the thread exits. Since the linked list in QEMU guest
>>          * memory isn't a valid linked list for the host and we have
>>          * no way to reliably intercept the thread-death event, we can't
>>          * support these. Silently return ENOSYS so that guest userspace
>>          * falls back to a non-robust futex implementation (which should
>>          * be OK except in the corner case of the guest crashing while
>>          * holding a mutex that is shared with another process via
>>          * shared memory).
>>          */
>>         return -TARGET_ENOSYS;
>
> I don't think we can sanely fix that. Can't QEMU track the robust thing
> itself and use waitpid() to discover the thread is gone and fudge things
> from there?

There are race conditions with munmap, I think, and they probably get a
lot worse if QEMU does that.

See Rich Felker's bug report:

| The corruption is performed by the kernel when it walks the robust
| list. The basic situation is the same as in PR #13690, except that
| here there's actually a potential write to the memory rather than just
| a read.
|
| The sequence of events leading to corruption goes like this:
|
| 1. Thread A unlocks the process-shared, robust mutex and is preempted
| after the mutex is removed from the robust list and atomically
| unlocked, but before it's removed from the list_op_pending field of
| the robust list header.
|
| 2. Thread B locks the mutex, and, knowing by program logic that it's
| the last user of the mutex, unlocks and unmaps it, allocates/maps
| something else that gets assigned the same address as the shared mutex
| mapping, and then exits.
|
| 3. The kernel destroys the process, which involves walking each
| thread's robust list and processing each thread's list_op_pending
| field of the robust list header. Since thread A has a list_op_pending
| pointing at the address previously occupied by the mutex, the kernel
| obliviously "unlocks the mutex" by writing a 0 to the address and
| futex-waking it. However, the kernel has instead overwritten part of
| whatever mapping thread A created. If this is private memory it
| (probably) doesn't matter since the process is ending anyway (but are
| there race conditions where this can be seen?). If this is shared
| memory or a shared file mapping, however, the kernel corrupts it.
|
| I suspect the race is difficult to hit since thread A has to get
| preempted at exactly the wrong time AND thread B has to do a fair
| amount of work without thread A getting scheduled again. So I'm not
| sure how much luck we'd have getting a test case.

<https://sourceware.org/bugzilla/show_bug.cgi?id=14485#c3>

We also have a silent unlocking failure because userspace does not know
about ROBUST_LIST_LIMIT:

Bug 19089 - Robust mutexes do not take ROBUST_LIST_LIMIT into account
<https://sourceware.org/bugzilla/show_bug.cgi?id=19089>

(I think we may have discussed this one before, and you may have
suggested to just hard-code 2048 in userspace because the constant is
not expected to change.)

So the in-mutex linked list has quite a few problems even outside of
emulation.  8-(

Thanks,
Florian

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: [PATCH v2 0/3] futex: Create set_robust_list2
  2024-11-04 12:36 ` Florian Weimer
@ 2024-11-05 12:18   ` Peter Zijlstra
  0 siblings, 0 replies; 18+ messages in thread
From: Peter Zijlstra @ 2024-11-05 12:18 UTC (permalink / raw)
  To: Florian Weimer
  Cc: André Almeida, Thomas Gleixner, Ingo Molnar, Darren Hart,
	Davidlohr Bueso, Arnd Bergmann, sonicadvance1, linux-kernel,
	kernel-dev, linux-api, Nathan Chancellor, Mathieu Desnoyers

On Mon, Nov 04, 2024 at 01:36:43PM +0100, Florian Weimer wrote:
> * Peter Zijlstra:
>
> > On Sat, Nov 02, 2024 at 10:58:42PM +0100, Florian Weimer wrote:
> >
> >> QEMU hints towards further problems (in linux-user/syscall.c):
> >> [...]
> >>         return -TARGET_ENOSYS;
> >
> > I don't think we can sanely fix that. Can't QEMU track the robust thing
> > itself and use waitpid() to discover the thread is gone and fudge things
> > from there?
>
> There are race conditions with munmap, I think, and they probably get a
> lot worse if QEMU does that.
>
> See Rich Felker's bug report:
>
> | The corruption is performed by the kernel when it walks the robust
> | list.
> | [...]
> | I suspect the race is difficult to hit since thread A has to get
> | preempted at exactly the wrong time AND thread B has to do a fair
> | amount of work without thread A getting scheduled again. So I'm not
> | sure how much luck we'd have getting a test case.
>
> <https://sourceware.org/bugzilla/show_bug.cgi?id=14485#c3>

So I've only managed to conjure up two horrible solutions for this:

 - put the robust futex operations under user-space RCU, and mandate a
   matching synchronize_rcu() before any munmap() calls.

 - add a robust-barrier syscall that waits until all list_op_pending are
   either NULL or changed since invocation. And mandate this call before
   munmap().

Neither are particularly pretty I admit, but at least they should work.
But doing this and mandating the alignment thing should at least make
this qemu thing workable, no?

> We also have a silent unlocking failure because userspace does not know
> about ROBUST_LIST_LIMIT:
>
> Bug 19089 - Robust mutexes do not take ROBUST_LIST_LIMIT into account
> <https://sourceware.org/bugzilla/show_bug.cgi?id=19089>
>
> (I think we may have discussed this one before, and you may have
> suggested to just hard-code 2048 in userspace because the constant is
> not expected to change.)
>
> So the in-mutex linked list has quite a few problems even outside of
> emulation.  8-(

It's futex, of course it's a pain in the arse :-)

And yeah, no better ideas on that limit for now...

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: [PATCH v2 0/3] futex: Create set_robust_list2
  2024-11-02 21:58 ` [PATCH v2 0/3] futex: Create set_robust_list2 Florian Weimer
  2024-11-04 11:32   ` Peter Zijlstra
@ 2024-11-04 21:49   ` André Almeida
  1 sibling, 0 replies; 18+ messages in thread
From: André Almeida @ 2024-11-04 21:49 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Darren Hart,
	Davidlohr Bueso, Arnd Bergmann, sonicadvance1, linux-kernel,
	kernel-dev, linux-api, Nathan Chancellor

Hi Florian,

Em 02/11/2024 18:58, Florian Weimer escreveu:
> * André Almeida:
>
>> 1) x86 apps can have 32bit pointers robust lists. For a x86-64 kernel
>>    this is not a problem, because of the compat entry point. But there's
>>    no such compat entry point for AArch64, so the kernel would do the
>>    pointer arithmetic wrongly. Is also unviable to userspace to keep
>>    track every addition/removal to the robust list and keep a 64bit
>>    version of it somewhere else to feed the kernel. Thus, the new
>>    interface has an option of telling the kernel if the list is filled
>>    with 32bit or 64bit pointers.
>
> The size is typically different for 32-bit and 64-bit mode (12 vs 24
> bytes).  Why isn't this enough to disambiguate?
>

Right, so the idea would be to use `size_t len` from the syscall
arguments for that?

>> 2) Apps can set just one robust list (in theory, x86-64 can set two if
>>    they also use the compat entry point). That means that when a x86 app
>>    asks FEX-Emu to call set_robust_list(), FEX have two options: to
>>    overwrite their own robust list pointer and make the app robust, or
>>    to ignore the app robust list and keep the emulator robust. The new
>>    interface allows for multiple robust lists per application, solving
>>    this.
>
> Can't you avoid mixing emulated and general userspace code on the same
> thread?  On emulator threads, you have full control over the TCB.
>

FEX can't avoid that because it doesn't do a full system emulation, it
just does instruction translation. FEX doesn't have full control over
the TCB, that's still all glibc, or whatever other dynamic linker is
used.

^ permalink raw reply	[flat|nested] 18+ messages in thread
end of thread, other threads:[~2024-11-05 12:18 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-11-01 16:21 [PATCH v2 0/3] futex: Create set_robust_list2 André Almeida
2024-11-01 16:21 ` [PATCH v2 1/3] futex: Use explicit sizes for compat_exit_robust_list André Almeida
2024-11-02  5:44   ` kernel test robot
2024-11-02 14:57   ` kernel test robot
2024-11-01 16:21 ` [PATCH v2 2/3] futex: Create set_robust_list2 André Almeida
2024-11-02 16:40   ` kernel test robot
2024-11-04 11:22   ` Peter Zijlstra
2024-11-04 21:55     ` André Almeida
2024-11-05 12:10       ` Peter Zijlstra
2024-11-01 16:21 ` [PATCH v2 3/3] futex: Wire up set_robust_list2 syscall André Almeida
2024-11-02  5:13   ` kernel test robot
2024-11-02  6:05   ` kernel test robot
2024-11-02 21:58 ` [PATCH v2 0/3] futex: Create set_robust_list2 Florian Weimer
2024-11-04 11:32   ` Peter Zijlstra
2024-11-04 11:56     ` Peter Zijlstra
2024-11-04 12:36     ` Florian Weimer
2024-11-05 12:18       ` Peter Zijlstra
2024-11-04 21:49   ` André Almeida