From mboxrd@z Thu Jan 1 00:00:00 1970 From: Thomas Gleixner To: Florian Weimer Cc: LKML , Mathieu Desnoyers , André Almeida , Sebastian Andrzej Siewior , Carlos O'Donell , Peter Zijlstra , Rich Felker , Torvald Riegel , Darren Hart , Ingo Molnar , Davidlohr Bueso , Arnd Bergmann , "Liam R . Howlett" Subject: Re: [patch 8/8] x86/vdso: Implement __vdso_futex_robust_try_unlock() In-Reply-To: References: <20260316162316.356674433@kernel.org> <20260316164951.484640267@kernel.org> <87ecliokzz.ffs@tglx> Date: Tue, 17 Mar 2026 23:32:18 +0100 Message-ID: <87zf46m6j1.ffs@tglx> X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Mar 17 2026 at 11:37, Florian Weimer wrote: > * Thomas Gleixner: >> Right, I know that no libc implementation supports such an insanity, but >> the kernel unfortunately allows to do so and it's used in the wild :( >> >> So we have to deal with it somehow and the size modifier was the most >> straight forward solution I could come up with. I'm all ears if someone >> has a better idea. > > Maybe a separate futex op? And the vDSO would have the futex call, > mangle uaddr2 as required for the shared code section that handles both > ops? > > As far as I can tell at this point, the current proposal should work. > We'd probably start with using the syscall-based unlock. Something like the below compiled but untested delta diff, which also includes the other unrelated feedback fixups? Thanks, tglx --- diff --git a/arch/x86/entry/vdso/common/vfutex.c b/arch/x86/entry/vdso/common/vfutex.c index 19d8ef130b63..491ed141622d 100644 --- a/arch/x86/entry/vdso/common/vfutex.c +++ b/arch/x86/entry/vdso/common/vfutex.c @@ -1,72 +1,218 @@ // SPDX-License-Identifier: GPL-2.0-only #include +/* + * Assembly template for the try unlock functions.
The basic functionality for + * 64-bit is: + * + * At the call site: + * mov &lock, %rdi Store the lock pointer in RDI + * mov &pop, %rdx Store the pending op pointer in RDX + * mov TID, %esi Store the thread's TID in ESI + * + * 64-bit unlock function: + * mov %esi, %eax Move the TID into EAX + * xor %ecx, %ecx Clear ECX + * lock cmpxchgl %ecx, (%rdi) Attempt the TID -> 0 transition + * .Lcs_start: Start of the critical section + * jnz .Lcs_end If cmpxchgl failed jump to the end + * .Lcs_success: Start of the success section + * movq $0, (%rdx) Set the pending op pointer to 0 + * .Lcs_end: End of the critical section + * + * For COMPAT enabled 64-bit kernels this is a bit more complex because the size + * of the @pop pointer has to be determined in the success section: + * + * At the 64-bit call site: + * mov &lock, %rdi Store the lock pointer in RDI + * mov &pop, %rdx Store the pending op pointer in RDX + * mov TID, %esi Store the thread's TID in ESI + * + * At the 32-bit call site: + * mov &lock, %edi Store the lock pointer in EDI + * mov &pop, %edx Store the pending op pointer in EDX + * mov TID, %esi Store the thread's TID in ESI + * + * The 32-bit entry point: + * or $0x1, %edx Mark the op pointer 32-bit + * + * Common unlock function: + * mov %esi, %eax Move the TID into EAX + * xor %ecx, %ecx Clear ECX + * mov %rdx, %rsi Store the op pointer in RSI + * and $~0x1, %rsi Clear the size bit in RSI + * lock cmpxchgl %ecx, (%rdi) Attempt the TID -> 0 transition + * .Lcs_start: Start of the critical section + * jnz .Lcs_end If cmpxchgl failed jump to the end + * .Lcs_success: Start of the success section + * test $0x1, %rdx Test the 32-bit size bit in the original pointer + * jz .Lop64 If not set, do the 64-bit clear + * movl $0, (%rsi) Set the 32-bit pending op pointer to 0 + * jmp .Lcs_end Leave the critical section + * .Lop64: movq $0, (%rsi) Set the 64-bit pending op pointer to 0 + * .Lcs_end: End of the critical section + * + * The 32-bit VDSO needs to set the 32-bit
size bit as well to keep the code + * compatible with the kernel side fixup function, but it does not require the + * size evaluation in the success path. + * + * At the 32-bit call site: + * mov &lock, %edi Store the lock pointer in EDI + * mov &pop, %edx Store the pending op pointer in EDX + * mov TID, %esi Store the thread's TID in ESI + * + * The 32-bit entry point does: + * or $0x1, %edx Mark the op pointer 32-bit + * + * 32-bit unlock function: + * mov %esi, %eax Move the TID into EAX + * xor %ecx, %ecx Clear ECX + * mov %edx, %esi Store the op pointer in ESI + * and $~0x1, %esi Clear the size bit in ESI + * lock cmpxchgl %ecx, (%edi) Attempt the TID -> 0 transition + * .Lcs_start: Start of the critical section + * jnz .Lcs_end If cmpxchgl failed jump to the end + * .Lcs_success: Start of the success section + * movl $0, (%esi) Set the 32-bit pending op pointer to 0 + * .Lcs_end: End of the critical section + * + * The pointer modification makes sure that the unlock function can determine + * the pending op pointer size correctly and clear either 32 or 64 bits. + * + * The intermediate storage of the unmangled pointer (bit 0 cleared) in [ER]SI + * makes sure that the store hits the right address. + * + * The mangled pointer (bit 0 set for 32-bit) stays in [ER]DX so that the kernel + * side fixup function can determine the storage size correctly and always + * retrieve regs->rdx without any extra knowledge of the actual code path taken + * or checking the compat mode of the task. + * + * The .Lcs_success label is technically not required for a pure 64-bit and the + * 32-bit VDSO but is kept there for simplicity. In those cases the ZF flag in + * regs->eflags is authoritative for the whole critical section and no further + * evaluation is required.
+ * + * In the 64-bit compat case the .Lcs_success label is required because the + * pointer size check modifies the ZF flag, which means it is only valid for the + * case where .Lcs_start <= regs->ip < .Lcs_success, which is obviously the + * same as .Lcs_start == regs->ip for x86. + * + * It's still a valuable distinction for clarity and keeps the ASM template the + * same for all cases. This is also a template for other architectures which + * might have different requirements even for the non COMPAT case. + * + * That means in the 64-bit compat case the decision to do the fixup is: + * + * if (regs->ip >= .Lcs_start && regs->ip < .Lcs_success) + * return (regs->eflags & ZF); + * return regs->ip < .Lcs_end; + * + * As the initial critical section check in the return to user space code + * already established that: + * + * .Lcs_start <= regs->ip < .Lcs_end + * + * that decision can be simplified to: + * + * return regs->ip >= .Lcs_success || regs->eflags & ZF; + * + */ +#define robust_try_unlock_asm(__lock, __tid, __pop) \ + asm volatile ( \ + ".global __kernel_futex_robust_try_unlock_cs_start \n" \ + ".global __kernel_futex_robust_try_unlock_cs_success \n" \ + ".global __kernel_futex_robust_try_unlock_cs_end \n" \ + " \n" \ + " lock cmpxchgl %[val], (%[ptr]) \n" \ + " \n" \ + "__kernel_futex_robust_try_unlock_cs_start: \n" \ + " \n" \ + " jnz __kernel_futex_robust_try_unlock_cs_end \n" \ + " \n" \ + "__kernel_futex_robust_try_unlock_cs_success: \n" \ + " \n" \ + ASM_CLEAR_PTR \ + " \n" \ + "__kernel_futex_robust_try_unlock_cs_end: \n" \ + : [tid] "+a" (__tid) \ + : [ptr] "D" (__lock), \ + [pop] "d" (__pop), \ + [val] "r" (0) \ + ASM_PAD_CONSTRAINT(__pop) \ + : "memory" \ + ) + /* * Compat enabled kernels have to take the size bit into account to support the * mixed size use case of gaming emulators. Contrary to the kernel robust unlock * mechanism all of this does not test for the 32-bit modifier in 32-bit VDSOs * and in compat disabled kernels.
User space can keep the pieces. */ -#if defined(CONFIG_X86_64) && !defined(BUILD_VDSO32_64) - +#ifdef __x86_64__ #ifdef CONFIG_COMPAT # define ASM_CLEAR_PTR \ " testl $1, (%[pop]) \n" \ " jz .Lop64 \n" \ " movl $0, (%[pad]) \n" \ - " jmp __vdso_futex_robust_try_unlock_cs_end \n" \ + " jmp __kernel_futex_robust_try_unlock_cs_end \n" \ ".Lop64: \n" \ " movq $0, (%[pad]) \n" -# define ASM_PAD_CONSTRAINT ,[pad] "S" (((unsigned long)pop) & ~0x1UL) +# define ASM_PAD_CONSTRAINT(__pop) ,[pad] "S" (((unsigned long)__pop) & ~0x1UL) + +__u32 noinline __vdso_futex_robust_try_unlock_64(__u32 *lock, __u32 tid, __u64 *pop) +{ + robust_try_unlock_asm(lock, tid, pop); + return tid; +} + +__u32 noinline __vdso_futex_robust_try_unlock_32(__u32 *lock, __u32 tid, __u32 *pop) +{ + __u64 pop_addr = ((u64) pop) | FUTEX_ROBUST_UNLOCK_MOD_32BIT; + + return __vdso_futex_robust_try_unlock_64(lock, tid, (__u64 *)pop_addr); +} + +__u32 futex_robust_try_unlock_64(__u32 *, __u32, __u64 *) + __attribute__((weak, alias("__vdso_futex_robust_try_unlock_64"))); + +__u32 futex_robust_try_unlock_32(__u32 *, __u32, __u32 *) + __attribute__((weak, alias("__vdso_futex_robust_try_unlock_32"))); #else /* CONFIG_COMPAT */ # define ASM_CLEAR_PTR \ " movq $0, (%[pop]) \n" -# define ASM_PAD_CONSTRAINT +# define ASM_PAD_CONSTRAINT(__pop) + +__u32 noinline __vdso_futex_robust_try_unlock_64(__u32 *lock, __u32 tid, __u64 *pop) +{ + robust_try_unlock_asm(lock, tid, pop); + return tid; +} + +__u32 futex_robust_try_unlock_64(__u32 *, __u32, __u64 *) + __attribute__((weak, alias("__vdso_futex_robust_try_unlock_64"))); #endif /* !CONFIG_COMPAT */ -#else /* CONFIG_X86_64 && !BUILD_VDSO32_64 */ +#else /* __x86_64__ */ # define ASM_CLEAR_PTR \ " movl $0, (%[pad]) \n" -# define ASM_PAD_CONSTRAINT ,[pad] "S" (((unsigned long)pop) & ~0x1UL) - -#endif /* !CONFIG_X86_64 || BUILD_VDSO32_64 */ +# define ASM_PAD_CONSTRAINT(__pop) ,[pad] "S" (((unsigned long)__pop) & ~0x1UL) -uint32_t __vdso_futex_robust_try_unlock(uint32_t *lock, 
uint32_t tid, void *pop) +__u32 noinline __vdso_futex_robust_try_unlock_32(__u32 *lock, __u32 tid, __u32 *pop) { - asm volatile ( - ".global __vdso_futex_robust_try_unlock_cs_start \n" - ".global __vdso_futex_robust_try_unlock_cs_success \n" - ".global __vdso_futex_robust_try_unlock_cs_end \n" - " \n" - " lock cmpxchgl %[val], (%[ptr]) \n" - " \n" - "__vdso_futex_robust_try_unlock_cs_start: \n" - " \n" - " jnz __vdso_futex_robust_try_unlock_cs_end \n" - " \n" - "__vdso_futex_robust_try_unlock_cs_success: \n" - " \n" - ASM_CLEAR_PTR - " \n" - "__vdso_futex_robust_try_unlock_cs_end: \n" - : [tid] "+a" (tid) - : [ptr] "D" (lock), - [pop] "d" (pop), - [val] "r" (0) - ASM_PAD_CONSTRAINT - : "memory" - ); + __u32 pop_addr = ((u32) pop) | FUTEX_ROBUST_UNLOCK_MOD_32BIT; + robust_try_unlock_asm(lock, tid, (__u32 *)pop_addr); return tid; } -uint32_t futex_robust_try_unlock(uint32_t *, uint32_t, void **) - __attribute__((weak, alias("__vdso_futex_robust_try_unlock"))); +__u32 futex_robust_try_unlock_32(__u32 *, __u32, __u32 *) + __attribute__((weak, alias("__vdso_futex_robust_try_unlock_32"))); +#endif /* !__x86_64__ */ diff --git a/arch/x86/entry/vdso/vdso32/vdso32.lds.S b/arch/x86/entry/vdso/vdso32/vdso32.lds.S index b027d2f98bd0..cb7b8de8009c 100644 --- a/arch/x86/entry/vdso/vdso32/vdso32.lds.S +++ b/arch/x86/entry/vdso/vdso32/vdso32.lds.S @@ -31,10 +31,10 @@ VERSION __vdso_clock_getres_time64; __vdso_getcpu; #ifdef CONFIG_FUTEX_ROBUST_UNLOCK - __vdso_futex_robust_try_unlock; - __vdso_futex_robust_try_unlock_cs_start; - __vdso_futex_robust_try_unlock_cs_success; - __vdso_futex_robust_try_unlock_cs_end; + __vdso_futex_robust_try_unlock_32; + __kernel_futex_robust_try_unlock_cs_start; + __kernel_futex_robust_try_unlock_cs_success; + __kernel_futex_robust_try_unlock_cs_end; #endif }; diff --git a/arch/x86/entry/vdso/vdso64/vdso64.lds.S b/arch/x86/entry/vdso/vdso64/vdso64.lds.S index e5c0ca9664e1..6dd36ae2ab79 100644 --- a/arch/x86/entry/vdso/vdso64/vdso64.lds.S +++ 
b/arch/x86/entry/vdso/vdso64/vdso64.lds.S @@ -33,10 +33,11 @@ VERSION { getrandom; __vdso_getrandom; #ifdef CONFIG_FUTEX_ROBUST_UNLOCK - __vdso_futex_robust_try_unlock; - __vdso_futex_robust_try_unlock_cs_start; - __vdso_futex_robust_try_unlock_cs_success; - __vdso_futex_robust_try_unlock_cs_end; + __vdso_futex_robust_try_unlock_64; + __vdso_futex_robust_try_unlock_32; + __kernel_futex_robust_try_unlock_cs_start; + __kernel_futex_robust_try_unlock_cs_success; + __kernel_futex_robust_try_unlock_cs_end; #endif local: *; }; diff --git a/arch/x86/entry/vdso/vdso64/vdsox32.lds.S b/arch/x86/entry/vdso/vdso64/vdsox32.lds.S index 4409d97e7ef6..a456f184c937 100644 --- a/arch/x86/entry/vdso/vdso64/vdsox32.lds.S +++ b/arch/x86/entry/vdso/vdso64/vdsox32.lds.S @@ -23,7 +23,8 @@ VERSION { __vdso_time; __vdso_clock_getres; #ifdef CONFIG_FUTEX_ROBUST_UNLOCK - __vdso_futex_robust_try_unlock; - __vdso_futex_robust_try_unlock_cs_start; - __vdso_futex_robust_try_unlock_cs_success; - __vdso_futex_robust_try_unlock_cs_end; + __vdso_futex_robust_try_unlock_64; + __vdso_futex_robust_try_unlock_32; + __kernel_futex_robust_try_unlock_cs_start; + __kernel_futex_robust_try_unlock_cs_success; + __kernel_futex_robust_try_unlock_cs_end; diff --git a/include/linux/futex_types.h b/include/linux/futex_types.h index 223f469789c5..a96293050bf4 100644 --- a/include/linux/futex_types.h +++ b/include/linux/futex_types.h @@ -11,13 +11,15 @@ struct futex_pi_state; struct robust_list_head; /** - * struct futex_ctrl - Futex related per task data + * struct futex_sched_data - Futex related per task data * @robust_list: User space registered robust list pointer * @compat_robust_list: User space registered robust list pointer for compat tasks + * @pi_state_list: List head for Priority Inheritance (PI) state management + * @pi_state_cache: Pointer to cache one PI state object per task * @exit_mutex: Mutex for serializing exit * @state: Futex handling state to handle exit races correctly */ -struct futex_ctrl { +struct futex_sched_data { struct robust_list_head __user *robust_list; #ifdef CONFIG_COMPAT struct compat_robust_list_head __user
*compat_robust_list; @@ -27,9 +29,6 @@ struct futex_ctrl { struct mutex exit_mutex; unsigned int state; }; -#else -struct futex_ctrl { }; -#endif /* !CONFIG_FUTEX */ /** * struct futex_mm_data - Futex related per MM data @@ -71,4 +70,9 @@ struct futex_mm_data { #endif }; +#else +struct futex_sched_data { }; +struct futex_mm_data { }; +#endif /* !CONFIG_FUTEX */ + #endif /* _LINUX_FUTEX_TYPES_H */ diff --git a/include/linux/sched.h b/include/linux/sched.h index 266d4859e322..a5d5c0ec3c64 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1329,7 +1329,7 @@ struct task_struct { u32 rmid; #endif - struct futex_ctrl futex; + struct futex_sched_data futex; #ifdef CONFIG_PERF_EVENTS u8 perf_recursion[PERF_NR_CONTEXTS]; diff --git a/include/uapi/linux/futex.h b/include/uapi/linux/futex.h index ab9d89748595..e447eaea63f4 100644 --- a/include/uapi/linux/futex.h +++ b/include/uapi/linux/futex.h @@ -26,6 +26,7 @@ #define FUTEX_PRIVATE_FLAG 128 #define FUTEX_CLOCK_REALTIME 256 #define FUTEX_UNLOCK_ROBUST 512 +#define FUTEX_ROBUST_LIST32 1024 -#define FUTEX_CMD_MASK ~(FUTEX_PRIVATE_FLAG | FUTEX_CLOCK_REALTIME | FUTEX_UNLOCK_ROBUST) +#define FUTEX_CMD_MASK ~(FUTEX_PRIVATE_FLAG | FUTEX_CLOCK_REALTIME | FUTEX_UNLOCK_ROBUST | FUTEX_ROBUST_LIST32) #define FUTEX_WAIT_PRIVATE (FUTEX_WAIT | FUTEX_PRIVATE_FLAG) @@ -182,23 +183,6 @@ struct robust_list_head { #define FUTEX_ROBUST_MOD_PI (0x1UL) #define FUTEX_ROBUST_MOD_MASK (FUTEX_ROBUST_MOD_PI) -/* - * Modifier for FUTEX_ROBUST_UNLOCK uaddr2. Required to distinguish the storage - * size for the robust_list_head::list_pending_op. This solves two problems: - * - * 1) COMPAT tasks - * - * 2) The mixed mode magic gaming use case which has both 32-bit and 64-bit - * robust lists. Oh well.... - * - * Long story short: 32-bit userspace must set this bit unconditionally to - * ensure that it can run on a 64-bit kernel in compat mode. If user space - * screws that up a 64-bit kernel will happily clear the full 64-bits. 32-bit - * kernels return an error code if the bit is not set.
- */ -#define FUTEX_ROBUST_UNLOCK_MOD_32BIT (0x1UL) -#define FUTEX_ROBUST_UNLOCK_MOD_MASK (FUTEX_ROBUST_UNLOCK_MOD_32BIT) - /* * bitset with all bits set for the FUTEX_xxx_BITSET OPs to request a * match of any bit. diff --git a/include/vdso/futex.h b/include/vdso/futex.h index 8061bfcb6b92..a768c00b0ada 100644 --- a/include/vdso/futex.h +++ b/include/vdso/futex.h @@ -2,12 +2,11 @@ #ifndef _VDSO_FUTEX_H #define _VDSO_FUTEX_H -#include - -struct robust_list; +#include /** - * __vdso_futex_robust_try_unlock - Try to unlock an uncontended robust futex + * __vdso_futex_robust_try_unlock_64 - Try to unlock an uncontended robust futex + * with a 64-bit op pointer * @lock: Pointer to the futex lock object * @tid: The TID of the calling task * @op: Pointer to the task's robust_list_head::list_pending_op @@ -39,6 +38,23 @@ struct robust_list; * @uaddr2 argument for sys_futex(FUTEX_ROBUST_UNLOCK) operations. See the * modifier and the related documentation in include/uapi/linux/futex.h */ -uint32_t __vdso_futex_robust_try_unlock(uint32_t *lock, uint32_t tid, void *op); +__u32 __vdso_futex_robust_try_unlock_64(__u32 *lock, __u32 tid, __u64 *op); + +/** + * __vdso_futex_robust_try_unlock_32 - Try to unlock an uncontended robust futex + * with a 32-bit op pointer + * @lock: Pointer to the futex lock object + * @tid: The TID of the calling task + * @op: Pointer to the task's robust_list_head::list_pending_op + * + * Return: The content of *@lock. On success this is the same as @tid. + * + * Same as __vdso_futex_robust_try_unlock_64() just with a 32-bit @op pointer. 
+ */ +__u32 __vdso_futex_robust_try_unlock_32(__u32 *lock, __u32 tid, __u32 *op); + +/* Modifier to convey the size of the op pointer */ +#define FUTEX_ROBUST_UNLOCK_MOD_32BIT (0x1UL) +#define FUTEX_ROBUST_UNLOCK_MOD_MASK (FUTEX_ROBUST_UNLOCK_MOD_32BIT) #endif diff --git a/kernel/futex/core.c b/kernel/futex/core.c index 7957edd46b89..39041cf94522 100644 --- a/kernel/futex/core.c +++ b/kernel/futex/core.c @@ -46,6 +46,8 @@ #include #include +#include + #include "futex.h" #include "../locking/rtmutex_common.h" @@ -1434,17 +1436,9 @@ static void exit_pi_state_list(struct task_struct *curr) static inline void exit_pi_state_list(struct task_struct *curr) { } #endif -static inline bool mask_pop_addr(void __user **pop) -{ - unsigned long addr = (unsigned long)*pop; - - *pop = (void __user *) (addr & ~FUTEX_ROBUST_UNLOCK_MOD_MASK); - return !!(addr & FUTEX_ROBUST_UNLOCK_MOD_32BIT); -} - -bool futex_robust_list_clear_pending(void __user *pop) +bool futex_robust_list_clear_pending(void __user *pop, unsigned int flags) { - bool size32bit = mask_pop_addr(&pop); + bool size32bit = !!(flags & FLAGS_ROBUST_LIST32); if (!IS_ENABLED(CONFIG_64BIT) && !size32bit) return false; @@ -1456,15 +1450,28 @@ bool futex_robust_list_clear_pending(void __user *pop) } #ifdef CONFIG_FUTEX_ROBUST_UNLOCK +static inline bool mask_pop_addr(void __user **pop) +{ + unsigned long addr = (unsigned long)*pop; + + *pop = (void __user *) (addr & ~FUTEX_ROBUST_UNLOCK_MOD_MASK); + return !!(addr & FUTEX_ROBUST_UNLOCK_MOD_32BIT); +} + void __futex_fixup_robust_unlock(struct pt_regs *regs) { + unsigned int flags = 0; void __user *pop; if (!arch_futex_needs_robust_unlock_fixup(regs)) return; pop = arch_futex_robust_unlock_get_pop(regs); + + if (mask_pop_addr(&pop)) + flags = FLAGS_ROBUST_LIST32; + + futex_robust_list_clear_pending(pop, flags); } #endif /* CONFIG_FUTEX_ROBUST_UNLOCK */ diff --git a/kernel/futex/futex.h b/kernel/futex/futex.h index
b1aaa90f1779..31a5bae8b470 100644 --- a/kernel/futex/futex.h +++ b/kernel/futex/futex.h @@ -41,6 +41,7 @@ #define FLAGS_STRICT 0x0100 #define FLAGS_MPOL 0x0200 #define FLAGS_UNLOCK_ROBUST 0x0400 +#define FLAGS_ROBUST_LIST32 0x0800 /* FUTEX_ to FLAGS_ */ static inline unsigned int futex_to_flags(unsigned int op) @@ -56,6 +57,9 @@ static inline unsigned int futex_to_flags(unsigned int op) if (op & FUTEX_UNLOCK_ROBUST) flags |= FLAGS_UNLOCK_ROBUST; + if (op & FUTEX_ROBUST_LIST32) + flags |= FLAGS_ROBUST_LIST32; + return flags; } @@ -452,6 +456,6 @@ extern int futex_unlock_pi(u32 __user *uaddr, unsigned int flags, void __user *p extern int futex_lock_pi(u32 __user *uaddr, unsigned int flags, ktime_t *time, int trylock); -bool futex_robust_list_clear_pending(void __user *pop); +bool futex_robust_list_clear_pending(void __user *pop, unsigned int flags); #endif /* _FUTEX_H */ diff --git a/kernel/futex/pi.c b/kernel/futex/pi.c index b8c76b6242e4..05ca360a7a30 100644 --- a/kernel/futex/pi.c +++ b/kernel/futex/pi.c @@ -1298,7 +1298,7 @@ int futex_unlock_pi(u32 __user *uaddr, unsigned int flags, void __user *pop) if (ret || !(flags & FLAGS_UNLOCK_ROBUST)) return ret; - if (!futex_robust_list_clear_pending(pop)) + if (!futex_robust_list_clear_pending(pop, flags)) return -EFAULT; return 0; diff --git a/kernel/futex/waitwake.c b/kernel/futex/waitwake.c index 45effcf42961..233f38b1f52e 100644 --- a/kernel/futex/waitwake.c +++ b/kernel/futex/waitwake.c @@ -166,7 +166,7 @@ static bool futex_robust_unlock(u32 __user *uaddr, unsigned int flags, void __us * deeper trouble as the robust list head is usually part of TLS. The * chance of survival is close to zero. */ - return futex_robust_list_clear_pending(pop); + return futex_robust_list_clear_pending(pop, flags); } /*