From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 20 Mar 2026 00:24:03 +0100
Message-ID: <20260319225224.853416463@kernel.org>
User-Agent: quilt/0.68
From: Thomas Gleixner
To: LKML
Cc: Mathieu Desnoyers, André Almeida, Sebastian Andrzej Siewior,
 Carlos O'Donell, Peter Zijlstra, Florian Weimer, Rich Felker,
 Torvald Riegel, Darren Hart, Ingo Molnar, Davidlohr Bueso,
 Arnd Bergmann, "Liam R. Howlett", Uros Bizjak, Thomas Weißschuh
Subject: [patch v2 00/11] futex: Address the robust futex unlock race for real
X-Mailing-List: linux-kernel@vger.kernel.org

This is a follow-up to v1, which can be found here:

   https://lore.kernel.org/20260316162316.356674433@kernel.org

The v1 cover letter contains a detailed analysis of the underlying
problem. TLDR:

The robust futex unlock mechanism is racy with respect to the clearing of
the robust_list_head::list_op_pending pointer because the unlock and the
clearing of the pointer are not atomic. The race window is between the
unlock and the clearing of the pending op pointer. If the task is forced
to exit in this window, exit will access a potentially invalid pending op
pointer when cleaning up the robust list. That happens if another task
manages to unmap the object containing the lock before the cleanup, which
results in a UAF. In the worst case this UAF can lead to memory corruption
when unrelated content has been mapped to the same address by the time the
access happens.

User space can't solve this problem without help from the kernel. This
series provides the kernel side infrastructure to help it along:

 1) A combined unlock, pointer clearing and wake-up operation for the
    contended case

 2) VDSO based unlock and pointer clearing helpers, with a fix-up function
    in the kernel for the case that user space was interrupted within the
    critical section.
Both ensure that the pointer clearing happens _before_ a task exits and
the kernel cleans up the robust list during the exit procedure.

Changes since v1:

  - Use a dedicated modifier flag FUTEX_ROBUST_LIST32 for the futex
    syscall ops to indicate the size of the pending op pointer. This
    replaces the v1 pointer mangling where bit 0 indicated 32-bit size.
    The flag is mandatory for 32-bit applications. - Florian

  - Add a new unsafe_atomic_store_release_user() helper and use it for
    the unlock. It defaults to 'smp_mb(); unsafe_put_user()'.
    Architectures where a store implies release semantics can remove the
    smp_mb() by selecting a config option (done for x86). The helper can
    be overridden by architecture implementations. - André, Peter

  - Remove the global exposure of the critical section labels. They are
    now only available through vdsoXX.so.dbg, which is sufficient for
    vdso2c to find them. - Thomas, Mathieu

  - Replace the combined mixed size VDSO unlock helper with dedicated
    functions for 64-bit and 32-bit pointers. This requires evaluating up
    to two ranges in the critical section IP check, but it removes the
    complexity both in the unlock helpers and in the architecture
    specific decision and pointer retrieval functions.

  - Use the proper ASM modifiers and reuse the zeroed register, which is
    used for the cmpxchg(), for zeroing the pending op pointer instead of
    using a $0 immediate. - Uros

  - Address coding style, documentation and naming feedback - André

Thanks to everyone for feedback and discussion!

The modified test case and the delta patch against the previous version
are below.

The series applies on v7.0-rc3 and is also available via git:

  git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git locking-futex-v2

According to my basic testing this still works correctly with both sizes,
but that obviously needs more scrutiny, especially from the libc people
and André.
If the functionality itself is agreed on, we only need to agree on the
names and signatures of the functions exposed through the VDSO before we
set them in stone. That will hopefully not take another 15 years :)

That said, after 20+ years working on futexes I'm still amazed how much
code is required to interact with one or two memory locations in user
space. I long ago stated that the futex code consists of 5% functionality
and 95% corner case handling. After this work episode I'm convinced that
futexes are nothing else than an infinite Rube Goldberg machine in
disguise.

Thanks,

	tglx
---
#define _GNU_SOURCE
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include

typedef uint32_t (*frtu64_t)(uint32_t *, uint32_t, uint64_t *);
typedef uint32_t (*frtu32_t)(uint32_t *, uint32_t, uint32_t *);

static frtu64_t frtu64;
static frtu32_t frtu32;

static void get_vdso(void)
{
	void *vdso = dlopen("linux-vdso.so.1", RTLD_LAZY | RTLD_LOCAL | RTLD_NOLOAD);

	if (!vdso)
		vdso = dlopen("linux-gate.so.1", RTLD_LAZY | RTLD_LOCAL | RTLD_NOLOAD);
	if (!vdso)
		vdso = dlopen("linux-vdso32.so.1", RTLD_LAZY | RTLD_LOCAL | RTLD_NOLOAD);
	if (!vdso)
		vdso = dlopen("linux-vdso64.so.1", RTLD_LAZY | RTLD_LOCAL | RTLD_NOLOAD);
	if (!vdso) {
		printf("Failed to find vDSO\n");
		exit(1);
	}

	frtu64 = (frtu64_t)dlsym(vdso, "__vdso_futex_robust_list64_try_unlock");
	frtu32 = (frtu32_t)dlsym(vdso, "__vdso_futex_robust_list32_try_unlock");
}

#define FUTEX_ROBUST_UNLOCK	512
#define FUTEX_ROBUST_LIST32	1024

#define UNLOCK64	(FUTEX_WAKE | FUTEX_ROBUST_UNLOCK)
#define UNLOCK64_PI	(FUTEX_UNLOCK_PI | FUTEX_ROBUST_UNLOCK)
#define UNLOCK32	(UNLOCK64 | FUTEX_ROBUST_LIST32)
#define UNLOCK32_PI	(UNLOCK64_PI | FUTEX_ROBUST_LIST32)

static void set_pop(struct robust_list_head *rhead, pthread_mutex_t *mutex)
{
	rhead->list_op_pending = (struct robust_list *)&mutex->__data.__list.__next;
}

static void *pop_exp(bool sz32, pthread_mutex_t *mutex)
{
	uint64_t exp = (uint64_t)(unsigned long)&mutex->__data.__list.__next;

	if (!sz32)
		return NULL;
	exp &= ~0xFFFFFFFFULL;
	return (void *)(unsigned long)exp;
}

static void unlock_uncontended(bool sz32, struct robust_list_head *rhead,
			       pthread_mutex_t *mutex, pid_t tid)
{
	uint32_t *lock = (uint32_t *)&mutex->__data.__lock;
	void *exp = pop_exp(sz32, mutex);
	pid_t lock_tid;

	set_pop(rhead, mutex);
	*lock = tid;

	if (sz32)
		lock_tid = frtu32(lock, tid, (uint32_t *)&rhead->list_op_pending);
	else
		lock_tid = frtu64(lock, tid, (uint64_t *)&rhead->list_op_pending);

	if (lock_tid != tid)
		printf("Non contended unlock failed. Return: %08x\n", lock_tid);
	else if (rhead->list_op_pending != exp)
		printf("List op not cleared: %16lx\n", (unsigned long)rhead->list_op_pending);
	else if (*lock)
		printf("Non contended unlock failed: LOCK %08x\n", *lock);
	else
		printf("Non contended unlock succeeded\n");
}

static void unlock_syscall(bool sz32, bool pi, struct robust_list_head *rhead,
			   pthread_mutex_t *mutex, pid_t tid)
{
	uint32_t *lock = (uint32_t *)&mutex->__data.__lock;
	void *exp = pop_exp(sz32, mutex);
	unsigned int op;
	int ret;

	if (sz32)
		op = pi ? UNLOCK32_PI : UNLOCK32;
	else
		op = pi ? UNLOCK64_PI : UNLOCK64;

	ret = syscall(SYS_futex, lock, op, !!pi, NULL,
		      (uint32_t *)&rhead->list_op_pending, 0, 0);
	if (ret < 0)
		printf("syscall unlock failed %d\n", errno);
	else if (rhead->list_op_pending != exp)
		printf("List op not cleared in syscall: %16lx\n", (unsigned long)rhead->list_op_pending);
	else if (*lock)
		printf("Contended syscall unlock failed: LOCK %08x\n", *lock);
	else
		printf("Contended unlock syscall succeeded\n");
}

static void unlock_contended(bool sz32, bool pi, struct robust_list_head *rhead,
			     pthread_mutex_t *mutex, pid_t tid)
{
	uint32_t *lock = (uint32_t *)&mutex->__data.__lock;
	pid_t lock_tid;

	set_pop(rhead, mutex);
	*lock = tid | FUTEX_WAITERS;

	if (sz32)
		lock_tid = frtu32(lock, tid, (uint32_t *)&rhead->list_op_pending);
	else
		lock_tid = frtu64(lock, tid, (uint64_t *)&rhead->list_op_pending);

	if (lock_tid == tid)
		printf("Contended unlock succeeded %08x\n", lock_tid);
	else if (rhead->list_op_pending != (struct robust_list *)&mutex->__data.__list.__next)
		printf("List op cleared: %16lx\n", (unsigned long)rhead->list_op_pending);
	else if (!*lock)
		printf("Contended unlock cleared LOCK %08x\n", *lock);
	else
		unlock_syscall(sz32, pi, rhead, mutex, tid);
}

static void test(bool sz32)
{
	pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
	struct robust_list_head *rhead;
	pid_t tid = gettid();
	size_t sz;

	if (sz32 && !frtu32)
		return;
	if (!sz32 && !frtu64)
		return;

	syscall(SYS_get_robust_list, 0, &rhead, &sz);

	printf("Testing non contended unlock %s\n", sz32 ? "POP32" : "POP64");
	unlock_uncontended(sz32, rhead, &mutex, tid);

	printf("Testing contended FUTEX_WAKE unlock %s\n", sz32 ? "POP32" : "POP64");
	unlock_contended(sz32, false, rhead, &mutex, tid);

	printf("Testing contended FUTEX_UNLOCK_PI %s\n", sz32 ? "POP32" : "POP64");
	unlock_contended(sz32, true, rhead, &mutex, tid);
}

int main(int argc, char * const argv[])
{
	get_vdso();
	test(false);
	test(true);
	return 0;
}
---

Delta diff:

---
diff --git a/arch/Kconfig b/arch/Kconfig
index 102ddbd4298e..0c1e6cc101ff 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -403,6 +403,10 @@ config ARCH_32BIT_OFF_T
 config ARCH_32BIT_USTAT_F_TINODE
 	bool
 
+# Selected by architectures when plain stores have release semantics
+config ARCH_STORE_IMPLIES_RELEASE
+	bool
+
 config HAVE_ASM_MODVERSIONS
 	bool
 	help
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 4be953f0516b..e9437efae787 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -119,6 +119,7 @@ config X86
 	select ARCH_MIGHT_HAVE_PC_PARPORT
 	select ARCH_MIGHT_HAVE_PC_SERIO
 	select ARCH_STACKWALK
+	select ARCH_STORE_IMPLIES_RELEASE
 	select ARCH_SUPPORTS_ACPI
 	select ARCH_SUPPORTS_ATOMIC_RMW
 	select ARCH_SUPPORTS_DEBUG_PAGEALLOC
diff --git a/arch/x86/entry/vdso/common/vfutex.c b/arch/x86/entry/vdso/common/vfutex.c
index 19d8ef130b63..8df8fd6c759d 100644
--- a/arch/x86/entry/vdso/common/vfutex.c
+++ b/arch/x86/entry/vdso/common/vfutex.c
@@ -2,71 +2,75 @@
 #include
 
 /*
- * Compat enabled kernels have to take the size bit into account to support the
- * mixed size use case of gaming emulators. Contrary to the kernel robust unlock
- * mechanism all of this does not test for the 32-bit modifier in 32-bit VDSOs
- * and in compat disabled kernels. User space can keep the pieces.
+ * Assembly template for the try unlock functions. The basic functionality is:
+ *
+ *	movl	%esi, %eax		Move the TID into EAX
+ *	xorl	%ecx, %ecx		Clear ECX
+ *	lock cmpxchgl %ecx, (%rdi)	Attempt the TID -> 0 transition
+ * .Lcs_start:				Start of the critical section
+ *	jnz	.Lcs_end		If the cmpxchgl failed jump to the end
+ * .Lcs_success:			Start of the success section
+ *	movq	%rcx, (%rdx)		Set the pending op pointer to 0
+ * .Lcs_end:				End of the critical section
+ *
+ * .Lcs_start and .Lcs_end establish the critical section range.
+ * .Lcs_success is technically not required, but is there for illustration,
+ * debugging and testing.
+ *
+ * When CONFIG_COMPAT is enabled, the 64-bit VDSO provides two functions:
+ * one for the regular 64-bit sized pending operation pointer and one for a
+ * 32-bit sized pointer to support gaming emulators.
+ *
+ * The 32-bit VDSO provides only the one for 32-bit sized pointers.
  */
-#if defined(CONFIG_X86_64) && !defined(BUILD_VDSO32_64)
+#define __stringify_1(x...)	#x
+#define __stringify(x...)	__stringify_1(x)
 
-#ifdef CONFIG_COMPAT
-
-# define ASM_CLEAR_PTR \
-	"	testl	$1, (%[pop])	\n" \
-	"	jz	.Lop64		\n" \
-	"	movl	$0, (%[pad])	\n" \
-	"	jmp	__vdso_futex_robust_try_unlock_cs_end \n" \
-	".Lop64:			\n" \
-	"	movq	$0, (%[pad])	\n"
-
-# define ASM_PAD_CONSTRAINT	,[pad] "S" (((unsigned long)pop) & ~0x1UL)
-
-#else /* CONFIG_COMPAT */
-
-# define ASM_CLEAR_PTR \
-	"	movq	$0, (%[pop])	\n"
-
-# define ASM_PAD_CONSTRAINT
-
-#endif /* !CONFIG_COMPAT */
+#define LABEL(name, which)	__stringify(name##_futex_try_unlock_cs_##which:)
 
-#else /* CONFIG_X86_64 && !BUILD_VDSO32_64 */
+#define JNZ_END(name)		"jnz " __stringify(name) "_futex_try_unlock_cs_end\n"
 
-# define ASM_CLEAR_PTR \
-	"	movl	$0, (%[pad])	\n"
+#define CLEAR_POPQ		"movq %[zero], %a[pop]\n"
+#define CLEAR_POPL		"movl %k[zero], %a[pop]\n"
 
-# define ASM_PAD_CONSTRAINT	,[pad] "S" (((unsigned long)pop) & ~0x1UL)
+#define futex_robust_try_unlock(name, clear_pop, __lock, __tid, __pop)	\
+({									\
+	asm volatile (							\
+		"					\n"		\
+		"	lock cmpxchgl %k[zero], %a[lock] \n"		\
+		"					\n"		\
+		LABEL(name, start)					\
+		"					\n"		\
+		JNZ_END(name)						\
+		"					\n"		\
+		LABEL(name, success)					\
+		"					\n"		\
+		clear_pop						\
+		"					\n"		\
+		LABEL(name, end)					\
+		: [tid] "+&a" (__tid)					\
+		: [lock] "D" (__lock),					\
+		  [pop] "d" (__pop),					\
+		  [zero] "S" (0UL)					\
+		: "memory"						\
+	);								\
+	__tid;								\
+})
 
-#endif /* !CONFIG_X86_64 || BUILD_VDSO32_64 */
-
-uint32_t __vdso_futex_robust_try_unlock(uint32_t *lock, uint32_t tid, void *pop)
+#ifdef __x86_64__
+__u32 __vdso_futex_robust_list64_try_unlock(__u32 *lock, __u32 tid, __u64 *pop)
 {
-	asm volatile (
-		".global __vdso_futex_robust_try_unlock_cs_start	\n"
-		".global __vdso_futex_robust_try_unlock_cs_success	\n"
-		".global __vdso_futex_robust_try_unlock_cs_end		\n"
-		"							\n"
-		"	lock cmpxchgl %[val], (%[ptr])			\n"
-		"							\n"
-		"__vdso_futex_robust_try_unlock_cs_start:		\n"
-		"							\n"
-		"	jnz	__vdso_futex_robust_try_unlock_cs_end	\n"
-		"							\n"
-		"__vdso_futex_robust_try_unlock_cs_success:		\n"
-		"							\n"
-		ASM_CLEAR_PTR
-		"							\n"
-		"__vdso_futex_robust_try_unlock_cs_end:			\n"
-		: [tid] "+a" (tid)
-		: [ptr] "D" (lock),
-		  [pop] "d" (pop),
-		  [val] "r" (0)
-		  ASM_PAD_CONSTRAINT
-		: "memory"
-	);
-
-	return tid;
+	return futex_robust_try_unlock(x86_64, CLEAR_POPQ, lock, tid, pop);
 }
 
-uint32_t futex_robust_try_unlock(uint32_t *, uint32_t, void **)
-	__attribute__((weak, alias("__vdso_futex_robust_try_unlock")));
+#ifdef CONFIG_COMPAT
+__u32 __vdso_futex_robust_list32_try_unlock(__u32 *lock, __u32 tid, __u32 *pop)
+{
+	return futex_robust_try_unlock(x86_64_compat, CLEAR_POPL, lock, tid, pop);
+}
+#endif /* CONFIG_COMPAT */
+#else /* __x86_64__ */
+__u32 __vdso_futex_robust_list32_try_unlock(__u32 *lock, __u32 tid, __u32 *pop)
+{
+	return futex_robust_try_unlock(x86_32, CLEAR_POPL, lock, tid, pop);
+}
+#endif /* !__x86_64__ */
diff --git a/arch/x86/entry/vdso/vdso32/vdso32.lds.S b/arch/x86/entry/vdso/vdso32/vdso32.lds.S
index b027d2f98bd0..cee8f7f9fe80 100644
--- a/arch/x86/entry/vdso/vdso32/vdso32.lds.S
+++ b/arch/x86/entry/vdso/vdso32/vdso32.lds.S
@@ -31,10 +31,7 @@ VERSION
 		__vdso_clock_getres_time64;
 		__vdso_getcpu;
 #ifdef CONFIG_FUTEX_ROBUST_UNLOCK
-		__vdso_futex_robust_try_unlock;
-		__vdso_futex_robust_try_unlock_cs_start;
-		__vdso_futex_robust_try_unlock_cs_success;
-		__vdso_futex_robust_try_unlock_cs_end;
+		__vdso_futex_robust_list32_try_unlock;
 #endif
 	};
diff --git a/arch/x86/entry/vdso/vdso64/vdso64.lds.S b/arch/x86/entry/vdso/vdso64/vdso64.lds.S
index e5c0ca9664e1..11dae35358a2 100644
--- a/arch/x86/entry/vdso/vdso64/vdso64.lds.S
+++ b/arch/x86/entry/vdso/vdso64/vdso64.lds.S
@@ -33,10 +33,8 @@ VERSION {
 		getrandom;
 		__vdso_getrandom;
 #ifdef CONFIG_FUTEX_ROBUST_UNLOCK
-		__vdso_futex_robust_try_unlock;
-		__vdso_futex_robust_try_unlock_cs_start;
-		__vdso_futex_robust_try_unlock_cs_success;
-		__vdso_futex_robust_try_unlock_cs_end;
+		__vdso_futex_robust_list64_try_unlock;
+		__vdso_futex_robust_list32_try_unlock;
 #endif
 	local: *;
 };
diff --git a/arch/x86/entry/vdso/vdso64/vdsox32.lds.S b/arch/x86/entry/vdso/vdso64/vdsox32.lds.S
index 4409d97e7ef6..0e844af63304 100644
--- a/arch/x86/entry/vdso/vdso64/vdsox32.lds.S
+++ b/arch/x86/entry/vdso/vdso64/vdsox32.lds.S
@@ -23,10 +23,8 @@ VERSION {
 		__vdso_time;
 		__vdso_clock_getres;
 #ifdef CONFIG_FUTEX_ROBUST_UNLOCK
-		__vdso_futex_robust_try_unlock;
-		__vdso_futex_robust_try_unlock_cs_start;
-		__vdso_futex_robust_try_unlock_cs_success;
-		__vdso_futex_robust_try_unlock_cs_end;
+		__vdso_futex_robust_list64_try_unlock;
+		__vdso_futex_robust_list32_try_unlock;
 #endif
 	local: *;
 };
diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index 5ccb45840f79..ad87818d42a0 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -78,13 +78,28 @@ static void vdso_futex_robust_unlock_update_ips(void)
 {
 	const struct vdso_image *image = current->mm->context.vdso_image;
 	unsigned long vdso = (unsigned long) current->mm->context.vdso;
+	struct futex_mm_data *fd = &current->mm->futex;
+	struct futex_unlock_cs_range *csr = fd->unlock_cs_ranges;
 
-	current->mm->futex.unlock_cs_start_ip =
-		vdso + image->sym___vdso_futex_robust_try_unlock_cs_start;
-	current->mm->futex.unlock_cs_success_ip =
-		vdso + image->sym___vdso_futex_robust_try_unlock_cs_success;
-	current->mm->futex.unlock_cs_end_ip =
-		vdso + image->sym___vdso_futex_robust_try_unlock_cs_end;
+	fd->unlock_cs_num_ranges = 0;
+#ifdef CONFIG_X86_64
+	if (image->sym_x86_64_futex_try_unlock_cs_start) {
+		csr->start_ip = vdso + image->sym_x86_64_futex_try_unlock_cs_start;
+		csr->end_ip = vdso + image->sym_x86_64_futex_try_unlock_cs_end;
+		csr->pop_size32 = 0;
+		csr++;
+		fd->unlock_cs_num_ranges++;
+	}
+#endif /* CONFIG_X86_64 */
+
+#if defined(CONFIG_X86_32) || defined(CONFIG_COMPAT)
+	if (image->sym_x86_32_futex_try_unlock_cs_start) {
+		csr->start_ip = vdso + image->sym_x86_32_futex_try_unlock_cs_start;
+		csr->end_ip = vdso + image->sym_x86_32_futex_try_unlock_cs_end;
+		csr->pop_size32 = 1;
+		fd->unlock_cs_num_ranges++;
+	}
+#endif /* CONFIG_X86_32 || CONFIG_COMPAT */
 }
 #else
 static inline void vdso_futex_robust_unlock_update_ips(void) { }
diff --git a/arch/x86/include/asm/futex_robust.h b/arch/x86/include/asm/futex_robust.h
index bcabb289d18e..e87954703ae2 100644
--- a/arch/x86/include/asm/futex_robust.h
+++ b/arch/x86/include/asm/futex_robust.h
@@ -4,41 +4,16 @@
 
 #include
 
-static __always_inline bool x86_futex_needs_robust_unlock_fixup(struct pt_regs *regs)
+static __always_inline void __user *x86_futex_robust_unlock_get_pop(struct pt_regs *regs)
 {
 	/*
-	 * This is tricky in the compat case as it has to take the size check
-	 * into account. See the ASM magic in the VDSO vfutex code. If compat is
-	 * disabled or this is a 32-bit kernel then ZF is authoritive no matter
-	 * what.
-	 */
-	if (!IS_ENABLED(CONFIG_X86_64) || !IS_ENABLED(CONFIG_IA32_EMULATION))
-		return !!(regs->flags & X86_EFLAGS_ZF);
-
-	/*
-	 * For the compat case, the core code already established that regs->ip
-	 * is >= cs_start and < cs_end. Now check whether it is at the
-	 * conditional jump which checks the cmpxchg() or if it succeeded and
-	 * does the size check, which obviously modifies ZF too.
-	 */
-	if (regs->ip >= current->mm->futex.unlock_cs_success_ip)
-		return true;
-	/*
-	 * It's at the jnz right after the cmpxchg(). ZF tells whether this
-	 * succeeded or not.
+	 * If ZF is set then the cmpxchg succeeded and the pending op pointer
+	 * needs to be cleared.
 	 */
-	return !!(regs->flags & X86_EFLAGS_ZF);
-}
-
-#define arch_futex_needs_robust_unlock_fixup(regs)	\
-	x86_futex_needs_robust_unlock_fixup(regs)
-
-static __always_inline void __user *x86_futex_robust_unlock_get_pop(struct pt_regs *regs)
-{
-	return (void __user *)regs->dx;
+	return regs->flags & X86_EFLAGS_ZF ? (void __user *)regs->dx : NULL;
 }
 
-#define arch_futex_robust_unlock_get_pop(regs) \
+#define arch_futex_robust_unlock_get_pop(regs)	\
 	x86_futex_robust_unlock_get_pop(regs)
 
 #endif /* _ASM_X86_FUTEX_ROBUST_H */
diff --git a/arch/x86/include/asm/vdso.h b/arch/x86/include/asm/vdso.h
index 1ed7a2ae600d..b96a6f04d677 100644
--- a/arch/x86/include/asm/vdso.h
+++ b/arch/x86/include/asm/vdso.h
@@ -25,9 +25,12 @@ struct vdso_image {
 	long sym_int80_landing_pad;
 	long sym_vdso32_sigreturn_landing_pad;
 	long sym_vdso32_rt_sigreturn_landing_pad;
-	long sym___vdso_futex_robust_try_unlock_cs_start;
-	long sym___vdso_futex_robust_try_unlock_cs_success;
-	long sym___vdso_futex_robust_try_unlock_cs_end;
+	long sym_x86_64_futex_try_unlock_cs_start;
+	long sym_x86_64_futex_try_unlock_cs_end;
+	long sym_x86_64_compat_futex_try_unlock_cs_start;
+	long sym_x86_64_compat_futex_try_unlock_cs_end;
+	long sym_x86_32_futex_try_unlock_cs_start;
+	long sym_x86_32_futex_try_unlock_cs_end;
 };
 
 extern const struct vdso_image vdso64_image;
diff --git a/arch/x86/tools/vdso2c.c b/arch/x86/tools/vdso2c.c
index 47012474ccc4..2d01e511ca8a 100644
--- a/arch/x86/tools/vdso2c.c
+++ b/arch/x86/tools/vdso2c.c
@@ -82,9 +82,12 @@ struct vdso_sym required_syms[] = {
 	{"int80_landing_pad", true},
 	{"vdso32_rt_sigreturn_landing_pad", true},
 	{"vdso32_sigreturn_landing_pad", true},
-	{"__vdso_futex_robust_try_unlock_cs_start", true},
-	{"__vdso_futex_robust_try_unlock_cs_success", true},
-	{"__vdso_futex_robust_try_unlock_cs_end", true},
+	{"x86_64_futex_try_unlock_cs_start", true},
+	{"x86_64_futex_try_unlock_cs_end", true},
+	{"x86_64_compat_futex_try_unlock_cs_start", true},
+	{"x86_64_compat_futex_try_unlock_cs_end", true},
+	{"x86_32_futex_try_unlock_cs_start", true},
+	{"x86_32_futex_try_unlock_cs_end", true},
 };
 
 __attribute__((format(printf, 1, 2))) __attribute__((noreturn))
diff --git a/include/linux/futex.h b/include/linux/futex.h
index cd347a4b3fac..8e3d46737b03 100644
--- a/include/linux/futex.h
+++ b/include/linux/futex.h
@@ -115,18 +115,20 @@ static inline int futex_mm_init(struct mm_struct *mm) { return 0; }
 #ifdef CONFIG_FUTEX_ROBUST_UNLOCK
 #include
 
-void __futex_fixup_robust_unlock(struct pt_regs *regs);
+void __futex_fixup_robust_unlock(struct pt_regs *regs, struct futex_unlock_cs_range *csr);
 
-static inline bool futex_within_robust_unlock(struct pt_regs *regs)
+static inline bool futex_within_robust_unlock(struct pt_regs *regs,
+					      struct futex_unlock_cs_range *csr)
 {
 	unsigned long ip = instruction_pointer(regs);
 
-	return ip >= current->mm->futex.unlock_cs_start_ip &&
-	       ip < current->mm->futex.unlock_cs_end_ip;
+	return ip >= csr->start_ip && ip < csr->end_ip;
 }
 
 static inline void futex_fixup_robust_unlock(struct pt_regs *regs)
 {
+	struct futex_unlock_cs_range *csr;
+
 	/*
 	 * Avoid dereferencing current->mm if not returning from interrupt.
	 * current->rseq.event is going to be used anyway in the exit to user
@@ -135,8 +137,17 @@ static inline void futex_fixup_robust_unlock(struct pt_regs *regs)
 	if (!current->rseq.event.user_irq)
 		return;
 
-	if (unlikely(futex_within_robust_unlock(regs)))
-		__futex_fixup_robust_unlock(regs);
+	csr = current->mm->futex.unlock_cs_ranges;
+	if (unlikely(futex_within_robust_unlock(regs, csr))) {
+		__futex_fixup_robust_unlock(regs, csr);
+		return;
+	}
+
+	/* Multi sized robust lists are only supported with CONFIG_COMPAT */
+	if (IS_ENABLED(CONFIG_COMPAT) && current->mm->futex.unlock_cs_num_ranges == 2) {
+		if (unlikely(futex_within_robust_unlock(regs, ++csr)))
+			__futex_fixup_robust_unlock(regs, csr);
+	}
 }
 #else /* CONFIG_FUTEX_ROBUST_UNLOCK */
 static inline void futex_fixup_robust_unlock(struct pt_regs *regs) {}
diff --git a/include/linux/futex_types.h b/include/linux/futex_types.h
index 223f469789c5..90e24a10ed08 100644
--- a/include/linux/futex_types.h
+++ b/include/linux/futex_types.h
@@ -11,13 +11,15 @@ struct futex_pi_state;
 struct robust_list_head;
 
 /**
- * struct futex_ctrl - Futex related per task data
+ * struct futex_sched_data - Futex related per task data
  * @robust_list:	User space registered robust list pointer
  * @compat_robust_list:	User space registered robust list pointer for compat tasks
+ * @pi_state_list:	List head for Priority Inheritance (PI) state management
+ * @pi_state_cache:	Pointer to cache one PI state object per task
  * @exit_mutex:		Mutex for serializing exit
  * @state:		Futex handling state to handle exit races correctly
  */
-struct futex_ctrl {
+struct futex_sched_data {
 	struct robust_list_head __user *robust_list;
 #ifdef CONFIG_COMPAT
 	struct compat_robust_list_head __user *compat_robust_list;
@@ -27,9 +29,20 @@ struct futex_ctrl {
 	struct mutex exit_mutex;
 	unsigned int state;
 };
-#else
-struct futex_ctrl { };
-#endif /* !CONFIG_FUTEX */
+
+/**
+ * struct futex_unlock_cs_range - Range for the VDSO unlock critical section
+ * @start_ip:	The start IP of the robust futex unlock critical section (inclusive)
+ * @end_ip:	The end IP of the robust futex unlock critical section (exclusive)
+ * @pop_size32:	Pending OP pointer size indicator. 0 == 64-bit, 1 == 32-bit
+ */
+struct futex_unlock_cs_range {
+	unsigned long	start_ip;
+	unsigned long	end_ip;
+	unsigned int	pop_size32;
+};
+
+#define FUTEX_ROBUST_MAX_CS_RANGES	2
 
 /**
  * struct futex_mm_data - Futex related per MM data
@@ -41,18 +54,9 @@ struct futex_ctrl { };
  * @phash_atomic:	Aggregate value for @phash_ref
  * @phash_ref:		Per CPU reference counter for a private hash
  *
- * @unlock_cs_start_ip:	The start IP of the robust futex unlock critical section
- *
- * @unlock_cs_success_ip: The IP of the robust futex unlock critical section which
- *			  indicates that the unlock (cmpxchg) was successful
- *			  Required to handle the compat size insanity for mixed mode
- *			  game emulators.
- *
- *			  Not evaluated by the core code as that only
- *			  evaluates the start/end range. Can therefore be 0 if the
- *			  architecture does not care.
- *
- * @unlock_cs_end_ip:	The end IP of the robust futex unlock critical section
+ * @unlock_cs_num_ranges: The number of critical section ranges for VDSO assisted unlock
+ *			  of robust futexes.
+ * @unlock_cs_ranges:	The critical section ranges for VDSO assisted unlock
  */
 struct futex_mm_data {
 #ifdef CONFIG_FUTEX_PRIVATE_HASH
@@ -65,10 +69,14 @@ struct futex_mm_data {
 	unsigned int __percpu		*phash_ref;
 #endif
 #ifdef CONFIG_FUTEX_ROBUST_UNLOCK
-	unsigned long			unlock_cs_start_ip;
-	unsigned long			unlock_cs_success_ip;
-	unsigned long			unlock_cs_end_ip;
+	unsigned int			unlock_cs_num_ranges;
+	struct futex_unlock_cs_range	unlock_cs_ranges[FUTEX_ROBUST_MAX_CS_RANGES];
 #endif
 };
+#else
+struct futex_sched_data { };
+struct futex_mm_data { };
+#endif /* !CONFIG_FUTEX */
+
 #endif /* _LINUX_FUTEX_TYPES_H */
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 266d4859e322..a5d5c0ec3c64 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1329,7 +1329,7 @@ struct task_struct {
 	u32				rmid;
 #endif
 
-	struct futex_ctrl		futex;
+	struct futex_sched_data		futex;
 
 #ifdef CONFIG_PERF_EVENTS
 	u8				perf_recursion[PERF_NR_CONTEXTS];
diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index 809e4f7dfdbd..bc41d619f9a3 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -644,6 +644,15 @@ static inline void user_access_restore(unsigned long flags) { }
 #define user_read_access_end	user_access_end
 #endif
 
+#ifndef unsafe_atomic_store_release_user
+# define unsafe_atomic_store_release_user(val, uptr, elbl)		\
+	do {								\
+		if (!IS_ENABLED(CONFIG_ARCH_STORE_IMPLIES_RELEASE))	\
+			smp_mb();					\
+		unsafe_put_user(val, uptr, elbl);			\
+	} while (0)
+#endif
+
 /* Define RW variant so the below _mode macro expansion works */
 #define masked_user_rw_access_begin(u)	masked_user_access_begin(u)
 #define user_rw_access_begin(u, s)	user_access_begin(u, s)
diff --git a/include/uapi/linux/futex.h b/include/uapi/linux/futex.h
index ab9d89748595..9a0f564f1737 100644
--- a/include/uapi/linux/futex.h
+++ b/include/uapi/linux/futex.h
@@ -26,23 +26,48 @@
 #define FUTEX_PRIVATE_FLAG	128
 #define FUTEX_CLOCK_REALTIME	256
 #define FUTEX_UNLOCK_ROBUST	512
-#define FUTEX_CMD_MASK		~(FUTEX_PRIVATE_FLAG | FUTEX_CLOCK_REALTIME | FUTEX_UNLOCK_ROBUST)
-
-#define FUTEX_WAIT_PRIVATE	(FUTEX_WAIT | FUTEX_PRIVATE_FLAG)
-#define FUTEX_WAKE_PRIVATE	(FUTEX_WAKE | FUTEX_PRIVATE_FLAG)
-#define FUTEX_REQUEUE_PRIVATE	(FUTEX_REQUEUE | FUTEX_PRIVATE_FLAG)
-#define FUTEX_CMP_REQUEUE_PRIVATE (FUTEX_CMP_REQUEUE | FUTEX_PRIVATE_FLAG)
-#define FUTEX_WAKE_OP_PRIVATE	(FUTEX_WAKE_OP | FUTEX_PRIVATE_FLAG)
-#define FUTEX_LOCK_PI_PRIVATE	(FUTEX_LOCK_PI | FUTEX_PRIVATE_FLAG)
-#define FUTEX_LOCK_PI2_PRIVATE	(FUTEX_LOCK_PI2 | FUTEX_PRIVATE_FLAG)
-#define FUTEX_UNLOCK_PI_PRIVATE	(FUTEX_UNLOCK_PI | FUTEX_PRIVATE_FLAG)
-#define FUTEX_TRYLOCK_PI_PRIVATE (FUTEX_TRYLOCK_PI | FUTEX_PRIVATE_FLAG)
+#define FUTEX_ROBUST_LIST32	1024
+
+#define FUTEX_CMD_MASK		~(FUTEX_PRIVATE_FLAG | FUTEX_CLOCK_REALTIME | \
+				  FUTEX_UNLOCK_ROBUST | FUTEX_ROBUST_LIST32)
+
+#define FUTEX_WAIT_PRIVATE		(FUTEX_WAIT | FUTEX_PRIVATE_FLAG)
+#define FUTEX_WAKE_PRIVATE		(FUTEX_WAKE | FUTEX_PRIVATE_FLAG)
+#define FUTEX_REQUEUE_PRIVATE		(FUTEX_REQUEUE | FUTEX_PRIVATE_FLAG)
+#define FUTEX_CMP_REQUEUE_PRIVATE	(FUTEX_CMP_REQUEUE | FUTEX_PRIVATE_FLAG)
+#define FUTEX_WAKE_OP_PRIVATE		(FUTEX_WAKE_OP | FUTEX_PRIVATE_FLAG)
+#define FUTEX_LOCK_PI_PRIVATE		(FUTEX_LOCK_PI | FUTEX_PRIVATE_FLAG)
+#define FUTEX_LOCK_PI2_PRIVATE		(FUTEX_LOCK_PI2 | FUTEX_PRIVATE_FLAG)
+#define FUTEX_UNLOCK_PI_PRIVATE		(FUTEX_UNLOCK_PI | FUTEX_PRIVATE_FLAG)
+#define FUTEX_TRYLOCK_PI_PRIVATE	(FUTEX_TRYLOCK_PI | FUTEX_PRIVATE_FLAG)
 #define FUTEX_WAIT_BITSET_PRIVATE	(FUTEX_WAIT_BITSET | FUTEX_PRIVATE_FLAG)
 #define FUTEX_WAKE_BITSET_PRIVATE	(FUTEX_WAKE_BITSET | FUTEX_PRIVATE_FLAG)
-#define FUTEX_WAIT_REQUEUE_PI_PRIVATE	(FUTEX_WAIT_REQUEUE_PI | \
-					 FUTEX_PRIVATE_FLAG)
-#define FUTEX_CMP_REQUEUE_PI_PRIVATE	(FUTEX_CMP_REQUEUE_PI | \
-					 FUTEX_PRIVATE_FLAG)
+#define FUTEX_WAIT_REQUEUE_PI_PRIVATE	(FUTEX_WAIT_REQUEUE_PI | FUTEX_PRIVATE_FLAG)
+#define FUTEX_CMP_REQUEUE_PI_PRIVATE	(FUTEX_CMP_REQUEUE_PI | FUTEX_PRIVATE_FLAG)
+
+/*
+ * Operations to unlock a futex, clear the robust list pending op pointer
+ * and wake waiters.
+ */
+#define FUTEX_UNLOCK_PI_LIST64			(FUTEX_UNLOCK_PI | FUTEX_UNLOCK_ROBUST)
+#define FUTEX_UNLOCK_PI_LIST64_PRIVATE		(FUTEX_UNLOCK_PI_LIST64 | FUTEX_PRIVATE_FLAG)
+#define FUTEX_UNLOCK_PI_LIST32			(FUTEX_UNLOCK_PI | FUTEX_UNLOCK_ROBUST | \
+						 FUTEX_ROBUST_LIST32)
+#define FUTEX_UNLOCK_PI_LIST32_PRIVATE		(FUTEX_UNLOCK_PI_LIST32 | FUTEX_PRIVATE_FLAG)
+
+#define FUTEX_UNLOCK_WAKE_LIST64		(FUTEX_WAKE | FUTEX_UNLOCK_ROBUST)
+#define FUTEX_UNLOCK_WAKE_LIST64_PRIVATE	(FUTEX_UNLOCK_WAKE_LIST64 | FUTEX_PRIVATE_FLAG)
+
+#define FUTEX_UNLOCK_WAKE_LIST32		(FUTEX_WAKE | FUTEX_UNLOCK_ROBUST | \
+						 FUTEX_ROBUST_LIST32)
+#define FUTEX_UNLOCK_WAKE_LIST32_PRIVATE	(FUTEX_UNLOCK_WAKE_LIST32 | FUTEX_PRIVATE_FLAG)
+
+#define FUTEX_UNLOCK_BITSET_LIST64		(FUTEX_WAKE_BITSET | FUTEX_UNLOCK_ROBUST)
+#define FUTEX_UNLOCK_BITSET_LIST64_PRIVATE	(FUTEX_UNLOCK_BITSET_LIST64 | FUTEX_PRIVATE_FLAG)
+
+#define FUTEX_UNLOCK_BITSET_LIST32		(FUTEX_WAKE_BITSET | FUTEX_UNLOCK_ROBUST | \
+						 FUTEX_ROBUST_LIST32)
+#define FUTEX_UNLOCK_BITSET_LIST32_PRIVATE	(FUTEX_UNLOCK_BITSET_LIST32 | FUTEX_PRIVATE_FLAG)
 
 /*
  * Flags for futex2 syscalls.
@@ -182,23 +207,6 @@ struct robust_list_head {
 #define FUTEX_ROBUST_MOD_PI	(0x1UL)
 #define FUTEX_ROBUST_MOD_MASK	(FUTEX_ROBUST_MOD_PI)
 
-/*
- * Modifier for FUTEX_ROBUST_UNLOCK uaddr2. Required to distinguish the storage
- * size for the robust_list_head::list_pending_op. This solves two problems:
- *
- * 1) COMPAT tasks
- *
- * 2) The mixed mode magic gaming use case which has both 32-bit and 64-bit
- *    robust lists. Oh well....
- *
- * Long story short: 32-bit userspace must set this bit unconditionally to
- * ensure that it can run on a 64-bit kernel in compat mode. If user space
- * screws that up a 64-bit kernel will happily clear the full 64-bits. 32-bit
- * kernels return an error code if the bit is not set.
- */
-#define FUTEX_ROBUST_UNLOCK_MOD_32BIT	(0x1UL)
-#define FUTEX_ROBUST_UNLOCK_MOD_MASK	(FUTEX_ROBUST_UNLOCK_MOD_32BIT)
-
 /*
  * bitset with all bits set for the FUTEX_xxx_BITSET OPs to request a
  * match of any bit.
diff --git a/include/vdso/futex.h b/include/vdso/futex.h
index 8061bfcb6b92..3cd175eefe64 100644
--- a/include/vdso/futex.h
+++ b/include/vdso/futex.h
@@ -2,15 +2,14 @@
 #ifndef _VDSO_FUTEX_H
 #define _VDSO_FUTEX_H
 
-#include
-
-struct robust_list;
+#include
 
 /**
- * __vdso_futex_robust_try_unlock - Try to unlock an uncontended robust futex
+ * __vdso_futex_robust_list64_try_unlock - Try to unlock an uncontended robust futex
+ *					   with a 64-bit pending op pointer
  * @lock:	Pointer to the futex lock object
  * @tid:	The TID of the calling task
- * @op:		Pointer to the task's robust_list_head::list_pending_op
+ * @pop:	Pointer to the task's robust_list_head::list_pending_op
  *
  * Return: The content of *@lock. On success this is the same as @tid.
  *
@@ -28,17 +27,26 @@ struct robust_list;
  *
  * User space uses it in the following way:
  *
- *	if (__vdso_futex_robust_try_unlock(lock, tid, &pending_op) != tid)
+ *	if (__vdso_futex_robust_list64_try_unlock(lock, tid, &pending_op) != tid)
  *		err = sys_futex($OP | FUTEX_ROBUST_UNLOCK,....);
  *
  * If the unlock attempt fails due to the FUTEX_WAITERS bit set in the lock,
  * then the syscall does the unlock, clears the pending op pointer and wakes the
  * requested number of waiters.
+ */
+__u32 __vdso_futex_robust_list64_try_unlock(__u32 *lock, __u32 tid, __u64 *pop);
+
+/**
+ * __vdso_futex_robust_list32_try_unlock - Try to unlock an uncontended robust futex
+ *					   with a 32-bit pending op pointer
+ * @lock:	Pointer to the futex lock object
+ * @tid:	The TID of the calling task
+ * @pop:	Pointer to the task's robust_list_head::list_pending_op
+ *
+ * Return: The content of *@lock. On success this is the same as @tid.
  *
- * The @op pointer is intentionally void. It has the same requirements as the
- * @uaddr2 argument for sys_futex(FUTEX_ROBUST_UNLOCK) operations. See the
- * modifier and the related documentation in include/uapi/linux/futex.h
+ * Same as __vdso_futex_robust_list64_try_unlock() just with a 32-bit @pop pointer.
  */
-uint32_t __vdso_futex_robust_try_unlock(uint32_t *lock, uint32_t tid, void *op);
+__u32 __vdso_futex_robust_list32_try_unlock(__u32 *lock, __u32 tid, __u32 *pop);
 
 #endif
diff --git a/kernel/futex/core.c b/kernel/futex/core.c
index 7957edd46b89..6a9c04471c44 100644
--- a/kernel/futex/core.c
+++ b/kernel/futex/core.c
@@ -46,6 +46,8 @@
 #include
 #include
 
+#include
+
 #include "futex.h"
 #include "../locking/rtmutex_common.h"
@@ -1434,17 +1436,9 @@ static void exit_pi_state_list(struct task_struct *curr)
 static inline void exit_pi_state_list(struct task_struct *curr) { }
 #endif
 
-static inline bool mask_pop_addr(void __user **pop)
-{
-	unsigned long addr = (unsigned long)*pop;
-
-	*pop = (void __user *) (addr & ~FUTEX_ROBUST_UNLOCK_MOD_MASK);
-	return !!(addr & FUTEX_ROBUST_UNLOCK_MOD_32BIT);
-}
-
-bool futex_robust_list_clear_pending(void __user *pop)
+bool futex_robust_list_clear_pending(void __user *pop, unsigned int flags)
 {
-	bool size32bit = mask_pop_addr(&pop);
+	bool size32bit = !!(flags & FLAGS_ROBUST_LIST32);
 
 	if (!IS_ENABLED(CONFIG_64BIT) && !size32bit)
 		return false;
@@ -1456,15 +1450,14 @@ bool futex_robust_list_clear_pending(void __user *pop)
 }
 
 #ifdef CONFIG_FUTEX_ROBUST_UNLOCK
-void __futex_fixup_robust_unlock(struct pt_regs *regs)
+void __futex_fixup_robust_unlock(struct pt_regs *regs, struct futex_unlock_cs_range *csr)
 {
-	void __user *pop;
+	void __user *pop = arch_futex_robust_unlock_get_pop(regs);
 
-	if (!arch_futex_needs_robust_unlock_fixup(regs))
+	if (!pop)
 		return;
 
-	pop = arch_futex_robust_unlock_get_pop(regs);
-	futex_robust_list_clear_pending(pop);
+	futex_robust_list_clear_pending(pop, csr->cs_pop_size32 ?
FLAGS_ROBUST_LIST32 : 0); } #endif /* CONFIG_FUTEX_ROBUST_UNLOCK */ diff --git a/kernel/futex/futex.h b/kernel/futex/futex.h index b1aaa90f1779..31a5bae8b470 100644 --- a/kernel/futex/futex.h +++ b/kernel/futex/futex.h @@ -41,6 +41,7 @@ #define FLAGS_STRICT 0x0100 #define FLAGS_MPOL 0x0200 #define FLAGS_UNLOCK_ROBUST 0x0400 +#define FLAGS_ROBUST_LIST32 0x0800 /* FUTEX_ to FLAGS_ */ static inline unsigned int futex_to_flags(unsigned int op) @@ -56,6 +57,9 @@ static inline unsigned int futex_to_flags(unsigned int op) if (op & FUTEX_UNLOCK_ROBUST) flags |= FLAGS_UNLOCK_ROBUST; + if (op & FUTEX_ROBUST_LIST32) + flags |= FLAGS_ROBUST_LIST32; + return flags; } @@ -452,6 +456,6 @@ extern int futex_unlock_pi(u32 __user *uaddr, unsigned int flags, void __user *p extern int futex_lock_pi(u32 __user *uaddr, unsigned int flags, ktime_t *time, int trylock); -bool futex_robust_list_clear_pending(void __user *pop); +bool futex_robust_list_clear_pending(void __user *pop, unsigned int flags); #endif /* _FUTEX_H */ diff --git a/kernel/futex/pi.c b/kernel/futex/pi.c index b8c76b6242e4..05ca360a7a30 100644 --- a/kernel/futex/pi.c +++ b/kernel/futex/pi.c @@ -1298,7 +1298,7 @@ int futex_unlock_pi(u32 __user *uaddr, unsigned int flags, void __user *pop) if (ret || !(flags & FLAGS_UNLOCK_ROBUST)) return ret; - if (!futex_robust_list_clear_pending(pop)) + if (!futex_robust_list_clear_pending(pop, flags)) return -EFAULT; return 0; diff --git a/kernel/futex/waitwake.c b/kernel/futex/waitwake.c index 45effcf42961..c9723d3d17f1 100644 --- a/kernel/futex/waitwake.c +++ b/kernel/futex/waitwake.c @@ -157,16 +157,19 @@ static bool futex_robust_unlock(u32 __user *uaddr, unsigned int flags, void __us if (!(flags & FLAGS_UNLOCK_ROBUST)) return true; - /* First unlock the futex. */ - if (put_user(0U, uaddr)) - return false; + /* First unlock the futex, which requires release semantics. 
*/ + scoped_user_write_access(uaddr, efault) + unsafe_atomic_store_release_user(0, uaddr, efault); /* * Clear the pending list op now. If that fails, then the task is in - * deeper trouble as the robust list head is usually part of TLS. The - * chance of survival is close to zero. + * deeper trouble as the robust list head is usually part of the TLS. + * The chance of survival is close to zero. */ - return futex_robust_list_clear_pending(pop); + return futex_robust_list_clear_pending(pop, flags); + +efault: + return false; } /*
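
For reviewers who want to see the intended fast path spelled out: the following is NOT part of the patch, just a plain user-space C sketch of the semantics the proposed __vdso_futex_robust_list64_try_unlock() helper provides. The function name robust_try_unlock and the FUTEX_WAITERS value are taken from the uapi convention; everything else is illustrative. What the sketch cannot express is the actual point of the vDSO variant, namely that the kernel can fix up the pending op pointer when the task is interrupted between the lock release and the pointer clear.

```c
#include <stdatomic.h>
#include <stdint.h>

/* FUTEX_WAITERS bit as defined in the futex uapi header */
#define FUTEX_WAITERS	0x80000000u

/*
 * Model of the vDSO try-unlock semantics: release the lock word and
 * clear the 64-bit pending op pointer, but only when the futex is
 * owned by @tid and uncontended. Returns the observed lock value;
 * equal to @tid on success.
 */
static uint32_t robust_try_unlock(_Atomic uint32_t *lock, uint32_t tid,
				  uint64_t *pop)
{
	uint32_t cur = tid;

	/*
	 * Single atomic release store via cmpxchg: fails if the futex is
	 * not owned by @tid or FUTEX_WAITERS is set in the lock word.
	 */
	if (atomic_compare_exchange_strong_explicit(lock, &cur, 0,
						    memory_order_release,
						    memory_order_relaxed)) {
		/* Uncontended fast path: clear the pending op pointer */
		*pop = 0;
		return tid;
	}

	/*
	 * Contended (or not the owner): the caller must fall back to
	 * sys_futex() with FUTEX_UNLOCK_ROBUST set, which does the
	 * unlock, clears the pending op pointer and wakes waiters.
	 */
	return cur;
}
```

The guarantee user space needs, and which this sketch cannot give, is that the pointer clear cannot be separated from the unlock by a forced exit; only the vDSO placement plus the kernel fixup in __futex_fixup_robust_unlock() closes that window.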