From mboxrd@z Thu Jan 1 00:00:00 1970
From: Ben Hutchings
Subject: [PATCH 3.16 56/76] x86/syscall: Sanitize syscall table de-references under speculation
Date: Mon, 12 Mar 2018 03:06:12 +0000
Message-ID:
References:
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit
Return-path:
Content-Disposition: inline
In-Reply-To:
Sender: stable-owner@vger.kernel.org
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: akpm@linux-foundation.org, Jinpu Wang,
 kernel-hardening@lists.openwall.com, Jiri Slaby, Andy Lutomirski,
 Linus Torvalds, Jan Beulich, alan@linux.intel.com, Thomas Gleixner,
 Dan Williams, gregkh@linuxfoundation.org, linux-arch@vger.kernel.org
List-Id: linux-arch.vger.kernel.org

3.16.56-rc1 review patch.  If anyone has any objections, please let me know.

------------------

From: Ben Hutchings

commit 2fbd7af5af8665d18bcefae3e9700be07e22b681 upstream.

The upstream version of this, touching C code, was written by Dan
Williams, with the following description:

> The syscall table base is a user controlled function pointer in kernel
> space. Use array_index_nospec() to prevent any out of bounds speculation.
>
> While retpoline prevents speculating into a userspace directed target it
> does not stop the pointer de-reference, the concern is leaking memory
> relative to the syscall table base, by observing instruction cache
> behavior.

The x86_64 assembly version for 4.4 was written by Jiri Slaby, with
the following description:

> In 4.4.118, we have commit c8961332d6da (x86/syscall: Sanitize syscall
> table de-references under speculation), which is a backport of upstream
> commit 2fbd7af5af86. But it fixed only the C part of the upstream patch
> -- the IA32 sysentry. So it ommitted completely the assembly part -- the
> 64bit sysentry.
>
> Fix that in this patch by explicit array_index_mask_nospec written in
> assembly. The same was used in lib/getuser.S.
>
> However, to have "sbb" working properly, we have to switch from "cmp"
> against (NR_syscalls-1) to (NR_syscalls), otherwise the last syscall
> number would be "and"ed by 0. It is because the original "ja" relies on
> "CF" or "ZF", but we rely only on "CF" in "sbb". That means: switch to
> "jae" conditional jump too.
>
> Final note: use rcx for mask as this is exactly what is overwritten by
> the 4th syscall argument (r10) right after.

In 3.16 the x86_32 syscall table lookup is also written in assembly.
So I've taken Jiri's version and added similar masking in entry_32.S,
using edx as the temporary.  edx is clobbered by SAVE_REGS and seems
to be free at this point.

Cc: Dan Williams
Cc: Jiri Slaby
Cc: Jan Beulich
Cc: Linus Torvalds
Cc: Thomas Gleixner
Cc: linux-arch@vger.kernel.org
Cc: kernel-hardening@lists.openwall.com
Cc: gregkh@linuxfoundation.org
Cc: Andy Lutomirski
Cc: alan@linux.intel.com
Cc: Jinpu Wang
Signed-off-by: Ben Hutchings
---
--- a/arch/x86/kernel/entry_32.S
+++ b/arch/x86/kernel/entry_32.S
@@ -426,6 +426,8 @@ sysenter_past_esp:
 sysenter_do_call:
 	cmpl $(NR_syscalls), %eax
 	jae sysenter_badsys
+	sbb %edx, %edx				/* array_index_mask_nospec() */
+	and %edx, %eax
 	call *sys_call_table(,%eax,4)
 sysenter_after_call:
 	movl %eax,PT_EAX(%esp)
@@ -503,6 +505,8 @@ ENTRY(system_call)
 	cmpl $(NR_syscalls), %eax
 	jae syscall_badsys
 syscall_call:
+	sbb %edx, %edx				/* array_index_mask_nospec() */
+	and %edx, %eax
 	call *sys_call_table(,%eax,4)
 syscall_after_call:
 	movl %eax,PT_EAX(%esp)		# store the return value
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -445,12 +445,14 @@ GLOBAL(system_call_after_swapgs)
 	jnz tracesys
 system_call_fastpath:
 #if __SYSCALL_MASK == ~0
-	cmpq $__NR_syscall_max,%rax
+	cmpq $NR_syscalls, %rax
 #else
 	andl $__SYSCALL_MASK,%eax
-	cmpl $__NR_syscall_max,%eax
+	cmpl $NR_syscalls, %eax
 #endif
-	ja badsys
+	jae badsys
+	sbb %rcx, %rcx			/* array_index_mask_nospec() */
+	and %rcx, %rax
 	movq %r10,%rcx
 #ifdef CONFIG_RETPOLINE
 	movq sys_call_table(, %rax, 8), %rax
@@ -577,12 +579,14 @@ tracesys:
 	LOAD_ARGS ARGOFFSET, 1
 	RESTORE_REST
 #if __SYSCALL_MASK == ~0
-	cmpq $__NR_syscall_max,%rax
+	cmpq $NR_syscalls, %rax
 #else
 	andl $__SYSCALL_MASK,%eax
-	cmpl $__NR_syscall_max,%eax
+	cmpl $NR_syscalls, %eax
 #endif
-	ja int_ret_from_sys_call	/* RAX(%rsp) set to -ENOSYS above */
+	jae int_ret_from_sys_call	/* RAX(%rsp) set to -ENOSYS above */
+	sbb %rcx, %rcx			/* array_index_mask_nospec() */
+	and %rcx, %rax
 	movq %r10,%rcx	/* fixup for C */
 #ifdef CONFIG_RETPOLINE
 	movq sys_call_table(, %rax, 8), %rax
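
The flag arithmetic behind the cmp/sbb/and sequence, and the reason
the comparison has to move from NR_syscalls-1 to NR_syscalls, can be
checked with a small C model. This is only a sketch of the
arithmetic: the names demo_sbb_mask(), demo_masked_index(),
demo_check() and DEMO_NR_SYSCALLS are made up for illustration and do
not exist in the kernel. It is not a speculation barrier either,
since a compiler is free to turn the comparison into a branch; the
patch writes the sequence in assembly to avoid that. Architecturally
the jae branch already rejects out-of-range numbers, so the mask only
matters when that branch is mispredicted.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical table size for this demo; not the kernel's NR_syscalls. */
#define DEMO_NR_SYSCALLS 8u

/*
 * Models "cmp $bound, idx; sbb reg, reg". After the cmp, CF is set
 * when idx < bound (unsigned); sbb of a register with itself then
 * yields 0 - CF: all ones when CF is set, zero otherwise.
 */
uint64_t demo_sbb_mask(uint64_t idx, uint64_t bound)
{
	uint64_t cf = idx < bound;

	return (uint64_t)0 - cf;
}

/* Models the "and reg, idx" that follows the sbb. */
uint64_t demo_masked_index(uint64_t idx, uint64_t bound)
{
	return idx & demo_sbb_mask(idx, bound);
}

/* Checks the three cases discussed in the commit message. */
void demo_check(void)
{
	uint64_t last = DEMO_NR_SYSCALLS - 1;

	/* cmp against NR_syscalls: the last valid number survives. */
	assert(demo_masked_index(last, DEMO_NR_SYSCALLS) == last);
	/* cmp against NR_syscalls-1: CF is clear, so it is "and"ed by 0. */
	assert(demo_masked_index(last, DEMO_NR_SYSCALLS - 1) == 0);
	/* Out of range: the index is forced to 0. */
	assert(demo_masked_index(DEMO_NR_SYSCALLS, DEMO_NR_SYSCALLS) == 0);
}
```

Calling demo_check() from a test program exercises all three cases.
The second assertion is the one that motivates the switch from "ja"
to "jae": with the old comparison the last valid syscall number
leaves CF clear and would be masked to 0.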