From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752397AbbCWH4r (ORCPT ); Mon, 23 Mar 2015 03:56:47 -0400
Received: from mail-wi0-f173.google.com ([209.85.212.173]:37628 "EHLO
	mail-wi0-f173.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1752369AbbCWH4m (ORCPT );
	Mon, 23 Mar 2015 03:56:42 -0400
Date: Mon, 23 Mar 2015 08:56:37 +0100
From: Ingo Molnar
To: Brian Gerst
Cc: linux-kernel@vger.kernel.org, x86@kernel.org, Denys Vlasenko,
	Andy Lutomirski, Borislav Petkov, "H. Peter Anvin", Linus Torvalds
Subject: Re: [PATCH] x86: execve and sigreturn syscalls must return via iret
Message-ID: <20150323075637.GA25620@gmail.com>
References: <1426978461-32089-1-git-send-email-brgerst@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1426978461-32089-1-git-send-email-brgerst@gmail.com>
User-Agent: Mutt/1.5.23 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

* Brian Gerst wrote:

> Both the execve and sigreturn family of syscalls have the ability to change
> registers in ways that may not be compatible with the syscall path they
> were called from. In particular, sysret and sysexit can't handle non-default
> %cs and %ss, and some bits in eflags. These syscalls have stubs that are
> hardcoded to jump to the iret path, and not return to the original syscall
> path. Commit 76f5df43cab5e765c0bd42289103e8f625813ae1 (Always allocate a
> complete "struct pt_regs" on the kernel stack) recently changed this for
> some 32-bit compat syscalls, but introduced a bug where execve from a 32-bit
> program to a 64-bit program would fail because it still returned via sysretl.
> This caused Wine to fail when built for both 32-bit and 64-bit.
>
> This patch sets TIF_NOTIFY_RESUME for execve and sigreturn so that the iret
> path is always taken on exit to userspace.
>
> Signed-off-by: Brian Gerst
> Cc: Ingo Molnar
> Cc: Denys Vlasenko
> Cc: Andy Lutomirski
> Cc: Borislav Petkov
> Cc: H. Peter Anvin
> Cc: Linus Torvalds
> ---
>  arch/x86/ia32/ia32_signal.c        | 2 ++
>  arch/x86/include/asm/ptrace.h      | 2 +-
>  arch/x86/include/asm/thread_info.h | 7 +++++++
>  arch/x86/kernel/process_32.c       | 6 +-----
>  arch/x86/kernel/process_64.c       | 1 +
>  arch/x86/kernel/signal.c           | 2 ++
>  6 files changed, 14 insertions(+), 6 deletions(-)

Applied the fix to tip:x86/asm, thanks Brian!

> +
> +/*
> + * force syscall return via iret by making it look as if there was
> + * some work pending.
> +*/
> +#define force_iret() set_thread_flag(TIF_NOTIFY_RESUME)

I extended this comment to:

  /*
   * Force syscall return via IRET by making it look as if there was
   * some work pending. IRET is our most capable (but slowest) syscall
   * return path, which is able to restore modified SS, CS and certain
   * EFLAGS values that other (fast) syscall return instructions
   * are not able to restore properly.
   */
  #define force_iret() set_thread_flag(TIF_NOTIFY_RESUME)

Just to preserve the underlying reason for force_iret() for the
future and such.

Btw., it might be a worthwhile optimization to detect non-standard
SS, CS and EFLAGS values and only force_iret() in that case. That
would speed up 99.9999% of execve() and sigreturn() syscalls and
only force the 'weird' process startup modes into the slow return
path.

From an access security POV it should be a relatively safe
optimization: if we get it wrong then we don't allow certain ABI
angles, but we won't make the kernel unsafe AFAICS.

Thanks,

	Ingo
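
A rough userspace sketch of the check suggested above: look at the
register state that is about to be restored and only ask for the IRET
path when CS, SS or EFLAGS differ from what the fast return would
produce. This is not kernel code. struct user_frame, needs_iret(), the
selector values and the flag mask are all assumptions made for
illustration, and which EFLAGS bits really need IRET would have to be
checked against the actual fast-return paths.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the saved user registers; NOT struct pt_regs. */
struct user_frame {
	uint16_t cs;
	uint16_t ss;
	uint64_t flags;
};

/* Assumed for illustration: default selectors the fast path restores. */
#define DEFAULT_USER_CS	0x33
#define DEFAULT_USER_SS	0x2b

/* Assumed mask of flags the fast path cannot restore: TF (bit 8), RF (bit 16). */
#define FLAGS_NEED_IRET	((UINT64_C(1) << 8) | (UINT64_C(1) << 16))

/* Return true when the frame can only be restored faithfully via IRET. */
static bool needs_iret(const struct user_frame *f)
{
	return f->cs != DEFAULT_USER_CS ||
	       f->ss != DEFAULT_USER_SS ||
	       (f->flags & FLAGS_NEED_IRET) != 0;
}

/* Usage: the common case stays on the fast path, unusual frames do not. */
static void example_checks(void)
{
	struct user_frame plain    = { DEFAULT_USER_CS, DEFAULT_USER_SS, 0x202 };
	struct user_frame other_cs = { 0x23, DEFAULT_USER_SS, 0x202 };
	struct user_frame trap_set = { DEFAULT_USER_CS, DEFAULT_USER_SS, 0x302 };

	assert(!needs_iret(&plain));
	assert(needs_iret(&other_cs));	/* CS differs from the fast-path default */
	assert(needs_iret(&trap_set));	/* TF set in the saved flags */
}
```

In the kernel the equivalent would presumably be something like
"if (needs_iret(regs)) force_iret();" in the execve and sigreturn
paths, with the defaults chosen per entry path (the 32-bit compat
fast return restores a different CS than the 64-bit one). Erring
towards "true" only costs speed, which matches the safety argument
above.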