From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756690AbcBIMGY (ORCPT ); Tue, 9 Feb 2016 07:06:24 -0500 Received: from mail.skyhub.de ([78.46.96.112]:39442 "EHLO mail.skyhub.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753609AbcBIMGV (ORCPT ); Tue, 9 Feb 2016 07:06:21 -0500 Date: Tue, 9 Feb 2016 13:06:17 +0100 From: Borislav Petkov To: Andy Lutomirski Cc: X86 ML , "linux-kernel@vger.kernel.org" , Brian Gerst , Denys Vlasenko , Stas Sergeev , Cyrill Gorcunov , Pavel Emelyanov Subject: Re: [PATCH v3 2/4] x86/signal/64: Fix SS if needed when delivering a 64-bit signal Message-ID: <20160209120617.GC4119@pd.tnic> References: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.24 (2015-08-30) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Jan 25, 2016 at 01:34:13PM -0800, Andy Lutomirski wrote: > Signals are always delivered to 64-bit tasks with CS set to a long > mode segment. In long mode, SS doesn't matter as long as it's a > present writable segment. > > If SS starts out invalid (this can happen if the signal was caused > by an IRET fault or was delivered on the way out of set_thread_area > or modify_ldt), then IRET to the signal handler can fail, eventually > killing the task. > > The straightforward fix would be to simply reset SS when delivering > a signal. That breaks DOSEMU, though: 64-bit builds of DOSEMU rely > on SS being set to the faulting SS when signals are delivered. > > As a compromise, this patch leaves SS alone so long as it's valid. > > The net effect should be that the behavior of successfully delivered > signals is unchanged. Some signals that would previously have > failed to be delivered will now be delivered successfully. > > This has no effect for x32 or 32-bit tasks: their signal handlers > were already called with SS == __USER_DS. > > (On Xen, there's a slight hole: if a task sets SS to a writable > *kernel* data segment, then we will fail to identify it as invalid > and we'll still kill the task. If anyone cares, this could be fixed > with a new paravirt hook.) > > Signed-off-by: Andy Lutomirski > --- > arch/x86/include/asm/desc_defs.h | 23 ++++++++++++++++++ > arch/x86/kernel/signal.c | 51 ++++++++++++++++++++++++++++++++++++++-- > 2 files changed, 72 insertions(+), 2 deletions(-) > > diff --git a/arch/x86/include/asm/desc_defs.h b/arch/x86/include/asm/desc_defs.h > index 278441f39856..00971705a16d 100644 > --- a/arch/x86/include/asm/desc_defs.h > +++ b/arch/x86/include/asm/desc_defs.h > @@ -98,4 +98,27 @@ struct desc_ptr { > > #endif /* !__ASSEMBLY__ */ > > +/* Access rights as returned by LAR */ > +#define AR_TYPE_RODATA (0 * (1 << 9)) > +#define AR_TYPE_RWDATA (1 * (1 << 9)) > +#define AR_TYPE_RODATA_EXPDOWN (2 * (1 << 9)) > +#define AR_TYPE_RWDATA_EXPDOWN (3 * (1 << 9)) > +#define AR_TYPE_XOCODE (4 * (1 << 9)) > +#define AR_TYPE_XRCODE (5 * (1 << 9)) > +#define AR_TYPE_XOCODE_CONF (6 * (1 << 9)) > +#define AR_TYPE_XRCODE_CONF (7 * (1 << 9)) > +#define AR_TYPE_MASK (7 * (1 << 9)) > + > +#define AR_DPL0 (0 * (1 << 13)) > +#define AR_DPL3 (3 * (1 << 13)) > +#define AR_DPL_MASK (3 * (1 << 13)) > + > +#define AR_A (1 << 8) /* A means "accessed" */ > +#define AR_S (1 << 12) /* S means "not system" */ Ah, with "not system" you want to say that S=0b makes it a system descriptor and S=1b a user. I think the SDM calls it more descriptively the "S (descriptor type) flag" while the APM calls it simply the S-field or S-bit. I like "S (descriptor type) flag" more than "not system". :) > +#define AR_P (1 << 15) /* P means "present" */ > +#define AR_AVL (1 << 20) /* AVL does nothing */ AVL = AVaiLable to software > +#define AR_L (1 << 21) /* L means "long mode" */ > +#define AR_DB (1 << 22) /* D or B, depending on type */ > +#define AR_G (1 << 23) /* G means "limit in pages" */ Please use the names from the processor manuals. G is the Granularity bit. "limit in pages" is only clear to the people who have already read the Granularity bit description. :-) > #endif /* _ASM_X86_DESC_DEFS_H */ > diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c > index cb6282c3638f..bb3e4208d90d 100644 > --- a/arch/x86/kernel/signal.c > +++ b/arch/x86/kernel/signal.c > @@ -61,6 +61,35 @@ > regs->seg = GET_SEG(seg) | 3; \ > } while (0) > > +#ifdef CONFIG_X86_64 You already have an #else /* !CONFIG_X86_32 */ block above the 64-bit version of __setup_rt_frame(). Just put force_valid_ss() there without that additional ifdef. That file's ifdeffery is beyond any readability anyway. > +/* > + * If regs->ss will cause an IRET fault, change it. Otherwise leave it > + * alone. Using this generally makes no sense unless > + * user_64bit_mode(regs) would return true. > + */ > +static void force_valid_ss(struct pt_regs *regs) > +{ > + u32 ar; > + asm volatile ("lar %[old_ss], %[ar]\n\t" > + "jz 1f\n\t" /* If invalid: */ > + "xorl %[ar], %[ar]\n\t" /* set ar = 0 */ > + "1:" > + : [ar] "=r" (ar) > + : [old_ss] "rm" ((u16)regs->ss)); > + > + /* > + * For a valid 64-bit user context, we need DPL 3, type > + * read-write data or read-write exp-down data, and S and P > + * set. We can't use VERW because VERW doesn't check the > + * P bit. > + */ > + ar &= AR_DPL_MASK | AR_S | AR_P | AR_TYPE_MASK; > + if (ar != (AR_DPL3 | AR_S | AR_P | AR_TYPE_RWDATA) && > + ar != (AR_DPL3 | AR_S | AR_P | AR_TYPE_RWDATA_EXPDOWN)) > + regs->ss = __USER_DS; > +} > +#endif > + > int restore_sigcontext(struct pt_regs *regs, struct sigcontext __user *sc) > { > unsigned long buf_val; > @@ -459,10 +488,28 @@ static int __setup_rt_frame(int sig, struct ksignal *ksig, > > regs->sp = (unsigned long)frame; > > - /* Set up the CS register to run signal handlers in 64-bit mode, > - even if the handler happens to be interrupting 32-bit code. */ > + /* > + * Set up the CS and SS registers to run signal handlers in > + * 64-bit mode, even if the handler happens to be interrupting > + * 32-bit or 16-bit code. > + * > + * SS is subtle. In 64-bit mode, we don't need any particular > + * SS descriptor, but we do need SS to be valid. It's possible > + * that the old SS is entirely bogus -- this can happen if the > + * signal we're trying to deliver is #GP or #SS caused by a bad > + * SS value. We also have a compatbility issue here: DOSEMU > + * relies on the contents of the SS register indicating the > + * SS value at the time of the signal, even though that code in > + * DOSEMU predates sigreturn's ability to restore SS. (DOSEMU > + * avoids relying on sigreturn to restore SS; instead it uses > + * a trampoline.) So we do our best: if the old SS was valid, > + * we keep it. Otherwise we replace it. > + */ > regs->cs = __USER_CS; > > + if (unlikely(regs->ss != __USER_DS)) So this is fast path AFAICT and from adding a gdb breakpoint here. I guess we can't do the opt-in behavior and patch it out when users don't want to run dosemu. Or maybe we could add a CONFIG_CHECK_OLD_SS which is default y and people can disable it... so an opt-out behavior :) Hmmm... -- Regards/Gruss, Boris. ECO tip #101: Trim your mails when you reply.