From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 16 Jan 2018 17:52:13 +0100
From: Peter Zijlstra
Subject: Re: [PATCH 06/16] x86/mm/ldt: Reserve high address-space range for the LDT
Message-ID: <20180116165213.GF2228@hirez.programming.kicks-ass.net>
In-Reply-To: <1516120619-1159-7-git-send-email-joro@8bytes.org>
To: Joerg Roedel
Cc: Thomas Gleixner, Ingo Molnar, "H . Peter Anvin", x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds, Andy Lutomirski, Dave Hansen, Josh Poimboeuf, Juergen Gross, Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst, David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH, Will Deacon, aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli, Waiman Long, jroedel@suse.de

On Tue, Jan 16, 2018 at 05:36:49PM +0100, Joerg Roedel wrote:
> From: Joerg Roedel
>
> Reserve 2MB/4MB of address space for mapping the LDT to
> user-space.

LDT is 64k, we need 2 per CPU, and NR_CPUS <= 64 on 32bit, that gives
64K*2*64=8M > 2M.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org.
For more info on Linux MM, see: http://www.linux-mm.org/ .
Don't email: email@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 16 Jan 2018 18:13:43 +0100
From: Joerg Roedel
Subject: Re: [PATCH 06/16] x86/mm/ldt: Reserve high address-space range for the LDT
Message-ID: <20180116171343.GB28161@8bytes.org>
In-Reply-To: <20180116165213.GF2228@hirez.programming.kicks-ass.net>
To: Peter Zijlstra
Cc: Thomas Gleixner, Ingo Molnar, "H . Peter Anvin", x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds, Andy Lutomirski, Dave Hansen, Josh Poimboeuf, Juergen Gross, Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst, David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH, Will Deacon, aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli, Waiman Long, jroedel@suse.de

Hi Peter,

On Tue, Jan 16, 2018 at 05:52:13PM +0100, Peter Zijlstra wrote:
> On Tue, Jan 16, 2018 at 05:36:49PM +0100, Joerg Roedel wrote:
> > From: Joerg Roedel
> >
> > Reserve 2MB/4MB of address space for mapping the LDT to
> > user-space.
>
> LDT is 64k, we need 2 per CPU, and NR_CPUS <= 64 on 32bit, that gives
> 64K*2*64=8M > 2M.

Thanks, I'll fix that in the next version.

Joerg

From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 16 Jan 2018 18:31:15 +0100
From: Peter Zijlstra
Subject: Re: [PATCH 06/16] x86/mm/ldt: Reserve high address-space range for the LDT
Message-ID: <20180116173115.GG2228@hirez.programming.kicks-ass.net>
In-Reply-To: <20180116171343.GB28161@8bytes.org>
To: Joerg Roedel
Cc: Thomas Gleixner, Ingo Molnar, "H . Peter Anvin", x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds, Andy Lutomirski, Dave Hansen, Josh Poimboeuf, Juergen Gross, Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst, David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH, Will Deacon, aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli, Waiman Long, jroedel@suse.de

On Tue, Jan 16, 2018 at 06:13:43PM +0100, Joerg Roedel wrote:
> Hi Peter,
>
> On Tue, Jan 16, 2018 at 05:52:13PM +0100, Peter Zijlstra wrote:
> > On Tue, Jan 16, 2018 at 05:36:49PM +0100, Joerg Roedel wrote:
> > > From: Joerg Roedel
> > >
> > > Reserve 2MB/4MB of address space for mapping the LDT to
> > > user-space.
> >
> > LDT is 64k, we need 2 per CPU, and NR_CPUS <= 64 on 32bit, that gives
> > 64K*2*64=8M > 2M.
>
> Thanks, I'll fix that in the next version.

Just lower the max SMP setting until it fits or something. 32bit is too
address space starved for lots of CPUs in any case, 64 CPUs on 32bit is
absolutely insane.

From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [PATCH 06/16] x86/mm/ldt: Reserve high address-space range for the LDT
From: Waiman Long
Message-ID: <13a45e59-5969-2fdb-25cd-adcd5298784b@redhat.com>
Date: Tue, 16 Jan 2018 12:34:36 -0500
In-Reply-To: <20180116173115.GG2228@hirez.programming.kicks-ass.net>
To: Peter Zijlstra, Joerg Roedel
Cc: Thomas Gleixner, Ingo Molnar, "H . Peter Anvin", x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds, Andy Lutomirski, Dave Hansen, Josh Poimboeuf, Juergen Gross, Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst, David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH, Will Deacon, aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli, Waiman Long, jroedel@suse.de

On 01/16/2018 12:31 PM, Peter Zijlstra wrote:
> On Tue, Jan 16, 2018 at 06:13:43PM +0100, Joerg Roedel wrote:
>> Hi Peter,
>>
>> On Tue, Jan 16, 2018 at 05:52:13PM +0100, Peter Zijlstra wrote:
>>> On Tue, Jan 16, 2018 at 05:36:49PM +0100, Joerg Roedel wrote:
>>>> From: Joerg Roedel
>>>>
>>>> Reserve 2MB/4MB of address space for mapping the LDT to
>>>> user-space.
>>> LDT is 64k, we need 2 per CPU, and NR_CPUS <= 64 on 32bit, that gives
>>> 64K*2*64=8M > 2M.
>> Thanks, I'll fix that in the next version.
> Just lower the max SMP setting until it fits or something.
> 32bit is too address space starved for lots of CPUs in any case, 64 CPUs
> on 32bit is absolutely insane.

Maybe we can just scale the amount of reserved space according to the
current NR_CPUS setting. In this way, we won't waste more memory than is
necessary.

From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [PATCH 07/16] x86/mm: Move two more functions from pgtable_64.h to pgtable.h
From: Dave Hansen
Message-ID: <727a7eba-41a0-d5bb-df54-8e58b33fde76@intel.com>
Date: Tue, 16 Jan 2018 10:03:09 -0800
In-Reply-To: <1516120619-1159-8-git-send-email-joro@8bytes.org>
To: Joerg Roedel, Thomas Gleixner, Ingo Molnar, "H . Peter Anvin"
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds, Andy Lutomirski, Josh Poimboeuf, Juergen Gross, Peter Zijlstra, Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst, David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH, Will Deacon, aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli, Waiman Long, jroedel@suse.de

On 01/16/2018 08:36 AM, Joerg Roedel wrote:
> +/*
> + * Page table pages are page-aligned. The lower half of the top
> + * level is used for userspace and the top half for the kernel.
> + *
> + * Returns true for parts of the PGD that map userspace and
> + * false for the parts that map the kernel.
> + */
> +static inline bool pgdp_maps_userspace(void *__ptr)
> +{
> + unsigned long ptr = (unsigned long)__ptr;
> +
> + return (((ptr & ~PAGE_MASK) / sizeof(pgd_t)) < KERNEL_PGD_BOUNDARY);
> +}

One of the reasons to implement it the other way:

- return (ptr & ~PAGE_MASK) < (PAGE_SIZE / 2);

is that the compiler can do this all quickly. KERNEL_PGD_BOUNDARY
depends on PAGE_OFFSET which depends on a variable. IOW, the compiler
can't do it.

How much worse is the code that this generates?

From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [PATCH 10/16] x86/mm/pti: Populate valid user pud entries
From: Dave Hansen
Date: Tue, 16 Jan 2018 10:06:48 -0800
In-Reply-To: <1516120619-1159-11-git-send-email-joro@8bytes.org>
To: Joerg Roedel, Thomas Gleixner, Ingo Molnar, "H . Peter Anvin"
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds, Andy Lutomirski, Josh Poimboeuf, Juergen Gross, Peter Zijlstra, Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst, David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH, Will Deacon, aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli, Waiman Long, jroedel@suse.de

On 01/16/2018 08:36 AM, Joerg Roedel wrote:
>
> In PAE page-tables at the top-level most bits we usually set
> with _KERNPG_TABLE are reserved, resulting in a #GP when
> they are loaded by the processor.

Can you save me the trip to the SDM and remind me which bits actually
cause trouble here?

From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [PATCH 12/16] x86/mm/pae: Populate the user page-table with user pgd's
From: Dave Hansen
Date: Tue, 16 Jan 2018 10:11:14 -0800
In-Reply-To: <1516120619-1159-13-git-send-email-joro@8bytes.org>
To: Joerg Roedel, Thomas Gleixner, Ingo Molnar, "H . Peter Anvin"
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds, Andy Lutomirski, Josh Poimboeuf, Juergen Gross, Peter Zijlstra, Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst, David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH, Will Deacon, aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli, Waiman Long, jroedel@suse.de

On 01/16/2018 08:36 AM, Joerg Roedel wrote:
> +#ifdef CONFIG_X86_64
> /*
> * If this is normal user memory, make it NX in the kernel
> * pagetables so that, if we somehow screw up and return to
> @@ -134,10 +135,16 @@ pgd_t __pti_set_user_pgd(pgd_t *pgdp, pgd_t pgd)
> * may execute from it
> * - we don't have NX support
> * - we're clearing the PGD (i.e. the new pgd is not present).
> + * - We run on a 32 bit kernel. 2-level paging doesn't support NX at
> + * all and PAE paging does not support it on the PGD level. We can
> + * set it in the PMD level there in the future, but that means we
> + * need to unshare the PMDs between the kernel and the user
> + * page-tables.
> */
> if ((pgd.pgd & (_PAGE_USER|_PAGE_PRESENT)) == (_PAGE_USER|_PAGE_PRESENT) &&
> (__supported_pte_mask & _PAGE_NX))
> pgd.pgd |= _PAGE_NX;
> +#endif

Ugh. The ghosts of PAE have come back to haunt us.

Could we do:

static inline bool pgd_supports_nx(void)
{
#ifdef CONFIG_X86_64
	return (__supported_pte_mask & _PAGE_NX);
#else
	/* No 32-bit page tables support NX at PGD level */
	return 0;
#endif
}

Nobody will ever spot the #ifdef the way you laid it out.

From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [RFC PATCH 00/16] PTI support for x86-32
From: Dave Hansen
Message-ID: <1c7da3dc-279a-fa07-247b-7596cf758a55@intel.com>
Date: Tue, 16 Jan 2018 10:14:19 -0800
In-Reply-To: <1516120619-1159-1-git-send-email-joro@8bytes.org>
To: Joerg Roedel, Thomas Gleixner, Ingo Molnar, "H . Peter Anvin"
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds, Andy Lutomirski, Josh Poimboeuf, Juergen Gross, Peter Zijlstra, Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst, David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH, Will Deacon, aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli, Waiman Long, jroedel@suse.de

Joerg,

Very cool!
I really appreciate you putting this together. I don't see any real
showstoppers or things that I think will *break* 64-bit. I just hope
that we can merge this _slowly_ in case it breaks 64-bit along the way.

I didn't look at the assembly in too much detail.

From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 16 Jan 2018 19:35:51 +0100 (CET)
From: Thomas Gleixner
Subject: Re: [PATCH 01/16] x86/entry/32: Rename TSS_sysenter_sp0 to TSS_sysenter_stack
In-Reply-To: <1516120619-1159-2-git-send-email-joro@8bytes.org>
To: Joerg Roedel
Cc: Ingo Molnar, "H . Peter Anvin", x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds, Andy Lutomirski, Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra, Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst, David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH, Will Deacon, aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli, Waiman Long, jroedel@suse.de

On Tue, 16 Jan 2018, Joerg Roedel wrote:
> From: Joerg Roedel
>
> The stack addresss doesn't need to be stored in tss.sp0 if
> we switch manually like on sysenter. Rename the offset so
> that it still makes sense when we its location.

-ENOSENTENCE

Other than that. Makes sense.
> Signed-off-by: Joerg Roedel
> ---
> arch/x86/entry/entry_32.S | 2 +-
> arch/x86/kernel/asm-offsets_32.c | 2 +-
> 2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
> index a1f28a54f23a..eb8c5615777b 100644
> --- a/arch/x86/entry/entry_32.S
> +++ b/arch/x86/entry/entry_32.S
> @@ -401,7 +401,7 @@ ENTRY(xen_sysenter_target)
> * 0(%ebp) arg6
> */
> ENTRY(entry_SYSENTER_32)
> - movl TSS_sysenter_sp0(%esp), %esp
> + movl TSS_sysenter_stack(%esp), %esp
> .Lsysenter_past_esp:
> pushl $__USER_DS /* pt_regs->ss */
> pushl %ebp /* pt_regs->sp (stashed in bp) */
> diff --git a/arch/x86/kernel/asm-offsets_32.c b/arch/x86/kernel/asm-offsets_32.c
> index fa1261eefa16..654229bac2fc 100644
> --- a/arch/x86/kernel/asm-offsets_32.c
> +++ b/arch/x86/kernel/asm-offsets_32.c
> @@ -47,7 +47,7 @@ void foo(void)
> BLANK();
>
> /* Offset from the sysenter stack to tss.sp0 */
> - DEFINE(TSS_sysenter_sp0, offsetof(struct cpu_entry_area, tss.x86_tss.sp0) -
> + DEFINE(TSS_sysenter_stack, offsetof(struct cpu_entry_area, tss.x86_tss.sp0) -
> offsetofend(struct cpu_entry_area, entry_stack_page.stack));
>
> #ifdef CONFIG_CC_STACKPROTECTOR
> --
> 2.13.6
>
>

From mboxrd@z Thu Jan 1 00:00:00 1970
In-Reply-To: <1516120619-1159-1-git-send-email-joro@8bytes.org>
From: Linus Torvalds
Date: Tue, 16 Jan 2018 10:59:01 -0800
Subject: Re: [RFC PATCH 00/16] PTI support for x86-32
To: Joerg Roedel
Cc: Thomas Gleixner, Ingo Molnar, "H . Peter Anvin", the arch/x86 maintainers, Linux Kernel Mailing List, linux-mm, Andy Lutomirski, Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra, Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst, David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH, Will Deacon, "Liguori, Anthony", Daniel Gruss, Hugh Dickins, Kees Cook, Andrea Arcangeli, Waiman Long, Joerg Roedel

On Tue, Jan 16, 2018 at 8:36 AM, Joerg Roedel wrote:
>
> here is my current WIP code to enable PTI on x86-32. It is
> still in a pretty early state, but it successfully boots my
> KVM guest with PAE and with legacy paging. The existing PTI
> code for x86-64 already prepares a lot of the stuff needed
> for 32 bit too, thanks for that to all the people involved
> in its development :)

Yes, I'm very happy to see that this is actually not nearly as bad as
I feared it might be.

Some of those #ifdef's in the PTI code you added might want more
commentary about what the exact differences are. And maybe they could
be done more cleanly with some abstraction. But nothing looked
_horrible_.

> The code has not run on bare-metal yet, I'll test that in
> the next days once I set up a 32 bit box again. I also haven't
> tested Wine and DosEMU yet, so this might also be broken.

.. and please run all the segment and syscall selfchecks that Andy has written.
But yes, checking bare metal, and checking the "odd" applications like
Wine and dosemu (and kvm etc) within the PTI kernel is certainly a
good idea.

> One of the things that are surely broken is XEN_PV support.
> I'd appreciate any help with testing and bugfixing on that
> front.

Xen PV and PTI don't work together even on x86-64 afaik, the Xen
people apparently felt it wasn't worth it. See the

	if (hypervisor_is_type(X86_HYPER_XEN_PV)) {
		pti_print_if_insecure("disabled on XEN PV.");
		return;
	}

in pti_check_boottime_disable().

Linus

From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [RFC PATCH 00/16] PTI support for x86-32
From: Dave Hansen
Message-ID: <90748aea-6fc0-48a5-d154-c98465fea42c@intel.com>
Date: Tue, 16 Jan 2018 11:02:28 -0800
To: Linus Torvalds, Joerg Roedel
Cc: Thomas Gleixner, Ingo Molnar, "H . Peter Anvin", the arch/x86 maintainers, Linux Kernel Mailing List, linux-mm, Andy Lutomirski, Josh Poimboeuf, Juergen Gross, Peter Zijlstra, Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst, David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH, Will Deacon, "Liguori, Anthony", Daniel Gruss, Hugh Dickins, Kees Cook, Andrea Arcangeli, Waiman Long, Joerg Roedel

On 01/16/2018 10:59 AM, Linus Torvalds wrote:
>> The code has not run on bare-metal yet, I'll test that in
>> the next days once I set up a 32 bit box again. I also haven't
>> tested Wine and DosEMU yet, so this might also be broken.
> .. and please run all the segment and syscall selfchecks that Andy has written.
>
> But yes, checking bare metal, and checking the "odd" applications like
> Wine and dosemu (and kvm etc) within the PTI kernel is certainly a
> good idea.

I tried to document a list of the "gotchas" that tripped us up during
the 64-bit effort under "Testing":

> https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?h=x86/pti&id=01c9b17bf673b05bb401b76ec763e9730ccf1376

NMIs were a biggie too.

From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [RFC PATCH 00/16] PTI support for x86-32
From: Andrew Cooper
Date: Tue, 16 Jan 2018 19:21:00 +0000
To: Linus Torvalds, Joerg Roedel
Cc: Thomas Gleixner, Ingo Molnar, "H . Peter Anvin", the arch/x86 maintainers, Linux Kernel Mailing List, linux-mm, Andy Lutomirski, Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra, Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst, David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH, Will Deacon, "Liguori, Anthony", Daniel Gruss, Hugh Dickins, Kees Cook, Andrea Arcangeli, Waiman Long, Joerg Roedel, Juergen Gross, Jan Beulich

On 16/01/18 18:59, Linus Torvalds wrote:
> On Tue, Jan 16, 2018 at 8:36 AM, Joerg Roedel wrote:
>> One of the things that are surely broken is XEN_PV support.
>> I'd appreciate any help with testing and bugfixing on that
>> front.
> Xen PV and PTI don't work together even on x86-64 afaik, the Xen
> people apparently felt it wasn't worth it. See the
>
> 	if (hypervisor_is_type(X86_HYPER_XEN_PV)) {
> 		pti_print_if_insecure("disabled on XEN PV.");
> 		return;
> 	}

64bit PV guests under Xen already have split pagetables. It is a base
and necessary part of the ABI, because segment limits stopped working
in 64bit.

32bit PV guests aren't split, but by far the most efficient way of
doing this is to introduce a new enlightenment and have Xen switch all
this stuff (and IBRS, for that matter) on behalf of the guest kernel on
context switch.
~Andrew -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wr0-f200.google.com (mail-wr0-f200.google.com [209.85.128.200]) by kanga.kvack.org (Postfix) with ESMTP id 1C6906B028B for ; Tue, 16 Jan 2018 14:35:12 -0500 (EST) Received: by mail-wr0-f200.google.com with SMTP id 31so6543844wru.0 for ; Tue, 16 Jan 2018 11:35:12 -0800 (PST) Received: from Galois.linutronix.de (Galois.linutronix.de. [2a01:7a0:2:106d:700::1]) by mx.google.com with ESMTPS id c21si19974wrc.92.2018.01.16.11.35.10 for (version=TLS1_2 cipher=AES128-SHA bits=128/128); Tue, 16 Jan 2018 11:35:10 -0800 (PST) Date: Tue, 16 Jan 2018 20:34:59 +0100 (CET) From: Thomas Gleixner Subject: Re: [PATCH 07/16] x86/mm: Move two more functions from pgtable_64.h to pgtable.h In-Reply-To: <20180116191105.GC28161@8bytes.org> Message-ID: References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1516120619-1159-8-git-send-email-joro@8bytes.org> <727a7eba-41a0-d5bb-df54-8e58b33fde76@intel.com> <20180116191105.GC28161@8bytes.org> MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Sender: owner-linux-mm@kvack.org List-ID: To: Joerg Roedel Cc: Dave Hansen , Ingo Molnar , "H . 
Peter Anvin" , x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long , jroedel@suse.de On Tue, 16 Jan 2018, Joerg Roedel wrote: > On Tue, Jan 16, 2018 at 10:03:09AM -0800, Dave Hansen wrote: > > On 01/16/2018 08:36 AM, Joerg Roedel wrote: > > > + return (((ptr & ~PAGE_MASK) / sizeof(pgd_t)) < KERNEL_PGD_BOUNDARY); > > > +} > > > > One of the reasons to implement it the other way: > > > > - return (ptr & ~PAGE_MASK) < (PAGE_SIZE / 2); > > > > is that the compiler can do this all quickly. KERNEL_PGD_BOUNDARY > > depends on PAGE_OFFSET which depends on a variable. IOW, the compiler > > can't do it. > > > > How much worse is the code that this generates? > > I havn't looked at the actual code this generates, but the > (PAGE_SIZE / 2) comparison doesn't work on 32 bit where the address > space is not always evenly split. I'll look into a better way to check > this. It should be trivial enough to do return (ptr & ~PAGE_MASK) < PGD_SPLIT_SIZE); and define it PAGE_SIZE/2 for 64bit and for PAE make it depend on the configured address space split. Thanks, tglx -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm0-f70.google.com (mail-wm0-f70.google.com [74.125.82.70]) by kanga.kvack.org (Postfix) with ESMTP id 1C1156B0290 for ; Tue, 16 Jan 2018 14:44:26 -0500 (EST) Received: by mail-wm0-f70.google.com with SMTP id a80so1052317wme.2 for ; Tue, 16 Jan 2018 11:44:26 -0800 (PST) Received: from theia.8bytes.org (8bytes.org. 
[2a01:238:4383:600:38bc:a715:4b6d:a889]) by mx.google.com with ESMTPS id d8si2458730edn.329.2018.01.16.11.44.24 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 16 Jan 2018 11:44:25 -0800 (PST) Date: Tue, 16 Jan 2018 20:44:24 +0100 From: Joerg Roedel Subject: Re: [PATCH 12/16] x86/mm/pae: Populate the user page-table with user pgd's Message-ID: <20180116194424.GE28161@8bytes.org> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1516120619-1159-13-git-send-email-joro@8bytes.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: owner-linux-mm@kvack.org List-ID: To: Dave Hansen Cc: Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" , x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long , jroedel@suse.de On Tue, Jan 16, 2018 at 10:11:14AM -0800, Dave Hansen wrote: > > Ugh. The ghosts of PAE have come back to haunt us. :-) Yeah, PAE caused the most trouble for me while getting this running. > > Could we do: > > static inline bool pgd_supports_nx(void) > { > #ifdef CONFIG_X86_64 > return (__supported_pte_mask & _PAGE_NX); > #else > /* No 32-bit page tables support NX at PGD level */ > return 0; > #endif > } > > Nobody will ever spot the #ifdef the way you laid it out. Right, that's a better way to do it. I'll change it in the next version. Thanks, Joerg
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wr0-f198.google.com (mail-wr0-f198.google.com [209.85.128.198]) by kanga.kvack.org (Postfix) with ESMTP id 650C16B0293 for ; Tue, 16 Jan 2018 14:46:16 -0500 (EST) Received: by mail-wr0-f198.google.com with SMTP id 31so6561885wru.0 for ; Tue, 16 Jan 2018 11:46:16 -0800 (PST) Received: from theia.8bytes.org (8bytes.org. [2a01:238:4383:600:38bc:a715:4b6d:a889]) by mx.google.com with ESMTPS id f24si2785593edc.398.2018.01.16.11.46.15 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 16 Jan 2018 11:46:15 -0800 (PST) Date: Tue, 16 Jan 2018 20:46:14 +0100 From: Joerg Roedel Subject: Re: [RFC PATCH 00/16] PTI support for x86-32 Message-ID: <20180116194614.GF28161@8bytes.org> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1c7da3dc-279a-fa07-247b-7596cf758a55@intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1c7da3dc-279a-fa07-247b-7596cf758a55@intel.com> Sender: owner-linux-mm@kvack.org List-ID: To: Dave Hansen Cc: Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" , x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long , jroedel@suse.de On Tue, Jan 16, 2018 at 10:14:19AM -0800, Dave Hansen wrote: > Joerg, > > Very cool! Thanks :) > I really appreciate you putting this together. I don't see any real > showstoppers or things that I think will *break* 64-bit. I just hope > that we can merge this _slowly_ in case it breaks 64-bit along the way. Sure, it needs a lot more testing and most likely fixing anyway. 
So there is still some way to go before this is ready for merging. Joerg From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wr0-f200.google.com (mail-wr0-f200.google.com [209.85.128.200]) by kanga.kvack.org (Postfix) with ESMTP id 7220B6B0294 for ; Tue, 16 Jan 2018 14:55:45 -0500 (EST) Received: by mail-wr0-f200.google.com with SMTP id h1so11417754wre.20 for ; Tue, 16 Jan 2018 11:55:45 -0800 (PST) Received: from theia.8bytes.org (8bytes.org. [2a01:238:4383:600:38bc:a715:4b6d:a889]) by mx.google.com with ESMTPS id h2si2689332edf.540.2018.01.16.11.55.44 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 16 Jan 2018 11:55:44 -0800 (PST) Date: Tue, 16 Jan 2018 20:55:43 +0100 From: Joerg Roedel Subject: Re: [RFC PATCH 00/16] PTI support for x86-32 Message-ID: <20180116195543.GG28161@8bytes.org> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: owner-linux-mm@kvack.org List-ID: To: Linus Torvalds Cc: Thomas Gleixner , Ingo Molnar , "H . 
Peter Anvin" , the arch/x86 maintainers , Linux Kernel Mailing List , linux-mm , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , "Liguori, Anthony" , Daniel Gruss , Hugh Dickins , Kees Cook , Andrea Arcangeli , Waiman Long , Joerg Roedel Hi Linus, On Tue, Jan 16, 2018 at 10:59:01AM -0800, Linus Torvalds wrote: > Yes, I'm very happy to see that this is actually not nearly as bad as > I feared it might be, Yeah, I was looking at the original PTI patches and my impression was that a lot of the complicated stuff (like setting up the cpu_entry_area) was already in there for 32 bit too. So it was mostly about the entry code and some changes to the 32bit page-table code. > Some of those #ifdef's in the PTI code you added might want more > commentary about what the exact differences are. And maybe they could > be done more cleanly with some abstraction. But nothing looked > _horrible_. I'll add more comments and better abstraction; Dave has already suggested some improvements here. Reading some of my comments again, they need a rework anyway. > .. and please run all the segment and syscall selfchecks that Andy has written. Didn't know about them yet, thanks. I will run them too in my testing. > Xen PV and PTI don't work together even on x86-64 afaik, the Xen > people apparently felt it wasn't worth it. See the > > if (hypervisor_is_type(X86_HYPER_XEN_PV)) { > pti_print_if_insecure("disabled on XEN PV."); > return; > } > > in pti_check_boottime_disable(). But I might have broken something for them anyway; honestly, I didn't pay much attention to the XEN_PV case as I was trying to get it running here. My hope is that someone who knows Xen better than I do will help out :) Regards, Joerg
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wr0-f200.google.com (mail-wr0-f200.google.com [209.85.128.200]) by kanga.kvack.org (Postfix) with ESMTP id D18826B0296 for ; Tue, 16 Jan 2018 15:30:27 -0500 (EST) Received: by mail-wr0-f200.google.com with SMTP id 31so6626372wru.0 for ; Tue, 16 Jan 2018 12:30:27 -0800 (PST) Received: from Galois.linutronix.de (Galois.linutronix.de. [2a01:7a0:2:106d:700::1]) by mx.google.com with ESMTPS id z66si2372723wmh.76.2018.01.16.12.30.26 for (version=TLS1_2 cipher=AES128-SHA bits=128/128); Tue, 16 Jan 2018 12:30:26 -0800 (PST) Date: Tue, 16 Jan 2018 21:30:14 +0100 (CET) From: Thomas Gleixner Subject: Re: [PATCH 02/16] x86/entry/32: Enter the kernel via trampoline stack In-Reply-To: <1516120619-1159-3-git-send-email-joro@8bytes.org> Message-ID: References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1516120619-1159-3-git-send-email-joro@8bytes.org> MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Sender: owner-linux-mm@kvack.org List-ID: To: Joerg Roedel Cc: Ingo Molnar , "H . Peter Anvin" , x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long , jroedel@suse.de On Tue, 16 Jan 2018, Joerg Roedel wrote: > @@ -89,13 +89,9 @@ static inline void refresh_sysenter_cs(struct thread_struct *thread) > /* This is used when switching tasks or entering/exiting vm86 mode. 
*/ > static inline void update_sp0(struct task_struct *task) > { > - /* On x86_64, sp0 always points to the entry trampoline stack, which is constant: */ > -#ifdef CONFIG_X86_32 > - load_sp0(task->thread.sp0); > -#else > + /* sp0 always points to the entry trampoline stack, which is constant: */ > if (static_cpu_has(X86_FEATURE_XENPV)) > load_sp0(task_top_of_stack(task)); > -#endif > } > > #endif /* _ASM_X86_SWITCH_TO_H */ > diff --git a/arch/x86/kernel/asm-offsets_32.c b/arch/x86/kernel/asm-offsets_32.c > index 654229bac2fc..7270dd834f4b 100644 > --- a/arch/x86/kernel/asm-offsets_32.c > +++ b/arch/x86/kernel/asm-offsets_32.c > @@ -47,9 +47,11 @@ void foo(void) > BLANK(); > > /* Offset from the sysenter stack to tss.sp0 */ > - DEFINE(TSS_sysenter_stack, offsetof(struct cpu_entry_area, tss.x86_tss.sp0) - > + DEFINE(TSS_sysenter_stack, offsetof(struct cpu_entry_area, tss.x86_tss.sp1) - > offsetofend(struct cpu_entry_area, entry_stack_page.stack)); > > + OFFSET(TSS_sp1, tss_struct, x86_tss.sp1); Can you please split out the change of TSS_sysenter_stack into a separate patch? Other than that, this looks good. Thanks, tglx From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm0-f72.google.com (mail-wm0-f72.google.com [74.125.82.72]) by kanga.kvack.org (Postfix) with ESMTP id 978236B028A for ; Tue, 16 Jan 2018 16:03:30 -0500 (EST) Received: by mail-wm0-f72.google.com with SMTP id r63so2759789wmb.9 for ; Tue, 16 Jan 2018 13:03:30 -0800 (PST) Received: from Galois.linutronix.de (Galois.linutronix.de. 
[2a01:7a0:2:106d:700::1]) by mx.google.com with ESMTPS id r10si2538459wrr.500.2018.01.16.13.03.29 for (version=TLS1_2 cipher=AES128-SHA bits=128/128); Tue, 16 Jan 2018 13:03:29 -0800 (PST) Date: Tue, 16 Jan 2018 22:03:19 +0100 (CET) From: Thomas Gleixner Subject: Re: [PATCH 09/16] x86/mm/pti: Clone CPU_ENTRY_AREA on PMD level on x86_32 In-Reply-To: <1516120619-1159-10-git-send-email-joro@8bytes.org> Message-ID: References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1516120619-1159-10-git-send-email-joro@8bytes.org> MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Sender: owner-linux-mm@kvack.org List-ID: To: Joerg Roedel Cc: Ingo Molnar , "H . Peter Anvin" , x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long , jroedel@suse.de On Tue, 16 Jan 2018, Joerg Roedel wrote: > +#ifdef CONFIG_X86_64 > /* > * Clone a single p4d (i.e. a top-level entry on 4-level systems and a > * next-level entry on 5-level systems. > @@ -322,13 +323,29 @@ static void __init pti_clone_p4d(unsigned long addr) > kernel_p4d = p4d_offset(kernel_pgd, addr); > *user_p4d = *kernel_p4d; > } > +#endif > > /* > * Clone the CPU_ENTRY_AREA into the user space visible page table. > */ > static void __init pti_clone_user_shared(void) > { > +#ifdef CONFIG_X86_32 > + /* > + * On 32 bit PAE systems with 1GB of Kernel address space there is only > + * one pgd/p4d for the whole kernel. Cloning that would map the whole > + * address space into the user page-tables, making PTI useless. So clone > + * the page-table on the PMD level to prevent that. 
> + */ > + unsigned long start, end; > + > + start = CPU_ENTRY_AREA_BASE; > + end = start + (PAGE_SIZE * CPU_ENTRY_AREA_PAGES); > + > + pti_clone_pmds(start, end, _PAGE_GLOBAL); > +#else > pti_clone_p4d(CPU_ENTRY_AREA_BASE); > +#endif > } Just a minor nit. You already wrap pti_clone_p4d() into X86_64. So it would be cleaner to do: kernel_p4d = p4d_offset(kernel_pgd, addr); *user_p4d = *kernel_p4d; } static void __init pti_clone_user_shared(void) { pti_clone_p4d(CPU_ENTRY_AREA_BASE); } #else /* CONFIG_X86_64 */ /* * Big fat comment. */ static void __init pti_clone_user_shared(void) { .... } #endif /* !CONFIG_X86_64 */ Thanks, tglx From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wr0-f198.google.com (mail-wr0-f198.google.com [209.85.128.198]) by kanga.kvack.org (Postfix) with ESMTP id 4C90D6B029D for ; Tue, 16 Jan 2018 16:06:53 -0500 (EST) Received: by mail-wr0-f198.google.com with SMTP id 31so1771935wri.9 for ; Tue, 16 Jan 2018 13:06:53 -0800 (PST) Received: from Galois.linutronix.de (Galois.linutronix.de. [2a01:7a0:2:106d:700::1]) by mx.google.com with ESMTPS id o204si2402985wma.183.2018.01.16.13.06.52 for (version=TLS1_2 cipher=AES128-SHA bits=128/128); Tue, 16 Jan 2018 13:06:52 -0800 (PST) Date: Tue, 16 Jan 2018 22:06:48 +0100 (CET) From: Thomas Gleixner Subject: Re: [PATCH 10/16] x86/mm/pti: Populate valid user pud entries In-Reply-To: <1516120619-1159-11-git-send-email-joro@8bytes.org> Message-ID: References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1516120619-1159-11-git-send-email-joro@8bytes.org> MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Sender: owner-linux-mm@kvack.org List-ID: To: Joerg Roedel Cc: Ingo Molnar , "H . 
Peter Anvin" , x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long , jroedel@suse.de On Tue, 16 Jan 2018, Joerg Roedel wrote: > From: Joerg Roedel > > With PAE paging we don't have PGD and P4D levels in the > page-table, instead the PUD level is the highest one. > > In PAE page-tables at the top-level most bits we usually set > with _KERNPG_TABLE are reserved, resulting in a #GP when > they are loaded by the processor. > > Work around this by populating PUD entries in the user > page-table only with _PAGE_PRESENT set. > > I am pretty sure there is a cleaner way to do this, but > until I find it use this #ifdef solution. Stick something like #define _KERNELPG_TABLE_PUD_ENTRY into the 32 and 64 bit variants of some relevant header file. Thanks, tglx From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wr0-f197.google.com (mail-wr0-f197.google.com [209.85.128.197]) by kanga.kvack.org (Postfix) with ESMTP id C213E280247 for ; Tue, 16 Jan 2018 16:10:59 -0500 (EST) Received: by mail-wr0-f197.google.com with SMTP id 33so8747643wrs.3 for ; Tue, 16 Jan 2018 13:10:59 -0800 (PST) Received: from Galois.linutronix.de (Galois.linutronix.de. 
[2a01:7a0:2:106d:700::1]) by mx.google.com with ESMTPS id z74si2349778wmc.120.2018.01.16.13.10.58 for (version=TLS1_2 cipher=AES128-SHA bits=128/128); Tue, 16 Jan 2018 13:10:58 -0800 (PST) Date: Tue, 16 Jan 2018 22:10:52 +0100 (CET) From: Thomas Gleixner Subject: Re: [PATCH 12/16] x86/mm/pae: Populate the user page-table with user pgd's In-Reply-To: <1516120619-1159-13-git-send-email-joro@8bytes.org> Message-ID: References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1516120619-1159-13-git-send-email-joro@8bytes.org> MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Sender: owner-linux-mm@kvack.org List-ID: To: Joerg Roedel Cc: Ingo Molnar , "H . Peter Anvin" , x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long , jroedel@suse.de On Tue, 16 Jan 2018, Joerg Roedel wrote: > > +#ifdef CONFIG_X86_64 > /* > * If this is normal user memory, make it NX in the kernel > * pagetables so that, if we somehow screw up and return to > @@ -134,10 +135,16 @@ pgd_t __pti_set_user_pgd(pgd_t *pgdp, pgd_t pgd) > * may execute from it > * - we don't have NX support > * - we're clearing the PGD (i.e. the new pgd is not present). > + * - We run on a 32 bit kernel. 2-level paging doesn't support NX at > + * all and PAE paging does not support it on the PGD level. We can > + * set it in the PMD level there in the future, but that means we > + * need to unshare the PMDs between the kernel and the user > + * page-tables. 
> */ > if ((pgd.pgd & (_PAGE_USER|_PAGE_PRESENT)) == (_PAGE_USER|_PAGE_PRESENT) && > (__supported_pte_mask & _PAGE_NX)) > pgd.pgd |= _PAGE_NX; I'd suggest having: static inline pteval_t supported_pgd_mask(void) { if (IS_ENABLED(CONFIG_X86_64)) return __supported_pte_mask; return __supported_pte_mask & ~_PAGE_NX; } and get rid of the ifdeffery completely. Thanks, tglx From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pf0-f199.google.com (mail-pf0-f199.google.com [209.85.192.199]) by kanga.kvack.org (Postfix) with ESMTP id CBE81280247 for ; Tue, 16 Jan 2018 16:15:22 -0500 (EST) Received: by mail-pf0-f199.google.com with SMTP id s22so1013633pfh.21 for ; Tue, 16 Jan 2018 13:15:22 -0800 (PST) Received: from mga03.intel.com (mga03.intel.com. [134.134.136.65]) by mx.google.com with ESMTPS id y20si2666775pfj.54.2018.01.16.13.15.21 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 16 Jan 2018 13:15:21 -0800 (PST) Subject: Re: [PATCH 12/16] x86/mm/pae: Populate the user page-table with user pgd's References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1516120619-1159-13-git-send-email-joro@8bytes.org> From: Dave Hansen Message-ID: Date: Tue, 16 Jan 2018 13:15:21 -0800 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Thomas Gleixner , Joerg Roedel Cc: Ingo Molnar , "H . 
Peter Anvin" , x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long , jroedel@suse.de On 01/16/2018 01:10 PM, Thomas Gleixner wrote: > > static inline pteval_t supported_pgd_mask(void) > { > if (IS_ENABLED(CONFIG_X86_64)) > return __supported_pte_mask; > return __supported_pte_mask & ~_PAGE_NX; > } > > and get rid of the ifdeffery completely. Heh, that's an entertaining way to do it. Joerg, if you go do it this way, it would be nice to add all the other gunk that we don't allow to be set in the PAE pgd. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm0-f71.google.com (mail-wm0-f71.google.com [74.125.82.71]) by kanga.kvack.org (Postfix) with ESMTP id C9D1A28024A for ; Tue, 16 Jan 2018 16:20:47 -0500 (EST) Received: by mail-wm0-f71.google.com with SMTP id r82so2801914wme.0 for ; Tue, 16 Jan 2018 13:20:47 -0800 (PST) Received: from Galois.linutronix.de (Galois.linutronix.de. 
[2a01:7a0:2:106d:700::1]) by mx.google.com with ESMTPS id b62si2455832wma.55.2018.01.16.13.20.46 for (version=TLS1_2 cipher=AES128-SHA bits=128/128); Tue, 16 Jan 2018 13:20:46 -0800 (PST) Date: Tue, 16 Jan 2018 22:20:40 +0100 (CET) From: Thomas Gleixner Subject: Re: [RFC PATCH 00/16] PTI support for x86-32 In-Reply-To: <1516120619-1159-1-git-send-email-joro@8bytes.org> Message-ID: References: <1516120619-1159-1-git-send-email-joro@8bytes.org> MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Sender: owner-linux-mm@kvack.org List-ID: To: Joerg Roedel Cc: Ingo Molnar , "H . Peter Anvin" , x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long , jroedel@suse.de On Tue, 16 Jan 2018, Joerg Roedel wrote: > here is my current WIP code to enable PTI on x86-32. It is > still in a pretty early state, but it successfully boots my > KVM guest with PAE and with legacy paging. The existing PTI > code for x86-64 already prepares a lot of the stuff needed > for 32 bit too, thanks for that to all the people involved > in its development :) > 16 files changed, 333 insertions(+), 123 deletions(-) Impressively small and well done! Can you please make that patch set against git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86-pti-for-linus so we immediately have it backportable for 4.14 stable? It's only a trivial conflict in pgtable.h, but we'd like to make the life of stable as simple as possible. They have enough headache with the pre 4.14 trees. 
We can pick some of the simple patches which make defines and inlines available out of the pile right away and apply them to x86/pti to shrink the amount of stuff you have to worry about. Thanks, tglx From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pf0-f198.google.com (mail-pf0-f198.google.com [209.85.192.198]) by kanga.kvack.org (Postfix) with ESMTP id 731D528024A for ; Tue, 16 Jan 2018 17:26:45 -0500 (EST) Received: by mail-pf0-f198.google.com with SMTP id g2so1399623pfh.9 for ; Tue, 16 Jan 2018 14:26:45 -0800 (PST) Received: from mail.kernel.org (mail.kernel.org. [198.145.29.99]) by mx.google.com with ESMTPS id u14si2403223pgo.179.2018.01.16.14.26.43 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 16 Jan 2018 14:26:44 -0800 (PST) Received: from mail-it0-f41.google.com (mail-it0-f41.google.com [209.85.214.41]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 9181721799 for ; Tue, 16 Jan 2018 22:26:43 +0000 (UTC) Received: by mail-it0-f41.google.com with SMTP id x42so6783121ita.4 for ; Tue, 16 Jan 2018 14:26:43 -0800 (PST) MIME-Version: 1.0 In-Reply-To: <1516120619-1159-1-git-send-email-joro@8bytes.org> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> From: Andy Lutomirski Date: Tue, 16 Jan 2018 14:26:22 -0800 Message-ID: Subject: Re: [RFC PATCH 00/16] PTI support for x86-32 Content-Type: text/plain; charset="UTF-8" Sender: owner-linux-mm@kvack.org List-ID: To: Joerg Roedel Cc: Thomas Gleixner , Ingo Molnar , "H . 
Peter Anvin" , X86 ML , LKML , linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , "Liguori, Anthony" , Daniel Gruss , Hugh Dickins , Kees Cook , Andrea Arcangeli , Waiman Long , Joerg Roedel On Tue, Jan 16, 2018 at 8:36 AM, Joerg Roedel wrote: > From: Joerg Roedel > > Hi, > > here is my current WIP code to enable PTI on x86-32. It is > still in a pretty early state, but it successfully boots my > KVM guest with PAE and with legacy paging. The existing PTI > code for x86-64 already prepares a lot of the stuff needed > for 32 bit too, thanks for that to all the people involved > in its development :) > > The patches are split as follows: > > - 1-3 contain the entry-code changes to enter and > exit the kernel via the sysenter trampoline stack. > > - 4-7 are fixes to get the code to compile on 32 bit > with CONFIG_PAGE_TABLE_ISOLATION=y. > > - 8-14 adapt the existing PTI code to work properly > on 32 bit and add the needed parts to 32 bit > page-table code. > > - 15 switches PTI on by adding the CR3 switches to > kernel entry/exit. > > - 16 enables the Kconfig for all of X86 > > The code has not run on bare metal yet; I'll test that in > the next days once I set up a 32 bit box again. I also haven't > tested Wine and DosEMU yet, so this might also be broken. > If you pass all the x86 selftests, then Wine and DOSEMU are pretty likely to work :) --Andy
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pg0-f71.google.com (mail-pg0-f71.google.com [74.125.83.71]) by kanga.kvack.org (Postfix) with ESMTP id 3E58428024A for ; Tue, 16 Jan 2018 17:37:31 -0500 (EST) Received: by mail-pg0-f71.google.com with SMTP id z12so10114883pgv.6 for ; Tue, 16 Jan 2018 14:37:31 -0800 (PST) Received: from mail.kernel.org (mail.kernel.org. [198.145.29.99]) by mx.google.com with ESMTPS id v12si2443345pgo.67.2018.01.16.14.37.30 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 16 Jan 2018 14:37:30 -0800 (PST) Received: from mail-io0-f179.google.com (mail-io0-f179.google.com [209.85.223.179]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 94A7F217A3 for ; Tue, 16 Jan 2018 22:37:29 +0000 (UTC) Received: by mail-io0-f179.google.com with SMTP id b198so15926858iof.6 for ; Tue, 16 Jan 2018 14:37:29 -0800 (PST) MIME-Version: 1.0 In-Reply-To: References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1516120619-1159-3-git-send-email-joro@8bytes.org> From: Andy Lutomirski Date: Tue, 16 Jan 2018 14:37:08 -0800 Message-ID: Subject: Re: [PATCH 02/16] x86/entry/32: Enter the kernel via trampoline stack Content-Type: text/plain; charset="UTF-8" Sender: owner-linux-mm@kvack.org List-ID: To: Thomas Gleixner Cc: Joerg Roedel , Ingo Molnar , "H . 
Peter Anvin" , X86 ML , LKML , linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , "Liguori, Anthony" , Daniel Gruss , Hugh Dickins , Kees Cook , Andrea Arcangeli , Waiman Long , Joerg Roedel On Tue, Jan 16, 2018 at 12:30 PM, Thomas Gleixner wrote: > On Tue, 16 Jan 2018, Joerg Roedel wrote: >> @@ -89,13 +89,9 @@ static inline void refresh_sysenter_cs(struct thread_struct *thread) >> /* This is used when switching tasks or entering/exiting vm86 mode. */ >> static inline void update_sp0(struct task_struct *task) >> { >> - /* On x86_64, sp0 always points to the entry trampoline stack, which is constant: */ >> -#ifdef CONFIG_X86_32 >> - load_sp0(task->thread.sp0); >> -#else >> + /* sp0 always points to the entry trampoline stack, which is constant: */ >> if (static_cpu_has(X86_FEATURE_XENPV)) >> load_sp0(task_top_of_stack(task)); >> -#endif >> } >> >> #endif /* _ASM_X86_SWITCH_TO_H */ >> diff --git a/arch/x86/kernel/asm-offsets_32.c b/arch/x86/kernel/asm-offsets_32.c >> index 654229bac2fc..7270dd834f4b 100644 >> --- a/arch/x86/kernel/asm-offsets_32.c >> +++ b/arch/x86/kernel/asm-offsets_32.c >> @@ -47,9 +47,11 @@ void foo(void) >> BLANK(); >> >> /* Offset from the sysenter stack to tss.sp0 */ >> - DEFINE(TSS_sysenter_stack, offsetof(struct cpu_entry_area, tss.x86_tss.sp0) - >> + DEFINE(TSS_sysenter_stack, offsetof(struct cpu_entry_area, tss.x86_tss.sp1) - >> offsetofend(struct cpu_entry_area, entry_stack_page.stack)); I was going to say that this is just too magical. The convention is that STRUCT_member refers to "member" of "STRUCT". Here you're encoding a more complicated calculation. How about putting just the needed offsets in asm_offsets and putting the actual calculation in the asm code or a header. 
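A minimal user-space sketch of that split, assuming heavily simplified stand-in structs: the _mock types and the names CPU_ENTRY_AREA_entry_stack_end, CPU_ENTRY_AREA_tss_sp1 and ENTRY_STACK_END_TO_TSS_SP1 are hypothetical, not kernel symbols, and the real struct cpu_entry_area looks different. In the kernel, OFFSET()/DEFINE() in asm-offsets emit such constants for assembly code; here plain macros stand in for them. Each exported symbol is a single offset named after its member, and the subtraction lives in its own macro where a reader can see both operands.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the kernel's offsetofend(): offset of the first byte past MEMBER. */
#define offsetofend(TYPE, MEMBER) \
	(offsetof(TYPE, MEMBER) + sizeof(((TYPE *)0)->MEMBER))

/* Hypothetical, heavily simplified stand-ins; not the real kernel layout. */
struct hw_tss_mock {
	unsigned int sp0;
	unsigned int sp1;
	unsigned int sp2;
};

struct cpu_entry_area_mock {
	char gdt[4096];
	struct { unsigned long stack[64]; } entry_stack_page;
	struct { struct hw_tss_mock x86_tss; } tss;
};

/*
 * asm-offsets style: one plain offset per symbol, named STRUCT_member,
 * with no arithmetic hidden inside.
 */
#define CPU_ENTRY_AREA_entry_stack_end \
	offsetofend(struct cpu_entry_area_mock, entry_stack_page.stack)
#define CPU_ENTRY_AREA_tss_sp1 \
	offsetof(struct cpu_entry_area_mock, tss.x86_tss.sp1)

/*
 * Header (or .S) side: the actual calculation, spelled out where it is
 * used. Distance from the end of the entry stack up to tss.x86_tss.sp1.
 */
#define ENTRY_STACK_END_TO_TSS_SP1 \
	(CPU_ENTRY_AREA_tss_sp1 - CPU_ENTRY_AREA_entry_stack_end)

/* In this mock layout the TSS directly follows the entry stack. */
static_assert(ENTRY_STACK_END_TO_TSS_SP1 == offsetof(struct hw_tss_mock, sp1),
	      "mock layout: tss immediately follows the entry stack");
```

With names like these, assembly code could add ENTRY_STACK_END_TO_TSS_SP1 to the address just past the entry stack to reach tss.x86_tss.sp1, and each constant still means exactly what its name says.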
>> >> + OFFSET(TSS_sp1, tss_struct, x86_tss.sp1); This belongs in asm_offsets.c. Just move the asm_offsets_64.c version there and call it a day. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pf0-f200.google.com (mail-pf0-f200.google.com [209.85.192.200]) by kanga.kvack.org (Postfix) with ESMTP id EA8E728024A for ; Tue, 16 Jan 2018 17:45:50 -0500 (EST) Received: by mail-pf0-f200.google.com with SMTP id y13so5839843pfl.16 for ; Tue, 16 Jan 2018 14:45:50 -0800 (PST) Received: from mail.kernel.org (mail.kernel.org. [198.145.29.99]) by mx.google.com with ESMTPS id g21si2679389plo.236.2018.01.16.14.45.49 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 16 Jan 2018 14:45:49 -0800 (PST) Received: from mail-it0-f47.google.com (mail-it0-f47.google.com [209.85.214.47]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id CF06E21783 for ; Tue, 16 Jan 2018 22:45:48 +0000 (UTC) Received: by mail-it0-f47.google.com with SMTP id w14so6159456itc.3 for ; Tue, 16 Jan 2018 14:45:48 -0800 (PST) MIME-Version: 1.0 In-Reply-To: <1516120619-1159-3-git-send-email-joro@8bytes.org> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1516120619-1159-3-git-send-email-joro@8bytes.org> From: Andy Lutomirski Date: Tue, 16 Jan 2018 14:45:27 -0800 Message-ID: Subject: Re: [PATCH 02/16] x86/entry/32: Enter the kernel via trampoline stack Content-Type: text/plain; charset="UTF-8" Sender: owner-linux-mm@kvack.org List-ID: To: Joerg Roedel Cc: Thomas Gleixner , Ingo Molnar , "H . 
Peter Anvin" , X86 ML , LKML , linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , "Liguori, Anthony" , Daniel Gruss , Hugh Dickins , Kees Cook , Andrea Arcangeli , Waiman Long , Joerg Roedel On Tue, Jan 16, 2018 at 8:36 AM, Joerg Roedel wrote: > From: Joerg Roedel > > Use the sysenter stack as a trampoline stack to enter the > kernel. The sysenter stack is already in the cpu_entry_area > and will be mapped to userspace when PTI is enabled. > > Signed-off-by: Joerg Roedel > --- > arch/x86/entry/entry_32.S | 89 +++++++++++++++++++++++++++++++++++----- > arch/x86/include/asm/switch_to.h | 6 +-- > arch/x86/kernel/asm-offsets_32.c | 4 +- > arch/x86/kernel/cpu/common.c | 5 ++- > arch/x86/kernel/process.c | 2 - > arch/x86/kernel/process_32.c | 6 +++ > 6 files changed, 91 insertions(+), 21 deletions(-) > > diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S > index eb8c5615777b..5a7bdb73be9f 100644 > --- a/arch/x86/entry/entry_32.S > +++ b/arch/x86/entry/entry_32.S > @@ -222,6 +222,47 @@ > .endm > > /* > + * Switch from the entry-trampoline stack to the kernel stack of the > + * running task. > + * > + * nr_regs is the number of dwords to push from the entry stack to the > + * task stack. If it is > 0 it expects an irq frame at the bottom of the > + * stack. > + * > + * If check_user != 0, it will add a check to only switch stacks if the > + * kernel entry was from user-space. > + */ > +.macro SWITCH_TO_KERNEL_STACK nr_regs=0 check_user=0 How about marking nr_regs with :req to force everyone to be explicit? 
> + > + .if \check_user > 0 && \nr_regs > 0 > + testb $3, (\nr_regs - 4)*4(%esp) /* CS */ > + jz .Lend_\@ > + .endif > + > + pushl %edi > + movl %esp, %edi > + > + /* > + * TSS_sysenter_stack is the offset from the bottom of the > + * entry-stack > + */ > + movl TSS_sysenter_stack + ((\nr_regs + 1) * 4)(%esp), %esp This is incomprehensible. You're adding what appears to be the offset of sysenter_stack within the TSS to something based on esp and dereferencing that to get the new esp. That's not actually what you're doing, but please change asm_offsets.c (as in my previous email) to avoid putting serious arithmetic in it and then do the arithmetic right here so that it's possible to follow what's going on. > + > + /* Copy the registers over */ > + .if \nr_regs > 0 > + i = 0 > + .rept \nr_regs > + pushl (\nr_regs - i) * 4(%edi) > + i = i + 1 > + .endr > + .endif > + > + mov (%edi), %edi > + > +.Lend_\@: > +.endm > + > +/* > * %eax: prev task > * %edx: next task > */ > @@ -401,7 +442,9 @@ ENTRY(xen_sysenter_target) > * 0(%ebp) arg6 > */ > ENTRY(entry_SYSENTER_32) > - movl TSS_sysenter_stack(%esp), %esp > + /* Kernel stack is empty */ > + SWITCH_TO_KERNEL_STACK This would be more readable if you put nr_regs in here. > + > .Lsysenter_past_esp: > pushl $__USER_DS /* pt_regs->ss */ > pushl %ebp /* pt_regs->sp (stashed in bp) */ > @@ -521,6 +564,10 @@ ENDPROC(entry_SYSENTER_32) > ENTRY(entry_INT80_32) > ASM_CLAC > pushl %eax /* pt_regs->orig_ax */ > + > + /* Stack layout: ss, esp, eflags, cs, eip, orig_eax */ > + SWITCH_TO_KERNEL_STACK nr_regs=6 check_user=1 > + Why check_user? > @@ -655,6 +702,10 @@ END(irq_entries_start) > common_interrupt: > ASM_CLAC > addl $-0x80, (%esp) /* Adjust vector into the [-256, -1] range */ > + > + /* Stack layout: ss, esp, eflags, cs, eip, vector */ > + SWITCH_TO_KERNEL_STACK nr_regs=6 check_user=1 LGTM.
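The address arithmetic in SWITCH_TO_KERNEL_STACK is easier to follow in a small model. What follows is a minimal C sketch, not kernel code. It simulates a flat 32-bit memory in which stacks grow downwards. The layout constants (ENTRY_STACK_END, TSS_SP1_ADDR, TASK_STACK_TOP), the helper names, and the frame values are hypothetical. 0x73/0x7b and 0x60 are the usual x86-32 user CS/SS and kernel CS selectors.

The model shows three things:

- The CS privilege check reads slot (nr_regs - 4) above %esp.
- After `pushl %edi`, esp + (nr_regs + 1)*4 is exactly the end of the entry stack. This is why a fixed asm-offsets constant cannot express the whole offset.
- The `.rept` loop copies the frame highest slot first, so the layout is preserved on the task stack.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flat 32-bit memory; "addresses" are byte offsets into mem[]. */
static uint8_t mem[1024];

#define ENTRY_STACK_END  512u   /* one past the top of the entry stack (assumed) */
#define TSS_SP1_ADDR     524u   /* assumed location of tss.x86_tss.sp1           */
#define TASK_STACK_TOP  1024u   /* top of the running task's kernel stack        */

/* Analogue of the asm-offsets constant: sp1 relative to the entry-stack end. */
#define TSS_SYSENTER_STACK (TSS_SP1_ADDR - ENTRY_STACK_END)

static uint32_t esp, edi;   /* simulated registers */

static uint32_t load(uint32_t addr)
{
    uint32_t v;
    memcpy(&v, &mem[addr], sizeof(v));
    return v;
}

static void store(uint32_t addr, uint32_t v)
{
    memcpy(&mem[addr], &v, sizeof(v));
}

static void push(uint32_t v)   /* pushl: the stack grows downwards */
{
    esp -= 4;
    store(esp, v);
}

/* Push ss, esp, eflags, cs, eip (plus orig_eax when nr_regs == 6) onto the
 * entry stack, as the CPU and the entry code would. */
static void setup_frame(uint32_t cs, int nr_regs)
{
    memset(mem, 0, sizeof(mem));
    store(TSS_SP1_ADDR, TASK_STACK_TOP);   /* sp1 = top of the task stack */
    esp = ENTRY_STACK_END;
    push(0x7b);          /* ss */
    push(0xbfff0000u);   /* user esp */
    push(0x246);         /* eflags */
    push(cs);            /* cs */
    push(0x08048000u);   /* eip */
    if (nr_regs == 6)
        push(11);        /* orig_eax */
    edi = 0xdeadbeefu;   /* caller's %edi, which must survive the switch */
}

/* C rendering of SWITCH_TO_KERNEL_STACK; returns 1 if the stack was switched. */
static int switch_to_kernel_stack(int nr_regs, int check_user)
{
    assert(nr_regs >= 0 && nr_regs <= 6);

    /* testb $3, (nr_regs - 4)*4(%esp): RPL bits of the saved CS */
    if (check_user && nr_regs > 0 && (load(esp + (nr_regs - 4) * 4) & 3) == 0)
        return 0;

    push(edi);           /* pushl %edi */
    edi = esp;           /* movl %esp, %edi */

    /* esp + (nr_regs + 1) * 4 is the end of the entry stack, so adding
     * TSS_SYSENTER_STACK lands on sp1, which holds the task stack top. */
    esp = load(esp + TSS_SYSENTER_STACK + (nr_regs + 1) * 4);

    /* .rept loop: copy the frame highest slot first to keep its layout. */
    for (int i = 0; i < nr_regs; i++)
        push(load(edi + (nr_regs - i) * 4));

    edi = load(edi);     /* mov (%edi), %edi */
    return 1;
}
```

With a user-mode INT80 frame (`setup_frame(0x73, 6)`), the call returns 1. %esp ends 24 bytes below TASK_STACK_TOP, with the six dwords in their original order, and %edi is restored. With a kernel-mode CS such as 0x60 (RPL 0), the call returns 0 and leaves the entry stack untouched. This is the "do nothing" path discussed for an NMI taken in kernel mode.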
> ENTRY(nmi) > ASM_CLAC > + > + /* Stack layout: ss, esp, eflags, cs, eip */ > + SWITCH_TO_KERNEL_STACK nr_regs=5 check_user=1 This is wrong, I think. If you get an nmi in kernel mode but while still on the sysenter stack, you blow up. IIRC we have some crazy code already to handle this (for nmi and #DB), and maybe that's already adequate or can be made adequate, but at the very least this needs a big comment explaining why it's okay. > diff --git a/arch/x86/include/asm/switch_to.h b/arch/x86/include/asm/switch_to.h > index eb5f7999a893..20e5f7ab8260 100644 > --- a/arch/x86/include/asm/switch_to.h > +++ b/arch/x86/include/asm/switch_to.h > @@ -89,13 +89,9 @@ static inline void refresh_sysenter_cs(struct thread_struct *thread) > /* This is used when switching tasks or entering/exiting vm86 mode. */ > static inline void update_sp0(struct task_struct *task) > { > - /* On x86_64, sp0 always points to the entry trampoline stack, which is constant: */ > -#ifdef CONFIG_X86_32 > - load_sp0(task->thread.sp0); > -#else > + /* sp0 always points to the entry trampoline stack, which is constant: */ > if (static_cpu_has(X86_FEATURE_XENPV)) > load_sp0(task_top_of_stack(task)); > -#endif > } > > #endif /* _ASM_X86_SWITCH_TO_H */ > diff --git a/arch/x86/kernel/asm-offsets_32.c b/arch/x86/kernel/asm-offsets_32.c > index 654229bac2fc..7270dd834f4b 100644 > --- a/arch/x86/kernel/asm-offsets_32.c > +++ b/arch/x86/kernel/asm-offsets_32.c > @@ -47,9 +47,11 @@ void foo(void) > BLANK(); > > /* Offset from the sysenter stack to tss.sp0 */ > - DEFINE(TSS_sysenter_stack, offsetof(struct cpu_entry_area, tss.x86_tss.sp0) - > + DEFINE(TSS_sysenter_stack, offsetof(struct cpu_entry_area, tss.x86_tss.sp1) - > offsetofend(struct cpu_entry_area, entry_stack_page.stack)); > > + OFFSET(TSS_sp1, tss_struct, x86_tss.sp1); > + > #ifdef CONFIG_CC_STACKPROTECTOR > BLANK(); > OFFSET(stack_canary_offset, stack_canary, canary); > diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c > index 
ef29ad001991..20a71c914e59 100644 > --- a/arch/x86/kernel/cpu/common.c > +++ b/arch/x86/kernel/cpu/common.c > @@ -1649,11 +1649,12 @@ void cpu_init(void) > enter_lazy_tlb(&init_mm, curr); > > /* > - * Initialize the TSS. Don't bother initializing sp0, as the initial > - * task never enters user mode. > + * Initialize the TSS. sp0 points to the entry trampoline stack > + * regardless of what task is running. > */ > set_tss_desc(cpu, &get_cpu_entry_area(cpu)->tss.x86_tss); > load_TR_desc(); > + load_sp0((unsigned long)(cpu_entry_stack(cpu) + 1)); It's high time we unified the 32-bit and 64-bit versions of the code. This isn't necessarily needed for your series, though. > diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c > index 5224c6099184..452eeac00b80 100644 > --- a/arch/x86/kernel/process_32.c > +++ b/arch/x86/kernel/process_32.c > @@ -292,6 +292,12 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p) > this_cpu_write(cpu_current_top_of_stack, > (unsigned long)task_stack_page(next_p) + > THREAD_SIZE); > + /* > + * TODO: Find a way to let cpu_current_top_of_stack point to > + * cpu_tss_rw.x86_tss.sp1. Doing so now results in stack corruption with > + * iret exceptions. > + */ > + this_cpu_write(cpu_tss_rw.x86_tss.sp1, next_p->thread.sp0); Do you know what the issue is? As a general comment, the interaction between this patch and vm86 is a bit scary. In vm86 mode, the kernel gets entered with extra stuff on the stack, which may screw up all your offsets. --Andy
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pg0-f69.google.com (mail-pg0-f69.google.com [74.125.83.69]) by kanga.kvack.org (Postfix) with ESMTP id AC0CE28024A for ; Tue, 16 Jan 2018 17:46:38 -0500 (EST) Received: by mail-pg0-f69.google.com with SMTP id i2so10058920pgq.8 for ; Tue, 16 Jan 2018 14:46:38 -0800 (PST) Received: from mail.kernel.org (mail.kernel.org. [198.145.29.99]) by mx.google.com with ESMTPS id z5si2892042plo.122.2018.01.16.14.46.37 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 16 Jan 2018 14:46:37 -0800 (PST) Received: from mail-it0-f44.google.com (mail-it0-f44.google.com [209.85.214.44]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 6A708217A1 for ; Tue, 16 Jan 2018 22:46:37 +0000 (UTC) Received: by mail-it0-f44.google.com with SMTP id x42so6836327ita.4 for ; Tue, 16 Jan 2018 14:46:37 -0800 (PST) MIME-Version: 1.0 In-Reply-To: <1516120619-1159-5-git-send-email-joro@8bytes.org> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1516120619-1159-5-git-send-email-joro@8bytes.org> From: Andy Lutomirski Date: Tue, 16 Jan 2018 14:46:16 -0800 Message-ID: Subject: Re: [PATCH 04/16] x86/pti: Define X86_CR3_PTI_PCID_USER_BIT on x86_32 Content-Type: text/plain; charset="UTF-8" Sender: owner-linux-mm@kvack.org List-ID: To: Joerg Roedel Cc: Thomas Gleixner , Ingo Molnar , "H .
Peter Anvin" , X86 ML , LKML , linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , "Liguori, Anthony" , Daniel Gruss , Hugh Dickins , Kees Cook , Andrea Arcangeli , Waiman Long , Joerg Roedel On Tue, Jan 16, 2018 at 8:36 AM, Joerg Roedel wrote: > From: Joerg Roedel > > Move it out of the X86_64 specific processor defines so > that its visible for 32bit too. Hmm. This is okay, I guess, but any code that actually uses this definition is inherently wrong, since 32-bit implies !PCID. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pl0-f71.google.com (mail-pl0-f71.google.com [209.85.160.71]) by kanga.kvack.org (Postfix) with ESMTP id EB5E928024A for ; Tue, 16 Jan 2018 17:49:05 -0500 (EST) Received: by mail-pl0-f71.google.com with SMTP id t2so6909341plm.7 for ; Tue, 16 Jan 2018 14:49:05 -0800 (PST) Received: from mail.kernel.org (mail.kernel.org.
[198.145.29.99]) by mx.google.com with ESMTPS id m15si3037361pln.714.2018.01.16.14.49.04 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 16 Jan 2018 14:49:04 -0800 (PST) Received: from mail-io0-f172.google.com (mail-io0-f172.google.com [209.85.223.172]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 76A5E21783 for ; Tue, 16 Jan 2018 22:49:04 +0000 (UTC) Received: by mail-io0-f172.google.com with SMTP id l17so8741154ioc.3 for ; Tue, 16 Jan 2018 14:49:04 -0800 (PST) MIME-Version: 1.0 In-Reply-To: <1516120619-1159-4-git-send-email-joro@8bytes.org> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1516120619-1159-4-git-send-email-joro@8bytes.org> From: Andy Lutomirski Date: Tue, 16 Jan 2018 14:48:43 -0800 Message-ID: Subject: Re: [PATCH 03/16] x86/entry/32: Leave the kernel via the trampoline stack Content-Type: text/plain; charset="UTF-8" Sender: owner-linux-mm@kvack.org List-ID: To: Joerg Roedel Cc: Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" , X86 ML , LKML , linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , "Liguori, Anthony" , Daniel Gruss , Hugh Dickins , Kees Cook , Andrea Arcangeli , Waiman Long , Joerg Roedel On Tue, Jan 16, 2018 at 8:36 AM, Joerg Roedel wrote: > From: Joerg Roedel > > Switch back to the trampoline stack before returning to > userspace. 
> > Signed-off-by: Joerg Roedel > --- > arch/x86/entry/entry_32.S | 58 ++++++++++++++++++++++++++++++++++++++++ > arch/x86/kernel/asm-offsets_32.c | 1 + > 2 files changed, 59 insertions(+) > > diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S > index 5a7bdb73be9f..14018eeb11c3 100644 > --- a/arch/x86/entry/entry_32.S > +++ b/arch/x86/entry/entry_32.S > @@ -263,6 +263,61 @@ > .endm > > /* > + * Switch back from the kernel stack to the entry stack. > + * > + * iret_frame > 0 adds code to copie over an iret frame from the old to > + * the new stack. It also adds a check which bails out if > + * we are not returning to user-space. > + * > + * This macro is allowed not modify eflags when iret_frame == 0. > + */ > +.macro SWITCH_TO_ENTRY_STACK iret_frame=0 > + .if \iret_frame > 0 > + /* Are we returning to userspace? */ > + testb $3, 4(%esp) /* return CS */ > + jz .Lend_\@ > + .endif > + > + /* > + * We run with user-%fs already loaded from pt_regs, so we don't > + * have access to per_cpu data anymore, and there is no swapgs > + * equivalent on x86_32. > + * We work around this by loading the kernel-%fs again and > + * reading the entry stack address from there. Then we restore > + * the user-%fs and return. > + */ > + pushl %fs > + pushl %edi > + > + /* Re-load kernel-%fs, after that we can use PER_CPU_VAR */ > + movl $(__KERNEL_PERCPU), %edi > + movl %edi, %fs > + > + /* Save old stack pointer to copy the return frame over if needed */ > + movl %esp, %edi > + movl PER_CPU_VAR(cpu_tss_rw + TSS_sp0), %esp > + > + /* Now we are on the entry stack */ > + > + .if \iret_frame > 0 > + /* Stack frame: ss, esp, eflags, cs, eip, fs, edi */ > + pushl 6*4(%edi) /* ss */ > + pushl 5*4(%edi) /* esp */ > + pushl 4*4(%edi) /* eflags */ > + pushl 3*4(%edi) /* cs */ > + pushl 2*4(%edi) /* eip */ > + .endif > + > + pushl 4(%edi) /* fs */ > + > + /* Restore user %edi and user %fs */ > + movl (%edi), %edi > + popl %fs Yikes! 
We're not *supposed* to be able to observe an asynchronous descriptor table change, but if the LDT changes out from under you, this is going to blow up badly. It would be really nice if you could pull this off without percpu access or without needing to do this dance where you load user FS, then kernel FS, then user FS. If that's not doable, then you should at least add exception handling -- look at the other 'pop %fs' instructions in entry_32.S. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pl0-f69.google.com (mail-pl0-f69.google.com [209.85.160.69]) by kanga.kvack.org (Postfix) with ESMTP id 375E528024A for ; Tue, 16 Jan 2018 17:52:08 -0500 (EST) Received: by mail-pl0-f69.google.com with SMTP id q1so6936247plr.15 for ; Tue, 16 Jan 2018 14:52:08 -0800 (PST) Received: from mail.kernel.org (mail.kernel.org.
[198.145.29.99]) by mx.google.com with ESMTPS id y12si2768612pff.4.2018.01.16.14.52.07 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 16 Jan 2018 14:52:07 -0800 (PST) Received: from mail-io0-f169.google.com (mail-io0-f169.google.com [209.85.223.169]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id DF7F821781 for ; Tue, 16 Jan 2018 22:52:06 +0000 (UTC) Received: by mail-io0-f169.google.com with SMTP id w188so18619307iod.10 for ; Tue, 16 Jan 2018 14:52:06 -0800 (PST) MIME-Version: 1.0 In-Reply-To: <20180116165213.GF2228@hirez.programming.kicks-ass.net> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1516120619-1159-7-git-send-email-joro@8bytes.org> <20180116165213.GF2228@hirez.programming.kicks-ass.net> From: Andy Lutomirski Date: Tue, 16 Jan 2018 14:51:45 -0800 Message-ID: Subject: Re: [PATCH 06/16] x86/mm/ldt: Reserve high address-space range for the LDT Content-Type: text/plain; charset="UTF-8" Sender: owner-linux-mm@kvack.org List-ID: To: Peter Zijlstra Cc: Joerg Roedel , Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" , X86 ML , LKML , linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , "Liguori, Anthony" , Daniel Gruss , Hugh Dickins , Kees Cook , Andrea Arcangeli , Waiman Long , Joerg Roedel On Tue, Jan 16, 2018 at 8:52 AM, Peter Zijlstra wrote: > On Tue, Jan 16, 2018 at 05:36:49PM +0100, Joerg Roedel wrote: >> From: Joerg Roedel >> >> Reserve 2MB/4MB of address space for mapping the LDT to >> user-space. > > LDT is 64k, we need 2 per CPU, and NR_CPUS <= 64 on 32bit, that gives > 64K*2*64=8M > 2M. If this works like it does on 64-bit, it only needs 128k regardless of the number of CPUs. The LDT mapping is specific to the mm. 
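The two size estimates in this exchange can be checked with a little arithmetic. This is a sketch only. LDT_ENTRIES (8192) and LDT_ENTRY_SIZE (8) are the values from the x86 UAPI `<asm/ldt.h>`. Reading Andy's 128k as two 64 KiB slots per mm, as on 64-bit PTI, is an interpretation. The helper names are hypothetical.

```c
#include <assert.h>

/* Values as defined in the x86 UAPI header <asm/ldt.h>. */
#define LDT_ENTRIES    8192
#define LDT_ENTRY_SIZE 8

#define KiB 1024UL
#define MiB (1024UL * KiB)

/* Largest possible LDT: 8192 descriptors of 8 bytes each, i.e. 64 KiB. */
static unsigned long ldt_max_bytes(void)
{
    return (unsigned long)LDT_ENTRIES * LDT_ENTRY_SIZE;
}

/* Per-CPU estimate: two LDT slots for every possible CPU. */
static unsigned long per_cpu_reservation(unsigned long nr_cpus)
{
    assert(nr_cpus > 0);
    return 2 * ldt_max_bytes() * nr_cpus;
}

/* Per-mm mapping: two slots suffice whatever the CPU count (assumed layout). */
static unsigned long per_mm_reservation(void)
{
    return 2 * ldt_max_bytes();
}
```

With NR_CPUS = 64, the per-CPU scheme needs 8 MiB. That is more than even the larger 4 MiB option the patch mentions. Two per-mm slots need only 128 KiB, which fits easily in either reservation.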
How are you dealing with PAE here? That is, what's your pagetable layout? What parts of the address space are owned by what code? From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-io0-f199.google.com (mail-io0-f199.google.com [209.85.223.199]) by kanga.kvack.org (Postfix) with ESMTP id 0E6E3280263 for ; Tue, 16 Jan 2018 21:48:09 -0500 (EST) Received: by mail-io0-f199.google.com with SMTP id e69so10820108iod.17 for ; Tue, 16 Jan 2018 18:48:09 -0800 (PST) Received: from aserp2120.oracle.com (aserp2120.oracle.com. [141.146.126.78]) by mx.google.com with ESMTPS id m139si3524176itb.88.2018.01.16.18.48.08 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 16 Jan 2018 18:48:08 -0800 (PST) Subject: Re: [PATCH 02/16] x86/entry/32: Enter the kernel via trampoline stack References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1516120619-1159-3-git-send-email-joro@8bytes.org> From: Boris Ostrovsky Message-ID: <476d7100-2414-d09e-abf1-5aa4d369a3b7@oracle.com> Date: Tue, 16 Jan 2018 21:47:06 -0500 MIME-Version: 1.0 In-Reply-To: <1516120619-1159-3-git-send-email-joro@8bytes.org> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Joerg Roedel , Thomas Gleixner , Ingo Molnar , "H .
Peter Anvin" Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long , jroedel@suse.de On 01/16/2018 11:36 AM, Joerg Roedel wrote: > > /* > + * Switch from the entry-trampline stack to the kernel stack of the > + * running task. > + * > + * nr_regs is the number of dwords to push from the entry stack to the > + * task stack. If it is > 0 it expects an irq frame at the bottom of the > + * stack. > + * > + * check_user != 0 it will add a check to only switch stacks if the > + * kernel entry was from user-space. > + */ > +.macro SWITCH_TO_KERNEL_STACK nr_regs=0 check_user=0 This (and the next patch's SWITCH_TO_ENTRY_STACK) needs an X86_FEATURE_PTI check. With those macros fixed I was able to boot a 32-bit Xen PV guest. -boris From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pf0-f200.google.com (mail-pf0-f200.google.com [209.85.192.200]) by kanga.kvack.org (Postfix) with ESMTP id C20E3280281 for ; Wed, 17 Jan 2018 02:59:47 -0500 (EST) Received: by mail-pf0-f200.google.com with SMTP id q8so8036147pfh.12 for ; Tue, 16 Jan 2018 23:59:47 -0800 (PST) Received: from bombadil.infradead.org (bombadil.infradead.org.
[65.50.211.133]) by mx.google.com with ESMTPS id q4si3256359pgn.232.2018.01.16.23.59.46 for (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Tue, 16 Jan 2018 23:59:46 -0800 (PST) Date: Wed, 17 Jan 2018 08:59:24 +0100 From: Peter Zijlstra Subject: Re: [PATCH 06/16] x86/mm/ldt: Reserve high address-space range for the LDT Message-ID: <20180117075924.GI2228@hirez.programming.kicks-ass.net> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1516120619-1159-7-git-send-email-joro@8bytes.org> <20180116165213.GF2228@hirez.programming.kicks-ass.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: owner-linux-mm@kvack.org List-ID: To: Andy Lutomirski Cc: Joerg Roedel , Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" , X86 ML , LKML , linux-mm@kvack.org, Linus Torvalds , Dave Hansen , Josh Poimboeuf , Juergen Gross , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , "Liguori, Anthony" , Daniel Gruss , Hugh Dickins , Kees Cook , Andrea Arcangeli , Waiman Long , Joerg Roedel On Tue, Jan 16, 2018 at 02:51:45PM -0800, Andy Lutomirski wrote: > On Tue, Jan 16, 2018 at 8:52 AM, Peter Zijlstra wrote: > > On Tue, Jan 16, 2018 at 05:36:49PM +0100, Joerg Roedel wrote: > >> From: Joerg Roedel > >> > >> Reserve 2MB/4MB of address space for mapping the LDT to > >> user-space. > > > > LDT is 64k, we need 2 per CPU, and NR_CPUS <= 64 on 32bit, that gives > > 64K*2*64=8M > 2M. > > If this works like it does on 64-bit, it only needs 128k regardless of > the number of CPUs. The LDT mapping is specific to the mm. Ah, then I got my LDT things confused again... which is certainly possible, we had a few too many variants back then.
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm0-f72.google.com (mail-wm0-f72.google.com [74.125.82.72]) by kanga.kvack.org (Postfix) with ESMTP id 5BFDD280281 for ; Wed, 17 Jan 2018 04:18:55 -0500 (EST) Received: by mail-wm0-f72.google.com with SMTP id f3so3622943wmc.8 for ; Wed, 17 Jan 2018 01:18:55 -0800 (PST) Received: from theia.8bytes.org (8bytes.org. [2a01:238:4383:600:38bc:a715:4b6d:a889]) by mx.google.com with ESMTPS id 34si4217284edp.249.2018.01.17.01.18.53 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 17 Jan 2018 01:18:54 -0800 (PST) Date: Wed, 17 Jan 2018 10:18:53 +0100 From: Joerg Roedel Subject: Re: [PATCH 02/16] x86/entry/32: Enter the kernel via trampoline stack Message-ID: <20180117091853.GI28161@8bytes.org> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1516120619-1159-3-git-send-email-joro@8bytes.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: owner-linux-mm@kvack.org List-ID: To: Andy Lutomirski Cc: Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" , X86 ML , LKML , linux-mm@kvack.org, Linus Torvalds , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , "Liguori, Anthony" , Daniel Gruss , Hugh Dickins , Kees Cook , Andrea Arcangeli , Waiman Long , Joerg Roedel On Tue, Jan 16, 2018 at 02:45:27PM -0800, Andy Lutomirski wrote: > On Tue, Jan 16, 2018 at 8:36 AM, Joerg Roedel wrote: > > +.macro SWITCH_TO_KERNEL_STACK nr_regs=0 check_user=0 > > How about marking nr_regs with :req to force everyone to be explicit? Yeah, that's more readable, I'll change it.
> > + /* > > + * TSS_sysenter_stack is the offset from the bottom of the > > + * entry-stack > > + */ > > + movl TSS_sysenter_stack + ((\nr_regs + 1) * 4)(%esp), %esp > > This is incomprehensible. You're adding what appears to be the offset > of sysenter_stack within the TSS to something based on esp and > dereferencing that to get the new esp. That's not actually what > you're doing, but please change asm_offsets.c (as in my previous > email) to avoid putting serious arithmetic in it and then do the > arithmetic right here so that it's possible to follow what's going on. Probably this needs better comments. TSS_sysenter_stack is the offset to tss.sp0 (tss.sp1 later) from the _bottom_ of the stack. But in this macro the stack might not be empty: it has a configurable (by \nr_regs) number of dwords on it. Before this instruction we also do a push %edi, so we need (\nr_regs + 1). This can't be put into asm-offsets.c, as the actual offset depends on how much is on the stack. > > ENTRY(entry_INT80_32) > > ASM_CLAC > > pushl %eax /* pt_regs->orig_ax */ > > + > > + /* Stack layout: ss, esp, eflags, cs, eip, orig_eax */ > > + SWITCH_TO_KERNEL_STACK nr_regs=6 check_user=1 > > + > > Why check_user? You are right, check_user shouldn't be needed as INT80 is never called from kernel mode. > > ENTRY(nmi) > > ASM_CLAC > > + > > + /* Stack layout: ss, esp, eflags, cs, eip */ > > + SWITCH_TO_KERNEL_STACK nr_regs=5 check_user=1 > > This is wrong, I think. If you get an nmi in kernel mode but while > still on the sysenter stack, you blow up. IIRC we have some crazy > code already to handle this (for nmi and #DB), and maybe that's > already adequate or can be made adequate, but at the very least this > needs a big comment explaining why it's okay. If we get an nmi while still on the sysenter stack, then we are not entering the handler from user-space and the above code will do nothing and behave as before. But you are right, it might blow up.
There is a problem with the cr3 switch: the nmi can happen in kernel mode before cr3 is switched, in which case this handler will not do the cr3 switch itself and the kernel will crash. But the stack switching should be fine, I think. > > + /* > > + * TODO: Find a way to let cpu_current_top_of_stack point to > > + * cpu_tss_rw.x86_tss.sp1. Doing so now results in stack corruption with > > + * iret exceptions. > > + */ > > + this_cpu_write(cpu_tss_rw.x86_tss.sp1, next_p->thread.sp0); > > Do you know what the issue is? No, not yet; I will look into that again. But first I want to get this series stable enough as it is. > As a general comment, the interaction between this patch and vm86 is a > bit scary. In vm86 mode, the kernel gets entered with extra stuff on > the stack, which may screw up all your offsets. I just read up on vm86-mode control transfers and the resulting stack layout. It looks like I need to check for eflags.vm=1 and copy four more registers from/to the entry stack. Thanks for pointing that out. Thanks, Joerg From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wr0-f199.google.com (mail-wr0-f199.google.com [209.85.128.199]) by kanga.kvack.org (Postfix) with ESMTP id 7FC10280281 for ; Wed, 17 Jan 2018 04:24:44 -0500 (EST) Received: by mail-wr0-f199.google.com with SMTP id c11so7071627wrb.23 for ; Wed, 17 Jan 2018 01:24:44 -0800 (PST) Received: from theia.8bytes.org (8bytes.org.
[2a01:238:4383:600:38bc:a715:4b6d:a889]) by mx.google.com with ESMTPS id e6si4926705edk.214.2018.01.17.01.24.43 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 17 Jan 2018 01:24:43 -0800 (PST) Date: Wed, 17 Jan 2018 10:24:42 +0100 From: Joerg Roedel Subject: Re: [PATCH 03/16] x86/entry/32: Leave the kernel via the trampoline stack Message-ID: <20180117092442.GJ28161@8bytes.org> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1516120619-1159-4-git-send-email-joro@8bytes.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: owner-linux-mm@kvack.org List-ID: To: Andy Lutomirski Cc: Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" , X86 ML , LKML , linux-mm@kvack.org, Linus Torvalds , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , "Liguori, Anthony" , Daniel Gruss , Hugh Dickins , Kees Cook , Andrea Arcangeli , Waiman Long , Joerg Roedel On Tue, Jan 16, 2018 at 02:48:43PM -0800, Andy Lutomirski wrote: > On Tue, Jan 16, 2018 at 8:36 AM, Joerg Roedel wrote: > > + /* Restore user %edi and user %fs */ > > + movl (%edi), %edi > > + popl %fs > > Yikes! We're not *supposed* to be able to observe an asynchronous > descriptor table change, but if the LDT changes out from under you, > this is going to blow up badly. It would be really nice if you could > pull this off without percpu access or without needing to do this > dance where you load user FS, then kernel FS, then user FS. If that's > not doable, then you should at least add exception handling -- look at > the other 'pop %fs' instructions in entry_32.S. You are right! This also means I need to do the 'popl %fs' before the cr3-switch. I'll fix it in the next version. I have no real idea on how to switch back to the entry stack without access to per_cpu variables. 
I also can't access the cpu_entry_area for the cpu yet, because for that we need to be on the entry stack already. Joerg From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm0-f70.google.com (mail-wm0-f70.google.com [74.125.82.70]) by kanga.kvack.org (Postfix) with ESMTP id 7439D280281 for ; Wed, 17 Jan 2018 04:26:59 -0500 (EST) Received: by mail-wm0-f70.google.com with SMTP id d63so3634808wma.4 for ; Wed, 17 Jan 2018 01:26:59 -0800 (PST) Received: from theia.8bytes.org (8bytes.org. [2a01:238:4383:600:38bc:a715:4b6d:a889]) by mx.google.com with ESMTPS id y5si4347351edj.28.2018.01.17.01.26.58 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 17 Jan 2018 01:26:58 -0800 (PST) Date: Wed, 17 Jan 2018 10:26:57 +0100 From: Joerg Roedel Subject: Re: [PATCH 04/16] x86/pti: Define X86_CR3_PTI_PCID_USER_BIT on x86_32 Message-ID: <20180117092657.GK28161@8bytes.org> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1516120619-1159-5-git-send-email-joro@8bytes.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: owner-linux-mm@kvack.org List-ID: To: Andy Lutomirski Cc: Thomas Gleixner , Ingo Molnar , "H .
Peter Anvin" , X86 ML , LKML , linux-mm@kvack.org, Linus Torvalds , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , "Liguori, Anthony" , Daniel Gruss , Hugh Dickins , Kees Cook , Andrea Arcangeli , Waiman Long , Joerg Roedel On Tue, Jan 16, 2018 at 02:46:16PM -0800, Andy Lutomirski wrote: > On Tue, Jan 16, 2018 at 8:36 AM, Joerg Roedel wrote: > > From: Joerg Roedel > > > > Move it out of the X86_64 specific processor defines so > > that its visible for 32bit too. > > Hmm. This is okay, I guess, but any code that actually uses this > definition is inherently wrong, since 32-bit implies !PCID. Yes, I tried another approach first which just #ifdef'ed out the relevant parts in tlbflush.h which use this bit. But that seemed to be the wrong path, as there is more PCID code that is compiled in for 32 bit. So defining the bit for 32 bit seemed to be the cleaner solution for now. Joerg From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm0-f72.google.com (mail-wm0-f72.google.com [74.125.82.72]) by kanga.kvack.org (Postfix) with ESMTP id 4F299280298 for ; Wed, 17 Jan 2018 04:55:09 -0500 (EST) Received: by mail-wm0-f72.google.com with SMTP id v14so3850716wmd.3 for ; Wed, 17 Jan 2018 01:55:09 -0800 (PST) Received: from theia.8bytes.org (8bytes.org.
[2a01:238:4383:600:38bc:a715:4b6d:a889]) by mx.google.com with ESMTPS id g7si513034edj.376.2018.01.17.01.55.08 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 17 Jan 2018 01:55:08 -0800 (PST) Date: Wed, 17 Jan 2018 10:55:07 +0100 From: Joerg Roedel Subject: Re: [RFC PATCH 00/16] PTI support for x86-32 Message-ID: <20180117095507.GM28161@8bytes.org> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: owner-linux-mm@kvack.org List-ID: To: Thomas Gleixner Cc: Ingo Molnar , "H . Peter Anvin" , x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long , jroedel@suse.de Hi Thomas, thanks for your review, I'll work in your suggestions for the next post. On Tue, Jan 16, 2018 at 10:20:40PM +0100, Thomas Gleixner wrote: > On Tue, 16 Jan 2018, Joerg Roedel wrote: > > 16 files changed, 333 insertions(+), 123 deletions(-) > > Impressively small and well done ! Thanks :) > Can you please make that patch set against > > git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86-pti-for-linus > > so we immediately have it backportable for 4.14 stable? It's only a trivial > conflict in pgtable.h, but we'd like to make the life of stable as simple > as possible. They have enough headache with the pre 4.14 trees. Sure, will do. > We can pick some of the simple patches which make defines and inlines > available out of the pile right away and apply them to x86/pti to shrink > the amount of stuff you have to worry about. 
This should be patches 4, 5, 7, 11, and I think 13 is also simple enough. Feel free to take them, but I can also carry them forward if needed. Thanks, Joerg -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-io0-f197.google.com (mail-io0-f197.google.com [209.85.223.197]) by kanga.kvack.org (Postfix) with ESMTP id E3B0C28029C for ; Wed, 17 Jan 2018 08:57:55 -0500 (EST) Received: by mail-io0-f197.google.com with SMTP id p202so9999024iod.18 for ; Wed, 17 Jan 2018 05:57:55 -0800 (PST) Received: from mail-sor-f41.google.com (mail-sor-f41.google.com. [209.85.220.41]) by mx.google.com with SMTPS id p198sor2425906ioe.240.2018.01.17.05.57.54 for (Google Transport Security); Wed, 17 Jan 2018 05:57:54 -0800 (PST) MIME-Version: 1.0 In-Reply-To: <20180117092442.GJ28161@8bytes.org> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1516120619-1159-4-git-send-email-joro@8bytes.org> <20180117092442.GJ28161@8bytes.org> From: Brian Gerst Date: Wed, 17 Jan 2018 05:57:53 -0800 Message-ID: Subject: Re: [PATCH 03/16] x86/entry/32: Leave the kernel via the trampoline stack Content-Type: text/plain; charset="UTF-8" Sender: owner-linux-mm@kvack.org List-ID: To: Joerg Roedel Cc: Andy Lutomirski , Thomas Gleixner , Ingo Molnar , "H . 
Peter Anvin" , X86 ML , LKML , Linux-MM , Linus Torvalds , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , "Liguori, Anthony" , Daniel Gruss , Hugh Dickins , Kees Cook , Andrea Arcangeli , Waiman Long , Joerg Roedel On Wed, Jan 17, 2018 at 1:24 AM, Joerg Roedel wrote: > On Tue, Jan 16, 2018 at 02:48:43PM -0800, Andy Lutomirski wrote: >> On Tue, Jan 16, 2018 at 8:36 AM, Joerg Roedel wrote: >> > + /* Restore user %edi and user %fs */ >> > + movl (%edi), %edi >> > + popl %fs >> >> Yikes! We're not *supposed* to be able to observe an asynchronous >> descriptor table change, but if the LDT changes out from under you, >> this is going to blow up badly. It would be really nice if you could >> pull this off without percpu access or without needing to do this >> dance where you load user FS, then kernel FS, then user FS. If that's >> not doable, then you should at least add exception handling -- look at >> the other 'pop %fs' instructions in entry_32.S. > > You are right! This also means I need to do the 'popl %fs' before the > cr3-switch. I'll fix it in the next version. > > I have no real idea on how to switch back to the entry stack without > access to per_cpu variables. I also can't access the cpu_entry_area for > the cpu yet, because for that we need to be on the entry stack already. Switch to the trampoline stack before loading user segments. -- Brian Gerst -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-io0-f200.google.com (mail-io0-f200.google.com [209.85.223.200]) by kanga.kvack.org (Postfix) with ESMTP id 56DA228029C for ; Wed, 17 Jan 2018 09:00:09 -0500 (EST) Received: by mail-io0-f200.google.com with SMTP id t134so18087248iof.6 for ; Wed, 17 Jan 2018 06:00:09 -0800 (PST) Received: from mail-sor-f41.google.com (mail-sor-f41.google.com. [209.85.220.41]) by mx.google.com with SMTPS id u62sor796712ioe.322.2018.01.17.06.00.08 for (Google Transport Security); Wed, 17 Jan 2018 06:00:08 -0800 (PST) MIME-Version: 1.0 In-Reply-To: References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1516120619-1159-4-git-send-email-joro@8bytes.org> <20180117092442.GJ28161@8bytes.org> From: Brian Gerst Date: Wed, 17 Jan 2018 06:00:07 -0800 Message-ID: Subject: Re: [PATCH 03/16] x86/entry/32: Leave the kernel via the trampoline stack Content-Type: text/plain; charset="UTF-8" Sender: owner-linux-mm@kvack.org List-ID: To: Joerg Roedel Cc: Andy Lutomirski , Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" , X86 ML , LKML , Linux-MM , Linus Torvalds , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , "Liguori, Anthony" , Daniel Gruss , Hugh Dickins , Kees Cook , Andrea Arcangeli , Waiman Long , Joerg Roedel On Wed, Jan 17, 2018 at 5:57 AM, Brian Gerst wrote: > On Wed, Jan 17, 2018 at 1:24 AM, Joerg Roedel wrote: >> On Tue, Jan 16, 2018 at 02:48:43PM -0800, Andy Lutomirski wrote: >>> On Tue, Jan 16, 2018 at 8:36 AM, Joerg Roedel wrote: >>> > + /* Restore user %edi and user %fs */ >>> > + movl (%edi), %edi >>> > + popl %fs >>> >>> Yikes! We're not *supposed* to be able to observe an asynchronous >>> descriptor table change, but if the LDT changes out from under you, >>> this is going to blow up badly. 
It would be really nice if you could >>> pull this off without percpu access or without needing to do this >>> dance where you load user FS, then kernel FS, then user FS. If that's >>> not doable, then you should at least add exception handling -- look at >>> the other 'pop %fs' instructions in entry_32.S. >> >> You are right! This also means I need to do the 'popl %fs' before the >> cr3-switch. I'll fix it in the next version. >> >> I have no real idea on how to switch back to the entry stack without >> access to per_cpu variables. I also can't access the cpu_entry_area for >> the cpu yet, because for that we need to be on the entry stack already. > > Switch to the trampoline stack before loading user segments. But then again, you could take a fault on the trampoline stack if you get a bad segment. Perhaps just pushing the new stack pointer onto the process stack before user segment loads will be the right move. -- Brian Gerst -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm0-f72.google.com (mail-wm0-f72.google.com [74.125.82.72]) by kanga.kvack.org (Postfix) with ESMTP id BC57228029C for ; Wed, 17 Jan 2018 09:08:12 -0500 (EST) Received: by mail-wm0-f72.google.com with SMTP id k126so4227797wmd.5 for ; Wed, 17 Jan 2018 06:08:12 -0800 (PST) Received: from SMTP.EU.CITRIX.COM (smtp.eu.citrix.com. 
[185.25.65.24]) by mx.google.com with ESMTPS id j50si593599ede.121.2018.01.17.06.08.11 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 17 Jan 2018 06:08:11 -0800 (PST) Subject: Re: [PATCH 02/16] x86/entry/32: Enter the kernel via trampoline stack References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1516120619-1159-3-git-send-email-joro@8bytes.org> <476d7100-2414-d09e-abf1-5aa4d369a3b7@oracle.com> <20180117090238.GH28161@8bytes.org> From: Andrew Cooper Message-ID: <97298add-9484-7d83-50a3-1c668ce3107d@citrix.com> Date: Wed, 17 Jan 2018 14:04:22 +0000 MIME-Version: 1.0 In-Reply-To: <20180117090238.GH28161@8bytes.org> Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit Content-Language: en-GB Sender: owner-linux-mm@kvack.org List-ID: To: Joerg Roedel , Boris Ostrovsky Cc: Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" , x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long , jroedel@suse.de On 17/01/18 09:02, Joerg Roedel wrote: > Hi Boris, > > thanks for testing this :) > > On Tue, Jan 16, 2018 at 09:47:06PM -0500, Boris Ostrovsky wrote: >> On 01/16/2018 11:36 AM, Joerg Roedel wrote: >>> +.macro SWITCH_TO_KERNEL_STACK nr_regs=0 check_user=0 >> >> This (and next patch's SWITCH_TO_ENTRY_STACK) need X86_FEATURE_PTI check. >> >> With those macros fixed I was able to boot 32-bit Xen PV guest. > Hmm, on bare metal the stack switch happens regardless of the > X86_FEATURE_PTI feature being set, because we always program tss.sp0 > with the sysenter stack. How is the kernel entry stack set up on xen-pv? > I think something is missing there instead.
There is one single stack registered with Xen, on which you get a normal exception frame in all cases, even via the registered (virtual) syscall/sysenter/failsafe handlers. ~Andrew -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm0-f72.google.com (mail-wm0-f72.google.com [74.125.82.72]) by kanga.kvack.org (Postfix) with ESMTP id E62CA28029C for ; Wed, 17 Jan 2018 09:10:07 -0500 (EST) Received: by mail-wm0-f72.google.com with SMTP id 17so8804592wma.1 for ; Wed, 17 Jan 2018 06:10:07 -0800 (PST) Received: from theia.8bytes.org (8bytes.org. [2a01:238:4383:600:38bc:a715:4b6d:a889]) by mx.google.com with ESMTPS id s14si1111318eds.524.2018.01.17.06.10.06 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 17 Jan 2018 06:10:06 -0800 (PST) Date: Wed, 17 Jan 2018 15:10:06 +0100 From: Joerg Roedel Subject: Re: [PATCH 03/16] x86/entry/32: Leave the kernel via the trampoline stack Message-ID: <20180117141006.GR28161@8bytes.org> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1516120619-1159-4-git-send-email-joro@8bytes.org> <20180117092442.GJ28161@8bytes.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: owner-linux-mm@kvack.org List-ID: To: Brian Gerst Cc: Andy Lutomirski , Thomas Gleixner , Ingo Molnar , "H . 
Peter Anvin" , X86 ML , LKML , Linux-MM , Linus Torvalds , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , "Liguori, Anthony" , Daniel Gruss , Hugh Dickins , Kees Cook , Andrea Arcangeli , Waiman Long , Joerg Roedel On Wed, Jan 17, 2018 at 05:57:53AM -0800, Brian Gerst wrote: > On Wed, Jan 17, 2018 at 1:24 AM, Joerg Roedel wrote: > > I have no real idea on how to switch back to the entry stack without > > access to per_cpu variables. I also can't access the cpu_entry_area for > > the cpu yet, because for that we need to be on the entry stack already. > > Switch to the trampoline stack before loading user segments. That requires copying most of pt_regs from task- to trampoline-stack, not sure if that is faster than temporarily restoring kernel %fs. Joerg -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ .
[2a01:238:4383:600:38bc:a715:4b6d:a889]) by mx.google.com with ESMTPS id b16si2876537ede.175.2018.01.17.06.14.19 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 17 Jan 2018 06:14:19 -0800 (PST) Date: Wed, 17 Jan 2018 15:14:18 +0100 From: Joerg Roedel Subject: Re: [PATCH 03/16] x86/entry/32: Leave the kernel via the trampoline stack Message-ID: <20180117141418.GS28161@8bytes.org> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1516120619-1159-4-git-send-email-joro@8bytes.org> <20180117092442.GJ28161@8bytes.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: owner-linux-mm@kvack.org List-ID: To: Brian Gerst Cc: Andy Lutomirski , Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" , X86 ML , LKML , Linux-MM , Linus Torvalds , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , "Liguori, Anthony" , Daniel Gruss , Hugh Dickins , Kees Cook , Andrea Arcangeli , Waiman Long , Joerg Roedel On Wed, Jan 17, 2018 at 06:00:07AM -0800, Brian Gerst wrote: > On Wed, Jan 17, 2018 at 5:57 AM, Brian Gerst wrote: > But then again, you could take a fault on the trampoline stack if you > get a bad segment. Perhaps just pushing the new stack pointer onto > the process stack before user segment loads will be the right move. User segment loads pop from the stack, so having anything on-top also doesn't work. Maybe I can leave some space at the bottom of the task-stack at entry time and store the pointer there on exit, if that doesn't confuse the stack unwinder too much. Joerg -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
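Joerg's idea of parking the entry-stack pointer in spare space at the high-address end of the task stack (its "bottom" in his wording), above where pt_regs ends, can be sketched in user-space C. Everything below is an assumption for illustration: the 8 KiB stack size, the 8 bytes of pre-existing padding, and all names are hypothetical, and the real kernel layout may differ. The point it shows is that the slot lies outside the region the segment registers are popped from, so reloading user segments out of pt_regs never disturbs it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical numbers, chosen only for illustration. */
#define STACK_BYTES  8192u		/* size of the task stack area */
#define OLD_PADDING  8u			/* padding assumed to exist already */
#define SLOT_BYTES   4u			/* one extra dword for the saved pointer */
#define NEW_PADDING  (OLD_PADDING + SLOT_BYTES)

/* pt_regs end (exclusive) where the enlarged padding starts. */
static size_t regs_end(void)
{
	return STACK_BYTES - NEW_PADDING;
}

/* The saved pointer lives in the last dword of the stack area. */
static size_t slot_offset(void)
{
	return STACK_BYTES - SLOT_BYTES;
}

/* Exit path: remember where to switch to before popping user segments. */
static void save_stack_ptr(unsigned char *stack, uint32_t sp)
{
	assert(slot_offset() >= regs_end());	/* slot must not overlap pt_regs */
	memcpy(stack + slot_offset(), &sp, sizeof(sp));
}

static uint32_t load_stack_ptr(const unsigned char *stack)
{
	uint32_t sp;

	memcpy(&sp, stack + slot_offset(), sizeof(sp));
	return sp;
}
```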
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ot0-f198.google.com (mail-ot0-f198.google.com [74.125.82.198]) by kanga.kvack.org (Postfix) with ESMTP id D74EA6B0033 for ; Wed, 17 Jan 2018 09:45:20 -0500 (EST) Received: by mail-ot0-f198.google.com with SMTP id 60so12498939otc.8 for ; Wed, 17 Jan 2018 06:45:20 -0800 (PST) Received: from mx1.redhat.com (mx1.redhat.com. [209.132.183.28]) by mx.google.com with ESMTPS id r10si351404oib.100.2018.01.17.06.45.19 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 17 Jan 2018 06:45:19 -0800 (PST) Date: Wed, 17 Jan 2018 08:45:03 -0600 From: Josh Poimboeuf Subject: Re: [PATCH 03/16] x86/entry/32: Leave the kernel via the trampoline stack Message-ID: <20180117144503.62e47m6e5yyyze3d@treble> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1516120619-1159-4-git-send-email-joro@8bytes.org> <20180117092442.GJ28161@8bytes.org> <20180117141418.GS28161@8bytes.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20180117141418.GS28161@8bytes.org> Sender: owner-linux-mm@kvack.org List-ID: To: Joerg Roedel Cc: Brian Gerst , Andy Lutomirski , Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" , X86 ML , LKML , Linux-MM , Linus Torvalds , Dave Hansen , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , "Liguori, Anthony" , Daniel Gruss , Hugh Dickins , Kees Cook , Andrea Arcangeli , Waiman Long , Joerg Roedel On Wed, Jan 17, 2018 at 03:14:18PM +0100, Joerg Roedel wrote: > On Wed, Jan 17, 2018 at 06:00:07AM -0800, Brian Gerst wrote: > > On Wed, Jan 17, 2018 at 5:57 AM, Brian Gerst wrote: > > But then again, you could take a fault on the trampoline stack if you > > get a bad segment. 
Perhaps just pushing the new stack pointer onto > > the process stack before user segment loads will be the right move. > > User segment loads pop from the stack, so having anything on-top also > doesn't work. > > Maybe I can leave some space at the bottom of the task-stack at entry > time and store the pointer there on exit, if that doesn't confuse the > stack unwinder too much. If you put it at the end of the stack page, I _think_ all you'd have to do is just adjust TOP_OF_KERNEL_STACK_PADDING. -- Josh -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-io0-f200.google.com (mail-io0-f200.google.com [209.85.223.200]) by kanga.kvack.org (Postfix) with ESMTP id 6B0186B0038 for ; Wed, 17 Jan 2018 10:23:29 -0500 (EST) Received: by mail-io0-f200.google.com with SMTP id e186so5219494iof.9 for ; Wed, 17 Jan 2018 07:23:29 -0800 (PST) Received: from aserp2120.oracle.com (aserp2120.oracle.com. [141.146.126.78]) by mx.google.com with ESMTPS id k66si5364021itd.82.2018.01.17.07.23.28 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 17 Jan 2018 07:23:28 -0800 (PST) Subject: Re: [PATCH 02/16] x86/entry/32: Enter the kernel via trampoline stack References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1516120619-1159-3-git-send-email-joro@8bytes.org> <476d7100-2414-d09e-abf1-5aa4d369a3b7@oracle.com> <20180117090238.GH28161@8bytes.org> <97298add-9484-7d83-50a3-1c668ce3107d@citrix.com> From: Boris Ostrovsky Message-ID: Date: Wed, 17 Jan 2018 10:22:24 -0500 MIME-Version: 1.0 In-Reply-To: <97298add-9484-7d83-50a3-1c668ce3107d@citrix.com> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit Content-Language: en-US Sender: owner-linux-mm@kvack.org List-ID: To: Andrew Cooper , Joerg Roedel Cc: Thomas Gleixner , Ingo Molnar , "H . 
Peter Anvin" , x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long , jroedel@suse.de On 01/17/2018 09:04 AM, Andrew Cooper wrote: > On 17/01/18 09:02, Joerg Roedel wrote: >> Hi Boris, >> >> thanks for testing this :) >> >> On Tue, Jan 16, 2018 at 09:47:06PM -0500, Boris Ostrovsky wrote: >>> On 01/16/2018 11:36 AM, Joerg Roedel wrote: >>>> +.macro SWITCH_TO_KERNEL_STACK nr_regs=0 check_user=0 >>> This (and next patch's SWITCH_TO_ENTRY_STACK) need X86_FEATURE_PTI check. >>> >>> With those macros fixed I was able to boot 32-bit Xen PV guest. >> Hmm, on bare metal the stack switch happens regardless of the >> X86_FEATURE_PTI feature being set, because we always program tss.sp0 >> with the sysenter stack. How is the kernel entry stack set up on xen-pv? >> I think something is missing there instead. > There is one single stack registered with Xen, on which you get a normal > exception frame in all cases, even via the registered (virtual) > syscall/sysenter/failsafe handlers. And so the check should be at least against X86_FEATURE_XENPV, not necessarily X86_FEATURE_PTI. But I guess you can still check against X86_FEATURE_PTI since without it there is not much reason to switch stacks? -boris -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ .
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pg0-f71.google.com (mail-pg0-f71.google.com [74.125.83.71]) by kanga.kvack.org (Postfix) with ESMTP id E64D56B0033 for ; Wed, 17 Jan 2018 13:10:46 -0500 (EST) Received: by mail-pg0-f71.google.com with SMTP id e28so11813511pgn.23 for ; Wed, 17 Jan 2018 10:10:46 -0800 (PST) Received: from mail.kernel.org (mail.kernel.org. [198.145.29.99]) by mx.google.com with ESMTPS id h6si4926278pln.585.2018.01.17.10.10.45 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 17 Jan 2018 10:10:45 -0800 (PST) Received: from mail-it0-f48.google.com (mail-it0-f48.google.com [209.85.214.48]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id DC91C21797 for ; Wed, 17 Jan 2018 18:10:44 +0000 (UTC) Received: by mail-it0-f48.google.com with SMTP id p124so10321075ite.1 for ; Wed, 17 Jan 2018 10:10:44 -0800 (PST) MIME-Version: 1.0 In-Reply-To: <20180117091853.GI28161@8bytes.org> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1516120619-1159-3-git-send-email-joro@8bytes.org> <20180117091853.GI28161@8bytes.org> From: Andy Lutomirski Date: Wed, 17 Jan 2018 10:10:23 -0800 Message-ID: Subject: Re: [PATCH 02/16] x86/entry/32: Enter the kernel via trampoline stack Content-Type: text/plain; charset="UTF-8" Sender: owner-linux-mm@kvack.org List-ID: To: Joerg Roedel Cc: Andy Lutomirski , Thomas Gleixner , Ingo Molnar , "H . 
Peter Anvin" , X86 ML , LKML , linux-mm@kvack.org, Linus Torvalds , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , "Liguori, Anthony" , Daniel Gruss , Hugh Dickins , Kees Cook , Andrea Arcangeli , Waiman Long , Joerg Roedel On Wed, Jan 17, 2018 at 1:18 AM, Joerg Roedel wrote: > On Tue, Jan 16, 2018 at 02:45:27PM -0800, Andy Lutomirski wrote: >> On Tue, Jan 16, 2018 at 8:36 AM, Joerg Roedel wrote: >> > +.macro SWITCH_TO_KERNEL_STACK nr_regs=0 check_user=0 >> >> How about marking nr_regs with :req to force everyone to be explicit? > > Yeah, that's more readable, I'll change it. > >> > + /* >> > + * TSS_sysenter_stack is the offset from the bottom of the >> > + * entry-stack >> > + */ >> > + movl TSS_sysenter_stack + ((\nr_regs + 1) * 4)(%esp), %esp >> >> This is incomprehensible. You're adding what appears to be the offset >> of sysenter_stack within the TSS to something based on esp and >> dereferencing that to get the new esp. That's not actually what >> you're doing, but please change asm_offsets.c (as in my previous >> email) to avoid putting serious arithmetic in it and then do the >> arithmetic right here so that it's possible to follow what's going on. > > Probably this needs better comments. So TSS_sysenter_stack is the offset > to tss.sp0 (tss.sp1 later) from the _bottom_ of the stack. But in > this macro the stack might not be empty, it has a configurable (by > \nr_regs) number of dwords on it. Before this instruction we also do a > push %edi, so we need (\nr_regs + 1). > > This can't be put into asm_offset.c, as the actual offset depends on how > much is on the stack. > >> > ENTRY(entry_INT80_32) >> > ASM_CLAC >> > pushl %eax /* pt_regs->orig_ax */ >> > + >> > + /* Stack layout: ss, esp, eflags, cs, eip, orig_eax */ >> > + SWITCH_TO_KERNEL_STACK nr_regs=6 check_user=1 >> > + >> >> Why check_user?
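The offset arithmetic Joerg explains for `movl TSS_sysenter_stack + ((\nr_regs + 1) * 4)(%esp), %esp` can be checked with a small user-space model. This is a sketch under stated assumptions: the structures, the 64-dword entry stack, and the placement of a TSS-like structure directly after the empty end of that stack are hypothetical stand-ins rather than the kernel's real cpu_entry_area layout, and it models the tss.sp1 variant. Adding back the (nr_regs + 1) * 4 bytes that were pushed returns to the empty-stack point, so only a fixed displacement to tss.sp1 remains.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: an entry stack immediately followed by a
 * TSS-like structure. Names are illustrative, not kernel types. */
struct fake_tss {
	uint32_t link;
	uint32_t sp0;
	uint32_t ss0;
	uint32_t sp1;
};

struct fake_entry_area {
	uint32_t entry_stack[64];	/* grows down towards entry_stack[0] */
	struct fake_tss tss;
};

/* Byte offset of the empty entry stack, i.e. its "bottom" in Joerg's wording. */
#define STACK_EMPTY  offsetof(struct fake_entry_area, tss)
/* Stand-in for TSS_sysenter_stack: distance from the empty stack to tss.sp1. */
#define TSS_SP1_OFF  offsetof(struct fake_tss, sp1)

/* %esp (as a byte offset) after the CPU pushed nr_regs dwords and the
 * macro pushed %edi. */
static size_t esp_after_pushes(unsigned int nr_regs)
{
	assert((size_t)(nr_regs + 1) * 4 <= STACK_EMPTY);
	return STACK_EMPTY - (size_t)(nr_regs + 1) * 4;
}

/* Emulates: movl TSS_SP1_OFF + ((nr_regs + 1) * 4)(%esp), %esp */
static uint32_t load_task_stack(const struct fake_entry_area *area,
				size_t esp, unsigned int nr_regs)
{
	size_t addr = TSS_SP1_OFF + (size_t)(nr_regs + 1) * 4 + esp;
	uint32_t val;

	memcpy(&val, (const unsigned char *)area + addr, sizeof(val));
	return val;
}
```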
> > You are right, check_user shouldn't be needed as INT80 is never called > from kernel mode. > >> > ENTRY(nmi) >> > ASM_CLAC >> > + >> > + /* Stack layout: ss, esp, eflags, cs, eip */ >> > + SWITCH_TO_KERNEL_STACK nr_regs=5 check_user=1 >> >> This is wrong, I think. If you get an nmi in kernel mode but while >> still on the sysenter stack, you blow up. IIRC we have some crazy >> code already to handle this (for nmi and #DB), and maybe that's >> already adequate or can be made adequate, but at the very least this >> needs a big comment explaining why it's okay. > > If we get an nmi while still on the sysenter stack, then we are not > entering the handler from user-space and the above code will do > nothing and behave as before. > > But you are right, it might blow up. There is a problem with the cr3 > switch, because the nmi can happen in kernel mode before the cr3 is > switched, then this handler will not do the cr3 switch itself and crash > the kernel. But the stack switching should be fine, I think. > >> > + /* >> > + * TODO: Find a way to let cpu_current_top_of_stack point to >> > + * cpu_tss_rw.x86_tss.sp1. Doing so now results in stack corruption with >> > + * iret exceptions. >> > + */ >> > + this_cpu_write(cpu_tss_rw.x86_tss.sp1, next_p->thread.sp0); >> >> Do you know what the issue is? > > No, not yet, I will look into that again. But first I want to get > this series stable enough as it is. > >> As a general comment, the interaction between this patch and vm86 is a >> bit scary. In vm86 mode, the kernel gets entered with extra stuff on >> the stack, which may screw up all your offsets. > > Just read up on vm86 mode control transfers and the stack layout then. > Looks like I need to check for eflags.vm=1 and copy four more registers > from/to the entry stack. Thanks for pointing that out. You could just copy those slots unconditionally.
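The vm86 case Joerg and Andy discuss can be modeled the same way. A minimal sketch, assuming the frame is read from the lowest address upwards (eip, cs, eflags, esp, ss, followed by es, ds, fs, gs when the CPU was in virtual-8086 mode) and that the top nine dwords of the task stack are reserved for it. It follows the EFLAGS.VM check variant rather than the unconditional copy, and all names are hypothetical. Both frame sizes start at the same slot, so the five common words land at the same addresses either way.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define X86_EFLAGS_VM_BIT 17	/* EFLAGS.VM is bit 17 */
#define HW_FRAME_WORDS     5	/* eip, cs, eflags, esp, ss */
#define VM86_EXTRA_WORDS   4	/* es, ds, fs, gs: pushed only from vm86 */

/* Frame layout in memory, lowest address first (index 0 = eip). */
enum { F_EIP, F_CS, F_EFLAGS, F_ESP, F_SS, F_ES, F_DS, F_FS, F_GS };

/*
 * Copy the hardware frame from the entry stack to the top of a task
 * stack (task_top is one past the last usable word). Returns the index
 * that the new stack pointer would point at. Sketch only.
 */
static size_t copy_entry_frame(uint32_t *task_stack, size_t task_top,
			       const uint32_t *entry_frame)
{
	int vm86 = (entry_frame[F_EFLAGS] >> X86_EFLAGS_VM_BIT) & 1;
	size_t nwords = HW_FRAME_WORDS + (vm86 ? VM86_EXTRA_WORDS : 0);
	size_t start;

	assert(task_top >= HW_FRAME_WORDS + VM86_EXTRA_WORDS);
	/* Same start slot in both cases, so the 5 common words line up;
	 * a non-vm86 frame simply leaves the top four words unused. */
	start = task_top - (HW_FRAME_WORDS + VM86_EXTRA_WORDS);
	memcpy(&task_stack[start], entry_frame, nwords * sizeof(uint32_t));
	return start;
}
```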
After all, you're slowing down entries by an epic amount due to writing CR3 with PCID off, so four words copied should be entirely lost in the noise. OTOH, checking for VM86 mode is just a single bt against EFLAGS. With the modern (rewritten a year or two ago by Brian Gerst) vm86 code, all the slots (those actually in pt_regs) are in the same location regardless of whether we're in VM86 mode or not, but we're still fiddling with the bottom of the stack. Since you're controlling the switch to the kernel thread stack, you can easily just write the frame to the correct location, so you should not need to context switch sp1 -- you can do it sanely and leave sp1 as the actual bottom of the kernel stack no matter what. In fact, you could probably avoid context switching sp0 as well, which would be a nice cleanup. So I recommend the following. Keep sp0 as the bottom of the sysenter stack no matter what. Then do: bt $X86_EFLAGS_VM_BIT jc .Lfrom_vm_\@ push 5 regs to real stack, starting at four-word offset (so they're in the right place) update %esp ... .Lupdate_esp_\@: .Lfrom_vm_\@: push 9 regs to real stack, starting at the bottom jmp .Lupdate_esp_\@ Does that seem reasonable? It's arguably much nicer than what we have now. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pf0-f198.google.com (mail-pf0-f198.google.com [209.85.192.198]) by kanga.kvack.org (Postfix) with ESMTP id EF6226B0033 for ; Wed, 17 Jan 2018 13:12:54 -0500 (EST) Received: by mail-pf0-f198.google.com with SMTP id s22so3158417pfh.21 for ; Wed, 17 Jan 2018 10:12:54 -0800 (PST) Received: from mail.kernel.org (mail.kernel.org.
[198.145.29.99]) by mx.google.com with ESMTPS id p2si4830537plo.798.2018.01.17.10.12.53 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 17 Jan 2018 10:12:53 -0800 (PST) Received: from mail-io0-f176.google.com (mail-io0-f176.google.com [209.85.223.176]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 7376D21742 for ; Wed, 17 Jan 2018 18:12:53 +0000 (UTC) Received: by mail-io0-f176.google.com with SMTP id f34so16592883ioi.13 for ; Wed, 17 Jan 2018 10:12:53 -0800 (PST) MIME-Version: 1.0 In-Reply-To: <20180117141006.GR28161@8bytes.org> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1516120619-1159-4-git-send-email-joro@8bytes.org> <20180117092442.GJ28161@8bytes.org> <20180117141006.GR28161@8bytes.org> From: Andy Lutomirski Date: Wed, 17 Jan 2018 10:12:32 -0800 Message-ID: Subject: Re: [PATCH 03/16] x86/entry/32: Leave the kernel via the trampoline stack Content-Type: text/plain; charset="UTF-8" Sender: owner-linux-mm@kvack.org List-ID: To: Joerg Roedel Cc: Brian Gerst , Andy Lutomirski , Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" , X86 ML , LKML , Linux-MM , Linus Torvalds , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , "Liguori, Anthony" , Daniel Gruss , Hugh Dickins , Kees Cook , Andrea Arcangeli , Waiman Long , Joerg Roedel On Wed, Jan 17, 2018 at 6:10 AM, Joerg Roedel wrote: > On Wed, Jan 17, 2018 at 05:57:53AM -0800, Brian Gerst wrote: >> On Wed, Jan 17, 2018 at 1:24 AM, Joerg Roedel wrote: > >> > I have no real idea on how to switch back to the entry stack without >> > access to per_cpu variables. I also can't access the cpu_entry_area for >> > the cpu yet, because for that we need to be on the entry stack already. 
>> >> Switch to the trampoline stack before loading user segments. > > That requires copying most of pt_regs from task- to trampoline-stack, > not sure if that is faster than temporarily restoring kernel %fs. > I would optimize for simplicity, not speed. You're already planning to write to CR3, which is serializing, blows away the TLB, *and* takes the absurdly large amount of time that the microcode needs to blow away the TLB. (For whatever reason, Intel doesn't seem to have hardware that can quickly wipe the TLB. I suspect that the actual implementation does it in a loop and wipes little pieces at a time. Whatever it actually does, the CR3 write itself is very slow.) -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pg0-f69.google.com (mail-pg0-f69.google.com [74.125.83.69]) by kanga.kvack.org (Postfix) with ESMTP id C40186B0261 for ; Wed, 17 Jan 2018 18:41:30 -0500 (EST) Received: by mail-pg0-f69.google.com with SMTP id q1so12618305pgv.4 for ; Wed, 17 Jan 2018 15:41:30 -0800 (PST) Received: from mail.kernel.org (mail.kernel.org.
[198.145.29.99]) by mx.google.com with ESMTPS id 31si5328794plj.417.2018.01.17.15.41.29 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 17 Jan 2018 15:41:29 -0800 (PST) Received: from mail-it0-f54.google.com (mail-it0-f54.google.com [209.85.214.54]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id EAB7F2177D for ; Wed, 17 Jan 2018 23:41:28 +0000 (UTC) Received: by mail-it0-f54.google.com with SMTP id b5so11323678itc.3 for ; Wed, 17 Jan 2018 15:41:28 -0800 (PST) MIME-Version: 1.0 In-Reply-To: <1516120619-1159-15-git-send-email-joro@8bytes.org> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1516120619-1159-15-git-send-email-joro@8bytes.org> From: Andy Lutomirski Date: Wed, 17 Jan 2018 15:41:07 -0800 Message-ID: Subject: Re: [PATCH 14/16] x86/mm/legacy: Populate the user page-table with user pgd's Content-Type: text/plain; charset="UTF-8" Sender: owner-linux-mm@kvack.org List-ID: To: Joerg Roedel Cc: Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" , X86 ML , LKML , Linux-MM , Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , "Liguori, Anthony" , Daniel Gruss , Hugh Dickins , Kees Cook , Andrea Arcangeli , Waiman Long , Joerg Roedel On Tue, Jan 16, 2018 at 8:36 AM, Joerg Roedel wrote: > From: Joerg Roedel > > Also populate the user-space pgd's in the user page-table.
> > Signed-off-by: Joerg Roedel > --- > arch/x86/include/asm/pgtable-2level.h | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/arch/x86/include/asm/pgtable-2level.h b/arch/x86/include/asm/pgtable-2level.h > index 685ffe8a0eaf..d96486d23c58 100644 > --- a/arch/x86/include/asm/pgtable-2level.h > +++ b/arch/x86/include/asm/pgtable-2level.h > @@ -19,6 +19,9 @@ static inline void native_set_pte(pte_t *ptep , pte_t pte) > > static inline void native_set_pmd(pmd_t *pmdp, pmd_t pmd) > { > +#ifdef CONFIG_PAGE_TABLE_ISOLATION > + pmd.pud.p4d.pgd = pti_set_user_pgd(&pmdp->pud.p4d.pgd, pmd.pud.p4d.pgd); > +#endif > *pmdp = pmd; > } > Nothing against your patch, but this seems like a perfectly fine place to rant: I *hate* the way we deal with page table folding. Grr. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pf0-f198.google.com (mail-pf0-f198.google.com [209.85.192.198]) by kanga.kvack.org (Postfix) with ESMTP id 9B4AA6B0268 for ; Wed, 17 Jan 2018 18:43:36 -0500 (EST) Received: by mail-pf0-f198.google.com with SMTP id e26so15523482pfi.15 for ; Wed, 17 Jan 2018 15:43:36 -0800 (PST) Received: from mail.kernel.org (mail.kernel.org. 
[198.145.29.99]) by mx.google.com with ESMTPS id d9si5381198plj.186.2018.01.17.15.43.35 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 17 Jan 2018 15:43:35 -0800 (PST) Received: from mail-it0-f43.google.com (mail-it0-f43.google.com [209.85.214.43]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 2B1222179F for ; Wed, 17 Jan 2018 23:43:35 +0000 (UTC) Received: by mail-it0-f43.google.com with SMTP id c16so11317940itc.5 for ; Wed, 17 Jan 2018 15:43:35 -0800 (PST) MIME-Version: 1.0 In-Reply-To: <1516120619-1159-9-git-send-email-joro@8bytes.org> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1516120619-1159-9-git-send-email-joro@8bytes.org> From: Andy Lutomirski Date: Wed, 17 Jan 2018 15:43:14 -0800 Message-ID: Subject: Re: [PATCH 08/16] x86/pgtable/32: Allocate 8k page-tables when PTI is enabled Content-Type: text/plain; charset="UTF-8" Sender: owner-linux-mm@kvack.org List-ID: To: Joerg Roedel Cc: Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" , X86 ML , LKML , Linux-MM , Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , "Liguori, Anthony" , Daniel Gruss , Hugh Dickins , Kees Cook , Andrea Arcangeli , Waiman Long , Joerg Roedel On Tue, Jan 16, 2018 at 8:36 AM, Joerg Roedel wrote: > From: Joerg Roedel > > Allocate a kernel and a user page-table root when PTI is > enabled. Also allocate a full page per root for PAE because > otherwise the bit to flip in cr3 to switch between them > would be non-constant, which creates a lot of hassle. > Keep that for a later optimization.
> > Signed-off-by: Joerg Roedel > --- > arch/x86/kernel/head_32.S | 23 ++++++++++++++++++----- > arch/x86/mm/pgtable.c | 11 ++++++----- > 2 files changed, 24 insertions(+), 10 deletions(-) > > diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S > index c29020907886..fc550559bf58 100644 > --- a/arch/x86/kernel/head_32.S > +++ b/arch/x86/kernel/head_32.S > @@ -512,28 +512,41 @@ ENTRY(initial_code) > ENTRY(setup_once_ref) > .long setup_once > > +#ifdef CONFIG_PAGE_TABLE_ISOLATION > +#define PGD_ALIGN (2 * PAGE_SIZE) > +#define PTI_USER_PGD_FILL 1024 > +#else > +#define PGD_ALIGN (PAGE_SIZE) > +#define PTI_USER_PGD_FILL 0 > +#endif > /* > * BSS section > */ > __PAGE_ALIGNED_BSS > - .align PAGE_SIZE > + .align PGD_ALIGN > #ifdef CONFIG_X86_PAE > .globl initial_pg_pmd > initial_pg_pmd: > .fill 1024*KPMDS,4,0 > + .fill PTI_USER_PGD_FILL,4,0 Couldn't this be simplified to just .align PGD_ALIGN, 0 without the .fill? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm0-f72.google.com (mail-wm0-f72.google.com [74.125.82.72]) by kanga.kvack.org (Postfix) with ESMTP id CECB56B025E for ; Fri, 19 Jan 2018 04:55:29 -0500 (EST) Received: by mail-wm0-f72.google.com with SMTP id 194so804918wmv.9 for ; Fri, 19 Jan 2018 01:55:29 -0800 (PST) Received: from theia.8bytes.org (8bytes.org. 
[2a01:238:4383:600:38bc:a715:4b6d:a889]) by mx.google.com with ESMTPS id p4si1782281edm.328.2018.01.19.01.55.25 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 19 Jan 2018 01:55:25 -0800 (PST) Date: Fri, 19 Jan 2018 10:55:23 +0100 From: Joerg Roedel Subject: Re: [PATCH 02/16] x86/entry/32: Enter the kernel via trampoline stack Message-ID: <20180119095523.GY28161@8bytes.org> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1516120619-1159-3-git-send-email-joro@8bytes.org> <20180117091853.GI28161@8bytes.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: owner-linux-mm@kvack.org List-ID: To: Andy Lutomirski Cc: Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" , X86 ML , LKML , linux-mm@kvack.org, Linus Torvalds , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , "Liguori, Anthony" , Daniel Gruss , Hugh Dickins , Kees Cook , Andrea Arcangeli , Waiman Long , Joerg Roedel Hey Andy, On Wed, Jan 17, 2018 at 10:10:23AM -0800, Andy Lutomirski wrote: > On Wed, Jan 17, 2018 at 1:18 AM, Joerg Roedel wrote: > > Just read up on vm86 mode control transfers and the stack layout then. > > Looks like I need to check for eflags.vm=1 and copy four more registers > > from/to the entry stack. Thanks for pointing that out. > > You could just copy those slots unconditionally. After all, you're > slowing down entries by an epic amount due to writing CR3 on with PCID > off, so four words copied should be entirely lost in the noise. OTOH, > checking for VM86 mode is just a single bt against EFLAGS. 
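The frame copy discussed here can be modelled in plain C. This is a minimal sketch, not kernel code: copy_hw_frame() and the HW_FRAME_* constants are hypothetical names, and the model ignores error codes and the software-saved registers in pt_regs. It only illustrates why always reserving room for nine words keeps EIP..SS at fixed addresses, whether or not the CPU also pushed the vm86 segment registers (ES, DS, FS, GS).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define X86_EFLAGS_VM_BIT   17 /* EFLAGS.VM: set while running in vm86 mode */
#define HW_FRAME_WORDS       5 /* EIP, CS, EFLAGS, ESP, SS */
#define HW_FRAME_VM86_WORDS  9 /* the above plus ES, DS, FS, GS */

/*
 * Hypothetical model, not kernel code.
 *
 * entry_sp points at the lowest word of the hardware frame on the entry
 * stack: [0]=EIP [1]=CS [2]=EFLAGS [3]=ESP [4]=SS, followed by
 * [5]=ES [6]=DS [7]=FS [8]=GS only when the CPU came from vm86 mode.
 *
 * The copy always starts nine words below the top of the task stack, so
 * EIP..SS land at the same addresses in both cases; in the non-vm86 case
 * the four segment slots above SS are simply left untouched.
 * Returns the new stack pointer.
 */
static uint32_t *copy_hw_frame(const uint32_t *entry_sp,
                               uint32_t *task_stack_top)
{
    uint32_t *dst = task_stack_top - HW_FRAME_VM86_WORDS;
    /* C counterpart of "bt $X86_EFLAGS_VM_BIT" on the saved EFLAGS */
    int from_vm86 = (entry_sp[2] >> X86_EFLAGS_VM_BIT) & 1;
    size_t words = from_vm86 ? HW_FRAME_VM86_WORDS : HW_FRAME_WORDS;

    /* Stack slots are 32-bit words. */
    assert(((uintptr_t)task_stack_top & 3) == 0);
    memcpy(dst, entry_sp, words * sizeof(*dst));
    return dst;
}
```

Testing EFLAGS bit 17 plays the role of the single bt instruction mentioned above; the only difference between the two paths is how many words get copied.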
> > With the modern (rewritten a year or two ago by Brian Gerst) vm86 > code, all the slots (those actually in pt_regs) are in the same > location regardless of whether we're in VM86 mode or not, but we're > still fiddling with the bottom of the stack. Since you're controlling > the switch to the kernel thread stack, you can easily just write the > frame to the correct location, so you should not need to context > switch sp1 -- you can do it sanely and leave sp1 as the actual bottom > of the kernel stack no matter what. In fact, you could probably avoid > context switching sp0, either, which would be a nice cleanup. I am not sure what you mean by "not context switching sp0/sp1" ... > So I recommend the following. Keep sp0 as the bottom of the sysenter > stack no matter what. Then do: > > bt $X86_EFLAGS_VM_BIT > jc .Lfrom_vm_\@ > > push 5 regs to real stack, starting at four-word offset (so they're in > the right place) > update %esp > ... > .Lupdate_esp_\@ > > .Lfrom_vm_\@: > push 9 regs to real stack, starting at the bottom > jmp .Lupdate_esp_\@ > > Does that seem reasonable? It's arguably much nicer than what we have > now. But that looks like a good idea. Having a consistent stack with and without vm86 is certainly a nice cleanup. Regards, Joerg -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm0-f69.google.com (mail-wm0-f69.google.com [74.125.82.69]) by kanga.kvack.org (Postfix) with ESMTP id 7AE886B0268 for ; Fri, 19 Jan 2018 04:57:07 -0500 (EST) Received: by mail-wm0-f69.google.com with SMTP id p190so828581wmd.0 for ; Fri, 19 Jan 2018 01:57:07 -0800 (PST) Received: from theia.8bytes.org (8bytes.org. 
[2a01:238:4383:600:38bc:a715:4b6d:a889]) by mx.google.com with ESMTPS id 92si1329026edn.468.2018.01.19.01.57.06 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 19 Jan 2018 01:57:06 -0800 (PST) Date: Fri, 19 Jan 2018 10:57:05 +0100 From: Joerg Roedel Subject: Re: [PATCH 03/16] x86/entry/32: Leave the kernel via the trampoline stack Message-ID: <20180119095705.GZ28161@8bytes.org> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1516120619-1159-4-git-send-email-joro@8bytes.org> <20180117092442.GJ28161@8bytes.org> <20180117141006.GR28161@8bytes.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: owner-linux-mm@kvack.org List-ID: To: Andy Lutomirski Cc: Brian Gerst , Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" , X86 ML , LKML , Linux-MM , Linus Torvalds , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , "Liguori, Anthony" , Daniel Gruss , Hugh Dickins , Kees Cook , Andrea Arcangeli , Waiman Long , Joerg Roedel On Wed, Jan 17, 2018 at 10:12:32AM -0800, Andy Lutomirski wrote: > I would optimize for simplicity, not speed. You're already planning > to write to CR3, which is serializing, blows away the TLB, *and* takes > the absurdly large amount of time that the microcode needs to blow > away the TLB. Okay, so I am going to do the stack-switch before pt_regs is restored. This is at least better than playing games with hiding the entry/exit %esp somewhere in stack-memory. Thanks, Joerg -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wr0-f198.google.com (mail-wr0-f198.google.com [209.85.128.198]) by kanga.kvack.org (Postfix) with ESMTP id 274A66B026B for ; Fri, 19 Jan 2018 04:57:53 -0500 (EST) Received: by mail-wr0-f198.google.com with SMTP id 33so926637wrs.3 for ; Fri, 19 Jan 2018 01:57:53 -0800 (PST) Received: from theia.8bytes.org (8bytes.org. [2a01:238:4383:600:38bc:a715:4b6d:a889]) by mx.google.com with ESMTPS id o91si538498eda.277.2018.01.19.01.57.51 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 19 Jan 2018 01:57:52 -0800 (PST) Date: Fri, 19 Jan 2018 10:57:51 +0100 From: Joerg Roedel Subject: Re: [PATCH 08/16] x86/pgtable/32: Allocate 8k page-tables when PTI is enabled Message-ID: <20180119095751.GA28161@8bytes.org> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1516120619-1159-9-git-send-email-joro@8bytes.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: owner-linux-mm@kvack.org List-ID: To: Andy Lutomirski Cc: Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" , X86 ML , LKML , Linux-MM , Linus Torvalds , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , "Liguori, Anthony" , Daniel Gruss , Hugh Dickins , Kees Cook , Andrea Arcangeli , Waiman Long , Joerg Roedel On Wed, Jan 17, 2018 at 03:43:14PM -0800, Andy Lutomirski wrote: > On Tue, Jan 16, 2018 at 8:36 AM, Joerg Roedel wrote: > > #ifdef CONFIG_X86_PAE > > .globl initial_pg_pmd > > initial_pg_pmd: > > .fill 1024*KPMDS,4,0 > > + .fill PTI_USER_PGD_FILL,4,0 > > Couldn't this be simplified to just .align PGD_ALIGN, 0 without the .fill? You are right, will change that. Thanks, Joerg -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. 
For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wr0-f197.google.com (mail-wr0-f197.google.com [209.85.128.197]) by kanga.kvack.org (Postfix) with ESMTP id 67A696B0038 for ; Fri, 19 Jan 2018 05:55:31 -0500 (EST) Received: by mail-wr0-f197.google.com with SMTP id c11so967067wrb.23 for ; Fri, 19 Jan 2018 02:55:31 -0800 (PST) Received: from atrey.karlin.mff.cuni.cz (atrey.karlin.mff.cuni.cz. [195.113.26.193]) by mx.google.com with ESMTPS id m25si8128656wrb.162.2018.01.19.02.55.30 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 19 Jan 2018 02:55:30 -0800 (PST) Date: Fri, 19 Jan 2018 11:55:28 +0100 From: Pavel Machek Subject: Re: [RFC PATCH 00/16] PTI support for x86-32 Message-ID: <20180119105527.GB29725@amd> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="ZfOjI3PrQbgiZnxM" Content-Disposition: inline In-Reply-To: <1516120619-1159-1-git-send-email-joro@8bytes.org> Sender: owner-linux-mm@kvack.org List-ID: To: Joerg Roedel Cc: Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" , x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long , jroedel@suse.de --ZfOjI3PrQbgiZnxM Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable Hi! > From: Joerg Roedel > > Hi, > > here is my current WIP code to enable PTI on x86-32.
It is > still in a pretty early state, but it successfully boots my > KVM guest with PAE and with legacy paging. The existing PTI > code for x86-64 already prepares a lot of the stuff needed > for 32 bit too, thanks for that to all the people involved > in its development :) Thanks for doing the work. I tried applying it on top of -next, and that did not succeed. Let me try Linus tree... > The code has not run on bare-metal yet, I'll test that in > the next days once I setup a 32 bit box again. I also havn't > tested Wine and DosEMU yet, so this might also be broken. Um. Ok, testing is something I can do. At least I have excuse to power on T40p. Ok... Testing is something I can do... If I can get it to compile. CC arch/x86/mm/dump_pagetables.o arch/x86/mm/dump_pagetables.c: In function =E2=80=98ptdump_walk_user_pgd_level_checkwx=E2=80=99: arch/x86/mm/dump_pagetables.c:546:26: error: =E2=80=98init_top_pgt=E2=80= =99 undeclared (first use in this function) pgd_t *pgd =3D (pgd_t *) &init_top_pgt; ^ arch/x86/mm/dump_pagetables.c:546:26: note: each undeclared identifier is reported only once for each function it appears in scripts/Makefile.build:316: recipe for target 'arch/x86/mm/dump_pagetables.o' failed make[2]: *** [arch/x86/mm/dump_pagetables.o] Error 1 scripts/Makefile.build:575: recipe for target 'arch/x86/mm' failed make[1]: *** [arch/x86/mm] Error 2 make[1]: *** Waiting for unfinished jobs.... CC arch/x86/platform/intel/iosf_mbi.o =20 Ok, I guess I can disable some config option... 
Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html --ZfOjI3PrQbgiZnxM Content-Type: application/pgp-signature; name="signature.asc" Content-Description: Digital signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iEYEARECAAYFAlphzp8ACgkQMOfwapXb+vJJTwCgqKLRKD1mKRaeVYX66fFsYamu 7yIAoI0EoZckBNrg01y4Ogj10vnf+FdS =vixT -----END PGP SIGNATURE----- --ZfOjI3PrQbgiZnxM-- -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pg0-f69.google.com (mail-pg0-f69.google.com [74.125.83.69]) by kanga.kvack.org (Postfix) with ESMTP id B9EA76B0038 for ; Fri, 19 Jan 2018 06:07:38 -0500 (EST) Received: by mail-pg0-f69.google.com with SMTP id v17so221738pgb.18 for ; Fri, 19 Jan 2018 03:07:38 -0800 (PST) Received: from mx2.suse.de (mx2.suse.de. [195.135.220.15]) by mx.google.com with ESMTPS id g66si8158639pgc.264.2018.01.19.03.07.37 for (version=TLS1 cipher=AES128-SHA bits=128/128); Fri, 19 Jan 2018 03:07:37 -0800 (PST) Date: Fri, 19 Jan 2018 12:07:26 +0100 From: Joerg Roedel Subject: Re: [RFC PATCH 00/16] PTI support for x86-32 Message-ID: <20180119110726.odea3h3smcjyicnk@suse.de> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <20180119105527.GB29725@amd> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20180119105527.GB29725@amd> Sender: owner-linux-mm@kvack.org List-ID: To: Pavel Machek Cc: Joerg Roedel , Thomas Gleixner , Ingo Molnar , "H .
Peter Anvin" , x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long Hey Pavel, On Fri, Jan 19, 2018 at 11:55:28AM +0100, Pavel Machek wrote: > Thanks for doing the work. > > I tried applying it on top of -next, and that did not succeed. Let me > try Linus tree... Thanks for your help with testing this patch-set, but I recommend waiting for the next version, as review already found a couple of bugs that might crash your system. For example, there are NMI cases that might crash your machine because the NMI happens in kernel mode before the cr3 switch. VM86 mode is also definitely broken. I am about to fix that and will send a new version, if all goes well, at some point next week. Thanks, Joerg -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm0-f70.google.com (mail-wm0-f70.google.com [74.125.82.70]) by kanga.kvack.org (Postfix) with ESMTP id 60FDC6B0038 for ; Fri, 19 Jan 2018 07:58:26 -0500 (EST) Received: by mail-wm0-f70.google.com with SMTP id b186so3268322wmf.0 for ; Fri, 19 Jan 2018 04:58:26 -0800 (PST) Received: from atrey.karlin.mff.cuni.cz (atrey.karlin.mff.cuni.cz.
[195.113.26.193]) by mx.google.com with ESMTPS id p11si7333804wre.553.2018.01.19.04.58.24 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 19 Jan 2018 04:58:25 -0800 (PST) Date: Fri, 19 Jan 2018 13:58:20 +0100 From: Pavel Machek Subject: Re: [RFC PATCH 00/16] PTI support for x86-32 Message-ID: <20180119125819.GA17936@amd> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <20180119105527.GB29725@amd> <20180119110726.odea3h3smcjyicnk@suse.de> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="NzB8fVQJ5HfG6fxh" Content-Disposition: inline In-Reply-To: <20180119110726.odea3h3smcjyicnk@suse.de> Sender: owner-linux-mm@kvack.org List-ID: To: Joerg Roedel Cc: Joerg Roedel , Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" , x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long --NzB8fVQJ5HfG6fxh Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Fri 2018-01-19 12:07:26, Joerg Roedel wrote: > Hey Pavel, > > On Fri, Jan 19, 2018 at 11:55:28AM +0100, Pavel Machek wrote: > > Thanks for doing the work. > > > > I tried applying it on top of -next, and that did not succeed. Let me > > try Linus tree... > > Thanks for your help with testing this patch-set, but I recommend waiting > for the next version, as review already found a couple of bugs that > might crash your system. For example, there are NMI cases that might > crash your machine because the NMI happens in kernel mode before the cr3 > switch. VM86 mode is also definitely broken.
Thanks for the heads-up. I guess I can disable NMI and avoid VM86. CONFIG_X86_PTDUMP_CORE should be responsible for the build failure. Disabling it is not at all easy, as CONFIG_EMBEDDED selects CONFIG_EXPERT selects CONFIG_DEBUG_KERNEL selects CONFIG_X86_PTDUMP_CORE. (Crazy, if you ask me). You may want to test with that enabled. Patch below might fix it. (Signed-off-by: me). Tests so far: kernel boots in qemu. Whole system boots on thinkpad T40p, vulnerabilities/meltdown says "Mitigation: PTI", so I guess it works. Tested-by: me. :-) Best regards, Pavel diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c index 2a4849e..896b53b 100644 --- a/arch/x86/mm/dump_pagetables.c +++ b/arch/x86/mm/dump_pagetables.c @@ -543,7 +543,11 @@ EXPORT_SYMBOL_GPL(ptdump_walk_pgd_level_debugfs); static void ptdump_walk_user_pgd_level_checkwx(void) { #ifdef CONFIG_PAGE_TABLE_ISOLATION +#ifdef CONFIG_X86_64 pgd_t *pgd = (pgd_t *) &init_top_pgt; +#else + pgd_t *pgd = swapper_pg_dir; +#endif if (!static_cpu_has(X86_FEATURE_PTI)) return; -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html --NzB8fVQJ5HfG6fxh Content-Type: application/pgp-signature; name="signature.asc" Content-Description: Digital signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iEYEARECAAYFAlph62sACgkQMOfwapXb+vJiIQCgqBDHc+te64tub1fd2ysUnYzO zUIAn0KcVe+znFkXmNnlqNlZM3gHxU1P =TNq4 -----END PGP SIGNATURE----- --NzB8fVQJ5HfG6fxh-- -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ .
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-vk0-f69.google.com (mail-vk0-f69.google.com [209.85.213.69]) by kanga.kvack.org (Postfix) with ESMTP id 36EEC6B0069 for ; Fri, 19 Jan 2018 10:27:10 -0500 (EST) Received: by mail-vk0-f69.google.com with SMTP id y127so1050462vkg.17 for ; Fri, 19 Jan 2018 07:27:10 -0800 (PST) Received: from theia.8bytes.org (8bytes.org. [81.169.241.247]) by mx.google.com with ESMTPS id w51si2906558edb.141.2018.01.16.11.41.13 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 16 Jan 2018 11:41:13 -0800 (PST) Date: Tue, 16 Jan 2018 20:41:12 +0100 From: Joerg Roedel Subject: Re: [PATCH 10/16] x86/mm/pti: Populate valid user pud entries Message-ID: <20180116194112.GD28161@8bytes.org> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1516120619-1159-11-git-send-email-joro@8bytes.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: owner-linux-mm@kvack.org List-ID: To: Dave Hansen Cc: Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" , x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long , jroedel@suse.de On Tue, Jan 16, 2018 at 10:06:48AM -0800, Dave Hansen wrote: > On 01/16/2018 08:36 AM, Joerg Roedel wrote: > > > > In PAE page-tables at the top-level most bits we usually set > > with _KERNPG_TABLE are reserved, resulting in a #GP when > > they are loaded by the processor. > > Can you save me the trip to the SDM and remind me which bits actually > cause trouble here? 
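For reference while reading the answer below: in PAE mode the CPU loads the four top-level entries (PDPTEs) when CR3 is written, and almost every flag bit in them is reserved. The helper below is a hypothetical sketch, not code from this series; pae_pdpte_sanitize() and its phys_bits parameter (the CPU's MAXPHYADDR) are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define PDPTE_PRESENT (UINT64_C(1) << 0)
#define PDPTE_PWT     (UINT64_C(1) << 3)
#define PDPTE_PCD     (UINT64_C(1) << 4)

/*
 * Hypothetical helper, not code from this series.
 *
 * Keeps only the bits a PAE PDPTE may carry: Present, PWT, PCD and the
 * page-directory address below MAXPHYADDR (phys_bits). Everything else,
 * e.g. RW (bit 1), US (bit 2), Accessed (bit 5), Dirty (bit 6) or NX
 * (bit 63), is reserved in a PDPTE and would make the CR3 load fault.
 * Bits 9-11 are ignored by the CPU; they are dropped here for simplicity.
 */
static uint64_t pae_pdpte_sanitize(uint64_t entry, unsigned int phys_bits)
{
    uint64_t addr_mask;

    assert(phys_bits > 12 && phys_bits <= 52);
    addr_mask = ((UINT64_C(1) << phys_bits) - 1) & ~UINT64_C(0xfff);
    return entry & (addr_mask | PDPTE_PRESENT | PDPTE_PWT | PDPTE_PCD);
}
```

For example, a page-table value carrying Present, RW, Accessed and Dirty (low byte 0x63) keeps only the Present bit, while PWT and PCD survive untouched.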
Everything besides PRESENT, PCD, PWT and the actual physical address is reserved, so setting RW or NX, for example, causes a #GP. Joerg -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ua0-f198.google.com (mail-ua0-f198.google.com [209.85.217.198]) by kanga.kvack.org (Postfix) with ESMTP id 8AE6C6B025E for ; Fri, 19 Jan 2018 10:27:12 -0500 (EST) Received: by mail-ua0-f198.google.com with SMTP id l14so1227087uaa.17 for ; Fri, 19 Jan 2018 07:27:12 -0800 (PST) Received: from theia.8bytes.org (8bytes.org. [2a01:238:4383:600:38bc:a715:4b6d:a889]) by mx.google.com with ESMTPS id w1si3020044edk.223.2018.01.16.11.11.06 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 16 Jan 2018 11:11:06 -0800 (PST) Date: Tue, 16 Jan 2018 20:11:05 +0100 From: Joerg Roedel Subject: Re: [PATCH 07/16] x86/mm: Move two more functions from pgtable_64.h to pgtable.h Message-ID: <20180116191105.GC28161@8bytes.org> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1516120619-1159-8-git-send-email-joro@8bytes.org> <727a7eba-41a0-d5bb-df54-8e58b33fde76@intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <727a7eba-41a0-d5bb-df54-8e58b33fde76@intel.com> Sender: owner-linux-mm@kvack.org List-ID: To: Dave Hansen Cc: Thomas Gleixner , Ingo Molnar , "H .
Peter Anvin" , x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long , jroedel@suse.de On Tue, Jan 16, 2018 at 10:03:09AM -0800, Dave Hansen wrote: > On 01/16/2018 08:36 AM, Joerg Roedel wrote: > > + return (((ptr & ~PAGE_MASK) / sizeof(pgd_t)) < KERNEL_PGD_BOUNDARY); > > +} > > One of the reasons to implement it the other way: > > - return (ptr & ~PAGE_MASK) < (PAGE_SIZE / 2); > > is that the compiler can do this all quickly. KERNEL_PGD_BOUNDARY > depends on PAGE_OFFSET which depends on a variable. IOW, the compiler > can't do it. > > How much worse is the code that this generates? I haven't looked at the actual code this generates, but the (PAGE_SIZE / 2) comparison doesn't work on 32-bit, where the address space is not always evenly split. I'll look into a better way to check this. Thanks, Joerg -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wr0-f198.google.com (mail-wr0-f198.google.com [209.85.128.198]) by kanga.kvack.org (Postfix) with ESMTP id D7F626B025F for ; Fri, 19 Jan 2018 10:27:53 -0500 (EST) Received: by mail-wr0-f198.google.com with SMTP id b111so1429481wrd.16 for ; Fri, 19 Jan 2018 07:27:53 -0800 (PST) Received: from theia.8bytes.org (8bytes.org.
[2a01:238:4383:600:38bc:a715:4b6d:a889]) by mx.google.com with ESMTPS id t7si1183246edc.248.2018.01.16.08.39.26 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 16 Jan 2018 08:39:26 -0800 (PST) From: Joerg Roedel Subject: [PATCH 13/16] x86/mm/pti: Add an overflow check to pti_clone_pmds() Date: Tue, 16 Jan 2018 17:36:56 +0100 Message-Id: <1516120619-1159-14-git-send-email-joro@8bytes.org> In-Reply-To: <1516120619-1159-1-git-send-email-joro@8bytes.org> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> Sender: owner-linux-mm@kvack.org List-ID: To: Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long , jroedel@suse.de, joro@8bytes.org From: Joerg Roedel The addr counter will overflow if we clone the last PMD of the address space, resulting in an endless loop. Check for that and bail out of the loop when it happens. Signed-off-by: Joerg Roedel --- arch/x86/mm/pti.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c index a561b5625d6c..faea5faeddc5 100644 --- a/arch/x86/mm/pti.c +++ b/arch/x86/mm/pti.c @@ -293,6 +293,10 @@ pti_clone_pmds(unsigned long start, unsigned long end, pmdval_t clear) p4d_t *p4d; pud_t *pud; + /* Overflow check */ + if (addr < start) + break; + pgd = pgd_offset_k(addr); if (WARN_ON(pgd_none(*pgd))) return; -- 2.13.6 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ua0-f199.google.com (mail-ua0-f199.google.com [209.85.217.199]) by kanga.kvack.org (Postfix) with ESMTP id D80F46B0069 for ; Fri, 19 Jan 2018 10:28:27 -0500 (EST) Received: by mail-ua0-f199.google.com with SMTP id v26so1254195uaj.19 for ; Fri, 19 Jan 2018 07:28:27 -0800 (PST) Received: from theia.8bytes.org (8bytes.org. [81.169.241.247]) by mx.google.com with ESMTPS id i5si1312103edc.211.2018.01.16.08.39.23 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 16 Jan 2018 08:39:23 -0800 (PST) From: Joerg Roedel Subject: [PATCH 15/16] x86/entry/32: Switch between kernel and user cr3 on entry/exit Date: Tue, 16 Jan 2018 17:36:58 +0100 Message-Id: <1516120619-1159-16-git-send-email-joro@8bytes.org> In-Reply-To: <1516120619-1159-1-git-send-email-joro@8bytes.org> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> Sender: owner-linux-mm@kvack.org List-ID: To: Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long , jroedel@suse.de, joro@8bytes.org From: Joerg Roedel Add the cr3 switches between the kernel and the user page-table when PTI is enabled. 
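The two macros in this patch just clear or set a single bit of CR3. A minimal C model, assuming (as patch 08/16 arranges with PGD_ALIGN of 2 * PAGE_SIZE) that the kernel and user PGDs share one 8 KiB-aligned allocation with the user copy in the second 4 KiB page. The function names are hypothetical and only mirror SWITCH_TO_KERNEL_CR3 and SWITCH_TO_USER_CR3.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT      12
#define PTI_SWITCH_MASK (UINT32_C(1) << PAGE_SHIFT)

/* Hypothetical helper: CR3 value for a freshly allocated kernel/user PGD
 * pair. The switch below only works if the pair is 8 KiB aligned. */
static uint32_t pgd_pair_kernel_cr3(uint32_t pair_phys)
{
    assert((pair_phys & ((UINT32_C(2) << PAGE_SHIFT) - 1)) == 0);
    return pair_phys;
}

/* Mirrors SWITCH_TO_KERNEL_CR3: andl $(~PTI_SWITCH_MASK), %edi */
static uint32_t switch_to_kernel_cr3(uint32_t cr3)
{
    return cr3 & ~PTI_SWITCH_MASK;
}

/* Mirrors SWITCH_TO_USER_CR3: orl $(PTI_SWITCH_MASK), %edi */
static uint32_t switch_to_user_cr3(uint32_t cr3)
{
    return cr3 | PTI_SWITCH_MASK;
}
```

Because the offset between the two roots is a constant, the entry code needs only one andl/orl on a scratch register and no memory lookup; the low CR3 flag bits (PWT, PCD) pass through unchanged.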
Signed-off-by: Joerg Roedel --- arch/x86/entry/entry_32.S | 25 ++++++++++++++++++++++++- 1 file changed, 24 insertions(+), 1 deletion(-) diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S index 14018eeb11c3..6a1d9f1e1f89 100644 --- a/arch/x86/entry/entry_32.S +++ b/arch/x86/entry/entry_32.S @@ -221,6 +221,25 @@ POP_GS_EX .endm +#define PTI_SWITCH_MASK (1 << PAGE_SHIFT) + +.macro SWITCH_TO_KERNEL_CR3 + ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI + movl %cr3, %edi + andl $(~PTI_SWITCH_MASK), %edi + movl %edi, %cr3 +.Lend_\@: +.endm + +.macro SWITCH_TO_USER_CR3 + ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI + mov %cr3, %edi + /* Flip the PGD to the user version */ + orl $(PTI_SWITCH_MASK), %edi + mov %edi, %cr3 +.Lend_\@: +.endm + /* * Switch from the entry-trampline stack to the kernel stack of the * running task. @@ -240,6 +259,7 @@ .endif pushl %edi + SWITCH_TO_KERNEL_CR3 movl %esp, %edi /* @@ -309,9 +329,12 @@ .endif pushl 4(%edi) /* fs */ + pushl (%edi) /* edi */ + + SWITCH_TO_USER_CR3 /* Restore user %edi and user %fs */ - movl (%edi), %edi + popl %edi popl %fs .Lend_\@: -- 2.13.6 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ua0-f198.google.com (mail-ua0-f198.google.com [209.85.217.198]) by kanga.kvack.org (Postfix) with ESMTP id 109ED6B025E for ; Fri, 19 Jan 2018 10:28:30 -0500 (EST) Received: by mail-ua0-f198.google.com with SMTP id l14so1229248uaa.17 for ; Fri, 19 Jan 2018 07:28:30 -0800 (PST) Received: from theia.8bytes.org (8bytes.org. 
[81.169.241.247]) by mx.google.com with ESMTPS id d5si2448027edj.327.2018.01.16.08.39.23 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 16 Jan 2018 08:39:23 -0800 (PST) From: Joerg Roedel Subject: [PATCH 16/16] x86/pti: Allow CONFIG_PAGE_TABLE_ISOLATION for x86_32 Date: Tue, 16 Jan 2018 17:36:59 +0100 Message-Id: <1516120619-1159-17-git-send-email-joro@8bytes.org> In-Reply-To: <1516120619-1159-1-git-send-email-joro@8bytes.org> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> Sender: owner-linux-mm@kvack.org List-ID: To: Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long , jroedel@suse.de, joro@8bytes.org From: Joerg Roedel Allow PTI to be compiled on x86_32. Signed-off-by: Joerg Roedel --- security/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/security/Kconfig b/security/Kconfig index b0cb9a5f9448..93d85fda0f54 100644 --- a/security/Kconfig +++ b/security/Kconfig @@ -57,7 +57,7 @@ config SECURITY_NETWORK config PAGE_TABLE_ISOLATION bool "Remove the kernel mapping in user mode" default y - depends on X86_64 && !UML + depends on X86 && !UML help This feature reduces the number of hardware side channels by ensuring that the majority of kernel addresses are not mapped -- 2.13.6 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ua0-f200.google.com (mail-ua0-f200.google.com [209.85.217.200]) by kanga.kvack.org (Postfix) with ESMTP id 956476B0260 for ; Fri, 19 Jan 2018 10:28:32 -0500 (EST) Received: by mail-ua0-f200.google.com with SMTP id t9so1257778uac.20 for ; Fri, 19 Jan 2018 07:28:32 -0800 (PST) Received: from theia.8bytes.org (8bytes.org. [2a01:238:4383:600:38bc:a715:4b6d:a889]) by mx.google.com with ESMTPS id 6si2518739edi.36.2018.01.16.08.39.23 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 16 Jan 2018 08:39:23 -0800 (PST) From: Joerg Roedel Subject: [PATCH 11/16] x86/mm/pgtable: Move pti_set_user_pgd() to pgtable.h Date: Tue, 16 Jan 2018 17:36:54 +0100 Message-Id: <1516120619-1159-12-git-send-email-joro@8bytes.org> In-Reply-To: <1516120619-1159-1-git-send-email-joro@8bytes.org> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> Sender: owner-linux-mm@kvack.org List-ID: To: Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long , jroedel@suse.de, joro@8bytes.org From: Joerg Roedel In pgtable.h the function is also usable from 32-bit code.
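A userspace model may help show what pti_set_user_pgd() does once 32-bit code can call it: when PTI is enabled and the entry maps user space, the value is mirrored into the user root, which sits 4k above the kernel root in an 8k-aligned pair. This is an illustrative sketch only; the model_* names, the plain uint32_t entries and the 3G/1G split (user entries below index 768) are assumptions, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical model of a 32-bit, non-PAE PTI root pair: 4 KiB pages,
 * 1024 four-byte PGD entries per root and a 3G/1G split, so entries at
 * index 768 and above map the kernel. All values are assumptions.
 */
#define MODEL_PAGE_SIZE           4096u
#define MODEL_PTRS_PER_PGD        1024u
#define MODEL_KERNEL_PGD_BOUNDARY 768u

typedef uint32_t model_pgd_t;

static bool model_pti_enabled = true;

/* Same test as the new pgdp_maps_userspace(): entry index vs. boundary. */
static bool model_pgdp_maps_userspace(const model_pgd_t *pgdp)
{
	uintptr_t off = (uintptr_t)pgdp & (MODEL_PAGE_SIZE - 1);

	return off / sizeof(model_pgd_t) < MODEL_KERNEL_PGD_BOUNDARY;
}

/* Same idea as kernel_to_user_pgdp(): set bit PAGE_SHIFT of the pointer. */
static model_pgd_t *model_kernel_to_user_pgdp(model_pgd_t *pgdp)
{
	return (model_pgd_t *)((uintptr_t)pgdp | MODEL_PAGE_SIZE);
}

/* Mirror user-space entries into the user root; return the kernel value. */
static model_pgd_t model_pti_set_user_pgd(model_pgd_t *pgdp, model_pgd_t pgd)
{
	if (!model_pti_enabled || !model_pgdp_maps_userspace(pgdp))
		return pgd;
	*model_kernel_to_user_pgdp(pgdp) = pgd;
	return pgd; /* no NX bit at this level on 32 bit */
}

/* Caller pattern: store whatever the PTI hook returns in the kernel copy. */
static void model_set_pgd(model_pgd_t *pgdp, model_pgd_t pgd)
{
	*pgdp = model_pti_set_user_pgd(pgdp, pgd);
}

/* Zeroed, 8 KiB-aligned kernel+user root pair (kernel half first). */
static model_pgd_t *model_alloc_root_pair(void)
{
	model_pgd_t *root = aligned_alloc(2 * MODEL_PAGE_SIZE, 2 * MODEL_PAGE_SIZE);

	assert(root != NULL);
	for (size_t i = 0; i < 2 * MODEL_PTRS_PER_PGD; i++)
		root[i] = 0;
	return root;
}
```

Entry 1 (user space) ends up in both roots, while entry 800 (kernel space) only changes the kernel root, which is the behaviour the native_set_pmd()/native_set_pud() hooks in this series rely on.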
Signed-off-by: Joerg Roedel --- arch/x86/include/asm/pgtable.h | 23 +++++++++++++++++++++++ arch/x86/include/asm/pgtable_64.h | 21 --------------------- 2 files changed, 23 insertions(+), 21 deletions(-) diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index abafe4d7fd3e..248721971532 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -618,8 +618,31 @@ static inline int is_new_memtype_allowed(u64 paddr, unsigned long size, pmd_t *populate_extra_pmd(unsigned long vaddr); pte_t *populate_extra_pte(unsigned long vaddr); + +#ifdef CONFIG_PAGE_TABLE_ISOLATION +pgd_t __pti_set_user_pgd(pgd_t *pgdp, pgd_t pgd); + +/* + * Take a PGD location (pgdp) and a pgd value that needs to be set there. + * Populates the user and returns the resulting PGD that must be set in + * the kernel copy of the page tables. + */ +static inline pgd_t pti_set_user_pgd(pgd_t *pgdp, pgd_t pgd) +{ + if (!static_cpu_has(X86_FEATURE_PTI)) + return pgd; + return __pti_set_user_pgd(pgdp, pgd); +} +#else /* CONFIG_PAGE_TABLE_ISOLATION */ +static inline pgd_t pti_set_user_pgd(pgd_t *pgdp, pgd_t pgd) +{ + return pgd; +} +#endif /* CONFIG_PAGE_TABLE_ISOLATION */ + #endif /* __ASSEMBLY__ */ + #ifdef CONFIG_X86_32 # include #else diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h index 3c5a73c8bb50..50a02a32a0b3 100644 --- a/arch/x86/include/asm/pgtable_64.h +++ b/arch/x86/include/asm/pgtable_64.h @@ -131,27 +131,6 @@ static inline pud_t native_pudp_get_and_clear(pud_t *xp) #endif } -#ifdef CONFIG_PAGE_TABLE_ISOLATION -pgd_t __pti_set_user_pgd(pgd_t *pgdp, pgd_t pgd); - -/* - * Take a PGD location (pgdp) and a pgd value that needs to be set there. - * Populates the user and returns the resulting PGD that must be set in - * the kernel copy of the page tables. 
- */ -static inline pgd_t pti_set_user_pgd(pgd_t *pgdp, pgd_t pgd) -{ - if (!static_cpu_has(X86_FEATURE_PTI)) - return pgd; - return __pti_set_user_pgd(pgdp, pgd); -} -#else -static inline pgd_t pti_set_user_pgd(pgd_t *pgdp, pgd_t pgd) -{ - return pgd; -} -#endif - static inline void native_set_p4d(p4d_t *p4dp, p4d_t p4d) { #if defined(CONFIG_PAGE_TABLE_ISOLATION) && !defined(CONFIG_X86_5LEVEL) -- 2.13.6 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wr0-f198.google.com (mail-wr0-f198.google.com [209.85.128.198]) by kanga.kvack.org (Postfix) with ESMTP id 169F26B0261 for ; Fri, 19 Jan 2018 10:28:35 -0500 (EST) Received: by mail-wr0-f198.google.com with SMTP id g13so1446473wrh.19 for ; Fri, 19 Jan 2018 07:28:35 -0800 (PST) Received: from theia.8bytes.org (8bytes.org. [2a01:238:4383:600:38bc:a715:4b6d:a889]) by mx.google.com with ESMTPS id x4si1377870edc.501.2018.01.16.08.39.22 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 16 Jan 2018 08:39:22 -0800 (PST) From: Joerg Roedel Subject: [PATCH 14/16] x86/mm/legacy: Populate the user page-table with user pgd's Date: Tue, 16 Jan 2018 17:36:57 +0100 Message-Id: <1516120619-1159-15-git-send-email-joro@8bytes.org> In-Reply-To: <1516120619-1159-1-git-send-email-joro@8bytes.org> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> Sender: owner-linux-mm@kvack.org List-ID: To: Thomas Gleixner , Ingo Molnar , "H .
Peter Anvin" Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long , jroedel@suse.de, joro@8bytes.org From: Joerg Roedel Also populate the user-space pgd's in the user page-table. Signed-off-by: Joerg Roedel --- arch/x86/include/asm/pgtable-2level.h | 3 +++ 1 file changed, 3 insertions(+) diff --git a/arch/x86/include/asm/pgtable-2level.h b/arch/x86/include/asm/pgtable-2level.h index 685ffe8a0eaf..d96486d23c58 100644 --- a/arch/x86/include/asm/pgtable-2level.h +++ b/arch/x86/include/asm/pgtable-2level.h @@ -19,6 +19,9 @@ static inline void native_set_pte(pte_t *ptep , pte_t pte) static inline void native_set_pmd(pmd_t *pmdp, pmd_t pmd) { +#ifdef CONFIG_PAGE_TABLE_ISOLATION + pmd.pud.p4d.pgd = pti_set_user_pgd(&pmdp->pud.p4d.pgd, pmd.pud.p4d.pgd); +#endif *pmdp = pmd; } -- 2.13.6 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm0-f72.google.com (mail-wm0-f72.google.com [74.125.82.72]) by kanga.kvack.org (Postfix) with ESMTP id BE09A6B0266 for ; Fri, 19 Jan 2018 10:28:37 -0500 (EST) Received: by mail-wm0-f72.google.com with SMTP id 17so3482182wma.1 for ; Fri, 19 Jan 2018 07:28:37 -0800 (PST) Received: from theia.8bytes.org (8bytes.org.
[81.169.241.247]) by mx.google.com with ESMTPS id 94si1666835edn.200.2018.01.16.08.39.21 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 16 Jan 2018 08:39:21 -0800 (PST) From: Joerg Roedel Subject: [PATCH 12/16] x86/mm/pae: Populate the user page-table with user pgd's Date: Tue, 16 Jan 2018 17:36:55 +0100 Message-Id: <1516120619-1159-13-git-send-email-joro@8bytes.org> In-Reply-To: <1516120619-1159-1-git-send-email-joro@8bytes.org> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> Sender: owner-linux-mm@kvack.org List-ID: To: Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long , jroedel@suse.de, joro@8bytes.org From: Joerg Roedel This is the last part of the PAE page-table setup before we can add the CR3 switch to the entry code.
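With the user pgd's populated, the CR3 switch in the entry code only has to flip bit PAGE_SHIFT, which is what the SWITCH_TO_KERNEL_CR3 and SWITCH_TO_USER_CR3 macros in this series do with andl/orl on %edi. A minimal C model of that arithmetic follows, assuming 4k pages and an 8k-aligned kernel root; the model_* helpers and the example addresses are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed 4 KiB pages: PTI_SWITCH_MASK is then (1 << 12). */
#define MODEL_PAGE_SHIFT      12
#define MODEL_PTI_SWITCH_MASK (UINT32_C(1) << MODEL_PAGE_SHIFT)

/* Like SWITCH_TO_KERNEL_CR3: andl $(~PTI_SWITCH_MASK), %edi */
static uint32_t model_kernel_cr3(uint32_t cr3)
{
	return cr3 & ~MODEL_PTI_SWITCH_MASK;
}

/* Like SWITCH_TO_USER_CR3: orl $(PTI_SWITCH_MASK), %edi */
static uint32_t model_user_cr3(uint32_t cr3)
{
	return cr3 | MODEL_PTI_SWITCH_MASK;
}

/* The flip only selects the right root if the kernel root is 8 KiB aligned. */
static bool model_root_is_pti_aligned(uint32_t kernel_root_pa)
{
	return (kernel_root_pa & (2 * MODEL_PTI_SWITCH_MASK - 1)) == 0;
}

/* Kernel entry followed by exit must hand back the CR3 user space had. */
static uint32_t model_enter_then_exit(uint32_t user_cr3)
{
	uint32_t kcr3 = model_kernel_cr3(user_cr3);

	assert((kcr3 & MODEL_PTI_SWITCH_MASK) == 0);
	return model_user_cr3(kcr3);
}
```

Because both directions are a single and/or with a constant, no memory access is needed to find the other root, which is why the allocation side of the series insists on a fixed 4k offset between the two roots.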
Signed-off-by: Joerg Roedel --- arch/x86/include/asm/pgtable-3level.h | 3 +++ arch/x86/mm/pti.c | 7 +++++++ 2 files changed, 10 insertions(+) diff --git a/arch/x86/include/asm/pgtable-3level.h b/arch/x86/include/asm/pgtable-3level.h index bc4af5453802..910f0b35370e 100644 --- a/arch/x86/include/asm/pgtable-3level.h +++ b/arch/x86/include/asm/pgtable-3level.h @@ -98,6 +98,9 @@ static inline void native_set_pmd(pmd_t *pmdp, pmd_t pmd) static inline void native_set_pud(pud_t *pudp, pud_t pud) { +#ifdef CONFIG_PAGE_TABLE_ISOLATION + pud.p4d.pgd = pti_set_user_pgd(&pudp->p4d.pgd, pud.p4d.pgd); +#endif set_64bit((unsigned long long *)(pudp), native_pud_val(pud)); } diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c index 6b6bfd13350e..a561b5625d6c 100644 --- a/arch/x86/mm/pti.c +++ b/arch/x86/mm/pti.c @@ -122,6 +122,7 @@ pgd_t __pti_set_user_pgd(pgd_t *pgdp, pgd_t pgd) */ kernel_to_user_pgdp(pgdp)->pgd = pgd.pgd; +#ifdef CONFIG_X86_64 /* * If this is normal user memory, make it NX in the kernel * pagetables so that, if we somehow screw up and return to @@ -134,10 +135,16 @@ pgd_t __pti_set_user_pgd(pgd_t *pgdp, pgd_t pgd) * may execute from it * - we don't have NX support * - we're clearing the PGD (i.e. the new pgd is not present). + * - We run on a 32 bit kernel. 2-level paging doesn't support NX at + * all and PAE paging does not support it on the PGD level. We can + * set it in the PMD level there in the future, but that means we + * need to unshare the PMDs between the kernel and the user + * page-tables. */ if ((pgd.pgd & (_PAGE_USER|_PAGE_PRESENT)) == (_PAGE_USER|_PAGE_PRESENT) && (__supported_pte_mask & _PAGE_NX)) pgd.pgd |= _PAGE_NX; +#endif /* return the copy of the PGD we want the kernel to use: */ return pgd; -- 2.13.6 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ua0-f197.google.com (mail-ua0-f197.google.com [209.85.217.197]) by kanga.kvack.org (Postfix) with ESMTP id 09564280244 for ; Fri, 19 Jan 2018 10:28:40 -0500 (EST) Received: by mail-ua0-f197.google.com with SMTP id c10so1222718uae.23 for ; Fri, 19 Jan 2018 07:28:40 -0800 (PST) Received: from theia.8bytes.org (8bytes.org. [81.169.241.247]) by mx.google.com with ESMTPS id l89si2576173ede.122.2018.01.16.08.39.21 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 16 Jan 2018 08:39:21 -0800 (PST) From: Joerg Roedel Subject: [PATCH 10/16] x86/mm/pti: Populate valid user pud entries Date: Tue, 16 Jan 2018 17:36:53 +0100 Message-Id: <1516120619-1159-11-git-send-email-joro@8bytes.org> In-Reply-To: <1516120619-1159-1-git-send-email-joro@8bytes.org> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> Sender: owner-linux-mm@kvack.org List-ID: To: Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long , jroedel@suse.de, joro@8bytes.org From: Joerg Roedel With PAE paging we don't have PGD and P4D levels in the page-table; instead, the PUD level is the highest one. At the top level of PAE page-tables, most of the bits we usually set with _KERNPG_TABLE are reserved, resulting in a #GP when the entries are loaded by the processor. Work around this by populating PUD entries in the user page-table with only _PAGE_PRESENT set. I am pretty sure there is a cleaner way to do this, but until I find it, use this #ifdef solution.
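The #GP described above can be checked on paper. In a PAE page-directory-pointer-table entry, bits 2:1 and 8:5 are reserved, which covers RW, USER, ACCESSED and DIRTY, and loading CR3 with a present entry that sets any of them faults. The C sketch below models that check; the flag values are the architectural bit positions, while treating _KERNPG_TABLE as PRESENT|RW|ACCESSED|DIRTY and ignoring the MAXPHYADDR-dependent high reserved bits are simplifying assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural x86 page-table flag bits. */
#define MODEL_PAGE_PRESENT  UINT64_C(0x001)
#define MODEL_PAGE_RW       UINT64_C(0x002)
#define MODEL_PAGE_USER     UINT64_C(0x004)
#define MODEL_PAGE_ACCESSED UINT64_C(0x020)
#define MODEL_PAGE_DIRTY    UINT64_C(0x040)

/* Assumed composition of _KERNPG_TABLE on a 32-bit kernel of that time. */
#define MODEL_KERNPG_TABLE (MODEL_PAGE_PRESENT | MODEL_PAGE_RW | \
			    MODEL_PAGE_ACCESSED | MODEL_PAGE_DIRTY)

/* Reserved low bits of a PAE PDPTE: 2:1 and 8:5 (high bits ignored here). */
#define MODEL_PDPTE_RSVD_LOW UINT64_C(0x1e6)

/* Would loading this PDPTE via CR3 be accepted (no #GP)? */
static bool model_pdpte_is_valid(uint64_t pdpte)
{
	if (!(pdpte & MODEL_PAGE_PRESENT))
		return true; /* reserved bits are only checked when P=1 */
	return (pdpte & MODEL_PDPTE_RSVD_LOW) == 0;
}

/* Mirrors the #ifdef CONFIG_X86_PAE choice in pti_user_pagetable_walk_pmd(). */
static uint64_t model_make_user_pud(uint64_t pmd_page_pa, bool pae)
{
	assert((pmd_page_pa & 0xfff) == 0); /* page-table pages are 4k aligned */
	return pae ? (MODEL_PAGE_PRESENT | pmd_page_pa)
		   : (MODEL_KERNPG_TABLE | pmd_page_pa);
}
```

With PAE the entry carries only the present bit and passes the check; the _KERNPG_TABLE variant sets RW, ACCESSED and DIRTY and would fault.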
Signed-off-by: Joerg Roedel --- arch/x86/mm/pti.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c index 20be21301a59..6b6bfd13350e 100644 --- a/arch/x86/mm/pti.c +++ b/arch/x86/mm/pti.c @@ -202,8 +202,12 @@ static __init pmd_t *pti_user_pagetable_walk_pmd(unsigned long address) unsigned long new_pmd_page = __get_free_page(gfp); if (!new_pmd_page) return NULL; - +#ifdef CONFIG_X86_PAE + /* TODO: There must be a cleaner way to do this */ + set_pud(pud, __pud(_PAGE_PRESENT | __pa(new_pmd_page))); +#else set_pud(pud, __pud(_KERNPG_TABLE | __pa(new_pmd_page))); +#endif } return pmd_offset(pud, address); -- 2.13.6 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ua0-f198.google.com (mail-ua0-f198.google.com [209.85.217.198]) by kanga.kvack.org (Postfix) with ESMTP id 7EAED280244 for ; Fri, 19 Jan 2018 10:28:45 -0500 (EST) Received: by mail-ua0-f198.google.com with SMTP id j18so1261572uag.4 for ; Fri, 19 Jan 2018 07:28:45 -0800 (PST) Received: from theia.8bytes.org (8bytes.org. [81.169.241.247]) by mx.google.com with ESMTPS id e45si2650818eda.4.2018.01.16.08.39.20 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 16 Jan 2018 08:39:20 -0800 (PST) From: Joerg Roedel Subject: [PATCH 05/16] x86/pgtable: Move pgdp kernel/user conversion functions to pgtable.h Date: Tue, 16 Jan 2018 17:36:48 +0100 Message-Id: <1516120619-1159-6-git-send-email-joro@8bytes.org> In-Reply-To: <1516120619-1159-1-git-send-email-joro@8bytes.org> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> Sender: owner-linux-mm@kvack.org List-ID: To: Thomas Gleixner , Ingo Molnar , "H .
Peter Anvin" Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long , jroedel@suse.de, joro@8bytes.org From: Joerg Roedel Make them available on 32 bit and clone_pgd_range() happy. Signed-off-by: Joerg Roedel --- arch/x86/include/asm/pgtable.h | 49 +++++++++++++++++++++++++++++++++++++++ arch/x86/include/asm/pgtable_64.h | 49 --------------------------------------- 2 files changed, 49 insertions(+), 49 deletions(-) diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index e42b8943cb1a..0a9f746cbdc1 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -1109,6 +1109,55 @@ static inline int pud_write(pud_t pud) return pud_flags(pud) & _PAGE_RW; } +#ifdef CONFIG_PAGE_TABLE_ISOLATION +/* + * All top-level PAGE_TABLE_ISOLATION page tables are order-1 pages + * (8k-aligned and 8k in size). The kernel one is at the beginning 4k and + * the user one is in the last 4k. To switch between them, you + * just need to flip the 12th bit in their addresses. + */ +#define PTI_PGTABLE_SWITCH_BIT PAGE_SHIFT + +/* + * This generates better code than the inline assembly in + * __set_bit(). 
+ */ +static inline void *ptr_set_bit(void *ptr, int bit) +{ + unsigned long __ptr = (unsigned long)ptr; + + __ptr |= BIT(bit); + return (void *)__ptr; +} +static inline void *ptr_clear_bit(void *ptr, int bit) +{ + unsigned long __ptr = (unsigned long)ptr; + + __ptr &= ~BIT(bit); + return (void *)__ptr; +} + +static inline pgd_t *kernel_to_user_pgdp(pgd_t *pgdp) +{ + return ptr_set_bit(pgdp, PTI_PGTABLE_SWITCH_BIT); +} + +static inline pgd_t *user_to_kernel_pgdp(pgd_t *pgdp) +{ + return ptr_clear_bit(pgdp, PTI_PGTABLE_SWITCH_BIT); +} + +static inline p4d_t *kernel_to_user_p4dp(p4d_t *p4dp) +{ + return ptr_set_bit(p4dp, PTI_PGTABLE_SWITCH_BIT); +} + +static inline p4d_t *user_to_kernel_p4dp(p4d_t *p4dp) +{ + return ptr_clear_bit(p4dp, PTI_PGTABLE_SWITCH_BIT); +} +#endif /* CONFIG_PAGE_TABLE_ISOLATION */ + /* * clone_pgd_range(pgd_t *dst, pgd_t *src, int count); * diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h index 81462e9a34f6..58d7f10e937d 100644 --- a/arch/x86/include/asm/pgtable_64.h +++ b/arch/x86/include/asm/pgtable_64.h @@ -131,55 +131,6 @@ static inline pud_t native_pudp_get_and_clear(pud_t *xp) #endif } -#ifdef CONFIG_PAGE_TABLE_ISOLATION -/* - * All top-level PAGE_TABLE_ISOLATION page tables are order-1 pages - * (8k-aligned and 8k in size). The kernel one is at the beginning 4k and - * the user one is in the last 4k. To switch between them, you - * just need to flip the 12th bit in their addresses. - */ -#define PTI_PGTABLE_SWITCH_BIT PAGE_SHIFT - -/* - * This generates better code than the inline assembly in - * __set_bit(). 
- */ -static inline void *ptr_set_bit(void *ptr, int bit) -{ - unsigned long __ptr = (unsigned long)ptr; - - __ptr |= BIT(bit); - return (void *)__ptr; -} -static inline void *ptr_clear_bit(void *ptr, int bit) -{ - unsigned long __ptr = (unsigned long)ptr; - - __ptr &= ~BIT(bit); - return (void *)__ptr; -} - -static inline pgd_t *kernel_to_user_pgdp(pgd_t *pgdp) -{ - return ptr_set_bit(pgdp, PTI_PGTABLE_SWITCH_BIT); -} - -static inline pgd_t *user_to_kernel_pgdp(pgd_t *pgdp) -{ - return ptr_clear_bit(pgdp, PTI_PGTABLE_SWITCH_BIT); -} - -static inline p4d_t *kernel_to_user_p4dp(p4d_t *p4dp) -{ - return ptr_set_bit(p4dp, PTI_PGTABLE_SWITCH_BIT); -} - -static inline p4d_t *user_to_kernel_p4dp(p4d_t *p4dp) -{ - return ptr_clear_bit(p4dp, PTI_PGTABLE_SWITCH_BIT); -} -#endif /* CONFIG_PAGE_TABLE_ISOLATION */ - /* * Page table pages are page-aligned. The lower half of the top * level is used for userspace and the top half for the kernel. -- 2.13.6 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm0-f72.google.com (mail-wm0-f72.google.com [74.125.82.72]) by kanga.kvack.org (Postfix) with ESMTP id 64002280244 for ; Fri, 19 Jan 2018 10:28:48 -0500 (EST) Received: by mail-wm0-f72.google.com with SMTP id z83so1225076wmc.5 for ; Fri, 19 Jan 2018 07:28:48 -0800 (PST) Received: from theia.8bytes.org (8bytes.org.
[2a01:238:4383:600:38bc:a715:4b6d:a889]) by mx.google.com with ESMTPS id p90si2509759edp.379.2018.01.16.08.39.21 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 16 Jan 2018 08:39:21 -0800 (PST) From: Joerg Roedel Subject: [PATCH 09/16] x86/mm/pti: Clone CPU_ENTRY_AREA on PMD level on x86_32 Date: Tue, 16 Jan 2018 17:36:52 +0100 Message-Id: <1516120619-1159-10-git-send-email-joro@8bytes.org> In-Reply-To: <1516120619-1159-1-git-send-email-joro@8bytes.org> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> Sender: owner-linux-mm@kvack.org List-ID: To: Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long , jroedel@suse.de, joro@8bytes.org From: Joerg Roedel Cloning on the P4D level would clone the complete kernel address space into the user-space page-tables for PAE kernels. Cloning on PMD level is fine for PAE and legacy paging. Signed-off-by: Joerg Roedel --- arch/x86/mm/pti.c | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c index ce38f165489b..20be21301a59 100644 --- a/arch/x86/mm/pti.c +++ b/arch/x86/mm/pti.c @@ -308,6 +308,7 @@ pti_clone_pmds(unsigned long start, unsigned long end, pmdval_t clear) } } +#ifdef CONFIG_X86_64 /* * Clone a single p4d (i.e. a top-level entry on 4-level systems and a * next-level entry on 5-level systems. @@ -322,13 +323,29 @@ static void __init pti_clone_p4d(unsigned long addr) kernel_p4d = p4d_offset(kernel_pgd, addr); *user_p4d = *kernel_p4d; } +#endif /* * Clone the CPU_ENTRY_AREA into the user space visible page table. 
*/ static void __init pti_clone_user_shared(void) { +#ifdef CONFIG_X86_32 + /* + * On 32 bit PAE systems with 1GB of Kernel address space there is only + * one pgd/p4d for the whole kernel. Cloning that would map the whole + * address space into the user page-tables, making PTI useless. So clone + * the page-table on the PMD level to prevent that. + */ + unsigned long start, end; + + start = CPU_ENTRY_AREA_BASE; + end = start + (PAGE_SIZE * CPU_ENTRY_AREA_PAGES); + + pti_clone_pmds(start, end, _PAGE_GLOBAL); +#else pti_clone_p4d(CPU_ENTRY_AREA_BASE); +#endif } /* -- 2.13.6 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm0-f71.google.com (mail-wm0-f71.google.com [74.125.82.71]) by kanga.kvack.org (Postfix) with ESMTP id B76A2280244 for ; Fri, 19 Jan 2018 10:28:51 -0500 (EST) Received: by mail-wm0-f71.google.com with SMTP id b195so1235563wmb.1 for ; Fri, 19 Jan 2018 07:28:51 -0800 (PST) Received: from theia.8bytes.org (8bytes.org. [81.169.241.247]) by mx.google.com with ESMTPS id o5si2436321eda.525.2018.01.16.08.39.20 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 16 Jan 2018 08:39:20 -0800 (PST) From: Joerg Roedel Subject: [PATCH 07/16] x86/mm: Move two more functions from pgtable_64.h to pgtable.h Date: Tue, 16 Jan 2018 17:36:50 +0100 Message-Id: <1516120619-1159-8-git-send-email-joro@8bytes.org> In-Reply-To: <1516120619-1159-1-git-send-email-joro@8bytes.org> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> Sender: owner-linux-mm@kvack.org List-ID: To: Thomas Gleixner , Ingo Molnar , "H .
Peter Anvin" Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long , jroedel@suse.de, joro@8bytes.org From: Joerg Roedel These two functions are required for PTI on 32 bit: * pgdp_maps_userspace() * pgd_large() Also re-implement pgdp_maps_userspace() so that it will work on 64 and 32 bit kernels. Signed-off-by: Joerg Roedel --- arch/x86/include/asm/pgtable.h | 16 ++++++++++++++++ arch/x86/include/asm/pgtable_64.h | 15 --------------- 2 files changed, 16 insertions(+), 15 deletions(-) diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index 0a9f746cbdc1..abafe4d7fd3e 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -1109,6 +1109,22 @@ static inline int pud_write(pud_t pud) return pud_flags(pud) & _PAGE_RW; } +/* + * Page table pages are page-aligned. The lower half of the top + * level is used for userspace and the top half for the kernel. + * + * Returns true for parts of the PGD that map userspace and + * false for the parts that map the kernel. 
+ */ +static inline bool pgdp_maps_userspace(void *__ptr) +{ + unsigned long ptr = (unsigned long)__ptr; + + return (((ptr & ~PAGE_MASK) / sizeof(pgd_t)) < KERNEL_PGD_BOUNDARY); +} + +static inline int pgd_large(pgd_t pgd) { return 0; } + #ifdef CONFIG_PAGE_TABLE_ISOLATION /* * All top-level PAGE_TABLE_ISOLATION page tables are order-1 pages diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h index 58d7f10e937d..3c5a73c8bb50 100644 --- a/arch/x86/include/asm/pgtable_64.h +++ b/arch/x86/include/asm/pgtable_64.h @@ -131,20 +131,6 @@ static inline pud_t native_pudp_get_and_clear(pud_t *xp) #endif } -/* - * Page table pages are page-aligned. The lower half of the top - * level is used for userspace and the top half for the kernel. - * - * Returns true for parts of the PGD that map userspace and - * false for the parts that map the kernel. - */ -static inline bool pgdp_maps_userspace(void *__ptr) -{ - unsigned long ptr = (unsigned long)__ptr; - - return (ptr & ~PAGE_MASK) < (PAGE_SIZE / 2); -} - #ifdef CONFIG_PAGE_TABLE_ISOLATION pgd_t __pti_set_user_pgd(pgd_t *pgdp, pgd_t pgd); @@ -208,7 +194,6 @@ extern void sync_global_pgds(unsigned long start, unsigned long end); /* * Level 4 access. */ -static inline int pgd_large(pgd_t pgd) { return 0; } #define mk_kernel_pgd(address) __pgd((address) | _KERNPG_TABLE) /* PUD - Level3 access */ -- 2.13.6 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ua0-f199.google.com (mail-ua0-f199.google.com [209.85.217.199]) by kanga.kvack.org (Postfix) with ESMTP id 49305280244 for ; Fri, 19 Jan 2018 10:28:54 -0500 (EST) Received: by mail-ua0-f199.google.com with SMTP id 19so1265922uae.15 for ; Fri, 19 Jan 2018 07:28:54 -0800 (PST) Received: from theia.8bytes.org (8bytes.org.
[81.169.241.247]) by mx.google.com with ESMTPS id 5si2484322edb.158.2018.01.16.08.39.20 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 16 Jan 2018 08:39:20 -0800 (PST) From: Joerg Roedel Subject: [PATCH 08/16] x86/pgtable/32: Allocate 8k page-tables when PTI is enabled Date: Tue, 16 Jan 2018 17:36:51 +0100 Message-Id: <1516120619-1159-9-git-send-email-joro@8bytes.org> In-Reply-To: <1516120619-1159-1-git-send-email-joro@8bytes.org> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> Sender: owner-linux-mm@kvack.org List-ID: To: Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long , jroedel@suse.de, joro@8bytes.org From: Joerg Roedel Allocate a kernel and a user page-table root when PTI is enabled. Also allocate a full page per root for PAE, because otherwise the bit to flip in cr3 to switch between them would be non-constant, which creates a lot of hassle. Keep that for a later optimization.
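Why a full page per PAE root keeps the switch bit constant can be shown with plain address arithmetic. With an order-1 allocation (PGD_ALLOCATION_ORDER) the user root is always the kernel root with bit 12 set; if the 32-byte PAE roots were instead packed next to each other, the difference between the two would not always be a single bit. The sketch assumes 4k pages and uses made-up addresses; the packed layout and all model_* names are hypothetical and only serve as a counter-example.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE     4096u
#define MODEL_PAE_PGD_BYTES 32u /* 4 PDPTEs x 8 bytes */

/* Layout used by the patch: one full page per root, pair 8 KiB aligned. */
static uintptr_t model_user_root_full_page(uintptr_t kernel_root)
{
	assert(kernel_root % (2 * MODEL_PAGE_SIZE) == 0);
	return kernel_root + MODEL_PAGE_SIZE;
}

/* Hypothetical packed layout: user PDPT directly after the kernel PDPT. */
static uintptr_t model_user_root_packed(uintptr_t kernel_root)
{
	assert(kernel_root % MODEL_PAE_PGD_BYTES == 0);
	return kernel_root + MODEL_PAE_PGD_BYTES;
}

/* The single bit that turns one root into the other, or 0 if there is none. */
static uintptr_t model_switch_bit(uintptr_t kernel_root, uintptr_t user_root)
{
	uintptr_t diff = kernel_root ^ user_root;

	return (diff != 0 && (diff & (diff - 1)) == 0) ? diff : 0;
}
```

For any 8k-aligned pair the result is always 0x1000, so the entry code can use one constant mask; with the packed layout the answer depends on where the allocator happened to place the root.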
Signed-off-by: Joerg Roedel --- arch/x86/kernel/head_32.S | 23 ++++++++++++++++++----- arch/x86/mm/pgtable.c | 11 ++++++----- 2 files changed, 24 insertions(+), 10 deletions(-) diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S index c29020907886..fc550559bf58 100644 --- a/arch/x86/kernel/head_32.S +++ b/arch/x86/kernel/head_32.S @@ -512,28 +512,41 @@ ENTRY(initial_code) ENTRY(setup_once_ref) .long setup_once +#ifdef CONFIG_PAGE_TABLE_ISOLATION +#define PGD_ALIGN (2 * PAGE_SIZE) +#define PTI_USER_PGD_FILL 1024 +#else +#define PGD_ALIGN (PAGE_SIZE) +#define PTI_USER_PGD_FILL 0 +#endif /* * BSS section */ __PAGE_ALIGNED_BSS - .align PAGE_SIZE + .align PGD_ALIGN #ifdef CONFIG_X86_PAE .globl initial_pg_pmd initial_pg_pmd: .fill 1024*KPMDS,4,0 + .fill PTI_USER_PGD_FILL,4,0 #else .globl initial_page_table initial_page_table: .fill 1024,4,0 + .fill PTI_USER_PGD_FILL,4,0 #endif + .align PGD_ALIGN initial_pg_fixmap: .fill 1024,4,0 -.globl empty_zero_page -empty_zero_page: - .fill 4096,1,0 + .fill PTI_USER_PGD_FILL,4,0 .globl swapper_pg_dir + .align PGD_ALIGN swapper_pg_dir: .fill 1024,4,0 + .fill PTI_USER_PGD_FILL,4,0 +.globl empty_zero_page +empty_zero_page: + .fill 4096,1,0 EXPORT_SYMBOL(empty_zero_page) /* @@ -542,7 +555,7 @@ EXPORT_SYMBOL(empty_zero_page) #ifdef CONFIG_X86_PAE __PAGE_ALIGNED_DATA /* Page-aligned for the benefit of paravirt? */ - .align PAGE_SIZE + .align PGD_ALIGN ENTRY(initial_page_table) .long pa(initial_pg_pmd+PGD_IDENT_ATTR),0 /* low identity map */ # if KPMDS == 3 diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c index 004abf9ebf12..48abefd95924 100644 --- a/arch/x86/mm/pgtable.c +++ b/arch/x86/mm/pgtable.c @@ -313,7 +313,7 @@ static int __init pgd_cache_init(void) * When PAE kernel is running as a Xen domain, it does not use * shared kernel pmd. And this requires a whole page for pgd. 
*/ - if (!SHARED_KERNEL_PMD) + if (static_cpu_has(X86_FEATURE_PTI) || !SHARED_KERNEL_PMD) return 0; /* @@ -337,8 +337,9 @@ static inline pgd_t *_pgd_alloc(void) * If no SHARED_KERNEL_PMD, PAE kernel is running as a Xen domain. * We allocate one page for pgd. */ - if (!SHARED_KERNEL_PMD) - return (pgd_t *)__get_free_page(PGALLOC_GFP); + if (static_cpu_has(X86_FEATURE_PTI) || !SHARED_KERNEL_PMD) + return (pgd_t *)__get_free_pages(PGALLOC_GFP, + PGD_ALLOCATION_ORDER); /* * Now PAE kernel is not running as a Xen domain. We can allocate @@ -349,8 +350,8 @@ static inline pgd_t *_pgd_alloc(void) static inline void _pgd_free(pgd_t *pgd) { - if (!SHARED_KERNEL_PMD) - free_page((unsigned long)pgd); + if (static_cpu_has(X86_FEATURE_PTI) || !SHARED_KERNEL_PMD) + free_pages((unsigned long)pgd, PGD_ALLOCATION_ORDER); else kmem_cache_free(pgd_cache, pgd); } -- 2.13.6 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-vk0-f69.google.com (mail-vk0-f69.google.com [209.85.213.69]) by kanga.kvack.org (Postfix) with ESMTP id 01316280244 for ; Fri, 19 Jan 2018 10:28:57 -0500 (EST) Received: by mail-vk0-f69.google.com with SMTP id d130so1055511vkf.6 for ; Fri, 19 Jan 2018 07:28:56 -0800 (PST) Received: from theia.8bytes.org (8bytes.org.
[81.169.241.247]) by mx.google.com with ESMTPS id i2si2316257edc.272.2018.01.16.08.39.20 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 16 Jan 2018 08:39:20 -0800 (PST) From: Joerg Roedel Subject: [PATCH 06/16] x86/mm/ldt: Reserve high address-space range for the LDT Date: Tue, 16 Jan 2018 17:36:49 +0100 Message-Id: <1516120619-1159-7-git-send-email-joro@8bytes.org> In-Reply-To: <1516120619-1159-1-git-send-email-joro@8bytes.org> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> Sender: owner-linux-mm@kvack.org List-ID: To: Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long , jroedel@suse.de, joro@8bytes.org From: Joerg Roedel Reserve 2MB/4MB of address space for mapping the LDT to user-space. 
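The new LDT_BASE_ADDR macro moves PKMAP_BASE down by one PMD and leaves the PMD-sized slot [LDT_BASE_ADDR, CPU_ENTRY_AREA_BASE) for the LDT, which is where the 2MB (PAE) or 4MB (2-level) figure comes from. The C sketch below repeats the macro arithmetic in a standalone function; the model_* names are hypothetical, and the FIXADDR_START value of 0xff500000 and the 40 CPU_ENTRY_AREA_PAGES used in the test are invented inputs, since the real values depend on the configuration.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE UINT32_C(0x1000)

struct model_layout {
	uint32_t cpu_entry_area_base;
	uint32_t ldt_base_addr;
	uint32_t pkmap_base;
};

/* Same expressions as CPU_ENTRY_AREA_BASE, LDT_BASE_ADDR and PKMAP_BASE. */
static struct model_layout model_compute_layout(uint32_t fixaddr_start,
						uint32_t cea_pages,
						uint32_t pmd_size)
{
	uint32_t pmd_mask = ~(pmd_size - 1);
	struct model_layout l;

	assert(pmd_size != 0 && (pmd_size & (pmd_size - 1)) == 0);
	l.cpu_entry_area_base =
		(fixaddr_start - MODEL_PAGE_SIZE * (cea_pages + 1)) & pmd_mask;
	l.ldt_base_addr = (l.cpu_entry_area_base - MODEL_PAGE_SIZE) & pmd_mask;
	l.pkmap_base = (l.ldt_base_addr - MODEL_PAGE_SIZE) & pmd_mask;
	return l;
}
```

Because CPU_ENTRY_AREA_BASE is already PMD aligned, subtracting one page and masking always lands exactly one PMD lower, so the reserved LDT window is PMD_SIZE regardless of the inputs.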
Signed-off-by: Joerg Roedel --- arch/x86/include/asm/pgtable_32_types.h | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/pgtable_32_types.h b/arch/x86/include/asm/pgtable_32_types.h index ce245b0cdfca..3c30a7fcae68 100644 --- a/arch/x86/include/asm/pgtable_32_types.h +++ b/arch/x86/include/asm/pgtable_32_types.h @@ -47,9 +47,12 @@ extern bool __vmalloc_start_set; /* set once high_memory is set */ #define CPU_ENTRY_AREA_BASE \ ((FIXADDR_START - PAGE_SIZE * (CPU_ENTRY_AREA_PAGES + 1)) & PMD_MASK) -#define PKMAP_BASE \ +#define LDT_BASE_ADDR \ ((CPU_ENTRY_AREA_BASE - PAGE_SIZE) & PMD_MASK) +#define PKMAP_BASE \ + ((LDT_BASE_ADDR - PAGE_SIZE) & PMD_MASK) + #ifdef CONFIG_HIGHMEM # define VMALLOC_END (PKMAP_BASE - 2 * PAGE_SIZE) #else -- 2.13.6 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wr0-f199.google.com (mail-wr0-f199.google.com [209.85.128.199]) by kanga.kvack.org (Postfix) with ESMTP id 227C96B025F for ; Fri, 19 Jan 2018 10:29:00 -0500 (EST) Received: by mail-wr0-f199.google.com with SMTP id y111so1468248wrc.2 for ; Fri, 19 Jan 2018 07:29:00 -0800 (PST) Received: from theia.8bytes.org (8bytes.org. [2a01:238:4383:600:38bc:a715:4b6d:a889]) by mx.google.com with ESMTPS id r44si611841edd.42.2018.01.16.08.39.20 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 16 Jan 2018 08:39:20 -0800 (PST) From: Joerg Roedel Subject: [PATCH 03/16] x86/entry/32: Leave the kernel via the trampoline stack Date: Tue, 16 Jan 2018 17:36:46 +0100 Message-Id: <1516120619-1159-4-git-send-email-joro@8bytes.org> In-Reply-To: <1516120619-1159-1-git-send-email-joro@8bytes.org> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> Sender: owner-linux-mm@kvack.org List-ID: To: Thomas Gleixner , Ingo Molnar , "H .
Peter Anvin"
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds, Andy Lutomirski, Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra, Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst, David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH, Will Deacon, aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli, Waiman Long, jroedel@suse.de, joro@8bytes.org

From: Joerg Roedel

Switch back to the trampoline stack before returning to
userspace.

Signed-off-by: Joerg Roedel
---
 arch/x86/entry/entry_32.S        | 58 ++++++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/asm-offsets_32.c |  1 +
 2 files changed, 59 insertions(+)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 5a7bdb73be9f..14018eeb11c3 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -263,6 +263,61 @@
 .endm
 
 /*
+ * Switch back from the kernel stack to the entry stack.
+ *
+ * iret_frame > 0 adds code to copie over an iret frame from the old to
+ * the new stack. It also adds a check which bails out if
+ * we are not returning to user-space.
+ *
+ * This macro is allowed not modify eflags when iret_frame == 0.
+ */
+.macro SWITCH_TO_ENTRY_STACK iret_frame=0
+	.if \iret_frame > 0
+	/* Are we returning to userspace? */
+	testb	$3, 4(%esp) /* return CS */
+	jz	.Lend_\@
+	.endif
+
+	/*
+	 * We run with user-%fs already loaded from pt_regs, so we don't
+	 * have access to per_cpu data anymore, and there is no swapgs
+	 * equivalent on x86_32.
+	 * We work around this by loading the kernel-%fs again and
+	 * reading the entry stack address from there. Then we restore
+	 * the user-%fs and return.
+	 */
+	pushl	%fs
+	pushl	%edi
+
+	/* Re-load kernel-%fs, after that we can use PER_CPU_VAR */
+	movl	$(__KERNEL_PERCPU), %edi
+	movl	%edi, %fs
+
+	/* Save old stack pointer to copy the return frame over if needed */
+	movl	%esp, %edi
+	movl	PER_CPU_VAR(cpu_tss_rw + TSS_sp0), %esp
+
+	/* Now we are on the entry stack */
+
+	.if \iret_frame > 0
+	/* Stack frame: ss, esp, eflags, cs, eip, fs, edi */
+	pushl	6*4(%edi) /* ss */
+	pushl	5*4(%edi) /* esp */
+	pushl	4*4(%edi) /* eflags */
+	pushl	3*4(%edi) /* cs */
+	pushl	2*4(%edi) /* eip */
+	.endif
+
+	pushl	4(%edi) /* fs */
+
+	/* Restore user %edi and user %fs */
+	movl	(%edi), %edi
+	popl	%fs
+
+.Lend_\@:
+.endm
+
+/*
  * %eax: prev task
  * %edx: next task
  */
@@ -512,6 +567,8 @@ ENTRY(entry_SYSENTER_32)
 	btr	$X86_EFLAGS_IF_BIT, (%esp)
 	popfl
 
+	SWITCH_TO_ENTRY_STACK
+
 	/*
 	 * Return back to the vDSO, which will pop ecx and edx.
 	 * Don't bother with DS and ES (they already contain __USER_DS).
@@ -601,6 +658,7 @@ restore_all:
 .Lrestore_nocheck:
 	RESTORE_REGS 4				# skip orig_eax/error_code
 .Lirq_return:
+	SWITCH_TO_ENTRY_STACK iret_frame=1
 	INTERRUPT_RETURN
 
 .section .fixup, "ax"
diff --git a/arch/x86/kernel/asm-offsets_32.c b/arch/x86/kernel/asm-offsets_32.c
index 7270dd834f4b..b628f898edd2 100644
--- a/arch/x86/kernel/asm-offsets_32.c
+++ b/arch/x86/kernel/asm-offsets_32.c
@@ -50,6 +50,7 @@ void foo(void)
 	DEFINE(TSS_sysenter_stack, offsetof(struct cpu_entry_area, tss.x86_tss.sp1) -
 	       offsetofend(struct cpu_entry_area, entry_stack_page.stack));
 
+	OFFSET(TSS_sp0, tss_struct, x86_tss.sp0);
 	OFFSET(TSS_sp1, tss_struct, x86_tss.sp1);
 
 #ifdef CONFIG_CC_STACKPROTECTOR
-- 
2.13.6
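Editor's illustration (not kernel code): SWITCH_TO_ENTRY_STACK copies the iret frame by indexing off the saved %edi, so its offsets are easy to misread. The C model below treats the stacks as arrays of 32-bit slots and replays the macro's pushes for the iret_frame=1 path. The `ex_` names and the slot count are hypothetical, and the model deliberately ignores the user-mode check and the %fs reload.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EX_SLOTS 16

/* A downward-growing stack of 32-bit slots; sp == EX_SLOTS means empty. */
struct ex_stack {
	uint32_t slot[EX_SLOTS];
	size_t sp;
};

static void ex_push(struct ex_stack *s, uint32_t v)	/* pushl */
{
	assert(s->sp > 0);
	s->slot[--s->sp] = v;
}

static uint32_t ex_pop(struct ex_stack *s)		/* popl */
{
	assert(s->sp < EX_SLOTS);
	return s->slot[s->sp++];
}

/*
 * old[] is the kernel stack as %edi sees it after "pushl %fs; pushl %edi":
 * old[0] = user edi, old[1] = user fs, old[2..6] = eip, cs, eflags, esp, ss.
 * Copies ss..eip, then fs, then pops fs again, as the macro does.
 * Returns the restored user %edi and stores the popped %fs in *fs.
 */
static uint32_t ex_switch_to_entry_stack(const uint32_t old[7],
					 struct ex_stack *entry, uint32_t *fs)
{
	for (int i = 6; i >= 2; i--)	/* pushl i*4(%edi): ss first, eip last */
		ex_push(entry, old[i]);
	ex_push(entry, old[1]);		/* pushl 4(%edi): saved user %fs */
	*fs = ex_pop(entry);		/* popl %fs */
	return old[0];			/* movl (%edi), %edi */
}
```

After the call the entry stack holds eip, cs, eflags, esp and ss from low to high address, which is the frame layout iret consumes, and the temporary %fs slot has been popped again.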
From mboxrd@z Thu Jan 1 00:00:00 1970
From: Joerg Roedel
Subject: [RFC PATCH 00/16] PTI support for x86-32
Date: Tue, 16 Jan 2018 17:36:43 +0100
Message-Id: <1516120619-1159-1-git-send-email-joro@8bytes.org>
To: Thomas Gleixner, Ingo Molnar, "H . Peter Anvin"
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds, Andy Lutomirski, Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra, Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst, David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH, Will Deacon, aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli, Waiman Long, jroedel@suse.de, joro@8bytes.org

From: Joerg Roedel

Hi,

here is my current WIP code to enable PTI on x86-32. It is
still in a pretty early state, but it successfully boots my
KVM guest with PAE and with legacy paging. The existing PTI
code for x86-64 already prepares a lot of the stuff needed
for 32 bit too, thanks for that to all the people involved
in its development :)

The patches are split as follows:

	- 1-3 contain the entry-code changes to enter and
	  exit the kernel via the sysenter trampoline stack.

	- 4-7 are fixes to get the code to compile on 32 bit
	  with CONFIG_PAGE_TABLE_ISOLATION=y.

	- 8-14 adapt the existing PTI code to work properly
	  on 32 bit and add the needed parts to 32 bit
	  page-table code.

	- 15 switches PTI on by adding the CR3 switches to
	  kernel entry/exit.

	- 16 enables the Kconfig for all of X86

The code has not run on bare-metal yet, I'll test that in
the next days once I set up a 32 bit box again. I also haven't
tested Wine and DosEMU yet, so this might also be broken.

With that post I'd like to ask for all kinds of constructive
feedback on the approaches I have taken and of course the
many things I broke with it :)

One of the things that are surely broken is XEN_PV support.
I'd appreciate any help with testing and bugfixing on that
front.

So please review and let me know your thoughts.

Thanks,

	Joerg

Joerg Roedel (16):
  x86/entry/32: Rename TSS_sysenter_sp0 to TSS_sysenter_stack
  x86/entry/32: Enter the kernel via trampoline stack
  x86/entry/32: Leave the kernel via the trampoline stack
  x86/pti: Define X86_CR3_PTI_PCID_USER_BIT on x86_32
  x86/pgtable: Move pgdp kernel/user conversion functions to pgtable.h
  x86/mm/ldt: Reserve high address-space range for the LDT
  x86/mm: Move two more functions from pgtable_64.h to pgtable.h
  x86/pgtable/32: Allocate 8k page-tables when PTI is enabled
  x86/mm/pti: Clone CPU_ENTRY_AREA on PMD level on x86_32
  x86/mm/pti: Populate valid user pud entries
  x86/mm/pgtable: Move pti_set_user_pgd() to pgtable.h
  x86/mm/pae: Populate the user page-table with user pgd's
  x86/mm/pti: Add an overflow check to pti_clone_pmds()
  x86/mm/legacy: Populate the user page-table with user pgd's
  x86/entry/32: Switch between kernel and user cr3 on entry/exit
  x86/pti: Allow CONFIG_PAGE_TABLE_ISOLATION for x86_32

 arch/x86/entry/entry_32.S               | 170 +++++++++++++++++++++++++++++---
 arch/x86/include/asm/pgtable-2level.h   |   3 +
 arch/x86/include/asm/pgtable-3level.h   |   3 +
 arch/x86/include/asm/pgtable.h          |  88 +++++++++++++++++
 arch/x86/include/asm/pgtable_32_types.h |   5 +-
 arch/x86/include/asm/pgtable_64.h       |  85 ----------------
 arch/x86/include/asm/processor-flags.h  |   8 +-
 arch/x86/include/asm/switch_to.h        |   6 +-
 arch/x86/kernel/asm-offsets_32.c        |   5 +-
 arch/x86/kernel/cpu/common.c            |   5 +-
 arch/x86/kernel/head_32.S               |  23 ++++-
 arch/x86/kernel/process.c               |   2 -
 arch/x86/kernel/process_32.c            |   6 ++
 arch/x86/mm/pgtable.c                   |  11 ++-
 arch/x86/mm/pti.c                       |  34 ++++++-
 security/Kconfig                        |   2 +-
 16 files changed, 333 insertions(+), 123 deletions(-)

-- 
2.13.6

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Joerg Roedel
Subject: [PATCH 04/16] x86/pti: Define X86_CR3_PTI_PCID_USER_BIT on x86_32
Date: Tue, 16 Jan 2018 17:36:47 +0100
Message-Id: <1516120619-1159-5-git-send-email-joro@8bytes.org>
In-Reply-To: <1516120619-1159-1-git-send-email-joro@8bytes.org>
References: <1516120619-1159-1-git-send-email-joro@8bytes.org>
To: Thomas Gleixner, Ingo Molnar, "H .
Peter Anvin"
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds, Andy Lutomirski, Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra, Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst, David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH, Will Deacon, aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli, Waiman Long, jroedel@suse.de, joro@8bytes.org

From: Joerg Roedel

Move it out of the X86_64 specific processor defines so
that it's visible for 32-bit too.

Signed-off-by: Joerg Roedel
---
 arch/x86/include/asm/processor-flags.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/processor-flags.h b/arch/x86/include/asm/processor-flags.h
index 625a52a5594f..02c2cbda4a74 100644
--- a/arch/x86/include/asm/processor-flags.h
+++ b/arch/x86/include/asm/processor-flags.h
@@ -39,10 +39,6 @@
 #define CR3_PCID_MASK	0xFFFull
 #define CR3_NOFLUSH	BIT_ULL(63)
 
-#ifdef CONFIG_PAGE_TABLE_ISOLATION
-# define X86_CR3_PTI_PCID_USER_BIT	11
-#endif
-
 #else
 /*
  * CR3_ADDR_MASK needs at least bits 31:5 set on PAE systems, and we save
@@ -53,4 +49,8 @@
 #define CR3_NOFLUSH	0
 #endif
 
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+# define X86_CR3_PTI_PCID_USER_BIT	11
+#endif
+
 #endif /* _ASM_X86_PROCESSOR_FLAGS_H */
-- 
2.13.6

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Joerg Roedel
Subject: [PATCH 02/16] x86/entry/32: Enter the kernel via trampoline stack
Date: Tue, 16 Jan 2018 17:36:45 +0100
Message-Id: <1516120619-1159-3-git-send-email-joro@8bytes.org>
In-Reply-To: <1516120619-1159-1-git-send-email-joro@8bytes.org>
References: <1516120619-1159-1-git-send-email-joro@8bytes.org>
To: Thomas Gleixner, Ingo Molnar, "H . Peter Anvin"
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds, Andy Lutomirski, Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra, Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst, David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH, Will Deacon, aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli, Waiman Long, jroedel@suse.de, joro@8bytes.org

From: Joerg Roedel

Use the sysenter stack as a trampoline stack to enter the
kernel. The sysenter stack is already in the cpu_entry_area
and will be mapped to userspace when PTI is enabled.

Signed-off-by: Joerg Roedel
---
 arch/x86/entry/entry_32.S        | 89 +++++++++++++++++++++++++++++++++++-----
 arch/x86/include/asm/switch_to.h |  6 +--
 arch/x86/kernel/asm-offsets_32.c |  4 +-
 arch/x86/kernel/cpu/common.c     |  5 ++-
 arch/x86/kernel/process.c        |  2 -
 arch/x86/kernel/process_32.c     |  6 +++
 6 files changed, 91 insertions(+), 21 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index eb8c5615777b..5a7bdb73be9f 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -222,6 +222,47 @@
 .endm
 
 /*
+ * Switch from the entry-trampline stack to the kernel stack of the
+ * running task.
+ *
+ * nr_regs is the number of dwords to push from the entry stack to the
+ * task stack. If it is > 0 it expects an irq frame at the bottom of the
+ * stack.
+ *
+ * check_user != 0 it will add a check to only switch stacks if the
+ * kernel entry was from user-space.
+ */
+.macro SWITCH_TO_KERNEL_STACK nr_regs=0 check_user=0
+
+	.if \check_user > 0 && \nr_regs > 0
+	testb $3, (\nr_regs - 4)*4(%esp)		/* CS */
+	jz .Lend_\@
+	.endif
+
+	pushl %edi
+	movl  %esp, %edi
+
+	/*
+	 * TSS_sysenter_stack is the offset from the bottom of the
+	 * entry-stack
+	 */
+	movl  TSS_sysenter_stack + ((\nr_regs + 1) * 4)(%esp), %esp
+
+	/* Copy the registers over */
+	.if \nr_regs > 0
+	i = 0
+	.rept \nr_regs
+	pushl (\nr_regs - i) * 4(%edi)
+	i = i + 1
+	.endr
+	.endif
+
+	mov (%edi), %edi
+
+.Lend_\@:
+.endm
+
+/*
  * %eax: prev task
  * %edx: next task
  */
@@ -401,7 +442,9 @@ ENTRY(xen_sysenter_target)
  * 0(%ebp) arg6
  */
 ENTRY(entry_SYSENTER_32)
-	movl	TSS_sysenter_stack(%esp), %esp
+	/* Kernel stack is empty */
+	SWITCH_TO_KERNEL_STACK
+
 .Lsysenter_past_esp:
 	pushl	$__USER_DS		/* pt_regs->ss */
 	pushl	%ebp			/* pt_regs->sp (stashed in bp) */
@@ -521,6 +564,10 @@ ENDPROC(entry_SYSENTER_32)
 ENTRY(entry_INT80_32)
 	ASM_CLAC
 	pushl	%eax			/* pt_regs->orig_ax */
+
+	/* Stack layout: ss, esp, eflags, cs, eip, orig_eax */
+	SWITCH_TO_KERNEL_STACK nr_regs=6 check_user=1
+
 	SAVE_ALL pt_regs_ax=$-ENOSYS	/* save rest */
 
 	/*
@@ -655,6 +702,10 @@ END(irq_entries_start)
 common_interrupt:
 	ASM_CLAC
 	addl	$-0x80, (%esp)			/* Adjust vector into the [-256, -1] range */
+
+	/* Stack layout: ss, esp, eflags, cs, eip, vector */
+	SWITCH_TO_KERNEL_STACK nr_regs=6 check_user=1
+
 	SAVE_ALL
 	ENCODE_FRAME_POINTER
 	TRACE_IRQS_OFF
@@ -663,16 +714,17 @@ common_interrupt:
 	jmp	ret_from_intr
 ENDPROC(common_interrupt)
 
-#define BUILD_INTERRUPT3(name, nr, fn)	\
-ENTRY(name)				\
-	ASM_CLAC;			\
-	pushl	$~(nr);			\
-	SAVE_ALL;			\
-	ENCODE_FRAME_POINTER;		\
-	TRACE_IRQS_OFF			\
-	movl	%esp, %eax;		\
-	call	fn;			\
-	jmp	ret_from_intr;		\
+#define BUILD_INTERRUPT3(name, nr, fn)			\
+ENTRY(name)						\
+	ASM_CLAC;					\
+	pushl	$~(nr);					\
+	SWITCH_TO_KERNEL_STACK nr_regs=6 check_user=1;	\
+	SAVE_ALL;					\
+	ENCODE_FRAME_POINTER;				\
+	TRACE_IRQS_OFF					\
+	movl	%esp, %eax;				\
+	call	fn;					\
+	jmp	ret_from_intr;				\
 ENDPROC(name)
 
 #define BUILD_INTERRUPT(name, nr)		\
@@ -893,6 +945,9 @@ ENTRY(page_fault)
 END(page_fault)
 
 common_exception:
+	/* Stack layout: ss, esp, eflags, cs, eip, error_code, handler */
+	SWITCH_TO_KERNEL_STACK nr_regs=7 check_user=1
+
 	/* the function address is in %gs's slot on the stack */
 	pushl	%fs
 	pushl	%es
@@ -936,6 +991,10 @@ ENTRY(debug)
 	 */
 	ASM_CLAC
 	pushl	$-1				# mark this as an int
+
+	/* Stack layout: ss, esp, eflags, cs, eip, $-1 */
+	SWITCH_TO_KERNEL_STACK nr_regs=6 check_user=1
+
 	SAVE_ALL
 	ENCODE_FRAME_POINTER
 	xorl	%edx, %edx			# error code 0
@@ -971,6 +1030,10 @@ END(debug)
  */
 ENTRY(nmi)
 	ASM_CLAC
+
+	/* Stack layout: ss, esp, eflags, cs, eip */
+	SWITCH_TO_KERNEL_STACK nr_regs=5 check_user=1
+
 #ifdef CONFIG_X86_ESPFIX32
 	pushl	%eax
 	movl	%ss, %eax
@@ -1034,6 +1097,10 @@ END(nmi)
 ENTRY(int3)
 	ASM_CLAC
 	pushl	$-1				# mark this as an int
+
+	/* Stack layout: ss, esp, eflags, cs, eip, vector */
+	SWITCH_TO_KERNEL_STACK nr_regs=6 check_user=1
+
 	SAVE_ALL
 	ENCODE_FRAME_POINTER
 	TRACE_IRQS_OFF
diff --git a/arch/x86/include/asm/switch_to.h b/arch/x86/include/asm/switch_to.h
index eb5f7999a893..20e5f7ab8260 100644
--- a/arch/x86/include/asm/switch_to.h
+++ b/arch/x86/include/asm/switch_to.h
@@ -89,13 +89,9 @@ static inline void refresh_sysenter_cs(struct thread_struct *thread)
 /* This is used when switching tasks or entering/exiting vm86 mode. */
 static inline void update_sp0(struct task_struct *task)
 {
-	/* On x86_64, sp0 always points to the entry trampoline stack, which is constant: */
-#ifdef CONFIG_X86_32
-	load_sp0(task->thread.sp0);
-#else
+	/* sp0 always points to the entry trampoline stack, which is constant: */
 	if (static_cpu_has(X86_FEATURE_XENPV))
 		load_sp0(task_top_of_stack(task));
-#endif
 }
 
 #endif /* _ASM_X86_SWITCH_TO_H */
diff --git a/arch/x86/kernel/asm-offsets_32.c b/arch/x86/kernel/asm-offsets_32.c
index 654229bac2fc..7270dd834f4b 100644
--- a/arch/x86/kernel/asm-offsets_32.c
+++ b/arch/x86/kernel/asm-offsets_32.c
@@ -47,9 +47,11 @@ void foo(void)
 	BLANK();
 
 	/* Offset from the sysenter stack to tss.sp0 */
-	DEFINE(TSS_sysenter_stack, offsetof(struct cpu_entry_area, tss.x86_tss.sp0) -
+	DEFINE(TSS_sysenter_stack, offsetof(struct cpu_entry_area, tss.x86_tss.sp1) -
 	       offsetofend(struct cpu_entry_area, entry_stack_page.stack));
 
+	OFFSET(TSS_sp1, tss_struct, x86_tss.sp1);
+
 #ifdef CONFIG_CC_STACKPROTECTOR
 	BLANK();
 	OFFSET(stack_canary_offset, stack_canary, canary);
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index ef29ad001991..20a71c914e59 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1649,11 +1649,12 @@ void cpu_init(void)
 	enter_lazy_tlb(&init_mm, curr);
 
 	/*
-	 * Initialize the TSS. Don't bother initializing sp0, as the initial
-	 * task never enters user mode.
+	 * Initialize the TSS. sp0 points to the entry trampoline stack
+	 * regardless of what task is running.
 	 */
 	set_tss_desc(cpu, &get_cpu_entry_area(cpu)->tss.x86_tss);
 	load_TR_desc();
+	load_sp0((unsigned long)(cpu_entry_stack(cpu) + 1));
 
 	load_mm_ldt(&init_mm);
 
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 832a6acd730f..a9950946b263 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -57,14 +57,12 @@ __visible DEFINE_PER_CPU_PAGE_ALIGNED(struct tss_struct, cpu_tss_rw) = {
 		 */
 		.sp0 = (1UL << (BITS_PER_LONG-1)) + 1,
 
-#ifdef CONFIG_X86_64
 		/*
 		 * .sp1 is cpu_current_top_of_stack. The init task never
 		 * runs user code, but cpu_current_top_of_stack should still
 		 * be well defined before the first context switch.
 		 */
 		.sp1 = TOP_OF_INIT_STACK,
-#endif
 
 #ifdef CONFIG_X86_32
 		.ss0 = __KERNEL_DS,
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 5224c6099184..452eeac00b80 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -292,6 +292,12 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	this_cpu_write(cpu_current_top_of_stack,
 		       (unsigned long)task_stack_page(next_p) +
 		       THREAD_SIZE);
+	/*
+	 * TODO: Find a way to let cpu_current_top_of_stack point to
+	 * cpu_tss_rw.x86_tss.sp1. Doing so now results in stack corruption with
+	 * iret exceptions.
+	 */
+	this_cpu_write(cpu_tss_rw.x86_tss.sp1, next_p->thread.sp0);
 
 	/*
 	 * Restore %gs if needed (which is common)
-- 
2.13.6

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Joerg Roedel
Subject: [PATCH 01/16] x86/entry/32: Rename TSS_sysenter_sp0 to TSS_sysenter_stack
Date: Tue, 16 Jan 2018 17:36:44 +0100
Message-Id: <1516120619-1159-2-git-send-email-joro@8bytes.org>
In-Reply-To: <1516120619-1159-1-git-send-email-joro@8bytes.org>
References: <1516120619-1159-1-git-send-email-joro@8bytes.org>
To: Thomas Gleixner, Ingo Molnar, "H . Peter Anvin"
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds, Andy Lutomirski, Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra, Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst, David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH, Will Deacon, aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli, Waiman Long, jroedel@suse.de, joro@8bytes.org

From: Joerg Roedel

The stack address doesn't need to be stored in tss.sp0 if
we switch manually like on sysenter. Rename the offset so
that it still makes sense when we change its location.
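Editor's illustration (not part of the patch): TSS_sysenter_stack is a distance rather than an address. SYSENTER leaves %esp at the top of the empty entry stack, and adding the offset lands on the TSS field that holds the task stack pointer. The structures below are a hypothetical, heavily simplified stand-in for struct cpu_entry_area (the real one has more members and different sizes); the sketch only shows why `movl TSS_sysenter_stack(%esp), %esp` reads the intended field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified layout: entry stack followed by a TSS-like struct. */
struct ex_entry_stack {
	uint32_t words[64];
};

struct ex_tss {
	uint32_t link;	/* stands in for the fields before sp0 */
	uint32_t sp0;
	uint32_t ss0;
	uint32_t sp1;
};

struct ex_cpu_entry_area {
	struct ex_entry_stack entry_stack;
	struct ex_tss tss;
};

/* Same idea as the kernel's offsetofend(): offset of the first byte after m. */
#define ex_offsetofend(T, m) (offsetof(T, m) + sizeof(((T *)0)->m))

/* Mirrors the DEFINE() in asm-offsets_32.c: distance from the top of the
 * entry stack to tss.sp0. */
#define EX_TSS_SYSENTER_STACK \
	(offsetof(struct ex_cpu_entry_area, tss.sp0) - \
	 ex_offsetofend(struct ex_cpu_entry_area, entry_stack))

/* Models "movl TSS_sysenter_stack(%esp), %esp" with %esp at the top of the
 * empty entry stack: read the 32-bit value found that many bytes above it. */
static uint32_t ex_sysenter_load_sp(const struct ex_cpu_entry_area *cea)
{
	const unsigned char *base = (const unsigned char *)cea;
	const unsigned char *esp =
		base + ex_offsetofend(struct ex_cpu_entry_area, entry_stack);
	uint32_t sp;

	/* copy byte-wise instead of casting, to stay clear of aliasing rules */
	memcpy(&sp, esp + EX_TSS_SYSENTER_STACK, sizeof(sp));
	return sp;
}
```

Because the constant is only a distance, pointing it at a different TSS field (PATCH 02 later switches it to tss.x86_tss.sp1) changes nothing at the use site, which is what the neutral name is meant to reflect.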
Signed-off-by: Joerg Roedel
---
 arch/x86/entry/entry_32.S        | 2 +-
 arch/x86/kernel/asm-offsets_32.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index a1f28a54f23a..eb8c5615777b 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -401,7 +401,7 @@ ENTRY(xen_sysenter_target)
  * 0(%ebp) arg6
  */
 ENTRY(entry_SYSENTER_32)
-	movl	TSS_sysenter_sp0(%esp), %esp
+	movl	TSS_sysenter_stack(%esp), %esp
 .Lsysenter_past_esp:
 	pushl	$__USER_DS		/* pt_regs->ss */
 	pushl	%ebp			/* pt_regs->sp (stashed in bp) */
diff --git a/arch/x86/kernel/asm-offsets_32.c b/arch/x86/kernel/asm-offsets_32.c
index fa1261eefa16..654229bac2fc 100644
--- a/arch/x86/kernel/asm-offsets_32.c
+++ b/arch/x86/kernel/asm-offsets_32.c
@@ -47,7 +47,7 @@ void foo(void)
 	BLANK();
 
 	/* Offset from the sysenter stack to tss.sp0 */
-	DEFINE(TSS_sysenter_sp0, offsetof(struct cpu_entry_area, tss.x86_tss.sp0) -
+	DEFINE(TSS_sysenter_stack, offsetof(struct cpu_entry_area, tss.x86_tss.sp0) -
 	       offsetofend(struct cpu_entry_area, entry_stack_page.stack));
 
 #ifdef CONFIG_CC_STACKPROTECTOR
-- 
2.13.6

From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 17 Jan 2018 10:33:31 +0100
From: Joerg Roedel
Subject: Re: [RFC PATCH 00/16] PTI support for x86-32
Message-ID: <20180117093331.GL28161@8bytes.org>
References: <1516120619-1159-1-git-send-email-joro@8bytes.org>
To: Andy Lutomirski
Cc: Thomas Gleixner, Ingo Molnar, "H . Peter Anvin", X86 ML, LKML, linux-mm@kvack.org, Linus Torvalds, Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra, Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst, David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH, Will Deacon, "Liguori, Anthony", Daniel Gruss, Hugh Dickins, Kees Cook, Andrea Arcangeli, Waiman Long, Joerg Roedel

Hi Andy,

thanks a lot for your review and input, especially on the entry-code
changes!

On Tue, Jan 16, 2018 at 02:26:22PM -0800, Andy Lutomirski wrote:
> On Tue, Jan 16, 2018 at 8:36 AM, Joerg Roedel wrote:
> > The code has not run on bare-metal yet, I'll test that in
> > the next days once I setup a 32 bit box again. I also havn't
> > tested Wine and DosEMU yet, so this might also be broken.
> >
>
> If you pass all the x86 selftests, then Wine and DOSEMU are pretty
> likely to work :)

Okay, good to know. I will definitely run them and make them pass :)


	Joerg
From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 17 Jan 2018 10:02:38 +0100
From: Joerg Roedel
Subject: Re: [PATCH 02/16] x86/entry/32: Enter the kernel via trampoline stack
Message-ID: <20180117090238.GH28161@8bytes.org>
References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1516120619-1159-3-git-send-email-joro@8bytes.org> <476d7100-2414-d09e-abf1-5aa4d369a3b7@oracle.com>
In-Reply-To: <476d7100-2414-d09e-abf1-5aa4d369a3b7@oracle.com>
To: Boris Ostrovsky
Cc: Thomas Gleixner, Ingo Molnar, "H . Peter Anvin", x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds, Andy Lutomirski, Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra, Borislav Petkov, Jiri Kosina, Brian Gerst, David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH, Will Deacon, aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli, Waiman Long, jroedel@suse.de

Hi Boris,

thanks for testing this :)

On Tue, Jan 16, 2018 at 09:47:06PM -0500, Boris Ostrovsky wrote:
> On 01/16/2018 11:36 AM, Joerg Roedel wrote:
> >+.macro SWITCH_TO_KERNEL_STACK nr_regs=0 check_user=0
> >
>
> This (and next patch's SWITCH_TO_ENTRY_STACK) need X86_FEATURE_PTI check.
>
> With those macros fixed I was able to boot 32-bit Xen PV guest.

Hmm, on bare metal the stack switch happens regardless of the
X86_FEATURE_PTI feature being set, because we always program tss.sp0
with the sysenter stack. How is the kernel entry stack set up on
xen-pv? I think something is missing there instead.


Regards,

	Joerg

From mboxrd@z Thu Jan 1 00:00:00 1970
In-Reply-To: <20180119095523.GY28161@8bytes.org>
References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1516120619-1159-3-git-send-email-joro@8bytes.org> <20180117091853.GI28161@8bytes.org> <20180119095523.GY28161@8bytes.org>
From: Andy Lutomirski
Date: Fri, 19 Jan 2018 08:30:33 -0800
Subject: Re: [PATCH 02/16] x86/entry/32: Enter the kernel via trampoline stack
To:
 Joerg Roedel
Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, "H . Peter Anvin", X86 ML, LKML, Linux-MM, Linus Torvalds, Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra, Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst, David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH, Will Deacon, "Liguori, Anthony", Daniel Gruss, Hugh Dickins, Kees Cook, Andrea Arcangeli, Waiman Long, Joerg Roedel

On Fri, Jan 19, 2018 at 1:55 AM, Joerg Roedel wrote:
> Hey Andy,
>
> On Wed, Jan 17, 2018 at 10:10:23AM -0800, Andy Lutomirski wrote:
>> On Wed, Jan 17, 2018 at 1:18 AM, Joerg Roedel wrote:
>
>> > Just read up on vm86 mode control transfers and the stack layout then.
>> > Looks like I need to check for eflags.vm=1 and copy four more registers
>> > from/to the entry stack. Thanks for pointing that out.
>>
>> You could just copy those slots unconditionally. After all, you're
>> slowing down entries by an epic amount due to writing CR3 on with PCID
>> off, so four words copied should be entirely lost in the noise. OTOH,
>> checking for VM86 mode is just a single bt against EFLAGS.
>>
>> With the modern (rewritten a year or two ago by Brian Gerst) vm86
>> code, all the slots (those actually in pt_regs) are in the same
>> location regardless of whether we're in VM86 mode or not, but we're
>> still fiddling with the bottom of the stack. Since you're controlling
>> the switch to the kernel thread stack, you can easily just write the
>> frame to the correct location, so you should not need to context
>> switch sp1 -- you can do it sanely and leave sp1 as the actual bottom
>> of the kernel stack no matter what. In fact, you could probably avoid
>> context switching sp0, either, which would be a nice cleanup.
>
> I am not sure what you mean by "not context switching sp0/sp1" ...

You're supposed to read what I meant, not what I said... I meant that
we could have sp0 have a genuinely constant value per cpu.
That means that the entry trampoline ends up with RIP, etc in a
different place depending on whether VM was in use, but the entry
trampoline code should be able to handle that. sp1 would have a value
that varies by task, but it could just point to the top of the stack
instead of being changed depending on whether VM is in use. Instead,
the entry trampoline would offset the registers as needed to keep
pt_regs in the right place.

I think you already figured all of that out, though :)

From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [RFC PATCH 00/16] PTI support for x86-32
From: Nadav Amit
In-Reply-To: <1516120619-1159-1-git-send-email-joro@8bytes.org>
Date: Sun, 21 Jan 2018 12:13:13 -0800
Message-Id: <5D89F55C-902A-4464-A64E-7157FF55FAD0@gmail.com>
References: <1516120619-1159-1-git-send-email-joro@8bytes.org>
To: Joerg Roedel
Cc: Thomas Gleixner, Ingo Molnar, "H .
Peter Anvin" , the arch/x86 maintainers , LKML , "open list:MEMORY MANAGEMENT" , Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long , jroedel@suse.de I am looking into PTI on x86-32, but I did not manage to get the PoC to work on this setup (KASLR disabled; a similar setup works on 64-bit). Did you use any PoC to “test” the protection? Thanks, Nadav Joerg Roedel wrote: > From: Joerg Roedel > > Hi, > > here is my current WIP code to enable PTI on x86-32. It is > still in a pretty early state, but it successfully boots my > KVM guest with PAE and with legacy paging. The existing PTI > code for x86-64 already prepares a lot of the stuff needed > for 32 bit too, thanks for that to all the people involved > in its development :) > > The patches are split as follows: > > - 1-3 contain the entry-code changes to enter and > exit the kernel via the sysenter trampoline stack. > > - 4-7 are fixes to get the code to compile on 32 bit > with CONFIG_PAGE_TABLE_ISOLATION=y. > > - 8-14 adapt the existing PTI code to work properly > on 32 bit and add the needed parts to 32 bit > page-table code. > > - 15 switches PTI on by adding the CR3 switches to > kernel entry/exit. > > - 16 enables the Kconfig for all of X86 > > The code has not run on bare-metal yet, I'll test that in > the next days once I set up a 32 bit box again. I also haven't > tested Wine and DosEMU yet, so this might also be broken. > > With that post I'd like to ask for all kinds of constructive > feedback on the approaches I have taken and of course the > many things I broke with it :) > > One of the things that are surely broken is XEN_PV support.
> I'd appreciate any help with testing and bugfixing on that > front. > > So please review and let me know your thoughts. > > Thanks, > > Joerg > > Joerg Roedel (16): > x86/entry/32: Rename TSS_sysenter_sp0 to TSS_sysenter_stack > x86/entry/32: Enter the kernel via trampoline stack > x86/entry/32: Leave the kernel via the trampoline stack > x86/pti: Define X86_CR3_PTI_PCID_USER_BIT on x86_32 > x86/pgtable: Move pgdp kernel/user conversion functions to pgtable.h > x86/mm/ldt: Reserve high address-space range for the LDT > x86/mm: Move two more functions from pgtable_64.h to pgtable.h > x86/pgtable/32: Allocate 8k page-tables when PTI is enabled > x86/mm/pti: Clone CPU_ENTRY_AREA on PMD level on x86_32 > x86/mm/pti: Populate valid user pud entries > x86/mm/pgtable: Move pti_set_user_pgd() to pgtable.h > x86/mm/pae: Populate the user page-table with user pgd's > x86/mm/pti: Add an overflow check to pti_clone_pmds() > x86/mm/legacy: Populate the user page-table with user pgd's > x86/entry/32: Switch between kernel and user cr3 on entry/exit > x86/pti: Allow CONFIG_PAGE_TABLE_ISOLATION for x86_32 > > arch/x86/entry/entry_32.S | 170 +++++++++++++++++++++++++++++--- > arch/x86/include/asm/pgtable-2level.h | 3 + > arch/x86/include/asm/pgtable-3level.h | 3 + > arch/x86/include/asm/pgtable.h | 88 +++++++++++++++++ > arch/x86/include/asm/pgtable_32_types.h | 5 +- > arch/x86/include/asm/pgtable_64.h | 85 ---------------- > arch/x86/include/asm/processor-flags.h | 8 +- > arch/x86/include/asm/switch_to.h | 6 +- > arch/x86/kernel/asm-offsets_32.c | 5 +- > arch/x86/kernel/cpu/common.c | 5 +- > arch/x86/kernel/head_32.S | 23 ++++- > arch/x86/kernel/process.c | 2 - > arch/x86/kernel/process_32.c | 6 ++ > arch/x86/mm/pgtable.c | 11 ++- > arch/x86/mm/pti.c | 34 ++++++- > security/Kconfig | 2 +- > 16 files changed, 333 insertions(+), 123 deletions(-) > > -- > 2.13.6 > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to
> majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: email@kvack.org -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pf0-f199.google.com (mail-pf0-f199.google.com [209.85.192.199]) by kanga.kvack.org (Postfix) with ESMTP id 8B4E76B0003 for ; Sun, 21 Jan 2018 15:44:58 -0500 (EST) Received: by mail-pf0-f199.google.com with SMTP id b75so6914969pfk.22 for ; Sun, 21 Jan 2018 12:44:58 -0800 (PST) Received: from mail-sor-f65.google.com (mail-sor-f65.google.com. [209.85.220.65]) by mx.google.com with SMTPS id x124sor2790888pgx.240.2018.01.21.12.44.57 for (Google Transport Security); Sun, 21 Jan 2018 12:44:57 -0800 (PST) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 10.3 \(3273\)) Subject: Re: [RFC PATCH 00/16] PTI support for x86-32 From: Nadav Amit In-Reply-To: <5D89F55C-902A-4464-A64E-7157FF55FAD0@gmail.com> Date: Sun, 21 Jan 2018 12:44:53 -0800 Content-Transfer-Encoding: quoted-printable Message-Id: <886C924D-668F-4007-98CA-555DB6279E4F@gmail.com> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <5D89F55C-902A-4464-A64E-7157FF55FAD0@gmail.com> Sender: owner-linux-mm@kvack.org List-ID: To: Joerg Roedel Cc: Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" , the arch/x86 maintainers , LKML , "open list:MEMORY MANAGEMENT" , Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long , jroedel@suse.de Please ignore my previous email. I got it working… Sorry for the spam.
Nadav Amit wrote: > I am looking into PTI on x86-32, but I did not manage to get the PoC to work on > this setup (KASLR disabled; a similar setup works on 64-bit). > > Did you use any PoC to “test” the protection? > > Thanks, > Nadav > > > Joerg Roedel wrote: > >> From: Joerg Roedel >> >> Hi, >> >> here is my current WIP code to enable PTI on x86-32. It is >> still in a pretty early state, but it successfully boots my >> KVM guest with PAE and with legacy paging. The existing PTI >> code for x86-64 already prepares a lot of the stuff needed >> for 32 bit too, thanks for that to all the people involved >> in its development :) >> >> The patches are split as follows: >> >> - 1-3 contain the entry-code changes to enter and >> exit the kernel via the sysenter trampoline stack. >> >> - 4-7 are fixes to get the code to compile on 32 bit >> with CONFIG_PAGE_TABLE_ISOLATION=y. >> >> - 8-14 adapt the existing PTI code to work properly >> on 32 bit and add the needed parts to 32 bit >> page-table code. >> >> - 15 switches PTI on by adding the CR3 switches to >> kernel entry/exit. >> >> - 16 enables the Kconfig for all of X86 >> >> The code has not run on bare-metal yet, I'll test that in >> the next days once I set up a 32 bit box again. I also haven't >> tested Wine and DosEMU yet, so this might also be broken. >> >> With that post I'd like to ask for all kinds of constructive >> feedback on the approaches I have taken and of course the >> many things I broke with it :) >> >> One of the things that are surely broken is XEN_PV support. >> I'd appreciate any help with testing and bugfixing on that >> front. >> >> So please review and let me know your thoughts.
>> >> Thanks, >> >> Joerg >> >> Joerg Roedel (16): >> x86/entry/32: Rename TSS_sysenter_sp0 to TSS_sysenter_stack >> x86/entry/32: Enter the kernel via trampoline stack >> x86/entry/32: Leave the kernel via the trampoline stack >> x86/pti: Define X86_CR3_PTI_PCID_USER_BIT on x86_32 >> x86/pgtable: Move pgdp kernel/user conversion functions to pgtable.h >> x86/mm/ldt: Reserve high address-space range for the LDT >> x86/mm: Move two more functions from pgtable_64.h to pgtable.h >> x86/pgtable/32: Allocate 8k page-tables when PTI is enabled >> x86/mm/pti: Clone CPU_ENTRY_AREA on PMD level on x86_32 >> x86/mm/pti: Populate valid user pud entries >> x86/mm/pgtable: Move pti_set_user_pgd() to pgtable.h >> x86/mm/pae: Populate the user page-table with user pgd's >> x86/mm/pti: Add an overflow check to pti_clone_pmds() >> x86/mm/legacy: Populate the user page-table with user pgd's >> x86/entry/32: Switch between kernel and user cr3 on entry/exit >> x86/pti: Allow CONFIG_PAGE_TABLE_ISOLATION for x86_32 >> >> arch/x86/entry/entry_32.S | 170 +++++++++++++++++++++++++++++--- >> arch/x86/include/asm/pgtable-2level.h | 3 + >> arch/x86/include/asm/pgtable-3level.h | 3 + >> arch/x86/include/asm/pgtable.h | 88 +++++++++++++++++ >> arch/x86/include/asm/pgtable_32_types.h | 5 +- >> arch/x86/include/asm/pgtable_64.h | 85 ---------------- >> arch/x86/include/asm/processor-flags.h | 8 +- >> arch/x86/include/asm/switch_to.h | 6 +- >> arch/x86/kernel/asm-offsets_32.c | 5 +- >> arch/x86/kernel/cpu/common.c | 5 +- >> arch/x86/kernel/head_32.S | 23 ++++- >> arch/x86/kernel/process.c | 2 - >> arch/x86/kernel/process_32.c | 6 ++ >> arch/x86/mm/pgtable.c | 11 ++- >> arch/x86/mm/pti.c | 34 ++++++- >> security/Kconfig | 2 +- >> 16 files changed, 333 insertions(+), 123 deletions(-) >> >> -- >> 2.13.6 >> >> -- >> To unsubscribe, send a message with 'unsubscribe linux-mm' in >> the body to majordomo@kvack.org. For more info on Linux MM, >> see: http://www.linux-mm.org/ .
>> Don't email: email@kvack.org -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pg0-f69.google.com (mail-pg0-f69.google.com [74.125.83.69]) by kanga.kvack.org (Postfix) with ESMTP id 2377F6B0003 for ; Sun, 21 Jan 2018 18:46:30 -0500 (EST) Received: by mail-pg0-f69.google.com with SMTP id o16so7019505pgv.3 for ; Sun, 21 Jan 2018 15:46:30 -0800 (PST) Received: from mail-sor-f65.google.com (mail-sor-f65.google.com. [209.85.220.65]) by mx.google.com with SMTPS id k6sor3108058pgp.230.2018.01.21.15.46.28 for (Google Transport Security); Sun, 21 Jan 2018 15:46:28 -0800 (PST) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 10.3 \(3273\)) Subject: Re: [RFC PATCH 00/16] PTI support for x86-32 From: Nadav Amit In-Reply-To: <886C924D-668F-4007-98CA-555DB6279E4F@gmail.com> Date: Sun, 21 Jan 2018 15:46:24 -0800 Content-Transfer-Encoding: quoted-printable Message-Id: <9CF1DD34-7C66-4F11-856D-B5E896988E16@gmail.com> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <5D89F55C-902A-4464-A64E-7157FF55FAD0@gmail.com> <886C924D-668F-4007-98CA-555DB6279E4F@gmail.com> Sender: owner-linux-mm@kvack.org List-ID: To: Joerg Roedel Cc: Thomas Gleixner , Ingo Molnar , "H . 
Peter Anvin" , the arch/x86 maintainers , LKML , "open list:MEMORY MANAGEMENT" , Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long , jroedel@suse.de I wanted to see whether segments protection can be a replacement for PTI (yes, excluding SMEP emulation), or whether speculative execution “ignores” limit checks, similarly to the way paging protection is skipped. It does seem that segmentation provides sufficient protection from Meltdown. The “reliability” test of the Graz PoC fails if the segment limit is set to prevent access to the kernel memory. [ It passes if the limit is not set, even if the DS is reloaded. ] My test is enclosed below. So my question: wouldn’t it be much more efficient to use segmentation protection for x86-32, and allow users to choose whether they want SMEP-like protection if needed (and then enable PTI)? [ There might be some corner cases in which setting a segment limit introduces a problem, for example when modify_ldt() is used to set an invalid limit, but I presume that these are relatively uncommon, can be detected at runtime, and PTI can then be used as a fallback mechanism.
] Thanks, Nadav
-- >8 --
Subject: [PATCH] Test segmentation protection

---
 libkdump/libkdump.c | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/libkdump/libkdump.c b/libkdump/libkdump.c
index c590391..db5bac3 100644
--- a/libkdump/libkdump.c
+++ b/libkdump/libkdump.c
@@ -10,6 +10,9 @@
 #include
 #include
 #include
+#include
+#include
+#include
 
 libkdump_config_t libkdump_auto_config = {0};
 
@@ -500,6 +503,31 @@ int __attribute__((optimize("-Os"), noinline)) libkdump_read_tsx() {
   return 0;
 }
 
+extern int modify_ldt(int, void*, unsigned long);
+
+void change_ds(void)
+{
+	int r;
+	struct user_desc desc = {
+		.entry_number = 1,
+		.base_addr = 0,
+#ifdef NO_SEGMENTS
+		.limit = 0xffffeu,
+#else
+		.limit = 0xbffffu,
+#endif
+		.seg_32bit = 1,
+		.contents = 0,
+		.read_exec_only = 0,
+		.limit_in_pages = 1,
+		.seg_not_present = 0,
+	};
+
+	r = modify_ldt(1 /* write */, &desc, sizeof(desc));
+	assert(r == 0);
+	asm volatile ("mov %0, %%ds\n\t" : : "r"((1 << 3) | (1 << 2) | 3));
+}
+
 // ---------------------------------------------------------------------------
 int __attribute__((optimize("-Os"), noinline)) libkdump_read_signal_handler() {
   size_t retries = config.retries + 1;
@@ -507,6 +535,9 @@ int __attribute__((optimize("-Os"), noinline)) libkdump_read_signal_handler() {
 
   while (retries--) {
     if (!setjmp(buf)) {
+      /* longjmp reloads the original DS... */
+      change_ds();
+
       MELTDOWN;
     }
Nadav Amit wrote: > Please ignore my previous email. I got it working… Sorry for the spam. > > > Nadav Amit wrote: > >> I am looking into PTI on x86-32, but I did not manage to get the PoC to work on >> this setup (KASLR disabled; a similar setup works on 64-bit). >> >> Did you use any PoC to “test” the protection?
>> >> Thanks, >> Nadav >> >> >> Joerg Roedel wrote: >> >>> From: Joerg Roedel >>> >>> Hi, >>> >>> here is my current WIP code to enable PTI on x86-32. It is >>> still in a pretty early state, but it successfully boots my >>> KVM guest with PAE and with legacy paging. The existing PTI >>> code for x86-64 already prepares a lot of the stuff needed >>> for 32 bit too, thanks for that to all the people involved >>> in its development :) >>> >>> The patches are split as follows: >>> >>> - 1-3 contain the entry-code changes to enter and >>> exit the kernel via the sysenter trampoline stack. >>> >>> - 4-7 are fixes to get the code to compile on 32 bit >>> with CONFIG_PAGE_TABLE_ISOLATION=y. >>> >>> - 8-14 adapt the existing PTI code to work properly >>> on 32 bit and add the needed parts to 32 bit >>> page-table code. >>> >>> - 15 switches PTI on by adding the CR3 switches to >>> kernel entry/exit. >>> >>> - 16 enables the Kconfig for all of X86 >>> >>> The code has not run on bare-metal yet, I'll test that in >>> the next days once I set up a 32 bit box again. I also haven't >>> tested Wine and DosEMU yet, so this might also be broken. >>> >>> With that post I'd like to ask for all kinds of constructive >>> feedback on the approaches I have taken and of course the >>> many things I broke with it :) >>> >>> One of the things that are surely broken is XEN_PV support. >>> I'd appreciate any help with testing and bugfixing on that >>> front. >>> >>> So please review and let me know your thoughts.
>>> >>> Thanks, >>> >>> Joerg >>> >>> Joerg Roedel (16): >>> x86/entry/32: Rename TSS_sysenter_sp0 to TSS_sysenter_stack >>> x86/entry/32: Enter the kernel via trampoline stack >>> x86/entry/32: Leave the kernel via the trampoline stack >>> x86/pti: Define X86_CR3_PTI_PCID_USER_BIT on x86_32 >>> x86/pgtable: Move pgdp kernel/user conversion functions to pgtable.h >>> x86/mm/ldt: Reserve high address-space range for the LDT >>> x86/mm: Move two more functions from pgtable_64.h to pgtable.h >>> x86/pgtable/32: Allocate 8k page-tables when PTI is enabled >>> x86/mm/pti: Clone CPU_ENTRY_AREA on PMD level on x86_32 >>> x86/mm/pti: Populate valid user pud entries >>> x86/mm/pgtable: Move pti_set_user_pgd() to pgtable.h >>> x86/mm/pae: Populate the user page-table with user pgd's >>> x86/mm/pti: Add an overflow check to pti_clone_pmds() >>> x86/mm/legacy: Populate the user page-table with user pgd's >>> x86/entry/32: Switch between kernel and user cr3 on entry/exit >>> x86/pti: Allow CONFIG_PAGE_TABLE_ISOLATION for x86_32 >>> >>> arch/x86/entry/entry_32.S | 170 +++++++++++++++++++++++++++++--- >>> arch/x86/include/asm/pgtable-2level.h | 3 + >>> arch/x86/include/asm/pgtable-3level.h | 3 + >>> arch/x86/include/asm/pgtable.h | 88 +++++++++++++++++ >>> arch/x86/include/asm/pgtable_32_types.h | 5 +- >>> arch/x86/include/asm/pgtable_64.h | 85 ---------------- >>> arch/x86/include/asm/processor-flags.h | 8 +- >>> arch/x86/include/asm/switch_to.h | 6 +- >>> arch/x86/kernel/asm-offsets_32.c | 5 +- >>> arch/x86/kernel/cpu/common.c | 5 +- >>> arch/x86/kernel/head_32.S | 23 ++++- >>> arch/x86/kernel/process.c | 2 - >>> arch/x86/kernel/process_32.c | 6 ++ >>> arch/x86/mm/pgtable.c | 11 ++- >>> arch/x86/mm/pti.c | 34 ++++++- >>> security/Kconfig | 2 +- >>> 16 files changed, 333 insertions(+), 123 deletions(-) >>> >>> -- >>> 2.13.6 >>> >>> -- >>> To unsubscribe, send a message with 'unsubscribe linux-mm' in >>> the body to majordomo@kvack.org.
For more info on Linux MM, >>> see: http://www.linux-mm.org/ . >>> Don't email: email@kvack.org -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-io0-f198.google.com (mail-io0-f198.google.com [209.85.223.198]) by kanga.kvack.org (Postfix) with ESMTP id 91D29800D8 for ; Sun, 21 Jan 2018 21:11:09 -0500 (EST) Received: by mail-io0-f198.google.com with SMTP id n19so8441029iob.7 for ; Sun, 21 Jan 2018 18:11:09 -0800 (PST) Received: from mail-sor-f41.google.com (mail-sor-f41.google.com. [209.85.220.41]) by mx.google.com with SMTPS id u185sor3913317itf.144.2018.01.21.18.11.08 for (Google Transport Security); Sun, 21 Jan 2018 18:11:08 -0800 (PST) MIME-Version: 1.0 In-Reply-To: <9CF1DD34-7C66-4F11-856D-B5E896988E16@gmail.com> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <5D89F55C-902A-4464-A64E-7157FF55FAD0@gmail.com> <886C924D-668F-4007-98CA-555DB6279E4F@gmail.com> <9CF1DD34-7C66-4F11-856D-B5E896988E16@gmail.com> From: Linus Torvalds Date: Sun, 21 Jan 2018 18:11:07 -0800 Message-ID: Subject: Re: [RFC PATCH 00/16] PTI support for x86-32 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Sender: owner-linux-mm@kvack.org List-ID: To: Nadav Amit Cc: Joerg Roedel , Thomas Gleixner , Ingo Molnar , "H . 
Peter Anvin" , the arch/x86 maintainers , LKML , "open list:MEMORY MANAGEMENT" , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , "Liguori, Anthony" , Daniel Gruss , Hugh Dickins , Kees Cook , Andrea Arcangeli , Waiman Long , Joerg Roedel On Sun, Jan 21, 2018 at 3:46 PM, Nadav Amit wrote: > I wanted to see whether segments protection can be a replacement for PTI > (yes, excluding SMEP emulation), or whether speculative execution “ignores” > limit checks, similarly to the way paging protection is skipped. > > It does seem that segmentation provides sufficient protection from Meltdown. > The “reliability” test of the Graz PoC fails if the segment limit is set to > prevent access to the kernel memory. [ It passes if the limit is not set, > even if the DS is reloaded. ] My test is enclosed below. Interesting. It might not be entirely reliable for all microarchitectures, though. > So my question: wouldn’t it be much more efficient to use segmentation > protection for x86-32, and allow users to choose whether they want SMEP-like > protection if needed (and then enable PTI)? That's what we did long long ago, with user space segments actually using the limit (in fact, if you go back far enough, the kernel even used the base). You'd have to make sure that the LDT loading etc do not allow CPL3 segments with base+limit past TASK_SIZE, so that people can't generate their own. And the TLS segments also need to be limited (and remember, the limit has to be TASK_SIZE-base, not just TASK_SIZE). And we should check with Intel that segment limit checking really is guaranteed to be done before any access. Too bad x86-64 got rid of the segments ;) Linus -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org.
For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wr0-f198.google.com (mail-wr0-f198.google.com [209.85.128.198]) by kanga.kvack.org (Postfix) with ESMTP id 13D13800D8 for ; Sun, 21 Jan 2018 21:28:09 -0500 (EST) Received: by mail-wr0-f198.google.com with SMTP id h38so5663882wrh.11 for ; Sun, 21 Jan 2018 18:28:09 -0800 (PST) Received: from mail-sor-f41.google.com (mail-sor-f41.google.com. [209.85.220.41]) by mx.google.com with SMTPS id c2sor9256731edi.46.2018.01.21.18.28.07 for (Google Transport Security); Sun, 21 Jan 2018 18:28:07 -0800 (PST) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 10.3 \(3273\)) Subject: Re: [RFC PATCH 00/16] PTI support for x86-32 From: Nadav Amit In-Reply-To: Date: Sun, 21 Jan 2018 18:27:59 -0800 Content-Transfer-Encoding: quoted-printable Message-Id: <8B8147E4-0560-456D-BA23-F0037C80C945@gmail.com> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <5D89F55C-902A-4464-A64E-7157FF55FAD0@gmail.com> <886C924D-668F-4007-98CA-555DB6279E4F@gmail.com> <9CF1DD34-7C66-4F11-856D-B5E896988E16@gmail.com> Sender: owner-linux-mm@kvack.org List-ID: To: Linus Torvalds Cc: Joerg Roedel , Thomas Gleixner , Ingo Molnar , "H . 
Peter Anvin" , the arch/x86 maintainers , LKML , "open list:MEMORY MANAGEMENT" , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , "Liguori, Anthony" , Daniel Gruss , Hugh Dickins , Kees Cook , Andrea Arcangeli , Waiman Long , Joerg Roedel Linus Torvalds wrote: > On Sun, Jan 21, 2018 at 3:46 PM, Nadav Amit wrote: >> I wanted to see whether segments protection can be a replacement for PTI >> (yes, excluding SMEP emulation), or whether speculative execution “ignores” >> limit checks, similarly to the way paging protection is skipped. >> >> It does seem that segmentation provides sufficient protection from Meltdown. >> The “reliability” test of the Graz PoC fails if the segment limit is set to >> prevent access to the kernel memory. [ It passes if the limit is not set, >> even if the DS is reloaded. ] My test is enclosed below. > > Interesting. It might not be entirely reliable for all > microarchitectures, though. > >> So my question: wouldn’t it be much more efficient to use segmentation >> protection for x86-32, and allow users to choose whether they want SMEP-like >> protection if needed (and then enable PTI)? > > That's what we did long long ago, with user space segments actually > using the limit (in fact, if you go back far enough, the kernel even > used the base). > > You'd have to make sure that the LDT loading etc do not allow CPL3 > segments with base+limit past TASK_SIZE, so that people can't generate > their own. And the TLS segments also need to be limited (and > remember, the limit has to be TASK_SIZE-base, not just TASK_SIZE). > > And we should check with Intel that segment limit checking really is > guaranteed to be done before any access. Thanks.
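A minimal sketch of the base+limit rule quoted above, assuming the default 3G/1G split (user space ends at 0xC0000000) and expand-up data segments only. TASK_SIZE_32, seg_effective_limit() and user_seg_within_task_size() are hypothetical names for illustration, not the kernel's LDT or TLS validation code. A descriptor's 20-bit limit names the offset of the last accessible byte and counts 4 KiB units when limit_in_pages (the G bit) is set, so the check is base + effective limit < TASK_SIZE; that is why the permitted limit shrinks to TASK_SIZE - base (minus one) rather than staying at TASK_SIZE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: CONFIG_VMSPLIT_3G, i.e. user space is [0, 0xC0000000). */
#define TASK_SIZE_32 0xC0000000ull

/*
 * Offset of the last byte an expand-up segment covers. The descriptor
 * limit field is 20 bits wide; with limit_in_pages (G bit) set it counts
 * 4 KiB units and the low 12 bits of the effective limit are all ones.
 */
static uint64_t seg_effective_limit(uint32_t limit, bool limit_in_pages)
{
	assert(limit <= 0xFFFFFu);	/* only 20 bits exist in the descriptor */
	return limit_in_pages ? (((uint64_t)limit << 12) | 0xFFFu) : limit;
}

/*
 * Hypothetical check for a CPL3 descriptor: the last reachable linear
 * address, base + effective limit, must stay below TASK_SIZE. Computed
 * in 64 bits so a base near 4 GiB cannot wrap around.
 */
static bool user_seg_within_task_size(uint32_t base, uint32_t limit,
				      bool limit_in_pages)
{
	uint64_t last = (uint64_t)base + seg_effective_limit(limit, limit_in_pages);

	return last < TASK_SIZE_32;
}
```

For Nadav's test descriptor (base 0, limit 0xbffff in pages) the last byte is 0xBFFFFFFF, exactly the top of user space, so it passes; the NO_SEGMENTS limit 0xffffe reaches 0xFFFFEFFF and fails, as does limit 0xbffff combined with a base of 0x1000. The same check would apply to the TLS descriptors Linus mentions.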
I’ll try to check with the Intel liaison people at VMware (my employer), but any feedback would be appreciated. > Too bad x86-64 got rid of the segments ;) Actually, as I noted in a different thread, running 32-bit binaries on x86_64 in legacy-mode, without PTI, performs considerably better than x86_64 binaries with PTI for workloads that are hit the most (e.g., Redis). By dynamically removing the 64-bit user-CS from the GDT, this mode should be safe, as long as CS load is not done speculatively. Regards, Nadav -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pf0-f198.google.com (mail-pf0-f198.google.com [209.85.192.198]) by kanga.kvack.org (Postfix) with ESMTP id 68BEC800D8 for ; Sun, 21 Jan 2018 21:35:55 -0500 (EST) Received: by mail-pf0-f198.google.com with SMTP id v25so7673045pfg.14 for ; Sun, 21 Jan 2018 18:35:55 -0800 (PST) Received: from mail.zytor.com (terminus.zytor.com.
[65.50.211.136]) by mx.google.com with ESMTPS id a13si9891292pgt.663.2018.01.21.18.35.54 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Sun, 21 Jan 2018 18:35:54 -0800 (PST) Date: Sun, 21 Jan 2018 18:20:11 -0800 In-Reply-To: References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <5D89F55C-902A-4464-A64E-7157FF55FAD0@gmail.com> <886C924D-668F-4007-98CA-555DB6279E4F@gmail.com> <9CF1DD34-7C66-4F11-856D-B5E896988E16@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Subject: Re: [RFC PATCH 00/16] PTI support for x86-32 From: hpa@zytor.com Message-ID: <143DE376-A8A4-4A91-B4FF-E258D578242D@zytor.com> Sender: owner-linux-mm@kvack.org List-ID: To: Linus Torvalds , Nadav Amit Cc: Joerg Roedel , Thomas Gleixner , Ingo Molnar , the arch/x86 maintainers , LKML , "open list:MEMORY MANAGEMENT" , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , "Liguori, Anthony" , Daniel Gruss , Hugh Dickins , Kees Cook , Andrea Arcangeli , Waiman Long , Joerg Roedel On January 21, 2018 6:11:07 PM PST, Linus Torvalds wrote: >On Sun, Jan 21, 2018 at 3:46 PM, Nadav Amit wrote: >> I wanted to see whether segments protection can be a replacement for PTI >> (yes, excluding SMEP emulation), or whether speculative execution “ignores” >> limit checks, similarly to the way paging protection is skipped. >> >> It does seem that segmentation provides sufficient protection from Meltdown. >> The “reliability” test of the Graz PoC fails if the segment limit is set to >> prevent access to the kernel memory. [ It passes if the limit is not set, >> even if the DS is reloaded. ] My test is enclosed below. > >Interesting. It might not be entirely reliable for all >microarchitectures, though. > >> So my
question: wouldn’t it be much more efficient to use segmentation >> protection for x86-32, and allow users to choose whether they want SMEP-like >> protection if needed (and then enable PTI)? > >That's what we did long long ago, with user space segments actually >using the limit (in fact, if you go back far enough, the kernel even >used the base). > >You'd have to make sure that the LDT loading etc do not allow CPL3 >segments with base+limit past TASK_SIZE, so that people can't generate >their own. And the TLS segments also need to be limited (and >remember, the limit has to be TASK_SIZE-base, not just TASK_SIZE). > >And we should check with Intel that segment limit checking really is >guaranteed to be done before any access. > >Too bad x86-64 got rid of the segments ;) > > Linus No idea about Intel, but at least on Transmeta CPUs the limit check was asynchronous with the access. -- Sent from my Android device with K-9 Mail. Please excuse my brevity. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm0-f72.google.com (mail-wm0-f72.google.com [74.125.82.72]) by kanga.kvack.org (Postfix) with ESMTP id 18B35800D8 for ; Mon, 22 Jan 2018 03:56:32 -0500 (EST) Received: by mail-wm0-f72.google.com with SMTP id h17so4841937wmc.6 for ; Mon, 22 Jan 2018 00:56:32 -0800 (PST) Received: from theia.8bytes.org (8bytes.org.
[81.169.241.247]) by mx.google.com with ESMTPS id m2si2714677edf.99.2018.01.22.00.56.27 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 22 Jan 2018 00:56:27 -0800 (PST) Date: Mon, 22 Jan 2018 09:56:25 +0100 From: Joerg Roedel Subject: Re: [RFC PATCH 00/16] PTI support for x86-32 Message-ID: <20180122085625.GE28161@8bytes.org> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <5D89F55C-902A-4464-A64E-7157FF55FAD0@gmail.com> <886C924D-668F-4007-98CA-555DB6279E4F@gmail.com> <9CF1DD34-7C66-4F11-856D-B5E896988E16@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <9CF1DD34-7C66-4F11-856D-B5E896988E16@gmail.com> Sender: owner-linux-mm@kvack.org List-ID: To: Nadav Amit Cc: Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" , the arch/x86 maintainers , LKML , "open list:MEMORY MANAGEMENT" , Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long , jroedel@suse.de Hey Nadav, On Sun, Jan 21, 2018 at 03:46:24PM -0800, Nadav Amit wrote: > It does seem that segmentation provides sufficient protection from Meltdown. Thanks for testing this. If this turns out to be true for all affected uarchs, it would be a great way of protection, and better than enabling PTI. But I'd like an official statement from Intel on that one, as their recommended fix is still to use PTI. And as you said, if it turns out that this works only on some Intel uarchs, we can also detect it at runtime and then choose the fastest Meltdown protection mechanism. Thanks, Joerg -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org.
For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm0-f70.google.com (mail-wm0-f70.google.com [74.125.82.70]) by kanga.kvack.org (Postfix) with ESMTP id DBD40800D8 for ; Mon, 22 Jan 2018 04:54:54 -0500 (EST) Received: by mail-wm0-f70.google.com with SMTP id c142so4916409wmh.4 for ; Mon, 22 Jan 2018 01:54:54 -0800 (PST) Received: from smtp-out4.electric.net (smtp-out4.electric.net. [192.162.216.185]) by mx.google.com with ESMTPS id h89si268936edd.471.2018.01.22.01.54.53 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 22 Jan 2018 01:54:53 -0800 (PST) From: David Laight Subject: RE: [RFC PATCH 00/16] PTI support for x86-32 Date: Mon, 22 Jan 2018 09:55:31 +0000 Message-ID: <7f37ff1c10b04b2386c2044cdc8e38be@AcuMS.aculab.com> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <5D89F55C-902A-4464-A64E-7157FF55FAD0@gmail.com> <886C924D-668F-4007-98CA-555DB6279E4F@gmail.com> <9CF1DD34-7C66-4F11-856D-B5E896988E16@gmail.com> In-Reply-To: <9CF1DD34-7C66-4F11-856D-B5E896988E16@gmail.com> Content-Language: en-US Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: base64 MIME-Version: 1.0 Sender: owner-linux-mm@kvack.org List-ID: To: 'Nadav Amit' , Joerg Roedel Cc: Thomas Gleixner , Ingo Molnar , "H . 
Peter Anvin" , the arch/x86 maintainers , LKML , "open list:MEMORY MANAGEMENT" , Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , "aliguori@amazon.com" , "daniel.gruss@iaik.tugraz.at" , "hughd@google.com" , "keescook@google.com" , Andrea Arcangeli , Waiman Long , "jroedel@suse.de" RnJvbTogTmFkYXYgQW1pdA0KPiBTZW50OiAyMSBKYW51YXJ5IDIwMTggMjM6NDYNCj4gDQo+IEkg d2FudGVkIHRvIHNlZSB3aGV0aGVyIHNlZ21lbnRzIHByb3RlY3Rpb24gY2FuIGJlIGEgcmVwbGFj ZW1lbnQgZm9yIFBUSQ0KPiAoeWVzLCBleGNsdWRpbmcgU01FUCBlbXVsYXRpb24pLCBvciB3aGV0 aGVyIHNwZWN1bGF0aXZlIGV4ZWN1dGlvbiDigJxpZ25vcmVz4oCdDQo+IGxpbWl0IGNoZWNrcywg c2ltaWxhcmx5IHRvIHRoZSB3YXkgcGFnaW5nIHByb3RlY3Rpb24gaXMgc2tpcHBlZC4NCg0KVGhh dCdzIG1hZGUgbWUgcmVtZW1iZXIgc29tZXRoaW5nIGFib3V0IHNlZ21lbnQgbGltaXRzIGFwcGx5 aW5nIGluIDY0Yml0IG1vZGUuDQpJIHJlYWxseSBjYW4ndCByZW1lbWJlciB0aGUgZGV0YWlscyBh dCBhbGwuDQpJJ20gc3VyZSBpdCBoYWQgc29tZXRoaW5nIHRvIGRvIHdpdGggb25lIG9mIHRoZSBW TSBpbXBsZW1lbnRhdGlvbnMgcmVzdHJpY3RpbmcNCm1lbW9yeSBhY2Nlc3Nlcy4NCg0KCURhdmlk DQoNCg== -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm0-f70.google.com (mail-wm0-f70.google.com [74.125.82.70]) by kanga.kvack.org (Postfix) with ESMTP id 81626800D8 for ; Mon, 22 Jan 2018 05:04:11 -0500 (EST) Received: by mail-wm0-f70.google.com with SMTP id r82so4774126wme.0 for ; Mon, 22 Jan 2018 02:04:11 -0800 (PST) Received: from theia.8bytes.org (8bytes.org. 
[2a01:238:4383:600:38bc:a715:4b6d:a889]) by mx.google.com with ESMTPS id x4si3811477edc.501.2018.01.22.02.04.10 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 22 Jan 2018 02:04:10 -0800 (PST) Date: Mon, 22 Jan 2018 11:04:09 +0100 From: Joerg Roedel Subject: Re: [RFC PATCH 00/16] PTI support for x86-32 Message-ID: <20180122100409.GF28161@8bytes.org> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <5D89F55C-902A-4464-A64E-7157FF55FAD0@gmail.com> <886C924D-668F-4007-98CA-555DB6279E4F@gmail.com> <9CF1DD34-7C66-4F11-856D-B5E896988E16@gmail.com> <7f37ff1c10b04b2386c2044cdc8e38be@AcuMS.aculab.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <7f37ff1c10b04b2386c2044cdc8e38be@AcuMS.aculab.com> Sender: owner-linux-mm@kvack.org List-ID: To: David Laight Cc: 'Nadav Amit' , Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" , the arch/x86 maintainers , LKML , "open list:MEMORY MANAGEMENT" , Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , "aliguori@amazon.com" , "daniel.gruss@iaik.tugraz.at" , "hughd@google.com" , "keescook@google.com" , Andrea Arcangeli , Waiman Long , "jroedel@suse.de"

On Mon, Jan 22, 2018 at 09:55:31AM +0000, David Laight wrote:
> That's made me remember something about segment limits applying in 64bit mode.
> I really can't remember the details at all.
> I'm sure it had something to do with one of the VM implementations restricting
> memory accesses.

Some AMD chips have long-mode segment limits, not sure if Intel has them
too. But they are useless here because the limit is 32 bits and can only
protect the upper 4GB of virtual address space. The limits also don't
apply to GS and CS segments.
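For readers following the segment-limit discussion in this thread, the architectural (non-speculative) rule can be modelled in a few lines of C. This is a minimal sketch under simplifying assumptions: an expand-up 32-bit data segment with byte granularity, and no privilege, expand-down or 4 KiB-granularity handling. The names `seg_desc`, `effective_addr` and `seg_translate` are hypothetical and do not come from the kernel or from this patch series. The sketch shows the ordering that Linus and hpa discuss further down: the AGU sources (base register, scaled index, displacement) form the effective address, the limit is checked against that offset, and only then is the segment base added to form the linear address that paging translates. Whether a given CPU lets a speculative load run ahead of this check is exactly what such a model cannot tell you.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified 32-bit data segment: expand-up, byte
 * granularity (G=0); no expand-down or privilege checks modelled. */
struct seg_desc {
    uint32_t base;   /* added to the effective address after the check */
    uint32_t limit;  /* highest valid offset inside the segment */
};

/* Effective address from the (up to) three AGU sources:
 * base register + index * scale + displacement, wrapping at 32 bits. */
static uint32_t effective_addr(uint32_t base_reg, uint32_t index,
                               uint32_t scale, int32_t disp)
{
    return base_reg + index * scale + (uint32_t)disp;
}

/* The limit check looks only at the offset (effective address); the
 * segment base is added afterwards to form the linear address.
 * Returns false where real hardware would raise #GP (or #SS for SS). */
static bool seg_translate(const struct seg_desc *seg, uint32_t offset,
                          uint32_t access_size, uint32_t *linear_out)
{
    assert(seg != 0 && linear_out != 0);

    if (access_size == 0)
        return false;

    /* Last byte touched must not exceed the limit; 64-bit math so that
     * offset + size - 1 cannot wrap around. */
    uint64_t last = (uint64_t)offset + access_size - 1;
    if (last > seg->limit)
        return false;

    *linear_out = seg->base + offset;  /* wraps modulo 2^32 */
    return true;
}
```

For example, with a flat user data segment whose limit is 0xBFFFFFFF (the default 3GB/1GB split), an access at offset 0xC0000000 fails the check even though the base is 0. That is, presumably, the property Nadav's experiment relies on: kernel addresses lie outside the user segments.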
Regards,

	Joerg

From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wr0-f200.google.com (mail-wr0-f200.google.com [209.85.128.200]) by kanga.kvack.org (Postfix) with ESMTP id 492A4800D8 for ; Mon, 22 Jan 2018 05:11:20 -0500 (EST) Received: by mail-wr0-f200.google.com with SMTP id s9so4738752wra.10 for ; Mon, 22 Jan 2018 02:11:20 -0800 (PST) Received: from theia.8bytes.org (8bytes.org. [2a01:238:4383:600:38bc:a715:4b6d:a889]) by mx.google.com with ESMTPS id c10si80807edf.457.2018.01.22.02.11.18 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 22 Jan 2018 02:11:19 -0800 (PST) Date: Mon, 22 Jan 2018 11:11:18 +0100 From: Joerg Roedel Subject: Re: [PATCH 02/16] x86/entry/32: Enter the kernel via trampoline stack Message-ID: <20180122101118.GG28161@8bytes.org> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1516120619-1159-3-git-send-email-joro@8bytes.org> <20180117091853.GI28161@8bytes.org> <20180119095523.GY28161@8bytes.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: owner-linux-mm@kvack.org List-ID: To: Andy Lutomirski Cc: Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" , X86 ML , LKML , Linux-MM , Linus Torvalds , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , "Liguori, Anthony" , Daniel Gruss , Hugh Dickins , Kees Cook , Andrea Arcangeli , Waiman Long , Joerg Roedel

Hey Andy,

On Fri, Jan 19, 2018 at 08:30:33AM -0800, Andy Lutomirski wrote:
> I meant that we could have sp0 have a genuinely constant value per
> cpu. That means that the entry trampoline ends up with RIP, etc in a
> different place depending on whether VM was in use, but the entry
> trampoline code should be able to handle that. sp1 would have a value
> that varies by task, but it could just point to the top of the stack
> instead of being changed depending on whether VM is in use. Instead,
> the entry trampoline would offset the registers as needed to keep
> pt_regs in the right place.
>
> I think you already figured all of that out, though :)

Yes, and after looking into it for a while, it would make a nice cleanup for
the entry code. On the other hand, it would change the layout of the
in-kernel 'struct pt_regs', so that the user-visible pt_regs ends up
with a different layout than the one we use in the kernel.

This can certainly all be worked out, but it makes this nice entry-code
cleanup not so nice and clean anymore. At least the work required to
make it work without breaking user-space is not in the scope of this
patch-set.

Regards,

	Joerg

From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-io0-f200.google.com (mail-io0-f200.google.com [209.85.223.200]) by kanga.kvack.org (Postfix) with ESMTP id AC6E3800D8 for ; Mon, 22 Jan 2018 12:46:47 -0500 (EST) Received: by mail-io0-f200.google.com with SMTP id e2so3307313ioa.22 for ; Mon, 22 Jan 2018 09:46:47 -0800 (PST) Received: from mail.kernel.org (mail.kernel.org.
[198.145.29.99]) by mx.google.com with ESMTPS id h71si13132473ioe.267.2018.01.22.09.46.45 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 22 Jan 2018 09:46:46 -0800 (PST) Received: from mail-io0-f171.google.com (mail-io0-f171.google.com [209.85.223.171]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 2EA972178F for ; Mon, 22 Jan 2018 17:46:45 +0000 (UTC) Received: by mail-io0-f171.google.com with SMTP id f4so8508167ioh.8 for ; Mon, 22 Jan 2018 09:46:45 -0800 (PST) MIME-Version: 1.0 In-Reply-To: <20180122101118.GG28161@8bytes.org> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1516120619-1159-3-git-send-email-joro@8bytes.org> <20180117091853.GI28161@8bytes.org> <20180119095523.GY28161@8bytes.org> <20180122101118.GG28161@8bytes.org> From: Andy Lutomirski Date: Mon, 22 Jan 2018 09:46:24 -0800 Message-ID: Subject: Re: [PATCH 02/16] x86/entry/32: Enter the kernel via trampoline stack Content-Type: text/plain; charset="UTF-8" Sender: owner-linux-mm@kvack.org List-ID: To: Joerg Roedel Cc: Andy Lutomirski , Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" , X86 ML , LKML , Linux-MM , Linus Torvalds , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , "Liguori, Anthony" , Daniel Gruss , Hugh Dickins , Kees Cook , Andrea Arcangeli , Waiman Long , Joerg Roedel

On Mon, Jan 22, 2018 at 2:11 AM, Joerg Roedel wrote:
> Hey Andy,
>
> On Fri, Jan 19, 2018 at 08:30:33AM -0800, Andy Lutomirski wrote:
>> I meant that we could have sp0 have a genuinely constant value per
>> cpu. That means that the entry trampoline ends up with RIP, etc in a
>> different place depending on whether VM was in use, but the entry
>> trampoline code should be able to handle that. sp1 would have a value
>> that varies by task, but it could just point to the top of the stack
>> instead of being changed depending on whether VM is in use. Instead,
>> the entry trampoline would offset the registers as needed to keep
>> pt_regs in the right place.
>>
>> I think you already figured all of that out, though :)
>
> Yes, and after looking into it for a while, it would make a nice cleanup for
> the entry code. On the other hand, it would change the layout of the
> in-kernel 'struct pt_regs', so that the user-visible pt_regs ends up
> with a different layout than the one we use in the kernel.

I don't think this is necessarily the case. We end up with four more
fields that are logically there at the end of pt_regs (which is
already kind-of-sort-of the case), but we don't actually need to put
them in struct pt_regs. We just end up with (regs + 1) != "top of
task stack", but even that has precedent -- it's already true for
tasks in vm86 mode.

>
> This can certainly all be worked out, but it makes this nice entry-code
> cleanup not so nice and clean anymore. At least the work required to
> make it work without breaking user-space is not in the scope of this
> patch-set.

Agreed. This should probably be saved for later. Except that your
patch set still needs to come up with some way to function correctly
on vm86.

From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-io0-f197.google.com (mail-io0-f197.google.com [209.85.223.197]) by kanga.kvack.org (Postfix) with ESMTP id 3F86F280247 for ; Mon, 22 Jan 2018 15:14:06 -0500 (EST) Received: by mail-io0-f197.google.com with SMTP id q18so7544866ioh.4 for ; Mon, 22 Jan 2018 12:14:06 -0800 (PST) Received: from mail-sor-f41.google.com (mail-sor-f41.google.com.
[209.85.220.41]) by mx.google.com with SMTPS id n25sor9047939iob.176.2018.01.22.12.14.04 for (Google Transport Security); Mon, 22 Jan 2018 12:14:05 -0800 (PST) MIME-Version: 1.0 In-Reply-To: <143DE376-A8A4-4A91-B4FF-E258D578242D@zytor.com> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <5D89F55C-902A-4464-A64E-7157FF55FAD0@gmail.com> <886C924D-668F-4007-98CA-555DB6279E4F@gmail.com> <9CF1DD34-7C66-4F11-856D-B5E896988E16@gmail.com> <143DE376-A8A4-4A91-B4FF-E258D578242D@zytor.com> From: Linus Torvalds Date: Mon, 22 Jan 2018 12:14:03 -0800 Message-ID: Subject: Re: [RFC PATCH 00/16] PTI support for x86-32 Content-Type: text/plain; charset="UTF-8" Sender: owner-linux-mm@kvack.org List-ID: To: Peter Anvin Cc: Nadav Amit , Joerg Roedel , Thomas Gleixner , Ingo Molnar , the arch/x86 maintainers , LKML , "open list:MEMORY MANAGEMENT" , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , "Liguori, Anthony" , Daniel Gruss , Hugh Dickins , Kees Cook , Andrea Arcangeli , Waiman Long , Joerg Roedel

On Sun, Jan 21, 2018 at 6:20 PM, wrote:
>
> No idea about Intel, but at least on Transmeta CPUs the limit check was asynchronous with the access.

Yes, but TMTA had a really odd uarch and didn't check segment limits natively.

When you do it in hardware, the limit check is actually fairly natural
to do early rather than late (since it acts on the linear address
_before_ base add and TLB lookup).

So it's not like it can't be done late, but there are reasons why a
traditional microarchitecture might always end up doing the limit
check early and so segmentation might be a good defense against
meltdown on 32-bit Intel.

             Linus

From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-it0-f69.google.com (mail-it0-f69.google.com [209.85.214.69]) by kanga.kvack.org (Postfix) with ESMTP id E8636800D8 for ; Mon, 22 Jan 2018 16:25:19 -0500 (EST) Received: by mail-it0-f69.google.com with SMTP id z39so12590520ita.1 for ; Mon, 22 Jan 2018 13:25:19 -0800 (PST) Received: from mail.zytor.com (terminus.zytor.com. [65.50.211.136]) by mx.google.com with ESMTPS id q129si13870757ioe.313.2018.01.22.13.25.18 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 22 Jan 2018 13:25:18 -0800 (PST) Subject: Re: [RFC PATCH 00/16] PTI support for x86-32 References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <5D89F55C-902A-4464-A64E-7157FF55FAD0@gmail.com> <886C924D-668F-4007-98CA-555DB6279E4F@gmail.com> <9CF1DD34-7C66-4F11-856D-B5E896988E16@gmail.com> <143DE376-A8A4-4A91-B4FF-E258D578242D@zytor.com> From: "H. Peter Anvin" Message-ID: Date: Mon, 22 Jan 2018 13:10:19 -0800 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Linus Torvalds Cc: Nadav Amit , Joerg Roedel , Thomas Gleixner , Ingo Molnar , the arch/x86 maintainers , LKML , "open list:MEMORY MANAGEMENT" , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , "Liguori, Anthony" , Daniel Gruss , Hugh Dickins , Kees Cook , Andrea Arcangeli , Waiman Long , Joerg Roedel

On 01/22/18 12:14, Linus Torvalds wrote:
> On Sun, Jan 21, 2018 at 6:20 PM, wrote:
>>
>> No idea about Intel, but at least on Transmeta CPUs the limit check was asynchronous with the access.
>
> Yes, but TMTA had a really odd uarch and didn't check segment limits natively.
>

Only on TM3000 ("Wilma") and TM5000 ("Fred"), not on TM8000 ("Astro").
Astro might in fact have been more synchronous than most modern machines
(see below).

> When you do it in hardware, the limit check is actually fairly natural
> to do early rather than late (since it acts on the linear address
> _before_ base add and TLB lookup).
>
> So it's not like it can't be done late, but there are reasons why a
> traditional microarchitecture might always end up doing the limit
> check early and so segmentation might be a good defense against
> meltdown on 32-bit Intel.

I will try to investigate, but as you can imagine the amount of
bandwidth I might be able to get on this is definitely going to be
limited.

All of the below is generic discussion that almost certainly can be
found in some form in Hennessy & Patterson, and so I don't have to
worry about giving away Intel secrets:

It isn't really true that it is natural to check this early. One of
the most fundamental frequency limiters in a modern CPU architecture
(meaning anything from the last 20 years or so) has been the
data-dependent AGU-D$-AGU loop. Note that this doesn't even include the
TLB: the TLB is looked up in parallel with the D$, and if the result was
*either* a cache-TLB mismatch or a TLB miss the result is prevented from
committing.

In the case of the x86, the AGU receives up to three sources plus the
segment base, and if possible given the target process and gates
available might be designed to have a unified 4-input adder, with the
3-input case for limit checks being done separately.

Misses and even more so exceptions (which are far less frequent than
misses) are demoted to a slower path where the goal is to prevent commit
rather than trying to race to be in the data path.

So although it is natural to *issue* the load and the limit check at the
same time, the limit check is still going to be deferred. Whether or not
it is permitted to be fully asynchronous with the load is probably a tradeoff
of timing requirements vs complexity. At least theoretically one could
imagine a machine which would take the trap after the speculative
machine had already chased the pointer loop several levels down; this
would most likely mean separate uops to allow for the existing
out-of-order machine to do the bookkeeping.

	-hpa

From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm0-f72.google.com (mail-wm0-f72.google.com [74.125.82.72]) by kanga.kvack.org (Postfix) with ESMTP id DA8CE800D8 for ; Tue, 23 Jan 2018 09:39:23 -0500 (EST) Received: by mail-wm0-f72.google.com with SMTP id z83so514997wmc.5 for ; Tue, 23 Jan 2018 06:39:23 -0800 (PST) Received: from fuzix.org (www.llwyncelyn.cymru. [82.70.14.225]) by mx.google.com with ESMTPS id u6si374331wrb.183.2018.01.23.06.39.22 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 23 Jan 2018 06:39:22 -0800 (PST) Date: Tue, 23 Jan 2018 14:38:31 +0000 From: Alan Cox Subject: Re: [RFC PATCH 00/16] PTI support for x86-32 Message-ID: <20180123143831.2d769f9d@alans-desktop> In-Reply-To: References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <5D89F55C-902A-4464-A64E-7157FF55FAD0@gmail.com> <886C924D-668F-4007-98CA-555DB6279E4F@gmail.com> <9CF1DD34-7C66-4F11-856D-B5E896988E16@gmail.com> <143DE376-A8A4-4A91-B4FF-E258D578242D@zytor.com> MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: "H. 
Peter Anvin" Cc: Linus Torvalds , Nadav Amit , Joerg Roedel , Thomas Gleixner , Ingo Molnar , the arch/x86 maintainers , LKML , "open list:MEMORY MANAGEMENT" , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , "Liguori, Anthony" , Daniel Gruss , Hugh Dickins , Kees Cook , Andrea Arcangeli , Waiman Long , Joerg Roedel

> of timing requirements vs complexity. At least theoretically one could
> imagine a machine which would take the trap after the speculative
> machine had already chased the pointer loop several levels down; this
> would most likely mean separate uops to allow for the existing
> out-of-order machine to do the bookkeeping.

It's not quite the same, but in the IA-64 case you can write Itanium code
that does exactly that. The speculation is expressed in software, not
hardware (because you can trigger a load, then check later if it worked
out and respond appropriately).

Alan

From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm0-f71.google.com (mail-wm0-f71.google.com [74.125.82.71]) by kanga.kvack.org (Postfix) with ESMTP id BD9A2800D8 for ; Tue, 23 Jan 2018 09:57:59 -0500 (EST) Received: by mail-wm0-f71.google.com with SMTP id c142so558296wmh.4 for ; Tue, 23 Jan 2018 06:57:59 -0800 (PST) Received: from fuzix.org (www.llwyncelyn.cymru.
[82.70.14.225]) by mx.google.com with ESMTPS id 3si6677053wmc.56.2018.01.23.06.57.58 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 23 Jan 2018 06:57:58 -0800 (PST) Date: Tue, 23 Jan 2018 14:57:17 +0000 From: Alan Cox Subject: Re: [RFC PATCH 00/16] PTI support for x86-32 Message-ID: <20180123145717.75c84e9a@alans-desktop> In-Reply-To: <20180122085625.GE28161@8bytes.org> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <5D89F55C-902A-4464-A64E-7157FF55FAD0@gmail.com> <886C924D-668F-4007-98CA-555DB6279E4F@gmail.com> <9CF1DD34-7C66-4F11-856D-B5E896988E16@gmail.com> <20180122085625.GE28161@8bytes.org> MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Joerg Roedel Cc: Nadav Amit , Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" , the arch/x86 maintainers , LKML , "open list:MEMORY MANAGEMENT" , Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long , jroedel@suse.de

On Mon, 22 Jan 2018 09:56:25 +0100 Joerg Roedel wrote:

> Hey Nadav,
>
> On Sun, Jan 21, 2018 at 03:46:24PM -0800, Nadav Amit wrote:
> > It does seem that segmentation provides sufficient protection from Meltdown.
>
> Thanks for testing this. If this turns out to be true for all affected
> uarchs, it would be a great and better way of protection than enabling
> PTI.
>
> But I'd like an official statement from Intel on that one, as their
> recommended fix is still to use PTI.
>
> And as you said, if it turns out that this works only on some Intel
> uarchs, we can also detect it at runtime and then choose the fastest
> Meltdown protection mechanism.

I'll follow this up and get an official statement.

Alan

From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-lf0-f70.google.com (mail-lf0-f70.google.com [209.85.215.70]) by kanga.kvack.org (Postfix) with ESMTP id CD163800D8 for ; Wed, 24 Jan 2018 13:58:02 -0500 (EST) Received: by mail-lf0-f70.google.com with SMTP id r13so1406258lff.22 for ; Wed, 24 Jan 2018 10:58:02 -0800 (PST) Received: from shrek.podlesie.net (shrek-3s.podlesie.net. [2a00:13a0:3010::1]) by mx.google.com with ESMTP id h8si1122112lfk.359.2018.01.24.10.58.00 for ; Wed, 24 Jan 2018 10:58:00 -0800 (PST) Date: Wed, 24 Jan 2018 19:58:00 +0100 From: Krzysztof Mazur Subject: Re: [RFC PATCH 00/16] PTI support for x86-32 Message-ID: <20180124185800.GA11515@shrek.podlesie.net> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1516120619-1159-1-git-send-email-joro@8bytes.org> Sender: owner-linux-mm@kvack.org List-ID: To: Joerg Roedel Cc: Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" , x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli

On Tue, Jan 16, 2018 at 05:36:43PM +0100, Joerg Roedel wrote:
> From: Joerg Roedel
>
> Hi,
>
> here is my current WIP code to enable PTI on x86-32. It is
> still in a pretty early state, but it successfully boots my
> KVM guest with PAE and with legacy paging. The existing PTI
The existing PTI > code for x86-64 already prepares a lot of the stuff needed > for 32 bit too, thanks for that to all the people involved > in its development :) Hi, I've waited for this patches for a long time, until I've tried to exploit meltdown on some old 32-bit CPUs and failed. Pentium M seems to speculatively execute the second load with eax always equal to 0: movzx (%[addr]), %%eax shl $12, %%eax movzx (%[target], %%eax), %%eax And on Pentium 4-based Xeon the second load seems to be never executed, even without shift (shifts are slow on some or all Pentium 4's). Maybe not all P6 and Netbursts CPUs are affected, but I'm not sure. Maybe the kernel, at least on 32-bit, should try to exploit meltdown to test if the CPU is really affected. The series boots on Pentium M (and crashes when I've used perf, but it is an already known issue). However, I don't like the performance regression with CONFIG_PAGE_TABLE_ISOLATION=n (about 7.2%), trivial "benchmark": --- cut here --- #include #include int main(void) { unsigned long i; int fd; fd = open("/dev/null", O_WRONLY); for (i = 0; i < 10000000; i++) { char x = 0; write(fd, &x, 1); } return 0; } --- cut here --- Time (on Pentium M 1.73 GHz): baseline (4.15.0-rc8-gdda3e152): 2.415 s (+/- 0.64%) patched, without CONFIG_PAGE_TABLE_ISOLATION=n 2.588 s (+/- 0.01%) patched, nopti 2.597 s (+/- 0.31%) patched, pti 18.272 s (some older kernel, pre 4.15) 2.378 s Thanks, Krzysiek -- perf results: baseline: Performance counter stats for './bench' (5 runs): 2401.539139 task-clock:HG # 0.995 CPUs utilized ( +- 0.23% ) 23 context-switches:HG # 0.009 K/sec ( +- 4.02% ) 0 cpu-migrations:HG # 0.000 K/sec 30 page-faults:HG # 0.013 K/sec ( +- 1.24% ) 4142375834 cycles:HG # 1.725 GHz ( +- 0.23% ) [39.99%] 385110908 stalled-cycles-frontend:HG # 9.30% frontend cycles idle ( +- 0.06% ) [40.01%] stalled-cycles-backend:HG 4142489274 instructions:HG # 1.00 insns per cycle # 0.09 stalled cycles per insn ( +- 0.00% ) [40.00%] 802270380 branches:HG 
# 334.065 M/sec ( +- 0.00% ) [40.00%] 34278 branch-misses:HG # 0.00% of all branches ( +- 1.94% ) [40.00%] 2.414741497 seconds time elapsed ( +- 0.64% ) patched, without CONFIG_PAGE_TABLE_ISOLATION=n Performance counter stats for './bench' (5 runs): 2587.026405 task-clock:HG # 1.000 CPUs utilized ( +- 0.01% ) 28 context-switches:HG # 0.011 K/sec ( +- 5.95% ) 0 cpu-migrations:HG # 0.000 K/sec 31 page-faults:HG # 0.012 K/sec ( +- 1.21% ) 4462401079 cycles:HG # 1.725 GHz ( +- 0.01% ) [39.98%] 388646121 stalled-cycles-frontend:HG # 8.71% frontend cycles idle ( +- 0.05% ) [40.01%] stalled-cycles-backend:HG 4283638646 instructions:HG # 0.96 insns per cycle # 0.09 stalled cycles per insn ( +- 0.00% ) [40.03%] 822484311 branches:HG # 317.927 M/sec ( +- 0.00% ) [40.01%] 39372 branch-misses:HG # 0.00% of all branches ( +- 2.33% ) [39.98%] 2.587818354 seconds time elapsed ( +- 0.01% ) -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm0-f70.google.com (mail-wm0-f70.google.com [74.125.82.70]) by kanga.kvack.org (Postfix) with ESMTP id 7DAF06B0005 for ; Thu, 25 Jan 2018 12:10:55 -0500 (EST) Received: by mail-wm0-f70.google.com with SMTP id r9so4271096wme.8 for ; Thu, 25 Jan 2018 09:10:55 -0800 (PST) Received: from fuzix.org (www.llwyncelyn.cymru. 
[82.70.14.225]) by mx.google.com with ESMTPS id p203si1142247wmb.197.2018.01.25.09.10.53 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 25 Jan 2018 09:10:54 -0800 (PST) Date: Thu, 25 Jan 2018 17:09:25 +0000 From: Alan Cox Subject: Re: [RFC PATCH 00/16] PTI support for x86-32 Message-ID: <20180125170925.1d72d587@alans-desktop> In-Reply-To: <20180122085625.GE28161@8bytes.org> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <5D89F55C-902A-4464-A64E-7157FF55FAD0@gmail.com> <886C924D-668F-4007-98CA-555DB6279E4F@gmail.com> <9CF1DD34-7C66-4F11-856D-B5E896988E16@gmail.com> <20180122085625.GE28161@8bytes.org> MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Joerg Roedel Cc: Nadav Amit , Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" , the arch/x86 maintainers , LKML , "open list:MEMORY MANAGEMENT" , Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long , jroedel@suse.de

On Mon, 22 Jan 2018 09:56:25 +0100 Joerg Roedel wrote:

> Hey Nadav,
>
> On Sun, Jan 21, 2018 at 03:46:24PM -0800, Nadav Amit wrote:
> > It does seem that segmentation provides sufficient protection from Meltdown.
>
> Thanks for testing this. If this turns out to be true for all affected
> uarchs, it would be a great and better way of protection than enabling
> PTI.
>
> But I'd like an official statement from Intel on that one, as their
> recommended fix is still to use PTI.

It is: we don't think segmentation works on all processors as a defence.

Alan

From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pf0-f197.google.com (mail-pf0-f197.google.com [209.85.192.197]) by kanga.kvack.org (Postfix) with ESMTP id 06FC96B0005 for ; Thu, 25 Jan 2018 17:09:46 -0500 (EST) Received: by mail-pf0-f197.google.com with SMTP id e26so6931348pfi.15 for ; Thu, 25 Jan 2018 14:09:45 -0800 (PST) Received: from mail-sor-f65.google.com (mail-sor-f65.google.com. [209.85.220.65]) by mx.google.com with SMTPS id 1-v6sor1224284plv.17.2018.01.25.14.09.43 for (Google Transport Security); Thu, 25 Jan 2018 14:09:44 -0800 (PST) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 10.3 \(3273\)) Subject: Re: [RFC PATCH 00/16] PTI support for x86-32 From: Nadav Amit In-Reply-To: <20180124185800.GA11515@shrek.podlesie.net> Date: Thu, 25 Jan 2018 14:09:40 -0800 Content-Transfer-Encoding: 7bit Message-Id: <67E8EB67-EB60-441E-BDFB-521F3D431400@gmail.com> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <20180124185800.GA11515@shrek.podlesie.net> Sender: owner-linux-mm@kvack.org List-ID: To: Krzysztof Mazur Cc: Joerg Roedel , Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" , the arch/x86 maintainers , LKML , "open list:MEMORY MANAGEMENT" , Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli

Krzysztof Mazur wrote:

> On Tue, Jan 16, 2018 at 05:36:43PM +0100, Joerg Roedel wrote:
>> From: Joerg Roedel
>>
>> Hi,
>>
>> here is my current WIP code to enable PTI on x86-32. It is
>> still in a pretty early state, but it successfully boots my
>> KVM guest with PAE and with legacy paging. The existing PTI
>> code for x86-64 already prepares a lot of the stuff needed
>> for 32 bit too, thanks for that to all the people involved
>> in its development :)
>
> Hi,
>
> I've waited for these patches for a long time, until I tried to
> exploit Meltdown on some old 32-bit CPUs and failed. Pentium M
> seems to speculatively execute the second load with eax
> always equal to 0:
>
> 	movzx (%[addr]), %%eax
> 	shl $12, %%eax
> 	movzx (%[target], %%eax), %%eax
>
> And on a Pentium 4-based Xeon the second load never seems to be executed,
> even without the shift (shifts are slow on some or all Pentium 4s). Maybe
> not all P6 and NetBurst CPUs are affected, but I'm not sure. Maybe the
> kernel, at least on 32-bit, should try to exploit Meltdown to test if
> the CPU is really affected.

The PoC apparently does not work with 3GB of memory or more on 32-bit. Does
your setup have more? Can you try the attack while setting max_addr=1G?

Thanks,
Nadav

From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-lf0-f72.google.com (mail-lf0-f72.google.com [209.85.215.72]) by kanga.kvack.org (Postfix) with ESMTP id 7251E6B0009 for ; Fri, 26 Jan 2018 04:28:40 -0500 (EST) Received: by mail-lf0-f72.google.com with SMTP id r20so1479217lfr.4 for ; Fri, 26 Jan 2018 01:28:40 -0800 (PST) Received: from shrek.podlesie.net (shrek-3s.podlesie.net.
[2a00:13a0:3010::1]) by mx.google.com with ESMTP id v9si1748525ljb.392.2018.01.26.01.28.37 for ; Fri, 26 Jan 2018 01:28:38 -0800 (PST) Date: Fri, 26 Jan 2018 10:28:36 +0100 From: Krzysztof Mazur Subject: Re: [RFC PATCH 00/16] PTI support for x86-32 Message-ID: <20180126092836.GA11003@shrek.podlesie.net> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <20180124185800.GA11515@shrek.podlesie.net> <67E8EB67-EB60-441E-BDFB-521F3D431400@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <67E8EB67-EB60-441E-BDFB-521F3D431400@gmail.com> Sender: owner-linux-mm@kvack.org List-ID: To: Nadav Amit Cc: Joerg Roedel , Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" , the arch/x86 maintainers , LKML , "open list:MEMORY MANAGEMENT" , Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli On Thu, Jan 25, 2018 at 02:09:40PM -0800, Nadav Amit wrote: > The PoC apparently does not work with 3GB of memory or more on 32-bit. Does > your setup have more? Can you try the attack while setting max_addr=1G? No, I tested on: Pentium M (Dothan): 1.5 GB RAM, PAE for NX, 2GB/2GB split CONFIG_NOHIGHMEM=y CONFIG_VMSPLIT_2G=y CONFIG_PAGE_OFFSET=0x80000000 CONFIG_X86_PAE=y and Xeon (Pentium 4): 2 GB RAM, no PAE, 1.75GB/2.25GB split CONFIG_NOHIGHMEM=y CONFIG_VMSPLIT_2G_OPT=y CONFIG_PAGE_OFFSET=0x78000000 Now I'm testing with standard settings on Pentium M: 1.5 GB RAM, no PAE, 3GB/1GB split, ~890 MB RAM available CONFIG_NOHIGHMEM=y CONFIG_PAGE_OFFSET=0xc0000000 CONFIG_X86_PAE=n and it still does not work. The reliability test from https://github.com/IAIK/meltdown reports 0.38% (1/256 = 0.39%, "true" random), and the other libkdump tools do not work.
https://github.com/paboldin/meltdown-exploit (on linux_proc_banner symbol) reports: cached = 46, uncached = 515, threshold 153 read c0897020 = ff (score=0/1000) read c0897021 = ff (score=0/1000) read c0897022 = ff (score=0/1000) read c0897023 = ff (score=0/1000) read c0897024 = ff (score=0/1000) NOT VULNERABLE and my exploit with: for (i = 0; i < 256; i++) { unsigned char *px = p + (i << 12); t = rdtsc(); readb(px); t = rdtsc() - t; if (t < 100) printf("%02x %lld\n", i, t); } loop returns only "00 45". When I change the exploit code (now based on paboldin code to be sure) to: movzx (%[addr]), %%eax movl $0xaa, %%eax shl $12, %%eax movzx (%[target], %%eax), %%eax I always get "0xaa 51", so the CPU is speculatively executing the second load with (0xaa << 12) in eax, and without the movl instruction, eax seems to be always 0. I even tried to remove the shift: movzx (%[addr]), %%eax movzx (%[target], %%eax), %%eax and I've been reading a known value (from /dev/mem, for instance 0x20), I've modified the target array offset, and the CPU is still touching the "wrong" cacheline, eax == 0 instead of 0x20. I've also tested movl instead of movzx (with and 0xff). On a Core 2 Quad in 64-bit mode everything works as expected: it is vulnerable to Meltdown (I did not test it in 32-bit mode). I don't have any Core "1" to test. On that Pentium M the syscall slowdown caused by PTI is huge, 7.5 times slower (7 times compared to a patched kernel with PTI disabled); on Skylake with PCID the same trivial benchmark is "only" 3.5 times slower (and 5.2 times slower without PCID). Krzysiek
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm0-f71.google.com (mail-wm0-f71.google.com [74.125.82.71]) by kanga.kvack.org (Postfix) with ESMTP id C284F6B0029 for ; Fri, 26 Jan 2018 07:36:23 -0500 (EST) Received: by mail-wm0-f71.google.com with SMTP id j13so250592wmh.3 for ; Fri, 26 Jan 2018 04:36:23 -0800 (PST) Received: from theia.8bytes.org (8bytes.org. [2a01:238:4383:600:38bc:a715:4b6d:a889]) by mx.google.com with ESMTPS id 5si1357509edb.158.2018.01.26.04.36.18 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 26 Jan 2018 04:36:18 -0800 (PST) Date: Fri, 26 Jan 2018 13:36:16 +0100 From: Joerg Roedel Subject: Re: [RFC PATCH 00/16] PTI support for x86-32 Message-ID: <20180126123616.GK28161@8bytes.org> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <5D89F55C-902A-4464-A64E-7157FF55FAD0@gmail.com> <886C924D-668F-4007-98CA-555DB6279E4F@gmail.com> <9CF1DD34-7C66-4F11-856D-B5E896988E16@gmail.com> <20180122085625.GE28161@8bytes.org> <20180125170925.1d72d587@alans-desktop> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20180125170925.1d72d587@alans-desktop> Sender: owner-linux-mm@kvack.org List-ID: To: Alan Cox Cc: Nadav Amit , Thomas Gleixner , Ingo Molnar , "H .
Peter Anvin" , the arch/x86 maintainers , LKML , "open list:MEMORY MANAGEMENT" , Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long , jroedel@suse.de Hi Alan, On Thu, Jan 25, 2018 at 05:09:25PM +0000, Alan Cox wrote: > On Mon, 22 Jan 2018 09:56:25 +0100 > Joerg Roedel wrote: > > > Hey Nadav, > > > > On Sun, Jan 21, 2018 at 03:46:24PM -0800, Nadav Amit wrote: > > > It does seem that segmentation provides sufficient protection from Meltdown. > > > > Thanks for testing this, if this turns out to be true for all affected > > uarchs it would be a great and better way of protection than enabling > > PTI. > > > > But I'd like an official statement from Intel on that one, as their > > recommended fix is still to use PTI. > > It is: we don't think segmentation works on all processors as a defence. Thanks for checking and for the official statement. So the official mitigation recommendation is still to use PTI. Regards, Joerg From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751491AbeAPQsi (ORCPT + 1 other); Tue, 16 Jan 2018 11:48:38 -0500 Received: from 8bytes.org ([81.169.241.247]:54730 "EHLO theia.8bytes.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751295AbeAPQsb (ORCPT ); Tue, 16 Jan 2018 11:48:31 -0500 From: Joerg Roedel To: Thomas Gleixner , Ingo Molnar , "H .
Peter Anvin" Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long , jroedel@suse.de, joro@8bytes.org Subject: [PATCH 08/16] x86/pgtable/32: Allocate 8k page-tables when PTI is enabled Date: Tue, 16 Jan 2018 17:36:51 +0100 Message-Id: <1516120619-1159-9-git-send-email-joro@8bytes.org> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1516120619-1159-1-git-send-email-joro@8bytes.org> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Return-Path: From: Joerg Roedel Allocate a kernel and a user page-table root when PTI is enabled. Also allocate a full page per root for PAE, because otherwise the bit to flip in CR3 to switch between them would not be constant, which creates a lot of hassle. Leave a smaller allocation for a later optimization.
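A minimal sketch of why one full page per root keeps the CR3 switch constant. It assumes 4 KiB pages and an 8 KiB-aligned, order-1 allocation holding the kernel root in the first page and the user root in the second. The sketch_* helpers are hypothetical; they only mimic the idea behind the existing x86-64 kernel_to_user_pgdp() helper and are not code from this series:

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, as on x86-32. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1UL << SKETCH_PAGE_SHIFT)

/* Two roots of one page each: an order-1 block, aligned to its own size. */
#define SKETCH_PGD_ALIGN  (2 * SKETCH_PAGE_SIZE)

/* 1024 four-byte entries (the PTI_USER_PGD_FILL value) fill exactly one page. */
static_assert(1024 * 4 == SKETCH_PAGE_SIZE, "user half must be one page");

/* The allocator must hand out 8 KiB-aligned pairs for the trick to work. */
static inline int sketch_root_pair_aligned(uintptr_t kernel_root)
{
	return (kernel_root & (SKETCH_PGD_ALIGN - 1)) == 0;
}

/* Kernel root -> user root: set the PAGE_SIZE bit of the address. */
static inline uintptr_t sketch_kernel_to_user(uintptr_t kernel_root)
{
	return kernel_root | SKETCH_PAGE_SIZE;
}

/* User root -> kernel root: clear the same bit again. */
static inline uintptr_t sketch_user_to_kernel(uintptr_t user_root)
{
	return user_root & ~(uintptr_t)SKETCH_PAGE_SIZE;
}
```

With that layout, switching roots is a single OR or AND with PAGE_SIZE on the root address, independent of where the pair sits in memory. The PGD_ALIGN and PTI_USER_PGD_FILL padding in the head_32.S hunk that follows appears to reserve the same kind of second page for the statically allocated boot tables.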
Signed-off-by: Joerg Roedel --- arch/x86/kernel/head_32.S | 23 ++++++++++++++++++----- arch/x86/mm/pgtable.c | 11 ++++++----- 2 files changed, 24 insertions(+), 10 deletions(-) diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S index c29020907886..fc550559bf58 100644 --- a/arch/x86/kernel/head_32.S +++ b/arch/x86/kernel/head_32.S @@ -512,28 +512,41 @@ ENTRY(initial_code) ENTRY(setup_once_ref) .long setup_once +#ifdef CONFIG_PAGE_TABLE_ISOLATION +#define PGD_ALIGN (2 * PAGE_SIZE) +#define PTI_USER_PGD_FILL 1024 +#else +#define PGD_ALIGN (PAGE_SIZE) +#define PTI_USER_PGD_FILL 0 +#endif /* * BSS section */ __PAGE_ALIGNED_BSS - .align PAGE_SIZE + .align PGD_ALIGN #ifdef CONFIG_X86_PAE .globl initial_pg_pmd initial_pg_pmd: .fill 1024*KPMDS,4,0 + .fill PTI_USER_PGD_FILL,4,0 #else .globl initial_page_table initial_page_table: .fill 1024,4,0 + .fill PTI_USER_PGD_FILL,4,0 #endif + .align PGD_ALIGN initial_pg_fixmap: .fill 1024,4,0 -.globl empty_zero_page -empty_zero_page: - .fill 4096,1,0 + .fill PTI_USER_PGD_FILL,4,0 .globl swapper_pg_dir + .align PGD_ALIGN swapper_pg_dir: .fill 1024,4,0 + .fill PTI_USER_PGD_FILL,4,0 +.globl empty_zero_page +empty_zero_page: + .fill 4096,1,0 EXPORT_SYMBOL(empty_zero_page) /* @@ -542,7 +555,7 @@ EXPORT_SYMBOL(empty_zero_page) #ifdef CONFIG_X86_PAE __PAGE_ALIGNED_DATA /* Page-aligned for the benefit of paravirt? */ - .align PAGE_SIZE + .align PGD_ALIGN ENTRY(initial_page_table) .long pa(initial_pg_pmd+PGD_IDENT_ATTR),0 /* low identity map */ # if KPMDS == 3 diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c index 004abf9ebf12..48abefd95924 100644 --- a/arch/x86/mm/pgtable.c +++ b/arch/x86/mm/pgtable.c @@ -313,7 +313,7 @@ static int __init pgd_cache_init(void) * When PAE kernel is running as a Xen domain, it does not use * shared kernel pmd. And this requires a whole page for pgd. 
*/ - if (!SHARED_KERNEL_PMD) + if (static_cpu_has(X86_FEATURE_PTI) || !SHARED_KERNEL_PMD) return 0; /* @@ -337,8 +337,9 @@ static inline pgd_t *_pgd_alloc(void) * If no SHARED_KERNEL_PMD, PAE kernel is running as a Xen domain. * We allocate one page for pgd. */ - if (!SHARED_KERNEL_PMD) - return (pgd_t *)__get_free_page(PGALLOC_GFP); + if (static_cpu_has(X86_FEATURE_PTI) || !SHARED_KERNEL_PMD) + return (pgd_t *)__get_free_pages(PGALLOC_GFP, + PGD_ALLOCATION_ORDER); /* * Now PAE kernel is not running as a Xen domain. We can allocate @@ -349,8 +350,8 @@ static inline pgd_t *_pgd_alloc(void) static inline void _pgd_free(pgd_t *pgd) { - if (!SHARED_KERNEL_PMD) - free_page((unsigned long)pgd); + if (static_cpu_has(X86_FEATURE_PTI) || !SHARED_KERNEL_PMD) + free_pages((unsigned long)pgd, PGD_ALLOCATION_ORDER); else kmem_cache_free(pgd_cache, pgd); } -- 2.13.6 From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751451AbeAPQsg (ORCPT + 1 other); Tue, 16 Jan 2018 11:48:36 -0500 Received: from 8bytes.org ([81.169.241.247]:54718 "EHLO theia.8bytes.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751031AbeAPQsb (ORCPT ); Tue, 16 Jan 2018 11:48:31 -0500 X-Greylist: delayed 551 seconds by postgrey-1.27 at vger.kernel.org; Tue, 16 Jan 2018 11:48:30 EST From: Joerg Roedel To: Thomas Gleixner , Ingo Molnar , "H . 
Peter Anvin" Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long , jroedel@suse.de, joro@8bytes.org Subject: [PATCH 14/16] x86/mm/legacy: Populate the user page-table with user pgd's Date: Tue, 16 Jan 2018 17:36:57 +0100 Message-Id: <1516120619-1159-15-git-send-email-joro@8bytes.org> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1516120619-1159-1-git-send-email-joro@8bytes.org> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Return-Path: From: Joerg Roedel Also populate the user-space pgd's in the user page-table.
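With legacy 2-level paging the PMD level is folded into the PGD, so native_set_pmd() is where top-level entries are written, and the diff that follows hooks pti_set_user_pgd() in there. Below is a minimal model of the mirroring step, assuming a 3G/1G split (PAGE_OFFSET 0xC0000000, 4-byte entries, 4 MiB per entry). The struct and function names are hypothetical. The user-space test compares an entry index with the first kernel index (768 here), which is how the reworked pgdp_maps_userspace() in this series decides, instead of the old 64-bit "lower half of the page" shortcut, which would put the boundary at index 512 (a 2G/2G split):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed legacy (non-PAE) x86-32 layout with a 3G/1G split. */
#define SKETCH_PAGE_SIZE       4096u
#define SKETCH_PGDIR_SHIFT     22          /* each entry maps 4 MiB */
#define SKETCH_PAGE_OFFSET     0xC0000000u
#define SKETCH_PTRS_PER_PGD    1024u
#define SKETCH_KERNEL_BOUNDARY (SKETCH_PAGE_OFFSET >> SKETCH_PGDIR_SHIFT) /* 768 */

/* Hypothetical model of the 8 KiB root pair: kernel page, then user page. */
struct sketch_pgd_pair {
	uint32_t kernel[SKETCH_PTRS_PER_PGD];
	uint32_t user[SKETCH_PTRS_PER_PGD];
};

static_assert(sizeof(struct sketch_pgd_pair) == 2 * SKETCH_PAGE_SIZE,
	      "one page per root");

/* Turn the byte offset inside the root page into an entry index and
 * compare it with the first kernel index, instead of PAGE_SIZE / 2. */
static inline bool sketch_maps_userspace(uint32_t byte_offset)
{
	return byte_offset / sizeof(uint32_t) < SKETCH_KERNEL_BOUNDARY;
}

/* Write a top-level entry; user-space entries are mirrored into the
 * user root so both page tables see the same user mappings. */
static inline void sketch_set_pgd(struct sketch_pgd_pair *pair,
				  uint32_t idx, uint32_t val)
{
	if (sketch_maps_userspace(idx * (uint32_t)sizeof(uint32_t)))
		pair->user[idx] = val;
	pair->kernel[idx] = val;
}
```

An entry at index 600 (byte offset 2400, past the middle of the page) still maps user space under a 3G/1G split, which the old half-page test would have misclassified.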
Signed-off-by: Joerg Roedel --- arch/x86/include/asm/pgtable-2level.h | 3 +++ 1 file changed, 3 insertions(+) diff --git a/arch/x86/include/asm/pgtable-2level.h b/arch/x86/include/asm/pgtable-2level.h index 685ffe8a0eaf..d96486d23c58 100644 --- a/arch/x86/include/asm/pgtable-2level.h +++ b/arch/x86/include/asm/pgtable-2level.h @@ -19,6 +19,9 @@ static inline void native_set_pte(pte_t *ptep , pte_t pte) static inline void native_set_pmd(pmd_t *pmdp, pmd_t pmd) { +#ifdef CONFIG_PAGE_TABLE_ISOLATION + pmd.pud.p4d.pgd = pti_set_user_pgd(&pmdp->pud.p4d.pgd, pmd.pud.p4d.pgd); +#endif *pmdp = pmd; } -- 2.13.6 From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751545AbeAPQsx (ORCPT + 1 other); Tue, 16 Jan 2018 11:48:53 -0500 Received: from 8bytes.org ([81.169.241.247]:54752 "EHLO theia.8bytes.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751436AbeAPQsg (ORCPT ); Tue, 16 Jan 2018 11:48:36 -0500 From: Joerg Roedel To: Thomas Gleixner , Ingo Molnar , "H . 
Peter Anvin" Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long , jroedel@suse.de, joro@8bytes.org Subject: [PATCH 09/16] x86/mm/pti: Clone CPU_ENTRY_AREA on PMD level on x86_32 Date: Tue, 16 Jan 2018 17:36:52 +0100 Message-Id: <1516120619-1159-10-git-send-email-joro@8bytes.org> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1516120619-1159-1-git-send-email-joro@8bytes.org> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Return-Path: From: Joerg Roedel Cloning on the P4D level would clone the complete kernel address space into the user-space page-tables for PAE kernels. Cloning on PMD level is fine for PAE and legacy paging. Signed-off-by: Joerg Roedel --- arch/x86/mm/pti.c | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c index ce38f165489b..20be21301a59 100644 --- a/arch/x86/mm/pti.c +++ b/arch/x86/mm/pti.c @@ -308,6 +308,7 @@ pti_clone_pmds(unsigned long start, unsigned long end, pmdval_t clear) } } +#ifdef CONFIG_X86_64 /* * Clone a single p4d (i.e. a top-level entry on 4-level systems and a * next-level entry on 5-level systems. @@ -322,13 +323,29 @@ static void __init pti_clone_p4d(unsigned long addr) kernel_p4d = p4d_offset(kernel_pgd, addr); *user_p4d = *kernel_p4d; } +#endif /* * Clone the CPU_ENTRY_AREA into the user space visible page table. 
*/ static void __init pti_clone_user_shared(void) { +#ifdef CONFIG_X86_32 + /* + * On 32 bit PAE systems with 1GB of Kernel address space there is only + * one pgd/p4d for the whole kernel. Cloning that would map the whole + * address space into the user page-tables, making PTI useless. So clone + * the page-table on the PMD level to prevent that. + */ + unsigned long start, end; + + start = CPU_ENTRY_AREA_BASE; + end = start + (PAGE_SIZE * CPU_ENTRY_AREA_PAGES); + + pti_clone_pmds(start, end, _PAGE_GLOBAL); +#else pti_clone_p4d(CPU_ENTRY_AREA_BASE); +#endif } /* -- 2.13.6 From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751575AbeAPQsz (ORCPT + 1 other); Tue, 16 Jan 2018 11:48:55 -0500 Received: from 8bytes.org ([81.169.241.247]:54800 "EHLO theia.8bytes.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751434AbeAPQsg (ORCPT ); Tue, 16 Jan 2018 11:48:36 -0500 From: Joerg Roedel To: Thomas Gleixner , Ingo Molnar , "H . 
Peter Anvin" Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long , jroedel@suse.de, joro@8bytes.org Subject: [PATCH 06/16] x86/mm/ldt: Reserve high address-space range for the LDT Date: Tue, 16 Jan 2018 17:36:49 +0100 Message-Id: <1516120619-1159-7-git-send-email-joro@8bytes.org> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1516120619-1159-1-git-send-email-joro@8bytes.org> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Return-Path: From: Joerg Roedel Reserve 2MB/4MB of address space for mapping the LDT to user-space. 
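The macros in the diff that follows carve out one PMD-sized slot (2 MiB with PAE, 4 MiB with legacy paging) between the CPU entry area and the PKMAP area. The small sketch below evaluates the same expressions. FIXADDR_START and CPU_ENTRY_AREA_PAGES are replaced by made-up inputs, so 0xFF800000 and 40 in the checks are purely illustrative. It also encodes the sizing raised in the reply to this patch: 64 KiB per LDT, two per CPU, and up to 64 CPUs on 32 bit:

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 0x1000u

/* Round down to a PMD boundary; pmd_size is 2 MiB (PAE) or 4 MiB (legacy). */
static inline uint32_t sketch_pmd_align(uint32_t addr, uint32_t pmd_size)
{
	return addr & ~(pmd_size - 1);
}

struct sketch_layout {
	uint32_t cpu_entry_area_base;
	uint32_t ldt_base;
	uint32_t pkmap_base;
};

/* Same shape as CPU_ENTRY_AREA_BASE, LDT_BASE_ADDR and PKMAP_BASE, but with
 * fixaddr_start and cea_pages as hypothetical inputs, not kernel values. */
static inline struct sketch_layout
sketch_compute_layout(uint32_t fixaddr_start, uint32_t cea_pages,
		      uint32_t pmd_size)
{
	struct sketch_layout l;

	l.cpu_entry_area_base = sketch_pmd_align(
		fixaddr_start - SKETCH_PAGE_SIZE * (cea_pages + 1), pmd_size);
	l.ldt_base = sketch_pmd_align(l.cpu_entry_area_base - SKETCH_PAGE_SIZE,
				      pmd_size);
	l.pkmap_base = sketch_pmd_align(l.ldt_base - SKETCH_PAGE_SIZE, pmd_size);
	return l;
}

/* Space needed if every CPU may map two full LDTs:
 * 8192 descriptors * 8 bytes = 64 KiB per LDT. */
static inline uint32_t sketch_ldt_bytes_needed(uint32_t nr_cpus)
{
	return 8192u * 8u * 2u * nr_cpus;
}
```

With these inputs and 2 MiB PMDs, the entry area lands at 0xFF600000, the LDT slot at 0xFF400000, and PKMAP at 0xFF200000. The slot is therefore a single PMD, while 64 CPUs could need 8 MiB, which is the mismatch pointed out in the thread.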
Signed-off-by: Joerg Roedel --- arch/x86/include/asm/pgtable_32_types.h | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/pgtable_32_types.h b/arch/x86/include/asm/pgtable_32_types.h index ce245b0cdfca..3c30a7fcae68 100644 --- a/arch/x86/include/asm/pgtable_32_types.h +++ b/arch/x86/include/asm/pgtable_32_types.h @@ -47,9 +47,12 @@ extern bool __vmalloc_start_set; /* set once high_memory is set */ #define CPU_ENTRY_AREA_BASE \ ((FIXADDR_START - PAGE_SIZE * (CPU_ENTRY_AREA_PAGES + 1)) & PMD_MASK) -#define PKMAP_BASE \ +#define LDT_BASE_ADDR \ ((CPU_ENTRY_AREA_BASE - PAGE_SIZE) & PMD_MASK) +#define PKMAP_BASE \ + ((LDT_BASE_ADDR - PAGE_SIZE) & PMD_MASK) + #ifdef CONFIG_HIGHMEM # define VMALLOC_END (PKMAP_BASE - 2 * PAGE_SIZE) #else -- 2.13.6 From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751615AbeAPQs4 (ORCPT + 1 other); Tue, 16 Jan 2018 11:48:56 -0500 Received: from 8bytes.org ([81.169.241.247]:54802 "EHLO theia.8bytes.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751428AbeAPQsg (ORCPT ); Tue, 16 Jan 2018 11:48:36 -0500 From: Joerg Roedel To: Thomas Gleixner , Ingo Molnar , "H . 
Peter Anvin" Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long , jroedel@suse.de, joro@8bytes.org Subject: [PATCH 04/16] x86/pti: Define X86_CR3_PTI_PCID_USER_BIT on x86_32 Date: Tue, 16 Jan 2018 17:36:47 +0100 Message-Id: <1516120619-1159-5-git-send-email-joro@8bytes.org> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1516120619-1159-1-git-send-email-joro@8bytes.org> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Return-Path: From: Joerg Roedel Move it out of the X86_64-specific processor defines so that it is visible for 32 bit too.
Signed-off-by: Joerg Roedel --- arch/x86/include/asm/processor-flags.h | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/arch/x86/include/asm/processor-flags.h b/arch/x86/include/asm/processor-flags.h index 625a52a5594f..02c2cbda4a74 100644 --- a/arch/x86/include/asm/processor-flags.h +++ b/arch/x86/include/asm/processor-flags.h @@ -39,10 +39,6 @@ #define CR3_PCID_MASK 0xFFFull #define CR3_NOFLUSH BIT_ULL(63) -#ifdef CONFIG_PAGE_TABLE_ISOLATION -# define X86_CR3_PTI_PCID_USER_BIT 11 -#endif - #else /* * CR3_ADDR_MASK needs at least bits 31:5 set on PAE systems, and we save @@ -53,4 +49,8 @@ #define CR3_NOFLUSH 0 #endif +#ifdef CONFIG_PAGE_TABLE_ISOLATION +# define X86_CR3_PTI_PCID_USER_BIT 11 +#endif + #endif /* _ASM_X86_PROCESSOR_FLAGS_H */ -- 2.13.6 From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751508AbeAPQsw (ORCPT + 1 other); Tue, 16 Jan 2018 11:48:52 -0500 Received: from 8bytes.org ([81.169.241.247]:54828 "EHLO theia.8bytes.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751446AbeAPQsg (ORCPT ); Tue, 16 Jan 2018 11:48:36 -0500 From: Joerg Roedel To: Thomas Gleixner , Ingo Molnar , "H . 
Peter Anvin" Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long , jroedel@suse.de, joro@8bytes.org Subject: [PATCH 12/16] x86/mm/pae: Populate the user page-table with user pgd's Date: Tue, 16 Jan 2018 17:36:55 +0100 Message-Id: <1516120619-1159-13-git-send-email-joro@8bytes.org> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1516120619-1159-1-git-send-email-joro@8bytes.org> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Return-Path: From: Joerg Roedel This is the last part of the PAE page-table setup before we can add the CR3 switch to the entry code.
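The diff that follows also restricts the NX marking in __pti_set_user_pgd() to 64-bit kernels, because 2-level paging has no NX bit at all and PAE has none at the top (PDPTE) level. The standalone sketch below shows that condition using the architectural bit positions of _PAGE_PRESENT (bit 0), _PAGE_USER (bit 2) and _PAGE_NX (bit 63). The function and its is_64bit flag are hypothetical stand-ins for the #ifdef CONFIG_X86_64:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* x86 page-table entry bits (architectural positions). */
#define SKETCH_PAGE_PRESENT (1ULL << 0)
#define SKETCH_PAGE_USER    (1ULL << 2)
#define SKETCH_PAGE_NX      (1ULL << 63)

/* Value the kernel copy of a top-level entry ends up with: a present user
 * mapping gets NX in the kernel tables, but only on 64-bit kernels and only
 * when the CPU supports NX. On 32 bit the patch compiles this step out. */
static inline uint64_t sketch_kernel_pgd_value(uint64_t pgd, bool nx_supported,
					       bool is_64bit)
{
	const uint64_t mask = SKETCH_PAGE_USER | SKETCH_PAGE_PRESENT;

	if (is_64bit && (pgd & mask) == mask && nx_supported)
		pgd |= SKETCH_PAGE_NX;
	return pgd;
}
```

Entries that are not present, or that are kernel-only, pass through unchanged, as does every entry when the 32-bit case is modelled.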
Signed-off-by: Joerg Roedel --- arch/x86/include/asm/pgtable-3level.h | 3 +++ arch/x86/mm/pti.c | 7 +++++++ 2 files changed, 10 insertions(+) diff --git a/arch/x86/include/asm/pgtable-3level.h b/arch/x86/include/asm/pgtable-3level.h index bc4af5453802..910f0b35370e 100644 --- a/arch/x86/include/asm/pgtable-3level.h +++ b/arch/x86/include/asm/pgtable-3level.h @@ -98,6 +98,9 @@ static inline void native_set_pmd(pmd_t *pmdp, pmd_t pmd) static inline void native_set_pud(pud_t *pudp, pud_t pud) { +#ifdef CONFIG_PAGE_TABLE_ISOLATION + pud.p4d.pgd = pti_set_user_pgd(&pudp->p4d.pgd, pud.p4d.pgd); +#endif set_64bit((unsigned long long *)(pudp), native_pud_val(pud)); } diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c index 6b6bfd13350e..a561b5625d6c 100644 --- a/arch/x86/mm/pti.c +++ b/arch/x86/mm/pti.c @@ -122,6 +122,7 @@ pgd_t __pti_set_user_pgd(pgd_t *pgdp, pgd_t pgd) */ kernel_to_user_pgdp(pgdp)->pgd = pgd.pgd; +#ifdef CONFIG_X86_64 /* * If this is normal user memory, make it NX in the kernel * pagetables so that, if we somehow screw up and return to @@ -134,10 +135,16 @@ pgd_t __pti_set_user_pgd(pgd_t *pgdp, pgd_t pgd) * may execute from it * - we don't have NX support * - we're clearing the PGD (i.e. the new pgd is not present). + * - We run on a 32 bit kernel. 2-level paging doesn't support NX at + * all and PAE paging does not support it on the PGD level. We can + * set it in the PMD level there in the future, but that means we + * need to unshare the PMDs between the kernel and the user + * page-tables. 
*/ if ((pgd.pgd & (_PAGE_USER|_PAGE_PRESENT)) == (_PAGE_USER|_PAGE_PRESENT) && (__supported_pte_mask & _PAGE_NX)) pgd.pgd |= _PAGE_NX; +#endif /* return the copy of the PGD we want the kernel to use: */ return pgd; -- 2.13.6 From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751824AbeAPQvD (ORCPT + 1 other); Tue, 16 Jan 2018 11:51:03 -0500 Received: from 8bytes.org ([81.169.241.247]:54792 "EHLO theia.8bytes.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751390AbeAPQse (ORCPT ); Tue, 16 Jan 2018 11:48:34 -0500 From: Joerg Roedel To: Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long , jroedel@suse.de, joro@8bytes.org Subject: [PATCH 13/16] x86/mm/pti: Add an overflow check to pti_clone_pmds() Date: Tue, 16 Jan 2018 17:36:56 +0100 Message-Id: <1516120619-1159-14-git-send-email-joro@8bytes.org> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1516120619-1159-1-git-send-email-joro@8bytes.org> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Return-Path: From: Joerg Roedel The addr counter will overflow if we clone the last PMD of the address space, resulting in an endless loop. Check for that and bail out of the loop when it happens. 
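The self-contained model below mirrors the loop being fixed. It assumes a 32-bit address counter and 2 MiB PMDs, and it only counts the PMDs that pti_clone_pmds() would visit without touching any page tables. When the range reaches the top of the address space, addr += PMD_SIZE wraps to 0. That value is still below end, so without the added check the walk would never stop:

```c
#include <assert.h>
#include <stdint.h>

/* Walk [start, end) in pmd_size steps like pti_clone_pmds(), but with a
 * 32-bit counter and no page tables: just count the PMDs visited. */
static inline unsigned int sketch_count_pmds(uint32_t start, uint32_t end,
					     uint32_t pmd_size)
{
	unsigned int visited = 0;

	for (uint32_t addr = start; addr < end; addr += pmd_size) {
		/* Overflow check: after the last PMD, addr wraps below start. */
		if (addr < start)
			break;
		visited++;
	}
	return visited;
}
```

Starting at 0xFFC00000 with end at 0xFFFFFFFF, the walk visits two PMDs and then stops at the wrapped address. An ordinary range such as 0xC0000000 to 0xC0800000 is unaffected and visits four.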
Signed-off-by: Joerg Roedel --- arch/x86/mm/pti.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c index a561b5625d6c..faea5faeddc5 100644 --- a/arch/x86/mm/pti.c +++ b/arch/x86/mm/pti.c @@ -293,6 +293,10 @@ pti_clone_pmds(unsigned long start, unsigned long end, pmdval_t clear) p4d_t *p4d; pud_t *pud; + /* Overflow check */ + if (addr < start) + break; + pgd = pgd_offset_k(addr); if (WARN_ON(pgd_none(*pgd))) return; -- 2.13.6 From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751865AbeAPQvI (ORCPT + 1 other); Tue, 16 Jan 2018 11:51:08 -0500 Received: from 8bytes.org ([81.169.241.247]:54790 "EHLO theia.8bytes.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751381AbeAPQse (ORCPT ); Tue, 16 Jan 2018 11:48:34 -0500 From: Joerg Roedel To: Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long , jroedel@suse.de, joro@8bytes.org Subject: [PATCH 01/16] x86/entry/32: Rename TSS_sysenter_sp0 to TSS_sysenter_stack Date: Tue, 16 Jan 2018 17:36:44 +0100 Message-Id: <1516120619-1159-2-git-send-email-joro@8bytes.org> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1516120619-1159-1-git-send-email-joro@8bytes.org> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Return-Path: From: Joerg Roedel The stack address doesn't need to be stored in tss.sp0 if we switch stacks manually, as on sysenter.
Rename the offset so that it still makes sense when we change its location. Signed-off-by: Joerg Roedel --- arch/x86/entry/entry_32.S | 2 +- arch/x86/kernel/asm-offsets_32.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S index a1f28a54f23a..eb8c5615777b 100644 --- a/arch/x86/entry/entry_32.S +++ b/arch/x86/entry/entry_32.S @@ -401,7 +401,7 @@ ENTRY(xen_sysenter_target) * 0(%ebp) arg6 */ ENTRY(entry_SYSENTER_32) - movl TSS_sysenter_sp0(%esp), %esp + movl TSS_sysenter_stack(%esp), %esp .Lsysenter_past_esp: pushl $__USER_DS /* pt_regs->ss */ pushl %ebp /* pt_regs->sp (stashed in bp) */ diff --git a/arch/x86/kernel/asm-offsets_32.c b/arch/x86/kernel/asm-offsets_32.c index fa1261eefa16..654229bac2fc 100644 --- a/arch/x86/kernel/asm-offsets_32.c +++ b/arch/x86/kernel/asm-offsets_32.c @@ -47,7 +47,7 @@ void foo(void) BLANK(); /* Offset from the sysenter stack to tss.sp0 */ - DEFINE(TSS_sysenter_sp0, offsetof(struct cpu_entry_area, tss.x86_tss.sp0) - + DEFINE(TSS_sysenter_stack, offsetof(struct cpu_entry_area, tss.x86_tss.sp0) - offsetofend(struct cpu_entry_area, entry_stack_page.stack)); #ifdef CONFIG_CC_STACKPROTECTOR -- 2.13.6 From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751843AbeAPQvH (ORCPT + 1 other); Tue, 16 Jan 2018 11:51:07 -0500 Received: from 8bytes.org ([81.169.241.247]:54788 "EHLO theia.8bytes.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751385AbeAPQse (ORCPT ); Tue, 16 Jan 2018 11:48:34 -0500 From: Joerg Roedel To: Thomas Gleixner , Ingo Molnar , "H .
Peter Anvin" Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long , jroedel@suse.de, joro@8bytes.org Subject: [RFC PATCH 00/16] PTI support for x86-32 Date: Tue, 16 Jan 2018 17:36:43 +0100 Message-Id: <1516120619-1159-1-git-send-email-joro@8bytes.org> X-Mailer: git-send-email 2.7.4 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Return-Path: From: Joerg Roedel Hi, here is my current WIP code to enable PTI on x86-32. It is still in a pretty early state, but it successfully boots my KVM guest with PAE and with legacy paging. The existing PTI code for x86-64 already prepares a lot of the stuff needed for 32 bit too, thanks for that to all the people involved in its development :) The patches are split as follows: - 1-3 contain the entry-code changes to enter and exit the kernel via the sysenter trampoline stack. - 4-7 are fixes to get the code to compile on 32 bit with CONFIG_PAGE_TABLE_ISOLATION=y. - 8-14 adapt the existing PTI code to work properly on 32 bit and add the needed parts to the 32 bit page-table code. - 15 switches PTI on by adding the CR3 switches to kernel entry/exit. - 16 enables the Kconfig option for all of X86. The code has not run on bare metal yet; I'll test that in the next few days once I set up a 32 bit box again. I also haven't tested Wine and DosEMU yet, so these might also be broken. With this post I'd like to ask for all kinds of constructive feedback on the approaches I have taken and of course the many things I broke with it :) One of the things that are surely broken is XEN_PV support.
I'd appreciate any help with testing and bugfixing on that front. So please review and let me know your thoughts. Thanks, Joerg Joerg Roedel (16): x86/entry/32: Rename TSS_sysenter_sp0 to TSS_sysenter_stack x86/entry/32: Enter the kernel via trampoline stack x86/entry/32: Leave the kernel via the trampoline stack x86/pti: Define X86_CR3_PTI_PCID_USER_BIT on x86_32 x86/pgtable: Move pgdp kernel/user conversion functions to pgtable.h x86/mm/ldt: Reserve high address-space range for the LDT x86/mm: Move two more functions from pgtable_64.h to pgtable.h x86/pgtable/32: Allocate 8k page-tables when PTI is enabled x86/mm/pti: Clone CPU_ENTRY_AREA on PMD level on x86_32 x86/mm/pti: Populate valid user pud entries x86/mm/pgtable: Move pti_set_user_pgd() to pgtable.h x86/mm/pae: Populate the user page-table with user pgd's x86/mm/pti: Add an overflow check to pti_clone_pmds() x86/mm/legacy: Populate the user page-table with user pgd's x86/entry/32: Switch between kernel and user cr3 on entry/exit x86/pti: Allow CONFIG_PAGE_TABLE_ISOLATION for x86_32 arch/x86/entry/entry_32.S | 170 +++++++++++++++++++++++++++++--- arch/x86/include/asm/pgtable-2level.h | 3 + arch/x86/include/asm/pgtable-3level.h | 3 + arch/x86/include/asm/pgtable.h | 88 +++++++++++++++++ arch/x86/include/asm/pgtable_32_types.h | 5 +- arch/x86/include/asm/pgtable_64.h | 85 ---------------- arch/x86/include/asm/processor-flags.h | 8 +- arch/x86/include/asm/switch_to.h | 6 +- arch/x86/kernel/asm-offsets_32.c | 5 +- arch/x86/kernel/cpu/common.c | 5 +- arch/x86/kernel/head_32.S | 23 ++++- arch/x86/kernel/process.c | 2 - arch/x86/kernel/process_32.c | 6 ++ arch/x86/mm/pgtable.c | 11 ++- arch/x86/mm/pti.c | 34 ++++++- security/Kconfig | 2 +- 16 files changed, 333 insertions(+), 123 deletions(-) -- 2.13.6 From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751685AbeAPQvC (ORCPT + 1 other); Tue, 16 Jan 2018 11:51:02 -0500 Received: from 8bytes.org 
([81.169.241.247]:54794 "EHLO theia.8bytes.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751393AbeAPQse (ORCPT ); Tue, 16 Jan 2018 11:48:34 -0500 From: Joerg Roedel To: Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long , jroedel@suse.de, joro@8bytes.org Subject: [PATCH 07/16] x86/mm: Move two more functions from pgtable_64.h to pgtable.h Date: Tue, 16 Jan 2018 17:36:50 +0100 Message-Id: <1516120619-1159-8-git-send-email-joro@8bytes.org> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1516120619-1159-1-git-send-email-joro@8bytes.org> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Return-Path: From: Joerg Roedel These two functions are required for PTI on 32 bit: * pgdp_maps_userspace() * pgd_large() Also re-implement pgdp_maps_userspace() so that it will work on 64 and 32 bit kernels. Signed-off-by: Joerg Roedel --- arch/x86/include/asm/pgtable.h | 16 ++++++++++++++++ arch/x86/include/asm/pgtable_64.h | 15 --------------- 2 files changed, 16 insertions(+), 15 deletions(-) diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index 0a9f746cbdc1..abafe4d7fd3e 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -1109,6 +1109,22 @@ static inline int pud_write(pud_t pud) return pud_flags(pud) & _PAGE_RW; } +/* + * Page table pages are page-aligned. The lower half of the top + * level is used for userspace and the top half for the kernel. 
+ * + * Returns true for parts of the PGD that map userspace and + * false for the parts that map the kernel. + */ +static inline bool pgdp_maps_userspace(void *__ptr) +{ + unsigned long ptr = (unsigned long)__ptr; + + return (((ptr & ~PAGE_MASK) / sizeof(pgd_t)) < KERNEL_PGD_BOUNDARY); +} + +static inline int pgd_large(pgd_t pgd) { return 0; } + #ifdef CONFIG_PAGE_TABLE_ISOLATION /* * All top-level PAGE_TABLE_ISOLATION page tables are order-1 pages diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h index 58d7f10e937d..3c5a73c8bb50 100644 --- a/arch/x86/include/asm/pgtable_64.h +++ b/arch/x86/include/asm/pgtable_64.h @@ -131,20 +131,6 @@ static inline pud_t native_pudp_get_and_clear(pud_t *xp) #endif } -/* - * Page table pages are page-aligned. The lower half of the top - * level is used for userspace and the top half for the kernel. - * - * Returns true for parts of the PGD that map userspace and - * false for the parts that map the kernel. - */ -static inline bool pgdp_maps_userspace(void *__ptr) -{ - unsigned long ptr = (unsigned long)__ptr; - - return (ptr & ~PAGE_MASK) < (PAGE_SIZE / 2); -} - #ifdef CONFIG_PAGE_TABLE_ISOLATION pgd_t __pti_set_user_pgd(pgd_t *pgdp, pgd_t pgd); @@ -208,7 +194,6 @@ extern void sync_global_pgds(unsigned long start, unsigned long end); /* * Level 4 access. 
*/ -static inline int pgd_large(pgd_t pgd) { return 0; } #define mk_kernel_pgd(address) __pgd((address) | _KERNPG_TABLE) /* PUD - Level3 access */ -- 2.13.6 From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751909AbeAPQxH (ORCPT + 1 other); Tue, 16 Jan 2018 11:53:07 -0500 Received: from merlin.infradead.org ([205.233.59.134]:53720 "EHLO merlin.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751284AbeAPQxE (ORCPT ); Tue, 16 Jan 2018 11:53:04 -0500 Date: Tue, 16 Jan 2018 17:52:13 +0100 From: Peter Zijlstra To: Joerg Roedel Cc: Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" , x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long , jroedel@suse.de Subject: Re: [PATCH 06/16] x86/mm/ldt: Reserve high address-space range for the LDT Message-ID: <20180116165213.GF2228@hirez.programming.kicks-ass.net> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1516120619-1159-7-git-send-email-joro@8bytes.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1516120619-1159-7-git-send-email-joro@8bytes.org> User-Agent: Mutt/1.9.2 (2017-12-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Return-Path: On Tue, Jan 16, 2018 at 05:36:49PM +0100, Joerg Roedel wrote: > From: Joerg Roedel > > Reserve 2MB/4MB of address space for mapping the LDT to > user-space. LDT is 64k, we need 2 per CPU, and NR_CPUS <= 64 on 32bit, that gives 64K*2*64=8M > 2M. 
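The reviewer's sizing argument can be checked with a small C sketch. LDT_ENTRIES and LDT_ENTRY_SIZE mirror the values in the kernel's uapi <asm/ldt.h> (8192 descriptors of 8 bytes, i.e. 64K); the two slots per CPU and the 64-CPU limit are taken from the reply itself and are assumptions here, not checked against the patch. LDT_SLOTS_PER_CPU, MIB and both functions are hypothetical names used only for illustration.

```c
#include <assert.h>

/* Values mirror the kernel's uapi <asm/ldt.h>. */
#define LDT_ENTRIES     8192UL  /* maximum number of LDT descriptors */
#define LDT_ENTRY_SIZE  8UL     /* bytes per descriptor -> 64K per LDT */

/* Hypothetical: "we need 2 per CPU", as stated in the review reply. */
#define LDT_SLOTS_PER_CPU 2UL

#define MIB (1UL << 20)

/* Address space needed to map a maximal LDT in every slot on every CPU. */
static unsigned long ldt_area_needed(unsigned long ncpus)
{
	return LDT_ENTRIES * LDT_ENTRY_SIZE * LDT_SLOTS_PER_CPU * ncpus;
}

/* How many CPUs a reserved window of 'bytes' covers under the same accounting. */
static unsigned long ldt_max_cpus(unsigned long bytes)
{
	return bytes / (LDT_ENTRIES * LDT_ENTRY_SIZE * LDT_SLOTS_PER_CPU);
}
```

With 64 CPUs, ldt_area_needed() gives 64K*2*64 = 8 MiB, while the 2 MiB and 4 MiB windows reserved by the patch would cover only 16 and 32 CPUs, respectively, under this accounting.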
From mboxrd@z Thu Jan 1 00:00:00 1970
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751968AbeAPQzh (ORCPT + 1 other); Tue, 16 Jan 2018 11:55:37 -0500
Received: from 8bytes.org ([81.169.241.247]:54758 "EHLO theia.8bytes.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751366AbeAPQsd (ORCPT ); Tue, 16 Jan 2018 11:48:33 -0500
From: Joerg Roedel
To: Thomas Gleixner , Ingo Molnar , "H . Peter Anvin"
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long , jroedel@suse.de, joro@8bytes.org
Subject: [PATCH 10/16] x86/mm/pti: Populate valid user pud entries
Date: Tue, 16 Jan 2018 17:36:53 +0100
Message-Id: <1516120619-1159-11-git-send-email-joro@8bytes.org>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1516120619-1159-1-git-send-email-joro@8bytes.org>
References: <1516120619-1159-1-git-send-email-joro@8bytes.org>
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Return-Path:

From: Joerg Roedel

With PAE paging we don't have PGD and P4D levels in the page-table; instead, the PUD level is the highest one.

In PAE page-tables, most of the bits we usually set with _KERNPG_TABLE are reserved at the top level, which results in a #GP when the processor loads them. Work around this by populating PUD entries in the user page-table with only _PAGE_PRESENT set.

I am pretty sure there is a cleaner way to do this, but until I find it, use this #ifdef solution.
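A minimal C sketch of why the usual flags are rejected at the PAE top level. It assumes the architectural PDPTE layout (bits 2:1 and 8:5 reserved) and models _KERNPG_TABLE as PRESENT | RW | ACCESSED | DIRTY, taking the memory-encryption bit as zero on 32 bit. All macro and function names below are hypothetical stand-ins, not kernel symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* x86 page-table flag bits at their architectural positions. */
#define PG_PRESENT  (1ULL << 0)
#define PG_RW       (1ULL << 1)
#define PG_ACCESSED (1ULL << 5)
#define PG_DIRTY    (1ULL << 6)

/* Assumed model of _KERNPG_TABLE on 32 bit (encryption bit taken as 0). */
#define KERNPG_TABLE (PG_PRESENT | PG_RW | PG_ACCESSED | PG_DIRTY)

/*
 * PAE PDPTE bits 2:1 and 8:5 are reserved. Address bits above
 * MAXPHYADDR are reserved too, but are not modelled here.
 */
#define PDPTE_RSVD_LOW ((0x3ULL << 1) | (0xFULL << 5))

/* True if the low flag bits of a PDPTE pass the reserved-bit check. */
static bool pdpte_flags_ok(uint64_t pdpte)
{
	return (pdpte & PDPTE_RSVD_LOW) == 0;
}

/* Top-level entry for a page-aligned PMD page, as the PAE branch builds it. */
static uint64_t user_pdpte(uint64_t pmd_phys)
{
	return PG_PRESENT | pmd_phys;
}

/* The entry the non-PAE branch builds, for comparison. */
static uint64_t kernpg_pdpte(uint64_t pmd_phys)
{
	return KERNPG_TABLE | pmd_phys;
}
```

With these definitions, KERNPG_TABLE is 0x63. Of those bits, RW, ACCESSED and DIRTY (0x62) fall into the reserved mask 0x1E6, so only the PRESENT-only entry passes the check.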
Signed-off-by: Joerg Roedel --- arch/x86/mm/pti.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c index 20be21301a59..6b6bfd13350e 100644 --- a/arch/x86/mm/pti.c +++ b/arch/x86/mm/pti.c @@ -202,8 +202,12 @@ static __init pmd_t *pti_user_pagetable_walk_pmd(unsigned long address) unsigned long new_pmd_page = __get_free_page(gfp); if (!new_pmd_page) return NULL; - +#ifdef CONFIG_X86_PAE + /* TODO: There must be a cleaner way to do this */ + set_pud(pud, __pud(_PAGE_PRESENT | __pa(new_pmd_page))); +#else set_pud(pud, __pud(_KERNPG_TABLE | __pa(new_pmd_page))); +#endif } return pmd_offset(pud, address); -- 2.13.6 From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751913AbeAPQzf (ORCPT + 1 other); Tue, 16 Jan 2018 11:55:35 -0500 Received: from 8bytes.org ([81.169.241.247]:54760 "EHLO theia.8bytes.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751368AbeAPQsd (ORCPT ); Tue, 16 Jan 2018 11:48:33 -0500 From: Joerg Roedel To: Thomas Gleixner , Ingo Molnar , "H . 
Peter Anvin" Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long , jroedel@suse.de, joro@8bytes.org Subject: [PATCH 03/16] x86/entry/32: Leave the kernel via the trampoline stack Date: Tue, 16 Jan 2018 17:36:46 +0100 Message-Id: <1516120619-1159-4-git-send-email-joro@8bytes.org> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1516120619-1159-1-git-send-email-joro@8bytes.org> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Return-Path: From: Joerg Roedel Switch back to the trampoline stack before returning to userspace. Signed-off-by: Joerg Roedel --- arch/x86/entry/entry_32.S | 58 ++++++++++++++++++++++++++++++++++++++++ arch/x86/kernel/asm-offsets_32.c | 1 + 2 files changed, 59 insertions(+) diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S index 5a7bdb73be9f..14018eeb11c3 100644 --- a/arch/x86/entry/entry_32.S +++ b/arch/x86/entry/entry_32.S @@ -263,6 +263,61 @@ .endm /* + * Switch back from the kernel stack to the entry stack. + * + * iret_frame > 0 adds code to copy over an iret frame from the old to + * the new stack. It also adds a check which bails out if + * we are not returning to user-space. + * + * This macro must not modify eflags when iret_frame == 0. + */ +.macro SWITCH_TO_ENTRY_STACK iret_frame=0 + .if \iret_frame > 0 + /* Are we returning to userspace?
*/ + testb $3, 4(%esp) /* return CS */ + jz .Lend_\@ + .endif + + /* + * We run with user-%fs already loaded from pt_regs, so we don't + * have access to per_cpu data anymore, and there is no swapgs + * equivalent on x86_32. + * We work around this by loading the kernel-%fs again and + * reading the entry stack address from there. Then we restore + * the user-%fs and return. + */ + pushl %fs + pushl %edi + + /* Re-load kernel-%fs, after that we can use PER_CPU_VAR */ + movl $(__KERNEL_PERCPU), %edi + movl %edi, %fs + + /* Save old stack pointer to copy the return frame over if needed */ + movl %esp, %edi + movl PER_CPU_VAR(cpu_tss_rw + TSS_sp0), %esp + + /* Now we are on the entry stack */ + + .if \iret_frame > 0 + /* Stack frame: ss, esp, eflags, cs, eip, fs, edi */ + pushl 6*4(%edi) /* ss */ + pushl 5*4(%edi) /* esp */ + pushl 4*4(%edi) /* eflags */ + pushl 3*4(%edi) /* cs */ + pushl 2*4(%edi) /* eip */ + .endif + + pushl 4(%edi) /* fs */ + + /* Restore user %edi and user %fs */ + movl (%edi), %edi + popl %fs + +.Lend_\@: +.endm + +/* * %eax: prev task * %edx: next task */ @@ -512,6 +567,8 @@ ENTRY(entry_SYSENTER_32) btr $X86_EFLAGS_IF_BIT, (%esp) popfl + SWITCH_TO_ENTRY_STACK + /* * Return back to the vDSO, which will pop ecx and edx. * Don't bother with DS and ES (they already contain __USER_DS). 
@@ -601,6 +658,7 @@ restore_all: .Lrestore_nocheck: RESTORE_REGS 4 # skip orig_eax/error_code .Lirq_return: + SWITCH_TO_ENTRY_STACK iret_frame=1 INTERRUPT_RETURN .section .fixup, "ax" diff --git a/arch/x86/kernel/asm-offsets_32.c b/arch/x86/kernel/asm-offsets_32.c index 7270dd834f4b..b628f898edd2 100644 --- a/arch/x86/kernel/asm-offsets_32.c +++ b/arch/x86/kernel/asm-offsets_32.c @@ -50,6 +50,7 @@ void foo(void) DEFINE(TSS_sysenter_stack, offsetof(struct cpu_entry_area, tss.x86_tss.sp1) - offsetofend(struct cpu_entry_area, entry_stack_page.stack)); + OFFSET(TSS_sp0, tss_struct, x86_tss.sp0); OFFSET(TSS_sp1, tss_struct, x86_tss.sp1); #ifdef CONFIG_CC_STACKPROTECTOR -- 2.13.6 From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751972AbeAPQ4U (ORCPT + 1 other); Tue, 16 Jan 2018 11:56:20 -0500 Received: from 8bytes.org ([81.169.241.247]:54752 "EHLO theia.8bytes.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751353AbeAPQsd (ORCPT ); Tue, 16 Jan 2018 11:48:33 -0500 From: Joerg Roedel To: Thomas Gleixner , Ingo Molnar , "H . 
Peter Anvin" Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long , jroedel@suse.de, joro@8bytes.org Subject: [PATCH 15/16] x86/entry/32: Switch between kernel and user cr3 on entry/exit Date: Tue, 16 Jan 2018 17:36:58 +0100 Message-Id: <1516120619-1159-16-git-send-email-joro@8bytes.org> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1516120619-1159-1-git-send-email-joro@8bytes.org> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Return-Path: From: Joerg Roedel Add the cr3 switches between the kernel and the user page-table when PTI is enabled. Signed-off-by: Joerg Roedel --- arch/x86/entry/entry_32.S | 25 ++++++++++++++++++++++++- 1 file changed, 24 insertions(+), 1 deletion(-) diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S index 14018eeb11c3..6a1d9f1e1f89 100644 --- a/arch/x86/entry/entry_32.S +++ b/arch/x86/entry/entry_32.S @@ -221,6 +221,25 @@ POP_GS_EX .endm +#define PTI_SWITCH_MASK (1 << PAGE_SHIFT) + +.macro SWITCH_TO_KERNEL_CR3 + ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI + movl %cr3, %edi + andl $(~PTI_SWITCH_MASK), %edi + movl %edi, %cr3 +.Lend_\@: +.endm + +.macro SWITCH_TO_USER_CR3 + ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI + mov %cr3, %edi + /* Flip the PGD to the user version */ + orl $(PTI_SWITCH_MASK), %edi + mov %edi, %cr3 +.Lend_\@: +.endm + /* * Switch from the entry-trampline stack to the kernel stack of the * running task. 
@@ -240,6 +259,7 @@ .endif pushl %edi + SWITCH_TO_KERNEL_CR3 movl %esp, %edi /* @@ -309,9 +329,12 @@ .endif pushl 4(%edi) /* fs */ + pushl (%edi) /* edi */ + + SWITCH_TO_USER_CR3 /* Restore user %edi and user %fs */ - movl (%edi), %edi + popl %edi popl %fs .Lend_\@: -- 2.13.6 From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751781AbeAPQ4S (ORCPT + 1 other); Tue, 16 Jan 2018 11:56:18 -0500 Received: from 8bytes.org ([81.169.241.247]:54750 "EHLO theia.8bytes.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751345AbeAPQsd (ORCPT ); Tue, 16 Jan 2018 11:48:33 -0500 From: Joerg Roedel To: Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long , jroedel@suse.de, joro@8bytes.org Subject: [PATCH 05/16] x86/pgtable: Move pgdp kernel/user conversion functions to pgtable.h Date: Tue, 16 Jan 2018 17:36:48 +0100 Message-Id: <1516120619-1159-6-git-send-email-joro@8bytes.org> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1516120619-1159-1-git-send-email-joro@8bytes.org> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Return-Path: From: Joerg Roedel Make them available on 32 bit and clone_pgd_range() happy. 
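The helpers moved by this patch rely on every top-level PTI page table being an 8k-aligned order-1 allocation, with the kernel copy in the first 4k and the user copy in the second. Converting between the two is therefore a single bit flip at PAGE_SHIFT. Below is a minimal C sketch on integer addresses, assuming 4 KiB pages. The *_addr function names are hypothetical stand-ins for ptr_set_bit(), ptr_clear_bit(), kernel_to_user_pgdp() and user_to_kernel_pgdp().

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12                      /* assumes 4 KiB pages */
#define PTI_PGTABLE_SWITCH_BIT PAGE_SHIFT  /* bit 12 selects the user half */

/* Set or clear one bit in an address, like ptr_set_bit()/ptr_clear_bit(). */
static uintptr_t addr_set_bit(uintptr_t addr, int bit)
{
	return addr | ((uintptr_t)1 << bit);
}

static uintptr_t addr_clear_bit(uintptr_t addr, int bit)
{
	return addr & ~((uintptr_t)1 << bit);
}

/* Hypothetical integer versions of kernel_to_user_pgdp()/user_to_kernel_pgdp(). */
static uintptr_t kernel_to_user_addr(uintptr_t pgdp)
{
	return addr_set_bit(pgdp, PTI_PGTABLE_SWITCH_BIT);
}

static uintptr_t user_to_kernel_addr(uintptr_t pgdp)
{
	return addr_clear_bit(pgdp, PTI_PGTABLE_SWITCH_BIT);
}
```

Consider an 8k-aligned allocation at 0x10000. Its kernel entry at offset 0x18 (0x10018) corresponds to the user entry at 0x11018, and clearing the bit maps it back. Both operations are idempotent, so converting an address that is already in the target half leaves it unchanged. The entry code in patch 15 applies the same bit, as PTI_SWITCH_MASK, to CR3.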
Signed-off-by: Joerg Roedel --- arch/x86/include/asm/pgtable.h | 49 +++++++++++++++++++++++++++++++++++++++ arch/x86/include/asm/pgtable_64.h | 49 --------------------------------------- 2 files changed, 49 insertions(+), 49 deletions(-) diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index e42b8943cb1a..0a9f746cbdc1 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -1109,6 +1109,55 @@ static inline int pud_write(pud_t pud) return pud_flags(pud) & _PAGE_RW; } +#ifdef CONFIG_PAGE_TABLE_ISOLATION +/* + * All top-level PAGE_TABLE_ISOLATION page tables are order-1 pages + * (8k-aligned and 8k in size). The kernel one is at the beginning 4k and + * the user one is in the last 4k. To switch between them, you + * just need to flip the 12th bit in their addresses. + */ +#define PTI_PGTABLE_SWITCH_BIT PAGE_SHIFT + +/* + * This generates better code than the inline assembly in + * __set_bit(). + */ +static inline void *ptr_set_bit(void *ptr, int bit) +{ + unsigned long __ptr = (unsigned long)ptr; + + __ptr |= BIT(bit); + return (void *)__ptr; +} +static inline void *ptr_clear_bit(void *ptr, int bit) +{ + unsigned long __ptr = (unsigned long)ptr; + + __ptr &= ~BIT(bit); + return (void *)__ptr; +} + +static inline pgd_t *kernel_to_user_pgdp(pgd_t *pgdp) +{ + return ptr_set_bit(pgdp, PTI_PGTABLE_SWITCH_BIT); +} + +static inline pgd_t *user_to_kernel_pgdp(pgd_t *pgdp) +{ + return ptr_clear_bit(pgdp, PTI_PGTABLE_SWITCH_BIT); +} + +static inline p4d_t *kernel_to_user_p4dp(p4d_t *p4dp) +{ + return ptr_set_bit(p4dp, PTI_PGTABLE_SWITCH_BIT); +} + +static inline p4d_t *user_to_kernel_p4dp(p4d_t *p4dp) +{ + return ptr_clear_bit(p4dp, PTI_PGTABLE_SWITCH_BIT); +} +#endif /* CONFIG_PAGE_TABLE_ISOLATION */ + /* * clone_pgd_range(pgd_t *dst, pgd_t *src, int count); * diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h index 81462e9a34f6..58d7f10e937d 100644 --- a/arch/x86/include/asm/pgtable_64.h 
+++ b/arch/x86/include/asm/pgtable_64.h @@ -131,55 +131,6 @@ static inline pud_t native_pudp_get_and_clear(pud_t *xp) #endif } -#ifdef CONFIG_PAGE_TABLE_ISOLATION -/* - * All top-level PAGE_TABLE_ISOLATION page tables are order-1 pages - * (8k-aligned and 8k in size). The kernel one is at the beginning 4k and - * the user one is in the last 4k. To switch between them, you - * just need to flip the 12th bit in their addresses. - */ -#define PTI_PGTABLE_SWITCH_BIT PAGE_SHIFT - -/* - * This generates better code than the inline assembly in - * __set_bit(). - */ -static inline void *ptr_set_bit(void *ptr, int bit) -{ - unsigned long __ptr = (unsigned long)ptr; - - __ptr |= BIT(bit); - return (void *)__ptr; -} -static inline void *ptr_clear_bit(void *ptr, int bit) -{ - unsigned long __ptr = (unsigned long)ptr; - - __ptr &= ~BIT(bit); - return (void *)__ptr; -} - -static inline pgd_t *kernel_to_user_pgdp(pgd_t *pgdp) -{ - return ptr_set_bit(pgdp, PTI_PGTABLE_SWITCH_BIT); -} - -static inline pgd_t *user_to_kernel_pgdp(pgd_t *pgdp) -{ - return ptr_clear_bit(pgdp, PTI_PGTABLE_SWITCH_BIT); -} - -static inline p4d_t *kernel_to_user_p4dp(p4d_t *p4dp) -{ - return ptr_set_bit(p4dp, PTI_PGTABLE_SWITCH_BIT); -} - -static inline p4d_t *user_to_kernel_p4dp(p4d_t *p4dp) -{ - return ptr_clear_bit(p4dp, PTI_PGTABLE_SWITCH_BIT); -} -#endif /* CONFIG_PAGE_TABLE_ISOLATION */ - /* * Page table pages are page-aligned. The lower half of the top * level is used for userspace and the top half for the kernel. -- 2.13.6 From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751547AbeAPQ5j (ORCPT + 1 other); Tue, 16 Jan 2018 11:57:39 -0500 Received: from 8bytes.org ([81.169.241.247]:54722 "EHLO theia.8bytes.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751085AbeAPQsb (ORCPT ); Tue, 16 Jan 2018 11:48:31 -0500 From: Joerg Roedel To: Thomas Gleixner , Ingo Molnar , "H . 
Peter Anvin" Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long , jroedel@suse.de, joro@8bytes.org Subject: [PATCH 02/16] x86/entry/32: Enter the kernel via trampoline stack Date: Tue, 16 Jan 2018 17:36:45 +0100 Message-Id: <1516120619-1159-3-git-send-email-joro@8bytes.org> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1516120619-1159-1-git-send-email-joro@8bytes.org> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Return-Path: From: Joerg Roedel Use the sysenter stack as a trampoline stack to enter the kernel. The sysenter stack is already in the cpu_entry_area and will be mapped to userspace when PTI is enabled. Signed-off-by: Joerg Roedel --- arch/x86/entry/entry_32.S | 89 +++++++++++++++++++++++++++++++++++----- arch/x86/include/asm/switch_to.h | 6 +-- arch/x86/kernel/asm-offsets_32.c | 4 +- arch/x86/kernel/cpu/common.c | 5 ++- arch/x86/kernel/process.c | 2 - arch/x86/kernel/process_32.c | 6 +++ 6 files changed, 91 insertions(+), 21 deletions(-) diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S index eb8c5615777b..5a7bdb73be9f 100644 --- a/arch/x86/entry/entry_32.S +++ b/arch/x86/entry/entry_32.S @@ -222,6 +222,47 @@ .endm /* + * Switch from the entry-trampline stack to the kernel stack of the + * running task. + * + * nr_regs is the number of dwords to push from the entry stack to the + * task stack. If it is > 0 it expects an irq frame at the bottom of the + * stack. 
+ * + * If check_user != 0, a check is added to only switch stacks if + * the kernel entry was from user-space. + */ +.macro SWITCH_TO_KERNEL_STACK nr_regs=0 check_user=0 + + .if \check_user > 0 && \nr_regs > 0 + testb $3, (\nr_regs - 4)*4(%esp) /* CS */ + jz .Lend_\@ + .endif + + pushl %edi + movl %esp, %edi + + /* + * TSS_sysenter_stack is the offset from the bottom of the + * entry-stack + */ + movl TSS_sysenter_stack + ((\nr_regs + 1) * 4)(%esp), %esp + + /* Copy the registers over */ + .if \nr_regs > 0 + i = 0 + .rept \nr_regs + pushl (\nr_regs - i) * 4(%edi) + i = i + 1 + .endr + .endif + + mov (%edi), %edi + +.Lend_\@: +.endm + +/* * %eax: prev task * %edx: next task */ @@ -401,7 +442,9 @@ ENTRY(xen_sysenter_target) * 0(%ebp) arg6 */ ENTRY(entry_SYSENTER_32) - movl TSS_sysenter_stack(%esp), %esp + /* Kernel stack is empty */ + SWITCH_TO_KERNEL_STACK + .Lsysenter_past_esp: pushl $__USER_DS /* pt_regs->ss */ pushl %ebp /* pt_regs->sp (stashed in bp) */ @@ -521,6 +564,10 @@ ENDPROC(entry_SYSENTER_32) ENTRY(entry_INT80_32) ASM_CLAC pushl %eax /* pt_regs->orig_ax */ + + /* Stack layout: ss, esp, eflags, cs, eip, orig_eax */ + SWITCH_TO_KERNEL_STACK nr_regs=6 check_user=1 + SAVE_ALL pt_regs_ax=$-ENOSYS /* save rest */ /* @@ -655,6 +702,10 @@ END(irq_entries_start) common_interrupt: ASM_CLAC addl $-0x80, (%esp) /* Adjust vector into the [-256, -1] range */ + + /* Stack layout: ss, esp, eflags, cs, eip, vector */ + SWITCH_TO_KERNEL_STACK nr_regs=6 check_user=1 + SAVE_ALL ENCODE_FRAME_POINTER TRACE_IRQS_OFF @@ -663,16 +714,17 @@ common_interrupt: jmp ret_from_intr ENDPROC(common_interrupt) -#define BUILD_INTERRUPT3(name, nr, fn) \ -ENTRY(name) \ - ASM_CLAC; \ - pushl $~(nr); \ - SAVE_ALL; \ - ENCODE_FRAME_POINTER; \ - TRACE_IRQS_OFF \ - movl %esp, %eax; \ - call fn; \ - jmp ret_from_intr; \ +#define BUILD_INTERRUPT3(name, nr, fn) \ +ENTRY(name) \ + ASM_CLAC; \ + pushl $~(nr); \ + SWITCH_TO_KERNEL_STACK nr_regs=6 check_user=1; \ + SAVE_ALL; \ + ENCODE_FRAME_POINTER; \
+ TRACE_IRQS_OFF \ + movl %esp, %eax; \ + call fn; \ + jmp ret_from_intr; \ ENDPROC(name) #define BUILD_INTERRUPT(name, nr) \ @@ -893,6 +945,9 @@ ENTRY(page_fault) END(page_fault) common_exception: + /* Stack layout: ss, esp, eflags, cs, eip, error_code, handler */ + SWITCH_TO_KERNEL_STACK nr_regs=7 check_user=1 + /* the function address is in %gs's slot on the stack */ pushl %fs pushl %es @@ -936,6 +991,10 @@ ENTRY(debug) */ ASM_CLAC pushl $-1 # mark this as an int + + /* Stack layout: ss, esp, eflags, cs, eip, $-1 */ + SWITCH_TO_KERNEL_STACK nr_regs=6 check_user=1 + SAVE_ALL ENCODE_FRAME_POINTER xorl %edx, %edx # error code 0 @@ -971,6 +1030,10 @@ END(debug) */ ENTRY(nmi) ASM_CLAC + + /* Stack layout: ss, esp, eflags, cs, eip */ + SWITCH_TO_KERNEL_STACK nr_regs=5 check_user=1 + #ifdef CONFIG_X86_ESPFIX32 pushl %eax movl %ss, %eax @@ -1034,6 +1097,10 @@ END(nmi) ENTRY(int3) ASM_CLAC pushl $-1 # mark this as an int + + /* Stack layout: ss, esp, eflags, cs, eip, vector */ + SWITCH_TO_KERNEL_STACK nr_regs=6 check_user=1 + SAVE_ALL ENCODE_FRAME_POINTER TRACE_IRQS_OFF diff --git a/arch/x86/include/asm/switch_to.h b/arch/x86/include/asm/switch_to.h index eb5f7999a893..20e5f7ab8260 100644 --- a/arch/x86/include/asm/switch_to.h +++ b/arch/x86/include/asm/switch_to.h @@ -89,13 +89,9 @@ static inline void refresh_sysenter_cs(struct thread_struct *thread) /* This is used when switching tasks or entering/exiting vm86 mode. 
*/ static inline void update_sp0(struct task_struct *task) { - /* On x86_64, sp0 always points to the entry trampoline stack, which is constant: */ -#ifdef CONFIG_X86_32 - load_sp0(task->thread.sp0); -#else + /* sp0 always points to the entry trampoline stack, which is constant: */ if (static_cpu_has(X86_FEATURE_XENPV)) load_sp0(task_top_of_stack(task)); -#endif } #endif /* _ASM_X86_SWITCH_TO_H */ diff --git a/arch/x86/kernel/asm-offsets_32.c b/arch/x86/kernel/asm-offsets_32.c index 654229bac2fc..7270dd834f4b 100644 --- a/arch/x86/kernel/asm-offsets_32.c +++ b/arch/x86/kernel/asm-offsets_32.c @@ -47,9 +47,11 @@ void foo(void) BLANK(); /* Offset from the sysenter stack to tss.sp0 */ - DEFINE(TSS_sysenter_stack, offsetof(struct cpu_entry_area, tss.x86_tss.sp0) - + DEFINE(TSS_sysenter_stack, offsetof(struct cpu_entry_area, tss.x86_tss.sp1) - offsetofend(struct cpu_entry_area, entry_stack_page.stack)); + OFFSET(TSS_sp1, tss_struct, x86_tss.sp1); + #ifdef CONFIG_CC_STACKPROTECTOR BLANK(); OFFSET(stack_canary_offset, stack_canary, canary); diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index ef29ad001991..20a71c914e59 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -1649,11 +1649,12 @@ void cpu_init(void) enter_lazy_tlb(&init_mm, curr); /* - * Initialize the TSS. Don't bother initializing sp0, as the initial - * task never enters user mode. + * Initialize the TSS. sp0 points to the entry trampoline stack + * regardless of what task is running. 
*/ set_tss_desc(cpu, &get_cpu_entry_area(cpu)->tss.x86_tss); load_TR_desc(); + load_sp0((unsigned long)(cpu_entry_stack(cpu) + 1)); load_mm_ldt(&init_mm); diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c index 832a6acd730f..a9950946b263 100644 --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -57,14 +57,12 @@ __visible DEFINE_PER_CPU_PAGE_ALIGNED(struct tss_struct, cpu_tss_rw) = { */ .sp0 = (1UL << (BITS_PER_LONG-1)) + 1, -#ifdef CONFIG_X86_64 /* * .sp1 is cpu_current_top_of_stack. The init task never * runs user code, but cpu_current_top_of_stack should still * be well defined before the first context switch. */ .sp1 = TOP_OF_INIT_STACK, -#endif #ifdef CONFIG_X86_32 .ss0 = __KERNEL_DS, diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c index 5224c6099184..452eeac00b80 100644 --- a/arch/x86/kernel/process_32.c +++ b/arch/x86/kernel/process_32.c @@ -292,6 +292,12 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p) this_cpu_write(cpu_current_top_of_stack, (unsigned long)task_stack_page(next_p) + THREAD_SIZE); + /* + * TODO: Find a way to let cpu_current_top_of_stack point to + * cpu_tss_rw.x86_tss.sp1. Doing so now results in stack corruption with + * iret exceptions. + */ + this_cpu_write(cpu_tss_rw.x86_tss.sp1, next_p->thread.sp0); /* * Restore %gs if needed (which is common) -- 2.13.6 From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751606AbeAPQ5k (ORCPT + 1 other); Tue, 16 Jan 2018 11:57:40 -0500 Received: from 8bytes.org ([81.169.241.247]:54746 "EHLO theia.8bytes.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751309AbeAPQsb (ORCPT ); Tue, 16 Jan 2018 11:48:31 -0500 From: Joerg Roedel To: Thomas Gleixner , Ingo Molnar , "H . 
Peter Anvin"
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long , jroedel@suse.de, joro@8bytes.org
Subject: [PATCH 11/16] x86/mm/pgtable: Move pti_set_user_pgd() to pgtable.h
Date: Tue, 16 Jan 2018 17:36:54 +0100
Message-Id: <1516120619-1159-12-git-send-email-joro@8bytes.org>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1516120619-1159-1-git-send-email-joro@8bytes.org>
References: <1516120619-1159-1-git-send-email-joro@8bytes.org>
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Return-Path:

From: Joerg Roedel

There it is also usable from 32 bit code.

Signed-off-by: Joerg Roedel
---
 arch/x86/include/asm/pgtable.h    | 23 +++++++++++++++++++++++
 arch/x86/include/asm/pgtable_64.h | 21 ---------------------
 2 files changed, 23 insertions(+), 21 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index abafe4d7fd3e..248721971532 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -618,8 +618,31 @@ static inline int is_new_memtype_allowed(u64 paddr, unsigned long size,
 
 pmd_t *populate_extra_pmd(unsigned long vaddr);
 pte_t *populate_extra_pte(unsigned long vaddr);
+
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+pgd_t __pti_set_user_pgd(pgd_t *pgdp, pgd_t pgd);
+
+/*
+ * Take a PGD location (pgdp) and a pgd value that needs to be set there.
+ * Populates the user and returns the resulting PGD that must be set in
+ * the kernel copy of the page tables.
+ */
+static inline pgd_t pti_set_user_pgd(pgd_t *pgdp, pgd_t pgd)
+{
+	if (!static_cpu_has(X86_FEATURE_PTI))
+		return pgd;
+	return __pti_set_user_pgd(pgdp, pgd);
+}
+#else /* CONFIG_PAGE_TABLE_ISOLATION */
+static inline pgd_t pti_set_user_pgd(pgd_t *pgdp, pgd_t pgd)
+{
+	return pgd;
+}
+#endif /* CONFIG_PAGE_TABLE_ISOLATION */
+
 #endif /* __ASSEMBLY__ */
 
+
 #ifdef CONFIG_X86_32
 # include
 #else
diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h
index 3c5a73c8bb50..50a02a32a0b3 100644
--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -131,27 +131,6 @@ static inline pud_t native_pudp_get_and_clear(pud_t *xp)
 #endif
 }
 
-#ifdef CONFIG_PAGE_TABLE_ISOLATION
-pgd_t __pti_set_user_pgd(pgd_t *pgdp, pgd_t pgd);
-
-/*
- * Take a PGD location (pgdp) and a pgd value that needs to be set there.
- * Populates the user and returns the resulting PGD that must be set in
- * the kernel copy of the page tables.
- */
-static inline pgd_t pti_set_user_pgd(pgd_t *pgdp, pgd_t pgd)
-{
-	if (!static_cpu_has(X86_FEATURE_PTI))
-		return pgd;
-	return __pti_set_user_pgd(pgdp, pgd);
-}
-#else
-static inline pgd_t pti_set_user_pgd(pgd_t *pgdp, pgd_t pgd)
-{
-	return pgd;
-}
-#endif
-
 static inline void native_set_p4d(p4d_t *p4dp, p4d_t p4d)
 {
 #if defined(CONFIG_PAGE_TABLE_ISOLATION) && !defined(CONFIG_X86_5LEVEL)
-- 
2.13.6

From mboxrd@z Thu Jan 1 00:00:00 1970
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751960AbeAPQ5o (ORCPT + 1 other); Tue, 16 Jan 2018 11:57:44 -0500
Received: from 8bytes.org ([81.169.241.247]:54738 "EHLO theia.8bytes.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751301AbeAPQsb (ORCPT ); Tue, 16 Jan 2018 11:48:31 -0500
From: Joerg Roedel
To: Thomas Gleixner , Ingo Molnar , "H . Peter Anvin"
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long , jroedel@suse.de, joro@8bytes.org
Subject: [PATCH 16/16] x86/pti: Allow CONFIG_PAGE_TABLE_ISOLATION for x86_32
Date: Tue, 16 Jan 2018 17:36:59 +0100
Message-Id: <1516120619-1159-17-git-send-email-joro@8bytes.org>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1516120619-1159-1-git-send-email-joro@8bytes.org>
References: <1516120619-1159-1-git-send-email-joro@8bytes.org>
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Return-Path:

From: Joerg Roedel

Allow PTI to be compiled on x86_32.

Signed-off-by: Joerg Roedel
---
 security/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/security/Kconfig b/security/Kconfig
index b0cb9a5f9448..93d85fda0f54 100644
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -57,7 +57,7 @@ config SECURITY_NETWORK
 config PAGE_TABLE_ISOLATION
 	bool "Remove the kernel mapping in user mode"
 	default y
-	depends on X86_64 && !UML
+	depends on X86 && !UML
 	help
 	  This feature reduces the number of hardware side channels by
 	  ensuring that the majority of kernel addresses are not mapped
-- 
2.13.6

From mboxrd@z Thu Jan 1 00:00:00 1970
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751126AbeAPRNq (ORCPT + 1 other); Tue, 16 Jan 2018 12:13:46 -0500
Received: from 8bytes.org ([81.169.241.247]:55356 "EHLO theia.8bytes.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750764AbeAPRNp (ORCPT ); Tue, 16 Jan 2018 12:13:45 -0500
Date: Tue, 16 Jan 2018 18:13:43 +0100
From: Joerg Roedel
To: Peter Zijlstra
Cc: Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" , x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long , jroedel@suse.de
Subject: Re: [PATCH 06/16] x86/mm/ldt: Reserve high address-space range for the LDT
Message-ID: <20180116171343.GB28161@8bytes.org>
References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1516120619-1159-7-git-send-email-joro@8bytes.org> <20180116165213.GF2228@hirez.programming.kicks-ass.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20180116165213.GF2228@hirez.programming.kicks-ass.net>
User-Agent: Mutt/1.5.24 (2015-08-30)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Return-Path:

Hi Peter,

On Tue, Jan 16, 2018 at 05:52:13PM +0100, Peter Zijlstra wrote:
> On Tue, Jan 16, 2018 at 05:36:49PM +0100, Joerg Roedel wrote:
> > From: Joerg Roedel
> >
> > Reserve 2MB/4MB of address space for mapping the LDT to
> > user-space.
>
> LDT is 64k, we need 2 per CPU, and NR_CPUS <= 64 on 32bit, that gives
> 64K*2*64=8M > 2M.

Thanks, I'll fix that in the next version.

	Joerg

From mboxrd@z Thu Jan 1 00:00:00 1970
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751142AbeAPRbx (ORCPT + 1 other); Tue, 16 Jan 2018 12:31:53 -0500
Received: from merlin.infradead.org ([205.233.59.134]:54306 "EHLO merlin.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750796AbeAPRbw (ORCPT ); Tue, 16 Jan 2018 12:31:52 -0500
Date: Tue, 16 Jan 2018 18:31:15 +0100
From: Peter Zijlstra
To: Joerg Roedel
Cc: Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" , x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long , jroedel@suse.de
Subject: Re: [PATCH 06/16] x86/mm/ldt: Reserve high address-space range for the LDT
Message-ID: <20180116173115.GG2228@hirez.programming.kicks-ass.net>
References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1516120619-1159-7-git-send-email-joro@8bytes.org> <20180116165213.GF2228@hirez.programming.kicks-ass.net> <20180116171343.GB28161@8bytes.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20180116171343.GB28161@8bytes.org>
User-Agent: Mutt/1.9.2 (2017-12-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Return-Path:

On Tue, Jan 16, 2018 at 06:13:43PM +0100, Joerg Roedel wrote:
> Hi Peter,
>
> On Tue, Jan 16, 2018 at 05:52:13PM +0100, Peter Zijlstra wrote:
> > On Tue, Jan 16, 2018 at 05:36:49PM +0100, Joerg Roedel wrote:
> > > From: Joerg Roedel
> > >
> > > Reserve 2MB/4MB of address space for mapping the LDT to
> > > user-space.
> >
> > LDT is 64k, we need 2 per CPU, and NR_CPUS <= 64 on 32bit, that gives
> > 64K*2*64=8M > 2M.
>
> Thanks, I'll fix that in the next version.

Just lower the max SMP setting until it fits or something. 32bit is too
address space starved for lots of CPU in any case, 64 CPUs on 32bit is
absolutely insane.
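
The sizing arithmetic in this sub-thread, and the NR_CPUS-scaled alternative suggested in the next reply, can be sketched as a few C helpers. This is a minimal sketch, not code from the patch set: it takes the thread's figures as assumptions (a 64 KiB LDT and two LDT slots per CPU), and the helper names and the choice to round the scaled range up to the PMD size (assumed to be 2 MiB with PAE and 4 MiB with 2-level paging, matching the "2MB/4MB" in the patch description) are hypothetical.

```c
#include <assert.h>

/* Figures taken from the thread (assumptions): 64 KiB per LDT, 2 per CPU. */
#define LDT_BYTES     (64UL * 1024)
#define LDTS_PER_CPU  2UL

/* Address space needed to map every CPU's LDT slots. */
static unsigned long ldt_reserve_bytes(unsigned long nr_cpus)
{
	return LDT_BYTES * LDTS_PER_CPU * nr_cpus;
}

/* Largest CPU count whose LDT slots fit into a fixed reservation. */
static unsigned long max_cpus_for_reserve(unsigned long reserve_bytes)
{
	return reserve_bytes / (LDT_BYTES * LDTS_PER_CPU);
}

/*
 * Hypothetical NR_CPUS-scaled reservation: the raw requirement rounded
 * up to a whole number of PMD-sized (2 MiB or 4 MiB) chunks.
 */
static unsigned long scaled_ldt_reserve(unsigned long nr_cpus,
					unsigned long pmd_size)
{
	unsigned long need = ldt_reserve_bytes(nr_cpus);

	/* The mask-based round-up below only works for power-of-two sizes. */
	assert(pmd_size != 0 && (pmd_size & (pmd_size - 1)) == 0);
	return (need + pmd_size - 1) & ~(pmd_size - 1);
}
```

With these numbers 64 CPUs need 8 MiB, while a fixed 2 MiB range covers only 16 CPUs and a 4 MiB range 32 CPUs, so either the CPU limit has to drop or the range has to grow with NR_CPUS.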
From mboxrd@z Thu Jan 1 00:00:00 1970
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751349AbeAPRfG (ORCPT + 1 other); Tue, 16 Jan 2018 12:35:06 -0500
Received: from mx1.redhat.com ([209.132.183.28]:46572 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751173AbeAPRfF (ORCPT ); Tue, 16 Jan 2018 12:35:05 -0500
Subject: Re: [PATCH 06/16] x86/mm/ldt: Reserve high address-space range for the LDT
To: Peter Zijlstra , Joerg Roedel
Cc: Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" , x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long , jroedel@suse.de
References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1516120619-1159-7-git-send-email-joro@8bytes.org> <20180116165213.GF2228@hirez.programming.kicks-ass.net> <20180116171343.GB28161@8bytes.org> <20180116173115.GG2228@hirez.programming.kicks-ass.net>
From: Waiman Long
Organization: Red Hat
Message-ID: <13a45e59-5969-2fdb-25cd-adcd5298784b@redhat.com>
Date: Tue, 16 Jan 2018 12:34:36 -0500
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.2.0
MIME-Version: 1.0
In-Reply-To: <20180116173115.GG2228@hirez.programming.kicks-ass.net>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Content-Language: en-US
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.30]); Tue, 16 Jan 2018 17:34:59 +0000 (UTC)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Return-Path:

On 01/16/2018 12:31 PM, Peter Zijlstra wrote:
> On Tue, Jan 16, 2018 at 06:13:43PM +0100, Joerg Roedel wrote:
>> Hi Peter,
>>
>> On Tue, Jan 16, 2018 at 05:52:13PM +0100, Peter Zijlstra wrote:
>>> On Tue, Jan 16, 2018 at 05:36:49PM +0100, Joerg Roedel wrote:
>>>> From: Joerg Roedel
>>>>
>>>> Reserve 2MB/4MB of address space for mapping the LDT to
>>>> user-space.
>>> LDT is 64k, we need 2 per CPU, and NR_CPUS <= 64 on 32bit, that gives
>>> 64K*2*64=8M > 2M.
>> Thanks, I'll fix that in the next version.
> Just lower the max SMP setting until it fits or something. 32bit is too
> address space starved for lots of CPU in any case, 64 CPUs on 32bit is
> absolutely insane.

Maybe we can just scale the amount of reserved space according to the
current NR_CPUS setting. In this way, we won't waste more memory than is
necessary.

From mboxrd@z Thu Jan 1 00:00:00 1970
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751412AbeAPSDL (ORCPT + 1 other); Tue, 16 Jan 2018 13:03:11 -0500
Received: from mga06.intel.com ([134.134.136.31]:28706 "EHLO mga06.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751060AbeAPSDK (ORCPT ); Tue, 16 Jan 2018 13:03:10 -0500
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.46,369,1511856000"; d="scan'208";a="193810120"
Subject: Re: [PATCH 07/16] x86/mm: Move two more functions from pgtable_64.h to pgtable.h
To: Joerg Roedel , Thomas Gleixner , Ingo Molnar , "H . Peter Anvin"
References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1516120619-1159-8-git-send-email-joro@8bytes.org>
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long , jroedel@suse.de
From: Dave Hansen
Message-ID: <727a7eba-41a0-d5bb-df54-8e58b33fde76@intel.com>
Date: Tue, 16 Jan 2018 10:03:09 -0800
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.5.0
MIME-Version: 1.0
In-Reply-To: <1516120619-1159-8-git-send-email-joro@8bytes.org>
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Return-Path:

On 01/16/2018 08:36 AM, Joerg Roedel wrote:
> +/*
> + * Page table pages are page-aligned. The lower half of the top
> + * level is used for userspace and the top half for the kernel.
> + *
> + * Returns true for parts of the PGD that map userspace and
> + * false for the parts that map the kernel.
> + */
> +static inline bool pgdp_maps_userspace(void *__ptr)
> +{
> +	unsigned long ptr = (unsigned long)__ptr;
> +
> +	return (((ptr & ~PAGE_MASK) / sizeof(pgd_t)) < KERNEL_PGD_BOUNDARY);
> +}

One of the reasons to implement it the other way:

-	return (ptr & ~PAGE_MASK) < (PAGE_SIZE / 2);

is that the compiler can do this all quickly. KERNEL_PGD_BOUNDARY
depends on PAGE_OFFSET which depends on a variable. IOW, the compiler
can't do it.

How much worse is the code that this generates?
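
To see why the two forms only agree when the kernel half begins exactly in the middle of the PGD page, the user-space C sketch below puts them side by side. It assumes, as the patch comment states, that the PGD starts on a page boundary. The helper names, the 4 KiB page size and the two example layouts are illustrative assumptions: 512 eight-byte entries with the kernel starting at entry 256 (64-bit), and 4 eight-byte PAE entries with the kernel starting at entry 3 (3G/1G split). The third helper follows the shape of the precomputed split-size check that tglx proposes later in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumed 4 KiB pages */

/* Byte offset of a PGD entry pointer inside its page-aligned PGD page. */
static size_t pgd_offset_in_page(uintptr_t pgdp)
{
	return pgdp & (SKETCH_PAGE_SIZE - 1);
}

/* Index form: offset -> entry index, compared to the kernel boundary. */
static bool maps_user_by_index(uintptr_t pgdp, size_t entry_size,
			       size_t kernel_pgd_boundary)
{
	assert(entry_size != 0);
	return pgd_offset_in_page(pgdp) / entry_size < kernel_pgd_boundary;
}

/* Half-page form: correct only if the kernel part starts mid-page. */
static bool maps_user_by_half(uintptr_t pgdp)
{
	return pgd_offset_in_page(pgdp) < SKETCH_PAGE_SIZE / 2;
}

/* Split-size form: boundary * entry_size folded into one byte offset. */
static bool maps_user_by_split(uintptr_t pgdp, size_t split_bytes)
{
	return pgd_offset_in_page(pgdp) < split_bytes;
}
```

On the 64-bit layout, entry 256 sits at byte offset 2048, so both the index form and the half-page form classify it as kernel. On the PAE layout, the first kernel entry (index 3) sits at offset 24: the index form gets it right, but the half-page form reports it as user. A split size of 3 * 8 = 24 bytes gives the right answer and is a compile-time constant whenever the configuration fixes the split.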
From mboxrd@z Thu Jan 1 00:00:00 1970
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751182AbeAPSGw (ORCPT + 1 other); Tue, 16 Jan 2018 13:06:52 -0500
Received: from mga07.intel.com ([134.134.136.100]:38867 "EHLO mga07.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750750AbeAPSGv (ORCPT ); Tue, 16 Jan 2018 13:06:51 -0500
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.46,369,1511856000"; d="scan'208";a="193811019"
Subject: Re: [PATCH 10/16] x86/mm/pti: Populate valid user pud entries
To: Joerg Roedel , Thomas Gleixner , Ingo Molnar , "H . Peter Anvin"
References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1516120619-1159-11-git-send-email-joro@8bytes.org>
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long , jroedel@suse.de
From: Dave Hansen
Message-ID:
Date: Tue, 16 Jan 2018 10:06:48 -0800
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.5.0
MIME-Version: 1.0
In-Reply-To: <1516120619-1159-11-git-send-email-joro@8bytes.org>
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Return-Path:

On 01/16/2018 08:36 AM, Joerg Roedel wrote:
>
> In PAE page-tables at the top-level most bits we usually set
> with _KERNPG_TABLE are reserved, resulting in a #GP when
> they are loaded by the processor.

Can you save me the trip to the SDM and remind me which bits actually
cause trouble here?
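
Joerg answers this question further down the thread. The sketch below illustrates his answer by checking a 32-bit PAE top-level entry (PDPTE) against the format in the Intel SDM: in a present PDPTE only P (bit 0), PWT (bit 3), PCD (bit 4), the ignored bits 11:9 and the physical-address bits from 12 up to MAXPHYADDR-1 may be set. Everything else, including RW, A, D and NX, is reserved, and a present PDPTE with reserved bits set causes #GP when it is loaded. The macro names are local to this sketch rather than the kernel's. The 36-bit MAXPHYADDR in the example, and the assumption that _KERNPG_TABLE amounts to present, RW, accessed and dirty, are stated here as assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* x86 paging-entry bit positions (names local to this sketch). */
#define E_P    (UINT64_C(1) << 0)	/* present */
#define E_RW   (UINT64_C(1) << 1)	/* writable */
#define E_PWT  (UINT64_C(1) << 3)	/* write-through */
#define E_PCD  (UINT64_C(1) << 4)	/* cache disable */
#define E_A    (UINT64_C(1) << 5)	/* accessed */
#define E_D    (UINT64_C(1) << 6)	/* dirty */
#define E_NX   (UINT64_C(1) << 63)	/* no-execute */

/* Bits 11:9 of a PAE PDPTE are ignored by the CPU, not reserved. */
#define PDPTE_IGNORED (UINT64_C(0x7) << 9)

/*
 * Return the reserved bits set in a PAE PDPTE, i.e. the bits that make
 * loading it fault. maxphyaddr is the CPU's physical-address width.
 */
static uint64_t pae_pdpte_reserved_bits(uint64_t entry, unsigned int maxphyaddr)
{
	uint64_t addr_mask, allowed;

	assert(maxphyaddr > 12 && maxphyaddr <= 52);

	/* A non-present PDPTE is not checked for reserved bits. */
	if (!(entry & E_P))
		return 0;

	addr_mask = ((UINT64_C(1) << maxphyaddr) - 1) & ~UINT64_C(0xfff);
	allowed = E_P | E_PWT | E_PCD | PDPTE_IGNORED | addr_mask;
	return entry & ~allowed;
}
```

An entry built from the assumed _KERNPG_TABLE flags reports RW, A and D as reserved. Setting NX reports bit 63, which is consistent with the remark in patch 12 that PAE paging cannot set NX at the PGD level.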
From mboxrd@z Thu Jan 1 00:00:00 1970
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751697AbeAPSLQ (ORCPT + 1 other); Tue, 16 Jan 2018 13:11:16 -0500
Received: from mga09.intel.com ([134.134.136.24]:49447 "EHLO mga09.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751028AbeAPSLP (ORCPT ); Tue, 16 Jan 2018 13:11:15 -0500
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.46,369,1511856000"; d="scan'208";a="193811786"
Subject: Re: [PATCH 12/16] x86/mm/pae: Populate the user page-table with user pgd's
To: Joerg Roedel , Thomas Gleixner , Ingo Molnar , "H . Peter Anvin"
References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1516120619-1159-13-git-send-email-joro@8bytes.org>
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long , jroedel@suse.de
From: Dave Hansen
Message-ID:
Date: Tue, 16 Jan 2018 10:11:14 -0800
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.5.0
MIME-Version: 1.0
In-Reply-To: <1516120619-1159-13-git-send-email-joro@8bytes.org>
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Return-Path:

On 01/16/2018 08:36 AM, Joerg Roedel wrote:
> +#ifdef CONFIG_X86_64
>  	/*
>  	 * If this is normal user memory, make it NX in the kernel
>  	 * pagetables so that, if we somehow screw up and return to
> @@ -134,10 +135,16 @@ pgd_t __pti_set_user_pgd(pgd_t *pgdp, pgd_t pgd)
>  	 * may execute from it
>  	 * - we don't have NX support
>  	 * - we're clearing the PGD (i.e. the new pgd is not present).
> +	 * - We run on a 32 bit kernel. 2-level paging doesn't support NX at
> +	 * all and PAE paging does not support it on the PGD level. We can
> +	 * set it in the PMD level there in the future, but that means we
> +	 * need to unshare the PMDs between the kernel and the user
> +	 * page-tables.
>  	 */
>  	if ((pgd.pgd & (_PAGE_USER|_PAGE_PRESENT)) == (_PAGE_USER|_PAGE_PRESENT) &&
>  	    (__supported_pte_mask & _PAGE_NX))
>  		pgd.pgd |= _PAGE_NX;
> +#endif

Ugh. The ghosts of PAE have come back to haunt us.

Could we do:

static inline bool pgd_supports_nx(void)
{
#ifdef CONFIG_X86_64
	return (__supported_pte_mask & _PAGE_NX);
#else
	/* No 32-bit page tables support NX at PGD level */
	return 0;
#endif
}

Nobody will ever spot the #ifdef the way you laid it out.

From mboxrd@z Thu Jan 1 00:00:00 1970
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751594AbeAPSOV (ORCPT + 1 other); Tue, 16 Jan 2018 13:14:21 -0500
Received: from mga05.intel.com ([192.55.52.43]:8301 "EHLO mga05.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750890AbeAPSOU (ORCPT ); Tue, 16 Jan 2018 13:14:20 -0500
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.46,369,1511856000"; d="scan'208";a="193813334"
Subject: Re: [RFC PATCH 00/16] PTI support for x86-32
To: Joerg Roedel , Thomas Gleixner , Ingo Molnar , "H . Peter Anvin"
References: <1516120619-1159-1-git-send-email-joro@8bytes.org>
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long , jroedel@suse.de
From: Dave Hansen
Message-ID: <1c7da3dc-279a-fa07-247b-7596cf758a55@intel.com>
Date: Tue, 16 Jan 2018 10:14:19 -0800
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.5.0
MIME-Version: 1.0
In-Reply-To: <1516120619-1159-1-git-send-email-joro@8bytes.org>
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Return-Path:

Joerg,

Very cool! I really appreciate you putting this together.

I don't see any real showstoppers or things that I think will *break*
64-bit. I just hope that we can merge this _slowly_ in case it breaks
64-bit along the way.

I didn't look at the assembly in too much detail.

From mboxrd@z Thu Jan 1 00:00:00 1970
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751383AbeAPSgD (ORCPT + 1 other); Tue, 16 Jan 2018 13:36:03 -0500
Received: from Galois.linutronix.de ([146.0.238.70]:43949 "EHLO Galois.linutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750750AbeAPSgC (ORCPT ); Tue, 16 Jan 2018 13:36:02 -0500
Date: Tue, 16 Jan 2018 19:35:51 +0100 (CET)
From: Thomas Gleixner
To: Joerg Roedel
cc: Ingo Molnar , "H . 
Peter Anvin" , x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long , jroedel@suse.de
Subject: Re: [PATCH 01/16] x86/entry/32: Rename TSS_sysenter_sp0 to TSS_sysenter_stack
In-Reply-To: <1516120619-1159-2-git-send-email-joro@8bytes.org>
Message-ID:
References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1516120619-1159-2-git-send-email-joro@8bytes.org>
User-Agent: Alpine 2.20 (DEB 67 2015-01-07)
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
X-Linutronix-Spam-Score: -1.0
X-Linutronix-Spam-Level: -
X-Linutronix-Spam-Status: No , -1.0 points, 5.0 required, ALL_TRUSTED=-1,SHORTCIRCUIT=-0.0001
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Return-Path:

On Tue, 16 Jan 2018, Joerg Roedel wrote:

> From: Joerg Roedel
>
> The stack addresss doesn't need to be stored in tss.sp0 if
> we switch manually like on sysenter. Rename the offset so
> that it still makes sense when we its location.

-ENOSENTENCE

Other than that. Makes sense.

> Signed-off-by: Joerg Roedel
> ---
>  arch/x86/entry/entry_32.S        | 2 +-
>  arch/x86/kernel/asm-offsets_32.c | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
> index a1f28a54f23a..eb8c5615777b 100644
> --- a/arch/x86/entry/entry_32.S
> +++ b/arch/x86/entry/entry_32.S
> @@ -401,7 +401,7 @@ ENTRY(xen_sysenter_target)
>   * 0(%ebp) arg6
>   */
>  ENTRY(entry_SYSENTER_32)
> -	movl TSS_sysenter_sp0(%esp), %esp
> +	movl TSS_sysenter_stack(%esp), %esp
>  .Lsysenter_past_esp:
>  	pushl $__USER_DS /* pt_regs->ss */
>  	pushl %ebp /* pt_regs->sp (stashed in bp) */
> diff --git a/arch/x86/kernel/asm-offsets_32.c b/arch/x86/kernel/asm-offsets_32.c
> index fa1261eefa16..654229bac2fc 100644
> --- a/arch/x86/kernel/asm-offsets_32.c
> +++ b/arch/x86/kernel/asm-offsets_32.c
> @@ -47,7 +47,7 @@ void foo(void)
>  	BLANK();
>  
>  	/* Offset from the sysenter stack to tss.sp0 */
> -	DEFINE(TSS_sysenter_sp0, offsetof(struct cpu_entry_area, tss.x86_tss.sp0) -
> +	DEFINE(TSS_sysenter_stack, offsetof(struct cpu_entry_area, tss.x86_tss.sp0) -
>  	       offsetofend(struct cpu_entry_area, entry_stack_page.stack));
>  
>  #ifdef CONFIG_CC_STACKPROTECTOR
> --
> 2.13.6
>
>

From mboxrd@z Thu Jan 1 00:00:00 1970
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751678AbeAPS7E (ORCPT + 1 other); Tue, 16 Jan 2018 13:59:04 -0500
Received: from mail-io0-f196.google.com ([209.85.223.196]:39662 "EHLO mail-io0-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751107AbeAPS7C (ORCPT ); Tue, 16 Jan 2018 13:59:02 -0500
X-Google-Smtp-Source: ACJfBouYM8eiRj5Knhu1I8X3WkVnmfsqgJ/gEydjZqTpPWUvutV/+f/9GswWuq0DV6HbON35Yzqyd+XQhF0UBlBlGtY=
MIME-Version: 1.0
In-Reply-To: <1516120619-1159-1-git-send-email-joro@8bytes.org>
References: <1516120619-1159-1-git-send-email-joro@8bytes.org>
From: Linus Torvalds
Date: Tue, 16 Jan 2018 10:59:01 -0800
X-Google-Sender-Auth: T4To39vhDeOLpW8HlVWFfaZMd-E
Message-ID:
Subject: Re: [RFC PATCH 00/16] PTI support for x86-32
To: Joerg Roedel
Cc: Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" , "the arch/x86 maintainers" , Linux Kernel Mailing List , linux-mm , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , "Liguori, Anthony" , Daniel Gruss , Hugh Dickins , Kees Cook , Andrea Arcangeli , Waiman Long , Joerg Roedel
Content-Type: text/plain; charset="UTF-8"
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Return-Path:

On Tue, Jan 16, 2018 at 8:36 AM, Joerg Roedel wrote:
>
> here is my current WIP code to enable PTI on x86-32. It is
> still in a pretty early state, but it successfully boots my
> KVM guest with PAE and with legacy paging. The existing PTI
> code for x86-64 already prepares a lot of the stuff needed
> for 32 bit too, thanks for that to all the people involved
> in its development :)

Yes, I'm very happy to see that this is actually not nearly as bad as
I feared it might be.

Some of those #ifdef's in the PTI code you added might want more
commentary about what the exact differences are. And maybe they could
be done more cleanly with some abstraction. But nothing looked
_horrible_.

> The code has not run on bare-metal yet, I'll test that in
> the next days once I setup a 32 bit box again. I also havn't
> tested Wine and DosEMU yet, so this might also be broken.

.. and please run all the segment and syscall selfchecks that Andy has written.

But yes, checking bare metal, and checking the "odd" applications like
Wine and dosemu (and kvm etc) within the PTI kernel is certainly a
good idea.

> One of the things that are surely broken is XEN_PV support.
> I'd appreciate any help with testing and bugfixing on that
> front.

Xen PV and PTI don't work together even on x86-64 afaik, the Xen
people apparently felt it wasn't worth it. See the

	if (hypervisor_is_type(X86_HYPER_XEN_PV)) {
		pti_print_if_insecure("disabled on XEN PV.");
		return;
	}

in pti_check_boottime_disable().

               Linus

From mboxrd@z Thu Jan 1 00:00:00 1970
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751796AbeAPTCb (ORCPT + 1 other); Tue, 16 Jan 2018 14:02:31 -0500
Received: from mga14.intel.com ([192.55.52.115]:61510 "EHLO mga14.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751732AbeAPTC3 (ORCPT ); Tue, 16 Jan 2018 14:02:29 -0500
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.46,369,1511856000"; d="scan'208";a="193827350"
Subject: Re: [RFC PATCH 00/16] PTI support for x86-32
To: Linus Torvalds , Joerg Roedel
References: <1516120619-1159-1-git-send-email-joro@8bytes.org>
Cc: Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" , the arch/x86 maintainers , Linux Kernel Mailing List , linux-mm , Andy Lutomirski , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , "Liguori, Anthony" , Daniel Gruss , Hugh Dickins , Kees Cook , Andrea Arcangeli , Waiman Long , Joerg Roedel
From: Dave Hansen
Message-ID: <90748aea-6fc0-48a5-d154-c98465fea42c@intel.com>
Date: Tue, 16 Jan 2018 11:02:28 -0800
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.5.0
MIME-Version: 1.0
In-Reply-To:
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Return-Path:

On 01/16/2018 10:59 AM, Linus Torvalds wrote:
>> The code has not run on bare-metal yet, I'll test that in
>> the next days once I setup a 32 bit box again. I also havn't
>> tested Wine and DosEMU yet, so this might also be broken.
> .. and please run all the segment and syscall selfchecks that Andy has written.
>
> But yes, checking bare metal, and checking the "odd" applications like
> Wine and dosemu (and kvm etc) within the PTI kernel is certainly a
> good idea.

I tried to document a list of the "gotchas" that tripped us up during
the 64-bit effort under "Testing":

> https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?h=x86/pti&id=01c9b17bf673b05bb401b76ec763e9730ccf1376

NMIs were a biggie too.

From mboxrd@z Thu Jan 1 00:00:00 1970
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751590AbeAPTLJ (ORCPT + 1 other); Tue, 16 Jan 2018 14:11:09 -0500
Received: from 8bytes.org ([81.169.241.247]:59064 "EHLO theia.8bytes.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751451AbeAPTLH (ORCPT ); Tue, 16 Jan 2018 14:11:07 -0500
Date: Tue, 16 Jan 2018 20:11:05 +0100
From: Joerg Roedel
To: Dave Hansen
Cc: Thomas Gleixner , Ingo Molnar , "H . 
Peter Anvin" , x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long , jroedel@suse.de
Subject: Re: [PATCH 07/16] x86/mm: Move two more functions from pgtable_64.h to pgtable.h
Message-ID: <20180116191105.GC28161@8bytes.org>
References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1516120619-1159-8-git-send-email-joro@8bytes.org> <727a7eba-41a0-d5bb-df54-8e58b33fde76@intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <727a7eba-41a0-d5bb-df54-8e58b33fde76@intel.com>
User-Agent: Mutt/1.5.24 (2015-08-30)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Return-Path:

On Tue, Jan 16, 2018 at 10:03:09AM -0800, Dave Hansen wrote:
> On 01/16/2018 08:36 AM, Joerg Roedel wrote:
> > +	return (((ptr & ~PAGE_MASK) / sizeof(pgd_t)) < KERNEL_PGD_BOUNDARY);
> > +}
>
> One of the reasons to implement it the other way:
>
> -	return (ptr & ~PAGE_MASK) < (PAGE_SIZE / 2);
>
> is that the compiler can do this all quickly. KERNEL_PGD_BOUNDARY
> depends on PAGE_OFFSET which depends on a variable. IOW, the compiler
> can't do it.
>
> How much worse is the code that this generates?

I haven't looked at the actual code this generates, but the
(PAGE_SIZE / 2) comparison doesn't work on 32 bit where the address
space is not always evenly split. I'll look into a better way to check
this.

Thanks,

	Joerg

From mboxrd@z Thu Jan 1 00:00:00 1970
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751763AbeAPTbi (ORCPT + 1 other); Tue, 16 Jan 2018 14:31:38 -0500
Received: from smtp.ctxuk.citrix.com ([185.25.65.24]:27934 "EHLO SMTP.EU.CITRIX.COM" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750811AbeAPTbg (ORCPT ); Tue, 16 Jan 2018 14:31:36 -0500
X-IronPort-AV: E=Sophos;i="5.46,369,1511827200"; d="scan'208";a="66151019"
Subject: Re: [RFC PATCH 00/16] PTI support for x86-32
To: Linus Torvalds , Joerg Roedel
CC: Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" , the arch/x86 maintainers , Linux Kernel Mailing List , linux-mm , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , "Liguori, Anthony" , Daniel Gruss , Hugh Dickins , Kees Cook , Andrea Arcangeli , Waiman Long , Joerg Roedel , Juergen Gross , Jan Beulich
References: <1516120619-1159-1-git-send-email-joro@8bytes.org>
From: Andrew Cooper
Message-ID:
Date: Tue, 16 Jan 2018 19:21:00 +0000
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.5.2
MIME-Version: 1.0
In-Reply-To:
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 8bit
Content-Language: en-GB
X-ClientProxiedBy: AMSPEX02CAS02.citrite.net (10.69.22.113) To AMSPEX02CL01.citrite.net (10.69.22.125)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Return-Path:

On 16/01/18 18:59, Linus Torvalds wrote:
> On Tue, Jan 16, 2018 at 8:36 AM, Joerg Roedel wrote:
>> One of the things that are surely broken is XEN_PV support.
>> I'd appreciate any help with testing and bugfixing on that
>> front.
> Xen PV and PTI don't work together even on x86-64 afaik, the Xen
> people apparently felt it wasn't worth it. See the
>
> 	if (hypervisor_is_type(X86_HYPER_XEN_PV)) {
> 		pti_print_if_insecure("disabled on XEN PV.");
> 		return;
> 	}

64bit PV guests under Xen already have split pagetables. It is a base
and necessary part of the ABI, because segment limits stopped working in
64bit. 32bit PV guests aren't split, but by far the most efficient way
of doing this is to introduce a new enlightenment and have Xen switch
all this stuff (and IBRS, for that matter) on behalf of the guest kernel
on context switch.

~Andrew

From mboxrd@z Thu Jan 1 00:00:00 1970
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751990AbeAPTfN (ORCPT + 1 other); Tue, 16 Jan 2018 14:35:13 -0500
Received: from Galois.linutronix.de ([146.0.238.70]:44083 "EHLO Galois.linutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751465AbeAPTfK (ORCPT ); Tue, 16 Jan 2018 14:35:10 -0500
Date: Tue, 16 Jan 2018 20:34:59 +0100 (CET)
From: Thomas Gleixner
To: Joerg Roedel
cc: Dave Hansen , Ingo Molnar , "H . 
Peter Anvin" , x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long , jroedel@suse.de Subject: Re: [PATCH 07/16] x86/mm: Move two more functions from pgtable_64.h to pgtable.h In-Reply-To: <20180116191105.GC28161@8bytes.org> Message-ID: References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1516120619-1159-8-git-send-email-joro@8bytes.org> <727a7eba-41a0-d5bb-df54-8e58b33fde76@intel.com> <20180116191105.GC28161@8bytes.org> User-Agent: Alpine 2.20 (DEB 67 2015-01-07) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII X-Linutronix-Spam-Score: -1.0 X-Linutronix-Spam-Level: - X-Linutronix-Spam-Status: No , -1.0 points, 5.0 required, ALL_TRUSTED=-1,SHORTCIRCUIT=-0.0001 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Return-Path: On Tue, 16 Jan 2018, Joerg Roedel wrote: > On Tue, Jan 16, 2018 at 10:03:09AM -0800, Dave Hansen wrote: > > On 01/16/2018 08:36 AM, Joerg Roedel wrote: > > > + return (((ptr & ~PAGE_MASK) / sizeof(pgd_t)) < KERNEL_PGD_BOUNDARY); > > > +} > > > > One of the reasons to implement it the other way: > > > > - return (ptr & ~PAGE_MASK) < (PAGE_SIZE / 2); > > > > is that the compiler can do this all quickly. KERNEL_PGD_BOUNDARY > > depends on PAGE_OFFSET which depends on a variable. IOW, the compiler > > can't do it. > > > > How much worse is the code that this generates? > > I havn't looked at the actual code this generates, but the > (PAGE_SIZE / 2) comparison doesn't work on 32 bit where the address > space is not always evenly split. I'll look into a better way to check > this. 
It should be trivial enough to do return (ptr & ~PAGE_MASK) < PGD_SPLIT_SIZE; and define it as PAGE_SIZE/2 for 64bit and for PAE make it depend on the configured address space split. Thanks, tglx From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751615AbeAPTlP (ORCPT + 1 other); Tue, 16 Jan 2018 14:41:15 -0500 Received: from 8bytes.org ([81.169.241.247]:59362 "EHLO theia.8bytes.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750905AbeAPTlO (ORCPT ); Tue, 16 Jan 2018 14:41:14 -0500 Date: Tue, 16 Jan 2018 20:41:12 +0100 From: Joerg Roedel To: Dave Hansen Cc: Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" , x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long , jroedel@suse.de Subject: Re: [PATCH 10/16] x86/mm/pti: Populate valid user pud entries Message-ID: <20180116194112.GD28161@8bytes.org> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1516120619-1159-11-git-send-email-joro@8bytes.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.24 (2015-08-30) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Return-Path: On Tue, Jan 16, 2018 at 10:06:48AM -0800, Dave Hansen wrote: > On 01/16/2018 08:36 AM, Joerg Roedel wrote: > > > > In PAE page-tables at the top-level most bits we usually set > > with _KERNPG_TABLE are reserved, resulting in a #GP when > > they are loaded by the processor. > > Can you save me the trip to the SDM and remind me which bits actually > cause trouble here?
Everything besides PRESENT, PCD, PWT and the actual physical address, so RW, and NX for example cause a #GP. Joerg From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751873AbeAPTo2 (ORCPT + 1 other); Tue, 16 Jan 2018 14:44:28 -0500 Received: from 8bytes.org ([81.169.241.247]:59446 "EHLO theia.8bytes.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751465AbeAPTo0 (ORCPT ); Tue, 16 Jan 2018 14:44:26 -0500 Date: Tue, 16 Jan 2018 20:44:24 +0100 From: Joerg Roedel To: Dave Hansen Cc: Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" , x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long , jroedel@suse.de Subject: Re: [PATCH 12/16] x86/mm/pae: Populate the user page-table with user pgd's Message-ID: <20180116194424.GE28161@8bytes.org> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1516120619-1159-13-git-send-email-joro@8bytes.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.24 (2015-08-30) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Return-Path: On Tue, Jan 16, 2018 at 10:11:14AM -0800, Dave Hansen wrote: > > Ugh. The ghosts of PAE have come back to haunt us. :-) Yeah, PAE caused the most trouble for me while getting this running. 
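The PAE top-level restriction Joerg describes (only PRESENT, PWT, PCD and the physical address may be set; RW, NX and the other _KERNPG_TABLE bits cause a #GP) can be expressed as a mask check. A minimal sketch, assuming a 36-bit physical address width and a simplified _KERNPG_TABLE of PRESENT | RW | ACCESSED | DIRTY. The `SK_` names and both helpers are hypothetical; the bit positions are the usual x86 page-table bit positions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Usual x86 page-table entry bit positions. */
#define SK_PAGE_PRESENT  (1ULL << 0)
#define SK_PAGE_RW       (1ULL << 1)
#define SK_PAGE_USER     (1ULL << 2)
#define SK_PAGE_PWT      (1ULL << 3)
#define SK_PAGE_PCD      (1ULL << 4)
#define SK_PAGE_ACCESSED (1ULL << 5)
#define SK_PAGE_DIRTY    (1ULL << 6)
#define SK_PAGE_NX       (1ULL << 63)

/* Assumption: 36-bit physical addresses, 4 KiB-aligned table pointers. */
#define SK_PHYS_ADDR_MASK (((1ULL << 36) - 1) & ~0xfffULL)

/* Simplified _KERNPG_TABLE; the kernel's definition may carry more bits. */
#define SK_KERNPG_TABLE \
	(SK_PAGE_PRESENT | SK_PAGE_RW | SK_PAGE_ACCESSED | SK_PAGE_DIRTY)

/* Bits a PAE top-level entry may carry, following the list in the reply. */
#define SK_PAE_TOP_ALLOWED \
	(SK_PAGE_PRESENT | SK_PAGE_PWT | SK_PAGE_PCD | SK_PHYS_ADDR_MASK)

/* True if no disallowed bit is set (under the assumptions above). */
static inline bool pae_top_entry_ok(uint64_t entry)
{
	return (entry & ~SK_PAE_TOP_ALLOWED) == 0;
}

/* Drop the bits PAE rejects at the top level, keeping the table address. */
static inline uint64_t pae_sanitize_top_entry(uint64_t entry)
{
	return entry & SK_PAE_TOP_ALLOWED;
}
```

Note that NX falls outside the allowed mask, which is why NX cannot be applied at the PAE PGD level at all.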
> > Could we do: > > static inline bool pgd_supports_nx(unsigned long) > { > #ifdef CONFIG_X86_64 > return (__supported_pte_mask & _PAGE_NX); > #else > /* No 32-bit page tables support NX at PGD level */ > return 0; > #endif > } > > Nobody will ever spot the #ifdef the way you laid it out. Right, that's a better way to do it. I'll change it in the next version. Thanks, Joerg From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751843AbeAPTqR (ORCPT + 1 other); Tue, 16 Jan 2018 14:46:17 -0500 Received: from 8bytes.org ([81.169.241.247]:59518 "EHLO theia.8bytes.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751438AbeAPTqQ (ORCPT ); Tue, 16 Jan 2018 14:46:16 -0500 Date: Tue, 16 Jan 2018 20:46:14 +0100 From: Joerg Roedel To: Dave Hansen Cc: Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" , x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long , jroedel@suse.de Subject: Re: [RFC PATCH 00/16] PTI support for x86-32 Message-ID: <20180116194614.GF28161@8bytes.org> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1c7da3dc-279a-fa07-247b-7596cf758a55@intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1c7da3dc-279a-fa07-247b-7596cf758a55@intel.com> User-Agent: Mutt/1.5.24 (2015-08-30) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Return-Path: On Tue, Jan 16, 2018 at 10:14:19AM -0800, Dave Hansen wrote: > Joerg, > > Very cool!. Thanks :) > I really appreciate you putting this together.
I don't see any real > showstoppers or things that I think will *break* 64-bit. I just hope > that we can merge this _slowly_ in case it breaks 64-bit along the way. Sure, it needs a lot more testing and most likely fixing anyway. So there is still some way to go before this is ready for merging. Joerg From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752029AbeAPTzq (ORCPT + 1 other); Tue, 16 Jan 2018 14:55:46 -0500 Received: from 8bytes.org ([81.169.241.247]:59640 "EHLO theia.8bytes.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752013AbeAPTzp (ORCPT ); Tue, 16 Jan 2018 14:55:45 -0500 Date: Tue, 16 Jan 2018 20:55:43 +0100 From: Joerg Roedel To: Linus Torvalds Cc: Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" , the arch/x86 maintainers , Linux Kernel Mailing List , linux-mm , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , "Liguori, Anthony" , Daniel Gruss , Hugh Dickins , Kees Cook , Andrea Arcangeli , Waiman Long , Joerg Roedel Subject: Re: [RFC PATCH 00/16] PTI support for x86-32 Message-ID: <20180116195543.GG28161@8bytes.org> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.24 (2015-08-30) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Return-Path: Hi Linus, On Tue, Jan 16, 2018 at 10:59:01AM -0800, Linus Torvalds wrote: > Yes, I'm very happy to see that this is actually not nearly as bad as > I feared it might be, Yeah, I was looking at the original PTI patches and my impression was that a lot of the complicated stuff (like setting up the cpu_entry_area) was already in there for 32 bit too. 
So it was mostly about the entry code and some changes to the 32bit page-table code. > Some of those #ifdef's in the PTI code you added might want more > commentary about what the exact differences are. And maybe they could > be done more cleanly with some abstraction. But nothing looked > _horrible_. I'll add more comments and better abstraction; Dave has already suggested some improvements here. Reading some of my comments again, they need a rework anyway. > .. and please run all the segment and syscall selfchecks that Andy has written. Didn't know about them yet, thanks. I will run them too in my testing. > Xen PV and PTI don't work together even on x86-64 afaik, the Xen > people apparently felt it wasn't worth it. See the > > if (hypervisor_is_type(X86_HYPER_XEN_PV)) { > pti_print_if_insecure("disabled on XEN PV."); > return; > } > > in pti_check_boottime_disable(). But I might have broken something for them anyway; honestly, I didn't pay much attention to the XEN_PV case while I was trying to get it running here. My hope is that someone who knows Xen better than I do will help out :) Regards, Joerg From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751919AbeAPUa0 (ORCPT + 1 other); Tue, 16 Jan 2018 15:30:26 -0500 Received: from Galois.linutronix.de ([146.0.238.70]:44202 "EHLO Galois.linutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751106AbeAPUaZ (ORCPT ); Tue, 16 Jan 2018 15:30:25 -0500 Date: Tue, 16 Jan 2018 21:30:14 +0100 (CET) From: Thomas Gleixner To: Joerg Roedel cc: Ingo Molnar , "H .
Peter Anvin" , x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long , jroedel@suse.de Subject: Re: [PATCH 02/16] x86/entry/32: Enter the kernel via trampoline stack In-Reply-To: <1516120619-1159-3-git-send-email-joro@8bytes.org> Message-ID: References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1516120619-1159-3-git-send-email-joro@8bytes.org> User-Agent: Alpine 2.20 (DEB 67 2015-01-07) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII X-Linutronix-Spam-Score: -1.0 X-Linutronix-Spam-Level: - X-Linutronix-Spam-Status: No , -1.0 points, 5.0 required, ALL_TRUSTED=-1,SHORTCIRCUIT=-0.0001 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Return-Path: On Tue, 16 Jan 2018, Joerg Roedel wrote: > @@ -89,13 +89,9 @@ static inline void refresh_sysenter_cs(struct thread_struct *thread) > /* This is used when switching tasks or entering/exiting vm86 mode. 
*/ > static inline void update_sp0(struct task_struct *task) > { > - /* On x86_64, sp0 always points to the entry trampoline stack, which is constant: */ > -#ifdef CONFIG_X86_32 > - load_sp0(task->thread.sp0); > -#else > + /* sp0 always points to the entry trampoline stack, which is constant: */ > if (static_cpu_has(X86_FEATURE_XENPV)) > load_sp0(task_top_of_stack(task)); > -#endif > } > > #endif /* _ASM_X86_SWITCH_TO_H */ > diff --git a/arch/x86/kernel/asm-offsets_32.c b/arch/x86/kernel/asm-offsets_32.c > index 654229bac2fc..7270dd834f4b 100644 > --- a/arch/x86/kernel/asm-offsets_32.c > +++ b/arch/x86/kernel/asm-offsets_32.c > @@ -47,9 +47,11 @@ void foo(void) > BLANK(); > > /* Offset from the sysenter stack to tss.sp0 */ > - DEFINE(TSS_sysenter_stack, offsetof(struct cpu_entry_area, tss.x86_tss.sp0) - > + DEFINE(TSS_sysenter_stack, offsetof(struct cpu_entry_area, tss.x86_tss.sp1) - > offsetofend(struct cpu_entry_area, entry_stack_page.stack)); > > + OFFSET(TSS_sp1, tss_struct, x86_tss.sp1); Can you please split out the change of TSS_sysenter_stack into a separate patch? Other than that, this looks good. Thanks, tglx From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751396AbeAPVDb (ORCPT + 1 other); Tue, 16 Jan 2018 16:03:31 -0500 Received: from Galois.linutronix.de ([146.0.238.70]:44281 "EHLO Galois.linutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750796AbeAPVDZ (ORCPT ); Tue, 16 Jan 2018 16:03:25 -0500 Date: Tue, 16 Jan 2018 22:03:19 +0100 (CET) From: Thomas Gleixner To: Joerg Roedel cc: Ingo Molnar , "H . 
Peter Anvin" , x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long , jroedel@suse.de Subject: Re: [PATCH 09/16] x86/mm/pti: Clone CPU_ENTRY_AREA on PMD level on x86_32 In-Reply-To: <1516120619-1159-10-git-send-email-joro@8bytes.org> Message-ID: References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1516120619-1159-10-git-send-email-joro@8bytes.org> User-Agent: Alpine 2.20 (DEB 67 2015-01-07) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII X-Linutronix-Spam-Score: -1.0 X-Linutronix-Spam-Level: - X-Linutronix-Spam-Status: No , -1.0 points, 5.0 required, ALL_TRUSTED=-1,SHORTCIRCUIT=-0.0001 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Return-Path: On Tue, 16 Jan 2018, Joerg Roedel wrote: > +#ifdef CONFIG_X86_64 > /* > * Clone a single p4d (i.e. a top-level entry on 4-level systems and a > * next-level entry on 5-level systems. > @@ -322,13 +323,29 @@ static void __init pti_clone_p4d(unsigned long addr) > kernel_p4d = p4d_offset(kernel_pgd, addr); > *user_p4d = *kernel_p4d; > } > +#endif > > /* > * Clone the CPU_ENTRY_AREA into the user space visible page table. > */ > static void __init pti_clone_user_shared(void) > { > +#ifdef CONFIG_X86_32 > + /* > + * On 32 bit PAE systems with 1GB of Kernel address space there is only > + * one pgd/p4d for the whole kernel. Cloning that would map the whole > + * address space into the user page-tables, making PTI useless. So clone > + * the page-table on the PMD level to prevent that. 
> + */ > + unsigned long start, end; > + > + start = CPU_ENTRY_AREA_BASE; > + end = start + (PAGE_SIZE * CPU_ENTRY_AREA_PAGES); > + > + pti_clone_pmds(start, end, _PAGE_GLOBAL); > +#else > pti_clone_p4d(CPU_ENTRY_AREA_BASE); > +#endif > } Just a minor nit. You already wrap pti_clone_p4d() into X86_64. So it would be cleaner to do: kernel_p4d = p4d_offset(kernel_pgd, addr); *user_p4d = *kernel_p4d; } static void __init pti_clone_user_shared(void) { pti_clone_p4d(CPU_ENTRY_AREA_BASE); } #else /* CONFIG_X86_64 */ /* * Big fat comment. */ static void __init pti_clone_user_shared(void) { .... } #endif /* !CONFIG_X86_64 */ Thanks, tglx From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751047AbeAPVGw (ORCPT + 1 other); Tue, 16 Jan 2018 16:06:52 -0500 Received: from Galois.linutronix.de ([146.0.238.70]:44307 "EHLO Galois.linutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750811AbeAPVGv (ORCPT ); Tue, 16 Jan 2018 16:06:51 -0500 Date: Tue, 16 Jan 2018 22:06:48 +0100 (CET) From: Thomas Gleixner To: Joerg Roedel cc: Ingo Molnar , "H . 
Peter Anvin" , x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long , jroedel@suse.de Subject: Re: [PATCH 10/16] x86/mm/pti: Populate valid user pud entries In-Reply-To: <1516120619-1159-11-git-send-email-joro@8bytes.org> Message-ID: References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1516120619-1159-11-git-send-email-joro@8bytes.org> User-Agent: Alpine 2.20 (DEB 67 2015-01-07) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII X-Linutronix-Spam-Score: -1.0 X-Linutronix-Spam-Level: - X-Linutronix-Spam-Status: No , -1.0 points, 5.0 required, ALL_TRUSTED=-1,SHORTCIRCUIT=-0.0001 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Return-Path: On Tue, 16 Jan 2018, Joerg Roedel wrote: > From: Joerg Roedel > > With PAE paging we don't have PGD and P4D levels in the > page-table, instead the PUD level is the highest one. > > In PAE page-tables at the top-level most bits we usually set > with _KERNPG_TABLE are reserved, resulting in a #GP when > they are loaded by the processor. > > Work around this by populating PUD entries in the user > page-table only with _PAGE_PRESENT set. > > I am pretty sure there is a cleaner way to do this, but > until I find it use this #ifdef solution. 
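The workaround in the quoted patch description (populate user PUD entries with only _PAGE_PRESENT under PAE, and with the usual table flags otherwise) can be confined to one per-configuration constant, so the code that populates the entry needs no #ifdef. A minimal sketch: `SK_USER_PUD_FLAGS`, the `SK_PAE` build flag, `sk_make_user_pud()` and the simplified flag values are all hypothetical. In a kernel tree the two definitions would live in the 32-bit and 64-bit variants of a header rather than behind one #ifdef.

```c
#include <stdint.h>

#define SK_PAGE_PRESENT  (1ULL << 0)
#define SK_PAGE_RW       (1ULL << 1)
#define SK_PAGE_ACCESSED (1ULL << 5)
#define SK_PAGE_DIRTY    (1ULL << 6)

/* Simplified stand-in for _KERNPG_TABLE. */
#define SK_KERNPG_TABLE \
	(SK_PAGE_PRESENT | SK_PAGE_RW | SK_PAGE_ACCESSED | SK_PAGE_DIRTY)

/* The only configuration-dependent piece: flags for a user PUD entry. */
#ifdef SK_PAE
#define SK_USER_PUD_FLAGS SK_PAGE_PRESENT /* other top-level bits are reserved */
#else
#define SK_USER_PUD_FLAGS SK_KERNPG_TABLE
#endif

/* Build a user PUD entry that points at a 4 KiB-aligned PMD page. */
static inline uint64_t sk_make_user_pud(uint64_t pmd_phys)
{
	return (pmd_phys & ~0xfffULL) | SK_USER_PUD_FLAGS;
}
```

Built with `-DSK_PAE`, the same call for a PMD page at 0x5000 yields 0x5001 instead of 0x5063.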
Stick something like #define _KERNELPG_TABLE_PUD_ENTRY into the 32-bit and 64-bit variants of some relevant header file. Thanks, tglx From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751444AbeAPVK7 (ORCPT + 1 other); Tue, 16 Jan 2018 16:10:59 -0500 Received: from Galois.linutronix.de ([146.0.238.70]:44332 "EHLO Galois.linutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750772AbeAPVK6 (ORCPT ); Tue, 16 Jan 2018 16:10:58 -0500 Date: Tue, 16 Jan 2018 22:10:52 +0100 (CET) From: Thomas Gleixner To: Joerg Roedel cc: Ingo Molnar , "H . Peter Anvin" , x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long , jroedel@suse.de Subject: Re: [PATCH 12/16] x86/mm/pae: Populate the user page-table with user pgd's In-Reply-To: <1516120619-1159-13-git-send-email-joro@8bytes.org> Message-ID: References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1516120619-1159-13-git-send-email-joro@8bytes.org> User-Agent: Alpine 2.20 (DEB 67 2015-01-07) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII X-Linutronix-Spam-Score: -1.0 X-Linutronix-Spam-Level: - X-Linutronix-Spam-Status: No , -1.0 points, 5.0 required, ALL_TRUSTED=-1,SHORTCIRCUIT=-0.0001 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Return-Path: On Tue, 16 Jan 2018, Joerg Roedel wrote: > > +#ifdef CONFIG_X86_64 > /* > * If this is normal user memory, make it NX in the kernel > * pagetables so that, if we somehow screw up and return to > @@ -134,10 +135,16 @@ pgd_t __pti_set_user_pgd(pgd_t *pgdp, pgd_t pgd) > * may execute
from it > * - we don't have NX support > * - we're clearing the PGD (i.e. the new pgd is not present). > + * - We run on a 32 bit kernel. 2-level paging doesn't support NX at > + * all and PAE paging does not support it on the PGD level. We can > + * set it in the PMD level there in the future, but that means we > + * need to unshare the PMDs between the kernel and the user > + * page-tables. > */ > if ((pgd.pgd & (_PAGE_USER|_PAGE_PRESENT)) == (_PAGE_USER|_PAGE_PRESENT) && > (__supported_pte_mask & _PAGE_NX)) > pgd.pgd |= _PAGE_NX; I'd suggest to have: static inline pteval_t supported_pgd_mask(void) { if (IS_ENABLED(CONFIG_X86_64)) return __supported_pte_mask; return __supported_pte_mask & ~_PAGE_NX; } and get rid of the ifdeffery completely. Thanks, tglx From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751117AbeAPVPX (ORCPT + 1 other); Tue, 16 Jan 2018 16:15:23 -0500 Received: from mga09.intel.com ([134.134.136.24]:61883 "EHLO mga09.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750790AbeAPVPV (ORCPT ); Tue, 16 Jan 2018 16:15:21 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.46,369,1511856000"; d="scan'208";a="193860920" Subject: Re: [PATCH 12/16] x86/mm/pae: Populate the user page-table with user pgd's To: Thomas Gleixner , Joerg Roedel References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1516120619-1159-13-git-send-email-joro@8bytes.org> Cc: Ingo Molnar , "H .
Peter Anvin" , x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long , jroedel@suse.de From: Dave Hansen Message-ID: Date: Tue, 16 Jan 2018 13:15:21 -0800 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.5.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Return-Path: On 01/16/2018 01:10 PM, Thomas Gleixner wrote: > > static inline pteval_t supported_pgd_mask(void) > { > if (IS_ENABLED(CONFIG_X86_64)) > return __supported_pte_mask; > return __supported_pte_mask & ~_PAGE_NX); > } > > and get rid of the ifdeffery completely. Heh, that's an entertaining way to do it. Joerg, if you go do it this way, it would be nice to add all the other gunk that we don't allow to be set in the PAE pgd. From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751277AbeAPVUq (ORCPT + 1 other); Tue, 16 Jan 2018 16:20:46 -0500 Received: from Galois.linutronix.de ([146.0.238.70]:44361 "EHLO Galois.linutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750772AbeAPVUp (ORCPT ); Tue, 16 Jan 2018 16:20:45 -0500 Date: Tue, 16 Jan 2018 22:20:40 +0100 (CET) From: Thomas Gleixner To: Joerg Roedel cc: Ingo Molnar , "H . 
Peter Anvin" , x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long , jroedel@suse.de Subject: Re: [RFC PATCH 00/16] PTI support for x86-32 In-Reply-To: <1516120619-1159-1-git-send-email-joro@8bytes.org> Message-ID: References: <1516120619-1159-1-git-send-email-joro@8bytes.org> User-Agent: Alpine 2.20 (DEB 67 2015-01-07) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII X-Linutronix-Spam-Score: -1.0 X-Linutronix-Spam-Level: - X-Linutronix-Spam-Status: No , -1.0 points, 5.0 required, ALL_TRUSTED=-1,SHORTCIRCUIT=-0.0001 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Return-Path: On Tue, 16 Jan 2018, Joerg Roedel wrote: > here is my current WIP code to enable PTI on x86-32. It is > still in a pretty early state, but it successfully boots my > KVM guest with PAE and with legacy paging. The existing PTI > code for x86-64 already prepares a lot of the stuff needed > for 32 bit too, thanks for that to all the people involved > in its development :) > 16 files changed, 333 insertions(+), 123 deletions(-) Impressively small and well done ! Can you please make that patch set against git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86-pti-for-linus so we immediately have it backportable for 4.14 stable? It's only a trivial conflict in pgtable.h, but we'd like to make the life of stable as simple as possible. They have enough headache with the pre 4.14 trees. We can pick some of the simple patches which make defines and inlines available out of the pile right away and apply them to x86/pti to shrink the amount of stuff you have to worry about. 
Thanks, tglx From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751606AbeAPW0p (ORCPT + 1 other); Tue, 16 Jan 2018 17:26:45 -0500 Received: from mail.kernel.org ([198.145.29.99]:56768 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750892AbeAPW0o (ORCPT ); Tue, 16 Jan 2018 17:26:44 -0500 DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 9E659217A4 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=kernel.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=luto@kernel.org X-Google-Smtp-Source: ACJfBote3sQ3dCbsKFd2l9MMxSKLOVzNIVTEXcsX8aZdKRHpaDPOU7yxcfpLTr4AEB67gePP5KC0QCj7wuSs1Flu2Bc= MIME-Version: 1.0 In-Reply-To: <1516120619-1159-1-git-send-email-joro@8bytes.org> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> From: Andy Lutomirski Date: Tue, 16 Jan 2018 14:26:22 -0800 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: [RFC PATCH 00/16] PTI support for x86-32 To: Joerg Roedel Cc: Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" , X86 ML , LKML , linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , "Liguori, Anthony" , Daniel Gruss , Hugh Dickins , Kees Cook , Andrea Arcangeli , Waiman Long , Joerg Roedel Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Return-Path: On Tue, Jan 16, 2018 at 8:36 AM, Joerg Roedel wrote: > From: Joerg Roedel > > Hi, > > here is my current WIP code to enable PTI on x86-32. It is > still in a pretty early state, but it successfully boots my > KVM guest with PAE and with legacy paging. 
The existing PTI > code for x86-64 already prepares a lot of the stuff needed > for 32 bit too, thanks for that to all the people involved > in its development :) > > The patches are split as follows: > > - 1-3 contain the entry-code changes to enter and > exit the kernel via the sysenter trampoline stack. > > - 4-7 are fixes to get the code compile on 32 bit > with CONFIG_PAGE_TABLE_ISOLATION=y. > > - 8-14 adapt the existing PTI code to work properly > on 32 bit and add the needed parts to 32 bit > page-table code. > > - 15 switches PTI on by adding the CR3 switches to > kernel entry/exit. > > - 16 enables the Kconfig for all of X86 > > The code has not run on bare-metal yet, I'll test that in > the next days once I setup a 32 bit box again. I also havn't > tested Wine and DosEMU yet, so this might also be broken. > If you pass all the x86 selftests, then Wine and DOSEMU are pretty likely to work :) --Andy From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751741AbeAPWhb (ORCPT + 1 other); Tue, 16 Jan 2018 17:37:31 -0500 Received: from mail.kernel.org ([198.145.29.99]:58662 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750796AbeAPWha (ORCPT ); Tue, 16 Jan 2018 17:37:30 -0500 DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 8A3D32179F Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=kernel.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=luto@kernel.org X-Google-Smtp-Source: ACJfBov5jiKtE52LuUUoaNyehc0Mp9XLfO/nQ5Fu0j8FK+bnKZA5GJ0Vr2sKKHrcFrq78Rbb1xBpd98qLyGScMxJYfQ= MIME-Version: 1.0 In-Reply-To: References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1516120619-1159-3-git-send-email-joro@8bytes.org> From: Andy Lutomirski Date: Tue, 16 Jan 2018 14:37:08 -0800 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: [PATCH 02/16] x86/entry/32: Enter the kernel via trampoline stack To: Thomas Gleixner Cc: 
Joerg Roedel , Ingo Molnar , "H . Peter Anvin" , X86 ML , LKML , linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , "Liguori, Anthony" , Daniel Gruss , Hugh Dickins , Kees Cook , Andrea Arcangeli , Waiman Long , Joerg Roedel Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Return-Path: On Tue, Jan 16, 2018 at 12:30 PM, Thomas Gleixner wrote: > On Tue, 16 Jan 2018, Joerg Roedel wrote: >> @@ -89,13 +89,9 @@ static inline void refresh_sysenter_cs(struct thread_struct *thread) >> /* This is used when switching tasks or entering/exiting vm86 mode. */ >> static inline void update_sp0(struct task_struct *task) >> { >> - /* On x86_64, sp0 always points to the entry trampoline stack, which is constant: */ >> -#ifdef CONFIG_X86_32 >> - load_sp0(task->thread.sp0); >> -#else >> + /* sp0 always points to the entry trampoline stack, which is constant: */ >> if (static_cpu_has(X86_FEATURE_XENPV)) >> load_sp0(task_top_of_stack(task)); >> -#endif >> } >> >> #endif /* _ASM_X86_SWITCH_TO_H */ >> diff --git a/arch/x86/kernel/asm-offsets_32.c b/arch/x86/kernel/asm-offsets_32.c >> index 654229bac2fc..7270dd834f4b 100644 >> --- a/arch/x86/kernel/asm-offsets_32.c >> +++ b/arch/x86/kernel/asm-offsets_32.c >> @@ -47,9 +47,11 @@ void foo(void) >> BLANK(); >> >> /* Offset from the sysenter stack to tss.sp0 */ >> - DEFINE(TSS_sysenter_stack, offsetof(struct cpu_entry_area, tss.x86_tss.sp0) - >> + DEFINE(TSS_sysenter_stack, offsetof(struct cpu_entry_area, tss.x86_tss.sp1) - >> offsetofend(struct cpu_entry_area, entry_stack_page.stack)); I was going to say that this is just too magical. The convention is that STRUCT_member refers to "member" of "STRUCT". 
Here you're encoding a more complicated calculation. How about putting just the needed offsets in asm_offsets and putting the actual calculation in the asm code or a header. >> >> + OFFSET(TSS_sp1, tss_struct, x86_tss.sp1); This belongs in asm_offsets.c. Just move the asm_offsets_64.c version there and call it a day. From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751465AbeAPWpv (ORCPT + 1 other); Tue, 16 Jan 2018 17:45:51 -0500 Received: from mail.kernel.org ([198.145.29.99]:59596 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750750AbeAPWpt (ORCPT ); Tue, 16 Jan 2018 17:45:49 -0500 DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D8C642178E Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=kernel.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=luto@kernel.org X-Google-Smtp-Source: ACJfBov4GNEapzA/rzfH5lB4FGvu99RRaqVb24n8e+9RPUsMrL6DkmQidkXD+8QmCr5mZ9JfznwO+10V/UsWkGGXciE= MIME-Version: 1.0 In-Reply-To: <1516120619-1159-3-git-send-email-joro@8bytes.org> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1516120619-1159-3-git-send-email-joro@8bytes.org> From: Andy Lutomirski Date: Tue, 16 Jan 2018 14:45:27 -0800 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: [PATCH 02/16] x86/entry/32: Enter the kernel via trampoline stack To: Joerg Roedel Cc: Thomas Gleixner , Ingo Molnar , "H . 
Peter Anvin" , X86 ML , LKML , linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , "Liguori, Anthony" , Daniel Gruss , Hugh Dickins , Kees Cook , Andrea Arcangeli , Waiman Long , Joerg Roedel Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Return-Path: On Tue, Jan 16, 2018 at 8:36 AM, Joerg Roedel wrote: > From: Joerg Roedel > > Use the sysenter stack as a trampoline stack to enter the > kernel. The sysenter stack is already in the cpu_entry_area > and will be mapped to userspace when PTI is enabled. > > Signed-off-by: Joerg Roedel > --- > arch/x86/entry/entry_32.S | 89 +++++++++++++++++++++++++++++++++++----- > arch/x86/include/asm/switch_to.h | 6 +-- > arch/x86/kernel/asm-offsets_32.c | 4 +- > arch/x86/kernel/cpu/common.c | 5 ++- > arch/x86/kernel/process.c | 2 - > arch/x86/kernel/process_32.c | 6 +++ > 6 files changed, 91 insertions(+), 21 deletions(-) > > diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S > index eb8c5615777b..5a7bdb73be9f 100644 > --- a/arch/x86/entry/entry_32.S > +++ b/arch/x86/entry/entry_32.S > @@ -222,6 +222,47 @@ > .endm > > /* > + * Switch from the entry-trampline stack to the kernel stack of the > + * running task. > + * > + * nr_regs is the number of dwords to push from the entry stack to the > + * task stack. If it is > 0 it expects an irq frame at the bottom of the > + * stack. > + * > + * check_user != 0 it will add a check to only switch stacks if the > + * kernel entry was from user-space. > + */ > +.macro SWITCH_TO_KERNEL_STACK nr_regs=0 check_user=0 How about marking nr_regs with :req to force everyone to be explicit? 
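The contract in the quoted macro comment (copy nr_regs dwords from the entry stack to the task stack; with check_user, only switch when the entry came from user space) can be modelled in plain C to make the frame movement easier to follow. This is a hypothetical simulation with arrays for the two stacks. It does not reproduce the TSS lookup or the register usage, and it takes the frame layout from the "Stack layout: ss, esp, eflags, cs, eip, ..." comments in the patch. With that layout, the saved CS sits nr_regs - 4 dwords above the stack pointer, matching the `(\nr_regs - 4)*4(%esp)` test.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A toy downward-growing stack: sp indexes the most recently pushed dword. */
struct sk_stack {
	uint32_t mem[64];
	size_t sp; /* 64 means empty */
};

static void sk_push(struct sk_stack *s, uint32_t v)
{
	s->mem[--s->sp] = v;
}

/*
 * Model of SWITCH_TO_KERNEL_STACK: returns the stack that is active
 * afterwards. The frame occupies entry->mem[sp .. sp + nr_regs - 1],
 * lowest address first. This model requires nr_regs >= 4 for the CS
 * lookup (the real macro only tests nr_regs > 0).
 */
static struct sk_stack *sk_switch_to_kernel_stack(struct sk_stack *entry,
						  struct sk_stack *task,
						  int nr_regs, bool check_user)
{
	if (check_user && nr_regs >= 4 &&
	    (entry->mem[entry->sp + (size_t)(nr_regs - 4)] & 3) == 0)
		return entry; /* RPL 0: entered from kernel mode, stay here */

	/* Push from the highest-addressed dword down, preserving the layout. */
	for (int i = nr_regs - 1; i >= 0; i--)
		sk_push(task, entry->mem[entry->sp + (size_t)i]);

	return task;
}
```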
> +
> + .if \check_user > 0 && \nr_regs > 0
> + testb $3, (\nr_regs - 4)*4(%esp) /* CS */
> + jz .Lend_\@
> + .endif
> +
> + pushl %edi
> + movl %esp, %edi
> +
> + /*
> + * TSS_sysenter_stack is the offset from the bottom of the
> + * entry-stack
> + */
> + movl TSS_sysenter_stack + ((\nr_regs + 1) * 4)(%esp), %esp

This is incomprehensible. You're adding what appears to be the offset
of sysenter_stack within the TSS to something based on esp and
dereferencing that to get the new esp. That's not actually what
you're doing, but please change asm_offsets.c (as in my previous
email) to avoid putting serious arithmetic in it and then do the
arithmetic right here so that it's possible to follow what's going on.

> +
> + /* Copy the registers over */
> + .if \nr_regs > 0
> + i = 0
> + .rept \nr_regs
> + pushl (\nr_regs - i) * 4(%edi)
> + i = i + 1
> + .endr
> + .endif
> +
> + mov (%edi), %edi
> +
> +.Lend_\@:
> +.endm
> +
> +/*
> * %eax: prev task
> * %edx: next task
> */
> @@ -401,7 +442,9 @@ ENTRY(xen_sysenter_target)
> * 0(%ebp) arg6
> */
> ENTRY(entry_SYSENTER_32)
> - movl TSS_sysenter_stack(%esp), %esp
> + /* Kernel stack is empty */
> + SWITCH_TO_KERNEL_STACK

This would be more readable if you put nr_regs in here.

> +
> .Lsysenter_past_esp:
> pushl $__USER_DS /* pt_regs->ss */
> pushl %ebp /* pt_regs->sp (stashed in bp) */
> @@ -521,6 +564,10 @@ ENDPROC(entry_SYSENTER_32)
> ENTRY(entry_INT80_32)
> ASM_CLAC
> pushl %eax /* pt_regs->orig_ax */
> +
> + /* Stack layout: ss, esp, eflags, cs, eip, orig_eax */
> + SWITCH_TO_KERNEL_STACK nr_regs=6 check_user=1
> +

Why check_user?

> @@ -655,6 +702,10 @@ END(irq_entries_start)
> common_interrupt:
> ASM_CLAC
> addl $-0x80, (%esp) /* Adjust vector into the [-256, -1] range */
> +
> + /* Stack layout: ss, esp, eflags, cs, eip, vector */
> + SWITCH_TO_KERNEL_STACK nr_regs=6 check_user=1

LGTM.
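The stack arithmetic in the quoted macro can be checked with a small C model. This is a sketch under stated assumptions, not kernel code: `struct fake_entry_area`, `struct hw_tss32`, the 512-byte stack size and every helper name are hypothetical stand-ins for `struct cpu_entry_area`, and addresses are modeled as byte offsets into that struct rather than real pointers. It separates the part that asm-offsets can export as a constant (stack top to `tss.sp1`) from the `(nr_regs + 1) * 4` term, which depends on how much the entry code has already pushed. It also checks the CS slot tested by `check_user` and the frame order produced by the `.rept` copy loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for struct cpu_entry_area: an entry stack directly
 * followed by a 32-bit hardware TSS. Sizes are illustrative only.
 */
struct hw_tss32 {
	uint16_t back_link, blh;
	uint32_t sp0;
	uint16_t ss0, ss0h;
	uint32_t sp1;
};

struct fake_entry_area {
	uint32_t entry_stack[128];
	struct hw_tss32 tss;
};

/* Like offsetofend(): first byte past the entry stack, i.e. its top. */
#define STACK_TOP ((long)(offsetof(struct fake_entry_area, entry_stack) + \
			  sizeof(((struct fake_entry_area *)0)->entry_stack)))
#define TSS_SP1_OFFSET ((long)offsetof(struct fake_entry_area, tss.sp1))

/* The only part asm-offsets can precompute: stack top -> tss.sp1. */
#define TSS_SYSENTER_STACK (TSS_SP1_OFFSET - STACK_TOP)

/* Byte offset from %esp of the saved CS that check_user tests. */
static long cs_slot(int nr_regs)
{
	return (long)(nr_regs - 4) * 4;
}

/*
 * Address the macro loads the new %esp from. %esp has moved down by
 * nr_regs frame dwords plus the pushed %edi, so (nr_regs + 1) * 4 walks
 * back up to the stack top before the constant is applied.
 */
static long sp1_load_addr(long esp, int nr_regs)
{
	return esp + TSS_SYSENTER_STACK + (long)(nr_regs + 1) * 4;
}

/*
 * Model of the .rept loop: old_edi[0] is the saved %edi and
 * old_edi[1..nr_regs] is the frame. Pushing (nr_regs - i) * 4(%edi) for
 * i = 0..nr_regs-1 should reproduce the frame in the same order.
 */
static int frame_copy_preserves_order(int nr_regs)
{
	uint32_t old_edi[17], task_stack[16];
	uint32_t *sp = task_stack + 16;	/* empty stack, grows down */

	if (nr_regs < 1 || nr_regs > 16)
		return 0;
	for (int i = 0; i <= nr_regs; i++)
		old_edi[i] = 0x1000u + (uint32_t)i;
	for (int i = 0; i < nr_regs; i++)
		*--sp = old_edi[nr_regs - i];	/* pushl */
	return memcmp(sp, old_edi + 1, (size_t)nr_regs * sizeof(uint32_t)) == 0;
}
```

For INT80 (`nr_regs=6`), six frame dwords plus the saved `%edi` sit on the entry stack, so `sp1_load_addr(STACK_TOP - 7 * 4, 6)` lands exactly on `TSS_SP1_OFFSET`. With `nr_regs=5` the stack-dependent term differs while the constant stays the same, which is why only the constant fits in asm-offsets.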
> ENTRY(nmi)
> ASM_CLAC
> +
> + /* Stack layout: ss, esp, eflags, cs, eip */
> + SWITCH_TO_KERNEL_STACK nr_regs=5 check_user=1

This is wrong, I think. If you get an nmi in kernel mode but while
still on the sysenter stack, you blow up. IIRC we have some crazy
code already to handle this (for nmi and #DB), and maybe that's
already adequate or can be made adequate, but at the very least this
needs a big comment explaining why it's okay.

> diff --git a/arch/x86/include/asm/switch_to.h b/arch/x86/include/asm/switch_to.h
> index eb5f7999a893..20e5f7ab8260 100644
> --- a/arch/x86/include/asm/switch_to.h
> +++ b/arch/x86/include/asm/switch_to.h
> @@ -89,13 +89,9 @@ static inline void refresh_sysenter_cs(struct thread_struct *thread)
> /* This is used when switching tasks or entering/exiting vm86 mode. */
> static inline void update_sp0(struct task_struct *task)
> {
> - /* On x86_64, sp0 always points to the entry trampoline stack, which is constant: */
> -#ifdef CONFIG_X86_32
> - load_sp0(task->thread.sp0);
> -#else
> + /* sp0 always points to the entry trampoline stack, which is constant: */
> if (static_cpu_has(X86_FEATURE_XENPV))
> load_sp0(task_top_of_stack(task));
> -#endif
> }
>
> #endif /* _ASM_X86_SWITCH_TO_H */
> diff --git a/arch/x86/kernel/asm-offsets_32.c b/arch/x86/kernel/asm-offsets_32.c
> index 654229bac2fc..7270dd834f4b 100644
> --- a/arch/x86/kernel/asm-offsets_32.c
> +++ b/arch/x86/kernel/asm-offsets_32.c
> @@ -47,9 +47,11 @@ void foo(void)
> BLANK();
>
> /* Offset from the sysenter stack to tss.sp0 */
> - DEFINE(TSS_sysenter_stack, offsetof(struct cpu_entry_area, tss.x86_tss.sp0) -
> + DEFINE(TSS_sysenter_stack, offsetof(struct cpu_entry_area, tss.x86_tss.sp1) -
> offsetofend(struct cpu_entry_area, entry_stack_page.stack));
>
> + OFFSET(TSS_sp1, tss_struct, x86_tss.sp1);
> +
> #ifdef CONFIG_CC_STACKPROTECTOR
> BLANK();
> OFFSET(stack_canary_offset, stack_canary, canary);
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index ef29ad001991..20a71c914e59 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -1649,11 +1649,12 @@ void cpu_init(void)
> enter_lazy_tlb(&init_mm, curr);
>
> /*
> - * Initialize the TSS. Don't bother initializing sp0, as the initial
> - * task never enters user mode.
> + * Initialize the TSS. sp0 points to the entry trampoline stack
> + * regardless of what task is running.
> */
> set_tss_desc(cpu, &get_cpu_entry_area(cpu)->tss.x86_tss);
> load_TR_desc();
> + load_sp0((unsigned long)(cpu_entry_stack(cpu) + 1));

It's high time we unified the 32-bit and 64-bit versions of the code.
This isn't necessarily needed for your series, though.

> diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
> index 5224c6099184..452eeac00b80 100644
> --- a/arch/x86/kernel/process_32.c
> +++ b/arch/x86/kernel/process_32.c
> @@ -292,6 +292,12 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
> this_cpu_write(cpu_current_top_of_stack,
> (unsigned long)task_stack_page(next_p) +
> THREAD_SIZE);
> + /*
> + * TODO: Find a way to let cpu_current_top_of_stack point to
> + * cpu_tss_rw.x86_tss.sp1. Doing so now results in stack corruption with
> + * iret exceptions.
> + */
> + this_cpu_write(cpu_tss_rw.x86_tss.sp1, next_p->thread.sp0);

Do you know what the issue is?

As a general comment, the interaction between this patch and vm86 is a
bit scary. In vm86 mode, the kernel gets entered with extra stuff on
the stack, which may screw up all your offsets.
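The vm86 concern (and Joerg's later answer that four more registers must be copied) comes down to how many dwords the CPU pushes on entry. The sketch below encodes the 32-bit protected-mode interrupt frame sizes without an error code: three dwords for a same-privilege entry, five from CPL 3, and nine from virtual-8086 mode, where GS, FS, DS and ES are pushed as well. The helper names and the selector values in the checks (0x60 as a kernel CS, 0x73 as a user CS, 0x1000 as a real-mode segment) are illustrative assumptions. It also shows that a CS-only privilege test, like the `testb $3` used by `check_user`, cannot recognize a vm86 entry by itself, because the saved CS is then a real-mode segment value.

```c
#include <assert.h>
#include <stdint.h>

#define EFLAGS_VM    0x00020000u	/* EFLAGS.VM (bit 17): interrupted vm86 code */
#define SEG_RPL_MASK 0x3u		/* low two bits of a protected-mode selector */

/*
 * Dwords pushed by the CPU on an interrupt without an error code:
 *   same privilege:  EIP, CS, EFLAGS               -> 3
 *   from CPL 3:      ... plus ESP, SS              -> 5
 *   from vm86 mode:  ... plus ES, DS, FS, GS       -> 9
 */
static int hw_frame_dwords(uint32_t saved_cs, uint32_t saved_eflags)
{
	/*
	 * Test VM first: in vm86 mode the saved CS is a real-mode segment,
	 * so its low bits carry no privilege information.
	 */
	if (saved_eflags & EFLAGS_VM)
		return 9;
	if (saved_cs & SEG_RPL_MASK)
		return 5;
	return 3;
}

/* What a CS-only check such as "testb $3, <cs slot>" concludes. */
static int cs_says_user(uint32_t saved_cs)
{
	return (saved_cs & SEG_RPL_MASK) != 0;
}

/* Combined test: EFLAGS.VM or a non-zero CS RPL means "came from user". */
static int entered_from_user(uint32_t saved_cs, uint32_t saved_eflags)
{
	return ((saved_eflags & EFLAGS_VM) | (saved_cs & SEG_RPL_MASK)) != 0;
}
```

A frame-copying macro would need the combined test, not the CS bits alone, both to decide whether to switch stacks and to know how many dwords to copy.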
--Andy

From mboxrd@z Thu Jan 1 00:00:00 1970
In-Reply-To: <1516120619-1159-5-git-send-email-joro@8bytes.org>
References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1516120619-1159-5-git-send-email-joro@8bytes.org>
From: Andy Lutomirski
Date: Tue, 16 Jan 2018 14:46:16 -0800
Subject: Re: [PATCH 04/16] x86/pti: Define X86_CR3_PTI_PCID_USER_BIT on x86_32
To: Joerg Roedel
Cc: Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" , X86 ML , LKML , linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , "Liguori, Anthony" , Daniel Gruss , Hugh Dickins , Kees Cook , Andrea Arcangeli , Waiman Long , Joerg Roedel

On Tue, Jan 16, 2018 at 8:36 AM, Joerg Roedel wrote:
> From: Joerg Roedel
>
> Move it out of the X86_64 specific processor defines so
> that its visible for 32bit too.

Hmm. This is okay, I guess, but any code that actually uses this
definition is inherently wrong, since 32-bit implies !PCID.

From mboxrd@z Thu Jan 1 00:00:00 1970
In-Reply-To: <1516120619-1159-4-git-send-email-joro@8bytes.org>
References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1516120619-1159-4-git-send-email-joro@8bytes.org>
From: Andy Lutomirski
Date: Tue, 16 Jan 2018 14:48:43 -0800
Subject: Re: [PATCH 03/16] x86/entry/32: Leave the kernel via the trampoline stack
To: Joerg Roedel
Cc: Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" , X86 ML , LKML , linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , "Liguori, Anthony" , Daniel Gruss , Hugh Dickins , Kees Cook , Andrea Arcangeli , Waiman Long , Joerg Roedel

On Tue, Jan 16, 2018 at 8:36 AM, Joerg Roedel wrote:
> From: Joerg Roedel
>
> Switch back to the trampoline stack before returning to
> userspace.
>
> Signed-off-by: Joerg Roedel
> ---
> arch/x86/entry/entry_32.S | 58 ++++++++++++++++++++++++++++++++++++++++
> arch/x86/kernel/asm-offsets_32.c | 1 +
> 2 files changed, 59 insertions(+)
>
> diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
> index 5a7bdb73be9f..14018eeb11c3 100644
> --- a/arch/x86/entry/entry_32.S
> +++ b/arch/x86/entry/entry_32.S
> @@ -263,6 +263,61 @@
> .endm
>
> /*
> + * Switch back from the kernel stack to the entry stack.
> + *
> + * iret_frame > 0 adds code to copie over an iret frame from the old to
> + * the new stack. It also adds a check which bails out if
> + * we are not returning to user-space.
> + *
> + * This macro is allowed not modify eflags when iret_frame == 0.
> + */
> +.macro SWITCH_TO_ENTRY_STACK iret_frame=0
> + .if \iret_frame > 0
> + /* Are we returning to userspace? */
> + testb $3, 4(%esp) /* return CS */
> + jz .Lend_\@
> + .endif
> +
> + /*
> + * We run with user-%fs already loaded from pt_regs, so we don't
> + * have access to per_cpu data anymore, and there is no swapgs
> + * equivalent on x86_32.
> + * We work around this by loading the kernel-%fs again and
> + * reading the entry stack address from there. Then we restore
> + * the user-%fs and return.
> + */
> + pushl %fs
> + pushl %edi
> +
> + /* Re-load kernel-%fs, after that we can use PER_CPU_VAR */
> + movl $(__KERNEL_PERCPU), %edi
> + movl %edi, %fs
> +
> + /* Save old stack pointer to copy the return frame over if needed */
> + movl %esp, %edi
> + movl PER_CPU_VAR(cpu_tss_rw + TSS_sp0), %esp
> +
> + /* Now we are on the entry stack */
> +
> + .if \iret_frame > 0
> + /* Stack frame: ss, esp, eflags, cs, eip, fs, edi */
> + pushl 6*4(%edi) /* ss */
> + pushl 5*4(%edi) /* esp */
> + pushl 4*4(%edi) /* eflags */
> + pushl 3*4(%edi) /* cs */
> + pushl 2*4(%edi) /* eip */
> + .endif
> +
> + pushl 4(%edi) /* fs */
> +
> + /* Restore user %edi and user %fs */
> + movl (%edi), %edi
> + popl %fs

Yikes! We're not *supposed* to be able to observe an asynchronous
descriptor table change, but if the LDT changes out from under you,
this is going to blow up badly. It would be really nice if you could
pull this off without percpu access or without needing to do this
dance where you load user FS, then kernel FS, then user FS. If that's
not doable, then you should at least add exception handling -- look at
the other 'pop %fs' instructions in entry_32.S.

From mboxrd@z Thu Jan 1 00:00:00 1970
In-Reply-To: <20180116165213.GF2228@hirez.programming.kicks-ass.net>
References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1516120619-1159-7-git-send-email-joro@8bytes.org> <20180116165213.GF2228@hirez.programming.kicks-ass.net>
From: Andy Lutomirski
Date: Tue, 16 Jan 2018 14:51:45 -0800
Subject: Re: [PATCH 06/16] x86/mm/ldt: Reserve high address-space range for the LDT
To: Peter Zijlstra
Cc: Joerg Roedel , Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" , X86 ML , LKML , linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , "Liguori, Anthony" , Daniel Gruss , Hugh Dickins , Kees Cook , Andrea Arcangeli , Waiman Long , Joerg Roedel

On Tue, Jan 16, 2018 at 8:52 AM, Peter Zijlstra wrote:
> On Tue, Jan 16, 2018 at 05:36:49PM +0100, Joerg Roedel wrote:
>> From: Joerg Roedel
>>
>> Reserve 2MB/4MB of address space for mapping the LDT to
>> user-space.
>
> LDT is 64k, we need 2 per CPU, and NR_CPUS <= 64 on 32bit, that gives
> 64K*2*64=8M > 2M.

If this works like it does on 64-bit, it only needs 128k regardless of
the number of CPUs. The LDT mapping is specific to the mm.

How are you dealing with PAE here? That is, what's your pagetable
layout? What parts of the address space are owned by what code?

From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [PATCH 02/16] x86/entry/32: Enter the kernel via trampoline stack
To: Joerg Roedel , Thomas Gleixner , Ingo Molnar , "H . Peter Anvin"
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long , jroedel@suse.de
References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1516120619-1159-3-git-send-email-joro@8bytes.org>
From: Boris Ostrovsky
Message-ID: <476d7100-2414-d09e-abf1-5aa4d369a3b7@oracle.com>
Date: Tue, 16 Jan 2018 21:47:06 -0500
In-Reply-To: <1516120619-1159-3-git-send-email-joro@8bytes.org>

On 01/16/2018 11:36 AM, Joerg Roedel wrote:
>
> /*
> + * Switch from the entry-trampline stack to the kernel stack of the
> + * running task.
> + *
> + * nr_regs is the number of dwords to push from the entry stack to the
> + * task stack. If it is > 0 it expects an irq frame at the bottom of the
> + * stack.
> + *
> + * check_user != 0 it will add a check to only switch stacks if the
> + * kernel entry was from user-space.
> + */
> +.macro SWITCH_TO_KERNEL_STACK nr_regs=0 check_user=0

This (and next patch's SWITCH_TO_ENTRY_STACK) need X86_FEATURE_PTI check.
With those macros fixed I was able to boot 32-bit Xen PV guest.

-boris

From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 17 Jan 2018 08:59:24 +0100
From: Peter Zijlstra
To: Andy Lutomirski
Cc: Joerg Roedel , Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" , X86 ML , LKML , linux-mm@kvack.org, Linus Torvalds , Dave Hansen , Josh Poimboeuf , Juergen Gross , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , "Liguori, Anthony" , Daniel Gruss , Hugh Dickins , Kees Cook , Andrea Arcangeli , Waiman Long , Joerg Roedel
Subject: Re: [PATCH 06/16] x86/mm/ldt: Reserve high address-space range for the LDT
Message-ID: <20180117075924.GI2228@hirez.programming.kicks-ass.net>
References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1516120619-1159-7-git-send-email-joro@8bytes.org> <20180116165213.GF2228@hirez.programming.kicks-ass.net>

On Tue, Jan 16, 2018 at 02:51:45PM -0800, Andy Lutomirski wrote:
> On Tue, Jan 16, 2018 at 8:52 AM, Peter Zijlstra wrote:
> > On Tue, Jan 16, 2018 at 05:36:49PM +0100, Joerg Roedel wrote:
> >> From: Joerg Roedel
> >>
> >> Reserve 2MB/4MB of address space for mapping the LDT to
> >> user-space.
> >
> > LDT is 64k, we need 2 per CPU, and NR_CPUS <= 64 on 32bit, that gives
> > 64K*2*64=8M > 2M.
>
> If this works like it does on 64-bit, it only needs 128k regardless of
> the number of CPUs. The LDT mapping is specific to the mm.

Ah, then I got my LDT things confused again... which is certainly
possible, we had a few too many variants back then.

From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 17 Jan 2018 10:02:38 +0100
From: Joerg Roedel
To: Boris Ostrovsky
Cc: Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" , x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long , jroedel@suse.de
Subject: Re: [PATCH 02/16] x86/entry/32: Enter the kernel via trampoline stack
Message-ID: <20180117090238.GH28161@8bytes.org>
References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1516120619-1159-3-git-send-email-joro@8bytes.org> <476d7100-2414-d09e-abf1-5aa4d369a3b7@oracle.com>
In-Reply-To: <476d7100-2414-d09e-abf1-5aa4d369a3b7@oracle.com>

Hi Boris,

thanks for testing this :)

On Tue, Jan 16, 2018 at 09:47:06PM -0500, Boris Ostrovsky wrote:
> On 01/16/2018 11:36 AM, Joerg Roedel wrote:
> >+.macro SWITCH_TO_KERNEL_STACK nr_regs=0 check_user=0
> >
>
> This (and next patch's SWITCH_TO_ENTRY_STACK) need X86_FEATURE_PTI check.
>
> With those macros fixed I was able to boot 32-bit Xen PV guest.

Hmm, on bare metal the stack switch happens regardless of the
X86_FEATURE_PTI feature being set, because we always program tss.sp0
with the sysenter stack. How is the kernel entry stack set up on
xen-pv? I think something is missing there instead.

Regards,

Joerg

From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 17 Jan 2018 10:18:53 +0100
From: Joerg Roedel
To: Andy Lutomirski
Cc: Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" , X86 ML , LKML , linux-mm@kvack.org, Linus Torvalds , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , "Liguori, Anthony" , Daniel Gruss , Hugh Dickins , Kees Cook , Andrea Arcangeli , Waiman Long , Joerg Roedel
Subject: Re: [PATCH 02/16] x86/entry/32: Enter the kernel via trampoline stack
Message-ID: <20180117091853.GI28161@8bytes.org>
References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1516120619-1159-3-git-send-email-joro@8bytes.org>

On Tue, Jan 16, 2018 at 02:45:27PM -0800, Andy Lutomirski wrote:
> On Tue, Jan 16, 2018 at 8:36 AM, Joerg Roedel wrote:
> > +.macro SWITCH_TO_KERNEL_STACK nr_regs=0 check_user=0
>
> How about marking nr_regs with :req to force everyone to be
> explicit?

Yeah, that's more readable, I'll change it.

> > + /*
> > + * TSS_sysenter_stack is the offset from the bottom of the
> > + * entry-stack
> > + */
> > + movl TSS_sysenter_stack + ((\nr_regs + 1) * 4)(%esp), %esp
>
> This is incomprehensible. You're adding what appears to be the offset
> of sysenter_stack within the TSS to something based on esp and
> dereferencing that to get the new esp. That's not actually what
> you're doing, but please change asm_offsets.c (as in my previous
> email) to avoid putting serious arithmetic in it and then do the
> arithmetic right here so that it's possible to follow what's going on.

Probably this needs better comments. So TSS_sysenter_stack is the
offset to tss.sp0 (tss.sp1 later) from the _bottom_ of the stack. But
in this macro the stack might not be empty; it has a configurable (by
\nr_regs) number of dwords on it. Before this instruction we also do a
push %edi, so we need (\nr_regs + 1). This can't be put into
asm_offsets.c, as the actual offset depends on how much is on the
stack.

> > ENTRY(entry_INT80_32)
> > ASM_CLAC
> > pushl %eax /* pt_regs->orig_ax */
> > +
> > + /* Stack layout: ss, esp, eflags, cs, eip, orig_eax */
> > + SWITCH_TO_KERNEL_STACK nr_regs=6 check_user=1
> > +
>
> Why check_user?

You are right, check_user shouldn't be needed, as INT80 is never
called from kernel mode.

> > ENTRY(nmi)
> > ASM_CLAC
> > +
> > + /* Stack layout: ss, esp, eflags, cs, eip */
> > + SWITCH_TO_KERNEL_STACK nr_regs=5 check_user=1
>
> This is wrong, I think. If you get an nmi in kernel mode but while
> still on the sysenter stack, you blow up. IIRC we have some crazy
> code already to handle this (for nmi and #DB), and maybe that's
> already adequate or can be made adequate, but at the very least this
> needs a big comment explaining why it's okay.

If we get an nmi while still on the sysenter stack, then we are not
entering the handler from user-space and the above code will do
nothing and behave as before.
But you are right, it might blow up. There is a problem with the cr3
switch: the nmi can happen in kernel mode before cr3 is switched, and
then this handler will not do the cr3 switch itself and will crash the
kernel. But the stack switching should be fine, I think.

> > + /*
> > + * TODO: Find a way to let cpu_current_top_of_stack point to
> > + * cpu_tss_rw.x86_tss.sp1. Doing so now results in stack corruption with
> > + * iret exceptions.
> > + */
> > + this_cpu_write(cpu_tss_rw.x86_tss.sp1, next_p->thread.sp0);
>
> Do you know what the issue is?

No, not yet; I will look into that again. But first I want to get this
series stable enough as it is.

> As a general comment, the interaction between this patch and vm86 is a
> bit scary. In vm86 mode, the kernel gets entered with extra stuff on
> the stack, which may screw up all your offsets.

I just read up on vm86 mode control transfers and the stack layout.
Looks like I need to check for eflags.vm=1 and copy four more
registers from/to the entry stack. Thanks for pointing that out.

Thanks,

Joerg

From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 17 Jan 2018 10:24:42 +0100
From: Joerg Roedel
To: Andy Lutomirski
Cc: Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" , X86 ML , LKML , linux-mm@kvack.org, Linus Torvalds , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , "Liguori, Anthony" , Daniel Gruss , Hugh Dickins , Kees Cook , Andrea Arcangeli , Waiman Long , Joerg Roedel
Subject: Re: [PATCH 03/16] x86/entry/32: Leave the kernel via the trampoline stack
Message-ID: <20180117092442.GJ28161@8bytes.org>
References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1516120619-1159-4-git-send-email-joro@8bytes.org>

On Tue, Jan 16, 2018 at 02:48:43PM -0800, Andy Lutomirski wrote:
> On Tue, Jan 16, 2018 at 8:36 AM, Joerg Roedel wrote:
> > + /* Restore user %edi and user %fs */
> > + movl (%edi), %edi
> > + popl %fs
>
> Yikes! We're not *supposed* to be able to observe an asynchronous
> descriptor table change, but if the LDT changes out from under you,
> this is going to blow up badly. It would be really nice if you could
> pull this off without percpu access or without needing to do this
> dance where you load user FS, then kernel FS, then user FS. If that's
> not doable, then you should at least add exception handling -- look at
> the other 'pop %fs' instructions in entry_32.S.

You are right! This also means I need to do the 'popl %fs' before the
cr3-switch. I'll fix it in the next version.

I have no real idea on how to switch back to the entry stack without
access to per_cpu variables. I also can't access the cpu_entry_area for
the cpu yet, because for that we need to be on the entry stack already.
Joerg

From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 17 Jan 2018 10:26:57 +0100
From: Joerg Roedel
To: Andy Lutomirski
Cc: Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" , X86 ML , LKML , linux-mm@kvack.org, Linus Torvalds , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , "Liguori, Anthony" , Daniel Gruss , Hugh Dickins , Kees Cook , Andrea Arcangeli , Waiman Long , Joerg Roedel
Subject: Re: [PATCH 04/16] x86/pti: Define X86_CR3_PTI_PCID_USER_BIT on x86_32
Message-ID: <20180117092657.GK28161@8bytes.org>
References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1516120619-1159-5-git-send-email-joro@8bytes.org>

On Tue, Jan 16, 2018 at 02:46:16PM -0800, Andy Lutomirski wrote:
> On Tue, Jan 16, 2018 at 8:36 AM, Joerg Roedel wrote:
> > From: Joerg Roedel
> >
> > Move it out of the X86_64 specific processor defines so
> > that its visible for 32bit too.
>
> Hmm. This is okay, I guess, but any code that actually uses this
> definition is inherently wrong, since 32-bit implies !PCID.

Yes, I tried another approach first which just #ifdef'ed out the
relevant parts in tlbflush.h which use this bit. But that seemed to be
the wrong path, as there is more PCID code that is compiled in for
32 bit.
So defining the bit for 32 bit seemed to be the cleaner solution for
now.

Joerg

From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 17 Jan 2018 10:33:31 +0100
From: Joerg Roedel
To: Andy Lutomirski
Cc: Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" , X86 ML , LKML , linux-mm@kvack.org, Linus Torvalds , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , "Liguori, Anthony" , Daniel Gruss , Hugh Dickins , Kees Cook , Andrea Arcangeli , Waiman Long , Joerg Roedel
Subject: Re: [RFC PATCH 00/16] PTI support for x86-32
Message-ID: <20180117093331.GL28161@8bytes.org>
References: <1516120619-1159-1-git-send-email-joro@8bytes.org>

Hi Andy,

thanks a lot for your review and input, especially on the entry-code
changes!

On Tue, Jan 16, 2018 at 02:26:22PM -0800, Andy Lutomirski wrote:
> On Tue, Jan 16, 2018 at 8:36 AM, Joerg Roedel wrote:
> > The code has not run on bare-metal yet, I'll test that in
> > the next days once I set up a 32 bit box again. I also haven't
> > tested Wine and DosEMU yet, so this might also be broken.
> >
>
> If you pass all the x86 selftests, then Wine and DOSEMU are pretty
> likely to work :)

Okay, good to know.
I will definitely run them and make them pass :)

Joerg

From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 17 Jan 2018 10:55:07 +0100
From: Joerg Roedel
To: Thomas Gleixner
Cc: Ingo Molnar , "H . Peter Anvin" , x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long , jroedel@suse.de
Subject: Re: [RFC PATCH 00/16] PTI support for x86-32
Message-ID: <20180117095507.GM28161@8bytes.org>
References: <1516120619-1159-1-git-send-email-joro@8bytes.org>

Hi Thomas,

thanks for your review, I'll work your suggestions into the next post.

On Tue, Jan 16, 2018 at 10:20:40PM +0100, Thomas Gleixner wrote:
> On Tue, 16 Jan 2018, Joerg Roedel wrote:
> > 16 files changed, 333 insertions(+), 123 deletions(-)
>
> Impressively small and well done !

Thanks :)

> Can you please make that patch set against
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86-pti-for-linus
>
> so we immediately have it backportable for 4.14 stable? It's only a trivial
> conflict in pgtable.h, but we'd like to make the life of stable as simple
> as possible. They have enough headache with the pre 4.14 trees.

Sure, will do.

> We can pick some of the simple patches which make defines and inlines
> available out of the pile right away and apply them to x86/pti to shrink
> the amount of stuff you have to worry about.

These should be patches 4, 5, 7 and 11, and I think 13 is also simple
enough. Feel free to take them, but I can also carry them forward if
needed.

Thanks,

Joerg

From mboxrd@z Thu Jan 1 00:00:00 1970
In-Reply-To: <20180117092442.GJ28161@8bytes.org>
References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1516120619-1159-4-git-send-email-joro@8bytes.org> <20180117092442.GJ28161@8bytes.org>
From: Brian Gerst
Date: Wed, 17 Jan 2018 05:57:53 -0800
Subject: Re: [PATCH 03/16] x86/entry/32: Leave the kernel via the trampoline stack
To: Joerg Roedel
Cc: Andy Lutomirski , Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" , X86 ML , LKML , Linux-MM , Linus Torvalds , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , "Liguori, Anthony" , Daniel Gruss , Hugh Dickins , Kees Cook , Andrea Arcangeli , Waiman Long , Joerg Roedel

On Wed, Jan 17, 2018 at 1:24 AM, Joerg Roedel wrote:
> On Tue, Jan 16, 2018 at 02:48:43PM -0800, Andy Lutomirski wrote:
>> On Tue, Jan 16, 2018 at 8:36 AM, Joerg Roedel wrote:
>> > + /* Restore user %edi and user %fs */
>> > + movl (%edi), %edi
>> > + popl %fs
>>
>> Yikes! We're not *supposed* to be able to observe an asynchronous
>> descriptor table change, but if the LDT changes out from under you,
>> this is going to blow up badly. It would be really nice if you could
>> pull this off without percpu access or without needing to do this
>> dance where you load user FS, then kernel FS, then user FS. If that's
>> not doable, then you should at least add exception handling -- look at
>> the other 'pop %fs' instructions in entry_32.S.
>
> You are right! This also means I need to do the 'popl %fs' before the
> cr3-switch. I'll fix it in the next version.
>
> I have no real idea on how to switch back to the entry stack without
> access to per_cpu variables. I also can't access the cpu_entry_area for
> the cpu yet, because for that we need to be on the entry stack already.

Switch to the trampoline stack before loading user segments.
-- Brian Gerst From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753208AbeAQOAK (ORCPT + 1 other); Wed, 17 Jan 2018 09:00:10 -0500 Received: from mail-io0-f173.google.com ([209.85.223.173]:36645 "EHLO mail-io0-f173.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752941AbeAQOAI (ORCPT ); Wed, 17 Jan 2018 09:00:08 -0500 X-Google-Smtp-Source: ACJfBovP0JNWTe7twaEZMDprFyn7cs3FfXqoYbh8rdaVLwuxPUBkmqBrE4zkYJ+owPHCa7QgnM11yxtFYE0/OXgRrZc= MIME-Version: 1.0 In-Reply-To: References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1516120619-1159-4-git-send-email-joro@8bytes.org> <20180117092442.GJ28161@8bytes.org> From: Brian Gerst Date: Wed, 17 Jan 2018 06:00:07 -0800 Message-ID: Subject: Re: [PATCH 03/16] x86/entry/32: Leave the kernel via the trampoline stack To: Joerg Roedel Cc: Andy Lutomirski , Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" , X86 ML , LKML , Linux-MM , Linus Torvalds , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , "Liguori, Anthony" , Daniel Gruss , Hugh Dickins , Kees Cook , Andrea Arcangeli , Waiman Long , Joerg Roedel Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Return-Path: On Wed, Jan 17, 2018 at 5:57 AM, Brian Gerst wrote: > On Wed, Jan 17, 2018 at 1:24 AM, Joerg Roedel wrote: >> On Tue, Jan 16, 2018 at 02:48:43PM -0800, Andy Lutomirski wrote: >>> On Tue, Jan 16, 2018 at 8:36 AM, Joerg Roedel wrote: >>> > + /* Restore user %edi and user %fs */ >>> > + movl (%edi), %edi >>> > + popl %fs >>> >>> Yikes! We're not *supposed* to be able to observe an asynchronous >>> descriptor table change, but if the LDT changes out from under you, >>> this is going to blow up badly. 
It would be really nice if you could >>> pull this off without percpu access or without needing to do this >>> dance where you load user FS, then kernel FS, then user FS. If that's >>> not doable, then you should at least add exception handling -- look at >>> the other 'pop %fs' instructions in entry_32.S. >> >> You are right! This also means I need to do the 'popl %fs' before the >> cr3-switch. I'll fix it in the next version. >> >> I have no real idea on how to switch back to the entry stack without >> access to per_cpu variables. I also can't access the cpu_entry_area for >> the cpu yet, because for that we need to be on the entry stack already. > > Switch to the trampoline stack before loading user segments. But then again, you could take a fault on the trampoline stack if you get a bad segment. Perhaps just pushing the new stack pointer onto the process stack before user segment loads will be the right move. -- Brian Gerst From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753029AbeAQOIN (ORCPT + 1 other); Wed, 17 Jan 2018 09:08:13 -0500 Received: from smtp.ctxuk.citrix.com ([185.25.65.24]:28904 "EHLO SMTP.EU.CITRIX.COM" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752343AbeAQOIM (ORCPT ); Wed, 17 Jan 2018 09:08:12 -0500 X-IronPort-AV: E=Sophos;i="5.46,372,1511827200"; d="scan'208";a="66197472" Subject: Re: [PATCH 02/16] x86/entry/32: Enter the kernel via trampoline stack To: Joerg Roedel , Boris Ostrovsky CC: Thomas Gleixner , Ingo Molnar , "H . 
Peter Anvin" , , , , Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , , , , , Andrea Arcangeli , Waiman Long , References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1516120619-1159-3-git-send-email-joro@8bytes.org> <476d7100-2414-d09e-abf1-5aa4d369a3b7@oracle.com> <20180117090238.GH28161@8bytes.org> From: Andrew Cooper Message-ID: <97298add-9484-7d83-50a3-1c668ce3107d@citrix.com> Date: Wed, 17 Jan 2018 14:04:22 +0000 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.5.2 MIME-Version: 1.0 In-Reply-To: <20180117090238.GH28161@8bytes.org> Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit Content-Language: en-GB X-ClientProxiedBy: AMSPEX02CAS02.citrite.net (10.69.22.113) To AMSPEX02CL01.citrite.net (10.69.22.125) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Return-Path: On 17/01/18 09:02, Joerg Roedel wrote: > Hi Boris, > > thanks for testing this :) > > On Tue, Jan 16, 2018 at 09:47:06PM -0500, Boris Ostrovsky wrote: >> On 01/16/2018 11:36 AM, Joerg Roedel wrote: >>> +.macro SWITCH_TO_KERNEL_STACK nr_regs=0 check_user=0 >> >> This (and next patch's SWITCH_TO_ENTRY_STACK) need X86_FEATURE_PTI check. >> >> With those macros fixed I was able to boot 32-bit Xen PV guest. > Hmm, on bare metal the stack switch happens regardless of the > X86_FEATURE_PTI feature being set, because we always program tss.sp0 > with the sysenter stack. How is the kernel entry stack set up on xen-pv? > I think something is missing there instead. There is one single stack registered with Xen, on which you get a normal exception frame in all cases, even via the registered (virtual) syscall/sysenter/failsafe handlers.
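The question of which feature bit should gate SWITCH_TO_KERNEL_STACK and SWITCH_TO_ENTRY_STACK can be made concrete as a truth table. The C below is only an illustrative sketch: `struct cpu_caps`, `switch_guard_xenpv()`, `switch_guard_pti()` and `switch_guard_both()` are hypothetical names, and the booleans merely stand in for X86_FEATURE_XENPV and X86_FEATURE_PTI; they do not model how the entry code actually patches the switch in or out. What it shows is that the two candidate guards disagree in exactly two configurations: bare metal without PTI (where, per Joerg, entry still arrives on the sysenter stack because tss.sp0 always points there) and a Xen PV guest with the PTI bit set (which, per Andrew, only has the single stack registered with Xen).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the X86_FEATURE_PTI and X86_FEATURE_XENPV bits. */
struct cpu_caps {
	bool pti;
	bool xen_pv;
};

/* Candidate guard A: never switch on Xen PV, which enters on one stack. */
static bool switch_guard_xenpv(struct cpu_caps c)
{
	return !c.xen_pv;
}

/* Candidate guard B: switch only when PTI is enabled. */
static bool switch_guard_pti(struct cpu_caps c)
{
	return c.pti;
}

/* Both checks combined: "at least XENPV", plus PTI as well. */
static bool switch_guard_both(struct cpu_caps c)
{
	return switch_guard_xenpv(c) && switch_guard_pti(c);
}

/* Executable summary of the truth table described above. */
static void check_guard_table(void)
{
	const struct cpu_caps bare_no_pti = { .pti = false, .xen_pv = false };
	const struct cpu_caps bare_pti    = { .pti = true,  .xen_pv = false };
	const struct cpu_caps xen_no_pti  = { .pti = false, .xen_pv = true  };
	const struct cpu_caps xen_pti     = { .pti = true,  .xen_pv = true  };

	/* Agreement: both switch on bare metal with PTI, neither on Xen PV without it. */
	assert(switch_guard_xenpv(bare_pti) && switch_guard_pti(bare_pti));
	assert(!switch_guard_xenpv(xen_no_pti) && !switch_guard_pti(xen_no_pti));

	/* Disagreement 1: bare metal without PTI. */
	assert(switch_guard_xenpv(bare_no_pti) && !switch_guard_pti(bare_no_pti));

	/* Disagreement 2: Xen PV with the PTI bit set. */
	assert(!switch_guard_xenpv(xen_pti) && switch_guard_pti(xen_pti));

	/* Checking both bits only switches in the bare-metal PTI case. */
	assert(switch_guard_both(bare_pti));
	assert(!switch_guard_both(bare_no_pti) && !switch_guard_both(xen_pti));
}
```

The disagreement cases are the ones worth testing: a PTI-only guard would still switch stacks on a Xen PV guest if the PTI bit were ever set there, while a XENPV-only guard keeps switching on bare metal even with PTI off.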
~Andrew From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753138AbeAQOKJ (ORCPT + 1 other); Wed, 17 Jan 2018 09:10:09 -0500 Received: from 8bytes.org ([81.169.241.247]:40408 "EHLO theia.8bytes.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752491AbeAQOKH (ORCPT ); Wed, 17 Jan 2018 09:10:07 -0500 Date: Wed, 17 Jan 2018 15:10:06 +0100 From: Joerg Roedel To: Brian Gerst Cc: Andy Lutomirski , Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" , X86 ML , LKML , Linux-MM , Linus Torvalds , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , "Liguori, Anthony" , Daniel Gruss , Hugh Dickins , Kees Cook , Andrea Arcangeli , Waiman Long , Joerg Roedel Subject: Re: [PATCH 03/16] x86/entry/32: Leave the kernel via the trampoline stack Message-ID: <20180117141006.GR28161@8bytes.org> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1516120619-1159-4-git-send-email-joro@8bytes.org> <20180117092442.GJ28161@8bytes.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.24 (2015-08-30) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Return-Path: On Wed, Jan 17, 2018 at 05:57:53AM -0800, Brian Gerst wrote: > On Wed, Jan 17, 2018 at 1:24 AM, Joerg Roedel wrote: > > I have no real idea on how to switch back to the entry stack without > > access to per_cpu variables. I also can't access the cpu_entry_area for > > the cpu yet, because for that we need to be on the entry stack already. > > Switch to the trampoline stack before loading user segments. That requires copying most of pt_regs from the task stack to the trampoline stack, and I'm not sure that is faster than temporarily restoring the kernel %fs.
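Joerg's objection to Brian's suggestion is the cost of moving the saved register frame before the user segments are loaded. A rough C sketch of that copy, assuming the x86-32 `struct pt_regs` layout of that period (17 32-bit slots, 68 bytes); `struct pt_regs32` and `copy_regs_to_tramp()` are illustrative names, not kernel symbols, and the real exit path would do this in assembly and, as Joerg says, might only need most of the frame.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Assumed model of the x86-32 struct pt_regs: 17 32-bit slots. uint32_t is
 * used so the size matches a 32-bit kernel even when built on a 64-bit host.
 */
struct pt_regs32 {
	uint32_t bx, cx, dx, si, di, bp, ax;  /* general-purpose registers */
	uint32_t ds, es, fs, gs;              /* segment registers */
	uint32_t orig_ax;                     /* syscall number / error code */
	uint32_t ip, cs, flags, sp, ss;       /* hardware iret frame */
};

static_assert(sizeof(struct pt_regs32) == 68, "17 slots of 4 bytes");

/*
 * Hypothetical helper: copy the saved frame from the task stack to the top
 * of the trampoline stack and return its new address, i.e. where %esp would
 * point after the switch. 'tramp_top' is one past the highest word of the
 * trampoline stack.
 */
static struct pt_regs32 *copy_regs_to_tramp(const struct pt_regs32 *task_regs,
					    uint32_t *tramp_top)
{
	struct pt_regs32 *dst =
		(struct pt_regs32 *)((unsigned char *)tramp_top - sizeof(*dst));

	memcpy(dst, task_regs, sizeof(*dst));
	return dst;
}
```

The copy is 68 bytes, so it touches two or three 64-byte cache lines depending on alignment; whether that beats temporarily reloading the kernel %fs is the open question in the thread, and this sketch does not settle it.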
Joerg From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753406AbeAQOOi (ORCPT + 1 other); Wed, 17 Jan 2018 09:14:38 -0500 Received: from 8bytes.org ([81.169.241.247]:40504 "EHLO theia.8bytes.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753208AbeAQOOU (ORCPT ); Wed, 17 Jan 2018 09:14:20 -0500 Date: Wed, 17 Jan 2018 15:14:18 +0100 From: Joerg Roedel To: Brian Gerst Cc: Andy Lutomirski , Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" , X86 ML , LKML , Linux-MM , Linus Torvalds , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , "Liguori, Anthony" , Daniel Gruss , Hugh Dickins , Kees Cook , Andrea Arcangeli , Waiman Long , Joerg Roedel Subject: Re: [PATCH 03/16] x86/entry/32: Leave the kernel via the trampoline stack Message-ID: <20180117141418.GS28161@8bytes.org> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1516120619-1159-4-git-send-email-joro@8bytes.org> <20180117092442.GJ28161@8bytes.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.24 (2015-08-30) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Return-Path: On Wed, Jan 17, 2018 at 06:00:07AM -0800, Brian Gerst wrote: > On Wed, Jan 17, 2018 at 5:57 AM, Brian Gerst wrote: > But then again, you could take a fault on the trampoline stack if you > get a bad segment. Perhaps just pushing the new stack pointer onto > the process stack before user segment loads will be the right move. User segment loads pop from the stack, so having anything on-top also doesn't work. Maybe I can leave some space at the bottom of the task-stack at entry time and store the pointer there on exit, if that doesn't confuse the stack unwinder too much. 
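Joerg's idea of reserving space on the task stack at entry, and Josh's reply that this may only need an adjustment of TOP_OF_KERNEL_STACK_PADDING, both come down to address arithmetic at the end of the stack that holds pt_regs. A minimal C sketch of that arithmetic, assuming an 8 KiB x86-32 kernel stack, an 8-byte padding value and a 68-byte pt_regs frame; all three numbers are illustrative (the real padding is configuration-dependent), and `layout()` and `struct stack_layout` are hypothetical. Growing the padding by one word moves the frame down by 4 bytes and leaves a word directly above it, at a fixed offset from the frame, where the exit path could store its pointer.

```c
#include <assert.h>
#include <stdint.h>

#define THREAD_SIZE_32  8192u  /* assumed 8 KiB x86-32 kernel stack */
#define PT_REGS_SIZE_32 68u    /* assumed 17 x 32-bit pt_regs frame */
#define SLOT_SIZE       4u     /* one saved 32-bit stack pointer */

struct stack_layout {
	uint32_t top;   /* one past the highest byte of the task stack */
	uint32_t slot;  /* hypothetical saved-pointer slot, 0 if none */
	uint32_t regs;  /* start of the saved frame (cf. task_pt_regs()) */
};

/*
 * Lay out the top of a task stack: 'padding' bytes stay unused at the very
 * top and the saved frame sits directly below them. With 'with_slot', the
 * padding grows by one word; that word, just above the frame, is where the
 * exit path could store the pointer it needs.
 */
static struct stack_layout layout(uint32_t stack_base, uint32_t padding,
				  int with_slot)
{
	struct stack_layout l;
	uint32_t pad = padding + (with_slot ? SLOT_SIZE : 0);

	assert(padding % 4 == 0);	/* keep the frame word-aligned */
	l.top  = stack_base + THREAD_SIZE_32;
	l.slot = with_slot ? l.top - pad : 0;
	l.regs = l.top - pad - PT_REGS_SIZE_32;
	return l;
}
```

For example, with a stack base of 0xc1000000 the frame starts at 0xc1002000 - 76 without the slot and 4 bytes lower with it, and the slot ends exactly where the original 8 bytes of padding begin, so nothing that already relies on that padding moves.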
Joerg From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753400AbeAQOpU (ORCPT + 1 other); Wed, 17 Jan 2018 09:45:20 -0500 Received: from mx1.redhat.com ([209.132.183.28]:45258 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753051AbeAQOpT (ORCPT ); Wed, 17 Jan 2018 09:45:19 -0500 Date: Wed, 17 Jan 2018 08:45:03 -0600 From: Josh Poimboeuf To: Joerg Roedel Cc: Brian Gerst , Andy Lutomirski , Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" , X86 ML , LKML , Linux-MM , Linus Torvalds , Dave Hansen , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , "Liguori, Anthony" , Daniel Gruss , Hugh Dickins , Kees Cook , Andrea Arcangeli , Waiman Long , Joerg Roedel Subject: Re: [PATCH 03/16] x86/entry/32: Leave the kernel via the trampoline stack Message-ID: <20180117144503.62e47m6e5yyyze3d@treble> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1516120619-1159-4-git-send-email-joro@8bytes.org> <20180117092442.GJ28161@8bytes.org> <20180117141418.GS28161@8bytes.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20180117141418.GS28161@8bytes.org> User-Agent: Mutt/1.6.0.1 (2016-04-01) X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.26]); Wed, 17 Jan 2018 14:45:19 +0000 (UTC) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Return-Path: On Wed, Jan 17, 2018 at 03:14:18PM +0100, Joerg Roedel wrote: > On Wed, Jan 17, 2018 at 06:00:07AM -0800, Brian Gerst wrote: > > On Wed, Jan 17, 2018 at 5:57 AM, Brian Gerst wrote: > > But then again, you could take a fault on the trampoline stack if you > > get a bad segment. 
Perhaps just pushing the new stack pointer onto > > the process stack before user segment loads will be the right move. > > User segment loads pop from the stack, so having anything on-top also > doesn't work. > > Maybe I can leave some space at the bottom of the task-stack at entry > time and store the pointer there on exit, if that doesn't confuse the > stack unwinder too much. If you put it at the end of the stack page, I _think_ all you'd have to do is just adjust TOP_OF_KERNEL_STACK_PADDING. -- Josh From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753648AbeAQPZL (ORCPT + 1 other); Wed, 17 Jan 2018 10:25:11 -0500 Received: from aserp2120.oracle.com ([141.146.126.78]:42310 "EHLO aserp2120.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753550AbeAQPZG (ORCPT ); Wed, 17 Jan 2018 10:25:06 -0500 Subject: Re: [PATCH 02/16] x86/entry/32: Enter the kernel via trampoline stack To: Andrew Cooper , Joerg Roedel Cc: Thomas Gleixner , Ingo Molnar , "H . 
Peter Anvin" , x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long , jroedel@suse.de References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1516120619-1159-3-git-send-email-joro@8bytes.org> <476d7100-2414-d09e-abf1-5aa4d369a3b7@oracle.com> <20180117090238.GH28161@8bytes.org> <97298add-9484-7d83-50a3-1c668ce3107d@citrix.com> From: Boris Ostrovsky Message-ID: Date: Wed, 17 Jan 2018 10:22:24 -0500 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.2.1 MIME-Version: 1.0 In-Reply-To: <97298add-9484-7d83-50a3-1c668ce3107d@citrix.com> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit Content-Language: en-US X-Proofpoint-Virus-Version: vendor=nai engine=5900 definitions=8776 signatures=668653 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=0 malwarescore=0 phishscore=0 bulkscore=0 spamscore=0 mlxscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1711220000 definitions=main-1801170219 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Return-Path: On 01/17/2018 09:04 AM, Andrew Cooper wrote: > On 17/01/18 09:02, Joerg Roedel wrote: >> Hi Boris, >> >> thanks for testing this :) >> >> On Tue, Jan 16, 2018 at 09:47:06PM -0500, Boris Ostrovsky wrote: >>> On 01/16/2018 11:36 AM, Joerg Roedel wrote: >>>> +.macro SWITCH_TO_KERNEL_STACK nr_regs=0 check_user=0 >>> This (and next patch's SWITCH_TO_ENTRY_STACK) need X86_FEATURE_PTI check. >>> >>> With those macros fixed I was able to boot 32-bit Xen PV guest. 
>> Hmm, on bare metal the stack switch happens regardless of the >> X86_FEATURE_PTI feature being set, because we always program tss.sp0 >> with the sysenter stack. How is the kernel entry stack set up on xen-pv? >> I think something is missing there instead. > There is one single stack registered with Xen, on which you get a normal > exception frame in all cases, even via the registered (virtual) > syscall/sysenter/failsafe handlers. And so the check should be at least against X86_FEATURE_XENPV, not necessarily X86_FEATURE_PTI. But I guess you can still check against X86_FEATURE_PTI, since without it there is not much reason to switch stacks? -boris From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 10.3 \(3273\)) Subject: Re: [RFC PATCH 00/16] PTI support for x86-32 From: Nadav Amit In-Reply-To: <20180124185800.GA11515@shrek.podlesie.net> Date: Thu, 25 Jan 2018 14:09:40 -0800 Cc: Joerg Roedel , Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" , the arch/x86 maintainers , LKML , "open list:MEMORY MANAGEMENT" , Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli Content-Transfer-Encoding: 7bit Message-Id: <67E8EB67-EB60-441E-BDFB-521F3D431400@gmail.com> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <20180124185800.GA11515@shrek.podlesie.net> To: Krzysztof Mazur X-Mailer: Apple Mail (2.3273) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: =?utf-8?q?1589767841591470697?= X-GMAIL-MSGID: =?utf-8?q?1590604001663192131?= X-Mailing-List: linux-kernel@vger.kernel.org List-ID: Krzysztof Mazur wrote: > On Tue, Jan 16, 2018 at 05:36:43PM +0100, Joerg Roedel wrote: >> From: Joerg Roedel >> >> Hi, >> >> here is my current WIP code to enable PTI on x86-32. 
It is >> still in a pretty early state, but it successfully boots my >> KVM guest with PAE and with legacy paging.
The existing PTI >> code for x86-64 already prepares a lot of the stuff needed >> for 32 bit too, thanks for that to all the people involved >> in its development :) > > Hi, > > I've waited for these patches for a long time, until I've tried to > exploit meltdown on some old 32-bit CPUs and failed. Pentium M > seems to speculatively execute the second load with eax > always equal to 0: > > movzx (%[addr]), %%eax > shl $12, %%eax > movzx (%[target], %%eax), %%eax > > And on Pentium 4-based Xeon the second load seems never to be executed, > even without shift (shifts are slow on some or all Pentium 4's). Maybe > not all P6 and NetBurst CPUs are affected, but I'm not sure. Maybe the > kernel, at least on 32-bit, should try to exploit meltdown to test if > the CPU is really affected. The PoC apparently does not work with 3GB of memory or more on 32-bit. Does your setup have more? Can you try the attack while setting max_addr=1G? Thanks, Nadav From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Date: Fri, 26 Jan 2018 10:28:36 +0100 From: Krzysztof Mazur To: Nadav Amit Cc: Joerg Roedel , Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" , the arch/x86 maintainers , LKML , "open list:MEMORY MANAGEMENT" , Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli Subject: Re: [RFC PATCH 00/16] PTI support for x86-32 Message-ID: <20180126092836.GA11003@shrek.podlesie.net> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <20180124185800.GA11515@shrek.podlesie.net> <67E8EB67-EB60-441E-BDFB-521F3D431400@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <67E8EB67-EB60-441E-BDFB-521F3D431400@gmail.com> User-Agent: Mutt/1.6.2 (2016-07-01) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: =?utf-8?q?1589767841591470697?= X-GMAIL-MSGID: =?utf-8?q?1590646714464941939?= X-Mailing-List: linux-kernel@vger.kernel.org List-ID: On Thu, Jan 25, 2018 at 02:09:40PM -0800, Nadav Amit wrote: > The PoC apparently does not work
with 3GB of memory or more on 32-bit. Does > your setup have more? Can you try the attack while setting max_addr=1G?

No, I tested on:

Pentium M (Dothan): 1.5 GB RAM, PAE for NX, 2GB/2GB split
CONFIG_NOHIGHMEM=y
CONFIG_VMSPLIT_2G=y
CONFIG_PAGE_OFFSET=0x80000000
CONFIG_X86_PAE=y

and Xeon (Pentium 4): 2 GB RAM, no PAE, 1.875GB/2.125GB split
CONFIG_NOHIGHMEM=y
CONFIG_VMSPLIT_2G_OPT=y
CONFIG_PAGE_OFFSET=0x78000000

Now I'm testing with standard settings on Pentium M: 1.5 GB RAM, no PAE, 3GB/1GB split, ~890 MB RAM available
CONFIG_NOHIGHMEM=y
CONFIG_PAGE_OFFSET=0xc0000000
CONFIG_X86_PAE=n

and it still does not work. The reliability test from https://github.com/IAIK/meltdown reports 0.38% (1/256 = 0.39%, i.e. "true" random), and the other libkdump tools do not work.

https://github.com/paboldin/meltdown-exploit (on the linux_proc_banner symbol) reports:

cached = 46, uncached = 515, threshold 153
read c0897020 = ff (score=0/1000)
read c0897021 = ff (score=0/1000)
read c0897022 = ff (score=0/1000)
read c0897023 = ff (score=0/1000)
read c0897024 = ff (score=0/1000)
NOT VULNERABLE

and my exploit with this loop:

for (i = 0; i < 256; i++) {
	unsigned char *px = p + (i << 12);
	t = rdtsc();
	readb(px);
	t = rdtsc() - t;
	if (t < 100)
		printf("%02x %lld\n", i, t);
}

returns only "00 45". When I change the exploit code (now based on paboldin's code, to be sure) to:

movzx (%[addr]), %%eax
movl $0xaa, %%eax
shl $12, %%eax
movzx (%[target], %%eax), %%eax

I always get "0xaa 51", so the CPU is speculatively executing the second load with (0xaa << 12) in eax, and without the movl instruction, eax seems to be always 0. I even tried to remove the shift:

movzx (%[addr]), %%eax
movzx (%[target], %%eax), %%eax

and read a known value (from /dev/mem, for instance 0x20). I've modified the target array offset, and the CPU still touches the "wrong" cacheline: eax == 0 instead of 0x20. I've also tested movl instead of movzx (with an "and 0xff").
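The timing loop above is the receive side of a Flush+Reload-style channel: the dependent load `movzx (%[target], %%eax), %%eax` touches the page at offset value << 12 of the probe array, and the loop then times one access per page. The decode step can be separated from the timing and written as a pure function; the C below is a sketch with hypothetical names (`probe_offset()`, `decode_probe()`) that takes measured cycle counts as input instead of calling rdtsc, and it is not taken from either PoC repository. It makes the reported result easy to read: if only page 0 comes back fast, as in the "00 45" output, the decoder answers 0 whatever the secret byte is, which is consistent with eax apparently being 0 during speculation rather than with a successful leak.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PROBE_STRIDE_SHIFT 12   /* one 4 KiB page per byte value, as with "shl $12" */
#define PROBE_ENTRIES      256  /* one probe page per possible byte value */

/* Offset into the probe array touched by the dependent load for 'value'. */
static size_t probe_offset(uint8_t value)
{
	return (size_t)value << PROBE_STRIDE_SHIFT;
}

/*
 * Given the measured access time in cycles for each probe page, return the
 * byte value whose page was fastest, provided it beat 'threshold'; return -1
 * if no page looked cached.
 */
static int decode_probe(const uint64_t times[PROBE_ENTRIES], uint64_t threshold)
{
	int best = -1;
	uint64_t best_time = threshold;

	assert(threshold > 0);
	for (int i = 0; i < PROBE_ENTRIES; i++) {
		if (times[i] < best_time) {
			best_time = times[i];
			best = i;
		}
	}
	return best;
}
```

Taking the fastest page under the threshold, rather than the first one found, keeps a cached page at a low index from hiding the real hit: page 0 at 45 cycles and page 0xaa at 40 cycles decode to 0xaa.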
On Core 2 Quad in 64-bit mode everything works as expected, vulnerable to Meltdown (I did not test it in 32-bit mode). I don't have any Core "1" to test. On that Pentium M, the syscall slowdown caused by PTI is huge: 7.5 times slower (7 times compared to a patched kernel with PTI disabled), while on Skylake with PCID the same trivial benchmark is "only" 3.5 times slower (and 5.2 times slower without PCID). Krzysiek From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 10.3 \(3273\)) Subject: Re: [RFC PATCH 00/16] PTI support for x86-32 From: Nadav Amit In-Reply-To: <1516120619-1159-1-git-send-email-joro@8bytes.org> Date: Sun, 21 Jan 2018 12:13:13 -0800 Cc: Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" , the arch/x86 maintainers , LKML , "open list:MEMORY MANAGEMENT" , Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long , jroedel@suse.de Content-Transfer-Encoding: quoted-printable Message-Id: <5D89F55C-902A-4464-A64E-7157FF55FAD0@gmail.com> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> To: Joerg Roedel X-Mailer: Apple Mail (2.3273) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: =?utf-8?q?1589767841591470697?= X-GMAIL-MSGID: =?utf-8?q?1590234287174470023?= X-Mailing-List: linux-kernel@vger.kernel.org List-ID: I am looking at PTI on x86-32, but I did not manage to get the PoC to work on this setup (KASLR disabled; a similar setup works on 64-bit). Did you use any PoC to “test” the protection? Thanks, Nadav Joerg Roedel wrote: > From: Joerg Roedel > > Hi, > > here is my current WIP code to enable PTI on x86-32. It is > still in a pretty early state, but it successfully boots my > KVM guest with PAE and with legacy paging. The existing PTI
The existing PTI > code for x86-64 already prepares a lot of the stuff needed > for 32 bit too, thanks for that to all the people involved > in its development :) >=20 > The patches are split as follows: >=20 > - 1-3 contain the entry-code changes to enter and > exit the kernel via the sysenter trampoline stack. >=20 > - 4-7 are fixes to get the code compile on 32 bit > with CONFIG_PAGE_TABLE_ISOLATION=3Dy. >=20 > - 8-14 adapt the existing PTI code to work properly > on 32 bit and add the needed parts to 32 bit > page-table code. >=20 > - 15 switches PTI on by adding the CR3 switches to > kernel entry/exit. >=20 > - 16 enables the Kconfig for all of X86 >=20 > The code has not run on bare-metal yet, I'll test that in > the next days once I setup a 32 bit box again. I also havn't > tested Wine and DosEMU yet, so this might also be broken. >=20 > With that post I'd like to ask for all kinds of constructive > feedback on the approaches I have taken and of course the > many things I broke with it :) >=20 > One of the things that are surely broken is XEN_PV support. > I'd appreciate any help with testing and bugfixing on that > front. >=20 > So please review and let me know your thoughts. 
>=20 > Thanks, >=20 > Joerg >=20 > Joerg Roedel (16): > x86/entry/32: Rename TSS_sysenter_sp0 to TSS_sysenter_stack > x86/entry/32: Enter the kernel via trampoline stack > x86/entry/32: Leave the kernel via the trampoline stack > x86/pti: Define X86_CR3_PTI_PCID_USER_BIT on x86_32 > x86/pgtable: Move pgdp kernel/user conversion functions to pgtable.h > x86/mm/ldt: Reserve high address-space range for the LDT > x86/mm: Move two more functions from pgtable_64.h to pgtable.h > x86/pgtable/32: Allocate 8k page-tables when PTI is enabled > x86/mm/pti: Clone CPU_ENTRY_AREA on PMD level on x86_32 > x86/mm/pti: Populate valid user pud entries > x86/mm/pgtable: Move pti_set_user_pgd() to pgtable.h > x86/mm/pae: Populate the user page-table with user pgd's > x86/mm/pti: Add an overflow check to pti_clone_pmds() > x86/mm/legacy: Populate the user page-table with user pgd's > x86/entry/32: Switch between kernel and user cr3 on entry/exit > x86/pti: Allow CONFIG_PAGE_TABLE_ISOLATION for x86_32 >=20 > arch/x86/entry/entry_32.S | 170 = +++++++++++++++++++++++++++++--- > arch/x86/include/asm/pgtable-2level.h | 3 + > arch/x86/include/asm/pgtable-3level.h | 3 + > arch/x86/include/asm/pgtable.h | 88 +++++++++++++++++ > arch/x86/include/asm/pgtable_32_types.h | 5 +- > arch/x86/include/asm/pgtable_64.h | 85 ---------------- > arch/x86/include/asm/processor-flags.h | 8 +- > arch/x86/include/asm/switch_to.h | 6 +- > arch/x86/kernel/asm-offsets_32.c | 5 +- > arch/x86/kernel/cpu/common.c | 5 +- > arch/x86/kernel/head_32.S | 23 ++++- > arch/x86/kernel/process.c | 2 - > arch/x86/kernel/process_32.c | 6 ++ > arch/x86/mm/pgtable.c | 11 ++- > arch/x86/mm/pti.c | 34 ++++++- > security/Kconfig | 2 +- > 16 files changed, 333 insertions(+), 123 deletions(-) >=20 > --=20 > 2.13.6 >=20 > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . 
> Don't email: email@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [RFC PATCH 00/16] PTI support for x86-32
From: Nadav Amit
In-Reply-To: <5D89F55C-902A-4464-A64E-7157FF55FAD0@gmail.com>
Date: Sun, 21 Jan 2018 12:44:53 -0800
Cc: Thomas Gleixner, Ingo Molnar, "H . Peter Anvin", the arch/x86 maintainers, LKML, "open list:MEMORY MANAGEMENT", Linus Torvalds, Andy Lutomirski, Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra, Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst, David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH, Will Deacon, aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli, Waiman Long, jroedel@suse.de
Message-Id: <886C924D-668F-4007-98CA-555DB6279E4F@gmail.com>
References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <5D89F55C-902A-4464-A64E-7157FF55FAD0@gmail.com>
To: Joerg Roedel
X-Mailing-List: linux-kernel@vger.kernel.org

Please ignore my previous email. I got it working… Sorry for the spam.


Nadav Amit wrote:

> I am looking at PTI on x86-32, but I did not manage to get the PoC to work on
> this setup (KASLR disabled; a similar setup works on 64-bit).
>
> Did you use any PoC to “test” the protection?
>
> Thanks,
> Nadav

From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [RFC PATCH 00/16] PTI support for x86-32
From: Nadav Amit
In-Reply-To: <886C924D-668F-4007-98CA-555DB6279E4F@gmail.com>
Date: Sun, 21 Jan 2018 15:46:24 -0800
Cc: Thomas Gleixner, Ingo Molnar, "H . Peter Anvin", the arch/x86 maintainers, LKML, "open list:MEMORY MANAGEMENT", Linus Torvalds, Andy Lutomirski, Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra, Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst, David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH, Will Deacon, aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli, Waiman Long, jroedel@suse.de
Message-Id: <9CF1DD34-7C66-4F11-856D-B5E896988E16@gmail.com>
References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <5D89F55C-902A-4464-A64E-7157FF55FAD0@gmail.com> <886C924D-668F-4007-98CA-555DB6279E4F@gmail.com>
To: Joerg Roedel
X-Mailing-List: linux-kernel@vger.kernel.org

I wanted to see whether segment protection can be a replacement for PTI
(yes, excluding SMEP emulation), or whether speculative execution “ignores”
limit checks, similarly to the way paging protection is skipped.

It does seem that segmentation provides sufficient protection from Meltdown.
The “reliability” test of the Graz PoC fails if the segment limit is set to
prevent access to the kernel memory. [ It passes if the limit is not set,
even if the DS is reloaded. ] My test is enclosed below.

So my question: wouldn’t it be much more efficient to use segmentation
protection for x86-32, and allow users to choose whether they want SMEP-like
protection if needed (and then enable PTI)?

[ There might be some corner cases in which setting a segment limit
introduces a problem, for example when modify_ldt() is used to set an
invalid limit, but I presume that these are relatively uncommon, can be
detected at runtime, and PTI can then be used as a fallback mechanism. ]

Thanks,
Nadav

-- >8 --

Subject: [PATCH] Test segmentation protection

---
 libkdump/libkdump.c | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/libkdump/libkdump.c b/libkdump/libkdump.c
index c590391..db5bac3 100644
--- a/libkdump/libkdump.c
+++ b/libkdump/libkdump.c
@@ -10,6 +10,9 @@
 #include
 #include
 #include
+#include
+#include
+#include
 
 libkdump_config_t libkdump_auto_config = {0};
 
@@ -500,6 +503,31 @@ int __attribute__((optimize("-Os"), noinline)) libkdump_read_tsx() {
   return 0;
 }
 
+extern int modify_ldt(int, void*, unsigned long);
+
+void change_ds(void)
+{
+  int r;
+  struct user_desc desc = {
+    .entry_number = 1,
+    .base_addr = 0,
+#ifdef NO_SEGMENTS
+    .limit = 0xffffeu,
+#else
+    .limit = 0xbffffu,
+#endif
+    .seg_32bit = 1,
+    .contents = 0,
+    .read_exec_only = 0,
+    .limit_in_pages = 1,
+    .seg_not_present = 0,
+  };
+
+  r = modify_ldt(1 /* write */, &desc, sizeof(desc));
+  assert(r == 0);
+  asm volatile ("mov %0, %%ds\n\t" : : "r"((1 << 3) | (1 << 2) | 3));
+}
+
 // ---------------------------------------------------------------------------
 int __attribute__((optimize("-Os"), noinline)) libkdump_read_signal_handler() {
   size_t retries = config.retries + 1;
@@ -507,6 +535,9 @@ int __attribute__((optimize("-Os"), noinline)) libkdump_read_signal_handler() {
 
   while (retries--) {
     if (!setjmp(buf)) {
+      /* longjmp reloads the original DS... */
+      change_ds();
+
       MELTDOWN;
     }
From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 22 Jan 2018 09:56:25 +0100
From: Joerg Roedel
To: Nadav Amit
Cc: Thomas Gleixner, Ingo Molnar, "H . Peter Anvin", the arch/x86 maintainers, LKML, "open list:MEMORY MANAGEMENT", Linus Torvalds, Andy Lutomirski, Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra, Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst, David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH, Will Deacon, aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli, Waiman Long, jroedel@suse.de
Subject: Re: [RFC PATCH 00/16] PTI support for x86-32
Message-ID: <20180122085625.GE28161@8bytes.org>
References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <5D89F55C-902A-4464-A64E-7157FF55FAD0@gmail.com> <886C924D-668F-4007-98CA-555DB6279E4F@gmail.com> <9CF1DD34-7C66-4F11-856D-B5E896988E16@gmail.com>
In-Reply-To: <9CF1DD34-7C66-4F11-856D-B5E896988E16@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Hey Nadav,

On Sun, Jan 21, 2018 at 03:46:24PM -0800, Nadav Amit wrote:
> It does seem that segmentation provides sufficient protection from Meltdown.

Thanks for testing this, if this turns out to be true for all affected
uarchs it would be a great and better way of protection than enabling
PTI.

But I'd like an official statement from Intel on that one, as their
recommended fix is still to use PTI.

And as you said, if it turns out that this works only on some Intel
uarchs, we can also detect it at runtime and then choose the fastest
meltdown protection mechanism.
Thanks,

	Joerg

From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 25 Jan 2018 17:09:25 +0000
From: Alan Cox
To: Joerg Roedel
Cc: Nadav Amit, Thomas Gleixner, Ingo Molnar, "H . Peter Anvin", the arch/x86 maintainers, LKML, "open list:MEMORY MANAGEMENT", Linus Torvalds, Andy Lutomirski, Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra, Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst, David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH, Will Deacon, aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli, Waiman Long, jroedel@suse.de
Subject: Re: [RFC PATCH 00/16] PTI support for x86-32
Message-ID: <20180125170925.1d72d587@alans-desktop>
In-Reply-To: <20180122085625.GE28161@8bytes.org>
References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <5D89F55C-902A-4464-A64E-7157FF55FAD0@gmail.com> <886C924D-668F-4007-98CA-555DB6279E4F@gmail.com> <9CF1DD34-7C66-4F11-856D-B5E896988E16@gmail.com> <20180122085625.GE28161@8bytes.org>
Organization: Intel Corporation
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, 22 Jan 2018 09:56:25 +0100 Joerg Roedel wrote:

> Hey Nadav,
>
> On Sun, Jan 21, 2018 at 03:46:24PM -0800, Nadav Amit wrote:
> > It does seem that segmentation provides sufficient protection from Meltdown.
>
> Thanks for testing this, if this turns out to be true for all affected
> uarchs it would be a great and better way of protection than enabling
> PTI.
>
> But I'd like an official statement from Intel on that one, as their
> recommended fix is still to use PTI.

It is: we don't think segmentation works on all processors as a defence.
Alan

From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 23 Jan 2018 14:57:17 +0000
From: Alan Cox
To: Joerg Roedel
Cc: Nadav Amit, Thomas Gleixner, Ingo Molnar, "H . Peter Anvin", the arch/x86 maintainers, LKML, "open list:MEMORY MANAGEMENT", Linus Torvalds, Andy Lutomirski, Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra, Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst, David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH, Will Deacon, aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli, Waiman Long, jroedel@suse.de
Subject: Re: [RFC PATCH 00/16] PTI support for x86-32
Message-ID: <20180123145717.75c84e9a@alans-desktop>
In-Reply-To: <20180122085625.GE28161@8bytes.org>
References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <5D89F55C-902A-4464-A64E-7157FF55FAD0@gmail.com> <886C924D-668F-4007-98CA-555DB6279E4F@gmail.com> <9CF1DD34-7C66-4F11-856D-B5E896988E16@gmail.com> <20180122085625.GE28161@8bytes.org>
Organization: Intel Corporation
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, 22 Jan 2018 09:56:25 +0100 Joerg Roedel wrote:

> Hey Nadav,
>
> On Sun, Jan 21, 2018 at 03:46:24PM -0800, Nadav Amit wrote:
> > It does seem that segmentation provides sufficient protection from Meltdown.
>
> Thanks for testing this, if this turns out to be true for all affected
> uarchs it would be a great and better way of protection than enabling
> PTI.
>
> But I'd like an official statement from Intel on that one, as their
> recommended fix is still to use PTI.
>
> And as you said, if it turns out that this works only on some Intel
> uarchs, we can also detect it at runtime and then choose the fastest
> meltdown protection mechanism.

I'll follow this up and get an official statement.
Alan

From mboxrd@z Thu Jan 1 00:00:00 1970
In-Reply-To: <9CF1DD34-7C66-4F11-856D-B5E896988E16@gmail.com>
References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <5D89F55C-902A-4464-A64E-7157FF55FAD0@gmail.com> <886C924D-668F-4007-98CA-555DB6279E4F@gmail.com> <9CF1DD34-7C66-4F11-856D-B5E896988E16@gmail.com>
From: Linus Torvalds
Date: Sun, 21 Jan 2018 18:11:07 -0800
Subject: Re: [RFC PATCH 00/16] PTI support for x86-32
To: Nadav Amit
Cc: Joerg Roedel, Thomas Gleixner, Ingo Molnar, "H . Peter Anvin", "the arch/x86 maintainers", LKML, "open list:MEMORY MANAGEMENT", Andy Lutomirski, Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra, Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst, David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH, Will Deacon, "Liguori, Anthony", Daniel Gruss, Hugh Dickins, Kees Cook, Andrea Arcangeli, Waiman Long, Joerg Roedel
X-Mailing-List: linux-kernel@vger.kernel.org

On Sun, Jan 21, 2018 at 3:46 PM, Nadav Amit wrote:
> I wanted to see whether segment protection can be a replacement for PTI
> (yes, excluding SMEP emulation), or whether speculative execution “ignores”
> limit checks, similarly to the way paging protection is skipped.
>
> It does seem that segmentation provides sufficient protection from Meltdown.
> The “reliability” test of the Graz PoC fails if the segment limit is set to
> prevent access to the kernel memory. [ It passes if the limit is not set,
> even if the DS is reloaded. ] My test is enclosed below.

Interesting. It might not be entirely reliable for all
microarchitectures, though.

> So my question: wouldn’t it be much more efficient to use segmentation
> protection for x86-32, and allow users to choose whether they want SMEP-like
> protection if needed (and then enable PTI)?

That's what we did long long ago, with user space segments actually
using the limit (in fact, if you go back far enough, the kernel even
used the base).

You'd have to make sure that the LDT loading etc. do not allow CPL3
segments with base+limit past TASK_SIZE, so that people can't generate
their own. And the TLS segments also need to be limited (and remember,
the limit has to be TASK_SIZE-base, not just TASK_SIZE).

And we should check with Intel that segment limit checking really is
guaranteed to be done before any access.

Too bad x86-64 got rid of the segments ;)

               Linus

From mboxrd@z Thu Jan 1 00:00:00 1970
Authentication-Results: mx.google.com;
spf=pass (google.com: domain of hpa@zytor.com designates 65.50.211.136 as permitted sender) smtp.mailfrom=hpa@zytor.com Date: Sun, 21 Jan 2018 18:20:11 -0800 User-Agent: K-9 Mail for Android In-Reply-To: References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <5D89F55C-902A-4464-A64E-7157FF55FAD0@gmail.com> <886C924D-668F-4007-98CA-555DB6279E4F@gmail.com> <9CF1DD34-7C66-4F11-856D-B5E896988E16@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Subject: Re: [RFC PATCH 00/16] PTI support for x86-32 To: Linus Torvalds , Nadav Amit CC: Joerg Roedel , Thomas Gleixner , Ingo Molnar , the arch/x86 maintainers , LKML , "open list:MEMORY MANAGEMENT" , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , "Liguori, Anthony" , Daniel Gruss , Hugh Dickins , Kees Cook , Andrea Arcangeli , Waiman Long , Joerg Roedel From: hpa@zytor.com Message-ID: <143DE376-A8A4-4A91-B4FF-E258D578242D@zytor.com> X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: =?utf-8?q?1589767841591470697?= X-GMAIL-MSGID: =?utf-8?q?1590258502966045554?= X-Mailing-List: linux-kernel@vger.kernel.org List-ID: On January 21, 2018 6:11:07 PM PST, Linus Torvalds wrote: >On Sun, Jan 21, 2018 at 3:46 PM, Nadav Amit wrote: >> I wanted to see whether segments protection can be a replacement for PTI >> (yes, excluding SMEP emulation), or whether speculative execution “ignores” >> limit checks, similarly to the way paging protection is skipped. >> >> It does seem that segmentation provides sufficient protection from Meltdown. >> The “reliability” test of Graz PoC fails if the segment limit is set to >> prevent access to the kernel memory. [ It passes if the limit is not set, >> even if the DS is reloaded. ] My test is enclosed below. >
>Interesting. It might not be entirely reliable for all >microarchitectures, though. > >> So my question: wouldn’t it be much more efficient to use segmentation >> protection for x86-32, and allow users to choose whether they want SMEP-like >> protection if needed (and then enable PTI)? > >That's what we did long long ago, with user space segments actually >using the limit (in fact, if you go back far enough, the kernel even >used the base). > >You'd have to make sure that the LDT loading etc do not allow CPL3 >segments with base+limit past TASK_SIZE, so that people can't generate >their own. And the TLS segments also need to be limited (and >remember, the limit has to be TASK_SIZE-base, not just TASK_SIZE). > >And we should check with Intel that segment limit checking really is >guaranteed to be done before any access. > >Too bad x86-64 got rid of the segments ;) > > Linus No idea about Intel, but at least on Transmeta CPUs the limit check was asynchronous with the access. -- Sent from my Android device with K-9 Mail. Please excuse my brevity. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: ARC-Seal: i=1; a=rsa-sha256; t=1516652045; cv=none; d=google.com; s=arc-20160816; b=VUuCmU8OHqOP7xQQcmC8jqVvS4gLaB65eOo5ul/rVclMgd+pc4OM5KLwWdkleJUXig 7QIMaibiL1Mi7LwYnqRoa1nHpFFwpvqVvMLIAsskvyxxkiKXo7iipxHflnOyzoUM5fL/ umv8XvjmTv/K1SyatfY0cj7P0DmdL4b+jb4zucQuNo3c2uT0RdBidj5n6vII2y86Pj/5 yoCm8rWjpZeEQmCIUNzuURj6htZY8I0Se0ZwJO5SHcQJR+2Pt1FWaY5g/BQyFy4QLKXK ad9Wropg/Bv4q1/gx52MZvRdVCnSaYlhr6ymoYUzNyJIr8hqF2UkYyfHfHKUgoyoUAzJ YGmw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=cc:to:subject:message-id:date:from:references:in-reply-to:sender :mime-version:dkim-signature:arc-authentication-results; bh=yQLpyDWhLWyq+GcXWb3KapwqtjEkCWcau05LAPzbSY8=; b=sLfsG6xe0Uhi9jR9dOezNqrO/KqCfeD/5wXrYqZW8zsTjXuhK1NgwyFcqjl07BeoS/ rh621uqnTlBg+SFe//dHYqROV52iKG4BVvp9m6rTzQMdbxsbWsrPXJZez6XYgrl2PzBq
ND4YIevIw/co7vh//NTBQNXCaPjvoDVC/VurYt21GEpo4Cms882GHl/w9juQGSPLU2q0 DD7Ckb6RS6sGpcgPgSHbJQiXDgYyIZLLl/kJ2SNBh+niB5EGZMtJD1b7qzCsxEuncq8s htt83WrbdvHEotgh8dsEld83e0p1u4AWnL1X/FFT9EhTgFI4/IBJdimaCrsjcz2qne8I PfUw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=MKJgFJsv; spf=pass (google.com: domain of linus971@gmail.com designates 209.85.220.41 as permitted sender) smtp.mailfrom=linus971@gmail.com Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=MKJgFJsv; spf=pass (google.com: domain of linus971@gmail.com designates 209.85.220.41 as permitted sender) smtp.mailfrom=linus971@gmail.com X-Google-Smtp-Source: AH8x224mtjeZKpCqKR32et12l7nbWQWLPDs4zY0qCyN+VhCakHkGoJQs6nYta3Ys7lUJohTIETySaJEr30FwcJsJli8= MIME-Version: 1.0 Sender: linus971@gmail.com In-Reply-To: <143DE376-A8A4-4A91-B4FF-E258D578242D@zytor.com> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <5D89F55C-902A-4464-A64E-7157FF55FAD0@gmail.com> <886C924D-668F-4007-98CA-555DB6279E4F@gmail.com> <9CF1DD34-7C66-4F11-856D-B5E896988E16@gmail.com> <143DE376-A8A4-4A91-B4FF-E258D578242D@zytor.com> From: Linus Torvalds Date: Mon, 22 Jan 2018 12:14:03 -0800 X-Google-Sender-Auth: MOBSCOCcw7ddaSx1mBn3DSNhhCk Message-ID: Subject: Re: [RFC PATCH 00/16] PTI support for x86-32 To: Peter Anvin Cc: Nadav Amit , Joerg Roedel , Thomas Gleixner , Ingo Molnar , "the arch/x86 maintainers" , LKML , "open list:MEMORY MANAGEMENT" , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , "Liguori, Anthony" , Daniel Gruss , Hugh Dickins , Kees Cook , Andrea Arcangeli , Waiman Long , Joerg Roedel Content-Type: text/plain; charset="UTF-8" X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: =?utf-8?q?1589767841591470697?= X-GMAIL-MSGID: 
=?utf-8?q?1590324934837725187?= X-Mailing-List: linux-kernel@vger.kernel.org List-ID: On Sun, Jan 21, 2018 at 6:20 PM, wrote: > > No idea about Intel, but at least on Transmeta CPUs the limit check was asynchronous with the access. Yes, but TMTA had a really odd uarch and didn't check segment limits natively. When you do it in hardware, the limit check is actually fairly natural to do early rather than late (since it acts on the linear address _before_ base add and TLB lookup). So it's not like it can't be done late, but there are reasons why a traditional microarchitecture might always end up doing the limit check early and so segmentation might be a good defense against Meltdown on 32-bit Intel. Linus From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Google-Smtp-Source: AH8x224uCgH93wAwNfXHG3CJ72ytQa3jbPbee/ITjOkw0PmE53T35Qvbn51V45ULY6/09ZcvA0BP ARC-Seal: i=1; a=rsa-sha256; t=1516656453; cv=none; d=google.com; s=arc-20160816; b=QCcY7j7JV5JfMVf4z/vrYWpG0YXX4f0j6VQjJVFd7ZB/mhwPRM9TmaFs2t5Mks7Dg8 F1a1z/VpwVjJEzOOqaZNU+eJzdqFV0KuOFzEsz70Ly5Au4WXULdAQdLaoipPrHe7k7m+ GaRGkNXHktG4wAHEmnp9uEYfBKjlpbGIN0QY69NzM6rLd0AKLPGTsCwmX6Blp5sJBNYy WyOEIZEY6+UoJBMcpNV4Hi30CDUMs12wCM7aO5GETeEBP6t6hTkxnBHr7Oo+UkoHLBtG 2Zf10BOppBf8bYXJJWuUQhLdnX3YovhHYbS5Qrm3llKUgbfIMOxox8pvLn2s10x5ajyA jBNg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:content-language:in-reply-to:mime-version :user-agent:date:message-id:from:references:cc:to:subject :arc-authentication-results; bh=bCHiYnTXy31KZBLKyJLs2VwoeM66e7Q/EGoBgNo9ZH4=; b=vXeo8corl0yEuOtt9W4kawnwPlGgqTxs0PdgbnIIskMev/Ip9FjcEQ8NEF0pAIwP1D JdoiWos3fhhrBFzJ1AobjbQESfTuf9AOZhwzcwfW6M7lJ+o/Y2Wnyy+hBQ0W13c6Nok3 5/jaztpLYjAZm2NPYrB5HjSqsySfO3I6UZtUdhizPKYJxooOzxkSqqjLiAJYMWW7bv9c XBUlhW13SnpQksPV5/MT3J8cLAuoA9M9ahkCOdHPH2CjZ11QhM1a8LrHjZXMUhPKBXGR tzq823PMzrlClilZ4qH6E47Ps5mfx3S+44yWL9JKVVopyB5Om4TwS9i3auqv8TbU1cym QSzw== ARC-Authentication-Results: i=1; mx.google.com;
spf=pass (google.com: domain of hpa@zytor.com designates 65.50.211.136 as permitted sender) smtp.mailfrom=hpa@zytor.com Authentication-Results: mx.google.com; spf=pass (google.com: domain of hpa@zytor.com designates 65.50.211.136 as permitted sender) smtp.mailfrom=hpa@zytor.com Subject: Re: [RFC PATCH 00/16] PTI support for x86-32 To: Linus Torvalds Cc: Nadav Amit , Joerg Roedel , Thomas Gleixner , Ingo Molnar , the arch/x86 maintainers , LKML , "open list:MEMORY MANAGEMENT" , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , "Liguori, Anthony" , Daniel Gruss , Hugh Dickins , Kees Cook , Andrea Arcangeli , Waiman Long , Joerg Roedel References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <5D89F55C-902A-4464-A64E-7157FF55FAD0@gmail.com> <886C924D-668F-4007-98CA-555DB6279E4F@gmail.com> <9CF1DD34-7C66-4F11-856D-B5E896988E16@gmail.com> <143DE376-A8A4-4A91-B4FF-E258D578242D@zytor.com> From: "H. Peter Anvin" Message-ID: Date: Mon, 22 Jan 2018 13:10:19 -0800 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.4.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: =?utf-8?q?1589767841591470697?= X-GMAIL-MSGID: =?utf-8?q?1590329558010384616?= X-Mailing-List: linux-kernel@vger.kernel.org List-ID: On 01/22/18 12:14, Linus Torvalds wrote: > On Sun, Jan 21, 2018 at 6:20 PM, wrote: >> >> No idea about Intel, but at least on Transmeta CPUs the limit check was asynchronous with the access. > > Yes, but TMTA had a really odd uarch and didn't check segment limits natively. > Only on TM3000 ("Wilma") and TM5000 ("Fred"), not on TM8000 ("Astro"). Astro might in fact have been more synchronous than most modern machines (see below.) 
> When you do it in hardware, the limit check is actually fairly natural > to do early rather than late (since it acts on the linear address > _before_ base add and TLB lookup). > > So it's not like it can't be done late, but there are reasons why a > traditional microarchitecture might always end up doing the limit > check early and so segmentation might be a good defense against > Meltdown on 32-bit Intel. I will try to investigate, but as you can imagine the amount of bandwidth I might be able to get on this is definitely going to be limited. All of the below is generic discussion that almost certainly can be found in some form in Hennessy & Patterson, and so I don't have to worry about giving away Intel secrets: It isn't really true that it is natural to check this early. One of the most fundamental frequency limiters in a modern CPU architecture (meaning anything from the last 20 years or so) has been the data-dependent AGU-D$-AGU loop. Note that this doesn't even include the TLB: the TLB is looked up in parallel with the D$, and if the result was *either* a cache-TLB mismatch or a TLB miss the result is prevented from committing. In the case of the x86, the AGU receives up to three sources plus the segment base, and if possible given the target process and gates available might be designed to have a unified 4-input adder, with the 3-input case for limit checks being done separately. Misses and even more so exceptions (which are far less frequent than misses) are demoted to a slower path where the goal is to prevent commit rather than trying to race to be in the data path. So although it is natural to *issue* the load and the limit check at the same time, the limit check is still going to be deferred. Whether or not it is permitted to be fully asynchronous with the load is probably a tradeoff of timing requirements vs complexity.
At least theoretically one could imagine a machine which would take the trap after the speculative machine had already chased the pointer loop several levels down; this would most likely mean separate uops to allow for the existing out-of-order machine to do the bookkeeping. -hpa From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Google-Smtp-Source: AH8x225MZ5pGElfj7gZE6gHwEUHpZlOmvdDn1h5AfKA7xVyObET7iR7DVpMdukeQyYuUIGYFIypJ ARC-Seal: i=1; a=rsa-sha256; t=1516718364; cv=none; d=google.com; s=arc-20160816; b=cgFUDxalVBck5uVlmSQvYRiynApXUflLSiq95BmCiSxTaOlLIrisFOLxWN/oAqiWPB gHQhnK8NLWN3N25FnWkJ4KHn10mYVpFWUNpjtBf/a60YwqmEDyG198QTGPy8AspatVk6 pyBVhtBZ5pYFC8fYQBROi/RX1sQfRhdOkBJGwV4i5/ix6RuVRveOEvDcmMWLLyWuKU5T 5gx2YePlac19P/SaKqG8cGtqk4E6BWvyQLe3Okcq9W4xjvjZYEhJrtR3Ka3QlDLYulvT PtpoBazhoyTO4axK1UGDmpndDnWcWK2eo58L5zOeFpN1rrBxurEt3n7Ww8j9xzdhh8dH N8lw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:organization:references :in-reply-to:message-id:subject:cc:to:from:date :arc-authentication-results; bh=BaQ4YMQijeeGMHlpXWq0m/olIH6slPO9tKqeB+Nme7s=; b=fXbkEU+1d8TtoccqPR5OP8JhbFe8uiVzTVTCmXaY3YPi3KdkOpPoaCQklXKF9zY3NH jMyr+/bBdh6CAtM2FWv48Fd7Zr/pxsx+sPLhcfnd0cql0DHhxK7AypmgKpqOjD1+g5fG hecAqK6mVsa3VVfGZU1LAOPnRF6IR5elL08xwxcYGxRGIL2G9Q2m5NaLXULpkYMfLG4P Hro/pHDRFJevpZQI0Gk/KrT3bw9U2JuaKoFvsK2lNfw/IrmOJ6caoIU0twyVV3ii13bB wRLbboq9eqjQ47qPHtw3+Y2Q6ox4iwHthICvC5mJk4n/7WZ86PkjtSMrE+9uNZy87r57 zT/A== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of gnomes@lxorguk.ukuu.org.uk designates 82.70.14.225 as permitted sender) smtp.mailfrom=gnomes@lxorguk.ukuu.org.uk Authentication-Results: mx.google.com; spf=pass (google.com: domain of gnomes@lxorguk.ukuu.org.uk designates 82.70.14.225 as permitted sender) smtp.mailfrom=gnomes@lxorguk.ukuu.org.uk Date: Tue, 23 Jan 2018 14:38:31 +0000 From: Alan Cox To: "H. 
Peter Anvin" Cc: Linus Torvalds , Nadav Amit , Joerg Roedel , Thomas Gleixner , Ingo Molnar , the arch/x86 maintainers , LKML , "open list:MEMORY MANAGEMENT" , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , "Liguori, Anthony" , Daniel Gruss , Hugh Dickins , Kees Cook , Andrea Arcangeli , Waiman Long , Joerg Roedel Subject: Re: [RFC PATCH 00/16] PTI support for x86-32 Message-ID: <20180123143831.2d769f9d@alans-desktop> In-Reply-To: References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <5D89F55C-902A-4464-A64E-7157FF55FAD0@gmail.com> <886C924D-668F-4007-98CA-555DB6279E4F@gmail.com> <9CF1DD34-7C66-4F11-856D-B5E896988E16@gmail.com> <143DE376-A8A4-4A91-B4FF-E258D578242D@zytor.com> Organization: Intel Corporation X-Mailer: Claws Mail 3.15.1-dirty (GTK+ 2.24.31; x86_64-redhat-linux-gnu) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: =?utf-8?q?1589767841591470697?= X-GMAIL-MSGID: =?utf-8?q?1590394475775720007?= X-Mailing-List: linux-kernel@vger.kernel.org List-ID: > of timing requirements vs complexity. At least theoretically one could > imagine a machine which would take the trap after the speculative > machine had already chased the pointer loop several levels down; this > would most likely mean separate uops to allow for the existing > out-of-order machine to do the bookkeeping. It's not quite the same, but in the IA-64 case you can write Itanium code that does exactly that. The speculation is expressed in software, not hardware (because you can trigger a load, then check later if it worked out and respond appropriately).
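A toy C model of the deferred-check pattern Alan describes may help. It is loosely based on IA-64's NaT ("Not a Thing") bits and makes simplifying assumptions. A speculative load that would fault (here, an index past a segment-style limit) returns a NaT-tagged result instead of trapping. The NaT propagates through dependent loads, so a pointer chase can continue several levels past the failing load. A later check step then diverts to recovery. The names spec_reg, spec_load and spec_check are hypothetical, not a real API. The recovery_value parameter stands in for the branch to recovery code that chk.s performs on real hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A register value plus a deferred-fault flag, loosely modelling an
 * IA-64 general register with its NaT bit. Hypothetical, for illustration. */
struct spec_reg {
    uint32_t val;
    bool nat;
};

/* Speculative load (in the spirit of ld.s): instead of trapping when the
 * index is out of bounds, mark the result NaT. A NaT index propagates, so a
 * dependent pointer chase keeps going without ever producing real data. */
static struct spec_reg spec_load(const uint32_t *mem, size_t limit,
                                 struct spec_reg idx)
{
    struct spec_reg r = { 0, true };

    if (idx.nat || idx.val >= limit)
        return r;               /* fault deferred: no data is produced */
    r.val = mem[idx.val];
    r.nat = false;
    return r;
}

/* Check step (in the spirit of chk.s): consume the value only if no fault
 * was deferred; otherwise take the recovery path, modelled as a fallback. */
static uint32_t spec_check(struct spec_reg r, uint32_t recovery_value)
{
    return r.nat ? recovery_value : r.val;
}
```

In this model an out-of-bounds load never yields data, only a flag. Whether a given out-of-order pipeline behaves like that for segment-limit violations is exactly the open microarchitectural question in this thread.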
Alan From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: ARC-Seal: i=1; a=rsa-sha256; t=1516588087; cv=none; d=google.com; s=arc-20160816; b=Vs+BPGv22J4lFpjzpSqWd3aPWfaH2yt/CWb1z2ADpzzQoKcBwdEYUZ5g9p+349oigh p4SGjiuymRT9QgJOmXK2Ofp/jn3m4VWRfT1HbA6Go2NMSX65zE7ED7W9/DoFEc37zjmT hdSUXIn8sk/6IqO86UzL6xykRed2shbSZEyj1p92rV0Q5TVJBbcm2nBWPCH6y8S06hHT UQkYJAuVlA6mAA5CorK8bhfBNSUeSQKUExlZKbbo0Vye4g4GPUyqqFmnbQrAklyxgFVn Es6dMejttxJqNsE6Pb/wcMNM1ya2STwM8aC4SsR8ETGhzKIIri/xnTptmJwVsokP4+Z6 +dSQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=to:references:message-id:content-transfer-encoding:cc:date :in-reply-to:from:subject:mime-version:dkim-signature :arc-authentication-results; bh=OLl64QM8LbaeGqjnje8L5GVJXImcqDoHhXL5etf7jY4=; b=sDSpdRewQNXtyAN4+Ms/lBYWKoNTZTQUYVy0Jon2R7X8/AJbWiwih856cqw8TE/lkA LByon/574yaBZ06CbX6F7LVtQpiXEEVe4irXEFbg0b3eeBa/y6a+ws/GDbeSA7lbnRg0 K4Ma+D+mhGztq3oe4ufdGCfFyZD87XcBrpMBP0bwxadTjEQz5/P/2srIf2E8Imzu67Uo OCMOiLS/qRSbpC9RZXzGWd5Kkgx6mK4G+wWQsFaXAjj8QVAY9CDS0dpSp3kApgY6e5h5 7l76CDnOqFsDJJ1E/Qk2htPWmSmlfFJmPy7QrxLKzOWNGUHHGb9f7wbs8NtK5ZPKhns9 7Uog== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=ZDCsAxaK; spf=pass (google.com: domain of nadav.amit@gmail.com designates 209.85.220.41 as permitted sender) smtp.mailfrom=nadav.amit@gmail.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=gmail.com Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=ZDCsAxaK; spf=pass (google.com: domain of nadav.amit@gmail.com designates 209.85.220.41 as permitted sender) smtp.mailfrom=nadav.amit@gmail.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=gmail.com X-Google-Smtp-Source: AH8x225G0CEskJvnAZo3Au4ASimRBSM2HRfw+4HERS0pkur4eDBOe28CC5swxLnfMkbr9w91Hz0zhQ== Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 10.3 \(3273\)) Subject: Re: [RFC PATCH 00/16] PTI support for x86-32 From: Nadav Amit 
In-Reply-To: Date: Sun, 21 Jan 2018 18:27:59 -0800 Cc: Joerg Roedel , Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" , the arch/x86 maintainers , LKML , "open list:MEMORY MANAGEMENT" , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , "Liguori, Anthony" , Daniel Gruss , Hugh Dickins , Kees Cook , Andrea Arcangeli , Waiman Long , Joerg Roedel Content-Transfer-Encoding: quoted-printable Message-Id: <8B8147E4-0560-456D-BA23-F0037C80C945@gmail.com> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <5D89F55C-902A-4464-A64E-7157FF55FAD0@gmail.com> <886C924D-668F-4007-98CA-555DB6279E4F@gmail.com> <9CF1DD34-7C66-4F11-856D-B5E896988E16@gmail.com> To: Linus Torvalds X-Mailer: Apple Mail (2.3273) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: =?utf-8?q?1589767841591470697?= X-GMAIL-MSGID: =?utf-8?q?1590257870279572638?= X-Mailing-List: linux-kernel@vger.kernel.org List-ID: Linus Torvalds wrote: > On Sun, Jan 21, 2018 at 3:46 PM, Nadav Amit wrote: >> I wanted to see whether segments protection can be a replacement for PTI >> (yes, excluding SMEP emulation), or whether speculative execution “ignores” >> limit checks, similarly to the way paging protection is skipped. >> >> It does seem that segmentation provides sufficient protection from Meltdown. >> The “reliability” test of Graz PoC fails if the segment limit is set to >> prevent access to the kernel memory. [ It passes if the limit is not set, >> even if the DS is reloaded. ] My test is enclosed below. > > Interesting. It might not be entirely reliable for all > microarchitectures, though.
> >> So my question: wouldn’t it be much more efficient to use segmentation >> protection for x86-32, and allow users to choose whether they want SMEP-like >> protection if needed (and then enable PTI)? > > That's what we did long long ago, with user space segments actually > using the limit (in fact, if you go back far enough, the kernel even > used the base). > > You'd have to make sure that the LDT loading etc do not allow CPL3 > segments with base+limit past TASK_SIZE, so that people can't generate > their own. And the TLS segments also need to be limited (and > remember, the limit has to be TASK_SIZE-base, not just TASK_SIZE). > > And we should check with Intel that segment limit checking really is > guaranteed to be done before any access. Thanks. I’ll try to check with Intel liaison people of VMware (my employer), yet any feedback will be appreciated. > Too bad x86-64 got rid of the segments ;) Actually, as I noted in a different thread, running 32-bit binaries on x86_64 in legacy-mode, without PTI, performs considerably better than x86_64 binaries with PTI for workloads that are hit the most (e.g., Redis). By dynamically removing the 64-bit user-CS from the GDT, this mode should be safe, as long as CS load is not done speculatively.
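Linus's constraint quoted above can be sketched in C as a check on a raw 8-byte code/data descriptor. The rule is that no CPL3 segment may have base+limit reaching past TASK_SIZE, so the usable limit is TASK_SIZE - base. This is a minimal illustration, not the kernel's LDT/TLS validation code. It assumes the default x86-32 3G/1G split (TASK_SIZE = 0xC0000000) and handles only expand-up segments, rejecting expand-down data segments outright. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: default x86-32 3G/1G split, user space below 0xC0000000. */
#define SKETCH_TASK_SIZE 0xC0000000ULL

/* 32-bit base, scattered over bits 16-39 and 56-63 of the descriptor. */
static uint32_t desc_base(uint64_t d)
{
    return (uint32_t)(((d >> 16) & 0xFFFFFF) | (((d >> 56) & 0xFF) << 24));
}

/* Highest valid offset: the 20-bit limit (bits 0-15 and 48-51), scaled to
 * 4 KiB units when the granularity bit (bit 55) is set. */
static uint32_t desc_limit(uint64_t d)
{
    uint32_t raw = (uint32_t)((d & 0xFFFF) | (((d >> 48) & 0xF) << 16));

    return (d & (1ULL << 55)) ? (raw << 12) | 0xFFF : raw;
}

/* Data segment (type bit 3, i.e. bit 43, clear) with the expand-down
 * bit (type bit 2, i.e. bit 42) set. */
static bool desc_expand_down(uint64_t d)
{
    return !(d & (1ULL << 43)) && (d & (1ULL << 42));
}

/* Accept a user code/data segment only if its last byte stays below
 * TASK_SIZE, i.e. limit <= TASK_SIZE - base - 1, not merely limit < TASK_SIZE. */
static bool user_segment_ok(uint64_t d)
{
    if (desc_expand_down(d))
        return false;           /* simplification: not modelled */
    return (uint64_t)desc_base(d) + desc_limit(d) < SKETCH_TASK_SIZE;
}

/* Build a present, DPL 3, 32-bit read/write data descriptor for testing. */
static uint64_t make_data_desc(uint32_t base, uint32_t limit20, bool gran)
{
    uint64_t d = 0;

    d |= limit20 & 0xFFFF;
    d |= (uint64_t)(base & 0xFFFFFF) << 16;
    d |= 0xF2ULL << 40;                     /* P=1, DPL=3, S=1, type=RW data */
    d |= (uint64_t)((limit20 >> 16) & 0xF) << 48;
    d |= 1ULL << 54;                        /* D/B=1: 32-bit segment */
    if (gran)
        d |= 1ULL << 55;                    /* G=1: 4 KiB granularity */
    d |= (uint64_t)(base >> 24) << 56;
    return d;
}
```

The sum base + limit is computed in 64-bit arithmetic so that a large base cannot wrap around 4 GiB and slip past the check. For example, a base of 0x1000 with a page-granular limit of 0xBFFFF overshoots TASK_SIZE by exactly the base and is rejected. Lowering the limit by one page brings it back in range.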
Regards, Nadav From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Google-Smtp-Source: AH8x227fTNaQKwnF7VuBpZ5n3ala9iJmy96nlOeGIRwNaieO4Hr18Az7yOsKQHcV5cTpy7xqfnlg ARC-Seal: i=1; a=rsa-sha256; t=1516615450; cv=none; d=google.com; s=arc-20160816; b=zPOJS0fkLHpqXClqwQ4YdzDTm1p5mZQEuJt805dR0obvPadt+2VP9DAfirx2XAy4Xw uste3C9JTVcOLDra4/2Qo5jupbkZ+iq7cHlqYXm7efKyBH2NDpUq0paw7Axs0WPNmMS8 WywIrUZkuyMdj6OkVIY3/vGDAi8rTS66i8UA7X8eTuzMekUzeOzEqe/e61LcZ4msEIfE wYtcGHLY8BaAyfLXxZ9oZaHgMLwQADdIB/pq40opJlNzx9l8d1ghMSX89mLyFZL4ZRPf pJCQjSM3tPsZP+tkN5DPUZ/QW8pNGeFA74rLJh7gE70vPmcHARK0GtLR7fevjp0nsDZj dTKw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=user-agent:in-reply-to:content-disposition:mime-version:references :message-id:subject:cc:to:from:date:dkim-signature :arc-authentication-results; bh=qSdWsCcNH1vhHEBevhm0tQB5AqbhDl7jMR6G7V6yTiQ=; b=FqUixgHOhre+n2mW1AGLLUHSX628Z+7grp5K5R12dHUL3bSzFZOs6+wEr44GS2v38Z dVS84fDeou8312PVYnnQW1oVuC/ivEB0jLjaLeBWWm3YkBbHWUEKrzdujLEAaRhvAZGc zJJeDt+A+IVU0ggwA52ntHTK+9jL6PAWA9fIj2SBq7A/pCJHfTE5UOU7NDUaLHW044or GuzSRP4qG/KltvGC/gWONxg2yXZdeG4qtryda3c89W2/bv6dWzb/ji9Ncm8EBM6kE3Z5 zFEfwGyf+OaK1wVOIBmpuSXxTpCN56ZUKbQP6Wvurall5ckK76tdJBXNld3vaqmLKWHK wtaA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass (test mode) header.i=@8bytes.org header.s=mail-1 header.b=C+MpY9Ol; spf=pass (google.com: domain of joro@8bytes.org designates 2a01:238:4383:600:38bc:a715:4b6d:a889 as permitted sender) smtp.mailfrom=joro@8bytes.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=8bytes.org Authentication-Results: mx.google.com; dkim=pass (test mode) header.i=@8bytes.org header.s=mail-1 header.b=C+MpY9Ol; spf=pass (google.com: domain of joro@8bytes.org designates 2a01:238:4383:600:38bc:a715:4b6d:a889 as permitted sender) smtp.mailfrom=joro@8bytes.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=8bytes.org Date: Mon, 22 Jan 2018 11:04:09 +0100 From: Joerg Roedel To: David Laight Cc: 'Nadav Amit' ,
Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" , the arch/x86 maintainers , LKML , "open list:MEMORY MANAGEMENT" , Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , "aliguori@amazon.com" , "daniel.gruss@iaik.tugraz.at" , "hughd@google.com" , "keescook@google.com" , Andrea Arcangeli , Waiman Long , "jroedel@suse.de" Subject: Re: [RFC PATCH 00/16] PTI support for x86-32 Message-ID: <20180122100409.GF28161@8bytes.org> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <5D89F55C-902A-4464-A64E-7157FF55FAD0@gmail.com> <886C924D-668F-4007-98CA-555DB6279E4F@gmail.com> <9CF1DD34-7C66-4F11-856D-B5E896988E16@gmail.com> <7f37ff1c10b04b2386c2044cdc8e38be@AcuMS.aculab.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <7f37ff1c10b04b2386c2044cdc8e38be@AcuMS.aculab.com> User-Agent: Mutt/1.5.24 (2015-08-30) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: =?utf-8?q?1589767841591470697?= X-GMAIL-MSGID: =?utf-8?q?1590286562691440649?= X-Mailing-List: linux-kernel@vger.kernel.org List-ID: On Mon, Jan 22, 2018 at 09:55:31AM +0000, David Laight wrote: > That's made me remember something about segment limits applying in 64bit mode. > I really can't remember the details at all. > I'm sure it had something to do with one of the VM implementations restricting > memory accesses. Some AMD chips have long-mode segment limits, not sure if Intel has them too. But they are useless here because the limit is 32-bit and can only protect the upper 4GB of virtual address space. The limits also don't apply to GS and CS segments.
Regards, Joerg From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Google-Smtp-Source: ACJfBovHx6LgGWm2h+k1rT3dLvfs/u4nrlYps7QCSHh1z5X9aeOMcO2vtmB9dA1D4pNn0XfVXrlZ ARC-Seal: i=1; a=rsa-sha256; t=1516212646; cv=none; d=google.com; s=arc-20160816; b=IlK6g1uTfezzlDRWyW3qlxBM8xMRsGYaM96RHgDlaLYZ1hys6mS9AoGY4Elybk1Mf2 FSsekf9UI+Wioc5IHNS2mSmY7uPBAkvtJBJQc5jmrVeur7OI7UBxUD+9ojtsYd5VhvWl wjmJPZRBMjI/OZM9u+UD295bGa+9dvH7vUv6heTsdCq8YqZKJYXHMQ94biKk5sOJD+ik uBihxpZBHDHXMh6ij7oJzBZUsrIjQHBowbk5RlckUq6JArEhk4o8j8NlSKTt4xHdyvto RKjqq7ElIQ2MQh2JlRnx/t922+DnNunUUbVGVr+4iDlJGmJEkATH3YABshuxny8XPJDq szVw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=cc:to:subject:message-id:date:from:references:in-reply-to :mime-version:dmarc-filter:arc-authentication-results; bh=tMuqsydGnNHCKMPMsIqBzburtASZY1ZV3krATRgw47A=; b=PdP0gTc+pPMHOk+1vdrhkQBHYQrzRjojG7KX2ZUtGvFRAAVpDQSGMxtrqKvhdwJP8J YYgh2Vg8OTvoKfS0/0BZCu/uzAfQW9pY4ErseCxkmojk+HT/1GX5dFaAVIftHsDGNl/y XIbDAiBdhAmKdGSjIsNxzrVhlJZsd1povpaelbG2tkZesEYSvypKs1Ucs3hZ/wJAuP4i MoSCehGDQRvWqRJUlEit/MDRSLGrbgXjccsQv8aQ0mTtXDfvCulXFl4OQ1pPGLDu1SBj 1GK8jmVBSlwThSaoiO7tQB7+5OTdWjvKXbcanvQ3ay8vWj4YAc3BYE9ZELgg3ACxp5JU majA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of luto@kernel.org designates 198.145.29.99 as permitted sender) smtp.mailfrom=luto@kernel.org Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of luto@kernel.org designates 198.145.29.99 as permitted sender) smtp.mailfrom=luto@kernel.org DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 9B838217A2 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=kernel.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=luto@kernel.org MIME-Version: 1.0 In-Reply-To: <20180117091853.GI28161@8bytes.org> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> 
<1516120619-1159-3-git-send-email-joro@8bytes.org> <20180117091853.GI28161@8bytes.org> From: Andy Lutomirski Date: Wed, 17 Jan 2018 10:10:23 -0800 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: [PATCH 02/16] x86/entry/32: Enter the kernel via trampoline stack To: Joerg Roedel Cc: Andy Lutomirski , Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" , X86 ML , LKML , linux-mm@kvack.org, Linus Torvalds , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , "Liguori, Anthony" , Daniel Gruss , Hugh Dickins , Kees Cook , Andrea Arcangeli , Waiman Long , Joerg Roedel Content-Type: text/plain; charset="UTF-8" X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: =?utf-8?q?1589767841589420443?= X-GMAIL-MSGID: =?utf-8?q?1589864191771255588?= X-Mailing-List: linux-kernel@vger.kernel.org List-ID: On Wed, Jan 17, 2018 at 1:18 AM, Joerg Roedel wrote: > On Tue, Jan 16, 2018 at 02:45:27PM -0800, Andy Lutomirski wrote: >> On Tue, Jan 16, 2018 at 8:36 AM, Joerg Roedel wrote: >> > +.macro SWITCH_TO_KERNEL_STACK nr_regs=0 check_user=0 >> >> How about marking nr_regs with :req to force everyone to be explicit? > > Yeah, that's more readable, I'll change it. > >> > + /* >> > + * TSS_sysenter_stack is the offset from the bottom of the >> > + * entry-stack >> > + */ >> > + movl TSS_sysenter_stack + ((\nr_regs + 1) * 4)(%esp), %esp >> >> This is incomprehensible. You're adding what appears to be the offset >> of sysenter_stack within the TSS to something based on esp and >> dereferencing that to get the new esp. That's not actually what >> you're doing, but please change asm_offsets.c (as in my previous >> email) to avoid putting serious arithmetic in it and then do the >> arithmetic right here so that it's possible to follow what's going on. > > Probably this needs better comments.
So TSS_sysenter_stack is the offset > to tss.sp0 (tss.sp1 later) from the _bottom_ of the stack. But in > this macro the stack might not be empty; it has a configurable (by > \nr_regs) number of dwords on it. Before this instruction we also do a > push %edi, so we need (\nr_regs + 1). > > This can't be put into asm_offset.c, as the actual offset depends on how > much is on the stack. > >> > ENTRY(entry_INT80_32) >> > ASM_CLAC >> > pushl %eax /* pt_regs->orig_ax */ >> > + >> > + /* Stack layout: ss, esp, eflags, cs, eip, orig_eax */ >> > + SWITCH_TO_KERNEL_STACK nr_regs=6 check_user=1 >> > + >> >> Why check_user? > > You are right, check_user shouldn't be needed as INT80 is never called > from kernel mode. > >> > ENTRY(nmi) >> > ASM_CLAC >> > + >> > + /* Stack layout: ss, esp, eflags, cs, eip */ >> > + SWITCH_TO_KERNEL_STACK nr_regs=5 check_user=1 >> >> This is wrong, I think. If you get an nmi in kernel mode but while >> still on the sysenter stack, you blow up. IIRC we have some crazy >> code already to handle this (for nmi and #DB), and maybe that's >> already adequate or can be made adequate, but at the very least this >> needs a big comment explaining why it's okay. > > If we get an nmi while still on the sysenter stack, then we are not > entering the handler from user-space and the above code will do > nothing and behave as before. > > But you are right, it might blow up. There is a problem with the cr3 > switch, because the nmi can happen in kernel mode before the cr3 is > switched, then this handler will not do the cr3 switch itself and crash > the kernel. But the stack switching should be fine, I think. > >> > + /* >> > + * TODO: Find a way to let cpu_current_top_of_stack point to >> > + * cpu_tss_rw.x86_tss.sp1. Doing so now results in stack corruption with >> > + * iret exceptions. >> > + */ >> > + this_cpu_write(cpu_tss_rw.x86_tss.sp1, next_p->thread.sp0); >> >> Do you know what the issue is? > > No, not yet, I will look into that again.
But first I want to get > this series stable enough as it is. > >> As a general comment, the interaction between this patch and vm86 is a >> bit scary. In vm86 mode, the kernel gets entered with extra stuff on >> the stack, which may screw up all your offsets. > > Just read up on vm86 mode control transfers and the stack layout then. > Looks like I need to check for eflags.vm=1 and copy four more registers > from/to the entry stack. Thanks for pointing that out. You could just copy those slots unconditionally. After all, you're slowing down entries by an epic amount due to writing CR3 on with PCID off, so four words copied should be entirely lost in the noise. OTOH, checking for VM86 mode is just a single bt against EFLAGS. With the modern (rewritten a year or two ago by Brian Gerst) vm86 code, all the slots (those actually in pt_regs) are in the same location regardless of whether we're in VM86 mode or not, but we're still fiddling with the bottom of the stack. Since you're controlling the switch to the kernel thread stack, you can easily just write the frame to the correct location, so you should not need to context switch sp1 -- you can do it sanely and leave sp1 as the actual bottom of the kernel stack no matter what. In fact, you could probably avoid context switching sp0, either, which would be a nice cleanup. So I recommend the following. Keep sp0 as the bottom of the sysenter stack no matter what. Then do: bt $X86_EFLAGS_VM_BIT jc .Lfrom_vm_\@ push 5 regs to real stack, starting at four-word offset (so they're in the right place) update %esp ... .Lupdate_esp_\@ .Lfrom_vm_\@: push 9 regs to real stack, starting at the bottom jmp .Lupdate_esp_\@ Does that seem reasonable? It's arguably much nicer than what we have now. 
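Andy's proposal (keep sp0 fixed, test EFLAGS.VM, then copy either 5 or 9 words so that pt_regs lands in the same place) can be modelled in plain C. This is a hypothetical user-space simulation, not kernel code: `copy_entry_frame()`, `HW_FRAME_WORDS`, `VM86_EXTRA_WORDS` and `FRAME_SLOT_EFLAGS` are invented names, stacks are plain arrays of 32-bit words, and only the hardware-pushed frame is copied (the `orig_eax`/`nr_regs` words the real entry code also carries are left out, which would shift the eflags index). It assumes the SDM frame order eip, cs, eflags, esp, ss, followed by es, ds, fs, gs when the CPU interrupts vm86 mode, and that EFLAGS.VM is bit 17.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define X86_EFLAGS_VM_BIT  17  /* EFLAGS.VM: set when interrupted in vm86 mode */
#define HW_FRAME_WORDS      5  /* eip, cs, eflags, esp, ss */
#define VM86_EXTRA_WORDS    4  /* es, ds, fs, gs (vm86 entries only) */
#define FRAME_SLOT_EFLAGS   2  /* word index of eflags within the frame */

/*
 * Hypothetical model: copy the hardware frame from the entry (sysenter)
 * stack to the task stack. entry_frame points at the saved eip (lowest
 * address); stack_top is one past the highest word of the task stack.
 * Returns the value %esp would have after the switch.
 */
static uint32_t *copy_entry_frame(const uint32_t *entry_frame, uint32_t *stack_top)
{
    assert(entry_frame != NULL && stack_top != NULL);

    /* Equivalent of "bt $X86_EFLAGS_VM_BIT" on the saved eflags. */
    bool from_vm86 = (entry_frame[FRAME_SLOT_EFLAGS] >> X86_EFLAGS_VM_BIT) & 1u;
    size_t nwords = HW_FRAME_WORDS + (from_vm86 ? VM86_EXTRA_WORDS : 0);

    /*
     * The destination is the same in both cases:
     *  - non-vm86: 5 words, leaving the top four words unused
     *    ("starting at four-word offset");
     *  - vm86: 9 words, ending at the highest address (the stack
     *    "bottom" in Andy's wording).
     * Either way eip..ss sit at the same addresses.
     */
    uint32_t *dst = stack_top - (HW_FRAME_WORDS + VM86_EXTRA_WORDS);
    memcpy(dst, entry_frame, nwords * sizeof(*dst));
    return dst;
}
```

With a 16-word task stack, a plain frame and a vm86 frame both produce the same new stack pointer (`stack_top - 9`), which is the property that would let sp1 stay fixed at the real bottom of the kernel stack instead of being adjusted per task.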
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Date: Fri, 19 Jan 2018 10:55:23 +0100 From: Joerg Roedel To: Andy Lutomirski Cc: Thomas Gleixner , Ingo Molnar ,
"H . Peter Anvin" , X86 ML , LKML , linux-mm@kvack.org, Linus Torvalds , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , "Liguori, Anthony" , Daniel Gruss , Hugh Dickins , Kees Cook , Andrea Arcangeli , Waiman Long , Joerg Roedel Subject: Re: [PATCH 02/16] x86/entry/32: Enter the kernel via trampoline stack Message-ID: <20180119095523.GY28161@8bytes.org> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1516120619-1159-3-git-send-email-joro@8bytes.org> <20180117091853.GI28161@8bytes.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.24 (2015-08-30) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: =?utf-8?q?1589767841589420443?= X-GMAIL-MSGID: =?utf-8?q?1590014223528376974?= X-Mailing-List: linux-kernel@vger.kernel.org List-ID: Hey Andy, On Wed, Jan 17, 2018 at 10:10:23AM -0800, Andy Lutomirski wrote: > On Wed, Jan 17, 2018 at 1:18 AM, Joerg Roedel wrote: > > Just read up on vm86 mode control transfers and the stack layout then. > > Looks like I need to check for eflags.vm=1 and copy four more registers > > from/to the entry stack. Thanks for pointing that out. > > You could just copy those slots unconditionally. After all, you're > slowing down entries by an epic amount due to writing CR3 on with PCID > off, so four words copied should be entirely lost in the noise. OTOH, > checking for VM86 mode is just a single bt against EFLAGS. > > With the modern (rewritten a year or two ago by Brian Gerst) vm86 > code, all the slots (those actually in pt_regs) are in the same > location regardless of whether we're in VM86 mode or not, but we're > still fiddling with the bottom of the stack. 
Since you're controlling > the switch to the kernel thread stack, you can easily just write the > frame to the correct location, so you should not need to context > switch sp1 -- you can do it sanely and leave sp1 as the actual bottom > of the kernel stack no matter what. In fact, you could probably avoid > context switching sp0, either, which would be a nice cleanup. I am not sure what you mean by "not context switching sp0/sp1" ... > So I recommend the following. Keep sp0 as the bottom of the sysenter > stack no matter what. Then do: > > bt $X86_EFLAGS_VM_BIT > jc .Lfrom_vm_\@ > > push 5 regs to real stack, starting at four-word offset (so they're in > the right place) > update %esp > ... > .Lupdate_esp_\@ > > .Lfrom_vm_\@: > push 9 regs to real stack, starting at the bottom > jmp .Lupdate_esp_\@ > > Does that seem reasonable? It's arguably much nicer than what we have > now. But that looks like a good idea. Having a consistent stack with and without vm86 is certainly a nice cleanup. Regards, Joerg From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: MIME-Version: 1.0 In-Reply-To: <20180119095523.GY28161@8bytes.org> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1516120619-1159-3-git-send-email-joro@8bytes.org> <20180117091853.GI28161@8bytes.org> <20180119095523.GY28161@8bytes.org> From: Andy Lutomirski Date: Fri, 19 Jan 2018 08:30:33 -0800 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: [PATCH 02/16] x86/entry/32: Enter the kernel via trampoline stack To: Joerg Roedel Cc: Andy Lutomirski , Thomas Gleixner , Ingo Molnar , "H .
Peter Anvin" , X86 ML , LKML , Linux-MM , Linus Torvalds , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , "Liguori, Anthony" , Daniel Gruss , Hugh Dickins , Kees Cook , Andrea Arcangeli , Waiman Long , Joerg Roedel Content-Type: text/plain; charset="UTF-8" X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: =?utf-8?q?1589767841589420443?= X-GMAIL-MSGID: =?utf-8?q?1590039104287229626?= X-Mailing-List: linux-kernel@vger.kernel.org List-ID: On Fri, Jan 19, 2018 at 1:55 AM, Joerg Roedel wrote: > Hey Andy, > > On Wed, Jan 17, 2018 at 10:10:23AM -0800, Andy Lutomirski wrote: >> On Wed, Jan 17, 2018 at 1:18 AM, Joerg Roedel wrote: > >> > Just read up on vm86 mode control transfers and the stack layout then. >> > Looks like I need to check for eflags.vm=1 and copy four more registers >> > from/to the entry stack. Thanks for pointing that out. >> >> You could just copy those slots unconditionally. After all, you're >> slowing down entries by an epic amount due to writing CR3 on with PCID >> off, so four words copied should be entirely lost in the noise. OTOH, >> checking for VM86 mode is just a single bt against EFLAGS. >> >> With the modern (rewritten a year or two ago by Brian Gerst) vm86 >> code, all the slots (those actually in pt_regs) are in the same >> location regardless of whether we're in VM86 mode or not, but we're >> still fiddling with the bottom of the stack. Since you're controlling >> the switch to the kernel thread stack, you can easily just write the >> frame to the correct location, so you should not need to context >> switch sp1 -- you can do it sanely and leave sp1 as the actual bottom >> of the kernel stack no matter what. In fact, you could probably avoid >> context switching sp0, either, which would be a nice cleanup. > > I am not sure what you mean by "not context switching sp0/sp1" ... 
You're supposed to read what I meant, not what I said... I meant that we could have sp0 have a genuinely constant value per cpu. That means that the entry trampoline ends up with RIP, etc in a different place depending on whether VM was in use, but the entry trampoline code should be able to handle that. sp1 would have a value that varies by task, but it could just point to the top of the stack instead of being changed depending on whether VM is in use. Instead, the entry trampoline would offset the registers as needed to keep pt_regs in the right place. I think you already figured all of that out, though :) From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Date: Mon, 22 Jan 2018 11:11:18 +0100 From: Joerg Roedel To: Andy Lutomirski Cc: Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" , X86 ML , LKML , Linux-MM , Linus Torvalds , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , "Liguori, Anthony" , Daniel Gruss , Hugh Dickins , Kees Cook , Andrea Arcangeli , Waiman Long , Joerg Roedel Subject: Re: [PATCH 02/16] x86/entry/32: Enter the kernel via trampoline stack Message-ID: <20180122101118.GG28161@8bytes.org> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1516120619-1159-3-git-send-email-joro@8bytes.org> <20180117091853.GI28161@8bytes.org> <20180119095523.GY28161@8bytes.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.24 (2015-08-30) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: =?utf-8?q?1589767841589420443?= X-GMAIL-MSGID: =?utf-8?q?1590287012268686829?= X-Mailing-List: linux-kernel@vger.kernel.org List-ID: Hey Andy, On Fri, Jan 19, 2018 at 08:30:33AM -0800, Andy Lutomirski wrote: > I meant that we could have sp0 have a genuinely constant value per > cpu. That means that the entry trampoline ends up with RIP, etc in a > different place depending on whether VM was in use, but the entry > trampoline code should be able to handle that. sp1 would have a value
sp1 would have a value > that varies by task, but it could just point to the top of the stack > instead of being changed depending on whether VM is in use. Instead, > the entry trampoline would offset the registers as needed to keep > pt_regs in the right place. > > I think you already figured all of that out, though :) Yes, and after looking a while into it, it would make a nice cleanup for the entry code. On the other side, it would change the layout for the in-kernel 'struct pt_regs', so that the user-visible pt_regs ends up with a different layout than the one we use in the the kernel. This can certainly be all worked out, but it makes this nice entry-code cleanup not so nice and clean anymore. At least the work required to make it work without breaking user-space is not in the scope of this patch-set. Regards, Joerg From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Google-Smtp-Source: AH8x227Sqvf007mttyup6wTqjn8f7JNDlBEaBhnWiMtpqp+fIEL/bBqA8vYvrB/1668kSohPAJ7A ARC-Seal: i=1; a=rsa-sha256; t=1516643206; cv=none; d=google.com; s=arc-20160816; b=LHozzNmpO2WTxY8AaNnrTEaToUePnjqGjaYLR5aJWZa0nI/vyIQU/goiIu88u4v4E7 s4LK/a1OlVsGDTlrrCiDASRb+W7pxbHQ2TJRC2xH6NqykaTAJ1/zblHTmWu5ZJrF6ugl edrxy/o1zxxZ+yl73SNZwEMn/GkS27Ha8TNGrFhUd0HotgIpLibmH43lmWoxldN4r2/1 /aUWthXPdz0Hlhm7x6Rkd8/BD3S7Pi2H4SBVV4yeNo+zhMNe9iSJw1bAFbLvAzi1wDs7 /ChHSx9eHmbo+uz6UDXcO0fQpsfVVYAFsoSFS/ZXz3Nk9p3SAMcNh2O7w43wGS9hNQ85 wsqw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=cc:to:subject:message-id:date:from:references:in-reply-to :mime-version:dmarc-filter:arc-authentication-results; bh=w/OGcBPUT2fVLB1zO3YtaFlp9A3q9iWVHxFY/5bLd+8=; b=yJBXHwyyS039eIJv8b6qIQAegsqbRM/eUTYf0LcWOWsND1Q0DW2nSPoQRAFMIsnwp3 +gtd95baGQ5TJbK3BA5FWUMJXOHSWnFRyo7cVZW/F5jxz/wRveHmeKYXKTnLC5clyl1E AHmn5S6CHUwAuBWmtW87KXd17yXQmfJosrl0Unz2rOYN2p/M2D4eQyIUcJ0Lp+Lq/YSh G/clqLn2DWZcgcvM2w3LpHB9PlCIiIyPZ8HyJVXtGfsnYzauMYyqyA8oaQ8dAo082ffK 
rsweqJ3QBWj58bPQ/tiOqU+FQfD0TQMcq4D05LTSz5gLdA2ArB5grjp3Y5VNVW/Zi4sE KK+g== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of luto@kernel.org designates 198.145.29.99 as permitted sender) smtp.mailfrom=luto@kernel.org Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of luto@kernel.org designates 198.145.29.99 as permitted sender) smtp.mailfrom=luto@kernel.org DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org CE16821789 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=kernel.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=luto@kernel.org MIME-Version: 1.0 In-Reply-To: <20180122101118.GG28161@8bytes.org> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1516120619-1159-3-git-send-email-joro@8bytes.org> <20180117091853.GI28161@8bytes.org> <20180119095523.GY28161@8bytes.org> <20180122101118.GG28161@8bytes.org> From: Andy Lutomirski Date: Mon, 22 Jan 2018 09:46:24 -0800 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: [PATCH 02/16] x86/entry/32: Enter the kernel via trampoline stack To: Joerg Roedel Cc: Andy Lutomirski , Thomas Gleixner , Ingo Molnar , "H . 
Peter Anvin" , X86 ML , LKML , Linux-MM , Linus Torvalds , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , "Liguori, Anthony" , Daniel Gruss , Hugh Dickins , Kees Cook , Andrea Arcangeli , Waiman Long , Joerg Roedel Content-Type: text/plain; charset="UTF-8" X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: =?utf-8?q?1589767841589420443?= X-GMAIL-MSGID: =?utf-8?q?1590315666610588712?= X-Mailing-List: linux-kernel@vger.kernel.org List-ID: On Mon, Jan 22, 2018 at 2:11 AM, Joerg Roedel wrote: > Hey Andy, > > On Fri, Jan 19, 2018 at 08:30:33AM -0800, Andy Lutomirski wrote: >> I meant that we could have sp0 have a genuinely constant value per >> cpu. That means that the entry trampoline ends up with RIP, etc in a >> different place depending on whether VM was in use, but the entry >> trampoline code should be able to handle that. sp1 would have a value >> that varies by task, but it could just point to the top of the stack >> instead of being changed depending on whether VM is in use. Instead, >> the entry trampoline would offset the registers as needed to keep >> pt_regs in the right place. >> >> I think you already figured all of that out, though :) > > Yes, and after looking a while into it, it would make a nice cleanup for > the entry code. On the other side, it would change the layout for the > in-kernel 'struct pt_regs', so that the user-visible pt_regs ends up > with a different layout than the one we use in the kernel. I don't think this is necessarily the case. We end up with four more fields that are logically there at the end of pt_regs (which is already kind-of-sort-of the case), but we don't actually need to put them in struct pt_regs. We just end up with (regs + 1) != "top of task stack", but even that has precedent -- it's already true for tasks in vm86 mode.
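The idea that the four vm86 segment words can sit "logically at the end of pt_regs" without being members of it can be sketched as a layout check. This is a hypothetical, simplified C model: `struct pt_regs32` and `struct vm86_tail` are illustrative stand-ins (the real 32-bit `struct pt_regs` stores segment registers as 16-bit fields with padding), and `regs_for_stack()` and `tail_gap()` are invented helpers. It assumes the entry code always reserves the four-word vm86 tail, so `regs` sits at a fixed offset from the top of the task stack and `(regs + 1)` stops 16 bytes short of that top.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified model of the 32-bit pt_regs: one 32-bit slot per field. */
struct pt_regs32 {
    uint32_t bx, cx, dx, si, di, bp, ax;
    uint32_t ds, es, fs, gs;
    uint32_t orig_ax;
    uint32_t ip, cs, flags, sp, ss;   /* hardware-pushed frame */
};

/* The extra words the CPU pushes when it interrupts vm86 mode. They live
 * directly above pt_regs but are deliberately not members of it. */
struct vm86_tail {
    uint32_t es, ds, fs, gs;
};

_Static_assert(sizeof(struct pt_regs32) == 17 * 4, "17 dword slots");
_Static_assert(sizeof(struct vm86_tail) == 4 * 4, "4 dword slots");

/* Place pt_regs at a fixed offset below the top of the task stack,
 * always leaving room for the vm86 tail, whether or not it is used. */
static struct pt_regs32 *regs_for_stack(unsigned char *stack_top)
{
    assert(stack_top != NULL);
    return (struct pt_regs32 *)(stack_top - sizeof(struct vm86_tail)
                                - sizeof(struct pt_regs32));
}

/* Distance between (regs + 1) and the real top of the task stack. */
static size_t tail_gap(const struct pt_regs32 *regs, const unsigned char *stack_top)
{
    return (size_t)(stack_top - (const unsigned char *)(regs + 1));
}
```

Under this placement the gap is always `sizeof(struct vm86_tail)`, for vm86 and non-vm86 tasks alike, so code that needs the real top of the task stack cannot simply use `(regs + 1)`; that is the same situation the thread notes already exists for tasks in vm86 mode.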
> > This can certainly be all worked out, but it makes this nice entry-code > cleanup not so nice and clean anymore. At least the work required to > make it work without breaking user-space is not in the scope of this > patch-set. Agreed. This should probably be saved for later. Except that your patch set still needs to come up with some way to function correctly on vm86. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Date: Fri, 19 Jan 2018 12:07:26 +0100 From: Joerg Roedel To: Pavel Machek Cc:
Joerg Roedel , Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" , x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli , Waiman Long Subject: Re: [RFC PATCH 00/16] PTI support for x86-32 Message-ID: <20180119110726.odea3h3smcjyicnk@suse.de> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <20180119105527.GB29725@amd> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20180119105527.GB29725@amd> User-Agent: NeoMutt/20170421 (1.8.2) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: =?utf-8?q?1589767841591470697?= X-GMAIL-MSGID: =?utf-8?q?1590018764362588565?= X-Mailing-List: linux-kernel@vger.kernel.org List-ID: Hey Pavel, On Fri, Jan 19, 2018 at 11:55:28AM +0100, Pavel Machek wrote: > Thanks for doing the work. > > I tried applying it on top of -next, and that did not succeed. Let me > try Linus tree... Thanks for your help with testing this patch-set, but I recommend waiting for the next version, as review already found a couple of bugs that might crash your system. For example, there are NMI cases that might crash your machine because the NMI happens in kernel mode before the cr3 switch. VM86 mode is also definitely broken. I am about to fix that and will send a new version, if all goes well, at some point next week.
Thanks, Joerg From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: MIME-Version: 1.0 In-Reply-To: <1516120619-1159-9-git-send-email-joro@8bytes.org> References: <1516120619-1159-1-git-send-email-joro@8bytes.org>
<1516120619-1159-9-git-send-email-joro@8bytes.org> From: Andy Lutomirski Date: Wed, 17 Jan 2018 15:43:14 -0800 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: [PATCH 08/16] x86/pgtable/32: Allocate 8k page-tables when PTI is enabled To: Joerg Roedel Cc: Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" , X86 ML , LKML , Linux-MM , Linus Torvalds , Andy Lutomirski , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , "Liguori, Anthony" , Daniel Gruss , Hugh Dickins , Kees Cook , Andrea Arcangeli , Waiman Long , Joerg Roedel Content-Type: text/plain; charset="UTF-8" X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: =?utf-8?q?1589767842947222657?= X-GMAIL-MSGID: =?utf-8?q?1589885131609873982?= X-Mailing-List: linux-kernel@vger.kernel.org List-ID: On Tue, Jan 16, 2018 at 8:36 AM, Joerg Roedel wrote: > From: Joerg Roedel > > Allocate a kernel and a user page-table root when PTI is > enabled. Also allocate a full page per root for PAE because > otherwise the bit to flip in cr3 to switch between them > would be non-constant, which creates a lot of hassle. > Keep that for a later optimization.
> > Signed-off-by: Joerg Roedel > --- > arch/x86/kernel/head_32.S | 23 ++++++++++++++++++----- > arch/x86/mm/pgtable.c | 11 ++++++----- > 2 files changed, 24 insertions(+), 10 deletions(-) > > diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S > index c29020907886..fc550559bf58 100644 > --- a/arch/x86/kernel/head_32.S > +++ b/arch/x86/kernel/head_32.S > @@ -512,28 +512,41 @@ ENTRY(initial_code) > ENTRY(setup_once_ref) > .long setup_once > > +#ifdef CONFIG_PAGE_TABLE_ISOLATION > +#define PGD_ALIGN (2 * PAGE_SIZE) > +#define PTI_USER_PGD_FILL 1024 > +#else > +#define PGD_ALIGN (PAGE_SIZE) > +#define PTI_USER_PGD_FILL 0 > +#endif > /* > * BSS section > */ > __PAGE_ALIGNED_BSS > - .align PAGE_SIZE > + .align PGD_ALIGN > #ifdef CONFIG_X86_PAE > .globl initial_pg_pmd > initial_pg_pmd: > .fill 1024*KPMDS,4,0 > + .fill PTI_USER_PGD_FILL,4,0 Couldn't this be simplified to just .align PGD_ALIGN, 0 without the .fill? From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Date: Fri, 19 Jan 2018 10:57:51 +0100 From: Joerg Roedel To: Andy Lutomirski Cc: Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" , X86 ML , LKML , Linux-MM , Linus Torvalds , Dave Hansen , Josh Poimboeuf , Juergen Gross , Peter Zijlstra , Borislav Petkov , Jiri Kosina , Boris Ostrovsky , Brian Gerst , David Laight , Denys Vlasenko , Eduardo Valentin , Greg KH , Will Deacon , "Liguori, Anthony" , Daniel Gruss , Hugh Dickins , Kees Cook , Andrea Arcangeli , Waiman Long , Joerg Roedel Subject: Re: [PATCH 08/16] x86/pgtable/32: Allocate 8k page-tables when PTI is enabled Message-ID: <20180119095751.GA28161@8bytes.org> References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1516120619-1159-9-git-send-email-joro@8bytes.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.24 (2015-08-30) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: =?utf-8?q?1589767842947222657?= X-GMAIL-MSGID: =?utf-8?q?1590014375088607096?= X-Mailing-List: linux-kernel@vger.kernel.org List-ID: On Wed, Jan 17, 2018 at 03:43:14PM -0800, Andy Lutomirski wrote: > On Tue, Jan 16, 2018 at 8:36 AM, Joerg
Roedel wrote: > > #ifdef CONFIG_X86_PAE > > .globl initial_pg_pmd > > initial_pg_pmd: > > .fill 1024*KPMDS,4,0 > > + .fill PTI_USER_PGD_FILL,4,0 > > Couldn't this be simplified to just .align PGD_ALIGN, 0 without the .fill? You are right, will change that. Thanks, Joerg From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Google-Smtp-Source: ACJfBotYj2pdwgSYV9Vjq/ZjexHMpYRmrT3L24mwXrR/ZKbsj7Wcv/GFeZbkAoVRJ1bLXOV6h3p6 ARC-Seal: i=1; a=rsa-sha256; t=1516212774; cv=none; d=google.com; s=arc-20160816; b=mDcbPU+jDZ/mrYAqy9+tYwjN6v1+t7xDsaQX73nxG/30EFrumotgv+WF7/tFHsUfP3 X3qSaMsq2KD6VOonOxd3bHoN9gXtHHv2T63lVhsYhRxrHQQvz8rTiBcDuBUxmlH2VBuN c4MyakR7bnXPWVv0fD9Qbi53zuCF9vQWndQQTPoNbeTWQPQyRxKaFFqpESdE92eMMotD sDMxaMg2Ut2Ce8EOq3fuPBL3hBIarI1Xzk5oc8Co2y8hrXWP/hQAmvHq3KJ7MyYiITfq EEzomnM25Khgc1WAbwV6Drbujhg0+G4U8ic+RJ7ZqU45OFSPR5aH2efX3zncWdjSZQU2 4J3Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=cc:to:subject:message-id:date:from:references:in-reply-to :mime-version:dmarc-filter:arc-authentication-results; bh=xRCQ2qfdWbsJg0WtRdw2kBHPAMaJEz0vP73QEP1WIiY=; b=ynarph0hceNu6trNRZS2kaCWl7M6WQPidc7OwpQXCEsHQFs7/Nsww53CZWOXB3Da6a Yd6Bf4+bn+9MRIqdA9A2f4JVgC+ydzVEH3m5iVGO2FolvOKEOAj3Q2nj2Z0pl5oj0ig9 Zo8qe/qfXllObA+SuCrqQC8la4ARL/rmzbl4uaohoftYRLq1gk1PL52HNYh+4fwrZZE4 TWxWxnMZA5RxRLuyFQIQ4ye00dg8JUpGQLYZJydheeo5wrH2wzYQA9kIkPSA49TyzCSI CMf6tzpTF4gVLHO7NTrD90+0kTo0dFkjoByP8Dm9nfv8ydhKdpyj8DDQMVAWzIOvkQce OClA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of luto@kernel.org designates 198.145.29.99 as permitted sender) smtp.mailfrom=luto@kernel.org Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of luto@kernel.org designates 198.145.29.99 as permitted sender) smtp.mailfrom=luto@kernel.org DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 1638521742 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=kernel.org 

In-Reply-To: <20180117141006.GR28161@8bytes.org>
References: <1516120619-1159-1-git-send-email-joro@8bytes.org>
 <1516120619-1159-4-git-send-email-joro@8bytes.org>
 <20180117092442.GJ28161@8bytes.org>
 <20180117141006.GR28161@8bytes.org>
From: Andy Lutomirski
Date: Wed, 17 Jan 2018 10:12:32 -0800
Subject: Re: [PATCH 03/16] x86/entry/32: Leave the kernel via the trampoline stack
To: Joerg Roedel
Cc: Brian Gerst, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
 "H . Peter Anvin", X86 ML, LKML, Linux-MM, Linus Torvalds, Dave Hansen,
 Josh Poimboeuf, Juergen Gross, Peter Zijlstra, Borislav Petkov,
 Jiri Kosina, Boris Ostrovsky, David Laight, Denys Vlasenko,
 Eduardo Valentin, Greg KH, Will Deacon, "Liguori, Anthony", Daniel Gruss,
 Hugh Dickins, Kees Cook, Andrea Arcangeli, Waiman Long, Joerg Roedel

On Wed, Jan 17, 2018 at 6:10 AM, Joerg Roedel wrote:
> On Wed, Jan 17, 2018 at 05:57:53AM -0800, Brian Gerst wrote:
>> On Wed, Jan 17, 2018 at 1:24 AM, Joerg Roedel wrote:
>
>> > I have no real idea on how to switch back to the entry stack without
>> > access to per_cpu variables. I also can't access the cpu_entry_area for
>> > the cpu yet, because for that we need to be on the entry stack already.
>>
>> Switch to the trampoline stack before loading user segments.
>
> That requires to copy most of pt_regs from task- to trampoline-stack,
> not sure if that is faster than temporarily restoring kernel %fs.
>

I would optimize for simplicity, not speed.
You're already planning to write to CR3, which is serializing, blows
away the TLB, *and* takes the absurdly large amount of time that the
microcode needs to blow away the TLB. (For whatever reason, Intel
doesn't seem to have hardware that can quickly wipe the TLB. I suspect
that the actual implementation does it in a loop and wipes little
pieces at a time. Whatever it actually does, the CR3 write itself is
very slow.)

Date: Fri, 19 Jan 2018 10:57:05 +0100
From: Joerg Roedel
To: Andy Lutomirski
Cc: Brian Gerst, Thomas Gleixner, Ingo Molnar, "H . Peter Anvin", X86 ML,
 LKML, Linux-MM, Linus Torvalds, Dave Hansen, Josh Poimboeuf,
 Juergen Gross, Peter Zijlstra, Borislav Petkov, Jiri Kosina,
 Boris Ostrovsky, David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
 Will Deacon, "Liguori, Anthony", Daniel Gruss, Hugh Dickins, Kees Cook,
 Andrea Arcangeli, Waiman Long, Joerg Roedel
Subject: Re: [PATCH 03/16] x86/entry/32: Leave the kernel via the trampoline stack
Message-ID: <20180119095705.GZ28161@8bytes.org>
References: <1516120619-1159-1-git-send-email-joro@8bytes.org>
 <1516120619-1159-4-git-send-email-joro@8bytes.org>
 <20180117092442.GJ28161@8bytes.org>
 <20180117141006.GR28161@8bytes.org>

On Wed, Jan 17, 2018 at 10:12:32AM -0800, Andy Lutomirski wrote:
> I would optimize for simplicity, not speed. You're already planning
> to write to CR3, which is serializing, blows away the TLB, *and* takes
> the absurdly large amount of time that the microcode needs to blow
> away the TLB.

Okay, so I am going to do the stack-switch before pt_regs is restored.
This is at least better than playing games with hiding the entry/exit
%esp somewhere in stack-memory.
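Switching stacks before `pt_regs` is restored means the exit path first copies the saved register frame from the task stack to the trampoline stack and then restores registers from that copy. The user-space C model below sketches only the copy step. `struct model_regs32` is a simplified, hypothetical frame (the real `struct pt_regs` layout is not reproduced here), and the trampoline stack is modelled as a plain byte buffer that grows downwards.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified 32-bit register frame (not the kernel's pt_regs). */
struct model_regs32 {
    uint32_t bx, cx, dx, si, di, bp, ax;
    uint32_t ds, es, fs, gs, orig_ax;
    uint32_t ip, cs, flags, sp, ss;
};

/*
 * Copy the frame saved on the task stack to the top of the trampoline
 * stack and return the new stack-pointer offset into `tramp`, i.e. where
 * restoring registers would start. The stack grows down, so the frame is
 * placed at the highest addresses of the buffer.
 * Returns (size_t)-1 if the trampoline stack is too small.
 */
static size_t copy_frame_to_tramp(const struct model_regs32 *task_frame,
                                  unsigned char *tramp, size_t tramp_size)
{
    size_t new_sp;

    if (tramp_size < sizeof(*task_frame))
        return (size_t)-1;
    new_sp = tramp_size - sizeof(*task_frame);
    memcpy(tramp + new_sp, task_frame, sizeof(*task_frame));
    return new_sp;
}
```

In this model the frame is 68 bytes (17 32-bit fields), so the extra work is one small fixed-size copy. Weighing it against the CR3 write Andy describes is a qualitative argument here; nothing was measured.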
Thanks,

	Joerg

In-Reply-To: <1516120619-1159-15-git-send-email-joro@8bytes.org>
References: <1516120619-1159-1-git-send-email-joro@8bytes.org>
 <1516120619-1159-15-git-send-email-joro@8bytes.org>
From: Andy Lutomirski
Date: Wed, 17 Jan 2018 15:41:07 -0800
Subject: Re: [PATCH 14/16] x86/mm/legacy: Populate the user page-table with user pgd's
To: Joerg Roedel
Cc: Thomas Gleixner, Ingo Molnar, "H . Peter Anvin", X86 ML, LKML, Linux-MM,
 Linus Torvalds, Andy Lutomirski, Dave Hansen, Josh Poimboeuf,
 Juergen Gross, Peter Zijlstra, Borislav Petkov, Jiri Kosina,
 Boris Ostrovsky, Brian Gerst, David Laight, Denys Vlasenko,
 Eduardo Valentin, Greg KH, Will Deacon, "Liguori, Anthony", Daniel Gruss,
 Hugh Dickins, Kees Cook, Andrea Arcangeli, Waiman Long, Joerg Roedel

On Tue, Jan 16, 2018 at 8:36 AM, Joerg Roedel wrote:
> From: Joerg Roedel
>
> Also populate the user-space pgd's in the user page-table.
>
> Signed-off-by: Joerg Roedel
> ---
>  arch/x86/include/asm/pgtable-2level.h | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/arch/x86/include/asm/pgtable-2level.h b/arch/x86/include/asm/pgtable-2level.h
> index 685ffe8a0eaf..d96486d23c58 100644
> --- a/arch/x86/include/asm/pgtable-2level.h
> +++ b/arch/x86/include/asm/pgtable-2level.h
> @@ -19,6 +19,9 @@ static inline void native_set_pte(pte_t *ptep , pte_t pte)
>
>  static inline void native_set_pmd(pmd_t *pmdp, pmd_t pmd)
>  {
> +#ifdef CONFIG_PAGE_TABLE_ISOLATION
> +	pmd.pud.p4d.pgd = pti_set_user_pgd(&pmdp->pud.p4d.pgd, pmd.pud.p4d.pgd);
> +#endif
>  	*pmdp = pmd;
>  }
>

Nothing against your patch, but this seems like a perfectly fine place
to rant: I *hate* the way we deal with page table folding. Grr.
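The `pmd.pud.p4d.pgd` chain in the patch comes from page-table folding: with 2-level paging the p4d, pud and pmd levels do not exist in hardware, so each one is a single-member wrapper around the level above (the generic no-p4d, no-pud and no-pmd headers follow this pattern). The standalone sketch below mirrors those wrappers and adds a hypothetical `model_set_user_pgd()` standing in for `pti_set_user_pgd()`. Its rule, copying only entries below an assumed 3G/1G user/kernel split into a second, user-space PGD, is an illustration and not the kernel's actual logic.

```c
#include <stdint.h>

/* Folded page-table types, modelled on the generic no-p4d/no-pud/no-pmd headers. */
typedef uint32_t pgdval_t;
typedef struct { pgdval_t pgd; } pgd_t;
typedef struct { pgd_t pgd; } p4d_t;   /* p4d folded into pgd */
typedef struct { p4d_t p4d; } pud_t;   /* pud folded into p4d */
typedef struct { pud_t pud; } pmd_t;   /* pmd folded into pud */

#define MODEL_PTRS_PER_PGD  1024u
#define MODEL_USER_PGD_PTRS 768u  /* assumed 3G/1G split: 3 GiB / 4 MiB per entry */

/* Hypothetical kernel PGD plus its user-space copy. */
struct model_pgd_pair {
    pgd_t kernel[MODEL_PTRS_PER_PGD];
    pgd_t user[MODEL_PTRS_PER_PGD];
};

/* Stand-in for pti_set_user_pgd(): mirror user-range entries, return the value to store. */
static pgd_t model_set_user_pgd(struct model_pgd_pair *p, unsigned idx, pgd_t val)
{
    if (idx < MODEL_USER_PGD_PTRS)
        p->user[idx] = val;
    return val;
}

/*
 * Analogue of the patched native_set_pmd(): with folding, the pmd slot
 * *is* the pgd slot, so the write lands in the kernel PGD entry.
 * `idx` must be < MODEL_PTRS_PER_PGD.
 */
static void model_set_pmd(struct model_pgd_pair *p, unsigned idx, pmd_t pmd)
{
    pmd.pud.p4d.pgd = model_set_user_pgd(p, idx, pmd.pud.p4d.pgd);
    p->kernel[idx] = pmd.pud.p4d.pgd;
}
```

Each wrapper has exactly one member, so `sizeof(pmd_t) == sizeof(pgd_t)` and the folding costs nothing at run time; the cost is in readability, visible in accessors such as `pmd.pud.p4d.pgd`.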
Date: Wed, 24 Jan 2018 19:58:00 +0100
From: Krzysztof Mazur
To: Joerg Roedel
Cc: Thomas Gleixner, Ingo Molnar, "H . Peter Anvin", x86@kernel.org,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds,
 Andy Lutomirski, Dave Hansen, Josh Poimboeuf, Juergen Gross,
 Peter Zijlstra, Borislav Petkov, Jiri Kosina, Boris Ostrovsky,
 Brian Gerst, David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
 Will Deacon, aliguori@amazon.com, daniel.gruss@iaik.tugraz.at,
 hughd@google.com, keescook@google.com, Andrea Arcangeli
Subject: Re: [RFC PATCH 00/16] PTI support for x86-32
Message-ID: <20180124185800.GA11515@shrek.podlesie.net>
References: <1516120619-1159-1-git-send-email-joro@8bytes.org>
In-Reply-To: <1516120619-1159-1-git-send-email-joro@8bytes.org>

On Tue, Jan 16, 2018 at 05:36:43PM +0100, Joerg Roedel wrote:
> From: Joerg Roedel
>
> Hi,
>
> here is my current WIP code to enable PTI on x86-32. It is
> still in a pretty early state, but it successfully boots my
> KVM guest with PAE and with legacy paging. The existing PTI
> code for x86-64 already prepares a lot of the stuff needed
> for 32 bit too, thanks for that to all the people involved
> in its development :)

Hi,

I've waited for these patches for a long time, until I tried to
exploit Meltdown on some old 32-bit CPUs and failed.

Pentium M seems to speculatively execute the second load with eax
always equal to 0:

	movzx (%[addr]), %%eax
	shl $12, %%eax
	movzx (%[target], %%eax), %%eax

And on a Pentium 4-based Xeon the second load seems never to be
executed, even without the shift (shifts are slow on some or all
Pentium 4s).

Maybe not all P6 and NetBurst CPUs are affected, but I'm not sure.
Maybe the kernel, at least on 32-bit, should try to exploit Meltdown
to test if the CPU is really affected.

The series boots on Pentium M (and crashes when I use perf, but that is
already a known issue). However, I don't like the performance regression
with CONFIG_PAGE_TABLE_ISOLATION=n (about 7.2%). Trivial "benchmark":

--- cut here ---
#include <fcntl.h>	/* open(), O_WRONLY */
#include <unistd.h>	/* write() */

int main(void)
{
	unsigned long i;
	int fd;

	fd = open("/dev/null", O_WRONLY);
	if (fd < 0)
		return 1;

	/* 10 million 1-byte writes: dominated by syscall entry/exit cost */
	for (i = 0; i < 10000000; i++) {
		char x = 0;
		write(fd, &x, 1);
	}
	return 0;
}
--- cut here ---

Time (on Pentium M 1.73 GHz):

baseline (4.15.0-rc8-gdda3e152)            2.415 s (+/- 0.64%)
patched, CONFIG_PAGE_TABLE_ISOLATION=n     2.588 s (+/- 0.01%)
patched, nopti                             2.597 s (+/- 0.31%)
patched, pti                              18.272 s
(some older kernel, pre 4.15)              2.378 s

Thanks,
Krzysiek

-- 
perf results:

baseline:

 Performance counter stats for './bench' (5 runs):

       2401.539139 task-clock:HG               #   0.995 CPUs utilized           ( +-  0.23% )
                23 context-switches:HG         #   0.009 K/sec                   ( +-  4.02% )
                 0 cpu-migrations:HG           #   0.000 K/sec
                30 page-faults:HG              #   0.013 K/sec                   ( +-  1.24% )
        4142375834 cycles:HG                   #   1.725 GHz                     ( +-  0.23% ) [39.99%]
         385110908 stalled-cycles-frontend:HG  #   9.30% frontend cycles idle    ( +-  0.06% ) [40.01%]
                   stalled-cycles-backend:HG
        4142489274 instructions:HG             #    1.00 insns per cycle
                                               #    0.09 stalled cycles per insn ( +-  0.00% ) [40.00%]
         802270380 branches:HG                 # 334.065 M/sec                   ( +-  0.00% ) [40.00%]
             34278 branch-misses:HG            #   0.00% of all branches         ( +-  1.94% ) [40.00%]

       2.414741497 seconds time elapsed                                          ( +-  0.64% )

patched, CONFIG_PAGE_TABLE_ISOLATION=n:

 Performance counter stats for './bench' (5 runs):

       2587.026405 task-clock:HG               #   1.000 CPUs utilized           ( +-  0.01% )
                28 context-switches:HG         #   0.011 K/sec                   ( +-  5.95% )
                 0 cpu-migrations:HG           #   0.000 K/sec
                31 page-faults:HG              #   0.012 K/sec                   ( +-  1.21% )
        4462401079 cycles:HG                   #   1.725 GHz                     ( +-  0.01% ) [39.98%]
         388646121 stalled-cycles-frontend:HG  #   8.71% frontend cycles idle    ( +-  0.05% ) [40.01%]
                   stalled-cycles-backend:HG
        4283638646 instructions:HG             #    0.96 insns per cycle
                                               #    0.09 stalled cycles per insn ( +-  0.00% ) [40.03%]
         822484311 branches:HG                 # 317.927 M/sec                   ( +-  0.00% ) [40.01%]
             39372 branch-misses:HG            #   0.00% of all branches         ( +-  2.33% ) [39.98%]

       2.587818354 seconds time elapsed                                          ( +-  0.01% )

Date: Fri, 26 Jan 2018 13:36:16 +0100
From: Joerg Roedel
To: Alan Cox
Cc: Nadav Amit, Thomas Gleixner, Ingo Molnar, "H . Peter Anvin",
 the arch/x86 maintainers, LKML, "open list:MEMORY MANAGEMENT",
 Linus Torvalds, Andy Lutomirski, Dave Hansen, Josh Poimboeuf,
 Juergen Gross, Peter Zijlstra, Borislav Petkov, Jiri Kosina,
 Boris Ostrovsky, Brian Gerst, David Laight, Denys Vlasenko,
 Eduardo Valentin, Greg KH, Will Deacon, aliguori@amazon.com,
 daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com,
 Andrea Arcangeli, Waiman Long, jroedel@suse.de
Subject: Re: [RFC PATCH 00/16] PTI support for x86-32
Message-ID: <20180126123616.GK28161@8bytes.org>
References: <1516120619-1159-1-git-send-email-joro@8bytes.org>
 <5D89F55C-902A-4464-A64E-7157FF55FAD0@gmail.com>
 <886C924D-668F-4007-98CA-555DB6279E4F@gmail.com>
 <9CF1DD34-7C66-4F11-856D-B5E896988E16@gmail.com>
 <20180122085625.GE28161@8bytes.org>
 <20180125170925.1d72d587@alans-desktop>
In-Reply-To: <20180125170925.1d72d587@alans-desktop>

Hi Alan,

On Thu, Jan 25, 2018 at 05:09:25PM +0000, Alan Cox wrote:
> On Mon, 22 Jan 2018 09:56:25 +0100
> Joerg Roedel wrote:
>
> > Hey Nadav,
> >
> > On Sun, Jan 21, 2018 at 03:46:24PM -0800, Nadav Amit wrote:
> > > It does seem that segmentation provides sufficient protection from Meltdown.
> >
> > Thanks for testing this, if this turns out to be true for all affected
> > uarchs it would be a great and better way of protection than enabling
> > PTI.
> >
> > But I'd like an official statement from Intel on that one, as their
> > recommended fix is still to use PTI.
>
> It is: we don't think segmentation works on all processors as a defence.

Thanks for checking and the official statement. So the official
mitigation recommendation is still to use PTI.

Regards,

	Joerg