From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: [RFC][PATCH 0/4] x86, mpx: Support larger address space (MAWA)
From: Dave Hansen
Date: Thu, 26 Jan 2017 14:40:05 -0800
Message-Id: <20170126224005.A6BBEF2C@viggo.jf.intel.com>
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, x86@kernel.org, Dave Hansen

Kirill is chugging right along getting his 5-level paging[1] patch set
ready to be merged. I figured I'd share an early draft of the MPX
support that will go along with it.

Background: there is a lot more detail about what bounds tables are in
the changelog for fe3d197f843. But, basically MPX bounds tables help
us to store the ranges to which a pointer is allowed to point. The
tables are walked by hardware and they are indexed by the virtual
address of the pointer being checked.

A larger virtual address space (from 5-level paging) means that we
need larger tables. 5-level paging hardware includes a feature called
MPX Address-Width Adjust (MAWA) that grows the bounds tables so they
can address the new address space. MAWA is controlled independently
from the paging mode (via an MSR) so that old MPX binaries can run on
new hardware and kernels supporting 5-level paging.
But, since userspace is responsible for allocating the table that is
growing (the directory), we need to ensure that userspace and the
kernel agree about the size of these tables and the kernel can set the
MSR appropriately.

These are not quite ready to get applied anywhere, but I don't expect
the basics to change unless folks have big problems with this. The
only big remaining piece of work is to update the MPX selftest code.

Dave Hansen (4):
      x86, mpx: introduce per-mm MPX table size tracking
      x86, mpx: update MPX to grok larger bounds tables
      x86, mpx: extend MPX prctl() to pass in size of bounds directory
      x86, mpx: context-switch new MPX address size MSR

 arch/x86/include/asm/mmu.h       |  1 +
 arch/x86/include/asm/mpx.h       | 41 ++++++++++++++---
 arch/x86/include/asm/msr-index.h |  1 +
 arch/x86/include/asm/processor.h |  6 +--
 arch/x86/mm/mpx.c                | 79 ++++++++++++++++++++++++++++----
 arch/x86/mm/pgtable.c            |  2 +-
 arch/x86/mm/tlb.c                | 42 +++++++++++++++++
 kernel/sys.c                     |  6 +--
 8 files changed, 155 insertions(+), 23 deletions(-)

1. https://software.intel.com/sites/default/files/managed/2b/80/5-level_paging_white_paper.pdf

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: [RFC][PATCH 1/4] x86, mpx: introduce per-mm MPX table size tracking
From: Dave Hansen
Date: Thu, 26 Jan 2017 14:40:06 -0800
References: <20170126224005.A6BBEF2C@viggo.jf.intel.com>
In-Reply-To: <20170126224005.A6BBEF2C@viggo.jf.intel.com>
Message-Id: <20170126224006.DED9C8D3@viggo.jf.intel.com>
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, x86@kernel.org, Dave Hansen

Larger address spaces mean larger MPX bounds table sizes. This
tracks which size tables we are using.

"MAWA" is what the hardware documentation calls this feature:
MPX Address-Width Adjust. We will carry that nomenclature throughout
this series.

The new field will be optimized and get packed into 'bd_addr' in a later
patch. But, leave it separate for now to make the series simpler.

---

 b/arch/x86/include/asm/mmu.h |    1 +
 b/arch/x86/include/asm/mpx.h |    9 +++++++++
 2 files changed, 10 insertions(+)

diff -puN arch/x86/include/asm/mmu.h~mawa-020-mmu_context-mawa arch/x86/include/asm/mmu.h
--- a/arch/x86/include/asm/mmu.h~mawa-020-mmu_context-mawa	2017-01-26 14:31:32.643673297 -0800
+++ b/arch/x86/include/asm/mmu.h	2017-01-26 14:31:32.647673476 -0800
@@ -34,6 +34,7 @@ typedef struct {
 #ifdef CONFIG_X86_INTEL_MPX
 	/* address of the bounds directory */
 	void __user *bd_addr;
+	int mpx_mawa;
 #endif
 } mm_context_t;
 
diff -puN arch/x86/include/asm/mpx.h~mawa-020-mmu_context-mawa arch/x86/include/asm/mpx.h
--- a/arch/x86/include/asm/mpx.h~mawa-020-mmu_context-mawa	2017-01-26 14:31:32.644673342 -0800
+++ b/arch/x86/include/asm/mpx.h	2017-01-26 14:31:32.648673521 -0800
@@ -68,6 +68,15 @@ static inline void mpx_mm_init(struct mm
 	 * directory, so point this at an invalid address.
 	 */
 	mm->context.bd_addr = MPX_INVALID_BOUNDS_DIR;
+	/*
+	 * All processes start out in "legacy" MPX mode with
+	 * MAWA=0.
+	 */
+	mm->context.mpx_mawa = 0;
+}
+static inline int mpx_mawa_shift(struct mm_struct *mm)
+{
+	return mm->context.mpx_mawa;
 }
 void mpx_notify_unmap(struct mm_struct *mm, struct vm_area_struct *vma,
 		      unsigned long start, unsigned long end);
_

From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: [RFC][PATCH 2/4] x86, mpx: update MPX to grok larger bounds tables
From: Dave Hansen
Date: Thu, 26 Jan 2017 14:40:07 -0800
References: <20170126224005.A6BBEF2C@viggo.jf.intel.com>
In-Reply-To: <20170126224005.A6BBEF2C@viggo.jf.intel.com>
Message-Id: <20170126224007.3E06536B@viggo.jf.intel.com>
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, x86@kernel.org, Dave Hansen

As mentioned repeatedly, larger address spaces mean larger MPX bounds
tables.

The MPX code in the kernel needs to walk these tables in order to
populate them on demand as well as unmap them when memory is freed.

This updates the bounds table walking code to understand how to walk
the larger table size. It uses the new per-mm "MAWA" value to
determine which format to use.
---

 b/arch/x86/include/asm/mpx.h |   27 +++++++++++++++++++++------
 b/arch/x86/mm/mpx.c          |   25 +++++++++++++++++--------
 2 files changed, 38 insertions(+), 14 deletions(-)

diff -puN arch/x86/include/asm/mpx.h~mawa-030-bounds-directory-sizes arch/x86/include/asm/mpx.h
--- a/arch/x86/include/asm/mpx.h~mawa-030-bounds-directory-sizes	2017-01-26 14:31:33.098693731 -0800
+++ b/arch/x86/include/asm/mpx.h	2017-01-26 14:31:33.103693956 -0800
@@ -14,15 +14,30 @@
 #define MPX_BD_ENTRY_VALID_FLAG	0x1
 
 /*
- * The upper 28 bits [47:20] of the virtual address in 64-bit
- * are used to index into bounds directory (BD).
+ * The uppermost bits [56:20] of the virtual address in 64-bit
+ * are used to index into bounds directory (BD). On processors
+ * with support for smaller virtual address space size, the "56"
+ * is obviously smaller.
  *
- * The directory is 2G (2^31) in size, and with 8-byte entries
- * it has 2^28 entries.
+ * When using 48-bit virtual addresses, the directory is 2G
+ * (2^31) bytes in size, and with 8-byte entries it has 2^28
+ * entries. With 57-bit virtual addresses, it goes to 1T in size
+ * and has 2^37 entries.
+ *
+ * Needs to be ULL so we can use this in 32-bit kernels without
+ * warnings.
  */
-#define MPX_BD_SIZE_BYTES_64	(1UL<<31)
+#define MPX_BD_BASE_SIZE_BYTES_64	(1ULL<<31)
 #define MPX_BD_ENTRY_BYTES_64	8
-#define MPX_BD_NR_ENTRIES_64	(MPX_BD_SIZE_BYTES_64/MPX_BD_ENTRY_BYTES_64)
+/*
+ * Note: size of tables on 64-bit is not constant, so we have no
+ * fixed definition for MPX_BD_NR_ENTRIES_64.
+ *
+ * The 5-Level Paging Whitepaper says:
+ * A bound directory comprises 2^(28+MAWA) 64-bit entries.
+ * MAWA=0 in the legacy mode, so:
+ */
+#define MPX_BD_LEGACY_NR_ENTRIES_64	(1UL<<28)
 
 /*
  * The 32-bit directory is 4MB (2^22) in size, and with 4-byte
diff -puN arch/x86/mm/mpx.c~mawa-030-bounds-directory-sizes arch/x86/mm/mpx.c
--- a/arch/x86/mm/mpx.c~mawa-030-bounds-directory-sizes	2017-01-26 14:31:33.099693776 -0800
+++ b/arch/x86/mm/mpx.c	2017-01-26 14:31:33.103693956 -0800
@@ -22,10 +22,14 @@
 
 static inline unsigned long mpx_bd_size_bytes(struct mm_struct *mm)
 {
-	if (is_64bit_mm(mm))
-		return MPX_BD_SIZE_BYTES_64;
-	else
+	if (!is_64bit_mm(mm))
 		return MPX_BD_SIZE_BYTES_32;
+
+	/*
+	 * The bounds directory grows with the MAWA value. The
+	 * "legacy" shift is 0.
+	 */
+	return MPX_BD_BASE_SIZE_BYTES_64 << mpx_mawa_shift(mm);
 }
 
 static inline unsigned long mpx_bt_size_bytes(struct mm_struct *mm)
@@ -724,6 +728,7 @@ static inline unsigned long bd_entry_vir
 {
 	unsigned long long virt_space;
 	unsigned long long GB = (1ULL << 30);
+	unsigned long legacy_64bit_vaddr_bits = 48;
 
 	/*
 	 * This covers 32-bit emulation as well as 32-bit kernels
@@ -733,12 +738,16 @@ static inline unsigned long bd_entry_vir
 		return (4ULL * GB) / MPX_BD_NR_ENTRIES_32;
 
 	/*
-	 * 'x86_virt_bits' returns what the hardware is capable
-	 * of, and returns the full >32-bit address space when
-	 * running 32-bit kernels on 64-bit hardware.
+	 * With 5-level paging, the virtual address space size
+	 * gets bigger. A bounds directory entry still points to
+	 * a single bounds table and the *tables* stay the same
+	 * size. Thus, the address space that a directory entry
+	 * covers does not change based on the paging mode (or
+	 * MAWA value). Just use the legacy calculation despite
+	 * the MAWA mode.
 	 */
-	virt_space = (1ULL << boot_cpu_data.x86_virt_bits);
-	return virt_space / MPX_BD_NR_ENTRIES_64;
+	virt_space = (1ULL << legacy_64bit_vaddr_bits);
+	return virt_space / MPX_BD_LEGACY_NR_ENTRIES_64;
 }
 
 /*
_

From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: [RFC][PATCH 3/4] x86, mpx: extend MPX prctl() to pass in size of bounds directory
From: Dave Hansen
Date: Thu, 26 Jan 2017 14:40:09 -0800
References: <20170126224005.A6BBEF2C@viggo.jf.intel.com>
In-Reply-To: <20170126224005.A6BBEF2C@viggo.jf.intel.com>
Message-Id: <20170126224009.ECA68304@viggo.jf.intel.com>
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, x86@kernel.org, Dave Hansen

The MPX bounds tables are indexed by virtual address. A larger virtual
address space means that we need larger tables. But, we need to ensure
that userspace and the kernel agree about the size of these tables.

To do this, we require that userspace pass in the size of the tables
if they want a non-legacy size. They do this with a previously unused
(required to be 0) argument to the PR_MPX_ENABLE_MANAGEMENT prctl().

This way, the kernel can make sure that the size of the tables is
consistent with the size of the address space and can return an error
if there is a mismatch.

There are essentially 3 table sizes that matter:
1. 32-bit table sized for a 32-bit address space
2. 64-bit table sized for a 48-bit address space
3. 64-bit table sized for a 57-bit address space

We cover all three of those cases.
FIXME: we also need to ensure that we check the current state of the
larger address space opt-in. If we've opted in to larger address spaces
we cannot allow a small bounds directory to be used. Also, if we've not
opted in, we cannot allow the larger bounds directory to be used.

---

 b/arch/x86/include/asm/mpx.h       |    5 +++
 b/arch/x86/include/asm/processor.h |    6 ++--
 b/arch/x86/mm/mpx.c                |   54 +++++++++++++++++++++++++++++++++++--
 b/arch/x86/mm/pgtable.c            |    2 -
 b/kernel/sys.c                     |    6 ++--
 5 files changed, 64 insertions(+), 9 deletions(-)

diff -puN arch/x86/include/asm/mpx.h~mawa-040-prctl-set-mawa arch/x86/include/asm/mpx.h
--- a/arch/x86/include/asm/mpx.h~mawa-040-prctl-set-mawa	2017-01-26 14:31:33.564714660 -0800
+++ b/arch/x86/include/asm/mpx.h	2017-01-26 14:31:33.574715109 -0800
@@ -40,6 +40,11 @@
 #define MPX_BD_LEGACY_NR_ENTRIES_64	(1UL<<28)
 
 /*
+ * We only support one value for MAWA
+ */
+#define MPX_MAWA_VALUE		9
+
+/*
  * The 32-bit directory is 4MB (2^22) in size, and with 4-byte
  * entries it has 2^20 entries.
  */
diff -puN arch/x86/include/asm/processor.h~mawa-040-prctl-set-mawa arch/x86/include/asm/processor.h
--- a/arch/x86/include/asm/processor.h~mawa-040-prctl-set-mawa	2017-01-26 14:31:33.566714750 -0800
+++ b/arch/x86/include/asm/processor.h	2017-01-26 14:31:33.575715154 -0800
@@ -863,14 +863,14 @@ extern int get_tsc_mode(unsigned long ad
 extern int set_tsc_mode(unsigned int val);
 
 /* Register/unregister a process' MPX related resource */
-#define MPX_ENABLE_MANAGEMENT()	mpx_enable_management()
+#define MPX_ENABLE_MANAGEMENT(bd_size)	mpx_enable_management(bd_size)
 #define MPX_DISABLE_MANAGEMENT()	mpx_disable_management()
 
 #ifdef CONFIG_X86_INTEL_MPX
-extern int mpx_enable_management(void);
+extern int mpx_enable_management(unsigned long bd_size);
 extern int mpx_disable_management(void);
 #else
-static inline int mpx_enable_management(void)
+static inline int mpx_enable_management(unsigned long bd_size)
 {
 	return -EINVAL;
 }
diff -puN arch/x86/mm/mpx.c~mawa-040-prctl-set-mawa arch/x86/mm/mpx.c
--- a/arch/x86/mm/mpx.c~mawa-040-prctl-set-mawa	2017-01-26 14:31:33.567714795 -0800
+++ b/arch/x86/mm/mpx.c	2017-01-26 14:31:33.575715154 -0800
@@ -339,7 +339,54 @@ static __user void *mpx_get_bounds_dir(v
 		(bndcsr->bndcfgu & MPX_BNDCFG_ADDR_MASK);
 }
 
-int mpx_enable_management(void)
+int mpx_set_mm_bd_size(unsigned long bd_size)
+{
+	struct mm_struct *mm = current->mm;
+
+	switch ((unsigned long long)bd_size) {
+	case 0:
+		/* Legacy call to prctl(): */
+		mm->context.mpx_mawa = 0;
+		return 0;
+	case MPX_BD_SIZE_BYTES_32:
+		/* 32-bit, legacy-sized bounds directory: */
+		if (is_64bit_mm(mm))
+			return -EINVAL;
+		mm->context.mpx_mawa = 0;
+		return 0;
+	case MPX_BD_BASE_SIZE_BYTES_64:
+		/* 64-bit, legacy-sized bounds directory: */
+		if (!is_64bit_mm(mm)
+		// FIXME && ! opted-in to larger address space
+		)
+			return -EINVAL;
+		mm->context.mpx_mawa = 0;
+		return 0;
+	case MPX_BD_BASE_SIZE_BYTES_64 << MPX_MAWA_VALUE:
+		/*
+		 * Non-legacy call, with larger directory.
+		 * Note that there is no 32-bit equivalent for
+		 * this case since its address space does not
+		 * change sizes.
+		 */
+		if (!is_64bit_mm(mm))
+			return -EINVAL;
+		/*
+		 * Do not let this be enabled unless we are on
+		 * 5-level hardware *and* have that feature
+		 * enabled. FIXME: need runtime check
+		 */
+		if (!cpu_feature_enabled(X86_FEATURE_LA57)
+		// FIXME && opted into larger address space
+		)
+			return -EINVAL;
+		mm->context.mpx_mawa = MPX_MAWA_VALUE;
+		return 0;
+	}
+	return -EINVAL;
+}
+
+int mpx_enable_management(unsigned long bd_size)
 {
 	void __user *bd_base = MPX_INVALID_BOUNDS_DIR;
 	struct mm_struct *mm = current->mm;
@@ -358,10 +405,13 @@ int mpx_enable_management(void)
 	 */
 	bd_base = mpx_get_bounds_dir();
 	down_write(&mm->mmap_sem);
+	ret = mpx_set_mm_bd_size(bd_size);
+	if (ret)
+		goto out;
 	mm->context.bd_addr = bd_base;
 	if (mm->context.bd_addr == MPX_INVALID_BOUNDS_DIR)
 		ret = -ENXIO;
-
+out:
 	up_write(&mm->mmap_sem);
 	return ret;
 }
diff -puN arch/x86/mm/pgtable.c~mawa-040-prctl-set-mawa arch/x86/mm/pgtable.c
--- a/arch/x86/mm/pgtable.c~mawa-040-prctl-set-mawa	2017-01-26 14:31:33.569714885 -0800
+++ b/arch/x86/mm/pgtable.c	2017-01-26 14:31:33.575715154 -0800
@@ -85,7 +85,7 @@ void ___pud_free_tlb(struct mmu_gather *
 #if CONFIG_PGTABLE_LEVELS > 4
 void ___p4d_free_tlb(struct mmu_gather *tlb, p4d_t *p4d)
 {
-	paravirt_release_p4d(__pa(p4d) >> PAGE_SHIFT);
+	//paravirt_release_p4d(__pa(p4d) >> PAGE_SHIFT);
 	tlb_remove_page(tlb, virt_to_page(p4d));
 }
 #endif	/* CONFIG_PGTABLE_LEVELS > 4 */
diff -puN kernel/sys.c~mawa-040-prctl-set-mawa kernel/sys.c
--- a/kernel/sys.c~mawa-040-prctl-set-mawa	2017-01-26 14:31:33.571714974 -0800
+++ b/kernel/sys.c	2017-01-26 14:31:33.576715199 -0800
@@ -92,7 +92,7 @@
 # define SET_TSC_CTL(a)		(-EINVAL)
 #endif
 #ifndef MPX_ENABLE_MANAGEMENT
-# define MPX_ENABLE_MANAGEMENT()	(-EINVAL)
+# define MPX_ENABLE_MANAGEMENT(bd_size)	(-EINVAL)
 #endif
 #ifndef MPX_DISABLE_MANAGEMENT
 # define MPX_DISABLE_MANAGEMENT()	(-EINVAL)
@@ -2246,9 +2246,9 @@ SYSCALL_DEFINE5(prctl, int, option, unsi
 		up_write(&me->mm->mmap_sem);
 		break;
 	case PR_MPX_ENABLE_MANAGEMENT:
-		if (arg2 || arg3 || arg4 || arg5)
+		if (arg3 || arg4 || arg5)
 			return -EINVAL;
-		error = MPX_ENABLE_MANAGEMENT();
+		error = MPX_ENABLE_MANAGEMENT(arg2);
 		break;
 	case PR_MPX_DISABLE_MANAGEMENT:
 		if (arg2 || arg3 || arg4 || arg5)
_

From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: [RFC][PATCH 4/4] x86, mpx: context-switch new MPX address size MSR
From: Dave Hansen
Date: Thu, 26 Jan 2017 14:40:10 -0800
References: <20170126224005.A6BBEF2C@viggo.jf.intel.com>
In-Reply-To: <20170126224005.A6BBEF2C@viggo.jf.intel.com>
Message-Id: <20170126224010.3534C154@viggo.jf.intel.com>
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, x86@kernel.org, Dave Hansen

As mentioned in previous patches, larger address spaces mean
larger MPX tables. But, the entire system is either entirely
using 5-level paging, or not. We do not mix pagetable formats.

If the size of the MPX tables depended solely on the paging mode,
old binaries would break because the format of the tables changed
underneath them.
So, since CR4 never changes, but we need some way to change the
MPX table format, a new MSR is introduced: MSR_IA32_MPX_LAX.

If we are in 5-level paging mode *and* the enable bit in this MSR
is set, the CPU will use the new, larger MPX bounds table format.
If 5-level paging is disabled, or the enable bit is clear, then
the legacy-style smaller tables will be used.

But, we might mix legacy and non-legacy binaries on the same
system, so this MSR needs to be context-switched. Add code to do
this, along with some simple optimizations to skip the MSR writes
if the MSR does not need to be updated.

---

 b/arch/x86/include/asm/msr-index.h |    1 
 b/arch/x86/mm/tlb.c                |   42 +++++++++++++++++++++++++++++++++++++
 2 files changed, 43 insertions(+)

diff -puN arch/x86/include/asm/msr-index.h~mawa-050-context-switch-msr arch/x86/include/asm/msr-index.h
--- a/arch/x86/include/asm/msr-index.h~mawa-050-context-switch-msr	2017-01-26 14:31:37.747902524 -0800
+++ b/arch/x86/include/asm/msr-index.h	2017-01-26 14:31:37.752902749 -0800
@@ -410,6 +410,7 @@
 #define MSR_IA32_BNDCFGS		0x00000d90
 
 #define MSR_IA32_XSS			0x00000da0
+#define MSR_IA32_MPX_LAX		0x00001000
 
 #define FEATURE_CONTROL_LOCKED				(1<<0)
 #define FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX	(1<<1)
diff -puN arch/x86/mm/tlb.c~mawa-050-context-switch-msr arch/x86/mm/tlb.c
--- a/arch/x86/mm/tlb.c~mawa-050-context-switch-msr	2017-01-26 14:31:37.749902614 -0800
+++ b/arch/x86/mm/tlb.c	2017-01-26 14:31:37.753902794 -0800
@@ -71,6 +71,47 @@ void switch_mm(struct mm_struct *prev, s
 	local_irq_restore(flags);
 }
 
+/*
+ * The MPX tables change sizes based on the size of the virtual
+ * (aka. linear) address space. There is an MSR to tell the CPU
+ * whether we want the legacy-style ones or the larger ones when
+ * we are running with an eXtended virtual address space.
+ */
+static void switch_mawa(struct mm_struct *prev, struct mm_struct *next)
+{
+	/*
+	 * Note: there is one and only one bit in use in the MSR
+	 * at this time, so we do not have to be concerned with
+	 * preserving any of the other bits. Just write 0 or 1.
+	 */
+	unsigned IA32_MPX_LAX_ENABLE_MASK = 0x00000001;
+
+	if (!cpu_feature_enabled(X86_FEATURE_MPX))
+		return;
+	/*
+	 * FIXME: do we want a check here for the 5-level paging
+	 * CR4 bit or CPUID bit, or is the mawa check below OK?
+	 * It's not obvious what would be the fastest or if it
+	 * matters.
+	 */
+
+	/*
+	 * Avoid the relatively costly MSR if we are not changing
+	 * MAWA state. All processes not using MPX will have a
+	 * mpx_mawa_shift()=0, so we do not need to check
+	 * separately for whether MPX management is enabled.
+	 */
+	if (mpx_mawa_shift(prev) == mpx_mawa_shift(next))
+		return;
+
+	if (mpx_mawa_shift(next)) {
+		wrmsr(MSR_IA32_MPX_LAX, IA32_MPX_LAX_ENABLE_MASK, 0x0);
+	} else {
+		/* clear the enable bit: */
+		wrmsr(MSR_IA32_MPX_LAX, 0x0, 0x0);
+	}
+}
+
 void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 			struct task_struct *tsk)
 {
@@ -136,6 +177,7 @@ void switch_mm_irqs_off(struct mm_struct
 	/* Load per-mm CR4 state */
 	load_mm_cr4(next);
 
+	switch_mawa(prev, next);
 #ifdef CONFIG_MODIFY_LDT_SYSCALL
 	/*
 	 * Load the LDT, if the LDT is different.
_

From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 27 Jan 2017 09:16:54 +0100
From: Ingo Molnar
Subject: Re: [RFC][PATCH 0/4] x86, mpx: Support larger address space (MAWA)
Message-ID: <20170127081654.GA25162@gmail.com>
References: <20170126224005.A6BBEF2C@viggo.jf.intel.com>
In-Reply-To: <20170126224005.A6BBEF2C@viggo.jf.intel.com>
To: Dave Hansen
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, x86@kernel.org, Thomas Gleixner, "H. Peter Anvin", Peter Zijlstra

* Dave Hansen wrote:

> Kirill is chugging right along getting his 5-level paging[1] patch set
> ready to be merged. I figured I'd share an early draft of the MPX
> support that will go along with it.
>
> Background: there is a lot more detail about what bounds tables are in
> the changelog for fe3d197f843. But, basically MPX bounds tables help
> us to store the ranges to which a pointer is allowed to point. The
> tables are walked by hardware and they are indexed by the virtual
> address of the pointer being checked.
>
> A larger virtual address space (from 5-level paging) means that we
> need larger tables. 5-level paging hardware includes a feature called
> MPX Address-Width Adjust (MAWA) that grows the bounds tables so they
> can address the new address space. MAWA is controlled independently
> from the paging mode (via an MSR) so that old MPX binaries can run on
> new hardware and kernels supporting 5-level paging.
>
> But, since userspace is responsible for allocating the table that is
> growing (the directory), we need to ensure that userspace and the
> kernel agree about the size of these tables and the kernel can set the
> MSR appropriately.
>
> These are not quite ready to get applied anywhere, but I don't expect
> the basics to change unless folks have big problems with this. The
> only big remaining piece of work is to update the MPX selftest code.
>
> Dave Hansen (4):
>       x86, mpx: introduce per-mm MPX table size tracking
>       x86, mpx: update MPX to grok larger bounds tables
>       x86, mpx: extend MPX prctl() to pass in size of bounds directory
>       x86, mpx: context-switch new MPX address size MSR

On a related note, the MPX testcases seem to have gone from the
tools/testing/selftests/x86/Makefile (possibly a merge mishap - the original
commit adds it correctly), so they are not being built.

Plus I noticed that the pkeys testcases are producing a lot of noise:

triton:~/tip/tools/testing/selftests/x86> make
[...]
gcc -m64 -o protection_keys_64 -O2 -g -std=gnu99 -pthread -Wall  protection_keys.c -lrt -ldl
protection_keys.c: In function ‘setup_hugetlbfs’:
protection_keys.c:816:6: warning: unused variable ‘i’ [-Wunused-variable]
  int i;
      ^
protection_keys.c:815:6: warning: unused variable ‘validated_nr_pages’ [-Wunused-variable]
  int validated_nr_pages;
      ^
protection_keys.c: In function ‘test_pkey_syscalls_bad_args’:
protection_keys.c:1136:6: warning: unused variable ‘bad_flag’ [-Wunused-variable]
  int bad_flag = (PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE) + 1;
      ^
protection_keys.c: In function ‘test_pkey_alloc_exhaust’:
protection_keys.c:1153:16: warning: unused variable ‘init_val’ [-Wunused-variable]
  unsigned long init_val;
                ^
protection_keys.c:1152:16: warning: unused variable ‘flags’ [-Wunused-variable]
  unsigned long flags;
                ^
In file included from protection_keys.c:45:0:
pkey-helpers.h: In function ‘sigsafe_printf’:
pkey-helpers.h:41:3: warning: ignoring return value of ‘write’, declared with attribute warn_unused_result [-Wunused-result]
   write(1, dprint_in_signal_buffer, len);
   ^
protection_keys.c: In function ‘dumpit’:
protection_keys.c:407:3: warning: ignoring return value of ‘write’, declared with attribute warn_unused_result [-Wunused-result]
   write(1, buf, nr_read);
   ^
protection_keys.c: In function ‘pkey_disable_set’:
protection_keys.c:68:5: warning: ‘orig_pkru’ may be used uninitialized in this function [-Wmaybe-uninitialized]
  if (!(condition)) {   \
     ^
protection_keys.c:465:6: note: ‘orig_pkru’ was declared here
  u32 orig_pkru;
      ^
[...]

Thanks,

	Ingo

From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 27 Jan 2017 09:26:30 +0100
From: Ingo Molnar
Subject: Re: [RFC][PATCH 1/4] x86, mpx: introduce per-mm MPX table size tracking
Message-ID: <20170127082629.GB25162@gmail.com>
References: <20170126224005.A6BBEF2C@viggo.jf.intel.com> <20170126224006.DED9C8D3@viggo.jf.intel.com>
In-Reply-To: <20170126224006.DED9C8D3@viggo.jf.intel.com>
To: Dave Hansen
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, x86@kernel.org

* Dave Hansen wrote:

> Larger address spaces mean larger MPX bounds table sizes. This
> tracks which size tables we are using.
>
> "MAWA" is what the hardware documentation calls this feature:
> MPX Address-Width Adjust. We will carry that nomenclature throughout
> this series.
>
> The new field will be optimized and get packed into 'bd_addr' in a later
> patch. But, leave it separate for now to make the series simpler.
>
> ---
>
>  b/arch/x86/include/asm/mmu.h |    1 +
>  b/arch/x86/include/asm/mpx.h |    9 +++++++++
>  2 files changed, 10 insertions(+)
>
> diff -puN arch/x86/include/asm/mmu.h~mawa-020-mmu_context-mawa arch/x86/include/asm/mmu.h
> --- a/arch/x86/include/asm/mmu.h~mawa-020-mmu_context-mawa	2017-01-26 14:31:32.643673297 -0800
> +++ b/arch/x86/include/asm/mmu.h	2017-01-26 14:31:32.647673476 -0800
> @@ -34,6 +34,7 @@ typedef struct {
>  #ifdef CONFIG_X86_INTEL_MPX
>  	/* address of the bounds directory */
>  	void __user *bd_addr;
> +	int mpx_mawa;

-ENOCOMMENT.

Plus 'int' looks probably wrong, unless the hardware really wants signed shift
values. (whatever 'mpx_mawa' is.)
Plus, while Intel is free to use sucky acronyms such as MAWA, could we please name this and related functionality sensibly: mpx_table_size or mpx_table_shift or such? The data structure comment can point out that Intel calls this 'MAWA'. (Also, the changelog refers to a later change, which never happens in this series.) Thanks, Ingo -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wj0-f200.google.com (mail-wj0-f200.google.com [209.85.210.200]) by kanga.kvack.org (Postfix) with ESMTP id 409406B0260 for ; Fri, 27 Jan 2017 03:31:27 -0500 (EST) Received: by mail-wj0-f200.google.com with SMTP id yr2so45146337wjc.4 for ; Fri, 27 Jan 2017 00:31:27 -0800 (PST) Received: from mail-wm0-x241.google.com (mail-wm0-x241.google.com. [2a00:1450:400c:c09::241]) by mx.google.com with ESMTPS id a16si5064682wra.331.2017.01.27.00.31.25 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 27 Jan 2017 00:31:25 -0800 (PST) Received: by mail-wm0-x241.google.com with SMTP id r126so56484960wmr.3 for ; Fri, 27 Jan 2017 00:31:25 -0800 (PST) Date: Fri, 27 Jan 2017 09:31:22 +0100 From: Ingo Molnar Subject: Re: [RFC][PATCH 4/4] x86, mpx: context-switch new MPX address size MSR Message-ID: <20170127083122.GC25162@gmail.com> References: <20170126224005.A6BBEF2C@viggo.jf.intel.com> <20170126224010.3534C154@viggo.jf.intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170126224010.3534C154@viggo.jf.intel.com> Sender: owner-linux-mm@kvack.org List-ID: To: Dave Hansen Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, x86@kernel.org, Peter Zijlstra , Thomas Gleixner , "H. Peter Anvin" * Dave Hansen wrote: > + * The MPX tables change sizes based on the size of the virtual > + * (aka. linear) address space. 
There is an MSR to tell the CPU > + * whether we want the legacy-style ones or the larger ones when > + * we are running with an eXtended virtual address space. > + */ > +static void switch_mawa(struct mm_struct *prev, struct mm_struct *next) > +{ > + /* > + * Note: there is one and only one bit in use in the MSR > + * at this time, so we do not have to be concerned with > + * preserving any of the other bits. Just write 0 or 1. > + */ > + unsigned IA32_MPX_LAX_ENABLE_MASK = 0x00000001; > + > + if (!cpu_feature_enabled(X86_FEATURE_MPX)) > + return; > + /* > + * FIXME: do we want a check here for the 5-level paging > + * CR4 bit or CPUID bit, or is the mawa check below OK? > + * It's not obvious what would be the fastest or if it > + * matters. > + */ > + > + /* > + * Avoid the relatively costly MSR if we are not changing > + * MAWA state. All processes not using MPX will have a > + * mpx_mawa_shift()=0, so we do not need to check > + * separately for whether MPX management is enabled. > + */ > + if (mpx_mawa_shift(prev) == mpx_mawa_shift(next)) > + return; Please stop the senseless-looking wrappery - if the field is named sensibly then it can be accessed directly through mm_struct. > + > + if (mpx_mawa_shift(next)) { > + wrmsr(MSR_IA32_MPX_LAX, IA32_MPX_LAX_ENABLE_MASK, 0x0); > + } else { > + /* clear the enable bit: */ > + wrmsr(MSR_IA32_MPX_LAX, 0x0, 0x0); > + } > +} > + > void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next, > struct task_struct *tsk) > { > @@ -136,6 +177,7 @@ void switch_mm_irqs_off(struct mm_struct > /* Load per-mm CR4 state */ > load_mm_cr4(next); > > + switch_mawa(prev, next); This implementation adds about 4-5 unnecessary instructions to the context switching hot path of every non-MPX task, even on non-MPX hardware. Please make sure that this is something like: if (unlikely(prev->mpx_msr_val != next->mpx_msr_val)) switch_mpx(prev, next); ...
which reduces the hot path overhead to something like 2 instructions (if we are lucky). This can be put into switch_mpx() and can be inlined - just make sure that on a defconfig the generated machine code is sane. Thanks, Ingo From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753088AbdAZWkL (ORCPT ); Thu, 26 Jan 2017 17:40:11 -0500 Received: from mga09.intel.com ([134.134.136.24]:32319 "EHLO mga09.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752639AbdAZWkJ (ORCPT ); Thu, 26 Jan 2017 17:40:09 -0500 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.33,291,1477983600"; d="scan'208";a="58664260" Subject: [RFC][PATCH 2/4] x86, mpx: update MPX to grok larger bounds tables To: linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org, x86@kernel.org, Dave Hansen From: Dave Hansen Date: Thu, 26 Jan 2017 14:40:07 -0800 References: <20170126224005.A6BBEF2C@viggo.jf.intel.com> In-Reply-To: <20170126224005.A6BBEF2C@viggo.jf.intel.com> Message-Id: <20170126224007.3E06536B@viggo.jf.intel.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org As mentioned repeatedly, larger address spaces mean larger MPX bounds tables. The MPX code in the kernel needs to walk these tables in order to populate them on demand as well as unmap them when memory is freed. This updates the bounds table walking code to understand how to walk the larger table size. It uses the new per-mm "MAWA" value to determine which format to use.
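Illustration (not part of the patch): a minimal C sketch of the size arithmetic this patch encodes, assuming the constants from the mpx.h hunk that follows (a 2^31-byte legacy directory, 8-byte entries, 48-bit legacy indexing). The helper names are hypothetical. It shows that the directory grows by a factor of 2^MAWA while each entry keeps covering 1MB of virtual space, so with MAWA=9 the 2^37 entries describe a 57-bit address space.

```c
#include <assert.h>
#include <stdint.h>

/* Constants mirror the patch; the helper names below are hypothetical. */
#define BD_BASE_SIZE_BYTES_64  (1ULL << 31) /* legacy 64-bit directory: 2GB */
#define BD_ENTRY_BYTES_64      8ULL         /* one 64-bit pointer per entry */
#define LEGACY_NR_ENTRIES_64   (1ULL << 28) /* 2^(28+MAWA) with MAWA=0 */
#define LEGACY_VADDR_BITS_64   48           /* directory indexed by bits [47:20] */

/* Directory size grows by 2^mawa_shift. */
static uint64_t bd_size_bytes_64(unsigned int mawa_shift)
{
	return BD_BASE_SIZE_BYTES_64 << mawa_shift;
}

/* Number of 8-byte entries: 2^(28 + mawa_shift). */
static uint64_t bd_nr_entries_64(unsigned int mawa_shift)
{
	return bd_size_bytes_64(mawa_shift) / BD_ENTRY_BYTES_64;
}

/*
 * Virtual space covered by one directory entry. Bounds tables keep
 * their size under MAWA, so this is the legacy value in every mode.
 */
static uint64_t bd_entry_virt_space_64(void)
{
	return (1ULL << LEGACY_VADDR_BITS_64) / LEGACY_NR_ENTRIES_64;
}

/* Total virtual space the whole directory can describe. */
static uint64_t bd_covered_bytes_64(unsigned int mawa_shift)
{
	/* Only legacy (0) and MAWA=9 tables exist in this series. */
	assert(mawa_shift == 0 || mawa_shift == 9);
	return bd_nr_entries_64(mawa_shift) * bd_entry_virt_space_64();
}
```

Keeping the per-entry coverage fixed is what lets the table-walking code keep using the legacy calculation in bd_entry_virt_space(), as the patch comment explains.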
--- b/arch/x86/include/asm/mpx.h | 27 +++++++++++++++++++++------ b/arch/x86/mm/mpx.c | 25 +++++++++++++++++-------- 2 files changed, 38 insertions(+), 14 deletions(-) diff -puN arch/x86/include/asm/mpx.h~mawa-030-bounds-directory-sizes arch/x86/include/asm/mpx.h --- a/arch/x86/include/asm/mpx.h~mawa-030-bounds-directory-sizes 2017-01-26 14:31:33.098693731 -0800 +++ b/arch/x86/include/asm/mpx.h 2017-01-26 14:31:33.103693956 -0800 @@ -14,15 +14,30 @@ #define MPX_BD_ENTRY_VALID_FLAG 0x1 /* - * The upper 28 bits [47:20] of the virtual address in 64-bit - * are used to index into bounds directory (BD). + * The uppermost bits [56:20] of the virtual address in 64-bit + * are used to index into bounds directory (BD). On processors + * with support for smaller virtual address space size, the "56" + * is obviously smaller. * - * The directory is 2G (2^31) in size, and with 8-byte entries - * it has 2^28 entries. + * When using 47-bit virtual addresses, the directory is 2G + * (2^31) bytes in size, and with 8-byte entries it has 2^28 + * entries. With 56-bit virtual addresses, it goes to 1T in size + * and has 2^37 entries. + * + * Needs to be ULL so we can use this in 32-bit kernels without + * warnings. */ -#define MPX_BD_SIZE_BYTES_64 (1UL<<31) +#define MPX_BD_BASE_SIZE_BYTES_64 (1ULL<<31) #define MPX_BD_ENTRY_BYTES_64 8 -#define MPX_BD_NR_ENTRIES_64 (MPX_BD_SIZE_BYTES_64/MPX_BD_ENTRY_BYTES_64) +/* + * Note: size of tables on 64-bit is not constant, so we have no + * fixed definition for MPX_BD_NR_ENTRIES_64. + * + * The 5-Level Paging Whitepaper says: + * A bound directory comprises 2^(28+MAWA) 64-bit entries. 
+ * MAWA=0 in the legacy mode, so: + */ +#define MPX_BD_LEGACY_NR_ENTRIES_64 (1UL<<28) /* * The 32-bit directory is 4MB (2^22) in size, and with 4-byte diff -puN arch/x86/mm/mpx.c~mawa-030-bounds-directory-sizes arch/x86/mm/mpx.c --- a/arch/x86/mm/mpx.c~mawa-030-bounds-directory-sizes 2017-01-26 14:31:33.099693776 -0800 +++ b/arch/x86/mm/mpx.c 2017-01-26 14:31:33.103693956 -0800 @@ -22,10 +22,14 @@ static inline unsigned long mpx_bd_size_bytes(struct mm_struct *mm) { - if (is_64bit_mm(mm)) - return MPX_BD_SIZE_BYTES_64; - else + if (!is_64bit_mm(mm)) return MPX_BD_SIZE_BYTES_32; + + /* + * The bounds directory grows with the MAWA value. The + * "legacy" shift is 0. + */ + return MPX_BD_BASE_SIZE_BYTES_64 << mpx_mawa_shift(mm); } static inline unsigned long mpx_bt_size_bytes(struct mm_struct *mm) @@ -724,6 +728,7 @@ static inline unsigned long bd_entry_vir { unsigned long long virt_space; unsigned long long GB = (1ULL << 30); + unsigned long legacy_64bit_vaddr_bits = 48; /* * This covers 32-bit emulation as well as 32-bit kernels @@ -733,12 +738,16 @@ static inline unsigned long bd_entry_vir return (4ULL * GB) / MPX_BD_NR_ENTRIES_32; /* - * 'x86_virt_bits' returns what the hardware is capable - * of, and returns the full >32-bit address space when - * running 32-bit kernels on 64-bit hardware. + * With 5-level paging, the virtual address space size + * gets bigger. A bounds directory entry still points to + * a single bounds table and the *tables* stay the same + * size. Thus, the address space that a directory entry + * covers does not change based on the paging mode (or + * MAWA value). Just use the legacy calculation despite + * the MAWA mode. 
*/ - virt_space = (1ULL << boot_cpu_data.x86_virt_bits); - return virt_space / MPX_BD_NR_ENTRIES_64; + virt_space = (1ULL << legacy_64bit_vaddr_bits); + return virt_space / MPX_BD_LEGACY_NR_ENTRIES_64; } /* _ From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752963AbdAZWkI (ORCPT ); Thu, 26 Jan 2017 17:40:08 -0500 Received: from mga01.intel.com ([192.55.52.88]:5967 "EHLO mga01.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752639AbdAZWkH (ORCPT ); Thu, 26 Jan 2017 17:40:07 -0500 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.33,291,1477983600"; d="scan'208";a="57997944" Subject: [RFC][PATCH 1/4] x86, mpx: introduce per-mm MPX table size tracking To: linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org, x86@kernel.org, Dave Hansen From: Dave Hansen Date: Thu, 26 Jan 2017 14:40:06 -0800 References: <20170126224005.A6BBEF2C@viggo.jf.intel.com> In-Reply-To: <20170126224005.A6BBEF2C@viggo.jf.intel.com> Message-Id: <20170126224006.DED9C8D3@viggo.jf.intel.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Larger address spaces mean larger MPX bounds table sizes. This tracks which size tables we are using. "MAWA" is what the hardware documentation calls this feature: MPX Address-Width Adjust. We will carry that nomenclature throughout this series. The new field will be optimized and get packed into 'bd_addr' in a later patch. But, leave it separate for now to make the series simpler. 
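Illustration (not part of the patch): a sketch of the per-mm state this patch introduces, written with an unsigned, commented shift field in the spirit of the review feedback in this thread. struct mpx_mm_ctx, mpx_table_shift, mpx_ctx_set_shift() and MPX_INVALID_BOUNDS_DIR_SKETCH are hypothetical stand-ins, not the kernel's mm_context_t or its helpers.

```c
#include <assert.h>

/* Hypothetical stand-in for the MPX part of mm_context_t. */
struct mpx_mm_ctx {
	/* Address of the bounds directory (a __user pointer in the kernel). */
	void *bd_addr;
	/*
	 * log2 growth factor of the bounds directory; Intel's
	 * documentation calls this MAWA (MPX Address-Width Adjust).
	 * 0 means legacy-sized tables. Unsigned: it is never negative.
	 */
	unsigned int mpx_table_shift;
};

/* Illustrative "no directory registered" marker. */
#define MPX_INVALID_BOUNDS_DIR_SKETCH ((void *)-1L)

static void mpx_ctx_init(struct mpx_mm_ctx *ctx)
{
	ctx->bd_addr = MPX_INVALID_BOUNDS_DIR_SKETCH;
	/* Every process starts out with legacy-sized tables. */
	ctx->mpx_table_shift = 0;
}

static void mpx_ctx_set_shift(struct mpx_mm_ctx *ctx, unsigned int shift)
{
	/* This series supports only legacy (0) or MAWA=9 tables. */
	assert(shift == 0 || shift == 9);
	ctx->mpx_table_shift = shift;
}
```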
--- b/arch/x86/include/asm/mmu.h | 1 + b/arch/x86/include/asm/mpx.h | 9 +++++++++ 2 files changed, 10 insertions(+) diff -puN arch/x86/include/asm/mmu.h~mawa-020-mmu_context-mawa arch/x86/include/asm/mmu.h --- a/arch/x86/include/asm/mmu.h~mawa-020-mmu_context-mawa 2017-01-26 14:31:32.643673297 -0800 +++ b/arch/x86/include/asm/mmu.h 2017-01-26 14:31:32.647673476 -0800 @@ -34,6 +34,7 @@ typedef struct { #ifdef CONFIG_X86_INTEL_MPX /* address of the bounds directory */ void __user *bd_addr; + int mpx_mawa; #endif } mm_context_t; diff -puN arch/x86/include/asm/mpx.h~mawa-020-mmu_context-mawa arch/x86/include/asm/mpx.h --- a/arch/x86/include/asm/mpx.h~mawa-020-mmu_context-mawa 2017-01-26 14:31:32.644673342 -0800 +++ b/arch/x86/include/asm/mpx.h 2017-01-26 14:31:32.648673521 -0800 @@ -68,6 +68,15 @@ static inline void mpx_mm_init(struct mm * directory, so point this at an invalid address. */ mm->context.bd_addr = MPX_INVALID_BOUNDS_DIR; + /* + * All processes start out in "legacy" MPX mode with + * MAWA=0. 
+ */ + mm->context.mpx_mawa = 0; +} +static inline int mpx_mawa_shift(struct mm_struct *mm) +{ + return mm->context.mpx_mawa; } void mpx_notify_unmap(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long start, unsigned long end); _ From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753303AbdAZWkO (ORCPT ); Thu, 26 Jan 2017 17:40:14 -0500 Received: from mga04.intel.com ([192.55.52.120]:59563 "EHLO mga04.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753126AbdAZWkL (ORCPT ); Thu, 26 Jan 2017 17:40:11 -0500 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.33,291,1477983600"; d="scan'208";a="1099422370" Subject: [RFC][PATCH 4/4] x86, mpx: context-switch new MPX address size MSR To: linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org, x86@kernel.org, Dave Hansen From: Dave Hansen Date: Thu, 26 Jan 2017 14:40:10 -0800 References: <20170126224005.A6BBEF2C@viggo.jf.intel.com> In-Reply-To: <20170126224005.A6BBEF2C@viggo.jf.intel.com> Message-Id: <20170126224010.3534C154@viggo.jf.intel.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org As mentioned in previous patches, larger address spaces mean larger MPX tables. But, the entire system is either entirely using 5-level paging, or not. We do not mix pagetable formats. If the size of the MPX tables depended solely on the paging mode, old binaries would break because the format of the tables changed underneath them. So, since CR4 never changes, but we need some way to change the MPX table format, a new MSR is introduced: MSR_IA32_MPX_LAX. If we are in 5-level paging mode *and* the enable bit in this MSR is set, the CPU will use the new, larger MPX bounds table format. If 5-level paging is disabled, or the enable bit is clear, then the legacy-style smaller tables will be used.
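Illustration (not part of the patch): a userspace model of the format rule described above, combined with a skip-if-unchanged switch along the lines of the inline check suggested in review. The per-mm mpx_lax_val field, the fake MSR and its write counter are hypothetical; no real MSR is touched. The model assumes the MSR always holds the previous mm's value, which is only true if every mm switch goes through this path; under that assumption a single compare-and-branch decides whether the slow path runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MPX_LAX_ENABLE_BIT 0x1u /* the only bit in use, per the patch */

#ifndef unlikely
#define unlikely(x) __builtin_expect(!!(x), 0) /* GCC/Clang branch hint */
#endif

/* Hypothetical per-mm copy of the value this mm wants in the MSR. */
struct mm_sketch {
	uint32_t mpx_lax_val; /* 0 or MPX_LAX_ENABLE_BIT */
};

/* Larger tables only with 5-level paging AND the enable bit set. */
static bool uses_large_mpx_tables(bool la57_enabled, uint32_t lax_msr)
{
	return la57_enabled && (lax_msr & MPX_LAX_ENABLE_BIT);
}

/* Simulated MSR plus a counter so the skipped writes are observable. */
static uint32_t fake_lax_msr;
static unsigned int msr_writes;

static void write_lax_msr(uint32_t val)
{
	fake_lax_msr = val;
	msr_writes++;
}

/* Out-of-line slow path: only reached when the two mms disagree. */
static void switch_mpx_slow(const struct mm_sketch *next)
{
	write_lax_msr(next->mpx_lax_val);
}

/* Inline fast path: one compare-and-branch in the common case. */
static inline void switch_mpx(const struct mm_sketch *prev,
			      const struct mm_sketch *next)
{
	if (unlikely(prev->mpx_lax_val != next->mpx_lax_val))
		switch_mpx_slow(next);
}
```

Non-MPX tasks all carry a zero value, so switching between them never reaches the MSR write.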
But, we might mix legacy and non-legacy binaries on the same system, so this MSR needs to be context-switched. Add code to do this, along with some simple optimizations to skip the MSR writes if the MSR does not need to be updated. --- b/arch/x86/include/asm/msr-index.h | 1 b/arch/x86/mm/tlb.c | 42 +++++++++++++++++++++++++++++++++++++ 2 files changed, 43 insertions(+) diff -puN arch/x86/include/asm/msr-index.h~mawa-050-context-switch-msr arch/x86/include/asm/msr-index.h --- a/arch/x86/include/asm/msr-index.h~mawa-050-context-switch-msr 2017-01-26 14:31:37.747902524 -0800 +++ b/arch/x86/include/asm/msr-index.h 2017-01-26 14:31:37.752902749 -0800 @@ -410,6 +410,7 @@ #define MSR_IA32_BNDCFGS 0x00000d90 #define MSR_IA32_XSS 0x00000da0 +#define MSR_IA32_MPX_LAX 0x00001000 #define FEATURE_CONTROL_LOCKED (1<<0) #define FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX (1<<1) diff -puN arch/x86/mm/tlb.c~mawa-050-context-switch-msr arch/x86/mm/tlb.c --- a/arch/x86/mm/tlb.c~mawa-050-context-switch-msr 2017-01-26 14:31:37.749902614 -0800 +++ b/arch/x86/mm/tlb.c 2017-01-26 14:31:37.753902794 -0800 @@ -71,6 +71,47 @@ void switch_mm(struct mm_struct *prev, s local_irq_restore(flags); } +/* + * The MPX tables change sizes based on the size of the virtual + * (aka. linear) address space. There is an MSR to tell the CPU + * whether we want the legacy-style ones or the larger ones when + * we are running with an eXtended virtual address space. + */ +static void switch_mawa(struct mm_struct *prev, struct mm_struct *next) +{ + /* + * Note: there is one and only one bit in use in the MSR + * at this time, so we do not have to be concerned with + * preserving any of the other bits. Just write 0 or 1. + */ + unsigned IA32_MPX_LAX_ENABLE_MASK = 0x00000001; + + if (!cpu_feature_enabled(X86_FEATURE_MPX)) + return; + /* + * FIXME: do we want a check here for the 5-level paging + * CR4 bit or CPUID bit, or is the mawa check below OK? + * It's not obvious what would be the fastest or if it + * matters.
+ */ + + /* + * Avoid the relatively costly MSR if we are not changing + * MAWA state. All processes not using MPX will have a + * mpx_mawa_shift()=0, so we do not need to check + * separately for whether MPX management is enabled. + */ + if (mpx_mawa_shift(prev) == mpx_mawa_shift(next)) + return; + + if (mpx_mawa_shift(next)) { + wrmsr(MSR_IA32_MPX_LAX, IA32_MPX_LAX_ENABLE_MASK, 0x0); + } else { + /* clear the enable bit: */ + wrmsr(MSR_IA32_MPX_LAX, 0x0, 0x0); + } +} + void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next, struct task_struct *tsk) { @@ -136,6 +177,7 @@ void switch_mm_irqs_off(struct mm_struct /* Load per-mm CR4 state */ load_mm_cr4(next); + switch_mawa(prev, next); #ifdef CONFIG_MODIFY_LDT_SYSCALL /* * Load the LDT, if the LDT is different. _ From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753557AbdAZWkV (ORCPT ); Thu, 26 Jan 2017 17:40:21 -0500 Received: from mga02.intel.com ([134.134.136.20]:10475 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752397AbdAZWkP (ORCPT ); Thu, 26 Jan 2017 17:40:15 -0500 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.33,291,1477983600"; d="scan'208";a="813769202" Subject: [RFC][PATCH 3/4] x86, mpx: extend MPX prctl() to pass in size of bounds directory To: linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org, x86@kernel.org, Dave Hansen From: Dave Hansen Date: Thu, 26 Jan 2017 14:40:09 -0800 References: <20170126224005.A6BBEF2C@viggo.jf.intel.com> In-Reply-To: <20170126224005.A6BBEF2C@viggo.jf.intel.com> Message-Id: <20170126224009.ECA68304@viggo.jf.intel.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org The MPX bounds tables are indexed by virtual address. A larger virtual address space means that we need larger tables. But, we need to ensure that userspace and the kernel agree about the size of these tables. 
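Illustration (not part of the patch): a sketch of the size check this patch adds, modeled as a pure function that maps the directory size passed by userspace onto a table-size shift, or -EINVAL on a mismatch. bd_size_to_shift() and its boolean parameters are hypothetical, and the opt-in checks still marked FIXME in the patch are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Sizes as used in this series. */
#define BD_SIZE_BYTES_32       (1ULL << 22) /* 32-bit directory: 4MB */
#define BD_BASE_SIZE_BYTES_64  (1ULL << 31) /* 64-bit legacy directory: 2GB */
#define MAWA_VALUE             9            /* only non-legacy value supported */

/*
 * Hypothetical model of the kernel-side check: returns the table-size
 * shift (>= 0) to record for the mm, or -EINVAL.
 */
static int bd_size_to_shift(unsigned long long bd_size, bool is_64bit_mm,
			    bool la57_enabled)
{
	if (bd_size == 0)                       /* legacy prctl() caller */
		return 0;
	if (bd_size == BD_SIZE_BYTES_32)        /* 32-bit address space */
		return is_64bit_mm ? -EINVAL : 0;
	if (bd_size == BD_BASE_SIZE_BYTES_64)   /* 48-bit address space */
		return is_64bit_mm ? 0 : -EINVAL;
	if (bd_size == (BD_BASE_SIZE_BYTES_64 << MAWA_VALUE)) /* 57-bit */
		return (is_64bit_mm && la57_enabled) ? MAWA_VALUE : -EINVAL;
	return -EINVAL;                         /* any other size */
}
```

Under this RFC, userspace would reach the real check through prctl(PR_MPX_ENABLE_MANAGEMENT, bd_size, 0, 0, 0); passing 0 keeps the legacy behavior.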
To do this, we require that userspace pass in the size of the tables if they want a non-legacy size. They do this with a previously unused (required to be 0) argument to the PR_MPX_ENABLE_MANAGEMENT prctl(). This way, the kernel can make sure that the size of the tables is consistent with the size of the address space and can return an error if there is a mismatch. There are essentially 3 table sizes that matter: 1. 32-bit table sized for a 32-bit address space 2. 64-bit table sized for a 48-bit address space 3. 64-bit table sized for a 57-bit address space We cover all three of those cases. FIXME: we also need to ensure that we check the current state of the larger address space opt-in. If we've opted in to larger address spaces we cannot allow a small bounds directory to be used. Also, if we've not opted in, we cannot allow the larger bounds directory to be used. --- b/arch/x86/include/asm/mpx.h | 5 +++ b/arch/x86/include/asm/processor.h | 6 ++-- b/arch/x86/mm/mpx.c | 54 +++++++++++++++++++++++++++++++++++-- b/arch/x86/mm/pgtable.c | 2 - b/kernel/sys.c | 6 ++-- 5 files changed, 64 insertions(+), 9 deletions(-) diff -puN arch/x86/include/asm/mpx.h~mawa-040-prctl-set-mawa arch/x86/include/asm/mpx.h --- a/arch/x86/include/asm/mpx.h~mawa-040-prctl-set-mawa 2017-01-26 14:31:33.564714660 -0800 +++ b/arch/x86/include/asm/mpx.h 2017-01-26 14:31:33.574715109 -0800 @@ -40,6 +40,11 @@ #define MPX_BD_LEGACY_NR_ENTRIES_64 (1UL<<28) /* + * We only support one value for MAWA + */ +#define MPX_MAWA_VALUE 9 + +/* * The 32-bit directory is 4MB (2^22) in size, and with 4-byte * entries it has 2^20 entries.
*/ diff -puN arch/x86/include/asm/processor.h~mawa-040-prctl-set-mawa arch/x86/include/asm/processor.h --- a/arch/x86/include/asm/processor.h~mawa-040-prctl-set-mawa 2017-01-26 14:31:33.566714750 -0800 +++ b/arch/x86/include/asm/processor.h 2017-01-26 14:31:33.575715154 -0800 @@ -863,14 +863,14 @@ extern int get_tsc_mode(unsigned long ad extern int set_tsc_mode(unsigned int val); /* Register/unregister a process' MPX related resource */ -#define MPX_ENABLE_MANAGEMENT() mpx_enable_management() +#define MPX_ENABLE_MANAGEMENT(bd_size) mpx_enable_management(bd_size) #define MPX_DISABLE_MANAGEMENT() mpx_disable_management() #ifdef CONFIG_X86_INTEL_MPX -extern int mpx_enable_management(void); +extern int mpx_enable_management(unsigned long bd_size); extern int mpx_disable_management(void); #else -static inline int mpx_enable_management(void) +static inline int mpx_enable_management(unsigned long bd_size) { return -EINVAL; } diff -puN arch/x86/mm/mpx.c~mawa-040-prctl-set-mawa arch/x86/mm/mpx.c --- a/arch/x86/mm/mpx.c~mawa-040-prctl-set-mawa 2017-01-26 14:31:33.567714795 -0800 +++ b/arch/x86/mm/mpx.c 2017-01-26 14:31:33.575715154 -0800 @@ -339,7 +339,54 @@ static __user void *mpx_get_bounds_dir(v (bndcsr->bndcfgu & MPX_BNDCFG_ADDR_MASK); } -int mpx_enable_management(void) +int mpx_set_mm_bd_size(unsigned long bd_size) +{ + struct mm_struct *mm = current->mm; + + switch ((unsigned long long)bd_size) { + case 0: + /* Legacy call to prctl(): */ + mm->context.mpx_mawa = 0; + return 0; + case MPX_BD_SIZE_BYTES_32: + /* 32-bit, legacy-sized bounds directory: */ + if (is_64bit_mm(mm)) + return -EINVAL; + mm->context.mpx_mawa = 0; + return 0; + case MPX_BD_BASE_SIZE_BYTES_64: + /* 64-bit, legacy-sized bounds directory: */ + if (!is_64bit_mm(mm) + // FIXME && ! opted-in to larger address space + ) + return -EINVAL; + mm->context.mpx_mawa = 0; + return 0; + case MPX_BD_BASE_SIZE_BYTES_64 << MPX_MAWA_VALUE: + /* + * Non-legacy call, with larger directory. 
+ * Note that there is no 32-bit equivalent for + * this case since its address space does not + * change sizes. + */ + if (!is_64bit_mm(mm)) + return -EINVAL; + /* + * Do not let this be enabled unless we are on + * 5-level hardware *and* have that feature + * enabled. FIXME: need runtime check + */ + if (!cpu_feature_enabled(X86_FEATURE_LA57) + // FIXME && opted into larger address space + ) + return -EINVAL; + mm->context.mpx_mawa = MPX_MAWA_VALUE; + return 0; + } + return -EINVAL; +} + +int mpx_enable_management(unsigned long bd_size) { void __user *bd_base = MPX_INVALID_BOUNDS_DIR; struct mm_struct *mm = current->mm; @@ -358,10 +405,13 @@ int mpx_enable_management(void) */ bd_base = mpx_get_bounds_dir(); down_write(&mm->mmap_sem); + ret = mpx_set_mm_bd_size(bd_size); + if (ret) + goto out; mm->context.bd_addr = bd_base; if (mm->context.bd_addr == MPX_INVALID_BOUNDS_DIR) ret = -ENXIO; - +out: up_write(&mm->mmap_sem); return ret; } diff -puN arch/x86/mm/pgtable.c~mawa-040-prctl-set-mawa arch/x86/mm/pgtable.c --- a/arch/x86/mm/pgtable.c~mawa-040-prctl-set-mawa 2017-01-26 14:31:33.569714885 -0800 +++ b/arch/x86/mm/pgtable.c 2017-01-26 14:31:33.575715154 -0800 @@ -85,7 +85,7 @@ void ___pud_free_tlb(struct mmu_gather * #if CONFIG_PGTABLE_LEVELS > 4 void ___p4d_free_tlb(struct mmu_gather *tlb, p4d_t *p4d) { - paravirt_release_p4d(__pa(p4d) >> PAGE_SHIFT); + //paravirt_release_p4d(__pa(p4d) >> PAGE_SHIFT); tlb_remove_page(tlb, virt_to_page(p4d)); } #endif /* CONFIG_PGTABLE_LEVELS > 4 */ diff -puN kernel/sys.c~mawa-040-prctl-set-mawa kernel/sys.c --- a/kernel/sys.c~mawa-040-prctl-set-mawa 2017-01-26 14:31:33.571714974 -0800 +++ b/kernel/sys.c 2017-01-26 14:31:33.576715199 -0800 @@ -92,7 +92,7 @@ # define SET_TSC_CTL(a) (-EINVAL) #endif #ifndef MPX_ENABLE_MANAGEMENT -# define MPX_ENABLE_MANAGEMENT() (-EINVAL) +# define MPX_ENABLE_MANAGEMENT(bd_size) (-EINVAL) #endif #ifndef MPX_DISABLE_MANAGEMENT # define MPX_DISABLE_MANAGEMENT() (-EINVAL) @@ -2246,9 +2246,9 @@
SYSCALL_DEFINE5(prctl, int, option, unsi up_write(&me->mm->mmap_sem); break; case PR_MPX_ENABLE_MANAGEMENT: - if (arg2 || arg3 || arg4 || arg5) + if (arg3 || arg4 || arg5) return -EINVAL; - error = MPX_ENABLE_MANAGEMENT(); + error = MPX_ENABLE_MANAGEMENT(arg2); break; case PR_MPX_DISABLE_MANAGEMENT: if (arg2 || arg3 || arg4 || arg5) _ From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754367AbdA0I1K (ORCPT ); Fri, 27 Jan 2017 03:27:10 -0500 Received: from mail-wm0-f67.google.com ([74.125.82.67]:35340 "EHLO mail-wm0-f67.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754205AbdA0I1G (ORCPT ); Fri, 27 Jan 2017 03:27:06 -0500 Date: Fri, 27 Jan 2017 09:16:54 +0100 From: Ingo Molnar To: Dave Hansen Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, x86@kernel.org, Thomas Gleixner , "H.
Peter Anvin" , Peter Zijlstra Subject: Re: [RFC][PATCH 0/4] x86, mpx: Support larger address space (MAWA) Message-ID: <20170127081654.GA25162@gmail.com> References: <20170126224005.A6BBEF2C@viggo.jf.intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <20170126224005.A6BBEF2C@viggo.jf.intel.com> User-Agent: Mutt/1.5.24 (2015-08-30) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org * Dave Hansen wrote: > Kirill is chugging right along getting his 5-level paging[1] patch set > ready to be merged. I figured I'd share an early draft of the MPX > support that will go along with it. > > Background: there is a lot more detail about what bounds tables are in > the changelog for fe3d197f843. But, basically MPX bounds tables help > us to store the ranges to which a pointer is allowed to point. The > tables are walked by hardware and they are indexed by the virtual > address of the pointer being checked. > > A larger virtual address space (from 5-level paging) means that we > need larger tables. 5-level paging hardware includes a feature called > MPX Address-Width Adjust (MAWA) that grows the bounds tables so they > can address the new address space. MAWA is controlled independently > from the paging mode (via an MSR) so that old MPX binaries can run on > new hardware and kernels supporting 5-level paging. > > But, since userspace is responsible for allocating the table that is > growing (the directory), we need to ensure that userspace and the > kernel agree about the size of these tables and the kernel can set the > MSR appropriately. > > These are not quite ready to get applied anywhere, but I don't expect > the basics to change unless folks have big problems with this. The > only big remaining piece of work is to update the MPX selftest code.
> > Dave Hansen (4): > x86, mpx: introduce per-mm MPX table size tracking > x86, mpx: update MPX to grok larger bounds tables > x86, mpx: extend MPX prctl() to pass in size of bounds directory > x86, mpx: context-switch new MPX address size MSR On a related note, the MPX testcases seem to have gone from the tools/testing/selftests/x86/Makefile (possibly a merge mishap - the original commit adds it correctly), so they are not being built. Plus I noticed that the pkeys testcases are producing a lot of noise: triton:~/tip/tools/testing/selftests/x86> make [...] gcc -m64 -o protection_keys_64 -O2 -g -std=gnu99 -pthread -Wall protection_keys.c -lrt -ldl protection_keys.c: In function ‘setup_hugetlbfs’: protection_keys.c:816:6: warning: unused variable ‘i’ [-Wunused-variable] int i; ^ protection_keys.c:815:6: warning: unused variable ‘validated_nr_pages’ [-Wunused-variable] int validated_nr_pages; ^ protection_keys.c: In function ‘test_pkey_syscalls_bad_args’: protection_keys.c:1136:6: warning: unused variable ‘bad_flag’ [-Wunused-variable] int bad_flag = (PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE) + 1; ^ protection_keys.c: In function ‘test_pkey_alloc_exhaust’: protection_keys.c:1153:16: warning: unused variable ‘init_val’ [-Wunused-variable] unsigned long init_val; ^ protection_keys.c:1152:16: warning: unused variable ‘flags’ [-Wunused-variable] unsigned long flags; ^ In file included from protection_keys.c:45:0: pkey-helpers.h: In function ‘sigsafe_printf’: pkey-helpers.h:41:3: warning: ignoring return value of ‘write’, declared with attribute warn_unused_result [-Wunused-result] write(1, dprint_in_signal_buffer, len); ^ protection_keys.c: In function ‘dumpit’: protection_keys.c:407:3: warning: ignoring return value of ‘write’, declared with attribute warn_unused_result [-Wunused-result] write(1, buf, nr_read); ^ protection_keys.c: In function ‘pkey_disable_set’: protection_keys.c:68:5: warning: ‘orig_pkru’ may be used uninitialized in this function 
[-Wmaybe-uninitialized] if (!(condition)) { \ ^ protection_keys.c:465:6: note: ‘orig_pkru’ was declared here u32 orig_pkru; ^ [...] Thanks, Ingo