From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pa0-f43.google.com (mail-pa0-f43.google.com [209.85.220.43]) by kanga.kvack.org (Postfix) with ESMTP id 20A2E6B0037 for ; Wed, 18 Jun 2014 10:41:17 -0400 (EDT) Received: by mail-pa0-f43.google.com with SMTP id lf10so815913pab.16 for ; Wed, 18 Jun 2014 07:41:16 -0700 (PDT) Received: from mga02.intel.com (mga02.intel.com. [134.134.136.20]) by mx.google.com with ESMTP id mi7si2438151pab.136.2014.06.18.07.41.15 for ; Wed, 18 Jun 2014 07:41:16 -0700 (PDT) Message-ID: <53A1A509.4080802@intel.com> Date: Wed, 18 Jun 2014 07:41:13 -0700 From: Dave Hansen MIME-Version: 1.0 Subject: Re: [PATCH v6 00/10] Intel MPX support References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> In-Reply-To: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Qiaowei Ren , "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar Cc: x86@kernel.org, linux-kernel@vger.kernel.org, Linux-MM On 06/18/2014 02:44 AM, Qiaowei Ren wrote: > This patchset adds support for the Memory Protection Extensions > (MPX) feature found in future Intel processors. It's very important to note that this is a very different patch set than the last one. The way we are freeing the unused bounds tables is _completely different (9/10), and needs some very heavy mm reviews. I'm sure Qiaowei will cc linux-mm@ next time. We're also not asking that this be merged in its current state. The 32-bit binary on 64-bit kernel issue is a show stopper for merging, but we're trying to post early and often. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pa0-f50.google.com (mail-pa0-f50.google.com [209.85.220.50]) by kanga.kvack.org (Postfix) with ESMTP id 4BAD56B0031 for ; Mon, 23 Jun 2014 15:49:11 -0400 (EDT) Received: by mail-pa0-f50.google.com with SMTP id bj1so6254902pad.37 for ; Mon, 23 Jun 2014 12:49:11 -0700 (PDT) Received: from mail-pd0-f172.google.com (mail-pd0-f172.google.com [209.85.192.172]) by mx.google.com with ESMTPS id ba10si23317382pbd.48.2014.06.23.12.49.09 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Mon, 23 Jun 2014 12:49:10 -0700 (PDT) Received: by mail-pd0-f172.google.com with SMTP id w10so6065611pde.31 for ; Mon, 23 Jun 2014 12:49:09 -0700 (PDT) From: Andy Lutomirski Message-ID: <53A884B2.5070702@mit.edu> Date: Mon, 23 Jun 2014 12:49:06 -0700 MIME-Version: 1.0 Subject: Re: [PATCH v6 02/10] x86, mpx: add MPX specific mmap interface References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> <1403084656-27284-3-git-send-email-qiaowei.ren@intel.com> In-Reply-To: <1403084656-27284-3-git-send-email-qiaowei.ren@intel.com> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Qiaowei Ren , "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , Dave Hansen Cc: x86@kernel.org, linux-kernel@vger.kernel.org, Linux MM On 06/18/2014 02:44 AM, Qiaowei Ren wrote: > This patch adds one MPX specific mmap interface, which only handles > mpx related maps, including bounds table and bounds directory. > > In order to track MPX specific memory usage, this interface is added > to stick new vm_flag VM_MPX in the vma_area_struct when create a > bounds table or bounds directory. I imagine the linux-mm people would want to think about any new vm flag. Why is this needed? > > Signed-off-by: Qiaowei Ren > --- > arch/x86/Kconfig | 4 +++ > arch/x86/include/asm/mpx.h | 38 ++++++++++++++++++++++++++++ > arch/x86/mm/Makefile | 2 + > arch/x86/mm/mpx.c | 58 ++++++++++++++++++++++++++++++++++++++++++++ > 4 files changed, 102 insertions(+), 0 deletions(-) > create mode 100644 arch/x86/include/asm/mpx.h > create mode 100644 arch/x86/mm/mpx.c > > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig > index 25d2c6f..0194790 100644 > --- a/arch/x86/Kconfig > +++ b/arch/x86/Kconfig > @@ -237,6 +237,10 @@ config HAVE_INTEL_TXT > def_bool y > depends on INTEL_IOMMU && ACPI > > +config X86_INTEL_MPX > + def_bool y > + depends on CPU_SUP_INTEL > + > config X86_32_SMP > def_bool y > depends on X86_32 && SMP > diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h > new file mode 100644 > index 0000000..5725ac4 > --- /dev/null > +++ b/arch/x86/include/asm/mpx.h > @@ -0,0 +1,38 @@ > +#ifndef _ASM_X86_MPX_H > +#define _ASM_X86_MPX_H > + > +#include > +#include > + > +#ifdef CONFIG_X86_64 > + > +/* upper 28 bits [47:20] of the virtual address in 64-bit used to > + * index into bounds directory (BD). > + */ > +#define MPX_BD_ENTRY_OFFSET 28 > +#define MPX_BD_ENTRY_SHIFT 3 > +/* bits [19:3] of the virtual address in 64-bit used to index into > + * bounds table (BT). > + */ > +#define MPX_BT_ENTRY_OFFSET 17 > +#define MPX_BT_ENTRY_SHIFT 5 > +#define MPX_IGN_BITS 3 > + > +#else > + > +#define MPX_BD_ENTRY_OFFSET 20 > +#define MPX_BD_ENTRY_SHIFT 2 > +#define MPX_BT_ENTRY_OFFSET 10 > +#define MPX_BT_ENTRY_SHIFT 4 > +#define MPX_IGN_BITS 2 > + > +#endif > + > +#define MPX_BD_SIZE_BYTES (1UL<<(MPX_BD_ENTRY_OFFSET+MPX_BD_ENTRY_SHIFT)) > +#define MPX_BT_SIZE_BYTES (1UL<<(MPX_BT_ENTRY_OFFSET+MPX_BT_ENTRY_SHIFT)) > + > +#define MPX_BNDSTA_ERROR_CODE 0x3 > + > +unsigned long mpx_mmap(unsigned long len); > + > +#endif /* _ASM_X86_MPX_H */ > diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile > index 6a19ad9..ecfdc46 100644 > --- a/arch/x86/mm/Makefile > +++ b/arch/x86/mm/Makefile > @@ -30,3 +30,5 @@ obj-$(CONFIG_ACPI_NUMA) += srat.o > obj-$(CONFIG_NUMA_EMU) += numa_emulation.o > > obj-$(CONFIG_MEMTEST) += memtest.o > + > +obj-$(CONFIG_X86_INTEL_MPX) += mpx.o > diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c > new file mode 100644 > index 0000000..546c5d1 > --- /dev/null > +++ b/arch/x86/mm/mpx.c > @@ -0,0 +1,58 @@ > +#include > +#include > +#include > +#include > +#include > + > +/* > + * this is really a simplified "vm_mmap". it only handles mpx > + * related maps, including bounds table and bounds directory. > + * > + * here we can stick new vm_flag VM_MPX in the vma_area_struct > + * when create a bounds table or bounds directory, in order to > + * track MPX specific memory. > + */ > +unsigned long mpx_mmap(unsigned long len) > +{ > + unsigned long ret; > + unsigned long addr, pgoff; > + struct mm_struct *mm = current->mm; > + vm_flags_t vm_flags; > + > + /* Only bounds table and bounds directory can be allocated here */ > + if (len != MPX_BD_SIZE_BYTES && len != MPX_BT_SIZE_BYTES) > + return -EINVAL; > + > + down_write(&mm->mmap_sem); > + > + /* Too many mappings? */ > + if (mm->map_count > sysctl_max_map_count) { > + ret = -ENOMEM; > + goto out; > + } > + > + /* Obtain the address to map to. we verify (or select) it and ensure > + * that it represents a valid section of the address space. > + */ > + addr = get_unmapped_area(NULL, 0, len, 0, MAP_ANONYMOUS | MAP_PRIVATE); > + if (addr & ~PAGE_MASK) { > + ret = addr; > + goto out; > + } > + > + vm_flags = VM_READ | VM_WRITE | VM_MPX | > + mm->def_flags | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC; > + > + /* Make bounds tables and bouds directory unlocked. */ > + if (vm_flags & VM_LOCKED) > + vm_flags &= ~VM_LOCKED; Why? I would expect MCL_FUTURE to lock these. --Andy -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pa0-f50.google.com (mail-pa0-f50.google.com [209.85.220.50]) by kanga.kvack.org (Postfix) with ESMTP id 5702B6B0037 for ; Mon, 23 Jun 2014 16:03:49 -0400 (EDT) Received: by mail-pa0-f50.google.com with SMTP id bj1so6330314pad.9 for ; Mon, 23 Jun 2014 13:03:49 -0700 (PDT) Received: from mga02.intel.com (mga02.intel.com. [134.134.136.20]) by mx.google.com with ESMTP id qd5si23311218pbb.211.2014.06.23.13.03.48 for ; Mon, 23 Jun 2014 13:03:48 -0700 (PDT) Message-ID: <53A88806.1060908@intel.com> Date: Mon, 23 Jun 2014 13:03:18 -0700 From: Dave Hansen MIME-Version: 1.0 Subject: Re: [PATCH v6 02/10] x86, mpx: add MPX specific mmap interface References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> <1403084656-27284-3-git-send-email-qiaowei.ren@intel.com> <53A884B2.5070702@mit.edu> In-Reply-To: <53A884B2.5070702@mit.edu> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Andy Lutomirski , Qiaowei Ren , "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar Cc: x86@kernel.org, linux-kernel@vger.kernel.org, Linux MM On 06/23/2014 12:49 PM, Andy Lutomirski wrote: > On 06/18/2014 02:44 AM, Qiaowei Ren wrote: >> This patch adds one MPX specific mmap interface, which only handles >> mpx related maps, including bounds table and bounds directory. >> >> In order to track MPX specific memory usage, this interface is added >> to stick new vm_flag VM_MPX in the vma_area_struct when create a >> bounds table or bounds directory. > > I imagine the linux-mm people would want to think about any new vm flag. > Why is this needed? These tables can take huge amounts of memory. In the worst-case scenario, the tables can be 4x the size of the data structure being tracked. IOW, a 1-page structure can require 4 bounds-table pages. My expectation is that folks using MPX are going to be keen on figuring out how much memory is being dedicated to it. With this feature, plus some grepping in /proc/$pid/smaps one could take a pretty good stab at it. I know VM flags are scarce, and I'm open to other ways to skin this cat. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-lb0-f179.google.com (mail-lb0-f179.google.com [209.85.217.179]) by kanga.kvack.org (Postfix) with ESMTP id 1AB0F6B0037 for ; Mon, 23 Jun 2014 16:07:00 -0400 (EDT) Received: by mail-lb0-f179.google.com with SMTP id z11so5208071lbi.38 for ; Mon, 23 Jun 2014 13:06:59 -0700 (PDT) Received: from mail-lb0-f181.google.com (mail-lb0-f181.google.com [209.85.217.181]) by mx.google.com with ESMTPS id nc2si35730842lbc.35.2014.06.23.13.06.57 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Mon, 23 Jun 2014 13:06:58 -0700 (PDT) Received: by mail-lb0-f181.google.com with SMTP id p9so5144516lbv.26 for ; Mon, 23 Jun 2014 13:06:57 -0700 (PDT) MIME-Version: 1.0 In-Reply-To: <53A88806.1060908@intel.com> References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> <1403084656-27284-3-git-send-email-qiaowei.ren@intel.com> <53A884B2.5070702@mit.edu> <53A88806.1060908@intel.com> From: Andy Lutomirski Date: Mon, 23 Jun 2014 13:06:37 -0700 Message-ID: Subject: Re: [PATCH v6 02/10] x86, mpx: add MPX specific mmap interface Content-Type: text/plain; charset=UTF-8 Sender: owner-linux-mm@kvack.org List-ID: To: Dave Hansen Cc: Qiaowei Ren , "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , X86 ML , "linux-kernel@vger.kernel.org" , Linux MM On Mon, Jun 23, 2014 at 1:03 PM, Dave Hansen wrote: > On 06/23/2014 12:49 PM, Andy Lutomirski wrote: >> On 06/18/2014 02:44 AM, Qiaowei Ren wrote: >>> This patch adds one MPX specific mmap interface, which only handles >>> mpx related maps, including bounds table and bounds directory. >>> >>> In order to track MPX specific memory usage, this interface is added >>> to stick new vm_flag VM_MPX in the vma_area_struct when create a >>> bounds table or bounds directory. >> >> I imagine the linux-mm people would want to think about any new vm flag. >> Why is this needed? > > These tables can take huge amounts of memory. In the worst-case > scenario, the tables can be 4x the size of the data structure being > tracked. IOW, a 1-page structure can require 4 bounds-table pages. > > My expectation is that folks using MPX are going to be keen on figuring > out how much memory is being dedicated to it. With this feature, plus > some grepping in /proc/$pid/smaps one could take a pretty good stab at it. > > I know VM flags are scarce, and I'm open to other ways to skin this cat. > Can the new vm_operation "name" be use for this? The magic "always written to core dumps" feature might need to be reconsidered. There's also arch_vma_name, but I just finished removing for x86, and I'd be a little sad to see it come right back. --Andy -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pa0-f42.google.com (mail-pa0-f42.google.com [209.85.220.42]) by kanga.kvack.org (Postfix) with ESMTP id 455A76B0031 for ; Mon, 23 Jun 2014 16:28:36 -0400 (EDT) Received: by mail-pa0-f42.google.com with SMTP id lj1so6331491pab.1 for ; Mon, 23 Jun 2014 13:28:35 -0700 (PDT) Received: from mga09.intel.com (mga09.intel.com. [134.134.136.24]) by mx.google.com with ESMTP id ym1si23520540pac.25.2014.06.23.13.28.35 for ; Mon, 23 Jun 2014 13:28:35 -0700 (PDT) Message-ID: <53A88DE4.8050107@intel.com> Date: Mon, 23 Jun 2014 13:28:20 -0700 From: Dave Hansen MIME-Version: 1.0 Subject: Re: [PATCH v6 02/10] x86, mpx: add MPX specific mmap interface References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> <1403084656-27284-3-git-send-email-qiaowei.ren@intel.com> <53A884B2.5070702@mit.edu> <53A88806.1060908@intel.com> In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Andy Lutomirski Cc: Qiaowei Ren , "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , X86 ML , "linux-kernel@vger.kernel.org" , Linux MM On 06/23/2014 01:06 PM, Andy Lutomirski wrote: > Can the new vm_operation "name" be use for this? The magic "always > written to core dumps" feature might need to be reconsidered. One thing I'd like to avoid is an MPX vma getting merged with a non-MPX vma. I don't see any code to prevent two VMAs with different vm_ops->names from getting merged. That seems like a bit of a design oversight for ->name. Right? Thinking out loud a bit... There are also some more complicated but more performant cleanup mechanisms that I'd like to go after in the future. Given a page, we might want to figure out if it is an MPX page or not. I wonder if we'll ever collide with some other user of vm_ops->name. It looks fairly narrowly used at the moment, but would this keep us from putting these pages on, say, a tmpfs mount? Doesn't look that way at the moment. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-lb0-f176.google.com (mail-lb0-f176.google.com [209.85.217.176]) by kanga.kvack.org (Postfix) with ESMTP id CC80C6B0031 for ; Mon, 23 Jun 2014 17:04:44 -0400 (EDT) Received: by mail-lb0-f176.google.com with SMTP id w7so5355199lbi.35 for ; Mon, 23 Jun 2014 14:04:44 -0700 (PDT) Received: from mail-lb0-f179.google.com (mail-lb0-f179.google.com [209.85.217.179]) by mx.google.com with ESMTPS id bm6si27840472lbb.30.2014.06.23.14.04.42 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Mon, 23 Jun 2014 14:04:43 -0700 (PDT) Received: by mail-lb0-f179.google.com with SMTP id z11so5291756lbi.38 for ; Mon, 23 Jun 2014 14:04:42 -0700 (PDT) MIME-Version: 1.0 In-Reply-To: <53A88DE4.8050107@intel.com> References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> <1403084656-27284-3-git-send-email-qiaowei.ren@intel.com> <53A884B2.5070702@mit.edu> <53A88806.1060908@intel.com> <53A88DE4.8050107@intel.com> From: Andy Lutomirski Date: Mon, 23 Jun 2014 14:04:22 -0700 Message-ID: Subject: Re: [PATCH v6 02/10] x86, mpx: add MPX specific mmap interface Content-Type: text/plain; charset=UTF-8 Sender: owner-linux-mm@kvack.org List-ID: To: Dave Hansen Cc: Qiaowei Ren , "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , X86 ML , "linux-kernel@vger.kernel.org" , Linux MM On Mon, Jun 23, 2014 at 1:28 PM, Dave Hansen wrote: > On 06/23/2014 01:06 PM, Andy Lutomirski wrote: >> Can the new vm_operation "name" be use for this? The magic "always >> written to core dumps" feature might need to be reconsidered. > > One thing I'd like to avoid is an MPX vma getting merged with a non-MPX > vma. I don't see any code to prevent two VMAs with different > vm_ops->names from getting merged. That seems like a bit of a design > oversight for ->name. Right? AFAIK there are no ->name users that don't also set ->close, for exactly that reason. I'd be okay with adding a check for ->name, too. Hmm. If MPX vmas had a real struct file attached, this would all come for free. Maybe vmas with non-default vm_ops and file != NULL should never be mergeable? > > Thinking out loud a bit... There are also some more complicated but more > performant cleanup mechanisms that I'd like to go after in the future. > Given a page, we might want to figure out if it is an MPX page or not. > I wonder if we'll ever collide with some other user of vm_ops->name. It > looks fairly narrowly used at the moment, but would this keep us from > putting these pages on, say, a tmpfs mount? Doesn't look that way at > the moment. You could always check the vm_ops pointer to see if it's MPX. One feature I've wanted: a way to have special per-process vmas that can be easily found. For example, I want to be able to efficiently find out where the vdso and vvar vmas are. I don't think this is currently supported. --Andy -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pb0-f44.google.com (mail-pb0-f44.google.com [209.85.160.44]) by kanga.kvack.org (Postfix) with ESMTP id 548676B0037 for ; Mon, 23 Jun 2014 22:53:19 -0400 (EDT) Received: by mail-pb0-f44.google.com with SMTP id md12so6617096pbc.17 for ; Mon, 23 Jun 2014 19:53:18 -0700 (PDT) Received: from mga03.intel.com (mga03.intel.com. [143.182.124.21]) by mx.google.com with ESMTP id rm10si24367710pab.197.2014.06.23.19.53.17 for ; Mon, 23 Jun 2014 19:53:18 -0700 (PDT) From: "Ren, Qiaowei" Subject: RE: [PATCH v6 02/10] x86, mpx: add MPX specific mmap interface Date: Tue, 24 Jun 2014 02:53:12 +0000 Message-ID: <9E0BE1322F2F2246BD820DA9FC397ADE016AF2DB@shsmsx102.ccr.corp.intel.com> References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> <1403084656-27284-3-git-send-email-qiaowei.ren@intel.com> <53A884B2.5070702@mit.edu> In-Reply-To: <53A884B2.5070702@mit.edu> Content-Language: en-US Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Sender: owner-linux-mm@kvack.org List-ID: To: Andy Lutomirski , "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , "Hansen, Dave" Cc: "x86@kernel.org" , "linux-kernel@vger.kernel.org" , Linux MM On 2014-06-24, Andy Lutomirski wrote: >> + /* Make bounds tables and bouds directory unlocked. */ >> + if (vm_flags & VM_LOCKED) >> + vm_flags &=3D ~VM_LOCKED; >=20 > Why? I would expect MCL_FUTURE to lock these. >=20 Andy, I was just a little confused about LOCKED & POPULATE earlier and I th= ought VM_LOCKED is not necessary for MPX specific bounds tables. Now, this = checking should be removed, and there should be mm_populate() for VM_LOCKED= case after mmap_region(): if (!IS_ERR_VALUE(addr) && (vm_flags & VM_LOCKED)) mm_populate(addr, len); Thanks, Qiaowei -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pd0-f178.google.com (mail-pd0-f178.google.com [209.85.192.178]) by kanga.kvack.org (Postfix) with ESMTP id D12966B0031 for ; Tue, 24 Jun 2014 01:58:12 -0400 (EDT) Received: by mail-pd0-f178.google.com with SMTP id r10so6528185pdi.23 for ; Mon, 23 Jun 2014 22:58:12 -0700 (PDT) Received: from mga01.intel.com (mga01.intel.com. [192.55.52.88]) by mx.google.com with ESMTP id nx10si24712682pbb.197.2014.06.23.22.58.11 for ; Mon, 23 Jun 2014 22:58:11 -0700 (PDT) From: "Ren, Qiaowei" Subject: RE: [PATCH v6 02/10] x86, mpx: add MPX specific mmap interface Date: Tue, 24 Jun 2014 05:53:46 +0000 Message-ID: <9E0BE1322F2F2246BD820DA9FC397ADE016AF41C@shsmsx102.ccr.corp.intel.com> References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> <1403084656-27284-3-git-send-email-qiaowei.ren@intel.com> <53A884B2.5070702@mit.edu> <53A88806.1060908@intel.com> <53A88DE4.8050107@intel.com> In-Reply-To: Content-Language: en-US Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: base64 MIME-Version: 1.0 Sender: owner-linux-mm@kvack.org List-ID: To: Andy Lutomirski , "Hansen, Dave" Cc: "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , X86 ML , "linux-kernel@vger.kernel.org" , Linux MM T24gMjAxNC0wNi0yNCwgQW5keSBMdXRvbWlyc2tpIHdyb3RlOg0KPj4gT24gMDYvMjMvMjAxNCAw MTowNiBQTSwgQW5keSBMdXRvbWlyc2tpIHdyb3RlOg0KPj4+IENhbiB0aGUgbmV3IHZtX29wZXJh dGlvbiAibmFtZSIgYmUgdXNlIGZvciB0aGlzPyAgVGhlIG1hZ2ljICJhbHdheXMNCj4+PiB3cml0 dGVuIHRvIGNvcmUgZHVtcHMiIGZlYXR1cmUgbWlnaHQgbmVlZCB0byBiZSByZWNvbnNpZGVyZWQu DQo+PiANCj4+IE9uZSB0aGluZyBJJ2QgbGlrZSB0byBhdm9pZCBpcyBhbiBNUFggdm1hIGdldHRp bmcgbWVyZ2VkIHdpdGggYQ0KPj4gbm9uLU1QWCB2bWEuICBJIGRvbid0IHNlZSBhbnkgY29kZSB0 byBwcmV2ZW50IHR3byBWTUFzIHdpdGgNCj4+IGRpZmZlcmVudCB2bV9vcHMtPm5hbWVzIGZyb20g Z2V0dGluZyBtZXJnZWQuICBUaGF0IHNlZW1zIGxpa2UgYSBiaXQNCj4+IG9mIGEgZGVzaWduIG92 ZXJzaWdodCBmb3IgLT5uYW1lLiAgUmlnaHQ/DQo+IA0KPiBBRkFJSyB0aGVyZSBhcmUgbm8gLT5u YW1lIHVzZXJzIHRoYXQgZG9uJ3QgYWxzbyBzZXQgLT5jbG9zZSwgZm9yDQo+IGV4YWN0bHkgdGhh dCByZWFzb24uICBJJ2QgYmUgb2theSB3aXRoIGFkZGluZyBhIGNoZWNrIGZvciAtPm5hbWUsIHRv by4NCj4gDQo+IEhtbS4gIElmIE1QWCB2bWFzIGhhZCBhIHJlYWwgc3RydWN0IGZpbGUgYXR0YWNo ZWQsIHRoaXMgd291bGQgYWxsIGNvbWUNCj4gZm9yIGZyZWUuIE1heWJlIHZtYXMgd2l0aCBub24t ZGVmYXVsdCB2bV9vcHMgYW5kIGZpbGUgIT0gTlVMTCBzaG91bGQNCj4gbmV2ZXIgYmUgbWVyZ2Vh YmxlPw0KPiANCj4+IA0KPj4gVGhpbmtpbmcgb3V0IGxvdWQgYSBiaXQuLi4gVGhlcmUgYXJlIGFs c28gc29tZSBtb3JlIGNvbXBsaWNhdGVkIGJ1dA0KPj4gbW9yZSBwZXJmb3JtYW50IGNsZWFudXAg bWVjaGFuaXNtcyB0aGF0IEknZCBsaWtlIHRvIGdvIGFmdGVyIGluIHRoZSBmdXR1cmUuDQo+PiBH aXZlbiBhIHBhZ2UsIHdlIG1pZ2h0IHdhbnQgdG8gZmlndXJlIG91dCBpZiBpdCBpcyBhbiBNUFgg cGFnZSBvciBub3QuDQo+PiBJIHdvbmRlciBpZiB3ZSdsbCBldmVyIGNvbGxpZGUgd2l0aCBzb21l IG90aGVyIHVzZXIgb2Ygdm1fb3BzLT5uYW1lLg0KPj4gSXQgbG9va3MgZmFpcmx5IG5hcnJvd2x5 IHVzZWQgYXQgdGhlIG1vbWVudCwgYnV0IHdvdWxkIHRoaXMga2VlcCB1cw0KPj4gZnJvbSBwdXR0 aW5nIHRoZXNlIHBhZ2VzIG9uLCBzYXksIGEgdG1wZnMgbW91bnQ/ICBEb2Vzbid0IGxvb2sgdGhh dA0KPj4gd2F5IGF0IHRoZSBtb21lbnQuDQo+IA0KPiBZb3UgY291bGQgYWx3YXlzIGNoZWNrIHRo ZSB2bV9vcHMgcG9pbnRlciB0byBzZWUgaWYgaXQncyBNUFguDQo+IA0KPiBPbmUgZmVhdHVyZSBJ J3ZlIHdhbnRlZDogYSB3YXkgdG8gaGF2ZSBzcGVjaWFsIHBlci1wcm9jZXNzIHZtYXMgdGhhdA0K PiBjYW4gYmUgZWFzaWx5IGZvdW5kLiAgRm9yIGV4YW1wbGUsIEkgd2FudCB0byBiZSBhYmxlIHRv IGVmZmljaWVudGx5DQo+IGZpbmQgb3V0IHdoZXJlIHRoZSB2ZHNvIGFuZCB2dmFyIHZtYXMgYXJl LiAgSSBkb24ndCB0aGluayB0aGlzIGlzIGN1cnJlbnRseSBzdXBwb3J0ZWQuDQo+IA0KQW5keSwg aWYgeW91IGFkZCBhIGNoZWNrIGZvciAtPm5hbWUgdG8gYXZvaWQgdGhlIE1QWCB2bWFzIG1lcmdl ZCB3aXRoIG5vbi1NUFggdm1hcywgSSBndWVzcyB0aGUgd29yayBmbG93IHNob3VsZCBiZSBhcyBm b2xsb3cgKHVzZSBfaW5zdGFsbF9zcGVjaWFsX21hcHBpbmcgdG8gZ2V0IGEgbmV3IHZtYSk6DQoN CnVuc2lnbmVkIGxvbmcgbXB4X21tYXAodW5zaWduZWQgbG9uZyBsZW4pDQp7DQogICAgLi4uLi4u DQogICAgc3RhdGljIHN0cnVjdCB2bV9zcGVjaWFsX21hcHBpbmcgbXB4X21hcHBpbmcgPSB7DQog ICAgICAgIC5uYW1lID0gIlttcHhdIiwNCiAgICAgICAgLnBhZ2VzID0gbm9fcGFnZXMsDQogICAg fTsNCg0KICAgIC4uLi4uLi4NCiAgICB2bWEgPSBfaW5zdGFsbF9zcGVjaWFsX21hcHBpbmcobW0s IGFkZHIsIGxlbiwgdm1fZmxhZ3MsICZtcHhfbWFwcGluZyk7DQogICAgLi4uLi4uDQp9DQoNClRo ZW4sIHdlIGNvdWxkIGNoZWNrIHRoZSAtPm5hbWUgdG8gc2VlIGlmIHRoZSBWTUEgaXMgTVBYIHNw ZWNpZmljLiBSaWdodD8NCg0KVGhhbmtzLA0KUWlhb3dlaQ0KDQo= -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-la0-f44.google.com (mail-la0-f44.google.com [209.85.215.44]) by kanga.kvack.org (Postfix) with ESMTP id 42FDE6B0031 for ; Tue, 24 Jun 2014 19:56:04 -0400 (EDT) Received: by mail-la0-f44.google.com with SMTP id ty20so193652lab.31 for ; Tue, 24 Jun 2014 16:56:03 -0700 (PDT) Received: from mail-lb0-f173.google.com (mail-lb0-f173.google.com [209.85.217.173]) by mx.google.com with ESMTPS id td2si3444087lbb.50.2014.06.24.16.56.01 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Tue, 24 Jun 2014 16:56:02 -0700 (PDT) Received: by mail-lb0-f173.google.com with SMTP id s7so1297788lbd.18 for ; Tue, 24 Jun 2014 16:56:01 -0700 (PDT) MIME-Version: 1.0 In-Reply-To: <9E0BE1322F2F2246BD820DA9FC397ADE016AF41C@shsmsx102.ccr.corp.intel.com> References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> <1403084656-27284-3-git-send-email-qiaowei.ren@intel.com> <53A884B2.5070702@mit.edu> <53A88806.1060908@intel.com> <53A88DE4.8050107@intel.com> <9E0BE1322F2F2246BD820DA9FC397ADE016AF41C@shsmsx102.ccr.corp.intel.com> From: Andy Lutomirski Date: Tue, 24 Jun 2014 16:55:41 -0700 Message-ID: Subject: Re: [PATCH v6 02/10] x86, mpx: add MPX specific mmap interface Content-Type: text/plain; charset=UTF-8 Sender: owner-linux-mm@kvack.org List-ID: To: "Ren, Qiaowei" Cc: "Hansen, Dave" , "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , X86 ML , "linux-kernel@vger.kernel.org" , Linux MM On Mon, Jun 23, 2014 at 10:53 PM, Ren, Qiaowei wrote: > On 2014-06-24, Andy Lutomirski wrote: >>> On 06/23/2014 01:06 PM, Andy Lutomirski wrote: >>>> Can the new vm_operation "name" be use for this? The magic "always >>>> written to core dumps" feature might need to be reconsidered. >>> >>> One thing I'd like to avoid is an MPX vma getting merged with a >>> non-MPX vma. I don't see any code to prevent two VMAs with >>> different vm_ops->names from getting merged. That seems like a bit >>> of a design oversight for ->name. Right? >> >> AFAIK there are no ->name users that don't also set ->close, for >> exactly that reason. I'd be okay with adding a check for ->name, too. >> >> Hmm. If MPX vmas had a real struct file attached, this would all come >> for free. Maybe vmas with non-default vm_ops and file != NULL should >> never be mergeable? >> >>> >>> Thinking out loud a bit... There are also some more complicated but >>> more performant cleanup mechanisms that I'd like to go after in the future. >>> Given a page, we might want to figure out if it is an MPX page or not. >>> I wonder if we'll ever collide with some other user of vm_ops->name. >>> It looks fairly narrowly used at the moment, but would this keep us >>> from putting these pages on, say, a tmpfs mount? Doesn't look that >>> way at the moment. >> >> You could always check the vm_ops pointer to see if it's MPX. >> >> One feature I've wanted: a way to have special per-process vmas that >> can be easily found. For example, I want to be able to efficiently >> find out where the vdso and vvar vmas are. I don't think this is currently supported. >> > Andy, if you add a check for ->name to avoid the MPX vmas merged with non-MPX vmas, I guess the work flow should be as follow (use _install_special_mapping to get a new vma): > > unsigned long mpx_mmap(unsigned long len) > { > ...... > static struct vm_special_mapping mpx_mapping = { > .name = "[mpx]", > .pages = no_pages, > }; > > ....... > vma = _install_special_mapping(mm, addr, len, vm_flags, &mpx_mapping); > ...... > } > > Then, we could check the ->name to see if the VMA is MPX specific. Right? Does this actually create a vma backed with real memory? Doesn't this need to go through anon_vma or something? _install_special_mapping completely prevents merging. Possibly silly question: would it make more sense to just create one giant vma for the MPX tables and only populate pieces of it as needed? This wouldn't work for 32-bit code, but maybe we don't care. (I see no reason why it couldn't work for x32, though.) (I don't really understand how anonymous memory works at all. I'm not an mm person.) --Andy -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pb0-f54.google.com (mail-pb0-f54.google.com [209.85.160.54]) by kanga.kvack.org (Postfix) with ESMTP id 61B806B0031 for ; Tue, 24 Jun 2014 21:41:21 -0400 (EDT) Received: by mail-pb0-f54.google.com with SMTP id un15so994932pbc.27 for ; Tue, 24 Jun 2014 18:41:21 -0700 (PDT) Received: from mga09.intel.com (mga09.intel.com. [134.134.136.24]) by mx.google.com with ESMTP id nd4si2894997pbc.20.2014.06.24.18.41.19 for ; Tue, 24 Jun 2014 18:41:20 -0700 (PDT) From: "Ren, Qiaowei" Subject: RE: [PATCH v6 02/10] x86, mpx: add MPX specific mmap interface Date: Wed, 25 Jun 2014 01:40:48 +0000 Message-ID: <9E0BE1322F2F2246BD820DA9FC397ADE016B26AB@shsmsx102.ccr.corp.intel.com> References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> <1403084656-27284-3-git-send-email-qiaowei.ren@intel.com> <53A884B2.5070702@mit.edu> <53A88806.1060908@intel.com> <53A88DE4.8050107@intel.com> <9E0BE1322F2F2246BD820DA9FC397ADE016AF41C@shsmsx102.ccr.corp.intel.com> In-Reply-To: Content-Language: en-US Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: base64 MIME-Version: 1.0 Sender: owner-linux-mm@kvack.org List-ID: To: Andy Lutomirski Cc: "Hansen, Dave" , "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , X86 ML , "linux-kernel@vger.kernel.org" , Linux MM T24gMjAxNC0wNi0yNSwgQW5keSBMdXRvbWlyc2tpIHdyb3RlOg0KPiBPbiBNb24sIEp1biAyMywg MjAxNCBhdCAxMDo1MyBQTSwgUmVuLCBRaWFvd2VpIDxxaWFvd2VpLnJlbkBpbnRlbC5jb20+DQo+ IHdyb3RlOg0KPj4gT24gMjAxNC0wNi0yNCwgQW5keSBMdXRvbWlyc2tpIHdyb3RlOg0KPj4+PiBP biAwNi8yMy8yMDE0IDAxOjA2IFBNLCBBbmR5IEx1dG9taXJza2kgd3JvdGU6DQo+Pj4+PiBDYW4g dGhlIG5ldyB2bV9vcGVyYXRpb24gIm5hbWUiIGJlIHVzZSBmb3IgdGhpcz8gIFRoZSBtYWdpYw0K Pj4+Pj4gImFsd2F5cyB3cml0dGVuIHRvIGNvcmUgZHVtcHMiIGZlYXR1cmUgbWlnaHQgbmVlZCB0 byBiZSByZWNvbnNpZGVyZWQuDQo+Pj4+IA0KPj4+PiBPbmUgdGhpbmcgSSdkIGxpa2UgdG8gYXZv aWQgaXMgYW4gTVBYIHZtYSBnZXR0aW5nIG1lcmdlZCB3aXRoIGENCj4+Pj4gbm9uLU1QWCB2bWEu ICBJIGRvbid0IHNlZSBhbnkgY29kZSB0byBwcmV2ZW50IHR3byBWTUFzIHdpdGgNCj4+Pj4gZGlm ZmVyZW50IHZtX29wcy0+bmFtZXMgZnJvbSBnZXR0aW5nIG1lcmdlZC4gIFRoYXQgc2VlbXMgbGlr ZSBhDQo+Pj4+IGJpdCBvZiBhIGRlc2lnbiBvdmVyc2lnaHQgZm9yIC0+bmFtZS4gIFJpZ2h0Pw0K Pj4+IA0KPj4+IEFGQUlLIHRoZXJlIGFyZSBubyAtPm5hbWUgdXNlcnMgdGhhdCBkb24ndCBhbHNv IHNldCAtPmNsb3NlLCBmb3INCj4+PiBleGFjdGx5IHRoYXQgcmVhc29uLiAgSSdkIGJlIG9rYXkg d2l0aCBhZGRpbmcgYSBjaGVjayBmb3IgLT5uYW1lLCB0b28uDQo+Pj4gDQo+Pj4gSG1tLiAgSWYg TVBYIHZtYXMgaGFkIGEgcmVhbCBzdHJ1Y3QgZmlsZSBhdHRhY2hlZCwgdGhpcyB3b3VsZCBhbGwN Cj4+PiBjb21lIGZvciBmcmVlLiBNYXliZSB2bWFzIHdpdGggbm9uLWRlZmF1bHQgdm1fb3BzIGFu ZCBmaWxlICE9IE5VTEwNCj4+PiBzaG91bGQgbmV2ZXIgYmUgbWVyZ2VhYmxlPw0KPj4+IA0KPj4+ PiANCj4+Pj4gVGhpbmtpbmcgb3V0IGxvdWQgYSBiaXQuLi4gVGhlcmUgYXJlIGFsc28gc29tZSBt b3JlIGNvbXBsaWNhdGVkDQo+Pj4+IGJ1dCBtb3JlIHBlcmZvcm1hbnQgY2xlYW51cCBtZWNoYW5p c21zIHRoYXQgSSdkIGxpa2UgdG8gZ28gYWZ0ZXIgaW4gdGhlIGZ1dHVyZS4NCj4+Pj4gR2l2ZW4g YSBwYWdlLCB3ZSBtaWdodCB3YW50IHRvIGZpZ3VyZSBvdXQgaWYgaXQgaXMgYW4gTVBYIHBhZ2Ug b3Igbm90Lg0KPj4+PiBJIHdvbmRlciBpZiB3ZSdsbCBldmVyIGNvbGxpZGUgd2l0aCBzb21lIG90 aGVyIHVzZXIgb2Ygdm1fb3BzLT5uYW1lLg0KPj4+PiBJdCBsb29rcyBmYWlybHkgbmFycm93bHkg dXNlZCBhdCB0aGUgbW9tZW50LCBidXQgd291bGQgdGhpcyBrZWVwDQo+Pj4+IHVzIGZyb20gcHV0 dGluZyB0aGVzZSBwYWdlcyBvbiwgc2F5LCBhIHRtcGZzIG1vdW50PyAgRG9lc24ndCBsb29rDQo+ Pj4+IHRoYXQgd2F5IGF0IHRoZSBtb21lbnQuDQo+Pj4gDQo+Pj4gWW91IGNvdWxkIGFsd2F5cyBj aGVjayB0aGUgdm1fb3BzIHBvaW50ZXIgdG8gc2VlIGlmIGl0J3MgTVBYLg0KPj4+IA0KPj4+IE9u ZSBmZWF0dXJlIEkndmUgd2FudGVkOiBhIHdheSB0byBoYXZlIHNwZWNpYWwgcGVyLXByb2Nlc3Mg dm1hcyB0aGF0DQo+Pj4gY2FuIGJlIGVhc2lseSBmb3VuZC4gIEZvciBleGFtcGxlLCBJIHdhbnQg dG8gYmUgYWJsZSB0byBlZmZpY2llbnRseQ0KPj4+IGZpbmQgb3V0IHdoZXJlIHRoZSB2ZHNvIGFu ZCB2dmFyIHZtYXMgYXJlLiAgSSBkb24ndCB0aGluayB0aGlzIGlzDQo+Pj4gY3VycmVudGx5IHN1 cHBvcnRlZC4NCj4+PiANCj4+IEFuZHksIGlmIHlvdSBhZGQgYSBjaGVjayBmb3IgLT5uYW1lIHRv IGF2b2lkIHRoZSBNUFggdm1hcyBtZXJnZWQNCj4+IHdpdGgNCj4gbm9uLU1QWCB2bWFzLCBJIGd1 ZXNzIHRoZSB3b3JrIGZsb3cgc2hvdWxkIGJlIGFzIGZvbGxvdyAodXNlDQo+IF9pbnN0YWxsX3Nw ZWNpYWxfbWFwcGluZyB0byBnZXQgYSBuZXcgdm1hKToNCj4+IA0KPj4gdW5zaWduZWQgbG9uZyBt cHhfbW1hcCh1bnNpZ25lZCBsb25nIGxlbikgew0KPj4gICAgIC4uLi4uLg0KPj4gICAgIHN0YXRp YyBzdHJ1Y3Qgdm1fc3BlY2lhbF9tYXBwaW5nIG1weF9tYXBwaW5nID0gew0KPj4gICAgICAgICAu bmFtZSA9ICJbbXB4XSIsDQo+PiAgICAgICAgIC5wYWdlcyA9IG5vX3BhZ2VzLA0KPj4gICAgIH07 DQo+PiAgICAgDQo+PiAgICAgLi4uLi4uLiB2bWEgPSBfaW5zdGFsbF9zcGVjaWFsX21hcHBpbmco bW0sIGFkZHIsIGxlbiwgdm1fZmxhZ3MsDQo+PiAgICAgJm1weF9tYXBwaW5nKTsgLi4uLi4uDQo+ PiB9DQo+PiANCj4+IFRoZW4sIHdlIGNvdWxkIGNoZWNrIHRoZSAtPm5hbWUgdG8gc2VlIGlmIHRo ZSBWTUEgaXMgTVBYIHNwZWNpZmljLiBSaWdodD8NCj4gDQo+IERvZXMgdGhpcyBhY3R1YWxseSBj cmVhdGUgYSB2bWEgYmFja2VkIHdpdGggcmVhbCBtZW1vcnk/ICBEb2Vzbid0IHRoaXMNCj4gbmVl ZCB0byBnbyB0aHJvdWdoIGFub25fdm1hIG9yIHNvbWV0aGluZz8gIF9pbnN0YWxsX3NwZWNpYWxf bWFwcGluZw0KPiBjb21wbGV0ZWx5IHByZXZlbnRzIG1lcmdpbmcuDQo+IA0KSG1tLCBfaW5zdGFs bF9zcGVjaWFsX21hcHBpbmcgc2hvdWxkIGNvbXBsZXRlbHkgcHJldmVudCBtZXJnaW5nLCBldmVu IGFtb25nIE1QWCB2bWFzLg0KDQpTbywgY291bGQgeW91IHRlbGwgbWUgaG93IHRvIHNldCBNUFgg c3BlY2lmaWMgLT5uYW1lIHRvIHRoZSB2bWEgd2hlbiBpdCBpcyBjcmVhdGVkPyBTZWVtcyBsaWtl IHRoYXQgSSBjb3VsZCBub3QgZmluZCBzdWNoIGludGVyZmFjZS4NCg0KVGhhbmtzLA0KUWlhb3dl aQ0K -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-lb0-f176.google.com (mail-lb0-f176.google.com [209.85.217.176]) by kanga.kvack.org (Postfix) with ESMTP id E96366B0035 for ; Wed, 25 Jun 2014 17:04:59 -0400 (EDT) Received: by mail-lb0-f176.google.com with SMTP id w7so2291121lbi.35 for ; Wed, 25 Jun 2014 14:04:59 -0700 (PDT) Received: from mail-la0-f42.google.com (mail-la0-f42.google.com [209.85.215.42]) by mx.google.com with ESMTPS id pu8si9532073lbb.37.2014.06.25.14.04.58 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Wed, 25 Jun 2014 14:04:58 -0700 (PDT) Received: by mail-la0-f42.google.com with SMTP id pn19so1204281lab.15 for ; Wed, 25 Jun 2014 14:04:57 -0700 (PDT) MIME-Version: 1.0 In-Reply-To: <9E0BE1322F2F2246BD820DA9FC397ADE016B26AB@shsmsx102.ccr.corp.intel.com> References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> <1403084656-27284-3-git-send-email-qiaowei.ren@intel.com> <53A884B2.5070702@mit.edu> <53A88806.1060908@intel.com> <53A88DE4.8050107@intel.com> <9E0BE1322F2F2246BD820DA9FC397ADE016AF41C@shsmsx102.ccr.corp.intel.com> <9E0BE1322F2F2246BD820DA9FC397ADE016B26AB@shsmsx102.ccr.corp.intel.com> From: Andy Lutomirski Date: Wed, 25 Jun 2014 14:04:36 -0700 Message-ID: Subject: Re: [PATCH v6 02/10] x86, mpx: add MPX specific mmap interface Content-Type: text/plain; charset=UTF-8 Sender: owner-linux-mm@kvack.org List-ID: To: "Ren, Qiaowei" Cc: "Hansen, Dave" , "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , X86 ML , "linux-kernel@vger.kernel.org" , Linux MM On Tue, Jun 24, 2014 at 6:40 PM, Ren, Qiaowei wrote: > On 2014-06-25, Andy Lutomirski wrote: >> On Mon, Jun 23, 2014 at 10:53 PM, Ren, Qiaowei >> wrote: >>> On 2014-06-24, Andy Lutomirski wrote: >>>>> On 06/23/2014 01:06 PM, Andy Lutomirski wrote: >>>>>> Can the new vm_operation "name" be use for this? The magic >>>>>> "always written to core dumps" feature might need to be reconsidered. >>>>> >>>>> One thing I'd like to avoid is an MPX vma getting merged with a >>>>> non-MPX vma. I don't see any code to prevent two VMAs with >>>>> different vm_ops->names from getting merged. That seems like a >>>>> bit of a design oversight for ->name. Right? >>>> >>>> AFAIK there are no ->name users that don't also set ->close, for >>>> exactly that reason. I'd be okay with adding a check for ->name, too. >>>> >>>> Hmm. If MPX vmas had a real struct file attached, this would all >>>> come for free. Maybe vmas with non-default vm_ops and file != NULL >>>> should never be mergeable? >>>> >>>>> >>>>> Thinking out loud a bit... There are also some more complicated >>>>> but more performant cleanup mechanisms that I'd like to go after in the future. >>>>> Given a page, we might want to figure out if it is an MPX page or not. >>>>> I wonder if we'll ever collide with some other user of vm_ops->name. >>>>> It looks fairly narrowly used at the moment, but would this keep >>>>> us from putting these pages on, say, a tmpfs mount? Doesn't look >>>>> that way at the moment. >>>> >>>> You could always check the vm_ops pointer to see if it's MPX. >>>> >>>> One feature I've wanted: a way to have special per-process vmas that >>>> can be easily found. For example, I want to be able to efficiently >>>> find out where the vdso and vvar vmas are. I don't think this is >>>> currently supported. >>>> >>> Andy, if you add a check for ->name to avoid the MPX vmas merged >>> with >> non-MPX vmas, I guess the work flow should be as follow (use >> _install_special_mapping to get a new vma): >>> >>> unsigned long mpx_mmap(unsigned long len) { >>> ...... >>> static struct vm_special_mapping mpx_mapping = { >>> .name = "[mpx]", >>> .pages = no_pages, >>> }; >>> >>> ....... vma = _install_special_mapping(mm, addr, len, vm_flags, >>> &mpx_mapping); ...... >>> } >>> >>> Then, we could check the ->name to see if the VMA is MPX specific. Right? >> >> Does this actually create a vma backed with real memory? Doesn't this >> need to go through anon_vma or something? _install_special_mapping >> completely prevents merging. >> > Hmm, _install_special_mapping should completely prevent merging, even among MPX vmas. > > So, could you tell me how to set MPX specific ->name to the vma when it is created? Seems like that I could not find such interface. You may need to add one. I'd suggest posting a new thread to linux-mm describing what you need and asking how to do it. --Andy -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-la0-f46.google.com (mail-la0-f46.google.com [209.85.215.46]) by kanga.kvack.org (Postfix) with ESMTP id CE5CD6B0036 for ; Wed, 25 Jun 2014 17:05:49 -0400 (EDT) Received: by mail-la0-f46.google.com with SMTP id el20so1178589lab.5 for ; Wed, 25 Jun 2014 14:05:48 -0700 (PDT) Received: from mail-la0-f51.google.com (mail-la0-f51.google.com [209.85.215.51]) by mx.google.com with ESMTPS id xs1si9533647lbb.39.2014.06.25.14.05.47 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Wed, 25 Jun 2014 14:05:47 -0700 (PDT) Received: by mail-la0-f51.google.com with SMTP id mc6so1185500lab.38 for ; Wed, 25 Jun 2014 14:05:47 -0700 (PDT) MIME-Version: 1.0 In-Reply-To: References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> <1403084656-27284-3-git-send-email-qiaowei.ren@intel.com> <53A884B2.5070702@mit.edu> <53A88806.1060908@intel.com> <53A88DE4.8050107@intel.com> <9E0BE1322F2F2246BD820DA9FC397ADE016AF41C@shsmsx102.ccr.corp.intel.com> <9E0BE1322F2F2246BD820DA9FC397ADE016B26AB@shsmsx102.ccr.corp.intel.com> From: Andy Lutomirski Date: Wed, 25 Jun 2014 14:05:27 -0700 Message-ID: Subject: Re: [PATCH v6 02/10] x86, mpx: add MPX specific mmap interface Content-Type: text/plain; charset=UTF-8 Sender: owner-linux-mm@kvack.org List-ID: To: "Ren, Qiaowei" Cc: "Hansen, Dave" , "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , X86 ML , "linux-kernel@vger.kernel.org" , Linux MM On Wed, Jun 25, 2014 at 2:04 PM, Andy Lutomirski wrote: > On Tue, Jun 24, 2014 at 6:40 PM, Ren, Qiaowei wrote: >> On 2014-06-25, Andy Lutomirski wrote: >>> On Mon, Jun 23, 2014 at 10:53 PM, Ren, Qiaowei >>> wrote: >>>> On 2014-06-24, Andy Lutomirski wrote: >>>>>> On 06/23/2014 01:06 PM, Andy Lutomirski wrote: >>>>>>> Can the new vm_operation "name" be use for this? The magic >>>>>>> "always written to core dumps" feature might need to be reconsidered. >>>>>> >>>>>> One thing I'd like to avoid is an MPX vma getting merged with a >>>>>> non-MPX vma. I don't see any code to prevent two VMAs with >>>>>> different vm_ops->names from getting merged. That seems like a >>>>>> bit of a design oversight for ->name. Right? >>>>> >>>>> AFAIK there are no ->name users that don't also set ->close, for >>>>> exactly that reason. I'd be okay with adding a check for ->name, too. >>>>> >>>>> Hmm. If MPX vmas had a real struct file attached, this would all >>>>> come for free. Maybe vmas with non-default vm_ops and file != NULL >>>>> should never be mergeable? >>>>> >>>>>> >>>>>> Thinking out loud a bit... There are also some more complicated >>>>>> but more performant cleanup mechanisms that I'd like to go after in the future. >>>>>> Given a page, we might want to figure out if it is an MPX page or not. >>>>>> I wonder if we'll ever collide with some other user of vm_ops->name. >>>>>> It looks fairly narrowly used at the moment, but would this keep >>>>>> us from putting these pages on, say, a tmpfs mount? Doesn't look >>>>>> that way at the moment. >>>>> >>>>> You could always check the vm_ops pointer to see if it's MPX. >>>>> >>>>> One feature I've wanted: a way to have special per-process vmas that >>>>> can be easily found. For example, I want to be able to efficiently >>>>> find out where the vdso and vvar vmas are. I don't think this is >>>>> currently supported. >>>>> >>>> Andy, if you add a check for ->name to avoid the MPX vmas merged >>>> with >>> non-MPX vmas, I guess the work flow should be as follow (use >>> _install_special_mapping to get a new vma): >>>> >>>> unsigned long mpx_mmap(unsigned long len) { >>>> ...... >>>> static struct vm_special_mapping mpx_mapping = { >>>> .name = "[mpx]", >>>> .pages = no_pages, >>>> }; >>>> >>>> ....... vma = _install_special_mapping(mm, addr, len, vm_flags, >>>> &mpx_mapping); ...... >>>> } >>>> >>>> Then, we could check the ->name to see if the VMA is MPX specific. Right? >>> >>> Does this actually create a vma backed with real memory? Doesn't this >>> need to go through anon_vma or something? _install_special_mapping >>> completely prevents merging. >>> >> Hmm, _install_special_mapping should completely prevent merging, even among MPX vmas. >> >> So, could you tell me how to set MPX specific ->name to the vma when it is created? Seems like that I could not find such interface. > > You may need to add one. > > I'd suggest posting a new thread to linux-mm describing what you need > and asking how to do it. Hmm. the memfd_create thing may be able to do this for you. If you created a per-mm memfd and mapped it, it all just might work. --Andy -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pb0-f54.google.com (mail-pb0-f54.google.com [209.85.160.54]) by kanga.kvack.org (Postfix) with ESMTP id 59F336B0035 for ; Wed, 25 Jun 2014 17:43:15 -0400 (EDT) Received: by mail-pb0-f54.google.com with SMTP id un15so2233569pbc.27 for ; Wed, 25 Jun 2014 14:43:15 -0700 (PDT) Received: from mga09.intel.com (mga09.intel.com. [134.134.136.24]) by mx.google.com with ESMTP id qe5si6997464pac.103.2014.06.25.14.43.13 for ; Wed, 25 Jun 2014 14:43:14 -0700 (PDT) Message-ID: <53AB4271.6040301@intel.com> Date: Wed, 25 Jun 2014 14:43:13 -0700 From: Dave Hansen MIME-Version: 1.0 Subject: Re: [PATCH v6 02/10] x86, mpx: add MPX specific mmap interface References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> <1403084656-27284-3-git-send-email-qiaowei.ren@intel.com> <53A884B2.5070702@mit.edu> <53A88806.1060908@intel.com> <53A88DE4.8050107@intel.com> <9E0BE1322F2F2246BD820DA9FC397ADE016AF41C@shsmsx102.ccr.corp.intel.com> <9E0BE1322F2F2246BD820DA9FC397ADE016B26AB@shsmsx102.ccr.corp.intel.com> In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Andy Lutomirski , "Ren, Qiaowei" Cc: "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , X86 ML , "linux-kernel@vger.kernel.org" , Linux MM On 06/25/2014 02:04 PM, Andy Lutomirski wrote: > On Tue, Jun 24, 2014 at 6:40 PM, Ren, Qiaowei wrote: >> Hmm, _install_special_mapping should completely prevent merging, even among MPX vmas. >> >> So, could you tell me how to set MPX specific ->name to the vma when it is created? Seems like that I could not find such interface. > > You may need to add one. > > I'd suggest posting a new thread to linux-mm describing what you need > and asking how to do it. I shared this with Qiaowei privately, but might as well repeat myself here in case anyone wants to set me straight. Most of the interfaces do to set vm_ops do it in file_operations ->mmap op. Nobody sets ->vm_ops on anonymous VMAs, so we're in uncharted territory. My suggestion: you can either plumb a new API down in to mmap_region() to get the VMA or set ->vm_ops, or just call find_vma() after mmap_region() or get_unmapped_area() and set it manually. Just make sure you still have mmap_sem held over the whole thing. I think I prefer just setting ->vm_ops directly, even though it's a wee bit of a hack to create something just to look it up a moment later. Oh, well. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pa0-f51.google.com (mail-pa0-f51.google.com [209.85.220.51]) by kanga.kvack.org (Postfix) with ESMTP id 0F4356B00AB for ; Thu, 26 Jun 2014 18:12:31 -0400 (EDT) Received: by mail-pa0-f51.google.com with SMTP id hz1so3707531pad.24 for ; Thu, 26 Jun 2014 15:12:31 -0700 (PDT) Received: from mga02.intel.com (mga02.intel.com. [134.134.136.20]) by mx.google.com with ESMTP id kn1si11596979pbd.226.2014.06.26.15.12.30 for ; Thu, 26 Jun 2014 15:12:31 -0700 (PDT) Message-ID: <53AB42E1.4090102@intel.com> Date: Wed, 25 Jun 2014 14:45:05 -0700 From: Dave Hansen MIME-Version: 1.0 Subject: Re: [PATCH v6 02/10] x86, mpx: add MPX specific mmap interface References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> <1403084656-27284-3-git-send-email-qiaowei.ren@intel.com> <53A884B2.5070702@mit.edu> <53A88806.1060908@intel.com> <53A88DE4.8050107@intel.com> <9E0BE1322F2F2246BD820DA9FC397ADE016AF41C@shsmsx102.ccr.corp.intel.com> <9E0BE1322F2F2246BD820DA9FC397ADE016B26AB@shsmsx102.ccr.corp.intel.com> In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Andy Lutomirski , "Ren, Qiaowei" Cc: "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , X86 ML , "linux-kernel@vger.kernel.org" , Linux MM On 06/25/2014 02:05 PM, Andy Lutomirski wrote: > Hmm. the memfd_create thing may be able to do this for you. If you > created a per-mm memfd and mapped it, it all just might work. memfd_create() seems to bring a fair amount of baggage along (the fd part :) if all we want is a marker. Really, all we need is _a_ bit, and some way to plumb to userspace the RSS values of VMAs with that bit set. Creating and mmap()'ing a fd seems a rather roundabout way to get there. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-la0-f42.google.com (mail-la0-f42.google.com [209.85.215.42]) by kanga.kvack.org (Postfix) with ESMTP id 950216B00AE for ; Thu, 26 Jun 2014 18:19:48 -0400 (EDT) Received: by mail-la0-f42.google.com with SMTP id pn19so2387026lab.1 for ; Thu, 26 Jun 2014 15:19:47 -0700 (PDT) Received: from mail-lb0-f169.google.com (mail-lb0-f169.google.com [209.85.217.169]) by mx.google.com with ESMTPS id bm6si16715140lbb.30.2014.06.26.15.19.47 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Thu, 26 Jun 2014 15:19:47 -0700 (PDT) Received: by mail-lb0-f169.google.com with SMTP id l4so3456213lbv.0 for ; Thu, 26 Jun 2014 15:19:47 -0700 (PDT) MIME-Version: 1.0 In-Reply-To: <53AB42E1.4090102@intel.com> References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> <1403084656-27284-3-git-send-email-qiaowei.ren@intel.com> <53A884B2.5070702@mit.edu> <53A88806.1060908@intel.com> <53A88DE4.8050107@intel.com> <9E0BE1322F2F2246BD820DA9FC397ADE016AF41C@shsmsx102.ccr.corp.intel.com> <9E0BE1322F2F2246BD820DA9FC397ADE016B26AB@shsmsx102.ccr.corp.intel.com> <53AB42E1.4090102@intel.com> From: Andy Lutomirski Date: Thu, 26 Jun 2014 15:19:26 -0700 Message-ID: Subject: Re: [PATCH v6 02/10] x86, mpx: add MPX specific mmap interface Content-Type: text/plain; charset=UTF-8 Sender: owner-linux-mm@kvack.org List-ID: To: Dave Hansen Cc: "Ren, Qiaowei" , "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , X86 ML , "linux-kernel@vger.kernel.org" , Linux MM On Wed, Jun 25, 2014 at 2:45 PM, Dave Hansen wrote: > On 06/25/2014 02:05 PM, Andy Lutomirski wrote: >> Hmm. the memfd_create thing may be able to do this for you. If you >> created a per-mm memfd and mapped it, it all just might work. > > memfd_create() seems to bring a fair amount of baggage along (the fd > part :) if all we want is a marker. Really, all we need is _a_ bit, and > some way to plumb to userspace the RSS values of VMAs with that bit set. > > Creating and mmap()'ing a fd seems a rather roundabout way to get there. Hmm. So does VM_MPX, though. If this stuff were done entirely in userspace, then memfd_create would be exactly the right solution, I think. Would it work to just scan the bound directory to figure out how many bound tables exist? --Andy -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pd0-f171.google.com (mail-pd0-f171.google.com [209.85.192.171]) by kanga.kvack.org (Postfix) with ESMTP id 20C836B0083 for ; Thu, 26 Jun 2014 18:59:02 -0400 (EDT) Received: by mail-pd0-f171.google.com with SMTP id fp1so3608734pdb.30 for ; Thu, 26 Jun 2014 15:59:01 -0700 (PDT) Received: from mga02.intel.com (mga02.intel.com. [134.134.136.20]) by mx.google.com with ESMTP id ew3si11714882pbb.184.2014.06.26.15.59.00 for ; Thu, 26 Jun 2014 15:59:01 -0700 (PDT) Message-ID: <53ACA5B3.3010702@intel.com> Date: Thu, 26 Jun 2014 15:58:59 -0700 From: Dave Hansen MIME-Version: 1.0 Subject: Re: [PATCH v6 02/10] x86, mpx: add MPX specific mmap interface References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> <1403084656-27284-3-git-send-email-qiaowei.ren@intel.com> <53A884B2.5070702@mit.edu> <53A88806.1060908@intel.com> <53A88DE4.8050107@intel.com> <9E0BE1322F2F2246BD820DA9FC397ADE016AF41C@shsmsx102.ccr.corp.intel.com> <9E0BE1322F2F2246BD820DA9FC397ADE016B26AB@shsmsx102.ccr.corp.intel.com> <53AB42E1.4090102@intel.com> In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Andy Lutomirski Cc: "Ren, Qiaowei" , "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , X86 ML , "linux-kernel@vger.kernel.org" , Linux MM On 06/26/2014 03:19 PM, Andy Lutomirski wrote: > On Wed, Jun 25, 2014 at 2:45 PM, Dave Hansen wrote: >> On 06/25/2014 02:05 PM, Andy Lutomirski wrote: >>> Hmm. the memfd_create thing may be able to do this for you. If you >>> created a per-mm memfd and mapped it, it all just might work. >> >> memfd_create() seems to bring a fair amount of baggage along (the fd >> part :) if all we want is a marker. Really, all we need is _a_ bit, and >> some way to plumb to userspace the RSS values of VMAs with that bit set. >> >> Creating and mmap()'ing a fd seems a rather roundabout way to get there. > > Hmm. So does VM_MPX, though. If this stuff were done entirely in > userspace, then memfd_create would be exactly the right solution, I > think. > > Would it work to just scan the bound directory to figure out how many > bound tables exist? Theoretically, perhaps. Practically, the bounds directory is 2GB, and it is likely to be very sparse. You would have to walk the page tables finding where pages were mapped, then search the mapped pages for bounds table entries. Assuming that it was aligned and minimally populated, that's a *MINIMUM* search looking for a PGD entry, then you have to look at 512 PUD entries. A full search would have to look at half a million ptes. That's just finding out how sparse the first level of the tables are before you've looked at a byte of actual data, and if they were empty. We could keep another, parallel, data structure that handles this better other than the hardware tables. Like, say, an rbtree that stores ranges of virtual addresses. We could call them vm_area_somethings ... wait a sec... we have a structure like that. ;) -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-la0-f45.google.com (mail-la0-f45.google.com [209.85.215.45]) by kanga.kvack.org (Postfix) with ESMTP id 84A336B003D for ; Thu, 26 Jun 2014 19:15:34 -0400 (EDT) Received: by mail-la0-f45.google.com with SMTP id hr17so2393990lab.18 for ; Thu, 26 Jun 2014 16:15:33 -0700 (PDT) Received: from mail-la0-f50.google.com (mail-la0-f50.google.com [209.85.215.50]) by mx.google.com with ESMTPS id iz6si16846324lbc.46.2014.06.26.16.15.32 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Thu, 26 Jun 2014 16:15:32 -0700 (PDT) Received: by mail-la0-f50.google.com with SMTP id pv20so2360985lab.23 for ; Thu, 26 Jun 2014 16:15:32 -0700 (PDT) MIME-Version: 1.0 In-Reply-To: <53ACA5B3.3010702@intel.com> References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> <1403084656-27284-3-git-send-email-qiaowei.ren@intel.com> <53A884B2.5070702@mit.edu> <53A88806.1060908@intel.com> <53A88DE4.8050107@intel.com> <9E0BE1322F2F2246BD820DA9FC397ADE016AF41C@shsmsx102.ccr.corp.intel.com> <9E0BE1322F2F2246BD820DA9FC397ADE016B26AB@shsmsx102.ccr.corp.intel.com> <53AB42E1.4090102@intel.com> <53ACA5B3.3010702@intel.com> From: Andy Lutomirski Date: Thu, 26 Jun 2014 16:15:12 -0700 Message-ID: Subject: Re: [PATCH v6 02/10] x86, mpx: add MPX specific mmap interface Content-Type: text/plain; charset=UTF-8 Sender: owner-linux-mm@kvack.org List-ID: To: Dave Hansen Cc: "Ren, Qiaowei" , "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , X86 ML , "linux-kernel@vger.kernel.org" , Linux MM On Thu, Jun 26, 2014 at 3:58 PM, Dave Hansen wrote: > On 06/26/2014 03:19 PM, Andy Lutomirski wrote: >> On Wed, Jun 25, 2014 at 2:45 PM, Dave Hansen wrote: >>> On 06/25/2014 02:05 PM, Andy Lutomirski wrote: >>>> Hmm. the memfd_create thing may be able to do this for you. If you >>>> created a per-mm memfd and mapped it, it all just might work. >>> >>> memfd_create() seems to bring a fair amount of baggage along (the fd >>> part :) if all we want is a marker. Really, all we need is _a_ bit, and >>> some way to plumb to userspace the RSS values of VMAs with that bit set. >>> >>> Creating and mmap()'ing a fd seems a rather roundabout way to get there. >> >> Hmm. So does VM_MPX, though. If this stuff were done entirely in >> userspace, then memfd_create would be exactly the right solution, I >> think. >> >> Would it work to just scan the bound directory to figure out how many >> bound tables exist? > > Theoretically, perhaps. > > Practically, the bounds directory is 2GB, and it is likely to be very > sparse. You would have to walk the page tables finding where pages were > mapped, then search the mapped pages for bounds table entries. > > Assuming that it was aligned and minimally populated, that's a *MINIMUM* > search looking for a PGD entry, then you have to look at 512 PUD > entries. A full search would have to look at half a million ptes. > That's just finding out how sparse the first level of the tables are > before you've looked at a byte of actual data, and if they were empty. > > We could keep another, parallel, data structure that handles this better > other than the hardware tables. Like, say, an rbtree that stores ranges > of virtual addresses. We could call them vm_area_somethings ... wait a > sec... we have a structure like that. ;) > > So here's my mental image of how I might do this if I were doing it entirely in userspace: I'd create a file or memfd for the bound tables and another for the bound directory. These files would be *huge*: the bound directory file would be 2GB and the bounds table file would be 2^48 bytes or whatever it is. (Maybe even bigger?) Then I'd just map pieces of those files wherever they'd need to be, and I'd make the mappings sparse. I suspect that you don't actually want a vma for each piece of bound table that gets mapped -- the space of vmas could end up incredibly sparse. So I'd at least map (in the vma sense, not the pte sense) and entire bound table at a time. And I'd probably just map the bound directory in one big piece. Then I'd populate it in the fault handler. This is almost what the code is doing, I think, modulo the files. This has one killer problem: these mappings need to be private (cowed on fork). So memfd is no good. There's got to be an easyish way to modify the mm code to allow anonymous maps with vm_ops. Maybe a new mmap_region parameter or something? Maybe even a special anon_vma, but I don't really understand how those work. Also, egads: what happens when a bound table entry is associated with a MAP_SHARED page? --Andy -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pa0-f42.google.com (mail-pa0-f42.google.com [209.85.220.42]) by kanga.kvack.org (Postfix) with ESMTP id 9EAC96B00B6 for ; Thu, 26 Jun 2014 20:19:57 -0400 (EDT) Received: by mail-pa0-f42.google.com with SMTP id lj1so3854852pab.1 for ; Thu, 26 Jun 2014 17:19:57 -0700 (PDT) Received: from mga02.intel.com (mga02.intel.com. [134.134.136.20]) by mx.google.com with ESMTP id ez5si11897283pbc.174.2014.06.26.17.19.56 for ; Thu, 26 Jun 2014 17:19:56 -0700 (PDT) Message-ID: <53ACB8A7.9050002@intel.com> Date: Thu, 26 Jun 2014 17:19:51 -0700 From: Dave Hansen MIME-Version: 1.0 Subject: Re: [PATCH v6 02/10] x86, mpx: add MPX specific mmap interface References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> <1403084656-27284-3-git-send-email-qiaowei.ren@intel.com> <53A884B2.5070702@mit.edu> <53A88806.1060908@intel.com> <53A88DE4.8050107@intel.com> <9E0BE1322F2F2246BD820DA9FC397ADE016AF41C@shsmsx102.ccr.corp.intel.com> <9E0BE1322F2F2246BD820DA9FC397ADE016B26AB@shsmsx102.ccr.corp.intel.com> <53AB42E1.4090102@intel.com> <53ACA5B3.3010702@intel.com> In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Andy Lutomirski Cc: "Ren, Qiaowei" , "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , X86 ML , "linux-kernel@vger.kernel.org" , Linux MM On 06/26/2014 04:15 PM, Andy Lutomirski wrote: > So here's my mental image of how I might do this if I were doing it > entirely in userspace: I'd create a file or memfd for the bound tables > and another for the bound directory. These files would be *huge*: the > bound directory file would be 2GB and the bounds table file would be > 2^48 bytes or whatever it is. (Maybe even bigger?) > > Then I'd just map pieces of those files wherever they'd need to be, > and I'd make the mappings sparse. I suspect that you don't actually > want a vma for each piece of bound table that gets mapped -- the space > of vmas could end up incredibly sparse. So I'd at least map (in the > vma sense, not the pte sense) and entire bound table at a time. And > I'd probably just map the bound directory in one big piece. > > Then I'd populate it in the fault handler. > > This is almost what the code is doing, I think, modulo the files. > > This has one killer problem: these mappings need to be private (cowed > on fork). So memfd is no good. This essentially uses the page cache's radix tree as a parallel data structure in order to keep a vaddr->mpx_vma map. That's not a bad idea, but it is a parallel data structure that does not handle copy-on-write very well. I'm pretty sure we need the semantics that anonymous memory provides. > There's got to be an easyish way to > modify the mm code to allow anonymous maps with vm_ops. Maybe a new > mmap_region parameter or something? Maybe even a special anon_vma, > but I don't really understand how those work. Yeah, we very well might end up having to go down that path. > Also, egads: what happens when a bound table entry is associated with > a MAP_SHARED page? Bounds table entries are for pointers. Do we keep pointers inside of MAP_SHARED-mapped things? :) -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-la0-f43.google.com (mail-la0-f43.google.com [209.85.215.43]) by kanga.kvack.org (Postfix) with ESMTP id CBF2F6B00B8 for ; Thu, 26 Jun 2014 20:27:03 -0400 (EDT) Received: by mail-la0-f43.google.com with SMTP id e16so2420929lan.2 for ; Thu, 26 Jun 2014 17:27:02 -0700 (PDT) Received: from mail-la0-f46.google.com (mail-la0-f46.google.com [209.85.215.46]) by mx.google.com with ESMTPS id gk9si17022440lbc.45.2014.06.26.17.27.01 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Thu, 26 Jun 2014 17:27:01 -0700 (PDT) Received: by mail-la0-f46.google.com with SMTP id el20so2417791lab.19 for ; Thu, 26 Jun 2014 17:27:01 -0700 (PDT) MIME-Version: 1.0 In-Reply-To: <53ACB8A7.9050002@intel.com> References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> <1403084656-27284-3-git-send-email-qiaowei.ren@intel.com> <53A884B2.5070702@mit.edu> <53A88806.1060908@intel.com> <53A88DE4.8050107@intel.com> <9E0BE1322F2F2246BD820DA9FC397ADE016AF41C@shsmsx102.ccr.corp.intel.com> <9E0BE1322F2F2246BD820DA9FC397ADE016B26AB@shsmsx102.ccr.corp.intel.com> <53AB42E1.4090102@intel.com> <53ACA5B3.3010702@intel.com> <53ACB8A7.9050002@intel.com> From: Andy Lutomirski Date: Thu, 26 Jun 2014 17:26:40 -0700 Message-ID: Subject: Re: [PATCH v6 02/10] x86, mpx: add MPX specific mmap interface Content-Type: text/plain; charset=UTF-8 Sender: owner-linux-mm@kvack.org List-ID: To: Dave Hansen Cc: "Ren, Qiaowei" , "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , X86 ML , "linux-kernel@vger.kernel.org" , Linux MM On Thu, Jun 26, 2014 at 5:19 PM, Dave Hansen wrote: > On 06/26/2014 04:15 PM, Andy Lutomirski wrote: >> So here's my mental image of how I might do this if I were doing it >> entirely in userspace: I'd create a file or memfd for the bound tables >> and another for the bound directory. These files would be *huge*: the >> bound directory file would be 2GB and the bounds table file would be >> 2^48 bytes or whatever it is. (Maybe even bigger?) >> >> Then I'd just map pieces of those files wherever they'd need to be, >> and I'd make the mappings sparse. I suspect that you don't actually >> want a vma for each piece of bound table that gets mapped -- the space >> of vmas could end up incredibly sparse. So I'd at least map (in the >> vma sense, not the pte sense) and entire bound table at a time. And >> I'd probably just map the bound directory in one big piece. >> >> Then I'd populate it in the fault handler. >> >> This is almost what the code is doing, I think, modulo the files. >> >> This has one killer problem: these mappings need to be private (cowed >> on fork). So memfd is no good. > > This essentially uses the page cache's radix tree as a parallel data > structure in order to keep a vaddr->mpx_vma map. That's not a bad idea, > but it is a parallel data structure that does not handle copy-on-write > very well. > > I'm pretty sure we need the semantics that anonymous memory provides. > >> There's got to be an easyish way to >> modify the mm code to allow anonymous maps with vm_ops. Maybe a new >> mmap_region parameter or something? Maybe even a special anon_vma, >> but I don't really understand how those work. > > Yeah, we very well might end up having to go down that path. > >> Also, egads: what happens when a bound table entry is associated with >> a MAP_SHARED page? > > Bounds table entries are for pointers. Do we keep pointers inside of > MAP_SHARED-mapped things? :) Sure, if it's MAP_SHARED | MAP_ANONYMOUS. For example: struct thing { struct thing *next; }; struct thing *storage = mmap(..., MAP_SHARED | MAP_ANONYMOUS, ...); storage[0].next = &storage[1]; fork(); I'm not suggesting that this needs to *work* in the first incarnation of this :) --Andy -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pb0-f44.google.com (mail-pb0-f44.google.com [209.85.160.44]) by kanga.kvack.org (Postfix) with ESMTP id 5206C6B003B for ; Fri, 27 Jun 2014 13:34:51 -0400 (EDT) Received: by mail-pb0-f44.google.com with SMTP id md12so4818853pbc.17 for ; Fri, 27 Jun 2014 10:34:50 -0700 (PDT) Received: from mga09.intel.com (mga09.intel.com. [134.134.136.24]) by mx.google.com with ESMTP id nd4si14719362pbc.20.2014.06.27.10.34.49 for ; Fri, 27 Jun 2014 10:34:50 -0700 (PDT) Message-ID: <53ADAB39.6030403@intel.com> Date: Fri, 27 Jun 2014 10:34:49 -0700 From: Dave Hansen MIME-Version: 1.0 Subject: Re: [PATCH v6 02/10] x86, mpx: add MPX specific mmap interface References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> <1403084656-27284-3-git-send-email-qiaowei.ren@intel.com> <53A884B2.5070702@mit.edu> <53A88806.1060908@intel.com> <53A88DE4.8050107@intel.com> <9E0BE1322F2F2246BD820DA9FC397ADE016AF41C@shsmsx102.ccr.corp.intel.com> <9E0BE1322F2F2246BD820DA9FC397ADE016B26AB@shsmsx102.ccr.corp.intel.com> <53AB42E1.4090102@intel.com> <53ACA5B3.3010702@intel.com> <53ACB8A7.9050002@intel.com> In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Andy Lutomirski Cc: "Ren, Qiaowei" , "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , X86 ML , "linux-kernel@vger.kernel.org" , Linux MM On 06/26/2014 05:26 PM, Andy Lutomirski wrote: > On Thu, Jun 26, 2014 at 5:19 PM, Dave Hansen wrote: >> On 06/26/2014 04:15 PM, Andy Lutomirski wrote: >>> Also, egads: what happens when a bound table entry is associated with >>> a MAP_SHARED page? >> >> Bounds table entries are for pointers. Do we keep pointers inside of >> MAP_SHARED-mapped things? :) > > Sure, if it's MAP_SHARED | MAP_ANONYMOUS. For example: > > struct thing { > struct thing *next; > }; > > struct thing *storage = mmap(..., MAP_SHARED | MAP_ANONYMOUS, ...); > storage[0].next = &storage[1]; > fork(); > > I'm not suggesting that this needs to *work* in the first incarnation of this :) I'm not sure I'm seeing the issue. I'm claiming that we need COW behavior for the bounds tables, at least by default. If userspace knows enough about the ways that it is using the tables and knows how to share them, let it go to town. The kernel will permit this kind of usage model, but we simply won't be helping with the management of the tables when userspace creates them. You've demonstrated a case where userspace might theoretically might want to share bounds tables (although I think it's pretty dangerous). It's equally theoretically possible that userspace might *not* want to share the tables for instance if one process narrowed the bounds and the other did not. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pd0-f177.google.com (mail-pd0-f177.google.com [209.85.192.177]) by kanga.kvack.org (Postfix) with ESMTP id 0658C6B003D for ; Fri, 27 Jun 2014 13:43:08 -0400 (EDT) Received: by mail-pd0-f177.google.com with SMTP id y10so4644908pdj.22 for ; Fri, 27 Jun 2014 10:43:08 -0700 (PDT) Received: from mga02.intel.com (mga02.intel.com. [134.134.136.20]) by mx.google.com with ESMTP id fl5si14657737pbb.220.2014.06.27.10.43.07 for ; Fri, 27 Jun 2014 10:43:08 -0700 (PDT) Message-ID: <53ADACF7.7080203@intel.com> Date: Fri, 27 Jun 2014 10:42:15 -0700 From: Dave Hansen MIME-Version: 1.0 Subject: Re: [PATCH v6 02/10] x86, mpx: add MPX specific mmap interface References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> <53A884B2.5070702@mit.edu> <53A88806.1060908@intel.com> <53A88DE4.8050107@intel.com> <9E0BE1322F2F2246BD820DA9FC397ADE016AF41C@shsmsx102.ccr.corp.intel.com> <9E0BE1322F2F2246BD820DA9FC397ADE016B26AB@shsmsx102.ccr.corp.intel.com> <53AB42E1.4090102@intel.com> <53ACA5B3.3010702@intel.com> <53ACB8A7.9050002@intel.com> <53ADAB39.6030403@intel.com> In-Reply-To: <53ADAB39.6030403@intel.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Andy Lutomirski Cc: "Ren, Qiaowei" , "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , X86 ML , "linux-kernel@vger.kernel.org" , Linux MM On 06/27/2014 10:34 AM, Dave Hansen wrote: > I'm claiming that we need COW behavior for the bounds tables, at least > by default. If userspace knows enough about the ways that it is using > the tables and knows how to share them, let it go to town. The kernel > will permit this kind of usage model, but we simply won't be helping > with the management of the tables when userspace creates them. Actually, this is another reason we need to mark VMAs as being MPX-related explicitly instead of inferring it from the tables. If userspace does something really specialized like this, the kernel does not want to confuse these VMAs the ones it created. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-lb0-f178.google.com (mail-lb0-f178.google.com [209.85.217.178]) by kanga.kvack.org (Postfix) with ESMTP id AC7AC6B0036 for ; Fri, 27 Jun 2014 14:57:37 -0400 (EDT) Received: by mail-lb0-f178.google.com with SMTP id 10so4238011lbg.23 for ; Fri, 27 Jun 2014 11:57:36 -0700 (PDT) Received: from mail-la0-f50.google.com (mail-la0-f50.google.com [209.85.215.50]) by mx.google.com with ESMTPS id lc2si10619139lab.131.2014.06.27.11.57.35 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Fri, 27 Jun 2014 11:57:36 -0700 (PDT) Received: by mail-la0-f50.google.com with SMTP id pv20so3237457lab.37 for ; Fri, 27 Jun 2014 11:57:35 -0700 (PDT) MIME-Version: 1.0 In-Reply-To: <53ADACF7.7080203@intel.com> References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> <53A884B2.5070702@mit.edu> <53A88806.1060908@intel.com> <53A88DE4.8050107@intel.com> <9E0BE1322F2F2246BD820DA9FC397ADE016AF41C@shsmsx102.ccr.corp.intel.com> <9E0BE1322F2F2246BD820DA9FC397ADE016B26AB@shsmsx102.ccr.corp.intel.com> <53AB42E1.4090102@intel.com> <53ACA5B3.3010702@intel.com> <53ACB8A7.9050002@intel.com> <53ADAB39.6030403@intel.com> <53ADACF7.7080203@intel.com> From: Andy Lutomirski Date: Fri, 27 Jun 2014 11:57:15 -0700 Message-ID: Subject: Re: [PATCH v6 02/10] x86, mpx: add MPX specific mmap interface Content-Type: text/plain; charset=UTF-8 Sender: owner-linux-mm@kvack.org List-ID: To: Dave Hansen Cc: "Ren, Qiaowei" , "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , X86 ML , "linux-kernel@vger.kernel.org" , Linux MM On Fri, Jun 27, 2014 at 10:42 AM, Dave Hansen wrote: > On 06/27/2014 10:34 AM, Dave Hansen wrote: >> I'm claiming that we need COW behavior for the bounds tables, at least >> by default. If userspace knows enough about the ways that it is using >> the tables and knows how to share them, let it go to town. The kernel >> will permit this kind of usage model, but we simply won't be helping >> with the management of the tables when userspace creates them. > > Actually, this is another reason we need to mark VMAs as being > MPX-related explicitly instead of inferring it from the tables. If > userspace does something really specialized like this, the kernel does > not want to confuse these VMAs the ones it created. > Good point. --Andy -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S965831AbaFRJsJ (ORCPT ); Wed, 18 Jun 2014 05:48:09 -0400 Received: from mga02.intel.com ([134.134.136.20]:24697 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S965007AbaFRJsF (ORCPT ); Wed, 18 Jun 2014 05:48:05 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.01,499,1400050800"; d="scan'208";a="530353766" From: Qiaowei Ren To: "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , Dave Hansen Cc: x86@kernel.org, linux-kernel@vger.kernel.org, Qiaowei Ren Subject: [PATCH v6 02/10] x86, mpx: add MPX specific mmap interface Date: Wed, 18 Jun 2014 17:44:08 +0800 Message-Id: <1403084656-27284-3-git-send-email-qiaowei.ren@intel.com> X-Mailer: git-send-email 1.7.1 In-Reply-To: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org This patch adds one MPX specific mmap interface, which only handles mpx related maps, including bounds table and bounds directory. In order to track MPX specific memory usage, this interface is added to stick new vm_flag VM_MPX in the vma_area_struct when create a bounds table or bounds directory. Signed-off-by: Qiaowei Ren --- arch/x86/Kconfig | 4 +++ arch/x86/include/asm/mpx.h | 38 ++++++++++++++++++++++++++++ arch/x86/mm/Makefile | 2 + arch/x86/mm/mpx.c | 58 ++++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 102 insertions(+), 0 deletions(-) create mode 100644 arch/x86/include/asm/mpx.h create mode 100644 arch/x86/mm/mpx.c diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 25d2c6f..0194790 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -237,6 +237,10 @@ config HAVE_INTEL_TXT def_bool y depends on INTEL_IOMMU && ACPI +config X86_INTEL_MPX + def_bool y + depends on CPU_SUP_INTEL + config X86_32_SMP def_bool y depends on X86_32 && SMP diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h new file mode 100644 index 0000000..5725ac4 --- /dev/null +++ b/arch/x86/include/asm/mpx.h @@ -0,0 +1,38 @@ +#ifndef _ASM_X86_MPX_H +#define _ASM_X86_MPX_H + +#include +#include + +#ifdef CONFIG_X86_64 + +/* upper 28 bits [47:20] of the virtual address in 64-bit used to + * index into bounds directory (BD). + */ +#define MPX_BD_ENTRY_OFFSET 28 +#define MPX_BD_ENTRY_SHIFT 3 +/* bits [19:3] of the virtual address in 64-bit used to index into + * bounds table (BT). + */ +#define MPX_BT_ENTRY_OFFSET 17 +#define MPX_BT_ENTRY_SHIFT 5 +#define MPX_IGN_BITS 3 + +#else + +#define MPX_BD_ENTRY_OFFSET 20 +#define MPX_BD_ENTRY_SHIFT 2 +#define MPX_BT_ENTRY_OFFSET 10 +#define MPX_BT_ENTRY_SHIFT 4 +#define MPX_IGN_BITS 2 + +#endif + +#define MPX_BD_SIZE_BYTES (1UL<<(MPX_BD_ENTRY_OFFSET+MPX_BD_ENTRY_SHIFT)) +#define MPX_BT_SIZE_BYTES (1UL<<(MPX_BT_ENTRY_OFFSET+MPX_BT_ENTRY_SHIFT)) + +#define MPX_BNDSTA_ERROR_CODE 0x3 + +unsigned long mpx_mmap(unsigned long len); + +#endif /* _ASM_X86_MPX_H */ diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile index 6a19ad9..ecfdc46 100644 --- a/arch/x86/mm/Makefile +++ b/arch/x86/mm/Makefile @@ -30,3 +30,5 @@ obj-$(CONFIG_ACPI_NUMA) += srat.o obj-$(CONFIG_NUMA_EMU) += numa_emulation.o obj-$(CONFIG_MEMTEST) += memtest.o + +obj-$(CONFIG_X86_INTEL_MPX) += mpx.o diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c new file mode 100644 index 0000000..546c5d1 --- /dev/null +++ b/arch/x86/mm/mpx.c @@ -0,0 +1,58 @@ +#include +#include +#include +#include +#include + +/* + * this is really a simplified "vm_mmap". it only handles mpx + * related maps, including bounds table and bounds directory. + * + * here we can stick new vm_flag VM_MPX in the vma_area_struct + * when create a bounds table or bounds directory, in order to + * track MPX specific memory. + */ +unsigned long mpx_mmap(unsigned long len) +{ + unsigned long ret; + unsigned long addr, pgoff; + struct mm_struct *mm = current->mm; + vm_flags_t vm_flags; + + /* Only bounds table and bounds directory can be allocated here */ + if (len != MPX_BD_SIZE_BYTES && len != MPX_BT_SIZE_BYTES) + return -EINVAL; + + down_write(&mm->mmap_sem); + + /* Too many mappings? */ + if (mm->map_count > sysctl_max_map_count) { + ret = -ENOMEM; + goto out; + } + + /* Obtain the address to map to. we verify (or select) it and ensure + * that it represents a valid section of the address space. + */ + addr = get_unmapped_area(NULL, 0, len, 0, MAP_ANONYMOUS | MAP_PRIVATE); + if (addr & ~PAGE_MASK) { + ret = addr; + goto out; + } + + vm_flags = VM_READ | VM_WRITE | VM_MPX | + mm->def_flags | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC; + + /* Make bounds tables and bouds directory unlocked. */ + if (vm_flags & VM_LOCKED) + vm_flags &= ~VM_LOCKED; + + /* Set pgoff according to addr for anon_vma */ + pgoff = addr >> PAGE_SHIFT; + + ret = mmap_region(NULL, addr, len, vm_flags, pgoff); + +out: + up_write(&mm->mmap_sem); + return ret; +} -- 1.7.1 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S965669AbaFRJsN (ORCPT ); Wed, 18 Jun 2014 05:48:13 -0400 Received: from mga02.intel.com ([134.134.136.20]:24707 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S965822AbaFRJsJ (ORCPT ); Wed, 18 Jun 2014 05:48:09 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.01,499,1400050800"; d="scan'208";a="530353801" From: Qiaowei Ren To: "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , Dave Hansen Cc: x86@kernel.org, linux-kernel@vger.kernel.org, Qiaowei Ren Subject: [PATCH v6 04/10] x86, mpx: hook #BR exception handler to allocate bound tables Date: Wed, 18 Jun 2014 17:44:10 +0800 Message-Id: <1403084656-27284-5-git-send-email-qiaowei.ren@intel.com> X-Mailer: git-send-email 1.7.1 In-Reply-To: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org This patch handles a #BR exception for non-existent tables by carving the space out of the normal processes address space (essentially calling mmap() from inside the kernel) and then pointing the bounds-directory over to it. The tables need to be accessed and controlled by userspace because the compiler generates instructions for MPX-enabled code which frequently store and retrieve entries from the bounds tables. Any direct kernel involvement (like a syscall) to access the tables would destroy performance since these are so frequent. The tables are carved out of userspace because we have no better spot to put them. For each pointer which is being tracked by MPX, the bounds tables contain 4 longs worth of data, and the tables are indexed virtually. If we were to preallocate the tables, we would theoretically need to allocate 4x the virtual space that we have available for userspace somewhere else. We don't have that room in the kernel address space. Signed-off-by: Qiaowei Ren --- arch/x86/include/asm/mpx.h | 20 ++++++++++++++ arch/x86/kernel/Makefile | 1 + arch/x86/kernel/mpx.c | 63 ++++++++++++++++++++++++++++++++++++++++++++ arch/x86/kernel/traps.c | 56 ++++++++++++++++++++++++++++++++++++++- 4 files changed, 139 insertions(+), 1 deletions(-) create mode 100644 arch/x86/kernel/mpx.c diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h index 5725ac4..b7598ac 100644 --- a/arch/x86/include/asm/mpx.h +++ b/arch/x86/include/asm/mpx.h @@ -18,6 +18,8 @@ #define MPX_BT_ENTRY_SHIFT 5 #define MPX_IGN_BITS 3 +#define MPX_BD_ENTRY_TAIL 3 + #else #define MPX_BD_ENTRY_OFFSET 20 @@ -26,13 +28,31 @@ #define MPX_BT_ENTRY_SHIFT 4 #define MPX_IGN_BITS 2 +#define MPX_BD_ENTRY_TAIL 2 + #endif +#define MPX_BNDSTA_TAIL 2 +#define MPX_BNDCFG_TAIL 12 +#define MPX_BNDSTA_ADDR_MASK (~((1UL< +#include +#include + +static int allocate_bt(long __user *bd_entry) +{ + unsigned long bt_addr, old_val = 0; + int ret = 0; + + bt_addr = mpx_mmap(MPX_BT_SIZE_BYTES); + if (IS_ERR((void *)bt_addr)) { + pr_err("Bounds table allocation failed at entry addr %p\n", + bd_entry); + return bt_addr; + } + bt_addr = (bt_addr & MPX_BT_ADDR_MASK) | MPX_BD_ENTRY_VALID_FLAG; + + ret = user_atomic_cmpxchg_inatomic(&old_val, bd_entry, 0, bt_addr); + if (ret) + goto out; + + /* + * there is a existing bounds table pointed at this bounds + * directory entry, and so we need to free the bounds table + * allocated just now. + */ + if (old_val) + goto out; + + pr_debug("Allocate bounds table %lx at entry %p\n", + bt_addr, bd_entry); + return 0; + +out: + vm_munmap(bt_addr & MPX_BT_ADDR_MASK, MPX_BT_SIZE_BYTES); + return ret; +} + +/* + * When a BNDSTX instruction attempts to save bounds to a BD entry + * with the lack of the valid bit being set, a #BR is generated. + * This is an indication that no BT exists for this entry. In this + * case the fault handler will allocate a new BT. + * + * With 32-bit mode, the size of BD is 4MB, and the size of each + * bound table is 16KB. With 64-bit mode, the size of BD is 2GB, + * and the size of each bound table is 4MB. + */ +int do_mpx_bt_fault(struct xsave_struct *xsave_buf) +{ + unsigned long status; + unsigned long bd_entry, bd_base; + + bd_base = xsave_buf->bndcsr.cfg_reg_u & MPX_BNDCFG_ADDR_MASK; + status = xsave_buf->bndcsr.status_reg; + + bd_entry = status & MPX_BNDSTA_ADDR_MASK; + if ((bd_entry < bd_base) || + (bd_entry >= bd_base + MPX_BD_SIZE_BYTES)) + return -EINVAL; + + return allocate_bt((long __user *)bd_entry); +} diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c index f73b5d4..35b9b29 100644 --- a/arch/x86/kernel/traps.c +++ b/arch/x86/kernel/traps.c @@ -59,6 +59,7 @@ #include #include #include +#include #ifdef CONFIG_X86_64 #include @@ -213,7 +214,6 @@ dotraplinkage void do_##name(struct pt_regs *regs, long error_code) \ DO_ERROR_INFO(X86_TRAP_DE, SIGFPE, "divide error", divide_error, FPE_INTDIV, regs->ip ) DO_ERROR (X86_TRAP_OF, SIGSEGV, "overflow", overflow ) -DO_ERROR (X86_TRAP_BR, SIGSEGV, "bounds", bounds ) DO_ERROR_INFO(X86_TRAP_UD, SIGILL, "invalid opcode", invalid_op, ILL_ILLOPN, regs->ip ) DO_ERROR (X86_TRAP_OLD_MF, SIGFPE, "coprocessor segment overrun", coprocessor_segment_overrun ) DO_ERROR (X86_TRAP_TS, SIGSEGV, "invalid TSS", invalid_TSS ) @@ -263,6 +263,60 @@ dotraplinkage void do_double_fault(struct pt_regs *regs, long error_code) } #endif +dotraplinkage void do_bounds(struct pt_regs *regs, long error_code) +{ + enum ctx_state prev_state; + unsigned long status; + struct xsave_struct *xsave_buf; + struct task_struct *tsk = current; + + prev_state = exception_enter(); + if (notify_die(DIE_TRAP, "bounds", regs, error_code, + X86_TRAP_BR, SIGSEGV) == NOTIFY_STOP) + goto exit; + conditional_sti(regs); + + if (!user_mode(regs)) + die("bounds", regs, error_code); + + if (!cpu_has_mpx) { + /* The exception is not from Intel MPX */ + do_trap(X86_TRAP_BR, SIGSEGV, "bounds", regs, error_code, NULL); + goto exit; + } + + fpu_xsave(&tsk->thread.fpu); + xsave_buf = &(tsk->thread.fpu.state->xsave); + status = xsave_buf->bndcsr.status_reg; + + /* + * The error code field of the BNDSTATUS register communicates status + * information of a bound range exception #BR or operation involving + * bound directory. + */ + switch (status & MPX_BNDSTA_ERROR_CODE) { + case 2: + /* + * Bound directory has invalid entry. + * No signal will be sent to the user space. + */ + if (do_mpx_bt_fault(xsave_buf)) + force_sig(SIGBUS, tsk); + break; + + case 1: /* Bound violation. */ + case 0: /* No exception caused by Intel MPX operations. */ + do_trap(X86_TRAP_BR, SIGSEGV, "bounds", regs, error_code, NULL); + break; + + default: + die("bounds", regs, error_code); + } + +exit: + exception_exit(prev_state); +} + dotraplinkage void __kprobes do_general_protection(struct pt_regs *regs, long error_code) { -- 1.7.1 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S965910AbaFRJsX (ORCPT ); Wed, 18 Jun 2014 05:48:23 -0400 Received: from mga02.intel.com ([134.134.136.20]:24707 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S965842AbaFRJsN (ORCPT ); Wed, 18 Jun 2014 05:48:13 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.01,499,1400050800"; d="scan'208";a="530353838" From: Qiaowei Ren To: "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , Dave Hansen Cc: x86@kernel.org, linux-kernel@vger.kernel.org, Qiaowei Ren Subject: [PATCH v6 06/10] mips: sync struct siginfo with general version Date: Wed, 18 Jun 2014 17:44:12 +0800 Message-Id: <1403084656-27284-7-git-send-email-qiaowei.ren@intel.com> X-Mailer: git-send-email 1.7.1 In-Reply-To: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Due to new fields about bound violation added into struct siginfo, this patch syncs it with general version to avoid build issue. Signed-off-by: Qiaowei Ren --- arch/mips/include/uapi/asm/siginfo.h | 4 ++++ 1 files changed, 4 insertions(+), 0 deletions(-) diff --git a/arch/mips/include/uapi/asm/siginfo.h b/arch/mips/include/uapi/asm/siginfo.h index e811744..d08f83f 100644 --- a/arch/mips/include/uapi/asm/siginfo.h +++ b/arch/mips/include/uapi/asm/siginfo.h @@ -92,6 +92,10 @@ typedef struct siginfo { int _trapno; /* TRAP # which caused the signal */ #endif short _addr_lsb; + struct { + void __user *_lower; + void __user *_upper; + } _addr_bnd; } _sigfault; /* SIGPOLL, SIGXFSZ (To do ...) */ -- 1.7.1 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S965879AbaFRJsT (ORCPT ); Wed, 18 Jun 2014 05:48:19 -0400 Received: from mga02.intel.com ([134.134.136.20]:24728 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S965351AbaFRJsP (ORCPT ); Wed, 18 Jun 2014 05:48:15 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.01,499,1400050800"; d="scan'208";a="530353860" From: Qiaowei Ren To: "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , Dave Hansen Cc: x86@kernel.org, linux-kernel@vger.kernel.org, Qiaowei Ren Subject: [PATCH v6 07/10] x86, mpx: decode MPX instruction to get bound violation information Date: Wed, 18 Jun 2014 17:44:13 +0800 Message-Id: <1403084656-27284-8-git-send-email-qiaowei.ren@intel.com> X-Mailer: git-send-email 1.7.1 In-Reply-To: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org This patch sets bound violation fields of siginfo struct in #BR exception handler by decoding the user instruction and constructing the faulting pointer. Signed-off-by: Qiaowei Ren --- arch/x86/include/asm/mpx.h | 23 ++++ arch/x86/kernel/mpx.c | 294 ++++++++++++++++++++++++++++++++++++++++++++ arch/x86/kernel/traps.c | 6 + 3 files changed, 323 insertions(+), 0 deletions(-) diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h index b7598ac..780af63 100644 --- a/arch/x86/include/asm/mpx.h +++ b/arch/x86/include/asm/mpx.h @@ -3,6 +3,7 @@ #include #include +#include #ifdef CONFIG_X86_64 @@ -44,15 +45,37 @@ #define MPX_BNDSTA_ERROR_CODE 0x3 #define MPX_BD_ENTRY_VALID_FLAG 0x1 +struct mpx_insn { + struct insn_field rex_prefix; /* REX prefix */ + struct insn_field modrm; + struct insn_field sib; + struct insn_field displacement; + + unsigned char addr_bytes; /* effective address size */ + unsigned char limit; + unsigned char x86_64; + + const unsigned char *kaddr; /* kernel address of insn to analyze */ + const unsigned char *next_byte; +}; + +#define MAX_MPX_INSN_SIZE 15 + unsigned long mpx_mmap(unsigned long len); #ifdef CONFIG_X86_INTEL_MPX int do_mpx_bt_fault(struct xsave_struct *xsave_buf); +void do_mpx_bounds(struct pt_regs *regs, siginfo_t *info, + struct xsave_struct *xsave_buf); #else static inline int do_mpx_bt_fault(struct xsave_struct *xsave_buf) { return -EINVAL; } +static inline void do_mpx_bounds(struct pt_regs *regs, siginfo_t *info, + struct xsave_struct *xsave_buf) +{ +} #endif /* CONFIG_X86_INTEL_MPX */ #endif /* _ASM_X86_MPX_H */ diff --git a/arch/x86/kernel/mpx.c b/arch/x86/kernel/mpx.c index 4230c7b..650b282 100644 --- a/arch/x86/kernel/mpx.c +++ b/arch/x86/kernel/mpx.c @@ -2,6 +2,270 @@ #include #include +typedef enum {REG_TYPE_RM, REG_TYPE_INDEX, REG_TYPE_BASE} reg_type_t; +static unsigned long get_reg(struct mpx_insn *insn, struct pt_regs *regs, + reg_type_t type) +{ + int regno = 0; + unsigned char modrm = (unsigned char)insn->modrm.value; + unsigned char sib = (unsigned char)insn->sib.value; + + static const int regoff[] = { + offsetof(struct pt_regs, ax), + offsetof(struct pt_regs, cx), + offsetof(struct pt_regs, dx), + offsetof(struct pt_regs, bx), + offsetof(struct pt_regs, sp), + offsetof(struct pt_regs, bp), + offsetof(struct pt_regs, si), + offsetof(struct pt_regs, di), +#ifdef CONFIG_X86_64 + offsetof(struct pt_regs, r8), + offsetof(struct pt_regs, r9), + offsetof(struct pt_regs, r10), + offsetof(struct pt_regs, r11), + offsetof(struct pt_regs, r12), + offsetof(struct pt_regs, r13), + offsetof(struct pt_regs, r14), + offsetof(struct pt_regs, r15), +#endif + }; + + switch (type) { + case REG_TYPE_RM: + regno = X86_MODRM_RM(modrm); + if (X86_REX_B(insn->rex_prefix.value) == 1) + regno += 8; + break; + + case REG_TYPE_INDEX: + regno = X86_SIB_INDEX(sib); + if (X86_REX_X(insn->rex_prefix.value) == 1) + regno += 8; + break; + + case REG_TYPE_BASE: + regno = X86_SIB_BASE(sib); + if (X86_REX_B(insn->rex_prefix.value) == 1) + regno += 8; + break; + + default: + break; + } + + return regs_get_register(regs, regoff[regno]); +} + +/* + * return the address being referenced be instruction + * for rm=3 returning the content of the rm reg + * for rm!=3 calculates the address using SIB and Disp + */ +static unsigned long get_addr_ref(struct mpx_insn *insn, struct pt_regs *regs) +{ + unsigned long addr; + unsigned long base; + unsigned long indx; + unsigned char modrm = (unsigned char)insn->modrm.value; + unsigned char sib = (unsigned char)insn->sib.value; + + if (X86_MODRM_MOD(modrm) == 3) { + addr = get_reg(insn, regs, REG_TYPE_RM); + } else { + if (insn->sib.nbytes) { + base = get_reg(insn, regs, REG_TYPE_BASE); + indx = get_reg(insn, regs, REG_TYPE_INDEX); + addr = base + indx * (1 << X86_SIB_SCALE(sib)); + } else { + addr = get_reg(insn, regs, REG_TYPE_RM); + } + addr += insn->displacement.value; + } + + return addr; +} + +/* Verify next sizeof(t) bytes can be on the same instruction */ +#define validate_next(t, insn, n) \ + ((insn)->next_byte + sizeof(t) + n - (insn)->kaddr <= (insn)->limit) + +#define __get_next(t, insn) \ +({ \ + t r = *(t *)insn->next_byte; \ + insn->next_byte += sizeof(t); \ + r; \ +}) + +#define __peek_next(t, insn) \ +({ \ + t r = *(t *)insn->next_byte; \ + r; \ +}) + +#define get_next(t, insn) \ +({ \ + if (unlikely(!validate_next(t, insn, 0))) \ + goto err_out; \ + __get_next(t, insn); \ +}) + +#define peek_next(t, insn) \ +({ \ + if (unlikely(!validate_next(t, insn, 0))) \ + goto err_out; \ + __peek_next(t, insn); \ +}) + +static void mpx_insn_get_prefixes(struct mpx_insn *insn) +{ + unsigned char b; + + /* Decode legacy prefix and REX prefix */ + b = peek_next(unsigned char, insn); + while (b != 0x0f) { + /* + * look for a rex prefix + * a REX prefix cannot be followed by a legacy prefix. + */ + if (insn->x86_64 && ((b&0xf0) == 0x40)) { + insn->rex_prefix.value = b; + insn->rex_prefix.nbytes = 1; + insn->next_byte++; + break; + } + + /* check the other legacy prefixes */ + switch (b) { + case 0xf2: + case 0xf3: + case 0xf0: + case 0x64: + case 0x65: + case 0x2e: + case 0x3e: + case 0x26: + case 0x36: + case 0x66: + case 0x67: + insn->next_byte++; + break; + default: /* everything else is garbage */ + goto err_out; + } + b = peek_next(unsigned char, insn); + } + +err_out: + return; +} + +static void mpx_insn_get_modrm(struct mpx_insn *insn) +{ + insn->modrm.value = get_next(unsigned char, insn); + insn->modrm.nbytes = 1; + +err_out: + return; +} + +static void mpx_insn_get_sib(struct mpx_insn *insn) +{ + unsigned char modrm = (unsigned char)insn->modrm.value; + + if (X86_MODRM_MOD(modrm) != 3 && X86_MODRM_RM(modrm) == 4) { + insn->sib.value = get_next(unsigned char, insn); + insn->sib.nbytes = 1; + } + +err_out: + return; +} + +static void mpx_insn_get_displacement(struct mpx_insn *insn) +{ + unsigned char mod, rm, base; + + /* + * Interpreting the modrm byte: + * mod = 00 - no displacement fields (exceptions below) + * mod = 01 - 1-byte displacement field + * mod = 10 - displacement field is 4 bytes + * mod = 11 - no memory operand + * + * mod != 11, r/m = 100 - SIB byte exists + * mod = 00, SIB base = 101 - displacement field is 4 bytes + * mod = 00, r/m = 101 - rip-relative addressing, displacement + * field is 4 bytes + */ + mod = X86_MODRM_MOD(insn->modrm.value); + rm = X86_MODRM_RM(insn->modrm.value); + base = X86_SIB_BASE(insn->sib.value); + if (mod == 3) + return; + if (mod == 1) { + insn->displacement.value = get_next(unsigned char, insn); + insn->displacement.nbytes = 1; + } else if ((mod == 0 && rm == 5) || mod == 2 || + (mod == 0 && base == 5)) { + insn->displacement.value = get_next(int, insn); + insn->displacement.nbytes = 4; + } + +err_out: + return; +} + +static void mpx_insn_init(struct mpx_insn *insn, struct pt_regs *regs) +{ + unsigned char buf[MAX_MPX_INSN_SIZE]; + int bytes; + + memset(insn, 0, sizeof(*insn)); + + bytes = copy_from_user(buf, (void __user *)regs->ip, MAX_MPX_INSN_SIZE); + insn->limit = MAX_MPX_INSN_SIZE - bytes; + insn->kaddr = buf; + insn->next_byte = buf; + + /* + * In 64-bit Mode, all Intel MPX instructions use 64-bit + * operands for bounds and 64 bit addressing, i.e. REX.W & + * 67H have no effect on data or address size. + * + * In compatibility and legacy modes (including 16-bit code + * segments, real and virtual 8086 modes) all Intel MPX + * instructions use 32-bit operands for bounds and 32 bit + * addressing. + */ +#ifdef CONFIG_X86_64 + insn->x86_64 = 1; + insn->addr_bytes = 8; +#else + insn->x86_64 = 0; + insn->addr_bytes = 4; +#endif +} + +static unsigned long mpx_insn_decode(struct mpx_insn *insn, + struct pt_regs *regs) +{ + mpx_insn_init(insn, regs); + + /* + * In this case, we only need decode bndcl/bndcn/bndcu, + * so we can use private diassembly interfaces to get + * prefixes, modrm, sib, displacement, etc.. + */ + mpx_insn_get_prefixes(insn); + insn->next_byte += 2; /* ignore opcode */ + mpx_insn_get_modrm(insn); + mpx_insn_get_sib(insn); + mpx_insn_get_displacement(insn); + + return get_addr_ref(insn, regs); +} + static int allocate_bt(long __user *bd_entry) { unsigned long bt_addr, old_val = 0; @@ -61,3 +325,33 @@ int do_mpx_bt_fault(struct xsave_struct *xsave_buf) return allocate_bt((long __user *)bd_entry); } + +/* + * If a bounds overflow occurs then a #BR is generated. The fault + * handler will decode MPX instructions to get violation address + * and set this address into extended struct siginfo. + */ +void do_mpx_bounds(struct pt_regs *regs, siginfo_t *info, + struct xsave_struct *xsave_buf) +{ + struct mpx_insn insn; + uint8_t bndregno; + unsigned long addr_vio; + + addr_vio = mpx_insn_decode(&insn, regs); + + bndregno = X86_MODRM_REG(insn.modrm.value); + if (bndregno > 3) + return; + + /* Note: the upper 32 bits are ignored in 32-bit mode. */ + info->si_lower = (void __user *)(unsigned long) + (xsave_buf->bndregs.bndregs[2*bndregno]); + info->si_upper = (void __user *)(unsigned long) + (~xsave_buf->bndregs.bndregs[2*bndregno+1]); + info->si_addr_lsb = 0; + info->si_signo = SIGSEGV; + info->si_errno = 0; + info->si_code = SEGV_BNDERR; + info->si_addr = (void __user *)addr_vio; +} diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c index 35b9b29..859cf50 100644 --- a/arch/x86/kernel/traps.c +++ b/arch/x86/kernel/traps.c @@ -269,6 +269,7 @@ dotraplinkage void do_bounds(struct pt_regs *regs, long error_code) unsigned long status; struct xsave_struct *xsave_buf; struct task_struct *tsk = current; + siginfo_t info; prev_state = exception_enter(); if (notify_die(DIE_TRAP, "bounds", regs, error_code, @@ -305,6 +306,11 @@ dotraplinkage void do_bounds(struct pt_regs *regs, long error_code) break; case 1: /* Bound violation. */ + do_mpx_bounds(regs, &info, xsave_buf); + do_trap(X86_TRAP_BR, SIGSEGV, "bounds", regs, + error_code, &info); + break; + case 0: /* No exception caused by Intel MPX operations. */ do_trap(X86_TRAP_BR, SIGSEGV, "bounds", regs, error_code, NULL); break; -- 1.7.1 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S965963AbaFRJsc (ORCPT ); Wed, 18 Jun 2014 05:48:32 -0400 Received: from mga02.intel.com ([134.134.136.20]:24728 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S965901AbaFRJsW (ORCPT ); Wed, 18 Jun 2014 05:48:22 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.01,499,1400050800"; d="scan'208";a="530353932" From: Qiaowei Ren To: "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , Dave Hansen Cc: x86@kernel.org, linux-kernel@vger.kernel.org, Qiaowei Ren Subject: [PATCH v6 10/10] x86, mpx: add documentation on Intel MPX Date: Wed, 18 Jun 2014 17:44:16 +0800 Message-Id: <1403084656-27284-11-git-send-email-qiaowei.ren@intel.com> X-Mailer: git-send-email 1.7.1 In-Reply-To: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org This patch adds the Documentation/x86/intel_mpx.txt file with some information about Intel MPX. Signed-off-by: Qiaowei Ren --- Documentation/x86/intel_mpx.txt | 127 +++++++++++++++++++++++++++++++++++++++ 1 files changed, 127 insertions(+), 0 deletions(-) create mode 100644 Documentation/x86/intel_mpx.txt diff --git a/Documentation/x86/intel_mpx.txt b/Documentation/x86/intel_mpx.txt new file mode 100644 index 0000000..1af9809 --- /dev/null +++ b/Documentation/x86/intel_mpx.txt @@ -0,0 +1,127 @@ +1. Intel(R) MPX Overview +======================== + +Intel(R) Memory Protection Extensions (Intel(R) MPX) is a new +capability introduced into Intel Architecture. Intel MPX provides +hardware features that can be used in conjunction with compiler +changes to check memory references, for those references whose +compile-time normal intentions are usurped at runtime due to +buffer overflow or underflow. + +For more information, please refer to Intel(R) Architecture +Instruction Set Extensions Programming Reference, Chapter 9: +Intel(R) Memory Protection Extensions. + +Note: Currently no hardware with MPX ISA is available but it is always +possible to use SDE (Intel(R) Software Development Emulator) instead, +which can be downloaded from +http://software.intel.com/en-us/articles/intel-software-development-emulator + + +2. How does MPX kernel code work +================================ + +Handling #BR faults caused by MPX +--------------------------------- + +When MPX is enabled, there are 2 new situations that can generate +#BR faults. + * bounds violation caused by MPX instructions. + * new bounds tables (BT) need to be allocated to save bounds. + +We hook #BR handler to handle these two new situations. + +Decoding MPX instructions +------------------------- + +If a #BR is generated due to a bounds violation caused by MPX. +We need to decode MPX instructions to get violation address and +set this address into extended struct siginfo. + +The _sigfault feild of struct siginfo is extended as follow: + +87 /* SIGILL, SIGFPE, SIGSEGV, SIGBUS */ +88 struct { +89 void __user *_addr; /* faulting insn/memory ref. */ +90 #ifdef __ARCH_SI_TRAPNO +91 int _trapno; /* TRAP # which caused the signal */ +92 #endif +93 short _addr_lsb; /* LSB of the reported address */ +94 struct { +95 void __user *_lower; +96 void __user *_upper; +97 } _addr_bnd; +98 } _sigfault; + +The '_addr' field refers to violation address, and new '_addr_and' +field refers to the upper/lower bounds when a #BR is caused. + +Glibc will be also updated to support this new siginfo. So user +can get violation address and bounds when bounds violations occur. + +Freeing unused bounds tables +---------------------------- + +When a BNDSTX instruction attempts to save bounds to a bounds directory +entry marked as invalid, a #BR is generated. This is an indication that +no bounds table exists for this entry. In this case the fault handler +will allocate a new bounds table on demand. + +Since the kernel allocated those tables on-demand without userspace +knowledge, it is also responsible for freeing them when the associated +mappings go away. + +Here, the solution for this issue is to hook do_munmap() to check +whether one process is MPX enabled. If yes, those bounds tables covered +in the virtual address region which is being unmapped will be freed also. + +Adding new prctl commands +------------------------- + +Runtime library in userspace is responsible for allocation of bounds +directory. So kernel have to use XSAVE instruction to get the base +of bounds directory from BNDCFG register. + +But XSAVE is expected to be very expensive. In order to do performance +optimization, we have to add new prctl command to get the base of +bounds directory to be used in future. + +Two new prctl commands are added to register and unregister MPX related +resource. + +155 #define PR_MPX_REGISTER 41 +156 #define PR_MPX_UNREGISTER 42 + +The base of the bounds directory is set into mm_struct during +PR_MPX_REGISTER command execution. This member can be used to +check whether one application is mpx enabled. + + +3. Tips +======= + +1) Users are not allowed to create bounds tables and point the bounds +directory at them in the userspace. In fact, it is not also necessary +for users to create bounds tables in the userspace. + +When #BR fault is produced due to invalid entry, bounds table will be +created in kernel on demand and kernel will not transfer this fault to +userspace. So usersapce can't receive #BR fault for invalid entry, and +it is not also necessary for users to create bounds tables by themselves. + +Certainly users can allocate bounds tables and forcibly point the bounds +directory at them through XSAVE instruction, and then set valid bit +of bounds entry to have this entry valid. But we have no way to track +the memory usage of these user-created bounds tables. In regard to this, +this behaviour is outlawed here. + +2) We will not support the case that multiple bounds directory entries +are pointed at the same bounds table. + +Users can be allowed to take multiple bounds directory entries and point +them at the same bounds table. See more information "Intel(R) Architecture +Instruction Set Extensions Programming Reference" (9.3.4). + +If userspace did this, it will be possible for kernel to unmap an in-use +bounds table since it does not recognize sharing. So this behavior is +also outlawed here. -- 1.7.1 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S965974AbaFRJtF (ORCPT ); Wed, 18 Jun 2014 05:49:05 -0400 Received: from mga02.intel.com ([134.134.136.20]:24734 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S965351AbaFRJsU (ORCPT ); Wed, 18 Jun 2014 05:48:20 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.01,499,1400050800"; d="scan'208";a="530353907" From: Qiaowei Ren To: "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , Dave Hansen Cc: x86@kernel.org, linux-kernel@vger.kernel.org, Qiaowei Ren Subject: [PATCH v6 09/10] x86, mpx: cleanup unused bound tables Date: Wed, 18 Jun 2014 17:44:15 +0800 Message-Id: <1403084656-27284-10-git-send-email-qiaowei.ren@intel.com> X-Mailer: git-send-email 1.7.1 In-Reply-To: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org When user memory region is unmapped, related bound tables become unused and need to be released also. This patch cleanups these unused bound tables through hooking unmap path. Signed-off-by: Qiaowei Ren --- arch/x86/include/asm/mmu_context.h | 16 +++ arch/x86/include/asm/mpx.h | 9 ++ arch/x86/mm/mpx.c | 189 ++++++++++++++++++++++++++++++++++++ include/asm-generic/mmu_context.h | 6 + mm/mmap.c | 2 + 5 files changed, 222 insertions(+), 0 deletions(-) diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h index be12c53..af70d4f 100644 --- a/arch/x86/include/asm/mmu_context.h +++ b/arch/x86/include/asm/mmu_context.h @@ -6,6 +6,7 @@ #include #include #include +#include #ifndef CONFIG_PARAVIRT #include @@ -96,4 +97,19 @@ do { \ } while (0) #endif +static inline void arch_unmap(struct mm_struct *mm, + struct vm_area_struct *vma, + unsigned long start, unsigned long end) +{ +#ifdef CONFIG_X86_INTEL_MPX + /* + * Check whether this vma comes from MPX-enabled application. + * If so, release this vma related bound tables. + */ + if (mm->bd_addr && !(vma->vm_flags & VM_MPX)) + mpx_unmap(mm, start, end); + +#endif +} + #endif /* _ASM_X86_MMU_CONTEXT_H */ diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h index 6cb0853..e848a74 100644 --- a/arch/x86/include/asm/mpx.h +++ b/arch/x86/include/asm/mpx.h @@ -42,6 +42,13 @@ #define MPX_BD_SIZE_BYTES (1UL<<(MPX_BD_ENTRY_OFFSET+MPX_BD_ENTRY_SHIFT)) #define MPX_BT_SIZE_BYTES (1UL<<(MPX_BT_ENTRY_OFFSET+MPX_BT_ENTRY_SHIFT)) +#define MPX_BD_ENTRY_MASK ((1<>(MPX_BT_ENTRY_OFFSET+ \ + MPX_IGN_BITS)) & MPX_BD_ENTRY_MASK) << MPX_BD_ENTRY_SHIFT) +#define MPX_GET_BT_ENTRY_OFFSET(addr) ((((addr)>>MPX_IGN_BITS) & \ + MPX_BT_ENTRY_MASK) << MPX_BT_ENTRY_SHIFT) + #define MPX_BNDSTA_ERROR_CODE 0x3 #define MPX_BNDCFG_ENABLE_FLAG 0x1 #define MPX_BD_ENTRY_VALID_FLAG 0x1 @@ -63,6 +70,8 @@ struct mpx_insn { #define MAX_MPX_INSN_SIZE 15 unsigned long mpx_mmap(unsigned long len); +void mpx_unmap(struct mm_struct *mm, + unsigned long start, unsigned long end); #ifdef CONFIG_X86_INTEL_MPX int do_mpx_bt_fault(struct xsave_struct *xsave_buf); diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c index 546c5d1..fd05cd4 100644 --- a/arch/x86/mm/mpx.c +++ b/arch/x86/mm/mpx.c @@ -2,6 +2,7 @@ #include #include #include +#include #include /* @@ -56,3 +57,191 @@ out: up_write(&mm->mmap_sem); return ret; } + +/* + * Get the base of bounds tables pointed by specific bounds + * directory entry. + */ +static int get_bt_addr(long __user *bd_entry, unsigned long *bt_addr, + unsigned int *valid) +{ + if (get_user(*bt_addr, bd_entry)) + return -EFAULT; + + *valid = *bt_addr & MPX_BD_ENTRY_VALID_FLAG; + *bt_addr &= MPX_BT_ADDR_MASK; + + /* + * If this bounds directory entry is nonzero, and meanwhile + * the valid bit is zero, one SIGSEGV will be produced due to + * this unexpected situation. + */ + if (!(*valid) && *bt_addr) + force_sig(SIGSEGV, current); + + pr_debug("get_bt: BD Entry (%p) - Table (%lx,%d)\n", + bd_entry, *bt_addr, *valid); + return 0; +} + +/* + * Free the backing physical pages of bounds table 'bt_addr'. + * Assume start...end is within that bounds table. + */ +static void zap_bt_entries(struct mm_struct *mm, unsigned long bt_addr, + unsigned long start, unsigned long end) +{ + struct vm_area_struct *vma; + + /* Find the vma which overlaps this bounds table */ + vma = find_vma(mm, bt_addr); + if (!vma || vma->vm_start > bt_addr || + vma->vm_end < bt_addr+MPX_BT_SIZE_BYTES) + return; + + zap_page_range(vma, start, end, NULL); + pr_debug("Bound table de-allocation %lx (%lx, %lx)\n", + bt_addr, start, end); +} + +static void unmap_single_bt(struct mm_struct *mm, long __user *bd_entry, + unsigned long bt_addr) +{ + if (user_atomic_cmpxchg_inatomic(&bt_addr, bd_entry, + bt_addr | MPX_BD_ENTRY_VALID_FLAG, 0)) + return; + + pr_debug("Bound table de-allocation %lx at entry addr %p\n", + bt_addr, bd_entry); + /* + * to avoid recursion, do_munmap() will check whether it comes + * from one bounds table through VM_MPX flag. + */ + do_munmap(mm, bt_addr & MPX_BT_ADDR_MASK, MPX_BT_SIZE_BYTES); +} + +/* + * If the bounds table pointed by bounds directory 'bd_entry' is + * not shared, unmap this whole bounds table. Otherwise, only free + * those backing physical pages of bounds table entries covered + * in this virtual address region start...end. + */ +static void unmap_shared_bt(struct mm_struct *mm, long __user *bd_entry, + unsigned long start, unsigned long end, + bool prev_shared, bool next_shared) +{ + unsigned long bt_addr; + unsigned int bde_valid = 0; + + if (get_bt_addr(bd_entry, &bt_addr, &bde_valid) || !bde_valid) + return; + + if (prev_shared && next_shared) + zap_bt_entries(mm, bt_addr, + bt_addr+MPX_GET_BT_ENTRY_OFFSET(start), + bt_addr+MPX_GET_BT_ENTRY_OFFSET(end-1)); + else if (prev_shared) + zap_bt_entries(mm, bt_addr, + bt_addr+MPX_GET_BT_ENTRY_OFFSET(start), + bt_addr+MPX_BT_SIZE_BYTES); + else if (next_shared) + zap_bt_entries(mm, bt_addr, bt_addr, + bt_addr+MPX_GET_BT_ENTRY_OFFSET(end-1)); + else + unmap_single_bt(mm, bd_entry, bt_addr); +} + +/* + * A virtual address region being munmap()ed might share bounds table + * with adjacent VMAs. We only need to free the backing physical + * memory of these shared bounds tables entries covered in this virtual + * address region. + * + * the VMAs covering the virtual address region start...end have already + * been split if necessary and removed from the VMA list. + */ +static void unmap_side_bts(struct mm_struct *mm, unsigned long start, + unsigned long end) +{ + long __user *bde_start, *bde_end; + struct vm_area_struct *prev, *next; + bool prev_shared = false, next_shared = false; + + bde_start = mm->bd_addr + MPX_GET_BD_ENTRY_OFFSET(start); + bde_end = mm->bd_addr + MPX_GET_BD_ENTRY_OFFSET(end-1); + + /* + * Check whether bde_start and bde_end are shared with adjacent + * VMAs. Because the VMAs covering the virtual address region + * start...end have already been removed from the VMA list, if + * next is not NULL it will satisfy start < end <= next->vm_start. + * And if prev is not NULL, prev->vm_end <= start < end. + */ + next = find_vma_prev(mm, start, &prev); + if (prev && MPX_GET_BD_ENTRY_OFFSET(prev->vm_end-1) == (long)bde_start) + prev_shared = true; + if (next && MPX_GET_BD_ENTRY_OFFSET(next->vm_start) == (long)bde_end) + next_shared = true; + + /* + * This virtual address region being munmap()ed is only + * covered by one bounds table. + * + * In this case, if this table is also shared with adjacent + * VMAs, only part of the backing physical memory of the bounds + * table need be freeed. Otherwise the whole bounds table need + * be unmapped. + */ + if (bde_start == bde_end) { + unmap_shared_bt(mm, bde_start, start, end, + prev_shared, next_shared); + return; + } + + /* + * If more than one bounds tables are covered in this virtual + * address region being munmap()ed, we need to separately check + * whether bde_start and bde_end are shared with adjacent VMAs. + */ + unmap_shared_bt(mm, bde_start, start, end, prev_shared, false); + unmap_shared_bt(mm, bde_end, start, end, false, next_shared); +} + +/* + * Free unused bounds tables covered in a virtual address region being + * munmap()ed. Assume end > start. + * + * This function will be called by do_munmap(), and the VMAs covering + * the virtual address region start...end have already been split if + * necessary and remvoed from the VMA list. + */ +void mpx_unmap(struct mm_struct *mm, + unsigned long start, unsigned long end) +{ + long __user *bd_entry, *bde_start, *bde_end; + unsigned long bt_addr; + unsigned int bde_valid; + + pr_debug("addr(%lx, %lx) related Bounds Table will be released\n", + start, end); + /* + * unmap bounds tables pointed out by start/end bounds directory + * entries, or only free part of their backing physical memroy + * if they are shared with adjacent VMAs. + */ + unmap_side_bts(mm, start, end); + + /* + * unmap those bounds table which are entirely covered in this + * virtual address region. + */ + bde_start = mm->bd_addr + MPX_GET_BD_ENTRY_OFFSET(start); + bde_end = mm->bd_addr + MPX_GET_BD_ENTRY_OFFSET(end-1); + for (bd_entry = bde_start + 1; bd_entry < bde_end; bd_entry++) { + if (get_bt_addr(bd_entry, &bt_addr, &bde_valid)) + return; + if (!bde_valid) + continue; + unmap_single_bt(mm, bd_entry, bt_addr); + } +} diff --git a/include/asm-generic/mmu_context.h b/include/asm-generic/mmu_context.h index a7eec91..ac558ca 100644 --- a/include/asm-generic/mmu_context.h +++ b/include/asm-generic/mmu_context.h @@ -42,4 +42,10 @@ static inline void activate_mm(struct mm_struct *prev_mm, { } +static inline void arch_unmap(struct mm_struct *mm, + struct vm_area_struct *vma, + unsigned long start, unsigned long end) +{ +} + #endif /* __ASM_GENERIC_MMU_CONTEXT_H */ diff --git a/mm/mmap.c b/mm/mmap.c index b1202cf..8d8ac65 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -2558,6 +2558,8 @@ int do_munmap(struct mm_struct *mm, unsigned long start, size_t len) /* Fix up all other VM information */ remove_vma_list(mm, vma); + arch_unmap(mm, vma, start, end); + return 0; } -- 1.7.1 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S965822AbaFRJtf (ORCPT ); Wed, 18 Jun 2014 05:49:35 -0400 Received: from mga02.intel.com ([134.134.136.20]:24728 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S965855AbaFRJsR (ORCPT ); Wed, 18 Jun 2014 05:48:17 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.01,499,1400050800"; d="scan'208";a="530353889" From: Qiaowei Ren To: "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , Dave Hansen Cc: x86@kernel.org, linux-kernel@vger.kernel.org, Qiaowei Ren Subject: [PATCH v6 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER Date: Wed, 18 Jun 2014 17:44:14 +0800 Message-Id: <1403084656-27284-9-git-send-email-qiaowei.ren@intel.com> X-Mailer: git-send-email 1.7.1 In-Reply-To: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org This patch adds the PR_MPX_REGISTER and PR_MPX_UNREGISTER prctl() commands. These commands can be used to register and unregister MPX related resource on the x86 platform. The base of the bounds directory is set into mm_struct during PR_MPX_REGISTER command execution. This member can be used to check whether one application is mpx enabled. Signed-off-by: Qiaowei Ren --- arch/x86/include/asm/mpx.h | 1 + arch/x86/include/asm/processor.h | 18 ++++++++++++ arch/x86/kernel/mpx.c | 56 ++++++++++++++++++++++++++++++++++++++ include/linux/mm_types.h | 3 ++ include/uapi/linux/prctl.h | 6 ++++ kernel/sys.c | 12 ++++++++ 6 files changed, 96 insertions(+), 0 deletions(-) diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h index 780af63..6cb0853 100644 --- a/arch/x86/include/asm/mpx.h +++ b/arch/x86/include/asm/mpx.h @@ -43,6 +43,7 @@ #define MPX_BT_SIZE_BYTES (1UL<<(MPX_BT_ENTRY_OFFSET+MPX_BT_ENTRY_SHIFT)) #define MPX_BNDSTA_ERROR_CODE 0x3 +#define MPX_BNDCFG_ENABLE_FLAG 0x1 #define MPX_BD_ENTRY_VALID_FLAG 0x1 struct mpx_insn { diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index a4ea023..6e0966e 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -952,6 +952,24 @@ extern void start_thread(struct pt_regs *regs, unsigned long new_ip, extern int get_tsc_mode(unsigned long adr); extern int set_tsc_mode(unsigned int val); +/* Register/unregister a process' MPX related resource */ +#define MPX_REGISTER(tsk) mpx_register((tsk)) +#define MPX_UNREGISTER(tsk) mpx_unregister((tsk)) + +#ifdef CONFIG_X86_INTEL_MPX +extern int mpx_register(struct task_struct *tsk); +extern int mpx_unregister(struct task_struct *tsk); +#else +static inline int mpx_register(struct task_struct *tsk) +{ + return -EINVAL; +} +static inline int mpx_unregister(struct task_struct *tsk) +{ + return -EINVAL; +} +#endif /* CONFIG_X86_INTEL_MPX */ + extern u16 amd_get_nb_id(int cpu); static inline uint32_t hypervisor_cpuid_base(const char *sig, uint32_t leaves) diff --git a/arch/x86/kernel/mpx.c b/arch/x86/kernel/mpx.c index 650b282..d8a2a09 100644 --- a/arch/x86/kernel/mpx.c +++ b/arch/x86/kernel/mpx.c @@ -1,6 +1,62 @@ #include #include +#include #include +#include +#include + +/* + * This should only be called when cpuid has been checked + * and we are sure that MPX is available. + */ +static __user void *task_get_bounds_dir(struct task_struct *tsk) +{ + struct xsave_struct *xsave_buf; + + fpu_xsave(&tsk->thread.fpu); + xsave_buf = &(tsk->thread.fpu.state->xsave); + if (!(xsave_buf->bndcsr.cfg_reg_u & MPX_BNDCFG_ENABLE_FLAG)) + return NULL; + + return (void __user *)(xsave_buf->bndcsr.cfg_reg_u & + MPX_BNDCFG_ADDR_MASK); +} + +int mpx_register(struct task_struct *tsk) +{ + struct mm_struct *mm = tsk->mm; + + if (!cpu_has_mpx) + return -EINVAL; + + /* + * runtime in the userspace will be responsible for allocation of + * the bounds directory. Then, it will save the base of the bounds + * directory into XSAVE/XRSTOR Save Area and enable MPX through + * XRSTOR instruction. + * + * fpu_xsave() is expected to be very expensive. In order to do + * performance optimization, here we get the base of the bounds + * directory and then save it into mm_struct to be used in future. + */ + mm->bd_addr = task_get_bounds_dir(tsk); + if (!mm->bd_addr) + return -EINVAL; + + pr_debug("MPX BD base address %p\n", mm->bd_addr); + return 0; +} + +int mpx_unregister(struct task_struct *tsk) +{ + struct mm_struct *mm = current->mm; + + if (!cpu_has_mpx) + return -EINVAL; + + mm->bd_addr = NULL; + return 0; +} typedef enum {REG_TYPE_RM, REG_TYPE_INDEX, REG_TYPE_BASE} reg_type_t; static unsigned long get_reg(struct mpx_insn *insn, struct pt_regs *regs, diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 8967e20..54b8011 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -454,6 +454,9 @@ struct mm_struct { bool tlb_flush_pending; #endif struct uprobes_state uprobes_state; +#ifdef CONFIG_X86_INTEL_MPX + void __user *bd_addr; /* address of the bounds directory */ +#endif }; static inline void mm_init_cpumask(struct mm_struct *mm) diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h index 58afc04..ce86fa9 100644 --- a/include/uapi/linux/prctl.h +++ b/include/uapi/linux/prctl.h @@ -152,4 +152,10 @@ #define PR_SET_THP_DISABLE 41 #define PR_GET_THP_DISABLE 42 +/* + * Register/unregister MPX related resource. + */ +#define PR_MPX_REGISTER 43 +#define PR_MPX_UNREGISTER 44 + #endif /* _LINUX_PRCTL_H */ diff --git a/kernel/sys.c b/kernel/sys.c index fba0f29..ed1a03c 100644 --- a/kernel/sys.c +++ b/kernel/sys.c @@ -91,6 +91,12 @@ #ifndef SET_TSC_CTL # define SET_TSC_CTL(a) (-EINVAL) #endif +#ifndef MPX_REGISTER +# define MPX_REGISTER(a) (-EINVAL) +#endif +#ifndef MPX_UNREGISTER +# define MPX_UNREGISTER(a) (-EINVAL) +#endif /* * this is where the system-wide overflow UID and GID are defined, for @@ -2011,6 +2017,12 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3, me->mm->def_flags &= ~VM_NOHUGEPAGE; up_write(&me->mm->mmap_sem); break; + case PR_MPX_REGISTER: + error = MPX_REGISTER(me); + break; + case PR_MPX_UNREGISTER: + error = MPX_UNREGISTER(me); + break; default: error = -EINVAL; break; -- 1.7.1 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S965985AbaFRJt5 (ORCPT ); Wed, 18 Jun 2014 05:49:57 -0400 Received: from mga02.intel.com ([134.134.136.20]:24707 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S965007AbaFRJsL (ORCPT ); Wed, 18 Jun 2014 05:48:11 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.01,499,1400050800"; d="scan'208";a="530353823" From: Qiaowei Ren To: "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , Dave Hansen Cc: x86@kernel.org, linux-kernel@vger.kernel.org, Qiaowei Ren Subject: [PATCH v6 05/10] x86, mpx: extend siginfo structure to include bound violation information Date: Wed, 18 Jun 2014 17:44:11 +0800 Message-Id: <1403084656-27284-6-git-send-email-qiaowei.ren@intel.com> X-Mailer: git-send-email 1.7.1 In-Reply-To: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org This patch adds new fields about bound violation into siginfo structure. si_lower and si_upper are respectively lower bound and upper bound when bound violation is caused. Signed-off-by: Qiaowei Ren --- include/uapi/asm-generic/siginfo.h | 9 ++++++++- kernel/signal.c | 4 ++++ 2 files changed, 12 insertions(+), 1 deletions(-) diff --git a/include/uapi/asm-generic/siginfo.h b/include/uapi/asm-generic/siginfo.h index ba5be7f..1e35520 100644 --- a/include/uapi/asm-generic/siginfo.h +++ b/include/uapi/asm-generic/siginfo.h @@ -91,6 +91,10 @@ typedef struct siginfo { int _trapno; /* TRAP # which caused the signal */ #endif short _addr_lsb; /* LSB of the reported address */ + struct { + void __user *_lower; + void __user *_upper; + } _addr_bnd; } _sigfault; /* SIGPOLL */ @@ -131,6 +135,8 @@ typedef struct siginfo { #define si_trapno _sifields._sigfault._trapno #endif #define si_addr_lsb _sifields._sigfault._addr_lsb +#define si_lower _sifields._sigfault._addr_bnd._lower +#define si_upper _sifields._sigfault._addr_bnd._upper #define si_band _sifields._sigpoll._band #define si_fd _sifields._sigpoll._fd #ifdef __ARCH_SIGSYS @@ -199,7 +205,8 @@ typedef struct siginfo { */ #define SEGV_MAPERR (__SI_FAULT|1) /* address not mapped to object */ #define SEGV_ACCERR (__SI_FAULT|2) /* invalid permissions for mapped object */ -#define NSIGSEGV 2 +#define SEGV_BNDERR (__SI_FAULT|3) /* failed address bound checks */ +#define NSIGSEGV 3 /* * SIGBUS si_codes diff --git a/kernel/signal.c b/kernel/signal.c index 6ea13c0..0fcf749 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -2773,6 +2773,10 @@ int copy_siginfo_to_user(siginfo_t __user *to, const siginfo_t *from) if (from->si_code == BUS_MCEERR_AR || from->si_code == BUS_MCEERR_AO) err |= __put_user(from->si_addr_lsb, &to->si_addr_lsb); #endif +#ifdef SEGV_BNDERR + err |= __put_user(from->si_lower, &to->si_lower); + err |= __put_user(from->si_upper, &to->si_upper); +#endif break; case __SI_CHLD: err |= __put_user(from->si_pid, &to->si_pid); -- 1.7.1 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S966061AbaFRJvP (ORCPT ); Wed, 18 Jun 2014 05:51:15 -0400 Received: from mga02.intel.com ([134.134.136.20]:24700 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S965781AbaFRJsH (ORCPT ); Wed, 18 Jun 2014 05:48:07 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.01,499,1400050800"; d="scan'208";a="530353782" From: Qiaowei Ren To: "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , Dave Hansen Cc: x86@kernel.org, linux-kernel@vger.kernel.org, Qiaowei Ren Subject: [PATCH v6 03/10] x86, mpx: add macro cpu_has_mpx Date: Wed, 18 Jun 2014 17:44:09 +0800 Message-Id: <1403084656-27284-4-git-send-email-qiaowei.ren@intel.com> X-Mailer: git-send-email 1.7.1 In-Reply-To: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org In order to do performance optimization, this patch adds macro cpu_has_mpx which will directly return 0 when MPX is not supported by kernel. Signed-off-by: Qiaowei Ren --- arch/x86/include/asm/cpufeature.h | 6 ++++++ 1 files changed, 6 insertions(+), 0 deletions(-) diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h index e265ff9..f302d08 100644 --- a/arch/x86/include/asm/cpufeature.h +++ b/arch/x86/include/asm/cpufeature.h @@ -339,6 +339,12 @@ extern const char * const x86_power_flags[32]; #define cpu_has_eager_fpu boot_cpu_has(X86_FEATURE_EAGER_FPU) #define cpu_has_topoext boot_cpu_has(X86_FEATURE_TOPOEXT) +#ifdef CONFIG_X86_INTEL_MPX +#define cpu_has_mpx boot_cpu_has(X86_FEATURE_MPX) +#else +#define cpu_has_mpx 0 +#endif /* CONFIG_X86_INTEL_MPX */ + #ifdef CONFIG_X86_64 #undef cpu_has_vme -- 1.7.1 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S966080AbaFRJvj (ORCPT ); Wed, 18 Jun 2014 05:51:39 -0400 Received: from mga02.intel.com ([134.134.136.20]:24690 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S965513AbaFRJsD (ORCPT ); Wed, 18 Jun 2014 05:48:03 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.01,499,1400050800"; d="scan'208";a="530353737" From: Qiaowei Ren To: "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , Dave Hansen Cc: x86@kernel.org, linux-kernel@vger.kernel.org, Qiaowei Ren Subject: [PATCH v6 01/10] x86, mpx: introduce VM_MPX to indicate that a VMA is MPX specific Date: Wed, 18 Jun 2014 17:44:07 +0800 Message-Id: <1403084656-27284-2-git-send-email-qiaowei.ren@intel.com> X-Mailer: git-send-email 1.7.1 In-Reply-To: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org MPX-enabled application will possibly create a lot of bounds tables in process address space to save bounds information. These tables can take up huge swaths of memory (as much as 80% of the memory on the system) even if we clean them up aggressively. Being this huge, we need a way to track their memory use. If we want to track them, we essentially have two options: 1. walk the multi-GB (in virtual space) bounds directory to locate all the VMAs and walk them 2. Find a way to distinguish MPX bounds-table VMAs from normal anonymous VMAs and use some existing mechanism to walk them We expect (1) will be prohibitively expensive. For (2), we only need a single bit, and we've chosen to use a VM_ flag. We understand that they are scarce and are open to other options. There is one potential hybrid approach: check the bounds directory entry for any anonymous VMA that could possibly contain a bounds table. This is less expensive than (1), but still requires reading a pointer out of userspace for every VMA that we iterate over. Signed-off-by: Qiaowei Ren --- arch/x86/mm/init_64.c | 2 ++ fs/proc/task_mmu.c | 1 + include/linux/mm.h | 2 ++ 3 files changed, 5 insertions(+), 0 deletions(-) diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c index f35c66c..2d41679 100644 --- a/arch/x86/mm/init_64.c +++ b/arch/x86/mm/init_64.c @@ -1223,6 +1223,8 @@ int in_gate_area_no_mm(unsigned long addr) const char *arch_vma_name(struct vm_area_struct *vma) { + if (vma->vm_flags & VM_MPX) + return "[mpx]"; if (vma->vm_mm && vma->vm_start == (long)vma->vm_mm->context.vdso) return "[vdso]"; if (vma == &gate_vma) diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index 442177b..09266bd 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -543,6 +543,7 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma) [ilog2(VM_GROWSDOWN)] = "gd", [ilog2(VM_PFNMAP)] = "pf", [ilog2(VM_DENYWRITE)] = "dw", + [ilog2(VM_MPX)] = "mp", [ilog2(VM_LOCKED)] = "lo", [ilog2(VM_IO)] = "io", [ilog2(VM_SEQ_READ)] = "sr", diff --git a/include/linux/mm.h b/include/linux/mm.h index d677706..029c716 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -127,6 +127,8 @@ extern unsigned int kobjsize(const void *objp); #define VM_HUGETLB 0x00400000 /* Huge TLB Page VM */ #define VM_NONLINEAR 0x00800000 /* Is non-linear (remap_file_pages) */ #define VM_ARCH_1 0x01000000 /* Architecture-specific flag */ +/* MPX specific bounds table or bounds directory (x86) */ +#define VM_MPX 0x02000000 #define VM_DONTDUMP 0x04000000 /* Do not include in the core dump */ #ifdef CONFIG_MEM_SOFT_DIRTY -- 1.7.1 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S966089AbaFRJv7 (ORCPT ); Wed, 18 Jun 2014 05:51:59 -0400 Received: from mga02.intel.com ([134.134.136.20]:24679 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S965348AbaFRJsB (ORCPT ); Wed, 18 Jun 2014 05:48:01 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.01,499,1400050800"; d="scan'208";a="530353711" From: Qiaowei Ren To: "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , Dave Hansen Cc: x86@kernel.org, linux-kernel@vger.kernel.org, Qiaowei Ren Subject: [PATCH v6 00/10] Intel MPX support Date: Wed, 18 Jun 2014 17:44:06 +0800 Message-Id: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> X-Mailer: git-send-email 1.7.1 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org This patchset adds support for the Memory Protection Extensions (MPX) feature found in future Intel processors. MPX can be used in conjunction with compiler changes to check memory references, for those references whose compile-time normal intentions are usurped at runtime due to buffer overflow or underflow. MPX provides this capability at very low performance overhead for newly compiled code, and provides compatibility mechanisms with legacy software components. MPX architecture is designed allow a machine to run both MPX enabled software and legacy software that is MPX unaware. In such a case, the legacy software does not benefit from MPX, but it also does not experience any change in functionality or reduction in performance. More information about Intel MPX can be found in "Intel(R) Architecture Instruction Set Extensions Programming Reference". To get the advantage of MPX, changes are required in the OS kernel, binutils, compiler, system libraries support. New GCC option -fmpx is introduced to utilize MPX instructions. Currently GCC compiler sources with MPX support is available in a separate branch in common GCC SVN repository. See GCC SVN page (http://gcc.gnu.org/svn.html) for details. To have the full protection, we had to add MPX instrumentation to all the necessary Glibc routines (e.g. memcpy) written on assembler, and compile Glibc with the MPX enabled GCC compiler. Currently MPX enabled Glibc source can be found in Glibc git repository. Enabling an application to use MPX will generally not require source code updates but there is some runtime code, which is responsible for configuring and enabling MPX, needed in order to make use of MPX. For most applications this runtime support will be available by linking to a library supplied by the compiler or possibly it will come directly from the OS once OS versions that support MPX are available. MPX kernel code, namely this patchset, has mainly the 2 responsibilities: provide handlers for bounds faults (#BR), and manage bounds memory. The high-level areas modified in the patchset are as follow: 1) struct siginfo is extended to include bound violation information. 2) two prctl() commands are added to do performance optimization. Currently no hardware with MPX ISA is available but it is always possible to use SDE (Intel(R) software Development Emulator) instead, which can be downloaded from http://software.intel.com/en-us/articles/intel-software-development-emulator Future TODO items: 1) support 32-bit binaries on 64-bit kernels. Changes since v1: * check to see if #BR occurred in userspace or kernel space. * use generic structure and macro as much as possible when decode mpx instructions. Changes since v2: * fix some compile warnings. * update documentation. Changes since v3: * correct some syntax errors at documentation, and document extended struct siginfo. * for kill the process when the error code of BNDSTATUS is 3. * add some comments. * remove new prctl() commands. * fix some compile warnings for 32-bit. Changes since v4: * raise SIGBUS if the allocations of the bound tables fail. Changes since v5: * hook unmap() path to cleanup unused bounds tables, and use new prctl() command to register bounds directory address to struct mm_struct to check whether one process is MPX enabled during unmap(). * in order track precisely MPX memory usage, add MPX specific mmap interface and one VM_MPX flag to check whether a VMA is MPX bounds table. * add macro cpu_has_mpx to do performance optimization. * sync struct figinfo for mips with general version to avoid build issue. Qiaowei Ren (10): x86, mpx: introduce VM_MPX to indicate that a VMA is MPX specific x86, mpx: add MPX specific mmap interface x86, mpx: add macro cpu_has_mpx x86, mpx: hook #BR exception handler to allocate bound tables x86, mpx: extend siginfo structure to include bound violation information mips: sync struct siginfo with general version x86, mpx: decode MPX instruction to get bound violation information x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER x86, mpx: cleanup unused bound tables x86, mpx: add documentation on Intel MPX Documentation/x86/intel_mpx.txt | 127 +++++++++++ arch/mips/include/uapi/asm/siginfo.h | 4 + arch/x86/Kconfig | 4 + arch/x86/include/asm/cpufeature.h | 6 + arch/x86/include/asm/mmu_context.h | 16 ++ arch/x86/include/asm/mpx.h | 91 ++++++++ arch/x86/include/asm/processor.h | 18 ++ arch/x86/kernel/Makefile | 1 + arch/x86/kernel/mpx.c | 413 ++++++++++++++++++++++++++++++++++ arch/x86/kernel/traps.c | 62 +++++- arch/x86/mm/Makefile | 2 + arch/x86/mm/init_64.c | 2 + arch/x86/mm/mpx.c | 247 ++++++++++++++++++++ fs/proc/task_mmu.c | 1 + include/asm-generic/mmu_context.h | 6 + include/linux/mm.h | 2 + include/linux/mm_types.h | 3 + include/uapi/asm-generic/siginfo.h | 9 +- include/uapi/linux/prctl.h | 6 + kernel/signal.c | 4 + kernel/sys.c | 12 + mm/mmap.c | 2 + 22 files changed, 1036 insertions(+), 2 deletions(-) create mode 100644 Documentation/x86/intel_mpx.txt create mode 100644 arch/x86/include/asm/mpx.h create mode 100644 arch/x86/kernel/mpx.c create mode 100644 arch/x86/mm/mpx.c From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S965587AbaFRJ5o (ORCPT ); Wed, 18 Jun 2014 05:57:44 -0400 Received: from mail.skyhub.de ([78.46.96.112]:52563 "EHLO mail.skyhub.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S964994AbaFRJ5n (ORCPT ); Wed, 18 Jun 2014 05:57:43 -0400 Date: Wed, 18 Jun 2014 11:57:39 +0200 From: Borislav Petkov To: Qiaowei Ren Cc: "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , Dave Hansen , x86@kernel.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH v6 03/10] x86, mpx: add macro cpu_has_mpx Message-ID: <20140618095739.GA24419@pd.tnic> References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> <1403084656-27284-4-git-send-email-qiaowei.ren@intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <1403084656-27284-4-git-send-email-qiaowei.ren@intel.com> User-Agent: Mutt/1.5.23 (2014-03-12) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Jun 18, 2014 at 05:44:09PM +0800, Qiaowei Ren wrote: > In order to do performance optimization, this patch adds macro > cpu_has_mpx which will directly return 0 when MPX is not supported > by kernel. > > Signed-off-by: Qiaowei Ren > --- > arch/x86/include/asm/cpufeature.h | 6 ++++++ > 1 files changed, 6 insertions(+), 0 deletions(-) > > diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h > index e265ff9..f302d08 100644 > --- a/arch/x86/include/asm/cpufeature.h > +++ b/arch/x86/include/asm/cpufeature.h > @@ -339,6 +339,12 @@ extern const char * const x86_power_flags[32]; > #define cpu_has_eager_fpu boot_cpu_has(X86_FEATURE_EAGER_FPU) > #define cpu_has_topoext boot_cpu_has(X86_FEATURE_TOPOEXT) > > +#ifdef CONFIG_X86_INTEL_MPX > +#define cpu_has_mpx boot_cpu_has(X86_FEATURE_MPX) I think we don't want those macros anymore because they're obfuscating the code. You should use static_cpu_has instead. -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine. -- From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S965835AbaFRKHw (ORCPT ); Wed, 18 Jun 2014 06:07:52 -0400 Received: from mail.skyhub.de ([78.46.96.112]:52582 "EHLO mail.skyhub.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S965123AbaFRKHv (ORCPT ); Wed, 18 Jun 2014 06:07:51 -0400 Date: Wed, 18 Jun 2014 12:07:45 +0200 From: Borislav Petkov To: Qiaowei Ren Cc: "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , Dave Hansen , x86@kernel.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH v6 07/10] x86, mpx: decode MPX instruction to get bound violation information Message-ID: <20140618100745.GB24419@pd.tnic> References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> <1403084656-27284-8-git-send-email-qiaowei.ren@intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <1403084656-27284-8-git-send-email-qiaowei.ren@intel.com> User-Agent: Mutt/1.5.23 (2014-03-12) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Jun 18, 2014 at 05:44:13PM +0800, Qiaowei Ren wrote: > This patch sets bound violation fields of siginfo struct in #BR > exception handler by decoding the user instruction and constructing > the faulting pointer. > > Signed-off-by: Qiaowei Ren > --- > arch/x86/include/asm/mpx.h | 23 ++++ > arch/x86/kernel/mpx.c | 294 ++++++++++++++++++++++++++++++++++++++++++++ > arch/x86/kernel/traps.c | 6 + > 3 files changed, 323 insertions(+), 0 deletions(-) > > diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h > index b7598ac..780af63 100644 > --- a/arch/x86/include/asm/mpx.h > +++ b/arch/x86/include/asm/mpx.h > @@ -3,6 +3,7 @@ > > #include > #include > +#include > > #ifdef CONFIG_X86_64 > > @@ -44,15 +45,37 @@ > #define MPX_BNDSTA_ERROR_CODE 0x3 > #define MPX_BD_ENTRY_VALID_FLAG 0x1 > > +struct mpx_insn { > + struct insn_field rex_prefix; /* REX prefix */ > + struct insn_field modrm; > + struct insn_field sib; > + struct insn_field displacement; > + > + unsigned char addr_bytes; /* effective address size */ > + unsigned char limit; > + unsigned char x86_64; > + > + const unsigned char *kaddr; /* kernel address of insn to analyze */ > + const unsigned char *next_byte; > +}; > + > +#define MAX_MPX_INSN_SIZE 15 > + > unsigned long mpx_mmap(unsigned long len); > > #ifdef CONFIG_X86_INTEL_MPX > int do_mpx_bt_fault(struct xsave_struct *xsave_buf); > +void do_mpx_bounds(struct pt_regs *regs, siginfo_t *info, > + struct xsave_struct *xsave_buf); > #else > static inline int do_mpx_bt_fault(struct xsave_struct *xsave_buf) > { > return -EINVAL; > } > +static inline void do_mpx_bounds(struct pt_regs *regs, siginfo_t *info, > + struct xsave_struct *xsave_buf) > +{ > +} > #endif /* CONFIG_X86_INTEL_MPX */ > > #endif /* _ASM_X86_MPX_H */ > diff --git a/arch/x86/kernel/mpx.c b/arch/x86/kernel/mpx.c > index 4230c7b..650b282 100644 > --- a/arch/x86/kernel/mpx.c > +++ b/arch/x86/kernel/mpx.c > @@ -2,6 +2,270 @@ > #include > #include > > +typedef enum {REG_TYPE_RM, REG_TYPE_INDEX, REG_TYPE_BASE} reg_type_t; > +static unsigned long get_reg(struct mpx_insn *insn, struct pt_regs *regs, > + reg_type_t type) > +{ > + int regno = 0; > + unsigned char modrm = (unsigned char)insn->modrm.value; > + unsigned char sib = (unsigned char)insn->sib.value; > + > + static const int regoff[] = { > + offsetof(struct pt_regs, ax), > + offsetof(struct pt_regs, cx), > + offsetof(struct pt_regs, dx), > + offsetof(struct pt_regs, bx), > + offsetof(struct pt_regs, sp), > + offsetof(struct pt_regs, bp), > + offsetof(struct pt_regs, si), > + offsetof(struct pt_regs, di), > +#ifdef CONFIG_X86_64 > + offsetof(struct pt_regs, r8), > + offsetof(struct pt_regs, r9), > + offsetof(struct pt_regs, r10), > + offsetof(struct pt_regs, r11), > + offsetof(struct pt_regs, r12), > + offsetof(struct pt_regs, r13), > + offsetof(struct pt_regs, r14), > + offsetof(struct pt_regs, r15), > +#endif > + }; > + > + switch (type) { > + case REG_TYPE_RM: > + regno = X86_MODRM_RM(modrm); > + if (X86_REX_B(insn->rex_prefix.value) == 1) > + regno += 8; > + break; > + > + case REG_TYPE_INDEX: > + regno = X86_SIB_INDEX(sib); > + if (X86_REX_X(insn->rex_prefix.value) == 1) > + regno += 8; > + break; > + > + case REG_TYPE_BASE: > + regno = X86_SIB_BASE(sib); > + if (X86_REX_B(insn->rex_prefix.value) == 1) > + regno += 8; > + break; > + > + default: > + break; > + } > + > + return regs_get_register(regs, regoff[regno]); > +} > + > +/* > + * return the address being referenced be instruction > + * for rm=3 returning the content of the rm reg > + * for rm!=3 calculates the address using SIB and Disp > + */ > +static unsigned long get_addr_ref(struct mpx_insn *insn, struct pt_regs *regs) > +{ > + unsigned long addr; > + unsigned long base; > + unsigned long indx; > + unsigned char modrm = (unsigned char)insn->modrm.value; > + unsigned char sib = (unsigned char)insn->sib.value; > + > + if (X86_MODRM_MOD(modrm) == 3) { > + addr = get_reg(insn, regs, REG_TYPE_RM); > + } else { > + if (insn->sib.nbytes) { > + base = get_reg(insn, regs, REG_TYPE_BASE); > + indx = get_reg(insn, regs, REG_TYPE_INDEX); > + addr = base + indx * (1 << X86_SIB_SCALE(sib)); > + } else { > + addr = get_reg(insn, regs, REG_TYPE_RM); > + } > + addr += insn->displacement.value; > + } > + > + return addr; > +} > + > +/* Verify next sizeof(t) bytes can be on the same instruction */ > +#define validate_next(t, insn, n) \ > + ((insn)->next_byte + sizeof(t) + n - (insn)->kaddr <= (insn)->limit) > + > +#define __get_next(t, insn) \ > +({ \ > + t r = *(t *)insn->next_byte; \ > + insn->next_byte += sizeof(t); \ > + r; \ > +}) > + > +#define __peek_next(t, insn) \ > +({ \ > + t r = *(t *)insn->next_byte; \ > + r; \ > +}) > + > +#define get_next(t, insn) \ > +({ \ > + if (unlikely(!validate_next(t, insn, 0))) \ > + goto err_out; \ > + __get_next(t, insn); \ > +}) > + > +#define peek_next(t, insn) \ > +({ \ > + if (unlikely(!validate_next(t, insn, 0))) \ > + goto err_out; \ > + __peek_next(t, insn); \ > +}) > + > +static void mpx_insn_get_prefixes(struct mpx_insn *insn) > +{ > + unsigned char b; > + > + /* Decode legacy prefix and REX prefix */ > + b = peek_next(unsigned char, insn); > + while (b != 0x0f) { > + /* > + * look for a rex prefix > + * a REX prefix cannot be followed by a legacy prefix. > + */ > + if (insn->x86_64 && ((b&0xf0) == 0x40)) { > + insn->rex_prefix.value = b; > + insn->rex_prefix.nbytes = 1; > + insn->next_byte++; > + break; > + } > + > + /* check the other legacy prefixes */ > + switch (b) { > + case 0xf2: > + case 0xf3: > + case 0xf0: > + case 0x64: > + case 0x65: > + case 0x2e: > + case 0x3e: > + case 0x26: > + case 0x36: > + case 0x66: > + case 0x67: > + insn->next_byte++; > + break; > + default: /* everything else is garbage */ > + goto err_out; > + } > + b = peek_next(unsigned char, insn); > + } > + > +err_out: > + return; > +} > + > +static void mpx_insn_get_modrm(struct mpx_insn *insn) > +{ > + insn->modrm.value = get_next(unsigned char, insn); > + insn->modrm.nbytes = 1; > + > +err_out: > + return; > +} > + > +static void mpx_insn_get_sib(struct mpx_insn *insn) > +{ > + unsigned char modrm = (unsigned char)insn->modrm.value; > + > + if (X86_MODRM_MOD(modrm) != 3 && X86_MODRM_RM(modrm) == 4) { > + insn->sib.value = get_next(unsigned char, insn); > + insn->sib.nbytes = 1; > + } > + > +err_out: > + return; > +} > + > +static void mpx_insn_get_displacement(struct mpx_insn *insn) > +{ > + unsigned char mod, rm, base; > + > + /* > + * Interpreting the modrm byte: > + * mod = 00 - no displacement fields (exceptions below) > + * mod = 01 - 1-byte displacement field > + * mod = 10 - displacement field is 4 bytes > + * mod = 11 - no memory operand > + * > + * mod != 11, r/m = 100 - SIB byte exists > + * mod = 00, SIB base = 101 - displacement field is 4 bytes > + * mod = 00, r/m = 101 - rip-relative addressing, displacement > + * field is 4 bytes > + */ > + mod = X86_MODRM_MOD(insn->modrm.value); > + rm = X86_MODRM_RM(insn->modrm.value); > + base = X86_SIB_BASE(insn->sib.value); > + if (mod == 3) > + return; > + if (mod == 1) { > + insn->displacement.value = get_next(unsigned char, insn); > + insn->displacement.nbytes = 1; > + } else if ((mod == 0 && rm == 5) || mod == 2 || > + (mod == 0 && base == 5)) { > + insn->displacement.value = get_next(int, insn); > + insn->displacement.nbytes = 4; > + } > + > +err_out: > + return; > +} > + > +static void mpx_insn_init(struct mpx_insn *insn, struct pt_regs *regs) > +{ > + unsigned char buf[MAX_MPX_INSN_SIZE]; > + int bytes; > + > + memset(insn, 0, sizeof(*insn)); > + > + bytes = copy_from_user(buf, (void __user *)regs->ip, MAX_MPX_INSN_SIZE); > + insn->limit = MAX_MPX_INSN_SIZE - bytes; > + insn->kaddr = buf; > + insn->next_byte = buf; > + > + /* > + * In 64-bit Mode, all Intel MPX instructions use 64-bit > + * operands for bounds and 64 bit addressing, i.e. REX.W & > + * 67H have no effect on data or address size. > + * > + * In compatibility and legacy modes (including 16-bit code > + * segments, real and virtual 8086 modes) all Intel MPX > + * instructions use 32-bit operands for bounds and 32 bit > + * addressing. > + */ > +#ifdef CONFIG_X86_64 > + insn->x86_64 = 1; > + insn->addr_bytes = 8; > +#else > + insn->x86_64 = 0; > + insn->addr_bytes = 4; > +#endif > +} > + > +static unsigned long mpx_insn_decode(struct mpx_insn *insn, > + struct pt_regs *regs) > +{ > + mpx_insn_init(insn, regs); > + > + /* > + * In this case, we only need decode bndcl/bndcn/bndcu, > + * so we can use private diassembly interfaces to get > + * prefixes, modrm, sib, displacement, etc.. > + */ > + mpx_insn_get_prefixes(insn); > + insn->next_byte += 2; /* ignore opcode */ > + mpx_insn_get_modrm(insn); > + mpx_insn_get_sib(insn); > + mpx_insn_get_displacement(insn); > + > + return get_addr_ref(insn, regs); > +} > + This whole insn decoding machinery above looks like adapted from arch/x86/lib/insn.c. You should merge it with the generic code in insn.c instead of homegrowing it here only for the purposes of MPX. And if it doesn't work for your needs, you should should extend the generic code to do so. I think we even talked about this last time. Also, make sure you run all your patches through checkpatch.pl before submitting. -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine. -- From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756895AbaFROf3 (ORCPT ); Wed, 18 Jun 2014 10:35:29 -0400 Received: from mga09.intel.com ([134.134.136.24]:61960 "EHLO mga09.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752334AbaFROf2 (ORCPT ); Wed, 18 Jun 2014 10:35:28 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.01,501,1400050800"; d="scan'208";a="559589477" Message-ID: <53A1A3A5.9010109@intel.com> Date: Wed, 18 Jun 2014 07:35:17 -0700 From: Dave Hansen User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.5.0 MIME-Version: 1.0 To: Borislav Petkov , Qiaowei Ren CC: "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , x86@kernel.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH v6 03/10] x86, mpx: add macro cpu_has_mpx References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> <1403084656-27284-4-git-send-email-qiaowei.ren@intel.com> <20140618095739.GA24419@pd.tnic> In-Reply-To: <20140618095739.GA24419@pd.tnic> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 06/18/2014 02:57 AM, Borislav Petkov wrote: >> > @@ -339,6 +339,12 @@ extern const char * const x86_power_flags[32]; >> > #define cpu_has_eager_fpu boot_cpu_has(X86_FEATURE_EAGER_FPU) >> > #define cpu_has_topoext boot_cpu_has(X86_FEATURE_TOPOEXT) >> > >> > +#ifdef CONFIG_X86_INTEL_MPX >> > +#define cpu_has_mpx boot_cpu_has(X86_FEATURE_MPX) > I think we don't want those macros anymore because they're obfuscating > the code. You should use static_cpu_has instead. It looks like static_cpu_has() is the right thing to use instead of boot_cpu_has(). But, this doesn't just obfuscate things. We actually _want_ the compiler to cull code out when the config option is off. Things like do_bounds() will see code savings with _some_ kind of #ifdef rather than using static_cpu_has(). So, we can either use the well worn, consistent with other features in x86, cpu_has_$foo approach. Or, we can roll our own macros. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752150AbaFROof (ORCPT ); Wed, 18 Jun 2014 10:44:35 -0400 Received: from mail.skyhub.de ([78.46.96.112]:40635 "EHLO mail.skyhub.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751666AbaFROoe (ORCPT ); Wed, 18 Jun 2014 10:44:34 -0400 Date: Wed, 18 Jun 2014 16:44:29 +0200 From: Borislav Petkov To: Dave Hansen Cc: Qiaowei Ren , "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , x86@kernel.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH v6 03/10] x86, mpx: add macro cpu_has_mpx Message-ID: <20140618144429.GD24024@pd.tnic> References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> <1403084656-27284-4-git-send-email-qiaowei.ren@intel.com> <20140618095739.GA24419@pd.tnic> <53A1A3A5.9010109@intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <53A1A3A5.9010109@intel.com> User-Agent: Mutt/1.5.23 (2014-03-12) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Jun 18, 2014 at 07:35:17AM -0700, Dave Hansen wrote: > On 06/18/2014 02:57 AM, Borislav Petkov wrote: > >> > @@ -339,6 +339,12 @@ extern const char * const x86_power_flags[32]; > >> > #define cpu_has_eager_fpu boot_cpu_has(X86_FEATURE_EAGER_FPU) > >> > #define cpu_has_topoext boot_cpu_has(X86_FEATURE_TOPOEXT) > >> > > >> > +#ifdef CONFIG_X86_INTEL_MPX > >> > +#define cpu_has_mpx boot_cpu_has(X86_FEATURE_MPX) > > I think we don't want those macros anymore because they're obfuscating > > the code. You should use static_cpu_has instead. > > It looks like static_cpu_has() is the right thing to use instead of > boot_cpu_has(). But, this doesn't just obfuscate things. > > We actually _want_ the compiler to cull code out when the config option > is off. Things like do_bounds() will see code savings with _some_ kind > of #ifdef rather than using static_cpu_has(). Why? Practically, distros will have it enabled anyway (you have X86_INTEL_MPX depend on CPU_SUP_INTEL). Are you talking about the miniscule percentage of people building their own kernels? -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine. -- From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756935AbaFROlX (ORCPT ); Wed, 18 Jun 2014 10:41:23 -0400 Received: from mga09.intel.com ([134.134.136.24]:62913 "EHLO mga09.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752303AbaFROlV (ORCPT ); Wed, 18 Jun 2014 10:41:21 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.01,501,1400050800"; d="scan'208";a="559592387" Message-ID: <53A1A509.4080802@intel.com> Date: Wed, 18 Jun 2014 07:41:13 -0700 From: Dave Hansen User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.5.0 MIME-Version: 1.0 To: Qiaowei Ren , "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar CC: x86@kernel.org, linux-kernel@vger.kernel.org, Linux-MM Subject: Re: [PATCH v6 00/10] Intel MPX support References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> In-Reply-To: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 06/18/2014 02:44 AM, Qiaowei Ren wrote: > This patchset adds support for the Memory Protection Extensions > (MPX) feature found in future Intel processors. It's very important to note that this is a very different patch set than the last one. The way we are freeing the unused bounds tables is _completely different (9/10), and needs some very heavy mm reviews. I'm sure Qiaowei will cc linux-mm@ next time. We're also not asking that this be merged in its current state. The 32-bit binary on 64-bit kernel issue is a show stopper for merging, but we're trying to post early and often. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751699AbaFRO6h (ORCPT ); Wed, 18 Jun 2014 10:58:37 -0400 Received: from mga02.intel.com ([134.134.136.20]:58856 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751142AbaFRO6g (ORCPT ); Wed, 18 Jun 2014 10:58:36 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.01,501,1400050800"; d="scan'208";a="559599404" Message-ID: <53A1A90D.5010604@intel.com> Date: Wed, 18 Jun 2014 07:58:21 -0700 From: Dave Hansen User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.5.0 MIME-Version: 1.0 To: Borislav Petkov CC: Qiaowei Ren , "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , x86@kernel.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH v6 03/10] x86, mpx: add macro cpu_has_mpx References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> <1403084656-27284-4-git-send-email-qiaowei.ren@intel.com> <20140618095739.GA24419@pd.tnic> <53A1A3A5.9010109@intel.com> <20140618144429.GD24024@pd.tnic> In-Reply-To: <20140618144429.GD24024@pd.tnic> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 06/18/2014 07:44 AM, Borislav Petkov wrote: > On Wed, Jun 18, 2014 at 07:35:17AM -0700, Dave Hansen wrote: >> On 06/18/2014 02:57 AM, Borislav Petkov wrote: >>>>> @@ -339,6 +339,12 @@ extern const char * const x86_power_flags[32]; >>>>> #define cpu_has_eager_fpu boot_cpu_has(X86_FEATURE_EAGER_FPU) >>>>> #define cpu_has_topoext boot_cpu_has(X86_FEATURE_TOPOEXT) >>>>> >>>>> +#ifdef CONFIG_X86_INTEL_MPX >>>>> +#define cpu_has_mpx boot_cpu_has(X86_FEATURE_MPX) >>> I think we don't want those macros anymore because they're obfuscating >>> the code. You should use static_cpu_has instead. >> >> It looks like static_cpu_has() is the right thing to use instead of >> boot_cpu_has(). But, this doesn't just obfuscate things. >> >> We actually _want_ the compiler to cull code out when the config option >> is off. Things like do_bounds() will see code savings with _some_ kind >> of #ifdef rather than using static_cpu_has(). > > Why? Are you seriously asking why we would want to cull out code guaranteed to be unused? People are going to want to turn this off at compile time. When they do, I want as much of the code to go away as is reasonably possible. Adding a single-line #ifdef in a header qualifies as a pretty decent tradeoff in my book. I'm sure Qiaowei will do an experiment and show us what the code savings are. > Practically, distros will have it enabled anyway (you have X86_INTEL_MPX > depend on CPU_SUP_INTEL). > > Are you talking about the miniscule percentage of people building their > own kernels? The minuscule number of people not using a distro kernel? Like, every Android and Chrome device in the world? How about the cloud providers with millions of servers? From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752939AbaFRPB5 (ORCPT ); Wed, 18 Jun 2014 11:01:57 -0400 Received: from terminus.zytor.com ([198.137.202.10]:46896 "EHLO mail.zytor.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751251AbaFRPBH (ORCPT ); Wed, 18 Jun 2014 11:01:07 -0400 Message-ID: <53A1A98D.8050408@zytor.com> Date: Wed, 18 Jun 2014 08:00:29 -0700 From: "H. Peter Anvin" User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.6.0 MIME-Version: 1.0 To: Borislav Petkov , Dave Hansen CC: Qiaowei Ren , Thomas Gleixner , Ingo Molnar , x86@kernel.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH v6 03/10] x86, mpx: add macro cpu_has_mpx References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> <1403084656-27284-4-git-send-email-qiaowei.ren@intel.com> <20140618095739.GA24419@pd.tnic> <53A1A3A5.9010109@intel.com> <20140618144429.GD24024@pd.tnic> In-Reply-To: <20140618144429.GD24024@pd.tnic> X-Enigmail-Version: 1.6 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 06/18/2014 07:44 AM, Borislav Petkov wrote: > > Why? > > Practically, distros will have it enabled anyway (you have X86_INTEL_MPX > depend on CPU_SUP_INTEL). > > Are you talking about the miniscule percentage of people building their > own kernels? > We have people working hard to shave kilobytes off the smallest configurations, because they want to fit into a megabyte of memory with the kernel and application. Let's try to not make their life harder than it needs to be. -hpa From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752605AbaFRO74 (ORCPT ); Wed, 18 Jun 2014 10:59:56 -0400 Received: from terminus.zytor.com ([198.137.202.10]:46843 "EHLO mail.zytor.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752097AbaFRO7x (ORCPT ); Wed, 18 Jun 2014 10:59:53 -0400 Message-ID: <53A1A942.1090001@zytor.com> Date: Wed, 18 Jun 2014 07:59:14 -0700 From: "H. Peter Anvin" User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.6.0 MIME-Version: 1.0 To: Dave Hansen , Borislav Petkov , Qiaowei Ren CC: Thomas Gleixner , Ingo Molnar , x86@kernel.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH v6 03/10] x86, mpx: add macro cpu_has_mpx References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> <1403084656-27284-4-git-send-email-qiaowei.ren@intel.com> <20140618095739.GA24419@pd.tnic> <53A1A3A5.9010109@intel.com> In-Reply-To: <53A1A3A5.9010109@intel.com> X-Enigmail-Version: 1.6 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 06/18/2014 07:35 AM, Dave Hansen wrote: > > It looks like static_cpu_has() is the right thing to use instead of > boot_cpu_has(). But, this doesn't just obfuscate things. > > We actually _want_ the compiler to cull code out when the config option > is off. Things like do_bounds() will see code savings with _some_ kind > of #ifdef rather than using static_cpu_has(). > > So, we can either use the well worn, consistent with other features in > x86, cpu_has_$foo approach. Or, we can roll our own macros. > We could do something like: #define MPX_ENABLED (IS_ENABLED(CONFIG_X86_MPX) && static_cpu_has(X86_FEATURE_MPX)) -hpa From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752602AbaFRPZM (ORCPT ); Wed, 18 Jun 2014 11:25:12 -0400 Received: from mail.skyhub.de ([78.46.96.112]:33314 "EHLO mail.skyhub.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751101AbaFRPZK (ORCPT ); Wed, 18 Jun 2014 11:25:10 -0400 Date: Wed, 18 Jun 2014 17:25:05 +0200 From: Borislav Petkov To: Dave Hansen Cc: Qiaowei Ren , "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , x86@kernel.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH v6 03/10] x86, mpx: add macro cpu_has_mpx Message-ID: <20140618152505.GE24024@pd.tnic> References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> <1403084656-27284-4-git-send-email-qiaowei.ren@intel.com> <20140618095739.GA24419@pd.tnic> <53A1A3A5.9010109@intel.com> <20140618144429.GD24024@pd.tnic> <53A1A90D.5010604@intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <53A1A90D.5010604@intel.com> User-Agent: Mutt/1.5.23 (2014-03-12) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Jun 18, 2014 at 07:58:21AM -0700, Dave Hansen wrote: > Are you seriously asking why we would want to cull out code guaranteed > to be unused? Yes, I am. Don't get me wrong - I'm all for not enabling code which is unused - I'm just questioning the actual usage case here. Because you have this: +config X86_INTEL_MPX + def_bool y + depends on CPU_SUP_INTEL which means, even on !MPX Intels, you need to explicitly say "no" here. Is this how the configuration is supposed to be done? Or do you need to add help text to explain to people not to enable this on !MPX machines after looking at /proc/cpuinfo first? Am I close? > The minuscule number of people not using a distro kernel? Like, every > Android and Chrome device in the world? How about the cloud providers > with millions of servers? Cloud people won't be benefiting from MPX? -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine. -- From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753046AbaFRP1Q (ORCPT ); Wed, 18 Jun 2014 11:27:16 -0400 Received: from mail.skyhub.de ([78.46.96.112]:39562 "EHLO mail.skyhub.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752614AbaFRP1O (ORCPT ); Wed, 18 Jun 2014 11:27:14 -0400 Date: Wed, 18 Jun 2014 17:27:10 +0200 From: Borislav Petkov To: "H. Peter Anvin" Cc: Dave Hansen , Qiaowei Ren , Thomas Gleixner , Ingo Molnar , x86@kernel.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH v6 03/10] x86, mpx: add macro cpu_has_mpx Message-ID: <20140618152710.GF24024@pd.tnic> References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> <1403084656-27284-4-git-send-email-qiaowei.ren@intel.com> <20140618095739.GA24419@pd.tnic> <53A1A3A5.9010109@intel.com> <20140618144429.GD24024@pd.tnic> <53A1A98D.8050408@zytor.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <53A1A98D.8050408@zytor.com> User-Agent: Mutt/1.5.23 (2014-03-12) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Jun 18, 2014 at 08:00:29AM -0700, H. Peter Anvin wrote: > We have people working hard to shave kilobytes off the smallest > configurations, because they want to fit into a megabyte of memory > with the kernel and application. > > Let's try to not make their life harder than it needs to be. Oh sure - I'm simply asking how this is going to be handled practically by asking stupid questions. You know me :) -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine. -- From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753421AbaFRQSG (ORCPT ); Wed, 18 Jun 2014 12:18:06 -0400 Received: from mga09.intel.com ([134.134.136.24]:17422 "EHLO mga09.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753359AbaFRQSC (ORCPT ); Wed, 18 Jun 2014 12:18:02 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.01,501,1400050800"; d="scan'208";a="559645300" Message-ID: <53A1BBB3.4070809@intel.com> Date: Wed, 18 Jun 2014 09:17:55 -0700 From: Dave Hansen User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.5.0 MIME-Version: 1.0 To: Borislav Petkov CC: Qiaowei Ren , "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , x86@kernel.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH v6 03/10] x86, mpx: add macro cpu_has_mpx References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> <1403084656-27284-4-git-send-email-qiaowei.ren@intel.com> <20140618095739.GA24419@pd.tnic> <53A1A3A5.9010109@intel.com> <20140618144429.GD24024@pd.tnic> <53A1A90D.5010604@intel.com> <20140618152505.GE24024@pd.tnic> In-Reply-To: <20140618152505.GE24024@pd.tnic> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 06/18/2014 08:25 AM, Borislav Petkov wrote: > +config X86_INTEL_MPX > + def_bool y > + depends on CPU_SUP_INTEL > > which means, even on !MPX Intels, you need to explicitly say "no" here. > Is this how the configuration is supposed to be done? I think we need to promote this to a real option with real help text. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753397AbaFRQ0B (ORCPT ); Wed, 18 Jun 2014 12:26:01 -0400 Received: from mga02.intel.com ([134.134.136.20]:47110 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752067AbaFRQ0A (ORCPT ); Wed, 18 Jun 2014 12:26:00 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.01,501,1400050800"; d="scan'208";a="559649023" Message-ID: <53A1BD95.10701@intel.com> Date: Wed, 18 Jun 2014 09:25:57 -0700 From: Dave Hansen User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.5.0 MIME-Version: 1.0 To: "H. Peter Anvin" , Borislav Petkov , Qiaowei Ren CC: Thomas Gleixner , Ingo Molnar , x86@kernel.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH v6 03/10] x86, mpx: add macro cpu_has_mpx References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> <1403084656-27284-4-git-send-email-qiaowei.ren@intel.com> <20140618095739.GA24419@pd.tnic> <53A1A3A5.9010109@intel.com> <53A1A942.1090001@zytor.com> In-Reply-To: <53A1A942.1090001@zytor.com> Content-Type: multipart/mixed; boundary="------------070706040202000507010002" Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org This is a multi-part message in MIME format. --------------070706040202000507010002 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit On 06/18/2014 07:59 AM, H. Peter Anvin wrote: > On 06/18/2014 07:35 AM, Dave Hansen wrote: >> It looks like static_cpu_has() is the right thing to use instead of >> boot_cpu_has(). But, this doesn't just obfuscate things. >> >> We actually _want_ the compiler to cull code out when the config option >> is off. Things like do_bounds() will see code savings with _some_ kind >> of #ifdef rather than using static_cpu_has(). >> >> So, we can either use the well worn, consistent with other features in >> x86, cpu_has_$foo approach. Or, we can roll our own macros. > > We could do something like: > > #define MPX_ENABLED (IS_ENABLED(CONFIG_X86_MPX) && > static_cpu_has(X86_FEATURE_MPX)) How about something like the attached patch? This lets us use static_cpu_has() for the checks, and allows us to easily add new checks for other features that might be compile-time disabled. --------------070706040202000507010002 Content-Type: text/x-patch; name="x86-disabled_mask.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="x86-disabled_mask.patch" --- b/arch/x86/include/asm/cpufeature.h | 26 ++++++++++++++++++++------ b/arch/x86/kernel/mpx.c | 4 ++-- b/arch/x86/kernel/traps.c | 2 +- 3 files changed, 23 insertions(+), 9 deletions(-) diff -puN arch/x86/include/asm/cpufeature.h~x86-disabled_mask arch/x86/include/asm/cpufeature.h --- a/arch/x86/include/asm/cpufeature.h~x86-disabled_mask 2014-06-18 08:48:41.329750895 -0700 +++ b/arch/x86/include/asm/cpufeature.h 2014-06-18 09:19:19.143546973 -0700 @@ -339,12 +339,6 @@ extern const char * const x86_power_flag #define cpu_has_eager_fpu boot_cpu_has(X86_FEATURE_EAGER_FPU) #define cpu_has_topoext boot_cpu_has(X86_FEATURE_TOPOEXT) -#ifdef CONFIG_X86_INTEL_MPX -#define cpu_has_mpx boot_cpu_has(X86_FEATURE_MPX) -#else -#define cpu_has_mpx 0 -#endif /* CONFIG_X86_INTEL_MPX */ - #ifdef CONFIG_X86_64 #undef cpu_has_vme @@ -367,6 +361,22 @@ extern const char * const x86_power_flag #endif /* CONFIG_X86_64 */ +/* + * Add features and their corresponding config options here + * if you want to have the compiler optimize out code that + * uses them. + * + * You should not use this function directly. Use + * static_cpu_has() so that you also benefit from alternatives + * when the features are enabled. + */ +static __always_inline int __cpu_feature_compile_enabled(u16 bit) +{ + if (bit == X86_FEATURE_MPX) + return IS_ENABLED(CONFIG_X86_MPX); + return 0; +} + #if __GNUC__ >= 4 extern void warn_pre_alternatives(void); extern bool __static_cpu_has_safe(u16 bit); @@ -378,6 +388,8 @@ extern bool __static_cpu_has_safe(u16 bi */ static __always_inline __pure bool __static_cpu_has(u16 bit) { + if (!__cpu_feature_compile_enabled(bit)) + return 0; #ifdef CC_HAVE_ASM_GOTO #ifdef CONFIG_X86_DEBUG_STATIC_CPU_HAS @@ -458,6 +470,8 @@ static __always_inline __pure bool __sta static __always_inline __pure bool _static_cpu_has_safe(u16 bit) { + if (!__cpu_feature_compile_enabled(bit)) + return 0; #ifdef CC_HAVE_ASM_GOTO /* * We need to spell the jumps to the compiler because, depending on the offset, diff -puN arch/x86/kernel/traps.c~x86-disabled_mask arch/x86/kernel/traps.c --- a/arch/x86/kernel/traps.c~x86-disabled_mask 2014-06-18 09:14:23.686340296 -0700 +++ b/arch/x86/kernel/traps.c 2014-06-18 09:14:53.698681684 -0700 @@ -280,7 +280,7 @@ dotraplinkage void do_bounds(struct pt_r if (!user_mode(regs)) die("bounds", regs, error_code); - if (!cpu_has_mpx) { + if (static_cpu_has(X86_FEATURE_MPX)) { /* The exception is not from Intel MPX */ do_trap(X86_TRAP_BR, SIGSEGV, "bounds", regs, error_code, NULL); goto exit; diff -puN arch/x86/kernel/mpx.c~x86-disabled_mask arch/x86/kernel/mpx.c --- a/arch/x86/kernel/mpx.c~x86-disabled_mask 2014-06-18 09:14:23.739342667 -0700 +++ b/arch/x86/kernel/mpx.c 2014-06-18 09:22:32.091172228 -0700 @@ -26,7 +26,7 @@ int mpx_register(struct task_struct *tsk { struct mm_struct *mm = tsk->mm; - if (!cpu_has_mpx) + if (!static_cpu_has(X86_FEATURE_MPX)) return -EINVAL; /* @@ -51,7 +51,7 @@ int mpx_unregister(struct task_struct *t { struct mm_struct *mm = current->mm; - if (!cpu_has_mpx) + if (!static_cpu_has(X86_FEATURE_MPX)) return -EINVAL; mm->bd_addr = NULL; _ --------------070706040202000507010002-- From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753095AbaFRRVS (ORCPT ); Wed, 18 Jun 2014 13:21:18 -0400 Received: from mail.skyhub.de ([78.46.96.112]:57712 "EHLO mail.skyhub.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750862AbaFRRVQ (ORCPT ); Wed, 18 Jun 2014 13:21:16 -0400 Date: Wed, 18 Jun 2014 19:21:11 +0200 From: Borislav Petkov To: Dave Hansen Cc: "H. Peter Anvin" , Qiaowei Ren , Thomas Gleixner , Ingo Molnar , x86@kernel.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH v6 03/10] x86, mpx: add macro cpu_has_mpx Message-ID: <20140618172111.GJ24024@pd.tnic> References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> <1403084656-27284-4-git-send-email-qiaowei.ren@intel.com> <20140618095739.GA24419@pd.tnic> <53A1A3A5.9010109@intel.com> <53A1A942.1090001@zytor.com> <53A1BD95.10701@intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <53A1BD95.10701@intel.com> User-Agent: Mutt/1.5.23 (2014-03-12) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Jun 18, 2014 at 09:25:57AM -0700, Dave Hansen wrote: > On 06/18/2014 07:59 AM, H. Peter Anvin wrote: > > On 06/18/2014 07:35 AM, Dave Hansen wrote: > >> It looks like static_cpu_has() is the right thing to use instead of > >> boot_cpu_has(). But, this doesn't just obfuscate things. > >> > >> We actually _want_ the compiler to cull code out when the config option > >> is off. Things like do_bounds() will see code savings with _some_ kind > >> of #ifdef rather than using static_cpu_has(). > >> > >> So, we can either use the well worn, consistent with other features in > >> x86, cpu_has_$foo approach. Or, we can roll our own macros. > > > > We could do something like: > > > > #define MPX_ENABLED (IS_ENABLED(CONFIG_X86_MPX) && > > static_cpu_has(X86_FEATURE_MPX)) > > How about something like the attached patch? > > This lets us use static_cpu_has() for the checks, and allows us to > easily add new checks for other features that might be compile-time > disabled. > > > --- > > b/arch/x86/include/asm/cpufeature.h | 26 ++++++++++++++++++++------ > b/arch/x86/kernel/mpx.c | 4 ++-- > b/arch/x86/kernel/traps.c | 2 +- > 3 files changed, 23 insertions(+), 9 deletions(-) > > diff -puN arch/x86/include/asm/cpufeature.h~x86-disabled_mask arch/x86/include/asm/cpufeature.h > --- a/arch/x86/include/asm/cpufeature.h~x86-disabled_mask 2014-06-18 08:48:41.329750895 -0700 > +++ b/arch/x86/include/asm/cpufeature.h 2014-06-18 09:19:19.143546973 -0700 > @@ -339,12 +339,6 @@ extern const char * const x86_power_flag > #define cpu_has_eager_fpu boot_cpu_has(X86_FEATURE_EAGER_FPU) > #define cpu_has_topoext boot_cpu_has(X86_FEATURE_TOPOEXT) > > -#ifdef CONFIG_X86_INTEL_MPX > -#define cpu_has_mpx boot_cpu_has(X86_FEATURE_MPX) > -#else > -#define cpu_has_mpx 0 > -#endif /* CONFIG_X86_INTEL_MPX */ > - > #ifdef CONFIG_X86_64 > > #undef cpu_has_vme > @@ -367,6 +361,22 @@ extern const char * const x86_power_flag > > #endif /* CONFIG_X86_64 */ > > +/* > + * Add features and their corresponding config options here > + * if you want to have the compiler optimize out code that > + * uses them. > + * > + * You should not use this function directly. Use > + * static_cpu_has() so that you also benefit from alternatives > + * when the features are enabled. > + */ > +static __always_inline int __cpu_feature_compile_enabled(u16 bit) > +{ > + if (bit == X86_FEATURE_MPX) > + return IS_ENABLED(CONFIG_X86_MPX); Right, this should be CONFIG_X86_INTEL_MPX but that's details. I guess this might grow into a big if-else noodle case but when that happens, we probably could add some sort of mapping between X86_FEATURE bits to CONFIG_ items or so. Then, you could probably add this call to the enclosing macros static_cpu_has{,_safe} so that it gets evaluated first. The advantage is that the IS_ENABLED test would work very early - always, actually - even before alternatives have run and thus catch the cases where the feature is config-disabled. Otherwise, asm looks ok. -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine. -- From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756682AbaFSBNy (ORCPT ); Wed, 18 Jun 2014 21:13:54 -0400 Received: from mga02.intel.com ([134.134.136.20]:11790 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756604AbaFSBNx (ORCPT ); Wed, 18 Jun 2014 21:13:53 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.01,503,1400050800"; d="scan'208";a="530732250" From: "Ren, Qiaowei" To: Borislav Petkov CC: "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , "Hansen, Dave" , "x86@kernel.org" , "linux-kernel@vger.kernel.org" Subject: RE: [PATCH v6 07/10] x86, mpx: decode MPX instruction to get bound violation information Thread-Topic: [PATCH v6 07/10] x86, mpx: decode MPX instruction to get bound violation information Thread-Index: AQHPitp5+re2FjJ0DkeRHdzRM5ptp5t2HnuAgAGAu2A= Date: Thu, 19 Jun 2014 01:13:48 +0000 Message-ID: <9E0BE1322F2F2246BD820DA9FC397ADE016A3BAA@shsmsx102.ccr.corp.intel.com> References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> <1403084656-27284-8-git-send-email-qiaowei.ren@intel.com> <20140618100745.GB24419@pd.tnic> In-Reply-To: <20140618100745.GB24419@pd.tnic> Accept-Language: zh-CN, en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [10.239.127.40] Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from base64 to 8bit by mail.home.local id s5J1E1DB022732 On 2014-06-18, Borislav Petkov wrote: > On Wed, Jun 18, 2014 at 05:44:13PM +0800, Qiaowei Ren wrote: > > This whole insn decoding machinery above looks like adapted from > arch/x86/lib/insn.c. You should merge it with the generic code in > insn.c instead of homegrowing it here only for the purposes of MPX. > And if it doesn't work for your needs, you should should extend the > generic code to do so. I think we even talked about this last time. > > Also, make sure you run all your patches through checkpatch.pl before > submitting. > Petkov, as we discussed on initial version of this patchset, general insn framework didn't work out well and I have tried to use generic struct insn, insn_field, etc. for obvious benefits. Thanks, Qiaowei {.n++%ݶw{.n+{G{ayʇڙ,jfhz_(階ݢj"mG?&~iOzv^m ?I From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754418AbaFSG2c (ORCPT ); Thu, 19 Jun 2014 02:28:32 -0400 Received: from mail.skyhub.de ([78.46.96.112]:41629 "EHLO mail.skyhub.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751875AbaFSG2b (ORCPT ); Thu, 19 Jun 2014 02:28:31 -0400 Date: Thu, 19 Jun 2014 08:28:24 +0200 From: Borislav Petkov To: "Ren, Qiaowei" Cc: "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , "Hansen, Dave" , "x86@kernel.org" , "linux-kernel@vger.kernel.org" Subject: Re: [PATCH v6 07/10] x86, mpx: decode MPX instruction to get bound violation information Message-ID: <20140619062824.GB22025@pd.tnic> References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> <1403084656-27284-8-git-send-email-qiaowei.ren@intel.com> <20140618100745.GB24419@pd.tnic> <9E0BE1322F2F2246BD820DA9FC397ADE016A3BAA@shsmsx102.ccr.corp.intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <9E0BE1322F2F2246BD820DA9FC397ADE016A3BAA@shsmsx102.ccr.corp.intel.com> User-Agent: Mutt/1.5.23 (2014-03-12) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Jun 19, 2014 at 01:13:48AM +0000, Ren, Qiaowei wrote: > On 2014-06-18, Borislav Petkov wrote: > > On Wed, Jun 18, 2014 at 05:44:13PM +0800, Qiaowei Ren wrote: > > > > This whole insn decoding machinery above looks like adapted from > > arch/x86/lib/insn.c. You should merge it with the generic code in > > insn.c instead of homegrowing it here only for the purposes of MPX. > > And if it doesn't work for your needs, you should should extend the > > generic code to do so. > Petkov, as we discussed on initial version of this patchset, general > insn framework didn't work out well and I have tried to use generic > struct insn, insn_field, etc. for obvious benefits. Let me repeat myself: "And if it doesn't work for your needs, you should extend the generic code to do so." We don't do homegrown almost-copies of generic code. -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine. -- From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756534AbaFSGx7 (ORCPT ); Thu, 19 Jun 2014 02:53:59 -0400 Received: from mga11.intel.com ([192.55.52.93]:38062 "EHLO mga11.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753061AbaFSGx6 (ORCPT ); Thu, 19 Jun 2014 02:53:58 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.01,505,1400050800"; d="scan'208";a="557790825" From: "Ren, Qiaowei" To: Borislav Petkov CC: "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , "Hansen, Dave" , "x86@kernel.org" , "linux-kernel@vger.kernel.org" Subject: RE: [PATCH v6 07/10] x86, mpx: decode MPX instruction to get bound violation information Thread-Topic: [PATCH v6 07/10] x86, mpx: decode MPX instruction to get bound violation information Thread-Index: AQHPitp5+re2FjJ0DkeRHdzRM5ptp5t2HnuAgAGAu2D//9RQAIAAiwew Date: Thu, 19 Jun 2014 06:53:28 +0000 Message-ID: <9E0BE1322F2F2246BD820DA9FC397ADE016A3F75@shsmsx102.ccr.corp.intel.com> References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> <1403084656-27284-8-git-send-email-qiaowei.ren@intel.com> <20140618100745.GB24419@pd.tnic> <9E0BE1322F2F2246BD820DA9FC397ADE016A3BAA@shsmsx102.ccr.corp.intel.com> <20140619062824.GB22025@pd.tnic> In-Reply-To: <20140619062824.GB22025@pd.tnic> Accept-Language: zh-CN, en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [10.239.127.40] Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from base64 to 8bit by mail.home.local id s5J6s4Oi024239 On 2014-06-19, Borislav Petkov wrote: > On Thu, Jun 19, 2014 at 01:13:48AM +0000, Ren, Qiaowei wrote: >> On 2014-06-18, Borislav Petkov wrote: >>> On Wed, Jun 18, 2014 at 05:44:13PM +0800, Qiaowei Ren wrote: >>> >>> This whole insn decoding machinery above looks like adapted from >>> arch/x86/lib/insn.c. You should merge it with the generic code in >>> insn.c instead of homegrowing it here only for the purposes of MPX. >>> And if it doesn't work for your needs, you should should extend >>> the generic code to do so. > >> Petkov, as we discussed on initial version of this patchset, general >> insn framework didn't work out well and I have tried to use generic >> struct insn, insn_field, etc. for obvious benefits. > > Let me repeat myself: "And if it doesn't work for your needs, you > should extend the generic code to do so." > > We don't do homegrown almost-copies of generic code. > I see. If possible, I will be very happy to use or extend generic code. But due to extra overhead caused by MPX, I have to use MPX specific decoding to do performance optimization. You can check the discussion on this: https://lkml.org/lkml/2014/1/11/190 Thanks, Qiaowei {.n++%ݶw{.n+{G{ayʇڙ,jfhz_(階ݢj"mG?&~iOzv^m ?I From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S934149AbaFSRHy (ORCPT ); Thu, 19 Jun 2014 13:07:54 -0400 Received: from mga09.intel.com ([134.134.136.24]:59927 "EHLO mga09.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932894AbaFSRHx (ORCPT ); Thu, 19 Jun 2014 13:07:53 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.01,508,1400050800"; d="scan'208";a="560272642" Message-ID: <53A3181F.9070005@intel.com> Date: Thu, 19 Jun 2014 10:04:31 -0700 From: Dave Hansen User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.5.0 MIME-Version: 1.0 To: "Ren, Qiaowei" , Borislav Petkov CC: "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , "x86@kernel.org" , "linux-kernel@vger.kernel.org" Subject: Re: [PATCH v6 07/10] x86, mpx: decode MPX instruction to get bound violation information References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> <1403084656-27284-8-git-send-email-qiaowei.ren@intel.com> <20140618100745.GB24419@pd.tnic> <9E0BE1322F2F2246BD820DA9FC397ADE016A3BAA@shsmsx102.ccr.corp.intel.com> <20140619062824.GB22025@pd.tnic> <9E0BE1322F2F2246BD820DA9FC397ADE016A3F75@shsmsx102.ccr.corp.intel.com> In-Reply-To: <9E0BE1322F2F2246BD820DA9FC397ADE016A3F75@shsmsx102.ccr.corp.intel.com> Content-Type: multipart/mixed; boundary="------------020505010703080004050409" Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org This is a multi-part message in MIME format. --------------020505010703080004050409 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit On 06/18/2014 11:53 PM, Ren, Qiaowei wrote: > On 2014-06-19, Borislav Petkov wrote: >> On Thu, Jun 19, 2014 at 01:13:48AM +0000, Ren, Qiaowei wrote: >>> On 2014-06-18, Borislav Petkov wrote: >>>> On Wed, Jun 18, 2014 at 05:44:13PM +0800, Qiaowei Ren wrote: >>>> >>>> This whole insn decoding machinery above looks like adapted from >>>> arch/x86/lib/insn.c. You should merge it with the generic code in >>>> insn.c instead of homegrowing it here only for the purposes of MPX. >>>> And if it doesn't work for your needs, you should should extend >>>> the generic code to do so. >> >>> Petkov, as we discussed on initial version of this patchset, general >>> insn framework didn't work out well and I have tried to use generic >>> struct insn, insn_field, etc. for obvious benefits. >> >> Let me repeat myself: "And if it doesn't work for your needs, you >> should extend the generic code to do so." >> >> We don't do homegrown almost-copies of generic code. >> > I see. If possible, I will be very happy to use or extend generic > code. But due to extra overhead caused by MPX, I have to use MPX > specific decoding to do performance optimization. Could you please support this position with some data? I'm a bit skeptical that instruction decoding is going to be a performance-critical path. I also don't see the extra field that you talked about in the previous thread? What's the extra field? I see a 'limit' vs. 'length', but you don't use 'length' at all, so I think you can use it instead, or at least union it. I've taken a quick stab at trying to consolidate things. I think I may have screwed up this: insn->limit = MAX_MPX_INSN_SIZE - bytes; Qiaowei, is there anything fundamentally broken with what I've got here? --------------020505010703080004050409 Content-Type: text/x-patch; name="consolidate-instruction-decoding.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="consolidate-instruction-decoding.patch" --- b/arch/x86/include/asm/insn.h | 1 b/arch/x86/include/asm/mpx.h | 14 --- b/arch/x86/kernel/mpx.c | 183 ++---------------------------------------- b/arch/x86/lib/insn.c | 43 +++++++++ 4 files changed, 56 insertions(+), 185 deletions(-) diff -puN arch/x86/kernel/mpx.c~consolidate-instruction-decoding arch/x86/kernel/mpx.c --- a/arch/x86/kernel/mpx.c~consolidate-instruction-decoding 2014-06-19 09:53:53.894441788 -0700 +++ b/arch/x86/kernel/mpx.c 2014-06-19 09:53:53.909442460 -0700 @@ -3,6 +3,7 @@ #include #include #include +#include #include /* @@ -59,7 +60,7 @@ int mpx_unregister(struct task_struct *t } typedef enum {REG_TYPE_RM, REG_TYPE_INDEX, REG_TYPE_BASE} reg_type_t; -static unsigned long get_reg(struct mpx_insn *insn, struct pt_regs *regs, +static unsigned long get_reg(struct insn *insn, struct pt_regs *regs, reg_type_t type) { int regno = 0; @@ -118,7 +119,7 @@ static unsigned long get_reg(struct mpx_ * for rm=3 returning the content of the rm reg * for rm!=3 calculates the address using SIB and Disp */ -static unsigned long get_addr_ref(struct mpx_insn *insn, struct pt_regs *regs) +static unsigned long get_addr_ref(struct insn *insn, struct pt_regs *regs) { unsigned long addr; unsigned long base; @@ -142,182 +143,22 @@ static unsigned long get_addr_ref(struct return addr; } -/* Verify next sizeof(t) bytes can be on the same instruction */ -#define validate_next(t, insn, n) \ - ((insn)->next_byte + sizeof(t) + n - (insn)->kaddr <= (insn)->limit) - -#define __get_next(t, insn) \ -({ \ - t r = *(t *)insn->next_byte; \ - insn->next_byte += sizeof(t); \ - r; \ -}) - -#define __peek_next(t, insn) \ -({ \ - t r = *(t *)insn->next_byte; \ - r; \ -}) - -#define get_next(t, insn) \ -({ \ - if (unlikely(!validate_next(t, insn, 0))) \ - goto err_out; \ - __get_next(t, insn); \ -}) - -#define peek_next(t, insn) \ -({ \ - if (unlikely(!validate_next(t, insn, 0))) \ - goto err_out; \ - __peek_next(t, insn); \ -}) - -static void mpx_insn_get_prefixes(struct mpx_insn *insn) -{ - unsigned char b; - - /* Decode legacy prefix and REX prefix */ - b = peek_next(unsigned char, insn); - while (b != 0x0f) { - /* - * look for a rex prefix - * a REX prefix cannot be followed by a legacy prefix. - */ - if (insn->x86_64 && ((b&0xf0) == 0x40)) { - insn->rex_prefix.value = b; - insn->rex_prefix.nbytes = 1; - insn->next_byte++; - break; - } - - /* check the other legacy prefixes */ - switch (b) { - case 0xf2: - case 0xf3: - case 0xf0: - case 0x64: - case 0x65: - case 0x2e: - case 0x3e: - case 0x26: - case 0x36: - case 0x66: - case 0x67: - insn->next_byte++; - break; - default: /* everything else is garbage */ - goto err_out; - } - b = peek_next(unsigned char, insn); - } - -err_out: - return; -} - -static void mpx_insn_get_modrm(struct mpx_insn *insn) -{ - insn->modrm.value = get_next(unsigned char, insn); - insn->modrm.nbytes = 1; - -err_out: - return; -} - -static void mpx_insn_get_sib(struct mpx_insn *insn) -{ - unsigned char modrm = (unsigned char)insn->modrm.value; - - if (X86_MODRM_MOD(modrm) != 3 && X86_MODRM_RM(modrm) == 4) { - insn->sib.value = get_next(unsigned char, insn); - insn->sib.nbytes = 1; - } - -err_out: - return; -} - -static void mpx_insn_get_displacement(struct mpx_insn *insn) -{ - unsigned char mod, rm, base; - - /* - * Interpreting the modrm byte: - * mod = 00 - no displacement fields (exceptions below) - * mod = 01 - 1-byte displacement field - * mod = 10 - displacement field is 4 bytes - * mod = 11 - no memory operand - * - * mod != 11, r/m = 100 - SIB byte exists - * mod = 00, SIB base = 101 - displacement field is 4 bytes - * mod = 00, r/m = 101 - rip-relative addressing, displacement - * field is 4 bytes - */ - mod = X86_MODRM_MOD(insn->modrm.value); - rm = X86_MODRM_RM(insn->modrm.value); - base = X86_SIB_BASE(insn->sib.value); - if (mod == 3) - return; - if (mod == 1) { - insn->displacement.value = get_next(unsigned char, insn); - insn->displacement.nbytes = 1; - } else if ((mod == 0 && rm == 5) || mod == 2 || - (mod == 0 && base == 5)) { - insn->displacement.value = get_next(int, insn); - insn->displacement.nbytes = 4; - } - -err_out: - return; -} - -static void mpx_insn_init(struct mpx_insn *insn, struct pt_regs *regs) -{ - unsigned char buf[MAX_MPX_INSN_SIZE]; - int bytes; - - memset(insn, 0, sizeof(*insn)); - - bytes = copy_from_user(buf, (void __user *)regs->ip, MAX_MPX_INSN_SIZE); - insn->limit = MAX_MPX_INSN_SIZE - bytes; - insn->kaddr = buf; - insn->next_byte = buf; - - /* - * In 64-bit Mode, all Intel MPX instructions use 64-bit - * operands for bounds and 64 bit addressing, i.e. REX.W & - * 67H have no effect on data or address size. - * - * In compatibility and legacy modes (including 16-bit code - * segments, real and virtual 8086 modes) all Intel MPX - * instructions use 32-bit operands for bounds and 32 bit - * addressing. - */ -#ifdef CONFIG_X86_64 - insn->x86_64 = 1; - insn->addr_bytes = 8; -#else - insn->x86_64 = 0; - insn->addr_bytes = 4; -#endif -} - -static unsigned long mpx_insn_decode(struct mpx_insn *insn, +static unsigned long insn_decode(struct insn *insn, struct pt_regs *regs) { - mpx_insn_init(insn, regs); + int is_64bit = IS_ENABLED(CONFIG_X86_64) && !test_thread_flag(TIF_IA32); + insn_init(insn, regs, is_64bit); /* * In this case, we only need decode bndcl/bndcn/bndcu, * so we can use private diassembly interfaces to get * prefixes, modrm, sib, displacement, etc.. */ - mpx_insn_get_prefixes(insn); + insn_get_prefixes(insn); insn->next_byte += 2; /* ignore opcode */ - mpx_insn_get_modrm(insn); - mpx_insn_get_sib(insn); - mpx_insn_get_displacement(insn); + insn_get_modrm(insn); + insn_get_sib(insn); + insn_get_displacement(insn); return get_addr_ref(insn, regs); } @@ -390,11 +231,11 @@ int do_mpx_bt_fault(struct xsave_struct void do_mpx_bounds(struct pt_regs *regs, siginfo_t *info, struct xsave_struct *xsave_buf) { - struct mpx_insn insn; + struct insn insn; uint8_t bndregno; unsigned long addr_vio; - addr_vio = mpx_insn_decode(&insn, regs); + addr_vio = insn_decode(&insn, regs); bndregno = X86_MODRM_REG(insn.modrm.value); if (bndregno > 3) diff -puN arch/x86/include/asm/mpx.h~consolidate-instruction-decoding arch/x86/include/asm/mpx.h --- a/arch/x86/include/asm/mpx.h~consolidate-instruction-decoding 2014-06-19 09:53:53.895441833 -0700 +++ b/arch/x86/include/asm/mpx.h 2014-06-19 09:53:53.909442460 -0700 @@ -53,20 +53,6 @@ #define MPX_BNDCFG_ENABLE_FLAG 0x1 #define MPX_BD_ENTRY_VALID_FLAG 0x1 -struct mpx_insn { - struct insn_field rex_prefix; /* REX prefix */ - struct insn_field modrm; - struct insn_field sib; - struct insn_field displacement; - - unsigned char addr_bytes; /* effective address size */ - unsigned char limit; - unsigned char x86_64; - - const unsigned char *kaddr; /* kernel address of insn to analyze */ - const unsigned char *next_byte; -}; - #define MAX_MPX_INSN_SIZE 15 unsigned long mpx_mmap(unsigned long len); diff -puN arch/x86/include/asm/insn.h~consolidate-instruction-decoding arch/x86/include/asm/insn.h --- a/arch/x86/include/asm/insn.h~consolidate-instruction-decoding 2014-06-19 09:53:53.897441922 -0700 +++ b/arch/x86/include/asm/insn.h 2014-06-19 09:53:53.910442505 -0700 @@ -98,6 +98,7 @@ struct insn { extern void insn_init(struct insn *insn, const void *kaddr, int x86_64); extern void insn_get_prefixes(struct insn *insn); +extern void mpx_insn_get_prefixes(struct insn *insn); extern void insn_get_opcode(struct insn *insn); extern void insn_get_modrm(struct insn *insn); extern void insn_get_sib(struct insn *insn); diff -puN arch/x86/lib/insn.c~consolidate-instruction-decoding arch/x86/lib/insn.c --- a/arch/x86/lib/insn.c~consolidate-instruction-decoding 2014-06-19 09:53:53.899442011 -0700 +++ b/arch/x86/lib/insn.c 2014-06-19 09:53:53.910442505 -0700 @@ -63,6 +63,49 @@ void insn_init(struct insn *insn, const insn->addr_bytes = 4; } +void mpx_insn_get_prefixes(struct insn *insn) +{ + unsigned char b; + + /* Decode legacy prefix and REX prefix */ + b = peek_next(unsigned char, insn); + while (b != 0x0f) { + /* + * look for a rex prefix + * a REX prefix cannot be followed by a legacy prefix. + */ + if (insn->x86_64 && ((b&0xf0) == 0x40)) { + insn->prefixes.value = b; + insn->prefixes.nbytes = 1; + insn->next_byte++; + break; + } + + /* check the other legacy prefixes */ + switch (b) { + case 0xf2: + case 0xf3: + case 0xf0: + case 0x64: + case 0x65: + case 0x2e: + case 0x3e: + case 0x26: + case 0x36: + case 0x66: + case 0x67: + insn->next_byte++; + break; + default: /* everything else is garbage */ + goto err_out; + } + b = peek_next(unsigned char, insn); + } + +err_out: + return; +} + /** * insn_get_prefixes - scan x86 instruction prefix bytes * @insn: &struct insn containing instruction _ --------------020505010703080004050409-- From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S934032AbaFSRdl (ORCPT ); Thu, 19 Jun 2014 13:33:41 -0400 Received: from terminus.zytor.com ([198.137.202.10]:60358 "EHLO mail.zytor.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932295AbaFSRdj (ORCPT ); Thu, 19 Jun 2014 13:33:39 -0400 Message-ID: <53A31EBD.8060708@zytor.com> Date: Thu, 19 Jun 2014 10:32:45 -0700 From: "H. Peter Anvin" User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.6.0 MIME-Version: 1.0 To: Dave Hansen , "Ren, Qiaowei" , Borislav Petkov CC: Thomas Gleixner , Ingo Molnar , "x86@kernel.org" , "linux-kernel@vger.kernel.org" Subject: Re: [PATCH v6 07/10] x86, mpx: decode MPX instruction to get bound violation information References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> <1403084656-27284-8-git-send-email-qiaowei.ren@intel.com> <20140618100745.GB24419@pd.tnic> <9E0BE1322F2F2246BD820DA9FC397ADE016A3BAA@shsmsx102.ccr.corp.intel.com> <20140619062824.GB22025@pd.tnic> <9E0BE1322F2F2246BD820DA9FC397ADE016A3F75@shsmsx102.ccr.corp.intel.com> <53A3181F.9070005@intel.com> In-Reply-To: <53A3181F.9070005@intel.com> X-Enigmail-Version: 1.6 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 06/19/2014 10:04 AM, Dave Hansen wrote: > > Could you please support this position with some data? I'm a bit > skeptical that instruction decoding is going to be a > performance-critical path. > > I also don't see the extra field that you talked about in the previous > thread? What's the extra field? I see a 'limit' vs. 'length', but you > don't use 'length' at all, so I think you can use it instead, or at > least union it. > > I've taken a quick stab at trying to consolidate things. I think I may > have screwed up this: > > insn->limit = MAX_MPX_INSN_SIZE - bytes; > > Qiaowei, is there anything fundamentally broken with what I've got here? > So I encouraged Qiaowei to do a limited special-purpose decoder, simply because the glue to use the generic decoder was almost as large. I am overall not a huge fan of using the generic decoder in constrained situation, because the generic decoder is very heavyweight not just in terms of performance but in terms of interface -- because it has to. -hpa From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S934145AbaFSSD3 (ORCPT ); Thu, 19 Jun 2014 14:03:29 -0400 Received: from terminus.zytor.com ([198.137.202.10]:60730 "EHLO mail.zytor.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S933226AbaFSSD1 (ORCPT ); Thu, 19 Jun 2014 14:03:27 -0400 Message-ID: <53A325CE.5020206@zytor.com> Date: Thu, 19 Jun 2014 11:02:54 -0700 From: "H. Peter Anvin" User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.6.0 MIME-Version: 1.0 To: Dave Hansen , Borislav Petkov , Qiaowei Ren CC: Thomas Gleixner , Ingo Molnar , x86@kernel.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH v6 03/10] x86, mpx: add macro cpu_has_mpx References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> <1403084656-27284-4-git-send-email-qiaowei.ren@intel.com> <20140618095739.GA24419@pd.tnic> <53A1A3A5.9010109@intel.com> <53A1A942.1090001@zytor.com> <53A1BD95.10701@intel.com> In-Reply-To: <53A1BD95.10701@intel.com> X-Enigmail-Version: 1.6 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 06/18/2014 09:25 AM, Dave Hansen wrote: > On 06/18/2014 07:59 AM, H. Peter Anvin wrote: >> On 06/18/2014 07:35 AM, Dave Hansen wrote: >>> It looks like static_cpu_has() is the right thing to use instead of >>> boot_cpu_has(). But, this doesn't just obfuscate things. >>> >>> We actually _want_ the compiler to cull code out when the config option >>> is off. Things like do_bounds() will see code savings with _some_ kind >>> of #ifdef rather than using static_cpu_has(). >>> >>> So, we can either use the well worn, consistent with other features in >>> x86, cpu_has_$foo approach. Or, we can roll our own macros. >> >> We could do something like: >> >> #define MPX_ENABLED (IS_ENABLED(CONFIG_X86_MPX) && >> static_cpu_has(X86_FEATURE_MPX)) > > How about something like the attached patch? > > This lets us use static_cpu_has() for the checks, and allows us to > easily add new checks for other features that might be compile-time > disabled. > Hmm... I would like something similar to required-features.h which reflect features which *cannot* be enabled or will always be ignored; we actually already have a handful of those. -hpa From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S934359AbaFSSuL (ORCPT ); Thu, 19 Jun 2014 14:50:11 -0400 Received: from mga09.intel.com ([134.134.136.24]:14728 "EHLO mga09.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S933059AbaFSSuI (ORCPT ); Thu, 19 Jun 2014 14:50:08 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.01,508,1400050800"; d="scan'208";a="560321060" Message-ID: <53A330DD.1090106@intel.com> Date: Thu, 19 Jun 2014 11:50:05 -0700 From: Dave Hansen User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.5.0 MIME-Version: 1.0 To: "H. Peter Anvin" , Borislav Petkov , Qiaowei Ren CC: Thomas Gleixner , Ingo Molnar , x86@kernel.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH v6 03/10] x86, mpx: add macro cpu_has_mpx References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> <1403084656-27284-4-git-send-email-qiaowei.ren@intel.com> <20140618095739.GA24419@pd.tnic> <53A1A3A5.9010109@intel.com> <53A1A942.1090001@zytor.com> <53A1BD95.10701@intel.com> <53A325CE.5020206@zytor.com> In-Reply-To: <53A325CE.5020206@zytor.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 06/19/2014 11:02 AM, H. Peter Anvin wrote: > On 06/18/2014 09:25 AM, Dave Hansen wrote: >> How about something like the attached patch? >> >> This lets us use static_cpu_has() for the checks, and allows us to >> easily add new checks for other features that might be compile-time >> disabled. > > Hmm... I would like something similar to required-features.h which > reflect features which *cannot* be enabled or will always be ignored; we > actually already have a handful of those Could you elaborate a bit? I'll try and include them in the approach to make sure it works broadly. Is there a benefit to the required-features.h approach that's missing from mine? I _believe_ all of the compiler optimization around __builtin_constant_p() continues to work with the inline function instead of the #defines and bitmasks. I think the inline function approach is a bit easier to work with. Could the required-features.h approach just be from a time before __builtin_constant_p() worked well across inlines? From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S934629AbaFSU6i (ORCPT ); Thu, 19 Jun 2014 16:58:38 -0400 Received: from mga09.intel.com ([134.134.136.24]:43454 "EHLO mga09.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932343AbaFSU6g (ORCPT ); Thu, 19 Jun 2014 16:58:36 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.01,509,1400050800"; d="scan'208";a="560374922" Message-ID: <53A34EFC.5070809@intel.com> Date: Thu, 19 Jun 2014 13:58:36 -0700 From: Dave Hansen User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.5.0 MIME-Version: 1.0 To: Qiaowei Ren , "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar CC: x86@kernel.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH v6 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> <1403084656-27284-9-git-send-email-qiaowei.ren@intel.com> In-Reply-To: <1403084656-27284-9-git-send-email-qiaowei.ren@intel.com> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 06/18/2014 02:44 AM, Qiaowei Ren wrote: > +static __user void *task_get_bounds_dir(struct task_struct *tsk) > +{ > + struct xsave_struct *xsave_buf; > + > + fpu_xsave(&tsk->thread.fpu); > + xsave_buf = &(tsk->thread.fpu.state->xsave); > + if (!(xsave_buf->bndcsr.cfg_reg_u & MPX_BNDCFG_ENABLE_FLAG)) > + return NULL; > + > + return (void __user *)(xsave_buf->bndcsr.cfg_reg_u & > + MPX_BNDCFG_ADDR_MASK); > +} This spits out a warning on 32-bit: arch/x86/kernel/mpx.c: In function task_get_bounds_dir: arch/x86/kernel/mpx.c:21:9: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S934067AbaFTDWT (ORCPT ); Thu, 19 Jun 2014 23:22:19 -0400 Received: from mga03.intel.com ([143.182.124.21]:32869 "EHLO mga03.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753775AbaFTDWQ (ORCPT ); Thu, 19 Jun 2014 23:22:16 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.01,511,1400050800"; d="scan'208";a="447795031" From: "Ren, Qiaowei" To: "H. Peter Anvin" , "Hansen, Dave" , Borislav Petkov CC: Thomas Gleixner , Ingo Molnar , "x86@kernel.org" , "linux-kernel@vger.kernel.org" Subject: RE: [PATCH v6 07/10] x86, mpx: decode MPX instruction to get bound violation information Thread-Topic: [PATCH v6 07/10] x86, mpx: decode MPX instruction to get bound violation information Thread-Index: AQHPitp5+re2FjJ0DkeRHdzRM5ptp5t2HnuAgAGAu2D//9RQAIAAiwewgAAmtICAAAfjgIABJ8Lw Date: Fri, 20 Jun 2014 03:21:58 +0000 Message-ID: <9E0BE1322F2F2246BD820DA9FC397ADE016A5747@shsmsx102.ccr.corp.intel.com> References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> <1403084656-27284-8-git-send-email-qiaowei.ren@intel.com> <20140618100745.GB24419@pd.tnic> <9E0BE1322F2F2246BD820DA9FC397ADE016A3BAA@shsmsx102.ccr.corp.intel.com> <20140619062824.GB22025@pd.tnic> <9E0BE1322F2F2246BD820DA9FC397ADE016A3F75@shsmsx102.ccr.corp.intel.com> <53A3181F.9070005@intel.com> <53A31EBD.8060708@zytor.com> In-Reply-To: <53A31EBD.8060708@zytor.com> Accept-Language: zh-CN, en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [10.239.127.40] Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from base64 to 8bit by mail.home.local id s5K3MVPt030842 On 2014-06-20, H. Peter Anvin wrote: > On 06/19/2014 10:04 AM, Dave Hansen wrote: >> >> Could you please support this position with some data? I'm a bit >> skeptical that instruction decoding is going to be a >> performance-critical path. >> >> I also don't see the extra field that you talked about in the >> previous thread? What's the extra field? I see a 'limit' vs. >> 'length', but you don't use 'length' at all, so I think you can use >> it instead, or at least union it. >> >> I've taken a quick stab at trying to consolidate things. I think I >> may have screwed up this: >> >> insn->limit = MAX_MPX_INSN_SIZE - bytes; >> >> Qiaowei, is there anything fundamentally broken with what I've got here? >> > Firstly instructions will be got from user pointer stored in 'ip', and then validate_next() will use 'limit' to make sure that next sizeof(t) bytes can be on the same instruction. As hpa said, generic decoder, including struct insn and implementation of decoding, is very heavyweight because it has to. So MPX specific decoding should be better choice. > So I encouraged Qiaowei to do a limited special-purpose decoder, > simply because the glue to use the generic decoder was almost as > large. I am overall not a huge fan of using the generic decoder in > constrained situation, because the generic decoder is very heavyweight > not just in terms of performance but in terms of interface -- because it has to. > Thanks, Qiaowei {.n++%ݶw{.n+{G{ayʇڙ,jfhz_(階ݢj"mG?&~iOzv^m ?I From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S933336AbaFTD30 (ORCPT ); Thu, 19 Jun 2014 23:29:26 -0400 Received: from terminus.zytor.com ([198.137.202.10]:38261 "EHLO mail.zytor.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753775AbaFTD3Z (ORCPT ); Thu, 19 Jun 2014 23:29:25 -0400 Message-ID: <53A3AA70.5090401@zytor.com> Date: Thu, 19 Jun 2014 20:28:48 -0700 From: "H. Peter Anvin" User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.6.0 MIME-Version: 1.0 To: Dave Hansen , Borislav Petkov , Qiaowei Ren CC: Thomas Gleixner , Ingo Molnar , x86@kernel.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH v6 03/10] x86, mpx: add macro cpu_has_mpx References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> <1403084656-27284-4-git-send-email-qiaowei.ren@intel.com> <20140618095739.GA24419@pd.tnic> <53A1A3A5.9010109@intel.com> <53A1A942.1090001@zytor.com> <53A1BD95.10701@intel.com> <53A325CE.5020206@zytor.com> <53A330DD.1090106@intel.com> In-Reply-To: <53A330DD.1090106@intel.com> X-Enigmail-Version: 1.6 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 06/19/2014 11:50 AM, Dave Hansen wrote: > On 06/19/2014 11:02 AM, H. Peter Anvin wrote: >> On 06/18/2014 09:25 AM, Dave Hansen wrote: >>> How about something like the attached patch? >>> >>> This lets us use static_cpu_has() for the checks, and allows us to >>> easily add new checks for other features that might be compile-time >>> disabled. >> >> Hmm... I would like something similar to required-features.h which >> reflect features which *cannot* be enabled or will always be ignored; we >> actually already have a handful of those > > Could you elaborate a bit? I'll try and include them in the approach to > make sure it works broadly. > > Is there a benefit to the required-features.h approach that's missing > from mine? I _believe_ all of the compiler optimization around > __builtin_constant_p() continues to work with the inline function > instead of the #defines and bitmasks. I think the inline function > approach is a bit easier to work with. > > Could the required-features.h approach just be from a time before > __builtin_constant_p() worked well across inlines? > Not so much. What I don't want is one approach for doing things in one direction and another approach for the other direction. -hpa From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932120AbaFWTtM (ORCPT ); Mon, 23 Jun 2014 15:49:12 -0400 Received: from mail-pb0-f45.google.com ([209.85.160.45]:36660 "EHLO mail-pb0-f45.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756812AbaFWTtK (ORCPT ); Mon, 23 Jun 2014 15:49:10 -0400 From: Andy Lutomirski X-Google-Original-From: Andy Lutomirski Message-ID: <53A884B2.5070702@mit.edu> Date: Mon, 23 Jun 2014 12:49:06 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.6.0 MIME-Version: 1.0 To: Qiaowei Ren , "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , Dave Hansen CC: x86@kernel.org, linux-kernel@vger.kernel.org, Linux MM Subject: Re: [PATCH v6 02/10] x86, mpx: add MPX specific mmap interface References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> <1403084656-27284-3-git-send-email-qiaowei.ren@intel.com> In-Reply-To: <1403084656-27284-3-git-send-email-qiaowei.ren@intel.com> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 06/18/2014 02:44 AM, Qiaowei Ren wrote: > This patch adds one MPX specific mmap interface, which only handles > mpx related maps, including bounds table and bounds directory. > > In order to track MPX specific memory usage, this interface is added > to stick new vm_flag VM_MPX in the vma_area_struct when create a > bounds table or bounds directory. I imagine the linux-mm people would want to think about any new vm flag. Why is this needed? > > Signed-off-by: Qiaowei Ren > --- > arch/x86/Kconfig | 4 +++ > arch/x86/include/asm/mpx.h | 38 ++++++++++++++++++++++++++++ > arch/x86/mm/Makefile | 2 + > arch/x86/mm/mpx.c | 58 ++++++++++++++++++++++++++++++++++++++++++++ > 4 files changed, 102 insertions(+), 0 deletions(-) > create mode 100644 arch/x86/include/asm/mpx.h > create mode 100644 arch/x86/mm/mpx.c > > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig > index 25d2c6f..0194790 100644 > --- a/arch/x86/Kconfig > +++ b/arch/x86/Kconfig > @@ -237,6 +237,10 @@ config HAVE_INTEL_TXT > def_bool y > depends on INTEL_IOMMU && ACPI > > +config X86_INTEL_MPX > + def_bool y > + depends on CPU_SUP_INTEL > + > config X86_32_SMP > def_bool y > depends on X86_32 && SMP > diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h > new file mode 100644 > index 0000000..5725ac4 > --- /dev/null > +++ b/arch/x86/include/asm/mpx.h > @@ -0,0 +1,38 @@ > +#ifndef _ASM_X86_MPX_H > +#define _ASM_X86_MPX_H > + > +#include > +#include > + > +#ifdef CONFIG_X86_64 > + > +/* upper 28 bits [47:20] of the virtual address in 64-bit used to > + * index into bounds directory (BD). > + */ > +#define MPX_BD_ENTRY_OFFSET 28 > +#define MPX_BD_ENTRY_SHIFT 3 > +/* bits [19:3] of the virtual address in 64-bit used to index into > + * bounds table (BT). > + */ > +#define MPX_BT_ENTRY_OFFSET 17 > +#define MPX_BT_ENTRY_SHIFT 5 > +#define MPX_IGN_BITS 3 > + > +#else > + > +#define MPX_BD_ENTRY_OFFSET 20 > +#define MPX_BD_ENTRY_SHIFT 2 > +#define MPX_BT_ENTRY_OFFSET 10 > +#define MPX_BT_ENTRY_SHIFT 4 > +#define MPX_IGN_BITS 2 > + > +#endif > + > +#define MPX_BD_SIZE_BYTES (1UL<<(MPX_BD_ENTRY_OFFSET+MPX_BD_ENTRY_SHIFT)) > +#define MPX_BT_SIZE_BYTES (1UL<<(MPX_BT_ENTRY_OFFSET+MPX_BT_ENTRY_SHIFT)) > + > +#define MPX_BNDSTA_ERROR_CODE 0x3 > + > +unsigned long mpx_mmap(unsigned long len); > + > +#endif /* _ASM_X86_MPX_H */ > diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile > index 6a19ad9..ecfdc46 100644 > --- a/arch/x86/mm/Makefile > +++ b/arch/x86/mm/Makefile > @@ -30,3 +30,5 @@ obj-$(CONFIG_ACPI_NUMA) += srat.o > obj-$(CONFIG_NUMA_EMU) += numa_emulation.o > > obj-$(CONFIG_MEMTEST) += memtest.o > + > +obj-$(CONFIG_X86_INTEL_MPX) += mpx.o > diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c > new file mode 100644 > index 0000000..546c5d1 > --- /dev/null > +++ b/arch/x86/mm/mpx.c > @@ -0,0 +1,58 @@ > +#include > +#include > +#include > +#include > +#include > + > +/* > + * this is really a simplified "vm_mmap". it only handles mpx > + * related maps, including bounds table and bounds directory. > + * > + * here we can stick new vm_flag VM_MPX in the vma_area_struct > + * when create a bounds table or bounds directory, in order to > + * track MPX specific memory. > + */ > +unsigned long mpx_mmap(unsigned long len) > +{ > + unsigned long ret; > + unsigned long addr, pgoff; > + struct mm_struct *mm = current->mm; > + vm_flags_t vm_flags; > + > + /* Only bounds table and bounds directory can be allocated here */ > + if (len != MPX_BD_SIZE_BYTES && len != MPX_BT_SIZE_BYTES) > + return -EINVAL; > + > + down_write(&mm->mmap_sem); > + > + /* Too many mappings? */ > + if (mm->map_count > sysctl_max_map_count) { > + ret = -ENOMEM; > + goto out; > + } > + > + /* Obtain the address to map to. we verify (or select) it and ensure > + * that it represents a valid section of the address space. > + */ > + addr = get_unmapped_area(NULL, 0, len, 0, MAP_ANONYMOUS | MAP_PRIVATE); > + if (addr & ~PAGE_MASK) { > + ret = addr; > + goto out; > + } > + > + vm_flags = VM_READ | VM_WRITE | VM_MPX | > + mm->def_flags | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC; > + > + /* Make bounds tables and bouds directory unlocked. */ > + if (vm_flags & VM_LOCKED) > + vm_flags &= ~VM_LOCKED; Why? I would expect MCL_FUTURE to lock these. --Andy From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932330AbaFWTyw (ORCPT ); Mon, 23 Jun 2014 15:54:52 -0400 Received: from mail-pa0-f41.google.com ([209.85.220.41]:50971 "EHLO mail-pa0-f41.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756817AbaFWTyv (ORCPT ); Mon, 23 Jun 2014 15:54:51 -0400 From: Andy Lutomirski X-Google-Original-From: Andy Lutomirski Message-ID: <53A88606.2050108@mit.edu> Date: Mon, 23 Jun 2014 12:54:46 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.6.0 MIME-Version: 1.0 To: Qiaowei Ren , "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , Dave Hansen CC: x86@kernel.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH v6 04/10] x86, mpx: hook #BR exception handler to allocate bound tables References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> <1403084656-27284-5-git-send-email-qiaowei.ren@intel.com> In-Reply-To: <1403084656-27284-5-git-send-email-qiaowei.ren@intel.com> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 06/18/2014 02:44 AM, Qiaowei Ren wrote: > This patch handles a #BR exception for non-existent tables by > carving the space out of the normal processes address space > (essentially calling mmap() from inside the kernel) and then > pointing the bounds-directory over to it. > > The tables need to be accessed and controlled by userspace > because the compiler generates instructions for MPX-enabled > code which frequently store and retrieve entries from the bounds > tables. Any direct kernel involvement (like a syscall) to access > the tables would destroy performance since these are so frequent. > > The tables are carved out of userspace because we have no better > spot to put them. For each pointer which is being tracked by MPX, > the bounds tables contain 4 longs worth of data, and the tables > are indexed virtually. If we were to preallocate the tables, we > would theoretically need to allocate 4x the virtual space that > we have available for userspace somewhere else. We don't have > that room in the kernel address space. > > Signed-off-by: Qiaowei Ren > --- > arch/x86/include/asm/mpx.h | 20 ++++++++++++++ > arch/x86/kernel/Makefile | 1 + > arch/x86/kernel/mpx.c | 63 ++++++++++++++++++++++++++++++++++++++++++++ > arch/x86/kernel/traps.c | 56 ++++++++++++++++++++++++++++++++++++++- > 4 files changed, 139 insertions(+), 1 deletions(-) > create mode 100644 arch/x86/kernel/mpx.c > > diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h > index 5725ac4..b7598ac 100644 > --- a/arch/x86/include/asm/mpx.h > +++ b/arch/x86/include/asm/mpx.h > @@ -18,6 +18,8 @@ > #define MPX_BT_ENTRY_SHIFT 5 > #define MPX_IGN_BITS 3 > > +#define MPX_BD_ENTRY_TAIL 3 > + > #else > > #define MPX_BD_ENTRY_OFFSET 20 > @@ -26,13 +28,31 @@ > #define MPX_BT_ENTRY_SHIFT 4 > #define MPX_IGN_BITS 2 > > +#define MPX_BD_ENTRY_TAIL 2 > + > #endif > > +#define MPX_BNDSTA_TAIL 2 > +#define MPX_BNDCFG_TAIL 12 > +#define MPX_BNDSTA_ADDR_MASK (~((1UL< +#define MPX_BNDCFG_ADDR_MASK (~((1UL< +#define MPX_BT_ADDR_MASK (~((1UL< + > #define MPX_BD_SIZE_BYTES (1UL<<(MPX_BD_ENTRY_OFFSET+MPX_BD_ENTRY_SHIFT)) > #define MPX_BT_SIZE_BYTES (1UL<<(MPX_BT_ENTRY_OFFSET+MPX_BT_ENTRY_SHIFT)) > > #define MPX_BNDSTA_ERROR_CODE 0x3 > +#define MPX_BD_ENTRY_VALID_FLAG 0x1 > > unsigned long mpx_mmap(unsigned long len); > > +#ifdef CONFIG_X86_INTEL_MPX > +int do_mpx_bt_fault(struct xsave_struct *xsave_buf); > +#else > +static inline int do_mpx_bt_fault(struct xsave_struct *xsave_buf) > +{ > + return -EINVAL; > +} > +#endif /* CONFIG_X86_INTEL_MPX */ > + > #endif /* _ASM_X86_MPX_H */ > diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile > index f4d9600..3e81aed 100644 > --- a/arch/x86/kernel/Makefile > +++ b/arch/x86/kernel/Makefile > @@ -41,6 +41,7 @@ obj-$(CONFIG_PREEMPT) += preempt.o > > obj-y += process.o > obj-y += i387.o xsave.o > +obj-$(CONFIG_X86_INTEL_MPX) += mpx.o > obj-y += ptrace.o > obj-$(CONFIG_X86_32) += tls.o > obj-$(CONFIG_IA32_EMULATION) += tls.o > diff --git a/arch/x86/kernel/mpx.c b/arch/x86/kernel/mpx.c > new file mode 100644 > index 0000000..4230c7b > --- /dev/null > +++ b/arch/x86/kernel/mpx.c > @@ -0,0 +1,63 @@ > +#include > +#include > +#include > + > +static int allocate_bt(long __user *bd_entry) > +{ > + unsigned long bt_addr, old_val = 0; > + int ret = 0; > + > + bt_addr = mpx_mmap(MPX_BT_SIZE_BYTES); > + if (IS_ERR((void *)bt_addr)) { > + pr_err("Bounds table allocation failed at entry addr %p\n", > + bd_entry); > + return bt_addr; > + } > + bt_addr = (bt_addr & MPX_BT_ADDR_MASK) | MPX_BD_ENTRY_VALID_FLAG; > + > + ret = user_atomic_cmpxchg_inatomic(&old_val, bd_entry, 0, bt_addr); > + if (ret) > + goto out; > + > + /* > + * there is a existing bounds table pointed at this bounds > + * directory entry, and so we need to free the bounds table > + * allocated just now. > + */ > + if (old_val) > + goto out; > + > + pr_debug("Allocate bounds table %lx at entry %p\n", > + bt_addr, bd_entry); > + return 0; > + > +out: > + vm_munmap(bt_addr & MPX_BT_ADDR_MASK, MPX_BT_SIZE_BYTES); > + return ret; > +} > + > +/* > + * When a BNDSTX instruction attempts to save bounds to a BD entry > + * with the lack of the valid bit being set, a #BR is generated. > + * This is an indication that no BT exists for this entry. In this > + * case the fault handler will allocate a new BT. > + * > + * With 32-bit mode, the size of BD is 4MB, and the size of each > + * bound table is 16KB. With 64-bit mode, the size of BD is 2GB, > + * and the size of each bound table is 4MB. > + */ > +int do_mpx_bt_fault(struct xsave_struct *xsave_buf) > +{ > + unsigned long status; > + unsigned long bd_entry, bd_base; > + > + bd_base = xsave_buf->bndcsr.cfg_reg_u & MPX_BNDCFG_ADDR_MASK; > + status = xsave_buf->bndcsr.status_reg; > + > + bd_entry = status & MPX_BNDSTA_ADDR_MASK; > + if ((bd_entry < bd_base) || > + (bd_entry >= bd_base + MPX_BD_SIZE_BYTES)) > + return -EINVAL; > + > + return allocate_bt((long __user *)bd_entry); > +} > diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c > index f73b5d4..35b9b29 100644 > --- a/arch/x86/kernel/traps.c > +++ b/arch/x86/kernel/traps.c > @@ -59,6 +59,7 @@ > #include > #include > #include > +#include > > #ifdef CONFIG_X86_64 > #include > @@ -213,7 +214,6 @@ dotraplinkage void do_##name(struct pt_regs *regs, long error_code) \ > > DO_ERROR_INFO(X86_TRAP_DE, SIGFPE, "divide error", divide_error, FPE_INTDIV, regs->ip ) > DO_ERROR (X86_TRAP_OF, SIGSEGV, "overflow", overflow ) > -DO_ERROR (X86_TRAP_BR, SIGSEGV, "bounds", bounds ) > DO_ERROR_INFO(X86_TRAP_UD, SIGILL, "invalid opcode", invalid_op, ILL_ILLOPN, regs->ip ) > DO_ERROR (X86_TRAP_OLD_MF, SIGFPE, "coprocessor segment overrun", coprocessor_segment_overrun ) > DO_ERROR (X86_TRAP_TS, SIGSEGV, "invalid TSS", invalid_TSS ) > @@ -263,6 +263,60 @@ dotraplinkage void do_double_fault(struct pt_regs *regs, long error_code) > } > #endif > > +dotraplinkage void do_bounds(struct pt_regs *regs, long error_code) > +{ > + enum ctx_state prev_state; > + unsigned long status; > + struct xsave_struct *xsave_buf; > + struct task_struct *tsk = current; > + > + prev_state = exception_enter(); > + if (notify_die(DIE_TRAP, "bounds", regs, error_code, > + X86_TRAP_BR, SIGSEGV) == NOTIFY_STOP) > + goto exit; > + conditional_sti(regs); > + > + if (!user_mode(regs)) > + die("bounds", regs, error_code); Does this need to be user_mode_vm? > + > + if (!cpu_has_mpx) { > + /* The exception is not from Intel MPX */ > + do_trap(X86_TRAP_BR, SIGSEGV, "bounds", regs, error_code, NULL); > + goto exit; > + } > + > + fpu_xsave(&tsk->thread.fpu); > + xsave_buf = &(tsk->thread.fpu.state->xsave); > + status = xsave_buf->bndcsr.status_reg; > + > + /* > + * The error code field of the BNDSTATUS register communicates status > + * information of a bound range exception #BR or operation involving > + * bound directory. > + */ > + switch (status & MPX_BNDSTA_ERROR_CODE) { > + case 2: > + /* > + * Bound directory has invalid entry. > + * No signal will be sent to the user space. This comment is a lie. > + */ > + if (do_mpx_bt_fault(xsave_buf)) > + force_sig(SIGBUS, tsk); Would it make sense to assign and use a new si_code value here? --Andy From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756839AbaFWT5Z (ORCPT ); Mon, 23 Jun 2014 15:57:25 -0400 Received: from mail-pa0-f48.google.com ([209.85.220.48]:53814 "EHLO mail-pa0-f48.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753812AbaFWT5Y (ORCPT ); Mon, 23 Jun 2014 15:57:24 -0400 From: Andy Lutomirski X-Google-Original-From: Andy Lutomirski Message-ID: <53A886A0.1010508@mit.edu> Date: Mon, 23 Jun 2014 12:57:20 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.6.0 MIME-Version: 1.0 To: Qiaowei Ren , "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , Dave Hansen CC: x86@kernel.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH v6 09/10] x86, mpx: cleanup unused bound tables References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> <1403084656-27284-10-git-send-email-qiaowei.ren@intel.com> In-Reply-To: <1403084656-27284-10-git-send-email-qiaowei.ren@intel.com> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 06/18/2014 02:44 AM, Qiaowei Ren wrote: > When user memory region is unmapped, related bound tables > become unused and need to be released also. This patch cleanups > these unused bound tables through hooking unmap path. This is not a full review at all -- I don't understand the mm stuff well enough. > + > + /* > + * If this bounds directory entry is nonzero, and meanwhile > + * the valid bit is zero, one SIGSEGV will be produced due to > + * this unexpected situation. > + */ > + if (!(*valid) && *bt_addr) > + force_sig(SIGSEGV, current); force_sig from a syscall handler seems rather weird. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756768AbaFWUAQ (ORCPT ); Mon, 23 Jun 2014 16:00:16 -0400 Received: from mail-pb0-f50.google.com ([209.85.160.50]:47111 "EHLO mail-pb0-f50.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753812AbaFWUAO (ORCPT ); Mon, 23 Jun 2014 16:00:14 -0400 From: Andy Lutomirski X-Google-Original-From: Andy Lutomirski Message-ID: <53A8874A.3050301@mit.edu> Date: Mon, 23 Jun 2014 13:00:10 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.6.0 MIME-Version: 1.0 To: Qiaowei Ren , "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , Dave Hansen CC: x86@kernel.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH v6 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> <1403084656-27284-9-git-send-email-qiaowei.ren@intel.com> In-Reply-To: <1403084656-27284-9-git-send-email-qiaowei.ren@intel.com> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 06/18/2014 02:44 AM, Qiaowei Ren wrote: > This patch adds the PR_MPX_REGISTER and PR_MPX_UNREGISTER prctl() > commands. These commands can be used to register and unregister MPX > related resource on the x86 platform. > > The base of the bounds directory is set into mm_struct during > PR_MPX_REGISTER command execution. This member can be used to > check whether one application is mpx enabled. The register and unregister operations seem to be almost the same thing. How about just PR_SYNC_MPX? Also, would this make more sense in arch_prctl? --Andy From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756879AbaFWUJO (ORCPT ); Mon, 23 Jun 2014 16:09:14 -0400 Received: from mga09.intel.com ([134.134.136.24]:61881 "EHLO mga09.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753812AbaFWUJN (ORCPT ); Mon, 23 Jun 2014 16:09:13 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.01,532,1400050800"; d="scan'208";a="562070122" Message-ID: <53A88966.5080400@intel.com> Date: Mon, 23 Jun 2014 13:09:10 -0700 From: Dave Hansen User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.5.0 MIME-Version: 1.0 To: Andy Lutomirski , Qiaowei Ren , "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar CC: x86@kernel.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH v6 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> <1403084656-27284-9-git-send-email-qiaowei.ren@intel.com> <53A8874A.3050301@mit.edu> In-Reply-To: <53A8874A.3050301@mit.edu> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 06/23/2014 01:00 PM, Andy Lutomirski wrote: > On 06/18/2014 02:44 AM, Qiaowei Ren wrote: >> > This patch adds the PR_MPX_REGISTER and PR_MPX_UNREGISTER prctl() >> > commands. These commands can be used to register and unregister MPX >> > related resource on the x86 platform. >> > >> > The base of the bounds directory is set into mm_struct during >> > PR_MPX_REGISTER command execution. This member can be used to >> > check whether one application is mpx enabled. > The register and unregister operations seem to be almost the same thing. > How about just PR_SYNC_MPX? That wouldn't support a usage model where userspace wanted to keep using MPX, but wanted the kernel to butt out and not try to free unused bounds tables. That's not super-important, but it does lose some level of flexibility. FWIW, I think it would also be handy to support a PR_MPX_DISABLE prctl too. That way, a wrapper program could set a flag that any children could notice if they try a PR_MPX_REGISTER. That way we could software-disable MPX in cases in a process tree where it was not wanted. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756783AbaFWUDi (ORCPT ); Mon, 23 Jun 2014 16:03:38 -0400 Received: from mga09.intel.com ([134.134.136.24]:60923 "EHLO mga09.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753073AbaFWUDh (ORCPT ); Mon, 23 Jun 2014 16:03:37 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.01,532,1400050800"; d="scan'208";a="562067770" Message-ID: <53A88806.1060908@intel.com> Date: Mon, 23 Jun 2014 13:03:18 -0700 From: Dave Hansen User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.5.0 MIME-Version: 1.0 To: Andy Lutomirski , Qiaowei Ren , "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar CC: x86@kernel.org, linux-kernel@vger.kernel.org, Linux MM Subject: Re: [PATCH v6 02/10] x86, mpx: add MPX specific mmap interface References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> <1403084656-27284-3-git-send-email-qiaowei.ren@intel.com> <53A884B2.5070702@mit.edu> In-Reply-To: <53A884B2.5070702@mit.edu> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 06/23/2014 12:49 PM, Andy Lutomirski wrote: > On 06/18/2014 02:44 AM, Qiaowei Ren wrote: >> This patch adds one MPX specific mmap interface, which only handles >> mpx related maps, including bounds table and bounds directory. >> >> In order to track MPX specific memory usage, this interface is added >> to stick new vm_flag VM_MPX in the vma_area_struct when create a >> bounds table or bounds directory. > > I imagine the linux-mm people would want to think about any new vm flag. > Why is this needed? These tables can take huge amounts of memory. In the worst-case scenario, the tables can be 4x the size of the data structure being tracked. IOW, a 1-page structure can require 4 bounds-table pages. My expectation is that folks using MPX are going to be keen on figuring out how much memory is being dedicated to it. With this feature, plus some grepping in /proc/$pid/smaps one could take a pretty good stab at it. I know VM flags are scarce, and I'm open to other ways to skin this cat. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756854AbaFWUHA (ORCPT ); Mon, 23 Jun 2014 16:07:00 -0400 Received: from mail-lb0-f177.google.com ([209.85.217.177]:38357 "EHLO mail-lb0-f177.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753812AbaFWUG7 (ORCPT ); Mon, 23 Jun 2014 16:06:59 -0400 MIME-Version: 1.0 In-Reply-To: <53A88806.1060908@intel.com> References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> <1403084656-27284-3-git-send-email-qiaowei.ren@intel.com> <53A884B2.5070702@mit.edu> <53A88806.1060908@intel.com> From: Andy Lutomirski Date: Mon, 23 Jun 2014 13:06:37 -0700 Message-ID: Subject: Re: [PATCH v6 02/10] x86, mpx: add MPX specific mmap interface To: Dave Hansen Cc: Qiaowei Ren , "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , X86 ML , "linux-kernel@vger.kernel.org" , Linux MM Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Jun 23, 2014 at 1:03 PM, Dave Hansen wrote: > On 06/23/2014 12:49 PM, Andy Lutomirski wrote: >> On 06/18/2014 02:44 AM, Qiaowei Ren wrote: >>> This patch adds one MPX specific mmap interface, which only handles >>> mpx related maps, including bounds table and bounds directory. >>> >>> In order to track MPX specific memory usage, this interface is added >>> to stick new vm_flag VM_MPX in the vma_area_struct when create a >>> bounds table or bounds directory. >> >> I imagine the linux-mm people would want to think about any new vm flag. >> Why is this needed? > > These tables can take huge amounts of memory. In the worst-case > scenario, the tables can be 4x the size of the data structure being > tracked. IOW, a 1-page structure can require 4 bounds-table pages. > > My expectation is that folks using MPX are going to be keen on figuring > out how much memory is being dedicated to it. With this feature, plus > some grepping in /proc/$pid/smaps one could take a pretty good stab at it. > > I know VM flags are scarce, and I'm open to other ways to skin this cat. > Can the new vm_operation "name" be use for this? The magic "always written to core dumps" feature might need to be reconsidered. There's also arch_vma_name, but I just finished removing for x86, and I'd be a little sad to see it come right back. --Andy From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932392AbaFWU2g (ORCPT ); Mon, 23 Jun 2014 16:28:36 -0400 Received: from mga09.intel.com ([134.134.136.24]:41060 "EHLO mga09.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755379AbaFWU2f (ORCPT ); Mon, 23 Jun 2014 16:28:35 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.01,532,1400050800"; d="scan'208";a="562077486" Message-ID: <53A88DE4.8050107@intel.com> Date: Mon, 23 Jun 2014 13:28:20 -0700 From: Dave Hansen User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.5.0 MIME-Version: 1.0 To: Andy Lutomirski CC: Qiaowei Ren , "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , X86 ML , "linux-kernel@vger.kernel.org" , Linux MM Subject: Re: [PATCH v6 02/10] x86, mpx: add MPX specific mmap interface References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> <1403084656-27284-3-git-send-email-qiaowei.ren@intel.com> <53A884B2.5070702@mit.edu> <53A88806.1060908@intel.com> In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 06/23/2014 01:06 PM, Andy Lutomirski wrote: > Can the new vm_operation "name" be use for this? The magic "always > written to core dumps" feature might need to be reconsidered. One thing I'd like to avoid is an MPX vma getting merged with a non-MPX vma. I don't see any code to prevent two VMAs with different vm_ops->names from getting merged. That seems like a bit of a design oversight for ->name. Right? Thinking out loud a bit... There are also some more complicated but more performant cleanup mechanisms that I'd like to go after in the future. Given a page, we might want to figure out if it is an MPX page or not. I wonder if we'll ever collide with some other user of vm_ops->name. It looks fairly narrowly used at the moment, but would this keep us from putting these pages on, say, a tmpfs mount? Doesn't look that way at the moment. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754032AbaFWVEp (ORCPT ); Mon, 23 Jun 2014 17:04:45 -0400 Received: from mail-lb0-f180.google.com ([209.85.217.180]:39879 "EHLO mail-lb0-f180.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752821AbaFWVEn (ORCPT ); Mon, 23 Jun 2014 17:04:43 -0400 MIME-Version: 1.0 In-Reply-To: <53A88DE4.8050107@intel.com> References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> <1403084656-27284-3-git-send-email-qiaowei.ren@intel.com> <53A884B2.5070702@mit.edu> <53A88806.1060908@intel.com> <53A88DE4.8050107@intel.com> From: Andy Lutomirski Date: Mon, 23 Jun 2014 14:04:22 -0700 Message-ID: Subject: Re: [PATCH v6 02/10] x86, mpx: add MPX specific mmap interface To: Dave Hansen Cc: Qiaowei Ren , "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , X86 ML , "linux-kernel@vger.kernel.org" , Linux MM Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Jun 23, 2014 at 1:28 PM, Dave Hansen wrote: > On 06/23/2014 01:06 PM, Andy Lutomirski wrote: >> Can the new vm_operation "name" be use for this? The magic "always >> written to core dumps" feature might need to be reconsidered. > > One thing I'd like to avoid is an MPX vma getting merged with a non-MPX > vma. I don't see any code to prevent two VMAs with different > vm_ops->names from getting merged. That seems like a bit of a design > oversight for ->name. Right? AFAIK there are no ->name users that don't also set ->close, for exactly that reason. I'd be okay with adding a check for ->name, too. Hmm. If MPX vmas had a real struct file attached, this would all come for free. Maybe vmas with non-default vm_ops and file != NULL should never be mergeable? > > Thinking out loud a bit... There are also some more complicated but more > performant cleanup mechanisms that I'd like to go after in the future. > Given a page, we might want to figure out if it is an MPX page or not. > I wonder if we'll ever collide with some other user of vm_ops->name. It > looks fairly narrowly used at the moment, but would this keep us from > putting these pages on, say, a tmpfs mount? Doesn't look that way at > the moment. You could always check the vm_ops pointer to see if it's MPX. One feature I've wanted: a way to have special per-process vmas that can be easily found. For example, I want to be able to efficiently find out where the vdso and vvar vmas are. I don't think this is currently supported. --Andy From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755776AbaFWWBM (ORCPT ); Mon, 23 Jun 2014 18:01:12 -0400 Received: from mail-lb0-f171.google.com ([209.85.217.171]:58403 "EHLO mail-lb0-f171.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753505AbaFWWBI (ORCPT ); Mon, 23 Jun 2014 18:01:08 -0400 MIME-Version: 1.0 In-Reply-To: <53A88966.5080400@intel.com> References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> <1403084656-27284-9-git-send-email-qiaowei.ren@intel.com> <53A8874A.3050301@mit.edu> <53A88966.5080400@intel.com> From: Andy Lutomirski Date: Mon, 23 Jun 2014 15:00:46 -0700 Message-ID: Subject: Re: [PATCH v6 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER To: Dave Hansen Cc: "H. Peter Anvin" , X86 ML , Thomas Gleixner , Qiaowei Ren , "linux-kernel@vger.kernel.org" , Ingo Molnar Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Jun 23, 2014 1:09 PM, "Dave Hansen" wrote: > > On 06/23/2014 01:00 PM, Andy Lutomirski wrote: > > On 06/18/2014 02:44 AM, Qiaowei Ren wrote: > >> > This patch adds the PR_MPX_REGISTER and PR_MPX_UNREGISTER prctl() > >> > commands. These commands can be used to register and unregister MPX > >> > related resource on the x86 platform. > >> > > >> > The base of the bounds directory is set into mm_struct during > >> > PR_MPX_REGISTER command execution. This member can be used to > >> > check whether one application is mpx enabled. > > The register and unregister operations seem to be almost the same thing. > > How about just PR_SYNC_MPX? > > That wouldn't support a usage model where userspace wanted to keep using > MPX, but wanted the kernel to butt out and not try to free unused bounds > tables. That's not super-important, but it does lose some level of > flexibility. > Hmm. How about PR_SET/GET_MPX_BOUNDS_TABLE, to update the kernel's copy. No fpu magic needed. This has an added benefit: CRIU will need updating for MPX, and they'll appreciate having the required interface already exist. (They'll want a way to allocate "MPX" memory, too, but that's probably somewhat less important, and it won't result in duplicated functionality.) > FWIW, I think it would also be handy to support a PR_MPX_DISABLE prctl > too. That way, a wrapper program could set a flag that any children > could notice if they try a PR_MPX_REGISTER. That way we could > software-disable MPX in cases in a process tree where it was not wanted. Seccomp can do this :) --Andy From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752784AbaFWXm0 (ORCPT ); Mon, 23 Jun 2014 19:42:26 -0400 Received: from mga02.intel.com ([134.134.136.20]:64635 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750931AbaFWXmZ (ORCPT ); Mon, 23 Jun 2014 19:42:25 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.01,533,1400050800"; d="scan'208";a="562162356" Message-ID: <53A8BB5A.5090103@intel.com> Date: Mon, 23 Jun 2014 16:42:18 -0700 From: Dave Hansen User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.5.0 MIME-Version: 1.0 To: Andy Lutomirski CC: "H. Peter Anvin" , X86 ML , Thomas Gleixner , Qiaowei Ren , "linux-kernel@vger.kernel.org" , Ingo Molnar Subject: Re: [PATCH v6 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> <1403084656-27284-9-git-send-email-qiaowei.ren@intel.com> <53A8874A.3050301@mit.edu> <53A88966.5080400@intel.com> In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 06/23/2014 03:00 PM, Andy Lutomirski wrote: > Hmm. How about PR_SET/GET_MPX_BOUNDS_TABLE, to update the kernel's > copy. No fpu magic needed. > > This has an added benefit: CRIU will need updating for MPX, and > they'll appreciate having the required interface already exist. > (They'll want a way to allocate "MPX" memory, too, but that's probably > somewhat less important, and it won't result in duplicated > functionality.) I like the idea of the most minimal interface possible. If the kernel ever needed or wanted to cache more of the register setup, we wouldn't need to change the interface. For CRIU, I don't know much about the phases of how it sets itself up, but I guess the difference would be whether userspace has to do a register save and restore and a prctl() or just a plain prctl() with extra arguments. Doesn't seem fundamentally different to me. BTW, it's not a pointer to a bounds table, it's the bounds directory. There are two levels of the tables. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753051AbaFXABp (ORCPT ); Mon, 23 Jun 2014 20:01:45 -0400 Received: from mail-lb0-f182.google.com ([209.85.217.182]:63614 "EHLO mail-lb0-f182.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751242AbaFXABn (ORCPT ); Mon, 23 Jun 2014 20:01:43 -0400 MIME-Version: 1.0 In-Reply-To: <53A8BB5A.5090103@intel.com> References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> <1403084656-27284-9-git-send-email-qiaowei.ren@intel.com> <53A8874A.3050301@mit.edu> <53A88966.5080400@intel.com> <53A8BB5A.5090103@intel.com> From: Andy Lutomirski Date: Mon, 23 Jun 2014 17:01:22 -0700 Message-ID: Subject: Re: [PATCH v6 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER To: Dave Hansen Cc: "H. Peter Anvin" , X86 ML , Thomas Gleixner , Qiaowei Ren , "linux-kernel@vger.kernel.org" , Ingo Molnar Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Jun 23, 2014 at 4:42 PM, Dave Hansen wrote: > On 06/23/2014 03:00 PM, Andy Lutomirski wrote: >> Hmm. How about PR_SET/GET_MPX_BOUNDS_TABLE, to update the kernel's >> copy. No fpu magic needed. >> >> This has an added benefit: CRIU will need updating for MPX, and >> they'll appreciate having the required interface already exist. >> (They'll want a way to allocate "MPX" memory, too, but that's probably >> somewhat less important, and it won't result in duplicated >> functionality.) > > I like the idea of the most minimal interface possible. If the kernel > ever needed or wanted to cache more of the register setup, we wouldn't > need to change the interface. For CRIU, I don't know much about the > phases of how it sets itself up, but I guess the difference would be > whether userspace has to do a register save and restore and a prctl() or > just a plain prctl() with extra arguments. Doesn't seem fundamentally > different to me. > > BTW, it's not a pointer to a bounds table, it's the bounds directory. > There are two levels of the tables. I suspect that the existence of the get operation matters more. What if the checkpointed process has the cached copy out of sync with the register copy? More realistically, what if the checkpointed process doesn't want to use kernel MPX assistance at all? CRIU won't be able to detect this with the current proposed interface. --Andy From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753321AbaFXAKt (ORCPT ); Mon, 23 Jun 2014 20:10:49 -0400 Received: from mga02.intel.com ([134.134.136.20]:14802 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752327AbaFXAKs (ORCPT ); Mon, 23 Jun 2014 20:10:48 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.01,533,1400050800"; d="scan'208";a="562173371" Message-ID: <53A8C1EF.7030309@intel.com> Date: Mon, 23 Jun 2014 17:10:23 -0700 From: Dave Hansen User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.5.0 MIME-Version: 1.0 To: Andy Lutomirski CC: "H. Peter Anvin" , X86 ML , Thomas Gleixner , Qiaowei Ren , "linux-kernel@vger.kernel.org" , Ingo Molnar , Pavel Emelyanov Subject: Re: [PATCH v6 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> <1403084656-27284-9-git-send-email-qiaowei.ren@intel.com> <53A8874A.3050301@mit.edu> <53A88966.5080400@intel.com> <53A8BB5A.5090103@intel.com> In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 06/23/2014 05:01 PM, Andy Lutomirski wrote: > I suspect that the existence of the get operation matters more. What > if the checkpointed process has the cached copy out of sync with the > register copy? More realistically, what if the checkpointed process > doesn't want to use kernel MPX assistance at all? CRIU won't be able > to detect this with the current proposed interface. That's fair criticism. Pavel, do you have any concrete suggestions on what kind of interface would be nice for checkpoint/restart? Or is this something you will just tackle down the road some other way? From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751161AbaFXByc (ORCPT ); Mon, 23 Jun 2014 21:54:32 -0400 Received: from mga02.intel.com ([134.134.136.20]:62543 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750827AbaFXByb convert rfc822-to-8bit (ORCPT ); Mon, 23 Jun 2014 21:54:31 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.01,534,1400050800"; d="scan'208";a="562210976" From: "Ren, Qiaowei" To: Andy Lutomirski , "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , "Hansen, Dave" CC: "x86@kernel.org" , "linux-kernel@vger.kernel.org" Subject: RE: [PATCH v6 04/10] x86, mpx: hook #BR exception handler to allocate bound tables Thread-Topic: [PATCH v6 04/10] x86, mpx: hook #BR exception handler to allocate bound tables Thread-Index: AQHPitp1GK5+vOkvwEyRhjxb8GKmZpt+niYAgADni8A= Date: Tue, 24 Jun 2014 01:53:18 +0000 Message-ID: <9E0BE1322F2F2246BD820DA9FC397ADE016AF270@shsmsx102.ccr.corp.intel.com> References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> <1403084656-27284-5-git-send-email-qiaowei.ren@intel.com> <53A88606.2050108@mit.edu> In-Reply-To: <53A88606.2050108@mit.edu> Accept-Language: zh-CN, en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [10.239.127.40] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 8BIT MIME-Version: 1.0 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 2014-06-24, Andy Lutomirski wrote: > On 06/18/2014 02:44 AM, Qiaowei Ren wrote: >> + /* >> + * The error code field of the BNDSTATUS register communicates status >> + * information of a bound range exception #BR or operation involving >> + * bound directory. >> + */ >> + switch (status & MPX_BNDSTA_ERROR_CODE) { >> + case 2: >> + /* >> + * Bound directory has invalid entry. >> + * No signal will be sent to the user space. > > This comment is a lie. > Hmm, thanks. >> + */ >> + if (do_mpx_bt_fault(xsave_buf)) >> + force_sig(SIGBUS, tsk); > > Would it make sense to assign and use a new si_code value here? > There is a new si_code SEGV_BNDERR for bounds violation reported by MPX. But in this case, it is mainly due to the failure caused by allocation of bounds table. I guess it is not necessary to add another new si_code value. Thanks, Qiaowei From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751200AbaFXCxT (ORCPT ); Mon, 23 Jun 2014 22:53:19 -0400 Received: from mga03.intel.com ([143.182.124.21]:10338 "EHLO mga03.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750827AbaFXCxS convert rfc822-to-8bit (ORCPT ); Mon, 23 Jun 2014 22:53:18 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.01,535,1400050800"; d="scan'208";a="449111299" From: "Ren, Qiaowei" To: Andy Lutomirski , "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , "Hansen, Dave" CC: "x86@kernel.org" , "linux-kernel@vger.kernel.org" , Linux MM Subject: RE: [PATCH v6 02/10] x86, mpx: add MPX specific mmap interface Thread-Topic: [PATCH v6 02/10] x86, mpx: add MPX specific mmap interface Thread-Index: AQHPitp0g7xixZVNTkiLN9tFeBCqMZt+nJAAgAD2LmA= Date: Tue, 24 Jun 2014 02:53:12 +0000 Message-ID: <9E0BE1322F2F2246BD820DA9FC397ADE016AF2DB@shsmsx102.ccr.corp.intel.com> References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> <1403084656-27284-3-git-send-email-qiaowei.ren@intel.com> <53A884B2.5070702@mit.edu> In-Reply-To: <53A884B2.5070702@mit.edu> Accept-Language: zh-CN, en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [10.239.127.40] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 8BIT MIME-Version: 1.0 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 2014-06-24, Andy Lutomirski wrote: >> + /* Make bounds tables and bouds directory unlocked. */ >> + if (vm_flags & VM_LOCKED) >> + vm_flags &= ~VM_LOCKED; > > Why? I would expect MCL_FUTURE to lock these. > Andy, I was just a little confused about LOCKED & POPULATE earlier and I thought VM_LOCKED is not necessary for MPX specific bounds tables. Now, this checking should be removed, and there should be mm_populate() for VM_LOCKED case after mmap_region(): if (!IS_ERR_VALUE(addr) && (vm_flags & VM_LOCKED)) mm_populate(addr, len); Thanks, Qiaowei From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751845AbaFXF6T (ORCPT ); Tue, 24 Jun 2014 01:58:19 -0400 Received: from mga11.intel.com ([192.55.52.93]:52866 "EHLO mga11.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751155AbaFXF6R (ORCPT ); Tue, 24 Jun 2014 01:58:17 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.01,536,1400050800"; d="scan'208";a="552373529" From: "Ren, Qiaowei" To: Andy Lutomirski , "Hansen, Dave" CC: "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , X86 ML , "linux-kernel@vger.kernel.org" , Linux MM Subject: RE: [PATCH v6 02/10] x86, mpx: add MPX specific mmap interface Thread-Topic: [PATCH v6 02/10] x86, mpx: add MPX specific mmap interface Thread-Index: AQHPitp0g7xixZVNTkiLN9tFeBCqMZt+nJAAgAAD+ACAAADtgIAABhEAgAAKEgCAARYTkA== Date: Tue, 24 Jun 2014 05:53:46 +0000 Message-ID: <9E0BE1322F2F2246BD820DA9FC397ADE016AF41C@shsmsx102.ccr.corp.intel.com> References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> <1403084656-27284-3-git-send-email-qiaowei.ren@intel.com> <53A884B2.5070702@mit.edu> <53A88806.1060908@intel.com> <53A88DE4.8050107@intel.com> In-Reply-To: Accept-Language: zh-CN, en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [10.239.127.40] Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from base64 to 8bit by mail.home.local id s5O5wTim025211 On 2014-06-24, Andy Lutomirski wrote: >> On 06/23/2014 01:06 PM, Andy Lutomirski wrote: >>> Can the new vm_operation "name" be use for this? The magic "always >>> written to core dumps" feature might need to be reconsidered. >> >> One thing I'd like to avoid is an MPX vma getting merged with a >> non-MPX vma. I don't see any code to prevent two VMAs with >> different vm_ops->names from getting merged. That seems like a bit >> of a design oversight for ->name. Right? > > AFAIK there are no ->name users that don't also set ->close, for > exactly that reason. I'd be okay with adding a check for ->name, too. > > Hmm. If MPX vmas had a real struct file attached, this would all come > for free. Maybe vmas with non-default vm_ops and file != NULL should > never be mergeable? > >> >> Thinking out loud a bit... There are also some more complicated but >> more performant cleanup mechanisms that I'd like to go after in the future. >> Given a page, we might want to figure out if it is an MPX page or not. >> I wonder if we'll ever collide with some other user of vm_ops->name. >> It looks fairly narrowly used at the moment, but would this keep us >> from putting these pages on, say, a tmpfs mount? Doesn't look that >> way at the moment. > > You could always check the vm_ops pointer to see if it's MPX. > > One feature I've wanted: a way to have special per-process vmas that > can be easily found. For example, I want to be able to efficiently > find out where the vdso and vvar vmas are. I don't think this is currently supported. > Andy, if you add a check for ->name to avoid the MPX vmas merged with non-MPX vmas, I guess the work flow should be as follow (use _install_special_mapping to get a new vma): unsigned long mpx_mmap(unsigned long len) { ...... static struct vm_special_mapping mpx_mapping = { .name = "[mpx]", .pages = no_pages, }; ....... vma = _install_special_mapping(mm, addr, len, vm_flags, &mpx_mapping); ...... } Then, we could check the ->name to see if the VMA is MPX specific. Right? Thanks, Qiaowei {.n++%ݶw{.n+{G{ayʇڙ,jfhz_(階ݢj"mG?&~iOzv^m ?I From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752704AbaFXX4G (ORCPT ); Tue, 24 Jun 2014 19:56:06 -0400 Received: from mail-la0-f54.google.com ([209.85.215.54]:35057 "EHLO mail-la0-f54.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751211AbaFXX4E (ORCPT ); Tue, 24 Jun 2014 19:56:04 -0400 MIME-Version: 1.0 In-Reply-To: <9E0BE1322F2F2246BD820DA9FC397ADE016AF41C@shsmsx102.ccr.corp.intel.com> References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> <1403084656-27284-3-git-send-email-qiaowei.ren@intel.com> <53A884B2.5070702@mit.edu> <53A88806.1060908@intel.com> <53A88DE4.8050107@intel.com> <9E0BE1322F2F2246BD820DA9FC397ADE016AF41C@shsmsx102.ccr.corp.intel.com> From: Andy Lutomirski Date: Tue, 24 Jun 2014 16:55:41 -0700 Message-ID: Subject: Re: [PATCH v6 02/10] x86, mpx: add MPX specific mmap interface To: "Ren, Qiaowei" Cc: "Hansen, Dave" , "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , X86 ML , "linux-kernel@vger.kernel.org" , Linux MM Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Jun 23, 2014 at 10:53 PM, Ren, Qiaowei wrote: > On 2014-06-24, Andy Lutomirski wrote: >>> On 06/23/2014 01:06 PM, Andy Lutomirski wrote: >>>> Can the new vm_operation "name" be use for this? The magic "always >>>> written to core dumps" feature might need to be reconsidered. >>> >>> One thing I'd like to avoid is an MPX vma getting merged with a >>> non-MPX vma. I don't see any code to prevent two VMAs with >>> different vm_ops->names from getting merged. That seems like a bit >>> of a design oversight for ->name. Right? >> >> AFAIK there are no ->name users that don't also set ->close, for >> exactly that reason. I'd be okay with adding a check for ->name, too. >> >> Hmm. If MPX vmas had a real struct file attached, this would all come >> for free. Maybe vmas with non-default vm_ops and file != NULL should >> never be mergeable? >> >>> >>> Thinking out loud a bit... There are also some more complicated but >>> more performant cleanup mechanisms that I'd like to go after in the future. >>> Given a page, we might want to figure out if it is an MPX page or not. >>> I wonder if we'll ever collide with some other user of vm_ops->name. >>> It looks fairly narrowly used at the moment, but would this keep us >>> from putting these pages on, say, a tmpfs mount? Doesn't look that >>> way at the moment. >> >> You could always check the vm_ops pointer to see if it's MPX. >> >> One feature I've wanted: a way to have special per-process vmas that >> can be easily found. For example, I want to be able to efficiently >> find out where the vdso and vvar vmas are. I don't think this is currently supported. >> > Andy, if you add a check for ->name to avoid the MPX vmas merged with non-MPX vmas, I guess the work flow should be as follow (use _install_special_mapping to get a new vma): > > unsigned long mpx_mmap(unsigned long len) > { > ...... > static struct vm_special_mapping mpx_mapping = { > .name = "[mpx]", > .pages = no_pages, > }; > > ....... > vma = _install_special_mapping(mm, addr, len, vm_flags, &mpx_mapping); > ...... > } > > Then, we could check the ->name to see if the VMA is MPX specific. Right? Does this actually create a vma backed with real memory? Doesn't this need to go through anon_vma or something? _install_special_mapping completely prevents merging. Possibly silly question: would it make more sense to just create one giant vma for the MPX tables and only populate pieces of it as needed? This wouldn't work for 32-bit code, but maybe we don't care. (I see no reason why it couldn't work for x32, though.) (I don't really understand how anonymous memory works at all. I'm not an mm person.) --Andy From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754295AbaFYBlV (ORCPT ); Tue, 24 Jun 2014 21:41:21 -0400 Received: from mga02.intel.com ([134.134.136.20]:13776 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751890AbaFYBlU (ORCPT ); Tue, 24 Jun 2014 21:41:20 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.01,541,1400050800"; d="scan'208";a="533659820" From: "Ren, Qiaowei" To: Andy Lutomirski CC: "Hansen, Dave" , "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , X86 ML , "linux-kernel@vger.kernel.org" , Linux MM Subject: RE: [PATCH v6 02/10] x86, mpx: add MPX specific mmap interface Thread-Topic: [PATCH v6 02/10] x86, mpx: add MPX specific mmap interface Thread-Index: AQHPitp0g7xixZVNTkiLN9tFeBCqMZt+nJAAgAAD+ACAAADtgIAABhEAgAAKEgCAARYTkIAArB+AgACg/iA= Date: Wed, 25 Jun 2014 01:40:48 +0000 Message-ID: <9E0BE1322F2F2246BD820DA9FC397ADE016B26AB@shsmsx102.ccr.corp.intel.com> References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> <1403084656-27284-3-git-send-email-qiaowei.ren@intel.com> <53A884B2.5070702@mit.edu> <53A88806.1060908@intel.com> <53A88DE4.8050107@intel.com> <9E0BE1322F2F2246BD820DA9FC397ADE016AF41C@shsmsx102.ccr.corp.intel.com> In-Reply-To: Accept-Language: zh-CN, en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [10.239.127.40] Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from base64 to 8bit by mail.home.local id s5P1fQ3H002998 On 2014-06-25, Andy Lutomirski wrote: > On Mon, Jun 23, 2014 at 10:53 PM, Ren, Qiaowei > wrote: >> On 2014-06-24, Andy Lutomirski wrote: >>>> On 06/23/2014 01:06 PM, Andy Lutomirski wrote: >>>>> Can the new vm_operation "name" be use for this? The magic >>>>> "always written to core dumps" feature might need to be reconsidered. >>>> >>>> One thing I'd like to avoid is an MPX vma getting merged with a >>>> non-MPX vma. I don't see any code to prevent two VMAs with >>>> different vm_ops->names from getting merged. That seems like a >>>> bit of a design oversight for ->name. Right? >>> >>> AFAIK there are no ->name users that don't also set ->close, for >>> exactly that reason. I'd be okay with adding a check for ->name, too. >>> >>> Hmm. If MPX vmas had a real struct file attached, this would all >>> come for free. Maybe vmas with non-default vm_ops and file != NULL >>> should never be mergeable? >>> >>>> >>>> Thinking out loud a bit... There are also some more complicated >>>> but more performant cleanup mechanisms that I'd like to go after in the future. >>>> Given a page, we might want to figure out if it is an MPX page or not. >>>> I wonder if we'll ever collide with some other user of vm_ops->name. >>>> It looks fairly narrowly used at the moment, but would this keep >>>> us from putting these pages on, say, a tmpfs mount? Doesn't look >>>> that way at the moment. >>> >>> You could always check the vm_ops pointer to see if it's MPX. >>> >>> One feature I've wanted: a way to have special per-process vmas that >>> can be easily found. For example, I want to be able to efficiently >>> find out where the vdso and vvar vmas are. I don't think this is >>> currently supported. >>> >> Andy, if you add a check for ->name to avoid the MPX vmas merged >> with > non-MPX vmas, I guess the work flow should be as follow (use > _install_special_mapping to get a new vma): >> >> unsigned long mpx_mmap(unsigned long len) { >> ...... >> static struct vm_special_mapping mpx_mapping = { >> .name = "[mpx]", >> .pages = no_pages, >> }; >> >> ....... vma = _install_special_mapping(mm, addr, len, vm_flags, >> &mpx_mapping); ...... >> } >> >> Then, we could check the ->name to see if the VMA is MPX specific. Right? > > Does this actually create a vma backed with real memory? Doesn't this > need to go through anon_vma or something? _install_special_mapping > completely prevents merging. > Hmm, _install_special_mapping should completely prevent merging, even among MPX vmas. So, could you tell me how to set MPX specific ->name to the vma when it is created? Seems like that I could not find such interface. Thanks, Qiaowei {.n++%ݶw{.n+{G{ayʇڙ,jfhz_(階ݢj"mG?&~iOzv^m ?I From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755663AbaFYVFB (ORCPT ); Wed, 25 Jun 2014 17:05:01 -0400 Received: from mail-la0-f47.google.com ([209.85.215.47]:38546 "EHLO mail-la0-f47.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755392AbaFYVE7 (ORCPT ); Wed, 25 Jun 2014 17:04:59 -0400 MIME-Version: 1.0 In-Reply-To: <9E0BE1322F2F2246BD820DA9FC397ADE016B26AB@shsmsx102.ccr.corp.intel.com> References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> <1403084656-27284-3-git-send-email-qiaowei.ren@intel.com> <53A884B2.5070702@mit.edu> <53A88806.1060908@intel.com> <53A88DE4.8050107@intel.com> <9E0BE1322F2F2246BD820DA9FC397ADE016AF41C@shsmsx102.ccr.corp.intel.com> <9E0BE1322F2F2246BD820DA9FC397ADE016B26AB@shsmsx102.ccr.corp.intel.com> From: Andy Lutomirski Date: Wed, 25 Jun 2014 14:04:36 -0700 Message-ID: Subject: Re: [PATCH v6 02/10] x86, mpx: add MPX specific mmap interface To: "Ren, Qiaowei" Cc: "Hansen, Dave" , "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , X86 ML , "linux-kernel@vger.kernel.org" , Linux MM Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Jun 24, 2014 at 6:40 PM, Ren, Qiaowei wrote: > On 2014-06-25, Andy Lutomirski wrote: >> On Mon, Jun 23, 2014 at 10:53 PM, Ren, Qiaowei >> wrote: >>> On 2014-06-24, Andy Lutomirski wrote: >>>>> On 06/23/2014 01:06 PM, Andy Lutomirski wrote: >>>>>> Can the new vm_operation "name" be use for this? The magic >>>>>> "always written to core dumps" feature might need to be reconsidered. >>>>> >>>>> One thing I'd like to avoid is an MPX vma getting merged with a >>>>> non-MPX vma. I don't see any code to prevent two VMAs with >>>>> different vm_ops->names from getting merged. That seems like a >>>>> bit of a design oversight for ->name. Right? >>>> >>>> AFAIK there are no ->name users that don't also set ->close, for >>>> exactly that reason. I'd be okay with adding a check for ->name, too. >>>> >>>> Hmm. If MPX vmas had a real struct file attached, this would all >>>> come for free. Maybe vmas with non-default vm_ops and file != NULL >>>> should never be mergeable? >>>> >>>>> >>>>> Thinking out loud a bit... There are also some more complicated >>>>> but more performant cleanup mechanisms that I'd like to go after in the future. >>>>> Given a page, we might want to figure out if it is an MPX page or not. >>>>> I wonder if we'll ever collide with some other user of vm_ops->name. >>>>> It looks fairly narrowly used at the moment, but would this keep >>>>> us from putting these pages on, say, a tmpfs mount? Doesn't look >>>>> that way at the moment. >>>> >>>> You could always check the vm_ops pointer to see if it's MPX. >>>> >>>> One feature I've wanted: a way to have special per-process vmas that >>>> can be easily found. For example, I want to be able to efficiently >>>> find out where the vdso and vvar vmas are. I don't think this is >>>> currently supported. >>>> >>> Andy, if you add a check for ->name to avoid the MPX vmas merged >>> with >> non-MPX vmas, I guess the work flow should be as follow (use >> _install_special_mapping to get a new vma): >>> >>> unsigned long mpx_mmap(unsigned long len) { >>> ...... >>> static struct vm_special_mapping mpx_mapping = { >>> .name = "[mpx]", >>> .pages = no_pages, >>> }; >>> >>> ....... vma = _install_special_mapping(mm, addr, len, vm_flags, >>> &mpx_mapping); ...... >>> } >>> >>> Then, we could check the ->name to see if the VMA is MPX specific. Right? >> >> Does this actually create a vma backed with real memory? Doesn't this >> need to go through anon_vma or something? _install_special_mapping >> completely prevents merging. >> > Hmm, _install_special_mapping should completely prevent merging, even among MPX vmas. > > So, could you tell me how to set MPX specific ->name to the vma when it is created? Seems like that I could not find such interface. You may need to add one. I'd suggest posting a new thread to linux-mm describing what you need and asking how to do it. --Andy From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755862AbaFYVFu (ORCPT ); Wed, 25 Jun 2014 17:05:50 -0400 Received: from mail-la0-f53.google.com ([209.85.215.53]:63257 "EHLO mail-la0-f53.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754214AbaFYVFt (ORCPT ); Wed, 25 Jun 2014 17:05:49 -0400 MIME-Version: 1.0 In-Reply-To: References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> <1403084656-27284-3-git-send-email-qiaowei.ren@intel.com> <53A884B2.5070702@mit.edu> <53A88806.1060908@intel.com> <53A88DE4.8050107@intel.com> <9E0BE1322F2F2246BD820DA9FC397ADE016AF41C@shsmsx102.ccr.corp.intel.com> <9E0BE1322F2F2246BD820DA9FC397ADE016B26AB@shsmsx102.ccr.corp.intel.com> From: Andy Lutomirski Date: Wed, 25 Jun 2014 14:05:27 -0700 Message-ID: Subject: Re: [PATCH v6 02/10] x86, mpx: add MPX specific mmap interface To: "Ren, Qiaowei" Cc: "Hansen, Dave" , "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , X86 ML , "linux-kernel@vger.kernel.org" , Linux MM Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Jun 25, 2014 at 2:04 PM, Andy Lutomirski wrote: > On Tue, Jun 24, 2014 at 6:40 PM, Ren, Qiaowei wrote: >> On 2014-06-25, Andy Lutomirski wrote: >>> On Mon, Jun 23, 2014 at 10:53 PM, Ren, Qiaowei >>> wrote: >>>> On 2014-06-24, Andy Lutomirski wrote: >>>>>> On 06/23/2014 01:06 PM, Andy Lutomirski wrote: >>>>>>> Can the new vm_operation "name" be use for this? The magic >>>>>>> "always written to core dumps" feature might need to be reconsidered. >>>>>> >>>>>> One thing I'd like to avoid is an MPX vma getting merged with a >>>>>> non-MPX vma. I don't see any code to prevent two VMAs with >>>>>> different vm_ops->names from getting merged. That seems like a >>>>>> bit of a design oversight for ->name. Right? >>>>> >>>>> AFAIK there are no ->name users that don't also set ->close, for >>>>> exactly that reason. I'd be okay with adding a check for ->name, too. >>>>> >>>>> Hmm. If MPX vmas had a real struct file attached, this would all >>>>> come for free. Maybe vmas with non-default vm_ops and file != NULL >>>>> should never be mergeable? >>>>> >>>>>> >>>>>> Thinking out loud a bit... There are also some more complicated >>>>>> but more performant cleanup mechanisms that I'd like to go after in the future. >>>>>> Given a page, we might want to figure out if it is an MPX page or not. >>>>>> I wonder if we'll ever collide with some other user of vm_ops->name. >>>>>> It looks fairly narrowly used at the moment, but would this keep >>>>>> us from putting these pages on, say, a tmpfs mount? Doesn't look >>>>>> that way at the moment. >>>>> >>>>> You could always check the vm_ops pointer to see if it's MPX. >>>>> >>>>> One feature I've wanted: a way to have special per-process vmas that >>>>> can be easily found. For example, I want to be able to efficiently >>>>> find out where the vdso and vvar vmas are. I don't think this is >>>>> currently supported. >>>>> >>>> Andy, if you add a check for ->name to avoid the MPX vmas merged >>>> with >>> non-MPX vmas, I guess the work flow should be as follow (use >>> _install_special_mapping to get a new vma): >>>> >>>> unsigned long mpx_mmap(unsigned long len) { >>>> ...... >>>> static struct vm_special_mapping mpx_mapping = { >>>> .name = "[mpx]", >>>> .pages = no_pages, >>>> }; >>>> >>>> ....... vma = _install_special_mapping(mm, addr, len, vm_flags, >>>> &mpx_mapping); ...... >>>> } >>>> >>>> Then, we could check the ->name to see if the VMA is MPX specific. Right? >>> >>> Does this actually create a vma backed with real memory? Doesn't this >>> need to go through anon_vma or something? _install_special_mapping >>> completely prevents merging. >>> >> Hmm, _install_special_mapping should completely prevent merging, even among MPX vmas. >> >> So, could you tell me how to set MPX specific ->name to the vma when it is created? Seems like that I could not find such interface. > > You may need to add one. > > I'd suggest posting a new thread to linux-mm describing what you need > and asking how to do it. Hmm. the memfd_create thing may be able to do this for you. If you created a per-mm memfd and mapped it, it all just might work. --Andy From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756144AbaFYVnP (ORCPT ); Wed, 25 Jun 2014 17:43:15 -0400 Received: from mga02.intel.com ([134.134.136.20]:34504 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754995AbaFYVnO (ORCPT ); Wed, 25 Jun 2014 17:43:14 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.01,548,1400050800"; d="scan'208";a="563476439" Message-ID: <53AB4271.6040301@intel.com> Date: Wed, 25 Jun 2014 14:43:13 -0700 From: Dave Hansen User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.5.0 MIME-Version: 1.0 To: Andy Lutomirski , "Ren, Qiaowei" CC: "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , X86 ML , "linux-kernel@vger.kernel.org" , Linux MM Subject: Re: [PATCH v6 02/10] x86, mpx: add MPX specific mmap interface References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> <1403084656-27284-3-git-send-email-qiaowei.ren@intel.com> <53A884B2.5070702@mit.edu> <53A88806.1060908@intel.com> <53A88DE4.8050107@intel.com> <9E0BE1322F2F2246BD820DA9FC397ADE016AF41C@shsmsx102.ccr.corp.intel.com> <9E0BE1322F2F2246BD820DA9FC397ADE016B26AB@shsmsx102.ccr.corp.intel.com> In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 06/25/2014 02:04 PM, Andy Lutomirski wrote: > On Tue, Jun 24, 2014 at 6:40 PM, Ren, Qiaowei wrote: >> Hmm, _install_special_mapping should completely prevent merging, even among MPX vmas. >> >> So, could you tell me how to set MPX specific ->name to the vma when it is created? Seems like that I could not find such interface. > > You may need to add one. > > I'd suggest posting a new thread to linux-mm describing what you need > and asking how to do it. I shared this with Qiaowei privately, but might as well repeat myself here in case anyone wants to set me straight. Most of the interfaces do to set vm_ops do it in file_operations ->mmap op. Nobody sets ->vm_ops on anonymous VMAs, so we're in uncharted territory. My suggestion: you can either plumb a new API down in to mmap_region() to get the VMA or set ->vm_ops, or just call find_vma() after mmap_region() or get_unmapped_area() and set it manually. Just make sure you still have mmap_sem held over the whole thing. I think I prefer just setting ->vm_ops directly, even though it's a wee bit of a hack to create something just to look it up a moment later. Oh, well. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752048AbaFZWMc (ORCPT ); Thu, 26 Jun 2014 18:12:32 -0400 Received: from mga02.intel.com ([134.134.136.20]:23705 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750974AbaFZWMb (ORCPT ); Thu, 26 Jun 2014 18:12:31 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.01,556,1400050800"; d="scan'208";a="564132102" Message-ID: <53AB42E1.4090102@intel.com> Date: Wed, 25 Jun 2014 14:45:05 -0700 From: Dave Hansen User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.5.0 MIME-Version: 1.0 To: Andy Lutomirski , "Ren, Qiaowei" CC: "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , X86 ML , "linux-kernel@vger.kernel.org" , Linux MM Subject: Re: [PATCH v6 02/10] x86, mpx: add MPX specific mmap interface References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> <1403084656-27284-3-git-send-email-qiaowei.ren@intel.com> <53A884B2.5070702@mit.edu> <53A88806.1060908@intel.com> <53A88DE4.8050107@intel.com> <9E0BE1322F2F2246BD820DA9FC397ADE016AF41C@shsmsx102.ccr.corp.intel.com> <9E0BE1322F2F2246BD820DA9FC397ADE016B26AB@shsmsx102.ccr.corp.intel.com> In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 06/25/2014 02:05 PM, Andy Lutomirski wrote: > Hmm. the memfd_create thing may be able to do this for you. If you > created a per-mm memfd and mapped it, it all just might work. memfd_create() seems to bring a fair amount of baggage along (the fd part :) if all we want is a marker. Really, all we need is _a_ bit, and some way to plumb to userspace the RSS values of VMAs with that bit set. Creating and mmap()'ing a fd seems a rather roundabout way to get there. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751749AbaFZWTt (ORCPT ); Thu, 26 Jun 2014 18:19:49 -0400 Received: from mail-la0-f49.google.com ([209.85.215.49]:46405 "EHLO mail-la0-f49.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750944AbaFZWTs (ORCPT ); Thu, 26 Jun 2014 18:19:48 -0400 MIME-Version: 1.0 In-Reply-To: <53AB42E1.4090102@intel.com> References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> <1403084656-27284-3-git-send-email-qiaowei.ren@intel.com> <53A884B2.5070702@mit.edu> <53A88806.1060908@intel.com> <53A88DE4.8050107@intel.com> <9E0BE1322F2F2246BD820DA9FC397ADE016AF41C@shsmsx102.ccr.corp.intel.com> <9E0BE1322F2F2246BD820DA9FC397ADE016B26AB@shsmsx102.ccr.corp.intel.com> <53AB42E1.4090102@intel.com> From: Andy Lutomirski Date: Thu, 26 Jun 2014 15:19:26 -0700 Message-ID: Subject: Re: [PATCH v6 02/10] x86, mpx: add MPX specific mmap interface To: Dave Hansen Cc: "Ren, Qiaowei" , "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , X86 ML , "linux-kernel@vger.kernel.org" , Linux MM Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Jun 25, 2014 at 2:45 PM, Dave Hansen wrote: > On 06/25/2014 02:05 PM, Andy Lutomirski wrote: >> Hmm. the memfd_create thing may be able to do this for you. If you >> created a per-mm memfd and mapped it, it all just might work. > > memfd_create() seems to bring a fair amount of baggage along (the fd > part :) if all we want is a marker. Really, all we need is _a_ bit, and > some way to plumb to userspace the RSS values of VMAs with that bit set. > > Creating and mmap()'ing a fd seems a rather roundabout way to get there. Hmm. So does VM_MPX, though. If this stuff were done entirely in userspace, then memfd_create would be exactly the right solution, I think. Would it work to just scan the bound directory to figure out how many bound tables exist? --Andy From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752224AbaFZW7D (ORCPT ); Thu, 26 Jun 2014 18:59:03 -0400 Received: from mga02.intel.com ([134.134.136.20]:26145 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751899AbaFZW7B (ORCPT ); Thu, 26 Jun 2014 18:59:01 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.01,556,1400050800"; d="scan'208";a="564152257" Message-ID: <53ACA5B3.3010702@intel.com> Date: Thu, 26 Jun 2014 15:58:59 -0700 From: Dave Hansen User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.5.0 MIME-Version: 1.0 To: Andy Lutomirski CC: "Ren, Qiaowei" , "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , X86 ML , "linux-kernel@vger.kernel.org" , Linux MM Subject: Re: [PATCH v6 02/10] x86, mpx: add MPX specific mmap interface References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> <1403084656-27284-3-git-send-email-qiaowei.ren@intel.com> <53A884B2.5070702@mit.edu> <53A88806.1060908@intel.com> <53A88DE4.8050107@intel.com> <9E0BE1322F2F2246BD820DA9FC397ADE016AF41C@shsmsx102.ccr.corp.intel.com> <9E0BE1322F2F2246BD820DA9FC397ADE016B26AB@shsmsx102.ccr.corp.intel.com> <53AB42E1.4090102@intel.com> In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 06/26/2014 03:19 PM, Andy Lutomirski wrote: > On Wed, Jun 25, 2014 at 2:45 PM, Dave Hansen wrote: >> On 06/25/2014 02:05 PM, Andy Lutomirski wrote: >>> Hmm. the memfd_create thing may be able to do this for you. If you >>> created a per-mm memfd and mapped it, it all just might work. >> >> memfd_create() seems to bring a fair amount of baggage along (the fd >> part :) if all we want is a marker. Really, all we need is _a_ bit, and >> some way to plumb to userspace the RSS values of VMAs with that bit set. >> >> Creating and mmap()'ing a fd seems a rather roundabout way to get there. > > Hmm. So does VM_MPX, though. If this stuff were done entirely in > userspace, then memfd_create would be exactly the right solution, I > think. > > Would it work to just scan the bound directory to figure out how many > bound tables exist? Theoretically, perhaps. Practically, the bounds directory is 2GB, and it is likely to be very sparse. You would have to walk the page tables finding where pages were mapped, then search the mapped pages for bounds table entries. Assuming that it was aligned and minimally populated, that's a *MINIMUM* search looking for a PGD entry, then you have to look at 512 PUD entries. A full search would have to look at half a million ptes. That's just finding out how sparse the first level of the tables are before you've looked at a byte of actual data, and if they were empty. We could keep another, parallel, data structure that handles this better other than the hardware tables. Like, say, an rbtree that stores ranges of virtual addresses. We could call them vm_area_somethings ... wait a sec... we have a structure like that. ;) From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752070AbaFZXPf (ORCPT ); Thu, 26 Jun 2014 19:15:35 -0400 Received: from mail-lb0-f169.google.com ([209.85.217.169]:38225 "EHLO mail-lb0-f169.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751329AbaFZXPe (ORCPT ); Thu, 26 Jun 2014 19:15:34 -0400 MIME-Version: 1.0 In-Reply-To: <53ACA5B3.3010702@intel.com> References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> <1403084656-27284-3-git-send-email-qiaowei.ren@intel.com> <53A884B2.5070702@mit.edu> <53A88806.1060908@intel.com> <53A88DE4.8050107@intel.com> <9E0BE1322F2F2246BD820DA9FC397ADE016AF41C@shsmsx102.ccr.corp.intel.com> <9E0BE1322F2F2246BD820DA9FC397ADE016B26AB@shsmsx102.ccr.corp.intel.com> <53AB42E1.4090102@intel.com> <53ACA5B3.3010702@intel.com> From: Andy Lutomirski Date: Thu, 26 Jun 2014 16:15:12 -0700 Message-ID: Subject: Re: [PATCH v6 02/10] x86, mpx: add MPX specific mmap interface To: Dave Hansen Cc: "Ren, Qiaowei" , "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , X86 ML , "linux-kernel@vger.kernel.org" , Linux MM Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Jun 26, 2014 at 3:58 PM, Dave Hansen wrote: > On 06/26/2014 03:19 PM, Andy Lutomirski wrote: >> On Wed, Jun 25, 2014 at 2:45 PM, Dave Hansen wrote: >>> On 06/25/2014 02:05 PM, Andy Lutomirski wrote: >>>> Hmm. the memfd_create thing may be able to do this for you. If you >>>> created a per-mm memfd and mapped it, it all just might work. >>> >>> memfd_create() seems to bring a fair amount of baggage along (the fd >>> part :) if all we want is a marker. Really, all we need is _a_ bit, and >>> some way to plumb to userspace the RSS values of VMAs with that bit set. >>> >>> Creating and mmap()'ing a fd seems a rather roundabout way to get there. >> >> Hmm. So does VM_MPX, though. If this stuff were done entirely in >> userspace, then memfd_create would be exactly the right solution, I >> think. >> >> Would it work to just scan the bound directory to figure out how many >> bound tables exist? > > Theoretically, perhaps. > > Practically, the bounds directory is 2GB, and it is likely to be very > sparse. You would have to walk the page tables finding where pages were > mapped, then search the mapped pages for bounds table entries. > > Assuming that it was aligned and minimally populated, that's a *MINIMUM* > search looking for a PGD entry, then you have to look at 512 PUD > entries. A full search would have to look at half a million ptes. > That's just finding out how sparse the first level of the tables are > before you've looked at a byte of actual data, and if they were empty. > > We could keep another, parallel, data structure that handles this better > other than the hardware tables. Like, say, an rbtree that stores ranges > of virtual addresses. We could call them vm_area_somethings ... wait a > sec... we have a structure like that. ;) > > So here's my mental image of how I might do this if I were doing it entirely in userspace: I'd create a file or memfd for the bound tables and another for the bound directory. These files would be *huge*: the bound directory file would be 2GB and the bounds table file would be 2^48 bytes or whatever it is. (Maybe even bigger?) Then I'd just map pieces of those files wherever they'd need to be, and I'd make the mappings sparse. I suspect that you don't actually want a vma for each piece of bound table that gets mapped -- the space of vmas could end up incredibly sparse. So I'd at least map (in the vma sense, not the pte sense) and entire bound table at a time. And I'd probably just map the bound directory in one big piece. Then I'd populate it in the fault handler. This is almost what the code is doing, I think, modulo the files. This has one killer problem: these mappings need to be private (cowed on fork). So memfd is no good. There's got to be an easyish way to modify the mm code to allow anonymous maps with vm_ops. Maybe a new mmap_region parameter or something? Maybe even a special anon_vma, but I don't really understand how those work. Also, egads: what happens when a bound table entry is associated with a MAP_SHARED page? --Andy From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751559AbaF0AT5 (ORCPT ); Thu, 26 Jun 2014 20:19:57 -0400 Received: from mga09.intel.com ([134.134.136.24]:16232 "EHLO mga09.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750699AbaF0AT4 (ORCPT ); Thu, 26 Jun 2014 20:19:56 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.01,556,1400050800"; d="scan'208";a="564197375" Message-ID: <53ACB8A7.9050002@intel.com> Date: Thu, 26 Jun 2014 17:19:51 -0700 From: Dave Hansen User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.5.0 MIME-Version: 1.0 To: Andy Lutomirski CC: "Ren, Qiaowei" , "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , X86 ML , "linux-kernel@vger.kernel.org" , Linux MM Subject: Re: [PATCH v6 02/10] x86, mpx: add MPX specific mmap interface References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> <1403084656-27284-3-git-send-email-qiaowei.ren@intel.com> <53A884B2.5070702@mit.edu> <53A88806.1060908@intel.com> <53A88DE4.8050107@intel.com> <9E0BE1322F2F2246BD820DA9FC397ADE016AF41C@shsmsx102.ccr.corp.intel.com> <9E0BE1322F2F2246BD820DA9FC397ADE016B26AB@shsmsx102.ccr.corp.intel.com> <53AB42E1.4090102@intel.com> <53ACA5B3.3010702@intel.com> In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 06/26/2014 04:15 PM, Andy Lutomirski wrote: > So here's my mental image of how I might do this if I were doing it > entirely in userspace: I'd create a file or memfd for the bound tables > and another for the bound directory. These files would be *huge*: the > bound directory file would be 2GB and the bounds table file would be > 2^48 bytes or whatever it is. (Maybe even bigger?) > > Then I'd just map pieces of those files wherever they'd need to be, > and I'd make the mappings sparse. I suspect that you don't actually > want a vma for each piece of bound table that gets mapped -- the space > of vmas could end up incredibly sparse. So I'd at least map (in the > vma sense, not the pte sense) and entire bound table at a time. And > I'd probably just map the bound directory in one big piece. > > Then I'd populate it in the fault handler. > > This is almost what the code is doing, I think, modulo the files. > > This has one killer problem: these mappings need to be private (cowed > on fork). So memfd is no good. This essentially uses the page cache's radix tree as a parallel data structure in order to keep a vaddr->mpx_vma map. That's not a bad idea, but it is a parallel data structure that does not handle copy-on-write very well. I'm pretty sure we need the semantics that anonymous memory provides. > There's got to be an easyish way to > modify the mm code to allow anonymous maps with vm_ops. Maybe a new > mmap_region parameter or something? Maybe even a special anon_vma, > but I don't really understand how those work. Yeah, we very well might end up having to go down that path. > Also, egads: what happens when a bound table entry is associated with > a MAP_SHARED page? Bounds table entries are for pointers. Do we keep pointers inside of MAP_SHARED-mapped things? :) From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751950AbaF0A1F (ORCPT ); Thu, 26 Jun 2014 20:27:05 -0400 Received: from mail-la0-f44.google.com ([209.85.215.44]:50205 "EHLO mail-la0-f44.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751127AbaF0A1D (ORCPT ); Thu, 26 Jun 2014 20:27:03 -0400 MIME-Version: 1.0 In-Reply-To: <53ACB8A7.9050002@intel.com> References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> <1403084656-27284-3-git-send-email-qiaowei.ren@intel.com> <53A884B2.5070702@mit.edu> <53A88806.1060908@intel.com> <53A88DE4.8050107@intel.com> <9E0BE1322F2F2246BD820DA9FC397ADE016AF41C@shsmsx102.ccr.corp.intel.com> <9E0BE1322F2F2246BD820DA9FC397ADE016B26AB@shsmsx102.ccr.corp.intel.com> <53AB42E1.4090102@intel.com> <53ACA5B3.3010702@intel.com> <53ACB8A7.9050002@intel.com> From: Andy Lutomirski Date: Thu, 26 Jun 2014 17:26:40 -0700 Message-ID: Subject: Re: [PATCH v6 02/10] x86, mpx: add MPX specific mmap interface To: Dave Hansen Cc: "Ren, Qiaowei" , "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , X86 ML , "linux-kernel@vger.kernel.org" , Linux MM Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Jun 26, 2014 at 5:19 PM, Dave Hansen wrote: > On 06/26/2014 04:15 PM, Andy Lutomirski wrote: >> So here's my mental image of how I might do this if I were doing it >> entirely in userspace: I'd create a file or memfd for the bound tables >> and another for the bound directory. These files would be *huge*: the >> bound directory file would be 2GB and the bounds table file would be >> 2^48 bytes or whatever it is. (Maybe even bigger?) >> >> Then I'd just map pieces of those files wherever they'd need to be, >> and I'd make the mappings sparse. I suspect that you don't actually >> want a vma for each piece of bound table that gets mapped -- the space >> of vmas could end up incredibly sparse. So I'd at least map (in the >> vma sense, not the pte sense) and entire bound table at a time. And >> I'd probably just map the bound directory in one big piece. >> >> Then I'd populate it in the fault handler. >> >> This is almost what the code is doing, I think, modulo the files. >> >> This has one killer problem: these mappings need to be private (cowed >> on fork). So memfd is no good. > > This essentially uses the page cache's radix tree as a parallel data > structure in order to keep a vaddr->mpx_vma map. That's not a bad idea, > but it is a parallel data structure that does not handle copy-on-write > very well. > > I'm pretty sure we need the semantics that anonymous memory provides. > >> There's got to be an easyish way to >> modify the mm code to allow anonymous maps with vm_ops. Maybe a new >> mmap_region parameter or something? Maybe even a special anon_vma, >> but I don't really understand how those work. > > Yeah, we very well might end up having to go down that path. > >> Also, egads: what happens when a bound table entry is associated with >> a MAP_SHARED page? > > Bounds table entries are for pointers. Do we keep pointers inside of > MAP_SHARED-mapped things? :) Sure, if it's MAP_SHARED | MAP_ANONYMOUS. For example: struct thing { struct thing *next; }; struct thing *storage = mmap(..., MAP_SHARED | MAP_ANONYMOUS, ...); storage[0].next = &storage[1]; fork(); I'm not suggesting that this needs to *work* in the first incarnation of this :) --Andy From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751914AbaF0Rev (ORCPT ); Fri, 27 Jun 2014 13:34:51 -0400 Received: from mga09.intel.com ([134.134.136.24]:43369 "EHLO mga09.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751675AbaF0Reu (ORCPT ); Fri, 27 Jun 2014 13:34:50 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.01,561,1400050800"; d="scan'208";a="564598054" Message-ID: <53ADAB39.6030403@intel.com> Date: Fri, 27 Jun 2014 10:34:49 -0700 From: Dave Hansen User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.5.0 MIME-Version: 1.0 To: Andy Lutomirski CC: "Ren, Qiaowei" , "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , X86 ML , "linux-kernel@vger.kernel.org" , Linux MM Subject: Re: [PATCH v6 02/10] x86, mpx: add MPX specific mmap interface References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> <1403084656-27284-3-git-send-email-qiaowei.ren@intel.com> <53A884B2.5070702@mit.edu> <53A88806.1060908@intel.com> <53A88DE4.8050107@intel.com> <9E0BE1322F2F2246BD820DA9FC397ADE016AF41C@shsmsx102.ccr.corp.intel.com> <9E0BE1322F2F2246BD820DA9FC397ADE016B26AB@shsmsx102.ccr.corp.intel.com> <53AB42E1.4090102@intel.com> <53ACA5B3.3010702@intel.com> <53ACB8A7.9050002@intel.com> In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 06/26/2014 05:26 PM, Andy Lutomirski wrote: > On Thu, Jun 26, 2014 at 5:19 PM, Dave Hansen wrote: >> On 06/26/2014 04:15 PM, Andy Lutomirski wrote: >>> Also, egads: what happens when a bound table entry is associated with >>> a MAP_SHARED page? >> >> Bounds table entries are for pointers. Do we keep pointers inside of >> MAP_SHARED-mapped things? :) > > Sure, if it's MAP_SHARED | MAP_ANONYMOUS. For example: > > struct thing { > struct thing *next; > }; > > struct thing *storage = mmap(..., MAP_SHARED | MAP_ANONYMOUS, ...); > storage[0].next = &storage[1]; > fork(); > > I'm not suggesting that this needs to *work* in the first incarnation of this :) I'm not sure I'm seeing the issue. I'm claiming that we need COW behavior for the bounds tables, at least by default. If userspace knows enough about the ways that it is using the tables and knows how to share them, let it go to town. The kernel will permit this kind of usage model, but we simply won't be helping with the management of the tables when userspace creates them. You've demonstrated a case where userspace might theoretically might want to share bounds tables (although I think it's pretty dangerous). It's equally theoretically possible that userspace might *not* want to share the tables for instance if one process narrowed the bounds and the other did not. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752176AbaF0RnJ (ORCPT ); Fri, 27 Jun 2014 13:43:09 -0400 Received: from mga09.intel.com ([134.134.136.24]:3119 "EHLO mga09.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751953AbaF0RnI (ORCPT ); Fri, 27 Jun 2014 13:43:08 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.01,561,1400050800"; d="scan'208";a="564601460" Message-ID: <53ADACF7.7080203@intel.com> Date: Fri, 27 Jun 2014 10:42:15 -0700 From: Dave Hansen User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.5.0 MIME-Version: 1.0 To: Andy Lutomirski CC: "Ren, Qiaowei" , "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , X86 ML , "linux-kernel@vger.kernel.org" , Linux MM Subject: Re: [PATCH v6 02/10] x86, mpx: add MPX specific mmap interface References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> <53A884B2.5070702@mit.edu> <53A88806.1060908@intel.com> <53A88DE4.8050107@intel.com> <9E0BE1322F2F2246BD820DA9FC397ADE016AF41C@shsmsx102.ccr.corp.intel.com> <9E0BE1322F2F2246BD820DA9FC397ADE016B26AB@shsmsx102.ccr.corp.intel.com> <53AB42E1.4090102@intel.com> <53ACA5B3.3010702@intel.com> <53ACB8A7.9050002@intel.com> <53ADAB39.6030403@intel.com> In-Reply-To: <53ADAB39.6030403@intel.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 06/27/2014 10:34 AM, Dave Hansen wrote: > I'm claiming that we need COW behavior for the bounds tables, at least > by default. If userspace knows enough about the ways that it is using > the tables and knows how to share them, let it go to town. The kernel > will permit this kind of usage model, but we simply won't be helping > with the management of the tables when userspace creates them. Actually, this is another reason we need to mark VMAs as being MPX-related explicitly instead of inferring it from the tables. If userspace does something really specialized like this, the kernel does not want to confuse these VMAs the ones it created. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752778AbaF0S5i (ORCPT ); Fri, 27 Jun 2014 14:57:38 -0400 Received: from mail-lb0-f177.google.com ([209.85.217.177]:63622 "EHLO mail-lb0-f177.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752527AbaF0S5g (ORCPT ); Fri, 27 Jun 2014 14:57:36 -0400 MIME-Version: 1.0 In-Reply-To: <53ADACF7.7080203@intel.com> References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> <53A884B2.5070702@mit.edu> <53A88806.1060908@intel.com> <53A88DE4.8050107@intel.com> <9E0BE1322F2F2246BD820DA9FC397ADE016AF41C@shsmsx102.ccr.corp.intel.com> <9E0BE1322F2F2246BD820DA9FC397ADE016B26AB@shsmsx102.ccr.corp.intel.com> <53AB42E1.4090102@intel.com> <53ACA5B3.3010702@intel.com> <53ACB8A7.9050002@intel.com> <53ADAB39.6030403@intel.com> <53ADACF7.7080203@intel.com> From: Andy Lutomirski Date: Fri, 27 Jun 2014 11:57:15 -0700 Message-ID: Subject: Re: [PATCH v6 02/10] x86, mpx: add MPX specific mmap interface To: Dave Hansen Cc: "Ren, Qiaowei" , "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , X86 ML , "linux-kernel@vger.kernel.org" , Linux MM Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Jun 27, 2014 at 10:42 AM, Dave Hansen wrote: > On 06/27/2014 10:34 AM, Dave Hansen wrote: >> I'm claiming that we need COW behavior for the bounds tables, at least >> by default. If userspace knows enough about the ways that it is using >> the tables and knows how to share them, let it go to town. The kernel >> will permit this kind of usage model, but we simply won't be helping >> with the management of the tables when userspace creates them. > > Actually, this is another reason we need to mark VMAs as being > MPX-related explicitly instead of inferring it from the tables. If > userspace does something really specialized like this, the kernel does > not want to confuse these VMAs the ones it created. > Good point. --Andy From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754728AbaGKQXL (ORCPT ); Fri, 11 Jul 2014 12:23:11 -0400 Received: from mga09.intel.com ([134.134.136.24]:31649 "EHLO mga09.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750886AbaGKQXK (ORCPT ); Fri, 11 Jul 2014 12:23:10 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.01,644,1400050800"; d="scan'208";a="542098617" Message-ID: <53C00F69.7070705@intel.com> Date: Fri, 11 Jul 2014 09:23:05 -0700 From: Dave Hansen User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.6.0 MIME-Version: 1.0 To: Qiaowei Ren , "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar CC: x86@kernel.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH v6 04/10] x86, mpx: hook #BR exception handler to allocate bound tables References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> <1403084656-27284-5-git-send-email-qiaowei.ren@intel.com> In-Reply-To: <1403084656-27284-5-git-send-email-qiaowei.ren@intel.com> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 06/18/2014 02:44 AM, Qiaowei Ren wrote: > + bt_addr = mpx_mmap(MPX_BT_SIZE_BYTES); > + if (IS_ERR((void *)bt_addr)) { > + pr_err("Bounds table allocation failed at entry addr %p\n", > + bd_entry); > + return bt_addr; > + } Don't forget to remove the unnecessary pr_*() calls for the next version.