linux-api.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 00/25] x86: Memory Protection Keys
@ 2015-09-28 19:18 Dave Hansen
  2015-09-28 19:18 ` [PATCH 20/25] mm, multi-arch: pass a protection key in to calc_vm_flag_bits() Dave Hansen
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Dave Hansen @ 2015-09-28 19:18 UTC (permalink / raw)
  To: dave; +Cc: borntraeger, x86, linux-kernel, linux-mm, linux-api, linux-arch

I have addressed all known issues and review comments.  I believe
they are ready to be pulled in to the x86 tree.  Note that this
is also the first time anyone has seen the new 'selftests' code.
If there are issues limited to it, I'd prefer to fix those up
separately post-merge.

Changes from RFCv2 (Thanks Ingo and Thomas for most of these):

 * few minor compile warnings
 * changed 'nopku' interaction with cpuid bits.  Now, we do not
   clear the PKU cpuid bit, we just skip enabling it.
 * changed __pkru_allows_write() to also check access disable bit
 * removed the unused write_pkru()
 * made si_pkey a u64 and added some patch description details.
   Also made it share space in siginfo with MPX and clarified
   comments.
 * give some real text for the Processor Trace xsave state
 * made vma_pkey() less ugly (and much more optimized actually)
 * added SEGV_PKUERR to copy_siginfo_to_user()
 * remove page table walk when filling in si_pkey, added some
   big fat comments about it being inherently racy.
 * added self test code

MM reviewers, if you are going to look at one thing, please look
at patch 14 which adds a bunch of additional vma/pte permission
checks.

This code contains a new system call: mprotect_key(),  This needs
the usual amount of rigor around new interfaces.  Review there
would be much appreciated.

This code is not runnable to anyone outside of Intel unless they
have some special hardware or a fancy simulator.  If you are
interested in running this for real, please get in touch with me.
Hardware is available to a very small but nonzero number of
people.

This set is also available here (with the new syscall):

	git://git.kernel.org/pub/scm/linux/kernel/git/daveh/x86-pkeys.git pkeys-v006

=== diffstat ===

(note that over half of this is kselftests)

 Documentation/kernel-parameters.txt           |    3 
 Documentation/x86/protection-keys.txt         |   54 +
 arch/powerpc/include/asm/mman.h               |    5 
 arch/powerpc/include/asm/mmu_context.h        |   11 
 arch/s390/include/asm/mmu_context.h           |   11 
 arch/unicore32/include/asm/mmu_context.h      |   11 
 arch/x86/Kconfig                              |   15 
 arch/x86/entry/syscalls/syscall_32.tbl        |    1 
 arch/x86/entry/syscalls/syscall_64.tbl        |    1 
 arch/x86/include/asm/cpufeature.h             |   54 +
 arch/x86/include/asm/disabled-features.h      |   12 
 arch/x86/include/asm/fpu/types.h              |   16 
 arch/x86/include/asm/fpu/xstate.h             |    4 
 arch/x86/include/asm/mmu_context.h            |   71 ++
 arch/x86/include/asm/pgtable.h                |   45 +
 arch/x86/include/asm/pgtable_types.h          |   34 -
 arch/x86/include/asm/required-features.h      |    4 
 arch/x86/include/asm/special_insns.h          |   32 +
 arch/x86/include/uapi/asm/mman.h              |   23 
 arch/x86/include/uapi/asm/processor-flags.h   |    2 
 arch/x86/kernel/cpu/common.c                  |   42 +
 arch/x86/kernel/fpu/xstate.c                  |    7 
 arch/x86/kernel/process_64.c                  |    2 
 arch/x86/kernel/setup.c                       |    9 
 arch/x86/mm/fault.c                           |  143 +++-
 arch/x86/mm/gup.c                             |   37 -
 drivers/char/agp/frontend.c                   |    2 
 drivers/staging/android/ashmem.c              |    9 
 fs/proc/task_mmu.c                            |    5 
 include/asm-generic/mm_hooks.h                |   11 
 include/linux/mm.h                            |   13 
 include/linux/mman.h                          |    6 
 include/uapi/asm-generic/siginfo.h            |   17 
 kernel/signal.c                               |    4 
 mm/Kconfig                                    |   11 
 mm/gup.c                                      |   28 
 mm/memory.c                                   |    4 
 mm/mmap.c                                     |    2 
 mm/mprotect.c                                 |   20 
 mm/nommu.c                                    |    2 
 tools/testing/selftests/x86/Makefile          |    3 
 tools/testing/selftests/x86/pkey-helpers.h    |  182 +++++
 tools/testing/selftests/x86/protection_keys.c |  828 ++++++++++++++++++++++++++
 43 files changed, 1705 insertions(+), 91 deletions(-)

Cc: linux-api@vger.kernel.org
Cc: linux-arch@vger.kernel.org

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 20/25] mm, multi-arch: pass a protection key in to calc_vm_flag_bits()
  2015-09-28 19:18 [PATCH 00/25] x86: Memory Protection Keys Dave Hansen
@ 2015-09-28 19:18 ` Dave Hansen
  2015-09-28 19:18 ` [PATCH 21/25] mm: implement new mprotect_key() system call Dave Hansen
       [not found] ` <20150928191817.035A64E2-LXbPSdftPKxrdx17CPfAsdBPR1lH4CV8@public.gmane.org>
  2 siblings, 0 replies; 6+ messages in thread
From: Dave Hansen @ 2015-09-28 19:18 UTC (permalink / raw)
  To: dave
  Cc: borntraeger, x86, linux-kernel, linux-mm, dave.hansen, linux-api,
	linux-arch


From: Dave Hansen <dave.hansen@linux.intel.com>

This plumbs a protection key through calc_vm_flag_bits().
We could of done this in calc_vm_prot_bits(), but I did not
feel super strongly which way to go.  It was pretty arbitrary
which one to use.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: linux-api@vger.kernel.org
Cc: linux-arch@vger.kernel.org
---

 b/arch/powerpc/include/asm/mman.h  |    5 +++--
 b/drivers/char/agp/frontend.c      |    2 +-
 b/drivers/staging/android/ashmem.c |    9 +++++----
 b/include/linux/mman.h             |    6 +++---
 b/mm/mmap.c                        |    2 +-
 b/mm/mprotect.c                    |    2 +-
 b/mm/nommu.c                       |    2 +-
 7 files changed, 15 insertions(+), 13 deletions(-)

diff -puN arch/powerpc/include/asm/mman.h~pkeys-84-calc_vm_prot_bits arch/powerpc/include/asm/mman.h
--- a/arch/powerpc/include/asm/mman.h~pkeys-84-calc_vm_prot_bits	2015-09-28 11:39:49.962365460 -0700
+++ b/arch/powerpc/include/asm/mman.h	2015-09-28 11:39:49.976366097 -0700
@@ -18,11 +18,12 @@
  * This file is included by linux/mman.h, so we can't use cacl_vm_prot_bits()
  * here.  How important is the optimization?
  */
-static inline unsigned long arch_calc_vm_prot_bits(unsigned long prot)
+static inline unsigned long arch_calc_vm_prot_bits(unsigned long prot,
+		unsigned long pkey)
 {
 	return (prot & PROT_SAO) ? VM_SAO : 0;
 }
-#define arch_calc_vm_prot_bits(prot) arch_calc_vm_prot_bits(prot)
+#define arch_calc_vm_prot_bits(prot, pkey) arch_calc_vm_prot_bits(prot, pkey)
 
 static inline pgprot_t arch_vm_get_page_prot(unsigned long vm_flags)
 {
diff -puN drivers/char/agp/frontend.c~pkeys-84-calc_vm_prot_bits drivers/char/agp/frontend.c
--- a/drivers/char/agp/frontend.c~pkeys-84-calc_vm_prot_bits	2015-09-28 11:39:49.964365551 -0700
+++ b/drivers/char/agp/frontend.c	2015-09-28 11:39:49.977366142 -0700
@@ -156,7 +156,7 @@ static pgprot_t agp_convert_mmap_flags(i
 {
 	unsigned long prot_bits;
 
-	prot_bits = calc_vm_prot_bits(prot) | VM_SHARED;
+	prot_bits = calc_vm_prot_bits(prot, 0) | VM_SHARED;
 	return vm_get_page_prot(prot_bits);
 }
 
diff -puN drivers/staging/android/ashmem.c~pkeys-84-calc_vm_prot_bits drivers/staging/android/ashmem.c
--- a/drivers/staging/android/ashmem.c~pkeys-84-calc_vm_prot_bits	2015-09-28 11:39:49.966365642 -0700
+++ b/drivers/staging/android/ashmem.c	2015-09-28 11:39:49.977366142 -0700
@@ -351,7 +351,8 @@ out:
 	return ret;
 }
 
-static inline vm_flags_t calc_vm_may_flags(unsigned long prot)
+static inline vm_flags_t calc_vm_may_flags(unsigned long prot,
+		unsigned long pkey)
 {
 	return _calc_vm_trans(prot, PROT_READ,  VM_MAYREAD) |
 	       _calc_vm_trans(prot, PROT_WRITE, VM_MAYWRITE) |
@@ -372,12 +373,12 @@ static int ashmem_mmap(struct file *file
 	}
 
 	/* requested protection bits must match our allowed protection mask */
-	if (unlikely((vma->vm_flags & ~calc_vm_prot_bits(asma->prot_mask)) &
-		     calc_vm_prot_bits(PROT_MASK))) {
+	if (unlikely((vma->vm_flags & ~calc_vm_prot_bits(asma->prot_mask, 0)) &
+		     calc_vm_prot_bits(PROT_MASK, 0))) {
 		ret = -EPERM;
 		goto out;
 	}
-	vma->vm_flags &= ~calc_vm_may_flags(~asma->prot_mask);
+	vma->vm_flags &= ~calc_vm_may_flags(~asma->prot_mask, 0);
 
 	if (!asma->file) {
 		char *name = ASHMEM_NAME_DEF;
diff -puN include/linux/mman.h~pkeys-84-calc_vm_prot_bits include/linux/mman.h
--- a/include/linux/mman.h~pkeys-84-calc_vm_prot_bits	2015-09-28 11:39:49.967365688 -0700
+++ b/include/linux/mman.h	2015-09-28 11:39:49.977366142 -0700
@@ -35,7 +35,7 @@ static inline void vm_unacct_memory(long
  */
 
 #ifndef arch_calc_vm_prot_bits
-#define arch_calc_vm_prot_bits(prot) 0
+#define arch_calc_vm_prot_bits(prot, pkey) 0
 #endif
 
 #ifndef arch_vm_get_page_prot
@@ -70,12 +70,12 @@ static inline int arch_validate_prot(uns
  * Combine the mmap "prot" argument into "vm_flags" used internally.
  */
 static inline unsigned long
-calc_vm_prot_bits(unsigned long prot)
+calc_vm_prot_bits(unsigned long prot, unsigned long pkey)
 {
 	return _calc_vm_trans(prot, PROT_READ,  VM_READ ) |
 	       _calc_vm_trans(prot, PROT_WRITE, VM_WRITE) |
 	       _calc_vm_trans(prot, PROT_EXEC,  VM_EXEC) |
-	       arch_calc_vm_prot_bits(prot);
+	       arch_calc_vm_prot_bits(prot, pkey);
 }
 
 /*
diff -puN mm/mmap.c~pkeys-84-calc_vm_prot_bits mm/mmap.c
--- a/mm/mmap.c~pkeys-84-calc_vm_prot_bits	2015-09-28 11:39:49.969365779 -0700
+++ b/mm/mmap.c	2015-09-28 11:39:49.978366188 -0700
@@ -1311,7 +1311,7 @@ unsigned long do_mmap(struct file *file,
 	 * to. we assume access permissions have been handled by the open
 	 * of the memory object, so we don't do any here.
 	 */
-	vm_flags |= calc_vm_prot_bits(prot) | calc_vm_flag_bits(flags) |
+	vm_flags |= calc_vm_prot_bits(prot, 0) | calc_vm_flag_bits(flags) |
 			mm->def_flags | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC;
 
 	if (flags & MAP_LOCKED)
diff -puN mm/mprotect.c~pkeys-84-calc_vm_prot_bits mm/mprotect.c
--- a/mm/mprotect.c~pkeys-84-calc_vm_prot_bits	2015-09-28 11:39:49.971365870 -0700
+++ b/mm/mprotect.c	2015-09-28 11:39:49.979366234 -0700
@@ -373,7 +373,7 @@ SYSCALL_DEFINE3(mprotect, unsigned long,
 	if ((prot & PROT_READ) && (current->personality & READ_IMPLIES_EXEC))
 		prot |= PROT_EXEC;
 
-	vm_flags = calc_vm_prot_bits(prot);
+	vm_flags = calc_vm_prot_bits(prot, 0);
 
 	down_write(&current->mm->mmap_sem);
 
diff -puN mm/nommu.c~pkeys-84-calc_vm_prot_bits mm/nommu.c
--- a/mm/nommu.c~pkeys-84-calc_vm_prot_bits	2015-09-28 11:39:49.973365961 -0700
+++ b/mm/nommu.c	2015-09-28 11:39:49.980366279 -0700
@@ -1084,7 +1084,7 @@ static unsigned long determine_vm_flags(
 {
 	unsigned long vm_flags;
 
-	vm_flags = calc_vm_prot_bits(prot) | calc_vm_flag_bits(flags);
+	vm_flags = calc_vm_prot_bits(prot, 0) | calc_vm_flag_bits(flags);
 	/* vm_flags |= mm->def_flags; */
 
 	if (!(capabilities & NOMMU_MAP_DIRECT)) {
_

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 22/25] x86: wire up mprotect_key() system call
       [not found] ` <20150928191817.035A64E2-LXbPSdftPKxrdx17CPfAsdBPR1lH4CV8@public.gmane.org>
@ 2015-09-28 19:18   ` Dave Hansen
  0 siblings, 0 replies; 6+ messages in thread
From: Dave Hansen @ 2015-09-28 19:18 UTC (permalink / raw)
  To: dave-gkUM19QKKo4
  Cc: borntraeger-tA70FqPdS9bQT0dZR+AlfA, x86-DgEjT+Ai2ygdnm+yROfE0A,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	dave.hansen-VuQAYsv1563Yd54FQh9/CA,
	linux-api-u79uwXL29TY76Z2rM5mHXA


From: Dave Hansen <dave.hansen-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>

This is all that we need to get the new system call itself
working on x86.

Signed-off-by: Dave Hansen <dave.hansen-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
Cc: linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
---

 b/arch/x86/entry/syscalls/syscall_32.tbl |    1 +
 b/arch/x86/entry/syscalls/syscall_64.tbl |    1 +
 b/arch/x86/include/uapi/asm/mman.h       |    7 +++++++
 b/mm/Kconfig                             |    1 +
 4 files changed, 10 insertions(+)

diff -puN arch/x86/entry/syscalls/syscall_32.tbl~pkeys-16-x86-mprotect_key arch/x86/entry/syscalls/syscall_32.tbl
--- a/arch/x86/entry/syscalls/syscall_32.tbl~pkeys-16-x86-mprotect_key	2015-09-28 11:39:50.964411042 -0700
+++ b/arch/x86/entry/syscalls/syscall_32.tbl	2015-09-28 11:39:50.972411406 -0700
@@ -382,3 +382,4 @@
 373	i386	shutdown		sys_shutdown
 374	i386	userfaultfd		sys_userfaultfd
 375	i386	membarrier		sys_membarrier
+376	i386	mprotect_key		sys_mprotect_key
diff -puN arch/x86/entry/syscalls/syscall_64.tbl~pkeys-16-x86-mprotect_key arch/x86/entry/syscalls/syscall_64.tbl
--- a/arch/x86/entry/syscalls/syscall_64.tbl~pkeys-16-x86-mprotect_key	2015-09-28 11:39:50.965411087 -0700
+++ b/arch/x86/entry/syscalls/syscall_64.tbl	2015-09-28 11:39:50.972411406 -0700
@@ -331,6 +331,7 @@
 322	64	execveat		stub_execveat
 323	common	userfaultfd		sys_userfaultfd
 324	common	membarrier		sys_membarrier
+325	common	mprotect_key		sys_mprotect_key
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
diff -puN arch/x86/include/uapi/asm/mman.h~pkeys-16-x86-mprotect_key arch/x86/include/uapi/asm/mman.h
--- a/arch/x86/include/uapi/asm/mman.h~pkeys-16-x86-mprotect_key	2015-09-28 11:39:50.967411179 -0700
+++ b/arch/x86/include/uapi/asm/mman.h	2015-09-28 11:39:50.973411451 -0700
@@ -20,6 +20,13 @@
 		((vm_flags) & VM_PKEY_BIT1 ? _PAGE_PKEY_BIT1 : 0) |	\
 		((vm_flags) & VM_PKEY_BIT2 ? _PAGE_PKEY_BIT2 : 0) |	\
 		((vm_flags) & VM_PKEY_BIT3 ? _PAGE_PKEY_BIT3 : 0))
+
+#define arch_calc_vm_prot_bits(prot, key) ( 		\
+		((key) & 0x1 ? VM_PKEY_BIT0 : 0) |      \
+		((key) & 0x2 ? VM_PKEY_BIT1 : 0) |      \
+		((key) & 0x4 ? VM_PKEY_BIT2 : 0) |      \
+		((key) & 0x8 ? VM_PKEY_BIT3 : 0))
+
 #endif
 
 #include <asm-generic/mman.h>
diff -puN mm/Kconfig~pkeys-16-x86-mprotect_key mm/Kconfig
--- a/mm/Kconfig~pkeys-16-x86-mprotect_key	2015-09-28 11:39:50.969411269 -0700
+++ b/mm/Kconfig	2015-09-28 11:39:50.973411451 -0700
@@ -689,4 +689,5 @@ config NR_PROTECTION_KEYS
 	# Everything supports a _single_ key, so allow folks to
 	# at least call APIs that take keys, but require that the
 	# key be 0.
+	default 16 if X86_INTEL_MEMORY_PROTECTION_KEYS
 	default 1
_

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 21/25] mm: implement new mprotect_key() system call
  2015-09-28 19:18 [PATCH 00/25] x86: Memory Protection Keys Dave Hansen
  2015-09-28 19:18 ` [PATCH 20/25] mm, multi-arch: pass a protection key in to calc_vm_flag_bits() Dave Hansen
@ 2015-09-28 19:18 ` Dave Hansen
       [not found]   ` <20150928191826.F1CD5256-LXbPSdftPKxrdx17CPfAsdBPR1lH4CV8@public.gmane.org>
       [not found] ` <20150928191817.035A64E2-LXbPSdftPKxrdx17CPfAsdBPR1lH4CV8@public.gmane.org>
  2 siblings, 1 reply; 6+ messages in thread
From: Dave Hansen @ 2015-09-28 19:18 UTC (permalink / raw)
  To: dave; +Cc: borntraeger, x86, linux-kernel, linux-mm, dave.hansen, linux-api


From: Dave Hansen <dave.hansen@linux.intel.com>

mprotect_key() is just like mprotect, except it also takes a
protection key as an argument.  On systems that do not support
protection keys, it still works, but requires that key=0.
Otherwise it does exactly what mprotect does.

I expect it to get used like this, if you want to guarantee that
any mapping you create can *never* be accessed without the right
protection keys set up.

	pkey_deny_access(11); // random pkey
	int real_prot = PROT_READ|PROT_WRITE;
	ptr = mmap(NULL, PAGE_SIZE, PROT_NONE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
	ret = mprotect_key(ptr, PAGE_SIZE, real_prot, 11);

This way, there is *no* window where the mapping is accessible
since it was always either PROT_NONE or had a protection key set.

We settled on 'unsigned long' for the type of the key here.  We
only need 4 bits on x86 today, but I figured that other
architectures might need some more space.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: linux-api@vger.kernel.org
---

 b/mm/Kconfig    |    7 +++++++
 b/mm/mprotect.c |   20 +++++++++++++++++---
 2 files changed, 24 insertions(+), 3 deletions(-)

diff -puN mm/Kconfig~pkeys-85-mprotect_pkey mm/Kconfig
--- a/mm/Kconfig~pkeys-85-mprotect_pkey	2015-09-28 11:39:50.527391162 -0700
+++ b/mm/Kconfig	2015-09-28 11:39:50.532391390 -0700
@@ -683,3 +683,10 @@ config FRAME_VECTOR
 
 config ARCH_USES_HIGH_VMA_FLAGS
 	bool
+
+config NR_PROTECTION_KEYS
+	int
+	# Everything supports a _single_ key, so allow folks to
+	# at least call APIs that take keys, but require that the
+	# key be 0.
+	default 1
diff -puN mm/mprotect.c~pkeys-85-mprotect_pkey mm/mprotect.c
--- a/mm/mprotect.c~pkeys-85-mprotect_pkey	2015-09-28 11:39:50.529391253 -0700
+++ b/mm/mprotect.c	2015-09-28 11:39:50.532391390 -0700
@@ -344,8 +344,8 @@ fail:
 	return error;
 }
 
-SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len,
-		unsigned long, prot)
+static int do_mprotect_key(unsigned long start, size_t len,
+		unsigned long prot, unsigned long key)
 {
 	unsigned long vm_flags, nstart, end, tmp, reqprot;
 	struct vm_area_struct *vma, *prev;
@@ -365,6 +365,8 @@ SYSCALL_DEFINE3(mprotect, unsigned long,
 		return -ENOMEM;
 	if (!arch_validate_prot(prot))
 		return -EINVAL;
+	if (key >= CONFIG_NR_PROTECTION_KEYS)
+		return -EINVAL;
 
 	reqprot = prot;
 	/*
@@ -373,7 +375,7 @@ SYSCALL_DEFINE3(mprotect, unsigned long,
 	if ((prot & PROT_READ) && (current->personality & READ_IMPLIES_EXEC))
 		prot |= PROT_EXEC;
 
-	vm_flags = calc_vm_prot_bits(prot, 0);
+	vm_flags = calc_vm_prot_bits(prot, key);
 
 	down_write(&current->mm->mmap_sem);
 
@@ -443,3 +445,15 @@ out:
 	up_write(&current->mm->mmap_sem);
 	return error;
 }
+
+SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len,
+		unsigned long, prot)
+{
+	return do_mprotect_key(start, len, prot, 0);
+}
+
+SYSCALL_DEFINE4(mprotect_key, unsigned long, start, size_t, len,
+		unsigned long, prot, unsigned long, key)
+{
+	return do_mprotect_key(start, len, prot, key);
+}
_

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 21/25] mm: implement new mprotect_key() system call
       [not found]   ` <20150928191826.F1CD5256-LXbPSdftPKxrdx17CPfAsdBPR1lH4CV8@public.gmane.org>
@ 2015-09-29  6:39     ` Michael Ellerman
  2015-09-29 14:16       ` Dave Hansen
  0 siblings, 1 reply; 6+ messages in thread
From: Michael Ellerman @ 2015-09-29  6:39 UTC (permalink / raw)
  To: Dave Hansen
  Cc: borntraeger-tA70FqPdS9bQT0dZR+AlfA, x86-DgEjT+Ai2ygdnm+yROfE0A,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	dave.hansen-VuQAYsv1563Yd54FQh9/CA,
	linux-api-u79uwXL29TY76Z2rM5mHXA

On Mon, 2015-09-28 at 12:18 -0700, Dave Hansen wrote:
> From: Dave Hansen <dave.hansen-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
> 
> mprotect_key() is just like mprotect, except it also takes a
> protection key as an argument.  On systems that do not support
> protection keys, it still works, but requires that key=0.

I'm not sure how userspace is going to use the key=0 feature? ie. userspace
will still have to detect that keys are not supported and use key 0 everywhere.
At that point it could just as well skip the mprotect_key() syscalls entirely
couldn't it?

> I expect it to get used like this, if you want to guarantee that
> any mapping you create can *never* be accessed without the right
> protection keys set up.
> 
> 	pkey_deny_access(11); // random pkey
> 	int real_prot = PROT_READ|PROT_WRITE;
> 	ptr = mmap(NULL, PAGE_SIZE, PROT_NONE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
> 	ret = mprotect_key(ptr, PAGE_SIZE, real_prot, 11);
> 
> This way, there is *no* window where the mapping is accessible
> since it was always either PROT_NONE or had a protection key set.
> 
> We settled on 'unsigned long' for the type of the key here.  We
> only need 4 bits on x86 today, but I figured that other
> architectures might need some more space.

If the existing mprotect() syscall had a flags argument you could have just
used that. So is it worth just adding mprotect2() now and using it for this? ie:

int mprotect2(unsigned long start, size_t len, unsigned long prot, unsigned long flags) ..

And then you define bit zero of flags to say you're passing a pkey, and it's in
bits 1-63?

That way if other arches need to do something different you at least have the
flags available?

cheers

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 21/25] mm: implement new mprotect_key() system call
  2015-09-29  6:39     ` Michael Ellerman
@ 2015-09-29 14:16       ` Dave Hansen
  0 siblings, 0 replies; 6+ messages in thread
From: Dave Hansen @ 2015-09-29 14:16 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: borntraeger, x86, linux-kernel, linux-mm, dave.hansen, linux-api

On 09/28/2015 11:39 PM, Michael Ellerman wrote:
> On Mon, 2015-09-28 at 12:18 -0700, Dave Hansen wrote:
>> From: Dave Hansen <dave.hansen@linux.intel.com>
>>
>> mprotect_key() is just like mprotect, except it also takes a
>> protection key as an argument.  On systems that do not support
>> protection keys, it still works, but requires that key=0.
> 
> I'm not sure how userspace is going to use the key=0 feature? ie. userspace
> will still have to detect that keys are not supported and use key 0 everywhere.
> At that point it could just as well skip the mprotect_key() syscalls entirely
> couldn't it?

Yep.

Or, a new architecture could just skip mprotect() itself entirely and
only wire up mprotect_pkey().  I don't see this pkey=0 thing as an
important feature or anything.  I just wanted to call out the behavior.

>> I expect it to get used like this, if you want to guarantee that
>> any mapping you create can *never* be accessed without the right
>> protection keys set up.
>>
>> 	pkey_deny_access(11); // random pkey
>> 	int real_prot = PROT_READ|PROT_WRITE;
>> 	ptr = mmap(NULL, PAGE_SIZE, PROT_NONE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
>> 	ret = mprotect_key(ptr, PAGE_SIZE, real_prot, 11);
>>
>> This way, there is *no* window where the mapping is accessible
>> since it was always either PROT_NONE or had a protection key set.
>>
>> We settled on 'unsigned long' for the type of the key here.  We
>> only need 4 bits on x86 today, but I figured that other
>> architectures might need some more space.
> 
> If the existing mprotect() syscall had a flags argument you could have just
> used that. So is it worth just adding mprotect2() now and using it for this? ie:
> 
> int mprotect2(unsigned long start, size_t len, unsigned long prot, unsigned long flags) ..
> 
> And then you define bit zero of flags to say you're passing a pkey, and it's in
> bits 1-63?
> 
> That way if other arches need to do something different you at least have the
> flags available?

But what problem does that solve?

mprotect() itself has plenty of space in prot.  Do any of the other
architectures need to pass in more than just an integer key to implement
storage/protection keys?

I'd much rather have a set of (relatively) arch-specific system calls
implementing protection keys rather than a single one with one
arch-specific argument.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2015-09-29 14:16 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-09-28 19:18 [PATCH 00/25] x86: Memory Protection Keys Dave Hansen
2015-09-28 19:18 ` [PATCH 20/25] mm, multi-arch: pass a protection key in to calc_vm_flag_bits() Dave Hansen
2015-09-28 19:18 ` [PATCH 21/25] mm: implement new mprotect_key() system call Dave Hansen
     [not found]   ` <20150928191826.F1CD5256-LXbPSdftPKxrdx17CPfAsdBPR1lH4CV8@public.gmane.org>
2015-09-29  6:39     ` Michael Ellerman
2015-09-29 14:16       ` Dave Hansen
     [not found] ` <20150928191817.035A64E2-LXbPSdftPKxrdx17CPfAsdBPR1lH4CV8@public.gmane.org>
2015-09-28 19:18   ` [PATCH 22/25] x86: wire up " Dave Hansen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).