[kernel-hardening] [RFC PATCH v2 0/3] Add support for eXclusive Page Frame Ownership (XPFO)


* [kernel-hardening] [RFC PATCH v2 0/3] Add support for eXclusive Page Frame Ownership (XPFO)
       [not found] <1456496467-14247-1-git-send-email-juerg.haefliger@hpe.com>
@ 2016-09-02 11:39 ` Juerg Haefliger
  2016-09-02 11:39   ` [kernel-hardening] [RFC PATCH v2 1/3] " Juerg Haefliger
                     ` (3 more replies)
  0 siblings, 4 replies; 30+ messages in thread
From: Juerg Haefliger @ 2016-09-02 11:39 UTC (permalink / raw)
  To: linux-kernel, linux-mm, kernel-hardening, linux-x86_64
  Cc: juerg.haefliger, vpk

Changes from:
  v1 -> v2:
    - Moved the code from arch/x86/mm/ to mm/ since it's (mostly)
      arch-agnostic.
    - Moved the config to the generic layer and added ARCH_SUPPORTS_XPFO
      for x86.
    - Use page_ext for the additional per-page data.
    - Removed the clearing of pages. This can be accomplished by using
      PAGE_POISONING.
    - Split up the patch into multiple patches.
    - Fixed additional issues identified by reviewers.

This patch series adds support for XPFO which protects against 'ret2dir'
kernel attacks. The basic idea is to enforce exclusive ownership of page
frames by either the kernel or userspace, unless explicitly requested by
the kernel. Whenever a page destined for userspace is allocated, it is
unmapped from physmap (the kernel's direct mapping of physical memory).
When such a page is reclaimed from userspace, it is mapped back to physmap.

Additional fields in the page_ext struct are used for XPFO housekeeping:
two flags that distinguish user from kernel pages and tag unmapped pages,
a reference counter that balances kmap/kunmap operations, and a lock that
serializes access to the XPFO fields.
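
The housekeeping described above can be pictured with a small userspace
model. This is only a sketch under stated assumptions: the struct, its
field names and the toy_* functions are hypothetical, the lock is left out
because the model is single-threaded, and a boolean stands in for the
physmap entry. As in the kmap/kunmap hooks of patch 1/3, a user page leaves
the simulated physmap when its last kernel mapping is dropped.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy per-page state; names are illustrative, not the patch's layout. */
struct toy_page {
	bool kernel;    /* page is owned by the kernel */
	bool unmapped;  /* page is absent from the simulated physmap */
	int mapcount;   /* balances toy_kmap()/toy_kunmap() */
};

/* Allocation: tag the owner and start with no temporary kernel users. */
static void toy_alloc(struct toy_page *p, bool for_user)
{
	p->kernel = !for_user;
	p->unmapped = false;
	p->mapcount = 0;
}

/* Temporary kernel access to a user page: the first user maps it back. */
static void toy_kmap(struct toy_page *p)
{
	if (p->kernel)
		return;
	if (++p->mapcount == 1)
		p->unmapped = false;
}

/* Last temporary user gone: drop the page from the simulated physmap. */
static void toy_kunmap(struct toy_page *p)
{
	if (p->kernel)
		return;
	assert(p->mapcount > 0);
	if (--p->mapcount == 0)
		p->unmapped = true;
}

/* Free: map the page back so it can be handed to any future owner. */
static void toy_free(struct toy_page *p)
{
	p->unmapped = false;
	p->mapcount = 0;
}
```

With this model, two nested toy_kmap() calls on a user page need two
toy_kunmap() calls before the page is unmapped again, which is the
balancing the reference counter is meant to provide.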

Known issues/limitations:
  - Only supports x86-64 (for now)
  - Only supports 4k pages (for now)
  - There are most likely some legitimate use cases where the kernel needs
    to access userspace pages; these still need to be made XPFO-aware
  - Performance penalty

Reference paper by the original patch authors:
  http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf

Juerg Haefliger (3):
  Add support for eXclusive Page Frame Ownership (XPFO)
  xpfo: Only put previous userspace pages into the hot cache
  block: Always use a bounce buffer when XPFO is enabled

 arch/x86/Kconfig         |   3 +-
 arch/x86/mm/init.c       |   2 +-
 block/blk-map.c          |   2 +-
 include/linux/highmem.h  |  15 +++-
 include/linux/page_ext.h |   7 ++
 include/linux/xpfo.h     |  41 +++++++++
 lib/swiotlb.c            |   3 +-
 mm/Makefile              |   1 +
 mm/page_alloc.c          |  10 ++-
 mm/page_ext.c            |   4 +
 mm/xpfo.c                | 213 +++++++++++++++++++++++++++++++++++++++++++++++
 security/Kconfig         |  20 +++++
 12 files changed, 314 insertions(+), 7 deletions(-)
 create mode 100644 include/linux/xpfo.h
 create mode 100644 mm/xpfo.c

-- 
2.9.3

* [kernel-hardening] [RFC PATCH v2 1/3] Add support for eXclusive Page Frame Ownership (XPFO)
  2016-09-02 11:39 ` [kernel-hardening] [RFC PATCH v2 0/3] Add support for eXclusive Page Frame Ownership (XPFO) Juerg Haefliger
@ 2016-09-02 11:39   ` Juerg Haefliger
  2016-09-02 11:39   ` [kernel-hardening] [RFC PATCH v2 2/3] xpfo: Only put previous userspace pages into the hot cache Juerg Haefliger
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 30+ messages in thread
From: Juerg Haefliger @ 2016-09-02 11:39 UTC (permalink / raw)
  To: linux-kernel, linux-mm, kernel-hardening, linux-x86_64
  Cc: juerg.haefliger, vpk

This patch adds support for XPFO which protects against 'ret2dir' kernel
attacks. The basic idea is to enforce exclusive ownership of page frames
by either the kernel or userspace, unless explicitly requested by the
kernel. Whenever a page destined for userspace is allocated, it is
unmapped from physmap (the kernel's direct mapping of physical memory).
When such a page is reclaimed from userspace, it is mapped back to physmap.

Additional fields in the page_ext struct are used for XPFO housekeeping:
two flags that distinguish user from kernel pages and tag unmapped pages,
a reference counter that balances kmap/kunmap operations, and a lock that
serializes access to the XPFO fields.

Known issues/limitations:
  - Only supports x86-64 (for now)
  - Only supports 4k pages (for now)
  - There are most likely some legitimate use cases where the kernel needs
    to access userspace pages; these still need to be made XPFO-aware
  - Performance penalty

Reference paper by the original patch authors:
  http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf

Suggested-by: Vasileios P. Kemerlis <vpk@cs.columbia.edu>
Signed-off-by: Juerg Haefliger <juerg.haefliger@hpe.com>
---
 arch/x86/Kconfig         |   3 +-
 arch/x86/mm/init.c       |   2 +-
 include/linux/highmem.h  |  15 +++-
 include/linux/page_ext.h |   7 ++
 include/linux/xpfo.h     |  39 +++++++++
 lib/swiotlb.c            |   3 +-
 mm/Makefile              |   1 +
 mm/page_alloc.c          |   2 +
 mm/page_ext.c            |   4 +
 mm/xpfo.c                | 205 +++++++++++++++++++++++++++++++++++++++++++++++
 security/Kconfig         |  20 +++++
 11 files changed, 296 insertions(+), 5 deletions(-)
 create mode 100644 include/linux/xpfo.h
 create mode 100644 mm/xpfo.c

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index c580d8c33562..dc5604a710c6 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -165,6 +165,7 @@ config X86
 	select HAVE_STACK_VALIDATION		if X86_64
 	select ARCH_USES_HIGH_VMA_FLAGS		if X86_INTEL_MEMORY_PROTECTION_KEYS
 	select ARCH_HAS_PKEYS			if X86_INTEL_MEMORY_PROTECTION_KEYS
+	select ARCH_SUPPORTS_XPFO		if X86_64
 
 config INSTRUCTION_DECODER
 	def_bool y
@@ -1350,7 +1351,7 @@ config ARCH_DMA_ADDR_T_64BIT
 
 config X86_DIRECT_GBPAGES
 	def_bool y
-	depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK
+	depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK && !XPFO
 	---help---
 	  Certain kernel features effectively disable kernel
 	  linear 1 GB mappings (even if the CPU otherwise
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index d28a2d741f9e..426427b54639 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -161,7 +161,7 @@ static int page_size_mask;
 
 static void __init probe_page_size_mask(void)
 {
-#if !defined(CONFIG_KMEMCHECK)
+#if !defined(CONFIG_KMEMCHECK) && !defined(CONFIG_XPFO)
 	/*
 	 * For CONFIG_KMEMCHECK or pagealloc debugging, identity mapping will
 	 * use small pages.
diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index bb3f3297062a..7a17c166532f 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -7,6 +7,7 @@
 #include <linux/mm.h>
 #include <linux/uaccess.h>
 #include <linux/hardirq.h>
+#include <linux/xpfo.h>
 
 #include <asm/cacheflush.h>
 
@@ -55,24 +56,34 @@ static inline struct page *kmap_to_page(void *addr)
 #ifndef ARCH_HAS_KMAP
 static inline void *kmap(struct page *page)
 {
+	void *kaddr;
+
 	might_sleep();
-	return page_address(page);
+	kaddr = page_address(page);
+	xpfo_kmap(kaddr, page);
+	return kaddr;
 }
 
 static inline void kunmap(struct page *page)
 {
+	xpfo_kunmap(page_address(page), page);
 }
 
 static inline void *kmap_atomic(struct page *page)
 {
+	void *kaddr;
+
 	preempt_disable();
 	pagefault_disable();
-	return page_address(page);
+	kaddr = page_address(page);
+	xpfo_kmap(kaddr, page);
+	return kaddr;
 }
 #define kmap_atomic_prot(page, prot)	kmap_atomic(page)
 
 static inline void __kunmap_atomic(void *addr)
 {
+	xpfo_kunmap(addr, virt_to_page(addr));
 	pagefault_enable();
 	preempt_enable();
 }
diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h
index 03f2a3e7d76d..fdf63dcc399e 100644
--- a/include/linux/page_ext.h
+++ b/include/linux/page_ext.h
@@ -27,6 +27,8 @@ enum page_ext_flags {
 	PAGE_EXT_DEBUG_POISON,		/* Page is poisoned */
 	PAGE_EXT_DEBUG_GUARD,
 	PAGE_EXT_OWNER,
+	PAGE_EXT_XPFO_KERNEL,		/* Page is a kernel page */
+	PAGE_EXT_XPFO_UNMAPPED,		/* Page is unmapped */
 #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT)
 	PAGE_EXT_YOUNG,
 	PAGE_EXT_IDLE,
@@ -48,6 +50,11 @@ struct page_ext {
 	int last_migrate_reason;
 	depot_stack_handle_t handle;
 #endif
+#ifdef CONFIG_XPFO
+	int inited;		/* Map counter and lock initialized */
+	atomic_t mapcount;	/* Counter for balancing map/unmap requests */
+	spinlock_t maplock;	/* Lock to serialize map/unmap requests */
+#endif
 };
 
 extern void pgdat_page_ext_init(struct pglist_data *pgdat);
diff --git a/include/linux/xpfo.h b/include/linux/xpfo.h
new file mode 100644
index 000000000000..77187578ca33
--- /dev/null
+++ b/include/linux/xpfo.h
@@ -0,0 +1,39 @@
+/*
+ * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P.
+ * Copyright (C) 2016 Brown University. All rights reserved.
+ *
+ * Authors:
+ *   Juerg Haefliger <juerg.haefliger@hpe.com>
+ *   Vasileios P. Kemerlis <vpk@cs.brown.edu>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ */
+
+#ifndef _LINUX_XPFO_H
+#define _LINUX_XPFO_H
+
+#ifdef CONFIG_XPFO
+
+extern struct page_ext_operations page_xpfo_ops;
+
+extern void xpfo_kmap(void *kaddr, struct page *page);
+extern void xpfo_kunmap(void *kaddr, struct page *page);
+extern void xpfo_alloc_page(struct page *page, int order, gfp_t gfp);
+extern void xpfo_free_page(struct page *page, int order);
+
+extern bool xpfo_page_is_unmapped(struct page *page);
+
+#else /* !CONFIG_XPFO */
+
+static inline void xpfo_kmap(void *kaddr, struct page *page) { }
+static inline void xpfo_kunmap(void *kaddr, struct page *page) { }
+static inline void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) { }
+static inline void xpfo_free_page(struct page *page, int order) { }
+
+static inline bool xpfo_page_is_unmapped(struct page *page) { return false; }
+
+#endif /* CONFIG_XPFO */
+
+#endif /* _LINUX_XPFO_H */
diff --git a/lib/swiotlb.c b/lib/swiotlb.c
index 22e13a0e19d7..455eff44604e 100644
--- a/lib/swiotlb.c
+++ b/lib/swiotlb.c
@@ -390,8 +390,9 @@ static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr,
 {
 	unsigned long pfn = PFN_DOWN(orig_addr);
 	unsigned char *vaddr = phys_to_virt(tlb_addr);
+	struct page *page = pfn_to_page(pfn);
 
-	if (PageHighMem(pfn_to_page(pfn))) {
+	if (PageHighMem(page) || xpfo_page_is_unmapped(page)) {
 		/* The buffer does not have a mapping.  Map it in and copy */
 		unsigned int offset = orig_addr & ~PAGE_MASK;
 		char *buffer;
diff --git a/mm/Makefile b/mm/Makefile
index 2ca1faf3fa09..e6f8894423da 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -103,3 +103,4 @@ obj-$(CONFIG_IDLE_PAGE_TRACKING) += page_idle.o
 obj-$(CONFIG_FRAME_VECTOR) += frame_vector.o
 obj-$(CONFIG_DEBUG_PAGE_REF) += debug_page_ref.o
 obj-$(CONFIG_HARDENED_USERCOPY) += usercopy.o
+obj-$(CONFIG_XPFO) += xpfo.o
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3fbe73a6fe4b..0241c8a7e72a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1029,6 +1029,7 @@ static __always_inline bool free_pages_prepare(struct page *page,
 	kernel_poison_pages(page, 1 << order, 0);
 	kernel_map_pages(page, 1 << order, 0);
 	kasan_free_pages(page, order);
+	xpfo_free_page(page, order);
 
 	return true;
 }
@@ -1726,6 +1727,7 @@ inline void post_alloc_hook(struct page *page, unsigned int order,
 	kernel_map_pages(page, 1 << order, 1);
 	kernel_poison_pages(page, 1 << order, 1);
 	kasan_alloc_pages(page, order);
+	xpfo_alloc_page(page, order, gfp_flags);
 	set_page_owner(page, order, gfp_flags);
 }
 
diff --git a/mm/page_ext.c b/mm/page_ext.c
index 44a4c029c8e7..1cd7d7f460cc 100644
--- a/mm/page_ext.c
+++ b/mm/page_ext.c
@@ -7,6 +7,7 @@
 #include <linux/kmemleak.h>
 #include <linux/page_owner.h>
 #include <linux/page_idle.h>
+#include <linux/xpfo.h>
 
 /*
  * struct page extension
@@ -63,6 +64,9 @@ static struct page_ext_operations *page_ext_ops[] = {
 #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT)
 	&page_idle_ops,
 #endif
+#ifdef CONFIG_XPFO
+	&page_xpfo_ops,
+#endif
 };
 
 static unsigned long total_usage;
diff --git a/mm/xpfo.c b/mm/xpfo.c
new file mode 100644
index 000000000000..ddb1be05485d
--- /dev/null
+++ b/mm/xpfo.c
@@ -0,0 +1,205 @@
+/*
+ * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P.
+ * Copyright (C) 2016 Brown University. All rights reserved.
+ *
+ * Authors:
+ *   Juerg Haefliger <juerg.haefliger@hpe.com>
+ *   Vasileios P. Kemerlis <vpk@cs.brown.edu>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ */
+
+#include <linux/mm.h>
+#include <linux/module.h>
+#include <linux/page_ext.h>
+#include <linux/xpfo.h>
+
+#include <asm/tlbflush.h>
+
+DEFINE_STATIC_KEY_FALSE(xpfo_inited);
+
+static bool need_xpfo(void)
+{
+	return true;
+}
+
+static void init_xpfo(void)
+{
+	printk(KERN_INFO "XPFO enabled\n");
+	static_branch_enable(&xpfo_inited);
+}
+
+struct page_ext_operations page_xpfo_ops = {
+	.need = need_xpfo,
+	.init = init_xpfo,
+};
+
+/*
+ * Update a single kernel page table entry
+ */
+static inline void set_kpte(struct page *page, unsigned long kaddr,
+			    pgprot_t prot) {
+	unsigned int level;
+	pte_t *kpte = lookup_address(kaddr, &level);
+
+	/* We only support 4k pages for now */
+	BUG_ON(!kpte || level != PG_LEVEL_4K);
+
+	set_pte_atomic(kpte, pfn_pte(page_to_pfn(page), canon_pgprot(prot)));
+}
+
+void xpfo_alloc_page(struct page *page, int order, gfp_t gfp)
+{
+	int i, flush_tlb = 0;
+	struct page_ext *page_ext;
+	unsigned long kaddr;
+
+	if (!static_branch_unlikely(&xpfo_inited))
+		return;
+
+	for (i = 0; i < (1 << order); i++)  {
+		page_ext = lookup_page_ext(page + i);
+
+		BUG_ON(test_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags));
+
+		/* Initialize the map lock and map counter */
+		if (!page_ext->inited) {
+			spin_lock_init(&page_ext->maplock);
+			atomic_set(&page_ext->mapcount, 0);
+			page_ext->inited = 1;
+		}
+		BUG_ON(atomic_read(&page_ext->mapcount));
+
+		if ((gfp & GFP_HIGHUSER) == GFP_HIGHUSER) {
+			/*
+			 * Flush the TLB if the page was previously allocated
+			 * to the kernel.
+			 */
+			if (test_and_clear_bit(PAGE_EXT_XPFO_KERNEL,
+					       &page_ext->flags))
+				flush_tlb = 1;
+		} else {
+			/* Tag the page as a kernel page */
+			set_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags);
+		}
+	}
+
+	if (flush_tlb) {
+		kaddr = (unsigned long)page_address(page);
+		flush_tlb_kernel_range(kaddr, kaddr + (1 << order) *
+				       PAGE_SIZE);
+	}
+}
+
+void xpfo_free_page(struct page *page, int order)
+{
+	int i;
+	struct page_ext *page_ext;
+	unsigned long kaddr;
+
+	if (!static_branch_unlikely(&xpfo_inited))
+		return;
+
+	for (i = 0; i < (1 << order); i++) {
+		page_ext = lookup_page_ext(page + i);
+
+		if (!page_ext->inited) {
+			/*
+			 * The page was allocated before page_ext was
+			 * initialized, so it is a kernel page and it needs to
+			 * be tagged accordingly.
+			 */
+			set_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags);
+			continue;
+		}
+
+		/*
+		 * Map the page back into the kernel if it was previously
+		 * allocated to user space.
+		 */
+		if (test_and_clear_bit(PAGE_EXT_XPFO_UNMAPPED,
+				       &page_ext->flags)) {
+			kaddr = (unsigned long)page_address(page + i);
+			set_kpte(page + i,  kaddr, __pgprot(__PAGE_KERNEL));
+		}
+	}
+}
+
+void xpfo_kmap(void *kaddr, struct page *page)
+{
+	struct page_ext *page_ext;
+	unsigned long flags;
+
+	if (!static_branch_unlikely(&xpfo_inited))
+		return;
+
+	page_ext = lookup_page_ext(page);
+
+	/*
+	 * The page was allocated before page_ext was initialized (which means
+	 * it's a kernel page) or it's allocated to the kernel, so nothing to
+	 * do.
+	 */
+	if (!page_ext->inited ||
+	    test_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags))
+		return;
+
+	spin_lock_irqsave(&page_ext->maplock, flags);
+
+	/*
+	 * The page was previously allocated to user space, so map it back
+	 * into the kernel. No TLB flush required.
+	 */
+	if ((atomic_inc_return(&page_ext->mapcount) == 1) &&
+	    test_and_clear_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags))
+		set_kpte(page, (unsigned long)kaddr, __pgprot(__PAGE_KERNEL));
+
+	spin_unlock_irqrestore(&page_ext->maplock, flags);
+}
+EXPORT_SYMBOL(xpfo_kmap);
+
+void xpfo_kunmap(void *kaddr, struct page *page)
+{
+	struct page_ext *page_ext;
+	unsigned long flags;
+
+	if (!static_branch_unlikely(&xpfo_inited))
+		return;
+
+	page_ext = lookup_page_ext(page);
+
+	/*
+	 * The page was allocated before page_ext was initialized (which means
+	 * it's a kernel page) or it's allocated to the kernel, so nothing to
+	 * do.
+	 */
+	if (!page_ext->inited ||
+	    test_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags))
+		return;
+
+	spin_lock_irqsave(&page_ext->maplock, flags);
+
+	/*
+	 * The page is to be allocated back to user space, so unmap it from the
+	 * kernel, flush the TLB and tag it as a user page.
+	 */
+	if (atomic_dec_return(&page_ext->mapcount) == 0) {
+		BUG_ON(test_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags));
+		set_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags);
+		set_kpte(page, (unsigned long)kaddr, __pgprot(0));
+		__flush_tlb_one((unsigned long)kaddr);
+	}
+
+	spin_unlock_irqrestore(&page_ext->maplock, flags);
+}
+EXPORT_SYMBOL(xpfo_kunmap);
+
+inline bool xpfo_page_is_unmapped(struct page *page)
+{
+	if (!static_branch_unlikely(&xpfo_inited))
+		return false;
+
+	return test_bit(PAGE_EXT_XPFO_UNMAPPED, &lookup_page_ext(page)->flags);
+}
diff --git a/security/Kconfig b/security/Kconfig
index da10d9b573a4..1eac37a9bec2 100644
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -6,6 +6,26 @@ menu "Security options"
 
 source security/keys/Kconfig
 
+config ARCH_SUPPORTS_XPFO
+	bool
+
+config XPFO
+	bool "Enable eXclusive Page Frame Ownership (XPFO)"
+	default n
+	depends on DEBUG_KERNEL && ARCH_SUPPORTS_XPFO
+	select DEBUG_TLBFLUSH
+	select PAGE_EXTENSION
+	help
+	  This option offers protection against 'ret2dir' kernel attacks.
+	  When enabled, every time a page frame is allocated to user space, it
+	  is unmapped from the direct mapped RAM region in kernel space
+	  (physmap). Similarly, when a page frame is freed/reclaimed, it is
+	  mapped back to physmap.
+
+	  There is a slight performance impact when this option is enabled.
+
+	  If in doubt, say "N".
+
 config SECURITY_DMESG_RESTRICT
 	bool "Restrict unprivileged access to the kernel syslog"
 	default n
-- 
2.9.3

* [kernel-hardening] [RFC PATCH v2 2/3] xpfo: Only put previous userspace pages into the hot cache
  2016-09-02 11:39 ` [kernel-hardening] [RFC PATCH v2 0/3] Add support for eXclusive Page Frame Ownership (XPFO) Juerg Haefliger
  2016-09-02 11:39   ` [kernel-hardening] [RFC PATCH v2 1/3] " Juerg Haefliger
@ 2016-09-02 11:39   ` Juerg Haefliger
  2016-09-02 20:39     ` [kernel-hardening] " Dave Hansen
  2016-09-02 11:39   ` [kernel-hardening] [RFC PATCH v2 3/3] block: Always use a bounce buffer when XPFO is enabled Juerg Haefliger
  2016-09-14  7:18   ` [kernel-hardening] [RFC PATCH v2 0/3] Add support for eXclusive Page Frame Ownership (XPFO) Juerg Haefliger
  3 siblings, 1 reply; 30+ messages in thread
From: Juerg Haefliger @ 2016-09-02 11:39 UTC (permalink / raw)
  To: linux-kernel, linux-mm, kernel-hardening, linux-x86_64
  Cc: juerg.haefliger, vpk

Allocating a page to userspace that was previously allocated to the
kernel requires an expensive TLB shootdown. To minimize this, we only
put non-kernel pages into the hot cache to favor their allocation.

Signed-off-by: Juerg Haefliger <juerg.haefliger@hpe.com>
---
 include/linux/xpfo.h | 2 ++
 mm/page_alloc.c      | 8 +++++++-
 mm/xpfo.c            | 8 ++++++++
 3 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/include/linux/xpfo.h b/include/linux/xpfo.h
index 77187578ca33..077d1cfadfa2 100644
--- a/include/linux/xpfo.h
+++ b/include/linux/xpfo.h
@@ -24,6 +24,7 @@ extern void xpfo_alloc_page(struct page *page, int order, gfp_t gfp);
 extern void xpfo_free_page(struct page *page, int order);
 
 extern bool xpfo_page_is_unmapped(struct page *page);
+extern bool xpfo_page_is_kernel(struct page *page);
 
 #else /* !CONFIG_XPFO */
 
@@ -33,6 +34,7 @@ static inline void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) { }
 static inline void xpfo_free_page(struct page *page, int order) { }
 
 static inline bool xpfo_page_is_unmapped(struct page *page) { return false; }
+static inline bool xpfo_page_is_kernel(struct page *page) { return false; }
 
 #endif /* CONFIG_XPFO */
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0241c8a7e72a..83404b41e52d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2421,7 +2421,13 @@ void free_hot_cold_page(struct page *page, bool cold)
 	}
 
 	pcp = &this_cpu_ptr(zone->pageset)->pcp;
-	if (!cold)
+	/*
+	 * XPFO: Allocating a page to userspace that was previously allocated
+	 * to the kernel requires an expensive TLB shootdown. To minimize this,
+	 * we only put non-kernel pages into the hot cache to favor their
+	 * allocation.
+	 */
+	if (!cold && !xpfo_page_is_kernel(page))
 		list_add(&page->lru, &pcp->lists[migratetype]);
 	else
 		list_add_tail(&page->lru, &pcp->lists[migratetype]);
diff --git a/mm/xpfo.c b/mm/xpfo.c
index ddb1be05485d..f8dffda0c961 100644
--- a/mm/xpfo.c
+++ b/mm/xpfo.c
@@ -203,3 +203,11 @@ inline bool xpfo_page_is_unmapped(struct page *page)
 
 	return test_bit(PAGE_EXT_XPFO_UNMAPPED, &lookup_page_ext(page)->flags);
 }
+
+inline bool xpfo_page_is_kernel(struct page *page)
+{
+	if (!static_branch_unlikely(&xpfo_inited))
+		return false;
+
+	return test_bit(PAGE_EXT_XPFO_KERNEL, &lookup_page_ext(page)->flags);
+}
-- 
2.9.3

* [kernel-hardening] Re: [RFC PATCH v2 2/3] xpfo: Only put previous userspace pages into the hot cache
  2016-09-02 11:39   ` [kernel-hardening] [RFC PATCH v2 2/3] xpfo: Only put previous userspace pages into the hot cache Juerg Haefliger
@ 2016-09-02 20:39     ` Dave Hansen
  2016-09-05 11:54       ` Juerg Haefliger
  0 siblings, 1 reply; 30+ messages in thread
From: Dave Hansen @ 2016-09-02 20:39 UTC (permalink / raw)
  To: Juerg Haefliger, linux-kernel, linux-mm, kernel-hardening,
	linux-x86_64; +Cc: vpk

On 09/02/2016 04:39 AM, Juerg Haefliger wrote:
> Allocating a page to userspace that was previously allocated to the
> kernel requires an expensive TLB shootdown. To minimize this, we only
> put non-kernel pages into the hot cache to favor their allocation.

But kernel allocations do allocate from these pools, right?  Does this
just mean that kernel allocations usually have to pay the penalty to
convert a page?

So, what's the logic here?  You're assuming that order-0 kernel
allocations are more rare than allocations for userspace?

* [kernel-hardening] Re: [RFC PATCH v2 2/3] xpfo: Only put previous userspace pages into the hot cache
  2016-09-02 20:39     ` [kernel-hardening] " Dave Hansen
@ 2016-09-05 11:54       ` Juerg Haefliger
  0 siblings, 0 replies; 30+ messages in thread
From: Juerg Haefliger @ 2016-09-05 11:54 UTC (permalink / raw)
  To: Dave Hansen, linux-kernel, linux-mm, kernel-hardening,
	linux-x86_64; +Cc: vpk


On 09/02/2016 10:39 PM, Dave Hansen wrote:
> On 09/02/2016 04:39 AM, Juerg Haefliger wrote:
>> Allocating a page to userspace that was previously allocated to the
>> kernel requires an expensive TLB shootdown. To minimize this, we only
>> put non-kernel pages into the hot cache to favor their allocation.
> 
> But kernel allocations do allocate from these pools, right?

Yes.


> Does this
> just mean that kernel allocations usually have to pay the penalty to
> convert a page?

Only pages that are allocated for userspace ((gfp & GFP_HIGHUSER) == GFP_HIGHUSER) which were
previously allocated for the kernel ((gfp & GFP_HIGHUSER) != GFP_HIGHUSER) have to pay the penalty.
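
As a sketch of that test: a mask comparison of this kind treats an
allocation as a userspace one only if every bit of the mask is set. The
TOY_GFP_* values and toy_is_user_alloc() below are made up for
illustration and do not match the real GFP_* definitions. Note that in C
'==' binds tighter than '&', so the parentheses around the AND are
required.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for illustration; the real GFP_* values differ. */
#define TOY_GFP_IO        0x01u
#define TOY_GFP_FS        0x02u
#define TOY_GFP_HARDWALL  0x04u
#define TOY_GFP_HIGHMEM   0x08u
#define TOY_GFP_KERNEL    (TOY_GFP_IO | TOY_GFP_FS)
#define TOY_GFP_HIGHUSER  (TOY_GFP_IO | TOY_GFP_FS | \
			   TOY_GFP_HARDWALL | TOY_GFP_HIGHMEM)

/* True only if all bits of the user mask are present in gfp. */
static bool toy_is_user_alloc(unsigned int gfp)
{
	return (gfp & TOY_GFP_HIGHUSER) == TOY_GFP_HIGHUSER;
}
```

A request that carries only the toy kernel bits fails the test, while one
that carries the full user mask plus unrelated extra bits still passes.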


> So, what's the logic here?  You're assuming that order-0 kernel
> allocations are more rare than allocations for userspace?

The logic is to put reclaimed kernel pages into the cold cache to postpone their allocation as long
as possible to minimize (potential) TLB flushes.
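
A rough userspace sketch of that placement policy, assuming a single free
list where allocation always takes from the head: struct toy_list and the
toy_* helpers are hypothetical stand-ins, not the per-cpu page lists
themselves. Only the condition in toy_free_page() mirrors the one added
to free_hot_cold_page() by this patch.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_CAP 8

/* Toy free list of page ids; allocation takes from index 0 (the head). */
struct toy_list {
	int ids[TOY_CAP];
	int len;
};

static void toy_add_head(struct toy_list *l, int id)
{
	assert(l->len < TOY_CAP);
	for (int i = l->len; i > 0; i--)
		l->ids[i] = l->ids[i - 1];
	l->ids[0] = id;
	l->len++;
}

static void toy_add_tail(struct toy_list *l, int id)
{
	assert(l->len < TOY_CAP);
	l->ids[l->len++] = id;
}

/* Only hot, non-kernel pages go to the head; everything else to the tail. */
static void toy_free_page(struct toy_list *l, int id, bool cold,
			  bool was_kernel)
{
	if (!cold && !was_kernel)
		toy_add_head(l, id);
	else
		toy_add_tail(l, id);
}

/* Remove and return the page id at the head of the list. */
static int toy_alloc_page(struct toy_list *l)
{
	assert(l->len > 0);
	int id = l->ids[0];

	for (int i = 1; i < l->len; i++)
		l->ids[i - 1] = l->ids[i];
	l->len--;
	return id;
}
```

In this model a freed former-user page is reused before any former-kernel
page that is already on the list, so the kernel-to-user conversion and its
TLB flush are put off for as long as other pages are available.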

...Juerg



* [kernel-hardening] [RFC PATCH v2 3/3] block: Always use a bounce buffer when XPFO is enabled
  2016-09-02 11:39 ` [kernel-hardening] [RFC PATCH v2 0/3] Add support for eXclusive Page Frame Ownership (XPFO) Juerg Haefliger
  2016-09-02 11:39   ` [kernel-hardening] [RFC PATCH v2 1/3] " Juerg Haefliger
  2016-09-02 11:39   ` [kernel-hardening] [RFC PATCH v2 2/3] xpfo: Only put previous userspace pages into the hot cache Juerg Haefliger
@ 2016-09-02 11:39   ` Juerg Haefliger
  2016-09-14  7:18   ` [kernel-hardening] [RFC PATCH v2 0/3] Add support for eXclusive Page Frame Ownership (XPFO) Juerg Haefliger
  3 siblings, 0 replies; 30+ messages in thread
From: Juerg Haefliger @ 2016-09-02 11:39 UTC (permalink / raw)
  To: linux-kernel, linux-mm, kernel-hardening, linux-x86_64
  Cc: juerg.haefliger, vpk

This is a temporary hack to prevent the use of bio_map_user_iov()
which causes XPFO page faults.

Signed-off-by: Juerg Haefliger <juerg.haefliger@hpe.com>
---
 block/blk-map.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/blk-map.c b/block/blk-map.c
index b8657fa8dc9a..e889dbfee6fb 100644
--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -52,7 +52,7 @@ static int __blk_rq_map_user_iov(struct request *rq,
 	struct bio *bio, *orig_bio;
 	int ret;
 
-	if (copy)
+	if (copy || IS_ENABLED(CONFIG_XPFO))
 		bio = bio_copy_user_iov(q, map_data, iter, gfp_mask);
 	else
 		bio = bio_map_user_iov(q, iter, gfp_mask);
-- 
2.9.3

* [kernel-hardening] [RFC PATCH v2 0/3] Add support for eXclusive Page Frame Ownership (XPFO)
  2016-09-02 11:39 ` [kernel-hardening] [RFC PATCH v2 0/3] Add support for eXclusive Page Frame Ownership (XPFO) Juerg Haefliger
                     ` (2 preceding siblings ...)
  2016-09-02 11:39   ` [kernel-hardening] [RFC PATCH v2 3/3] block: Always use a bounce buffer when XPFO is enabled Juerg Haefliger
@ 2016-09-14  7:18   ` Juerg Haefliger
  2016-09-14  7:18     ` [kernel-hardening] [RFC PATCH v2 1/3] " Juerg Haefliger
                       ` (5 more replies)
  3 siblings, 6 replies; 30+ messages in thread
From: Juerg Haefliger @ 2016-09-14  7:18 UTC (permalink / raw)
  To: linux-kernel, linux-mm, kernel-hardening, linux-x86_64
  Cc: juerg.haefliger, vpk

Changes from:
  v1 -> v2:
    - Moved the code from arch/x86/mm/ to mm/ since it's (mostly)
      arch-agnostic.
    - Moved the config to the generic layer and added ARCH_SUPPORTS_XPFO
      for x86.
    - Use page_ext for the additional per-page data.
    - Removed the clearing of pages. This can be accomplished by using
      PAGE_POISONING.
    - Split up the patch into multiple patches.
    - Fixed additional issues identified by reviewers.

This patch series adds support for XPFO which protects against 'ret2dir'
kernel attacks. The basic idea is to enforce exclusive ownership of page
frames by either the kernel or userspace, unless explicitly requested by
the kernel. Whenever a page destined for userspace is allocated, it is
unmapped from physmap (the kernel's direct mapping of physical memory).
When such a page is reclaimed from userspace, it is mapped back to physmap.

Additional fields in the page_ext struct are used for XPFO housekeeping:
two flags that distinguish user from kernel pages and tag unmapped pages,
a reference counter that balances kmap/kunmap operations, and a lock that
serializes access to the XPFO fields.

Known issues/limitations:
  - Only supports x86-64 (for now)
  - Only supports 4k pages (for now)
  - There are most likely some legitimate use cases where the kernel needs
    to access userspace pages; these still need to be made XPFO-aware
  - Performance penalty

Reference paper by the original patch authors:
  http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf

Juerg Haefliger (3):
  Add support for eXclusive Page Frame Ownership (XPFO)
  xpfo: Only put previous userspace pages into the hot cache
  block: Always use a bounce buffer when XPFO is enabled

 arch/x86/Kconfig         |   3 +-
 arch/x86/mm/init.c       |   2 +-
 block/blk-map.c          |   2 +-
 include/linux/highmem.h  |  15 +++-
 include/linux/page_ext.h |   7 ++
 include/linux/xpfo.h     |  41 +++++++++
 lib/swiotlb.c            |   3 +-
 mm/Makefile              |   1 +
 mm/page_alloc.c          |  10 ++-
 mm/page_ext.c            |   4 +
 mm/xpfo.c                | 213 +++++++++++++++++++++++++++++++++++++++++++++++
 security/Kconfig         |  20 +++++
 12 files changed, 314 insertions(+), 7 deletions(-)
 create mode 100644 include/linux/xpfo.h
 create mode 100644 mm/xpfo.c

-- 
2.9.3

* [kernel-hardening] [RFC PATCH v2 1/3] Add support for eXclusive Page Frame Ownership (XPFO)
  2016-09-14  7:18   ` [kernel-hardening] [RFC PATCH v2 0/3] Add support for eXclusive Page Frame Ownership (XPFO) Juerg Haefliger
@ 2016-09-14  7:18     ` Juerg Haefliger
  2016-09-14  7:19     ` [kernel-hardening] [RFC PATCH v2 2/3] xpfo: Only put previous userspace pages into the hot cache Juerg Haefliger
                       ` (4 subsequent siblings)
  5 siblings, 0 replies; 30+ messages in thread
From: Juerg Haefliger @ 2016-09-14  7:18 UTC (permalink / raw)
  To: linux-kernel, linux-mm, kernel-hardening, linux-x86_64
  Cc: juerg.haefliger, vpk

This patch adds support for XPFO which protects against 'ret2dir' kernel
attacks. The basic idea is to enforce exclusive ownership of page frames
by either the kernel or userspace, unless explicitly requested by the
kernel. Whenever a page destined for userspace is allocated, it is
unmapped from physmap (the kernel's direct mapping of physical memory).
When such a page is reclaimed from userspace, it is mapped back to physmap.

Additional fields in the page_ext struct are used for XPFO housekeeping:
two flags that distinguish user from kernel pages and tag unmapped pages,
a reference counter that balances kmap/kunmap operations, and a lock that
serializes access to the XPFO fields.

Known issues/limitations:
  - Only supports x86-64 (for now)
  - Only supports 4k pages (for now)
  - There are most likely some legitimate use cases where the kernel needs
    to access userspace pages; these still need to be made XPFO-aware
  - Performance penalty

Reference paper by the original patch authors:
  http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf

Suggested-by: Vasileios P. Kemerlis <vpk@cs.columbia.edu>
Signed-off-by: Juerg Haefliger <juerg.haefliger@hpe.com>
---
 arch/x86/Kconfig         |   3 +-
 arch/x86/mm/init.c       |   2 +-
 include/linux/highmem.h  |  15 +++-
 include/linux/page_ext.h |   7 ++
 include/linux/xpfo.h     |  39 +++++++++
 lib/swiotlb.c            |   3 +-
 mm/Makefile              |   1 +
 mm/page_alloc.c          |   2 +
 mm/page_ext.c            |   4 +
 mm/xpfo.c                | 205 +++++++++++++++++++++++++++++++++++++++++++++++
 security/Kconfig         |  20 +++++
 11 files changed, 296 insertions(+), 5 deletions(-)
 create mode 100644 include/linux/xpfo.h
 create mode 100644 mm/xpfo.c

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index c580d8c33562..dc5604a710c6 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -165,6 +165,7 @@ config X86
 	select HAVE_STACK_VALIDATION		if X86_64
 	select ARCH_USES_HIGH_VMA_FLAGS		if X86_INTEL_MEMORY_PROTECTION_KEYS
 	select ARCH_HAS_PKEYS			if X86_INTEL_MEMORY_PROTECTION_KEYS
+	select ARCH_SUPPORTS_XPFO		if X86_64
 
 config INSTRUCTION_DECODER
 	def_bool y
@@ -1350,7 +1351,7 @@ config ARCH_DMA_ADDR_T_64BIT
 
 config X86_DIRECT_GBPAGES
 	def_bool y
-	depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK
+	depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK && !XPFO
 	---help---
 	  Certain kernel features effectively disable kernel
 	  linear 1 GB mappings (even if the CPU otherwise
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index d28a2d741f9e..426427b54639 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -161,7 +161,7 @@ static int page_size_mask;
 
 static void __init probe_page_size_mask(void)
 {
-#if !defined(CONFIG_KMEMCHECK)
+#if !defined(CONFIG_KMEMCHECK) && !defined(CONFIG_XPFO)
 	/*
 	 * For CONFIG_KMEMCHECK or pagealloc debugging, identity mapping will
 	 * use small pages.
diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index bb3f3297062a..7a17c166532f 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -7,6 +7,7 @@
 #include <linux/mm.h>
 #include <linux/uaccess.h>
 #include <linux/hardirq.h>
+#include <linux/xpfo.h>
 
 #include <asm/cacheflush.h>
 
@@ -55,24 +56,34 @@ static inline struct page *kmap_to_page(void *addr)
 #ifndef ARCH_HAS_KMAP
 static inline void *kmap(struct page *page)
 {
+	void *kaddr;
+
 	might_sleep();
-	return page_address(page);
+	kaddr = page_address(page);
+	xpfo_kmap(kaddr, page);
+	return kaddr;
 }
 
 static inline void kunmap(struct page *page)
 {
+	xpfo_kunmap(page_address(page), page);
 }
 
 static inline void *kmap_atomic(struct page *page)
 {
+	void *kaddr;
+
 	preempt_disable();
 	pagefault_disable();
-	return page_address(page);
+	kaddr = page_address(page);
+	xpfo_kmap(kaddr, page);
+	return kaddr;
 }
 #define kmap_atomic_prot(page, prot)	kmap_atomic(page)
 
 static inline void __kunmap_atomic(void *addr)
 {
+	xpfo_kunmap(addr, virt_to_page(addr));
 	pagefault_enable();
 	preempt_enable();
 }
diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h
index 03f2a3e7d76d..fdf63dcc399e 100644
--- a/include/linux/page_ext.h
+++ b/include/linux/page_ext.h
@@ -27,6 +27,8 @@ enum page_ext_flags {
 	PAGE_EXT_DEBUG_POISON,		/* Page is poisoned */
 	PAGE_EXT_DEBUG_GUARD,
 	PAGE_EXT_OWNER,
+	PAGE_EXT_XPFO_KERNEL,		/* Page is a kernel page */
+	PAGE_EXT_XPFO_UNMAPPED,		/* Page is unmapped */
 #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT)
 	PAGE_EXT_YOUNG,
 	PAGE_EXT_IDLE,
@@ -48,6 +50,11 @@ struct page_ext {
 	int last_migrate_reason;
 	depot_stack_handle_t handle;
 #endif
+#ifdef CONFIG_XPFO
+	int inited;		/* Map counter and lock initialized */
+	atomic_t mapcount;	/* Counter for balancing map/unmap requests */
+	spinlock_t maplock;	/* Lock to serialize map/unmap requests */
+#endif
 };
 
 extern void pgdat_page_ext_init(struct pglist_data *pgdat);
diff --git a/include/linux/xpfo.h b/include/linux/xpfo.h
new file mode 100644
index 000000000000..77187578ca33
--- /dev/null
+++ b/include/linux/xpfo.h
@@ -0,0 +1,39 @@
+/*
+ * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P.
+ * Copyright (C) 2016 Brown University. All rights reserved.
+ *
+ * Authors:
+ *   Juerg Haefliger <juerg.haefliger@hpe.com>
+ *   Vasileios P. Kemerlis <vpk@cs.brown.edu>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ */
+
+#ifndef _LINUX_XPFO_H
+#define _LINUX_XPFO_H
+
+#ifdef CONFIG_XPFO
+
+extern struct page_ext_operations page_xpfo_ops;
+
+extern void xpfo_kmap(void *kaddr, struct page *page);
+extern void xpfo_kunmap(void *kaddr, struct page *page);
+extern void xpfo_alloc_page(struct page *page, int order, gfp_t gfp);
+extern void xpfo_free_page(struct page *page, int order);
+
+extern bool xpfo_page_is_unmapped(struct page *page);
+
+#else /* !CONFIG_XPFO */
+
+static inline void xpfo_kmap(void *kaddr, struct page *page) { }
+static inline void xpfo_kunmap(void *kaddr, struct page *page) { }
+static inline void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) { }
+static inline void xpfo_free_page(struct page *page, int order) { }
+
+static inline bool xpfo_page_is_unmapped(struct page *page) { return false; }
+
+#endif /* CONFIG_XPFO */
+
+#endif /* _LINUX_XPFO_H */
diff --git a/lib/swiotlb.c b/lib/swiotlb.c
index 22e13a0e19d7..455eff44604e 100644
--- a/lib/swiotlb.c
+++ b/lib/swiotlb.c
@@ -390,8 +390,9 @@ static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr,
 {
 	unsigned long pfn = PFN_DOWN(orig_addr);
 	unsigned char *vaddr = phys_to_virt(tlb_addr);
+	struct page *page = pfn_to_page(pfn);
 
-	if (PageHighMem(pfn_to_page(pfn))) {
+	if (PageHighMem(page) || xpfo_page_is_unmapped(page)) {
 		/* The buffer does not have a mapping.  Map it in and copy */
 		unsigned int offset = orig_addr & ~PAGE_MASK;
 		char *buffer;
diff --git a/mm/Makefile b/mm/Makefile
index 2ca1faf3fa09..e6f8894423da 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -103,3 +103,4 @@ obj-$(CONFIG_IDLE_PAGE_TRACKING) += page_idle.o
 obj-$(CONFIG_FRAME_VECTOR) += frame_vector.o
 obj-$(CONFIG_DEBUG_PAGE_REF) += debug_page_ref.o
 obj-$(CONFIG_HARDENED_USERCOPY) += usercopy.o
+obj-$(CONFIG_XPFO) += xpfo.o
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3fbe73a6fe4b..0241c8a7e72a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1029,6 +1029,7 @@ static __always_inline bool free_pages_prepare(struct page *page,
 	kernel_poison_pages(page, 1 << order, 0);
 	kernel_map_pages(page, 1 << order, 0);
 	kasan_free_pages(page, order);
+	xpfo_free_page(page, order);
 
 	return true;
 }
@@ -1726,6 +1727,7 @@ inline void post_alloc_hook(struct page *page, unsigned int order,
 	kernel_map_pages(page, 1 << order, 1);
 	kernel_poison_pages(page, 1 << order, 1);
 	kasan_alloc_pages(page, order);
+	xpfo_alloc_page(page, order, gfp_flags);
 	set_page_owner(page, order, gfp_flags);
 }
 
diff --git a/mm/page_ext.c b/mm/page_ext.c
index 44a4c029c8e7..1cd7d7f460cc 100644
--- a/mm/page_ext.c
+++ b/mm/page_ext.c
@@ -7,6 +7,7 @@
 #include <linux/kmemleak.h>
 #include <linux/page_owner.h>
 #include <linux/page_idle.h>
+#include <linux/xpfo.h>
 
 /*
  * struct page extension
@@ -63,6 +64,9 @@ static struct page_ext_operations *page_ext_ops[] = {
 #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT)
 	&page_idle_ops,
 #endif
+#ifdef CONFIG_XPFO
+	&page_xpfo_ops,
+#endif
 };
 
 static unsigned long total_usage;
diff --git a/mm/xpfo.c b/mm/xpfo.c
new file mode 100644
index 000000000000..ddb1be05485d
--- /dev/null
+++ b/mm/xpfo.c
@@ -0,0 +1,205 @@
+/*
+ * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P.
+ * Copyright (C) 2016 Brown University. All rights reserved.
+ *
+ * Authors:
+ *   Juerg Haefliger <juerg.haefliger@hpe.com>
+ *   Vasileios P. Kemerlis <vpk@cs.brown.edu>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ */
+
+#include <linux/mm.h>
+#include <linux/module.h>
+#include <linux/page_ext.h>
+#include <linux/xpfo.h>
+
+#include <asm/tlbflush.h>
+
+DEFINE_STATIC_KEY_FALSE(xpfo_inited);
+
+static bool need_xpfo(void)
+{
+	return true;
+}
+
+static void init_xpfo(void)
+{
+	printk(KERN_INFO "XPFO enabled\n");
+	static_branch_enable(&xpfo_inited);
+}
+
+struct page_ext_operations page_xpfo_ops = {
+	.need = need_xpfo,
+	.init = init_xpfo,
+};
+
+/*
+ * Update a single kernel page table entry
+ */
+static inline void set_kpte(struct page *page, unsigned long kaddr,
+			    pgprot_t prot) {
+	unsigned int level;
+	pte_t *kpte = lookup_address(kaddr, &level);
+
+	/* We only support 4k pages for now */
+	BUG_ON(!kpte || level != PG_LEVEL_4K);
+
+	set_pte_atomic(kpte, pfn_pte(page_to_pfn(page), canon_pgprot(prot)));
+}
+
+void xpfo_alloc_page(struct page *page, int order, gfp_t gfp)
+{
+	int i, flush_tlb = 0;
+	struct page_ext *page_ext;
+	unsigned long kaddr;
+
+	if (!static_branch_unlikely(&xpfo_inited))
+		return;
+
+	for (i = 0; i < (1 << order); i++)  {
+		page_ext = lookup_page_ext(page + i);
+
+		BUG_ON(test_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags));
+
+		/* Initialize the map lock and map counter */
+		if (!page_ext->inited) {
+			spin_lock_init(&page_ext->maplock);
+			atomic_set(&page_ext->mapcount, 0);
+			page_ext->inited = 1;
+		}
+		BUG_ON(atomic_read(&page_ext->mapcount));
+
+		if ((gfp & GFP_HIGHUSER) == GFP_HIGHUSER) {
+			/*
+			 * Flush the TLB if the page was previously allocated
+			 * to the kernel.
+			 */
+			if (test_and_clear_bit(PAGE_EXT_XPFO_KERNEL,
+					       &page_ext->flags))
+				flush_tlb = 1;
+		} else {
+			/* Tag the page as a kernel page */
+			set_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags);
+		}
+	}
+
+	if (flush_tlb) {
+		kaddr = (unsigned long)page_address(page);
+		flush_tlb_kernel_range(kaddr, kaddr + (1 << order) *
+				       PAGE_SIZE);
+	}
+}
+
+void xpfo_free_page(struct page *page, int order)
+{
+	int i;
+	struct page_ext *page_ext;
+	unsigned long kaddr;
+
+	if (!static_branch_unlikely(&xpfo_inited))
+		return;
+
+	for (i = 0; i < (1 << order); i++) {
+		page_ext = lookup_page_ext(page + i);
+
+		if (!page_ext->inited) {
+			/*
+			 * The page was allocated before page_ext was
+			 * initialized, so it is a kernel page and it needs to
+			 * be tagged accordingly.
+			 */
+			set_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags);
+			continue;
+		}
+
+		/*
+		 * Map the page back into the kernel if it was previously
+		 * allocated to user space.
+		 */
+		if (test_and_clear_bit(PAGE_EXT_XPFO_UNMAPPED,
+				       &page_ext->flags)) {
+			kaddr = (unsigned long)page_address(page + i);
+			set_kpte(page + i,  kaddr, __pgprot(__PAGE_KERNEL));
+		}
+	}
+}
+
+void xpfo_kmap(void *kaddr, struct page *page)
+{
+	struct page_ext *page_ext;
+	unsigned long flags;
+
+	if (!static_branch_unlikely(&xpfo_inited))
+		return;
+
+	page_ext = lookup_page_ext(page);
+
+	/*
+	 * The page was allocated before page_ext was initialized (which means
+	 * it's a kernel page) or it's allocated to the kernel, so nothing to
+	 * do.
+	 */
+	if (!page_ext->inited ||
+	    test_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags))
+		return;
+
+	spin_lock_irqsave(&page_ext->maplock, flags);
+
+	/*
+	 * The page was previously allocated to user space, so map it back
+	 * into the kernel. No TLB flush required.
+	 */
+	if ((atomic_inc_return(&page_ext->mapcount) == 1) &&
+	    test_and_clear_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags))
+		set_kpte(page, (unsigned long)kaddr, __pgprot(__PAGE_KERNEL));
+
+	spin_unlock_irqrestore(&page_ext->maplock, flags);
+}
+EXPORT_SYMBOL(xpfo_kmap);
+
+void xpfo_kunmap(void *kaddr, struct page *page)
+{
+	struct page_ext *page_ext;
+	unsigned long flags;
+
+	if (!static_branch_unlikely(&xpfo_inited))
+		return;
+
+	page_ext = lookup_page_ext(page);
+
+	/*
+	 * The page was allocated before page_ext was initialized (which means
+	 * it's a kernel page) or it's allocated to the kernel, so nothing to
+	 * do.
+	 */
+	if (!page_ext->inited ||
+	    test_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags))
+		return;
+
+	spin_lock_irqsave(&page_ext->maplock, flags);
+
+	/*
+	 * The page is to be allocated back to user space, so unmap it from the
+	 * kernel, flush the TLB and tag it as a user page.
+	 */
+	if (atomic_dec_return(&page_ext->mapcount) == 0) {
+		BUG_ON(test_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags));
+		set_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags);
+		set_kpte(page, (unsigned long)kaddr, __pgprot(0));
+		__flush_tlb_one((unsigned long)kaddr);
+	}
+
+	spin_unlock_irqrestore(&page_ext->maplock, flags);
+}
+EXPORT_SYMBOL(xpfo_kunmap);
+
+inline bool xpfo_page_is_unmapped(struct page *page)
+{
+	if (!static_branch_unlikely(&xpfo_inited))
+		return false;
+
+	return test_bit(PAGE_EXT_XPFO_UNMAPPED, &lookup_page_ext(page)->flags);
+}
diff --git a/security/Kconfig b/security/Kconfig
index da10d9b573a4..1eac37a9bec2 100644
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -6,6 +6,26 @@ menu "Security options"
 
 source security/keys/Kconfig
 
+config ARCH_SUPPORTS_XPFO
+	bool
+
+config XPFO
+	bool "Enable eXclusive Page Frame Ownership (XPFO)"
+	default n
+	depends on DEBUG_KERNEL && ARCH_SUPPORTS_XPFO
+	select DEBUG_TLBFLUSH
+	select PAGE_EXTENSION
+	help
+	  This option offers protection against 'ret2dir' kernel attacks.
+	  When enabled, every time a page frame is allocated to user space, it
+	  is unmapped from the direct mapped RAM region in kernel space
+	  (physmap). Similarly, when a page frame is freed/reclaimed, it is
+	  mapped back to physmap.
+
+	  There is a slight performance impact when this option is enabled.
+
+	  If in doubt, say "N".
+
 config SECURITY_DMESG_RESTRICT
 	bool "Restrict unprivileged access to the kernel syslog"
 	default n
-- 
2.9.3

* [kernel-hardening] [RFC PATCH v2 2/3] xpfo: Only put previous userspace pages into the hot cache
  2016-09-14  7:18   ` [kernel-hardening] [RFC PATCH v2 0/3] Add support for eXclusive Page Frame Ownership (XPFO) Juerg Haefliger
  2016-09-14  7:18     ` [kernel-hardening] [RFC PATCH v2 1/3] " Juerg Haefliger
@ 2016-09-14  7:19     ` Juerg Haefliger
  2016-09-14 14:33       ` Dave Hansen
  2016-09-14  7:19     ` [kernel-hardening] [RFC PATCH v2 3/3] block: Always use a bounce buffer when XPFO is enabled Juerg Haefliger
                       ` (3 subsequent siblings)
  5 siblings, 1 reply; 30+ messages in thread
From: Juerg Haefliger @ 2016-09-14  7:19 UTC (permalink / raw)
  To: linux-kernel, linux-mm, kernel-hardening, linux-x86_64
  Cc: juerg.haefliger, vpk

Allocating a page to userspace that was previously allocated to the
kernel requires an expensive TLB shootdown. To minimize this, we only
put non-kernel pages into the hot cache to favor their allocation.
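
As a small illustration (not part of the patch), the placement rule can
be written as a pure function. The enum and function names are
hypothetical; PCP_HEAD stands for list_add() (hot end) and PCP_TAIL for
list_add_tail() (cold end) in free_hot_cold_page():

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two ends of a per-cpu page list. */
enum pcp_end { PCP_HEAD, PCP_TAIL };

/*
 * A freed page goes to the hot end only if the caller did not ask for
 * cold placement and the page was not last owned by the kernel.
 */
static enum pcp_end pcp_placement(bool cold, bool page_is_kernel)
{
	if (!cold && !page_is_kernel)
		return PCP_HEAD;

	return PCP_TAIL;
}
```

Former kernel pages therefore always end up at the cold end, whatever
the caller requested.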

Signed-off-by: Juerg Haefliger <juerg.haefliger@hpe.com>
---
 include/linux/xpfo.h | 2 ++
 mm/page_alloc.c      | 8 +++++++-
 mm/xpfo.c            | 8 ++++++++
 3 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/include/linux/xpfo.h b/include/linux/xpfo.h
index 77187578ca33..077d1cfadfa2 100644
--- a/include/linux/xpfo.h
+++ b/include/linux/xpfo.h
@@ -24,6 +24,7 @@ extern void xpfo_alloc_page(struct page *page, int order, gfp_t gfp);
 extern void xpfo_free_page(struct page *page, int order);
 
 extern bool xpfo_page_is_unmapped(struct page *page);
+extern bool xpfo_page_is_kernel(struct page *page);
 
 #else /* !CONFIG_XPFO */
 
@@ -33,6 +34,7 @@ static inline void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) { }
 static inline void xpfo_free_page(struct page *page, int order) { }
 
 static inline bool xpfo_page_is_unmapped(struct page *page) { return false; }
+static inline bool xpfo_page_is_kernel(struct page *page) { return false; }
 
 #endif /* CONFIG_XPFO */
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0241c8a7e72a..83404b41e52d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2421,7 +2421,13 @@ void free_hot_cold_page(struct page *page, bool cold)
 	}
 
 	pcp = &this_cpu_ptr(zone->pageset)->pcp;
-	if (!cold)
+	/*
+	 * XPFO: Allocating a page to userspace that was previously allocated
+	 * to the kernel requires an expensive TLB shootdown. To minimize this,
+	 * we only put non-kernel pages into the hot cache to favor their
+	 * allocation.
+	 */
+	if (!cold && !xpfo_page_is_kernel(page))
 		list_add(&page->lru, &pcp->lists[migratetype]);
 	else
 		list_add_tail(&page->lru, &pcp->lists[migratetype]);
diff --git a/mm/xpfo.c b/mm/xpfo.c
index ddb1be05485d..f8dffda0c961 100644
--- a/mm/xpfo.c
+++ b/mm/xpfo.c
@@ -203,3 +203,11 @@ inline bool xpfo_page_is_unmapped(struct page *page)
 
 	return test_bit(PAGE_EXT_XPFO_UNMAPPED, &lookup_page_ext(page)->flags);
 }
+
+inline bool xpfo_page_is_kernel(struct page *page)
+{
+	if (!static_branch_unlikely(&xpfo_inited))
+		return false;
+
+	return test_bit(PAGE_EXT_XPFO_KERNEL, &lookup_page_ext(page)->flags);
+}
-- 
2.9.3

* Re: [kernel-hardening] [RFC PATCH v2 2/3] xpfo: Only put previous userspace pages into the hot cache
  2016-09-14  7:19     ` [kernel-hardening] [RFC PATCH v2 2/3] xpfo: Only put previous userspace pages into the hot cache Juerg Haefliger
@ 2016-09-14 14:33       ` Dave Hansen
  2016-09-14 14:40         ` Juerg Haefliger
  0 siblings, 1 reply; 30+ messages in thread
From: Dave Hansen @ 2016-09-14 14:33 UTC (permalink / raw)
  To: kernel-hardening, linux-kernel, linux-mm, linux-x86_64
  Cc: juerg.haefliger, vpk

On 09/14/2016 12:19 AM, Juerg Haefliger wrote:
> Allocating a page to userspace that was previously allocated to the
> kernel requires an expensive TLB shootdown. To minimize this, we only
> put non-kernel pages into the hot cache to favor their allocation.

Hi, I had some questions about this the last time you posted it.  Maybe
you want to address them now.

--

But kernel allocations do allocate from these pools, right?  Does this
just mean that kernel allocations usually have to pay the penalty to
convert a page?

So, what's the logic here?  You're assuming that order-0 kernel
allocations are more rare than allocations for userspace?

* Re: [kernel-hardening] [RFC PATCH v2 2/3] xpfo: Only put previous userspace pages into the hot cache
  2016-09-14 14:33       ` Dave Hansen
@ 2016-09-14 14:40         ` Juerg Haefliger
  2016-09-14 14:48           ` Dave Hansen
  0 siblings, 1 reply; 30+ messages in thread
From: Juerg Haefliger @ 2016-09-14 14:40 UTC (permalink / raw)
  To: Dave Hansen, kernel-hardening, linux-kernel, linux-mm,
	linux-x86_64; +Cc: vpk


Hi Dave,

On 09/14/2016 04:33 PM, Dave Hansen wrote:
> On 09/14/2016 12:19 AM, Juerg Haefliger wrote:
>> Allocating a page to userspace that was previously allocated to the
>> kernel requires an expensive TLB shootdown. To minimize this, we only
>> put non-kernel pages into the hot cache to favor their allocation.
> 
> Hi, I had some questions about this the last time you posted it.  Maybe
> you want to address them now.

I did reply: https://lkml.org/lkml/2016/9/5/249

...Juerg


> --
> 
> But kernel allocations do allocate from these pools, right?  Does this
> just mean that kernel allocations usually have to pay the penalty to
> convert a page?
> 
> So, what's the logic here?  You're assuming that order-0 kernel
> allocations are more rare than allocations for userspace?
> 



* Re: [kernel-hardening] [RFC PATCH v2 2/3] xpfo: Only put previous userspace pages into the hot cache
  2016-09-14 14:40         ` Juerg Haefliger
@ 2016-09-14 14:48           ` Dave Hansen
  2016-09-21  5:32             ` Juerg Haefliger
  0 siblings, 1 reply; 30+ messages in thread
From: Dave Hansen @ 2016-09-14 14:48 UTC (permalink / raw)
  To: Juerg Haefliger, kernel-hardening, linux-kernel, linux-mm,
	linux-x86_64; +Cc: vpk

> On 09/02/2016 10:39 PM, Dave Hansen wrote:
>> On 09/02/2016 04:39 AM, Juerg Haefliger wrote:
>> Does this
>> just mean that kernel allocations usually have to pay the penalty to
>> convert a page?
> 
> Only pages that are allocated for userspace ((gfp & GFP_HIGHUSER) == GFP_HIGHUSER) which were
> previously allocated for the kernel ((gfp & GFP_HIGHUSER) != GFP_HIGHUSER) have to pay the penalty.
> 
>> So, what's the logic here?  You're assuming that order-0 kernel
>> allocations are more rare than allocations for userspace?
> 
> The logic is to put reclaimed kernel pages into the cold cache to
> postpone their allocation as long as possible to minimize (potential)
> TLB flushes.

OK, but if we put them in the cold area but kernel allocations pull them
from the hot cache, aren't we virtually guaranteeing that kernel
allocations will have to do a TLB shootdown to convert a page?

It seems like you also need to convert all kernel allocations to pull
from the cold area.

* Re: [kernel-hardening] [RFC PATCH v2 2/3] xpfo: Only put previous userspace pages into the hot cache
  2016-09-14 14:48           ` Dave Hansen
@ 2016-09-21  5:32             ` Juerg Haefliger
  0 siblings, 0 replies; 30+ messages in thread
From: Juerg Haefliger @ 2016-09-21  5:32 UTC (permalink / raw)
  To: Dave Hansen, kernel-hardening, linux-kernel, linux-mm,
	linux-x86_64; +Cc: vpk


On 09/14/2016 04:48 PM, Dave Hansen wrote:
>> On 09/02/2016 10:39 PM, Dave Hansen wrote:
>>> On 09/02/2016 04:39 AM, Juerg Haefliger wrote:
>>> Does this
>>> just mean that kernel allocations usually have to pay the penalty to
>>> convert a page?
>>
>> Only pages that are allocated for userspace ((gfp & GFP_HIGHUSER) == GFP_HIGHUSER) which were
>> previously allocated for the kernel ((gfp & GFP_HIGHUSER) != GFP_HIGHUSER) have to pay the penalty.
>>
>>> So, what's the logic here?  You're assuming that order-0 kernel
>>> allocations are more rare than allocations for userspace?
>>
>> The logic is to put reclaimed kernel pages into the cold cache to
>> postpone their allocation as long as possible to minimize (potential)
>> TLB flushes.
> 
> OK, but if we put them in the cold area but kernel allocations pull them
> from the hot cache, aren't we virtually guaranteeing that kernel
> allocations will have to do a TLB shootdown to convert a page?

No. Allocations for the kernel never require a TLB shootdown. Only allocations for userspace do (and
only if the page was previously a kernel page).
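
A rough userspace model of that rule, following the flag handling in
xpfo_alloc_page() from patch 1/3. The names page_model and model_alloc
are hypothetical, and MODEL_GFP_HIGHUSER is a placeholder bit pattern,
not the kernel's actual GFP_HIGHUSER value:

```c
#include <stdbool.h>

/* Placeholder mask; the real GFP_HIGHUSER value differs. */
#define MODEL_GFP_HIGHUSER 0x7u

/* Hypothetical model of one page's ownership tag. */
struct page_model {
	bool kernel;	/* stands in for PAGE_EXT_XPFO_KERNEL */
};

/*
 * Returns true if this allocation would need a TLB flush: only when a
 * page last owned by the kernel is handed to userspace.
 */
static bool model_alloc(struct page_model *p, unsigned int gfp)
{
	if ((gfp & MODEL_GFP_HIGHUSER) == MODEL_GFP_HIGHUSER) {
		bool was_kernel = p->kernel;

		p->kernel = false;	/* like test_and_clear_bit() */
		return was_kernel;
	}

	p->kernel = true;		/* tag as a kernel page */
	return false;
}
```

Note the parentheses around the mask: in C, == binds tighter than &, so
the unparenthesized form would test something else.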


> It seems like you also need to convert all kernel allocations to pull
> from the cold area.

Kernel allocations can continue to pull from the hot cache. Maybe introduce another cache for the
userspace pages? But I'm not sure what other implications this might have.

...Juerg



* [kernel-hardening] [RFC PATCH v2 3/3] block: Always use a bounce buffer when XPFO is enabled
  2016-09-14  7:18   ` [kernel-hardening] [RFC PATCH v2 0/3] Add support for eXclusive Page Frame Ownership (XPFO) Juerg Haefliger
  2016-09-14  7:18     ` [kernel-hardening] [RFC PATCH v2 1/3] " Juerg Haefliger
  2016-09-14  7:19     ` [kernel-hardening] [RFC PATCH v2 2/3] xpfo: Only put previous userspace pages into the hot cache Juerg Haefliger
@ 2016-09-14  7:19     ` Juerg Haefliger
  2016-09-14  7:33       ` [kernel-hardening] " Christoph Hellwig
  2016-09-14  7:23     ` [kernel-hardening] Re: [RFC PATCH v2 0/3] Add support for eXclusive Page Frame Ownership (XPFO) Juerg Haefliger
                       ` (2 subsequent siblings)
  5 siblings, 1 reply; 30+ messages in thread
From: Juerg Haefliger @ 2016-09-14  7:19 UTC (permalink / raw)
  To: linux-kernel, linux-mm, kernel-hardening, linux-x86_64
  Cc: juerg.haefliger, vpk

This is a temporary hack to prevent the use of bio_map_user_iov()
which causes XPFO page faults.

Signed-off-by: Juerg Haefliger <juerg.haefliger@hpe.com>
---
 block/blk-map.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/blk-map.c b/block/blk-map.c
index b8657fa8dc9a..e889dbfee6fb 100644
--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -52,7 +52,7 @@ static int __blk_rq_map_user_iov(struct request *rq,
 	struct bio *bio, *orig_bio;
 	int ret;
 
-	if (copy)
+	if (copy || IS_ENABLED(CONFIG_XPFO))
 		bio = bio_copy_user_iov(q, map_data, iter, gfp_mask);
 	else
 		bio = bio_map_user_iov(q, iter, gfp_mask);
-- 
2.9.3

* [kernel-hardening] Re: [RFC PATCH v2 3/3] block: Always use a bounce buffer when XPFO is enabled
  2016-09-14  7:19     ` [kernel-hardening] [RFC PATCH v2 3/3] block: Always use a bounce buffer when XPFO is enabled Juerg Haefliger
@ 2016-09-14  7:33       ` Christoph Hellwig
  0 siblings, 0 replies; 30+ messages in thread
From: Christoph Hellwig @ 2016-09-14  7:33 UTC (permalink / raw)
  To: Juerg Haefliger
  Cc: linux-kernel, linux-mm, kernel-hardening, linux-x86_64, vpk

On Wed, Sep 14, 2016 at 09:19:01AM +0200, Juerg Haefliger wrote:
> This is a temporary hack to prevent the use of bio_map_user_iov()
> which causes XPFO page faults.
> 
> Signed-off-by: Juerg Haefliger <juerg.haefliger@hpe.com>

Sorry, but if your scheme doesn't support get_user_pages access to
user memory, it's a steaming pile of crap and entirely unacceptable.

* [kernel-hardening] Re: [RFC PATCH v2 0/3] Add support for eXclusive Page Frame Ownership (XPFO)
  2016-09-14  7:18   ` [kernel-hardening] [RFC PATCH v2 0/3] Add support for eXclusive Page Frame Ownership (XPFO) Juerg Haefliger
                       ` (2 preceding siblings ...)
  2016-09-14  7:19     ` [kernel-hardening] [RFC PATCH v2 3/3] block: Always use a bounce buffer when XPFO is enabled Juerg Haefliger
@ 2016-09-14  7:23     ` Juerg Haefliger
  2016-09-14  9:36     ` [kernel-hardening] " Mark Rutland
  2016-11-04 14:45     ` [kernel-hardening] [RFC PATCH v3 0/2] " Juerg Haefliger
  5 siblings, 0 replies; 30+ messages in thread
From: Juerg Haefliger @ 2016-09-14  7:23 UTC (permalink / raw)
  To: linux-kernel, linux-mm, kernel-hardening, linux-x86_64; +Cc: vpk


Resending to include the kernel-hardening list. Sorry, I wasn't subscribed with the correct email
address when I sent this the first time.

...Juerg

On 09/14/2016 09:18 AM, Juerg Haefliger wrote:
> Changes from:
>   v1 -> v2:
>     - Moved the code from arch/x86/mm/ to mm/ since it's (mostly)
>       arch-agnostic.
>     - Moved the config to the generic layer and added ARCH_SUPPORTS_XPFO
>       for x86.
>     - Use page_ext for the additional per-page data.
>     - Removed the clearing of pages. This can be accomplished by using
>       PAGE_POISONING.
>     - Split up the patch into multiple patches.
>     - Fixed additional issues identified by reviewers.
> 
> This patch series adds support for XPFO which protects against 'ret2dir'
> kernel attacks. The basic idea is to enforce exclusive ownership of page
> frames by either the kernel or userspace, unless explicitly requested by
> the kernel. Whenever a page destined for userspace is allocated, it is
> unmapped from physmap (the kernel's page table). When such a page is
> reclaimed from userspace, it is mapped back to physmap.
> 
> Additional fields in the page_ext struct are used for XPFO housekeeping.
> Specifically, two flags distinguish user from kernel pages and tag
> unmapped pages, a reference counter balances kmap/kunmap operations,
> and a lock serializes access to the XPFO fields.
> 
> Known issues/limitations:
>   - Only supports x86-64 (for now)
>   - Only supports 4k pages (for now)
>   - There are most likely some legitimate use cases where the kernel needs
>     to access userspace; these need to be made XPFO-aware
>   - Performance penalty
> 
> Reference paper by the original patch authors:
>   http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf
> 
> Juerg Haefliger (3):
>   Add support for eXclusive Page Frame Ownership (XPFO)
>   xpfo: Only put previous userspace pages into the hot cache
>   block: Always use a bounce buffer when XPFO is enabled
> 
>  arch/x86/Kconfig         |   3 +-
>  arch/x86/mm/init.c       |   2 +-
>  block/blk-map.c          |   2 +-
>  include/linux/highmem.h  |  15 +++-
>  include/linux/page_ext.h |   7 ++
>  include/linux/xpfo.h     |  41 +++++++++
>  lib/swiotlb.c            |   3 +-
>  mm/Makefile              |   1 +
>  mm/page_alloc.c          |  10 ++-
>  mm/page_ext.c            |   4 +
>  mm/xpfo.c                | 213 +++++++++++++++++++++++++++++++++++++++++++++++
>  security/Kconfig         |  20 +++++
>  12 files changed, 314 insertions(+), 7 deletions(-)
>  create mode 100644 include/linux/xpfo.h
>  create mode 100644 mm/xpfo.c
> 


-- 
Juerg Haefliger
Hewlett Packard Enterprise


* Re: [kernel-hardening] [RFC PATCH v2 0/3] Add support for eXclusive Page Frame Ownership (XPFO)
  2016-09-14  7:18   ` [kernel-hardening] [RFC PATCH v2 0/3] Add support for eXclusive Page Frame Ownership (XPFO) Juerg Haefliger
                       ` (3 preceding siblings ...)
  2016-09-14  7:23     ` [kernel-hardening] Re: [RFC PATCH v2 0/3] Add support for eXclusive Page Frame Ownership (XPFO) Juerg Haefliger
@ 2016-09-14  9:36     ` Mark Rutland
  2016-09-14  9:49       ` Mark Rutland
  2016-11-04 14:45     ` [kernel-hardening] [RFC PATCH v3 0/2] " Juerg Haefliger
  5 siblings, 1 reply; 30+ messages in thread
From: Mark Rutland @ 2016-09-14  9:36 UTC (permalink / raw)
  To: kernel-hardening
  Cc: linux-kernel, linux-mm, linux-x86_64, juerg.haefliger, vpk

Hi,

On Wed, Sep 14, 2016 at 09:18:58AM +0200, Juerg Haefliger wrote:

> This patch series adds support for XPFO which protects against 'ret2dir'
> kernel attacks. The basic idea is to enforce exclusive ownership of page
> frames by either the kernel or userspace, unless explicitly requested by
> the kernel. Whenever a page destined for userspace is allocated, it is
> unmapped from physmap (the kernel's page table). When such a page is
> reclaimed from userspace, it is mapped back to physmap.

> Known issues/limitations:
>   - Only supports x86-64 (for now)
>   - Only supports 4k pages (for now)
>   - There are most likely some legitimate use cases where the kernel needs
>     to access userspace; these need to be made XPFO-aware
>   - Performance penalty
> 
> Reference paper by the original patch authors:
>   http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf

Just to check, doesn't DEBUG_RODATA ensure that the linear mapping is
non-executable on x86_64 (as it does for arm64)?

For both arm64 and x86_64, DEBUG_RODATA is mandatory (or soon to be so).
Assuming that implies a lack of execute permission for x86_64, that
should provide a similar level of protection against erroneously
branching to addresses in the linear map, without the complexity and
overhead of mapping/unmapping pages.

So to me it looks like this approach may only be useful for
architectures without page-granular execute permission controls.

Is this also intended to protect against erroneous *data* accesses to
the linear map?

Am I missing something?

Thanks,
Mark.

* Re: [kernel-hardening] [RFC PATCH v2 0/3] Add support for eXclusive Page Frame Ownership (XPFO)
  2016-09-14  9:36     ` [kernel-hardening] " Mark Rutland
@ 2016-09-14  9:49       ` Mark Rutland
  0 siblings, 0 replies; 30+ messages in thread
From: Mark Rutland @ 2016-09-14  9:49 UTC (permalink / raw)
  To: kernel-hardening
  Cc: linux-kernel, linux-mm, linux-x86_64, juerg.haefliger, vpk

On Wed, Sep 14, 2016 at 10:36:34AM +0100, Mark Rutland wrote:
> On Wed, Sep 14, 2016 at 09:18:58AM +0200, Juerg Haefliger wrote:
> > This patch series adds support for XPFO which protects against 'ret2dir'
> > kernel attacks. The basic idea is to enforce exclusive ownership of page
> > frames by either the kernel or userspace, unless explicitly requested by
> > the kernel. Whenever a page destined for userspace is allocated, it is
> > unmapped from physmap (the kernel's page table). When such a page is
> > reclaimed from userspace, it is mapped back to physmap.

> > Reference paper by the original patch authors:
> >   http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf

> For both arm64 and x86_64, DEBUG_RODATA is mandatory (or soon to be so).
> Assuming that implies a lack of execute permission for x86_64, that
> should provide a similar level of protection against erroneously
> branching to addresses in the linear map, without the complexity and
> overhead of mapping/unmapping pages.
> 
> So to me it looks like this approach may only be useful for
> architectures without page-granular execute permission controls.
> 
> Is this also intended to protect against erroneous *data* accesses to
> the linear map?

Now that I read the paper more carefully, I can see that this is the
case, and this does catch issues which DEBUG_RODATA cannot.

Apologies for the noise.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [kernel-hardening] [RFC PATCH v3 0/2] Add support for eXclusive Page Frame Ownership (XPFO)
  2016-09-14  7:18   ` [kernel-hardening] [RFC PATCH v2 0/3] Add support for eXclusive Page Frame Ownership (XPFO) Juerg Haefliger
                       ` (4 preceding siblings ...)
  2016-09-14  9:36     ` [kernel-hardening] " Mark Rutland
@ 2016-11-04 14:45     ` Juerg Haefliger
  2016-11-04 14:45       ` [kernel-hardening] [RFC PATCH v3 1/2] " Juerg Haefliger
  2016-11-04 14:45       ` [kernel-hardening] [RFC PATCH v3 2/2] xpfo: Only put previous userspace pages into the hot cache Juerg Haefliger
  5 siblings, 2 replies; 30+ messages in thread
From: Juerg Haefliger @ 2016-11-04 14:45 UTC (permalink / raw)
  To: linux-kernel, linux-mm, kernel-hardening, linux-x86_64
  Cc: vpk, juerg.haefliger

Changes from:
  v2 -> v3:
    - Removed 'depends on DEBUG_KERNEL' and 'select DEBUG_TLBFLUSH'.
      These are left-overs from the original patch and are not required.
    - Make libata XPFO-aware, i.e., properly handle pages that were
      unmapped by XPFO. This takes care of the temporary hack in v2 that
      forced the use of a bounce buffer in block/blk-map.c.
  v1 -> v2:
    - Moved the code from arch/x86/mm/ to mm/ since it's (mostly)
      arch-agnostic.
    - Moved the config to the generic layer and added ARCH_SUPPORTS_XPFO
      for x86.
    - Use page_ext for the additional per-page data.
    - Removed the clearing of pages. This can be accomplished by using
      PAGE_POISONING.
    - Split up the patch into multiple patches.
    - Fixed additional issues identified by reviewers.

This patch series adds support for XPFO which protects against 'ret2dir'
kernel attacks. The basic idea is to enforce exclusive ownership of page
frames by either the kernel or userspace, unless explicitly requested by
the kernel. Whenever a page destined for userspace is allocated, it is
unmapped from physmap (removed from the kernel's page table). When such a
page is reclaimed from userspace, it is mapped back to physmap.

Additional fields in the page_ext struct are used for XPFO housekeeping.
Specifically, two flags distinguish user pages from kernel pages and tag
unmapped pages, a reference counter balances kmap/kunmap operations, and
a lock serializes access to the XPFO fields.
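
As a rough illustration of the housekeeping described above, the following
user-space C model tracks the two flags and the map counter for a single
page. It is only a sketch: the struct xpfo_page type, the model_*() function
names and the in_physmap field are hypothetical stand-ins rather than
anything in the patch, and the lock, the 'inited' field and the TLB flushes
are left out.

```c
/* User-space model of the XPFO per-page bookkeeping; not kernel code. */
#include <assert.h>
#include <stdbool.h>

struct xpfo_page {
	bool kernel;     /* models PAGE_EXT_XPFO_KERNEL */
	bool unmapped;   /* models PAGE_EXT_XPFO_UNMAPPED */
	int mapcount;    /* models the kmap/kunmap balance counter */
	bool in_physmap; /* hypothetical: kernel PTE for the page is present */
};

/* Allocation only tags the owner; the page is still mapped at this point. */
static void model_alloc(struct xpfo_page *p, bool for_user)
{
	assert(!p->unmapped);
	assert(p->mapcount == 0);
	p->kernel = !for_user;
}

/* The first kmap of an unmapped user page maps it back into physmap. */
static void model_kmap(struct xpfo_page *p)
{
	if (p->kernel)
		return;
	if (++p->mapcount == 1 && p->unmapped) {
		p->unmapped = false;
		p->in_physmap = true;
	}
}

/* The last kunmap of a user page removes it from physmap. */
static void model_kunmap(struct xpfo_page *p)
{
	if (p->kernel)
		return;
	if (--p->mapcount == 0) {
		assert(!p->unmapped);
		p->unmapped = true;
		p->in_physmap = false;
	}
}

/* Freeing maps the page back if it was unmapped while owned by userspace. */
static void model_free(struct xpfo_page *p)
{
	if (p->unmapped) {
		p->unmapped = false;
		p->in_physmap = true;
	}
}
```

In this model a user page stays mapped after allocation and drops out of
physmap only when the last kunmap brings the counter back to zero, which
appears to be how the patch behaves.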

Known issues/limitations:
  - Only supports x86-64 (for now)
  - Only supports 4k pages (for now)
  - There are most likely some legitimate use cases where the kernel needs
    to access userspace which need to be made XPFO-aware
  - Performance penalty

Reference paper by the original patch authors:
  http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf

Juerg Haefliger (2):
  Add support for eXclusive Page Frame Ownership (XPFO)
  xpfo: Only put previous userspace pages into the hot cache

 arch/x86/Kconfig         |   3 +-
 arch/x86/mm/init.c       |   2 +-
 drivers/ata/libata-sff.c |   4 +-
 include/linux/highmem.h  |  15 +++-
 include/linux/page_ext.h |   7 ++
 include/linux/xpfo.h     |  41 +++++++++
 lib/swiotlb.c            |   3 +-
 mm/Makefile              |   1 +
 mm/page_alloc.c          |  10 ++-
 mm/page_ext.c            |   4 +
 mm/xpfo.c                | 214 +++++++++++++++++++++++++++++++++++++++++++++++
 security/Kconfig         |  19 +++++
 12 files changed, 315 insertions(+), 8 deletions(-)
 create mode 100644 include/linux/xpfo.h
 create mode 100644 mm/xpfo.c

-- 
2.10.1

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [kernel-hardening] [RFC PATCH v3 1/2] Add support for eXclusive Page Frame Ownership (XPFO)
  2016-11-04 14:45     ` [kernel-hardening] [RFC PATCH v3 0/2] " Juerg Haefliger
@ 2016-11-04 14:45       ` Juerg Haefliger
  2016-11-04 14:50         ` [kernel-hardening] " Christoph Hellwig
                           ` (4 more replies)
  2016-11-04 14:45       ` [kernel-hardening] [RFC PATCH v3 2/2] xpfo: Only put previous userspace pages into the hot cache Juerg Haefliger
  1 sibling, 5 replies; 30+ messages in thread
From: Juerg Haefliger @ 2016-11-04 14:45 UTC (permalink / raw)
  To: linux-kernel, linux-mm, kernel-hardening, linux-x86_64
  Cc: vpk, juerg.haefliger

This patch adds support for XPFO which protects against 'ret2dir' kernel
attacks. The basic idea is to enforce exclusive ownership of page frames
by either the kernel or userspace, unless explicitly requested by the
kernel. Whenever a page destined for userspace is allocated, it is
unmapped from physmap (removed from the kernel's page table). When such a
page is reclaimed from userspace, it is mapped back to physmap.

Additional fields in the page_ext struct are used for XPFO housekeeping.
Specifically, two flags distinguish user pages from kernel pages and tag
unmapped pages, a reference counter balances kmap/kunmap operations, and
a lock serializes access to the XPFO fields.

Known issues/limitations:
  - Only supports x86-64 (for now)
  - Only supports 4k pages (for now)
  - There are most likely some legitimate use cases where the kernel needs
    to access userspace which need to be made XPFO-aware
  - Performance penalty

Reference paper by the original patch authors:
  http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf

Suggested-by: Vasileios P. Kemerlis <vpk@cs.columbia.edu>
Signed-off-by: Juerg Haefliger <juerg.haefliger@hpe.com>
---
 arch/x86/Kconfig         |   3 +-
 arch/x86/mm/init.c       |   2 +-
 drivers/ata/libata-sff.c |   4 +-
 include/linux/highmem.h  |  15 +++-
 include/linux/page_ext.h |   7 ++
 include/linux/xpfo.h     |  39 +++++++++
 lib/swiotlb.c            |   3 +-
 mm/Makefile              |   1 +
 mm/page_alloc.c          |   2 +
 mm/page_ext.c            |   4 +
 mm/xpfo.c                | 206 +++++++++++++++++++++++++++++++++++++++++++++++
 security/Kconfig         |  19 +++++
 12 files changed, 298 insertions(+), 7 deletions(-)
 create mode 100644 include/linux/xpfo.h
 create mode 100644 mm/xpfo.c

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index bada636d1065..38b334f8fde5 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -165,6 +165,7 @@ config X86
 	select HAVE_STACK_VALIDATION		if X86_64
 	select ARCH_USES_HIGH_VMA_FLAGS		if X86_INTEL_MEMORY_PROTECTION_KEYS
 	select ARCH_HAS_PKEYS			if X86_INTEL_MEMORY_PROTECTION_KEYS
+	select ARCH_SUPPORTS_XPFO		if X86_64
 
 config INSTRUCTION_DECODER
 	def_bool y
@@ -1361,7 +1362,7 @@ config ARCH_DMA_ADDR_T_64BIT
 
 config X86_DIRECT_GBPAGES
 	def_bool y
-	depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK
+	depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK && !XPFO
 	---help---
 	  Certain kernel features effectively disable kernel
 	  linear 1 GB mappings (even if the CPU otherwise
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 22af912d66d2..a6fafbae02bb 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -161,7 +161,7 @@ static int page_size_mask;
 
 static void __init probe_page_size_mask(void)
 {
-#if !defined(CONFIG_KMEMCHECK)
+#if !defined(CONFIG_KMEMCHECK) && !defined(CONFIG_XPFO)
 	/*
 	 * For CONFIG_KMEMCHECK or pagealloc debugging, identity mapping will
 	 * use small pages.
diff --git a/drivers/ata/libata-sff.c b/drivers/ata/libata-sff.c
index 051b6158d1b7..58af734be25d 100644
--- a/drivers/ata/libata-sff.c
+++ b/drivers/ata/libata-sff.c
@@ -715,7 +715,7 @@ static void ata_pio_sector(struct ata_queued_cmd *qc)
 
 	DPRINTK("data %s\n", qc->tf.flags & ATA_TFLAG_WRITE ? "write" : "read");
 
-	if (PageHighMem(page)) {
+	if (PageHighMem(page) || xpfo_page_is_unmapped(page)) {
 		unsigned long flags;
 
 		/* FIXME: use a bounce buffer */
@@ -860,7 +860,7 @@ static int __atapi_pio_bytes(struct ata_queued_cmd *qc, unsigned int bytes)
 
 	DPRINTK("data %s\n", qc->tf.flags & ATA_TFLAG_WRITE ? "write" : "read");
 
-	if (PageHighMem(page)) {
+	if (PageHighMem(page) || xpfo_page_is_unmapped(page)) {
 		unsigned long flags;
 
 		/* FIXME: use bounce buffer */
diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index bb3f3297062a..7a17c166532f 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -7,6 +7,7 @@
 #include <linux/mm.h>
 #include <linux/uaccess.h>
 #include <linux/hardirq.h>
+#include <linux/xpfo.h>
 
 #include <asm/cacheflush.h>
 
@@ -55,24 +56,34 @@ static inline struct page *kmap_to_page(void *addr)
 #ifndef ARCH_HAS_KMAP
 static inline void *kmap(struct page *page)
 {
+	void *kaddr;
+
 	might_sleep();
-	return page_address(page);
+	kaddr = page_address(page);
+	xpfo_kmap(kaddr, page);
+	return kaddr;
 }
 
 static inline void kunmap(struct page *page)
 {
+	xpfo_kunmap(page_address(page), page);
 }
 
 static inline void *kmap_atomic(struct page *page)
 {
+	void *kaddr;
+
 	preempt_disable();
 	pagefault_disable();
-	return page_address(page);
+	kaddr = page_address(page);
+	xpfo_kmap(kaddr, page);
+	return kaddr;
 }
 #define kmap_atomic_prot(page, prot)	kmap_atomic(page)
 
 static inline void __kunmap_atomic(void *addr)
 {
+	xpfo_kunmap(addr, virt_to_page(addr));
 	pagefault_enable();
 	preempt_enable();
 }
diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h
index 9298c393ddaa..0e451a42e5a3 100644
--- a/include/linux/page_ext.h
+++ b/include/linux/page_ext.h
@@ -29,6 +29,8 @@ enum page_ext_flags {
 	PAGE_EXT_DEBUG_POISON,		/* Page is poisoned */
 	PAGE_EXT_DEBUG_GUARD,
 	PAGE_EXT_OWNER,
+	PAGE_EXT_XPFO_KERNEL,		/* Page is a kernel page */
+	PAGE_EXT_XPFO_UNMAPPED,		/* Page is unmapped */
 #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT)
 	PAGE_EXT_YOUNG,
 	PAGE_EXT_IDLE,
@@ -44,6 +46,11 @@ enum page_ext_flags {
  */
 struct page_ext {
 	unsigned long flags;
+#ifdef CONFIG_XPFO
+	int inited;		/* Map counter and lock initialized */
+	atomic_t mapcount;	/* Counter for balancing map/unmap requests */
+	spinlock_t maplock;	/* Lock to serialize map/unmap requests */
+#endif
 };
 
 extern void pgdat_page_ext_init(struct pglist_data *pgdat);
diff --git a/include/linux/xpfo.h b/include/linux/xpfo.h
new file mode 100644
index 000000000000..77187578ca33
--- /dev/null
+++ b/include/linux/xpfo.h
@@ -0,0 +1,39 @@
+/*
+ * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P.
+ * Copyright (C) 2016 Brown University. All rights reserved.
+ *
+ * Authors:
+ *   Juerg Haefliger <juerg.haefliger@hpe.com>
+ *   Vasileios P. Kemerlis <vpk@cs.brown.edu>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ */
+
+#ifndef _LINUX_XPFO_H
+#define _LINUX_XPFO_H
+
+#ifdef CONFIG_XPFO
+
+extern struct page_ext_operations page_xpfo_ops;
+
+extern void xpfo_kmap(void *kaddr, struct page *page);
+extern void xpfo_kunmap(void *kaddr, struct page *page);
+extern void xpfo_alloc_page(struct page *page, int order, gfp_t gfp);
+extern void xpfo_free_page(struct page *page, int order);
+
+extern bool xpfo_page_is_unmapped(struct page *page);
+
+#else /* !CONFIG_XPFO */
+
+static inline void xpfo_kmap(void *kaddr, struct page *page) { }
+static inline void xpfo_kunmap(void *kaddr, struct page *page) { }
+static inline void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) { }
+static inline void xpfo_free_page(struct page *page, int order) { }
+
+static inline bool xpfo_page_is_unmapped(struct page *page) { return false; }
+
+#endif /* CONFIG_XPFO */
+
+#endif /* _LINUX_XPFO_H */
diff --git a/lib/swiotlb.c b/lib/swiotlb.c
index 22e13a0e19d7..455eff44604e 100644
--- a/lib/swiotlb.c
+++ b/lib/swiotlb.c
@@ -390,8 +390,9 @@ static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr,
 {
 	unsigned long pfn = PFN_DOWN(orig_addr);
 	unsigned char *vaddr = phys_to_virt(tlb_addr);
+	struct page *page = pfn_to_page(pfn);
 
-	if (PageHighMem(pfn_to_page(pfn))) {
+	if (PageHighMem(page) || xpfo_page_is_unmapped(page)) {
 		/* The buffer does not have a mapping.  Map it in and copy */
 		unsigned int offset = orig_addr & ~PAGE_MASK;
 		char *buffer;
diff --git a/mm/Makefile b/mm/Makefile
index 295bd7a9f76b..175680f516aa 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -100,3 +100,4 @@ obj-$(CONFIG_IDLE_PAGE_TRACKING) += page_idle.o
 obj-$(CONFIG_FRAME_VECTOR) += frame_vector.o
 obj-$(CONFIG_DEBUG_PAGE_REF) += debug_page_ref.o
 obj-$(CONFIG_HARDENED_USERCOPY) += usercopy.o
+obj-$(CONFIG_XPFO) += xpfo.o
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8fd42aa7c4bd..100e80e008e2 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1045,6 +1045,7 @@ static __always_inline bool free_pages_prepare(struct page *page,
 	kernel_poison_pages(page, 1 << order, 0);
 	kernel_map_pages(page, 1 << order, 0);
 	kasan_free_pages(page, order);
+	xpfo_free_page(page, order);
 
 	return true;
 }
@@ -1745,6 +1746,7 @@ inline void post_alloc_hook(struct page *page, unsigned int order,
 	kernel_map_pages(page, 1 << order, 1);
 	kernel_poison_pages(page, 1 << order, 1);
 	kasan_alloc_pages(page, order);
+	xpfo_alloc_page(page, order, gfp_flags);
 	set_page_owner(page, order, gfp_flags);
 }
 
diff --git a/mm/page_ext.c b/mm/page_ext.c
index 121dcffc4ec1..ba6dbcacc2db 100644
--- a/mm/page_ext.c
+++ b/mm/page_ext.c
@@ -7,6 +7,7 @@
 #include <linux/kmemleak.h>
 #include <linux/page_owner.h>
 #include <linux/page_idle.h>
+#include <linux/xpfo.h>
 
 /*
  * struct page extension
@@ -68,6 +69,9 @@ static struct page_ext_operations *page_ext_ops[] = {
 #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT)
 	&page_idle_ops,
 #endif
+#ifdef CONFIG_XPFO
+	&page_xpfo_ops,
+#endif
 };
 
 static unsigned long total_usage;
diff --git a/mm/xpfo.c b/mm/xpfo.c
new file mode 100644
index 000000000000..8e3a6a694b6a
--- /dev/null
+++ b/mm/xpfo.c
@@ -0,0 +1,206 @@
+/*
+ * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P.
+ * Copyright (C) 2016 Brown University. All rights reserved.
+ *
+ * Authors:
+ *   Juerg Haefliger <juerg.haefliger@hpe.com>
+ *   Vasileios P. Kemerlis <vpk@cs.brown.edu>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ */
+
+#include <linux/mm.h>
+#include <linux/module.h>
+#include <linux/page_ext.h>
+#include <linux/xpfo.h>
+
+#include <asm/tlbflush.h>
+
+DEFINE_STATIC_KEY_FALSE(xpfo_inited);
+
+static bool need_xpfo(void)
+{
+	return true;
+}
+
+static void init_xpfo(void)
+{
+	printk(KERN_INFO "XPFO enabled\n");
+	static_branch_enable(&xpfo_inited);
+}
+
+struct page_ext_operations page_xpfo_ops = {
+	.need = need_xpfo,
+	.init = init_xpfo,
+};
+
+/*
+ * Update a single kernel page table entry
+ */
+static inline void set_kpte(struct page *page, unsigned long kaddr,
+			    pgprot_t prot) {
+	unsigned int level;
+	pte_t *kpte = lookup_address(kaddr, &level);
+
+	/* We only support 4k pages for now */
+	BUG_ON(!kpte || level != PG_LEVEL_4K);
+
+	set_pte_atomic(kpte, pfn_pte(page_to_pfn(page), canon_pgprot(prot)));
+}
+
+void xpfo_alloc_page(struct page *page, int order, gfp_t gfp)
+{
+	int i, flush_tlb = 0;
+	struct page_ext *page_ext;
+	unsigned long kaddr;
+
+	if (!static_branch_unlikely(&xpfo_inited))
+		return;
+
+	for (i = 0; i < (1 << order); i++) {
+		page_ext = lookup_page_ext(page + i);
+
+		BUG_ON(test_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags));
+
+		/* Initialize the map lock and map counter */
+		if (!page_ext->inited) {
+			spin_lock_init(&page_ext->maplock);
+			atomic_set(&page_ext->mapcount, 0);
+			page_ext->inited = 1;
+		}
+		BUG_ON(atomic_read(&page_ext->mapcount));
+
+		if ((gfp & GFP_HIGHUSER) == GFP_HIGHUSER) {
+			/*
+			 * Flush the TLB if the page was previously allocated
+			 * to the kernel.
+			 */
+			if (test_and_clear_bit(PAGE_EXT_XPFO_KERNEL,
+					       &page_ext->flags))
+				flush_tlb = 1;
+		} else {
+			/* Tag the page as a kernel page */
+			set_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags);
+		}
+	}
+
+	if (flush_tlb) {
+		kaddr = (unsigned long)page_address(page);
+		flush_tlb_kernel_range(kaddr, kaddr + (1 << order) *
+				       PAGE_SIZE);
+	}
+}
+
+void xpfo_free_page(struct page *page, int order)
+{
+	int i;
+	struct page_ext *page_ext;
+	unsigned long kaddr;
+
+	if (!static_branch_unlikely(&xpfo_inited))
+		return;
+
+	for (i = 0; i < (1 << order); i++) {
+		page_ext = lookup_page_ext(page + i);
+
+		if (!page_ext->inited) {
+			/*
+			 * The page was allocated before page_ext was
+			 * initialized, so it is a kernel page and it needs to
+			 * be tagged accordingly.
+			 */
+			set_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags);
+			continue;
+		}
+
+		/*
+		 * Map the page back into the kernel if it was previously
+		 * allocated to user space.
+		 */
+		if (test_and_clear_bit(PAGE_EXT_XPFO_UNMAPPED,
+				       &page_ext->flags)) {
+			kaddr = (unsigned long)page_address(page + i);
+			set_kpte(page + i, kaddr, __pgprot(__PAGE_KERNEL));
+		}
+	}
+}
+
+void xpfo_kmap(void *kaddr, struct page *page)
+{
+	struct page_ext *page_ext;
+	unsigned long flags;
+
+	if (!static_branch_unlikely(&xpfo_inited))
+		return;
+
+	page_ext = lookup_page_ext(page);
+
+	/*
+	 * The page was allocated before page_ext was initialized (which means
+	 * it's a kernel page) or it's allocated to the kernel, so nothing to
+	 * do.
+	 */
+	if (!page_ext->inited ||
+	    test_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags))
+		return;
+
+	spin_lock_irqsave(&page_ext->maplock, flags);
+
+	/*
+	 * The page was previously allocated to user space, so map it back
+	 * into the kernel. No TLB flush required.
+	 */
+	if ((atomic_inc_return(&page_ext->mapcount) == 1) &&
+	    test_and_clear_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags))
+		set_kpte(page, (unsigned long)kaddr, __pgprot(__PAGE_KERNEL));
+
+	spin_unlock_irqrestore(&page_ext->maplock, flags);
+}
+EXPORT_SYMBOL(xpfo_kmap);
+
+void xpfo_kunmap(void *kaddr, struct page *page)
+{
+	struct page_ext *page_ext;
+	unsigned long flags;
+
+	if (!static_branch_unlikely(&xpfo_inited))
+		return;
+
+	page_ext = lookup_page_ext(page);
+
+	/*
+	 * The page was allocated before page_ext was initialized (which means
+	 * it's a kernel page) or it's allocated to the kernel, so nothing to
+	 * do.
+	 */
+	if (!page_ext->inited ||
+	    test_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags))
+		return;
+
+	spin_lock_irqsave(&page_ext->maplock, flags);
+
+	/*
+	 * The page is to be allocated back to user space, so unmap it from the
+	 * kernel, flush the TLB and tag it as a user page.
+	 */
+	if (atomic_dec_return(&page_ext->mapcount) == 0) {
+		BUG_ON(test_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags));
+		set_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags);
+		set_kpte(page, (unsigned long)kaddr, __pgprot(0));
+		__flush_tlb_one((unsigned long)kaddr);
+	}
+
+	spin_unlock_irqrestore(&page_ext->maplock, flags);
+}
+EXPORT_SYMBOL(xpfo_kunmap);
+
+inline bool xpfo_page_is_unmapped(struct page *page)
+{
+	if (!static_branch_unlikely(&xpfo_inited))
+		return false;
+
+	return test_bit(PAGE_EXT_XPFO_UNMAPPED, &lookup_page_ext(page)->flags);
+}
+EXPORT_SYMBOL(xpfo_page_is_unmapped);
diff --git a/security/Kconfig b/security/Kconfig
index 118f4549404e..4502e15c8419 100644
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -6,6 +6,25 @@ menu "Security options"
 
 source security/keys/Kconfig
 
+config ARCH_SUPPORTS_XPFO
+	bool
+
+config XPFO
+	bool "Enable eXclusive Page Frame Ownership (XPFO)"
+	default n
+	depends on ARCH_SUPPORTS_XPFO
+	select PAGE_EXTENSION
+	help
+	  This option offers protection against 'ret2dir' kernel attacks.
+	  When enabled, every time a page frame is allocated to user space, it
+	  is unmapped from the direct mapped RAM region in kernel space
+	  (physmap). Similarly, when a page frame is freed/reclaimed, it is
+	  mapped back to physmap.
+
+	  There is a slight performance impact when this option is enabled.
+
+	  If in doubt, say "N".
+
 config SECURITY_DMESG_RESTRICT
 	bool "Restrict unprivileged access to the kernel syslog"
 	default n
-- 
2.10.1

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [kernel-hardening] Re: [RFC PATCH v3 1/2] Add support for eXclusive Page Frame Ownership (XPFO)
  2016-11-04 14:45       ` [kernel-hardening] [RFC PATCH v3 1/2] " Juerg Haefliger
@ 2016-11-04 14:50         ` Christoph Hellwig
  2016-11-10  5:53         ` [kernel-hardening] " ZhaoJunmin Zhao(Junmin)
                           ` (3 subsequent siblings)
  4 siblings, 0 replies; 30+ messages in thread
From: Christoph Hellwig @ 2016-11-04 14:50 UTC (permalink / raw)
  To: Juerg Haefliger
  Cc: linux-kernel, linux-mm, kernel-hardening, linux-x86_64, vpk,
	Tejun Heo, linux-ide

The libata parts here really need to be split out and the proper list
and maintainer need to be Cc'ed.

> diff --git a/drivers/ata/libata-sff.c b/drivers/ata/libata-sff.c
> index 051b6158d1b7..58af734be25d 100644
> --- a/drivers/ata/libata-sff.c
> +++ b/drivers/ata/libata-sff.c
> @@ -715,7 +715,7 @@ static void ata_pio_sector(struct ata_queued_cmd *qc)
>  
>  	DPRINTK("data %s\n", qc->tf.flags & ATA_TFLAG_WRITE ? "write" : "read");
>  
> -	if (PageHighMem(page)) {
> +	if (PageHighMem(page) || xpfo_page_is_unmapped(page)) {
>  		unsigned long flags;
>  
>  		/* FIXME: use a bounce buffer */
> @@ -860,7 +860,7 @@ static int __atapi_pio_bytes(struct ata_queued_cmd *qc, unsigned int bytes)
>  
>  	DPRINTK("data %s\n", qc->tf.flags & ATA_TFLAG_WRITE ? "write" : "read");
>  
> -	if (PageHighMem(page)) {
> +	if (PageHighMem(page) || xpfo_page_is_unmapped(page)) {
>  		unsigned long flags;
>  
>  		/* FIXME: use bounce buffer */
> diff --git a/include/linux/highmem.h b/include/linux/highmem.h

This is just piling one nasty hack on top of another.  libata should
just use the highmem case unconditionally, as it is the correct thing
to do for all cases.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [kernel-hardening] [RFC PATCH v3 1/2] Add support for eXclusive Page Frame Ownership (XPFO)
  2016-11-04 14:45       ` [kernel-hardening] [RFC PATCH v3 1/2] " Juerg Haefliger
  2016-11-04 14:50         ` [kernel-hardening] " Christoph Hellwig
@ 2016-11-10  5:53         ` ZhaoJunmin Zhao(Junmin)
  2016-11-10 19:11         ` [kernel-hardening] " Kees Cook
                           ` (2 subsequent siblings)
  4 siblings, 0 replies; 30+ messages in thread
From: ZhaoJunmin Zhao(Junmin) @ 2016-11-10  5:53 UTC (permalink / raw)
  To: kernel-hardening, linux-kernel, linux-mm, linux-x86_64
  Cc: vpk, juerg.haefliger

> This patch adds support for XPFO which protects against 'ret2dir' kernel
> attacks. The basic idea is to enforce exclusive ownership of page frames
> by either the kernel or userspace, unless explicitly requested by the
> kernel. Whenever a page destined for userspace is allocated, it is
> unmapped from physmap (the kernel's page table). When such a page is
> reclaimed from userspace, it is mapped back to physmap.
>
> Additional fields in the page_ext struct are used for XPFO housekeeping.
> Specifically two flags to distinguish user vs. kernel pages and to tag
> unmapped pages and a reference counter to balance kmap/kunmap operations
> and a lock to serialize access to the XPFO fields.
>
> Known issues/limitations:
>    - Only supports x86-64 (for now)
>    - Only supports 4k pages (for now)
>    - There are most likely some legitimate use cases where the kernel needs
>      to access userspace which need to be made XPFO-aware
>    - Performance penalty
>
> Reference paper by the original patch authors:
>    http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf
>
> Suggested-by: Vasileios P. Kemerlis <vpk@cs.columbia.edu>
> Signed-off-by: Juerg Haefliger <juerg.haefliger@hpe.com>
> ---
>   arch/x86/Kconfig         |   3 +-
>   arch/x86/mm/init.c       |   2 +-
>   drivers/ata/libata-sff.c |   4 +-
>   include/linux/highmem.h  |  15 +++-
>   include/linux/page_ext.h |   7 ++
>   include/linux/xpfo.h     |  39 +++++++++
>   lib/swiotlb.c            |   3 +-
>   mm/Makefile              |   1 +
>   mm/page_alloc.c          |   2 +
>   mm/page_ext.c            |   4 +
>   mm/xpfo.c                | 206 +++++++++++++++++++++++++++++++++++++++++++++++
>   security/Kconfig         |  19 +++++
>   12 files changed, 298 insertions(+), 7 deletions(-)
>   create mode 100644 include/linux/xpfo.h
>   create mode 100644 mm/xpfo.c
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index bada636d1065..38b334f8fde5 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -165,6 +165,7 @@ config X86
>   	select HAVE_STACK_VALIDATION		if X86_64
>   	select ARCH_USES_HIGH_VMA_FLAGS		if X86_INTEL_MEMORY_PROTECTION_KEYS
>   	select ARCH_HAS_PKEYS			if X86_INTEL_MEMORY_PROTECTION_KEYS
> +	select ARCH_SUPPORTS_XPFO		if X86_64
>
>   config INSTRUCTION_DECODER
>   	def_bool y
> @@ -1361,7 +1362,7 @@ config ARCH_DMA_ADDR_T_64BIT
>
>   config X86_DIRECT_GBPAGES
>   	def_bool y
> -	depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK
> +	depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK && !XPFO
>   	---help---
>   	  Certain kernel features effectively disable kernel
>   	  linear 1 GB mappings (even if the CPU otherwise
> diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
> index 22af912d66d2..a6fafbae02bb 100644
> --- a/arch/x86/mm/init.c
> +++ b/arch/x86/mm/init.c
> @@ -161,7 +161,7 @@ static int page_size_mask;
>
>   static void __init probe_page_size_mask(void)
>   {
> -#if !defined(CONFIG_KMEMCHECK)
> +#if !defined(CONFIG_KMEMCHECK) && !defined(CONFIG_XPFO)
>   	/*
>   	 * For CONFIG_KMEMCHECK or pagealloc debugging, identity mapping will
>   	 * use small pages.
> diff --git a/drivers/ata/libata-sff.c b/drivers/ata/libata-sff.c
> index 051b6158d1b7..58af734be25d 100644
> --- a/drivers/ata/libata-sff.c
> +++ b/drivers/ata/libata-sff.c
> @@ -715,7 +715,7 @@ static void ata_pio_sector(struct ata_queued_cmd *qc)
>
>   	DPRINTK("data %s\n", qc->tf.flags & ATA_TFLAG_WRITE ? "write" : "read");
>
> -	if (PageHighMem(page)) {
> +	if (PageHighMem(page) || xpfo_page_is_unmapped(page)) {
>   		unsigned long flags;
>
>   		/* FIXME: use a bounce buffer */
> @@ -860,7 +860,7 @@ static int __atapi_pio_bytes(struct ata_queued_cmd *qc, unsigned int bytes)
>
>   	DPRINTK("data %s\n", qc->tf.flags & ATA_TFLAG_WRITE ? "write" : "read");
>
> -	if (PageHighMem(page)) {
> +	if (PageHighMem(page) || xpfo_page_is_unmapped(page)) {
>   		unsigned long flags;
>
>   		/* FIXME: use bounce buffer */
> diff --git a/include/linux/highmem.h b/include/linux/highmem.h
> index bb3f3297062a..7a17c166532f 100644
> --- a/include/linux/highmem.h
> +++ b/include/linux/highmem.h
> @@ -7,6 +7,7 @@
>   #include <linux/mm.h>
>   #include <linux/uaccess.h>
>   #include <linux/hardirq.h>
> +#include <linux/xpfo.h>
>
>   #include <asm/cacheflush.h>
>
> @@ -55,24 +56,34 @@ static inline struct page *kmap_to_page(void *addr)
>   #ifndef ARCH_HAS_KMAP
>   static inline void *kmap(struct page *page)
>   {
> +	void *kaddr;
> +
>   	might_sleep();
> -	return page_address(page);
> +	kaddr = page_address(page);
> +	xpfo_kmap(kaddr, page);
> +	return kaddr;
>   }
>
>   static inline void kunmap(struct page *page)
>   {
> +	xpfo_kunmap(page_address(page), page);
>   }
>
>   static inline void *kmap_atomic(struct page *page)
>   {
> +	void *kaddr;
> +
>   	preempt_disable();
>   	pagefault_disable();
> -	return page_address(page);
> +	kaddr = page_address(page);
> +	xpfo_kmap(kaddr, page);
> +	return kaddr;
>   }
>   #define kmap_atomic_prot(page, prot)	kmap_atomic(page)
>
>   static inline void __kunmap_atomic(void *addr)
>   {
> +	xpfo_kunmap(addr, virt_to_page(addr));
>   	pagefault_enable();
>   	preempt_enable();
>   }
> diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h
> index 9298c393ddaa..0e451a42e5a3 100644
> --- a/include/linux/page_ext.h
> +++ b/include/linux/page_ext.h
> @@ -29,6 +29,8 @@ enum page_ext_flags {
>   	PAGE_EXT_DEBUG_POISON,		/* Page is poisoned */
>   	PAGE_EXT_DEBUG_GUARD,
>   	PAGE_EXT_OWNER,
> +	PAGE_EXT_XPFO_KERNEL,		/* Page is a kernel page */
> +	PAGE_EXT_XPFO_UNMAPPED,		/* Page is unmapped */
>   #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT)
>   	PAGE_EXT_YOUNG,
>   	PAGE_EXT_IDLE,
> @@ -44,6 +46,11 @@ enum page_ext_flags {
>    */
>   struct page_ext {
>   	unsigned long flags;
> +#ifdef CONFIG_XPFO
> +	int inited;		/* Map counter and lock initialized */
> +	atomic_t mapcount;	/* Counter for balancing map/unmap requests */
> +	spinlock_t maplock;	/* Lock to serialize map/unmap requests */
> +#endif
>   };
>
>   extern void pgdat_page_ext_init(struct pglist_data *pgdat);
> diff --git a/include/linux/xpfo.h b/include/linux/xpfo.h
> new file mode 100644
> index 000000000000..77187578ca33
> --- /dev/null
> +++ b/include/linux/xpfo.h
> @@ -0,0 +1,39 @@
> +/*
> + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P.
> + * Copyright (C) 2016 Brown University. All rights reserved.
> + *
> + * Authors:
> + *   Juerg Haefliger <juerg.haefliger@hpe.com>
> + *   Vasileios P. Kemerlis <vpk@cs.brown.edu>
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of the GNU General Public License version 2 as published by
> + * the Free Software Foundation.
> + */
> +
> +#ifndef _LINUX_XPFO_H
> +#define _LINUX_XPFO_H
> +
> +#ifdef CONFIG_XPFO
> +
> +extern struct page_ext_operations page_xpfo_ops;
> +
> +extern void xpfo_kmap(void *kaddr, struct page *page);
> +extern void xpfo_kunmap(void *kaddr, struct page *page);
> +extern void xpfo_alloc_page(struct page *page, int order, gfp_t gfp);
> +extern void xpfo_free_page(struct page *page, int order);
> +
> +extern bool xpfo_page_is_unmapped(struct page *page);
> +
> +#else /* !CONFIG_XPFO */
> +
> +static inline void xpfo_kmap(void *kaddr, struct page *page) { }
> +static inline void xpfo_kunmap(void *kaddr, struct page *page) { }
> +static inline void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) { }
> +static inline void xpfo_free_page(struct page *page, int order) { }
> +
> +static inline bool xpfo_page_is_unmapped(struct page *page) { return false; }
> +
> +#endif /* CONFIG_XPFO */
> +
> +#endif /* _LINUX_XPFO_H */
> diff --git a/lib/swiotlb.c b/lib/swiotlb.c
> index 22e13a0e19d7..455eff44604e 100644
> --- a/lib/swiotlb.c
> +++ b/lib/swiotlb.c
> @@ -390,8 +390,9 @@ static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr,
>   {
>   	unsigned long pfn = PFN_DOWN(orig_addr);
>   	unsigned char *vaddr = phys_to_virt(tlb_addr);
> +	struct page *page = pfn_to_page(pfn);
>
> -	if (PageHighMem(pfn_to_page(pfn))) {
> +	if (PageHighMem(page) || xpfo_page_is_unmapped(page)) {
>   		/* The buffer does not have a mapping.  Map it in and copy */
>   		unsigned int offset = orig_addr & ~PAGE_MASK;
>   		char *buffer;
> diff --git a/mm/Makefile b/mm/Makefile
> index 295bd7a9f76b..175680f516aa 100644
> --- a/mm/Makefile
> +++ b/mm/Makefile
> @@ -100,3 +100,4 @@ obj-$(CONFIG_IDLE_PAGE_TRACKING) += page_idle.o
>   obj-$(CONFIG_FRAME_VECTOR) += frame_vector.o
>   obj-$(CONFIG_DEBUG_PAGE_REF) += debug_page_ref.o
>   obj-$(CONFIG_HARDENED_USERCOPY) += usercopy.o
> +obj-$(CONFIG_XPFO) += xpfo.o
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 8fd42aa7c4bd..100e80e008e2 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1045,6 +1045,7 @@ static __always_inline bool free_pages_prepare(struct page *page,
>   	kernel_poison_pages(page, 1 << order, 0);
>   	kernel_map_pages(page, 1 << order, 0);
>   	kasan_free_pages(page, order);
> +	xpfo_free_page(page, order);
>
>   	return true;
>   }
> @@ -1745,6 +1746,7 @@ inline void post_alloc_hook(struct page *page, unsigned int order,
>   	kernel_map_pages(page, 1 << order, 1);
>   	kernel_poison_pages(page, 1 << order, 1);
>   	kasan_alloc_pages(page, order);
> +	xpfo_alloc_page(page, order, gfp_flags);
>   	set_page_owner(page, order, gfp_flags);
>   }
>
> diff --git a/mm/page_ext.c b/mm/page_ext.c
> index 121dcffc4ec1..ba6dbcacc2db 100644
> --- a/mm/page_ext.c
> +++ b/mm/page_ext.c
> @@ -7,6 +7,7 @@
>   #include <linux/kmemleak.h>
>   #include <linux/page_owner.h>
>   #include <linux/page_idle.h>
> +#include <linux/xpfo.h>
>
>   /*
>    * struct page extension
> @@ -68,6 +69,9 @@ static struct page_ext_operations *page_ext_ops[] = {
>   #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT)
>   	&page_idle_ops,
>   #endif
> +#ifdef CONFIG_XPFO
> +	&page_xpfo_ops,
> +#endif
>   };
>
>   static unsigned long total_usage;
> diff --git a/mm/xpfo.c b/mm/xpfo.c
> new file mode 100644
> index 000000000000..8e3a6a694b6a
> --- /dev/null
> +++ b/mm/xpfo.c
> @@ -0,0 +1,206 @@
> +/*
> + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P.
> + * Copyright (C) 2016 Brown University. All rights reserved.
> + *
> + * Authors:
> + *   Juerg Haefliger <juerg.haefliger@hpe.com>
> + *   Vasileios P. Kemerlis <vpk@cs.brown.edu>
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of the GNU General Public License version 2 as published by
> + * the Free Software Foundation.
> + */
> +
> +#include <linux/mm.h>
> +#include <linux/module.h>
> +#include <linux/page_ext.h>
> +#include <linux/xpfo.h>
> +
> +#include <asm/tlbflush.h>
> +
> +DEFINE_STATIC_KEY_FALSE(xpfo_inited);
> +
> +static bool need_xpfo(void)
> +{
> +	return true;
> +}
> +
> +static void init_xpfo(void)
> +{
> +	printk(KERN_INFO "XPFO enabled\n");
> +	static_branch_enable(&xpfo_inited);
> +}
> +
> +struct page_ext_operations page_xpfo_ops = {
> +	.need = need_xpfo,
> +	.init = init_xpfo,
> +};
> +
> +/*
> + * Update a single kernel page table entry
> + */
> +static inline void set_kpte(struct page *page, unsigned long kaddr,
> +			    pgprot_t prot) {
> +	unsigned int level;
> +	pte_t *kpte = lookup_address(kaddr, &level);
> +
> +	/* We only support 4k pages for now */
> +	BUG_ON(!kpte || level != PG_LEVEL_4K);
> +
> +	set_pte_atomic(kpte, pfn_pte(page_to_pfn(page), canon_pgprot(prot)));
> +}
> +
> +void xpfo_alloc_page(struct page *page, int order, gfp_t gfp)
> +{
> +	int i, flush_tlb = 0;
> +	struct page_ext *page_ext;
> +	unsigned long kaddr;
> +
> +	if (!static_branch_unlikely(&xpfo_inited))
> +		return;
> +
> +	for (i = 0; i < (1 << order); i++)  {
> +		page_ext = lookup_page_ext(page + i);
> +
> +		BUG_ON(test_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags));
> +
> +		/* Initialize the map lock and map counter */
> +		if (!page_ext->inited) {
> +			spin_lock_init(&page_ext->maplock);
> +			atomic_set(&page_ext->mapcount, 0);
> +			page_ext->inited = 1;
> +		}
> +		BUG_ON(atomic_read(&page_ext->mapcount));
> +
> +		if ((gfp & GFP_HIGHUSER) == GFP_HIGHUSER) {
> +			/*
> +			 * Flush the TLB if the page was previously allocated
> +			 * to the kernel.
> +			 */
> +			if (test_and_clear_bit(PAGE_EXT_XPFO_KERNEL,
> +					       &page_ext->flags))
> +				flush_tlb = 1;
> +		} else {
> +			/* Tag the page as a kernel page */
> +			set_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags);
> +		}
> +	}
> +
> +	if (flush_tlb) {
> +		kaddr = (unsigned long)page_address(page);
> +		flush_tlb_kernel_range(kaddr, kaddr + (1 << order) *
> +				       PAGE_SIZE);
> +	}
> +}
> +
> +void xpfo_free_page(struct page *page, int order)
> +{
> +	int i;
> +	struct page_ext *page_ext;
> +	unsigned long kaddr;
> +
> +	if (!static_branch_unlikely(&xpfo_inited))
> +		return;
> +
> +	for (i = 0; i < (1 << order); i++) {
> +		page_ext = lookup_page_ext(page + i);
> +
> +		if (!page_ext->inited) {
> +			/*
> +			 * The page was allocated before page_ext was
> +			 * initialized, so it is a kernel page and it needs to
> +			 * be tagged accordingly.
> +			 */
> +			set_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags);
> +			continue;
> +		}
> +
> +		/*
> +		 * Map the page back into the kernel if it was previously
> +		 * allocated to user space.
> +		 */
> +		if (test_and_clear_bit(PAGE_EXT_XPFO_UNMAPPED,
> +				       &page_ext->flags)) {
> +			kaddr = (unsigned long)page_address(page + i);
> +			set_kpte(page + i,  kaddr, __pgprot(__PAGE_KERNEL));
> +		}
> +	}
> +}
> +
> +void xpfo_kmap(void *kaddr, struct page *page)
> +{
> +	struct page_ext *page_ext;
> +	unsigned long flags;
> +
> +	if (!static_branch_unlikely(&xpfo_inited))
> +		return;
> +
> +	page_ext = lookup_page_ext(page);
> +
> +	/*
> +	 * The page was allocated before page_ext was initialized (which means
> +	 * it's a kernel page) or it's allocated to the kernel, so nothing to
> +	 * do.
> +	 */
> +	if (!page_ext->inited ||
> +	    test_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags))
> +		return;
> +
> +	spin_lock_irqsave(&page_ext->maplock, flags);
> +
> +	/*
> +	 * The page was previously allocated to user space, so map it back
> +	 * into the kernel. No TLB flush required.
> +	 */
> +	if ((atomic_inc_return(&page_ext->mapcount) == 1) &&
> +	    test_and_clear_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags))
> +		set_kpte(page, (unsigned long)kaddr, __pgprot(__PAGE_KERNEL));
> +
> +	spin_unlock_irqrestore(&page_ext->maplock, flags);
> +}
> +EXPORT_SYMBOL(xpfo_kmap);
> +
> +void xpfo_kunmap(void *kaddr, struct page *page)
> +{
> +	struct page_ext *page_ext;
> +	unsigned long flags;
> +
> +	if (!static_branch_unlikely(&xpfo_inited))
> +		return;
> +
> +	page_ext = lookup_page_ext(page);
> +
> +	/*
> +	 * The page was allocated before page_ext was initialized (which means
> +	 * it's a kernel page) or it's allocated to the kernel, so nothing to
> +	 * do.
> +	 */
> +	if (!page_ext->inited ||
> +	    test_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags))
> +		return;
> +
> +	spin_lock_irqsave(&page_ext->maplock, flags);
> +
> +	/*
> +	 * The page is to be allocated back to user space, so unmap it from the
> +	 * kernel, flush the TLB and tag it as a user page.
> +	 */
> +	if (atomic_dec_return(&page_ext->mapcount) == 0) {
> +		BUG_ON(test_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags));
> +		set_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags);
> +		set_kpte(page, (unsigned long)kaddr, __pgprot(0));
> +		__flush_tlb_one((unsigned long)kaddr);
> +	}
> +
> +	spin_unlock_irqrestore(&page_ext->maplock, flags);
> +}
> +EXPORT_SYMBOL(xpfo_kunmap);
> +
> +inline bool xpfo_page_is_unmapped(struct page *page)
> +{
> +	if (!static_branch_unlikely(&xpfo_inited))
> +		return false;
> +
> +	return test_bit(PAGE_EXT_XPFO_UNMAPPED, &lookup_page_ext(page)->flags);
> +}
> +EXPORT_SYMBOL(xpfo_page_is_unmapped);
> diff --git a/security/Kconfig b/security/Kconfig
> index 118f4549404e..4502e15c8419 100644
> --- a/security/Kconfig
> +++ b/security/Kconfig
> @@ -6,6 +6,25 @@ menu "Security options"
>
>   source security/keys/Kconfig
>
> +config ARCH_SUPPORTS_XPFO
> +	bool
> +
> +config XPFO
> +	bool "Enable eXclusive Page Frame Ownership (XPFO)"
> +	default n
> +	depends on ARCH_SUPPORTS_XPFO
> +	select PAGE_EXTENSION
> +	help
> +	  This option offers protection against 'ret2dir' kernel attacks.
> +	  When enabled, every time a page frame is allocated to user space, it
> +	  is unmapped from the direct mapped RAM region in kernel space
> +	  (physmap). Similarly, when a page frame is freed/reclaimed, it is
> +	  mapped back to physmap.
> +
> +	  There is a slight performance impact when this option is enabled.
> +
> +	  If in doubt, say "N".
> +
>   config SECURITY_DMESG_RESTRICT
>   	bool "Restrict unprivileged access to the kernel syslog"
>   	default n
>

When a physical page is assigned to a process in user space, it should
be unmapped from the kernel physmap. From the code, I can see the patch
only handles pages in the high memory zone: if the kernel uses the high
memory zone, it will call kmap. So I would like to know how a physical
page that comes from the normal zone is handled.

Thanks
Zhaojunmin
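
[Note on the question above: in the quoted diff, the xpfo_kmap()/xpfo_kunmap()
hooks are added to the generic "#ifndef ARCH_HAS_KMAP" helpers in
include/linux/highmem.h, which appear to be the ones used when there is no
highmem, and xpfo_alloc_page() selects user pages by gfp flags
((gfp & GFP_HIGHUSER) == GFP_HIGHUSER) rather than by zone. The following is
only a userspace sketch of the flag/counter bookkeeping in mm/xpfo.c, written
under those assumptions. The struct and the model_* function names are
hypothetical and do not exist in the patch; locking, page tables and TLB
flushes are left out.]

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only; mirrors the XPFO fields added to struct page_ext. */
struct xpfo_page {
	bool kernel;   /* models PAGE_EXT_XPFO_KERNEL */
	bool unmapped; /* models PAGE_EXT_XPFO_UNMAPPED */
	int mapcount;  /* models page_ext->mapcount */
};

/* Models xpfo_alloc_page(): tag the page by gfp flags, not by zone. */
static void model_alloc(struct xpfo_page *p, bool user_gfp)
{
	assert(!p->unmapped && p->mapcount == 0);
	p->kernel = !user_gfp;
}

/* Models xpfo_kmap(): the first mapper restores the physmap entry. */
static void model_kmap(struct xpfo_page *p)
{
	if (p->kernel)
		return;
	if (++p->mapcount == 1 && p->unmapped)
		p->unmapped = false; /* patch: set_kpte(..., __PAGE_KERNEL) */
}

/* Models xpfo_kunmap(): the last unmapper removes the physmap entry. */
static void model_kunmap(struct xpfo_page *p)
{
	if (p->kernel)
		return;
	if (--p->mapcount == 0) {
		assert(!p->unmapped);
		p->unmapped = true; /* patch: set_kpte(..., 0) + TLB flush */
	}
}

/* Models xpfo_free_page(): the page is mapped back when it is freed. */
static void model_free(struct xpfo_page *p)
{
	p->unmapped = false;
}
```

[In this model a user page is first removed from physmap when its last
temporary kernel mapping is dropped in model_kunmap(), and it is mapped back
either by the next model_kmap() or by model_free(), independent of which zone
the page came from.]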


* [kernel-hardening] Re: [RFC PATCH v3 1/2] Add support for eXclusive Page Frame Ownership (XPFO)
  2016-11-04 14:45       ` [kernel-hardening] [RFC PATCH v3 1/2] " Juerg Haefliger
  2016-11-04 14:50         ` [kernel-hardening] " Christoph Hellwig
  2016-11-10  5:53         ` [kernel-hardening] " ZhaoJunmin Zhao(Junmin)
@ 2016-11-10 19:11         ` Kees Cook
  2016-11-15 11:15           ` Juerg Haefliger
  2016-11-10 19:24         ` Kees Cook
  2016-11-24 10:56         ` AKASHI Takahiro
  4 siblings, 1 reply; 30+ messages in thread
From: Kees Cook @ 2016-11-10 19:11 UTC (permalink / raw)
  To: Juerg Haefliger
  Cc: LKML, Linux-MM, kernel-hardening@lists.openwall.com, linux-x86_64,
	vpk

On Fri, Nov 4, 2016 at 7:45 AM, Juerg Haefliger <juerg.haefliger@hpe.com> wrote:
> This patch adds support for XPFO which protects against 'ret2dir' kernel
> attacks. The basic idea is to enforce exclusive ownership of page frames
> by either the kernel or userspace, unless explicitly requested by the
> kernel. Whenever a page destined for userspace is allocated, it is
> unmapped from physmap (the kernel's page table). When such a page is
> reclaimed from userspace, it is mapped back to physmap.
>
> Additional fields in the page_ext struct are used for XPFO housekeeping.
> Specifically two flags to distinguish user vs. kernel pages and to tag
> unmapped pages and a reference counter to balance kmap/kunmap operations
> and a lock to serialize access to the XPFO fields.

Thanks for keeping on this! I'd really like to see it land and then
get more architectures to support it.

> Known issues/limitations:
>   - Only supports x86-64 (for now)
>   - Only supports 4k pages (for now)
>   - There are most likely some legitimate uses cases where the kernel needs
>     to access userspace which need to be made XPFO-aware
>   - Performance penalty

In the Kconfig you say "slight", but I'm curious what kinds of
benchmarks you've done and if there's a more specific cost we can
declare, just to give people more of an idea what the hit looks like?
(What workloads would trigger a lot of XPFO unmapping, for example?)

Thanks!

-Kees

-- 
Kees Cook
Nexus Security


* [kernel-hardening] Re: [RFC PATCH v3 1/2] Add support for eXclusive Page Frame Ownership (XPFO)
  2016-11-10 19:11         ` [kernel-hardening] " Kees Cook
@ 2016-11-15 11:15           ` Juerg Haefliger
  0 siblings, 0 replies; 30+ messages in thread
From: Juerg Haefliger @ 2016-11-15 11:15 UTC (permalink / raw)
  To: Kees Cook
  Cc: LKML, Linux-MM, kernel-hardening@lists.openwall.com, linux-x86_64,
	vpk


[-- Attachment #1.1: Type: text/plain, Size: 2123 bytes --]

Sorry for the late reply, I just found your email in my cluttered inbox.

On 11/10/2016 08:11 PM, Kees Cook wrote:
> On Fri, Nov 4, 2016 at 7:45 AM, Juerg Haefliger <juerg.haefliger@hpe.com> wrote:
>> This patch adds support for XPFO which protects against 'ret2dir' kernel
>> attacks. The basic idea is to enforce exclusive ownership of page frames
>> by either the kernel or userspace, unless explicitly requested by the
>> kernel. Whenever a page destined for userspace is allocated, it is
>> unmapped from physmap (the kernel's page table). When such a page is
>> reclaimed from userspace, it is mapped back to physmap.
>>
>> Additional fields in the page_ext struct are used for XPFO housekeeping.
>> Specifically two flags to distinguish user vs. kernel pages and to tag
>> unmapped pages and a reference counter to balance kmap/kunmap operations
>> and a lock to serialize access to the XPFO fields.
> 
> Thanks for keeping on this! I'd really like to see it land and then
> get more architectures to support it.

Good to hear :-)


>> Known issues/limitations:
>>   - Only supports x86-64 (for now)
>>   - Only supports 4k pages (for now)
>>   - There are most likely some legitimate uses cases where the kernel needs
>>     to access userspace which need to be made XPFO-aware
>>   - Performance penalty
> 
> In the Kconfig you say "slight", but I'm curious what kinds of
> benchmarks you've done and if there's a more specific cost we can
> declare, just to give people more of an idea what the hit looks like?
> (What workloads would trigger a lot of XPFO unmapping, for example?)

That 'slight' wording is based on the performance numbers published in the referenced paper.

So far I've only run kernel compilation tests. For that workload, the big performance hit comes from
disabling >4k page sizes (around 10%). Adding XPFO on top causes 'only' another 0.5% performance
penalty. I'm currently looking into adding support for larger page sizes to see what the real impact
is and then generate some more relevant numbers.

...Juerg
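
[Note: the two figures above can be combined as compounding slowdowns. This is
only a sketch, assuming each percentage is relative to the previous
configuration (baseline, then 4k-only pages, then XPFO on top); the thread does
not say which baseline the 0.5% refers to. The function name
combined_overhead() is hypothetical.]

```c
#include <assert.h>

/*
 * Combine two relative slowdowns, each given as a fraction (0.10 == 10%),
 * where the second is measured on top of the first.
 */
static double combined_overhead(double first, double second)
{
	assert(first >= 0.0 && second >= 0.0);
	return (1.0 + first) * (1.0 + second) - 1.0;
}
```

[Under that assumption, combined_overhead(0.10, 0.005) comes to roughly 10.55%
for the kernel compilation workload, almost all of it from disabling large
pages rather than from XPFO itself.]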


> Thanks!
> 
> -Kees
> 





* [kernel-hardening] Re: [RFC PATCH v3 1/2] Add support for eXclusive Page Frame Ownership (XPFO)
  2016-11-04 14:45       ` [kernel-hardening] [RFC PATCH v3 1/2] " Juerg Haefliger
                           ` (2 preceding siblings ...)
  2016-11-10 19:11         ` [kernel-hardening] " Kees Cook
@ 2016-11-10 19:24         ` Kees Cook
  2016-11-15 11:18           ` Juerg Haefliger
  2016-11-24 10:56         ` AKASHI Takahiro
  4 siblings, 1 reply; 30+ messages in thread
From: Kees Cook @ 2016-11-10 19:24 UTC (permalink / raw)
  To: Juerg Haefliger
  Cc: LKML, Linux-MM, kernel-hardening@lists.openwall.com, linux-x86_64,
	vpk

On Fri, Nov 4, 2016 at 7:45 AM, Juerg Haefliger <juerg.haefliger@hpe.com> wrote:
> This patch adds support for XPFO which protects against 'ret2dir' kernel
> attacks. The basic idea is to enforce exclusive ownership of page frames
> by either the kernel or userspace, unless explicitly requested by the
> kernel. Whenever a page destined for userspace is allocated, it is
> unmapped from physmap (the kernel's page table). When such a page is
> reclaimed from userspace, it is mapped back to physmap.
>
> Additional fields in the page_ext struct are used for XPFO housekeeping.
> Specifically two flags to distinguish user vs. kernel pages and to tag
> unmapped pages and a reference counter to balance kmap/kunmap operations
> and a lock to serialize access to the XPFO fields.
>
> Known issues/limitations:
>   - Only supports x86-64 (for now)
>   - Only supports 4k pages (for now)
>   - There are most likely some legitimate uses cases where the kernel needs
>     to access userspace which need to be made XPFO-aware
>   - Performance penalty
>
> Reference paper by the original patch authors:
>   http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf

Would it be possible to create an lkdtm test that can exercise this protection?

> Suggested-by: Vasileios P. Kemerlis <vpk@cs.columbia.edu>
> Signed-off-by: Juerg Haefliger <juerg.haefliger@hpe.com>
> ---
>  arch/x86/Kconfig         |   3 +-
>  arch/x86/mm/init.c       |   2 +-
>  drivers/ata/libata-sff.c |   4 +-
>  include/linux/highmem.h  |  15 +++-
>  include/linux/page_ext.h |   7 ++
>  include/linux/xpfo.h     |  39 +++++++++
>  lib/swiotlb.c            |   3 +-
>  mm/Makefile              |   1 +
>  mm/page_alloc.c          |   2 +
>  mm/page_ext.c            |   4 +
>  mm/xpfo.c                | 206 +++++++++++++++++++++++++++++++++++++++++++++++
>  security/Kconfig         |  19 +++++
>  12 files changed, 298 insertions(+), 7 deletions(-)
>  create mode 100644 include/linux/xpfo.h
>  create mode 100644 mm/xpfo.c
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index bada636d1065..38b334f8fde5 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -165,6 +165,7 @@ config X86
>         select HAVE_STACK_VALIDATION            if X86_64
>         select ARCH_USES_HIGH_VMA_FLAGS         if X86_INTEL_MEMORY_PROTECTION_KEYS
>         select ARCH_HAS_PKEYS                   if X86_INTEL_MEMORY_PROTECTION_KEYS
> +       select ARCH_SUPPORTS_XPFO               if X86_64
>
>  config INSTRUCTION_DECODER
>         def_bool y
> @@ -1361,7 +1362,7 @@ config ARCH_DMA_ADDR_T_64BIT
>
>  config X86_DIRECT_GBPAGES
>         def_bool y
> -       depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK
> +       depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK && !XPFO
>         ---help---
>           Certain kernel features effectively disable kernel
>           linear 1 GB mappings (even if the CPU otherwise
> diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
> index 22af912d66d2..a6fafbae02bb 100644
> --- a/arch/x86/mm/init.c
> +++ b/arch/x86/mm/init.c
> @@ -161,7 +161,7 @@ static int page_size_mask;
>
>  static void __init probe_page_size_mask(void)
>  {
> -#if !defined(CONFIG_KMEMCHECK)
> +#if !defined(CONFIG_KMEMCHECK) && !defined(CONFIG_XPFO)
>         /*
>          * For CONFIG_KMEMCHECK or pagealloc debugging, identity mapping will
>          * use small pages.
> diff --git a/drivers/ata/libata-sff.c b/drivers/ata/libata-sff.c
> index 051b6158d1b7..58af734be25d 100644
> --- a/drivers/ata/libata-sff.c
> +++ b/drivers/ata/libata-sff.c
> @@ -715,7 +715,7 @@ static void ata_pio_sector(struct ata_queued_cmd *qc)
>
>         DPRINTK("data %s\n", qc->tf.flags & ATA_TFLAG_WRITE ? "write" : "read");
>
> -       if (PageHighMem(page)) {
> +       if (PageHighMem(page) || xpfo_page_is_unmapped(page)) {
>                 unsigned long flags;
>
>                 /* FIXME: use a bounce buffer */
> @@ -860,7 +860,7 @@ static int __atapi_pio_bytes(struct ata_queued_cmd *qc, unsigned int bytes)
>
>         DPRINTK("data %s\n", qc->tf.flags & ATA_TFLAG_WRITE ? "write" : "read");
>
> -       if (PageHighMem(page)) {
> +       if (PageHighMem(page) || xpfo_page_is_unmapped(page)) {
>                 unsigned long flags;
>
>                 /* FIXME: use bounce buffer */
> diff --git a/include/linux/highmem.h b/include/linux/highmem.h
> index bb3f3297062a..7a17c166532f 100644
> --- a/include/linux/highmem.h
> +++ b/include/linux/highmem.h
> @@ -7,6 +7,7 @@
>  #include <linux/mm.h>
>  #include <linux/uaccess.h>
>  #include <linux/hardirq.h>
> +#include <linux/xpfo.h>
>
>  #include <asm/cacheflush.h>
>
> @@ -55,24 +56,34 @@ static inline struct page *kmap_to_page(void *addr)
>  #ifndef ARCH_HAS_KMAP
>  static inline void *kmap(struct page *page)
>  {
> +       void *kaddr;
> +
>         might_sleep();
> -       return page_address(page);
> +       kaddr = page_address(page);
> +       xpfo_kmap(kaddr, page);
> +       return kaddr;
>  }
>
>  static inline void kunmap(struct page *page)
>  {
> +       xpfo_kunmap(page_address(page), page);
>  }
>
>  static inline void *kmap_atomic(struct page *page)
>  {
> +       void *kaddr;
> +
>         preempt_disable();
>         pagefault_disable();
> -       return page_address(page);
> +       kaddr = page_address(page);
> +       xpfo_kmap(kaddr, page);
> +       return kaddr;
>  }
>  #define kmap_atomic_prot(page, prot)   kmap_atomic(page)
>
>  static inline void __kunmap_atomic(void *addr)
>  {
> +       xpfo_kunmap(addr, virt_to_page(addr));
>         pagefault_enable();
>         preempt_enable();
>  }
> diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h
> index 9298c393ddaa..0e451a42e5a3 100644
> --- a/include/linux/page_ext.h
> +++ b/include/linux/page_ext.h
> @@ -29,6 +29,8 @@ enum page_ext_flags {
>         PAGE_EXT_DEBUG_POISON,          /* Page is poisoned */
>         PAGE_EXT_DEBUG_GUARD,
>         PAGE_EXT_OWNER,
> +       PAGE_EXT_XPFO_KERNEL,           /* Page is a kernel page */
> +       PAGE_EXT_XPFO_UNMAPPED,         /* Page is unmapped */
>  #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT)
>         PAGE_EXT_YOUNG,
>         PAGE_EXT_IDLE,
> @@ -44,6 +46,11 @@ enum page_ext_flags {
>   */
>  struct page_ext {
>         unsigned long flags;
> +#ifdef CONFIG_XPFO
> +       int inited;             /* Map counter and lock initialized */
> +       atomic_t mapcount;      /* Counter for balancing map/unmap requests */
> +       spinlock_t maplock;     /* Lock to serialize map/unmap requests */
> +#endif
>  };
>
>  extern void pgdat_page_ext_init(struct pglist_data *pgdat);
> diff --git a/include/linux/xpfo.h b/include/linux/xpfo.h
> new file mode 100644
> index 000000000000..77187578ca33
> --- /dev/null
> +++ b/include/linux/xpfo.h
> @@ -0,0 +1,39 @@
> +/*
> + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P.
> + * Copyright (C) 2016 Brown University. All rights reserved.
> + *
> + * Authors:
> + *   Juerg Haefliger <juerg.haefliger@hpe.com>
> + *   Vasileios P. Kemerlis <vpk@cs.brown.edu>
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of the GNU General Public License version 2 as published by
> + * the Free Software Foundation.
> + */
> +
> +#ifndef _LINUX_XPFO_H
> +#define _LINUX_XPFO_H
> +
> +#ifdef CONFIG_XPFO
> +
> +extern struct page_ext_operations page_xpfo_ops;
> +
> +extern void xpfo_kmap(void *kaddr, struct page *page);
> +extern void xpfo_kunmap(void *kaddr, struct page *page);
> +extern void xpfo_alloc_page(struct page *page, int order, gfp_t gfp);
> +extern void xpfo_free_page(struct page *page, int order);
> +
> +extern bool xpfo_page_is_unmapped(struct page *page);
> +
> +#else /* !CONFIG_XPFO */
> +
> +static inline void xpfo_kmap(void *kaddr, struct page *page) { }
> +static inline void xpfo_kunmap(void *kaddr, struct page *page) { }
> +static inline void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) { }
> +static inline void xpfo_free_page(struct page *page, int order) { }
> +
> +static inline bool xpfo_page_is_unmapped(struct page *page) { return false; }
> +
> +#endif /* CONFIG_XPFO */
> +
> +#endif /* _LINUX_XPFO_H */
> diff --git a/lib/swiotlb.c b/lib/swiotlb.c
> index 22e13a0e19d7..455eff44604e 100644
> --- a/lib/swiotlb.c
> +++ b/lib/swiotlb.c
> @@ -390,8 +390,9 @@ static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr,
>  {
>         unsigned long pfn = PFN_DOWN(orig_addr);
>         unsigned char *vaddr = phys_to_virt(tlb_addr);
> +       struct page *page = pfn_to_page(pfn);
>
> -       if (PageHighMem(pfn_to_page(pfn))) {
> +       if (PageHighMem(page) || xpfo_page_is_unmapped(page)) {
>                 /* The buffer does not have a mapping.  Map it in and copy */
>                 unsigned int offset = orig_addr & ~PAGE_MASK;
>                 char *buffer;
> diff --git a/mm/Makefile b/mm/Makefile
> index 295bd7a9f76b..175680f516aa 100644
> --- a/mm/Makefile
> +++ b/mm/Makefile
> @@ -100,3 +100,4 @@ obj-$(CONFIG_IDLE_PAGE_TRACKING) += page_idle.o
>  obj-$(CONFIG_FRAME_VECTOR) += frame_vector.o
>  obj-$(CONFIG_DEBUG_PAGE_REF) += debug_page_ref.o
>  obj-$(CONFIG_HARDENED_USERCOPY) += usercopy.o
> +obj-$(CONFIG_XPFO) += xpfo.o
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 8fd42aa7c4bd..100e80e008e2 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1045,6 +1045,7 @@ static __always_inline bool free_pages_prepare(struct page *page,
>         kernel_poison_pages(page, 1 << order, 0);
>         kernel_map_pages(page, 1 << order, 0);
>         kasan_free_pages(page, order);
> +       xpfo_free_page(page, order);
>
>         return true;
>  }
> @@ -1745,6 +1746,7 @@ inline void post_alloc_hook(struct page *page, unsigned int order,
>         kernel_map_pages(page, 1 << order, 1);
>         kernel_poison_pages(page, 1 << order, 1);
>         kasan_alloc_pages(page, order);
> +       xpfo_alloc_page(page, order, gfp_flags);
>         set_page_owner(page, order, gfp_flags);
>  }
>
> diff --git a/mm/page_ext.c b/mm/page_ext.c
> index 121dcffc4ec1..ba6dbcacc2db 100644
> --- a/mm/page_ext.c
> +++ b/mm/page_ext.c
> @@ -7,6 +7,7 @@
>  #include <linux/kmemleak.h>
>  #include <linux/page_owner.h>
>  #include <linux/page_idle.h>
> +#include <linux/xpfo.h>
>
>  /*
>   * struct page extension
> @@ -68,6 +69,9 @@ static struct page_ext_operations *page_ext_ops[] = {
>  #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT)
>         &page_idle_ops,
>  #endif
> +#ifdef CONFIG_XPFO
> +       &page_xpfo_ops,
> +#endif
>  };
>
>  static unsigned long total_usage;
> diff --git a/mm/xpfo.c b/mm/xpfo.c
> new file mode 100644
> index 000000000000..8e3a6a694b6a
> --- /dev/null
> +++ b/mm/xpfo.c
> @@ -0,0 +1,206 @@
> +/*
> + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P.
> + * Copyright (C) 2016 Brown University. All rights reserved.
> + *
> + * Authors:
> + *   Juerg Haefliger <juerg.haefliger@hpe.com>
> + *   Vasileios P. Kemerlis <vpk@cs.brown.edu>
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of the GNU General Public License version 2 as published by
> + * the Free Software Foundation.
> + */
> +
> +#include <linux/mm.h>
> +#include <linux/module.h>
> +#include <linux/page_ext.h>
> +#include <linux/xpfo.h>
> +
> +#include <asm/tlbflush.h>
> +
> +DEFINE_STATIC_KEY_FALSE(xpfo_inited);
> +
> +static bool need_xpfo(void)
> +{
> +       return true;
> +}
> +
> +static void init_xpfo(void)
> +{
> +       printk(KERN_INFO "XPFO enabled\n");
> +       static_branch_enable(&xpfo_inited);
> +}
> +
> +struct page_ext_operations page_xpfo_ops = {
> +       .need = need_xpfo,
> +       .init = init_xpfo,
> +};
> +
> +/*
> + * Update a single kernel page table entry
> + */
> +static inline void set_kpte(struct page *page, unsigned long kaddr,
> +                           pgprot_t prot) {
> +       unsigned int level;
> +       pte_t *kpte = lookup_address(kaddr, &level);
> +
> +       /* We only support 4k pages for now */
> +       BUG_ON(!kpte || level != PG_LEVEL_4K);
> +
> +       set_pte_atomic(kpte, pfn_pte(page_to_pfn(page), canon_pgprot(prot)));
> +}
> +
> +void xpfo_alloc_page(struct page *page, int order, gfp_t gfp)
> +{
> +       int i, flush_tlb = 0;
> +       struct page_ext *page_ext;
> +       unsigned long kaddr;
> +
> +       if (!static_branch_unlikely(&xpfo_inited))
> +               return;
> +
> +       for (i = 0; i < (1 << order); i++)  {
> +               page_ext = lookup_page_ext(page + i);
> +
> +               BUG_ON(test_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags));
> +
> +               /* Initialize the map lock and map counter */
> +               if (!page_ext->inited) {
> +                       spin_lock_init(&page_ext->maplock);
> +                       atomic_set(&page_ext->mapcount, 0);
> +                       page_ext->inited = 1;
> +               }
> +               BUG_ON(atomic_read(&page_ext->mapcount));
> +
> +               if ((gfp & GFP_HIGHUSER) == GFP_HIGHUSER) {
> +                       /*
> +                        * Flush the TLB if the page was previously allocated
> +                        * to the kernel.
> +                        */
> +                       if (test_and_clear_bit(PAGE_EXT_XPFO_KERNEL,
> +                                              &page_ext->flags))
> +                               flush_tlb = 1;
> +               } else {
> +                       /* Tag the page as a kernel page */
> +                       set_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags);
> +               }
> +       }
> +
> +       if (flush_tlb) {
> +               kaddr = (unsigned long)page_address(page);
> +               flush_tlb_kernel_range(kaddr, kaddr + (1 << order) *
> +                                      PAGE_SIZE);
> +       }
> +}
> +
> +void xpfo_free_page(struct page *page, int order)
> +{
> +       int i;
> +       struct page_ext *page_ext;
> +       unsigned long kaddr;
> +
> +       if (!static_branch_unlikely(&xpfo_inited))
> +               return;
> +
> +       for (i = 0; i < (1 << order); i++) {
> +               page_ext = lookup_page_ext(page + i);
> +
> +               if (!page_ext->inited) {
> +                       /*
> +                        * The page was allocated before page_ext was
> +                        * initialized, so it is a kernel page and it needs to
> +                        * be tagged accordingly.
> +                        */
> +                       set_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags);
> +                       continue;
> +               }
> +
> +               /*
> +                * Map the page back into the kernel if it was previously
> +                * allocated to user space.
> +                */
> +               if (test_and_clear_bit(PAGE_EXT_XPFO_UNMAPPED,
> +                                      &page_ext->flags)) {
> +                       kaddr = (unsigned long)page_address(page + i);
> +                       set_kpte(page + i,  kaddr, __pgprot(__PAGE_KERNEL));
> +               }
> +       }
> +}
> +
> +void xpfo_kmap(void *kaddr, struct page *page)
> +{
> +       struct page_ext *page_ext;
> +       unsigned long flags;
> +
> +       if (!static_branch_unlikely(&xpfo_inited))
> +               return;
> +
> +       page_ext = lookup_page_ext(page);
> +
> +       /*
> +        * The page was allocated before page_ext was initialized (which means
> +        * it's a kernel page) or it's allocated to the kernel, so nothing to
> +        * do.
> +        */
> +       if (!page_ext->inited ||
> +           test_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags))
> +               return;
> +
> +       spin_lock_irqsave(&page_ext->maplock, flags);
> +
> +       /*
> +        * The page was previously allocated to user space, so map it back
> +        * into the kernel. No TLB flush required.
> +        */
> +       if ((atomic_inc_return(&page_ext->mapcount) == 1) &&
> +           test_and_clear_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags))
> +               set_kpte(page, (unsigned long)kaddr, __pgprot(__PAGE_KERNEL));
> +
> +       spin_unlock_irqrestore(&page_ext->maplock, flags);
> +}
> +EXPORT_SYMBOL(xpfo_kmap);
> +
> +void xpfo_kunmap(void *kaddr, struct page *page)
> +{
> +       struct page_ext *page_ext;
> +       unsigned long flags;
> +
> +       if (!static_branch_unlikely(&xpfo_inited))
> +               return;
> +
> +       page_ext = lookup_page_ext(page);
> +
> +       /*
> +        * The page was allocated before page_ext was initialized (which means
> +        * it's a kernel page) or it's allocated to the kernel, so nothing to
> +        * do.
> +        */
> +       if (!page_ext->inited ||
> +           test_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags))
> +               return;
> +
> +       spin_lock_irqsave(&page_ext->maplock, flags);
> +
> +       /*
> +        * The page is to be allocated back to user space, so unmap it from the
> +        * kernel, flush the TLB and tag it as a user page.
> +        */
> +       if (atomic_dec_return(&page_ext->mapcount) == 0) {
> +               BUG_ON(test_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags));
> +               set_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags);
> +               set_kpte(page, (unsigned long)kaddr, __pgprot(0));
> +               __flush_tlb_one((unsigned long)kaddr);
> +       }
> +
> +       spin_unlock_irqrestore(&page_ext->maplock, flags);
> +}
> +EXPORT_SYMBOL(xpfo_kunmap);
> +
> +inline bool xpfo_page_is_unmapped(struct page *page)
> +{
> +       if (!static_branch_unlikely(&xpfo_inited))
> +               return false;
> +
> +       return test_bit(PAGE_EXT_XPFO_UNMAPPED, &lookup_page_ext(page)->flags);
> +}
> +EXPORT_SYMBOL(xpfo_page_is_unmapped);
> diff --git a/security/Kconfig b/security/Kconfig
> index 118f4549404e..4502e15c8419 100644
> --- a/security/Kconfig
> +++ b/security/Kconfig
> @@ -6,6 +6,25 @@ menu "Security options"
>
>  source security/keys/Kconfig
>
> +config ARCH_SUPPORTS_XPFO
> +       bool

Can you include a "help" section here to describe what requirements an
architecture needs to support XPFO? See HAVE_ARCH_SECCOMP_FILTER and
HAVE_ARCH_VMAP_STACK for some examples.

> +config XPFO
> +       bool "Enable eXclusive Page Frame Ownership (XPFO)"
> +       default n
> +       depends on ARCH_SUPPORTS_XPFO
> +       select PAGE_EXTENSION
> +       help
> +         This option offers protection against 'ret2dir' kernel attacks.
> +         When enabled, every time a page frame is allocated to user space, it
> +         is unmapped from the direct mapped RAM region in kernel space
> +         (physmap). Similarly, when a page frame is freed/reclaimed, it is
> +         mapped back to physmap.
> +
> +         There is a slight performance impact when this option is enabled.
> +
> +         If in doubt, say "N".
> +
>  config SECURITY_DMESG_RESTRICT
>         bool "Restrict unprivileged access to the kernel syslog"
>         default n
> --
> 2.10.1
>

I've added these patches to my kspp tree on kernel.org, so it should
get some 0-day testing now...

Thanks!

-Kees

-- 
Kees Cook
Nexus Security


* [kernel-hardening] Re: [RFC PATCH v3 1/2] Add support for eXclusive Page Frame Ownership (XPFO)
  2016-11-10 19:24         ` Kees Cook
@ 2016-11-15 11:18           ` Juerg Haefliger
  0 siblings, 0 replies; 30+ messages in thread
From: Juerg Haefliger @ 2016-11-15 11:18 UTC (permalink / raw)
  To: Kees Cook
  Cc: LKML, Linux-MM, kernel-hardening@lists.openwall.com, linux-x86_64,
	vpk



On 11/10/2016 08:24 PM, Kees Cook wrote:
> On Fri, Nov 4, 2016 at 7:45 AM, Juerg Haefliger <juerg.haefliger@hpe.com> wrote:
>> This patch adds support for XPFO which protects against 'ret2dir' kernel
>> attacks. The basic idea is to enforce exclusive ownership of page frames
>> by either the kernel or userspace, unless explicitly requested by the
>> kernel. Whenever a page destined for userspace is allocated, it is
>> unmapped from physmap (the kernel's page table). When such a page is
>> reclaimed from userspace, it is mapped back to physmap.
>>
>> Additional fields in the page_ext struct are used for XPFO housekeeping.
>> Specifically two flags to distinguish user vs. kernel pages and to tag
>> unmapped pages and a reference counter to balance kmap/kunmap operations
>> and a lock to serialize access to the XPFO fields.
>>
>> Known issues/limitations:
>>   - Only supports x86-64 (for now)
>>   - Only supports 4k pages (for now)
>>   - There are most likely some legitimate use cases where the kernel needs
>>     to access userspace which need to be made XPFO-aware
>>   - Performance penalty
>>
>> Reference paper by the original patch authors:
>>   http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf
> 
> Would it be possible to create an lkdtm test that can exercise this protection?

I'll look into it.
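
For reference, the property such a test would check can be sketched in
user space. This is only an analogy and not lkdtm code: mprotect() with
PROT_NONE stands in for removing the physmap entry, and a forked child
stands in for the kernel access that is expected to fault. The helper
names read_faults() and check_unmapped_page_faults() are made up for
this sketch.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Read one byte at addr in a child process.
 * Returns 1 if the child was killed by SIGSEGV/SIGBUS, 0 if the read
 * succeeded, -1 if fork() or waitpid() failed.
 */
static int read_faults(const volatile char *addr)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		char c = *addr;	/* faults if the page is inaccessible */
		(void)c;
		_exit(0);
	}
	if (waitpid(pid, &status, 0) != pid)
		return -1;
	return WIFSIGNALED(status) &&
	       (WTERMSIG(status) == SIGSEGV || WTERMSIG(status) == SIGBUS);
}

/*
 * Hypothetical check: a page is readable while mapped and faults once
 * access has been revoked. Returns 0 if both hold, -1 otherwise.
 */
static int check_unmapped_page_faults(void)
{
	size_t len = (size_t)sysconf(_SC_PAGESIZE);
	int before, after = -1;
	char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return -1;
	p[0] = 42;
	before = read_faults(p);		/* mapped: no fault expected */
	if (mprotect(p, len, PROT_NONE) == 0)	/* stand-in for the unmap */
		after = read_faults(p);		/* unmapped: fault expected */
	munmap(p, len);
	return (before == 0 && after == 1) ? 0 : -1;
}
```

A real lkdtm test would instead touch the physmap alias of a user page
from kernel context and expect an oops.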


>> diff --git a/security/Kconfig b/security/Kconfig
>> index 118f4549404e..4502e15c8419 100644
>> --- a/security/Kconfig
>> +++ b/security/Kconfig
>> @@ -6,6 +6,25 @@ menu "Security options"
>>
>>  source security/keys/Kconfig
>>
>> +config ARCH_SUPPORTS_XPFO
>> +       bool
> 
> Can you include a "help" section here to describe what requirements an
> architecture needs to support XPFO? See HAVE_ARCH_SECCOMP_FILTER and
> HAVE_ARCH_VMAP_STACK for some examples.

Will do.


>> +config XPFO
>> +       bool "Enable eXclusive Page Frame Ownership (XPFO)"
>> +       default n
>> +       depends on ARCH_SUPPORTS_XPFO
>> +       select PAGE_EXTENSION
>> +       help
>> +         This option offers protection against 'ret2dir' kernel attacks.
>> +         When enabled, every time a page frame is allocated to user space, it
>> +         is unmapped from the direct mapped RAM region in kernel space
>> +         (physmap). Similarly, when a page frame is freed/reclaimed, it is
>> +         mapped back to physmap.
>> +
>> +         There is a slight performance impact when this option is enabled.
>> +
>> +         If in doubt, say "N".
>> +
>>  config SECURITY_DMESG_RESTRICT
>>         bool "Restrict unprivileged access to the kernel syslog"
>>         default n
> 
> I've added these patches to my kspp tree on kernel.org, so it should
> get some 0-day testing now...

Very good. Thanks!


> Thanks!

Appreciate the feedback.

...Juerg


> -Kees
> 





* [kernel-hardening] Re: [RFC PATCH v3 1/2] Add support for eXclusive Page Frame Ownership (XPFO)
  2016-11-04 14:45       ` [kernel-hardening] [RFC PATCH v3 1/2] " Juerg Haefliger
                           ` (3 preceding siblings ...)
  2016-11-10 19:24         ` Kees Cook
@ 2016-11-24 10:56         ` AKASHI Takahiro
  2016-11-28 11:15           ` Juerg Haefliger
  2016-12-09  9:02           ` AKASHI Takahiro
  4 siblings, 2 replies; 30+ messages in thread
From: AKASHI Takahiro @ 2016-11-24 10:56 UTC (permalink / raw)
  To: Juerg Haefliger
  Cc: linux-kernel, linux-mm, kernel-hardening, linux-x86_64, vpk

Hi,

I'm trying to give it a spin on arm64, but ...

On Fri, Nov 04, 2016 at 03:45:33PM +0100, Juerg Haefliger wrote:
> This patch adds support for XPFO which protects against 'ret2dir' kernel
> attacks. The basic idea is to enforce exclusive ownership of page frames
> by either the kernel or userspace, unless explicitly requested by the
> kernel. Whenever a page destined for userspace is allocated, it is
> unmapped from physmap (the kernel's page table). When such a page is
> reclaimed from userspace, it is mapped back to physmap.
> 
> Additional fields in the page_ext struct are used for XPFO housekeeping.
> Specifically two flags to distinguish user vs. kernel pages and to tag
> unmapped pages and a reference counter to balance kmap/kunmap operations
> and a lock to serialize access to the XPFO fields.
> 
> Known issues/limitations:
>   - Only supports x86-64 (for now)
>   - Only supports 4k pages (for now)
>   - There are most likely some legitimate use cases where the kernel needs
>     to access userspace which need to be made XPFO-aware
>   - Performance penalty
> 
> Reference paper by the original patch authors:
>   http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf
> 
> Suggested-by: Vasileios P. Kemerlis <vpk@cs.columbia.edu>
> Signed-off-by: Juerg Haefliger <juerg.haefliger@hpe.com>
> ---
>  arch/x86/Kconfig         |   3 +-
>  arch/x86/mm/init.c       |   2 +-
>  drivers/ata/libata-sff.c |   4 +-
>  include/linux/highmem.h  |  15 +++-
>  include/linux/page_ext.h |   7 ++
>  include/linux/xpfo.h     |  39 +++++++++
>  lib/swiotlb.c            |   3 +-
>  mm/Makefile              |   1 +
>  mm/page_alloc.c          |   2 +
>  mm/page_ext.c            |   4 +
>  mm/xpfo.c                | 206 +++++++++++++++++++++++++++++++++++++++++++++++
>  security/Kconfig         |  19 +++++
>  12 files changed, 298 insertions(+), 7 deletions(-)
>  create mode 100644 include/linux/xpfo.h
>  create mode 100644 mm/xpfo.c
> 
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index bada636d1065..38b334f8fde5 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -165,6 +165,7 @@ config X86
>  	select HAVE_STACK_VALIDATION		if X86_64
>  	select ARCH_USES_HIGH_VMA_FLAGS		if X86_INTEL_MEMORY_PROTECTION_KEYS
>  	select ARCH_HAS_PKEYS			if X86_INTEL_MEMORY_PROTECTION_KEYS
> +	select ARCH_SUPPORTS_XPFO		if X86_64
>  
>  config INSTRUCTION_DECODER
>  	def_bool y
> @@ -1361,7 +1362,7 @@ config ARCH_DMA_ADDR_T_64BIT
>  
>  config X86_DIRECT_GBPAGES
>  	def_bool y
> -	depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK
> +	depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK && !XPFO
>  	---help---
>  	  Certain kernel features effectively disable kernel
>  	  linear 1 GB mappings (even if the CPU otherwise
> diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
> index 22af912d66d2..a6fafbae02bb 100644
> --- a/arch/x86/mm/init.c
> +++ b/arch/x86/mm/init.c
> @@ -161,7 +161,7 @@ static int page_size_mask;
>  
>  static void __init probe_page_size_mask(void)
>  {
> -#if !defined(CONFIG_KMEMCHECK)
> +#if !defined(CONFIG_KMEMCHECK) && !defined(CONFIG_XPFO)
>  	/*
>  	 * For CONFIG_KMEMCHECK or pagealloc debugging, identity mapping will
>  	 * use small pages.
> diff --git a/drivers/ata/libata-sff.c b/drivers/ata/libata-sff.c
> index 051b6158d1b7..58af734be25d 100644
> --- a/drivers/ata/libata-sff.c
> +++ b/drivers/ata/libata-sff.c
> @@ -715,7 +715,7 @@ static void ata_pio_sector(struct ata_queued_cmd *qc)
>  
>  	DPRINTK("data %s\n", qc->tf.flags & ATA_TFLAG_WRITE ? "write" : "read");
>  
> -	if (PageHighMem(page)) {
> +	if (PageHighMem(page) || xpfo_page_is_unmapped(page)) {
>  		unsigned long flags;
>  
>  		/* FIXME: use a bounce buffer */
> @@ -860,7 +860,7 @@ static int __atapi_pio_bytes(struct ata_queued_cmd *qc, unsigned int bytes)
>  
>  	DPRINTK("data %s\n", qc->tf.flags & ATA_TFLAG_WRITE ? "write" : "read");
>  
> -	if (PageHighMem(page)) {
> +	if (PageHighMem(page) || xpfo_page_is_unmapped(page)) {
>  		unsigned long flags;
>  
>  		/* FIXME: use bounce buffer */
> diff --git a/include/linux/highmem.h b/include/linux/highmem.h
> index bb3f3297062a..7a17c166532f 100644
> --- a/include/linux/highmem.h
> +++ b/include/linux/highmem.h
> @@ -7,6 +7,7 @@
>  #include <linux/mm.h>
>  #include <linux/uaccess.h>
>  #include <linux/hardirq.h>
> +#include <linux/xpfo.h>
>  
>  #include <asm/cacheflush.h>
>  
> @@ -55,24 +56,34 @@ static inline struct page *kmap_to_page(void *addr)
>  #ifndef ARCH_HAS_KMAP
>  static inline void *kmap(struct page *page)
>  {
> +	void *kaddr;
> +
>  	might_sleep();
> -	return page_address(page);
> +	kaddr = page_address(page);
> +	xpfo_kmap(kaddr, page);
> +	return kaddr;
>  }
>  
>  static inline void kunmap(struct page *page)
>  {
> +	xpfo_kunmap(page_address(page), page);
>  }
>  
>  static inline void *kmap_atomic(struct page *page)
>  {
> +	void *kaddr;
> +
>  	preempt_disable();
>  	pagefault_disable();
> -	return page_address(page);
> +	kaddr = page_address(page);
> +	xpfo_kmap(kaddr, page);
> +	return kaddr;
>  }
>  #define kmap_atomic_prot(page, prot)	kmap_atomic(page)
>  
>  static inline void __kunmap_atomic(void *addr)
>  {
> +	xpfo_kunmap(addr, virt_to_page(addr));
>  	pagefault_enable();
>  	preempt_enable();
>  }
> diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h
> index 9298c393ddaa..0e451a42e5a3 100644
> --- a/include/linux/page_ext.h
> +++ b/include/linux/page_ext.h
> @@ -29,6 +29,8 @@ enum page_ext_flags {
>  	PAGE_EXT_DEBUG_POISON,		/* Page is poisoned */
>  	PAGE_EXT_DEBUG_GUARD,
>  	PAGE_EXT_OWNER,
> +	PAGE_EXT_XPFO_KERNEL,		/* Page is a kernel page */
> +	PAGE_EXT_XPFO_UNMAPPED,		/* Page is unmapped */
>  #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT)
>  	PAGE_EXT_YOUNG,
>  	PAGE_EXT_IDLE,
> @@ -44,6 +46,11 @@ enum page_ext_flags {
>   */
>  struct page_ext {
>  	unsigned long flags;
> +#ifdef CONFIG_XPFO
> +	int inited;		/* Map counter and lock initialized */
> +	atomic_t mapcount;	/* Counter for balancing map/unmap requests */
> +	spinlock_t maplock;	/* Lock to serialize map/unmap requests */
> +#endif
>  };
>  
>  extern void pgdat_page_ext_init(struct pglist_data *pgdat);
> diff --git a/include/linux/xpfo.h b/include/linux/xpfo.h
> new file mode 100644
> index 000000000000..77187578ca33
> --- /dev/null
> +++ b/include/linux/xpfo.h
> @@ -0,0 +1,39 @@
> +/*
> + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P.
> + * Copyright (C) 2016 Brown University. All rights reserved.
> + *
> + * Authors:
> + *   Juerg Haefliger <juerg.haefliger@hpe.com>
> + *   Vasileios P. Kemerlis <vpk@cs.brown.edu>
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of the GNU General Public License version 2 as published by
> + * the Free Software Foundation.
> + */
> +
> +#ifndef _LINUX_XPFO_H
> +#define _LINUX_XPFO_H
> +
> +#ifdef CONFIG_XPFO
> +
> +extern struct page_ext_operations page_xpfo_ops;
> +
> +extern void xpfo_kmap(void *kaddr, struct page *page);
> +extern void xpfo_kunmap(void *kaddr, struct page *page);
> +extern void xpfo_alloc_page(struct page *page, int order, gfp_t gfp);
> +extern void xpfo_free_page(struct page *page, int order);
> +
> +extern bool xpfo_page_is_unmapped(struct page *page);
> +
> +#else /* !CONFIG_XPFO */
> +
> +static inline void xpfo_kmap(void *kaddr, struct page *page) { }
> +static inline void xpfo_kunmap(void *kaddr, struct page *page) { }
> +static inline void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) { }
> +static inline void xpfo_free_page(struct page *page, int order) { }
> +
> +static inline bool xpfo_page_is_unmapped(struct page *page) { return false; }
> +
> +#endif /* CONFIG_XPFO */
> +
> +#endif /* _LINUX_XPFO_H */
> diff --git a/lib/swiotlb.c b/lib/swiotlb.c
> index 22e13a0e19d7..455eff44604e 100644
> --- a/lib/swiotlb.c
> +++ b/lib/swiotlb.c
> @@ -390,8 +390,9 @@ static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr,
>  {
>  	unsigned long pfn = PFN_DOWN(orig_addr);
>  	unsigned char *vaddr = phys_to_virt(tlb_addr);
> +	struct page *page = pfn_to_page(pfn);
>  
> -	if (PageHighMem(pfn_to_page(pfn))) {
> +	if (PageHighMem(page) || xpfo_page_is_unmapped(page)) {
>  		/* The buffer does not have a mapping.  Map it in and copy */
>  		unsigned int offset = orig_addr & ~PAGE_MASK;
>  		char *buffer;
> diff --git a/mm/Makefile b/mm/Makefile
> index 295bd7a9f76b..175680f516aa 100644
> --- a/mm/Makefile
> +++ b/mm/Makefile
> @@ -100,3 +100,4 @@ obj-$(CONFIG_IDLE_PAGE_TRACKING) += page_idle.o
>  obj-$(CONFIG_FRAME_VECTOR) += frame_vector.o
>  obj-$(CONFIG_DEBUG_PAGE_REF) += debug_page_ref.o
>  obj-$(CONFIG_HARDENED_USERCOPY) += usercopy.o
> +obj-$(CONFIG_XPFO) += xpfo.o
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 8fd42aa7c4bd..100e80e008e2 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1045,6 +1045,7 @@ static __always_inline bool free_pages_prepare(struct page *page,
>  	kernel_poison_pages(page, 1 << order, 0);
>  	kernel_map_pages(page, 1 << order, 0);
>  	kasan_free_pages(page, order);
> +	xpfo_free_page(page, order);
>  
>  	return true;
>  }
> @@ -1745,6 +1746,7 @@ inline void post_alloc_hook(struct page *page, unsigned int order,
>  	kernel_map_pages(page, 1 << order, 1);
>  	kernel_poison_pages(page, 1 << order, 1);
>  	kasan_alloc_pages(page, order);
> +	xpfo_alloc_page(page, order, gfp_flags);
>  	set_page_owner(page, order, gfp_flags);
>  }
>  
> diff --git a/mm/page_ext.c b/mm/page_ext.c
> index 121dcffc4ec1..ba6dbcacc2db 100644
> --- a/mm/page_ext.c
> +++ b/mm/page_ext.c
> @@ -7,6 +7,7 @@
>  #include <linux/kmemleak.h>
>  #include <linux/page_owner.h>
>  #include <linux/page_idle.h>
> +#include <linux/xpfo.h>
>  
>  /*
>   * struct page extension
> @@ -68,6 +69,9 @@ static struct page_ext_operations *page_ext_ops[] = {
>  #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT)
>  	&page_idle_ops,
>  #endif
> +#ifdef CONFIG_XPFO
> +	&page_xpfo_ops,
> +#endif
>  };
>  
>  static unsigned long total_usage;
> diff --git a/mm/xpfo.c b/mm/xpfo.c
> new file mode 100644
> index 000000000000..8e3a6a694b6a
> --- /dev/null
> +++ b/mm/xpfo.c
> @@ -0,0 +1,206 @@
> +/*
> + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P.
> + * Copyright (C) 2016 Brown University. All rights reserved.
> + *
> + * Authors:
> + *   Juerg Haefliger <juerg.haefliger@hpe.com>
> + *   Vasileios P. Kemerlis <vpk@cs.brown.edu>
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of the GNU General Public License version 2 as published by
> + * the Free Software Foundation.
> + */
> +
> +#include <linux/mm.h>
> +#include <linux/module.h>
> +#include <linux/page_ext.h>
> +#include <linux/xpfo.h>
> +
> +#include <asm/tlbflush.h>
> +
> +DEFINE_STATIC_KEY_FALSE(xpfo_inited);
> +
> +static bool need_xpfo(void)
> +{
> +	return true;
> +}
> +
> +static void init_xpfo(void)
> +{
> +	printk(KERN_INFO "XPFO enabled\n");
> +	static_branch_enable(&xpfo_inited);
> +}
> +
> +struct page_ext_operations page_xpfo_ops = {
> +	.need = need_xpfo,
> +	.init = init_xpfo,
> +};
> +
> +/*
> + * Update a single kernel page table entry
> + */
> +static inline void set_kpte(struct page *page, unsigned long kaddr,
> +			    pgprot_t prot) {
> +	unsigned int level;
> +	pte_t *kpte = lookup_address(kaddr, &level);
> +
> +	/* We only support 4k pages for now */
> +	BUG_ON(!kpte || level != PG_LEVEL_4K);
> +
> +	set_pte_atomic(kpte, pfn_pte(page_to_pfn(page), canon_pgprot(prot)));
> +}

Since lookup_address() and set_pte_atomic() (and PG_LEVEL_4K) are arch-specific,
would it be better to move the whole definition into an arch-specific part?

> +
> +void xpfo_alloc_page(struct page *page, int order, gfp_t gfp)
> +{
> +	int i, flush_tlb = 0;
> +	struct page_ext *page_ext;
> +	unsigned long kaddr;
> +
> +	if (!static_branch_unlikely(&xpfo_inited))
> +		return;
> +
> +	for (i = 0; i < (1 << order); i++)  {
> +		page_ext = lookup_page_ext(page + i);
> +
> +		BUG_ON(test_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags));
> +
> +		/* Initialize the map lock and map counter */
> +		if (!page_ext->inited) {
> +			spin_lock_init(&page_ext->maplock);
> +			atomic_set(&page_ext->mapcount, 0);
> +			page_ext->inited = 1;
> +		}
> +		BUG_ON(atomic_read(&page_ext->mapcount));
> +
> +		if ((gfp & GFP_HIGHUSER) == GFP_HIGHUSER) {
> +			/*
> +			 * Flush the TLB if the page was previously allocated
> +			 * to the kernel.
> +			 */
> +			if (test_and_clear_bit(PAGE_EXT_XPFO_KERNEL,
> +					       &page_ext->flags))
> +				flush_tlb = 1;
> +		} else {
> +			/* Tag the page as a kernel page */
> +			set_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags);
> +		}
> +	}
> +
> +	if (flush_tlb) {
> +		kaddr = (unsigned long)page_address(page);
> +		flush_tlb_kernel_range(kaddr, kaddr + (1 << order) *
> +				       PAGE_SIZE);
> +	}
> +}
> +
> +void xpfo_free_page(struct page *page, int order)
> +{
> +	int i;
> +	struct page_ext *page_ext;
> +	unsigned long kaddr;
> +
> +	if (!static_branch_unlikely(&xpfo_inited))
> +		return;
> +
> +	for (i = 0; i < (1 << order); i++) {
> +		page_ext = lookup_page_ext(page + i);
> +
> +		if (!page_ext->inited) {
> +			/*
> +			 * The page was allocated before page_ext was
> +			 * initialized, so it is a kernel page and it needs to
> +			 * be tagged accordingly.
> +			 */
> +			set_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags);
> +			continue;
> +		}
> +
> +		/*
> +		 * Map the page back into the kernel if it was previously
> +		 * allocated to user space.
> +		 */
> +		if (test_and_clear_bit(PAGE_EXT_XPFO_UNMAPPED,
> +				       &page_ext->flags)) {
> +			kaddr = (unsigned long)page_address(page + i);
> +			set_kpte(page + i,  kaddr, __pgprot(__PAGE_KERNEL));

Why not PAGE_KERNEL?

> +		}
> +	}
> +}
> +
> +void xpfo_kmap(void *kaddr, struct page *page)
> +{
> +	struct page_ext *page_ext;
> +	unsigned long flags;
> +
> +	if (!static_branch_unlikely(&xpfo_inited))
> +		return;
> +
> +	page_ext = lookup_page_ext(page);
> +
> +	/*
> +	 * The page was allocated before page_ext was initialized (which means
> +	 * it's a kernel page) or it's allocated to the kernel, so nothing to
> +	 * do.
> +	 */
> +	if (!page_ext->inited ||
> +	    test_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags))
> +		return;
> +
> +	spin_lock_irqsave(&page_ext->maplock, flags);
> +
> +	/*
> +	 * The page was previously allocated to user space, so map it back
> +	 * into the kernel. No TLB flush required.
> +	 */
> +	if ((atomic_inc_return(&page_ext->mapcount) == 1) &&
> +	    test_and_clear_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags))
> +		set_kpte(page, (unsigned long)kaddr, __pgprot(__PAGE_KERNEL));
> +
> +	spin_unlock_irqrestore(&page_ext->maplock, flags);
> +}
> +EXPORT_SYMBOL(xpfo_kmap);
> +
> +void xpfo_kunmap(void *kaddr, struct page *page)
> +{
> +	struct page_ext *page_ext;
> +	unsigned long flags;
> +
> +	if (!static_branch_unlikely(&xpfo_inited))
> +		return;
> +
> +	page_ext = lookup_page_ext(page);
> +
> +	/*
> +	 * The page was allocated before page_ext was initialized (which means
> +	 * it's a kernel page) or it's allocated to the kernel, so nothing to
> +	 * do.
> +	 */
> +	if (!page_ext->inited ||
> +	    test_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags))
> +		return;
> +
> +	spin_lock_irqsave(&page_ext->maplock, flags);
> +
> +	/*
> +	 * The page is to be allocated back to user space, so unmap it from the
> +	 * kernel, flush the TLB and tag it as a user page.
> +	 */
> +	if (atomic_dec_return(&page_ext->mapcount) == 0) {
> +		BUG_ON(test_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags));
> +		set_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags);
> +		set_kpte(page, (unsigned long)kaddr, __pgprot(0));
> +		__flush_tlb_one((unsigned long)kaddr);

Again, __flush_tlb_one() is x86-specific.
Could flush_tlb_kernel_range() be used instead?

Thanks,
-Takahiro AKASHI

> +	}
> +
> +	spin_unlock_irqrestore(&page_ext->maplock, flags);
> +}
> +EXPORT_SYMBOL(xpfo_kunmap);
> +
> +inline bool xpfo_page_is_unmapped(struct page *page)
> +{
> +	if (!static_branch_unlikely(&xpfo_inited))
> +		return false;
> +
> +	return test_bit(PAGE_EXT_XPFO_UNMAPPED, &lookup_page_ext(page)->flags);
> +}
> +EXPORT_SYMBOL(xpfo_page_is_unmapped);
> diff --git a/security/Kconfig b/security/Kconfig
> index 118f4549404e..4502e15c8419 100644
> --- a/security/Kconfig
> +++ b/security/Kconfig
> @@ -6,6 +6,25 @@ menu "Security options"
>  
>  source security/keys/Kconfig
>  
> +config ARCH_SUPPORTS_XPFO
> +	bool
> +
> +config XPFO
> +	bool "Enable eXclusive Page Frame Ownership (XPFO)"
> +	default n
> +	depends on ARCH_SUPPORTS_XPFO
> +	select PAGE_EXTENSION
> +	help
> +	  This option offers protection against 'ret2dir' kernel attacks.
> +	  When enabled, every time a page frame is allocated to user space, it
> +	  is unmapped from the direct mapped RAM region in kernel space
> +	  (physmap). Similarly, when a page frame is freed/reclaimed, it is
> +	  mapped back to physmap.
> +
> +	  There is a slight performance impact when this option is enabled.
> +
> +	  If in doubt, say "N".
> +
>  config SECURITY_DMESG_RESTRICT
>  	bool "Restrict unprivileged access to the kernel syslog"
>  	default n
> -- 
> 2.10.1
> 


* [kernel-hardening] Re: [RFC PATCH v3 1/2] Add support for eXclusive Page Frame Ownership (XPFO)
  2016-11-24 10:56         ` AKASHI Takahiro
@ 2016-11-28 11:15           ` Juerg Haefliger
  2016-12-09  9:02           ` AKASHI Takahiro
  1 sibling, 0 replies; 30+ messages in thread
From: Juerg Haefliger @ 2016-11-28 11:15 UTC (permalink / raw)
  To: AKASHI Takahiro, linux-kernel, linux-mm, kernel-hardening,
	linux-x86_64, vpk



On 11/24/2016 11:56 AM, AKASHI Takahiro wrote:
> Hi,
> 
> I'm trying to give it a spin on arm64, but ...

Thanks for trying this.


>> +/*
>> + * Update a single kernel page table entry
>> + */
>> +static inline void set_kpte(struct page *page, unsigned long kaddr,
>> +			    pgprot_t prot) {
>> +	unsigned int level;
>> +	pte_t *kpte = lookup_address(kaddr, &level);
>> +
>> +	/* We only support 4k pages for now */
>> +	BUG_ON(!kpte || level != PG_LEVEL_4K);
>> +
>> +	set_pte_atomic(kpte, pfn_pte(page_to_pfn(page), canon_pgprot(prot)));
>> +}
> 
> Since lookup_address() and set_pte_atomic() (and PG_LEVEL_4K) are arch-specific,
> would it be better to move the whole definition into an arch-specific part?

Yes, it would, but I haven't yet looked into splitting out the arch-specific parts.
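
One way such a split could look, as a user-space sketch only: the
generic map/unmap paths talk to two per-arch hooks and never call
lookup_address(), set_pte_atomic() or __flush_tlb_one() directly. The
names struct xpfo_arch_ops, xpfo_map_generic() and xpfo_unmap_generic()
are hypothetical, and the "toy" arch below just records PTE present
bits in an array so that the sketch builds as a single file.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-arch hooks (not part of the posted patch). */
struct xpfo_arch_ops {
	void (*set_kpte)(unsigned long kaddr, bool present);
	void (*flush_tlb_one)(unsigned long kaddr);
};

/* Toy "architecture": one present bit per page, plus a flush counter. */
#define TOY_PAGE_SHIFT	12
#define TOY_NR_PAGES	16

static bool toy_present[TOY_NR_PAGES];
static unsigned int toy_flushes;

static void toy_set_kpte(unsigned long kaddr, bool present)
{
	toy_present[(kaddr >> TOY_PAGE_SHIFT) % TOY_NR_PAGES] = present;
}

static void toy_flush_tlb_one(unsigned long kaddr)
{
	(void)kaddr;
	toy_flushes++;
}

static const struct xpfo_arch_ops toy_arch_ops = {
	.set_kpte	= toy_set_kpte,
	.flush_tlb_one	= toy_flush_tlb_one,
};

/* Generic side: unmapping clears the entry and flushes the stale TLB entry. */
static void xpfo_unmap_generic(const struct xpfo_arch_ops *ops,
			       unsigned long kaddr)
{
	ops->set_kpte(kaddr, false);
	ops->flush_tlb_one(kaddr);
}

/* Generic side: mapping back needs no flush (not-present -> present). */
static void xpfo_map_generic(const struct xpfo_arch_ops *ops,
			     unsigned long kaddr)
{
	ops->set_kpte(kaddr, true);
}
```

In the kernel this would more likely be a pair of plain functions
declared in a header and implemented under arch/, rather than an ops
table; the table is used here only to keep the example self-contained.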


>> +		/*
>> +		 * Map the page back into the kernel if it was previously
>> +		 * allocated to user space.
>> +		 */
>> +		if (test_and_clear_bit(PAGE_EXT_XPFO_UNMAPPED,
>> +				       &page_ext->flags)) {
>> +			kaddr = (unsigned long)page_address(page + i);
>> +			set_kpte(page + i,  kaddr, __pgprot(__PAGE_KERNEL));
> 
> Why not PAGE_KERNEL?

Good catch, thanks!


>> +	/*
>> +	 * The page is to be allocated back to user space, so unmap it from the
>> +	 * kernel, flush the TLB and tag it as a user page.
>> +	 */
>> +	if (atomic_dec_return(&page_ext->mapcount) == 0) {
>> +		BUG_ON(test_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags));
>> +		set_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags);
>> +		set_kpte(page, (unsigned long)kaddr, __pgprot(0));
>> +		__flush_tlb_one((unsigned long)kaddr);
> 
> Again, __flush_tlb_one() is x86-specific.
> Could flush_tlb_kernel_range() be used instead?

I'll take a look. If you can tell me what the relevant arm64 equivalents are for the arch-specific
functions, that would help tremendously.

Thanks for the comments!

...Juerg



> Thanks,
> -Takahiro AKASHI


-- 
Juerg Haefliger
Hewlett Packard Enterprise




* [kernel-hardening] Re: [RFC PATCH v3 1/2] Add support for eXclusive Page Frame Ownership (XPFO)
  2016-11-24 10:56         ` AKASHI Takahiro
  2016-11-28 11:15           ` Juerg Haefliger
@ 2016-12-09  9:02           ` AKASHI Takahiro
  1 sibling, 0 replies; 30+ messages in thread
From: AKASHI Takahiro @ 2016-12-09  9:02 UTC (permalink / raw)
  To: Juerg Haefliger, linux-kernel, linux-mm, kernel-hardening,
	linux-x86_64, vpk

On Thu, Nov 24, 2016 at 07:56:30PM +0900, AKASHI Takahiro wrote:
> Hi,
> 
> I'm trying to give it a spin on arm64, but ...

In my experiment on HiKey, the kernel failed to boot, taking a page
fault in cache maintenance operations:
(a) __clean_dcache_area_pou() on a 4KB-page kernel,
(b) __inval_cache_range() on a 64KB-page kernel.
(See the backtraces below for details.)

This is because, on arm64, cache maintenance operations take a virtual
address (in particular, an address in the direct/linear mapping of
physical memory), so they fault once that address has been unmapped.
I therefore think that naively unmapping a page from physmap in
xpfo_kunmap() won't work on arm64.
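
The failure mode can be modelled in user space. This is a toy sketch,
not kernel code: struct toy_page and the toy_* helpers are hypothetical
stand-ins for the page_ext fields and for xpfo_kmap()/xpfo_kunmap(),
and toy_cache_op_by_va() returns false where the real cache operation
would take a page fault.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the per-page XPFO state kept in struct page_ext. */
struct toy_page {
	int mapcount;	/* balances kmap/kunmap, like page_ext->mapcount */
	bool unmapped;	/* like PAGE_EXT_XPFO_UNMAPPED */
};

/* Map the page back on the 0 -> 1 transition, mirroring xpfo_kmap(). */
static void toy_kmap(struct toy_page *p)
{
	if (++p->mapcount == 1 && p->unmapped)
		p->unmapped = false;
}

/* Unmap the page on the 1 -> 0 transition, mirroring xpfo_kunmap(). */
static void toy_kunmap(struct toy_page *p)
{
	if (--p->mapcount == 0)
		p->unmapped = true;
}

/*
 * Cache maintenance by VA on the linear address of the page.
 * Returns true on success, false where the real operation would fault
 * because the linear mapping is gone.
 */
static bool toy_cache_op_by_va(const struct toy_page *p)
{
	return !p->unmapped;
}
```

In this model the operation succeeds only between a toy_kmap() and the
matching final toy_kunmap(); the two backtraces below correspond to
callers that run outside such a window.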

-Takahiro AKASHI

case (a)
--------
Unable to handle kernel paging request at virtual address ffff800000cba000
pgd = ffff80003ba8c000
*pgd=0000000000000000
task: ffff80003be38000 task.stack: ffff80003be40000
PC is at __clean_dcache_area_pou+0x20/0x38
LR is at sync_icache_aliases+0x2c/0x40
 ...
Call trace:
 ...
__clean_dcache_area_pou+0x20/0x38
__sync_icache_dcache+0x6c/0xa8
alloc_set_pte+0x33c/0x588
filemap_map_pages+0x3a8/0x3b8
handle_mm_fault+0x910/0x1080
do_page_fault+0x2b0/0x358
do_mem_abort+0x44/0xa0
el0_ia+0x18/0x1c

case (b)
--------
Unable to handle kernel paging request at virtual address ffff80002aed0000
pgd = ffff000008f40000
, *pud=000000003dfc0003
, *pmd=000000003dfa0003
, *pte=000000002aed0000
task: ffff800028711900 task.stack: ffff800029020000
PC is at __inval_cache_range+0x3c/0x60
LR is at __swiotlb_map_sg_attrs+0x6c/0x98
 ...

Call trace:
 ...
__inval_cache_range+0x3c/0x60
dw_mci_pre_dma_transfer.isra.7+0xfc/0x190
dw_mci_pre_req+0x50/0x60
mmc_start_req+0x4c/0x420
mmc_blk_issue_rw_rq+0xb0/0x9b8
mmc_blk_issue_rq+0x154/0x518
mmc_queue_thread+0xac/0x158
kthread+0xd0/0xe8
ret_from_fork+0x10/0x20


> 
> On Fri, Nov 04, 2016 at 03:45:33PM +0100, Juerg Haefliger wrote:
> > This patch adds support for XPFO which protects against 'ret2dir' kernel
> > attacks. The basic idea is to enforce exclusive ownership of page frames
> > by either the kernel or userspace, unless explicitly requested by the
> > kernel. Whenever a page destined for userspace is allocated, it is
> > unmapped from physmap (the kernel's page table). When such a page is
> > reclaimed from userspace, it is mapped back to physmap.
> > 
> > Additional fields in the page_ext struct are used for XPFO housekeeping.
> > Specifically two flags to distinguish user vs. kernel pages and to tag
> > unmapped pages and a reference counter to balance kmap/kunmap operations
> > and a lock to serialize access to the XPFO fields.
> > 
> > Known issues/limitations:
> >   - Only supports x86-64 (for now)
> >   - Only supports 4k pages (for now)
> >   - There are most likely some legitimate use cases where the kernel needs
> >     to access userspace which need to be made XPFO-aware
> >   - Performance penalty
> > 
> > Reference paper by the original patch authors:
> >   http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf
> > 
> > Suggested-by: Vasileios P. Kemerlis <vpk@cs.columbia.edu>
> > Signed-off-by: Juerg Haefliger <juerg.haefliger@hpe.com>
> > ---
> >  arch/x86/Kconfig         |   3 +-
> >  arch/x86/mm/init.c       |   2 +-
> >  drivers/ata/libata-sff.c |   4 +-
> >  include/linux/highmem.h  |  15 +++-
> >  include/linux/page_ext.h |   7 ++
> >  include/linux/xpfo.h     |  39 +++++++++
> >  lib/swiotlb.c            |   3 +-
> >  mm/Makefile              |   1 +
> >  mm/page_alloc.c          |   2 +
> >  mm/page_ext.c            |   4 +
> >  mm/xpfo.c                | 206 +++++++++++++++++++++++++++++++++++++++++++++++
> >  security/Kconfig         |  19 +++++
> >  12 files changed, 298 insertions(+), 7 deletions(-)
> >  create mode 100644 include/linux/xpfo.h
> >  create mode 100644 mm/xpfo.c
> > 
> > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > index bada636d1065..38b334f8fde5 100644
> > --- a/arch/x86/Kconfig
> > +++ b/arch/x86/Kconfig
> > @@ -165,6 +165,7 @@ config X86
> >  	select HAVE_STACK_VALIDATION		if X86_64
> >  	select ARCH_USES_HIGH_VMA_FLAGS		if X86_INTEL_MEMORY_PROTECTION_KEYS
> >  	select ARCH_HAS_PKEYS			if X86_INTEL_MEMORY_PROTECTION_KEYS
> > +	select ARCH_SUPPORTS_XPFO		if X86_64
> >  
> >  config INSTRUCTION_DECODER
> >  	def_bool y
> > @@ -1361,7 +1362,7 @@ config ARCH_DMA_ADDR_T_64BIT
> >  
> >  config X86_DIRECT_GBPAGES
> >  	def_bool y
> > -	depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK
> > +	depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK && !XPFO
> >  	---help---
> >  	  Certain kernel features effectively disable kernel
> >  	  linear 1 GB mappings (even if the CPU otherwise
> > diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
> > index 22af912d66d2..a6fafbae02bb 100644
> > --- a/arch/x86/mm/init.c
> > +++ b/arch/x86/mm/init.c
> > @@ -161,7 +161,7 @@ static int page_size_mask;
> >  
> >  static void __init probe_page_size_mask(void)
> >  {
> > -#if !defined(CONFIG_KMEMCHECK)
> > +#if !defined(CONFIG_KMEMCHECK) && !defined(CONFIG_XPFO)
> >  	/*
> >  	 * For CONFIG_KMEMCHECK or pagealloc debugging, identity mapping will
> >  	 * use small pages.
> > diff --git a/drivers/ata/libata-sff.c b/drivers/ata/libata-sff.c
> > index 051b6158d1b7..58af734be25d 100644
> > --- a/drivers/ata/libata-sff.c
> > +++ b/drivers/ata/libata-sff.c
> > @@ -715,7 +715,7 @@ static void ata_pio_sector(struct ata_queued_cmd *qc)
> >  
> >  	DPRINTK("data %s\n", qc->tf.flags & ATA_TFLAG_WRITE ? "write" : "read");
> >  
> > -	if (PageHighMem(page)) {
> > +	if (PageHighMem(page) || xpfo_page_is_unmapped(page)) {
> >  		unsigned long flags;
> >  
> >  		/* FIXME: use a bounce buffer */
> > @@ -860,7 +860,7 @@ static int __atapi_pio_bytes(struct ata_queued_cmd *qc, unsigned int bytes)
> >  
> >  	DPRINTK("data %s\n", qc->tf.flags & ATA_TFLAG_WRITE ? "write" : "read");
> >  
> > -	if (PageHighMem(page)) {
> > +	if (PageHighMem(page) || xpfo_page_is_unmapped(page)) {
> >  		unsigned long flags;
> >  
> >  		/* FIXME: use bounce buffer */
> > diff --git a/include/linux/highmem.h b/include/linux/highmem.h
> > index bb3f3297062a..7a17c166532f 100644
> > --- a/include/linux/highmem.h
> > +++ b/include/linux/highmem.h
> > @@ -7,6 +7,7 @@
> >  #include <linux/mm.h>
> >  #include <linux/uaccess.h>
> >  #include <linux/hardirq.h>
> > +#include <linux/xpfo.h>
> >  
> >  #include <asm/cacheflush.h>
> >  
> > @@ -55,24 +56,34 @@ static inline struct page *kmap_to_page(void *addr)
> >  #ifndef ARCH_HAS_KMAP
> >  static inline void *kmap(struct page *page)
> >  {
> > +	void *kaddr;
> > +
> >  	might_sleep();
> > -	return page_address(page);
> > +	kaddr = page_address(page);
> > +	xpfo_kmap(kaddr, page);
> > +	return kaddr;
> >  }
> >  
> >  static inline void kunmap(struct page *page)
> >  {
> > +	xpfo_kunmap(page_address(page), page);
> >  }
> >  
> >  static inline void *kmap_atomic(struct page *page)
> >  {
> > +	void *kaddr;
> > +
> >  	preempt_disable();
> >  	pagefault_disable();
> > -	return page_address(page);
> > +	kaddr = page_address(page);
> > +	xpfo_kmap(kaddr, page);
> > +	return kaddr;
> >  }
> >  #define kmap_atomic_prot(page, prot)	kmap_atomic(page)
> >  
> >  static inline void __kunmap_atomic(void *addr)
> >  {
> > +	xpfo_kunmap(addr, virt_to_page(addr));
> >  	pagefault_enable();
> >  	preempt_enable();
> >  }
> > diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h
> > index 9298c393ddaa..0e451a42e5a3 100644
> > --- a/include/linux/page_ext.h
> > +++ b/include/linux/page_ext.h
> > @@ -29,6 +29,8 @@ enum page_ext_flags {
> >  	PAGE_EXT_DEBUG_POISON,		/* Page is poisoned */
> >  	PAGE_EXT_DEBUG_GUARD,
> >  	PAGE_EXT_OWNER,
> > +	PAGE_EXT_XPFO_KERNEL,		/* Page is a kernel page */
> > +	PAGE_EXT_XPFO_UNMAPPED,		/* Page is unmapped */
> >  #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT)
> >  	PAGE_EXT_YOUNG,
> >  	PAGE_EXT_IDLE,
> > @@ -44,6 +46,11 @@ enum page_ext_flags {
> >   */
> >  struct page_ext {
> >  	unsigned long flags;
> > +#ifdef CONFIG_XPFO
> > +	int inited;		/* Map counter and lock initialized */
> > +	atomic_t mapcount;	/* Counter for balancing map/unmap requests */
> > +	spinlock_t maplock;	/* Lock to serialize map/unmap requests */
> > +#endif
> >  };
> >  
> >  extern void pgdat_page_ext_init(struct pglist_data *pgdat);
> > diff --git a/include/linux/xpfo.h b/include/linux/xpfo.h
> > new file mode 100644
> > index 000000000000..77187578ca33
> > --- /dev/null
> > +++ b/include/linux/xpfo.h
> > @@ -0,0 +1,39 @@
> > +/*
> > + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P.
> > + * Copyright (C) 2016 Brown University. All rights reserved.
> > + *
> > + * Authors:
> > + *   Juerg Haefliger <juerg.haefliger@hpe.com>
> > + *   Vasileios P. Kemerlis <vpk@cs.brown.edu>
> > + *
> > + * This program is free software; you can redistribute it and/or modify it
> > + * under the terms of the GNU General Public License version 2 as published by
> > + * the Free Software Foundation.
> > + */
> > +
> > +#ifndef _LINUX_XPFO_H
> > +#define _LINUX_XPFO_H
> > +
> > +#ifdef CONFIG_XPFO
> > +
> > +extern struct page_ext_operations page_xpfo_ops;
> > +
> > +extern void xpfo_kmap(void *kaddr, struct page *page);
> > +extern void xpfo_kunmap(void *kaddr, struct page *page);
> > +extern void xpfo_alloc_page(struct page *page, int order, gfp_t gfp);
> > +extern void xpfo_free_page(struct page *page, int order);
> > +
> > +extern bool xpfo_page_is_unmapped(struct page *page);
> > +
> > +#else /* !CONFIG_XPFO */
> > +
> > +static inline void xpfo_kmap(void *kaddr, struct page *page) { }
> > +static inline void xpfo_kunmap(void *kaddr, struct page *page) { }
> > +static inline void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) { }
> > +static inline void xpfo_free_page(struct page *page, int order) { }
> > +
> > +static inline bool xpfo_page_is_unmapped(struct page *page) { return false; }
> > +
> > +#endif /* CONFIG_XPFO */
> > +
> > +#endif /* _LINUX_XPFO_H */
> > diff --git a/lib/swiotlb.c b/lib/swiotlb.c
> > index 22e13a0e19d7..455eff44604e 100644
> > --- a/lib/swiotlb.c
> > +++ b/lib/swiotlb.c
> > @@ -390,8 +390,9 @@ static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr,
> >  {
> >  	unsigned long pfn = PFN_DOWN(orig_addr);
> >  	unsigned char *vaddr = phys_to_virt(tlb_addr);
> > +	struct page *page = pfn_to_page(pfn);
> >  
> > -	if (PageHighMem(pfn_to_page(pfn))) {
> > +	if (PageHighMem(page) || xpfo_page_is_unmapped(page)) {
> >  		/* The buffer does not have a mapping.  Map it in and copy */
> >  		unsigned int offset = orig_addr & ~PAGE_MASK;
> >  		char *buffer;
> > diff --git a/mm/Makefile b/mm/Makefile
> > index 295bd7a9f76b..175680f516aa 100644
> > --- a/mm/Makefile
> > +++ b/mm/Makefile
> > @@ -100,3 +100,4 @@ obj-$(CONFIG_IDLE_PAGE_TRACKING) += page_idle.o
> >  obj-$(CONFIG_FRAME_VECTOR) += frame_vector.o
> >  obj-$(CONFIG_DEBUG_PAGE_REF) += debug_page_ref.o
> >  obj-$(CONFIG_HARDENED_USERCOPY) += usercopy.o
> > +obj-$(CONFIG_XPFO) += xpfo.o
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 8fd42aa7c4bd..100e80e008e2 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -1045,6 +1045,7 @@ static __always_inline bool free_pages_prepare(struct page *page,
> >  	kernel_poison_pages(page, 1 << order, 0);
> >  	kernel_map_pages(page, 1 << order, 0);
> >  	kasan_free_pages(page, order);
> > +	xpfo_free_page(page, order);
> >  
> >  	return true;
> >  }
> > @@ -1745,6 +1746,7 @@ inline void post_alloc_hook(struct page *page, unsigned int order,
> >  	kernel_map_pages(page, 1 << order, 1);
> >  	kernel_poison_pages(page, 1 << order, 1);
> >  	kasan_alloc_pages(page, order);
> > +	xpfo_alloc_page(page, order, gfp_flags);
> >  	set_page_owner(page, order, gfp_flags);
> >  }
> >  
> > diff --git a/mm/page_ext.c b/mm/page_ext.c
> > index 121dcffc4ec1..ba6dbcacc2db 100644
> > --- a/mm/page_ext.c
> > +++ b/mm/page_ext.c
> > @@ -7,6 +7,7 @@
> >  #include <linux/kmemleak.h>
> >  #include <linux/page_owner.h>
> >  #include <linux/page_idle.h>
> > +#include <linux/xpfo.h>
> >  
> >  /*
> >   * struct page extension
> > @@ -68,6 +69,9 @@ static struct page_ext_operations *page_ext_ops[] = {
> >  #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT)
> >  	&page_idle_ops,
> >  #endif
> > +#ifdef CONFIG_XPFO
> > +	&page_xpfo_ops,
> > +#endif
> >  };
> >  
> >  static unsigned long total_usage;
> > diff --git a/mm/xpfo.c b/mm/xpfo.c
> > new file mode 100644
> > index 000000000000..8e3a6a694b6a
> > --- /dev/null
> > +++ b/mm/xpfo.c
> > @@ -0,0 +1,206 @@
> > +/*
> > + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P.
> > + * Copyright (C) 2016 Brown University. All rights reserved.
> > + *
> > + * Authors:
> > + *   Juerg Haefliger <juerg.haefliger@hpe.com>
> > + *   Vasileios P. Kemerlis <vpk@cs.brown.edu>
> > + *
> > + * This program is free software; you can redistribute it and/or modify it
> > + * under the terms of the GNU General Public License version 2 as published by
> > + * the Free Software Foundation.
> > + */
> > +
> > +#include <linux/mm.h>
> > +#include <linux/module.h>
> > +#include <linux/page_ext.h>
> > +#include <linux/xpfo.h>
> > +
> > +#include <asm/tlbflush.h>
> > +
> > +DEFINE_STATIC_KEY_FALSE(xpfo_inited);
> > +
> > +static bool need_xpfo(void)
> > +{
> > +	return true;
> > +}
> > +
> > +static void init_xpfo(void)
> > +{
> > +	printk(KERN_INFO "XPFO enabled\n");
> > +	static_branch_enable(&xpfo_inited);
> > +}
> > +
> > +struct page_ext_operations page_xpfo_ops = {
> > +	.need = need_xpfo,
> > +	.init = init_xpfo,
> > +};
> > +
> > +/*
> > + * Update a single kernel page table entry
> > + */
> > +static inline void set_kpte(struct page *page, unsigned long kaddr,
> > +			    pgprot_t prot) {
> > +	unsigned int level;
> > +	pte_t *kpte = lookup_address(kaddr, &level);
> > +
> > +	/* We only support 4k pages for now */
> > +	BUG_ON(!kpte || level != PG_LEVEL_4K);
> > +
> > +	set_pte_atomic(kpte, pfn_pte(page_to_pfn(page), canon_pgprot(prot)));
> > +}
> 
> As lookup_address() and set_pte_atomic() (and PG_LEVEL_4K) are arch-specific,
> would it be better to put the whole definition into the arch-specific part?
> 
> > +
> > +void xpfo_alloc_page(struct page *page, int order, gfp_t gfp)
> > +{
> > +	int i, flush_tlb = 0;
> > +	struct page_ext *page_ext;
> > +	unsigned long kaddr;
> > +
> > +	if (!static_branch_unlikely(&xpfo_inited))
> > +		return;
> > +
> > +	for (i = 0; i < (1 << order); i++)  {
> > +		page_ext = lookup_page_ext(page + i);
> > +
> > +		BUG_ON(test_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags));
> > +
> > +		/* Initialize the map lock and map counter */
> > +		if (!page_ext->inited) {
> > +			spin_lock_init(&page_ext->maplock);
> > +			atomic_set(&page_ext->mapcount, 0);
> > +			page_ext->inited = 1;
> > +		}
> > +		BUG_ON(atomic_read(&page_ext->mapcount));
> > +
> > +		if ((gfp & GFP_HIGHUSER) == GFP_HIGHUSER) {
> > +			/*
> > +			 * Flush the TLB if the page was previously allocated
> > +			 * to the kernel.
> > +			 */
> > +			if (test_and_clear_bit(PAGE_EXT_XPFO_KERNEL,
> > +					       &page_ext->flags))
> > +				flush_tlb = 1;
> > +		} else {
> > +			/* Tag the page as a kernel page */
> > +			set_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags);
> > +		}
> > +	}
> > +
> > +	if (flush_tlb) {
> > +		kaddr = (unsigned long)page_address(page);
> > +		flush_tlb_kernel_range(kaddr, kaddr + (1 << order) *
> > +				       PAGE_SIZE);
> > +	}
> > +}
> > +
> > +void xpfo_free_page(struct page *page, int order)
> > +{
> > +	int i;
> > +	struct page_ext *page_ext;
> > +	unsigned long kaddr;
> > +
> > +	if (!static_branch_unlikely(&xpfo_inited))
> > +		return;
> > +
> > +	for (i = 0; i < (1 << order); i++) {
> > +		page_ext = lookup_page_ext(page + i);
> > +
> > +		if (!page_ext->inited) {
> > +			/*
> > +			 * The page was allocated before page_ext was
> > +			 * initialized, so it is a kernel page and it needs to
> > +			 * be tagged accordingly.
> > +			 */
> > +			set_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags);
> > +			continue;
> > +		}
> > +
> > +		/*
> > +		 * Map the page back into the kernel if it was previously
> > +		 * allocated to user space.
> > +		 */
> > +		if (test_and_clear_bit(PAGE_EXT_XPFO_UNMAPPED,
> > +				       &page_ext->flags)) {
> > +			kaddr = (unsigned long)page_address(page + i);
> > +			set_kpte(page + i,  kaddr, __pgprot(__PAGE_KERNEL));
> 
> Why not PAGE_KERNEL?
> 
> > +		}
> > +	}
> > +}
> > +
> > +void xpfo_kmap(void *kaddr, struct page *page)
> > +{
> > +	struct page_ext *page_ext;
> > +	unsigned long flags;
> > +
> > +	if (!static_branch_unlikely(&xpfo_inited))
> > +		return;
> > +
> > +	page_ext = lookup_page_ext(page);
> > +
> > +	/*
> > +	 * The page was allocated before page_ext was initialized (which means
> > +	 * it's a kernel page) or it's allocated to the kernel, so nothing to
> > +	 * do.
> > +	 */
> > +	if (!page_ext->inited ||
> > +	    test_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags))
> > +		return;
> > +
> > +	spin_lock_irqsave(&page_ext->maplock, flags);
> > +
> > +	/*
> > +	 * The page was previously allocated to user space, so map it back
> > +	 * into the kernel. No TLB flush required.
> > +	 */
> > +	if ((atomic_inc_return(&page_ext->mapcount) == 1) &&
> > +	    test_and_clear_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags))
> > +		set_kpte(page, (unsigned long)kaddr, __pgprot(__PAGE_KERNEL));
> > +
> > +	spin_unlock_irqrestore(&page_ext->maplock, flags);
> > +}
> > +EXPORT_SYMBOL(xpfo_kmap);
> > +
> > +void xpfo_kunmap(void *kaddr, struct page *page)
> > +{
> > +	struct page_ext *page_ext;
> > +	unsigned long flags;
> > +
> > +	if (!static_branch_unlikely(&xpfo_inited))
> > +		return;
> > +
> > +	page_ext = lookup_page_ext(page);
> > +
> > +	/*
> > +	 * The page was allocated before page_ext was initialized (which means
> > +	 * it's a kernel page) or it's allocated to the kernel, so nothing to
> > +	 * do.
> > +	 */
> > +	if (!page_ext->inited ||
> > +	    test_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags))
> > +		return;
> > +
> > +	spin_lock_irqsave(&page_ext->maplock, flags);
> > +
> > +	/*
> > +	 * The page is to be allocated back to user space, so unmap it from the
> > +	 * kernel, flush the TLB and tag it as a user page.
> > +	 */
> > +	if (atomic_dec_return(&page_ext->mapcount) == 0) {
> > +		BUG_ON(test_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags));
> > +		set_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags);
> > +		set_kpte(page, (unsigned long)kaddr, __pgprot(0));
> > +		__flush_tlb_one((unsigned long)kaddr);
> 
> Again, __flush_tlb_one() is x86-specific.
> Could flush_tlb_kernel_range() be used instead?
> 
> Thanks,
> -Takahiro AKASHI
> 
> > +	}
> > +
> > +	spin_unlock_irqrestore(&page_ext->maplock, flags);
> > +}
> > +EXPORT_SYMBOL(xpfo_kunmap);
> > +
> > +inline bool xpfo_page_is_unmapped(struct page *page)
> > +{
> > +	if (!static_branch_unlikely(&xpfo_inited))
> > +		return false;
> > +
> > +	return test_bit(PAGE_EXT_XPFO_UNMAPPED, &lookup_page_ext(page)->flags);
> > +}
> > +EXPORT_SYMBOL(xpfo_page_is_unmapped);
> > diff --git a/security/Kconfig b/security/Kconfig
> > index 118f4549404e..4502e15c8419 100644
> > --- a/security/Kconfig
> > +++ b/security/Kconfig
> > @@ -6,6 +6,25 @@ menu "Security options"
> >  
> >  source security/keys/Kconfig
> >  
> > +config ARCH_SUPPORTS_XPFO
> > +	bool
> > +
> > +config XPFO
> > +	bool "Enable eXclusive Page Frame Ownership (XPFO)"
> > +	default n
> > +	depends on ARCH_SUPPORTS_XPFO
> > +	select PAGE_EXTENSION
> > +	help
> > +	  This option offers protection against 'ret2dir' kernel attacks.
> > +	  When enabled, every time a page frame is allocated to user space, it
> > +	  is unmapped from the direct mapped RAM region in kernel space
> > +	  (physmap). Similarly, when a page frame is freed/reclaimed, it is
> > +	  mapped back to physmap.
> > +
> > +	  There is a slight performance impact when this option is enabled.
> > +
> > +	  If in doubt, say "N".
> > +
> >  config SECURITY_DMESG_RESTRICT
> >  	bool "Restrict unprivileged access to the kernel syslog"
> >  	default n
> > -- 
> > 2.10.1
> > 
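
The map/unmap balancing done by xpfo_kmap() and xpfo_kunmap() in the quoted
patch can be sketched in user space roughly as follows. This is only a model
under stated assumptions, not kernel code: struct xpfo_model and the
model_*() helpers are hypothetical names, the spinlock and atomics are
omitted (single-threaded model), and the page table update and TLB flush are
reduced to a flag and a counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model of the per-page XPFO state. */
struct xpfo_model {
	bool kernel;      /* stands in for PAGE_EXT_XPFO_KERNEL */
	bool unmapped;    /* stands in for PAGE_EXT_XPFO_UNMAPPED */
	int mapcount;     /* stands in for page_ext->mapcount */
	int tlb_flushes;  /* counts where the patch calls __flush_tlb_one() */
};

/* Models xpfo_kmap(): only the first mapper restores the kernel mapping. */
static void model_kmap(struct xpfo_model *p)
{
	if (p->kernel)
		return; /* kernel pages are never unmapped, nothing to do */

	if (++p->mapcount == 1 && p->unmapped)
		p->unmapped = false; /* patch: set_kpte(..., __PAGE_KERNEL) */
}

/* Models xpfo_kunmap(): only the last unmapper removes the mapping. */
static void model_kunmap(struct xpfo_model *p)
{
	if (p->kernel)
		return;

	if (--p->mapcount == 0) {
		assert(!p->unmapped); /* patch: BUG_ON(already unmapped) */
		p->unmapped = true;   /* patch: set_kpte(..., __pgprot(0)) */
		p->tlb_flushes++;     /* patch: __flush_tlb_one(kaddr) */
	}
}
```

With nested model_kmap() calls, the page stays mapped until the matching
final model_kunmap(), which is the only point where a flush is counted.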

* [kernel-hardening] [RFC PATCH v3 2/2] xpfo: Only put previous userspace pages into the hot cache
  2016-11-04 14:45     ` [kernel-hardening] [RFC PATCH v3 0/2] " Juerg Haefliger
  2016-11-04 14:45       ` [kernel-hardening] [RFC PATCH v3 1/2] " Juerg Haefliger
@ 2016-11-04 14:45       ` Juerg Haefliger
  1 sibling, 0 replies; 30+ messages in thread
From: Juerg Haefliger @ 2016-11-04 14:45 UTC (permalink / raw)
  To: linux-kernel, linux-mm, kernel-hardening, linux-x86_64
  Cc: vpk, juerg.haefliger

Allocating a page to userspace that was previously allocated to the
kernel requires an expensive TLB shootdown. To minimize this, we only
put non-kernel pages into the hot cache to favor their allocation.

Signed-off-by: Juerg Haefliger <juerg.haefliger@hpe.com>
---
 include/linux/xpfo.h | 2 ++
 mm/page_alloc.c      | 8 +++++++-
 mm/xpfo.c            | 8 ++++++++
 3 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/include/linux/xpfo.h b/include/linux/xpfo.h
index 77187578ca33..077d1cfadfa2 100644
--- a/include/linux/xpfo.h
+++ b/include/linux/xpfo.h
@@ -24,6 +24,7 @@ extern void xpfo_alloc_page(struct page *page, int order, gfp_t gfp);
 extern void xpfo_free_page(struct page *page, int order);
 
 extern bool xpfo_page_is_unmapped(struct page *page);
+extern bool xpfo_page_is_kernel(struct page *page);
 
 #else /* !CONFIG_XPFO */
 
@@ -33,6 +34,7 @@ static inline void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) { }
 static inline void xpfo_free_page(struct page *page, int order) { }
 
 static inline bool xpfo_page_is_unmapped(struct page *page) { return false; }
+static inline bool xpfo_page_is_kernel(struct page *page) { return false; }
 
 #endif /* CONFIG_XPFO */
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 100e80e008e2..09ef4f7cfd14 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2440,7 +2440,13 @@ void free_hot_cold_page(struct page *page, bool cold)
 	}
 
 	pcp = &this_cpu_ptr(zone->pageset)->pcp;
-	if (!cold)
+	/*
+	 * XPFO: Allocating a page to userspace that was previously allocated
+	 * to the kernel requires an expensive TLB shootdown. To minimize this,
+	 * we only put non-kernel pages into the hot cache to favor their
+	 * allocation.
+	 */
+	if (!cold && !xpfo_page_is_kernel(page))
 		list_add(&page->lru, &pcp->lists[migratetype]);
 	else
 		list_add_tail(&page->lru, &pcp->lists[migratetype]);
diff --git a/mm/xpfo.c b/mm/xpfo.c
index 8e3a6a694b6a..0e447e38008a 100644
--- a/mm/xpfo.c
+++ b/mm/xpfo.c
@@ -204,3 +204,11 @@ inline bool xpfo_page_is_unmapped(struct page *page)
 	return test_bit(PAGE_EXT_XPFO_UNMAPPED, &lookup_page_ext(page)->flags);
 }
 EXPORT_SYMBOL(xpfo_page_is_unmapped);
+
+inline bool xpfo_page_is_kernel(struct page *page)
+{
+	if (!static_branch_unlikely(&xpfo_inited))
+		return false;
+
+	return test_bit(PAGE_EXT_XPFO_KERNEL, &lookup_page_ext(page)->flags);
+}
-- 
2.10.1
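
The effect of the changed condition in free_hot_cold_page() can be
illustrated with a small user-space model. It is a sketch under assumptions,
not the allocator itself: struct model_pcp, struct model_page and the
model_*() helpers are hypothetical, the per-CPU list is a fixed-size array,
and it assumes that hot allocations are served from the head of the list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PCP_MAX 8 /* arbitrary capacity chosen for the model */

struct model_page {
	int id;
	bool xpfo_kernel; /* stands in for PAGE_EXT_XPFO_KERNEL */
};

/* Hypothetical stand-in for one per-CPU free list; slot[0] is the head. */
struct model_pcp {
	struct model_page *slot[PCP_MAX];
	size_t count;
};

/* Mirrors the patched test: head for hot non-kernel pages, tail otherwise. */
static void model_free_page(struct model_pcp *pcp, struct model_page *page,
			    bool cold)
{
	size_t i;

	assert(pcp->count < PCP_MAX);
	if (!cold && !page->xpfo_kernel) {
		/* Shift everything down one slot and insert at the head. */
		for (i = pcp->count; i > 0; i--)
			pcp->slot[i] = pcp->slot[i - 1];
		pcp->slot[0] = page;
	} else {
		pcp->slot[pcp->count] = page;
	}
	pcp->count++;
}

/* Assumption of the model: a hot allocation takes the head of the list. */
static struct model_page *model_alloc_hot(struct model_pcp *pcp)
{
	struct model_page *page;
	size_t i;

	if (pcp->count == 0)
		return NULL;
	page = pcp->slot[0];
	for (i = 1; i < pcp->count; i++)
		pcp->slot[i - 1] = pcp->slot[i];
	pcp->count--;
	return page;
}
```

In this model, a page tagged as a kernel page is always queued at the tail,
so pages last used by userspace are handed out first and the kernel-to-user
transition (with its TLB shootdown) is deferred for as long as possible.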


Thread overview: 30+ messages
     [not found] <1456496467-14247-1-git-send-email-juerg.haefliger@hpe.com>
2016-09-02 11:39 ` [kernel-hardening] [RFC PATCH v2 0/3] Add support for eXclusive Page Frame Ownership (XPFO) Juerg Haefliger
2016-09-02 11:39   ` [kernel-hardening] [RFC PATCH v2 1/3] " Juerg Haefliger
2016-09-02 11:39   ` [kernel-hardening] [RFC PATCH v2 2/3] xpfo: Only put previous userspace pages into the hot cache Juerg Haefliger
2016-09-02 20:39     ` [kernel-hardening] " Dave Hansen
2016-09-05 11:54       ` Juerg Haefliger
2016-09-02 11:39   ` [kernel-hardening] [RFC PATCH v2 3/3] block: Always use a bounce buffer when XPFO is enabled Juerg Haefliger
2016-09-14  7:18   ` [kernel-hardening] [RFC PATCH v2 0/3] Add support for eXclusive Page Frame Ownership (XPFO) Juerg Haefliger
2016-09-14  7:18     ` [kernel-hardening] [RFC PATCH v2 1/3] " Juerg Haefliger
2016-09-14  7:19     ` [kernel-hardening] [RFC PATCH v2 2/3] xpfo: Only put previous userspace pages into the hot cache Juerg Haefliger
2016-09-14 14:33       ` Dave Hansen
2016-09-14 14:40         ` Juerg Haefliger
2016-09-14 14:48           ` Dave Hansen
2016-09-21  5:32             ` Juerg Haefliger
2016-09-14  7:19     ` [kernel-hardening] [RFC PATCH v2 3/3] block: Always use a bounce buffer when XPFO is enabled Juerg Haefliger
2016-09-14  7:33       ` [kernel-hardening] " Christoph Hellwig
2016-09-14  7:23     ` [kernel-hardening] Re: [RFC PATCH v2 0/3] Add support for eXclusive Page Frame Ownership (XPFO) Juerg Haefliger
2016-09-14  9:36     ` [kernel-hardening] " Mark Rutland
2016-09-14  9:49       ` Mark Rutland
2016-11-04 14:45     ` [kernel-hardening] [RFC PATCH v3 0/2] " Juerg Haefliger
2016-11-04 14:45       ` [kernel-hardening] [RFC PATCH v3 1/2] " Juerg Haefliger
2016-11-04 14:50         ` [kernel-hardening] " Christoph Hellwig
2016-11-10  5:53         ` [kernel-hardening] " ZhaoJunmin Zhao(Junmin)
2016-11-10 19:11         ` [kernel-hardening] " Kees Cook
2016-11-15 11:15           ` Juerg Haefliger
2016-11-10 19:24         ` Kees Cook
2016-11-15 11:18           ` Juerg Haefliger
2016-11-24 10:56         ` AKASHI Takahiro
2016-11-28 11:15           ` Juerg Haefliger
2016-12-09  9:02           ` AKASHI Takahiro
2016-11-04 14:45       ` [kernel-hardening] [RFC PATCH v3 2/2] xpfo: Only put previous userspace pages into the hot cache Juerg Haefliger
