* [kernel-hardening] [RFC PATCH v2 0/3] Add support for eXclusive Page Frame Ownership (XPFO)
From: Juerg Haefliger @ 2016-09-02 11:39 UTC
To: linux-kernel, linux-mm, kernel-hardening, linux-x86_64
Cc: juerg.haefliger, vpk

Changes from:

  v1 -> v2:
    - Moved the code from arch/x86/mm/ to mm/ since it's (mostly)
      arch-agnostic.
    - Moved the config to the generic layer and added ARCH_SUPPORTS_XPFO
      for x86.
    - Use page_ext for the additional per-page data.
    - Removed the clearing of pages. This can be accomplished by using
      PAGE_POISONING.
    - Split up the patch into multiple patches.
    - Fixed additional issues identified by reviewers.

This patch series adds support for XPFO which protects against 'ret2dir'
kernel attacks. The basic idea is to enforce exclusive ownership of page
frames by either the kernel or userspace, unless explicitly requested by
the kernel. Whenever a page destined for userspace is allocated, it is
unmapped from physmap (the kernel's direct mapping of physical memory).
When such a page is reclaimed from userspace, it is mapped back to
physmap.

Additional fields in the page_ext struct are used for XPFO housekeeping.
Specifically: two flags, one to distinguish user from kernel pages and one
to tag unmapped pages; a reference counter to balance kmap/kunmap
operations; and a lock to serialize access to the XPFO fields.

Known issues/limitations:
  - Only supports x86-64 (for now)
  - Only supports 4k pages (for now)
  - There are most likely some legitimate use cases where the kernel needs
    to access userspace memory; these need to be made XPFO-aware
  - Performance penalty

Reference paper by the original patch authors:
  http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf

Juerg Haefliger (3):
  Add support for eXclusive Page Frame Ownership (XPFO)
  xpfo: Only put previous userspace pages into the hot cache
  block: Always use a bounce buffer when XPFO is enabled

 arch/x86/Kconfig         |   3 +-
 arch/x86/mm/init.c       |   2 +-
 block/blk-map.c          |   2 +-
 include/linux/highmem.h  |  15 +++-
 include/linux/page_ext.h |   7 ++
 include/linux/xpfo.h     |  41 +++++++++
 lib/swiotlb.c            |   3 +-
 mm/Makefile              |   1 +
 mm/page_alloc.c          |  10 ++-
 mm/page_ext.c            |   4 +
 mm/xpfo.c                | 213 +++++++++++++++++++++++++++++++++++++++++++++++
 security/Kconfig         |  20 +++++
 12 files changed, 314 insertions(+), 7 deletions(-)
 create mode 100644 include/linux/xpfo.h
 create mode 100644 mm/xpfo.c

-- 
2.9.3
* [kernel-hardening] [RFC PATCH v2 1/3] Add support for eXclusive Page Frame Ownership (XPFO)
From: Juerg Haefliger @ 2016-09-02 11:39 UTC
To: linux-kernel, linux-mm, kernel-hardening, linux-x86_64
Cc: juerg.haefliger, vpk

This patch adds support for XPFO which protects against 'ret2dir' kernel
attacks. The basic idea is to enforce exclusive ownership of page frames
by either the kernel or userspace, unless explicitly requested by the
kernel. Whenever a page destined for userspace is allocated, it is
unmapped from physmap (the kernel's direct mapping of physical memory).
When such a page is reclaimed from userspace, it is mapped back to
physmap.

Additional fields in the page_ext struct are used for XPFO housekeeping.
Specifically: two flags, one to distinguish user from kernel pages and one
to tag unmapped pages; a reference counter to balance kmap/kunmap
operations; and a lock to serialize access to the XPFO fields.

Known issues/limitations:
  - Only supports x86-64 (for now)
  - Only supports 4k pages (for now)
  - There are most likely some legitimate use cases where the kernel needs
    to access userspace memory; these need to be made XPFO-aware
  - Performance penalty

Reference paper by the original patch authors:
  http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf

Suggested-by: Vasileios P. Kemerlis <vpk@cs.columbia.edu>
Signed-off-by: Juerg Haefliger <juerg.haefliger@hpe.com>
---
 arch/x86/Kconfig         |   3 +-
 arch/x86/mm/init.c       |   2 +-
 include/linux/highmem.h  |  15 +++-
 include/linux/page_ext.h |   7 ++
 include/linux/xpfo.h     |  39 +++++++++
 lib/swiotlb.c            |   3 +-
 mm/Makefile              |   1 +
 mm/page_alloc.c          |   2 +
 mm/page_ext.c            |   4 +
 mm/xpfo.c                | 205 +++++++++++++++++++++++++++++++++++++++++++++++
 security/Kconfig         |  20 +++++
 11 files changed, 296 insertions(+), 5 deletions(-)
 create mode 100644 include/linux/xpfo.h
 create mode 100644 mm/xpfo.c

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index c580d8c33562..dc5604a710c6 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -165,6 +165,7 @@ config X86
 	select HAVE_STACK_VALIDATION		if X86_64
 	select ARCH_USES_HIGH_VMA_FLAGS		if X86_INTEL_MEMORY_PROTECTION_KEYS
 	select ARCH_HAS_PKEYS			if X86_INTEL_MEMORY_PROTECTION_KEYS
+	select ARCH_SUPPORTS_XPFO		if X86_64
 
 config INSTRUCTION_DECODER
 	def_bool y
@@ -1350,7 +1351,7 @@ config ARCH_DMA_ADDR_T_64BIT
 
 config X86_DIRECT_GBPAGES
 	def_bool y
-	depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK
+	depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK && !XPFO
 	---help---
 	  Certain kernel features effectively disable kernel
 	  linear 1 GB mappings (even if the CPU otherwise
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index d28a2d741f9e..426427b54639 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -161,7 +161,7 @@ static int page_size_mask;
 
 static void __init probe_page_size_mask(void)
 {
-#if !defined(CONFIG_KMEMCHECK)
+#if !defined(CONFIG_KMEMCHECK) && !defined(CONFIG_XPFO)
 	/*
 	 * For CONFIG_KMEMCHECK or pagealloc debugging, identity mapping will
 	 * use small pages.
diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index bb3f3297062a..7a17c166532f 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -7,6 +7,7 @@
 #include <linux/mm.h>
 #include <linux/uaccess.h>
 #include <linux/hardirq.h>
+#include <linux/xpfo.h>
 
 #include <asm/cacheflush.h>
 
@@ -55,24 +56,34 @@ static inline struct page *kmap_to_page(void *addr)
 #ifndef ARCH_HAS_KMAP
 static inline void *kmap(struct page *page)
 {
+	void *kaddr;
+
 	might_sleep();
-	return page_address(page);
+	kaddr = page_address(page);
+	xpfo_kmap(kaddr, page);
+	return kaddr;
 }
 
 static inline void kunmap(struct page *page)
 {
+	xpfo_kunmap(page_address(page), page);
 }
 
 static inline void *kmap_atomic(struct page *page)
 {
+	void *kaddr;
+
 	preempt_disable();
 	pagefault_disable();
-	return page_address(page);
+	kaddr = page_address(page);
+	xpfo_kmap(kaddr, page);
+	return kaddr;
 }
 #define kmap_atomic_prot(page, prot)	kmap_atomic(page)
 
 static inline void __kunmap_atomic(void *addr)
 {
+	xpfo_kunmap(addr, virt_to_page(addr));
 	pagefault_enable();
 	preempt_enable();
 }
diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h
index 03f2a3e7d76d..fdf63dcc399e 100644
--- a/include/linux/page_ext.h
+++ b/include/linux/page_ext.h
@@ -27,6 +27,8 @@ enum page_ext_flags {
 	PAGE_EXT_DEBUG_POISON,		/* Page is poisoned */
 	PAGE_EXT_DEBUG_GUARD,
 	PAGE_EXT_OWNER,
+	PAGE_EXT_XPFO_KERNEL,		/* Page is a kernel page */
+	PAGE_EXT_XPFO_UNMAPPED,		/* Page is unmapped */
 #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT)
 	PAGE_EXT_YOUNG,
 	PAGE_EXT_IDLE,
@@ -48,6 +50,11 @@ struct page_ext {
 	int last_migrate_reason;
 	depot_stack_handle_t handle;
 #endif
+#ifdef CONFIG_XPFO
+	int inited;		/* Map counter and lock initialized */
+	atomic_t mapcount;	/* Counter for balancing map/unmap requests */
+	spinlock_t maplock;	/* Lock to serialize map/unmap requests */
+#endif
 };
 
 extern void pgdat_page_ext_init(struct pglist_data *pgdat);
diff --git a/include/linux/xpfo.h b/include/linux/xpfo.h
new file mode 100644
index 000000000000..77187578ca33
--- /dev/null
+++ b/include/linux/xpfo.h
@@ -0,0 +1,39 @@
+/*
+ * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P.
+ * Copyright (C) 2016 Brown University. All rights reserved.
+ *
+ * Authors:
+ *   Juerg Haefliger <juerg.haefliger@hpe.com>
+ *   Vasileios P. Kemerlis <vpk@cs.brown.edu>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ */
+
+#ifndef _LINUX_XPFO_H
+#define _LINUX_XPFO_H
+
+#ifdef CONFIG_XPFO
+
+extern struct page_ext_operations page_xpfo_ops;
+
+extern void xpfo_kmap(void *kaddr, struct page *page);
+extern void xpfo_kunmap(void *kaddr, struct page *page);
+extern void xpfo_alloc_page(struct page *page, int order, gfp_t gfp);
+extern void xpfo_free_page(struct page *page, int order);
+
+extern bool xpfo_page_is_unmapped(struct page *page);
+
+#else /* !CONFIG_XPFO */
+
+static inline void xpfo_kmap(void *kaddr, struct page *page) { }
+static inline void xpfo_kunmap(void *kaddr, struct page *page) { }
+static inline void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) { }
+static inline void xpfo_free_page(struct page *page, int order) { }
+
+static inline bool xpfo_page_is_unmapped(struct page *page) { return false; }
+
+#endif /* CONFIG_XPFO */
+
+#endif /* _LINUX_XPFO_H */
diff --git a/lib/swiotlb.c b/lib/swiotlb.c
index 22e13a0e19d7..455eff44604e 100644
--- a/lib/swiotlb.c
+++ b/lib/swiotlb.c
@@ -390,8 +390,9 @@ static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr,
 {
 	unsigned long pfn = PFN_DOWN(orig_addr);
 	unsigned char *vaddr = phys_to_virt(tlb_addr);
+	struct page *page = pfn_to_page(pfn);
 
-	if (PageHighMem(pfn_to_page(pfn))) {
+	if (PageHighMem(page) || xpfo_page_is_unmapped(page)) {
 		/* The buffer does not have a mapping.  Map it in and copy */
 		unsigned int offset = orig_addr & ~PAGE_MASK;
 		char *buffer;
diff --git a/mm/Makefile b/mm/Makefile
index 2ca1faf3fa09..e6f8894423da 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -103,3 +103,4 @@ obj-$(CONFIG_IDLE_PAGE_TRACKING) += page_idle.o
 obj-$(CONFIG_FRAME_VECTOR) += frame_vector.o
 obj-$(CONFIG_DEBUG_PAGE_REF) += debug_page_ref.o
 obj-$(CONFIG_HARDENED_USERCOPY) += usercopy.o
+obj-$(CONFIG_XPFO) += xpfo.o
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3fbe73a6fe4b..0241c8a7e72a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1029,6 +1029,7 @@ static __always_inline bool free_pages_prepare(struct page *page,
 	kernel_poison_pages(page, 1 << order, 0);
 	kernel_map_pages(page, 1 << order, 0);
 	kasan_free_pages(page, order);
+	xpfo_free_page(page, order);
 
 	return true;
 }
@@ -1726,6 +1727,7 @@ inline void post_alloc_hook(struct page *page, unsigned int order,
 	kernel_map_pages(page, 1 << order, 1);
 	kernel_poison_pages(page, 1 << order, 1);
 	kasan_alloc_pages(page, order);
+	xpfo_alloc_page(page, order, gfp_flags);
 	set_page_owner(page, order, gfp_flags);
 }
 
diff --git a/mm/page_ext.c b/mm/page_ext.c
index 44a4c029c8e7..1cd7d7f460cc 100644
--- a/mm/page_ext.c
+++ b/mm/page_ext.c
@@ -7,6 +7,7 @@
 #include <linux/kmemleak.h>
 #include <linux/page_owner.h>
 #include <linux/page_idle.h>
+#include <linux/xpfo.h>
 
 /*
  * struct page extension
@@ -63,6 +64,9 @@ static struct page_ext_operations *page_ext_ops[] = {
 #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT)
 	&page_idle_ops,
 #endif
+#ifdef CONFIG_XPFO
+	&page_xpfo_ops,
+#endif
 };
 
 static unsigned long total_usage;
diff --git a/mm/xpfo.c b/mm/xpfo.c
new file mode 100644
index 000000000000..ddb1be05485d
--- /dev/null
+++ b/mm/xpfo.c
@@ -0,0 +1,205 @@
+/*
+ * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P.
+ * Copyright (C) 2016 Brown University. All rights reserved.
+ *
+ * Authors:
+ *   Juerg Haefliger <juerg.haefliger@hpe.com>
+ *   Vasileios P. Kemerlis <vpk@cs.brown.edu>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ */
+
+#include <linux/mm.h>
+#include <linux/module.h>
+#include <linux/page_ext.h>
+#include <linux/xpfo.h>
+
+#include <asm/tlbflush.h>
+
+DEFINE_STATIC_KEY_FALSE(xpfo_inited);
+
+static bool need_xpfo(void)
+{
+	return true;
+}
+
+static void init_xpfo(void)
+{
+	printk(KERN_INFO "XPFO enabled\n");
+	static_branch_enable(&xpfo_inited);
+}
+
+struct page_ext_operations page_xpfo_ops = {
+	.need = need_xpfo,
+	.init = init_xpfo,
+};
+
+/*
+ * Update a single kernel page table entry
+ */
+static inline void set_kpte(struct page *page, unsigned long kaddr,
+			    pgprot_t prot) {
+	unsigned int level;
+	pte_t *kpte = lookup_address(kaddr, &level);
+
+	/* We only support 4k pages for now */
+	BUG_ON(!kpte || level != PG_LEVEL_4K);
+
+	set_pte_atomic(kpte, pfn_pte(page_to_pfn(page), canon_pgprot(prot)));
+}
+
+void xpfo_alloc_page(struct page *page, int order, gfp_t gfp)
+{
+	int i, flush_tlb = 0;
+	struct page_ext *page_ext;
+	unsigned long kaddr;
+
+	if (!static_branch_unlikely(&xpfo_inited))
+		return;
+
+	for (i = 0; i < (1 << order); i++)  {
+		page_ext = lookup_page_ext(page + i);
+
+		BUG_ON(test_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags));
+
+		/* Initialize the map lock and map counter */
+		if (!page_ext->inited) {
+			spin_lock_init(&page_ext->maplock);
+			atomic_set(&page_ext->mapcount, 0);
+			page_ext->inited = 1;
+		}
+		BUG_ON(atomic_read(&page_ext->mapcount));
+
+		if ((gfp & GFP_HIGHUSER) == GFP_HIGHUSER) {
+			/*
+			 * Flush the TLB if the page was previously allocated
+			 * to the kernel.
+			 */
+			if (test_and_clear_bit(PAGE_EXT_XPFO_KERNEL,
+					       &page_ext->flags))
+				flush_tlb = 1;
+		} else {
+			/* Tag the page as a kernel page */
+			set_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags);
+		}
+	}
+
+	if (flush_tlb) {
+		kaddr = (unsigned long)page_address(page);
+		flush_tlb_kernel_range(kaddr, kaddr + (1 << order) *
+				       PAGE_SIZE);
+	}
+}
+
+void xpfo_free_page(struct page *page, int order)
+{
+	int i;
+	struct page_ext *page_ext;
+	unsigned long kaddr;
+
+	if (!static_branch_unlikely(&xpfo_inited))
+		return;
+
+	for (i = 0; i < (1 << order); i++) {
+		page_ext = lookup_page_ext(page + i);
+
+		if (!page_ext->inited) {
+			/*
+			 * The page was allocated before page_ext was
+			 * initialized, so it is a kernel page and it needs to
+			 * be tagged accordingly.
+			 */
+			set_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags);
+			continue;
+		}
+
+		/*
+		 * Map the page back into the kernel if it was previously
+		 * allocated to user space.
+		 */
+		if (test_and_clear_bit(PAGE_EXT_XPFO_UNMAPPED,
+				       &page_ext->flags)) {
+			kaddr = (unsigned long)page_address(page + i);
+			set_kpte(page + i,  kaddr, __pgprot(__PAGE_KERNEL));
+		}
+	}
+}
+
+void xpfo_kmap(void *kaddr, struct page *page)
+{
+	struct page_ext *page_ext;
+	unsigned long flags;
+
+	if (!static_branch_unlikely(&xpfo_inited))
+		return;
+
+	page_ext = lookup_page_ext(page);
+
+	/*
+	 * The page was allocated before page_ext was initialized (which means
+	 * it's a kernel page) or it's allocated to the kernel, so nothing to
+	 * do.
+	 */
+	if (!page_ext->inited ||
+	    test_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags))
+		return;
+
+	spin_lock_irqsave(&page_ext->maplock, flags);
+
+	/*
+	 * The page was previously allocated to user space, so map it back
+	 * into the kernel. No TLB flush required.
+	 */
+	if ((atomic_inc_return(&page_ext->mapcount) == 1) &&
+	    test_and_clear_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags))
+		set_kpte(page, (unsigned long)kaddr, __pgprot(__PAGE_KERNEL));
+
+	spin_unlock_irqrestore(&page_ext->maplock, flags);
+}
+EXPORT_SYMBOL(xpfo_kmap);
+
+void xpfo_kunmap(void *kaddr, struct page *page)
+{
+	struct page_ext *page_ext;
+	unsigned long flags;
+
+	if (!static_branch_unlikely(&xpfo_inited))
+		return;
+
+	page_ext = lookup_page_ext(page);
+
+	/*
+	 * The page was allocated before page_ext was initialized (which means
+	 * it's a kernel page) or it's allocated to the kernel, so nothing to
+	 * do.
+	 */
+	if (!page_ext->inited ||
+	    test_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags))
+		return;
+
+	spin_lock_irqsave(&page_ext->maplock, flags);
+
+	/*
+	 * The page is to be allocated back to user space, so unmap it from the
+	 * kernel, flush the TLB and tag it as a user page.
+	 */
+	if (atomic_dec_return(&page_ext->mapcount) == 0) {
+		BUG_ON(test_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags));
+		set_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags);
+		set_kpte(page, (unsigned long)kaddr, __pgprot(0));
+		__flush_tlb_one((unsigned long)kaddr);
+	}
+
+	spin_unlock_irqrestore(&page_ext->maplock, flags);
+}
+EXPORT_SYMBOL(xpfo_kunmap);
+
+inline bool xpfo_page_is_unmapped(struct page *page)
+{
+	if (!static_branch_unlikely(&xpfo_inited))
+		return false;
+
+	return test_bit(PAGE_EXT_XPFO_UNMAPPED, &lookup_page_ext(page)->flags);
+}
diff --git a/security/Kconfig b/security/Kconfig
index da10d9b573a4..1eac37a9bec2 100644
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -6,6 +6,26 @@ menu "Security options"
 
 source security/keys/Kconfig
 
+config ARCH_SUPPORTS_XPFO
+	bool
+
+config XPFO
+	bool "Enable eXclusive Page Frame Ownership (XPFO)"
+	default n
+	depends on DEBUG_KERNEL && ARCH_SUPPORTS_XPFO
+	select DEBUG_TLBFLUSH
+	select PAGE_EXTENSION
+	help
+	  This option offers protection against 'ret2dir' kernel attacks.
+	  When enabled, every time a page frame is allocated to user space, it
+	  is unmapped from the direct mapped RAM region in kernel space
+	  (physmap). Similarly, when a page frame is freed/reclaimed, it is
+	  mapped back to physmap.
+
+	  There is a slight performance impact when this option is enabled.
+
+	  If in doubt, say "N".
+
 config SECURITY_DMESG_RESTRICT
 	bool "Restrict unprivileged access to the kernel syslog"
 	default n
-- 
2.9.3
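
The mapcount bookkeeping that xpfo_kmap() and xpfo_kunmap() perform in
the patch above can be modelled in plain userspace C. This is only a
sketch under stated assumptions, not kernel code: struct model_page,
model_kmap() and model_kunmap() are hypothetical names, a bool stands in
for the physmap PTE, and the spinlock and TLB flush are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the XPFO fields in struct page_ext. */
struct model_page {
	bool kernel;    /* models PAGE_EXT_XPFO_KERNEL */
	bool unmapped;  /* models PAGE_EXT_XPFO_UNMAPPED */
	int mapcount;   /* models the kmap/kunmap balance counter */
};

/* Models xpfo_kmap(): the first mapper of a user page restores the mapping. */
void model_kmap(struct model_page *p)
{
	if (p->kernel)
		return;                 /* kernel pages are never unmapped */
	if (++p->mapcount == 1 && p->unmapped)
		p->unmapped = false;    /* stands in for set_kpte(..., __PAGE_KERNEL) */
}

/* Models xpfo_kunmap(): the last unmapper removes the mapping again. */
void model_kunmap(struct model_page *p)
{
	if (p->kernel)
		return;
	if (--p->mapcount == 0) {
		assert(!p->unmapped);   /* mirrors the BUG_ON() in the patch */
		p->unmapped = true;     /* stands in for set_kpte(..., 0) plus TLB flush */
	}
}
```

In this model, nested model_kmap() calls on a user page only change the
mapping on the outermost pair, which is the reason the patch keeps a
counter rather than a single flag.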
* [kernel-hardening] [RFC PATCH v2 2/3] xpfo: Only put previous userspace pages into the hot cache
From: Juerg Haefliger @ 2016-09-02 11:39 UTC
To: linux-kernel, linux-mm, kernel-hardening, linux-x86_64
Cc: juerg.haefliger, vpk

Allocating a page to userspace that was previously allocated to the
kernel requires an expensive TLB shootdown. To minimize this, we only
put non-kernel pages into the hot cache to favor their allocation.

Signed-off-by: Juerg Haefliger <juerg.haefliger@hpe.com>
---
 include/linux/xpfo.h | 2 ++
 mm/page_alloc.c      | 8 +++++++-
 mm/xpfo.c            | 8 ++++++++
 3 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/include/linux/xpfo.h b/include/linux/xpfo.h
index 77187578ca33..077d1cfadfa2 100644
--- a/include/linux/xpfo.h
+++ b/include/linux/xpfo.h
@@ -24,6 +24,7 @@ extern void xpfo_alloc_page(struct page *page, int order, gfp_t gfp);
 extern void xpfo_free_page(struct page *page, int order);
 
 extern bool xpfo_page_is_unmapped(struct page *page);
+extern bool xpfo_page_is_kernel(struct page *page);
 
 #else /* !CONFIG_XPFO */
 
@@ -33,6 +34,7 @@ static inline void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) { }
 static inline void xpfo_free_page(struct page *page, int order) { }
 
 static inline bool xpfo_page_is_unmapped(struct page *page) { return false; }
+static inline bool xpfo_page_is_kernel(struct page *page) { return false; }
 
 #endif /* CONFIG_XPFO */
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0241c8a7e72a..83404b41e52d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2421,7 +2421,13 @@ void free_hot_cold_page(struct page *page, bool cold)
 	}
 
 	pcp = &this_cpu_ptr(zone->pageset)->pcp;
-	if (!cold)
+	/*
+	 * XPFO: Allocating a page to userspace that was previously allocated
+	 * to the kernel requires an expensive TLB shootdown. To minimize this,
+	 * we only put non-kernel pages into the hot cache to favor their
+	 * allocation.
+	 */
+	if (!cold && !xpfo_page_is_kernel(page))
 		list_add(&page->lru, &pcp->lists[migratetype]);
 	else
 		list_add_tail(&page->lru, &pcp->lists[migratetype]);
diff --git a/mm/xpfo.c b/mm/xpfo.c
index ddb1be05485d..f8dffda0c961 100644
--- a/mm/xpfo.c
+++ b/mm/xpfo.c
@@ -203,3 +203,11 @@ inline bool xpfo_page_is_unmapped(struct page *page)
 
 	return test_bit(PAGE_EXT_XPFO_UNMAPPED, &lookup_page_ext(page)->flags);
 }
+
+inline bool xpfo_page_is_kernel(struct page *page)
+{
+	if (!static_branch_unlikely(&xpfo_inited))
+		return false;
+
+	return test_bit(PAGE_EXT_XPFO_KERNEL, &lookup_page_ext(page)->flags);
+}
-- 
2.9.3
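
The effect of the free_hot_cold_page() change above can be illustrated
with a small userspace model of one per-CPU free list. This is a sketch,
not the kernel's list implementation: struct model_pcp, model_free_page()
and model_alloc_page() are hypothetical names, pages are plain integer
ids, and the model assumes that a normal (non-cold) allocation takes its
page from the head of the list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_LIST_CAP 8

/* Hypothetical fixed-size model of one per-CPU free list; index 0 is the hot end. */
struct model_pcp {
	int pages[MODEL_LIST_CAP];
	size_t count;
};

/*
 * Models the patched free path: only pages that are neither cold nor
 * tagged as kernel pages go to the hot end; everything else goes to the
 * tail. Returns false if the model list is full.
 */
bool model_free_page(struct model_pcp *pcp, int page_id, bool cold,
		     bool was_kernel)
{
	size_t i;

	if (pcp->count == MODEL_LIST_CAP)
		return false;

	if (!cold && !was_kernel) {
		/* Hot: shift existing entries and insert at the head. */
		for (i = pcp->count; i > 0; i--)
			pcp->pages[i] = pcp->pages[i - 1];
		pcp->pages[0] = page_id;
	} else {
		/* Cold or former kernel page: append at the tail. */
		pcp->pages[pcp->count] = page_id;
	}
	pcp->count++;
	return true;
}

/* Takes the page at the hot end; returns -1 when the list is empty. */
int model_alloc_page(struct model_pcp *pcp)
{
	size_t i;
	int id;

	if (pcp->count == 0)
		return -1;

	id = pcp->pages[0];
	for (i = 1; i < pcp->count; i++)
		pcp->pages[i - 1] = pcp->pages[i];
	pcp->count--;
	return id;
}
```

In this model, a former kernel page freed before a former user page is
still handed out after it, so the allocation that would need a TLB flush
when it goes to userspace is deferred for as long as other pages remain.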
* [kernel-hardening] Re: [RFC PATCH v2 2/3] xpfo: Only put previous userspace pages into the hot cache
From: Dave Hansen @ 2016-09-02 20:39 UTC
To: Juerg Haefliger, linux-kernel, linux-mm, kernel-hardening, linux-x86_64
Cc: vpk

On 09/02/2016 04:39 AM, Juerg Haefliger wrote:
> Allocating a page to userspace that was previously allocated to the
> kernel requires an expensive TLB shootdown. To minimize this, we only
> put non-kernel pages into the hot cache to favor their allocation.

But kernel allocations do allocate from these pools, right?  Does this
just mean that kernel allocations usually have to pay the penalty to
convert a page?

So, what's the logic here?  You're assuming that order-0 kernel
allocations are more rare than allocations for userspace?
* [kernel-hardening] Re: [RFC PATCH v2 2/3] xpfo: Only put previous userspace pages into the hot cache
From: Juerg Haefliger @ 2016-09-05 11:54 UTC
To: Dave Hansen, linux-kernel, linux-mm, kernel-hardening, linux-x86_64
Cc: vpk

On 09/02/2016 10:39 PM, Dave Hansen wrote:
> On 09/02/2016 04:39 AM, Juerg Haefliger wrote:
>> Allocating a page to userspace that was previously allocated to the
>> kernel requires an expensive TLB shootdown. To minimize this, we only
>> put non-kernel pages into the hot cache to favor their allocation.
>
> But kernel allocations do allocate from these pools, right?

Yes.

> Does this
> just mean that kernel allocations usually have to pay the penalty to
> convert a page?

Only pages that are allocated for userspace, i.e.
(gfp & GFP_HIGHUSER) == GFP_HIGHUSER, which were previously allocated
for the kernel, i.e. (gfp & GFP_HIGHUSER) != GFP_HIGHUSER, have to pay
the penalty.

> So, what's the logic here?  You're assuming that order-0 kernel
> allocations are more rare than allocations for userspace?

The logic is to put reclaimed kernel pages into the cold cache to
postpone their allocation as long as possible to minimize (potential)
TLB flushes.

...Juerg
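
The user-versus-kernel test discussed in this reply is the mask
comparison from xpfo_alloc_page() in patch 1/3. Written as C it needs
its parentheses, because == binds more tightly than &. A minimal sketch
follows; the MODEL_GFP_* values are made-up bit assignments for
illustration only and are not the kernel's real GFP bits, and both
function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values for illustration; NOT the kernel's GFP flags. */
#define MODEL_GFP_KERNEL   0x01u
#define MODEL_GFP_HIGHMEM  0x02u
#define MODEL_GFP_HARDWALL 0x04u
#define MODEL_GFP_USER     (MODEL_GFP_KERNEL | MODEL_GFP_HARDWALL)
#define MODEL_GFP_HIGHUSER (MODEL_GFP_USER | MODEL_GFP_HIGHMEM)

/* Same shape as the test in xpfo_alloc_page(): every HIGHUSER bit must be set. */
bool model_is_user_alloc(unsigned int gfp)
{
	return (gfp & MODEL_GFP_HIGHUSER) == MODEL_GFP_HIGHUSER;
}

/*
 * How "gfp & MASK == MASK" parses without the inner parentheses: the
 * comparison is evaluated first and yields 1, so only bit 0 of gfp is
 * tested. The grouping is spelled out here to make that explicit.
 */
bool model_is_user_alloc_misparsed(unsigned int gfp)
{
	return gfp & (MODEL_GFP_HIGHUSER == MODEL_GFP_HIGHUSER);
}
```

With these made-up values, a kernel-style mask is rejected by the first
function but accepted by the second, which shows why the parenthesized
form is the one to use in code.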
* [kernel-hardening] [RFC PATCH v2 3/3] block: Always use a bounce buffer when XPFO is enabled
From: Juerg Haefliger @ 2016-09-02 11:39 UTC
To: linux-kernel, linux-mm, kernel-hardening, linux-x86_64
Cc: juerg.haefliger, vpk

This is a temporary hack to prevent the use of bio_map_user_iov()
which causes XPFO page faults.

Signed-off-by: Juerg Haefliger <juerg.haefliger@hpe.com>
---
 block/blk-map.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/blk-map.c b/block/blk-map.c
index b8657fa8dc9a..e889dbfee6fb 100644
--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -52,7 +52,7 @@ static int __blk_rq_map_user_iov(struct request *rq,
 	struct bio *bio, *orig_bio;
 	int ret;
 
-	if (copy)
+	if (copy || IS_ENABLED(CONFIG_XPFO))
 		bio = bio_copy_user_iov(q, map_data, iter, gfp_mask);
 	else
 		bio = bio_map_user_iov(q, iter, gfp_mask);
-- 
2.9.3
* [kernel-hardening] [RFC PATCH v2 0/3] Add support for eXclusive Page Frame Ownership (XPFO)
From: Juerg Haefliger @ 2016-09-14 7:18 UTC
To: linux-kernel, linux-mm, kernel-hardening, linux-x86_64
Cc: juerg.haefliger, vpk

Changes from:

  v1 -> v2:
    - Moved the code from arch/x86/mm/ to mm/ since it's (mostly)
      arch-agnostic.
    - Moved the config to the generic layer and added ARCH_SUPPORTS_XPFO
      for x86.
    - Use page_ext for the additional per-page data.
    - Removed the clearing of pages. This can be accomplished by using
      PAGE_POISONING.
    - Split up the patch into multiple patches.
    - Fixed additional issues identified by reviewers.

This patch series adds support for XPFO which protects against 'ret2dir'
kernel attacks. The basic idea is to enforce exclusive ownership of page
frames by either the kernel or userspace, unless explicitly requested by
the kernel. Whenever a page destined for userspace is allocated, it is
unmapped from physmap (the kernel's direct mapping of physical memory).
When such a page is reclaimed from userspace, it is mapped back to
physmap.

Additional fields in the page_ext struct are used for XPFO housekeeping.
Specifically: two flags, one to distinguish user from kernel pages and one
to tag unmapped pages; a reference counter to balance kmap/kunmap
operations; and a lock to serialize access to the XPFO fields.

Known issues/limitations:
  - Only supports x86-64 (for now)
  - Only supports 4k pages (for now)
  - There are most likely some legitimate use cases where the kernel needs
    to access userspace memory; these need to be made XPFO-aware
  - Performance penalty

Reference paper by the original patch authors:
  http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf

Juerg Haefliger (3):
  Add support for eXclusive Page Frame Ownership (XPFO)
  xpfo: Only put previous userspace pages into the hot cache
  block: Always use a bounce buffer when XPFO is enabled

 arch/x86/Kconfig         |   3 +-
 arch/x86/mm/init.c       |   2 +-
 block/blk-map.c          |   2 +-
 include/linux/highmem.h  |  15 +++-
 include/linux/page_ext.h |   7 ++
 include/linux/xpfo.h     |  41 +++++++++
 lib/swiotlb.c            |   3 +-
 mm/Makefile              |   1 +
 mm/page_alloc.c          |  10 ++-
 mm/page_ext.c            |   4 +
 mm/xpfo.c                | 213 +++++++++++++++++++++++++++++++++++++++++++++++
 security/Kconfig         |  20 +++++
 12 files changed, 314 insertions(+), 7 deletions(-)
 create mode 100644 include/linux/xpfo.h
 create mode 100644 mm/xpfo.c

-- 
2.9.3
* [kernel-hardening] [RFC PATCH v2 1/3] Add support for eXclusive Page Frame Ownership (XPFO)
From: Juerg Haefliger @ 2016-09-14 7:18 UTC
To: linux-kernel, linux-mm, kernel-hardening, linux-x86_64
Cc: juerg.haefliger, vpk

This patch adds support for XPFO which protects against 'ret2dir' kernel
attacks. The basic idea is to enforce exclusive ownership of page frames
by either the kernel or userspace, unless explicitly requested by the
kernel. Whenever a page destined for userspace is allocated, it is
unmapped from physmap (the kernel's direct mapping of physical memory).
When such a page is reclaimed from userspace, it is mapped back to
physmap.

Additional fields in the page_ext struct are used for XPFO housekeeping.
Specifically: two flags, one to distinguish user from kernel pages and one
to tag unmapped pages; a reference counter to balance kmap/kunmap
operations; and a lock to serialize access to the XPFO fields.

Known issues/limitations:
  - Only supports x86-64 (for now)
  - Only supports 4k pages (for now)
  - There are most likely some legitimate use cases where the kernel needs
    to access userspace memory; these need to be made XPFO-aware
  - Performance penalty

Reference paper by the original patch authors:
  http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf

Suggested-by: Vasileios P. Kemerlis <vpk@cs.columbia.edu>
Signed-off-by: Juerg Haefliger <juerg.haefliger@hpe.com>
---
 arch/x86/Kconfig         |   3 +-
 arch/x86/mm/init.c       |   2 +-
 include/linux/highmem.h  |  15 +++-
 include/linux/page_ext.h |   7 ++
 include/linux/xpfo.h     |  39 +++++++++
 lib/swiotlb.c            |   3 +-
 mm/Makefile              |   1 +
 mm/page_alloc.c          |   2 +
 mm/page_ext.c            |   4 +
 mm/xpfo.c                | 205 +++++++++++++++++++++++++++++++++++++++++++++++
 security/Kconfig         |  20 +++++
 11 files changed, 296 insertions(+), 5 deletions(-)
 create mode 100644 include/linux/xpfo.h
 create mode 100644 mm/xpfo.c

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index c580d8c33562..dc5604a710c6 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -165,6 +165,7 @@ config X86
 	select HAVE_STACK_VALIDATION		if X86_64
 	select ARCH_USES_HIGH_VMA_FLAGS		if X86_INTEL_MEMORY_PROTECTION_KEYS
 	select ARCH_HAS_PKEYS			if X86_INTEL_MEMORY_PROTECTION_KEYS
+	select ARCH_SUPPORTS_XPFO		if X86_64
 
 config INSTRUCTION_DECODER
 	def_bool y
@@ -1350,7 +1351,7 @@ config ARCH_DMA_ADDR_T_64BIT
 
 config X86_DIRECT_GBPAGES
 	def_bool y
-	depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK
+	depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK && !XPFO
 	---help---
 	  Certain kernel features effectively disable kernel
 	  linear 1 GB mappings (even if the CPU otherwise
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index d28a2d741f9e..426427b54639 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -161,7 +161,7 @@ static int page_size_mask;
 
 static void __init probe_page_size_mask(void)
 {
-#if !defined(CONFIG_KMEMCHECK)
+#if !defined(CONFIG_KMEMCHECK) && !defined(CONFIG_XPFO)
 	/*
 	 * For CONFIG_KMEMCHECK or pagealloc debugging, identity mapping will
 	 * use small pages.
diff --git a/include/linux/highmem.h b/include/linux/highmem.h index bb3f3297062a..7a17c166532f 100644 --- a/include/linux/highmem.h +++ b/include/linux/highmem.h @@ -7,6 +7,7 @@ #include <linux/mm.h> #include <linux/uaccess.h> #include <linux/hardirq.h> +#include <linux/xpfo.h> #include <asm/cacheflush.h> @@ -55,24 +56,34 @@ static inline struct page *kmap_to_page(void *addr) #ifndef ARCH_HAS_KMAP static inline void *kmap(struct page *page) { + void *kaddr; + might_sleep(); - return page_address(page); + kaddr = page_address(page); + xpfo_kmap(kaddr, page); + return kaddr; } static inline void kunmap(struct page *page) { + xpfo_kunmap(page_address(page), page); } static inline void *kmap_atomic(struct page *page) { + void *kaddr; + preempt_disable(); pagefault_disable(); - return page_address(page); + kaddr = page_address(page); + xpfo_kmap(kaddr, page); + return kaddr; } #define kmap_atomic_prot(page, prot) kmap_atomic(page) static inline void __kunmap_atomic(void *addr) { + xpfo_kunmap(addr, virt_to_page(addr)); pagefault_enable(); preempt_enable(); } diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h index 03f2a3e7d76d..fdf63dcc399e 100644 --- a/include/linux/page_ext.h +++ b/include/linux/page_ext.h @@ -27,6 +27,8 @@ enum page_ext_flags { PAGE_EXT_DEBUG_POISON, /* Page is poisoned */ PAGE_EXT_DEBUG_GUARD, PAGE_EXT_OWNER, + PAGE_EXT_XPFO_KERNEL, /* Page is a kernel page */ + PAGE_EXT_XPFO_UNMAPPED, /* Page is unmapped */ #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT) PAGE_EXT_YOUNG, PAGE_EXT_IDLE, @@ -48,6 +50,11 @@ struct page_ext { int last_migrate_reason; depot_stack_handle_t handle; #endif +#ifdef CONFIG_XPFO + int inited; /* Map counter and lock initialized */ + atomic_t mapcount; /* Counter for balancing map/unmap requests */ + spinlock_t maplock; /* Lock to serialize map/unmap requests */ +#endif }; extern void pgdat_page_ext_init(struct pglist_data *pgdat); diff --git a/include/linux/xpfo.h b/include/linux/xpfo.h new file 
mode 100644 index 000000000000..77187578ca33 --- /dev/null +++ b/include/linux/xpfo.h @@ -0,0 +1,39 @@ +/* + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P. + * Copyright (C) 2016 Brown University. All rights reserved. + * + * Authors: + * Juerg Haefliger <juerg.haefliger@hpe.com> + * Vasileios P. Kemerlis <vpk@cs.brown.edu> + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License version 2 as published by + * the Free Software Foundation. + */ + +#ifndef _LINUX_XPFO_H +#define _LINUX_XPFO_H + +#ifdef CONFIG_XPFO + +extern struct page_ext_operations page_xpfo_ops; + +extern void xpfo_kmap(void *kaddr, struct page *page); +extern void xpfo_kunmap(void *kaddr, struct page *page); +extern void xpfo_alloc_page(struct page *page, int order, gfp_t gfp); +extern void xpfo_free_page(struct page *page, int order); + +extern bool xpfo_page_is_unmapped(struct page *page); + +#else /* !CONFIG_XPFO */ + +static inline void xpfo_kmap(void *kaddr, struct page *page) { } +static inline void xpfo_kunmap(void *kaddr, struct page *page) { } +static inline void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) { } +static inline void xpfo_free_page(struct page *page, int order) { } + +static inline bool xpfo_page_is_unmapped(struct page *page) { return false; } + +#endif /* CONFIG_XPFO */ + +#endif /* _LINUX_XPFO_H */ diff --git a/lib/swiotlb.c b/lib/swiotlb.c index 22e13a0e19d7..455eff44604e 100644 --- a/lib/swiotlb.c +++ b/lib/swiotlb.c @@ -390,8 +390,9 @@ static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr, { unsigned long pfn = PFN_DOWN(orig_addr); unsigned char *vaddr = phys_to_virt(tlb_addr); + struct page *page = pfn_to_page(pfn); - if (PageHighMem(pfn_to_page(pfn))) { + if (PageHighMem(page) || xpfo_page_is_unmapped(page)) { /* The buffer does not have a mapping. 
Map it in and copy */ unsigned int offset = orig_addr & ~PAGE_MASK; char *buffer; diff --git a/mm/Makefile b/mm/Makefile index 2ca1faf3fa09..e6f8894423da 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -103,3 +103,4 @@ obj-$(CONFIG_IDLE_PAGE_TRACKING) += page_idle.o obj-$(CONFIG_FRAME_VECTOR) += frame_vector.o obj-$(CONFIG_DEBUG_PAGE_REF) += debug_page_ref.o obj-$(CONFIG_HARDENED_USERCOPY) += usercopy.o +obj-$(CONFIG_XPFO) += xpfo.o diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 3fbe73a6fe4b..0241c8a7e72a 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1029,6 +1029,7 @@ static __always_inline bool free_pages_prepare(struct page *page, kernel_poison_pages(page, 1 << order, 0); kernel_map_pages(page, 1 << order, 0); kasan_free_pages(page, order); + xpfo_free_page(page, order); return true; } @@ -1726,6 +1727,7 @@ inline void post_alloc_hook(struct page *page, unsigned int order, kernel_map_pages(page, 1 << order, 1); kernel_poison_pages(page, 1 << order, 1); kasan_alloc_pages(page, order); + xpfo_alloc_page(page, order, gfp_flags); set_page_owner(page, order, gfp_flags); } diff --git a/mm/page_ext.c b/mm/page_ext.c index 44a4c029c8e7..1cd7d7f460cc 100644 --- a/mm/page_ext.c +++ b/mm/page_ext.c @@ -7,6 +7,7 @@ #include <linux/kmemleak.h> #include <linux/page_owner.h> #include <linux/page_idle.h> +#include <linux/xpfo.h> /* * struct page extension @@ -63,6 +64,9 @@ static struct page_ext_operations *page_ext_ops[] = { #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT) &page_idle_ops, #endif +#ifdef CONFIG_XPFO + &page_xpfo_ops, +#endif }; static unsigned long total_usage; diff --git a/mm/xpfo.c b/mm/xpfo.c new file mode 100644 index 000000000000..ddb1be05485d --- /dev/null +++ b/mm/xpfo.c @@ -0,0 +1,205 @@ +/* + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P. + * Copyright (C) 2016 Brown University. All rights reserved. + * + * Authors: + * Juerg Haefliger <juerg.haefliger@hpe.com> + * Vasileios P. 
Kemerlis <vpk@cs.brown.edu> + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License version 2 as published by + * the Free Software Foundation. + */ + +#include <linux/mm.h> +#include <linux/module.h> +#include <linux/page_ext.h> +#include <linux/xpfo.h> + +#include <asm/tlbflush.h> + +DEFINE_STATIC_KEY_FALSE(xpfo_inited); + +static bool need_xpfo(void) +{ + return true; +} + +static void init_xpfo(void) +{ + printk(KERN_INFO "XPFO enabled\n"); + static_branch_enable(&xpfo_inited); +} + +struct page_ext_operations page_xpfo_ops = { + .need = need_xpfo, + .init = init_xpfo, +}; + +/* + * Update a single kernel page table entry + */ +static inline void set_kpte(struct page *page, unsigned long kaddr, + pgprot_t prot) { + unsigned int level; + pte_t *kpte = lookup_address(kaddr, &level); + + /* We only support 4k pages for now */ + BUG_ON(!kpte || level != PG_LEVEL_4K); + + set_pte_atomic(kpte, pfn_pte(page_to_pfn(page), canon_pgprot(prot))); +} + +void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) +{ + int i, flush_tlb = 0; + struct page_ext *page_ext; + unsigned long kaddr; + + if (!static_branch_unlikely(&xpfo_inited)) + return; + + for (i = 0; i < (1 << order); i++) { + page_ext = lookup_page_ext(page + i); + + BUG_ON(test_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)); + + /* Initialize the map lock and map counter */ + if (!page_ext->inited) { + spin_lock_init(&page_ext->maplock); + atomic_set(&page_ext->mapcount, 0); + page_ext->inited = 1; + } + BUG_ON(atomic_read(&page_ext->mapcount)); + + if ((gfp & GFP_HIGHUSER) == GFP_HIGHUSER) { + /* + * Flush the TLB if the page was previously allocated + * to the kernel. 
+ */ + if (test_and_clear_bit(PAGE_EXT_XPFO_KERNEL, + &page_ext->flags)) + flush_tlb = 1; + } else { + /* Tag the page as a kernel page */ + set_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags); + } + } + + if (flush_tlb) { + kaddr = (unsigned long)page_address(page); + flush_tlb_kernel_range(kaddr, kaddr + (1 << order) * + PAGE_SIZE); + } +} + +void xpfo_free_page(struct page *page, int order) +{ + int i; + struct page_ext *page_ext; + unsigned long kaddr; + + if (!static_branch_unlikely(&xpfo_inited)) + return; + + for (i = 0; i < (1 << order); i++) { + page_ext = lookup_page_ext(page + i); + + if (!page_ext->inited) { + /* + * The page was allocated before page_ext was + * initialized, so it is a kernel page and it needs to + * be tagged accordingly. + */ + set_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags); + continue; + } + + /* + * Map the page back into the kernel if it was previously + * allocated to user space. + */ + if (test_and_clear_bit(PAGE_EXT_XPFO_UNMAPPED, + &page_ext->flags)) { + kaddr = (unsigned long)page_address(page + i); + set_kpte(page + i, kaddr, __pgprot(__PAGE_KERNEL)); + } + } +} + +void xpfo_kmap(void *kaddr, struct page *page) +{ + struct page_ext *page_ext; + unsigned long flags; + + if (!static_branch_unlikely(&xpfo_inited)) + return; + + page_ext = lookup_page_ext(page); + + /* + * The page was allocated before page_ext was initialized (which means + * it's a kernel page) or it's allocated to the kernel, so nothing to + * do. + */ + if (!page_ext->inited || + test_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags)) + return; + + spin_lock_irqsave(&page_ext->maplock, flags); + + /* + * The page was previously allocated to user space, so map it back + * into the kernel. No TLB flush required. 
+ */ + if ((atomic_inc_return(&page_ext->mapcount) == 1) && + test_and_clear_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)) + set_kpte(page, (unsigned long)kaddr, __pgprot(__PAGE_KERNEL)); + + spin_unlock_irqrestore(&page_ext->maplock, flags); +} +EXPORT_SYMBOL(xpfo_kmap); + +void xpfo_kunmap(void *kaddr, struct page *page) +{ + struct page_ext *page_ext; + unsigned long flags; + + if (!static_branch_unlikely(&xpfo_inited)) + return; + + page_ext = lookup_page_ext(page); + + /* + * The page was allocated before page_ext was initialized (which means + * it's a kernel page) or it's allocated to the kernel, so nothing to + * do. + */ + if (!page_ext->inited || + test_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags)) + return; + + spin_lock_irqsave(&page_ext->maplock, flags); + + /* + * The page is to be allocated back to user space, so unmap it from the + * kernel, flush the TLB and tag it as a user page. + */ + if (atomic_dec_return(&page_ext->mapcount) == 0) { + BUG_ON(test_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)); + set_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags); + set_kpte(page, (unsigned long)kaddr, __pgprot(0)); + __flush_tlb_one((unsigned long)kaddr); + } + + spin_unlock_irqrestore(&page_ext->maplock, flags); +} +EXPORT_SYMBOL(xpfo_kunmap); + +inline bool xpfo_page_is_unmapped(struct page *page) +{ + if (!static_branch_unlikely(&xpfo_inited)) + return false; + + return test_bit(PAGE_EXT_XPFO_UNMAPPED, &lookup_page_ext(page)->flags); +} diff --git a/security/Kconfig b/security/Kconfig index da10d9b573a4..1eac37a9bec2 100644 --- a/security/Kconfig +++ b/security/Kconfig @@ -6,6 +6,26 @@ menu "Security options" source security/keys/Kconfig +config ARCH_SUPPORTS_XPFO + bool + +config XPFO + bool "Enable eXclusive Page Frame Ownership (XPFO)" + default n + depends on DEBUG_KERNEL && ARCH_SUPPORTS_XPFO + select DEBUG_TLBFLUSH + select PAGE_EXTENSION + help + This option offers protection against 'ret2dir' kernel attacks. 
+ When enabled, every time a page frame is allocated to user space, it + is unmapped from the direct mapped RAM region in kernel space + (physmap). Similarly, when a page frame is freed/reclaimed, it is + mapped back to physmap. + + There is a slight performance impact when this option is enabled. + + If in doubt, say "N". + config SECURITY_DMESG_RESTRICT bool "Restrict unprivileged access to the kernel syslog" default n -- 2.9.3 ^ permalink raw reply related [flat|nested] 30+ messages in thread
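As a rough illustration of the map/unmap balancing done by xpfo_kmap() and xpfo_kunmap() in the patch above, the following userspace sketch models the per-page state with plain fields. The toy_* names are hypothetical and not part of the patch; the spinlock, the actual PTE update via set_kpte() and the TLB flush are deliberately left out, so this only shows the counting logic under those assumptions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the XPFO fields kept in struct page_ext. */
struct toy_page {
	bool kernel;	/* mirrors PAGE_EXT_XPFO_KERNEL */
	bool unmapped;	/* mirrors PAGE_EXT_XPFO_UNMAPPED */
	int mapcount;	/* mirrors page_ext->mapcount */
};

/* Models xpfo_kmap(): only the first mapper restores the kernel mapping. */
static void toy_kmap(struct toy_page *p)
{
	if (p->kernel)
		return;		/* kernel pages are never unmapped */

	if (++p->mapcount == 1 && p->unmapped)
		p->unmapped = false;	/* the patch calls set_kpte() here */
}

/* Models xpfo_kunmap(): only the last unmapper removes the mapping. */
static void toy_kunmap(struct toy_page *p)
{
	if (p->kernel)
		return;

	assert(p->mapcount > 0);	/* an unbalanced kunmap is a bug */
	if (--p->mapcount == 0) {
		assert(!p->unmapped);	/* corresponds to the BUG_ON() */
		p->unmapped = true;	/* the patch clears the PTE and flushes */
	}
}
```

With a user page that starts out unmapped, two nested toy_kmap() calls followed by two toy_kunmap() calls leave the page mapped until the second toy_kunmap(), which is the behaviour the reference counter is meant to guarantee.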
* [kernel-hardening] [RFC PATCH v2 2/3] xpfo: Only put previous userspace pages into the hot cache 2016-09-14 7:18 ` [kernel-hardening] [RFC PATCH v2 0/3] Add support for eXclusive Page Frame Ownership (XPFO) Juerg Haefliger 2016-09-14 7:18 ` [kernel-hardening] [RFC PATCH v2 1/3] " Juerg Haefliger @ 2016-09-14 7:19 ` Juerg Haefliger 2016-09-14 14:33 ` Dave Hansen 2016-09-14 7:19 ` [kernel-hardening] [RFC PATCH v2 3/3] block: Always use a bounce buffer when XPFO is enabled Juerg Haefliger ` (3 subsequent siblings) 5 siblings, 1 reply; 30+ messages in thread From: Juerg Haefliger @ 2016-09-14 7:19 UTC (permalink / raw) To: linux-kernel, linux-mm, kernel-hardening, linux-x86_64 Cc: juerg.haefliger, vpk Allocating a page to userspace that was previously allocated to the kernel requires an expensive TLB shootdown. To minimize this, we only put non-kernel pages into the hot cache to favor their allocation. Signed-off-by: Juerg Haefliger <juerg.haefliger@hpe.com> --- include/linux/xpfo.h | 2 ++ mm/page_alloc.c | 8 +++++++- mm/xpfo.c | 8 ++++++++ 3 files changed, 17 insertions(+), 1 deletion(-) diff --git a/include/linux/xpfo.h b/include/linux/xpfo.h index 77187578ca33..077d1cfadfa2 100644 --- a/include/linux/xpfo.h +++ b/include/linux/xpfo.h @@ -24,6 +24,7 @@ extern void xpfo_alloc_page(struct page *page, int order, gfp_t gfp); extern void xpfo_free_page(struct page *page, int order); extern bool xpfo_page_is_unmapped(struct page *page); +extern bool xpfo_page_is_kernel(struct page *page); #else /* !CONFIG_XPFO */ @@ -33,6 +34,7 @@ static inline void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) { } static inline void xpfo_free_page(struct page *page, int order) { } static inline bool xpfo_page_is_unmapped(struct page *page) { return false; } +static inline bool xpfo_page_is_kernel(struct page *page) { return false; } #endif /* CONFIG_XPFO */ diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 0241c8a7e72a..83404b41e52d 100644 --- a/mm/page_alloc.c +++ 
b/mm/page_alloc.c @@ -2421,7 +2421,13 @@ void free_hot_cold_page(struct page *page, bool cold) } pcp = &this_cpu_ptr(zone->pageset)->pcp; - if (!cold) + /* + * XPFO: Allocating a page to userspace that was previously allocated + * to the kernel requires an expensive TLB shootdown. To minimize this, + * we only put non-kernel pages into the hot cache to favor their + * allocation. + */ + if (!cold && !xpfo_page_is_kernel(page)) list_add(&page->lru, &pcp->lists[migratetype]); else list_add_tail(&page->lru, &pcp->lists[migratetype]); diff --git a/mm/xpfo.c b/mm/xpfo.c index ddb1be05485d..f8dffda0c961 100644 --- a/mm/xpfo.c +++ b/mm/xpfo.c @@ -203,3 +203,11 @@ inline bool xpfo_page_is_unmapped(struct page *page) return test_bit(PAGE_EXT_XPFO_UNMAPPED, &lookup_page_ext(page)->flags); } + +inline bool xpfo_page_is_kernel(struct page *page) +{ + if (!static_branch_unlikely(&xpfo_inited)) + return false; + + return test_bit(PAGE_EXT_XPFO_KERNEL, &lookup_page_ext(page)->flags); +} -- 2.9.3 ^ permalink raw reply related [flat|nested] 30+ messages in thread
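The list placement changed by this patch can be sketched in userspace as follows. This is only a toy model under stated assumptions: the per-cpu list is replaced by a small fixed-size double-ended buffer, pages are plain integer ids, and all toy_* names are hypothetical. It assumes, as in the allocator of that era, that a "hot" allocation takes from the head of the list.

```c
#include <stdbool.h>

#define TOY_CAP 16

/* Hypothetical stand-in for one pcp->lists[migratetype] list. */
struct toy_pcp {
	int slots[2 * TOY_CAP];	/* page ids */
	int head;		/* index of the first element */
	int tail;		/* one past the last element */
};

static void toy_pcp_init(struct toy_pcp *l)
{
	l->head = l->tail = TOY_CAP;	/* start in the middle, empty */
}

/*
 * Models the modified free_hot_cold_page(): only hot pages that were
 * not kernel pages go to the head; everything else goes to the tail.
 * Returns false if the toy buffer is full on that side.
 */
static bool toy_free_page(struct toy_pcp *l, int id, bool cold, bool was_kernel)
{
	if (!cold && !was_kernel) {
		if (l->head == 0)
			return false;
		l->slots[--l->head] = id;	/* list_add() */
	} else {
		if (l->tail == 2 * TOY_CAP)
			return false;
		l->slots[l->tail++] = id;	/* list_add_tail() */
	}
	return true;
}

/* Models a hot allocation: take from the head, or -1 if empty. */
static int toy_alloc_hot(struct toy_pcp *l)
{
	if (l->head == l->tail)
		return -1;
	return l->slots[l->head++];
}
```

Freeing a former kernel page and then a former user page, both as hot, makes the next hot allocation return the user page first, so the kernel page is handed out as late as possible.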
* Re: [kernel-hardening] [RFC PATCH v2 2/3] xpfo: Only put previous userspace pages into the hot cache 2016-09-14 7:19 ` [kernel-hardening] [RFC PATCH v2 2/3] xpfo: Only put previous userspace pages into the hot cache Juerg Haefliger @ 2016-09-14 14:33 ` Dave Hansen 2016-09-14 14:40 ` Juerg Haefliger 0 siblings, 1 reply; 30+ messages in thread From: Dave Hansen @ 2016-09-14 14:33 UTC (permalink / raw) To: kernel-hardening, linux-kernel, linux-mm, linux-x86_64 Cc: juerg.haefliger, vpk On 09/14/2016 12:19 AM, Juerg Haefliger wrote: > Allocating a page to userspace that was previously allocated to the > kernel requires an expensive TLB shootdown. To minimize this, we only > put non-kernel pages into the hot cache to favor their allocation. Hi, I had some questions about this the last time you posted it. Maybe you want to address them now. -- But kernel allocations do allocate from these pools, right? Does this just mean that kernel allocations usually have to pay the penalty to convert a page? So, what's the logic here? You're assuming that order-0 kernel allocations are more rare than allocations for userspace? ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [kernel-hardening] [RFC PATCH v2 2/3] xpfo: Only put previous userspace pages into the hot cache 2016-09-14 14:33 ` Dave Hansen @ 2016-09-14 14:40 ` Juerg Haefliger 2016-09-14 14:48 ` Dave Hansen 0 siblings, 1 reply; 30+ messages in thread From: Juerg Haefliger @ 2016-09-14 14:40 UTC (permalink / raw) To: Dave Hansen, kernel-hardening, linux-kernel, linux-mm, linux-x86_64; +Cc: vpk Hi Dave, On 09/14/2016 04:33 PM, Dave Hansen wrote: > On 09/14/2016 12:19 AM, Juerg Haefliger wrote: >> Allocating a page to userspace that was previously allocated to the >> kernel requires an expensive TLB shootdown. To minimize this, we only >> put non-kernel pages into the hot cache to favor their allocation. > > Hi, I had some questions about this the last time you posted it. Maybe > you want to address them now. I did reply: https://lkml.org/lkml/2016/9/5/249 ...Juerg > -- > > But kernel allocations do allocate from these pools, right? Does this > just mean that kernel allocations usually have to pay the penalty to > convert a page? > > So, what's the logic here? You're assuming that order-0 kernel > allocations are more rare than allocations for userspace? > ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [kernel-hardening] [RFC PATCH v2 2/3] xpfo: Only put previous userspace pages into the hot cache 2016-09-14 14:40 ` Juerg Haefliger @ 2016-09-14 14:48 ` Dave Hansen 2016-09-21 5:32 ` Juerg Haefliger 0 siblings, 1 reply; 30+ messages in thread From: Dave Hansen @ 2016-09-14 14:48 UTC (permalink / raw) To: Juerg Haefliger, kernel-hardening, linux-kernel, linux-mm, linux-x86_64; +Cc: vpk > On 09/02/2016 10:39 PM, Dave Hansen wrote: >> On 09/02/2016 04:39 AM, Juerg Haefliger wrote: >> Does this >> just mean that kernel allocations usually have to pay the penalty to >> convert a page? > > Only pages that are allocated for userspace ((gfp & GFP_HIGHUSER) == GFP_HIGHUSER) which were > previously allocated for the kernel ((gfp & GFP_HIGHUSER) != GFP_HIGHUSER) have to pay the penalty. > >> So, what's the logic here? You're assuming that order-0 kernel >> allocations are more rare than allocations for userspace? > > The logic is to put reclaimed kernel pages into the cold cache to > postpone their allocation as long as possible to minimize (potential) > TLB flushes. OK, but if we put them in the cold area but kernel allocations pull them from the hot cache, aren't we virtually guaranteeing that kernel allocations will have to do a TLB shootdown to convert a page? It seems like you also need to convert all kernel allocations to pull from the cold area. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [kernel-hardening] [RFC PATCH v2 2/3] xpfo: Only put previous userspace pages into the hot cache 2016-09-14 14:48 ` Dave Hansen @ 2016-09-21 5:32 ` Juerg Haefliger 0 siblings, 0 replies; 30+ messages in thread From: Juerg Haefliger @ 2016-09-21 5:32 UTC (permalink / raw) To: Dave Hansen, kernel-hardening, linux-kernel, linux-mm, linux-x86_64; +Cc: vpk On 09/14/2016 04:48 PM, Dave Hansen wrote: >> On 09/02/2016 10:39 PM, Dave Hansen wrote: >>> On 09/02/2016 04:39 AM, Juerg Haefliger wrote: >>> Does this >>> just mean that kernel allocations usually have to pay the penalty to >>> convert a page? >> >> Only pages that are allocated for userspace ((gfp & GFP_HIGHUSER) == GFP_HIGHUSER) which were >> previously allocated for the kernel ((gfp & GFP_HIGHUSER) != GFP_HIGHUSER) have to pay the penalty. >> >>> So, what's the logic here? You're assuming that order-0 kernel >>> allocations are more rare than allocations for userspace? >> >> The logic is to put reclaimed kernel pages into the cold cache to >> postpone their allocation as long as possible to minimize (potential) >> TLB flushes. > > OK, but if we put them in the cold area but kernel allocations pull them > from the hot cache, aren't we virtually guaranteeing that kernel > allocations will have to do a TLB shootdown to convert a page? No. Allocations for the kernel never require a TLB shootdown; only allocations for userspace do, and only if the page was previously a kernel page. > It seems like you also need to convert all kernel allocations to pull > from the cold area. Kernel allocations can continue to pull from the hot cache. Maybe introduce another cache for the userspace pages? But I'm not sure what other implications this might have. ...Juerg ^ permalink raw reply [flat|nested] 30+ messages in thread
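The rule described in this reply (only a userspace allocation of a page that was previously a kernel page pays for a TLB flush) can be written down as a small predicate. The sketch below is a userspace model, not kernel code: the TOY_GFP_* bit values are made up for illustration, and toy_alloc_needs_flush() is a hypothetical helper that mirrors the per-page branch in xpfo_alloc_page(). Note the parentheses around the mask: in C, `==` binds tighter than `&`, so `gfp & MASK == MASK` would be parsed as `gfp & (MASK == MASK)`.

```c
#include <stdbool.h>

/* Made-up flag bits; the real definitions are in include/linux/gfp.h. */
#define TOY_GFP_USER		0x1u
#define TOY_GFP_HIGHMEM		0x2u
#define TOY_GFP_HIGHUSER	(TOY_GFP_USER | TOY_GFP_HIGHMEM)

/* True only if every bit of the HIGHUSER mask is set in gfp. */
static bool toy_is_user_alloc(unsigned int gfp)
{
	return (gfp & TOY_GFP_HIGHUSER) == TOY_GFP_HIGHUSER;
}

/*
 * Models one iteration of the loop in xpfo_alloc_page().
 * *page_is_kernel plays the role of PAGE_EXT_XPFO_KERNEL and is
 * updated in place; the return value says whether a TLB flush
 * would be requested for this page.
 */
static bool toy_alloc_needs_flush(unsigned int gfp, bool *page_is_kernel)
{
	if (toy_is_user_alloc(gfp)) {
		bool was_kernel = *page_is_kernel;

		*page_is_kernel = false;	/* test_and_clear_bit() */
		return was_kernel;
	}

	*page_is_kernel = true;		/* tag as a kernel page */
	return false;			/* kernel allocations never flush */
}
```

In this model a kernel allocation never returns true, which matches the statement that kernel allocations can keep pulling from the hot cache without paying the conversion cost.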
* [kernel-hardening] [RFC PATCH v2 3/3] block: Always use a bounce buffer when XPFO is enabled 2016-09-14 7:18 ` [kernel-hardening] [RFC PATCH v2 0/3] Add support for eXclusive Page Frame Ownership (XPFO) Juerg Haefliger 2016-09-14 7:18 ` [kernel-hardening] [RFC PATCH v2 1/3] " Juerg Haefliger 2016-09-14 7:19 ` [kernel-hardening] [RFC PATCH v2 2/3] xpfo: Only put previous userspace pages into the hot cache Juerg Haefliger @ 2016-09-14 7:19 ` Juerg Haefliger 2016-09-14 7:33 ` [kernel-hardening] " Christoph Hellwig 2016-09-14 7:23 ` [kernel-hardening] Re: [RFC PATCH v2 0/3] Add support for eXclusive Page Frame Ownership (XPFO) Juerg Haefliger ` (2 subsequent siblings) 5 siblings, 1 reply; 30+ messages in thread From: Juerg Haefliger @ 2016-09-14 7:19 UTC (permalink / raw) To: linux-kernel, linux-mm, kernel-hardening, linux-x86_64 Cc: juerg.haefliger, vpk This is a temporary hack to prevent the use of bio_map_user_iov() which causes XPFO page faults. Signed-off-by: Juerg Haefliger <juerg.haefliger@hpe.com> --- block/blk-map.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/block/blk-map.c b/block/blk-map.c index b8657fa8dc9a..e889dbfee6fb 100644 --- a/block/blk-map.c +++ b/block/blk-map.c @@ -52,7 +52,7 @@ static int __blk_rq_map_user_iov(struct request *rq, struct bio *bio, *orig_bio; int ret; - if (copy) + if (copy || IS_ENABLED(CONFIG_XPFO)) bio = bio_copy_user_iov(q, map_data, iter, gfp_mask); else bio = bio_map_user_iov(q, iter, gfp_mask); -- 2.9.3 ^ permalink raw reply related [flat|nested] 30+ messages in thread
* [kernel-hardening] Re: [RFC PATCH v2 3/3] block: Always use a bounce buffer when XPFO is enabled 2016-09-14 7:19 ` [kernel-hardening] [RFC PATCH v2 3/3] block: Always use a bounce buffer when XPFO is enabled Juerg Haefliger @ 2016-09-14 7:33 ` Christoph Hellwig 0 siblings, 0 replies; 30+ messages in thread From: Christoph Hellwig @ 2016-09-14 7:33 UTC (permalink / raw) To: Juerg Haefliger Cc: linux-kernel, linux-mm, kernel-hardening, linux-x86_64, vpk On Wed, Sep 14, 2016 at 09:19:01AM +0200, Juerg Haefliger wrote: > This is a temporary hack to prevent the use of bio_map_user_iov() > which causes XPFO page faults. > > Signed-off-by: Juerg Haefliger <juerg.haefliger@hpe.com> Sorry, but if your scheme doesn't support get_user_pages access to user memory, it's a steaming pile of crap and entirely unacceptable. ^ permalink raw reply [flat|nested] 30+ messages in thread
* [kernel-hardening] Re: [RFC PATCH v2 0/3] Add support for eXclusive Page Frame Ownership (XPFO) 2016-09-14 7:18 ` [kernel-hardening] [RFC PATCH v2 0/3] Add support for eXclusive Page Frame Ownership (XPFO) Juerg Haefliger ` (2 preceding siblings ...) 2016-09-14 7:19 ` [kernel-hardening] [RFC PATCH v2 3/3] block: Always use a bounce buffer when XPFO is enabled Juerg Haefliger @ 2016-09-14 7:23 ` Juerg Haefliger 2016-09-14 9:36 ` [kernel-hardening] " Mark Rutland 2016-11-04 14:45 ` [kernel-hardening] [RFC PATCH v3 0/2] " Juerg Haefliger 5 siblings, 0 replies; 30+ messages in thread From: Juerg Haefliger @ 2016-09-14 7:23 UTC (permalink / raw) To: linux-kernel, linux-mm, kernel-hardening, linux-x86_64; +Cc: vpk Resending to include the kernel-hardening list. Sorry, I wasn't subscribed with the correct email address when I sent this the first time. ...Juerg -- Juerg Haefliger Hewlett Packard Enterprise ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [kernel-hardening] [RFC PATCH v2 0/3] Add support for eXclusive Page Frame Ownership (XPFO) 2016-09-14 7:18 ` [kernel-hardening] [RFC PATCH v2 0/3] Add support for eXclusive Page Frame Ownership (XPFO) Juerg Haefliger ` (3 preceding siblings ...) 2016-09-14 7:23 ` [kernel-hardening] Re: [RFC PATCH v2 0/3] Add support for eXclusive Page Frame Ownership (XPFO) Juerg Haefliger @ 2016-09-14 9:36 ` Mark Rutland 2016-09-14 9:49 ` Mark Rutland 2016-11-04 14:45 ` [kernel-hardening] [RFC PATCH v3 0/2] " Juerg Haefliger 5 siblings, 1 reply; 30+ messages in thread From: Mark Rutland @ 2016-09-14 9:36 UTC (permalink / raw) To: kernel-hardening Cc: linux-kernel, linux-mm, linux-x86_64, juerg.haefliger, vpk Hi, On Wed, Sep 14, 2016 at 09:18:58AM +0200, Juerg Haefliger wrote: > This patch series adds support for XPFO which protects against 'ret2dir' > kernel attacks. The basic idea is to enforce exclusive ownership of page > frames by either the kernel or userspace, unless explicitly requested by > the kernel. Whenever a page destined for userspace is allocated, it is > unmapped from physmap (the kernel's page table). When such a page is > reclaimed from userspace, it is mapped back to physmap. > Known issues/limitations: > - Only supports x86-64 (for now) > - Only supports 4k pages (for now) > - There are most likely some legitimate use cases where the kernel needs > to access userspace which need to be made XPFO-aware > - Performance penalty > > Reference paper by the original patch authors: > http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf Just to check, doesn't DEBUG_RODATA ensure that the linear mapping is non-executable on x86_64 (as it does for arm64)? For both arm64 and x86_64, DEBUG_RODATA is mandatory (or soon to be so). 
Assuming that implies a lack of execute permission for x86_64, that should provide a similar level of protection against erroneously branching to addresses in the linear map, without the complexity and overhead of mapping/unmapping pages. So to me it looks like this approach may only be useful for architectures without page-granular execute permission controls. Is this also intended to protect against erroneous *data* accesses to the linear map? Am I missing something? Thanks, Mark. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [kernel-hardening] [RFC PATCH v2 0/3] Add support for eXclusive Page Frame Ownership (XPFO) 2016-09-14 9:36 ` [kernel-hardening] " Mark Rutland @ 2016-09-14 9:49 ` Mark Rutland 0 siblings, 0 replies; 30+ messages in thread From: Mark Rutland @ 2016-09-14 9:49 UTC (permalink / raw) To: kernel-hardening Cc: linux-kernel, linux-mm, linux-x86_64, juerg.haefliger, vpk On Wed, Sep 14, 2016 at 10:36:34AM +0100, Mark Rutland wrote: > On Wed, Sep 14, 2016 at 09:18:58AM +0200, Juerg Haefliger wrote: > > This patch series adds support for XPFO which protects against 'ret2dir' > > kernel attacks. The basic idea is to enforce exclusive ownership of page > > frames by either the kernel or userspace, unless explicitly requested by > > the kernel. Whenever a page destined for userspace is allocated, it is > > unmapped from physmap (the kernel's page table). When such a page is > > reclaimed from userspace, it is mapped back to physmap. > > Reference paper by the original patch authors: > > http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf > For both arm64 and x86_64, DEBUG_RODATA is mandatory (or soon to be so). > Assuming that implies a lack of execute permission for x86_64, that > should provide a similar level of protection against erroneously > branching to addresses in the linear map, without the complexity and > overhead of mapping/unmapping pages. > > So to me it looks like this approach may only be useful for > architectures without page-granular execute permission controls. > > Is this also intended to protect against erroneous *data* accesses to > the linear map? Now that I read the paper more carefully, I can see that this is the case, and this does catch issues which DEBUG_RODATA cannot. Apologies for the noise. Thanks, Mark. ^ permalink raw reply [flat|nested] 30+ messages in thread
* [kernel-hardening] [RFC PATCH v3 0/2] Add support for eXclusive Page Frame Ownership (XPFO) 2016-09-14 7:18 ` [kernel-hardening] [RFC PATCH v2 0/3] Add support for eXclusive Page Frame Ownership (XPFO) Juerg Haefliger ` (4 preceding siblings ...) 2016-09-14 9:36 ` [kernel-hardening] " Mark Rutland @ 2016-11-04 14:45 ` Juerg Haefliger 2016-11-04 14:45 ` [kernel-hardening] [RFC PATCH v3 1/2] " Juerg Haefliger 2016-11-04 14:45 ` [kernel-hardening] [RFC PATCH v3 2/2] xpfo: Only put previous userspace pages into the hot cache Juerg Haefliger 5 siblings, 2 replies; 30+ messages in thread From: Juerg Haefliger @ 2016-11-04 14:45 UTC (permalink / raw) To: linux-kernel, linux-mm, kernel-hardening, linux-x86_64 Cc: vpk, juerg.haefliger Changes from: v2 -> v3: - Removed 'depends on DEBUG_KERNEL' and 'select DEBUG_TLBFLUSH'. These are left-overs from the original patch and are not required. - Make libata XPFO-aware, i.e., properly handle pages that were unmapped by XPFO. This takes care of the temporary hack in v2 that forced the use of a bounce buffer in block/blk-map.c. v1 -> v2: - Moved the code from arch/x86/mm/ to mm/ since it's (mostly) arch-agnostic. - Moved the config to the generic layer and added ARCH_SUPPORTS_XPFO for x86. - Use page_ext for the additional per-page data. - Removed the clearing of pages. This can be accomplished by using PAGE_POISONING. - Split up the patch into multiple patches. - Fixed additional issues identified by reviewers. This patch series adds support for XPFO which protects against 'ret2dir' kernel attacks. The basic idea is to enforce exclusive ownership of page frames by either the kernel or userspace, unless explicitly requested by the kernel. Whenever a page destined for userspace is allocated, it is unmapped from physmap (removed from the kernel's page table). When such a page is reclaimed from userspace, it is mapped back to physmap. Additional fields in the page_ext struct are used for XPFO housekeeping. 
Specifically: two flags to distinguish user vs. kernel pages and to tag unmapped pages, a reference counter to balance kmap/kunmap operations, and a lock to serialize access to the XPFO fields. Known issues/limitations: - Only supports x86-64 (for now) - Only supports 4k pages (for now) - There are most likely some legitimate use cases where the kernel needs to access userspace memory and which need to be made XPFO-aware - Performance penalty Reference paper by the original patch authors: http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf Juerg Haefliger (2): Add support for eXclusive Page Frame Ownership (XPFO) xpfo: Only put previous userspace pages into the hot cache arch/x86/Kconfig | 3 +- arch/x86/mm/init.c | 2 +- drivers/ata/libata-sff.c | 4 +- include/linux/highmem.h | 15 +++- include/linux/page_ext.h | 7 ++ include/linux/xpfo.h | 41 +++++++++ lib/swiotlb.c | 3 +- mm/Makefile | 1 + mm/page_alloc.c | 10 ++- mm/page_ext.c | 4 + mm/xpfo.c | 214 +++++++++++++++++++++++++++++++++++++++++++++++ security/Kconfig | 19 +++++ 12 files changed, 315 insertions(+), 8 deletions(-) create mode 100644 include/linux/xpfo.h create mode 100644 mm/xpfo.c -- 2.10.1 ^ permalink raw reply [flat|nested] 30+ messages in thread
* [kernel-hardening] [RFC PATCH v3 1/2] Add support for eXclusive Page Frame Ownership (XPFO) 2016-11-04 14:45 ` [kernel-hardening] [RFC PATCH v3 0/2] " Juerg Haefliger @ 2016-11-04 14:45 ` Juerg Haefliger 2016-11-04 14:50 ` [kernel-hardening] " Christoph Hellwig ` (4 more replies) 2016-11-04 14:45 ` [kernel-hardening] [RFC PATCH v3 2/2] xpfo: Only put previous userspace pages into the hot cache Juerg Haefliger 1 sibling, 5 replies; 30+ messages in thread From: Juerg Haefliger @ 2016-11-04 14:45 UTC (permalink / raw) To: linux-kernel, linux-mm, kernel-hardening, linux-x86_64 Cc: vpk, juerg.haefliger This patch adds support for XPFO which protects against 'ret2dir' kernel attacks. The basic idea is to enforce exclusive ownership of page frames by either the kernel or userspace, unless explicitly requested by the kernel. Whenever a page destined for userspace is allocated, it is unmapped from physmap (removed from the kernel's page table). When such a page is reclaimed from userspace, it is mapped back to physmap. Additional fields in the page_ext struct are used for XPFO housekeeping. Specifically: two flags to distinguish user vs. kernel pages and to tag unmapped pages, a reference counter to balance kmap/kunmap operations, and a lock to serialize access to the XPFO fields. Known issues/limitations: - Only supports x86-64 (for now) - Only supports 4k pages (for now) - There are most likely some legitimate use cases where the kernel needs to access userspace memory and which need to be made XPFO-aware - Performance penalty Reference paper by the original patch authors: http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf Suggested-by: Vasileios P. 
Kemerlis <vpk@cs.columbia.edu> Signed-off-by: Juerg Haefliger <juerg.haefliger@hpe.com> --- arch/x86/Kconfig | 3 +- arch/x86/mm/init.c | 2 +- drivers/ata/libata-sff.c | 4 +- include/linux/highmem.h | 15 +++- include/linux/page_ext.h | 7 ++ include/linux/xpfo.h | 39 +++++++++ lib/swiotlb.c | 3 +- mm/Makefile | 1 + mm/page_alloc.c | 2 + mm/page_ext.c | 4 + mm/xpfo.c | 206 +++++++++++++++++++++++++++++++++++++++++++++++ security/Kconfig | 19 +++++ 12 files changed, 298 insertions(+), 7 deletions(-) create mode 100644 include/linux/xpfo.h create mode 100644 mm/xpfo.c diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index bada636d1065..38b334f8fde5 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -165,6 +165,7 @@ config X86 select HAVE_STACK_VALIDATION if X86_64 select ARCH_USES_HIGH_VMA_FLAGS if X86_INTEL_MEMORY_PROTECTION_KEYS select ARCH_HAS_PKEYS if X86_INTEL_MEMORY_PROTECTION_KEYS + select ARCH_SUPPORTS_XPFO if X86_64 config INSTRUCTION_DECODER def_bool y @@ -1361,7 +1362,7 @@ config ARCH_DMA_ADDR_T_64BIT config X86_DIRECT_GBPAGES def_bool y - depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK + depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK && !XPFO ---help--- Certain kernel features effectively disable kernel linear 1 GB mappings (even if the CPU otherwise diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c index 22af912d66d2..a6fafbae02bb 100644 --- a/arch/x86/mm/init.c +++ b/arch/x86/mm/init.c @@ -161,7 +161,7 @@ static int page_size_mask; static void __init probe_page_size_mask(void) { -#if !defined(CONFIG_KMEMCHECK) +#if !defined(CONFIG_KMEMCHECK) && !defined(CONFIG_XPFO) /* * For CONFIG_KMEMCHECK or pagealloc debugging, identity mapping will * use small pages. 
diff --git a/drivers/ata/libata-sff.c b/drivers/ata/libata-sff.c index 051b6158d1b7..58af734be25d 100644 --- a/drivers/ata/libata-sff.c +++ b/drivers/ata/libata-sff.c @@ -715,7 +715,7 @@ static void ata_pio_sector(struct ata_queued_cmd *qc) DPRINTK("data %s\n", qc->tf.flags & ATA_TFLAG_WRITE ? "write" : "read"); - if (PageHighMem(page)) { + if (PageHighMem(page) || xpfo_page_is_unmapped(page)) { unsigned long flags; /* FIXME: use a bounce buffer */ @@ -860,7 +860,7 @@ static int __atapi_pio_bytes(struct ata_queued_cmd *qc, unsigned int bytes) DPRINTK("data %s\n", qc->tf.flags & ATA_TFLAG_WRITE ? "write" : "read"); - if (PageHighMem(page)) { + if (PageHighMem(page) || xpfo_page_is_unmapped(page)) { unsigned long flags; /* FIXME: use bounce buffer */ diff --git a/include/linux/highmem.h b/include/linux/highmem.h index bb3f3297062a..7a17c166532f 100644 --- a/include/linux/highmem.h +++ b/include/linux/highmem.h @@ -7,6 +7,7 @@ #include <linux/mm.h> #include <linux/uaccess.h> #include <linux/hardirq.h> +#include <linux/xpfo.h> #include <asm/cacheflush.h> @@ -55,24 +56,34 @@ static inline struct page *kmap_to_page(void *addr) #ifndef ARCH_HAS_KMAP static inline void *kmap(struct page *page) { + void *kaddr; + might_sleep(); - return page_address(page); + kaddr = page_address(page); + xpfo_kmap(kaddr, page); + return kaddr; } static inline void kunmap(struct page *page) { + xpfo_kunmap(page_address(page), page); } static inline void *kmap_atomic(struct page *page) { + void *kaddr; + preempt_disable(); pagefault_disable(); - return page_address(page); + kaddr = page_address(page); + xpfo_kmap(kaddr, page); + return kaddr; } #define kmap_atomic_prot(page, prot) kmap_atomic(page) static inline void __kunmap_atomic(void *addr) { + xpfo_kunmap(addr, virt_to_page(addr)); pagefault_enable(); preempt_enable(); } diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h index 9298c393ddaa..0e451a42e5a3 100644 --- a/include/linux/page_ext.h +++ b/include/linux/page_ext.h @@ 
-29,6 +29,8 @@ enum page_ext_flags { PAGE_EXT_DEBUG_POISON, /* Page is poisoned */ PAGE_EXT_DEBUG_GUARD, PAGE_EXT_OWNER, + PAGE_EXT_XPFO_KERNEL, /* Page is a kernel page */ + PAGE_EXT_XPFO_UNMAPPED, /* Page is unmapped */ #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT) PAGE_EXT_YOUNG, PAGE_EXT_IDLE, @@ -44,6 +46,11 @@ enum page_ext_flags { */ struct page_ext { unsigned long flags; +#ifdef CONFIG_XPFO + int inited; /* Map counter and lock initialized */ + atomic_t mapcount; /* Counter for balancing map/unmap requests */ + spinlock_t maplock; /* Lock to serialize map/unmap requests */ +#endif }; extern void pgdat_page_ext_init(struct pglist_data *pgdat); diff --git a/include/linux/xpfo.h b/include/linux/xpfo.h new file mode 100644 index 000000000000..77187578ca33 --- /dev/null +++ b/include/linux/xpfo.h @@ -0,0 +1,39 @@ +/* + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P. + * Copyright (C) 2016 Brown University. All rights reserved. + * + * Authors: + * Juerg Haefliger <juerg.haefliger@hpe.com> + * Vasileios P. Kemerlis <vpk@cs.brown.edu> + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License version 2 as published by + * the Free Software Foundation. 
+ */ + +#ifndef _LINUX_XPFO_H +#define _LINUX_XPFO_H + +#ifdef CONFIG_XPFO + +extern struct page_ext_operations page_xpfo_ops; + +extern void xpfo_kmap(void *kaddr, struct page *page); +extern void xpfo_kunmap(void *kaddr, struct page *page); +extern void xpfo_alloc_page(struct page *page, int order, gfp_t gfp); +extern void xpfo_free_page(struct page *page, int order); + +extern bool xpfo_page_is_unmapped(struct page *page); + +#else /* !CONFIG_XPFO */ + +static inline void xpfo_kmap(void *kaddr, struct page *page) { } +static inline void xpfo_kunmap(void *kaddr, struct page *page) { } +static inline void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) { } +static inline void xpfo_free_page(struct page *page, int order) { } + +static inline bool xpfo_page_is_unmapped(struct page *page) { return false; } + +#endif /* CONFIG_XPFO */ + +#endif /* _LINUX_XPFO_H */ diff --git a/lib/swiotlb.c b/lib/swiotlb.c index 22e13a0e19d7..455eff44604e 100644 --- a/lib/swiotlb.c +++ b/lib/swiotlb.c @@ -390,8 +390,9 @@ static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr, { unsigned long pfn = PFN_DOWN(orig_addr); unsigned char *vaddr = phys_to_virt(tlb_addr); + struct page *page = pfn_to_page(pfn); - if (PageHighMem(pfn_to_page(pfn))) { + if (PageHighMem(page) || xpfo_page_is_unmapped(page)) { /* The buffer does not have a mapping. 
Map it in and copy */ unsigned int offset = orig_addr & ~PAGE_MASK; char *buffer; diff --git a/mm/Makefile b/mm/Makefile index 295bd7a9f76b..175680f516aa 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -100,3 +100,4 @@ obj-$(CONFIG_IDLE_PAGE_TRACKING) += page_idle.o obj-$(CONFIG_FRAME_VECTOR) += frame_vector.o obj-$(CONFIG_DEBUG_PAGE_REF) += debug_page_ref.o obj-$(CONFIG_HARDENED_USERCOPY) += usercopy.o +obj-$(CONFIG_XPFO) += xpfo.o diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 8fd42aa7c4bd..100e80e008e2 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1045,6 +1045,7 @@ static __always_inline bool free_pages_prepare(struct page *page, kernel_poison_pages(page, 1 << order, 0); kernel_map_pages(page, 1 << order, 0); kasan_free_pages(page, order); + xpfo_free_page(page, order); return true; } @@ -1745,6 +1746,7 @@ inline void post_alloc_hook(struct page *page, unsigned int order, kernel_map_pages(page, 1 << order, 1); kernel_poison_pages(page, 1 << order, 1); kasan_alloc_pages(page, order); + xpfo_alloc_page(page, order, gfp_flags); set_page_owner(page, order, gfp_flags); } diff --git a/mm/page_ext.c b/mm/page_ext.c index 121dcffc4ec1..ba6dbcacc2db 100644 --- a/mm/page_ext.c +++ b/mm/page_ext.c @@ -7,6 +7,7 @@ #include <linux/kmemleak.h> #include <linux/page_owner.h> #include <linux/page_idle.h> +#include <linux/xpfo.h> /* * struct page extension @@ -68,6 +69,9 @@ static struct page_ext_operations *page_ext_ops[] = { #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT) &page_idle_ops, #endif +#ifdef CONFIG_XPFO + &page_xpfo_ops, +#endif }; static unsigned long total_usage; diff --git a/mm/xpfo.c b/mm/xpfo.c new file mode 100644 index 000000000000..8e3a6a694b6a --- /dev/null +++ b/mm/xpfo.c @@ -0,0 +1,206 @@ +/* + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P. + * Copyright (C) 2016 Brown University. All rights reserved. + * + * Authors: + * Juerg Haefliger <juerg.haefliger@hpe.com> + * Vasileios P. 
Kemerlis <vpk@cs.brown.edu> + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License version 2 as published by + * the Free Software Foundation. + */ + +#include <linux/mm.h> +#include <linux/module.h> +#include <linux/page_ext.h> +#include <linux/xpfo.h> + +#include <asm/tlbflush.h> + +DEFINE_STATIC_KEY_FALSE(xpfo_inited); + +static bool need_xpfo(void) +{ + return true; +} + +static void init_xpfo(void) +{ + printk(KERN_INFO "XPFO enabled\n"); + static_branch_enable(&xpfo_inited); +} + +struct page_ext_operations page_xpfo_ops = { + .need = need_xpfo, + .init = init_xpfo, +}; + +/* + * Update a single kernel page table entry + */ +static inline void set_kpte(struct page *page, unsigned long kaddr, + pgprot_t prot) { + unsigned int level; + pte_t *kpte = lookup_address(kaddr, &level); + + /* We only support 4k pages for now */ + BUG_ON(!kpte || level != PG_LEVEL_4K); + + set_pte_atomic(kpte, pfn_pte(page_to_pfn(page), canon_pgprot(prot))); +} + +void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) +{ + int i, flush_tlb = 0; + struct page_ext *page_ext; + unsigned long kaddr; + + if (!static_branch_unlikely(&xpfo_inited)) + return; + + for (i = 0; i < (1 << order); i++) { + page_ext = lookup_page_ext(page + i); + + BUG_ON(test_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)); + + /* Initialize the map lock and map counter */ + if (!page_ext->inited) { + spin_lock_init(&page_ext->maplock); + atomic_set(&page_ext->mapcount, 0); + page_ext->inited = 1; + } + BUG_ON(atomic_read(&page_ext->mapcount)); + + if ((gfp & GFP_HIGHUSER) == GFP_HIGHUSER) { + /* + * Flush the TLB if the page was previously allocated + * to the kernel. 
+ */ + if (test_and_clear_bit(PAGE_EXT_XPFO_KERNEL, + &page_ext->flags)) + flush_tlb = 1; + } else { + /* Tag the page as a kernel page */ + set_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags); + } + } + + if (flush_tlb) { + kaddr = (unsigned long)page_address(page); + flush_tlb_kernel_range(kaddr, kaddr + (1 << order) * + PAGE_SIZE); + } +} + +void xpfo_free_page(struct page *page, int order) +{ + int i; + struct page_ext *page_ext; + unsigned long kaddr; + + if (!static_branch_unlikely(&xpfo_inited)) + return; + + for (i = 0; i < (1 << order); i++) { + page_ext = lookup_page_ext(page + i); + + if (!page_ext->inited) { + /* + * The page was allocated before page_ext was + * initialized, so it is a kernel page and it needs to + * be tagged accordingly. + */ + set_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags); + continue; + } + + /* + * Map the page back into the kernel if it was previously + * allocated to user space. + */ + if (test_and_clear_bit(PAGE_EXT_XPFO_UNMAPPED, + &page_ext->flags)) { + kaddr = (unsigned long)page_address(page + i); + set_kpte(page + i, kaddr, __pgprot(__PAGE_KERNEL)); + } + } +} + +void xpfo_kmap(void *kaddr, struct page *page) +{ + struct page_ext *page_ext; + unsigned long flags; + + if (!static_branch_unlikely(&xpfo_inited)) + return; + + page_ext = lookup_page_ext(page); + + /* + * The page was allocated before page_ext was initialized (which means + * it's a kernel page) or it's allocated to the kernel, so nothing to + * do. + */ + if (!page_ext->inited || + test_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags)) + return; + + spin_lock_irqsave(&page_ext->maplock, flags); + + /* + * The page was previously allocated to user space, so map it back + * into the kernel. No TLB flush required. 
+ */ + if ((atomic_inc_return(&page_ext->mapcount) == 1) && + test_and_clear_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)) + set_kpte(page, (unsigned long)kaddr, __pgprot(__PAGE_KERNEL)); + + spin_unlock_irqrestore(&page_ext->maplock, flags); +} +EXPORT_SYMBOL(xpfo_kmap); + +void xpfo_kunmap(void *kaddr, struct page *page) +{ + struct page_ext *page_ext; + unsigned long flags; + + if (!static_branch_unlikely(&xpfo_inited)) + return; + + page_ext = lookup_page_ext(page); + + /* + * The page was allocated before page_ext was initialized (which means + * it's a kernel page) or it's allocated to the kernel, so nothing to + * do. + */ + if (!page_ext->inited || + test_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags)) + return; + + spin_lock_irqsave(&page_ext->maplock, flags); + + /* + * The page is to be allocated back to user space, so unmap it from the + * kernel, flush the TLB and tag it as a user page. + */ + if (atomic_dec_return(&page_ext->mapcount) == 0) { + BUG_ON(test_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)); + set_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags); + set_kpte(page, (unsigned long)kaddr, __pgprot(0)); + __flush_tlb_one((unsigned long)kaddr); + } + + spin_unlock_irqrestore(&page_ext->maplock, flags); +} +EXPORT_SYMBOL(xpfo_kunmap); + +inline bool xpfo_page_is_unmapped(struct page *page) +{ + if (!static_branch_unlikely(&xpfo_inited)) + return false; + + return test_bit(PAGE_EXT_XPFO_UNMAPPED, &lookup_page_ext(page)->flags); +} +EXPORT_SYMBOL(xpfo_page_is_unmapped); diff --git a/security/Kconfig b/security/Kconfig index 118f4549404e..4502e15c8419 100644 --- a/security/Kconfig +++ b/security/Kconfig @@ -6,6 +6,25 @@ menu "Security options" source security/keys/Kconfig +config ARCH_SUPPORTS_XPFO + bool + +config XPFO + bool "Enable eXclusive Page Frame Ownership (XPFO)" + default n + depends on ARCH_SUPPORTS_XPFO + select PAGE_EXTENSION + help + This option offers protection against 'ret2dir' kernel attacks. 
+ When enabled, every time a page frame is allocated to user space, it + is unmapped from the direct mapped RAM region in kernel space + (physmap). Similarly, when a page frame is freed/reclaimed, it is + mapped back to physmap. + + There is a slight performance impact when this option is enabled. + + If in doubt, say "N". + config SECURITY_DMESG_RESTRICT bool "Restrict unprivileged access to the kernel syslog" default n -- 2.10.1 ^ permalink raw reply related [flat|nested] 30+ messages in thread
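A small user-space model of the map-count balancing in xpfo_kmap()/xpfo_kunmap() from the patch above may help when reviewing the nesting behaviour. It is only a sketch under stated assumptions: struct xpfo_model_page and the model_* helpers are hypothetical names that do not exist in the kernel, the spinlock and the TLB flush are left out, and a plain counter stands in for set_kpte().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for the XPFO fields in struct page_ext. */
struct xpfo_model_page {
	bool kernel_page;  /* models PAGE_EXT_XPFO_KERNEL */
	bool unmapped;     /* models PAGE_EXT_XPFO_UNMAPPED */
	int mapcount;      /* models page_ext->mapcount */
	int pte_updates;   /* counts simulated set_kpte() calls */
};

/* Models xpfo_kmap(): only the first mapper restores the physmap entry. */
static void model_kmap(struct xpfo_model_page *p)
{
	if (p->kernel_page)
		return; /* kernel pages are never unmapped, nothing to do */

	if (++p->mapcount == 1 && p->unmapped) {
		p->unmapped = false;
		p->pte_updates++;
	}
}

/* Models xpfo_kunmap(): only the last unmapper removes the physmap entry. */
static void model_kunmap(struct xpfo_model_page *p)
{
	if (p->kernel_page)
		return;

	if (--p->mapcount == 0) {
		assert(!p->unmapped); /* mirrors the BUG_ON() in the patch */
		p->unmapped = true;
		p->pte_updates++;
	}
}

/* Nested kmap/kunmap on a user page; returns the simulated PTE updates. */
static int model_nested_demo(void)
{
	struct xpfo_model_page user = { .kernel_page = false, .unmapped = true };

	model_kmap(&user);   /* 0 -> 1: mapped back into physmap */
	model_kmap(&user);   /* 1 -> 2: already mapped, no PTE change */
	model_kunmap(&user); /* 2 -> 1: still in use, stays mapped */
	model_kunmap(&user); /* 1 -> 0: unmapped again */

	return user.pte_updates;
}
```

In this model the nested sequence touches the page table entry only twice, once on the first kmap and once on the last kunmap, which is the balancing the reference counter in the patch is there to provide.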
* [kernel-hardening] Re: [RFC PATCH v3 1/2] Add support for eXclusive Page Frame Ownership (XPFO) 2016-11-04 14:45 ` [kernel-hardening] [RFC PATCH v3 1/2] " Juerg Haefliger @ 2016-11-04 14:50 ` Christoph Hellwig 2016-11-10 5:53 ` [kernel-hardening] " ZhaoJunmin Zhao(Junmin) ` (3 subsequent siblings) 4 siblings, 0 replies; 30+ messages in thread From: Christoph Hellwig @ 2016-11-04 14:50 UTC (permalink / raw) To: Juerg Haefliger Cc: linux-kernel, linux-mm, kernel-hardening, linux-x86_64, vpk, Tejun Heo, linux-ide The libata parts here really need to be split out and the proper list and maintainer need to be Cc'ed. > diff --git a/drivers/ata/libata-sff.c b/drivers/ata/libata-sff.c > index 051b6158d1b7..58af734be25d 100644 > --- a/drivers/ata/libata-sff.c > +++ b/drivers/ata/libata-sff.c > @@ -715,7 +715,7 @@ static void ata_pio_sector(struct ata_queued_cmd *qc) > > DPRINTK("data %s\n", qc->tf.flags & ATA_TFLAG_WRITE ? "write" : "read"); > > - if (PageHighMem(page)) { > + if (PageHighMem(page) || xpfo_page_is_unmapped(page)) { > unsigned long flags; > > /* FIXME: use a bounce buffer */ > @@ -860,7 +860,7 @@ static int __atapi_pio_bytes(struct ata_queued_cmd *qc, unsigned int bytes) > > DPRINTK("data %s\n", qc->tf.flags & ATA_TFLAG_WRITE ? "write" : "read"); > > - if (PageHighMem(page)) { > + if (PageHighMem(page) || xpfo_page_is_unmapped(page)) { > unsigned long flags; > > /* FIXME: use bounce buffer */ > diff --git a/include/linux/highmem.h b/include/linux/highmem.h This is just piling one nasty hack on top of another. libata should just use the highmem case unconditionally, as it is the correct thing to do for all cases. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [kernel-hardening] [RFC PATCH v3 1/2] Add support for eXclusive Page Frame Ownership (XPFO) 2016-11-04 14:45 ` [kernel-hardening] [RFC PATCH v3 1/2] " Juerg Haefliger 2016-11-04 14:50 ` [kernel-hardening] " Christoph Hellwig @ 2016-11-10 5:53 ` ZhaoJunmin Zhao(Junmin) 2016-11-10 19:11 ` [kernel-hardening] " Kees Cook ` (2 subsequent siblings) 4 siblings, 0 replies; 30+ messages in thread From: ZhaoJunmin Zhao(Junmin) @ 2016-11-10 5:53 UTC (permalink / raw) To: kernel-hardening, linux-kernel, linux-mm, linux-x86_64 Cc: vpk, juerg.haefliger > This patch adds support for XPFO which protects against 'ret2dir' kernel > attacks. The basic idea is to enforce exclusive ownership of page frames > by either the kernel or userspace, unless explicitly requested by the > kernel. Whenever a page destined for userspace is allocated, it is > unmapped from physmap (the kernel's page table). When such a page is > reclaimed from userspace, it is mapped back to physmap. > > Additional fields in the page_ext struct are used for XPFO housekeeping. > Specifically two flags to distinguish user vs. kernel pages and to tag > unmapped pages and a reference counter to balance kmap/kunmap operations > and a lock to serialize access to the XPFO fields. > > Known issues/limitations: > - Only supports x86-64 (for now) > - Only supports 4k pages (for now) > - There are most likely some legitimate uses cases where the kernel needs > to access userspace which need to be made XPFO-aware > - Performance penalty > > Reference paper by the original patch authors: > http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf > > Suggested-by: Vasileios P. 
Kemerlis <vpk@cs.columbia.edu> > Signed-off-by: Juerg Haefliger <juerg.haefliger@hpe.com> > --- > arch/x86/Kconfig | 3 +- > arch/x86/mm/init.c | 2 +- > drivers/ata/libata-sff.c | 4 +- > include/linux/highmem.h | 15 +++- > include/linux/page_ext.h | 7 ++ > include/linux/xpfo.h | 39 +++++++++ > lib/swiotlb.c | 3 +- > mm/Makefile | 1 + > mm/page_alloc.c | 2 + > mm/page_ext.c | 4 + > mm/xpfo.c | 206 +++++++++++++++++++++++++++++++++++++++++++++++ > security/Kconfig | 19 +++++ > 12 files changed, 298 insertions(+), 7 deletions(-) > create mode 100644 include/linux/xpfo.h > create mode 100644 mm/xpfo.c > > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig > index bada636d1065..38b334f8fde5 100644 > --- a/arch/x86/Kconfig > +++ b/arch/x86/Kconfig > @@ -165,6 +165,7 @@ config X86 > select HAVE_STACK_VALIDATION if X86_64 > select ARCH_USES_HIGH_VMA_FLAGS if X86_INTEL_MEMORY_PROTECTION_KEYS > select ARCH_HAS_PKEYS if X86_INTEL_MEMORY_PROTECTION_KEYS > + select ARCH_SUPPORTS_XPFO if X86_64 > > config INSTRUCTION_DECODER > def_bool y > @@ -1361,7 +1362,7 @@ config ARCH_DMA_ADDR_T_64BIT > > config X86_DIRECT_GBPAGES > def_bool y > - depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK > + depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK && !XPFO > ---help--- > Certain kernel features effectively disable kernel > linear 1 GB mappings (even if the CPU otherwise > diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c > index 22af912d66d2..a6fafbae02bb 100644 > --- a/arch/x86/mm/init.c > +++ b/arch/x86/mm/init.c > @@ -161,7 +161,7 @@ static int page_size_mask; > > static void __init probe_page_size_mask(void) > { > -#if !defined(CONFIG_KMEMCHECK) > +#if !defined(CONFIG_KMEMCHECK) && !defined(CONFIG_XPFO) > /* > * For CONFIG_KMEMCHECK or pagealloc debugging, identity mapping will > * use small pages. 
> diff --git a/drivers/ata/libata-sff.c b/drivers/ata/libata-sff.c > index 051b6158d1b7..58af734be25d 100644 > --- a/drivers/ata/libata-sff.c > +++ b/drivers/ata/libata-sff.c > @@ -715,7 +715,7 @@ static void ata_pio_sector(struct ata_queued_cmd *qc) > > DPRINTK("data %s\n", qc->tf.flags & ATA_TFLAG_WRITE ? "write" : "read"); > > - if (PageHighMem(page)) { > + if (PageHighMem(page) || xpfo_page_is_unmapped(page)) { > unsigned long flags; > > /* FIXME: use a bounce buffer */ > @@ -860,7 +860,7 @@ static int __atapi_pio_bytes(struct ata_queued_cmd *qc, unsigned int bytes) > > DPRINTK("data %s\n", qc->tf.flags & ATA_TFLAG_WRITE ? "write" : "read"); > > - if (PageHighMem(page)) { > + if (PageHighMem(page) || xpfo_page_is_unmapped(page)) { > unsigned long flags; > > /* FIXME: use bounce buffer */ > diff --git a/include/linux/highmem.h b/include/linux/highmem.h > index bb3f3297062a..7a17c166532f 100644 > --- a/include/linux/highmem.h > +++ b/include/linux/highmem.h > @@ -7,6 +7,7 @@ > #include <linux/mm.h> > #include <linux/uaccess.h> > #include <linux/hardirq.h> > +#include <linux/xpfo.h> > > #include <asm/cacheflush.h> > > @@ -55,24 +56,34 @@ static inline struct page *kmap_to_page(void *addr) > #ifndef ARCH_HAS_KMAP > static inline void *kmap(struct page *page) > { > + void *kaddr; > + > might_sleep(); > - return page_address(page); > + kaddr = page_address(page); > + xpfo_kmap(kaddr, page); > + return kaddr; > } > > static inline void kunmap(struct page *page) > { > + xpfo_kunmap(page_address(page), page); > } > > static inline void *kmap_atomic(struct page *page) > { > + void *kaddr; > + > preempt_disable(); > pagefault_disable(); > - return page_address(page); > + kaddr = page_address(page); > + xpfo_kmap(kaddr, page); > + return kaddr; > } > #define kmap_atomic_prot(page, prot) kmap_atomic(page) > > static inline void __kunmap_atomic(void *addr) > { > + xpfo_kunmap(addr, virt_to_page(addr)); > pagefault_enable(); > preempt_enable(); > } > diff --git 
a/include/linux/page_ext.h b/include/linux/page_ext.h > index 9298c393ddaa..0e451a42e5a3 100644 > --- a/include/linux/page_ext.h > +++ b/include/linux/page_ext.h > @@ -29,6 +29,8 @@ enum page_ext_flags { > PAGE_EXT_DEBUG_POISON, /* Page is poisoned */ > PAGE_EXT_DEBUG_GUARD, > PAGE_EXT_OWNER, > + PAGE_EXT_XPFO_KERNEL, /* Page is a kernel page */ > + PAGE_EXT_XPFO_UNMAPPED, /* Page is unmapped */ > #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT) > PAGE_EXT_YOUNG, > PAGE_EXT_IDLE, > @@ -44,6 +46,11 @@ enum page_ext_flags { > */ > struct page_ext { > unsigned long flags; > +#ifdef CONFIG_XPFO > + int inited; /* Map counter and lock initialized */ > + atomic_t mapcount; /* Counter for balancing map/unmap requests */ > + spinlock_t maplock; /* Lock to serialize map/unmap requests */ > +#endif > }; > > extern void pgdat_page_ext_init(struct pglist_data *pgdat); > diff --git a/include/linux/xpfo.h b/include/linux/xpfo.h > new file mode 100644 > index 000000000000..77187578ca33 > --- /dev/null > +++ b/include/linux/xpfo.h > @@ -0,0 +1,39 @@ > +/* > + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P. > + * Copyright (C) 2016 Brown University. All rights reserved. > + * > + * Authors: > + * Juerg Haefliger <juerg.haefliger@hpe.com> > + * Vasileios P. Kemerlis <vpk@cs.brown.edu> > + * > + * This program is free software; you can redistribute it and/or modify it > + * under the terms of the GNU General Public License version 2 as published by > + * the Free Software Foundation. 
> + */ > + > +#ifndef _LINUX_XPFO_H > +#define _LINUX_XPFO_H > + > +#ifdef CONFIG_XPFO > + > +extern struct page_ext_operations page_xpfo_ops; > + > +extern void xpfo_kmap(void *kaddr, struct page *page); > +extern void xpfo_kunmap(void *kaddr, struct page *page); > +extern void xpfo_alloc_page(struct page *page, int order, gfp_t gfp); > +extern void xpfo_free_page(struct page *page, int order); > + > +extern bool xpfo_page_is_unmapped(struct page *page); > + > +#else /* !CONFIG_XPFO */ > + > +static inline void xpfo_kmap(void *kaddr, struct page *page) { } > +static inline void xpfo_kunmap(void *kaddr, struct page *page) { } > +static inline void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) { } > +static inline void xpfo_free_page(struct page *page, int order) { } > + > +static inline bool xpfo_page_is_unmapped(struct page *page) { return false; } > + > +#endif /* CONFIG_XPFO */ > + > +#endif /* _LINUX_XPFO_H */ > diff --git a/lib/swiotlb.c b/lib/swiotlb.c > index 22e13a0e19d7..455eff44604e 100644 > --- a/lib/swiotlb.c > +++ b/lib/swiotlb.c > @@ -390,8 +390,9 @@ static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr, > { > unsigned long pfn = PFN_DOWN(orig_addr); > unsigned char *vaddr = phys_to_virt(tlb_addr); > + struct page *page = pfn_to_page(pfn); > > - if (PageHighMem(pfn_to_page(pfn))) { > + if (PageHighMem(page) || xpfo_page_is_unmapped(page)) { > /* The buffer does not have a mapping. 
Map it in and copy */ > unsigned int offset = orig_addr & ~PAGE_MASK; > char *buffer; > diff --git a/mm/Makefile b/mm/Makefile > index 295bd7a9f76b..175680f516aa 100644 > --- a/mm/Makefile > +++ b/mm/Makefile > @@ -100,3 +100,4 @@ obj-$(CONFIG_IDLE_PAGE_TRACKING) += page_idle.o > obj-$(CONFIG_FRAME_VECTOR) += frame_vector.o > obj-$(CONFIG_DEBUG_PAGE_REF) += debug_page_ref.o > obj-$(CONFIG_HARDENED_USERCOPY) += usercopy.o > +obj-$(CONFIG_XPFO) += xpfo.o > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 8fd42aa7c4bd..100e80e008e2 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -1045,6 +1045,7 @@ static __always_inline bool free_pages_prepare(struct page *page, > kernel_poison_pages(page, 1 << order, 0); > kernel_map_pages(page, 1 << order, 0); > kasan_free_pages(page, order); > + xpfo_free_page(page, order); > > return true; > } > @@ -1745,6 +1746,7 @@ inline void post_alloc_hook(struct page *page, unsigned int order, > kernel_map_pages(page, 1 << order, 1); > kernel_poison_pages(page, 1 << order, 1); > kasan_alloc_pages(page, order); > + xpfo_alloc_page(page, order, gfp_flags); > set_page_owner(page, order, gfp_flags); > } > > diff --git a/mm/page_ext.c b/mm/page_ext.c > index 121dcffc4ec1..ba6dbcacc2db 100644 > --- a/mm/page_ext.c > +++ b/mm/page_ext.c > @@ -7,6 +7,7 @@ > #include <linux/kmemleak.h> > #include <linux/page_owner.h> > #include <linux/page_idle.h> > +#include <linux/xpfo.h> > > /* > * struct page extension > @@ -68,6 +69,9 @@ static struct page_ext_operations *page_ext_ops[] = { > #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT) > &page_idle_ops, > #endif > +#ifdef CONFIG_XPFO > + &page_xpfo_ops, > +#endif > }; > > static unsigned long total_usage; > diff --git a/mm/xpfo.c b/mm/xpfo.c > new file mode 100644 > index 000000000000..8e3a6a694b6a > --- /dev/null > +++ b/mm/xpfo.c > @@ -0,0 +1,206 @@ > +/* > + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P. > + * Copyright (C) 2016 Brown University. 
All rights reserved. > + * > + * Authors: > + * Juerg Haefliger <juerg.haefliger@hpe.com> > + * Vasileios P. Kemerlis <vpk@cs.brown.edu> > + * > + * This program is free software; you can redistribute it and/or modify it > + * under the terms of the GNU General Public License version 2 as published by > + * the Free Software Foundation. > + */ > + > +#include <linux/mm.h> > +#include <linux/module.h> > +#include <linux/page_ext.h> > +#include <linux/xpfo.h> > + > +#include <asm/tlbflush.h> > + > +DEFINE_STATIC_KEY_FALSE(xpfo_inited); > + > +static bool need_xpfo(void) > +{ > + return true; > +} > + > +static void init_xpfo(void) > +{ > + printk(KERN_INFO "XPFO enabled\n"); > + static_branch_enable(&xpfo_inited); > +} > + > +struct page_ext_operations page_xpfo_ops = { > + .need = need_xpfo, > + .init = init_xpfo, > +}; > + > +/* > + * Update a single kernel page table entry > + */ > +static inline void set_kpte(struct page *page, unsigned long kaddr, > + pgprot_t prot) { > + unsigned int level; > + pte_t *kpte = lookup_address(kaddr, &level); > + > + /* We only support 4k pages for now */ > + BUG_ON(!kpte || level != PG_LEVEL_4K); > + > + set_pte_atomic(kpte, pfn_pte(page_to_pfn(page), canon_pgprot(prot))); > +} > + > +void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) > +{ > + int i, flush_tlb = 0; > + struct page_ext *page_ext; > + unsigned long kaddr; > + > + if (!static_branch_unlikely(&xpfo_inited)) > + return; > + > + for (i = 0; i < (1 << order); i++) { > + page_ext = lookup_page_ext(page + i); > + > + BUG_ON(test_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)); > + > + /* Initialize the map lock and map counter */ > + if (!page_ext->inited) { > + spin_lock_init(&page_ext->maplock); > + atomic_set(&page_ext->mapcount, 0); > + page_ext->inited = 1; > + } > + BUG_ON(atomic_read(&page_ext->mapcount)); > + > + if ((gfp & GFP_HIGHUSER) == GFP_HIGHUSER) { > + /* > + * Flush the TLB if the page was previously allocated > + * to the kernel. 
> + */ > + if (test_and_clear_bit(PAGE_EXT_XPFO_KERNEL, > + &page_ext->flags)) > + flush_tlb = 1; > + } else { > + /* Tag the page as a kernel page */ > + set_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags); > + } > + } > + > + if (flush_tlb) { > + kaddr = (unsigned long)page_address(page); > + flush_tlb_kernel_range(kaddr, kaddr + (1 << order) * > + PAGE_SIZE); > + } > +} > + > +void xpfo_free_page(struct page *page, int order) > +{ > + int i; > + struct page_ext *page_ext; > + unsigned long kaddr; > + > + if (!static_branch_unlikely(&xpfo_inited)) > + return; > + > + for (i = 0; i < (1 << order); i++) { > + page_ext = lookup_page_ext(page + i); > + > + if (!page_ext->inited) { > + /* > + * The page was allocated before page_ext was > + * initialized, so it is a kernel page and it needs to > + * be tagged accordingly. > + */ > + set_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags); > + continue; > + } > + > + /* > + * Map the page back into the kernel if it was previously > + * allocated to user space. > + */ > + if (test_and_clear_bit(PAGE_EXT_XPFO_UNMAPPED, > + &page_ext->flags)) { > + kaddr = (unsigned long)page_address(page + i); > + set_kpte(page + i, kaddr, __pgprot(__PAGE_KERNEL)); > + } > + } > +} > + > +void xpfo_kmap(void *kaddr, struct page *page) > +{ > + struct page_ext *page_ext; > + unsigned long flags; > + > + if (!static_branch_unlikely(&xpfo_inited)) > + return; > + > + page_ext = lookup_page_ext(page); > + > + /* > + * The page was allocated before page_ext was initialized (which means > + * it's a kernel page) or it's allocated to the kernel, so nothing to > + * do. > + */ > + if (!page_ext->inited || > + test_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags)) > + return; > + > + spin_lock_irqsave(&page_ext->maplock, flags); > + > + /* > + * The page was previously allocated to user space, so map it back > + * into the kernel. No TLB flush required. 
> + */ > + if ((atomic_inc_return(&page_ext->mapcount) == 1) && > + test_and_clear_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)) > + set_kpte(page, (unsigned long)kaddr, __pgprot(__PAGE_KERNEL)); > + > + spin_unlock_irqrestore(&page_ext->maplock, flags); > +} > +EXPORT_SYMBOL(xpfo_kmap); > + > +void xpfo_kunmap(void *kaddr, struct page *page) > +{ > + struct page_ext *page_ext; > + unsigned long flags; > + > + if (!static_branch_unlikely(&xpfo_inited)) > + return; > + > + page_ext = lookup_page_ext(page); > + > + /* > + * The page was allocated before page_ext was initialized (which means > + * it's a kernel page) or it's allocated to the kernel, so nothing to > + * do. > + */ > + if (!page_ext->inited || > + test_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags)) > + return; > + > + spin_lock_irqsave(&page_ext->maplock, flags); > + > + /* > + * The page is to be allocated back to user space, so unmap it from the > + * kernel, flush the TLB and tag it as a user page. > + */ > + if (atomic_dec_return(&page_ext->mapcount) == 0) { > + BUG_ON(test_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)); > + set_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags); > + set_kpte(page, (unsigned long)kaddr, __pgprot(0)); > + __flush_tlb_one((unsigned long)kaddr); > + } > + > + spin_unlock_irqrestore(&page_ext->maplock, flags); > +} > +EXPORT_SYMBOL(xpfo_kunmap); > + > +inline bool xpfo_page_is_unmapped(struct page *page) > +{ > + if (!static_branch_unlikely(&xpfo_inited)) > + return false; > + > + return test_bit(PAGE_EXT_XPFO_UNMAPPED, &lookup_page_ext(page)->flags); > +} > +EXPORT_SYMBOL(xpfo_page_is_unmapped); > diff --git a/security/Kconfig b/security/Kconfig > index 118f4549404e..4502e15c8419 100644 > --- a/security/Kconfig > +++ b/security/Kconfig > @@ -6,6 +6,25 @@ menu "Security options" > > source security/keys/Kconfig > > +config ARCH_SUPPORTS_XPFO > + bool > + > +config XPFO > + bool "Enable eXclusive Page Frame Ownership (XPFO)" > + default n > + depends on ARCH_SUPPORTS_XPFO > + 
select PAGE_EXTENSION > + help > + This option offers protection against 'ret2dir' kernel attacks. > + When enabled, every time a page frame is allocated to user space, it > + is unmapped from the direct mapped RAM region in kernel space > + (physmap). Similarly, when a page frame is freed/reclaimed, it is > + mapped back to physmap. > + > + There is a slight performance impact when this option is enabled. > + > + If in doubt, say "N". > + > config SECURITY_DMESG_RESTRICT > bool "Restrict unprivileged access to the kernel syslog" > default n > When a physical page is assigned to a process in user space, it should be unmapped from the kernel physmap. From the code, I can see that the patch only handles pages in the high memory zone. If the kernel uses the high memory zone, it will call kmap. So I would like to know how a physical page that comes from the normal zone is handled. Thanks Zhaojunmin ^ permalink raw reply [flat|nested] 30+ messages in thread
* [kernel-hardening] Re: [RFC PATCH v3 1/2] Add support for eXclusive Page Frame Ownership (XPFO) 2016-11-04 14:45 ` [kernel-hardening] [RFC PATCH v3 1/2] " Juerg Haefliger 2016-11-04 14:50 ` [kernel-hardening] " Christoph Hellwig 2016-11-10 5:53 ` [kernel-hardening] " ZhaoJunmin Zhao(Junmin) @ 2016-11-10 19:11 ` Kees Cook 2016-11-15 11:15 ` Juerg Haefliger 2016-11-10 19:24 ` Kees Cook 2016-11-24 10:56 ` AKASHI Takahiro 4 siblings, 1 reply; 30+ messages in thread From: Kees Cook @ 2016-11-10 19:11 UTC (permalink / raw) To: Juerg Haefliger Cc: LKML, Linux-MM, kernel-hardening@lists.openwall.com, linux-x86_64, vpk On Fri, Nov 4, 2016 at 7:45 AM, Juerg Haefliger <juerg.haefliger@hpe.com> wrote: > This patch adds support for XPFO which protects against 'ret2dir' kernel > attacks. The basic idea is to enforce exclusive ownership of page frames > by either the kernel or userspace, unless explicitly requested by the > kernel. Whenever a page destined for userspace is allocated, it is > unmapped from physmap (the kernel's page table). When such a page is > reclaimed from userspace, it is mapped back to physmap. > > Additional fields in the page_ext struct are used for XPFO housekeeping. > Specifically two flags to distinguish user vs. kernel pages and to tag > unmapped pages and a reference counter to balance kmap/kunmap operations > and a lock to serialize access to the XPFO fields. Thanks for keeping on this! I'd really like to see it land and then get more architectures to support it. > Known issues/limitations: > - Only supports x86-64 (for now) > - Only supports 4k pages (for now) > - There are most likely some legitimate use cases where the kernel needs > to access userspace which need to be made XPFO-aware > - Performance penalty In the Kconfig you say "slight", but I'm curious what kinds of benchmarks you've done and if there's a more specific cost we can declare, just to give people more of an idea what the hit looks like? 
(What workloads would trigger a lot of XPFO unmapping, for example?) Thanks! -Kees -- Kees Cook Nexus Security ^ permalink raw reply [flat|nested] 30+ messages in thread
* [kernel-hardening] Re: [RFC PATCH v3 1/2] Add support for eXclusive Page Frame Ownership (XPFO) 2016-11-10 19:11 ` [kernel-hardening] " Kees Cook @ 2016-11-15 11:15 ` Juerg Haefliger 0 siblings, 0 replies; 30+ messages in thread From: Juerg Haefliger @ 2016-11-15 11:15 UTC (permalink / raw) To: Kees Cook Cc: LKML, Linux-MM, kernel-hardening@lists.openwall.com, linux-x86_64, vpk [-- Attachment #1.1: Type: text/plain, Size: 2123 bytes --] Sorry for the late reply, I just found your email in my cluttered inbox. On 11/10/2016 08:11 PM, Kees Cook wrote: > On Fri, Nov 4, 2016 at 7:45 AM, Juerg Haefliger <juerg.haefliger@hpe.com> wrote: >> This patch adds support for XPFO which protects against 'ret2dir' kernel >> attacks. The basic idea is to enforce exclusive ownership of page frames >> by either the kernel or userspace, unless explicitly requested by the >> kernel. Whenever a page destined for userspace is allocated, it is >> unmapped from physmap (the kernel's page table). When such a page is >> reclaimed from userspace, it is mapped back to physmap. >> >> Additional fields in the page_ext struct are used for XPFO housekeeping. >> Specifically two flags to distinguish user vs. kernel pages and to tag >> unmapped pages and a reference counter to balance kmap/kunmap operations >> and a lock to serialize access to the XPFO fields. > > Thanks for keeping on this! I'd really like to see it land and then > get more architectures to support it. Good to hear :-) >> Known issues/limitations: >> - Only supports x86-64 (for now) >> - Only supports 4k pages (for now) >> - There are most likely some legitimate use cases where the kernel needs >> to access userspace which need to be made XPFO-aware >> - Performance penalty > > In the Kconfig you say "slight", but I'm curious what kinds of > benchmarks you've done and if there's a more specific cost we can > declare, just to give people more of an idea what the hit looks like? 
> (What workloads would trigger a lot of XPFO unmapping, for example?) That 'slight' wording is based on the performance numbers published in the referenced paper. So far I've only run kernel compilation tests. For that workload, the big performance hit comes from disabling >4k page sizes (around 10%). Adding XPFO on top causes 'only' another 0.5% performance penalty. I'm currently looking into adding support for larger page sizes to see what the real impact is and then generate some more relevant numbers. ...Juerg > Thanks! > > -Kees > [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 801 bytes --] ^ permalink raw reply [flat|nested] 30+ messages in thread
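As a reading aid for the workload question above, here is a minimal userspace sketch of the map-count logic in xpfo_kmap() and xpfo_kunmap() from the quoted patch. It is not kernel code and makes several assumptions: struct xpfo_model, model_kmap() and model_kunmap() are hypothetical names, the spinlock, the atomics and the page_ext->inited check are left out, and the two counters only tally the points where the patch would call set_kpte() and __flush_tlb_one().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the XPFO state the patch keeps in struct page_ext. */
struct xpfo_model {
	bool kernel_page;	/* models PAGE_EXT_XPFO_KERNEL */
	bool unmapped;		/* models PAGE_EXT_XPFO_UNMAPPED */
	int mapcount;		/* models page_ext->mapcount */
	int pte_updates;	/* number of modeled set_kpte() calls */
	int tlb_flushes;	/* number of modeled __flush_tlb_one() calls */
};

/* Models xpfo_kmap(): only the first mapping of an unmapped user page touches the PTE. */
static void model_kmap(struct xpfo_model *p)
{
	if (p->kernel_page)
		return;		/* kernel pages are never unmapped, nothing to do */

	p->mapcount++;
	if (p->mapcount == 1 && p->unmapped) {
		p->unmapped = false;
		p->pte_updates++;	/* map back into physmap, no TLB flush */
	}
}

/* Models xpfo_kunmap(): only the last unmap removes the mapping and flushes. */
static void model_kunmap(struct xpfo_model *p)
{
	if (p->kernel_page)
		return;

	p->mapcount--;
	if (p->mapcount == 0) {
		assert(!p->unmapped);	/* mirrors the BUG_ON() in the patch */
		p->unmapped = true;
		p->pte_updates++;	/* clear the physmap entry */
		p->tlb_flushes++;	/* flush the stale translation */
	}
}
```

In this model a page tagged as a kernel page costs nothing, nested kmap/kunmap pairs on a user page are absorbed by the counter, and only the outermost kunmap pays for a PTE update plus a TLB flush. The model says nothing about actual timings.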
* [kernel-hardening] Re: [RFC PATCH v3 1/2] Add support for eXclusive Page Frame Ownership (XPFO) 2016-11-04 14:45 ` [kernel-hardening] [RFC PATCH v3 1/2] " Juerg Haefliger ` (2 preceding siblings ...) 2016-11-10 19:11 ` [kernel-hardening] " Kees Cook @ 2016-11-10 19:24 ` Kees Cook 2016-11-15 11:18 ` Juerg Haefliger 2016-11-24 10:56 ` AKASHI Takahiro 4 siblings, 1 reply; 30+ messages in thread From: Kees Cook @ 2016-11-10 19:24 UTC (permalink / raw) To: Juerg Haefliger Cc: LKML, Linux-MM, kernel-hardening@lists.openwall.com, linux-x86_64, vpk On Fri, Nov 4, 2016 at 7:45 AM, Juerg Haefliger <juerg.haefliger@hpe.com> wrote: > This patch adds support for XPFO which protects against 'ret2dir' kernel > attacks. The basic idea is to enforce exclusive ownership of page frames > by either the kernel or userspace, unless explicitly requested by the > kernel. Whenever a page destined for userspace is allocated, it is > unmapped from physmap (the kernel's page table). When such a page is > reclaimed from userspace, it is mapped back to physmap. > > Additional fields in the page_ext struct are used for XPFO housekeeping. > Specifically two flags to distinguish user vs. kernel pages and to tag > unmapped pages and a reference counter to balance kmap/kunmap operations > and a lock to serialize access to the XPFO fields. > > Known issues/limitations: > - Only supports x86-64 (for now) > - Only supports 4k pages (for now) > - There are most likely some legitimate use cases where the kernel needs > to access userspace which need to be made XPFO-aware > - Performance penalty > > Reference paper by the original patch authors: > http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf Would it be possible to create an lkdtm test that can exercise this protection? > Suggested-by: Vasileios P. 
Kemerlis <vpk@cs.columbia.edu> > Signed-off-by: Juerg Haefliger <juerg.haefliger@hpe.com> > --- > arch/x86/Kconfig | 3 +- > arch/x86/mm/init.c | 2 +- > drivers/ata/libata-sff.c | 4 +- > include/linux/highmem.h | 15 +++- > include/linux/page_ext.h | 7 ++ > include/linux/xpfo.h | 39 +++++++++ > lib/swiotlb.c | 3 +- > mm/Makefile | 1 + > mm/page_alloc.c | 2 + > mm/page_ext.c | 4 + > mm/xpfo.c | 206 +++++++++++++++++++++++++++++++++++++++++++++++ > security/Kconfig | 19 +++++ > 12 files changed, 298 insertions(+), 7 deletions(-) > create mode 100644 include/linux/xpfo.h > create mode 100644 mm/xpfo.c > > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig > index bada636d1065..38b334f8fde5 100644 > --- a/arch/x86/Kconfig > +++ b/arch/x86/Kconfig > @@ -165,6 +165,7 @@ config X86 > select HAVE_STACK_VALIDATION if X86_64 > select ARCH_USES_HIGH_VMA_FLAGS if X86_INTEL_MEMORY_PROTECTION_KEYS > select ARCH_HAS_PKEYS if X86_INTEL_MEMORY_PROTECTION_KEYS > + select ARCH_SUPPORTS_XPFO if X86_64 > > config INSTRUCTION_DECODER > def_bool y > @@ -1361,7 +1362,7 @@ config ARCH_DMA_ADDR_T_64BIT > > config X86_DIRECT_GBPAGES > def_bool y > - depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK > + depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK && !XPFO > ---help--- > Certain kernel features effectively disable kernel > linear 1 GB mappings (even if the CPU otherwise > diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c > index 22af912d66d2..a6fafbae02bb 100644 > --- a/arch/x86/mm/init.c > +++ b/arch/x86/mm/init.c > @@ -161,7 +161,7 @@ static int page_size_mask; > > static void __init probe_page_size_mask(void) > { > -#if !defined(CONFIG_KMEMCHECK) > +#if !defined(CONFIG_KMEMCHECK) && !defined(CONFIG_XPFO) > /* > * For CONFIG_KMEMCHECK or pagealloc debugging, identity mapping will > * use small pages. 
> diff --git a/drivers/ata/libata-sff.c b/drivers/ata/libata-sff.c > index 051b6158d1b7..58af734be25d 100644 > --- a/drivers/ata/libata-sff.c > +++ b/drivers/ata/libata-sff.c > @@ -715,7 +715,7 @@ static void ata_pio_sector(struct ata_queued_cmd *qc) > > DPRINTK("data %s\n", qc->tf.flags & ATA_TFLAG_WRITE ? "write" : "read"); > > - if (PageHighMem(page)) { > + if (PageHighMem(page) || xpfo_page_is_unmapped(page)) { > unsigned long flags; > > /* FIXME: use a bounce buffer */ > @@ -860,7 +860,7 @@ static int __atapi_pio_bytes(struct ata_queued_cmd *qc, unsigned int bytes) > > DPRINTK("data %s\n", qc->tf.flags & ATA_TFLAG_WRITE ? "write" : "read"); > > - if (PageHighMem(page)) { > + if (PageHighMem(page) || xpfo_page_is_unmapped(page)) { > unsigned long flags; > > /* FIXME: use bounce buffer */ > diff --git a/include/linux/highmem.h b/include/linux/highmem.h > index bb3f3297062a..7a17c166532f 100644 > --- a/include/linux/highmem.h > +++ b/include/linux/highmem.h > @@ -7,6 +7,7 @@ > #include <linux/mm.h> > #include <linux/uaccess.h> > #include <linux/hardirq.h> > +#include <linux/xpfo.h> > > #include <asm/cacheflush.h> > > @@ -55,24 +56,34 @@ static inline struct page *kmap_to_page(void *addr) > #ifndef ARCH_HAS_KMAP > static inline void *kmap(struct page *page) > { > + void *kaddr; > + > might_sleep(); > - return page_address(page); > + kaddr = page_address(page); > + xpfo_kmap(kaddr, page); > + return kaddr; > } > > static inline void kunmap(struct page *page) > { > + xpfo_kunmap(page_address(page), page); > } > > static inline void *kmap_atomic(struct page *page) > { > + void *kaddr; > + > preempt_disable(); > pagefault_disable(); > - return page_address(page); > + kaddr = page_address(page); > + xpfo_kmap(kaddr, page); > + return kaddr; > } > #define kmap_atomic_prot(page, prot) kmap_atomic(page) > > static inline void __kunmap_atomic(void *addr) > { > + xpfo_kunmap(addr, virt_to_page(addr)); > pagefault_enable(); > preempt_enable(); > } > diff --git 
a/include/linux/page_ext.h b/include/linux/page_ext.h > index 9298c393ddaa..0e451a42e5a3 100644 > --- a/include/linux/page_ext.h > +++ b/include/linux/page_ext.h > @@ -29,6 +29,8 @@ enum page_ext_flags { > PAGE_EXT_DEBUG_POISON, /* Page is poisoned */ > PAGE_EXT_DEBUG_GUARD, > PAGE_EXT_OWNER, > + PAGE_EXT_XPFO_KERNEL, /* Page is a kernel page */ > + PAGE_EXT_XPFO_UNMAPPED, /* Page is unmapped */ > #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT) > PAGE_EXT_YOUNG, > PAGE_EXT_IDLE, > @@ -44,6 +46,11 @@ enum page_ext_flags { > */ > struct page_ext { > unsigned long flags; > +#ifdef CONFIG_XPFO > + int inited; /* Map counter and lock initialized */ > + atomic_t mapcount; /* Counter for balancing map/unmap requests */ > + spinlock_t maplock; /* Lock to serialize map/unmap requests */ > +#endif > }; > > extern void pgdat_page_ext_init(struct pglist_data *pgdat); > diff --git a/include/linux/xpfo.h b/include/linux/xpfo.h > new file mode 100644 > index 000000000000..77187578ca33 > --- /dev/null > +++ b/include/linux/xpfo.h > @@ -0,0 +1,39 @@ > +/* > + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P. > + * Copyright (C) 2016 Brown University. All rights reserved. > + * > + * Authors: > + * Juerg Haefliger <juerg.haefliger@hpe.com> > + * Vasileios P. Kemerlis <vpk@cs.brown.edu> > + * > + * This program is free software; you can redistribute it and/or modify it > + * under the terms of the GNU General Public License version 2 as published by > + * the Free Software Foundation. 
> + */ > + > +#ifndef _LINUX_XPFO_H > +#define _LINUX_XPFO_H > + > +#ifdef CONFIG_XPFO > + > +extern struct page_ext_operations page_xpfo_ops; > + > +extern void xpfo_kmap(void *kaddr, struct page *page); > +extern void xpfo_kunmap(void *kaddr, struct page *page); > +extern void xpfo_alloc_page(struct page *page, int order, gfp_t gfp); > +extern void xpfo_free_page(struct page *page, int order); > + > +extern bool xpfo_page_is_unmapped(struct page *page); > + > +#else /* !CONFIG_XPFO */ > + > +static inline void xpfo_kmap(void *kaddr, struct page *page) { } > +static inline void xpfo_kunmap(void *kaddr, struct page *page) { } > +static inline void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) { } > +static inline void xpfo_free_page(struct page *page, int order) { } > + > +static inline bool xpfo_page_is_unmapped(struct page *page) { return false; } > + > +#endif /* CONFIG_XPFO */ > + > +#endif /* _LINUX_XPFO_H */ > diff --git a/lib/swiotlb.c b/lib/swiotlb.c > index 22e13a0e19d7..455eff44604e 100644 > --- a/lib/swiotlb.c > +++ b/lib/swiotlb.c > @@ -390,8 +390,9 @@ static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr, > { > unsigned long pfn = PFN_DOWN(orig_addr); > unsigned char *vaddr = phys_to_virt(tlb_addr); > + struct page *page = pfn_to_page(pfn); > > - if (PageHighMem(pfn_to_page(pfn))) { > + if (PageHighMem(page) || xpfo_page_is_unmapped(page)) { > /* The buffer does not have a mapping. 
Map it in and copy */ > unsigned int offset = orig_addr & ~PAGE_MASK; > char *buffer; > diff --git a/mm/Makefile b/mm/Makefile > index 295bd7a9f76b..175680f516aa 100644 > --- a/mm/Makefile > +++ b/mm/Makefile > @@ -100,3 +100,4 @@ obj-$(CONFIG_IDLE_PAGE_TRACKING) += page_idle.o > obj-$(CONFIG_FRAME_VECTOR) += frame_vector.o > obj-$(CONFIG_DEBUG_PAGE_REF) += debug_page_ref.o > obj-$(CONFIG_HARDENED_USERCOPY) += usercopy.o > +obj-$(CONFIG_XPFO) += xpfo.o > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 8fd42aa7c4bd..100e80e008e2 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -1045,6 +1045,7 @@ static __always_inline bool free_pages_prepare(struct page *page, > kernel_poison_pages(page, 1 << order, 0); > kernel_map_pages(page, 1 << order, 0); > kasan_free_pages(page, order); > + xpfo_free_page(page, order); > > return true; > } > @@ -1745,6 +1746,7 @@ inline void post_alloc_hook(struct page *page, unsigned int order, > kernel_map_pages(page, 1 << order, 1); > kernel_poison_pages(page, 1 << order, 1); > kasan_alloc_pages(page, order); > + xpfo_alloc_page(page, order, gfp_flags); > set_page_owner(page, order, gfp_flags); > } > > diff --git a/mm/page_ext.c b/mm/page_ext.c > index 121dcffc4ec1..ba6dbcacc2db 100644 > --- a/mm/page_ext.c > +++ b/mm/page_ext.c > @@ -7,6 +7,7 @@ > #include <linux/kmemleak.h> > #include <linux/page_owner.h> > #include <linux/page_idle.h> > +#include <linux/xpfo.h> > > /* > * struct page extension > @@ -68,6 +69,9 @@ static struct page_ext_operations *page_ext_ops[] = { > #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT) > &page_idle_ops, > #endif > +#ifdef CONFIG_XPFO > + &page_xpfo_ops, > +#endif > }; > > static unsigned long total_usage; > diff --git a/mm/xpfo.c b/mm/xpfo.c > new file mode 100644 > index 000000000000..8e3a6a694b6a > --- /dev/null > +++ b/mm/xpfo.c > @@ -0,0 +1,206 @@ > +/* > + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P. > + * Copyright (C) 2016 Brown University. 
All rights reserved. > + * > + * Authors: > + * Juerg Haefliger <juerg.haefliger@hpe.com> > + * Vasileios P. Kemerlis <vpk@cs.brown.edu> > + * > + * This program is free software; you can redistribute it and/or modify it > + * under the terms of the GNU General Public License version 2 as published by > + * the Free Software Foundation. > + */ > + > +#include <linux/mm.h> > +#include <linux/module.h> > +#include <linux/page_ext.h> > +#include <linux/xpfo.h> > + > +#include <asm/tlbflush.h> > + > +DEFINE_STATIC_KEY_FALSE(xpfo_inited); > + > +static bool need_xpfo(void) > +{ > + return true; > +} > + > +static void init_xpfo(void) > +{ > + printk(KERN_INFO "XPFO enabled\n"); > + static_branch_enable(&xpfo_inited); > +} > + > +struct page_ext_operations page_xpfo_ops = { > + .need = need_xpfo, > + .init = init_xpfo, > +}; > + > +/* > + * Update a single kernel page table entry > + */ > +static inline void set_kpte(struct page *page, unsigned long kaddr, > + pgprot_t prot) { > + unsigned int level; > + pte_t *kpte = lookup_address(kaddr, &level); > + > + /* We only support 4k pages for now */ > + BUG_ON(!kpte || level != PG_LEVEL_4K); > + > + set_pte_atomic(kpte, pfn_pte(page_to_pfn(page), canon_pgprot(prot))); > +} > + > +void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) > +{ > + int i, flush_tlb = 0; > + struct page_ext *page_ext; > + unsigned long kaddr; > + > + if (!static_branch_unlikely(&xpfo_inited)) > + return; > + > + for (i = 0; i < (1 << order); i++) { > + page_ext = lookup_page_ext(page + i); > + > + BUG_ON(test_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)); > + > + /* Initialize the map lock and map counter */ > + if (!page_ext->inited) { > + spin_lock_init(&page_ext->maplock); > + atomic_set(&page_ext->mapcount, 0); > + page_ext->inited = 1; > + } > + BUG_ON(atomic_read(&page_ext->mapcount)); > + > + if ((gfp & GFP_HIGHUSER) == GFP_HIGHUSER) { > + /* > + * Flush the TLB if the page was previously allocated > + * to the kernel. 
> + */ > + if (test_and_clear_bit(PAGE_EXT_XPFO_KERNEL, > + &page_ext->flags)) > + flush_tlb = 1; > + } else { > + /* Tag the page as a kernel page */ > + set_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags); > + } > + } > + > + if (flush_tlb) { > + kaddr = (unsigned long)page_address(page); > + flush_tlb_kernel_range(kaddr, kaddr + (1 << order) * > + PAGE_SIZE); > + } > +} > + > +void xpfo_free_page(struct page *page, int order) > +{ > + int i; > + struct page_ext *page_ext; > + unsigned long kaddr; > + > + if (!static_branch_unlikely(&xpfo_inited)) > + return; > + > + for (i = 0; i < (1 << order); i++) { > + page_ext = lookup_page_ext(page + i); > + > + if (!page_ext->inited) { > + /* > + * The page was allocated before page_ext was > + * initialized, so it is a kernel page and it needs to > + * be tagged accordingly. > + */ > + set_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags); > + continue; > + } > + > + /* > + * Map the page back into the kernel if it was previously > + * allocated to user space. > + */ > + if (test_and_clear_bit(PAGE_EXT_XPFO_UNMAPPED, > + &page_ext->flags)) { > + kaddr = (unsigned long)page_address(page + i); > + set_kpte(page + i, kaddr, __pgprot(__PAGE_KERNEL)); > + } > + } > +} > + > +void xpfo_kmap(void *kaddr, struct page *page) > +{ > + struct page_ext *page_ext; > + unsigned long flags; > + > + if (!static_branch_unlikely(&xpfo_inited)) > + return; > + > + page_ext = lookup_page_ext(page); > + > + /* > + * The page was allocated before page_ext was initialized (which means > + * it's a kernel page) or it's allocated to the kernel, so nothing to > + * do. > + */ > + if (!page_ext->inited || > + test_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags)) > + return; > + > + spin_lock_irqsave(&page_ext->maplock, flags); > + > + /* > + * The page was previously allocated to user space, so map it back > + * into the kernel. No TLB flush required. 
> + */ > + if ((atomic_inc_return(&page_ext->mapcount) == 1) && > + test_and_clear_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)) > + set_kpte(page, (unsigned long)kaddr, __pgprot(__PAGE_KERNEL)); > + > + spin_unlock_irqrestore(&page_ext->maplock, flags); > +} > +EXPORT_SYMBOL(xpfo_kmap); > + > +void xpfo_kunmap(void *kaddr, struct page *page) > +{ > + struct page_ext *page_ext; > + unsigned long flags; > + > + if (!static_branch_unlikely(&xpfo_inited)) > + return; > + > + page_ext = lookup_page_ext(page); > + > + /* > + * The page was allocated before page_ext was initialized (which means > + * it's a kernel page) or it's allocated to the kernel, so nothing to > + * do. > + */ > + if (!page_ext->inited || > + test_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags)) > + return; > + > + spin_lock_irqsave(&page_ext->maplock, flags); > + > + /* > + * The page is to be allocated back to user space, so unmap it from the > + * kernel, flush the TLB and tag it as a user page. > + */ > + if (atomic_dec_return(&page_ext->mapcount) == 0) { > + BUG_ON(test_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)); > + set_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags); > + set_kpte(page, (unsigned long)kaddr, __pgprot(0)); > + __flush_tlb_one((unsigned long)kaddr); > + } > + > + spin_unlock_irqrestore(&page_ext->maplock, flags); > +} > +EXPORT_SYMBOL(xpfo_kunmap); > + > +inline bool xpfo_page_is_unmapped(struct page *page) > +{ > + if (!static_branch_unlikely(&xpfo_inited)) > + return false; > + > + return test_bit(PAGE_EXT_XPFO_UNMAPPED, &lookup_page_ext(page)->flags); > +} > +EXPORT_SYMBOL(xpfo_page_is_unmapped); > diff --git a/security/Kconfig b/security/Kconfig > index 118f4549404e..4502e15c8419 100644 > --- a/security/Kconfig > +++ b/security/Kconfig > @@ -6,6 +6,25 @@ menu "Security options" > > source security/keys/Kconfig > > +config ARCH_SUPPORTS_XPFO > + bool Can you include a "help" section here to describe what requirements an architecture needs to support XPFO? 
See HAVE_ARCH_SECCOMP_FILTER and HAVE_ARCH_VMAP_STACK for some examples. > +config XPFO > + bool "Enable eXclusive Page Frame Ownership (XPFO)" > + default n > + depends on ARCH_SUPPORTS_XPFO > + select PAGE_EXTENSION > + help > + This option offers protection against 'ret2dir' kernel attacks. > + When enabled, every time a page frame is allocated to user space, it > + is unmapped from the direct mapped RAM region in kernel space > + (physmap). Similarly, when a page frame is freed/reclaimed, it is > + mapped back to physmap. > + > + There is a slight performance impact when this option is enabled. > + > + If in doubt, say "N". > + > config SECURITY_DMESG_RESTRICT > bool "Restrict unprivileged access to the kernel syslog" > default n > -- > 2.10.1 > I've added these patches to my kspp tree on kernel.org, so it should get some 0-day testing now... Thanks! -Kees -- Kees Cook Nexus Security ^ permalink raw reply [flat|nested] 30+ messages in thread
* [kernel-hardening] Re: [RFC PATCH v3 1/2] Add support for eXclusive Page Frame Ownership (XPFO) 2016-11-10 19:24 ` Kees Cook @ 2016-11-15 11:18 ` Juerg Haefliger 0 siblings, 0 replies; 30+ messages in thread From: Juerg Haefliger @ 2016-11-15 11:18 UTC (permalink / raw) To: Kees Cook Cc: LKML, Linux-MM, kernel-hardening@lists.openwall.com, linux-x86_64, vpk [-- Attachment #1.1: Type: text/plain, Size: 2873 bytes --] On 11/10/2016 08:24 PM, Kees Cook wrote: > On Fri, Nov 4, 2016 at 7:45 AM, Juerg Haefliger <juerg.haefliger@hpe.com> wrote: >> This patch adds support for XPFO which protects against 'ret2dir' kernel >> attacks. The basic idea is to enforce exclusive ownership of page frames >> by either the kernel or userspace, unless explicitly requested by the >> kernel. Whenever a page destined for userspace is allocated, it is >> unmapped from physmap (the kernel's page table). When such a page is >> reclaimed from userspace, it is mapped back to physmap. >> >> Additional fields in the page_ext struct are used for XPFO housekeeping. >> Specifically two flags to distinguish user vs. kernel pages and to tag >> unmapped pages and a reference counter to balance kmap/kunmap operations >> and a lock to serialize access to the XPFO fields. >> >> Known issues/limitations: >> - Only supports x86-64 (for now) >> - Only supports 4k pages (for now) >> - There are most likely some legitimate use cases where the kernel needs >> to access userspace which need to be made XPFO-aware >> - Performance penalty >> >> Reference paper by the original patch authors: >> http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf > > Would it be possible to create an lkdtm test that can exercise this protection? I'll look into it. 
>> diff --git a/security/Kconfig b/security/Kconfig >> index 118f4549404e..4502e15c8419 100644 >> --- a/security/Kconfig >> +++ b/security/Kconfig >> @@ -6,6 +6,25 @@ menu "Security options" >> >> source security/keys/Kconfig >> >> +config ARCH_SUPPORTS_XPFO >> + bool > > Can you include a "help" section here to describe what requirements an > architecture needs to support XPFO? See HAVE_ARCH_SECCOMP_FILTER and > HAVE_ARCH_VMAP_STACK for some examples. Will do. >> +config XPFO >> + bool "Enable eXclusive Page Frame Ownership (XPFO)" >> + default n >> + depends on ARCH_SUPPORTS_XPFO >> + select PAGE_EXTENSION >> + help >> + This option offers protection against 'ret2dir' kernel attacks. >> + When enabled, every time a page frame is allocated to user space, it >> + is unmapped from the direct mapped RAM region in kernel space >> + (physmap). Similarly, when a page frame is freed/reclaimed, it is >> + mapped back to physmap. >> + >> + There is a slight performance impact when this option is enabled. >> + >> + If in doubt, say "N". >> + >> config SECURITY_DMESG_RESTRICT >> bool "Restrict unprivileged access to the kernel syslog" >> default n > > I've added these patches to my kspp tree on kernel.org, so it should > get some 0-day testing now... Very good. Thanks! > Thanks! Appreciate the feedback. ...Juerg > -Kees > [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 801 bytes --] ^ permalink raw reply [flat|nested] 30+ messages in thread
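The Kconfig help text quoted above describes the allocate/free behaviour in one sentence; the sketch below models, in plain userspace C, how xpfo_alloc_page() and xpfo_free_page() in the quoted patch tag a single page. It is only an illustration under stated assumptions: struct xpfo_page_model, model_alloc_page() and model_free_page() are hypothetical names, the for_user argument stands in for the patch's (gfp & GFP_HIGHUSER) == GFP_HIGHUSER test, and the return values merely report where the patch would flush the TLB or call set_kpte().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-page state, mirroring the fields the patch adds to page_ext. */
struct xpfo_page_model {
	bool inited;		/* models page_ext->inited */
	bool kernel_page;	/* models PAGE_EXT_XPFO_KERNEL */
	bool unmapped;		/* models PAGE_EXT_XPFO_UNMAPPED */
	int mapcount;		/* models page_ext->mapcount */
};

/*
 * Models xpfo_alloc_page() for one page. Returns true when the patch
 * would request a TLB flush, i.e. a former kernel page goes to user space.
 */
static bool model_alloc_page(struct xpfo_page_model *p, bool for_user)
{
	bool flush_tlb = false;

	assert(!p->unmapped);		/* mirrors the first BUG_ON() */
	if (!p->inited) {
		p->mapcount = 0;
		p->inited = true;
	}
	assert(p->mapcount == 0);	/* mirrors the second BUG_ON() */

	if (for_user) {
		if (p->kernel_page) {
			p->kernel_page = false;
			flush_tlb = true;
		}
	} else {
		p->kernel_page = true;	/* tag the page as a kernel page */
	}
	return flush_tlb;
}

/*
 * Models xpfo_free_page() for one page. Returns true when the patch
 * would map the page back into physmap via set_kpte().
 */
static bool model_free_page(struct xpfo_page_model *p)
{
	if (!p->inited) {
		/* Allocated before page_ext was ready: treat as a kernel page. */
		p->kernel_page = true;
		return false;
	}
	if (p->unmapped) {
		p->unmapped = false;
		return true;
	}
	return false;
}
```

Note that in this reading of the patch the allocation path only retags the page; the physmap entry itself is removed later, when the last xpfo_kunmap() on a user page drops the map count to zero.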
* [kernel-hardening] Re: [RFC PATCH v3 1/2] Add support for eXclusive Page Frame Ownership (XPFO) 2016-11-04 14:45 ` [kernel-hardening] [RFC PATCH v3 1/2] " Juerg Haefliger ` (3 preceding siblings ...) 2016-11-10 19:24 ` Kees Cook @ 2016-11-24 10:56 ` AKASHI Takahiro 2016-11-28 11:15 ` Juerg Haefliger 2016-12-09 9:02 ` AKASHI Takahiro 4 siblings, 2 replies; 30+ messages in thread From: AKASHI Takahiro @ 2016-11-24 10:56 UTC (permalink / raw) To: Juerg Haefliger Cc: linux-kernel, linux-mm, kernel-hardening, linux-x86_64, vpk Hi, I'm trying to give it a spin on arm64, but ... On Fri, Nov 04, 2016 at 03:45:33PM +0100, Juerg Haefliger wrote: > This patch adds support for XPFO which protects against 'ret2dir' kernel > attacks. The basic idea is to enforce exclusive ownership of page frames > by either the kernel or userspace, unless explicitly requested by the > kernel. Whenever a page destined for userspace is allocated, it is > unmapped from physmap (the kernel's page table). When such a page is > reclaimed from userspace, it is mapped back to physmap. > > Additional fields in the page_ext struct are used for XPFO housekeeping. > Specifically two flags to distinguish user vs. kernel pages and to tag > unmapped pages and a reference counter to balance kmap/kunmap operations > and a lock to serialize access to the XPFO fields. > > Known issues/limitations: > - Only supports x86-64 (for now) > - Only supports 4k pages (for now) > - There are most likely some legitimate use cases where the kernel needs > to access userspace which need to be made XPFO-aware > - Performance penalty > > Reference paper by the original patch authors: > http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf > > Suggested-by: Vasileios P. 
Kemerlis <vpk@cs.columbia.edu> > Signed-off-by: Juerg Haefliger <juerg.haefliger@hpe.com> > --- > arch/x86/Kconfig | 3 +- > arch/x86/mm/init.c | 2 +- > drivers/ata/libata-sff.c | 4 +- > include/linux/highmem.h | 15 +++- > include/linux/page_ext.h | 7 ++ > include/linux/xpfo.h | 39 +++++++++ > lib/swiotlb.c | 3 +- > mm/Makefile | 1 + > mm/page_alloc.c | 2 + > mm/page_ext.c | 4 + > mm/xpfo.c | 206 +++++++++++++++++++++++++++++++++++++++++++++++ > security/Kconfig | 19 +++++ > 12 files changed, 298 insertions(+), 7 deletions(-) > create mode 100644 include/linux/xpfo.h > create mode 100644 mm/xpfo.c > > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig > index bada636d1065..38b334f8fde5 100644 > --- a/arch/x86/Kconfig > +++ b/arch/x86/Kconfig > @@ -165,6 +165,7 @@ config X86 > select HAVE_STACK_VALIDATION if X86_64 > select ARCH_USES_HIGH_VMA_FLAGS if X86_INTEL_MEMORY_PROTECTION_KEYS > select ARCH_HAS_PKEYS if X86_INTEL_MEMORY_PROTECTION_KEYS > + select ARCH_SUPPORTS_XPFO if X86_64 > > config INSTRUCTION_DECODER > def_bool y > @@ -1361,7 +1362,7 @@ config ARCH_DMA_ADDR_T_64BIT > > config X86_DIRECT_GBPAGES > def_bool y > - depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK > + depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK && !XPFO > ---help--- > Certain kernel features effectively disable kernel > linear 1 GB mappings (even if the CPU otherwise > diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c > index 22af912d66d2..a6fafbae02bb 100644 > --- a/arch/x86/mm/init.c > +++ b/arch/x86/mm/init.c > @@ -161,7 +161,7 @@ static int page_size_mask; > > static void __init probe_page_size_mask(void) > { > -#if !defined(CONFIG_KMEMCHECK) > +#if !defined(CONFIG_KMEMCHECK) && !defined(CONFIG_XPFO) > /* > * For CONFIG_KMEMCHECK or pagealloc debugging, identity mapping will > * use small pages. 
> diff --git a/drivers/ata/libata-sff.c b/drivers/ata/libata-sff.c > index 051b6158d1b7..58af734be25d 100644 > --- a/drivers/ata/libata-sff.c > +++ b/drivers/ata/libata-sff.c > @@ -715,7 +715,7 @@ static void ata_pio_sector(struct ata_queued_cmd *qc) > > DPRINTK("data %s\n", qc->tf.flags & ATA_TFLAG_WRITE ? "write" : "read"); > > - if (PageHighMem(page)) { > + if (PageHighMem(page) || xpfo_page_is_unmapped(page)) { > unsigned long flags; > > /* FIXME: use a bounce buffer */ > @@ -860,7 +860,7 @@ static int __atapi_pio_bytes(struct ata_queued_cmd *qc, unsigned int bytes) > > DPRINTK("data %s\n", qc->tf.flags & ATA_TFLAG_WRITE ? "write" : "read"); > > - if (PageHighMem(page)) { > + if (PageHighMem(page) || xpfo_page_is_unmapped(page)) { > unsigned long flags; > > /* FIXME: use bounce buffer */ > diff --git a/include/linux/highmem.h b/include/linux/highmem.h > index bb3f3297062a..7a17c166532f 100644 > --- a/include/linux/highmem.h > +++ b/include/linux/highmem.h > @@ -7,6 +7,7 @@ > #include <linux/mm.h> > #include <linux/uaccess.h> > #include <linux/hardirq.h> > +#include <linux/xpfo.h> > > #include <asm/cacheflush.h> > > @@ -55,24 +56,34 @@ static inline struct page *kmap_to_page(void *addr) > #ifndef ARCH_HAS_KMAP > static inline void *kmap(struct page *page) > { > + void *kaddr; > + > might_sleep(); > - return page_address(page); > + kaddr = page_address(page); > + xpfo_kmap(kaddr, page); > + return kaddr; > } > > static inline void kunmap(struct page *page) > { > + xpfo_kunmap(page_address(page), page); > } > > static inline void *kmap_atomic(struct page *page) > { > + void *kaddr; > + > preempt_disable(); > pagefault_disable(); > - return page_address(page); > + kaddr = page_address(page); > + xpfo_kmap(kaddr, page); > + return kaddr; > } > #define kmap_atomic_prot(page, prot) kmap_atomic(page) > > static inline void __kunmap_atomic(void *addr) > { > + xpfo_kunmap(addr, virt_to_page(addr)); > pagefault_enable(); > preempt_enable(); > } > diff --git 
a/include/linux/page_ext.h b/include/linux/page_ext.h > index 9298c393ddaa..0e451a42e5a3 100644 > --- a/include/linux/page_ext.h > +++ b/include/linux/page_ext.h > @@ -29,6 +29,8 @@ enum page_ext_flags { > PAGE_EXT_DEBUG_POISON, /* Page is poisoned */ > PAGE_EXT_DEBUG_GUARD, > PAGE_EXT_OWNER, > + PAGE_EXT_XPFO_KERNEL, /* Page is a kernel page */ > + PAGE_EXT_XPFO_UNMAPPED, /* Page is unmapped */ > #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT) > PAGE_EXT_YOUNG, > PAGE_EXT_IDLE, > @@ -44,6 +46,11 @@ enum page_ext_flags { > */ > struct page_ext { > unsigned long flags; > +#ifdef CONFIG_XPFO > + int inited; /* Map counter and lock initialized */ > + atomic_t mapcount; /* Counter for balancing map/unmap requests */ > + spinlock_t maplock; /* Lock to serialize map/unmap requests */ > +#endif > }; > > extern void pgdat_page_ext_init(struct pglist_data *pgdat); > diff --git a/include/linux/xpfo.h b/include/linux/xpfo.h > new file mode 100644 > index 000000000000..77187578ca33 > --- /dev/null > +++ b/include/linux/xpfo.h > @@ -0,0 +1,39 @@ > +/* > + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P. > + * Copyright (C) 2016 Brown University. All rights reserved. > + * > + * Authors: > + * Juerg Haefliger <juerg.haefliger@hpe.com> > + * Vasileios P. Kemerlis <vpk@cs.brown.edu> > + * > + * This program is free software; you can redistribute it and/or modify it > + * under the terms of the GNU General Public License version 2 as published by > + * the Free Software Foundation. 
> + */ > + > +#ifndef _LINUX_XPFO_H > +#define _LINUX_XPFO_H > + > +#ifdef CONFIG_XPFO > + > +extern struct page_ext_operations page_xpfo_ops; > + > +extern void xpfo_kmap(void *kaddr, struct page *page); > +extern void xpfo_kunmap(void *kaddr, struct page *page); > +extern void xpfo_alloc_page(struct page *page, int order, gfp_t gfp); > +extern void xpfo_free_page(struct page *page, int order); > + > +extern bool xpfo_page_is_unmapped(struct page *page); > + > +#else /* !CONFIG_XPFO */ > + > +static inline void xpfo_kmap(void *kaddr, struct page *page) { } > +static inline void xpfo_kunmap(void *kaddr, struct page *page) { } > +static inline void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) { } > +static inline void xpfo_free_page(struct page *page, int order) { } > + > +static inline bool xpfo_page_is_unmapped(struct page *page) { return false; } > + > +#endif /* CONFIG_XPFO */ > + > +#endif /* _LINUX_XPFO_H */ > diff --git a/lib/swiotlb.c b/lib/swiotlb.c > index 22e13a0e19d7..455eff44604e 100644 > --- a/lib/swiotlb.c > +++ b/lib/swiotlb.c > @@ -390,8 +390,9 @@ static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr, > { > unsigned long pfn = PFN_DOWN(orig_addr); > unsigned char *vaddr = phys_to_virt(tlb_addr); > + struct page *page = pfn_to_page(pfn); > > - if (PageHighMem(pfn_to_page(pfn))) { > + if (PageHighMem(page) || xpfo_page_is_unmapped(page)) { > /* The buffer does not have a mapping. 
Map it in and copy */ > unsigned int offset = orig_addr & ~PAGE_MASK; > char *buffer; > diff --git a/mm/Makefile b/mm/Makefile > index 295bd7a9f76b..175680f516aa 100644 > --- a/mm/Makefile > +++ b/mm/Makefile > @@ -100,3 +100,4 @@ obj-$(CONFIG_IDLE_PAGE_TRACKING) += page_idle.o > obj-$(CONFIG_FRAME_VECTOR) += frame_vector.o > obj-$(CONFIG_DEBUG_PAGE_REF) += debug_page_ref.o > obj-$(CONFIG_HARDENED_USERCOPY) += usercopy.o > +obj-$(CONFIG_XPFO) += xpfo.o > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 8fd42aa7c4bd..100e80e008e2 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -1045,6 +1045,7 @@ static __always_inline bool free_pages_prepare(struct page *page, > kernel_poison_pages(page, 1 << order, 0); > kernel_map_pages(page, 1 << order, 0); > kasan_free_pages(page, order); > + xpfo_free_page(page, order); > > return true; > } > @@ -1745,6 +1746,7 @@ inline void post_alloc_hook(struct page *page, unsigned int order, > kernel_map_pages(page, 1 << order, 1); > kernel_poison_pages(page, 1 << order, 1); > kasan_alloc_pages(page, order); > + xpfo_alloc_page(page, order, gfp_flags); > set_page_owner(page, order, gfp_flags); > } > > diff --git a/mm/page_ext.c b/mm/page_ext.c > index 121dcffc4ec1..ba6dbcacc2db 100644 > --- a/mm/page_ext.c > +++ b/mm/page_ext.c > @@ -7,6 +7,7 @@ > #include <linux/kmemleak.h> > #include <linux/page_owner.h> > #include <linux/page_idle.h> > +#include <linux/xpfo.h> > > /* > * struct page extension > @@ -68,6 +69,9 @@ static struct page_ext_operations *page_ext_ops[] = { > #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT) > &page_idle_ops, > #endif > +#ifdef CONFIG_XPFO > + &page_xpfo_ops, > +#endif > }; > > static unsigned long total_usage; > diff --git a/mm/xpfo.c b/mm/xpfo.c > new file mode 100644 > index 000000000000..8e3a6a694b6a > --- /dev/null > +++ b/mm/xpfo.c > @@ -0,0 +1,206 @@ > +/* > + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P. > + * Copyright (C) 2016 Brown University. 
All rights reserved. > + * > + * Authors: > + * Juerg Haefliger <juerg.haefliger@hpe.com> > + * Vasileios P. Kemerlis <vpk@cs.brown.edu> > + * > + * This program is free software; you can redistribute it and/or modify it > + * under the terms of the GNU General Public License version 2 as published by > + * the Free Software Foundation. > + */ > + > +#include <linux/mm.h> > +#include <linux/module.h> > +#include <linux/page_ext.h> > +#include <linux/xpfo.h> > + > +#include <asm/tlbflush.h> > + > +DEFINE_STATIC_KEY_FALSE(xpfo_inited); > + > +static bool need_xpfo(void) > +{ > + return true; > +} > + > +static void init_xpfo(void) > +{ > + printk(KERN_INFO "XPFO enabled\n"); > + static_branch_enable(&xpfo_inited); > +} > + > +struct page_ext_operations page_xpfo_ops = { > + .need = need_xpfo, > + .init = init_xpfo, > +}; > + > +/* > + * Update a single kernel page table entry > + */ > +static inline void set_kpte(struct page *page, unsigned long kaddr, > + pgprot_t prot) { > + unsigned int level; > + pte_t *kpte = lookup_address(kaddr, &level); > + > + /* We only support 4k pages for now */ > + BUG_ON(!kpte || level != PG_LEVEL_4K); > + > + set_pte_atomic(kpte, pfn_pte(page_to_pfn(page), canon_pgprot(prot))); > +} As lookup_address() and set_pte_atomic() (and PG_LEVEL_4K), are arch-specific, would it be better to put the whole definition into arch-specific part? 
> + > +void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) > +{ > + int i, flush_tlb = 0; > + struct page_ext *page_ext; > + unsigned long kaddr; > + > + if (!static_branch_unlikely(&xpfo_inited)) > + return; > + > + for (i = 0; i < (1 << order); i++) { > + page_ext = lookup_page_ext(page + i); > + > + BUG_ON(test_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)); > + > + /* Initialize the map lock and map counter */ > + if (!page_ext->inited) { > + spin_lock_init(&page_ext->maplock); > + atomic_set(&page_ext->mapcount, 0); > + page_ext->inited = 1; > + } > + BUG_ON(atomic_read(&page_ext->mapcount)); > + > + if ((gfp & GFP_HIGHUSER) == GFP_HIGHUSER) { > + /* > + * Flush the TLB if the page was previously allocated > + * to the kernel. > + */ > + if (test_and_clear_bit(PAGE_EXT_XPFO_KERNEL, > + &page_ext->flags)) > + flush_tlb = 1; > + } else { > + /* Tag the page as a kernel page */ > + set_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags); > + } > + } > + > + if (flush_tlb) { > + kaddr = (unsigned long)page_address(page); > + flush_tlb_kernel_range(kaddr, kaddr + (1 << order) * > + PAGE_SIZE); > + } > +} > + > +void xpfo_free_page(struct page *page, int order) > +{ > + int i; > + struct page_ext *page_ext; > + unsigned long kaddr; > + > + if (!static_branch_unlikely(&xpfo_inited)) > + return; > + > + for (i = 0; i < (1 << order); i++) { > + page_ext = lookup_page_ext(page + i); > + > + if (!page_ext->inited) { > + /* > + * The page was allocated before page_ext was > + * initialized, so it is a kernel page and it needs to > + * be tagged accordingly. > + */ > + set_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags); > + continue; > + } > + > + /* > + * Map the page back into the kernel if it was previously > + * allocated to user space. > + */ > + if (test_and_clear_bit(PAGE_EXT_XPFO_UNMAPPED, > + &page_ext->flags)) { > + kaddr = (unsigned long)page_address(page + i); > + set_kpte(page + i, kaddr, __pgprot(__PAGE_KERNEL)); Why not PAGE_KERNEL? 
> + } > + } > +} > + > +void xpfo_kmap(void *kaddr, struct page *page) > +{ > + struct page_ext *page_ext; > + unsigned long flags; > + > + if (!static_branch_unlikely(&xpfo_inited)) > + return; > + > + page_ext = lookup_page_ext(page); > + > + /* > + * The page was allocated before page_ext was initialized (which means > + * it's a kernel page) or it's allocated to the kernel, so nothing to > + * do. > + */ > + if (!page_ext->inited || > + test_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags)) > + return; > + > + spin_lock_irqsave(&page_ext->maplock, flags); > + > + /* > + * The page was previously allocated to user space, so map it back > + * into the kernel. No TLB flush required. > + */ > + if ((atomic_inc_return(&page_ext->mapcount) == 1) && > + test_and_clear_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)) > + set_kpte(page, (unsigned long)kaddr, __pgprot(__PAGE_KERNEL)); > + > + spin_unlock_irqrestore(&page_ext->maplock, flags); > +} > +EXPORT_SYMBOL(xpfo_kmap); > + > +void xpfo_kunmap(void *kaddr, struct page *page) > +{ > + struct page_ext *page_ext; > + unsigned long flags; > + > + if (!static_branch_unlikely(&xpfo_inited)) > + return; > + > + page_ext = lookup_page_ext(page); > + > + /* > + * The page was allocated before page_ext was initialized (which means > + * it's a kernel page) or it's allocated to the kernel, so nothing to > + * do. > + */ > + if (!page_ext->inited || > + test_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags)) > + return; > + > + spin_lock_irqsave(&page_ext->maplock, flags); > + > + /* > + * The page is to be allocated back to user space, so unmap it from the > + * kernel, flush the TLB and tag it as a user page. > + */ > + if (atomic_dec_return(&page_ext->mapcount) == 0) { > + BUG_ON(test_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)); > + set_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags); > + set_kpte(page, (unsigned long)kaddr, __pgprot(0)); > + __flush_tlb_one((unsigned long)kaddr); Again __flush_tlb_one() is x86-specific. 
flush_tlb_kernel_range() instead? Thanks, -Takahiro AKASHI > + } > + > + spin_unlock_irqrestore(&page_ext->maplock, flags); > +} > +EXPORT_SYMBOL(xpfo_kunmap); > + > +inline bool xpfo_page_is_unmapped(struct page *page) > +{ > + if (!static_branch_unlikely(&xpfo_inited)) > + return false; > + > + return test_bit(PAGE_EXT_XPFO_UNMAPPED, &lookup_page_ext(page)->flags); > +} > +EXPORT_SYMBOL(xpfo_page_is_unmapped); > diff --git a/security/Kconfig b/security/Kconfig > index 118f4549404e..4502e15c8419 100644 > --- a/security/Kconfig > +++ b/security/Kconfig > @@ -6,6 +6,25 @@ menu "Security options" > > source security/keys/Kconfig > > +config ARCH_SUPPORTS_XPFO > + bool > + > +config XPFO > + bool "Enable eXclusive Page Frame Ownership (XPFO)" > + default n > + depends on ARCH_SUPPORTS_XPFO > + select PAGE_EXTENSION > + help > + This option offers protection against 'ret2dir' kernel attacks. > + When enabled, every time a page frame is allocated to user space, it > + is unmapped from the direct mapped RAM region in kernel space > + (physmap). Similarly, when a page frame is freed/reclaimed, it is > + mapped back to physmap. > + > + There is a slight performance impact when this option is enabled. > + > + If in doubt, say "N". > + > config SECURITY_DMESG_RESTRICT > bool "Restrict unprivileged access to the kernel syslog" > default n > -- > 2.10.1 > ^ permalink raw reply [flat|nested] 30+ messages in thread
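The map-count balancing in the quoted xpfo_kmap()/xpfo_kunmap() can be easier to follow in isolation. Below is a minimal userspace sketch of that logic, not kernel code and not part of the patch. The names `struct xpfo_model`, `model_kmap()`, `model_kunmap()` and the `pte_present` field are made up for illustration; the spinlock and the `inited` check from the patch are left out on the assumption of a single-threaded model.

```c
/* Userspace model of the XPFO kmap/kunmap balancing; illustrative only. */
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the XPFO state the patch keeps in struct page_ext. */
struct xpfo_model {
	bool kernel;       /* models PAGE_EXT_XPFO_KERNEL */
	bool unmapped;     /* models PAGE_EXT_XPFO_UNMAPPED */
	int mapcount;      /* models page_ext->mapcount */
	bool pte_present;  /* models whether the physmap PTE is present */
};

/* Models xpfo_kmap(): the first mapper of an unmapped user page restores the PTE. */
static void model_kmap(struct xpfo_model *p)
{
	if (p->kernel)
		return;                 /* kernel pages are never unmapped */

	if (++p->mapcount == 1 && p->unmapped) {
		p->unmapped = false;
		p->pte_present = true;  /* stands in for set_kpte(..., __PAGE_KERNEL) */
	}
}

/* Models xpfo_kunmap(): the last unmapper of a user page clears the PTE. */
static void model_kunmap(struct xpfo_model *p)
{
	if (p->kernel)
		return;

	if (--p->mapcount == 0) {
		assert(!p->unmapped);    /* mirrors the BUG_ON() in the patch */
		p->unmapped = true;
		p->pte_present = false;  /* stands in for set_kpte(..., 0) plus TLB flush */
	}
}
```

In this model a user page starts out mapped with a map count of zero, stays mapped while any kmap is outstanding, and drops out of the physmap only when the last kunmap brings the count back to zero.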
* [kernel-hardening] Re: [RFC PATCH v3 1/2] Add support for eXclusive Page Frame Ownership (XPFO)
  2016-11-24 10:56 ` AKASHI Takahiro
@ 2016-11-28 11:15 ` Juerg Haefliger
  2016-12-09 9:02 ` AKASHI Takahiro
  1 sibling, 0 replies; 30+ messages in thread
From: Juerg Haefliger @ 2016-11-28 11:15 UTC (permalink / raw)
  To: AKASHI Takahiro, linux-kernel, linux-mm, kernel-hardening, linux-x86_64, vpk

[-- Attachment #1.1: Type: text/plain, Size: 2003 bytes --]

On 11/24/2016 11:56 AM, AKASHI Takahiro wrote:
> Hi,
>
> I'm trying to give it a spin on arm64, but ...

Thanks for trying this.

>> +/*
>> + * Update a single kernel page table entry
>> + */
>> +static inline void set_kpte(struct page *page, unsigned long kaddr,
>> +			    pgprot_t prot) {
>> +	unsigned int level;
>> +	pte_t *kpte = lookup_address(kaddr, &level);
>> +
>> +	/* We only support 4k pages for now */
>> +	BUG_ON(!kpte || level != PG_LEVEL_4K);
>> +
>> +	set_pte_atomic(kpte, pfn_pte(page_to_pfn(page), canon_pgprot(prot)));
>> +}
>
> As lookup_address() and set_pte_atomic() (and PG_LEVEL_4K), are arch-specific,
> would it be better to put the whole definition into arch-specific part?

Well yes but I haven't really looked into splitting up the arch specific stuff.

>> +		/*
>> +		 * Map the page back into the kernel if it was previously
>> +		 * allocated to user space.
>> +		 */
>> +		if (test_and_clear_bit(PAGE_EXT_XPFO_UNMAPPED,
>> +				       &page_ext->flags)) {
>> +			kaddr = (unsigned long)page_address(page + i);
>> +			set_kpte(page + i, kaddr, __pgprot(__PAGE_KERNEL));
>
> Why not PAGE_KERNEL?

Good catch, thanks!

>> +	/*
>> +	 * The page is to be allocated back to user space, so unmap it from the
>> +	 * kernel, flush the TLB and tag it as a user page.
>> +	 */
>> +	if (atomic_dec_return(&page_ext->mapcount) == 0) {
>> +		BUG_ON(test_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags));
>> +		set_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags);
>> +		set_kpte(page, (unsigned long)kaddr, __pgprot(0));
>> +		__flush_tlb_one((unsigned long)kaddr);
>
> Again __flush_tlb_one() is x86-specific.
> flush_tlb_kernel_range() instead?

I'll take a look. If you can tell me what the relevant arm64 equivalents are for the
arch-specific functions, that would help tremendously.

Thanks for the comments!

...Juerg

> Thanks,
> -Takahiro AKASHI

--
Juerg Haefliger
Hewlett Packard Enterprise

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply [flat|nested] 30+ messages in thread
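The exchange above is about separating the generic XPFO logic from the x86-only calls (lookup_address(), set_pte_atomic(), __flush_tlb_one()). As a rough userspace sketch of what such a split could look like, the generic side below only talks to a small table of hooks. Everything here is an assumption for illustration: `struct xpfo_arch_ops`, the `toy_*` functions and `xpfo_generic_map()`/`xpfo_generic_unmap()` do not exist in the patch, and a real kernel tree might instead use plain functions provided under arch/.

```c
/* Userspace sketch of an arch/generic split; all names are hypothetical. */
#include <assert.h>
#include <stdbool.h>

/* Hooks an architecture would have to supply in this model. */
struct xpfo_arch_ops {
	void (*set_kpte)(unsigned long kaddr, bool present);
	void (*flush_tlb_one)(unsigned long kaddr);
};

/* Toy "architecture" that just records what it was asked to do. */
static bool toy_pte_present = true;
static unsigned long toy_last_flushed;

static void toy_set_kpte(unsigned long kaddr, bool present)
{
	(void)kaddr;
	toy_pte_present = present;
}

static void toy_flush_tlb_one(unsigned long kaddr)
{
	toy_last_flushed = kaddr;
}

static const struct xpfo_arch_ops toy_arch_ops = {
	.set_kpte = toy_set_kpte,
	.flush_tlb_one = toy_flush_tlb_one,
};

/* Generic side: clear the mapping, then flush the stale translation. */
static void xpfo_generic_unmap(const struct xpfo_arch_ops *ops,
			       unsigned long kaddr)
{
	ops->set_kpte(kaddr, false);
	ops->flush_tlb_one(kaddr);
}

/* Generic side: restore the mapping; no flush, as in the quoted xpfo_kmap(). */
static void xpfo_generic_map(const struct xpfo_arch_ops *ops,
			     unsigned long kaddr)
{
	ops->set_kpte(kaddr, true);
}
```

The point of the sketch is only that the generic functions contain no architecture-specific call, so another architecture would supply its own pair of hooks.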
* [kernel-hardening] Re: [RFC PATCH v3 1/2] Add support for eXclusive Page Frame Ownership (XPFO)
  2016-11-24 10:56 ` AKASHI Takahiro
  2016-11-28 11:15 ` Juerg Haefliger
@ 2016-12-09 9:02 ` AKASHI Takahiro
  1 sibling, 0 replies; 30+ messages in thread
From: AKASHI Takahiro @ 2016-12-09 9:02 UTC (permalink / raw)
  To: Juerg Haefliger, linux-kernel, linux-mm, kernel-hardening, linux-x86_64, vpk

On Thu, Nov 24, 2016 at 07:56:30PM +0900, AKASHI Takahiro wrote:
> Hi,
>
> I'm trying to give it a spin on arm64, but ...

In my experiment on hikey, the kernel boot failed, catching a page fault
around cache operations,
(a) __clean_dcache_area_pou() on 4KB-page kernel,
(b) __inval_cache_range() on 64KB-page kernel,

(See more details for backtrace below.)

This is because, on arm64, cache operations are by VA (in particular,
of direct/linear mapping of physical memory).
So I think that naively unmapping a page from physmap in xpfo_kunmap()
won't work well on arm64.

-Takahiro AKASHI

case (a)
--------
Unable to handle kernel paging request at virtual address ffff800000cba000
pgd = ffff80003ba8c000
*pgd=0000000000000000
task: ffff80003be38000 task.stack: ffff80003be40000
PC is at __clean_dcache_area_pou+0x20/0x38
LR is at sync_icache_aliases+0x2c/0x40
...
Call trace:
...
__clean_dcache_area_pou+0x20/0x38
__sync_icache_dcache+0x6c/0xa8
alloc_set_pte+0x33c/0x588
filemap_map_pages+0x3a8/0x3b8
handle_mm_fault+0x910/0x1080
do_page_fault+0x2b0/0x358
do_mem_abort+0x44/0xa0
el0_ia+0x18/0x1c

case (b)
--------
Unable to handle kernel paging request at virtual address ffff80002aed0000
pgd = ffff000008f40000
, *pud=000000003dfc0003 , *pmd=000000003dfa0003 , *pte=000000002aed0000
task: ffff800028711900 task.stack: ffff800029020000
PC is at __inval_cache_range+0x3c/0x60
LR is at __swiotlb_map_sg_attrs+0x6c/0x98
...
Call trace:
...
__inval_cache_range+0x3c/0x60
dw_mci_pre_dma_transfer.isra.7+0xfc/0x190
dw_mci_pre_req+0x50/0x60
mmc_start_req+0x4c/0x420
mmc_blk_issue_rw_rq+0xb0/0x9b8
mmc_blk_issue_rq+0x154/0x518
mmc_queue_thread+0xac/0x158
kthread+0xd0/0xe8
ret_from_fork+0x10/0x20

>
> On Fri, Nov 04, 2016 at 03:45:33PM +0100, Juerg Haefliger wrote:
> > This patch adds support for XPFO which protects against 'ret2dir' kernel
> > attacks. The basic idea is to enforce exclusive ownership of page frames
> > by either the kernel or userspace, unless explicitly requested by the
> > kernel. Whenever a page destined for userspace is allocated, it is
> > unmapped from physmap (the kernel's page table). When such a page is
> > reclaimed from userspace, it is mapped back to physmap.
> >
> > Additional fields in the page_ext struct are used for XPFO housekeeping.
> > Specifically two flags to distinguish user vs. kernel pages and to tag
> > unmapped pages and a reference counter to balance kmap/kunmap operations
> > and a lock to serialize access to the XPFO fields.
> >
> > Known issues/limitations:
> > - Only supports x86-64 (for now)
> > - Only supports 4k pages (for now)
> > - There are most likely some legitimate uses cases where the kernel needs
> > to access userspace which need to be made XPFO-aware
> > - Performance penalty
> >
> > Reference paper by the original patch authors:
> > http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf
> >
> > Suggested-by: Vasileios P.
Kemerlis <vpk@cs.columbia.edu> > > Signed-off-by: Juerg Haefliger <juerg.haefliger@hpe.com> > > --- > > arch/x86/Kconfig | 3 +- > > arch/x86/mm/init.c | 2 +- > > drivers/ata/libata-sff.c | 4 +- > > include/linux/highmem.h | 15 +++- > > include/linux/page_ext.h | 7 ++ > > include/linux/xpfo.h | 39 +++++++++ > > lib/swiotlb.c | 3 +- > > mm/Makefile | 1 + > > mm/page_alloc.c | 2 + > > mm/page_ext.c | 4 + > > mm/xpfo.c | 206 +++++++++++++++++++++++++++++++++++++++++++++++ > > security/Kconfig | 19 +++++ > > 12 files changed, 298 insertions(+), 7 deletions(-) > > create mode 100644 include/linux/xpfo.h > > create mode 100644 mm/xpfo.c > > > > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig > > index bada636d1065..38b334f8fde5 100644 > > --- a/arch/x86/Kconfig > > +++ b/arch/x86/Kconfig > > @@ -165,6 +165,7 @@ config X86 > > select HAVE_STACK_VALIDATION if X86_64 > > select ARCH_USES_HIGH_VMA_FLAGS if X86_INTEL_MEMORY_PROTECTION_KEYS > > select ARCH_HAS_PKEYS if X86_INTEL_MEMORY_PROTECTION_KEYS > > + select ARCH_SUPPORTS_XPFO if X86_64 > > > > config INSTRUCTION_DECODER > > def_bool y > > @@ -1361,7 +1362,7 @@ config ARCH_DMA_ADDR_T_64BIT > > > > config X86_DIRECT_GBPAGES > > def_bool y > > - depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK > > + depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK && !XPFO > > ---help--- > > Certain kernel features effectively disable kernel > > linear 1 GB mappings (even if the CPU otherwise > > diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c > > index 22af912d66d2..a6fafbae02bb 100644 > > --- a/arch/x86/mm/init.c > > +++ b/arch/x86/mm/init.c > > @@ -161,7 +161,7 @@ static int page_size_mask; > > > > static void __init probe_page_size_mask(void) > > { > > -#if !defined(CONFIG_KMEMCHECK) > > +#if !defined(CONFIG_KMEMCHECK) && !defined(CONFIG_XPFO) > > /* > > * For CONFIG_KMEMCHECK or pagealloc debugging, identity mapping will > > * use small pages. 
> > diff --git a/drivers/ata/libata-sff.c b/drivers/ata/libata-sff.c > > index 051b6158d1b7..58af734be25d 100644 > > --- a/drivers/ata/libata-sff.c > > +++ b/drivers/ata/libata-sff.c > > @@ -715,7 +715,7 @@ static void ata_pio_sector(struct ata_queued_cmd *qc) > > > > DPRINTK("data %s\n", qc->tf.flags & ATA_TFLAG_WRITE ? "write" : "read"); > > > > - if (PageHighMem(page)) { > > + if (PageHighMem(page) || xpfo_page_is_unmapped(page)) { > > unsigned long flags; > > > > /* FIXME: use a bounce buffer */ > > @@ -860,7 +860,7 @@ static int __atapi_pio_bytes(struct ata_queued_cmd *qc, unsigned int bytes) > > > > DPRINTK("data %s\n", qc->tf.flags & ATA_TFLAG_WRITE ? "write" : "read"); > > > > - if (PageHighMem(page)) { > > + if (PageHighMem(page) || xpfo_page_is_unmapped(page)) { > > unsigned long flags; > > > > /* FIXME: use bounce buffer */ > > diff --git a/include/linux/highmem.h b/include/linux/highmem.h > > index bb3f3297062a..7a17c166532f 100644 > > --- a/include/linux/highmem.h > > +++ b/include/linux/highmem.h > > @@ -7,6 +7,7 @@ > > #include <linux/mm.h> > > #include <linux/uaccess.h> > > #include <linux/hardirq.h> > > +#include <linux/xpfo.h> > > > > #include <asm/cacheflush.h> > > > > @@ -55,24 +56,34 @@ static inline struct page *kmap_to_page(void *addr) > > #ifndef ARCH_HAS_KMAP > > static inline void *kmap(struct page *page) > > { > > + void *kaddr; > > + > > might_sleep(); > > - return page_address(page); > > + kaddr = page_address(page); > > + xpfo_kmap(kaddr, page); > > + return kaddr; > > } > > > > static inline void kunmap(struct page *page) > > { > > + xpfo_kunmap(page_address(page), page); > > } > > > > static inline void *kmap_atomic(struct page *page) > > { > > + void *kaddr; > > + > > preempt_disable(); > > pagefault_disable(); > > - return page_address(page); > > + kaddr = page_address(page); > > + xpfo_kmap(kaddr, page); > > + return kaddr; > > } > > #define kmap_atomic_prot(page, prot) kmap_atomic(page) > > > > static inline void 
__kunmap_atomic(void *addr) > > { > > + xpfo_kunmap(addr, virt_to_page(addr)); > > pagefault_enable(); > > preempt_enable(); > > } > > diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h > > index 9298c393ddaa..0e451a42e5a3 100644 > > --- a/include/linux/page_ext.h > > +++ b/include/linux/page_ext.h > > @@ -29,6 +29,8 @@ enum page_ext_flags { > > PAGE_EXT_DEBUG_POISON, /* Page is poisoned */ > > PAGE_EXT_DEBUG_GUARD, > > PAGE_EXT_OWNER, > > + PAGE_EXT_XPFO_KERNEL, /* Page is a kernel page */ > > + PAGE_EXT_XPFO_UNMAPPED, /* Page is unmapped */ > > #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT) > > PAGE_EXT_YOUNG, > > PAGE_EXT_IDLE, > > @@ -44,6 +46,11 @@ enum page_ext_flags { > > */ > > struct page_ext { > > unsigned long flags; > > +#ifdef CONFIG_XPFO > > + int inited; /* Map counter and lock initialized */ > > + atomic_t mapcount; /* Counter for balancing map/unmap requests */ > > + spinlock_t maplock; /* Lock to serialize map/unmap requests */ > > +#endif > > }; > > > > extern void pgdat_page_ext_init(struct pglist_data *pgdat); > > diff --git a/include/linux/xpfo.h b/include/linux/xpfo.h > > new file mode 100644 > > index 000000000000..77187578ca33 > > --- /dev/null > > +++ b/include/linux/xpfo.h > > @@ -0,0 +1,39 @@ > > +/* > > + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P. > > + * Copyright (C) 2016 Brown University. All rights reserved. > > + * > > + * Authors: > > + * Juerg Haefliger <juerg.haefliger@hpe.com> > > + * Vasileios P. Kemerlis <vpk@cs.brown.edu> > > + * > > + * This program is free software; you can redistribute it and/or modify it > > + * under the terms of the GNU General Public License version 2 as published by > > + * the Free Software Foundation. 
> > + */ > > + > > +#ifndef _LINUX_XPFO_H > > +#define _LINUX_XPFO_H > > + > > +#ifdef CONFIG_XPFO > > + > > +extern struct page_ext_operations page_xpfo_ops; > > + > > +extern void xpfo_kmap(void *kaddr, struct page *page); > > +extern void xpfo_kunmap(void *kaddr, struct page *page); > > +extern void xpfo_alloc_page(struct page *page, int order, gfp_t gfp); > > +extern void xpfo_free_page(struct page *page, int order); > > + > > +extern bool xpfo_page_is_unmapped(struct page *page); > > + > > +#else /* !CONFIG_XPFO */ > > + > > +static inline void xpfo_kmap(void *kaddr, struct page *page) { } > > +static inline void xpfo_kunmap(void *kaddr, struct page *page) { } > > +static inline void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) { } > > +static inline void xpfo_free_page(struct page *page, int order) { } > > + > > +static inline bool xpfo_page_is_unmapped(struct page *page) { return false; } > > + > > +#endif /* CONFIG_XPFO */ > > + > > +#endif /* _LINUX_XPFO_H */ > > diff --git a/lib/swiotlb.c b/lib/swiotlb.c > > index 22e13a0e19d7..455eff44604e 100644 > > --- a/lib/swiotlb.c > > +++ b/lib/swiotlb.c > > @@ -390,8 +390,9 @@ static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr, > > { > > unsigned long pfn = PFN_DOWN(orig_addr); > > unsigned char *vaddr = phys_to_virt(tlb_addr); > > + struct page *page = pfn_to_page(pfn); > > > > - if (PageHighMem(pfn_to_page(pfn))) { > > + if (PageHighMem(page) || xpfo_page_is_unmapped(page)) { > > /* The buffer does not have a mapping. 
Map it in and copy */ > > unsigned int offset = orig_addr & ~PAGE_MASK; > > char *buffer; > > diff --git a/mm/Makefile b/mm/Makefile > > index 295bd7a9f76b..175680f516aa 100644 > > --- a/mm/Makefile > > +++ b/mm/Makefile > > @@ -100,3 +100,4 @@ obj-$(CONFIG_IDLE_PAGE_TRACKING) += page_idle.o > > obj-$(CONFIG_FRAME_VECTOR) += frame_vector.o > > obj-$(CONFIG_DEBUG_PAGE_REF) += debug_page_ref.o > > obj-$(CONFIG_HARDENED_USERCOPY) += usercopy.o > > +obj-$(CONFIG_XPFO) += xpfo.o > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > index 8fd42aa7c4bd..100e80e008e2 100644 > > --- a/mm/page_alloc.c > > +++ b/mm/page_alloc.c > > @@ -1045,6 +1045,7 @@ static __always_inline bool free_pages_prepare(struct page *page, > > kernel_poison_pages(page, 1 << order, 0); > > kernel_map_pages(page, 1 << order, 0); > > kasan_free_pages(page, order); > > + xpfo_free_page(page, order); > > > > return true; > > } > > @@ -1745,6 +1746,7 @@ inline void post_alloc_hook(struct page *page, unsigned int order, > > kernel_map_pages(page, 1 << order, 1); > > kernel_poison_pages(page, 1 << order, 1); > > kasan_alloc_pages(page, order); > > + xpfo_alloc_page(page, order, gfp_flags); > > set_page_owner(page, order, gfp_flags); > > } > > > > diff --git a/mm/page_ext.c b/mm/page_ext.c > > index 121dcffc4ec1..ba6dbcacc2db 100644 > > --- a/mm/page_ext.c > > +++ b/mm/page_ext.c > > @@ -7,6 +7,7 @@ > > #include <linux/kmemleak.h> > > #include <linux/page_owner.h> > > #include <linux/page_idle.h> > > +#include <linux/xpfo.h> > > > > /* > > * struct page extension > > @@ -68,6 +69,9 @@ static struct page_ext_operations *page_ext_ops[] = { > > #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT) > > &page_idle_ops, > > #endif > > +#ifdef CONFIG_XPFO > > + &page_xpfo_ops, > > +#endif > > }; > > > > static unsigned long total_usage; > > diff --git a/mm/xpfo.c b/mm/xpfo.c > > new file mode 100644 > > index 000000000000..8e3a6a694b6a > > --- /dev/null > > +++ b/mm/xpfo.c > > @@ -0,0 +1,206 @@ > > 
+/* > > + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P. > > + * Copyright (C) 2016 Brown University. All rights reserved. > > + * > > + * Authors: > > + * Juerg Haefliger <juerg.haefliger@hpe.com> > > + * Vasileios P. Kemerlis <vpk@cs.brown.edu> > > + * > > + * This program is free software; you can redistribute it and/or modify it > > + * under the terms of the GNU General Public License version 2 as published by > > + * the Free Software Foundation. > > + */ > > + > > +#include <linux/mm.h> > > +#include <linux/module.h> > > +#include <linux/page_ext.h> > > +#include <linux/xpfo.h> > > + > > +#include <asm/tlbflush.h> > > + > > +DEFINE_STATIC_KEY_FALSE(xpfo_inited); > > + > > +static bool need_xpfo(void) > > +{ > > + return true; > > +} > > + > > +static void init_xpfo(void) > > +{ > > + printk(KERN_INFO "XPFO enabled\n"); > > + static_branch_enable(&xpfo_inited); > > +} > > + > > +struct page_ext_operations page_xpfo_ops = { > > + .need = need_xpfo, > > + .init = init_xpfo, > > +}; > > + > > +/* > > + * Update a single kernel page table entry > > + */ > > +static inline void set_kpte(struct page *page, unsigned long kaddr, > > + pgprot_t prot) { > > + unsigned int level; > > + pte_t *kpte = lookup_address(kaddr, &level); > > + > > + /* We only support 4k pages for now */ > > + BUG_ON(!kpte || level != PG_LEVEL_4K); > > + > > + set_pte_atomic(kpte, pfn_pte(page_to_pfn(page), canon_pgprot(prot))); > > +} > > As lookup_address() and set_pte_atomic() (and PG_LEVEL_4K), are arch-specific, > would it be better to put the whole definition into arch-specific part? 
> > > + > > +void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) > > +{ > > + int i, flush_tlb = 0; > > + struct page_ext *page_ext; > > + unsigned long kaddr; > > + > > + if (!static_branch_unlikely(&xpfo_inited)) > > + return; > > + > > + for (i = 0; i < (1 << order); i++) { > > + page_ext = lookup_page_ext(page + i); > > + > > + BUG_ON(test_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)); > > + > > + /* Initialize the map lock and map counter */ > > + if (!page_ext->inited) { > > + spin_lock_init(&page_ext->maplock); > > + atomic_set(&page_ext->mapcount, 0); > > + page_ext->inited = 1; > > + } > > + BUG_ON(atomic_read(&page_ext->mapcount)); > > + > > + if ((gfp & GFP_HIGHUSER) == GFP_HIGHUSER) { > > + /* > > + * Flush the TLB if the page was previously allocated > > + * to the kernel. > > + */ > > + if (test_and_clear_bit(PAGE_EXT_XPFO_KERNEL, > > + &page_ext->flags)) > > + flush_tlb = 1; > > + } else { > > + /* Tag the page as a kernel page */ > > + set_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags); > > + } > > + } > > + > > + if (flush_tlb) { > > + kaddr = (unsigned long)page_address(page); > > + flush_tlb_kernel_range(kaddr, kaddr + (1 << order) * > > + PAGE_SIZE); > > + } > > +} > > + > > +void xpfo_free_page(struct page *page, int order) > > +{ > > + int i; > > + struct page_ext *page_ext; > > + unsigned long kaddr; > > + > > + if (!static_branch_unlikely(&xpfo_inited)) > > + return; > > + > > + for (i = 0; i < (1 << order); i++) { > > + page_ext = lookup_page_ext(page + i); > > + > > + if (!page_ext->inited) { > > + /* > > + * The page was allocated before page_ext was > > + * initialized, so it is a kernel page and it needs to > > + * be tagged accordingly. > > + */ > > + set_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags); > > + continue; > > + } > > + > > + /* > > + * Map the page back into the kernel if it was previously > > + * allocated to user space. 
> > +		 */
> > +		if (test_and_clear_bit(PAGE_EXT_XPFO_UNMAPPED,
> > +				       &page_ext->flags)) {
> > +			kaddr = (unsigned long)page_address(page + i);
> > +			set_kpte(page + i, kaddr, __pgprot(__PAGE_KERNEL));
>
> Why not PAGE_KERNEL?
>
> > +		}
> > +	}
> > +}
> > +
> > +void xpfo_kmap(void *kaddr, struct page *page)
> > +{
> > +	struct page_ext *page_ext;
> > +	unsigned long flags;
> > +
> > +	if (!static_branch_unlikely(&xpfo_inited))
> > +		return;
> > +
> > +	page_ext = lookup_page_ext(page);
> > +
> > +	/*
> > +	 * The page was allocated before page_ext was initialized (which means
> > +	 * it's a kernel page) or it's allocated to the kernel, so nothing to
> > +	 * do.
> > +	 */
> > +	if (!page_ext->inited ||
> > +	    test_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags))
> > +		return;
> > +
> > +	spin_lock_irqsave(&page_ext->maplock, flags);
> > +
> > +	/*
> > +	 * The page was previously allocated to user space, so map it back
> > +	 * into the kernel. No TLB flush required.
> > +	 */
> > +	if ((atomic_inc_return(&page_ext->mapcount) == 1) &&
> > +	    test_and_clear_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags))
> > +		set_kpte(page, (unsigned long)kaddr, __pgprot(__PAGE_KERNEL));
> > +
> > +	spin_unlock_irqrestore(&page_ext->maplock, flags);
> > +}
> > +EXPORT_SYMBOL(xpfo_kmap);
> > +
> > +void xpfo_kunmap(void *kaddr, struct page *page)
> > +{
> > +	struct page_ext *page_ext;
> > +	unsigned long flags;
> > +
> > +	if (!static_branch_unlikely(&xpfo_inited))
> > +		return;
> > +
> > +	page_ext = lookup_page_ext(page);
> > +
> > +	/*
> > +	 * The page was allocated before page_ext was initialized (which means
> > +	 * it's a kernel page) or it's allocated to the kernel, so nothing to
> > +	 * do.
> > +	 */
> > +	if (!page_ext->inited ||
> > +	    test_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags))
> > +		return;
> > +
> > +	spin_lock_irqsave(&page_ext->maplock, flags);
> > +
> > +	/*
> > +	 * The page is to be allocated back to user space, so unmap it from the
> > +	 * kernel, flush the TLB and tag it as a user page.
> > +	 */
> > +	if (atomic_dec_return(&page_ext->mapcount) == 0) {
> > +		BUG_ON(test_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags));
> > +		set_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags);
> > +		set_kpte(page, (unsigned long)kaddr, __pgprot(0));
> > +		__flush_tlb_one((unsigned long)kaddr);
>
> Again __flush_tlb_one() is x86-specific.
> flush_tlb_kernel_range() instead?
>
> Thanks,
> -Takahiro AKASHI
>
> > +	}
> > +
> > +	spin_unlock_irqrestore(&page_ext->maplock, flags);
> > +}
> > +EXPORT_SYMBOL(xpfo_kunmap);
> > +
> > +inline bool xpfo_page_is_unmapped(struct page *page)
> > +{
> > +	if (!static_branch_unlikely(&xpfo_inited))
> > +		return false;
> > +
> > +	return test_bit(PAGE_EXT_XPFO_UNMAPPED, &lookup_page_ext(page)->flags);
> > +}
> > +EXPORT_SYMBOL(xpfo_page_is_unmapped);
> > diff --git a/security/Kconfig b/security/Kconfig
> > index 118f4549404e..4502e15c8419 100644
> > --- a/security/Kconfig
> > +++ b/security/Kconfig
> > @@ -6,6 +6,25 @@ menu "Security options"
> >  
> >  source security/keys/Kconfig
> >  
> > +config ARCH_SUPPORTS_XPFO
> > +	bool
> > +
> > +config XPFO
> > +	bool "Enable eXclusive Page Frame Ownership (XPFO)"
> > +	default n
> > +	depends on ARCH_SUPPORTS_XPFO
> > +	select PAGE_EXTENSION
> > +	help
> > +	  This option offers protection against 'ret2dir' kernel attacks.
> > +	  When enabled, every time a page frame is allocated to user space, it
> > +	  is unmapped from the direct mapped RAM region in kernel space
> > +	  (physmap). Similarly, when a page frame is freed/reclaimed, it is
> > +	  mapped back to physmap.
> > +
> > +	  There is a slight performance impact when this option is enabled.
> > +
> > +	  If in doubt, say "N".
> > +
> >  config SECURITY_DMESG_RESTRICT
> >  	bool "Restrict unprivileged access to the kernel syslog"
> >  	default n
> > -- 
> > 2.10.1
> >

^ permalink raw reply	[flat|nested] 30+ messages in thread
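
For readers following the review, the reference counting in the quoted xpfo_kmap()/xpfo_kunmap() pair can be sketched as a small userspace model. This is an illustration only, not the patch code: struct xpfo_model, model_kmap(), model_kunmap() and the in_physmap field are hypothetical names, the page_ext->inited check is left out, and the maplock, set_kpte() and TLB flush are reduced to plain field updates.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the XPFO fields kept in page_ext. */
struct xpfo_model {
	bool kernel;     /* models PAGE_EXT_XPFO_KERNEL */
	bool unmapped;   /* models PAGE_EXT_XPFO_UNMAPPED */
	int mapcount;    /* balances kmap/kunmap calls */
	bool in_physmap; /* whether the kernel mapping is currently present */
};

/* Model of xpfo_kmap(): only the first mapper restores the kernel mapping. */
static void model_kmap(struct xpfo_model *p)
{
	if (p->kernel)
		return; /* kernel pages are never unmapped, nothing to do */

	if (++p->mapcount == 1 && p->unmapped) {
		p->unmapped = false;
		p->in_physmap = true; /* the patch calls set_kpte() here */
	}
}

/* Model of xpfo_kunmap(): only the last unmapper removes the mapping. */
static void model_kunmap(struct xpfo_model *p)
{
	if (p->kernel)
		return;

	if (--p->mapcount == 0) {
		assert(!p->unmapped);  /* mirrors the BUG_ON() in the patch */
		p->unmapped = true;
		p->in_physmap = false; /* the patch also flushes the TLB entry */
	}
}
```

In this model, nested model_kmap() calls on a user page leave it mapped until the matching number of model_kunmap() calls has been made, which is the behaviour the mapcount field is meant to guarantee.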
* [kernel-hardening] [RFC PATCH v3 2/2] xpfo: Only put previous userspace pages into the hot cache
  2016-11-04 14:45 ` [kernel-hardening] [RFC PATCH v3 0/2] " Juerg Haefliger
  2016-11-04 14:45   ` [kernel-hardening] [RFC PATCH v3 1/2] " Juerg Haefliger
@ 2016-11-04 14:45   ` Juerg Haefliger
  1 sibling, 0 replies; 30+ messages in thread
From: Juerg Haefliger @ 2016-11-04 14:45 UTC (permalink / raw)
  To: linux-kernel, linux-mm, kernel-hardening, linux-x86_64
  Cc: vpk, juerg.haefliger

Allocating a page to userspace that was previously allocated to the
kernel requires an expensive TLB shootdown. To minimize this, we only
put non-kernel pages into the hot cache to favor their allocation.

Signed-off-by: Juerg Haefliger <juerg.haefliger@hpe.com>
---
 include/linux/xpfo.h | 2 ++
 mm/page_alloc.c      | 8 +++++++-
 mm/xpfo.c            | 8 ++++++++
 3 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/include/linux/xpfo.h b/include/linux/xpfo.h
index 77187578ca33..077d1cfadfa2 100644
--- a/include/linux/xpfo.h
+++ b/include/linux/xpfo.h
@@ -24,6 +24,7 @@ extern void xpfo_alloc_page(struct page *page, int order, gfp_t gfp);
 extern void xpfo_free_page(struct page *page, int order);
 
 extern bool xpfo_page_is_unmapped(struct page *page);
+extern bool xpfo_page_is_kernel(struct page *page);
 
 #else /* !CONFIG_XPFO */
 
@@ -33,6 +34,7 @@ static inline void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) { }
 static inline void xpfo_free_page(struct page *page, int order) { }
 
 static inline bool xpfo_page_is_unmapped(struct page *page) { return false; }
+static inline bool xpfo_page_is_kernel(struct page *page) { return false; }
 
 #endif /* CONFIG_XPFO */
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 100e80e008e2..09ef4f7cfd14 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2440,7 +2440,13 @@ void free_hot_cold_page(struct page *page, bool cold)
 	}
 
 	pcp = &this_cpu_ptr(zone->pageset)->pcp;
-	if (!cold)
+	/*
+	 * XPFO: Allocating a page to userspace that was previously allocated
+	 * to the kernel requires an expensive TLB shootdown. To minimize this,
+	 * we only put non-kernel pages into the hot cache to favor their
+	 * allocation.
+	 */
+	if (!cold && !xpfo_page_is_kernel(page))
 		list_add(&page->lru, &pcp->lists[migratetype]);
 	else
 		list_add_tail(&page->lru, &pcp->lists[migratetype]);
diff --git a/mm/xpfo.c b/mm/xpfo.c
index 8e3a6a694b6a..0e447e38008a 100644
--- a/mm/xpfo.c
+++ b/mm/xpfo.c
@@ -204,3 +204,11 @@ inline bool xpfo_page_is_unmapped(struct page *page)
 	return test_bit(PAGE_EXT_XPFO_UNMAPPED, &lookup_page_ext(page)->flags);
 }
 EXPORT_SYMBOL(xpfo_page_is_unmapped);
+
+inline bool xpfo_page_is_kernel(struct page *page)
+{
+	if (!static_branch_unlikely(&xpfo_inited))
+		return false;
+
+	return test_bit(PAGE_EXT_XPFO_KERNEL, &lookup_page_ext(page)->flags);
+}
-- 
2.10.1

^ permalink raw reply related	[flat|nested] 30+ messages in thread
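
The effect of the list placement in the patch above can be illustrated with a small userspace model. It is a sketch under stated assumptions, not kernel code: struct model_page, struct model_pcp, model_free() and model_alloc() are hypothetical names, the was_kernel field stands in for xpfo_page_is_kernel(), and the model assumes that an ordinary (hot) allocation takes pages from the head of the per-CPU list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a page sitting on a per-CPU free list. */
struct model_page {
	int id;
	bool was_kernel; /* models xpfo_page_is_kernel() */
	struct model_page *prev, *next;
};

/* Doubly linked free list; the head is the hot end. */
struct model_pcp {
	struct model_page *head, *tail;
};

/* Equivalent of list_add(): insert at the hot end. */
static void push_head(struct model_pcp *pcp, struct model_page *pg)
{
	pg->prev = NULL;
	pg->next = pcp->head;
	if (pcp->head)
		pcp->head->prev = pg;
	else
		pcp->tail = pg;
	pcp->head = pg;
}

/* Equivalent of list_add_tail(): insert at the cold end. */
static void push_tail(struct model_pcp *pcp, struct model_page *pg)
{
	pg->next = NULL;
	pg->prev = pcp->tail;
	if (pcp->tail)
		pcp->tail->next = pg;
	else
		pcp->head = pg;
	pcp->tail = pg;
}

/* Model of the patched placement: only hot, non-kernel pages go to the head. */
static void model_free(struct model_pcp *pcp, struct model_page *pg, bool cold)
{
	assert(pg != NULL);
	if (!cold && !pg->was_kernel)
		push_head(pcp, pg);
	else
		push_tail(pcp, pg);
}

/* Assumed allocation policy: pop from the head; returns NULL when empty. */
static struct model_page *model_alloc(struct model_pcp *pcp)
{
	struct model_page *pg = pcp->head;

	if (!pg)
		return NULL;
	pcp->head = pg->next;
	if (pcp->head)
		pcp->head->prev = NULL;
	else
		pcp->tail = NULL;
	pg->prev = pg->next = NULL;
	return pg;
}
```

With this placement, a page that last belonged to the kernel is handed out only after the former userspace pages on the list, so under the model's assumptions the pages that would need a TLB shootdown are reused last.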
end of thread, other threads:[~2016-12-09 9:02 UTC | newest]
Thread overview: 30+ messages
[not found] <1456496467-14247-1-git-send-email-juerg.haefliger@hpe.com>
2016-09-02 11:39 ` [kernel-hardening] [RFC PATCH v2 0/3] Add support for eXclusive Page Frame Ownership (XPFO) Juerg Haefliger
2016-09-02 11:39 ` [kernel-hardening] [RFC PATCH v2 1/3] " Juerg Haefliger
2016-09-02 11:39 ` [kernel-hardening] [RFC PATCH v2 2/3] xpfo: Only put previous userspace pages into the hot cache Juerg Haefliger
2016-09-02 20:39 ` [kernel-hardening] " Dave Hansen
2016-09-05 11:54 ` Juerg Haefliger
2016-09-02 11:39 ` [kernel-hardening] [RFC PATCH v2 3/3] block: Always use a bounce buffer when XPFO is enabled Juerg Haefliger
2016-09-14 7:18 ` [kernel-hardening] [RFC PATCH v2 0/3] Add support for eXclusive Page Frame Ownership (XPFO) Juerg Haefliger
2016-09-14 7:18 ` [kernel-hardening] [RFC PATCH v2 1/3] " Juerg Haefliger
2016-09-14 7:19 ` [kernel-hardening] [RFC PATCH v2 2/3] xpfo: Only put previous userspace pages into the hot cache Juerg Haefliger
2016-09-14 14:33 ` Dave Hansen
2016-09-14 14:40 ` Juerg Haefliger
2016-09-14 14:48 ` Dave Hansen
2016-09-21 5:32 ` Juerg Haefliger
2016-09-14 7:19 ` [kernel-hardening] [RFC PATCH v2 3/3] block: Always use a bounce buffer when XPFO is enabled Juerg Haefliger
2016-09-14 7:33 ` [kernel-hardening] " Christoph Hellwig
2016-09-14 7:23 ` [kernel-hardening] Re: [RFC PATCH v2 0/3] Add support for eXclusive Page Frame Ownership (XPFO) Juerg Haefliger
2016-09-14 9:36 ` [kernel-hardening] " Mark Rutland
2016-09-14 9:49 ` Mark Rutland
2016-11-04 14:45 ` [kernel-hardening] [RFC PATCH v3 0/2] " Juerg Haefliger
2016-11-04 14:45 ` [kernel-hardening] [RFC PATCH v3 1/2] " Juerg Haefliger
2016-11-04 14:50 ` [kernel-hardening] " Christoph Hellwig
2016-11-10 5:53 ` [kernel-hardening] " ZhaoJunmin Zhao(Junmin)
2016-11-10 19:11 ` [kernel-hardening] " Kees Cook
2016-11-15 11:15 ` Juerg Haefliger
2016-11-10 19:24 ` Kees Cook
2016-11-15 11:18 ` Juerg Haefliger
2016-11-24 10:56 ` AKASHI Takahiro
2016-11-28 11:15 ` Juerg Haefliger
2016-12-09 9:02 ` AKASHI Takahiro
2016-11-04 14:45 ` [kernel-hardening] [RFC PATCH v3 2/2] xpfo: Only put previous userspace pages into the hot cache Juerg Haefliger