From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-pa0-f53.google.com (mail-pa0-f53.google.com [209.85.220.53]) by kanga.kvack.org (Postfix) with ESMTP id 9B0186B0038 for ; Mon, 14 Dec 2015 13:31:24 -0500 (EST)
Received: by pacdm15 with SMTP id dm15so107574801pac.3 for ; Mon, 14 Dec 2015 10:31:24 -0800 (PST)
Received: from mail.kernel.org (mail.kernel.org. [198.145.29.136]) by mx.google.com with ESMTP id xr9si10411352pab.232.2015.12.14.10.31.23 for ; Mon, 14 Dec 2015 10:31:23 -0800 (PST)
From: Andy Lutomirski
Subject: [PATCH v2 0/6] mm, x86/vdso: Special IO mapping improvements
Date: Mon, 14 Dec 2015 10:31:12 -0800
Message-Id:
Sender: owner-linux-mm@kvack.org
List-ID:
To: x86@kernel.org
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Andrew Morton, Andy Lutomirski

This applies on top of the earlier vdso pvclock series I sent out.
Once that lands in -tip, this will apply to -tip.

This series cleans up the hack that is our vvar mapping. We currently
initialize the vvar mapping as a special mapping vma backed by nothing
whatsoever and then we abuse remap_pfn_range to populate it.

This cheats the mm core, probably breaks under various evil madvise
workloads, and prevents handling faults in more interesting ways.

To clean it up, this series:

 - Adds a special mapping .fault operation
 - Adds a vm_insert_pfn_prot helper
 - Uses the new .fault infrastructure in x86's vdso and vvar mappings
 - Hardens the HPET mapping, mitigating an HW attack surface that bothers me

akpm, can you ack patch 1?

Changes from v1:
 - Lots of changelog clarification requested by akpm
 - Minor tweaks to style and comments in the first two patches

Andy Lutomirski (6):
  mm: Add a vm_special_mapping .fault method
  mm: Add vm_insert_pfn_prot
  x86/vdso: Track each mm's loaded vdso image as well as its base
  x86,vdso: Use .fault for the vdso text mapping
  x86,vdso: Use .fault instead of remap_pfn_range for the vvar mapping
  x86/vdso: Disallow vvar access to vclock IO for never-used vclocks

 arch/x86/entry/vdso/vdso2c.h            |   7 --
 arch/x86/entry/vdso/vma.c               | 124 ++++++++++++++++++++------------
 arch/x86/entry/vsyscall/vsyscall_gtod.c |   9 ++-
 arch/x86/include/asm/clocksource.h      |   9 +--
 arch/x86/include/asm/mmu.h              |   3 +-
 arch/x86/include/asm/vdso.h             |   3 -
 arch/x86/include/asm/vgtod.h            |   6 ++
 include/linux/mm.h                      |   2 +
 include/linux/mm_types.h                |  22 +++++-
 mm/memory.c                             |  25 ++++++-
 mm/mmap.c                               |  13 ++--
 11 files changed, 151 insertions(+), 72 deletions(-)

--
2.5.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-pa0-f43.google.com (mail-pa0-f43.google.com [209.85.220.43]) by kanga.kvack.org (Postfix) with ESMTP id 1AC096B0254 for ; Mon, 14 Dec 2015 13:31:26 -0500 (EST)
Received: by padhk6 with SMTP id hk6so67660387pad.2 for ; Mon, 14 Dec 2015 10:31:25 -0800 (PST)
Received: from mail.kernel.org (mail.kernel.org.
[198.145.29.136]) by mx.google.com with ESMTP id mj8si10027604pab.50.2015.12.14.10.31.25 for ; Mon, 14 Dec 2015 10:31:25 -0800 (PST)
From: Andy Lutomirski
Subject: [PATCH v2 1/6] mm: Add a vm_special_mapping .fault method
Date: Mon, 14 Dec 2015 10:31:13 -0800
Message-Id:
In-Reply-To:
References:
In-Reply-To:
References:
Sender: owner-linux-mm@kvack.org
List-ID:
To: x86@kernel.org
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Andrew Morton, Andy Lutomirski, Andy Lutomirski

From: Andy Lutomirski

Requiring special mappings to give a list of struct pages is
inflexible: it prevents sane use of IO memory in a special mapping,
it's inefficient (it requires arch code to initialize a list of
struct pages, and it requires the mm core to walk the entire list
just to figure out how long it is), and it prevents arch code from
doing anything fancy when a special mapping fault occurs.

Add a .fault method as an alternative to filling in a .pages array.

Looks-OK-to: Andrew Morton
Signed-off-by: Andy Lutomirski
---

Notes:
    Changes from v1:
     - Fixed "struct vm_special_mapping" code layout (akpm)
     - s/is// (akpm)

 include/linux/mm_types.h | 22 +++++++++++++++++++---
 mm/mmap.c                | 13 +++++++++----
 2 files changed, 28 insertions(+), 7 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index f8d1492a114f..c88e48a3c155 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -568,10 +568,26 @@ static inline void clear_tlb_flush_pending(struct mm_struct *mm)
 }
 #endif
 
-struct vm_special_mapping
-{
-	const char *name;
+struct vm_fault;
+
+struct vm_special_mapping {
+	const char *name;	/* The name, e.g. "[vdso]". */
+
+	/*
+	 * If .fault is not provided, this points to a
+	 * NULL-terminated array of pages that back the special mapping.
+	 *
+	 * This must not be NULL unless .fault is provided.
+	 */
 	struct page **pages;
+
+	/*
+	 * If non-NULL, then this is called to resolve page faults
+	 * on the special mapping. If used, .pages is not checked.
+	 */
+	int (*fault)(const struct vm_special_mapping *sm,
+		     struct vm_area_struct *vma,
+		     struct vm_fault *vmf);
 };
 
 enum tlb_flush_reason {
diff --git a/mm/mmap.c b/mm/mmap.c
index 2ce04a649f6b..f717453b1a57 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -3030,11 +3030,16 @@ static int special_mapping_fault(struct vm_area_struct *vma,
 	pgoff_t pgoff;
 	struct page **pages;
 
-	if (vma->vm_ops == &legacy_special_mapping_vmops)
+	if (vma->vm_ops == &legacy_special_mapping_vmops) {
 		pages = vma->vm_private_data;
-	else
-		pages = ((struct vm_special_mapping *)vma->vm_private_data)->
-			pages;
+	} else {
+		struct vm_special_mapping *sm = vma->vm_private_data;
+
+		if (sm->fault)
+			return sm->fault(sm, vma, vmf);
+
+		pages = sm->pages;
+	}
 
 	for (pgoff = vmf->pgoff; pgoff && *pages; ++pages)
 		pgoff--;
--
2.5.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-pf0-f182.google.com (mail-pf0-f182.google.com [209.85.192.182]) by kanga.kvack.org (Postfix) with ESMTP id 931A76B0255 for ; Mon, 14 Dec 2015 13:31:27 -0500 (EST)
Received: by pfnn128 with SMTP id n128so109535056pfn.0 for ; Mon, 14 Dec 2015 10:31:27 -0800 (PST)
Received: from mail.kernel.org (mail.kernel.org. [198.145.29.136]) by mx.google.com with ESMTP id 85si18561889pfn.11.2015.12.14.10.31.26 for ; Mon, 14 Dec 2015 10:31:26 -0800 (PST)
From: Andy Lutomirski
Subject: [PATCH v2 2/6] mm: Add vm_insert_pfn_prot
Date: Mon, 14 Dec 2015 10:31:14 -0800
Message-Id: <946cc8ead6807b4026b88785156797e42e0d652a.1450117783.git.luto@kernel.org>
In-Reply-To:
References:
In-Reply-To:
References:
Sender: owner-linux-mm@kvack.org
List-ID:
To: x86@kernel.org
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Andrew Morton, Andy Lutomirski

The x86 vvar vma contains pages with differing cacheability flags.
x86 currently implements this by manually inserting all the ptes using
(io_)remap_pfn_range when the vma is set up.

x86 wants to move to using .fault with VM_FAULT_NOPAGE to set up the
mappings as needed. The correct API to use to insert a pfn in
.fault is vm_insert_pfn, but vm_insert_pfn can't override the vma's
cache mode, and the HPET page in particular needs to be uncached
despite the fact that the rest of the VMA is cached.

Add vm_insert_pfn_prot to support varying cacheability within the
same non-COW VMA in a more sane manner.

x86 could alternatively use multiple VMAs, but that's messy,
would break CRIU, and would create unnecessary VMAs that would waste
memory.

Acked-by: Andrew Morton
Signed-off-by: Andy Lutomirski
---

Notes:
    Changes from v1:
     - Improve the changelog (akpm)

 include/linux/mm.h |  2 ++
 mm/memory.c        | 25 +++++++++++++++++++++++--
 2 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 00bad7793788..87ef1d7730ba 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2080,6 +2080,8 @@ int remap_pfn_range(struct vm_area_struct *, unsigned long addr,
 int vm_insert_page(struct vm_area_struct *, unsigned long addr, struct page *);
 int vm_insert_pfn(struct vm_area_struct *vma, unsigned long addr,
 			unsigned long pfn);
+int vm_insert_pfn_prot(struct vm_area_struct *vma, unsigned long addr,
+			unsigned long pfn, pgprot_t pgprot);
 int vm_insert_mixed(struct vm_area_struct *vma, unsigned long addr,
 			unsigned long pfn);
 int vm_iomap_memory(struct vm_area_struct *vma, phys_addr_t start, unsigned long len);
diff --git a/mm/memory.c b/mm/memory.c
index c387430f06c3..a29f0b90fc56 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1564,8 +1564,29 @@ out:
 int vm_insert_pfn(struct vm_area_struct *vma, unsigned long addr,
 			unsigned long pfn)
 {
+	return vm_insert_pfn_prot(vma, addr, pfn, vma->vm_page_prot);
+}
+EXPORT_SYMBOL(vm_insert_pfn);
+
+/**
+ * vm_insert_pfn_prot - insert single pfn into user vma with specified
pgprot
+ * @vma: user vma to map to
+ * @addr: target user address of this page
+ * @pfn: source kernel pfn
+ * @pgprot: pgprot flags for the inserted page
+ *
+ * This is exactly like vm_insert_pfn, except that it allows drivers to
+ * override pgprot on a per-page basis.
+ *
+ * This only makes sense for IO mappings, and it makes no sense for
+ * cow mappings. In general, using multiple vmas is preferable;
+ * vm_insert_pfn_prot should only be used if using multiple VMAs is
+ * impractical.
+ */
+int vm_insert_pfn_prot(struct vm_area_struct *vma, unsigned long addr,
+			unsigned long pfn, pgprot_t pgprot)
+{
 	int ret;
-	pgprot_t pgprot = vma->vm_page_prot;
 	/*
 	 * Technically, architectures with pte_special can avoid all these
 	 * restrictions (same for remap_pfn_range). However we would like
@@ -1587,7 +1608,7 @@ int vm_insert_pfn(struct vm_area_struct *vma, unsigned long addr,
 
 	return ret;
 }
-EXPORT_SYMBOL(vm_insert_pfn);
+EXPORT_SYMBOL(vm_insert_pfn_prot);
 
 int vm_insert_mixed(struct vm_area_struct *vma, unsigned long addr,
 			unsigned long pfn)
--
2.5.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-pf0-f182.google.com (mail-pf0-f182.google.com [209.85.192.182]) by kanga.kvack.org (Postfix) with ESMTP id 76D4E6B0256 for ; Mon, 14 Dec 2015 13:31:28 -0500 (EST)
Received: by pff63 with SMTP id 63so15312038pff.2 for ; Mon, 14 Dec 2015 10:31:28 -0800 (PST)
Received: from mail.kernel.org (mail.kernel.org.
[198.145.29.136]) by mx.google.com with ESMTP id v1si18634560pfa.242.2015.12.14.10.31.27 for ; Mon, 14 Dec 2015 10:31:27 -0800 (PST)
From: Andy Lutomirski
Subject: [PATCH v2 3/6] x86/vdso: Track each mm's loaded vdso image as well as its base
Date: Mon, 14 Dec 2015 10:31:15 -0800
Message-Id: <69bab428e0db14fc1bc1add051d2a294760137dc.1450117783.git.luto@kernel.org>
In-Reply-To:
References:
In-Reply-To:
References:
Sender: owner-linux-mm@kvack.org
List-ID:
To: x86@kernel.org
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Andrew Morton, Andy Lutomirski

As we start to do more intelligent things with the vdso at runtime
(as opposed to just at mm initialization time), we'll need to know
which vdso is in use.

In principle, we could guess based on the mm type, but that's
over-complicated and error-prone. Instead, just track it in the mmu
context.

Signed-off-by: Andy Lutomirski
---
 arch/x86/entry/vdso/vma.c  | 1 +
 arch/x86/include/asm/mmu.h | 3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index b8f69e264ac4..80b021067bd6 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -121,6 +121,7 @@ static int map_vdso(const struct vdso_image *image, bool calculate_addr)
 
 	text_start = addr - image->sym_vvar_start;
 	current->mm->context.vdso = (void __user *)text_start;
+	current->mm->context.vdso_image = image;
 
 	/*
 	 * MAYWRITE to allow gdb to COW and set breakpoints
diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h
index 55234d5e7160..1ea0baef1175 100644
--- a/arch/x86/include/asm/mmu.h
+++ b/arch/x86/include/asm/mmu.h
@@ -19,7 +19,8 @@ typedef struct {
 #endif
 
 	struct mutex lock;
-	void __user *vdso;
+	void __user *vdso;			/* vdso base address */
+	const struct vdso_image *vdso_image;	/* vdso image in use */
 
 	atomic_t perf_rdpmc_allowed;	/* nonzero if rdpmc is allowed */
 } mm_context_t;
--
2.5.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to
majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-pf0-f180.google.com (mail-pf0-f180.google.com [209.85.192.180]) by kanga.kvack.org (Postfix) with ESMTP id 2A8686B0257 for ; Mon, 14 Dec 2015 13:31:30 -0500 (EST)
Received: by pfbo64 with SMTP id o64so30844542pfb.1 for ; Mon, 14 Dec 2015 10:31:29 -0800 (PST)
Received: from mail.kernel.org (mail.kernel.org. [198.145.29.136]) by mx.google.com with ESMTP id kh7si9847316pab.85.2015.12.14.10.31.29 for ; Mon, 14 Dec 2015 10:31:29 -0800 (PST)
From: Andy Lutomirski
Subject: [PATCH v2 4/6] x86,vdso: Use .fault for the vdso text mapping
Date: Mon, 14 Dec 2015 10:31:16 -0800
Message-Id:
In-Reply-To:
References:
In-Reply-To:
References:
Sender: owner-linux-mm@kvack.org
List-ID:
To: x86@kernel.org
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Andrew Morton, Andy Lutomirski

The old scheme for mapping the vdso text is rather complicated. vdso2c
generates a struct vm_special_mapping and a blank .pages array of the
correct size for each vdso image. Init code in vdso/vma.c populates
the .pages array for each vdso image, and the mapping code selects
the appropriate struct vm_special_mapping.

With .fault, we can use a less roundabout approach: vdso_fault
just returns the appropriate page for the selected vdso image.

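As a rough userspace illustration of that lookup (this is not part of
the patch: struct image, image_page_for_offset and SKETCH_PAGE_SHIFT are
made-up names, and a plain byte pointer stands in for struct page), the
bounds check and page selection could be modeled like this:

```c
#include <stddef.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/*
 * Hypothetical stand-in for struct vdso_image: the raw image bytes and
 * a size that is a multiple of the page size.
 */
struct image {
	const unsigned char *data;
	unsigned long size;
};

/*
 * Return the start of the page that backs page offset pgoff, or NULL in
 * the cases where the real handler would return VM_FAULT_SIGBUS (no
 * image selected for this mm, or an offset past the end of the image).
 */
static const unsigned char *image_page_for_offset(const struct image *img,
						  unsigned long pgoff)
{
	if (!img || (pgoff << SKETCH_PAGE_SHIFT) >= img->size)
		return NULL;

	return img->data + (pgoff << SKETCH_PAGE_SHIFT);
}
```

The point of the model is only that no per-image page array is needed:
the page is computed from the fault offset each time.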
Signed-off-by: Andy Lutomirski
---
 arch/x86/entry/vdso/vdso2c.h |  7 -------
 arch/x86/entry/vdso/vma.c    | 26 +++++++++++++++++++-------
 arch/x86/include/asm/vdso.h  |  3 ---
 3 files changed, 19 insertions(+), 17 deletions(-)

diff --git a/arch/x86/entry/vdso/vdso2c.h b/arch/x86/entry/vdso/vdso2c.h
index 0224987556ce..abe961c7c71c 100644
--- a/arch/x86/entry/vdso/vdso2c.h
+++ b/arch/x86/entry/vdso/vdso2c.h
@@ -150,16 +150,9 @@ static void BITSFUNC(go)(void *raw_addr, size_t raw_len,
 	}
 	fprintf(outfile, "\n};\n\n");
 
-	fprintf(outfile, "static struct page *pages[%lu];\n\n",
-		mapping_size / 4096);
-
 	fprintf(outfile, "const struct vdso_image %s = {\n", name);
 	fprintf(outfile, "\t.data = raw_data,\n");
 	fprintf(outfile, "\t.size = %lu,\n", mapping_size);
-	fprintf(outfile, "\t.text_mapping = {\n");
-	fprintf(outfile, "\t\t.name = \"[vdso]\",\n");
-	fprintf(outfile, "\t\t.pages = pages,\n");
-	fprintf(outfile, "\t},\n");
 	if (alt_sec) {
 		fprintf(outfile, "\t.alt = %lu,\n",
 			(unsigned long)GET_LE(&alt_sec->sh_offset));
diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index 80b021067bd6..eb50d7c1f161 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -27,13 +27,7 @@ unsigned int __read_mostly vdso64_enabled = 1;
 
 void __init init_vdso_image(const struct vdso_image *image)
 {
-	int i;
-	int npages = (image->size) / PAGE_SIZE;
-
 	BUG_ON(image->size % PAGE_SIZE != 0);
-	for (i = 0; i < npages; i++)
-		image->text_mapping.pages[i] =
-			virt_to_page(image->data + i*PAGE_SIZE);
 
 	apply_alternatives((struct alt_instr *)(image->data + image->alt),
 			   (struct alt_instr *)(image->data + image->alt +
@@ -90,6 +84,24 @@ static unsigned long vdso_addr(unsigned long start, unsigned len)
 #endif
 }
 
+static int vdso_fault(const struct vm_special_mapping *sm,
+		      struct vm_area_struct *vma, struct vm_fault *vmf)
+{
+	const struct vdso_image *image = vma->vm_mm->context.vdso_image;
+
+	if (!image || (vmf->pgoff << PAGE_SHIFT) >= image->size)
+		return VM_FAULT_SIGBUS;
+
+
	vmf->page = virt_to_page(image->data + (vmf->pgoff << PAGE_SHIFT));
+	get_page(vmf->page);
+	return 0;
+}
+
+static const struct vm_special_mapping text_mapping = {
+	.name = "[vdso]",
+	.fault = vdso_fault,
+};
+
 static int map_vdso(const struct vdso_image *image, bool calculate_addr)
 {
 	struct mm_struct *mm = current->mm;
@@ -131,7 +143,7 @@ static int map_vdso(const struct vdso_image *image, bool calculate_addr)
 				       image->size,
 				       VM_READ|VM_EXEC|
 				       VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC,
-				       &image->text_mapping);
+				       &text_mapping);
 
 	if (IS_ERR(vma)) {
 		ret = PTR_ERR(vma);
diff --git a/arch/x86/include/asm/vdso.h b/arch/x86/include/asm/vdso.h
index deabaf9759b6..43dc55be524e 100644
--- a/arch/x86/include/asm/vdso.h
+++ b/arch/x86/include/asm/vdso.h
@@ -13,9 +13,6 @@ struct vdso_image {
 	void *data;
 	unsigned long size;   /* Always a multiple of PAGE_SIZE */
 
-	/* text_mapping.pages is big enough for data/size page pointers */
-	struct vm_special_mapping text_mapping;
-
 	unsigned long alt, alt_len;
 
 	long sym_vvar_start;  /* Negative offset to the vvar area */
--
2.5.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-pa0-f48.google.com (mail-pa0-f48.google.com [209.85.220.48]) by kanga.kvack.org (Postfix) with ESMTP id 960296B0258 for ; Mon, 14 Dec 2015 13:31:31 -0500 (EST)
Received: by pacwq6 with SMTP id wq6so107579331pac.1 for ; Mon, 14 Dec 2015 10:31:31 -0800 (PST)
Received: from mail.kernel.org (mail.kernel.org.
[198.145.29.136]) by mx.google.com with ESMTP id xq4si10424540pab.229.2015.12.14.10.31.30 for ; Mon, 14 Dec 2015 10:31:30 -0800 (PST)
From: Andy Lutomirski
Subject: [PATCH v2 5/6] x86,vdso: Use .fault instead of remap_pfn_range for the vvar mapping
Date: Mon, 14 Dec 2015 10:31:17 -0800
Message-Id:
In-Reply-To:
References:
In-Reply-To:
References:
Sender: owner-linux-mm@kvack.org
List-ID:
To: x86@kernel.org
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Andrew Morton, Andy Lutomirski

This is IMO much less ugly, and it also opens the door to
disallowing unprivileged userspace HPET access on systems with
usable TSCs.

Signed-off-by: Andy Lutomirski
---
 arch/x86/entry/vdso/vma.c | 97 ++++++++++++++++++++++++++++-------------------
 1 file changed, 57 insertions(+), 40 deletions(-)

diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index eb50d7c1f161..02221e98b83f 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -102,18 +102,69 @@ static const struct vm_special_mapping text_mapping = {
 	.fault = vdso_fault,
 };
 
+static int vvar_fault(const struct vm_special_mapping *sm,
+		      struct vm_area_struct *vma, struct vm_fault *vmf)
+{
+	const struct vdso_image *image = vma->vm_mm->context.vdso_image;
+	long sym_offset;
+	int ret = -EFAULT;
+
+	if (!image)
+		return VM_FAULT_SIGBUS;
+
+	sym_offset = (long)(vmf->pgoff << PAGE_SHIFT) +
+		image->sym_vvar_start;
+
+	/*
+	 * Sanity check: a symbol offset of zero means that the page
+	 * does not exist for this vdso image, not that the page is at
+	 * offset zero relative to the text mapping. This should be
+	 * impossible here, because sym_offset should only be zero for
+	 * the page past the end of the vvar mapping.
+	 */
+	if (sym_offset == 0)
+		return VM_FAULT_SIGBUS;
+
+	if (sym_offset == image->sym_vvar_page) {
+		ret = vm_insert_pfn(vma, (unsigned long)vmf->virtual_address,
+				    __pa_symbol(&__vvar_page) >> PAGE_SHIFT);
+	} else if (sym_offset == image->sym_hpet_page) {
+#ifdef CONFIG_HPET_TIMER
+		if (hpet_address) {
+			ret = vm_insert_pfn_prot(
+				vma,
+				(unsigned long)vmf->virtual_address,
+				hpet_address >> PAGE_SHIFT,
+				pgprot_noncached(PAGE_READONLY));
+		}
+#endif
+	} else if (sym_offset == image->sym_pvclock_page) {
+		struct pvclock_vsyscall_time_info *pvti =
+			pvclock_pvti_cpu0_va();
+		if (pvti) {
+			ret = vm_insert_pfn(
+				vma,
+				(unsigned long)vmf->virtual_address,
+				__pa(pvti) >> PAGE_SHIFT);
+		}
+	}
+
+	if (ret == 0)
+		return VM_FAULT_NOPAGE;
+
+	return VM_FAULT_SIGBUS;
+}
+
 static int map_vdso(const struct vdso_image *image, bool calculate_addr)
 {
 	struct mm_struct *mm = current->mm;
 	struct vm_area_struct *vma;
 	unsigned long addr, text_start;
 	int ret = 0;
-	static struct page *no_pages[] = {NULL};
-	static struct vm_special_mapping vvar_mapping = {
+	static const struct vm_special_mapping vvar_mapping = {
 		.name = "[vvar]",
-		.pages = no_pages,
+		.fault = vvar_fault,
 	};
-	struct pvclock_vsyscall_time_info *pvti;
 
 	if (calculate_addr) {
 		addr = vdso_addr(current->mm->start_stack,
@@ -153,7 +204,8 @@ static int map_vdso(const struct vdso_image *image, bool calculate_addr)
 	vma = _install_special_mapping(mm,
 				       addr,
 				       -image->sym_vvar_start,
-				       VM_READ|VM_MAYREAD,
+				       VM_READ|VM_MAYREAD|VM_IO|VM_DONTDUMP|
+				       VM_PFNMAP,
 				       &vvar_mapping);
 
 	if (IS_ERR(vma)) {
@@ -161,41 +213,6 @@ static int map_vdso(const struct vdso_image *image, bool calculate_addr)
 		goto up_fail;
 	}
 
-	if (image->sym_vvar_page)
-		ret = remap_pfn_range(vma,
-				      text_start + image->sym_vvar_page,
-				      __pa_symbol(&__vvar_page) >> PAGE_SHIFT,
-				      PAGE_SIZE,
-				      PAGE_READONLY);
-
-	if (ret)
-		goto up_fail;
-
-#ifdef CONFIG_HPET_TIMER
-	if (hpet_address && image->sym_hpet_page) {
-		ret = io_remap_pfn_range(vma,
-			text_start +
image->sym_hpet_page,
-			hpet_address >> PAGE_SHIFT,
-			PAGE_SIZE,
-			pgprot_noncached(PAGE_READONLY));
-
-		if (ret)
-			goto up_fail;
-	}
-#endif
-
-	pvti = pvclock_pvti_cpu0_va();
-	if (pvti && image->sym_pvclock_page) {
-		ret = remap_pfn_range(vma,
-				      text_start + image->sym_pvclock_page,
-				      __pa(pvti) >> PAGE_SHIFT,
-				      PAGE_SIZE,
-				      PAGE_READONLY);
-
-		if (ret)
-			goto up_fail;
-	}
-
 up_fail:
 	if (ret)
 		current->mm->context.vdso = NULL;
--
2.5.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-pa0-f47.google.com (mail-pa0-f47.google.com [209.85.220.47]) by kanga.kvack.org (Postfix) with ESMTP id 428E16B0259 for ; Mon, 14 Dec 2015 13:31:33 -0500 (EST)
Received: by pacdm15 with SMTP id dm15so107576602pac.3 for ; Mon, 14 Dec 2015 10:31:33 -0800 (PST)
Received: from mail.kernel.org (mail.kernel.org. [198.145.29.136]) by mx.google.com with ESMTP id rk9si10371378pab.31.2015.12.14.10.31.32 for ; Mon, 14 Dec 2015 10:31:32 -0800 (PST)
From: Andy Lutomirski
Subject: [PATCH v2 6/6] x86/vdso: Disallow vvar access to vclock IO for never-used vclocks
Date: Mon, 14 Dec 2015 10:31:18 -0800
Message-Id:
In-Reply-To:
References:
In-Reply-To:
References:
Sender: owner-linux-mm@kvack.org
List-ID:
To: x86@kernel.org
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Andrew Morton, Andy Lutomirski

It makes me uncomfortable that even modern systems grant every
process direct read access to the HPET.

While fixing this for real without regressing anything is a mess
(unmapping the HPET is tricky because we don't adequately track all
the mappings), we can do almost as well by tracking which vclocks
have ever been used and only allowing pages associated with used
vclocks to be faulted in.

This will cause rogue programs that try to peek at the HPET to get
SIGBUS instead on most systems.

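A minimal userspace sketch of the "ever used" tracking described above,
not the kernel code: the SK_-prefixed constants and the _sketch and
mark_vclock_used names are illustrative, and a C11 atomic stands in for
the READ_ONCE/WRITE_ONCE accesses on a plain int that the patch uses.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative mode numbers mirroring VCLOCK_NONE..VCLOCK_PVCLOCK. */
enum { SK_VCLOCK_NONE, SK_VCLOCK_TSC, SK_VCLOCK_HPET, SK_VCLOCK_PVCLOCK };

/* One bit per mode; a bit is set once and never cleared. */
static atomic_uint vclocks_used_sketch;

/* Called whenever a mode becomes the current clocksource. */
static void mark_vclock_used(int mode)
{
	atomic_fetch_or(&vclocks_used_sketch, 1u << mode);
}

/* Fault-time check: has this mode ever been selected? */
static bool vclock_was_used_sketch(int mode)
{
	return atomic_load(&vclocks_used_sketch) & (1u << mode);
}
```

Because bits are only ever set, switching away from the HPET later does
not revoke access; the check only rejects modes that were never
selected.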
We can't restrict faults to vclock pages that are associated with
the currently selected vclock due to a race: a process could start
to access the HPET for the first time and race against a switch away
from the HPET as the current clocksource. We can't segfault the
process trying to peek at the HPET in this case, even though the
process isn't going to do anything useful with the data.

Signed-off-by: Andy Lutomirski
---
 arch/x86/entry/vdso/vma.c               | 4 ++--
 arch/x86/entry/vsyscall/vsyscall_gtod.c | 9 ++++++++-
 arch/x86/include/asm/clocksource.h      | 9 +++++----
 arch/x86/include/asm/vgtod.h            | 6 ++++++
 4 files changed, 21 insertions(+), 7 deletions(-)

diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index 02221e98b83f..aa4f5b99f1f3 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -130,7 +130,7 @@ static int vvar_fault(const struct vm_special_mapping *sm,
 				    __pa_symbol(&__vvar_page) >> PAGE_SHIFT);
 	} else if (sym_offset == image->sym_hpet_page) {
 #ifdef CONFIG_HPET_TIMER
-		if (hpet_address) {
+		if (hpet_address && vclock_was_used(VCLOCK_HPET)) {
 			ret = vm_insert_pfn_prot(
 				vma,
 				(unsigned long)vmf->virtual_address,
@@ -141,7 +141,7 @@ static int vvar_fault(const struct vm_special_mapping *sm,
 	} else if (sym_offset == image->sym_pvclock_page) {
 		struct pvclock_vsyscall_time_info *pvti =
 			pvclock_pvti_cpu0_va();
-		if (pvti) {
+		if (pvti && vclock_was_used(VCLOCK_PVCLOCK)) {
 			ret = vm_insert_pfn(
 				vma,
 				(unsigned long)vmf->virtual_address,
diff --git a/arch/x86/entry/vsyscall/vsyscall_gtod.c b/arch/x86/entry/vsyscall/vsyscall_gtod.c
index 51e330416995..0fb3a104ac62 100644
--- a/arch/x86/entry/vsyscall/vsyscall_gtod.c
+++ b/arch/x86/entry/vsyscall/vsyscall_gtod.c
@@ -16,6 +16,8 @@
 #include
 #include
 
+int vclocks_used __read_mostly;
+
 DEFINE_VVAR(struct vsyscall_gtod_data, vsyscall_gtod_data);
 
 void update_vsyscall_tz(void)
@@ -26,12 +28,17 @@ void update_vsyscall_tz(void)
 
 void update_vsyscall(struct timekeeper *tk)
 {
+	int vclock_mode =
tk->tkr_mono.clock->archdata.vclock_mode;
 	struct vsyscall_gtod_data *vdata = &vsyscall_gtod_data;
 
+	/* Mark the new vclock used. */
+	BUILD_BUG_ON(VCLOCK_MAX >= 32);
+	WRITE_ONCE(vclocks_used, READ_ONCE(vclocks_used) | (1 << vclock_mode));
+
 	gtod_write_begin(vdata);
 
 	/* copy vsyscall data */
-	vdata->vclock_mode	= tk->tkr_mono.clock->archdata.vclock_mode;
+	vdata->vclock_mode	= vclock_mode;
 	vdata->cycle_last	= tk->tkr_mono.cycle_last;
 	vdata->mask		= tk->tkr_mono.mask;
 	vdata->mult		= tk->tkr_mono.mult;
diff --git a/arch/x86/include/asm/clocksource.h b/arch/x86/include/asm/clocksource.h
index eda81dc0f4ae..d194266acb28 100644
--- a/arch/x86/include/asm/clocksource.h
+++ b/arch/x86/include/asm/clocksource.h
@@ -3,10 +3,11 @@
 #ifndef _ASM_X86_CLOCKSOURCE_H
 #define _ASM_X86_CLOCKSOURCE_H
 
-#define VCLOCK_NONE 0  /* No vDSO clock available. */
-#define VCLOCK_TSC  1  /* vDSO should use vread_tsc. */
-#define VCLOCK_HPET 2  /* vDSO should use vread_hpet. */
-#define VCLOCK_PVCLOCK 3 /* vDSO should use vread_pvclock. */
+#define VCLOCK_NONE	0  /* No vDSO clock available. */
+#define VCLOCK_TSC	1  /* vDSO should use vread_tsc. */
+#define VCLOCK_HPET	2  /* vDSO should use vread_hpet. */
+#define VCLOCK_PVCLOCK	3  /* vDSO should use vread_pvclock. */
+#define VCLOCK_MAX	3
 
 struct arch_clocksource_data {
 	int vclock_mode;
diff --git a/arch/x86/include/asm/vgtod.h b/arch/x86/include/asm/vgtod.h
index f556c4843aa1..e728699db774 100644
--- a/arch/x86/include/asm/vgtod.h
+++ b/arch/x86/include/asm/vgtod.h
@@ -37,6 +37,12 @@ struct vsyscall_gtod_data {
 };
 extern struct vsyscall_gtod_data vsyscall_gtod_data;
 
+extern int vclocks_used;
+static inline bool vclock_was_used(int vclock)
+{
+	return READ_ONCE(vclocks_used) & (1 << vclock);
+}
+
 static inline unsigned gtod_read_begin(const struct vsyscall_gtod_data *s)
 {
 	unsigned ret;
--
2.5.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-ob0-f169.google.com (mail-ob0-f169.google.com [209.85.214.169]) by kanga.kvack.org (Postfix) with ESMTP id 787F882F99 for ; Wed, 23 Dec 2015 18:57:12 -0500 (EST)
Received: by mail-ob0-f169.google.com with SMTP id bx1so58030211obb.0 for ; Wed, 23 Dec 2015 15:57:12 -0800 (PST)
Received: from mail-oi0-x234.google.com (mail-oi0-x234.google.com. [2607:f8b0:4003:c06::234]) by mx.google.com with ESMTPS id b202si15212281oig.100.2015.12.23.15.57.11 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 23 Dec 2015 15:57:11 -0800 (PST)
Received: by mail-oi0-x234.google.com with SMTP id o124so131284285oia.1 for ; Wed, 23 Dec 2015 15:57:11 -0800 (PST)
MIME-Version: 1.0
In-Reply-To:
References:
From: Andy Lutomirski
Date: Wed, 23 Dec 2015 15:56:52 -0800
Message-ID:
Subject: Re: [PATCH v2 0/6] mm, x86/vdso: Special IO mapping improvements
Content-Type: text/plain; charset=UTF-8
Sender: owner-linux-mm@kvack.org
List-ID:
To: Andy Lutomirski, Oleg Nesterov, Kees Cook
Cc: X86 ML, "linux-kernel@vger.kernel.org", "linux-mm@kvack.org", Andrew Morton, Borislav Petkov

Hi Oleg and Kees-

I meant to cc you on this in the first place, but I failed. If you
have a few minutes, want to take a peek at these and see if you can
poke any holes in them? I'm reasonably confident that they're a
considerable improvement over the old state of affairs, but they might
still not be perfect.

Let me know if you want me to email out a fresh copy. This series
applies to tip:x86/asm.

--Andy

On Mon, Dec 14, 2015 at 10:31 AM, Andy Lutomirski wrote:
> This applies on top of the earlier vdso pvclock series I sent out.
> Once that lands in -tip, this will apply to -tip.
>
> This series cleans up the hack that is our vvar mapping. We currently
> initialize the vvar mapping as a special mapping vma backed by nothing
> whatsoever and then we abuse remap_pfn_range to populate it.
>
> This cheats the mm core, probably breaks under various evil madvise
> workloads, and prevents handling faults in more interesting ways.
>
> To clean it up, this series:
>
>  - Adds a special mapping .fault operation
>  - Adds a vm_insert_pfn_prot helper
>  - Uses the new .fault infrastructure in x86's vdso and vvar mappings
>  - Hardens the HPET mapping, mitigating an HW attack surface that bothers me
>
> akpm, can you ack patch 1?
>
> Changes from v1:
>  - Lots of changelog clarification requested by akpm
>  - Minor tweaks to style and comments in the first two patches
>
> Andy Lutomirski (6):
>   mm: Add a vm_special_mapping .fault method
>   mm: Add vm_insert_pfn_prot
>   x86/vdso: Track each mm's loaded vdso image as well as its base
>   x86,vdso: Use .fault for the vdso text mapping
>   x86,vdso: Use .fault instead of remap_pfn_range for the vvar mapping
>   x86/vdso: Disallow vvar access to vclock IO for never-used vclocks
>
>  arch/x86/entry/vdso/vdso2c.h            |   7 --
>  arch/x86/entry/vdso/vma.c               | 124 ++++++++++++++++++++------------
>  arch/x86/entry/vsyscall/vsyscall_gtod.c |   9 ++-
>  arch/x86/include/asm/clocksource.h      |   9 +--
>  arch/x86/include/asm/mmu.h              |   3 +-
>  arch/x86/include/asm/vdso.h             |   3 -
>  arch/x86/include/asm/vgtod.h            |   6 ++
>  include/linux/mm.h                      |   2 +
>  include/linux/mm_types.h                |  22 +++++-
>  mm/memory.c                             |  25 ++++++-
>  mm/mmap.c                               |  13 ++--
>  11 files changed, 151 insertions(+), 72 deletions(-)
>
> --
> 2.5.0
>

--
Andy Lutomirski
AMA Capital Management, LLC

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-oi0-f43.google.com (mail-oi0-f43.google.com [209.85.218.43]) by kanga.kvack.org (Postfix) with ESMTP id EFDAC6B0275 for ; Tue, 29 Dec 2015 08:13:09 -0500 (EST)
Received: by mail-oi0-f43.google.com with SMTP id o62so182411705oif.3 for ; Tue, 29 Dec 2015 05:13:09 -0800 (PST)
Received: from mail-ob0-x232.google.com (mail-ob0-x232.google.com. [2607:f8b0:4003:c01::232]) by mx.google.com with ESMTPS id fo4si25948254obb.104.2015.12.29.05.13.09 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 29 Dec 2015 05:13:09 -0800 (PST)
Received: by mail-ob0-x232.google.com with SMTP id 18so252014490obc.2 for ; Tue, 29 Dec 2015 05:13:09 -0800 (PST)
MIME-Version: 1.0
In-Reply-To:
References:
From: Andy Lutomirski
Date: Tue, 29 Dec 2015 05:12:49 -0800
Message-ID:
Subject: Re: [PATCH v2 0/6] mm, x86/vdso: Special IO mapping improvements
Content-Type: text/plain; charset=UTF-8
Sender: owner-linux-mm@kvack.org
List-ID:
To: Andy Lutomirski, Oleg Nesterov, Kees Cook
Cc: X86 ML, "linux-kernel@vger.kernel.org", "linux-mm@kvack.org", Andrew Morton, Borislav Petkov

On Wed, Dec 23, 2015 at 3:56 PM, Andy Lutomirski wrote:
> Hi Oleg and Kees-
>
> I meant to cc you on this in the first place, but I failed. If you
> have a few minutes, want to take a peek at these and see if you can
> poke any holes in them? I'm reasonably confident that they're a
> considerable improvement over the old state of affairs, but they might
> still not be perfect.
>
> Let me know if you want me to email out a fresh copy. This series
> applies to tip:x86/asm.

Hi -tip people: please don't apply this series. It has a race. I'll send v3.

--Andy

>
> --Andy
>
> On Mon, Dec 14, 2015 at 10:31 AM, Andy Lutomirski wrote:
>> This applies on top of the earlier vdso pvclock series I sent out.
>> Once that lands in -tip, this will apply to -tip.
>> >> This series cleans up the hack that is our vvar mapping. We currently >> initialize the vvar mapping as a special mapping vma backed by nothing >> whatsoever and then we abuse remap_pfn_range to populate it. >> >> This cheats the mm core, probably breaks under various evil madvise >> workloads, and prevents handling faults in more interesting ways. >> >> To clean it up, this series: >> >> - Adds a special mapping .fault operation >> - Adds a vm_insert_pfn_prot helper >> - Uses the new .fault infrastructure in x86's vdso and vvar mappings >> - Hardens the HPET mapping, mitigating an HW attack surface that bothers me >> >> akpm, can you ack patch 1? >> >> Changes from v1: >> - Lots of changelog clarification requested by akpm >> - Minor tweaks to style and comments in the first two patches >> >> Andy Lutomirski (6): >> mm: Add a vm_special_mapping .fault method >> mm: Add vm_insert_pfn_prot >> x86/vdso: Track each mm's loaded vdso image as well as its base >> x86,vdso: Use .fault for the vdso text mapping >> x86,vdso: Use .fault instead of remap_pfn_range for the vvar mapping >> x86/vdso: Disallow vvar access to vclock IO for never-used vclocks >> >> arch/x86/entry/vdso/vdso2c.h | 7 -- >> arch/x86/entry/vdso/vma.c | 124 ++++++++++++++++++++------------ >> arch/x86/entry/vsyscall/vsyscall_gtod.c | 9 ++- >> arch/x86/include/asm/clocksource.h | 9 +-- >> arch/x86/include/asm/mmu.h | 3 +- >> arch/x86/include/asm/vdso.h | 3 - >> arch/x86/include/asm/vgtod.h | 6 ++ >> include/linux/mm.h | 2 + >> include/linux/mm_types.h | 22 +++++- >> mm/memory.c | 25 ++++++- >> mm/mmap.c | 13 ++-- >> 11 files changed, 151 insertions(+), 72 deletions(-) >> >> -- >> 2.5.0 >> > > > > -- > Andy Lutomirski > AMA Capital Management, LLC -- Andy Lutomirski AMA Capital Management, LLC
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753487AbbLNSb0 (ORCPT ); Mon, 14 Dec 2015 13:31:26 -0500 Received: from mail.kernel.org ([198.145.29.136]:51925 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752082AbbLNSbX (ORCPT ); Mon, 14 Dec 2015 13:31:23 -0500 From: Andy Lutomirski To: x86@kernel.org Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Andrew Morton , Andy Lutomirski Subject: [PATCH v2 0/6] mm, x86/vdso: Special IO mapping improvements Date: Mon, 14 Dec 2015 10:31:12 -0800 Message-Id: X-Mailer: git-send-email 2.5.0 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org This applies on top of the earlier vdso pvclock series I sent out. Once that lands in -tip, this will apply to -tip. This series cleans up the hack that is our vvar mapping. We currently initialize the vvar mapping as a special mapping vma backed by nothing whatsoever and then we abuse remap_pfn_range to populate it. This cheats the mm core, probably breaks under various evil madvise workloads, and prevents handling faults in more interesting ways. To clean it up, this series: - Adds a special mapping .fault operation - Adds a vm_insert_pfn_prot helper - Uses the new .fault infrastructure in x86's vdso and vvar mappings - Hardens the HPET mapping, mitigating an HW attack surface that bothers me akpm, can you ack patch 1? 
Changes from v1: - Lots of changelog clarification requested by akpm - Minor tweaks to style and comments in the first two patches Andy Lutomirski (6): mm: Add a vm_special_mapping .fault method mm: Add vm_insert_pfn_prot x86/vdso: Track each mm's loaded vdso image as well as its base x86,vdso: Use .fault for the vdso text mapping x86,vdso: Use .fault instead of remap_pfn_range for the vvar mapping x86/vdso: Disallow vvar access to vclock IO for never-used vclocks arch/x86/entry/vdso/vdso2c.h | 7 -- arch/x86/entry/vdso/vma.c | 124 ++++++++++++++++++++------------ arch/x86/entry/vsyscall/vsyscall_gtod.c | 9 ++- arch/x86/include/asm/clocksource.h | 9 +-- arch/x86/include/asm/mmu.h | 3 +- arch/x86/include/asm/vdso.h | 3 - arch/x86/include/asm/vgtod.h | 6 ++ include/linux/mm.h | 2 + include/linux/mm_types.h | 22 +++++- mm/memory.c | 25 ++++++- mm/mmap.c | 13 ++-- 11 files changed, 151 insertions(+), 72 deletions(-) -- 2.5.0 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932237AbbLNSdl (ORCPT ); Mon, 14 Dec 2015 13:33:41 -0500 Received: from mail.kernel.org ([198.145.29.136]:51939 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753227AbbLNSbZ (ORCPT ); Mon, 14 Dec 2015 13:31:25 -0500 From: Andy Lutomirski To: x86@kernel.org Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Andrew Morton , Andy Lutomirski , Andy Lutomirski Subject: [PATCH v2 1/6] mm: Add a vm_special_mapping .fault method Date: Mon, 14 Dec 2015 10:31:13 -0800 Message-Id: X-Mailer: git-send-email 2.5.0 In-Reply-To: References: In-Reply-To: References: Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Andy Lutomirski Requiring special mappings to give a list of struct pages is inflexible: it prevents sane use of IO memory in a special mapping, it's inefficient (it requires arch code to initialize a list of struct pages, and it requires 
the mm core to walk the entire list just to figure out how long it is), and it prevents arch code from doing anything fancy when a special mapping fault occurs. Add a .fault method as an alternative to filling in a .pages array. Looks-OK-to: Andrew Morton Signed-off-by: Andy Lutomirski --- Notes: Changes from v1: - Fixed "struct vm_special_mapping" code layout (akpm) - s/is// (akpm) include/linux/mm_types.h | 22 +++++++++++++++++++--- mm/mmap.c | 13 +++++++++---- 2 files changed, 28 insertions(+), 7 deletions(-) diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index f8d1492a114f..c88e48a3c155 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -568,10 +568,26 @@ static inline void clear_tlb_flush_pending(struct mm_struct *mm) } #endif -struct vm_special_mapping -{ - const char *name; +struct vm_fault; + +struct vm_special_mapping { + const char *name; /* The name, e.g. "[vdso]". */ + + /* + * If .fault is not provided, this points to a + * NULL-terminated array of pages that back the special mapping. + * + * This must not be NULL unless .fault is provided. + */ struct page **pages; + + /* + * If non-NULL, then this is called to resolve page faults + * on the special mapping. If used, .pages is not checked. 
+ */ + int (*fault)(const struct vm_special_mapping *sm, + struct vm_area_struct *vma, + struct vm_fault *vmf); }; enum tlb_flush_reason { diff --git a/mm/mmap.c b/mm/mmap.c index 2ce04a649f6b..f717453b1a57 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -3030,11 +3030,16 @@ static int special_mapping_fault(struct vm_area_struct *vma, pgoff_t pgoff; struct page **pages; - if (vma->vm_ops == &legacy_special_mapping_vmops) + if (vma->vm_ops == &legacy_special_mapping_vmops) { pages = vma->vm_private_data; - else - pages = ((struct vm_special_mapping *)vma->vm_private_data)-> - pages; + } else { + struct vm_special_mapping *sm = vma->vm_private_data; + + if (sm->fault) + return sm->fault(sm, vma, vmf); + + pages = sm->pages; + } for (pgoff = vmf->pgoff; pgoff && *pages; ++pages) pgoff--; -- 2.5.0 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753508AbbLNSb3 (ORCPT ); Mon, 14 Dec 2015 13:31:29 -0500 Received: from mail.kernel.org ([198.145.29.136]:51959 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753490AbbLNSb0 (ORCPT ); Mon, 14 Dec 2015 13:31:26 -0500 From: Andy Lutomirski To: x86@kernel.org Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Andrew Morton , Andy Lutomirski Subject: [PATCH v2 2/6] mm: Add vm_insert_pfn_prot Date: Mon, 14 Dec 2015 10:31:14 -0800 Message-Id: <946cc8ead6807b4026b88785156797e42e0d652a.1450117783.git.luto@kernel.org> X-Mailer: git-send-email 2.5.0 In-Reply-To: References: In-Reply-To: References: Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org The x86 vvar vma contains pages with differing cacheability flags. x86 currently implements this by manually inserting all the ptes using (io_)remap_pfn_range when the vma is set up. x86 wants to move to using .fault with VM_FAULT_NOPAGE to set up the mappings as needed. 
The correct API to use to insert a pfn in .fault is vm_insert_pfn, but vm_insert_pfn can't override the vma's cache mode, and the HPET page in particular needs to be uncached despite the fact that the rest of the VMA is cached. Add vm_insert_pfn_prot to support varying cacheability within the same non-COW VMA in a more sane manner. x86 could alternatively use multiple VMAs, but that's messy, would break CRIU, and would create unnecessary VMAs that would waste memory. Acked-by: Andrew Morton Signed-off-by: Andy Lutomirski --- Notes: Changes from v1: - Improve the changelog (akpm) include/linux/mm.h | 2 ++ mm/memory.c | 25 +++++++++++++++++++++++-- 2 files changed, 25 insertions(+), 2 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 00bad7793788..87ef1d7730ba 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2080,6 +2080,8 @@ int remap_pfn_range(struct vm_area_struct *, unsigned long addr, int vm_insert_page(struct vm_area_struct *, unsigned long addr, struct page *); int vm_insert_pfn(struct vm_area_struct *vma, unsigned long addr, unsigned long pfn); +int vm_insert_pfn_prot(struct vm_area_struct *vma, unsigned long addr, + unsigned long pfn, pgprot_t pgprot); int vm_insert_mixed(struct vm_area_struct *vma, unsigned long addr, unsigned long pfn); int vm_iomap_memory(struct vm_area_struct *vma, phys_addr_t start, unsigned long len); diff --git a/mm/memory.c b/mm/memory.c index c387430f06c3..a29f0b90fc56 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -1564,8 +1564,29 @@ out: int vm_insert_pfn(struct vm_area_struct *vma, unsigned long addr, unsigned long pfn) { + return vm_insert_pfn_prot(vma, addr, pfn, vma->vm_page_prot); +} +EXPORT_SYMBOL(vm_insert_pfn); + +/** + * vm_insert_pfn_prot - insert single pfn into user vma with specified pgprot + * @vma: user vma to map to + * @addr: target user address of this page + * @pfn: source kernel pfn + * @pgprot: pgprot flags for the inserted page + * + * This is exactly like vm_insert_pfn, 
except that it allows drivers to + * override pgprot on a per-page basis. + * + * This only makes sense for IO mappings, and it makes no sense for + * cow mappings. In general, using multiple vmas is preferable; + * vm_insert_pfn_prot should only be used if using multiple VMAs is + * impractical. + */ +int vm_insert_pfn_prot(struct vm_area_struct *vma, unsigned long addr, + unsigned long pfn, pgprot_t pgprot) +{ int ret; - pgprot_t pgprot = vma->vm_page_prot; /* * Technically, architectures with pte_special can avoid all these * restrictions (same for remap_pfn_range). However we would like @@ -1587,7 +1608,7 @@ int vm_insert_pfn(struct vm_area_struct *vma, unsigned long addr, return ret; } -EXPORT_SYMBOL(vm_insert_pfn); +EXPORT_SYMBOL(vm_insert_pfn_prot); int vm_insert_mixed(struct vm_area_struct *vma, unsigned long addr, unsigned long pfn) -- 2.5.0 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932212AbbLNSdS (ORCPT ); Mon, 14 Dec 2015 13:33:18 -0500 Received: from mail.kernel.org ([198.145.29.136]:51977 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752082AbbLNSb1 (ORCPT ); Mon, 14 Dec 2015 13:31:27 -0500 From: Andy Lutomirski To: x86@kernel.org Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Andrew Morton , Andy Lutomirski Subject: [PATCH v2 3/6] x86/vdso: Track each mm's loaded vdso image as well as its base Date: Mon, 14 Dec 2015 10:31:15 -0800 Message-Id: <69bab428e0db14fc1bc1add051d2a294760137dc.1450117783.git.luto@kernel.org> X-Mailer: git-send-email 2.5.0 In-Reply-To: References: In-Reply-To: References: Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org As we start to do more intelligent things with the vdso at runtime (as opposed to just at mm initialization time), we'll need to know which vdso is in use. 
In principle, we could guess based on the mm type, but that's over-complicated and error-prone. Instead, just track it in the mmu context. Signed-off-by: Andy Lutomirski --- arch/x86/entry/vdso/vma.c | 1 + arch/x86/include/asm/mmu.h | 3 ++- 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c index b8f69e264ac4..80b021067bd6 100644 --- a/arch/x86/entry/vdso/vma.c +++ b/arch/x86/entry/vdso/vma.c @@ -121,6 +121,7 @@ static int map_vdso(const struct vdso_image *image, bool calculate_addr) text_start = addr - image->sym_vvar_start; current->mm->context.vdso = (void __user *)text_start; + current->mm->context.vdso_image = image; /* * MAYWRITE to allow gdb to COW and set breakpoints diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h index 55234d5e7160..1ea0baef1175 100644 --- a/arch/x86/include/asm/mmu.h +++ b/arch/x86/include/asm/mmu.h @@ -19,7 +19,8 @@ typedef struct { #endif struct mutex lock; - void __user *vdso; + void __user *vdso; /* vdso base address */ + const struct vdso_image *vdso_image; /* vdso image in use */ atomic_t perf_rdpmc_allowed; /* nonzero if rdpmc is allowed */ } mm_context_t; -- 2.5.0 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753531AbbLNSbc (ORCPT ); Mon, 14 Dec 2015 13:31:32 -0500 Received: from mail.kernel.org ([198.145.29.136]:52002 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753490AbbLNSb3 (ORCPT ); Mon, 14 Dec 2015 13:31:29 -0500 From: Andy Lutomirski To: x86@kernel.org Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Andrew Morton , Andy Lutomirski Subject: [PATCH v2 4/6] x86,vdso: Use .fault for the vdso text mapping Date: Mon, 14 Dec 2015 10:31:16 -0800 Message-Id: X-Mailer: git-send-email 2.5.0 In-Reply-To: References: In-Reply-To: References: Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: 
linux-kernel@vger.kernel.org The old scheme for mapping the vdso text is rather complicated. vdso2c generates a struct vm_special_mapping and a blank .pages array of the correct size for each vdso image. Init code in vdso/vma.c populates the .pages array for each vdso image, and the mapping code selects the appropriate struct vm_special_mapping. With .fault, we can use a less roundabout approach: vdso_fault just returns the appropriate page for the selected vdso image. Signed-off-by: Andy Lutomirski --- arch/x86/entry/vdso/vdso2c.h | 7 ------- arch/x86/entry/vdso/vma.c | 26 +++++++++++++++++++------- arch/x86/include/asm/vdso.h | 3 --- 3 files changed, 19 insertions(+), 17 deletions(-) diff --git a/arch/x86/entry/vdso/vdso2c.h b/arch/x86/entry/vdso/vdso2c.h index 0224987556ce..abe961c7c71c 100644 --- a/arch/x86/entry/vdso/vdso2c.h +++ b/arch/x86/entry/vdso/vdso2c.h @@ -150,16 +150,9 @@ static void BITSFUNC(go)(void *raw_addr, size_t raw_len, } fprintf(outfile, "\n};\n\n"); - fprintf(outfile, "static struct page *pages[%lu];\n\n", - mapping_size / 4096); - fprintf(outfile, "const struct vdso_image %s = {\n", name); fprintf(outfile, "\t.data = raw_data,\n"); fprintf(outfile, "\t.size = %lu,\n", mapping_size); - fprintf(outfile, "\t.text_mapping = {\n"); - fprintf(outfile, "\t\t.name = \"[vdso]\",\n"); - fprintf(outfile, "\t\t.pages = pages,\n"); - fprintf(outfile, "\t},\n"); if (alt_sec) { fprintf(outfile, "\t.alt = %lu,\n", (unsigned long)GET_LE(&alt_sec->sh_offset)); diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c index 80b021067bd6..eb50d7c1f161 100644 --- a/arch/x86/entry/vdso/vma.c +++ b/arch/x86/entry/vdso/vma.c @@ -27,13 +27,7 @@ unsigned int __read_mostly vdso64_enabled = 1; void __init init_vdso_image(const struct vdso_image *image) { - int i; - int npages = (image->size) / PAGE_SIZE; - BUG_ON(image->size % PAGE_SIZE != 0); - for (i = 0; i < npages; i++) - image->text_mapping.pages[i] = - virt_to_page(image->data + i*PAGE_SIZE); 
apply_alternatives((struct alt_instr *)(image->data + image->alt), (struct alt_instr *)(image->data + image->alt + @@ -90,6 +84,24 @@ static unsigned long vdso_addr(unsigned long start, unsigned len) #endif } +static int vdso_fault(const struct vm_special_mapping *sm, + struct vm_area_struct *vma, struct vm_fault *vmf) +{ + const struct vdso_image *image = vma->vm_mm->context.vdso_image; + + if (!image || (vmf->pgoff << PAGE_SHIFT) >= image->size) + return VM_FAULT_SIGBUS; + + vmf->page = virt_to_page(image->data + (vmf->pgoff << PAGE_SHIFT)); + get_page(vmf->page); + return 0; +} + +static const struct vm_special_mapping text_mapping = { + .name = "[vdso]", + .fault = vdso_fault, +}; + static int map_vdso(const struct vdso_image *image, bool calculate_addr) { struct mm_struct *mm = current->mm; @@ -131,7 +143,7 @@ static int map_vdso(const struct vdso_image *image, bool calculate_addr) image->size, VM_READ|VM_EXEC| VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC, - &image->text_mapping); + &text_mapping); if (IS_ERR(vma)) { ret = PTR_ERR(vma); diff --git a/arch/x86/include/asm/vdso.h b/arch/x86/include/asm/vdso.h index deabaf9759b6..43dc55be524e 100644 --- a/arch/x86/include/asm/vdso.h +++ b/arch/x86/include/asm/vdso.h @@ -13,9 +13,6 @@ struct vdso_image { void *data; unsigned long size; /* Always a multiple of PAGE_SIZE */ - /* text_mapping.pages is big enough for data/size page pointers */ - struct vm_special_mapping text_mapping; - unsigned long alt, alt_len; long sym_vvar_start; /* Negative offset to the vvar area */ -- 2.5.0 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932137AbbLNScD (ORCPT ); Mon, 14 Dec 2015 13:32:03 -0500 Received: from mail.kernel.org ([198.145.29.136]:52032 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753219AbbLNSbb (ORCPT ); Mon, 14 Dec 2015 13:31:31 -0500 From: Andy Lutomirski To: x86@kernel.org Cc: linux-kernel@vger.kernel.org, 
linux-mm@kvack.org, Andrew Morton , Andy Lutomirski Subject: [PATCH v2 5/6] x86,vdso: Use .fault instead of remap_pfn_range for the vvar mapping Date: Mon, 14 Dec 2015 10:31:17 -0800 Message-Id: X-Mailer: git-send-email 2.5.0 In-Reply-To: References: In-Reply-To: References: Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org This is IMO much less ugly, and it also opens the door to disallowing unprivileged userspace HPET access on systems with usable TSCs. Signed-off-by: Andy Lutomirski --- arch/x86/entry/vdso/vma.c | 97 ++++++++++++++++++++++++++++------------------- 1 file changed, 57 insertions(+), 40 deletions(-) diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c index eb50d7c1f161..02221e98b83f 100644 --- a/arch/x86/entry/vdso/vma.c +++ b/arch/x86/entry/vdso/vma.c @@ -102,18 +102,69 @@ static const struct vm_special_mapping text_mapping = { .fault = vdso_fault, }; +static int vvar_fault(const struct vm_special_mapping *sm, + struct vm_area_struct *vma, struct vm_fault *vmf) +{ + const struct vdso_image *image = vma->vm_mm->context.vdso_image; + long sym_offset; + int ret = -EFAULT; + + if (!image) + return VM_FAULT_SIGBUS; + + sym_offset = (long)(vmf->pgoff << PAGE_SHIFT) + + image->sym_vvar_start; + + /* + * Sanity check: a symbol offset of zero means that the page + * does not exist for this vdso image, not that the page is at + * offset zero relative to the text mapping. This should be + * impossible here, because sym_offset should only be zero for + * the page past the end of the vvar mapping. 
+ */ + if (sym_offset == 0) + return VM_FAULT_SIGBUS; + + if (sym_offset == image->sym_vvar_page) { + ret = vm_insert_pfn(vma, (unsigned long)vmf->virtual_address, + __pa_symbol(&__vvar_page) >> PAGE_SHIFT); + } else if (sym_offset == image->sym_hpet_page) { +#ifdef CONFIG_HPET_TIMER + if (hpet_address) { + ret = vm_insert_pfn_prot( + vma, + (unsigned long)vmf->virtual_address, + hpet_address >> PAGE_SHIFT, + pgprot_noncached(PAGE_READONLY)); + } +#endif + } else if (sym_offset == image->sym_pvclock_page) { + struct pvclock_vsyscall_time_info *pvti = + pvclock_pvti_cpu0_va(); + if (pvti) { + ret = vm_insert_pfn( + vma, + (unsigned long)vmf->virtual_address, + __pa(pvti) >> PAGE_SHIFT); + } + } + + if (ret == 0) + return VM_FAULT_NOPAGE; + + return VM_FAULT_SIGBUS; +} + static int map_vdso(const struct vdso_image *image, bool calculate_addr) { struct mm_struct *mm = current->mm; struct vm_area_struct *vma; unsigned long addr, text_start; int ret = 0; - static struct page *no_pages[] = {NULL}; - static struct vm_special_mapping vvar_mapping = { + static const struct vm_special_mapping vvar_mapping = { .name = "[vvar]", - .pages = no_pages, + .fault = vvar_fault, }; - struct pvclock_vsyscall_time_info *pvti; if (calculate_addr) { addr = vdso_addr(current->mm->start_stack, @@ -153,7 +204,8 @@ static int map_vdso(const struct vdso_image *image, bool calculate_addr) vma = _install_special_mapping(mm, addr, -image->sym_vvar_start, - VM_READ|VM_MAYREAD, + VM_READ|VM_MAYREAD|VM_IO|VM_DONTDUMP| + VM_PFNMAP, &vvar_mapping); if (IS_ERR(vma)) { @@ -161,41 +213,6 @@ static int map_vdso(const struct vdso_image *image, bool calculate_addr) goto up_fail; } - if (image->sym_vvar_page) - ret = remap_pfn_range(vma, - text_start + image->sym_vvar_page, - __pa_symbol(&__vvar_page) >> PAGE_SHIFT, - PAGE_SIZE, - PAGE_READONLY); - - if (ret) - goto up_fail; - -#ifdef CONFIG_HPET_TIMER - if (hpet_address && image->sym_hpet_page) { - ret = io_remap_pfn_range(vma, - text_start + 
image->sym_hpet_page, - hpet_address >> PAGE_SHIFT, - PAGE_SIZE, - pgprot_noncached(PAGE_READONLY)); - - if (ret) - goto up_fail; - } -#endif - - pvti = pvclock_pvti_cpu0_va(); - if (pvti && image->sym_pvclock_page) { - ret = remap_pfn_range(vma, - text_start + image->sym_pvclock_page, - __pa(pvti) >> PAGE_SHIFT, - PAGE_SIZE, - PAGE_READONLY); - - if (ret) - goto up_fail; - } - up_fail: if (ret) current->mm->context.vdso = NULL; -- 2.5.0 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753554AbbLNSbg (ORCPT ); Mon, 14 Dec 2015 13:31:36 -0500 Received: from mail.kernel.org ([198.145.29.136]:52055 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753538AbbLNSbc (ORCPT ); Mon, 14 Dec 2015 13:31:32 -0500 From: Andy Lutomirski To: x86@kernel.org Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Andrew Morton , Andy Lutomirski Subject: [PATCH v2 6/6] x86/vdso: Disallow vvar access to vclock IO for never-used vclocks Date: Mon, 14 Dec 2015 10:31:18 -0800 Message-Id: X-Mailer: git-send-email 2.5.0 In-Reply-To: References: In-Reply-To: References: Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org It makes me uncomfortable that even modern systems grant every process direct read access to the HPET. While fixing this for real without regressing anything is a mess (unmapping the HPET is tricky because we don't adequately track all the mappings), we can do almost as well by tracking which vclocks have ever been used and only allowing pages associated with used vclocks to be faulted in. This will cause rogue programs that try to peek at the HPET to get SIGBUS instead on most systems. 
We can't restrict faults to vclock pages that are associated with the currently selected vclock due to a race: a process could start to access the HPET for the first time and race against a switch away from the HPET as the current clocksource. We can't segfault the process trying to peek at the HPET in this case, even though the process isn't going to do anything useful with the data. Signed-off-by: Andy Lutomirski --- arch/x86/entry/vdso/vma.c | 4 ++-- arch/x86/entry/vsyscall/vsyscall_gtod.c | 9 ++++++++- arch/x86/include/asm/clocksource.h | 9 +++++---- arch/x86/include/asm/vgtod.h | 6 ++++++ 4 files changed, 21 insertions(+), 7 deletions(-) diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c index 02221e98b83f..aa4f5b99f1f3 100644 --- a/arch/x86/entry/vdso/vma.c +++ b/arch/x86/entry/vdso/vma.c @@ -130,7 +130,7 @@ static int vvar_fault(const struct vm_special_mapping *sm, __pa_symbol(&__vvar_page) >> PAGE_SHIFT); } else if (sym_offset == image->sym_hpet_page) { #ifdef CONFIG_HPET_TIMER - if (hpet_address) { + if (hpet_address && vclock_was_used(VCLOCK_HPET)) { ret = vm_insert_pfn_prot( vma, (unsigned long)vmf->virtual_address, @@ -141,7 +141,7 @@ static int vvar_fault(const struct vm_special_mapping *sm, } else if (sym_offset == image->sym_pvclock_page) { struct pvclock_vsyscall_time_info *pvti = pvclock_pvti_cpu0_va(); - if (pvti) { + if (pvti && vclock_was_used(VCLOCK_PVCLOCK)) { ret = vm_insert_pfn( vma, (unsigned long)vmf->virtual_address, diff --git a/arch/x86/entry/vsyscall/vsyscall_gtod.c b/arch/x86/entry/vsyscall/vsyscall_gtod.c index 51e330416995..0fb3a104ac62 100644 --- a/arch/x86/entry/vsyscall/vsyscall_gtod.c +++ b/arch/x86/entry/vsyscall/vsyscall_gtod.c @@ -16,6 +16,8 @@ #include #include +int vclocks_used __read_mostly; + DEFINE_VVAR(struct vsyscall_gtod_data, vsyscall_gtod_data); void update_vsyscall_tz(void) @@ -26,12 +28,17 @@ void update_vsyscall_tz(void) void update_vsyscall(struct timekeeper *tk) { + int vclock_mode = 
tk->tkr_mono.clock->archdata.vclock_mode; struct vsyscall_gtod_data *vdata = &vsyscall_gtod_data; + /* Mark the new vclock used. */ + BUILD_BUG_ON(VCLOCK_MAX >= 32); + WRITE_ONCE(vclocks_used, READ_ONCE(vclocks_used) | (1 << vclock_mode)); + gtod_write_begin(vdata); /* copy vsyscall data */ - vdata->vclock_mode = tk->tkr_mono.clock->archdata.vclock_mode; + vdata->vclock_mode = vclock_mode; vdata->cycle_last = tk->tkr_mono.cycle_last; vdata->mask = tk->tkr_mono.mask; vdata->mult = tk->tkr_mono.mult; diff --git a/arch/x86/include/asm/clocksource.h b/arch/x86/include/asm/clocksource.h index eda81dc0f4ae..d194266acb28 100644 --- a/arch/x86/include/asm/clocksource.h +++ b/arch/x86/include/asm/clocksource.h @@ -3,10 +3,11 @@ #ifndef _ASM_X86_CLOCKSOURCE_H #define _ASM_X86_CLOCKSOURCE_H -#define VCLOCK_NONE 0 /* No vDSO clock available. */ -#define VCLOCK_TSC 1 /* vDSO should use vread_tsc. */ -#define VCLOCK_HPET 2 /* vDSO should use vread_hpet. */ -#define VCLOCK_PVCLOCK 3 /* vDSO should use vread_pvclock. */ +#define VCLOCK_NONE 0 /* No vDSO clock available. */ +#define VCLOCK_TSC 1 /* vDSO should use vread_tsc. */ +#define VCLOCK_HPET 2 /* vDSO should use vread_hpet. */ +#define VCLOCK_PVCLOCK 3 /* vDSO should use vread_pvclock. 
*/ +#define VCLOCK_MAX 3 struct arch_clocksource_data { int vclock_mode; diff --git a/arch/x86/include/asm/vgtod.h b/arch/x86/include/asm/vgtod.h index f556c4843aa1..e728699db774 100644 --- a/arch/x86/include/asm/vgtod.h +++ b/arch/x86/include/asm/vgtod.h @@ -37,6 +37,12 @@ struct vsyscall_gtod_data { }; extern struct vsyscall_gtod_data vsyscall_gtod_data; +extern int vclocks_used; +static inline bool vclock_was_used(int vclock) +{ + return READ_ONCE(vclocks_used) & (1 << vclock); +} + static inline unsigned gtod_read_begin(const struct vsyscall_gtod_data *s) { unsigned ret; -- 2.5.0 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S965933AbbLWX5N (ORCPT ); Wed, 23 Dec 2015 18:57:13 -0500 Received: from mail-oi0-f48.google.com ([209.85.218.48]:36282 "EHLO mail-oi0-f48.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S965898AbbLWX5M (ORCPT ); Wed, 23 Dec 2015 18:57:12 -0500 MIME-Version: 1.0 In-Reply-To: References: From: Andy Lutomirski Date: Wed, 23 Dec 2015 15:56:52 -0800 Message-ID: Subject: Re: [PATCH v2 0/6] mm, x86/vdso: Special IO mapping improvements To: Andy Lutomirski , Oleg Nesterov , Kees Cook Cc: X86 ML , "linux-kernel@vger.kernel.org" , "linux-mm@kvack.org" , Andrew Morton , Borislav Petkov Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Oleg and Kees- I meant to cc you on this in the first place, but I failed. If you have a few minutes, want to take a peek at these and see if you can poke any holes in them? I'm reasonably confident that they're a considerable improvement over the old state of affairs, but they might still not be perfect. Let me know if you want me to email out a fresh copy. This series applies to tip:x86/asm. --Andy On Mon, Dec 14, 2015 at 10:31 AM, Andy Lutomirski wrote: > This applies on top of the earlier vdso pvclock series I sent out. 
> Once that lands in -tip, this will apply to -tip. > > This series cleans up the hack that is our vvar mapping. We currently > initialize the vvar mapping as a special mapping vma backed by nothing > whatsoever and then we abuse remap_pfn_range to populate it. > > This cheats the mm core, probably breaks under various evil madvise > workloads, and prevents handling faults in more interesting ways. > > To clean it up, this series: > > - Adds a special mapping .fault operation > - Adds a vm_insert_pfn_prot helper > - Uses the new .fault infrastructure in x86's vdso and vvar mappings > - Hardens the HPET mapping, mitigating an HW attack surface that bothers me > > akpm, can you ack patch 1? > > Changes from v1: > - Lots of changelog clarification requested by akpm > - Minor tweaks to style and comments in the first two patches > > Andy Lutomirski (6): > mm: Add a vm_special_mapping .fault method > mm: Add vm_insert_pfn_prot > x86/vdso: Track each mm's loaded vdso image as well as its base > x86,vdso: Use .fault for the vdso text mapping > x86,vdso: Use .fault instead of remap_pfn_range for the vvar mapping > x86/vdso: Disallow vvar access to vclock IO for never-used vclocks > > arch/x86/entry/vdso/vdso2c.h | 7 -- > arch/x86/entry/vdso/vma.c | 124 ++++++++++++++++++++------------ > arch/x86/entry/vsyscall/vsyscall_gtod.c | 9 ++- > arch/x86/include/asm/clocksource.h | 9 +-- > arch/x86/include/asm/mmu.h | 3 +- > arch/x86/include/asm/vdso.h | 3 - > arch/x86/include/asm/vgtod.h | 6 ++ > include/linux/mm.h | 2 + > include/linux/mm_types.h | 22 +++++- > mm/memory.c | 25 ++++++- > mm/mmap.c | 13 ++-- > 11 files changed, 151 insertions(+), 72 deletions(-) > > -- > 2.5.0 > -- Andy Lutomirski AMA Capital Management, LLC From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753160AbbL2NNM (ORCPT ); Tue, 29 Dec 2015 08:13:12 -0500 Received: from mail-ob0-f175.google.com ([209.85.214.175]:34419 "EHLO 
mail-ob0-f175.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752074AbbL2NNJ (ORCPT ); Tue, 29 Dec 2015 08:13:09 -0500 MIME-Version: 1.0 In-Reply-To: References: From: Andy Lutomirski Date: Tue, 29 Dec 2015 05:12:49 -0800 Message-ID: Subject: Re: [PATCH v2 0/6] mm, x86/vdso: Special IO mapping improvements To: Andy Lutomirski , Oleg Nesterov , Kees Cook Cc: X86 ML , "linux-kernel@vger.kernel.org" , "linux-mm@kvack.org" , Andrew Morton , Borislav Petkov Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Dec 23, 2015 at 3:56 PM, Andy Lutomirski wrote: > Hi Oleg and Kees- > > I meant to cc you on this in the first place, but I failed. If you > have a few minutes, want to take a peek at these and see if you can > poke any holes in them? I'm reasonably confident that they're a > considerable improvement over the old state of affairs, but they might > still not be perfect. > > Let me know if you want me to email out a fresh copy. This series > applies to tip:x86/asm. Hi -tip people: please don't apply this series. It has a race. I'll send v3. --Andy > > --Andy > > On Mon, Dec 14, 2015 at 10:31 AM, Andy Lutomirski wrote: >> This applies on top of the earlier vdso pvclock series I sent out. >> Once that lands in -tip, this will apply to -tip. >> >> This series cleans up the hack that is our vvar mapping. We currently >> initialize the vvar mapping as a special mapping vma backed by nothing >> whatsoever and then we abuse remap_pfn_range to populate it. >> >> This cheats the mm core, probably breaks under various evil madvise >> workloads, and prevents handling faults in more interesting ways. 
>> >> To clean it up, this series: >> >> - Adds a special mapping .fault operation >> - Adds a vm_insert_pfn_prot helper >> - Uses the new .fault infrastructure in x86's vdso and vvar mappings >> - Hardens the HPET mapping, mitigating an HW attack surface that bothers me >> >> akpm, can you ack patch 1? >> >> Changes from v1: >> - Lots of changelog clarification requested by akpm >> - Minor tweaks to style and comments in the first two patches >> >> Andy Lutomirski (6): >> mm: Add a vm_special_mapping .fault method >> mm: Add vm_insert_pfn_prot >> x86/vdso: Track each mm's loaded vdso image as well as its base >> x86,vdso: Use .fault for the vdso text mapping >> x86,vdso: Use .fault instead of remap_pfn_range for the vvar mapping >> x86/vdso: Disallow vvar access to vclock IO for never-used vclocks >> >> arch/x86/entry/vdso/vdso2c.h | 7 -- >> arch/x86/entry/vdso/vma.c | 124 ++++++++++++++++++++------------ >> arch/x86/entry/vsyscall/vsyscall_gtod.c | 9 ++- >> arch/x86/include/asm/clocksource.h | 9 +-- >> arch/x86/include/asm/mmu.h | 3 +- >> arch/x86/include/asm/vdso.h | 3 - >> arch/x86/include/asm/vgtod.h | 6 ++ >> include/linux/mm.h | 2 + >> include/linux/mm_types.h | 22 +++++- >> mm/memory.c | 25 ++++++- >> mm/mmap.c | 13 ++-- >> 11 files changed, 151 insertions(+), 72 deletions(-) >> >> -- >> 2.5.0 >> > > > > -- > Andy Lutomirski > AMA Capital Management, LLC -- Andy Lutomirski AMA Capital Management, LLC
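
For reference, the "ever used" vclock tracking from patch 6/6 can be sketched in plain userspace C. This is only an illustration under stated assumptions: mark_vclock_used() and hpet_page_may_be_faulted() are hypothetical names that do not appear in the series, the READ_ONCE/WRITE_ONCE accessors are dropped, and nothing here touches real page tables. It only shows that bits in vclocks_used are set and never cleared, so a vclock page stays faultable once its clock has been selected at any point.

```c
#include <assert.h>
#include <stdbool.h>

/* Values copied from the clocksource.h hunk in patch 6/6. */
#define VCLOCK_NONE    0
#define VCLOCK_TSC     1
#define VCLOCK_HPET    2
#define VCLOCK_PVCLOCK 3
#define VCLOCK_MAX     3

/* One bit per vclock mode that has ever been the current clocksource. */
static unsigned int vclocks_used;

/* Hypothetical helper: the patch does this inline in update_vsyscall(). */
static void mark_vclock_used(int vclock_mode)
{
	assert(vclock_mode >= 0 && vclock_mode <= VCLOCK_MAX);
	vclocks_used |= 1u << vclock_mode;
}

/* Same test as the patch's vclock_was_used(), minus READ_ONCE(). */
static bool vclock_was_used(int vclock)
{
	return (vclocks_used & (1u << vclock)) != 0;
}

/* Hypothetical stand-in for the HPET branch of vvar_fault(). */
static bool hpet_page_may_be_faulted(bool hpet_present)
{
	return hpet_present && vclock_was_used(VCLOCK_HPET);
}
```

On a system that only ever runs with the TSC clocksource, hpet_page_may_be_faulted(true) stays false, which corresponds to the SIGBUS case described in the patch 6/6 changelog.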