* Do x86 NX and AMD prefetch check cause page fault infinite loop?

From: Jamie Lokier @ 2004-06-30  1:38 UTC (permalink / raw)
To: linux-kernel, Andi Kleen, Ingo Molnar

I was looking at the code which checks for prefetch instructions in the page fault path on x86 and x86-64, due to the AMD bug where prefetch sometimes causes unwanted faults.

I wondered if simply returning when *EIP points to a prefetch instruction could cause an infinite loop of page faults, instead of the wanted SIGSEGV or SIGBUS. I know we went over it before, but I had another look.

AMD already confirmed that the erroneous fault won't reoccur when a prefetch instruction is returned to from the fault handler. So a loop can only occur if it's _not_ an erroneous fault, but instead the __is_prefetch() code is preventing a normal signal from ever being raised. Can this happen?

For it to happen, returning must immediately raise another fault, and that only happens if the page permission of *EIP doesn't permit execution. That can happen if another thread is changing permissions, but that's only transient -- it's not a loop.

Can it happen otherwise? For __is_prefetch() to say it's a prefetch instruction, it must successfully read and decode the instruction. That can only happen if the page containing the instruction is mapped readable (i.e. on x86 that means anything other than PROT_NONE), and the code segment limit is ok.

That means the answer to "can it get stuck in a loop" is _no_ on a plain 32-bit x86. That's because all such pages are executable and within the bounds of the code segment, even if it's a user-setup code segment.

But... what if the page is not executable? When NX is enabled on 32-bit x86, and on all x86-64 kernels, or even with the exec-shield patch's changes to the USER_CS limit (that limit isn't checked in __is_prefetch) -- those conditions all allow __is_prefetch() to read a prefetch instruction, cause the fault handler to return, and repeat.

This can only happen when something branches to a page with PROT_EXEC _not_ set, on a kernel which honours that, and the target address is a prefetch instruction.

That can happen due to malicious code, a programming error, or corruption.

The behaviour in such cases _should_ be SIGSEGV due to lack of execute permission. However, I think the behaviour will be an infinite loop.

I haven't tested this as I don't have the hardware for NX, and don't want to apply the non-NX exec-shield or PaX patches on a working Athlon box.

Can anyone confirm this is a real bug, or that it isn't and I missed the reason why not?

Thanks,
-- Jamie
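The decoding step this argument hinges on -- __is_prefetch() reading and recognising the opcode at *EIP -- can be sketched in userspace. This is a simplified illustration, not the kernel's code: it handles only the common one-byte prefixes and the two-byte prefetch opcodes (0F 0D for AMD PREFETCH/PREFETCHW, 0F 18 for the SSE PREFETCHh family), and `looks_like_prefetch` is a hypothetical name:

```c
#include <stddef.h>

/*
 * Simplified sketch of the check __is_prefetch() performs: do the
 * bytes at `insn` begin an x86 prefetch instruction?  Skips common
 * one-byte legacy prefixes, then looks for the two-byte opcodes
 * 0F 0D (AMD PREFETCH/PREFETCHW) and 0F 18 (SSE PREFETCHh).  The real
 * kernel code also validates the read against the user's code
 * segment; that is omitted here.
 */
static int looks_like_prefetch(const unsigned char *insn, size_t len)
{
    size_t i = 0;

    /* Skip segment-override and operand/address-size prefixes. */
    while (i < len) {
        unsigned char b = insn[i];
        if (b == 0x26 || b == 0x2e || b == 0x36 || b == 0x3e ||
            b == 0x64 || b == 0x65 || b == 0x66 || b == 0x67)
            i++;
        else
            break;
    }

    if (i + 1 >= len || insn[i] != 0x0f)
        return 0;
    return insn[i + 1] == 0x0d || insn[i + 1] == 0x18;
}
```

The point of the thread is that succeeding here says nothing about *execute* permission on the page: the decode only needs the bytes to be readable.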
* Re: Do x86 NX and AMD prefetch check cause page fault infinite loop?

From: Ingo Molnar @ 2004-06-30  5:50 UTC (permalink / raw)
To: Jamie Lokier; +Cc: linux-kernel, Andi Kleen

* Jamie Lokier <jamie@shareable.org> wrote:

> But... what if the page is not executable? When NX is enabled on
> 32-bit x86, and all x86-64 kernels, or even the exec-shield patch's
> changes to the USER_CS limit (that limit isn't checked in
> __is_prefetch) - those conditions all allow __is_prefetch() to read a
> prefetch instruction, cause the fault handler to return, and repeat.
>
> This can only happen when something branches to a page with PROT_EXEC
> _not_ set, on a kernel which honours that, and the target address is a
> prefetch instruction.
>
> That can happen due to malicious code, a programming error, or
> corruption.
>
> The behaviour in such cases _should_ be SIGSEGV due to lack of execute
> permission. However, I think the behaviour will be an infinite loop.
>
> I haven't tested this as I don't have the hardware for NX, and don't
> want to apply the non-NX exec-shield or PaX patches on a working
> Athlon box.
>
> Can anyone confirm this is a real bug, or that it isn't and I missed
> the reason why not?

i understand what you mean, but for this to trigger one would have to trigger the prefetch erratum _and_ then turn off executability in parallel, right? So the question is, is there a reliable way to trigger the pagefault situation, and if yes, how do you turn on NX - because right before the fault the instruction had to be executable.

	Ingo
* Re: Do x86 NX and AMD prefetch check cause page fault infinite loop?

From: Jamie Lokier @ 2004-06-30 14:21 UTC (permalink / raw)
To: Ingo Molnar; +Cc: linux-kernel

Ingo Molnar wrote:
> i understand what you mean, but for this to trigger one would have to
> trigger the prefetch erratum _and_ then turn off executability in
> parallel, right? So the question is, is there a reliable way to trigger
> the pagefault situation, and if yes, how do you turn on NX - because
> right before the fault the instruction had to be executable.

No need for anything in parallel. I think you can trigger it by jumping to a non-PROT_EXEC page where the target address is a prefetch -- or by falling through from the end of a PROT_EXEC page to a non-PROT_EXEC one. To be sure, both cases are obscure, but the resulting loop is still wrong.

Who knows, perhaps internal conditions of the chip prevent these particular prefetches from triggering the fault. After all, we're told that on returning from the fault handler, the prefetch won't fault again, and it's not obvious why that should be. It'd be very subtle though, and would deserve a comment.

-- Jamie
* Re: Do x86 NX and AMD prefetch check cause page fault infinite loop?

From: Jamie Lokier @ 2004-06-30 14:38 UTC (permalink / raw)
To: Ingo Molnar; +Cc: linux-kernel, Andi Kleen

On x86 and x86-64 with NX, is a fault due to non-exec permission distinguishable from a fault due to lack of read/write permissions? I.e. does the flags word have a different bit set?

If so, the solution is simple: don't just return if it's a non-exec fault.

(It's possible that won't work if the CPU is very speculative and generates data faults from prefetches despite them being in a non-exec area -- i.e. if the buggy data fault gets precedence over the non-exec fault or segment. But I'd hope that's not the case.)

-- Jamie
* Re: Do x86 NX and AMD prefetch check cause page fault infinite loop?

From: Jamie Lokier @ 2004-07-01  1:48 UTC (permalink / raw)
To: Ingo Molnar; +Cc: linux-kernel, Andi Kleen

Ingo, I think I now know what must be added to your 32-bit NX patch to prevent the "infinite loop without a signal" problem.

It appears the correct way to prevent that one possibility I thought of, with no side effects, is to add this test in i386/mm/fault.c:is_prefetch():

	/* Catch an obscure case of prefetch inside an NX page. */
	if (error_code & 16)
		return 0;

That means that it doesn't count as a prefetch fault if it's an _instruction_ fault. I.e. an instruction fault will always raise a signal. Bit 4 of error_code was kindly added alongside the NX feature by AMD.

(Tweak: Because early Intel 64-bit chips don't have NX, perhaps it should say "if ((error_code & 16) && boot_cpu_has(X86_FEATURE_NX))" instead -- if we find the bit isn't architecturally set to 0 for those chips.)

This test isn't needed in the plain, non-NX i386 kernel, because the condition can never occur. (Actually it can occur once, in a really obscure condition due to separate ITLB and DTLB loading and page table races with other CPUs, but it's transient so won't loop infinitely.)

Enjoy,
-- Jamie
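The decision rule proposed above can be sketched as a stand-alone function (hypothetical names, written for illustration only; 16 is bit 4 of the page-fault error code, the instruction-fetch bit AMD defined alongside NX):

```c
#include <stdbool.h>

#define PF_INSTR 16  /* error_code bit 4: fault was an instruction fetch */

/*
 * Sketch of the proposed is_prefetch() guard: on an NX-capable kernel
 * an instruction-fetch fault can never be the AMD prefetch erratum,
 * so it must not be swallowed -- it falls through and raises a signal.
 */
static bool treat_as_prefetch_fault(unsigned long error_code, bool nx_capable,
                                    bool insn_decodes_as_prefetch)
{
    if (nx_capable && (error_code & PF_INSTR))
        return false;  /* instruction fault: always deliver the signal */
    return insn_decodes_as_prefetch;
}
```

With this shape, the only way a fault is silently retried is when it is a data-side fault whose instruction decodes as a prefetch -- exactly the erratum case.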
* Re: Do x86 NX and AMD prefetch check cause page fault infinite loop?

From: Ingo Molnar @ 2004-07-01  6:32 UTC (permalink / raw)
To: Jamie Lokier; +Cc: linux-kernel, Andi Kleen, Andrew Morton, Linus Torvalds

[-- Attachment #1: Type: text/plain, Size: 1265 bytes --]

* Jamie Lokier <jamie@shareable.org> wrote:

> Ingo, I think I now know what must be added to your 32-bit NX patch to
> prevent the "infinite loop without a signal" problem.
>
> It appears the correct way to prevent that one possibility I thought
> of, with no side effects, is to add this test in
> i386/mm/fault.c:is_prefetch():
>
>	/* Catch an obscure case of prefetch inside an NX page. */
>	if (error_code & 16)
>		return 0;
>
> That means that it doesn't count as a prefetch fault if it's an
> _instruction_ fault. I.e. an instruction fault will always raise a
> signal. Bit 4 of error_code was kindly added alongside the NX feature
> by AMD.
>
> (Tweak: Because early Intel 64-bit chips don't have NX, perhaps it
> should say "if ((error_code & 16) && boot_cpu_has(X86_FEATURE_NX))"
> instead -- if we find the bit isn't architecturally set to 0 for those
> chips).

Thanks for the analysis Jamie, this should certainly solve the problem. I've attached a patch against BK that implements this. I've tested the patched x86 kernel on an athlon64 box and on a non-NX box - it works fine. Bit 4 also simplifies the detection of illegal code execution within the kernel - i retested that too and it still works fine.

	Ingo

[-- Attachment #2: nx-prefetch-fix-2.6.7-A2 --]
[-- Type: text/plain, Size: 3595 bytes --]

- fix possible prefetch-fault loop on NX page, based on suggestions
  from Jamie Lokier.
- clean up nx feature dependencies
- simplify detection of NX-violations when the kernel executes code

Signed-off-by: Ingo Molnar <mingo@elte.hu>

--- linux/arch/i386/mm/fault.c.orig
+++ linux/arch/i386/mm/fault.c
@@ -188,11 +188,16 @@ static int __is_prefetch(struct pt_regs
 	return prefetch;
 }
 
-static inline int is_prefetch(struct pt_regs *regs, unsigned long addr)
+static inline int is_prefetch(struct pt_regs *regs, unsigned long addr,
+			      unsigned long error_code)
 {
 	if (unlikely(boot_cpu_data.x86_vendor == X86_VENDOR_AMD &&
-		     boot_cpu_data.x86 >= 6))
+		     boot_cpu_data.x86 >= 6)) {
+		/* Catch an obscure case of prefetch inside an NX page. */
+		if (nx_enabled && (error_code & 16))
+			return 0;
 		return __is_prefetch(regs, addr);
+	}
 	return 0;
 }
 
@@ -374,7 +379,7 @@ bad_area_nosemaphore:
 	 * Valid to do another page fault here because this one came
 	 * from user space.
 	 */
-	if (is_prefetch(regs, address))
+	if (is_prefetch(regs, address, error_code))
 		return;
 
 	tsk->thread.cr2 = address;
@@ -415,7 +420,7 @@ no_context:
 	 * had been triggered by is_prefetch fixup_exception would have
 	 * handled it.
 	 */
-	if (is_prefetch(regs, address))
+	if (is_prefetch(regs, address, error_code))
 		return;
 
 	/*
@@ -425,21 +430,8 @@ no_context:
 
 	bust_spinlocks(1);
 
-#ifdef CONFIG_X86_PAE
-	{
-		pgd_t *pgd;
-		pmd_t *pmd;
-
-
-
-		pgd = init_mm.pgd + pgd_index(address);
-		if (pgd_present(*pgd)) {
-			pmd = pmd_offset(pgd, address);
-			if (pmd_val(*pmd) & _PAGE_NX)
-				printk(KERN_CRIT "kernel tried to access NX-protected page - exploit attempt? (uid: %d)\n", current->uid);
-		}
-	}
-#endif
+	if (nx_enabled && (error_code & 16))
+		printk(KERN_CRIT "kernel tried to execute NX-protected page - exploit attempt? (uid: %d)\n", current->uid);
 	if (address < PAGE_SIZE)
 		printk(KERN_ALERT "Unable to handle kernel NULL pointer dereference");
 	else
@@ -492,7 +484,7 @@ do_sigbus:
 		goto no_context;
 
 	/* User space => ok to do another page fault */
-	if (is_prefetch(regs, address))
+	if (is_prefetch(regs, address, error_code))
 		return;
 
 	tsk->thread.cr2 = address;
--- linux/arch/i386/mm/init.c.orig
+++ linux/arch/i386/mm/init.c
@@ -437,7 +437,7 @@ static int __init noexec_setup(char *str
 __setup("noexec=", noexec_setup);
 
 #ifdef CONFIG_X86_PAE
-static int use_nx = 0;
+int nx_enabled = 0;
 
 static void __init set_nx(void)
 {
@@ -449,7 +449,7 @@ static void __init set_nx(void)
 		rdmsr(MSR_EFER, l, h);
 		l |= EFER_NX;
 		wrmsr(MSR_EFER, l, h);
-		use_nx = 1;
+		nx_enabled = 1;
 		__supported_pte_mask |= _PAGE_NX;
 	}
 }
@@ -468,7 +468,7 @@ void __init paging_init(void)
 {
 #ifdef CONFIG_X86_PAE
 	set_nx();
-	if (use_nx)
+	if (nx_enabled)
 		printk("NX (Execute Disable) protection: active\n");
 #endif
--- linux/include/asm-i386/page.h.orig
+++ linux/include/asm-i386/page.h
@@ -41,6 +41,7 @@
  */
 #ifdef CONFIG_X86_PAE
 extern unsigned long long __supported_pte_mask;
+extern int nx_enabled;
 typedef struct { unsigned long pte_low, pte_high; } pte_t;
 typedef struct { unsigned long long pmd; } pmd_t;
 typedef struct { unsigned long long pgd; } pgd_t;
@@ -48,6 +49,7 @@ typedef struct { unsigned long long pgpr
 #define pte_val(x)	((x).pte_low | ((unsigned long long)(x).pte_high << 32))
 #define HPAGE_SHIFT	21
 #else
+#define nx_enabled 0
 typedef struct { unsigned long pte_low; } pte_t;
 typedef struct { unsigned long pmd; } pmd_t;
 typedef struct { unsigned long pgd; } pgd_t;
* Re: Do x86 NX and AMD prefetch check cause page fault infinite loop?

From: Jamie Lokier @ 2004-07-01 15:04 UTC (permalink / raw)
To: Ingo Molnar; +Cc: linux-kernel, Andi Kleen, Andrew Morton, Linus Torvalds

Ingo Molnar wrote:
> -#ifdef CONFIG_X86_PAE
> -	{
> -		pgd_t *pgd;
> -		pmd_t *pmd;
> -		pgd = init_mm.pgd + pgd_index(address);
> -		if (pgd_present(*pgd)) {
> -			pmd = pmd_offset(pgd, address);
> -			if (pmd_val(*pmd) & _PAGE_NX)
> -				printk(KERN_CRIT "kernel tried to access NX-protected page - exploit attempt? (uid: %d)\n", current->uid);
> -		}
> -	}
> -#endif
> +	if (nx_enabled && (error_code & 16))
> +		printk(KERN_CRIT "kernel tried to execute NX-protected page - exploit attempt? (uid: %d)\n", current->uid);

According to AMD's manual, bit 4 of error_code means the fault was due to an instruction fetch. It doesn't imply that it's an NX-protected page: it might be a page not present fault instead. (The manual doesn't spell that out, it just says the bit is set when it's an instruction fetch.)

Just so you realise that the above code fragments aren't logically equivalent.

-- Jamie
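The distinction being drawn here can be stated in code. The bit layout below is the architectural page-fault error code (bit 0 = protection vs. not-present, bit 4 = instruction fetch); the helper name is illustrative, not a kernel API:

```c
#include <stdbool.h>

/* x86 page-fault error code bits, per the AMD64 manual */
#define PF_PROT  0x01  /* 1: protection violation; 0: page not present */
#define PF_WRITE 0x02  /* fault was a write */
#define PF_USER  0x04  /* fault happened in user mode */
#define PF_RSVD  0x08  /* reserved bit set in a page table entry */
#define PF_INSTR 0x10  /* fault was an instruction fetch */

/*
 * Bit 4 alone says "instruction fetch", not "NX violation": a fetch
 * from a not-present page also sets it.  Only a protection fault
 * (PF_PROT set) on an instruction fetch implicates a present,
 * NX-protected page.
 */
static bool fetch_fault_on_present_page(unsigned long error_code)
{
    return (error_code & PF_INSTR) && (error_code & PF_PROT);
}
```

This is why Ingo's follow-up patch additionally walks the page tables before printing the "exploit attempt?" warning.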
* Re: Do x86 NX and AMD prefetch check cause page fault infinite loop?

From: Ingo Molnar @ 2004-07-02  7:15 UTC (permalink / raw)
To: Jamie Lokier; +Cc: linux-kernel, Andi Kleen, Andrew Morton, Linus Torvalds

* Jamie Lokier <jamie@shareable.org> wrote:

> > -		if (pmd_val(*pmd) & _PAGE_NX)
> > -			printk(KERN_CRIT "kernel tried to access NX-protected page - exploit attempt? (uid: %d)\n", current->uid);
> > -		}
> > -	}
> > -#endif
> > +	if (nx_enabled && (error_code & 16))
> > +		printk(KERN_CRIT "kernel tried to execute NX-protected page - exploit attempt? (uid: %d)\n", current->uid);
>
> According to AMD's manual, bit 4 of error_code means the fault was due
> to an instruction fetch. It doesn't imply that it's an NX-protected
> page: it might be a page not present fault instead. (The manual
> doesn't spell that out, it just says the bit is set when it's an
> instruction fetch).

you are right, it doesn't say it's an NX related fault. I'll test this out and send a delta patch.

	Ingo
* [patch] i386 nx prefetch fix & cleanups, 2.6.7-mm5

From: Ingo Molnar @ 2004-07-02  8:50 UTC (permalink / raw)
To: Jamie Lokier; +Cc: linux-kernel, Andi Kleen, Andrew Morton, Linus Torvalds

[-- Attachment #1: Type: text/plain, Size: 1668 bytes --]

* Jamie Lokier <jamie@shareable.org> wrote:

> > +	if (nx_enabled && (error_code & 16))
> > +		printk(KERN_CRIT "kernel tried to execute NX-protected page - exploit attempt? (uid: %d)\n", current->uid);
>
> According to AMD's manual, bit 4 of error_code means the fault was due
> to an instruction fetch. It doesn't imply that it's an NX-protected
> page: it might be a page not present fault instead. (The manual
> doesn't spell that out, it just says the bit is set when it's an
> instruction fetch).
>
> Just so you realise that the above code fragments aren't logically
> equivalent.

i've attached an updated nx-prefetch-fix.patch that properly fixes this. I've also attached a delta relative to the previous patch. This patch will only print the 'possible exploit' warning if the kernel tries to execute a present page. (Hence not printing the message in the quite common jump-to-address-zero crash case.)

I've test-compiled and test-booted the full patch on 2.6.7-mm5 using the following x86 kernel configs: SMP+PAE, UP+PAE, SMP+!PAE, UP+!PAE, on an SMP P3 and an Athlon64 box. I've tested various types of instruction-fetch related kernel faults on the Athlon64 box, it all works fine.

The full changelog:

- fix possible prefetch-fault loop on NX page, based on suggestions
  from Jamie Lokier.
- clean up nx feature dependencies
- simplify detection of NX-violations when the kernel executes code
- introduce pte_exec_kern() to simplify the NX logic
- split the definitions out of pgtable-[23]level.h into
  pgtable-[23]level-defs.h, to enable the former to use generic pte
  functions from pgtable.h.
Signed-off-by: Ingo Molnar <mingo@elte.hu>

[-- Attachment #2: nx-prefetch-fix.patch --]
[-- Type: text/plain, Size: 9660 bytes --]

- fix possible prefetch-fault loop on NX page, based on suggestions
  from Jamie Lokier.
- clean up nx feature dependencies
- simplify detection of NX-violations when the kernel executes code
- introduce pte_exec_kern() to simplify the NX logic
- split the definitions out of pgtable-[23]level.h into
  pgtable-[23]level-defs.h, to enable the former to use generic pte
  functions from pgtable.h.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

--- linux/arch/i386/mm/fault.c.orig
+++ linux/arch/i386/mm/fault.c
@@ -188,11 +188,16 @@ static int __is_prefetch(struct pt_regs
 	return prefetch;
 }
 
-static inline int is_prefetch(struct pt_regs *regs, unsigned long addr)
+static inline int is_prefetch(struct pt_regs *regs, unsigned long addr,
+			      unsigned long error_code)
 {
 	if (unlikely(boot_cpu_data.x86_vendor == X86_VENDOR_AMD &&
-		     boot_cpu_data.x86 >= 6))
+		     boot_cpu_data.x86 >= 6)) {
+		/* Catch an obscure case of prefetch inside an NX page. */
+		if (nx_enabled && (error_code & 16))
+			return 0;
 		return __is_prefetch(regs, addr);
+	}
 	return 0;
 }
 
@@ -374,7 +379,7 @@ bad_area_nosemaphore:
 	 * Valid to do another page fault here because this one came
 	 * from user space.
 	 */
-	if (is_prefetch(regs, address))
+	if (is_prefetch(regs, address, error_code))
 		return;
 
 	tsk->thread.cr2 = address;
@@ -415,7 +420,7 @@ no_context:
 	 * had been triggered by is_prefetch fixup_exception would have
 	 * handled it.
 	 */
-	if (is_prefetch(regs, address))
+	if (is_prefetch(regs, address, error_code))
 		return;
 
 	/*
@@ -432,18 +437,11 @@ no_context:
 	bust_spinlocks(1);
 
 #ifdef CONFIG_X86_PAE
-	{
-		pgd_t *pgd;
-		pmd_t *pmd;
-
+	if (error_code & 16) {
+		pte_t *pte = lookup_address(address);
-
-		pgd = init_mm.pgd + pgd_index(address);
-		if (pgd_present(*pgd)) {
-			pmd = pmd_offset(pgd, address);
-			if (pmd_val(*pmd) & _PAGE_NX)
-				printk(KERN_CRIT "kernel tried to access NX-protected page - exploit attempt? (uid: %d)\n", current->uid);
-		}
+		if (pte && pte_present(*pte) && !pte_exec_kernel(*pte))
+			printk(KERN_CRIT "kernel tried to execute NX-protected page - exploit attempt? (uid: %d)\n", current->uid);
 	}
 #endif
 	if (address < PAGE_SIZE)
 		printk(KERN_ALERT "Unable to handle kernel NULL pointer dereference");
 	else
@@ -498,7 +496,7 @@ do_sigbus:
 		goto no_context;
 
 	/* User space => ok to do another page fault */
-	if (is_prefetch(regs, address))
+	if (is_prefetch(regs, address, error_code))
 		return;
 
 	tsk->thread.cr2 = address;
--- linux/arch/i386/mm/init.c.orig
+++ linux/arch/i386/mm/init.c
@@ -437,7 +437,7 @@ static int __init noexec_setup(char *str
 __setup("noexec=", noexec_setup);
 
 #ifdef CONFIG_X86_PAE
-static int use_nx = 0;
+int nx_enabled = 0;
 
 static void __init set_nx(void)
 {
@@ -449,7 +449,7 @@ static void __init set_nx(void)
 		rdmsr(MSR_EFER, l, h);
 		l |= EFER_NX;
 		wrmsr(MSR_EFER, l, h);
-		use_nx = 1;
+		nx_enabled = 1;
 		__supported_pte_mask |= _PAGE_NX;
 	}
 }
@@ -470,7 +470,7 @@ int __init set_kernel_exec(unsigned long
 	pte = lookup_address(vaddr);
 	BUG_ON(!pte);
 
-	if (pte_val(*pte) & _PAGE_NX)
+	if (!pte_exec_kernel(*pte))
 		ret = 0;
 
 	if (enable)
@@ -495,7 +495,7 @@ void __init paging_init(void)
 {
 #ifdef CONFIG_X86_PAE
 	set_nx();
-	if (use_nx)
+	if (nx_enabled)
 		printk("NX (Execute Disable) protection: active\n");
 #endif
--- linux/include/asm-i386/page.h.orig
+++ linux/include/asm-i386/page.h
@@ -41,6 +41,7 @@
  */
 #ifdef CONFIG_X86_PAE
 extern unsigned long long __supported_pte_mask;
+extern int nx_enabled;
 typedef struct { unsigned long pte_low, pte_high; } pte_t;
 typedef struct { unsigned long long pmd; } pmd_t;
 typedef struct { unsigned long long pgd; } pgd_t;
@@ -48,6 +49,7 @@ typedef struct { unsigned long long pgpr
 #define pte_val(x)	((x).pte_low | ((unsigned long long)(x).pte_high << 32))
 #define HPAGE_SHIFT	21
 #else
+#define nx_enabled 0
 typedef struct { unsigned long pte_low; } pte_t;
 typedef struct { unsigned long pmd; } pmd_t;
 typedef struct { unsigned long pgd; } pgd_t;
--- linux/include/asm-i386/pgtable.h.orig
+++ linux/include/asm-i386/pgtable.h
@@ -43,19 +43,15 @@ void pgd_dtor(void *, kmem_cache_t *, un
 void pgtable_cache_init(void);
 void paging_init(void);
 
-#endif /* !__ASSEMBLY__ */
-
 /*
  * The Linux x86 paging architecture is 'compile-time dual-mode', it
  * implements both the traditional 2-level x86 page tables and the
  * newer 3-level PAE-mode page tables.
  */
-#ifndef __ASSEMBLY__
 #ifdef CONFIG_X86_PAE
-# include <asm/pgtable-3level.h>
+# include <asm/pgtable-3level-defs.h>
 #else
-# include <asm/pgtable-2level.h>
-#endif
+# include <asm/pgtable-2level-defs.h>
 #endif
 
 #define PMD_SIZE	(1UL << PMD_SHIFT)
@@ -73,8 +69,6 @@ void paging_init(void);
 #define BOOT_USER_PGD_PTRS (__PAGE_OFFSET >> TWOLEVEL_PGDIR_SHIFT)
 #define BOOT_KERNEL_PGD_PTRS (1024-BOOT_USER_PGD_PTRS)
 
-
-#ifndef __ASSEMBLY__
 /* Just any arbitrary offset to the start of the vmalloc VM area: the
  * current 8MB value just means that there will be a 8MB "hole" after the
  * physical memory until the kernel virtual memory starts. That means that
@@ -223,7 +217,6 @@ extern unsigned long pg0[];
  */
 static inline int pte_user(pte_t pte)		{ return (pte).pte_low & _PAGE_USER; }
 static inline int pte_read(pte_t pte)		{ return (pte).pte_low & _PAGE_USER; }
-static inline int pte_exec(pte_t pte)		{ return (pte).pte_low & _PAGE_USER; }
 static inline int pte_dirty(pte_t pte)		{ return (pte).pte_low & _PAGE_DIRTY; }
 static inline int pte_young(pte_t pte)		{ return (pte).pte_low & _PAGE_ACCESSED; }
 static inline int pte_write(pte_t pte)		{ return (pte).pte_low & _PAGE_RW; }
@@ -244,6 +237,12 @@ static inline pte_t pte_mkdirty(pte_t pt
 static inline pte_t pte_mkyoung(pte_t pte)	{ (pte).pte_low |= _PAGE_ACCESSED; return pte; }
 static inline pte_t pte_mkwrite(pte_t pte)	{ (pte).pte_low |= _PAGE_RW; return pte; }
 
+#ifdef CONFIG_X86_PAE
+# include <asm/pgtable-3level.h>
+#else
+# include <asm/pgtable-2level.h>
+#endif
+
 static inline int ptep_test_and_clear_dirty(pte_t *ptep)
 {
 	if (!pte_dirty(*ptep))
--- linux/include/asm-i386/pgtable-2level.h.orig
+++ linux/include/asm-i386/pgtable-2level.h
@@ -1,22 +1,6 @@
 #ifndef _I386_PGTABLE_2LEVEL_H
 #define _I386_PGTABLE_2LEVEL_H
 
-/*
- * traditional i386 two-level paging structure:
- */
-
-#define PGDIR_SHIFT	22
-#define PTRS_PER_PGD	1024
-
-/*
- * the i386 is two-level, so we don't really have any
- * PMD directory physically.
- */
-#define PMD_SHIFT	22
-#define PTRS_PER_PMD	1
-
-#define PTRS_PER_PTE	1024
-
 #define pte_ERROR(e) \
 	printk("%s:%d: bad pte %08lx.\n", __FILE__, __LINE__, (e).pte_low)
 #define pmd_ERROR(e) \
@@ -64,6 +48,22 @@ static inline pmd_t * pmd_offset(pgd_t *
 #define pfn_pmd(pfn, prot)	__pmd(((pfn) << PAGE_SHIFT) | pgprot_val(prot))
 
 /*
+ * All present user pages are user-executable:
+ */
+static inline int pte_exec(pte_t pte)
+{
+	return pte_user(pte);
+}
+
+/*
+ * All present pages are kernel-executable:
+ */
+static inline int pte_exec_kernel(pte_t pte)
+{
+	return 1;
+}
+
+/*
  * Bits 0, 6 and 7 are taken, split up the 29 bits of offset
  * into this range:
  */
--- linux/include/asm-i386/pgtable-3level.h.orig
+++ linux/include/asm-i386/pgtable-3level.h
@@ -8,24 +8,6 @@
  * Copyright (C) 1999 Ingo Molnar <mingo@redhat.com>
 */
 
-/*
- * PGDIR_SHIFT determines what a top-level page table entry can map
- */
-#define PGDIR_SHIFT	30
-#define PTRS_PER_PGD	4
-
-/*
- * PMD_SHIFT determines the size of the area a middle-level
- * page table can map
- */
-#define PMD_SHIFT	21
-#define PTRS_PER_PMD	512
-
-/*
- * entries per page directory level
- */
-#define PTRS_PER_PTE	512
-
 #define pte_ERROR(e) \
 	printk("%s:%d: bad pte %p(%08lx%08lx).\n", __FILE__, __LINE__, &(e), (e).pte_high, (e).pte_low)
 #define pmd_ERROR(e) \
@@ -37,6 +19,29 @@ static inline int pgd_none(pgd_t pgd)		{
 static inline int pgd_bad(pgd_t pgd)		{ return 0; }
 static inline int pgd_present(pgd_t pgd)	{ return 1; }
 
+/*
+ * Is the pte executable?
+ */
+static inline int pte_x(pte_t pte)
+{
+	return !(pte_val(pte) & _PAGE_NX);
+}
+
+/*
+ * All present user-pages with !NX bit are user-executable:
+ */
+static inline int pte_exec(pte_t pte)
+{
+	return pte_user(pte) && pte_x(pte);
+}
+/*
+ * All present pages with !NX bit are kernel-executable:
+ */
+static inline int pte_exec_kernel(pte_t pte)
+{
+	return pte_x(pte);
+}
+
 /* Rules for using set_pte: the pte being assigned *must* be
  * either not present or in a state where the hardware will
  * not attempt to update the pte. In places where this is
--- linux/include/asm-i386/pgtable-3level-defs.h.orig
+++ linux/include/asm-i386/pgtable-3level-defs.h
@@ -0,0 +1,22 @@
+#ifndef _I386_PGTABLE_3LEVEL_DEFS_H
+#define _I386_PGTABLE_3LEVEL_DEFS_H
+
+/*
+ * PGDIR_SHIFT determines what a top-level page table entry can map
+ */
+#define PGDIR_SHIFT	30
+#define PTRS_PER_PGD	4
+
+/*
+ * PMD_SHIFT determines the size of the area a middle-level
+ * page table can map
+ */
+#define PMD_SHIFT	21
+#define PTRS_PER_PMD	512
+
+/*
+ * entries per page directory level
+ */
+#define PTRS_PER_PTE	512
+
+#endif /* _I386_PGTABLE_3LEVEL_DEFS_H */
--- linux/include/asm-i386/pgtable-2level-defs.h.orig
+++ linux/include/asm-i386/pgtable-2level-defs.h
@@ -0,0 +1,20 @@
+#ifndef _I386_PGTABLE_2LEVEL_DEFS_H
+#define _I386_PGTABLE_2LEVEL_DEFS_H
+
+/*
+ * traditional i386 two-level paging structure:
+ */
+
+#define PGDIR_SHIFT	22
+#define PTRS_PER_PGD	1024
+
+/*
+ * the i386 is two-level, so we don't really have any
+ * PMD directory physically.
+ */
+#define PMD_SHIFT	22
+#define PTRS_PER_PMD	1
+
+#define PTRS_PER_PTE	1024
+
+#endif /* _I386_PGTABLE_2LEVEL_DEFS_H */

[-- Attachment #3: nx-prefetch-fix-update.patch --]
[-- Type: text/plain, Size: 6796 bytes --]

- introduce pte_exec_kern() to simplify the NX logic
- split the definitions out of pgtable-[23]level.h into
  pgtable-[23]level-defs.h, to enable the former to use generic pte
  functions from pgtable.h.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

--- linux/arch/i386/mm/fault.c
+++ linux/arch/i386/mm/fault.c
@@ -436,8 +436,14 @@ no_context:
 
 	bust_spinlocks(1);
 
-	if (nx_enabled && (error_code & 16))
-		printk(KERN_CRIT "kernel tried to execute NX-protected page - exploit attempt? (uid: %d)\n", current->uid);
+#ifdef CONFIG_X86_PAE
+	if (error_code & 16) {
+		pte_t *pte = lookup_address(address);
+
+		if (pte && pte_present(*pte) && !pte_exec_kernel(*pte))
+			printk(KERN_CRIT "kernel tried to execute NX-protected page - exploit attempt? (uid: %d)\n", current->uid);
+	}
+#endif
 	if (address < PAGE_SIZE)
 		printk(KERN_ALERT "Unable to handle kernel NULL pointer dereference");
 	else
--- linux/arch/i386/mm/init.c
+++ linux/arch/i386/mm/init.c
@@ -470,7 +470,7 @@ int __init set_kernel_exec(unsigned long
 	pte = lookup_address(vaddr);
 	BUG_ON(!pte);
 
-	if (pte_val(*pte) & _PAGE_NX)
+	if (!pte_exec_kernel(*pte))
 		ret = 0;
 
 	if (enable)
--- linux/include/asm-i386/pgtable-2level-defs.h
+++ linux/include/asm-i386/pgtable-2level-defs.h
@@ -0,0 +1,20 @@
+#ifndef _I386_PGTABLE_2LEVEL_DEFS_H
+#define _I386_PGTABLE_2LEVEL_DEFS_H
+
+/*
+ * traditional i386 two-level paging structure:
+ */
+
+#define PGDIR_SHIFT	22
+#define PTRS_PER_PGD	1024
+
+/*
+ * the i386 is two-level, so we don't really have any
+ * PMD directory physically.
+ */
+#define PMD_SHIFT	22
+#define PTRS_PER_PMD	1
+
+#define PTRS_PER_PTE	1024
+
+#endif /* _I386_PGTABLE_2LEVEL_DEFS_H */
--- linux/include/asm-i386/pgtable-2level.h
+++ linux/include/asm-i386/pgtable-2level.h
@@ -1,22 +1,6 @@
 #ifndef _I386_PGTABLE_2LEVEL_H
 #define _I386_PGTABLE_2LEVEL_H
 
-/*
- * traditional i386 two-level paging structure:
- */
-
-#define PGDIR_SHIFT	22
-#define PTRS_PER_PGD	1024
-
-/*
- * the i386 is two-level, so we don't really have any
- * PMD directory physically.
- */
-#define PMD_SHIFT	22
-#define PTRS_PER_PMD	1
-
-#define PTRS_PER_PTE	1024
-
 #define pte_ERROR(e) \
 	printk("%s:%d: bad pte %08lx.\n", __FILE__, __LINE__, (e).pte_low)
 #define pmd_ERROR(e) \
@@ -64,6 +48,22 @@ static inline pmd_t * pmd_offset(pgd_t *
 #define pfn_pmd(pfn, prot)	__pmd(((pfn) << PAGE_SHIFT) | pgprot_val(prot))
 
 /*
+ * All present user pages are user-executable:
+ */
+static inline int pte_exec(pte_t pte)
+{
+	return pte_user(pte);
+}
+
+/*
+ * All present pages are kernel-executable:
+ */
+static inline int pte_exec_kernel(pte_t pte)
+{
+	return 1;
+}
+
+/*
 * Bits 0, 6 and 7 are taken, split up the 29 bits of offset
 * into this range:
 */
--- linux/include/asm-i386/pgtable-3level-defs.h
+++ linux/include/asm-i386/pgtable-3level-defs.h
@@ -0,0 +1,22 @@
+#ifndef _I386_PGTABLE_3LEVEL_DEFS_H
+#define _I386_PGTABLE_3LEVEL_DEFS_H
+
+/*
+ * PGDIR_SHIFT determines what a top-level page table entry can map
+ */
+#define PGDIR_SHIFT	30
+#define PTRS_PER_PGD	4
+
+/*
+ * PMD_SHIFT determines the size of the area a middle-level
+ * page table can map
+ */
+#define PMD_SHIFT	21
+#define PTRS_PER_PMD	512
+
+/*
+ * entries per page directory level
+ */
+#define PTRS_PER_PTE	512
+
+#endif /* _I386_PGTABLE_3LEVEL_DEFS_H */
--- linux/include/asm-i386/pgtable-3level.h
+++ linux/include/asm-i386/pgtable-3level.h
@@ -8,24 +8,6 @@
 * Copyright (C) 1999 Ingo Molnar <mingo@redhat.com>
 */
 
-/*
- * PGDIR_SHIFT determines what a top-level page table entry can map
- */
-#define PGDIR_SHIFT	30
-#define PTRS_PER_PGD	4
-
-/*
- * PMD_SHIFT determines the size of the area a middle-level
- * page table can map
- */
-#define PMD_SHIFT	21
-#define PTRS_PER_PMD	512
-
-/*
- * entries per page directory level
- */
-#define PTRS_PER_PTE	512
-
 #define pte_ERROR(e) \
 	printk("%s:%d: bad pte %p(%08lx%08lx).\n", __FILE__, __LINE__, &(e), (e).pte_high, (e).pte_low)
 #define pmd_ERROR(e) \
@@ -37,6 +19,29 @@ static inline int pgd_none(pgd_t pgd)		{
 static inline int pgd_bad(pgd_t pgd)		{ return 0; }
 static inline int pgd_present(pgd_t pgd)	{ return 1; }
 
+/*
+ * Is the pte executable?
+ */
+static inline int pte_x(pte_t pte)
+{
+	return !(pte_val(pte) & _PAGE_NX);
+}
+
+/*
+ * All present user-pages with !NX bit are user-executable:
+ */
+static inline int pte_exec(pte_t pte)
+{
+	return pte_user(pte) && pte_x(pte);
+}
+/*
+ * All present pages with !NX bit are kernel-executable:
+ */
+static inline int pte_exec_kernel(pte_t pte)
+{
+	return pte_x(pte);
+}
+
 /* Rules for using set_pte: the pte being assigned *must* be
 * either not present or in a state where the hardware will
 * not attempt to update the pte. In places where this is
--- linux/include/asm-i386/pgtable.h
+++ linux/include/asm-i386/pgtable.h
@@ -43,19 +43,15 @@ void pgd_dtor(void *, kmem_cache_t *, un
 void pgtable_cache_init(void);
 void paging_init(void);
 
-#endif /* !__ASSEMBLY__ */
-
 /*
 * The Linux x86 paging architecture is 'compile-time dual-mode', it
 * implements both the traditional 2-level x86 page tables and the
 * newer 3-level PAE-mode page tables.
 */
-#ifndef __ASSEMBLY__
 #ifdef CONFIG_X86_PAE
-# include <asm/pgtable-3level.h>
+# include <asm/pgtable-3level-defs.h>
 #else
-# include <asm/pgtable-2level.h>
-#endif
+# include <asm/pgtable-2level-defs.h>
 #endif
 
 #define PMD_SIZE	(1UL << PMD_SHIFT)
@@ -73,8 +69,6 @@ void paging_init(void);
 #define BOOT_USER_PGD_PTRS (__PAGE_OFFSET >> TWOLEVEL_PGDIR_SHIFT)
 #define BOOT_KERNEL_PGD_PTRS (1024-BOOT_USER_PGD_PTRS)
 
-
-#ifndef __ASSEMBLY__
 /* Just any arbitrary offset to the start of the vmalloc VM area: the
 * current 8MB value just means that there will be a 8MB "hole" after the
 * physical memory until the kernel virtual memory starts. That means that
@@ -223,7 +217,6 @@ extern unsigned long pg0[];
 */
 static inline int pte_user(pte_t pte)		{ return (pte).pte_low & _PAGE_USER; }
 static inline int pte_read(pte_t pte)		{ return (pte).pte_low & _PAGE_USER; }
-static inline int pte_exec(pte_t pte)		{ return (pte).pte_low & _PAGE_USER; }
 static inline int pte_dirty(pte_t pte)		{ return (pte).pte_low & _PAGE_DIRTY; }
 static inline int pte_young(pte_t pte)		{ return (pte).pte_low & _PAGE_ACCESSED; }
 static inline int pte_write(pte_t pte)		{ return (pte).pte_low & _PAGE_RW; }
@@ -244,6 +237,12 @@ static inline pte_t pte_mkdirty(pte_t pt
 static inline pte_t pte_mkyoung(pte_t pte)	{ (pte).pte_low |= _PAGE_ACCESSED; return pte; }
 static inline pte_t pte_mkwrite(pte_t pte)	{ (pte).pte_low |= _PAGE_RW; return pte; }
 
+#ifdef CONFIG_X86_PAE
+# include <asm/pgtable-3level.h>
+#else
+# include <asm/pgtable-2level.h>
+#endif
+
 static inline int ptep_test_and_clear_dirty(pte_t *ptep)
 {
 	if (!pte_dirty(*ptep))
* Re: Do x86 NX and AMD prefetch check cause page fault infinite loop?

From: Denis Vlasenko @ 2004-06-30  6:10 UTC (permalink / raw)
To: Jamie Lokier, linux-kernel, Andi Kleen, Ingo Molnar

On Wednesday 30 June 2004 04:38, Jamie Lokier wrote:
> I was looking at the code which checks for prefetch instructions in
> the page fault path on x86 and x86-64, due to the AMD bug where
> prefetch sometimes causes unwanted faults.
>
> I wondered if simply returning when *EIP points to a prefetch
> instruction could cause an infinite loop of page faults, instead of
> the wanted SIGSEGV or SIGBUS.  I know we went over it before, but I
> had another look.
>
> AMD already confirmed that the erroneous fault won't reoccur when a
> prefetch instruction is returned to from the fault handler.  So a loop
> can only occur if it's _not_ an erroneous fault, but instead the
> __is_prefetch() code is preventing a normal signal from ever being
> raised.

[snip]

> But... what if the page is not executable?  When NX is enabled on
> 32-bit x86, and all x86-64 kernels, or even the exec-shield patch's
> changes to the USER_CS limit (that limit isn't checked in
> __is_prefetch) - those conditions all allow __is_prefetch() to read a
> prefetch instruction, cause the fault handler to return, and repeat.
>
> This can only happen when something branches to a page with PROT_EXEC
> _not_ set, on a kernel which honours that, and the target address is a
> prefetch instruction.

Well. To be safe, just skip the prefetch instruction, always.
Hm. An attacker can supply us with a whole gigabyte of prefetches
back-to-back... Better skip all prefetches, with rescheduling
between every 1000 of them.
--
vda
* Re: Do x86 NX and AMD prefetch check cause page fault infinite loop?

From: Jamie Lokier @ 2004-06-30 14:23 UTC (permalink / raw)
To: Denis Vlasenko; +Cc: linux-kernel, Andi Kleen, Ingo Molnar

Denis Vlasenko wrote:
> > This can only happen when something branches to a page with PROT_EXEC
> > _not_ set, on a kernel which honours that, and the target address is a
> > prefetch instruction.
>
> Well. To be safe, just skip the prefetch instruction, always.
> Hm. An attacker can supply us with a whole gigabyte of prefetches
> back-to-back... Better skip all prefetches, with rescheduling
> between every 1000 of them.

You could just skip one and return from the handler.  That's not a bad
idea.

-- Jamie
Thread overview: 11+ messages (newest: 2004-07-02 13:59 UTC)

2004-06-30  1:38 Do x86 NX and AMD prefetch check cause page fault infinite loop? Jamie Lokier
2004-06-30  5:50 ` Ingo Molnar
2004-06-30 14:21 ` Jamie Lokier
2004-06-30 14:38 ` Jamie Lokier
2004-07-01  1:48 ` Jamie Lokier
2004-07-01  6:32 ` Ingo Molnar
2004-07-01 15:04 ` Jamie Lokier
2004-07-02  7:15 ` Ingo Molnar
2004-07-02  8:50 ` [patch] i386 nx prefetch fix & cleanups, 2.6.7-mm5 Ingo Molnar
2004-06-30  6:10 ` Do x86 NX and AMD prefetch check cause page fault infinite loop? Denis Vlasenko
2004-06-30 14:23 ` Jamie Lokier