public inbox for linux-kernel@vger.kernel.org
* Do x86 NX and AMD prefetch check cause page fault infinite loop?
@ 2004-06-30  1:38 Jamie Lokier
  2004-06-30  5:50 ` Ingo Molnar
  2004-06-30  6:10 ` Do x86 NX and AMD prefetch check cause page fault infinite loop? Denis Vlasenko
  0 siblings, 2 replies; 11+ messages in thread
From: Jamie Lokier @ 2004-06-30  1:38 UTC (permalink / raw)
  To: linux-kernel, Andi Kleen, Ingo Molnar

I was looking at the code which checks for prefetch instructions in
the page fault path on x86 and x86-64, due to the AMD bug where
prefetch sometimes causes unwanted faults.

I wondered if simply returning when *EIP points to a prefetch
instruction could cause an infinite loop of page faults, instead of
the wanted SIGSEGV or SIGBUS.  I know we went over it before, but I
had another look.

AMD already confirmed that the erroneous fault won't reoccur when a
prefetch instruction is returned to from the fault handler.  So a loop
can only occur if it's _not_ an erroneous fault, but instead the
__is_prefetch() code is preventing a normal signal from ever being
raised.

Can this happen?  For it to happen, returning must immediately raise
another fault, and that only happens if the page permission of *EIP
doesn't permit execution.

That can happen if another thread's changing permissions, but it's
only transient -- it's not a loop.  Can it happen otherwise?

For __is_prefetch() to say it's a prefetch instruction, it must
successfully read and decode the instruction.

That can only happen if the page containing the instruction is mapped
readable (i.e. on x86 that means anything other than PROT_NONE), and
the code segment limit is ok.

That means the answer to "can it get stuck in a loop" is _no_ on a
plain 32-bit x86.  That's because all such pages are executable and
within the bounds of the code segment, even if it's a user-setup code
segment.

But... what if the page is not executable?  When NX is enabled on
32-bit x86, and all x86-64 kernels, or even the exec-shield patch's
changes to the USER_CS limit (that limit isn't checked in
__is_prefetch) - those conditions all allow __is_prefetch() to read a
prefetch instruction, cause the fault handler to return, and repeat.

This can only happen when something branches to a page with PROT_EXEC
_not_ set, on a kernel which honours that, and the target address is a
prefetch instruction.

That can happen due to malicious code, a programming error, or
corruption.

The behaviour in such cases _should_ be SIGSEGV due to lack of execute
permission.  However, I think the behaviour will be an infinite loop.

I haven't tested this as I don't have the hardware for NX, and don't
want to apply the non-NX exec-shield or PaX patches on a working Athlon box.

Can anyone confirm this is a real bug, or that it isn't and I missed
the reason why not?

Thanks,
-- Jamie

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Do x86 NX and AMD prefetch check cause page fault infinite loop?
  2004-06-30  1:38 Do x86 NX and AMD prefetch check cause page fault infinite loop? Jamie Lokier
@ 2004-06-30  5:50 ` Ingo Molnar
  2004-06-30 14:21   ` Jamie Lokier
  2004-06-30 14:38   ` Jamie Lokier
  2004-06-30  6:10 ` Do x86 NX and AMD prefetch check cause page fault infinite loop? Denis Vlasenko
  1 sibling, 2 replies; 11+ messages in thread
From: Ingo Molnar @ 2004-06-30  5:50 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: linux-kernel, Andi Kleen


* Jamie Lokier <jamie@shareable.org> wrote:

> But... what if the page is not executable?  When NX is enabled on
> 32-bit x86, and all x86-64 kernels, or even the exec-shield patch's
> changes to the USER_CS limit (that limit isn't checked in
> __is_prefetch) - those conditions all allow __is_prefetch() to read a
> prefetch instruction, cause the fault handler to return, and repeat.
> 
> This can only happen when something branches to a page with PROT_EXEC
> _not_ set, on a kernel which honours that, and the target address is a
> prefetch instruction.
> 
> That can happen due to malicious code, a programming error, or
> corruption.
> 
> The behaviour in such cases _should_ be SIGSEGV due to lack of execute
> permission.  However, I think the behaviour will be an infinite loop.
> 
> I haven't tested this as I don't have the hardware for NX, and don't
> want to apply the non-NX exec-shield or PaX patches on a working
> Athlon box.
> 
> Can anyone confirm this is a real bug, or that it isn't and I missed
> the reason why not?

i understand what you mean, but for this to trigger one would have to
trigger the prefetch erratum _and_ then turn off executability in
parallel, right? So the question is, is there a reliable way to trigger
the pagefault situation, and if yes, how do you turn on NX - because
right before the fault the instruction had to be executable.

	Ingo


* Re: Do x86 NX and AMD prefetch check cause page fault infinite loop?
  2004-06-30  1:38 Do x86 NX and AMD prefetch check cause page fault infinite loop? Jamie Lokier
  2004-06-30  5:50 ` Ingo Molnar
@ 2004-06-30  6:10 ` Denis Vlasenko
  2004-06-30 14:23   ` Jamie Lokier
  1 sibling, 1 reply; 11+ messages in thread
From: Denis Vlasenko @ 2004-06-30  6:10 UTC (permalink / raw)
  To: Jamie Lokier, linux-kernel, Andi Kleen, Ingo Molnar

On Wednesday 30 June 2004 04:38, Jamie Lokier wrote:
> I was looking at the code which checks for prefetch instructions in
> the page fault path on x86 and x86-64, due to the AMD bug where
> prefetch sometimes causes unwanted faults.
>
> I wondered if simply returning when *EIP points to a prefetch
> instruction could cause an infinite loop of page faults, instead of
> the wanted SIGSEGV or SIGBUS.  I know we went over it before, but I
> had another look.
>
> AMD already confirmed that the erroneous fault won't reoccur when a
> prefetch instruction is returned to from the fault handler.  So a loop
> can only occur if it's _not_ an erroneous fault, but instead the
> __is_prefetch() code is preventing a normal signal from ever being
> raised.
[snip]
> But... what if the page is not executable?  When NX is enabled on
> 32-bit x86, and all x86-64 kernels, or even the exec-shield patch's
> changes to the USER_CS limit (that limit isn't checked in
> __is_prefetch) - those conditions all allow __is_prefetch() to read a
> prefetch instruction, cause the fault handler to return, and repeat.
>
> This can only happen when something branches to a page with PROT_EXEC
> _not_ set, on a kernel which honours that, and the target address is a
> prefetch instruction.

Well. To be safe, just skip the prefetch instruction, always.
Hm. An attacker can supply us with a whole gigabyte of
prefetches back-to-back... Better skip all prefetches,
with rescheduling between every 1000 of them.
-- 
vda


* Re: Do x86 NX and AMD prefetch check cause page fault infinite loop?
  2004-06-30  5:50 ` Ingo Molnar
@ 2004-06-30 14:21   ` Jamie Lokier
  2004-06-30 14:38   ` Jamie Lokier
  1 sibling, 0 replies; 11+ messages in thread
From: Jamie Lokier @ 2004-06-30 14:21 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel

Ingo Molnar wrote:
> i understand what you mean, but for this to trigger one would have to
> trigger the prefetch erratum _and_ then turn off executability in
> parallel, right? So the question is, is there a reliable way to trigger
> the pagefault situation, and if yes, how do you turn on NX - because
> right before the fault the instruction had to be executable.

No need for anything in parallel.

I think you can trigger it by jumping to a non-PROT_EXEC page where
the target address is a prefetch -- or by falling through from the end
of a PROT_EXEC page to a non-PROT_EXEC one.

To be sure, both cases are obscure, but the resulting loop is still wrong.

Who knows, perhaps internal conditions of the chip prevent these
particular prefetches from triggering the fault.  After all, we're
told that on returning from the fault handler, the prefetch won't
fault again, and it's not obvious why that should be.  It'd be very
subtle though, and deserve a comment.

-- Jamie



* Re: Do x86 NX and AMD prefetch check cause page fault infinite loop?
  2004-06-30  6:10 ` Do x86 NX and AMD prefetch check cause page fault infinite loop? Denis Vlasenko
@ 2004-06-30 14:23   ` Jamie Lokier
  0 siblings, 0 replies; 11+ messages in thread
From: Jamie Lokier @ 2004-06-30 14:23 UTC (permalink / raw)
  To: Denis Vlasenko; +Cc: linux-kernel, Andi Kleen, Ingo Molnar

Denis Vlasenko wrote:
> > This can only happen when something branches to a page with PROT_EXEC
> > _not_ set, on a kernel which honours that, and the target address is a
> > prefetch instruction.
> 
> Well. To be safe, just skip the prefetch instruction, always.
> Hm. An attacker can supply us with a whole gigabyte of
> prefetches back-to-back... Better skip all prefetches,
> with rescheduling between every 1000 of them.

You could just skip one and return from the handler.
That's not a bad idea.

-- Jamie


* Re: Do x86 NX and AMD prefetch check cause page fault infinite loop?
  2004-06-30  5:50 ` Ingo Molnar
  2004-06-30 14:21   ` Jamie Lokier
@ 2004-06-30 14:38   ` Jamie Lokier
  2004-07-01  1:48     ` Jamie Lokier
  1 sibling, 1 reply; 11+ messages in thread
From: Jamie Lokier @ 2004-06-30 14:38 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, Andi Kleen

On x86 and x86-64 with NX, is a fault due to non-exec permission
distinguishable from a fault due to lack of read/write permissions?

I.e. does the flags word have a different bit set?

If so, the solution is simple: don't just return if it's a non-exec fault.

(It's possible that won't work if the CPU is very speculative and
generates data faults from prefetches despite them being in a non-exec
area -- i.e. if the buggy data fault gets precedence over the non-exec
fault or segment.  But I'd hope that's not the case.)

-- Jamie



* Re: Do x86 NX and AMD prefetch check cause page fault infinite loop?
  2004-06-30 14:38   ` Jamie Lokier
@ 2004-07-01  1:48     ` Jamie Lokier
  2004-07-01  6:32       ` Ingo Molnar
  0 siblings, 1 reply; 11+ messages in thread
From: Jamie Lokier @ 2004-07-01  1:48 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, Andi Kleen

Ingo, I think I now know what must be added to your 32-bit NX patch to
prevent the "infinite loop without a signal" problem.

It appears the correct way to prevent that one possibility I thought
of, with no side effects, is to add this test in
i386/mm/fault.c:is_prefetch():

        /* Catch an obscure case of prefetch inside an NX page. */
        if (error_code & 16)
                return 0;

That means that it doesn't count as a prefetch fault if it's an
_instruction_ fault.  I.e. an instruction fault will always raise a
signal.  Bit 4 of error_code was kindly added alongside the NX feature
by AMD.

(Tweak: Because early Intel 64-bit chips don't have NX, perhaps it
should say "if ((error_code & 16) && boot_cpu_has(X86_FEATURE_NX))"
instead -- if we find the bit isn't architecturally set to 0 for those
chips).

This test isn't needed in the plain, non-NX i386 kernel, because the
condition can never occur.  (Actually it can occur once, in a really
obscure condition due to separate ITLB and DTLB loading and page-table
races with other CPUs, but it's transient so won't loop infinitely.)

Enjoy,
-- Jamie


* Re: Do x86 NX and AMD prefetch check cause page fault infinite loop?
  2004-07-01  1:48     ` Jamie Lokier
@ 2004-07-01  6:32       ` Ingo Molnar
  2004-07-01 15:04         ` Jamie Lokier
  0 siblings, 1 reply; 11+ messages in thread
From: Ingo Molnar @ 2004-07-01  6:32 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: linux-kernel, Andi Kleen, Andrew Morton, Linus Torvalds

[-- Attachment #1: Type: text/plain, Size: 1265 bytes --]


* Jamie Lokier <jamie@shareable.org> wrote:

> Ingo, I think I now know what must be added to your 32-bit NX patch to
> prevent the "infinite loop without a signal" problem.
> 
> It appears the correct way to prevent that one possibility I thought
> of, with no side effects, is to add this test in
> i386/mm/fault.c:is_prefetch():
> 
>         /* Catch an obscure case of prefetch inside an NX page. */
>         if (error_code & 16)
>                 return 0;
> 
> That means that it doesn't count as a prefetch fault if it's an
> _instruction_ fault.  I.e. an instruction fault will always raise a
> signal.  Bit 4 of error_code was kindly added alongside the NX feature
> by AMD.
> 
> (Tweak: Because early Intel 64-bit chips don't have NX, perhaps it
> should say "if ((error_code & 16) && boot_cpu_has(X86_FEATURE_NX))"
> instead -- if we find the bit isn't architecturally set to 0 for those
> chips).

Thanks for the analysis, Jamie - this should certainly solve the problem.

I've attached a patch against BK that implements this. I've tested the
patched x86 kernel on an athlon64 box and on a non-NX box - it works
fine. Bit 4 also simplifies the detection of illegal code execution
within the kernel - i retested that too and it still works fine.

	Ingo

[-- Attachment #2: nx-prefetch-fix-2.6.7-A2 --]
[-- Type: text/plain, Size: 3595 bytes --]


- fix possible prefetch-fault loop on NX page, based on suggestions
  from Jamie Lokier.

- clean up nx feature dependencies

- simplify detection of NX-violations when the kernel executes code

Signed-off-by: Ingo Molnar <mingo@elte.hu>

--- linux/arch/i386/mm/fault.c.orig	
+++ linux/arch/i386/mm/fault.c	
@@ -188,11 +188,16 @@ static int __is_prefetch(struct pt_regs 
 	return prefetch;
 }
 
-static inline int is_prefetch(struct pt_regs *regs, unsigned long addr)
+static inline int is_prefetch(struct pt_regs *regs, unsigned long addr,
+			      unsigned long error_code)
 {
 	if (unlikely(boot_cpu_data.x86_vendor == X86_VENDOR_AMD &&
-		     boot_cpu_data.x86 >= 6))
+		     boot_cpu_data.x86 >= 6)) {
+		/* Catch an obscure case of prefetch inside an NX page. */
+		if (nx_enabled && (error_code & 16))
+			return 0;
 		return __is_prefetch(regs, addr);
+	}
 	return 0;
 } 
 
@@ -374,7 +379,7 @@ bad_area_nosemaphore:
 		 * Valid to do another page fault here because this one came 
 		 * from user space.
 		 */
-		if (is_prefetch(regs, address))
+		if (is_prefetch(regs, address, error_code))
 			return;
 
 		tsk->thread.cr2 = address;
@@ -415,7 +420,7 @@ no_context:
 	 * had been triggered by is_prefetch fixup_exception would have 
 	 * handled it.
 	 */
- 	if (is_prefetch(regs, address))
+ 	if (is_prefetch(regs, address, error_code))
  		return;
 
 /*
@@ -425,21 +430,8 @@ no_context:
 
 	bust_spinlocks(1);
 
-#ifdef CONFIG_X86_PAE
-	{
-		pgd_t *pgd;
-		pmd_t *pmd;
-
-
-
-		pgd = init_mm.pgd + pgd_index(address);
-		if (pgd_present(*pgd)) {
-			pmd = pmd_offset(pgd, address);
-			if (pmd_val(*pmd) & _PAGE_NX)
-				printk(KERN_CRIT "kernel tried to access NX-protected page - exploit attempt? (uid: %d)\n", current->uid);
-		}
-	}
-#endif
+	if (nx_enabled && (error_code & 16))
+		printk(KERN_CRIT "kernel tried to execute NX-protected page - exploit attempt? (uid: %d)\n", current->uid);
 	if (address < PAGE_SIZE)
 		printk(KERN_ALERT "Unable to handle kernel NULL pointer dereference");
 	else
@@ -492,7 +484,7 @@ do_sigbus:
 		goto no_context;
 
 	/* User space => ok to do another page fault */
-	if (is_prefetch(regs, address))
+	if (is_prefetch(regs, address, error_code))
 		return;
 
 	tsk->thread.cr2 = address;
--- linux/arch/i386/mm/init.c.orig	
+++ linux/arch/i386/mm/init.c	
@@ -437,7 +437,7 @@ static int __init noexec_setup(char *str
 __setup("noexec=", noexec_setup);
 
 #ifdef CONFIG_X86_PAE
-static int use_nx = 0;
+int nx_enabled = 0;
 
 static void __init set_nx(void)
 {
@@ -449,7 +449,7 @@ static void __init set_nx(void)
 			rdmsr(MSR_EFER, l, h);
 			l |= EFER_NX;
 			wrmsr(MSR_EFER, l, h);
-			use_nx = 1;
+			nx_enabled = 1;
 			__supported_pte_mask |= _PAGE_NX;
 		}
 	}
@@ -468,7 +468,7 @@ void __init paging_init(void)
 {
 #ifdef CONFIG_X86_PAE
 	set_nx();
-	if (use_nx)
+	if (nx_enabled)
 		printk("NX (Execute Disable) protection: active\n");
 #endif
 
--- linux/include/asm-i386/page.h.orig	
+++ linux/include/asm-i386/page.h	
@@ -41,6 +41,7 @@
  */
 #ifdef CONFIG_X86_PAE
 extern unsigned long long __supported_pte_mask;
+extern int nx_enabled;
 typedef struct { unsigned long pte_low, pte_high; } pte_t;
 typedef struct { unsigned long long pmd; } pmd_t;
 typedef struct { unsigned long long pgd; } pgd_t;
@@ -48,6 +49,7 @@ typedef struct { unsigned long long pgpr
 #define pte_val(x)	((x).pte_low | ((unsigned long long)(x).pte_high << 32))
 #define HPAGE_SHIFT	21
 #else
+#define nx_enabled 0
 typedef struct { unsigned long pte_low; } pte_t;
 typedef struct { unsigned long pmd; } pmd_t;
 typedef struct { unsigned long pgd; } pgd_t;


* Re: Do x86 NX and AMD prefetch check cause page fault infinite loop?
  2004-07-01  6:32       ` Ingo Molnar
@ 2004-07-01 15:04         ` Jamie Lokier
  2004-07-02  7:15           ` Ingo Molnar
  2004-07-02  8:50           ` [patch] i386 nx prefetch fix & cleanups, 2.6.7-mm5 Ingo Molnar
  0 siblings, 2 replies; 11+ messages in thread
From: Jamie Lokier @ 2004-07-01 15:04 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, Andi Kleen, Andrew Morton, Linus Torvalds

Ingo Molnar wrote:
> -#ifdef CONFIG_X86_PAE
> -	{
> -		pgd_t *pgd;
> -		pmd_t *pmd;
> -		pgd = init_mm.pgd + pgd_index(address);
> -		if (pgd_present(*pgd)) {
> -			pmd = pmd_offset(pgd, address);
> -			if (pmd_val(*pmd) & _PAGE_NX)
> -				printk(KERN_CRIT "kernel tried to access NX-protected page - exploit attempt? (uid: %d)\n", current->uid);
> -		}
> -	}
> -#endif
> +	if (nx_enabled && (error_code & 16))
> +		printk(KERN_CRIT "kernel tried to execute NX-protected page - exploit attempt? (uid: %d)\n", current->uid);

According to AMD's manual, bit 4 of error_code means the fault was due
to an instruction fetch.  It doesn't imply that it's an NX-protected
page: it might be a page not present fault instead.  (The manual
doesn't spell that out, it just says the bit is set when it's an
instruction fetch).

Just so you realise that the above code fragments aren't logically
equivalent.

-- Jamie


* Re: Do x86 NX and AMD prefetch check cause page fault infinite loop?
  2004-07-01 15:04         ` Jamie Lokier
@ 2004-07-02  7:15           ` Ingo Molnar
  2004-07-02  8:50           ` [patch] i386 nx prefetch fix & cleanups, 2.6.7-mm5 Ingo Molnar
  1 sibling, 0 replies; 11+ messages in thread
From: Ingo Molnar @ 2004-07-02  7:15 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: linux-kernel, Andi Kleen, Andrew Morton, Linus Torvalds


* Jamie Lokier <jamie@shareable.org> wrote:

> > -			if (pmd_val(*pmd) & _PAGE_NX)
> > -				printk(KERN_CRIT "kernel tried to access NX-protected page - exploit attempt? (uid: %d)\n", current->uid);
> > -		}
> > -	}
> > -#endif
> > +	if (nx_enabled && (error_code & 16))
> > +		printk(KERN_CRIT "kernel tried to execute NX-protected page - exploit attempt? (uid: %d)\n", current->uid);
> 
> According to AMD's manual, bit 4 of error_code means the fault was due
> to an instruction fetch.  It doesn't imply that it's an NX-protected
> page: it might be a page not present fault instead.  (The manual
> doesn't spell that out, it just says the bit is set when it's an
> instruction fetch).

you are right, it doesn't say it's an NX-related fault.

I'll test this out and send a delta patch.

	Ingo


* [patch] i386 nx prefetch fix & cleanups, 2.6.7-mm5
  2004-07-01 15:04         ` Jamie Lokier
  2004-07-02  7:15           ` Ingo Molnar
@ 2004-07-02  8:50           ` Ingo Molnar
  1 sibling, 0 replies; 11+ messages in thread
From: Ingo Molnar @ 2004-07-02  8:50 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: linux-kernel, Andi Kleen, Andrew Morton, Linus Torvalds

[-- Attachment #1: Type: text/plain, Size: 1668 bytes --]


* Jamie Lokier <jamie@shareable.org> wrote:

> > +	if (nx_enabled && (error_code & 16))
> > +		printk(KERN_CRIT "kernel tried to execute NX-protected page - exploit attempt? (uid: %d)\n", current->uid);
> 
> According to AMD's manual, bit 4 of error_code means the fault was due
> to an instruction fetch.  It doesn't imply that it's an NX-protected
> page: it might be a page not present fault instead.  (The manual
> doesn't spell that out, it just says the bit is set when it's an
> instruction fetch).
> 
> Just so you realise that the above code fragments aren't logically
> equivalent.

i've attached an updated nx-prefetch-fix.patch that properly fixes this. 
I've also attached a delta relative to the previous patch. This patch
will only print the 'possible exploit' warning if the kernel tries to
execute a present page. (hence not printing the message in the quite
common jump-to-address-zero crash case.)

I've test-compiled and test-booted the full patch on 2.6.7-mm5 using the
following x86 kernel configs: SMP+PAE, UP+PAE, SMP+!PAE, UP+!PAE, on an
SMP P3 and an Athlon64 box. I've tested various types of
instruction-fetch related kernel faults on the Athlon64 box, it all
works fine.

The full changelog:

- fix possible prefetch-fault loop on NX page, based on suggestions
  from Jamie Lokier.

- clean up nx feature dependencies

- simplify detection of NX-violations when the kernel executes code

- introduce pte_exec_kern() to simplify the NX logic

- split the definitions out of pgtable-[23]level.h into
  pgtable-[23]level-defs.h, to enable the former to use generic
  pte functions from pgtable.h.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

[-- Attachment #2: nx-prefetch-fix.patch --]
[-- Type: text/plain, Size: 9660 bytes --]


- fix possible prefetch-fault loop on NX page, based on suggestions
  from Jamie Lokier.

- clean up nx feature dependencies

- simplify detection of NX-violations when the kernel executes code

- introduce pte_exec_kern() to simplify the NX logic

- split the definitions out of pgtable-[23]level.h into
  pgtable-[23]level-defs.h, to enable the former to use generic
  pte functions from pgtable.h.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

--- linux/arch/i386/mm/fault.c.orig	
+++ linux/arch/i386/mm/fault.c	
@@ -188,11 +188,16 @@ static int __is_prefetch(struct pt_regs 
 	return prefetch;
 }
 
-static inline int is_prefetch(struct pt_regs *regs, unsigned long addr)
+static inline int is_prefetch(struct pt_regs *regs, unsigned long addr,
+			      unsigned long error_code)
 {
 	if (unlikely(boot_cpu_data.x86_vendor == X86_VENDOR_AMD &&
-		     boot_cpu_data.x86 >= 6))
+		     boot_cpu_data.x86 >= 6)) {
+		/* Catch an obscure case of prefetch inside an NX page. */
+		if (nx_enabled && (error_code & 16))
+			return 0;
 		return __is_prefetch(regs, addr);
+	}
 	return 0;
 } 
 
@@ -374,7 +379,7 @@ bad_area_nosemaphore:
 		 * Valid to do another page fault here because this one came 
 		 * from user space.
 		 */
-		if (is_prefetch(regs, address))
+		if (is_prefetch(regs, address, error_code))
 			return;
 
 		tsk->thread.cr2 = address;
@@ -415,7 +420,7 @@ no_context:
 	 * had been triggered by is_prefetch fixup_exception would have 
 	 * handled it.
 	 */
- 	if (is_prefetch(regs, address))
+ 	if (is_prefetch(regs, address, error_code))
  		return;
 
 /*
@@ -432,18 +437,11 @@ no_context:
 	bust_spinlocks(1);
 
 #ifdef CONFIG_X86_PAE
-	{
-		pgd_t *pgd;
-		pmd_t *pmd;
-
+	if (error_code & 16) {
+		pte_t *pte = lookup_address(address);
 
-
-		pgd = init_mm.pgd + pgd_index(address);
-		if (pgd_present(*pgd)) {
-			pmd = pmd_offset(pgd, address);
-			if (pmd_val(*pmd) & _PAGE_NX)
-				printk(KERN_CRIT "kernel tried to access NX-protected page - exploit attempt? (uid: %d)\n", current->uid);
-		}
+		if (pte && pte_present(*pte) && !pte_exec_kernel(*pte))
+			printk(KERN_CRIT "kernel tried to execute NX-protected page - exploit attempt? (uid: %d)\n", current->uid);
 	}
 #endif
 	if (address < PAGE_SIZE)
@@ -498,7 +496,7 @@ do_sigbus:
 		goto no_context;
 
 	/* User space => ok to do another page fault */
-	if (is_prefetch(regs, address))
+	if (is_prefetch(regs, address, error_code))
 		return;
 
 	tsk->thread.cr2 = address;
--- linux/arch/i386/mm/init.c.orig	
+++ linux/arch/i386/mm/init.c	
@@ -437,7 +437,7 @@ static int __init noexec_setup(char *str
 __setup("noexec=", noexec_setup);
 
 #ifdef CONFIG_X86_PAE
-static int use_nx = 0;
+int nx_enabled = 0;
 
 static void __init set_nx(void)
 {
@@ -449,7 +449,7 @@ static void __init set_nx(void)
 			rdmsr(MSR_EFER, l, h);
 			l |= EFER_NX;
 			wrmsr(MSR_EFER, l, h);
-			use_nx = 1;
+			nx_enabled = 1;
 			__supported_pte_mask |= _PAGE_NX;
 		}
 	}
@@ -470,7 +470,7 @@ int __init set_kernel_exec(unsigned long
 	pte = lookup_address(vaddr);
 	BUG_ON(!pte);
 
-	if (pte_val(*pte) & _PAGE_NX)
+	if (!pte_exec_kernel(*pte))
 		ret = 0;
 
 	if (enable)
@@ -495,7 +495,7 @@ void __init paging_init(void)
 {
 #ifdef CONFIG_X86_PAE
 	set_nx();
-	if (use_nx)
+	if (nx_enabled)
 		printk("NX (Execute Disable) protection: active\n");
 #endif
 
--- linux/include/asm-i386/page.h.orig	
+++ linux/include/asm-i386/page.h	
@@ -41,6 +41,7 @@
  */
 #ifdef CONFIG_X86_PAE
 extern unsigned long long __supported_pte_mask;
+extern int nx_enabled;
 typedef struct { unsigned long pte_low, pte_high; } pte_t;
 typedef struct { unsigned long long pmd; } pmd_t;
 typedef struct { unsigned long long pgd; } pgd_t;
@@ -48,6 +49,7 @@ typedef struct { unsigned long long pgpr
 #define pte_val(x)	((x).pte_low | ((unsigned long long)(x).pte_high << 32))
 #define HPAGE_SHIFT	21
 #else
+#define nx_enabled 0
 typedef struct { unsigned long pte_low; } pte_t;
 typedef struct { unsigned long pmd; } pmd_t;
 typedef struct { unsigned long pgd; } pgd_t;
--- linux/include/asm-i386/pgtable.h.orig	
+++ linux/include/asm-i386/pgtable.h	
@@ -43,19 +43,15 @@ void pgd_dtor(void *, kmem_cache_t *, un
 void pgtable_cache_init(void);
 void paging_init(void);
 
-#endif /* !__ASSEMBLY__ */
-
 /*
  * The Linux x86 paging architecture is 'compile-time dual-mode', it
  * implements both the traditional 2-level x86 page tables and the
  * newer 3-level PAE-mode page tables.
  */
-#ifndef __ASSEMBLY__
 #ifdef CONFIG_X86_PAE
-# include <asm/pgtable-3level.h>
+# include <asm/pgtable-3level-defs.h>
 #else
-# include <asm/pgtable-2level.h>
-#endif
+# include <asm/pgtable-2level-defs.h>
 #endif
 
 #define PMD_SIZE	(1UL << PMD_SHIFT)
@@ -73,8 +69,6 @@ void paging_init(void);
 #define BOOT_USER_PGD_PTRS (__PAGE_OFFSET >> TWOLEVEL_PGDIR_SHIFT)
 #define BOOT_KERNEL_PGD_PTRS (1024-BOOT_USER_PGD_PTRS)
 
-
-#ifndef __ASSEMBLY__
 /* Just any arbitrary offset to the start of the vmalloc VM area: the
  * current 8MB value just means that there will be a 8MB "hole" after the
  * physical memory until the kernel virtual memory starts.  That means that
@@ -223,7 +217,6 @@ extern unsigned long pg0[];
  */
 static inline int pte_user(pte_t pte)		{ return (pte).pte_low & _PAGE_USER; }
 static inline int pte_read(pte_t pte)		{ return (pte).pte_low & _PAGE_USER; }
-static inline int pte_exec(pte_t pte)		{ return (pte).pte_low & _PAGE_USER; }
 static inline int pte_dirty(pte_t pte)		{ return (pte).pte_low & _PAGE_DIRTY; }
 static inline int pte_young(pte_t pte)		{ return (pte).pte_low & _PAGE_ACCESSED; }
 static inline int pte_write(pte_t pte)		{ return (pte).pte_low & _PAGE_RW; }
@@ -244,6 +237,12 @@ static inline pte_t pte_mkdirty(pte_t pt
 static inline pte_t pte_mkyoung(pte_t pte)	{ (pte).pte_low |= _PAGE_ACCESSED; return pte; }
 static inline pte_t pte_mkwrite(pte_t pte)	{ (pte).pte_low |= _PAGE_RW; return pte; }
 
+#ifdef CONFIG_X86_PAE
+# include <asm/pgtable-3level.h>
+#else
+# include <asm/pgtable-2level.h>
+#endif
+
 static inline int ptep_test_and_clear_dirty(pte_t *ptep)
 {
 	if (!pte_dirty(*ptep))
--- linux/include/asm-i386/pgtable-2level.h.orig	
+++ linux/include/asm-i386/pgtable-2level.h	
@@ -1,22 +1,6 @@
 #ifndef _I386_PGTABLE_2LEVEL_H
 #define _I386_PGTABLE_2LEVEL_H
 
-/*
- * traditional i386 two-level paging structure:
- */
-
-#define PGDIR_SHIFT	22
-#define PTRS_PER_PGD	1024
-
-/*
- * the i386 is two-level, so we don't really have any
- * PMD directory physically.
- */
-#define PMD_SHIFT	22
-#define PTRS_PER_PMD	1
-
-#define PTRS_PER_PTE	1024
-
 #define pte_ERROR(e) \
 	printk("%s:%d: bad pte %08lx.\n", __FILE__, __LINE__, (e).pte_low)
 #define pmd_ERROR(e) \
@@ -64,6 +48,22 @@ static inline pmd_t * pmd_offset(pgd_t *
 #define pfn_pmd(pfn, prot)	__pmd(((pfn) << PAGE_SHIFT) | pgprot_val(prot))
 
 /*
+ * All present user pages are user-executable:
+ */
+static inline int pte_exec(pte_t pte)
+{
+	return pte_user(pte);
+}
+
+/*
+ * All present pages are kernel-executable:
+ */
+static inline int pte_exec_kernel(pte_t pte)
+{
+	return 1;
+}
+
+/*
  * Bits 0, 6 and 7 are taken, split up the 29 bits of offset
  * into this range:
  */
--- linux/include/asm-i386/pgtable-3level.h.orig	
+++ linux/include/asm-i386/pgtable-3level.h	
@@ -8,24 +8,6 @@
  * Copyright (C) 1999 Ingo Molnar <mingo@redhat.com>
  */
 
-/*
- * PGDIR_SHIFT determines what a top-level page table entry can map
- */
-#define PGDIR_SHIFT	30
-#define PTRS_PER_PGD	4
-
-/*
- * PMD_SHIFT determines the size of the area a middle-level
- * page table can map
- */
-#define PMD_SHIFT	21
-#define PTRS_PER_PMD	512
-
-/*
- * entries per page directory level
- */
-#define PTRS_PER_PTE	512
-
 #define pte_ERROR(e) \
 	printk("%s:%d: bad pte %p(%08lx%08lx).\n", __FILE__, __LINE__, &(e), (e).pte_high, (e).pte_low)
 #define pmd_ERROR(e) \
@@ -37,6 +19,29 @@ static inline int pgd_none(pgd_t pgd)		{
 static inline int pgd_bad(pgd_t pgd)		{ return 0; }
 static inline int pgd_present(pgd_t pgd)	{ return 1; }
 
+/*
+ * Is the pte executable?
+ */
+static inline int pte_x(pte_t pte)
+{
+	return !(pte_val(pte) & _PAGE_NX);
+}
+
+/*
+ * All present user-pages with !NX bit are user-executable:
+ */
+static inline int pte_exec(pte_t pte)
+{
+	return pte_user(pte) && pte_x(pte);
+}
+/*
+ * All present pages with !NX bit are kernel-executable:
+ */
+static inline int pte_exec_kernel(pte_t pte)
+{
+	return pte_x(pte);
+}
+
 /* Rules for using set_pte: the pte being assigned *must* be
  * either not present or in a state where the hardware will
  * not attempt to update the pte.  In places where this is
--- linux/include/asm-i386/pgtable-3level-defs.h.orig	
+++ linux/include/asm-i386/pgtable-3level-defs.h	
@@ -0,0 +1,22 @@
+#ifndef _I386_PGTABLE_3LEVEL_DEFS_H
+#define _I386_PGTABLE_3LEVEL_DEFS_H
+
+/*
+ * PGDIR_SHIFT determines what a top-level page table entry can map
+ */
+#define PGDIR_SHIFT	30
+#define PTRS_PER_PGD	4
+
+/*
+ * PMD_SHIFT determines the size of the area a middle-level
+ * page table can map
+ */
+#define PMD_SHIFT	21
+#define PTRS_PER_PMD	512
+
+/*
+ * entries per page directory level
+ */
+#define PTRS_PER_PTE	512
+
+#endif /* _I386_PGTABLE_3LEVEL_DEFS_H */
--- linux/include/asm-i386/pgtable-2level-defs.h.orig	
+++ linux/include/asm-i386/pgtable-2level-defs.h	
@@ -0,0 +1,20 @@
+#ifndef _I386_PGTABLE_2LEVEL_DEFS_H
+#define _I386_PGTABLE_2LEVEL_DEFS_H
+
+/*
+ * traditional i386 two-level paging structure:
+ */
+
+#define PGDIR_SHIFT	22
+#define PTRS_PER_PGD	1024
+
+/*
+ * the i386 is two-level, so we don't really have any
+ * PMD directory physically.
+ */
+#define PMD_SHIFT	22
+#define PTRS_PER_PMD	1
+
+#define PTRS_PER_PTE	1024
+
+#endif /* _I386_PGTABLE_2LEVEL_DEFS_H */

[-- Attachment #3: nx-prefetch-fix-update.patch --]
[-- Type: text/plain, Size: 6796 bytes --]


- introduce pte_exec_kern() to simplify the NX logic

- split the definitions out of pgtable-[23]level.h into
  pgtable-[23]level-defs.h, to enable the former to use generic
  pte functions from pgtable.h.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

--- linux/arch/i386/mm/fault.c	
+++ linux/arch/i386/mm/fault.c	
@@ -436,8 +436,14 @@ no_context:
 
 	bust_spinlocks(1);
 
-	if (nx_enabled && (error_code & 16))
-		printk(KERN_CRIT "kernel tried to execute NX-protected page - exploit attempt? (uid: %d)\n", current->uid);
+#ifdef CONFIG_X86_PAE
+	if (error_code & 16) {
+		pte_t *pte = lookup_address(address);
+
+		if (pte && pte_present(*pte) && !pte_exec_kernel(*pte))
+			printk(KERN_CRIT "kernel tried to execute NX-protected page - exploit attempt? (uid: %d)\n", current->uid);
+	}
+#endif
 	if (address < PAGE_SIZE)
 		printk(KERN_ALERT "Unable to handle kernel NULL pointer dereference");
 	else
--- linux/arch/i386/mm/init.c	
+++ linux/arch/i386/mm/init.c	
@@ -470,7 +470,7 @@ int __init set_kernel_exec(unsigned long
 	pte = lookup_address(vaddr);
 	BUG_ON(!pte);
 
-	if (pte_val(*pte) & _PAGE_NX)
+	if (!pte_exec_kernel(*pte))
 		ret = 0;
 
 	if (enable)
--- linux/include/asm-i386/pgtable-2level-defs.h	
+++ linux/include/asm-i386/pgtable-2level-defs.h	
@@ -0,0 +1,20 @@
+#ifndef _I386_PGTABLE_2LEVEL_DEFS_H
+#define _I386_PGTABLE_2LEVEL_DEFS_H
+
+/*
+ * traditional i386 two-level paging structure:
+ */
+
+#define PGDIR_SHIFT	22
+#define PTRS_PER_PGD	1024
+
+/*
+ * the i386 is two-level, so we don't really have any
+ * PMD directory physically.
+ */
+#define PMD_SHIFT	22
+#define PTRS_PER_PMD	1
+
+#define PTRS_PER_PTE	1024
+
+#endif /* _I386_PGTABLE_2LEVEL_DEFS_H */
--- linux/include/asm-i386/pgtable-2level.h	
+++ linux/include/asm-i386/pgtable-2level.h	
@@ -1,22 +1,6 @@
 #ifndef _I386_PGTABLE_2LEVEL_H
 #define _I386_PGTABLE_2LEVEL_H
 
-/*
- * traditional i386 two-level paging structure:
- */
-
-#define PGDIR_SHIFT	22
-#define PTRS_PER_PGD	1024
-
-/*
- * the i386 is two-level, so we don't really have any
- * PMD directory physically.
- */
-#define PMD_SHIFT	22
-#define PTRS_PER_PMD	1
-
-#define PTRS_PER_PTE	1024
-
 #define pte_ERROR(e) \
 	printk("%s:%d: bad pte %08lx.\n", __FILE__, __LINE__, (e).pte_low)
 #define pmd_ERROR(e) \
@@ -64,6 +48,22 @@ static inline pmd_t * pmd_offset(pgd_t *
 #define pfn_pmd(pfn, prot)	__pmd(((pfn) << PAGE_SHIFT) | pgprot_val(prot))
 
 /*
+ * All present user pages are user-executable:
+ */
+static inline int pte_exec(pte_t pte)
+{
+	return pte_user(pte);
+}
+
+/*
+ * All present pages are kernel-executable:
+ */
+static inline int pte_exec_kernel(pte_t pte)
+{
+	return 1;
+}
+
+/*
  * Bits 0, 6 and 7 are taken, split up the 29 bits of offset
  * into this range:
  */
--- linux/include/asm-i386/pgtable-3level-defs.h	
+++ linux/include/asm-i386/pgtable-3level-defs.h	
@@ -0,0 +1,22 @@
+#ifndef _I386_PGTABLE_3LEVEL_DEFS_H
+#define _I386_PGTABLE_3LEVEL_DEFS_H
+
+/*
+ * PGDIR_SHIFT determines what a top-level page table entry can map
+ */
+#define PGDIR_SHIFT	30
+#define PTRS_PER_PGD	4
+
+/*
+ * PMD_SHIFT determines the size of the area a middle-level
+ * page table can map
+ */
+#define PMD_SHIFT	21
+#define PTRS_PER_PMD	512
+
+/*
+ * entries per page directory level
+ */
+#define PTRS_PER_PTE	512
+
+#endif /* _I386_PGTABLE_3LEVEL_DEFS_H */
--- linux/include/asm-i386/pgtable-3level.h	
+++ linux/include/asm-i386/pgtable-3level.h	
@@ -8,24 +8,6 @@
  * Copyright (C) 1999 Ingo Molnar <mingo@redhat.com>
  */
 
-/*
- * PGDIR_SHIFT determines what a top-level page table entry can map
- */
-#define PGDIR_SHIFT	30
-#define PTRS_PER_PGD	4
-
-/*
- * PMD_SHIFT determines the size of the area a middle-level
- * page table can map
- */
-#define PMD_SHIFT	21
-#define PTRS_PER_PMD	512
-
-/*
- * entries per page directory level
- */
-#define PTRS_PER_PTE	512
-
 #define pte_ERROR(e) \
 	printk("%s:%d: bad pte %p(%08lx%08lx).\n", __FILE__, __LINE__, &(e), (e).pte_high, (e).pte_low)
 #define pmd_ERROR(e) \
@@ -37,6 +19,29 @@ static inline int pgd_none(pgd_t pgd)		{
 static inline int pgd_bad(pgd_t pgd)		{ return 0; }
 static inline int pgd_present(pgd_t pgd)	{ return 1; }
 
+/*
+ * Is the pte executable?
+ */
+static inline int pte_x(pte_t pte)
+{
+	return !(pte_val(pte) & _PAGE_NX);
+}
+
+/*
+ * All present user-pages with !NX bit are user-executable:
+ */
+static inline int pte_exec(pte_t pte)
+{
+	return pte_user(pte) && pte_x(pte);
+}
+/*
+ * All present pages with !NX bit are kernel-executable:
+ */
+static inline int pte_exec_kernel(pte_t pte)
+{
+	return pte_x(pte);
+}
+
 /* Rules for using set_pte: the pte being assigned *must* be
  * either not present or in a state where the hardware will
  * not attempt to update the pte.  In places where this is
--- linux/include/asm-i386/pgtable.h	
+++ linux/include/asm-i386/pgtable.h	
@@ -43,19 +43,15 @@ void pgd_dtor(void *, kmem_cache_t *, un
 void pgtable_cache_init(void);
 void paging_init(void);
 
-#endif /* !__ASSEMBLY__ */
-
 /*
  * The Linux x86 paging architecture is 'compile-time dual-mode', it
  * implements both the traditional 2-level x86 page tables and the
  * newer 3-level PAE-mode page tables.
  */
-#ifndef __ASSEMBLY__
 #ifdef CONFIG_X86_PAE
-# include <asm/pgtable-3level.h>
+# include <asm/pgtable-3level-defs.h>
 #else
-# include <asm/pgtable-2level.h>
-#endif
+# include <asm/pgtable-2level-defs.h>
 #endif
 
 #define PMD_SIZE	(1UL << PMD_SHIFT)
@@ -73,8 +69,6 @@ void paging_init(void);
 #define BOOT_USER_PGD_PTRS (__PAGE_OFFSET >> TWOLEVEL_PGDIR_SHIFT)
 #define BOOT_KERNEL_PGD_PTRS (1024-BOOT_USER_PGD_PTRS)
 
-
-#ifndef __ASSEMBLY__
 /* Just any arbitrary offset to the start of the vmalloc VM area: the
  * current 8MB value just means that there will be a 8MB "hole" after the
  * physical memory until the kernel virtual memory starts.  That means that
@@ -223,7 +217,6 @@ extern unsigned long pg0[];
  */
 static inline int pte_user(pte_t pte)		{ return (pte).pte_low & _PAGE_USER; }
 static inline int pte_read(pte_t pte)		{ return (pte).pte_low & _PAGE_USER; }
-static inline int pte_exec(pte_t pte)		{ return (pte).pte_low & _PAGE_USER; }
 static inline int pte_dirty(pte_t pte)		{ return (pte).pte_low & _PAGE_DIRTY; }
 static inline int pte_young(pte_t pte)		{ return (pte).pte_low & _PAGE_ACCESSED; }
 static inline int pte_write(pte_t pte)		{ return (pte).pte_low & _PAGE_RW; }
@@ -244,6 +237,12 @@ static inline pte_t pte_mkdirty(pte_t pt
 static inline pte_t pte_mkyoung(pte_t pte)	{ (pte).pte_low |= _PAGE_ACCESSED; return pte; }
 static inline pte_t pte_mkwrite(pte_t pte)	{ (pte).pte_low |= _PAGE_RW; return pte; }
 
+#ifdef CONFIG_X86_PAE
+# include <asm/pgtable-3level.h>
+#else
+# include <asm/pgtable-2level.h>
+#endif
+
 static inline int ptep_test_and_clear_dirty(pte_t *ptep)
 {
 	if (!pte_dirty(*ptep))


end of thread, other threads:[~2004-07-02 13:59 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-06-30  1:38 Do x86 NX and AMD prefetch check cause page fault infinite loop? Jamie Lokier
2004-06-30  5:50 ` Ingo Molnar
2004-06-30 14:21   ` Jamie Lokier
2004-06-30 14:38   ` Jamie Lokier
2004-07-01  1:48     ` Jamie Lokier
2004-07-01  6:32       ` Ingo Molnar
2004-07-01 15:04         ` Jamie Lokier
2004-07-02  7:15           ` Ingo Molnar
2004-07-02  8:50           ` [patch] i386 nx prefetch fix & cleanups, 2.6.7-mm5 Ingo Molnar
2004-06-30  6:10 ` Do x86 NX and AMD prefetch check cause page fault infinite loop? Denis Vlasenko
2004-06-30 14:23   ` Jamie Lokier
