All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] Athlon/Opteron Prefetch Fix for 2.6.0test5 + numbers
@ 2003-09-17  2:22 Andi Kleen
  2003-09-17  2:44 ` Andrew Morton
  0 siblings, 1 reply; 19+ messages in thread
From: Andi Kleen @ 2003-09-17  2:22 UTC (permalink / raw)
  To: torvalds, akpm, linux-kernel, richard.brunner


Here is the workaround patch for the Athlon/Opteron prefetch issue again.

Athlon/Opteron CPUs can report false page faults on prefetches in rare
cases. This patch works around it for the kernel and user space.
It checks in the slow path of the fault handler if the faulting instruction
was a prefetch and ignores them if needed.

In some cases the patch can cause an recursive page fault. It has been
carefully written to stop the recursion early. To do this cleanly I had
to remove the special case of deliver SIGBUS for *_user accesses done 
inside the kernel. Instead they are now reported as standard EFAULT
without a signal.

This is much more efficient than the previous workaround used in the kernel,
which checked for AMD CPUs in every prefetch(). This can be seen 
in the size of the vmlinux:

Without patch:
   text    data     bss     dec     hex filename
4020232  665956  169092 4855280  4a15f0 vmlinux
With patch:
4011578  665973  169092 4846643  49f433

The patch with the fault handler fix is ~8K smaller.

The overhead is small. First the delivery path of signals for page faults
is not very hot (rarely used because most programs don't generate 
signals for page faults). I tested it using LMbench2's lat_sig:

Then the prefetch check typically can decide 

[note - take the results with a grain of salt. They vary even between 
runs widely. All results average over 10 runs after another 10 runs to warm
everything up. The tests were run on a 2.4Ghz P4 Xeon.]

With prefetch check:    3.7268 microseconds
Without prefetch check: 3.65945 microseconds

This means the over is ~0.06 microseconds. Actually I suspect the 
difference is beyond lmbench's accuracy, although the no prefetch results
were systematically slightly better than prefetch (as expected). Overall
it has a small overhead, but not really worth caring about.

Just alone for the .text savings I would consider applying it. 
Without it the code for list_for_each looks pretty bad.
Also it helps user space programs too, which the previous workaround didn't.

x86-64 has the same issue, but I will submit that separately.

Please consider applying,

-Andi


--- linux-2.6.0test5-work/arch/i386/mm/fault.c-o	2003-05-27 03:00:20.000000000 +0200
+++ linux-2.6.0test5-work/arch/i386/mm/fault.c	2003-09-10 23:58:52.000000000 +0200
@@ -55,6 +55,77 @@
 	console_loglevel = loglevel_save;
 }
 
+/* Sometimes the CPU reports invalid exceptions on prefetch.
+   Check that here and ignore.
+   Opcode checker based on code by Richard Brunner */
+static int is_prefetch(struct pt_regs *regs, unsigned long addr)
+{ 
+	unsigned char *instr = (unsigned char *)(regs->eip);
+	int scan_more = 1;
+	int prefetch = 0; 
+	unsigned char *max_instr = instr + 15;
+
+	/* Avoid recursive faults. This is just an optimization,
+	   they must be handled correctly too */
+	if (regs->eip == addr)
+		return 0; 
+
+	while (scan_more && instr < max_instr) { 
+		unsigned char opcode;
+		unsigned char instr_hi;
+		unsigned char instr_lo;
+
+		if (__get_user(opcode, instr))
+			break; 
+
+		instr_hi = opcode & 0xf0; 
+		instr_lo = opcode & 0x0f; 
+		instr++;
+
+		switch (instr_hi) { 
+		case 0x20:
+		case 0x30:
+			/* Values 0x26,0x2E,0x36,0x3E are valid x86
+			   prefixes.  In long mode, the CPU will signal
+			   invalid opcode if some of these prefixes are
+			   present so we will never get here anyway */
+			scan_more = ((instr_lo & 7) == 0x6);
+			break;
+			
+		case 0x40:
+			/* May be valid in long mode (REX prefixes) */
+			break; 
+			
+		case 0x60:
+			/* 0x64 thru 0x67 are valid prefixes in all modes. */
+			scan_more = (instr_lo & 0xC) == 0x4;
+			break;		
+		case 0xF0:
+			/* 0xF0, 0xF2, and 0xF3 are valid prefixes in all modes. */
+			scan_more = !instr_lo || (instr_lo>>1) == 1;
+			break;			
+		case 0x00:
+			/* Prefetch instruction is 0x0F0D or 0x0F18 */
+			scan_more = 0;
+			if (__get_user(opcode, instr)) 
+				break;
+			prefetch = (instr_lo == 0xF) &&
+				(opcode == 0x0D || opcode == 0x18);
+			break;			
+		default:
+			scan_more = 0;
+			break;
+		} 
+	}
+
+#if 0
+	if (prefetch)
+		printk("%s: prefetch caused page fault at %lx/%lx\n", current->comm,
+		       regs->eip, addr);
+#endif
+	return prefetch;
+}
+
 asmlinkage void do_invalid_op(struct pt_regs *, unsigned long);
 
 /*
@@ -110,7 +181,7 @@
 	 * atomic region then we must not take the fault..
 	 */
 	if (in_atomic() || !mm)
-		goto no_context;
+		goto bad_area_nosemaphore;
 
 	down_read(&mm->mmap_sem);
 
@@ -198,8 +269,12 @@
 bad_area:
 	up_read(&mm->mmap_sem);
 
+bad_area_nosemaphore:
 	/* User mode accesses just cause a SIGSEGV */
 	if (error_code & 4) {
+		if (is_prefetch(regs, address))
+			return;
+
 		tsk->thread.cr2 = address;
 		tsk->thread.error_code = error_code;
 		tsk->thread.trap_no = 14;
@@ -232,6 +307,9 @@
 	if (fixup_exception(regs))
 		return;
 
+ 	if (is_prefetch(regs, address))
+ 		return;
+
 /*
  * Oops. The kernel tried to access some bad page. We'll have to
  * terminate things with extreme prejudice.
@@ -286,10 +364,13 @@
 do_sigbus:
 	up_read(&mm->mmap_sem);
 
-	/*
-	 * Send a sigbus, regardless of whether we were in kernel
-	 * or user mode.
-	 */
+	/* Kernel mode? Handle exceptions or die */
+	if (!(error_code & 4))
+		goto no_context;
+
+	if (is_prefetch(regs, address))
+		return;
+
 	tsk->thread.cr2 = address;
 	tsk->thread.error_code = error_code;
 	tsk->thread.trap_no = 14;
@@ -298,10 +379,6 @@
 	info.si_code = BUS_ADRERR;
 	info.si_addr = (void *)address;
 	force_sig_info(SIGBUS, &info, tsk);
-
-	/* Kernel mode? Handle exceptions or die */
-	if (!(error_code & 4))
-		goto no_context;
 	return;
 
 vmalloc_fault:
--- linux-2.6.0test5-work/include/asm-i386/processor.h-o	2003-09-09 20:55:46.000000000 +0200
+++ linux-2.6.0test5-work/include/asm-i386/processor.h	2003-09-11 03:22:16.000000000 +0200
@@ -578,8 +578,6 @@
 #define ARCH_HAS_PREFETCH
 extern inline void prefetch(const void *x)
 {
-	if (cpu_data[0].x86_vendor == X86_VENDOR_AMD)
-		return;		/* Some athlons fault if the address is bad */
 	alternative_input(ASM_NOP4,
 			  "prefetchnta (%1)",
 			  X86_FEATURE_XMM,




^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2003-09-19 10:03 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2003-09-17  2:22 [PATCH] Athlon/Opteron Prefetch Fix for 2.6.0test5 + numbers Andi Kleen
2003-09-17  2:44 ` Andrew Morton
2003-09-17  3:25   ` Andi Kleen
2003-09-17  4:53   ` Nick Piggin
2003-09-17  5:08     ` Andrew Morton
2003-09-17  5:19       ` Nick Piggin
2003-09-17  7:15       ` Dave Hansen
2003-09-17  5:26     ` Andi Kleen
2003-09-17  5:44       ` Nick Piggin
2003-09-17 19:53     ` Linus Torvalds
2003-09-17 20:21       ` Andi Kleen
2003-09-17 20:50         ` Linus Torvalds
2003-09-17 21:12           ` Andi Kleen
2003-09-18 15:38             ` Jamie Lokier
2003-09-18 16:04               ` Jamie Lokier
2003-09-18 17:06                 ` Andi Kleen
2003-09-18 19:48                   ` Jamie Lokier
2003-09-19  6:55             ` Kai Henningsen
2003-09-19 10:02               ` Andreas Schwab

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.