Re: [PATCH] Athlon/Opteron Prefetch Fix for 2.6.0test5 + numbers

From: Dave Hansen <haveblue@us.ibm.com>
To: Andrew Morton <akpm@osdl.org>
Cc: Nick Piggin <piggin@cyberone.com.au>,
	ak@suse.de, torvalds@osdl.org,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	richard.brunner@amd.com
Subject: Re: [PATCH] Athlon/Opteron Prefetch Fix for 2.6.0test5 + numbers
Date: 17 Sep 2003 00:15:20 -0700
Message-ID: <1063782920.3590.164.camel@nighthawk>
In-Reply-To: <20030916220843.31533480.akpm@osdl.org>

[-- Attachment #1: Type: text/plain, Size: 2162 bytes --]

On Tue, 2003-09-16 at 22:08, Andrew Morton wrote:
> But I would like to see some evidence that prefetch ever provides any
> performance gain in-kernel.  I spent some time fiddling a while back and
> was unable to demonstrate any difference.

For SDET on a 16-way NUMA-Q, the gain is pretty small: under 1% at every
script count, and smaller than the standard deviation of the runs, so
it's mostly lost in the noise.

DISCLAIMER: SPEC(tm) and the benchmark name SDET(tm) are registered 
trademarks of the Standard Performance Evaluation Corporation. This 
benchmarking was performed for research purposes only, and the run
results are non-compliant and not-comparable with any published results.

Prefetch on: 
 Scripts  |  Average Throughput |  Standard Deviation
----------+---------------------+---------------------
       16 |      15795.9100     |     711.3221
       32 |      16450.5800     |     292.6028
       64 |      15800.2400     |     126.2358

Prefetch off:
 Scripts  |  Average Throughput |  Standard Deviation
----------+---------------------+---------------------
       16 |      15672.8600     |     438.5506
       32 |      16376.4100     |     364.5610
       64 |      15744.8100     |     208.9217
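
To put numbers on "less than 1%" and "lost in the noise", here is a small
C sketch over the two tables above.  The struct and function names
(sdet_row, gain_pct, within_noise, print_summary) are made up for
illustration and are not part of any benchmark tool.  The noise check is
deliberately crude: it only asks whether the difference in means is
smaller than the smaller of the two standard deviations.  The number of
runs behind each average is not given, so no real significance test is
attempted.

```c
#include <stdio.h>

/* One script count from the tables: mean throughput and standard
 * deviation with prefetch on and with prefetch off. */
struct sdet_row {
	int scripts;
	double on_mean, on_sd;
	double off_mean, off_sd;
};

static const struct sdet_row rows[] = {
	{ 16, 15795.91, 711.3221, 15672.86, 438.5506 },
	{ 32, 16450.58, 292.6028, 16376.41, 364.5610 },
	{ 64, 15800.24, 126.2358, 15744.81, 208.9217 },
};

/* Gain of prefetch-on over prefetch-off, in percent of prefetch-off. */
static double gain_pct(const struct sdet_row *r)
{
	return 100.0 * (r->on_mean - r->off_mean) / r->off_mean;
}

/* Nonzero if the difference in means is smaller than the smaller of
 * the two standard deviations (a rough check, not a statistical test). */
static int within_noise(const struct sdet_row *r)
{
	double delta = r->on_mean - r->off_mean;
	double sd = r->on_sd < r->off_sd ? r->on_sd : r->off_sd;

	if (delta < 0)
		delta = -delta;
	return delta < sd;
}

/* Print one summary line per script count. */
static void print_summary(void)
{
	size_t i;

	for (i = 0; i < sizeof(rows) / sizeof(rows[0]); i++)
		printf("%7d | %+.2f%% | %s\n", rows[i].scripts,
		       gain_pct(&rows[i]),
		       within_noise(&rows[i]) ? "within noise" : "outside noise");
}
```

Calling print_summary() from a main() gives gains of roughly 0.79%,
0.45%, and 0.35% for 16, 32, and 64 scripts, each smaller than either
standard deviation in its row.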


For those of you who care, I generated this with a neat little tool that
some interns cooked up for us this summer.  You hand it a little file
like the one below and it throws a wealth of information back at you.
A central machine keeps a database of the other machines and figures
out the best fit for the job (you can request more than one machine).
In this case it's easy because there are very few 16x machines in the
database.  Oh, and like good interns, they stole a lot of code from
Steve Pratt's "autobench".

class host num_cpus=16
+$mytest sdet "1 4 16 32 64" -l /tmp/sdet

disuseprofiler sar

# build the kernel with prefetching off
build stock 2.6.0-test5 \
    -p http://elm3b114.beaverton.ibm.com/patches/prefetch-off.patch \
    -m -j8
boot
# sdet runs fast on ramfs
fs -mountramfs /tmp/sdet
run "sdet" $mytest
setup umount /tmp/sdet

build stock 2.6.0-test5 -m -j8
boot
fs -mountramfs /tmp/sdet
run "sdet" $mytest
setup umount /tmp/sdet

-- 
Dave Hansen
haveblue@us.ibm.com

[-- Attachment #2: preload-off.patch --]
[-- Type: text/plain, Size: 1092 bytes --]

--- include/asm/processor.h.orig	Tue Sep 16 22:41:53 2003
+++ include/asm/processor.h	Tue Sep 16 22:40:54 2003
@@ -571,34 +571,4 @@
 #endif
 
 #define ASM_NOP_MAX 8
-
-/* Prefetch instructions for Pentium III and AMD Athlon */
-/* It's not worth to care about 3dnow! prefetches for the K6
-   because they are microcoded there and very slow. */
-#define ARCH_HAS_PREFETCH
-extern inline void prefetch(const void *x)
-{
-	if (cpu_data[0].x86_vendor == X86_VENDOR_AMD)
-		return;		/* Some athlons fault if the address is bad */
-	alternative_input(ASM_NOP4,
-			  "prefetchnta (%1)",
-			  X86_FEATURE_XMM,
-			  "r" (x));
-}
-
-#define ARCH_HAS_PREFETCH
-#define ARCH_HAS_PREFETCHW
-#define ARCH_HAS_SPINLOCK_PREFETCH
-
-/* 3dnow! prefetch to get an exclusive cache line. Useful for 
-   spinlocks to avoid one state transition in the cache coherency protocol. */
-extern inline void prefetchw(const void *x)
-{
-	alternative_input(ASM_NOP4,
-			  "prefetchw (%1)",
-			  X86_FEATURE_3DNOW,
-			  "r" (x));
-}
-#define spin_lock_prefetch(x)	prefetchw(x)
-
 #endif /* __ASM_I386_PROCESSOR_H */
