From: Zoltan Menyhart <Zoltan.Menyhart@bull.net>
To: linux-ia64@vger.kernel.org
Subject: Re: flush_icache_range
Date: Mon, 23 May 2005 13:43:16 +0000
Message-ID: <4291DDF4.9060107@bull.net>
In-Reply-To: <4236D7B5.8050408@bull.net>

[-- Attachment #1: Type: text/plain, Size: 2667 bytes --]

The Itanium 2 Processor Reference Manual for Software Development and
Optimization (May 2004) says in chapter 5.8:

"In Itanium 2 processor, each fc will invalidate 128 bytes corresponding
to the L3 cache line size. Since both the L1I and L1D have line sizes of
64 bytes, a single fc instruction can invalidate two lines."

Can someone please confirm that an equivalent statement also holds for
"fc.i"?
Say:

"On the Itanium 2 processor, each fc.i instruction ensures that 128 bytes
(corresponding to the L3 cache line size) of the I-cache(s) are coherent with
the data caches. Since the L1I cache has a line size of 64 bytes, a single
fc.i instruction can make two lines coherent."
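
To make the proposed statement concrete, here is a minimal sketch in C of
the address arithmetic it implies. It assumes the statement is true, which
is exactly what still needs confirming. The names fc_granule_base() and
l1i_lines_per_fc() are hypothetical helpers for illustration only; the two
sizes are taken from the manual text quoted above.

```c
#include <stdint.h>

/* Sizes quoted from the manual; assumed here to apply to fc.i as well. */
#define L3_LINE_BYTES   128u    /* bytes claimed to be covered by one fc */
#define L1I_LINE_BYTES   64u    /* L1I cache line size */

/* Base address of the 128-byte aligned block that contains addr. */
static inline uint64_t fc_granule_base(uint64_t addr)
{
    return addr & ~(uint64_t)(L3_LINE_BYTES - 1);
}

/* Number of L1I lines that fall inside one such block. */
static inline unsigned l1i_lines_per_fc(void)
{
    return L3_LINE_BYTES / L1I_LINE_BYTES;
}
```

Under that assumption, an fc.i on any address inside a block would act on
the two L1I lines at fc_granule_base(addr) and fc_granule_base(addr) + 64.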


This gave me the idea to try 128-byte strides
(each measurement is repeated 10 times):

Modified in d-cache:
cycles = 19,164 time = 14.782 usec
cycles = 18,060 time = 13.930 usec
cycles = 16,929 time = 13.058 usec
cycles = 17,597 time = 13.573 usec
cycles = 17,163 time = 13.239 usec
cycles = 16,990 time = 13.105 usec
cycles = 17,427 time = 13.442 usec
cycles = 17,028 time = 13.134 usec
cycles = 16,993 time = 13.107 usec
cycles = 16,930 time = 13.059 usec

Valid:
cycles = 13,514 time = 10.424 usec
cycles = 13,518 time = 10.427 usec
cycles = 13,518 time = 10.427 usec
cycles = 13,518 time = 10.427 usec
cycles = 13,518 time = 10.427 usec
cycles = 13,746 time = 10.603 usec
cycles = 13,866 time = 10.695 usec
cycles = 13,830 time = 10.668 usec
cycles = 13,790 time = 10.637 usec
cycles = 13,830 time = 10.668 usec

Invalid:
cycles = 13,794 time = 10.640 usec
cycles = 13,790 time = 10.637 usec
cycles = 13,830 time = 10.668 usec
cycles = 13,830 time = 10.668 usec
cycles = 13,966 time = 10.773 usec
cycles = 13,994 time = 10.794 usec
cycles = 14,074 time = 10.856 usec
cycles = 13,574 time = 10.470 usec
cycles = 13,902 time = 10.723 usec
cycles = 14,114 time = 10.887 usec

These cycle counts are incredibly low
compared to my previous results:

With a 32-byte stride:

Modified in d-cache: cycles = 215 K, time = 169 usec
Valid:               cycles = 222 K, time = 171 usec
Invalid:             cycles = 222 K, time = 171 usec

With a 64-byte stride:

Modified in d-cache: cycles = 63 K, time = 49 usec
Valid:               cycles = 116 K, time = 89 usec
Invalid:             cycles = 116 K, time = 89 usec 


This is a Tiger box with the following CPUs:
processor  : 0
vendor     : GenuineIntel
arch       : IA-64
family     : Itanium 2
model      : 1
revision   : 5
archrev    : 0
features   : branchlong
cpu number : 0
cpu regs   : 4
cpu MHz    : 1296.435998
itc MHz    : 1296.435998
BogoMIPS   : 1941.96
etc...
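
For anyone checking the numbers: the "time" column for the 128-byte-stride
runs is consistent with dividing the cycle count by the "itc MHz" value
listed above. A minimal sketch in C, assuming that is how the times were
derived (cycles_to_usec() is a hypothetical name, not taken from the
measurement code):

```c
#include <stdint.h>

/*
 * Convert a cycle count to microseconds.
 * itc_mhz is the counter frequency in MHz, i.e. cycles per microsecond.
 */
static inline double cycles_to_usec(uint64_t cycles, double itc_mhz)
{
    return (double)cycles / itc_mhz;
}
```

For example, cycles_to_usec(19164, 1296.435998) comes out at about 14.782,
which matches the first "Modified in d-cache" line.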

Can these results be real?

Thanks,

Zoltan



[-- Attachment #2: diff --]
[-- Type: text/plain, Size: 1217 bytes --]

--- linux-2.6.11-orig/arch/ia64/lib/flush.S	2005-04-26 15:59:49.000000000 +0200
+++ linux-2.6.11/arch/ia64/lib/flush.S	2005-05-23 15:30:24.891935385 +0200
@@ -7,6 +7,22 @@
 #include <asm/asmmacro.h>
 #include <asm/page.h>
 
+
+#if	defined(CONFIG_ITANIUM)
+#define CACHE_SHIFT	5
+#else
+/*
+ * On the Itanium 2 processor, each fc.i instruction ensures that 128 bytes
+ * (corresponding to the L3 cache line size) of the I-cache(s) are coherent with
+ * the data caches. Since the L1I cache has a line size of 64 bytes, a single
+ * fc.i instruction can make two lines coherent.
+ */
+#define CACHE_SHIFT	7
+#endif
+
+#define	CACHE_BYTES	(1 << CACHE_SHIFT)
+
+
 	/*
 	 * flush_icache_range(start,end)
 	 *	Must flush range from start to end-1 but nothing else (need to
@@ -17,7 +33,7 @@
 	alloc r2=ar.pfs,2,0,0,0
 	sub r8=in1,in0,1
 	;;
-	shr.u r8=r8,5			// we flush 32 bytes per iteration
+	shr.u r8=r8,CACHE_SHIFT		// we flush CACHE_BYTES bytes per iteration
 	.save ar.lc, r3
 	mov r3=ar.lc			// save ar.lc
 	;;
@@ -26,8 +42,8 @@
 
 	mov ar.lc=r8
 	;;
-.Loop:	fc in0				// issuable on M0 only
-	add in0=32,in0
+.Loop:	fc.i in0			// issuable on M0 only
+	add in0=CACHE_BYTES,in0
 	br.cloop.sptk.few .Loop
 	;;
 	sync.i
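
The loop count in the hunks above can be modelled in C. This is only a
sketch of my reading of the instructions shown (sub, shr.u, mov ar.lc,
br.cloop), assuming the lines outside the hunks do not change r8 or in0.
The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of times the .Loop body runs, following
 *     sub   r8=in1,in0,1
 *     shr.u r8=r8,CACHE_SHIFT
 *     mov   ar.lc=r8
 * A br.cloop loop executes its body ar.lc + 1 times.
 */
static inline uint64_t flush_loop_iterations(uint64_t start, uint64_t end,
                                             unsigned shift)
{
    assert(end > start);
    return ((end - start - 1) >> shift) + 1;
}

/* Number of (1 << shift)-byte aligned lines that [start, end) touches. */
static inline uint64_t lines_touched(uint64_t start, uint64_t end,
                                     unsigned shift)
{
    assert(end > start);
    return ((end - 1) >> shift) - (start >> shift) + 1;
}
```

For a 4096-byte range starting at 0, this gives 128 iterations with a
shift of 5 and 32 with a shift of 7. When start is line-aligned the two
functions agree. For an unaligned range such as 0x70..0x90 with a shift
of 7, the loop runs once while the range touches two lines, so if fc.i
acts only on the line containing its address, the start may need rounding
down first. The same caveat would apply to the old 32-byte version.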
