Jack Steiner wrote:
> On Thu, Jun 02, 2005 at 02:12:02PM +0200, Zoltan Menyhart wrote:
> 
>>+.Loop:	fc.i	in0			// issuable on M0 only
>>+	add	in0=r21,in0
>> 	br.cloop.sptk.few .Loop
>> 	;;
> 
> 
> I noticed that the flush loop is a single-bundle loop. I know
> that this loop was not introduced by your code, but according to 
> Intel, single bundle loops should not be used in performance critical code.
> 
> We ran into severe performance problems several years ago with single bundle
> loops. IIRC, the details were posted to the ia64 mail list & the 
> resolution was "don't use single bundle loops". I don't know if the performance
> problem exists if the loop contains an fc instruction but you may want
> to unroll the loop one additional cycle. 
> 
> (The problem is that single bundle loops that are not aligned on a
> 0 mod 32 address will run significantly slower (we observed 3X slower) after 
> an interrupt).

Thank you for your remark.

I added a "nop.b 0" to occupy the original slot of "br".
I hope it is fine that my "br" has moved to the last slot of a second
bundle, so the loop now spans two bundles:

0xa000000100302d00 <flush_icache_range+64>:     [MIB]       fc.i r32
0xa000000100302d01 <flush_icache_range+65>:                 add r32=r21,r32
0xa000000100302d02 <flush_icache_range+66>:                 nop.b 0x0
0xa000000100302d10 <flush_icache_range+80>:     [MFB]       nop.m 0x0
0xa000000100302d11 <flush_icache_range+81>:                 nop.f 0x0
0xa000000100302d12 <flush_icache_range+82>:                 br.cloop.sptk.few 0xa000000100302d00 <flush_icache_range+64>;;
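
To make Jack's "0 mod 32" condition concrete, here is a small C sketch
that checks the addresses from the disassembly above. The helper names
are mine and purely illustrative, not kernel APIs. It assumes only that
IA-64 bundles are 16 bytes and that gdb prints slot n of a bundle as
"bundle address + n".

```c
#include <stdint.h>

/* IA-64 instruction bundles are 128 bits (16 bytes) wide. gdb prints
 * slot n of a bundle as "bundle address + n", hence ...2d00, ...2d01,
 * ...2d02 in the disassembly. */
#define IA64_BUNDLE_SIZE 16u

/* Illustrative helper: nonzero if addr is 0 mod 32. */
static int is_aligned_32(uint64_t addr)
{
	return (addr & 31u) == 0;
}

/* Illustrative helper: drop gdb's slot index (0..2) to get the
 * address of the bundle holding a given instruction. */
static uint64_t bundle_of(uint64_t insn_addr)
{
	return insn_addr & ~(uint64_t)(IA64_BUNDLE_SIZE - 1);
}

/* Illustrative helper: number of bundles from the loop head up to and
 * including the bundle that holds the backward branch. */
static unsigned loop_bundles(uint64_t head, uint64_t branch_insn)
{
	return (unsigned)((bundle_of(branch_insn) - bundle_of(head)) /
			  IA64_BUNDLE_SIZE) + 1;
}
```

With the values above, is_aligned_32(0xa000000100302d00) holds and
loop_bundles(0xa000000100302d00, 0xa000000100302d12) gives 2, so in this
particular build the loop head happens to sit on a 32-byte boundary.
This only checks the addresses of this one build.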

Zoltan