Jack Steiner wrote:
> On Thu, Jun 02, 2005 at 02:12:02PM +0200, Zoltan Menyhart wrote:
>
>>+.Loop:  fc.i    in0                     // issuable on M0 only
>>+        add     in0=r21,in0
>>         br.cloop.sptk.few .Loop
>>         ;;
>
> I noticed that the flush loop has a single bundle loop. I know
> that this loop was not introduced by your code, but according to
> Intel, single bundle loops should not be used in performance critical code.
>
> We ran in to severe performance problems several years ago with single bundle
> loops. IIRC, the details were posted to the ia64 mail list & the
> resolution was "don't use single bundle loops". I don't know if the performance
> problem exists if the loop contains an fc instruction but you may want
> to unroll the loop one additional cycle.
>
> (The problem is that single bundle loops that are not aligned on a
> 0 mod 32 address will run significantly slower (we observed 3X slower) after
> an interrupt).

Thank you for your remark.

I added a "nop.b 0" to occupy the original slot of "br", so the loop
now spans two bundles. I hope it is fine that my "br" is shifted to
the very last slot:

0xa000000100302d00 :  [MIB]  fc.i r32
0xa000000100302d01 :         add r32=r21,r32
0xa000000100302d02 :         nop.b 0x0
0xa000000100302d10 :  [MFB]  nop.m 0x0
0xa000000100302d11 :         nop.f 0x0
0xa000000100302d12 :         br.cloop.sptk.few 0xa000000100302d00 ;;

Zoltan