From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 13 Aug 1999 22:18:18 +1000
Message-Id: <199908131218.WAA32706@tango.anu.edu.au>
From: Paul Mackerras
To: Geert.Uytterhoeven@cs.kuleuven.ac.be
CC: rth@cygnus.com, Jes.Sorensen@cern.ch, linuxppc-dev@lists.linuxppc.org, linux-fbdev@vuser.vu.union.edu
In-reply-to: (message from Geert Uytterhoeven on Thu, 12 Aug 1999 14:31:25 +0200 (CEST))
Subject: Re: [linux-fbdev] Re: readl() and friends and eieio on PPC
Reply-to: Paul.Mackerras@cs.anu.edu.au
References:
Sender: owner-linuxppc-dev@lists.linuxppc.org
List-Id:

Geert Uytterhoeven wrote:

> I'm seeing different things (results don't tend to vary a lot):
>
> | [14:27:01]/tmp# ./a.out 0xc2800000
> | 35 29 30 31 28
> | 261 251 247 248 248
> | 429 332 358 374 348
> | 541 532 529 531 529
> | [14:27:05]/tmp#
>
> Hence eieio() is quite expensive on memory.
>
> This in on an IBM LongTrail (CHRP), with 604e at 200 MHz, 512 KB L2 cache,
> 66 MHz SDRAM bus, and 33 MHz PCI to an ATI RAGE II+.

I tried it on my longtrail, with a 300MHz 604 machV. I changed the
loop count to 18 since that is the ratio of cpu clock to timebase
clock on this machine. (You should probably use 12 on your machine.)
I got results much like yours:

     23  23  20  20  21   av=21.4
    180 175 175 175 175   av=176.0
    288 358 275 359 309   av=317.8
    375 400 351 423 351   av=380.0

So yes, in this case adding the eieios costs about 22 cycles each when
going to main memory, or 9 cycles each when going to the framebuffer.
I guess that when going to the framebuffer, much of the latency of the
eieio gets hidden. It would be interesting to try a mix of loads and
stores to the framebuffer, perhaps 4 loads followed by 4 stores to get
the effect of a bitblt routine.

I tried my framebuffer-copy test on my 7600, which has 200MHz 604e
cpus, and I didn't see any difference in overall time for the test,
whether there were eieio's in or not.

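
As a rough check of the per-eieio figures above, the sketch below recomputes the four averages and divides the with/without difference by the number of eieios per loop iteration. Two things here are assumptions, not stated in the message: that the rows are, in order, memory without eieio, memory with eieio, framebuffer without, framebuffer with; and the eieio count per iteration, which is a parameter (a value of 7 happens to reproduce "about 22" and "about 9"). The function and array names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Mean of n timing samples, in CPU cycles per loop iteration. */
double mean_cycles(const int *samples, size_t n)
{
    long sum = 0;

    assert(samples != NULL && n > 0);
    for (size_t i = 0; i < n; i++)
        sum += samples[i];
    return (double)sum / (double)n;
}

/*
 * Extra cycles attributable to each eieio.
 * eieios_per_iter is an ASSUMPTION: the message does not say how many
 * eieios the test loop executes per iteration.
 */
double cost_per_eieio(double without, double with, int eieios_per_iter)
{
    assert(eieios_per_iter > 0);
    return (with - without) / eieios_per_iter;
}

/* The four rows of results quoted above (row meanings are inferred). */
const int mem_plain[5] = { 23, 23, 20, 20, 21 };
const int mem_eieio[5] = { 180, 175, 175, 175, 175 };
const int fb_plain[5]  = { 288, 358, 275, 359, 309 };
const int fb_eieio[5]  = { 375, 400, 351, 423, 351 };
```

Calling cost_per_eieio(mean_cycles(mem_plain, 5), mean_cycles(mem_eieio, 5), 7) gives a little over 22, and the same call on the framebuffer rows gives a little under 9, so the quoted costs are consistent with roughly 7 eieios per iteration under these assumptions.
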
This morning I read something in the PPC750 manual which implied that
the G3 doesn't reorder stores, and doesn't reorder non-cacheable
accesses. That would mean eieio could be a no-op, which could help
explain why it only takes 1 cycle on a G3. :-) (Not reordering
non-cacheable accesses actually makes a lot of sense to me.)

I think that probably the best thing is to have safe and fast variants
of readl/writel etc. For the sake of not having to change a whole
heap of drivers (whose maintainers use x86 cpus :-() I would urge that
readl/writel include the eieio, and that we have readl_fast,
writel_fast etc. which don't include the eieio.

I would still be interested to see overall timings for frame-buffer
operations with and without the eieios.

Paul.
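
The safe/fast split proposed above could look roughly like the following. This is only a minimal user-space sketch, not the kernel's implementation: the function bodies, the placement of the eieio after each access, and the fallback to a plain compiler barrier on non-PowerPC hosts are all assumptions made for illustration. It uses GCC/Clang inline-asm syntax and deliberately leaves out the byte reversal that little-endian PCI accesses need on a big-endian CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Ordering barrier: the real eieio instruction on PowerPC, otherwise
 * just a compiler barrier so the sketch still builds elsewhere. */
#if defined(__powerpc__) || defined(__PPC__)
#define eieio() __asm__ __volatile__("eieio" : : : "memory")
#else
#define eieio() __asm__ __volatile__("" : : : "memory")
#endif

/* Fast variants: a bare volatile access with no ordering guarantee.
 * The caller is responsible for inserting barriers where needed. */
uint32_t readl_fast(const volatile void *addr)
{
    assert(addr != NULL);
    return *(const volatile uint32_t *)addr;
}

void writel_fast(uint32_t val, volatile void *addr)
{
    assert(addr != NULL);
    *(volatile uint32_t *)addr = val;
}

/* Safe variants: the same access followed by an eieio, so drivers
 * written with x86 ordering in mind keep working unchanged. */
uint32_t readl(const volatile void *addr)
{
    uint32_t val = readl_fast(addr);

    eieio();
    return val;
}

void writel(uint32_t val, volatile void *addr)
{
    writel_fast(val, addr);
    eieio();
}
```

A bitblt-style inner loop that has been audited for ordering could then use readl_fast/writel_fast for the bulk of the copy and a single eieio() at the end, while unaudited drivers keep calling readl/writel.
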