* Re: No cache control on ppc??
  [not found] <Pine.LNX.4.21.0201121753510.11140-100000@darkwing.informatik.uni-stuttgart.de>
From: Albert D. Cahalan @ 2002-01-12 21:39 UTC
To: Siggi Langauf; +Cc: debian-powerpc, linuxppc-dev

> On i586 (or newer) machines with AGP, the X server can set some MTRR
> ranges. AFAIUI, these tell the (CPU-internal) cache controller not to
> cache video memory (which wouldn't make any sense, as that is used
> write-only).

It would make sense. You could fill up cache lines in the CPU,
then force a write-out all at once. You could then free the
cache line for future use.

> I haven't found anything similar in powerpc kernels, so I assume
> there is nothing like this. Is that correct? If so, is that a
> hardware restriction? Does the hardware do this automagically?

Oh come on... You get:

1. 4 cache-control bits per page table entry
2. instructions to manipulate cache lines
3. prefetch instructions (on "G4" chips: MPC7400, MPC7410...)
4. some TLB control that might be useful
5. 8 data BAT registers, allowing 4 super-size (256 MB) pages
6. 64-bit FPU (and 128-bit AltiVec) registers for memory copy

BTW, some of the above is good for RAID, IP checksums...

The serious problem is Apple's crappy 100 MHz bus. You'll have
a hard time moving much beyond 700 MiB/s I think. Supercomputer? Not.
I'm getting 351 in plus 351 out with 16 doubles on a Mac Cube.

Another problem is lack of OS support. You can't set mmap() flags
to indicate: cached, coherency not enforced, unguarded, and no
writeback. This is what you need. It would be nice to get the
BAT registers too, since user space does a lot more memory access
than the kernel does.

I don't know very much about MPEG, but something like this would
be a reasonable plan I guess:

Get some nice memory to use. Maybe 32 MiB, BAT mapped, with all
the attributes mentioned above.

Flush all the cache lines out -- you MUST if you have non-coherent
memory, and it's a nice idea anyway. Repeat before every use of
the memory.

Get your video data, using raw IO. You'd really be asking for
several frames ahead of course.

Bite off a small chunk of the image. Pulling a number out of
my ass, I'll say 128x128 pixels and 4 frames deep. This fits
nicely into my 1 MB L2 cache. Go with 64x64 for the MPC7410.

Prefetch your data. If you have AltiVec, use AltiVec prefetch.

Do the decryption on that little chunk.

Do the various motion compensation things and inter-frame stuff
on that little chunk. You can process this tile in multiple frames
to get better cache usage. That is, you are doing work for
future frames.

Now you may either
a. write back your cache, then start video DMA + color transform
b. do color transform interleaved with writing to video memory

Scaling goes there too, if you must. You might limit scaling
to small integer ratios, and pad/crop as needed to reach the
exact size desired.

Assuming you don't use DMA: make sure the video memory has the
same attributes as everything else, and use explicit cache
write-back instructions to push out the data.

** Sent via the linuxppc-dev mail list. See http://lists.linuxppc.org/
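As a rough check on the tile sizing suggested above, the working set of one tile can be estimated with a little arithmetic. This is only a sketch: the thread does not say how many bytes each pixel occupies, so the `bytes_per_pixel` value used here (and the helper name `tile_bytes`) are assumptions for illustration.

```python
# Estimate the working set of one tile: width x height pixels, several frames deep.
# bytes_per_pixel is an assumption; the original post does not name a pixel format.

MIB = 1024 * 1024


def tile_bytes(width, height, frames, bytes_per_pixel):
    """Return the number of bytes touched by one tile across all frames."""
    return width * height * frames * bytes_per_pixel


if __name__ == "__main__":
    # 128x128 pixels, 4 frames deep, assuming 4 bytes per pixel.
    size = tile_bytes(128, 128, 4, 4)
    verdict = "fits in" if size <= MIB else "does not fit in"
    print(size, "bytes,", verdict, "a 1 MiB cache")
```

Under that assumption the tile occupies 256 KiB, so a 1 MB L2 would still have room for reference data and code alongside it.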
* Re: No cache control on ppc??
From: Timothy A. Seufert @ 2002-01-13 5:36 UTC
To: Albert D. Cahalan, Siggi Langauf; +Cc: debian-powerpc, linuxppc-dev

At 4:39 PM -0500 1/12/02, Albert D. Cahalan wrote:
>Bite off a small chunk of the image. Pulling a number out of
>my ass, I'll say 128x128 pixels and 4 frames deep. This fits
>nicely into my 1 MB L2 cache. Go with 64x64 for the MPC7410.

You don't need to cut cache use by 1/4 on the 7410. It's got almost
the same L2 cache scheme as the 7400: they added one address bit so
it can use up to 2 MB of SRAM, and it can now use half or all of the
SRAM as memory instead of cache. I think all of Apple's 7410 systems
have 1 MB L2, and naturally Apple configures it all as cache.

Were you thinking of the 7450? It's the one that has 256 KB of
on-die L2. Keep in mind that it still has an interface for external
cache, which is now L3. Apple ships low end 7450 systems with no L3
and medium to high range systems with 2 MB L3.

--
Tim Seufert
* Re: No cache control on ppc??
From: Albert D. Cahalan @ 2002-01-13 6:51 UTC
To: Timothy A. Seufert
Cc: Albert D. Cahalan, Siggi Langauf, debian-powerpc, linuxppc-dev

Timothy A. Seufert writes:
> At 4:39 PM -0500 1/12/02, Albert D. Cahalan wrote:

>> Bite off a small chunk of the image. Pulling a number out of
>> my ass, I'll say 128x128 pixels and 4 frames deep. This fits
>> nicely into my 1 MB L2 cache. Go with 64x64 for the MPC7410.
>
> You don't need to cut cache use by 1/4 on the 7410. It's got almost
> the same L2 cache scheme as the 7400: they added one address bit so
> it can use up to 2 MB of SRAM, and it can now use half or all of the
> SRAM as memory instead of cache. I think all of Apple's 7410 systems
> have 1 MB L2, and naturally Apple configures it all as cache.
>
> Were you thinking of the 7450? It's the one that has 256 KB of
> on-die L2. Keep in mind that it still has an interface for external
> cache, which is now L3. Apple ships low end 7450 systems with no L3
> and medium to high range systems with 2 MB L3.

Yes, I meant the 7450.

Configuring the 7410 L2 or the 7450 L3 as SRAM would be going
way, way too far I think. Not that it wouldn't be fun to try,
but then the box pretty much becomes a dedicated video player.
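The 128x128 versus 64x64 suggestion follows from the cache sizes discussed here: 256 KB is a quarter of 1 MB, and a 64x64 tile has a quarter of the pixels of a 128x128 tile. A minimal sketch of that scaling follows. The 4 bytes per pixel, the share of the cache given to the tile (`fraction`), and the helper name `max_tile_side` are all assumptions that do not come from the thread.

```python
# Pick the largest power-of-two square tile that fits a share of the cache.
# bytes_per_pixel and fraction are illustrative assumptions, not thread values.

KIB = 1024


def max_tile_side(cache_bytes, frames=4, bytes_per_pixel=4, fraction=0.25):
    """Return the largest power-of-two side whose tile fits the cache budget.

    Returns 1 if even a 2x2 tile would exceed the budget.
    """
    budget = cache_bytes * fraction
    side = 1
    # Keep doubling while the next larger tile still fits in the budget.
    while (2 * side) ** 2 * frames * bytes_per_pixel <= budget:
        side *= 2
    return side


if __name__ == "__main__":
    for label, size in (("1 MB L2", 1024 * KIB), ("256 KB on-die L2", 256 * KIB)):
        print(label, "->", max_tile_side(size), "pixels per side")
```

With these assumed parameters the 1 MB case gives a side of 128 and the 256 KB case gives 64, matching the two tile sizes mentioned in the thread.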
* "Cache Profiler" ? (was: No cache control on ppc??)
From: Elizabeth Barham @ 2002-01-13 8:06 UTC
To: Albert D. Cahalan
Cc: Timothy A. Seufert, Siggi Langauf, debian-powerpc, linuxppc-dev

Hi,

I recently installed a NewerTech Maxpowr G3 L2-Cache - which is a G3
on a board that fits into one of the L2's ram banks on my Starmax
3000/160. I was ecstatic that the bogomips increased by 187%
(199.47). Recently, though, I heard of someone installing a JoeBoard
into his StarMax 5000 and his bogomips being around 800. He mentioned
something about a "Cache Profiler". It seems that BootX is somehow
able to tell the kernel that there is a G3 in the cache and speed is
increased greatly.

The CPU on the StarMax 3000/160 motherboard itself (what originally
came with it) is a PPC 603e. /proc/cpuinfo shows a 750 - which is good
but the bogomips are nowhere near what this person reported. I do not
use BootX, for I prefer booting straight into Linux with Quik. Does
anyone know any more about this, and whether it's possible to increase
performance further by somehow making the G3 quicker?

Thank you,
Elizabeth
* Re: "Cache Profiler" ? (was: No cache control on ppc??)
From: Benjamin Herrenschmidt @ 2002-01-13 19:36 UTC
To: Elizabeth Barham; +Cc: debian-powerpc, linuxppc-dev

>I recently installed a NewerTech Maxpowr G3 L2-Cache - which is a G3
>on a board that fits into one of the L2's ram banks on my Starmax
>3000/160. I was ecstatic that the bogomips increased by 187%
>(199.47). Recently, though, I heard of someone installing a JoeBoard
>into his StarMax 5000 and his bogomips being around 800. He mentioned
>something about a "Cache Profiler". It seems that BootX is somehow
>able to tell the kernel that there is a G3 in the cache and speed is
>increased greatly.
>
>The CPU on the StarMax 3000/160 motherboard itself (what originally
>came with it) is a PPC 603e. /proc/cpuinfo shows a 750 - which is good
>but the bogomips are nowhere near what this person reported. I do not
>use BootX for I prefer booting straight into Linux with Quik. Does
>anyone know anymore about this and if it's possible to increase
>performance more by somehow making the G3 quicker?

First boot once with BootX. Once in linux, grab the value of
/proc/sys/kernel/l2cr. Then, go back to quik, and in your boot
scripts, write back this value. This is the configuration of the
backside L2 cache of the 750.

Ben.
* Re: "Cache Profiler" ? (was: No cache control on ppc??)
From: Elizabeth Barham @ 2002-01-15 9:17 UTC
To: Benjamin Herrenschmidt; +Cc: debian-powerpc, linuxppc-dev

> First boot once with BootX. Once in linux, grab the value of
> /proc/sys/kernel/l2cr. Then, go back to quik, and in your boot
> scripts, write back this value. This is the configuration of the
> backside L2 cache of the 750.

Just a follow-up:

It turns out that Linux was using the 750 processor with its
configuration (1,0,0,1 [NewerTech G3L2]) but it was not using the
cache at all.

In order to grab the parameters of the above-mentioned file in the
/proc/sys/kernel directory I had to install Mac OS. Fortunately we had
an extra drive available to install it upon.

The configuration that I had been using, though, disabled the cache,
so I had to find a better setting that was quicker and stable (0,0,1,0
[240 MHz, 478.41 bogomips]). However, the gotcha! with this is that
quik (v2.0) throws a fatal error prior to the start-screen ("Choose
your kernel").

So, I ended up just keeping MacOS on half of the newly-installed drive
and will use BootX to boot into Linux now and in the future; it's not
*that* inconvenient and the increase in speed is easily worth it.

Thank you all for your help.

Kind regards,
Elizabeth
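Whether the backside L2 is actually switched on can be read from the top bit of the L2CR value. A hedged sketch: the mask below matches the `L2CR_L2E` definition in the Linux kernel headers, but the helper names are hypothetical, and the assumption that the saved value starts with a hexadecimal number is illustrative, since the thread does not show what `/proc/sys/kernel/l2cr` prints.

```python
# Check the L2 enable bit in a saved L2CR value.
# Assumption: the saved text begins with the register value in hexadecimal.

L2CR_L2E = 0x80000000  # L2 enable bit (most significant bit of L2CR)


def parse_l2cr(text):
    """Parse the first token of the saved text as a hexadecimal register value."""
    # Tolerate a trailing colon in case a description follows the value.
    token = text.split()[0].rstrip(":")
    return int(token, 16)


def l2_enabled(value):
    """Return True if the L2 enable bit is set in the given L2CR value."""
    return bool(value & L2CR_L2E)


if __name__ == "__main__":
    # Placeholder values chosen only to exercise both outcomes.
    for saved in ("0x80000000", "0x00000000"):
        state = "enabled" if l2_enabled(parse_l2cr(saved)) else "disabled"
        print(saved, "-> L2", state)
```

A value with this bit clear would be consistent with the situation described above, where the 750 was running but the cache was not in use.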