* Re: No cache control on ppc??
[not found] <Pine.LNX.4.21.0201121753510.11140-100000@darkwing.informatik.uni-stuttgart.de>
@ 2002-01-12 21:39 ` Albert D. Cahalan
2002-01-13 5:36 ` Timothy A. Seufert
0 siblings, 1 reply; 6+ messages in thread
From: Albert D. Cahalan @ 2002-01-12 21:39 UTC (permalink / raw)
To: Siggi Langauf; +Cc: debian-powerpc, linuxppc-dev
> On i586 (or newer) machines with AGP, the X server can set some MTRR
> ranges. AFAIUI, these tell the (CPU-internal) cache controller not to
> cache video memory (which wouldn't make any sense, as that is used
> write-only).
It would make sense. You could fill up cache lines in the CPU,
then force a write-out all at once. You could then free the
cache line for future use.
> I haven't found anything similar in powerpc kernels, so I assume
> there is nothing like this. Is that correct? If so, is that a
> hardware restriction? Does the hardware do this automagically?
Oh come on... You get:
1. 4 cache-control bits per page table entry
2. instructions to manipulate cache lines
3. prefetch instructions (on "G4" chips: MPC7400, MPC7410...)
4. some TLB control that might be useful
5. 8 data BAT registers, allowing 4 super-size (256 MB) pages
6. 64-bit FPU (and 128-bit AltiVec) registers for memory copy
BTW, some of the above is good for RAID, IP checksums...
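For readers unfamiliar with item 1: the four per-page cache-control bits are what the PowerPC architecture calls WIMG (write-through, caching-inhibited, memory-coherent, guarded). The sketch below decodes them from the low word of a page table entry. The mask values are an assumption based on the classic 32-bit hashed page table layout, and the helper names are hypothetical.

```python
# Sketch only: WIMG storage-control bits, assuming the layout of the low
# word of a classic 32-bit PowerPC hashed page table entry.
WIMG_WRITE_THROUGH = 0x40  # W: write-through instead of copy-back
WIMG_CACHE_INHIBIT = 0x20  # I: caching inhibited
WIMG_COHERENT = 0x10       # M: memory coherence enforced
WIMG_GUARDED = 0x08        # G: guarded (no speculative access)


def decode_wimg(pte_low):
    """Return the four cache-control bits of a PTE low word as booleans."""
    return {
        "W": bool(pte_low & WIMG_WRITE_THROUGH),
        "I": bool(pte_low & WIMG_CACHE_INHIBIT),
        "M": bool(pte_low & WIMG_COHERENT),
        "G": bool(pte_low & WIMG_GUARDED),
    }


def wimg_string(pte_low):
    """Render the bits compactly, e.g. '-I-G' for an uncached, guarded page."""
    bits = decode_wimg(pte_low)
    return "".join(name if bits[name] else "-" for name in "WIMG")


if __name__ == "__main__":
    # Hypothetical example values, not read from real hardware.
    for value in (0x00, 0x10, 0x28, 0x40):
        print(f"0x{value:02x} -> {wimg_string(value)}")
```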
The serious problem is Apple's crappy 100 MHz bus. You'll have a
hard time moving much beyond 700 MiB/s I think. Supercomputer? Not.
I'm getting 351 in plus 351 out with 16 doubles on a Mac Cube.
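A back-of-the-envelope check of that ceiling, as a sketch. It assumes a 64-bit (8-byte) data bus with one transfer per clock, and it assumes the 351 figures are in MiB/s; neither is stated in the message.

```python
MIB = 1024 * 1024


def peak_bus_mib_per_s(bus_hz, bus_bytes):
    """Theoretical peak: one full-width transfer on every bus clock."""
    return bus_hz * bus_bytes / MIB


# 100 MHz comes from the message; the 8-byte bus width is an assumption.
peak = peak_bus_mib_per_s(100_000_000, 8)

# 351 in plus 351 out, from the message; units assumed to be MiB/s.
measured = 351 + 351

if __name__ == "__main__":
    print(f"theoretical peak: {peak:.0f} MiB/s")
    print(f"measured total:   {measured} MiB/s ({measured / peak:.0%} of peak)")
```

Under those assumptions the measured copy rate already sits close to the bus limit, which is consistent with the "hard time moving much beyond 700 MiB/s" estimate.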
Another problem is lack of OS support. You can't set mmap()
flags to indicate: cached, coherency not enforced, unguarded,
and no writeback. This is what you need. It would be nice to
get the BAT registers too, since user space does a lot more
memory access than the kernel does.
I don't know very much about MPEG, but something like this
would be a reasonable plan I guess:
Get some nice memory to use. Maybe 32 MiB, BAT mapped, with
all the attributes mentioned above. Flush all the cache lines
out -- you MUST if you have non-coherent memory, and it's a
nice idea anyway. Repeat before every use of the memory.
Get your video data, using raw IO. You'd really be asking
for several frames ahead of course.
Bite off a small chunk of the image. Pulling a number out of
my ass, I'll say 128x128 pixels and 4 frames deep. This fits
nicely into my 1 MB L2 cache. Go with 64x64 for the MPC7410.
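To see how much room such a tile leaves, here is a rough working-set calculation. The bytes-per-pixel values are assumptions (1.5 for planar YUV 4:2:0, 4 for 32-bit RGB); the message does not say which pixel format is meant. The 256 KiB case corresponds to the on-die L2 size that comes up later in the thread.

```python
KIB = 1024


def tile_bytes(width, height, frames, bytes_per_pixel):
    """Working-set size of a width x height tile held for several frames."""
    return int(width * height * frames * bytes_per_pixel)


def fits(width, height, frames, bytes_per_pixel, cache_bytes):
    """True if the tile's working set is no larger than the cache."""
    return tile_bytes(width, height, frames, bytes_per_pixel) <= cache_bytes


if __name__ == "__main__":
    caches = {"1 MiB L2": 1024 * KIB, "256 KiB L2": 256 * KIB}
    for side in (128, 64):
        for bpp in (1.5, 4):  # assumed pixel formats, see lead-in
            size = tile_bytes(side, side, 4, bpp)
            verdict = ", ".join(
                f"{name}: {'fits' if size <= cap else 'too big'}"
                for name, cap in caches.items()
            )
            print(f"{side}x{side} x 4 frames @ {bpp} B/px = {size // KIB} KiB ({verdict})")
```

Note that "fits" here only compares sizes. Code, tables and output buffers compete for the same cache, so a tile that exactly fills it leaves no headroom.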
Prefetch your data. If you have AltiVec, use AltiVec prefetch.
Do the decryption on that little chunk. Do the various motion
compensation things and inter-frame stuff on that little chunk.
You can process this tile in multiple frames to get better
cache usage. That is, you are doing work for future frames.
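The loop order this implies can be sketched as follows: visit one tile, run it through every frame in the group, then move to the next tile. The function names and the 720x480 frame size are hypothetical, chosen only for illustration.

```python
def tiles(width, height, tile):
    """Yield (x, y, w, h) rectangles covering the image, clipped at the edges."""
    for y in range(0, height, tile):
        for x in range(0, width, tile):
            yield x, y, min(tile, width - x), min(tile, height - y)


def schedule(width, height, tile, depth):
    """Yield (frame, rect) work items: all frames of one tile before the next tile."""
    for rect in tiles(width, height, tile):
        for frame in range(depth):
            yield frame, rect


if __name__ == "__main__":
    # Assumed frame size; 128-pixel tiles and 4 frames deep as in the message.
    for frame, rect in schedule(720, 480, 128, 4):
        pass  # decode/motion-compensate this tile of this frame here
    print(sum(1 for _ in tiles(720, 480, 128)), "tiles per frame")
```

Keeping the frame loop innermost is what lets the same tile-sized working set stay in cache across several frames, rather than streaming each whole frame through in turn.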
Now you may either
a. write back your cache, then start video DMA + color transform
b. do color transform interleaved with writing to video memory
Scaling goes there too, if you must. You might limit scaling
to small integer ratios, and pad/crop as needed to reach the
exact size desired.
Assuming you don't use DMA: make sure the video memory has the
same attributes as everything else, and use explicit cache
write-back instructions to push out the data.
** Sent via the linuxppc-dev mail list. See http://lists.linuxppc.org/
* Re: No cache control on ppc??
2002-01-12 21:39 ` No cache control on ppc?? Albert D. Cahalan
@ 2002-01-13 5:36 ` Timothy A. Seufert
2002-01-13 6:51 ` Albert D. Cahalan
0 siblings, 1 reply; 6+ messages in thread
From: Timothy A. Seufert @ 2002-01-13 5:36 UTC (permalink / raw)
To: Albert D. Cahalan, Siggi Langauf; +Cc: debian-powerpc, linuxppc-dev
At 4:39 PM -0500 1/12/02, Albert D. Cahalan wrote:
>Bite off a small chunk of the image. Pulling a number out of
>my ass, I'll say 128x128 pixels and 4 frames deep. This fits
>nicely into my 1 MB L2 cache. Go with 64x64 for the MPC7410.
You don't need to cut cache use by 1/4 on the 7410. It's got almost
the same L2 cache scheme as the 7400: they added one address bit so
it can use up to 2 MB of SRAM, and it can now use half or all of the
SRAM as memory instead of cache. I think all of Apple's 7410 systems
have 1 MB L2, and naturally Apple configures it all as cache.
Were you thinking of the 7450? It's the one that has 256 KB of
on-die L2. Keep in mind that it still has an interface for external
cache, which is now L3. Apple ships low end 7450 systems with no L3
and medium to high range systems with 2 MB L3.
--
Tim Seufert
* Re: No cache control on ppc??
2002-01-13 5:36 ` Timothy A. Seufert
@ 2002-01-13 6:51 ` Albert D. Cahalan
2002-01-13 8:06 ` "Cache Profiler" ? (was: No cache control on ppc??) Elizabeth Barham
0 siblings, 1 reply; 6+ messages in thread
From: Albert D. Cahalan @ 2002-01-13 6:51 UTC (permalink / raw)
To: Timothy A. Seufert
Cc: Albert D. Cahalan, Siggi Langauf, debian-powerpc, linuxppc-dev
Timothy A. Seufert writes:
> At 4:39 PM -0500 1/12/02, Albert D. Cahalan wrote:
>> Bite off a small chunk of the image. Pulling a number out of
>> my ass, I'll say 128x128 pixels and 4 frames deep. This fits
>> nicely into my 1 MB L2 cache. Go with 64x64 for the MPC7410.
>
> You don't need to cut cache use by 1/4 on the 7410. It's got almost
> the same L2 cache scheme as the 7400: they added one address bit so
> it can use up to 2 MB of SRAM, and it can now use half or all of the
> SRAM as memory instead of cache. I think all of Apple's 7410 systems
> have 1 MB L2, and naturally Apple configures it all as cache.
>
> Were you thinking of the 7450? It's the one that has 256 KB of
> on-die L2. Keep in mind that it still has an interface for external
> cache, which is now L3. Apple ships low end 7450 systems with no L3
> and medium to high range systems with 2 MB L3.
Yes, I meant the 7450.
Configuring the 7410 L2 or the 7450 L3 as SRAM would be going
way, way too far, I think. Not that it wouldn't be fun to try,
but then the box pretty much becomes a dedicated video player.
* "Cache Profiler" ? (was: No cache control on ppc??)
2002-01-13 6:51 ` Albert D. Cahalan
@ 2002-01-13 8:06 ` Elizabeth Barham
2002-01-13 19:36 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 6+ messages in thread
From: Elizabeth Barham @ 2002-01-13 8:06 UTC (permalink / raw)
To: Albert D. Cahalan
Cc: Timothy A. Seufert, Siggi Langauf, debian-powerpc, linuxppc-dev
Hi,
I recently installed a NewerTech Maxpowr G3 L2-Cache - which is a G3
on a board that fits into one of the L2's ram banks on my Starmax
3000/160. I was ecstatic that the bogomips increased by 187%
(199.47). Recently, though, I heard of someone installing a JoeBoard
into his StarMax 5000 and his bogomips being around 800. He mentioned
something about a "Cache Profiler". It seems that BootX is somehow
able to tell the kernel that there is a G3 in the cache and speed is
increased greatly.
The CPU on the StarMax 3000/160 motherboard itself (what originally
came with it) is a PPC 603e. /proc/cpuinfo shows a 750 - which is good
but the bogomips are nowhere near what this person reported. I do not
use BootX for I prefer booting straight into Linux with Quik. Does
anyone know any more about this and if it's possible to increase
performance more by somehow making the G3 quicker?
Thank you,
Elizabeth
* Re: "Cache Profiler" ? (was: No cache control on ppc??)
2002-01-13 8:06 ` "Cache Profiler" ? (was: No cache control on ppc??) Elizabeth Barham
@ 2002-01-13 19:36 ` Benjamin Herrenschmidt
2002-01-15 9:17 ` Elizabeth Barham
0 siblings, 1 reply; 6+ messages in thread
From: Benjamin Herrenschmidt @ 2002-01-13 19:36 UTC (permalink / raw)
To: Elizabeth Barham; +Cc: debian-powerpc, linuxppc-dev
>I recently installed a NewerTech Maxpowr G3 L2-Cache - which is a G3
>on a board that fits into one of the L2's ram banks on my Starmax
>3000/160. I was ecstatic that the bogomips increased by 187%
>(199.47). Recently, though, I heard of someone installing a JoeBoard
>into his StarMax 5000 and his bogomips being around 800. He mentioned
>something about a "Cache Profiler". It seems that BootX is somehow
>able to tell the kernel that there is a G3 in the cache and speed is
>increased greatly.
>
>The CPU on the StarMax 3000/160 motherboard itself (what originally
>came with it) is a PPC 603e. /proc/cpuinfo shows a 750 - which is good
>but the bogomips are nowhere near what this person reported. I do not
>use BootX for I prefer booting straight into Linux with Quik. Does
>anyone know any more about this and if it's possible to increase
>performance more by somehow making the G3 quicker?
First boot once with BootX. Once in linux, grab the value of
/proc/sys/kernel/l2cr. Then, go back to quik, and in your
boot scripts, write back this value. This is the configuration
of the backside L2 cache of the 750.
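A minimal sketch of that save-and-restore step. The path is the one given above; everything else is an assumption: that the first token in the file is the register value (possibly followed by a colon and a description), that the kernel accepts a hex string on write, and that the top bit of L2CR is the L2 enable bit. Writing requires root.

```python
L2CR_PATH = "/proc/sys/kernel/l2cr"  # path from the message above
L2CR_L2E = 0x80000000  # assumed: L2 enable is the top bit of L2CR


def parse_l2cr(text):
    """Extract the register value, assuming it is the first token in the file."""
    token = text.split()[0].rstrip(":")
    return int(token, 0)  # base 0 accepts both '0x...' hex and plain decimal


def read_l2cr(path=L2CR_PATH):
    """Read the current L2CR value (do this once after booting via BootX)."""
    with open(path) as f:
        return parse_l2cr(f.read())


def write_l2cr(value, path=L2CR_PATH):
    """Write a saved L2CR value back (from a boot script when using quik)."""
    with open(path, "w") as f:
        f.write(f"0x{value:08x}\n")


def l2_enabled(value):
    """True if the assumed enable bit is set in the given L2CR value."""
    return bool(value & L2CR_L2E)


if __name__ == "__main__":
    value = read_l2cr()
    state = "enabled" if l2_enabled(value) else "disabled"
    print(f"L2CR = 0x{value:08x} (L2 {state})")
```

A value that reads back as zero, or with the enable bit clear, would mean the 750 is running without its backside cache.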
Ben.
* Re: "Cache Profiler" ? (was: No cache control on ppc??)
2002-01-13 19:36 ` Benjamin Herrenschmidt
@ 2002-01-15 9:17 ` Elizabeth Barham
0 siblings, 0 replies; 6+ messages in thread
From: Elizabeth Barham @ 2002-01-15 9:17 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: debian-powerpc, linuxppc-dev
> First boot once with BootX. Once in linux, grab the value of
> /proc/sys/kernel/l2cr. Then, go back to quik, and in your boot
> scripts, write back this value. This is the configuration of the
> backside L2 cache of the 750.
Just a follow-up:
It turns out that Linux was using the 750 processor with its
configuration (1,0,0,1 [NewerTech G3L2]) but it was not using the
cache at all. In order to grab the parameters of the above-mentioned
file in the /proc/sys/kernel directory I had to install Mac
OS. Fortunately we had an extra drive available to install it upon.
The configuration that I had been using, though, disabled the cache so
I had to find a better setting that was quicker and stable (0,0,1,0
[240 MHz, 478.41 bogomips]). However, the gotcha! with this is that
quik (v2.0) throws a fatal error prior to the start-screen ("Choose
your kernel").
So, I ended up just keeping MacOS on half of the newly-installed drive
and will use BootX to boot into Linux now and in the future; it's not
*that* inconvenient and the increase in speed is easily worth it.
Thank you all for your help.
Kind regards, Elizabeth