* Write Combining on PowerPC
@ 2004-12-10 21:23 Kendall Bennett
2004-12-10 23:21 ` Randy Vinson
2004-12-13 8:38 ` Lawrence E. Bakst
0 siblings, 2 replies; 3+ messages in thread
From: Kendall Bennett @ 2004-12-10 21:23 UTC
To: linuxppc-embedded
Hi Guys,
We are working on some PowerPC machines and noticed that the boxes don't
appear to support the equivalent of Write Combining that we get on x86
boxes. Copies to Video Memory on our Motorola Sandpoint box run about
10MB/s, which is terribly, terribly slow!
Does anyone know if it is possible to do something similar to Write
Combining for the PowerPC architecture, to speed up CPU access to the
linear framebuffer? Part of the problem is that for video overlay support
(not motion compensation) you have to dump the entire YUV frame into
video memory for the hardware overlay, and even on a 1GHz PPC box playing
an MPEG2 stream is not possible as X takes up over 80% of the CPU just to
copy the YUV data to video memory!
Obviously bus mastering will help solve this problem, but it would be
better if there was a way to enable faster CPU access to the
framebuffer as well.
Regards,
---
Kendall Bennett
Chief Executive Officer
SciTech Software, Inc.
Phone: (530) 894 8400
http://www.scitechsoft.com
~ SciTech SNAP - The future of device driver technology! ~
* Re: Write Combining on PowerPC
From: Randy Vinson @ 2004-12-10 23:21 UTC
To: Kendall Bennett; +Cc: linuxppc-embedded
Kendall Bennett wrote:
[snip]
> Does anyone know if it is possible to do something similar to Write
> Combining for the PowerPC architecture, to speed up CPU access to the
> linear framebuffer?
Yes, it is possible. First you will need to enable Store Gathering for
the Sandpoint's MPC107 bridge (in the kernel configuration, go to
Platform options -> Enable MPC10x store gathering). You will then need
to set the page protections for the video memory so that _PAGE_GUARDED
is removed. The hardware will not store gather to an area that is mapped
guarded. There appears to be some code to support this in the PCI mmap
routines, but I'm not familiar with the details.
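As a rough sketch of the protection change described above (not taken from the kernel source): the idea is to clear the guarded bit while leaving the rest of the mapping's flags alone. The function name fb_prot_for_store_gathering and the FB_PAGE_* constants are hypothetical stand-ins for illustration; real kernel code would use the _PAGE_GUARDED and _PAGE_NO_CACHE definitions from its own page table headers, and the bit values shown here are placeholders.

```c
#include <assert.h>

/* Placeholder bit values, for illustration only. Real code should use the
 * kernel's own _PAGE_NO_CACHE and _PAGE_GUARDED definitions instead. */
#define FB_PAGE_NO_CACHE 0x020ul
#define FB_PAGE_GUARDED  0x008ul

/* Return the protection bits for a framebuffer mapping with the guarded
 * bit cleared, so the bridge is allowed to gather stores to that range.
 * All other bits (including the no-cache bit) are passed through as-is. */
unsigned long fb_prot_for_store_gathering(unsigned long prot)
{
    /* Assumption of this sketch: the caller hands in an uncached mapping. */
    assert(prot & FB_PAGE_NO_CACHE);
    return prot & ~FB_PAGE_GUARDED;
}
```

Whether clearing the bit is enough on a given board still depends on the bridge having store gathering enabled, as noted above.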
Randy Vinson
* Re: Write Combining on PowerPC
From: Lawrence E. Bakst @ 2004-12-13 8:38 UTC
To: linuxppc-embedded
At 1:23 PM -0800 12/10/04, Kendall Bennett wrote:
>Hi Guys,
>
>We are working on some PowerPC machines and noticed that the boxes don't
>appear to support the equivalent of Write Combining that we get on x86
>boxes. Copies to Video Memory on our Motorola Sandpoint box run about
>10MB/s, which is terribly, terribly slow!
>
>Does anyone know if it is possible to do something similar to Write
>Combining for the PowerPC architecture, to speed up CPU access to the
>linear framebuffer? Part of the problem is that for video overlay support
>(not motion compensation) you have to dump the entire YUV frame into
>video memory for the hardware overlay, and even on a 1GHz PPC box playing
>an MPEG2 stream is not possible as X takes up over 80% of the CPU just to
>copy the YUV data to video memory!
1. As a previous poster mentioned, many PPCs have write combining, but they usually call it store gathering. I was just reading about it for the IBM 970FX.
2. What you need are cache line reads or writes through your bridge to the video memory.
3. If your frame buffer is marked non-cacheable, which is usually the case, see if you can set up a second aperture that is cached. Otherwise I don't think the store gathering will work. I don't know your board or processor, but you should experiment with cache modes to see which, if any, works best.
4. Assuming you can get a cacheable aperture, remember that when you write a complete image to frame buffer memory you waste 50% of your bandwidth reading cache lines from the frame buffer into your cache. You can use dcbz to establish a zeroed cache line without reading it from memory, and then write it. This should double your bandwidth to 20 MB/sec.
5. How good is your copy loop? If you have floating point registers, you can often use these to increase your efficiency. There may be other ways to make the copy loop more efficient using processor-specific instructions that generate more efficient memory loads and stores. Try loop unrolling. Also make sure you prefetch the source using dcbt or a similar instruction. You have to experiment to see how far ahead of use you need to prefetch the data.
6. Use small test programs to get it right.
7. You don't mention your processor type/speed, bus speeds and memory speed so it's pretty hard to tell what efficiency you might be able to achieve.
8. I make no comment about the efficiency of X. It's not what I would use for video applications, although I am sure there are those who have hacked it to work there.
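Points 4 and 5 above could be sketched roughly as follows. This is only an illustration under stated assumptions, not a tuned routine: the names fb_copy_lines, line_zero and line_prefetch are made up, the 32-byte CACHE_LINE and the PREFETCH_AHEAD distance are guesses that need to be checked against the actual CPU, and the non-PowerPC fallbacks exist only so the sketch builds on other machines. Note that dcbz typically raises an alignment exception on a cache-inhibited or write-through mapping, so this only applies if a cacheable aperture is available.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CACHE_LINE     32u /* assumption: must equal the CPU's real line size,
                              because dcbz clears one whole line */
#define PREFETCH_AHEAD 4u  /* assumption: lines to prefetch ahead, tune by test */

_Static_assert(CACHE_LINE == 8 * sizeof(uint32_t), "loop is unrolled for 8 words");

/* Establish a zeroed destination line without first reading it. */
static inline void line_zero(void *p)
{
#if defined(__powerpc__) || defined(__PPC__)
    __asm__ volatile("dcbz 0,%0" : : "r"(p) : "memory");
#else
    memset(p, 0, CACHE_LINE); /* portable stand-in, no bandwidth benefit */
#endif
}

/* Hint that a source line will be read soon. */
static inline void line_prefetch(const void *p)
{
#if defined(__powerpc__) || defined(__PPC__)
    __asm__ volatile("dcbt 0,%0" : : "r"(p));
#elif defined(__GNUC__)
    __builtin_prefetch(p);
#else
    (void)p;
#endif
}

/* Copy len bytes, one cache line per iteration. dst must be line-aligned,
 * src must be word-aligned, and len must be a multiple of CACHE_LINE. */
void fb_copy_lines(void *dst, const void *src, size_t len)
{
    uint32_t *d = dst;
    const uint32_t *s = src;

    assert(((uintptr_t)dst % CACHE_LINE) == 0);
    assert(((uintptr_t)src % sizeof(uint32_t)) == 0);
    assert((len % CACHE_LINE) == 0);

    for (size_t off = 0; off < len; off += CACHE_LINE) {
        /* Prefetch a later source line, but never past the end of src. */
        if (off + PREFETCH_AHEAD * CACHE_LINE < len)
            line_prefetch((const char *)src + off + PREFETCH_AHEAD * CACHE_LINE);

        line_zero(d); /* avoid reading the old destination line */

        /* Unrolled: eight word stores fill exactly one line. */
        d[0] = s[0]; d[1] = s[1]; d[2] = s[2]; d[3] = s[3];
        d[4] = s[4]; d[5] = s[5]; d[6] = s[6]; d[7] = s[7];
        d += 8;
        s += 8;
    }
}
```

In line with point 6, a routine like this is best timed in a small standalone test program, varying PREFETCH_AHEAD and the store width, before it goes anywhere near the real driver.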
Best,
leb