* 5121 cache handling.
From: Kenneth Johansson @ 2009-08-07 12:53 UTC
To: linuxppc-dev
On the 5121 there is an e300 core that, unfortunately, is connected to the
rest of the SoC by a bus that does not support coherency.

The solution for many drivers has been to use uncached memory. But for the
framebuffer that is not going to work, as the performance impact of doing
graphics operations on uncached memory is too large.

Currently the "solution" is to flush the cache in the interrupt
handler:

#if defined(CONFIG_NOT_COHERENT_CACHE)
int i;
unsigned int *ptr;
ptr = coherence_data;
for (i = 0; i < 1024*8; i++)
*ptr++ = 0;
#endif

Now this apparently is not enough on an e300 core, which has a PLRU cache
replacement algorithm. But what is the optimal solution?

Shouldn't the framebuffer be marked as cache write-through, that is, with the
W bit set in the TLB mapping? Why is this not done? Is that feature also
not working on the 5121?

If this manual handling needs to be done, which approach is best? Do it as
now, but over 52 KB of memory, basically throwing out everything in the
cache in the process whether it was needed or not? Or do it carefully over
just the framebuffer memory?

The problem with doing it over just the framebuffer is that a 1024x768
buffer at 4 bytes per pixel is 98304 cache lines, so it's going to take a
considerable time. How many cycles does it take per cache line if we never
get a hit? 3 cycles at 400 MHz gives about 44 ms per second, or 4-5%
overhead:

  1024*768*4/32 * 3 * (1/400000000) * 60 = 0.0442368

52 KB, on the other hand, is only 1664 lines, but it is obviously going to
cause a lot of actual memory writes for every modified cache line, and
later a lot of reads to bring back what was evicted.
* Re: 5121 cache handling.
From: Scott Wood @ 2009-08-07 19:56 UTC
To: Kenneth Johansson; +Cc: linuxppc-dev
On Fri, Aug 07, 2009 at 02:53:52PM +0200, Kenneth Johansson wrote:
> On the 5121 there is an e300 core that, unfortunately, is connected to the
> rest of the SoC by a bus that does not support coherency.
>
> The solution for many drivers has been to use uncached memory. But for the
> framebuffer that is not going to work, as the performance impact of doing
> graphics operations on uncached memory is too large.
>
> Currently the "solution" is to flush the cache in the interrupt
> handler:
>
> #if defined(CONFIG_NOT_COHERENT_CACHE)
> int i;
> unsigned int *ptr;
> ptr = coherence_data;
> for (i = 0; i < 1024*8; i++)
> *ptr++ = 0;
> #endif
>
> Now this apparently is not enough on an e300 core, which has a PLRU cache
> replacement algorithm. But what is the optimal solution?
Which driver (in which kernel) are you looking at?
drivers/video/fsl-diu-fb.c in current mainline has properly sized
coherence data. It also does a dcbz (on unused data) instead of loads,
as it's apparently faster (though I'd think you'd get more traffic
flushing those zeroes out later on, compared to a clean line that can
just be discarded).
> Shouldn't the framebuffer be marked as cache write-through, that is, with the
> W bit set in the TLB mapping? Why is this not done? Is that feature also
> not working on the 5121?
It probably would have been too slow.
> The problem with doing it over just the framebuffer is that a 1024x768
> buffer at 4 bytes per pixel is 98304 cache lines, so it's going to take a
> considerable time.
That's why we flush the whole cache instead.
> How many cycles does it take per cache line if we never get a hit?
> 3 cycles at 400 MHz gives about 44 ms per second, or 4-5% overhead:
>
>   1024*768*4/32 * 3 * (1/400000000) * 60 = 0.0442368
>
> 52 KB, on the other hand, is only 1664 lines, but it is obviously going to
> cause a lot of actual memory writes for every modified cache line, and
> later a lot of reads to bring back what was evicted.
During periods of framebuffer activity, a lot of those cache lines likely
are for the framebuffer, so you'll still have those same issues.
If current performance is inadequate, you may want to consider using the
MMU and timers to figure out when the framebuffer is active, and stop the
sync when it's not.
-Scott
* Re: 5121 cache handling.
From: Kenneth Johansson @ 2009-08-10 20:16 UTC
To: Scott Wood; +Cc: linuxppc-dev
On Fri, 2009-08-07 at 14:56 -0500, Scott Wood wrote:
> On Fri, Aug 07, 2009 at 02:53:52PM +0200, Kenneth Johansson wrote:
> > On the 5121 there is an e300 core that, unfortunately, is connected to the
> > rest of the SoC by a bus that does not support coherency.
> >
> > The solution for many drivers has been to use uncached memory. But for the
> > framebuffer that is not going to work, as the performance impact of doing
> > graphics operations on uncached memory is too large.
> >
> > Currently the "solution" is to flush the cache in the interrupt
> > handler:
> >
> > #if defined(CONFIG_NOT_COHERENT_CACHE)
> > int i;
> > unsigned int *ptr;
> > ptr = coherence_data;
> > for (i = 0; i < 1024*8; i++)
> > *ptr++ = 0;
> > #endif
> >
> > Now this apparently is not enough on an e300 core, which has a PLRU cache
> > replacement algorithm. But what is the optimal solution?
>
> Which driver (in which kernel) are you looking at?
The one included in ltib 2009-06-02 for the ads5121. I thought that
included the latest drivers.
> drivers/video/fsl-diu-fb.c in current mainline has properly sized
> coherence data. It also does a dcbz (on unused data) instead of loads,
> as it's apparently faster (though I'd think you'd get more traffic
> flushing those zeroes out later on, compared to a clean line that can
> just be discarded).
It's hard to know exactly how things behave when a cache is involved.
But the code allocates the 52 KB buffer with vmalloc, and that can't be
right: the cache works on physical addresses, so the 52 KB of data needs
to be contiguous in physical memory, and vmalloc does not guarantee that.
> > Shouldn't the framebuffer be marked as cache write-through, that is, with the
> > W bit set in the TLB mapping? Why is this not done? Is that feature also
> > not working on the 5121?
>
> It probably would have been too slow.
How much slower would write-through be? I thought there was not that big
a difference from copy-back.
* Re: 5121 cache handling.
From: Scott Wood @ 2009-08-10 20:26 UTC
To: Kenneth Johansson; +Cc: linuxppc-dev
Kenneth Johansson wrote:
> But the code allocates the 52 KB buffer with vmalloc, and that can't be
> right: the cache works on physical addresses, so the 52 KB of data needs
> to be contiguous in physical memory, and vmalloc does not guarantee that.
Yeah, that looks like a bug.
>>> Shouldn't the framebuffer be marked as cache write-through, that is, with the
>>> W bit set in the TLB mapping? Why is this not done? Is that feature also
>>> not working on the 5121?
>> It probably would have been too slow.
>
> How much slower would write-through be? I thought there was not that big
> a difference from copy-back.
It's a big difference if you're writing out an entire cache line of data
anyway, but because of write-through it goes out one word at a time
without bursting.
-Scott
* Re: 5121 cache handling.
From: Kenneth Johansson @ 2009-08-10 20:45 UTC
To: Scott Wood; +Cc: linuxppc-dev
On Mon, 2009-08-10 at 15:26 -0500, Scott Wood wrote:
> Kenneth Johansson wrote:
> >>> Shouldn't the framebuffer be marked as cache write-through, that is, with the
> >>> W bit set in the TLB mapping? Why is this not done? Is that feature also
> >>> not working on the 5121?
> >> It probably would have been too slow.
> >
> > How much slower would write-through be? I thought there was not that big
> > a difference from copy-back.
>
> It's a big difference if you're writing out an entire cache line of data
> anyway, but because of write-through it goes out one word at a time
> without bursting.
>
> -Scott
>
Yes, the memory system would obviously get a higher load, but does the CPU
actually see that? Does it stall on the write?
Thread overview: 6+ messages
2009-08-07 12:53 5121 cache handling Kenneth Johansson
2009-08-07 19:56 ` Scott Wood
2009-08-10 20:16 ` Kenneth Johansson
2009-08-10 20:26 ` Scott Wood
2009-08-10 20:45 ` Kenneth Johansson
2009-08-10 20:49 ` Scott Wood