* OMAPFB_FILLRECT and friends
@ 2006-02-03 19:11 Brian Swetland
2006-02-06 8:26 ` Imre Deak
0 siblings, 1 reply; 7+ messages in thread
From: Brian Swetland @ 2006-02-03 19:11 UTC (permalink / raw)
To: linux-omap-open-source
It appears that an older version of the omapfb driver had support for
ioctls to FILLRECT, COPYAREA, TRANSPARENT_BLIT, etc. A bit of searching
found this version at:
http://source.mvista.com/git/gitweb.cgi?p=linux-omap-2.6.git;a=blob;h=aeb66c8fbb09bb675a4662a5c055d0162f5e109f
Being unwise in the way of git or gitweb, I have no idea how to go
from this page to "where in space or time does this stuff live".
Any pointers? Was this stuff once there and later removed (it
doesn't seem to be in tip of tree) and if so, why?
Curious,
Brian
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: OMAPFB_FILLRECT and friends
2006-02-03 19:11 Brian Swetland
@ 2006-02-06 8:26 ` Imre Deak
0 siblings, 0 replies; 7+ messages in thread
From: Imre Deak @ 2006-02-06 8:26 UTC (permalink / raw)
To: ext Brian Swetland; +Cc: linux-omap-open-source
Hi,
On Fri, 2006-02-03 at 11:11 -0800, ext Brian Swetland wrote:
> It appears that an older version of the omapfb driver had support for
> ioctls to FILLRECT, COPYAREA, TRANSPARENT_BLIT, etc. A bit of searching
> found this version at:
> http://source.mvista.com/git/gitweb.cgi?p=linux-omap-2.6.git;a=blob;h=aeb66c8fbb09bb675a4662a5c055d0162f5e109f
>
> Being unwise in the way of git or gitweb, I have no idea how to go
> from this page to "where in space or time does this stuff live".
> Any pointers?
You can check out a specific version based on the commit id with
$ git checkout -b <branch> <commit id>
> Was this stuff once there and later removed (it
> doesn't seem to be in tip of tree) and if so, why?
Yes, I removed it because later performance testing showed no
improvement whatsoever when using DMA for SDRAM-to-SDRAM transfers
compared to using MPU bursts.

Since it was dead code, I didn't see any reason to leave it in the
driver.
--Imre
--Imre
>
> Curious,
>
> Brian
> _______________________________________________
> Linux-omap-open-source mailing list
> Linux-omap-open-source@linux.omap.com
> http://linux.omap.com/mailman/listinfo/linux-omap-open-source
* RE: OMAPFB_FILLRECT and friends
@ 2006-02-06 13:36 Woodruff, Richard
2006-02-07 7:18 ` Imre Deak
0 siblings, 1 reply; 7+ messages in thread
From: Woodruff, Richard @ 2006-02-06 13:36 UTC (permalink / raw)
To: Imre Deak, ext Brian Swetland; +Cc: linux-omap-open-source
> $ git checkout -b <branch> <commit id>
>
> > Was this stuff once there and later removed (it
> > doesn't seem to be in tip of tree) and if so, why?
>
> Yes, I removed it because later performance testing showed no
> improvement whatsoever when using DMA for SDRAM-to-SDRAM transfers
> compared to using MPU bursts.
>
> Since it was dead code, I didn't see any reason to leave it in the
> driver.
Generally, unless the MPU overhead is high, using DMA drops the CPU
load even when raw performance is equivalent. Ideally, test runs
capturing raw throughput, CPU load, and power usage would accompany
such tweaks. The results may vary across CPUs, as the internals are not
all equal even though the register sets may be close.
Regards,
Richard W.
* RE: OMAPFB_FILLRECT and friends
2006-02-06 13:36 Woodruff, Richard
@ 2006-02-07 7:18 ` Imre Deak
0 siblings, 0 replies; 7+ messages in thread
From: Imre Deak @ 2006-02-07 7:18 UTC (permalink / raw)
To: ext Woodruff, Richard; +Cc: linux-omap-open-source
On Mon, 2006-02-06 at 07:36 -0600, ext Woodruff, Richard wrote:
> > $ git checkout -b <branch> <commit id>
> >
> > > Was this stuff once there and later removed (it
> > > doesn't seem to be in tip of tree) and if so, why?
> >
> > Yes, I removed it because later performance testing showed no
> > improvement whatsoever when using DMA for SDRAM-to-SDRAM transfers
> > compared to using MPU bursts.
> >
> > Since it was dead code, I didn't see any reason to leave it in the
> > driver.
>
> Generally, unless the MPU overhead is high, using DMA drops the CPU
> load even when raw performance is equivalent.
Yes, that's the only benefit I would have expected.
> Ideally, test runs capturing raw throughput, CPU load, and power
> usage would accompany such tweaks.
> The results may vary across CPUs, as the internals are not all equal
> even though the register sets may be close.
I've made the measurements on OMAP1610/1710 platforms. What I wanted to
see was whether using DMA would indeed offload the MPU. In reality the
MPU was practically stalled, I assume because of SDRAM or system
bandwidth limitations and the fact that we have a cache flush between
context switches.

I tested this by running X, which used the DMA versions of the copy and
fill operations.
--Imre
>
> Regards,
> Richard W.
* RE: OMAPFB_FILLRECT and friends
@ 2006-02-07 13:57 Woodruff, Richard
2006-02-07 14:57 ` Paul Mundt
0 siblings, 1 reply; 7+ messages in thread
From: Woodruff, Richard @ 2006-02-07 13:57 UTC (permalink / raw)
To: Imre Deak; +Cc: linux-omap-open-source
> > Ideally, test runs capturing raw throughput, CPU load, and power
> > usage would accompany such tweaks.
> > The results may vary across CPUs, as the internals are not all equal
> > even though the register sets may be close.
>
> I've made the measurements on OMAP1610/1710 platforms. What I wanted
> to see was whether using DMA would indeed offload the MPU. In reality
> the MPU was practically stalled, I assume because of SDRAM or system
> bandwidth limitations and the fact that we have a cache flush between
> context switches.
2420 and 2430 have much greater bandwidth through the L3 interconnect
to the DDR (16/17xx < 2420 < 2430 in this regard), and v6 doesn't flush
caches nearly as often, so many of these effects are gone.

At present on OMAP2 we do make test runs that measure throughput, CPU
load, and power per device. I think in all cases we have measured the
expected reduction in CPU load when the DMA path was enabled (though
not always a performance gain, given the speed of the peripheral).

I would expect this code to be useful on OMAP2 for sure.
Regards,
Richard W.
* Re: OMAPFB_FILLRECT and friends
2006-02-07 13:57 OMAPFB_FILLRECT and friends Woodruff, Richard
@ 2006-02-07 14:57 ` Paul Mundt
0 siblings, 0 replies; 7+ messages in thread
From: Paul Mundt @ 2006-02-07 14:57 UTC (permalink / raw)
To: Woodruff, Richard; +Cc: linux-omap-open-source
Hi Richard,
On Tue, Feb 07, 2006 at 07:57:44AM -0600, Woodruff, Richard wrote:
> > I've made the measurements on OMAP1610/1710 platforms. What I wanted
> > to see was whether using DMA would indeed offload the MPU. In reality
> > the MPU was practically stalled, I assume because of SDRAM or system
> > bandwidth limitations and the fact that we have a cache flush between
> > context switches.
>
> 2420 and 2430 have much greater bandwidth through the L3 interconnect
> to the DDR (16/17xx < 2420 < 2430 in this regard), and v6 doesn't
> flush caches nearly as often, so many of these effects are gone.
>
That's irrelevant in the DMA case, since there's no snooping logic and
we have the write-back or invalidate, depending on direction, anyway.
It's certainly not as frequent as every context switch, but it's still
going to have an impact on performance.

If the MPU ends up stalling anyway, then offloading to DMA is
pointless, since the offload doesn't really buy you anything, not to
mention the higher cost of setting up the DMA in the first place.

Unless it's a clear win from a performance point of view, it's not
obvious that the added complexity of the code in the driver is a
worthwhile tradeoff. The numbers provided by Imre don't indicate that
DMA is a significant win; if you have some that show otherwise, it'd be
interesting to see your test cases and the effect they have on MPU
stalling.
* RE: OMAPFB_FILLRECT and friends
@ 2006-02-07 16:34 Woodruff, Richard
0 siblings, 0 replies; 7+ messages in thread
From: Woodruff, Richard @ 2006-02-07 16:34 UTC (permalink / raw)
To: Paul Mundt; +Cc: linux-omap-open-source
Hi Paul,
> On Tue, Feb 07, 2006 at 07:57:44AM -0600, Woodruff, Richard wrote:
> > > I've made the measurements on OMAP1610/1710 platforms. What I
> > > wanted to see was whether using DMA would indeed offload the MPU.
> > > In reality the MPU was practically stalled, I assume because of
> > > SDRAM or system bandwidth limitations and the fact that we have a
> > > cache flush between context switches.
> >
> > 2420 and 2430 have much greater bandwidth through the L3
> > interconnect to the DDR (16/17xx < 2420 < 2430 in this regard), and
> > v6 doesn't flush caches nearly as often, so many of these effects
> > are gone.
> >
> That's irrelevant in the DMA case, since there's no snooping logic and
> we have the write-back or invalidate, depending on direction, anyway.
> It's certainly not as frequent as every context switch, but it's still
> going to have an impact on performance.
What is the mapping of the buffers in question? It seems the mmap done
in fbmem.c will mark things as non-cached, and at best write-combining
on ARM.

DMA maintenance may or may not include cache flushing, depending on
what attributes the buffer is allocated with. In the case of video it
is not cached in the general case. Or are you referring to another
mechanism?

Having the ARM do PIO to strongly ordered memory on a v6 will
completely stall the pipeline, and it uses a lot of logic to move data
that could probably be moved more efficiently by a DMA engine that is
tiny by comparison in logic. Having the ARM in WFI is far better, from
a power perspective, than having it do PIO.
> If the MPU ends up stalling anyway, then offloading to DMA is
> pointless, since the offload doesn't really buy you anything, not to
> mention the higher cost of setting up the DMA in the first place.
I think I'm missing something here. Why is the MPU stalling here; isn't
there other work for it to do? Why should it be dependency-locked on
some graphics operation? While the DMA is ongoing, the CPU should be
off doing other work and get an async notification on completion of the
event. If there is nothing else for it to do, then fine, it can sit in
idle. Having the ability to do other work is what matters.

When I think of stalling, I look at everything from the
microarchitecture pipeline level right up to the macro idle loop. The
micro level doesn't seem to apply here, but the loop does.
> Unless it's a clear win from a performance point of view, it's not
> obvious that the added complexity of the code in the driver is a
> worthwhile tradeoff. The numbers provided by Imre don't indicate that
> DMA is a significant win; if you have some that show otherwise, it'd
> be interesting to see your test cases and the effect they have on MPU
> stalling.
The measurements we take are used for internal comparisons and
regression tracking. Releasing them requires several administrative
actions, which imply overhead and at minimum re-running the tests. It
is very likely that customers can request such data, triggering that
work, but it is something we are not set up to release at present.

I agree that if you are talking about a couple of bytes, the overhead
of a DMA may be more than it is worth. But if you are talking about a
large block copy, then DMA would seem to be better.
Regards,
Richard W.