* [PATCH/RFC] qt4-embedded: tune QT_ARCH for armv6
From: Eric Bénard @ 2010-11-04 9:52 UTC
To: openembedded-devel; +Cc: Eric Bénard

This is an RFC; I think it can also be used on Qt 4.6.x and on armv7.
Setting QT_ARCH to armv6 enables some asm-optimized functions in Qt.

Signed-off-by: Eric Bénard <eric@eukrea.com>
---
 recipes/qt4/qt4-embedded_4.7.0.bb |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/recipes/qt4/qt4-embedded_4.7.0.bb b/recipes/qt4/qt4-embedded_4.7.0.bb
index 7e3d4b8..fbfaf85 100644
--- a/recipes/qt4/qt4-embedded_4.7.0.bb
+++ b/recipes/qt4/qt4-embedded_4.7.0.bb
@@ -2,9 +2,10 @@ DEFAULT_PREFERENCE = "-1"
 
 require qt4-embedded.inc
 
-PR = "${INC_PR}.1"
+PR = "${INC_PR}.2"
 
 QT_CONFIG_FLAGS_append_armv6 = " -no-neon "
+QT_ARCH_armv6 = "armv6"
 
 require qt-${PV}.inc
 
-- 
1.6.3.3
* Re: [PATCH/RFC] qt4-embedded: tune QT_ARCH for armv6
From: Khem Raj @ 2010-11-04 18:05 UTC
To: openembedded-devel; +Cc: Eric Bénard

On (04/11/10 10:52), Eric Bénard wrote:
> This is an RFC; I think it can also be used on Qt 4.6.x and on armv7.
> Setting QT_ARCH to armv6 enables some asm-optimized functions in Qt.
>
> Signed-off-by: Eric Bénard <eric@eukrea.com>

Acked-by: Khem Raj <raj.khem@gmail.com>

> ---
>  recipes/qt4/qt4-embedded_4.7.0.bb |    3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)
>
> diff --git a/recipes/qt4/qt4-embedded_4.7.0.bb b/recipes/qt4/qt4-embedded_4.7.0.bb
> index 7e3d4b8..fbfaf85 100644
> --- a/recipes/qt4/qt4-embedded_4.7.0.bb
> +++ b/recipes/qt4/qt4-embedded_4.7.0.bb
> @@ -2,9 +2,10 @@ DEFAULT_PREFERENCE = "-1"
>
>  require qt4-embedded.inc
>
> -PR = "${INC_PR}.1"
> +PR = "${INC_PR}.2"
>
>  QT_CONFIG_FLAGS_append_armv6 = " -no-neon "
> +QT_ARCH_armv6 = "armv6"
>
>  require qt-${PV}.inc
>
> --
> 1.6.3.3
* Re: [PATCH/RFC] qt4-embedded: tune QT_ARCH for armv6
From: Holger Freyther @ 2010-11-04 18:26 UTC
To: openembedded-devel

On 11/04/2010 10:52 AM, Eric Bénard wrote:
> This is an RFC; I think it can also be used on Qt 4.6.x and on armv7.
> Setting QT_ARCH to armv6 enables some asm-optimized functions in Qt.

Do you know the consequences of this change? E.g. in the painting engine,
unaligned memory access will be allowed. I am not sure that Nokia has ever
benchmarked which is faster (aligned or unaligned access), so it would be
very interesting to know the performance difference. E.g. use qttracereplay
to measure it.

Phil, do you happen to know why unaligned access was allowed starting from
armv6? How fast will it be?
* Re: [PATCH/RFC] qt4-embedded: tune QT_ARCH for armv6
From: Eric Bénard @ 2010-11-04 19:14 UTC
To: openembedded-devel

Hi,

On 04/11/2010 19:26, Holger Freyther wrote:
> On 11/04/2010 10:52 AM, Eric Bénard wrote:
>> This is an RFC; I think it can also be used on Qt 4.6.x and on armv7.
>> Setting QT_ARCH to armv6 enables some asm-optimized functions in Qt.
>
> Do you know the consequences of this change? E.g. in the painting engine,
> unaligned memory access will be allowed. I am not sure that Nokia has ever
> benchmarked which is faster (aligned or unaligned access), so it would be
> very interesting to know the performance difference. E.g. use qttracereplay
> to measure it.
>
> Phil, do you happen to know why unaligned access was allowed starting from
> armv6? How fast will it be?

In the painting engine, from what I saw in the .h, the asm optimisations are
only enabled when using the ARM RVCT compiler and not when using gcc, but I
have not checked everywhere in the code, so I'm not sure of this.

Eric
* Re: [PATCH/RFC] qt4-embedded: tune QT_ARCH for armv6
From: Holger Freyther @ 2010-11-05 9:32 UTC
To: openembedded-devel

On 11/04/2010 08:14 PM, Eric Bénard wrote:
> Hi,
>
> In the painting engine, from what I saw in the .h, the asm optimisations are
> only enabled when using the ARM RVCT compiler and not when using gcc, but I
> have not checked everywhere in the code, so I'm not sure of this.

That is something else. I also have a partially applied C implementation that
appears to be faster than the brute-force armv6 code.

One benefit of using armv6 is that the atomic code will use a different
implementation (ldrex/strex IIRC, instead of swp). In any case, please use
something like qttracereplay to see whether passing armv6 gives any benefit
at all.
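Editorial sketch of the atomics point above: the C below is not Qt's code (Qt 4 carries its own per-architecture atomic implementations), and the names counter_ref and counter_deref are hypothetical. It only illustrates the kind of reference-count operation involved, written with C11 <stdatomic.h>. The assumption is that a compiler targeting ARMv6 or later will typically lower these calls to an LDREX/STREX retry loop, whereas pre-v6 code has to fall back on SWP-based locking or a helper routine.

```c
#include <stdatomic.h>

/* Hypothetical reference-count helpers, loosely in the spirit of
 * an atomic ref()/deref() pair. Not taken from the Qt sources. */

/* Atomically increment the counter and return the new value. */
static int counter_ref(atomic_int *counter)
{
    /* atomic_fetch_add returns the value held before the addition. */
    return atomic_fetch_add(counter, 1) + 1;
}

/* Atomically decrement the counter and return the new value.
 * A result of 0 means the last reference is gone. */
static int counter_deref(atomic_int *counter)
{
    return atomic_fetch_sub(counter, 1) - 1;
}
```

A caller would hold an atomic_int initialised to 1, call counter_ref() when sharing the object, and free it when counter_deref() returns 0. Which instructions the compiler actually emits depends on the -march setting, so inspecting the generated assembly is the only way to confirm the LDREX/STREX assumption for a given toolchain.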
* Re: [PATCH/RFC] qt4-embedded: tune QT_ARCH for armv6
From: Eric Bénard @ 2010-11-05 9:42 UTC
To: openembedded-devel

Hi,

On 05/11/2010 10:32, Holger Freyther wrote:
> On 11/04/2010 08:14 PM, Eric Bénard wrote:
>> In the painting engine, from what I saw in the .h, the asm optimisations are
>> only enabled when using the ARM RVCT compiler and not when using gcc, but I
>> have not checked everywhere in the code, so I'm not sure of this.
>
> That is something else. I also have a partially applied C implementation that
> appears to be faster than the brute-force armv6 code.

OK, where is the code you are talking about in Qt?

> One benefit of using armv6 is that the atomic code will use a different
> implementation (ldrex/strex IIRC, instead of swp). In any case, please use
> something like qttracereplay to see whether passing armv6 gives any benefit
> at all.

I can; do you have a quick how-to for running it?

One thing: what I see here is that qtdemo was crashing after a random time,
and now (with armv6) it runs stably for hours.

Eric
* Re: [PATCH/RFC] qt4-embedded: tune QT_ARCH for armv6
From: Holger Freyther @ 2010-11-05 10:21 UTC
To: openembedded-devel

On 11/05/2010 10:42 AM, Eric Bénard wrote:
> OK, where is the code you are talking about in Qt?

tools/qttracereplay.

1.) Record a trace (e.g. of the qtdemo). You will need Qt/X11 for that
    (desktop, device). Make sure that the qtdemo width/height fits on the
    screen.

    $ qtdemo -graphicssystem trace
    (exit the app normally)

2.) Use qttracereplay to replay the trace; it will print nice FPS data.

> One thing: what I see here is that qtdemo was crashing after a random time,
> and now (with armv6) it runs stably for hours.

Well, not very scientific. :)
* Re: [PATCH/RFC] qt4-embedded: tune QT_ARCH for armv6
From: Eric Bénard @ 2010-11-05 10:41 UTC
To: openembedded-devel

On 05/11/2010 11:21, Holger Freyther wrote:
> On 11/05/2010 10:42 AM, Eric Bénard wrote:
>> OK, where is the code you are talking about in Qt?
>
> tools/qttracereplay.
>
> 1.) Record a trace (e.g. of the qtdemo). You will need Qt/X11 for that
>     (desktop, device). Make sure that the qtdemo width/height fits on the
>     screen.

Does this option work for Qt Embedded (I don't have X11 on my boards)?

>     $ qtdemo -graphicssystem trace
>     (exit the app normally)
>
> 2.) Use qttracereplay to replay the trace; it will print nice FPS data.
>
>> One thing: what I see here is that qtdemo was crashing after a random time,
>> and now (with armv6) it runs stably for hours.
>
> Well, not very scientific. :)

Not at all ;)

Eric
* Re: [PATCH/RFC] qt4-embedded: tune QT_ARCH for armv6
From: Eric Bénard @ 2010-11-07 19:42 UTC
To: openembedded-devel

Hi,

On 05/11/2010 11:21, Holger Freyther wrote:
> 1.) Record a trace (e.g. of the qtdemo). You will need Qt/X11 for that
>     (desktop, device). Make sure that the qtdemo width/height fits on the
>     screen.
>
>     $ qtdemo -graphicssystem trace
>     (exit the app normally)
>
> 2.) Use qttracereplay to replay the trace; it will print nice FPS data.

Here are the results:

Distro: angstrom-2010.x
CPU: i.MX357
Kernel: Linux 2.6.36+patches

Trace generated using the concentriccircles example (from the Qt 4.7 X11
32-bit SDK), screen resolution 320x240:

  concentriccircles -geometry 320x240 -graphicssystem trace

Bench:

  mount -o remount,ro /
  qttracereplay ./qtgraphics-106954754.trace -qws    (run 5 times)

=> Qt 4.7 with armv6 arch:

# qttracereplay ./qtgraphics-106954754.trace -qws
Read paint buffer version 1 with 282 frames
./qtgraphics-106954754.trace, iterations: 3, frames: 282, min(ms): 7403, median(ms): 7405, stddev: 0.012733 %, max(fps): 38.092665
./qtgraphics-106954754.trace, iterations: 3, frames: 282, min(ms): 7375, median(ms): 7376, stddev: 0.011070 %, max(fps): 38.237288
./qtgraphics-106954754.trace, iterations: 3, frames: 282, min(ms): 7370, median(ms): 7371, stddev: 0.016920 %, max(fps): 38.263229
./qtgraphics-106954754.trace, iterations: 3, frames: 282, min(ms): 7388, median(ms): 7389, stddev: 0.011050 %, max(fps): 38.170005
./qtgraphics-106954754.trace, iterations: 3, frames: 282, min(ms): 7372, median(ms): 7373, stddev: 0.016915 %, max(fps): 38.252849

=> Qt 4.7 with arm arch:

# qttracereplay ./qtgraphics-106954754.trace -qws
Read paint buffer version 1 with 282 frames
./qtgraphics-106954754.trace, iterations: 3, frames: 282, min(ms): 7435, median(ms): 7436, stddev: 0.010980 %, max(fps): 37.928716
./qtgraphics-106954754.trace, iterations: 3, frames: 282, min(ms): 7431, median(ms): 7434, stddev: 0.022866 %, max(fps): 37.949132
./qtgraphics-106954754.trace, iterations: 3, frames: 282, min(ms): 7456, median(ms): 7457, stddev: 0.010949 %, max(fps): 37.821888
./qtgraphics-106954754.trace, iterations: 3, frames: 282, min(ms): 7423, median(ms): 7424, stddev: 0.010998 %, max(fps): 37.990031
./qtgraphics-106954754.trace, iterations: 3, frames: 282, min(ms): 7417, median(ms): 7417, stddev: 0.050828 %, max(fps): 38.020763

Conclusion: no regression, nearly no improvement.

Eric
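Editorial sketch quantifying the conclusion above: the C below averages the five median(ms) values reported for each build and expresses the gap as a percentage. The array contents are copied from the qttracereplay output in this message; the helper names mean_ms and percent_faster are hypothetical and exist only for this illustration.

```c
#include <stddef.h>

/* median(ms) values copied from the five qttracereplay runs per build. */
static const int armv6_median_ms[5] = { 7405, 7376, 7371, 7389, 7373 };
static const int arm_median_ms[5]   = { 7436, 7434, 7457, 7424, 7417 };

/* Arithmetic mean of n millisecond samples. */
static double mean_ms(const int *samples, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += samples[i];
    return sum / (double)n;
}

/* How much faster the candidate is than the baseline, in percent
 * of the baseline time (positive means the candidate is faster). */
static double percent_faster(double baseline_ms, double candidate_ms)
{
    return (baseline_ms - candidate_ms) / baseline_ms * 100.0;
}
```

With these numbers the means come out at 7382.8 ms for the armv6 build and 7433.6 ms for the generic arm build, so percent_faster(7433.6, 7382.8) is roughly 0.7 %. That gap is larger than the per-run stddev values printed above, but small enough to agree with "nearly no improvement" for this particular trace.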
* Re: [PATCH/RFC] qt4-embedded: tune QT_ARCH for armv6
From: Khem Raj @ 2010-11-04 20:27 UTC
To: openembedded-devel

On Thu, Nov 4, 2010 at 11:26 AM, Holger Freyther <holger+oe@freyther.de> wrote:
> On 11/04/2010 10:52 AM, Eric Bénard wrote:
>> This is an RFC; I think it can also be used on Qt 4.6.x and on armv7.
>> Setting QT_ARCH to armv6 enables some asm-optimized functions in Qt.
>
> Do you know the consequences of this change? E.g. in the painting engine,
> unaligned memory access will be allowed. I am not sure that Nokia has ever
> benchmarked which is faster (aligned or unaligned access), so it would be
> very interesting to know the performance difference. E.g. use qttracereplay
> to measure it.
>
> Phil, do you happen to know why unaligned access was allowed starting from
> armv6? How fast will it be?

The instructions can cope with unaligned data, so performance will not be
impacted. However, this only holds for data up to 32 bits wide; above that, a
fault will still be generated. Also, not all instructions can operate on
unaligned data; only those operating on 32-bit data can.
* Re: [PATCH/RFC] qt4-embedded: tune QT_ARCH for armv6
From: Koen Kooi @ 2010-11-04 21:05 UTC
To: openembedded-devel

On 04-11-10 19:26, Holger Freyther wrote:
> On 11/04/2010 10:52 AM, Eric Bénard wrote:
>> This is an RFC; I think it can also be used on Qt 4.6.x and on armv7.
>> Setting QT_ARCH to armv6 enables some asm-optimized functions in Qt.
>
> Do you know the consequences of this change? E.g. in the painting engine,
> unaligned memory access will be allowed. I am not sure that Nokia has ever
> benchmarked which is faster (aligned or unaligned access), so it would be
> very interesting to know the performance difference. E.g. use qttracereplay
> to measure it.
>
> Phil, do you happen to know why unaligned access was allowed starting from
> armv6? How fast will it be?

From previous experiments, the "hardware" unaligned access was the same speed
as the software fixup on armv6.

regards,

Koen
* Re: [PATCH/RFC] qt4-embedded: tune QT_ARCH for armv6
From: Phil Blundell @ 2010-11-05 11:21 UTC
To: Holger Freyther; +Cc: openembedded-devel

On Thu, 2010-11-04 at 19:26 +0100, Holger Freyther wrote:
> Phil, do you happen to know why unaligned access was allowed starting from
> armv6? How fast will it be?

I don't know, but I would suspect that unaligned access was added to ARMv6 in
response to customer demand, and because a certain other popular architecture
supported it already.

Having the silicon be able to do unaligned accesses isn't a totally worthless
thing. If you're faced with, say, a 32-bit int of unknown alignment which you
want to demarshal, there are two possible ways you can do it in a pre-v6
world:

a) Use four single-byte loads plus some ALU operations to combine the
   results. This is guaranteed safe for all alignments, but the
   LDRB+LDRB+ORR+LDRB+ORR+LDRB+ORR sequence is inefficient compared to a
   single LDR: it takes seven cycles to execute and requires a scratch
   register, even if the operand is in fact naturally aligned at runtime.

b) Use a single LDR instruction, let the MMU trap unaligned accesses, and fix
   them up in software using sequence (a). This has the advantage that the
   aligned case runs at full speed, but the performance penalty for loads
   which turn out to be unaligned is severe, probably at least several tens
   of cycles and perhaps even more than that. If many unaligned operands are
   encountered, then the performance might be worse than just using (a) in
   the first place.

If the processor can do unaligned accesses natively, then this problem goes
away: you can just use a single LDR instruction and it will always execute at
the maximum speed the hardware is capable of for that particular operand. For
naturally aligned operands the load will complete in a single cycle, just
like it always did. Since AHB doesn't support unaligned access at the bus
level, misaligned operands will need two bus cycles, but that's still much
faster than either of the software-based approaches.

p.
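Editorial sketch of the two load strategies described above, in C rather than ARM assembly. The function names are hypothetical. load_le32_bytewise corresponds to approach (a): four byte loads combined with shifts and ORs, safe at any alignment, assuming the marshalled value is little-endian. load_u32_memcpy is the portable way to ask for "one 32-bit load from an arbitrary address"; the assumption is that a compiler targeting a core with native unaligned access can turn it into a single LDR, while on older cores it has to expand to something like sequence (a).

```c
#include <stdint.h>
#include <string.h>

/* Approach (a): assemble a little-endian 32-bit value from four
 * single-byte loads. Valid for any alignment of p. */
static uint32_t load_le32_bytewise(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Alignment-agnostic 32-bit load in host byte order. memcpy avoids the
 * undefined behaviour of dereferencing a misaligned uint32_t pointer and
 * leaves the choice of instructions to the compiler. */
static uint32_t load_u32_memcpy(const unsigned char *p)
{
    uint32_t value;
    memcpy(&value, p, sizeof value);
    return value;
}
```

For example, given the bytes 00 11 22 33 44 55 in a buffer, load_le32_bytewise(buf + 1) reads from an odd address and yields 0x44332211. Casting buf + 1 to uint32_t * and dereferencing it is what relies on approach (b) or on hardware support, and it is undefined behaviour in ISO C, which is why the memcpy form is used here.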