* [PATCH/RFC] qt4-embedded: tune QT_ARCH for armv6
@ 2010-11-04  9:52 Eric Bénard
  2010-11-04 18:05 ` Khem Raj
  2010-11-04 18:26 ` Holger Freyther
  0 siblings, 2 replies; 12+ messages in thread
From: Eric Bénard @ 2010-11-04  9:52 UTC (permalink / raw)
  To: openembedded-devel; +Cc: Eric Bénard

This is an RFC; I think it can also be used on Qt 4.6.x and on armv7.
Setting QT_ARCH to armv6 enables some asm-optimized functions in Qt.

Signed-off-by: Eric Bénard <eric@eukrea.com>
---
 recipes/qt4/qt4-embedded_4.7.0.bb |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/recipes/qt4/qt4-embedded_4.7.0.bb b/recipes/qt4/qt4-embedded_4.7.0.bb
index 7e3d4b8..fbfaf85 100644
--- a/recipes/qt4/qt4-embedded_4.7.0.bb
+++ b/recipes/qt4/qt4-embedded_4.7.0.bb
@@ -2,9 +2,10 @@ DEFAULT_PREFERENCE = "-1"
 
 require qt4-embedded.inc
 
-PR = "${INC_PR}.1"
+PR = "${INC_PR}.2"
 
 QT_CONFIG_FLAGS_append_armv6 = " -no-neon "
+QT_ARCH_armv6 = "armv6"
 
 require qt-${PV}.inc
 
-- 
1.6.3.3





* Re: [PATCH/RFC] qt4-embedded: tune QT_ARCH for armv6
  2010-11-04  9:52 [PATCH/RFC] qt4-embedded: tune QT_ARCH for armv6 Eric Bénard
@ 2010-11-04 18:05 ` Khem Raj
  2010-11-04 18:26 ` Holger Freyther
  1 sibling, 0 replies; 12+ messages in thread
From: Khem Raj @ 2010-11-04 18:05 UTC (permalink / raw)
  To: openembedded-devel; +Cc: Eric Bénard

On (04/11/10 10:52), Eric Bénard wrote:
> this is a RFC, I think it can also be used on qt 4.6.x and on armv7.
> Setting QT_ARCH to armv6 enable some asm optimized functions in QT.
> 
> Signed-off-by: Eric Bénard <eric@eukrea.com>

Acked-by: Khem Raj <raj.khem@gmail.com>

> ---
>  recipes/qt4/qt4-embedded_4.7.0.bb |    3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)
> 
> diff --git a/recipes/qt4/qt4-embedded_4.7.0.bb b/recipes/qt4/qt4-embedded_4.7.0.bb
> index 7e3d4b8..fbfaf85 100644
> --- a/recipes/qt4/qt4-embedded_4.7.0.bb
> +++ b/recipes/qt4/qt4-embedded_4.7.0.bb
> @@ -2,9 +2,10 @@ DEFAULT_PREFERENCE = "-1"
>  
>  require qt4-embedded.inc
>  
> -PR = "${INC_PR}.1"
> +PR = "${INC_PR}.2"
>  
>  QT_CONFIG_FLAGS_append_armv6 = " -no-neon "
> +QT_ARCH_armv6 = "armv6"
>  
>  require qt-${PV}.inc
>  
> -- 
> 1.6.3.3
> 
> 




* Re: [PATCH/RFC] qt4-embedded: tune QT_ARCH for armv6
  2010-11-04  9:52 [PATCH/RFC] qt4-embedded: tune QT_ARCH for armv6 Eric Bénard
  2010-11-04 18:05 ` Khem Raj
@ 2010-11-04 18:26 ` Holger Freyther
  2010-11-04 19:14   ` Eric Bénard
                     ` (3 more replies)
  1 sibling, 4 replies; 12+ messages in thread
From: Holger Freyther @ 2010-11-04 18:26 UTC (permalink / raw)
  To: openembedded-devel

On 11/04/2010 10:52 AM, Eric Bénard wrote:
> this is a RFC, I think it can also be used on qt 4.6.x and on armv7.
> Setting QT_ARCH to armv6 enable some asm optimized functions in QT.

Do you know the consequences of this change? For example, in the painting
engine unaligned memory access will be allowed. I am not sure that Nokia has
ever benchmarked which is faster, aligned or unaligned access, so it would be
very interesting to know the performance difference. You could use
qttracereplay to measure it.

Phil, do you happen to know why unaligned access was allowed starting with
armv6? How fast will it be?




* Re: [PATCH/RFC] qt4-embedded: tune QT_ARCH for armv6
  2010-11-04 18:26 ` Holger Freyther
@ 2010-11-04 19:14   ` Eric Bénard
  2010-11-05  9:32     ` Holger Freyther
  2010-11-04 20:27   ` Khem Raj
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 12+ messages in thread
From: Eric Bénard @ 2010-11-04 19:14 UTC (permalink / raw)
  To: openembedded-devel

Hi,

Le 04/11/2010 19:26, Holger Freyther a écrit :
> On 11/04/2010 10:52 AM, Eric Bénard wrote:
>> this is a RFC, I think it can also be used on qt 4.6.x and on armv7.
>> Setting QT_ARCH to armv6 enable some asm optimized functions in QT.
>
> Do you know the consequences of this change? E.g. in the painting
> engine unaligned memory access will be allowed. I am not sure that Nokia has
> ever benchmarked what is faster (aligned/unaligned) access. So it would be
> very interesting to know the perf difference. E.g. use qttracereplay to measure it.
>
> Phil do you happen to know why unaligned access got allowed starting from
> armv6? How fast will it be?
>
In the painting engine, from what I saw in the .h, the asm optimisations are
only enabled when using the ARM RVCT compiler and not when using gcc, but I
have not checked everywhere in the code, so I'm not sure of this.

Eric




* Re: [PATCH/RFC] qt4-embedded: tune QT_ARCH for armv6
  2010-11-04 18:26 ` Holger Freyther
  2010-11-04 19:14   ` Eric Bénard
@ 2010-11-04 20:27   ` Khem Raj
  2010-11-04 21:05   ` Koen Kooi
  2010-11-05 11:21   ` Phil Blundell
  3 siblings, 0 replies; 12+ messages in thread
From: Khem Raj @ 2010-11-04 20:27 UTC (permalink / raw)
  To: openembedded-devel

On Thu, Nov 4, 2010 at 11:26 AM, Holger Freyther <holger+oe@freyther.de> wrote:
> On 11/04/2010 10:52 AM, Eric Bénard wrote:
>> this is a RFC, I think it can also be used on qt 4.6.x and on armv7.
>> Setting QT_ARCH to armv6 enable some asm optimized functions in QT.
>
> Do you know the consequences of this change? E.g. in the painting
> engine unaligned memory access will be allowed. I am not sure that Nokia has
> ever benchmarked what is faster (aligned/unaligned) access. So it would be
> very interesting to know the perf difference. E.g. use qttracereplay to measure it.
>
> Phil do you happen to know why unaligned access got allowed starting from
> armv6? How fast will it be?
>

The instructions can cope with unaligned data, so performance will not be
impacted, but only for data up to 32 bits wide; above that, a fault will
still be generated. Not all instructions can operate on unaligned data:
only the ones operating on 32-bit data can do it.





* Re: [PATCH/RFC] qt4-embedded: tune QT_ARCH for armv6
  2010-11-04 18:26 ` Holger Freyther
  2010-11-04 19:14   ` Eric Bénard
  2010-11-04 20:27   ` Khem Raj
@ 2010-11-04 21:05   ` Koen Kooi
  2010-11-05 11:21   ` Phil Blundell
  3 siblings, 0 replies; 12+ messages in thread
From: Koen Kooi @ 2010-11-04 21:05 UTC (permalink / raw)
  To: openembedded-devel

On 04-11-10 19:26, Holger Freyther wrote:
> On 11/04/2010 10:52 AM, Eric Bénard wrote:
>> this is a RFC, I think it can also be used on qt 4.6.x and on armv7.
>> Setting QT_ARCH to armv6 enable some asm optimized functions in QT.
> 
> Do you know the consequences of this change? E.g. in the painting
> engine unaligned memory access will be allowed. I am not sure that Nokia has
> ever benchmarked what is faster (aligned/unaligned) access. So it would be
> very interesting to know the perf difference. E.g. use qttracereplay to measure it.
> 
> Phil do you happen to know why unaligned access got allowed starting from
> armv6? How fast will it be?

From previous experiments, the "hardware" unaligned access was the same speed as
the software fixup on armv6.

regards,

Koen





* Re: [PATCH/RFC] qt4-embedded: tune QT_ARCH for armv6
  2010-11-04 19:14   ` Eric Bénard
@ 2010-11-05  9:32     ` Holger Freyther
  2010-11-05  9:42       ` Eric Bénard
  0 siblings, 1 reply; 12+ messages in thread
From: Holger Freyther @ 2010-11-05  9:32 UTC (permalink / raw)
  To: openembedded-devel

On 11/04/2010 08:14 PM, Eric Bénard wrote:
> Hi,
>>
> in the painting engine, from what I saw in the .h, the asm optimisations are
> only enabled when using ARM RVCT compiler and not when using gcc but I have
> not checked everywhere in the code so I'm not sure of this.

That is something else. I also have a partially applied C implementation that
appears to be faster than the brute-force armv6 code.

One benefit of using armv6 is that the atomic code will use a different
implementation (ldrex/strex, IIRC, instead of swp). In any case, please use
something like qttracereplay to see if passing armv6 gives any benefit at all.




* Re: [PATCH/RFC] qt4-embedded: tune QT_ARCH for armv6
  2010-11-05  9:32     ` Holger Freyther
@ 2010-11-05  9:42       ` Eric Bénard
  2010-11-05 10:21         ` Holger Freyther
  0 siblings, 1 reply; 12+ messages in thread
From: Eric Bénard @ 2010-11-05  9:42 UTC (permalink / raw)
  To: openembedded-devel

Hi,

Le 05/11/2010 10:32, Holger Freyther a écrit :
> On 11/04/2010 08:14 PM, Eric Bénard wrote:
>> in the painting engine, from what I saw in the .h, the asm optimisations are
>> only enabled when using ARM RVCT compiler and not when using gcc but I have
>> not checked everywhere in the code so I'm not sure of this.
>
> that is something else, I also have a partially applied C implementation that
> appears to be faster than the brute force armv6 code.
>
OK, where in Qt is the code you are talking about?

> One benefit of using armv6 is that the atomic code will use a different
> implementation (ldrex,strex IIRC instead of swp). In any case please use
> something like qttracereplay to see if passing armv6 is giving any benefit at all.
>
I can. Do you have a quick howto for running this?

One thing: what I see here is that qtdemo was crashing after a random time,
and now (with armv6) it runs stably for hours.

Eric




* Re: [PATCH/RFC] qt4-embedded: tune QT_ARCH for armv6
  2010-11-05  9:42       ` Eric Bénard
@ 2010-11-05 10:21         ` Holger Freyther
  2010-11-05 10:41           ` Eric Bénard
  2010-11-07 19:42           ` Eric Bénard
  0 siblings, 2 replies; 12+ messages in thread
From: Holger Freyther @ 2010-11-05 10:21 UTC (permalink / raw)
  To: openembedded-devel

On 11/05/2010 10:42 AM, Eric Bénard wrote:
> Hi,

>>
> OK where is the code you are talking about in qt ?

tools/qttracereplay.

1.) Record a trace (e.g. of qtdemo). You will need Qt/X11 for that (desktop or
device). Make sure that the qtdemo width/height fits on the screen.

$ qtdemo -graphicssystem trace (exit the app normally)

2.) Use qttracereplay to replay the trace; it will print nice FPS data.




> 
> One thing : what I see here is that qtdemo was crashing after a random time
> and now (with armv6) it runs stable for hours.

well, not very scientific. :)




* Re: [PATCH/RFC] qt4-embedded: tune QT_ARCH for armv6
  2010-11-05 10:21         ` Holger Freyther
@ 2010-11-05 10:41           ` Eric Bénard
  2010-11-07 19:42           ` Eric Bénard
  1 sibling, 0 replies; 12+ messages in thread
From: Eric Bénard @ 2010-11-05 10:41 UTC (permalink / raw)
  To: openembedded-devel

Le 05/11/2010 11:21, Holger Freyther a écrit :
> On 11/05/2010 10:42 AM, Eric Bénard wrote:
>> Hi,
>
>>>
>> OK where is the code you are talking about in qt ?
>
> tools/qttracereplay.
>
> 1.) record a trace (e.g. the qtdemo) you will need Qt/X11 for that (desktop,
> device). But make sure that the qtdemo width/height fits on the screen.
>
Does this option work for Qt Embedded (I don't have X11 on my boards)?

> $ qtdemo -graphicssystem trace (exit the app normally)
>
> 2.) Use qttracereplay to replay the trace, it will print nice FPS data.
>
>
>
>
>>
>> One thing : what I see here is that qtdemo was crashing after a random time
>> and now (with armv6) it runs stable for hours.
>
> well, not very scientific. :)
>
not at all ;)

Eric




* Re: [PATCH/RFC] qt4-embedded: tune QT_ARCH for armv6
  2010-11-04 18:26 ` Holger Freyther
                     ` (2 preceding siblings ...)
  2010-11-04 21:05   ` Koen Kooi
@ 2010-11-05 11:21   ` Phil Blundell
  3 siblings, 0 replies; 12+ messages in thread
From: Phil Blundell @ 2010-11-05 11:21 UTC (permalink / raw)
  To: Holger Freyther; +Cc: openembedded-devel

On Thu, 2010-11-04 at 19:26 +0100, Holger Freyther wrote:
> Phil do you happen to know why unaligned access got allowed starting from
> armv6? How fast will it be?

I don't know, but I would suspect that unaligned access was added to
ARMv6 in response to customer demand, and because a certain other
popular architecture supported it already.

Having the silicon be able to do unaligned accesses isn't a totally
worthless thing.  If you're faced with, say, a 32-bit int of unknown
alignment which you want to demarshal, there are two possible ways you
can do it in a pre-v6 world:

a) use four single-byte loads plus some ALU operations to combine the
results.  This is guaranteed safe for all alignments, but the LDRB+LDRB
+ORR+LDRB+ORR+LDRB+ORR sequence is inefficient compared to a single LDR:
it takes seven cycles to execute and requires a scratch register, even
if the operand is in fact naturally aligned at runtime.

b) use a single LDR instruction, let the MMU trap unaligned accesses,
and fix them up in software using sequence (a).  This has the advantage
that the aligned case runs at full speed, but the performance penalty
for loads which turn out to be unaligned is severe, probably at least
several tens of cycles and perhaps even more than that.  If many
unaligned operands are encountered then the performance might be worse
than just using (a) in the first place.

If the processor can do unaligned accesses natively then this problem
goes away: you can just use a single LDR instruction and it will always
execute at the maximum speed the hardware is capable of for that
particular operand.  For naturally aligned operands the load will
complete in a single cycle, just like it always did.  Since AHB doesn't
support unaligned access at the bus level, misaligned operands will need
two bus cycles but that's still much faster than either of the
software-based approaches.
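
As an illustration of the load strategies described above, here is a minimal
Python sketch (not from the thread). It only models the data flow:
`load_u32_bytewise` mirrors sequence (a), four byte loads combined with shifts
and ORs, while `load_u32_single` stands in for a single LDR on hardware that
tolerates any alignment. The function names, the little-endian byte order and
the sample buffer are assumptions made for the example, and Python says nothing
about the cycle counts discussed here.

```python
import struct


def load_u32_bytewise(buf: bytes, offset: int) -> int:
    """Model of sequence (a): four single-byte loads merged with shifts and ORs."""
    b0 = buf[offset]
    b1 = buf[offset + 1]
    b2 = buf[offset + 2]
    b3 = buf[offset + 3]
    # Little-endian assembly (an assumption for this example).
    return b0 | (b1 << 8) | (b2 << 16) | (b3 << 24)


def load_u32_single(buf: bytes, offset: int) -> int:
    """Model of one word-sized load that accepts any alignment."""
    return struct.unpack_from("<I", buf, offset)[0]


# A 32-bit value stored at an odd offset, i.e. not naturally aligned.
packet = b"\xff" + struct.pack("<I", 0x12345678) + b"\x00\x00\x00"

value_a = load_u32_bytewise(packet, 1)
value_b = load_u32_single(packet, 1)
print(hex(value_a), hex(value_b))
```

Both functions return the same value for every offset; the difference on real
hardware lies only in how many instructions and bus cycles are needed to get it.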

p.






* Re: [PATCH/RFC] qt4-embedded: tune QT_ARCH for armv6
  2010-11-05 10:21         ` Holger Freyther
  2010-11-05 10:41           ` Eric Bénard
@ 2010-11-07 19:42           ` Eric Bénard
  1 sibling, 0 replies; 12+ messages in thread
From: Eric Bénard @ 2010-11-07 19:42 UTC (permalink / raw)
  To: openembedded-devel

Hi,

Le 05/11/2010 11:21, Holger Freyther a écrit :
> 1.) record a trace (e.g. the qtdemo) you will need Qt/X11 for that (desktop,
> device). But make sure that the qtdemo width/height fits on the screen.
>
> $ qtdemo -graphicssystem trace (exit the app normally)
>
> 2.) Use qttracereplay to replay the trace, it will print nice FPS data.
>
>
here are the results :
Distro : angstrom-2010.x
CPU : i.MX357
Kernel : Linux 2.6.36+patches

Trace generated using the concentriccircles example (from the Qt 4.7 X11 32-bit SDK),
screen resolution 320x240:
concentriccircles  -geometry 320x240  -graphicssystem trace

bench :
mount -o remount,ro /
qttracereplay ./qtgraphics-106954754.trace -qws (run 5 times)

=> qt4.7 with armv6 arch :
# qttracereplay ./qtgraphics-106954754.trace -qws
Read paint buffer version 1 with 282 frames
./qtgraphics-106954754.trace, iterations: 3, frames: 282, min(ms): 7403, 
median(ms): 7405, stddev: 0.012733 %, max(fps): 38.092665
./qtgraphics-106954754.trace, iterations: 3, frames: 282, min(ms): 7375, 
median(ms): 7376, stddev: 0.011070 %, max(fps): 38.237288
./qtgraphics-106954754.trace, iterations: 3, frames: 282, min(ms): 7370, 
median(ms): 7371, stddev: 0.016920 %, max(fps): 38.263229
./qtgraphics-106954754.trace, iterations: 3, frames: 282, min(ms): 7388, 
median(ms): 7389, stddev: 0.011050 %, max(fps): 38.170005
./qtgraphics-106954754.trace, iterations: 3, frames: 282, min(ms): 7372, 
median(ms): 7373, stddev: 0.016915 %, max(fps): 38.252849

=> qt4.7 with arm arch :
# qttracereplay ./qtgraphics-106954754.trace -qws
Read paint buffer version 1 with 282 frames
./qtgraphics-106954754.trace, iterations: 3, frames: 282, min(ms): 7435, 
median(ms): 7436, stddev: 0.010980 %, max(fps): 37.928716
./qtgraphics-106954754.trace, iterations: 3, frames: 282, min(ms): 7431, 
median(ms): 7434, stddev: 0.022866 %, max(fps): 37.949132
./qtgraphics-106954754.trace, iterations: 3, frames: 282, min(ms): 7456, 
median(ms): 7457, stddev: 0.010949 %, max(fps): 37.821888
./qtgraphics-106954754.trace, iterations: 3, frames: 282, min(ms): 7423, 
median(ms): 7424, stddev: 0.010998 %, max(fps): 37.990031
./qtgraphics-106954754.trace, iterations: 3, frames: 282, min(ms): 7417, 
median(ms): 7417, stddev: 0.050828 %, max(fps): 38.020763

Conclusion : no regression, nearly no improvement.
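
To put a number on "nearly no improvement", the following Python sketch
averages the median(ms) values from the ten runs above and derives the relative
gain, which comes out at roughly 0.7 % in favour of the armv6 build. The
variable and function names are hypothetical, and the fps formula (frames
divided by elapsed time) is inferred from the reported figures, where 282
frames in 7403 ms matches the printed 38.092665 fps.

```python
from statistics import mean

# median(ms) values copied from the five runs of each build listed above.
ARMV6_MEDIAN_MS = [7405, 7376, 7371, 7389, 7373]
ARM_MEDIAN_MS = [7436, 7434, 7457, 7424, 7417]
FRAMES = 282  # frames in the recorded trace


def fps(frames: int, elapsed_ms: float) -> float:
    """Frames per second for a replay that took elapsed_ms milliseconds."""
    return frames * 1000.0 / elapsed_ms


armv6_ms = mean(ARMV6_MEDIAN_MS)
arm_ms = mean(ARM_MEDIAN_MS)

# Relative reduction in replay time when building with QT_ARCH = armv6.
speedup_percent = (arm_ms - armv6_ms) / arm_ms * 100.0

print(f"armv6: {armv6_ms:.1f} ms ({fps(FRAMES, armv6_ms):.2f} fps)")
print(f"arm:   {arm_ms:.1f} ms ({fps(FRAMES, arm_ms):.2f} fps)")
print(f"armv6 is faster by {speedup_percent:.2f} %")
```

With a run-to-run spread of a few tens of milliseconds within each build, a
gain of this size is small but consistent across all five runs.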

Eric





Thread overview: 12+ messages
2010-11-04  9:52 [PATCH/RFC] qt4-embedded: tune QT_ARCH for armv6 Eric Bénard
2010-11-04 18:05 ` Khem Raj
2010-11-04 18:26 ` Holger Freyther
2010-11-04 19:14   ` Eric Bénard
2010-11-05  9:32     ` Holger Freyther
2010-11-05  9:42       ` Eric Bénard
2010-11-05 10:21         ` Holger Freyther
2010-11-05 10:41           ` Eric Bénard
2010-11-07 19:42           ` Eric Bénard
2010-11-04 20:27   ` Khem Raj
2010-11-04 21:05   ` Koen Kooi
2010-11-05 11:21   ` Phil Blundell
