* Re: Tune files and knobs to turn
2011-06-24 11:54 Tune files and knobs to turn Koen Kooi
@ 2011-06-24 14:01 ` Richard Purdie
2011-06-24 14:30 ` Mark Hatle
2011-06-24 14:12 ` Mark Hatle
2011-06-28 17:36 ` Darren Hart
2 siblings, 1 reply; 16+ messages in thread
From: Richard Purdie @ 2011-06-24 14:01 UTC (permalink / raw)
To: Patches and discussions about the oe-core layer
On Fri, 2011-06-24 at 13:54 +0200, Koen Kooi wrote:
> We discussed tune files a bit during last nights TSC meeting and Khem
> had expressed the need before, so I'd like to get this discussion
> started by using armv7a as an example.
>
> For armv7a capable cores we have the following hardware features:
>
> * armv7a instruction set
> * thumb1 instruction set
> * thumb2 instruction set
> * VFP coprocessor
> * optional NEON coprocessor
>
> For the ABI we can choose the following:
>
> * softfp without hw support (e.g. no VFP instructions emitted, slow)
> * softfp with hw support (e.g. VFP and/or NEON instructions emitted, fast)
> * hardfp, emits VFP and/or NEON instructions, slightly faster than softfp/hw, incompatible with everything else
>
> And the extra knobs:
>
> * pure thumb1, no arm instructions (limited use)
> * thumb1/arm interworking
> * pure thumb2, no arm instructions
> * thumb2 interworking (not sure if that's actually useful, thumb2 has complete coverage)
>
> In OE .dev we have the following vars:
>
> TARGET_FPU: switches between hw float and sw float, no reflection in package arch
> ARM_FP_ABI: switches between softfp and hardfp, will create 'armv7a' or 'armv7a-hardfp' as package arch
> ARM_INSTRUCTION_SET: switches between arm and thumb1, no reflection in package arch
> THUMB_INTERWORK: turns on interworking, no reflection in package arch
I suspect having a variable per tune feature is going to be a recipe for
disaster. A machine is likely going to want a good default for the tune
options, but some distros are going to want to take control of them in
some cases too.
Given the n^2 scaling problem we have, it starts to look like we'd need
something like TUNE_FEATURES, which I don't feel brilliant about but
which could likely work well.
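To make the TUNE_FEATURES idea concrete, here is a minimal Python sketch of how a single feature list could be turned into compiler flags instead of one variable per feature. The feature names and the helper `cc_arch_flags` are hypothetical and not part of any existing tune file; only the GCC flags themselves are real.

```python
# Hypothetical feature names; the thread only proposes the idea of a
# TUNE_FEATURES list, not these particular keys.
FEATURE_FLAGS = {
    "armv7a": "-march=armv7-a",
    "thumb": "-mthumb",
    "interwork": "-mthumb-interwork",
    "vfp": "-mfpu=vfp",
    "neon": "-mfpu=neon",
}


def cc_arch_flags(tune_features, float_abi="softfp"):
    """Turn a space-separated feature list into a compiler flag string."""
    features = tune_features.split()
    unknown = [f for f in features if f not in FEATURE_FLAGS]
    if unknown:
        raise ValueError("unknown tune features: " + " ".join(unknown))
    # Only one -mfpu value can be in effect, so NEON takes precedence
    # over plain VFP when both are listed.
    if "neon" in features:
        features = [f for f in features if f != "vfp"]
    flags = [FEATURE_FLAGS[f] for f in features]
    # A float ABI flag is only added when an FPU feature is selected.
    if "vfp" in features or "neon" in features:
        flags.append("-mfloat-abi=" + float_abi)
    return " ".join(flags)
```

A machine would then set one list such as "armv7a vfp neon thumb" as its default, and a distro could override just that list.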
I'd also like to look at this from the other direction. What information
do we need from the tune config? It's one thing to set the controls but
we also need to consider what uses the end result and how. Currently the
tune files should really be responsible for:
TARGET_ARCH
TARGET_FPU
TARGET_CC_ARCH
PACKAGE_ARCH
BASE_PACKAGE_ARCH
PACKAGE_EXTRA_ARCHS
FEED_ARCH
of which there is some duplication of data. Also, the distro config
typically sets some of this. What we really need is the following:
* Overall compiler architecture
* Extra libc config data (softfloat?)
* Compiler optimisation flags
* Name of the package "architecture" to use for the config
* List of compatible package "architectures"
These map to:
* TARGET_ARCH
* TARGET_FPU
* TARGET_CC_ARCH
* BASE_PACKAGE_ARCH
* PACKAGE_EXTRA_ARCHS
but we do have issues of these names needing to be overridden so putting
them in some TUNE* namespace variables initially would help clean up the
core significantly.
Adding multilib into the mix doesn't actually make the problem that
much more complex if we followed the simple rule of making the tune
config name an override. This would make the tune files follow the form:
TARGET_CC_ARCH_tune-armv7a = "xxx"
BASE_PACKAGE_ARCH_tune-armv7a = "xxx"
and then we'd add tune-xxx to OVERRIDES to select a given tuning for a
given multilib.
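As a rough illustration of the override rule, here is a small Python model of how a "tune-xxx" entry in OVERRIDES would select the matching variables. This is a simplified sketch, not BitBake's actual implementation: the `resolve` helper is hypothetical, the values stand in for the "xxx" placeholders above, and it assumes later entries in OVERRIDES take priority.

```python
def resolve(name, variables, overrides):
    """Return the value of `name`, preferring override-suffixed variants.

    `overrides` is a colon-separated string ordered from lowest to
    highest priority (an assumption of this model); later entries win.
    """
    value = variables.get(name)
    for override in overrides.split(":"):
        candidate = variables.get(name + "_" + override)
        if candidate is not None:
            value = candidate
    return value


# Illustrative values only; the real tune files would supply these.
variables = {
    "TARGET_CC_ARCH": "",
    "TARGET_CC_ARCH_tune-armv7a": "-march=armv7-a",
    "BASE_PACKAGE_ARCH": "arm",
    "BASE_PACKAGE_ARCH_tune-armv7a": "armv7a",
}

default_flags = resolve("TARGET_CC_ARCH", variables, "linux")
armv7a_flags = resolve("TARGET_CC_ARCH", variables, "linux:tune-armv7a")
armv7a_pkg_arch = resolve("BASE_PACKAGE_ARCH", variables, "linux:tune-armv7a")
```

Each multilib would carry its own OVERRIDES string, so the same variable name resolves to a different tuning per multilib.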
I'm continuing to give this some thought but those are the ideas off the
top of my head...
Cheers,
Richard
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Tune files and knobs to turn
2011-06-24 14:01 ` Richard Purdie
@ 2011-06-24 14:30 ` Mark Hatle
0 siblings, 0 replies; 16+ messages in thread
From: Mark Hatle @ 2011-06-24 14:30 UTC (permalink / raw)
To: openembedded-core
On 6/24/11 9:01 AM, Richard Purdie wrote:
> On Fri, 2011-06-24 at 13:54 +0200, Koen Kooi wrote:
>> We discussed tune files a bit during last nights TSC meeting and Khem
>> had expressed the need before, so I'd like to get this discussion
>> started by using armv7a as an example.
>>
>> For armv7a capable cores we have the following hardware features:
>>
>> * armv7a instruction set
>> * thumb1 instruction set
>> * thumb2 instruction set
>> * VFP coprocessor
>> * optional NEON coprocessor
>>
>> For the ABI we can choose the following:
>>
>> * softtp without hw support (e.g. no VFP instructions emitted, slow)
>> * softfp with hw support (e.g. VFP and/or NEON instructions emitted, fast)
>> * hardfp, emits VFP and/or NEON instructions, slightly faster than softfp/hw, incompatible with everything else
>>
>> And the extra knobs:
>>
>> * pure thumb1, no arm instructions (limited use)
>> * thumb1/arm interworking
>> * pure thumb2, no arm instructions
>> * thumb2 interworking (not sure if that's actually usefull, thumb2 has complete coverage)
>>
>> In OE .dev we have the following vars:
>>
>> TARGET_FPU: switches between hw float and sw float, no reflection in package arch
>> ARM_FP_ABI: switches between softfp and hardfp, will create 'armv7a' or 'armv7a-hardfp' as package arch
>> ARM_INSTRUCTION_SET: switches between arm and thumb1, no reflection in package arch
>> THUMB_INTERWORK: turns on interworking, no reflection in package arch
>
> I suspect having a variable per tune feature is going to be a recipe for
> disaster. Selecting the tune options is something a machine is likely
> going to want a good default of but some distros are going to want to
> take control over in some cases too.
We've broken the problem down to a set of key variables [note these are used in
our build system, which is not derived from bitbake/OE.. so the names may clash
with current usage.. but I think it's a good example]
Arch file:
TARGET_TOOLCHAIN_ARCH -- arch family ["arm", "ia32", "power", "mips"]
ABI file:
TARGET_ARCH - minimal gnu canonical arch - arch component
TARGET_OS="linux-gnu" or "linux-eabi" [arm] [used in a gnu canonical arch]
TARGET_FUNDAMENTAL_ASFLAGS - minimum flags required by AS to assemble for this ABI
TARGET_FUNDAMENTAL_CFLAGS - minimum cflags required by CC to compile for this ABI
TARGET_FUNDAMENTAL_LDFLAGS - minimum ldflags required by LD to link for this ABI
TARGET_LIB_DIR="lib64" - library directory for this ABI
TARGET_USERSPACE_BITS - bitsize for this ABI
TARGET_ENDIAN - endian for this ABI
CPU/ISA file & Tuning file: (we combine this in our implementation)
TARGET_ARCH -- more descriptive canonical arch for tunings + abi
TARGET_COMMON_ASFLAGS - optional assembly flags -- tunings
TARGET_COMMON_CFLAGS - optional cflags -- tunings
TARGET_COMMON_LDFLAGS - optional ldflags -- tunings
> Given the n^2 problem we have with scale it starts to look like we'd
> need something like TUNE_FEATURES which I don't feel brilliant about but
> it likely could work well.
>
> I'd also like to look at this from the other direction. What information
> do we need from the tune config? Its one thing to set the controls but
> we also need to consider what uses the end result and how. Currently the
> tune files should really be responsible for:
>
> TARGET_ARCH
> TARGET_FPU
> TARGET_CC_ARCH
> PACKAGE_ARCH
> BASE_PACKAGE_ARCH
> PACKAGE_EXTRA_ARCHS
> FEED_ARCH
>
> of which there is some duplication of data. Also, the distro config
> typically sets some of this. What we really need is the following:
>
> * Overall compiler architecture
> * Extra libc config data (softfloat?)
> * Compiler optimisation flags
> * Name of the package "architecture" to use for the config
> * List of compatible package "architectures"
>
> These map to:
>
> * TARGET_ARCH
> * TARGET_FPU
> * TARGET_CC_ARCH
> * BASE_PACKAGE_ARCH
> * PACKAGE_EXTRA_ARCHS
>
> but we do have issues of these names needing to be overridden so putting
> then in some TUNE* namespace variables initially would help clean up the
> core significantly.
I think it's key to namespace all of these things in a multilib design. That
way it's easy to switch between them as we process the individual files. We
have another variable (not listed above) called "VARIANT". The variant is what
is suffixed to each variable and works very much like the current overrides in OE.
> Adding multlilib into the mix doesn't actually make the problem that
> much more complex if we followed the simple rule of making the tune
> config name an override. This would make the tune files follow the form:
>
> TARGET_CC_ARCH_tune-armv7a = "xxx"
> BASE_PACKAGE_ARCH_tune-armv7a = "xxx"
>
> and then we'd add tune-xxx to OVERRIDES to select a given tuning for a
> given multlilib.
Agreed.. our name space is something equivalent to:
variable_arch-<arch>
variable_abi-<abi>
variable_tune-<tune>
The arch, abi and tune are selected via a series of included files.. and
references. So at each step of processing we resolve and inherit from the
previous step -- and then override it as necessary. So variable_abi-<abi>
automatically gets set to: variable_abi-<abi>=variable_arch-<arch>
Then we can override the value. Same happens when the tune files include the
arch. The end system only uses the values from the tune files. All systems
have tune files, even if they're just inherited from the abi and arch files.
This also allows us to construct a list of compatible package feeds, since one
tune file can include another and we have a variable (not previously mentioned)
that collects together package architectures in order of inclusion.
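The inherit-then-override scheme described above can be sketched in Python roughly as follows. This is only a model under stated assumptions: the `Layer` class, the layer names and the flag values are hypothetical, and our actual build system expresses this through included files rather than objects.

```python
class Layer:
    """One level of the arch -> abi -> tune hierarchy (hypothetical model)."""

    def __init__(self, name, parent=None, package_arch=None, **settings):
        self.name = name
        self.parent = parent              # the layer this one includes
        self.package_arch = package_arch  # None if no packages are built here
        self.settings = settings

    def get(self, key):
        """Look up `key` here, falling back to the included layers."""
        layer = self
        while layer is not None:
            if key in layer.settings:
                return layer.settings[key]
            layer = layer.parent
        return None

    def package_archs(self):
        """Package architectures in order of inclusion, most generic first."""
        chain = []
        layer = self
        while layer is not None:
            if layer.package_arch:
                chain.append(layer.package_arch)
            layer = layer.parent
        return list(reversed(chain))


# Illustrative chain: arch file -> ABI file -> two tune files.
arm = Layer("arm", TARGET_TOOLCHAIN_ARCH="arm")
eabi = Layer("eabi-le", parent=arm, TARGET_OS="linux-eabi", TARGET_LIB_DIR="lib")
armv5 = Layer("armv5", parent=eabi, package_arch="armv5",
              TARGET_COMMON_CFLAGS="-march=armv5te")
armv7a = Layer("armv7a", parent=armv5, package_arch="armv7a",
               TARGET_COMMON_CFLAGS="-march=armv7-a")
```

Here the armv7a tune overrides only TARGET_COMMON_CFLAGS, inherits everything else, and reports the compatible package architectures in inclusion order.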
> I'm continuing to give this some thought but those are the ideas off the
> top of my head...
>
> Cheers,
>
> Richard
>
>
> _______________________________________________
> Openembedded-core mailing list
> Openembedded-core@lists.openembedded.org
> http://lists.linuxtogo.org/cgi-bin/mailman/listinfo/openembedded-core
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Tune files and knobs to turn
2011-06-24 11:54 Tune files and knobs to turn Koen Kooi
2011-06-24 14:01 ` Richard Purdie
@ 2011-06-24 14:12 ` Mark Hatle
2011-06-28 20:27 ` Khem Raj
2011-06-28 17:36 ` Darren Hart
2 siblings, 1 reply; 16+ messages in thread
From: Mark Hatle @ 2011-06-24 14:12 UTC (permalink / raw)
To: openembedded-core
A few of what I suspect are corrections. (I'm going to be a bit pedantic here.)
On 6/24/11 6:54 AM, Koen Kooi wrote:
> Hi,
>
> We discussed tune files a bit during last nights TSC meeting and Khem had expressed the need before, so I'd like to get this discussion started by using armv7a as an example.
>
> For armv7a capable cores we have the following hardware features:
>
> * armv7a instruction set
> * thumb1 instruction set
> * thumb2 instruction set
> * VFP coprocessor
> * optional NEON coprocessor
>
> For the ABI we can choose the following:
>
> * softtp without hw support (e.g. no VFP instructions emitted, slow)
> * softfp with hw support (e.g. VFP and/or NEON instructions emitted, fast)
> * hardfp, emits VFP and/or NEON instructions, slightly faster than softfp/hw, incompatible with everything else
Actually on ARM there are three ABIs possible but those are not it.
OABI - softfp
OABI - hardfp
EABI
As far as I know nobody uses OABI anymore, and I doubt we should support it.
The above are really instructions used within the ABI.
> And the extra knobs:
>
> * pure thumb1, no arm instructions (limited use)
> * thumb1/arm interworking
> * pure thumb2, no arm instructions
> * thumb2 interworking (not sure if that's actually usefull, thumb2 has complete coverage)
>
> In OE .dev we have the following vars:
>
> TARGET_FPU: switches between hw float and sw float, no reflection in package arch
> ARM_FP_ABI: switches between softfp and hardfp, will create 'armv7a' or 'armv7a-hardfp' as package arch
> ARM_INSTRUCTION_SET: switches between arm and thumb1, no reflection in package arch
> THUMB_INTERWORK: turns on interworking, no reflection in package arch
>
> (side note, oe-core/distroless and meta-yocto/poky don't turn set TARGET_FPU for armv7a and will generate slow code, angstrom does turn it on)
>
> Khem and I would like to start building armv7a (and armv6) in pure thumb2 mode but we want to have the variables to turn those knobs make sense and be consistent. RP has expressed his desire to sort this all out before merging multilib. I'm sure x86/mips/ppc/etc have a similar need, so let's get this discussion started.
Below is a quick attempt to capture what I know of for various architectures,
including ARM. Note the "tunings" are where I'm a bit sketchy on some
architectures. (I think the lists below also point out why this needs to be
hierarchical.. so it's really easy to inherit common stuff, and override only
the pieces you need to.. specifically the tunings.)
Arch Family: ARM
ABI:
- EABI little endian
- EABI big endian
- canonical os=linux-eabi
- library directory is "lib"
CPU/ISA:
- all use EABI
- traditional arm instructions
- thumb1, no arm instructions
- thumb1/arm, interworking
- thumb2, no arm instructions
- thumb2, interworking (note, our customers do use this.. I'm not sure how
much though)
Tunings:
- armv4 (both big & little endian)
- armv5 (both big & little endian)
- armv5 + vfp
- armv6 + vfp
- armv7a + vfp + neon (supports thumb) (little endian)
- armv7a + be8 (supports thumb) (big endian)
---
Arch Family: IA32
ABI:
- x86_32
- There is a soft-fp variant but it's not really used anymore
- hardfp
- defined as an LSB standard
- library directory is "lib"
- x86_64
- hardfp only
- defined as an LSB standard
- library directory is "lib64"
- (x32)
- new experimental ABI
- library directory is "libx32"
CPU/ISA:
- x86_32
- all use x86_32
- i586 -> core2 -> corei7
- x86_64
- only 64-bit chips supported
- x32
- only 64-bit chips supported
Tunings:
- i586, i686, core2, etc..
- MMX, MMX2, AVX, etc..
---
Arch Family: MIPS
ABI:
- MIPS o32 - soft float - big endian
- MIPS o32 - hard float - big endian
- MIPS o32 - soft float - little endian
- MIPS o32 - hard float - little endian
- old 32-bit mips library
- library path is "lib"
- MIPS n32 - soft float - big endian
- MIPS n32 - hard float - big endian
- MIPS n32 - soft float - little endian
- MIPS n32 - hard float - little endian
- new 32-bit MIPS64 library
- library path is "lib32"
- MIPS n64 - soft float - big endian
- MIPS n64 - hard float - big endian
- MIPS n64 - soft float - little endian
- MIPS n64 - hard float - little endian
- new 64-bit MIPS64 library
- library path is "lib64"
CPU/ISA:
- MIPS32
- Various
- MIPS64
- Various
Tunings:
- MIPS32
- Various
- MIPS64
- Various
---
Arch Family: Power
ABI:
- ppc32 - hard float
- ABI defined by LSB
- library path is "lib"
- ppc32 - soft float
- ppc32 - e500v1 (soft-float variant -- likely not needed)
- ppc32 - e500v2 (soft-float variant)
- library path is "lib"
- ppc64 - hard float
- library path is "lib64"
- ABI defined by LSB
CPU/ISA:
- ppc32 - hard float
- ppc603e
- ppc32 - soft float
- ppc603ec
- ppc32 - e500v1 (soft-float variant)
- -mfloat-gprs=single -mspe=yes -mabi=spe
- ppc32 - e500v2 (soft-float variant)
- -mfloat-gprs=double -mspe=yes -mabi=spe
- ppc64 - hard float
- PowerPC 970
Tunings:
- ppc603e
- ppc750
- ppc7400
- e300c2
- ppc405
- ppc405fp for hard float
- ppc440
- ppc440fp for hard float
- ppc476
- ppc476fp
- e500mc
- 8540
- 8548
- -maltivec or -mno-altivec
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Tune files and knobs to turn
2011-06-24 14:12 ` Mark Hatle
@ 2011-06-28 20:27 ` Khem Raj
0 siblings, 0 replies; 16+ messages in thread
From: Khem Raj @ 2011-06-28 20:27 UTC (permalink / raw)
To: Patches and discussions about the oe-core layer
On Fri, Jun 24, 2011 at 7:12 AM, Mark Hatle <mark.hatle@windriver.com> wrote:
> A few, what I suspect are corrections. (I'm going to be a bit pedantic here.)
>
> On 6/24/11 6:54 AM, Koen Kooi wrote:
>> Hi,
>>
>> We discussed tune files a bit during last nights TSC meeting and Khem had expressed the need before, so I'd like to get this discussion started by using armv7a as an example.
>>
>> For armv7a capable cores we have the following hardware features:
>>
>> * armv7a instruction set
>> * thumb1 instruction set
>> * thumb2 instruction set
>> * VFP coprocessor
>> * optional NEON coprocessor
>>
>> For the ABI we can choose the following:
>>
>> * softtp without hw support (e.g. no VFP instructions emitted, slow)
>> * softfp with hw support (e.g. VFP and/or NEON instructions emitted, fast)
>> * hardfp, emits VFP and/or NEON instructions, slightly faster than softfp/hw, incompatible with everything else
>
> Actually on ARM there are three ABIs possible but those are not it.
>
> OABI - softfp
> OABI - hardfp
> EABI
OABI is probably something we should drop in this era.
So we are left with
a. EABI + soft float (no hard FP at all)
b. EABI + soft VFP (uses the softfloat calling convention but emits
hard FP instructions in code generation)
c. EABI + hardfp (uses the hardfp calling convention for parameter passing
and emits hard FP instructions as desired)
Now, with hardfp we could use NEON or VFP only, and VFP comes in two
variants, vfp-d16 and vfp-d32. There are some implementations of armv7
where NEON is not part of the SoC, some that implement vfp-d16 and some
that implement vfp-d32. This greatly affects options b. and c. above.
In the above, a. and b. are compatible provided the hardware supports
NEON/VFP, but c. is not compatible, and the toolchain obviously needs to
be configured for it as well, unless we want this sort of thing in
multilib too.
Using NEON also depends on machine features.
So we need a combination of machine features (e.g. does it have NEON, does
it implement vfpv3-d16 or d32, or does it have no FPU at all), and then we
need the hardfloat ABI, which I would say is a distro feature and is only
enabled when the underlying hardware support is found through machine
features. So we need a switch to select whether hardfp is desired.
Thumb1 and thumb2 can be covered by passing the right options globally
to CC, in other words via TARGET_CC_ARCH, for the most part.
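A minimal Python sketch of the selection logic described above, assuming machine features are given as a list of strings and the distro expresses its hardfp wish as a boolean. The function names and the feature names "vfp" and "neon" are hypothetical; the returned values are the real GCC -mfloat-abi settings.

```python
def select_float_abi(machine_features, want_hardfp=False):
    """Pick a -mfloat-abi value from hardware features and distro policy."""
    has_fpu = bool({"vfp", "neon"} & set(machine_features))
    if not has_fpu:
        return "soft"    # option a: no FP instructions at all
    if want_hardfp:
        return "hard"    # option c: hardfp calling convention
    return "softfp"      # option b: FP instructions, softfloat calls


def link_compatible(abi_a, abi_b):
    """soft and softfp share a calling convention; hard does not."""
    convention = {"soft": "base", "softfp": "base", "hard": "vfp"}
    return convention[abi_a] == convention[abi_b]
```

The distro switch only takes effect when the machine actually reports an FPU, which matches the idea that hardfp is a distro feature gated by machine features.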
>
> As far as I know nobody uses OABI anymore, and I doubt we should support it.
>
> The above are really instructions used within the ABI.
>
>> And the extra knobs:
>>
>> * pure thumb1, no arm instructions (limited use)
>> * thumb1/arm interworking
>> * pure thumb2, no arm instructions
>> * thumb2 interworking (not sure if that's actually usefull, thumb2 has complete coverage)
>>
>> In OE .dev we have the following vars:
>>
>> TARGET_FPU: switches between hw float and sw float, no reflection in package arch
>> ARM_FP_ABI: switches between softfp and hardfp, will create 'armv7a' or 'armv7a-hardfp' as package arch
>> ARM_INSTRUCTION_SET: switches between arm and thumb1, no reflection in package arch
>> THUMB_INTERWORK: turns on interworking, no reflection in package arch
>>
>> (side note, oe-core/distroless and meta-yocto/poky don't turn set TARGET_FPU for armv7a and will generate slow code, angstrom does turn it on)
>>
>> Khem and I would like to start building armv7a (and armv6) in pure thumb2 mode but we want to have the variables to turn those knobs make sense and be consistent. RP has expressed his desire to sort this all out before merging multilib. I'm sure x86/mips/ppc/etc have a similar need, so let's get this discussion started.
>
> Below is a quick step to capture what all I know of for various architectures,
> inluding ARM. Note the "tunings" are where I'm a bit sketchy on some
> architectures. (I think below also point out why I think this needs to be
> hierarchical.. so it's really easy to inherit common stuff, and override only
> the pieces you need to.. specifically the tunings.)
>
> Arch Family: ARM
>
> ABI:
> - EABI little endian
> - EABI big endian
> - canonical os=linux-eabi
> - library directory is "lib"
>
> CPU/ISA:
> - all use EABI
> - traditional arm instructions
> - thumb1, no arm instructions
> - thumb1/arm, interworking
> - thumb2, no arm instructions
> - thumb2, interworking (note, our customers do use this.. I'm not sure how
> much though)
>
> Tunings:
> - armv4 (both big & little endian)
> - armv5 (both big & little endian)
> - armv5 + vfp
> - armv6 + vfp
> - armv7a + vfp + neon (supports thumb) (little endian)
> - armv7a + be8 (supports thumb) (big endian)
>
> ---
>
> Arch Family: IA32
>
> ABI:
> - x86_32
> - There is a soft-fp variant but it's not really used anymore
> - hardfp
> - defined as an LSB standard
> - library directory is "lib"
> - x86_64
> - hardfp only
> - defined as an LSB standard
> - library directory is "lib64"
> - (x32)
> - new experimental ABI
> - library directory is "libx32"
>
> CPU/ISA:
> - x86_32
> - all use x86_32
> - i586 -> core2 -> corei7
> - x86_64
> - only 64-bit chips supported
> - x32
> - only 64-bit chips supported
>
> Tunings:
> - i586, i686, core2, etc..
> - MMX, MMX2, AVX, etc..
>
> ---
>
> Arch Family: MIPS
>
> ABI:
> - MIPS o32 - soft float - big endian
> - MIPS o32 - hard float - big endian
> - MIPS o32 - soft float - little endian
> - MIPS o32 - hard float - little endian
> - old 32-bit mips library
> - library path is "lib"
>
> - MIPS n32 - soft float - big endian
> - MIPS n32 - hard float - big endian
> - MIPS n32 - soft float - little endian
> - MIPS n32 - hard float - little endian
> - new 32-bit MIPS64 library
> - library path is "lib32"
>
> - MIPS n64 - soft float - big endian
> - MIPS n64 - hard float - big endian
> - MIPS n64 - soft float - little endian
> - MIPS n64 - hard float - big endian
> - new 64-bit MIPS64 library
> - library path is "lib64"
>
> CPU/ISA:
> - MIPS32
> - Various
> - MIPS64
> - Various
>
> Tunings:
> - MIPS32
> - Various
> - MIPS64
> - Various
>
> ---
>
> Arch Family: Power
>
> ABI:
> - ppc32 - hard float
> - ABI defined by LSB
> - library path is "lib"
> - ppc32 - soft float
> - ppc32 - e500v1 (soft-float variant -- likely not needed)
> - ppc32 - e500v2 (soft-float variant)
> - library path is "lib"
> - ppc64 - hard float
> - library path is "lib64"
> - ABI defined by LSB
>
> CPU/ISA:
> - ppc32 - hard float
> - ppc603e
> - ppc32 - soft float
> - ppc603ec
> - ppc32 - e500v1 (soft-float variant)
> - -mfloat-gprs=single -mspe=yes -mabi=spe
> - ppc32 - e500v2 (soft-float variant)
> - -mfloat-gprs=double -mspe=yes -mabi=spe
> - ppc64 - hard float
> - PowerPC 970
>
> Tunings:
> - ppc603e
> - ppc750
> - ppc7400
> - e300c2
> - ppc405
> - ppc405fp for hard float
> - ppc440
> - ppc440fp for hard float
> - ppc476
> - ppc476fp
> - e500mc
> - 8540
> - 8548
> - -maltivec or -mno-altivec
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Tune files and knobs to turn
2011-06-24 11:54 Tune files and knobs to turn Koen Kooi
2011-06-24 14:01 ` Richard Purdie
2011-06-24 14:12 ` Mark Hatle
@ 2011-06-28 17:36 ` Darren Hart
2011-06-28 17:38 ` Koen Kooi
` (2 more replies)
2 siblings, 3 replies; 16+ messages in thread
From: Darren Hart @ 2011-06-28 17:36 UTC (permalink / raw)
To: Patches and discussions about the oe-core layer; +Cc: Koen Kooi
On 06/24/2011 04:54 AM, Koen Kooi wrote:
> Hi,
>
> We discussed tune files a bit during last nights TSC meeting and Khem had
> expressed the need before, so I'd like to get this discussion started by using
> armv7a as an example.
>
> For armv7a capable cores we have the following hardware features:
>
> * armv7a instruction set
> * thumb1 instruction set
> * thumb2 instruction set
> * VFP coprocessor
> * optional NEON coprocessor
>
> For the ABI we can choose the following:
>
> * softtp without hw support (e.g. no VFP instructions emitted, slow)
> * softfp with hw support (e.g. VFP and/or NEON instructions emitted, fast)
> * hardfp, emits VFP and/or NEON instructions, slightly faster than softfp/hw,
> incompatible with everything else
>
> And the extra knobs:
>
> * pure thumb1, no arm instructions (limited use)
> * thumb1/arm interworking
> * pure thumb2, no arm instructions
> * thumb2 interworking (not sure if that's actually usefull, thumb2 has complete coverage)
>
> In OE .dev we have the following vars:
>
> TARGET_FPU: switches between hw float and sw float, no reflection in package arch
> ARM_FP_ABI: switches between softfp and hardfp, will create 'armv7a' or
> 'armv7a-hardfp' as package arch
> ARM_INSTRUCTION_SET: switches between arm and thumb1, no reflection in package arch
> THUMB_INTERWORK: turns on interworking, no reflection in package arch
>
> (side note, oe-core/distroless and meta-yocto/poky don't turn set TARGET_FPU
> for armv7a and will generate slow code, angstrom does turn it on)
oe-core tune-cortexa8.inc doesn't make use of these variables (unlike
meta-texasinstruments) and does make use of the neon coprocessor, but
still uses the softfp float-abi:
TARGET_CC_ARCH = "-march=armv7-a -mtune=cortex-a8 -mfpu=neon
-mfloat-abi=softfp -fno-tree-vectorize"
Seems like the oe-core tune files need to be synced up with vendor layers?
--
Darren
>
> Khem and I would like to start building armv7a (and armv6) in pure thumb2 mode
> but we want to have the variables to turn those knobs make sense and be
> consistent. RP has expressed his desire to sort this all out before merging
> multilib. I'm sure x86/mips/ppc/etc have a similar need, so let's get this
> discussion started.
>
> regards,
>
> Koen
--
Darren Hart
Intel Open Source Technology Center
Yocto Project - Linux Kernel
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Tune files and knobs to turn
2011-06-28 17:36 ` Darren Hart
@ 2011-06-28 17:38 ` Koen Kooi
2011-06-28 19:13 ` Darren Hart
2011-06-28 20:31 ` Khem Raj
2011-06-30 16:02 ` Tom Rini
2 siblings, 1 reply; 16+ messages in thread
From: Koen Kooi @ 2011-06-28 17:38 UTC (permalink / raw)
To: Darren Hart; +Cc: Patches and discussions about the oe-core layer
Op 28 jun 2011, om 19:36 heeft Darren Hart het volgende geschreven:
>
>
> On 06/24/2011 04:54 AM, Koen Kooi wrote:
>> Hi,
>>
>> We discussed tune files a bit during last nights TSC meeting and Khem had
>> expressed the need before, so I'd like to get this discussion started by using
>> armv7a as an example.
>>
>> For armv7a capable cores we have the following hardware features:
>>
>> * armv7a instruction set
>> * thumb1 instruction set
>> * thumb2 instruction set
>> * VFP coprocessor
>> * optional NEON coprocessor
>>
>> For the ABI we can choose the following:
>>
>> * softtp without hw support (e.g. no VFP instructions emitted, slow)
>> * softfp with hw support (e.g. VFP and/or NEON instructions emitted, fast)
>> * hardfp, emits VFP and/or NEON instructions, slightly faster than softfp/hw,
>> incompatible with everything else
>>
>> And the extra knobs:
>>
>> * pure thumb1, no arm instructions (limited use)
>> * thumb1/arm interworking
>> * pure thumb2, no arm instructions
>> * thumb2 interworking (not sure if that's actually usefull, thumb2 has complete coverage)
>>
>> In OE .dev we have the following vars:
>>
>> TARGET_FPU: switches between hw float and sw float, no reflection in package arch
>> ARM_FP_ABI: switches between softfp and hardfp, will create 'armv7a' or
>> 'armv7a-hardfp' as package arch
>> ARM_INSTRUCTION_SET: switches between arm and thumb1, no reflection in package arch
>> THUMB_INTERWORK: turns on interworking, no reflection in package arch
>>
>> (side note, oe-core/distroless and meta-yocto/poky don't turn set TARGET_FPU
>> for armv7a and will generate slow code, angstrom does turn it on)
>
>
> oe-core tune-cortexa8.inc doesn't make use of these variables (unlike
> meta-texasinstruments) and does make use of the neon coprocessor, but
> still uses the softfp float-api:
>
> TARGET_CC_ARCH = "-march=armv7-a -mtune=cortex-a8 -mfpu=neon
> -mfloat-abi=softfp -fno-tree-vectorize"
Don't confuse softfp calling conventions with softfloat! The above will still emit vfp and neon instructions if you set TARGET_FPU = hard
regards,
Koen
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Tune files and knobs to turn
2011-06-28 17:38 ` Koen Kooi
@ 2011-06-28 19:13 ` Darren Hart
2011-06-28 19:18 ` Koen Kooi
2011-06-28 20:33 ` Khem Raj
0 siblings, 2 replies; 16+ messages in thread
From: Darren Hart @ 2011-06-28 19:13 UTC (permalink / raw)
To: Koen Kooi; +Cc: Patches and discussions about the oe-core layer
On 06/28/2011 10:38 AM, Koen Kooi wrote:
>
> Op 28 jun 2011, om 19:36 heeft Darren Hart het volgende geschreven:
>
>>
>>
>> On 06/24/2011 04:54 AM, Koen Kooi wrote:
>>> Hi,
>>>
>>> We discussed tune files a bit during last nights TSC meeting and Khem had
>>> expressed the need before, so I'd like to get this discussion started by using
>>> armv7a as an example.
>>>
>>> For armv7a capable cores we have the following hardware features:
>>>
>>> * armv7a instruction set
>>> * thumb1 instruction set
>>> * thumb2 instruction set
>>> * VFP coprocessor
>>> * optional NEON coprocessor
>>>
>>> For the ABI we can choose the following:
>>>
>>> * softtp without hw support (e.g. no VFP instructions emitted, slow)
>>> * softfp with hw support (e.g. VFP and/or NEON instructions emitted, fast)
>>> * hardfp, emits VFP and/or NEON instructions, slightly faster than softfp/hw,
>>> incompatible with everything else
>>>
>>> And the extra knobs:
>>>
>>> * pure thumb1, no arm instructions (limited use)
>>> * thumb1/arm interworking
>>> * pure thumb2, no arm instructions
>>> * thumb2 interworking (not sure if that's actually usefull, thumb2 has complete coverage)
>>>
>>> In OE .dev we have the following vars:
>>>
>>> TARGET_FPU: switches between hw float and sw float, no reflection in package arch
>>> ARM_FP_ABI: switches between softfp and hardfp, will create 'armv7a' or
>>> 'armv7a-hardfp' as package arch
>>> ARM_INSTRUCTION_SET: switches between arm and thumb1, no reflection in package arch
>>> THUMB_INTERWORK: turns on interworking, no reflection in package arch
>>>
>>> (side note, oe-core/distroless and meta-yocto/poky don't turn set TARGET_FPU
>>> for armv7a and will generate slow code, angstrom does turn it on)
>>
>>
>> oe-core tune-cortexa8.inc doesn't make use of these variables (unlike
>> meta-texasinstruments) and does make use of the neon coprocessor, but
>> still uses the softfp float-api:
>>
>> TARGET_CC_ARCH = "-march=armv7-a -mtune=cortex-a8 -mfpu=neon
>> -mfloat-abi=softfp -fno-tree-vectorize"
>
> Don't confuse softfp calling conventions with softfloat! The above will still emit
> vfp and neon instructions if your set TARGET_FPU = hard
Ah. So we would need to add something like:
conf/distro/include/angstrom.inc:TARGET_FPU_armv7a ?= "hard"
conf/distro/include/angstrom.inc:TARGET_FPU_armv7a-vfp ?= "hard"
to something DISTRO specific (poky.conf or similar).
It isn't clear to me why this is a distro policy decision instead of
part of the tune include or the machine config itself.
Can someone elaborate on why this goes where it does?
Thanks,
--
Darren Hart
Intel Open Source Technology Center
Yocto Project - Linux Kernel
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Tune files and knobs to turn
2011-06-28 19:13 ` Darren Hart
@ 2011-06-28 19:18 ` Koen Kooi
2011-06-28 20:33 ` Khem Raj
1 sibling, 0 replies; 16+ messages in thread
From: Koen Kooi @ 2011-06-28 19:18 UTC (permalink / raw)
To: Patches and discussions about the oe-core layer
Op 28 jun 2011, om 21:13 heeft Darren Hart het volgende geschreven:
>
>
> On 06/28/2011 10:38 AM, Koen Kooi wrote:
>>
>> Op 28 jun 2011, om 19:36 heeft Darren Hart het volgende geschreven:
>>
>>>
>>>
>>> On 06/24/2011 04:54 AM, Koen Kooi wrote:
>>>> Hi,
>>>>
>>>> We discussed tune files a bit during last nights TSC meeting and Khem had
>>>> expressed the need before, so I'd like to get this discussion started by using
>>>> armv7a as an example.
>>>>
>>>> For armv7a capable cores we have the following hardware features:
>>>>
>>>> * armv7a instruction set
>>>> * thumb1 instruction set
>>>> * thumb2 instruction set
>>>> * VFP coprocessor
>>>> * optional NEON coprocessor
>>>>
>>>> For the ABI we can choose the following:
>>>>
>>>> * softfp without hw support (e.g. no VFP instructions emitted, slow)
>>>> * softfp with hw support (e.g. VFP and/or NEON instructions emitted, fast)
>>>> * hardfp, emits VFP and/or NEON instructions, slightly faster than softfp/hw,
>>>> incompatible with everything else
>>>>
>>>> And the extra knobs:
>>>>
>>>> * pure thumb1, no arm instructions (limited use)
>>>> * thumb1/arm interworking
>>>> * pure thumb2, no arm instructions
>>>> * thumb2 interworking (not sure if that's actually useful, thumb2 has complete coverage)
>>>>
>>>> In OE .dev we have the following vars:
>>>>
>>>> TARGET_FPU: switches between hw float and sw float, no reflection in package arch
>>>> ARM_FP_ABI: switches between softfp and hardfp, will create 'armv7a' or
>>>> 'armv7a-hardfp' as package arch
>>>> ARM_INSTRUCTION_SET: switches between arm and thumb1, no reflection in package arch
>>>> THUMB_INTERWORK: turns on interworking, no reflection in package arch
>>>>
>>>> (side note, oe-core/distroless and meta-yocto/poky don't set TARGET_FPU
>>>> for armv7a and will generate slow code, angstrom does turn it on)
>>>
>>>
>>> oe-core tune-cortexa8.inc doesn't make use of these variables (unlike
>>> meta-texasinstruments) and does make use of the neon coprocessor, but
>>> still uses the softfp float ABI:
>>>
>>> TARGET_CC_ARCH = "-march=armv7-a -mtune=cortex-a8 -mfpu=neon
>>> -mfloat-abi=softfp -fno-tree-vectorize"
>>
>> Don't confuse softfp calling conventions with softfloat! The above will still emit
>> VFP and NEON instructions if you set TARGET_FPU = hard
>
> Ah. So we would need to add something like:
>
> conf/distro/include/angstrom.inc:TARGET_FPU_armv7a ?= "hard"
> conf/distro/include/angstrom.inc:TARGET_FPU_armv7a-vfp ?= "hard"
>
> to something DISTRO specific (poky.conf or similar).
That won't work, because FEED_ARCH/BASE_PACKAGE_ARCH isn't in the overrides for oe-core/poky.
> It isn't clear to me why this is a distro policy decision instead of
> part of the tune include or the machine config itself.
>
> Can someone elaborate on why this goes where it does?
It isn't inherently hardware-specific: you might need to link against evil sourceless binaries from $evil_vendor, and you want your DISTRO to be able to control that knob. There are other use cases as well, but the evil vendor one is the most common.
* Re: Tune files and knobs to turn
2011-06-28 19:13 ` Darren Hart
2011-06-28 19:18 ` Koen Kooi
@ 2011-06-28 20:33 ` Khem Raj
1 sibling, 0 replies; 16+ messages in thread
From: Khem Raj @ 2011-06-28 20:33 UTC (permalink / raw)
To: Patches and discussions about the oe-core layer; +Cc: Koen Kooi
On Tue, Jun 28, 2011 at 12:13 PM, Darren Hart <dvhart@linux.intel.com> wrote:
>
>
> On 06/28/2011 10:38 AM, Koen Kooi wrote:
>>
>> On 28 Jun 2011, at 19:36, Darren Hart wrote:
>>
>>>
>>>
>>> On 06/24/2011 04:54 AM, Koen Kooi wrote:
>>>> Hi,
>>>>
>>>> We discussed tune files a bit during last night's TSC meeting and Khem had
>>>> expressed the need before, so I'd like to get this discussion started by using
>>>> armv7a as an example.
>>>>
>>>> For armv7a capable cores we have the following hardware features:
>>>>
>>>> * armv7a instruction set
>>>> * thumb1 instruction set
>>>> * thumb2 instruction set
>>>> * VFP coprocessor
>>>> * optional NEON coprocessor
>>>>
>>>> For the ABI we can choose the following:
>>>>
>>>> * softfp without hw support (e.g. no VFP instructions emitted, slow)
>>>> * softfp with hw support (e.g. VFP and/or NEON instructions emitted, fast)
>>>> * hardfp, emits VFP and/or NEON instructions, slightly faster than softfp/hw,
>>>> incompatible with everything else
>>>>
>>>> And the extra knobs:
>>>>
>>>> * pure thumb1, no arm instructions (limited use)
>>>> * thumb1/arm interworking
>>>> * pure thumb2, no arm instructions
>>>> * thumb2 interworking (not sure if that's actually useful, thumb2 has complete coverage)
>>>>
>>>> In OE .dev we have the following vars:
>>>>
>>>> TARGET_FPU: switches between hw float and sw float, no reflection in package arch
>>>> ARM_FP_ABI: switches between softfp and hardfp, will create 'armv7a' or
>>>> 'armv7a-hardfp' as package arch
>>>> ARM_INSTRUCTION_SET: switches between arm and thumb1, no reflection in package arch
>>>> THUMB_INTERWORK: turns on interworking, no reflection in package arch
>>>>
>>>> (side note, oe-core/distroless and meta-yocto/poky don't set TARGET_FPU
>>>> for armv7a and will generate slow code, angstrom does turn it on)
>>>
>>>
>>> oe-core tune-cortexa8.inc doesn't make use of these variables (unlike
>>> meta-texasinstruments) and does make use of the neon coprocessor, but
>>> still uses the softfp float ABI:
>>>
>>> TARGET_CC_ARCH = "-march=armv7-a -mtune=cortex-a8 -mfpu=neon
>>> -mfloat-abi=softfp -fno-tree-vectorize"
>>
>> Don't confuse softfp calling conventions with softfloat! The above will still emit
>> VFP and NEON instructions if you set TARGET_FPU = hard
>
> Ah. So we would need to add something like:
>
> conf/distro/include/angstrom.inc:TARGET_FPU_armv7a ?= "hard"
> conf/distro/include/angstrom.inc:TARGET_FPU_armv7a-vfp ?= "hard"
>
> to something DISTRO specific (poky.conf or similar).
>
> It isn't clear to me why this is a distro policy decision instead of
> part of the tune include or the machine config itself.
>
> Can someone elaborate on why this goes where it does?
>
Adding to what Koen said: it's a different, incompatible ABI.
hardfp binaries will not work on a softfp root file system.
Since it's an ABI changer, it's best left to distros to choose
which ABI they prefer, but in oe-core we need to make
provisions for that, which we don't have at the moment.
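
To make the ABI distinction concrete, here is a small sketch in C that
reports which float ABI a compilation unit was built for. It relies on
macros that, as far as I know, GCC predefines for 32-bit ARM targets
(__arm__, __ARM_PCS_VFP, __SOFTFP__); the function name float_abi_name
is made up for this illustration and is not part of oe-core.

```c
/* Report the floating-point ABI this file is being compiled for.
 * Assumption: GCC-style predefined macros on a 32-bit ARM target. */
const char *float_abi_name(void)
{
#if !defined(__arm__)
    return "not 32-bit ARM";  /* e.g. when built on an x86 host */
#elif defined(__ARM_PCS_VFP)
    return "hardfp";          /* -mfloat-abi=hard: FP args in VFP registers */
#elif defined(__SOFTFP__)
    return "soft";            /* -mfloat-abi=soft: no FP instructions emitted */
#else
    return "softfp";          /* -mfloat-abi=softfp: FP instructions, args in core registers */
#endif
}
```

Printing the result from a test program built with the distro's flags is
a quick way to check which of the three variants a toolchain produces.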
> Thanks,
>
> --
> Darren Hart
> Intel Open Source Technology Center
> Yocto Project - Linux Kernel
>
> _______________________________________________
> Openembedded-core mailing list
> Openembedded-core@lists.openembedded.org
> http://lists.linuxtogo.org/cgi-bin/mailman/listinfo/openembedded-core
>
* Re: Tune files and knobs to turn
2011-06-28 17:36 ` Darren Hart
2011-06-28 17:38 ` Koen Kooi
@ 2011-06-28 20:31 ` Khem Raj
2011-06-28 20:33 ` Koen Kooi
2011-06-30 16:02 ` Tom Rini
2 siblings, 1 reply; 16+ messages in thread
From: Khem Raj @ 2011-06-28 20:31 UTC (permalink / raw)
To: Patches and discussions about the oe-core layer; +Cc: Koen Kooi
On Tue, Jun 28, 2011 at 10:36 AM, Darren Hart <dvhart@linux.intel.com> wrote:
>
>
> On 06/24/2011 04:54 AM, Koen Kooi wrote:
>> Hi,
>>
>> We discussed tune files a bit during last night's TSC meeting and Khem had
>> expressed the need before, so I'd like to get this discussion started by using
>> armv7a as an example.
>>
>> For armv7a capable cores we have the following hardware features:
>>
>> * armv7a instruction set
>> * thumb1 instruction set
>> * thumb2 instruction set
>> * VFP coprocessor
>> * optional NEON coprocessor
>>
>> For the ABI we can choose the following:
>>
>> * softfp without hw support (e.g. no VFP instructions emitted, slow)
>> * softfp with hw support (e.g. VFP and/or NEON instructions emitted, fast)
>> * hardfp, emits VFP and/or NEON instructions, slightly faster than softfp/hw,
>> incompatible with everything else
>>
>> And the extra knobs:
>>
>> * pure thumb1, no arm instructions (limited use)
>> * thumb1/arm interworking
>> * pure thumb2, no arm instructions
>> * thumb2 interworking (not sure if that's actually useful, thumb2 has complete coverage)
>>
>> In OE .dev we have the following vars:
>>
>> TARGET_FPU: switches between hw float and sw float, no reflection in package arch
>> ARM_FP_ABI: switches between softfp and hardfp, will create 'armv7a' or
>> 'armv7a-hardfp' as package arch
>> ARM_INSTRUCTION_SET: switches between arm and thumb1, no reflection in package arch
>> THUMB_INTERWORK: turns on interworking, no reflection in package arch
>>
>> (side note, oe-core/distroless and meta-yocto/poky don't set TARGET_FPU
>> for armv7a and will generate slow code, angstrom does turn it on)
>
>
> oe-core tune-cortexa8.inc doesn't make use of these variables (unlike
> meta-texasinstruments) and does make use of the neon coprocessor, but
> still uses the softfp float ABI:
>
> TARGET_CC_ARCH = "-march=armv7-a -mtune=cortex-a8 -mfpu=neon
> -mfloat-abi=softfp -fno-tree-vectorize"
>
> Seems like the oe-core tune files need to be synced up with vendor layers?
>
Well, enabling hardfp is a fundamental decision, and I guess using softfp
in oe-core is probably the best choice (the floating-point parameter-passing
ABI is what I am talking about). We still use -mfpu=neon, so gcc will still
try to utilize it, but -fno-tree-vectorize is going to subdue the use of NEON
instructions, since gcc is not allowed to vectorize.
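
A rough sketch of what that means in practice: -fno-tree-vectorize only
turns off gcc's automatic vectorizer, so code that asks for NEON
explicitly through intrinsics is not affected. The function name add_f32
below is hypothetical, and the NEON path is only compiled when the
compiler reports NEON support (__ARM_NEON or __ARM_NEON__); otherwise
the plain scalar loop does all the work.

```c
#include <stddef.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Element-wise dst[i] = a[i] + b[i] for n floats. */
void add_f32(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* Explicit NEON: four floats per iteration, independent of
     * whether the auto-vectorizer is enabled. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);
        float32x4_t vb = vld1q_f32(b + i);
        vst1q_f32(dst + i, vaddq_f32(va, vb));
    }
#endif
    /* Scalar tail, or the whole array when NEON is unavailable. */
    for (; i < n; i++)
        dst[i] = a[i] + b[i];
}
```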
> --
> Darren
>
>
>>
>> Khem and I would like to start building armv7a (and armv6) in pure thumb2 mode
>> but we want to have the variables to turn those knobs make sense and be
>> consistent. RP has expressed his desire to sort this all out before merging
>> multilib. I'm sure x86/mips/ppc/etc have a similar need, so let's get this
>> discussion started.
>>
>> regards,
>>
>> Koen
>
> --
> Darren Hart
> Intel Open Source Technology Center
> Yocto Project - Linux Kernel
>
* Re: Tune files and knobs to turn
2011-06-28 20:31 ` Khem Raj
@ 2011-06-28 20:33 ` Koen Kooi
2011-06-28 20:37 ` Khem Raj
0 siblings, 1 reply; 16+ messages in thread
From: Koen Kooi @ 2011-06-28 20:33 UTC (permalink / raw)
To: Khem Raj; +Cc: Patches and discussions about the oe-core layer
On 28 Jun 2011, at 22:31, Khem Raj wrote:
> On Tue, Jun 28, 2011 at 10:36 AM, Darren Hart <dvhart@linux.intel.com> wrote:
>>
>>
>> On 06/24/2011 04:54 AM, Koen Kooi wrote:
>>> Hi,
>>>
>>> We discussed tune files a bit during last night's TSC meeting and Khem had
>>> expressed the need before, so I'd like to get this discussion started by using
>>> armv7a as an example.
>>>
>>> For armv7a capable cores we have the following hardware features:
>>>
>>> * armv7a instruction set
>>> * thumb1 instruction set
>>> * thumb2 instruction set
>>> * VFP coprocessor
>>> * optional NEON coprocessor
>>>
>>> For the ABI we can choose the following:
>>>
>>> * softfp without hw support (e.g. no VFP instructions emitted, slow)
>>> * softfp with hw support (e.g. VFP and/or NEON instructions emitted, fast)
>>> * hardfp, emits VFP and/or NEON instructions, slightly faster than softfp/hw,
>>> incompatible with everything else
>>>
>>> And the extra knobs:
>>>
>>> * pure thumb1, no arm instructions (limited use)
>>> * thumb1/arm interworking
>>> * pure thumb2, no arm instructions
>>> * thumb2 interworking (not sure if that's actually useful, thumb2 has complete coverage)
>>>
>>> In OE .dev we have the following vars:
>>>
>>> TARGET_FPU: switches between hw float and sw float, no reflection in package arch
>>> ARM_FP_ABI: switches between softfp and hardfp, will create 'armv7a' or
>>> 'armv7a-hardfp' as package arch
>>> ARM_INSTRUCTION_SET: switches between arm and thumb1, no reflection in package arch
>>> THUMB_INTERWORK: turns on interworking, no reflection in package arch
>>>
>>> (side note, oe-core/distroless and meta-yocto/poky don't set TARGET_FPU
>>> for armv7a and will generate slow code, angstrom does turn it on)
>>
>>
>> oe-core tune-cortexa8.inc doesn't make use of these variables (unlike
>> meta-texasinstruments) and does make use of the neon coprocessor, but
>> still uses the softfp float ABI:
>>
>> TARGET_CC_ARCH = "-march=armv7-a -mtune=cortex-a8 -mfpu=neon
>> -mfloat-abi=softfp -fno-tree-vectorize"
>>
>> Seems like the oe-core tune files need to be synced up with vendor layers?
>>
>
> Well, enabling hardfp is a fundamental decision, and I guess using softfp
> in oe-core is probably the best choice (the floating-point parameter-passing
> ABI is what I am talking about). We still use -mfpu=neon, so gcc will still
> try to utilize it, but -fno-tree-vectorize is going to subdue the use of NEON
> instructions, since gcc is not allowed to vectorize.
Experience has shown that -fno-tree-vectorize generates faster code with gcc 4.5 :)
* Re: Tune files and knobs to turn
2011-06-28 20:33 ` Koen Kooi
@ 2011-06-28 20:37 ` Khem Raj
0 siblings, 0 replies; 16+ messages in thread
From: Khem Raj @ 2011-06-28 20:37 UTC (permalink / raw)
To: Koen Kooi; +Cc: Patches and discussions about the oe-core layer
On Tue, Jun 28, 2011 at 1:33 PM, Koen Kooi <koen@dominion.thruhere.net> wrote:
>
> On 28 Jun 2011, at 22:31, Khem Raj wrote:
>
>> On Tue, Jun 28, 2011 at 10:36 AM, Darren Hart <dvhart@linux.intel.com> wrote:
>>>
>>>
>>> On 06/24/2011 04:54 AM, Koen Kooi wrote:
>>>> Hi,
>>>>
>>>> We discussed tune files a bit during last night's TSC meeting and Khem had
>>>> expressed the need before, so I'd like to get this discussion started by using
>>>> armv7a as an example.
>>>>
>>>> For armv7a capable cores we have the following hardware features:
>>>>
>>>> * armv7a instruction set
>>>> * thumb1 instruction set
>>>> * thumb2 instruction set
>>>> * VFP coprocessor
>>>> * optional NEON coprocessor
>>>>
>>>> For the ABI we can choose the following:
>>>>
>>>> * softfp without hw support (e.g. no VFP instructions emitted, slow)
>>>> * softfp with hw support (e.g. VFP and/or NEON instructions emitted, fast)
>>>> * hardfp, emits VFP and/or NEON instructions, slightly faster than softfp/hw,
>>>> incompatible with everything else
>>>>
>>>> And the extra knobs:
>>>>
>>>> * pure thumb1, no arm instructions (limited use)
>>>> * thumb1/arm interworking
>>>> * pure thumb2, no arm instructions
>>>> * thumb2 interworking (not sure if that's actually useful, thumb2 has complete coverage)
>>>>
>>>> In OE .dev we have the following vars:
>>>>
>>>> TARGET_FPU: switches between hw float and sw float, no reflection in package arch
>>>> ARM_FP_ABI: switches between softfp and hardfp, will create 'armv7a' or
>>>> 'armv7a-hardfp' as package arch
>>>> ARM_INSTRUCTION_SET: switches between arm and thumb1, no reflection in package arch
>>>> THUMB_INTERWORK: turns on interworking, no reflection in package arch
>>>>
>>>> (side note, oe-core/distroless and meta-yocto/poky don't set TARGET_FPU
>>>> for armv7a and will generate slow code, angstrom does turn it on)
>>>
>>>
>>> oe-core tune-cortexa8.inc doesn't make use of these variables (unlike
>>> meta-texasinstruments) and does make use of the neon coprocessor, but
>>> still uses the softfp float ABI:
>>>
>>> TARGET_CC_ARCH = "-march=armv7-a -mtune=cortex-a8 -mfpu=neon
>>> -mfloat-abi=softfp -fno-tree-vectorize"
>>>
>>> Seems like the oe-core tune files need to be synced up with vendor layers?
>>>
>>
>> Well, enabling hardfp is a fundamental decision, and I guess using softfp
>> in oe-core is probably the best choice (the floating-point parameter-passing
>> ABI is what I am talking about). We still use -mfpu=neon, so gcc will still
>> try to utilize it, but -fno-tree-vectorize is going to subdue the use of NEON
>> instructions, since gcc is not allowed to vectorize.
>
> Experience has shown that -fno-tree-vectorize generates faster code with gcc 4.5 :)
Someday I will try to benchmark it and find out what's going on for myself.
* Re: Tune files and knobs to turn
2011-06-28 17:36 ` Darren Hart
2011-06-28 17:38 ` Koen Kooi
2011-06-28 20:31 ` Khem Raj
@ 2011-06-30 16:02 ` Tom Rini
2011-06-30 17:41 ` Koen Kooi
2 siblings, 1 reply; 16+ messages in thread
From: Tom Rini @ 2011-06-30 16:02 UTC (permalink / raw)
To: openembedded-core
On 06/28/2011 10:36 AM, Darren Hart wrote:
>
>
> On 06/24/2011 04:54 AM, Koen Kooi wrote:
>> Hi,
>>
>> We discussed tune files a bit during last night's TSC meeting and Khem had
>> expressed the need before, so I'd like to get this discussion started by using
>> armv7a as an example.
>>
>> For armv7a capable cores we have the following hardware features:
>>
>> * armv7a instruction set
>> * thumb1 instruction set
>> * thumb2 instruction set
>> * VFP coprocessor
>> * optional NEON coprocessor
>>
>> For the ABI we can choose the following:
>>
>> * softfp without hw support (e.g. no VFP instructions emitted, slow)
>> * softfp with hw support (e.g. VFP and/or NEON instructions emitted, fast)
>> * hardfp, emits VFP and/or NEON instructions, slightly faster than softfp/hw,
>> incompatible with everything else
>>
>> And the extra knobs:
>>
>> * pure thumb1, no arm instructions (limited use)
>> * thumb1/arm interworking
>> * pure thumb2, no arm instructions
>> * thumb2 interworking (not sure if that's actually useful, thumb2 has complete coverage)
>>
>> In OE .dev we have the following vars:
>>
>> TARGET_FPU: switches between hw float and sw float, no reflection in package arch
>> ARM_FP_ABI: switches between softfp and hardfp, will create 'armv7a' or
>> 'armv7a-hardfp' as package arch
>> ARM_INSTRUCTION_SET: switches between arm and thumb1, no reflection in package arch
>> THUMB_INTERWORK: turns on interworking, no reflection in package arch
>>
>> (side note, oe-core/distroless and meta-yocto/poky don't set TARGET_FPU
>> for armv7a and will generate slow code, angstrom does turn it on)
>
>
> oe-core tune-cortexa8.inc doesn't make use of these variables (unlike
> meta-texasinstruments) and does make use of the neon coprocessor, but
> still uses the softfp float ABI:
>
> TARGET_CC_ARCH = "-march=armv7-a -mtune=cortex-a8 -mfpu=neon
> -mfloat-abi=softfp -fno-tree-vectorize"
What's with the -fno-tree-vectorize? I had someone point out to me that
the TI wiki recommends turning vectorization on, even outside of -O3
(which enables it by default).
--
Tom Rini
Mentor Graphics Corporation
* Re: Tune files and knobs to turn
2011-06-30 16:02 ` Tom Rini
@ 2011-06-30 17:41 ` Koen Kooi
2011-06-30 17:58 ` Phil Blundell
0 siblings, 1 reply; 16+ messages in thread
From: Koen Kooi @ 2011-06-30 17:41 UTC (permalink / raw)
To: Patches and discussions about the oe-core layer
On 30 Jun 2011, at 18:02, Tom Rini wrote:
> On 06/28/2011 10:36 AM, Darren Hart wrote:
>>
>>
>> On 06/24/2011 04:54 AM, Koen Kooi wrote:
>>> Hi,
>>>
>>> We discussed tune files a bit during last night's TSC meeting and Khem had
>>> expressed the need before, so I'd like to get this discussion started by using
>>> armv7a as an example.
>>>
>>> For armv7a capable cores we have the following hardware features:
>>>
>>> * armv7a instruction set
>>> * thumb1 instruction set
>>> * thumb2 instruction set
>>> * VFP coprocessor
>>> * optional NEON coprocessor
>>>
>>> For the ABI we can choose the following:
>>>
>>> * softfp without hw support (e.g. no VFP instructions emitted, slow)
>>> * softfp with hw support (e.g. VFP and/or NEON instructions emitted, fast)
>>> * hardfp, emits VFP and/or NEON instructions, slightly faster than softfp/hw,
>>> incompatible with everything else
>>>
>>> And the extra knobs:
>>>
>>> * pure thumb1, no arm instructions (limited use)
>>> * thumb1/arm interworking
>>> * pure thumb2, no arm instructions
>>> * thumb2 interworking (not sure if that's actually useful, thumb2 has complete coverage)
>>>
>>> In OE .dev we have the following vars:
>>>
>>> TARGET_FPU: switches between hw float and sw float, no reflection in package arch
>>> ARM_FP_ABI: switches between softfp and hardfp, will create 'armv7a' or
>>> 'armv7a-hardfp' as package arch
>>> ARM_INSTRUCTION_SET: switches between arm and thumb1, no reflection in package arch
>>> THUMB_INTERWORK: turns on interworking, no reflection in package arch
>>>
>>> (side note, oe-core/distroless and meta-yocto/poky don't set TARGET_FPU
>>> for armv7a and will generate slow code, angstrom does turn it on)
>>
>>
>> oe-core tune-cortexa8.inc doesn't make use of these variables (unlike
>> meta-texasinstruments) and does make use of the neon coprocessor, but
>> still uses the softfp float ABI:
>>
>> TARGET_CC_ARCH = "-march=armv7-a -mtune=cortex-a8 -mfpu=neon
>> -mfloat-abi=softfp -fno-tree-vectorize"
>
> What's with the -fno-tree-vectorize? I had someone point out to me that
> the TI wiki recommends turning vectorization on, even outside of -O3
> (which enables it by default).
Real-world experience with gcc 4.3 and 4.5 has shown that gcc is shockingly bad at vectorizing for NEON, so you need this unbreak-me option to avoid slowdowns. It might be less bad on Cortex-A9 or A15, but on the A8, not vectorizing is a net win.
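
For anyone who wants to measure this on their own hardware, a loop of the following shape is the kind of code the auto-vectorizer targets. This is only a sketch: the function name saxpy is an illustration and not something from oe-core, and whether vectorizing it is a win or a loss will depend on the gcc version and the core.

```c
#include <stddef.h>

/* y[i] = a * x[i] + y[i]; 'restrict' tells the compiler that x and y
 * do not overlap, which the vectorizer needs to know. */
void saxpy(float *restrict y, const float *restrict x, float a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

Building it twice with the same -march/-mtune/-mfpu/-mfloat-abi flags, once with -ftree-vectorize and once with -fno-tree-vectorize, and timing both on the target gives a direct comparison.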