Hello all,

The attached patch contains an optimization for the scale factor calculation,
which provides an additional SBC encoder speedup.

For non-gcc compilers, the CLZ function is implemented with very simple,
straightforward and slow code (but it is still faster than the current git
code, even when it is used instead of __builtin_clz). Something better could
be done, like:
http://groups.google.com/group/comp.sys.arm/msg/5ae56e3a95a2345e?hl=en
But I'm not sure about the license/copyright of the code at this link and
decided not to touch it. Anyway, I don't think that the gcc implementation of
__builtin_clz for CPU cores which do not support the CLZ instruction is any
worse.
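
To illustrate the kind of fallback meant above, here is a minimal sketch. It
is not the code from the patch: the names clz32 and clz32_portable are made up
for this example, and returning 32 for a zero input is my own assumption
(__builtin_clz itself is undefined for 0).

```c
#include <stdint.h>

/* Hypothetical portable fallback: shift left until the top bit is set,
 * counting the shifts. Simple and slow, at most 31 iterations. */
static inline int clz32_portable(uint32_t x)
{
	int n = 0;

	if (x == 0)
		return 32;	/* assumption: define clz(0) as 32 */

	while (!(x & 0x80000000u)) {
		x <<= 1;
		n++;
	}
	return n;
}

/* Hypothetical wrapper: use the gcc builtin when it is available. */
static inline int clz32(uint32_t x)
{
#if defined(__GNUC__)
	/* __builtin_clz(0) is undefined, so zero is handled explicitly */
	return x ? __builtin_clz(x) : 32;
#else
	return clz32_portable(x);
#endif
}
```

With gcc on a core that has a CLZ instruction, the wrapper should compile down
to that instruction plus the zero check; elsewhere the loop is used.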

Joint stereo processing also involves recalculation of scale factors, which
can use a similar optimization or even exactly the same function.
I intentionally did not benchmark encoding with joint stereo yet as it would
spoil the nice numbers :) That's something to improve next.
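
For reference, a helper that both code paths could share might look roughly
like the sketch below. This is only an illustration under my own assumptions,
not the patch itself: the names, the one-subband-at-a-time interface and the
value of SKETCH_SCALE_OUT_BITS (15) are all hypothetical. The idea is to OR
together (|sample| - 1) for all blocks of a subband and derive the scale
factor from a single CLZ, instead of comparing each sample against a growing
limit in a loop. The loop variant is included only for comparison.

```c
#include <stdint.h>

#define SKETCH_SCALE_OUT_BITS 15	/* assumed fixed-point scale */

/* Hypothetical CLZ helper; x is never 0 where it is called below. */
static inline int sketch_clz(uint32_t x)
{
#if defined(__GNUC__)
	return __builtin_clz(x);
#else
	int n = 0;
	while (!(x & 0x80000000u)) {
		x <<= 1;
		n++;
	}
	return n;
#endif
}

/* Absolute value as unsigned, well defined even for INT32_MIN. */
static inline uint32_t sketch_abs(int32_t v)
{
	return v < 0 ? 0u - (uint32_t) v : (uint32_t) v;
}

/* CLZ based variant: one OR per block, one CLZ per subband. */
static int sketch_scale_factor(const int32_t *samples, int blocks)
{
	/* Seeding with this bit guarantees x != 0 and a result >= 0. */
	uint32_t x = 1u << SKETCH_SCALE_OUT_BITS;
	int blk;

	for (blk = 0; blk < blocks; blk++) {
		uint32_t mag = sketch_abs(samples[blk]);
		if (mag != 0)
			x |= mag - 1;
	}
	return (31 - SKETCH_SCALE_OUT_BITS) - sketch_clz(x);
}

/* Loop based variant, for comparison: double the limit until it is
 * no longer smaller than the largest sample magnitude. */
static int sketch_scale_factor_loop(const int32_t *samples, int blocks)
{
	uint32_t limit = 2u << SKETCH_SCALE_OUT_BITS;
	int sf = 0, blk;

	for (blk = 0; blk < blocks; blk++) {
		uint32_t mag = sketch_abs(samples[blk]);
		while (limit < mag) {
			sf++;
			limit <<= 1;
		}
	}
	return sf;
}
```

Subtracting 1 before the OR is what makes an exact power of two land in the
lower scale factor, matching the "limit < sample" comparison of the loop.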

Benchmark results (sbcenc with default settings):

====

ARM Cortex-A8:

before:
real    1m 4.84s
user    1m 1.05s
sys     0m 3.78s

after:
real    0m 58.93s
user    0m 55.15s
sys     0m 3.78s

Intel Core2:

before:
real    0m7.729s
user    0m7.268s
sys     0m0.376s

after:
real    0m6.473s
user    0m6.116s
sys     0m0.292s

====

Overall, CPU usage in the SBC encoder now looks more or less like this
(oprofile log from ARM Cortex-A8):

samples  %        image name               symbol name
2173     30.6791  sbcenc.neon_new          sbc_encode
1774     25.0459  sbcenc.neon_new          sbc_analyze_4b_8s_neon
1525     21.5304  sbcenc.neon_new          sbc_calculate_bits
916      12.9324  sbcenc.neon_new          sbc_calc_scalefactors
600       8.4710  sbcenc.neon_new          sbc_enc_process_input_8s_be
75        1.0589  libc-2.5.so              memcpy
13        0.1835  sbcenc.neon_new          main
4         0.0565  libc-2.5.so              write
2         0.0282  sbcenc.neon_new          .plt
1         0.0141  ld-2.5.so                _dl_relocate_object

 
Best regards,
Siarhei Siamashka