* Re: [PATCH] sbc: powerpc altivec optimizations for 4 subbands encoding
2009-03-16 19:59 [PATCH] sbc: powerpc altivec optimizations for 4 subbands encoding Siarhei Siamashka
@ 2009-03-14 6:17 ` Marcel Holtmann
2009-03-23 6:51 ` Siarhei Siamashka
0 siblings, 1 reply; 5+ messages in thread
From: Marcel Holtmann @ 2009-03-14 6:17 UTC (permalink / raw)
To: Siarhei Siamashka; +Cc: linux-bluetooth@vger.kernel.org
Hi Siarhei,
> On the last weekend I tried to get familiar with powerpc altivec assembly and
> added some optimization for sbc encoder. Experimental patch is attached. It
> handles 4 subbands case only, so is not that much useful in practice. There
> are no problems supporting 8 subbands too, but I was just running out of
> time. The patch merges processing of 4 blocks into the single block of code.
> It's something that is also in my todo list for ARM NEON. But while this merge
> is mostly "nice to have" optimization for ARM, it is much more important for
> PowerPC because of a huge multiply-accumulate latency.
>
> And bluez a2dp seems to work fine on ppc64 linux (playstation3).
>
> In order to activate altivec code, -maltivec option needs to be added to
> gcc compilation flags.
>
> Benchmark result:
>
> time ./sbcenc -s4 somefile.au > /dev/null
>
> before:
> real 0m13.999s
> user 0m13.468s
> sys 0m0.523s
>
> after:
> real 0m5.714s
> user 0m5.199s
> sys 0m0.519s
>
> 3.2GHz CPU in playstation3 uses roughly 1.5% of cpu resources on sbc encoding
> without any optimizations. cpu usage is down to something like 0.6% after this
> optimization is applied.
please redo the patch and include a proper commit message. For example
the details from the email would be perfect for a commit message. It
doesn't need to be that verbose, but a little bit more would be nice.
Regards
Marcel
* [PATCH] sbc: powerpc altivec optimizations for 4 subbands encoding
@ 2009-03-16 19:59 Siarhei Siamashka
2009-03-14 6:17 ` Marcel Holtmann
0 siblings, 1 reply; 5+ messages in thread
From: Siarhei Siamashka @ 2009-03-16 19:59 UTC (permalink / raw)
To: linux-bluetooth@vger.kernel.org
[-- Attachment #1: Type: text/plain, Size: 1146 bytes --]
Hello,
Last weekend I tried to get familiar with PowerPC AltiVec assembly and added
some optimizations to the SBC encoder. An experimental patch is attached. It
handles only the 4 subbands case, so it is not that useful in practice. There
are no obstacles to supporting 8 subbands too; I was just running out of
time. The patch merges the processing of 4 blocks into a single block of code.
That is also on my todo list for ARM NEON. But while this merge is mostly a
"nice to have" optimization for ARM, it is much more important for PowerPC
because of its large multiply-accumulate latency.
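To make the latency remark a bit more concrete: the inner loop of the patch is
built on the AltiVec instruction vmsumshm, which multiplies pairs of signed
16-bit elements and adds both products to a 32-bit accumulator lane. Below is
a rough scalar C model of one such operation. The name vmsumshm_ref is made up
for illustration and is not part of the patch.

```c
#include <stdint.h>

/* Scalar model of AltiVec vmsumshm (illustrative helper, not in the patch):
 * each 32-bit lane i receives a[2i]*b[2i] + a[2i+1]*b[2i+1] + c[i],
 * with wrap-around (modulo 2^32) arithmetic. */
void vmsumshm_ref(int32_t d[4], const int16_t a[8],
		  const int16_t b[8], const int32_t c[4])
{
	int i;

	for (i = 0; i < 4; i++) {
		/* unsigned arithmetic gives the modulo behaviour without
		 * signed overflow */
		uint32_t sum = (uint32_t) c[i];

		sum += (uint32_t) ((int32_t) a[2 * i] * b[2 * i]);
		sum += (uint32_t) ((int32_t) a[2 * i + 1] * b[2 * i + 1]);
		d[i] = (int32_t) sum;
	}
}
```

Each result feeds the next accumulation into the same register, so a single
chain of these operations has to wait out the full latency every time. The
merged code keeps four independent accumulators (v0, v10, v14 and v17) in
flight, which gives the pipeline other work while one operation completes.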
And bluez a2dp seems to work fine on ppc64 linux (playstation3).
In order to activate altivec code, -maltivec option needs to be added to
gcc compilation flags.
Benchmark result:
time ./sbcenc -s4 somefile.au > /dev/null
before:
real 0m13.999s
user 0m13.468s
sys 0m0.523s
after:
real 0m5.714s
user 0m5.199s
sys 0m0.519s
3.2GHz CPU in playstation3 uses roughly 1.5% of cpu resources on sbc encoding
without any optimizations. cpu usage is down to something like 0.6% after this
optimization is applied.
--
Best regards,
Siarhei Siamashka
[-- Attachment #2: 0004-sbc-powerpc-altivec-optimizations-for-4-subbands-en.patch --]
[-- Type: text/x-diff, Size: 10954 bytes --]
From a995acc428e2c02306ca69efa85d7f6e15529245 Mon Sep 17 00:00:00 2001
From: Siarhei Siamashka <siarhei.siamashka@nokia.com>
Date: Mon, 16 Mar 2009 03:38:52 +0200
Subject: [PATCH] sbc: powerpc altivec optimizations for 4 subbands encoding
---
sbc/Makefile.am | 3 +-
sbc/sbc_primitives.c | 6 +
sbc/sbc_primitives_altivec.c | 207 ++++++++++++++++++++++++++++++++++++++++++
sbc/sbc_primitives_altivec.h | 40 ++++++++
4 files changed, 255 insertions(+), 1 deletions(-)
create mode 100644 sbc/sbc_primitives_altivec.c
create mode 100644 sbc/sbc_primitives_altivec.h
diff --git a/sbc/Makefile.am b/sbc/Makefile.am
index f870164..75cc29b 100644
--- a/sbc/Makefile.am
+++ b/sbc/Makefile.am
@@ -10,7 +10,8 @@ noinst_LTLIBRARIES = libsbc.la
libsbc_la_SOURCES = sbc.h sbc.c sbc_math.h sbc_tables.h \
sbc_primitives.h sbc_primitives_mmx.h sbc_primitives_neon.h \
- sbc_primitives.c sbc_primitives_mmx.c sbc_primitives_neon.c
+ sbc_primitives.c sbc_primitives_mmx.c sbc_primitives_neon.c \
+ sbc_primitives_altivec.h sbc_primitives_altivec.c
libsbc_la_CFLAGS = -finline-functions -fgcse-after-reload \
-funswitch-loops -funroll-loops
diff --git a/sbc/sbc_primitives.c b/sbc/sbc_primitives.c
index 2105280..209e2c3 100644
--- a/sbc/sbc_primitives.c
+++ b/sbc/sbc_primitives.c
@@ -33,6 +33,7 @@
#include "sbc_primitives.h"
#include "sbc_primitives_mmx.h"
#include "sbc_primitives_neon.h"
+#include "sbc_primitives_altivec.h"
/*
* A reference C code of analysis filter with SIMD-friendly tables
@@ -467,4 +468,9 @@ void sbc_init_primitives(struct sbc_encoder_state *state)
#ifdef SBC_BUILD_WITH_NEON_SUPPORT
sbc_init_primitives_neon(state);
#endif
+
+ /* PPC Altivec optimizations */
+#ifdef SBC_BUILD_WITH_ALTIVEC_SUPPORT
+ sbc_init_primitives_altivec(state);
+#endif
}
diff --git a/sbc/sbc_primitives_altivec.c b/sbc/sbc_primitives_altivec.c
new file mode 100644
index 0000000..537cd8a
--- /dev/null
+++ b/sbc/sbc_primitives_altivec.c
@@ -0,0 +1,207 @@
+/*
+ *
+ * Bluetooth low-complexity, subband codec (SBC) library
+ *
+ * Copyright (C) 2004-2009 Marcel Holtmann <marcel@holtmann.org>
+ * Copyright (C) 2004-2005 Henryk Ploetz <henryk@ploetzli.ch>
+ * Copyright (C) 2005-2006 Brad Midgley <bmidgley@xmission.com>
+ *
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ */
+
+#include <stdint.h>
+#include <limits.h>
+#include "sbc.h"
+#include "sbc_math.h"
+#include "sbc_tables.h"
+
+#include "sbc_primitives_altivec.h"
+
+#include <stdio.h>
+
+/*
+ * PPC Altivec optimizations
+ */
+
+#ifdef SBC_BUILD_WITH_ALTIVEC_SUPPORT
+
+/* Because of strict 16-byte alignment requirements for altivec, we need
+ * to add some zero padding to the beginning and end of the first part
+ * of the odd case coefficients table.
+ */
+static const FIXED_T SBC_ALIGNED analysis_consts_fixed4_altivec_odd[48 + 16] = {
+#define C0 1.3056875580
+#define C1 1.6772280856
+#define C2 1.0932568993
+#define C3 1.3056875580
+
+#define F(x) F_PROTO4(x)
+ 0, 0,
+ 0, 0,
+ F(2.73370904E-03 * C0), F(5.36548976E-04 * C0),
+ -F(1.49188357E-03 * C1), F(0.00000000E+00 * C1),
+ F(3.83720193E-03 * C2), F(1.09137620E-02 * C2),
+ F(3.89205149E-03 * C3), F(3.06012286E-03 * C3),
+ F(3.21939290E-02 * C0), F(2.04385087E-02 * C0),
+ -F(2.88757392E-02 * C1), F(0.00000000E+00 * C1),
+ F(2.58767811E-02 * C2), F(1.35593274E-01 * C2),
+ F(6.13245186E-03 * C3), F(7.76463494E-02 * C3),
+ F(2.81828203E-01 * C0), F(1.94987841E-01 * C0),
+ -F(2.46636662E-01 * C1), F(0.00000000E+00 * C1),
+ F(2.94315332E-01 * C2), -F(1.35593274E-01 * C2),
+ F(2.81828203E-01 * C3), -F(1.94987841E-01 * C3),
+ F(6.13245186E-03 * C0), -F(7.76463494E-02 * C0),
+ F(2.88217274E-02 * C1), F(0.00000000E+00 * C1),
+ F(2.58767811E-02 * C2), -F(1.09137620E-02 * C2),
+ F(3.21939290E-02 * C3), -F(2.04385087E-02 * C3),
+ F(3.89205149E-03 * C0), -F(3.06012286E-03 * C0),
+ -F(1.86581691E-03 * C1), F(0.00000000E+00 * C1),
+ F(3.83720193E-03 * C2), F(0.00000000E+00 * C2),
+ F(2.73370904E-03 * C3), -F(5.36548976E-04 * C3),
+ 0, 0,
+ 0, 0,
+#undef F
+#define F(x) F_COS4(x)
+ /* swap halves */
+ F(0.7071067812 / C2), F(0.3826834324 / C3),
+ -F(0.7071067812 / C2), -F(0.9238795325 / C3),
+ -F(0.7071067812 / C2), F(0.9238795325 / C3),
+ F(0.7071067812 / C2), -F(0.3826834324 / C3),
+ F(0.9238795325 / C0), -F(1.0000000000 / C1),
+ F(0.3826834324 / C0), -F(1.0000000000 / C1),
+ -F(0.3826834324 / C0), -F(1.0000000000 / C1),
+ -F(0.9238795325 / C0), -F(1.0000000000 / C1),
+#undef F
+
+#undef C0
+#undef C1
+#undef C2
+#undef C3
+};
+
+static void sbc_analyze_4b_4s_altivec(int16_t *x, int32_t *out, int out_stride)
+{
+ static const SBC_ALIGNED int32_t round_c[4] = {
+ 1 << (SBC_PROTO_FIXED4_SCALE - 1),
+ 1 << (SBC_PROTO_FIXED4_SCALE - 1),
+ 1 << (SBC_PROTO_FIXED4_SCALE - 1),
+ 1 << (SBC_PROTO_FIXED4_SCALE - 1),
+ };
+ static const SBC_ALIGNED int8_t perm_c1[16] = {
+ 0, 1, 4, 5, 0, 1, 4, 5, 0, 1, 4, 5, 0, 1, 4, 5,
+ };
+ static const SBC_ALIGNED int8_t perm_c2[16] = {
+ 8, 9, 12, 13, 8, 9, 12, 13, 8, 9, 12, 13, 8, 9, 12, 13,
+ };
+ const int16_t *const_e = analysis_consts_fixed4_simd_even;
+ const int16_t *const_o = analysis_consts_fixed4_altivec_odd;
+ asm volatile (
+ "lvx %%v17, 0, %[round_c]\n"
+
+ "lvx %%v1, 0, %[in]\n"
+ "addi %[in], %[in], 16\n"
+
+ "lvx %%v2, 0, %[consts_e]\n"
+ "addi %[consts_e], %[consts_e], 16\n"
+ "lvx %%v12, 0, %[consts_o]\n"
+ "addi %[consts_o], %[consts_o], 16\n"
+ "vmsumshm %%v0, %%v1, %%v2, %%v17\n"
+ "vmsumshm %%v10, %%v1, %%v12, %%v17\n"
+ "lvx %%v1, 0, %[in]\n"
+ "addi %[in], %[in], 16\n"
+ "vmsumshm %%v14, %%v1, %%v2, %%v17\n"
+ "vmsumshm %%v17, %%v1, %%v12, %%v17\n"
+
+ ".rept 4\n"
+ "lvx %%v2, 0, %[consts_e]\n"
+ "addi %[consts_e], %[consts_e], 16\n"
+ "lvx %%v12, 0, %[consts_o]\n"
+ "addi %[consts_o], %[consts_o], 16\n"
+ "vmsumshm %%v0, %%v1, %%v2, %%v0\n"
+ "vmsumshm %%v10, %%v1, %%v12, %%v10\n"
+ "lvx %%v1, 0, %[in]\n"
+ "addi %[in], %[in], 16\n"
+ "vmsumshm %%v14, %%v1, %%v2, %%v14\n"
+ "vmsumshm %%v17, %%v1, %%v12, %%v17\n"
+ ".endr\n"
+
+ "lvx %%v12, 0, %[consts_o]\n"
+ "addi %[consts_o], %[consts_o], 16\n"
+ "lvx %%v3, 0, %[in]\n"
+ "vmsumshm %%v10, %%v1, %%v12, %%v10\n"
+ "vmsumshm %%v17, %%v3, %%v12, %%v17\n"
+
+ "lvx %%v18, 0, %[perm_c1]\n"
+ "lvx %%v19, 0, %[perm_c2]\n"
+ "vperm %%v1, %%v0, %%v0, %%v18\n"
+ "vperm %%v2, %%v0, %%v0, %%v19\n"
+ "vperm %%v15, %%v14, %%v14, %%v18\n"
+ "vperm %%v16, %%v14, %%v14, %%v19\n"
+ "vperm %%v11, %%v10, %%v10, %%v18\n"
+ "vperm %%v12, %%v10, %%v10, %%v19\n"
+ "vperm %%v18, %%v17, %%v17, %%v18\n"
+ "vperm %%v19, %%v17, %%v17, %%v19\n"
+
+ "vspltisw %%v0, 0\n"
+
+ "lvx %%v13, 0, %[consts_o]\n"
+ "addi %[consts_o], %[consts_o], 16\n"
+ "lvx %%v3, 0, %[consts_e]\n"
+ "addi %[consts_e], %[consts_e], 16\n"
+ "vmsumshm %%v17, %%v13, %%v18, %%v0\n"
+ "vmsumshm %%v14, %%v3, %%v15, %%v0\n"
+ "vmsumshm %%v10, %%v13, %%v11, %%v0\n"
+ "vmsumshm %%v0, %%v3, %%v1, %%v0\n"
+
+ "lvx %%v13, 0, %[consts_o]\n"
+ "lvx %%v3, 0, %[consts_e]\n"
+ "vmsumshm %%v17, %%v13, %%v19, %%v17\n"
+ "vmsumshm %%v14, %%v3, %%v16, %%v14\n"
+ "vmsumshm %%v10, %%v13, %%v12, %%v10\n"
+ "vmsumshm %%v0, %%v3, %%v2, %%v0\n"
+
+ "add %[consts_e], %[out], %[out_stride]\n"
+ "add %[consts_e], %[consts_e], %[out_stride]\n"
+ "stvx %%v17, 0, %[out]\n"
+ "stvx %%v14, %[out_stride], %[out]\n"
+ "stvx %%v10, 0, %[consts_e]\n"
+ "stvx %%v0, %[out_stride], %[consts_e]\n"
+
+ :
+ [in] "+b" (x),
+ [consts_e] "+b" (const_e),
+ [consts_o] "+b" (const_o),
+ [out] "+b" (out)
+ :
+ [round_c] "b" (round_c),
+ [out_stride] "b" (out_stride * 4),
+ [perm_c1] "b" (perm_c1),
+ [perm_c2] "b" (perm_c2)
+ :
+ "memory", "v0", "v1", "v2", "v3", "v10", "v11", "v12",
+ "v13", "v14", "v15", "v16", "v17", "v18", "v19");
+}
+
+void sbc_init_primitives_altivec(struct sbc_encoder_state *state)
+{
+ if (SBC_PROTO_FIXED4_SCALE == 16 && SBC_PROTO_FIXED8_SCALE == 16) {
+ state->sbc_analyze_4b_4s = sbc_analyze_4b_4s_altivec;
+ state->implementation_info = "Altivec";
+ }
+}
+
+#endif
diff --git a/sbc/sbc_primitives_altivec.h b/sbc/sbc_primitives_altivec.h
new file mode 100644
index 0000000..8b87c8e
--- /dev/null
+++ b/sbc/sbc_primitives_altivec.h
@@ -0,0 +1,40 @@
+/*
+ *
+ * Bluetooth low-complexity, subband codec (SBC) library
+ *
+ * Copyright (C) 2004-2009 Marcel Holtmann <marcel@holtmann.org>
+ * Copyright (C) 2004-2005 Henryk Ploetz <henryk@ploetzli.ch>
+ * Copyright (C) 2005-2006 Brad Midgley <bmidgley@xmission.com>
+ *
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ */
+
+#ifndef __SBC_PRIMITIVES_ALTIVEC_H
+#define __SBC_PRIMITIVES_ALTIVEC_H
+
+#include "sbc_primitives.h"
+
+#if defined(__GNUC__) && defined(__ALTIVEC__) && \
+ !defined(SBC_HIGH_PRECISION) && (SCALE_OUT_BITS == 15)
+
+#define SBC_BUILD_WITH_ALTIVEC_SUPPORT
+
+void sbc_init_primitives_altivec(struct sbc_encoder_state *encoder_state);
+
+#endif
+
+#endif
--
1.5.6.5
* Re: [PATCH] sbc: powerpc altivec optimizations for 4 subbands encoding
2009-03-14 6:17 ` Marcel Holtmann
@ 2009-03-23 6:51 ` Siarhei Siamashka
2009-03-23 7:20 ` Siarhei Siamashka
2009-03-25 18:15 ` Marcel Holtmann
0 siblings, 2 replies; 5+ messages in thread
From: Siarhei Siamashka @ 2009-03-23 6:51 UTC (permalink / raw)
To: ext Marcel Holtmann; +Cc: linux-bluetooth@vger.kernel.org
On Saturday 14 March 2009 08:17:13 ext Marcel Holtmann wrote:
> Hi Siarhei,
>
> > On the last weekend I tried to get familiar with powerpc altivec assembly
> > and added some optimization for sbc encoder. Experimental patch is
> > attached. It handles 4 subbands case only, so is not that much useful in
> > practice. There are no problems supporting 8 subbands too, but I was just
> > running out of time. The patch merges processing of 4 blocks into the
> > single block of code. It's something that is also in my todo list for ARM
> > NEON. But while this merge is mostly "nice to have" optimization for ARM,
> > it is much more important for PowerPC because of a huge
> > multiply-accumulate latency.
> >
> > And bluez a2dp seems to work fine on ppc64 linux (playstation3).
> >
> > In order to activate altivec code, -maltivec option needs to be added to
> > gcc compilation flags.
> >
> > Benchmark result:
> >
> > time ./sbcenc -s4 somefile.au > /dev/null
> >
> > before:
> > real 0m13.999s
> > user 0m13.468s
> > sys 0m0.523s
> >
> > after:
> > real 0m5.714s
> > user 0m5.199s
> > sys 0m0.519s
> >
> > 3.2GHz CPU in playstation3 uses roughly 1.5% of cpu resources on sbc
> > encoding without any optimizations. cpu usage is down to something like
> > 0.6% after this optimization is applied.
>
> please redo the patch and include a proper commit message. For example
> the details from the email would be perfect for a commit message. It
> doesn't need to be that verbose, but a little bit more would be nice.
That patch was more of a preview targeted at people interested in
PowerPC optimizations (by the way, are there any low-end or embedded
PowerPC systems which could benefit the most from these in practice?).
For me it was more of a test of whether the code works correctly on a more
exotic platform such as a big-endian 64-bit system :) It was also an exercise
in PowerPC assembly and a check of whether the bluez sbc code can be easily
adapted to different SIMD architectures.
For it to be ready to be applied, the following still needs to be done in my
opinion:
1. Add '/proc/self/auxv' based altivec instructions support detection at
runtime, this should work for all linux systems.
way the same binary will be usable on which are conservative about the debian
2. Add 8 subbands support, this is what is actually used for A2DP most of the
time
Additionally, I wonder about the copies of the coefficient table. For
PowerPC, some zero padding needs to be added. For ARM NEON, the
second part of the coefficient table can be reordered to make better use
of the "vertical" SIMD instructions that it supports. For ARMv6, the second
part of the table can also be tweaked to exploit the fact that some
coefficients are the same and so reduce the number of operations (it can only
do 2 multiply-accumulate operations at once, so the straight "brute force"
approach that works fine for the other SIMD extensions is not the fastest
here). As an alternative to having copy-pasted and slightly modified tables in
the sources, the reordering of coefficients can be done at runtime (and this
reordering code would also make it easier to see what kind of transformation
was applied).
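As a rough illustration of the runtime alternative for the PowerPC case, the
zero padding could be produced by a small helper like the one below. This is
only a sketch: pad_coeff_table and its parameters are hypothetical names, and
it copies the second part of the table unchanged, whereas the table in the
patch also reorders that part ("swap halves").

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: build a padded copy of a coefficient table.
 * The first n_first entries of src are surrounded by `pad` zero entries
 * on each side; the remaining n_second entries are appended unchanged.
 * dst must have room for n_first + 2 * pad + n_second entries. */
void pad_coeff_table(int16_t *dst, const int16_t *src,
		     size_t n_first, size_t n_second, size_t pad)
{
	/* leading zeros */
	memset(dst, 0, pad * sizeof(*dst));
	/* first part of the table */
	memcpy(dst + pad, src, n_first * sizeof(*dst));
	/* trailing zeros */
	memset(dst + pad + n_first, 0, pad * sizeof(*dst));
	/* second part of the table, copied as is */
	memcpy(dst + 2 * pad + n_first, src + n_first,
	       n_second * sizeof(*dst));
}
```

For the 4 subbands table in the patch this would correspond to n_first = 40,
n_second = 16 and pad = 4, giving the 48 + 16 entries of the static table.
The destination would still need 16-byte alignment for lvx.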
--
Best regards,
Siarhei Siamashka
* Re: [PATCH] sbc: powerpc altivec optimizations for 4 subbands encoding
2009-03-23 6:51 ` Siarhei Siamashka
@ 2009-03-23 7:20 ` Siarhei Siamashka
2009-03-25 18:15 ` Marcel Holtmann
1 sibling, 0 replies; 5+ messages in thread
From: Siarhei Siamashka @ 2009-03-23 7:20 UTC (permalink / raw)
To: ext Marcel Holtmann; +Cc: linux-bluetooth@vger.kernel.org
On Monday 23 March 2009 08:51:53 Siamashka Siarhei (Nokia-D/Helsinki) wrote:
[...]
> For it to be ready to be applied, the following still needs to be done in my
> opinion:
> 1. Add '/proc/self/auxv' based altivec instructions support detection at
> runtime, this should work for all linux systems.
> way the same binary will be usable on which are conservative about the
> debian
Sorry for this unfinished gibberish that slipped in.
Translating into "human" language: most distributions, like Debian, use the
lowest common denominator target CPU options for binary packages, so runtime
AltiVec detection will make the sbc AltiVec optimizations work fine there
without any extra hassle :)
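A minimal sketch of what such a check might look like, assuming the usual
Linux auxiliary vector layout (pairs of native-word type/value entries,
terminated by AT_NULL). The function names are made up for this example. The
constants are defined locally with the values I believe the kernel headers
use (AT_HWCAP is 16 and PPC_FEATURE_HAS_ALTIVEC is 0x10000000); real code
should take them from the system headers instead.

```c
#include <stdio.h>

#define AUXV_AT_NULL		0
#define AUXV_AT_HWCAP		16		/* assumed value of AT_HWCAP */
#define HWCAP_PPC_ALTIVEC	0x10000000UL	/* assumed PPC_FEATURE_HAS_ALTIVEC */

struct auxv_pair {
	unsigned long type;
	unsigned long value;
};

/* Scan an auxv stream for an entry of the given type.
 * Returns 1 and stores the value on success, 0 if it is not present. */
int auxv_lookup(FILE *f, unsigned long type, unsigned long *value)
{
	struct auxv_pair p;

	while (fread(&p, sizeof(p), 1, f) == 1) {
		if (p.type == AUXV_AT_NULL)
			break;
		if (p.type == type) {
			*value = p.value;
			return 1;
		}
	}
	return 0;
}

/* Returns 1 if the kernel reports AltiVec support, 0 otherwise
 * (including on non-PowerPC builds, where the bit means something else). */
int have_altivec(void)
{
#if defined(__powerpc__) || defined(__powerpc64__)
	unsigned long hwcap = 0;
	int found;
	FILE *f = fopen("/proc/self/auxv", "rb");

	if (!f)
		return 0;
	found = auxv_lookup(f, AUXV_AT_HWCAP, &hwcap);
	fclose(f);
	return found && (hwcap & HWCAP_PPC_ALTIVEC) != 0;
#else
	return 0;
#endif
}
```

With something like this, sbc_init_primitives_altivec() could be compiled in
unconditionally for PowerPC and simply do nothing when have_altivec() returns
0. The file with the AltiVec assembly would still need to be built with
-maltivec.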
--
Best regards,
Siarhei Siamashka
* Re: [PATCH] sbc: powerpc altivec optimizations for 4 subbands encoding
2009-03-23 6:51 ` Siarhei Siamashka
2009-03-23 7:20 ` Siarhei Siamashka
@ 2009-03-25 18:15 ` Marcel Holtmann
1 sibling, 0 replies; 5+ messages in thread
From: Marcel Holtmann @ 2009-03-25 18:15 UTC (permalink / raw)
To: Siarhei Siamashka; +Cc: linux-bluetooth@vger.kernel.org
Hi Siarhei,
> > > On the last weekend I tried to get familiar with powerpc altivec assembly
> > > and added some optimization for sbc encoder. Experimental patch is
> > > attached. It handles 4 subbands case only, so is not that much useful in
> > > practice. There are no problems supporting 8 subbands too, but I was just
> > > running out of time. The patch merges processing of 4 blocks into the
> > > single block of code. It's something that is also in my todo list for ARM
> > > NEON. But while this merge is mostly "nice to have" optimization for ARM,
> > > it is much more important for PowerPC because of a huge
> > > multiply-accumulate latency.
> > >
> > > And bluez a2dp seems to work fine on ppc64 linux (playstation3).
> > >
> > > In order to activate altivec code, -maltivec option needs to be added to
> > > gcc compilation flags.
> > >
> > > Benchmark result:
> > >
> > > time ./sbcenc -s4 somefile.au > /dev/null
> > >
> > > before:
> > > real 0m13.999s
> > > user 0m13.468s
> > > sys 0m0.523s
> > >
> > > after:
> > > real 0m5.714s
> > > user 0m5.199s
> > > sys 0m0.519s
> > >
> > > 3.2GHz CPU in playstation3 uses roughly 1.5% of cpu resources on sbc
> > > encoding without any optimizations. cpu usage is down to something like
> > > 0.6% after this optimization is applied.
> >
> > please redo the patch and include a proper commit message. For example
> > the details from the email would be perfect for a commit message. It
> > doesn't need to be that verbose, but a little bit more would be nice.
>
> That patch was more like a preview targeted at the people interested in
> powerpc optimizations (by the way, are there any low end or embedded
> powerpc systems which could benefit the most from these in practice?).
> For me it was more like a test if the code works correctly on more exotic
> platform like big endian 64-bit system :) And also an exercise in powerpc
> assembly and a check if the bluez sbc code can be easily accommodated
> to different SIMD architectures.
Yes, PowerPC is important for embedded devices. At some point there were
even talks about using PowerPC for OLPC machines.
> For it to be ready to be applied, the following still needs to be done in my
> opinion:
> 1. Add '/proc/self/auxv' based altivec instructions support detection at
> runtime, this should work for all linux systems.
> way the same binary will be usable on which are conservative about the debian
> 2. Add 8 subbands support, this is what is actually used for A2DP most of the
> time
I think that having runtime detection would be nice. However, it is not the
most important part. The only thing to keep in mind is that we should do the
runtime check only once, on program load, and not every time we initialize a
new SBC encoder.
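Something along these lines is what I have in mind. It is just a sketch with
made-up names: probe_altivec() stands in for whatever detection ends up being
used, and the call counter is only there to make the caching visible.

```c
/* Hypothetical probe; a real one would do the actual hardware detection.
 * It counts its calls so that the caching below can be observed. */
int probe_calls;

int probe_altivec(void)
{
	probe_calls++;
	return 0;
}

/* Returns the cached detection result, running the probe only on the
 * first call. Not thread-safe as written. */
int sbc_have_altivec(void)
{
	static int cached = -1;	/* -1: not probed yet */

	if (cached < 0)
		cached = probe_altivec();
	return cached;
}
```

sbc_init_primitives() would then call sbc_have_altivec() and pay for the
probe only the first time. If the first call can happen from several threads
at once, the check could be moved into a function marked with GCC's
__attribute__((constructor)) so that it really runs at load time.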
> Additionally, I wonder about the copy of the table with coefficients. For
> powerpc, some zero padding needs to be added. For ARM NEON, the
> second part of the coefficients table can be reordered to make better use
> of "vertical" simd instructions that it supports. For ARMv6, the second part
> of the table can be also tweaked to exploit the fact that some coefficients
> are the same and reduce the number of operations (it only can do 2
> multiply-accumulate operations at once, so the straight "brute force"
> which works fine for the other SIMD extensions is not the fastest here).
> As an alternative to having copy-pasted and slightly modified tables in the
> sources, reordering of coefficients can be done at runtime (and this
> reordering code would also make it easier to see what kind of transformation
> was applied).
You are the expert here. I leave this up to you.
Regards
Marcel