* [PATCH 0/4] Speedup vivi
From: Kirill Smelkov @ 2012-11-02 13:10 UTC
To: Mauro Carvalho Chehab; +Cc: Kirill Smelkov, Hans Verkuil, linux-media
Hello up there. I was trying to use vivi to generate multiple video streams for
my test-lab environment on an Atom system and noticed that it wastes a lot of CPU.
Please apply some optimization patches.
Thanks,
Kirill
Kirill Smelkov (4):
[media] vivi: Optimize gen_text()
[media] vivi: vivi_dev->line[] was not aligned
[media] vivi: Move computations out of vivi_fillbuf linecopy loop
[media] vivi: Optimize precalculate_line()
drivers/media/platform/vivi.c | 94 ++++++++++++++++++++++++++++++-------------
1 file changed, 65 insertions(+), 29 deletions(-)
--
1.8.0.316.g291341c
* [PATCH 1/4] [media] vivi: Optimize gen_text()
From: Kirill Smelkov @ 2012-11-02 13:10 UTC
To: Mauro Carvalho Chehab; +Cc: Kirill Smelkov, Hans Verkuil, linux-media
I've noticed that vivi takes a lot of CPU to produce its frames.
For example, with 8 devices and 8 simple programs running, where each
captures YUY2 640x480 and displays it to X via SDL, the profile looks as
follows:
# cmdline : /home/kirr/local/perf/bin/perf record -g -a sleep 20
# Samples: 82K of event 'cycles'
# Event count (approx.): 31551930117
#
# Overhead Command Shared Object Symbol
# ........ ............... ....................
#
49.48% vivi-* [vivi] [k] gen_twopix
10.79% vivi-* [kernel.kallsyms] [k] memcpy
10.02% rawv libc-2.13.so [.] __memcpy_ssse3
8.35% vivi-* [vivi] [k] gen_text.constprop.6
5.06% Xorg [unknown] [.] 0xa73015f8
2.32% rawv [vivi] [k] gen_twopix
1.22% rawv [vivi] [k] precalculate_line
1.20% vivi-* [vivi] [k] vivi_fillbuff
(rawv is the display program; vivi-* is the combination of vivi-000 through vivi-007)
so a lot of time is spent in gen_twopix(), which, as the following
call-graph profile shows...
49.48% vivi-* [vivi] [k] gen_twopix
|
--- gen_twopix
|
|--96.30%-- gen_text.constprop.6
| vivi_fillbuff
| vivi_thread
| kthread
| ret_from_kernel_thread
|
--3.70%-- vivi_fillbuff
vivi_thread
kthread
ret_from_kernel_thread
... is called mostly from gen_text().
Looking at the inner loop of gen_text(), we see
if (chr & (1 << (7 - i)))
gen_twopix(dev, pos + j * dev->pixelsize, WHITE, (x+y) & 1);
else
gen_twopix(dev, pos + j * dev->pixelsize, TEXT_BLACK, (x+y) & 1);
which calls gen_twopix() for every character pixel. That is very
expensive, because gen_twopix() branches several times.
Now, note that we operate on only two colors, WHITE and
TEXT_BLACK, and that the pixels for those colors could be precomputed and
gen_twopix() moved out of the inner loop. Also note that for black
and white, even/odd makes no difference for any of the supported
pixel formats, so we can stop playing the `odd` gen_twopix() parameter
game.
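A minimal sketch of the even/odd point, assuming a packed 4:2:2 layout such as YUYV, where a pixel pair is stored as Y0 U Y1 V, so an even pixel carries the U sample and an odd pixel carries V. struct yuv, put_yuyv_pixel() and parity_irrelevant() are hypothetical illustrations, not vivi's gen_twopix(), and the sample values are only illustrative. For achromatic colors U == V, so both parities produce the same bytes:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical Y'CbCr triple; not vivi's internal color table. */
struct yuv {
	uint8_t y, u, v;
};

/*
 * Write one pixel of a YUYV (Y0 U Y1 V) stream into its 2-byte slot:
 * an even pixel carries the U sample, an odd pixel carries the V sample.
 */
static void put_yuyv_pixel(uint8_t out[2], struct yuv c, int odd)
{
	assert(out != NULL);
	out[0] = c.y;
	out[1] = odd ? c.v : c.u;
}

/* Return 1 when the even and odd encodings of c are byte-identical. */
static int parity_irrelevant(struct yuv c)
{
	uint8_t even_px[2], odd_px[2];

	put_yuyv_pixel(even_px, c, 0);
	put_yuyv_pixel(odd_px, c, 1);
	return memcmp(even_px, odd_px, sizeof(even_px)) == 0;
}
```

For example, parity_irrelevant((struct yuv){ 235, 128, 128 }) returns 1 for a white-ish color, while a saturated color such as (struct yuv){ 81, 90, 240 } returns 0.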
So the first thing we do here is
1) move the gen_twopix() calls out of gen_text() into vivi_fillbuff(),
to pregenerate the black and white colors just before printing
starts.
Next, gen_text()'s font rendering loop, even with the gen_twopix()
calls moved out, was inefficient and branchy, so let's
2) rewrite the gen_text() loop so it uses fewer variables, unroll the
per-character horizontal rendering loop, and instantiate 3 code paths
for pixelsizes 2, 3 and 4, so that none of the inner loops has to
branch or go through indirections (*).
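Both steps can be illustrated outside the driver with a small sketch for the 16-bit case. It assumes the same font layout the patch indexes as font8x16[*s * 16 + line] (one byte per glyph scanline, bit 7 is the leftmost pixel); render_glyph_row_u16() and render_text_row_u16() are hypothetical names. fg and bg are computed once by the caller, and each glyph row becomes eight conditional selects with no calls inside the loop:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Render one 8-pixel scanline of a 1bpp glyph into a 16-bit-per-pixel
 * buffer. fg and bg are precomputed pixel values, so each pixel is a plain
 * conditional select (which the compiler can emit as cmov) instead of a
 * call into a branchy color generator. Bit 7 is the leftmost pixel.
 */
static uint16_t *render_glyph_row_u16(uint16_t *pos, uint8_t bits,
				      uint16_t fg, uint16_t bg)
{
	pos[0] = (bits & 0x80) ? fg : bg;
	pos[1] = (bits & 0x40) ? fg : bg;
	pos[2] = (bits & 0x20) ? fg : bg;
	pos[3] = (bits & 0x10) ? fg : bg;
	pos[4] = (bits & 0x08) ? fg : bg;
	pos[5] = (bits & 0x04) ? fg : bg;
	pos[6] = (bits & 0x02) ? fg : bg;
	pos[7] = (bits & 0x01) ? fg : bg;
	return pos + 8;		/* the next glyph starts right after this one */
}

/*
 * Render scanline 'row' of every character in text. The font holds
 * glyph_height bytes per character, indexed by the character code.
 */
static void render_text_row_u16(uint16_t *dst, const uint8_t *font,
				unsigned glyph_height, unsigned row,
				const char *text, uint16_t fg, uint16_t bg)
{
	const unsigned char *s;

	assert(row < glyph_height);
	/* unsigned char avoids negative font indices on signed-char targets */
	for (s = (const unsigned char *)text; *s; s++)
		dst = render_glyph_row_u16(dst, font[*s * glyph_height + row],
					   fg, bg);
}
```

The 24- and 32-bit variants would differ only in the pixel type, which is why the patch stamps them out with a macro.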
With all of the above reworks done, gen_text() gets nice, non-branchy,
streamlined code (showing the loop for pixelsize=2):
│ cmp $0x2,%eax
│ ↑ jne 26
│ mov -0x18(%ebp),%eax
│ mov -0x20(%ebp),%edi
│ imul -0x20(%ebp),%eax
│ movzwl 0x3ffc(%ebx),%esi
0,08 │ movzwl 0x4000(%ebx),%ecx
0,04 │ add %edi,%edi
│ mov 0x0,%ebx
0,51 │ mov %edi,-0x1c(%ebp)
│ mov %ebx,-0x14(%ebp)
│ movl $0x0,-0x10(%ebp)
│ lea 0x20(%edx,%eax,2),%eax
│ mov %eax,-0x18(%ebp)
│ xchg %ax,%ax
0,04 │ a0: mov 0x8(%ebp),%ebx
│ mov -0x18(%ebp),%eax
0,04 │ movzbl (%ebx),%edx
0,16 │ test %dl,%dl
0,04 │ ↓ je 128
0,08 │ lea 0x0(%esi),%esi
1,61 │ b0:┌─→shl $0x4,%edx
1,02 │ │ mov -0x14(%ebp),%edi
2,04 │ │ add -0x10(%ebp),%edx
2,24 │ │ lea 0x1(%ebx),%ebx
0,27 │ │ movzbl (%edi,%edx,1),%edx
9,92 │ │ mov %esi,%edi
0,39 │ │ test %dl,%dl
2,04 │ │ cmovns %ecx,%edi
4,63 │ │ test $0x40,%dl
0,55 │ │ mov %di,(%eax)
3,76 │ │ mov %esi,%edi
0,71 │ │ cmove %ecx,%edi
3,41 │ │ test $0x20,%dl
0,75 │ │ mov %di,0x2(%eax)
2,43 │ │ mov %esi,%edi
0,59 │ │ cmove %ecx,%edi
4,59 │ │ test $0x10,%dl
0,67 │ │ mov %di,0x4(%eax)
2,55 │ │ mov %esi,%edi
0,78 │ │ cmove %ecx,%edi
4,31 │ │ test $0x8,%dl
0,67 │ │ mov %di,0x6(%eax)
5,76 │ │ mov %esi,%edi
1,80 │ │ cmove %ecx,%edi
4,20 │ │ test $0x4,%dl
0,86 │ │ mov %di,0x8(%eax)
2,98 │ │ mov %esi,%edi
1,37 │ │ cmove %ecx,%edi
4,67 │ │ test $0x2,%dl
0,20 │ │ mov %di,0xa(%eax)
2,78 │ │ mov %esi,%edi
0,75 │ │ cmove %ecx,%edi
3,92 │ │ and $0x1,%edx
0,75 │ │ mov %esi,%edx
2,59 │ │ mov %di,0xc(%eax)
0,59 │ │ cmove %ecx,%edx
3,10 │ │ mov %dx,0xe(%eax)
2,39 │ │ add $0x10,%eax
0,51 │ │ movzbl (%ebx),%edx
2,86 │ │ test %dl,%dl
2,31 │ └──jne b0
0,04 │128: addl $0x1,-0x10(%ebp)
4,00 │ mov -0x1c(%ebp),%eax
0,04 │ add %eax,-0x18(%ebp)
0,08 │ cmpl $0x10,-0x10(%ebp)
│ ↑ jne a0
which almost disappears from the profile:
# cmdline : /home/kirr/local/perf/bin/perf record -g -a sleep 20
# Samples: 49K of event 'cycles'
# Event count (approx.): 16799780016
#
# Overhead Command Shared Object Symbol
# ........ ............... ....................
#
27.51% rawv libc-2.13.so [.] __memcpy_ssse3
23.77% vivi-* [kernel.kallsyms] [k] memcpy
9.96% Xorg [unknown] [.] 0xa76f5e12
4.94% vivi-* [vivi] [k] gen_text.constprop.6
4.44% rawv [vivi] [k] gen_twopix
3.17% vivi-* [vivi] [k] vivi_fillbuff
2.45% rawv [vivi] [k] precalculate_line
1.20% swapper [kernel.kallsyms] [k] read_hpet
i.e. gen_twopix() overhead dropped from 49% to 4%, gen_text() loops
from ~8% to ~5%, and the overall cycle count dropped from 31551930117 to
16799780016, which is a ~1.9x speedup for the whole workload.
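The ~1.9x figure follows from the two (approximate) "Event count" lines; put the other way, the same workload now needs about 53% of the cycles it needed before:

```latex
\frac{31\,551\,930\,117}{16\,799\,780\,016} \approx 1.878,
\qquad
\frac{16\,799\,780\,016}{31\,551\,930\,117} \approx 0.532
```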
(*) For RGB24 rendering I've introduced x24, which can be thought of as
a synthetic u24 type that keeps the code simple. It is needed because
when memcpy() is used for the conditional assignment, gcc generates
suboptimal code with more indirections.
Fortunately, struct assignment is built into C, and that's all we
need from the pixel type for font rendering.
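A standalone sketch of the x24 idea, assuming GCC (the packed attribute is a GCC extension). The field names and the helpers x24_from_bytes() and fill_x24() are hypothetical. The point is that a packed 3-byte struct has a 3-byte array stride and can be copied with plain assignment, while memcpy() is used only once to load the precomputed pixel, much as the patch does with dev->textfg/textbg:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * A 3-byte pixel type. packed removes the padding that would otherwise
 * round sizeof up to 4, so an array of x24 has a 3-byte stride, matching
 * RGB24. Field names are arbitrary: only whole-struct copies are used.
 */
typedef struct { uint16_t lo; uint8_t hi; } __attribute__((packed)) x24;

static_assert(sizeof(x24) == 3, "x24 must be exactly 3 bytes");

/* Load a precomputed 3-byte pixel once, outside any hot loop. */
static x24 x24_from_bytes(const uint8_t bytes[3])
{
	x24 px;

	memcpy(&px, bytes, sizeof(px));
	return px;
}

/* Fill n consecutive 3-byte pixels with the same value. */
static void fill_x24(x24 *dst, x24 val, unsigned n)
{
	unsigned i;

	for (i = 0; i < n; i++)
		dst[i] = val;	/* plain struct assignment, no memcpy() call */
}
```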
Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
---
drivers/media/platform/vivi.c | 61 ++++++++++++++++++++++++++++++-------------
1 file changed, 43 insertions(+), 18 deletions(-)
diff --git a/drivers/media/platform/vivi.c b/drivers/media/platform/vivi.c
index bfac13d..cb2337e 100644
--- a/drivers/media/platform/vivi.c
+++ b/drivers/media/platform/vivi.c
@@ -245,6 +245,7 @@ struct vivi_dev {
u8 line[MAX_WIDTH * 8];
unsigned int pixelsize;
u8 alpha_component;
+ u32 textfg, textbg;
};
/* ------------------------------------------------------------------
@@ -525,33 +526,54 @@ static void precalculate_line(struct vivi_dev *dev)
}
}
+/* need this to do rgb24 rendering */
+typedef struct { u16 __; u8 _; } __attribute__((packed)) x24;
+
static void gen_text(struct vivi_dev *dev, char *basep,
int y, int x, char *text)
{
int line;
+ unsigned int width = dev->width;
/* Checks if it is possible to show string */
- if (y + 16 >= dev->height || x + strlen(text) * 8 >= dev->width)
+ if (y + 16 >= dev->height || x + strlen(text) * 8 >= width)
return;
/* Print stream time */
- for (line = y; line < y + 16; line++) {
- int j = 0;
- char *pos = basep + line * dev->width * dev->pixelsize + x * dev->pixelsize;
- char *s;
-
- for (s = text; *s; s++) {
- u8 chr = font8x16[*s * 16 + line - y];
- int i;
-
- for (i = 0; i < 7; i++, j++) {
- /* Draw white font on black background */
- if (chr & (1 << (7 - i)))
- gen_twopix(dev, pos + j * dev->pixelsize, WHITE, (x+y) & 1);
- else
- gen_twopix(dev, pos + j * dev->pixelsize, TEXT_BLACK, (x+y) & 1);
- }
- }
+#define PRINTSTR(PIXTYPE) do { \
+ PIXTYPE fg; \
+ PIXTYPE bg; \
+ memcpy(&fg, &dev->textfg, sizeof(PIXTYPE)); \
+ memcpy(&bg, &dev->textbg, sizeof(PIXTYPE)); \
+ \
+ for (line = 0; line < 16; line++) { \
+ PIXTYPE *pos = (PIXTYPE *)( basep + ((y + line) * width + x) * sizeof(PIXTYPE) ); \
+ u8 *s; \
+ \
+ for (s = text; *s; s++) { \
+ u8 chr = font8x16[*s * 16 + line]; \
+ \
+ pos[0] = (chr & (0x01 << 7) ? fg : bg); \
+ pos[1] = (chr & (0x01 << 6) ? fg : bg); \
+ pos[2] = (chr & (0x01 << 5) ? fg : bg); \
+ pos[3] = (chr & (0x01 << 4) ? fg : bg); \
+ pos[4] = (chr & (0x01 << 3) ? fg : bg); \
+ pos[5] = (chr & (0x01 << 2) ? fg : bg); \
+ pos[6] = (chr & (0x01 << 1) ? fg : bg); \
+ pos[7] = (chr & (0x01 << 0) ? fg : bg); \
+ \
+ pos += 8; \
+ } \
+ } \
+} while (0)
+
+ switch (dev->pixelsize) {
+ case 2:
+ PRINTSTR(u16); break;
+ case 4:
+ PRINTSTR(u32); break;
+ case 3:
+ PRINTSTR(x24); break;
}
}
@@ -576,6 +598,9 @@ static void vivi_fillbuff(struct vivi_dev *dev, struct vivi_buffer *buf)
/* Updates stream time */
+ gen_twopix(dev, (u8 *)&dev->textbg, TEXT_BLACK, /*odd=*/ 0);
+ gen_twopix(dev, (u8 *)&dev->textfg, WHITE, /*odd=*/ 0);
+
dev->ms += jiffies_to_msecs(jiffies - dev->jiffies);
dev->jiffies = jiffies;
ms = dev->ms;
--
1.8.0.316.g291341c
* [PATCH 2/4] [media] vivi: vivi_dev->line[] was not aligned
From: Kirill Smelkov @ 2012-11-02 13:10 UTC
To: Mauro Carvalho Chehab; +Cc: Kirill Smelkov, Hans Verkuil, linux-media
Though dev->line[] is a u8 array, we work with it as an array of u16, u24
or u32 pixels and also pass it to memcpy(), so it's better to align it to
at least 4 bytes.
Before the patch, on x86 offsetof(vivi_dev, line) was 1003; after the
patch it is 1004.
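A reduced illustration of the alignment effect, assuming a typical ABI where uint32_t is 4 bytes and uint8_t has 1-byte alignment. The structs below are hypothetical miniatures, not struct vivi_dev: a u8 array placed after an odd-sized member lands on an odd offset, and __attribute__((__aligned__(4))) (GCC syntax, as in the patch) makes the compiler insert padding in front of it:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical miniatures of the layout problem; not struct vivi_dev. */
struct line_unaligned {
	uint32_t field_count;
	uint8_t bars[9][3];	/* 27 bytes, so the next byte is at offset 31 */
	uint8_t line[64];	/* 1-byte alignment: starts at offset 31 */
};

struct line_aligned {
	uint32_t field_count;
	uint8_t bars[9][3];
	/* 1 byte of padding is inserted here to reach offset 32 */
	uint8_t line[64] __attribute__((__aligned__(4)));
};

/* Fail the build if the aligned variant ever stops being aligned. */
static_assert(offsetof(struct line_aligned, line) % 4 == 0,
	      "line[] must start on a 4-byte boundary");
```

The cost is at most 3 bytes of padding, paid once per device structure.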
There is a slight performance increase, but I think it is only slight
because we do not start copying from line[0]:
---- 8< ---- drivers/media/platform/vivi.c
static void vivi_fillbuff(struct vivi_dev *dev, struct vivi_buffer *buf)
{
...
for (h = 0; h < hmax; h++)
memcpy(vbuf + h * wmax * dev->pixelsize,
dev->line + (dev->mv_count % wmax) * dev->pixelsize,
wmax * dev->pixelsize);
before:
# cmdline : /home/kirr/local/perf/bin/perf record -g -a sleep 20
#
# Samples: 49K of event 'cycles'
# Event count (approx.): 16799780016
#
# Overhead Command Shared Object
# ........ ............... ....................
#
27.51% rawv libc-2.13.so [.] __memcpy_ssse3
23.77% vivi-* [kernel.kallsyms] [k] memcpy
9.96% Xorg [unknown] [.] 0xa76f5e12
4.94% vivi-* [vivi] [k] gen_text.constprop.6
4.44% rawv [vivi] [k] gen_twopix
3.17% vivi-* [vivi] [k] vivi_fillbuff
2.45% rawv [vivi] [k] precalculate_line
1.20% swapper [kernel.kallsyms] [k] read_hpet
23.77% vivi-* [kernel.kallsyms] [k] memcpy
|
--- memcpy
|
|--99.28%-- vivi_fillbuff
| vivi_thread
| kthread
| ret_from_kernel_thread
--0.72%-- [...]
after:
# cmdline : /home/kirr/local/perf/bin/perf record -g -a sleep 20
#
# Samples: 49K of event 'cycles'
# Event count (approx.): 16475832370
#
# Overhead Command Shared Object
# ........ ............... ......................
#
29.07% rawv libc-2.13.so [.] __memcpy_ssse3
20.57% vivi-* [kernel.kallsyms] [k] memcpy
10.20% Xorg [unknown] [.] 0xa7301494
5.16% vivi-* [vivi] [k] gen_text.constprop.6
4.43% rawv [vivi] [k] gen_twopix
4.36% vivi-* [vivi] [k] vivi_fillbuff
2.42% rawv [vivi] [k] precalculate_line
1.33% swapper [kernel.kallsyms] [k] read_hpet
Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
---
drivers/media/platform/vivi.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/media/platform/vivi.c b/drivers/media/platform/vivi.c
index cb2337e..ddcc712 100644
--- a/drivers/media/platform/vivi.c
+++ b/drivers/media/platform/vivi.c
@@ -242,7 +242,7 @@ struct vivi_dev {
unsigned int field_count;
u8 bars[9][3];
- u8 line[MAX_WIDTH * 8];
+ u8 line[MAX_WIDTH * 8] __attribute__((__aligned__(4)));
unsigned int pixelsize;
u8 alpha_component;
u32 textfg, textbg;
--
1.8.0.316.g291341c
* [PATCH 3/4] [media] vivi: Move computations out of vivi_fillbuf linecopy loop
From: Kirill Smelkov @ 2012-11-02 13:10 UTC
To: Mauro Carvalho Chehab; +Cc: Kirill Smelkov, Hans Verkuil, linux-media
The "dev->mv_count % wmax" computation was showing high in profiles (we do
it for each line, i.e. ~500 times per frame):
│ 000010c0 <vivi_fillbuff>:
...
0,39 │ 70:┌─→mov 0x3ff4(%edi),%esi
0,22 │ 76:│ mov 0x2a0(%edi),%eax
0,30 │ │ mov -0x84(%ebp),%ebx
0,35 │ │ mov %eax,%edx
0,04 │ │ mov -0x7c(%ebp),%ecx
0,35 │ │ sar $0x1f,%edx
0,44 │ │ idivl -0x7c(%ebp)
21,68 │ │ imul %esi,%ecx
0,70 │ │ imul %esi,%ebx
0,52 │ │ add -0x88(%ebp),%ebx
1,65 │ │ mov %ebx,%eax
0,22 │ │ imul %edx,%esi
0,04 │ │ lea 0x3f4(%edi,%esi,1),%edx
2,18 │ │→ call vivi_fillbuff+0xa6
0,74 │ │ addl $0x1,-0x80(%ebp)
62,69 │ │ mov -0x7c(%ebp),%edx
1,18 │ │ mov -0x80(%ebp),%ecx
0,35 │ │ add %edx,-0x84(%ebp)
0,61 │ │ cmp %ecx,-0x8c(%ebp)
0,22 │ └──jne 70
So, since all of these variables stay the same across iterations, let's
move the computations out of the loop: the above-mentioned division, and
"width*pixelsize" too.
before:
# cmdline : /home/kirr/local/perf/bin/perf record -g -a sleep 20
#
# Samples: 49K of event 'cycles'
# Event count (approx.): 16475832370
#
# Overhead Command Shared Object
# ........ ............... ......................
#
29.07% rawv libc-2.13.so [.] __memcpy_ssse3
20.57% vivi-* [kernel.kallsyms] [k] memcpy
10.20% Xorg [unknown] [.] 0xa7301494
5.16% vivi-* [vivi] [k] gen_text.constprop.6
4.43% rawv [vivi] [k] gen_twopix
4.36% vivi-* [vivi] [k] vivi_fillbuff
2.42% rawv [vivi] [k] precalculate_line
1.33% swapper [kernel.kallsyms] [k] read_hpet
after:
# cmdline : /home/kirr/local/perf/bin/perf record -g -a sleep 20
#
# Samples: 46K of event 'cycles'
# Event count (approx.): 15574200568
#
# Overhead Command Shared Object
# ........ ............... ....................
#
27.99% rawv libc-2.13.so [.] __memcpy_ssse3
23.29% vivi-* [kernel.kallsyms] [k] memcpy
10.30% Xorg [unknown] [.] 0xa75c98f8
5.34% vivi-* [vivi] [k] gen_text.constprop.6
4.61% rawv [vivi] [k] gen_twopix
2.64% rawv [vivi] [k] precalculate_line
1.37% swapper [kernel.kallsyms] [k] read_hpet
0.79% Xorg [kernel.kallsyms] [k] read_hpet
0.64% Xorg [kernel.kallsyms] [k] unix_poll
0.45% Xorg [kernel.kallsyms] [k] fget_light
0.43% rawv libxcb.so.1.1.0 [.] 0x0000aae9
0.40% runsv [kernel.kallsyms] [k] ext2_try_to_allocate
0.36% Xorg [kernel.kallsyms] [k] _raw_spin_lock_irqsave
0.31% vivi-* [vivi] [k] vivi_fillbuff
(i.e. vivi_fillbuff own overhead is almost gone)
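The same transformation as a standalone sketch. fill_rows() and its parameters are hypothetical and only mirror the variables in vivi_fillbuff(); as in the driver, the assumption is that the source line holds 2 * width pixels, so the window starting at mv_count % width never runs past its end. The modulo and the stride multiplication are computed once, leaving one memcpy() per row:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy a precomputed source line into every row of a frame.
 * The byte stride and the start of the visible window (which depends on
 * mv_count) do not change between rows, so they are computed once before
 * the loop and the loop body is a single memcpy().
 * Assumes src holds at least 2 * width pixels of pixelsize bytes each.
 */
static void fill_rows(unsigned char *frame, const unsigned char *src,
		      unsigned width, unsigned height,
		      unsigned pixelsize, unsigned mv_count)
{
	size_t stride;
	const unsigned char *linestart;
	unsigned h;

	assert(width > 0);	/* the modulo below needs a nonzero width */
	stride = (size_t)width * pixelsize;
	linestart = src + (size_t)(mv_count % width) * pixelsize;

	for (h = 0; h < height; h++)
		memcpy(frame + h * stride, linestart, stride);
}
```

Most likely the compiler could not hoist this itself because the fields are read through the dev pointer and the memcpy() destination might alias them, so they had to be reloaded (and the idivl redone) on every iteration; hoisting by hand removes that doubt.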
Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
---
drivers/media/platform/vivi.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/drivers/media/platform/vivi.c b/drivers/media/platform/vivi.c
index ddcc712..0272d07 100644
--- a/drivers/media/platform/vivi.c
+++ b/drivers/media/platform/vivi.c
@@ -579,22 +579,23 @@ static void gen_text(struct vivi_dev *dev, char *basep,
static void vivi_fillbuff(struct vivi_dev *dev, struct vivi_buffer *buf)
{
- int wmax = dev->width;
+ int stride = dev->width * dev->pixelsize;
int hmax = dev->height;
struct timeval ts;
void *vbuf = vb2_plane_vaddr(&buf->vb, 0);
unsigned ms;
char str[100];
int h, line = 1;
+ u8 *linestart;
s32 gain;
if (!vbuf)
return;
+ linestart = dev->line + (dev->mv_count % dev->width) * dev->pixelsize;
+
for (h = 0; h < hmax; h++)
- memcpy(vbuf + h * wmax * dev->pixelsize,
- dev->line + (dev->mv_count % wmax) * dev->pixelsize,
- wmax * dev->pixelsize);
+ memcpy(vbuf + h * stride, linestart, stride);
/* Updates stream time */
--
1.8.0.316.g291341c
* [PATCH 4/4] [media] vivi: Optimize precalculate_line()
From: Kirill Smelkov @ 2012-11-02 13:10 UTC
To: Mauro Carvalho Chehab; +Cc: Kirill Smelkov, Hans Verkuil, linux-media
precalculate_line() is not very high in the profile, but it calls the
expensive gen_twopix(), so let's polish it too:
call gen_twopix() only twice (for the even and the odd pixel) per color
bar and then replicate the result.
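A standalone sketch of the same idea, with hypothetical names: fake_gen_twopix() stands in for gen_twopix(), writes an arbitrary per-color byte pattern, and counts its calls so the saving is visible. As in the patch, the line is 2 * width pixels covering 16 bars (8 colors, twice), pixsize is at most 4 bytes, and each bar gets one even and one odd pixel that are then replicated with memcpy():

```c
#include <assert.h>
#include <string.h>

#define MAX_PIXSIZE 4	/* largest supported pixel size, in bytes */

static unsigned twopix_calls;	/* how many times the generator ran */

/*
 * Hypothetical stand-in for gen_twopix(): writes one pixel of bar color
 * colorpos (even or odd variant) at buf, using an arbitrary byte pattern.
 */
static void fake_gen_twopix(unsigned char *buf, unsigned pixsize,
			    unsigned colorpos, unsigned odd)
{
	unsigned i;

	twopix_calls++;
	for (i = 0; i < pixsize; i++)
		buf[i] = (unsigned char)(colorpos * 16 + odd * 8 + i);
}

/*
 * Fill a line of 2 * width pixels with 16 bars (8 colors, repeated twice).
 * Each bar's even/odd pixel pair is generated once and then replicated,
 * instead of calling the generator for every pixel.
 */
static void precalc_line(unsigned char *line, unsigned width, unsigned pixsize)
{
	unsigned char pair[2 * MAX_PIXSIZE];
	unsigned colorpos;

	assert(pixsize > 0 && pixsize <= MAX_PIXSIZE);
	for (colorpos = 0; colorpos < 16; colorpos++) {
		unsigned wstart = colorpos * width / 8;
		unsigned wend = (colorpos + 1) * width / 8;
		unsigned w;

		fake_gen_twopix(&pair[0], pixsize, colorpos % 8, 0);
		fake_gen_twopix(&pair[pixsize], pixsize, colorpos % 8, 1);

		/* start on an even pixel so each copy is a whole pair */
		for (w = wstart / 2 * 2; w < wend; w += 2)
			memcpy(line + w * pixsize, pair, 2 * pixsize);
	}
}
```

With width = 64, a per-pixel loop would call the generator 2 * 64 = 128 times; this version calls it 32 times (two per bar), independent of width. If a bar boundary falls on an odd pixel, the next bar's first pair overwrites it, as in the patch; with width a multiple of 16 every boundary is even.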
before:
# cmdline : /home/kirr/local/perf/bin/perf record -g -a sleep 20
#
# Samples: 46K of event 'cycles'
# Event count (approx.): 15574200568
#
# Overhead Command Shared Object
# ........ ............... ....................
#
27.99% rawv libc-2.13.so [.] __memcpy_ssse3
23.29% vivi-* [kernel.kallsyms] [k] memcpy
10.30% Xorg [unknown] [.] 0xa75c98f8
5.34% vivi-* [vivi] [k] gen_text.constprop.6
4.61% rawv [vivi] [k] gen_twopix
2.64% rawv [vivi] [k] precalculate_line
1.37% swapper [kernel.kallsyms] [k] read_hpet
after:
# cmdline : /home/kirr/local/perf/bin/perf record -g -a sleep 20
#
# Samples: 45K of event 'cycles'
# Event count (approx.): 15561769214
#
# Overhead Command Shared Object
# ........ ............... ....................
#
30.73% rawv libc-2.13.so [.] __memcpy_ssse3
26.78% vivi-* [kernel.kallsyms] [k] memcpy
10.68% Xorg [unknown] [.] 0xa73015e9
5.55% vivi-* [vivi] [k] gen_text.constprop.6
1.36% swapper [kernel.kallsyms] [k] read_hpet
0.96% Xorg [kernel.kallsyms] [k] read_hpet
...
0.16% rawv [vivi] [k] precalculate_line
...
0.14% rawv [vivi] [k] gen_twopix
(i.e. gen_twopix and precalculate_line overheads are almost gone)
Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
---
drivers/media/platform/vivi.c | 22 ++++++++++++++++------
1 file changed, 16 insertions(+), 6 deletions(-)
diff --git a/drivers/media/platform/vivi.c b/drivers/media/platform/vivi.c
index 0272d07..37d0af8 100644
--- a/drivers/media/platform/vivi.c
+++ b/drivers/media/platform/vivi.c
@@ -517,12 +517,22 @@ static void gen_twopix(struct vivi_dev *dev, u8 *buf, int colorpos, bool odd)
static void precalculate_line(struct vivi_dev *dev)
{
- int w;
-
- for (w = 0; w < dev->width * 2; w++) {
- int colorpos = w / (dev->width / 8) % 8;
-
- gen_twopix(dev, dev->line + w * dev->pixelsize, colorpos, w & 1);
+ unsigned pixsize = dev->pixelsize;
+ unsigned pixsize2 = 2*pixsize;
+ int colorpos;
+ u8 *pos;
+
+ for (colorpos = 0; colorpos < 16; ++colorpos) {
+ u8 pix[8];
+ int wstart = colorpos * dev->width / 8;
+ int wend = (colorpos+1) * dev->width / 8;
+ int w;
+
+ gen_twopix(dev, &pix[0], colorpos % 8, 0);
+ gen_twopix(dev, &pix[pixsize], colorpos % 8, 1);
+
+ for (w = wstart/2*2, pos = dev->line + w*pixsize; w < wend; w += 2, pos += pixsize2)
+ memcpy(pos, pix, pixsize2);
}
}
--
1.8.0.316.g291341c
* Re: [PATCH 0/4] Speedup vivi
From: Hans Verkuil @ 2012-11-02 13:48 UTC
To: Kirill Smelkov; +Cc: Mauro Carvalho Chehab, linux-media
On Fri November 2 2012 14:10:29 Kirill Smelkov wrote:
> Hello up there. I was trying to use vivi to generate multiple video streams for
> my test-lab environment on atom system and noticed it wastes a lot of cpu.
>
> Please apply some optimization patches.
Looks good!
For the whole series:
Acked-by: Hans Verkuil <hans.verkuil@cisco.com>
Regards,
Hans
>
> Thanks,
> Kirill
>
> Kirill Smelkov (4):
> [media] vivi: Optimize gen_text()
> [media] vivi: vivi_dev->line[] was not aligned
> [media] vivi: Move computations out of vivi_fillbuf linecopy loop
> [media] vivi: Optimize precalculate_line()
>
> drivers/media/platform/vivi.c | 94 ++++++++++++++++++++++++++++++-------------
> 1 file changed, 65 insertions(+), 29 deletions(-)
>
>
* Re: [PATCH 0/4] Speedup vivi
From: Kirill Smelkov @ 2012-11-02 14:31 UTC
To: Hans Verkuil; +Cc: Mauro Carvalho Chehab, linux-media
On Fri, Nov 02, 2012 at 02:48:43PM +0100, Hans Verkuil wrote:
> On Fri November 2 2012 14:10:29 Kirill Smelkov wrote:
> > Hello up there. I was trying to use vivi to generate multiple video streams for
> > my test-lab environment on atom system and noticed it wastes a lot of cpu.
> >
> > Please apply some optimization patches.
>
> Looks good!
>
> For the whole series:
>
> Acked-by: Hans Verkuil <hans.verkuil@cisco.com>
Thanks a lot!