From: Denys Vlasenko <vda.linux@googlemail.com>
To: roma1390 <roma1390@gmail.com>
Cc: linux-kernel@vger.kernel.org,
Andrew Morton <akpm@linux-foundation.org>,
Douglas W Jones <jones@cs.uiowa.edu>,
Michal Nazarewicz <mnazarewicz@google.com>
Subject: Re: [PATCH 0/1] vsprintf: optimize decimal conversion (again)
Date: Wed, 28 Mar 2012 12:13:07 +0200
Message-ID: <201203281213.07856.vda.linux@googlemail.com>
In-Reply-To: <4F72A80F.7000801@gmail.com>

On Wednesday 28 March 2012 07:56, roma1390 wrote:
> On 2012.03.27 18:42, Denys Vlasenko wrote:
> > On Tue, Mar 27, 2012 at 2:08 PM, roma1390<roma1390@gmail.com> wrote:
> >> On 2012.03.26 21:47, Denys Vlasenko wrote:
> >>>
> >>> Please find test programs attached.
> >>>
> >>> 32-bit test programs were built using gcc 4.6.2
> >>> 64-bit test programs were built using gcc 4.2.1
> >>> Command line: gcc --static [-m32] -O2 -Wall test_{org,new}.c
> >>
> >> Can't compile reference for arm:
> >> $ arm-linux-gnueabi-gcc -O2 -Wall test_org.c -o test_org
> >> test_org.c: In function ‘put_dec’:
> >> test_org.c:101: error: impossible constraint in ‘asm’
> >> test_org.c:101: error: impossible constraint in ‘asm’
> >> test_org.c:101: error: impossible constraint in ‘asm’
> >
> > Please find a modified test_header.c attached.
> > I tested and it builds in my arm emulator.
>
>
> Run on same:
> 2.6.32-5-kirkwood #1 Tue Jan 17 05:11:52 UTC 2012 armv5tel GNU/Linux
> GCC version:
> gcc version 4.4.5 (Debian 4.4.5-8), Target: arm-linux-gnueabi
> Compiled with:
> arm-linux-gnueabi-gcc -O2 -Wall test_{org,new}.c -o test_{org,new}
>
>
> run default priority on almost idle machine:

Hmm, the results are not good. For numbers of up to 8 digits we were
winning, but when we start converting larger numbers, we lose big time:

123456789:2388000 2^32:2268000 2^64:1400000
123456789:1168000 2^32:976000 2^64:532000

Since ARM is 32-bit, we are using algorithm #2.
It's not that algorithm #2 is intrinsically bad: on i386 it is a win
on both the AMD and the Intel CPUs I tested it on.
Either it's just a difference between ARM and x86, or gcc is
generating suboptimal code for it.
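
For reference, algorithm #2 (the path taken on 32-bit targets) avoids 64-bit division. Reading the put_dec listing further down, it appears to split n into 16-bit pieces, n = 2^48*d3 + 2^32*d2 + 2^16*d1 + d0, and to accumulate base-10000 digit groups whose multipliers (656, 7296, 5536, 7671, 9496, 6, 4749, 42, 281) are the base-10000 digits of 2^48, 2^32 and 2^16. The C below is a minimal sketch of that idea under those assumptions, not the code in test_new.c: the names emit4, sketch_put_dec and sketch_check are hypothetical, plain / and % stand in for the reciprocal multiplications the real helpers use, and the sketch always emits 20 digits and then trims, whereas the listing falls back to put_dec_trunc8 for n below 10^8.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: store the 4 decimal digits of x (0..9999),
 * least significant first. Plain / and % are used for clarity only. */
static char *emit4(char *buf, uint32_t x)
{
	assert(x <= 9999);
	for (int i = 0; i < 4; i++) {
		*buf++ = (char)('0' + x % 10);
		x /= 10;
	}
	return buf;
}

/* Sketch of the chunked conversion:
 *   n = 2^48*d3 + 2^32*d2 + 2^16*d1 + d0
 * with 2^16 = 6_5536, 2^32 = 42_9496_7296, 2^48 = 281_4749_7671_0656
 * written in base 10000, so every intermediate sum fits in 32 bits.
 * Digits come out least significant first; returns the end pointer. */
static char *sketch_put_dec(char *buf, uint64_t n)
{
	char *start = buf;
	uint32_t d0 = (uint32_t)n & 0xffff;
	uint32_t d1 = (uint32_t)n >> 16;
	uint32_t d2 = (uint32_t)(n >> 32) & 0xffff;
	uint32_t d3 = (uint32_t)(n >> 48);
	uint32_t q;

	q = 656 * d3 + 7296 * d2 + 5536 * d1 + d0;
	buf = emit4(buf, q % 10000);
	q = q / 10000 + 7671 * d3 + 9496 * d2 + 6 * d1;
	buf = emit4(buf, q % 10000);
	q = q / 10000 + 4749 * d3 + 42 * d2;
	buf = emit4(buf, q % 10000);
	q = q / 10000 + 281 * d3;
	buf = emit4(buf, q % 10000);
	buf = emit4(buf, q / 10000);	/* at most 1844 for n = 2^64-1 */

	/* Strip leading zeros (stored last), but keep at least one digit. */
	while (buf - start > 1 && buf[-1] == '0')
		buf--;
	return buf;
}

/* Usage: reverse into a NUL-terminated string and compare with libc. */
static int sketch_check(uint64_t n)
{
	char tmp[20], out[21], ref[32];
	char *end = sketch_put_dec(tmp, n);
	size_t len = (size_t)(end - tmp);

	for (size_t i = 0; i < len; i++)
		out[i] = tmp[len - 1 - i];
	out[len] = '\0';
	snprintf(ref, sizeof(ref), "%llu", (unsigned long long)n);
	return strcmp(out, ref) == 0;
}
```

Every intermediate sum stays below 2^32 (the largest, in the second group, is 1,125,520,955 for n = 2^64-1), which is what makes this scheme attractive on a 32-bit CPU.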

I would like to ask you to do a few things.

First: email me both the test_org and test_new binaries
(privately, so as not to spam the list).

Second: run

arm-linux-gnueabi-gcc -O2 -Wall test_{org,new}.c -S

and email me the resulting test_{org,new}.s files.

Third: switch to algorithm #1 and test whether it fares better.
To do that, go to test_new.c and replace

#if LONG_MAX > ((1UL<<31)-1) || LLONG_MAX > ((1ULL<<63)-1)

with

#if 1 ////LONG_MAX > ((1UL<<31)-1) || LLONG_MAX > ((1ULL<<63)-1)

(there are two instances of this line there),
then recompile, rerun the test, and post the results.
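
To double-check which branch a build actually takes, the same condition can be evaluated in a tiny standalone file. On arm-linux-gnueabi, long is 32 bits and long long is 64 bits, so LONG_MAX equals (1UL<<31)-1 and LLONG_MAX equals (1ULL<<63)-1; neither comparison is true, algorithm #2 is compiled in, and that is why the "#if 1" edit is needed to force algorithm #1. The function names below (selected_algorithm, report_algorithm) are hypothetical and only mirror the quoted condition; they are not part of test_new.c.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Hypothetical mirror of the quoted #if: 1 means algorithm #1
 * (the 64-bit one) would be compiled in, 2 means algorithm #2. */
static int selected_algorithm(void)
{
#if LONG_MAX > ((1UL<<31)-1) || LLONG_MAX > ((1ULL<<63)-1)
	return 1;
#else
	return 2;
#endif
}

/* Usage: on arm-linux-gnueabi this is expected to report #2. */
static void report_algorithm(void)
{
	int alg = selected_algorithm();

	assert(alg == 1 || alg == 2);
	printf("decimal conversion: algorithm #%d\n", alg);
}
```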

When I disassemble the ARM code produced by _my_ compiler,
I don't see anything obviously bad.
put_dec_trunc8 is the function which works well for you.
put_dec_full4 and put_dec are used for printing 9+ digit numbers,
and your testing says they are slow. I don't see why:
see the disassembly below.
I need to look at _your_ gcc's output...

00000000 <put_dec_trunc8>:
0: e92d40f0 stmdb sp!, {r4, r5, r6, r7, lr}
4: e59f6188 ldr r6, [pc, #392] ; 194 <.text+0x194>
8: e0843691 umull r3, r4, r1, r6
c: e1a03004 mov r3, r4
10: e1a0c003 mov ip, r3
14: e1a02083 mov r2, r3, lsl #1
18: e1a03183 mov r3, r3, lsl #3
1c: e3a04000 mov r4, #0 ; 0x0
20: e0822003 add r2, r2, r3
24: e2811030 add r1, r1, #48 ; 0x30
28: e0621001 rsb r1, r2, r1
2c: e15c0004 cmp ip, r4
30: e1a05000 mov r5, r0
34: e3a07000 mov r7, #0 ; 0x0
38: e4c01001 strb r1, [r0], #1
3c: 0a000052 beq 18c <put_dec_trunc8+0x18c>
40: e084369c umull r3, r4, ip, r6
44: e1a03004 mov r3, r4
48: e1a02083 mov r2, r3, lsl #1
4c: e1a01183 mov r1, r3, lsl #3
50: e1a0e003 mov lr, r3
54: e3a04000 mov r4, #0 ; 0x0
58: e0822001 add r2, r2, r1
5c: e28c3030 add r3, ip, #48 ; 0x30
60: e0623003 rsb r3, r2, r3
64: e15e0004 cmp lr, r4
68: e5c53001 strb r3, [r5, #1]
6c: e2800001 add r0, r0, #1 ; 0x1
70: 0a000045 beq 18c <put_dec_trunc8+0x18c>
74: e084369e umull r3, r4, lr, r6
78: e1a03004 mov r3, r4
7c: e1a02083 mov r2, r3, lsl #1
80: e1a01183 mov r1, r3, lsl #3
84: e1a05003 mov r5, r3
88: e3a04000 mov r4, #0 ; 0x0
8c: e0822001 add r2, r2, r1
90: e28e3030 add r3, lr, #48 ; 0x30
94: e0623003 rsb r3, r2, r3
98: e1550004 cmp r5, r4
9c: e4c03001 strb r3, [r0], #1
a0: 0a000039 beq 18c <put_dec_trunc8+0x18c>
a4: e0843695 umull r3, r4, r5, r6
a8: e1a03004 mov r3, r4
ac: e1a02083 mov r2, r3, lsl #1
b0: e1a01183 mov r1, r3, lsl #3
b4: e1a0c003 mov ip, r3
b8: e3a04000 mov r4, #0 ; 0x0
bc: e0822001 add r2, r2, r1
c0: e2853030 add r3, r5, #48 ; 0x30
c4: e0623003 rsb r3, r2, r3
c8: e15c0004 cmp ip, r4
cc: e4c03001 strb r3, [r0], #1
d0: 0a00002d beq 18c <put_dec_trunc8+0x18c>
d4: e1a0220c mov r2, ip, lsl #4
d8: e042210c sub r2, r2, ip, lsl #2
dc: e082200c add r2, r2, ip
e0: e1a03302 mov r3, r2, lsl #6
e4: e0623003 rsb r3, r2, r3
e8: e1a03103 mov r3, r3, lsl #2
ec: e083300c add r3, r3, ip
f0: e1a03083 mov r3, r3, lsl #1
f4: e1a0e823 mov lr, r3, lsr #16
f8: e1a0208e mov r2, lr, lsl #1
fc: e1a0118e mov r1, lr, lsl #3
100: e0822001 add r2, r2, r1
104: e28c3030 add r3, ip, #48 ; 0x30
108: e0623003 rsb r3, r2, r3
10c: e15e0004 cmp lr, r4
110: e4c03001 strb r3, [r0], #1
114: 0a00001c beq 18c <put_dec_trunc8+0x18c>
118: e1a0320e mov r3, lr, lsl #4
11c: e043310e sub r3, r3, lr, lsl #2
120: e1a02203 mov r2, r3, lsl #4
124: e0833002 add r3, r3, r2
128: e083300e add r3, r3, lr
12c: e1a0c5a3 mov ip, r3, lsr #11
130: e1a0208c mov r2, ip, lsl #1
134: e1a0118c mov r1, ip, lsl #3
138: e0822001 add r2, r2, r1
13c: e28e3030 add r3, lr, #48 ; 0x30
140: e0623003 rsb r3, r2, r3
144: e15c0004 cmp ip, r4
148: e4c03001 strb r3, [r0], #1
14c: 0a00000e beq 18c <put_dec_trunc8+0x18c>
150: e1a0320c mov r3, ip, lsl #4
154: e043310c sub r3, r3, ip, lsl #2
158: e1a02203 mov r2, r3, lsl #4
15c: e0833002 add r3, r3, r2
160: e083300c add r3, r3, ip
164: e1a0e5a3 mov lr, r3, lsr #11
168: e1a0208e mov r2, lr, lsl #1
16c: e1a0118e mov r1, lr, lsl #3
170: e0822001 add r2, r2, r1
174: e28c3030 add r3, ip, #48 ; 0x30
178: e0623003 rsb r3, r2, r3
17c: e15e0004 cmp lr, r4
180: e4c03001 strb r3, [r0], #1
184: 128e3030 addne r3, lr, #48 ; 0x30
188: 14c03001 strneb r3, [r0], #1
18c: e8bd40f0 ldmia sp!, {r4, r5, r6, r7, lr}
190: e12fff1e bx lr
194: 1999999a ldmneib r9, {r1, r3, r4, r7, r8, fp, ip, pc}
00000198 <put_dec_full4>:
198: e1a0c201 mov ip, r1, lsl #4
19c: e04cc101 sub ip, ip, r1, lsl #2
1a0: e1a0320c mov r3, ip, lsl #4
1a4: e08cc003 add ip, ip, r3
1a8: e1a0240c mov r2, ip, lsl #8
1ac: e08cc002 add ip, ip, r2
1b0: e08cc001 add ip, ip, r1
1b4: e1a0c9ac mov ip, ip, lsr #19
1b8: e1a0320c mov r3, ip, lsl #4
1bc: e043310c sub r3, r3, ip, lsl #2
1c0: e083300c add r3, r3, ip
1c4: e1a02303 mov r2, r3, lsl #6
1c8: e0632002 rsb r2, r3, r2
1cc: e1a02102 mov r2, r2, lsl #2
1d0: e082200c add r2, r2, ip
1d4: e1a02082 mov r2, r2, lsl #1
1d8: e1a02822 mov r2, r2, lsr #16
1dc: e92d4070 stmdb sp!, {r4, r5, r6, lr}
1e0: e1a0e202 mov lr, r2, lsl #4
1e4: e04ee102 sub lr, lr, r2, lsl #2
1e8: e1a0320e mov r3, lr, lsl #4
1ec: e08ee003 add lr, lr, r3
1f0: e1a0408c mov r4, ip, lsl #1
1f4: e1a0318c mov r3, ip, lsl #3
1f8: e0844003 add r4, r4, r3
1fc: e08ee002 add lr, lr, r2
200: e2811030 add r1, r1, #48 ; 0x30
204: e0641001 rsb r1, r4, r1
208: e1a0e5ae mov lr, lr, lsr #11
20c: e1a04000 mov r4, r0
210: e4c41001 strb r1, [r4], #1
214: e1a06000 mov r6, r0
218: e1a05182 mov r5, r2, lsl #3
21c: e1a0018e mov r0, lr, lsl #3
220: e1a01082 mov r1, r2, lsl #1
224: e1a0308e mov r3, lr, lsl #1
228: e0833000 add r3, r3, r0
22c: e0811005 add r1, r1, r5
230: e28cc030 add ip, ip, #48 ; 0x30
234: e2822030 add r2, r2, #48 ; 0x30
238: e2840001 add r0, r4, #1 ; 0x1
23c: e061c00c rsb ip, r1, ip
240: e0632002 rsb r2, r3, r2
244: e28ee030 add lr, lr, #48 ; 0x30
248: e5c6c001 strb ip, [r6, #1]
24c: e5c42001 strb r2, [r4, #1]
250: e5c0e001 strb lr, [r0, #1]
254: e2800002 add r0, r0, #2 ; 0x2
258: e8bd4070 ldmia sp!, {r4, r5, r6, lr}
25c: e12fff1e bx lr
00000260 <put_dec>:
260: e92d4ff0 stmdb sp!, {r4, r5, r6, r7, r8, r9, sl, fp, lr}
264: e3530000 cmp r3, #0 ; 0x0
268: e24dd00c sub sp, sp, #12 ; 0xc
26c: e1a08002 mov r8, r2
270: e1a09003 mov r9, r3
274: e1a0e000 mov lr, r0
278: 8a000009 bhi 2a4 <put_dec+0x44>
27c: 0a000003 beq 290 <put_dec+0x30>
280: e1a01008 mov r1, r8
284: e28dd00c add sp, sp, #12 ; 0xc
288: e8bd4ff0 ldmia sp!, {r4, r5, r6, r7, r8, r9, sl, fp, lr}
28c: eaffff5b b 0 <put_dec_trunc8>
290: e3e034fa mvn r3, #-100663296 ; 0xfa000000
294: e2433aa1 sub r3, r3, #659456 ; 0xa1000
298: e2433c0f sub r3, r3, #3840 ; 0xf00
29c: e1520003 cmp r2, r3
2a0: 9afffff6 bls 280 <put_dec+0x20>
2a4: e1a07828 mov r7, r8, lsr #16
2a8: e1a0c187 mov ip, r7, lsl #3
2ac: e1a02287 mov r2, r7, lsl #5
2b0: e08c2002 add r2, ip, r2
2b4: e0822007 add r2, r2, r7
2b8: e1a03182 mov r3, r2, lsl #3
2bc: e1a0a829 mov sl, r9, lsr #16
2c0: e0822003 add r2, r2, r3
2c4: e1a04809 mov r4, r9, lsl #16
2c8: e1a04824 mov r4, r4, lsr #16
2cc: e1a00202 mov r0, r2, lsl #4
2d0: e1a0118a mov r1, sl, lsl #3
2d4: e1a0328a mov r3, sl, lsl #5
2d8: e0815003 add r5, r1, r3
2dc: e1a09184 mov r9, r4, lsl #3
2e0: e0620000 rsb r0, r2, r0
2e4: e1a01808 mov r1, r8, lsl #16
2e8: e1a03304 mov r3, r4, lsl #6
2ec: e0800007 add r0, r0, r7
2f0: e1a01821 mov r1, r1, lsr #16
2f4: e0693003 rsb r3, r9, r3
2f8: e085200a add r2, r5, sl
2fc: e1a02202 mov r2, r2, lsl #4
300: e0800001 add r0, r0, r1
304: e0833004 add r3, r3, r4
308: e0800002 add r0, r0, r2
30c: e59fb164 ldr fp, [pc, #356] ; 478 <.text+0x478>
310: e1a03383 mov r3, r3, lsl #7
314: e0800003 add r0, r0, r3
318: e088209b umull r2, r8, fp, r0
31c: e1a086a8 mov r8, r8, lsr #13
320: e1a01388 mov r1, r8, lsl #7
324: e0411108 sub r1, r1, r8, lsl #2
328: e0811008 add r1, r1, r8
32c: e1a03101 mov r3, r1, lsl #2
330: e0811003 add r1, r1, r3
334: e0401201 sub r1, r0, r1, lsl #4
338: e1a0000e mov r0, lr
33c: e58dc004 str ip, [sp, #4]
340: ebffff94 bl 198 <put_dec_full4>
344: e1a03284 mov r3, r4, lsl #5
348: e1a02104 mov r2, r4, lsl #2
34c: e0822003 add r2, r2, r3
350: e1a0630a mov r6, sl, lsl #6
354: e1a0350a mov r3, sl, lsl #10
358: e0663003 rsb r3, r6, r3
35c: e1a01282 mov r1, r2, lsl #5
360: e0822001 add r2, r2, r1
364: e06a3003 rsb r3, sl, r3
368: e0642002 rsb r2, r4, r2
36c: e59dc004 ldr ip, [sp, #4]
370: e1a03183 mov r3, r3, lsl #3
374: e1a02182 mov r2, r2, lsl #3
378: e06a3003 rsb r3, sl, r3
37c: e04cc087 sub ip, ip, r7, lsl #1
380: e0833002 add r3, r3, r2
384: e083300c add r3, r3, ip
388: e0833008 add r3, r3, r8
38c: e087239b umull r2, r7, fp, r3
390: e1a076a7 mov r7, r7, lsr #13
394: e1a01387 mov r1, r7, lsl #7
398: e0411107 sub r1, r1, r7, lsl #2
39c: e0811007 add r1, r1, r7
3a0: e1a02101 mov r2, r1, lsl #2
3a4: e0811002 add r1, r1, r2
3a8: e0431201 sub r1, r3, r1, lsl #4
3ac: e046620a sub r6, r6, sl, lsl #4
3b0: ebffff78 bl 198 <put_dec_full4>
3b4: e1a03286 mov r3, r6, lsl #5
3b8: e0866003 add r6, r6, r3
3bc: e0499084 sub r9, r9, r4, lsl #1
3c0: e06a6006 rsb r6, sl, r6
3c4: e1a02189 mov r2, r9, lsl #3
3c8: e1a03106 mov r3, r6, lsl #2
3cc: e0663003 rsb r3, r6, r3
3d0: e0692002 rsb r2, r9, r2
3d4: e0822003 add r2, r2, r3
3d8: e0822007 add r2, r2, r7
3dc: e084329b umull r3, r4, fp, r2
3e0: e1a046a4 mov r4, r4, lsr #13
3e4: e1a01384 mov r1, r4, lsl #7
3e8: e0411104 sub r1, r1, r4, lsl #2
3ec: e0811004 add r1, r1, r4
3f0: e1a03101 mov r3, r1, lsl #2
3f4: e0811003 add r1, r1, r3
3f8: e0421201 sub r1, r2, r1, lsl #4
3fc: ebffff65 bl 198 <put_dec_full4>
400: e1a03185 mov r3, r5, lsl #3
404: e0653003 rsb r3, r5, r3
408: e083300a add r3, r3, sl
40c: e0944003 adds r4, r4, r3
410: e1a01000 mov r1, r0
414: 0a000010 beq 45c <put_dec+0x1fc>
418: e083249b umull r2, r3, fp, r4
41c: e1a066a3 mov r6, r3, lsr #13
420: e1a01386 mov r1, r6, lsl #7
424: e0411106 sub r1, r1, r6, lsl #2
428: e0811006 add r1, r1, r6
42c: e1a03101 mov r3, r1, lsl #2
430: e0811003 add r1, r1, r3
434: e0441201 sub r1, r4, r1, lsl #4
438: ebffff56 bl 198 <put_dec_full4>
43c: e3560000 cmp r6, #0 ; 0x0
440: e1a01000 mov r1, r0
444: 0a000004 beq 45c <put_dec+0x1fc>
448: e1a01006 mov r1, r6
44c: ebffff51 bl 198 <put_dec_full4>
450: e1a01000 mov r1, r0
454: ea000000 b 45c <put_dec+0x1fc>
458: e2411001 sub r1, r1, #1 ; 0x1
45c: e5513001 ldrb r3, [r1, #-1]
460: e3530030 cmp r3, #48 ; 0x30
464: 0afffffb beq 458 <put_dec+0x1f8>
468: e1a00001 mov r0, r1
46c: e28dd00c add sp, sp, #12 ; 0xc
470: e8bd4ff0 ldmia sp!, {r4, r5, r6, r7, r8, r9, sl, fp, lr}
474: e12fff1e bx lr
478: d1b71759 movles r1, r9, asr r7
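
For readers decoding the listing: the two odd-looking "instructions" at 0x194 (1999999a) and 0x478 (d1b71759) are literal-pool constants, loaded by the pc-relative ldr at 0x4 and at 0x30c, which objdump has decoded as if they were code. The shift-and-add sequences elsewhere compute multiplications by 0x199a, 0xcd and 0xcccd. Together these are the usual division-by-reciprocal tricks. The C below is a sketch reconstructed from the disassembly, not copied from test_new.c; all helper names (div10_big, div10_mid, div10_small, div10000, sketch_put_dec_trunc8, sketch_put_dec_full4, sketch_value) are hypothetical, and the validity ranges in the comments follow from each constant's rounding error.

```c
#include <assert.h>
#include <stdint.h>

/* x / 10 via the high word of x * 0x1999999a (umull); exact for x < 2^30. */
static uint32_t div10_big(uint32_t x)
{
	return (uint32_t)((x * (uint64_t)0x1999999a) >> 32);
}

/* x / 10 as (x * 0x199a) >> 16; exact for x < 16384. */
static uint32_t div10_mid(uint32_t x)
{
	return (x * 0x199au) >> 16;
}

/* x / 10 as (x * 0xcd) >> 11; exact for x < 1024. */
static uint32_t div10_small(uint32_t x)
{
	return (x * 0xcdu) >> 11;
}

/* x / 10000 via the high word of x * 0xd1b71759, shifted right by 13
 * (45 bits in total); exact for every 32-bit x. */
static uint32_t div10000(uint32_t x)
{
	return (uint32_t)((x * (uint64_t)0xd1b71759) >> 45);
}

/* Digits of r (< 10^8), least significant first, no leading zeros. */
static char *sketch_put_dec_trunc8(char *buf, uint32_t r)
{
	uint32_t q;

	assert(r < 100000000);
	for (int i = 0; i < 4; i++) {	/* four umull-based steps */
		q = div10_big(r);
		*buf++ = (char)('0' + (r - 10 * q));
		if (q == 0)
			return buf;
		r = q;
	}
	q = div10_mid(r);		/* r < 10^4 here */
	*buf++ = (char)('0' + (r - 10 * q));
	if (q == 0)
		return buf;
	r = q;				/* r < 10^3 */
	q = div10_small(r);
	*buf++ = (char)('0' + (r - 10 * q));
	if (q == 0)
		return buf;
	r = q;				/* r < 10^2 */
	q = div10_small(r);
	*buf++ = (char)('0' + (r - 10 * q));
	if (q != 0)
		*buf++ = (char)('0' + q);
	return buf;
}

/* Exactly four digits of q (< 10^4), least significant first. */
static char *sketch_put_dec_full4(char *buf, uint32_t q)
{
	uint32_t r;

	assert(q < 10000);
	r = (q * 0xcccdu) >> 19;	/* q / 10 */
	*buf++ = (char)('0' + (q - 10 * r));
	q = div10_mid(r);
	*buf++ = (char)('0' + (r - 10 * q));
	r = div10_small(q);
	*buf++ = (char)('0' + (q - 10 * r));
	*buf++ = (char)('0' + r);
	return buf;
}

/* Usage helper: read the reversed digits in [buf, end) back as a number. */
static uint32_t sketch_value(const char *buf, const char *end)
{
	uint32_t v = 0;

	while (end > buf)
		v = v * 10 + (uint32_t)(*--end - '0');
	return v;
}
```

The first four divisions in put_dec_trunc8 need the 32x32->64 umull because r can be as large as 10^8 - 1; once the value is below 10^4, a 32-bit multiply and shift is exact, and gcc has expanded those constant multiplications into the shift-and-add sequences seen in the listing.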
--
vda