From: Alexey Brodkin <Alexey.Brodkin@synopsys.com>
To: "nicolas.pitre@linaro.org" <nicolas.pitre@linaro.org>
Cc: "rmk+kernel@arm.linux.org.uk" <rmk+kernel@arm.linux.org.uk>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"Vineet.Gupta1@synopsys.com" <Vineet.Gupta1@synopsys.com>,
"shemminger@linux-foundation.org"
<shemminger@linux-foundation.org>,
"mingo@elte.hu" <mingo@elte.hu>,
"linux-snps-arc@lists.infradead.org"
<linux-snps-arc@lists.infradead.org>,
"davem@davemloft.net" <davem@davemloft.net>
Subject: Re: [PATCH] __div64_32: implement division by multiplication for 32-bit arches
Date: Fri, 30 Oct 2015 14:28:39 +0000 [thread overview]
Message-ID: <1446215318.4394.32.camel@synopsys.com> (raw)
In-Reply-To: <alpine.LFD.2.20.1510292112590.630@knanqh.ubzr>
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 4738 bytes --]
Hi Nicolas,
On Thu, 2015-10-29 at 21:26 -0400, Nicolas Pitre wrote:
> On Wed, 28 Oct 2015, Nicolas Pitre wrote:
>
> > On Thu, 29 Oct 2015, Alexey Brodkin wrote:
> >
> > > Fortunately we already have much better __div64_32() for 32-bit ARM.
> > > There in case of division by constant preprocessor calculates so-called
> > > "magic number" which is later used in multiplications instead of divisions.
> >
> > It's not magic, it is science. :-)
> >
> > > It's really nice and very optimal but obviously works only for ARM
> > > because ARM assembly is involved.
> > >
> > > Now why don't we extend the same approach to all other 32-bit arches
> > > with multiplication part implemented in pure C. With good compiler
> > > resulting assembly will be quite close to manually written assembly.
>
> Well... not as close at least on ARM. Maybe 2x to 3x more costly than
> the one with assembly. Still better than 100x or so without this
> optimization.
Indeed even having that function 25 times faster instead of 100 times is
already quite an achievement. For example that will already cure my iperf
performance degradation: I'll see do_div() taking < 1% instead of > 10% now.
My test source was:
--------------------->8------------------------
int myfunc(u64 data)
{
return do_div(data, 1000);
}
--------------------->8------------------------
Now take a look at disassembly that I'm getting:
--------------------->8------------------------
0000062c <myfunc>:
62c: 19 28 86 0f 4f 8d 3b df mpydu r6,r0,0x8d4fdf3b
634: 00 26 86 8f 4f 8d 3b df add.f r6,r6,0x8d4fdf3b
63c: 02 27 87 0f ed 7c 69 91 sub r7,r7,0x7ced9169
644: c0 27 65 00 add.c r7,r7,1
648: 85 0e c4 71 12 83 97 6e brlo 0x83126e97,r7,6cc <myfunc+0xa0>
650: 75 0f 80 0f 12 83 97 6e breq r7,0x83126e97,6c4 <myfunc+0x98>
658: 0d 70 mov_s r8,0
65a: 2d 71 mov_s r9,1
65c: 19 29 8a 0f 4f 8d 3b df mpydu r10,r1,0x8d4fdf3b
664: 00 27 84 82 add.f r4,r7,r10
668: ac 70 mov_s r5,0
66a: 19 28 86 0f 12 83 97 6e mpydu r6,r0,0x83126e97
672: 01 25 c5 02 adc r5,r5,r11
676: 00 24 04 82 add.f r4,r4,r8
67a: 00 25 45 02 add r5,r5,r9
67e: c0 25 65 00 add.c r5,r5,1
682: 00 26 06 81 add.f r6,r6,r4
686: 01 27 47 01 adc r7,r7,r5
68a: 51 0f 44 01 brlo r7,r5,6d8 <myfunc+0xac>
68e: 49 0d c0 01 breq r5,r7,6d4 <myfunc+0xa8>
692: 8c 70 mov_s r4,0
694: ac 70 mov_s r5,0
696: e0 42 mov_s r2,r7
698: 19 29 86 0f 12 83 97 6e mpydu r6,r1,0x83126e97
6a0: 00 22 82 81 add.f r2,r2,r6
6a4: 6c 70 mov_s r3,0
6a6: 01 23 c3 01 adc r3,r3,r7
6aa: 00 22 02 81 add.f r2,r2,r4
6ae: a0 73 add_s r3,r3,r5
6b0: c0 23 65 00 add.c r3,r3,1
6b4: 29 ba lsr_s r2,r2,9
6b6: 17 bb asl_s r3,r3,23
6b8: 65 7a or_s r2,r2,r3
6ba: 9a 22 0f 0a mpy r2,r2,0x3e8
6be: e0 7f j_s.d [blink]
6c0: 42 78 sub_s r0,r0,r2
6c2: e0 78 nop_s
6c4: 95 0e 85 f1 4f 8d 3a df brhs.nt 0x8d4fdf3a,r6,658 <myfunc+0x2c>
6cc: 0d 70 mov_s r8,0
6ce: 91 07 ef ff b.d 65c <myfunc+0x30>
6d2: 2d 70 mov_s r9,0
6d4: bf 0e 05 81 brhs.nt r6,r4,692 <myfunc+0x66>
6d8: 8c 70 mov_s r4,0
6da: bf 07 ef ff b.d 696 <myfunc+0x6a>
6de: ac 71 mov_s r5,1
--------------------->8------------------------
What you see here is pretty straight-forward conversion to assembly of "run-time calculations".
Things to note:
[1] Only 5 multiplications are used. That's because we have 32x32 multiplication unit
that returns 64-bit result in register pair.
[2] Indeed lots of moves and additions happen here.
So my conclusion would be:
[1] Proposed implementation makes perfect sense because already speeds-up do_div()
significantly.
[2] Ability to substitute "run-time calculations" on per-arch basis would be awsome
because with few lines of assembly another 2-4 times of improvement could be
achieved.
-Alexeyÿôèº{.nÇ+·®+%Ëÿ±éݶ\x17¥wÿº{.nÇ+·¥{±þG«éÿ{ayº\x1dÊÚë,j\a¢f£¢·hïêÿêçz_è®\x03(éÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?¨èÚ&£ø§~á¶iOæ¬z·vØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?I¥
next prev parent reply other threads:[~2015-10-30 14:28 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-10-28 22:47 [PATCH] __div64_32: implement division by multiplication for 32-bit arches Alexey Brodkin
2015-10-28 23:32 ` Nicolas Pitre
2015-10-29 7:34 ` Alexey Brodkin
2015-10-30 1:26 ` Nicolas Pitre
2015-10-30 5:41 ` Vineet Gupta
2015-10-30 12:41 ` Måns Rullgård
2015-10-30 12:40 ` Måns Rullgård
2015-10-30 15:17 ` Nicolas Pitre
2015-10-30 15:54 ` Alexey Brodkin
2015-10-30 16:55 ` Nicolas Pitre
2015-10-30 17:45 ` Måns Rullgård
2015-11-04 23:46 ` Nicolas Pitre
2015-11-04 23:48 ` Nicolas Pitre
2015-11-05 3:13 ` Vineet Gupta
2015-11-05 5:06 ` Nicolas Pitre
2015-11-04 23:49 ` Måns Rullgård
2015-10-30 14:28 ` Alexey Brodkin [this message]
2015-10-29 0:36 ` kbuild test robot
2015-10-29 12:52 ` Måns Rullgård
2015-10-29 13:05 ` Alexey Brodkin
2015-10-29 13:37 ` Måns Rullgård
2015-10-29 13:31 ` Russell King - ARM Linux
2015-10-29 14:32 ` Alexey Brodkin
2015-10-29 17:09 ` Randy Dunlap
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1446215318.4394.32.camel@synopsys.com \
--to=alexey.brodkin@synopsys.com \
--cc=Vineet.Gupta1@synopsys.com \
--cc=davem@davemloft.net \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-snps-arc@lists.infradead.org \
--cc=mingo@elte.hu \
--cc=nicolas.pitre@linaro.org \
--cc=rmk+kernel@arm.linux.org.uk \
--cc=shemminger@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox