From: Joe Perches <joe@perches.com>
To: Alexander Duyck <alexander.duyck@gmail.com>,
Tom Herbert <tom@herbertland.com>
Cc: Alexander Duyck <aduyck@mirantis.com>,
Netdev <netdev@vger.kernel.org>,
David Miller <davem@davemloft.net>
Subject: Re: [net-next PATCH] csum: Update csum_block_add to use rotate instead of byteswap
Date: Wed, 09 Mar 2016 16:18:33 -0800
Message-ID: <1457569113.3433.7.camel@perches.com>
In-Reply-To: <CAKgT0UebO62GBsmL17JrZW0Ptzmr05buc1x6pHv6A_PAr4HBLQ@mail.gmail.com>
On Wed, 2016-03-09 at 08:08 -0800, Alexander Duyck wrote:
> On Tue, Mar 8, 2016 at 10:31 PM, Tom Herbert <tom@herbertland.com> wrote:
> > I took a look inlining these.
> >
> > #define rol32(V, X) ({ \
> > int word = V; \
> > if (__builtin_constant_p(X)) \
> > asm("roll $" #X ",%[word]\n\t" \
> > : [word] "=r" (word)); \
> > else \
> > asm("roll %%cl,%[word]\n\t" \
> > : [word] "=r" (word) \
> > : "c" (X)); \
> > word; \
> > })
> >
> > With this I'm seeing a nice speedup in jhash which uses a lot of rol32s...
> Is gcc really not converting the rol32 calls into rotates?
No, it is.
The difference in the object code with the asm for instance is:
(old, compiled with gcc 5.3.1)
<jhash_2words.constprop.5>:
84e: 81 ee 09 41 52 21 sub $0x21524109,%esi
854: 81 ef 09 41 52 21 sub $0x21524109,%edi
85a: 55 push %rbp
85b: 89 f0 mov %esi,%eax
85d: 89 f2 mov %esi,%edx
85f: 48 ff 05 00 00 00 00 incq 0x0(%rip) # 866 <jhash_2words.constprop.5+0x18>
866: c1 c2 0e rol $0xe,%edx
869: 35 f7 be ad de xor $0xdeadbef7,%eax
86e: 48 89 e5 mov %rsp,%rbp
871: 29 d0 sub %edx,%eax
873: 48 ff 05 00 00 00 00 incq 0x0(%rip) # 87a <jhash_2words.constprop.5+0x2c>
87a: 48 ff 05 00 00 00 00 incq 0x0(%rip) # 881 <jhash_2words.constprop.5+0x33>
881: 89 c2 mov %eax,%edx
883: 31 c7 xor %eax,%edi
885: c1 c2 0b rol $0xb,%edx
888: 29 d7 sub %edx,%edi
88a: 89 fa mov %edi,%edx
88c: 31 fe xor %edi,%esi
88e: c1 ca 07 ror $0x7,%edx
891: 29 d6 sub %edx,%esi
893: 89 f2 mov %esi,%edx
895: 31 f0 xor %esi,%eax
897: c1 c2 10 rol $0x10,%edx
89a: 29 d0 sub %edx,%eax
89c: 89 c2 mov %eax,%edx
89e: 31 c7 xor %eax,%edi
8a0: c1 c2 04 rol $0x4,%edx
8a3: 29 d7 sub %edx,%edi
8a5: 31 fe xor %edi,%esi
8a7: c1 c7 0e rol $0xe,%edi
8aa: 29 fe sub %edi,%esi
8ac: 31 f0 xor %esi,%eax
8ae: c1 ce 08 ror $0x8,%esi
8b1: 29 f0 sub %esi,%eax
8b3: 5d pop %rbp
8b4: c3 retq
vs Tom's asm
000000000000084e <jhash_2words.constprop.5>:
84e: 81 ee 09 41 52 21 sub $0x21524109,%esi
854: 8d 87 f7 be ad de lea -0x21524109(%rdi),%eax
85a: 55 push %rbp
85b: 89 f2 mov %esi,%edx
85d: 48 ff 05 00 00 00 00 incq 0x0(%rip) # 864 <jhash_2words.constprop.5+0x16>
864: 48 ff 05 00 00 00 00 incq 0x0(%rip) # 86b <jhash_2words.constprop.5+0x1d>
86b: 81 f2 f7 be ad de xor $0xdeadbef7,%edx
871: 48 89 e5 mov %rsp,%rbp
874: c1 c1 0e rol $0xe,%ecx
877: 29 ca sub %ecx,%edx
879: 31 d0 xor %edx,%eax
87b: c1 c7 0b rol $0xb,%edi
87e: 29 f8 sub %edi,%eax
880: 48 ff 05 00 00 00 00 incq 0x0(%rip) # 887 <jhash_2words.constprop.5+0x39>
887: 31 c6 xor %eax,%esi
889: c1 c7 19 rol $0x19,%edi
88c: 29 fe sub %edi,%esi
88e: 31 f2 xor %esi,%edx
890: c1 c7 10 rol $0x10,%edi
893: 29 fa sub %edi,%edx
895: 31 d0 xor %edx,%eax
897: c1 c7 04 rol $0x4,%edi
89a: 29 f8 sub %edi,%eax
89c: 31 f0 xor %esi,%eax
89e: 29 c8 sub %ecx,%eax
8a0: 31 d0 xor %edx,%eax
8a2: 5d pop %rbp
8a3: c1 c2 18 rol $0x18,%edx
8a6: 29 d0 sub %edx,%eax
8a8: c3 retq
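A note on the listing above: the asm version rotates %ecx and %edi
without first loading them with the values being hashed (%ecx is never
written in this function before "rol $0xe,%ecx"). That is consistent
with the macro's "=r" (word) constraint, which declares word as
output-only, so the compiler is free to drop the "int word = V"
initialisation. The shorter code therefore appears to compute something
different rather than being a better schedule. A minimal sketch of a
constraint set that keeps the input alive follows. The names rol32_asm
and rol32_portable are hypothetical, and the "+r"/"cI" operand choice
is an assumption about what was intended, not code from this thread.

```c
#include <stdint.h>

/* Portable rotate-left: the shift-and-or form that compilers can
 * pattern-match into a single rotate instruction. */
static inline uint32_t rol32_portable(uint32_t word, unsigned int shift)
{
	return (word << (shift & 31)) | (word >> ((-shift) & 31));
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
/* Hypothetical asm variant (assumption, not from the thread).
 * "+r": word is read AND written, so its initial value stays live.
 * "cI": the count is either in %cl or an immediate in 0..31; the
 *       uint8_t cast makes the register alternative print as %cl. */
static inline uint32_t rol32_asm(uint32_t word, unsigned int shift)
{
	__asm__("roll %1, %0"
		: "+r" (word)
		: "cI" ((uint8_t)shift)
		: "cc");
	return word;
}
#else
/* Non-x86 or non-GNU toolchains: fall back to the portable form. */
static inline uint32_t rol32_asm(uint32_t word, unsigned int shift)
{
	return rol32_portable(word, shift);
}
#endif
```

With a constant shift the compiler can pick the immediate form;
otherwise the count goes through %cl. Any speedup measured with the
output-only macro would need re-checking against a version like this.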
> If we need this type of code in order to get the rotates to occur as
> expected then maybe we need to look at doing arch specific versions of
> the functions in bitops.h in order to improve the performance since I
> know these calls are used in some performance critical paths such as
> crypto and hashing.
Yeah, maybe, but why couldn't gcc generate similar code
as Tom's asm? (modulo the ripple-reducing ror vs rol uses
when the shift is > 16)
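On the ror vs rol point: a left rotate by n is the same operation as a
right rotate by 32 - n, which is why the gcc listing shows "ror $0x7"
and "ror $0x8" where the asm listing has "rol $0x19" and "rol $0x18".
A small sketch of the identity, using hypothetical helper names
(rol32_sketch, ror32_sketch) rather than the kernel's own definitions:

```c
#include <stdint.h>

/* Rotate left by shift bits (shift taken modulo 32). */
static inline uint32_t rol32_sketch(uint32_t word, unsigned int shift)
{
	return (word << (shift & 31)) | (word >> ((-shift) & 31));
}

/* Rotate right by shift bits (shift taken modulo 32). */
static inline uint32_t ror32_sketch(uint32_t word, unsigned int shift)
{
	return (word >> (shift & 31)) | (word << ((-shift) & 31));
}

/* Returns 1 when rotating left by n equals rotating right by 32 - n. */
static inline int rotates_agree(uint32_t word, unsigned int n)
{
	return rol32_sketch(word, n) == ror32_sketch(word, 32 - n);
}
```

For example, rotates_agree(x, 25) compares a left rotate by 25 with a
right rotate by 7, the pair seen in the two listings.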