Re: [net-next PATCH] csum: Update csum_block_add to use rotate instead of byteswap

public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed

From: Joe Perches <joe@perches.com>
To: Alexander Duyck <alexander.duyck@gmail.com>,
	Tom Herbert <tom@herbertland.com>
Cc: Alexander Duyck <aduyck@mirantis.com>,
	Netdev <netdev@vger.kernel.org>,
	David Miller <davem@davemloft.net>
Subject: Re: [net-next PATCH] csum: Update csum_block_add to use rotate instead of byteswap
Date: Wed, 09 Mar 2016 16:18:33 -0800	[thread overview]
Message-ID: <1457569113.3433.7.camel@perches.com> (raw)
In-Reply-To: <CAKgT0UebO62GBsmL17JrZW0Ptzmr05buc1x6pHv6A_PAr4HBLQ@mail.gmail.com>

On Wed, 2016-03-09 at 08:08 -0800, Alexander Duyck wrote:
> On Tue, Mar 8, 2016 at 10:31 PM, Tom Herbert <tom@herbertland.com> wrote:
> > I took a look inlining these.
> > 
> > #define rol32(V, X) ({                          \
> >         int word = V;                           \
> >         if (__builtin_constant_p(X))            \
> >                 asm("roll $" #X ",%[word]\n\t"  \
> >                     : [word] "=r" (word));      \
> >         else                                    \
> >                 asm("roll %%cl,%[word]\n\t"     \
> >                     : [word] "=r" (word)        \
> >                     : "c" (X));                 \
> >         word;                                   \
> > })
> > 
> > With this I'm seeing a nice speedup in jhash which uses a lot of rol32s...
> Is gcc really not converting the rol32 calls into rotates?

No, it is.

The difference in the object code with the asm for instance is:

(old, compiled with gcc 5.3.1)

<jhash_2words.constprop.5>:
     84e:       81 ee 09 41 52 21       sub    $0x21524109,%esi
     854:       81 ef 09 41 52 21       sub    $0x21524109,%edi
     85a:       55                      push   %rbp
     85b:       89 f0                   mov    %esi,%eax
     85d:       89 f2                   mov    %esi,%edx
     85f:       48 ff 05 00 00 00 00    incq   0x0(%rip)        # 866 <jhash_2words.constprop.5+0x18>
     866:       c1 c2 0e                rol    $0xe,%edx
     869:       35 f7 be ad de          xor    $0xdeadbef7,%eax
     86e:       48 89 e5                mov    %rsp,%rbp
     871:       29 d0                   sub    %edx,%eax
     873:       48 ff 05 00 00 00 00    incq   0x0(%rip)        # 87a <jhash_2words.constprop.5+0x2c>
     87a:       48 ff 05 00 00 00 00    incq   0x0(%rip)        # 881 <jhash_2words.constprop.5+0x33>
     881:       89 c2                   mov    %eax,%edx
     883:       31 c7                   xor    %eax,%edi
     885:       c1 c2 0b                rol    $0xb,%edx
     888:       29 d7                   sub    %edx,%edi
     88a:       89 fa                   mov    %edi,%edx
     88c:       31 fe                   xor    %edi,%esi
     88e:       c1 ca 07                ror    $0x7,%edx
     891:       29 d6                   sub    %edx,%esi
     893:       89 f2                   mov    %esi,%edx
     895:       31 f0                   xor    %esi,%eax
     897:       c1 c2 10                rol    $0x10,%edx
     89a:       29 d0                   sub    %edx,%eax
     89c:       89 c2                   mov    %eax,%edx
     89e:       31 c7                   xor    %eax,%edi
     8a0:       c1 c2 04                rol    $0x4,%edx
     8a3:       29 d7                   sub    %edx,%edi
     8a5:       31 fe                   xor    %edi,%esi
     8a7:       c1 c7 0e                rol    $0xe,%edi
     8aa:       29 fe                   sub    %edi,%esi
     8ac:       31 f0                   xor    %esi,%eax
     8ae:       c1 ce 08                ror    $0x8,%esi
     8b1:       29 f0                   sub    %esi,%eax
     8b3:       5d                      pop    %rbp
     8b4:       c3                      retq   

vs Tom's asm

000000000000084e <jhash_2words.constprop.5>:
     84e:       81 ee 09 41 52 21       sub    $0x21524109,%esi
     854:       8d 87 f7 be ad de       lea    -0x21524109(%rdi),%eax
     85a:       55                      push   %rbp
     85b:       89 f2                   mov    %esi,%edx
     85d:       48 ff 05 00 00 00 00    incq   0x0(%rip)        # 864 <jhash_2words.constprop.5+0x16>
     864:       48 ff 05 00 00 00 00    incq   0x0(%rip)        # 86b <jhash_2words.constprop.5+0x1d>
     86b:       81 f2 f7 be ad de       xor    $0xdeadbef7,%edx
     871:       48 89 e5                mov    %rsp,%rbp
     874:       c1 c1 0e                rol    $0xe,%ecx
     877:       29 ca                   sub    %ecx,%edx
     879:       31 d0                   xor    %edx,%eax
     87b:       c1 c7 0b                rol    $0xb,%edi
     87e:       29 f8                   sub    %edi,%eax
     880:       48 ff 05 00 00 00 00    incq   0x0(%rip)        # 887 <jhash_2words.constprop.5+0x39>
     887:       31 c6                   xor    %eax,%esi
     889:       c1 c7 19                rol    $0x19,%edi
     88c:       29 fe                   sub    %edi,%esi
     88e:       31 f2                   xor    %esi,%edx
     890:       c1 c7 10                rol    $0x10,%edi
     893:       29 fa                   sub    %edi,%edx
     895:       31 d0                   xor    %edx,%eax
     897:       c1 c7 04                rol    $0x4,%edi
     89a:       29 f8                   sub    %edi,%eax
     89c:       31 f0                   xor    %esi,%eax
     89e:       29 c8                   sub    %ecx,%eax
     8a0:       31 d0                   xor    %edx,%eax
     8a2:       5d                      pop    %rbp
     8a3:       c1 c2 18                rol    $0x18,%edx
     8a6:       29 d0                   sub    %edx,%eax
     8a8:       c3                      retq   

> If we need this type of code in order to get the rotates to occur as
> expected then maybe we need to look at doing arch specific versions of
> the functions in bitops.h in order to improve the performance since I
> know these calls are used in some performance critical paths such as
> crypto and hashing.

Yeah, maybe, but why couldn't gcc generate similar code
as Tom's asm? (modulo the ripple reducing ror vs rol uses
when the shift is > 16

next prev parent reply	other threads:[~2016-03-10  0:18 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-03-08 22:42 [net-next PATCH] csum: Update csum_block_add to use rotate instead of byteswap Alexander Duyck
2016-03-08 23:25 ` Joe Perches
2016-03-09  5:23   ` Alexander Duyck
2016-03-09  5:50     ` Joe Perches
2016-03-09  6:08       ` Alexander Duyck
2016-03-09  6:31         ` Tom Herbert
2016-03-09 16:08           ` Alexander Duyck
2016-03-10  0:18             ` Joe Perches [this message]
2016-03-10  0:58               ` Tom Herbert
2016-03-09 10:54   ` David Laight
2016-03-09 16:03     ` Alexander Duyck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1457569113.3433.7.camel@perches.com \
    --to=joe@perches.com \
    --cc=aduyck@mirantis.com \
    --cc=alexander.duyck@gmail.com \
    --cc=davem@davemloft.net \
    --cc=netdev@vger.kernel.org \
    --cc=tom@herbertland.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox