From: Scott Wood <scottwood@freescale.com>
To: christophe leroy <christophe.leroy@c-s.fr>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>,
Paul Mackerras <paulus@samba.org>,
linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org
Subject: Re: [v2,2/2] powerpc32: add support for csum_add()
Date: Fri, 1 May 2015 20:00:14 -0500
Message-ID: <1430528414.16357.201.camel@freescale.com>
In-Reply-To: <553FD904.8000309@c-s.fr>
On Tue, 2015-04-28 at 21:01 +0200, christophe leroy wrote:
>
>
> On 25/03/2015 02:30, Scott Wood wrote:
>
> > On Tue, Feb 03, 2015 at 12:39:27PM +0100, LEROY Christophe wrote:
> > > The C version of csum_add() as defined in include/net/checksum.h gives the
> > > following assembly:
> > > 0: 7c 04 1a 14 add r0,r4,r3
> > > 4: 7c 64 00 10 subfc r3,r4,r0
> > > 8: 7c 63 19 10 subfe r3,r3,r3
> > > c: 7c 63 00 50 subf r3,r3,r0
> > >
> > > include/net/checksum.h also offers the possibility to define an arch specific
> > > function.
> > > This patch provides a ppc32 specific csum_add() inline function.
> > What makes it 32-bit specific?
> >
> >
> As far as I understand, 64-bit will do a 64-bit addition, so we
> will have to handle the carry differently; it can't just be an addze
> like in 32-bit.
OK. Before I couldn't find where this was ifdeffed to 32-bit, but it's
in patch 1/2.
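
To make the carry point concrete, here is a minimal user-space sketch (not kernel code: uint32_t stands in for __wsum, and both function names are made up for illustration). In a 32-bit add the sum wraps and the carry has to be recovered afterwards; in a 64-bit add of zero-extended operands nothing wraps, and the carry simply shows up as bit 32 of the result:

```c
#include <stdint.h>

/* 32-bit view: returns 1 if csum + addend wrapped modulo 2^32, else 0. */
static inline uint32_t carry_out_32(uint32_t csum, uint32_t addend)
{
	uint32_t res = csum + addend;	/* unsigned wraparound is well defined */

	return res < addend;		/* wrapped exactly when res < addend */
}

/* 64-bit view: the same carry is bit 32 of the widened sum. */
static inline uint32_t carry_out_64(uint32_t csum, uint32_t addend)
{
	uint64_t res = (uint64_t)csum + addend;	/* cannot overflow 64 bits */

	return (uint32_t)(res >> 32);
}
```

Both helpers should agree for any pair of inputs; the difference is only in where the carry lives, which is why a single addze does not carry over to the 64-bit case.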
> The generated code is most likely different on ppc64. I have no ppc64
> compiler so I can't check what gcc generates for the following code:
>
> __wsum csum_add(__wsum csum, __wsum addend)
> {
> 	u32 res = (__force u32)csum;
> 	res += (__force u32)addend;
> 	return (__force __wsum)(res + (res < (__force u32)addend));
> }
>
> Can someone with a ppc64 compiler tell us what we get?
With CONFIG_GENERIC_CPU:
0xc000000000001af8 <+0>: add r3,r3,r4
0xc000000000001afc <+4>: cmplw cr7,r3,r4
0xc000000000001b00 <+8>: mfcr r4
0xc000000000001b04 <+12>: rlwinm r4,r4,29,31,31
0xc000000000001b08 <+16>: add r3,r4,r3
0xc000000000001b0c <+20>: clrldi r3,r3,32
0xc000000000001b10 <+24>: blr
The mfcr is particularly nasty, at least on our chips.
With CONFIG_CPU_E6500:
0xc000000000001b30 <+0>: add r3,r3,r4
0xc000000000001b34 <+4>: cmplw cr7,r3,r4
0xc000000000001b38 <+8>: mfocrf r4,1
0xc000000000001b3c <+12>: rlwinm r4,r4,29,31,31
0xc000000000001b40 <+16>: add r3,r4,r3
0xc000000000001b44 <+20>: clrldi r3,r3,32
0xc000000000001b48 <+24>: blr
Ideal (short of a 64-bit __wsum) would probably be something like (untested):
add r3,r3,r4
srdi r5,r3,32
add r3,r3,r5
clrldi r3,r3,32
Or in C code (which would let the compiler schedule it better):
static inline __wsum csum_add(__wsum csum, __wsum addend)
{
	u64 res = (__force u64)csum;
	res += (__force u32)addend;
	return (__force __wsum)((u32)res + (res >> 32));
}
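
Since the above is untested, a quick user-space check of the folding idea might look like the sketch below. It is an assumption-laden stand-in, not the kernel function: uint32_t replaces __wsum, the __force casts are dropped, and csum_add_fold and csum_add_mismatches are hypothetical names. It compares the 64-bit fold against the compare-based form on a handful of edge values:

```c
#include <stddef.h>
#include <stdint.h>

/* 64-bit form: add in 64 bits, then fold the carry (bit 32) into the low word. */
static inline uint32_t csum_add_fold(uint32_t csum, uint32_t addend)
{
	uint64_t res = (uint64_t)csum + addend;

	return (uint32_t)res + (uint32_t)(res >> 32);
}

/* Counts edge-case input pairs where the fold disagrees with the compare-based form. */
static int csum_add_mismatches(void)
{
	static const uint32_t v[] = {
		0, 1, 2, 0x7fffffffu, 0x80000000u, 0xfffffffeu, 0xffffffffu
	};
	const size_t n = sizeof(v) / sizeof(v[0]);
	int bad = 0;

	for (size_t i = 0; i < n; i++) {
		for (size_t j = 0; j < n; j++) {
			uint32_t s = v[i] + v[j];		/* wrapping 32-bit sum */
			uint32_t ref = s + (s < v[j]);		/* compare-based carry */

			if (csum_add_fold(v[i], v[j]) != ref)
				bad++;
		}
	}
	return bad;
}
```

A return value of 0 from csum_add_mismatches() would mean the two forms agree on these inputs; it says nothing about the code gcc generates for either.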
-Scott