* [PATCH] - trivial - Improve appletalk checksum calculation
@ 2007-10-22 19:36 Joe Perches
2007-10-23 0:35 ` David Miller
2007-10-23 3:30 ` Stephen Hemminger
0 siblings, 2 replies; 11+ messages in thread
From: Joe Perches @ 2007-10-22 19:36 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo; +Cc: netdev, David S. Miller
It's a bit after 2.6.1 now...
Removes unnecessary if, uses 16 bit rotate left.
Performance improves ~30%
Signed-off-by: Joe Perches <joe@perches.com>
diff --git a/net/appletalk/ddp.c b/net/appletalk/ddp.c
index 7c0b515..1c50f4c 100644
--- a/net/appletalk/ddp.c
+++ b/net/appletalk/ddp.c
@@ -925,15 +925,9 @@ static int atrtr_ioctl(unsigned int cmd, void __user *arg)
static unsigned long atalk_sum_partial(const unsigned char *data,
int len, unsigned long sum)
{
- /* This ought to be unwrapped neatly. I'll trust gcc for now */
while (len--) {
- sum += *data;
- sum <<= 1;
- if (sum & 0x10000) {
- sum++;
- sum &= 0xffff;
- }
- data++;
+ sum += *data++;
+ sum = ((sum & 0x8000)>>15) | ((sum & 0x7fff)<<1);
}
return sum;
}
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH] - trivial - Improve appletalk checksum calculation
2007-10-22 19:36 [PATCH] - trivial - Improve appletalk checksum calculation Joe Perches
@ 2007-10-23 0:35 ` David Miller
2007-10-23 1:36 ` Joe Perches
2007-10-23 3:30 ` Stephen Hemminger
1 sibling, 1 reply; 11+ messages in thread
From: David Miller @ 2007-10-23 0:35 UTC (permalink / raw)
To: joe; +Cc: acme, netdev
From: Joe Perches <joe@perches.com>
Date: Mon, 22 Oct 2007 12:36:19 -0700
> It's a bit after 2.6.1 now...
>
> Removes unnecessary if, uses 16 bit rotate left.
> Performance improves ~30%
>
> Signed-off-by: Joe Perches <joe@perches.com>
I'm not sure your transformation is equivalent. Did you
actually test this with appletalk traffic to make sure
the sum is computed correctly? Or more simply, did you
write a test program to send some test data through the
old and new functions?
Here is what I think the problem is:
> static unsigned long atalk_sum_partial(const unsigned char *data,
> int len, unsigned long sum)
> {
> - /* This ought to be unwrapped neatly. I'll trust gcc for now */
> while (len--) {
> - sum += *data;
> - sum <<= 1;
> - if (sum & 0x10000) {
> - sum++;
> - sum &= 0xffff;
> - }
> - data++;
> + sum += *data++;
> + sum = ((sum & 0x8000)>>15) | ((sum & 0x7fff)<<1);
> }
> return sum;
> }
The old code is handling overflow, so that when bit 16 gets set by and
addition, that overflow bit gets cleared and then added back into the
sum.
Your code is rotating bit 15 down by one bit and bits 0-14 up by one
bit, my remedial math knowledge tell me that's likely not the same
thing. :-)
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] - trivial - Improve appletalk checksum calculation
2007-10-23 0:35 ` David Miller
@ 2007-10-23 1:36 ` Joe Perches
2007-10-23 1:43 ` David Miller
0 siblings, 1 reply; 11+ messages in thread
From: Joe Perches @ 2007-10-23 1:36 UTC (permalink / raw)
To: David Miller; +Cc: acme, netdev
On Mon, 2007-10-22 at 17:35 -0700, David Miller wrote:
> Your code is rotating bit 15 down by one bit and bits 0-14 up by one
> bit.
Yes, a 16 bit rotate left.
There was a discussion a few years ago:
http://oss.sgi.com/archives/netdev/2003-10/msg00734.html
>From the spec:
Implementers of DDP should treat generating the checksum as an optional
feature. The 16-bit DDP checksum is computed as follows:
CkSum := 0 ;
FOR each datagram byte starting with the byte immediately following this
Checksum field
REPEAT the following algorithm:
CkSum := CkSum + byte; (unsigned addition)
Rotate CkSum left one bit, rotating the
most significant bit in least significant bit;
IF, at the end, CkSum = 0 THEN
CkSum := $FFFF (all ones).
Reception of a datagram with CkSum equal to 0 implies that a checksum is
not performed.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] - trivial - Improve appletalk checksum calculation
2007-10-23 1:36 ` Joe Perches
@ 2007-10-23 1:43 ` David Miller
2007-10-23 1:53 ` Joe Perches
0 siblings, 1 reply; 11+ messages in thread
From: David Miller @ 2007-10-23 1:43 UTC (permalink / raw)
To: joe; +Cc: acme, netdev
From: Joe Perches <joe@perches.com>
Date: Mon, 22 Oct 2007 18:36:28 -0700
> On Mon, 2007-10-22 at 17:35 -0700, David Miller wrote:
> > Your code is rotating bit 15 down by one bit and bits 0-14 up by one
> > bit.
>
> Yes, a 16 bit rotate left.
>
> There was a discussion a few years ago:
> http://oss.sgi.com/archives/netdev/2003-10/msg00734.html
>
> >From the spec:
...
> Reception of a datagram with CkSum equal to 0 implies that a checksum is
> not performed.
Ok, but again did you test it?
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] - trivial - Improve appletalk checksum calculation
2007-10-23 1:43 ` David Miller
@ 2007-10-23 1:53 ` Joe Perches
2007-10-23 3:51 ` Herbert Xu
0 siblings, 1 reply; 11+ messages in thread
From: Joe Perches @ 2007-10-23 1:53 UTC (permalink / raw)
To: David Miller; +Cc: acme, netdev
On Mon, 2007-10-22 at 18:43 -0700, David Miller wrote:
> Ok, but again did you test it?
Nope. Stephen Hemminger did in 2003.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] - trivial - Improve appletalk checksum calculation
2007-10-22 19:36 [PATCH] - trivial - Improve appletalk checksum calculation Joe Perches
2007-10-23 0:35 ` David Miller
@ 2007-10-23 3:30 ` Stephen Hemminger
2007-10-23 3:39 ` Joe Perches
1 sibling, 1 reply; 11+ messages in thread
From: Stephen Hemminger @ 2007-10-23 3:30 UTC (permalink / raw)
To: Joe Perches; +Cc: Arnaldo Carvalho de Melo, netdev, David S. Miller
On Mon, 22 Oct 2007 12:36:19 -0700
Joe Perches <joe@perches.com> wrote:
> It's a bit after 2.6.1 now...
>
> Removes unnecessary if, uses 16 bit rotate left.
> Performance improves ~30%
>
> Signed-off-by: Joe Perches <joe@perches.com>
>
> diff --git a/net/appletalk/ddp.c b/net/appletalk/ddp.c
> index 7c0b515..1c50f4c 100644
> --- a/net/appletalk/ddp.c
> +++ b/net/appletalk/ddp.c
> @@ -925,15 +925,9 @@ static int atrtr_ioctl(unsigned int cmd, void __user *arg)
> static unsigned long atalk_sum_partial(const unsigned char *data,
> int len, unsigned long sum)
> {
> - /* This ought to be unwrapped neatly. I'll trust gcc for now */
> while (len--) {
> - sum += *data;
> - sum <<= 1;
> - if (sum & 0x10000) {
> - sum++;
> - sum &= 0xffff;
> - }
> - data++;
> + sum += *data++;
> + sum = ((sum & 0x8000)>>15) | ((sum & 0x7fff)<<1);
> }
> return sum;
> }
>
The end of the message you quoted was:
> Corrected fast code is:
>
> while (len--) {
> sum += *data++;
> sum <<= 1;
> sum = (((sum & 0x10000) >> 16) + sum) & 0xffff;
> }
>
> At least it is correct on the standalone random data test, and the
> new code is 30% faster for the cached memory case (13.7 clks/byte vs 18
> clks/byte).
Your code looks different...
--
Stephen Hemminger <shemminger@linux-foundation.org>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] - trivial - Improve appletalk checksum calculation
2007-10-23 3:30 ` Stephen Hemminger
@ 2007-10-23 3:39 ` Joe Perches
0 siblings, 0 replies; 11+ messages in thread
From: Joe Perches @ 2007-10-23 3:39 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: Arnaldo Carvalho de Melo, netdev, David S. Miller
On Mon, 2007-10-22 at 20:30 -0700, Stephen Hemminger wrote:
> > Corrected fast code is:
> >
> > while (len--) {
> > sum += *data++;
> > sum <<= 1;
> > sum = (((sum & 0x10000) >> 16) + sum) & 0xffff;
> > }
> >
> > At least it is correct on the standalone random data test, and the
> > new code is 30% faster for the cached memory case (13.7 clks/byte vs 18
> > clks/byte).
> Your code looks different...
Both are 16 bit rotate lefts.
Which looks clearer?
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] - trivial - Improve appletalk checksum calculation
2007-10-23 1:53 ` Joe Perches
@ 2007-10-23 3:51 ` Herbert Xu
2007-10-28 20:18 ` Urs Thuermann
0 siblings, 1 reply; 11+ messages in thread
From: Herbert Xu @ 2007-10-23 3:51 UTC (permalink / raw)
To: Joe Perches; +Cc: davem, acme, netdev
Joe Perches <joe@perches.com> wrote:
> On Mon, 2007-10-22 at 18:43 -0700, David Miller wrote:
>> Ok, but again did you test it?
>
> Nope. Stephen Hemminger did in 2003.
But your code differs significantly from Stephen's version.
However, if it is correct it does look like a good improvement.
So please write a simple test program. It can't that bad since
there are only 65536 values to test :)
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] - trivial - Improve appletalk checksum calculation
2007-10-23 3:51 ` Herbert Xu
@ 2007-10-28 20:18 ` Urs Thuermann
2007-10-28 21:01 ` Joe Perches
0 siblings, 1 reply; 11+ messages in thread
From: Urs Thuermann @ 2007-10-28 20:18 UTC (permalink / raw)
To: Herbert Xu; +Cc: Joe Perches, davem, acme, netdev
Herbert Xu <herbert@gondor.apana.org.au> writes:
> But your code differs significantly from Stephen's version.
> However, if it is correct it does look like a good improvement.
>
> So please write a simple test program. It can't that bad since
> there are only 65536 values to test :)
I think the code is simple enough to see immediately that it should
calculate the same checksum. But I have written a short test program
to show this and tested it on i686 (gcc-4.2.2) and powerpc (gcc-3.4.5)
without optimization and with -O6.
BTW, shouldn't unsigned long be replaced by unsigned int to avoid
64-bit operations one some platforms?
urs
#include <stdio.h>
unsigned long f1(unsigned long sum)
{
sum <<= 1;
if (sum & 0x10000) {
sum++;
sum &= 0xffff;
}
return sum;
}
unsigned long f2(unsigned long sum)
{
sum = ((sum & 0x8000)>>15) | ((sum & 0x7fff)<<1);
return sum;
}
int main()
{
unsigned long s;
for (s = 0; s < 65536; s++) {
if (f1(s) != f2(s) || f1(s) > 65535)
printf("%ld\n", s);
}
return 0;
}
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] - trivial - Improve appletalk checksum calculation
2007-10-28 20:18 ` Urs Thuermann
@ 2007-10-28 21:01 ` Joe Perches
2007-10-29 2:40 ` Joe Perches
0 siblings, 1 reply; 11+ messages in thread
From: Joe Perches @ 2007-10-28 21:01 UTC (permalink / raw)
To: Urs Thuermann; +Cc: Herbert Xu, davem, acme, netdev
On Sun, 2007-10-28 at 21:18 +0100, Urs Thuermann wrote:
> I have written a short test program to show this and tested it on i686
> (gcc-4.2.2) and powerpc (gcc-3.4.5) without optimization and with -O6.
Thanks for that.
> BTW, shouldn't unsigned long be replaced by unsigned int to avoid
> 64-bit operations one some platforms?
probably unsigned long should be u16.
Compiled, untested.
diff --git a/net/appletalk/ddp.c b/net/appletalk/ddp.c
index 7c0b515..c3890cc 100644
--- a/net/appletalk/ddp.c
+++ b/net/appletalk/ddp.c
@@ -922,25 +922,19 @@ static int atrtr_ioctl(unsigned int cmd, void __user *arg)
* Checksum: This is 'optional'. It's quite likely also a good
* candidate for assembler hackery 8)
*/
-static unsigned long atalk_sum_partial(const unsigned char *data,
- int len, unsigned long sum)
+static u16 atalk_sum_partial(const unsigned char *data,
+ int len, u16 sum)
{
- /* This ought to be unwrapped neatly. I'll trust gcc for now */
while (len--) {
- sum += *data;
- sum <<= 1;
- if (sum & 0x10000) {
- sum++;
- sum &= 0xffff;
- }
- data++;
+ sum += *data++;
+ sum = ((sum & 0x8000) >> 15) | ((sum & 0x7fff) << 1);
}
return sum;
}
/* Checksum skb data -- similar to skb_checksum */
-static unsigned long atalk_sum_skb(const struct sk_buff *skb, int offset,
- int len, unsigned long sum)
+static u16 atalk_sum_skb(const struct sk_buff *skb, int offset,
+ int len, u16 sum)
{
int start = skb_headlen(skb);
int i, copy;
@@ -1010,13 +1004,13 @@ static unsigned long atalk_sum_skb(const struct sk_buff *skb, int offset,
static __be16 atalk_checksum(const struct sk_buff *skb, int len)
{
- unsigned long sum;
+ u16 sum;
/* skip header 4 bytes */
sum = atalk_sum_skb(skb, 4, len-4, 0);
/* Use 0xFFFF for 0. 0 itself means none */
- return sum ? htons((unsigned short)sum) : htons(0xFFFF);
+ return cpu_to_be16(sum ? sum : 0xFFFF);
}
static struct proto ddp_proto = {
*
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH] - trivial - Improve appletalk checksum calculation
2007-10-28 21:01 ` Joe Perches
@ 2007-10-29 2:40 ` Joe Perches
0 siblings, 0 replies; 11+ messages in thread
From: Joe Perches @ 2007-10-29 2:40 UTC (permalink / raw)
To: Urs Thuermann; +Cc: Herbert Xu, davem, acme, netdev
On Sun, 2007-10-28 at 14:01 -0700, Joe Perches wrote:
> probably unsigned long should be u16.
Or not.
x86 u16 performs much worse than u32 (~ 50% worse)
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2007-10-29 2:41 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-10-22 19:36 [PATCH] - trivial - Improve appletalk checksum calculation Joe Perches
2007-10-23 0:35 ` David Miller
2007-10-23 1:36 ` Joe Perches
2007-10-23 1:43 ` David Miller
2007-10-23 1:53 ` Joe Perches
2007-10-23 3:51 ` Herbert Xu
2007-10-28 20:18 ` Urs Thuermann
2007-10-28 21:01 ` Joe Perches
2007-10-29 2:40 ` Joe Perches
2007-10-23 3:30 ` Stephen Hemminger
2007-10-23 3:39 ` Joe Perches
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).