Re: csum_partial() and csum_partial_copy_generic() in badly optimized?

From: Gabriel Paubert <paubert@iram.es>
To: Joakim Tjernlund <Joakim.Tjernlund@lumentis.se>
Cc: Tim Seufert <tas@mindspring.com>, linuxppc-dev@lists.linuxppc.org
Subject: Re: csum_partial() and csum_partial_copy_generic() in badly optimized?
Date: Mon, 18 Nov 2002 05:12:44 +0100
Message-ID: <3DD868BC.7090402@iram.es>
In-Reply-To: 001701c28e91$8ab64b80$0200a8c0@telia.com

Joakim Tjernlund wrote:
>>On Sunday, November 17, 2002, at 07:17  AM, Joakim Tjernlund wrote:
>>
>>
>>>>CTR and the instructions which operate on it
>>>>(such as bdnz) were put into the PPC architecture mainly as an
>>>>optimization opportunity for loops where the loop variable is not used
>>>>inside the loop body.
>>>
>>>loop variable not USED or loop variable not MODIFIED?
>>
>>Not used.  CTR cannot be specified as the source or destination of most
>>instructions.  In order to access its contents you have to use special
>>instructions that move between it and a normal general purpose register.
>
>
> OK, so how about if I modify the crc32 loop:
>
> unsigned char * end = data +len;
> while(data < end) {
>         result = (result << 8 | *data++) ^ crctab[result >> 24];
> }
>
> will that be possible to optimize with something similar to bdnz also?

I don't know whether even bleeding-edge gcc can do it. Basically, you can
always use bdnz as soon as you can compute the iteration count before
entering the loop. The problem is that equivalent source code constructs do
not always result in exactly equivalent internal representations in GCC. The
transforms which are attempted depend on the exact version of GCC and
the optimization level.

In the example code you give, the variable 'end' is absolutely useless and
forces the compiler to do more simplifications (essentially eliminating
end and using end-data as a loop index if it wants to use bdnz). Making
life more complex for the compiler is never a good idea...
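
To make the point about 'end' concrete, here is a minimal C sketch of the simplification the compiler has to perform: the end-pointer loop is rewritten so that the trip count is computed on entry, which is the shape bdnz needs. The function names, the init_crctab helper, and the polynomial used to fill crctab are assumptions for illustration; none of them come from the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t crctab[256];

/* Assumed table contents: MSB-first CRC-32, polynomial 0x04C11DB7. */
static void init_crctab(void)
{
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t c = i << 24;
        for (int bit = 0; bit < 8; bit++)
            c = (c & 0x80000000u) ? (c << 1) ^ 0x04C11DB7u : (c << 1);
        crctab[i] = c;
    }
}

/* The loop as posted: termination is a pointer comparison. */
static uint32_t crc_end_pointer(uint32_t result, const unsigned char *data,
                                size_t len)
{
    const unsigned char *end = data + len;
    while (data < end)
        result = (result << 8 | *data++) ^ crctab[result >> 24];
    return result;
}

/* What the compiler must derive: a trip count known before the loop. */
static uint32_t crc_trip_count(uint32_t result, const unsigned char *data,
                               size_t len)
{
    const unsigned char *end = data + len;
    assert(end >= data);
    size_t ctr = (size_t)(end - data);  /* the value that would go into CTR */
    if (ctr == 0)
        return result;
    do {
        result = (result << 8 | *data++) ^ crctab[result >> 24];
    } while (--ctr != 0);               /* decrement, loop while nonzero */
    return result;
}
```

Call init_crctab() once before either function. Both should return the same value for the same input; the second form simply states the iteration count up front instead of leaving the compiler to recover it from the pointer comparison.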

I'd rather write it as:

int i;
for(i=0; i< len; i++) {
	result=...data++...;
}

when i is not modified in the loop. I'm almost sure that recent gcc will
end up using a bdnz instruction in this simple case.
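
With the loop body from the quoted message substituted for the shorthand, the counted form might look like the sketch below. The function name crc_counted, the init_crctab helper, and the table polynomial are assumptions made only so the example compiles on its own. Whether a given gcc actually emits bdnz for it depends on version and optimization level, as noted above.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t crctab[256];

/* Assumed table contents: MSB-first CRC-32, polynomial 0x04C11DB7. */
static void init_crctab(void)
{
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t c = i << 24;
        for (int bit = 0; bit < 8; bit++)
            c = (c & 0x80000000u) ? (c << 1) ^ 0x04C11DB7u : (c << 1);
        crctab[i] = c;
    }
}

static uint32_t crc_counted(uint32_t result, const unsigned char *data, int len)
{
    int i;

    assert(len >= 0);
    for (i = 0; i < len; i++) {
        /* i only counts iterations; the body never reads or writes it. */
        result = (result << 8 | *data++) ^ crctab[result >> 24];
    }
    return result;
}
```

Call init_crctab() once, then crc_counted(initial_value, buffer, length). Inspecting the output of gcc -O2 -S on a PPC target is the practical way to check whether the loop was turned into a CTR-based one.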

This said, it is probably very hard to optimize this loop since
the load from crctab and the dependencies between iterations
introduce quite a few delays.

0:	lbzu	scratch,*data,1
	rlwinm	tmp,result,10,0x3fc
	slwi	result,result,8
	lwzx	tmp,crctab,tmp
	or	result,result,scratch
	xor	result,result,tmp
	bdnz	0b

is probably the best you can get. The critical path, which limits the
iteration rate, is rlwinm+lwzx+xor, which will typically take 4 or 5 clock
cycles.

I'm not sure that gcc will use an lbzu for reading the byte array. It may
help to explicitly decrement data before the loop and then use *++data
which better matches the operation of lbzu (I know that post-increment
being worse than pre-increment was true for some versions of gcc, but I
don't know exactly which).
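
A rough sketch of the pre-increment variant described above, next to the post-increment one for comparison. The function names, the init_crctab helper, and the table polynomial are assumptions for illustration. One caveat: forming a pointer one element before the start of an array is formally undefined in ISO C, even though it is the idiom the text describes and it matches how lbzu updates its base register.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t crctab[256];

/* Assumed table contents: MSB-first CRC-32, polynomial 0x04C11DB7. */
static void init_crctab(void)
{
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t c = i << 24;
        for (int bit = 0; bit < 8; bit++)
            c = (c & 0x80000000u) ? (c << 1) ^ 0x04C11DB7u : (c << 1);
        crctab[i] = c;
    }
}

/* Post-increment: load from data, then advance. */
static uint32_t crc_postinc(uint32_t result, const unsigned char *data, int len)
{
    int i;

    assert(len >= 0);
    for (i = 0; i < len; i++)
        result = (result << 8 | *data++) ^ crctab[result >> 24];
    return result;
}

/* Pre-increment: advance, then load, as a load-with-update does. */
static uint32_t crc_preinc(uint32_t result, const unsigned char *data, int len)
{
    int i;

    assert(len >= 0);
    if (len == 0)
        return result;
    data--;  /* caveat: one-before-start pointer, not strictly portable C */
    for (i = 0; i < len; i++)
        result = (result << 8 | *++data) ^ crctab[result >> 24];
    return result;
}
```

Both functions should compute the same value; the only difference is which addressing pattern the source suggests to the compiler.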

Finally, a truly clever compiler which knows its PPC assembly should be
able to notice that one instruction can be saved, since (result<<8|*data)
can be replaced by a bit field insert:

0:	lbzu	scratch,*data,1
	rlwinm	tmp,result,10,0x3fc
	lwzx	tmp,crctab,tmp
	rlwimi	scratch,result,8,0xffffff00
	xor	result,scratch,tmp
	bdnz	0b

but this would not help the critical path of the loop.
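
For readers less familiar with the rotate-and-mask instructions, the following C sketch models what the two listings rely on. The helpers take the mask directly, following the notation used in the listings above (the real instructions encode the mask as begin and end bit positions). All function names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t rotl32(uint32_t x, unsigned n)
{
    assert(n < 32);
    return n ? (x << n) | (x >> (32 - n)) : x;
}

/* Model of rlwinm: rotate left, then AND with the mask. */
static uint32_t rlwinm_mask(uint32_t rs, unsigned sh, uint32_t mask)
{
    return rotl32(rs, sh) & mask;
}

/* Model of rlwimi: rotate left, then insert under the mask into ra. */
static uint32_t rlwimi_mask(uint32_t ra, uint32_t rs, unsigned sh, uint32_t mask)
{
    return (rotl32(rs, sh) & mask) | (ra & ~mask);
}

/* Byte offset into crctab: (result >> 24) * 4, as in the listings. */
static uint32_t table_offset(uint32_t result)
{
    return rlwinm_mask(result, 10, 0x3fcu);
}

/* First listing: shift, then OR in the new byte (slwi + or). */
static uint32_t combine_shift_or(uint32_t result, unsigned char byte)
{
    return result << 8 | byte;
}

/* Second listing: a single insert into the zero-extended loaded byte. */
static uint32_t combine_insert(uint32_t result, unsigned char byte)
{
    uint32_t scratch = byte;  /* lbzu leaves the upper 24 bits zero */
    return rlwimi_mask(scratch, result, 8, 0xffffff00u);
}
```

combine_shift_or and combine_insert should agree for every input, which is why the insert can stand in for the shift-and-or pair. The table lookup still depends only on the rlwinm result, so the dependency chain through the load is unchanged.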

	Gabriel.
