linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
From: "Albrecht Dreß" <albrecht.dress@arcor.de>
To: Joakim Tjernlund <joakim.tjernlund@transmode.se>
Cc: linuxppc-dev@ozlabs.org
Subject: Re: [PATCH] powerpc: tiny memcpy_(to|from)io optimisation
Date: Tue, 02 Jun 2009 20:45:55 +0200	[thread overview]
Message-ID: <1243968361.4951.0@antares> (raw)
In-Reply-To: <OFEEF9A8F1.2B11D1F7-ONC12575C8.00214DF9-C12575C8.00224E4F@transmode.se> (from joakim.tjernlund@transmode.se on Mon Jun  1 08:14:43 2009)

[-- Attachment #1: Type: text/plain, Size: 1857 bytes --]

Am 01.06.09 08:14 schrieb(en) Joakim Tjernlund:
> .. not even 4.2.2 which is fairly modern will get it right. It breaks  
> very easy as gcc has never been any good at this type of  
> optimization. Sometimes small changes will make gcc unhappy and it  
> won't do the right optimization.

It's even worse...  Looking at the assembly output of the simple  
function

<snip>
void loop2(void * src, void * dst, int n)
{
   volatile uint32_t * _dst = (volatile uint32_t *) (dst - 4);
   volatile uint32_t * _src = (volatile uint32_t *) (src - 4);
   n >>= 2;
   do {
     *(++_dst) = *(++_src);
   } while (--n);
}
</snip>

gcc 4.0.1 coming with Apple's Developer Tools (on Tiger) with options  
"-O3 -mcpu=603e -mtune=603e" produces

<snip>
_loop2:
         srawi r5,r5,2
         mtctr r5
         addi r4,r4,-4
         addi r3,r3,-4
L11:
         lwzu r0,4(r3)
         stwu r0,4(r4)
         bdnz L11
         blr
</snip>

which looks perfect to me.  However, gcc 4.3.3 on Ubuntu/PPC produces  
with the same options

<snip>
loop2:
         srawi 5,5,2
         stwu 1,-16(1)
         mtctr 5
         li 9,0
.L8:
         lwzx 0,3,9
         stwx 0,4,9
         addi 9,9,4
         bdnz .L8
         addi 1,1,16
         blr
</snip>

wasting a register and a statement in the loop core, and fiddles around  
with the stack pointer for no good reason.  Gcc 4.4.0 produces

<snip>
loop2:
         srawi 5,5,2
         mtctr 5
         li 9,0
.L9:
         lwzx 0,3,9
         stwx 0,4,9
         addi 9,9,4
         bdnz .L9
         blr
</snip>

which drops the r1 accesses, but still produces the sub-optimal loop.   
Is this a gcc regression, or did I miss something here?  Probably the  
only bullet-proof way is to write some core loops in assembly... :-/

Thanks, Albrecht.

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

  reply	other threads:[~2009-06-02 18:46 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-05-27 20:00 [PATCH] powerpc: tiny memcpy_(to|from)io optimisation Albrecht Dreß
2009-05-28 16:13 ` Joakim Tjernlund
2009-05-28 19:50   ` Albrecht Dreß
2009-05-29  6:31     ` Joakim Tjernlund
2009-05-31 10:11       ` Albrecht Dreß
2009-06-01  6:14         ` Joakim Tjernlund
2009-06-02 18:45           ` Albrecht Dreß [this message]
2009-06-02 22:51             ` Benjamin Herrenschmidt
2009-06-03 14:36               ` Kenneth Johansson
2009-06-03 18:35                 ` Albrecht Dreß
2009-06-11 17:07 ` Wolfram Sang
2009-06-11 17:30 ` Grant Likely
2009-06-19 18:42 ` Lorenz Kolb

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1243968361.4951.0@antares \
    --to=albrecht.dress@arcor.de \
    --cc=joakim.tjernlund@transmode.se \
    --cc=linuxppc-dev@ozlabs.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).