From: "Albrecht Dreß" <albrecht.dress@arcor.de>
To: Joakim Tjernlund <joakim.tjernlund@transmode.se>
Cc: linuxppc-dev@ozlabs.org
Subject: Re: [PATCH] powerpc: tiny memcpy_(to|from)io optimisation
Date: Tue, 02 Jun 2009 20:45:55 +0200
Message-ID: <1243968361.4951.0@antares>
In-Reply-To: <OFEEF9A8F1.2B11D1F7-ONC12575C8.00214DF9-C12575C8.00224E4F@transmode.se> (from joakim.tjernlund@transmode.se on Mon Jun  1 08:14:43 2009)

On 01.06.09 08:14, Joakim Tjernlund wrote:
> .. not even 4.2.2 which is fairly modern will get it right. It breaks  
> very easy as gcc has never been any good at this type of  
> optimization. Sometimes small changes will make gcc unhappy and it  
> won't do the right optimization.

It's even worse.  Consider the assembly output for this simple
function:

<snip>
#include <stdint.h>

void loop2(void * src, void * dst, int n)
{
   volatile uint32_t * _dst = (volatile uint32_t *) (dst - 4);
   volatile uint32_t * _src = (volatile uint32_t *) (src - 4);
   n >>= 2;
   do {
     *(++_dst) = *(++_src);
   } while (--n);
}
</snip>

GCC 4.0.1, as shipped with Apple's Developer Tools (on Tiger), with the
options "-O3 -mcpu=603e -mtune=603e" produces

<snip>
_loop2:
         srawi r5,r5,2
         mtctr r5
         addi r4,r4,-4
         addi r3,r3,-4
L11:
         lwzu r0,4(r3)
         stwu r0,4(r4)
         bdnz L11
         blr
</snip>

which looks perfect to me.  However, GCC 4.3.3 on Ubuntu/PPC, with the
same options, produces

<snip>
loop2:
         srawi 5,5,2
         stwu 1,-16(1)
         mtctr 5
         li 9,0
.L8:
         lwzx 0,3,9
         stwx 0,4,9
         addi 9,9,4
         bdnz .L8
         addi 1,1,16
         blr
</snip>

which wastes a register and an extra instruction in the loop body, and
fiddles with the stack pointer for no good reason.  GCC 4.4.0 produces

<snip>
loop2:
         srawi 5,5,2
         mtctr 5
         li 9,0
.L9:
         lwzx 0,3,9
         stwx 0,4,9
         addi 9,9,4
         bdnz .L9
         blr
</snip>

which drops the r1 accesses, but still produces the sub-optimal loop.   
Is this a gcc regression, or did I miss something here?  Probably the  
only bullet-proof way is to write some core loops in assembly... :-/
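
For reference, here is a minimal sketch of the two loop shapes in plain C.
The first corresponds roughly to the lwzu/stwu code from 4.0.1, the second
to the lwzx/stwx/addi code from 4.3.3 and 4.4.0.  The function names are
made up for illustration, and like loop2 above both assume n is a non-zero
multiple of 4.  Which instructions a given compiler actually emits for
either form is not guaranteed; this only shows that both shapes perform the
same copy.

```c
#include <stddef.h>
#include <stdint.h>

/* Pointer-update shape: each access also advances its pointer, which is
 * what the update-form instructions lwzu/stwu do in a single step.
 * Hypothetical helper; n is a byte count, a non-zero multiple of 4. */
void copy_update_form(const volatile uint32_t *src,
                      volatile uint32_t *dst, size_t n)
{
    size_t words = n >> 2;

    do {
        *dst++ = *src++;
    } while (--words);
}

/* Indexed shape: both pointers stay fixed and a separate index is
 * incremented, which is the extra addi in the lwzx/stwx loop.
 * (The generated code indexes by byte offset; a word index is used
 * here for readability.) */
void copy_indexed_form(const volatile uint32_t *src,
                       volatile uint32_t *dst, size_t n)
{
    size_t words = n >> 2;
    size_t i = 0;

    do {
        dst[i] = src[i];
        i++;
    } while (--words);
}
```

Both helpers take (src, dst, n) in the same order as loop2, so either can
be dropped into a small test program to compare the generated assembly
across compiler versions.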

Thanks, Albrecht.
