From: "Albrecht Dreß" <albrecht.dress@arcor.de>
To: Joakim Tjernlund <joakim.tjernlund@transmode.se>
Cc: linuxppc-dev@ozlabs.org
Subject: Re: [PATCH] powerpc: tiny memcpy_(to|from)io optimisation
Date: Tue, 02 Jun 2009 20:45:55 +0200
Message-ID: <1243968361.4951.0@antares>
In-Reply-To: <OFEEF9A8F1.2B11D1F7-ONC12575C8.00214DF9-C12575C8.00224E4F@transmode.se> (from joakim.tjernlund@transmode.se on Mon Jun 1 08:14:43 2009)
On 01.06.09 08:14, Joakim Tjernlund wrote:
> .. not even 4.2.2 which is fairly modern will get it right. It breaks
> very easy as gcc has never been any good at this type of
> optimization. Sometimes small changes will make gcc unhappy and it
> won't do the right optimization.
It's even worse... Looking at the assembly output of the simple
function
<snip>
void loop2(void * src, void * dst, int n)
{
    volatile uint32_t * _dst = (volatile uint32_t *) (dst - 4);
    volatile uint32_t * _src = (volatile uint32_t *) (src - 4);

    n >>= 2;
    do {
        *(++_dst) = *(++_src);
    } while (--n);
}
</snip>
gcc 4.0.1, as shipped with Apple's Developer Tools (on Tiger), with the
options "-O3 -mcpu=603e -mtune=603e" produces
<snip>
_loop2:
    srawi r5,r5,2
    mtctr r5
    addi r4,r4,-4
    addi r3,r3,-4
L11:
    lwzu r0,4(r3)
    stwu r0,4(r4)
    bdnz L11
    blr
</snip>
which looks perfect to me. However, with the same options, gcc 4.3.3 on
Ubuntu/PPC produces
<snip>
loop2:
    srawi 5,5,2
    stwu 1,-16(1)
    mtctr 5
    li 9,0
.L8:
    lwzx 0,3,9
    stwx 0,4,9
    addi 9,9,4
    bdnz .L8
    addi 1,1,16
    blr
</snip>
which wastes a register and an instruction in the loop body, and fiddles
around with the stack pointer for no good reason. gcc 4.4.0 produces
<snip>
loop2:
    srawi 5,5,2
    mtctr 5
    li 9,0
.L9:
    lwzx 0,3,9
    stwx 0,4,9
    addi 9,9,4
    bdnz .L9
    blr
</snip>
which drops the r1 accesses, but still produces the sub-optimal loop.
Is this a gcc regression, or did I miss something here? Probably the
only bullet-proof way is to write some core loops in assembly... :-/
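For anyone who wants to repeat the comparison with another compiler version, here is a minimal, self-contained sketch of the same pre-increment loop. The name loop2_checked, the size_t length and the early return for n < 4 are assumptions added for illustration and are not part of the function above. Like the original, it forms a pointer one word before the buffer, which relies on a flat address space and is not strictly portable C. Whether a given gcc turns it into the lwzu/stwu form has to be checked in its "-S" output; nothing here guarantees it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical variant of loop2 with the same argument order:
 * src, dst, length in bytes. Both buffers must be 4-byte aligned. */
void loop2_checked(const void *src, void *dst, size_t n)
{
    size_t words = n >> 2;  /* trailing 1..3 bytes are ignored, as in loop2 */

    if (words == 0)
        return;             /* the original do/while would wrap around here */

    /* Start one word before each buffer so that the loop body can use
     * pre-increment, the form that maps onto lwzu/stwu. */
    const volatile uint32_t *s = (const volatile uint32_t *)src - 1;
    volatile uint32_t *d = (volatile uint32_t *)dst - 1;

    do {
        *(++d) = *(++s);
    } while (--words);
}
```

Compiling this file with the options quoted above plus "-S" writes the assembly for inspection instead of an object file.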
Thanks, Albrecht.
Thread overview: 13+ messages
2009-05-27 20:00 [PATCH] powerpc: tiny memcpy_(to|from)io optimisation Albrecht Dreß
2009-05-28 16:13 ` Joakim Tjernlund
2009-05-28 19:50 ` Albrecht Dreß
2009-05-29 6:31 ` Joakim Tjernlund
2009-05-31 10:11 ` Albrecht Dreß
2009-06-01 6:14 ` Joakim Tjernlund
2009-06-02 18:45 ` Albrecht Dreß [this message]
2009-06-02 22:51 ` Benjamin Herrenschmidt
2009-06-03 14:36 ` Kenneth Johansson
2009-06-03 18:35 ` Albrecht Dreß
2009-06-11 17:07 ` Wolfram Sang
2009-06-11 17:30 ` Grant Likely
2009-06-19 18:42 ` Lorenz Kolb