From: "Albrecht Dreß" <albrecht.dress@arcor.de>
To: Joakim Tjernlund <joakim.tjernlund@transmode.se>
Cc: linuxppc-dev@ozlabs.org
Subject: Re: [PATCH] powerpc: tiny memcpy_(to|from)io optimisation
Date: Tue, 02 Jun 2009 20:45:55 +0200
Message-ID: <1243968361.4951.0@antares>
In-Reply-To: <OFEEF9A8F1.2B11D1F7-ONC12575C8.00214DF9-C12575C8.00224E4F@transmode.se> (from joakim.tjernlund@transmode.se on Mon Jun 1 08:14:43 2009)
On 01.06.09 08:14, Joakim Tjernlund wrote:
> .. not even 4.2.2 which is fairly modern will get it right. It breaks
> very easy as gcc has never been any good at this type of
> optimization. Sometimes small changes will make gcc unhappy and it
> won't do the right optimization.
It's even worse... Looking at the assembly output of the simple
function
<snip>
void loop2(void * src, void * dst, int n)
{
    volatile uint32_t * _dst = (volatile uint32_t *) (dst - 4);
    volatile uint32_t * _src = (volatile uint32_t *) (src - 4);

    n >>= 2;
    do {
        *(++_dst) = *(++_src);
    } while (--n);
}
</snip>
gcc 4.0.1, as shipped with Apple's Developer Tools (on Tiger), with options
"-O3 -mcpu=603e -mtune=603e" produces
<snip>
_loop2:
srawi r5,r5,2
mtctr r5
addi r4,r4,-4
addi r3,r3,-4
L11:
lwzu r0,4(r3)
stwu r0,4(r4)
bdnz L11
blr
</snip>
which looks perfect to me. However, gcc 4.3.3 on Ubuntu/PPC, with the
same options, produces
<snip>
loop2:
srawi 5,5,2
stwu 1,-16(1)
mtctr 5
li 9,0
.L8:
lwzx 0,3,9
stwx 0,4,9
addi 9,9,4
bdnz .L8
addi 1,1,16
blr
</snip>
which wastes a register and an extra instruction in the loop body, and
fiddles with the stack pointer for no good reason. Gcc 4.4.0 produces
<snip>
loop2:
srawi 5,5,2
mtctr 5
li 9,0
.L9:
lwzx 0,3,9
stwx 0,4,9
addi 9,9,4
bdnz .L9
blr
</snip>
which drops the r1 accesses, but still produces the sub-optimal loop.
Is this a gcc regression, or did I miss something here? Probably the
only bullet-proof way is to write some core loops in assembly... :-/
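For illustration, here is a minimal, untested sketch of what such a
hand-written core loop might look like, assuming GCC-style inline assembly.
The helper name copy32_words, the early return for n < 4 and the plain C
fallback for non-PowerPC builds are made up for this example and are not
part of the patch; the asm body just mirrors the lwzu/stwu/bdnz loop that
gcc 4.0.1 emitted above.

```c
#include <stdint.h>

/*
 * Hypothetical helper (name and guard are assumptions, not from the patch).
 * Copies n / 4 32-bit words from src to dst, same argument order as loop2().
 * Unlike loop2(), it returns early when fewer than 4 bytes are requested.
 */
static void copy32_words(const void *src, void *dst, int n)
{
    int words = n >> 2;

    if (words <= 0)
        return;

#if defined(__powerpc__) && defined(__GNUC__)
    {
        /* Bias both pointers by -4 so the update forms can use 4(rX). */
        const unsigned char *s = (const unsigned char *)src - 4;
        unsigned char *d = (unsigned char *)dst - 4;
        unsigned long count = (unsigned long)words;
        uint32_t tmp;

        __asm__ __volatile__(
            "mtctr %3\n"
            "1:\n\t"
            "lwzu %2,4(%0)\n\t"   /* load word, s += 4 */
            "stwu %2,4(%1)\n\t"   /* store word, d += 4 */
            "bdnz 1b"             /* decrement CTR, loop while non-zero */
            : "+b" (s), "+b" (d), "=&r" (tmp)
            : "r" (count)
            : "ctr", "memory");
    }
#else
    {
        /* Portable fallback so the sketch also builds on other targets. */
        const volatile uint32_t *s = (const volatile uint32_t *)src;
        volatile uint32_t *d = (volatile uint32_t *)dst;

        while (words--)
            *d++ = *s++;
    }
#endif
}
```

The "b" constraints keep r0 out of the base registers (the update forms
do not accept it), and the early-clobber on the temporary keeps it distinct
from the load's base register. Whether this is really preferable to coaxing
gcc into the right code is of course open for discussion.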
Thanks, Albrecht.