From: David Jander <david.jander@protonic.nl>
To: joakim.tjernlund@transmode.se
Cc: munroesj@us.ibm.com, linuxppc-dev@ozlabs.org
Subject: Re: Efficient memcpy()/memmove() for G2/G3 cores...
Date: Mon, 1 Sep 2008 09:23:28 +0200
Message-ID: <200809010923.28616.david.jander@protonic.nl>
In-Reply-To: <1220012433.5234.162.camel@gentoo-jocke.transmode.se>
On Friday 29 August 2008 14:20:33 Joakim Tjernlund wrote:
>[...]
> > The problem is: I have very little experience with powerpc assembly and
> > only very limited time to dedicate to this and I am looking for others
> > who have
>
> I improved the PowerPC memcpy and friends in uClibc a while ago. It does
> basically the same as the kernel memcpy but without any cache
> instructions. It is written in C, but in such a way that
> optimal assembly is generated.
Hmm, isn't that going to break on a different version of gcc?
I just copied the latest version of trunk/uClibc/libc/string/powerpc/memcpy.c
from subversion as uclibc-memcpy.c, removed the last line and did this:
$ gcc -shared -O2 -Wall -o libucmemcpy.so uclibc-memcpy.c
(should I use other compiler options?)
Then I started my test program with LD_PRELOAD=...
My test program only copies big chunks of aligned memory, so it only tests
maximum throughput (such as copying video frames). I will write a better
one that measures throughput for blocks of different sizes, both aligned and
unaligned. But first I want to find out why I can't get even close to the
expected RAM bandwidth: bursts occur at 1.6 Gbyte/s, and sustained transfers
might be able to reach 400 Mbyte/s in theory. Taking into account that the
video controller eats almost half of that, I'd like to get somewhere close to
200 Mbyte/s.
The result is quite a bit better than that of glibc-2.7 (22 Mbyte/s versus
13.2 Mbyte/s), but still far from the 71.5 Mbyte/s achieved when using bigger
strides of 16 registers loaded and stored at a time.
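
To illustrate what "16 registers at a time" means, here is a rough C sketch.
It is not the code that produced the 71.5 Mbyte/s figure, and the name
copy_words_unrolled16 is made up for this example. Whether the 16 temporaries
really end up in 16 registers is up to the compiler, so the generated
assembly should be checked.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch: copy `nwords` 32-bit words, 16 words per loop
 * iteration. All 16 loads are issued before the 16 stores. The buffers
 * must not overlap.
 */
void copy_words_unrolled16(uint32_t *dst, const uint32_t *src, size_t nwords)
{
    while (nwords >= 16) {
        /* Load a block of 16 words into temporaries. */
        uint32_t r0 = src[0],   r1 = src[1],   r2 = src[2],   r3 = src[3];
        uint32_t r4 = src[4],   r5 = src[5],   r6 = src[6],   r7 = src[7];
        uint32_t r8 = src[8],   r9 = src[9],   r10 = src[10], r11 = src[11];
        uint32_t r12 = src[12], r13 = src[13], r14 = src[14], r15 = src[15];

        /* Store the whole block. */
        dst[0] = r0;   dst[1] = r1;   dst[2] = r2;   dst[3] = r3;
        dst[4] = r4;   dst[5] = r5;   dst[6] = r6;   dst[7] = r7;
        dst[8] = r8;   dst[9] = r9;   dst[10] = r10; dst[11] = r11;
        dst[12] = r12; dst[13] = r13; dst[14] = r14; dst[15] = r15;

        src += 16;
        dst += 16;
        nwords -= 16;
    }

    /* Copy the remaining 0..15 words one at a time. */
    while (nwords > 0) {
        *dst++ = *src++;
        nwords--;
    }
}
```

Sixteen 32-bit words are 64 bytes, so each iteration moves two 32-byte blocks,
assuming the buffers are suitably aligned.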
Note that this is copy performance; one-way throughput should be double these
figures, since every byte copied is read once and written once.
I'll try to learn how the cache-manipulating instructions work, to see if I
can gain some more bandwidth by using them.
Regards,
--
David Jander