From: David Jander <david.jander@protonic.nl>
To: joakim.tjernlund@transmode.se
Cc: munroesj@us.ibm.com, linuxppc-dev@ozlabs.org
Subject: Re: Efficient memcpy()/memmove() for G2/G3 cores...
Date: Mon, 1 Sep 2008 09:23:28 +0200
Message-ID: <200809010923.28616.david.jander@protonic.nl>
In-Reply-To: <1220012433.5234.162.camel@gentoo-jocke.transmode.se>

On Friday 29 August 2008 14:20:33 Joakim Tjernlund wrote:
>[...]
> > The problem is: I have very little experience with powerpc assembly and
> > only very limited time to dedicate to this and I am looking for others
> > who have
>
> I improved the PowerPC memcpy and friends in uClibc a while ago. It does
> basically the same as the kernel memcpy but without any cache
> instructions. It is written in C, but in such a way that
> optimal assembly is generated.

Hmm, isn't that going to break on a different version of gcc?

I just copied the latest version of trunk/uClibc/libc/string/powerpc/memcpy.c
from subversion as uclibc-memcpy.c, removed the last line and did this:

$ gcc -shared -O2 -Wall -o libucmemcpy.so uclibc-memcpy.c

(should I use other compiler options?)

Then I started my test program with LD_PRELOAD=...

My test program only copies big chunks of aligned memory, so it only tests
maximum throughput (as when copying video frames). I will write a better
one that measures throughput for blocks of different sizes, aligned and
unaligned, but first I want to find out why I can't get even close to the
expected RAM bandwidth. Bursts occur at 1.6 Gbyte/s, and sustained transfers
might reach 400 Mbyte/s in theory. Taking into account that the video
controller eats almost half of that, I'd like to get somewhere close to
200 Mbyte/s.

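A minimal sketch of a throughput test of this kind, for illustration only. It
is not the test program referred to above: the helper name
copy_rate_mbyte_s, the 32-byte alignment (an assumed cache-line size) and the
use of clock_gettime() are all assumptions.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: copy `size` bytes `reps` times between two
 * 32-byte-aligned buffers and return the copy rate in Mbyte/s
 * (1 Mbyte = 10^6 bytes). Returns -1.0 on failure. */
double copy_rate_mbyte_s(size_t size, int reps)
{
    /* Call through a volatile pointer so the compiler cannot inline or
     * drop the copy; in a dynamically linked build this should reach
     * whichever memcpy() the loader resolved (e.g. a preloaded one). */
    void *(*volatile copy)(void *, const void *, size_t) = memcpy;
    void *src = NULL, *dst = NULL;
    struct timespec t0, t1;
    double secs;
    int i;

    assert(size > 0 && reps > 0);
    if (posix_memalign(&src, 32, size) != 0)
        return -1.0;
    if (posix_memalign(&dst, 32, size) != 0) {
        free(src);
        return -1.0;
    }
    memset(src, 0x5a, size);    /* touch the pages before timing */
    memset(dst, 0, size);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < reps; i++)
        copy(dst, src, size);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    free(src);
    free(dst);

    secs = (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    if (secs <= 0.0)
        return -1.0;
    return (double)size * reps / secs / 1e6;
}
```

Calling this from a small main() with a block size well above the cache size,
once with and once without LD_PRELOAD set, would give two copy rates to
compare. On older glibc versions linking may additionally need -lrt.
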
The result is quite a bit better than that of glibc-2.7 (13.2 Mbyte/s --> 22
Mbyte/s), but still far from the 71.5 Mbyte/s achieved when using bigger
strides, loading and storing 16 registers at a time.

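The idea of bigger strides can be sketched in C as 16 word loads followed by
16 word stores per iteration. This is only an illustration under stated
assumptions: the name copy_stride16 and its preconditions are hypothetical,
the 71.5 Mbyte/s figure was not measured with this code, and whether the
compiler really keeps all 16 values in registers depends on the gcc version
and options.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: copy n bytes in 64-byte strides.
 * Assumes dst and src are 4-byte aligned, do not overlap, point to
 * memory that may be accessed as 32-bit words, and that n is a
 * multiple of 64. */
void copy_stride16(void *dst, const void *src, size_t n)
{
    uint32_t *d = dst;
    const uint32_t *s = src;

    assert(n % 64 == 0);
    assert((uintptr_t)dst % 4 == 0 && (uintptr_t)src % 4 == 0);

    for (; n >= 64; n -= 64, s += 16, d += 16) {
        uint32_t r[16];     /* ideally held in 16 registers */
        int i;

        for (i = 0; i < 16; i++)    /* 16 loads first ... */
            r[i] = s[i];
        for (i = 0; i < 16; i++)    /* ... then 16 stores */
            d[i] = r[i];
    }
}
```

Grouping the loads before the stores is the point of the sketch; a
hand-written assembly loop can guarantee that ordering, while C can only
suggest it to the compiler.
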
Note that this is copy performance; one-way throughput should be double these
figures.

I'll try to learn how the cache-manipulating instructions work, to see if I can
gain some more bandwidth using them.

Regards,
--
David Jander