From: David Jander <david.jander@protonic.nl>
To: joakim.tjernlund@transmode.se
Cc: munroesj@us.ibm.com, linuxppc-dev@ozlabs.org
Subject: Re: Efficient memcpy()/memmove() for G2/G3 cores...
Date: Mon, 1 Sep 2008 09:23:28 +0200
Message-ID: <200809010923.28616.david.jander@protonic.nl>
In-Reply-To: <1220012433.5234.162.camel@gentoo-jocke.transmode.se>

On Friday 29 August 2008 14:20:33 Joakim Tjernlund wrote:
>[...]
> > The problem is: I have very little experience with powerpc assembly and
> > only very limited time to dedicate to this and I am looking for others
> > who have
>
> I improved the PowerPC memcpy and friends in uClibc a while ago. It does
> basically the same as the kernel memcpy but without any cache
> instructions. It is written in C, but in such a way that
> optimal assembly is generated.
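
For readers unfamiliar with the approach Joakim describes, here is a minimal sketch of a memcpy written in plain C so that gcc can emit a tight load/store loop without cache-control instructions. It is not the uClibc code: the function name word_memcpy is hypothetical, and the sketch assumes both buffers are 4-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch (not the uClibc source): copy n bytes between two
 * 4-byte-aligned buffers.  Loading four words into locals before storing
 * them lets gcc keep them in registers and emit a compact lwz/stw loop
 * on 32-bit PowerPC. */
void *word_memcpy(void *dst, const void *src, size_t n)
{
    uint32_t *d = dst;
    const uint32_t *s = src;

    assert(((uintptr_t)dst & 3) == 0 && ((uintptr_t)src & 3) == 0);

    /* Main loop: 16 bytes (four words) per iteration. */
    for (; n >= 16; n -= 16, d += 4, s += 4) {
        uint32_t a = s[0], b = s[1], c = s[2], e = s[3];
        d[0] = a; d[1] = b; d[2] = c; d[3] = e;
    }
    /* Remaining whole words. */
    for (; n >= 4; n -= 4)
        *d++ = *s++;
    /* Trailing bytes. */
    unsigned char *db = (unsigned char *)d;
    const unsigned char *sb = (const unsigned char *)s;
    while (n--)
        *db++ = *sb++;
    return dst;
}
```

One caveat when building code like this as a memcpy() replacement: newer gcc versions may recognize such loops and turn them back into a call to memcpy(). Compiling with -fno-tree-loop-distribute-patterns prevents that transformation.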

Hmm, isn't that going to break on a different version of gcc?
I just copied the latest version of trunk/uClibc/libc/string/powerpc/memcpy.c 
from subversion as uclibc-memcpy.c, removed the last line and did this:

$ gcc -shared -O2 -Wall -o libucmemcpy.so uclibc-memcpy.c

(should I use other compiler options?)

Then I started my test program with LD_PRELOAD=...

My test program only copies big chunks of aligned memory, so it only measures
maximum throughput (as when copying video frames). I will write a better one
that measures throughput for blocks of different sizes, both aligned and
unaligned, but first I want to find out why I can't get anywhere near the
expected RAM bandwidth. Bursts run at 1.6 Gbyte/s, and sustained transfers
might reach 400 Mbyte/s in theory. Since the video controller eats almost half
of that, I'd like to get somewhere close to 200 Mbyte/s.
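
A minimal sketch of the kind of large, aligned-block throughput test described above, assuming a POSIX system with clock_gettime() and posix_memalign(). The helper name copy_rate_mbps, the 32-byte alignment and the buffer sizes are illustrative choices, not the actual test program.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: copy `size` bytes `reps` times between two
 * 32-byte-aligned buffers and return the copy rate in Mbyte/s
 * (1 Mbyte = 10^6 bytes; bytes copied, not bytes moved over the bus).
 * Returns -1.0 on allocation failure or if the copy did not happen. */
double copy_rate_mbps(size_t size, int reps)
{
    /* Call memcpy() through a volatile pointer so gcc cannot inline it or
     * drop the repeated copies; the dynamic linker still resolves it to an
     * LD_PRELOADed replacement if one is present. */
    void *(*volatile copy)(void *, const void *, size_t) = memcpy;
    void *src = NULL, *dst = NULL;
    struct timespec t0, t1;

    assert(size > 0 && reps > 0);
    if (posix_memalign(&src, 32, size) != 0 ||
        posix_memalign(&dst, 32, size) != 0) {
        free(src);
        free(dst);
        return -1.0;
    }
    memset(src, 0x5a, size);   /* touch every page before timing */
    memset(dst, 0, size);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < reps; i++)
        copy(dst, src, size);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (double)(t1.tv_sec - t0.tv_sec) +
                  (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    int ok = ((unsigned char *)dst)[size - 1] == 0x5a;
    free(src);
    free(dst);
    if (!ok || secs <= 0.0)
        return -1.0;
    return (double)size * reps / secs / 1e6;
}
```

Calling, for example, copy_rate_mbps(4 << 20, 50) from a small main(), once plainly and once with LD_PRELOAD pointing at the replacement library, would give directly comparable figures.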

The result is quite a bit better than that of glibc-2.7 (13.2 Mbyte/s --> 22
Mbyte/s), but still far from the 71.5 Mbyte/s achieved with bigger strides,
loading and storing 16 registers at a time.
Note that this is copy performance; one-way throughput should be double these
figures.
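
The "16 registers at a time" stride can be approximated in C as below. This is a hypothetical sketch (copy_stride16 is not from any library): it loads a 64-byte block into 16 locals before storing any of them, which on 32-bit PowerPC, with its 32 general-purpose registers, allows gcc to emit 16 lwz followed by 16 stw. Whether the compiler keeps that ordering depends on its scheduler, so it is worth checking the generated assembly with gcc -S.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: copy n bytes in 64-byte strides, loading 16 words
 * into locals before storing any of them.  Both buffers must be 4-byte
 * aligned and n a multiple of 64; tail handling is omitted for brevity. */
void copy_stride16(uint32_t *restrict d, const uint32_t *restrict s, size_t n)
{
    assert(n % 64 == 0);
    for (; n != 0; n -= 64, d += 16, s += 16) {
        /* 16 loads ... */
        uint32_t r0 = s[0],   r1 = s[1],   r2 = s[2],   r3 = s[3];
        uint32_t r4 = s[4],   r5 = s[5],   r6 = s[6],   r7 = s[7];
        uint32_t r8 = s[8],   r9 = s[9],   r10 = s[10], r11 = s[11];
        uint32_t r12 = s[12], r13 = s[13], r14 = s[14], r15 = s[15];
        /* ... then 16 stores. */
        d[0] = r0;   d[1] = r1;   d[2] = r2;   d[3] = r3;
        d[4] = r4;   d[5] = r5;   d[6] = r6;   d[7] = r7;
        d[8] = r8;   d[9] = r9;   d[10] = r10; d[11] = r11;
        d[12] = r12; d[13] = r13; d[14] = r14; d[15] = r15;
    }
}
```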

I'll try to learn how the cache manipulation instructions work, to see if I
can gain some more bandwidth using them.
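
As a starting point for experimenting with cache hints, here is a sketch that adds software prefetch to a line-by-line copy. It assumes a 32-byte cache line (as on 603e/750-class cores) and a prefetch distance of four lines; both constants and the name copy_prefetch are assumptions to be tuned by measurement. gcc's __builtin_prefetch() normally compiles to dcbt (read) or dcbtst (write) on PowerPC. dcbz, which establishes a destination line in the cache without reading it from memory first, needs inline assembly and is left out here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LINE  32           /* assumed L1 line size (603e/750-class cores) */
#define AHEAD (4 * LINE)   /* hypothetical prefetch distance; tune by measurement */

/* Sketch: copy whole cache lines while hinting upcoming source and
 * destination lines into the cache.  Buffers must be 4-byte aligned and
 * n a multiple of LINE. */
void copy_prefetch(uint32_t *restrict d, const uint32_t *restrict s, size_t n)
{
    assert(n % LINE == 0);
    for (; n != 0; n -= LINE, d += LINE / 4, s += LINE / 4) {
        if (n > AHEAD) {   /* only prefetch addresses inside the buffers */
            __builtin_prefetch((const char *)s + AHEAD, 0);  /* read hint */
            __builtin_prefetch((char *)d + AHEAD, 1);        /* write hint */
        }
        for (int i = 0; i < LINE / 4; i++)
            d[i] = s[i];
    }
}
```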

Regards,

-- 
David Jander


Thread overview: 27+ messages
2008-08-25  9:31 Efficient memcpy()/memmove() for G2/G3 cores David Jander
2008-08-25 11:00 ` Matt Sealey
2008-08-25 13:06   ` David Jander
2008-08-25 22:28     ` Benjamin Herrenschmidt
2008-08-27 21:04       ` Steven Munroe
2008-08-29 11:48         ` David Jander
2008-08-29 12:21           ` Joakim Tjernlund
2008-09-01  7:23             ` David Jander [this message]
2008-09-01  9:36               ` Joakim Tjernlund
2008-09-02 13:12                 ` David Jander
2008-09-03  6:43                   ` Joakim Tjernlund
2008-09-03 20:33                   ` prodyut hazarika
2008-09-04  2:04                     ` Paul Mackerras
2008-09-04 12:05                       ` David Jander
2008-09-04 12:19                         ` Josh Boyer
2008-09-04 12:59                           ` David Jander
2008-09-04 14:31                             ` Steven Munroe
2008-09-04 14:45                               ` Gunnar Von Boehn
2008-09-04 15:14                               ` Gunnar Von Boehn
2008-09-04 16:25                               ` David Jander
2008-09-04 15:01                             ` Gunnar Von Boehn
2008-09-04 16:32                               ` David Jander
2008-09-04 18:14                       ` prodyut hazarika
2008-08-29 20:34           ` Steven Munroe
2008-09-01  8:29             ` David Jander
2008-08-31  8:28           ` Benjamin Herrenschmidt
2008-09-01  6:42             ` David Jander
