Re: Efficient memcpy()/memmove() for G2/G3 cores...

linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed

From: David Jander <david.jander@protonic.nl>
To: linuxppc-dev@ozlabs.org
Cc: munroesj@us.ibm.com, John Rigby <jrigby@freescale.com>
Subject: Re: Efficient memcpy()/memmove() for G2/G3 cores...
Date: Tue, 2 Sep 2008 15:12:09 +0200	[thread overview]
Message-ID: <200809021512.10132.david.jander@protonic.nl> (raw)
In-Reply-To: <1220261775.5234.217.camel@gentoo-jocke.transmode.se>

On Monday 01 September 2008 11:36:15 Joakim Tjernlund wrote:
>[...]
> > Then I started my test program with LD_PRELOAD=...
> >
> > My test program only copies big chunks of aligned memory, so it will only
> > test for maximum throughput (such as copying video frames). I will make a
> > better one, to measure throughput on different sized blocks of aligned
> > and unaligned memory, but first I want to find out why I can't seem to
> > get even close to the expected RAM bandwidth (bursts occur at 1.6
> > Gbyte/s, sustained transfers might be able to reach 400 Mbyte/s in
> > theory, taking into account the video controller eating almost half of
> > it, I'd like to get somewhere close to 200).
> >
> > The result is quite a bit better than that of glibc-2.7 (13.2 Mbyte/s -->
> > 22 Mbyte/s), but still far from the 71.5 Mbyte/s achieved when using
> > bigger strides of 16 registers load/store at a time.
> > Note, that this is copy performance, one-way througput should be double
> > these figures.
>
> Yeah, the code is trying to do a reasonable job without knowing what
> micro arch it is running on. These could probably go to glibc
> as new general purpose memxxx() routines. You will probably see
> a big increase once dcbz is added to the copy/memset functions.
>
> Fire away :)

Ok here I go:

I have made some astonishing discoveries, and I'd like to post the used 
source-code somewhere in the meantime, any suggestions? To this list?

There seem to be some substantial differences between the e300 core used in 
the MPC5200B and in the MPC5121e (besides the MPC5121 having double the 
amount of cache). Memcpy()-performance wise, these differences amount to the 
following. The tests done are with vanilla glibc (version 2.6.1 and 2.7 
without any powerpc specific memcpy() optimizations), Gunnar von Boehns 
memcpy_e300 and my tweaked version, memcpy_e300_dj which basically uses 
16-register strides instead of 4-register strides in Gunnar's example.

memcpy() peak-performance (RAM memory throughput) on:

MPC5200B, glibc-2.6, no optimizations: 136 Mbyte/s
MPC5121e, glibc-2.7, no optimizations:  30 Mbyte/s

MPC5200B, memcpy_e300: 225 Mbyte/s
MPC5121e, memcpy_e300: 130 Mbyte/s

MPC5200B, memcpy_e300_dj: 200 Mbyte/s
MPC5121e, memcpy_e300_dj: 202 Mbyte/s

For the MPC5121e, 16-register strides seem to be most optimal, whereas for the 
MPC5200B, 4-register stides give best performance. Also, plain C memcpy() 
performance on MPC5121e is terribly poor! Does enyone know why? I don't quite 
seem to understand those results.

Some information on the test hardware:

MPC5200B-based board has 64 Mbyte DDR-SDRAM, 32-bit wide (two x16 chips), 
running ubuntu 7.10 with kernel 2.6.19.2.

MPC5121e-based board has 256 Mbyte DDR2-SDRAM, 32-bit wide (two x16 chips), 
running ubuntu 8.04.1 with kernel 2.6.24.5 from Freescale LTIB with the DIU 
turned OFF. When the DIU is turned on, maximum throughput drops from 202 to 
196 Mbyte/s.

memcpy_e300 variants basically use 4 or 16-register load/store strides, cache 
alignment and dcbz/bcbt cache-manipulation instructions to tweak performance.

I have not tried interleaving integer and fpu instructions.

Does anybody have any suggestion about where to start searching for an 
explaination of these results? I have the impression that there is something 
wrong with my setup, or with the e300c4-core, or both, but what????

Greetings,

-- 
David Jander

next prev parent reply	other threads:[~2008-09-02 13:12 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-08-25  9:31 Efficient memcpy()/memmove() for G2/G3 cores David Jander
2008-08-25 11:00 ` Matt Sealey
2008-08-25 13:06   ` David Jander
2008-08-25 22:28     ` Benjamin Herrenschmidt
2008-08-27 21:04       ` Steven Munroe
2008-08-29 11:48         ` David Jander
2008-08-29 12:21           ` Joakim Tjernlund
2008-09-01  7:23             ` David Jander
2008-09-01  9:36               ` Joakim Tjernlund
2008-09-02 13:12                 ` David Jander [this message]
2008-09-03  6:43                   ` Joakim Tjernlund
2008-09-03 20:33                   ` prodyut hazarika
2008-09-04  2:04                     ` Paul Mackerras
2008-09-04 12:05                       ` David Jander
2008-09-04 12:19                         ` Josh Boyer
2008-09-04 12:59                           ` David Jander
2008-09-04 14:31                             ` Steven Munroe
2008-09-04 14:45                               ` Gunnar Von Boehn
2008-09-04 15:14                               ` Gunnar Von Boehn
2008-09-04 16:25                               ` David Jander
2008-09-04 15:01                             ` Gunnar Von Boehn
2008-09-04 16:32                               ` David Jander
2008-09-04 18:14                       ` prodyut hazarika
2008-08-29 20:34           ` Steven Munroe
2008-09-01  8:29             ` David Jander
2008-08-31  8:28           ` Benjamin Herrenschmidt
2008-09-01  6:42             ` David Jander

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=200809021512.10132.david.jander@protonic.nl \
    --to=david.jander@protonic.nl \
    --cc=jrigby@freescale.com \
    --cc=linuxppc-dev@ozlabs.org \
    --cc=munroesj@us.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).