linuxppc-dev.lists.ozlabs.org archive mirror
* MPC5200B memory performance
@ 2007-05-15 11:22 Daniel Schnell
  2007-05-16 18:03 ` Matthias Fechner
  0 siblings, 1 reply; 4+ messages in thread
From: Daniel Schnell @ 2007-05-15 11:22 UTC (permalink / raw)
  To: linuxppc-embedded

[-- Attachment #1: Type: text/plain, Size: 2432 bytes --]

Hi,


I am doing some memory performance measurements on our custom MPC5200B
board, which runs at 396 MHz internally and is connected to DDR RAM
clocked at 132 MHz.

With the attached program (compile with -lrt) I am testing memcpy()
throughput. In theory the raw memory throughput should be twice the
memcpy() throughput, since every byte is read once and written once
when source and destination buffers both live in DDR RAM.

So one could make the simple calculation:

132 MHz * 32 bit (data bus width) * 2 (DDR) ~ 1 GByte/sec gross memory
throughput.

For a memcpy this should then be ~500 MB/sec.

Of course in real-world scenarios we cannot reach the theoretical limit,
but I would guess we should get within about 30 % of it.


I get the following values on my board:

bash-2.05b# ./memcpy_perf
Test (10000) memcpy of sizes (1024) ....
10000 memcpy. Time per memcpy: 1567 [nsec] (653 MB/sec)
 finished.
Test (10000) memcpy of sizes (2048) ....
10000 memcpy. Time per memcpy: 2939 [nsec] (696 MB/sec)
 finished.
Test (10000) memcpy of sizes (4096) ....
10000 memcpy. Time per memcpy: 5706 [nsec] (717 MB/sec)
 finished.
Test (10000) memcpy of sizes (8192) ....
10000 memcpy. Time per memcpy: 17077 [nsec] (479 MB/sec)
 finished.
Test (10000) memcpy of sizes (16384) ....
10000 memcpy. Time per memcpy: 133314 [nsec] (122 MB/sec)
 finished.
Test (1000) memcpy of sizes (32768) ....
1000 memcpy. Time per memcpy: 243417 [nsec] (134 MB/sec)
 finished.
Test (1000) memcpy of sizes (51200) ....
1000 memcpy. Time per memcpy: 403455 [nsec] (126 MB/sec)
 finished.
Test (1000) memcpy of sizes (102400) ....
1000 memcpy. Time per memcpy: 713316 [nsec] (143 MB/sec)
 finished.
Test (100) memcpy of sizes (1048576) ....
100 memcpy. Time per memcpy: 7210570 [nsec] (145 MB/sec)
 finished.
Test (10) memcpy of sizes (10485760) ....
10 memcpy. Time per memcpy: 78162400 [nsec] (134 MB/sec)
 finished.
Test (5) memcpy of sizes (52428800) ....
5 memcpy. Time per memcpy: 425281800 [nsec] (123 MB/sec)
 finished.



The first four values fit in the data cache, so there we are really
testing cache performance. All the other values exercise the memory
controller interface.

All in all, I am not sure why memory access is so much slower than
I expected.
Which factors did I miss in my calculation? Can anybody run this
program on their 5200B-based board as a comparison?


Best regards,

Daniel Schnell.

[-- Attachment #2: memcpy_perf.c --]
[-- Type: application/octet-stream, Size: 1979 bytes --]

#include <stdio.h>
#include <stdlib.h>
#include <string.h>	/* memcpy, memset */
#include <time.h>	/* clock_gettime, struct timespec */

int osa_now_timespec(struct timespec* t)
{
	return clock_gettime (CLOCK_REALTIME, t);
}


int osa_timediff(const struct timespec* t1, const struct timespec* t2, struct timespec* diff)
{
	if (t1!=NULL && t2!=NULL && diff!=NULL)
	{
		unsigned long long a_nsec, b_nsec, diff_nsec; 

		// calculate difference time
		a_nsec = t1->tv_sec*1000000000ULL + t1->tv_nsec;
		b_nsec = t2->tv_sec*1000000000ULL + t2->tv_nsec;
		diff_nsec = b_nsec - a_nsec;
		diff->tv_sec =  diff_nsec/1000000000ULL;
		diff->tv_nsec = diff_nsec%1000000000ULL;
		return 0;
	}
	return -1;
}


unsigned long long osa_to_ns(const struct timespec* t)
{
	 return (t->tv_sec*1000000000ULL + t->tv_nsec);
}


int testMemcpy(unsigned long num, size_t msgsize)
{
	unsigned long i;
	unsigned long long nstime;
	struct timespec t1, t2, t3;

	printf("Test (%lu) memcpy of sizes (%zu) ....\n", num, msgsize);

	char *buf1 = malloc(msgsize);
	char *buf2 = malloc(msgsize);

	/* touch both buffers so page faults are not part of the measurement */
	memset(buf1, 0, msgsize);
	memset(buf2, 1, msgsize);

	// measure
	osa_now_timespec(&t1);
	for (i = 0; i < num; i++)
	{
		memcpy(buf1, buf2, msgsize);
	}
	// measure
	osa_now_timespec(&t2);
	osa_timediff(&t1, &t2, &t3);

	free(buf2);
	free(buf1);

	nstime = osa_to_ns(&t3) / (unsigned long long) num;
	printf("%lu memcpy. Time per memcpy: %llu [nsec] (%llu MB/sec)\n",
	       num, nstime,
	       (1000ULL * (unsigned long long) msgsize) / nstime);
	fflush(stdout);

	printf(" finished.\n");
	fflush(stdout);
	return 0;
}


int main()
{
    testMemcpy(10000,      1*1024);
    testMemcpy(10000,      2*1024);
    testMemcpy(10000,      4*1024);
    testMemcpy(10000,      8*1024);
    testMemcpy(10000,     16*1024);
    testMemcpy(1000 ,     32*1024);
    testMemcpy(1000 ,     50*1024);
    testMemcpy(1000 ,    100*1024);
    testMemcpy( 100 ,   1024*1024);
    testMemcpy(  10 ,10*1024*1024);
    testMemcpy(   5 ,50*1024*1024);

    return 0;
}

^ permalink raw reply	[flat|nested] 4+ messages in thread

* RE: MPC5200B memory performance
@ 2007-05-15 15:14 Fillod Stephane
  0 siblings, 0 replies; 4+ messages in thread
From: Fillod Stephane @ 2007-05-15 15:14 UTC (permalink / raw)
  To: Daniel Schnell, linuxppc-embedded

Daniel Schnell wrote:
>With the attached program (compile with -lrt) I am testing the memcpy()
>throughput. In theory the memory throughput should be the double of the
>memcpy() throughput if source and destination buffers are same size and
>inside the DDR-RAM.

Theory says that write speed is a little different from read speed,
but that only matters if you want to be picky. RTFD(*).
(*) Datasheets

>So one could make the simple calculation:
>
>132 MHz * 32 bit (data bus width) * 2 (DDR) ~ 1 GByte/sec gross memory
>throughput.
>
>For a memcpy this should then be ~500 MB/sec.

All you can say, assuming a 100% efficient CPU/cache/bus/DDR controller,
is that a memcpy (hitting the DRAM) cannot be faster than that
value :-)

>Of course in real-world scenarios we cannot reach the theoretical limit,
>but I would guess we should get within about 30 % of it.

IMO, real-world scenarios *should* achieve at least 70% with an
appropriate memcpy implementation. I've been disappointed lately by the
PQ3, which cannot do better than ~50% efficiency. I'd love anyone, esp.
from Freescale, to prove me wrong or show my mistake. The FAE didn't
give an answer, but I saw that newer parts will have a "queue manager"
helping the DDR controller. Any idea?

[...]
>The first four values fit in the data cache, so there we are really
>testing

What's your data cache size, BTW? Do you have an L2 cache?

>cache performance. All the other values exercise the memory
>controller interface.

Well, you're also testing part of the cache and memory subsystem.
On the read side, you pay extra for cache misses. On the write side,
there is read-on-write: I don't know the mpc5200 details, but most cache
subsystems think it's smart to fill up (read) the rest of a line you
began to write. In the big-memcpy case that read is useless, because
the cache didn't know you were about to overwrite the whole line.
That's why the dcbz ppc instruction comes in handy to prevent the R-O-W.

In that regard, glibc is very suboptimal. For better performance,
I recommend reading and understanding the cacheable_memcpy assembly
function in the Linux kernel (arch/ppc/lib/string.S). It's missing
some read prefetch (dcbt), though.
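To illustrate the idea (a hedged sketch in C, not the kernel's cacheable_memcpy; the 32-byte line size and the name dcbz_memcpy are my own assumptions): zero each whole, line-aligned destination cache line with dcbz before filling it, so the stores allocate the line in cache without the useless read from DRAM.

```c
#include <string.h>

#define CACHE_LINE 32  /* data cache line size on the MPC5200B's 603e core */

/* Copy that avoids read-on-write: dcbz zeroes (and thus allocates) each
 * whole, line-aligned destination cache line before it is overwritten.
 * On non-PowerPC builds it falls back to a plain memcpy. */
static void dcbz_memcpy(char *dst, const char *src, size_t msgsize)
{
#if defined(__powerpc__)
    if (((unsigned long) dst % CACHE_LINE) == 0) {
        while (msgsize >= CACHE_LINE) {
            __asm__ volatile ("dcbz 0,%0" : : "r" (dst) : "memory");
            memcpy(dst, src, CACHE_LINE);
            dst += CACHE_LINE;
            src += CACHE_LINE;
            msgsize -= CACHE_LINE;
        }
    }
    memcpy(dst, src, msgsize);  /* unaligned buffers and the tail */
#else
    memcpy(dst, src, msgsize);
#endif
}
```

A real implementation would do the per-line copy with registers rather than calling memcpy; this only shows where dcbz fits.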

>All in all, I am not sure why memory access is so much slower than
>I expected.
>Which factors did I miss in my calculation? Can anybody run this
>program on their 5200B-based board as a comparison?

The values on PQ3 won't be of any help to you, esp. with such a
disappointing result (50% efficiency max). If you doubt your memcpy
implementation, you may implement the same bench with DMA (to get
50 MiB of contiguous RAM, do it in the kernel or under U-Boot).

Best Regards,
--
Stephane

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: MPC5200B memory performance
  2007-05-15 11:22 MPC5200B memory performance Daniel Schnell
@ 2007-05-16 18:03 ` Matthias Fechner
  2007-05-18 10:27   ` Daniel Schnell
  0 siblings, 1 reply; 4+ messages in thread
From: Matthias Fechner @ 2007-05-16 18:03 UTC (permalink / raw)
  To: linuxppc-embedded

Hello Daniel,

* Daniel Schnell <daniel.schnell@marel.com> [15-05-07 11:22]:
> I get the following values on my board:

my result is:
Test (10000) memcpy of sizes (1024) ....
10000 memcpy. Time per memcpy: 1814 [nsec] (564 MB/sec)
 finished.
Test (10000) memcpy of sizes (2048) ....
10000 memcpy. Time per memcpy: 3433 [nsec] (596 MB/sec)
 finished.
Test (10000) memcpy of sizes (4096) ....
10000 memcpy. Time per memcpy: 6687 [nsec] (612 MB/sec)
 finished.
Test (10000) memcpy of sizes (8192) ....
10000 memcpy. Time per memcpy: 21454 [nsec] (381 MB/sec)
 finished.
Test (10000) memcpy of sizes (16384) ....
10000 memcpy. Time per memcpy: 205551 [nsec] (79 MB/sec)
 finished.
Test (1000) memcpy of sizes (32768) ....
1000 memcpy. Time per memcpy: 379875 [nsec] (86 MB/sec)
 finished.
Test (1000) memcpy of sizes (51200) ....
1000 memcpy. Time per memcpy: 588792 [nsec] (86 MB/sec)
 finished.
Test (1000) memcpy of sizes (102400) ....
1000 memcpy. Time per memcpy: 1126511 [nsec] (90 MB/sec)
 finished.
Test (100) memcpy of sizes (1048576) ....
100 memcpy. Time per memcpy: 11307890 [nsec] (92 MB/sec)
 finished.
Test (10) memcpy of sizes (10485760) ....
10 memcpy. Time per memcpy: 120783600 [nsec] (86 MB/sec)
 finished.
Test (5) memcpy of sizes (52428800) ....
5 memcpy. Time per memcpy: 673867800 [nsec] (77 MB/sec)
 finished.
	   


Best regards,
Matthias

-- 

"Programming today is a race between software engineers striving to
build bigger and better idiot-proof programs, and the universe trying to
produce bigger and better idiots. So far, the universe is winning." --
Rich Cook

^ permalink raw reply	[flat|nested] 4+ messages in thread

* RE: MPC5200B memory performance
  2007-05-16 18:03 ` Matthias Fechner
@ 2007-05-18 10:27   ` Daniel Schnell
  0 siblings, 0 replies; 4+ messages in thread
From: Daniel Schnell @ 2007-05-18 10:27 UTC (permalink / raw)
  To: linuxppc-embedded

Hi,

I found the following link to get more insight into the memcpy
performance of my board:

http://www.greyhound-data.com/gunnar/glibc/index.htm

One can download a test suite for memory performance and run it. Not
only memcpy, but also memcmp and memset are tested.

This gave quite interesting results. Most of the time I get better
values from the optimized functions than from the glibc functions. As a
rule of thumb: depending on the buffer size used, a 15-20 % speedup,
sometimes > 100% !

I will definitely integrate some of these functions into our software
and then use
-Wl,--wrap,memcpy on the linker line to replace all occurrences of
memcpy, memcmp, etc., also for external libraries.
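GNU ld's --wrap does the redirection at link time: every undefined reference to memcpy, including those in external libraries linked in as objects or archives, is resolved to __wrap_memcpy, while the original stays reachable as __real_memcpy. A minimal sketch (fast_memcpy is a hypothetical stand-in for one of the optimized routines from the test suite):

```c
#include <stddef.h>

/* Hypothetical stand-in for an optimized copy routine. */
static void *fast_memcpy(void *dst, const void *src, size_t n)
{
    char *d = dst;
    const char *s = src;
    while (n--)            /* simple byte loop as a placeholder */
        *d++ = *s++;
    return dst;
}

/* The original memcpy stays reachable under this name when the
 * objects are linked with -Wl,--wrap,memcpy (declared, not used here). */
void *__real_memcpy(void *dst, const void *src, size_t n);

/* With -Wl,--wrap,memcpy on the link line, the linker rewrites every
 * reference to memcpy into a reference to __wrap_memcpy. */
void *__wrap_memcpy(void *dst, const void *src, size_t n)
{
    return fast_memcpy(dst, src, n);
}
```

Something like `gcc main.o wrap_memcpy.o -Wl,--wrap,memcpy` would then route all memcpy calls in main.o through the wrapper.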


Best regards,

Daniel Schnell

--
Daniel Schnell                   | daniel.schnell@marel.com
Hugbúnaðargerð                   | www.marel.com


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2007-05-18 10:27 UTC | newest]

Thread overview: 4+ messages
2007-05-15 11:22 MPC5200B memory performance Daniel Schnell
2007-05-16 18:03 ` Matthias Fechner
2007-05-18 10:27   ` Daniel Schnell
  -- strict thread matches above, loose matches on Subject: below --
2007-05-15 15:14 Fillod Stephane
