linuxppc-dev.lists.ozlabs.org archive mirror
* performance: memcpy vs. __copy_tofrom_user
@ 2008-10-08 14:39 Dominik Bozek
From: Dominik Bozek @ 2008-10-08 14:39 UTC (permalink / raw)
  To: linuxppc-embedded

Hi all,

I have benchmarked memcpy() against __copy_tofrom_user() on the MPC8313.
The main conclusion is that __copy_tofrom_user() is more efficient than
memcpy(), in some cases by about 40%.

If I understand correctly, memcpy() just copies the data, while
__copy_tofrom_user() also has to cope with memory that may have been
swapped out. So memcpy() should be faster than __copy_tofrom_user().
Am I right?
Can anybody here confirm these results, and maybe improve memcpy()?


Let me describe the test.
I prepared two 64 KB buffers and made sure that this memory is not
swapped out (necessary for memcpy() later). Then I run one of the memory
copy functions to transfer 32 MB and measure the time. The memory is
copied in chunks ranging from 64 KB down to 8 B. To deal with the cache,
I call flush_dcache_range() whenever the whole 64 KB has been used.
I know that the kernel's memcpy() is not intended to copy memory blocks
in user space, and that __copy_tofrom_user() is not intended to copy data
between two user-space blocks, but for a performance test it doesn't
matter.
Below is the relevant piece of code from the kernel module.

#define TEST_BUF_SIZE (64*1024)
int function;
char *buf1, *buf2, *buf1_bis, *buf2_bis;
unsigned int size, cnt;
unsigned long ret; /* bytes left uncopied by __copy_tofrom_user() */

get_user(function, &((TEST_ARG*)(arg))->function);
get_user(buf1, &((TEST_ARG*)(arg))->buf1);
get_user(buf2, &((TEST_ARG*)(arg))->buf2);
get_user(size, &((TEST_ARG*)(arg))->size);

cnt = (32*1024*1024)/size; /* number of copies needed to transfer 32 MB */
buf1_bis = buf1;
buf2_bis = buf2;

switch (function)
{
    case MEMCPY_TEST:
        while (cnt-->0)
        {
            if (buf1_bis >= buf1+TEST_BUF_SIZE)
            {
                /* flush the data cache as seldom as possible */
                buf1_bis = buf1;
                buf2_bis = buf2;
                flush_dcache_range((unsigned long)buf1, (unsigned long)(buf2+TEST_BUF_SIZE));
            }
            if (buf1_bis != memcpy(buf1_bis, buf2_bis, size))
                break;
            buf1_bis += size;
            buf2_bis += size;
        }
        break;

    case COPY_TOFROM_USER_TEST:
        while (cnt-->0)
        {
            if (buf1_bis >= buf1+TEST_BUF_SIZE)
            {
                /* flush the data cache as seldom as possible */
                buf1_bis = buf1;
                buf2_bis = buf2;
                flush_dcache_range((unsigned long)buf1, (unsigned long)(buf2+TEST_BUF_SIZE));
            }
            ret = __copy_tofrom_user(buf1_bis, buf2_bis, size);
            if (ret != 0)
                break;
            buf1_bis += size;
            buf2_bis += size;
        }
        break;
}
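
For anyone who wants to try the same chunked-copy pattern without a kernel module, here is a rough user-space sketch. It is not the code that produced the numbers in this mail: the function name chunked_copy_mbs() is made up for illustration, it times with the standard clock() call, and it cannot flush the data cache, so absolute figures will not be comparable with the kernel test.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

#define TEST_BUF_SIZE (64 * 1024)

/*
 * Hypothetical user-space analogue of the kernel test loop.
 * Copies `total` bytes from src to dst in pieces of `chunk` bytes, walking
 * through two TEST_BUF_SIZE buffers and wrapping to the start when the next
 * chunk would not fit. Returns the throughput in MB/s, or 0.0 if the chunk
 * size is invalid or the run was too short for clock() to measure.
 */
double chunked_copy_mbs(char *dst, const char *src, size_t chunk, size_t total)
{
    size_t cnt, off = 0;
    clock_t start, end;
    double secs;

    if (chunk == 0 || chunk > TEST_BUF_SIZE)
        return 0.0;

    cnt = total / chunk;            /* number of copies needed */
    start = clock();
    while (cnt-- > 0) {
        if (off + chunk > TEST_BUF_SIZE)
            off = 0;                /* the kernel test flushes the d-cache here */
        memcpy(dst + off, src + off, chunk);
        off += chunk;
    }
    end = clock();

    secs = (double)(end - start) / CLOCKS_PER_SEC;
    if (secs <= 0.0)
        return 0.0;
    return ((double)total / (1024.0 * 1024.0)) / secs;
}
```

Calling it once per chunk size (65536, 32768, ... 8) with total = 32*1024*1024 mirrors the loop structure of the kernel test.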


Below are the results:

memcpy()
chunk:  65536 [B] | transfer:     69.2 [MB/s] | time: 1.849727 [s] | size:  128.000 [MB]
chunk:  32768 [B] | transfer:     69.2 [MB/s] | time: 1.849700 [s] | size:  128.000 [MB]
chunk:  16384 [B] | transfer:     69.2 [MB/s] | time: 1.849845 [s] | size:  128.000 [MB]
chunk:   8192 [B] | transfer:     69.2 [MB/s] | time: 1.850535 [s] | size:  128.000 [MB]
chunk:   4096 [B] | transfer:     69.1 [MB/s] | time: 1.853405 [s] | size:  128.000 [MB]
chunk:   2048 [B] | transfer:     69.1 [MB/s] | time: 1.852877 [s] | size:  128.000 [MB]
chunk:   1024 [B] | transfer:     69.2 [MB/s] | time: 1.849963 [s] | size:  128.000 [MB]
chunk:    512 [B] | transfer:     69.0 [MB/s] | time: 1.853793 [s] | size:  128.000 [MB]
chunk:    256 [B] | transfer:     68.6 [MB/s] | time: 1.866222 [s] | size:  128.000 [MB]
chunk:    128 [B] | transfer:     68.0 [MB/s] | time: 1.883002 [s] | size:  128.000 [MB]
chunk:     64 [B] | transfer:     67.2 [MB/s] | time: 1.904073 [s] | size:  128.000 [MB]
chunk:     32 [B] | transfer:     64.7 [MB/s] | time: 1.978109 [s] | size:  128.000 [MB]
chunk:     16 [B] | transfer:     54.5 [MB/s] | time: 2.348682 [s] | size:  128.000 [MB]
chunk:      8 [B] | transfer:     47.4 [MB/s] | time: 2.698635 [s] | size:  128.000 [MB]


__copy_tofrom_user()
chunk:  65536 [B] | transfer:     97.3 [MB/s] | time: 1.315155 [s] | size:  128.000 [MB]
chunk:  32768 [B] | transfer:     97.3 [MB/s] | time: 1.315762 [s] | size:  128.000 [MB]
chunk:  16384 [B] | transfer:     97.2 [MB/s] | time: 1.316946 [s] | size:  128.000 [MB]
chunk:   8192 [B] | transfer:     96.8 [MB/s] | time: 1.321705 [s] | size:  128.000 [MB]
chunk:   4096 [B] | transfer:     96.6 [MB/s] | time: 1.325516 [s] | size:  128.000 [MB]
chunk:   2048 [B] | transfer:     96.6 [MB/s] | time: 1.325570 [s] | size:  128.000 [MB]
chunk:   1024 [B] | transfer:     96.8 [MB/s] | time: 1.322599 [s] | size:  128.000 [MB]
chunk:    512 [B] | transfer:     97.8 [MB/s] | time: 1.308186 [s] | size:  128.000 [MB]
chunk:    256 [B] | transfer:    100.2 [MB/s] | time: 1.277788 [s] | size:  128.000 [MB]
chunk:    128 [B] | transfer:     91.5 [MB/s] | time: 1.398216 [s] | size:  128.000 [MB]
chunk:     64 [B] | transfer:     87.0 [MB/s] | time: 1.471784 [s] | size:  128.000 [MB]
chunk:     32 [B] | transfer:     75.0 [MB/s] | time: 1.706426 [s] | size:  128.000 [MB]
chunk:     16 [B] | transfer:     47.8 [MB/s] | time: 2.678039 [s] | size:  128.000 [MB]
chunk:      8 [B] | transfer:     41.5 [MB/s] | time: 3.084689 [s] | size:  128.000 [MB]
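
To make the comparison explicit, the transfer column is just size divided by time, and the relative gain follows from the two times for the same chunk size. A small sketch of that arithmetic (the helper names are made up for illustration; the sample times in the comments are taken from the tables above):

```c
#include <stdio.h>

/* Throughput in MB/s for `mbytes` megabytes copied in `seconds`. */
double throughput_mbs(double mbytes, double seconds)
{
    return mbytes / seconds;
}

/*
 * Relative gain of routine B over routine A in percent, given the elapsed
 * times for the same amount of data. Positive means B is faster.
 */
double gain_percent(double time_a, double time_b)
{
    return (time_a / time_b - 1.0) * 100.0;
}

/* Prints one comparison row for the 128 MB runs listed above. */
void report(unsigned chunk, double t_memcpy, double t_copy_user)
{
    printf("chunk %5u B: memcpy %.1f MB/s, __copy_tofrom_user %.1f MB/s, "
           "gain %+.1f%%\n",
           chunk,
           throughput_mbs(128.0, t_memcpy),
           throughput_mbs(128.0, t_copy_user),
           gain_percent(t_memcpy, t_copy_user));
}

/*
 * Example inputs from the tables:
 *   report(65536, 1.849727, 1.315155);   large chunks
 *   report(8,     2.698635, 3.084689);   smallest chunks
 */
```

For 64 KB chunks this works out to roughly 40% in favour of __copy_tofrom_user(), while for 8 B chunks the sign flips and memcpy() comes out ahead by about 12%.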

Regards
Dominik Bozek


BTW, memcpy() could perhaps be optimized, as it is on 32-bit x86, for
the case where the block size is known at compile time.
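
What I have in mind is roughly the following. This is only a sketch of the idea, not the kernel's actual implementation on any architecture: sketch_memcpy() and generic_copy() are made-up names, and it relies on the GCC/Clang builtins __builtin_constant_p() and __builtin_memcpy().

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for the out-of-line, architecture-specific copy
 * routine that handles arbitrary run-time sizes.
 */
void *generic_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n--)
        *d++ = *s++;
    return dst;
}

/*
 * If the size is a compile-time constant, hand the copy to the compiler's
 * builtin, which can expand small fixed sizes into a few loads and stores.
 * Otherwise fall back to the generic routine.
 */
#define sketch_memcpy(dst, src, n)                      \
    (__builtin_constant_p(n)                            \
         ? __builtin_memcpy((dst), (src), (n))          \
         : generic_copy((dst), (src), (n)))
```

With this, sketch_memcpy(&a, &b, sizeof(a)) can be expanded inline by the compiler, while a call with a run-time length still ends up in the generic routine.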


