From: Jianguo Wu <wujianguo@huawei.com>
To: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
Cc: linux-mm@kvack.org, Andrea Arcangeli <aarcange@redhat.com>,
qiuxishi <qiuxishi@huawei.com>,
Wanpeng Li <liwanp@linux.vnet.ibm.com>,
Hush Bensen <hush.bensen@gmail.com>,
mitake.hitoshi@gmail.com
Subject: Re: Transparent Hugepage impact on memcpy
Date: Fri, 7 Jun 2013 09:26:58 +0800
Message-ID: <51B136E2.4010606@huawei.com>
In-Reply-To: <CAMO-S2ixv55bGEFGR6Eh=UZgVBz=nv81EckuzWoVi0t4KdB+VA@mail.gmail.com>
Hi Hitoshi,
Thanks for your reply! Please see below.
On 2013/6/6 21:54, Hitoshi Mitake wrote:
> Hi Jianguo,
>
> On Wed, Jun 5, 2013 at 12:26 PM, Jianguo Wu <wujianguo@huawei.com> wrote:
>> Hi,
>> One more question: I wrote a memcpy test program, mostly the same as perf bench memcpy,
>> but its result isn't consistent with perf bench when THP is off.
>>
>>           my program                        perf bench
>> THP:      3.628368 GB/Sec (with prefault)   3.672879 GB/Sec (with prefault)
>> NO-THP:   3.612743 GB/Sec (with prefault)   6.190187 GB/Sec (with prefault)
>>
>> Below is my code:
>> src = calloc(1, len);
>> dst = calloc(1, len);
>>
>> if (prefault)
>>         memcpy(dst, src, len);
>> gettimeofday(&tv_start, NULL);
>> memcpy(dst, src, len);
>> gettimeofday(&tv_end, NULL);
>>
>> timersub(&tv_end, &tv_start, &tv_diff);
>> free(src);
>> free(dst);
>>
>> speed = (double)((double)len / timeval2double(&tv_diff));
>> print_bps(speed);
>>
>> This is weird. Is it possible that perf bench applies some build optimization?
>>
>> Thanks,
>> Jianguo Wu.
>
> perf bench mem memcpy is built with -O6. This is the compile command
> line (you can get this with make V=1):
> gcc -o bench/mem-memcpy-x86-64-asm.o -c -fno-omit-frame-pointer -ggdb3
> -funwind-tables -Wall -Wextra -std=gnu99 -Werror -O6 .... # omitted
>
> Can I see the compile options for your test program and the actual
> command line you used to run perf bench mem memcpy?
>
I compiled my test program with plain gcc -o memcpy-test memcpy-test.c.
I also tried the same compile options as perf bench mem memcpy, and
the test result showed no difference.
The command line I used to run perf bench mem memcpy:
#./perf bench mem memcpy -l 1gb -o
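For reference, here is a minimal self-contained sketch of a test along the lines of the fragment quoted above. It is not the exact program: run_memcpy_test and sink are hypothetical names, the body of timeval2double is assumed (only its name appears in the fragment), and the error handling and the final read-back are additions.

```c
#define _DEFAULT_SOURCE /* for timersub() on glibc */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>

/* Hypothetical sink, read back so the timed copy has a visible use. */
static volatile char sink;

/* Convert a struct timeval to seconds (assumed implementation). */
static double timeval2double(const struct timeval *tv)
{
	return (double)tv->tv_sec + (double)tv->tv_usec / 1000000.0;
}

/*
 * Copy len bytes once and return the throughput in bytes per second.
 * Returns a negative value if len is 0 or an allocation fails.
 */
double run_memcpy_test(size_t len, int prefault)
{
	struct timeval tv_start, tv_end, tv_diff;
	char *src, *dst;
	double secs;

	if (len == 0)
		return -1.0;

	src = calloc(1, len);
	dst = calloc(1, len);
	if (!src || !dst) {
		free(src);
		free(dst);
		return -1.0;
	}

	/* Untimed first copy, so page faults happen before measuring. */
	if (prefault)
		memcpy(dst, src, len);

	gettimeofday(&tv_start, NULL);
	memcpy(dst, src, len);
	gettimeofday(&tv_end, NULL);
	timersub(&tv_end, &tv_start, &tv_diff);

	/* Read one byte to discourage the compiler from dropping the copy. */
	sink = dst[len - 1];

	free(src);
	free(dst);

	secs = timeval2double(&tv_diff);
	return secs > 0.0 ? (double)len / secs : 0.0;
}
```

Calling run_memcpy_test((size_t)1 << 30, 1) would correspond to the 1gb prefault case. Dividing the result by 2^30 gives GB/Sec, assuming binary units.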
Thanks,
Jianguo Wu
> Thanks,
> Hitoshi
>
>>
>> On 2013/6/4 16:57, Jianguo Wu wrote:
>>
>>> Hi all,
>>>
>>> I tested memcpy with perf bench and found that, in the prefault case, memcpy performs
>>> worse when Transparent Hugepage is on.
>>>
>>> With THP on it is 3.672879 GB/Sec (with prefault), while with THP off it is 6.190187 GB/Sec (with prefault).
>>>
>>> I thought THP would improve performance, but the test result obviously shows this is not the case.
>>> Andrea mentioned that THP makes "clear_page/copy_page less cache friendly" in
>>> http://events.linuxfoundation.org/slides/2011/lfcs/lfcs2011_hpc_arcangeli.pdf.
>>>
>>> I do not quite understand this; could you please give me some comments? Thanks!
>>>
>>> I tested on Linux-3.4-stable, and my machine info is:
>>> Intel(R) Xeon(R) CPU E5520 @ 2.27GHz
>>>
>>> available: 2 nodes (0-1)
>>> node 0 cpus: 0 1 2 3 8 9 10 11
>>> node 0 size: 24567 MB
>>> node 0 free: 23550 MB
>>> node 1 cpus: 4 5 6 7 12 13 14 15
>>> node 1 size: 24576 MB
>>> node 1 free: 23767 MB
>>> node distances:
>>> node   0   1
>>>   0:  10  20
>>>   1:  20  10
>>>
>>> Below is test result:
>>> ---with THP---
>>> #cat /sys/kernel/mm/transparent_hugepage/enabled
>>> [always] madvise never
>>> #./perf bench mem memcpy -l 1gb -o
>>> # Running mem/memcpy benchmark...
>>> # Copying 1gb Bytes ...
>>>
>>> 3.672879 GB/Sec (with prefault)
>>>
>>> #./perf stat ...
>>> Performance counter stats for './perf bench mem memcpy -l 1gb -o':
>>>
>>>      35455940 cache-misses      #  53.504 % of all cache refs      [49.45%]
>>>      66267785 cache-references                                     [49.78%]
>>>          2409 page-faults
>>>     450768651 dTLB-loads                                           [50.78%]
>>>         24580 dTLB-misses       #   0.01% of all dTLB cache hits   [51.01%]
>>>    1338974202 dTLB-stores                                          [50.63%]
>>>         77943 dTLB-misses                                          [50.24%]
>>>     697404997 iTLB-loads                                           [49.77%]
>>>           274 iTLB-misses       #   0.00% of all iTLB cache hits   [49.30%]
>>>
>>> 0.855041819 seconds time elapsed
>>>
>>> ---no THP---
>>> #cat /sys/kernel/mm/transparent_hugepage/enabled
>>> always madvise [never]
>>>
>>> #./perf bench mem memcpy -l 1gb -o
>>> # Running mem/memcpy benchmark...
>>> # Copying 1gb Bytes ...
>>>
>>> 6.190187 GB/Sec (with prefault)
>>>
>>> #./perf stat ...
>>> Performance counter stats for './perf bench mem memcpy -l 1gb -o':
>>>
>>>      16920763 cache-misses      #  98.377 % of all cache refs      [50.01%]
>>>      17200000 cache-references                                     [50.04%]
>>>        524652 page-faults
>>>     734365659 dTLB-loads                                           [50.04%]
>>>       4986387 dTLB-misses       #   0.68% of all dTLB cache hits   [50.04%]
>>>    1013408298 dTLB-stores                                          [50.04%]
>>>       8180817 dTLB-misses                                          [49.97%]
>>>    1526642351 iTLB-loads                                           [50.41%]
>>>            56 iTLB-misses       #   0.00% of all iTLB cache hits   [50.21%]
>>>
>>> 1.025425847 seconds time elapsed
>>>
>>> Thanks,
>>> Jianguo Wu.