From: Marc Sune <marc.sune-kpkqNMk1I7M@public.gmane.org>
To: Bruce Richardson
<bruce.richardson-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Cc: dev-VfR2kkLFssw@public.gmane.org
Subject: Re: [PATCH 0/4] DPDK memcpy optimization
Date: Wed, 21 Jan 2015 14:21:25 +0100
Message-ID: <54BFA7D5.7020106@bisdn.de>
In-Reply-To: <20150121130234.GB10756@bricha3-MOBL3>
On 21/01/15 14:02, Bruce Richardson wrote:
> On Wed, Jan 21, 2015 at 01:36:41PM +0100, Marc Sune wrote:
>> On 21/01/15 04:44, Wang, Zhihong wrote:
>>>> -----Original Message-----
>>>> From: Richardson, Bruce
>>>> Sent: Wednesday, January 21, 2015 12:15 AM
>>>> To: Neil Horman
>>>> Cc: Wang, Zhihong; dev-VfR2kkLFssw@public.gmane.org
>>>> Subject: Re: [dpdk-dev] [PATCH 0/4] DPDK memcpy optimization
>>>>
>>>> On Tue, Jan 20, 2015 at 10:11:18AM -0500, Neil Horman wrote:
>>>>> On Tue, Jan 20, 2015 at 03:01:44AM +0000, Wang, Zhihong wrote:
>>>>>>> -----Original Message-----
>>>>>>> From: Neil Horman [mailto:nhorman-2XuSBdqkA4R54TAoqtyWWQ@public.gmane.org]
>>>>>>> Sent: Monday, January 19, 2015 9:02 PM
>>>>>>> To: Wang, Zhihong
>>>>>>> Cc: dev-VfR2kkLFssw@public.gmane.org
>>>>>>> Subject: Re: [dpdk-dev] [PATCH 0/4] DPDK memcpy optimization
>>>>>>>
>>>>>>> On Mon, Jan 19, 2015 at 09:53:30AM +0800, zhihong.wang-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org wrote:
>>>>>>>> This patch set optimizes memcpy for DPDK for both SSE and AVX platforms.
>>>>>>>> It also extends memcpy test coverage with unaligned cases and more test
>>>>>>>> points.
>>>>>>>> Optimization techniques are summarized below:
>>>>>>>>
>>>>>>>> 1. Utilize full cache bandwidth
>>>>>>>>
>>>>>>>> 2. Enforce aligned stores
>>>>>>>>
>>>>>>>> 3. Apply load address alignment based on architecture features
>>>>>>>>
>>>>>>>> 4. Make load/store address available as early as possible
>>>>>>>>
>>>>>>>> 5. General optimization techniques like inlining, branch
>>>>>>>> reducing, prefetch pattern access
>>>>>>>>
>>>>>>>> Zhihong Wang (4):
>>>>>>>> Disabled VTA for memcpy test in app/test/Makefile
>>>>>>>> Removed unnecessary test cases in test_memcpy.c
>>>>>>>> Extended test coverage in test_memcpy_perf.c
>>>>>>>> Optimized memcpy in arch/x86/rte_memcpy.h for both SSE and AVX
>>>>>>>> platforms
>>>>>>>>
>>>>>>>> app/test/Makefile | 6 +
>>>>>>>> app/test/test_memcpy.c | 52 +-
>>>>>>>> app/test/test_memcpy_perf.c | 238 +++++---
>>>>>>>> .../common/include/arch/x86/rte_memcpy.h | 664 +++++++++++++++------
>>>>>>>> 4 files changed, 656 insertions(+), 304 deletions(-)
>>>>>>>>
>>>>>>>> --
>>>>>>>> 1.9.3
>>>>>>>>
>>>>>>>>
>>>>>>> Are you able to compile this with gcc 4.9.2? The compilation of
>>>>>>> test_memcpy_perf is taking forever for me. It appears hung.
>>>>>>> Neil
>>>>>> Neil,
>>>>>>
>>>>>> Thanks for reporting this!
>>>>>> It should compile, but it will take quite some time if the CPU doesn't
>>>>>> support AVX2. The reasons are:
>>>>>> 1. The SSE & AVX memcpy implementation is more complicated than the
>>>>>>    AVX2 version, so the compiler takes more time to compile and
>>>>>>    optimize it.
>>>>>> 2. The new test_memcpy_perf.c contains 126 constant memcpy calls for
>>>>>>    better test case coverage, which is quite a lot.
>>>>>>
>>>>>> I've just tested this patch on an Ivy Bridge machine with GCC 4.9.2:
>>>>>> 1. The whole compile process takes 9'41" with the original
>>>>>>    test_memcpy_perf.c (63 + 63 = 126 constant memcpy calls).
>>>>>> 2. It takes only 2'41" after I reduce the number of constant memcpy
>>>>>>    calls to 12 + 12 = 24.
>>>>>>
>>>>>> I'll reduce the number of memcpy calls in the next version of the patch.
>>>>>>
>>>>> ok, thank you. I'm all for optimization, but I think a compile that
>>>>> takes almost 10 minutes for a single file is going to generate some
>>>>> raised eyebrows when end users start tinkering with it.
>>>>>
>>>>> Neil
>>>>>
>>>>>> Zhihong (John)
>>>>>>
>>>> Even two minutes is a very long time to compile, IMHO. The whole of DPDK
>>>> doesn't take that long to compile right now, and that's with a couple of huge
>>>> header files with routing tables in it. Any chance you could cut compile time
>>>> down to a few seconds while still having reasonable tests?
>>>> Also, when there is AVX2 present on the system, what is the compile time
>>>> like for that code?
>>>>
>>>> /Bruce
>>> Neil, Bruce,
>>>
>>> Some data first.
>>>
>>> Sandy Bridge without AVX2:
>>> 1. original w/ 10 constant memcpy: 2'25"
>>> 2. patch w/ 12 constant memcpy: 2'41"
>>> 3. patch w/ 63 constant memcpy: 9'41"
>>>
>>> Haswell with AVX2:
>>> 1. original w/ 10 constant memcpy: 1'57"
>>> 2. patch w/ 12 constant memcpy: 1'56"
>>> 3. patch w/ 63 constant memcpy: 3'16"
>>>
>>> Also, to address Bruce's question: we have to reduce the number of test
>>> cases to cut down compile time, because we use:
>>> 1. intrinsics instead of assembly, for better flexibility and so that the
>>>    compiler can apply more optimizations
>>> 2. a complex function body for better performance
>>> 3. inlining
>>> All of these increase compile time.
>>> But I think it'd be okay to do that as long as we can select a fair set of
>>> test points.
>>>
>>> It'd be great if you could suggest some, say, 12 points.
>>>
>>> Zhihong (John)
>>>
>>>
>> While I agree that, in the general case, these long compilation times are
>> painful for users, a 2-8x speedup in memcpy operations is quite an
>> improvement, especially in DPDK applications that (unfortunately) rely
>> heavily on them -- e.g. IP fragmentation and reassembly.
>>
>> Why not have a fast compilation by default, and a tunable config flag to
>> enable a highly optimized version of rte_memcpy (e.g. RTE_EAL_OPT_MEMCPY)?
>>
>> Marc
>>
> Out of interest, are these 2-8x improvements something you have benchmarked
> in these app scenarios? [i.e. not just in micro-benchmarks].
How much that micro-speedup will end up affecting the performance of the
entire application is something I cannot say, so I agree that we should
probably have some additional benchmarks before deciding whether it pays
off to maintain 2 versions of rte_memcpy.
There are, however, a number of DPDK applications that could potentially
benefit: IP fragmentation, tunneling and specialized DPI applications,
among others, since they involve a reasonable amount of memcpys per
packet. My point was: *if* it proves to be beneficial enough, why not
make it optional?
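
To make the idea concrete, here is a rough sketch of how such an opt-in
flag could be wired up. RTE_EAL_OPT_MEMCPY is only the name I suggested
earlier, not an existing DPDK option, and app_memcpy, opt_memcpy and
app_memcpy_selfcheck are hypothetical names used purely for illustration.
The byte loop merely stands in for the optimized SSE/AVX code from the
patch; it is not that code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * RTE_EAL_OPT_MEMCPY is an assumed, hypothetical build-time flag.
 * Default build: plain memcpy and a fast compile.
 * Build with -DRTE_EAL_OPT_MEMCPY: the heavily inlined variant.
 */
#ifdef RTE_EAL_OPT_MEMCPY
/* Placeholder for the optimized implementation (not the real patch code). */
static inline void *
opt_memcpy(void *dst, const void *src, size_t n)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	while (n--)
		*d++ = *s++;
	return dst;
}
#define app_memcpy(dst, src, n) opt_memcpy((dst), (src), (n))
#else
#define app_memcpy(dst, src, n) memcpy((dst), (src), (n))
#endif

/* Copies a small buffer through the selected path and verifies the result. */
static int
app_memcpy_selfcheck(void)
{
	const char src[] = "dpdk";
	char dst[sizeof(src)] = {0};

	app_memcpy(dst, src, sizeof(src));
	assert(memcmp(dst, src, sizeof(src)) == 0);
	return 1;
}
```

Callers use app_memcpy() everywhere and never see which path was chosen,
so the default build stays fast while users who care can opt in at build
time.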
Marc
>
> /Bruce