From: Jerin Jacob <jerin.jacob@caviumnetworks.com>
To: Herbert Guan <Herbert.Guan@arm.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>, nd <nd@arm.com>
Subject: Re: [PATCH v3] arch/arm: optimization for memcpy on AArch64
Date: Tue, 19 Dec 2017 12:54:32 +0530
Message-ID: <20171219072431.GA19364@jerin>
In-Reply-To: <HE1PR08MB28096F9BAA9CFB42E397D18C860F0@HE1PR08MB2809.eurprd08.prod.outlook.com>
-----Original Message-----
> Date: Tue, 19 Dec 2017 05:33:19 +0000
> From: Herbert Guan <Herbert.Guan@arm.com>
> To: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> CC: "dev@dpdk.org" <dev@dpdk.org>, nd <nd@arm.com>
> Subject: RE: [PATCH v3] arch/arm: optimization for memcpy on AArch64
>
> Jerin,
>
> Thanks for review and comments. Please find my feedbacks below inline.
>
> > -----Original Message-----
> > From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> > Sent: Monday, December 18, 2017 15:44
> > To: Herbert Guan <Herbert.Guan@arm.com>
> > Cc: dev@dpdk.org
> > Subject: Re: [PATCH v3] arch/arm: optimization for memcpy on AArch64
> >
> > -----Original Message-----
> > > Date: Mon, 18 Dec 2017 10:54:24 +0800
> > > From: Herbert Guan <herbert.guan@arm.com>
> > > To: dev@dpdk.org, jerin.jacob@caviumnetworks.com
> > > CC: Herbert Guan <herbert.guan@arm.com>
> > > Subject: [PATCH v3] arch/arm: optimization for memcpy on AArch64
> > > X-Mailer: git-send-email 1.8.3.1
> > >
> > > Signed-off-by: Herbert Guan <herbert.guan@arm.com>
> > > ---
> > > config/common_armv8a_linuxapp | 6 +
> > > .../common/include/arch/arm/rte_memcpy_64.h | 292 +++++++++++++++++++++
> > > 2 files changed, 298 insertions(+)
> > >
> > > diff --git a/config/common_armv8a_linuxapp b/config/common_armv8a_linuxapp
> > > index 6732d1e..8f0cbed 100644
> > > --- a/config/common_armv8a_linuxapp
> > > +++ b/config/common_armv8a_linuxapp
> > > @@ -44,6 +44,12 @@ CONFIG_RTE_FORCE_INTRINSICS=y
> > > # to address minimum DMA alignment across all arm64 implementations.
> > > CONFIG_RTE_CACHE_LINE_SIZE=128
> > >
> > > +# Accelarate rte_memcpy. Be sure to run unit test to determine the
> >
> > Additional space before "Be". Rather than just mentioning the unit test,
> > mention the exact test case name (memcpy_perf_autotest).
> >
> > > +# best threshold in code. Refer to notes in source file
> >
> > Additional space before "Refer"
>
> Fixed in new version.
>
> >
> > > +# (lib/librte_eal/common/include/arch/arm/rte_memcpy_64.h) for more info.
> > > +CONFIG_RTE_ARCH_ARM64_MEMCPY=n
> > > +
> > > CONFIG_RTE_LIBRTE_FM10K_PMD=n
> > > CONFIG_RTE_LIBRTE_SFC_EFX_PMD=n
> > > CONFIG_RTE_LIBRTE_AVP_PMD=n
> > > diff --git a/lib/librte_eal/common/include/arch/arm/rte_memcpy_64.h
> > > b/lib/librte_eal/common/include/arch/arm/rte_memcpy_64.h
> > > index b80d8ba..1ea275d 100644
> > > --- a/lib/librte_eal/common/include/arch/arm/rte_memcpy_64.h
> > > +++ b/lib/librte_eal/common/include/arch/arm/rte_memcpy_64.h
> > > @@ -42,6 +42,296 @@
> > >
> > > #include "generic/rte_memcpy.h"
> > >
> > > +#ifdef RTE_ARCH_ARM64_MEMCPY
> >
> > See the comment below at "(GCC_VERSION < 50400)" check
> >
> > > +#include <rte_common.h>
> > > +#include <rte_branch_prediction.h>
> > > +
> > > +/*
> > > + * The memory copy performance differs on different AArch64 micro-architectures.
> > > + * And the most recent glibc (e.g. 2.23 or later) can provide a better memcpy()
> > > + * performance compared to old glibc versions. It's always suggested to use a
> > > + * more recent glibc if possible, from which the entire system can get benefit.
> > > + *
> > > + * This implementation improves memory copy on some aarch64 micro-architectures,
> > > + * when an old glibc (e.g. 2.19, 2.17...) is being used. It is disabled by
> > > + * default and needs "RTE_ARCH_ARM64_MEMCPY" defined to activate. It's not
> > > + * always providing better performance than memcpy() so users need to run unit
> > > + * test "memcpy_perf_autotest" and customize parameters in customization section
> > > + * below for best performance.
> > > + *
> > > + * Compiler version will also impact the rte_memcpy() performance. It's observed
> > > + * on some platforms and with the same code, GCC 7.2.0 compiled binaries can
> > > + * provide better performance than GCC 4.8.5 compiled binaries.
> > > + */
> > > +
> > > +/**************************************
> > > + * Beginning of customization section
> > > +**************************************/
> > > +#define ALIGNMENT_MASK 0x0F
> >
> > This symbol will be included in the public rte_memcpy.h for arm64 DPDK
> > builds.
> > Please use an RTE_ prefix to avoid multiple definitions
> > (RTE_ARCH_ARM64_ALIGN_MASK? or any shorter name).
> >
> Changed to RTE_AARCH64_ALIGN_MASK in new version.
Since it has to do with memcpy and arm64, I prefer
RTE_ARM64_MEMCPY_ALIGN_MASK.
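
To make the naming concrete, here is a minimal sketch of what I have in
mind. Only the mask name is my actual suggestion; the 0x0F value and the
strict-align check are taken from your patch, and the helper name
rte_arm64_memcpy_is_unaligned() is purely hypothetical, for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Suggested name. 0x0F (16-byte alignment) is the value used in the patch. */
#define RTE_ARM64_MEMCPY_ALIGN_MASK 0x0F

/* Sanity check: the mask must be of the form 2^n - 1. */
static_assert((RTE_ARM64_MEMCPY_ALIGN_MASK &
	       (RTE_ARM64_MEMCPY_ALIGN_MASK + 1)) == 0,
	      "alignment mask must be 2^n - 1");

/*
 * Hypothetical helper mirroring the patch's strict-align variant:
 * returns 1 if either dst or src is not 16-byte aligned, else 0.
 */
static inline int
rte_arm64_memcpy_is_unaligned(const void *dst, const void *src)
{
	return (((uintptr_t)dst | (uintptr_t)src) &
		RTE_ARM64_MEMCPY_ALIGN_MASK) != 0;
}
```

With the RTE_ARM64_MEMCPY_ prefix on every symbol, nothing generic like
ALIGNMENT_MASK leaks into applications that include rte_memcpy.h.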
>
> > > > +#ifndef RTE_ARCH_ARM64_MEMCPY_STRICT_ALIGN
> > > > +/* Only src unalignment will be treaed as unaligned copy */
> > > > +#define IS_UNALIGNED_COPY(dst, src) ((uintptr_t)(dst) & ALIGNMENT_MASK)
> > > > +#else
> > > > +/* Both dst and src unalignment will be treated as unaligned copy */
> > > > +#define IS_UNALIGNED_COPY(dst, src) \
> > > > + (((uintptr_t)(dst) | (uintptr_t)(src)) & ALIGNMENT_MASK)
> > > > +#endif
> > > +
> > > +
> > > +/*
> > > + * If copy size is larger than threshold, memcpy() will be used.
> > > + * Run "memcpy_perf_autotest" to determine the proper threshold.
> > > + */
> > > +#define ALIGNED_THRESHOLD ((size_t)(0xffffffff))
> > > +#define UNALIGNED_THRESHOLD ((size_t)(0xffffffff))
> >
> > Same as above comment.
> Added RTE_AARCH64_ prefix in new version.
Same as above.
>
> >
> > > +
> > > +/**************************************
> > > + * End of customization section
> > > + **************************************/
> > > +#ifdef RTE_TOOLCHAIN_GCC
> > > +#if (GCC_VERSION < 50400)
> > > +#warning "The GCC version is quite old, which may result in sub-optimal \
> > > +performance of the compiled code. It is suggested that at least GCC 5.4.0 \
> > > +be used."
> >
> > Even though it is a warning, depending on where this file gets included
> > it will generate an error (see below).
> > How about selecting the optimized memcpy only when RTE_ARCH_ARM64_MEMCPY
> > is defined && (GCC_VERSION >= 50400)?
> >
> Fully understood. However, I would rather not make it 'silent'. GCC 4.x is just
> quite old and may not produce the best optimized code -- not only for DPDK apps.
> We can provide another option, RTE_AARCH64_SKIP_GCC_VERSION_CHECK, to allow
> skipping the GCC version check. What do you think?
I prefer to reduce the number of options. But no strong opinion on this, as
the RTE_ARCH_ARM64_MEMCPY option is disabled by default (i.e. no risk).
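
For reference, the option-free alternative I suggested could look roughly
like the sketch below. It is an assumption-laden illustration, not the
patch's code: DEMO_USE_OPTIMIZED_MEMCPY and demo_memcpy() are hypothetical
names, and the GCC_VERSION definition is only a stand-in for the one that
rte_common.h normally provides.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for the GCC_VERSION macro normally provided by rte_common.h. */
#if defined(__GNUC__) && !defined(GCC_VERSION)
#define GCC_VERSION (__GNUC__ * 10000 + __GNUC_MINOR__ * 100 + \
		     __GNUC_PATCHLEVEL__)
#endif

/*
 * Select the optimized path only when the config option is set and,
 * for GCC builds, the compiler is at least 5.4.0. Older GCC silently
 * falls back to memcpy(), so no #warning is needed.
 */
#if defined(RTE_ARCH_ARM64_MEMCPY) && \
	(!defined(RTE_TOOLCHAIN_GCC) || GCC_VERSION >= 50400)
#define DEMO_USE_OPTIMIZED_MEMCPY 1
#else
#define DEMO_USE_OPTIMIZED_MEMCPY 0
#endif

static inline void *
demo_memcpy(void *dst, const void *src, size_t n)
{
#if DEMO_USE_OPTIMIZED_MEMCPY
	/* Placeholder: the optimized AArch64 copy routine would go here. */
	return memcpy(dst, src, n);
#else
	return memcpy(dst, src, n);
#endif
}
```

The trade-off is the one you point out: an old GCC falls back silently
instead of warning the user.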