From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Jianbo.Liu@arm.com"
Subject: Re: [PATCH ] examples/l3fwd: fix aliasing in port grouping
Date: Fri, 3 Nov 2017 11:21:43 +0800
Message-ID: <20171103032141.GA6518@arm.com>
References: <20171102143114.24380-1-gprathyusha@caviumnetworks.com>
 <2601191342CEEE43887BDE71AB9772585FAB87F0@irsmsx105.ger.corp.intel.com>
 <20171102153327.GA24586@cavium.com>
 <2601191342CEEE43887BDE71AB9772585FAB884B@irsmsx105.ger.corp.intel.com>
In-Reply-To: <2601191342CEEE43887BDE71AB9772585FAB884B@irsmsx105.ger.corp.intel.com>
To: "Ananyev, Konstantin"
Cc: Guduri Prathyusha, "dev@dpdk.org", "guduriprathyusha@gmail.com",
 "Kantecki, Tomasz"
List-Id: DPDK patches and discussions

The 11/02/2017 15:52, Ananyev, Konstantin wrote:
>
>
> > -----Original Message-----
> > From: Guduri Prathyusha [mailto:gprathyusha@caviumnetworks.com]
> > Sent: Thursday, November 2, 2017 3:34 PM
> > To: Ananyev, Konstantin
> > Cc: dev@dpdk.org; Jianbo.Liu@arm.com; guduriprathyusha@gmail.com; Kantecki, Tomasz
> > Subject: Re: [dpdk-dev] [PATCH ] examples/l3fwd: fix aliasing in port grouping
> >
> > On Thu, Nov 02, 2017 at 02:46:43PM +0000, Ananyev, Konstantin wrote:
> > > Hi,
> > Hi
> > >
> > > > -----Original Message-----
> > > > From: Guduri Prathyusha [mailto:gprathyusha@caviumnetworks.com]
> > > > Sent: Thursday, November 2, 2017 2:31 PM
> > > > To: Kantecki, Tomasz
> > > > Cc: Jianbo.Liu@arm.com; guduriprathyusha@gmail.com; Ananyev, Konstantin; dev@dpdk.org; Guduri Prathyusha
> > > > Subject: [dpdk-dev] [PATCH ] examples/l3fwd: fix aliasing in port grouping
> > > >
> > > > With -fstrict-aliasing enabled by default from -O2, gcc > 5.x gives

Could you tell me exactly which gcc version you are using?

> > > > undefined behavior in port_groupx4. 'pn' and 'pnum' are two different
> > > > pointers pointing to same chunk of memory and with -fstrict-aliasing the
> > > > pointers are assumed to be pointing to different memory and compiler
> > > > reorders instructions that depend on pnum and pn. This breaks port
> > > > grouping algorithm.
> > > >
> > > > This patch eliminates the usage of union and uses memcpy for copying
> > > > gptbl[v].pnum to pn. memcpy when applied on built_in constant size does
> > > > not call its library implementation but uses appropriate LD and ST
> > > > instructions directly and hence no performance overhead.
> > > >
> > > > Fixes: 569b290cdb36 ("examples/l3fwd: add NEON implementation")
> > > > Fixes: af1694d94bf1 ("examples/l3fwd: fix crash with gcc 5")
> > > > Signed-off-by: Guduri Prathyusha
> > > > ---
> > > >  examples/l3fwd/l3fwd_neon.h | 11 +++--------
> > > >  examples/l3fwd/l3fwd_sse.h  | 11 +++--------
> > > >  2 files changed, 6 insertions(+), 16 deletions(-)
> > > >
> > > > diff --git a/examples/l3fwd/l3fwd_neon.h b/examples/l3fwd/l3fwd_neon.h
> > > > index 4bc161394..10a602a04 100644
> > > > --- a/examples/l3fwd/l3fwd_neon.h
> > > > +++ b/examples/l3fwd/l3fwd_neon.h
> > > > @@ -100,11 +100,6 @@ static inline uint16_t *
> > > >  port_groupx4(uint16_t pn[FWDSTEP + 1], uint16_t *lp, uint16x8_t dp1,
> > > >  		uint16x8_t dp2)
> > > >  {
> > > > -	union {
> > > > -		uint16_t u16[FWDSTEP + 1];
> > > > -		uint64_t u64;
> > > > -	} *pnum = (void *)pn;
> > > > -
> > > >  	int32_t v;
> > > >  	uint16x8_t mask = {1, 2, 4, 8, 0, 0, 0, 0};
> > > >
> > > > @@ -117,9 +112,9 @@ port_groupx4(uint16_t pn[FWDSTEP + 1], uint16_t *lp, uint16x8_t dp1,
> > > >
> > > >  	/* if dest port value has changed. */
> > > >  	if (v != GRPMSK) {
> > > > -		pnum->u64 = gptbl[v].pnum;
> > > > -		pnum->u16[FWDSTEP] = 1;
> > > > -		lp = pnum->u16 + gptbl[v].idx;
> > > > +		rte_memcpy(pn, &gptbl[v].pnum, sizeof(gptbl[v].pnum));
> > > > +		pn[FWDSTEP] = 1;
> > > > +		lp = pn + gptbl[v].idx;
> > > >  	}
> > > >
> > > >  	return lp;
> > > > diff --git a/examples/l3fwd/l3fwd_sse.h b/examples/l3fwd/l3fwd_sse.h
> > > > index 831760f02..79a71d77e 100644
> > > > --- a/examples/l3fwd/l3fwd_sse.h
> > > > +++ b/examples/l3fwd/l3fwd_sse.h
> > > > @@ -98,11 +98,6 @@ processx4_step3(struct rte_mbuf *pkt[FWDSTEP], uint16_t dst_port[FWDSTEP])
> > > >  static inline uint16_t *
> > > >  port_groupx4(uint16_t pn[FWDSTEP + 1], uint16_t *lp, __m128i dp1, __m128i dp2)
> > > >  {
> > > > -	union {
> > > > -		uint16_t u16[FWDSTEP + 1];
> > > > -		uint64_t u64;
> > > > -	} *pnum = (void *)pn;
> > > > -
> > > >  	int32_t v;
> > > >
> > > >  	dp1 = _mm_cmpeq_epi16(dp1, dp2);
> > > > @@ -114,9 +109,9 @@ port_groupx4(uint16_t pn[FWDSTEP + 1], uint16_t *lp, __m128i dp1, __m128i dp2)
> > > >
> > > >  	/* if dest port value has changed. */
> > > >  	if (v != GRPMSK) {
> > > > -		pnum->u64 = gptbl[v].pnum;
> > > > -		pnum->u16[FWDSTEP] = 1;
> > > > -		lp = pnum->u16 + gptbl[v].idx;
> > > > +		rte_memcpy(pn, &gptbl[v].pnum, sizeof(gptbl[v].pnum));
> > > > +		pn[FWDSTEP] = 1;
> > > > +		lp = pn + gptbl[v].idx;
> > >
> > > Could you explain a bit more here - which exactly instructions were reordered
> > > and what kind of problems did it cause?
> > > Specially on IA?
> >
> > This issue is observed on ARM since ARM gcc is more aggressive in
> > reordering than x86 gcc.
>
> Ok, then if x86 is not affected why to modify l3fwd_sse.h at all?
> Unless there is a reproducible problem with x86 -
> my preference would be to keep that file intact.
>
> > In ARM when v != GRPMSK, the following
> > instructions ordering is not guaranteed because of strict aliasing.
> >
> > lp[0] += gptbl[v].lpv;
> > pnum->u64 = gptbl[v].pnum;
> > pnum->u16[FWDSTEP] = 1;
> > lp = pnum->u16 + gptbl[v].idx;
>
> Ok, so what in particular is reordered by the compiler:
>
> lp[0] += gptbl[v].lpv;           (1)
> pnum->u64 = gptbl[v].pnum;       (2)
> pnum->u16[FWDSTEP] = 1;          (3)
> lp = pnum->u16 + gptbl[v].idx;   (4)
>
> (2) and (3)?
> If so I am not sure how it could be a problem:
> they do stores to the different locations.
> (1) and (4) as I can see shouldn't be reordered.
> Anyway - if you think this a compiler reordering issue,
> then adding rte_compiler_barrier() should fix the issue, right?

Agree.

> >
> > That results in wrong lp[0] updation.
> > memcpy in this case will avoid this problem.
> >
> > > In any case I don't think using rte_memcpy is a good thing to use here:
> > > it is a huge inline function - way too much to copy just 64 bit variable.
> >
> > I agree that rte_memcpy is overhead in this case but how about using
> > memcpy that will not use library implementation if the size is constant.
> > memcpy with constant size uses built_in_memcpy that does not add
> > performance overhead.
>
> On x86 rte_memcpy() doesn't call libc memcpy() at all - it is a separate function:
> ib/librte_eal/common/include/arch/x86/rte_memcpy.h
>
> >
> > Thoughts?
>
> As I said - if x86 is not affected - please keep l3fwd_sse.h intact.
> If it does (still not sure how) - check would compiler barrier help here.
> Konstantin
>
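
For reference, the memcpy-based variant discussed in this thread can be
sketched in plain C as below. This is only an illustration under stated
assumptions, not the DPDK code: struct grp_entry and regroup() are
hypothetical names, the field types are guesses modelled on the
gptbl[v].pnum / .idx / .lpv accesses quoted above, and plain libc memcpy()
is used rather than rte_memcpy().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FWDSTEP 4

/* Hypothetical stand-in for one gptbl[] entry; field types are assumed. */
struct grp_entry {
	uint64_t pnum;	/* four packed 16-bit counters */
	int32_t idx;	/* index of the new "last group" slot in pn[] */
	uint16_t lpv;	/* value to add to the current group counter */
};

/*
 * Hypothetical helper mirroring the "v != GRPMSK" branch of port_groupx4().
 * The 64-bit value is copied into pn[] with memcpy() instead of being
 * stored through a union pointer, so pn[] is only ever accessed as
 * uint16_t or as raw bytes and no type punning is involved.
 */
static inline uint16_t *
regroup(uint16_t pn[FWDSTEP + 1], uint16_t *lp, const struct grp_entry *e)
{
	assert(pn != NULL && lp != NULL && e != NULL);

	lp[0] += e->lpv;			/* (1) update the previous group */
	memcpy(pn, &e->pnum, sizeof(e->pnum));	/* (2) fill pn[0..3] */
	pn[FWDSTEP] = 1;			/* (3) start a new group */
	return pn + e->idx;			/* (4) new "last group" pointer */
}
```

With a constant size of 8 bytes, compilers commonly expand such a memcpy()
call inline instead of calling the library routine, which is the behaviour
the patch description relies on. It is worth confirming this in the
generated code for the compiler and flags actually in use.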