From mboxrd@z Thu Jan 1 00:00:00 1970 From: Jerin Jacob Subject: Re: mbuff rearm_data aligmenet issue on non x86 Date: Thu, 12 May 2016 20:20:55 +0530 Message-ID: <20160512145054.GA5784@localhost.localdomain> References: <20160512091349.GA10395@localhost.localdomain> <2601191342CEEE43887BDE71AB97725836B4FFE2@irsmsx105.ger.corp.intel.com> <20160512121719.GA1806@localhost.localdomain> <2601191342CEEE43887BDE71AB97725836B500B1@irsmsx105.ger.corp.intel.com> Mime-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Cc: "dev@dpdk.org" , "Richardson, Bruce" , "thomas.monjalon@6wind.com" To: "Ananyev, Konstantin" Return-path: Received: from na01-bn1-obe.outbound.protection.outlook.com (mail-bn1on0070.outbound.protection.outlook.com [157.56.110.70]) by dpdk.org (Postfix) with ESMTP id A0DFF6CB8 for ; Thu, 12 May 2016 16:51:17 +0200 (CEST) Content-Disposition: inline In-Reply-To: <2601191342CEEE43887BDE71AB97725836B500B1@irsmsx105.ger.corp.intel.com> List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" On Thu, May 12, 2016 at 01:14:34PM +0000, Ananyev, Konstantin wrote: > >=20 > > On Thu, May 12, 2016 at 10:07:09AM +0000, Ananyev, Konstantin wrote: > > > Hi Jerrin, > > > > > > > > > > > Hi All, > > > > > > > > I would like align mbuff rearm_data field to 8 byte aligned so th= at > > > > write to mbuf->rearm_data with uint64_t* will be naturally aligne= d. > > > > I am not sure about IA but some other architecture/implementation= has overhead > > > > in non-naturally aligned stores. > > > > > > > > Proposed patch is something like this below, But open for any cha= nge to > > > > make fit for all other architectures/platform. > > > > > > > > Any thoughts ? > > > > > > > > =E2=9E=9C [master] [dpdk-master] $ git diff > > > > diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbu= f.h > > > > index 529debb..5a917d0 100644 > > > > --- a/lib/librte_mbuf/rte_mbuf.h > > > > +++ b/lib/librte_mbuf/rte_mbuf.h > > > > @@ -733,10 +733,8 @@ struct rte_mbuf { > > > > void *buf_addr; /**< Virtual address of segment > > > > buffer. */ > > > > phys_addr_t buf_physaddr; /**< Physical address of segmen= t > > > > buffer. */ > > > > > > > > - uint16_t buf_len; /**< Length of segment buffer. = */ > > > > - > > > > > > > > > There is no need to move buf_len itself, I think. > > > Just move rearm_data marker prior to buf_len is enough. > > > Though how do you suggest to deal with the fact, that right now we = blindly > > > update the whole 64bits pointed by rearm_data: > > > > > > drivers/net/ixgbe/ixgbe_rxtx_vec.c: > > > /* > > > * Flush mbuf with pkt template. > > > * Data to be rearmed is 6 bytes long. > > > * Though, RX will overwrite ol_flags that are comi= ng next > > > * anyway. So overwrite whole 8 bytes with one load= : > > > * 6 bytes of rearm_data plus first 2 bytes of ol_f= lags. > > > */ > > > p0 =3D (uintptr_t)&mb0->rearm_data; > > > *(uint64_t *)p0 =3D rxq->mbuf_initializer; > > > > > > ? > > > > > > If buf_len will be inside these 64bits, we can't do it anymore. > > > > > > Are you suggesting something like: > > > > > > uint64_t *p0, v0; > > > > > > p0 =3D &mb0->rearm_data; > > > v0 =3D *p0 & REARM_MASK; > > > *p0 =3D v0 | rxq->mbuf_initializer; > > > ? > >=20 > > Due to unaligned rearm_data issue, In ThunderX platform, we need to w= rite > > multiple half word of aligned stores(so masking was better us). >=20 > Ok, so what would be the gain on ARM if you'll make that change? ~4 cpu cycles per packet.Again it may not be ARM architecture specific as ARM architecture does not define instruction latency so it is more of= a implementation specific data. > Again, what would be the drop (if any) on IA? >=20 > > But I think, if we can put 16bit hole between port and ol_flags then > > we may not need the masking stuff in ixgbe. Right? >=20 > You mean move buf_len somewhere else (end of cacheline0) and=20 > introduce a 2B hole between port and ol_flags, right? Yes > Yep, that probably wouldn't have any performance impact. I will try two options and send a patch which don't have any performance impact on IA. >=20 > >=20 > > OR > >=20 > > Even better, if we can fill in a uint16_t variable which will replace= d > > later in the flow like "data_len"? >=20 > data_len is grouped with rx_descriptor_fields1 on purpose - > so we can update packet_type,pkt_len, data_len,vlan_tci,rss with one 16= B write. OK >=20 > Konstantin >=20 > > and move buf_len at end the first cache line?=20 > >or any other thoughts to fix unaligned rearm_data issue? > >=20 > > Jerin > >=20 > >=20 > >=20 > > > > > > If so I wonder what would be the performance impact of that change. > > > Konstantin > > > > > > > > > > /* next 6 bytes are initialised on RX descriptor rearm */ > > > > - MARKER8 rearm_data; > > > > + MARKER64 rearm_data; > > > > uint16_t data_off; > > > > > > > > /** > > > > @@ -754,6 +752,7 @@ struct rte_mbuf { > > > > uint8_t nb_segs; /**< Number of segments. */ > > > > uint8_t port; /**< Input port. */ > > > > > > > > + uint16_t buf_len; /**< Length of segment buffer. = */ > > > > uint64_t ol_flags; /**< Offload features. */ > > > > > > > > /* remaining bytes are set on RX when pulling packet from > > > > * descriptor > > > > > > > > /Jerin