From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jerin Jacob
Subject: Re: [PATCH v2] ring: use aligned memzone allocation
Date: Fri, 9 Jun 2017 22:58:55 +0530
Message-ID: <20170609172854.GA2828@jerin>
References: <20170602201213.51143-1-daniel.verkamp@intel.com> <2601191342CEEE43887BDE71AB9772583FB05190@IRSMSX109.ger.corp.intel.com> <2601191342CEEE43887BDE71AB9772583FB05216@IRSMSX109.ger.corp.intel.com> <2601191342CEEE43887BDE71AB9772583FB060FD@IRSMSX109.ger.corp.intel.com> <20170606124201.GA43772@bricha3-MOBL3.ger.corp.intel.com> <2601191342CEEE43887BDE71AB9772583FB0644D@IRSMSX109.ger.corp.intel.com> <6908e71a-c849-83d3-e86d-745acf9f9491@sts.kz> <20170609101625.09075858@xeon-e3>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: Yerden Zhumabekov, "Ananyev, Konstantin", "Richardson, Bruce", "Verkamp, Daniel", "dev@dpdk.org"
To: Stephen Hemminger
In-Reply-To: <20170609101625.09075858@xeon-e3>
List-Id: DPDK patches and discussions

-----Original Message-----
> Date: Fri, 9 Jun 2017 10:16:25 -0700
> From: Stephen Hemminger
> To: Yerden Zhumabekov
> Cc: "Ananyev, Konstantin", "Richardson, Bruce", "Verkamp, Daniel", "dev@dpdk.org"
> Subject: Re: [dpdk-dev] [PATCH v2] ring: use aligned memzone allocation
>
> On Fri, 9 Jun 2017 18:47:43 +0600
> Yerden Zhumabekov wrote:
>
> > On 06.06.2017 19:19, Ananyev, Konstantin wrote:
> > >
> > >>>> Maybe there is some deeper reason for the >= 128-byte alignment logic in rte_ring.h?
> > >>> Might be, would be good to hear the opinion of the author of that change.
> > >> It gives improved performance for core-2-core transfer.
> > > You mean empty cache-line(s) after prod/cons, correct?
> > > That's ok, but why can't we keep them and the whole rte_ring aligned on cache-line boundaries?
> > > Something like that:
> > > struct rte_ring {
> > >     ...
> > >     struct rte_ring_headtail prod __rte_cache_aligned;
> > >     EMPTY_CACHE_LINE __rte_cache_aligned;
> > >     struct rte_ring_headtail cons __rte_cache_aligned;
> > >     EMPTY_CACHE_LINE __rte_cache_aligned;
> > > };
> > >
> > > Konstantin
> >
> > I'm curious, can anyone explain how this actually affects performance? Maybe we can utilize it in application code?
>
> I think it is because on Intel CPUs the CPU will speculatively fetch adjacent cache lines.
> If these cache lines change, then it will create false sharing.

I see. In such cases, I think it is better to abstract this as conditional compilation. The above logic has the worst-case cache memory requirement when the CPU has a 128B cache line and no speculative prefetch.