From mboxrd@z Thu Jan 1 00:00:00 1970 From: Jerin Jacob Subject: Re: [PATCH] lib: change rte_ring dequeue to guarantee ordering before tail update Date: Sat, 23 Jul 2016 18:05:50 +0530 Message-ID: <20160723123549.GA24183@localhost.localdomain> References: <2601191342CEEE43887BDE71AB97725836B7E32F@irsmsx105.ger.corp.intel.com> <14017551.U6D1dIIx0P@xps13> <20160723060515.GA13747@localhost.localdomain> <20160723093621.GA18376@localhost.localdomain> <2601191342CEEE43887BDE71AB97725836B81292@irsmsx105.ger.corp.intel.com> <20160723103847.GB18376@localhost.localdomain> <2601191342CEEE43887BDE71AB97725836B812E8@irsmsx105.ger.corp.intel.com> <20160723114928.GA21364@localhost.localdomain> <2601191342CEEE43887BDE71AB97725836B81316@irsmsx105.ger.corp.intel.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Cc: Thomas Monjalon , Juhamatti Kuusisaari , "dev@dpdk.org" To: "Ananyev, Konstantin" Return-path: Received: from NAM01-SN1-obe.outbound.protection.outlook.com (mail-sn1nam01on0053.outbound.protection.outlook.com [104.47.32.53]) by dpdk.org (Postfix) with ESMTP id 8CEFF567B for ; Sat, 23 Jul 2016 14:36:13 +0200 (CEST) Content-Disposition: inline In-Reply-To: <2601191342CEEE43887BDE71AB97725836B81316@irsmsx105.ger.corp.intel.com> List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" On Sat, Jul 23, 2016 at 12:32:01PM +0000, Ananyev, Konstantin wrote: > > > > -----Original Message----- > > From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com] > > Sent: Saturday, July 23, 2016 12:49 PM > > To: Ananyev, Konstantin > > Cc: Thomas Monjalon ; Juhamatti Kuusisaari ; dev@dpdk.org > > Subject: Re: [dpdk-dev] [PATCH] lib: change rte_ring dequeue to guarantee ordering before tail update > > > > On Sat, Jul 23, 2016 at 11:15:27AM +0000, Ananyev, Konstantin wrote: > > > > > > > > > > -----Original Message----- > > > > From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com] > > > > Sent: Saturday, July 23, 2016 11:39 AM > > > > To: Ananyev, Konstantin > > > > Cc: Thomas Monjalon ; Juhamatti > > > > Kuusisaari ; dev@dpdk.org > > > > Subject: Re: [dpdk-dev] [PATCH] lib: change rte_ring dequeue to > > > > guarantee ordering before tail update > > > > > > > > On Sat, Jul 23, 2016 at 10:14:51AM +0000, Ananyev, Konstantin wrote: > > > > > Hi lads, > > > > > > > > > > > On Sat, Jul 23, 2016 at 11:02:33AM +0200, Thomas Monjalon wrote: > > > > > > > 2016-07-23 8:05 GMT+02:00 Jerin Jacob : > > > > > > > > On Thu, Jul 21, 2016 at 11:26:50PM +0200, Thomas Monjalon wrote: > > > > > > > >> > > Consumer queue dequeuing must be guaranteed to be done > > > > > > > >> > > fully before the tail is updated. This is not > > > > > > > >> > > guaranteed with a read barrier, changed to a write > > > > > > > >> > > barrier just before tail update which in > > > > > > practice guarantees correct order of reads and writes. > > > > > > > >> > > > > > > > > > >> > > Signed-off-by: Juhamatti Kuusisaari > > > > > > > >> > > > > > > > > > >> > > > > > > > > >> > Acked-by: Konstantin Ananyev > > > > > > > >> > > > > > > > > >> > > > > > > > >> Applied, thanks > > > > > > > > > > > > > > > > There was ongoing discussion on this > > > > > > > > http://dpdk.org/ml/archives/dev/2016-July/044168.html > > > > > > > > > > > > > > Sorry Jerin, I forgot this email. > > > > > > > The problem is that nobody replied to your email and you did > > > > > > > not nack the v2 of this patch. > > > > > > > > > > It's probably my bad. > > > > > I acked the patch before Jerin response, and forgot to reply later. > > > > > > > > > > > > > > > > > > > > This change may not be required as it has the performance impact. > > > > > > > > > > > > > > We need to clearly understand what is the performance impact > > > > > > > (numbers and use cases) on one hand, and is there a real bug > > > > > > > fixed by this patch on the other hand? > > > > > > > > > > > > IHMO, there is no real bug here. rte_smb_rmb() provides the > > > > > > LOAD-STORE barrier to make sure tail pointer WRITE happens only after prior LOADS. > > > > > > > > > > Yep, from what I read at the link Jerin provided, indeed it seems rte_smp_rmb() is enough for the arm arch here... > > > > > For ppc, as I can see both rte_smp_rmb()/rte_smp_wmb() emits the same instruction. > > > > > > > > > > > > > > > > > Thoughts? > > > > > > > > > > Wonder how big is a performance impact? > > > > > > > > With this change we need to wait for addtional STORES to be completed to local buffer in addtion to LOADS from ring buffers memory. > > > > > > I understand that, just wonder did you see any real performance difference? > > > > Yeah... > > Ok, then I don't see any good reason why we shouldn't revert it. > I suppose the best way would be to submit a new patch for RC5 to revert the changes. > Do you prefer to submit it yourself and I'll ack it or visa-versa? OK. I will submit it then > Thanks > Konstantin > > > > > > Probably with ring_perf_autotest/mempool_perf_autotest or something? > > > > W/O change > > RTE>>ring_perf_autotest > > ### Testing single element and burst enq/deq ### SP/SC single enq/dequeue: 4 MP/MC single enq/dequeue: 16 SP/SC burst enq/dequeue > > (size: 8): 0 MP/MC burst enq/dequeue (size: 8): 2 SP/SC burst enq/dequeue (size: 32): 0 MP/MC burst enq/dequeue (size: 32): 0 > > > > ### Testing empty dequeue ### > > SC empty dequeue: 0.35 > > MC empty dequeue: 0.60 > > > > ### Testing using a single lcore ### > > SP/SC bulk enq/dequeue (size: 8): 0.93 > > MP/MC bulk enq/dequeue (size: 8): 2.45 > > SP/SC bulk enq/dequeue (size: 32): 0.58 > > MP/MC bulk enq/dequeue (size: 32): 0.97 > > > > ### Testing using two physical cores ### SP/SC bulk enq/dequeue (size: 8): 1.89 MP/MC bulk enq/dequeue (size: 8): 4.28 SP/SC bulk > > enq/dequeue (size: 32): 0.90 MP/MC bulk enq/dequeue (size: 32): 1.19 Test OK > > RTE>> > > > > With change > > RTE>>ring_perf_autotest > > ### Testing single element and burst enq/deq ### SP/SC single enq/dequeue: 6 MP/MC single enq/dequeue: 16 SP/SC burst enq/dequeue > > (size: 8): 1 MP/MC burst enq/dequeue (size: 8): 2 SP/SC burst enq/dequeue (size: 32): 0 MP/MC burst enq/dequeue (size: 32): 0 > > > > ### Testing empty dequeue ### > > SC empty dequeue: 0.35 > > MC empty dequeue: 0.60 > > > > ### Testing using a single lcore ### > > SP/SC bulk enq/dequeue (size: 8): 1.28 > > MP/MC bulk enq/dequeue (size: 8): 2.47 > > SP/SC bulk enq/dequeue (size: 32): 0.64 > > MP/MC bulk enq/dequeue (size: 32): 0.97 > > > > ### Testing using two physical cores ### SP/SC bulk enq/dequeue (size: 8): 2.08 MP/MC bulk enq/dequeue (size: 8): 4.29 SP/SC bulk > > enq/dequeue (size: 32): 1.24 MP/MC bulk enq/dequeue (size: 32): 1.19 Test OK > > > > > Konstantin > > > > > > > > > > > > If there is a real one, I suppose we can revert the patch? > > > > > > > > Request to revert this one as their no benifts for other > > > > architectures and indeed it creates addtional delay in waiting for STORES to complete in ARM. > > > > Lets do the correct thing by reverting it. > > > > > > > > Jerin > > > > > > > > > > > > > > > > > Konstantin > > > > > > > > > > > > > > > > > > > > > > > > > Please guys make things clear and we'll revert if needed.