From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from na01-by2-obe.outbound.protection.outlook.com
	(mail-by2on0138.outbound.protection.outlook.com [207.46.100.138])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-SHA384 (256/256 bits))
	(No client certificate requested)
	by lists.ozlabs.org (Postfix) with ESMTPS id 9CAB21A09AD
	for ; Sat, 15 Aug 2015 10:44:36 +1000 (AEST)
Message-ID: <1439599463.4099.124.camel@freescale.com>
Subject: Re: [PATCH 3/3] powerpc/e6500: hw tablewalk: order the memory access when acquire/release tcd lock
From: Scott Wood
To: Kevin Hao
Date: Fri, 14 Aug 2015 19:44:23 -0500
In-Reply-To: <20150814071357.GI30310@pek-khao-d1.corp.ad.wrs.com>
References: <1439466697-18989-1-git-send-email-haokexin@gmail.com>
	<1439466697-18989-3-git-send-email-haokexin@gmail.com>
	<1439523559.4099.116.camel@freescale.com>
	<20150814071357.GI30310@pek-khao-d1.corp.ad.wrs.com>
Content-Type: text/plain; charset="UTF-8"
MIME-Version: 1.0
List-Id: Linux on PowerPC Developers Mail List

On Fri, 2015-08-14 at 15:13 +0800, Kevin Hao wrote:
> On Thu, Aug 13, 2015 at 10:39:19PM -0500, Scott Wood wrote:
> > On Thu, 2015-08-13 at 19:51 +0800, Kevin Hao wrote:
> > > I didn't find anything unusual. But I think we do need to order the
> > > load/store of esel_next when acquire/release tcd lock. For acquire,
> > > add a data dependency to order the loads of lock and esel_next.
> > > For release, even there already have a "isync" here, but it doesn't
> > > guarantee any memory access order. So we still need "lwsync" for
> > > the two stores for lock and esel_next.
> >
> > I was going to say that esel_next is just a hint and it doesn't really
> > matter if we occasionally get the wrong value, unless it happens often
> > enough to cause more performance degradation than the lwsync causes.
> > However, with the A-008139 workaround we do need to read the same value
> > from esel_next both times. It might be less costly to save/restore an
> > additional register instead of lwsync, though.
>
> I will try to get some benchmark number to compare which method is a bit
> better.
> Do you have any recommended benchmark for a case this is?

lmbench lat_mem_rd with a stride chosen to maximize TLB misses. For the
uncontended case, one instance; for the contended case, two instances, one
pinned to each thread of a core.

-Scott
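
For illustration, the access pattern behind the lat_mem_rd suggestion above
can be sketched in C as a pointer chase with a fixed stride. This is not
lmbench's code. build_chain() and chase() are hypothetical names, and the
4 KiB page size and 64-byte cache line mentioned afterwards are assumptions
that would need to be adjusted for the target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical helper: link `buf` into a cyclic chain of pointers whose
 * consecutive elements are `stride` bytes apart. Returns the number of
 * hops in one full lap around the buffer.
 */
size_t build_chain(char *buf, size_t len, size_t stride)
{
    /* Keep every stored pointer naturally aligned. */
    assert(stride >= sizeof(void *));
    assert(stride % sizeof(void *) == 0);
    assert(len >= stride);

    size_t n = len / stride;
    for (size_t i = 0; i < n; i++) {
        char *next = buf + ((i + 1) % n) * stride;
        *(void **)(buf + i * stride) = next;
    }
    return n;
}

/*
 * Hypothetical helper: follow the chain for `hops` dependent loads and
 * return the final position. Each load's address comes from the previous
 * load, so the loads cannot overlap; returning the pointer keeps the
 * compiler from discarding the loop.
 */
void *chase(void *start, size_t hops)
{
    void **p = start;
    while (hops--)
        p = (void **)*p;
    return p;
}
```

Assuming 4 KiB pages and 64-byte lines, a stride of 4096 + 64 puts every hop
on a different page while avoiding having every hop fall at the same offset
within its page. The buffer has to span more pages than the TLB can map for
the hops to miss. Timing chase() over many laps and dividing by the hop count
gives a per-load latency. Pinning two such instances to the two threads of
one core (for example with taskset) would correspond to the contended case.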