From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from co9outboundpool.messaging.microsoft.com (co9ehsobe005.messaging.microsoft.com [207.46.163.28])
	(using TLSv1 with cipher AES128-SHA (128/128 bits))
	(Client CN "mail.global.frontbridge.com", Issuer "MSIT Machine Auth CA 2" (not verified))
	by ozlabs.org (Postfix) with ESMTPS id 0EBEC2C00E4
	for ; Fri, 11 Oct 2013 10:25:42 +1100 (EST)
Message-ID: <1381447532.7979.488.camel@snotra.buserror.net>
Subject: Re: [PATCH v2 1/3] powerpc/booke64: add sync after writing PTE
From: Scott Wood
To: Benjamin Herrenschmidt
Date: Thu, 10 Oct 2013 18:25:32 -0500
In-Reply-To: <1381444273.7979.473.camel@snotra.buserror.net>
References: <1379130622-17436-1-git-send-email-scottwood@freescale.com>
	<1379281131.4098.48.camel@pasglop>
	<1379376371.2536.218.camel@snotra.buserror.net>
	<1381444273.7979.473.camel@snotra.buserror.net>
Content-Type: text/plain; charset="UTF-8"
MIME-Version: 1.0
Cc: linuxppc-dev@lists.ozlabs.org
List-Id: Linux on PowerPC Developers Mail List
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,

On Thu, 2013-10-10 at 17:31 -0500, Scott Wood wrote:
> On Mon, 2013-09-16 at 19:06 -0500, Scott Wood wrote:
> > On Mon, 2013-09-16 at 07:38 +1000, Benjamin Herrenschmidt wrote:
> > > On Fri, 2013-09-13 at 22:50 -0500, Scott Wood wrote:
> > > > The ISA says that a sync is needed to order a PTE write with a
> > > > subsequent hardware tablewalk lookup.  On e6500, without this sync
> > > > we've been observed to die with a DSI due to a PTE write not being seen
> > > > by a subsequent access, even when everything happens on the same
> > > > CPU.
> > >
> > > This is gross, I didn't realize we had that bogosity in the
> > > architecture...
> > >
> > > Did you measure the performance impact ?
> >
> > I didn't see a noticeable impact on the tests I ran, but those were
> > aimed at measuring TLB miss overhead.  I'll need to try it with a
> > benchmark that's more oriented around lots of page table updates.
>
> Lmbench's fork test runs about 2% slower with the sync.  I've been told
> that nothing relevant has changed since we saw the failure during
> emulation; it's probably luck and/or timing, or maybe a sync got added
> somewhere else since then?  I think it's only really a problem for
> kernel page tables, since user page tables will retry if do_page_fault()
> sees a valid PTE.  So maybe we should put an mb() in map_kernel_page()
> instead.

Looking at some of the code in mm/, I suspect that the normal callers of
set_pte_at() have an unlock (and thus a sync) already, so we may
not even be relying on those retries.  Certainly some of them do; it
would take some effort to verify all of them.

Also, without such a sync in map_kernel_page(), even with software
tablewalk, couldn't we theoretically have a situation where a store to
pointer X that exposes a new mapping gets reordered before the PTE store
as seen by another CPU?  The other CPU could see non-NULL X and
dereference it, but get the stale PTE.  Callers of ioremap() generally
don't do a barrier of their own prior to exposing the result.

-Scott
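
To make the reordering concern concrete, here is a rough userspace sketch
of the pattern in C11 atomics.  It is not kernel code: fake_pte,
published_ptr, map_and_publish() and reader_check() are all made-up names
for illustration, and the seq_cst fence only stands in for the mb()/sync
that would go in map_kernel_page().  The C memory model also says nothing
about hardware tablewalk, so this only shows the store-ordering half of
the argument.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins: one "PTE" slot and one published pointer "X". */
#define FAKE_PTE_VALID 0x1u

static _Atomic uint64_t fake_pte;          /* 0 means "not present" */
static _Atomic(uint64_t *) published_ptr;  /* NULL until the mapping is exposed */

/* Writer side: install the mapping, then expose the pointer. */
static void map_and_publish(uint64_t *vaddr, uint64_t pte_val)
{
    assert(vaddr != NULL);

    /* The "PTE store". */
    atomic_store_explicit(&fake_pte, pte_val | FAKE_PTE_VALID,
                          memory_order_relaxed);

    /*
     * Full fence, playing the role of the proposed mb().  Without it,
     * the two relaxed stores may become visible to another CPU in
     * either order.
     */
    atomic_thread_fence(memory_order_seq_cst);

    /* The "store to pointer X" that exposes the new mapping. */
    atomic_store_explicit(&published_ptr, vaddr, memory_order_relaxed);
}

/*
 * Reader side: returns 0 if X is not yet published, 1 if X is visible
 * and the PTE is valid, -1 if X is visible but the PTE looks stale.
 */
static int reader_check(void)
{
    uint64_t *p = atomic_load_explicit(&published_ptr, memory_order_relaxed);

    if (p == NULL)
        return 0;

    /* Order the pointer load before the PTE load. */
    atomic_thread_fence(memory_order_acquire);

    uint64_t pte = atomic_load_explicit(&fake_pte, memory_order_relaxed);
    return (pte & FAKE_PTE_VALID) ? 1 : -1;
}
```

With the fence in place, a reader that sees non-NULL X should never get
-1.  Drop the fence from map_and_publish() and the -1 case becomes
permitted on a weakly ordered machine, which is the stale-PTE scenario
described above.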