Date: Tue, 17 Mar 2015 07:30:59 +0100
From: Peter Zijlstra
To: Mathieu Desnoyers
Cc: linux-kernel@vger.kernel.org, KOSAKI Motohiro, Steven Rostedt,
    "Paul E. McKenney", Nicholas Miell, Linus Torvalds, Ingo Molnar,
    Alan Cox, Lai Jiangshan, Stephen Hemminger, Andrew Morton,
    Josh Triplett, Thomas Gleixner, David Howells, Nick Piggin
Subject: Re: [RFC PATCH] sys_membarrier(): system/process-wide memory barrier (x86) (v12)
Message-ID: <20150317063059.GJ2896@worktop.programming.kicks-ass.net>
In-Reply-To: <910572156.13900.1426556725438.JavaMail.zimbra@efficios.com>

On Tue, Mar 17, 2015 at 01:45:25AM +0000, Mathieu Desnoyers wrote:
> Let's go through a memory ordering scenario to highlight my reasoning
> there.
>
> Let's consider the following memory barrier scenario performed in
> user-space on an architecture with very relaxed ordering. PowerPC comes
> to mind.
>
> https://lwn.net/Articles/573436/
> scenario 12:
>
>         CPU 0                   CPU 1
>         CAO(x) = 1;             r3 = CAO(y);
>         cmm_smp_wmb();          cmm_smp_rmb();
>         CAO(y) = 1;             r4 = CAO(x);
>
>         BUG_ON(r3 == 1 && r4 == 0)

WTF is CAO() ? And that ridiculous cmm_ prefix on the barriers.

> We tweak it to use sys_membarrier on CPU 1, and a simple compiler
> barrier() on CPU 0:
>
>         CPU 0                   CPU 1
>         CAO(x) = 1;             r3 = CAO(y);
>         barrier();              sys_membarrier();
>         CAO(y) = 1;             r4 = CAO(x);
>
>         BUG_ON(r3 == 1 && r4 == 0)

That hardly seems like a valid substitution; barrier() is not a valid
replacement for a memory barrier, is it? Especially not on PPC.

> Now if CPU 1 executes sys_membarrier while CPU 0 is preempted after both
> stores, we have:
>
>         CPU 0                           CPU 1
>         CAO(x) = 1;
>         [1st store is slow to
>          reach other cores]
>         CAO(y) = 1;
>         [2nd store reaches other
>          cores more quickly]
>         [preempted]
>                                         r3 = CAO(y)
>                                         (may see y = 1)
>                                         sys_membarrier()
>         Scheduler changes rq->curr;
>                                         skips CPU 0, because rq->curr has
>                                         been updated.
>                                         [return to userspace]
>                                         r4 = CAO(x)
>                                         (may see x = 0)
>                                         BUG_ON(r3 == 1 && r4 == 0) -> fails.
>         load_cr3, with implied
>         memory barrier, comes
>         after CPU 1 has read "x".
>
> The only way to make this scenario work is if a memory barrier is added
> before updating rq->curr. (We could also construct a similar scenario
> showing the barrier needed after the store to rq->curr.)

Hmm, like that. Light begins to dawn.

So I think in this case we're good with the smp_mb__before_spinlock()
we have; but do note it's not a full MB even though the name says so.
It's basically WMB + ACQUIRE, which can in theory leak a read in, but
nobody sane _delays_ reads; you want to speculate reads, not postpone
them. Also, it lacks the transitive property.

> Would you see it as acceptable if we start by implementing
> only the non-expedited sys_membarrier()?

Sure.

> Then we can add
> the expedited-private implementation after rq->curr becomes
> available through RCU.
Yeah, or not at all; I'm still trying to get Paul to remove the expedited nonsense from the kernel RCU bits; and now you want it in userspace too :/