Date: Tue, 17 Mar 2015 07:30:59 +0100
From: Peter Zijlstra
To: Mathieu Desnoyers
Cc: linux-kernel@vger.kernel.org, KOSAKI Motohiro, Steven Rostedt,
    "Paul E. McKenney", Nicholas Miell, Linus Torvalds, Ingo Molnar,
    Alan Cox, Lai Jiangshan, Stephen Hemminger, Andrew Morton,
    Josh Triplett, Thomas Gleixner, David Howells, Nick Piggin
Subject: Re: [RFC PATCH] sys_membarrier(): system/process-wide memory barrier (x86) (v12)
Message-ID: <20150317063059.GJ2896@worktop.programming.kicks-ass.net>
In-Reply-To: <910572156.13900.1426556725438.JavaMail.zimbra@efficios.com>

On Tue, Mar 17, 2015 at 01:45:25AM +0000, Mathieu Desnoyers wrote:
> Let's go through a memory ordering scenario to highlight my reasoning
> there.
>
> Let's consider the following memory barrier scenario performed in
> user-space on an architecture with very relaxed ordering. PowerPC comes
> to mind.
>
> https://lwn.net/Articles/573436/
> scenario 12:
>
>         CPU 0                   CPU 1
>         CAO(x) = 1;             r3 = CAO(y);
>         cmm_smp_wmb();          cmm_smp_rmb();
>         CAO(y) = 1;             r4 = CAO(x);
>
>         BUG_ON(r3 == 1 && r4 == 0)

WTF is CAO() ? And that ridiculous cmm_ prefix on the barriers.

> We tweak it to use sys_membarrier on CPU 1, and a simple compiler
> barrier() on CPU 0:
>
>         CPU 0                   CPU 1
>         CAO(x) = 1;             r3 = CAO(y);
>         barrier();              sys_membarrier();
>         CAO(y) = 1;             r4 = CAO(x);
>
>         BUG_ON(r3 == 1 && r4 == 0)

That hardly seems like a valid substitution; barrier() is not a valid
replacement for a memory barrier, is it? Especially not on PPC.

> Now if CPU 1 executes sys_membarrier while CPU 0 is preempted after both
> stores, we have:
>
>         CPU 0                           CPU 1
>         CAO(x) = 1;
>         [1st store is slow to
>          reach other cores]
>         CAO(y) = 1;
>         [2nd store reaches other
>          cores more quickly]
>         [preempted]
>                                         r3 = CAO(y)
>                                         (may see y = 1)
>                                         sys_membarrier()
>         Scheduler changes rq->curr;
>                                         skips CPU 0, because rq->curr has
>                                         been updated.
>                                         [return to userspace]
>                                         r4 = CAO(x)
>                                         (may see x = 0)
>                                         BUG_ON(r3 == 1 && r4 == 0) -> fails.
>         load_cr3, with implied
>         memory barrier, comes
>         after CPU 1 has read "x".
>
> The only way to make this scenario work is if a memory barrier is added
> before updating rq->curr. (We could also construct a similar scenario
> showing the barrier needed after the store to rq->curr.)

Hmm, like that. Light begins to dawn.

So I think in this case we're good with the smp_mb__before_spinlock()
we have; but do note it's not a full MB even though the name says so.
It's basically WMB + ACQUIRE, which can in theory leak a read in, but
nobody sane _delays_ reads; you want to speculate reads, not postpone
them. Also, it lacks the transitive property.

> Would you see it as acceptable if we start by implementing
> only the non-expedited sys_membarrier()?

Sure.

> Then we can add
> the expedited-private implementation after rq->curr becomes
> available through RCU.
Yeah, or not at all; I'm still trying to get Paul to remove the expedited nonsense from the kernel RCU bits; and now you want it in userspace too :/