Date: Tue, 17 Mar 2015 01:45:25 +0000 (UTC)
From: Mathieu Desnoyers
To: Peter Zijlstra
Cc: linux-kernel@vger.kernel.org, KOSAKI Motohiro, Steven Rostedt,
 "Paul E. McKenney", Nicholas Miell, Linus Torvalds, Ingo Molnar, Alan Cox,
 Lai Jiangshan, Stephen Hemminger, Andrew Morton, Josh Triplett,
 Thomas Gleixner, David Howells, Nick Piggin
Message-ID: <910572156.13900.1426556725438.JavaMail.zimbra@efficios.com>
In-Reply-To: <20150316205435.GJ21418@twins.programming.kicks-ass.net>
References: <1426447459-28620-1-git-send-email-mathieu.desnoyers@efficios.com>
 <20150316141939.GE21418@twins.programming.kicks-ass.net>
 <1203077851.9491.1426520636551.JavaMail.zimbra@efficios.com>
 <20150316172104.GH21418@twins.programming.kicks-ass.net>
 <1003922584.10662.1426532015839.JavaMail.zimbra@efficios.com>
 <20150316205435.GJ21418@twins.programming.kicks-ass.net>
Subject: Re: [RFC PATCH] sys_membarrier(): system/process-wide memory barrier (x86) (v12)
List-ID: linux-kernel@vger.kernel.org

----- Original Message -----
> From: "Peter Zijlstra"
> Sent: Monday, March 16, 2015 4:54:35 PM
> Subject: Re: [RFC PATCH] sys_membarrier(): system/process-wide memory barrier (x86) (v12)
>
> On Mon, Mar 16, 2015 at 06:53:35PM +0000, Mathieu Desnoyers wrote:
> > > I'm not entirely awake atm but I'm not seeing why it would need to be
> > > that strict; I think the current single MB on task switch is sufficient
> > > because if we're in the middle of schedule, userspace isn't actually
> > > running.
> > >
> > > So from the point of userspace the task switch is atomic. Therefore even
> > > if we do not get a barrier before setting ->curr, the expedited thing
> > > missing us doesn't matter as userspace cannot observe the difference.
> >
> > AFAIU, atomicity is not what matters here. It's more about memory ordering.
> > What is guaranteeing that upon entry into kernel-space, all prior memory
> > accesses (loads and stores) are ordered before the following loads/stores?
> >
> > The same applies when returning to user-space: what is guaranteeing that
> > all prior loads/stores are ordered before the user-space loads/stores
> > performed after returning to user-space?
>
> You're still one step ahead of me; why does this matter?
>
> Or put it another way; what can go wrong? By virtue of being in
> schedule() both tasks (prev and next) get an effective MB from the task
> switch.
>
> So even if we see the 'wrong' rq->curr, that CPU will still observe the
> MB by the time it gets to userspace.
>
> All of this is really only about userspace load/store ordering, and the
> context switch already very much needs to guarantee userspace program
> order in the face of context switches.

Let's go through a memory ordering scenario to highlight my reasoning here.
Let's consider the following memory barrier scenario performed in
user-space on an architecture with very relaxed ordering. PowerPC comes
to mind.

https://lwn.net/Articles/573436/ scenario 12:

CPU 0                           CPU 1
CAO(x) = 1;                     r3 = CAO(y);
cmm_smp_wmb();                  cmm_smp_rmb();
CAO(y) = 1;                     r4 = CAO(x);

BUG_ON(r3 == 1 && r4 == 0)

We tweak it to use sys_membarrier on CPU 1, and a simple compiler
barrier() on CPU 0:

CPU 0                           CPU 1
CAO(x) = 1;                     r3 = CAO(y);
barrier();                      sys_membarrier();
CAO(y) = 1;                     r4 = CAO(x);

BUG_ON(r3 == 1 && r4 == 0)

Now if CPU 1 executes sys_membarrier while CPU 0 is preempted after both
stores, we have:

CPU 0                           CPU 1
CAO(x) = 1;
  [1st store is slow to
   reach other cores]
CAO(y) = 1;
  [2nd store reaches other
   cores more quickly]
[preempted]
                                r3 = CAO(y) (may see y = 1)
                                sys_membarrier()
Scheduler changes rq->curr.
                                skips CPU 0, because rq->curr
                                has been updated.
                                [return to userspace]
                                r4 = CAO(x) (may see x = 0)
                                BUG_ON(r3 == 1 && r4 == 0) -> fails.
load_cr3, with implied
memory barrier, comes after
CPU 1 has read "x".

The only way to make this scenario work is if a memory barrier is added
before updating rq->curr. (We could also construct a similar scenario
showing the barrier needed after the store to rq->curr.)

> > > > In order to be able to dereference rq->curr->mm without holding the
> > > > rq->lock, do you envision we should protect task reclaim with
> > > > RCU-sched?
> > >
> > > A recent discussion had Linus suggest SLAB_DESTROY_BY_RCU, although I
> > > think Oleg did mention it would still be 'interesting'. I've not yet had
> > > time to really think about that.
> >
> > This might be an "interesting" modification. :) This could perhaps come
> > as an optimization later on?
>
> Not really; again, take this for (;;) sys_membar(EXPEDITED) -- that'll
> generate horrendous rq lock contention. With or without the PRIVATE
> thing it'll pound a number of rq locks real bad.
>
> Typical scheduler syscalls only affect a single rq lock at a time -- the
> one the task is on. This one potentially pounds all of them.

Would you see it as acceptable if we start by implementing only the
non-expedited sys_membarrier()?

Then we can add the expedited-private implementation after rq->curr
becomes available through RCU.

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com