From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 16 Oct 2015 09:28:24 -0700
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Peter Zijlstra
Cc: Will Deacon, linux-kernel@vger.kernel.org, Oleg Nesterov,
	Ingo Molnar
Subject: Re: Q: schedule() and implied barriers on arm64
Message-ID: <20151016162824.GS3910@linux.vnet.ibm.com>
References: <20151016151830.GZ3816@twins.programming.kicks-ass.net>
 <20151016160422.GQ3910@linux.vnet.ibm.com>
 <20151016161608.GA3816@twins.programming.kicks-ass.net>
In-Reply-To: <20151016161608.GA3816@twins.programming.kicks-ass.net>

On Fri, Oct 16, 2015 at 06:16:08PM +0200, Peter Zijlstra wrote:
> On Fri, Oct 16, 2015 at 09:04:22AM -0700, Paul E. McKenney wrote:
> > On Fri, Oct 16, 2015 at 05:18:30PM +0200, Peter Zijlstra wrote:
> > > Hi,
> > >
> > > IIRC Paul relies on schedule() implying a full memory barrier with
> > > strong transitivity for RCU.
> > >
> > > If not, ignore this email.
> >
> > Not so sure about schedule(), but definitely need strong transitivity
> > for the rcu_node structure's ->lock field.  And the atomic operations
> > on the rcu_dyntick structure's fields when entering or leaving the
> > idle loop.
> >
> > With schedule, the thread later reports the quiescent state, which
> > involves acquiring the rcu_node structure's ->lock field.  So I -think-
> > that the locks in the scheduler can be weakly transitive.
>
> So I _thought_ you needed this to separate the preempt_disabled
> sections.  Such that rcu_note_context_switch() is guaranteed to be done
> before a new preempt_disabled region starts.
>
> But if you really only need program order guarantees for that, and deal
> with everything else from your tick, then that's fine too.
>
> Maybe some previous RCU variant relied on this?

Yes, older versions did rely on this.  Now, only the CPU itself observes
RCU's state changes during context switch.  I couldn't tell you exactly
when this changed.  :-/

With the exception of some synchronize_sched_expedited() cases, but in
those cases, RCU code acquires the CPU's leaf rcu_node structure's
->lock, and with the required strong transitivity.

> > > If so, however, I suspect AARGH64 is borken and would need (just like
> > > PPC):
> > >
> > > 	#define smp_mb__before_spinlock()	smp_mb()
> > >
> > > The problem is that schedule() (when a NO-OP) does:
> > >
> > > 	smp_mb__before_spinlock();
> > > 	LOCK rq->lock
> > >
> > > 	clear_bit()
> > >
> > > 	UNLOCK rq->lock
> > >
> > > And nothing there implies a full barrier on AARGH64, since
> > > smp_mb__before_spinlock() defaults to WMB, LOCK is an "ldaxr" or
> > > load-acquire, UNLOCK is "stlrh" or store-release and clear_bit()
> > > isn't anything.
> > >
> > > Pretty much every other arch has LOCK implying a full barrier, either
> > > because its strongly ordered or because it needs one for the ACQUIRE
> > > semantics.
> >
> > But I thought that it used a dmb in the spinlock code somewhere or
> > another...
>
> arm does, arm64 not so much.
>
> > Well, arm64 might well need smp_mb__after_unlock_lock() to be
> > non-empty.
>
> Its UNLOCK+LOCK should be RCsc, so that should be good.  Its just that
> LOCK+UNLOCK isn't anything.

Ah!
If RCU relies on LOCK+UNLOCK being a barrier of any sort, that is a bug
in RCU that needs fixing.

							Thanx, Paul