From mboxrd@z Thu Jan 1 00:00:00 1970
From: peterz@infradead.org (Peter Zijlstra)
Date: Mon, 14 Dec 2015 21:28:55 +0100
Subject: FW: Commit 81a43adae3b9 (locking/mutex: Use acquire/release semantics) causing failures on arm64 (ThunderX)
In-Reply-To: <20151211223540.GA22277@linux.vnet.ibm.com>
References: <20151211084133.GE6356@twins.programming.kicks-ass.net>
 <20151211120419.GD18828@arm.com>
 <20151211121319.GK6356@twins.programming.kicks-ass.net>
 <20151211121759.GE18828@arm.com>
 <20151211122647.GM6356@twins.programming.kicks-ass.net>
 <20151211133313.GG18828@arm.com>
 <20151211134803.GP6356@twins.programming.kicks-ass.net>
 <20151211223540.GA22277@linux.vnet.ibm.com>
Message-ID: <20151214202855.GX6357@twins.programming.kicks-ass.net>
To: linux-arm-kernel@lists.infradead.org
List-Id: linux-arm-kernel.lists.infradead.org

On Fri, Dec 11, 2015 at 02:35:40PM -0800, Paul E. McKenney wrote:
> On Fri, Dec 11, 2015 at 02:48:03PM +0100, Peter Zijlstra wrote:
> > On Fri, Dec 11, 2015 at 01:33:14PM +0000, Will Deacon wrote:
> > > On Fri, Dec 11, 2015 at 01:26:47PM +0100, Peter Zijlstra wrote:
> >
> > > > While we're there, the acquire in osq_wait_next() seems somewhat ill
> > > > documented too.
> > > >
> > > > I _think_ we need ACQUIRE semantics there because we want to strictly
> > > > order the lock-unqueue A,B,C steps and we get that with:
> > > >
> > > >  A: SC
> > > >  B: ACQ
> > > >  C: Relaxed
> > > >
> > > > Similarly for unlock we want the WRITE_ONCE to happen after
> > > > osq_wait_next, but in that case we can even rely on the control
> > > > dependency there.
> > >
> > > Even for the lock-unqueue case, isn't B->C ordered by a control dependency
> > > because C consists only of stores?
> >
> > Hmm, indeed. So we could go fully relaxed on it I suppose, since the
> > same is true for the unlock site.
>
> I am probably missing quite a bit on this thread, but don't x86 MMIO
> accesses to frame buffers need to interact with something more heavyweight
> than an x86 release store or acquire load in order to remain confined
> to the resulting critical section?

So on x86 there really isn't a problem because every atomic op (and
there's plenty here) will be a full barrier.

That is, even if you were to replace everything with _relaxed() ops, it
would still work as 'expected' on x86. ppc/arm64 will crash and burn,
but that's another story.

But the important point here was that osq_wait_next() is never relied
upon to provide either the ACQUIRE semantics for osq_lock() nor the
RELEASE semantics for osq_unlock(). Those are provided by other ops.
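
For anyone wanting to poke at the relaxed-vs-ACQUIRE/RELEASE point
outside the kernel, here is a minimal userspace sketch using C11 atomics
and pthreads. It is not the osq_lock() code; payload, ready, producer(),
consumer() and run_handoff() are all made-up names for illustration. The
producer publishes with a RELEASE read-modify-write and the consumer
picks it up with an ACQUIRE load. If both were weakened to
memory_order_relaxed, x86 would most likely still appear to work (the
exchange is typically emitted as a fully ordered instruction there,
though the compiler would then be free to reorder), while arm64 or ppc
could legitimately show the consumer a stale payload.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical names; none of this is taken from kernel/locking/osq_lock.c. */
static int payload;       /* plain data, published through 'ready' */
static atomic_int ready;  /* handoff flag */

static void *producer(void *arg)
{
	(void)arg;
	payload = 42;
	/* RELEASE RMW: the payload store is ordered before this exchange. */
	atomic_exchange_explicit(&ready, 1, memory_order_release);
	return NULL;
}

static void *consumer(void *arg)
{
	int *out = arg;

	/* ACQUIRE load: once ready == 1 is seen, payload == 42 is visible. */
	while (!atomic_load_explicit(&ready, memory_order_acquire))
		;
	*out = payload;
	return NULL;
}

/* Run one handoff and return the value the consumer observed. */
int run_handoff(void)
{
	pthread_t p, c;
	int seen = 0;
	int rc;

	/* Reset state before any thread exists, so relaxed is enough here. */
	payload = 0;
	atomic_store_explicit(&ready, 0, memory_order_relaxed);

	rc = pthread_create(&c, NULL, consumer, &seen);
	assert(rc == 0);
	rc = pthread_create(&p, NULL, producer, NULL);
	assert(rc == 0);

	pthread_join(p, NULL);
	pthread_join(c, NULL);
	return seen;
}
```

Calling run_handoff() from a main() and building with -pthread should
be enough to try it. With the orderings as written, the consumer is
guaranteed to read the published payload on any architecture.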