From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753461AbcIAIpi (ORCPT ); Thu, 1 Sep 2016 04:45:38 -0400 Received: from bombadil.infradead.org ([198.137.202.9]:56011 "EHLO bombadil.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752706AbcIAIoq (ORCPT ); Thu, 1 Sep 2016 04:44:46 -0400 Date: Thu, 1 Sep 2016 10:44:35 +0200 From: Peter Zijlstra To: Manfred Spraul Cc: Will Deacon , benh@kernel.crashing.org, paulmck@linux.vnet.ibm.com, Ingo Molnar , Boqun Feng , Andrew Morton , LKML , 1vier1@web.de, Davidlohr Bueso Subject: Re: [PATCH 1/4] spinlock: Document memory barrier rules Message-ID: <20160901084435.GN10153@twins.programming.kicks-ass.net> References: <1472385376-8801-1-git-send-email-manfred@colorfullife.com> <1472385376-8801-2-git-send-email-manfred@colorfullife.com> <20160829104815.GI10153@twins.programming.kicks-ass.net> <968e4c62-4486-a6aa-8fdf-67ff9b05a330@colorfullife.com> <20160829134424.GS10153@twins.programming.kicks-ass.net> <4859166f-ff39-e998-638b-6bf6912422a3@colorfullife.com> <20160831154049.GY10121@twins.programming.kicks-ass.net> <20160831164020.GG29505@arm.com> <80de24e3-fa01-a6d6-99e9-afd1e831e07b@colorfullife.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <80de24e3-fa01-a6d6-99e9-afd1e831e07b@colorfullife.com> User-Agent: Mutt/1.5.23.1 (2014-03-12) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Aug 31, 2016 at 08:32:18PM +0200, Manfred Spraul wrote: > On 08/31/2016 06:40 PM, Will Deacon wrote: > >The litmus test then looks a bit like: > > > >CPUm: > > > >LOCK(x) > >smp_mb(); > >RyAcq=0 > > > > > >CPUn: > > > >Wy=1 > >smp_mb(); > >UNLOCK_WAIT(x) > Correct. > > > >which I think can be simplified to: > > > > > >LOCK(x) > I thought that here a barrier is required, because Ry=0 could be before > store of the lock. > >Ry=0 > RyAcq instead of Ry would required due to the unlock at the end of the > critical section > CpuN: <...> > WyRelease=0 > for the litmus test irrelevant. > >Wy=1 > >smp_mb(); // Note that this is implied by spin_unlock_wait on PPC and arm64 > >LOCK(x) // spin_unlock_wait behaves like lock; unlock > >UNLOCK(x) > > >[I've removed a bunch of barriers here, that I don't think are necessary > > for the guarantees you're after] > > > >and the question is "Can both CPUs proceed?". > > > >Looking at the above, then I don't think that they can. Whilst CPUm can > >indeed speculate the Ry=0 before successfully taking the lock, if CPUn > >observes CPUm's read, then it must also observe the lock being held wrt > >the spin_lock API. That is because a successful LOCK operation by CPUn > >would force CPUm to replay its LL/SC loop and therefore discard its > >speculation of y. > > > >What am I missing? The code snippet seems to have too many barriers to me! > spin_unlock_wait() is not necessarily lock()+unlock(). > It can be a simple Rx, or now RxAcq. Can be, normally, yes. But on power and arm64, the only architectures on which the ACQUIRE is 'funny' they do the 'pointless' ll/sc cycle in spin_unlock_wait() to 'fix' things. So for both power and arm64, you can in fact model spin_unlock_wait() as LOCK+UNLOCK. All the other archs have (so far) 'sensible' ACQUIRE semantics and all this is moot. [ MIPS _could_ possibly do the 'interesting' ACQUIRE too, but so far hasn't introduced their fancy barriers. ] The other interesting case is qspinlock, which does the unordered store in software, but that is after a necessary atomic (ACQUIRE) operation to enqueue, which we exploit in the queued_spin_unlock_wait() to order against. So that too is good, assuming the ACQUIRE is good. Once Power/ARM64 use qspinlock, they'll need to be careful again.