From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from bedivere.hansenpartnership.com ([66.63.167.143]:50452 "EHLO bedivere.hansenpartnership.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752486AbdKGTg1 (ORCPT ); Tue, 7 Nov 2017 14:36:27 -0500 Message-ID: <1510083384.3118.29.camel@HansenPartnership.com> Subject: Re: [PATCH v4] lib/dlock-list: Scale dlock_lists_empty() From: James Bottomley To: Waiman Long , Andreas Dilger , Jan Kara Cc: Davidlohr Bueso , Alexander Viro , Jan Kara , Jeff Layton , "J. Bruce Fields" , Tejun Heo , Christoph Lameter , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, Ingo Molnar , Peter Zijlstra , Andi Kleen , Dave Chinner , Boqun Feng Date: Tue, 07 Nov 2017 11:36:24 -0800 In-Reply-To: <4486fb94-a9fc-5bee-5241-e1e7558eeaa7@redhat.com> References: <1509475860-16139-1-git-send-email-longman@redhat.com> <1509475860-16139-2-git-send-email-longman@redhat.com> <20171102170431.oq3i5mxtjcg53uot@linux-n805> <81bb3365-63f3-fea8-d238-e3880a4c8033@redhat.com> <20171103133420.pngmrsfmtimataz4@linux-n805> <20171103142254.d55bu2n44xe4aruf@linux-n805> <20171106184708.kmwfcchjwjzucuja@linux-n805> <20171107115921.GC11391@quack2.suse.cz> <4486fb94-a9fc-5bee-5241-e1e7558eeaa7@redhat.com> Content-Type: text/plain; charset="UTF-8" Mime-Version: 1.0 Content-Transfer-Encoding: 8bit Sender: linux-fsdevel-owner@vger.kernel.org List-ID: On Tue, 2017-11-07 at 13:57 -0500, Waiman Long wrote: > On 11/07/2017 12:59 PM, Andreas Dilger wrote: > > > > On Nov 7, 2017, at 4:59 AM, Jan Kara wrote: > > > > > > On Mon 06-11-17 10:47:08, Davidlohr Bueso wrote: > > > > > > > > + /* > > > > +  * Serialize dlist->used_lists such that a 0->1 > > > > transition is not > > > > +  * missed by another thread checking if any of the > > > > dlock lists are > > > > +  * used. > > > > +  * > > > > +  * CPU0     CPU1 > > > > +  * > > > > dlock_list_add()                 dlock_lists_empty() > > > > +  *   [S] atomic_inc(used_lists); > > > > +  *       smp_mb__after_atomic(); > > > > +  *   smp_mb__be > > > > fore_atomic(); > > > > +  *       [L] > > > > atomic_read(used_lists) > > > > +  *       list_add() > > > > +  */ > > > > + smp_mb__before_atomic(); > > > > + return !atomic_read(&dlist->used_lists); > > Just a general kernel programming question here - I thought the > > whole point of atomics is that they are, well, atomic across all > > CPUs so there is no need for a memory barrier?  If there is a need > > for a memory barrier for each atomic access (assuming it isn't > > accessed under another lock, which would make the use of atomic > > types pointless, IMHO) then I'd think there is a lot of code in the > > kernel that isn't doing this properly. > > > > What am I missing here? > > Atomic update and memory barrier are 2 different things. Atomic > update means other CPUs see either the value before or after the > update. They won't see anything in between. For a counter, it means > we won't miss any counts. However, not all atomic operations give an > ordering guarantee. The atomic_read() and atomic_inc() are examples > that do not provide memory ordering guarantee. See > Documentation/memory-barriers.txt for more information about it. > > A CPU can perform atomic operations 1 & 2 in program order, but other > CPUs may see operation 2 first before operation 1. Here memory > barrier can be used to guarantee that other CPUs see the memory > updates in certain order. There's an omission here which I think Andreas may have been referring to:  atomic_inc/dec operations *are* strongly ordered with respect to each other, so if two CPUs are simultaneously executing atomic_inc, the order in which they execute isn't guaranteed, but it is guaranteed that the losing atomic_inc will not begin until the winning one is completed, so after both are done the value will have +2.  So although atomic_read and atomic_inc have no ordering guarantee at all (the point of the barrier above), if you're looking at the return values of atomic_inc/dec you don't need a barrier because regardless of which order the CPUs go in, they'll see distinct values (we use this for reference counting). James