From: "Paul E. McKenney"
Subject: Re: [PATCH v5 4/4] MCS Lock: Barrier corrections
Date: Tue, 19 Nov 2013 11:32:25 -0800
Message-ID: <20131119193224.GT4138@linux.vnet.ibm.com>
References: <1383940358.11046.417.camel@schen9-DESK>
 <20131111181049.GL28302@mudshark.cambridge.arm.com>
 <1384204673.10046.6.camel@schen9-mobl3>
 <52818B07.70000@hp.com>
In-Reply-To: <52818B07.70000@hp.com>
Reply-To: paulmck@linux.vnet.ibm.com
To: Waiman Long
Cc: Tim Chen, Peter Zijlstra, Will Deacon, Ingo Molnar, Andrew Morton,
 Thomas Gleixner, linux-kernel@vger.kernel.org, linux-mm,
 linux-arch@vger.kernel.org, Linus Torvalds, Andrea Arcangeli, Alex Shi,
 Andi Kleen, Michel Lespinasse, Davidlohr Bueso, Matthew R Wilcox,
 Dave Hansen, Rik van Riel, Peter Hurley
List-Id: linux-arch.vger.kernel.org

On Mon, Nov 11, 2013 at 08:57:27PM -0500, Waiman Long wrote:
> On 11/11/2013 04:17 PM, Tim Chen wrote:
> >>You could then augment that with [cmp]xchg_{acquire,release} as
> >>appropriate.
> >>
> >>>+/*
> >>> * In order to acquire the lock, the caller should declare a local node and
> >>> * pass a reference of the node to this function in addition to the lock.
> >>> * If the lock has already been acquired, then this will proceed to spin
> >>>@@ -37,15 +62,19 @@ void mcs_spin_lock(struct mcs_spinlock **lock, struct mcs_spinlock *node)
> >>> 	node->locked = 0;
> >>> 	node->next = NULL;
> >>>
> >>>-	prev = xchg(lock, node);
> >>>+	/* xchg() provides a memory barrier */
> >>>+	prev = xchg_acquire(lock, node);
> >>> 	if (likely(prev == NULL)) {
> >>> 		/* Lock acquired */
> >>> 		return;
> >>> 	}
> >>> 	ACCESS_ONCE(prev->next) = node;
> >>>-	smp_wmb();
> >>>-	/* Wait until the lock holder passes the lock down */
> >>>-	while (!ACCESS_ONCE(node->locked))
> >>>+	/*
> >>>+	 * Wait until the lock holder passes the lock down.
> >>>+	 * Using smp_load_acquire() provides a memory barrier that
> >>>+	 * ensures subsequent operations happen after the lock is acquired.
> >>>+	 */
> >>>+	while (!(smp_load_acquire(&node->locked)))
> >>> 		arch_mutex_cpu_relax();
> >An alternate implementation is
> >	while (!ACCESS_ONCE(node->locked))
> >		arch_mutex_cpu_relax();
> >	smp_load_acquire(&node->locked);
> >
> >Leaving the smp_load_acquire at the end to provide the appropriate barrier.
> >Will that be acceptable?
> >
> >Tim
>
> I second Tim's opinion.  It would be helpful to have a
> smp_mb_load_acquire() function that provides a memory barrier with
> load-acquire semantics.  I don't think we need one for store-release,
> as that will not be in a loop.

Hmmm...  I guess the ACCESS_ONCE() in the smp_load_acquire() should
prevent it from being optimized away.  But yes, you then end up with
an extra load on the critical lock hand-off path.

And something like an smp_mb_acquire() could then be useful, although
I believe that on all current hardware smp_mb_acquire() emits the same
code as would an smp_mb_release():

o	barrier() on TSO systems such as x86 and s390.

o	lwsync instruction on powerpc.  (Really old systems would want
	a different instruction for smp_mb_acquire(), but let's not
	optimize for really old systems.)

o	dmb instruction on ARM.

o	mf instruction on ia64.

So how about an smp_mb_acquire_release() to cover both use cases?
This could be used to further optimize circular buffers, for example.
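To make the intent concrete, here is a stand-alone user-space sketch of
the pattern being discussed (illustration only: it uses the GCC __atomic
builtins rather than the kernel primitives, and mb_acquire() below is
just a stand-in for the hypothetical smp_mb_acquire()).  The waiter
spins with plain relaxed loads and pays for a single acquire fence once
it observes the hand-off, rather than taking an acquire barrier on every
pass through the loop:

/*
 * Stand-alone illustration, not kernel code: one thread hands a lock
 * to another through a per-node flag, the way the MCS hand-off works.
 * The waiter spins with relaxed loads and issues one acquire fence
 * after the flag is observed set, so the acquire cost is paid once
 * rather than on every iteration.
 *
 * Build with:  gcc -O2 -pthread handoff_sketch.c
 */
#include <pthread.h>
#include <stdio.h>

static struct {
	int locked;	/* 0 = still waiting, 1 = lock handed down */
	int data;	/* stand-in for data protected by the lock */
} node;

/* Stand-in for the hypothetical smp_mb_acquire(). */
static inline void mb_acquire(void)
{
	__atomic_thread_fence(__ATOMIC_ACQUIRE);
}

static void *waiter(void *arg)
{
	/* Relaxed spin: no barrier paid per iteration. */
	while (!__atomic_load_n(&node.locked, __ATOMIC_RELAXED))
		;	/* arch_mutex_cpu_relax() would go here in the kernel */

	mb_acquire();	/* one acquire fence orders the critical section */

	printf("waiter sees data = %d\n", node.data);	/* guaranteed to be 42 */
	return NULL;
}

static void *holder(void *arg)
{
	node.data = 42;						/* the "critical section" */
	__atomic_store_n(&node.locked, 1, __ATOMIC_RELEASE);	/* pass the lock down */
	return NULL;
}

int main(void)
{
	pthread_t w, h;

	pthread_create(&w, NULL, waiter, NULL);
	pthread_create(&h, NULL, holder, NULL);
	pthread_join(w, NULL);
	pthread_join(h, NULL);
	return 0;
}

In MCS terms this is Tim's variant: the loop itself uses only
ACCESS_ONCE(), and the single acquire barrier follows the loop instead
of smp_load_acquire() being executed inside it.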
							Thanx, Paul

> Peter, what do you think about adding that to your patch?
>
> -Longman