From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S933891AbcDLQ7p (ORCPT ); Tue, 12 Apr 2016 12:59:45 -0400 Received: from foss.arm.com ([217.140.101.70]:57194 "EHLO foss.arm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932463AbcDLQ7o (ORCPT ); Tue, 12 Apr 2016 12:59:44 -0400 Date: Tue, 12 Apr 2016 17:59:41 +0100 From: Will Deacon To: Peter Zijlstra Cc: linux-kernel@vger.kernel.org, waiman.long@hpe.com, mingo@redhat.com, paulmck@linux.vnet.ibm.com, boqun.feng@gmail.com, torvalds@linux-foundation.org, dave@stgolabs.net Subject: Re: [RFC][PATCH 3/3] locking,arm64: Introduce cmpwait() Message-ID: <20160412165941.GG26124@arm.com> References: <20160404122250.340636238@infradead.org> <20160404123633.484451002@infradead.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20160404123633.484451002@infradead.org> User-Agent: Mutt/1.5.23 (2014-03-12) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Peter, Thanks for looking at this! On Mon, Apr 04, 2016 at 02:22:53PM +0200, Peter Zijlstra wrote: > Provide the cmpwait() primitive, which will 'spin' wait for a variable > to change. Use it to implement smp_cond_load_acquire() and provide an > ARM64 implementation. > > The ARM64 implementation uses LDXR+WFE to avoid most spinning by > letting the hardware go idle while waiting for the exclusive load of > the variable to be cancelled (as anybody changing the value must). > > I've misplaced my arm64 compiler, so this is not even compile tested. Guess what? ;) init/do_mounts_initrd.o: In function `__cmpwait_case_1': /home/will/work/aarch32/linux/linux/./arch/arm64/include/asm/cmpxchg.h:242: multiple definition of `__cmpwait_case_1' init/do_mounts.o:/home/will/work/aarch32/linux/linux/./arch/arm64/include/asm/cmpxchg.h:242: first defined here init/do_mounts_initrd.o: In function `__cmpwait_case_2': /home/will/work/aarch32/linux/linux/./arch/arm64/include/asm/cmpxchg.h:243: multiple definition of `__cmpwait_case_2' init/do_mounts.o:/home/will/work/aarch32/linux/linux/./arch/arm64/include/asm/cmpxchg.h:243: first defined here init/do_mounts_initrd.o: In function `__cmpwait_case_4': /home/will/work/aarch32/linux/linux/./arch/arm64/include/asm/cmpxchg.h:244: multiple definition of `__cmpwait_case_4' init/do_mounts.o:/home/will/work/aarch32/linux/linux/./arch/arm64/include/asm/cmpxchg.h:244: first defined here init/do_mounts_initrd.o: In function `__cmpwait_case_8': /home/will/work/aarch32/linux/linux/./arch/arm64/include/asm/cmpxchg.h:245: multiple definition of `__cmpwait_case_8' init/do_mounts.o:/home/will/work/aarch32/linux/linux/./arch/arm64/include/asm/cmpxchg.h:245: first defined here make[1]: *** [init/mounts.o] Error 1 make: *** [init] Error 2 make: *** Waiting for unfinished jobs.... (and lot of similar errors). Looks like you're just missing an #undef in cmpxchg.h. FWIW, you can pick up arm64 toolchain binaries from: https://releases.linaro.org/components/toolchain/binaries/latest-5/ > Suggested-by: Will Deacon > Signed-off-by: Peter Zijlstra (Intel) > --- > arch/arm64/include/asm/cmpxchg.h | 36 +++++++++++++++++++++++++++++ > include/linux/atomic.h | 47 +++++++++++++++++++++++++++++++++++++++ > include/linux/compiler.h | 30 ------------------------ > 3 files changed, 83 insertions(+), 30 deletions(-) > > --- a/arch/arm64/include/asm/cmpxchg.h > +++ b/arch/arm64/include/asm/cmpxchg.h > @@ -224,4 +224,40 @@ __CMPXCHG_GEN(_mb) > __ret; \ > }) > > +#define __CMPWAIT_GEN(w, sz, name) \ > +void __cmpwait_case_##name(volatile void *ptr, unsigned long val) \ > +{ \ > + unsigned long tmp; \ > + \ > + asm volatile( \ > + " ldxr" #sz "\t%" #w "[tmp], %[v]\n" \ > + " eor %" #w "[tmp], %" #w "[tmp], %" #w "[val]\n" \ > + " cbnz %" #w "[tmp], 1f\n" \ Shouldn't this be cbz? (i.e. branch over the wfe if the value is equal to what we wanted?). > + " wfe\n" \ > + "1:" \ > + : [tmp] "=&r" (tmp), [val] "=&r" (val), \ We only read val, so it can be an input operand, no? > + [v] "+Q" (*(unsigned long *)ptr)); \ > +} > + > +__CMPWAIT_GEN(w, b, 1); > +__CMPWAIT_GEN(w, h, 2); > +__CMPWAIT_GEN(w, , 4); > +__CMPWAIT_GEN( , , 8); > + > +static inline void __cmpwait(volatile void *ptr, unsigned long val, int size) > +{ > + switch (size) { > + case 1: return __cmpwait_case_1(ptr, val); > + case 2: return __cmpwait_case_2(ptr, val); > + case 4: return __cmpwait_case_4(ptr, val); > + case 8: return __cmpwait_case_8(ptr, val); > + default: BUILD_BUG(); > + } > + > + unreachable(); > +} > + > +#define cmpwait(ptr, val) \ > + __cmpwait((ptr), (unsigned long)(val), sizeof(*(ptr))) We might want to call this cmpwait_relaxed, in case we decide to add fenced versions in the future. Or just make it cmpwait_acquire and remove the smp_rmb() from smp_cond_load_acquire(). Dunno. Will