From: Leonardo Bras <leobras@redhat.com>
To: Boqun Feng <boqun.feng@gmail.com>
Cc: Leonardo Bras <leobras@redhat.com>, Will Deacon <will@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Mark Rutland <mark.rutland@arm.com>,
	Paul Walmsley <paul.walmsley@sifive.com>,
	Palmer Dabbelt <palmer@dabbelt.com>,
	Albert Ou <aou@eecs.berkeley.edu>, Guo Ren <guoren@kernel.org>,
	Andrea Parri <parri.andrea@gmail.com>,
	Geert Uytterhoeven <geert@linux-m68k.org>,
	Ingo Molnar <mingo@kernel.org>,
	Andrzej Hajda <andrzej.hajda@intel.com>,
	linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org
Subject: Re: [PATCH v1 1/5] riscv/cmpxchg: Deduplicate xchg() asm functions
Date: Fri,  5 Jan 2024 01:45:42 -0300
Message-ID: <ZZeJdjP2gUnTQCl-@LeoBras>
In-Reply-To: <ZZcoWB_8dumgUn5K@boqun-archlinux>

On Thu, Jan 04, 2024 at 01:51:20PM -0800, Boqun Feng wrote:
> On Thu, Jan 04, 2024 at 05:41:26PM -0300, Leonardo Bras wrote:
> > On Thu, Jan 04, 2024 at 11:53:45AM -0800, Boqun Feng wrote:
> > > On Wed, Jan 03, 2024 at 01:31:59PM -0300, Leonardo Bras wrote:
> > > > In this header every xchg define (_relaxed, _acquire, _release, vanilla)
> > > > contains its own asm block, both for 4-byte and 8-byte variables,
> > > > for a total of 8 versions of mostly the same asm.
> > > > 
> > > > This is usually bad, as it means any change may have to be made in up
> > > > to 8 different places.
> > > > 
> > > > Unify those versions by creating a new define with enough parameters to
> > > > generate any version of the previous 8.
> > > > 
> > > > Then unify the result under a more general define, and simplify
> > > > arch_xchg* generation.
> > > > 
> > > > (This did not cause any change in generated asm)
> > > > 
> > > > Signed-off-by: Leonardo Bras <leobras@redhat.com>
> > > > Reviewed-by: Guo Ren <guoren@kernel.org>
> > > > Reviewed-by: Andrea Parri <parri.andrea@gmail.com>
> > > > Tested-by: Guo Ren <guoren@kernel.org>
> > > > ---
> > > >  arch/riscv/include/asm/cmpxchg.h | 138 ++++++-------------------------
> > > >  1 file changed, 23 insertions(+), 115 deletions(-)
> > > > 
> > > > diff --git a/arch/riscv/include/asm/cmpxchg.h b/arch/riscv/include/asm/cmpxchg.h
> > > > index 2f4726d3cfcc2..48478a8eecee7 100644
> > > > --- a/arch/riscv/include/asm/cmpxchg.h
> > > > +++ b/arch/riscv/include/asm/cmpxchg.h
> > > > @@ -11,140 +11,48 @@
> > > >  #include <asm/barrier.h>
> > > >  #include <asm/fence.h>
> > > >  
> > > > -#define __xchg_relaxed(ptr, new, size)					\
> > > > +#define __arch_xchg(sfx, prepend, append, r, p, n)			\
> > > >  ({									\
> > > > -	__typeof__(ptr) __ptr = (ptr);					\
> > > > -	__typeof__(new) __new = (new);					\
> > > > -	__typeof__(*(ptr)) __ret;					\
> > > > -	switch (size) {							\
> > > > -	case 4:								\
> > > > -		__asm__ __volatile__ (					\
> > > > -			"	amoswap.w %0, %2, %1\n"			\
> > > > -			: "=r" (__ret), "+A" (*__ptr)			\
> > > > -			: "r" (__new)					\
> > > > -			: "memory");					\
> > 
> > Hello Boqun, thanks for reviewing!
> > 
> > >
> > > Hmm... actually xchg_relaxed() doesn't need to be a barrier(), so the
> > > "memory" clobber is not needed here. Of course, it's out of the
> > > scope of this series, but I'm curious to see what would happen if we
> > > remove the "memory" clobber from _relaxed() ;-)
> > 
> > Nice question :)
> > I am happy my patch can help bring up those ideas :) 
> > 
> > 
> > According to gcc.gnu.org:
> > 
> > ---
> > "memory" [clobber]:
> > 
> >     The "memory" clobber tells the compiler that the assembly code 
> >     performs memory reads or writes to items other than those listed in 
> >     the input and output operands (for example, accessing the memory 
> >     pointed to by one of the input parameters). To ensure memory contains 
> 
> Note here it says "other than those listed in the input and output
> operands", and in the above asm block, the memory pointed to by "__ptr" is
> already marked as read-and-write by the asm block via "+A" (*__ptr), so
> the compiler knows the asm block may modify the memory pointed to by
> "__ptr"; therefore, in the _relaxed() case, the "memory" clobber can be
> avoided.

Thanks for pointing that out!
That helped me improve my understanding of constraints for asm operands :)
(I ended up getting even more info from the gcc manual)

So the "+A" constraint means the operand is both read and written ("+"),
and that it is a memory operand whose address is held in a register ("A").
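
To make that concrete, here is a minimal userspace sketch (not part of the
patch) of a relaxed 32-bit exchange that keeps the same operand list as the
quoted asm but drops the "memory" clobber. The names demo_xchg32_relaxed()
and demo_xchg32_check() are hypothetical, and the non-RISC-V branch falling
back to a compiler builtin is my own assumption, there only so the sketch
builds elsewhere.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not taken from cmpxchg.h.
 * "+A" (*p): "+" marks the operand as read and written by the asm,
 *            "A" asks for its address in a general-purpose register.
 * There is deliberately no "memory" clobber in the clobber list.
 */
static inline uint32_t demo_xchg32_relaxed(uint32_t *p, uint32_t n)
{
#if defined(__riscv) && defined(__riscv_atomic)
	uint32_t ret;

	__asm__ __volatile__ (
		"	amoswap.w %0, %2, %1\n"
		: "=r" (ret), "+A" (*p)
		: "r" (n));
	return ret;
#else
	/* Assumption: builtin fallback so the sketch builds on other targets. */
	return __atomic_exchange_n(p, n, __ATOMIC_RELAXED);
#endif
}

/* The re-read of v after the exchange must observe the new value. */
static inline void demo_xchg32_check(void)
{
	uint32_t v = 5;

	assert(demo_xchg32_relaxed(&v, 1) == 5);
	assert(v == 1);
}
```

Since *p is listed as a read-write operand, the compiler already knows it
cannot reuse a value of *p it loaded before the asm.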

> 
> Here is an example showing the difference. Consider the following case:
> 
> 	this_val = *this;
> 	that_val = *that;
> 	xchg_relaxed(this, 1);
> 	reread_this = *this;
> 
> by the semantics of _relaxed, compilers can optimize the above into
> 
> 	this_val = *this;
> 	xchg_relaxed(this, 1);
> 	that_val = *that;
> 	reread_this = *this;
> 

Seems correct, since there is no barrier().

> but the "memory" clobber in the xchg_relaxed() will provide this.

By 'this', do you mean the barrier? IIUC the "memory" clobber will
prevent the above optimization, right?
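
For reference, a small sketch of what a compiler-only barrier looks like:
an empty asm with a "memory" clobber, the same shape as the kernel's
barrier(). The names demo_barrier(), demo_sequence() and
demo_sequence_check() are hypothetical; the empty asm merely stands in for
an xchg that still carries the clobber.

```c
#include <assert.h>

/* Compiler-only barrier: empty asm with a "memory" clobber, emits no instruction. */
#define demo_barrier()	__asm__ __volatile__ ("" : : : "memory")

/*
 * Hypothetical stand-in for the sequence discussed above. Because of the
 * "memory" clobber, the compiler may not move the load of *that_p past
 * demo_barrier(), and it must re-read *this_p afterwards.
 */
static inline int demo_sequence(int *this_p, int *that_p)
{
	int this_val = *this_p;
	int that_val = *that_p;
	int reread_this;

	demo_barrier();
	reread_this = *this_p;

	return this_val + that_val + reread_this;
}

/* Nothing writes to memory here, so the re-read returns the same value. */
static inline void demo_sequence_check(void)
{
	int a = 1, b = 2;

	assert(demo_sequence(&a, &b) == 4);
}
```

This only constrains the compiler; it does not order anything at the CPU
level, which is what the fence instructions are for.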

> Needless to say the '"+A" (*__ptr)' prevents compiler from the following
> optimization:
> 
> 	this_val = *this;
> 	that_val = *that;
> 	xchg_relaxed(this, 1);
> 	reread_this = this_val;
> 
> since the compiler knows the asm block will read and write *this.
 
Right, the compiler knows that address will be written by the asm block,
and so it reloads the value instead of reusing the old one.


A question, though:
Do we need the "memory" clobber in any other xchg / cmpxchg asm?
Usually the only write to memory happens through *__ptr, which should
already be covered by "+A".

I understand that the non-"relaxed" variants need a barrier, but isn't the
compiler supposed to recognize the barrier instruction and avoid reordering
or optimizing across it?


Thanks!
Leo

> Regards,
> Boqun
> 
> >     correct values, GCC may need to flush specific register values to 
> >     memory before executing the asm. Further, the compiler does not assume 
> >     that any values read from memory before an asm remain unchanged after 
> >     that asm ; it reloads them as needed. Using the "memory" clobber 
> >     effectively forms a read/write memory barrier for the compiler.
> > 
> >     Note that this clobber does not prevent the processor from doing 
> >     speculative reads past the asm statement. To prevent that, you need 
> >     processor-specific fence instructions.
> > ---
> > 
> > IIUC the above text says that having memory accesses to *__ptr would
> > require the above asm to have the "memory" clobber, so that memory
> > accesses don't get reordered by the compiler.
> > 
> > By that reasoning, all asm in this file should have the "memory"
> > clobber, since all atomic operations will change the memory pointed to
> > by an input ptr. Is that correct?
> > 
> > Thanks!
> > Leo
> > 
> > 
> > > 
> > > Regards,
> > > Boqun
> > > 
> > > > -		break;							\
> > > > -	case 8:								\
> > > > -		__asm__ __volatile__ (					\
> > > > -			"	amoswap.d %0, %2, %1\n"			\
> > > > -			: "=r" (__ret), "+A" (*__ptr)			\
> > > > -			: "r" (__new)					\
> > > > -			: "memory");					\
> > > > -		break;							\
> > > > -	default:							\
> > > > -		BUILD_BUG();						\
> > > > -	}								\
> > > > -	__ret;								\
> > > > -})
> > > > -
> > > > -#define arch_xchg_relaxed(ptr, x)					\
> > > > -({									\
> > > > -	__typeof__(*(ptr)) _x_ = (x);					\
> > > > -	(__typeof__(*(ptr))) __xchg_relaxed((ptr),			\
> > > > -					    _x_, sizeof(*(ptr)));	\
> > > > +	__asm__ __volatile__ (						\
> > > > +		prepend							\
> > > > +		"	amoswap" sfx " %0, %2, %1\n"			\
> > > > +		append							\
> > > > +		: "=r" (r), "+A" (*(p))					\
> > > > +		: "r" (n)						\
> > > > +		: "memory");						\
> > > >  })
> > > >  
> > > > -#define __xchg_acquire(ptr, new, size)					\
> > > > +#define _arch_xchg(ptr, new, sfx, prepend, append)			\
> > > >  ({									\
> > > >  	__typeof__(ptr) __ptr = (ptr);					\
> > > > -	__typeof__(new) __new = (new);					\
> > > > -	__typeof__(*(ptr)) __ret;					\
> > > > -	switch (size) {							\
> > > > +	__typeof__(*(__ptr)) __new = (new);				\
> > > > +	__typeof__(*(__ptr)) __ret;					\
> > > > +	switch (sizeof(*__ptr)) {					\
> > > >  	case 4:								\
> > > > -		__asm__ __volatile__ (					\
> > > > -			"	amoswap.w %0, %2, %1\n"			\
> > > > -			RISCV_ACQUIRE_BARRIER				\
> > > > -			: "=r" (__ret), "+A" (*__ptr)			\
> > > > -			: "r" (__new)					\
> > > > -			: "memory");					\
> > > > +		__arch_xchg(".w" sfx, prepend, append,			\
> > > > +			      __ret, __ptr, __new);			\
> > > >  		break;							\
> > > >  	case 8:								\
> > > > -		__asm__ __volatile__ (					\
> > > > -			"	amoswap.d %0, %2, %1\n"			\
> > > > -			RISCV_ACQUIRE_BARRIER				\
> > > > -			: "=r" (__ret), "+A" (*__ptr)			\
> > > > -			: "r" (__new)					\
> > > > -			: "memory");					\
> > > > +		__arch_xchg(".d" sfx, prepend, append,			\
> > > > +			      __ret, __ptr, __new);			\
> > > >  		break;							\
> > > >  	default:							\
> > > >  		BUILD_BUG();						\
> > > >  	}								\
> > > > -	__ret;								\
> > > > +	(__typeof__(*(__ptr)))__ret;					\
> > > >  })
> > > >  
> > > > -#define arch_xchg_acquire(ptr, x)					\
> > > > -({									\
> > > > -	__typeof__(*(ptr)) _x_ = (x);					\
> > > > -	(__typeof__(*(ptr))) __xchg_acquire((ptr),			\
> > > > -					    _x_, sizeof(*(ptr)));	\
> > > > -})
> > > > +#define arch_xchg_relaxed(ptr, x)					\
> > > > +	_arch_xchg(ptr, x, "", "", "")
> > > >  
> > > > -#define __xchg_release(ptr, new, size)					\
> > > > -({									\
> > > > -	__typeof__(ptr) __ptr = (ptr);					\
> > > > -	__typeof__(new) __new = (new);					\
> > > > -	__typeof__(*(ptr)) __ret;					\
> > > > -	switch (size) {							\
> > > > -	case 4:								\
> > > > -		__asm__ __volatile__ (					\
> > > > -			RISCV_RELEASE_BARRIER				\
> > > > -			"	amoswap.w %0, %2, %1\n"			\
> > > > -			: "=r" (__ret), "+A" (*__ptr)			\
> > > > -			: "r" (__new)					\
> > > > -			: "memory");					\
> > > > -		break;							\
> > > > -	case 8:								\
> > > > -		__asm__ __volatile__ (					\
> > > > -			RISCV_RELEASE_BARRIER				\
> > > > -			"	amoswap.d %0, %2, %1\n"			\
> > > > -			: "=r" (__ret), "+A" (*__ptr)			\
> > > > -			: "r" (__new)					\
> > > > -			: "memory");					\
> > > > -		break;							\
> > > > -	default:							\
> > > > -		BUILD_BUG();						\
> > > > -	}								\
> > > > -	__ret;								\
> > > > -})
> > > > +#define arch_xchg_acquire(ptr, x)					\
> > > > +	_arch_xchg(ptr, x, "", "", RISCV_ACQUIRE_BARRIER)
> > > >  
> > > >  #define arch_xchg_release(ptr, x)					\
> > > > -({									\
> > > > -	__typeof__(*(ptr)) _x_ = (x);					\
> > > > -	(__typeof__(*(ptr))) __xchg_release((ptr),			\
> > > > -					    _x_, sizeof(*(ptr)));	\
> > > > -})
> > > > -
> > > > -#define __arch_xchg(ptr, new, size)					\
> > > > -({									\
> > > > -	__typeof__(ptr) __ptr = (ptr);					\
> > > > -	__typeof__(new) __new = (new);					\
> > > > -	__typeof__(*(ptr)) __ret;					\
> > > > -	switch (size) {							\
> > > > -	case 4:								\
> > > > -		__asm__ __volatile__ (					\
> > > > -			"	amoswap.w.aqrl %0, %2, %1\n"		\
> > > > -			: "=r" (__ret), "+A" (*__ptr)			\
> > > > -			: "r" (__new)					\
> > > > -			: "memory");					\
> > > > -		break;							\
> > > > -	case 8:								\
> > > > -		__asm__ __volatile__ (					\
> > > > -			"	amoswap.d.aqrl %0, %2, %1\n"		\
> > > > -			: "=r" (__ret), "+A" (*__ptr)			\
> > > > -			: "r" (__new)					\
> > > > -			: "memory");					\
> > > > -		break;							\
> > > > -	default:							\
> > > > -		BUILD_BUG();						\
> > > > -	}								\
> > > > -	__ret;								\
> > > > -})
> > > > +	_arch_xchg(ptr, x, "", RISCV_RELEASE_BARRIER, "")
> > > >  
> > > >  #define arch_xchg(ptr, x)						\
> > > > -({									\
> > > > -	__typeof__(*(ptr)) _x_ = (x);					\
> > > > -	(__typeof__(*(ptr))) __arch_xchg((ptr), _x_, sizeof(*(ptr)));	\
> > > > -})
> > > > +	_arch_xchg(ptr, x, ".aqrl", "", "")
> > > >  
> > > >  #define xchg32(ptr, x)							\
> > > >  ({									\
> > > > -- 
> > > > 2.43.0
> > > > 
> > > 
> > 
> 

