linux-riscv.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
From: "Leonardo Brás" <leobras@redhat.com>
To: Guo Ren <guoren@kernel.org>, Arnd Bergmann <arnd@arndb.de>
Cc: Palmer Dabbelt <palmer@rivosinc.com>,
	Will Deacon <will@kernel.org>,
	 Peter Zijlstra <peterz@infradead.org>,
	Boqun Feng <boqun.feng@gmail.com>,
	Mark Rutland <mark.rutland@arm.com>,
	 Paul Walmsley <paul.walmsley@sifive.com>,
	Albert Ou <aou@eecs.berkeley.edu>,
	Andrea Parri <parri.andrea@gmail.com>,
	Andrzej Hajda <andrzej.hajda@intel.com>,
	 linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org
Subject: Re: [RFC PATCH v5 5/5] riscv/cmpxchg: Implement xchg for variables of size 1 and 2
Date: Wed, 30 Aug 2023 18:51:12 -0300	[thread overview]
Message-ID: <aa881f41ca432b76f1cb50abb6903dfdf33b8c35.camel@redhat.com> (raw)
In-Reply-To: <CAJF2gTTg9MMj0TXJmQsWKsnoGKD_F84HKC5D9LFEy_5uERpRCw@mail.gmail.com>

Hello everyone,

Sorry for the delay, I was out of office for a while.

On Fri, 2023-08-11 at 09:24 +0800, Guo Ren wrote:
> On Fri, Aug 11, 2023 at 3:13 AM Arnd Bergmann <arnd@arndb.de> wrote:
> > 
> > On Thu, Aug 10, 2023, at 18:23, Palmer Dabbelt wrote:
> > > On Thu, 10 Aug 2023 09:04:04 PDT (-0700), leobras@redhat.com wrote:
> > > > On Thu, 2023-08-10 at 08:51 +0200, Arnd Bergmann wrote:
> > > > > On Thu, Aug 10, 2023, at 06:03, Leonardo Bras wrote:
> > > > > > xchg for variables of size 1-byte and 2-bytes is not yet available for
> > > > > > riscv, even though its present in other architectures such as arm64 and
> > > > > > x86. This could lead to not being able to implement some locking mechanisms
> > > > > > or requiring some rework to make it work properly.
> > > > > > 
> > > > > > Implement 1-byte and 2-bytes xchg in order to achieve parity with other
> > > > > > architectures.
> > > > 
> > > > > Parity with other architectures by itself is not a reason to do this,
> > > > > in particular the other architectures you listed have the instructions
> > > > > in hardware while riscv does not.
> > > > 
> > > > Sure, I understand RISC-V don't have native support for xchg on variables of
> > > > size < 4B. My argument is that it's nice to have even an emulated version for
> > > > this in case any future mechanism wants to use it.
> > > > 
> > > > Not having it may mean we won't be able to enable given mechanism in RISC-V.
> > > 
> > > IIUC the ask is to have a user within the kernel for these functions.
> > > That's the general thing to do, and last time this came up there was no
> > > in-kernel use of it -- the qspinlock stuff would, but we haven't enabled
> > > it yet because we're worried about the performance/fairness stuff that
> > > other ports have seen and nobody's got concrete benchmarks yet (though
> > > there's another patch set out that I haven't had time to look through,
> > > so that may have changed).
> > 
> > Right. In particular the qspinlock is a good example for something
> > where having the emulated 16-bit xchg() may end up less efficient
> > than a natively supported instruction.
> The xchg() efficiency depends on micro-architecture. and the number of
> instructions is not the key, even one instruction would be separated
> into several micro-ops. I thought the Power guys won't agree with this
> view :)
> 
> > 
> > The xchg() here is a performance optimization for CPUs that can
> > do this without touching the other half of the 32-bit word.
> It's useless on a non-SMT system because all operations are cacheline
> based. (Ps: Because xchg() has a load semantic, CHI's "Dirty Partial"
> & "Clean Empty" can't help anymore.)
> 
> > 
> > > > 
> > > > Didn't get this part:
> > > > By "emulating small xchg() through cmpxchg()", did you mean like emulating an
> > > > xchg (usually 1 instruction) with lr & sc (same used in cmpxchg) ?
> > > > 
> > > > If so, yeah, it's a fair point: in some extreme case we could have multiple
> > > > threads accessing given cacheline and have sc always failing. On the other hand,
> > > > there are 2 arguments on that:
> > > > 
> > > > 1 - Other architectures, (such as powerpc, arm and arm64 without LSE atomics)
> > > > also seem to rely in this mechanism for every xchg size. Another archs like csky
> > > > and loongarch use asm that look like mine to handle size < 4B xchg.
> > 
> > I think you misread the arm64 code, which should use native instructions
> > for all sizes, in both the armv8.0 and LSE atomics.

By native I understand you mean swp instead of ll/sc, right?

Well, that's right only if the kernel is compiled with LSE, and the ll/sc option
for is available for other arm64 that don't. 

Also, according to Kconfig, it seems to have been introduced in ARMv8.1, meaning
arm64 for (at least some) ARMv8.0 use ll/sc, and this is why xchg with the ll/sc
code is available for 1, 2, 4 and 8 bytes in arch/arm64/include/asm/cmpxchg.h:

#define __XCHG_CASE(w, sfx, name, sz, mb, nop_lse, acq, acq_lse, rel,
cl)	\static inline u##sz __xchg_case_##name##sz(u##sz x, volatile void *ptr)		\{										\	u##szret;								\	unsignedlongtmp;							\										\
\	asm
volatile(ARM64_LSE_ATOMIC_INSN(					\	/* LL/SC */								\	"	prfm	pstl1strm, %2\n"					\	"1:	ld"#acq"xr"#sfx"\t%"#w"0,%2\n"				\	"	st"#rel"xr"#sfx"\t%w1,%"#w"3,%2\n"			\	"	cbnz	%w1,1b\n"						\	"	"#mb,								\	/*LSEatomics*/							\	"	swp"#acq_lse#rel#sfx"\t%"#w"3,%"#w"0,%2\n"		\[...]

__XCHG_CASE(w, b,     ,  8,        ,    ,  ,  ,  ,         )
__XCHG_CASE(w, h,     , 16,        ,    ,  ,  ,  ,         )
__XCHG_CASE(w,  ,     , 32,        ,    ,  ,  ,  ,         )

> > 
> > PowerPC does use the masking for xchg, but I suspect there are no
> > actual users, at least it actually has its own qspinlock implementation
> > that avoids xchg().
> PowerPC still needs similar things, see publish_tail_cpu(), and more
> complex cmpxchg semantics.
> 
> Paravrit qspinlock and CNA qspinlock still need more:
>  - xchg8 (RCsc)
>  - cmpxchg8/16_relaxed
>  - cmpxchg8/16_release (Rcpc)
>  - cmpxchg8_acquire (RCpc)
>  - cmpxchg8 (RCsc)
> 
> > 
> > > > >  This is also something that almost no architecture
> > > > > specific code relies on (generic qspinlock being a notable exception).
> > > > > 
> > > > 
> > > > 2 - As you mentioned, there should be very little code that will actually make
> > > > use of xchg for vars < 4B, so it should be safe to assume its fine to not
> > > > guarantee forward progress for those rare usages (like some of above mentioned
> > > > archs).
> > 
> > I don't this this is a safe assumption, we've had endless discussions
> > about using qspinlock on architectures without a native xchg(), which
> > needs either hardware guarantees or special countermeasures in xchg() itself
> > to avoid this.

That seems a nice discussion, do you have a link for this?

By what I could see, Guo Ren is doing a great work on proving that using
qspinlock (with smaller xchg) performs better on RISC-V. 

> > 
> > What I'd actually like to do here is to remove the special 8-bit and
> > 16-bit cases from the xchg() and cmpxchg() interfaces at all, leaving
> It needs to modify qspinlock, paravirt_qspinlock, and CNA_qspinlock
> code to prevent using 8-bit/16-bit xchg/cmpxchg, and cleanup all
> architectures' cmpxchg.h. What you do is just get them out of the
> common atomic.h, but architectures still need to solve them and
> connect to the qspinlock series.
> 
> > only fixed 32-bit and native wordsize (either 32 or 64) as the option,
> > while dealing with the others the same way we treat the fixed
> > 64-bit cases that hardcode the 64-bit argument types and are only
> > usable on architectures that provide them.
> > 
> >     Arnd
> 
> 
> 

IIUC, xchg for size 1 & 2 can still be useful if having the lock variable bigger
causes the target struct to use more than a cacheline. This could reduce cache
usage and avoid some cacheline misses. 

Even though in some arches those 'non-native' xchg can take longer, it can be
perceived as a valid tradeoff for some scenarios.

Thanks,
Leo


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

  reply	other threads:[~2023-08-30 21:51 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-10  4:03 [RFC PATCH v5 0/5] Rework & improve riscv cmpxchg.h and atomic.h Leonardo Bras
2023-08-10  4:03 ` [RFC PATCH v5 1/5] riscv/cmpxchg: Deduplicate xchg() asm functions Leonardo Bras
2023-08-10  4:03 ` [RFC PATCH v5 2/5] riscv/cmpxchg: Deduplicate cmpxchg() asm and macros Leonardo Bras
2023-08-10  4:03 ` [RFC PATCH v5 3/5] riscv/atomic.h : Deduplicate arch_atomic.* Leonardo Bras
2023-08-10  4:03 ` [RFC PATCH v5 4/5] riscv/cmpxchg: Implement cmpxchg for variables of size 1 and 2 Leonardo Bras
2023-08-10  4:03 ` [RFC PATCH v5 5/5] riscv/cmpxchg: Implement xchg " Leonardo Bras
2023-08-10  6:51   ` Arnd Bergmann
2023-08-10 16:04     ` Leonardo Brás
2023-08-10 16:23       ` Palmer Dabbelt
2023-08-10 19:13         ` Arnd Bergmann
2023-08-11  1:24           ` Guo Ren
2023-08-30 21:51             ` Leonardo Brás [this message]
2023-08-11  1:40         ` Guo Ren
2023-08-11  6:27           ` Conor Dooley
2023-08-11  9:48             ` Conor Dooley
2023-08-11 11:10           ` Andrew Jones
2023-08-11 11:16             ` Guo Ren
2023-08-30 21:59         ` Leonardo Brás
2023-09-06 21:15           ` Leonardo Bras Soares Passos
2023-12-06  0:56           ` leobras.c
2024-01-03 11:05             ` Jisheng Zhang
2024-01-03 15:31               ` Leonardo Bras
2023-09-10  8:50 ` [RFC PATCH v5 0/5] Rework & improve riscv cmpxchg.h and atomic.h Guo Ren
2023-09-12  0:22   ` Leonardo Bras Soares Passos
2023-12-23  3:07   ` Leonardo Bras
2023-12-23  3:15     ` Guo Ren

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aa881f41ca432b76f1cb50abb6903dfdf33b8c35.camel@redhat.com \
    --to=leobras@redhat.com \
    --cc=andrzej.hajda@intel.com \
    --cc=aou@eecs.berkeley.edu \
    --cc=arnd@arndb.de \
    --cc=boqun.feng@gmail.com \
    --cc=guoren@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-riscv@lists.infradead.org \
    --cc=mark.rutland@arm.com \
    --cc=palmer@rivosinc.com \
    --cc=parri.andrea@gmail.com \
    --cc=paul.walmsley@sifive.com \
    --cc=peterz@infradead.org \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).