Re: [RFC] Bridging the gap between the Linux Kernel Memory Consistency Model (LKMM) and C11/C++11 atomics


From: Peter Zijlstra <peterz@infradead.org>
To: Olivier Dion <odion@efficios.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	rnk@google.com, Alan Stern <stern@rowland.harvard.edu>,
	Andrea Parri <parri.andrea@gmail.com>,
	Will Deacon <will@kernel.org>, Boqun Feng <boqun.feng@gmail.com>,
	Nicholas Piggin <npiggin@gmail.com>,
	David Howells <dhowells@redhat.com>,
	Jade Alglave <j.alglave@ucl.ac.uk>,
	Luc Maranget <luc.maranget@inria.fr>,
	"Paul E. McKenney" <paulmck@kernel.org>,
	Nathan Chancellor <nathan@kernel.org>,
	Nick Desaulniers <ndesaulniers@google.com>,
	Tom Rix <trix@redhat.com>,
	linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org,
	gcc@gcc.gnu.org, llvm@lists.linux.dev
Subject: Re: [RFC] Bridging the gap between the Linux Kernel Memory Consistency Model (LKMM) and C11/C++11 atomics
Date: Fri, 7 Jul 2023 17:45:28 +0200
Message-ID: <20230707154528.GC2883469@hirez.programming.kicks-ass.net>
In-Reply-To: <87cz13hl7t.fsf@laura>

On Fri, Jul 07, 2023 at 10:04:06AM -0400, Olivier Dion wrote:
> On Tue, 04 Jul 2023, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Mon, Jul 03, 2023 at 03:20:31PM -0400, Olivier Dion wrote:
> [...]
> >> On x86-64 (gcc 13.1 -O2) we get:
> >> 
> >>   t0():
> >>           movl    $1, x(%rip)
> >>           movl    $1, %eax
> >>           xchgl   dummy(%rip), %eax
> >>           lock orq $0, (%rsp)       ;; Redundant with previous exchange.
> >>           movl    y(%rip), %eax
> >>           movl    %eax, r0(%rip)
> >>           ret
> >>   t1():
> >>           movl    $1, y(%rip)
> >>           lock orq $0, (%rsp)
> >>           movl    x(%rip), %eax
> >>           movl    %eax, r1(%rip)
> >>           ret
> >
> > So I would expect the compilers to do better here. It should know those
> > __atomic_thread_fence() thingies are superfluous and simply not emit
> > them. This could even be done as a peephole pass later, where it sees
> > consecutive atomic ops and the second being a no-op.
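
For reference, a C source of roughly the following shape should produce
listings like the ones quoted above.  This is a reconstruction from the
assembly, not the code from the original post: the memory orders and the
use of plain stores for r0/r1 are assumptions; only the variable and
function names are taken from the listing.

```c
/* Dekker-style test reconstructed from the quoted x86-64 listing.
 * Assumption: relaxed accesses to x and y, a seq_cst exchange on dummy,
 * and seq_cst thread fences. */
int x, y, dummy, r0, r1;

void t0(void)
{
	__atomic_store_n(&x, 1, __ATOMIC_RELAXED);
	/* On x86-64 this is an xchg, which already orders everything. */
	__atomic_exchange_n(&dummy, 1, __ATOMIC_SEQ_CST);
	/* The fence the compiler still emits as "lock orq $0, (%rsp)". */
	__atomic_thread_fence(__ATOMIC_SEQ_CST);
	r0 = __atomic_load_n(&y, __ATOMIC_RELAXED);
}

void t1(void)
{
	__atomic_store_n(&y, 1, __ATOMIC_RELAXED);
	/* Here the fence is required: nothing else orders store vs load. */
	__atomic_thread_fence(__ATOMIC_SEQ_CST);
	r1 = __atomic_load_n(&x, __ATOMIC_RELAXED);
}
```

Only the fence in t0() is a candidate for removal, since it directly
follows a fully ordered read-modify-write.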
> 
> Indeed, a peephole optimization could work for this Dekker, if the
> compiler adds the pattern for it.  However, AFAIK, a peephole cannot be
> applied when the two fences are in different basic blocks, for example
> when a fence is only emitted on compare_exchange success.  This
> limitation also implies that the optimization cannot be done across
> functions/modules (shared libraries).

LTO FTW :-)
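
A minimal sketch of the basic-block case mentioned in the quoted text,
assuming GCC/Clang __atomic builtins.  The helper name try_claim() is
hypothetical and only serves the illustration.  The fence sits in the
success branch, so it is no longer the instruction right after the
compare-exchange, which is what a purely local peephole pattern would
look for.

```c
#include <stdbool.h>

/* Hypothetical helper: claim *flag (0 -> 1) and issue a full fence
 * only when the claim succeeds. */
bool try_claim(int *flag)
{
	int expected = 0;

	if (__atomic_compare_exchange_n(flag, &expected, 1, false,
					__ATOMIC_SEQ_CST, __ATOMIC_RELAXED)) {
		/* Success path only: this fence lives in a different basic
		 * block than the compare-exchange.  On x86-64 the preceding
		 * lock cmpxchg already provides full ordering. */
		__atomic_thread_fence(__ATOMIC_SEQ_CST);
		return true;
	}
	return false;
}
```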

> For example, it would be interesting to be able to
> promote an acquire fence of a pthread_mutex_lock() to a full fence on
> weakly ordered architectures while preventing a redundant fence on
> strongly ordered architectures.

That's a very non-trivial thing to do. I know Linux has
smp_mb__after_spinlock() and that x86 has it a no-op, but even on x86
adding a full fence after a lock has observable differences IIRC.

Specifically, the actual store that acquires the lock is not well
ordered vs the critical section itself for non-trivial spinlock
implementations (notably qspinlock).

For RCU you mostly care about RCsc locks (IIRC), and upgrading unlock is
a 'simpler' (IMO) approach to achieve that (which is what RCU does with
smp_mb__after_unlock_lock()).
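
A rough userspace analogue of that pattern, as a sketch only.  Every
name here (struct tas_lock, tas_lock_acquire(), tas_lock_release(),
full_fence_after_unlock_lock(), locked_increment()) is hypothetical, and
the lock is a trivial test-and-set lock rather than anything resembling
qspinlock.  The idea shown is a plain acquire/release lock plus an
explicit full fence issued right after acquisition, so that a previous
unlock followed by this lock acts as a full barrier.

```c
#include <stdbool.h>

struct tas_lock {
	bool locked;
};

/* Acquire-ordered test-and-set lock (RCpc on its own). */
static inline void tas_lock_acquire(struct tas_lock *l)
{
	while (__atomic_test_and_set(&l->locked, __ATOMIC_ACQUIRE))
		;	/* spin */
}

/* Release-ordered unlock. */
static inline void tas_lock_release(struct tas_lock *l)
{
	__atomic_clear(&l->locked, __ATOMIC_RELEASE);
}

/* Hypothetical stand-in for the kernel macro: always a full fence here;
 * a real implementation could make it a no-op where unlock+lock already
 * gives full ordering. */
static inline void full_fence_after_unlock_lock(void)
{
	__atomic_thread_fence(__ATOMIC_SEQ_CST);
}

int counter;

void locked_increment(struct tas_lock *l)
{
	tas_lock_acquire(l);
	full_fence_after_unlock_lock();	/* upgrade unlock+lock to a full barrier */
	counter++;
	tas_lock_release(l);
}
```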

> We know that at least Clang has such peephole optimizations for some
> architecture backends.  It seems however that they do not recognize
> lock-prefixed instructions as fence.

They seem confused in general for emitting MFENCE.

> AFAIK, GCC does not have that kind
> of optimization.

> We are also aware that some research has been done on this topic [0].
> The idea is to use PRE for elimination of redundant fences.  This would
> work across multiple basic blocks, although the paper focuses on
> intra-procedural eliminations.  However, it seems that the latest work
> on that [1] has never been completed [2].
> 
> Our proposed approach provides a means for the user to express -- and
> document -- the wanted semantics in the source code.  This allows the
> compiler to only emit wanted fences, therefore not relying on
> architecture specific backend optimizations.  In other words, this
> applies even on unoptimized binaries.

I'm not a tools person, but if I were, I'd be very hesitant to add
__builtin functions that 'conflict'/'overlap' with what an optimizer
should be able to do.

Either way around you need work done on the compilers, and I'm thinking
'fixing' the optimizer will benefit far more people than adding
__builtin's.

Then again, I'm not a tools person, so you don't need to convince me.
But one of the selling points of the whole Atomics as a language feature
was that whole optimizer angle. Otherwise you might as well do as we do,
inline asm the world.

I'll shut up now, thanks for that PRE reference [0], that seems a fun
read for when I'm bored.
