From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: David Howells <dhowells@redhat.com>
Cc: linux-arch@vger.kernel.org, x86@kernel.org, will.deacon@arm.com,
linux-kernel@vger.kernel.org, ramana.radhakrishnan@arm.com,
dwmw2@infradead.org
Subject: Re: [RFC PATCH 00/15] Provide atomics and bitops implemented with ISO C++11 atomics
Date: Wed, 18 May 2016 17:23:44 -0700
Message-ID: <20160519002344.GF3528@linux.vnet.ibm.com>
In-Reply-To: <146358423711.8596.9104061348359986393.stgit@warthog.procyon.org.uk>

On Wed, May 18, 2016 at 04:10:37PM +0100, David Howells wrote:
>
> Here's a set of patches to provide kernel atomics and bitops implemented
> with ISO C++11 atomic intrinsics. The second part of the set makes the x86
> arch use the implementation.
>
> Note that the x86 patches are very rough. They would need to be made
> compile-time conditional in some way, and the old code can't really be
> thrown away that easily - but this is a good way to make sure I'm not
> using that code any more.
Good to see this!!!
[ . . . ]
> There are some disadvantages also:
>
> (1) It's not available in gcc before gcc-4.7 and there will be some
> seriously suboptimal code production before gcc-7.1.
Probably need to keep the old stuff around for quite some time.
But it would be good to test the C11 stuff out in any case.
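To make that compile-time conditional concrete, one obvious shape is an
opt-in Kconfig switch that pulls in the ISO headers while leaving the
existing asm implementation as the default.  Rough sketch only -- the
CONFIG symbol and the split-out asm header below are hypothetical, only
iso-atomic.h is from the series:

/* arch/x86/include/asm/atomic.h (sketch) */
#ifdef CONFIG_X86_ISO_ATOMICS           /* hypothetical opt-in symbol */
# include <asm-generic/iso-atomic.h>
#else
# include <asm/atomic_asm.h>            /* hypothetical split-out of today's inline asm */
#endif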
> (2) The compiler might misoptimise - for example, it might generate a
> CMPXCHG loop rather than a BTR instruction to clear a bit.
Which is one reason it would be very good to test it out. ;-)
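For instance, a clear_bit() written over the intrinsics comes out roughly
as the sketch below (illustrative, not the exact iso-bitops.h code), and
the question is whether gcc turns the fetch-and into LOCK BTR / LOCK AND
or falls back to a CMPXCHG loop:

static inline void iso_clear_bit(long nr, volatile unsigned long *addr)
{
        unsigned long mask = 1UL << (nr % BITS_PER_LONG);

        addr += nr / BITS_PER_LONG;
        /* clear_bit() gives no ordering guarantee, hence relaxed */
        __atomic_fetch_and(addr, ~mask, __ATOMIC_RELAXED);
}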
> (3) The C++11 memory model permits atomic instructions to be merged if
> appropriate - for example, two adjacent __atomic_read() calls might
> get merged if the return value of the first isn't examined. Making
> the pointers volatile alleviates this. Note that gcc doesn't do this
> yet.
I could imagine a few situations where merging would be a good thing. But
I can also imagine a great number where it would be very bad.
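Using gcc's __atomic_load_n() for concreteness (what the cover letter
calls __atomic_read()), the difference is roughly the sketch below: in
the first function the standard would allow the unused load to be dropped
or the two loads to be fused, while the volatile qualification in the
second obliges the compiler to perform both (function names are
illustrative):

static inline int read_twice_plain(atomic_t *v)
{
        (void)__atomic_load_n(&v->counter, __ATOMIC_RELAXED);
        return __atomic_load_n(&v->counter, __ATOMIC_RELAXED);
}

static inline int read_twice_volatile(atomic_t *v)
{
        volatile int *p = &v->counter;

        (void)__atomic_load_n(p, __ATOMIC_RELAXED);
        return __atomic_load_n(p, __ATOMIC_RELAXED);
}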
> (4) The C++11 memory barriers are, in general, weaker than the kernel's
> memory barriers are defined to be. Whether this actually matters is
> arch dependent. Further, the C++11 memory barriers are
> acquire/release, but some arches actually have load/store instead -
> which may be a problem.
C++11 does have memory_order_seq_cst barriers via atomic_thread_fence(),
and on many architectures they are strong enough for the Linux kernel.
However, the standard does not guarantee this. (But it is difficult
on x86, ARM, and PowerPC to produce C++11 semantics without also producing
Linux-kernel semantics.)
C++11 has acquire/release barriers as well as acquire loads and release
stores. Some of this will need case-by-case analysis. Here is a (dated)
attempt:
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0124r1.html
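For reference, the candidate mappings under discussion look roughly like
the following (macro names are illustrative; whether each one is strong
enough on a given arch is exactly the case-by-case question):

#define iso_smp_mb()            __atomic_thread_fence(__ATOMIC_SEQ_CST)
#define iso_smp_rmb()           __atomic_thread_fence(__ATOMIC_ACQUIRE)
#define iso_smp_wmb()           __atomic_thread_fence(__ATOMIC_RELEASE)

#define iso_load_acquire(p)     __atomic_load_n((p), __ATOMIC_ACQUIRE)
#define iso_store_release(p, v) __atomic_store_n((p), (v), __ATOMIC_RELEASE)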
> (5) On x86, the compiler doesn't tell you where the LOCK prefixes are so
> they cannot be suppressed if only one CPU is online.
>
>
> Things to be considered:
>
> (1) We could weaken the kernel memory model for the benefit of arches
> that have instructions that employ explicit acquire/release barriers -
> but that may cause data races to occur based on assumptions we've
> already made. Note, however, that powerpc already seems to have a
> weaker memory model.
PowerPC has relatively weak ordering for lock and unlock, and also for
load-acquire and store-release. However, last I checked, it had full
ordering for value-returning read-modify-write atomic operations. As
you say, C11 could potentially produce weaker results. However, it
should be possible to fix this where necessary.
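If a given architecture's mapping of the seq_cst read-modify-write
operations does turn out weaker than the kernel's full-barrier guarantee,
one possible repair is to bracket a relaxed RMW with explicit full fences,
along these lines (untested sketch, name illustrative):

static inline int iso_atomic_add_return(int i, atomic_t *v)
{
        int ret;

        __atomic_thread_fence(__ATOMIC_SEQ_CST);        /* e.g. sync on powerpc */
        ret = __atomic_add_fetch(&v->counter, i, __ATOMIC_RELAXED);
        __atomic_thread_fence(__ATOMIC_SEQ_CST);
        return ret;
}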
> (2) There are three sets of atomics (atomic_t, atomic64_t and
> atomic_long_t). I can either put each in a separate file all nicely
> laid out (see patch 3) or I can make a template header (see patch 4)
> and use #defines with it to make all three atomics from one piece of
> code. Which is best? The first is definitely easier to read, but the
> second might be easier to maintain.
The template-header approach is more compatible with the ftrace
macros. ;-)
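The template mechanism amounts to something like the following minimal
sketch (the atomic_val/atomic_prefix macros are made up here; the real
iso-atomic-template.h covers far more operations):

/* iso-atomic-template.h, included once per width: */
static inline atomic_val atomic_prefix(read)(atomic_prefix(t) *v)
{
        return __atomic_load_n(&v->counter, __ATOMIC_RELAXED);
}

/* iso-atomic64.h: */
#define atomic_val              long long
#define atomic_prefix(x)        atomic64_##x
#include <asm-generic/iso-atomic-template.h>
#undef atomic_prefix
#undef atomic_val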
> (3) I've added cmpxchg_return() and try_cmpxchg() to replace cmpxchg().
> I've also provided atomicX_t variants of these. These return the
> boolean return value from the __atomic_compare_exchange_n() function
> (which is carried in the Z flag on x86). Should this be rolled out
> even without the ISO atomic implementation?
>
> (4) The current x86_64 bitops (set_bit() and co.) are technically broken.
> The set_bit() function, for example, takes a 64-bit signed bit number
> but implements this with BTSL, presumably because it's a shorter
> instruction.
>
> The BTS instruction's bit number operand, however, is sized according
> to the memory operand, so the upper 32 bits of the bit number are
> discarded by BTSL. We should really be using BTSQ to implement this
> correctly (and gcc does just that). In practice, though, it would
> seem unlikely that this will ever be a problem as I think we're
> unlikely to have a bitset with more than ~2 billion bits in it within
> the kernel (it would be >256MiB in size).
>
> Patch 9 casts the pointers internally in the bitops functions to
> persuade the kernel to use 32-bit bitops - but this only really
> matters on x86. Should set_bit() and co. take int rather than long
> bit number arguments to make the point?
No real opinion on #3 and #4.
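For what it's worth, the try_cmpxchg() shape in #3 would be roughly the
following (sketch -- the exact prototype and memory orders in the patches
may differ): the boolean comes straight out of
__atomic_compare_exchange_n(), which is the Z-flag result on x86, and
*old is updated to the observed value on failure:

static inline bool atomic_try_cmpxchg(atomic_t *v, int *old, int new)
{
        return __atomic_compare_exchange_n(&v->counter, old, new,
                                           false,               /* strong CAS */
                                           __ATOMIC_SEQ_CST,    /* on success */
                                           __ATOMIC_RELAXED);   /* on failure */
}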
> I've opened a number of gcc bugzillas to improve the code generated by the
> atomics:
>
> (*) https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49244
>
> __atomic_fetch_{and,or,xor}() don't generate locked BTR/BTS/BTC on x86
> and __atomic_load() doesn't generate TEST or BT. There is a patch for
> this, but it leaves some cases not fully optimised.
>
> (*) https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70821
>
> __atomic_fetch_{add,sub}() generates XADD rather than DECL when
> subtracting 1 on x86. There is a patch for this.
>
> (*) https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70825
>
> __atomic_compare_exchange_n() accesses the stack unnecessarily,
> leading to extraneous stores being added when everything could be done
> entirely within registers (on x86, powerpc64, aarch64).
>
> (*) https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70973
>
> Can the __atomic*() ops on x86 generate a list of LOCK prefixes?
>
> (*) https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71153
>
> aarch64 __atomic_fetch_and() generates a double inversion for the
> LDSET instructions. There is a patch for this.
>
> (*) https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71162
>
> powerpc64 should probably emit BNE- not BNE to retry the STDCX.
>
>
> The patches can be found here also:
>
> http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=iso-atomic
>
> I have fixed up an x86_64 cross-compiler with the patches that I've been
> given for the above and have booted the resulting kernel.
Nice!!!
Thanx, Paul
> David
> ---
> David Howells (15):
> cmpxchg_local() is not signed-value safe, so fix generic atomics
> tty: ldsem_cmpxchg() should use cmpxchg() not atomic_long_cmpxchg()
> Provide atomic_t functions implemented with ISO-C++11 atomics
> Convert 32-bit ISO atomics into a template
> Provide atomic64_t and atomic_long_t using ISO atomics
> Provide 16-bit ISO atomics
> Provide cmpxchg(), xchg(), xadd() and __add() based on ISO C++11 intrinsics
> Provide an implementation of bitops using C++11 atomics
> Make the ISO bitops use 32-bit values internally
> x86: Use ISO atomics
> x86: Use ISO bitops
> x86: Use ISO xchg(), cmpxchg() and friends
> x86: Improve spinlocks using ISO C++11 intrinsic atomics
> x86: Make the mutex implementation use ISO atomic ops
> x86: Fix misc cmpxchg() and atomic_cmpxchg() calls to use try/return variants
>
>
> arch/x86/events/amd/core.c | 6
> arch/x86/events/amd/uncore.c | 4
> arch/x86/include/asm/atomic.h | 224 -----------
> arch/x86/include/asm/bitops.h | 143 -------
> arch/x86/include/asm/cmpxchg.h | 99 -----
> arch/x86/include/asm/cmpxchg_32.h | 3
> arch/x86/include/asm/cmpxchg_64.h | 6
> arch/x86/include/asm/mutex.h | 6
> arch/x86/include/asm/mutex_iso.h | 73 ++++
> arch/x86/include/asm/qspinlock.h | 2
> arch/x86/include/asm/spinlock.h | 18 +
> arch/x86/kernel/acpi/boot.c | 12 -
> arch/x86/kernel/apic/apic.c | 3
> arch/x86/kernel/cpu/mcheck/mce.c | 7
> arch/x86/kernel/kvm.c | 5
> arch/x86/kernel/smp.c | 2
> arch/x86/kvm/mmu.c | 2
> arch/x86/kvm/paging_tmpl.h | 11 -
> arch/x86/kvm/vmx.c | 21 +
> arch/x86/kvm/x86.c | 19 -
> arch/x86/mm/pat.c | 2
> arch/x86/xen/p2m.c | 3
> arch/x86/xen/spinlock.c | 6
> drivers/tty/tty_ldsem.c | 2
> include/asm-generic/atomic.h | 17 +
> include/asm-generic/iso-atomic-long.h | 32 ++
> include/asm-generic/iso-atomic-template.h | 572 +++++++++++++++++++++++++++++
> include/asm-generic/iso-atomic.h | 28 +
> include/asm-generic/iso-atomic16.h | 27 +
> include/asm-generic/iso-atomic64.h | 28 +
> include/asm-generic/iso-bitops.h | 188 ++++++++++
> include/asm-generic/iso-cmpxchg.h | 180 +++++++++
> include/linux/atomic.h | 26 +
> 33 files changed, 1225 insertions(+), 552 deletions(-)
> create mode 100644 arch/x86/include/asm/mutex_iso.h
> create mode 100644 include/asm-generic/iso-atomic-long.h
> create mode 100644 include/asm-generic/iso-atomic-template.h
> create mode 100644 include/asm-generic/iso-atomic.h
> create mode 100644 include/asm-generic/iso-atomic16.h
> create mode 100644 include/asm-generic/iso-atomic64.h
> create mode 100644 include/asm-generic/iso-bitops.h
> create mode 100644 include/asm-generic/iso-cmpxchg.h
>