All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: David Howells <dhowells@redhat.com>
Cc: linux-arch@vger.kernel.org, x86@kernel.org, will.deacon@arm.com,
	linux-kernel@vger.kernel.org, ramana.radhakrishnan@arm.com,
	dwmw2@infradead.org
Subject: Re: [RFC PATCH 00/15] Provide atomics and bitops implemented with ISO C++11 atomics
Date: Wed, 18 May 2016 17:23:44 -0700	[thread overview]
Message-ID: <20160519002344.GF3528@linux.vnet.ibm.com> (raw)
In-Reply-To: <146358423711.8596.9104061348359986393.stgit@warthog.procyon.org.uk>

On Wed, May 18, 2016 at 04:10:37PM +0100, David Howells wrote:
> 
> Here's a set of patches to provide kernel atomics and bitops implemented
> with ISO C++11 atomic intrinsics.  The second part of the set makes the x86
> arch use the implementation.
> 
> Note that the x86 patches are very rough.  It would need to be made
> compile-time conditional in some way and the old code can't really be
> thrown away that easily - but it's a good way to make sure I'm not using
> that code any more.

Good to see this!!!

[ . . . ]

> There are some disadvantages also:
> 
>  (1) It's not available in gcc before gcc-4.7 and there will be some
>      seriously suboptimal code production before gcc-7.1.

Probably need to keep the old stuff around for quite some time.
But it would be good to test the C11 stuff out in any case.

>  (2) The compiler might misoptimise - for example, it might generate a
>      CMPXCHG loop rather than a BTR instruction to clear a bit.

Which is one reason it would be very good to test it out.  ;-)

>  (3) The C++11 memory model permits atomic instructions to be merged if
>      appropriate - for example, two adjacent __atomic_read() calls might
>      get merged if the return value of the first isn't examined.  Making
>      the pointers volatile alleviates this.  Note that gcc doesn't do this
>      yet.

I could imagine a few situations where merging would be a good thing.  But
I can also imagine a great number where it would be very bad.

>  (4) The C++11 memory barriers are, in general, weaker than the kernel's
>      memory barriers are defined to be.  Whether this actually matters is
>      arch dependent.  Further, the C++11 memory barriers are
>      acquire/release, but some arches actually have load/store instead -
>      which may be a problem.

C++11 does have memory_order_seq_cst barriers via atomic_thread_fence(),
and on many architectures they are strong enough for the Linux kernel.
However, the standard does not guarantee this.  (But it is difficult
on x86, ARM, and PowerPC to produce C++11 semantics without also producing
Linux-kernel semantics.)

C++11 has acquire/release barriers as well as acquire loads and release
stores.  Some of this will need case-by-case analysis.  Here is a (dated)
attempt:

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0124r1.html

>  (5) On x86, the compiler doesn't tell you where the LOCK prefixes are so
>      they cannot be suppressed if only one CPU is online.
> 
> 
> Things to be considered:
> 
>  (1) We could weaken the kernel memory model to for the benefit of arches
>      that have instructions that employ explicit acquire/release barriers -
>      but that may cause data races to occur based on assumptions we've
>      already made.  Note, however, that powerpc already seems to have a
>      weaker memory model.

PowerPC has relatively weak ordering for lock and unlock, and also for
load-acquire and store-release.  However, last I checked, it had full
ordering for value-returning read-modify-write atomic operations.  As
you say, C11 could potentially produce weaker results.  However, it
should be possible to fix this where necessary.

>  (2) There are three sets of atomics (atomic_t, atomic64_t and
>      atomic_long_t).  I can either put each in a separate file all nicely
>      laid out (see patch 3) or I can make a template header (see patch 4)
>      and use #defines with it to make all three atomics from one piece of
>      code.  Which is best?  The first is definitely easier to read, but the
>      second might be easier to maintain.

The template-header approach is more compatible with the ftrace
macros.  ;-)

>  (3) I've added cmpxchg_return() and try_cmpxchg() to replace cmpxchg().
>      I've also provided atomicX_t variants of these.  These return the
>      boolean return value from the __atomic_compare_exchange_n() function
>      (which is carried in the Z flag on x86).  Should this be rolled out
>      even without the ISO atomic implementation?
> 
>  (4) The current x86_64 bitops (set_bit() and co.) are technically broken.
>      The set_bit() function, for example, takes a 64-bit signed bit number
>      but implements this with BTSL, presumably because it's a shorter
>      instruction.
> 
>      The BTS instruction's bit number operand, however, is sized according
>      to the memory operand, so the upper 32 bits of the bit number are
>      discarded by BTSL.  We should really be using BTSQ to implement this
>      correctly (and gcc does just that).  In practice, though, it would
>      seem unlikely that this will ever be a problem as I think we're
>      unlikely to have a bitset with more than ~2 billion bits in it within
>      the kernel (it would be >256MiB in size).
> 
>      Patch 9 casts the pointers internally in the bitops functions to
>      persuade the kernel to use 32-bit bitops - but this only really
>      matters on x86.  Should set_bit() and co. take int rather than long
>      bit number arguments to make the point?

No real opinion on #3 and #4.

> I've opened a number of gcc bugzillas to improve the code generated by the
> atomics:
> 
>  (*) https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49244
> 
>      __atomic_fetch_{and,or,xor}() don't generate locked BTR/BTS/BTC on x86
>      and __atomic_load() doesn't generate TEST or BT.  There is a patch for
>      this, but it leaves some cases not fully optimised.
> 
>  (*) https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70821
> 
>      __atomic_fetch_{add,sub}() generates XADD rather than DECL when
>      subtracting 1 on x86.  There is a patch for this.
> 
>  (*) https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70825
> 
>      __atomic_compare_exchange_n() accesses the stack unnecessarily,
>      leading to extraneous stores being added when everything could be done
>      entirely within registers (on x86, powerpc64, aarch64).
> 
>  (*) https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70973
> 
>      Can the __atomic*() ops on x86 generate a list of LOCK prefixes?
> 
>  (*) https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71153
> 
>      aarch64 __atomic_fetch_and() generates a double inversion for the
>      LDSET instructions.  There is a patch for this.
> 
>  (*) https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71162
> 
>      powerpc64 should probably emit BNE- not BNE to retry the STDCX.
> 
> 
> The patches can be found here also:
> 
> 	http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=iso-atomic
> 
> I have fixed up an x86_64 cross-compiler with the patches that I've been
> given for the above and have booted the resulting kernel.

Nice!!!

							Thanx, Paul

> David
> ---
> David Howells (15):
>       cmpxchg_local() is not signed-value safe, so fix generic atomics
>       tty: ldsem_cmpxchg() should use cmpxchg() not atomic_long_cmpxchg()
>       Provide atomic_t functions implemented with ISO-C++11 atomics
>       Convert 32-bit ISO atomics into a template
>       Provide atomic64_t and atomic_long_t using ISO atomics
>       Provide 16-bit ISO atomics
>       Provide cmpxchg(), xchg(), xadd() and __add() based on ISO C++11 intrinsics
>       Provide an implementation of bitops using C++11 atomics
>       Make the ISO bitops use 32-bit values internally
>       x86: Use ISO atomics
>       x86: Use ISO bitops
>       x86: Use ISO xchg(), cmpxchg() and friends
>       x86: Improve spinlocks using ISO C++11 intrinsic atomics
>       x86: Make the mutex implementation use ISO atomic ops
>       x86: Fix misc cmpxchg() and atomic_cmpxchg() calls to use try/return variants
> 
> 
>  arch/x86/events/amd/core.c                |    6 
>  arch/x86/events/amd/uncore.c              |    4 
>  arch/x86/include/asm/atomic.h             |  224 -----------
>  arch/x86/include/asm/bitops.h             |  143 -------
>  arch/x86/include/asm/cmpxchg.h            |   99 -----
>  arch/x86/include/asm/cmpxchg_32.h         |    3 
>  arch/x86/include/asm/cmpxchg_64.h         |    6 
>  arch/x86/include/asm/mutex.h              |    6 
>  arch/x86/include/asm/mutex_iso.h          |   73 ++++
>  arch/x86/include/asm/qspinlock.h          |    2 
>  arch/x86/include/asm/spinlock.h           |   18 +
>  arch/x86/kernel/acpi/boot.c               |   12 -
>  arch/x86/kernel/apic/apic.c               |    3 
>  arch/x86/kernel/cpu/mcheck/mce.c          |    7 
>  arch/x86/kernel/kvm.c                     |    5 
>  arch/x86/kernel/smp.c                     |    2 
>  arch/x86/kvm/mmu.c                        |    2 
>  arch/x86/kvm/paging_tmpl.h                |   11 -
>  arch/x86/kvm/vmx.c                        |   21 +
>  arch/x86/kvm/x86.c                        |   19 -
>  arch/x86/mm/pat.c                         |    2 
>  arch/x86/xen/p2m.c                        |    3 
>  arch/x86/xen/spinlock.c                   |    6 
>  drivers/tty/tty_ldsem.c                   |    2 
>  include/asm-generic/atomic.h              |   17 +
>  include/asm-generic/iso-atomic-long.h     |   32 ++
>  include/asm-generic/iso-atomic-template.h |  572 +++++++++++++++++++++++++++++
>  include/asm-generic/iso-atomic.h          |   28 +
>  include/asm-generic/iso-atomic16.h        |   27 +
>  include/asm-generic/iso-atomic64.h        |   28 +
>  include/asm-generic/iso-bitops.h          |  188 ++++++++++
>  include/asm-generic/iso-cmpxchg.h         |  180 +++++++++
>  include/linux/atomic.h                    |   26 +
>  33 files changed, 1225 insertions(+), 552 deletions(-)
>  create mode 100644 arch/x86/include/asm/mutex_iso.h
>  create mode 100644 include/asm-generic/iso-atomic-long.h
>  create mode 100644 include/asm-generic/iso-atomic-template.h
>  create mode 100644 include/asm-generic/iso-atomic.h
>  create mode 100644 include/asm-generic/iso-atomic16.h
>  create mode 100644 include/asm-generic/iso-atomic64.h
>  create mode 100644 include/asm-generic/iso-bitops.h
>  create mode 100644 include/asm-generic/iso-cmpxchg.h
> 

  parent reply	other threads:[~2016-05-19  0:23 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-05-18 15:10 [RFC PATCH 00/15] Provide atomics and bitops implemented with ISO C++11 atomics David Howells
2016-05-18 15:10 ` [RFC PATCH 01/15] cmpxchg_local() is not signed-value safe, so fix generic atomics David Howells
2016-05-18 15:29   ` Arnd Bergmann
2016-05-18 15:10 ` [RFC PATCH 02/15] tty: ldsem_cmpxchg() should use cmpxchg() not atomic_long_cmpxchg() David Howells
2016-05-18 15:10 ` [RFC PATCH 03/15] Provide atomic_t functions implemented with ISO-C++11 atomics David Howells
2016-05-18 17:31   ` Peter Zijlstra
2016-05-18 17:32   ` Peter Zijlstra
2016-05-19  7:36     ` David Woodhouse
2016-05-19  7:45       ` Peter Zijlstra
2016-05-19  9:52     ` David Howells
2016-05-19 10:50       ` Peter Zijlstra
2016-05-19 11:31         ` Peter Zijlstra
2016-05-19 11:33           ` Peter Zijlstra
2016-05-19 14:22         ` Paul E. McKenney
2016-05-19 14:41           ` Peter Zijlstra
2016-05-19 15:00             ` Paul E. McKenney
2016-05-20  9:32               ` Michael Ellerman
2016-05-23 18:39                 ` Paul E. McKenney
2016-06-01 14:16       ` Will Deacon
2016-05-18 17:33   ` Peter Zijlstra
2016-05-18 15:11 ` [RFC PATCH 04/15] Convert 32-bit ISO atomics into a template David Howells
2016-05-18 15:11 ` [RFC PATCH 05/15] Provide atomic64_t and atomic_long_t using ISO atomics David Howells
2016-05-18 15:11 ` [RFC PATCH 06/15] Provide 16-bit " David Howells
2016-05-18 17:28   ` Peter Zijlstra
2016-05-18 15:11 ` [RFC PATCH 07/15] Provide cmpxchg(), xchg(), xadd() and __add() based on ISO C++11 intrinsics David Howells
2016-05-18 15:11 ` [RFC PATCH 08/15] Provide an implementation of bitops using C++11 atomics David Howells
2016-05-18 15:11 ` [RFC PATCH 09/15] Make the ISO bitops use 32-bit values internally David Howells
2016-05-18 15:11 ` [RFC PATCH 10/15] x86: Use ISO atomics David Howells
2016-05-18 15:12 ` [RFC PATCH 11/15] x86: Use ISO bitops David Howells
2016-05-18 15:12 ` [RFC PATCH 12/15] x86: Use ISO xchg(), cmpxchg() and friends David Howells
2016-05-18 15:12 ` [RFC PATCH 13/15] x86: Improve spinlocks using ISO C++11 intrinsic atomics David Howells
2016-05-18 17:37   ` Peter Zijlstra
2016-05-18 15:12 ` [RFC PATCH 14/15] x86: Make the mutex implementation use ISO atomic ops David Howells
2016-05-18 15:12 ` [RFC PATCH 15/15] x86: Fix misc cmpxchg() and atomic_cmpxchg() calls to use try/return variants David Howells
2016-05-18 17:22 ` [RFC PATCH 00/15] Provide atomics and bitops implemented with ISO C++11 atomics Peter Zijlstra
2016-05-18 17:45 ` Peter Zijlstra
2016-05-18 18:05 ` Peter Zijlstra
2016-05-19  0:23 ` Paul E. McKenney [this message]
2016-06-01 14:45 ` Will Deacon
2016-06-08 20:01   ` Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160519002344.GF3528@linux.vnet.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=dhowells@redhat.com \
    --cc=dwmw2@infradead.org \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=ramana.radhakrishnan@arm.com \
    --cc=will.deacon@arm.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.