From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: David Howells <dhowells@redhat.com>
Cc: linux-arch@vger.kernel.org, x86@kernel.org, will.deacon@arm.com,
linux-kernel@vger.kernel.org, ramana.radhakrishnan@arm.com,
dwmw2@infradead.org
Subject: Re: [RFC PATCH 00/15] Provide atomics and bitops implemented with ISO C++11 atomics
Date: Wed, 18 May 2016 17:23:44 -0700 [thread overview]
Message-ID: <20160519002344.GF3528@linux.vnet.ibm.com> (raw)
In-Reply-To: <146358423711.8596.9104061348359986393.stgit@warthog.procyon.org.uk>
On Wed, May 18, 2016 at 04:10:37PM +0100, David Howells wrote:
>
> Here's a set of patches to provide kernel atomics and bitops implemented
> with ISO C++11 atomic intrinsics. The second part of the set makes the x86
> arch use the implementation.
>
> Note that the x86 patches are very rough. It would need to be made
> compile-time conditional in some way and the old code can't really be
> thrown away that easily - but it's a good way to make sure I'm not using
> that code any more.
Good to see this!!!
[ . . . ]
> There are some disadvantages also:
>
> (1) It's not available in gcc before gcc-4.7 and there will be some
> seriously suboptimal code production before gcc-7.1.
Probably need to keep the old stuff around for quite some time.
But it would be good to test the C11 stuff out in any case.
> (2) The compiler might misoptimise - for example, it might generate a
> CMPXCHG loop rather than a BTR instruction to clear a bit.
Which is one reason it would be very good to test it out. ;-)
> (3) The C++11 memory model permits atomic instructions to be merged if
> appropriate - for example, two adjacent __atomic_read() calls might
> get merged if the return value of the first isn't examined. Making
> the pointers volatile alleviates this. Note that gcc doesn't do this
> yet.
I could imagine a few situations where merging would be a good thing. But
I can also imagine a great number where it would be very bad.
> (4) The C++11 memory barriers are, in general, weaker than the kernel's
> memory barriers are defined to be. Whether this actually matters is
> arch dependent. Further, the C++11 memory barriers are
> acquire/release, but some arches actually have load/store instead -
> which may be a problem.
C++11 does have memory_order_seq_cst barriers via atomic_thread_fence(),
and on many architectures they are strong enough for the Linux kernel.
However, the standard does not guarantee this. (But it is difficult
on x86, ARM, and PowerPC to produce C++11 semantics without also producing
Linux-kernel semantics.)
C++11 has acquire/release barriers as well as acquire loads and release
stores. Some of this will need case-by-case analysis. Here is a (dated)
attempt:
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0124r1.html
> (5) On x86, the compiler doesn't tell you where the LOCK prefixes are so
> they cannot be suppressed if only one CPU is online.
>
>
> Things to be considered:
>
> (1) We could weaken the kernel memory model to for the benefit of arches
> that have instructions that employ explicit acquire/release barriers -
> but that may cause data races to occur based on assumptions we've
> already made. Note, however, that powerpc already seems to have a
> weaker memory model.
PowerPC has relatively weak ordering for lock and unlock, and also for
load-acquire and store-release. However, last I checked, it had full
ordering for value-returning read-modify-write atomic operations. As
you say, C11 could potentially produce weaker results. However, it
should be possible to fix this where necessary.
> (2) There are three sets of atomics (atomic_t, atomic64_t and
> atomic_long_t). I can either put each in a separate file all nicely
> laid out (see patch 3) or I can make a template header (see patch 4)
> and use #defines with it to make all three atomics from one piece of
> code. Which is best? The first is definitely easier to read, but the
> second might be easier to maintain.
The template-header approach is more compatible with the ftrace
macros. ;-)
> (3) I've added cmpxchg_return() and try_cmpxchg() to replace cmpxchg().
> I've also provided atomicX_t variants of these. These return the
> boolean return value from the __atomic_compare_exchange_n() function
> (which is carried in the Z flag on x86). Should this be rolled out
> even without the ISO atomic implementation?
>
> (4) The current x86_64 bitops (set_bit() and co.) are technically broken.
> The set_bit() function, for example, takes a 64-bit signed bit number
> but implements this with BTSL, presumably because it's a shorter
> instruction.
>
> The BTS instruction's bit number operand, however, is sized according
> to the memory operand, so the upper 32 bits of the bit number are
> discarded by BTSL. We should really be using BTSQ to implement this
> correctly (and gcc does just that). In practice, though, it would
> seem unlikely that this will ever be a problem as I think we're
> unlikely to have a bitset with more than ~2 billion bits in it within
> the kernel (it would be >256MiB in size).
>
> Patch 9 casts the pointers internally in the bitops functions to
> persuade the kernel to use 32-bit bitops - but this only really
> matters on x86. Should set_bit() and co. take int rather than long
> bit number arguments to make the point?
No real opinion on #3 and #4.
> I've opened a number of gcc bugzillas to improve the code generated by the
> atomics:
>
> (*) https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49244
>
> __atomic_fetch_{and,or,xor}() don't generate locked BTR/BTS/BTC on x86
> and __atomic_load() doesn't generate TEST or BT. There is a patch for
> this, but it leaves some cases not fully optimised.
>
> (*) https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70821
>
> __atomic_fetch_{add,sub}() generates XADD rather than DECL when
> subtracting 1 on x86. There is a patch for this.
>
> (*) https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70825
>
> __atomic_compare_exchange_n() accesses the stack unnecessarily,
> leading to extraneous stores being added when everything could be done
> entirely within registers (on x86, powerpc64, aarch64).
>
> (*) https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70973
>
> Can the __atomic*() ops on x86 generate a list of LOCK prefixes?
>
> (*) https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71153
>
> aarch64 __atomic_fetch_and() generates a double inversion for the
> LDSET instructions. There is a patch for this.
>
> (*) https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71162
>
> powerpc64 should probably emit BNE- not BNE to retry the STDCX.
>
>
> The patches can be found here also:
>
> http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=iso-atomic
>
> I have fixed up an x86_64 cross-compiler with the patches that I've been
> given for the above and have booted the resulting kernel.
Nice!!!
Thanx, Paul
> David
> ---
> David Howells (15):
> cmpxchg_local() is not signed-value safe, so fix generic atomics
> tty: ldsem_cmpxchg() should use cmpxchg() not atomic_long_cmpxchg()
> Provide atomic_t functions implemented with ISO-C++11 atomics
> Convert 32-bit ISO atomics into a template
> Provide atomic64_t and atomic_long_t using ISO atomics
> Provide 16-bit ISO atomics
> Provide cmpxchg(), xchg(), xadd() and __add() based on ISO C++11 intrinsics
> Provide an implementation of bitops using C++11 atomics
> Make the ISO bitops use 32-bit values internally
> x86: Use ISO atomics
> x86: Use ISO bitops
> x86: Use ISO xchg(), cmpxchg() and friends
> x86: Improve spinlocks using ISO C++11 intrinsic atomics
> x86: Make the mutex implementation use ISO atomic ops
> x86: Fix misc cmpxchg() and atomic_cmpxchg() calls to use try/return variants
>
>
> arch/x86/events/amd/core.c | 6
> arch/x86/events/amd/uncore.c | 4
> arch/x86/include/asm/atomic.h | 224 -----------
> arch/x86/include/asm/bitops.h | 143 -------
> arch/x86/include/asm/cmpxchg.h | 99 -----
> arch/x86/include/asm/cmpxchg_32.h | 3
> arch/x86/include/asm/cmpxchg_64.h | 6
> arch/x86/include/asm/mutex.h | 6
> arch/x86/include/asm/mutex_iso.h | 73 ++++
> arch/x86/include/asm/qspinlock.h | 2
> arch/x86/include/asm/spinlock.h | 18 +
> arch/x86/kernel/acpi/boot.c | 12 -
> arch/x86/kernel/apic/apic.c | 3
> arch/x86/kernel/cpu/mcheck/mce.c | 7
> arch/x86/kernel/kvm.c | 5
> arch/x86/kernel/smp.c | 2
> arch/x86/kvm/mmu.c | 2
> arch/x86/kvm/paging_tmpl.h | 11 -
> arch/x86/kvm/vmx.c | 21 +
> arch/x86/kvm/x86.c | 19 -
> arch/x86/mm/pat.c | 2
> arch/x86/xen/p2m.c | 3
> arch/x86/xen/spinlock.c | 6
> drivers/tty/tty_ldsem.c | 2
> include/asm-generic/atomic.h | 17 +
> include/asm-generic/iso-atomic-long.h | 32 ++
> include/asm-generic/iso-atomic-template.h | 572 +++++++++++++++++++++++++++++
> include/asm-generic/iso-atomic.h | 28 +
> include/asm-generic/iso-atomic16.h | 27 +
> include/asm-generic/iso-atomic64.h | 28 +
> include/asm-generic/iso-bitops.h | 188 ++++++++++
> include/asm-generic/iso-cmpxchg.h | 180 +++++++++
> include/linux/atomic.h | 26 +
> 33 files changed, 1225 insertions(+), 552 deletions(-)
> create mode 100644 arch/x86/include/asm/mutex_iso.h
> create mode 100644 include/asm-generic/iso-atomic-long.h
> create mode 100644 include/asm-generic/iso-atomic-template.h
> create mode 100644 include/asm-generic/iso-atomic.h
> create mode 100644 include/asm-generic/iso-atomic16.h
> create mode 100644 include/asm-generic/iso-atomic64.h
> create mode 100644 include/asm-generic/iso-bitops.h
> create mode 100644 include/asm-generic/iso-cmpxchg.h
>
next prev parent reply other threads:[~2016-05-19 0:23 UTC|newest]
Thread overview: 57+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-05-18 15:10 [RFC PATCH 00/15] Provide atomics and bitops implemented with ISO C++11 atomics David Howells
2016-05-18 15:10 ` [RFC PATCH 01/15] cmpxchg_local() is not signed-value safe, so fix generic atomics David Howells
2016-05-18 15:10 ` David Howells
2016-05-18 15:29 ` Arnd Bergmann
2016-05-18 15:10 ` [RFC PATCH 02/15] tty: ldsem_cmpxchg() should use cmpxchg() not atomic_long_cmpxchg() David Howells
2016-05-18 15:10 ` [RFC PATCH 03/15] Provide atomic_t functions implemented with ISO-C++11 atomics David Howells
2016-05-18 15:10 ` David Howells
2016-05-18 17:31 ` Peter Zijlstra
2016-05-18 17:32 ` Peter Zijlstra
2016-05-18 17:32 ` Peter Zijlstra
2016-05-19 7:36 ` David Woodhouse
2016-05-19 7:45 ` Peter Zijlstra
2016-05-18 17:33 ` Peter Zijlstra
2016-05-19 9:52 ` David Howells
2016-05-19 10:50 ` Peter Zijlstra
2016-05-19 11:31 ` Peter Zijlstra
2016-05-19 11:33 ` Peter Zijlstra
2016-05-19 14:22 ` Paul E. McKenney
2016-05-19 14:41 ` Peter Zijlstra
2016-05-19 15:00 ` Paul E. McKenney
2016-05-20 9:32 ` Michael Ellerman
2016-05-20 9:32 ` Michael Ellerman
2016-05-23 18:39 ` Paul E. McKenney
2016-06-01 14:16 ` Will Deacon
2016-06-01 14:16 ` Will Deacon
2016-05-18 15:11 ` [RFC PATCH 04/15] Convert 32-bit ISO atomics into a template David Howells
2016-05-18 15:11 ` David Howells
2016-05-18 15:11 ` [RFC PATCH 05/15] Provide atomic64_t and atomic_long_t using ISO atomics David Howells
2016-05-18 15:11 ` David Howells
2016-05-18 15:11 ` [RFC PATCH 06/15] Provide 16-bit " David Howells
2016-05-18 15:11 ` David Howells
2016-05-18 17:28 ` Peter Zijlstra
2016-05-18 15:11 ` [RFC PATCH 07/15] Provide cmpxchg(), xchg(), xadd() and __add() based on ISO C++11 intrinsics David Howells
2016-05-18 15:11 ` David Howells
2016-05-18 15:11 ` [RFC PATCH 08/15] Provide an implementation of bitops using C++11 atomics David Howells
2016-05-18 15:11 ` David Howells
2016-05-18 15:11 ` [RFC PATCH 09/15] Make the ISO bitops use 32-bit values internally David Howells
2016-05-18 15:11 ` David Howells
2016-05-18 15:11 ` [RFC PATCH 10/15] x86: Use ISO atomics David Howells
2016-05-18 15:12 ` [RFC PATCH 11/15] x86: Use ISO bitops David Howells
2016-05-18 15:12 ` David Howells
2016-05-18 15:12 ` [RFC PATCH 12/15] x86: Use ISO xchg(), cmpxchg() and friends David Howells
2016-05-18 15:12 ` David Howells
2016-05-18 15:12 ` [RFC PATCH 13/15] x86: Improve spinlocks using ISO C++11 intrinsic atomics David Howells
2016-05-18 15:12 ` David Howells
2016-05-18 17:37 ` Peter Zijlstra
2016-05-18 15:12 ` [RFC PATCH 14/15] x86: Make the mutex implementation use ISO atomic ops David Howells
2016-05-18 15:12 ` David Howells
2016-05-18 15:12 ` [RFC PATCH 15/15] x86: Fix misc cmpxchg() and atomic_cmpxchg() calls to use try/return variants David Howells
2016-05-18 15:12 ` David Howells
2016-05-18 17:22 ` [RFC PATCH 00/15] Provide atomics and bitops implemented with ISO C++11 atomics Peter Zijlstra
2016-05-18 17:45 ` Peter Zijlstra
2016-05-18 18:05 ` Peter Zijlstra
2016-05-19 0:23 ` Paul E. McKenney [this message]
2016-06-01 14:45 ` Will Deacon
2016-06-01 14:45 ` Will Deacon
2016-06-08 20:01 ` Paul E. McKenney
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160519002344.GF3528@linux.vnet.ibm.com \
--to=paulmck@linux.vnet.ibm.com \
--cc=dhowells@redhat.com \
--cc=dwmw2@infradead.org \
--cc=linux-arch@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=ramana.radhakrishnan@arm.com \
--cc=will.deacon@arm.com \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).