Re: [RFC][PATCH]: documentation,atomic: Add a new atomic_t document

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Will Deacon <will.deacon@arm.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Boqun Feng <boqun.feng@gmail.com>,
	Paul McKenney <paulmck@linux.vnet.ibm.com>,
	linux-kernel@vger.kernel.org, Ingo Molnar <mingo@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [RFC][PATCH]: documentation,atomic: Add a new atomic_t document
Date: Wed, 14 Jun 2017 13:33:59 +0100	[thread overview]
Message-ID: <20170614123359.GO16190@arm.com> (raw)
In-Reply-To: <20170612144929.3wiwtbqopsfpm3qk@hirez.programming.kicks-ass.net>

On Mon, Jun 12, 2017 at 04:49:29PM +0200, Peter Zijlstra wrote:
> On Sun, Jun 11, 2017 at 09:56:32PM +0800, Boqun Feng wrote:
> 
> > I think the term we use to refer this behavior is "fully-ordered"?
> 
> Right, that is what we used to call it, and the term even occurs in
> memory-barriers.txt but isn't actually defined therein.
> 
> > Could we give it a slight formal definition like:
> > 
> > a.	memory operations preceding and following the RmW operation is
> > 	Sequentially Consistent.
> > 
> > b.	load or store part of the RmW operation is Sequentially
> > 	Consistent with operations preceding or following.
> > 
> > Though, sounds like defining "fully-ordered" is the job for
> > memory-barriers.txt, but it's never done ;-)
> 
> Right, so while memory-barriers.txt uses the term 'fully ordered' it
> doesn't appear to mean the same thing we need here.
> 
> Still, lacking anything better, I did the below. Note that I also
> removed much of the atomic stuff from memory-barrier.txt in order to
> avoid duplication and confusion (it too was severely stale).

A few more comments inline...

> +The one detail to this is that atomic_set() should be observable to the RmW
> +ops. That is:

I'm afraid this one is still confusing me :)

> +  PRE:
> +  atomic_set(v, 1);
> +
> +  CPU0						CPU1
> +  atomic_add_unless(v, 1, 0)			atomic_set(v, 0);
> +
> +  POST:
> +  BUG_ON(v->counter == 2);
> +
> +
> +In this case we would expect the atomic_set() from CPU1 to either happen
> +before the atomic_add_unless(), in which case that latter one would no-op, or
> +_after_ in which case we'd overwrite its result. In no case is "2" a valid
> +outcome.

What do you mean by PRE and POST? Are they running on CPU0, or someplace
else (with barriers)? It sounds like you want to rule out:

  CPU1
  PRE
  CPU0
  POST

but it's tough to say whether or not that's actually forbidden.

> +This is typically true on 'normal' platforms, where a regular competing STORE
> +will invalidate a LL/SC or fail a CMPXCHG.
> +
> +The obvious case where this is not so is when we need to implement atomic ops
> +with a lock:
> +
> +
> +  CPU0
> +
> +  atomic_add_unless(v, 1, 0);
> +    lock();
> +    ret = READ_ONCE(v->counter); // == 1
> +						atomic_set(v, 0);
> +    if (ret != u)				  WRITE_ONCE(v->counter, 0);
> +      WRITE_ONCE(v->counter, ret + 1);
> +    unlock();
> +
> +
> +the typical solution is to then implement atomic_set() with atomic_xchg().
> +
> +
> +RmW ops:
> +
> +These come in various forms:
> +
> + - plain operations without return value: atomic_{}()
> +
> + - operations which return the modified value: atomic_{}_return()
> +
> +   these are limited to the arithmetic operations because those are
> +   reversible. Bitops are irreversible and therefore the modified value
> +   is of dubious utility.
> +
> + - operations which return the original value: atomic_fetch_{}()
> +
> + - swap operations: xchg(), cmpxchg() and try_cmpxchg()
> +
> + - misc; the special purpose operations that are commonly used and would,
> +   given the interface, normally be implemented using (try_)cmpxchg loops but
> +   are time critical and can, (typically) on LL/SC architectures, be more
> +   efficiently implemented.
> +
> +
> +All these operations are SMP atomic; that is, the operations (for a single
> +atomic variable) can be fully ordered and no intermediate state is lost or
> +visible.
> +
> +
> +Ordering:  (go read memory-barriers.txt first)
> +
> +The rule of thumb:
> +
> + - non-RmW operations are unordered;
> +
> + - RmW operations that have no return value are unordered;
> +
> + - RmW operations that have a return value are fully ordered;
> +
> + - RmW operations that are conditional are unordered on FAILURE, otherwise the
> +   above rules apply.
> +
> +Except of course when an operation has an explicit ordering like:
> +
> + {}_relaxed: unordered
> + {}_acquire: the R of the RmW (or atomic_read) is an ACQUIRE
> + {}_release: the W of the RmW (or atomic_set)  is a  RELEASE
> +
> +
> +Fully ordered primitives are ordered against everything prior and everything
> +subsequenct. They also imply transitivity. Therefore a fully ordered primitive

subsequent

> +is like having an smp_mb() before and an smp_mb() after the primitive.

Actually, perhaps that's the best way to explain this: just say that
fully-ordered primitives behave as is they have an smp_mb() before and an
smp_mb() after. Defer the transitivity to memory_barriers.txt (espec. since
it makes it sounds like acquire/release have no transitivity at all).

> +
> +
> +The barriers:
> +
> +  smp_mb__{before,after}_atomic()
> +
> +only apply to the RmW ops and can be used to augment/upgrade the ordering
> +inherit to the used atomic op. These barriers provide a full smp_mb().
> +
> +These helper barriers exist because architectures have varying implicit
> +ordering on their SMP atomic primitives. For example our TSO architectures
> +provide full ordered atomics and these barriers are no-ops.
> +
> +Thus:
> +
> +  atomic_fetch_add();
> +
> +is equivalent to:
> +
> +  smp_mb__before_atomic();
> +  atomic_fetch_add_relaxed();
> +  smp_mb__after_atomic();
> +
> +
> +Further, while something like:
> +
> +  smp_mb__before_atomic();
> +  atomic_dec(&X);
> +
> +is a 'typical' RELEASE pattern, the barrier is strictly stronger than
> +a RELEASE.

There's also an ACQUIRE analogue here, and I think you can interwork the
{_acquire,_release} variants with the smp_mb__{before,after}_atomic
variants. On ARM64 the former will be stronger (RCsc), but the kernel memory
model doesn't distinguish. Agreed?

Will

next prev parent reply	other threads:[~2017-06-14 12:33 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-06-09  9:24 [RFC][PATCH]: documentation,atomic: Add a new atomic_t document Peter Zijlstra
2017-06-09 11:05 ` [RFC][PATCH] atomic: Fix atomic_set_release() for 'funny' architectures Peter Zijlstra
2017-06-09 11:13   ` Peter Zijlstra
2017-06-09 17:28     ` Vineet Gupta
2017-06-09 17:28       ` Vineet Gupta
2017-06-09 18:49       ` Peter Zijlstra
2017-06-09 18:49         ` Peter Zijlstra
2017-06-09 18:58     ` James Bottomley
2017-06-09 14:03   ` Chris Metcalf
2017-08-10 12:10   ` [tip:locking/core] locking/atomic: " tip-bot for Peter Zijlstra
2017-06-09 15:44 ` [RFC][PATCH]: documentation,atomic: Add a new atomic_t document Will Deacon
2017-06-09 19:36   ` Peter Zijlstra
2017-06-11 13:56     ` Boqun Feng
2017-06-12 14:49       ` Peter Zijlstra
2017-06-13  6:39         ` Boqun Feng
2017-06-14 12:33         ` Will Deacon [this message]
2017-07-12 12:53         ` Boqun Feng
2017-07-12 13:08           ` Peter Zijlstra
2017-07-12 19:13             ` Paul E. McKenney
2017-07-26 11:53         ` [RFC][PATCH v3]: documentation,atomic: Add new documents Peter Zijlstra
2017-07-26 12:47           ` Boqun Feng
2017-07-31  9:05             ` Peter Zijlstra
2017-07-31 11:04               ` Boqun Feng
2017-07-31 17:43                 ` Paul E. McKenney
2017-08-01  2:14                   ` Boqun Feng
2017-08-01  9:01                   ` Peter Zijlstra
2017-08-01 10:19                     ` Will Deacon
2017-08-01 11:47                       ` Peter Zijlstra
2017-08-01 12:17                         ` Will Deacon
2017-08-01 12:52                           ` Peter Zijlstra
2017-08-01 16:14                           ` Paul E. McKenney
2017-08-01 16:42                             ` Peter Zijlstra
2017-08-01 16:53                               ` Will Deacon
2017-08-01 22:18                               ` Paul E. McKenney
2017-08-02  8:46                                 ` Will Deacon
2017-08-01 18:37                             ` Paul E. McKenney
2017-08-02  9:45                             ` Will Deacon
2017-08-02 16:17                               ` Paul E. McKenney
2017-08-03 14:05                               ` Boqun Feng
2017-08-03 14:55                                 ` Paul E. McKenney
2017-08-03 16:12                                   ` Will Deacon
2017-08-03 16:58                                     ` Paul E. McKenney
2017-08-01 13:35                     ` Paul E. McKenney
2017-07-26 16:28           ` Randy Dunlap
2017-06-09 18:15 ` [RFC][PATCH]: documentation,atomic: Add a new atomic_t document Randy Dunlap

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170614123359.GO16190@arm.com \
    --to=will.deacon@arm.com \
    --cc=boqun.feng@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.