All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: paulmck <paulmck@kernel.org>
Cc: Segher Boessenkool <segher@kernel.crashing.org>,
	Will Deacon <will@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Alan Stern <stern@rowland.harvard.edu>,
	Andrea Parri <parri.andrea@gmail.com>,
	Boqun Feng <boqun.feng@gmail.com>,
	Nicholas Piggin <npiggin@gmail.com>,
	David Howells <dhowells@redhat.com>,
	j alglave <j.alglave@ucl.ac.uk>,
	luc maranget <luc.maranget@inria.fr>, akiyks <akiyks@gmail.com>,
	linux-toolchains <linux-toolchains@vger.kernel.org>,
	linux-arch <linux-arch@vger.kernel.org>
Subject: Re: [RFC PATCH] LKMM: Add ctrl_dep() macro for control dependency
Date: Fri, 1 Oct 2021 11:28:58 -0400 (EDT)	[thread overview]
Message-ID: <971151132.46974.1633102138685.JavaMail.zimbra@efficios.com> (raw)
In-Reply-To: <20210929235700.GF880162@paulmck-ThinkPad-P17-Gen-1>

----- On Sep 29, 2021, at 7:57 PM, paulmck paulmck@kernel.org wrote:

> On Wed, Sep 29, 2021 at 04:47:03PM -0500, Segher Boessenkool wrote:
>> Hi!
>> 
>> On Tue, Sep 28, 2021 at 05:15:07PM -0400, Mathieu Desnoyers wrote:
>> > C99 describes that accessing volatile objects are side-effects, and that
>> > "at certain specified points in the execution sequence called sequence
>> > points, all side effects of previous evaluations shall be complete
>> > and no side effects of subsequent evaluations shall have taken
>> > place". [2]
>> 
>> But note that the kernel explicitly uses C89 (with GNU extensions).
>> Side effects are largely equal there though.
>> 
>> Also note that there may no place in the generated machine code that
>> corresponds exactly to some sequence point.  Sequence points are a
>> concept that applies to the source program and how that executes on the
>> abstract machine.
> 
> Plus the "as if" rule rears its ugly head in many of these situations.
> 
>> > +Because ctrl_dep emits distinct asm volatile within each leg of the if
>> > +statement, the compiler cannot transform the two writes to 'b' into a
>> > +conditional-move (cmov) instruction, thus ensuring the presence of a
>> > +conditional branch.  Also because the ctrl_dep emits asm volatile within
>> > +each leg of the if statement, the compiler cannot move the write to 'c'
>> > +before the conditional branch.
>> 
>> I think your reasoning here misses some things.  So many that I don't
>> know where to start to list them, every "because" and "thus" here does
>> not follow, and even the statements of fact are not a given.
>> 
>> Why do you want a conditional branch insn at all, anyway?  You really
>> want something else as far as I can see.
> 
> Because at the assembly language level on some architectures, a
> conditional branch instruction provides weak but very real and very
> useful memory-ordering properties.  Such a branch orders all loads
> whose return values feed into the branch condition before any stores
> that execute after the branch does (regardless of whether or not the
> branch was taken).  And this is all the ordering that is required for
> the use cases that Mathieu is worried about.
> 
> Yes, you can use explicit memory-barrier or acquire-load instructions,
> but those incur more overhead on some types of hardware.  The code in
> question is on a hotpath and is thus performance-critical.
> 
> It would be nice to be able to somehow tell the compiler exactly
> what the ordering constraints are ("this particular load must be
> ordered before these particular stores") and then let it (1) figure
> out that a conditional branch will do the trick and (2) generate the
> code accordingly.  But last I checked, this was not going to happen any
> time soon.  So for the time being, we have to live within the current
> capability of the tools that are available to us.
> 
> Linus points out that in all the actual control-dependent code in
> the Linux kernel, the compiler is going to be hard-pressed to fail
> to emit the required branch.  (Or in the case of ARMv8, the required
> conditional-move instruction.)
> 
> Mathieu, for his part, recently read the relevant portions of
> memory-barriers.txt (reproduced below) and would like to simplify these
> coding guidlines, which, speaking as the author of those guidelines,
> would be an extremely good thing.  His patches are attempting to move
> us in that direction.
> 
> Alternatives include: (1) Using acquire loads or memory barriers
> and accepting the loss in performance, but giving the compiler much
> less leeway, (2) Ripping all of the two-legged "if" examples from
> memory-barriers.txt and restricting control dependencies to else-less
> "if" statements, again giving the compiler less leeway, and (3) Your
> ideas here.
> 
> Does that help, or am I missing your point?

Thanks Paul, it does help explaining the motivation for relying on
control dependencies for some fast-path memory ordering in the kernel.

And yes, my main goal is to simplify the coding guide lines, but I have
not found any example of bad generated code in the tree kernel at this
point. In some cases (e.g. uses of smp_acquire__after_ctrl_dep()) it's
mainly thanks to luck though.

There is another alternative we could list here: implement ctrl_dep_true(),
ctrl_dep_false() and ctrl_dep(), which would respectively ensure a
control dependency on the then leg, on the else leg, or on both legs of
a conditional expression evaluation.

> 
>> It is essential here that there is a READ_ONCE and the WRITE_ONCE.
>> Those things might make it work the way you want, but as Linus says this
>> is all way too subtle.  Can you include the *_ONCE into the primitive
>> itself somehow?
> 
> Actually, if the store is not involved in a data race, the WRITE_ONCE()
> is not needed.  And in that case, the compiler is much less able to
> fail to provide the needed ordering.  (No, the current documentation
> does not reflect this.)  But if there is a data race, then your point
> is right on the mark -- that WRITE_ONCE() cannot be safely omitted.
> 
> But you are absolutely right that the READ_ONCE() or equivalent is not
> at all optional.  An example of an acceptable equivalent is an atomic
> read-modify-write operation such as atomic_xchg_relaxed().
> 
> The question about whether the READ_ONCE() and WRITE_ONCE() can be
> incorporated into the macro I leave to Mathieu.  I can certainly see
> serious benefits from this approach, at least from a compiler viewpoint.
> I must reserve judgment on usability until I see a proposal.

[...]

After having audited thoroughly all obviously documented control dependencies
in the kernel tree, I'm not sure that including the READ_ONCE() and WRITE_ONCE()
with the ctrl_dep() macro is a good idea, because in some cases there is
calculation to be done on the result of the READ_ONCE() (e.g. through
a static inline) before handing it over to the conditional expression.
In other cases many stores are being done after the control dependency, e.g.:

kernel/events/ring_buffer.c:__perf_output_begin()

        do {
                tail = READ_ONCE(rb->user_page->data_tail);
                offset = head = local_read(&rb->head);
                if (!rb->overwrite) {
                        if (unlikely(!ring_buffer_has_space(head, tail,
                                                            perf_data_size(rb),
                                                            size, backward)))
                                goto fail;
                }

                /*
                 * The above forms a control dependency barrier separating the
                 * @tail load above from the data stores below. Since the @tail
                 * load is required to compute the branch to fail below.
                 *
                 * A, matches D; the full memory barrier userspace SHOULD issue
                 * after reading the data and before storing the new tail
                 * position.
                 *
                 * See perf_output_put_handle().
                 */

                if (!backward)
                        head += size;
                else
                        head -= size;
        } while (local_cmpxchg(&rb->head, offset, head) != offset);

        if (backward) {
                offset = head;
                head = (u64)(-head);
        }

        /*
         * We rely on the implied barrier() by local_cmpxchg() to ensure
         * none of the data stores below can be lifted up by the compiler.
         */

[...]

Note that I suspect that this control dependency documentation could be
improved to state that the first control dependency is with the following
local_cmpxchg store, which itself has a control dependency (when it evaluates
to false) with the following stores to the ring buffer. Those are not volatile
stores, but the "memory" clobber with the local_cmpxchg should ensure that
following stores are after the local_cmpxchg in program order.

One other thing we could do to improve things slightly would be to turn
smp_acquire__after_ctrl_dep() into something which really is an acquire
in all cases, which may not currently be true if the compiler finds a
matching barrier()/smp_rmb() in the other leg after the conditional
expression.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

  reply	other threads:[~2021-10-01 15:29 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-28 21:15 [RFC PATCH] LKMM: Add ctrl_dep() macro for control dependency Mathieu Desnoyers
2021-09-29 12:06 ` Marco Elver
2021-10-01 15:45   ` Mathieu Desnoyers
2021-10-01 16:20     ` Linus Torvalds
2021-10-01 17:28       ` Mathieu Desnoyers
2021-10-01 18:18         ` Linus Torvalds
2021-09-29 12:28 ` Florian Weimer
2021-09-29 17:41   ` Segher Boessenkool
2021-09-29 19:46     ` Florian Weimer
2021-10-01 16:13     ` Mathieu Desnoyers
2021-10-01 16:26       ` Florian Weimer
2021-10-01 16:35         ` Linus Torvalds
2021-10-10 14:02           ` Florian Weimer
2021-10-14  0:01             ` Paul E. McKenney
2021-10-14  2:14               ` Alan Stern
2021-10-14 16:14                 ` Paul E. McKenney
2021-10-14 15:58               ` Florian Weimer
2021-10-14 16:23                 ` Paul E. McKenney
2021-10-14 18:19                   ` Florian Weimer
2021-10-14 21:09                     ` Paul E. McKenney
2021-10-14 22:36                       ` Linus Torvalds
2021-09-30 13:28   ` Mathieu Desnoyers
2021-09-29 14:47 ` Linus Torvalds
2021-09-29 14:54   ` Linus Torvalds
2021-09-29 19:50     ` Mathieu Desnoyers
2021-09-29 20:13       ` Mathieu Desnoyers
2021-09-29 19:27   ` Mathieu Desnoyers
2021-09-29 22:14     ` Linus Torvalds
2021-09-29 21:47 ` Segher Boessenkool
2021-09-29 23:57   ` Paul E. McKenney
2021-10-01 15:28     ` Mathieu Desnoyers [this message]
2021-10-01 22:53       ` Paul E. McKenney
2021-10-01 19:10     ` Segher Boessenkool
2021-10-01 22:50       ` Paul E. McKenney
2021-10-02 14:29       ` Alan Stern

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=971151132.46974.1633102138685.JavaMail.zimbra@efficios.com \
    --to=mathieu.desnoyers@efficios.com \
    --cc=akiyks@gmail.com \
    --cc=boqun.feng@gmail.com \
    --cc=dhowells@redhat.com \
    --cc=j.alglave@ucl.ac.uk \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-toolchains@vger.kernel.org \
    --cc=luc.maranget@inria.fr \
    --cc=npiggin@gmail.com \
    --cc=parri.andrea@gmail.com \
    --cc=paulmck@kernel.org \
    --cc=peterz@infradead.org \
    --cc=segher@kernel.crashing.org \
    --cc=stern@rowland.harvard.edu \
    --cc=torvalds@linux-foundation.org \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.