linux-toolchains.vger.kernel.org archive mirror
* A few proposals, this time from the C++ standards committee
@ 2024-03-17  9:14 Paul E. McKenney
  2024-03-17 18:50 ` Linus Torvalds
                   ` (3 more replies)
  0 siblings, 4 replies; 24+ messages in thread
From: Paul E. McKenney @ 2024-03-17  9:14 UTC (permalink / raw)
  To: linux-toolchains; +Cc: peterz, hpa, rostedt, gregkh, keescook, torvalds

Hello!

Another language, another standards-committee meeting, another set of
potentially relevant papers.  ;-)

Thoughts?

							Thanx, Paul

------------------------------------------------------------------------

P2414R2 — Pointer lifetime-end zap proposed solutions
	https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/p2414r2.pdf

	Yet another run at making it easier to express some concurrent
	algorithms dating back to the 1970s.  There has been some
	movement on making the CAS old-value assignment recompute pointer
	provenance, and the most controversial operation turns out to be
	implementable with Linux-kernel barrier().  I have no idea how
	things will go with the notion that atomic pointers should not be
	subject to lifetime-end pointer zap.  It should be interesting...
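
	As a rough illustration of the kind of algorithm at stake, here
	is a minimal sketch of a CAS-based LIFO push in C11.  The names
	(struct node, lifo_push(), lifo_pop_all()) are hypothetical and
	are not taken from the paper.  The comments mark where a pointer
	can outlive the object it was loaded from, and where a failed
	CAS assigns a fresh old value.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical node type, for illustration only. */
struct node {
	struct node *next;
	int value;
};

static _Atomic(struct node *) top = NULL;

/* Push one node onto the LIFO stack. */
static void lifo_push(struct node *n)
{
	struct node *old = atomic_load_explicit(&top, memory_order_relaxed);

	do {
		/*
		 * Between the load and the CAS, another thread may have
		 * removed and freed the node that "old" points to.  That
		 * is the lifetime-end "zap": "old" is then formally an
		 * invalid pointer, even if a new node has since been
		 * allocated at the same address.
		 */
		n->next = old;
		/* On failure, the CAS overwrites "old" with the current top. */
	} while (!atomic_compare_exchange_weak_explicit(&top, &old, n,
							memory_order_release,
							memory_order_relaxed));
}

/* Detach the whole list at once, which avoids ABA on the removal side. */
static struct node *lifo_pop_all(void)
{
	return atomic_exchange_explicit(&top, NULL, memory_order_acquire);
}
```

	The bitwise comparison in the CAS can succeed against a recycled
	address, which is why the provenance of the stored "next" pointer
	is the interesting question.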

D3181R0 — Atomic stores and object lifetimes
	This one was late to the party, so is not formally published.
	It deals with an odd corner case in the C and C++ memory models
	in which an atomic_thread_fence(memory_order_release) cannot
	completely emulate a store-release operation.  We avoided this
	problem in the Linux-kernel memory model, and hardware seems to
	do the right thing.  Actually, the speed of light, the atomic
	nature of matter, and the causal nature of the universe being
	what they appear to be, hardware would have some difficulty
	causing trouble here.  But the abstract machine is ignorant of
	the laws of physics, so this should be good clean fun!	;-)

	There is an example code fragment here:

	https://github.com/llvm/llvm-project/issues/64188
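
	For reference, the two forms being compared look roughly like
	this in C11.  This is only a sketch with hypothetical names
	(payload, ready, publish_*(), try_consume()); it shows the two
	forms side by side and does not attempt to reproduce the corner
	case itself.

```c
#include <assert.h>
#include <stdatomic.h>

static int payload;		/* plain, non-atomic data */
static atomic_int ready;	/* flag that publishes the payload */

/* Variant 1: a store-release operation. */
static void publish_store_release(int v)
{
	payload = v;
	atomic_store_explicit(&ready, 1, memory_order_release);
}

/* Variant 2: a release fence followed by a relaxed store. */
static void publish_fence_release(int v)
{
	payload = v;
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&ready, 1, memory_order_relaxed);
}

/* Reader: returns 1 and fills *out once the payload has been published. */
static int try_consume(int *out)
{
	if (!atomic_load_explicit(&ready, memory_order_acquire))
		return 0;
	*out = payload;
	return 1;
}
```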

D3125R0 — Pointer tagging
	Another one that is late to the party, and thus not yet formally
	published.  The idea is to provide a way to access pointer bits
	that are not relevant to pointer dereferencing for pointers to
	properly aligned objects or that are unused high-order bits.
	It would be nice.  The devil is in the details.
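
	What people do by hand today looks something like the sketch
	below, which stashes a tag in the low-order bits of a suitably
	aligned pointer.  The helper names and the two-bit mask are
	assumptions made for illustration, not the interface proposed in
	the paper, and the round trip through uintptr_t is
	implementation-defined rather than guaranteed by the standard.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: the low 2 bits of a 4-byte-aligned pointer are unused. */
#define TAG_MASK ((uintptr_t)0x3)

/* Combine an aligned pointer with a small tag. */
static void *tag_ptr(void *p, unsigned int tag)
{
	uintptr_t bits = (uintptr_t)p;

	assert((bits & TAG_MASK) == 0);	/* pointer must be suitably aligned */
	assert(tag <= TAG_MASK);	/* tag must fit in the spare bits */
	return (void *)(bits | tag);
}

/* Extract the tag from a tagged pointer. */
static unsigned int ptr_tag(void *p)
{
	return (unsigned int)((uintptr_t)p & TAG_MASK);
}

/* Recover the original pointer; required before any dereference. */
static void *untag_ptr(void *p)
{
	return (void *)((uintptr_t)p & ~TAG_MASK);
}
```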

CWG2298 — Actions and expression evaluation
	https://www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#2298

	Language lawyering on portions of the C and C++ memory models.
	Nevertheless, it might be useful to tooling.

LWG3941 — atomics.order inadvertently prohibits widespread implementation techniques
	https://cplusplus.github.io/LWG/issue3941

	"Memory models are hard."  ;-)

	Everyone agrees that the implementations are doing the right
	thing, but we need to get the memory-model definition to agree.
	The Linux-kernel memory model avoids this problem by being more
	of a hardware memory model than a language-level memory model.
	(LKMM pays the price by not completely modeling compiler
	optimizations, so pick your poison carefully.)

LWG4004 — The load and store operation in atomics.order p1 is ambiguous
	https://cplusplus.github.io/LWG/issue4004

	Probably just nomenclature.  Probably.  ;-)

---

And these don't seem to have much to do with the C language, but
here they are anyway:

P3149R0 — async_scope -- Creating scopes for non-sequential concurrency
	https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p3149r0.pdf

P3300R0 — C++ Asynchronous Parallel Algorithms
	https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p3300r0.html

P2882R0 — An Event Model for C++ Executors
	https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/p2882r0.html

P3179R0 — C++ parallel range algorithms
	https://isocpp.org/files/papers/P3179R0.html

P3135R0 — Hazard Pointer Extensions
	https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p3135r0.pdf

P3138R0 — views::cache_last
	https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p3138r0.html

P2964R0 — Allowing user-defined types in std::simd
	https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p2964r0.html

P0260R8 — C++ Concurrent Queues
	https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/p0260r7.html

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: A few proposals, this time from the C++ standards committee
  2024-03-17  9:14 A few proposals, this time from the C++ standards committee Paul E. McKenney
@ 2024-03-17 18:50 ` Linus Torvalds
  2024-03-17 20:56   ` Paul E. McKenney
  2024-03-17 20:50 ` Linus Torvalds
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 24+ messages in thread
From: Linus Torvalds @ 2024-03-17 18:50 UTC (permalink / raw)
  To: paulmck; +Cc: linux-toolchains, peterz, hpa, rostedt, gregkh, keescook

On Sun, 17 Mar 2024 at 02:14, Paul E. McKenney <paulmck@kernel.org> wrote:
>
> Another language, another standards-committee meeting, another set of
> potentially relevant papers.  ;-)
>
> Thoughts?

These seem to be mainly all just due to entirely self-inflicted damage
by the C++ standards body.

Well, except for the pointer tagging, which is a reasonable feature
but I suspect the syntax and the way to let people specify *which*
bits to tag would be painful.

The self-inflicted ones seem to all be because of the horrible
syntax-based abstract machine that the C++ standards body refuses to
give up. It is the source of pretty much every single memory ordering
issue.

Yes, that whole "semantics based on high-level language syntax and
contexts" is all lovely - if you do purely functional programming and
you have zero actual interactions with hardware.

But going back all the way to K&R, the C language definition has
always had that whiff of "oh, reality actually matters", and you can
see it in how "volatile" was described. The C standards people always
distrusted that "take it closer to a real machine" for some reason,
and the C++ people seem to have actively hated it, which is part of
why C++ then got _so_ confused with what the actual semantics of
"volatile" really are, because of the whole "is an lvalue an access"
thing etc etc.

(Yes, yes, I know they then introduced "generalized lvalues" aka
"glvalues" to fix that particular braindamage, but my point is that at
no point did they realize that the problem went deeper).

The whole concept of "abstract machine" is broken. Not because it was
a bad idea originally as a way to describe some amount of portability
issues. I guarantee that is how it started for K&R - as a way to just
avoid talking about very concrete limits (word size etc).

But the C++ standard people try *SO*HARD* to describe what a valid
optimization is without ever talking about reality that it has become
a completely broken thing.

I have a solution for it all, but my solution involves throwing out
all that pointless and wasted effort, and involves talking about
optimizations in terms of actual observable differences on real
hardware. So my solution is obviously not acceptable to the C++ people
who have a serious case of Stockholm syndrome with their whole failed
model. These people refuse to admit that their whole approach is
broken.

I quote from the standard:

    The semantic descriptions in this document define a parameterized
    nondeterministic abstract machine. This document places no
    requirement on the structure of conforming implementations. In
    particular, they need not copy or emulate the structure of the
    abstract machine.

    Rather, conforming implementations are required to emulate (only)
    the observable behavior of the abstract machine as explained below.

and the problem here really is that not only does it start from a
ridiculous assumption ("parameterized nondeterministic abstract
machine"), but it ends with a problem that then needs to be defined
("observable behavior") because you started from such an overly
pointless mental exercise.

So my suggestion is that somebody put some psychoactive drugs into the
fountain machine at the next C++ standards meeting, and when all the
members are susceptible to sane suggestions, you instead tell them
that the abstract machine was a mistake. And then you tell them that
you should always generate code as if you were a "simple compiler"
(it's interesting to note that your "lifetime zap" paper actually
talks about that, so *somebody* has a f*cking clue - I haven't seen
that model of "simple compiler" in the C++ standard before).

And then you define the notion of acceptable optimizations as the ones
that have the same results as the simple compiler.

IOW, you make it all *concrete*. And the issue of memory ordering ends
up being pretty much the exact same as the issue of "volatile".
Certain loads and stores can only be combined and moved in certain
ways. Because atomics, memory ordering, and volatile are all basically
the same issue: this is where you deal with reality.
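
To make the "volatile" end of that concrete, here is a minimal sketch in
the style of the kernel's READ_ONCE()/WRITE_ONCE().  The macro and
function names are hypothetical, __typeof__ is a GNU C extension, and
the behavior relied on is what GCC and Clang do for volatile accesses
rather than anything the standard promises for threads.

```c
#include <assert.h>

/* Hypothetical macros: force exactly one access through a volatile lvalue. */
#define LOAD_ONCE(x)		(*(const volatile __typeof__(x) *)&(x))
#define STORE_ONCE(x, v)	(*(volatile __typeof__(x) *)&(x) = (v))

static int stop_requested;

/* Spin until some other context sets the flag, or max_spins is reached. */
static unsigned long spin_until_stopped(unsigned long max_spins)
{
	unsigned long spins = 0;

	/* The volatile load has to be performed on every iteration. */
	while (!LOAD_ONCE(stop_requested) && spins < max_spins)
		spins++;
	return spins;
}

/* Set the flag with a single store that cannot be elided or split. */
static void request_stop(void)
{
	STORE_ONCE(stop_requested, 1);
}
```

Without the volatile access, a compiler would be free to load the flag
once and hoist it out of the loop.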

Ta-daa. No stupid abstract machine problems. No odd - and pretty much
unsolvable - impedance issues between "real hardware" and "abstract
machine". In fact, if you do it right, you get rid of the "undefined
behavior" catch-all phrase for "we can't describe this, and it ends up
depending on things that depend on runtime differences".

And no, it's not going to happen. And putting psychoactive drugs in
the fountain machine is immoral.

Too bad.

               Linus


* Re: A few proposals, this time from the C++ standards committee
  2024-03-17  9:14 A few proposals, this time from the C++ standards committee Paul E. McKenney
  2024-03-17 18:50 ` Linus Torvalds
@ 2024-03-17 20:50 ` Linus Torvalds
  2024-03-17 21:04   ` Paul E. McKenney
                     ` (2 more replies)
  2024-03-19  7:41 ` Marco Elver
  2024-06-05 13:52 ` Paul E. McKenney
  3 siblings, 3 replies; 24+ messages in thread
From: Linus Torvalds @ 2024-03-17 20:50 UTC (permalink / raw)
  To: paulmck; +Cc: linux-toolchains, peterz, hpa, rostedt, gregkh, keescook

On Sun, 17 Mar 2024 at 02:14, Paul E. McKenney <paulmck@kernel.org> wrote:
>
> D3181R0 — Atomic stores and object lifetimes
>         This one was late to the party, so is not formally published.
>         It deals with an odd corner case in the C and C++ memory models
>         in which an atomic_thread_fence(memory_order_release) cannot
>         completely emulate a store-release operation.  We avoided this
>         problem in the Linux-kernel memory model, and hardware seems to
>         do the right thing.  Actually, the speed of light, the atomic
>         nature of matter, and the causal nature of the universe being
>         what they appear to be, hardware would have some difficulty
>         causing trouble here.  But the abstract machine is ignorant of
>         the laws of physics, so this should be good clean fun!  ;-)
>
>         There is an example code fragment here:
>
>         https://github.com/llvm/llvm-project/issues/64188

Looking closer at this one, it seems to be purely a compiler bug.

Assuming you want to honor memory ordering in the first place, you
*cannot* move a store that ends up later being visible to other
threads past a function call, because you don't know if that function
call might contain a memory barrier.

There are no laws-of-physics, speed-of-light, or causality issues at
all. The bug they describe in that github issue happens on real
hardware, not on some kind of abstract machine.

In fact, I think the problem case can be simplified further:

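  #include <stdlib.h>

  extern void function_call(void);  /* opaque to the compiler */

  int *bug(int N)
  {
    int* p = malloc(sizeof(int));
    *p = N;  /* the store that must not sink below the call */
    function_call();
    return p;
  }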

without having that "atomic<int>& a" argument involved at all.

If the compiler moves the store to '*p' to after the function call, and
then does a "return p" (which exposes that memory location), and the
function call has any "memory_order_release" store in it (which the
compiler cannot know), then there needs to be some guarantee that a
third party (that may have done an "acquire" on the same thing that
"function_call()" did a release on) always sees the store of N before
it sees that other store.

Now, on x86, this happens automatically, because even if you move the
"*p = N" down to after the function call, all stores are releases, so
by the time 'p' becomes visible to anybody else, you are guaranteed to
see the right ordering.

But on pretty much any architecture other than s390 and x86, you need
to add your own memory barrier if you did the store to '*p' after the
function call, because otherwise you end up violating the
'memory_order_release' in the called function that you didn't even
see.

And yes, to a compiler person, that is very annoying, because
'function_call()' itself clearly doesn't know anything about 'p', so
you'd think that there are no _possible_ visible ordering differences.

But if the C++ standards body thinks that the re-ordering is fine, the
C++ standards body is standardizing on "memory ordering is not real".

I can't find the actual standards text for this, but at least
according to cppreference.com (I don't know how official that is), we
have a very clear rule (and honestly, it's the _only_ possible sane
rule for release->consume, so I hope it's official):

   All memory writes (non-atomic and relaxed atomic) that
   happened-before the atomic store from the point of view of thread A,
   become visible side-effects within those operations in thread B into
   which the load operation carries dependency, that is, once the atomic
   load is completed, those operators and functions in thread B that use
   the value obtained from the load are guaranteed to see what thread A
   wrote to memory.

so this is all completely unambiguous. The compiler is *WRONG* to move
the store to '*p' to after the function call, unless it also adds its
own 'release' ordering.
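
A minimal C11 sketch of the pattern, assuming the called function hides
a release fence and the pointer is later handed to a reader through a
relaxed store.  The names slot, publisher() and reader() are
hypothetical, and function_call() is defined locally only as a stand-in
for a function the compiler cannot see into.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

static _Atomic(int *) slot;	/* hypothetical hand-off location */

/* Stand-in for an external function; imagine its body is not visible. */
static void function_call(void)
{
	atomic_thread_fence(memory_order_release);
}

static int *publisher(int N)
{
	int *p = malloc(sizeof(*p));

	if (!p)
		return NULL;
	*p = N;			/* must stay above the call below */
	function_call();	/* release fence hidden inside */
	atomic_store_explicit(&slot, p, memory_order_relaxed);
	return p;
}

/* Returns 1 and fills *out if a pointer has been published. */
static int reader(int *out)
{
	int *p = atomic_load_explicit(&slot, memory_order_relaxed);

	if (!p)
		return 0;
	/* Pairs with the release fence hidden in function_call(). */
	atomic_thread_fence(memory_order_acquire);
	*out = *p;
	return 1;
}
```

If the store to '*p' were sunk below function_call(), nothing would
order it before the relaxed store to slot, and on a weakly ordered
machine reader() could then see the pointer without seeing N.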

Weak memory ordering is subtle and difficult. What else is new?

                 Linus


* Re: A few proposals, this time from the C++ standards committee
  2024-03-17 18:50 ` Linus Torvalds
@ 2024-03-17 20:56   ` Paul E. McKenney
  0 siblings, 0 replies; 24+ messages in thread
From: Paul E. McKenney @ 2024-03-17 20:56 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-toolchains, peterz, hpa, rostedt, gregkh, keescook

On Sun, Mar 17, 2024 at 11:50:08AM -0700, Linus Torvalds wrote:
> On Sun, 17 Mar 2024 at 02:14, Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > Another language, another standards-committee meeting, another set of
> > potentially relevant papers.  ;-)
> >
> > Thoughts?
> 
> These seem to be mainly all just due to entirely self-inflicted damage
> by the C++ standards body.
> 
> Well, except for the pointer tagging, which is a reasonable feature
> but I suspect the syntax and the way to let people specify *which*
> bits to tag would be painful.
> 
> The self-inflicted ones seem to all be because of the horrible
> syntax-based abstract machine that the C++ standards body refuses to
> give up. It is the source of pretty much every single memory ordering
> issue.
> 
> Yes, that whole "semantics based on high-level language syntax and
> contexts" is all lovely - if you do purely functional programming and
> you have zero actual interactions with hardware.

My decades of interactions with these committees summed up in a single
sentence.  Thank you for that, it did me good!  ;-)

> But going back all the way to K&R, the C language definition has
> always had that whiff of "oh, reality actually matters", and you can
> see it in how "volatile" was described. The C standards people always
> distrusted that "take it closer to a real machine" for some reason,
> and the C++ people seem to have actively hated it, which is part of
> why C++ then got _so_ confused with what the actual semantics of
> "volatile" really are, because of the whole "is an lvalue an access"
> thing etc etc.
> 
> (Yes, yes, I know they then introduced "generalized lvalues" aka
> "glvalues" to fix that particular braindamage, but my point is that at
> no point did they realize that the problem went deeper).

I have many times had to ask those who were inveighing against volatile
whether they wanted their credit cards to continue working.  And more than
once have had to follow up with an explanation of the connection between
volatile, device drivers, kernels, computers, and credit-card processing.
(And yes, one guy actually responded that he would be happy if his credit
card stopped working.)

> The whole concept of "abstract machine" is broken. Not because it was
> a bad idea originally as a way to describe some amount of portability
> issues. I guarantee that is how it started for K&R - as a way to just
> avoid talking about very concrete limits (word size etc).
> 
> But the C++ standard people try *SO*HARD* to describe what a valid
> optimization is without ever talking about reality that it has become
> a completely broken thing.
> 
> I have a solution for it all, but my solution involves throwing out
> all that pointless and wasted effort, and involves talking about
> optimizations in terms of actual observable differences on real
> hardware. So my solution is obviously not acceptable to the C++ people
> who have a serious case of Stockholm syndrome with their whole failed
> model. These people refuse to admit that their whole approach is
> broken.
> 
> I quote from the standard:
> 
>     The semantic descriptions in this document define a parameterized
>     nondeterministic abstract machine. This document places no
>     requirement on the structure of conforming implementations. In
>     particular, they need not copy or emulate the structure of the
>     abstract machine.
> 
>     Rather, conforming implementations are required to emulate (only)
>     the observable behavior of the abstract machine as explained below.
> 
> and the problem here really is that not only does it start from a
> ridiculous assumption ("parameterized nondeterministic abstract
> machine"), but it ends with a problem that then needs to be defined
> ("observable behavior") because you started from such an overly
> pointless mental exercise.

I would prefer that the abstract machine be constrained by the laws of
physics.  Those wishing to make analysis tools are not quite so happy
with that preference, but on the other hand, almost all C and C++ code
is fed through a compiler and run on real hardware that is subject to
the constraints of the objective universe.

> So my suggestion is that somebody put some psychoactive drugs into the
> fountain machine at the next C++ standards meeting, and when all the
> members are susceptible to sane suggestions, you instead tell them
> that the abstract machine was a mistake. And then you tell them that
> you should always generate code as if you were a "simple compiler"
> (it's interesting to note that your "lifetime zap" paper actually
> talks about that, so *somebody* has a f*cking clue - I haven't seen
> that model of "simple compiler" in the C++ standard before).

I did just that (minus the psychoactive drug, just in case there is any
question) at a workshop a couple of months ago.  Things quieted down a
bit when I noted that it would be acceptable for the abstract machine
to be extended to account for the limitations of the physical universe.

> And then you define the notion of acceptable optimizations as the ones
> that have the same results as the simple compiler.
> 
> IOW, you make it all *concrete*. And the issue of memory ordering ends
> up being pretty much the exact same as the issue of "volatile".
> Certain loads and stores can only be combined and moved in certain
> ways. Because atomics, memory ordering, and volatile are all basically
> the same issue: this is where you deal with reality.
> 
> Ta-daa. No stupid abstract machine problems. No odd - and pretty much
> unsolvable - impedance issues between "real hardware" and "abstract
> machine". In fact, if you do it right, you get rid of the "undefined
> behavior" catch-all phrase for "we can't describe this, and it ends up
> depending on things that depend on runtime differences".
> 
> And no, it's not going to happen. And putting psychoactive drugs in
> the fountain machine is immoral.
> 
> Too bad.

I would of course word this a bit differently, but I cannot argue with
your overall assessment of the technical situation.

On the other hand, I have not given up hope, and so I invoke the wise
words that are often attributed to George Box: "All models are wrong,
but some are useful."  The C++ abstract machine is a model that is
wrong in that it does not account for hardware (at least not very well)
or even for the laws of physics that govern all hardware.

But as long as the discussion is confined to the non-concurrent code
interacting with itself, the C++ abstract machine is almost always quite
useful.  Too bad that I am almost always working with concurrency and
with the underlying hardware.  On the other hand, what is life without
a challenge?  ;-)

							Thanx, Paul


* Re: A few proposals, this time from the C++ standards committee
  2024-03-17 20:50 ` Linus Torvalds
@ 2024-03-17 21:04   ` Paul E. McKenney
  2024-03-17 21:44   ` Linus Torvalds
  2024-03-18 16:32   ` Linus Torvalds
  2 siblings, 0 replies; 24+ messages in thread
From: Paul E. McKenney @ 2024-03-17 21:04 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-toolchains, peterz, hpa, rostedt, gregkh, keescook

On Sun, Mar 17, 2024 at 01:50:02PM -0700, Linus Torvalds wrote:
> On Sun, 17 Mar 2024 at 02:14, Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > D3181R0 — Atomic stores and object lifetimes
> >         This one was late to the party, so is not formally published.
> >         It deals with an odd corner case in the C and C++ memory models
> >         in which an atomic_thread_fence(memory_order_release) cannot
> >         completely emulate a store-release operation.  We avoided this
> >         problem in the Linux-kernel memory model, and hardware seems to
> >         do the right thing.  Actually, the speed of light, the atomic
> >         nature of matter, and the causal nature of the universe being
> >         what they appear to be, hardware would have some difficulty
> >         causing trouble here.  But the abstract machine is ignorant of
> >         the laws of physics, so this should be good clean fun!  ;-)
> >
> >         There is an example code fragment here:
> >
> >         https://github.com/llvm/llvm-project/issues/64188
> 
> Looking closer at this one, it seems to be purely a compiler bug.
> 
> Assuming you want to honor memory ordering in the first place, you
> *cannot* move a store that ends up later being visible to other
> threads past a function call, because you don't know if that function
> call might contain a memory barrier.
> 
> There are no laws-of-physics, speed-of-light, or causality issues at
> all. The bug they describe in that github issue happens on real
> hardware, not on some kind of abstract machine.
> 
> In fact, I think the problem case can be simplified further:
> 
>   int *bug(int N)
>   {
>     int* p = malloc(sizeof(int));
>     *p = N;
>     function_call();
>     return p;
>   }
> 
> without having that "atomic<int>& a" argument involved at all.
> 
> If the compiler moves the store to '*p' to after the function call, and
> then does a "return p" (which exposes that memory location), and the
> function call has any "memory_order_release" store in it (which the
> compiler cannot know), then there needs to be some guarantee that a
> third party (that may have done an "acquire" on the same thing that
> "function_call()" did a release on) always sees the store of N before
> it sees that other store.
> 
> Now, on x86, this happens automatically, because even if you move the
> "*p = N" down to after the function call, all stores are releases, so
> by the time 'p' becomes visible to anybody else, you are guaranteed to
> see the right ordering.
> 
> But on pretty much any architecture other than s390 and x86, you need
> to add your own memory barrier if you did the store to '*p' after the
> function call, because otherwise you end up violating the
> 'memory_order_release' in the called function that you didn't even
> see.
> 
> And yes, to a compiler person, that is very annoying, because
> 'function_call()' itself clearly doesn't know anything about 'p', so
> you'd think that there are no _possible_ visible ordering differences.
> 
> But if the C++ standards body thinks that the re-ordering is fine, the
> C++ standards body is standardizing on "memory ordering is not real".
> 
> I can't find the actual standards text for this, but at least
> according to cppreference.com (I don't know how official that is), we
> have a very clear rule (and honestly, it's the _only_ possible sane
> rule for release->consume, so I hope it's official):
> 
>    All memory writes (non-atomic and relaxed atomic) that
>    happened-before the atomic store from the point of view of thread A,
>    become visible side-effects within those operations in thread B into
>    which the load operation carries dependency, that is, once the atomic
>    load is completed, those operators and functions in thread B that use
>    the value obtained from the load are guaranteed to see what thread A
>    wrote to memory.
> 
> so this is all completely unambiguous. The compiler is *WRONG* to move
> the store to '*p' to after the function call, unless it also adds its
> own 'release' ordering.
> 
> Weak memory ordering is subtle and difficult. What else is new?

All good points.  In short, if they really badly want that optimization,
they will need to provide some way to tell the compiler of ordering
provided by external functions, and a way to shut down those
optimizations.

But it just might be simpler to forgo the optimizations.  ;-)

							Thanx, Paul


* Re: A few proposals, this time from the C++ standards committee
  2024-03-17 20:50 ` Linus Torvalds
  2024-03-17 21:04   ` Paul E. McKenney
@ 2024-03-17 21:44   ` Linus Torvalds
  2024-03-17 22:02     ` Paul E. McKenney
  2024-03-18 16:32   ` Linus Torvalds
  2 siblings, 1 reply; 24+ messages in thread
From: Linus Torvalds @ 2024-03-17 21:44 UTC (permalink / raw)
  To: paulmck; +Cc: linux-toolchains, peterz, hpa, rostedt, gregkh, keescook

On Sun, 17 Mar 2024 at 13:50, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Looking closer at this one, it seems to be purely a compiler bug.

Side note: it may be that all that protects us in the kernel from this
compiler bug is the fact that we do not let the compiler know that
"kmalloc()" returns some private memory. So for that particular
pattern, the compiler doesn't actually know that 'p' is some private
pointer and not visible to anybody else.

In C++, particularly with 'new', the compiler might be much more aware
of the fact that nobody can possibly see 'p' outside of that function
until after the return.

So a scarier example without that kind of issue might be something like this:

    extern void unlock(void);
    extern void lock(void);
    extern void wait_for_x(int *);

    void buggy(void)
    {
        int p = 5;
        unlock();
        wait_for_x(&p);
        lock();
    }

where the basic theory of operation is that we're calling that
function with a lock held, and then that "wait_for_x()" thing does
something that exposes the value and waits for it to be changed.

And at the time of the "unlock()", a buggy compiler *might* think that
the value of "p" is entirely private to that function, so the compiler
might decide to compile this as

 - call "unlock()"

 - *then* set 'p' to 5, and pass off the address to the wait function

and that is very buggy on weakly ordered machines for the very same
reasons that that github issue was raised - it is re-ordering the
store wrt the store-release inherent in the 'unlock()'.

Now, in my quick tests, that doesn't actually happen. I sincerely hope
it is because the compiler sees "Oh, somebody is taking the address of
'p'" and just the act of that address-of will make the compiler know
that it has to serialize stores to 'p' with any function calls - even
function calls that happen before the address is taken.

But that

     https://github.com/llvm/llvm-project/issues/64188

that you linked to certainly seems to imply that some versions of
clang have made the equivalent of that mistake, and could possibly
hoist the assignment to 'p' to after the 'unlock()' call.
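
For readers who want to see where the store-release in 'unlock()' comes
from, here is a minimal sketch of a test-and-set spinlock in C11.  The
lock variable and the try_lock() helper are hypothetical, and this is
not how any particular kernel lock is implemented.

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_flag the_lock = ATOMIC_FLAG_INIT;	/* hypothetical lock */

static void lock(void)
{
	/* Acquire: later accesses cannot move up above taking the lock. */
	while (atomic_flag_test_and_set_explicit(&the_lock,
						 memory_order_acquire))
		;	/* spin */
}

static void unlock(void)
{
	/*
	 * Release: stores sequenced before this call are ordered before
	 * the lock is observed as free by the next acquirer.
	 */
	atomic_flag_clear_explicit(&the_lock, memory_order_release);
}

/* Returns 1 if the lock was taken, 0 if it was already held. */
static int try_lock(void)
{
	return !atomic_flag_test_and_set_explicit(&the_lock,
						  memory_order_acquire);
}
```

A caller in another translation unit sees only the declarations, which
is exactly why it cannot assume that a call is free of ordering.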

           Linus


* Re: A few proposals, this time from the C++ standards committee
  2024-03-17 21:44   ` Linus Torvalds
@ 2024-03-17 22:02     ` Paul E. McKenney
  2024-03-17 22:34       ` Linus Torvalds
  0 siblings, 1 reply; 24+ messages in thread
From: Paul E. McKenney @ 2024-03-17 22:02 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-toolchains, peterz, hpa, rostedt, gregkh, keescook

On Sun, Mar 17, 2024 at 02:44:09PM -0700, Linus Torvalds wrote:
> On Sun, 17 Mar 2024 at 13:50, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > Looking closer at this one, it seems to be purely a compiler bug.
> 
> Side note: it may be that all that protects us in the kernel from this
> compiler bug is the fact that we do not let the compiler know that
> "kmalloc()" returns some private memory. So for that particular
> pattern, the compiler doesn't actually know that 'p' is some private
> pointer and not visible to anybody else.

Sadly, we really do let the compiler know:

static __always_inline __alloc_size(1) void *kmalloc(size_t size, gfp_t flags)

#define __alloc_size(x, ...) __alloc_size__(x, ## __VA_ARGS__) __malloc

#define __malloc                        __attribute__((__malloc__))

Maybe we should stop doing so:

------------------------------------------------------------------------

diff --git a/include/linux/compiler_attributes.h b/include/linux/compiler_attributes.h
index 28566624f008f..7b4db0cd093a2 100644
--- a/include/linux/compiler_attributes.h
+++ b/include/linux/compiler_attributes.h
@@ -181,7 +181,7 @@
  *   gcc: https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-malloc-function-attribute
  * clang: https://clang.llvm.org/docs/AttributeReference.html#malloc
  */
-#define __malloc                        __attribute__((__malloc__))
+#define __malloc
 
 /*
  *   gcc: https://gcc.gnu.org/onlinedocs/gcc/Common-Type-Attributes.html#index-mode-type-attribute

------------------------------------------------------------------------
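
For anyone unfamiliar with the attribute, a minimal userspace sketch
follows.  The wrapper name my_zalloc() is hypothetical.  The attribute
is a GCC/Clang extension that tells the compiler the returned pointer
does not alias any other pointer that is valid when the function
returns, which is the "private memory" knowledge discussed above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical allocator wrapper, declared with the malloc attribute. */
static void *my_zalloc(size_t size) __attribute__((__malloc__));

/* Allocate zero-filled memory; returns NULL on failure. */
static void *my_zalloc(size_t size)
{
	void *p = malloc(size);

	if (p)
		memset(p, 0, size);
	return p;
}
```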

> In C++, particularly with 'new', the compiler might be much more aware
> of the fact that nobody can possibly see 'p' outside of that function
> until after the return.
> 
> So a scarier example without that kind of issue might be something like this:
> 
>     extern void unlock(void);
>     extern void lock(void);
>     extern void wait_for_x(int *);
> 
>     void buggy(void)
>     {
>         int p = 5;
>         unlock();
>         wait_for_x(&p);
>         lock();
>     }
> 
> where the basic theory of operation is that we're calling that
> function with a lock held, and then that "wait_for_x()" thing does
> something that exposes the value and waits for it to be changed.
> 
> And at the time of the "unlock()", a buggy compiler *might* think that
> the value of "p" is entirely private to that function, so the compiler
> might decide to compile this as
> 
>  - call "unlock()"
> 
>  - *then* set 'p' to 5, and pass off the address to the wait function
> 
> and that is very buggy on weakly ordered machines for the very same
> reasons that that github issue was raised - it is re-ordering the
> store wrt the store-release inherent in the 'unlock()'.
> 
> Now, in my quick tests, that doesn't actually happen. I sincerely hope
> it is because the compiler sees "Oh, somebody is taking the address of
> 'p'" and just the act of that address-of will make the compiler know
> that it has to serialize stores to 'p' with any function calls - even
> function calls that happen before the address is taken.
> 
> But that
> 
>      https://github.com/llvm/llvm-project/issues/64188
> 
> that you linked to certainly seems to imply that some versions of
> clang have made the equivalent of that mistake, and could possibly
> hoist the assignment to 'p' to after the 'unlock()' call.

Yes, it really does happen in some cases.

And I agree that there are likely a great many failure cases.

The initial examples were user error where the pointer was handed off to
some other thread without synchronizing the lifetime of the pointed-to
object.  I chastised them for this, and they eventually came up with
the external function hiding the atomic_thread_fence() from the compiler.

							Thanx, Paul

* Re: A few proposals, this time from the C++ standards committee
  2024-03-17 22:02     ` Paul E. McKenney
@ 2024-03-17 22:34       ` Linus Torvalds
  2024-03-17 23:46         ` Jonathan Martin
  2024-03-18  0:42         ` Paul E. McKenney
  0 siblings, 2 replies; 24+ messages in thread
From: Linus Torvalds @ 2024-03-17 22:34 UTC (permalink / raw)
  To: paulmck; +Cc: linux-toolchains, peterz, hpa, rostedt, gregkh, keescook

On Sun, 17 Mar 2024 at 15:02, Paul E. McKenney <paulmck@kernel.org> wrote:
>
> Sadly, we really do let the compiler know:

Oops.

Oh well.

Can you please give the standards body that simplified example of
mine, together with a litmus test, and an explanation for why it's
very fundamentally wrong to move a store past a function call that you
don't know?

Because this is literally fundamental. If compilers move that store
past a random function call, they *will* have destroyed memory
ordering on arm64.

Not on x86, no. Which just means that 99% of all the testing we do for
the kernel won't find this. But on weakly ordered architectures, it
really is very very wrong, and no amount of language lawyering will
ever make it right.

Now, maybe some function attribute (like the already existing
"__attribute__((pure))" or "__attribute__((const))") can then be used
to say "this function has no memory ordering side effects". But without
that kind of explicit knowledge, the compiler really must not do that
code movement.
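
For readers who have not used these attributes, here is a minimal
sketch of what GCC and Clang accept today.  The function names are made
up for illustration.  Note that the existing attributes are documented
in terms of side effects and memory reads, not memory ordering, so
reading them as "no ordering effects" is an assumption rather than
something the compilers promise.

```c
#include <stddef.h>

/* const: the result depends only on the arguments; no memory is read. */
__attribute__((const)) int square(int x)
{
    return x * x;
}

/* pure: may read memory (here, the array) but has no side effects. */
__attribute__((pure)) int sum(const int *a, size_t n)
{
    int total = 0;

    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

/*
 * Nothing is stored between the two calls, so a compiler that trusts
 * the pure attribute is allowed to call sum() once and reuse the result.
 * An unmarked external function would get no such treatment.
 */
int twice_sum(const int *a, size_t n)
{
    return sum(a, n) + sum(a, n);
}
```

The point of the sketch is only that the marking is opt-in: absent an
attribute, the compiler knows nothing about what the callee does.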

And this isn't a kernel issue. This is literally a "without this, all
the memory ordering verbiage is just broken fantasy".

And honestly, compiler writers DO NOT UNDERSTAND memory ordering, and
they don't understand the whole "abstract machine" thing either. This
needs to be a litmus test with real code and real explanation. IOW,
tell them that code like this:

    extern void external_function(void);

    int *buggy(void)
    {
        int *p = new int;
        *p = 5;
        external_function();
        return p;
    }

absolutely *has* to generate code like

        mov     w0, #4
        bl      _Znwm
        mov     w8, #5
        mov     x19, x0
        str     w8, [x0]
        bl      _Z17external_functionv

on arm64, and explain to them *why* that 'str' has to be before the
function call and cannot be moved around a function.

Or explain to them that if they move that store across the function
call (because "obviously the function cannot possibly need it"), they
need to make the 'str' be a 'stlr'.

Make it very concrete, because I *guarantee* that if you explain it in
terms of some abstract machine, it's not going to really make them
understand. It's too far removed from the actual problem case.
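
One possible shape for such a concrete example, written in C11 rather
than C++, is sketched below.  It assumes the external function hides a
release fence, as in the case Paul described; the names shared,
producer() and consumer() are illustrative and do not come from the
thread or from the LLVM issue.

```c
#include <stdatomic.h>
#include <stdlib.h>

static _Atomic(int *) shared;   /* NULL until an object is handed off */

/*
 * Imagine this lives in another translation unit, so the compiler
 * building producer() cannot see the fence inside it.
 */
void external_function(void)
{
    atomic_thread_fence(memory_order_release);
}

/* Producer: allocate, initialise, call the opaque function, publish. */
int *producer(void)
{
    int *p = malloc(sizeof *p);

    if (!p)
        return NULL;
    *p = 5;                 /* must stay ahead of the call below */
    external_function();    /* may contain a release fence */
    atomic_store_explicit(&shared, p, memory_order_relaxed);
    return p;
}

/* Consumer: returns -1 until the hand-off is visible, else the value. */
int consumer(void)
{
    int *p = atomic_load_explicit(&shared, memory_order_acquire);

    return p ? *p : -1;
}
```

If the store "*p = 5" is sunk below external_function(), the fence no
longer orders it before the relaxed store that publishes the pointer,
and a consumer on a weakly ordered machine can see the pointer without
seeing the 5.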

           Linus

* Re: A few proposals, this time from the C++ standards committee
  2024-03-17 22:34       ` Linus Torvalds
@ 2024-03-17 23:46         ` Jonathan Martin
  2024-03-18  0:42         ` Paul E. McKenney
  1 sibling, 0 replies; 24+ messages in thread
From: Jonathan Martin @ 2024-03-17 23:46 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-toolchains

Hello;

I doubt I have any reason to speak on this but I am compelled to try. I have been thinking this over for a long time and this may be an appropriate entry point.

If the OSI model is combined with the six-tier hierarchy of controls used in safety for industrial systems, you get something that can permutate by substituting the application layer with any other position in the ring. I’ve named this as a substitute for the clamoring of idiocy that is DevOps Tree(3).0 to be “Development of Secure Applications,” and “Secure Operations Development Authority.” Dosa with Soda.

The OSI model is obvious enough that I might as well be handing over a toddler's puzzle cube, but I don’t know if you are aware of the six tier hierarchy of controls since the five-tier model has better advertising:

ELIMINATION, SUBSTITUTION, ISOLATION;

ENGINEERING CONTROLS, ADMINISTRATIVE CONTROLS, Personal Protective Equipment;

Additionally:

PRESENTATION, SESSION, TRANSPORT;

NETWORK, DATA-LINK, PHYSICAL;

<< APPLICATION -> ROLE

If it follows naturally through the maxim of “God Programmers Create God Programs,” actual and real implementation of this model would lead to a massive reduction in lead time for the Five stages of Project Management.

~A9

* Re: A few proposals, this time from the C++ standards committee
  2024-03-17 22:34       ` Linus Torvalds
  2024-03-17 23:46         ` Jonathan Martin
@ 2024-03-18  0:42         ` Paul E. McKenney
  2024-03-18  1:49           ` Linus Torvalds
  1 sibling, 1 reply; 24+ messages in thread
From: Paul E. McKenney @ 2024-03-18  0:42 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-toolchains, peterz, hpa, rostedt, gregkh, keescook

On Sun, Mar 17, 2024 at 03:34:17PM -0700, Linus Torvalds wrote:
> On Sun, 17 Mar 2024 at 15:02, Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > Sadly, we really do let the compiler know:
> 
> Oops.
> 
> Oh well.

I was confused into thinking the same a few years back as well.  :-/

> Can you please give the standards body that simplified example of
> mine, together with a litmus test, and an explanation for why it's
> very fundamentally wrong to move a store past a function call that you
> don't know?
> 
> Because this is literally fundamental. If compilers move that store
> past a random function call, they *will* have destroyed memory
> ordering on arm64.
> 
> Not on x86, no. Which just means that 99% of all the testing we do for
> the kernel won't find this. But on weakly ordered architectures, it
> really is very very wrong, and no amount of language lawyering will
> ever make it right.
> 
> Now, maybe some function attribute (like the already existing
> "__attribute__((pure))" or "__attribute__((const))") can then be used
> to say "this function has no memory ordering side effects". But without
> that kind of explicit knowledge, the compiler really must not do that
> code movement.
> 
> And this isn't a kernel issue. This is literally a "without this, all
> the memory ordering verbiage is just broken fantasy".
> 
> And honestly, compiler writers DO NOT UNDERSTAND memory ordering, and
> they don't understand the whole "abstract machine" thing either.

The compiler writers' protestations about concurrency being a niche use
case certainly are wearing a bit thin, aren't they?  My smartphone has
eight hardware threads, which was considered to be a huge number not
that many decades back.  ;-)

On the other hand, there is much more awareness of concurrency in that
group than 20 years ago, so there is hope.

>                                                                  This
> needs to be a litmus test with real code and real explanation. IOW,
> tell them that code like this:
> 
>     extern void external_function(void);
> 
>     int *buggy(void)
>     {
>         int *p = new int;
>         *p = 5;
>         external_function();
>         return p;
>     }
> 
> absolutely *has* to generate code like
> 
>         mov     w0, #4
>         bl      _Znwm
>         mov     w8, #5
>         mov     x19, x0
>         str     w8, [x0]
>         bl      _Z17external_functionv
> 
> on arm64, and explain to them *why* that 'str' has to be before the
> function call and cannot be moved around a function.
> 
> Or explain to them that if they move that store across the function
> call (because "obviously the function cannot possibly need it"), they
> need to make the 'str' be a 'stlr'.
> 
> Make it very concrete, because I *guarantee* that if you explain it in
> terms of some abstract machine, it's not going to really make them
> understand. It's too far removed from the actual problem case.

Done.

One interesting complication is a guarantee of ordering versus the
possibility of ordering.  If a function is unmarked, the compiler must
assume that it might provide full ordering, but it cannot rely on any
ordering at all.

It should be an interesting discussion.  ;-)

							Thanx, Paul

* Re: A few proposals, this time from the C++ standards committee
  2024-03-18  0:42         ` Paul E. McKenney
@ 2024-03-18  1:49           ` Linus Torvalds
  2024-03-18  2:44             ` Paul E. McKenney
  0 siblings, 1 reply; 24+ messages in thread
From: Linus Torvalds @ 2024-03-18  1:49 UTC (permalink / raw)
  To: paulmck; +Cc: linux-toolchains, peterz, hpa, rostedt, gregkh, keescook

On Sun, 17 Mar 2024 at 17:42, Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On the other hand, there is much more awareness of concurrency in that
> group than 20 years ago, so there is hope.

Yeah. But when I say "compiler writers don't understand memory
ordering", it's not that I think they need to be singled out - pretty
much *nobody* understands it.

Christ, I'm supposed to know it fairly well, and I still get it wrong
myself regularly and have to really think about it (and honestly just
prefer leaning on a few standard patterns rather than having to think
about it too much).

So "awareness of concurrency" is one thing, and I agree it's getting
much better.

Actually getting memory ordering right - even when you are aware of
concurrency - is another thing entirely.

                 Linus

* Re: A few proposals, this time from the C++ standards committee
  2024-03-18  1:49           ` Linus Torvalds
@ 2024-03-18  2:44             ` Paul E. McKenney
  2024-03-18  2:57               ` Randy Dunlap
  0 siblings, 1 reply; 24+ messages in thread
From: Paul E. McKenney @ 2024-03-18  2:44 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-toolchains, peterz, hpa, rostedt, gregkh, keescook

On Sun, Mar 17, 2024 at 06:49:17PM -0700, Linus Torvalds wrote:
> On Sun, 17 Mar 2024 at 17:42, Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > On the other hand, there is much more awareness of concurrency in that
> > group than 20 years ago, so there is hope.
> 
> Yeah. But when I say "compiler writers don't understand memory
> ordering", it's not that I think they need to be singled out - pretty
> much *nobody* understands it.

Fair enough!

> Christ, I'm supposed to know it fairly well, and I still get it wrong
> myself regularly and have to really think about it (and honestly just
> prefer leaning on a few standard patterns rather than having to think
> about it too much).
> 
> So "awareness of concurrency" is one thing, and I agree it's getting
> much better.
> 
> Actually getting memory ordering right - even when you are aware of
> concurrency - is another thing entirely.

Agreed, myself included.  So we should all use the standard patterns where
we can, getting ourselves into memory-model trouble when those patterns
are not cutting it.  And over time, we add to the standard patterns.

But we are making progress.  Fifty years ago, the consensus was that
developers could not be trusted to get while-loop conditions right.  ;-)

							Thanx, Paul

* Re: A few proposals, this time from the C++ standards committee
  2024-03-18  2:44             ` Paul E. McKenney
@ 2024-03-18  2:57               ` Randy Dunlap
  2024-03-18  4:42                 ` Paul E. McKenney
  0 siblings, 1 reply; 24+ messages in thread
From: Randy Dunlap @ 2024-03-18  2:57 UTC (permalink / raw)
  To: paulmck, Linus Torvalds
  Cc: linux-toolchains, peterz, hpa, rostedt, gregkh, keescook



On 3/17/24 19:44, Paul E. McKenney wrote:
> On Sun, Mar 17, 2024 at 06:49:17PM -0700, Linus Torvalds wrote:
>> On Sun, 17 Mar 2024 at 17:42, Paul E. McKenney <paulmck@kernel.org> wrote:
>>>
>>> On the other hand, there is much more awareness of concurrency in that
>>> group than 20 years ago, so there is hope.
>>
>> Yeah. But when I say "compiler writers don't understand memory
>> ordering", it's not that I think they need to be singled out - pretty
>> much *nobody* understands it.
> 
> Fair enough!
> 
>> Christ, I'm supposed to know it fairly well, and I still get it wrong
>> myself regularly and have to really think about it (and honestly just
>> prefer leaning on a few standard patterns rather than having to think
>> about it too much).
>>
>> So "awareness of concurrency" is one thing, and I agree it's getting
>> much better.
>>
>> Actually getting memory ordering right - even when you are aware of
>> concurrency - is another thing entirely.
> 
> Agreed, myself included.  So we should all use the standard patterns where
> we can, getting ourselves into memory-model trouble when those patterns
> are not cutting it.  And over time, we add to the standard patterns.
> 
> But we are making progress.  Fifty years ago, the consensus was that
> developers could not be trusted to get while-loop conditions right.  ;-)

I was using for loops and do-until loops 50 years ago, but maybe not "while"
loops. Or are you off by 10 years or so?

-- 
#Randy

* Re: A few proposals, this time from the C++ standards committee
  2024-03-18  2:57               ` Randy Dunlap
@ 2024-03-18  4:42                 ` Paul E. McKenney
  2024-03-18  4:45                   ` Randy Dunlap
  0 siblings, 1 reply; 24+ messages in thread
From: Paul E. McKenney @ 2024-03-18  4:42 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: Linus Torvalds, linux-toolchains, peterz, hpa, rostedt, gregkh,
	keescook

On Sun, Mar 17, 2024 at 07:57:31PM -0700, Randy Dunlap wrote:
> 
> 
> On 3/17/24 19:44, Paul E. McKenney wrote:
> > On Sun, Mar 17, 2024 at 06:49:17PM -0700, Linus Torvalds wrote:
> >> On Sun, 17 Mar 2024 at 17:42, Paul E. McKenney <paulmck@kernel.org> wrote:
> >>>
> >>> On the other hand, there is much more awareness of concurrency in that
> >>> group than 20 years ago, so there is hope.
> >>
> >> Yeah. But when I say "compiler writers don't understand memory
> >> ordering", it's not that I think they need to be singled out - pretty
> >> much *nobody* understands it.
> > 
> > Fair enough!
> > 
> >> Christ, I'm supposed to know it fairly well, and I still get it wrong
> >> myself regularly and have to really think about it (and honestly just
> >> prefer leaning on a few standard patterns rather than having to think
> >> about it too much).
> >>
> >> So "awareness of concurrency" is one thing, and I agree it's getting
> >> much better.
> >>
> >> Actually getting memory ordering right - even when you are aware of
> >> concurrency - is another thing entirely.
> > 
> > Agreed, myself included.  So we should all use the standard patterns where
> > we can, getting ourselves into memory-model trouble when those patterns
> > are not cutting it.  And over time, we add to the standard patterns.
> > 
> > But we are making progress.  Fifty years ago, the consensus was that
> > developers could not be trusted to get while-loop conditions right.  ;-)
> 
> I was using for loops and do-until loops 50 years ago, but maybe not "while"
> loops. Or are you off by 10 years or so?

So was I.  Yet in the late 1970s, I attended a talk by a guy named
Edsger Dijkstra with examples claiming that you could not trust ordinary
developers to correctly write "while" loops.  Sort of like some people
today claim that ordinary developers cannot be trusted to write concurrent
code.

Of course, one might reasonably argue that developers cannot be trusted
to write much of any code at all.  Some days I would agree.  ;-)

							Thanx, Paul

* Re: A few proposals, this time from the C++ standards committee
  2024-03-18  4:42                 ` Paul E. McKenney
@ 2024-03-18  4:45                   ` Randy Dunlap
  0 siblings, 0 replies; 24+ messages in thread
From: Randy Dunlap @ 2024-03-18  4:45 UTC (permalink / raw)
  To: paulmck
  Cc: Linus Torvalds, linux-toolchains, peterz, hpa, rostedt, gregkh,
	keescook



On 3/17/24 21:42, Paul E. McKenney wrote:
> On Sun, Mar 17, 2024 at 07:57:31PM -0700, Randy Dunlap wrote:
>>
>>
>> On 3/17/24 19:44, Paul E. McKenney wrote:
>>> On Sun, Mar 17, 2024 at 06:49:17PM -0700, Linus Torvalds wrote:
>>>> On Sun, 17 Mar 2024 at 17:42, Paul E. McKenney <paulmck@kernel.org> wrote:
>>>>>
>>>>> On the other hand, there is much more awareness of concurrency in that
>>>>> group than 20 years ago, so there is hope.
>>>>
>>>> Yeah. But when I say "compiler writers don't understand memory
>>>> ordering", it's not that I think they need to be singled out - pretty
>>>> much *nobody* understands it.
>>>
>>> Fair enough!
>>>
>>>> Christ, I'm supposed to know it fairly well, and I still get it wrong
>>>> myself regularly and have to really think about it (and honestly just
>>>> prefer leaning on a few standard patterns rather than having to think
>>>> about it too much).
>>>>
>>>> So "awareness of concurrency" is one thing, and I agree it's getting
>>>> much better.
>>>>
>>>> Actually getting memory ordering right - even when you are aware of
>>>> concurrency - is another thing entirely.
>>>
>>> Agreed, myself included.  So we should all use the standard patterns where
>>> we can, getting ourselves into memory-model trouble when those patterns
>>> are not cutting it.  And over time, we add to the standard patterns.
>>>
>>> But we are making progress.  Fifty years ago, the consensus was that
>>> developers could not be trusted to get while-loop conditions right.  ;-)
>>
>> I was using for loops and do-until loops 50 years ago, but maybe not "while"
>> loops. Or are you off by 10 years or so?
> 
> So was I.  Yet in the late 1970s, I attended a talk by a guy named
> Edsger Dijkstra with examples claiming that you could not trust ordinary
> developers to correctly write "while" loops.  Sort of like some people
> today claim that ordinary developers cannot be trusted to write concurrent
> code.
> 
> Of course, one might reasonably argue that developers cannot be trusted
> to write much of any code at all.  Some days I would agree.  ;-)

Ack that.

-- 
#Randy

* Re: A few proposals, this time from the C++ standards committee
  2024-03-17 20:50 ` Linus Torvalds
  2024-03-17 21:04   ` Paul E. McKenney
  2024-03-17 21:44   ` Linus Torvalds
@ 2024-03-18 16:32   ` Linus Torvalds
  2024-03-18 16:48     ` H. Peter Anvin
  2 siblings, 1 reply; 24+ messages in thread
From: Linus Torvalds @ 2024-03-18 16:32 UTC (permalink / raw)
  To: paulmck; +Cc: linux-toolchains, peterz, hpa, rostedt, gregkh, keescook

[ Final note on this, I hope ]

On Sun, 17 Mar 2024 at 13:50, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Now, on x86, this happens automatically, because even if you move the
> "*p = N" down to after the function call, all stores are releases, so
> by the time 'p' becomes visible to anybody else, you are guaranteed to
> see the right ordering.

Actually, I take that back.

Even x86 (and s390) can see problems from the "move store past a
function call" issue, although they require more effort by the
compiler.

Because while it is true that every store is a release on x86, and as
such any store that exposes the address to another CPU will
automatically have done the release that also guarantees that the
value 'N' is visible to any other thread (and then the acquire will
guarantee that the other end sees the right value), that doesn't
necessarily fix the problem.

Why? Once the compiler has missed the original memory barrier (that
was in the function that it moved the store past), the compiler could
end up doing further store movement, and simply generate the code to
do the '*p = N' store after exposing the address.

At that point, even a strong memory ordering won't help - although it
would probably make the problem easier to spot as a human (and would
probably make it easier to trigger too, since then things like
interrupts, preemption or single-stepping would also make the window
to see it much much bigger).

So x86 wouldn't be immune to this, it would just require more
reordering by the compiler (which might in turn require that the
function was inlined in order to give that re-ordering possibility, of
course).

               Linus

* Re: A few proposals, this time from the C++ standards committee
  2024-03-18 16:32   ` Linus Torvalds
@ 2024-03-18 16:48     ` H. Peter Anvin
  0 siblings, 0 replies; 24+ messages in thread
From: H. Peter Anvin @ 2024-03-18 16:48 UTC (permalink / raw)
  To: Linus Torvalds, paulmck
  Cc: linux-toolchains, peterz, rostedt, gregkh, keescook

On March 18, 2024 9:32:55 AM PDT, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>[ Final note on this, I hope ]
>
>On Sun, 17 Mar 2024 at 13:50, Linus Torvalds
><torvalds@linux-foundation.org> wrote:
>>
>> Now, on x86, this happens automatically, because even if you move the
>> "*p = N" down to after the function call, all stores are releases, so
>> by the time 'p' becomes visible to anybody else, you are guaranteed to
>> see the right ordering.
>
>Actually, I take that back.
>
>Even x86 (and s390) can see problems from the "move store past a
>function call" issue, although they require more effort by the
>compiler.
>
>Because while it is true that every store is a release on x86, and as
>such any store that exposes the address to another CPU will
>automatically have done the release that also guarantees that the
>value 'N' is visible to any other thread (and then the acquire will
>guarantee that the other end sees the right value), that doesn't
>necessarily fix the problem.
>
>Why? Once the compiler has missed the original memory barrier (that
>was in the function that it moved the store past), the compiler could
>end up doing further store movement, and simply generate the code to
>do the '*p = N' store after exposing the address.
>
>At that point, even a strong memory ordering won't help - although it
>would probably make the problem easier to spot as a human (and would
>probably make it easier to trigger too, since then things like
>interrupts, preemption or single-stepping would also make the window
>to see it much much bigger).
>
>So x86 wouldn't be immune to this, it would just require more
>reordering by the compiler (which might in turn require that the
>function was inlined in order to give that re-ordering possibility, of
>course).
>
>               Linus

Hardware memory order doesn't mean anything if the compiler is the one messing it up...

* Re: A few proposals, this time from the C++ standards committee
  2024-03-17  9:14 A few proposals, this time from the C++ standards committee Paul E. McKenney
  2024-03-17 18:50 ` Linus Torvalds
  2024-03-17 20:50 ` Linus Torvalds
@ 2024-03-19  7:41 ` Marco Elver
  2024-03-19  8:07   ` Jakub Jelinek
  2024-06-05 13:52 ` Paul E. McKenney
  3 siblings, 1 reply; 24+ messages in thread
From: Marco Elver @ 2024-03-19  7:41 UTC (permalink / raw)
  To: paulmck
  Cc: linux-toolchains, peterz, hpa, rostedt, gregkh, keescook,
	torvalds, Evgenii Stepanov, Kostya Serebryany

Hi Paul,

On Sun, 17 Mar 2024 at 10:14, Paul E. McKenney <paulmck@kernel.org> wrote:
> ------------------------------------------------------------------------
[...]
> D3125R0 — Pointer tagging
>         Another one that is late to the party, and thus not yet formally
>         published.  The idea is to provide a way to access pointer bits
>         that are not relevant to pointer dereferencing for pointers to
>         properly aligned objects or that are unused high-order bits.
>         It would be nice.  The devil is in the details.

You mention it's not formally published, but is there a draft that is
already accessible somewhere?

Thanks,
-- Marco

* Re: A few proposals, this time from the C++ standards committee
  2024-03-19  7:41 ` Marco Elver
@ 2024-03-19  8:07   ` Jakub Jelinek
  0 siblings, 0 replies; 24+ messages in thread
From: Jakub Jelinek @ 2024-03-19  8:07 UTC (permalink / raw)
  To: Marco Elver
  Cc: paulmck, linux-toolchains, peterz, hpa, rostedt, gregkh, keescook,
	torvalds, Evgenii Stepanov, Kostya Serebryany

On Tue, Mar 19, 2024 at 08:41:27AM +0100, Marco Elver wrote:
> Hi Paul,
> 
> On Sun, 17 Mar 2024 at 10:14, Paul E. McKenney <paulmck@kernel.org> wrote:
> > ------------------------------------------------------------------------
> [...]
> > D3125R0 — Pointer tagging
> >         Another one that is late to the party, and thus not yet formally
> >         published.  The idea is to provide a way to access pointer bits
> >         that are not relevant to pointer dereferencing for pointers to
> >         properly aligned objects or that are unused high-order bits.
> >         It would be nice.  The devil is in the details.
> 
> You mention it's not formally published, but is there a draft that is
> already accessible somewhere?

https://wg21.link/D3125R0

	Jakub


* Re: A few proposals, this time from the C++ standards committee
  2024-03-17  9:14 A few proposals, this time from the C++ standards committee Paul E. McKenney
                   ` (2 preceding siblings ...)
  2024-03-19  7:41 ` Marco Elver
@ 2024-06-05 13:52 ` Paul E. McKenney
  2024-06-05 18:08   ` Linus Torvalds
  3 siblings, 1 reply; 24+ messages in thread
From: Paul E. McKenney @ 2024-06-05 13:52 UTC (permalink / raw)
  To: linux-toolchains; +Cc: peterz, hpa, rostedt, gregkh, keescook, torvalds

Hello!

And another set of potentially relevant papers, this time back to the C
standards committee.  I have commented on the ones that I expect would
be of interest, but please feel free to take a look at the others.

Thoughts?

							Thanx, Paul

------------------------------------------------------------------------

Full list:  https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3248.pdf
	The main purpose of the meeting is resolving formal issues,
	so this list is aspirational.

N3210 A String Type for C, Martin Uecker
	https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3210.pdf

	The proposed strings have explicit size of type size_t and a
	variable-length array of characters.  Maybe not of interest
	for the kernel, but of historical interest, given some of the
	late-1970s and early 1980s opposition to NUL-terminated strings.
	Of course, dedicating 64 bits to the string length would have
	been a no-go back then.  ;-)
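
	As a rough picture of the idea only, and not the type or API
	that N3210 actually proposes, a length-prefixed string can be
	sketched in today's C with a flexible array member.  The names
	struct counted_string, cs_from_cstr() and cs_length() are
	hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout: an explicit length followed by the characters. */
struct counted_string {
    size_t len;
    char data[];    /* len bytes, not NUL-terminated */
};

/* Build a counted string from a NUL-terminated one; NULL on failure. */
struct counted_string *cs_from_cstr(const char *s)
{
    size_t len = strlen(s);
    struct counted_string *cs = malloc(sizeof *cs + len);

    if (!cs)
        return NULL;
    cs->len = len;
    memcpy(cs->data, s, len);
    return cs;
}

/* The length is stored, so no scan for a terminator is needed. */
size_t cs_length(const struct counted_string *cs)
{
    return cs->len;
}
```

	On an LP64 target the len field alone costs eight bytes per
	string, which is the overhead joked about above.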

N3211 Memory-Safety in C, Martin Uecker
	https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3211.pdf

	An interesting (and self-described) experiment to obtain
	compile-time checkable memory safety within the bounds and spirit
	of the C language.  There is a static mode that restricts the code
	to that which can be checked for memory safety at compile time,
	and a dynamic mode that includes run-time checks (and which is
	less restrictive).  The mode is selected with a pragma.

	There is not yet a prototype implementation, so early days.

N3212 Polymorphic Types, Martin Uecker
	https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3212.pdf

	Proposes function overloading.  This proposal avoids some
	typeof() issues by proposing a new _Typeof().  This proposal
	claims to permit changing a function to overloaded without
	changing the ABI for that function.

	If this works out, it might be used to replace some macros.
	The proposal calls out libffi as a target for this purpose.

	Also early days for this proposal.

N3214 Generic selection expression with a type operand, Aaron Ballman
	https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3214.pdf

	This proposes standardizing a Clang extension that extends
	the _Generic() keyword to take a type name as well as an
	assignment-expression.	This is a step towards allowing more
	natural generics in C.

	This appears to be a reasonably mature proposal, given the
	existing work in Clang.  The GCC folks might have opinions,
	of course.

N3223 The C Standard charter, Seacord et al.
	https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3223.pdf

	This document describes what the C standards committee is working
	towards and what they take into account in their work.

N3254 Accessing byte arrays, v4, Seacord et al.
	https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3254.pdf

	This document proposes standardizing GCC's approach for casting
	between structures and arrays of bytes, addressing one annoying
	case of undefined behavior.  Atomics are an issue on deep embedded
	platforms, and one way forward is to make support for casting
	to and from atomics optional.

	This appears to be a reasonably mature proposal, given the
	existing work in GCC.  The Clang folks might have opinions,
	of course.

n3234 The semantics of the restrict qualifier, Jens Gustedt
	https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3234.htm

	This document attempts to formalize and rationalize the semantics
	of the C-language "restrict" keyword.  I have not yet read it
	carefully enough to have an informed opinion on it, other than
	on the bravery of the author.  ;-)

	My perhaps overly cynical guess is that the Linux kernel will
	want to stick with its current command-line approach to this
	issue, but who knows?

n3243 A Memory model with Synchronization based type aliasing V1,
	Eskil Steenberg Hald
	https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3243.pdf

	This document's purpose is to allow the compiler more freedom
	to optimize in the face of pointer casts by introducing some
	additional undefined behavior.	It also introduces the notion
	of an aliasing barrier, which might well avoid much of the damage.
	In addition, the author does explicitly note the need to avoid
	breaking existing code, even if that code does have UB.  One
	surprising (to me) quote:

		"Our current thinking is that, perhaps the best way
		forward is to define an alternative memory model for C,
		based on these ideas, current implementations, existing
		code, and borrowing some from the Linux kernel memory
		model, and using external standards such as posix for
		concurrency."

	Perhaps it is time to refine the LKMM formalizations surrounding
	dependencies, especially given another surprising quote for an
	issue with the current C memory model: "Numerous issues with
	the concurrency model, like dependent reads."  The surprise is
	that the issue is stated to be with the language's concurrency
	model instead of with the dependencies themselves.

	This is early days for this proposal, and thus a good time to
	review and provide feedback.

N3244 Slay Some Earthly Demons I, Martin Uecker
	https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3244.pdf

	This proposal takes on a number of cases of undefined behavior,
	recommending compiler diagnostics in some cases and removing or
	weakening the undefinedness in others.


* Re: A few proposals, this time from the C++ standards committee
  2024-06-05 13:52 ` Paul E. McKenney
@ 2024-06-05 18:08   ` Linus Torvalds
  2024-06-05 18:24     ` Linus Torvalds
  2024-06-05 19:12     ` Paul E. McKenney
  0 siblings, 2 replies; 24+ messages in thread
From: Linus Torvalds @ 2024-06-05 18:08 UTC (permalink / raw)
  To: paulmck; +Cc: linux-toolchains, peterz, hpa, rostedt, gregkh, keescook

On Wed, 5 Jun 2024 at 06:52, Paul E. McKenney <paulmck@kernel.org> wrote:
>
> n3243 A Memory model with Synchronization based type aliasing V1,
>         Eskil Steenberg Hald
>         https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3243.pdf

I love the part that admits that the type-based aliasing is broken.

I don't think the "cast-based barriers" are a sufficient improvement.

Note the "sufficient". I do think a cast-based barrier would have been
a huge improvement *originally*, in that it would avoid two absolutely
huge issues:

 - "char *" being special

 - the insane model of "use a union to tell the compiler type-based
aliasing can happen"

Both of the above things are disgusting and wrong, but they are mostly
disgusting and wrong simply because type-based aliasing is entirely
and utterly wrong to begin with.
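
For readers who have not run into the two special cases above, here is
a minimal sketch of what they look like in practice. It is not taken
from the paper or from the kernel; the function names are hypothetical,
and it assumes a 32-bit 'float'. Both helpers return the bit pattern of
a float: one goes through a union, the other through memcpy(), which is
allowed because character-type accesses are exempt from the type-based
rules.

```c
#include <stdint.h>
#include <string.h>

/* Assumption for this sketch: float and uint32_t have the same size. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/*
 * Union punning: the union type itself is what tells the compiler that
 * the float view and the integer view share storage.
 */
static uint32_t float_bits_union(float f)
{
	union {
		float f;
		uint32_t u;
	} pun;

	pun.f = f;
	return pun.u;
}

/*
 * memcpy() punning: byte-wise copies count as character-type accesses,
 * which may alias any object.
 */
static uint32_t float_bits_memcpy(float f)
{
	uint32_t u;

	memcpy(&u, &f, sizeof(u));
	return u;
}
```

A plain '*(uint32_t *)&f' would express the same intent, but it is
undefined under the type-based rules unless the code is built with
something like -fno-strict-aliasing.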

Saying "pointer casts are an aliasing barrier" is a much better and
more logical model for the type-based thing. No question about that.
They help make the insanity that is type-based aliasing much more
manageable.

However, I don't see that it would be sufficient for us to ever stop
using the "-fno-strict-aliasing" thing. Because type-based aliasing
continues to be insane, and even with pointer casting acting as a
barrier, has real problems.

The paper points out one such problem: the cast may have been done
long long before the accesses are done. In fact, unions continue to be
one very real case of such a situation, where the pointer cast
basically comes from the type system itself.

But even with explicit casts, those casts may very naturally be before
the accesses. The examples in the paper are simplistic, but we have
the "prepare casts" pattern in the kernel in places like this:

  static long do_fcntl(int fd, unsigned int cmd, unsigned long arg,
                struct file *filp)
  {
        void __user *argp = (void __user *)arg;
        int argi = (int)arg;
        ...

which admittedly isn't about two pointers that can alias, but is very
much an example of "it's more convenient to 'prep' the casts before
use", when the 'arg' argument is then used in multiple different ways
- sometimes as a pointer, sometimes as an integer, and sometimes as
the original 'unsigned long'.

I personally think that the whole type-based aliasing is fundamentally
unfixable, and that the C standards committee should just admit that.
It doesn't even *work*, because often the types are the same anyway,
even when you really really want to say "these accesses can't alias".

The whole type-based aliasing is literally designed for - and by - HPC
people who (a) had no taste, (b) didn't understand language design and
just hacked sh*t together and (c) were working with clearly distinct
types because their workloads are trivial.

The HPC people literally tried to solve the issue of "counters are
integers, but our data is FP, and the two have obviously different
types, so let's use that information for alias analysis".

Anybody who doesn't understand how broken and hacky that is SHOULD NOT
BE ON A LANGUAGE COMMITTEE.

Seriously. I think it should be a fundamental filter for any C
language committee member: "Do you think type-based aliasing makes
sense?". If you get anything but an immediate "No!",  you pull the
lever that opens the trap-door to the crocodile-infested waters below.

Or sharks. Sharks are good too.

And no, "restrict" isn't great, but at least it's a better concept.

I would suggest that people look at improving 'restrict' and making it
more useful, and just admit that the type-based thing was a mistake.
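
As an illustration of the "types are the same anyway" point, consider
the following sketch. The function names are hypothetical and nothing
here comes from the kernel. Both pointers are 'int *', so type-based
rules give the compiler nothing to work with, and only 'restrict'
carries the programmer's promise that the two do not overlap.

```c
#include <stddef.h>

/*
 * dst and src have the same type, so the compiler has to allow for
 * overlap: a store to dst[i] may change src[0], which may therefore
 * have to be reloaded on every iteration.
 */
void scale_add(int *dst, const int *src, size_t n, int k)
{
	for (size_t i = 0; i < n; i++)
		dst[i] += k * src[0];
}

/*
 * Same loop, but the caller promises that dst and src do not overlap,
 * so src[0] may be loaded once and kept in a register.  Calling this
 * with overlapping arguments would be undefined behavior.
 */
void scale_add_restrict(int *restrict dst, const int *restrict src,
			size_t n, int k)
{
	for (size_t i = 0; i < n; i++)
		dst[i] += k * src[0];
}
```

Calling scale_add(a, a, 3, 2) is legal and sees the updated a[0] from
the second iteration onward; scale_add_restrict() is only valid when
the two arrays are distinct.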

                        Linus


* Re: A few proposals, this time from the C++ standards committee
  2024-06-05 18:08   ` Linus Torvalds
@ 2024-06-05 18:24     ` Linus Torvalds
  2024-06-05 19:16       ` Paul E. McKenney
  2024-06-05 19:12     ` Paul E. McKenney
  1 sibling, 1 reply; 24+ messages in thread
From: Linus Torvalds @ 2024-06-05 18:24 UTC (permalink / raw)
  To: paulmck; +Cc: linux-toolchains, peterz, hpa, rostedt, gregkh, keescook

On Wed, 5 Jun 2024 at 11:08, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> I would suggest that people look at improving 'restrict' and making it
> more useful, and just admit that the type-based thing was a mistake.

Note that I did see the other proposal on 'restrict', but I think that
one was a pretty small improvement.

I think people should work on making it work better in general. Real
compilers already effectively do that thing in much more interesting
ways, as part of finding the origin of a pointer.

For example, both clang and gcc have a notion of "alloc-like" functions:

   __attribute__((__malloc__))

which is a function attribute that basically says "the returned
pointer is a 'restricted' pointer". Except it is much better than the
'restrict' keyword, in that it actually works on real loads.
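
A minimal sketch of how that attribute gets used, assuming GCC or
clang. The names buf_alloc() and fill_and_count() are hypothetical and
exist only for illustration. The attribute promises that the returned
pointer does not alias any other pointer that is valid when the
function returns, so stores through an unrelated pointer cannot be
assumed to clobber the new buffer.

```c
#include <stdlib.h>

/*
 * Hypothetical allocator wrapper.  The attribute tells the compiler
 * that the returned storage is fresh and not reachable through any
 * pre-existing pointer.
 */
__attribute__((__malloc__))
int *buf_alloc(size_t n)
{
	return calloc(n, sizeof(int));
}

/* Returns buf[0] + *counter, or -1 if the allocation fails. */
int fill_and_count(int *counter)
{
	int *buf = buf_alloc(4);
	int first;

	if (!buf)
		return -1;

	buf[0] = 40;
	*counter = 2;	/* cannot modify buf[0]: buf is fresh storage */
	first = buf[0];	/* so the compiler may reuse the value just stored */
	free(buf);

	return first + *counter;
}
```

Whether a given compiler actually forwards the stored value depends on
the optimization level and on inlining; the attribute only gives it
permission to do so.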

So I think the real answer to type-based aliasing is to throw the
garbage out, and instead help extend on existing notions of
"provenance of where the pointer came from".

Because compilers already do a *lot* of that kind of alias analysis,
and I think the proper approach is to strive to help compilers do
better on something reliable, instead of working around the fact that
some rodent-like creature got dropped on its head a few too many
times, and came up with the notion of type-based aliasing.

                     Linus


* Re: A few proposals, this time from the C++ standards committee
  2024-06-05 18:08   ` Linus Torvalds
  2024-06-05 18:24     ` Linus Torvalds
@ 2024-06-05 19:12     ` Paul E. McKenney
  1 sibling, 0 replies; 24+ messages in thread
From: Paul E. McKenney @ 2024-06-05 19:12 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-toolchains, peterz, hpa, rostedt, gregkh, keescook

On Wed, Jun 05, 2024 at 11:08:40AM -0700, Linus Torvalds wrote:
> On Wed, 5 Jun 2024 at 06:52, Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > n3243 A Memory model with Synchronization based type aliasing V1,
> >         Eskil Steenberg Hald
> >         https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3243.pdf
> 
> I love the part that admits that the type-based aliasing is broken.
> 
> I don't think the "cast-based barriers" are a sufficient improvement.
> 
> Note the "sufficient". I do think a cast-based barrier would have been
> a huge improvement *originally*, in that it would avoid two absolutely
> huge issues:
> 
>  - "char *" being special
> 
>  - the insane model of "use a union to tell the compiler type-based
> aliasing can happen"
> 
> Both of the above things are disgusting and wrong, but they are mostly
> disgusting and wrong simply because type-based aliasing is entirely
> and utterly wrong to begin with.
> 
> Saying "pointer casts are an aliasing barrier" is a much better and
> more logical model for the type-based thing. No question about that.
> They help make the insanity that is type-based aliasing much more
> manageable.
> 
> However, I don't see that it would be sufficient for us to ever stop
> using the "-fno-strict-aliasing" thing. Because type-based aliasing
> continues to be insane, and even with pointer casting acting as a
> barrier, has real problems.
> 
> The paper points out one such problem: the cast may have been done
> long long before the accesses are done. In fact, unions continue to be
> one very real case of such a situation, where the pointer cast
> basically comes from the type system itself.
> 
> But even with explicit casts, those casts may very naturally be before
> the accesses. The examples in the paper are simplistic, but we have
> the "prepare casts" pattern in the kernel in places like this:
> 
>   static long do_fcntl(int fd, unsigned int cmd, unsigned long arg,
>                 struct file *filp)
>   {
>         void __user *argp = (void __user *)arg;
>         int argi = (int)arg;
>         ...
> 
> which admittedly isn't about two pointers that can alias, but is very
> much an example of "it's more convenient to 'prep' the casts before
> use", when the 'arg' argument is then used in multiple different ways
> - sometimes as a pointer, sometimes as an integer, and sometimes as
> the original 'unsigned long'.
> 
> I personally think that the whole type-based aliasing is fundamentally
> unfixable, and that the C standards committee should just admit that.
> It doesn't even *work*, because often the types are the same anyway,
> even when you really really want to say "these accesses can't alias".

All good points!

> The whole type-based aliasing is literally designed for - and by - HPC
> people who (a) had no taste, (b) didn't understand language design and
> just hacked sh*t together and (c) were working with clearly distinct
> types because their workloads are trivial.

To your point, my feeling back in the day was that all of this was
designed to allow FORTRAN programs to be more easily ported to C and C++.

> The HPC people literally tried to solve the issue of "counters are
> integers, but our data is FP, and the two have obviously different
> types, so let's use that information for alias analysis".
> 
> Anybody who doesn't understand how broken and hacky that is SHOULD NOT
> BE ON A LANGUAGE COMMITTEE.
> 
> Seriously. I think it should be a fundamental filter for any C
> language committee member: "Do you think type-based aliasing makes
> sense?". If you get anything but an immediate "No!",  you pull the
> lever that opens the trap-door to the crocodile-infested waters below.
> 
> Or sharks. Sharks are good too.

;-) ;-) ;-)

> And no, "restrict" isn't great, but at least it's a better concept.
> 
> I would suggest that people look at improving 'restrict' and making it
> more useful, and just admit that the type-based thing was a mistake.

One of the complaints was that people don't use "restrict" often enough
to allow all the optimizations that compiler writers would like to do.
Fortunately, there seem to be increasing levels of understanding that
the generated code needs to do something useful.  Here is hoping, anyway!

							Thanx, Paul


* Re: A few proposals, this time from the C++ standards committee
  2024-06-05 18:24     ` Linus Torvalds
@ 2024-06-05 19:16       ` Paul E. McKenney
  0 siblings, 0 replies; 24+ messages in thread
From: Paul E. McKenney @ 2024-06-05 19:16 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-toolchains, peterz, hpa, rostedt, gregkh, keescook

On Wed, Jun 05, 2024 at 11:24:58AM -0700, Linus Torvalds wrote:
> On Wed, 5 Jun 2024 at 11:08, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > I would suggest that people look at improving 'restrict' and making it
> > more useful, and just admit that the type-based thing was a mistake.
> 
> Note that I did see the other proposal on 'restrict', but I think that
> one was a pretty small improvement.
> 
> I think people should work on making it work better in general. Real
> compilers already effectively do that thing in much more interesting
> ways, as part of finding the origin of a pointer.
> 
> For example, both clang and gcc have a notion of "alloc-like" functions:
> 
>    __attribute__((__malloc__))
> 
> which is a function attribute that basically says "the returned
> pointer is a 'restricted' pointer". Except it is much better than the
> 'restrict' keyword, in that it actually works on real loads.
> 
> So I think the real answer to type-based aliasing is to throw the
> garbage out, and instead help extend on existing notions of
> "provenance of where the pointer came from".

Given a suitable way to adjust provenance as needed to enable easy coding
of concurrent ABA-tolerant algorithms, agreed.  I am working on this,
but first in C++.  The C meeting is next week, and the C++ one is the
last full week in June, so more on that later.

> Because compilers already do a *lot* of that kind of alias analysis,
> and I think the proper approach is to strive to help compilers do
> better on something reliable, instead of working around the fact that
> some rodent-like creature got dropped on its head a few too many
> times, and came up with the notion of type-based aliasing.

No doubt type-based aliasing seemed like a good idea at the time.  :-/

It could be worse, they could be applying ML to optimization...

							Thanx, Paul
