A few proposals from the C standards committee

* A few proposals from the C standards committee
@ 2024-01-23 16:46 Paul E. McKenney
  2024-01-23 18:58 ` Linus Torvalds
                   ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Paul E. McKenney @ 2024-01-23 16:46 UTC (permalink / raw)
  To: linux-toolchains; +Cc: peterz, hpa, rostedt, gregkh, keescook, torvalds

Hello!

On the perhaps unlikely off-chance that any of this is of interest.

							Thanx, Paul

------------------------------------------------------------------------

List of proposals with clickable links:

https://www.open-std.org/jtc1/sc22/wg14/www/wg14_document_log

N3089 _Optional: a type qualifier to indicate pointer nullability
	Proposes _Optional to tag pointer parameters such that
	dereferencing the pointer without first checking for NULL gets
	a compiler warning.
	https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3089.pdf

N3190 Extensions to the preprocessor for C2Y
	Proposes a number of macros, including things that return a
	count of their arguments.
	https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3090.htm

N3194 Case range expressions
	No fewer than 421 files in the Linux kernel use the "..." syntax,
	as in "case 1 ... 3", but there are other syntaxes...  So they
	are proposing "::" instead.  My guess is that "..." won't be
	going away anytime soon.

N3195 Named loops
	Placing a goto label before a loop allows a break/continue to
	target that loop in case of nesting.

N3203 Strict order of expression evaluation
	I do like it.  The 1980s were over a long time ago.

N3199 Improved __attribute__((cleanup)) Through defer
N3198 Conditionally Supported Unwinding
	The Linux kernel is starting to use __attribute__((cleanup))
	via guard(), with 40 files making use of this.  It is not clear
	to me whether or not either of these proposals would be useful
	to the Linux kernel.

N3201 Operator Overloading Without Name Mangling v2
	I have seen Linux-kernel interest in *function* overloading, but
	not in operator overloading.  Nevertheless...

	The trick here is to associate a given operator with a function,
	so that the name-mangling becomes essentially a manual operation.
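
For readers who have not met __attribute__((cleanup)) (the mechanism behind the N3199/N3198 entries above), here is a minimal userspace sketch. It assumes GCC or Clang; free_buf(), use_buffer() and freed_count are made-up names for illustration, and this is not the kernel's guard() implementation.

```c
#include <stdlib.h>

/* Counts how many times the cleanup handler ran (illustration only). */
static int freed_count;

/* Cleanup handler: receives a pointer to the variable leaving scope. */
static void free_buf(char **p)
{
    free(*p);          /* free(NULL) is a no-op, so a failed malloc is fine */
    freed_count++;
}

int use_buffer(size_t n)
{
    /* free_buf(&buf) runs automatically on every exit from this scope. */
    __attribute__((cleanup(free_buf))) char *buf = malloc(n);

    if (!buf)
        return -1;     /* handler still runs here */
    buf[0] = 'x';
    return buf[0] == 'x' ? 0 : 1;
}
```

The return value is computed before the handler runs, so early returns need no explicit free().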


* Re: A few proposals from the C standards committee
  2024-01-23 16:46 A few proposals from the C standards committee Paul E. McKenney
@ 2024-01-23 18:58 ` Linus Torvalds
  2024-01-23 20:00   ` Paul E. McKenney
  2024-01-23 22:35   ` Martin Uecker
  2024-01-23 20:16 ` H. Peter Anvin
  2024-01-23 22:39 ` Kees Cook
  2 siblings, 2 replies; 19+ messages in thread
From: Linus Torvalds @ 2024-01-23 18:58 UTC (permalink / raw)
  To: paulmck; +Cc: linux-toolchains, peterz, hpa, rostedt, gregkh, keescook

I generally like them, but..

On Tue, 23 Jan 2024 at 08:46, Paul E. McKenney <paulmck@kernel.org> wrote:
>
> N3089 _Optional: a type qualifier to indicate pointer nullability
>         Proposes _Optional to tag pointer parameters such that
>         dereferencing the pointer without first checking for NULL gets
>         a compiler warning.
>         https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3089.pdf

This one I also like, but at the same time I'm not convinced "types"
are the right way to carry this information.

Because types are historically conceptually static and tied to the
lifetime of the object.

But the actual nullability logic must *not* be.

_Nonnull is fine: if a variable is non-null, it can conceptually never
become anything else (or rather: it would remain a bug if it did).

So _Nonnull is a "statement of fact" about the variable, and makes
sense as a type, and matches the lifetime of the variable.

But the same is *not* true of _Nullable. The type magically and
silently changes after a test.

To make a trivial stupid example of what I mean, something like

    inline int access(int * _Nonnull p) { return *p; }
    ...
    int my_fn(int * _Nullable p)
    { return p ? access(p) : 0; }

which is obviously correct, and shouldn't warn for anything, since
this is literally what the whole thing is designed for.

But part of that "shouldn't warn" is how a nullable 'p' is effectively
silently cast to a non-nullable 'p'. The only thing that makes that
cast valid is the presence of the conditional, but it should be noted
that from a *type* perspective that is just wrong.

IOW, normal types are carried along with their variables, but somehow
the variable 'p' inside the conditional is not really of the same
type as 'p' outside of it.

So that conditional has that hidden effect of changing what the type
of 'p' is in all dependent expressions.

And I know compilers already effectively implement all this, but I'm
just saying that from a *type* system standpoint, this is all quite a
bit illogical.
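
A compilable version of the trivial example, as a sketch only: it assumes Clang's _Nonnull/_Nullable qualifiers and hides them behind hypothetical NONNULL/NULLABLE macros so that other compilers still build it. access() is renamed access_value() here to avoid any clash with the POSIX function of that name.

```c
#include <stddef.h>

/* Hypothetical wrapper macros: only Clang understands the qualifiers. */
#ifdef __clang__
#define NONNULL  _Nonnull
#define NULLABLE _Nullable
#else
#define NONNULL
#define NULLABLE
#endif

static inline int access_value(int * NONNULL p)
{
    return *p;
}

int my_fn(int * NULLABLE p)
{
    /* The conditional is the only thing that makes passing p on valid. */
    return p ? access_value(p) : 0;
}
```

Nothing in the declared type of 'p' changes between the test and the call, which is the oddity being described.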

In many ways, this is not a type issue, it's really a "value range
analysis" issue. And I think it should be considered that way, and
the syntax and the logic should also be talked about in those terms.

Why would "_Nullable" and "_Nonnull" be conceptually any different
from "I know this value is in the range [0..5]", which is *also*
something that compilers already do, and that we also might want to be
able to describe for warning purposes?

So honestly, I would *love* to be able to give the compiler range
information (which *includes* the "this is nullable" kind of
information), but I don't think it should be described as a "type
qualifier".

Because what if the nullability is hidden in some called function?
To give another example - less stupid this time - think of
something like this:

    int my_fn(int * _Nullable p)
    {
        if (check_validity(p))
            return -EINVAL;
        return access(p);
    }

where we have perhaps done extensive validity checks on 'p' (think the
kernel kind of 'access_ok()' function) in the 'check_validity()'
function, but the compiler doesn't see that function, since it's a
rather complicated one that does a whole RB-tree lookup etc. So the
compiler hasn't *seen* that we do a NULL check there.

So it shouldn't warn, but it will - because the compiler is oblivious
about the fact that the pointer has actually been checked for a lot
more than just NULL.

If you think of this as a "value analysis" issue, rather than as a
type issue, the solution is obvious: it's not that the type of 'p'
changes, but you just want a way to tell the compiler "I've done range
checking, the new range is XYZ".

And if you think of it that way, you don't want to re-declare a type,
you want to just update range information, and simply state something
like

   _Nonnull p;

after doing the check_validity() call. IOW, I really think you should
be able to write something like

    int my_fn(int * _Nullable p)
    {
        if (check_validity(p))
            return -EINVAL;
        _Nonnull p;
        return access(p);
    }

See? My argument is basically that I like the _Nullable/_Nonnull
attributes, but that they shouldn't be seen as part of the *type*
system, but as a more dynamic value range thing, and that they can -
and should - be available separately from just the declaration.

               Linus


* Re: A few proposals from the C standards committee
  2024-01-23 18:58 ` Linus Torvalds
@ 2024-01-23 20:00   ` Paul E. McKenney
  2024-01-23 20:20     ` Linus Torvalds
  2024-01-23 20:39     ` Linus Torvalds
  2024-01-23 22:35   ` Martin Uecker
  1 sibling, 2 replies; 19+ messages in thread
From: Paul E. McKenney @ 2024-01-23 20:00 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-toolchains, peterz, hpa, rostedt, gregkh, keescook

On Tue, Jan 23, 2024 at 10:58:04AM -0800, Linus Torvalds wrote:
> I generally like them, but..
> 
> On Tue, 23 Jan 2024 at 08:46, Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > N3089 _Optional: a type qualifier to indicate pointer nullability
> >         Proposes _Optional to tag pointer parameters such that
> >         dereferencing the pointer without first checking for NULL gets
> >         a compiler warning.
> >         https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3089.pdf
> 
> This one I also like, but at the same time I'm not convinced "types"
> are the right way to carry this information.
> 
> Because types are historically conceptually static and tied to the
> lifetime of the object.
> 
> But the actual nullability logic must *not* be.
> 
> _Nonnull is fine: if a variable is non-null, it can conceptually never
> become anything else (or rather: it would remain a bug if it did).
> 
> So _Nonnull is a "statement of fact" about the variable, and makes
> sense as a type, and matches the lifetime of the variable.
> 
> But the same is *not* true of _Nullable. The type magically and
> silently changes after a test.
> 
> To make a trivial stupid example of what I mean, something like
> 
>     inline int access(int * _Nonnull p) { return *p; }
>     ...
>     int my_fn(int * _Nullable p)
>     { return p ? access(p) : 0; }
> 
> which is obviously correct, and shouldn't warn for anything, since
> this is literally what the whole thing is designed for.
> 
> But part of that "shouldn't warn" is how a nullable 'p' is effectively
> silently cast to a non-nullable 'p'. The only thing that makes that
> cast valid is the presence of the conditional, but it should be noted
> that from a *type* perspective that is just wrong.
> 
> IOW, normal types are carried along with their variables, but somehow
> the variable 'p' inside the conditional is not really of the same
> type as 'p' outside of it.
> 
> So that conditional has that hidden effect of changing what the type
> of 'p' is in all dependent expressions.
> 
> And I know compilers already effectively implement all this, but I'm
> just saying that from a *type* system standpoint, this is all quite a
> bit illogical.

Would you be OK with something that required a new variable for the
pointer that was now known not to be NULL?  (My guess is "no", given the
following discussion on value ranges, but I figured that I should ask.)

> In many ways, this is not a type issue, it's really a "value range
> analysis" issue. And I think it should be considered that way, and
> the syntax and the logic should also be talked about in those terms.
> 
> Why would "_Nullable" and "_Nonnull" be conceptually any different
> from "I know this value is in the range [0..5]", which is *also*
> something that compilers already do, and that we also might want to be
> able to describe for warning purposes?
> 
> So honestly, I would *love* to be able to give the compiler range
> information (which *includes* the "this is nullable" kind of
> information), but I don't think it should be described as a "type
> qualifier".
> 
> Because what if the nullability is hidden in some called function?
> To give another example - less stupid this time - think of
> something like this:
> 
>     int my_fn(int * _Nullable p)
>     {
>         if (check_validity(p))
>             return -EINVAL;
>         return access(p);
>     }
> 
> where we have perhaps done extensive validity checks on 'p' (think the
> kernel kind of 'access_ok()' function) in the 'check_validity()'
> function, but the compiler doesn't see that function, since it's a
> rather complicated one that does a whole RB-tree lookup etc. So the
> compiler hasn't *seen* that we do a NULL check there.
> 
> So it shouldn't warn, but it will - because the compiler is oblivious
> about the fact that the pointer has actually been checked for a lot
> more than just NULL.
> 
> If you think of this as a "value analysis" issue, rather than as a
> type issue, the solution is obvious: it's not that the type of 'p'
> changes, but you just want a way to tell the compiler "I've done range
> checking, the new range is XYZ".
> 
> And if you think of it that way, you don't want to re-declare a type,
> you want to just update range information, and simply state something
> like
> 
>    _Nonnull p;
> 
> after doing the check_validity() call. IOW, I really think you should
> be able to write something like
> 
>     int my_fn(int * _Nullable p)
>     {
>         if (check_validity(p))
>             return -EINVAL;
>         _Nonnull p;
>         return access(p);
>     }
> 
> See? My argument is basically that I like the _Nullable/_Nonnull
> attributes, but that they shouldn't be seen as part of the *type*
> system, but as a more dynamic value range thing, and that they can -
> and should - be available separately from just the declaration.

In some implementations, you can use assertions to get at least part
of this effect:

    int my_fn(int * _Nullable p)
    {
        if (check_validity(p))
            return -EINVAL;
        assert(p);
        return access(p);
    }

And for your "[0..5]" example, assert(i >= 0 && i <= 5):

https://godbolt.org/z/xrdx1P3a8

In the kernel, we would of course need to have a way to tell the compiler
about our assertions.

The downside is that assert() will actually check the condition and emit
code to invoke assert() if that condition is not met.

So you are looking for something like assert, but which simply informs
the compiler rather than doing the checking and calling?

If so, then in clang and GCC there is __builtin_unreachable():

https://godbolt.org/z/9qrbGx848 (clang, works in clang 9 but not 8)
https://godbolt.org/z/Kd44eTTWz (gcc, works also in gcc 8.1)

Is something like that what you had in mind?
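
As a minimal sketch of the __builtin_unreachable() approach applied to the "[0..5]" case (GCC and Clang only; table[] and lookup() are made-up names):

```c
/* Six entries, valid indexes 0..5. */
static const int table[6] = {10, 11, 12, 13, 14, 15};

int lookup(int i)
{
    /*
     * A promise, not a check: no test needs to be emitted, and the
     * behavior is undefined if a caller passes i outside [0..5].
     */
    if (i < 0 || i > 5)
        __builtin_unreachable();
    return table[i];
}
```

Whether the compiler actually exploits the range is up to the optimizer.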

							Thanx, Paul


* Re: A few proposals from the C standards committee
  2024-01-23 16:46 A few proposals from the C standards committee Paul E. McKenney
  2024-01-23 18:58 ` Linus Torvalds
@ 2024-01-23 20:16 ` H. Peter Anvin
  2024-01-23 20:24   ` Linus Torvalds
  2024-01-25 12:52   ` Paul E. McKenney
  2024-01-23 22:39 ` Kees Cook
  2 siblings, 2 replies; 19+ messages in thread
From: H. Peter Anvin @ 2024-01-23 20:16 UTC (permalink / raw)
  To: paulmck, linux-toolchains; +Cc: peterz, rostedt, gregkh, keescook, torvalds

On 1/23/24 08:46, Paul E. McKenney wrote:
> Hello!
> 
> On the perhaps unlikely off-chance that any of this is of interest.
> 
> 							Thanx, Paul

On the contrary, I find it quite interesting. I have been in contact 
with both the C and C++ committee.

> ------------------------------------------------------------------------
> 
> List of proposals with clickable links:
> 
> https://www.open-std.org/jtc1/sc22/wg14/www/wg14_document_log
> 
> N3089 _Optional: a type qualifier to indicate pointer nullability
> 	Proposes _Optional to tag pointer parameters such that
> 	dereferencing the pointer without first checking for NULL gets
> 	a compiler warning.
> 	https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3089.pdf
> 
> N3190 Extensions to the preprocessor for C2Y
> 	Proposes a number of macros, including things that return a
> 	count of their arguments.
> 	https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3090.htm

Some of these are *extremely* useful; in fact I believe I asked for some 
of these when I previously contacted one of the C committee members. One 
big motivator is making a size-safe printf().
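
For context, argument counting can already be approximated with today's preprocessor, along these lines. This is a sketch of the well-known trick, not the N3190 facility; the macro names are illustrative and it handles one to five arguments only.

```c
/* Selects the sixth argument; the caller arranges what lands there. */
#define COUNT_ARGS_PICK(_1, _2, _3, _4, _5, N, ...) N

/*
 * The arguments push the descending list 5,4,3,2,1 to the right, so the
 * value that ends up in slot N is the number of arguments passed.
 * Limitation: an empty argument list is reported as 1, not 0.
 */
#define COUNT_ARGS(...) COUNT_ARGS_PICK(__VA_ARGS__, 5, 4, 3, 2, 1, 0)
```

The hard-coded upper bound and the empty-list corner case are the sort of thing a built-in facility would avoid.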

That being said, they are missing some important bits, in particular
#embed needs to be able to be expressed as _Embed() for the same reason
that #pragma has _Pragma(); in fact #embed needs it even more, as it is
something you really want to macroize.

#do and #foreach are mentioned but not defined. I'm wondering how useful 
these are if they can't be macroized themselves. At that point it might 
be better to have a proper macro function language.

> N3194 Case range expressions
> 	No fewer than 421 files in the Linux kernel use the "..." syntax,
> 	as in "case 1 ... 3", but there are other syntaxes...  So they
> 	are proposing "::" instead.  My guess is that "..." won't be
> 	going away anytime soon.

:: would be a disaster for C++ compatibility, and I have a feeling that C
might end up needing to support C++ namespaces or some other mechanism
like that. If ... is unacceptable, .. or [foo,bar] would be better.
:: is also inconsistent with the range syntax for initializers, if that
is standard (I don't remember.)
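
For reference, here are the two existing "..." forms side by side. As far as I know both are GNU extensions accepted by GCC and Clang rather than standard C; classify(), is_small[] and small() are made-up names.

```c
/* Case ranges: both bounds are inclusive. */
int classify(int c)
{
    switch (c) {
    case 0 ... 9:
        return 1;
    case 10 ... 99:
        return 2;
    default:
        return 0;
    }
}

/* Range designator in an initializer: elements 0..3 are set to 1. */
static const unsigned char is_small[8] = { [0 ... 3] = 1 };

int small(int i)
{
    return is_small[i & 7];   /* mask keeps the index inside the array */
}
```

Note the spaces around "..."; without them "0...9" does not tokenize as intended.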

> N3195 Named loops
> 	Placing a goto label before a loop allows a break/continue to
> 	target that loop in case of nesting.

... which so many languages already support as an extension.
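
In C today the usual stand-in is a goto to a label after the outer loop. A small sketch (first_negative_row() is a made-up function; with named loops the goto would presumably become a break that names the outer loop):

```c
/* Returns the index of the first row containing a negative entry, or -1. */
int first_negative_row(const int grid[][3], int rows)
{
    int found = -1;

    for (int r = 0; r < rows; r++) {
        for (int c = 0; c < 3; c++) {
            if (grid[r][c] < 0) {
                found = r;
                goto done;    /* leaves both loops at once */
            }
        }
    }
done:
    return found;
}
```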

> n3203 Strict order of expression evaluation
> 	I do like it.  The 1980s were over a long time ago.

The question is: is this going to wreak havoc with performance? The C++
reference implies it won't, though.

> N3199 Improved __attribute__((cleanup)) Through defer
> N3198 Conditionally Supported Unwinding
> 	The Linux kernel is starting to use __attribute__((cleanup))
> 	via guard(), with 40 files making use of this.	It is not clear
> 	to me whether or not either of these proposals would be useful
> 	to the Linux kernel.
> 
> N3201 Operator Overloading Without Name Mangling v2
> 	I have seen Linux-kernel interest in *function* overloading, but
> 	not in operator overloading.  Nevertheless...
> 
> 	The trick here is to associate a given operator with a function,
> 	so that the name-mangling becomes essentially a manual operation.

It's kind of odd. It feels a bit like doing C++ backwards...
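
For comparison, this is how "manual name mangling" looks for *function* overloading in standard C11 using _Generic. To be clear, this is not the N3201 syntax, just the existing mechanism that comes closest; add_int(), add_double() and add() are made-up names.

```c
/* The "mangled" names are written out by hand. */
static int add_int(int a, int b)
{
    return a + b;
}

static double add_double(double a, double b)
{
    return a + b;
}

/* Dispatch on the type of the first argument at compile time. */
#define add(a, b) _Generic((a), int: add_int, double: add_double)((a), (b))
```

Any type not listed in the _Generic association list is a compile-time error, so the set of overloads is closed.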

Thanks for the heads-up. I think I'm going to reach out and chat with 
these folks.

	-hpa


* Re: A few proposals from the C standards committee
  2024-01-23 20:00   ` Paul E. McKenney
@ 2024-01-23 20:20     ` Linus Torvalds
  2024-01-23 20:35       ` Jakub Jelinek
                         ` (2 more replies)
  2024-01-23 20:39     ` Linus Torvalds
  1 sibling, 3 replies; 19+ messages in thread
From: Linus Torvalds @ 2024-01-23 20:20 UTC (permalink / raw)
  To: paulmck; +Cc: linux-toolchains, peterz, hpa, rostedt, gregkh, keescook

On Tue, 23 Jan 2024 at 12:00, Paul E. McKenney <paulmck@kernel.org> wrote:
>
> Would you be OK with something that required a new variable for the
> pointer that was now known not to be NULL?  (My guess is "no", given the
> following discussion on value ranges, but I figured that I should ask.)

Yeah, no, I think that ends up putting the burden on the programmer in
the form of a very cumbersome syntax, and just more room for mistakes.

> In some implementations, you can use assertions to get at least part
> of this effect:

Yes. However, the problem with that is that the assert generally then
comes with extra code generation.

IOW, a plain

          _Nonnull p;

in my opinion should imply a promise by the developer - and then you
could have some "debug build" model where the compiler then verifies
the promises.

But an

        assert(p);

implies more than a promise by the developer - it implies that the
compiler *should* generate some code to verify.

And yes, obviously assert() comes with the traditional NDEBUG flag,
but that one has the historical baggage of causing the assert() to be
a no-op. IOW, you lose the code generation, but you also lose the
promise from the developer.

Could all of this be done *properly*? Yes. And I think it should. But
properly literally means having good documented "this is what this
means".

And no, __builtin_unreachable() is not it either, because it again has
the same issue as "assert()" - in *practice* compilers can use it as a
hint, but that's an incidental result, not part of a documented "this
is how you specify a known range"

So yes, I can do things like

        if (a < 0) __builtin_unreachable();

and it will generate the *code* that I want, but it sure as hell isn't
some standard C syntax.
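
To make the "promise" versus "check" distinction concrete, here is a sketch of a macro that is verified in a debug build and is a pure promise otherwise. PROMISE, DEBUG_PROMISES and deref_checked() are hypothetical names, the non-debug branch relies on the GCC/Clang builtin, and none of this is standard C.

```c
#include <stdio.h>
#include <stdlib.h>

#ifdef DEBUG_PROMISES
/* Debug build: verify the developer's promise at run time. */
#define PROMISE(cond)                                              \
    do {                                                           \
        if (!(cond)) {                                             \
            fprintf(stderr, "broken promise: %s\n", #cond);        \
            abort();                                               \
        }                                                          \
    } while (0)
#else
/* Normal build: keep the promise as optimizer information only. */
#define PROMISE(cond)                                              \
    do {                                                           \
        if (!(cond))                                               \
            __builtin_unreachable();                               \
    } while (0)
#endif

int deref_checked(int *p)
{
    PROMISE(p != NULL);
    return *p;
}
```

Unlike assert() under NDEBUG, the non-debug variant does not throw the promise away.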

               Linus


* Re: A few proposals from the C standards committee
  2024-01-23 20:16 ` H. Peter Anvin
@ 2024-01-23 20:24   ` Linus Torvalds
  2024-01-24 14:58     ` Paul E. McKenney
  2024-01-25 12:52   ` Paul E. McKenney
  1 sibling, 1 reply; 19+ messages in thread
From: Linus Torvalds @ 2024-01-23 20:24 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: paulmck, linux-toolchains, peterz, rostedt, gregkh, keescook

On Tue, 23 Jan 2024 at 12:19, H. Peter Anvin <hpa@zytor.com> wrote:
>
> > n3203 Strict order of expression evaluation
> >       I do like it.  The 1980s were over a long time ago.
>
> The question is: is this going to wreak havoc with performance? The C++
> reference implies it won't, though.

Well, they also had numbers from an actual implementation showing that
it didn't (ie "win some, lose some").

The "ordering is undefined" is, I think, almost entirely an effect of
"compilers weren't that smart, and implementations differed".

So I'd love for sequence points to go away. They are one of the more
subtle parts of C, and I do not believe that they have any real
advantage any more.

(And by "go away" I obviously mean "everything is a sequence point",
not "nothing is a sequence point" - so they'd go away as a concept,
because they'd become a non-issue).
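
A small illustration of what is at stake (the function names are made up): in current C the two calls in the first function may run in either order, so portable code has to spell the order out as in the second.

```c
static int counter;

/* Side-effecting helper: returns 1, 2, 3, ... on successive calls. */
static int next_id(void)
{
    return ++counter;
}

int diff_unspecified(void)
{
    counter = 0;
    /* Operand evaluation order is unspecified: the result is -1 or 1. */
    return next_id() - next_id();
}

int diff_sequenced(void)
{
    counter = 0;
    int a = next_id();    /* the declarations force the order */
    int b = next_id();
    return a - b;         /* always 1 - 2 */
}
```

With strict left-to-right evaluation the first form would be as predictable as the second.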

             Linus


* Re: A few proposals from the C standards committee
  2024-01-23 20:20     ` Linus Torvalds
@ 2024-01-23 20:35       ` Jakub Jelinek
  2024-01-23 20:43         ` Linus Torvalds
  2024-01-24 13:16         ` Paul E. McKenney
  2024-01-23 20:44       ` H. Peter Anvin
  2024-01-24 12:52       ` Paul E. McKenney
  2 siblings, 2 replies; 19+ messages in thread
From: Jakub Jelinek @ 2024-01-23 20:35 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: paulmck, linux-toolchains, peterz, hpa, rostedt, gregkh, keescook

On Tue, Jan 23, 2024 at 12:20:14PM -0800, Linus Torvalds wrote:
> And no, __builtin_unreachable() is not it either, because it again has
> the same issue as "assert()" - in *practice* compilers can use it as a
> hint, but that's an incidental result, not part of a documented "this
> is how you specify a known range"
> 
> So yes, I can do things like
> 
>         if (a < 0) __builtin_unreachable();
> 
> and it will generate the *code* that I want, but it sure as hell isn't
> some standard C syntax.

C++23 has [[assume (condition)]]; for this (see https://wg21.link/p1774r8)
and GCC supports it also as [[gnu::assume (condition)]] and
__attribute__((assume (condition)));, both in C (the former only in C23)
and C++.  Side-effects in condition aren't evaluated, so it has
different behavior from if (!(condition)) __builtin_unreachable ();
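
A sketch of how this could be wrapped for portability. ASSUME and div_by_16() are made-up names; the GCC branch uses the __attribute__((assume)) form described above (GCC 13 or later, to the best of my knowledge), the Clang branch uses its __builtin_assume(), and other compilers simply ignore the hint.

```c
#if defined(__clang__)
#define ASSUME(cond) __builtin_assume(cond)
#elif defined(__GNUC__) && __GNUC__ >= 13
#define ASSUME(cond) __attribute__((assume(cond)))
#else
#define ASSUME(cond) ((void)0)   /* no hint available */
#endif

int div_by_16(int x)
{
    /* Promise that x is non-negative; undefined behavior if it is not. */
    ASSUME(x >= 0);
    return x / 16;   /* may be compiled as a plain shift under the promise */
}
```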

	Jakub



* Re: A few proposals from the C standards committee
  2024-01-23 20:00   ` Paul E. McKenney
  2024-01-23 20:20     ` Linus Torvalds
@ 2024-01-23 20:39     ` Linus Torvalds
  1 sibling, 0 replies; 19+ messages in thread
From: Linus Torvalds @ 2024-01-23 20:39 UTC (permalink / raw)
  To: paulmck; +Cc: linux-toolchains, peterz, hpa, rostedt, gregkh, keescook

On Tue, 23 Jan 2024 at 12:00, Paul E. McKenney <paulmck@kernel.org> wrote:
>
> In some implementations, you can use assertions to get at least part
> of this effect:

Another note: 'assert()' doesn't work in the calling context.

IOW, there is no way to 'assert()' that an incoming variable has a
certain range, unless you start doing strange inline wrapper
functions.

So you can say

       assert(i >= 0 && i <= 5);

to assert a range inside some function, but you can't do that in the
declaration of the function to get a warning if the callers do
something bad.

And that's literally half the whole point of _Nullable and _Nonnull.
You can give the value range description in the declaration of the
function.

The paper actually gives an example of a more powerful syntax, which
is admittedly not pretty, ie that whole

    const char src[static 1]

that says "the argument is a pointer to an array of at least one
character". Yes, the syntax is horrendous, and only works for pointers
which is sad, but it's also an example of a fundamentally more
powerful syntax.
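
For completeness, the "static 1" form in a full function. This part is standard C99; my_strlen() is just an illustrative name.

```c
#include <stddef.h>

/*
 * "static 1" promises that src points to at least one char, so passing
 * NULL breaks the contract and a compiler may warn about it at the call.
 */
size_t my_strlen(const char src[static 1])
{
    size_t n = 0;

    while (src[n] != '\0')
        n++;
    return n;
}
```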

Wouldn't it be lovely to be able to just specify a valid range for
integers too? Other languages have had it. You can actually get some
of that by using enums, though again, that's not *documented*, and
it's more of an "in practice you can use an enum and a compiler might
assume the values are all valid".

            Linus


* Re: A few proposals from the C standards committee
  2024-01-23 20:35       ` Jakub Jelinek
@ 2024-01-23 20:43         ` Linus Torvalds
  2024-01-23 20:46           ` H. Peter Anvin
  2024-01-25 13:00           ` Paul E. McKenney
  2024-01-24 13:16         ` Paul E. McKenney
  1 sibling, 2 replies; 19+ messages in thread
From: Linus Torvalds @ 2024-01-23 20:43 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: paulmck, linux-toolchains, peterz, hpa, rostedt, gregkh, keescook

On Tue, 23 Jan 2024 at 12:36, Jakub Jelinek <jakub@redhat.com> wrote:
>
> C++23 has [[assume (condition)]]; for this (see https://wg21.link/p1774r8)
> and GCC supports it also as [[gnu::assume (condition)]] and
> __attribute__((assume (condition)));, both in C (the former only in C23)
> and C++.  Side-effects in condition aren't evaluated, so it has
> different behavior from if (!(condition)) __builtin_unreachable ();

That's lovely, and exactly the kind of thing I'd think is the right model.

If you can also do it in a function declaration, so that it informs
the caller, it's basically perfect.

IOW, something like

   size_t strlen(const char *s [[assume(s)]]);

would be the equivalent of "const char *_Nonnull s" in that callers
could warn if not true.

Except it also would work for other things, not just NULL pointers.
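
The parameter-level [[assume]] above is hypothetical syntax. The nearest thing that exists today, as far as I know, is the GCC/Clang nonnull function attribute, which puts the promise on the declaration but covers only NULL. A sketch, with count_chars() as a made-up function:

```c
#include <stddef.h>

/* Declaration as a header would carry it: argument 1 must not be NULL. */
__attribute__((nonnull(1)))
size_t count_chars(const char *s);

size_t count_chars(const char *s)
{
    size_t n = 0;

    while (s[n] != '\0')
        n++;
    return n;
}
```

A call such as count_chars(NULL) can then draw a -Wnonnull warning at the call site.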

              Linus


* Re: A few proposals from the C standards committee
  2024-01-23 20:20     ` Linus Torvalds
  2024-01-23 20:35       ` Jakub Jelinek
@ 2024-01-23 20:44       ` H. Peter Anvin
  2024-01-24 12:52       ` Paul E. McKenney
  2 siblings, 0 replies; 19+ messages in thread
From: H. Peter Anvin @ 2024-01-23 20:44 UTC (permalink / raw)
  To: Linus Torvalds, paulmck
  Cc: linux-toolchains, peterz, rostedt, gregkh, keescook

On 1/23/24 12:20, Linus Torvalds wrote:
> 
> Yes. However, the problem with that is that the assert generally then
> comes with extra code generation.
> 
> IOW, a plain
> 
>            _Nonnull p;
> 
> in my opinion should imply a promise by the developer - and then you
> could have some "debug build" model where the compiler then verifies
> the promises.
> 
> But an
> 
>          assert(p);
> 
> implies more than a promise by the developer - it implies that the
> compiler *should* generate some code to verify.
> 
> And yes, obviously assert() comes with the traditional NDEBUG flag,
> but that one has the historical baggage of causing the assert() to be
> a no-op. IOW, you lose the code generation, but you also lose the
> promise from the developer.
> 
> Could all of this be done *properly*? Yes. And I think it should. But
> properly literally means having good documented "this is what this
> means".
> 
> And no, __builtin_unreachable() is not it either, because it again has
> the same issue as "assert()" - in *practice* compilers can use it as a
> hint, but that's an incidental result, not part of a documented "this
> is how you specify a known range"
> 
> So yes, I can do things like
> 
>          if (a < 0) __builtin_unreachable();
> 
> and it will generate the *code* that I want, but it sure as hell isn't
> some standard C syntax.
> 

C++23 adds [[assume(x)]]; which presumably will be "backported" to the C 
standard. This is basically MSVC's __assume() or gcc's 
__attribute__((assume())) which is otherwise exactly equivalent to using 
an if statement or && to invoke unreachable(); the latter has the 
advantage that if you use a macro you can replace unreachable() with 
something else for debugging purposes.

unreachable() is in C23, in <stddef.h>, so if (a < 0) unreachable(); or 
((a < 0) && unreachable()) actually *is* standard C syntax now...
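
A sketch of the if form, with a fallback to the GCC/Clang builtin for pre-C23 builds; half_nonneg() is a made-up name. It sticks to the if statement because unreachable() is specified as a void expression, so I am not sure the && form compiles as written.

```c
#include <stddef.h>

/* C23 defines unreachable() in <stddef.h>; otherwise use the builtin. */
#ifndef unreachable
#define unreachable() __builtin_unreachable()
#endif

unsigned int half_nonneg(int a)
{
    if (a < 0)
        unreachable();   /* promise: callers never pass a negative value */
    return (unsigned int)a / 2u;
}
```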

	-hpa


* Re: A few proposals from the C standards committee
  2024-01-23 20:43         ` Linus Torvalds
@ 2024-01-23 20:46           ` H. Peter Anvin
  2024-01-24 13:46             ` Paul E. McKenney
  2024-01-25 13:00           ` Paul E. McKenney
  1 sibling, 1 reply; 19+ messages in thread
From: H. Peter Anvin @ 2024-01-23 20:46 UTC (permalink / raw)
  To: Linus Torvalds, Jakub Jelinek
  Cc: paulmck, linux-toolchains, peterz, rostedt, gregkh, keescook

On 1/23/24 12:43, Linus Torvalds wrote:
> On Tue, 23 Jan 2024 at 12:36, Jakub Jelinek <jakub@redhat.com> wrote:
>>
>> C++23 has [[assume (condition)]]; for this (see https://wg21.link/p1774r8)
>> and GCC supports it also as [[gnu::assume (condition)]] and
>> __attribute__((assume (condition)));, both in C (the former only in C23)
>> and C++.  Side-effects in condition aren't evaluated, so it has
>> different behavior from if (!(condition)) __builtin_unreachable ();
> 
> That's lovely, and exactly the kind of thing I'd think is the right model.
> 
> If you can also do it in a function declaration, so that it informs
> the caller, it's basically perfect.
> 
> IOW, something like
> 
>     size_t strlen(const char *s [[assume(s)]]);
> 
> would be the equivalent of "const char *_Nonnull s" in that callers
> could warn if not true.
> 
> Except it also would work for other things, not just NULL pointers.
> 

This would *definitely* be frakking nice.

	-hpa



* Re: A few proposals from the C standards committee
  2024-01-23 18:58 ` Linus Torvalds
  2024-01-23 20:00   ` Paul E. McKenney
@ 2024-01-23 22:35   ` Martin Uecker
  1 sibling, 0 replies; 19+ messages in thread
From: Martin Uecker @ 2024-01-23 22:35 UTC (permalink / raw)
  To: Linus Torvalds, paulmck
  Cc: linux-toolchains, peterz, hpa, rostedt, gregkh, keescook

Am Dienstag, dem 23.01.2024 um 10:58 -0800 schrieb Linus Torvalds:
> I generally like them, but..
> 
> On Tue, 23 Jan 2024 at 08:46, Paul E. McKenney <paulmck@kernel.org> wrote:
> > 
> > N3089 _Optional: a type qualifier to indicate pointer nullability
> >         Proposes _Optional to tag pointer parameters such that
> >         dereferencing the pointer without first checking for NULL gets
> >         a compiler warning.
> >         https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3089.pdf
> 
> This one I also like, but at the same time I'm not convinced "types"
> are the right way to carry this information.
> 
> Because types are historically conceptually static and tied to the
> lifetime of the object.
> 
> But the actual nullability logic must *not* be.
> 
> _Nonnull is fine: if a variable is non-null, it can conceptually never
> become anything else (or rather: it would remain a bug if it did).
> 
> So _Nonnull is a "statement of fact" about the variable, and makes
> sense as a type, and matches the lifetime of the variable.
> 
> But the same is *not* true of _Nullable. The type magically and
> silently changes after a test.


The interesting thing about _Optional is that it is not a qualifier
on the pointer but a qualifier on the target.  So conversion
from _Optional to a regular pointer would give you the diagnostic
via the usual rules for pointer conversion.  The paper then
suggests &*ptr as syntax to transform the pointer with qualifier
into a pointer without qualifier (after a check).  Not sure
the design is perfect, but it seems better than _Nonnull and
_Nullable.
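
A rough sketch of how that reads in code. No mainstream compiler implements _Optional as far as I know, so a hypothetical OPTIONAL macro that expands to nothing stands in for it here; the point is only where the qualifier sits and what the &*ptr idiom looks like. value_or_zero() is a made-up name.

```c
#include <stddef.h>

/* Placeholder for the proposed qualifier; expands to nothing today. */
#define OPTIONAL

/* The qualifier applies to the pointed-to int, like const does here. */
int value_or_zero(const OPTIONAL int *p)
{
    if (p == NULL)
        return 0;

    /* After the check, &*p yields a pointer without the qualifier. */
    const int *q = &*p;
    return *q;
}
```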


Martin




> 
> To make a trivial stupid example of what I mean, something like
> 
>     inline int access(int * _Nonnull p) { return *p; }
>     ...
>     int my_fn(int * _Nullable p)
>     { return p ? access(p) : 0; }
> 
> which is obviously correct, and shouldn't warn for anything, since
> this is literally what the whole thing is designed for.
> 
> But part of that "shouldn't warn" is how a nullable 'p' is effectively
> silently cast to a non-nullable 'p'. The only thing that makes that
> cast valid is the presence of the conditional, but it should be noted
> that from a *type* perspective that is just wrong.
> 
> IOW, normal types are carried along with their variables, but somehow
> the variable 'p' inside the conditional is not really of the same
> type as 'p' outside of it.
> 
> So that conditional has that hidden effect of changing what the type
> of 'p' is in all dependent expressions.
> 
> And I know compilers already effectively implement all this, but I'm
> just saying that from a *type* system standpoint, this is all quite a
> bit illogical.
> 
> In many ways, this is not a type issue, it's really a "value range
> analysis" issue. And I think it should be considered that way, and
> the syntax and the logic be also talked about in those terms.
> 
> Why would "_Nullable" and "_Nonnull" be conceptually any different
> from "I know this value is in the range [0..5]", which is *also*
> something that compilers already do, and that we also might want to be
> able to describe for warning purposes?
> 
> So honestly, I would *love* to be able to give the compiler range
> information (which *includes* the "this is nullable" kind of
> information), but I don't think it should be described as a "type
> qualifier".
> 
> Because what if the nullability is hidden in some called function?
> To give another example - less stupid this time - think of
> something like this:
> 
>     int my_fn(int * _Nullable p)
>     {
>         if (check_validity(p))
>             return -EINVAL;
>         return access(p);
>     }
> 
> where we have perhaps done extensive validity checks on 'p' (think the
> kernel kind of 'access_ok()' function) in the 'check_validity()'
> function, but the compiler doesn't see that function, since it's a
> rather complicated one that does a whole RB-tree lookup etc. So the
> compiler hasn't *seen* that we do a NULL check there.
> 
> So it shouldn't warn, but it will - because the compiler is oblivious
> about the fact that the pointer has actually been checked for a lot
> more than just NULL.
> 
> If you think of this as a "value analysis" issue, rather than as a
> type issue, the solution is obvious: it's not that the type of 'p'
> changes, but you just want a way to tell the compiler "I've done range
> checking, the new range is XYZ".
> 
> And if you think of it that way, you don't want to re-declare a type,
> you want to just update range information, and simply state something
> like
> 
>    _Nonnull p;
> 
> after doing the check_validity() call. IOW, I really think you should
> be able to write something like
> 
>     int my_fn(int * _Nullable p)
>     {
>         if (check_validity(p))
>             return -EINVAL;
>         _Nonnull p;
>         return access(p);
>     }
> 
> See? My argument is basically that I like the _Nullable/_Nonnull
> attributes, but that they shouldn't be seen as part of the *type*
> system, but as a more dynamic value range thing, and that they can -
> and should - be available separately from just the declaration.
> 
>                Linus
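
The "_Nonnull p;" statement above is not existing syntax.  As a rough
sketch of how the same intent can be approximated today, assuming GCC
or clang, the hypothetical ASSUME_NONNULL() macro below wraps
__builtin_unreachable().  The body of check_validity() is a trivial
stand-in for the expensive out-of-line check, and my_fn_checked() is
a made-up name.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical helper: a hint that p is non-null from here on.
 * This is an optimizer hint, not a documented range annotation.
 */
#if defined(__GNUC__) || defined(__clang__)
#define ASSUME_NONNULL(p) do { if (!(p)) __builtin_unreachable(); } while (0)
#else
#define ASSUME_NONNULL(p) ((void)0)
#endif

/* Stand-in for a complicated check; nonzero means "reject p". */
int check_validity(const int *p)
{
	return p == NULL;
}

int my_fn_checked(int *p)
{
	if (check_validity(p))
		return -EINVAL;
	ASSUME_NONNULL(p);	/* plays the role of "_Nonnull p;" */
	return *p;
}
```

This only influences code generation.  Whether a given compiler also
uses it to suppress a nullability warning is not something this
sketch can promise.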


* Re: A few proposals from the C standards committee
  2024-01-23 16:46 A few proposals from the C standards committee Paul E. McKenney
  2024-01-23 18:58 ` Linus Torvalds
  2024-01-23 20:16 ` H. Peter Anvin
@ 2024-01-23 22:39 ` Kees Cook
  2 siblings, 0 replies; 19+ messages in thread
From: Kees Cook @ 2024-01-23 22:39 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-toolchains, peterz, hpa, rostedt, gregkh, ndesaulniers,
	justinstitt, torvalds, linux-hardening

On Tue, Jan 23, 2024 at 08:46:13AM -0800, Paul E. McKenney wrote:
> N3201 Operator Overloading Without Name Mangling v2
https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3201.pdf
> 	I have seen Linux-kernel interest in *function* overloading, but
> 	not in operator overloading.  Nevertheless...
> 
> 	The trick here is to associate a given operator with a function,
> 	so that the name-mangling becomes essentially a manual operation.

The proposal discusses strings, but I would want to immediately use this
for handling wrap vs trap arithmetic (rather than using sanitizers[1]).
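
As a rough sketch of the two behaviors such operators could dispatch
to, assuming the GCC/clang __builtin_add_overflow() builtin: the
function names add_wrap() and add_checked() are made up, and how N3201
would bind "+" to them is not shown, since no compiler is assumed to
implement the proposal.

```c
#include <limits.h>
#include <stdbool.h>

/* Wrapping add: the result is reduced to the range of int, no UB. */
int add_wrap(int a, int b)
{
	int sum;

	(void)__builtin_add_overflow(a, b, &sum);
	return sum;
}

/* Checked add: returns false on overflow so the caller can trap. */
bool add_checked(int a, int b, int *sum)
{
	return !__builtin_add_overflow(a, b, sum);
}
```

A trapping variant would call something like abort() or BUG() where
add_checked() returns false.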

-- 
Kees Cook

* Re: A few proposals from the C standards committee
  2024-01-23 20:20     ` Linus Torvalds
  2024-01-23 20:35       ` Jakub Jelinek
  2024-01-23 20:44       ` H. Peter Anvin
@ 2024-01-24 12:52       ` Paul E. McKenney
  2 siblings, 0 replies; 19+ messages in thread
From: Paul E. McKenney @ 2024-01-24 12:52 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-toolchains, peterz, hpa, rostedt, gregkh, keescook

On Tue, Jan 23, 2024 at 12:20:14PM -0800, Linus Torvalds wrote:

[ . . . ]

> Could all of this be done *properly*? Yes. And I think it should. But
> properly literally means having good documented "this is what this
> means".
> 
> And no, __builtin_unreachable() is not it either, because it again has
> the same issue as "assert()" - in *practice* compilers can use it as a
> hint, but that's an incidental result, not part of a documented "this
> is how you specify a known range"
> 
> So yes, I can do things like
> 
>         if (a < 0) __builtin_unreachable();
> 
> and it will generate the *code* that I want, but it sure as hell isn't
> some standard C syntax.

Agreed, but the fact that it exists is nevertheless valuable.  The reason
is that it is *way* easier to get something into the C standard if the
major implementations already do something supporting it.  Given that
GCC and clang do __builtin_unreachable() and the Microsoft compiler
has __assume() [1], there is hope.  Plus as Jakub noted, C++23 has
[[assume()]], which provides even more hope, especially given that C
and C++ put some work into maintaining basic compatibility.

That said, the compilers are likely to continue taking value ranges as
a suggestion for optimization, especially at low optimization levels.
Nevertheless, over time optimizers will continue to become more capable
for both good and ill.  :-/
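
For illustration, the vendor hints mentioned above can be hidden
behind one macro.  This is only a sketch: ASSUME() and eighth() are
made-up names, and the compiler is free to ignore the hint.

```c
/* Hypothetical ASSUME(): maps to whichever vendor hint is available. */
#if defined(_MSC_VER) && !defined(__clang__)
#define ASSUME(cond) __assume(cond)
#elif defined(__GNUC__) || defined(__clang__)
#define ASSUME(cond) do { if (!(cond)) __builtin_unreachable(); } while (0)
#else
#define ASSUME(cond) ((void)0)
#endif

/*
 * Divides by 8.  With the hint, the compiler may emit a plain shift,
 * because it no longer has to round negative values toward zero.
 */
int eighth(int a)
{
	ASSUME(a >= 0);
	return a / 8;
}
```

Calling eighth() with a negative argument is undefined behavior once
the hint is in place.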

							Thanx, Paul

[1] https://learn.microsoft.com/en-us/cpp/intrinsics/assume?view=msvc-170

* Re: A few proposals from the C standards committee
  2024-01-23 20:35       ` Jakub Jelinek
  2024-01-23 20:43         ` Linus Torvalds
@ 2024-01-24 13:16         ` Paul E. McKenney
  1 sibling, 0 replies; 19+ messages in thread
From: Paul E. McKenney @ 2024-01-24 13:16 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Linus Torvalds, linux-toolchains, peterz, hpa, rostedt, gregkh,
	keescook

On Tue, Jan 23, 2024 at 09:35:46PM +0100, Jakub Jelinek wrote:
> On Tue, Jan 23, 2024 at 12:20:14PM -0800, Linus Torvalds wrote:
> > And no, __builtin_unreachable() is not it either, because it again has
> > the same issue as "assert()" - in *practice* compilers can use it as a
> > hint, but that's an incidental result, not part of a documented "this
> > is how you specify a known range"
> > 
> > So yes, I can do things like
> > 
> >         if (a < 0) __builtin_unreachable();
> > 
> > and it will generate the *code* that I want, but it sure as hell isn't
> > some standard C syntax.
> 
> C++23 has [[assume (condition)]]; for this (see https://wg21.link/p1774r8)
> and GCC supports it also as [[gnu::assume (condition)]] and
> __attribute__((assume (condition)));, both in C (the former only in C23)
> and C++.  Side-effects in condition aren't evaluated, so it has
> different behavior from if (!(condition)) __builtin_unreachable ();

The lack of side effects is quite nice, thank you for the pointer!

This P1774R8 proposal explicitly calls out the possibility of assumptions
propagating backwards, so there might also need to be a C-language
counterpart to std::observable() [1] to block such propagation.

In addition, if the [[assume()]] expression contains undefined behavior,
that is explicitly allowed to "leak" out of that expression.  So something
like [[assume(*p)]] implies [[assume(p && *p)]] and something like
[[assume(i / j)]] implies [[assume(j && i / j)]].  If this caused a
problem, one alternative is to instead write [[assume(!p || *p)]] on
the one hand or [[assume(!j || i / j)]] on the other.
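
A small sketch of both points, assuming GCC 13 or later for the
__attribute__((assume())) spelling that Jakub mentioned.  On other
compilers the hypothetical ASSUME() macro below is a no-op
placeholder, and __builtin_unreachable() still requires GCC or clang.
All function names are made up.

```c
#include <stddef.h>

#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ >= 13
#define ASSUME(cond) __attribute__((assume(cond)))
#else
#define ASSUME(cond) ((void)0)	/* placeholder: no hint at all */
#endif

int calls;	/* counts how often bump() actually runs */

int bump(void)
{
	return ++calls;
}

/* The condition is evaluated, so bump() runs once. */
int via_unreachable(void)
{
	calls = 0;
	if (!(bump() > 0))
		__builtin_unreachable();
	return calls;
}

/* The condition is not evaluated, so bump() does not run. */
int via_assume(void)
{
	calls = 0;
	ASSUME(bump() > 0);
	return calls;
}

/*
 * Guarded form: promises "*p is nonzero" only when p is non-null.
 * An unguarded ASSUME(*p) would also promise that p is non-null,
 * which would let the compiler drop the "p ?" test below.
 */
int first_nonzero_or(const int *p, int fallback)
{
	ASSUME(!p || *p);
	return p ? *p : fallback;
}
```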

Thoughts?

						Thanx, Paul

[1] https://wg21.link/p1494r2

* Re: A few proposals from the C standards committee
  2024-01-23 20:46           ` H. Peter Anvin
@ 2024-01-24 13:46             ` Paul E. McKenney
  0 siblings, 0 replies; 19+ messages in thread
From: Paul E. McKenney @ 2024-01-24 13:46 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Jakub Jelinek, linux-toolchains, peterz, rostedt,
	gregkh, keescook

On Tue, Jan 23, 2024 at 12:46:31PM -0800, H. Peter Anvin wrote:
> On 1/23/24 12:43, Linus Torvalds wrote:
> > On Tue, 23 Jan 2024 at 12:36, Jakub Jelinek <jakub@redhat.com> wrote:
> > > 
> > > C++23 has [[assume (condition)]]; for this (see https://wg21.link/p1774r8)
> > > and GCC supports it also as [[gnu::assume (condition)]] and
> > > __attribute__((assume (condition)));, both in C (the former only in C23)
> > > and C++.  Side-effects in condition aren't evaluated, so it has
> > > different behavior from if (!(condition)) __builtin_unreachable ();
> > 
> > That's lovely, and exactly the kind of thing I'd think is the right model.
> > 
> > If you can also do it in a function declaration, so that it informs
> > the caller, it's basically perfect.
> > 
> > IOW, something like
> > 
> >     size_t strlen(const char *s [[assume(s)]]);
> > 
> > would be the equivalent of "const char *_Nonnull s" in that callers
> > could warn if not true.
> > 
> > Except it also would work for other things, not just NULL pointers.
> 
> This would *definitely* be frakking nice.

It would be!!!

Sadly, if I am reading Section 4.9 correctly, https://wg21.link/p1774r8
proposes permitting [[assume()]] only as a statement, which would rule
out appending it to a formal-parameter declaration.  Their reason is lack
of existing practice: for example, you cannot place __builtin_assume()
in a formal parameter list.  Compare this:

https://godbolt.org/z/hjrzfsxjv

To this:

https://godbolt.org/z/8zdfzsjxe

So if we want this to be added to the standard, we need to convince the
compilers to allow it, then we could propose it.

But couldn't we get the same behavior using static inline functions?

size_t strlen_func(const char *s);
static inline size_t strlen(const char *s)
{
	[[assume(s)]];
	return strlen_func(s);
}

The current intrinsics do seem to support this approach:

https://godbolt.org/z/fEjcaorPP (GCC)
https://godbolt.org/z/We8vv47v3 (clang)
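
A self-contained variant of the wrapper idea, as a sketch only.  It
assumes GCC or clang and uses __builtin_unreachable() in place of
[[assume()]] so that it builds with older compilers.  The names
my_strlen_impl(), my_strlen() and len_or_zero() are made up to avoid
clashing with the C library.

```c
#include <stddef.h>

#define ASSUME(cond) do { if (!(cond)) __builtin_unreachable(); } while (0)

/* Out-of-line implementation. */
size_t my_strlen_impl(const char *s)
{
	size_t n = 0;

	while (s[n])
		n++;
	return n;
}

/* Inline wrapper, so the assumption is visible at each call site. */
static inline size_t my_strlen(const char *s)
{
	ASSUME(s);
	return my_strlen_impl(s);
}

/*
 * Because my_strlen() is inlined here, the compiler may conclude
 * that s is non-null and optimize away the "s ?" test.
 */
size_t len_or_zero(const char *s)
{
	size_t n = my_strlen(s);

	return s ? n : 0;
}
```

This gives the optimizer the information, but it does not by itself
produce the caller-side warning that a declaration-level annotation
could.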

Thoughts?

							Thanx, Paul

* Re: A few proposals from the C standards committee
  2024-01-23 20:24   ` Linus Torvalds
@ 2024-01-24 14:58     ` Paul E. McKenney
  0 siblings, 0 replies; 19+ messages in thread
From: Paul E. McKenney @ 2024-01-24 14:58 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: H. Peter Anvin, linux-toolchains, peterz, rostedt, gregkh,
	keescook

On Tue, Jan 23, 2024 at 12:24:44PM -0800, Linus Torvalds wrote:
> On Tue, 23 Jan 2024 at 12:19, H. Peter Anvin <hpa@zytor.com> wrote:
> >
> > > n3203 Strict order of expression evaluation
> > >       I do like it.  The 1980s were over a long time ago.
> >
> > The question is: is this going to wreak havoc with performance? The C++
> > reference implies it won't, though.
> 
> Well, they also had numbers from an actual implementation showing that
> it didn't (ie "win some, lose some").
> 
> The "ordering is undefined" is, I think, almost entirely an effect of
> "compilers weren't that smart, and implementations differed".
> 
> So I'd love for sequence points to go away. They are one of the more
> subtle parts of C, and I do not believe that they have any real
> advantage any more.
> 
> (And by "go away" I obviously mean "everything is a sequence point",
> not "nothing is a sequence point" - so they'd go away as a concept,
> because they'd become a non-issue).

Agreed, and here is hoping!
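
For anyone who has not been bitten by this, a small sketch of the
current situation (all names made up).  The two calls in
sum_unspecified_order() may run in either order today, while separate
statements pin the order regardless of the compiler.

```c
int order[2];	/* records which function ran first */
int pos;

int left(void)
{
	order[pos++] = 1;
	return 10;
}

int right(void)
{
	order[pos++] = 2;
	return 20;
}

/* The compiler may call left() and right() in either order. */
int sum_unspecified_order(void)
{
	pos = 0;
	return left() + right();
}

/* Separate statements: left() always runs before right(). */
int sum_ordered(void)
{
	int l, r;

	pos = 0;
	l = left();
	r = right();
	return l + r;
}
```

Both functions return 30.  Only the contents of order[] can differ,
and only for sum_unspecified_order().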

							Thanx, Paul

* Re: A few proposals from the C standards committee
  2024-01-23 20:16 ` H. Peter Anvin
  2024-01-23 20:24   ` Linus Torvalds
@ 2024-01-25 12:52   ` Paul E. McKenney
  1 sibling, 0 replies; 19+ messages in thread
From: Paul E. McKenney @ 2024-01-25 12:52 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: linux-toolchains, peterz, rostedt, gregkh, keescook, torvalds

On Tue, Jan 23, 2024 at 12:16:37PM -0800, H. Peter Anvin wrote:
> On 1/23/24 08:46, Paul E. McKenney wrote:

[ . . . ]

> > N3201 Operator Overloading Without Name Mangling v2
> > 	I have seen Linux-kernel interest in *function* overloading, but
> > 	not in operator overloading.  Nevertheless...
> > 
> > 	The trick here is to associate a given operator with a function,
> > 	so that the name-mangling becomes essentially a manual operation.
> 
> It's kind of odd. It feels a bit like doing C++ backwards...
> 
> Thanks for the heads-up. I think I'm going to reach out and chat with these
> folks.

In the discussion, the clang "__attribute((overloadable))" came up.

Does that do what you want?  If so, perhaps GCC can be persuaded to add it.

							Thanx, Paul

* Re: A few proposals from the C standards committee
  2024-01-23 20:43         ` Linus Torvalds
  2024-01-23 20:46           ` H. Peter Anvin
@ 2024-01-25 13:00           ` Paul E. McKenney
  1 sibling, 0 replies; 19+ messages in thread
From: Paul E. McKenney @ 2024-01-25 13:00 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jakub Jelinek, linux-toolchains, peterz, hpa, rostedt, gregkh,
	keescook

On Tue, Jan 23, 2024 at 12:43:02PM -0800, Linus Torvalds wrote:
> On Tue, 23 Jan 2024 at 12:36, Jakub Jelinek <jakub@redhat.com> wrote:
> >
> > C++23 has [[assume (condition)]]; for this (see https://wg21.link/p1774r8)
> > and GCC supports it also as [[gnu::assume (condition)]] and
> > __attribute__((assume (condition)));, both in C (the former only in C23)
> > and C++.  Side-effects in condition aren't evaluated, so it has
> > different behavior from if (!(condition)) __builtin_unreachable ();
> 
> That's lovely, and exactly the kind of thing I'd think is the right model.
> 
> If you can also do it in a function declaration, so that it informs
> the caller, it's basically perfect.
> 
> IOW, something like
> 
>    size_t strlen(const char *s [[assume(s)]]);
> 
> would be the equivalent of "const char *_Nonnull s" in that callers
> could warn if not true.
> 
> Except it also would work for other things, not just NULL pointers.

None of the current compilers support this, but it should not be
hard to mechanically transform this to the form using static inlines,
presumably with a made-up name for one level or the other </handwaving>.
(Especially easy for all concerned if someone other than me does it,
of course...)

However, the possibility of pointers to these functions means that I must
ask if this assume() is part of the type.  There are a lot of reasons
to *not* want it to be part of the type, but that would mean that calls
through pointers would ignore that assume().
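
To make the function-pointer concern concrete, here is a sketch with
made-up names, assuming GCC or clang for __builtin_unreachable().
The wrapper and the out-of-line function have the same type, so
nothing in the pointer type records the assumption.

```c
#include <stddef.h>

#define ASSUME(cond) do { if (!(cond)) __builtin_unreachable(); } while (0)

/* Out-of-line implementation, carrying no assumption of its own. */
size_t count_chars_impl(const char *s)
{
	size_t n = 0;

	while (s[n])
		n++;
	return n;
}

/* Inline wrapper that states the assumption for direct callers. */
static inline size_t count_chars(const char *s)
{
	ASSUME(s);
	return count_chars_impl(s);
}

typedef size_t (*count_fn)(const char *);

/*
 * Either function can be passed as fn.  At this call site the
 * compiler sees only the pointer type, so no ASSUME(s) applies.
 */
size_t call_through(count_fn fn, const char *s)
{
	return fn(s);
}
```

A caller can pass either count_chars or count_chars_impl to
call_through(), and the compiler has no type-level way to tell the
two apart.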

Thoughts?

							Thanx, Paul

Thread overview: 19+ messages
2024-01-23 16:46 A few proposals from the C standards committee Paul E. McKenney
2024-01-23 18:58 ` Linus Torvalds
2024-01-23 20:00   ` Paul E. McKenney
2024-01-23 20:20     ` Linus Torvalds
2024-01-23 20:35       ` Jakub Jelinek
2024-01-23 20:43         ` Linus Torvalds
2024-01-23 20:46           ` H. Peter Anvin
2024-01-24 13:46             ` Paul E. McKenney
2024-01-25 13:00           ` Paul E. McKenney
2024-01-24 13:16         ` Paul E. McKenney
2024-01-23 20:44       ` H. Peter Anvin
2024-01-24 12:52       ` Paul E. McKenney
2024-01-23 20:39     ` Linus Torvalds
2024-01-23 22:35   ` Martin Uecker
2024-01-23 20:16 ` H. Peter Anvin
2024-01-23 20:24   ` Linus Torvalds
2024-01-24 14:58     ` Paul E. McKenney
2024-01-25 12:52   ` Paul E. McKenney
2024-01-23 22:39 ` Kees Cook
