public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: "Arsen Arsenović" <arsen@aarsen.me>
To: Andrew Pinski <pinskia@gmail.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>,
	David Howells <dhowells@redhat.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 00/45] C++: Convert the kernel to C++
Date: Thu, 11 Jan 2024 22:01:51 +0100	[thread overview]
Message-ID: <864jfjr29j.fsf@aarsen.me> (raw)
In-Reply-To: <CA+=Sn1n18OmcDzDrZky_KqMW76ttTLGnMYnH5CPg7m9SXor1qQ@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 12737 bytes --]


Andrew Pinski <pinskia@gmail.com> writes:

> On Tue, Jan 9, 2024 at 11:57 AM H. Peter Anvin <hpa@zytor.com> wrote:
>>
>> Hi all, I'm going to stir the hornet's nest and make what has become the
>> ultimate sacrilege.
>>
>> Andrew Pinski recently made aware of this thread. I realize it was
>> released on April 1, 2018, and either was a joke or might have been
>> taken as one. However, I think there is validity to it, and I'm going to
>> try to motivate my opinion here.
>>
>> Both C and C++ has had a lot of development since 1999, and C++ has in
>> fact, in my personal opinion, finally "grown up" to be a better C for
>> the kind of embedded programming that an OS kernel epitomizes. I'm
>> saying that as the author of a very large number of macro and inline
>> assembly hacks in the kernel.
>>
>> What really makes me say that is that a lot of things we have recently
>> asked for gcc-specific extensions are in fact relatively easy to
>> implement in standard C++ and, in many cases, allows for infrastructure
>> improvement *without* global code changes (see below.)
>>
>> C++14 is in my option the "minimum" version that has reasonable
>> metaprogramming support has most of it without the type hell of earlier
>> versions (C++11 had most of it, but C++14 fills in some key missing pieces).
>>
>> However C++20 is really the main game changer in my opinion; although
>> earlier versions could play a lot of SFINAE hacks they also gave
>> absolutely useless barf as error messages. C++20 adds concepts, which
>> makes it possible to actually get reasonable errors.
>>
>> We do a lot of metaprogramming in the Linux kernel, implemented with
>> some often truly hideous macro hacks. These are also virtually
>> impossible to debug. Consider the uaccess.h type hacks, some of which I
>> designed and wrote. In C++, the various casts and case statements can be
>> unwound into separate template instances, and with some cleverness can
>> also strictly enforce things like user space vs kernel space pointers as
>> well as already-verified versus unverified user space pointers, not to
>> mention easily handle the case of 32-bit user space types in a 64-bit
>> kernel and make endianness conversion enforceable.
>>
>> Now, "why not Rust"? First of all, Rust uses a different (often, in my
>> opinion, gratuitously so) syntax, and not only would all the kernel
>> developers need to become intimately familiar to the level of getting
>> the same kind of "feel" as we have for C, but converting C code to Rust
>> isn't something that can be done piecemeal, whereas with some cleanups
>> the existing C code can be compiled as C++.
>>
>> However, I find that I disagree with some of David's conclusions; in
>> fact I believe David is unnecessarily *pessimistic* at least given
>> modern C++.
>>
>> Note that no one in their sane mind would expect to use all the features
>> of C++. Just like we have "kernel C" (currently a subset of C11 with a
>> relatively large set of allowed compiler-specific extensions) we would
>> have "kernel C++", which I would suggest to be a strictly defined subset
>> of C++20 combined with a similar set of compiler extensions.) I realize
>> C++20 compiler support is still very new for obvious reasons, so at
>> least some of this is forward looking.
>>
>> So, notes on this specific subset based on David's comments.
>>
>> On 4/1/18 13:40, David Howells wrote:
>> >
>> > Here are a series of patches to start converting the kernel to C++.  It
>> > requires g++ v8.
>> >
>> > What rocks:
>> >
>> >   (1) Inline template functions, which makes implementation of things like
>> >       cmpxchg() and get_user() much cleaner.
>>
>> Much, much cleaner indeed. But it also allows for introducing things
>> like inline patching of immediates *without* having to change literally
>> every instance of a variable.
>>
>> I wrote, in fact, such a patchset. It probably included the most awful
>> assembly hacks I have ever done, in order to implement the mechanics,
>> but what *really* made me give up on it was the fact that every site
>> where a patchable variable is invoked would have to be changed from, say:
>>
>>         foo = bar + some_init_offset;
>>
>> ... to ...
>>
>>         foo = imm_add(bar, some_init_offset);
>>
>>
>> >   (2) Inline overloaded functions, which makes implementation of things like
>> >       static_branch_likely() cleaner.
>>
>> Basically a subset of the above (it just means that for a specific set
>> of very common cases it isn't necessary to go all the way to using
>> templates, which makes the syntax nicer.)
>>
>> >   (3) Class inheritance.  For instance, all those inode wrappers that require
>> >       the base inode struct to be included and that has to be accessed with
>> >       something like:
>> >
>> >       inode->vfs_inode.i_mtime
>> >
>> >       when you could instead do:
>> >
>> >       inode->i_mtime
>>
>> This is nice, but it is fundamentally syntactic sugar. Similar things
>> can be done with anonymous structures, *except* that C doesn't allow
>> another structure to be anonymously included; you have to have an
>> entirely new "struct" statement defining all the fields. Welcome to
>> macro hell.
>>
>> > What I would disallow:
>> >
>> >   (1) new and delete.  There's no way to pass GFP_* flags in.
>>
>> Yes, there is.
>>
>> void * operator new (size_t count, gfp_flags_t flags);
>> void operator delete(void *ptr, ...whatever kfree/vfree/etc need, or a
>> suitable flag);
>>
>> >   (2) Constructors and destructors.  Nests of implicit code makes the code less
>> >       obvious, and the replacement of static initialisation with constructor
>> >       calls would make the code size larger.
>>
>> Yes and no. It also makes it *way* easier to convert to and from using
>> dedicated slabs; we already use semi-initialized slabs for some kinds of
>> objects, but it requires new code to make use of.
>>
>> We already *do* use constructors and *especially* destructors for a lot
>> of objects, we just call them out.
>>
>> Note that modern C++ also has the ability to construct and destruct
>> objects in-place, so allocation and construction/destruction aren't
>> necessarily related.
>>
>> There is no reason you can't do static initialization where possible;
>> even constructors can be evaluated at compile time if they are constexpr.
>>
>> Constructors (and destructors, for modules) in conjunction with gcc's
>> init_priority() extension is also a nice replacement for linker hack
>> tables to invoke intializer functions.
>>
>> >   (3) Exceptions and RTTI.  RTTI would bulk the kernel up too much and
>> >       exception handling is limited without it, and since destructors are not
>> >       allowed, you still have to manually clean up after an error.
>>
>> Agreed here, especially since on many platforms exception handling
>> relies on DWARF unwind information.
>
> Let me just add a few things about exceptions and RTTI.
> In the darwin kernel, C++ is used for device drivers and both
> exceptions and RTTI is not used there either. They have been using C++
> for kernel drivers since the early 2000s even.
> You can find out more at https://developer.apple.com/documentation/driverkit .
> There even was a GCC option added an option which would also disable
> RTTI and change the ABI to explicitly for the kernel.
> -fapple-kext/-mkernel (the former is for only loadable modules while
> the latter is for kernel too).
>
> Note even in GCC, we disable exceptions and RTTI while building GCC.
> This is specifically due to not wanting to use them and use other
> methods to do that.
> Note GDB on the other hand used to use setjmp/longjmp for their
> exception handling in C and I think they moved over to using C++
> exceptions which simplified things there. But as far as I know the
> Linux kernel does not use a mechanism like that (I know of copy
> from/to user using HW exceptions/error/interrupt handling but that is
> a special case only).
>
>
>>
>> >   (4) Operator overloading (except in special cases).
>>
>> See the example of inline patching above. But yes, overloading and
>> *especially* operator overloading should be used only with care; this is
>> pretty much true across the board.
>>
>> >   (5) Function overloading (except in special inline cases).
>>
>> I think we might find non-inline cases where it matters, too.
>>
>> >   (6) STL (though some type trait bits are needed to replace __builtins that
>> >       don't exist in g++).
>>
>> Just like there are parts of the C library which is really about the
>> compiler and not part of the library. <type_traits> is part of that for C++.
>
> There is an idea of a free standing C++ library. newer versions of
> GCC/libstdc++ does support that but IIRC can only be configured at
> compile time of GCC.
> type_traits and a few other headers are included in that. I have not
> looked into it fully though.

There is, and it's quite extensive (and I plan on extending it further
in GCC 15, if I get the chance to).  The full list of headers libstdc++
exports for freestanding use is a bit larger than the standard one:
https://gcc.gnu.org/cgit/gcc/tree/libstdc++-v3/include/Makefile.am#n28

(note that some are partially supported.. I lack a full list of which)

Most (actually, nearly all) of the libstdc++ code works for kernel
environments, and it is very mature and well-tested, so it can and
should be used by kernels too.  I haven't fully enabled using it in such
a manner yet, but

We could handle the kernel specific configuration via a multilib or so
(so, the multilib list becomes 32, 64, x32, and a new k64 or so on
amd64).  Presumably, something like that could be done for libgcc too?

It is not necessarily only configurable at build-time, but the libstdc++
configuration augmented by -ffreestanding and the one generated by a
'proper' freestanding build of libstdc++ differ currently.  Maybe they
can be brought together close enough for Linux?

Managarm, which is the kernel I had in mind when working on getting more
freestanding stuff has a dedicated kernel build of GCC, however, so I
didn't test this case much.  I'd like to, sooner or later, consolidate
it into the normal managarm system GCC, as a multilib, but I haven't had
time to do so yet.

In any case, I strongly prefer configuring toolchains 'properly'.

> Thanks,
> Andrew Pinski
>
>>
>> >   (7) 'class', 'private', 'namespace'.
>> >
>> >   (8) 'virtual'.  Don't want virtual base classes, though virtual function
>> >       tables might make operations tables more efficient.
>>
>> Operations tables *are* virtual classes. virtual base classes make sense
>> in a lot of cases, and we de facto use them already.
>>
>> However, Linux also does conversion of polymorphic objects from one type
>> to another -- that is for example how device nodes are implemented.
>> Using this with C++ polymorphism without RTTI does require some
>> compiler-specific hacks, unfortunately.
>>
>> > Issues:
>> >
>> >   (1) Need spaces inserting between strings and symbols.
>>
>> I have to admit I don't really grok this?
>>
>> >   (2) Direct assignment of pointers to/from void* isn't allowed by C++, though
>> >       g++ grudgingly permits it with -fpermissive.  I would imagine that a
>> >       compiler option could easily be added to hide the error entirely.
>>
>> Seriously. It should also enforce that it should be a trivial type.
>> Unfortunately it doesn't look like there is a way to create user-defined
>> implicit conversions from one pointer to another (via a helper class),
>> which otherwise would have had some other nice applications.
>>
>> >   (3) Need gcc v8+ to statically initialise an object of any struct that's not
>> >       really simple (e.g. if it's got an embedded union).
>>
>> Worst case: constexpr constructor.
>>
>> >   (4) Symbol length.  Really need to extern "C" everything to reduce the size
>> >       of the symbols stored in the kernel image.  This shouldn't be a problem
>> >       if out-of-line function overloading isn't permitted.
>>
>> This really would lose arguably the absolutely biggest advantage of C++:
>> type-safe linkage. This is the one reason why Linus actually tried to
>> use C++ in one single version of the kernel in the early days (0.99.14,
>> if I remember correctly.) At that time, g++ was nowhere near mature
>> enough, and it got dropped right away.
>>
>>
>> > So far, it gets as far as compiling init/main.c to a .o file.
>>
>> ;)


--
Arsen Arsenović

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 381 bytes --]

  parent reply	other threads:[~2024-01-11 21:25 UTC|newest]

Thread overview: 104+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-04-01 20:40 [PATCH 00/45] C++: Convert the kernel to C++ David Howells
2018-04-01 20:40 ` [PATCH 01/45] Use UINT_MAX, not -1, to represent an invalid UID, GID or project ID David Howells
2018-04-01 23:04   ` Randy Dunlap
2018-04-01 20:40 ` [PATCH 02/45] Fix exception_enter() return value David Howells
2018-04-05  1:34   ` Sasha Levin
2018-04-01 20:40 ` [PATCH 03/45] Fix loop var in be32_to_cpu_array() and cpu_to_be32_array() David Howells
2018-04-01 20:40 ` [PATCH 04/45] Fix use of ACPI_COMPANION_SET() David Howells
2018-04-01 20:40 ` [PATCH 05/45] C++: Set compilation as C++ for .c files David Howells
2018-04-02  6:10   ` kbuild test robot
2018-04-02  6:10   ` kbuild test robot
2018-04-03 13:16     ` David Howells
2018-04-03 13:27       ` Fengguang Wu
2018-04-10  8:44         ` David Howells
2018-04-10  9:45           ` Fengguang Wu
2018-04-11  1:13             ` Li, Philip
2018-04-01 20:40 ` [PATCH 06/45] C++: Do some basic C++ type definition David Howells
2018-04-02  4:37   ` kbuild test robot
2018-04-02  6:10   ` kbuild test robot
2018-04-01 20:40 ` [PATCH 07/45] C++: Define a header with some C++ type traits for type checking David Howells
2018-04-02  7:00   ` kbuild test robot
2018-04-01 20:41 ` [PATCH 08/45] C++: Implement abs() as an inline template function David Howells
2018-04-01 20:41 ` [PATCH 09/45] C++: x86: Fix the x86 syscall table production for C++ David Howells
2018-04-02  7:57   ` kbuild test robot
2018-04-01 20:41 ` [PATCH 10/45] C++: x86: Turn xchg(), xadd() & co. into inline template functions David Howells
2018-04-01 20:41 ` [PATCH 11/45] C++: x86: Turn cmpxchg() " David Howells
2018-04-01 20:41 ` [PATCH 12/45] C++: x86: Turn cmpxchg_double() " David Howells
2018-04-01 20:41 ` [PATCH 13/45] C++: x86: Turn cmpxchg64() " David Howells
2018-04-01 20:41 ` [PATCH 14/45] C++: x86: Turn put_user(), get_user() " David Howells
2018-04-01 20:41 ` [PATCH 15/45] C++: Need space between string and symbol David Howells
2018-04-01 20:41 ` [PATCH 16/45] C++: Disable VERIFY_OCTAL_PERMISSIONS() for the moment David Howells
2018-04-01 20:41 ` [PATCH 17/45] C++: Turn READ_ONCE(), WRITE_ONCE() & co. into inline template functions David Howells
2018-04-01 20:42 ` [PATCH 18/45] C++: Turn RCU accessors " David Howells
2018-04-01 20:42 ` [PATCH 19/45] C++: Turn ktime_add/sub_ns() " David Howells
2018-04-01 20:42 ` [PATCH 20/45] C++: init/main: Constify pointers David Howells
2018-04-01 20:42 ` [PATCH 21/45] C++: Set the type of atomic64_t to s64 David Howells
2018-04-01 20:42 ` [PATCH 22/45] C++: Define apic_intr_mode after the enum definition, not before David Howells
2018-04-01 20:42 ` [PATCH 23/45] C++: Don't do "extern asmlinkage" David Howells
2018-04-01 20:42 ` [PATCH 24/45] C++: Fix BUILD_BUG_ON_ZERO() David Howells
2018-04-01 20:42 ` [PATCH 25/45] C++: Fix void variables David Howells
2018-04-01 20:42 ` [PATCH 26/45] C++: Can't have variable/member names the same as typedef names David Howells
2018-04-01 20:42 ` [PATCH 27/45] C++: Disable __same_type() for the moment David Howells
2018-04-01 20:43 ` [PATCH 28/45] C++: Move ctx_state enum out of struct context_tracking David Howells
2018-04-01 20:43 ` [PATCH 29/45] C++: Move the print_line_t enum before first use David Howells
2018-04-01 20:43 ` [PATCH 30/45] C++: Include linux/hrtimer.h from linux/timer.h David Howells
2018-04-01 20:43 ` [PATCH 31/45] C++: Avoid using 'compl' and 'and' as names David Howells
2018-04-02  7:57   ` kbuild test robot
2018-04-01 20:43 ` [PATCH 32/45] C++: __to_fd() needs to reduce the size of v for struct fd::flags David Howells
2018-04-01 20:43 ` [PATCH 33/45] C++: Move irqchip_irq_state enum David Howells
2018-04-01 20:43 ` [PATCH 34/45] C++: Fix up use of LIST_POISON* David Howells
2018-04-01 20:43 ` [PATCH 35/45] C++: Fix static_branch_likely/unlikely() David Howells
2018-04-01 20:43 ` [PATCH 36/45] C++: Fix kernfs_type() int->enum David Howells
2018-04-01 20:43 ` [PATCH 37/45] C++: Fix page_zonenum() int->enum David Howells
2018-04-01 20:44 ` [PATCH 38/45] C++: mutex_trylock_recursive_enum() int->enum David Howells
2018-04-01 23:10   ` Randy Dunlap
2018-04-01 20:44 ` [PATCH 39/45] C++: Fix spinlock initialisation David Howells
2018-04-01 20:44 ` [PATCH 40/45] C++: Fix sema_init() David Howells
2018-04-01 20:44 ` [PATCH 41/45] C++: Cast in bitops David Howells
2018-04-02  6:10   ` kbuild test robot
2018-04-01 20:44 ` [PATCH 42/45] C++: Hide C++ keywords David Howells
2018-04-02  7:32   ` kbuild test robot
2018-04-01 20:44 ` [PATCH 43/45] C++: Don't need to declare struct pgd_t after typedef David Howells
2018-04-01 20:44 ` [PATCH 44/45] C++: Can't declare unsized-array in struct cgroup David Howells
2018-04-01 20:44 ` [PATCH 45/45] C++: Move initcall_level_names[] to __initdata section David Howells
2018-04-01 22:20 ` [PATCH 00/45] C++: Convert the kernel to C++ Randy Dunlap
2018-04-02  9:28 ` Vegard Nossum
2024-01-09 19:57 ` H. Peter Anvin
2024-01-09 23:29   ` Andrew Pinski
2024-01-10  0:29     ` David Howells
2024-01-11 21:01     ` Arsen Arsenović [this message]
2024-01-09 23:40   ` David Howells
2024-01-10  7:13     ` Alexey Dobriyan
2024-01-12  2:25     ` H. Peter Anvin
2024-01-12  2:40     ` H. Peter Anvin
2024-01-10  8:58   ` Jiri Slaby
2024-01-10 13:04     ` Neal Gompa
2024-01-10 15:52       ` Jason Gunthorpe
2024-01-10 16:05         ` H. Peter Anvin
2024-01-10 16:25         ` Neal Gompa
2024-01-10 17:57           ` Theodore Ts'o
2024-01-12  2:23             ` H. Peter Anvin
2024-01-12  2:52               ` Kent Overstreet
2024-01-11  8:06       ` Andreas Herrmann
2024-01-10 15:01   ` Michael de Lang
     [not found]     ` <69fe1c0c-b5ec-4031-b719-d9c14742929c@metux.net>
2024-01-12 21:58       ` Michael de Lang
2024-01-11  4:24   ` John Hubbard
2024-01-11  5:09     ` Dave Airlie
2024-01-11 15:24       ` Eric Curtin
2024-01-11 21:37       ` David Laight
2024-01-11 12:39   ` Chris Down
2024-01-11 19:40     ` David Laight
2024-01-24  6:53       ` Jiri Slaby
2024-01-12  2:54   ` H. Peter Anvin
2024-01-12  8:52     ` David Howells
2024-01-12 23:53       ` H. Peter Anvin
2025-02-27 16:44   ` Convert the kernel to C++: Lock-holding class David Howells
2024-01-11 23:09 ` [PATCH 00/45] C++: Convert the kernel to C++ Arsen Arsenović
2024-01-12  9:20   ` David Howells
2024-01-12 21:35     ` Arsen Arsenović
2024-01-12 23:41       ` David Howells
  -- strict thread matches above, loose matches on Subject: below --
2018-04-01 21:32 Alexey Dobriyan
2024-01-11 10:56 Alexey Dobriyan
2024-01-11 10:58 ` Neal Gompa
2024-01-11 11:12   ` Alexey Dobriyan
2024-01-11 12:12   ` Andreas Herrmann

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=864jfjr29j.fsf@aarsen.me \
    --to=arsen@aarsen.me \
    --cc=dhowells@redhat.com \
    --cc=hpa@zytor.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pinskia@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox