linux-arch.vger.kernel.org archive mirror
* Re: [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell"
       [not found] <YdIfz+LMewetSaEB@gmail.com>
@ 2022-01-03 10:11 ` Greg Kroah-Hartman
  2022-01-03 11:12   ` Ingo Molnar
  2022-01-04 14:05   ` [PATCH] per_task: Implement single template to define 'struct task_struct_per_task' fields and offsets Ingo Molnar
  2022-01-03 13:54 ` [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" Kirill A. Shutemov
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 54+ messages in thread
From: Greg Kroah-Hartman @ 2022-01-03 10:11 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, linux-kernel, linux-arch, Andrew Morton,
	Peter Zijlstra, Thomas Gleixner, David S. Miller, Ard Biesheuvel,
	Josh Poimboeuf, Jonathan Corbet, Al Viro

On Sun, Jan 02, 2022 at 10:57:35PM +0100, Ingo Molnar wrote:
> 
> I'm pleased to announce the first public version of my new "Fast Kernel 
> Headers" project that I've been working on since late 2020, which is a 
> comprehensive rework of the Linux kernel's header hierarchy & header 
> dependencies, with the dual goals of:
> 
>  - speeding up the kernel build (both absolute and incremental build times)
> 
>  - decoupling subsystem type & API definitions from each other
> 
> The fast-headers tree consists of over 25 sub-trees internally, spanning 
> over 2,200 commits, which can be found here:
> 
>    git://git.kernel.org/pub/scm/linux/kernel/git/mingo/tip.git master
> 
> As most kernel developers know, there's around ~10,000 main .h headers in 
> the Linux kernel, in the include/ and arch/*/include/ hierarchies. Over the 
> last 30+ years they have grown into a complicated & painful set of 
> cross-dependencies we are affectionately calling 'Dependency Hell'.
> 
> Before going into details about how this tree solves 'dependency hell' 
> exactly, here's the current kernel build performance gain with 
> CONFIG_FAST_HEADERS=y enabled, (and with CONFIG_KALLSYMS_FAST=y enabled as 
> well - see below), using a stock x86 Linux distribution's .config with all 
> modules built into the vmlinux:
> 
>   #
>   # Performance counter stats for 'make -j96 vmlinux' (3 runs):
>   #
>   # (Elapsed time in seconds):
>   #
> 
>   v5.16-rc7:            231.34 +- 0.60 secs, 15.5 builds/hour    # [ vanilla baseline ]
>   -fast-headers-v1:     129.97 +- 0.51 secs, 27.7 builds/hour    # +78.0% improvement
> 
> Or in terms of CPU time utilized:
> 
>   v5.16-rc7:            11,474,982.05 msec cpu-clock   # 49.601 CPUs utilized
>   -fast-headers-v1:      7,100,730.37 msec cpu-clock   # 54.635 CPUs utilized   # +61.6% improvement

Speed up is very impressive, nice job!

> Techniques used by the fast-headers tree to reduce header size & dependencies:
> 
>  - Aggressive decoupling of high level headers from each other, starting
>    with <linux/sched.h>. Since 'struct task_struct' is a union of many
>    subsystems, there's a new "per_task" infrastructure modeled after the
>    per_cpu framework, which creates fields in task_struct without having
>    to modify sched.h or the 'struct task_struct' type:
> 
>             DECLARE_PER_TASK(type, name);
>             ...
>             per_task(current, name) = val;
> 
>    The per_task() facility then seamlessly creates an offset into the
>    task_struct->per_task_area[] array, and uses the asm-offsets.h
>    mechanism to create offsets into it early in the build.
> 
>    There's no runtime overhead disadvantage from using per_task() framework,
>    the generated code is functionally equivalent to types embedded in
>    task_struct.

This is "interesting", but how are you going to keep the
kernel/sched/per_task_area_struct_defs.h and struct task_struct_per_task
definition in sync?  It seems that you manually created this (which is
great for testing), but over the long-term, trying to manually determine
what needs to be done here to keep everything lined up properly is going
to be a major pain.

That issue aside, I took a glance at the tree, and overall it looks like
a lot of nice cleanups.  Most of these can probably go through the
various subsystem trees, after you split them out, for the "major" .h
cleanups.  Is that something you are going to be planning on doing?

thanks,

greg k-h


* Re: [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell"
  2022-01-03 10:11 ` [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" Greg Kroah-Hartman
@ 2022-01-03 11:12   ` Ingo Molnar
  2022-01-03 13:46     ` Greg Kroah-Hartman
                       ` (2 more replies)
  2022-01-04 14:05   ` [PATCH] per_task: Implement single template to define 'struct task_struct_per_task' fields and offsets Ingo Molnar
  1 sibling, 3 replies; 54+ messages in thread
From: Ingo Molnar @ 2022-01-03 11:12 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Linus Torvalds, linux-kernel, linux-arch, Andrew Morton,
	Peter Zijlstra, Thomas Gleixner, David S. Miller, Ard Biesheuvel,
	Josh Poimboeuf, Jonathan Corbet, Al Viro


* Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote:

> > Before going into details about how this tree solves 'dependency hell' 
> > exactly, here's the current kernel build performance gain with 
> > CONFIG_FAST_HEADERS=y enabled, (and with CONFIG_KALLSYMS_FAST=y enabled as 
> > well - see below), using a stock x86 Linux distribution's .config with all 
> > modules built into the vmlinux:
> > 
> >   #
> >   # Performance counter stats for 'make -j96 vmlinux' (3 runs):
> >   #
> >   # (Elapsed time in seconds):
> >   #
> > 
> >   v5.16-rc7:            231.34 +- 0.60 secs, 15.5 builds/hour    # [ vanilla baseline ]
> >   -fast-headers-v1:     129.97 +- 0.51 secs, 27.7 builds/hour    # +78.0% improvement
> > 
> > Or in terms of CPU time utilized:
> > 
> >   v5.16-rc7:            11,474,982.05 msec cpu-clock   # 49.601 CPUs utilized
> >   -fast-headers-v1:      7,100,730.37 msec cpu-clock   # 54.635 CPUs utilized   # +61.6% improvement
> 
> Speed up is very impressive, nice job!

Thanks! :-)

> > Techniques used by the fast-headers tree to reduce header size & dependencies:
> > 
> >  - Aggressive decoupling of high level headers from each other, starting
> >    with <linux/sched.h>. Since 'struct task_struct' is a union of many
> >    subsystems, there's a new "per_task" infrastructure modeled after the
> >    per_cpu framework, which creates fields in task_struct without having
> >    to modify sched.h or the 'struct task_struct' type:
> > 
> >             DECLARE_PER_TASK(type, name);
> >             ...
> >             per_task(current, name) = val;
> > 
> >    The per_task() facility then seamlessly creates an offset into the
> >    task_struct->per_task_area[] array, and uses the asm-offsets.h
> >    mechanism to create offsets into it early in the build.
> > 
> >    There's no runtime overhead disadvantage from using per_task() framework,
> >    the generated code is functionally equivalent to types embedded in
> >    task_struct.
> 
> This is "interesting", but how are you going to keep the 
> kernel/sched/per_task_area_struct_defs.h and struct task_struct_per_task 
> definition in sync?

I have plans to clean this up further - see below - but in general I'd 
*discourage* the embedding of new complex types to task_struct.

In practice, most new task_struct fields are either simple types or 
pointers to structs, which can be added to task_struct without having to 
define a complex type for <linux/sched.h>.

For example here's the list of the last 5 extensions of task_struct, since 
November 2020 - I copy & pasted them out of git log -p include/linux/sched.h:

+       unsigned                        in_eventfd_signal:1;

+       cpumask_t                       *user_cpus_ptr;

+       unsigned int                    saved_state;

+       unsigned long                   saved_state_change;

+       struct bpf_run_ctx              *bpf_ctx;

All of those new fields are either simple C types or struct pointers - none 
of those extensions need per_task() handling per se.

The overall policy to extend task_struct, going forward, would be to:

 - Either make simple-type or struct-pointer additions to task_struct, that 
   don't couple <linux/sched.h> to other subsystems.

 - Or, if you absolutely must - and we don't want to forbid this - use the 
   per_task() machinery to create a simple accessor to a complex embedded 
   type.

> [...]  It seems that you manually created this (which is great for 
> testing), but over the long-term, trying to manually determine what needs 
> to be done here to keep everything lined up properly is going to be a 
> major pain.

Note that under the policy above - and even according to the practice of 
the last ~1.5 years - it should be exceedingly rare having to extend the 
per_task() facility.

There's one ugly thing about it: the fixed PER_TASK_BYTES limit. I plan to 
make ->per_task_area[] the last field of task_struct, i.e. change it to:

        u8                              per_task_area[];

This actually became possible through the fixing of the x86 FPU code in the 
following fast-headers commit:

   4ae0f28bc1c8 headers/deps: x86/fpu: Make task_struct::thread constant size

In the ~1 year that the per_task() facility has existed I didn't have any 
maintenance troubles with these fields getting out of sync, but we could 
also auto-generate kernel/sched/per_task_area_struct_defs.h from 
kernel/sched/per_task_area_struct.h via a build-time script, and make 
kernel/sched/per_task_area_struct.h the only method to define such fields.
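
To make the mechanism concrete, here is a minimal userspace sketch of the
idea: one layout struct acts as the single template, compile-time offsets
stand in for the asm-offsets.h output, and the task type only carries an
opaque trailing byte array. This is not the fast-headers implementation;
every demo_* name is hypothetical, and unlike the real per_task(p, name)
the accessor below takes the field type explicitly to stay within
standard C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* A "complex" type that the task header itself never needs to see. */
struct demo_fpu_state {
	unsigned long regs[8];
};

/* Single template listing every per-task field (hypothetical layout). */
struct demo_per_task_layout {
	int			counter;
	struct demo_fpu_state	fpu;
};

/* Compile-time offsets, standing in for what asm-offsets.h would provide. */
enum {
	DEMO_OFF_counter	= offsetof(struct demo_per_task_layout, counter),
	DEMO_OFF_fpu		= offsetof(struct demo_per_task_layout, fpu),
	DEMO_PER_TASK_BYTES	= sizeof(struct demo_per_task_layout),
};

/* The task type only knows about an opaque byte area as its last member. */
struct demo_task {
	int pid;
	_Alignas(max_align_t) unsigned char per_task_area[];
};

/* Accessor: a constant offset into the area, so no extra runtime lookup. */
#define demo_per_task(p, type, name) \
	(*(type *)((p)->per_task_area + DEMO_OFF_##name))

/* Allocate a zeroed task with room for the per-task area behind it. */
static struct demo_task *demo_task_alloc(int pid)
{
	struct demo_task *p = calloc(1, sizeof(*p) + DEMO_PER_TASK_BYTES);

	if (p)
		p->pid = pid;
	return p;
}
```

With this, demo_per_task(t, int, counter) = 7 stores through a constant
offset, much as a direct member access would, while struct demo_task
itself never needs the definition of struct demo_fpu_state.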

> That issue aside, I took a glance at the tree, and overall it looks like 
> a lot of nice cleanups.  Most of these can probably go through the 
> various subsystem trees, after you split them out, for the "major" .h 
> cleanups.  Is that something you are going to be planning on doing?

Yeah, I absolutely plan on doing that too:

- About ~70% of the commits can be split up & parallelized through 
  maintainer trees.

- With the exception of the untangling of sched.h, per_task and the 
  "Optimize Headers" series, where a lot of patches are dependent on each 
  other. These are actually needed to get any measurable benefits from this 
  tree (!). We can do these through the scheduler tree, or through the 
  dedicated headers tree I posted.

The latter monolithic series is pretty much unavoidable, it's the result of 
30 years of coupling a lot of kernel subsystems to task_struct via embedded 
structs & other complex types, that needed quite a bit of effort to 
untangle, and that untangling needed to happen in-order.

Do these plans sound good to you?

Thanks,

	Ingo


* Re: [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell"
  2022-01-03 11:12   ` Ingo Molnar
@ 2022-01-03 13:46     ` Greg Kroah-Hartman
  2022-01-03 16:29       ` Ingo Molnar
  2022-01-04 14:10     ` [PATCH] per_task: Remove the PER_TASK_BYTES hard-coded constant Ingo Molnar
  2022-01-04 17:51     ` [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" Arnd Bergmann
  2 siblings, 1 reply; 54+ messages in thread
From: Greg Kroah-Hartman @ 2022-01-03 13:46 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, linux-kernel, linux-arch, Andrew Morton,
	Peter Zijlstra, Thomas Gleixner, David S. Miller, Ard Biesheuvel,
	Josh Poimboeuf, Jonathan Corbet, Al Viro

On Mon, Jan 03, 2022 at 12:12:50PM +0100, Ingo Molnar wrote:
> * Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote:
> > This is "interesting", but how are you going to keep the 
> > kernel/sched/per_task_area_struct_defs.h and struct task_struct_per_task 
> > definition in sync?
> 
> I have plans to clean this up further - see below - but in general I'd 
> *discourage* the embedding of new complex types to task_struct.
> 
> In practice, most new task_struct fields are either simple types or 
> pointers to structs, which can be added to task_struct without having to 
> define a complex type for <linux/sched.h>.
> 
> For example here's the list of the last 5 extensions of task_struct, since 
> November 2020 - I copy & pasted them out of git log -p include/linux/sched.h:
> 
> +       unsigned                        in_eventfd_signal:1;
> 
> +       cpumask_t                       *user_cpus_ptr;
> 
> +       unsigned int                    saved_state;
> 
> +       unsigned long                   saved_state_change;
> 
> +       struct bpf_run_ctx              *bpf_ctx;
> 
> All of those new fields are either simple C types or struct pointers - none 
> of those extensions need per_task() handling per se.
> 
> The overall policy to extend task_struct, going forward, would be to:
> 
>  - Either make simple-type or struct-pointer additions to task_struct, that 
>    don't couple <linux/sched.h> to other subsystems.
> 
>  - Or, if you absolutely must - and we don't want to forbid this - use the 
>    per_task() machinery to create a simple accessor to a complex embedded 
>    type.

I'll leave all of this up to the scheduler developers, but it still
looks odd to me.  The mess we create trying to work around issues in C :)

> > That issue aside, I took a glance at the tree, and overall it looks like 
> > a lot of nice cleanups.  Most of these can probably go through the 
> > various subsystem trees, after you split them out, for the "major" .h 
> > cleanups.  Is that something you are going to be planning on doing?
> 
> Yeah, I absolutely plan on doing that too:
> 
> - About ~70% of the commits can be split up & parallelized through 
>   maintainer trees.
> 
> - With the exception of the untangling of sched.h, per_task and the 
>   "Optimize Headers" series, where a lot of patches are dependent on each 
>   other. These are actually needed to get any measurable benefits from this 
>   tree (!). We can do these through the scheduler tree, or through the 
>   dedicated headers tree I posted.
> 
> The latter monolithic series is pretty much unavoidable, it's the result of 
> 30 years of coupling a lot of kernel subsystems to task_struct via embedded 
> structs & other complex types, that needed quite a bit of effort to 
> untangle, and that untangling needed to happen in-order.
> 
> Do these plans sound good to you?

Yes, taking the majority through the maintainer trees and then doing the
remaining bits in a single tree seems sane, that one tree will be easier
to review as well.

thanks,

greg k-h


* Re: [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell"
       [not found] <YdIfz+LMewetSaEB@gmail.com>
  2022-01-03 10:11 ` [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" Greg Kroah-Hartman
@ 2022-01-03 13:54 ` Kirill A. Shutemov
  2022-01-04 10:54   ` Ingo Molnar
  2022-01-03 17:54 ` [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" Nathan Chancellor
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 54+ messages in thread
From: Kirill A. Shutemov @ 2022-01-03 13:54 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, linux-kernel, linux-arch, Andrew Morton,
	Peter Zijlstra, Thomas Gleixner, Greg Kroah-Hartman,
	David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet,
	Al Viro

On Sun, Jan 02, 2022 at 10:57:35PM +0100, Ingo Molnar wrote:
>  - As to testing & runtime behavior: while all of these patches are 
>    intended to be bug-free, I did find a couple of semi-bugs in the kernel 
>    where a specific order of headers guaranteed a particular code 
>    generation outcome - and if that header order was disturbed, the kernel 
>    would silently break and fail to boot ...

Looks like you are doing a lot of uninlining. Do you see any runtime
performance degradation with the patchset?

-- 
 Kirill A. Shutemov


* Re: [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell"
  2022-01-03 13:46     ` Greg Kroah-Hartman
@ 2022-01-03 16:29       ` Ingo Molnar
  2022-01-10 10:28         ` Peter Zijlstra
  0 siblings, 1 reply; 54+ messages in thread
From: Ingo Molnar @ 2022-01-03 16:29 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Linus Torvalds, linux-kernel, linux-arch, Andrew Morton,
	Peter Zijlstra, Thomas Gleixner, David S. Miller, Ard Biesheuvel,
	Josh Poimboeuf, Jonathan Corbet, Al Viro


* Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote:

> > The overall policy to extend task_struct, going forward, would be to:
> > 
> >  - Either make simple-type or struct-pointer additions to task_struct, that 
> >    don't couple <linux/sched.h> to other subsystems.
> > 
> >  - Or, if you absolutely must - and we don't want to forbid this - use the 
> >    per_task() machinery to create a simple accessor to a complex embedded 
> >    type.
> 
> I'll leave all of this up to the scheduler developers, but it still looks 
> odd to me.  The mess we create trying to work around issues in C :)

Yeah, so I *did* find this somewhat suboptimal too, and developed an 
earlier version that used linker section tricks to gain the field offsets 
more automatically.

It was an unmitigated disaster: was fragile on x86 already (which has a zoo 
of linking quirks with no precedent of doing this before bounds.c 
processing), but on ARM64 and probably on most of the other RISC-ish 
architectures there was also a real runtime code generation cost of using 
linker tricks: 2-3 extra instructions per per_task() use - clearly 
unacceptable.

Found this out the hard way after making it boot & work on ARM64 and 
looking at the assembly output, trying to figure out why the generated code 
size increased. :-/

Anyway, the current method has the big advantage of being obviously 
invariant wrt. code generation compared to the previous code, on every 
architecture.

> > Do these plans sound good to you?
> 
> Yes, taking the majority through the maintainer trees and then doing the 
> remaining bits in a single tree seems sane, that one tree will be easier 
> to review as well.

Ok. Will definitely offer it up piecemeal, in reviewable chunks, via 
existing processes & flows.

Thanks,

	Ingo


* Re: [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell"
       [not found] <YdIfz+LMewetSaEB@gmail.com>
  2022-01-03 10:11 ` [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" Greg Kroah-Hartman
  2022-01-03 13:54 ` [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" Kirill A. Shutemov
@ 2022-01-03 17:54 ` Nathan Chancellor
  2022-01-04 10:47   ` Ingo Molnar
  2022-01-04 12:36 ` Willy Tarreau
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 54+ messages in thread
From: Nathan Chancellor @ 2022-01-03 17:54 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, linux-kernel, linux-arch, Andrew Morton,
	Peter Zijlstra, Thomas Gleixner, Greg Kroah-Hartman,
	David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet,
	Al Viro, llvm

Hi Ingo,

On Sun, Jan 02, 2022 at 10:57:35PM +0100, Ingo Molnar wrote:
> Before going into details about how this tree solves 'dependency hell' 
> exactly, here's the current kernel build performance gain with 
> CONFIG_FAST_HEADERS=y enabled, (and with CONFIG_KALLSYMS_FAST=y enabled as 
> well - see below), using a stock x86 Linux distribution's .config with all 
> modules built into the vmlinux:
> 
>   #
>   # Performance counter stats for 'make -j96 vmlinux' (3 runs):
>   #
>   # (Elapsed time in seconds):
>   #
> 
>   v5.16-rc7:            231.34 +- 0.60 secs, 15.5 builds/hour    # [ vanilla baseline ]
>   -fast-headers-v1:     129.97 +- 0.51 secs, 27.7 builds/hour    # +78.0% improvement

This is really impressive; as someone who constantly builds large
kernels for test coverage, I am excited about less time to get results.
Testing on an 80-core arm64 server (the fastest machine I have access to
at the moment) with LLVM, I can see anywhere from 18% to 35% improvement.


Benchmark 1: ARCH=arm64 defconfig (linux)
  Time (mean ± σ):     97.159 s ±  0.246 s    [User: 4828.383 s, System: 611.256 s]
  Range (min … max):   96.900 s … 97.648 s    10 runs

Benchmark 2: ARCH=arm64 defconfig (linux-fast-headers)
  Time (mean ± σ):     76.300 s ±  0.107 s    [User: 3149.986 s, System: 436.487 s]
  Range (min … max):   76.117 s … 76.467 s    10 runs

Summary
  'ARCH=arm64 defconfig (linux-fast-headers)' ran
    1.27 ± 0.00 times faster than 'ARCH=arm64 defconfig (linux)'


Benchmark 1: ARCH=arm64 allmodconfig (linux)
  Time (mean ± σ):     390.106 s ±  0.192 s    [User: 23893.382 s, System: 2802.413 s]
  Range (min … max):   389.942 s … 390.513 s    7 runs

Benchmark 2: ARCH=arm64 allmodconfig (linux-fast-headers)
  Time (mean ± σ):     288.066 s ±  0.621 s    [User: 16436.098 s, System: 2117.352 s]
  Range (min … max):   287.131 s … 288.982 s    7 runs

Summary
  'ARCH=arm64 allmodconfig (linux-fast-headers)' ran
    1.35 ± 0.00 times faster than 'ARCH=arm64 allmodconfig (linux)'


Benchmark 1: ARCH=arm64 allyesconfig (linux)
  Time (mean ± σ):     557.752 s ±  1.019 s    [User: 21227.404 s, System: 2226.121 s]
  Range (min … max):   555.833 s … 558.775 s    7 runs

Benchmark 2: ARCH=arm64 allyesconfig (linux-fast-headers)
  Time (mean ± σ):     473.815 s ±  1.793 s    [User: 15351.991 s, System: 1689.630 s]
  Range (min … max):   471.542 s … 476.830 s    7 runs

Summary
  'ARCH=arm64 allyesconfig (linux-fast-headers)' ran
    1.18 ± 0.00 times faster than 'ARCH=arm64 allyesconfig (linux)'


I wanted to test the same x86_64 configs last night but I ran out of time
before bed due to some issues that I was only able to look at this morning
(more on those below). I'll just have to settle for defconfig right now, which
shows a modest improvement.

Benchmark 1: ARCH=x86_64 defconfig (linux)
  Time (mean ± σ):     41.122 s ±  0.190 s    [User: 1700.206 s, System: 205.555 s]
  Range (min … max):   40.966 s … 41.515 s    7 runs

Benchmark 2: ARCH=x86_64 defconfig (linux-fast-headers)
  Time (mean ± σ):     36.357 s ±  0.183 s    [User: 1134.252 s, System: 152.396 s]
  Range (min … max):   35.983 s … 36.534 s    7 runs

Summary
  'ARCH=x86_64 defconfig (linux-fast-headers)' ran
    1.13 ± 0.01 times faster than 'ARCH=x86_64 defconfig (linux)'
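
For reference, the "times faster" figures in the summaries above are the
ratio of the mean wall-clock times, and the percentage improvement is that
ratio minus one. A minimal sketch of the arithmetic, with helper names made
up purely for illustration:

```c
#include <assert.h>
#include <stdio.h>

/* How many times faster the new build is, given mean wall-clock seconds. */
static double speedup(double baseline_secs, double new_secs)
{
	assert(new_secs > 0.0);
	return baseline_secs / new_secs;
}

/* The same ratio expressed as a percentage gain in build throughput. */
static double improvement_pct(double baseline_secs, double new_secs)
{
	return (speedup(baseline_secs, new_secs) - 1.0) * 100.0;
}

/* Print one summary row for a config. */
static void report(const char *config, double baseline_secs, double new_secs)
{
	printf("%-24s %.2fx  (+%.1f%%)\n", config,
	       speedup(baseline_secs, new_secs),
	       improvement_pct(baseline_secs, new_secs));
}
```

Feeding in the means above, report("arm64 allmodconfig", 390.106, 288.066)
gives about 1.35x (roughly +35%), and the arm64 allyesconfig numbers
(557.752 vs. 473.815) give about 1.18x (roughly +18%), which is where the
18% to 35% range comes from.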

> For example, the preprocessed kernel/pid.c file explodes into over 94,000 
> lines of code on the vanilla kernel:
> 
>   # v5.16-rc7:
> 
>   kepler:~/mingo.tip.git> make kernel/pid.i
>   kepler:~/mingo.tip.git> wc -l kernel/pid.i
>   94569 kernel/pid.i
> 
> The compiler has to go through those 95,000 lines of code - even if a lot 
> of it is trivial fluff not actually used by kernel/pid.c.
> 
> With the fast-headers kernel that's down to ~36,000 lines of code, almost a 
> factor of 3 reduction:
> 
>   # fast-headers-v1:
>   kepler:~/mingo.tip.git> wc -l kernel/pid.i
>   35941 kernel/pid.i

Coming from someone who often has to reduce a preprocessed kernel source
file with creduce/cvise to report compiler bugs, this will be a very
welcome change, as those tools will have to do less work, and I can get
my reports done faster.

########################################################################

I took the series for a spin with clang and GCC on arm64 and x86_64 and
I found a few warnings/errors.


1. Position of certain attributes

In some commits, you move the cacheline_aligned attributes from after
the closing brace on structures to before the struct keyword, which
causes clang to warn (and error with CONFIG_WERROR):

In file included from arch/arm64/kernel/asm-offsets.c:9:
In file included from arch/arm64/kernel/../../../kernel/sched/per_task_area_struct.h:33:
In file included from ./include/linux/perf_event_api.h:17:
In file included from ./include/linux/perf_event_types.h:41:
In file included from ./include/linux/ftrace.h:18:
In file included from ./arch/arm64/include/asm/ftrace.h:53:
In file included from ./include/linux/compat.h:11:
./include/linux/fs_types.h:997:1: error: attribute '__aligned__' is ignored, place it after "struct" to apply attribute to type declaration [-Werror,-Wignored-attributes]
____cacheline_aligned
^
./include/linux/cache.h:41:46: note: expanded from macro '____cacheline_aligned'
#define ____cacheline_aligned __attribute__((__aligned__(SMP_CACHE_BYTES)))
                                             ^

My diff to fix this looks like:

diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 520daf638d06..da7e77a7cede 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -127,8 +127,7 @@ enum dentry_d_lock_class
        DENTRY_D_LOCK_NESTED
 };

-____cacheline_aligned
-struct dentry_operations {
+struct ____cacheline_aligned dentry_operations {
        int (*d_revalidate)(struct dentry *, unsigned int);
        int (*d_weak_revalidate)(struct dentry *, unsigned int);
        int (*d_hash)(const struct dentry *, struct qstr *);
diff --git a/include/linux/fs_types.h b/include/linux/fs_types.h
index b53aadafab1b..e2e1c0827183 100644
--- a/include/linux/fs_types.h
+++ b/include/linux/fs_types.h
@@ -994,8 +994,7 @@ struct file_operations {
        int (*fadvise)(struct file *, loff_t, loff_t, int);
 } __randomize_layout;

-____cacheline_aligned
-struct inode_operations {
+struct ____cacheline_aligned inode_operations {
        struct dentry * (*lookup) (struct inode *,struct dentry *, unsigned int);
        const char * (*get_link) (struct dentry *, struct inode *, struct delayed_call *);
        int (*permission) (struct user_namespace *, struct inode *, int);
diff --git a/include/linux/netdevice_api.h b/include/linux/netdevice_api.h
index 4a8d7688e148..0e5e08dcbb2a 100644
--- a/include/linux/netdevice_api.h
+++ b/include/linux/netdevice_api.h
@@ -49,7 +49,7 @@
 #endif

 /* This structure contains an instance of an RX queue. */
-____cacheline_aligned_in_smp struct netdev_rx_queue {
+struct ____cacheline_aligned_in_smp netdev_rx_queue {
        struct xdp_rxq_info             xdp_rxq;
 #ifdef CONFIG_RPS
        struct rps_map __rcu            *rps_map;
diff --git a/include/net/xdp_types.h b/include/net/xdp_types.h
index 442028626b35..accc12372bca 100644
--- a/include/net/xdp_types.h
+++ b/include/net/xdp_types.h
@@ -56,7 +56,7 @@ struct xdp_mem_info {
 struct page_pool;

 /* perf critical, avoid false-sharing */
-____cacheline_aligned struct xdp_rxq_info {
+struct ____cacheline_aligned xdp_rxq_info {
        struct net_device *dev;
        u32 queue_index;
        u32 reg_state;
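
As a standalone illustration of the placement the diff above switches to,
here is a small sketch that builds with GCC or clang. The demo_* names and
the 64-byte DEMO_CACHE_BYTES value are assumptions for the example, not the
kernel's definitions:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed cache line size for the demo; the kernel uses SMP_CACHE_BYTES. */
#define DEMO_CACHE_BYTES 64
#define demo_cacheline_aligned __attribute__((__aligned__(DEMO_CACHE_BYTES)))

/*
 * Attribute placed right after the 'struct' keyword: it applies to the
 * type declaration, which is the position clang's diagnostic asks for.
 */
struct demo_cacheline_aligned demo_rxq_info {
	void *dev;
	unsigned int queue_index;
	unsigned int reg_state;
};

/* The alignment is a property of the type, so these hold at compile time. */
static_assert(_Alignof(struct demo_rxq_info) == DEMO_CACHE_BYTES,
	      "type should be cache line aligned");
static_assert(sizeof(struct demo_rxq_info) % DEMO_CACHE_BYTES == 0,
	      "size should be a multiple of the alignment");

/* Runtime check that a given object really sits on a cache line boundary. */
static int demo_is_aligned(const void *p)
{
	return ((uintptr_t)p % DEMO_CACHE_BYTES) == 0;
}
```

Because the attribute is attached to the type, every object of
struct demo_rxq_info, including one on the stack, inherits the alignment.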


2. Error with CONFIG_SHADOW_CALL_STACK

With ARCH=arm64 defconfig + CONFIG_SHADOW_CALL_STACK, I see the
following error:

$ make -skj"$(nproc)" ARCH=arm64 LLVM=1 defconfig menuconfig init/
init/main.c:916:50: error: use of undeclared identifier 'init_shadow_call_stack'
        per_task(&init_task, ti) = (struct thread_info) INIT_THREAD_INFO(init_task);
                                                        ^
./arch/arm64/include/asm/thread_info.h:123:2: note: expanded from macro 'INIT_THREAD_INFO'
        INIT_SCS                                                        \
        ^
./arch/arm64/include/asm/thread_info.h:113:14: note: expanded from macro 'INIT_SCS'
        .scs_base       = init_shadow_call_stack,                       \
                          ^
init/main.c:916:50: error: use of undeclared identifier 'init_shadow_call_stack'
./arch/arm64/include/asm/thread_info.h:123:2: note: expanded from macro 'INIT_THREAD_INFO'
        INIT_SCS                                                        \
        ^
./arch/arm64/include/asm/thread_info.h:114:13: note: expanded from macro 'INIT_SCS'
        .scs_sp         = init_shadow_call_stack,
                          ^
2 errors generated.

It looks like on mainline, init_shadow_call_stack is defined and used
in init/init_task.c but now, it is used in init/main.c, with no
declaration to allow the compiler to find the definition. I guess moving
init_shadow_call_stack out of init/init_task.c to somewhere more common
would fix this but it depends on SCS_SIZE, which is defined in
include/linux/scs.h, and as soon as I tried to include that in another
file, the build broke further... Any ideas you have would be appreciated
:) For benchmarking purposes, I just disabled CONFIG_SHADOW_CALL_STACK.


3. Nested function in arch/x86/kernel/asm-offsets.c

$ make -skj"$(nproc)" ARCH=x86_64 LLVM=1 defconfig all
In file included from arch/x86/kernel/asm-offsets.c:40:
arch/x86/kernel/../../../kernel/sched/per_task_area_struct_defs.h:10:1: error: function definition is not allowed here
{
^
1 error generated.

Clang does not and will not support nested functions; any instances of
those in the kernel were eliminated when formalizing clang support.

I am not really sure if this was intentional or not? Looking at the
other asm-offsets.c files, I see the include outside of any function.
Moving it out of the common() function does not appear to break the
build for defconfig, allmodconfig, or my distribution config and it
boots in QEMU and my AMD based test desktop.

diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
index ff3f8ed5d0a2..a6d56f4697cd 100644
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -35,10 +35,10 @@
 # include "asm-offsets_64.c"
 #endif

-static void __used common(void)
-{
 #include "../../../kernel/sched/per_task_area_struct_defs.h"

+static void __used common(void)
+{
        BLANK();
        DEFINE(TASK_threadsp, offsetof(struct task_struct, per_task_area) +
                              offsetof(struct task_struct_per_task, thread) +


4. Build error in kernel/gcov/clang.c

$ make -skj"$(nproc)" ARCH=x86_64 LLVM=1 distclean allmodconfig kernel/gcov/clang.o
kernel/gcov/clang.c:232:3: error: implicitly declaring library function 'memset' with type 'void *(void *, int, unsigned long)' [-Werror,-Wimplicit-function-declaration]
                memset(fn->counters, 0,
                ^
kernel/gcov/clang.c:232:3: note: include the header <string.h> or explicitly provide a declaration for 'memset'
kernel/gcov/clang.c:291:32: error: implicit declaration of function 'kmemdup' [-Werror,-Wimplicit-function-declaration]
        struct gcov_fn_info *fn_dup = kmemdup(fn, sizeof(*fn),
                                      ^
kernel/gcov/clang.c:291:23: error: incompatible integer to pointer conversion initializing 'struct gcov_fn_info *' with an expression of type 'int' [-Werror,-Wint-conversion]
        struct gcov_fn_info *fn_dup = kmemdup(fn, sizeof(*fn),
                             ^        ~~~~~~~~~~~~~~~~~~~~~~~~
kernel/gcov/clang.c:304:2: error: implicitly declaring library function 'memcpy' with type 'void *(void *, const void *, unsigned long)' [-Werror,-Wimplicit-function-declaration]
        memcpy(fn_dup->counters, fn->counters, cv_size);
        ^
kernel/gcov/clang.c:304:2: note: include the header <string.h> or explicitly provide a declaration for 'memcpy'
kernel/gcov/clang.c:320:8: error: implicit declaration of function 'kmemdup' [-Werror,-Wimplicit-function-declaration]
        dup = kmemdup(info, sizeof(*dup), GFP_KERNEL);
              ^
kernel/gcov/clang.c:320:6: error: incompatible integer to pointer conversion assigning to 'struct gcov_info *' from 'int' [-Werror,-Wint-conversion]
        dup = kmemdup(info, sizeof(*dup), GFP_KERNEL);
            ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
kernel/gcov/clang.c:325:18: error: implicit declaration of function 'kstrdup' [-Werror,-Wimplicit-function-declaration]
        dup->filename = kstrdup(info->filename, GFP_KERNEL);
                        ^
kernel/gcov/clang.c:325:16: error: incompatible integer to pointer conversion assigning to 'const char *' from 'int' [-Werror,-Wint-conversion]
        dup->filename = kstrdup(info->filename, GFP_KERNEL);
                      ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
8 errors generated.

I resolved this with:

diff --git a/kernel/gcov/clang.c b/kernel/gcov/clang.c
index 6ee385f6ad47..29f0899ba209 100644
--- a/kernel/gcov/clang.c
+++ b/kernel/gcov/clang.c
@@ -52,6 +52,7 @@
 #include <linux/ratelimit.h>
 #include <linux/slab.h>
 #include <linux/mm.h>
+#include <linux/string.h>
 #include "gcov.h"

 typedef void (*llvm_gcov_callback)(void);


5. BPF errors

With Arch Linux's config (https://github.com/archlinux/svntogit-packages/raw/packages/linux/trunk/config),
I see the following errors:

kernel/bpf/preload/iterators/iterators.c:3:10: fatal error: 'linux/sched/signal.h' file not found
#include <linux/sched/signal.h>
         ^~~~~~~~~~~~~~~~~~~~~~
1 error generated.

kernel/bpf/sysfs_btf.c:21:2: error: implicitly declaring library function 'memcpy' with type 'void *(void *, const void *, unsigned long)' [-Werror,-Wimplicit-function-declaration]
        memcpy(buf, __start_BTF + off, len);
        ^
kernel/bpf/sysfs_btf.c:21:2: note: include the header <string.h> or explicitly provide a declaration for 'memcpy'
1 error generated.

The second error is obviously fixed by just including string.h as above.

I am not sure what is wrong with the first one; the includes all appear
to be userland headers, rather than kernel ones, so maybe an -I flag is
not present that should be? To work around it, I disabled
CONFIG_BPF_PRELOAD.


6. resolve_btfids warning

After working around the above errors, with either GCC or clang, I see
the following warnings with Arch Linux's configuration:

WARN: multiple IDs found for 'task_struct': 103, 23549 - using 103
WARN: multiple IDs found for 'path': 1166, 23551 - using 1166
WARN: multiple IDs found for 'inode': 997, 23561 - using 997
WARN: multiple IDs found for 'file': 714, 23566 - using 714
WARN: multiple IDs found for 'seq_file': 1120, 23673 - using 1120

Which appears to come from symbols_resolve() in
tools/bpf/resolve_btfids/main.c.

########################################################################

I am very excited to see where this goes, it is a herculean effort but I
think it will be worth it in the long run. Let me know if there is any
more information or input that I can provide, cheers!

Nathan


* Re: [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell"
  2022-01-03 17:54 ` [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" Nathan Chancellor
@ 2022-01-04 10:47   ` Ingo Molnar
  2022-01-04 10:56     ` [DEBUG PATCH] DO NOT MERGE: Enable SHADOW_CALL_STACK on GCC builds, for build testing Ingo Molnar
                       ` (5 more replies)
  0 siblings, 6 replies; 54+ messages in thread
From: Ingo Molnar @ 2022-01-04 10:47 UTC (permalink / raw)
  To: Nathan Chancellor
  Cc: Linus Torvalds, linux-kernel, linux-arch, Andrew Morton,
	Peter Zijlstra, Thomas Gleixner, Greg Kroah-Hartman,
	David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet,
	Al Viro, llvm


* Nathan Chancellor <nathan@kernel.org> wrote:

> Hi Ingo,
> 
> On Sun, Jan 02, 2022 at 10:57:35PM +0100, Ingo Molnar wrote:
> > Before going into details about how this tree solves 'dependency hell' 
> > exactly, here's the current kernel build performance gain with 
> > CONFIG_FAST_HEADERS=y enabled, (and with CONFIG_KALLSYMS_FAST=y enabled as 
> > well - see below), using a stock x86 Linux distribution's .config with all 
> > modules built into the vmlinux:
> > 
> >   #
> >   # Performance counter stats for 'make -j96 vmlinux' (3 runs):
> >   #
> >   # (Elapsed time in seconds):
> >   #
> > 
> >   v5.16-rc7:            231.34 +- 0.60 secs, 15.5 builds/hour    # [ vanilla baseline ]
> >   -fast-headers-v1:     129.97 +- 0.51 secs, 27.7 builds/hour    # +78.0% improvement
> 
> This is really impressive; as someone who constantly builds large
> kernels for test coverage, I am excited about less time to get results.
> Testing on an 80-core arm64 server (the fastest machine I have access to
> at the moment) with LLVM, I can see anywhere from 18% to 35% improvement.
> 
> 
> Benchmark 1: ARCH=arm64 defconfig (linux)
>   Time (mean ± σ):     97.159 s ±  0.246 s    [User: 4828.383 s, System: 611.256 s]
>   Range (min … max):   96.900 s … 97.648 s    10 runs
> 
> Benchmark 2: ARCH=arm64 defconfig (linux-fast-headers)
>   Time (mean ± σ):     76.300 s ±  0.107 s    [User: 3149.986 s, System: 436.487 s]
>   Range (min … max):   76.117 s … 76.467 s    10 runs

That looks good, thanks for giving it a test, and thanks for all the fixes! 
:-)

Note that on ARM64 the elapsed time improvement is 'only' 18-35%, because 
the triple-linking of vmlinux serializes much of the build & ARM64 
doesn't have the kallsyms-objtool feature yet.

But we can already see how much faster it became, from the user+system time 
spent building the kernel:

           vanilla: 4828.383 s + 611.256 s = 5439.639 s
  -fast-headers-v1: 3149.986 s + 436.487 s = 3586.473 s

That's a +51% speedup. :-)
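
The same arithmetic as a small C helper, as a minimal sketch: the function 
name cpu_speedup_pct() is made up for illustration and is not part of the 
tree. It treats user + system time as the total CPU cost of a build and 
reports how much more work the baseline needed:

```c
#include <assert.h>

/*
 * Hypothetical helper: percentage speedup in total CPU time
 * (user + system), baseline build versus fast-headers build.
 * All arguments are in seconds.
 */
double cpu_speedup_pct(double base_user, double base_sys,
		       double fast_user, double fast_sys)
{
	double base = base_user + base_sys;
	double fast = fast_user + fast_sys;

	/* Guard against a division by zero on bogus input. */
	assert(fast > 0.0);

	return (base / fast - 1.0) * 100.0;
}
```

Fed with the arm64 defconfig numbers quoted above, 
cpu_speedup_pct(4828.383, 611.256, 3149.986, 436.487) comes out between 51 
and 52.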

With CONFIG_KALLSYMS_FAST=y on x86, the final link gets faster by about 
60%-70%, so the header improvements will more directly show up in elapsed 
time as well.

Plus I spent more time looking at x86 header bloat than at ARM64 header 
bloat. In the end I think the improvement could probably be moved into the 
broad 60-70% range that I see on x86.

All the other ARM64 tests show a 37%-43% improvement in CPU time used:

> Benchmark 1: ARCH=arm64 allmodconfig (linux)
>   Time (mean ± σ):     390.106 s ±  0.192 s    [User: 23893.382 s, System: 2802.413 s]
>   Range (min … max):   389.942 s … 390.513 s    7 runs
> 
> Benchmark 2: ARCH=arm64 allmodconfig (linux-fast-headers)
>   Time (mean ± σ):     288.066 s ±  0.621 s    [User: 16436.098 s, System: 2117.352 s]
>   Range (min … max):   287.131 s … 288.982 s    7 runs

# (23893.382+2802.413)/(16436.098+2117.352) = +43% in throughput.


> Benchmark 1: ARCH=arm64 allyesconfig (linux)
>   Time (mean ± σ):     557.752 s ±  1.019 s    [User: 21227.404 s, System: 2226.121 s]
>   Range (min … max):   555.833 s … 558.775 s    7 runs
> 
> Benchmark 2: ARCH=arm64 allyesconfig (linux-fast-headers)
>   Time (mean ± σ):     473.815 s ±  1.793 s    [User: 15351.991 s, System: 1689.630 s]
>   Range (min … max):   471.542 s … 476.830 s    7 runs

# (21227.404+2226.121)/(15351.991+1689.630) = +37%


> Benchmark 1: ARCH=x86_64 defconfig (linux)
>   Time (mean ± σ):     41.122 s ±  0.190 s    [User: 1700.206 s, System: 205.555 s]
>   Range (min … max):   40.966 s … 41.515 s    7 runs
> 
> Benchmark 2: ARCH=x86_64 defconfig (linux-fast-headers)
>   Time (mean ± σ):     36.357 s ±  0.183 s    [User: 1134.252 s, System: 152.396 s]
>   Range (min … max):   35.983 s … 36.534 s    7 runs


# (1700.206+205.555)/(1134.252+152.396) = +48%

> Summary
>   'ARCH=x86_64 defconfig (linux-fast-headers)' ran
>     1.13 ± 0.01 times faster than 'ARCH=x86_64 defconfig (linux)'

Now this x86-defconfig result you got is a bit weird - it *should* have 
been around ~50% faster on x86 in terms of elapsed time too.

Here's what x86-64 defconfig looks like on my system - with 128 GB RAM & 
fast NVDIMMs and 64 CPUs:

   #
   # -v5.16-rc8:
   #

   $ perf stat --repeat 3 -e instructions,cycles,cpu-clock --sync --pre "make clean >/dev/null" make -j96 vmlinux >/dev/null

   Performance counter stats for 'make -j96 vmlinux' (3 runs):

   4,906,953,379,372      instructions              #    0.90  insn per cycle           ( +-  0.00% )
   5,475,163,448,391      cycles                    #    3.898 GHz                      ( +-  0.01% )
        1,404,614.64 msec cpu-clock                 #   45.864 CPUs utilized            ( +-  0.01% )

             30.6258 +- 0.0337 seconds time elapsed  ( +-  0.11% )

   #
   # -fast-headers-v1:
   #

   $ make defconfig
   $ grep KALLSYMS_FAST .config
   CONFIG_KALLSYMS_FAST=y

   $ perf stat --repeat 3 -e instructions,cycles,cpu-clock --sync --pre "make clean >/dev/null" make -j96 vmlinux >/dev/null

    Performance counter stats for 'make -j96 vmlinux' (3 runs):

     3,500,079,269,120      instructions              #    0.90  insn per cycle           ( +-  0.00% )
     3,872,081,278,824      cycles                    #    3.895 GHz                      ( +-  0.10% )
            993,448.13 msec cpu-clock                 #   47.306 CPUs utilized            ( +-  0.10% )

             21.0004 +- 0.0265 seconds time elapsed  ( +-  0.13% )

That's a +45.8% speedup in elapsed time, and a +41.4% improvement in 
cpu-clock utilization.

I'm wondering whether your system has some sort of bottleneck?


One thing I do though when running benchmarks is to switch the cpufreq 
governor to 'performance', via something like:

   NR_CPUS=$(nproc --all)

   curr=$(cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor)
   next=performance

   echo "# setting all $NR_CPUS CPUs from '"$curr"' to the '"$next"' governor"

   for ((cpu=0; cpu<$NR_CPUS; cpu++)); do
     G=/sys/devices/system/cpu/cpu$cpu/cpufreq/scaling_governor
     [ -f $G ] && echo $next > $G
   done

This minimizes the amount of noise across iterations and makes the results 
more dependable:

             30.6258 +- 0.0337 seconds time elapsed  ( +-  0.11% )
             21.0004 +- 0.0265 seconds time elapsed  ( +-  0.13% )

> > With the fast-headers kernel that's down to ~36,000 lines of code, 
> > almost a factor of 3 reduction:
> > 
> >   # fast-headers-v1:
> >   kepler:~/mingo.tip.git> wc -l kernel/pid.i
> >   35941 kernel/pid.i
> 
> Coming from someone who often has to reduce a preprocessed kernel source 
> file with creduce/cvise to report compiler bugs, this will be a very 
> welcomed change, as those tools will have to do less work, and I can get 
> my reports done faster.

That's nice, didn't think of that side effect.

Could you perhaps measure this too, to see how much of a benefit it is?

> ########################################################################
> 
> I took the series for a spin with clang and GCC on arm64 and x86_64 and
> I found a few warnings/errors.

Thank you!

> 1. Position of certain attributes
> 
> In some commits, you move the cacheline_aligned attributes from after
> the closing brace on structures to before the struct keyword, which
> causes clang to warn (and error with CONFIG_WERROR):
> 
> In file included from arch/arm64/kernel/asm-offsets.c:9:
> In file included from arch/arm64/kernel/../../../kernel/sched/per_task_area_struct.h:33:
> In file included from ./include/linux/perf_event_api.h:17:
> In file included from ./include/linux/perf_event_types.h:41:
> In file included from ./include/linux/ftrace.h:18:
> In file included from ./arch/arm64/include/asm/ftrace.h:53:
> In file included from ./include/linux/compat.h:11:
> ./include/linux/fs_types.h:997:1: error: attribute '__aligned__' is ignored, place it after "struct" to apply attribute to type declaration [-Werror,-Wignored-attributes]
> ____cacheline_aligned
> ^
> ./include/linux/cache.h:41:46: note: expanded from macro '____cacheline_aligned'
> #define ____cacheline_aligned __attribute__((__aligned__(SMP_CACHE_BYTES)))

Yeah, so this is a *really* stupid warning from Clang.

Putting the attribute after 'struct' risks hard-to-track-down bugs when 
a <linux/cache.h> inclusion is missing, a scenario I pointed out in 
this commit:

    headers/deps: dcache: Move the ____cacheline_aligned attribute to the head of the definition
    
    When changing <linux/dcache.h> I removed the <linux/spinlock_api.h> header,
    which caused a couple of hundred of mysterious, somewhat obscure link time errors:
    
      ld: net/sctp/tsnmap.o:(.bss+0x0): multiple definition of `____cacheline_aligned_in_smp'; init/do_mounts_rd.o:(.bss+0x0): first defined here
      ld: net/sctp/tsnmap.o:(.bss+0x40): multiple definition of `____cacheline_aligned'; init/do_mounts_rd.o:(.bss+0x40): first defined here
      ld: net/sctp/debug.o:(.bss+0x0): multiple definition of `____cacheline_aligned_in_smp'; init/do_mounts_rd.o:(.bss+0x0): first defined here
      ld: net/sctp/debug.o:(.bss+0x40): multiple definition of `____cacheline_aligned'; init/do_mounts_rd.o:(.bss+0x40): first defined here
    
    After a bit of head-scratching, what happened is that 'struct dentry_operations'
    has the ____cacheline_aligned attribute at the tail of the type definition -
    which turned into a local variable definition when <linux/cache.h> was not
    included - which <linux/spinlock_api.h> includes into <linux/dcache.h> indirectly.
    
    There were no compile time errors, only link time errors.
    
    Move the attribute to the head of the definition, in which case
    a missing <linux/cache.h> inclusion creates an immediate build failure:
    
      In file included from ./include/linux/fs.h:9,
                       from ./include/linux/fsverity.h:14,
                       from fs/verity/fsverity_private.h:18,
                       from fs/verity/read_metadata.c:8:
      ./include/linux/dcache.h:132:22: error: expected ‘;’ before ‘struct’
        132 | ____cacheline_aligned
            |                      ^
            |                      ;
        133 | struct dentry_operations {
            | ~~~~~~
    
    No change in functionality.
    
    Signed-off-by: Ingo Molnar <mingo@kernel.org>

Can this Clang warning be disabled?
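
The failure mode described in that commit message can be reproduced in 
plain user-space C. This is a minimal sketch under stated assumptions: 
demo_cacheline_aligned stands in for ____cacheline_aligned and is 
deliberately left undefined to mimic the missing <linux/cache.h> include, 
and struct tail_ops is a made-up type:

```c
#include <assert.h>
#include <stddef.h>

/*
 * 'demo_cacheline_aligned' is intentionally NOT defined as a macro here.
 *
 * Tail position: the compiler parses the unknown identifier as a
 * declarator, so this silently defines a global variable of type
 * 'struct tail_ops' instead of applying an attribute to the type.
 */
struct tail_ops {
	int (*op)(void);
} demo_cacheline_aligned;

/* Shows that the "attribute" really became an object of the struct type. */
void demo_check_tail(void)
{
	assert(sizeof(demo_cacheline_aligned) == sizeof(struct tail_ops));
}

/*
 * Head position, by contrast, does not compile without the macro:
 *
 *	demo_cacheline_aligned
 *	struct head_ops { int (*op)(void); };
 *
 * which is the "expected ';' before 'struct'" error quoted in the
 * commit message.
 */
```

When a header containing the tail form is included from several .c files, 
each object file carries its own definition of that variable, which matches 
the 'multiple definition' link errors shown above (assuming a -fno-common 
build).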

> 2. Error with CONFIG_SHADOW_CALL_STACK

So this feature depends on Clang:

 # Supported by clang >= 7.0
 config CC_HAVE_SHADOW_CALL_STACK
         def_bool $(cc-option, -fsanitize=shadow-call-stack -ffixed-x18)

No way to activate it under my GCC cross-build toolchain, right?

But ... I hacked the build mode on with GCC using this patch:

From: Ingo Molnar <mingo@kernel.org>
Date: Tue, 4 Jan 2022 11:26:09 +0100
Subject: [PATCH] DO NOT MERGE: Enable SHADOW_CALL_STACK on GCC builds, for build testing

NOT-Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 Makefile           | 2 +-
 arch/Kconfig       | 2 +-
 arch/arm64/Kconfig | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/Makefile b/Makefile
index 16d7f83ac368..bbab462e7509 100644
--- a/Makefile
+++ b/Makefile
@@ -888,7 +888,7 @@ LDFLAGS_vmlinux += --gc-sections
 endif
 
 ifdef CONFIG_SHADOW_CALL_STACK
-CC_FLAGS_SCS	:= -fsanitize=shadow-call-stack
+CC_FLAGS_SCS	:=
 KBUILD_CFLAGS	+= $(CC_FLAGS_SCS)
 export CC_FLAGS_SCS
 endif
diff --git a/arch/Kconfig b/arch/Kconfig
index 4e56f66fdbcf..2103d9da4fe1 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -605,7 +605,7 @@ config ARCH_SUPPORTS_SHADOW_CALL_STACK
 
 config SHADOW_CALL_STACK
 	bool "Clang Shadow Call Stack"
-	depends on CC_IS_CLANG && ARCH_SUPPORTS_SHADOW_CALL_STACK
+	depends on ARCH_SUPPORTS_SHADOW_CALL_STACK
 	depends on DYNAMIC_FTRACE_WITH_REGS || !FUNCTION_GRAPH_TRACER
 	help
 	  This option enables Clang's Shadow Call Stack, which uses a
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index c4207cf9bb17..952f3e56e0a7 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1183,7 +1183,7 @@ config ARCH_HAS_FILTER_PGPROT
 
 # Supported by clang >= 7.0
 config CC_HAVE_SHADOW_CALL_STACK
-	def_bool $(cc-option, -fsanitize=shadow-call-stack -ffixed-x18)
+	def_bool y
 
 config PARAVIRT
 	bool "Enable paravirtualization code"


And was able to trigger at least some of the build errors you saw:

  In file included from kernel/scs.c:15:
  ./include/linux/scs.h: In function 'scs_task_reset':
  ./include/linux/scs.h:26:34: error: implicit declaration of function 'task_thread_info' [-Werror=implicit-function-declaration]

This is fixed with:

diff --git a/kernel/scs.c b/kernel/scs.c
index ca9e707049cb..719ab53adc8a 100644
--- a/kernel/scs.c
+++ b/kernel/scs.c
@@ -5,6 +5,7 @@
  * Copyright (C) 2019 Google LLC
  */
 
+#include <linux/sched/thread_info_api.h>
 #include <linux/sched.h>
 #include <linux/mm_page_address.h>
 #include <linux/mm_api.h>


Then there's the build failure in init/main.c:

> It looks like on mainline, init_shadow_call_stack is defined and used 
> in init/init_task.c but now, it is used in init/main.c, with no
> declaration to allow the compiler to find the definition. I guess moving
> init_shadow_call_stack out of init/init_task.c to somewhere more common
> would fix this but it depends on SCS_SIZE, which is defined in
> include/linux/scs.h, and as soon as I tried to include that in another
> file, the build broke further... Any ideas you have would be appreciated
> :) for benchmarking purposes, I just disabled CONFIG_SHADOW_CALL_STACK.

So I see:

In file included from ./include/linux/thread_info.h:63,
                 from ./arch/arm64/include/asm/smp.h:32,
                 from ./include/linux/smp_api.h:15,
                 from ./include/linux/percpu.h:6,
                 from ./include/linux/softirq.h:8,
                 from init/main.c:17:
init/main.c: In function 'init_per_task_early':
./arch/arm64/include/asm/thread_info.h:113:27: error: 'init_shadow_call_stack' undeclared (first use in this function)
  113 |         .scs_base       = init_shadow_call_stack,                       \
      |                           ^~~~~~~~~~~~~~~~~~~~~~

This looks pretty straightforward, does this patch solve it?

 include/linux/scs.h | 3 +++
 init/main.c         | 1 +
 2 files changed, 4 insertions(+)

diff --git a/include/linux/scs.h b/include/linux/scs.h
index 18122d9e17ff..863932a9347a 100644
--- a/include/linux/scs.h
+++ b/include/linux/scs.h
@@ -8,6 +8,7 @@
 #ifndef _LINUX_SCS_H
 #define _LINUX_SCS_H
 
+#include <linux/sched/thread_info_api.h>
 #include <linux/gfp.h>
 #include <linux/poison.h>
 #include <linux/sched.h>
@@ -25,6 +26,8 @@
 #define task_scs(tsk)		(task_thread_info(tsk)->scs_base)
 #define task_scs_sp(tsk)	(task_thread_info(tsk)->scs_sp)
 
+extern unsigned long init_shadow_call_stack[SCS_SIZE / sizeof(long)];
+
 void *scs_alloc(int node);
 void scs_free(void *s);
 void scs_init(void);
diff --git a/init/main.c b/init/main.c
index c9eb3ecbe18c..74ccad445009 100644
--- a/init/main.c
+++ b/init/main.c
@@ -12,6 +12,7 @@
 
 #define DEBUG		/* Enable initcall_debug */
 
+#include <linux/scs.h>
 #include <linux/workqueue_api.h>
 #include <linux/sysctl.h>
 #include <linux/softirq.h>

I've applied these fixes; with them, CONFIG_SHADOW_CALL_STACK=y builds fine 
on ARM64 - but I performed no runtime testing.

I've backmerged this into:

    headers/deps: per_task, arm64, x86: Convert task_struct::thread to a per_task() field

where this bug originated from.

I.e. I think the bug was simply that main.c was not made aware of the 
array, now that the INIT_THREAD initialization is done there.

We could move over the init_shadow_call_stack[] array there and make it 
static to begin with? I don't think anything truly relies on it being a 
global symbol.
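
The shape of the fix, reduced to a single file, as a rough sketch: every 
demo_* name and DEMO_SCS_SIZE are stand-ins invented for illustration, not 
the kernel's identifiers. An extern declaration in the header role is enough 
for an initializer in another file to take the array's address, even though 
the definition lives elsewhere:

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_SCS_SIZE 1024	/* stand-in for SCS_SIZE */

/* Header role (scs.h in the patch): declaration only, no storage. */
extern unsigned long demo_shadow_stack[DEMO_SCS_SIZE / sizeof(long)];

struct demo_thread_info {
	void *scs_base;
};

/*
 * User role (init/main.c in the patch): the initializer only needs the
 * declaration above; the array's address is a link-time constant.
 */
struct demo_thread_info demo_init_info = {
	.scs_base = demo_shadow_stack,
};

/* Definition role (init/init_task.c in the patch): the actual storage. */
unsigned long demo_shadow_stack[DEMO_SCS_SIZE / sizeof(long)];

void demo_check_scs(void)
{
	assert(demo_init_info.scs_base == (void *)demo_shadow_stack);
}
```

Without the extern line the initializer fails with the same "undeclared" 
error as in the build log above.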

> 3. Nested function in arch/x86/kernel/asm-offsets.c

> diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
> index ff3f8ed5d0a2..a6d56f4697cd 100644
> --- a/arch/x86/kernel/asm-offsets.c
> +++ b/arch/x86/kernel/asm-offsets.c
> @@ -35,10 +35,10 @@
>  # include "asm-offsets_64.c"
>  #endif
> 
> -static void __used common(void)
> -{
>  #include "../../../kernel/sched/per_task_area_struct_defs.h"
> 
> +static void __used common(void)
> +{
>         BLANK();
>         DEFINE(TASK_threadsp, offsetof(struct task_struct, per_task_area) +
>                               offsetof(struct task_struct_per_task, thread) +

Ha, that code is bogus, it's a merge bug of mine. Super interesting that 
GCC still managed to include the header ...

I've applied your fix.

> 4. Build error in kernel/gcov/clang.c

> 8 errors generated.
> 
> I resolved this with:
> 
> diff --git a/kernel/gcov/clang.c b/kernel/gcov/clang.c
> index 6ee385f6ad47..29f0899ba209 100644
> --- a/kernel/gcov/clang.c
> +++ b/kernel/gcov/clang.c
> @@ -52,6 +52,7 @@
>  #include <linux/ratelimit.h>
>  #include <linux/slab.h>
>  #include <linux/mm.h>
> +#include <linux/string.h>
>  #include "gcov.h"

Thank you - applied!

>  typedef void (*llvm_gcov_callback)(void);
> 
> 
> 5. BPF errors
> 
> With Arch Linux's config (https://github.com/archlinux/svntogit-packages/raw/packages/linux/trunk/config),
> I see the following errors:
> 
> kernel/bpf/preload/iterators/iterators.c:3:10: fatal error: 'linux/sched/signal.h' file not found
> #include <linux/sched/signal.h>
>          ^~~~~~~~~~~~~~~~~~~~~~
> 1 error generated.
> 
> kernel/bpf/sysfs_btf.c:21:2: error: implicitly declaring library function 'memcpy' with type 'void *(void *, const void *, unsigned long)' [-Werror,-Wimplicit-function-declaration]
>         memcpy(buf, __start_BTF + off, len);
>         ^
> kernel/bpf/sysfs_btf.c:21:2: note: include the header <string.h> or explicitly provide a declaration for 'memcpy'
> 1 error generated.
> 
> The second error is obviously fixed by just including string.h as above.

Applied.

> I am not sure what is wrong with the first one; the includes all appear
> to be userland headers, rather than kernel ones, so maybe an -I flag is
> not present that should be? To work around it, I disabled
> CONFIG_BPF_PRELOAD.

Yeah, this should be fixed by simply removing the two stray dependencies 
that found their way into this user-space code:

 kernel/bpf/preload/iterators/iterators.bpf.c | 1 -
 kernel/bpf/preload/iterators/iterators.c     | 1 -
 2 files changed, 2 deletions(-)

diff --git a/kernel/bpf/preload/iterators/iterators.bpf.c b/kernel/bpf/preload/iterators/iterators.bpf.c
index 41ae00edeecf..03af863314ea 100644
--- a/kernel/bpf/preload/iterators/iterators.bpf.c
+++ b/kernel/bpf/preload/iterators/iterators.bpf.c
@@ -1,6 +1,5 @@
 // SPDX-License-Identifier: GPL-2.0
 /* Copyright (c) 2020 Facebook */
-#include <linux/seq_file.h>
 #include <linux/bpf.h>
 #include <bpf/bpf_helpers.h>
 #include <bpf/bpf_core_read.h>
diff --git a/kernel/bpf/preload/iterators/iterators.c b/kernel/bpf/preload/iterators/iterators.c
index d702cbf7ddaf..5d872a705470 100644
--- a/kernel/bpf/preload/iterators/iterators.c
+++ b/kernel/bpf/preload/iterators/iterators.c
@@ -1,6 +1,5 @@
 // SPDX-License-Identifier: GPL-2.0
 /* Copyright (c) 2020 Facebook */
-#include <linux/sched/signal.h>
 #include <errno.h>
 #include <stdio.h>
 #include <stdlib.h>


> 6. resolve_btfids warning
> 
> After working around the above errors, with either GCC or clang, I see
> the following warnings with Arch Linux's configuration:
> 
> WARN: multiple IDs found for 'task_struct': 103, 23549 - using 103
> WARN: multiple IDs found for 'path': 1166, 23551 - using 1166
> WARN: multiple IDs found for 'inode': 997, 23561 - using 997
> WARN: multiple IDs found for 'file': 714, 23566 - using 714
> WARN: multiple IDs found for 'seq_file': 1120, 23673 - using 1120
> 
> Which appears to come from symbols_resolve() in
> tools/bpf/resolve_btfids/main.c.

Hm, is this perhaps related to CONFIG_KALLSYMS_FAST=y? If yes then turning 
it off might help.

I don't really know this area of BPF all that well, maybe someone else can 
see what the problem is? The error message is not self-explanatory.

> 
> ########################################################################
> 
> I am very excited to see where this goes, it is a herculean effort but I
> think it will be worth it in the long run. Let me know if there is any
> more information or input that I can provide, cheers!

Your testing & patch sending efforts are much appreciated!! You'd help me 
most by continuing on the same path with new fast-headers releases as well, 
whenever you find the time. :-)

BTW., you can always pick up my latest Work-In-Progress branch from:

   git://git.kernel.org/pub/scm/linux/kernel/git/mingo/tip.git sched/headers

The 'master' branch will carry the release.

The sched/headers branch is already rebased to -rc8 and has some other 
changes as well. It should normally work, with less testing than the main 
releases, but will at times have fixes at the tail waiting to be 
backmerged in a bisect-friendly way.

Thanks,

	Ingo


* Re: [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell"
  2022-01-03 13:54 ` [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" Kirill A. Shutemov
@ 2022-01-04 10:54   ` Ingo Molnar
  2022-01-04 13:34     ` Greg Kroah-Hartman
  0 siblings, 1 reply; 54+ messages in thread
From: Ingo Molnar @ 2022-01-04 10:54 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Linus Torvalds, linux-kernel, linux-arch, Andrew Morton,
	Peter Zijlstra, Thomas Gleixner, Greg Kroah-Hartman,
	David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet,
	Al Viro

* Kirill A. Shutemov <kirill@shutemov.name> wrote:

> On Sun, Jan 02, 2022 at 10:57:35PM +0100, Ingo Molnar wrote:
> >  - As to testing & runtime behavior: while all of these patches are 
> >    intended to be bug-free, I did find a couple of semi-bugs in the kernel 
> >    where a specific order of headers guaranteed a particular code 
> >    generation outcome - and if that header order was disturbed, the kernel 
> >    would silently break and fail to boot ...
> 
> Looks like you are doing a lot of uninlining. Do you see any runtime
> performance degradation with the patchset?

I haven't tested that yet - and it's pretty hard to performance test 
uninlining patches directly.

But what I've done is that I basically looked at the context and tried to 
make a judgement call based on generated code.

In all the uninlining patches where I thought it might not be clear whether 
it's proper to uninline I added detailed analysis, such as this one:

  commit d94530f1abcbfd2500e90e151e7c67ff48ab3259
  Author: Ingo Molnar <mingo@kernel.org>
  Date:   Sat Nov 20 18:20:58 2021 +0100

    headers/uninline: Uninline multi-use function: put_page()
    
    Ever since the page_is_devmap_managed() logic was added to put_page() in:
    
      07d802699528: ("mm: devmap: refactor 1-based refcounting for ZONE_DEVICE pages")
    
    put_page() has become a much larger function of over 2 dozen instructions:

    0000000000004d30 <put_page>:
        4d30:       e8 00 00 00 00          call   4d35 <put_page+0x5>
        4d35:       55                      push   %rbp
        4d36:       48 8b 47 08             mov    0x8(%rdi),%rax
        4d3a:       48 8d 50 ff             lea    -0x1(%rax),%rdx
        4d3e:       a8 01                   test   $0x1,%al
        4d40:       48 89 e5                mov    %rsp,%rbp
        4d43:       48 0f 45 fa             cmovne %rdx,%rdi
        4d47:       66 90                   xchg   %ax,%ax
        4d49:       f0 ff 4f 34             lock decl 0x34(%rdi)
        4d4d:       74 27                   je     4d76 <put_page+0x46>
        4d4f:       5d                      pop    %rbp
        4d50:       c3                      ret
        4d51:       48 8b 07                mov    (%rdi),%rax
        4d54:       48 c1 e8 33             shr    $0x33,%rax
        4d58:       83 e0 07                and    $0x7,%eax
        4d5b:       83 f8 04                cmp    $0x4,%eax
        4d5e:       75 e9                   jne    4d49 <put_page+0x19>
        4d60:       48 8b 47 08             mov    0x8(%rdi),%rax
        4d64:       8b 40 68                mov    0x68(%rax),%eax
        4d67:       83 e8 01                sub    $0x1,%eax
        4d6a:       83 f8 01                cmp    $0x1,%eax
        4d6d:       77 da                   ja     4d49 <put_page+0x19>
        4d6f:       e8 00 00 00 00          call   4d74 <put_page+0x44>
        4d74:       5d                      pop    %rbp
        4d75:       c3                      ret
        4d76:       e8 00 00 00 00          call   4d7b <put_page+0x4b>
        4d7b:       5d                      pop    %rbp
        4d7c:       c3                      ret
    
    Uninline it.
    
    To counter some of the runtime overhead of the extra function call,
    inline the __put_page() instance into put_page() - this is now
    possible without extra bloat.
    
    There's a measurable improvement in vmlinux text size, on a distro
    kernel build, by ~4 KB.
    
    Doing so also decouples <linux/mm_api.h> from <linux/memremap.h>.
    
    Signed-off-by: Ingo Molnar <mingo@kernel.org>

I think it's pretty much a given that we don't want to inline 2 dozen 
instructions for every put_page() call and we don't need performance 
testing.
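
For readers unfamiliar with the pattern, here is a minimal user-space sketch 
of what "uninlining" means. The demo_* names and the refcount logic are 
simplified stand-ins, not the real put_page() implementation:

```c
#include <assert.h>

struct demo_page {
	int refcount;
	int freed;
};

/*
 * Before: the whole body lives in the header as a static inline, so it
 * is compiled into every call site, and the header must pull in every
 * header the body depends on.
 */
static inline void demo_put_page_inline(struct demo_page *page)
{
	if (--page->refcount == 0)
		page->freed = 1;
}

/* After: the header carries only this prototype ... */
void demo_put_page(struct demo_page *page);

/*
 * ... and one .c file carries the single out-of-line copy. The release
 * path is folded directly into it, in the spirit of inlining
 * __put_page() into put_page() as the commit message describes.
 */
void demo_put_page(struct demo_page *page)
{
	assert(page->refcount > 0);

	if (--page->refcount == 0)
		page->freed = 1;
}
```

Both variants behave identically; the difference is that callers of the 
out-of-line version pay a function call but no longer duplicate the body, 
and the header no longer needs the types used inside it.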

Admittedly my 'judgement call' was colored by the overall goal to decouple 
types and headers, so please do double check! None of the uninlining 
patches are critical to this tree - there are various other ways to 
decouple headers besides uninlining.

There's one happy exception though, all the uninlining patches that 
uninline a single-call function are probably fine as-is:

 ef1028c44345 headers/uninline: Uninline single-use function: mips: page_size_ftlb()
 98bc89e85e3f headers/uninline: Uninline single-use function: set_page_links()
 e368b54381e9 headers/uninline: Uninline single-use function: cpupid_to_nid()
 36b59978a96d headers/uninline: Uninline single-use function: wb_domain_size_changed()
 4c95e8f21924 headers/uninline: Uninline single-use function: skb_metadata_differs()
 28195c3f7eba headers/uninline: Uninline single-use function: for_each_netdev_feature()
 3c82b720eb01 headers/uninline: Uninline single-use function: SPI_STATISTICS_ADD_*()
 e7c48e440df3 headers/uninline: Uninline single-use function: qdisc_run()
 ba0bfe18c8cc headers/uninline: Uninline single-use function: dev_validate_header()
 3443e75fd1f8 headers/uninline: Uninline single-use function: kobject_has_children()
 0e15d2fb85f9 headers/uninline: Uninline single-use function: xfrm_dev_state_free()
 45d5233e1f5f headers/uninline: Uninline single-use function: flow_dissector_init_keys()
 7a897b0747b2 headers/uninline: Uninline single-use function: reqsk_alloc()
 f9003f1bd834 headers/uninline: Uninline single-use function: skb_propagate_pfmemalloc()
 54ea5750f484 headers/uninline: Uninline single-use function: syscall_tracepoint_update()
 5a1dc0bca4a4 headers/uninline: Uninline single-use function: proc_sys_poll_event()
 0af72df4042d headers/uninline: Uninline single-use function: ep_take_care_of_epollwakeup()
 13a8bd09a93a headers/uninline: Uninline single-use function: ptrace_event_pid()
 f2b8980d4178 headers/uninline: Uninline single-use function: itimerspec64_valid()
 ec111205e6de headers/uninline: Uninline single-use function: sk_under_cgroup_hierarchy()
 d623ba9eb252 headers/uninline: Uninline single-use function: wb_find_current() and wb_get_create_current()

Thanks,

	Ingo


* [DEBUG PATCH] DO NOT MERGE: Enable SHADOW_CALL_STACK on GCC builds, for build testing
  2022-01-04 10:47   ` Ingo Molnar
@ 2022-01-04 10:56     ` Ingo Molnar
  2022-01-04 11:02     ` [PATCH] headers/deps: dcache: Move the ____cacheline_aligned attribute to the head of the definition Ingo Molnar
                       ` (4 subsequent siblings)
  5 siblings, 0 replies; 54+ messages in thread
From: Ingo Molnar @ 2022-01-04 10:56 UTC (permalink / raw)
  To: Nathan Chancellor
  Cc: Linus Torvalds, linux-kernel, linux-arch, Andrew Morton,
	Peter Zijlstra, Thomas Gleixner, Greg Kroah-Hartman,
	David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet,
	Al Viro, llvm


* Ingo Molnar <mingo@kernel.org> wrote:

> > 2. Error with CONFIG_SHADOW_CALL_STACK
> 
> So this feature depends on Clang:
> 
>  # Supported by clang >= 7.0
>  config CC_HAVE_SHADOW_CALL_STACK
>          def_bool $(cc-option, -fsanitize=shadow-call-stack -ffixed-x18)
> 
> No way to activate it under my GCC cross-build toolchain, right?
> 
> But ... I hacked the build mode on with GCC using this patch:
> 
> From: Ingo Molnar <mingo@kernel.org>
> Date: Tue, 4 Jan 2022 11:26:09 +0100
> Subject: [PATCH] DO NOT MERGE: Enable SHADOW_CALL_STACK on GCC builds, for build testing
> 
> NOT-Signed-off-by: Ingo Molnar <mingo@kernel.org>

Ok, I've attached the patch again instead of embedding it in the middle of a long 
discussion, for future reference.

Thanks,

	Ingo

=====================>
From: Ingo Molnar <mingo@kernel.org>
Date: Tue, 4 Jan 2022 11:26:09 +0100
Subject: [PATCH] DO NOT MERGE: Enable SHADOW_CALL_STACK on GCC builds, for build testing

NOT-Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 Makefile           | 2 +-
 arch/Kconfig       | 2 +-
 arch/arm64/Kconfig | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/Makefile b/Makefile
index 16d7f83ac368..bbab462e7509 100644
--- a/Makefile
+++ b/Makefile
@@ -888,7 +888,7 @@ LDFLAGS_vmlinux += --gc-sections
 endif
 
 ifdef CONFIG_SHADOW_CALL_STACK
-CC_FLAGS_SCS	:= -fsanitize=shadow-call-stack
+CC_FLAGS_SCS	:=
 KBUILD_CFLAGS	+= $(CC_FLAGS_SCS)
 export CC_FLAGS_SCS
 endif
diff --git a/arch/Kconfig b/arch/Kconfig
index 4e56f66fdbcf..2103d9da4fe1 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -605,7 +605,7 @@ config ARCH_SUPPORTS_SHADOW_CALL_STACK
 
 config SHADOW_CALL_STACK
 	bool "Clang Shadow Call Stack"
-	depends on CC_IS_CLANG && ARCH_SUPPORTS_SHADOW_CALL_STACK
+	depends on ARCH_SUPPORTS_SHADOW_CALL_STACK
 	depends on DYNAMIC_FTRACE_WITH_REGS || !FUNCTION_GRAPH_TRACER
 	help
 	  This option enables Clang's Shadow Call Stack, which uses a
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index c4207cf9bb17..952f3e56e0a7 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1183,7 +1183,7 @@ config ARCH_HAS_FILTER_PGPROT
 
 # Supported by clang >= 7.0
 config CC_HAVE_SHADOW_CALL_STACK
-	def_bool $(cc-option, -fsanitize=shadow-call-stack -ffixed-x18)
+	def_bool y
 
 config PARAVIRT
 	bool "Enable paravirtualization code"

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH] headers/deps: dcache: Move the ____cacheline_aligned attribute to the head of the definition
  2022-01-04 10:47   ` Ingo Molnar
  2022-01-04 10:56     ` [DEBUG PATCH] DO NOT MERGE: Enable SHADOW_CALL_STACK on GCC builds, for build testing Ingo Molnar
@ 2022-01-04 11:02     ` Ingo Molnar
  2022-01-04 15:05       ` kernel test robot
  2022-01-04 17:51       ` Nathan Chancellor
  2022-01-04 11:19     ` [TREE] "Fast Kernel Headers" Tree WIP/development branch Ingo Molnar
                       ` (3 subsequent siblings)
  5 siblings, 2 replies; 54+ messages in thread
From: Ingo Molnar @ 2022-01-04 11:02 UTC (permalink / raw)
  To: Nathan Chancellor, Al Viro, Linus Torvalds, Andrew Morton
  Cc: Linus Torvalds, linux-kernel, linux-arch, Andrew Morton,
	Peter Zijlstra, Thomas Gleixner, Greg Kroah-Hartman,
	David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet,
	Al Viro, llvm


* Ingo Molnar <mingo@kernel.org> wrote:

> > 1. Position of certain attributes
> > 
> > In some commits, you move the cacheline_aligned attributes from after
> > the closing brace on structures to before the struct keyword, which
> > causes clang to warn (and error with CONFIG_WERROR):
> > 
> > In file included from arch/arm64/kernel/asm-offsets.c:9:
> > In file included from arch/arm64/kernel/../../../kernel/sched/per_task_area_struct.h:33:
> > In file included from ./include/linux/perf_event_api.h:17:
> > In file included from ./include/linux/perf_event_types.h:41:
> > In file included from ./include/linux/ftrace.h:18:
> > In file included from ./arch/arm64/include/asm/ftrace.h:53:
> > In file included from ./include/linux/compat.h:11:
> > ./include/linux/fs_types.h:997:1: error: attribute '__aligned__' is ignored, place it after "struct" to apply attribute to type declaration [-Werror,-Wignored-attributes]
> > ____cacheline_aligned
> > ^
> > ./include/linux/cache.h:41:46: note: expanded from macro '____cacheline_aligned'
> > #define ____cacheline_aligned __attribute__((__aligned__(SMP_CACHE_BYTES)))
> 
> Yeah, so this is a *really* stupid warning from Clang.
> 
> Putting the attribute after 'struct' risks the hard to track down bugs when 
> a <linux/cache.h> inclusion is missing, which scenario I pointed out in 
> this commit:
> 
>     headers/deps: dcache: Move the ____cacheline_aligned attribute to the head of the definition
>     
>     When changing <linux/dcache.h> I removed the <linux/spinlock_api.h> header,
>     which caused a couple of hundred of mysterious, somewhat obscure link time errors:
>     
>       ld: net/sctp/tsnmap.o:(.bss+0x0): multiple definition of `____cacheline_aligned_in_smp'; init/do_mounts_rd.o:(.bss+0x0): first defined here
>       ld: net/sctp/tsnmap.o:(.bss+0x40): multiple definition of `____cacheline_aligned'; init/do_mounts_rd.o:(.bss+0x40): first defined here
>       ld: net/sctp/debug.o:(.bss+0x0): multiple definition of `____cacheline_aligned_in_smp'; init/do_mounts_rd.o:(.bss+0x0): first defined here
>       ld: net/sctp/debug.o:(.bss+0x40): multiple definition of `____cacheline_aligned'; init/do_mounts_rd.o:(.bss+0x40): first defined here
>     
>     After a bit of head-scratching, what happened is that 'struct dentry_operations'
>     has the ____cacheline_aligned attribute at the tail of the type definition -
>     which turned into a local variable definition when <linux/cache.h> was not
>     included - which <linux/spinlock_api.h> includes into <linux/dcache.h> indirectly.
>     
>     There were no compile time errors, only link time errors.
>     
>     Move the attribute to the head of the definition, in which case
>     a missing <linux/cache.h> inclusion creates an immediate build failure:
>     
>       In file included from ./include/linux/fs.h:9,
>                        from ./include/linux/fsverity.h:14,
>                        from fs/verity/fsverity_private.h:18,
>                        from fs/verity/read_metadata.c:8:
>       ./include/linux/dcache.h:132:22: error: expected ‘;’ before ‘struct’
>         132 | ____cacheline_aligned
>             |                      ^
>             |                      ;
>         133 | struct dentry_operations {
>             | ~~~~~~
>     
>     No change in functionality.
>     
>     Signed-off-by: Ingo Molnar <mingo@kernel.org>
> 
> Can this Clang warning be disabled?

Ok, broke out this issue into its own thread, in the form of a patch submission 
- so that others don't have to wade through a massive tree to find a single 
commit ...

I'll of course drop these (non-essential) cleanups if the upstream policy 
is to follow Clang's quirk/convention, but I find the forced attribute 
tail-position a sad misfeature, due to the reasons outlined in this patch: 
a straightforward build failure in case an attribute is not defined is far 
preferable to spurious creation of variables with link-time errors that 
don't actually highlight the exact nature of the bug ...
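
To make the failure mode concrete, here is a minimal user-space sketch (not 
kernel code). DEMO_ALIGNED and HAVE_DEMO_CACHE_H are hypothetical stand-ins 
for ____cacheline_aligned and for a <linux/cache.h> inclusion, and the helper 
functions exist only for illustration. With the macro left undefined, the 
tail-position identifier is quietly parsed as a variable definition:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for <linux/cache.h>: DEMO_ALIGNED is only
 * defined when HAVE_DEMO_CACHE_H is set. Leaving it unset mimics a
 * missing header inclusion.
 */
#ifdef HAVE_DEMO_CACHE_H
# define DEMO_ALIGNED __attribute__((__aligned__(64)))
#endif

/*
 * Tail position: with DEMO_ALIGNED undefined this still compiles,
 * because the trailing identifier is parsed as the name of a variable
 * of type 'struct demo_ops'.
 */
struct demo_ops {
	int (*op)(int);
} DEMO_ALIGNED;

/*
 * Only builds in the "header missing" case: taking the address of
 * DEMO_ALIGNED shows that it became an object, not an attribute.
 */
struct demo_ops *accidental_variable(void)
{
	return &DEMO_ALIGNED;
}

/* The accidental object has the plain size of the structure. */
size_t accidental_size(void)
{
	assert(accidental_variable() != NULL);

	return sizeof(DEMO_ALIGNED);
}
```

If such a definition sits in a header, every .c file including it defines the 
same object, which with -fno-common (as the kernel is built) surfaces only as 
'multiple definition' errors at link time. With the identifier at the head of 
the definition, an undefined DEMO_ALIGNED is a syntax error right away.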

Thanks,

	Ingo

=====================>
Date: Sun, 20 Jun 2021 09:41:45 +0200
Subject: [PATCH] headers/deps: dcache: Move the ____cacheline_aligned attribute to the head of the definition

When changing <linux/dcache.h> I removed the <linux/spinlock_api.h> header,
which caused a couple of hundred mysterious, somewhat obscure link-time errors:

  ld: net/sctp/tsnmap.o:(.bss+0x0): multiple definition of `____cacheline_aligned_in_smp'; init/do_mounts_rd.o:(.bss+0x0): first defined here
  ld: net/sctp/tsnmap.o:(.bss+0x40): multiple definition of `____cacheline_aligned'; init/do_mounts_rd.o:(.bss+0x40): first defined here
  ld: net/sctp/debug.o:(.bss+0x0): multiple definition of `____cacheline_aligned_in_smp'; init/do_mounts_rd.o:(.bss+0x0): first defined here
  ld: net/sctp/debug.o:(.bss+0x40): multiple definition of `____cacheline_aligned'; init/do_mounts_rd.o:(.bss+0x40): first defined here

After a bit of head-scratching, what happened is that 'struct dentry_operations'
has the ____cacheline_aligned attribute at the tail of the type definition -
which turned into a local variable definition when <linux/cache.h> was not
included - which <linux/spinlock_api.h> includes into <linux/dcache.h> indirectly.

There were no compile time errors, only link time errors.

Move the attribute to the head of the definition, in which case
a missing <linux/cache.h> inclusion creates an immediate build failure:

  In file included from ./include/linux/fs.h:9,
                   from ./include/linux/fsverity.h:14,
                   from fs/verity/fsverity_private.h:18,
                   from fs/verity/read_metadata.c:8:
  ./include/linux/dcache.h:132:22: error: expected ‘;’ before ‘struct’
    132 | ____cacheline_aligned
        |                      ^
        |                      ;
    133 | struct dentry_operations {
        | ~~~~~~

No change in functionality.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 include/linux/dcache.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 41062093ec9b..0482c3d6f1ce 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -129,6 +129,7 @@ enum dentry_d_lock_class
 	DENTRY_D_LOCK_NESTED
 };
 
+____cacheline_aligned
 struct dentry_operations {
 	int (*d_revalidate)(struct dentry *, unsigned int);
 	int (*d_weak_revalidate)(struct dentry *, unsigned int);
@@ -144,7 +145,7 @@ struct dentry_operations {
 	struct vfsmount *(*d_automount)(struct path *);
 	int (*d_manage)(const struct path *, bool);
 	struct dentry *(*d_real)(struct dentry *, const struct inode *);
-} ____cacheline_aligned;
+};
 
 /*
  * Locking rules for dentry_operations callbacks are to be found in

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [TREE] "Fast Kernel Headers" Tree WIP/development branch
  2022-01-04 10:47   ` Ingo Molnar
  2022-01-04 10:56     ` [DEBUG PATCH] DO NOT MERGE: Enable SHADOW_CALL_STACK on GCC builds, for build testing Ingo Molnar
  2022-01-04 11:02     ` [PATCH] headers/deps: dcache: Move the ____cacheline_aligned attribute to the head of the definition Ingo Molnar
@ 2022-01-04 11:19     ` Ingo Molnar
  2022-01-04 17:25     ` [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" Nick Desaulniers
                       ` (2 subsequent siblings)
  5 siblings, 0 replies; 54+ messages in thread
From: Ingo Molnar @ 2022-01-04 11:19 UTC (permalink / raw)
  To: Nathan Chancellor
  Cc: Linus Torvalds, linux-kernel, linux-arch, Andrew Morton,
	Peter Zijlstra, Thomas Gleixner, Greg Kroah-Hartman,
	David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet,
	Al Viro, llvm


* Ingo Molnar <mingo@kernel.org> wrote:

> > ########################################################################
> > 
> > I am very excited to see where this goes, it is a herculean effort but 
> > I think it will be worth it in the long run. Let me know if there is 
> > any more information or input that I can provide, cheers!
> 
> Your testing & patch sending efforts are much appreciated!! You'd help me 
> most by continuing on the same path with new fast-headers releases as 
> well, whenever you find the time. :-)
> 
> BTW., you can always pick up my latest Work-In-Progress branch from:
> 
>    git://git.kernel.org/pub/scm/linux/kernel/git/mingo/tip.git sched/headers
> 
> The 'master' branch will carry the release.
> 
> The sched/headers branch is already rebased to -rc8 and has some other 
> changes as well. It should normally work, with less testing than the main 
> releases, but will at times have fixes at the tail waiting to be 
> backmerged in a bisect-friendly way.

Ok, broke out the sched/headers WIP branch into a separate announcement, in 
case others want to test:

    git://git.kernel.org/pub/scm/linux/kernel/git/mingo/tip.git sched/headers

Note that I sometimes will update the 'master' branch as well, without a 
standalone announcement, if there's some important fix or the previous 
version moved away too much.

Also, where I backmerged your fixes to manual commits I credited you with:

   [ Fixes by Nathan Chancellor ]

   Fixed-by: Nathan Chancellor <nathan@kernel.org>

The (rare) exception would be straight dependency additions such as the 
<linux/string.h> additions, which are auto-generated from scratch to keep 
it maintainable & reviewable - if that's fine with you.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell"
       [not found] <YdIfz+LMewetSaEB@gmail.com>
                   ` (2 preceding siblings ...)
  2022-01-03 17:54 ` [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" Nathan Chancellor
@ 2022-01-04 12:36 ` Willy Tarreau
  2022-01-04 16:05 ` Andy Shevchenko
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 54+ messages in thread
From: Willy Tarreau @ 2022-01-04 12:36 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, linux-kernel, linux-arch, Andrew Morton,
	Peter Zijlstra, Thomas Gleixner, Greg Kroah-Hartman,
	David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet,
	Al Viro

Hi Ingo!

First, great work! I'm particularly interested in this work because I
went through a similar process about 6 months ago in haproxy and saved
40-45% build time, and thought how well the same principles could apply
to the kernel if anyone had felt brave enough to engage into that. I do
appreciate how tedious a work it can be and do really sympathise with
you on this! A few comments below:

On Sun, Jan 02, 2022 at 10:57:35PM +0100, Ingo Molnar wrote:
>  - Uninlining: there's a number of unnecessary inline functions that also
>    couple otherwise unrelated headers to each other. The fast-headers tree
>    contains over 100 uninlining commits.
> 
>  - Type & API header decoupling. This is one of the most effective techniques
>    to reduce size - but it can rarely be done in a straightforward fashion,
>    and has to be prepared by various decoupling measures, such as the moving
>    of inline functions or the creation of new headers for less frequently used
>    APIs and types.

These were the main two key points I went through as well and found them
to be extremely effective. The essential build time in my case came from
the same inline functions being built hundreds of times for nothing, just
because a header file was included for just one type. I had already
decoupled types and API long ago but that didn't stand long enough for a
few files that were included everywhere. What I noticed is that ideally
we'd need to have 3 layers:
  - types alone
  - function prototypes alone, depending on the former if needed
  - inline functions, depending on the two former ones, if needed

Most code doesn't need the inline functions, especially other headers,
and being able to only cross-include type definitions is extremely helpful.

In my case something that further improved this effectiveness was to use a
lot more incomplete types everywhere possible. There's no reason to include
foo.h just to have a definition of "struct foo" from "bar.h" if you're only
using it as a pointer in "struct bar". Just prepend "struct foo;" before
struct bar and be done with it.
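
A minimal sketch of that pattern, squeezed into a single file for brevity.
The names (struct foo, struct bar, bar_owner_value()) are purely illustrative
and the "foo.h"/"bar.h" split is only indicated by comments:

```c
#include <assert.h>
#include <stddef.h>

/* What a hypothetical "bar.h" needs: only the name of struct foo. */
struct foo;			/* incomplete type, no foo.h inclusion */

struct bar {
	struct foo *owner;	/* pointers to incomplete types are fine */
	int id;
};

/* Prototypes can use the incomplete type as well. */
int bar_owner_value(const struct bar *b);

/*
 * Only code that looks inside struct foo needs the full definition,
 * which would normally come from a hypothetical "foo.h".
 */
struct foo {
	int value;
};

int bar_owner_value(const struct bar *b)
{
	assert(b != NULL);

	return b->owner ? b->owner->value : -1;
}
```

Everything above the full definition of struct foo compiles without it, so a
header that only stores or passes the pointer never has to pull in foo.h.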

This showed me how horrible typedefs are: there seems to be no way to
create incomplete definitions for them. So I had to create an even lower
level tiny include file for just the few ones I needed (mostly ints).

I hadn't found a perfect way to deal with macros. Sometimes you consider
them as inline functions and they seem to be better placed there, and
sometimes you figure they are used in type declarations and you have to
have them somewhere else. And when a macro is needed between multiple
type definitions (e.g. an array size), it becomes more delicate because
you quickly realize that a dedicated file for all such settings would
make sense, but it can complicate maintenance.

Another point I didn't feel brave enough to experiment with was to guard
include files around the #include directive in order to avoid opening
the files at all. In my case the C files are huge so such savings could
have been small. There are definitely savings to do there but this looked
too complicated to maintain. And I don't think that #pragma once would be
any effective alternative.

>  - For the 'reference' subsystem of the scheduler, I also improved build speed by
>    consolidating .c files into roughly equal size build units. Instead of 20+
>    separate .o's, there's now just 4 .o's being built. Obviously this approach
>    does not scale to the over 30,000 .c files in the kernel, but I wanted to
>    demonstrate it because optimizing at that level brings the next level of build
>    performance, and it might be feasible for a handful of other core kernel subsystems.

I tried this as well for the sake of avoiding to reprocess the same header
files multiple times but it was too difficult and I gave up. I'd be tempted
to encourage developers to write fewer but larger files, but these can
also become a maintenance nightmare, they tend to be much slower to build
when too big, and they do parallelize less well, so a balance has to be
found, and if the headers hell is better addressed, then this becomes less
important.

I noticed that you measured the number of includes per file. I did the
same by counting the references to the include files in the preprocessed
output, but ultimately found an easier metric: the total preprocessed
size. I simply replaced "-c" with "-E" in my makefile, and ran
"find . -name '*.o' | xargs cat | grep '^[^#]' | wc" to observe the output,
since in the end, that's what is really fed to the compiler. I overall
found that metric to be a relatively accurate representation of an
expected build time. It's particularly interesting because it's much
faster to obtain than a full build and can easily show you that some
optimizations have absolutely zero effect (typically because most
includes are guarded and what's not included at some place will be at
another one).

In my project I noticed that the total preprocessed size was initially
around 50-60 times larger than the total C+H files. After optimizing it
went down to around 20 times, which is roughly in line with the build
time savings.

Just my two cents, kudos for working on this!
Willy

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell"
  2022-01-04 10:54   ` Ingo Molnar
@ 2022-01-04 13:34     ` Greg Kroah-Hartman
  2022-01-04 13:54       ` [PATCH] headers/uninline: Uninline single-use function: kobject_has_children() Ingo Molnar
  0 siblings, 1 reply; 54+ messages in thread
From: Greg Kroah-Hartman @ 2022-01-04 13:34 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Kirill A. Shutemov, Linus Torvalds, linux-kernel, linux-arch,
	Andrew Morton, Peter Zijlstra, Thomas Gleixner, David S. Miller,
	Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet, Al Viro

On Tue, Jan 04, 2022 at 11:54:55AM +0100, Ingo Molnar wrote:
> There's one happy exception though, all the uninlining patches that 
> uninline a single-call function are probably fine as-is:

<snip>

>  3443e75fd1f8 headers/uninline: Uninline single-use function: kobject_has_children()

Let me go take this right now, no need for this to wait, it should be
out of kobject.h as you rightfully show there is only one user.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH] headers/uninline: Uninline single-use function: kobject_has_children()
  2022-01-04 13:34     ` Greg Kroah-Hartman
@ 2022-01-04 13:54       ` Ingo Molnar
  2022-01-04 15:09         ` Greg Kroah-Hartman
  0 siblings, 1 reply; 54+ messages in thread
From: Ingo Molnar @ 2022-01-04 13:54 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Kirill A. Shutemov, Linus Torvalds, linux-kernel, linux-arch,
	Andrew Morton, Peter Zijlstra, Thomas Gleixner, David S. Miller,
	Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet, Al Viro


* Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote:

> On Tue, Jan 04, 2022 at 11:54:55AM +0100, Ingo Molnar wrote:
> > There's one happy exception though, all the uninlining patches that 
> > uninline a single-call function are probably fine as-is:
> 
> <snip>
> 
> >  3443e75fd1f8 headers/uninline: Uninline single-use function: kobject_has_children()
> 
> Let me go take this right now, no need for this to wait, it should be
> out of kobject.h as you rightfully show there is only one user.

Sure - here you go!

Thanks,

	Ingo


=============================>
From: Ingo Molnar <mingo@kernel.org>
Date: Sun, 29 Aug 2021 09:18:53 +0200
Subject: [PATCH] headers/uninline: Uninline single-use function: kobject_has_children()

This was the only usage of <linux/kref_api.h> in <linux/kobject_api.h>,
so we'll be able to decouple the two after this change.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 drivers/base/core.c     | 17 +++++++++++++++++
 include/linux/kobject.h | 17 -----------------
 2 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index fd034d742447..e1f2a5791c0e 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -3029,6 +3029,23 @@ static inline struct kobject *get_glue_dir(struct device *dev)
 	return dev->kobj.parent;
 }
 
+/**
+ * kobject_has_children - Returns whether a kobject has children.
+ * @kobj: the object to test
+ *
+ * This will return whether a kobject has other kobjects as children.
+ *
+ * It does NOT account for the presence of attribute files, only sub
+ * directories. It also assumes there is no concurrent addition or
+ * removal of such children, and thus relies on external locking.
+ */
+static inline bool kobject_has_children(struct kobject *kobj)
+{
+	WARN_ON_ONCE(kref_read(&kobj->kref) == 0);
+
+	return kobj->sd && kobj->sd->dir.subdirs;
+}
+
 /*
  * make sure cleaning up dir as the last step, we need to make
  * sure .release handler of kobject is run with holding the
diff --git a/include/linux/kobject.h b/include/linux/kobject.h
index efd56f990a46..e1c600a377f7 100644
--- a/include/linux/kobject.h
+++ b/include/linux/kobject.h
@@ -117,23 +117,6 @@ extern void kobject_get_ownership(struct kobject *kobj,
 				  kuid_t *uid, kgid_t *gid);
 extern char *kobject_get_path(struct kobject *kobj, gfp_t flag);
 
-/**
- * kobject_has_children - Returns whether a kobject has children.
- * @kobj: the object to test
- *
- * This will return whether a kobject has other kobjects as children.
- *
- * It does NOT account for the presence of attribute files, only sub
- * directories. It also assumes there is no concurrent addition or
- * removal of such children, and thus relies on external locking.
- */
-static inline bool kobject_has_children(struct kobject *kobj)
-{
-	WARN_ON_ONCE(kref_read(&kobj->kref) == 0);
-
-	return kobj->sd && kobj->sd->dir.subdirs;
-}
-
 struct kobj_type {
 	void (*release)(struct kobject *kobj);
 	const struct sysfs_ops *sysfs_ops;

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH] per_task: Implement single template to define 'struct task_struct_per_task' fields and offsets
  2022-01-03 10:11 ` [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" Greg Kroah-Hartman
  2022-01-03 11:12   ` Ingo Molnar
@ 2022-01-04 14:05   ` Ingo Molnar
  1 sibling, 0 replies; 54+ messages in thread
From: Ingo Molnar @ 2022-01-04 14:05 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Linus Torvalds, linux-kernel, linux-arch, Andrew Morton,
	Peter Zijlstra, Thomas Gleixner, David S. Miller, Ard Biesheuvel,
	Josh Poimboeuf, Jonathan Corbet, Al Viro


* Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote:

> > Techniques used by the fast-headers tree to reduce header size & dependencies:
> > 
> >  - Aggressive decoupling of high level headers from each other, starting
> >    with <linux/sched.h>. Since 'struct task_struct' is a union of many
> >    subsystems, there's a new "per_task" infrastructure modeled after the
> >    per_cpu framework, which creates fields in task_struct without having
> >    to modify sched.h or the 'struct task_struct' type:
> > 
> >             DECLARE_PER_TASK(type, name);
> >             ...
> >             per_task(current, name) = val;
> > 
> >    The per_task() facility then seamlessly creates an offset into the
> >    task_struct->per_task_area[] array, and uses the asm-offsets.h
> >    mechanism to create offsets into it early in the build.
> > 
> >    There's no runtime overhead disadvantage from using per_task() framework,
> >    the generated code is functionally equivalent to types embedded in
> >    task_struct.
> 
> This is "interesting", but how are you going to keep the 
> kernel/sched/per_task_area_struct_defs.h and struct task_struct_per_task 
> definition in sync?  It seems that you manually created this (which is 
> great for testing), but over the long-term, trying to manually determine 
> what needs to be done here to keep everything lined up properly is going 
> to be a major pain.

On second thought, I found a solution for this problem and implemented it 
- delta patch attached.

The idea is to unify the two files into a single 'template' definition in:

   kernel/sched/per_task_area_struct_template.h

... with the following, slightly non-standard syntax:

 #ifdef CONFIG_THREAD_INFO_IN_TASK
	/*
	 * For reasons of header soup (see current_thread_info()), this
	 * must be the first element of task_struct.
	 */
	DEF(	struct thread_info,		ti						);
 #endif
	DEF(	void *,				stack						);
	DEF(	refcount_t,			usage						);

	/* Per task flags (PF_*), defined further below: */
	DEF(	unsigned int,			flags						);
	DEF(	unsigned int,			ptrace						);

This looks 'almost' like a C structure definition - but is wrapped in the 
DEF() macro.

Once we have that template, we can use it both to generate the 'struct 
task_struct_per_task' definition, and to pick up the field offsets for the 
per_task() asm-offsets.h machinery.

The advantage is that it solves the problems you mentioned above: the 
per-task structure and the offset definitions can never get out of sync - 
the #ifdefs and the field names will always match.
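
As a rough, self-contained illustration of the idea (not the code from the 
patch): the sketch below keeps the DEF() list in a macro instead of a separate 
template header, and emits the offsets as plain constants instead of going 
through asm-offsets.h. All demo_*/DEMO_* names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical single-file stand-in for the template header: each
 * field is listed exactly once. The patch itself keeps the DEF() lines
 * in a separate file that gets included under different DEF()
 * definitions.
 */
#define DEMO_PER_TASK_TEMPLATE			\
	DEF(	void *,		stack	);	\
	DEF(	unsigned int,	flags	);	\
	DEF(	unsigned int,	ptrace	);

/* Expansion 1: generate the structure definition. */
#define DEF(type, name)		type name
struct demo_per_task {
	DEMO_PER_TASK_TEMPLATE
};
#undef DEF

/* Expansion 2: generate one offset constant per field. */
#define DEF(type, name)		\
	const size_t demo_offset_##name = offsetof(struct demo_per_task, name)
DEMO_PER_TASK_TEMPLATE
#undef DEF

/* Both expansions come from the same list, so they cannot drift apart. */
void demo_check_offsets(void)
{
	assert(demo_offset_stack == offsetof(struct demo_per_task, stack));
	assert(demo_offset_flags == offsetof(struct demo_per_task, flags));
	assert(demo_offset_ptrace == offsetof(struct demo_per_task, ptrace));
}
```

Adding a field means adding one DEF() line; the structure member and its 
offset constant both follow from it, including any #ifdef placed around it in 
a real template file.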

It's also a net reduction in code:

    3 files changed, 216 insertions(+), 341 deletions(-)

Does this approach look better to you?

This patch builds and boots fine in the latest -fast-headers tree.

I'm still of two minds about whether to keep the per-task structure tucked 
away in kernel/sched/, hopefully creating a barrier against spurious 
additions to task_struct by putting it next to scary scheduler code - or 
should we move it into a more formal and easier to access/modify location 
in include/linux/sched/?

Another additional (minor) advantage would be that these uglies:

  arch/arm64/kernel/asm-offsets.c:#include "../../../kernel/sched/per_task_area_struct.h"
  arch/arm64/kernel/asm-offsets.c:#include "../../../kernel/sched/per_task_area_struct_defs.h"
  arch/mips/kernel/asm-offsets.c:#include "../../../kernel/sched/per_task_area_struct.h"
  arch/mips/kernel/asm-offsets.c:#include "../../../kernel/sched/per_task_area_struct_defs.h"
  arch/sparc/kernel/asm-offsets.c:#include "../../../kernel/sched/per_task_area_struct.h"
  arch/sparc/kernel/asm-offsets.c:#include "../../../kernel/sched/per_task_area_struct_defs.h"
  arch/x86/kernel/asm-offsets.c:#include "../../../kernel/sched/per_task_area_struct.h"
  arch/x86/kernel/asm-offsets.c:#include "../../../kernel/sched/per_task_area_struct_defs.h"

would turn into standard include lines:

  #include <linux/sched/per_task_defs.h>

Thanks,

	Ingo

======================>
From: Ingo Molnar <mingo@kernel.org>
Date: Tue, 4 Jan 2022 14:31:12 +0100
Subject: [PATCH] per_task: Implement single template to define 'struct task_struct_per_task' fields and offsets

Greg observed that the 'struct task_struct_per_task' definition
and the offset definitions are structural duplicates of each
other:

   kernel/sched/per_task_area_struct.h
   kernel/sched/per_task_area_struct_defs.h

These require care during maintenance and could get out of sync.

To address this problem, introduce a single definition template:

   kernel/sched/per_task_area_struct_template.h

And use the template and different preprocessor macros to implement
the two pieces of functionality.

The syntax in the template is C-like struct field definitions,
wrapped in the DEF() and DEF_A() macros:

 #ifdef CONFIG_THREAD_INFO_IN_TASK
	/*
	 * For reasons of header soup (see current_thread_info()), this
	 * must be the first element of task_struct.
	 */
	DEF(	struct thread_info,		ti						);
 #endif
	DEF(	void *,				stack						);
	DEF(	refcount_t,			usage						);

	/* Per task flags (PF_*), defined further below: */
	DEF(	unsigned int,			flags						);
	DEF(	unsigned int,			ptrace						);

Reported-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/per_task_area_struct.h          | 196 ++------------------------
 kernel/sched/per_task_area_struct_defs.h     | 163 ++--------------------
 kernel/sched/per_task_area_struct_template.h | 198 +++++++++++++++++++++++++++
 3 files changed, 216 insertions(+), 341 deletions(-)

diff --git a/kernel/sched/per_task_area_struct.h b/kernel/sched/per_task_area_struct.h
index fad3c24df500..4508160e49ec 100644
--- a/kernel/sched/per_task_area_struct.h
+++ b/kernel/sched/per_task_area_struct.h
@@ -40,194 +40,16 @@
 
 #include "sched.h"
 
-struct task_struct_per_task {
-#ifdef CONFIG_THREAD_INFO_IN_TASK
-	/*
-	 * For reasons of header soup (see current_thread_info()), this
-	 * must be the first element of task_struct.
-	 */
-	struct thread_info		ti;
-#endif
-	void				*stack;
-	refcount_t			usage;
-	/* Per task flags (PF_*), defined further below: */
-	unsigned int			flags;
-	unsigned int			ptrace;
-
-#ifdef CONFIG_SMP
-	int				on_cpu;
-	struct __call_single_node	wake_entry;
-#ifdef CONFIG_THREAD_INFO_IN_TASK
-	/* Current CPU: */
-	unsigned int			cpu;
-#endif
-	unsigned int			wakee_flips;
-	unsigned long			wakee_flip_decay_ts;
-	struct task_struct		*last_wakee;
-	int				recent_used_cpu;
-	int				wake_cpu;
-#endif
-	int				on_rq;
-	struct sched_class		*sched_class;
-	struct sched_entity		se;
-	struct sched_rt_entity		rt;
-	struct sched_dl_entity		dl;
-
-#ifdef CONFIG_SCHED_CORE
-	struct rb_node			core_node;
-	unsigned long			core_cookie;
-	unsigned int			core_occupation;
-#endif
-
-#ifdef CONFIG_CGROUP_SCHED
-	struct task_group		*sched_task_group;
-#endif
-
-#ifdef CONFIG_UCLAMP_TASK
-	/*
-	 * Clamp values requested for a scheduling entity.
-	 * Must be updated with task_rq_lock() held.
-	 */
-	struct uclamp_se		uclamp_req[UCLAMP_CNT];
-	/*
-	 * Effective clamp values used for a scheduling entity.
-	 * Must be updated with task_rq_lock() held.
-	 */
-	struct uclamp_se		uclamp[UCLAMP_CNT];
-#endif
-
-#ifdef CONFIG_PREEMPT_NOTIFIERS
-	/* List of struct preempt_notifier: */
-	struct hlist_head		preempt_notifiers;
-#endif
-
-#ifdef CONFIG_BLK_DEV_IO_TRACE
-	unsigned int			btrace_seq;
-#endif
-
-	const cpumask_t			*cpus_ptr;
-	cpumask_t			*user_cpus_ptr;
-	cpumask_t			cpus_mask;
-#ifdef CONFIG_TASKS_RCU
-	unsigned long			rcu_tasks_nvcsw;
-	u8				rcu_tasks_holdout;
-	u8				rcu_tasks_idx;
-	int				rcu_tasks_idle_cpu;
-	struct list_head		rcu_tasks_holdout_list;
-#endif /* #ifdef CONFIG_TASKS_RCU */
-	struct sched_info		sched_info;
-
-#ifdef CONFIG_SMP
-	struct plist_node		pushable_tasks;
-	struct rb_node			pushable_dl_tasks;
-#endif
-	/* Per-thread vma caching: */
-	struct vmacache			vmacache;
-
-#ifdef SPLIT_RSS_COUNTING
-	struct task_rss_stat		rss_stat;
-#endif
-	struct restart_block		restart_block;
-	struct prev_cputime		prev_cputime;
-#ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
-	struct vtime			vtime;
-#endif
-#ifdef CONFIG_NO_HZ_FULL
-	atomic_t			tick_dep_mask;
-#endif
-	/* Empty if CONFIG_POSIX_CPUTIMERS=n */
-	struct posix_cputimers		posix_cputimers;
-
-#ifdef CONFIG_POSIX_CPU_TIMERS_TASK_WORK
-	struct posix_cputimers_work	posix_cputimers_work;
-#endif
+/* Simple struct members: */
+#define DEF(type, name)			type name
 
-#ifdef CONFIG_SYSVIPC
-	struct sysv_sem			sysvsem;
-	struct sysv_shm			sysvshm;
-#endif
-	sigset_t			blocked;
-	sigset_t			real_blocked;
-	/* Restored if set_restore_sigmask() was used: */
-	sigset_t			saved_sigmask;
-	struct sigpending		pending;
-	kuid_t				loginuid;
-	struct seccomp			seccomp;
-	/* Protection against (de-)allocation: mm, files, fs, tty, keyrings, mems_allowed, mempolicy: */
-	spinlock_t			alloc_lock;
+/* Array members: */
+#define DEF_A(type, name, size)		type name size
 
-	/* Protection of the PI data structures: */
-	raw_spinlock_t			pi_lock;
-
-#ifdef CONFIG_RT_MUTEXES
-	/* PI waiters blocked on a rt_mutex held by this task: */
-	struct rb_root_cached		pi_waiters;
-#endif
-
-#ifdef CONFIG_DEBUG_MUTEXES
-	/* Mutex deadlock detection: */
-	struct mutex_waiter		*blocked_on;
-#endif
-	kernel_siginfo_t		*last_siginfo;
-#ifdef CONFIG_CPUSETS
-	/* Protected by ->alloc_lock: */
-	nodemask_t			mems_allowed;
-	/* Sequence number to catch updates: */
-	seqcount_spinlock_t		mems_allowed_seq;
-	int				cpuset_mem_spread_rotor;
-	int				cpuset_slab_spread_rotor;
-#endif
-	struct mutex			futex_exit_mutex;
-#ifdef CONFIG_PERF_EVENTS
-	struct perf_event_context	*perf_event_ctxp[perf_nr_task_contexts];
-	struct mutex			perf_event_mutex;
-	struct list_head		perf_event_list;
-#endif
-#ifdef CONFIG_RSEQ
-	struct rseq __user *rseq;
-#endif
-	struct tlbflush_unmap_batch	tlb_ubc;
-
-	refcount_t			rcu_users;
-	struct rcu_head			rcu;
-
-	struct page_frag		task_frag;
-
-#ifdef CONFIG_KCSAN
-	struct kcsan_ctx		kcsan_ctx;
-#ifdef CONFIG_TRACE_IRQFLAGS
-	struct irqtrace_events		kcsan_save_irqtrace;
-#endif
-#endif
-
-#ifdef CONFIG_FUNCTION_GRAPH_TRACER
-
-	/*
-	 * Number of functions that haven't been traced
-	 * because of depth overrun:
-	 */
-	atomic_t			trace_overrun;
-
-	/* Pause tracing: */
-	atomic_t			tracing_graph_pause;
-#endif
-#ifdef CONFIG_KMAP_LOCAL
-	struct kmap_ctrl		kmap_ctrl;
-#endif
-	int				pagefault_disabled;
-#ifdef CONFIG_VMAP_STACK
-	struct vm_struct		*stack_vm_area;
-#endif
-#ifdef CONFIG_THREAD_INFO_IN_TASK
-	/* A live task holds one reference: */
-	refcount_t			stack_refcount;
-#endif
-#ifdef CONFIG_KRETPROBES
-	struct llist_head               kretprobe_instances;
-#endif
+struct task_struct_per_task {
+#include "per_task_area_struct_template.h"
+};
 
-	/* CPU-specific state of this task: */
-	struct thread_struct		thread;
+#undef DEF_A
+#undef DEF
 
-	char				_end;
-};
diff --git a/kernel/sched/per_task_area_struct_defs.h b/kernel/sched/per_task_area_struct_defs.h
index 71f2a2884958..1d9b2e039880 100644
--- a/kernel/sched/per_task_area_struct_defs.h
+++ b/kernel/sched/per_task_area_struct_defs.h
@@ -4,162 +4,17 @@
 
 #include <linux/kbuild.h>
 
-#define DEF_PER_TASK(name) DEFINE(PER_TASK_OFFSET__##name, offsetof(struct task_struct_per_task, name))
+#define DEF_PER_TASK(name)		DEFINE(PER_TASK_OFFSET__##name, offsetof(struct task_struct_per_task, name))
 
-void __used per_task_common(void)
-{
-#ifdef CONFIG_THREAD_INFO_IN_TASK
-	DEF_PER_TASK(ti);
-#endif
-	DEF_PER_TASK(stack);
-	DEF_PER_TASK(usage);
-	DEF_PER_TASK(flags);
-	DEF_PER_TASK(ptrace);
-
-#ifdef CONFIG_SMP
-	DEF_PER_TASK(on_cpu);
-	DEF_PER_TASK(wake_entry);
-#ifdef CONFIG_THREAD_INFO_IN_TASK
-	DEF_PER_TASK(cpu);
-#endif
-	DEF_PER_TASK(wakee_flips);
-	DEF_PER_TASK(wakee_flip_decay_ts);
-	DEF_PER_TASK(last_wakee);
-	DEF_PER_TASK(recent_used_cpu);
-	DEF_PER_TASK(wake_cpu);
-#endif
-	DEF_PER_TASK(on_rq);
-	DEF_PER_TASK(sched_class);
-	DEF_PER_TASK(se);
-	DEF_PER_TASK(rt);
-	DEF_PER_TASK(dl);
-
-#ifdef CONFIG_SCHED_CORE
-	DEF_PER_TASK(core_node);
-	DEF_PER_TASK(core_cookie);
-	DEF_PER_TASK(core_occupation);
-#endif
-
-#ifdef CONFIG_CGROUP_SCHED
-	DEF_PER_TASK(sched_task_group);
-#endif
-
-#ifdef CONFIG_UCLAMP_TASK
-	DEF_PER_TASK(uclamp_req);
-	DEF_PER_TASK(uclamp);
-#endif
-
-#ifdef CONFIG_PREEMPT_NOTIFIERS
-	DEF_PER_TASK(preempt_notifiers);
-#endif
-
-#ifdef CONFIG_BLK_DEV_IO_TRACE
-	DEF_PER_TASK(btrace_seq);
-#endif
-
-	DEF_PER_TASK(cpus_ptr);
-	DEF_PER_TASK(user_cpus_ptr);
-	DEF_PER_TASK(cpus_mask);
-#ifdef CONFIG_TASKS_RCU
-	DEF_PER_TASK(rcu_tasks_nvcsw);
-	DEF_PER_TASK(rcu_tasks_holdout);
-	DEF_PER_TASK(rcu_tasks_idx);
-	DEF_PER_TASK(rcu_tasks_idle_cpu);
-	DEF_PER_TASK(rcu_tasks_holdout_list);
-#endif
-	DEF_PER_TASK(sched_info);
-
-#ifdef CONFIG_SMP
-	DEF_PER_TASK(pushable_tasks);
-	DEF_PER_TASK(pushable_dl_tasks);
-#endif
-	DEF_PER_TASK(vmacache);
+#define DEF(type, name)			DEF_PER_TASK(name)
+#define DEF_A(type, name, size)		DEF_PER_TASK(name)
 
-#ifdef SPLIT_RSS_COUNTING
-	DEF_PER_TASK(rss_stat);
-#endif
-	DEF_PER_TASK(restart_block);
-	DEF_PER_TASK(prev_cputime);
-#ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
-	DEF_PER_TASK(vtime);
-#endif
-#ifdef CONFIG_NO_HZ_FULL
-	DEF_PER_TASK(tick_dep_mask);
-#endif
-	DEF_PER_TASK(posix_cputimers);
 
-#ifdef CONFIG_POSIX_CPU_TIMERS_TASK_WORK
-	DEF_PER_TASK(posix_cputimers_work);
-#endif
-
-#ifdef CONFIG_SYSVIPC
-	DEF_PER_TASK(sysvsem);
-	DEF_PER_TASK(sysvshm);
-#endif
-	DEF_PER_TASK(blocked);
-	DEF_PER_TASK(real_blocked);
-	DEF_PER_TASK(saved_sigmask);
-	DEF_PER_TASK(pending);
-	DEF_PER_TASK(loginuid);
-	DEF_PER_TASK(seccomp);
-	DEF_PER_TASK(alloc_lock);
-
-	DEF_PER_TASK(pi_lock);
-
-#ifdef CONFIG_RT_MUTEXES
-	DEF_PER_TASK(pi_waiters);
-#endif
-
-#ifdef CONFIG_DEBUG_MUTEXES
-	DEF_PER_TASK(blocked_on);
-#endif
-	DEF_PER_TASK(last_siginfo);
-#ifdef CONFIG_CPUSETS
-	DEF_PER_TASK(mems_allowed);
-	DEF_PER_TASK(mems_allowed_seq);
-	DEF_PER_TASK(cpuset_mem_spread_rotor);
-	DEF_PER_TASK(cpuset_slab_spread_rotor);
-#endif
-	DEF_PER_TASK(futex_exit_mutex);
-#ifdef CONFIG_PERF_EVENTS
-	DEF_PER_TASK(perf_event_ctxp);
-	DEF_PER_TASK(perf_event_mutex);
-	DEF_PER_TASK(perf_event_list);
-#endif
-#ifdef CONFIG_RSEQ
-	DEF_PER_TASK(rseq);
-#endif
-	DEF_PER_TASK(tlb_ubc);
-
-	DEF_PER_TASK(rcu_users);
-	DEF_PER_TASK(rcu);
-
-	DEF_PER_TASK(task_frag);
+void __used per_task_common(void)
+{
+#include "per_task_area_struct_template.h"
+}
 
-#ifdef CONFIG_KCSAN
-	DEF_PER_TASK(kcsan_ctx);
-#ifdef CONFIG_TRACE_IRQFLAGS
-	DEF_PER_TASK(kcsan_save_irqtrace);
-#endif
-#endif
+#undef DEF_A
+#undef DEF
 
-#ifdef CONFIG_FUNCTION_GRAPH_TRACER
-	DEF_PER_TASK(trace_overrun);
-	DEF_PER_TASK(tracing_graph_pause);
-#endif
-#ifdef CONFIG_KMAP_LOCAL
-	DEF_PER_TASK(kmap_ctrl);
-#endif
-	DEF_PER_TASK(pagefault_disabled);
-#ifdef CONFIG_VMAP_STACK
-	DEF_PER_TASK(stack_vm_area);
-#endif
-#ifdef CONFIG_THREAD_INFO_IN_TASK
-	DEF_PER_TASK(stack_refcount);
-#endif
-#ifdef CONFIG_KRETPROBES
-	DEF_PER_TASK(kretprobe_instances);
-#endif
-	DEF_PER_TASK(thread);
-	DEF_PER_TASK(_end);
-}
diff --git a/kernel/sched/per_task_area_struct_template.h b/kernel/sched/per_task_area_struct_template.h
new file mode 100644
index 000000000000..ed2ccd80c83c
--- /dev/null
+++ b/kernel/sched/per_task_area_struct_template.h
@@ -0,0 +1,198 @@
+
+/*
+ * This is the primary definition of per_task() fields,
+ * which gets turned into the 'struct task_struct_per_task'
+ * structure definition, and into offset definitions,
+ * in per_task_area_struct.h and per_task_area_struct_defs.h:
+ */
+
+#ifdef CONFIG_THREAD_INFO_IN_TASK
+	/*
+	 * For reasons of header soup (see current_thread_info()), this
+	 * must be the first element of task_struct.
+	 */
+	DEF(	struct thread_info,		ti						);
+#endif
+	DEF(	void *,				stack						);
+	DEF(	refcount_t,			usage						);
+
+	/* Per task flags (PF_*), defined further below: */
+	DEF(	unsigned int,			flags						);
+	DEF(	unsigned int,			ptrace						);
+
+#ifdef CONFIG_SMP
+	DEF(	int,				on_cpu						);
+	DEF(	struct __call_single_node,	wake_entry					);
+#ifdef CONFIG_THREAD_INFO_IN_TASK
+	/* Current CPU: */
+	DEF(	unsigned int,			cpu						);
+#endif
+	DEF(	unsigned int,			wakee_flips					);
+	DEF(	unsigned long,			wakee_flip_decay_ts				);
+	DEF(	struct task_struct *,		last_wakee					);
+	DEF(	int,				recent_used_cpu					);
+	DEF(	int,				wake_cpu					);
+#endif
+	DEF(	int,				on_rq						);
+	DEF(	struct sched_class *,		sched_class					);
+	DEF(	struct sched_entity,		se						);
+	DEF(	struct sched_rt_entity,		rt						);
+	DEF(	struct sched_dl_entity,		dl						);
+
+#ifdef CONFIG_SCHED_CORE
+	DEF(	struct rb_node,			core_node					);
+	DEF(	unsigned long,			core_cookie					);
+	DEF(	unsigned int,			core_occupation					);
+#endif
+
+#ifdef CONFIG_CGROUP_SCHED
+	DEF(	struct task_group *,		sched_task_group				);
+#endif
+
+#ifdef CONFIG_UCLAMP_TASK
+	/*
+	 * Clamp values requested for a scheduling entity.
+	 * Must be updated with task_rq_lock() held.
+	 */
+	DEF_A(	struct uclamp_se,		uclamp_req, [UCLAMP_CNT]			);
+	/*
+	 * Effective clamp values used for a scheduling entity.
+	 * Must be updated with task_rq_lock() held.
+	 */
+	DEF_A(	struct uclamp_se,		uclamp, [UCLAMP_CNT]				);
+#endif
+
+#ifdef CONFIG_PREEMPT_NOTIFIERS
+	/* List of struct preempt_notifier: */
+	DEF(	struct hlist_head,		preempt_notifiers				);
+#endif
+
+#ifdef CONFIG_BLK_DEV_IO_TRACE
+	DEF(	unsigned int,			btrace_seq					);
+#endif
+
+	DEF(	const cpumask_t *,		cpus_ptr					);
+	DEF(	cpumask_t *,			user_cpus_ptr					);
+	DEF(	cpumask_t,			cpus_mask					);
+#ifdef CONFIG_TASKS_RCU
+	DEF(	unsigned long,			rcu_tasks_nvcsw					);
+	DEF(	u8,				rcu_tasks_holdout				);
+	DEF(	u8,				rcu_tasks_idx					);
+	DEF(	int,				rcu_tasks_idle_cpu				);
+	DEF(	struct list_head,		rcu_tasks_holdout_list				);
+#endif /* #ifdef CONFIG_TASKS_RCU */
+	DEF(	struct sched_info,		sched_info					);
+
+#ifdef CONFIG_SMP
+	DEF(	struct plist_node,		pushable_tasks					);
+	DEF(	struct rb_node,			pushable_dl_tasks				);
+#endif
+	/* Per-thread vma caching: */
+	DEF(	struct vmacache,		vmacache					);
+
+#ifdef SPLIT_RSS_COUNTING
+	DEF(	struct task_rss_stat,		rss_stat					);
+#endif
+	DEF(	struct restart_block,		restart_block					);
+	DEF(	struct prev_cputime,		prev_cputime					);
+#ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
+	DEF(	struct vtime,			vtime						);
+#endif
+#ifdef CONFIG_NO_HZ_FULL
+	DEF(	atomic_t,			tick_dep_mask					);
+#endif
+	/* Empty if CONFIG_POSIX_CPUTIMERS=n */
+	DEF(	struct posix_cputimers,		posix_cputimers					);
+
+#ifdef CONFIG_POSIX_CPU_TIMERS_TASK_WORK
+	DEF(	struct posix_cputimers_work,	posix_cputimers_work				);
+#endif
+
+#ifdef CONFIG_SYSVIPC
+	DEF(	struct sysv_sem,		sysvsem						);
+	DEF(	struct sysv_shm,		sysvshm						);
+#endif
+	DEF(	sigset_t,			blocked						);
+	DEF(	sigset_t,			real_blocked					);
+	/* Restored if set_restore_sigmask() was used: */
+	DEF(	sigset_t,			saved_sigmask					);
+	DEF(	struct sigpending,		pending						);
+	DEF(	kuid_t,				loginuid					);
+	DEF(	struct seccomp,			seccomp						);
+	/* Protection against (de-)allocation: mm, files, fs, tty, keyrings, mems_allowed, mempolicy: */
+	DEF(	spinlock_t,			alloc_lock					);
+
+	/* Protection of the PI data structures: */
+	DEF(	raw_spinlock_t,			pi_lock						);
+
+#ifdef CONFIG_RT_MUTEXES
+	/* PI waiters blocked on a rt_mutex held by this task: */
+	DEF(	struct rb_root_cached,		pi_waiters					);
+#endif
+
+#ifdef CONFIG_DEBUG_MUTEXES
+	/* Mutex deadlock detection: */
+	DEF(	struct mutex_waiter *,		blocked_on					);
+#endif
+	DEF(	kernel_siginfo_t *,		last_siginfo					);
+#ifdef CONFIG_CPUSETS
+	/* Protected by ->alloc_lock: */
+	DEF(	nodemask_t,			mems_allowed					);
+	/* Sequence number to catch updates: */
+	DEF(	seqcount_spinlock_t,		mems_allowed_seq				);
+	DEF(	int,				cpuset_mem_spread_rotor				);
+	DEF(	int,				cpuset_slab_spread_rotor			);
+#endif
+	DEF(	struct mutex,			futex_exit_mutex				);
+#ifdef CONFIG_PERF_EVENTS
+	DEF_A(	struct perf_event_context *,	perf_event_ctxp, [perf_nr_task_contexts]	);
+	DEF(	struct mutex,			perf_event_mutex				);
+	DEF(	struct list_head,		perf_event_list					);
+#endif
+#ifdef CONFIG_RSEQ
+	DEF(	struct rseq __user *,		rseq						);
+#endif
+	DEF(	struct tlbflush_unmap_batch,	tlb_ubc						);
+
+	DEF(	refcount_t,			rcu_users					);
+	DEF(	struct rcu_head,		rcu						);
+
+	DEF(	struct page_frag,		task_frag					);
+
+#ifdef CONFIG_KCSAN
+	DEF(	struct kcsan_ctx,		kcsan_ctx					);
+#ifdef CONFIG_TRACE_IRQFLAGS
+	DEF(	struct irqtrace_events,		kcsan_save_irqtrace				);
+#endif
+#endif
+
+#ifdef CONFIG_FUNCTION_GRAPH_TRACER
+
+	/*
+	 * Number of functions that haven't been traced
+	 * because of depth overrun:
+	 */
+	DEF(	atomic_t,			trace_overrun					);
+
+	/* Pause tracing: */
+	DEF(	atomic_t,			tracing_graph_pause				);
+#endif
+#ifdef CONFIG_KMAP_LOCAL
+	DEF(	struct kmap_ctrl,		kmap_ctrl					);
+#endif
+	DEF(	int,				pagefault_disabled				);
+#ifdef CONFIG_VMAP_STACK
+	DEF(	struct vm_struct *,		stack_vm_area					);
+#endif
+#ifdef CONFIG_THREAD_INFO_IN_TASK
+	/* A live task holds one reference: */
+	DEF(	refcount_t,			stack_refcount					);
+#endif
+#ifdef CONFIG_KRETPROBES
+	DEF(	struct llist_head,		kretprobe_instances				);
+#endif
+
+	/* CPU-specific state of this task: */
+	DEF(	struct thread_struct,		thread						);
+
+	DEF(	char,				_end						);
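
The pattern used by the template above, one field list expanded twice with different DEF()/DEF_A() definitions, can be tried outside the kernel. The sketch below is only an illustration under stated assumptions: DEMO_FIELDS, struct demo_per_task and the DEMO_OFFSET__* names are hypothetical, a list macro stands in for the #include of per_task_area_struct_template.h, and an enum stands in for the kbuild DEFINE() offset output.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical field list standing in for the template header.
 * Each entry expands to whatever DEF()/DEF_A() mean at the point of use.
 */
#define DEMO_FIELDS					\
	DEF(	void *,		stack		)	\
	DEF(	unsigned int,	flags		)	\
	DEF_A(	int,		clamp, [2]	)	\
	DEF(	char,		_end		)

/* Expansion 1: struct members. */
#define DEF(type, name)			type name;
#define DEF_A(type, name, size)		type name size;

struct demo_per_task {
	DEMO_FIELDS
};

#undef DEF_A
#undef DEF

/* Expansion 2: one offset constant per field, in the same order. */
#define DEF(type, name)			DEMO_OFFSET__##name = offsetof(struct demo_per_task, name),
#define DEF_A(type, name, size)		DEMO_OFFSET__##name = offsetof(struct demo_per_task, name),

enum demo_offsets {
	DEMO_FIELDS
};

#undef DEF_A
#undef DEF
```

Adding a field to DEMO_FIELDS updates both the structure and the offset table, so the two cannot drift apart, which appears to be the point of moving the definitions into a single template.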


* [PATCH] per_task: Remove the PER_TASK_BYTES hard-coded constant
  2022-01-03 11:12   ` Ingo Molnar
  2022-01-03 13:46     ` Greg Kroah-Hartman
@ 2022-01-04 14:10     ` Ingo Molnar
  2022-01-04 15:14       ` Andy Shevchenko
  2022-01-04 17:51     ` [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" Arnd Bergmann
  2 siblings, 1 reply; 54+ messages in thread
From: Ingo Molnar @ 2022-01-04 14:10 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Linus Torvalds, linux-kernel, linux-arch, Andrew Morton,
	Peter Zijlstra, Thomas Gleixner, David S. Miller, Ard Biesheuvel,
	Josh Poimboeuf, Jonathan Corbet, Al Viro


* Ingo Molnar <mingo@kernel.org> wrote:

> There's one thing ugly about it: the fixed PER_TASK_BYTES limit. I plan 
> to make ->per_task_area[] the last field of task_struct, i.e. change it 
> to:
> 
>         u8                              per_task_area[];
> 
> This actually became possible through the fixing of the x86 FPU code in the 
> following fast-headers commit:
> 
>    4ae0f28bc1c8 headers/deps: x86/fpu: Make task_struct::thread constant size

So I implemented this approach - the patch below removes the PER_TASK_BYTES 
hard-coded limit.

( I didn't make it variable-size via per_task_area[] though - we *do* 
  already know its size at build time, and known-size structures are 
  generally better than trailing-variable-array solutions:

   - they work better with static checkers,
   - and we actually want the offsets into thread_info to stay small on 
     embedded platforms,

  etc. )
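
To make the trade-off concrete, here is a minimal sketch contrasting the two layouts. Everything in it is hypothetical (DEMO_AREA_BYTES, struct demo_task_fixed, struct demo_task_flex, demo_flex_alloc_size()); it is not the task_struct layout, just the general C behaviour being relied on.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical size; stands in for a value known at build time. */
#define DEMO_AREA_BYTES 64

/* Known-size variant: sizeof() already accounts for the area. */
struct demo_task_fixed {
	long		state;
	unsigned char	area[DEMO_AREA_BYTES];
};

/* Flexible-array variant: the area is invisible to sizeof(). */
struct demo_task_flex {
	long		state;
	unsigned char	area[];
};

/* The compiler can verify the fixed layout without any runtime code. */
static_assert(sizeof(struct demo_task_fixed) -
	      offsetof(struct demo_task_fixed, area) >= DEMO_AREA_BYTES,
	      "fixed-size area must be fully contained in the struct");

/* With the flexible array, every allocation site has to add the tail by hand. */
static inline size_t demo_flex_alloc_size(size_t area_bytes)
{
	return sizeof(struct demo_task_flex) + area_bytes;
}
```

With the fixed variant a plain sizeof(struct demo_task_fixed) is the allocation size and tools can bounds-check accesses to area[]; with the flexible variant that information only exists at the allocation site.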

Thanks,

	Ingo

============================>
From: Ingo Molnar <mingo@kernel.org>
Date: Tue, 4 Jan 2022 13:48:05 +0100
Subject: [PATCH] per_task: Remove the PER_TASK_BYTES hard-coded constant

- Also remove the unnecessary <linux/sched/per_task_types.h> header.

Not-Signed-off-by-yet: Ingo Molnar <mingo@kernel.org>
---
 include/linux/sched/per_task.h       | 3 ++-
 include/linux/sched/per_task_types.h | 7 -------
 kernel/sched/core.c                  | 4 ++++
 3 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/include/linux/sched/per_task.h b/include/linux/sched/per_task.h
index e20837e82681..a10538713a26 100644
--- a/include/linux/sched/per_task.h
+++ b/include/linux/sched/per_task.h
@@ -37,7 +37,6 @@
  * A build-time check ensures that we haven't run out of available space.
  */
 
-#include <linux/sched/per_task_types.h>
 #include <linux/compiler.h>
 
 #ifndef __PER_TASK_GEN
@@ -61,4 +60,6 @@
 
 #define per_task_container_of(var, name)	container_of((void *)(var) - per_task_offset(name), struct task_struct, per_task_area[0])
 
+#define PER_TASK_BYTES				(per_task_offset(_end))
+
 #endif /* _LINUX_SCHED_PER_TASK_H */
diff --git a/include/linux/sched/per_task_types.h b/include/linux/sched/per_task_types.h
deleted file mode 100644
index 8af8c10f8dae..000000000000
--- a/include/linux/sched/per_task_types.h
+++ /dev/null
@@ -1,7 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _LINUX_SCHED_PER_TASK_TYPES_H
-#define _LINUX_SCHED_PER_TASK_TYPES_H
-
-#define PER_TASK_BYTES 8192
-
-#endif /* _LINUX_SCHED_PER_TASK_TYPES_H */
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index bc38b19f6398..fdb5b99ae6e0 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -89,6 +89,8 @@
 #include "../../fs/io-wq.h"
 #include "../smpboot.h"
 
+#include "../../../kernel/sched/per_task_area_struct.h"
+
 DEFINE_PER_TASK(unsigned int,				flags);
 
 #ifdef CONFIG_THREAD_INFO_IN_TASK
@@ -9481,6 +9483,8 @@ void __init per_task_init(void)
 {
 	unsigned long per_task_bytes = per_task_offset(_end);
 
+	printk("per_task: sizeof(struct task_struct):          %zu bytes\n", sizeof(struct task_struct));
+	printk("per_task: sizeof(struct task_struct_per_task): %zu bytes\n", sizeof(struct task_struct_per_task));
 	printk("per_task: Using %ld per_task bytes, %ld bytes available\n", per_task_bytes, (long)PER_TASK_BYTES);
 
 	BUG_ON(per_task_offset(_end) > PER_TASK_BYTES);
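
A note on the sizing trick used here: the offset of a trailing sentinel member such as '_end' equals the number of bytes occupied by everything before it, so it can serve as the "bytes used" value. Once PER_TASK_BYTES is itself defined as per_task_offset(_end), the BUG_ON() above compares a value with itself; a check against an independent budget could also be done at compile time. The sketch below assumes hypothetical names throughout (struct demo_area, DEMO_USED_BYTES, DEMO_BUDGET_BYTES, demo_used_bytes()) and uses C11 static_assert rather than the kernel's BUILD_BUG_ON().

```c
#include <assert.h>
#include <stddef.h>

struct demo_area {
	int	on_rq;
	long	cookie;
	char	_end;	/* sentinel: its offset is the byte count used before it */
};

/* Bytes used by the real fields, derived from the sentinel's offset. */
#define DEMO_USED_BYTES		offsetof(struct demo_area, _end)

/* Hypothetical independent limit to check against. */
#define DEMO_BUDGET_BYTES	128

/* Fails the build, not the boot, if the fields outgrow the budget. */
static_assert(DEMO_USED_BYTES <= DEMO_BUDGET_BYTES,
	      "demo area exceeds its budget");

static inline size_t demo_used_bytes(void)
{
	return DEMO_USED_BYTES;
}
```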


* Re: [PATCH] headers/deps: dcache: Move the ____cacheline_aligned attribute to the head of the definition
  2022-01-04 11:02     ` [PATCH] headers/deps: dcache: Move the ____cacheline_aligned attribute to the head of the definition Ingo Molnar
@ 2022-01-04 15:05       ` kernel test robot
  2022-01-04 17:51       ` Nathan Chancellor
  1 sibling, 0 replies; 54+ messages in thread
From: kernel test robot @ 2022-01-04 15:05 UTC (permalink / raw)
  To: Ingo Molnar, Nathan Chancellor, Al Viro, Linus Torvalds,
	Andrew Morton
  Cc: llvm, kbuild-all, LKML, Linux Memory Management List, linux-arch,
	Peter Zijlstra, Thomas Gleixner, Greg Kroah-Hartman

Hi Ingo,

I love your patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v5.16-rc8 next-20211224]
[If your patch was applied to the wrong git tree, kindly drop us a note.
When submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Ingo-Molnar/headers-deps-dcache-Move-the-____cacheline_aligned-attribute-to-the-head-of-the-definition/20220104-190351
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git c9e6606c7fe92b50a02ce51dda82586ebdf99b48
config: arm64-buildonly-randconfig-r004-20220104 (https://download.01.org/0day-ci/archive/20220104/202201042231.vdt1cNrS-lkp@intel.com/config)
compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project b50fea47b6c454581fce89af359f3afe5154986c)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install arm64 cross compiling tool for clang build
        # apt-get install binutils-aarch64-linux-gnu
        # https://github.com/0day-ci/linux/commit/a9357af49d3cae2b1b4b8bbb7f1adf9ed381bf46
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Ingo-Molnar/headers-deps-dcache-Move-the-____cacheline_aligned-attribute-to-the-head-of-the-definition/20220104-190351
        git checkout a9357af49d3cae2b1b4b8bbb7f1adf9ed381bf46
        # save the config file to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=arm64 SHELL=/bin/bash drivers/phy/amlogic/ lib/

If you fix the issue, kindly add the following tag as appropriate:
Reported-by: kernel test robot <lkp@intel.com>

All errors/warnings (new ones prefixed by >>):

   In file included from arch/arm64/kernel/asm-offsets.c:10:
   In file included from include/linux/arm_sdei.h:8:
   In file included from include/acpi/ghes.h:5:
   In file included from include/acpi/apei.h:9:
   In file included from include/linux/acpi.h:15:
   In file included from include/linux/device.h:32:
   In file included from include/linux/device/driver.h:21:
   In file included from include/linux/module.h:19:
   In file included from include/linux/elf.h:6:
   In file included from arch/arm64/include/asm/elf.h:141:
   In file included from include/linux/fs.h:8:
>> include/linux/dcache.h:137:1: warning: attribute '__aligned__' is ignored, place it after "struct" to apply attribute to type declaration [-Wignored-attributes]
   ____cacheline_aligned
   ^
   include/linux/cache.h:41:46: note: expanded from macro '____cacheline_aligned'
   #define ____cacheline_aligned __attribute__((__aligned__(SMP_CACHE_BYTES)))
                                                ^
   1 warning generated.
--
   In file included from drivers/phy/amlogic/phy-meson-g12a-usb2.c:16:
   In file included from include/linux/module.h:19:
   In file included from include/linux/elf.h:6:
   In file included from arch/arm64/include/asm/elf.h:141:
   In file included from include/linux/fs.h:8:
>> include/linux/dcache.h:137:1: warning: attribute '__aligned__' is ignored, place it after "struct" to apply attribute to type declaration [-Wignored-attributes]
   ____cacheline_aligned
   ^
   include/linux/cache.h:41:46: note: expanded from macro '____cacheline_aligned'
   #define ____cacheline_aligned __attribute__((__aligned__(SMP_CACHE_BYTES)))
                                                ^
   drivers/phy/amlogic/phy-meson-g12a-usb2.c:311:17: warning: cast to smaller integer type 'enum meson_soc_id' from 'const void *' [-Wvoid-pointer-to-enum-cast]
           priv->soc_id = (enum meson_soc_id)of_device_get_match_data(&pdev->dev);
                          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   2 warnings generated.
--
   In file included from lib/radix-tree.c:15:
   In file included from include/linux/cpu.h:17:
   In file included from include/linux/node.h:18:
   In file included from include/linux/device.h:32:
   In file included from include/linux/device/driver.h:21:
   In file included from include/linux/module.h:19:
   In file included from include/linux/elf.h:6:
   In file included from arch/arm64/include/asm/elf.h:141:
   In file included from include/linux/fs.h:8:
>> include/linux/dcache.h:137:1: warning: attribute '__aligned__' is ignored, place it after "struct" to apply attribute to type declaration [-Wignored-attributes]
   ____cacheline_aligned
   ^
   include/linux/cache.h:41:46: note: expanded from macro '____cacheline_aligned'
   #define ____cacheline_aligned __attribute__((__aligned__(SMP_CACHE_BYTES)))
                                                ^
   lib/radix-tree.c:288:6: warning: no previous prototype for function 'radix_tree_node_rcu_free' [-Wmissing-prototypes]
   void radix_tree_node_rcu_free(struct rcu_head *head)
        ^
   lib/radix-tree.c:288:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   void radix_tree_node_rcu_free(struct rcu_head *head)
   ^
   static 
   2 warnings generated.
--
   In file included from lib/test_bitops.c:9:
   In file included from include/linux/module.h:19:
   In file included from include/linux/elf.h:6:
   In file included from arch/arm64/include/asm/elf.h:141:
   In file included from include/linux/fs.h:8:
>> include/linux/dcache.h:137:1: error: attribute '__aligned__' is ignored, place it after "struct" to apply attribute to type declaration [-Werror,-Wignored-attributes]
   ____cacheline_aligned
   ^
   include/linux/cache.h:41:46: note: expanded from macro '____cacheline_aligned'
   #define ____cacheline_aligned __attribute__((__aligned__(SMP_CACHE_BYTES)))
                                                ^
   1 error generated.
--
   In file included from lib/test_ida.c:10:
   In file included from include/linux/module.h:19:
   In file included from include/linux/elf.h:6:
   In file included from arch/arm64/include/asm/elf.h:141:
   In file included from include/linux/fs.h:8:
>> include/linux/dcache.h:137:1: warning: attribute '__aligned__' is ignored, place it after "struct" to apply attribute to type declaration [-Wignored-attributes]
   ____cacheline_aligned
   ^
   include/linux/cache.h:41:46: note: expanded from macro '____cacheline_aligned'
   #define ____cacheline_aligned __attribute__((__aligned__(SMP_CACHE_BYTES)))
                                                ^
   lib/test_ida.c:16:6: warning: no previous prototype for function 'ida_dump' [-Wmissing-prototypes]
   void ida_dump(struct ida *ida) { }
        ^
   lib/test_ida.c:16:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   void ida_dump(struct ida *ida) { }
   ^
   static 
   2 warnings generated.
--
   In file included from lib/test_printf.c:10:
   In file included from include/linux/module.h:19:
   In file included from include/linux/elf.h:6:
   In file included from arch/arm64/include/asm/elf.h:141:
   In file included from include/linux/fs.h:8:
>> include/linux/dcache.h:137:1: warning: attribute '__aligned__' is ignored, place it after "struct" to apply attribute to type declaration [-Wignored-attributes]
   ____cacheline_aligned
   ^
   include/linux/cache.h:41:46: note: expanded from macro '____cacheline_aligned'
   #define ____cacheline_aligned __attribute__((__aligned__(SMP_CACHE_BYTES)))
                                                ^
   lib/test_printf.c:157:52: warning: format specifies type 'unsigned char' but the argument has type 'int' [-Wformat]
           test("0|1|1|128|255", "%hhu|%hhu|%hhu|%hhu|%hhu", 0, 1, 257, 128, -1);
                                  ~~~~                       ^
                                  %d
   lib/test_printf.c:137:40: note: expanded from macro 'test'
           __test(expect, strlen(expect), fmt, ##__VA_ARGS__)
                                          ~~~    ^~~~~~~~~~~
   lib/test_printf.c:157:55: warning: format specifies type 'unsigned char' but the argument has type 'int' [-Wformat]
           test("0|1|1|128|255", "%hhu|%hhu|%hhu|%hhu|%hhu", 0, 1, 257, 128, -1);
                                       ~~~~                     ^
                                       %d
   lib/test_printf.c:137:40: note: expanded from macro 'test'
           __test(expect, strlen(expect), fmt, ##__VA_ARGS__)
                                          ~~~    ^~~~~~~~~~~
   lib/test_printf.c:157:58: warning: format specifies type 'unsigned char' but the argument has type 'int' [-Wformat]
           test("0|1|1|128|255", "%hhu|%hhu|%hhu|%hhu|%hhu", 0, 1, 257, 128, -1);
                                            ~~~~                   ^~~
                                            %d
   lib/test_printf.c:137:40: note: expanded from macro 'test'
           __test(expect, strlen(expect), fmt, ##__VA_ARGS__)
                                          ~~~    ^~~~~~~~~~~
   lib/test_printf.c:157:63: warning: format specifies type 'unsigned char' but the argument has type 'int' [-Wformat]
           test("0|1|1|128|255", "%hhu|%hhu|%hhu|%hhu|%hhu", 0, 1, 257, 128, -1);
                                                 ~~~~                   ^~~
                                                 %d
   lib/test_printf.c:137:40: note: expanded from macro 'test'
           __test(expect, strlen(expect), fmt, ##__VA_ARGS__)
                                          ~~~    ^~~~~~~~~~~
   lib/test_printf.c:157:68: warning: format specifies type 'unsigned char' but the argument has type 'int' [-Wformat]
           test("0|1|1|128|255", "%hhu|%hhu|%hhu|%hhu|%hhu", 0, 1, 257, 128, -1);
                                                      ~~~~                   ^~
                                                      %d
   lib/test_printf.c:137:40: note: expanded from macro 'test'
           __test(expect, strlen(expect), fmt, ##__VA_ARGS__)
                                          ~~~    ^~~~~~~~~~~
   lib/test_printf.c:158:52: warning: format specifies type 'char' but the argument has type 'int' [-Wformat]
           test("0|1|1|-128|-1", "%hhd|%hhd|%hhd|%hhd|%hhd", 0, 1, 257, 128, -1);
                                  ~~~~                       ^
                                  %d
   lib/test_printf.c:137:40: note: expanded from macro 'test'
           __test(expect, strlen(expect), fmt, ##__VA_ARGS__)
                                          ~~~    ^~~~~~~~~~~
   lib/test_printf.c:158:55: warning: format specifies type 'char' but the argument has type 'int' [-Wformat]
           test("0|1|1|-128|-1", "%hhd|%hhd|%hhd|%hhd|%hhd", 0, 1, 257, 128, -1);
                                       ~~~~                     ^
                                       %d
   lib/test_printf.c:137:40: note: expanded from macro 'test'
           __test(expect, strlen(expect), fmt, ##__VA_ARGS__)
                                          ~~~    ^~~~~~~~~~~
   lib/test_printf.c:158:58: warning: format specifies type 'char' but the argument has type 'int' [-Wformat]
           test("0|1|1|-128|-1", "%hhd|%hhd|%hhd|%hhd|%hhd", 0, 1, 257, 128, -1);
                                            ~~~~                   ^~~
                                            %d
   lib/test_printf.c:137:40: note: expanded from macro 'test'
           __test(expect, strlen(expect), fmt, ##__VA_ARGS__)
                                          ~~~    ^~~~~~~~~~~
   lib/test_printf.c:158:63: warning: format specifies type 'char' but the argument has type 'int' [-Wformat]
           test("0|1|1|-128|-1", "%hhd|%hhd|%hhd|%hhd|%hhd", 0, 1, 257, 128, -1);
                                                 ~~~~                   ^~~
                                                 %d
   lib/test_printf.c:137:40: note: expanded from macro 'test'
           __test(expect, strlen(expect), fmt, ##__VA_ARGS__)
                                          ~~~    ^~~~~~~~~~~
   lib/test_printf.c:158:68: warning: format specifies type 'char' but the argument has type 'int' [-Wformat]
           test("0|1|1|-128|-1", "%hhd|%hhd|%hhd|%hhd|%hhd", 0, 1, 257, 128, -1);
                                                      ~~~~                   ^~
                                                      %d
   lib/test_printf.c:137:40: note: expanded from macro 'test'
           __test(expect, strlen(expect), fmt, ##__VA_ARGS__)
                                          ~~~    ^~~~~~~~~~~
   lib/test_printf.c:159:41: warning: format specifies type 'unsigned short' but the argument has type 'int' [-Wformat]
           test("2015122420151225", "%ho%ho%#ho", 1037, 5282, -11627);
                                     ~~~          ^~~~
                                     %o
   lib/test_printf.c:137:40: note: expanded from macro 'test'
           __test(expect, strlen(expect), fmt, ##__VA_ARGS__)
                                          ~~~    ^~~~~~~~~~~
   lib/test_printf.c:159:47: warning: format specifies type 'unsigned short' but the argument has type 'int' [-Wformat]
           test("2015122420151225", "%ho%ho%#ho", 1037, 5282, -11627);
                                        ~~~             ^~~~
                                        %o
   lib/test_printf.c:137:40: note: expanded from macro 'test'
           __test(expect, strlen(expect), fmt, ##__VA_ARGS__)
                                          ~~~    ^~~~~~~~~~~
   lib/test_printf.c:159:53: warning: format specifies type 'unsigned short' but the argument has type 'int' [-Wformat]
           test("2015122420151225", "%ho%ho%#ho", 1037, 5282, -11627);
                                           ~~~~               ^~~~~~
                                           %#o
   lib/test_printf.c:137:40: note: expanded from macro 'test'
           __test(expect, strlen(expect), fmt, ##__VA_ARGS__)
                                          ~~~    ^~~~~~~~~~~
   14 warnings generated.
--
   In file included from lib/crc32test.c:28:
   In file included from include/linux/module.h:19:
   In file included from include/linux/elf.h:6:
   In file included from arch/arm64/include/asm/elf.h:141:
   In file included from include/linux/fs.h:8:
>> include/linux/dcache.h:137:1: warning: attribute '__aligned__' is ignored, place it after "struct" to apply attribute to type declaration [-Wignored-attributes]
   ____cacheline_aligned
   ^
   include/linux/cache.h:41:46: note: expanded from macro '____cacheline_aligned'
   #define ____cacheline_aligned __attribute__((__aligned__(SMP_CACHE_BYTES)))
                                                ^
   lib/crc32test.c:674:13: warning: variable 'crc' set but not used [-Wunused-but-set-variable]
           static u32 crc;
                      ^
   lib/crc32test.c:754:13: warning: variable 'crc' set but not used [-Wunused-but-set-variable]
           static u32 crc;
                      ^
   3 warnings generated.
--
   In file included from lib/test_rhashtable.c:17:
   In file included from include/linux/module.h:19:
   In file included from include/linux/elf.h:6:
   In file included from arch/arm64/include/asm/elf.h:141:
   In file included from include/linux/fs.h:8:
>> include/linux/dcache.h:137:1: warning: attribute '__aligned__' is ignored, place it after "struct" to apply attribute to type declaration [-Wignored-attributes]
   ____cacheline_aligned
   ^
   include/linux/cache.h:41:46: note: expanded from macro '____cacheline_aligned'
   #define ____cacheline_aligned __attribute__((__aligned__(SMP_CACHE_BYTES)))
                                                ^
   lib/test_rhashtable.c:451:18: warning: variable 'insert_retries' set but not used [-Wunused-but-set-variable]
           unsigned int i, insert_retries = 0;
                           ^
   2 warnings generated.
--
   In file included from lib/devmem_is_allowed.c:11:
   In file included from include/linux/mm.h:717:
   In file included from include/linux/huge_mm.h:8:
   In file included from include/linux/fs.h:8:
>> include/linux/dcache.h:137:1: warning: attribute '__aligned__' is ignored, place it after "struct" to apply attribute to type declaration [-Wignored-attributes]
   ____cacheline_aligned
   ^
   include/linux/cache.h:41:46: note: expanded from macro '____cacheline_aligned'
   #define ____cacheline_aligned __attribute__((__aligned__(SMP_CACHE_BYTES)))
                                                ^
   lib/devmem_is_allowed.c:20:5: warning: no previous prototype for function 'devmem_is_allowed' [-Wmissing-prototypes]
   int devmem_is_allowed(unsigned long pfn)
       ^
   lib/devmem_is_allowed.c:20:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   int devmem_is_allowed(unsigned long pfn)
   ^
   static 
   2 warnings generated.
--
   In file included from lib/lz4/lz4_decompress.c:39:
   In file included from include/linux/module.h:19:
   In file included from include/linux/elf.h:6:
   In file included from arch/arm64/include/asm/elf.h:141:
   In file included from include/linux/fs.h:8:
>> include/linux/dcache.h:137:1: warning: attribute '__aligned__' is ignored, place it after "struct" to apply attribute to type declaration [-Wignored-attributes]
   ____cacheline_aligned
   ^
   include/linux/cache.h:41:46: note: expanded from macro '____cacheline_aligned'
   #define ____cacheline_aligned __attribute__((__aligned__(SMP_CACHE_BYTES)))
                                                ^
   lib/lz4/lz4_decompress.c:506:5: warning: no previous prototype for function 'LZ4_decompress_safe_forceExtDict' [-Wmissing-prototypes]
   int LZ4_decompress_safe_forceExtDict(const char *source, char *dest,
       ^
   lib/lz4/lz4_decompress.c:506:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   int LZ4_decompress_safe_forceExtDict(const char *source, char *dest,
   ^
   static 
   2 warnings generated.
--
   In file included from arch/arm64/kernel/asm-offsets.c:10:
   In file included from include/linux/arm_sdei.h:8:
   In file included from include/acpi/ghes.h:5:
   In file included from include/acpi/apei.h:9:
   In file included from include/linux/acpi.h:15:
   In file included from include/linux/device.h:32:
   In file included from include/linux/device/driver.h:21:
   In file included from include/linux/module.h:19:
   In file included from include/linux/elf.h:6:
   In file included from arch/arm64/include/asm/elf.h:141:
   In file included from include/linux/fs.h:8:
>> include/linux/dcache.h:137:1: warning: attribute '__aligned__' is ignored, place it after "struct" to apply attribute to type declaration [-Wignored-attributes]
   ____cacheline_aligned
   ^
   include/linux/cache.h:41:46: note: expanded from macro '____cacheline_aligned'
   #define ____cacheline_aligned __attribute__((__aligned__(SMP_CACHE_BYTES)))
                                                ^
   1 warning generated.
   arch/arm64/kernel/vdso/vgettimeofday.c:9:5: warning: no previous prototype for function '__kernel_clock_gettime' [-Wmissing-prototypes]
   int __kernel_clock_gettime(clockid_t clock,
       ^
   arch/arm64/kernel/vdso/vgettimeofday.c:9:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   int __kernel_clock_gettime(clockid_t clock,
   ^
   static 
   arch/arm64/kernel/vdso/vgettimeofday.c:15:5: warning: no previous prototype for function '__kernel_gettimeofday' [-Wmissing-prototypes]
   int __kernel_gettimeofday(struct __kernel_old_timeval *tv,
       ^
   arch/arm64/kernel/vdso/vgettimeofday.c:15:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   int __kernel_gettimeofday(struct __kernel_old_timeval *tv,
   ^
   static 
   arch/arm64/kernel/vdso/vgettimeofday.c:21:5: warning: no previous prototype for function '__kernel_clock_getres' [-Wmissing-prototypes]
   int __kernel_clock_getres(clockid_t clock_id,
       ^
   arch/arm64/kernel/vdso/vgettimeofday.c:21:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   int __kernel_clock_getres(clockid_t clock_id,
   ^
   static 
   3 warnings generated.


vim +137 include/linux/dcache.h

   136	
 > 137	____cacheline_aligned
   138	struct dentry_operations {
   139		int (*d_revalidate)(struct dentry *, unsigned int);
   140		int (*d_weak_revalidate)(struct dentry *, unsigned int);
   141		int (*d_hash)(const struct dentry *, struct qstr *);
   142		int (*d_compare)(const struct dentry *,
   143				unsigned int, const char *, const struct qstr *);
   144		int (*d_delete)(const struct dentry *);
   145		int (*d_init)(struct dentry *);
   146		void (*d_release)(struct dentry *);
   147		void (*d_prune)(struct dentry *);
   148		void (*d_iput)(struct dentry *, struct inode *);
   149		char *(*d_dname)(struct dentry *, char *, int);
   150		struct vfsmount *(*d_automount)(struct path *);
   151		int (*d_manage)(const struct path *, bool);
   152		struct dentry *(*d_real)(struct dentry *, const struct inode *);
   153	};
   154	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH] headers/uninline: Uninline single-use function: kobject_has_children()
  2022-01-04 13:54       ` [PATCH] headers/uninline: Uninline single-use function: kobject_has_children() Ingo Molnar
@ 2022-01-04 15:09         ` Greg Kroah-Hartman
  2022-01-04 15:14           ` Greg Kroah-Hartman
  0 siblings, 1 reply; 54+ messages in thread
From: Greg Kroah-Hartman @ 2022-01-04 15:09 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Kirill A. Shutemov, Linus Torvalds, linux-kernel, linux-arch,
	Andrew Morton, Peter Zijlstra, Thomas Gleixner, David S. Miller,
	Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet, Al Viro

On Tue, Jan 04, 2022 at 02:54:31PM +0100, Ingo Molnar wrote:
> 
> * Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote:
> 
> > On Tue, Jan 04, 2022 at 11:54:55AM +0100, Ingo Molnar wrote:
> > > There's one happy exception though, all the uninlining patches that 
> > > uninline a single-call function are probably fine as-is:
> > 
> > <snip>
> > 
> > >  3443e75fd1f8 headers/uninline: Uninline single-use function: kobject_has_children()
> > 
> > Let me go take this right now, no need for this to wait, it should be
> > out of kobject.h as you rightfully show there is only one user.
> 
> Sure - here you go!

I just picked it out of your git tree already :)

Along those lines, any objection to me taking at least one other one?
3f8757078d27 ("headers/prep: usb: gadget: Fix namespace collision") and
6fb993fa3832 ("headers/deps: USB: Optimize <linux/usb/ch9.h>
dependencies, remove <linux/device.h>") look like I can take now into my
USB tree with no problems.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH] per_task: Remove the PER_TASK_BYTES hard-coded constant
  2022-01-04 14:10     ` [PATCH] per_task: Remove the PER_TASK_BYTES hard-coded constant Ingo Molnar
@ 2022-01-04 15:14       ` Andy Shevchenko
  2022-01-04 23:27         ` Ingo Molnar
  0 siblings, 1 reply; 54+ messages in thread
From: Andy Shevchenko @ 2022-01-04 15:14 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Greg Kroah-Hartman, Linus Torvalds, linux-kernel, linux-arch,
	Andrew Morton, Peter Zijlstra, Thomas Gleixner, David S. Miller,
	Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet, Al Viro

On Tue, Jan 04, 2022 at 03:10:51PM +0100, Ingo Molnar wrote:
> * Ingo Molnar <mingo@kernel.org> wrote:

> +++ b/kernel/sched/core.c

> +#include "../../../kernel/sched/per_task_area_struct.h"

#include "per_task_area_struct.h" ?

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH] headers/uninline: Uninline single-use function: kobject_has_children()
  2022-01-04 15:09         ` Greg Kroah-Hartman
@ 2022-01-04 15:14           ` Greg Kroah-Hartman
  2022-01-05  0:11             ` Ingo Molnar
  0 siblings, 1 reply; 54+ messages in thread
From: Greg Kroah-Hartman @ 2022-01-04 15:14 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Kirill A. Shutemov, Linus Torvalds, linux-kernel, linux-arch,
	Andrew Morton, Peter Zijlstra, Thomas Gleixner, David S. Miller,
	Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet, Al Viro

On Tue, Jan 04, 2022 at 04:09:57PM +0100, Greg Kroah-Hartman wrote:
> On Tue, Jan 04, 2022 at 02:54:31PM +0100, Ingo Molnar wrote:
> > 
> > * Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote:
> > 
> > > On Tue, Jan 04, 2022 at 11:54:55AM +0100, Ingo Molnar wrote:
> > > > There's one happy exception though, all the uninlining patches that 
> > > > uninline a single-call function are probably fine as-is:
> > > 
> > > <snip>
> > > 
> > > >  3443e75fd1f8 headers/uninline: Uninline single-use function: kobject_has_children()
> > > 
> > > Let me go take this right now, no need for this to wait, it should be
> > > out of kobject.h as you rightfully show there is only one user.
> > 
> > Sure - here you go!
> 
> I just picked it out of your git tree already :)
> 
> Along those lines, any objection to me taking at least one other one?
> 3f8757078d27 ("headers/prep: usb: gadget: Fix namespace collision") and
> 6fb993fa3832 ("headers/deps: USB: Optimize <linux/usb/ch9.h>
> dependencies, remove <linux/device.h>") look like I can take now into my
> USB tree with no problems.

Also these look good to go now:
	bae9ddd98195 ("headers/prep: Fix non-standard header section: drivers/usb/cdns3/core.h")
	c027175b37e5 ("headers/prep: Fix non-standard header section: drivers/usb/host/ohci-tmio.c")


thanks,

greg k-h

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell"
       [not found] <YdIfz+LMewetSaEB@gmail.com>
                   ` (3 preceding siblings ...)
  2022-01-04 12:36 ` Willy Tarreau
@ 2022-01-04 16:05 ` Andy Shevchenko
  2022-01-04 16:18 ` Andy Shevchenko
  2022-01-15  0:42 ` Paul E. McKenney
  6 siblings, 0 replies; 54+ messages in thread
From: Andy Shevchenko @ 2022-01-04 16:05 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, linux-kernel, linux-arch, Andrew Morton,
	Peter Zijlstra, Thomas Gleixner, Greg Kroah-Hartman,
	David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet,
	Al Viro

On Sun, Jan 02, 2022 at 10:57:35PM +0100, Ingo Molnar wrote:
> 
> I'm pleased to announce the first public version of my new "Fast Kernel 
> Headers" project that I've been working on since late 2020, which is a 
> comprehensive rework of the Linux kernel's header hierarchy & header 
> dependencies, with the dual goals of:
> 
>  - speeding up the kernel build (both absolute and incremental build times)
> 
>  - decoupling subsystem type & API definitions from each other
> 
> The fast-headers tree consists of over 25 sub-trees internally, spanning 
> over 2,200 commits, which can be found here:
> 
>    git://git.kernel.org/pub/scm/linux/kernel/git/mingo/tip.git master
> 
> As most kernel developers know, there's around ~10,000 main .h headers in 
> the Linux kernel, in the include/ and arch/*/include/ hierarchies. Over the 
> last 30+ years they have grown into a complicated & painful set of 
> cross-dependencies we are affectionately calling 'Dependency Hell'.

In the 64e013748e61 ("headers/deps: Optimize <linux/kernel.h>")
the linux/container_of.h and linux/stdarg.h includes are moved around (within
linux/kernel.h) without any explanation in the commit message. Is it
necessary? If so, can you add a background note?

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell"
       [not found] <YdIfz+LMewetSaEB@gmail.com>
                   ` (4 preceding siblings ...)
  2022-01-04 16:05 ` Andy Shevchenko
@ 2022-01-04 16:18 ` Andy Shevchenko
  2022-01-15  0:42 ` Paul E. McKenney
  6 siblings, 0 replies; 54+ messages in thread
From: Andy Shevchenko @ 2022-01-04 16:18 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, linux-kernel, linux-arch, Andrew Morton,
	Peter Zijlstra, Thomas Gleixner, Greg Kroah-Hartman,
	David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet,
	Al Viro

On Sun, Jan 02, 2022 at 10:57:35PM +0100, Ingo Molnar wrote:
> 
> I'm pleased to announce the first public version of my new "Fast Kernel 
> Headers" project that I've been working on since late 2020, which is a 
> comprehensive rework of the Linux kernel's header hierarchy & header 
> dependencies, with the dual goals of:
> 
>  - speeding up the kernel build (both absolute and incremental build times)
> 
>  - decoupling subsystem type & API definitions from each other
> 
> The fast-headers tree consists of over 25 sub-trees internally, spanning 
> over 2,200 commits, which can be found here:
> 
>    git://git.kernel.org/pub/scm/linux/kernel/git/mingo/tip.git master
> 
> As most kernel developers know, there's around ~10,000 main .h headers in 
> the Linux kernel, in the include/ and arch/*/include/ hierarchies. Over the 
> last 30+ years they have grown into a complicated & painful set of 
> cross-dependencies we are affectionately calling 'Dependency Hell'.

$ git grep -n -w kernel.h mingo/sched/headers -- include/ | wc -l
138
$ git grep -n -w kernel.h next/master -- include/ | wc -l
96

Can we rather split kernel.h more? In some cases kernel.h is used just as a
bundle instead of ~2-3 headers.


And I don't get why kernel.h is back in the drm headers. AFAICT there are no
dependencies:

mingo/sched/headers:include/drm/drm_gem_ttm_helper.h:6:#include <linux/kernel.h>
mingo/sched/headers:include/drm/drm_gem_vram_helper.h:15:#include <linux/kernel.h> /* for container_of() */
mingo/sched/headers:include/drm/drm_mm.h:44:#include <linux/kernel.h>
mingo/sched/headers:include/drm/drm_property.h:28:#include <linux/kernel.h>
mingo/sched/headers:include/drm/intel-gtt.h:9:#include <linux/kernel.h>

Ah, it may be because this is based on vanilla rather than on next; it would be
nice to see this rebased on top of v5.17-rc1 when it's out.


-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell"
  2022-01-04 10:47   ` Ingo Molnar
                       ` (2 preceding siblings ...)
  2022-01-04 11:19     ` [TREE] "Fast Kernel Headers" Tree WIP/development branch Ingo Molnar
@ 2022-01-04 17:25     ` Nick Desaulniers
  2022-01-05  0:43       ` Ingo Molnar
  2022-01-04 17:50     ` Nathan Chancellor
  2022-01-07  0:29     ` Nathan Chancellor
  5 siblings, 1 reply; 54+ messages in thread
From: Nick Desaulniers @ 2022-01-04 17:25 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Nathan Chancellor, Linus Torvalds, linux-kernel, linux-arch,
	Andrew Morton, Peter Zijlstra, Thomas Gleixner,
	Greg Kroah-Hartman, David S. Miller, Ard Biesheuvel,
	Josh Poimboeuf, Jonathan Corbet, Al Viro, llvm, ashimida,
	Arnd Bergmann

On Tue, Jan 4, 2022 at 2:47 AM Ingo Molnar <mingo@kernel.org> wrote:
>
>
> * Nathan Chancellor <nathan@kernel.org> wrote:
>
> > Hi Ingo,
> >
> > On Sun, Jan 02, 2022 at 10:57:35PM +0100, Ingo Molnar wrote:
> > I took the series for a spin with clang and GCC on arm64 and x86_64 and
> > I found a few warnings/errors.
>
> Thank you!
>
> > 1. Position of certain attributes
> >
> > In some commits, you move the cacheline_aligned attributes from after
> > the closing brace on structures to before the struct keyword, which
> > causes clang to warn (and error with CONFIG_WERROR):
> >
> > In file included from arch/arm64/kernel/asm-offsets.c:9:
> > In file included from arch/arm64/kernel/../../../kernel/sched/per_task_area_struct.h:33:
> > In file included from ./include/linux/perf_event_api.h:17:
> > In file included from ./include/linux/perf_event_types.h:41:
> > In file included from ./include/linux/ftrace.h:18:
> > In file included from ./arch/arm64/include/asm/ftrace.h:53:
> > In file included from ./include/linux/compat.h:11:
> > ./include/linux/fs_types.h:997:1: error: attribute '__aligned__' is ignored, place it after "struct" to apply attribute to type declaration [-Werror,-Wignored-attributes]
> > ____cacheline_aligned
> > ^
> > ./include/linux/cache.h:41:46: note: expanded from macro '____cacheline_aligned'
> > #define ____cacheline_aligned __attribute__((__aligned__(SMP_CACHE_BYTES)))
>
> Yeah, so this is a *really* stupid warning from Clang.
>
> Putting the attribute after 'struct' risks the hard to track down bugs when
> a <linux/cache.h> inclusion is missing, which scenario I pointed out in
> this commit:
>
>     headers/deps: dcache: Move the ____cacheline_aligned attribute to the head of the definition
>
>     When changing <linux/dcache.h> I removed the <linux/spinlock_api.h> header,
>     which caused a couple of hundred of mysterious, somewhat obscure link time errors:
>
>       ld: net/sctp/tsnmap.o:(.bss+0x0): multiple definition of `____cacheline_aligned_in_smp'; init/do_mounts_rd.o:(.bss+0x0): first defined here
>       ld: net/sctp/tsnmap.o:(.bss+0x40): multiple definition of `____cacheline_aligned'; init/do_mounts_rd.o:(.bss+0x40): first defined here
>       ld: net/sctp/debug.o:(.bss+0x0): multiple definition of `____cacheline_aligned_in_smp'; init/do_mounts_rd.o:(.bss+0x0): first defined here
>       ld: net/sctp/debug.o:(.bss+0x40): multiple definition of `____cacheline_aligned'; init/do_mounts_rd.o:(.bss+0x40): first defined here
>
>     After a bit of head-scratching, what happened is that 'struct dentry_operations'
>     has the ____cacheline_aligned attribute at the tail of the type definition -
>     which turned into a local variable definition when <linux/cache.h> was not
>     included - which <linux/spinlock_api.h> includes into <linux/dcache.h> indirectly.
>
>     There were no compile time errors, only link time errors.
>
>     Move the attribute to the head of the definition, in which case
>     a missing <linux/cache.h> inclusion creates an immediate build failure:
>
>       In file included from ./include/linux/fs.h:9,
>                        from ./include/linux/fsverity.h:14,
>                        from fs/verity/fsverity_private.h:18,
>                        from fs/verity/read_metadata.c:8:
>       ./include/linux/dcache.h:132:22: error: expected ‘;’ before ‘struct’
>         132 | ____cacheline_aligned
>             |                      ^
>             |                      ;
>         133 | struct dentry_operations {
>             | ~~~~~~
>
>     No change in functionality.
>
>     Signed-off-by: Ingo Molnar <mingo@kernel.org>
>
> Can this Clang warning be disabled?

Clang is warning that the attribute will be ignored because of that
positioning. If you disable the warning, code will probably stop
working as intended.  This warning has at least been helping us make
the kernel coding style more consistent.

This made me think of d5b421fe02827 ("docs: Explain the desired
position of function attributes"), where we added some text to
Documentation/process/coding-style.rst about the positioning of
__attribute__'s in function signatures, but I guess this case is data.
We probably should add something to the coding style about attributes
on data, too.

The C standards body is also working on standardizing attributes; at
the least I expect some of these things to be ironed out more soon.

>
> > 2. Error with CONFIG_SHADOW_CALL_STACK
>
> So this feature depends on Clang:
>
>  # Supported by clang >= 7.0
>  config CC_HAVE_SHADOW_CALL_STACK
>          def_bool $(cc-option, -fsanitize=shadow-call-stack -ffixed-x18)
>
> No way to activate it under my GCC cross-build toolchain, right?
>
> But ... I hacked the build mode on with GCC using this patch:

Dan Li is working on a GCC patch. If you're up for building GCC from source:
https://gcc.gnu.org/pipermail/gcc-patches/2021-December/586204.html

--

This is a really cool series Ingo.  I'm sure Arnd has seen it by now,
but Arnd has been thinking about this area a lot, too.  I haven't but
I have played with running "include what you use" on the kernel
sources; Kconfig being the biggest impediment to that approach.

What makes me most nervous is "backsliding": say this work lands; I
assume that without any form of automation we might, probably years in
the future, find ourselves at a similar point, with header dependencies
all tangled again.

What are your thoughts on where/how/what we could automate to try to
help developers in the future keep their header dependencies simpler?
(Sorry if this was already answered in the cover letter)

It would be really useful if you were planning a talk at something
like Plumbers on how you go about making these changes.  I really hope
once others understand your workflow that we might help with some form
of automation.  Nice work!
--
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell"
  2022-01-04 10:47   ` Ingo Molnar
                       ` (3 preceding siblings ...)
  2022-01-04 17:25     ` [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" Nick Desaulniers
@ 2022-01-04 17:50     ` Nathan Chancellor
  2022-01-05  0:35       ` [PATCH] x86/kbuild: Enable CONFIG_KALLSYMS_ALL=y in the defconfigs Ingo Molnar
                         ` (2 more replies)
  2022-01-07  0:29     ` Nathan Chancellor
  5 siblings, 3 replies; 54+ messages in thread
From: Nathan Chancellor @ 2022-01-04 17:50 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, linux-kernel, linux-arch, Andrew Morton,
	Peter Zijlstra, Thomas Gleixner, Greg Kroah-Hartman,
	David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet,
	Al Viro, llvm

On Tue, Jan 04, 2022 at 11:47:30AM +0100, Ingo Molnar wrote:
> 
> * Nathan Chancellor <nathan@kernel.org> wrote:
> 
> > Hi Ingo,
> > 
> > On Sun, Jan 02, 2022 at 10:57:35PM +0100, Ingo Molnar wrote:
> > > Before going into details about how this tree solves 'dependency hell' 
> > > exactly, here's the current kernel build performance gain with 
> > > CONFIG_FAST_HEADERS=y enabled, (and with CONFIG_KALLSYMS_FAST=y enabled as 
> > > well - see below), using a stock x86 Linux distribution's .config with all 
> > > modules built into the vmlinux:
> > > 
> > >   #
> > >   # Performance counter stats for 'make -j96 vmlinux' (3 runs):
> > >   #
> > >   # (Elapsed time in seconds):
> > >   #
> > > 
> > >   v5.16-rc7:            231.34 +- 0.60 secs, 15.5 builds/hour    # [ vanilla baseline ]
> > >   -fast-headers-v1:     129.97 +- 0.51 secs, 27.7 builds/hour    # +78.0% improvement
> > 
> > This is really impressive; as someone who constantly builds large
> > kernels for test coverage, I am excited about less time to get results.
> > Testing on an 80-core arm64 server (the fastest machine I have access to
> > at the moment) with LLVM, I can see anywhere from 18% to 35% improvement.
> > 
> > 
> > Benchmark 1: ARCH=arm64 defconfig (linux)
> >   Time (mean ± σ):     97.159 s ±  0.246 s    [User: 4828.383 s, System: 611.256 s]
> >   Range (min … max):   96.900 s … 97.648 s    10 runs
> > 
> > Benchmark 2: ARCH=arm64 defconfig (linux-fast-headers)
> >   Time (mean ± σ):     76.300 s ±  0.107 s    [User: 3149.986 s, System: 436.487 s]
> >   Range (min … max):   76.117 s … 76.467 s    10 runs
> 
> That looks good, thanks for giving it a test, and thanks for all the fixes! 
> :-)
> 
> Note that on ARM64 the elapsed time improvement is 'only' 18-35%, because 
> the triple-linking of vmlinux serializes much of the build & ARM64 
> doesn't have the kallsyms-objtool feature yet.
> 
> But we can already see how much faster it became, from the user+system time 
> spent building the kernel:
> 
>            vanilla: 4828.383 s + 611.256 s = 5439.639 s
>   -fast-headers-v1: 3149.986 s + 436.487 s = 3586.473 s
> 
> That's a +51% speedup. :-)
> 
> With CONFIG_KALLSYMS_FAST=y on x86, the final link gets faster by about 
> 60%-70%, so the header improvements will more directly show up in elapsed 
> time as well.
> 
> Plus I spent more time looking at x86 header bloat than at ARM64 header 
> bloat. In the end I think the improvement could probably be moved into the 
> broad 60-70% range that I see on x86.
> 
> All the other ARM64 tests show a 37%-43% improvement in CPU time used:
> 
> > Benchmark 1: ARCH=arm64 allmodconfig (linux)
> >   Time (mean ± σ):     390.106 s ±  0.192 s    [User: 23893.382 s, System: 2802.413 s]
> >   Range (min … max):   389.942 s … 390.513 s    7 runs
> > 
> > Benchmark 2: ARCH=arm64 allmodconfig (linux-fast-headers)
> >   Time (mean ± σ):     288.066 s ±  0.621 s    [User: 16436.098 s, System: 2117.352 s]
> >   Range (min … max):   287.131 s … 288.982 s    7 runs
> 
> # (23893.382+2802.413)/(16436.098+2117.352) = +43% in throughput.
> 
> 
> > Benchmark 1: ARCH=arm64 allyesconfig (linux)
> >   Time (mean ± σ):     557.752 s ±  1.019 s    [User: 21227.404 s, System: 2226.121 s]
> >   Range (min … max):   555.833 s … 558.775 s    7 runs
> > 
> > Benchmark 2: ARCH=arm64 allyesconfig (linux-fast-headers)
> >   Time (mean ± σ):     473.815 s ±  1.793 s    [User: 15351.991 s, System: 1689.630 s]
> >   Range (min … max):   471.542 s … 476.830 s    7 runs
> 
> # (21227.404+2226.121)/(15351.991+1689.630) = +37%
> 
> 
> > Benchmark 1: ARCH=x86_64 defconfig (linux)
> >   Time (mean ± σ):     41.122 s ±  0.190 s    [User: 1700.206 s, System: 205.555 s]
> >   Range (min … max):   40.966 s … 41.515 s    7 runs
> > 
> > Benchmark 2: ARCH=x86_64 defconfig (linux-fast-headers)
> >   Time (mean ± σ):     36.357 s ±  0.183 s    [User: 1134.252 s, System: 152.396 s]
> >   Range (min … max):   35.983 s … 36.534 s    7 runs
> 
> 
> # (1700.206+205.555)/(1134.252+152.396) = +48%
> 
> > Summary
> >   'ARCH=x86_64 defconfig (linux-fast-headers)' ran
> >     1.13 ± 0.01 times faster than 'ARCH=x86_64 defconfig (linux)'
> 
> Now this x86-defconfig result you got is a bit weird - it *should* have 
> been around ~50% faster on x86 in terms of elapsed time too.
> 
> Here's how x86-64 defconfig looks like on my system - with 128 GB RAM & 
> fast NVDIMMs and 64 CPUs:
> 
>    #
>    # -v5.16-rc8:
>    #
> 
>    $ perf stat --repeat 3 -e instructions,cycles,cpu-clock --sync --pre "make clean >/dev/null" make -j96 vmlinux >/dev/null
> 
>    Performance counter stats for 'make -j96 vmlinux' (3 runs):
> 
>    4,906,953,379,372      instructions              #    0.90  insn per cycle           ( +-  0.00% )
>    5,475,163,448,391      cycles                    #    3.898 GHz                      ( +-  0.01% )
>         1,404,614.64 msec cpu-clock                 #   45.864 CPUs utilized            ( +-  0.01% )
> 
>              30.6258 +- 0.0337 seconds time elapsed  ( +-  0.11% )
> 
>    #
>    # -fast-headers-v1:
>    #
> 
>    $ make defconfig
>    $ grep KALLSYMS_FAST .config
>    CONFIG_KALLSYMS_FAST=y
> 
>    $ perf stat --repeat 3 -e instructions,cycles,cpu-clock --sync --pre "make clean >/dev/null" make -j96 vmlinux >/dev/null
> 
>     Performance counter stats for 'make -j96 vmlinux' (3 runs):
> 
>      3,500,079,269,120      instructions              #    0.90  insn per cycle           ( +-  0.00% )
>      3,872,081,278,824      cycles                    #    3.895 GHz                      ( +-  0.10% )
>             993,448.13 msec cpu-clock                 #   47.306 CPUs utilized            ( +-  0.10% )
> 
>              21.0004 +- 0.0265 seconds time elapsed  ( +-  0.13% )
> 
> That's a +45.8% speedup in elapsed time, and a +41.4% improvement in 
> cpu-clock utilization.
> 
> I'm wondering whether your system has some sort of bottleneck?

Yes, it is entirely possible. That testing was done on Equinix's
c3.large.arm server and I have noticed at times that single threaded
tasks seem to take a little bit longer than on my x86_64 box.

https://metal.equinix.com/product/servers/c3-large-arm/

The all{mod,yes}config tests on that box had a much more noticeable
improvement, along the lines of what you were expecting:


Benchmark 1: ARCH=x86_64 allmodconfig (linux)
  Time (mean ± σ):     387.575 s ±  0.288 s    [User: 23916.296 s, System: 2814.850 s]
  Range (min … max):   387.252 s … 388.295 s    10 runs

Benchmark 2: ARCH=x86_64 allmodconfig (linux-fast-headers)
  Time (mean ± σ):     255.934 s ±  0.972 s    [User: 15130.494 s, System: 2095.091 s]
  Range (min … max):   254.655 s … 257.357 s    10 runs

Summary
  'ARCH=x86_64 allmodconfig (linux-fast-headers)' ran
    1.51 ± 0.01 times faster than 'ARCH=x86_64 allmodconfig (linux)'

# (23916.296+2814.850)/(15130.494+2095.091) = +55.18%


Benchmark 1: ARCH=x86_64 allyesconfig (linux)
  Time (mean ± σ):     568.027 s ±  1.071 s    [User: 21985.096 s, System: 2357.516 s]
  Range (min … max):   566.769 s … 569.801 s    10 runs

Benchmark 2: ARCH=x86_64 allyesconfig (linux-fast-headers)
  Time (mean ± σ):     381.248 s ±  0.919 s    [User: 14916.766 s, System: 1728.218 s]
  Range (min … max):   379.746 s … 382.852 s    10 runs

Summary
  'ARCH=x86_64 allyesconfig (linux-fast-headers)' ran
    1.49 ± 0.00 times faster than 'ARCH=x86_64 allyesconfig (linux)'

# (21985.096+2357.516)/(14916.766+1728.218) = +46.25%

> One thing I do though when running benchmarks is to switch the cpufreq 
> governor to 'performance', via something like:
> 
>    NR_CPUS=$(nproc --all)
> 
>    curr=$(cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor)
>    next=performance
> 
>    echo "# setting all $NR_CPUS CPUs from '"$curr"' to the '"$next"' governor"
> 
>    for ((cpu=0; cpu<$NR_CPUS; cpu++)); do
>      G=/sys/devices/system/cpu/cpu$cpu/cpufreq/scaling_governor
>      [ -f $G ] && echo $next > $G
>    done
> 
> This minimizes the amount of noise across iterations and makes the results 
> more dependable:
> 
>              30.6258 +- 0.0337 seconds time elapsed  ( +-  0.11% )
>              21.0004 +- 0.0265 seconds time elapsed  ( +-  0.13% )

Good point. With my main box (AMD EPYC 7502P), with the performance governor...

GCC:

Benchmark 1: ARCH=x86_64 defconfig (linux)
  Time (mean ± σ):     48.685 s ±  0.049 s    [User: 1969.835 s, System: 204.166 s]
  Range (min … max):   48.620 s … 48.782 s    10 runs

Benchmark 2: ARCH=x86_64 defconfig (linux-fast-headers)
  Time (mean ± σ):     46.797 s ±  0.119 s    [User: 1403.854 s, System: 154.336 s]
  Range (min … max):   46.620 s … 47.052 s    10 runs

Summary
  'ARCH=x86_64 defconfig (linux-fast-headers)' ran
    1.04 ± 0.00 times faster than 'ARCH=x86_64 defconfig (linux)'

LLVM:

Benchmark 1: ARCH=x86_64 defconfig (linux)
  Time (mean ± σ):     51.816 s ±  0.079 s    [User: 2208.577 s, System: 200.410 s]
  Range (min … max):   51.671 s … 51.900 s    10 runs

Benchmark 2: ARCH=x86_64 defconfig (linux-fast-headers)
  Time (mean ± σ):     46.806 s ±  0.062 s    [User: 1438.972 s, System: 154.846 s]
  Range (min … max):   46.696 s … 46.917 s    10 runs

Summary
  'ARCH=x86_64 defconfig (linux-fast-headers)' ran
    1.11 ± 0.00 times faster than 'ARCH=x86_64 defconfig (linux)'

$ rg KALLSYMS .config
246:CONFIG_KALLSYMS=y
247:# CONFIG_KALLSYMS_ALL is not set
248:CONFIG_KALLSYMS_ABSOLUTE_PERCPU=y
249:CONFIG_KALLSYMS_BASE_RELATIVE=y
250:CONFIG_KALLSYMS_FAST=y
706:CONFIG_HAVE_OBJTOOL_KALLSYMS=y

It seems like everything is working right but maybe the build is so
short that there just is not much time for the difference to be as
apparent?

> > > With the fast-headers kernel that's down to ~36,000 lines of code, 
> > > almost a factor of 3 reduction:
> > > 
> > >   # fast-headers-v1:
> > >   kepler:~/mingo.tip.git> wc -l kernel/pid.i
> > >   35941 kernel/pid.i
> > 
> > Coming from someone who often has to reduce a preprocessed kernel source 
> > file with creduce/cvise to report compiler bugs, this will be a very 
> > welcome change, as those tools will have to do less work, and I can get 
> > my reports done faster.
> 
> That's nice, didn't think of that side effect.
> 
> Could you perhaps measure this too, to see how much of a benefit it is?

Yes, next time that I run into a bug that I have to use those tools on,
I will see if I can benchmark the difference!

> > ########################################################################
> > 
> > I took the series for a spin with clang and GCC on arm64 and x86_64 and
> > I found a few warnings/errors.
> 
> Thank you!
> 
> > 1. Position of certain attributes
> > 
> > In some commits, you move the cacheline_aligned attributes from after
> > the closing brace on structures to before the struct keyword, which
> > causes clang to warn (and error with CONFIG_WERROR):
> > 
> > In file included from arch/arm64/kernel/asm-offsets.c:9:
> > In file included from arch/arm64/kernel/../../../kernel/sched/per_task_area_struct.h:33:
> > In file included from ./include/linux/perf_event_api.h:17:
> > In file included from ./include/linux/perf_event_types.h:41:
> > In file included from ./include/linux/ftrace.h:18:
> > In file included from ./arch/arm64/include/asm/ftrace.h:53:
> > In file included from ./include/linux/compat.h:11:
> > ./include/linux/fs_types.h:997:1: error: attribute '__aligned__' is ignored, place it after "struct" to apply attribute to type declaration [-Werror,-Wignored-attributes]
> > ____cacheline_aligned
> > ^
> > ./include/linux/cache.h:41:46: note: expanded from macro '____cacheline_aligned'
> > #define ____cacheline_aligned __attribute__((__aligned__(SMP_CACHE_BYTES)))
> 
> Yeah, so this is a *really* stupid warning from Clang.
> 
> Putting the attribute after 'struct' risks the hard to track down bugs when 
> a <linux/cache.h> inclusion is missing, which scenario I pointed out in 
> this commit:
> 
>     headers/deps: dcache: Move the ____cacheline_aligned attribute to the head of the definition
>     
>     When changing <linux/dcache.h> I removed the <linux/spinlock_api.h> header,
>     which caused a couple of hundred of mysterious, somewhat obscure link time errors:
>     
>       ld: net/sctp/tsnmap.o:(.bss+0x0): multiple definition of `____cacheline_aligned_in_smp'; init/do_mounts_rd.o:(.bss+0x0): first defined here
>       ld: net/sctp/tsnmap.o:(.bss+0x40): multiple definition of `____cacheline_aligned'; init/do_mounts_rd.o:(.bss+0x40): first defined here
>       ld: net/sctp/debug.o:(.bss+0x0): multiple definition of `____cacheline_aligned_in_smp'; init/do_mounts_rd.o:(.bss+0x0): first defined here
>       ld: net/sctp/debug.o:(.bss+0x40): multiple definition of `____cacheline_aligned'; init/do_mounts_rd.o:(.bss+0x40): first defined here
>     
>     After a bit of head-scratching, what happened is that 'struct dentry_operations'
>     has the ____cacheline_aligned attribute at the tail of the type definition -
>     which turned into a local variable definition when <linux/cache.h> was not
>     included - which <linux/spinlock_api.h> includes into <linux/dcache.h> indirectly.
>     
>     There were no compile time errors, only link time errors.
>     
>     Move the attribute to the head of the definition, in which case
>     a missing <linux/cache.h> inclusion creates an immediate build failure:
>     
>       In file included from ./include/linux/fs.h:9,
>                        from ./include/linux/fsverity.h:14,
>                        from fs/verity/fsverity_private.h:18,
>                        from fs/verity/read_metadata.c:8:
>       ./include/linux/dcache.h:132:22: error: expected ‘;’ before ‘struct’
>         132 | ____cacheline_aligned
>             |                      ^
>             |                      ;
>         133 | struct dentry_operations {
>             | ~~~~~~
>     
>     No change in functionality.
>     
>     Signed-off-by: Ingo Molnar <mingo@kernel.org>
> 
> Can this Clang warning be disabled?

I'll comment on this in the other thread.

> > 2. Error with CONFIG_SHADOW_CALL_STACK
> 
> So this feature depends on Clang:
> 
>  # Supported by clang >= 7.0
>  config CC_HAVE_SHADOW_CALL_STACK
>          def_bool $(cc-option, -fsanitize=shadow-call-stack -ffixed-x18)
> 
> No way to activate it under my GCC cross-build toolchain, right?
> 
> But ... I hacked the build mode on with GCC using this patch:
> 
> From: Ingo Molnar <mingo@kernel.org>
> Date: Tue, 4 Jan 2022 11:26:09 +0100
> Subject: [PATCH] DO NOT MERGE: Enable SHADOW_CALL_STACK on GCC builds, for build testing
> 
> NOT-Signed-off-by: Ingo Molnar <mingo@kernel.org>
> ---
>  Makefile           | 2 +-
>  arch/Kconfig       | 2 +-
>  arch/arm64/Kconfig | 2 +-
>  3 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/Makefile b/Makefile
> index 16d7f83ac368..bbab462e7509 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -888,7 +888,7 @@ LDFLAGS_vmlinux += --gc-sections
>  endif
>  
>  ifdef CONFIG_SHADOW_CALL_STACK
> -CC_FLAGS_SCS	:= -fsanitize=shadow-call-stack
> +CC_FLAGS_SCS	:=
>  KBUILD_CFLAGS	+= $(CC_FLAGS_SCS)
>  export CC_FLAGS_SCS
>  endif
> diff --git a/arch/Kconfig b/arch/Kconfig
> index 4e56f66fdbcf..2103d9da4fe1 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -605,7 +605,7 @@ config ARCH_SUPPORTS_SHADOW_CALL_STACK
>  
>  config SHADOW_CALL_STACK
>  	bool "Clang Shadow Call Stack"
> -	depends on CC_IS_CLANG && ARCH_SUPPORTS_SHADOW_CALL_STACK
> +	depends on ARCH_SUPPORTS_SHADOW_CALL_STACK
>  	depends on DYNAMIC_FTRACE_WITH_REGS || !FUNCTION_GRAPH_TRACER
>  	help
>  	  This option enables Clang's Shadow Call Stack, which uses a
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index c4207cf9bb17..952f3e56e0a7 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -1183,7 +1183,7 @@ config ARCH_HAS_FILTER_PGPROT
>  
>  # Supported by clang >= 7.0
>  config CC_HAVE_SHADOW_CALL_STACK
> -	def_bool $(cc-option, -fsanitize=shadow-call-stack -ffixed-x18)
> +	def_bool y
>  
>  config PARAVIRT
>  	bool "Enable paravirtualization code"
> 
> 
> And was able to trigger at least some of the build errors you saw:
> 
>   In file included from kernel/scs.c:15:
>   ./include/linux/scs.h: In function 'scs_task_reset':
>   ./include/linux/scs.h:26:34: error: implicit declaration of function 'task_thread_info' [-Werror=implicit-function-declaration]
> 
> This is fixed with:
> 
> diff --git a/kernel/scs.c b/kernel/scs.c
> index ca9e707049cb..719ab53adc8a 100644
> --- a/kernel/scs.c
> +++ b/kernel/scs.c
> @@ -5,6 +5,7 @@
>   * Copyright (C) 2019 Google LLC
>   */
>  
> +#include <linux/sched/thread_info_api.h>
>  #include <linux/sched.h>
>  #include <linux/mm_page_address.h>
>  #include <linux/mm_api.h>
> 
> 
> Then there's the build failure in init/main.c:
> 
> > It looks like on mainline, init_shadow_call_stack is defined and used 
> > in init/init_task.c but now, it is used in init/main.c, with no
> > declaration to allow the compiler to find the definition. I guess moving
> > init_shadow_call_stack out of init/init_task.c to somewhere more common
> > would fix this but it depends on SCS_SIZE, which is defined in
> > include/linux/scs.h, and as soon as I tried to include that in another
> > file, the build broke further... Any ideas you have would be appreciated
> > :) for benchmarking purposes, I just disabled CONFIG_SHADOW_CALL_STACK.
> 
> So I see:
> 
> In file included from ./include/linux/thread_info.h:63,
>                  from ./arch/arm64/include/asm/smp.h:32,
>                  from ./include/linux/smp_api.h:15,
>                  from ./include/linux/percpu.h:6,
>                  from ./include/linux/softirq.h:8,
>                  from init/main.c:17:
> init/main.c: In function 'init_per_task_early':
> ./arch/arm64/include/asm/thread_info.h:113:27: error: 'init_shadow_call_stack' undeclared (first use in this function)
>   113 |         .scs_base       = init_shadow_call_stack,                       \
>       |                           ^~~~~~~~~~~~~~~~~~~~~~
> 
> This looks pretty straightforward, does this patch solve it?
> 
>  include/linux/scs.h | 3 +++
>  init/main.c         | 1 +
>  2 files changed, 4 insertions(+)
> 
> diff --git a/include/linux/scs.h b/include/linux/scs.h
> index 18122d9e17ff..863932a9347a 100644
> --- a/include/linux/scs.h
> +++ b/include/linux/scs.h
> @@ -8,6 +8,7 @@
>  #ifndef _LINUX_SCS_H
>  #define _LINUX_SCS_H
>  
> +#include <linux/sched/thread_info_api.h>
>  #include <linux/gfp.h>
>  #include <linux/poison.h>
>  #include <linux/sched.h>
> @@ -25,6 +26,8 @@
>  #define task_scs(tsk)		(task_thread_info(tsk)->scs_base)
>  #define task_scs_sp(tsk)	(task_thread_info(tsk)->scs_sp)
>  
> +extern unsigned long init_shadow_call_stack[SCS_SIZE / sizeof(long)];
> +
>  void *scs_alloc(int node);
>  void scs_free(void *s);
>  void scs_init(void);
> diff --git a/init/main.c b/init/main.c
> index c9eb3ecbe18c..74ccad445009 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -12,6 +12,7 @@
>  
>  #define DEBUG		/* Enable initcall_debug */
>  
> +#include <linux/scs.h>
>  #include <linux/workqueue_api.h>
>  #include <linux/sysctl.h>
>  #include <linux/softirq.h>
> 
> I've applied these fixes, with that CONFIG_SHADOW_CALL_STACK=y builds fine 
> on ARM64 - but I performed no runtime testing.
> 
> I've backmerged this into:
> 
>     headers/deps: per_task, arm64, x86: Convert task_struct::thread to a per_task() field
> 
> where this bug originated from.
> 
> I.e. I think the bug was simply to make main.c aware of the array, now that 
> the INIT_THREAD initialization is done there.

Yes, that seems right.

Unfortunately, while the kernel now builds, it does not boot in QEMU. I
tried to check out 9006a48618cc0cacd3f59ff053e6509a9af5cc18 to see if
I could reproduce that breakage there but the build errors out at that
change (I do see notes of bisection breakage in some of the commits) so
I assume that is expected.

There is no output, even with earlycon, so it seems like something is
going wrong in early boot code. I am not very familiar with the SCS code
so I will see if I can debug this with gdb later (I'll try to see if it
is reproducible with GCC as well; as Nick mentions, there is support
being added to it and I don't mind building from source).

> We could move over the init_shadow_call_stack[] array there and make it 
> static to begin with? I don't think anything truly relies on it being a 
> global symbol.

That is what I thought as well... I'll see if I can ping Sami to see if
there is any reason not to do that.

> > 3. Nested function in arch/x86/kernel/asm-offsets.c
> 
> > diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
> > index ff3f8ed5d0a2..a6d56f4697cd 100644
> > --- a/arch/x86/kernel/asm-offsets.c
> > +++ b/arch/x86/kernel/asm-offsets.c
> > @@ -35,10 +35,10 @@
> >  # include "asm-offsets_64.c"
> >  #endif
> > 
> > -static void __used common(void)
> > -{
> >  #include "../../../kernel/sched/per_task_area_struct_defs.h"
> > 
> > +static void __used common(void)
> > +{
> >         BLANK();
> >         DEFINE(TASK_threadsp, offsetof(struct task_struct, per_task_area) +
> >                               offsetof(struct task_struct_per_task, thread) +
> 
> Ha, that code is bogus, it's a merge bug of mine. Super interesting that 
> GCC still managed to include the header ...
> 
> I've applied your fix.
> 
> > 4. Build error in kernel/gcov/clang.c
> 
> > 8 errors generated.
> > 
> > I resolved this with:
> > 
> > diff --git a/kernel/gcov/clang.c b/kernel/gcov/clang.c
> > index 6ee385f6ad47..29f0899ba209 100644
> > --- a/kernel/gcov/clang.c
> > +++ b/kernel/gcov/clang.c
> > @@ -52,6 +52,7 @@
> >  #include <linux/ratelimit.h>
> >  #include <linux/slab.h>
> >  #include <linux/mm.h>
> > +#include <linux/string.h>
> >  #include "gcov.h"
> 
> Thank you - applied!
> 
> >  typedef void (*llvm_gcov_callback)(void);
> > 
> > 
> > 5. BPF errors
> > 
> > With Arch Linux's config (https://github.com/archlinux/svntogit-packages/raw/packages/linux/trunk/config),
> > I see the following errors:
> > 
> > kernel/bpf/preload/iterators/iterators.c:3:10: fatal error: 'linux/sched/signal.h' file not found
> > #include <linux/sched/signal.h>
> >          ^~~~~~~~~~~~~~~~~~~~~~
> > 1 error generated.
> > 
> > kernel/bpf/sysfs_btf.c:21:2: error: implicitly declaring library function 'memcpy' with type 'void *(void *, const void *, unsigned long)' [-Werror,-Wimplicit-function-declaration]
> >         memcpy(buf, __start_BTF + off, len);
> >         ^
> > kernel/bpf/sysfs_btf.c:21:2: note: include the header <string.h> or explicitly provide a declaration for 'memcpy'
> > 1 error generated.
> > 
> > The second error is obviously fixed by just including string.h as above.
> 
> Applied.
> 
> > I am not sure what is wrong with the first one; the includes all appear
> > to be userland headers, rather than kernel ones, so maybe an -I flag is
> > not present that should be? To work around it, I disabled
> > CONFIG_BPF_PRELOAD.
> 
> Yeah, this should be fixed by simply removing the two stray dependencies 
> that found their way into this user-space code:
> 
>  kernel/bpf/preload/iterators/iterators.bpf.c | 1 -
>  kernel/bpf/preload/iterators/iterators.c     | 1 -
>  2 files changed, 2 deletions(-)
> 
> diff --git a/kernel/bpf/preload/iterators/iterators.bpf.c b/kernel/bpf/preload/iterators/iterators.bpf.c
> index 41ae00edeecf..03af863314ea 100644
> --- a/kernel/bpf/preload/iterators/iterators.bpf.c
> +++ b/kernel/bpf/preload/iterators/iterators.bpf.c
> @@ -1,6 +1,5 @@
>  // SPDX-License-Identifier: GPL-2.0
>  /* Copyright (c) 2020 Facebook */
> -#include <linux/seq_file.h>
>  #include <linux/bpf.h>
>  #include <bpf/bpf_helpers.h>
>  #include <bpf/bpf_core_read.h>
> diff --git a/kernel/bpf/preload/iterators/iterators.c b/kernel/bpf/preload/iterators/iterators.c
> index d702cbf7ddaf..5d872a705470 100644
> --- a/kernel/bpf/preload/iterators/iterators.c
> +++ b/kernel/bpf/preload/iterators/iterators.c
> @@ -1,6 +1,5 @@
>  // SPDX-License-Identifier: GPL-2.0
>  /* Copyright (c) 2020 Facebook */
> -#include <linux/sched/signal.h>
>  #include <errno.h>
>  #include <stdio.h>
>  #include <stdlib.h>

Yes, that resolves the error for me.

> > 6. resolve_btfids warning
> > 
> > After working around the above errors, with either GCC or clang, I see
> > the following warnings with Arch Linux's configuration:
> > 
> > WARN: multiple IDs found for 'task_struct': 103, 23549 - using 103
> > WARN: multiple IDs found for 'path': 1166, 23551 - using 1166
> > WARN: multiple IDs found for 'inode': 997, 23561 - using 997
> > WARN: multiple IDs found for 'file': 714, 23566 - using 714
> > WARN: multiple IDs found for 'seq_file': 1120, 23673 - using 1120
> > 
> > Which appears to come from symbols_resolve() in
> > tools/bpf/resolve_btfids/main.c.
> 
> Hm, is this perhaps related to CONFIG_KALLSYMS_FAST=y? If yes then turning 
> it off might help.
> 
> I don't really know this area of BPF all that much, maybe someone else can 
> see what the problem is? The error message is not self-explanatory.

It does not seem related, as I disabled that configuration and still see
it.

I am equally ignorant about BPF so enlisting their help would be good.

> > 
> > ########################################################################
> > 
> > I am very excited to see where this goes, it is a herculean effort but I
> > think it will be worth it in the long run. Let me know if there is any
> > more information or input that I can provide, cheers!
> 
> Your testing & patch sending efforts are much appreciated!! You'd help me 
> most by continuing on the same path with new fast-headers releases as well, 
> whenever you find the time. :-)
> 
> BTW., you can always pick up my latest Work-In-Progress branch from:
> 
>    git://git.kernel.org/pub/scm/linux/kernel/git/mingo/tip.git sched/headers
> 
> The 'master' branch will carry the release.
> 
> The sched/headers branch is already rebased to -rc8 and has some other 
> changes as well. It should normally work, with less testing than the main 
> releases, but will at times have fixes at the tail waiting to be 
> backmerged in a bisect-friendly way.

Sure thing, I will continue to follow this and test it as much as I can
to make sure everything continues to work well!

Cheers,
Nathan


* Re: [PATCH] headers/deps: dcache: Move the ____cacheline_aligned attribute to the head of the definition
  2022-01-04 11:02     ` [PATCH] headers/deps: dcache: Move the ____cacheline_aligned attribute to the head of the definition Ingo Molnar
  2022-01-04 15:05       ` kernel test robot
@ 2022-01-04 17:51       ` Nathan Chancellor
  2022-01-05  0:20         ` Ingo Molnar
  1 sibling, 1 reply; 54+ messages in thread
From: Nathan Chancellor @ 2022-01-04 17:51 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Al Viro, Linus Torvalds, Andrew Morton, linux-kernel, linux-arch,
	Peter Zijlstra, Thomas Gleixner, Greg Kroah-Hartman,
	David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet,
	llvm

On Tue, Jan 04, 2022 at 12:02:34PM +0100, Ingo Molnar wrote:
> 
> * Ingo Molnar <mingo@kernel.org> wrote:
> 
> > > 1. Position of certain attributes
> > > 
> > > In some commits, you move the cacheline_aligned attributes from after
> > > the closing brace on structures to before the struct keyword, which
> > > causes clang to warn (and error with CONFIG_WERROR):
> > > 
> > > In file included from arch/arm64/kernel/asm-offsets.c:9:
> > > In file included from arch/arm64/kernel/../../../kernel/sched/per_task_area_struct.h:33:
> > > In file included from ./include/linux/perf_event_api.h:17:
> > > In file included from ./include/linux/perf_event_types.h:41:
> > > In file included from ./include/linux/ftrace.h:18:
> > > In file included from ./arch/arm64/include/asm/ftrace.h:53:
> > > In file included from ./include/linux/compat.h:11:
> > > ./include/linux/fs_types.h:997:1: error: attribute '__aligned__' is ignored, place it after "struct" to apply attribute to type declaration [-Werror,-Wignored-attributes]
> > > ____cacheline_aligned
> > > ^
> > > ./include/linux/cache.h:41:46: note: expanded from macro '____cacheline_aligned'
> > > #define ____cacheline_aligned __attribute__((__aligned__(SMP_CACHE_BYTES)))
> > 
> > Yeah, so this is a *really* stupid warning from Clang.
> > 
> > Putting the attribute after 'struct' risks the hard to track down bugs when 
> > a <linux/cache.h> inclusion is missing, which scenario I pointed out in 
> > this commit:
> > 
> >     headers/deps: dcache: Move the ____cacheline_aligned attribute to the head of the definition
> >     
> >     When changing <linux/dcache.h> I removed the <linux/spinlock_api.h> header,
> >     which caused a couple of hundred of mysterious, somewhat obscure link time errors:
> >     
> >       ld: net/sctp/tsnmap.o:(.bss+0x0): multiple definition of `____cacheline_aligned_in_smp'; init/do_mounts_rd.o:(.bss+0x0): first defined here
> >       ld: net/sctp/tsnmap.o:(.bss+0x40): multiple definition of `____cacheline_aligned'; init/do_mounts_rd.o:(.bss+0x40): first defined here
> >       ld: net/sctp/debug.o:(.bss+0x0): multiple definition of `____cacheline_aligned_in_smp'; init/do_mounts_rd.o:(.bss+0x0): first defined here
> >       ld: net/sctp/debug.o:(.bss+0x40): multiple definition of `____cacheline_aligned'; init/do_mounts_rd.o:(.bss+0x40): first defined here
> >     
> >     After a bit of head-scratching, what happened is that 'struct dentry_operations'
> >     has the ____cacheline_aligned attribute at the tail of the type definition -
> >     which turned into a local variable definition when <linux/cache.h> was not
> >     included - which <linux/spinlock_api.h> includes into <linux/dcache.h> indirectly.
> >     
> >     There were no compile time errors, only link time errors.
> >     
> >     Move the attribute to the head of the definition, in which case
> >     a missing <linux/cache.h> inclusion creates an immediate build failure:
> >     
> >       In file included from ./include/linux/fs.h:9,
> >                        from ./include/linux/fsverity.h:14,
> >                        from fs/verity/fsverity_private.h:18,
> >                        from fs/verity/read_metadata.c:8:
> >       ./include/linux/dcache.h:132:22: error: expected ‘;’ before ‘struct’
> >         132 | ____cacheline_aligned
> >             |                      ^
> >             |                      ;
> >         133 | struct dentry_operations {
> >             | ~~~~~~
> >     
> >     No change in functionality.
> >     
> >     Signed-off-by: Ingo Molnar <mingo@kernel.org>
> > 
> > Can this Clang warning be disabled?
> 
> Ok, broke out this issue into its own thread, in form of a patch submission 
> - so that others don't have to wade through a massive tree to find a single 
> commit ...
> 
> I'll of course drop these (non-essential) cleanups if the upstream policy 
> is to follow Clang's quirk/convention, but I find the forced attribute 
> tail-position a sad misfeature, due to the reasons outlined in this patch: 
> a straightforward build failure in case an attribute is not defined is far 
> preferable to spurious creation of variables with link-time warnings that 
> don't actually highlight the exact nature of the bug ...

I don't disagree with that sentiment. However, I went and looked at
GCC's documentation, which seems to agree with clang's warning here.

https://gcc.gnu.org/onlinedocs/gcc/Type-Attributes.html

"You may specify type attributes in an enum, struct or union type
declaration or definition by placing them immediately after the struct,
union or enum keyword. You can also place them just past the closing
curly brace of the definition, but this is less preferred because
logically the type should be fully defined at the closing brace."

Nowhere does it mention that the attribute is accepted before the type
keyword, and neither compiler respects the attribute when it is placed
there, but at least clang warns: https://godbolt.org/z/E9fTecKPv

$ cat test.c
#include <stdio.h>

struct foo {
    int a;
    int b;
};

struct __attribute__ ((aligned (64))) bar {
    int a;
    int b;
};

__attribute__ ((aligned (64))) struct baz {
    int a;
    int b;
};

int main(void)
{
    printf("struct foo alignment: %zu\n", _Alignof(struct foo));
    printf("struct bar alignment: %zu\n", _Alignof(struct bar));
    printf("struct baz alignment: %zu\n", _Alignof(struct baz));
    return 0;
}

$ gcc --version | head -1
gcc (GCC) 11.2.1 20211231

$ gcc -std=gnu89 -Wall -Wextra test.c; and ./a.out
struct foo alignment: 4
struct bar alignment: 64
struct baz alignment: 4

$ clang --version | head -1
clang version 13.0.0

$ clang -std=gnu89 -Wall -Wextra test.c; and ./a.out
test.c:13:17: warning: attribute 'aligned' is ignored, place it after "struct" to apply attribute to type declaration [-Wignored-attributes]
__attribute__ ((aligned (64))) struct baz {
                ^
1 warning generated.
struct foo alignment: 4
struct bar alignment: 64
struct baz alignment: 4

Cheers,
Nathan

> =====================>
> Date: Sun, 20 Jun 2021 09:41:45 +0200
> Subject: [PATCH] headers/deps: dcache: Move the ____cacheline_aligned attribute to the head of the definition
> 
> When changing <linux/dcache.h> I removed the <linux/spinlock_api.h> header,
> which caused a couple of hundred of mysterious, somewhat obscure link time errors:
> 
>   ld: net/sctp/tsnmap.o:(.bss+0x0): multiple definition of `____cacheline_aligned_in_smp'; init/do_mounts_rd.o:(.bss+0x0): first defined here
>   ld: net/sctp/tsnmap.o:(.bss+0x40): multiple definition of `____cacheline_aligned'; init/do_mounts_rd.o:(.bss+0x40): first defined here
>   ld: net/sctp/debug.o:(.bss+0x0): multiple definition of `____cacheline_aligned_in_smp'; init/do_mounts_rd.o:(.bss+0x0): first defined here
>   ld: net/sctp/debug.o:(.bss+0x40): multiple definition of `____cacheline_aligned'; init/do_mounts_rd.o:(.bss+0x40): first defined here
> 
> After a bit of head-scratching, what happened is that 'struct dentry_operations'
> has the ____cacheline_aligned attribute at the tail of the type definition -
> which turned into a local variable definition when <linux/cache.h> was not
> included - which <linux/spinlock_api.h> includes into <linux/dcache.h> indirectly.
> 
> There were no compile time errors, only link time errors.
> 
> Move the attribute to the head of the definition, in which case
> a missing <linux/cache.h> inclusion creates an immediate build failure:
> 
>   In file included from ./include/linux/fs.h:9,
>                    from ./include/linux/fsverity.h:14,
>                    from fs/verity/fsverity_private.h:18,
>                    from fs/verity/read_metadata.c:8:
>   ./include/linux/dcache.h:132:22: error: expected ‘;’ before ‘struct’
>     132 | ____cacheline_aligned
>         |                      ^
>         |                      ;
>     133 | struct dentry_operations {
>         | ~~~~~~
> 
> No change in functionality.
> 
> Signed-off-by: Ingo Molnar <mingo@kernel.org>
> ---
>  include/linux/dcache.h | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/dcache.h b/include/linux/dcache.h
> index 41062093ec9b..0482c3d6f1ce 100644
> --- a/include/linux/dcache.h
> +++ b/include/linux/dcache.h
> @@ -129,6 +129,7 @@ enum dentry_d_lock_class
>  	DENTRY_D_LOCK_NESTED
>  };
>  
> +____cacheline_aligned
>  struct dentry_operations {
>  	int (*d_revalidate)(struct dentry *, unsigned int);
>  	int (*d_weak_revalidate)(struct dentry *, unsigned int);
> @@ -144,7 +145,7 @@ struct dentry_operations {
>  	struct vfsmount *(*d_automount)(struct path *);
>  	int (*d_manage)(const struct path *, bool);
>  	struct dentry *(*d_real)(struct dentry *, const struct inode *);
> -} ____cacheline_aligned;
> +};
>  
>  /*
>   * Locking rules for dentry_operations callbacks are to be found in


* Re: [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell"
  2022-01-03 11:12   ` Ingo Molnar
  2022-01-03 13:46     ` Greg Kroah-Hartman
  2022-01-04 14:10     ` [PATCH] per_task: Remove the PER_TASK_BYTES hard-coded constant Ingo Molnar
@ 2022-01-04 17:51     ` Arnd Bergmann
  2022-01-05  0:05       ` Ingo Molnar
  2022-01-05  9:37       ` Andy Shevchenko
  2 siblings, 2 replies; 54+ messages in thread
From: Arnd Bergmann @ 2022-01-04 17:51 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Greg Kroah-Hartman, Linus Torvalds, Linux Kernel Mailing List,
	linux-arch, Andrew Morton, Peter Zijlstra, Thomas Gleixner,
	David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet,
	Al Viro

On Mon, Jan 3, 2022 at 6:12 AM Ingo Molnar <mingo@kernel.org> wrote:
> * Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote:
> > > Before going into details about how this tree solves 'dependency hell'
> > > exactly, here's the current kernel build performance gain with
> > > CONFIG_FAST_HEADERS=y enabled, (and with CONFIG_KALLSYMS_FAST=y enabled as
> > > well - see below), using a stock x86 Linux distribution's .config with all
> > > modules built into the vmlinux:
> > >
> > >   #
> > >   # Performance counter stats for 'make -j96 vmlinux' (3 runs):
> > >   #
> > >   # (Elapsed time in seconds):
> > >   #
> > >
> > >   v5.16-rc7:            231.34 +- 0.60 secs, 15.5 builds/hour    # [ vanilla baseline ]
> > >   -fast-headers-v1:     129.97 +- 0.51 secs, 27.7 builds/hour    # +78.0% improvement
> > >
> > > Or in terms of CPU time utilized:
> > >
> > >   v5.16-rc7:            11,474,982.05 msec cpu-clock   # 49.601 CPUs utilized
> > >   -fast-headers-v1:      7,100,730.37 msec cpu-clock   # 54.635 CPUs utilized   # +61.6% improvement
> >
> > Speed up is very impressive, nice job!
>
> Thanks! :-)

I've done some work in this area in the past, but never took it nearly
this far. The best I saw was a 30% improvement with clang, which tends
to be more sensitive than gcc to header file bloat, as it does more
detailed syntax checking before eliminating dead code.

Did you try both gcc and clang for this?

> > That issue aside, I took a glance at the tree, and overall it looks like
> > a lot of nice cleanups.  Most of these can probably go through the
> > various subsystem trees, after you split them out, for the "major" .h
> > cleanups.  Is that something you are going to be planning on doing?
>
> Yeah, I absolutely plan on doing that too:
>
> - About ~70% of the commits can be split up & parallelized through
>   maintainer trees.
>
> - With the exception of the untangling of sched.h, per_task and the
>   "Optimize Headers" series, where a lot of patches are dependent on each
>   other. These are actually needed to get any measurable benefits from this
>   tree (!). We can do these through the scheduler tree, or through the
>   dedicated headers tree I posted.
>
> The latter monolithic series is pretty much unavoidable, it's the result of
> 30 years of coupling a lot of kernel subsystems to task_struct via embedded
> structs & other complex types, that needed quite a bit of effort to
> untangle, and that untangling needed to happen in-order.
>
> Do these plans sound good to you?

I haven't had a chance to look at your tree yet, I'm still on vacation
without access to my normal workstation. I would like to run my own
scripts for analyzing the header dependencies on it after I get back
next week.

From what I could tell, linux/sched.h was not the only such problem:
I saw similarly bad issues with linux/fs.h (which is what I posted
about in November/December), linux/mm.h and linux/netdevice.h
on the high level. In low-level headers there are huge issues with
linux/atomic.h, linux/mutex.h, linux/pgtable.h etc. I expect that you
have addressed these as well, but I'd like to make sure that your
changes are reasonably complete on arm32 and arm64 to avoid
having to do the big cleanup more than once.

My approach to the large mid-level headers is somewhat different:
rather than completely preventing them from being included, I would
like to split up the structure definitions from the inline functions.
Linus didn't really like my approach, but I suspect he'll have similar
concerns about your solution for linux/sched.h, especially if we end
up applying the same hack to other commonly used structures
(sk_buff, mm_struct, super_block) in the end. I should be able to
come up with a less handwavy reply after I've actually studied your
approach better.

Most of the patches should be the same either way (adding back
missing includes to drivers, and doing cleanups to commonly
included headers to avoid the deep nesting), the interesting bit
will be how to properly define the larger structures without pulling
in the rest of the world.

         Arnd

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH] per_task: Remove the PER_TASK_BYTES hard-coded constant
  2022-01-04 15:14       ` Andy Shevchenko
@ 2022-01-04 23:27         ` Ingo Molnar
  0 siblings, 0 replies; 54+ messages in thread
From: Ingo Molnar @ 2022-01-04 23:27 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Greg Kroah-Hartman, Linus Torvalds, linux-kernel, linux-arch,
	Andrew Morton, Peter Zijlstra, Thomas Gleixner, David S. Miller,
	Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet, Al Viro


* Andy Shevchenko <andriy.shevchenko@intel.com> wrote:

> On Tue, Jan 04, 2022 at 03:10:51PM +0100, Ingo Molnar wrote:
> > * Ingo Molnar <mingo@kernel.org> wrote:
> 
> > +++ b/kernel/sched/core.c
> 
> > +#include "../../../kernel/sched/per_task_area_struct.h"
> 
> #include "per_task_area_struct.h" ?

Indeed - fixed.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell"
  2022-01-04 17:51     ` [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" Arnd Bergmann
@ 2022-01-05  0:05       ` Ingo Molnar
  2022-01-05  1:37         ` Arnd Bergmann
  2022-01-05  9:37       ` Andy Shevchenko
  1 sibling, 1 reply; 54+ messages in thread
From: Ingo Molnar @ 2022-01-05  0:05 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Greg Kroah-Hartman, Linus Torvalds, Linux Kernel Mailing List,
	linux-arch, Andrew Morton, Peter Zijlstra, Thomas Gleixner,
	David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet,
	Al Viro


* Arnd Bergmann <arnd@arndb.de> wrote:

> From what I could tell, linux/sched.h was not the only such problem:
> I saw similarly bad issues with linux/fs.h (which is what I posted about
> in November/December), linux/mm.h and linux/netdevice.h on the high
> level. In low-level headers there are huge issues with linux/atomic.h,
> linux/mutex.h, linux/pgtable.h etc. I expect that you have addressed 
> these as well,

Correct, each of these was a problem - and a *lot* of other headers in 
addition to those:

  kepler:~/mingo.tip.git> git diff --stat v5.16-rc8.. include/linux/ arch/*/include/asm/ | grep changed

    1335 files changed, 59677 insertions(+), 56582 deletions(-)

and I reduced all the headers that showed up in the bloat-profile to a 
fraction of their original size:

    ------------------------------------------------------------------------------------------
    | Combined, preprocessed C code size of header, without line markers,
    | with comments stripped:
    ------------------------------.-----------------------------.-----------------------------
                                  | v5.16-rc7                   |  -fast-headers-v1
                                  |-----------------------------|-----------------------------
     #include <linux/sched.h>     | LOC: 13,292 | headers:  324 |  LOC:    769 | headers:   64
     #include <linux/wait.h>      | LOC:  9,369 | headers:  235 |  LOC:    483 | headers:   46
     #include <linux/rcupdate.h>  | LOC:  8,975 | headers:  224 |  LOC:  1,385 | headers:   86
     #include <linux/hrtimer.h>   | LOC: 10,861 | headers:  265 |  LOC:    229 | headers:   37
     #include <linux/fs.h>        | LOC: 22,497 | headers:  427 |  LOC:  1,993 | headers:  120
     #include <linux/cred.h>      | LOC: 17,257 | headers:  368 |  LOC:  4,830 | headers:  129
     #include <linux/dcache.h>    | LOC: 10,545 | headers:  253 |  LOC:    858 | headers:   65
     #include <linux/cgroup.h>    | LOC: 33,518 | headers:  522 |  LOC:  2,477 | headers:  111
     #include <linux/module.h>    | LOC: 16,948 | headers:  339 |  LOC:  2,239 | headers:  122
     #include <linux/kobject.h>   | LOC: 15,210 | headers:  318 |  LOC:    799 | headers:   59
     #include <linux/device.h>    | LOC: 20,505 | headers:  408 |  LOC:  2,131 | headers:  123
     #include <linux/gfp.h>       | LOC: 13,543 | headers:  303 |  LOC:    181 | headers:   26
     #include <linux/slab.h>      | LOC: 14,037 | headers:  307 |  LOC:    999 | headers:   74
     #include <linux/mm.h>        | LOC: 26,727 | headers:  453 |  LOC:  1,855 | headers:  133
     #include <linux/mmzone.h>    | LOC: 12,755 | headers:  293 |  LOC:    832 | headers:   64
     #include <linux/swap.h>      | LOC: 38,292 | headers:  559 |  LOC: 11,085 | headers:  294
     #include <linux/writeback.h> | LOC: 36,481 | headers:  550 |  LOC:  1,566 | headers:   92
     #include <linux/skbuff.h>    | LOC: 36,130 | headers:  558 |  LOC:  1,209 | headers:   89
     #include <linux/tcp.h>       | LOC: 60,133 | headers:  725 |  LOC:  3,829 | headers:  153
     #include <linux/udp.h>       | LOC: 59,411 | headers:  721 |  LOC:  3,236 | headers:  146
     #include <linux/filter.h>    | LOC: 54,172 | headers:  689 |  LOC:  4,087 | headers:   73
     #include <linux/interrupt.h> | LOC: 14,085 | headers:  340 |  LOC:  2,629 | headers:  124

     #include <net/sock.h>        | LOC: 58,880 | headers:  715 |  LOC:  1,543 | headers:   98

     #include <asm/processor.h>   | LOC:  7,821 | headers:  204 |  LOC:    618 | headers:   41
     #include <asm/page.h>        | LOC:  1,540 | headers:   97 |  LOC:  1,193 | headers:   82
     #include <asm/pgtable.h>     | LOC: 12,949 | headers:  297 |  LOC:  5,742 | headers:  217

<linux/atomic.h> wasn't a particularly big problem - but it does get 
included everywhere, so I moved the most common atomic_t definition into 
<linux/types.h> (on 64-bit kernels), which allowed a big reduction for the 
majority of cases that don't use the atomic APIs:

 #include <linux/atomic.h>               | LOC:    176 | headers:   26
 #include <linux/atomic_api.h>           | LOC:  2,785 | headers:   52

But <linux/atomic_api.h> is still included in ~75% of .c files, mostly for 
good reasons, because it's a very popular low level API.

> but I'd like to make sure that your changes are reasonably complete on 
> arm32 and arm64 to avoid having to do the big cleanup more than once.

I did test ARM64 extensively in terms of build coverage - but not in terms 
of header bloat, and I'm sure more could be done there!

> My approach to the large mid-level headers is somewhat different: rather
> than completely preventing them from being included, I would like to
> split up the structure definitions from the inline functions.

That's a big chunk of what the -fast-headers tree does: I've split over 85 
headers into <linux/header_types.h> and <linux/header_api.h>...
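
To illustrate the type/API split described above, here is a minimal sketch 
in plain C. All names (demo_widget, demo_container, the two header names in 
the comments) are hypothetical and not taken from the tree; the three 
sections would normally live in separate files and are shown in one block 
only so that it compiles on its own:

```c
/* ---- would live in a hypothetical <demo/widget_types.h> ---- */
/* Only the data layout: cheap to include, no inline code. */
struct demo_widget {
	int id;
	int refcount;
};

/* ---- would live in a hypothetical <demo/widget_api.h> ---- */
/* Inline helpers: only code that calls them pays for parsing them. */
static inline void demo_widget_get(struct demo_widget *w)
{
	w->refcount++;
}

/* Returns 1 when the last reference was dropped, 0 otherwise. */
static inline int demo_widget_put(struct demo_widget *w)
{
	return --w->refcount == 0;
}

/* ---- a user that merely embeds the type needs the types header only ---- */
struct demo_container {
	struct demo_widget widget;
	int flags;
};

/* ---- a user that calls the helpers needs the API header as well ---- */
int demo_container_release(struct demo_container *c)
{
	return demo_widget_put(&c->widget);
}
```

The point of the sketch is that struct demo_container compiles with only 
the types section visible, so files that merely embed or pass around the 
type never have to parse the inline functions.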

I've also split up headers further where needed, in particular mm.h 
required multiple levels of splitting to get the dependencies of the most 
commonly used <linux/mm_types.h> and <linux/mm_api.h> headers under 
control:

  kepler:~/mingo.tip.git> ls -ldt include/linux/mm*api*.h
  -rw-rw-r-- 1 mingo mingo 77130 Jan  4 13:32 include/linux/mm_api.h
  -rw-rw-r-- 1 mingo mingo 22227 Jan  4 13:32 include/linux/mmzone_api.h
  -rw-rw-r-- 1 mingo mingo  6759 Jan  4 13:32 include/linux/mm_api_extra.h
  -rw-rw-r-- 1 mingo mingo   479 Jan  4 13:31 include/linux/mm_api_exe_file.h
  -rw-rw-r-- 1 mingo mingo   960 Jan  4 13:31 include/linux/mm_api_truncate.h
  -rw-rw-r-- 1 mingo mingo  1262 Jan  4 13:31 include/linux/mm_api_kvmalloc.h
  -rw-rw-r-- 1 mingo mingo   719 Jan  4 13:31 include/linux/mm_api_gate_area.h
  -rw-rw-r-- 1 mingo mingo  1342 Jan  4 13:31 include/linux/mm_api_kasan.h
  -rw-rw-r-- 1 mingo mingo  3007 Jan  4 13:31 include/linux/mm_api_tlb_flush.h

The results are pretty nice:

 # vanilla:

   #include <linux/mm.h>                   | LOC: 26,728 | headers:  453

 # -fast-headers:

   #include <linux/mm.h>                   | LOC:  1,855 | headers:  132  # == mm_types.h
   #include <linux/mm_types.h>             | LOC:  1,855 | headers:  131
   #include <linux/mm_api.h>               | LOC:  8,587 | headers:  229

And <linux/mm_api.h> is now included only in about 25% of the .c files - in 
the vanilla kernel the use percentage is over ~90%.

But despite all those reductions, <linux/mm_api.h> is still a header with 
one of the largest cumulative footprints within a (distro) kernel build:

                                                              | stripped lines of code
                                                              |              _____________________________
                                                              |             | headers included recursively
                                                              |             |                _______________________________
                                                              |             |               | usage in a distro kernel build
 ____________                                                 |             |               |         _________________________________________
| header name                                                 |             |               |        | million lines of comment-stripped C code
|                                                             |             |               |        |
  #include <linux/spinlock_api.h>                             | LOC:  5,142 | headers:  123 | 10,168 | MLOC:   52.2 | #############
  #include <linux/device/driver.h>                            | LOC:  4,132 | headers:  169 | 12,306 | MLOC:   50.8 | ############
  #include <linux/mm_api.h>                                   | LOC:  8,584 | headers:  230 |  5,135 | MLOC:   44.0 | ###########
  #include <linux/skbuff_api.h>                               | LOC:  8,404 | headers:  190 |  5,065 | MLOC:   42.5 | ##########
  #include <linux/atomic_api.h>                               | LOC:  2,785 | headers:   52 | 15,282 | MLOC:   42.5 | ##########
  #include <asm/spinlock.h>                                   | LOC:  4,039 | headers:   83 | 10,168 | MLOC:   41.0 | ##########
  #include <asm/qrwlock.h>                                    | LOC:  4,039 | headers:   82 | 10,168 | MLOC:   41.0 | ##########
  #include <asm-generic/qrwlock.h>                            | LOC:  4,039 | headers:   81 | 10,168 | MLOC:   41.0 | ##########
  #include <linux/page_ref.h>                                 | LOC:  5,397 | headers:  168 |  7,578 | MLOC:   40.8 | ##########
  #include <asm/qspinlock.h>                                  | LOC:  3,990 | headers:   80 | 10,169 | MLOC:   40.5 | ##########
  #include <linux/device_types.h>                             | LOC:  2,131 | headers:  122 | 17,424 | MLOC:   37.1 | #########
  #include <linux/module.h>                                   | LOC:  2,239 | headers:  122 | 16,472 | MLOC:   36.8 | #########
  #include <net/cfg80211.h>                                   | LOC: 29,004 | headers:  423 |  1,205 | MLOC:   34.9 | ########
  #include <linux/pci.h>                                      | LOC:  7,092 | headers:  232 |  4,849 | MLOC:   34.3 | ########
  #include <linux/netdevice_api.h>                            | LOC:  8,434 | headers:  225 |  4,065 | MLOC:   34.2 | ########
  #include <linux/refcount_api.h>                             | LOC:  3,421 | headers:   87 |  9,776 | MLOC:   33.4 | ########

( The 'MLOC' footprint estimate is number of usages times 
  preprocessed-stripped-header size. )
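
As a worked example of the footprint estimate above, the following sketch 
multiplies the two columns of the table. The function names are 
hypothetical. Truncating to one decimal (rather than rounding) is an 
assumption that happens to reproduce the table rows checked by hand, e.g. 
<linux/spinlock_api.h>: 5,142 * 10,168 = 52,283,856 lines, shown as 52.2.

```c
#include <stdint.h>

/*
 * Footprint estimate: number of usages in the build times the
 * preprocessed, comment-stripped size of the header, in lines.
 */
uint64_t demo_footprint_loc(uint64_t stripped_loc, uint64_t usages)
{
	return stripped_loc * usages;
}

/*
 * Same value in tenths of a million lines, truncated:
 * a return value of 522 corresponds to "MLOC: 52.2".
 */
uint64_t demo_footprint_mloc_tenths(uint64_t stripped_loc, uint64_t usages)
{
	return demo_footprint_loc(stripped_loc, usages) / 100000;
}
```

Calling demo_footprint_mloc_tenths(8584, 5135) for the <linux/mm_api.h> row 
gives 440, i.e. the 44.0 MLOC listed in the table.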

I've reduced header bloat through three primary angles of attack:

  - reducing number of inclusions

  - reducing header size itself, by type/API splitting & by segmenting 
    headers along API usage frequency

  - decoupling headers from each other

As you can see, fast-headers -v1 is much improved (on x86), but there's 
plenty of work left, such as <net/cfg80211.h>. :-)

> Linus didn't really like my approach,

Yeah, so without having a significant build time speedup I didn't like my 
approach(es) either, which is why I didn't post this tree for a long time. :-)

But the results speak for themselves IMO, and we cannot ignore this: my 
project actually accelerated as I progressed, because the kernel rebuilds, 
especially incremental ones, became faster and faster...

Linux kernel header dependencies need to be simplified.

> but I suspect he'll have similar 
> concerns about your solution for linux/sched.h, especially if we end up 
> applying the same hack to other commonly used structures (sk_buff, 
> mm_struct, super_block) in the end.

So the per_task approach is pretty much unavoidable under the constraint of 
having no runtime overhead, given that task_struct is a historic union of a 
zillion types, where 99% of the users don't actually need to know about 
those types.

( We could eventually get rid of per_task() as well, by turning complex 
  embedded structs into pointers - but that has runtime overhead due to the 
  indirections, and I tried hard to make this approach runtime-invariant, 
  at least conceptually. )
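
The trade-off mentioned in the parenthesis above can be sketched in plain C. 
This is not the per_task() mechanism itself, only an illustration of the 
two conventional alternatives; all names are hypothetical:

```c
/* A forward declaration is all the pointer variant needs to see. */
struct demo_sub_state;

/* Variant B: pointer member. Compiles without the full sub-state type. */
struct demo_task_pointer {
	int pid;
	struct demo_sub_state *sub;
};

/* The full definition, which the embedded variant cannot do without. */
struct demo_sub_state {
	long counter;
	long limit;
};

/* Variant A: embedded by value. Every includer must parse the full type. */
struct demo_task_embedded {
	int pid;
	struct demo_sub_state sub;
};

/* One load at a fixed offset from the task pointer. */
long demo_read_embedded(const struct demo_task_embedded *t)
{
	return t->sub.counter;
}

/* Two dependent loads: first the pointer, then the field. */
long demo_read_pointer(const struct demo_task_pointer *t)
{
	return t->sub->counter;
}
```

Variant B decouples the headers, since a forward declaration is enough, but 
every access pays for the extra indirection. Variant A has no such cost but 
drags the full type definition into every user of the outer struct.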

The header splitting I've done is fundamentally clean (at least 
aspirationally), mostly done along conceptual boundaries or API families.

It's how we'd have implemented many of those headers if we had a time 
machine and went back 30 years. ;-)

> I should be able to come up with a less handwavy reply after I've 
> actually studied your approach better.

Looking forward to it!

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH] headers/uninline: Uninline single-use function: kobject_has_children()
  2022-01-04 15:14           ` Greg Kroah-Hartman
@ 2022-01-05  0:11             ` Ingo Molnar
  2022-01-05 15:23               ` Greg Kroah-Hartman
  0 siblings, 1 reply; 54+ messages in thread
From: Ingo Molnar @ 2022-01-05  0:11 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Kirill A. Shutemov, Linus Torvalds, linux-kernel, linux-arch,
	Andrew Morton, Peter Zijlstra, Thomas Gleixner, David S. Miller,
	Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet, Al Viro


* Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote:

> On Tue, Jan 04, 2022 at 04:09:57PM +0100, Greg Kroah-Hartman wrote:
> > On Tue, Jan 04, 2022 at 02:54:31PM +0100, Ingo Molnar wrote:
> > > 
> > > * Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote:
> > > 
> > > > On Tue, Jan 04, 2022 at 11:54:55AM +0100, Ingo Molnar wrote:
> > > > > There's one happy exception though, all the uninlining patches that 
> > > > > uninline a single-call function are probably fine as-is:
> > > > 
> > > > <snip>
> > > > 
> > > > >  3443e75fd1f8 headers/uninline: Uninline single-use function: kobject_has_children()
> > > > 
> > > > Let me go take this right now, no need for this to wait, it should be
> > > > out of kobject.h as you rightfully show there is only one user.
> > > 
> > > Sure - here you go!
> > 
> > I just picked it out of your git tree already :)
> > 
> > Along those lines, any objection to me taking at least one other one?
> > 3f8757078d27 ("headers/prep: usb: gadget: Fix namespace collision") and

Ack.

> > 6fb993fa3832 ("headers/deps: USB: Optimize <linux/usb/ch9.h>

Ack.

> > dependencies, remove <linux/device.h>") look like I can take now into my
> > USB tree with no problems.
> 
> Also these look good to go now:
> 	bae9ddd98195 ("headers/prep: Fix non-standard header section: drivers/usb/cdns3/core.h")

Ack.

> 	c027175b37e5 ("headers/prep: Fix non-standard header section: drivers/usb/host/ohci-tmio.c")

Ack.

Note that these latter two patches just simplified the task of my 
(simplistic) tooling, which is basically a shell script that inserts
header dependencies at the head of .c and .h files, right in front of
the first #include line it encounters.
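
The insertion rule described above can be sketched as follows. The actual 
tooling is a shell script that is not shown in this thread; this is a C 
rendering of the same idea under simplifying assumptions (the function name 
is hypothetical, only lines starting exactly with "#include" are matched, 
and preprocessor conditionals are ignored):

```c
#include <stdlib.h>
#include <string.h>

/*
 * Return a newly allocated copy of 'src' with 'line' (which should end
 * in '\n') inserted in front of the first line starting with "#include".
 * If there is no such line, 'line' is prepended. Caller frees the result.
 */
char *demo_insert_before_first_include(const char *src, const char *line)
{
	const char *pos = src;
	size_t off = 0;		/* insertion offset; 0 means "prepend" */
	size_t src_len = strlen(src);
	size_t line_len = strlen(line);
	char *out;

	/* Walk line by line, looking for one that begins with "#include". */
	while (*pos) {
		if (strncmp(pos, "#include", 8) == 0) {
			off = (size_t)(pos - src);
			break;
		}
		pos = strchr(pos, '\n');
		if (!pos)
			break;
		pos++;		/* step past the newline to the next line */
	}

	out = malloc(src_len + line_len + 1);
	if (!out)
		return NULL;

	memcpy(out, src, off);
	memcpy(out + off, line, line_len);
	/* Copy the remainder, including the terminating NUL. */
	memcpy(out + off + line_len, src + off, src_len - off + 1);
	return out;
}
```

A file whose first #include sits in a non-standard place (for example 
behind other preprocessor logic) would get the new line at an awkward spot 
with a rule like this, which is presumably why the "Fix non-standard header 
section" preparation patches make the tooling's job easier.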

These two patches do have some marginal clean-up value too, so I'm not 
opposed to merging them - just wanted to declare their true role. :-)

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH] headers/deps: dcache: Move the ____cacheline_aligned attribute to the head of the definition
  2022-01-04 17:51       ` Nathan Chancellor
@ 2022-01-05  0:20         ` Ingo Molnar
  2022-01-05  0:26           ` [PATCH] headers/deps: Attribute placement fixes for Clang & GCC Ingo Molnar
  0 siblings, 1 reply; 54+ messages in thread
From: Ingo Molnar @ 2022-01-05  0:20 UTC (permalink / raw)
  To: Nathan Chancellor
  Cc: Al Viro, Linus Torvalds, Andrew Morton, linux-kernel, linux-arch,
	Peter Zijlstra, Thomas Gleixner, Greg Kroah-Hartman,
	David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet,
	llvm


* Nathan Chancellor <nathan@kernel.org> wrote:

> Nowhere does it mention that it accepts the attribute before the type 
> keyword and neither compiler respects the attribute if it comes before 
> the keyword but at least clang warns: https://godbolt.org/z/E9fTecKPv
>
> $ cat test.c
> #include <stdio.h>
> 
> struct foo {
>     int a;
>     int b;
> };
> 
> struct __attribute__ ((aligned (64))) bar {
>     int a;
>     int b;
> };
> 
> __attribute__ ((aligned (64))) struct baz {
>     int a;
>     int b;
> };
> 
> int main(void)
> {
>     printf("struct foo alignment: %zd\n", _Alignof(struct foo));
>     printf("struct bar alignment: %zd\n", _Alignof(struct bar));
>     printf("struct baz alignment: %zd\n", _Alignof(struct baz));
>     return 0;
> }
> 
> $ gcc --version | head -1
> gcc (GCC) 11.2.1 20211231
> 
> $ gcc -std=gnu89 -Wall -Wextra test.c; and ./a.out
> struct foo alignment: 4
> struct bar alignment: 64
> struct baz alignment: 4

Ugh - so my changes there are outright buggy.

I'm reverting all those attribute position changes as we speak ...

I'm actually happy about this in a way, as it settles the issue nicely. :-)

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH] headers/deps: Attribute placement fixes for Clang & GCC
  2022-01-05  0:20         ` Ingo Molnar
@ 2022-01-05  0:26           ` Ingo Molnar
  0 siblings, 0 replies; 54+ messages in thread
From: Ingo Molnar @ 2022-01-05  0:26 UTC (permalink / raw)
  To: Nathan Chancellor
  Cc: Al Viro, Linus Torvalds, Andrew Morton, linux-kernel, linux-arch,
	Peter Zijlstra, Thomas Gleixner, Greg Kroah-Hartman,
	David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet,
	llvm


* Ingo Molnar <mingo@kernel.org> wrote:

> Ugh - so my changes there are outright buggy.
> 
> I'm reverting all those attribute position changes as we speak ...
> 
> I'm actually happy about this in a way, as it settles the issue nicely. 
> :-)

And, by the way - by putting the attribute after the 'struct' keyword we 
get the best of both worlds: accidentally undefined attribute shortcuts 
will still result in a build error.
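
A minimal sketch of the placement rules discussed here, for GCC and Clang. 
DEMO_CACHELINE_ALIGNED is a hypothetical stand-in for the kernel's 
____cacheline_aligned, with 64 bytes assumed as the cache line size:

```c
#include <stddef.h>

/* Hypothetical stand-in for ____cacheline_aligned. */
#define DEMO_CACHELINE_ALIGNED __attribute__((__aligned__(64)))

/* No attribute: natural alignment of the members. */
struct demo_plain {
	int a;
	int b;
};

/*
 * Attribute after the 'struct' keyword: applies to the type. If the
 * macro were accidentally undefined, 'struct X demo_after_keyword {'
 * would be a syntax error, so the mistake cannot go unnoticed.
 */
struct DEMO_CACHELINE_ALIGNED demo_after_keyword {
	int a;
	int b;
};

/*
 * Attribute after the closing brace: also applies to the type. But if
 * the macro were undefined, this line would instead declare a variable
 * named after the macro, and the alignment would be lost silently.
 */
struct demo_at_tail {
	int a;
	int b;
} DEMO_CACHELINE_ALIGNED;

size_t demo_align_plain(void)
{
	return _Alignof(struct demo_plain);
}

size_t demo_align_after_keyword(void)
{
	return _Alignof(struct demo_after_keyword);
}

size_t demo_align_at_tail(void)
{
	return _Alignof(struct demo_at_tail);
}
```

Both attributed variants report an alignment of 64, while the plain struct 
keeps the alignment of int. The third position, in front of the 'struct' 
keyword, is the one that the test program quoted earlier in the thread 
shows to be ignored.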

Below is the fix - should be identical to yours (which was whitespace 
mangled).

I'll backmerge these fixes to the originating commits & push out -v2 later 
today.

Thanks,

	Ingo
---
 include/linux/dcache.h        | 3 +--
 include/linux/fs_types.h      | 3 +--
 include/linux/netdevice_api.h | 2 +-
 include/net/xdp_types.h       | 2 +-
 4 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 520daf638d06..da7e77a7cede 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -127,8 +127,7 @@ enum dentry_d_lock_class
 	DENTRY_D_LOCK_NESTED
 };
 
-____cacheline_aligned
-struct dentry_operations {
+struct ____cacheline_aligned dentry_operations {
 	int (*d_revalidate)(struct dentry *, unsigned int);
 	int (*d_weak_revalidate)(struct dentry *, unsigned int);
 	int (*d_hash)(const struct dentry *, struct qstr *);
diff --git a/include/linux/fs_types.h b/include/linux/fs_types.h
index b53aadafab1b..e2e1c0827183 100644
--- a/include/linux/fs_types.h
+++ b/include/linux/fs_types.h
@@ -994,8 +994,7 @@ struct file_operations {
 	int (*fadvise)(struct file *, loff_t, loff_t, int);
 } __randomize_layout;
 
-____cacheline_aligned
-struct inode_operations {
+struct ____cacheline_aligned inode_operations {
 	struct dentry * (*lookup) (struct inode *,struct dentry *, unsigned int);
 	const char * (*get_link) (struct dentry *, struct inode *, struct delayed_call *);
 	int (*permission) (struct user_namespace *, struct inode *, int);
diff --git a/include/linux/netdevice_api.h b/include/linux/netdevice_api.h
index 4a8d7688e148..0e5e08dcbb2a 100644
--- a/include/linux/netdevice_api.h
+++ b/include/linux/netdevice_api.h
@@ -49,7 +49,7 @@
 #endif
 
 /* This structure contains an instance of an RX queue. */
-____cacheline_aligned_in_smp struct netdev_rx_queue {
+struct ____cacheline_aligned_in_smp netdev_rx_queue {
 	struct xdp_rxq_info		xdp_rxq;
 #ifdef CONFIG_RPS
 	struct rps_map __rcu		*rps_map;
diff --git a/include/net/xdp_types.h b/include/net/xdp_types.h
index 442028626b35..accc12372bca 100644
--- a/include/net/xdp_types.h
+++ b/include/net/xdp_types.h
@@ -56,7 +56,7 @@ struct xdp_mem_info {
 struct page_pool;
 
 /* perf critical, avoid false-sharing */
-____cacheline_aligned struct xdp_rxq_info {
+struct ____cacheline_aligned xdp_rxq_info {
 	struct net_device *dev;
 	u32 queue_index;
 	u32 reg_state;


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH] x86/kbuild: Enable CONFIG_KALLSYMS_ALL=y in the defconfigs
  2022-01-04 17:50     ` Nathan Chancellor
@ 2022-01-05  0:35       ` Ingo Molnar
  2022-01-05  0:40       ` [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" Ingo Molnar
  2022-01-08 15:16       ` Ingo Molnar
  2 siblings, 0 replies; 54+ messages in thread
From: Ingo Molnar @ 2022-01-05  0:35 UTC (permalink / raw)
  To: Nathan Chancellor
  Cc: Linus Torvalds, linux-kernel, linux-arch, Andrew Morton,
	Peter Zijlstra, Thomas Gleixner, Greg Kroah-Hartman,
	David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet,
	Al Viro, llvm


* Nathan Chancellor <nathan@kernel.org> wrote:

> Good point. With my main box (AMD EPYC 7502P), with the performance governor...
> 
> GCC:
> 
> Benchmark 1: ARCH=x86_64 defconfig (linux)
>   Time (mean ± σ):     48.685 s ±  0.049 s    [User: 1969.835 s, System: 204.166 s]
>   Range (min … max):   48.620 s … 48.782 s    10 runs
> 
> Benchmark 2: ARCH=x86_64 defconfig (linux-fast-headers)
>   Time (mean ± σ):     46.797 s ±  0.119 s    [User: 1403.854 s, System: 154.336 s]
>   Range (min … max):   46.620 s … 47.052 s    10 runs
> 
> Summary
>   'ARCH=x86_64 defconfig (linux-fast-headers)' ran
>     1.04 ± 0.00 times faster than 'ARCH=x86_64 defconfig (linux)'
> 
> LLVM:
> 
> Benchmark 1: ARCH=x86_64 defconfig (linux)
>   Time (mean ± σ):     51.816 s ±  0.079 s    [User: 2208.577 s, System: 200.410 s]
>   Range (min … max):   51.671 s … 51.900 s    10 runs
> 
> Benchmark 2: ARCH=x86_64 defconfig (linux-fast-headers)
>   Time (mean ± σ):     46.806 s ±  0.062 s    [User: 1438.972 s, System: 154.846 s]
>   Range (min … max):   46.696 s … 46.917 s    10 runs
> 
> Summary
>   'ARCH=x86_64 defconfig (linux-fast-headers)' ran
>     1.11 ± 0.00 times faster than 'ARCH=x86_64 defconfig (linux)'
> 
> $ rg KALLSYMS .config
> 246:CONFIG_KALLSYMS=y
> 247:# CONFIG_KALLSYMS_ALL is not set
> 248:CONFIG_KALLSYMS_ABSOLUTE_PERCPU=y
> 249:CONFIG_KALLSYMS_BASE_RELATIVE=y
> 250:CONFIG_KALLSYMS_FAST=y
> 706:CONFIG_HAVE_OBJTOOL_KALLSYMS=y
> 
> It seems like everything is working right but maybe the build is so
> short that there just is not much time for the difference to be as
> apparent?

Yeah, x86 defconfig doesn't have KALLSYMS_ALL - while all distro configs I 
checked have it enabled, because it makes crash printouts / backtraces more 
informative.

Lockdep will also enable it unconditionally.

So I've applied the patch below, to make the x86 defconfig more 
representative of what people are using in practice. This will also, as a 
side effect, bring elapsed time improvements closer to what the underlying 
cpu-time improvements offer, in the small-config case too.

Thanks,

	Ingo

====================================>
From: Ingo Molnar <mingo@kernel.org>
Date: Wed, 5 Jan 2022 01:31:35 +0100
Subject: [PATCH] x86/kbuild: Enable CONFIG_KALLSYMS_ALL=y in the defconfigs

Most distro kernels have this option enabled, to improve debug output.

Lockdep also selects it.

Enable this in the defconfig kernel as well, to make it more
representative of what people are using on x86.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/configs/i386_defconfig   | 1 +
 arch/x86/configs/x86_64_defconfig | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/x86/configs/i386_defconfig b/arch/x86/configs/i386_defconfig
index 5d97a2dfbaa7..71124cf8630c 100644
--- a/arch/x86/configs/i386_defconfig
+++ b/arch/x86/configs/i386_defconfig
@@ -261,3 +261,4 @@ CONFIG_BLK_DEV_IO_TRACE=y
 CONFIG_PROVIDE_OHCI1394_DMA_INIT=y
 CONFIG_EARLY_PRINTK_DBGP=y
 CONFIG_DEBUG_BOOT_PARAMS=y
+CONFIG_KALLSYMS_ALL=y
diff --git a/arch/x86/configs/x86_64_defconfig b/arch/x86/configs/x86_64_defconfig
index 30ab3e582d53..92b1169ec90b 100644
--- a/arch/x86/configs/x86_64_defconfig
+++ b/arch/x86/configs/x86_64_defconfig
@@ -257,3 +257,4 @@ CONFIG_BLK_DEV_IO_TRACE=y
 CONFIG_PROVIDE_OHCI1394_DMA_INIT=y
 CONFIG_EARLY_PRINTK_DBGP=y
 CONFIG_DEBUG_BOOT_PARAMS=y
+CONFIG_KALLSYMS_ALL=y

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell"
  2022-01-04 17:50     ` Nathan Chancellor
  2022-01-05  0:35       ` [PATCH] x86/kbuild: Enable CONFIG_KALLSYMS_ALL=y in the defconfigs Ingo Molnar
@ 2022-01-05  0:40       ` Ingo Molnar
  2022-01-05  1:07         ` Ingo Molnar
  2022-01-05 22:33         ` Nathan Chancellor
  2022-01-08 15:16       ` Ingo Molnar
  2 siblings, 2 replies; 54+ messages in thread
From: Ingo Molnar @ 2022-01-05  0:40 UTC (permalink / raw)
  To: Nathan Chancellor
  Cc: Linus Torvalds, linux-kernel, linux-arch, Andrew Morton,
	Peter Zijlstra, Thomas Gleixner, Greg Kroah-Hartman,
	David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet,
	Al Viro, llvm


* Nathan Chancellor <nathan@kernel.org> wrote:

> > I.e. I think the bug was simply to make main.c aware of the array, now 
> > that the INIT_THREAD initialization is done there.
> 
> Yes, that seems right.
> 
> Unfortunately, while the kernel now builds, it does not boot in QEMU. I 
> tried to checkout at 9006a48618cc0cacd3f59ff053e6509a9af5cc18 to see if I 
> could reproduce that breakage there but the build errors out at that 
> change (I do see notes of bisection breakage in some of the commits) so I 
> assume that is expected.

Yeah, there's a breakage window on ARM64, I'll track down that 
bisectability bug.

Decoupling thread_info and task_struct incrementally, so that it bisects 
cleanly on all architectures, was always a big challenge. :-/

> There is no output, even with earlycon, so it seems like something is 
> going wrong in early boot code. I am not very familiar with the SCS code 
> so I will see if I can debug this with gdb later (I'll try to see if it 
> is reproducible with GCC as well; as Nick mentions, there is support 
> being added to it and I don't mind building from source).

Just to make sure: with SCS disabled the same kernel boots fine?

> Sure thing, I will continue to follow this and test it as much as I can 
> to make sure everything continues to work well!

Thank you!

	Ingo

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell"
  2022-01-04 17:25     ` [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" Nick Desaulniers
@ 2022-01-05  0:43       ` Ingo Molnar
  0 siblings, 0 replies; 54+ messages in thread
From: Ingo Molnar @ 2022-01-05  0:43 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: Nathan Chancellor, Linus Torvalds, linux-kernel, linux-arch,
	Andrew Morton, Peter Zijlstra, Thomas Gleixner,
	Greg Kroah-Hartman, David S. Miller, Ard Biesheuvel,
	Josh Poimboeuf, Jonathan Corbet, Al Viro, llvm, ashimida,
	Arnd Bergmann


* Nick Desaulniers <ndesaulniers@google.com> wrote:

> > Can this Clang warning be disabled?
> 
> Clang is warning that the attribute will be ignored because of that 
> positioning. If you disable the warning, code will probably stop working 
> as intended.  This warning has at least been helping us make the kernel 
> coding style more consistent.

Yeah, indeed, Clang is fully correct to warn here, and these changes in my 
tree are outright bugs (which bugs Clang found & reported :-).

See the fixes below - by doing it this way the 'spurious link failure' 
problem when a header include is missing should be fixed as well.

Thanks,

	Ingo
---
 include/linux/dcache.h        | 3 +--
 include/linux/fs_types.h      | 3 +--
 include/linux/netdevice_api.h | 2 +-
 include/net/xdp_types.h       | 2 +-
 4 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 520daf638d06..da7e77a7cede 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -127,8 +127,7 @@ enum dentry_d_lock_class
 	DENTRY_D_LOCK_NESTED
 };
 
-____cacheline_aligned
-struct dentry_operations {
+struct ____cacheline_aligned dentry_operations {
 	int (*d_revalidate)(struct dentry *, unsigned int);
 	int (*d_weak_revalidate)(struct dentry *, unsigned int);
 	int (*d_hash)(const struct dentry *, struct qstr *);
diff --git a/include/linux/fs_types.h b/include/linux/fs_types.h
index b53aadafab1b..e2e1c0827183 100644
--- a/include/linux/fs_types.h
+++ b/include/linux/fs_types.h
@@ -994,8 +994,7 @@ struct file_operations {
 	int (*fadvise)(struct file *, loff_t, loff_t, int);
 } __randomize_layout;
 
-____cacheline_aligned
-struct inode_operations {
+struct ____cacheline_aligned inode_operations {
 	struct dentry * (*lookup) (struct inode *,struct dentry *, unsigned int);
 	const char * (*get_link) (struct dentry *, struct inode *, struct delayed_call *);
 	int (*permission) (struct user_namespace *, struct inode *, int);
diff --git a/include/linux/netdevice_api.h b/include/linux/netdevice_api.h
index 4a8d7688e148..0e5e08dcbb2a 100644
--- a/include/linux/netdevice_api.h
+++ b/include/linux/netdevice_api.h
@@ -49,7 +49,7 @@
 #endif
 
 /* This structure contains an instance of an RX queue. */
-____cacheline_aligned_in_smp struct netdev_rx_queue {
+struct ____cacheline_aligned_in_smp netdev_rx_queue {
 	struct xdp_rxq_info		xdp_rxq;
 #ifdef CONFIG_RPS
 	struct rps_map __rcu		*rps_map;
diff --git a/include/net/xdp_types.h b/include/net/xdp_types.h
index 442028626b35..accc12372bca 100644
--- a/include/net/xdp_types.h
+++ b/include/net/xdp_types.h
@@ -56,7 +56,7 @@ struct xdp_mem_info {
 struct page_pool;
 
 /* perf critical, avoid false-sharing */
-____cacheline_aligned struct xdp_rxq_info {
+struct ____cacheline_aligned xdp_rxq_info {
 	struct net_device *dev;
 	u32 queue_index;
 	u32 reg_state;
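
For readers following the patch: a minimal userspace sketch of the attribute placement it switches to, with the alignment attribute sitting between 'struct' and the tag. DEMO_CACHELINE_ALIGNED and demo_rx_queue are made-up stand-ins (not kernel names), 64 is an assumed cache line size, and the GCC/Clang __attribute__((aligned())) extension is assumed. It only shows that this placement attaches the alignment to the type itself; it does not try to reproduce the link failure discussed in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel macro; 64 is an assumed cache line size. */
#define DEMO_CACHELINE_ALIGNED __attribute__((aligned(64)))

/* Attribute placed between 'struct' and the tag, as in the patch. */
struct DEMO_CACHELINE_ALIGNED demo_rx_queue {
	int queue_index;
	int reg_state;
};

/* The alignment is a property of the type, so it can be checked at build time. */
static_assert(_Alignof(struct demo_rx_queue) == 64,
	      "type carries the cache line alignment");
static_assert(sizeof(struct demo_rx_queue) % 64 == 0,
	      "size is padded up to the alignment");

/* Small accessor so the alignment can also be inspected at run time. */
size_t demo_rx_queue_align(void)
{
	return _Alignof(struct demo_rx_queue);
}
```

With this placement, a translation unit that lacks the macro definition cannot parse the struct definition at all, so the problem shows up at the definition site.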



* Re: [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell"
  2022-01-05  0:40       ` [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" Ingo Molnar
@ 2022-01-05  1:07         ` Ingo Molnar
  2022-01-05 21:42           ` Nathan Chancellor
  2022-01-05 22:33         ` Nathan Chancellor
  1 sibling, 1 reply; 54+ messages in thread
From: Ingo Molnar @ 2022-01-05  1:07 UTC (permalink / raw)
  To: Nathan Chancellor
  Cc: Linus Torvalds, linux-kernel, linux-arch, Andrew Morton,
	Peter Zijlstra, Thomas Gleixner, Greg Kroah-Hartman,
	David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet,
	Al Viro, llvm


* Ingo Molnar <mingo@kernel.org> wrote:

> 
> * Nathan Chancellor <nathan@kernel.org> wrote:
> 
> > > I.e. I think the bug was simply to make main.c aware of the array, now 
> > > that the INIT_THREAD initialization is done there.
> > 
> > Yes, that seems right.
> > 
> > Unfortunately, while the kernel now builds, it does not boot in QEMU. I 
> > tried to checkout at 9006a48618cc0cacd3f59ff053e6509a9af5cc18 to see if I 
> > could reproduce that breakage there but the build errors out at that 
> > change (I do see notes of bisection breakage in some of the commits) so I 
> > assume that is expected.
> 
> Yeah, there's a breakage window on ARM64, I'll track down that 
> bisectability bug.

I haven't fixed this ARM64 bisection breakage yet, but I've integrated & 
backmerged all the other fixes and changes, and pushed it out to the WIP 
branch:

    # 1755441e323b per_task: Implement single template to define 'struct task_struct_per_task' fields and offsets

    git://git.kernel.org/pub/scm/linux/kernel/git/mingo/tip.git sched/headers

Let me know if there's anything missing or if there's a new breakage.

This is pretty close to what will be -v2.

Thanks,

	Ingo


* Re: [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell"
  2022-01-05  0:05       ` Ingo Molnar
@ 2022-01-05  1:37         ` Arnd Bergmann
  0 siblings, 0 replies; 54+ messages in thread
From: Arnd Bergmann @ 2022-01-05  1:37 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arnd Bergmann, Greg Kroah-Hartman, Linus Torvalds,
	Linux Kernel Mailing List, linux-arch, Andrew Morton,
	Peter Zijlstra, Thomas Gleixner, David S. Miller, Ard Biesheuvel,
	Josh Poimboeuf, Jonathan Corbet, Al Viro

On Tue, Jan 4, 2022 at 7:05 PM Ingo Molnar <mingo@kernel.org> wrote:
> * Arnd Bergmann <arnd@arndb.de> wrote:
>
> > From what I could tell, linux/sched.h was not the only such problem, but
> > I saw similarly bad issues with linux/fs.h (which is what I posted about
> > in November/December), linux/mm.h and linux/netdevice.h on the high
> > level, in low-level headers there are huge issues with linux/atomic.h,
> > linux/mutex.h, linux/pgtable.h etc. I expect that you have addressed
> > these as well,
>
> Correct, each of these was a problem - and a *lot* of other headers in
> addition to those:
>
>   kepler:~/mingo.tip.git> git diff --stat v5.16-rc8.. include/linux/ arch/*/include/asm/ | grep changed
>
>     1335 files changed, 59677 insertions(+), 56582 deletions(-)
>
> and I reduced all the headers that showed up in the bloat-profile to a
> fraction of their original size:
>
>     ------------------------------------------------------------------------------------------
>     | Combined, preprocessed C code size of header, without line markers,
>     | with comments stripped:
>     ------------------------------.-----------------------------.-----------------------------
>                                   | v5.16-rc7                   |  -fast-headers-v1
>                                   |-----------------------------|-----------------------------
>      #include <linux/sched.h>     | LOC: 13,292 | headers:  324 |  LOC:    769 | headers:   64
>      #include <linux/wait.h>      | LOC:  9,369 | headers:  235 |  LOC:    483 | headers:   46
>      #include <linux/rcupdate.h>  | LOC:  8,975 | headers:  224 |  LOC:  1,385 | headers:   86
>      #include <linux/hrtimer.h>   | LOC: 10,861 | headers:  265 |  LOC:    229 | headers:   37
>      #include <linux/fs.h>        | LOC: 22,497 | headers:  427 |  LOC:  1,993 | headers:  120
>      #include <linux/cred.h>      | LOC: 17,257 | headers:  368 |  LOC:  4,830 | headers:  129
>      #include <linux/dcache.h>    | LOC: 10,545 | headers:  253 |  LOC:    858 | headers:   65
>      #include <linux/cgroup.h>    | LOC: 33,518 | headers:  522 |  LOC:  2,477 | headers:  111
>      #include <linux/module.h>    | LOC: 16,948 | headers:  339 |  LOC:  2,239 | headers:  122
>      #include <linux/kobject.h>   | LOC: 15,210 | headers:  318 |  LOC:    799 | headers:   59
>      #include <linux/device.h>    | LOC: 20,505 | headers:  408 |  LOC:  2,131 | headers:  123
>      #include <linux/gfp.h>       | LOC: 13,543 | headers:  303 |  LOC:    181 | headers:   26
>      #include <linux/slab.h>      | LOC: 14,037 | headers:  307 |  LOC:    999 | headers:   74
>      #include <linux/mm.h>        | LOC: 26,727 | headers:  453 |  LOC:  1,855 | headers:  133
>      #include <linux/mmzone.h>    | LOC: 12,755 | headers:  293 |  LOC:    832 | headers:   64
>      #include <linux/swap.h>      | LOC: 38,292 | headers:  559 |  LOC: 11,085 | headers:  294
>      #include <linux/writeback.h> | LOC: 36,481 | headers:  550 |  LOC:  1,566 | headers:   92
>      #include <linux/skbuff.h>    | LOC: 36,130 | headers:  558 |  LOC:  1,209 | headers:   89
>      #include <linux/tcp.h>       | LOC: 60,133 | headers:  725 |  LOC:  3,829 | headers:  153
>      #include <linux/udp.h>       | LOC: 59,411 | headers:  721 |  LOC:  3,236 | headers:  146
>      #include <linux/filter.h>    | LOC: 54,172 | headers:  689 |  LOC:  4,087 | headers:   73
>      #include <linux/interrupt.h> | LOC: 14,085 | headers:  340 |  LOC:  2,629 | headers:  124
>
>      #include <net/sock.h>        | LOC: 58,880 | headers:  715 |  LOC:  1,543 | headers:   98
>
>      #include <asm/processor.h>   | LOC:  7,821 | headers:  204 |  LOC:    618 | headers:   41
>      #include <asm/page.h>        | LOC:  1,540 | headers:   97 |  LOC:  1,193 | headers:   82
>      #include <asm/pgtable.h>     | LOC: 12,949 | headers:  297 |  LOC:  5,742 | headers:  217

Ok, this is roughly the list of headers that I had looked at previously.

> <linux/atomic.h> wasn't a particularly big problem - but it does get
> included everywhere, so I moved the most common atomic_t definition into
> <linux/types.h> (on 64-bit kernels), which allowed a big reduction for the
> majority of cases that don't use the atomic APIs:

Good, I have a patch for the same thing, including moving atomic64_t
and atomic_long_t to linux/types.h there -- I don't think it would be good to
have it in different places on 32-bit architectures.

On arm machines, I found atomic.h to be problematic because it is a large
generated header that depends on the barriers which in turn require other
stuff.

>  #include <linux/atomic.h>               | LOC:    176 | headers:   26
>  #include <linux/atomic_api.h>           | LOC:  2,785 | headers:   52
>
> But <linux/atomic_api.h> is still included in ~75% of .c files, mostly for
> good reasons, because it's a very popular low level API.

These are the x86 numbers, right?

> > but I'd like to make sure that your changes are reasonably complete on
> > arm32 and arm64 to avoid having to do the big cleanup more than once.
>
> I did test ARM64 extensively in terms of build coverage - but not in terms
> of header bloat, and I'm sure more could be done there!

My guess is that each architecture has a couple of dark corners that
require cleaning up before we actually see the benefit of the series.
I'm personally most interested in arm32 and arm64 because that's what
I do my testing on, and I'll try to find those corners. One thing I remember
for arm32 is that there is a nasty dependency chain, get_current() ->
PAGE_SIZE -> asm/pgtable.h, with pgtable including the world again.
You probably got this one, but any such missing thing can lead to the
other cleanups not helping that much.

> > My approach to the large mid-level headers is somewhat different: rather
> > than completely avoiding them from getting included, I would like to
> > split up the structure definitions from the inline functions.
>
> That's a big chunk of what the -fast-headers tree does: I've split over 85
> headers into <linux/header_types.h> and <linux/header_api.h>...
>
> I've also split up headers further where needed, in particular mm.h
> required multiple levels of splitting to get the dependencies of the most
> commonly used <linux/mm_types.h> and <linux/mm_api.h> headers under
> control:
>
>   kepler:~/mingo.tip.git> ls -ldt include/linux/mm*api*.h
>   -rw-rw-r-- 1 mingo mingo 77130 Jan  4 13:32 include/linux/mm_api.h
>   -rw-rw-r-- 1 mingo mingo 22227 Jan  4 13:32 include/linux/mmzone_api.h
>   -rw-rw-r-- 1 mingo mingo  6759 Jan  4 13:32 include/linux/mm_api_extra.h
>   -rw-rw-r-- 1 mingo mingo   479 Jan  4 13:31 include/linux/mm_api_exe_file.h
>   -rw-rw-r-- 1 mingo mingo   960 Jan  4 13:31 include/linux/mm_api_truncate.h
>   -rw-rw-r-- 1 mingo mingo  1262 Jan  4 13:31 include/linux/mm_api_kvmalloc.h
>   -rw-rw-r-- 1 mingo mingo   719 Jan  4 13:31 include/linux/mm_api_gate_area.h
>   -rw-rw-r-- 1 mingo mingo  1342 Jan  4 13:31 include/linux/mm_api_kasan.h
>   -rw-rw-r-- 1 mingo mingo  3007 Jan  4 13:31 include/linux/mm_api_tlb_flush.h

Ah, good. That is pretty close to what I had in mind as well, so maybe
we can convince Linus after all. ;-)
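
To make the types/API split concrete, here is a single-file sketch of the idea, under the assumption that the two marked sections would live in separate headers. The demo_* names and the <demo/...> paths are hypothetical and not taken from the tree.

```c
#include <assert.h>

/* ---- would live in a hypothetical <demo/widget_types.h> ---- */
/* Only the data layout: cheap to include, no inline functions. */
struct demo_widget {
	int refcount;
	unsigned long flags;
};

/* ---- a typical user that only embeds the type ---- */
/* It needs the layout of demo_widget, but none of its API. */
struct demo_container {
	struct demo_widget w;
	int id;
};

/* ---- would live in a hypothetical <demo/widget_api.h> ---- */
/* Inline helpers; only code that calls them pays for this part. */
static inline void demo_widget_get(struct demo_widget *w)
{
	w->refcount++;
}

/* Drops one reference; returns 1 when the last reference is gone. */
static inline int demo_widget_put(struct demo_widget *w)
{
	assert(w->refcount > 0);
	return --w->refcount == 0;
}
```

Code that merely embeds struct demo_widget would include only the types part, so the inline helpers and whatever they depend on stay out of its preprocessed size.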

> The results are pretty nice:
>
>  # vanilla:
>
>    #include <linux/mm.h>                   | LOC: 26,728 | headers:  453
>
>  # -fast-headers:
>
>    #include <linux/mm.h>                   | LOC:  1,855 | headers:  132  # == mm_types.h
>    #include <linux/mm_types.h>             | LOC:  1,855 | headers:  131
>    #include <linux/mm_api.h>               | LOC:  8,587 | headers:  229
>
> And <linux/mm_api.h> is now included only in about 25% of the .c files - in
> the vanilla kernel the use percentage is over ~90%.
>
> But despite all those reductions, <linux/mm_api.h> is still a header with
> one of the largest cumulative footprints within a (distro) kernel build:
>
>                                                               | stripped lines of code
>                                                               |              _____________________________
>                                                               |             | headers included recursively
>                                                               |             |                _______________________________
>                                                               |             |               | usage in a distro kernel build
>  ____________                                                 |             |               |         _________________________________________
> | header name                                                 |             |               |        | million lines of comment-stripped C code
> |                                                             |             |               |        |
>   #include <linux/spinlock_api.h>                             | LOC:  5,142 | headers:  123 | 10,168 | MLOC:   52.2 | #############
>   #include <linux/device/driver.h>                            | LOC:  4,132 | headers:  169 | 12,306 | MLOC:   50.8 | ############
>   #include <linux/mm_api.h>                                   | LOC:  8,584 | headers:  230 |  5,135 | MLOC:   44.0 | ###########
>   #include <linux/skbuff_api.h>                               | LOC:  8,404 | headers:  190 |  5,065 | MLOC:   42.5 | ##########
>   #include <linux/atomic_api.h>                               | LOC:  2,785 | headers:   52 | 15,282 | MLOC:   42.5 | ##########
>   #include <asm/spinlock.h>                                   | LOC:  4,039 | headers:   83 | 10,168 | MLOC:   41.0 | ##########
>   #include <asm/qrwlock.h>                                    | LOC:  4,039 | headers:   82 | 10,168 | MLOC:   41.0 | ##########
>   #include <asm-generic/qrwlock.h>                            | LOC:  4,039 | headers:   81 | 10,168 | MLOC:   41.0 | ##########
>   #include <linux/page_ref.h>                                 | LOC:  5,397 | headers:  168 |  7,578 | MLOC:   40.8 | ##########
>   #include <asm/qspinlock.h>                                  | LOC:  3,990 | headers:   80 | 10,169 | MLOC:   40.5 | ##########
>   #include <linux/device_types.h>                             | LOC:  2,131 | headers:  122 | 17,424 | MLOC:   37.1 | #########
>   #include <linux/module.h>                                   | LOC:  2,239 | headers:  122 | 16,472 | MLOC:   36.8 | #########
>   #include <net/cfg80211.h>                                   | LOC: 29,004 | headers:  423 |  1,205 | MLOC:   34.9 | ########
>   #include <linux/pci.h>                                      | LOC:  7,092 | headers:  232 |  4,849 | MLOC:   34.3 | ########
>   #include <linux/netdevice_api.h>                            | LOC:  8,434 | headers:  225 |  4,065 | MLOC:   34.2 | ########
>   #include <linux/refcount_api.h>                             | LOC:  3,421 | headers:   87 |  9,776 | MLOC:   33.4 | ########
>
> ( The 'MLOC' footprint estimate is number of usages times
>   preprocessed-stripped-header size. )

This is also the metric that I used in my scripts, except I measured the
preprocessed size in bytes instead of lines, which should make little
difference.
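
For reference, the footprint estimate quoted above (number of usages times preprocessed, stripped header size) can be written as a small helper. footprint_mloc() is a hypothetical name; the inputs in the note below are the numbers from the quoted table.

```c
#include <assert.h>

/*
 * usages: number of .c files that end up including the header
 * loc:    preprocessed, comment-stripped lines of the header
 * Returns the cumulative footprint in millions of lines (MLOC).
 */
double footprint_mloc(long usages, long loc)
{
	assert(usages >= 0 && loc >= 0);
	return (double)usages * (double)loc / 1e6;
}
```

For <linux/spinlock_api.h>, footprint_mloc(10168, 5142) comes out at roughly 52.28, in line with the 52.2 MLOC listed in the table.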

> I've reduced header bloat through three primary angles of attack:
>
>   - reducing number of inclusions
>
>   - reducing header size itself, by type/API splitting & by segmenting
>     headers along API usage frequency
>
>   - decoupling headers from each other
>
> As you can see, fast-headers -v1 is much improved (on x86), but there's
> plenty of work left, such as <net/cfg80211.h>. :-)

Right. I mainly focused on splitting types from the rest, which I think
brings most of the benefits, but taking it further as you did here
helps more.

> > Linus didn't really like my approach,
>
> Yeah, so without having a significant build time speedup I didn't like my
> approach(es) either, which is why I didn't post this tree for a long time. :-)
>
> But the results speak for themselves IMO, and we cannot ignore this: my
> project actually accelerated as I progressed, because the kernel rebuilds,
> especially incremental ones, became faster and faster...
>
> Linux kernel header dependencies need to be simplified.

Agreed. In my 2020 experiments, I managed to get from the point of cleaning
up ~100 headers with very little effect (when everything was still included
through some other header) to cleaning up the next 100 and seeing huge
improvements but also getting discouraged because it started breaking
every driver due to missing indirect includes.

> > but I suspect he'll have similar
> > concerns about your solution for linux/sched.h, especially if we end up
> > applying the same hack to other commonly used structures (sk_buff,
> > mm_struct, super_block) in the end.
>
> So the per_task approach is pretty much unavoidable under the constraint of
> having no runtime overhead, given that task_struct is a historic union of a
> zillion types, where 99% of the users don't actually need to know about
> those types.
>
> ( We could eventually get rid of per_task() as well, by turning complex
>   embedded structs into pointers - but that has runtime overhead due to the
>   indirections, and I tried hard to make this approach runtime-invariant,
>   at least conceptually. )

Would it be possible to have one common task_struct definition that has
all the frequently-accessed fields, plus another larger structure that
embeds the smaller structure plus all the other stuff? I suppose that
would require even larger scale reworks, but it may be a nicer end
result. (again, I have yet to read your patches, so there is probably
an obvious answer why you didn't do this).
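
A rough userspace sketch of the layout I have in mind, purely as an illustration: all demo_* names are made up, the field choice is arbitrary, and this is not how the -fast-headers tree does it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "hot" part: the fields most code touches. */
struct demo_task_core {
	int pid;
	long state;
};

/* Hypothetical full task: embeds the core plus everything else. */
struct demo_task_full {
	struct demo_task_core core;
	char comm[16];
	unsigned long rarely_used[32];
};

/* Userspace equivalent of the kernel's container_of(). */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* With the core as first member, converting is a zero-offset cast. */
static_assert(offsetof(struct demo_task_full, core) == 0,
	      "core must stay the first member");

/* Code that needs the big structure converts explicitly. */
static inline struct demo_task_full *
demo_task_full_of(struct demo_task_core *core)
{
	return demo_container_of(core, struct demo_task_full, core);
}
```

Most users would then only need the definition of struct demo_task_core, and only the few places that call demo_task_full_of() would have to see the large structure and the types it embeds.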

          Arnd


* Re: [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell"
  2022-01-04 17:51     ` [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" Arnd Bergmann
  2022-01-05  0:05       ` Ingo Molnar
@ 2022-01-05  9:37       ` Andy Shevchenko
  1 sibling, 0 replies; 54+ messages in thread
From: Andy Shevchenko @ 2022-01-05  9:37 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Ingo Molnar, Greg Kroah-Hartman, Linus Torvalds,
	Linux Kernel Mailing List, linux-arch, Andrew Morton,
	Peter Zijlstra, Thomas Gleixner, David S. Miller, Ard Biesheuvel,
	Josh Poimboeuf, Jonathan Corbet, Al Viro

On Wed, Jan 5, 2022 at 4:08 AM Arnd Bergmann <arnd@arndb.de> wrote:
> On Mon, Jan 3, 2022 at 6:12 AM Ingo Molnar <mingo@kernel.org> wrote:

...

> Most of the patches should be the same either way (adding back
> missing includes to drivers, and doing cleanups to commonly
> included headers to avoid the deep nesting), the interesting bit
> will be how to properly define the larger structures without pulling
> in the rest of the world.

I'm wondering if the compiler can provide us with usage statistics on a
per-type basis for custom types. The types with the highest usage
frequency are probably the ones we had better put in a separate header
or a tree of _independent_ headers.


-- 
With Best Regards,
Andy Shevchenko


* Re: [PATCH] headers/uninline: Uninline single-use function: kobject_has_children()
  2022-01-05  0:11             ` Ingo Molnar
@ 2022-01-05 15:23               ` Greg Kroah-Hartman
  2022-01-06 11:26                 ` Ingo Molnar
  0 siblings, 1 reply; 54+ messages in thread
From: Greg Kroah-Hartman @ 2022-01-05 15:23 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Kirill A. Shutemov, Linus Torvalds, linux-kernel, linux-arch,
	Andrew Morton, Peter Zijlstra, Thomas Gleixner, David S. Miller,
	Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet, Al Viro

On Wed, Jan 05, 2022 at 01:11:03AM +0100, Ingo Molnar wrote:
> 
> * Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote:
> 
> > On Tue, Jan 04, 2022 at 04:09:57PM +0100, Greg Kroah-Hartman wrote:
> > > On Tue, Jan 04, 2022 at 02:54:31PM +0100, Ingo Molnar wrote:
> > > > 
> > > > * Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote:
> > > > 
> > > > > On Tue, Jan 04, 2022 at 11:54:55AM +0100, Ingo Molnar wrote:
> > > > > > There's one happy exception though, all the uninlining patches that 
> > > > > > uninline a single-call function are probably fine as-is:
> > > > > 
> > > > > <snip>
> > > > > 
> > > > > >  3443e75fd1f8 headers/uninline: Uninline single-use function: kobject_has_children()
> > > > > 
> > > > > Let me go take this right now, no need for this to wait, it should be
> > > > > out of kobject.h as you rightfully show there is only one user.
> > > > 
> > > > Sure - here you go!
> > > 
> > > I just picked it out of your git tree already :)
> > > 
> > > Along those lines, any objection to me taking at least one other one?
> > > 3f8757078d27 ("headers/prep: usb: gadget: Fix namespace collision") and
> 
> Ack.
> 
> > > 6fb993fa3832 ("headers/deps: USB: Optimize <linux/usb/ch9.h>
> 
> Ack.

This one required me to fix up a usb core file that was only including
this .h file and not kernel.h which it also needed.  Now resolved in my
tree.

> > > dependencies, remove <linux/device.h>") look like I can take now into my
> > > USB tree with no problems.
> > 
> > Also these look good to go now:
> > 	bae9ddd98195 ("headers/prep: Fix non-standard header section: drivers/usb/cdns3/core.h")
> 
> Ack.
> 
> > 	c027175b37e5 ("headers/prep: Fix non-standard header section: drivers/usb/host/ohci-tmio.c")
> 
> Ack.
> 
> Note that these latter two patches just simplified the task of my 
> (simplistic) tooling, which is basically a shell script that inserts
> header dependencies to the head of .c and .h files, right in front of
> the first #include line it encounters.
> 
> These two patches do have some marginal clean-up value too, so I'm not 
> opposed to merging them - just wanted to declare their true role. :-)

They all are sane cleanups, so I've taken them in my tree now.  Make
your patchset a bit smaller against 5.17-rc1 when that comes around :)

thanks,

greg k-h


* Re: [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell"
  2022-01-05  1:07         ` Ingo Molnar
@ 2022-01-05 21:42           ` Nathan Chancellor
  2022-01-08 10:32             ` [PATCH] headers/deps: Add header dependencies to .c files: <linux/ptrace_api.h> Ingo Molnar
                               ` (4 more replies)
  0 siblings, 5 replies; 54+ messages in thread
From: Nathan Chancellor @ 2022-01-05 21:42 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, linux-kernel, linux-arch, Andrew Morton,
	Peter Zijlstra, Thomas Gleixner, Greg Kroah-Hartman,
	David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet,
	Al Viro, llvm

On Wed, Jan 05, 2022 at 02:07:42AM +0100, Ingo Molnar wrote:
> 
> * Ingo Molnar <mingo@kernel.org> wrote:
> 
> > 
> > * Nathan Chancellor <nathan@kernel.org> wrote:
> > 
> > > > I.e. I think the bug was simply to make main.c aware of the array, now 
> > > > that the INIT_THREAD initialization is done there.
> > > 
> > > Yes, that seems right.
> > > 
> > > Unfortunately, while the kernel now builds, it does not boot in QEMU. I 
> > > tried to checkout at 9006a48618cc0cacd3f59ff053e6509a9af5cc18 to see if I 
> > > could reproduce that breakage there but the build errors out at that 
> > > change (I do see notes of bisection breakage in some of the commits) so I 
> > > assume that is expected.
> > 
> > Yeah, there's a breakage window on ARM64, I'll track down that 
> > bisectability bug.
> 
> I haven't fixed this ARM64 bisection breakage yet, but I've integrated & 
> backmerged all the other fixes and changes, and pushed it out to the WIP 
> branch:
> 
>     # 1755441e323b per_task: Implement single template to define 'struct task_struct_per_task' fields and offsets
> 
>     git://git.kernel.org/pub/scm/linux/kernel/git/mingo/tip.git sched/headers
> 
> Let me know if there's anything missing or if there's a new breakage.

I ended up running this through my full set of clang builds and a few
GCC builds and found a few issues, most of which appear to be
compiler agnostic.

This whole report is against commit 1755441e323b ("per_task: Implement
single template to define 'struct task_struct_per_task' fields and
offsets").

In case it is relevant...

$ gcc --version | head -1
gcc (GCC) 11.2.1 20211231



1. kernel/stackleak.c build failure:

$ make -skj"$(nproc)" ARCH=x86_64 allmodconfig kernel/stackleak.o
kernel/stackleak.c: In function ‘stackleak_erase’:
kernel/stackleak.c:92:13: error: implicit declaration of function ‘on_thread_stack’; did you mean ‘setup_thread_stack’? [-Werror=implicit-function-declaration]
   92 |         if (on_thread_stack())
      |             ^~~~~~~~~~~~~~~
      |             setup_thread_stack
kernel/stackleak.c:95:28: error: implicit declaration of function ‘current_top_of_stack’ [-Werror=implicit-function-declaration]
   95 |                 boundary = current_top_of_stack();
      |                            ^~~~~~~~~~~~~~~~~~~~
kernel/stackleak.c: In function ‘stackleak_track_stack’:
kernel/stackleak.c:119:14: error: implicit declaration of function ‘ALIGN’ [-Werror=implicit-function-declaration]
  119 |         sp = ALIGN(sp, sizeof(unsigned long));
      |              ^~~~~
cc1: all warnings being treated as errors

This is fixed with the following diff although I am unsure if that is as
minimal as it should be.

diff --git a/kernel/stackleak.c b/kernel/stackleak.c
index ce161a8e8d97..d67c5475183b 100644
--- a/kernel/stackleak.c
+++ b/kernel/stackleak.c
@@ -10,8 +10,10 @@
  * reveal and blocks some uninitialized stack variable attacks.
  */
 
+#include <asm/processor_api.h>
 #include <linux/stackleak.h>
 #include <linux/kprobes.h>
+#include <linux/align.h>
 
 #ifdef CONFIG_STACKLEAK_RUNTIME_DISABLE
 #include <linux/jump_label.h>
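
As background on the ALIGN() call the build error points at: the macro rounds a value up to the next multiple of a power-of-two alignment. Below is a userspace sketch of that rounding; DEMO_ALIGN and demo_align_sp are made-up names and this is not the kernel's definition from <linux/align.h>.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to the next multiple of a; a must be a power of two. */
#define DEMO_ALIGN(x, a) \
	(((x) + ((uintptr_t)(a) - 1)) & ~((uintptr_t)(a) - 1))

/* The mask trick above only works for power-of-two alignments. */
static_assert((sizeof(unsigned long) & (sizeof(unsigned long) - 1)) == 0,
	      "sizeof(unsigned long) is a power of two");

/* Mirrors the shape of the failing line: align to sizeof(unsigned long). */
uintptr_t demo_align_sp(uintptr_t sp)
{
	return DEMO_ALIGN(sp, sizeof(unsigned long));
}
```

An already aligned value is returned unchanged, and any other value moves up to the next multiple of sizeof(unsigned long).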



2. Build failures with CONFIG_UAPI_HEADER_TEST=y and O=...

This was originally reproduced with allmodconfig but this is a simpler
reproducer I think.

$ make -skj"$(nproc)" ARCH=x86_64 O=.build/x86_64 defconfig

$ scripts/config --file .build/x86_64/.config -e HEADERS_INSTALL -e UAPI_HEADER_TEST

$ make -skj"$(nproc)" ARCH=x86_64 O=.build/x86_64 olddefconfig usr/
In file included from <command-line>:
./usr/include/linux/rds.h:38:10: fatal error: uapi/linux/sockios.h: No such file or directory
   38 | #include <uapi/linux/sockios.h>
      |          ^~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
make[4]: *** [/home/nathan/cbl/src/linux-fast-headers/usr/include/Makefile:106: usr/include/linux/rds.hdrtest] Error 1
In file included from ./usr/include/linux/qrtr.h:5,
                 from <command-line>:
./usr/include/linux/socket.h:5:10: fatal error: uapi/linux/socket_types.h: No such file or directory
    5 | #include <uapi/linux/socket_types.h>
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
In file included from ./usr/include/linux/in.h:24,
                 from ./usr/include/linux/nfs_mount.h:12,
                 from <command-line>:
./usr/include/linux/socket.h:5:10: fatal error: uapi/linux/socket_types.h: No such file or directory
    5 | #include <uapi/linux/socket_types.h>
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
make[4]: *** [/home/nathan/cbl/src/linux-fast-headers/usr/include/Makefile:106: usr/include/linux/qrtr.hdrtest] Error 1
make[4]: *** [/home/nathan/cbl/src/linux-fast-headers/usr/include/Makefile:106: usr/include/linux/nfs_mount.hdrtest] Error 1
...

I don't see this when just building in the tree. I am guessing that
commit f989e243f1f4 ("headers/deps: uapi/headers: Create
usr/include/uapi symbolic link") needs to account for this?



3. Build failure with CONFIG_SAMPLE_CONNECTOR=m and O=...

I am guessing this has a similar root cause as above, since that commit
mentions an error similar to this.

$ make -skj"$(nproc)" ARCH=x86_64 O=.build/x86_64 allmodconfig samples/connector/
In file included from /home/nathan/cbl/src/linux-fast-headers/samples/connector/ucon.c:14:
usr/include/linux/netlink.h:5:10: fatal error: uapi/linux/types.h: No such file or directory
    5 | #include <uapi/linux/types.h>
      |          ^~~~~~~~~~~~~~~~~~~~
compilation terminated.



4. modpost warning around __sw_hweight64

With the first issue resolved:

$ make -skj"$(nproc)" ARCH=i386 allmodconfig
WARNING: modpost: EXPORT symbol "__sw_hweight64" [vmlinux] version ...
Is "__sw_hweight64" prototyped in <asm/asm-prototypes.h>?



5. Build error in arch/arm64/kvm/hyp/nvhe with LTO

With arm64 + CONFIG_LTO_CLANG_THIN=y, I see:

$ make -skj"$(nproc)" ARCH=arm64 LLVM=1 defconfig

$ scripts/config -e LTO_CLANG_THIN

$ make -skj"$(nproc)" ARCH=arm64 LLVM=1 olddefconfig arch/arm64/kvm/hyp/nvhe/
ld.lld: error: arch/arm64/kvm/hyp/nvhe/hyp.lds:2: unknown directive: .macro
>>> .macro __put, val, name
>>> ^
make[5]: *** [arch/arm64/kvm/hyp/nvhe/Makefile:51: arch/arm64/kvm/hyp/nvhe/kvm_nvhe.tmp.o] Error 1

I was not able to figure out the exact include chain but CONFIG_LTO
causes asm/alternative-macros.h to be included in asm/rwonce.h, which
eventually gets included in either asm/cache.h or asm/memory.h.

I managed to solve this with the following diff but I am not sure if
there is a better or cleaner way to do that.

diff --git a/arch/arm64/include/asm/rwonce.h b/arch/arm64/include/asm/rwonce.h
index 1bce62fa908a..e19572a205d0 100644
--- a/arch/arm64/include/asm/rwonce.h
+++ b/arch/arm64/include/asm/rwonce.h
@@ -5,7 +5,7 @@
 #ifndef __ASM_RWONCE_H
 #define __ASM_RWONCE_H
 
-#ifdef CONFIG_LTO
+#if defined(CONFIG_LTO) && !defined(LINKER_SCRIPT)
 
 #include <linux/compiler_types.h>
 #include <asm/alternative-macros.h>
@@ -66,7 +66,7 @@
 })
 
 #endif	/* !BUILD_VDSO */
-#endif	/* CONFIG_LTO */
+#endif	/* CONFIG_LTO && !LINKER_SCRIPT */
 
 #include <asm-generic/rwonce.h>
 

I'll see if I can flush out any other issues.

Cheers,
Nathan


* Re: [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell"
  2022-01-05  0:40       ` [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" Ingo Molnar
  2022-01-05  1:07         ` Ingo Molnar
@ 2022-01-05 22:33         ` Nathan Chancellor
  1 sibling, 0 replies; 54+ messages in thread
From: Nathan Chancellor @ 2022-01-05 22:33 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, linux-kernel, linux-arch, Andrew Morton,
	Peter Zijlstra, Thomas Gleixner, Greg Kroah-Hartman,
	David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet,
	Al Viro, llvm

On Wed, Jan 05, 2022 at 01:40:32AM +0100, Ingo Molnar wrote:
> 
> * Nathan Chancellor <nathan@kernel.org> wrote:
> 
> > Unfortunately, while the kernel now builds, it does not boot in QEMU. I 
> > tried to checkout at 9006a48618cc0cacd3f59ff053e6509a9af5cc18 to see if I 
> > could reproduce that breakage there but the build errors out at that 
> > change (I do see notes of bisection breakage in some of the commits) so I 
> > assume that is expected.
> 
> Yeah, there's a breakage window on ARM64, I'll track down that 
> bisectability bug.
> 
> Decoupling thread_info and task_struct incrementally, so that it bisects 
> cleanly on all architectures, was always a big challenge. :-/
> 
> > There is no output, even with earlycon, so it seems like something is 
> > going wrong in early boot code. I am not very familiar with the SCS code 
> > so I will see if I can debug this with gdb later (I'll try to see if it 
> > is reproducible with GCC as well; as Nick mentions, there is support 
> > being added to it and I don't mind building from source).
> 
> Just to make sure: with SCS disabled the same kernel boots fine?

Correct (thank you for making sure, I have definitely not tested that
before...).

$ make -skj"$(nproc)" ARCH=arm64 LLVM=1 O=.build/arm64 defconfig Image.gz

$ boot-qemu.sh -a arm64 -k .build/arm64 -t 30s
...
[    0.000000] Linux version 5.16.0-rc8-798083-g1755441e323b (nathan@archlinux-ax161) (ClangBuiltLinux clang version 14.0.0 (https://github.com/llvm/llvm-project 4602f4169a21e75b82261ba1599046b157d1d021), LLD 14.0.0) #1 SMP PREEMPT Wed Jan 5 21:51:29 UTC 2022
...

$ make -skj"$(nproc)" ARCH=arm64 LLVM=1 O=.build/arm64.scs defconfig

$ scripts/config --file .build/arm64.scs/.config -e SHADOW_CALL_STACK

$ make -skj"$(nproc)" ARCH=arm64 LLVM=1 O=.build/arm64.scs olddefconfig Image.gz
...
qemu-system-aarch64: terminating on signal 15 from pid 690472 (timeout)
+ RET=124
+ set +x

Going back to v5.16-rc8, everything works fine.

$ boot-qemu.sh -a arm64 -k .build/arm64 -t 30s
...
[    0.000000] Linux version 5.16.0-rc8-795784-gc9e6606c7fe9 (nathan@archlinux-ax161) (ClangBuiltLinux clang version 14.0.0 (https://github.com/llvm/llvm-project 4602f4169a21e75b82261ba1599046b157d1d021), LLD 14.0.0) #1 SMP PREEMPT Wed Jan 5 22:27:39 UTC 2022
...

I don't think I will have time to look at this today but I will try
tomorrow. Having the bisectability bug fixed would help narrow things
down. I am almost certain it is something to do with the new per_task
infrastructure, but I'll have to dig around and see if I can understand
that first.

Cheers,
Nathan

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH] headers/uninline: Uninline single-use function: kobject_has_children()
  2022-01-05 15:23               ` Greg Kroah-Hartman
@ 2022-01-06 11:26                 ` Ingo Molnar
  0 siblings, 0 replies; 54+ messages in thread
From: Ingo Molnar @ 2022-01-06 11:26 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Kirill A. Shutemov, Linus Torvalds, linux-kernel, linux-arch,
	Andrew Morton, Peter Zijlstra, Thomas Gleixner, David S. Miller,
	Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet, Al Viro


* Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote:

> > Note that these latter two patches just simplified the task of my 
> > (simplistic) tooling, which is basically a shell script that inserts 
> > header dependencies to the head of .c and .h files, right in front of 
> > the first #include line it encounters.
> > 
> > These two patches do have some marginal clean-up value too, so I'm not 
> > opposed to merging them - just wanted to declare their true role. :-)
> 
> They all are sane cleanups, so I've taken them in my tree now.  Make your 
> patchset a bit smaller against 5.17-rc1 when that comes around :)

Thank you! :-)

	Ingo

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell"
  2022-01-04 10:47   ` Ingo Molnar
                       ` (4 preceding siblings ...)
  2022-01-04 17:50     ` Nathan Chancellor
@ 2022-01-07  0:29     ` Nathan Chancellor
  2022-01-08 11:54       ` Ingo Molnar
  5 siblings, 1 reply; 54+ messages in thread
From: Nathan Chancellor @ 2022-01-07  0:29 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, linux-kernel, linux-arch, Andrew Morton,
	Peter Zijlstra, Thomas Gleixner, Greg Kroah-Hartman,
	David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet,
	Al Viro, llvm

On Tue, Jan 04, 2022 at 11:47:30AM +0100, Ingo Molnar wrote:
> > > With the fast-headers kernel that's down to ~36,000 lines of code, 
> > > almost a factor of 3 reduction:
> > > 
> > >   # fast-headers-v1:
> > >   kepler:~/mingo.tip.git> wc -l kernel/pid.i
> > >   35941 kernel/pid.i
> > 
> > Coming from someone who often has to reduce a preprocessed kernel source 
> > file with creduce/cvise to report compiler bugs, this will be a very 
> > welcomed change, as those tools will have to do less work, and I can get 
> > my reports done faster.
> 
> That's nice, didn't think of that side effect.
> 
> Could you perhaps measure this too, to see how much of a benefit it is?

As it turns out, I got an opportunity to measure this sooner rather than
later [1]. Using cvise [2] with an identical set of toolchains and
interestingness test [3], reducing net/core/skbuff.c took significantly
less time with the version from the fast-headers tree.

v5.16-rc8:

$ wc -l skbuff.i
105135 skbuff.i

$ time cvise test.fish skbuff.i
...
________________________________________________________
Executed in  114.02 mins    fish           external
   usr time  1180.43 mins   69.29 millis  1180.43 mins
   sys time  229.80 mins  248.11 millis  229.79 mins

fast-headers:

$ wc -l skbuff.i
78765 skbuff.i

$ time cvise test.fish skbuff.i
...
________________________________________________________
Executed in   47.38 mins    fish           external
   usr time  620.17 mins   32.78 millis  620.17 mins
   sys time  123.70 mins  122.38 millis  123.70 mins

I was not expecting that much of a difference but it somewhat makes
sense, as the tool spends less time eliminating unused code and the
compiler invocations will be incrementally quicker as the input becomes
smaller.

[1]: https://github.com/ClangBuiltLinux/linux/issues/1563
[2]: https://github.com/marxin/cvise
[3]: https://github.com/nathanchance/creduce-files/tree/61056fd763ae3bfb53ff0ae4c1d95550c7c0a5b7/cbl-1563

Cheers,
Nathan

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH] headers/deps: Add header dependencies to .c files: <linux/ptrace_api.h>
  2022-01-05 21:42           ` Nathan Chancellor
@ 2022-01-08 10:32             ` Ingo Molnar
  2022-01-08 11:08             ` [PATCH] FIX: headers/deps: uapi/headers: Create usr/include/uapi symbolic link Ingo Molnar
                               ` (3 subsequent siblings)
  4 siblings, 0 replies; 54+ messages in thread
From: Ingo Molnar @ 2022-01-08 10:32 UTC (permalink / raw)
  To: Nathan Chancellor
  Cc: Linus Torvalds, linux-kernel, linux-arch, Andrew Morton,
	Peter Zijlstra, Thomas Gleixner, Greg Kroah-Hartman,
	David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet,
	Al Viro, llvm


* Nathan Chancellor <nathan@kernel.org> wrote:

> 1. kernel/stackleak.c build failure:
> 
> $ make -skj"$(nproc)" ARCH=x86_64 allmodconfig kernel/stackleak.o
> kernel/stackleak.c: In function ‘stackleak_erase’:
> kernel/stackleak.c:92:13: error: implicit declaration of function ‘on_thread_stack’; did you mean ‘setup_thread_stack’? [-Werror=implicit-function-declaration]

So it turns out that my build environment didn't have the stackleak code 
enabled at all:

  kepler:~/mingo.tip.git> make ARCH=x86_64 allmodconfig
  #
  # configuration written to .config
  #
  kepler:~/mingo.tip.git> grep -E 'STACKLEAK|GCC_PLUGIN' .config
  CONFIG_HAVE_ARCH_STACKLEAK=y
  CONFIG_HAVE_GCC_PLUGINS=y

... because it failed this condition:

 menuconfig GCC_PLUGINS
 ...
        depends on $(success,test -e $(shell,$(CC) -print-file-name=plugin)/include/plugin-version.h)

... because there were no plugin headers:

  kepler:~/mingo.tip.git> gcc -print-file-name=plugin
  /usr/lib/gcc/x86_64-linux-gnu/10/plugin

  kepler:~/mingo.tip.git> ls $(gcc -print-file-name=plugin)/include/
  ls: cannot access '/usr/lib/gcc/x86_64-linux-gnu/10/plugin/include/': No such file or directory

... because I needed to install the plugin-development packages for gcc-10.

After installing those I have stackleak:

  kepler:~/mingo.tip.git> grep STACKLEAK .config
  CONFIG_HAVE_ARCH_STACKLEAK=y
  CONFIG_GCC_PLUGIN_STACKLEAK=y
  CONFIG_STACKLEAK_TRACK_MIN_SIZE=100
  CONFIG_STACKLEAK_METRICS=y
  CONFIG_STACKLEAK_RUNTIME_DISABLE=y

and was able to reproduce your build failure. :-)

> This is fixed with the following diff although I am unsure if that is as
> minimal as it should be.
> 
> diff --git a/kernel/stackleak.c b/kernel/stackleak.c
> index ce161a8e8d97..d67c5475183b 100644
> --- a/kernel/stackleak.c
> +++ b/kernel/stackleak.c
> @@ -10,8 +10,10 @@
>   * reveal and blocks some uninitialized stack variable attacks.
>   */
>  
> +#include <asm/processor_api.h>
>  #include <linux/stackleak.h>
>  #include <linux/kprobes.h>
> +#include <linux/align.h>

Yeah - I used a simpler & more generic header: <linux/ptrace_api.h> - see 
the patch below.

But your solution is functionally equivalent. This fix will be included in 
-v2, hopefully released later today.

Thanks,

	Ingo

===============>
From: Ingo Molnar <mingo@kernel.org>
Date: Sat, 8 Jan 2022 11:29:17 +0100
Subject: [PATCH] headers/deps: Add header dependencies to .c files: <linux/ptrace_api.h>

Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/stackleak.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/stackleak.c b/kernel/stackleak.c
index ce161a8e8d97..fde49e2f209a 100644
--- a/kernel/stackleak.c
+++ b/kernel/stackleak.c
@@ -10,6 +10,7 @@
  * reveal and blocks some uninitialized stack variable attacks.
  */
 
+#include <linux/ptrace_api.h>
 #include <linux/stackleak.h>
 #include <linux/kprobes.h>
 

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH] FIX: headers/deps: uapi/headers: Create usr/include/uapi symbolic link
  2022-01-05 21:42           ` Nathan Chancellor
  2022-01-08 10:32             ` [PATCH] headers/deps: Add header dependencies to .c files: <linux/ptrace_api.h> Ingo Molnar
@ 2022-01-08 11:08             ` Ingo Molnar
  2022-01-08 11:18             ` [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" Ingo Molnar
                               ` (2 subsequent siblings)
  4 siblings, 0 replies; 54+ messages in thread
From: Ingo Molnar @ 2022-01-08 11:08 UTC (permalink / raw)
  To: Nathan Chancellor
  Cc: Linus Torvalds, linux-kernel, linux-arch, Andrew Morton,
	Peter Zijlstra, Thomas Gleixner, Greg Kroah-Hartman,
	David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet,
	Al Viro, llvm


* Nathan Chancellor <nathan@kernel.org> wrote:

> 2. Build failures with CONFIG_UAPI_HEADER_TEST=y and O=...
> 
> This was originally reproduced with allmodconfig but this is a simpler
> reproducer I think.
> 
> $ make -skj"$(nproc)" ARCH=x86_64 O=.build/x86_64 defconfig
> 
> $ scripts/config --file .build/x86_64/.config -e HEADERS_INSTALL -e UAPI_HEADER_TEST
> 
> $ make -skj"$(nproc)" ARCH=x86_64 O=.build/x86_64 olddefconfig usr/

The simplified & scripted reproducer is very useful, thanks a ton!

> In file included from <command-line>:
> ./usr/include/linux/rds.h:38:10: fatal error: uapi/linux/sockios.h: No such file or directory
>    38 | #include <uapi/linux/sockios.h>
>       |          ^~~~~~~~~~~~~~~~~~~~~~
> compilation terminated.
> make[4]: *** [/home/nathan/cbl/src/linux-fast-headers/usr/include/Makefile:106: usr/include/linux/rds.hdrtest] Error 1
> In file included from ./usr/include/linux/qrtr.h:5,
>                  from <command-line>:
> ./usr/include/linux/socket.h:5:10: fatal error: uapi/linux/socket_types.h: No such file or directory
>     5 | #include <uapi/linux/socket_types.h>
>       |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~
> compilation terminated.
> In file included from ./usr/include/linux/in.h:24,
>                  from ./usr/include/linux/nfs_mount.h:12,
>                  from <command-line>:
> ./usr/include/linux/socket.h:5:10: fatal error: uapi/linux/socket_types.h: No such file or directory
>     5 | #include <uapi/linux/socket_types.h>
>       |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~
> compilation terminated.
> make[4]: *** [/home/nathan/cbl/src/linux-fast-headers/usr/include/Makefile:106: usr/include/linux/qrtr.hdrtest] Error 1
> make[4]: *** [/home/nathan/cbl/src/linux-fast-headers/usr/include/Makefile:106: usr/include/linux/nfs_mount.hdrtest] Error 1
> ...
> 
> I don't see this when just building in the tree. I am guessing that
> commit f989e243f1f4 ("headers/deps: uapi/headers: Create
> usr/include/uapi symbolic link") needs to account for this?

Yeah. Here's my second attempt that creates the symlink as part of the 
header-install make process, as it should - also pushed out into 
sched/headers.

(My Makefile-fu isn't overly powerful though, so this is just an attempt.)

This fix will be backmerged into f989e243f1f4 in -v2.

Thanks,

	Ingo

=========================>
From: Ingo Molnar <mingo@kernel.org>
Date: Sat, 8 Jan 2022 12:05:57 +0100
Subject: [PATCH] FIX: f989e243f1f4 headers/deps: uapi/headers: Create usr/include/uapi symbolic link

---
 scripts/Makefile.headersinst | 3 +++
 usr/include/uapi             | 1 -
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/scripts/Makefile.headersinst b/scripts/Makefile.headersinst
index 029d85bb0b23..8ac831458143 100644
--- a/scripts/Makefile.headersinst
+++ b/scripts/Makefile.headersinst
@@ -78,6 +78,9 @@ existing-headers := $(filter $(old-headers), $(all-headers))
 
 -include $(foreach f,$(existing-headers),$(dir $(f)).$(notdir $(f)).cmd)
 
+# link the <uapi/*> namespace:
+LINK := $(shell ln -sf ../include $(objtree)/$(dst)/uapi)
+
 PHONY += FORCE
 FORCE:
 
diff --git a/usr/include/uapi b/usr/include/uapi
deleted file mode 120000
index f5030fe88998..000000000000
--- a/usr/include/uapi
+++ /dev/null
@@ -1 +0,0 @@
-../include
\ No newline at end of file

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell"
  2022-01-05 21:42           ` Nathan Chancellor
  2022-01-08 10:32             ` [PATCH] headers/deps: Add header dependencies to .c files: <linux/ptrace_api.h> Ingo Molnar
  2022-01-08 11:08             ` [PATCH] FIX: headers/deps: uapi/headers: Create usr/include/uapi symbolic link Ingo Molnar
@ 2022-01-08 11:18             ` Ingo Molnar
  2022-01-08 11:38             ` [PATCH] x86/bitops: Remove unused __sw_hweight64() assembly implementation Ingo Molnar
  2022-01-08 11:49             ` [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" Ingo Molnar
  4 siblings, 0 replies; 54+ messages in thread
From: Ingo Molnar @ 2022-01-08 11:18 UTC (permalink / raw)
  To: Nathan Chancellor
  Cc: Linus Torvalds, linux-kernel, linux-arch, Andrew Morton,
	Peter Zijlstra, Thomas Gleixner, Greg Kroah-Hartman,
	David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet,
	Al Viro, llvm


* Nathan Chancellor <nathan@kernel.org> wrote:

> 3. Build failure with CONFIG_SAMPLE_CONNECTOR=m and O=...
> 
> I am guessing this has a similar root cause as above, since that commit
> mentions an error similar to this.
> 
> $ make -skj"$(nproc)" ARCH=x86_64 O=.build/x86_64 allmodconfig samples/connector/
> In file included from /home/nathan/cbl/src/linux-fast-headers/samples/connector/ucon.c:14:
> usr/include/linux/netlink.h:5:10: fatal error: uapi/linux/types.h: No such file or directory
>     5 | #include <uapi/linux/types.h>
>       |          ^~~~~~~~~~~~~~~~~~~~
> compilation terminated.

Correct - this test now passes with the UAPI symlink fix applied:

  kepler:~/mingo.tip.git> make -skj"$(nproc)" ARCH=x86_64 O=.build/x86_64 allmodconfig samples/connector/
  kepler:~/mingo.tip.git> 

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH] x86/bitops: Remove unused __sw_hweight64() assembly implementation
  2022-01-05 21:42           ` Nathan Chancellor
                               ` (2 preceding siblings ...)
  2022-01-08 11:18             ` [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" Ingo Molnar
@ 2022-01-08 11:38             ` Ingo Molnar
  2022-01-08 11:49             ` [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" Ingo Molnar
  4 siblings, 0 replies; 54+ messages in thread
From: Ingo Molnar @ 2022-01-08 11:38 UTC (permalink / raw)
  To: Nathan Chancellor, Borislav Petkov, Thomas Gleixner
  Cc: Linus Torvalds, linux-kernel, linux-arch, Andrew Morton,
	Peter Zijlstra, Thomas Gleixner, Greg Kroah-Hartman,
	David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet,
	Al Viro, llvm


* Nathan Chancellor <nathan@kernel.org> wrote:

> 4. modpost warning around __sw_hweight64
> 
> With the first issue resolved:
> 
> $ make -skj"$(nproc)" ARCH=i386 allmodconfig
> WARNING: modpost: EXPORT symbol "__sw_hweight64" [vmlinux] version ...
> Is "__sw_hweight64" prototyped in <asm/asm-prototypes.h>?

So I was hoping that this commit made explicit all the random indirect 
header dependencies x86's <asm/asm-prototypes.h> imports on mainline:

    headers/prep: x86/kbuild: Add symbol prototype header dependencies for modversions

... but an i386 case slipped through.

But, this actually highlights a real x86 symbol export bug IMO.

__arch_hweight64() on x86-32 is defined in the 
arch/x86/include/asm/arch_hweight.h header as an inline, using 
__arch_hweight32():


  #ifdef CONFIG_X86_32
  static inline unsigned long __arch_hweight64(__u64 w)
  {
          return  __arch_hweight32((u32)w) +
                  __arch_hweight32((u32)(w >> 32));
  }

*But* there's also a __sw_hweight64() assembly implementation:

  arch/x86/lib/hweight.S

  SYM_FUNC_START(__sw_hweight64)
  #ifdef CONFIG_X86_64
  ...
  #else /* CONFIG_X86_32 */
        /* We're getting an u64 arg in (%eax,%edx): unsigned long hweight64(__u64 w) */
        pushl   %ecx

        call    __sw_hweight32
        movl    %eax, %ecx                      # stash away result
        movl    %edx, %eax                      # second part of input
        call    __sw_hweight32
        addl    %ecx, %eax                      # result

        popl    %ecx
        ret
  #endif

But this __sw_hweight64 assembly implementation is unused - and it's 
essentially doing the same thing that the inline wrapper does. Then we 
export this unused helper with no prototype.
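
To make the equivalence concrete, here is a minimal user-space sketch,
assuming a plain bit-counting loop as a stand-in for __sw_hweight32().
All demo_* names are hypothetical and are not kernel APIs; the point is
only that summing the weights of the two 32-bit halves gives the same
result as counting all 64 bits directly:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for __sw_hweight32(): portable bit-counting loop. */
unsigned int demo_hweight32(uint32_t w)
{
	unsigned int count = 0;

	while (w) {
		w &= w - 1;	/* clear the lowest set bit */
		count++;
	}
	return count;
}

/* Same shape as the x86-32 inline wrapper: weigh each 32-bit half, then add. */
unsigned int demo_hweight64_split(uint64_t w)
{
	return demo_hweight32((uint32_t)w) +
	       demo_hweight32((uint32_t)(w >> 32));
}

/* Reference version that counts all 64 bits in one loop. */
unsigned int demo_hweight64_direct(uint64_t w)
{
	unsigned int count = 0;

	while (w) {
		w &= w - 1;	/* clear the lowest set bit */
		count++;
	}
	return count;
}

/* Aborts if the split and direct versions ever disagree for 'w'. */
void demo_check(uint64_t w)
{
	assert(demo_hweight64_split(w) == demo_hweight64_direct(w));
}
```

Calling demo_check() on a few values (all zeroes, all ones, only the top
and bottom bits set) is enough to see that the split form needs no
separate 64-bit helper on a 32-bit build.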

This went unnoticed in mainline, because mainline defines the prototype for 
the unused function.

So I think the real solution to resolve this is by removing the unused 
32-bit variant - see the patch below.

Thanks,

	Ingo

======================>
From: Ingo Molnar <mingo@kernel.org>
Date: Sat, 8 Jan 2022 12:33:58 +0100
Subject: [PATCH] x86/bitops: Remove unused __sw_hweight64() assembly implementation

Header cleanups in the fast-headers tree highlighted that we have an
unused assembly implementation for __sw_hweight64():

    WARNING: modpost: EXPORT symbol "__sw_hweight64" [vmlinux] version ...

__arch_hweight64() on x86-32 is defined in the
arch/x86/include/asm/arch_hweight.h header as an inline, using
__arch_hweight32():

  #ifdef CONFIG_X86_32
  static inline unsigned long __arch_hweight64(__u64 w)
  {
          return  __arch_hweight32((u32)w) +
                  __arch_hweight32((u32)(w >> 32));
  }

*But* there's also a __sw_hweight64() assembly implementation:

  arch/x86/lib/hweight.S

  SYM_FUNC_START(__sw_hweight64)
  #ifdef CONFIG_X86_64
  ...
  #else /* CONFIG_X86_32 */
        /* We're getting an u64 arg in (%eax,%edx): unsigned long hweight64(__u64 w) */
        pushl   %ecx

        call    __sw_hweight32
        movl    %eax, %ecx                      # stash away result
        movl    %edx, %eax                      # second part of input
        call    __sw_hweight32
        addl    %ecx, %eax                      # result

        popl    %ecx
        ret
  #endif

But this __sw_hweight64 assembly implementation is unused - and it's
essentially doing the same thing that the inline wrapper does.

Remove the assembly version and add a comment about it.

Reported-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/lib/hweight.S | 20 ++++++--------------
 1 file changed, 6 insertions(+), 14 deletions(-)

diff --git a/arch/x86/lib/hweight.S b/arch/x86/lib/hweight.S
index dbf8cc97b7f5..585e2f1372d0 100644
--- a/arch/x86/lib/hweight.S
+++ b/arch/x86/lib/hweight.S
@@ -36,8 +36,12 @@ SYM_FUNC_START(__sw_hweight32)
 SYM_FUNC_END(__sw_hweight32)
 EXPORT_SYMBOL(__sw_hweight32)
 
-SYM_FUNC_START(__sw_hweight64)
+/*
+ * No 32-bit variant, because it's implemented as an inline wrapper
+ * on top of __arch_hweight32():
+ */
 #ifdef CONFIG_X86_64
+SYM_FUNC_START(__sw_hweight64)
 	pushq   %rdi
 	pushq   %rdx
 
@@ -66,18 +70,6 @@ SYM_FUNC_START(__sw_hweight64)
 	popq    %rdx
 	popq    %rdi
 	ret
-#else /* CONFIG_X86_32 */
-	/* We're getting an u64 arg in (%eax,%edx): unsigned long hweight64(__u64 w) */
-	pushl   %ecx
-
-	call    __sw_hweight32
-	movl    %eax, %ecx                      # stash away result
-	movl    %edx, %eax                      # second part of input
-	call    __sw_hweight32
-	addl    %ecx, %eax                      # result
-
-	popl    %ecx
-	ret
-#endif
 SYM_FUNC_END(__sw_hweight64)
 EXPORT_SYMBOL(__sw_hweight64)
+#endif

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell"
  2022-01-05 21:42           ` Nathan Chancellor
                               ` (3 preceding siblings ...)
  2022-01-08 11:38             ` [PATCH] x86/bitops: Remove unused __sw_hweight64() assembly implementation Ingo Molnar
@ 2022-01-08 11:49             ` Ingo Molnar
  2022-01-08 12:17               ` Ingo Molnar
  2022-01-10 20:03               ` Nathan Chancellor
  4 siblings, 2 replies; 54+ messages in thread
From: Ingo Molnar @ 2022-01-08 11:49 UTC (permalink / raw)
  To: Nathan Chancellor
  Cc: Linus Torvalds, linux-kernel, linux-arch, Andrew Morton,
	Peter Zijlstra, Thomas Gleixner, Greg Kroah-Hartman,
	David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet,
	Al Viro, llvm


* Nathan Chancellor <nathan@kernel.org> wrote:

> 5. Build error in arch/arm64/kvm/hyp/nvhe with LTO
> 
> With arm64 + CONFIG_LTO_CLANG_THIN=y, I see:
> 
> $ make -skj"$(nproc)" ARCH=arm64 LLVM=1 defconfig
> 
> $ scripts/config -e LTO_CLANG_THIN
> 
> $ make -skj"$(nproc)" ARCH=arm64 LLVM=1 olddefconfig arch/arm64/kvm/hyp/nvhe/
> ld.lld: error: arch/arm64/kvm/hyp/nvhe/hyp.lds:2: unknown directive: .macro
> >>> .macro __put, val, name
> >>> ^
> make[5]: *** [arch/arm64/kvm/hyp/nvhe/Makefile:51: arch/arm64/kvm/hyp/nvhe/kvm_nvhe.tmp.o] Error 1
> 
> I was not able to figure out the exact include chain but CONFIG_LTO
> causes asm/alternative-macros.h to be included in asm/rwonce.h, which
> eventually gets included in either asm/cache.h or asm/memory.h.
> 
> I managed to solve this with the following diff but I am not sure if
> there is a better or cleaner way to do that.
> 
> diff --git a/arch/arm64/include/asm/rwonce.h b/arch/arm64/include/asm/rwonce.h
> index 1bce62fa908a..e19572a205d0 100644
> --- a/arch/arm64/include/asm/rwonce.h
> +++ b/arch/arm64/include/asm/rwonce.h
> @@ -5,7 +5,7 @@
>  #ifndef __ASM_RWONCE_H
>  #define __ASM_RWONCE_H
>  
> -#ifdef CONFIG_LTO
> +#if defined(CONFIG_LTO) && !defined(LINKER_SCRIPT)
>  
>  #include <linux/compiler_types.h>
>  #include <asm/alternative-macros.h>
> @@ -66,7 +66,7 @@
>  })
>  
>  #endif	/* !BUILD_VDSO */
> -#endif	/* CONFIG_LTO */
> +#endif	/* CONFIG_LTO && !LINKER_SCRIPT */

So the error message suggests that the linker script somehow ends up 
including asm-generic/export.h:

  kepler:~/mingo.tip.git> git grep 'macro __put'
  include/asm-generic/export.h:.macro __put, val, name

?

But I'd guess that similar to the __ASSEMBLY__ patterns we have in headers, 
not including the rwonce.h bits if LINKER_SCRIPT is defined is probably 
close to the right solution - but it would also be good to know how such a 
low level header ended up in a linker script. Might have been to pick up 
some offset or size definition somewhere?

I.e. how did the build end up including asm/rwonce.h?
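
For reference, the guard pattern discussed above can be sketched in a
single stand-alone file. This is only an illustration under stated
assumptions: CONFIG_LTO is defined by hand here as a stand-in for the
Kconfig symbol, LINKER_SCRIPT is assumed to be defined by the build only
when a linker script is being preprocessed, and all demo_*/DEMO_* names
are hypothetical:

```c
#include <assert.h>

#define CONFIG_LTO 1		/* stand-in for the Kconfig symbol */
/* #define LINKER_SCRIPT 1 */	/* uncomment to mimic linker-script preprocessing */

#if defined(CONFIG_LTO) && !defined(LINKER_SCRIPT)
/* C-only section: C declarations and asm macros are safe to emit here. */
#define DEMO_C_ONLY_BITS 1

/* Hypothetical accessor standing in for the header's C-only contents. */
int demo_read_once(const volatile int *p)
{
	assert(p);
	return *p;
}
#else
/* Linker-script (or non-LTO) view: nothing C-specific is emitted. */
#define DEMO_C_ONLY_BITS 0
#endif

/* Reports whether the guarded C-only section was compiled in. */
int demo_c_only_bits_visible(void)
{
	return DEMO_C_ONLY_BITS;
}
```

With LINKER_SCRIPT left undefined the C-only section is visible; defining
it makes the whole section disappear from the preprocessed output, which
is the effect the two-line rwonce.h change relies on.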

You can generally debug such weird dependency chains by putting a
debug #warning into the affected header - such as the patch below.

This prints a stack of the header dependencies:

    CC      kernel/sched/core.o
  In file included from ./include/linux/compiler.h:263,
                 from ./include/linux/static_call_types.h:7,
                 from ./include/linux/kernel.h:6,
                 from ./include/linux/highmem.h:5,
                 from kernel/sched/core.c:9:
  ./arch/arm64/include/asm/rwonce.h:8:2: warning: #warning debug [-Wcpp]
      8 | #warning debug

... and should in principle also work in the linker script context.

Thanks,

	Ingo

===============>
 arch/arm64/include/asm/rwonce.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm64/include/asm/rwonce.h b/arch/arm64/include/asm/rwonce.h
index 1bce62fa908a..5b3305381481 100644
--- a/arch/arm64/include/asm/rwonce.h
+++ b/arch/arm64/include/asm/rwonce.h
@@ -5,6 +5,8 @@
 #ifndef __ASM_RWONCE_H
 #define __ASM_RWONCE_H
 
+#warning debug
+
 #ifdef CONFIG_LTO
 
 #include <linux/compiler_types.h>

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell"
  2022-01-07  0:29     ` Nathan Chancellor
@ 2022-01-08 11:54       ` Ingo Molnar
  0 siblings, 0 replies; 54+ messages in thread
From: Ingo Molnar @ 2022-01-08 11:54 UTC (permalink / raw)
  To: Nathan Chancellor
  Cc: Linus Torvalds, linux-kernel, linux-arch, Andrew Morton,
	Peter Zijlstra, Thomas Gleixner, Greg Kroah-Hartman,
	David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet,
	Al Viro, llvm


* Nathan Chancellor <nathan@kernel.org> wrote:

> On Tue, Jan 04, 2022 at 11:47:30AM +0100, Ingo Molnar wrote:
> > > > With the fast-headers kernel that's down to ~36,000 lines of code, 
> > > > almost a factor of 3 reduction:
> > > > 
> > > >   # fast-headers-v1:
> > > >   kepler:~/mingo.tip.git> wc -l kernel/pid.i
> > > >   35941 kernel/pid.i
> > > 
> > > Coming from someone who often has to reduce a preprocessed kernel source 
> > > file with creduce/cvise to report compiler bugs, this will be a very 
> > > welcomed change, as those tools will have to do less work, and I can get 
> > > my reports done faster.
> > 
> > That's nice, didn't think of that side effect.
> > 
> > Could you perhaps measure this too, to see how much of a benefit it is?
> 
> As it turns out, I got an opportunity to measure this sooner rather than
> later [1]. Using cvise [2] with an identical set of toolchains and
> interestingness test [3], reducing net/core/skbuff.c took significantly
> less time with the version from the fast-headers tree.
> 
> v5.16-rc8:
> 
> $ wc -l skbuff.i
> 105135 skbuff.i
> 
> $ time cvise test.fish skbuff.i
> ...
> ________________________________________________________
> Executed in  114.02 mins    fish           external
>    usr time  1180.43 mins   69.29 millis  1180.43 mins
>    sys time  229.80 mins  248.11 millis  229.79 mins
> 
> fast-headers:
> 
> $ wc -l skbuff.i
> 78765 skbuff.i
> 
> $ time cvise test.fish skbuff.i
> ...
> ________________________________________________________
> Executed in   47.38 mins    fish           external
>    usr time  620.17 mins   32.78 millis  620.17 mins
>    sys time  123.70 mins  122.38 millis  123.70 mins
> 
> I was not expecting that much of a difference but it somewhat makes 
> sense, as the tool spends less time eliminating unused code and the 
> compiler invocations will be incrementally quicker as the input becomes 
> smaller.

Indeed, that's a +140% speedup in build performance, not bad. :-)

I also got around to testing Clang (12) myself, and with my 'reference 
distro config' I got these results:

 #
 # v5.16-rc8
 #
 Performance counter stats for 'make -j96 vmlinux LLVM=1' (3 runs):

 55,638,543,274,254      instructions              #    0.77  insn per cycle           ( +-  0.01% )
 72,074,911,968,393      cycles                    #    3.901 GHz                      ( +-  0.04% )
      18,490,451.51 msec cpu-clock                 #   54.740 CPUs utilized            ( +-  0.04% )

                 337.788 +- 0.834 seconds time elapsed  ( +-  0.25% )

 #
 # -fast-headers-v2-rc3
 #
 Performance counter stats for 'make -j96 vmlinux LLVM=1' (3 runs):

 30,904,130,243,855      instructions              #    0.76  insn per cycle           ( +-  0.02% )
 40,703,482,733,690      cycles                    #    3.898 GHz                      ( +-  0.00% )
      10,443,670.86 msec cpu-clock                 #   58.093 CPUs utilized            ( +-  0.00% )

                 179.773 +- 0.829 seconds time elapsed  ( +-  0.46% )

That's a +88% build speedup on Clang - even better than the +78% speedup on 
GCC(-10).
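
The percentages quoted in this mail follow from the elapsed times as
(before / after - 1) * 100. A small sketch of that arithmetic, where
demo_speedup_percent() is a hypothetical helper and not part of any
tool used above:

```c
#include <assert.h>

/* Speedup expressed as a percentage gain: (before / after - 1) * 100. */
double demo_speedup_percent(double before, double after)
{
	assert(before > 0.0 && after > 0.0);
	return (before / after - 1.0) * 100.0;
}
```

Feeding in the Clang build times (337.788 s and 179.773 s) gives roughly
88, and the cvise run times (114.02 min and 47.38 min) give roughly 140,
matching the figures above.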

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell"
  2022-01-08 11:49             ` [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" Ingo Molnar
@ 2022-01-08 12:17               ` Ingo Molnar
  2022-01-10 20:03               ` Nathan Chancellor
  1 sibling, 0 replies; 54+ messages in thread
From: Ingo Molnar @ 2022-01-08 12:17 UTC (permalink / raw)
  To: Nathan Chancellor
  Cc: Linus Torvalds, linux-kernel, linux-arch, Andrew Morton,
	Peter Zijlstra, Thomas Gleixner, Greg Kroah-Hartman,
	David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet,
	Al Viro, llvm


* Ingo Molnar <mingo@kernel.org> wrote:

> * Nathan Chancellor <nathan@kernel.org> wrote:
> 
> > 5. Build error in arch/arm64/kvm/hyp/nvhe with LTO
> > 
> > With arm64 + CONFIG_LTO_CLANG_THIN=y, I see:
> > 
> > $ make -skj"$(nproc)" ARCH=arm64 LLVM=1 defconfig
> > 
> > $ scripts/config -e LTO_CLANG_THIN
> > 
> > $ make -skj"$(nproc)" ARCH=arm64 LLVM=1 olddefconfig arch/arm64/kvm/hyp/nvhe/
> > ld.lld: error: arch/arm64/kvm/hyp/nvhe/hyp.lds:2: unknown directive: .macro
> > >>> .macro __put, val, name
> > >>> ^
> > make[5]: *** [arch/arm64/kvm/hyp/nvhe/Makefile:51: arch/arm64/kvm/hyp/nvhe/kvm_nvhe.tmp.o] Error 1
> > 
> > I was not able to figure out the exact include chain but CONFIG_LTO
> > causes asm/alternative-macros.h to be included in asm/rwonce.h, which
> > eventually gets included in either asm/cache.h or asm/memory.h.
> > 
> > I managed to solve this with the following diff but I am not sure if
> > there is a better or cleaner way to do that.
> > 
> > diff --git a/arch/arm64/include/asm/rwonce.h b/arch/arm64/include/asm/rwonce.h
> > index 1bce62fa908a..e19572a205d0 100644
> > --- a/arch/arm64/include/asm/rwonce.h
> > +++ b/arch/arm64/include/asm/rwonce.h
> > @@ -5,7 +5,7 @@
> >  #ifndef __ASM_RWONCE_H
> >  #define __ASM_RWONCE_H
> >  
> > -#ifdef CONFIG_LTO
> > +#if defined(CONFIG_LTO) && !defined(LINKER_SCRIPT)
> >  
> >  #include <linux/compiler_types.h>
> >  #include <asm/alternative-macros.h>
> > @@ -66,7 +66,7 @@
> >  })
> >  
> >  #endif	/* !BUILD_VDSO */
> > -#endif	/* CONFIG_LTO */
> > +#endif	/* CONFIG_LTO && !LINKER_SCRIPT */

In any case I've added your fix to the fast-headers tree, with a comment 
that this might just be a workaround.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell"
  2022-01-04 17:50     ` Nathan Chancellor
  2022-01-05  0:35       ` [PATCH] x86/kbuild: Enable CONFIG_KALLSYMS_ALL=y in the defconfigs Ingo Molnar
  2022-01-05  0:40       ` [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" Ingo Molnar
@ 2022-01-08 15:16       ` Ingo Molnar
  2 siblings, 0 replies; 54+ messages in thread
From: Ingo Molnar @ 2022-01-08 15:16 UTC (permalink / raw)
  To: Nathan Chancellor
  Cc: Linus Torvalds, linux-kernel, linux-arch, Andrew Morton,
	Peter Zijlstra, Thomas Gleixner, Greg Kroah-Hartman,
	David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet,
	Al Viro, llvm


* Nathan Chancellor <nathan@kernel.org> wrote:

> I tried to checkout at 9006a48618cc0cacd3f59ff053e6509a9af5cc18 to see if 
> I could reproduce that breakage there but the build errors out at that 
> change (I do see notes of bisection breakage in some of the commits) so I 
> assume that is expected.

Yeah, so the underlying problem is that these two commits want to be a 
single commit:

  # Commit #117
  headers/deps: Move task->thread_info to per_task()

  # Commit #106
  headers/deps: Move thread_info APIs to <linux/sched/thread_info_api.h>

As we can only switch ARM64's <asm/preempt.h> to use per_task() - which 
requires <linux/sched.h> - if we first fix & simplify <linux/sched.h>'s 
header dependencies, which is done to a sufficient level by:

  # Commit #556
  headers/deps: Optimize <linux/sched.h> dependencies, remove <linux/sched/thread_info_api_lowlevel.h> inclusion


So it's a catch-22, and quite a complication, and a bisection breakage 
distance of ~450 commits, with a lot of ordering assumptions & conflicts 
along the way, should we attempt to move the first two to later stages. :-/

But today I've restructured the tree, and the -v2-to-be tree is now fully 
bisectable on ARM64 too. :-)

There's a single, late per_task() conversion commit, after the first phase 
of <linux/sched.h> simplifications:

   headers/deps: Move task->thread_info to per_task()

I'd guess that either this is the one that breaks SCS for you, or the 
::thread conversion:

   headers/deps: per_task, arm64, x86: Convert task_struct::thread to a per_task() field

I've pushed out these fixes to the sched/headers branch a couple of minutes 
ago, and this will be part of the -v2 release as well.

Thanks,

	Ingo


* Re: [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell"
  2022-01-03 16:29       ` Ingo Molnar
@ 2022-01-10 10:28         ` Peter Zijlstra
  0 siblings, 0 replies; 54+ messages in thread
From: Peter Zijlstra @ 2022-01-10 10:28 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Greg Kroah-Hartman, Linus Torvalds, linux-kernel, linux-arch,
	Andrew Morton, Thomas Gleixner, David S. Miller, Ard Biesheuvel,
	Josh Poimboeuf, Jonathan Corbet, Al Viro

On Mon, Jan 03, 2022 at 05:29:02PM +0100, Ingo Molnar wrote:
> Yeah, so I *did* find this somewhat suboptimal too, and developed an 
> earlier version that used linker section tricks to gain the field offsets 
> more automatically.
> 
> It was an unmitigated disaster: was fragile on x86 already (which has a zoo 
> of linking quirks with no precedent of doing this before bounds.c 
> processing), but on ARM64 and probably on most of the other RISC-ish 
> architectures there was also a real runtime code generation cost of using 
> linker tricks: 2-3 extra instructions per per_task() use - clearly 
> unacceptable.
> 
> Found this out the hard way after making it boot & work on ARM64 and 
> looking at the assembly output, trying to figure out why the generated code 
> size increased. :-/

Right, I suggested you do the per-cpu thing. And then Mark reported that
code-gen issue on arm64.

I'm still thinking the toolchains ought to look at fixing that. It'll be
too late to use for per-task, but at least the current per-cpu usages
will (eventually) get better code-gen.
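
To make the code-gen point a bit more concrete, here is a minimal userspace
sketch. It is not the actual per_task() code: all names (demo_task,
DEMO_OFF_COUNTER, demo_off_counter_sym) are made up for illustration, and
the volatile variable merely stands in for an offset the compiler cannot
see at build time:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a task structure with an opaque per-task blob. */
struct demo_task {
	long		state;
	unsigned char	per_task_area[64];
};

/* Variant 1: the field offset is a build-time constant (assumed value). */
#define DEMO_OFF_COUNTER	8

/*
 * Variant 2: the offset lives in a variable. 'volatile' forces a load on
 * every use, standing in for an offset the compiler cannot fold.
 */
static const volatile size_t demo_off_counter_sym = DEMO_OFF_COUNTER;

static inline void demo_set_const(struct demo_task *t, int val)
{
	/* The constant offset can fold into the addressing of the store. */
	memcpy(t->per_task_area + DEMO_OFF_COUNTER, &val, sizeof(val));
}

static inline int demo_get_symbol(const struct demo_task *t)
{
	size_t off = demo_off_counter_sym;	/* extra load before the access */
	int val;

	assert(off + sizeof(val) <= sizeof(t->per_task_area));
	memcpy(&val, t->per_task_area + off, sizeof(val));
	return val;
}
```

Comparing the optimized disassembly of the two accessors should show the
additional load in the second variant; the real instruction sequences on
arm64 will of course differ from this toy.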




* Re: [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell"
  2022-01-08 11:49             ` [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" Ingo Molnar
  2022-01-08 12:17               ` Ingo Molnar
@ 2022-01-10 20:03               ` Nathan Chancellor
  2022-01-10 20:05                 ` Nathan Chancellor
  1 sibling, 1 reply; 54+ messages in thread
From: Nathan Chancellor @ 2022-01-10 20:03 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, linux-kernel, linux-arch, Andrew Morton,
	Peter Zijlstra, Thomas Gleixner, Greg Kroah-Hartman,
	David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet,
	Al Viro, llvm

On Sat, Jan 08, 2022 at 12:49:04PM +0100, Ingo Molnar wrote:
> 
> * Nathan Chancellor <nathan@kernel.org> wrote:
> 
> > 5. Build error in arch/arm64/kvm/hyp/nvhe with LTO
> > 
> > With arm64 + CONFIG_LTO_CLANG_THIN=y, I see:
> > 
> > $ make -skj"$(nproc)" ARCH=arm64 LLVM=1 defconfig
> > 
> > $ scripts/config -e LTO_CLANG_THIN
> > 
> > $ make -skj"$(nproc)" ARCH=arm64 LLVM=1 olddefconfig arch/arm64/kvm/hyp/nvhe/
> > ld.lld: error: arch/arm64/kvm/hyp/nvhe/hyp.lds:2: unknown directive: .macro
> > >>> .macro __put, val, name
> > >>> ^
> > make[5]: *** [arch/arm64/kvm/hyp/nvhe/Makefile:51: arch/arm64/kvm/hyp/nvhe/kvm_nvhe.tmp.o] Error 1
> > 
> > I was not able to figure out the exact include chain but CONFIG_LTO
> > causes asm/alternative-macros.h to be included in asm/rwonce.h, which
> > eventually gets included in either asm/cache.h or asm/memory.h.
> > 
> > I managed to solve this with the following diff but I am not sure if
> > there is a better or cleaner way to do that.
> > 
> > diff --git a/arch/arm64/include/asm/rwonce.h b/arch/arm64/include/asm/rwonce.h
> > index 1bce62fa908a..e19572a205d0 100644
> > --- a/arch/arm64/include/asm/rwonce.h
> > +++ b/arch/arm64/include/asm/rwonce.h
> > @@ -5,7 +5,7 @@
> >  #ifndef __ASM_RWONCE_H
> >  #define __ASM_RWONCE_H
> >  
> > -#ifdef CONFIG_LTO
> > +#if defined(CONFIG_LTO) && !defined(LINKER_SCRIPT)
> >  
> >  #include <linux/compiler_types.h>
> >  #include <asm/alternative-macros.h>
> > @@ -66,7 +66,7 @@
> >  })
> >  
> >  #endif	/* !BUILD_VDSO */
> > -#endif	/* CONFIG_LTO */
> > +#endif	/* CONFIG_LTO && !LINKER_SCRIPT */
> 
> So the error message suggests that the linker script somehow ends up 
> including asm-generic/export.h:
> 
>   kepler:~/mingo.tip.git> git grep 'macro __put'
>   include/asm-generic/export.h:.macro __put, val, name
> 
> ?

Correct.

> But I'd guess that similar to the __ASSEMBLY__ patterns we have in headers, 
> not including the rwonce.h bits if LINKER_SCRIPT is defined is probably 
> close to the right solution - but it would also be good to know how such a 
> low level header ended up in a linker script. Might have been to pick up 
> some offset or size definition somewhere?
> 
> I.e. how did the build end up including asm/rwonce.h?
> 
> You can generally debug such weird dependency chains by putting a
> debug #warning into the affected header - such as the patch below.
> 
> This prints a stack of the header dependencies:
> 
>     CC      kernel/sched/core.o
>   In file included from ./include/linux/compiler.h:263,
>                  from ./include/linux/static_call_types.h:7,
>                  from ./include/linux/kernel.h:6,
>                  from ./include/linux/highmem.h:5,
>                  from kernel/sched/core.c:9:
>   ./arch/arm64/include/asm/rwonce.h:8:2: warning: #warning debug [-Wcpp]
>       8 | #warning debug
> 
> ... and should in principle also work in the linker script context.

Neat trick! I added

#ifdef LINKER_SCRIPT
#warning debug
#endif

to arch/arm64/include/asm/rwonce.h and built with ThinLTO, which reveals:

$ make -skj"$(nproc)" ARCH=arm64 LLVM=1 defconfig

$ scripts/config -d LTO_NONE -e LTO_CLANG_THIN

$ make -skj"$(nproc)" ARCH=arm64 LLVM=1 olddefconfig arch/arm64/kvm/hyp/
In file included from arch/arm64/kvm/hyp/nvhe/hyp.lds.S:12:
In file included from ./arch/arm64/include/asm/memory.h:18:
In file included from ./arch/arm64/include/asm/thread_info.h:11:
In file included from ./include/linux/compiler.h:263:
./arch/arm64/include/asm/rwonce.h:9:2: warning: debug [-W#warnings]
#warning debug
 ^
1 warning generated.
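
For anyone wanting to try the guard pattern outside of a kernel tree, a
minimal standalone sketch follows. DEMO_CONFIG_LTO, DEMO_HEAVY_BITS,
DEMO_MAIN and demo_heavy_bits_enabled() are made-up names; only
LINKER_SCRIPT is the macro from the diff quoted above, and the sketch
assumes it is defined only while a linker script is being preprocessed:

```c
#include <assert.h>

/* Made-up stand-in for CONFIG_LTO being enabled in the build. */
#define DEMO_CONFIG_LTO 1

/* Debug aid: only fires when preprocessed with -DLINKER_SCRIPT. */
#ifdef LINKER_SCRIPT
#warning debug: header reached from linker script context
#endif

/* Only expose the heavyweight bits outside of linker script preprocessing. */
#if defined(DEMO_CONFIG_LTO) && !defined(LINKER_SCRIPT)
# define DEMO_HEAVY_BITS 1
#else
# define DEMO_HEAVY_BITS 0
#endif

static inline int demo_heavy_bits_enabled(void)
{
	return DEMO_HEAVY_BITS;
}

#ifdef DEMO_MAIN	/* build with -DDEMO_MAIN to get a runnable program */
int main(void)
{
#ifndef LINKER_SCRIPT
	/* Normal C context: the guarded section must be active. */
	assert(demo_heavy_bits_enabled() == 1);
#else
	/* Linker script context: the guarded section must be skipped. */
	assert(demo_heavy_bits_enabled() == 0);
#endif
	return 0;
}
#endif
```

Building this once with -DDEMO_MAIN and once with -DDEMO_MAIN
-DLINKER_SCRIPT exercises both sides of the guard; the second build also
emits the debug warning.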

I wonder if the compiler.h include could be broken up? I removed it
altogether just to see what would break and defconfig, defconfig +
CONFIG_LTO_CLANG_THIN=y, and allmodconfig all continue to build.

Cheers,
Nathan


* Re: [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell"
  2022-01-10 20:03               ` Nathan Chancellor
@ 2022-01-10 20:05                 ` Nathan Chancellor
  0 siblings, 0 replies; 54+ messages in thread
From: Nathan Chancellor @ 2022-01-10 20:05 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, linux-kernel, linux-arch, Andrew Morton,
	Peter Zijlstra, Thomas Gleixner, Greg Kroah-Hartman,
	David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet,
	Al Viro, llvm

On Mon, Jan 10, 2022 at 01:03:54PM -0700, Nathan Chancellor wrote:
> On Sat, Jan 08, 2022 at 12:49:04PM +0100, Ingo Molnar wrote:
> > 
> > * Nathan Chancellor <nathan@kernel.org> wrote:
> > 
> > > 5. Build error in arch/arm64/kvm/hyp/nvhe with LTO
> > > 
> > > With arm64 + CONFIG_LTO_CLANG_THIN=y, I see:
> > > 
> > > $ make -skj"$(nproc)" ARCH=arm64 LLVM=1 defconfig
> > > 
> > > $ scripts/config -e LTO_CLANG_THIN
> > > 
> > > $ make -skj"$(nproc)" ARCH=arm64 LLVM=1 olddefconfig arch/arm64/kvm/hyp/nvhe/
> > > ld.lld: error: arch/arm64/kvm/hyp/nvhe/hyp.lds:2: unknown directive: .macro
> > > >>> .macro __put, val, name
> > > >>> ^
> > > make[5]: *** [arch/arm64/kvm/hyp/nvhe/Makefile:51: arch/arm64/kvm/hyp/nvhe/kvm_nvhe.tmp.o] Error 1
> > > 
> > > I was not able to figure out the exact include chain but CONFIG_LTO
> > > causes asm/alternative-macros.h to be included in asm/rwonce.h, which
> > > eventually gets included in either asm/cache.h or asm/memory.h.
> > > 
> > > I managed to solve this with the following diff but I am not sure if
> > > there is a better or cleaner way to do that.
> > > 
> > > diff --git a/arch/arm64/include/asm/rwonce.h b/arch/arm64/include/asm/rwonce.h
> > > index 1bce62fa908a..e19572a205d0 100644
> > > --- a/arch/arm64/include/asm/rwonce.h
> > > +++ b/arch/arm64/include/asm/rwonce.h
> > > @@ -5,7 +5,7 @@
> > >  #ifndef __ASM_RWONCE_H
> > >  #define __ASM_RWONCE_H
> > >  
> > > -#ifdef CONFIG_LTO
> > > +#if defined(CONFIG_LTO) && !defined(LINKER_SCRIPT)
> > >  
> > >  #include <linux/compiler_types.h>
> > >  #include <asm/alternative-macros.h>
> > > @@ -66,7 +66,7 @@
> > >  })
> > >  
> > >  #endif	/* !BUILD_VDSO */
> > > -#endif	/* CONFIG_LTO */
> > > +#endif	/* CONFIG_LTO && !LINKER_SCRIPT */
> > 
> > So the error message suggests that the linker script somehow ends up 
> > including asm-generic/export.h:
> > 
> >   kepler:~/mingo.tip.git> git grep 'macro __put'
> >   include/asm-generic/export.h:.macro __put, val, name
> > 
> > ?
> 
> Correct.
> 
> > But I'd guess that similar to the __ASSEMBLY__ patterns we have in headers, 
> > not including the rwonce.h bits if LINKER_SCRIPT is defined is probably 
> > close to the right solution - but it would also be good to know how such a 
> > low level header ended up in a linker script. Might have been to pick up 
> > some offset or size definition somewhere?
> > 
> > I.e. how did the build end up including asm/rwonce.h?
> > 
> > You can generally debug such weird dependency chains by putting a
> > debug #warning into the affected header - such as the patch below.
> > 
> > This prints a stack of the header dependencies:
> > 
> >     CC      kernel/sched/core.o
> >   In file included from ./include/linux/compiler.h:263,
> >                  from ./include/linux/static_call_types.h:7,
> >                  from ./include/linux/kernel.h:6,
> >                  from ./include/linux/highmem.h:5,
> >                  from kernel/sched/core.c:9:
> >   ./arch/arm64/include/asm/rwonce.h:8:2: warning: #warning debug [-Wcpp]
> >       8 | #warning debug
> > 
> > ... and should in principle also work in the linker script context.
> 
> Neat trick! I added
> 
> #ifdef LINKER_SCRIPT
> #warning debug
> #endif
> 
> to arch/arm64/include/asm/rwonce.h and built with ThinLTO, which reveals:
> 
> $ make -skj"$(nproc)" ARCH=arm64 LLVM=1 defconfig
> 
> $ scripts/config -d LTO_NONE -e LTO_CLANG_THIN
> 
> $ make -skj"$(nproc)" ARCH=arm64 LLVM=1 olddefconfig arch/arm64/kvm/hyp/
> In file included from arch/arm64/kvm/hyp/nvhe/hyp.lds.S:12:
> In file included from ./arch/arm64/include/asm/memory.h:18:
> In file included from ./arch/arm64/include/asm/thread_info.h:11:
> In file included from ./include/linux/compiler.h:263:
> ./arch/arm64/include/asm/rwonce.h:9:2: warning: debug [-W#warnings]
> #warning debug
>  ^
> 1 warning generated.
> 
> I wonder if the compiler.h include could be broken up? I removed it
> altogether just to see what would break and defconfig, defconfig +
> CONFIG_LTO_CLANG_THIN=y, and allmodconfig all continue to build.

Sorry, got ahead of myself there and forgot to include the diff:

diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
index f1bf6f6243ac..6da41eaa64bb 100644
--- a/arch/arm64/include/asm/thread_info.h
+++ b/arch/arm64/include/asm/thread_info.h
@@ -8,8 +8,6 @@
 #ifndef __ASM_THREAD_INFO_H
 #define __ASM_THREAD_INFO_H
 
-#include <linux/compiler.h>
-
 #ifndef __ASSEMBLY__
 
 struct task_struct;


* Re: [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell"
       [not found] <YdIfz+LMewetSaEB@gmail.com>
                   ` (5 preceding siblings ...)
  2022-01-04 16:18 ` Andy Shevchenko
@ 2022-01-15  0:42 ` Paul E. McKenney
  6 siblings, 0 replies; 54+ messages in thread
From: Paul E. McKenney @ 2022-01-15  0:42 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, linux-kernel, linux-arch, Andrew Morton,
	Peter Zijlstra, Thomas Gleixner, Greg Kroah-Hartman,
	David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet,
	Al Viro

On Sun, Jan 02, 2022 at 10:57:35PM +0100, Ingo Molnar wrote:
> 
> I'm pleased to announce the first public version of my new "Fast Kernel 
> Headers" project that I've been working on since late 2020, which is a 
> comprehensive rework of the Linux kernel's header hierarchy & header 
> dependencies, with the dual goals of:
> 
>  - speeding up the kernel build (both absolute and incremental build times)
> 
>  - decoupling subsystem type & API definitions from each other

Yow!!!  ;-)

[ . . . ]

>       headers/uninline: Uninline multi-use function: finish_rcuwait()

This one looks fine on its own merits, so I grabbed it from your git tree:

ecdadb5289d1 ("headers/uninline: Uninline multi-use function: finish_rcuwait()")

>       headers/deps: RCU: Remove __read_mostly annotations from externs

And same with this one:

1c8af2245fd7 ("headers/deps: RCU: Remove __read_mostly annotations from externs")


Of course, if you would rather keep these, please let me know and I will
drop them.

							Thanx, Paul


Thread overview: 54+ messages
     [not found] <YdIfz+LMewetSaEB@gmail.com>
2022-01-03 10:11 ` [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" Greg Kroah-Hartman
2022-01-03 11:12   ` Ingo Molnar
2022-01-03 13:46     ` Greg Kroah-Hartman
2022-01-03 16:29       ` Ingo Molnar
2022-01-10 10:28         ` Peter Zijlstra
2022-01-04 14:10     ` [PATCH] per_task: Remove the PER_TASK_BYTES hard-coded constant Ingo Molnar
2022-01-04 15:14       ` Andy Shevchenko
2022-01-04 23:27         ` Ingo Molnar
2022-01-04 17:51     ` [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" Arnd Bergmann
2022-01-05  0:05       ` Ingo Molnar
2022-01-05  1:37         ` Arnd Bergmann
2022-01-05  9:37       ` Andy Shevchenko
2022-01-04 14:05   ` [PATCH] per_task: Implement single template to define 'struct task_struct_per_task' fields and offsets Ingo Molnar
2022-01-03 13:54 ` [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" Kirill A. Shutemov
2022-01-04 10:54   ` Ingo Molnar
2022-01-04 13:34     ` Greg Kroah-Hartman
2022-01-04 13:54       ` [PATCH] headers/uninline: Uninline single-use function: kobject_has_children() Ingo Molnar
2022-01-04 15:09         ` Greg Kroah-Hartman
2022-01-04 15:14           ` Greg Kroah-Hartman
2022-01-05  0:11             ` Ingo Molnar
2022-01-05 15:23               ` Greg Kroah-Hartman
2022-01-06 11:26                 ` Ingo Molnar
2022-01-03 17:54 ` [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" Nathan Chancellor
2022-01-04 10:47   ` Ingo Molnar
2022-01-04 10:56     ` [DEBUG PATCH] DO NOT MERGE: Enable SHADOW_CALL_STACK on GCC builds, for build testing Ingo Molnar
2022-01-04 11:02     ` [PATCH] headers/deps: dcache: Move the ____cacheline_aligned attribute to the head of the definition Ingo Molnar
2022-01-04 15:05       ` kernel test robot
2022-01-04 17:51       ` Nathan Chancellor
2022-01-05  0:20         ` Ingo Molnar
2022-01-05  0:26           ` [PATCH] headers/deps: Attribute placement fixes for Clang & GCC Ingo Molnar
2022-01-04 11:19     ` [TREE] "Fast Kernel Headers" Tree WIP/development branch Ingo Molnar
2022-01-04 17:25     ` [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" Nick Desaulniers
2022-01-05  0:43       ` Ingo Molnar
2022-01-04 17:50     ` Nathan Chancellor
2022-01-05  0:35       ` [PATCH] x86/kbuild: Enable CONFIG_KALLSYMS_ALL=y in the defconfigs Ingo Molnar
2022-01-05  0:40       ` [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" Ingo Molnar
2022-01-05  1:07         ` Ingo Molnar
2022-01-05 21:42           ` Nathan Chancellor
2022-01-08 10:32             ` [PATCH] headers/deps: Add header dependencies to .c files: <linux/ptrace_api.h> Ingo Molnar
2022-01-08 11:08             ` [PATCH] FIX: headers/deps: uapi/headers: Create usr/include/uapi symbolic link Ingo Molnar
2022-01-08 11:18             ` [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" Ingo Molnar
2022-01-08 11:38             ` [PATCH] x86/bitops: Remove unused __sw_hweight64() assembly implementation Ingo Molnar
2022-01-08 11:49             ` [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" Ingo Molnar
2022-01-08 12:17               ` Ingo Molnar
2022-01-10 20:03               ` Nathan Chancellor
2022-01-10 20:05                 ` Nathan Chancellor
2022-01-05 22:33         ` Nathan Chancellor
2022-01-08 15:16       ` Ingo Molnar
2022-01-07  0:29     ` Nathan Chancellor
2022-01-08 11:54       ` Ingo Molnar
2022-01-04 12:36 ` Willy Tarreau
2022-01-04 16:05 ` Andy Shevchenko
2022-01-04 16:18 ` Andy Shevchenko
2022-01-15  0:42 ` Paul E. McKenney
