* Re: [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" [not found] <YdIfz+LMewetSaEB@gmail.com> @ 2022-01-03 10:11 ` Greg Kroah-Hartman 2022-01-03 11:12 ` Ingo Molnar 2022-01-04 14:05 ` [PATCH] per_task: Implement single template to define 'struct task_struct_per_task' fields and offsets Ingo Molnar 2022-01-03 13:54 ` [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" Kirill A. Shutemov ` (5 subsequent siblings) 6 siblings, 2 replies; 54+ messages in thread From: Greg Kroah-Hartman @ 2022-01-03 10:11 UTC (permalink / raw) To: Ingo Molnar Cc: Linus Torvalds, linux-kernel, linux-arch, Andrew Morton, Peter Zijlstra, Thomas Gleixner, David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet, Al Viro On Sun, Jan 02, 2022 at 10:57:35PM +0100, Ingo Molnar wrote: > > I'm pleased to announce the first public version of my new "Fast Kernel > Headers" project that I've been working on since late 2020, which is a > comprehensive rework of the Linux kernel's header hierarchy & header > dependencies, with the dual goals of: > > - speeding up the kernel build (both absolute and incremental build times) > > - decoupling subsystem type & API definitions from each other > > The fast-headers tree consists of over 25 sub-trees internally, spanning > over 2,200 commits, which can be found here: > > git://git.kernel.org/pub/scm/linux/kernel/git/mingo/tip.git master > > As most kernel developers know, there's around ~10,000 main .h headers in > the Linux kernel, in the include/ and arch/*/include/ hierarchies. Over the > last 30+ years they have grown into a complicated & painful set of > cross-dependencies we are affectionately calling 'Dependency Hell'. 
> > Before going into details about how this tree solves 'dependency hell' > exactly, here's the current kernel build performance gain with > CONFIG_FAST_HEADERS=y enabled, (and with CONFIG_KALLSYMS_FAST=y enabled as > well - see below), using a stock x86 Linux distribution's .config with all > modules built into the vmlinux: > > # > # Performance counter stats for 'make -j96 vmlinux' (3 runs): > # > # (Elapsed time in seconds): > # > > v5.16-rc7: 231.34 +- 0.60 secs, 15.5 builds/hour # [ vanilla baseline ] > -fast-headers-v1: 129.97 +- 0.51 secs, 27.7 builds/hour # +78.0% improvement > > Or in terms of CPU time utilized: > > v5.16-rc7: 11,474,982.05 msec cpu-clock # 49.601 CPUs utilized > -fast-headers-v1: 7,100,730.37 msec cpu-clock # 54.635 CPUs utilized # +61.6% improvement Speed up is very impressive, nice job! > Techniques used by the fast-headers tree to reduce header size & dependencies: > > - Aggressive decoupling of high level headers from each other, starting > with <linux/sched.h>. Since 'struct task_struct' is a union of many > subsystems, there's a new "per_task" infrastructure modeled after the > per_cpu framework, which creates fields in task_struct without having > to modify sched.h or the 'struct task_struct' type: > > DECLARE_PER_TASK(type, name); > ... > per_task(current, name) = val; > > The per_task() facility then seamlessly creates an offset into the > task_struct->per_task_area[] array, and uses the asm-offsets.h > mechanism to create offsets into it early in the build. > > There's no runtime overhead disadvantage from using per_task() framework, > the generated code is functionally equivalent to types embedded in > task_struct. This is "interesting", but how are you going to keep the kernel/sched/per_task_area_struct_defs.h and struct task_struct_per_task definition in sync? 
It seems that you manually created this (which is great for testing), but over the long-term, trying to manually determine what needs to be done here to keep everything lined up properly is going to be a major pain. That issue aside, I took a glance at the tree, and overall it looks like a lot of nice cleanups. Most of these can probably go through the various subsystem trees, after you split them out, for the "major" .h cleanups. Is that something you are going to be planning on doing? thanks, greg k-h ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" 2022-01-03 10:11 ` [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" Greg Kroah-Hartman @ 2022-01-03 11:12 ` Ingo Molnar 2022-01-03 13:46 ` Greg Kroah-Hartman ` (2 more replies) 2022-01-04 14:05 ` [PATCH] per_task: Implement single template to define 'struct task_struct_per_task' fields and offsets Ingo Molnar 1 sibling, 3 replies; 54+ messages in thread From: Ingo Molnar @ 2022-01-03 11:12 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: Linus Torvalds, linux-kernel, linux-arch, Andrew Morton, Peter Zijlstra, Thomas Gleixner, David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet, Al Viro * Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote: > > Before going into details about how this tree solves 'dependency hell' > > exactly, here's the current kernel build performance gain with > > CONFIG_FAST_HEADERS=y enabled, (and with CONFIG_KALLSYMS_FAST=y enabled as > > well - see below), using a stock x86 Linux distribution's .config with all > > modules built into the vmlinux: > > > > # > > # Performance counter stats for 'make -j96 vmlinux' (3 runs): > > # > > # (Elapsed time in seconds): > > # > > > > v5.16-rc7: 231.34 +- 0.60 secs, 15.5 builds/hour # [ vanilla baseline ] > > -fast-headers-v1: 129.97 +- 0.51 secs, 27.7 builds/hour # +78.0% improvement > > > > Or in terms of CPU time utilized: > > > > v5.16-rc7: 11,474,982.05 msec cpu-clock # 49.601 CPUs utilized > > -fast-headers-v1: 7,100,730.37 msec cpu-clock # 54.635 CPUs utilized # +61.6% improvement > > Speed up is very impressive, nice job! Thanks! :-) > > Techniques used by the fast-headers tree to reduce header size & dependencies: > > > > - Aggressive decoupling of high level headers from each other, starting > > with <linux/sched.h>. 
Since 'struct task_struct' is a union of many > > subsystems, there's a new "per_task" infrastructure modeled after the > > per_cpu framework, which creates fields in task_struct without having > > to modify sched.h or the 'struct task_struct' type: > > > > DECLARE_PER_TASK(type, name); > > ... > > per_task(current, name) = val; > > > > The per_task() facility then seamlessly creates an offset into the > > task_struct->per_task_area[] array, and uses the asm-offsets.h > > mechanism to create offsets into it early in the build. > > > > There's no runtime overhead disadvantage from using per_task() framework, > > the generated code is functionally equivalent to types embedded in > > task_struct. > > This is "interesting", but how are you going to keep the > kernel/sched/per_task_area_struct_defs.h and struct task_struct_per_task > definition in sync? I have plans to clean this up further - see below - but in general I'd *discourage* the embedding of new complex types to task_struct. In practice, most new task_struct fields are either simple types or pointers to structs, which can be added to task_struct without having to define a complex type for <linux/sched.h>. For example here's the list of the last 5 extensions of task_struct, since November 2020 - I copy & pasted them out of git log -p include/linux/sched.h: + unsigned in_eventfd_signal:1; + cpumask_t *user_cpus_ptr; + unsigned int saved_state; + unsigned long saved_state_change; + struct bpf_run_ctx *bpf_ctx; All of those new fields are either simple C types or struct pointers - none of those extensions need per_task() handling per se. The overall policy to extend task_struct, going forward, would be to: - Either make simple-type or struct-pointer additions to task_struct, that don't couple <linux/sched.h> to other subsystems. - Or, if you absolutely must - and we don't want to forbid this - use the per_task() machinery to create a simple accessor to a complex embedded type. > [...] 
It seems that you manually created this (which is great for > testing), but over the long-term, trying to manually determine what needs > to be done here to keep everything lined up properly is going to be a > major pain. Note that under the policy above - and even according to the practice of the last ~1.5 years - it should be exceedingly rare having to extend the per_task() facility. There's one thing ugly about it, the fixed PER_TASK_BYTES limit, I plan to make ->per_task_array[] the last field of task_struct, i.e. change it to: u8 per_task_area[]; This actually became possible through the fixing of the x86 FPU code in the following fast-headers commit: 4ae0f28bc1c8 headers/deps: x86/fpu: Make task_struct::thread constant size In the last ~1 year existence of the per_task() facility I didn't have any maintenance troubles with these fields getting out of sync, but we could also auto-generate kernel/sched/per_task_area_struct_defs.h from kernel/sched/per_task_area_struct.h via a build-time script, and make kernel/sched/per_task_area_struct.h the only method to define such fields. > That issue aside, I took a glance at the tree, and overall it looks like > a lot of nice cleanups. Most of these can probably go through the > various subsystem trees, after you split them out, for the "major" .h > cleanups. Is that something you are going to be planning on doing? Yeah, I absolutely plan on doing that too: - About ~70% of the commits can be split up & parallelized through maintainer trees. - With the exception of the untangling of sched.h, per_task and the "Optimize Headers" series, where a lot of patches are dependent on each other. These are actually needed to get any measurable benefits from this tree (!). We can do these through the scheduler tree, or through the dedicated headers tree I posted. 
The latter monolithic series is pretty much unavoidable, it's the result of 30 years of coupling a lot of kernel subsystems to task_struct via embedded structs & other complex types, that needed quite a bit of effort to untangle, and that untangling needed to happen in-order. Do these plans sound good to you? Thanks, Ingo ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" 2022-01-03 11:12 ` Ingo Molnar @ 2022-01-03 13:46 ` Greg Kroah-Hartman 2022-01-03 16:29 ` Ingo Molnar 2022-01-04 14:10 ` [PATCH] per_task: Remove the PER_TASK_BYTES hard-coded constant Ingo Molnar 2022-01-04 17:51 ` [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" Arnd Bergmann 2 siblings, 1 reply; 54+ messages in thread From: Greg Kroah-Hartman @ 2022-01-03 13:46 UTC (permalink / raw) To: Ingo Molnar Cc: Linus Torvalds, linux-kernel, linux-arch, Andrew Morton, Peter Zijlstra, Thomas Gleixner, David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet, Al Viro On Mon, Jan 03, 2022 at 12:12:50PM +0100, Ingo Molnar wrote: > * Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote: > > This is "interesting", but how are you going to keep the > > kernel/sched/per_task_area_struct_defs.h and struct task_struct_per_task > > definition in sync? > > I have plans to clean this up further - see below - but in general I'd > *discourage* the embedding of new complex types to task_struct. > > In practice, most new task_struct fields are either simple types or > pointers to structs, which can be added to task_struct without having to > define a complex type for <linux/sched.h>. > > For example here's the list of the last 5 extensions of task_struct, since > November 2020 - I copy & pasted them out of git log -p include/linux/sched.h: > > + unsigned in_eventfd_signal:1; > > + cpumask_t *user_cpus_ptr; > > + unsigned int saved_state; > > + unsigned long saved_state_change; > > + struct bpf_run_ctx *bpf_ctx; > > All of those new fields are either simple C types or struct pointers - none > of those extensions need per_task() handling per se. 
> > The overall policy to extend task_struct, going forward, would be to: > > - Either make simple-type or struct-pointer additions to task_struct, that > don't couple <linux/sched.h> to other subsystems. > > - Or, if you absolutely must - and we don't want to forbid this - use the > per_task() machinery to create a simple accessor to a complex embedded > type. I'll leave all of this up to the scheduler developers, but it still looks odd to me. The mess we create trying to work around issues in C :) > > That issue aside, I took a glance at the tree, and overall it looks like > > a lot of nice cleanups. Most of these can probably go through the > > various subsystem trees, after you split them out, for the "major" .h > > cleanups. Is that something you are going to be planning on doing? > > Yeah, I absolutely plan on doing that too: > > - About ~70% of the commits can be split up & parallelized through > maintainer trees. > > - With the exception of the untangling of sched.h, per_task and the > "Optimize Headers" series, where a lot of patches are dependent on each > other. These are actually needed to get any measurable benefits from this > tree (!). We can do these through the scheduler tree, or through the > dedicated headers tree I posted. > > The latter monolithic series is pretty much unavoidable, it's the result of > 30 years of coupling a lot of kernel subsystems to task_struct via embedded > structs & other complex types, that needed quite a bit of effort to > untangle, and that untangling needed to happen in-order. > > Do these plans this sound good to you? Yes, taking the majority through the maintainer trees and then doing the remaining bits in a single tree seems sane, that one tree will be easier to review as well. thanks, greg k-h ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" 2022-01-03 13:46 ` Greg Kroah-Hartman @ 2022-01-03 16:29 ` Ingo Molnar 2022-01-10 10:28 ` Peter Zijlstra 0 siblings, 1 reply; 54+ messages in thread From: Ingo Molnar @ 2022-01-03 16:29 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: Linus Torvalds, linux-kernel, linux-arch, Andrew Morton, Peter Zijlstra, Thomas Gleixner, David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet, Al Viro * Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote: > > The overall policy to extend task_struct, going forward, would be to: > > > > - Either make simple-type or struct-pointer additions to task_struct, that > > don't couple <linux/sched.h> to other subsystems. > > > > - Or, if you absolutely must - and we don't want to forbid this - use the > > per_task() machinery to create a simple accessor to a complex embedded > > type. > > I'll leave all of this up to the scheduler developers, but it still looks > odd to me. The mess we create trying to work around issues in C :) Yeah, so I *did* find this somewhat suboptimal too, and developed an earlier version that used linker section tricks to gain the field offsets more automatically. It was an unmitigated disaster: was fragile on x86 already (which has a zoo of linking quirks with no precedent of doing this before bounds.c processing), but on ARM64 and probably on most of the other RISC-ish architectures there was also a real runtime code generation cost of using linker tricks: 2-3 extra instructions per per_task() use - clearly unacceptable. Found this out the hard way after making it boot & work on ARM64 and looking at the assembly output, trying to figure out why the generated code size increased. :-/ Anyway, the current method has the big advantage of being obviously invariant wrt. code generation compared to the previous code, on every architecture. > > Do these plans sound good to you? 
> > Yes, taking the majority through the maintainer trees and then doing the > remaining bits in a single tree seems sane, that one tree will be easier > to review as well. Ok. Will definitely offer it up piecemeal, in reviewable chunks, via existing processes & flows. Thanks, Ingo ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" 2022-01-03 16:29 ` Ingo Molnar @ 2022-01-10 10:28 ` Peter Zijlstra 0 siblings, 0 replies; 54+ messages in thread From: Peter Zijlstra @ 2022-01-10 10:28 UTC (permalink / raw) To: Ingo Molnar Cc: Greg Kroah-Hartman, Linus Torvalds, linux-kernel, linux-arch, Andrew Morton, Thomas Gleixner, David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet, Al Viro On Mon, Jan 03, 2022 at 05:29:02PM +0100, Ingo Molnar wrote: > Yeah, so I *did* find this somewhat suboptimal too, and developed an > earlier version that used linker section tricks to gain the field offsets > more automatically. > > It was an unmitigated disaster: was fragile on x86 already (which has a zoo > of linking quirks with no precedent of doing this before bounds.c > processing), but on ARM64 and probably on most of the other RISC-ish > architectures there was also a real runtime code generation cost of using > linker tricks: 2-3 extra instructions per per_task() use - clearly > unacceptable. > > Found this out the hard way after making it boot & work on ARM64 and > looking at the assembly output, trying to figure out why the generated code > size increased. :-/ Right, I suggested you do the per-cpu thing. And then Mark reported that code-gen issue on arm64. I'm still thinking the toolchains ought to look at fixing that. It'll be too late to use for per-task, but at least the current per-cpu usages will (eventually) get better code-gen. ^ permalink raw reply [flat|nested] 54+ messages in thread
* [PATCH] per_task: Remove the PER_TASK_BYTES hard-coded constant 2022-01-03 11:12 ` Ingo Molnar 2022-01-03 13:46 ` Greg Kroah-Hartman @ 2022-01-04 14:10 ` Ingo Molnar 2022-01-04 15:14 ` Andy Shevchenko 2022-01-04 17:51 ` [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" Arnd Bergmann 2 siblings, 1 reply; 54+ messages in thread From: Ingo Molnar @ 2022-01-04 14:10 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: Linus Torvalds, linux-kernel, linux-arch, Andrew Morton, Peter Zijlstra, Thomas Gleixner, David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet, Al Viro * Ingo Molnar <mingo@kernel.org> wrote: > There's one thing ugly about it, the fixed PER_TASK_BYTES limit, I plan > to make ->per_task_array[] the last field of task_struct, i.e. change it > to: > > u8 per_task_area[]; > > This actually became possible through the fixing of the x86 FPU code in the > following fast-headers commit: > > 4ae0f28bc1c8 headers/deps: x86/fpu: Make task_struct::thread constant size So I implemented this approach - the patch below removes the PER_TASK_BYTES hard-coded limit. ( Didn't make it variable size via per_task_area[] though - we *do* know its size after all at build time already, and known-size structures are better in general than tail-variable-array solutions: - They work better with static checkers, - and we actually want the offsets into thread_info to be small on embedded platforms etc. ) Thanks, Ingo ============================> From: Ingo Molnar <mingo@kernel.org> Date: Tue, 4 Jan 2022 13:48:05 +0100 Subject: [PATCH] per_task: Remove the PER_TASK_BYTES hard-coded constant - Also remove the unnecessary <linux/sched/per_task_types.h> header. 
Not-Signed-off-by-yet: Ingo Molnar <mingo@kernel.org> --- include/linux/sched/per_task.h | 3 ++- include/linux/sched/per_task_types.h | 7 ------- kernel/sched/core.c | 4 ++++ 3 files changed, 6 insertions(+), 8 deletions(-) diff --git a/include/linux/sched/per_task.h b/include/linux/sched/per_task.h index e20837e82681..a10538713a26 100644 --- a/include/linux/sched/per_task.h +++ b/include/linux/sched/per_task.h @@ -37,7 +37,6 @@ * A build-time check ensures that we haven't run out of available space. */ -#include <linux/sched/per_task_types.h> #include <linux/compiler.h> #ifndef __PER_TASK_GEN @@ -61,4 +60,6 @@ #define per_task_container_of(var, name) container_of((void *)(var) - per_task_offset(name), struct task_struct, per_task_area[0]) +#define PER_TASK_BYTES (per_task_offset(_end)) + #endif /* _LINUX_SCHED_PER_TASK_H */ diff --git a/include/linux/sched/per_task_types.h b/include/linux/sched/per_task_types.h deleted file mode 100644 index 8af8c10f8dae..000000000000 --- a/include/linux/sched/per_task_types.h +++ /dev/null @@ -1,7 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 */ -#ifndef _LINUX_SCHED_PER_TASK_TYPES_H -#define _LINUX_SCHED_PER_TASK_TYPES_H - -#define PER_TASK_BYTES 8192 - -#endif /* _LINUX_SCHED_PER_TASK_TYPES_H */ diff --git a/kernel/sched/core.c b/kernel/sched/core.c index bc38b19f6398..fdb5b99ae6e0 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -89,6 +89,8 @@ #include "../../fs/io-wq.h" #include "../smpboot.h" +#include "../../../kernel/sched/per_task_area_struct.h" + DEFINE_PER_TASK(unsigned int, flags); #ifdef CONFIG_THREAD_INFO_IN_TASK @@ -9481,6 +9483,8 @@ void __init per_task_init(void) { unsigned long per_task_bytes = per_task_offset(_end); + printk("per_task: sizeof(struct task_struct): %ld bytes\n", sizeof(struct task_struct)); + printk("per_task: sizeof(struct task_struct_per_task): %ld bytes\n", sizeof(struct task_struct_per_task)); printk("per_task: Using %ld per_task bytes, %ld bytes available\n", per_task_bytes, 
(long)PER_TASK_BYTES); BUG_ON(per_task_offset(_end) > PER_TASK_BYTES); ^ permalink raw reply related [flat|nested] 54+ messages in thread
* Re: [PATCH] per_task: Remove the PER_TASK_BYTES hard-coded constant 2022-01-04 14:10 ` [PATCH] per_task: Remove the PER_TASK_BYTES hard-coded constant Ingo Molnar @ 2022-01-04 15:14 ` Andy Shevchenko 2022-01-04 23:27 ` Ingo Molnar 0 siblings, 1 reply; 54+ messages in thread From: Andy Shevchenko @ 2022-01-04 15:14 UTC (permalink / raw) To: Ingo Molnar Cc: Greg Kroah-Hartman, Linus Torvalds, linux-kernel, linux-arch, Andrew Morton, Peter Zijlstra, Thomas Gleixner, David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet, Al Viro On Tue, Jan 04, 2022 at 03:10:51PM +0100, Ingo Molnar wrote: > * Ingo Molnar <mingo@kernel.org> wrote: > +++ b/kernel/sched/core.c > +#include "../../../kernel/sched/per_task_area_struct.h" #include "per_task_area_struct.h" ? -- With Best Regards, Andy Shevchenko ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH] per_task: Remove the PER_TASK_BYTES hard-coded constant 2022-01-04 15:14 ` Andy Shevchenko @ 2022-01-04 23:27 ` Ingo Molnar 0 siblings, 0 replies; 54+ messages in thread From: Ingo Molnar @ 2022-01-04 23:27 UTC (permalink / raw) To: Andy Shevchenko Cc: Greg Kroah-Hartman, Linus Torvalds, linux-kernel, linux-arch, Andrew Morton, Peter Zijlstra, Thomas Gleixner, David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet, Al Viro * Andy Shevchenko <andriy.shevchenko@intel.com> wrote: > On Tue, Jan 04, 2022 at 03:10:51PM +0100, Ingo Molnar wrote: > > * Ingo Molnar <mingo@kernel.org> wrote: > > > +++ b/kernel/sched/core.c > > > +#include "../../../kernel/sched/per_task_area_struct.h" > > #include "per_task_area_struct.h" ? Indeed - fixed. Thanks, Ingo ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" 2022-01-03 11:12 ` Ingo Molnar 2022-01-03 13:46 ` Greg Kroah-Hartman 2022-01-04 14:10 ` [PATCH] per_task: Remove the PER_TASK_BYTES hard-coded constant Ingo Molnar @ 2022-01-04 17:51 ` Arnd Bergmann 2022-01-05 0:05 ` Ingo Molnar 2022-01-05 9:37 ` Andy Shevchenko 2 siblings, 2 replies; 54+ messages in thread From: Arnd Bergmann @ 2022-01-04 17:51 UTC (permalink / raw) To: Ingo Molnar Cc: Greg Kroah-Hartman, Linus Torvalds, Linux Kernel Mailing List, linux-arch, Andrew Morton, Peter Zijlstra, Thomas Gleixner, David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet, Al Viro On Mon, Jan 3, 2022 at 6:12 AM Ingo Molnar <mingo@kernel.org> wrote: > * Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote: > > > Before going into details about how this tree solves 'dependency hell' > > > exactly, here's the current kernel build performance gain with > > > CONFIG_FAST_HEADERS=y enabled, (and with CONFIG_KALLSYMS_FAST=y enabled as > > > well - see below), using a stock x86 Linux distribution's .config with all > > > modules built into the vmlinux: > > > > > > # > > > # Performance counter stats for 'make -j96 vmlinux' (3 runs): > > > # > > > # (Elapsed time in seconds): > > > # > > > > > > v5.16-rc7: 231.34 +- 0.60 secs, 15.5 builds/hour # [ vanilla baseline ] > > > -fast-headers-v1: 129.97 +- 0.51 secs, 27.7 builds/hour # +78.0% improvement > > > > > > Or in terms of CPU time utilized: > > > > > > v5.16-rc7: 11,474,982.05 msec cpu-clock # 49.601 CPUs utilized > > > -fast-headers-v1: 7,100,730.37 msec cpu-clock # 54.635 CPUs utilized # +61.6% improvement > > > > Speed up is very impressive, nice job! > > Thanks! :-) I've done some work in this area in the past, didn't quite take it enough of the way to get this far. 
The best I saw was 30% improvement with clang, which tends to be more sensitive than gcc towards header file bloat, as it does more detailed syntax checking before eliminating dead code. Did you try both gcc and clang for this? > > That issue aside, I took a glance at the tree, and overall it looks like > > a lot of nice cleanups. Most of these can probably go through the > > various subsystem trees, after you split them out, for the "major" .h > > cleanups. Is that something you are going to be planning on doing? > > Yeah, I absolutely plan on doing that too: > > - About ~70% of the commits can be split up & parallelized through > maintainer trees. > > - With the exception of the untangling of sched.h, per_task and the > "Optimize Headers" series, where a lot of patches are dependent on each > other. These are actually needed to get any measurable benefits from this > tree (!). We can do these through the scheduler tree, or through the > dedicated headers tree I posted. > > The latter monolithic series is pretty much unavoidable, it's the result of > 30 years of coupling a lot of kernel subsystems to task_struct via embedded > structs & other complex types, that needed quite a bit of effort to > untangle, and that untangling needed to happen in-order. > > Do these plans this sound good to you? I haven't had a chance to look at your tree yet, I'm still on vacation without access to my normal workstation. I would like to run my own scripts for analyzing the header dependencies on it after I get back next week. From what I could tell, linux/sched.h was not the only such problem, but I saw similarly bad issues with linux/fs.h (which is what I posted about in November/December), linux/mm.h and linux/netdevice.h on the high level, in low-level headers there are huge issues with linux/atomic.h, linux/mutex.h, linux/pgtable.h etc. 
I expect that you have addressed these as well, but I'd like to make sure that your changes are reasonably complete on arm32 and arm64 to avoid having to do the big cleanup more than once. My approach to the large mid-level headers is somewhat different: rather than completely avoiding them from getting included, I would like to split up the structure definitions from the inline functions. Linus didn't really like my approach, but I suspect he'll have similar concerns about your solution for linux/sched.h, especially if we end up applying the same hack to other commonly used structures (sk_buff, mm_struct, super_block) in the end. I should be able to come up with a less handwavy reply after I've actually studied your approach better. Most of the patches should be the same either way (adding back missing includes to drivers, and doing cleanups to commonly included headers to avoid the deep nesting), the interesting bit will be how to properly define the larger structures without pulling in the rest of the world. Arnd ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" 2022-01-04 17:51 ` [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" Arnd Bergmann @ 2022-01-05 0:05 ` Ingo Molnar 2022-01-05 1:37 ` Arnd Bergmann 2022-01-05 9:37 ` Andy Shevchenko 1 sibling, 1 reply; 54+ messages in thread From: Ingo Molnar @ 2022-01-05 0:05 UTC (permalink / raw) To: Arnd Bergmann Cc: Greg Kroah-Hartman, Linus Torvalds, Linux Kernel Mailing List, linux-arch, Andrew Morton, Peter Zijlstra, Thomas Gleixner, David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet, Al Viro * Arnd Bergmann <arnd@arndb.de> wrote: > From what I could tell, linux/sched.h was not the only such problem, but > I saw similarly bad issues with linux/fs.h (which is what I posted about > in November/December), linux/mm.h and linux/netdevice.h on the high > level, in low-level headers there are huge issues with linux/atomic.h, > linux/mutex.h, linux/pgtable.h etc. I expect that you have addressed > these as well, Correct, each of these was a problem - and a *lot* of other headers in addition to those: kepler:~/mingo.tip.git> git diff --stat v5.16-rc8.. 
include/linux/ arch/*/include/asm/ | grep changed
   1335 files changed, 59677 insertions(+), 56582 deletions(-)

and I reduced all the headers that showed up in the bloat-profile to a
fraction of their original size:

 ------------------------------------------------------------------------------------------
                              | Combined, preprocessed C code size of header, without line markers,
                              | with comments stripped:
 ------------------------------.-----------------------------.-----------------------------
                              | v5.16-rc7                   | -fast-headers-v1
                              |-----------------------------|-----------------------------
 #include <linux/sched.h>     | LOC: 13,292 | headers:  324 | LOC:    769 | headers:   64
 #include <linux/wait.h>      | LOC:  9,369 | headers:  235 | LOC:    483 | headers:   46
 #include <linux/rcupdate.h>  | LOC:  8,975 | headers:  224 | LOC:  1,385 | headers:   86
 #include <linux/hrtimer.h>   | LOC: 10,861 | headers:  265 | LOC:    229 | headers:   37
 #include <linux/fs.h>        | LOC: 22,497 | headers:  427 | LOC:  1,993 | headers:  120
 #include <linux/cred.h>      | LOC: 17,257 | headers:  368 | LOC:  4,830 | headers:  129
 #include <linux/dcache.h>    | LOC: 10,545 | headers:  253 | LOC:    858 | headers:   65
 #include <linux/cgroup.h>    | LOC: 33,518 | headers:  522 | LOC:  2,477 | headers:  111
 #include <linux/module.h>    | LOC: 16,948 | headers:  339 | LOC:  2,239 | headers:  122
 #include <linux/kobject.h>   | LOC: 15,210 | headers:  318 | LOC:    799 | headers:   59
 #include <linux/device.h>    | LOC: 20,505 | headers:  408 | LOC:  2,131 | headers:  123
 #include <linux/gfp.h>       | LOC: 13,543 | headers:  303 | LOC:    181 | headers:   26
 #include <linux/slab.h>      | LOC: 14,037 | headers:  307 | LOC:    999 | headers:   74
 #include <linux/mm.h>        | LOC: 26,727 | headers:  453 | LOC:  1,855 | headers:  133
 #include <linux/mmzone.h>    | LOC: 12,755 | headers:  293 | LOC:    832 | headers:   64
 #include <linux/swap.h>      | LOC: 38,292 | headers:  559 | LOC: 11,085 | headers:  294
 #include <linux/writeback.h> | LOC: 36,481 | headers:  550 | LOC:  1,566 | headers:   92
 #include <linux/gfp.h>       | LOC: 13,543 | headers:  303 | LOC:    181 | headers:   26
 #include <linux/skbuff.h>    | LOC: 36,130 | headers:  558 | LOC:  1,209 | headers:   89
 #include <linux/tcp.h>       | LOC: 60,133 | headers:  725 | LOC:  3,829 | headers:  153
 #include <linux/udp.h>       | LOC: 59,411 | headers:  721 | LOC:  3,236 | headers:  146
 #include <linux/filter.h>    | LOC: 54,172 | headers:  689 | LOC:  4,087 | headers:   73
 #include <linux/interrupt.h> | LOC: 14,085 | headers:  340 | LOC:  2,629 | headers:  124
 #include <net/sock.h>        | LOC: 58,880 | headers:  715 | LOC:  1,543 | headers:   98
 #include <asm/processor.h>   | LOC:  7,821 | headers:  204 | LOC:    618 | headers:   41
 #include <asm/page.h>        | LOC:  1,540 | headers:   97 | LOC:  1,193 | headers:   82
 #include <asm/pgtable.h>     | LOC: 12,949 | headers:  297 | LOC:  5,742 | headers:  217

<linux/atomic.h> wasn't a particularly big problem - but it does get
included everywhere, so I moved the most common atomic_t definition into
<linux/types.h> (on 64-bit kernels), which allowed a big reduction for the
majority of cases that don't use the atomic APIs:

 #include <linux/atomic.h>     | LOC:    176 | headers:   26
 #include <linux/atomic_api.h> | LOC:  2,785 | headers:   52

But <linux/atomic_api.h> is still included in ~75% of .c files, mostly for
good reasons, because it's a very popular low level API.

> but I'd like to make sure that your changes are reasonably complete on
> arm32 and arm64 to avoid having to do the big cleanup more than once.

I did test ARM64 extensively in terms of build coverage - but not in terms
of header bloat, and I'm sure more could be done there!

> My approach to the large mid-level headers is somewhat different: rather
> than completely avoiding them from getting included, I would like to
> split up the structure definitions from the inline functions.

That's a big chunk of what the -fast-headers tree does: I've split over 85
headers into <linux/header_types.h> and <linux/header_api.h>...

I've also split up headers further where needed, in particular mm.h
required multiple levels of splitting to get the dependencies of the most
commonly used <linux/mm_types.h> and <linux/mm_api.h> headers under
control:

  kepler:~/mingo.tip.git> ls -ldt include/linux/mm*api*.h
  -rw-rw-r-- 1 mingo mingo 77130 Jan  4 13:32 include/linux/mm_api.h
  -rw-rw-r-- 1 mingo mingo 22227 Jan  4 13:32 include/linux/mmzone_api.h
  -rw-rw-r-- 1 mingo mingo  6759 Jan  4 13:32 include/linux/mm_api_extra.h
  -rw-rw-r-- 1 mingo mingo   479 Jan  4 13:31 include/linux/mm_api_exe_file.h
  -rw-rw-r-- 1 mingo mingo   960 Jan  4 13:31 include/linux/mm_api_truncate.h
  -rw-rw-r-- 1 mingo mingo  1262 Jan  4 13:31 include/linux/mm_api_kvmalloc.h
  -rw-rw-r-- 1 mingo mingo   719 Jan  4 13:31 include/linux/mm_api_gate_area.h
  -rw-rw-r-- 1 mingo mingo  1342 Jan  4 13:31 include/linux/mm_api_kasan.h
  -rw-rw-r-- 1 mingo mingo  3007 Jan  4 13:31 include/linux/mm_api_tlb_flush.h

The results are pretty nice:

  # vanilla:

  #include <linux/mm.h>         | LOC: 26,728 | headers:  453

  # -fast-headers:

  #include <linux/mm.h>         | LOC:  1,855 | headers:  132   # == mm_types.h
  #include <linux/mm_types.h>   | LOC:  1,855 | headers:  131
  #include <linux/mm_api.h>     | LOC:  8,587 | headers:  229

And <linux/mm_api.h> is now included only in about 25% of the .c files - in
the vanilla kernel the use percentage is over ~90%.

But despite all those reductions, <linux/mm_api.h> is still a header with
one of the largest cumulative footprints within a (distro) kernel build:

                                  | stripped lines of code
                                  |              _____________________________
                                  |             | headers included recursively
                                  |             |               _______________________________
                                  |             |              | usage in a distro kernel build
  ____________                    |             |              |         _________________________________________
 | header name                    |             |              |        | million lines of comment-stripped C code
 |                                |             |              |        |
 #include <linux/spinlock_api.h>  | LOC:  5,142 | headers: 123 | 10,168 | MLOC: 52.2 | #############
 #include <linux/device/driver.h> | LOC:  4,132 | headers: 169 | 12,306 | MLOC: 50.8 | ############
 #include <linux/mm_api.h>        | LOC:  8,584 | headers: 230 |  5,135 | MLOC: 44.0 | ###########
 #include <linux/skbuff_api.h>    | LOC:  8,404 | headers: 190 |  5,065 | MLOC: 42.5 | ##########
 #include <linux/atomic_api.h>    | LOC:  2,785 | headers:  52 | 15,282 | MLOC: 42.5 | ##########
 #include <asm/spinlock.h>        | LOC:  4,039 | headers:  83 | 10,168 | MLOC: 41.0 | ##########
 #include <asm/qrwlock.h>         | LOC:  4,039 | headers:  82 | 10,168 | MLOC: 41.0 | ##########
 #include <asm-generic/qrwlock.h> | LOC:  4,039 | headers:  81 | 10,168 | MLOC: 41.0 | ##########
 #include <linux/page_ref.h>      | LOC:  5,397 | headers: 168 |  7,578 | MLOC: 40.8 | ##########
 #include <asm/qspinlock.h>       | LOC:  3,990 | headers:  80 | 10,169 | MLOC: 40.5 | ##########
 #include <linux/device_types.h>  | LOC:  2,131 | headers: 122 | 17,424 | MLOC: 37.1 | #########
 #include <linux/module.h>        | LOC:  2,239 | headers: 122 | 16,472 | MLOC: 36.8 | #########
 #include <net/cfg80211.h>        | LOC: 29,004 | headers: 423 |  1,205 | MLOC: 34.9 | ########
 #include <linux/pci.h>           | LOC:  7,092 | headers: 232 |  4,849 | MLOC: 34.3 | ########
 #include <linux/netdevice_api.h> | LOC:  8,434 | headers: 225 |  4,065 | MLOC: 34.2 | ########
 #include <linux/refcount_api.h>  | LOC:  3,421 | headers:  87 |  9,776 | MLOC: 33.4 | ########

( The 'MLOC' footprint estimate is number of usages times
  preprocessed-stripped-header size. )

I've reduced header bloat through three primary angles of attack:

 - reducing number of inclusions

 - reducing header size itself, by type/API splitting & by segmenting
   headers along API usage frequency

 - decoupling headers from each other

As you can see, fast-headers -v1 is much improved (on x86), but there's
plenty of work left, such as <net/cfg80211.h>. :-)

> Linus didn't really like my approach,

Yeah, so without having a significant build time speedup I didn't like my
approach(es) either, which is why I didn't post this tree for a long time. :-)

But the results speak for themselves IMO, and we cannot ignore this: my
project actually accelerated as I progressed, because the kernel rebuilds,
especially incremental ones, became faster and faster...

Linux kernel header dependencies need to be simplified.

> but I suspect he'll have similar
> concerns about your solution for linux/sched.h, especially if we end up
> applying the same hack to other commonly used structures (sk_buff,
> mm_struct, super_block) in the end.

So the per_task approach is pretty much unavoidable under the constraint of
having no runtime overhead, given that task_struct is a historic union of a
zillion types, where 99% of the users don't actually need to know about
those types.

( We could eventually get rid of per_task() as well, by turning complex
  embedded structs into pointers - but that has runtime overhead due to the
  indirections, and I tried hard to make this approach runtime-invariant,
  at least conceptually. )

The header splitting I've done is fundamentally clean (at least
aspirationally), mostly done along conceptual boundaries or API families.
It's how we'd have implemented many of those headers if we had a time
machine and went back 30 years. ;-)

> I should be able to come up with a less handwavy reply after I've
> actually studied your approach better.

Looking forward to it!

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 54+ messages in thread
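As a small worked illustration of the 'MLOC' footprint estimate described in the message above (number of usages times preprocessed, comment-stripped header size), here is a minimal sketch in C. The helper names are hypothetical and not part of the fast-headers tree. The rows checked below match the table when the product is truncated, not rounded, to one decimal; that is an observation from the arithmetic, not something the message states.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not from the fast-headers tree): cumulative
 * footprint of one header, in stripped lines of code, computed as
 * "usages in the build" times "preprocessed stripped header size".
 */
static inline uint64_t demo_footprint_loc(uint64_t usages, uint64_t header_loc)
{
	return usages * header_loc;
}

/*
 * The same footprint expressed in tenths of a million lines (MLOC),
 * truncated toward zero: 522 stands for "52.2 MLOC".
 */
static inline uint64_t demo_footprint_mloc_tenths(uint64_t usages,
						  uint64_t header_loc)
{
	return demo_footprint_loc(usages, header_loc) / 100000;
}
```

For the <linux/spinlock_api.h> row, 10,168 usages times 5,142 lines gives 52,283,856 lines, which is the 52.2 MLOC shown in the table. It also shows why <net/cfg80211.h> still ranks high despite only 1,205 usages: its 29,004 stripped lines are paid for on every inclusion.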
* Re: [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" 2022-01-05 0:05 ` Ingo Molnar @ 2022-01-05 1:37 ` Arnd Bergmann 0 siblings, 0 replies; 54+ messages in thread From: Arnd Bergmann @ 2022-01-05 1:37 UTC (permalink / raw) To: Ingo Molnar Cc: Arnd Bergmann, Greg Kroah-Hartman, Linus Torvalds, Linux Kernel Mailing List, linux-arch, Andrew Morton, Peter Zijlstra, Thomas Gleixner, David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet, Al Viro On Tue, Jan 4, 2022 at 7:05 PM Ingo Molnar <mingo@kernel.org> wrote: > * Arnd Bergmann <arnd@arndb.de> wrote: > > > From what I could tell, linux/sched.h was not the only such problem, but > > I saw similarly bad issues with linux/fs.h (which is what I posted about > > in November/December), linux/mm.h and linux/netdevice.h on the high > > level, in low-level headers there are huge issues with linux/atomic.h, > > linux/mutex.h, linux/pgtable.h etc. I expect that you have addressed > > these as well, > > Correct, each of these was a problem - and a *lot* of other headers in > addition to those: > > kepler:~/mingo.tip.git> git diff --stat v5.16-rc8.. 
include/linux/ arch/*/include/asm/ | grep changed > > 1335 files changed, 59677 insertions(+), 56582 deletions(-) > > and I reduced all the kernels that showed up in the bloat-profile to a > fraction of their orignal size: > > ------------------------------------------------------------------------------------------ > | Combined, preprocessed C code size of header, without line markers, > | with comments stripped: > ------------------------------.-----------------------------.----------------------------- > | v5.16-rc7 | -fast-headers-v1 > |-----------------------------|----------------------------- > #include <linux/sched.h> | LOC: 13,292 | headers: 324 | LOC: 769 | headers: 64 > #include <linux/wait.h> | LOC: 9,369 | headers: 235 | LOC: 483 | headers: 46 > #include <linux/rcupdate.h> | LOC: 8,975 | headers: 224 | LOC: 1,385 | headers: 86 > #include <linux/hrtimer.h> | LOC: 10,861 | headers: 265 | LOC: 229 | headers: 37 > #include <linux/fs.h> | LOC: 22,497 | headers: 427 | LOC: 1,993 | headers: 120 > #include <linux/cred.h> | LOC: 17,257 | headers: 368 | LOC: 4,830 | headers: 129 > #include <linux/dcache.h> | LOC: 10,545 | headers: 253 | LOC: 858 | headers: 65 > #include <linux/cgroup.h> | LOC: 33,518 | headers: 522 | LOC: 2,477 | headers: 111 > #include <linux/module.h> | LOC: 16,948 | headers: 339 | LOC: 2,239 | headers: 122 > #include <linux/kobject.h> | LOC: 15,210 | headers: 318 | LOC: 799 | headers: 59 > #include <linux/device.h> | LOC: 20,505 | headers: 408 | LOC: 2,131 | headers: 123 > #include <linux/gfp.h> | LOC: 13,543 | headers: 303 | LOC: 181 | headers: 26 > #include <linux/slab.h> | LOC: 14,037 | headers: 307 | LOC: 999 | headers: 74 > #include <linux/mm.h> | LOC: 26,727 | headers: 453 | LOC: 1,855 | headers: 133 > #include <linux/mmzone.h> | LOC: 12,755 | headers: 293 | LOC: 832 | headers: 64 > #include <linux/swap.h> | LOC: 38,292 | headers: 559 | LOC: 11,085 | headers: 294 > #include <linux/writeback.h> | LOC: 36,481 | headers: 550 | LOC: 1,566 | 
headers: 92 > #include <linux/gfp.h> | LOC: 13,543 | headers: 303 | LOC: 181 | headers: 26 > #include <linux/skbuff.h> | LOC: 36,130 | headers: 558 | LOC: 1,209 | headers: 89 > #include <linux/tcp.h> | LOC: 60,133 | headers: 725 | LOC: 3,829 | headers: 153 > #include <linux/udp.h> | LOC: 59,411 | headers: 721 | LOC: 3,236 | headers: 146 > #include <linux/filter.h> | LOC: 54,172 | headers: 689 | LOC: 4,087 | headers: 73 > #include <linux/interrupt.h> | LOC: 14,085 | headers: 340 | LOC: 2,629 | headers: 124 > > #include <net/sock.h> | LOC: 58,880 | headers: 715 | LOC: 1,543 | headers: 98 > > #include <asm/processor.h> | LOC: 7,821 | headers: 204 | LOC: 618 | headers: 41 > #include <asm/page.h> | LOC: 1,540 | headers: 97 | LOC: 1,193 | headers: 82 > #include <asm/pgtable.h> | LOC: 12,949 | headers: 297 | LOC: 5,742 | headers: 217 Ok, this is roughly the list of headers that I had looked at previously. > <linux/atomic.h> wasn't a particularly big problem - but it does get > included everywhere, so I moved the most common atomic_t definition into > <linux/types.h> (on 64-bit kernels), which allowed a big reduction for the > majority of cases that don't use the atomic APIs: Good, I have a patch for the same thing, including moving atomic64_t and atomic_long_t to linux/types.h there -- I don't think it would be good to have it in different places on 32-bit architectures. On arm machines, I found atomic.h to be problematic because it is a large generated header that depends on the barriers which in turn require other stuff. > #include <linux/atomic.h> | LOC: 176 | headers: 26 > #include <linux/atomic_api.h> | LOC: 2,785 | headers: 52 > > But <linux/atomic_api.h> is still included in ~75% of .c files, mostly for > good reasons, because it's a very popular low level API. These are the x86 numbers, right? > > but I'd like to make sure that your changes are reasonably complete on > > arm32 and arm64 to avoid having to do the big cleanup more than once. 
>
> I did test ARM64 extensively in terms of build coverage - but not in terms
> of header bloat, and I'm sure more could be done there!

My guess is that each architecture has a couple of dark corners that
require cleaning up before we actually see the benefit of the series. I'm
personally most interested in arm32 and arm64 because that's what I do my
testing on, and I'll try to find those corners.

One thing I remember for arm32 is that there is a nasty dependency for
get_current() -> PAGE_SIZE -> asm/pgtable.h, with pgtable including the
world again. You probably got this one, but any such missing thing can
lead to the other cleanups not helping that much.

> > My approach to the large mid-level headers is somewhat different: rather
> > than completely avoiding them from getting included, I would like to
> > split up the structure definitions from the inline functions.
>
> That's a big chunk of what the -fast-headers tree does: I've split over 85
> headers into <linux/header_types.h> and <linux/header_api.h>...
>
> I've also split up headers further where needed, in particular mm.h
> required multiple levels of splitting to get the dependencies of the most
> commonly used <linux/mm_types.h> and <linux/mm_api.h> headers under
> control:
>
>   kepler:~/mingo.tip.git> ls -ldt include/linux/mm*api*.h
>   -rw-rw-r-- 1 mingo mingo 77130 Jan  4 13:32 include/linux/mm_api.h
>   -rw-rw-r-- 1 mingo mingo 22227 Jan  4 13:32 include/linux/mmzone_api.h
>   -rw-rw-r-- 1 mingo mingo  6759 Jan  4 13:32 include/linux/mm_api_extra.h
>   -rw-rw-r-- 1 mingo mingo   479 Jan  4 13:31 include/linux/mm_api_exe_file.h
>   -rw-rw-r-- 1 mingo mingo   960 Jan  4 13:31 include/linux/mm_api_truncate.h
>   -rw-rw-r-- 1 mingo mingo  1262 Jan  4 13:31 include/linux/mm_api_kvmalloc.h
>   -rw-rw-r-- 1 mingo mingo   719 Jan  4 13:31 include/linux/mm_api_gate_area.h
>   -rw-rw-r-- 1 mingo mingo  1342 Jan  4 13:31 include/linux/mm_api_kasan.h
>   -rw-rw-r-- 1 mingo mingo  3007 Jan  4 13:31 include/linux/mm_api_tlb_flush.h

Ah, good.
That is pretty close to what I had in mind as well, so maybe we can convince Linus after all. ;-) > The results are pretty nice: > > # vanilla: > > #include <linux/mm.h> | LOC: 26,728 | headers: 453 > > # -fast-headers: > > #include <linux/mm.h> | LOC: 1,855 | headers: 132 # == mm_types.h > #include <linux/mm_types.h> | LOC: 1,855 | headers: 131 > #include <linux/mm_api.h> | LOC: 8,587 | headers: 229 > > And <linux/mm_api.h> is now included only in about 25% of the .c files - in > the vanilla kernel the use percentage is over ~90%. > > But despite all those reductions, <linux/mm_api.h> is still a header with > one of the largest cumulative footprints within a (distro) kernel build: > > | stripped lines of code > | _____________________________ > | | headers included recursively > | | _______________________________ > | | | usage in a distro kernel build > ____________ | | | _________________________________________ > | header name | | | | million lines of comment-stripped C code > | | | | | > #include <linux/spinlock_api.h> | LOC: 5,142 | headers: 123 | 10,168 | MLOC: 52.2 | ############# > #include <linux/device/driver.h> | LOC: 4,132 | headers: 169 | 12,306 | MLOC: 50.8 | ############ > #include <linux/mm_api.h> | LOC: 8,584 | headers: 230 | 5,135 | MLOC: 44.0 | ########### > #include <linux/skbuff_api.h> | LOC: 8,404 | headers: 190 | 5,065 | MLOC: 42.5 | ########## > #include <linux/atomic_api.h> | LOC: 2,785 | headers: 52 | 15,282 | MLOC: 42.5 | ########## > #include <asm/spinlock.h> | LOC: 4,039 | headers: 83 | 10,168 | MLOC: 41.0 | ########## > #include <asm/qrwlock.h> | LOC: 4,039 | headers: 82 | 10,168 | MLOC: 41.0 | ########## > #include <asm-generic/qrwlock.h> | LOC: 4,039 | headers: 81 | 10,168 | MLOC: 41.0 | ########## > #include <linux/page_ref.h> | LOC: 5,397 | headers: 168 | 7,578 | MLOC: 40.8 | ########## > #include <asm/qspinlock.h> | LOC: 3,990 | headers: 80 | 10,169 | MLOC: 40.5 | ########## > #include <linux/device_types.h> | LOC: 2,131 | 
headers: 122 | 17,424 | MLOC: 37.1 | ######### > #include <linux/module.h> | LOC: 2,239 | headers: 122 | 16,472 | MLOC: 36.8 | ######### > #include <net/cfg80211.h> | LOC: 29,004 | headers: 423 | 1,205 | MLOC: 34.9 | ######## > #include <linux/pci.h> | LOC: 7,092 | headers: 232 | 4,849 | MLOC: 34.3 | ######## > #include <linux/netdevice_api.h> | LOC: 8,434 | headers: 225 | 4,065 | MLOC: 34.2 | ######## > #include <linux/refcount_api.h> | LOC: 3,421 | headers: 87 | 9,776 | MLOC: 33.4 | ######## > > ( The 'MLOC' footprint estimate is number of usages times > preprocessed-stripped-header size. ) This is also the metric that I used in my scripts, except I measured the preprocessed size in bytes instead of lines, which should make little difference. > I've reduced header bloat through three primary angles of attack: > > - reducing number of inclusions > > - reducing header size itself, by type/API splitting & by segmenting > headers along API usage frequency > > - decoupling headers from each other > > As you can see, fast-headers -v1 is much improved (on x86), but there's > plenty of work left, such as <net/cfg80211.h>. :-) Right. I mainly focused on splitting types from the rest, which I think brings most of the benefits, but taking it further as you did here helps more. > > Linus didn't really like my approach, > > Yeah, so without having a significant build time speedup I didn't like my > approach(es) either, which is why I didn't post this tree for a long time. :-) > > But the results speak for themselves IMO, and we cannot ignore this: my > project actually accelerated as I progressed, because the kernel rebuilds, > especially incremental ones, became faster and faster... > > Linux kernel header dependencies need to be simplified. Agreed. 
In my 2020 experiments, I managed to get from the point of cleaning up ~100 headers with very little effect (when everything was still included through some other header) to cleaning up the next 100 and seeing huge improvements but also getting discouraged because it started breaking every driver due to missing indirect includes. > > but I suspect he'll have similar > > concerns about your solution for linux/sched.h, especially if we end up > > applying the same hack to other commonly used structures (sk_buff, > > mm_struct, super_block) in the end. > > So the per_task approach is pretty much unavoidable under the constraint of > having no runtime overhead, given that task_struct is a historic union of a > zillion types, where 99% of the users don't actually need to know about > those types. > > ( We could eventually get rid of per_task() as well, by turning complex > embedded structs into pointers - but that has runtime overhead due to the > indirections, and I tried hard to make this approach runtime-invariant, > at least conceptually. ) Would it be possible to have one common task_struct definition that has all the frequently-accessed fields, plus another larger structure that embeds the smaller structure plus all the other stuff? I suppose that would require even larger scale reworks, but it may be a nicer end result. (again, I have yet to read your patches, so there is probably an obvious answer why you didn't do this). Arnd ^ permalink raw reply [flat|nested] 54+ messages in thread
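To picture the layout Arnd asks about in the message above (a small common structure holding the frequently accessed fields, embedded in a larger structure that carries everything else), here is a minimal standalone sketch. All names and fields are hypothetical and chosen only for illustration; nothing here is taken from the fast-headers tree or from Arnd's patches, and the thread does not say whether such a split would be workable for task_struct.

```c
#include <stddef.h>

/* Hypothetical "hot" part: what most users of the type need to see. */
struct demo_task_core {
	int		on_rq;
	unsigned int	flags;
};

/*
 * Hypothetical full structure: embeds the core and adds the rest.
 * Only code that needs the extra fields has to see this definition.
 */
struct demo_task_full {
	struct demo_task_core	core;
	char			comm[16];
	long			rarely_used[32];
};

/* Userspace stand-in for the kernel's container_of() helper. */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Code that only knows the core type can still do useful work ... */
static inline int demo_task_is_queued(const struct demo_task_core *core)
{
	return core->on_rq != 0;
}

/* ... and code that knows the full type can get back to it. */
static inline struct demo_task_full *
demo_task_full_of(struct demo_task_core *core)
{
	return demo_container_of(core, struct demo_task_full, core);
}
```

The trade-off visible even in this toy version is that every access to a non-core field has to go through the full type, so each such user would need converting, which is presumably the "even larger scale reworks" Arnd mentions.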
* Re: [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell"
  2022-01-04 17:51 ` [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" Arnd Bergmann
  2022-01-05  0:05   ` Ingo Molnar
@ 2022-01-05  9:37   ` Andy Shevchenko
  1 sibling, 0 replies; 54+ messages in thread
From: Andy Shevchenko @ 2022-01-05 9:37 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Ingo Molnar, Greg Kroah-Hartman, Linus Torvalds,
	Linux Kernel Mailing List, linux-arch, Andrew Morton,
	Peter Zijlstra, Thomas Gleixner, David S. Miller, Ard Biesheuvel,
	Josh Poimboeuf, Jonathan Corbet, Al Viro

On Wed, Jan 5, 2022 at 4:08 AM Arnd Bergmann <arnd@arndb.de> wrote:
> On Mon, Jan 3, 2022 at 6:12 AM Ingo Molnar <mingo@kernel.org> wrote:

...

> Most of the patches should be the same either way (adding back
> missing includes to drivers, and doing cleanups to commonly
> included headers to avoid the deep nesting), the interesting bit
> will be how to properly define the larger structures without pulling
> in the rest of the world.

I'm wondering if the compiler can provide us the statistics of usage
on a per custom type basis. In this case the highest frequency will
probably mean that we better have that type in a separate header or
tree of _independent_ headers.

-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply	[flat|nested] 54+ messages in thread
* [PATCH] per_task: Implement single template to define 'struct task_struct_per_task' fields and offsets 2022-01-03 10:11 ` [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" Greg Kroah-Hartman 2022-01-03 11:12 ` Ingo Molnar @ 2022-01-04 14:05 ` Ingo Molnar 1 sibling, 0 replies; 54+ messages in thread From: Ingo Molnar @ 2022-01-04 14:05 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: Linus Torvalds, linux-kernel, linux-arch, Andrew Morton, Peter Zijlstra, Thomas Gleixner, David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet, Al Viro * Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote: > > Techniques used by the fast-headers tree to reduce header size & dependencies: > > > > - Aggressive decoupling of high level headers from each other, starting > > with <linux/sched.h>. Since 'struct task_struct' is a union of many > > subsystems, there's a new "per_task" infrastructure modeled after the > > per_cpu framework, which creates fields in task_struct without having > > to modify sched.h or the 'struct task_struct' type: > > > > DECLARE_PER_TASK(type, name); > > ... > > per_task(current, name) = val; > > > > The per_task() facility then seamlessly creates an offset into the > > task_struct->per_task_area[] array, and uses the asm-offsets.h > > mechanism to create offsets into it early in the build. > > > > There's no runtime overhead disadvantage from using per_task() framework, > > the generated code is functionally equivalent to types embedded in > > task_struct. > > This is "interesting", but how are you going to keep the > kernel/sched/per_task_area_struct_defs.h and struct task_struct_per_task > definition in sync? It seems that you manually created this (which is > great for testing), but over the long-term, trying to manually determine > what needs to be done here to keep everything lined up properly is going > to be a major pain. 

On second thought, I found a solution for this problem and implemented it
- delta patch attached.

The idea is to unify the two files into a single 'template' definition in:

   kernel/sched/per_task_area_struct_template.h

... with the following, slightly non-standard syntax:

   #ifdef CONFIG_THREAD_INFO_IN_TASK
           /*
            * For reasons of header soup (see current_thread_info()), this
            * must be the first element of task_struct.
            */
           DEF(    struct thread_info,     ti              );
   #endif
           DEF(    void *,                 stack           );
           DEF(    refcount_t,             usage           );

           /* Per task flags (PF_*), defined further below: */
           DEF(    unsigned int,           flags           );
           DEF(    unsigned int,           ptrace          );

This looks 'almost' like a C structure definition - but is wrapped in the
DEF() macro.

Once we have that template, we can use it both to generate the
'struct task_struct_per_task' definition, and to pick up the field offsets
for the per_task() asm-offsets.h machinery.

The advantage is that it solves the problems you mentioned above: the
per-task structure and the offset definitions can never get out of sync -
the #ifdefs and the field names will always match.

It's also a net reduction in code:

   3 files changed, 216 insertions(+), 341 deletions(-)

Does this approach look better to you?

This patch builds and boots fine in the latest -fast-headers tree.

I'm still of two minds about whether to keep the per-task structure tucked
away in kernel/sched/, hopefully creating a barrier against spurious
additions to task_struct by putting it next to scary scheduler code - or
should we move it into a more formal and easier to access/modify location
in include/sched/?

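The single-template idea described above is an instance of what is commonly called the X-macro pattern: one list of fields, expanded more than once with different definitions of the wrapping macro. The following is a minimal userspace sketch of that pattern, not the patch's code. It makes two simplifying assumptions: the field list lives in a list macro (hypothetical name DEMO_PER_TASK_FIELDS) so the example fits in one file, where the patch re-includes a template header; and the offsets go into a plain lookup table, where the patch feeds them to the asm-offsets.h machinery via DEFINE(). Array members, handled by DEF_A() in the patch, are left out.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical field list, standing in for the template header.
 * Each entry is written once and expanded twice below.
 */
#define DEMO_PER_TASK_FIELDS(DEF)		\
	DEF(void *,		stack)		\
	DEF(unsigned int,	flags)		\
	DEF(unsigned int,	ptrace)		\
	DEF(int,		on_rq)

/* Expansion 1: every entry becomes a struct member. */
#define DEMO_MEMBER(type, name)	type name;

struct demo_per_task {
	DEMO_PER_TASK_FIELDS(DEMO_MEMBER)
};

/* Expansion 2: every entry becomes a { name, offset } table row. */
struct demo_offset {
	const char	*name;
	size_t		offset;
};

#define DEMO_OFFSET(type, name) \
	{ #name, offsetof(struct demo_per_task, name) },

static const struct demo_offset demo_offsets[] = {
	DEMO_PER_TASK_FIELDS(DEMO_OFFSET)
};

#define DEMO_NR_FIELDS	(sizeof(demo_offsets) / sizeof(demo_offsets[0]))

/* Look up a field's offset by name; returns (size_t)-1 if unknown. */
static size_t demo_offset_of(const char *name)
{
	size_t i;

	for (i = 0; i < DEMO_NR_FIELDS; i++)
		if (strcmp(demo_offsets[i].name, name) == 0)
			return demo_offsets[i].offset;
	return (size_t)-1;
}
```

Because both expansions walk the same list, adding, removing or conditionally compiling a field changes the structure and the offset table together, which is the "can never get out of sync" property the message above describes.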
Another additional (minor) advantage would be that these uglies: arch/arm64/kernel/asm-offsets.c:#include "../../../kernel/sched/per_task_area_struct.h" arch/arm64/kernel/asm-offsets.c:#include "../../../kernel/sched/per_task_area_struct_defs.h" arch/mips/kernel/asm-offsets.c:#include "../../../kernel/sched/per_task_area_struct.h" arch/mips/kernel/asm-offsets.c:#include "../../../kernel/sched/per_task_area_struct_defs.h" arch/sparc/kernel/asm-offsets.c:#include "../../../kernel/sched/per_task_area_struct.h" arch/sparc/kernel/asm-offsets.c:#include "../../../kernel/sched/per_task_area_struct_defs.h" arch/x86/kernel/asm-offsets.c:#include "../../../kernel/sched/per_task_area_struct.h" arch/x86/kernel/asm-offsets.c:#include "../../../kernel/sched/per_task_area_struct_defs.h" would turn into standard include lines: #include <linux/sched/per_task_defs.h> Thanks, Ingo ======================> From: Ingo Molnar <mingo@kernel.org> Date: Tue, 4 Jan 2022 14:31:12 +0100 Subject: [PATCH] per_task: Implement single template to define 'struct task_struct_per_task' fields and offsets Greg observed that the 'struct task_struct_per_task definition' and the offset definitions are structural duplicates of each other: kernel/sched/per_task_area_struct.h kernel/sched/per_task_area_struct_defs.h These require care during maintenance and could get out of sync. To address this problem, introduce a single definition template: kernel/sched/per_task_area_template.h And use the template and different preprocessor macros to implement the two pieces of functionality. The syntax in the template is C-alike struct field definitions, wrapped in the DEF() and DEF_A() macros: #ifdef CONFIG_THREAD_INFO_IN_TASK /* * For reasons of header soup (see current_thread_info()), this * must be the first element of task_struct. 
*/ DEF( struct thread_info, ti ); #endif DEF( void *, stack ); DEF( refcount_t, usage ); /* Per task flags (PF_*), defined further below: */ DEF( unsigned int, flags ); DEF( unsigned int, ptrace ); Reported-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> --- kernel/sched/per_task_area_struct.h | 196 ++------------------------ kernel/sched/per_task_area_struct_defs.h | 163 ++-------------------- kernel/sched/per_task_area_struct_template.h | 198 +++++++++++++++++++++++++++ 3 files changed, 216 insertions(+), 341 deletions(-) diff --git a/kernel/sched/per_task_area_struct.h b/kernel/sched/per_task_area_struct.h index fad3c24df500..4508160e49ec 100644 --- a/kernel/sched/per_task_area_struct.h +++ b/kernel/sched/per_task_area_struct.h @@ -40,194 +40,16 @@ #include "sched.h" -struct task_struct_per_task { -#ifdef CONFIG_THREAD_INFO_IN_TASK - /* - * For reasons of header soup (see current_thread_info()), this - * must be the first element of task_struct. - */ - struct thread_info ti; -#endif - void *stack; - refcount_t usage; - /* Per task flags (PF_*), defined further below: */ - unsigned int flags; - unsigned int ptrace; - -#ifdef CONFIG_SMP - int on_cpu; - struct __call_single_node wake_entry; -#ifdef CONFIG_THREAD_INFO_IN_TASK - /* Current CPU: */ - unsigned int cpu; -#endif - unsigned int wakee_flips; - unsigned long wakee_flip_decay_ts; - struct task_struct *last_wakee; - int recent_used_cpu; - int wake_cpu; -#endif - int on_rq; - struct sched_class *sched_class; - struct sched_entity se; - struct sched_rt_entity rt; - struct sched_dl_entity dl; - -#ifdef CONFIG_SCHED_CORE - struct rb_node core_node; - unsigned long core_cookie; - unsigned int core_occupation; -#endif - -#ifdef CONFIG_CGROUP_SCHED - struct task_group *sched_task_group; -#endif - -#ifdef CONFIG_UCLAMP_TASK - /* - * Clamp values requested for a scheduling entity. - * Must be updated with task_rq_lock() held. 
- */ - struct uclamp_se uclamp_req[UCLAMP_CNT]; - /* - * Effective clamp values used for a scheduling entity. - * Must be updated with task_rq_lock() held. - */ - struct uclamp_se uclamp[UCLAMP_CNT]; -#endif - -#ifdef CONFIG_PREEMPT_NOTIFIERS - /* List of struct preempt_notifier: */ - struct hlist_head preempt_notifiers; -#endif - -#ifdef CONFIG_BLK_DEV_IO_TRACE - unsigned int btrace_seq; -#endif - - const cpumask_t *cpus_ptr; - cpumask_t *user_cpus_ptr; - cpumask_t cpus_mask; -#ifdef CONFIG_TASKS_RCU - unsigned long rcu_tasks_nvcsw; - u8 rcu_tasks_holdout; - u8 rcu_tasks_idx; - int rcu_tasks_idle_cpu; - struct list_head rcu_tasks_holdout_list; -#endif /* #ifdef CONFIG_TASKS_RCU */ - struct sched_info sched_info; - -#ifdef CONFIG_SMP - struct plist_node pushable_tasks; - struct rb_node pushable_dl_tasks; -#endif - /* Per-thread vma caching: */ - struct vmacache vmacache; - -#ifdef SPLIT_RSS_COUNTING - struct task_rss_stat rss_stat; -#endif - struct restart_block restart_block; - struct prev_cputime prev_cputime; -#ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN - struct vtime vtime; -#endif -#ifdef CONFIG_NO_HZ_FULL - atomic_t tick_dep_mask; -#endif - /* Empty if CONFIG_POSIX_CPUTIMERS=n */ - struct posix_cputimers posix_cputimers; - -#ifdef CONFIG_POSIX_CPU_TIMERS_TASK_WORK - struct posix_cputimers_work posix_cputimers_work; -#endif +/* Simple struct members: */ +#define DEF(type, name) type name -#ifdef CONFIG_SYSVIPC - struct sysv_sem sysvsem; - struct sysv_shm sysvshm; -#endif - sigset_t blocked; - sigset_t real_blocked; - /* Restored if set_restore_sigmask() was used: */ - sigset_t saved_sigmask; - struct sigpending pending; - kuid_t loginuid; - struct seccomp seccomp; - /* Protection against (de-)allocation: mm, files, fs, tty, keyrings, mems_allowed, mempolicy: */ - spinlock_t alloc_lock; +/* Array members: */ +#define DEF_A(type, name, size) type name size - /* Protection of the PI data structures: */ - raw_spinlock_t pi_lock; - -#ifdef CONFIG_RT_MUTEXES - /* PI 
waiters blocked on a rt_mutex held by this task: */ - struct rb_root_cached pi_waiters; -#endif - -#ifdef CONFIG_DEBUG_MUTEXES - /* Mutex deadlock detection: */ - struct mutex_waiter *blocked_on; -#endif - kernel_siginfo_t *last_siginfo; -#ifdef CONFIG_CPUSETS - /* Protected by ->alloc_lock: */ - nodemask_t mems_allowed; - /* Sequence number to catch updates: */ - seqcount_spinlock_t mems_allowed_seq; - int cpuset_mem_spread_rotor; - int cpuset_slab_spread_rotor; -#endif - struct mutex futex_exit_mutex; -#ifdef CONFIG_PERF_EVENTS - struct perf_event_context *perf_event_ctxp[perf_nr_task_contexts]; - struct mutex perf_event_mutex; - struct list_head perf_event_list; -#endif -#ifdef CONFIG_RSEQ - struct rseq __user *rseq; -#endif - struct tlbflush_unmap_batch tlb_ubc; - - refcount_t rcu_users; - struct rcu_head rcu; - - struct page_frag task_frag; - -#ifdef CONFIG_KCSAN - struct kcsan_ctx kcsan_ctx; -#ifdef CONFIG_TRACE_IRQFLAGS - struct irqtrace_events kcsan_save_irqtrace; -#endif -#endif - -#ifdef CONFIG_FUNCTION_GRAPH_TRACER - - /* - * Number of functions that haven't been traced - * because of depth overrun: - */ - atomic_t trace_overrun; - - /* Pause tracing: */ - atomic_t tracing_graph_pause; -#endif -#ifdef CONFIG_KMAP_LOCAL - struct kmap_ctrl kmap_ctrl; -#endif - int pagefault_disabled; -#ifdef CONFIG_VMAP_STACK - struct vm_struct *stack_vm_area; -#endif -#ifdef CONFIG_THREAD_INFO_IN_TASK - /* A live task holds one reference: */ - refcount_t stack_refcount; -#endif -#ifdef CONFIG_KRETPROBES - struct llist_head kretprobe_instances; -#endif +struct task_struct_per_task { +#include "per_task_area_struct_template.h" +}; - /* CPU-specific state of this task: */ - struct thread_struct thread; +#undef DEF_A +#undef DEF - char _end; -}; diff --git a/kernel/sched/per_task_area_struct_defs.h b/kernel/sched/per_task_area_struct_defs.h index 71f2a2884958..1d9b2e039880 100644 --- a/kernel/sched/per_task_area_struct_defs.h +++ b/kernel/sched/per_task_area_struct_defs.h @@ 
-4,162 +4,17 @@ #include <linux/kbuild.h> -#define DEF_PER_TASK(name) DEFINE(PER_TASK_OFFSET__##name, offsetof(struct task_struct_per_task, name)) +#define DEF_PER_TASK(name) DEFINE(PER_TASK_OFFSET__##name, offsetof(struct task_struct_per_task, name)) -void __used per_task_common(void) -{ -#ifdef CONFIG_THREAD_INFO_IN_TASK - DEF_PER_TASK(ti); -#endif - DEF_PER_TASK(stack); - DEF_PER_TASK(usage); - DEF_PER_TASK(flags); - DEF_PER_TASK(ptrace); - -#ifdef CONFIG_SMP - DEF_PER_TASK(on_cpu); - DEF_PER_TASK(wake_entry); -#ifdef CONFIG_THREAD_INFO_IN_TASK - DEF_PER_TASK(cpu); -#endif - DEF_PER_TASK(wakee_flips); - DEF_PER_TASK(wakee_flip_decay_ts); - DEF_PER_TASK(last_wakee); - DEF_PER_TASK(recent_used_cpu); - DEF_PER_TASK(wake_cpu); -#endif - DEF_PER_TASK(on_rq); - DEF_PER_TASK(sched_class); - DEF_PER_TASK(se); - DEF_PER_TASK(rt); - DEF_PER_TASK(dl); - -#ifdef CONFIG_SCHED_CORE - DEF_PER_TASK(core_node); - DEF_PER_TASK(core_cookie); - DEF_PER_TASK(core_occupation); -#endif - -#ifdef CONFIG_CGROUP_SCHED - DEF_PER_TASK(sched_task_group); -#endif - -#ifdef CONFIG_UCLAMP_TASK - DEF_PER_TASK(uclamp_req); - DEF_PER_TASK(uclamp); -#endif - -#ifdef CONFIG_PREEMPT_NOTIFIERS - DEF_PER_TASK(preempt_notifiers); -#endif - -#ifdef CONFIG_BLK_DEV_IO_TRACE - DEF_PER_TASK(btrace_seq); -#endif - - DEF_PER_TASK(cpus_ptr); - DEF_PER_TASK(user_cpus_ptr); - DEF_PER_TASK(cpus_mask); -#ifdef CONFIG_TASKS_RCU - DEF_PER_TASK(rcu_tasks_nvcsw); - DEF_PER_TASK(rcu_tasks_holdout); - DEF_PER_TASK(rcu_tasks_idx); - DEF_PER_TASK(rcu_tasks_idle_cpu); - DEF_PER_TASK(rcu_tasks_holdout_list); -#endif - DEF_PER_TASK(sched_info); - -#ifdef CONFIG_SMP - DEF_PER_TASK(pushable_tasks); - DEF_PER_TASK(pushable_dl_tasks); -#endif - DEF_PER_TASK(vmacache); +#define DEF(type, name) DEF_PER_TASK(name) +#define DEF_A(type, name, size) DEF_PER_TASK(name) -#ifdef SPLIT_RSS_COUNTING - DEF_PER_TASK(rss_stat); -#endif - DEF_PER_TASK(restart_block); - DEF_PER_TASK(prev_cputime); -#ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN - 
DEF_PER_TASK(vtime); -#endif -#ifdef CONFIG_NO_HZ_FULL - DEF_PER_TASK(tick_dep_mask); -#endif - DEF_PER_TASK(posix_cputimers); -#ifdef CONFIG_POSIX_CPU_TIMERS_TASK_WORK - DEF_PER_TASK(posix_cputimers_work); -#endif - -#ifdef CONFIG_SYSVIPC - DEF_PER_TASK(sysvsem); - DEF_PER_TASK(sysvshm); -#endif - DEF_PER_TASK(blocked); - DEF_PER_TASK(real_blocked); - DEF_PER_TASK(saved_sigmask); - DEF_PER_TASK(pending); - DEF_PER_TASK(loginuid); - DEF_PER_TASK(seccomp); - DEF_PER_TASK(alloc_lock); - - DEF_PER_TASK(pi_lock); - -#ifdef CONFIG_RT_MUTEXES - DEF_PER_TASK(pi_waiters); -#endif - -#ifdef CONFIG_DEBUG_MUTEXES - DEF_PER_TASK(blocked_on); -#endif - DEF_PER_TASK(last_siginfo); -#ifdef CONFIG_CPUSETS - DEF_PER_TASK(mems_allowed); - DEF_PER_TASK(mems_allowed_seq); - DEF_PER_TASK(cpuset_mem_spread_rotor); - DEF_PER_TASK(cpuset_slab_spread_rotor); -#endif - DEF_PER_TASK(futex_exit_mutex); -#ifdef CONFIG_PERF_EVENTS - DEF_PER_TASK(perf_event_ctxp); - DEF_PER_TASK(perf_event_mutex); - DEF_PER_TASK(perf_event_list); -#endif -#ifdef CONFIG_RSEQ - DEF_PER_TASK(rseq); -#endif - DEF_PER_TASK(tlb_ubc); - - DEF_PER_TASK(rcu_users); - DEF_PER_TASK(rcu); - - DEF_PER_TASK(task_frag); +void __used per_task_common(void) +{ +#include "per_task_area_struct_template.h" +} -#ifdef CONFIG_KCSAN - DEF_PER_TASK(kcsan_ctx); -#ifdef CONFIG_TRACE_IRQFLAGS - DEF_PER_TASK(kcsan_save_irqtrace); -#endif -#endif +#undef DEF_A +#undef DEF -#ifdef CONFIG_FUNCTION_GRAPH_TRACER - DEF_PER_TASK(trace_overrun); - DEF_PER_TASK(tracing_graph_pause); -#endif -#ifdef CONFIG_KMAP_LOCAL - DEF_PER_TASK(kmap_ctrl); -#endif - DEF_PER_TASK(pagefault_disabled); -#ifdef CONFIG_VMAP_STACK - DEF_PER_TASK(stack_vm_area); -#endif -#ifdef CONFIG_THREAD_INFO_IN_TASK - DEF_PER_TASK(stack_refcount); -#endif -#ifdef CONFIG_KRETPROBES - DEF_PER_TASK(kretprobe_instances); -#endif - DEF_PER_TASK(thread); - DEF_PER_TASK(_end); -} diff --git a/kernel/sched/per_task_area_struct_template.h b/kernel/sched/per_task_area_struct_template.h new 
file mode 100644 index 000000000000..ed2ccd80c83c --- /dev/null +++ b/kernel/sched/per_task_area_struct_template.h @@ -0,0 +1,198 @@ + +/* + * This is the primary definition of per_task() fields, + * which gets turned into the 'struct task_struct_per_task' + * structure definition, and into offset definitions, + * in per_task_area_struct.h and per_task_area_struct_defs.h: + */ + +#ifdef CONFIG_THREAD_INFO_IN_TASK + /* + * For reasons of header soup (see current_thread_info()), this + * must be the first element of task_struct. + */ + DEF( struct thread_info, ti ); +#endif + DEF( void *, stack ); + DEF( refcount_t, usage ); + + /* Per task flags (PF_*), defined further below: */ + DEF( unsigned int, flags ); + DEF( unsigned int, ptrace ); + +#ifdef CONFIG_SMP + DEF( int, on_cpu ); + DEF( struct __call_single_node, wake_entry ); +#ifdef CONFIG_THREAD_INFO_IN_TASK + /* Current CPU: */ + DEF( unsigned int, cpu ); +#endif + DEF( unsigned int, wakee_flips ); + DEF( unsigned long, wakee_flip_decay_ts ); + DEF( struct task_struct *, last_wakee ); + DEF( int, recent_used_cpu ); + DEF( int, wake_cpu ); +#endif + DEF( int, on_rq ); + DEF( struct sched_class *, sched_class ); + DEF( struct sched_entity, se ); + DEF( struct sched_rt_entity, rt ); + DEF( struct sched_dl_entity, dl ); + +#ifdef CONFIG_SCHED_CORE + DEF( struct rb_node, core_node ); + DEF( unsigned long, core_cookie ); + DEF( unsigned int, core_occupation ); +#endif + +#ifdef CONFIG_CGROUP_SCHED + DEF( struct task_group *, sched_task_group ); +#endif + +#ifdef CONFIG_UCLAMP_TASK + /* + * Clamp values requested for a scheduling entity. + * Must be updated with task_rq_lock() held. + */ + DEF_A( struct uclamp_se, uclamp_req, [UCLAMP_CNT] ); + /* + * Effective clamp values used for a scheduling entity. + * Must be updated with task_rq_lock() held. 
+ */ + DEF_A( struct uclamp_se, uclamp, [UCLAMP_CNT] ); +#endif + +#ifdef CONFIG_PREEMPT_NOTIFIERS + /* List of struct preempt_notifier: */ + DEF( struct hlist_head, preempt_notifiers ); +#endif + +#ifdef CONFIG_BLK_DEV_IO_TRACE + DEF( unsigned int, btrace_seq ); +#endif + + DEF( const cpumask_t *, cpus_ptr ); + DEF( cpumask_t *, user_cpus_ptr ); + DEF( cpumask_t, cpus_mask ); +#ifdef CONFIG_TASKS_RCU + DEF( unsigned long, rcu_tasks_nvcsw ); + DEF( u8, rcu_tasks_holdout ); + DEF( u8, rcu_tasks_idx ); + DEF( int, rcu_tasks_idle_cpu ); + DEF( struct list_head, rcu_tasks_holdout_list ); +#endif /* #ifdef CONFIG_TASKS_RCU */ + DEF( struct sched_info, sched_info ); + +#ifdef CONFIG_SMP + DEF( struct plist_node, pushable_tasks ); + DEF( struct rb_node, pushable_dl_tasks ); +#endif + /* Per-thread vma caching: */ + DEF( struct vmacache, vmacache ); + +#ifdef SPLIT_RSS_COUNTING + DEF( struct task_rss_stat, rss_stat ); +#endif + DEF( struct restart_block, restart_block ); + DEF( struct prev_cputime, prev_cputime ); +#ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN + DEF( struct vtime, vtime ); +#endif +#ifdef CONFIG_NO_HZ_FULL + DEF( atomic_t, tick_dep_mask ); +#endif + /* Empty if CONFIG_POSIX_CPUTIMERS=n */ + DEF( struct posix_cputimers, posix_cputimers ); + +#ifdef CONFIG_POSIX_CPU_TIMERS_TASK_WORK + DEF( struct posix_cputimers_work, posix_cputimers_work ); +#endif + +#ifdef CONFIG_SYSVIPC + DEF( struct sysv_sem, sysvsem ); + DEF( struct sysv_shm, sysvshm ); +#endif + DEF( sigset_t, blocked ); + DEF( sigset_t, real_blocked ); + /* Restored if set_restore_sigmask() was used: */ + DEF( sigset_t, saved_sigmask ); + DEF( struct sigpending, pending ); + DEF( kuid_t, loginuid ); + DEF( struct seccomp, seccomp ); + /* Protection against (de-)allocation: mm, files, fs, tty, keyrings, mems_allowed, mempolicy: */ + DEF( spinlock_t, alloc_lock ); + + /* Protection of the PI data structures: */ + DEF( raw_spinlock_t, pi_lock ); + +#ifdef CONFIG_RT_MUTEXES + /* PI waiters blocked on a rt_mutex 
held by this task: */ + DEF( struct rb_root_cached, pi_waiters ); +#endif + +#ifdef CONFIG_DEBUG_MUTEXES + /* Mutex deadlock detection: */ + DEF( struct mutex_waiter *, blocked_on ); +#endif + DEF( kernel_siginfo_t *, last_siginfo ); +#ifdef CONFIG_CPUSETS + /* Protected by ->alloc_lock: */ + DEF( nodemask_t, mems_allowed ); + /* Sequence number to catch updates: */ + DEF( seqcount_spinlock_t, mems_allowed_seq ); + DEF( int, cpuset_mem_spread_rotor ); + DEF( int, cpuset_slab_spread_rotor ); +#endif + DEF( struct mutex, futex_exit_mutex ); +#ifdef CONFIG_PERF_EVENTS + DEF_A( struct perf_event_context *, perf_event_ctxp, [perf_nr_task_contexts] ); + DEF( struct mutex, perf_event_mutex ); + DEF( struct list_head, perf_event_list ); +#endif +#ifdef CONFIG_RSEQ + DEF( struct rseq __user *, rseq ); +#endif + DEF( struct tlbflush_unmap_batch, tlb_ubc ); + + DEF( refcount_t, rcu_users ); + DEF( struct rcu_head, rcu ); + + DEF( struct page_frag, task_frag ); + +#ifdef CONFIG_KCSAN + DEF( struct kcsan_ctx, kcsan_ctx ); +#ifdef CONFIG_TRACE_IRQFLAGS + DEF( struct irqtrace_events, kcsan_save_irqtrace ); +#endif +#endif + +#ifdef CONFIG_FUNCTION_GRAPH_TRACER + + /* + * Number of functions that haven't been traced + * because of depth overrun: + */ + DEF( atomic_t, trace_overrun ); + + /* Pause tracing: */ + DEF( atomic_t, tracing_graph_pause ); +#endif +#ifdef CONFIG_KMAP_LOCAL + DEF( struct kmap_ctrl, kmap_ctrl ); +#endif + DEF( int, pagefault_disabled ); +#ifdef CONFIG_VMAP_STACK + DEF( struct vm_struct *, stack_vm_area ); +#endif +#ifdef CONFIG_THREAD_INFO_IN_TASK + /* A live task holds one reference: */ + DEF( refcount_t, stack_refcount ); +#endif +#ifdef CONFIG_KRETPROBES + DEF( struct llist_head, kretprobe_instances ); +#endif + + /* CPU-specific state of this task: */ + DEF( struct thread_struct, thread ); + + DEF( char, _end ); ^ permalink raw reply related [flat|nested] 54+ messages in thread
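To illustrate the single-template approach of the patch above: the same field list is expanded twice, once with DEF()/DEF_A() defined to emit struct members and once to emit one offset record per field, so the struct and its offset table cannot drift apart. The sketch below is a standalone userspace analogue, not kernel code. The DEMO_FIELDS macro stands in for the #include of per_task_area_struct_template.h, and demo_per_task, demo_record(), demo_per_task_common() and the four fields are hypothetical names chosen for the example. The kernel version emits its offsets through DEFINE() from <linux/kbuild.h> instead of storing them in an array.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for per_task_area_struct_template.h:
 * one list of fields, written once.
 */
#define DEMO_FIELDS					\
	DEF(   void *,       stack         );		\
	DEF(   unsigned int, flags         );		\
	DEF_A( int,          counters, [4] );		\
	DEF(   char,         _end          );

/* Expansion 1: turn every entry into a struct member. */
#define DEF(type, name)		type name
#define DEF_A(type, name, size)	type name size

struct demo_per_task {
	DEMO_FIELDS
};

#undef DEF_A
#undef DEF

/* Example-only invariant: the first field sits at offset 0. */
static_assert(offsetof(struct demo_per_task, stack) == 0,
	      "stack must be the first field");

/* Illustrative offset table; the kernel uses DEFINE() here instead. */
struct demo_offset {
	const char *name;
	size_t offset;
};

static struct demo_offset demo_offsets[16];
static size_t demo_nr_offsets;

static void demo_record(const char *name, size_t offset)
{
	demo_offsets[demo_nr_offsets].name = name;
	demo_offsets[demo_nr_offsets].offset = offset;
	demo_nr_offsets++;
}

/* Expansion 2: turn every entry into one offset record. */
#define DEF(type, name) \
	demo_record(#name, offsetof(struct demo_per_task, name))
#define DEF_A(type, name, size)	DEF(type, name)

static void demo_per_task_common(void)
{
	demo_nr_offsets = 0;
	DEMO_FIELDS
}

#undef DEF_A
#undef DEF
```

Adding a field to DEMO_FIELDS updates both the struct layout and the offset table in one place, which is the property the patch is after.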
* Re: [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" [not found] <YdIfz+LMewetSaEB@gmail.com> 2022-01-03 10:11 ` [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" Greg Kroah-Hartman @ 2022-01-03 13:54 ` Kirill A. Shutemov 2022-01-04 10:54 ` Ingo Molnar 2022-01-03 17:54 ` [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" Nathan Chancellor ` (4 subsequent siblings) 6 siblings, 1 reply; 54+ messages in thread From: Kirill A. Shutemov @ 2022-01-03 13:54 UTC (permalink / raw) To: Ingo Molnar Cc: Linus Torvalds, linux-kernel, linux-arch, Andrew Morton, Peter Zijlstra, Thomas Gleixner, Greg Kroah-Hartman, David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet, Al Viro On Sun, Jan 02, 2022 at 10:57:35PM +0100, Ingo Molnar wrote: > - As to testing & runtime behavior: while all of these patches are > intended to be bug-free, I did find a couple of semi-bugs in the kernel > where a specific order of headers guaranteed a particular code > generation outcome - and if that header order was disturbed, the kernel > would silently break and fail to boot ... Looks like you are doing a lot of uninlining. Do you see any runtime performance degradation with the patchset? -- Kirill A. Shutemov ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" 2022-01-03 13:54 ` [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" Kirill A. Shutemov @ 2022-01-04 10:54 ` Ingo Molnar 2022-01-04 13:34 ` Greg Kroah-Hartman 0 siblings, 1 reply; 54+ messages in thread From: Ingo Molnar @ 2022-01-04 10:54 UTC (permalink / raw) To: Kirill A. Shutemov Cc: Linus Torvalds, linux-kernel, linux-arch, Andrew Morton, Peter Zijlstra, Thomas Gleixner, Greg Kroah-Hartman, David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet, Al Viro * Kirill A. Shutemov <kirill@shutemov.name> wrote: > On Sun, Jan 02, 2022 at 10:57:35PM +0100, Ingo Molnar wrote: > > - As to testing & runtime behavior: while all of these patches are > > intended to be bug-free, I did find a couple of semi-bugs in the kernel > > where a specific order of headers guaranteed a particular code > > generation outcome - and if that header order was disturbed, the kernel > > would silently break and fail to boot ... > > Looks like you are doing a lot of uninlining. Do you see any runtime > performance degradation with the patchset? I haven't tested that yet - and it's pretty hard to performance test uninlining patches directly. But what I've done is that I basically looked at the context and tried to make a judgement call based on generated code.
In all the uninlining patches where I thought it might not be clear whether it's proper to uninline I added detailed analysis, such as this one: commit d94530f1abcbfd2500e90e151e7c67ff48ab3259 Author: Ingo Molnar <mingo@kernel.org> Date: Sat Nov 20 18:20:58 2021 +0100 headers/uninline: Uninline multi-use function: put_page() Ever since the page_is_devmap_managed() logic was added to put_page() in: 07d802699528: ("mm: devmap: refactor 1-based refcounting for ZONE_DEVICE pages") put_page() has become a much larger function of over 2 dozen instructions: 0000000000004d30 <put_page>: 4d30: e8 00 00 00 00 call 4d35 <put_page+0x5> 4d35: 55 push %rbp 4d36: 48 8b 47 08 mov 0x8(%rdi),%rax 4d3a: 48 8d 50 ff lea -0x1(%rax),%rdx 4d3e: a8 01 test $0x1,%al 4d40: 48 89 e5 mov %rsp,%rbp 4d43: 48 0f 45 fa cmovne %rdx,%rdi 4d47: 66 90 xchg %ax,%ax 4d49: f0 ff 4f 34 lock decl 0x34(%rdi) 4d4d: 74 27 je 4d76 <put_page+0x46> 4d4f: 5d pop %rbp 4d50: c3 ret 4d51: 48 8b 07 mov (%rdi),%rax 4d54: 48 c1 e8 33 shr $0x33,%rax 4d58: 83 e0 07 and $0x7,%eax 4d5b: 83 f8 04 cmp $0x4,%eax 4d5e: 75 e9 jne 4d49 <put_page+0x19> 4d60: 48 8b 47 08 mov 0x8(%rdi),%rax 4d64: 8b 40 68 mov 0x68(%rax),%eax 4d67: 83 e8 01 sub $0x1,%eax 4d6a: 83 f8 01 cmp $0x1,%eax 4d6d: 77 da ja 4d49 <put_page+0x19> 4d6f: e8 00 00 00 00 call 4d74 <put_page+0x44> 4d74: 5d pop %rbp 4d75: c3 ret 4d76: e8 00 00 00 00 call 4d7b <put_page+0x4b> 4d7b: 5d pop %rbp 4d7c: c3 ret Uninline it. To counter some of the runtime overhead of the extra function call, inline the __put_page() instance into put_page() - this is now possible without extra bloat. There's a measurable improvement in vmlinux text size, on a distro kernel build, by ~4 KB. Doing so also decouples <linux/mm_api.h> from <linux/memremap.h>. Signed-off-by: Ingo Molnar <mingo@kernel.org> I think it's pretty much a given that we don't want to inline 2 dozen instructions for every put_page() call and we don't need performance testing. 
Admittedly my 'judgement call' was colored by the overall goal to decouple types and headers, so please do double check! None of the uninlining patches are critical to this tree - there's various other ways headers can be decoupled other than uninlining. There's one happy exception though, all the uninlining patches that uninline a single-call function are probably fine as-is: ef1028c44345 headers/uninline: Uninline single-use function: mips: page_size_ftlb() 98bc89e85e3f headers/uninline: Uninline single-use function: set_page_links() e368b54381e9 headers/uninline: Uninline single-use function: cpupid_to_nid() 36b59978a96d headers/uninline: Uninline single-use function: wb_domain_size_changed() 4c95e8f21924 headers/uninline: Uninline single-use function: skb_metadata_differs() 28195c3f7eba headers/uninline: Uninline single-use function: for_each_netdev_feature() 3c82b720eb01 headers/uninline: Uninline single-use function: SPI_STATISTICS_ADD_*() e7c48e440df3 headers/uninline: Uninline single-use function: qdisc_run() ba0bfe18c8cc headers/uninline: Uninline single-use function: dev_validate_header() 3443e75fd1f8 headers/uninline: Uninline single-use function: kobject_has_children() 0e15d2fb85f9 headers/uninline: Uninline single-use function: xfrm_dev_state_free() 45d5233e1f5f headers/uninline: Uninline single-use function: flow_dissector_init_keys() 7a897b0747b2 headers/uninline: Uninline single-use function: reqsk_alloc() f9003f1bd834 headers/uninline: Uninline single-use function: skb_propagate_pfmemalloc() 54ea5750f484 headers/uninline: Uninline single-use function: syscall_tracepoint_update() 5a1dc0bca4a4 headers/uninline: Uninline single-use function: proc_sys_poll_event() 0af72df4042d headers/uninline: Uninline single-use function: ep_take_care_of_epollwakeup() 13a8bd09a93a headers/uninline: Uninline single-use function: ptrace_event_pid() f2b8980d4178 headers/uninline: Uninline single-use function: itimerspec64_valid() ec111205e6de headers/uninline: Uninline 
single-use function: sk_under_cgroup_hierarchy() d623ba9eb252 headers/uninline: Uninline single-use function: wb_find_current() and wb_get_create_current() Thanks, Ingo ^ permalink raw reply [flat|nested] 54+ messages in thread
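The header-decoupling effect of uninlining discussed above can be sketched in plain C. Once a function body moves out of a header, the header only needs a forward declaration of the type, so it no longer has to include whatever defines the type's internals. This is a minimal single-file illustration under stated assumptions: demo_widget, demo_dir and demo_widget_has_children() are hypothetical names, and the comments mark which part would live in a header and which in a .c file.

```c
#include <stdbool.h>
#include <stddef.h>

/* ---- would live in a hypothetical demo_widget.h ---- */

/* A forward declaration is enough: no member is accessed in the header. */
struct demo_widget;

/* Out-of-line declaration replaces the former 'static inline' body. */
bool demo_widget_has_children(const struct demo_widget *w);

/* ---- would live in a hypothetical demo_widget.c ---- */

/* Full type definitions are only needed where members are dereferenced. */
struct demo_dir {
	size_t subdirs;
};

struct demo_widget {
	struct demo_dir *sd;
};

bool demo_widget_has_children(const struct demo_widget *w)
{
	/* True only if the directory node exists and has subdirectories. */
	return w->sd && w->sd->subdirs;
}
```

The cost is one function call per use, which is why the single-call-site cases listed above are the least contentious ones.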
* Re: [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" 2022-01-04 10:54 ` Ingo Molnar @ 2022-01-04 13:34 ` Greg Kroah-Hartman 2022-01-04 13:54 ` [PATCH] headers/uninline: Uninline single-use function: kobject_has_children() Ingo Molnar 0 siblings, 1 reply; 54+ messages in thread From: Greg Kroah-Hartman @ 2022-01-04 13:34 UTC (permalink / raw) To: Ingo Molnar Cc: Kirill A. Shutemov, Linus Torvalds, linux-kernel, linux-arch, Andrew Morton, Peter Zijlstra, Thomas Gleixner, David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet, Al Viro On Tue, Jan 04, 2022 at 11:54:55AM +0100, Ingo Molnar wrote: > There's one happy exception though, all the uninlining patches that > uninline a single-call function are probably fine as-is: <snip> > 3443e75fd1f8 headers/uninline: Uninline single-use function: kobject_has_children() Let me go take this right now, no need for this to wait, it should be out of kobject.h as you rightfully show there is only one user. thanks, greg k-h ^ permalink raw reply [flat|nested] 54+ messages in thread
* [PATCH] headers/uninline: Uninline single-use function: kobject_has_children() 2022-01-04 13:34 ` Greg Kroah-Hartman @ 2022-01-04 13:54 ` Ingo Molnar 2022-01-04 15:09 ` Greg Kroah-Hartman 0 siblings, 1 reply; 54+ messages in thread From: Ingo Molnar @ 2022-01-04 13:54 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: Kirill A. Shutemov, Linus Torvalds, linux-kernel, linux-arch, Andrew Morton, Peter Zijlstra, Thomas Gleixner, David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet, Al Viro * Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote: > On Tue, Jan 04, 2022 at 11:54:55AM +0100, Ingo Molnar wrote: > > There's one happy exception though, all the uninlining patches that > > uninline a single-call function are probably fine as-is: > > <snip> > > > 3443e75fd1f8 headers/uninline: Uninline single-use function: kobject_has_children() > > Let me go take this right now, no need for this to wait, it should be > out of kobject.h as you rightfully show there is only one user. Sure - here you go! Thanks, Ingo =============================> From: Ingo Molnar <mingo@kernel.org> Date: Sun, 29 Aug 2021 09:18:53 +0200 Subject: [PATCH] headers/uninline: Uninline single-use function: kobject_has_children() This was the only usage of <linux/kref_api.h> in <linux/kobject_api.h>, so we'll be able to decouple the two after this change. Signed-off-by: Ingo Molnar <mingo@kernel.org> --- drivers/base/core.c | 17 +++++++++++++++++ include/linux/kobject.h | 17 ----------------- 2 files changed, 17 insertions(+), 17 deletions(-) diff --git a/drivers/base/core.c b/drivers/base/core.c index fd034d742447..e1f2a5791c0e 100644 --- a/drivers/base/core.c +++ b/drivers/base/core.c @@ -3029,6 +3029,23 @@ static inline struct kobject *get_glue_dir(struct device *dev) return dev->kobj.parent; } +/** + * kobject_has_children - Returns whether a kobject has children. + * @kobj: the object to test + * + * This will return whether a kobject has other kobjects as children.
+ * + * It does NOT account for the presence of attribute files, only sub + * directories. It also assumes there is no concurrent addition or + * removal of such children, and thus relies on external locking. + */ +static inline bool kobject_has_children(struct kobject *kobj) +{ + WARN_ON_ONCE(kref_read(&kobj->kref) == 0); + + return kobj->sd && kobj->sd->dir.subdirs; +} + /* * make sure cleaning up dir as the last step, we need to make * sure .release handler of kobject is run with holding the diff --git a/include/linux/kobject.h b/include/linux/kobject.h index efd56f990a46..e1c600a377f7 100644 --- a/include/linux/kobject.h +++ b/include/linux/kobject.h @@ -117,23 +117,6 @@ extern void kobject_get_ownership(struct kobject *kobj, kuid_t *uid, kgid_t *gid); extern char *kobject_get_path(struct kobject *kobj, gfp_t flag); -/** - * kobject_has_children - Returns whether a kobject has children. - * @kobj: the object to test - * - * This will return whether a kobject has other kobjects as children. - * - * It does NOT account for the presence of attribute files, only sub - * directories. It also assumes there is no concurrent addition or - * removal of such children, and thus relies on external locking. - */ -static inline bool kobject_has_children(struct kobject *kobj) -{ - WARN_ON_ONCE(kref_read(&kobj->kref) == 0); - - return kobj->sd && kobj->sd->dir.subdirs; -} - struct kobj_type { void (*release)(struct kobject *kobj); const struct sysfs_ops *sysfs_ops; ^ permalink raw reply related [flat|nested] 54+ messages in thread
* Re: [PATCH] headers/uninline: Uninline single-use function: kobject_has_children() 2022-01-04 13:54 ` [PATCH] headers/uninline: Uninline single-use function: kobject_has_children() Ingo Molnar @ 2022-01-04 15:09 ` Greg Kroah-Hartman 2022-01-04 15:14 ` Greg Kroah-Hartman 0 siblings, 1 reply; 54+ messages in thread From: Greg Kroah-Hartman @ 2022-01-04 15:09 UTC (permalink / raw) To: Ingo Molnar Cc: Kirill A. Shutemov, Linus Torvalds, linux-kernel, linux-arch, Andrew Morton, Peter Zijlstra, Thomas Gleixner, David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet, Al Viro On Tue, Jan 04, 2022 at 02:54:31PM +0100, Ingo Molnar wrote: > > * Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote: > > > On Tue, Jan 04, 2022 at 11:54:55AM +0100, Ingo Molnar wrote: > > > There's one happy exception though, all the uninlining patches that > > > uninline a single-call function are probably fine as-is: > > > > <snip> > > > > > 3443e75fd1f8 headers/uninline: Uninline single-use function: kobject_has_children() > > > > Let me go take this right now, no need for this to wait, it should be > > out of kobject.h as you rightfully show there is only one user. > > Sure - here you go! I just picked it out of your git tree already :) Along those lines, any objection to me taking at least one other one? 3f8757078d27 ("headers/prep: usb: gadget: Fix namespace collision") and 6fb993fa3832 ("headers/deps: USB: Optimize <linux/usb/ch9.h> dependencies, remove <linux/device.h>") look like I can take now into my USB tree with no problems. thanks, greg k-h ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH] headers/uninline: Uninline single-use function: kobject_has_children() 2022-01-04 15:09 ` Greg Kroah-Hartman @ 2022-01-04 15:14 ` Greg Kroah-Hartman 2022-01-05 0:11 ` Ingo Molnar 0 siblings, 1 reply; 54+ messages in thread From: Greg Kroah-Hartman @ 2022-01-04 15:14 UTC (permalink / raw) To: Ingo Molnar Cc: Kirill A. Shutemov, Linus Torvalds, linux-kernel, linux-arch, Andrew Morton, Peter Zijlstra, Thomas Gleixner, David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet, Al Viro On Tue, Jan 04, 2022 at 04:09:57PM +0100, Greg Kroah-Hartman wrote: > On Tue, Jan 04, 2022 at 02:54:31PM +0100, Ingo Molnar wrote: > > > > * Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote: > > > > > On Tue, Jan 04, 2022 at 11:54:55AM +0100, Ingo Molnar wrote: > > > > There's one happy exception though, all the uninlining patches that > > > > uninline a single-call function are probably fine as-is: > > > > > > <snip> > > > > > > > 3443e75fd1f8 headers/uninline: Uninline single-use function: kobject_has_children() > > > > > > Let me go take this right now, no need for this to wait, it should be > > > out of kobject.h as you rightfully show there is only one user. > > > > Sure - here you go! > > I just picked it out of your git tree already :) > > Along those lines, any objection to me taking at least one other one? > 3f8757078d27 ("headers/prep: usb: gadget: Fix namespace collision") and > 6fb993fa3832 ("headers/deps: USB: Optimize <linux/usb/ch9.h> > dependencies, remove <linux/device.h>") look like I can take now into my > USB tree with no problems. Also these look good to go now: bae9ddd98195 ("headers/prep: Fix non-standard header section: drivers/usb/cdns3/core.h") c027175b37e5 ("headers/prep: Fix non-standard header section: drivers/usb/host/ohci-tmio.c") thanks, greg k-h ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH] headers/uninline: Uninline single-use function: kobject_has_children() 2022-01-04 15:14 ` Greg Kroah-Hartman @ 2022-01-05 0:11 ` Ingo Molnar 2022-01-05 15:23 ` Greg Kroah-Hartman 0 siblings, 1 reply; 54+ messages in thread From: Ingo Molnar @ 2022-01-05 0:11 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: Kirill A. Shutemov, Linus Torvalds, linux-kernel, linux-arch, Andrew Morton, Peter Zijlstra, Thomas Gleixner, David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet, Al Viro * Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote: > On Tue, Jan 04, 2022 at 04:09:57PM +0100, Greg Kroah-Hartman wrote: > > On Tue, Jan 04, 2022 at 02:54:31PM +0100, Ingo Molnar wrote: > > > > > > * Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote: > > > > > > > On Tue, Jan 04, 2022 at 11:54:55AM +0100, Ingo Molnar wrote: > > > > > There's one happy exception though, all the uninlining patches that > > > > > uninline a single-call function are probably fine as-is: > > > > > > > > <snip> > > > > > > > > > 3443e75fd1f8 headers/uninline: Uninline single-use function: kobject_has_children() > > > > > > > > Let me go take this right now, no need for this to wait, it should be > > > > out of kobject.h as you rightfully show there is only one user. > > > > > > Sure - here you go! > > > > I just picked it out of your git tree already :) > > > > Along those lines, any objection to me taking at least one other one? > > 3f8757078d27 ("headers/prep: usb: gadget: Fix namespace collision") and Ack. > > 6fb993fa3832 ("headers/deps: USB: Optimize <linux/usb/ch9.h> Ack. > > dependencies, remove <linux/device.h>") look like I can take now into my > > USB tree with no problems. > > Also these look good to go now: > bae9ddd98195 ("headers/prep: Fix non-standard header section: drivers/usb/cdns3/core.h") Ack. > c027175b37e5 ("headers/prep: Fix non-standard header section: drivers/usb/host/ohci-tmio.c") Ack. 
Note that these latter two patches just simplified the task of my (simplistic) tooling, which is basically a shell script that inserts header dependencies to the head of .c and .h files, right in front of the first #include line it encounters. These two patches do have some marginal clean-up value too, so I'm not opposed to merging them - just wanted to declare their true role. :-) Thanks, Ingo ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH] headers/uninline: Uninline single-use function: kobject_has_children() 2022-01-05 0:11 ` Ingo Molnar @ 2022-01-05 15:23 ` Greg Kroah-Hartman 2022-01-06 11:26 ` Ingo Molnar 0 siblings, 1 reply; 54+ messages in thread From: Greg Kroah-Hartman @ 2022-01-05 15:23 UTC (permalink / raw) To: Ingo Molnar Cc: Kirill A. Shutemov, Linus Torvalds, linux-kernel, linux-arch, Andrew Morton, Peter Zijlstra, Thomas Gleixner, David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet, Al Viro On Wed, Jan 05, 2022 at 01:11:03AM +0100, Ingo Molnar wrote: > > * Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote: > > > On Tue, Jan 04, 2022 at 04:09:57PM +0100, Greg Kroah-Hartman wrote: > > > On Tue, Jan 04, 2022 at 02:54:31PM +0100, Ingo Molnar wrote: > > > > > > > > * Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote: > > > > > > > > > On Tue, Jan 04, 2022 at 11:54:55AM +0100, Ingo Molnar wrote: > > > > > > There's one happy exception though, all the uninlining patches that > > > > > > uninline a single-call function are probably fine as-is: > > > > > > > > > > <snip> > > > > > > > > > > > 3443e75fd1f8 headers/uninline: Uninline single-use function: kobject_has_children() > > > > > > > > > > Let me go take this right now, no need for this to wait, it should be > > > > > out of kobject.h as you rightfully show there is only one user. > > > > > > > > Sure - here you go! > > > > > > I just picked it out of your git tree already :) > > > > > > Along those lines, any objection to me taking at least one other one? > > > 3f8757078d27 ("headers/prep: usb: gadget: Fix namespace collision") and > > Ack. > > > > 6fb993fa3832 ("headers/deps: USB: Optimize <linux/usb/ch9.h> > > Ack. This one required me to fix up a usb core file that was only including this .h file and not kernel.h which it also needed. Now resolved in my tree. > > > dependencies, remove <linux/device.h>") look like I can take now into my > > > USB tree with no problems. 
> > > > Also these look good to go now: > > bae9ddd98195 ("headers/prep: Fix non-standard header section: drivers/usb/cdns3/core.h") > > Ack. > > > c027175b37e5 ("headers/prep: Fix non-standard header section: drivers/usb/host/ohci-tmio.c") > > Ack. > > Note that these latter two patches just simplified the task of my > (simplistic) tooling, which is basically a shell script that inserts > header dependencies to the head of .c and .h files, right in front of > the first #include line it encounters. > > These two patches do have some marginal clean-up value too, so I'm not > opposed to merging them - just wanted to declare their true role. :-) They all are sane cleanups, so I've taken them in my tree now. Make your patchset a bit smaller against 5.17-rc1 when that comes around :) thanks, greg k-h ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH] headers/uninline: Uninline single-use function: kobject_has_children() 2022-01-05 15:23 ` Greg Kroah-Hartman @ 2022-01-06 11:26 ` Ingo Molnar 0 siblings, 0 replies; 54+ messages in thread From: Ingo Molnar @ 2022-01-06 11:26 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: Kirill A. Shutemov, Linus Torvalds, linux-kernel, linux-arch, Andrew Morton, Peter Zijlstra, Thomas Gleixner, David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet, Al Viro * Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote: > > Note that these latter two patches just simplified the task of my > > (simplistic) tooling, which is basically a shell script that inserts > > header dependencies to the head of .c and .h files, right in front of > > the first #include line it encounters. > > > > These two patches do have some marginal clean-up value too, so I'm not > > opposed to merging them - just wanted to declare their true role. :-) > > They all are sane cleanups, so I've taken them in my tree now. Make your > patchset a bit smaller against 5.17-rc1 when that comes around :) Thank you! :-) Ingo ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" [not found] <YdIfz+LMewetSaEB@gmail.com> 2022-01-03 10:11 ` [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" Greg Kroah-Hartman 2022-01-03 13:54 ` [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" Kirill A. Shutemov @ 2022-01-03 17:54 ` Nathan Chancellor 2022-01-04 10:47 ` Ingo Molnar 2022-01-04 12:36 ` Willy Tarreau ` (3 subsequent siblings) 6 siblings, 1 reply; 54+ messages in thread From: Nathan Chancellor @ 2022-01-03 17:54 UTC (permalink / raw) To: Ingo Molnar Cc: Linus Torvalds, linux-kernel, linux-arch, Andrew Morton, Peter Zijlstra, Thomas Gleixner, Greg Kroah-Hartman, David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet, Al Viro, llvm Hi Ingo, On Sun, Jan 02, 2022 at 10:57:35PM +0100, Ingo Molnar wrote: > Before going into details about how this tree solves 'dependency hell' > exactly, here's the current kernel build performance gain with > CONFIG_FAST_HEADERS=y enabled, (and with CONFIG_KALLSYMS_FAST=y enabled as > well - see below), using a stock x86 Linux distribution's .config with all > modules built into the vmlinux: > > # > # Performance counter stats for 'make -j96 vmlinux' (3 runs): > # > # (Elapsed time in seconds): > # > > v5.16-rc7: 231.34 +- 0.60 secs, 15.5 builds/hour # [ vanilla baseline ] > -fast-headers-v1: 129.97 +- 0.51 secs, 27.7 builds/hour # +78.0% improvement This is really impressive; as someone who constantly builds large kernels for test coverage, I am excited about less time to get results. Testing on an 80-core arm64 server (the fastest machine I have access to at the moment) with LLVM, I can see anywhere from 18% to 35% improvement. 
Benchmark 1: ARCH=arm64 defconfig (linux)
  Time (mean ± σ):     97.159 s ±  0.246 s    [User: 4828.383 s, System: 611.256 s]
  Range (min … max):   96.900 s … 97.648 s    10 runs

Benchmark 2: ARCH=arm64 defconfig (linux-fast-headers)
  Time (mean ± σ):     76.300 s ±  0.107 s    [User: 3149.986 s, System: 436.487 s]
  Range (min … max):   76.117 s … 76.467 s    10 runs

Summary
  'ARCH=arm64 defconfig (linux-fast-headers)' ran
    1.27 ± 0.00 times faster than 'ARCH=arm64 defconfig (linux)'

Benchmark 1: ARCH=arm64 allmodconfig (linux)
  Time (mean ± σ):     390.106 s ±  0.192 s    [User: 23893.382 s, System: 2802.413 s]
  Range (min … max):   389.942 s … 390.513 s    7 runs

Benchmark 2: ARCH=arm64 allmodconfig (linux-fast-headers)
  Time (mean ± σ):     288.066 s ±  0.621 s    [User: 16436.098 s, System: 2117.352 s]
  Range (min … max):   287.131 s … 288.982 s    7 runs

Summary
  'ARCH=arm64 allmodconfig (linux-fast-headers)' ran
    1.35 ± 0.00 times faster than 'ARCH=arm64 allmodconfig (linux)'

Benchmark 1: ARCH=arm64 allyesconfig (linux)
  Time (mean ± σ):     557.752 s ±  1.019 s    [User: 21227.404 s, System: 2226.121 s]
  Range (min … max):   555.833 s … 558.775 s    7 runs

Benchmark 2: ARCH=arm64 allyesconfig (linux-fast-headers)
  Time (mean ± σ):     473.815 s ±  1.793 s    [User: 15351.991 s, System: 1689.630 s]
  Range (min … max):   471.542 s … 476.830 s    7 runs

Summary
  'ARCH=arm64 allyesconfig (linux-fast-headers)' ran
    1.18 ± 0.00 times faster than 'ARCH=arm64 allyesconfig (linux)'

I wanted to test the same x86_64 configs last night but I ran out of time
before bed due to some issues that I was only able to look at this morning
(more on those below). I'll just have to settle for defconfig right now,
which shows a modest improvement.
Benchmark 1: ARCH=x86_64 defconfig (linux) Time (mean ± σ): 41.122 s ± 0.190 s [User: 1700.206 s, System: 205.555 s] Range (min … max): 40.966 s … 41.515 s 7 runs Benchmark 2: ARCH=x86_64 defconfig (linux-fast-headers) Time (mean ± σ): 36.357 s ± 0.183 s [User: 1134.252 s, System: 152.396 s] Range (min … max): 35.983 s … 36.534 s 7 runs Summary 'ARCH=x86_64 defconfig (linux-fast-headers)' ran 1.13 ± 0.01 times faster than 'ARCH=x86_64 defconfig (linux)' > For example, the preprocessed kernel/pid.c file explodes into over 94,000 > lines of code on the vanilla kernel: > > # v5.16-rc7: > > kepler:~/mingo.tip.git> make kernel/pid.i > kepler:~/mingo.tip.git> wc -l kernel/pid.i > 94569 kernel/pid.i > > The compiler has to go through those 95,000 lines of code - even if a lot > of it is trivial fluff not actually used by kernel/pid.c. > > With the fast-headers kernel that's down to ~36,000 lines of code, almost a > factor of 3 reduction: > > # fast-headers-v1: > kepler:~/mingo.tip.git> wc -l kernel/pid.i > 35941 kernel/pid.i Coming from someone who often has to reduce a preprocessed kernel source file with creduce/cvise to report compiler bugs, this will be a very welcomed change, as those tools will have to do less work, and I can get my reports done faster. ######################################################################## I took the series for a spin with clang and GCC on arm64 and x86_64 and I found a few warnings/errors. 1. 
Position of certain attributes In some commits, you move the cacheline_aligned attributes from after the closing brace on structures to before the struct keyword, which causes clang to warn (and error with CONFIG_WERROR): In file included from arch/arm64/kernel/asm-offsets.c:9: In file included from arch/arm64/kernel/../../../kernel/sched/per_task_area_struct.h:33: In file included from ./include/linux/perf_event_api.h:17: In file included from ./include/linux/perf_event_types.h:41: In file included from ./include/linux/ftrace.h:18: In file included from ./arch/arm64/include/asm/ftrace.h:53: In file included from ./include/linux/compat.h:11: ./include/linux/fs_types.h:997:1: error: attribute '__aligned__' is ignored, place it after "struct" to apply attribute to type declaration [-Werror,-Wignored-attributes] ____cacheline_aligned ^ ./include/linux/cache.h:41:46: note: expanded from macro '____cacheline_aligned' #define ____cacheline_aligned __attribute__((__aligned__(SMP_CACHE_BYTES))) ^ My diff to fix this looks like: diff --git a/include/linux/dcache.h b/include/linux/dcache.h index 520daf638d06..da7e77a7cede 100644 --- a/include/linux/dcache.h +++ b/include/linux/dcache.h @@ -127,8 +127,7 @@ enum dentry_d_lock_class DENTRY_D_LOCK_NESTED }; -____cacheline_aligned -struct dentry_operations { +struct ____cacheline_aligned dentry_operations { int (*d_revalidate)(struct dentry *, unsigned int); int (*d_weak_revalidate)(struct dentry *, unsigned int); int (*d_hash)(const struct dentry *, struct qstr *); diff --git a/include/linux/fs_types.h b/include/linux/fs_types.h index b53aadafab1b..e2e1c0827183 100644 --- a/include/linux/fs_types.h +++ b/include/linux/fs_types.h @@ -994,8 +994,7 @@ struct file_operations { int (*fadvise)(struct file *, loff_t, loff_t, int); } __randomize_layout; -____cacheline_aligned -struct inode_operations { +struct ____cacheline_aligned inode_operations { struct dentry * (*lookup) (struct inode *,struct dentry *, unsigned int); const char * 
(*get_link) (struct dentry *, struct inode *, struct delayed_call *); int (*permission) (struct user_namespace *, struct inode *, int); diff --git a/include/linux/netdevice_api.h b/include/linux/netdevice_api.h index 4a8d7688e148..0e5e08dcbb2a 100644 --- a/include/linux/netdevice_api.h +++ b/include/linux/netdevice_api.h @@ -49,7 +49,7 @@ #endif /* This structure contains an instance of an RX queue. */ -____cacheline_aligned_in_smp struct netdev_rx_queue { +struct ____cacheline_aligned_in_smp netdev_rx_queue { struct xdp_rxq_info xdp_rxq; #ifdef CONFIG_RPS struct rps_map __rcu *rps_map; diff --git a/include/net/xdp_types.h b/include/net/xdp_types.h index 442028626b35..accc12372bca 100644 --- a/include/net/xdp_types.h +++ b/include/net/xdp_types.h @@ -56,7 +56,7 @@ struct xdp_mem_info { struct page_pool; /* perf critical, avoid false-sharing */ -____cacheline_aligned struct xdp_rxq_info { +struct ____cacheline_aligned xdp_rxq_info { struct net_device *dev; u32 queue_index; u32 reg_state; 2. Error with CONFIG_SHADOW_CALL_STACK With ARCH=arm64 defconfig + CONFIG_SHADOW_CALL_STACK, I see the following error: $ make -skj"$(nproc)" ARCH=arm64 LLVM=1 defconfig menuconfig init/ init/main.c:916:50: error: use of undeclared identifier 'init_shadow_call_stack' per_task(&init_task, ti) = (struct thread_info) INIT_THREAD_INFO(init_task); ^ ./arch/arm64/include/asm/thread_info.h:123:2: note: expanded from macro 'INIT_THREAD_INFO' INIT_SCS \ ^ ./arch/arm64/include/asm/thread_info.h:113:14: note: expanded from macro 'INIT_SCS' .scs_base = init_shadow_call_stack, \ ^ init/main.c:916:50: error: use of undeclared identifier 'init_shadow_call_stack' ./arch/arm64/include/asm/thread_info.h:123:2: note: expanded from macro 'INIT_THREAD_INFO' INIT_SCS \ ^ ./arch/arm64/include/asm/thread_info.h:114:13: note: expanded from macro 'INIT_SCS' .scs_sp = init_shadow_call_stack, ^ 2 errors generated. 
It looks like on mainline, init_shadow_call_stack is defined and used
in init/init_task.c but now, it is used in init/main.c, with no
declaration to allow the compiler to find the definition. I guess moving
init_shadow_call_stack out of init/init_task.c to somewhere more common
would fix this but it depends on SCS_SIZE, which is defined in
include/linux/scs.h, and as soon as I tried to include that in another
file, the build broke further... Any ideas you have would be appreciated
:) for benchmarking purposes, I just disabled CONFIG_SHADOW_CALL_STACK.

3. Nested function in arch/x86/kernel/asm-offsets.c

$ make -skj"$(nproc)" ARCH=x86_64 LLVM=1 defconfig all
In file included from arch/x86/kernel/asm-offsets.c:40:
arch/x86/kernel/../../../kernel/sched/per_task_area_struct_defs.h:10:1: error: function definition is not allowed here
{
^
1 error generated.

Clang does not and will not support nested functions; any instances of
those in the kernel were eliminated when formalizing clang support. I am
not really sure if this was intentional or not? Looking at the other
asm-offsets.c files, I see the include outside of any function. Moving
it out of the common() function does not appear to break the build for
defconfig, allmodconfig, or my distribution config and it boots in QEMU
and my AMD based test desktop.

diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
index ff3f8ed5d0a2..a6d56f4697cd 100644
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -35,10 +35,10 @@
 # include "asm-offsets_64.c"
 #endif

-static void __used common(void)
-{
 #include "../../../kernel/sched/per_task_area_struct_defs.h"

+static void __used common(void)
+{
 	BLANK();
 	DEFINE(TASK_threadsp, offsetof(struct task_struct, per_task_area) +
 			      offsetof(struct task_struct_per_task, thread) +

4.
Build error in kernel/gcov/clang.c $ make -skj"$(nproc)" ARCH=x86_64 LLVM=1 distclean allmodconfig kernel/gcov/clang.o kernel/gcov/clang.c:232:3: error: implicitly declaring library function 'memset' with type 'void *(void *, int, unsigned long)' [-Werror,-Wimplicit-function-declaration] memset(fn->counters, 0, ^ kernel/gcov/clang.c:232:3: note: include the header <string.h> or explicitly provide a declaration for 'memset' kernel/gcov/clang.c:291:32: error: implicit declaration of function 'kmemdup' [-Werror,-Wimplicit-function-declaration] struct gcov_fn_info *fn_dup = kmemdup(fn, sizeof(*fn), ^ kernel/gcov/clang.c:291:23: error: incompatible integer to pointer conversion initializing 'struct gcov_fn_info *' with an expression of type 'int' [-Werror,-Wint-conversion] struct gcov_fn_info *fn_dup = kmemdup(fn, sizeof(*fn), ^ ~~~~~~~~~~~~~~~~~~~~~~~~ kernel/gcov/clang.c:304:2: error: implicitly declaring library function 'memcpy' with type 'void *(void *, const void *, unsigned long)' [-Werror,-Wimplicit-function-declaration] memcpy(fn_dup->counters, fn->counters, cv_size); ^ kernel/gcov/clang.c:304:2: note: include the header <string.h> or explicitly provide a declaration for 'memcpy' kernel/gcov/clang.c:320:8: error: implicit declaration of function 'kmemdup' [-Werror,-Wimplicit-function-declaration] dup = kmemdup(info, sizeof(*dup), GFP_KERNEL); ^ kernel/gcov/clang.c:320:6: error: incompatible integer to pointer conversion assigning to 'struct gcov_info *' from 'int' [-Werror,-Wint-conversion] dup = kmemdup(info, sizeof(*dup), GFP_KERNEL); ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ kernel/gcov/clang.c:325:18: error: implicit declaration of function 'kstrdup' [-Werror,-Wimplicit-function-declaration] dup->filename = kstrdup(info->filename, GFP_KERNEL); ^ kernel/gcov/clang.c:325:16: error: incompatible integer to pointer conversion assigning to 'const char *' from 'int' [-Werror,-Wint-conversion] dup->filename = kstrdup(info->filename, GFP_KERNEL); ^ 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 8 errors generated. I resolved this with: diff --git a/kernel/gcov/clang.c b/kernel/gcov/clang.c index 6ee385f6ad47..29f0899ba209 100644 --- a/kernel/gcov/clang.c +++ b/kernel/gcov/clang.c @@ -52,6 +52,7 @@ #include <linux/ratelimit.h> #include <linux/slab.h> #include <linux/mm.h> +#include <linux/string.h> #include "gcov.h" typedef void (*llvm_gcov_callback)(void); 5. BPF errors With Arch Linux's config (https://github.com/archlinux/svntogit-packages/raw/packages/linux/trunk/config), I see the following errors: kernel/bpf/preload/iterators/iterators.c:3:10: fatal error: 'linux/sched/signal.h' file not found #include <linux/sched/signal.h> ^~~~~~~~~~~~~~~~~~~~~~ 1 error generated. kernel/bpf/sysfs_btf.c:21:2: error: implicitly declaring library function 'memcpy' with type 'void *(void *, const void *, unsigned long)' [-Werror,-Wimplicit-function-declaration] memcpy(buf, __start_BTF + off, len); ^ kernel/bpf/sysfs_btf.c:21:2: note: include the header <string.h> or explicitly provide a declaration for 'memcpy' 1 error generated. The second error is obviously fixed by just including string.h as above. I am not sure what is wrong with the first one; the includes all appear to be userland headers, rather than kernel ones, so maybe an -I flag is not present that should be? To work around it, I disabled CONFIG_BPF_PRELOAD. 6. resolve_btfids warning After working around the above errors, with either GCC or clang, I see the following warnings with Arch Linux's configuration: WARN: multiple IDs found for 'task_struct': 103, 23549 - using 103 WARN: multiple IDs found for 'path': 1166, 23551 - using 1166 WARN: multiple IDs found for 'inode': 997, 23561 - using 997 WARN: multiple IDs found for 'file': 714, 23566 - using 714 WARN: multiple IDs found for 'seq_file': 1120, 23673 - using 1120 Which appears to come from symbols_resolve() in tools/bpf/resolve_btfids/main.c. 
########################################################################

I am very excited to see where this goes, it is a herculean effort but I
think it will be worth it in the long run. Let me know if there is any
more information or input that I can provide, cheers!

Nathan
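For readers following item 1 above, here is a minimal standalone sketch (not kernel code) of the attribute position that clang accepts, i.e. directly after the 'struct' keyword as in the diff. DEMO_CACHE_BYTES, demo_cacheline_aligned, struct demo_ops and demo_ops_alignment() are hypothetical names made up for this illustration; the kernel's real macro is ____cacheline_aligned from <linux/cache.h>, and the 64-byte value is only an assumed cache line size.

```c
#include <stddef.h>

/* Hypothetical stand-ins for SMP_CACHE_BYTES and ____cacheline_aligned. */
#define DEMO_CACHE_BYTES 64
#define demo_cacheline_aligned __attribute__((__aligned__(DEMO_CACHE_BYTES)))

/*
 * Attribute placed between the 'struct' keyword and the tag: it applies
 * to the type itself, so every object of this type gets the alignment.
 */
struct demo_cacheline_aligned demo_ops {
	int (*revalidate)(int flags);
	int (*hash)(const char *name);
};

/* Returns the alignment the compiler assigned to struct demo_ops. */
size_t demo_ops_alignment(void)
{
	return __alignof__(struct demo_ops);
}
```

With this placement the alignment is part of the type, so sizeof(struct demo_ops) is also padded up to a multiple of the assumed 64 bytes.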
* Re: [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" 2022-01-03 17:54 ` [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" Nathan Chancellor @ 2022-01-04 10:47 ` Ingo Molnar 2022-01-04 10:56 ` [DEBUG PATCH] DO NOT MERGE: Enable SHADOW_CALL_STACK on GCC builds, for build testing Ingo Molnar ` (5 more replies) 0 siblings, 6 replies; 54+ messages in thread From: Ingo Molnar @ 2022-01-04 10:47 UTC (permalink / raw) To: Nathan Chancellor Cc: Linus Torvalds, linux-kernel, linux-arch, Andrew Morton, Peter Zijlstra, Thomas Gleixner, Greg Kroah-Hartman, David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet, Al Viro, llvm * Nathan Chancellor <nathan@kernel.org> wrote: > Hi Ingo, > > On Sun, Jan 02, 2022 at 10:57:35PM +0100, Ingo Molnar wrote: > > Before going into details about how this tree solves 'dependency hell' > > exactly, here's the current kernel build performance gain with > > CONFIG_FAST_HEADERS=y enabled, (and with CONFIG_KALLSYMS_FAST=y enabled as > > well - see below), using a stock x86 Linux distribution's .config with all > > modules built into the vmlinux: > > > > # > > # Performance counter stats for 'make -j96 vmlinux' (3 runs): > > # > > # (Elapsed time in seconds): > > # > > > > v5.16-rc7: 231.34 +- 0.60 secs, 15.5 builds/hour # [ vanilla baseline ] > > -fast-headers-v1: 129.97 +- 0.51 secs, 27.7 builds/hour # +78.0% improvement > > This is really impressive; as someone who constantly builds large > kernels for test coverage, I am excited about less time to get results. > Testing on an 80-core arm64 server (the fastest machine I have access to > at the moment) with LLVM, I can see anywhere from 18% to 35% improvement. 
>
>
> Benchmark 1: ARCH=arm64 defconfig (linux)
>   Time (mean ± σ):     97.159 s ±  0.246 s    [User: 4828.383 s, System: 611.256 s]
>   Range (min … max):   96.900 s … 97.648 s    10 runs
>
> Benchmark 2: ARCH=arm64 defconfig (linux-fast-headers)
>   Time (mean ± σ):     76.300 s ±  0.107 s    [User: 3149.986 s, System: 436.487 s]
>   Range (min … max):   76.117 s … 76.467 s    10 runs

That looks good, thanks for giving it a test, and thanks for all the fixes! :-)

Note that on ARM64 the elapsed time improvement is 'only' 18-35%, because
the triple-linking of vmlinux serializes much of the build & ARM64 doesn't
have the kallsyms-objtool feature yet.

But we can already see how much faster it became, from the user+system time
spent building the kernel:

           vanilla: 4828.383 s + 611.256 s = 5439.639 s
  -fast-headers-v1: 3149.986 s + 436.487 s = 3586.473 s

That's a +51% speedup. :-)

With CONFIG_KALLSYMS_FAST=y on x86, the final link gets faster by about
60%-70%, so the header improvements will more directly show up in elapsed
time as well.

Plus I spent more time looking at x86 header bloat than at ARM64 header
bloat. In the end I think the improvement could probably be moved into the
broad 60-70% range that I see on x86.

All the other ARM64 tests show a 37%-43% improvement in CPU time used:

> Benchmark 1: ARCH=arm64 allmodconfig (linux)
>   Time (mean ± σ):     390.106 s ±  0.192 s    [User: 23893.382 s, System: 2802.413 s]
>   Range (min … max):   389.942 s … 390.513 s    7 runs
>
> Benchmark 2: ARCH=arm64 allmodconfig (linux-fast-headers)
>   Time (mean ± σ):     288.066 s ±  0.621 s    [User: 16436.098 s, System: 2117.352 s]
>   Range (min … max):   287.131 s … 288.982 s    7 runs

# (23893.382+2802.413)/(16436.098+2117.352) = +43% in throughput.

> Benchmark 1: ARCH=arm64 allyesconfig (linux) > Time (mean ± σ): 557.752 s ± 1.019 s [User: 21227.404 s, System: 2226.121 s] > Range (min … max): 555.833 s … 558.775 s 7 runs > > Benchmark 2: ARCH=arm64 allyesconfig (linux-fast-headers) > Time (mean ± σ): 473.815 s ± 1.793 s [User: 15351.991 s, System: 1689.630 s] > Range (min … max): 471.542 s … 476.830 s 7 runs # (21227.404+2226.121)/(15351.991+1689.630) = +37% > Benchmark 1: ARCH=x86_64 defconfig (linux) > Time (mean ± σ): 41.122 s ± 0.190 s [User: 1700.206 s, System: 205.555 s] > Range (min … max): 40.966 s … 41.515 s 7 runs > > Benchmark 2: ARCH=x86_64 defconfig (linux-fast-headers) > Time (mean ± σ): 36.357 s ± 0.183 s [User: 1134.252 s, System: 152.396 s] > Range (min … max): 35.983 s … 36.534 s 7 runs # (1700.206+205.555)/(1134.252+152.396) = +48% > Summary > 'ARCH=x86_64 defconfig (linux-fast-headers)' ran > 1.13 ± 0.01 times faster than 'ARCH=x86_64 defconfig (linux)' Now this x86-defconfig result you got is a bit weird - it *should* have been around ~50% faster on x86 in terms of elapsed time too. 
Here's what x86-64 defconfig looks like on my system - with 128 GB RAM &
fast NVDIMMs and 64 CPUs:

  #
  # -v5.16-rc8:
  #

  $ perf stat --repeat 3 -e instructions,cycles,cpu-clock --sync --pre "make clean >/dev/null" make -j96 vmlinux >/dev/null

  Performance counter stats for 'make -j96 vmlinux' (3 runs):

   4,906,953,379,372      instructions     #    0.90  insn per cycle    ( +-  0.00% )
   5,475,163,448,391      cycles           #    3.898 GHz               ( +-  0.01% )
        1,404,614.64 msec cpu-clock        #   45.864 CPUs utilized     ( +-  0.01% )

             30.6258 +- 0.0337 seconds time elapsed  ( +-  0.11% )

  #
  # -fast-headers-v1:
  #

  $ make defconfig
  $ grep KALLSYMS_FAST .config
  CONFIG_KALLSYMS_FAST=y

  $ perf stat --repeat 3 -e instructions,cycles,cpu-clock --sync --pre "make clean >/dev/null" make -j96 vmlinux >/dev/null

  Performance counter stats for 'make -j96 vmlinux' (3 runs):

   3,500,079,269,120      instructions     #    0.90  insn per cycle    ( +-  0.00% )
   3,872,081,278,824      cycles           #    3.895 GHz               ( +-  0.10% )
          993,448.13 msec cpu-clock        #   47.306 CPUs utilized     ( +-  0.10% )

             21.0004 +- 0.0265 seconds time elapsed  ( +-  0.13% )

That's a +45.8% speedup in elapsed time, and a +41.4% improvement in
cpu-clock utilization.

I'm wondering whether your system has some sort of bottleneck?

One thing I do though when running benchmarks is to switch the cpufreq
governor to 'performance', via something like:

   NR_CPUS=$(nproc --all)

   curr=$(cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor)
   next=performance

   echo "# setting all $NR_CPUS CPUs from '"$curr"' to the '"$next"' governor"

   for ((cpu=0; cpu<$NR_CPUS; cpu++)); do
     G=/sys/devices/system/cpu/cpu$cpu/cpufreq/scaling_governor
     [ -f $G ] && echo $next > $G
   done

This minimizes the amount of noise across iterations and makes the results
more dependable:

             30.6258 +- 0.0337 seconds time elapsed  ( +-  0.11% )
             21.0004 +- 0.0265 seconds time elapsed  ( +-  0.13% )

> > With the fast-headers kernel that's down to ~36,000 lines of code,
> > almost a factor of 3 reduction:
> >
> >   # fast-headers-v1:
> >   kepler:~/mingo.tip.git> wc -l kernel/pid.i
> >   35941 kernel/pid.i
>
> Coming from someone who often has to reduce a preprocessed kernel source
> file with creduce/cvise to report compiler bugs, this will be a very
> welcomed change, as those tools will have to do less work, and I can get
> my reports done faster.

That's nice, didn't think of that side effect.

Could you perhaps measure this too, to see how much of a benefit it is?

> ########################################################################
>
> I took the series for a spin with clang and GCC on arm64 and x86_64 and
> I found a few warnings/errors.

Thank you!

> 1.
Position of certain attributes > > In some commits, you move the cacheline_aligned attributes from after > the closing brace on structures to before the struct keyword, which > causes clang to warn (and error with CONFIG_WERROR): > > In file included from arch/arm64/kernel/asm-offsets.c:9: > In file included from arch/arm64/kernel/../../../kernel/sched/per_task_area_struct.h:33: > In file included from ./include/linux/perf_event_api.h:17: > In file included from ./include/linux/perf_event_types.h:41: > In file included from ./include/linux/ftrace.h:18: > In file included from ./arch/arm64/include/asm/ftrace.h:53: > In file included from ./include/linux/compat.h:11: > ./include/linux/fs_types.h:997:1: error: attribute '__aligned__' is ignored, place it after "struct" to apply attribute to type declaration [-Werror,-Wignored-attributes] > ____cacheline_aligned > ^ > ./include/linux/cache.h:41:46: note: expanded from macro '____cacheline_aligned' > #define ____cacheline_aligned __attribute__((__aligned__(SMP_CACHE_BYTES))) Yeah, so this is a *really* stupid warning from Clang. 
Putting the attribute after 'struct' risks hard-to-track-down bugs when a
<linux/cache.h> inclusion is missing, a scenario I pointed out in this
commit:

  headers/deps: dcache: Move the ____cacheline_aligned attribute to the head of the definition

  When changing <linux/dcache.h> I removed the <linux/spinlock_api.h> header,
  which caused a couple of hundred of mysterious, somewhat obscure link time errors:

    ld: net/sctp/tsnmap.o:(.bss+0x0): multiple definition of `____cacheline_aligned_in_smp'; init/do_mounts_rd.o:(.bss+0x0): first defined here
    ld: net/sctp/tsnmap.o:(.bss+0x40): multiple definition of `____cacheline_aligned'; init/do_mounts_rd.o:(.bss+0x40): first defined here
    ld: net/sctp/debug.o:(.bss+0x0): multiple definition of `____cacheline_aligned_in_smp'; init/do_mounts_rd.o:(.bss+0x0): first defined here
    ld: net/sctp/debug.o:(.bss+0x40): multiple definition of `____cacheline_aligned'; init/do_mounts_rd.o:(.bss+0x40): first defined here

  After a bit of head-scratching, what happened is that 'struct dentry_operations'
  has the ____cacheline_aligned attribute at the tail of the type definition -
  which turned into a local variable definition when <linux/cache.h> was not
  included - which <linux/spinlock_api.h> includes into <linux/dcache.h> indirectly.

  There were no compile time errors, only link time errors.

  Move the attribute to the head of the definition, in which case
  a missing <linux/cache.h> inclusion creates an immediate build failure:

    In file included from ./include/linux/fs.h:9,
                     from ./include/linux/fsverity.h:14,
                     from fs/verity/fsverity_private.h:18,
                     from fs/verity/read_metadata.c:8:
    ./include/linux/dcache.h:132:22: error: expected ‘;’ before ‘struct’
      132 | ____cacheline_aligned
          |                      ^
          |                      ;
      133 | struct dentry_operations {
          | ~~~~~~

  No change in functionality.

  Signed-off-by: Ingo Molnar <mingo@kernel.org>

Can this Clang warning be disabled?

> 2.
Error with CONFIG_SHADOW_CALL_STACK So this feature depends on Clang: # Supported by clang >= 7.0 config CC_HAVE_SHADOW_CALL_STACK def_bool $(cc-option, -fsanitize=shadow-call-stack -ffixed-x18) No way to activate it under my GCC cross-build toolchain, right? But ... I hacked the build mode on with GCC using this patch: From: Ingo Molnar <mingo@kernel.org> Date: Tue, 4 Jan 2022 11:26:09 +0100 Subject: [PATCH] DO NOT MERGE: Enable SHADOW_CALL_STACK on GCC builds, for build testing NOT-Signed-off-by: Ingo Molnar <mingo@kernel.org> --- Makefile | 2 +- arch/Kconfig | 2 +- arch/arm64/Kconfig | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/Makefile b/Makefile index 16d7f83ac368..bbab462e7509 100644 --- a/Makefile +++ b/Makefile @@ -888,7 +888,7 @@ LDFLAGS_vmlinux += --gc-sections endif ifdef CONFIG_SHADOW_CALL_STACK -CC_FLAGS_SCS := -fsanitize=shadow-call-stack +CC_FLAGS_SCS := KBUILD_CFLAGS += $(CC_FLAGS_SCS) export CC_FLAGS_SCS endif diff --git a/arch/Kconfig b/arch/Kconfig index 4e56f66fdbcf..2103d9da4fe1 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -605,7 +605,7 @@ config ARCH_SUPPORTS_SHADOW_CALL_STACK config SHADOW_CALL_STACK bool "Clang Shadow Call Stack" - depends on CC_IS_CLANG && ARCH_SUPPORTS_SHADOW_CALL_STACK + depends on ARCH_SUPPORTS_SHADOW_CALL_STACK depends on DYNAMIC_FTRACE_WITH_REGS || !FUNCTION_GRAPH_TRACER help This option enables Clang's Shadow Call Stack, which uses a diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index c4207cf9bb17..952f3e56e0a7 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -1183,7 +1183,7 @@ config ARCH_HAS_FILTER_PGPROT # Supported by clang >= 7.0 config CC_HAVE_SHADOW_CALL_STACK - def_bool $(cc-option, -fsanitize=shadow-call-stack -ffixed-x18) + def_bool y config PARAVIRT bool "Enable paravirtualization code" And was able to trigger at least some of the build errors you saw: In file included from kernel/scs.c:15: ./include/linux/scs.h: In function 'scs_task_reset': 
./include/linux/scs.h:26:34: error: implicit declaration of function 'task_thread_info' [-Werror=implicit-function-declaration] This is fixed with: diff --git a/kernel/scs.c b/kernel/scs.c index ca9e707049cb..719ab53adc8a 100644 --- a/kernel/scs.c +++ b/kernel/scs.c @@ -5,6 +5,7 @@ * Copyright (C) 2019 Google LLC */ +#include <linux/sched/thread_info_api.h> #include <linux/sched.h> #include <linux/mm_page_address.h> #include <linux/mm_api.h> Then there's the build failure in init/main.c: > It looks like on mainline, init_shadow_call_stack is in defined and used > in init/init_task.c but now, it is used in init/main.c, with no > declaration to allow the compiler to find the definition. I guess moving > init_shadow_call_stack out of init/init_task.c to somewhere more common > would fix this but it depends on SCS_SIZE, which is defined in > include/linux/scs.h, and as soon as I tried to include that in another > file, the build broke further... Any ideas you have would be appreciated > :) for benchmarking purposes, I just disabled CONFIG_SHADOW_CALL_STACK. So I see: In file included from ./include/linux/thread_info.h:63, from ./arch/arm64/include/asm/smp.h:32, from ./include/linux/smp_api.h:15, from ./include/linux/percpu.h:6, from ./include/linux/softirq.h:8, from init/main.c:17: init/main.c: In function 'init_per_task_early': ./arch/arm64/include/asm/thread_info.h:113:27: error: 'init_shadow_call_stack' undeclared (first use in this function) 113 | .scs_base = init_shadow_call_stack, \ | ^~~~~~~~~~~~~~~~~~~~~~ This looks pretty straightforward, does this patch solve it? 
include/linux/scs.h | 3 +++ init/main.c | 1 + 2 files changed, 4 insertions(+) diff --git a/include/linux/scs.h b/include/linux/scs.h index 18122d9e17ff..863932a9347a 100644 --- a/include/linux/scs.h +++ b/include/linux/scs.h @@ -8,6 +8,7 @@ #ifndef _LINUX_SCS_H #define _LINUX_SCS_H +#include <linux/sched/thread_info_api.h> #include <linux/gfp.h> #include <linux/poison.h> #include <linux/sched.h> @@ -25,6 +26,8 @@ #define task_scs(tsk) (task_thread_info(tsk)->scs_base) #define task_scs_sp(tsk) (task_thread_info(tsk)->scs_sp) +extern unsigned long init_shadow_call_stack[SCS_SIZE / sizeof(long)]; + void *scs_alloc(int node); void scs_free(void *s); void scs_init(void); diff --git a/init/main.c b/init/main.c index c9eb3ecbe18c..74ccad445009 100644 --- a/init/main.c +++ b/init/main.c @@ -12,6 +12,7 @@ #define DEBUG /* Enable initcall_debug */ +#include <linux/scs.h> #include <linux/workqueue_api.h> #include <linux/sysctl.h> #include <linux/softirq.h> I've applied these fixes, with that CONFIG_SHADOW_CALL_STACK=y builds fine on ARM64 - but I performed no runtime testing. I've backmerged this into: headers/deps: per_task, arm64, x86: Convert task_struct::thread to a per_task() field where this bug originated from. I.e. I think the bug was simply to make main.c aware of the array, now that the INIT_THREAD initialization is done there. We could move over the init_shadow_call_stack[] array there and make it static to begin with? I don't think anything truly relies on it being a global symbol. > 3. 
Nested function in arch/x86/kernel/asm-offsets.c > diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c > index ff3f8ed5d0a2..a6d56f4697cd 100644 > --- a/arch/x86/kernel/asm-offsets.c > +++ b/arch/x86/kernel/asm-offsets.c > @@ -35,10 +35,10 @@ > # include "asm-offsets_64.c" > #endif > > -static void __used common(void) > -{ > #include "../../../kernel/sched/per_task_area_struct_defs.h" > > +static void __used common(void) > +{ > BLANK(); > DEFINE(TASK_threadsp, offsetof(struct task_struct, per_task_area) + > offsetof(struct task_struct_per_task, thread) + Ha, that code is bogus, it's a merge bug of mine. Super interesting that GCC still managed to include the header ... I've applied your fix. > 4. Build error in kernel/gcov/clang.c > 8 errors generated. > > I resolved this with: > > diff --git a/kernel/gcov/clang.c b/kernel/gcov/clang.c > index 6ee385f6ad47..29f0899ba209 100644 > --- a/kernel/gcov/clang.c > +++ b/kernel/gcov/clang.c > @@ -52,6 +52,7 @@ > #include <linux/ratelimit.h> > #include <linux/slab.h> > #include <linux/mm.h> > +#include <linux/string.h> > #include "gcov.h" Thank you - applied! > typedef void (*llvm_gcov_callback)(void); > > > 5. BPF errors > > With Arch Linux's config (https://github.com/archlinux/svntogit-packages/raw/packages/linux/trunk/config), > I see the following errors: > > kernel/bpf/preload/iterators/iterators.c:3:10: fatal error: 'linux/sched/signal.h' file not found > #include <linux/sched/signal.h> > ^~~~~~~~~~~~~~~~~~~~~~ > 1 error generated. > > kernel/bpf/sysfs_btf.c:21:2: error: implicitly declaring library function 'memcpy' with type 'void *(void *, const void *, unsigned long)' [-Werror,-Wimplicit-function-declaration] > memcpy(buf, __start_BTF + off, len); > ^ > kernel/bpf/sysfs_btf.c:21:2: note: include the header <string.h> or explicitly provide a declaration for 'memcpy' > 1 error generated. > > The second error is obviously fixed by just including string.h as above. Applied. 
> I am not sure what is wrong with the first one; the includes all appear > to be userland headers, rather than kernel ones, so maybe an -I flag is > not present that should be? To work around it, I disabled > CONFIG_BPF_PRELOAD. Yeah, this should be fixed by simply removing the two stray dependencies that found their way into this user-space code: kernel/bpf/preload/iterators/iterators.bpf.c | 1 - kernel/bpf/preload/iterators/iterators.c | 1 - 2 files changed, 2 deletions(-) diff --git a/kernel/bpf/preload/iterators/iterators.bpf.c b/kernel/bpf/preload/iterators/iterators.bpf.c index 41ae00edeecf..03af863314ea 100644 --- a/kernel/bpf/preload/iterators/iterators.bpf.c +++ b/kernel/bpf/preload/iterators/iterators.bpf.c @@ -1,6 +1,5 @@ // SPDX-License-Identifier: GPL-2.0 /* Copyright (c) 2020 Facebook */ -#include <linux/seq_file.h> #include <linux/bpf.h> #include <bpf/bpf_helpers.h> #include <bpf/bpf_core_read.h> diff --git a/kernel/bpf/preload/iterators/iterators.c b/kernel/bpf/preload/iterators/iterators.c index d702cbf7ddaf..5d872a705470 100644 --- a/kernel/bpf/preload/iterators/iterators.c +++ b/kernel/bpf/preload/iterators/iterators.c @@ -1,6 +1,5 @@ // SPDX-License-Identifier: GPL-2.0 /* Copyright (c) 2020 Facebook */ -#include <linux/sched/signal.h> #include <errno.h> #include <stdio.h> #include <stdlib.h> > 6. resolve_btfids warning > > After working around the above errors, with either GCC or clang, I see > the following warnings with Arch Linux's configuration: > > WARN: multiple IDs found for 'task_struct': 103, 23549 - using 103 > WARN: multiple IDs found for 'path': 1166, 23551 - using 1166 > WARN: multiple IDs found for 'inode': 997, 23561 - using 997 > WARN: multiple IDs found for 'file': 714, 23566 - using 714 > WARN: multiple IDs found for 'seq_file': 1120, 23673 - using 1120 > > Which appears to come from symbols_resolve() in > tools/bpf/resolve_btfids/main.c. Hm, is this perhaps related to CONFIG_KALLSYMS_FAST=y? 
If yes then turning it off might help.

I don't really know this area of BPF all that well, maybe someone else can
see what the problem is? The error message is not self-explanatory.

>
> ########################################################################
>
> I am very excited to see where this goes, it is a herculean effort but I
> think it will be worth it in the long run. Let me know if there is any
> more information or input that I can provide, cheers!

Your testing & patch sending efforts are much appreciated!! You'd help me
most by continuing on the same path with new fast-headers releases as well,
whenever you find the time. :-)

BTW, you can always pick up my latest Work-In-Progress branch from:

   git://git.kernel.org/pub/scm/linux/kernel/git/mingo/tip.git sched/headers

The 'master' branch will carry the release.

The sched/headers branch is already rebased to -rc8 and has some other
changes as well. It should normally work, with less testing than the main
releases, but will at times have fixes at the tail waiting to be backmerged
in a bisect-friendly way.

Thanks,

	Ingo
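As a side note on the tail-attribute hazard described in the commit message quoted earlier in this reply, the effect can be reproduced in isolation. This is a minimal sketch under stated assumptions, not kernel code: demo_cacheline_aligned, struct demo_tail_ops and demo_stray_variable_exists() are hypothetical names, and the macro is deliberately left undefined to mimic a translation unit that never included <linux/cache.h>.

```c
/*
 * Deliberately NOT defining demo_cacheline_aligned here, to mimic a
 * file where the header providing the attribute macro is missing.
 */

/*
 * Tail position: with the macro undefined, the trailing identifier is
 * parsed as a declarator, so this quietly defines a file-scope variable
 * named 'demo_cacheline_aligned' of type 'struct demo_tail_ops'.
 * The compiler reports nothing.
 */
struct demo_tail_ops {
	int (*revalidate)(int flags);
} demo_cacheline_aligned;

/* The stray object is real: it can be written and measured like any variable. */
int demo_stray_variable_exists(void)
{
	demo_cacheline_aligned.revalidate = 0;
	return sizeof(demo_cacheline_aligned) == sizeof(struct demo_tail_ops);
}
```

If such a definition sits in a header pulled into several object files, each of them ends up defining the same symbol, which would be consistent with the "multiple definition" linker errors quoted above.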
* [DEBUG PATCH] DO NOT MERGE: Enable SHADOW_CALL_STACK on GCC builds, for build testing
  2022-01-04 10:47   ` Ingo Molnar
@ 2022-01-04 10:56     ` Ingo Molnar
  2022-01-04 11:02     ` [PATCH] headers/deps: dcache: Move the ____cacheline_aligned attribute to the head of the definition Ingo Molnar
                       ` (4 subsequent siblings)
  5 siblings, 0 replies; 54+ messages in thread
From: Ingo Molnar @ 2022-01-04 10:56 UTC
To: Nathan Chancellor
Cc: Linus Torvalds, linux-kernel, linux-arch, Andrew Morton, Peter Zijlstra, Thomas Gleixner, Greg Kroah-Hartman, David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet, Al Viro, llvm

* Ingo Molnar <mingo@kernel.org> wrote:

> > 2. Error with CONFIG_SHADOW_CALL_STACK
>
> So this feature depends on Clang:
>
>  # Supported by clang >= 7.0
>  config CC_HAVE_SHADOW_CALL_STACK
>          def_bool $(cc-option, -fsanitize=shadow-call-stack -ffixed-x18)
>
> No way to activate it under my GCC cross-build toolchain, right?
>
> But ... I hacked the build mode on with GCC using this patch:
>
> From: Ingo Molnar <mingo@kernel.org>
> Date: Tue, 4 Jan 2022 11:26:09 +0100
> Subject: [PATCH] DO NOT MERGE: Enable SHADOW_CALL_STACK on GCC builds, for build testing
>
> NOT-Signed-off-by: Ingo Molnar <mingo@kernel.org>

Ok, I've attached the patch again instead of embedding it in the middle of
a long discussion, for future reference.

Thanks,

	Ingo

=====================>
From: Ingo Molnar <mingo@kernel.org>
Date: Tue, 4 Jan 2022 11:26:09 +0100
Subject: [PATCH] DO NOT MERGE: Enable SHADOW_CALL_STACK on GCC builds, for build testing

NOT-Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 Makefile           | 2 +-
 arch/Kconfig       | 2 +-
 arch/arm64/Kconfig | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/Makefile b/Makefile
index 16d7f83ac368..bbab462e7509 100644
--- a/Makefile
+++ b/Makefile
@@ -888,7 +888,7 @@ LDFLAGS_vmlinux += --gc-sections
 endif
 
 ifdef CONFIG_SHADOW_CALL_STACK
-CC_FLAGS_SCS := -fsanitize=shadow-call-stack
+CC_FLAGS_SCS :=
 KBUILD_CFLAGS += $(CC_FLAGS_SCS)
 export CC_FLAGS_SCS
 endif
diff --git a/arch/Kconfig b/arch/Kconfig
index 4e56f66fdbcf..2103d9da4fe1 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -605,7 +605,7 @@ config ARCH_SUPPORTS_SHADOW_CALL_STACK
 
 config SHADOW_CALL_STACK
 	bool "Clang Shadow Call Stack"
-	depends on CC_IS_CLANG && ARCH_SUPPORTS_SHADOW_CALL_STACK
+	depends on ARCH_SUPPORTS_SHADOW_CALL_STACK
 	depends on DYNAMIC_FTRACE_WITH_REGS || !FUNCTION_GRAPH_TRACER
 	help
 	  This option enables Clang's Shadow Call Stack, which uses a
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index c4207cf9bb17..952f3e56e0a7 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1183,7 +1183,7 @@ config ARCH_HAS_FILTER_PGPROT
 
 # Supported by clang >= 7.0
 config CC_HAVE_SHADOW_CALL_STACK
-	def_bool $(cc-option, -fsanitize=shadow-call-stack -ffixed-x18)
+	def_bool y
 
 config PARAVIRT
 	bool "Enable paravirtualization code"

^ permalink raw reply related [flat|nested] 54+ messages in thread
* [PATCH] headers/deps: dcache: Move the ____cacheline_aligned attribute to the head of the definition 2022-01-04 10:47 ` Ingo Molnar 2022-01-04 10:56 ` [DEBUG PATCH] DO NOT MERGE: Enable SHADOW_CALL_STACK on GCC builds, for build testing Ingo Molnar @ 2022-01-04 11:02 ` Ingo Molnar 2022-01-04 15:05 ` kernel test robot 2022-01-04 17:51 ` Nathan Chancellor 2022-01-04 11:19 ` [TREE] "Fast Kernel Headers" Tree WIP/development branch Ingo Molnar ` (3 subsequent siblings) 5 siblings, 2 replies; 54+ messages in thread From: Ingo Molnar @ 2022-01-04 11:02 UTC (permalink / raw) To: Nathan Chancellor, Al Viro, Linus Torvalds, Andrew Morton Cc: Linus Torvalds, linux-kernel, linux-arch, Andrew Morton, Peter Zijlstra, Thomas Gleixner, Greg Kroah-Hartman, David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet, Al Viro, llvm * Ingo Molnar <mingo@kernel.org> wrote: > > 1. Position of certain attributes > > > > In some commits, you move the cacheline_aligned attributes from after > > the closing brace on structures to before the struct keyword, which > > causes clang to warn (and error with CONFIG_WERROR): > > > > In file included from arch/arm64/kernel/asm-offsets.c:9: > > In file included from arch/arm64/kernel/../../../kernel/sched/per_task_area_struct.h:33: > > In file included from ./include/linux/perf_event_api.h:17: > > In file included from ./include/linux/perf_event_types.h:41: > > In file included from ./include/linux/ftrace.h:18: > > In file included from ./arch/arm64/include/asm/ftrace.h:53: > > In file included from ./include/linux/compat.h:11: > > ./include/linux/fs_types.h:997:1: error: attribute '__aligned__' is ignored, place it after "struct" to apply attribute to type declaration [-Werror,-Wignored-attributes] > > ____cacheline_aligned > > ^ > > ./include/linux/cache.h:41:46: note: expanded from macro '____cacheline_aligned' > > #define ____cacheline_aligned __attribute__((__aligned__(SMP_CACHE_BYTES))) > > Yeah, so this is a *really* stupid warning 
from Clang. > > Putting the attribute after 'struct' risks the hard to track down bugs when > a <linux/cache.h> inclusion is missing, which scenario I pointed out in > this commit: > > headers/deps: dcache: Move the ____cacheline_aligned attribute to the head of the definition > > When changing <linux/dcache.h> I removed the <linux/spinlock_api.h> header, > which caused a couple of hundred of mysterious, somewhat obscure link time errors: > > ld: net/sctp/tsnmap.o:(.bss+0x0): multiple definition of `____cacheline_aligned_in_smp'; init/do_mounts_rd.o:(.bss+0x0): first defined here > ld: net/sctp/tsnmap.o:(.bss+0x40): multiple definition of `____cacheline_aligned'; init/do_mounts_rd.o:(.bss+0x40): first defined here > ld: net/sctp/debug.o:(.bss+0x0): multiple definition of `____cacheline_aligned_in_smp'; init/do_mounts_rd.o:(.bss+0x0): first defined here > ld: net/sctp/debug.o:(.bss+0x40): multiple definition of `____cacheline_aligned'; init/do_mounts_rd.o:(.bss+0x40): first defined here > > After a bit of head-scratching, what happened is that 'struct dentry_operations' > has the ____cacheline_aligned attribute at the tail of the type definition - > which turned into a local variable definition when <linux/cache.h> was not > included - which <linux/spinlock_api.h> includes into <linux/dcache.h> indirectly. > > There were no compile time errors, only link time errors. > > Move the attribute to the head of the definition, in which case > a missing <linux/cache.h> inclusion creates an immediate build failure: > > In file included from ./include/linux/fs.h:9, > from ./include/linux/fsverity.h:14, > from fs/verity/fsverity_private.h:18, > from fs/verity/read_metadata.c:8: > ./include/linux/dcache.h:132:22: error: expected ‘;’ before ‘struct’ > 132 | ____cacheline_aligned > | ^ > | ; > 133 | struct dentry_operations { > | ~~~~~~ > > No change in functionality. > > Signed-off-by: Ingo Molnar <mingo@kernel.org> > > Can this Clang warning be disabled? 
Ok, broke out this issue into its own thread, in the form of a patch submission - so that others don't have to wade through a massive tree to find a single commit ...

I'll of course drop these (non-essential) cleanups if the upstream policy is to follow Clang's quirk/convention, but I find the forced attribute tail-position a sad misfeature, due to the reasons outlined in this patch: a straightforward build failure in case an attribute is not defined is far preferable to spurious creation of variables with link-time errors that don't actually highlight the exact nature of the bug ...

Thanks,

	Ingo

=====================>
Date: Sun, 20 Jun 2021 09:41:45 +0200
Subject: [PATCH] headers/deps: dcache: Move the ____cacheline_aligned attribute to the head of the definition

When changing <linux/dcache.h> I removed the <linux/spinlock_api.h> header,
which caused a couple of hundred of mysterious, somewhat obscure link time errors:

  ld: net/sctp/tsnmap.o:(.bss+0x0): multiple definition of `____cacheline_aligned_in_smp'; init/do_mounts_rd.o:(.bss+0x0): first defined here
  ld: net/sctp/tsnmap.o:(.bss+0x40): multiple definition of `____cacheline_aligned'; init/do_mounts_rd.o:(.bss+0x40): first defined here
  ld: net/sctp/debug.o:(.bss+0x0): multiple definition of `____cacheline_aligned_in_smp'; init/do_mounts_rd.o:(.bss+0x0): first defined here
  ld: net/sctp/debug.o:(.bss+0x40): multiple definition of `____cacheline_aligned'; init/do_mounts_rd.o:(.bss+0x40): first defined here

After a bit of head-scratching, what happened is that 'struct dentry_operations'
has the ____cacheline_aligned attribute at the tail of the type definition -
which turned into a local variable definition when <linux/cache.h> was not
included - which <linux/spinlock_api.h> includes into <linux/dcache.h> indirectly.

There were no compile time errors, only link time errors.

Move the attribute to the head of the definition, in which case
a missing <linux/cache.h> inclusion creates an immediate build failure:

  In file included from ./include/linux/fs.h:9,
                   from ./include/linux/fsverity.h:14,
                   from fs/verity/fsverity_private.h:18,
                   from fs/verity/read_metadata.c:8:
  ./include/linux/dcache.h:132:22: error: expected ‘;’ before ‘struct’
    132 | ____cacheline_aligned
        |                      ^
        |                      ;
    133 | struct dentry_operations {
        | ~~~~~~

No change in functionality.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 include/linux/dcache.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 41062093ec9b..0482c3d6f1ce 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -129,6 +129,7 @@ enum dentry_d_lock_class
 	DENTRY_D_LOCK_NESTED
 };
 
+____cacheline_aligned
 struct dentry_operations {
 	int (*d_revalidate)(struct dentry *, unsigned int);
 	int (*d_weak_revalidate)(struct dentry *, unsigned int);
@@ -144,7 +145,7 @@ struct dentry_operations {
 	struct vfsmount *(*d_automount)(struct path *);
 	int (*d_manage)(const struct path *, bool);
 	struct dentry *(*d_real)(struct dentry *, const struct inode *);
-} ____cacheline_aligned;
+};
 
 /*
  * Locking rules for dentry_operations callbacks are to be found in

^ permalink raw reply related [flat|nested] 54+ messages in thread
* Re: [PATCH] headers/deps: dcache: Move the ____cacheline_aligned attribute to the head of the definition 2022-01-04 11:02 ` [PATCH] headers/deps: dcache: Move the ____cacheline_aligned attribute to the head of the definition Ingo Molnar @ 2022-01-04 15:05 ` kernel test robot 2022-01-04 17:51 ` Nathan Chancellor 1 sibling, 0 replies; 54+ messages in thread From: kernel test robot @ 2022-01-04 15:05 UTC (permalink / raw) To: Ingo Molnar, Nathan Chancellor, Al Viro, Linus Torvalds, Andrew Morton Cc: llvm, kbuild-all, LKML, Linux Memory Management List, linux-arch, Peter Zijlstra, Thomas Gleixner, Greg Kroah-Hartman Hi Ingo, I love your patch! Yet something to improve: [auto build test ERROR on linus/master] [also build test ERROR on v5.16-rc8 next-20211224] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch] url: https://github.com/0day-ci/linux/commits/Ingo-Molnar/headers-deps-dcache-Move-the-____cacheline_aligned-attribute-to-the-head-of-the-definition/20220104-190351 base: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git c9e6606c7fe92b50a02ce51dda82586ebdf99b48 config: arm64-buildonly-randconfig-r004-20220104 (https://download.01.org/0day-ci/archive/20220104/202201042231.vdt1cNrS-lkp@intel.com/config) compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project b50fea47b6c454581fce89af359f3afe5154986c) reproduce (this is a W=1 build): wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # install arm64 cross compiling tool for clang build # apt-get install binutils-aarch64-linux-gnu # https://github.com/0day-ci/linux/commit/a9357af49d3cae2b1b4b8bbb7f1adf9ed381bf46 git remote add linux-review https://github.com/0day-ci/linux git fetch --no-tags linux-review 
Ingo-Molnar/headers-deps-dcache-Move-the-____cacheline_aligned-attribute-to-the-head-of-the-definition/20220104-190351 git checkout a9357af49d3cae2b1b4b8bbb7f1adf9ed381bf46 # save the config file to linux build tree mkdir build_dir COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=arm64 SHELL=/bin/bash drivers/phy/amlogic/ lib/ If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot <lkp@intel.com> All error/warnings (new ones prefixed by >>): In file included from arch/arm64/kernel/asm-offsets.c:10: In file included from include/linux/arm_sdei.h:8: In file included from include/acpi/ghes.h:5: In file included from include/acpi/apei.h:9: In file included from include/linux/acpi.h:15: In file included from include/linux/device.h:32: In file included from include/linux/device/driver.h:21: In file included from include/linux/module.h:19: In file included from include/linux/elf.h:6: In file included from arch/arm64/include/asm/elf.h:141: In file included from include/linux/fs.h:8: >> include/linux/dcache.h:137:1: warning: attribute '__aligned__' is ignored, place it after "struct" to apply attribute to type declaration [-Wignored-attributes] ____cacheline_aligned ^ include/linux/cache.h:41:46: note: expanded from macro '____cacheline_aligned' #define ____cacheline_aligned __attribute__((__aligned__(SMP_CACHE_BYTES))) ^ 1 warning generated. 
-- In file included from drivers/phy/amlogic/phy-meson-g12a-usb2.c:16: In file included from include/linux/module.h:19: In file included from include/linux/elf.h:6: In file included from arch/arm64/include/asm/elf.h:141: In file included from include/linux/fs.h:8: >> include/linux/dcache.h:137:1: warning: attribute '__aligned__' is ignored, place it after "struct" to apply attribute to type declaration [-Wignored-attributes] ____cacheline_aligned ^ include/linux/cache.h:41:46: note: expanded from macro '____cacheline_aligned' #define ____cacheline_aligned __attribute__((__aligned__(SMP_CACHE_BYTES))) ^ drivers/phy/amlogic/phy-meson-g12a-usb2.c:311:17: warning: cast to smaller integer type 'enum meson_soc_id' from 'const void *' [-Wvoid-pointer-to-enum-cast] priv->soc_id = (enum meson_soc_id)of_device_get_match_data(&pdev->dev); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2 warnings generated. -- In file included from lib/radix-tree.c:15: In file included from include/linux/cpu.h:17: In file included from include/linux/node.h:18: In file included from include/linux/device.h:32: In file included from include/linux/device/driver.h:21: In file included from include/linux/module.h:19: In file included from include/linux/elf.h:6: In file included from arch/arm64/include/asm/elf.h:141: In file included from include/linux/fs.h:8: >> include/linux/dcache.h:137:1: warning: attribute '__aligned__' is ignored, place it after "struct" to apply attribute to type declaration [-Wignored-attributes] ____cacheline_aligned ^ include/linux/cache.h:41:46: note: expanded from macro '____cacheline_aligned' #define ____cacheline_aligned __attribute__((__aligned__(SMP_CACHE_BYTES))) ^ lib/radix-tree.c:288:6: warning: no previous prototype for function 'radix_tree_node_rcu_free' [-Wmissing-prototypes] void radix_tree_node_rcu_free(struct rcu_head *head) ^ lib/radix-tree.c:288:1: note: declare 'static' if the function is not intended to be used outside of this translation unit 
void radix_tree_node_rcu_free(struct rcu_head *head) ^ static 2 warnings generated. -- In file included from lib/test_bitops.c:9: In file included from include/linux/module.h:19: In file included from include/linux/elf.h:6: In file included from arch/arm64/include/asm/elf.h:141: In file included from include/linux/fs.h:8: >> include/linux/dcache.h:137:1: error: attribute '__aligned__' is ignored, place it after "struct" to apply attribute to type declaration [-Werror,-Wignored-attributes] ____cacheline_aligned ^ include/linux/cache.h:41:46: note: expanded from macro '____cacheline_aligned' #define ____cacheline_aligned __attribute__((__aligned__(SMP_CACHE_BYTES))) ^ 1 error generated. -- In file included from lib/test_ida.c:10: In file included from include/linux/module.h:19: In file included from include/linux/elf.h:6: In file included from arch/arm64/include/asm/elf.h:141: In file included from include/linux/fs.h:8: >> include/linux/dcache.h:137:1: warning: attribute '__aligned__' is ignored, place it after "struct" to apply attribute to type declaration [-Wignored-attributes] ____cacheline_aligned ^ include/linux/cache.h:41:46: note: expanded from macro '____cacheline_aligned' #define ____cacheline_aligned __attribute__((__aligned__(SMP_CACHE_BYTES))) ^ lib/test_ida.c:16:6: warning: no previous prototype for function 'ida_dump' [-Wmissing-prototypes] void ida_dump(struct ida *ida) { } ^ lib/test_ida.c:16:1: note: declare 'static' if the function is not intended to be used outside of this translation unit void ida_dump(struct ida *ida) { } ^ static 2 warnings generated. 
-- In file included from lib/test_printf.c:10: In file included from include/linux/module.h:19: In file included from include/linux/elf.h:6: In file included from arch/arm64/include/asm/elf.h:141: In file included from include/linux/fs.h:8: >> include/linux/dcache.h:137:1: warning: attribute '__aligned__' is ignored, place it after "struct" to apply attribute to type declaration [-Wignored-attributes] ____cacheline_aligned ^ include/linux/cache.h:41:46: note: expanded from macro '____cacheline_aligned' #define ____cacheline_aligned __attribute__((__aligned__(SMP_CACHE_BYTES))) ^ lib/test_printf.c:157:52: warning: format specifies type 'unsigned char' but the argument has type 'int' [-Wformat] test("0|1|1|128|255", "%hhu|%hhu|%hhu|%hhu|%hhu", 0, 1, 257, 128, -1); ~~~~ ^ %d lib/test_printf.c:137:40: note: expanded from macro 'test' __test(expect, strlen(expect), fmt, ##__VA_ARGS__) ~~~ ^~~~~~~~~~~ lib/test_printf.c:157:55: warning: format specifies type 'unsigned char' but the argument has type 'int' [-Wformat] test("0|1|1|128|255", "%hhu|%hhu|%hhu|%hhu|%hhu", 0, 1, 257, 128, -1); ~~~~ ^ %d lib/test_printf.c:137:40: note: expanded from macro 'test' __test(expect, strlen(expect), fmt, ##__VA_ARGS__) ~~~ ^~~~~~~~~~~ lib/test_printf.c:157:58: warning: format specifies type 'unsigned char' but the argument has type 'int' [-Wformat] test("0|1|1|128|255", "%hhu|%hhu|%hhu|%hhu|%hhu", 0, 1, 257, 128, -1); ~~~~ ^~~ %d lib/test_printf.c:137:40: note: expanded from macro 'test' __test(expect, strlen(expect), fmt, ##__VA_ARGS__) ~~~ ^~~~~~~~~~~ lib/test_printf.c:157:63: warning: format specifies type 'unsigned char' but the argument has type 'int' [-Wformat] test("0|1|1|128|255", "%hhu|%hhu|%hhu|%hhu|%hhu", 0, 1, 257, 128, -1); ~~~~ ^~~ %d lib/test_printf.c:137:40: note: expanded from macro 'test' __test(expect, strlen(expect), fmt, ##__VA_ARGS__) ~~~ ^~~~~~~~~~~ lib/test_printf.c:157:68: warning: format specifies type 'unsigned char' but the argument has type 'int' [-Wformat] 
test("0|1|1|128|255", "%hhu|%hhu|%hhu|%hhu|%hhu", 0, 1, 257, 128, -1); ~~~~ ^~ %d lib/test_printf.c:137:40: note: expanded from macro 'test' __test(expect, strlen(expect), fmt, ##__VA_ARGS__) ~~~ ^~~~~~~~~~~ lib/test_printf.c:158:52: warning: format specifies type 'char' but the argument has type 'int' [-Wformat] test("0|1|1|-128|-1", "%hhd|%hhd|%hhd|%hhd|%hhd", 0, 1, 257, 128, -1); ~~~~ ^ %d lib/test_printf.c:137:40: note: expanded from macro 'test' __test(expect, strlen(expect), fmt, ##__VA_ARGS__) ~~~ ^~~~~~~~~~~ lib/test_printf.c:158:55: warning: format specifies type 'char' but the argument has type 'int' [-Wformat] test("0|1|1|-128|-1", "%hhd|%hhd|%hhd|%hhd|%hhd", 0, 1, 257, 128, -1); ~~~~ ^ %d lib/test_printf.c:137:40: note: expanded from macro 'test' __test(expect, strlen(expect), fmt, ##__VA_ARGS__) ~~~ ^~~~~~~~~~~ lib/test_printf.c:158:58: warning: format specifies type 'char' but the argument has type 'int' [-Wformat] test("0|1|1|-128|-1", "%hhd|%hhd|%hhd|%hhd|%hhd", 0, 1, 257, 128, -1); ~~~~ ^~~ %d lib/test_printf.c:137:40: note: expanded from macro 'test' __test(expect, strlen(expect), fmt, ##__VA_ARGS__) ~~~ ^~~~~~~~~~~ lib/test_printf.c:158:63: warning: format specifies type 'char' but the argument has type 'int' [-Wformat] test("0|1|1|-128|-1", "%hhd|%hhd|%hhd|%hhd|%hhd", 0, 1, 257, 128, -1); ~~~~ ^~~ %d lib/test_printf.c:137:40: note: expanded from macro 'test' __test(expect, strlen(expect), fmt, ##__VA_ARGS__) ~~~ ^~~~~~~~~~~ lib/test_printf.c:158:68: warning: format specifies type 'char' but the argument has type 'int' [-Wformat] test("0|1|1|-128|-1", "%hhd|%hhd|%hhd|%hhd|%hhd", 0, 1, 257, 128, -1); ~~~~ ^~ %d lib/test_printf.c:137:40: note: expanded from macro 'test' __test(expect, strlen(expect), fmt, ##__VA_ARGS__) ~~~ ^~~~~~~~~~~ lib/test_printf.c:159:41: warning: format specifies type 'unsigned short' but the argument has type 'int' [-Wformat] test("2015122420151225", "%ho%ho%#ho", 1037, 5282, -11627); ~~~ ^~~~ %o lib/test_printf.c:137:40: 
note: expanded from macro 'test' __test(expect, strlen(expect), fmt, ##__VA_ARGS__) ~~~ ^~~~~~~~~~~ lib/test_printf.c:159:47: warning: format specifies type 'unsigned short' but the argument has type 'int' [-Wformat] test("2015122420151225", "%ho%ho%#ho", 1037, 5282, -11627); ~~~ ^~~~ %o lib/test_printf.c:137:40: note: expanded from macro 'test' __test(expect, strlen(expect), fmt, ##__VA_ARGS__) ~~~ ^~~~~~~~~~~ lib/test_printf.c:159:53: warning: format specifies type 'unsigned short' but the argument has type 'int' [-Wformat] test("2015122420151225", "%ho%ho%#ho", 1037, 5282, -11627); ~~~~ ^~~~~~ %#o lib/test_printf.c:137:40: note: expanded from macro 'test' __test(expect, strlen(expect), fmt, ##__VA_ARGS__) ~~~ ^~~~~~~~~~~ 14 warnings generated. -- In file included from lib/crc32test.c:28: In file included from include/linux/module.h:19: In file included from include/linux/elf.h:6: In file included from arch/arm64/include/asm/elf.h:141: In file included from include/linux/fs.h:8: >> include/linux/dcache.h:137:1: warning: attribute '__aligned__' is ignored, place it after "struct" to apply attribute to type declaration [-Wignored-attributes] ____cacheline_aligned ^ include/linux/cache.h:41:46: note: expanded from macro '____cacheline_aligned' #define ____cacheline_aligned __attribute__((__aligned__(SMP_CACHE_BYTES))) ^ lib/crc32test.c:674:13: warning: variable 'crc' set but not used [-Wunused-but-set-variable] static u32 crc; ^ lib/crc32test.c:754:13: warning: variable 'crc' set but not used [-Wunused-but-set-variable] static u32 crc; ^ 3 warnings generated. 
-- In file included from lib/test_rhashtable.c:17: In file included from include/linux/module.h:19: In file included from include/linux/elf.h:6: In file included from arch/arm64/include/asm/elf.h:141: In file included from include/linux/fs.h:8: >> include/linux/dcache.h:137:1: warning: attribute '__aligned__' is ignored, place it after "struct" to apply attribute to type declaration [-Wignored-attributes] ____cacheline_aligned ^ include/linux/cache.h:41:46: note: expanded from macro '____cacheline_aligned' #define ____cacheline_aligned __attribute__((__aligned__(SMP_CACHE_BYTES))) ^ lib/test_rhashtable.c:451:18: warning: variable 'insert_retries' set but not used [-Wunused-but-set-variable] unsigned int i, insert_retries = 0; ^ 2 warnings generated. -- In file included from lib/devmem_is_allowed.c:11: In file included from include/linux/mm.h:717: In file included from include/linux/huge_mm.h:8: In file included from include/linux/fs.h:8: >> include/linux/dcache.h:137:1: warning: attribute '__aligned__' is ignored, place it after "struct" to apply attribute to type declaration [-Wignored-attributes] ____cacheline_aligned ^ include/linux/cache.h:41:46: note: expanded from macro '____cacheline_aligned' #define ____cacheline_aligned __attribute__((__aligned__(SMP_CACHE_BYTES))) ^ lib/devmem_is_allowed.c:20:5: warning: no previous prototype for function 'devmem_is_allowed' [-Wmissing-prototypes] int devmem_is_allowed(unsigned long pfn) ^ lib/devmem_is_allowed.c:20:1: note: declare 'static' if the function is not intended to be used outside of this translation unit int devmem_is_allowed(unsigned long pfn) ^ static 2 warnings generated. 
-- In file included from lib/lz4/lz4_decompress.c:39: In file included from include/linux/module.h:19: In file included from include/linux/elf.h:6: In file included from arch/arm64/include/asm/elf.h:141: In file included from include/linux/fs.h:8: >> include/linux/dcache.h:137:1: warning: attribute '__aligned__' is ignored, place it after "struct" to apply attribute to type declaration [-Wignored-attributes] ____cacheline_aligned ^ include/linux/cache.h:41:46: note: expanded from macro '____cacheline_aligned' #define ____cacheline_aligned __attribute__((__aligned__(SMP_CACHE_BYTES))) ^ lib/lz4/lz4_decompress.c:506:5: warning: no previous prototype for function 'LZ4_decompress_safe_forceExtDict' [-Wmissing-prototypes] int LZ4_decompress_safe_forceExtDict(const char *source, char *dest, ^ lib/lz4/lz4_decompress.c:506:1: note: declare 'static' if the function is not intended to be used outside of this translation unit int LZ4_decompress_safe_forceExtDict(const char *source, char *dest, ^ static 2 warnings generated. -- In file included from arch/arm64/kernel/asm-offsets.c:10: In file included from include/linux/arm_sdei.h:8: In file included from include/acpi/ghes.h:5: In file included from include/acpi/apei.h:9: In file included from include/linux/acpi.h:15: In file included from include/linux/device.h:32: In file included from include/linux/device/driver.h:21: In file included from include/linux/module.h:19: In file included from include/linux/elf.h:6: In file included from arch/arm64/include/asm/elf.h:141: In file included from include/linux/fs.h:8: >> include/linux/dcache.h:137:1: warning: attribute '__aligned__' is ignored, place it after "struct" to apply attribute to type declaration [-Wignored-attributes] ____cacheline_aligned ^ include/linux/cache.h:41:46: note: expanded from macro '____cacheline_aligned' #define ____cacheline_aligned __attribute__((__aligned__(SMP_CACHE_BYTES))) ^ 1 warning generated. 
arch/arm64/kernel/vdso/vgettimeofday.c:9:5: warning: no previous prototype for function '__kernel_clock_gettime' [-Wmissing-prototypes] int __kernel_clock_gettime(clockid_t clock, ^ arch/arm64/kernel/vdso/vgettimeofday.c:9:1: note: declare 'static' if the function is not intended to be used outside of this translation unit int __kernel_clock_gettime(clockid_t clock, ^ static arch/arm64/kernel/vdso/vgettimeofday.c:15:5: warning: no previous prototype for function '__kernel_gettimeofday' [-Wmissing-prototypes] int __kernel_gettimeofday(struct __kernel_old_timeval *tv, ^ arch/arm64/kernel/vdso/vgettimeofday.c:15:1: note: declare 'static' if the function is not intended to be used outside of this translation unit int __kernel_gettimeofday(struct __kernel_old_timeval *tv, ^ static arch/arm64/kernel/vdso/vgettimeofday.c:21:5: warning: no previous prototype for function '__kernel_clock_getres' [-Wmissing-prototypes] int __kernel_clock_getres(clockid_t clock_id, ^ arch/arm64/kernel/vdso/vgettimeofday.c:21:1: note: declare 'static' if the function is not intended to be used outside of this translation unit int __kernel_clock_getres(clockid_t clock_id, ^ static 3 warnings generated. 
vim +137 include/linux/dcache.h

   136	
 > 137	____cacheline_aligned
   138	struct dentry_operations {
   139		int (*d_revalidate)(struct dentry *, unsigned int);
   140		int (*d_weak_revalidate)(struct dentry *, unsigned int);
   141		int (*d_hash)(const struct dentry *, struct qstr *);
   142		int (*d_compare)(const struct dentry *,
   143				unsigned int, const char *, const struct qstr *);
   144		int (*d_delete)(const struct dentry *);
   145		int (*d_init)(struct dentry *);
   146		void (*d_release)(struct dentry *);
   147		void (*d_prune)(struct dentry *);
   148		void (*d_iput)(struct dentry *, struct inode *);
   149		char *(*d_dname)(struct dentry *, char *, int);
   150		struct vfsmount *(*d_automount)(struct path *);
   151		int (*d_manage)(const struct path *, bool);
   152		struct dentry *(*d_real)(struct dentry *, const struct inode *);
   153	};
   154	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH] headers/deps: dcache: Move the ____cacheline_aligned attribute to the head of the definition 2022-01-04 11:02 ` [PATCH] headers/deps: dcache: Move the ____cacheline_aligned attribute to the head of the definition Ingo Molnar 2022-01-04 15:05 ` kernel test robot @ 2022-01-04 17:51 ` Nathan Chancellor 2022-01-05 0:20 ` Ingo Molnar 1 sibling, 1 reply; 54+ messages in thread From: Nathan Chancellor @ 2022-01-04 17:51 UTC (permalink / raw) To: Ingo Molnar Cc: Al Viro, Linus Torvalds, Andrew Morton, linux-kernel, linux-arch, Peter Zijlstra, Thomas Gleixner, Greg Kroah-Hartman, David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet, llvm On Tue, Jan 04, 2022 at 12:02:34PM +0100, Ingo Molnar wrote: > > * Ingo Molnar <mingo@kernel.org> wrote: > > > > 1. Position of certain attributes > > > > > > In some commits, you move the cacheline_aligned attributes from after > > > the closing brace on structures to before the struct keyword, which > > > causes clang to warn (and error with CONFIG_WERROR): > > > > > > In file included from arch/arm64/kernel/asm-offsets.c:9: > > > In file included from arch/arm64/kernel/../../../kernel/sched/per_task_area_struct.h:33: > > > In file included from ./include/linux/perf_event_api.h:17: > > > In file included from ./include/linux/perf_event_types.h:41: > > > In file included from ./include/linux/ftrace.h:18: > > > In file included from ./arch/arm64/include/asm/ftrace.h:53: > > > In file included from ./include/linux/compat.h:11: > > > ./include/linux/fs_types.h:997:1: error: attribute '__aligned__' is ignored, place it after "struct" to apply attribute to type declaration [-Werror,-Wignored-attributes] > > > ____cacheline_aligned > > > ^ > > > ./include/linux/cache.h:41:46: note: expanded from macro '____cacheline_aligned' > > > #define ____cacheline_aligned __attribute__((__aligned__(SMP_CACHE_BYTES))) > > > > Yeah, so this is a *really* stupid warning from Clang. 
> > > > Putting the attribute after 'struct' risks the hard to track down bugs when > > a <linux/cache.h> inclusion is missing, which scenario I pointed out in > > this commit: > > > > headers/deps: dcache: Move the ____cacheline_aligned attribute to the head of the definition > > > > When changing <linux/dcache.h> I removed the <linux/spinlock_api.h> header, > > which caused a couple of hundred of mysterious, somewhat obscure link time errors: > > > > ld: net/sctp/tsnmap.o:(.bss+0x0): multiple definition of `____cacheline_aligned_in_smp'; init/do_mounts_rd.o:(.bss+0x0): first defined here > > ld: net/sctp/tsnmap.o:(.bss+0x40): multiple definition of `____cacheline_aligned'; init/do_mounts_rd.o:(.bss+0x40): first defined here > > ld: net/sctp/debug.o:(.bss+0x0): multiple definition of `____cacheline_aligned_in_smp'; init/do_mounts_rd.o:(.bss+0x0): first defined here > > ld: net/sctp/debug.o:(.bss+0x40): multiple definition of `____cacheline_aligned'; init/do_mounts_rd.o:(.bss+0x40): first defined here > > > > After a bit of head-scratching, what happened is that 'struct dentry_operations' > > has the ____cacheline_aligned attribute at the tail of the type definition - > > which turned into a local variable definition when <linux/cache.h> was not > > included - which <linux/spinlock_api.h> includes into <linux/dcache.h> indirectly. > > > > There were no compile time errors, only link time errors. > > > > Move the attribute to the head of the definition, in which case > > a missing <linux/cache.h> inclusion creates an immediate build failure: > > > > In file included from ./include/linux/fs.h:9, > > from ./include/linux/fsverity.h:14, > > from fs/verity/fsverity_private.h:18, > > from fs/verity/read_metadata.c:8: > > ./include/linux/dcache.h:132:22: error: expected ‘;’ before ‘struct’ > > 132 | ____cacheline_aligned > > | ^ > > | ; > > 133 | struct dentry_operations { > > | ~~~~~~ > > > > No change in functionality. 
> > > > Signed-off-by: Ingo Molnar <mingo@kernel.org> > > > > Can this Clang warning be disabled? > > Ok, broke out this issue into its own thread, in form of a patch submission > - so that others don't have to wade through a massive tree to find a single > commit ... > > I'll of course drop these (non-essential) cleanups if the upstream policy > is to follow Clang's quirk/convention, but I find the forced attribute > tail-position a sad misfeature, due to the reasons outlined in this patch: > a straightforward build failure in case an attribute is not defined is far > preferable to spurious creation of variables with link-time warnings that > don't actually highlight the exact nature of the bug ... I don't disagree with that sentiment. However, I went and looked at GCC's documentation, which seems to agree with clang's warning here. https://gcc.gnu.org/onlinedocs/gcc/Type-Attributes.html "You may specify type attributes in an enum, struct or union type declaration or definition by placing them immediately after the struct, union or enum keyword. You can also place them just past the closing curly brace of the definition, but this is less preferred because logically the type should be fully defined at the closing brace." 
Nowhere does it mention that it accepts the attribute before the type
keyword and neither compiler respects the attribute if it comes before
the keyword but at least clang warns: https://godbolt.org/z/E9fTecKPv

$ cat test.c
#include <stdio.h>

struct foo {
    int a;
    int b;
};

struct __attribute__ ((aligned (64))) bar {
    int a;
    int b;
};

__attribute__ ((aligned (64))) struct baz {
    int a;
    int b;
};

int main(void)
{
    printf("struct foo alignment: %zd\n", _Alignof(struct foo));
    printf("struct bar alignment: %zd\n", _Alignof(struct bar));
    printf("struct baz alignment: %zd\n", _Alignof(struct baz));
    return 0;
}

$ gcc --version | head -1
gcc (GCC) 11.2.1 20211231

$ gcc -std=gnu89 -Wall -Wextra test.c; and ./a.out
struct foo alignment: 4
struct bar alignment: 64
struct baz alignment: 4

$ clang --version | head -1
clang version 13.0.0

$ clang -std=gnu89 -Wall -Wextra test.c; and ./a.out
test.c:13:17: warning: attribute 'aligned' is ignored, place it after "struct" to apply attribute to type declaration [-Wignored-attributes]
__attribute__ ((aligned (64))) struct baz {
                ^
1 warning generated.
struct foo alignment: 4 struct bar alignment: 64 struct baz alignment: 4 Cheers, Nathan > =====================> > Date: Sun, 20 Jun 2021 09:41:45 +0200 > Subject: [PATCH] headers/deps: dcache: Move the ____cacheline_aligned attribute to the head of the definition > > When changing <linux/dcache.h> I removed the <linux/spinlock_api.h> header, > which caused a couple of hundred of mysterious, somewhat obscure link time errors: > > ld: net/sctp/tsnmap.o:(.bss+0x0): multiple definition of `____cacheline_aligned_in_smp'; init/do_mounts_rd.o:(.bss+0x0): first defined here > ld: net/sctp/tsnmap.o:(.bss+0x40): multiple definition of `____cacheline_aligned'; init/do_mounts_rd.o:(.bss+0x40): first defined here > ld: net/sctp/debug.o:(.bss+0x0): multiple definition of `____cacheline_aligned_in_smp'; init/do_mounts_rd.o:(.bss+0x0): first defined here > ld: net/sctp/debug.o:(.bss+0x40): multiple definition of `____cacheline_aligned'; init/do_mounts_rd.o:(.bss+0x40): first defined here > > After a bit of head-scratching, what happened is that 'struct dentry_operations' > has the ____cacheline_aligned attribute at the tail of the type definition - > which turned into a local variable definition when <linux/cache.h> was not > included - which <linux/spinlock_api.h> includes into <linux/dcache.h> indirectly. > > There were no compile time errors, only link time errors. > > Move the attribute to the head of the definition, in which case > a missing <linux/cache.h> inclusion creates an immediate build failure: > > In file included from ./include/linux/fs.h:9, > from ./include/linux/fsverity.h:14, > from fs/verity/fsverity_private.h:18, > from fs/verity/read_metadata.c:8: > ./include/linux/dcache.h:132:22: error: expected ‘;’ before ‘struct’ > 132 | ____cacheline_aligned > | ^ > | ; > 133 | struct dentry_operations { > | ~~~~~~ > > No change in functionality. 
> > Signed-off-by: Ingo Molnar <mingo@kernel.org> > --- > include/linux/dcache.h | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/include/linux/dcache.h b/include/linux/dcache.h > index 41062093ec9b..0482c3d6f1ce 100644 > --- a/include/linux/dcache.h > +++ b/include/linux/dcache.h > @@ -129,6 +129,7 @@ enum dentry_d_lock_class > DENTRY_D_LOCK_NESTED > }; > > +____cacheline_aligned > struct dentry_operations { > int (*d_revalidate)(struct dentry *, unsigned int); > int (*d_weak_revalidate)(struct dentry *, unsigned int); > @@ -144,7 +145,7 @@ struct dentry_operations { > struct vfsmount *(*d_automount)(struct path *); > int (*d_manage)(const struct path *, bool); > struct dentry *(*d_real)(struct dentry *, const struct inode *); > -} ____cacheline_aligned; > +}; > > /* > * Locking rules for dentry_operations callbacks are to be found in ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH] headers/deps: dcache: Move the ____cacheline_aligned attribute to the head of the definition
  2022-01-04 17:51 ` Nathan Chancellor
@ 2022-01-05 0:20 ` Ingo Molnar
  2022-01-05 0:26 ` [PATCH] headers/deps: Attribute placement fixes for Clang & GCC Ingo Molnar
  0 siblings, 1 reply; 54+ messages in thread
From: Ingo Molnar @ 2022-01-05 0:20 UTC
To: Nathan Chancellor
Cc: Al Viro, Linus Torvalds, Andrew Morton, linux-kernel, linux-arch,
    Peter Zijlstra, Thomas Gleixner, Greg Kroah-Hartman, David S. Miller,
    Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet, llvm

* Nathan Chancellor <nathan@kernel.org> wrote:

> Nowhere does it mention that it accepts the attribute before the type
> keyword and neither compiler respects the attribute if it comes before
> the keyword but at least clang warns: https://godbolt.org/z/E9fTecKPv
>
> $ cat test.c
> #include <stdio.h>
>
> struct foo {
>     int a;
>     int b;
> };
>
> struct __attribute__ ((aligned (64))) bar {
>     int a;
>     int b;
> };
>
> __attribute__ ((aligned (64))) struct baz {
>     int a;
>     int b;
> };
>
> int main(void)
> {
>     printf("struct foo alignment: %zd\n", _Alignof(struct foo));
>     printf("struct bar alignment: %zd\n", _Alignof(struct bar));
>     printf("struct baz alignment: %zd\n", _Alignof(struct baz));
>     return 0;
> }
>
> $ gcc --version | head -1
> gcc (GCC) 11.2.1 20211231
>
> $ gcc -std=gnu89 -Wall -Wextra test.c; and ./a.out
> struct foo alignment: 4
> struct bar alignment: 64
> struct baz alignment: 4

Ugh - so my changes there are outright buggy.

I'm reverting all those attribute position changes as we speak ...

I'm actually happy about this in a way, as it settles the issue nicely.
:-)

Thanks,

	Ingo
* [PATCH] headers/deps: Attribute placement fixes for Clang & GCC 2022-01-05 0:20 ` Ingo Molnar @ 2022-01-05 0:26 ` Ingo Molnar 0 siblings, 0 replies; 54+ messages in thread From: Ingo Molnar @ 2022-01-05 0:26 UTC (permalink / raw) To: Nathan Chancellor Cc: Al Viro, Linus Torvalds, Andrew Morton, linux-kernel, linux-arch, Peter Zijlstra, Thomas Gleixner, Greg Kroah-Hartman, David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet, llvm * Ingo Molnar <mingo@kernel.org> wrote: > Ugh - so my changes there are outright buggy. > > I'm reverting all those attribute position changes as we speak ... > > I'm actually happy about this in a way, as it settles the issue nicely. > :-) And, by the way - by putting the attribute after the 'struct' keyword we get the best of the two worlds: accidentally non-defined attribute shortcuts will still result in a build error. Below is the fix - should be identical to yours (which was whitespace mangled). I'll backmerge these fixes to the originating commits & push out -v2 later today. 
Thanks,

	Ingo

---
 include/linux/dcache.h        | 3 +--
 include/linux/fs_types.h      | 3 +--
 include/linux/netdevice_api.h | 2 +-
 include/net/xdp_types.h       | 2 +-
 4 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 520daf638d06..da7e77a7cede 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -127,8 +127,7 @@ enum dentry_d_lock_class
 	DENTRY_D_LOCK_NESTED
 };
 
-____cacheline_aligned
-struct dentry_operations {
+struct ____cacheline_aligned dentry_operations {
 	int (*d_revalidate)(struct dentry *, unsigned int);
 	int (*d_weak_revalidate)(struct dentry *, unsigned int);
 	int (*d_hash)(const struct dentry *, struct qstr *);
diff --git a/include/linux/fs_types.h b/include/linux/fs_types.h
index b53aadafab1b..e2e1c0827183 100644
--- a/include/linux/fs_types.h
+++ b/include/linux/fs_types.h
@@ -994,8 +994,7 @@ struct file_operations {
 	int (*fadvise)(struct file *, loff_t, loff_t, int);
 } __randomize_layout;
 
-____cacheline_aligned
-struct inode_operations {
+struct ____cacheline_aligned inode_operations {
 	struct dentry * (*lookup) (struct inode *,struct dentry *, unsigned int);
 	const char * (*get_link) (struct dentry *, struct inode *, struct delayed_call *);
 	int (*permission) (struct user_namespace *, struct inode *, int);
diff --git a/include/linux/netdevice_api.h b/include/linux/netdevice_api.h
index 4a8d7688e148..0e5e08dcbb2a 100644
--- a/include/linux/netdevice_api.h
+++ b/include/linux/netdevice_api.h
@@ -49,7 +49,7 @@
 #endif
 
 /* This structure contains an instance of an RX queue. */
-____cacheline_aligned_in_smp struct netdev_rx_queue {
+struct ____cacheline_aligned_in_smp netdev_rx_queue {
 	struct xdp_rxq_info xdp_rxq;
 #ifdef CONFIG_RPS
 	struct rps_map __rcu *rps_map;
diff --git a/include/net/xdp_types.h b/include/net/xdp_types.h
index 442028626b35..accc12372bca 100644
--- a/include/net/xdp_types.h
+++ b/include/net/xdp_types.h
@@ -56,7 +56,7 @@ struct xdp_mem_info {
 struct page_pool;
 
 /* perf critical, avoid false-sharing */
-____cacheline_aligned struct xdp_rxq_info {
+struct ____cacheline_aligned xdp_rxq_info {
 	struct net_device *dev;
 	u32 queue_index;
 	u32 reg_state;
* [TREE] "Fast Kernel Headers" Tree WIP/development branch 2022-01-04 10:47 ` Ingo Molnar 2022-01-04 10:56 ` [DEBUG PATCH] DO NOT MERGE: Enable SHADOW_CALL_STACK on GCC builds, for build testing Ingo Molnar 2022-01-04 11:02 ` [PATCH] headers/deps: dcache: Move the ____cacheline_aligned attribute to the head of the definition Ingo Molnar @ 2022-01-04 11:19 ` Ingo Molnar 2022-01-04 17:25 ` [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" Nick Desaulniers ` (2 subsequent siblings) 5 siblings, 0 replies; 54+ messages in thread From: Ingo Molnar @ 2022-01-04 11:19 UTC (permalink / raw) To: Nathan Chancellor Cc: Linus Torvalds, linux-kernel, linux-arch, Andrew Morton, Peter Zijlstra, Thomas Gleixner, Greg Kroah-Hartman, David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet, Al Viro, llvm * Ingo Molnar <mingo@kernel.org> wrote: > > ######################################################################## > > > > I am very excited to see where this goes, it is a herculean effort but > > I think it will be worth it in the long run. Let me know if there is > > any more information or input that I can provide, cheers! > > Your testing & patch sending efforts are much appreciated!! You'd help me > most by continuing on the same path with new fast-headers releases as > well, whenever you find the time. :-) > > BTW., you can always pick up my latest Work-In-Progress branch from: > > git://git.kernel.org/pub/scm/linux/kernel/git/mingo/tip.git sched/headers > > The 'master' branch will carry the release. > > The sched/headers branch is already rebased to -rc8 and has some other > changes as well. It should normally work, with less testing than the main > releasees, but will at times have fixes at the tail waiting to be > backmerged in a bisect-friendly way. 
Ok, broke out the sched/headers WIP branch into a separate announcement, in
case others want to test:

    git://git.kernel.org/pub/scm/linux/kernel/git/mingo/tip.git sched/headers

Note that I sometimes will update the 'master' branch as well, without a
standalone announcement, if there's some important fix or the previous
version moved away too much.

Also, where I backmerged your fixes to manual commits I credited you with:

    [ Fixes by Nathan Chancellor ]

    Fixed-by: Nathan Chancellor <nathan@kernel.org>

The (rare) exception would be straight dependency additions such as the
<linux/string.h> additions, which are auto-generated from scratch to keep it
maintainable & reviewable - if that's fine with you.

Thanks,

	Ingo
* Re: [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" 2022-01-04 10:47 ` Ingo Molnar ` (2 preceding siblings ...) 2022-01-04 11:19 ` [TREE] "Fast Kernel Headers" Tree WIP/development branch Ingo Molnar @ 2022-01-04 17:25 ` Nick Desaulniers 2022-01-05 0:43 ` Ingo Molnar 2022-01-04 17:50 ` Nathan Chancellor 2022-01-07 0:29 ` Nathan Chancellor 5 siblings, 1 reply; 54+ messages in thread From: Nick Desaulniers @ 2022-01-04 17:25 UTC (permalink / raw) To: Ingo Molnar Cc: Nathan Chancellor, Linus Torvalds, linux-kernel, linux-arch, Andrew Morton, Peter Zijlstra, Thomas Gleixner, Greg Kroah-Hartman, David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet, Al Viro, llvm, ashimida, Arnd Bergmann On Tue, Jan 4, 2022 at 2:47 AM Ingo Molnar <mingo@kernel.org> wrote: > > > * Nathan Chancellor <nathan@kernel.org> wrote: > > > Hi Ingo, > > > > On Sun, Jan 02, 2022 at 10:57:35PM +0100, Ingo Molnar wrote: > > I took the series for a spin with clang and GCC on arm64 and x86_64 and > > I found a few warnings/errors. > > Thank you! > > > 1. 
Position of certain attributes > > > > In some commits, you move the cacheline_aligned attributes from after > > the closing brace on structures to before the struct keyword, which > > causes clang to warn (and error with CONFIG_WERROR): > > > > In file included from arch/arm64/kernel/asm-offsets.c:9: > > In file included from arch/arm64/kernel/../../../kernel/sched/per_task_area_struct.h:33: > > In file included from ./include/linux/perf_event_api.h:17: > > In file included from ./include/linux/perf_event_types.h:41: > > In file included from ./include/linux/ftrace.h:18: > > In file included from ./arch/arm64/include/asm/ftrace.h:53: > > In file included from ./include/linux/compat.h:11: > > ./include/linux/fs_types.h:997:1: error: attribute '__aligned__' is ignored, place it after "struct" to apply attribute to type declaration [-Werror,-Wignored-attributes] > > ____cacheline_aligned > > ^ > > ./include/linux/cache.h:41:46: note: expanded from macro '____cacheline_aligned' > > #define ____cacheline_aligned __attribute__((__aligned__(SMP_CACHE_BYTES))) > > Yeah, so this is a *really* stupid warning from Clang. 
> > Putting the attribute after 'struct' risks the hard to track down bugs when > a <linux/cache.h> inclusion is missing, which scenario I pointed out in > this commit: > > headers/deps: dcache: Move the ____cacheline_aligned attribute to the head of the definition > > When changing <linux/dcache.h> I removed the <linux/spinlock_api.h> header, > which caused a couple of hundred of mysterious, somewhat obscure link time errors: > > ld: net/sctp/tsnmap.o:(.bss+0x0): multiple definition of `____cacheline_aligned_in_smp'; init/do_mounts_rd.o:(.bss+0x0): first defined here > ld: net/sctp/tsnmap.o:(.bss+0x40): multiple definition of `____cacheline_aligned'; init/do_mounts_rd.o:(.bss+0x40): first defined here > ld: net/sctp/debug.o:(.bss+0x0): multiple definition of `____cacheline_aligned_in_smp'; init/do_mounts_rd.o:(.bss+0x0): first defined here > ld: net/sctp/debug.o:(.bss+0x40): multiple definition of `____cacheline_aligned'; init/do_mounts_rd.o:(.bss+0x40): first defined here > > After a bit of head-scratching, what happened is that 'struct dentry_operations' > has the ____cacheline_aligned attribute at the tail of the type definition - > which turned into a local variable definition when <linux/cache.h> was not > included - which <linux/spinlock_api.h> includes into <linux/dcache.h> indirectly. > > There were no compile time errors, only link time errors. > > Move the attribute to the head of the definition, in which case > a missing <linux/cache.h> inclusion creates an immediate build failure: > > In file included from ./include/linux/fs.h:9, > from ./include/linux/fsverity.h:14, > from fs/verity/fsverity_private.h:18, > from fs/verity/read_metadata.c:8: > ./include/linux/dcache.h:132:22: error: expected ‘;’ before ‘struct’ > 132 | ____cacheline_aligned > | ^ > | ; > 133 | struct dentry_operations { > | ~~~~~~ > > No change in functionality. > > Signed-off-by: Ingo Molnar <mingo@kernel.org> > > Can this Clang warning be disabled? 
Clang is warning that the attribute will be ignored because of that positioning. If you disable the warning, code will probably stop working as intended. This warning has at least been helping us make the kernel coding style more consistent. This made me think of d5b421fe02827 ("docs: Explain the desired position of function attributes"), where we adding some text to Documentation/process/coding-style.rst about the positioning of __attribute__'s in function signatures, but I guess this case is data. We probably should add something to the coding style about attributes on data, too. The C standards body is also working on standardizing attributes; at the least I expect some of these things to be ironed out more soon. > > > 2. Error with CONFIG_SHADOW_CALL_STACK > > So this feature depends on Clang: > > # Supported by clang >= 7.0 > config CC_HAVE_SHADOW_CALL_STACK > def_bool $(cc-option, -fsanitize=shadow-call-stack -ffixed-x18) > > No way to activate it under my GCC cross-build toolchain, right? > > But ... I hacked the build mode on with GCC using this patch: Dan Li is working on a GCC patch. If you're up for building GCC from source: https://gcc.gnu.org/pipermail/gcc-patches/2021-December/586204.html -- This is a really cool series Ingo. I'm sure Arnd has seen it by now, but Arnd has been thinking about this area a lot, too. I haven't but I have played with running "include what you use" on the kernel sources; Kconfig being the biggest impediment to that approach. To me, I'm most nervous about "backsliding;" let's say this work lands, at some point probably years in the future, I assume without any form of automation that we might find ourselves at a similar point of header dependencies getting all tangled again. What are your thoughts on where/how/what we could automate to try to help developers in the future keep their header dependencies simpler? 
(Sorry if this was already answered in the cover letter.)

It would be really useful if you were planning a talk at something like
Plumbers on how you go about making these changes. I really hope that once
others understand your workflow, we might be able to help with some form of
automation.

Nice work!
-- 
Thanks,
~Nick Desaulniers
* Re: [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" 2022-01-04 17:25 ` [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" Nick Desaulniers @ 2022-01-05 0:43 ` Ingo Molnar 0 siblings, 0 replies; 54+ messages in thread From: Ingo Molnar @ 2022-01-05 0:43 UTC (permalink / raw) To: Nick Desaulniers Cc: Nathan Chancellor, Linus Torvalds, linux-kernel, linux-arch, Andrew Morton, Peter Zijlstra, Thomas Gleixner, Greg Kroah-Hartman, David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet, Al Viro, llvm, ashimida, Arnd Bergmann * Nick Desaulniers <ndesaulniers@google.com> wrote: > > Can this Clang warning be disabled? > > Clang is warning that the attribute will be ignored because of that > positioning. If you disable the warning, code will probably stop working > as intended. This warning has at least been helping us make the kernel > coding style more consistent. Yeah, indeed, Clang is fully correct to warn here, and these changes in my tree are outright bugs (which bugs Clang found & reported :-). See the fixes below - by doing it this way the 'spurious link failure' problem when a header include is missing should be fixed as well. 
Thanks,

	Ingo

---
 include/linux/dcache.h        | 3 +--
 include/linux/fs_types.h      | 3 +--
 include/linux/netdevice_api.h | 2 +-
 include/net/xdp_types.h       | 2 +-
 4 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 520daf638d06..da7e77a7cede 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -127,8 +127,7 @@ enum dentry_d_lock_class
 	DENTRY_D_LOCK_NESTED
 };
 
-____cacheline_aligned
-struct dentry_operations {
+struct ____cacheline_aligned dentry_operations {
 	int (*d_revalidate)(struct dentry *, unsigned int);
 	int (*d_weak_revalidate)(struct dentry *, unsigned int);
 	int (*d_hash)(const struct dentry *, struct qstr *);
diff --git a/include/linux/fs_types.h b/include/linux/fs_types.h
index b53aadafab1b..e2e1c0827183 100644
--- a/include/linux/fs_types.h
+++ b/include/linux/fs_types.h
@@ -994,8 +994,7 @@ struct file_operations {
 	int (*fadvise)(struct file *, loff_t, loff_t, int);
 } __randomize_layout;
 
-____cacheline_aligned
-struct inode_operations {
+struct ____cacheline_aligned inode_operations {
 	struct dentry * (*lookup) (struct inode *,struct dentry *, unsigned int);
 	const char * (*get_link) (struct dentry *, struct inode *, struct delayed_call *);
 	int (*permission) (struct user_namespace *, struct inode *, int);
diff --git a/include/linux/netdevice_api.h b/include/linux/netdevice_api.h
index 4a8d7688e148..0e5e08dcbb2a 100644
--- a/include/linux/netdevice_api.h
+++ b/include/linux/netdevice_api.h
@@ -49,7 +49,7 @@
 #endif
 
 /* This structure contains an instance of an RX queue. */
-____cacheline_aligned_in_smp struct netdev_rx_queue {
+struct ____cacheline_aligned_in_smp netdev_rx_queue {
 	struct xdp_rxq_info xdp_rxq;
 #ifdef CONFIG_RPS
 	struct rps_map __rcu *rps_map;
diff --git a/include/net/xdp_types.h b/include/net/xdp_types.h
index 442028626b35..accc12372bca 100644
--- a/include/net/xdp_types.h
+++ b/include/net/xdp_types.h
@@ -56,7 +56,7 @@ struct xdp_mem_info {
 struct page_pool;
 
 /* perf critical, avoid false-sharing */
-____cacheline_aligned struct xdp_rxq_info {
+struct ____cacheline_aligned xdp_rxq_info {
 	struct net_device *dev;
 	u32 queue_index;
 	u32 reg_state;
* Re: [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" 2022-01-04 10:47 ` Ingo Molnar ` (3 preceding siblings ...) 2022-01-04 17:25 ` [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" Nick Desaulniers @ 2022-01-04 17:50 ` Nathan Chancellor 2022-01-05 0:35 ` [PATCH] x86/kbuild: Enable CONFIG_KALLSYMS_ALL=y in the defconfigs Ingo Molnar ` (2 more replies) 2022-01-07 0:29 ` Nathan Chancellor 5 siblings, 3 replies; 54+ messages in thread From: Nathan Chancellor @ 2022-01-04 17:50 UTC (permalink / raw) To: Ingo Molnar Cc: Linus Torvalds, linux-kernel, linux-arch, Andrew Morton, Peter Zijlstra, Thomas Gleixner, Greg Kroah-Hartman, David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet, Al Viro, llvm On Tue, Jan 04, 2022 at 11:47:30AM +0100, Ingo Molnar wrote: > > * Nathan Chancellor <nathan@kernel.org> wrote: > > > Hi Ingo, > > > > On Sun, Jan 02, 2022 at 10:57:35PM +0100, Ingo Molnar wrote: > > > Before going into details about how this tree solves 'dependency hell' > > > exactly, here's the current kernel build performance gain with > > > CONFIG_FAST_HEADERS=y enabled, (and with CONFIG_KALLSYMS_FAST=y enabled as > > > well - see below), using a stock x86 Linux distribution's .config with all > > > modules built into the vmlinux: > > > > > > # > > > # Performance counter stats for 'make -j96 vmlinux' (3 runs): > > > # > > > # (Elapsed time in seconds): > > > # > > > > > > v5.16-rc7: 231.34 +- 0.60 secs, 15.5 builds/hour # [ vanilla baseline ] > > > -fast-headers-v1: 129.97 +- 0.51 secs, 27.7 builds/hour # +78.0% improvement > > > > This is really impressive; as someone who constantly builds large > > kernels for test coverage, I am excited about less time to get results. > > Testing on an 80-core arm64 server (the fastest machine I have access to > > at the moment) with LLVM, I can see anywhere from 18% to 35% improvement. 
> >
> >
> > Benchmark 1: ARCH=arm64 defconfig (linux)
> >   Time (mean ± σ): 97.159 s ± 0.246 s [User: 4828.383 s, System: 611.256 s]
> >   Range (min … max): 96.900 s … 97.648 s 10 runs
> > Benchmark 2: ARCH=arm64 defconfig (linux-fast-headers)
> >   Time (mean ± σ): 76.300 s ± 0.107 s [User: 3149.986 s, System: 436.487 s]
> >   Range (min … max): 76.117 s … 76.467 s 10 runs
>
> That looks good, thanks for giving it a test, and thanks for all the fixes!
> :-)
>
> Note that on ARM64 the elapsed time improvement is 'only' 18-35%, because
> the triple-linking of vmlinux serializes much of the build & ARM64
> doesn't have the kallsyms-objtool feature yet.
>
> But we can already see how much faster it became, from the user+system time
> spent building the kernel:
>
>   vanilla: 4828.383 s + 611.256 s = 5439.639 s
>   -fast-headers-v1: 3149.986 s + 436.487 s = 3586.473 s
>
> That's a +51% speedup. :-)
>
> With CONFIG_KALLSYMS_FAST=y on x86, the final link gets faster by about
> 60%-70%, so the header improvements will more directly show up in elapsed
> time as well.
>
> Plus I spent more time looking at x86 header bloat than at ARM64 header
> bloat. In the end I think the improvement could probably be moved into the
> broad 60-70% range that I see on x86.
>
> All the other ARM64 tests show a 37%-43% improvement in CPU time used:
>
> > Benchmark 1: ARCH=arm64 allmodconfig (linux)
> >   Time (mean ± σ): 390.106 s ± 0.192 s [User: 23893.382 s, System: 2802.413 s]
> >   Range (min … max): 389.942 s … 390.513 s 7 runs
> >
> > Benchmark 2: ARCH=arm64 allmodconfig (linux-fast-headers)
> >   Time (mean ± σ): 288.066 s ± 0.621 s [User: 16436.098 s, System: 2117.352 s]
> >   Range (min … max): 287.131 s … 288.982 s 7 runs
>
> # (23893.382+2802.413)/(16436.098+2117.352) = +43% in throughput.
> > > > Benchmark 1: ARCH=arm64 allyesconfig (linux) > > Time (mean ± σ): 557.752 s ± 1.019 s [User: 21227.404 s, System: 2226.121 s] > > Range (min … max): 555.833 s … 558.775 s 7 runs > > > > Benchmark 2: ARCH=arm64 allyesconfig (linux-fast-headers) > > Time (mean ± σ): 473.815 s ± 1.793 s [User: 15351.991 s, System: 1689.630 s] > > Range (min … max): 471.542 s … 476.830 s 7 runs > > # (21227.404+2226.121)/(15351.991+1689.630) = +37% > > > > Benchmark 1: ARCH=x86_64 defconfig (linux) > > Time (mean ± σ): 41.122 s ± 0.190 s [User: 1700.206 s, System: 205.555 s] > > Range (min … max): 40.966 s … 41.515 s 7 runs > > > > Benchmark 2: ARCH=x86_64 defconfig (linux-fast-headers) > > Time (mean ± σ): 36.357 s ± 0.183 s [User: 1134.252 s, System: 152.396 s] > > Range (min … max): 35.983 s … 36.534 s 7 runs > > > # (1700.206+205.555)/(1134.252+152.396) = +48% > > > Summary > > 'ARCH=x86_64 defconfig (linux-fast-headers)' ran > > 1.13 ± 0.01 times faster than 'ARCH=x86_64 defconfig (linux)' > > Now this x86-defconfig result you got is a bit weird - it *should* have > been around ~50% faster on x86 in terms of elapsed time too. 
> > Here's how x86-64 defconfig looks like on my system - with 128 GB RAM & > fast NVDIMMs and 64 CPUs: > > # > # -v5.16-rc8: > # > > $ perf stat --repeat 3 -e instructions,cycles,cpu-clock --sync --pre "make clean >/dev/null" make -j96 vmlinux >/dev/null > > Performance counter stats for 'make -j96 vmlinux' (3 runs): > > 4,906,953,379,372 instructions # 0.90 insn per cycle ( +- 0.00% ) > 5,475,163,448,391 cycles # 3.898 GHz ( +- 0.01% ) > 1,404,614.64 msec cpu-clock # 45.864 CPUs utilized ( +- 0.01% ) > > 30.6258 +- 0.0337 seconds time elapsed ( +- 0.11% ) > > # > # -fast-headers-v1: > # > > $ make defconfig > $ grep KALLSYMS_FAST .config > CONFIG_KALLSYMS_FAST=y > > $ perf stat --repeat 3 -e instructions,cycles,cpu-clock --sync --pre "make clean >/dev/null" make -j96 vmlinux >/dev/null > > Performance counter stats for 'make -j96 vmlinux' (3 runs): > > 3,500,079,269,120 instructions # 0.90 insn per cycle ( +- 0.00% ) > 3,872,081,278,824 cycles # 3.895 GHz ( +- 0.10% ) > 993,448.13 msec cpu-clock # 47.306 CPUs utilized ( +- 0.10% ) > > 21.0004 +- 0.0265 seconds time elapsed ( +- 0.13% ) > > That's a +45.8% speedup in elapsed time, and a +41.4% improvement in > cpu-clock utilization. > > I'm wondering whether your system has some sort of bottleneck? Yes, it is entirely possible. That testing was done on Equinix's c3.large.arm server and I have noticed at times that single threaded tasks seems to take a little bit longer than on my x86_64 box. 
https://metal.equinix.com/product/servers/c3-large-arm/ The all{mod,yes}config tests on that box had a much more noticeable improvement, along the lines of what you were expecting: Benchmark 1: ARCH=x86_64 allmodconfig (linux) Time (mean ± σ): 387.575 s ± 0.288 s [User: 23916.296 s, System: 2814.850 s] Range (min … max): 387.252 s … 388.295 s 10 runs Benchmark 2: ARCH=x86_64 allmodconfig (linux-fast-headers) Time (mean ± σ): 255.934 s ± 0.972 s [User: 15130.494 s, System: 2095.091 s] Range (min … max): 254.655 s … 257.357 s 10 runs Summary 'ARCH=x86_64 allmodconfig (linux-fast-headers)' ran 1.51 ± 0.01 times faster than 'ARCH=x86_64 allmodconfig (linux)' # (23916.296+2814.850)/(15130.494+2095.091) = +55.18% Benchmark 1: ARCH=x86_64 allyesconfig (linux) Time (mean ± σ): 568.027 s ± 1.071 s [User: 21985.096 s, System: 2357.516 s] Range (min … max): 566.769 s … 569.801 s 10 runs Benchmark 2: ARCH=x86_64 allyesconfig (linux-fast-headers) Time (mean ± σ): 381.248 s ± 0.919 s [User: 14916.766 s, System: 1728.218 s] Range (min … max): 379.746 s … 382.852 s 10 runs Summary 'ARCH=x86_64 allyesconfig (linux-fast-headers)' ran 1.49 ± 0.00 times faster than 'ARCH=x86_64 allyesconfig (linux)' # (21985.096+2357.516)/(14916.766+1728.218) = +46.25% > One thing I do though when running benchmarks is to switch the cpufreq > governor to 'performance', via something like: > > NR_CPUS=$(nproc --all) > > curr=$(cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor) > next=performance > > echo "# setting all $NR_CPUS CPUs from '"$curr"' to the '"$next"' governor" > > for ((cpu=0; cpu<$NR_CPUS; cpu++)); do > G=/sys/devices/system/cpu/cpu$cpu/cpufreq/scaling_governor > [ -f $G ] && echo $next > $G > done > > This minimizes the amount of noise across iterations and makes the results > more dependable: > > 30.6258 +- 0.0337 seconds time elapsed ( +- 0.11% ) > 21.0004 +- 0.0265 seconds time elapsed ( +- 0.13% ) Good point. 
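
For reference, the "throughput" percentages quoted above are the ratio of
total CPU time (user + system) between the baseline build and the
fast-headers build. Here is a minimal Python sketch of that arithmetic. The
function name cpu_time_speedup is made up for illustration, and the inputs
are the x86_64 allmodconfig and allyesconfig numbers from the hyperfine
output above:

```python
def cpu_time_speedup(base_user, base_sys, new_user, new_sys):
    """Return the gain in builds per unit of CPU time, in percent.

    Computed as (baseline user+sys) / (new user+sys) - 1. This is the same
    ratio as the "# (...)/(...) = +NN%" annotations in the thread; the
    helper itself is hypothetical and not part of any tool used there.
    """
    base_total = base_user + base_sys
    new_total = new_user + new_sys
    return (base_total / new_total - 1.0) * 100.0


# x86_64 allmodconfig: linux vs. linux-fast-headers (User, System in seconds)
allmod = cpu_time_speedup(23916.296, 2814.850, 15130.494, 2095.091)

# x86_64 allyesconfig: linux vs. linux-fast-headers (User, System in seconds)
allyes = cpu_time_speedup(21985.096, 2357.516, 14916.766, 1728.218)

print(f"allmodconfig: +{allmod:.2f}%")
print(f"allyesconfig: +{allyes:.2f}%")
```

The two results should line up with the +55.18% and +46.25% annotations
above. Note that this measures CPU time consumed, not elapsed wall-clock
time, so serial phases such as the final link do not show up in it.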
With my main box (AMD EPYC 7502P), with the performance governor... GCC: Benchmark 1: ARCH=x86_64 defconfig (linux) Time (mean ± σ): 48.685 s ± 0.049 s [User: 1969.835 s, System: 204.166 s] Range (min … max): 48.620 s … 48.782 s 10 runs Benchmark 2: ARCH=x86_64 defconfig (linux-fast-headers) Time (mean ± σ): 46.797 s ± 0.119 s [User: 1403.854 s, System: 154.336 s] Range (min … max): 46.620 s … 47.052 s 10 runs Summary 'ARCH=x86_64 defconfig (linux-fast-headers)' ran 1.04 ± 0.00 times faster than 'ARCH=x86_64 defconfig (linux)' LLVM: Benchmark 1: ARCH=x86_64 defconfig (linux) Time (mean ± σ): 51.816 s ± 0.079 s [User: 2208.577 s, System: 200.410 s] Range (min … max): 51.671 s … 51.900 s 10 runs Benchmark 2: ARCH=x86_64 defconfig (linux-fast-headers) Time (mean ± σ): 46.806 s ± 0.062 s [User: 1438.972 s, System: 154.846 s] Range (min … max): 46.696 s … 46.917 s 10 runs Summary 'ARCH=x86_64 defconfig (linux-fast-headers)' ran 1.11 ± 0.00 times faster than 'ARCH=x86_64 defconfig (linux)' $ rg KALLSYMS .config 246:CONFIG_KALLSYMS=y 247:# CONFIG_KALLSYMS_ALL is not set 248:CONFIG_KALLSYMS_ABSOLUTE_PERCPU=y 249:CONFIG_KALLSYMS_BASE_RELATIVE=y 250:CONFIG_KALLSYMS_FAST=y 706:CONFIG_HAVE_OBJTOOL_KALLSYMS=y It seems like everything is working right but maybe the build is so short that there just is not much time for the difference to be as apparent? > > > With the fast-headers kernel that's down to ~36,000 lines of code, > > > almost a factor of 3 reduction: > > > > > > # fast-headers-v1: > > > kepler:~/mingo.tip.git> wc -l kernel/pid.i > > > 35941 kernel/pid.i > > > > Coming from someone who often has to reduce a preprocessed kernel source > > file with creduce/cvise to report compiler bugs, this will be a very > > welcomed change, as those tools will have to do less work, and I can get > > my reports done faster. > > That's nice, didn't think of that side effect. > > Could you perhaps measure this too, to see how much of a benefit it is? 
Yes, next time that I run into a bug that I have to use those tools on, I will see if I can benchmark the difference! > > ######################################################################## > > > > I took the series for a spin with clang and GCC on arm64 and x86_64 and > > I found a few warnings/errors. > > Thank you! > > > 1. Position of certain attributes > > > > In some commits, you move the cacheline_aligned attributes from after > > the closing brace on structures to before the struct keyword, which > > causes clang to warn (and error with CONFIG_WERROR): > > > > In file included from arch/arm64/kernel/asm-offsets.c:9: > > In file included from arch/arm64/kernel/../../../kernel/sched/per_task_area_struct.h:33: > > In file included from ./include/linux/perf_event_api.h:17: > > In file included from ./include/linux/perf_event_types.h:41: > > In file included from ./include/linux/ftrace.h:18: > > In file included from ./arch/arm64/include/asm/ftrace.h:53: > > In file included from ./include/linux/compat.h:11: > > ./include/linux/fs_types.h:997:1: error: attribute '__aligned__' is ignored, place it after "struct" to apply attribute to type declaration [-Werror,-Wignored-attributes] > > ____cacheline_aligned > > ^ > > ./include/linux/cache.h:41:46: note: expanded from macro '____cacheline_aligned' > > #define ____cacheline_aligned __attribute__((__aligned__(SMP_CACHE_BYTES))) > > Yeah, so this is a *really* stupid warning from Clang. 
> > Putting the attribute after 'struct' risks the hard to track down bugs when > a <linux/cache.h> inclusion is missing, which scenario I pointed out in > this commit: > > headers/deps: dcache: Move the ____cacheline_aligned attribute to the head of the definition > > When changing <linux/dcache.h> I removed the <linux/spinlock_api.h> header, > which caused a couple of hundred of mysterious, somewhat obscure link time errors: > > ld: net/sctp/tsnmap.o:(.bss+0x0): multiple definition of `____cacheline_aligned_in_smp'; init/do_mounts_rd.o:(.bss+0x0): first defined here > ld: net/sctp/tsnmap.o:(.bss+0x40): multiple definition of `____cacheline_aligned'; init/do_mounts_rd.o:(.bss+0x40): first defined here > ld: net/sctp/debug.o:(.bss+0x0): multiple definition of `____cacheline_aligned_in_smp'; init/do_mounts_rd.o:(.bss+0x0): first defined here > ld: net/sctp/debug.o:(.bss+0x40): multiple definition of `____cacheline_aligned'; init/do_mounts_rd.o:(.bss+0x40): first defined here > > After a bit of head-scratching, what happened is that 'struct dentry_operations' > has the ____cacheline_aligned attribute at the tail of the type definition - > which turned into a local variable definition when <linux/cache.h> was not > included - which <linux/spinlock_api.h> includes into <linux/dcache.h> indirectly. > > There were no compile time errors, only link time errors. > > Move the attribute to the head of the definition, in which case > a missing <linux/cache.h> inclusion creates an immediate build failure: > > In file included from ./include/linux/fs.h:9, > from ./include/linux/fsverity.h:14, > from fs/verity/fsverity_private.h:18, > from fs/verity/read_metadata.c:8: > ./include/linux/dcache.h:132:22: error: expected ‘;’ before ‘struct’ > 132 | ____cacheline_aligned > | ^ > | ; > 133 | struct dentry_operations { > | ~~~~~~ > > No change in functionality. > > Signed-off-by: Ingo Molnar <mingo@kernel.org> > > Can this Clang warning be disabled? 
I'll comment on this in the other thread. > > 2. Error with CONFIG_SHADOW_CALL_STACK > > So this feature depends on Clang: > > # Supported by clang >= 7.0 > config CC_HAVE_SHADOW_CALL_STACK > def_bool $(cc-option, -fsanitize=shadow-call-stack -ffixed-x18) > > No way to activate it under my GCC cross-build toolchain, right? > > But ... I hacked the build mode on with GCC using this patch: > > From: Ingo Molnar <mingo@kernel.org> > Date: Tue, 4 Jan 2022 11:26:09 +0100 > Subject: [PATCH] DO NOT MERGE: Enable SHADOW_CALL_STACK on GCC builds, for build testing > > NOT-Signed-off-by: Ingo Molnar <mingo@kernel.org> > --- > Makefile | 2 +- > arch/Kconfig | 2 +- > arch/arm64/Kconfig | 2 +- > 3 files changed, 3 insertions(+), 3 deletions(-) > > diff --git a/Makefile b/Makefile > index 16d7f83ac368..bbab462e7509 100644 > --- a/Makefile > +++ b/Makefile > @@ -888,7 +888,7 @@ LDFLAGS_vmlinux += --gc-sections > endif > > ifdef CONFIG_SHADOW_CALL_STACK > -CC_FLAGS_SCS := -fsanitize=shadow-call-stack > +CC_FLAGS_SCS := > KBUILD_CFLAGS += $(CC_FLAGS_SCS) > export CC_FLAGS_SCS > endif > diff --git a/arch/Kconfig b/arch/Kconfig > index 4e56f66fdbcf..2103d9da4fe1 100644 > --- a/arch/Kconfig > +++ b/arch/Kconfig > @@ -605,7 +605,7 @@ config ARCH_SUPPORTS_SHADOW_CALL_STACK > > config SHADOW_CALL_STACK > bool "Clang Shadow Call Stack" > - depends on CC_IS_CLANG && ARCH_SUPPORTS_SHADOW_CALL_STACK > + depends on ARCH_SUPPORTS_SHADOW_CALL_STACK > depends on DYNAMIC_FTRACE_WITH_REGS || !FUNCTION_GRAPH_TRACER > help > This option enables Clang's Shadow Call Stack, which uses a > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig > index c4207cf9bb17..952f3e56e0a7 100644 > --- a/arch/arm64/Kconfig > +++ b/arch/arm64/Kconfig > @@ -1183,7 +1183,7 @@ config ARCH_HAS_FILTER_PGPROT > > # Supported by clang >= 7.0 > config CC_HAVE_SHADOW_CALL_STACK > - def_bool $(cc-option, -fsanitize=shadow-call-stack -ffixed-x18) > + def_bool y > > config PARAVIRT > bool "Enable paravirtualization code" > > > 
And was able to trigger at least some of the build errors you saw: > > In file included from kernel/scs.c:15: > ./include/linux/scs.h: In function 'scs_task_reset': > ./include/linux/scs.h:26:34: error: implicit declaration of function 'task_thread_info' [-Werror=implicit-function-declaration] > > This is fixed with: > > diff --git a/kernel/scs.c b/kernel/scs.c > index ca9e707049cb..719ab53adc8a 100644 > --- a/kernel/scs.c > +++ b/kernel/scs.c > @@ -5,6 +5,7 @@ > * Copyright (C) 2019 Google LLC > */ > > +#include <linux/sched/thread_info_api.h> > #include <linux/sched.h> > #include <linux/mm_page_address.h> > #include <linux/mm_api.h> > > > Then there's the build failure in init/main.c: > > > It looks like on mainline, init_shadow_call_stack is in defined and used > > in init/init_task.c but now, it is used in init/main.c, with no > > declaration to allow the compiler to find the definition. I guess moving > > init_shadow_call_stack out of init/init_task.c to somewhere more common > > would fix this but it depends on SCS_SIZE, which is defined in > > include/linux/scs.h, and as soon as I tried to include that in another > > file, the build broke further... Any ideas you have would be appreciated > > :) for benchmarking purposes, I just disabled CONFIG_SHADOW_CALL_STACK. > > So I see: > > In file included from ./include/linux/thread_info.h:63, > from ./arch/arm64/include/asm/smp.h:32, > from ./include/linux/smp_api.h:15, > from ./include/linux/percpu.h:6, > from ./include/linux/softirq.h:8, > from init/main.c:17: > init/main.c: In function 'init_per_task_early': > ./arch/arm64/include/asm/thread_info.h:113:27: error: 'init_shadow_call_stack' undeclared (first use in this function) > 113 | .scs_base = init_shadow_call_stack, \ > | ^~~~~~~~~~~~~~~~~~~~~~ > > This looks pretty straightforward, does this patch solve it? 
> > include/linux/scs.h | 3 +++ > init/main.c | 1 + > 2 files changed, 4 insertions(+) > > diff --git a/include/linux/scs.h b/include/linux/scs.h > index 18122d9e17ff..863932a9347a 100644 > --- a/include/linux/scs.h > +++ b/include/linux/scs.h > @@ -8,6 +8,7 @@ > #ifndef _LINUX_SCS_H > #define _LINUX_SCS_H > > +#include <linux/sched/thread_info_api.h> > #include <linux/gfp.h> > #include <linux/poison.h> > #include <linux/sched.h> > @@ -25,6 +26,8 @@ > #define task_scs(tsk) (task_thread_info(tsk)->scs_base) > #define task_scs_sp(tsk) (task_thread_info(tsk)->scs_sp) > > +extern unsigned long init_shadow_call_stack[SCS_SIZE / sizeof(long)]; > + > void *scs_alloc(int node); > void scs_free(void *s); > void scs_init(void); > diff --git a/init/main.c b/init/main.c > index c9eb3ecbe18c..74ccad445009 100644 > --- a/init/main.c > +++ b/init/main.c > @@ -12,6 +12,7 @@ > > #define DEBUG /* Enable initcall_debug */ > > +#include <linux/scs.h> > #include <linux/workqueue_api.h> > #include <linux/sysctl.h> > #include <linux/softirq.h> > > I've applied these fixes, with that CONFIG_SHADOW_CALL_STACK=y builds fine > on ARM64 - but I performed no runtime testing. > > I've backmerged this into: > > headers/deps: per_task, arm64, x86: Convert task_struct::thread to a per_task() field > > where this bug originated from. > > I.e. I think the bug was simply to make main.c aware of the array, now that > the INIT_THREAD initialization is done there. Yes, that seems right. Unfortunately, while the kernel now builds, it does not boot in QEMU. I tried to checkout at 9006a48618cc0cacd3f59ff053e6509a9af5cc18 to see if I could reproduce that breakage there but the build errors out at that change (I do see notes of bisection breakage in some of the commits) so I assume that is expected. There is no output, even with earlycon, so it seems like something is going wrong in early boot code. 
I am not very familiar with the SCS code so I will see if I can debug this with gdb later (I'll try to see if it is reproducible with GCC as well; as Nick mentions, there is support being added to it and I don't mind building from source). > We could move over the init_shadow_call_stack[] array there and make it > static to begin with? I don't think anything truly relies on it being a > global symbol. That is what I thought as well... I'll see if I can ping Sami to see if there is any reason not to do that. > > 3. Nested function in arch/x86/kernel/asm-offsets.c > > > diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c > > index ff3f8ed5d0a2..a6d56f4697cd 100644 > > --- a/arch/x86/kernel/asm-offsets.c > > +++ b/arch/x86/kernel/asm-offsets.c > > @@ -35,10 +35,10 @@ > > # include "asm-offsets_64.c" > > #endif > > > > -static void __used common(void) > > -{ > > #include "../../../kernel/sched/per_task_area_struct_defs.h" > > > > +static void __used common(void) > > +{ > > BLANK(); > > DEFINE(TASK_threadsp, offsetof(struct task_struct, per_task_area) + > > offsetof(struct task_struct_per_task, thread) + > > Ha, that code is bogus, it's a merge bug of mine. Super interesting that > GCC still managed to include the header ... > > I've applied your fix. > > > 4. Build error in kernel/gcov/clang.c > > > 8 errors generated. > > > > I resolved this with: > > > > diff --git a/kernel/gcov/clang.c b/kernel/gcov/clang.c > > index 6ee385f6ad47..29f0899ba209 100644 > > --- a/kernel/gcov/clang.c > > +++ b/kernel/gcov/clang.c > > @@ -52,6 +52,7 @@ > > #include <linux/ratelimit.h> > > #include <linux/slab.h> > > #include <linux/mm.h> > > +#include <linux/string.h> > > #include "gcov.h" > > Thank you - applied! > > > typedef void (*llvm_gcov_callback)(void); > > > > > > 5. 
BPF errors > > > > With Arch Linux's config (https://github.com/archlinux/svntogit-packages/raw/packages/linux/trunk/config), > > I see the following errors: > > > > kernel/bpf/preload/iterators/iterators.c:3:10: fatal error: 'linux/sched/signal.h' file not found > > #include <linux/sched/signal.h> > > ^~~~~~~~~~~~~~~~~~~~~~ > > 1 error generated. > > > > kernel/bpf/sysfs_btf.c:21:2: error: implicitly declaring library function 'memcpy' with type 'void *(void *, const void *, unsigned long)' [-Werror,-Wimplicit-function-declaration] > > memcpy(buf, __start_BTF + off, len); > > ^ > > kernel/bpf/sysfs_btf.c:21:2: note: include the header <string.h> or explicitly provide a declaration for 'memcpy' > > 1 error generated. > > > > The second error is obviously fixed by just including string.h as above. > > Applied. > > > I am not sure what is wrong with the first one; the includes all appear > > to be userland headers, rather than kernel ones, so maybe an -I flag is > > not present that should be? To work around it, I disabled > > CONFIG_BPF_PRELOAD. 
> > Yeah, this should be fixed by simply removing the two stray dependencies > that found their way into this user-space code: > > kernel/bpf/preload/iterators/iterators.bpf.c | 1 - > kernel/bpf/preload/iterators/iterators.c | 1 - > 2 files changed, 2 deletions(-) > > diff --git a/kernel/bpf/preload/iterators/iterators.bpf.c b/kernel/bpf/preload/iterators/iterators.bpf.c > index 41ae00edeecf..03af863314ea 100644 > --- a/kernel/bpf/preload/iterators/iterators.bpf.c > +++ b/kernel/bpf/preload/iterators/iterators.bpf.c > @@ -1,6 +1,5 @@ > // SPDX-License-Identifier: GPL-2.0 > /* Copyright (c) 2020 Facebook */ > -#include <linux/seq_file.h> > #include <linux/bpf.h> > #include <bpf/bpf_helpers.h> > #include <bpf/bpf_core_read.h> > diff --git a/kernel/bpf/preload/iterators/iterators.c b/kernel/bpf/preload/iterators/iterators.c > index d702cbf7ddaf..5d872a705470 100644 > --- a/kernel/bpf/preload/iterators/iterators.c > +++ b/kernel/bpf/preload/iterators/iterators.c > @@ -1,6 +1,5 @@ > // SPDX-License-Identifier: GPL-2.0 > /* Copyright (c) 2020 Facebook */ > -#include <linux/sched/signal.h> > #include <errno.h> > #include <stdio.h> > #include <stdlib.h> Yes, that resolves the error for me. > > 6. resolve_btfids warning > > > > After working around the above errors, with either GCC or clang, I see > > the following warnings with Arch Linux's configuration: > > > > WARN: multiple IDs found for 'task_struct': 103, 23549 - using 103 > > WARN: multiple IDs found for 'path': 1166, 23551 - using 1166 > > WARN: multiple IDs found for 'inode': 997, 23561 - using 997 > > WARN: multiple IDs found for 'file': 714, 23566 - using 714 > > WARN: multiple IDs found for 'seq_file': 1120, 23673 - using 1120 > > > > Which appears to come from symbols_resolve() in > > tools/bpf/resolve_btfids/main.c. > > Hm, is this perhaps related to CONFIG_KALLSYMS_FAST=y? If yes then turning > it off might help. 
> > I don't really know this area of BPF all that much, maybe someone else can > see what the problem is? The error message is not self-explanatory. It does not seem related, as I disabled that configuration and still see it. I am equally ignorant about BPF so enlisting their help would be good. > > > > ######################################################################## > > > > I am very excited to see where this goes, it is a herculean effort but I > > think it will be worth it in the long run. Let me know if there is any > > more information or input that I can provide, cheers! > > Your testing & patch sending efforts are much appreciated!! You'd help me > most by continuing on the same path with new fast-headers releases as well, > whenever you find the time. :-) > > BTW., you can always pick up my latest Work-In-Progress branch from: > > git://git.kernel.org/pub/scm/linux/kernel/git/mingo/tip.git sched/headers > > The 'master' branch will carry the release. > > The sched/headers branch is already rebased to -rc8 and has some other > changes as well. It should normally work, with less testing than the main > releases, but will at times have fixes at the tail waiting to be > backmerged in a bisect-friendly way. Sure thing, I will continue to follow this and test it as much as I can to make sure everything continues to work well! Cheers, Nathan ^ permalink raw reply [flat|nested] 54+ messages in thread
* [PATCH] x86/kbuild: Enable CONFIG_KALLSYMS_ALL=y in the defconfigs 2022-01-04 17:50 ` Nathan Chancellor @ 2022-01-05 0:35 ` Ingo Molnar 2022-01-05 0:40 ` [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" Ingo Molnar 2022-01-08 15:16 ` Ingo Molnar 2 siblings, 0 replies; 54+ messages in thread From: Ingo Molnar @ 2022-01-05 0:35 UTC (permalink / raw) To: Nathan Chancellor Cc: Linus Torvalds, linux-kernel, linux-arch, Andrew Morton, Peter Zijlstra, Thomas Gleixner, Greg Kroah-Hartman, David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet, Al Viro, llvm * Nathan Chancellor <nathan@kernel.org> wrote: > Good point. With my main box (AMD EPYC 7502P), with the performance governor... > > GCC: > > Benchmark 1: ARCH=x86_64 defconfig (linux) > Time (mean ± σ): 48.685 s ± 0.049 s [User: 1969.835 s, System: 204.166 s] > Range (min … max): 48.620 s … 48.782 s 10 runs > > Benchmark 2: ARCH=x86_64 defconfig (linux-fast-headers) > Time (mean ± σ): 46.797 s ± 0.119 s [User: 1403.854 s, System: 154.336 s] > Range (min … max): 46.620 s … 47.052 s 10 runs > > Summary > 'ARCH=x86_64 defconfig (linux-fast-headers)' ran > 1.04 ± 0.00 times faster than 'ARCH=x86_64 defconfig (linux)' > > LLVM: > > Benchmark 1: ARCH=x86_64 defconfig (linux) > Time (mean ± σ): 51.816 s ± 0.079 s [User: 2208.577 s, System: 200.410 s] > Range (min … max): 51.671 s … 51.900 s 10 runs > > Benchmark 2: ARCH=x86_64 defconfig (linux-fast-headers) > Time (mean ± σ): 46.806 s ± 0.062 s [User: 1438.972 s, System: 154.846 s] > Range (min … max): 46.696 s … 46.917 s 10 runs > > Summary > 'ARCH=x86_64 defconfig (linux-fast-headers)' ran > 1.11 ± 0.00 times faster than 'ARCH=x86_64 defconfig (linux)' > > $ rg KALLSYMS .config > 246:CONFIG_KALLSYMS=y > 247:# CONFIG_KALLSYMS_ALL is not set > 248:CONFIG_KALLSYMS_ABSOLUTE_PERCPU=y > 249:CONFIG_KALLSYMS_BASE_RELATIVE=y > 250:CONFIG_KALLSYMS_FAST=y > 706:CONFIG_HAVE_OBJTOOL_KALLSYMS=y > > It seems like 
everything is working right but maybe the build is so > short that there just is not much time for the difference to be as > apparent? Yeah, x86 defconfig doesn't have KALLSYMS_ALL - while all distro configs I checked have it enabled, because it makes crash printouts / backtraces more informative. Lockdep will also enable it unconditionally. So I've applied the patch below, to make the x86 defconfig more representative of what people are using in practice. This will also, as a side effect, bring elapsed time improvements closer to what the underlying cpu-time improvements offer, in the small-config case too. Thanks, Ingo ====================================> From: Ingo Molnar <mingo@kernel.org> Date: Wed, 5 Jan 2022 01:31:35 +0100 Subject: [PATCH] x86/kbuild: Enable CONFIG_KALLSYMS_ALL=y in the defconfigs Most distro kernels have this option enabled, to improve debug output. Lockdep also selects it. Enable this in the defconfig kernel as well, to make it more representative of what people are using on x86. Signed-off-by: Ingo Molnar <mingo@kernel.org> --- arch/x86/configs/i386_defconfig | 1 + arch/x86/configs/x86_64_defconfig | 1 + 2 files changed, 2 insertions(+) diff --git a/arch/x86/configs/i386_defconfig b/arch/x86/configs/i386_defconfig index 5d97a2dfbaa7..71124cf8630c 100644 --- a/arch/x86/configs/i386_defconfig +++ b/arch/x86/configs/i386_defconfig @@ -261,3 +261,4 @@ CONFIG_BLK_DEV_IO_TRACE=y CONFIG_PROVIDE_OHCI1394_DMA_INIT=y CONFIG_EARLY_PRINTK_DBGP=y CONFIG_DEBUG_BOOT_PARAMS=y +CONFIG_KALLSYMS_ALL=y diff --git a/arch/x86/configs/x86_64_defconfig b/arch/x86/configs/x86_64_defconfig index 30ab3e582d53..92b1169ec90b 100644 --- a/arch/x86/configs/x86_64_defconfig +++ b/arch/x86/configs/x86_64_defconfig @@ -257,3 +257,4 @@ CONFIG_BLK_DEV_IO_TRACE=y CONFIG_PROVIDE_OHCI1394_DMA_INIT=y CONFIG_EARLY_PRINTK_DBGP=y CONFIG_DEBUG_BOOT_PARAMS=y +CONFIG_KALLSYMS_ALL=y ^ permalink raw reply related [flat|nested] 54+ messages in thread
* Re: [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" 2022-01-04 17:50 ` Nathan Chancellor 2022-01-05 0:35 ` [PATCH] x86/kbuild: Enable CONFIG_KALLSYMS_ALL=y in the defconfigs Ingo Molnar @ 2022-01-05 0:40 ` Ingo Molnar 2022-01-05 1:07 ` Ingo Molnar 2022-01-05 22:33 ` Nathan Chancellor 2022-01-08 15:16 ` Ingo Molnar 2 siblings, 2 replies; 54+ messages in thread From: Ingo Molnar @ 2022-01-05 0:40 UTC (permalink / raw) To: Nathan Chancellor Cc: Linus Torvalds, linux-kernel, linux-arch, Andrew Morton, Peter Zijlstra, Thomas Gleixner, Greg Kroah-Hartman, David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet, Al Viro, llvm * Nathan Chancellor <nathan@kernel.org> wrote: > > I.e. I think the bug was simply to make main.c aware of the array, now > > that the INIT_THREAD initialization is done there. > > Yes, that seems right. > > Unfortunately, while the kernel now builds, it does not boot in QEMU. I > tried to checkout at 9006a48618cc0cacd3f59ff053e6509a9af5cc18 to see if I > could reproduce that breakage there but the build errors out at that > change (I do see notes of bisection breakage in some of the commits) so I > assume that is expected. Yeah, there's a breakage window on ARM64, I'll track down that bisectability bug. Decoupling thread_info and task_struct incrementally, so that it bisects cleanly on all architectures, was always a big challenge. :-/ > There is no output, even with earlycon, so it seems like something is > going wrong in early boot code. I am not very familiar with the SCS code > so I will see if I can debug this with gdb later (I'll try to see if it > is reproducible with GCC as well; as Nick mentions, there is support > being added to it and I don't mind building from source). Just to make sure: with SCS disabled the same kernel boots fine? > Sure thing, I will continue to follow this and test it as much as I can > to make sure everything continues to work well! 
Thank you! Ingo ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" 2022-01-05 0:40 ` [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" Ingo Molnar @ 2022-01-05 1:07 ` Ingo Molnar 2022-01-05 21:42 ` Nathan Chancellor 2022-01-05 22:33 ` Nathan Chancellor 1 sibling, 1 reply; 54+ messages in thread From: Ingo Molnar @ 2022-01-05 1:07 UTC (permalink / raw) To: Nathan Chancellor Cc: Linus Torvalds, linux-kernel, linux-arch, Andrew Morton, Peter Zijlstra, Thomas Gleixner, Greg Kroah-Hartman, David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet, Al Viro, llvm * Ingo Molnar <mingo@kernel.org> wrote: > > * Nathan Chancellor <nathan@kernel.org> wrote: > > > > I.e. I think the bug was simply to make main.c aware of the array, now > > > that the INIT_THREAD initialization is done there. > > > > Yes, that seems right. > > > > Unfortunately, while the kernel now builds, it does not boot in QEMU. I > > tried to checkout at 9006a48618cc0cacd3f59ff053e6509a9af5cc18 to see if I > > could reproduce that breakage there but the build errors out at that > > change (I do see notes of bisection breakage in some of the commits) so I > > assume that is expected. > > Yeah, there's a breakage window on ARM64, I'll track down that > bisectability bug. I haven't fixed this ARM64 bisection breakage yet, but I've integrated & backmerged all the other fixes and changes, and pushed it out to the WIP branch: # 1755441e323b per_task: Implement single template to define 'struct task_struct_per_task' fields and offsets git://git.kernel.org/pub/scm/linux/kernel/git/mingo/tip.git sched/headers Let me know if there's anything missing or if there's a new breakage. This is pretty close to what will be -v2. Thanks, Ingo ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" 2022-01-05 1:07 ` Ingo Molnar @ 2022-01-05 21:42 ` Nathan Chancellor 2022-01-08 10:32 ` [PATCH] headers/deps: Add header dependencies to .c files: <linux/ptrace_api.h> Ingo Molnar ` (4 more replies) 0 siblings, 5 replies; 54+ messages in thread From: Nathan Chancellor @ 2022-01-05 21:42 UTC (permalink / raw) To: Ingo Molnar Cc: Linus Torvalds, linux-kernel, linux-arch, Andrew Morton, Peter Zijlstra, Thomas Gleixner, Greg Kroah-Hartman, David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet, Al Viro, llvm On Wed, Jan 05, 2022 at 02:07:42AM +0100, Ingo Molnar wrote: > > * Ingo Molnar <mingo@kernel.org> wrote: > > > > > * Nathan Chancellor <nathan@kernel.org> wrote: > > > > > > I.e. I think the bug was simply to make main.c aware of the array, now > > > > that the INIT_THREAD initialization is done there. > > > > > > Yes, that seems right. > > > > > > Unfortunately, while the kernel now builds, it does not boot in QEMU. I > > > tried to checkout at 9006a48618cc0cacd3f59ff053e6509a9af5cc18 to see if I > > > could reproduce that breakage there but the build errors out at that > > > change (I do see notes of bisection breakage in some of the commits) so I > > > assume that is expected. > > > > Yeah, there's a breakage window on ARM64, I'll track down that > > bisectability bug. > > I haven't fixed this ARM64 bisection breakage yet, but I've integrated & > backmerged all the other fixes and changes, and pushed it out to the WIP > branch: > > # 1755441e323b per_task: Implement single template to define 'struct task_struct_per_task' fields and offsets > > git://git.kernel.org/pub/scm/linux/kernel/git/mingo/tip.git sched/headers > > Let me know if there's anything missing or if there's a new breakage. 
I ended up running this through my full set of clang builds and a few GCC builds and found a few issues, most of which appear to be compiler agnostic. This whole report is against commit 1755441e323b ("per_task: Implement single template to define 'struct task_struct_per_task' fields and offsets"). In case it is relevant... $ gcc --version | head -1 gcc (GCC) 11.2.1 20211231 1. kernel/stackleak.c build failure: $ make -skj"$(nproc)" ARCH=x86_64 allmodconfig kernel/stackleak.o kernel/stackleak.c: In function ‘stackleak_erase’: kernel/stackleak.c:92:13: error: implicit declaration of function ‘on_thread_stack’; did you mean ‘setup_thread_stack’? [-Werror=implicit-function-declaration] 92 | if (on_thread_stack()) | ^~~~~~~~~~~~~~~ | setup_thread_stack kernel/stackleak.c:95:28: error: implicit declaration of function ‘current_top_of_stack’ [-Werror=implicit-function-declaration] 95 | boundary = current_top_of_stack(); | ^~~~~~~~~~~~~~~~~~~~ kernel/stackleak.c: In function ‘stackleak_track_stack’: kernel/stackleak.c:119:14: error: implicit declaration of function ‘ALIGN’ [-Werror=implicit-function-declaration] 119 | sp = ALIGN(sp, sizeof(unsigned long)); | ^~~~~ cc1: all warnings being treated as errors This is fixed with the following diff although I am unsure if that is as minimal as it should be. diff --git a/kernel/stackleak.c b/kernel/stackleak.c index ce161a8e8d97..d67c5475183b 100644 --- a/kernel/stackleak.c +++ b/kernel/stackleak.c @@ -10,8 +10,10 @@ * reveal and blocks some uninitialized stack variable attacks. */ +#include <asm/processor_api.h> #include <linux/stackleak.h> #include <linux/kprobes.h> +#include <linux/align.h> #ifdef CONFIG_STACKLEAK_RUNTIME_DISABLE #include <linux/jump_label.h> 2. Build failures with CONFIG_UAPI_HEADER_TEST=y and O=... This was originally reproduced with allmodconfig but this is a simpler reproducer I think.
$ make -skj"$(nproc)" ARCH=x86_64 O=.build/x86_64 defconfig $ scripts/config --file .build/x86_64/.config -e HEADERS_INSTALL -e UAPI_HEADER_TEST $ make -skj"$(nproc)" ARCH=x86_64 O=.build/x86_64 olddefconfig usr/ In file included from <command-line>: ./usr/include/linux/rds.h:38:10: fatal error: uapi/linux/sockios.h: No such file or directory 38 | #include <uapi/linux/sockios.h> | ^~~~~~~~~~~~~~~~~~~~~~ compilation terminated. make[4]: *** [/home/nathan/cbl/src/linux-fast-headers/usr/include/Makefile:106: usr/include/linux/rds.hdrtest] Error 1 In file included from ./usr/include/linux/qrtr.h:5, from <command-line>: ./usr/include/linux/socket.h:5:10: fatal error: uapi/linux/socket_types.h: No such file or directory 5 | #include <uapi/linux/socket_types.h> | ^~~~~~~~~~~~~~~~~~~~~~~~~~~ compilation terminated. In file included from ./usr/include/linux/in.h:24, from ./usr/include/linux/nfs_mount.h:12, from <command-line>: ./usr/include/linux/socket.h:5:10: fatal error: uapi/linux/socket_types.h: No such file or directory 5 | #include <uapi/linux/socket_types.h> | ^~~~~~~~~~~~~~~~~~~~~~~~~~~ compilation terminated. make[4]: *** [/home/nathan/cbl/src/linux-fast-headers/usr/include/Makefile:106: usr/include/linux/qrtr.hdrtest] Error 1 make[4]: *** [/home/nathan/cbl/src/linux-fast-headers/usr/include/Makefile:106: usr/include/linux/nfs_mount.hdrtest] Error 1 ... I don't see this when just building in the tree. I am guessing that commit f989e243f1f4 ("headers/deps: uapi/headers: Create usr/include/uapi symbolic link") needs to account for this? 3. Build failure with CONFIG_SAMPLE_CONNECTOR=m and O=... I am guessing this has a similar root cause as above, since that commit mentions an error similar to this. 
$ make -skj"$(nproc)" ARCH=x86_64 O=.build/x86_64 allmodconfig samples/connector/ In file included from /home/nathan/cbl/src/linux-fast-headers/samples/connector/ucon.c:14: usr/include/linux/netlink.h:5:10: fatal error: uapi/linux/types.h: No such file or directory 5 | #include <uapi/linux/types.h> | ^~~~~~~~~~~~~~~~~~~~ compilation terminated. 4. modpost warning around __sw_hweight64 With the first issue resolved: $ make -skj"$(nproc)" ARCH=i386 allmodconfig WARNING: modpost: EXPORT symbol "__sw_hweight64" [vmlinux] version ... Is "__sw_hweight64" prototyped in <asm/asm-prototypes.h>? 5. Build error in arch/arm64/kvm/hyp/nvhe with LTO With arm64 + CONFIG_LTO_CLANG_THIN=y, I see: $ make -skj"$(nproc)" ARCH=arm64 LLVM=1 defconfig $ scripts/config -e LTO_CLANG_THIN $ make -skj"$(nproc)" ARCH=arm64 LLVM=1 olddefconfig arch/arm64/kvm/hyp/nvhe/ ld.lld: error: arch/arm64/kvm/hyp/nvhe/hyp.lds:2: unknown directive: .macro >>> .macro __put, val, name >>> ^ make[5]: *** [arch/arm64/kvm/hyp/nvhe/Makefile:51: arch/arm64/kvm/hyp/nvhe/kvm_nvhe.tmp.o] Error 1 I was not able to figure out the exact include chain but CONFIG_LTO causes asm/alternative-macros.h to be included in asm/rwonce.h, which eventually gets included in either asm/cache.h or asm/memory.h. I managed to solve this with the following diff but I am not sure if there is a better or cleaner way to do that. diff --git a/arch/arm64/include/asm/rwonce.h b/arch/arm64/include/asm/rwonce.h index 1bce62fa908a..e19572a205d0 100644 --- a/arch/arm64/include/asm/rwonce.h +++ b/arch/arm64/include/asm/rwonce.h @@ -5,7 +5,7 @@ #ifndef __ASM_RWONCE_H #define __ASM_RWONCE_H -#ifdef CONFIG_LTO +#if defined(CONFIG_LTO) && !defined(LINKER_SCRIPT) #include <linux/compiler_types.h> #include <asm/alternative-macros.h> @@ -66,7 +66,7 @@ }) #endif /* !BUILD_VDSO */ -#endif /* CONFIG_LTO */ +#endif /* CONFIG_LTO && !LINKER_SCRIPT */ #include <asm-generic/rwonce.h> I'll see if I can flush out any other issues. 
Cheers, Nathan ^ permalink raw reply related [flat|nested] 54+ messages in thread
* [PATCH] headers/deps: Add header dependencies to .c files: <linux/ptrace_api.h> 2022-01-05 21:42 ` Nathan Chancellor @ 2022-01-08 10:32 ` Ingo Molnar 2022-01-08 11:08 ` [PATCH] FIX: headers/deps: uapi/headers: Create usr/include/uapi symbolic link Ingo Molnar ` (3 subsequent siblings) 4 siblings, 0 replies; 54+ messages in thread From: Ingo Molnar @ 2022-01-08 10:32 UTC (permalink / raw) To: Nathan Chancellor Cc: Linus Torvalds, linux-kernel, linux-arch, Andrew Morton, Peter Zijlstra, Thomas Gleixner, Greg Kroah-Hartman, David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet, Al Viro, llvm * Nathan Chancellor <nathan@kernel.org> wrote: > 1. kernel/stackleak.c build failure: > > $ make -skj"$(nproc)" ARCH=x86_64 allmodconfig kernel/stackleak.o > kernel/stackleak.c: In function ‘stackleak_erase’: > kernel/stackleak.c:92:13: error: implicit declaration of function ‘on_thread_stack’; did you mean ‘setup_thread_stack’? [-Werror=implicit-function-declaration] So it turns out that my build environment didn't have the stackleak code enabled at all: kepler:~/mingo.tip.git> make ARCH=x86_64 allmodconfig # # configuration written to .config # kepler:~/mingo.tip.git> grep -E 'STACKLEAK|GCC_PLUGIN' .config CONFIG_HAVE_ARCH_STACKLEAK=y CONFIG_HAVE_GCC_PLUGINS=y ... because it failed this condition: menuconfig GCC_PLUGINS ... depends on $(success,test -e $(shell,$(CC) -print-file-name=plugin)/include/plugin-version.h) ... because there were no plugin headers: kepler:~/mingo.tip.git> gcc -print-file-name=plugin /usr/lib/gcc/x86_64-linux-gnu/10/plugin kepler:~/mingo.tip.git> ls $(gcc -print-file-name=plugin)/include/ ls: cannot access '/usr/lib/gcc/x86_64-linux-gnu/10/plugin/include/': No such file or directory ... because I needed to install the plugin-development packages for gcc-10. 
After installing those I have stackleak: kepler:~/mingo.tip.git> grep STACKLEAK .config CONFIG_HAVE_ARCH_STACKLEAK=y CONFIG_GCC_PLUGIN_STACKLEAK=y CONFIG_STACKLEAK_TRACK_MIN_SIZE=100 CONFIG_STACKLEAK_METRICS=y CONFIG_STACKLEAK_RUNTIME_DISABLE=y and was able to reproduce your build failure. :-) > This is fixed with the following diff although I am unsure if that is as > minimal as it should be. > > diff --git a/kernel/stackleak.c b/kernel/stackleak.c > index ce161a8e8d97..d67c5475183b 100644 > --- a/kernel/stackleak.c > +++ b/kernel/stackleak.c > @@ -10,8 +10,10 @@ > * reveal and blocks some uninitialized stack variable attacks. > */ > > +#include <asm/processor_api.h> > #include <linux/stackleak.h> > #include <linux/kprobes.h> > +#include <linux/align.h> Yeah - I used a simpler & more generic header: <linux/ptrace_api.h> - see the patch below. But your solution is functionally equivalent. This fix will be included in -v2, hopefully released later today. Thanks, Ingo ===============> From: Ingo Molnar <mingo@kernel.org> Date: Sat, 8 Jan 2022 11:29:17 +0100 Subject: [PATCH] headers/deps: Add header dependencies to .c files: <linux/ptrace_api.h> Signed-off-by: Ingo Molnar <mingo@kernel.org> --- kernel/stackleak.c | 1 + 1 file changed, 1 insertion(+) diff --git a/kernel/stackleak.c b/kernel/stackleak.c index ce161a8e8d97..fde49e2f209a 100644 --- a/kernel/stackleak.c +++ b/kernel/stackleak.c @@ -10,6 +10,7 @@ * reveal and blocks some uninitialized stack variable attacks. */ +#include <linux/ptrace_api.h> #include <linux/stackleak.h> #include <linux/kprobes.h> ^ permalink raw reply related [flat|nested] 54+ messages in thread
* [PATCH] FIX: headers/deps: uapi/headers: Create usr/include/uapi symbolic link 2022-01-05 21:42 ` Nathan Chancellor 2022-01-08 10:32 ` [PATCH] headers/deps: Add header dependencies to .c files: <linux/ptrace_api.h> Ingo Molnar @ 2022-01-08 11:08 ` Ingo Molnar 2022-01-08 11:18 ` [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" Ingo Molnar ` (2 subsequent siblings) 4 siblings, 0 replies; 54+ messages in thread From: Ingo Molnar @ 2022-01-08 11:08 UTC (permalink / raw) To: Nathan Chancellor Cc: Linus Torvalds, linux-kernel, linux-arch, Andrew Morton, Peter Zijlstra, Thomas Gleixner, Greg Kroah-Hartman, David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet, Al Viro, llvm * Nathan Chancellor <nathan@kernel.org> wrote: > 2. Build failures with CONFIG_UAPI_HEADER_TEST=y and O=... > > This was originally reproduced with allmodconfig but this is a simpler > reproducer I think. > > $ make -skj"$(nproc)" ARCH=x86_64 O=.build/x86_64 defconfig > > $ scripts/config --file .build/x86_64/.config -e HEADERS_INSTALL -e UAPI_HEADER_TEST > > $ make -skj"$(nproc)" ARCH=x86_64 O=.build/x86_64 olddefconfig usr/ The simplified & scripted reproducer is very useful, thanks a ton! > In file included from <command-line>: > ./usr/include/linux/rds.h:38:10: fatal error: uapi/linux/sockios.h: No such file or directory > 38 | #include <uapi/linux/sockios.h> > | ^~~~~~~~~~~~~~~~~~~~~~ > compilation terminated. > make[4]: *** [/home/nathan/cbl/src/linux-fast-headers/usr/include/Makefile:106: usr/include/linux/rds.hdrtest] Error 1 > In file included from ./usr/include/linux/qrtr.h:5, > from <command-line>: > ./usr/include/linux/socket.h:5:10: fatal error: uapi/linux/socket_types.h: No such file or directory > 5 | #include <uapi/linux/socket_types.h> > | ^~~~~~~~~~~~~~~~~~~~~~~~~~~ > compilation terminated. 
> In file included from ./usr/include/linux/in.h:24, > from ./usr/include/linux/nfs_mount.h:12, > from <command-line>: > ./usr/include/linux/socket.h:5:10: fatal error: uapi/linux/socket_types.h: No such file or directory > 5 | #include <uapi/linux/socket_types.h> > | ^~~~~~~~~~~~~~~~~~~~~~~~~~~ > compilation terminated. > make[4]: *** [/home/nathan/cbl/src/linux-fast-headers/usr/include/Makefile:106: usr/include/linux/qrtr.hdrtest] Error 1 > make[4]: *** [/home/nathan/cbl/src/linux-fast-headers/usr/include/Makefile:106: usr/include/linux/nfs_mount.hdrtest] Error 1 > ... > > I don't see this when just building in the tree. I am guessing that > commit f989e243f1f4 ("headers/deps: uapi/headers: Create > usr/include/uapi symbolic link") needs to account for this? Yeah. Here's my second attempt, which creates the symlink as part of the header-install make process, as it should - also pushed out into sched/headers. (My Makefile-fu isn't overly powerful though, so this is just an attempt.) This fix will be backmerged into f989e243f1f4 in -v2.
Thanks, Ingo =========================> From: Ingo Molnar <mingo@kernel.org> Date: Sat, 8 Jan 2022 12:05:57 +0100 Subject: [PATCH] FIX: f989e243f1f4 headers/deps: uapi/headers: Create usr/include/uapi symbolic link --- scripts/Makefile.headersinst | 3 +++ usr/include/uapi | 1 - 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/scripts/Makefile.headersinst b/scripts/Makefile.headersinst index 029d85bb0b23..8ac831458143 100644 --- a/scripts/Makefile.headersinst +++ b/scripts/Makefile.headersinst @@ -78,6 +78,9 @@ existing-headers := $(filter $(old-headers), $(all-headers)) -include $(foreach f,$(existing-headers),$(dir $(f)).$(notdir $(f)).cmd) +# link the <uapi/*> namespace: +LINK := $(shell ln -sf ../include $(objtree)/$(dst)/uapi) + PHONY += FORCE FORCE: diff --git a/usr/include/uapi b/usr/include/uapi deleted file mode 120000 index f5030fe88998..000000000000 --- a/usr/include/uapi +++ /dev/null @@ -1 +0,0 @@ -../include \ No newline at end of file ^ permalink raw reply related [flat|nested] 54+ messages in thread
* Re: [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" 2022-01-05 21:42 ` Nathan Chancellor 2022-01-08 10:32 ` [PATCH] headers/deps: Add header dependencies to .c files: <linux/ptrace_api.h> Ingo Molnar 2022-01-08 11:08 ` [PATCH] FIX: headers/deps: uapi/headers: Create usr/include/uapi symbolic link Ingo Molnar @ 2022-01-08 11:18 ` Ingo Molnar 2022-01-08 11:38 ` [PATCH] x86/bitops: Remove unused __sw_hweight64() assembly implementation Ingo Molnar 2022-01-08 11:49 ` [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" Ingo Molnar 4 siblings, 0 replies; 54+ messages in thread From: Ingo Molnar @ 2022-01-08 11:18 UTC (permalink / raw) To: Nathan Chancellor Cc: Linus Torvalds, linux-kernel, linux-arch, Andrew Morton, Peter Zijlstra, Thomas Gleixner, Greg Kroah-Hartman, David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet, Al Viro, llvm * Nathan Chancellor <nathan@kernel.org> wrote: > 3. Build failure with CONFIG_SAMPLE_CONNECTOR=m and O=... > > I am guessing this has a similar root cause as above, since that commit > mentions an error similar to this. > > $ make -skj"$(nproc)" ARCH=x86_64 O=.build/x86_64 allmodconfig samples/connector/ > In file included from /home/nathan/cbl/src/linux-fast-headers/samples/connector/ucon.c:14: > usr/include/linux/netlink.h:5:10: fatal error: uapi/linux/types.h: No such file or directory > 5 | #include <uapi/linux/types.h> > | ^~~~~~~~~~~~~~~~~~~~ > compilation terminated. Correct - this test now passes with the UAPI symlink fix applied: kepler:~/mingo.tip.git> make -skj"$(nproc)" ARCH=x86_64 O=.build/x86_64 allmodconfig samples/connector/ kepler:~/mingo.tip.git> Thanks, Ingo ^ permalink raw reply [flat|nested] 54+ messages in thread
* [PATCH] x86/bitops: Remove unused __sw_hweight64() assembly implementation 2022-01-05 21:42 ` Nathan Chancellor ` (2 preceding siblings ...) 2022-01-08 11:18 ` [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" Ingo Molnar @ 2022-01-08 11:38 ` Ingo Molnar 2022-01-08 11:49 ` [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" Ingo Molnar 4 siblings, 0 replies; 54+ messages in thread From: Ingo Molnar @ 2022-01-08 11:38 UTC (permalink / raw) To: Nathan Chancellor, Borislav Petkov, Thomas Gleixner Cc: Linus Torvalds, linux-kernel, linux-arch, Andrew Morton, Peter Zijlstra, Thomas Gleixner, Greg Kroah-Hartman, David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet, Al Viro, llvm * Nathan Chancellor <nathan@kernel.org> wrote: > 4. modpost warning around __sw_hweight64 > > With the first issue resolved: > > $ make -skj"$(nproc)" ARCH=i386 allmodconfig > WARNING: modpost: EXPORT symbol "__sw_hweight64" [vmlinux] version ... > Is "__sw_hweight64" prototyped in <asm/asm-prototypes.h>? So I was hoping that this commit made explicit all the random indirect header dependencies x86's <asm/asm-prototypes.h> imports on mainline: headers/prep: x86/kbuild: Add symbol prototype header dependencies for modversions ... but an i386 case slipped through. But, this actually highlights a real x86 symbol export bug IMO. __arch_hweight64() on x86-32 is defined in the arch/x86/include/asm/arch_hweight.h header as an inline, using __arch_hweight32(): #ifdef CONFIG_X86_32 static inline unsigned long __arch_hweight64(__u64 w) { return __arch_hweight32((u32)w) + __arch_hweight32((u32)(w >> 32)); } *But* there's also a __sw_hweight64() assembly implementation: arch/x86/lib/hweight.S SYM_FUNC_START(__sw_hweight64) #ifdef CONFIG_X86_64 ...
#else /* CONFIG_X86_32 */ /* We're getting an u64 arg in (%eax,%edx): unsigned long hweight64(__u64 w) */ pushl %ecx call __sw_hweight32 movl %eax, %ecx # stash away result movl %edx, %eax # second part of input call __sw_hweight32 addl %ecx, %eax # result popl %ecx ret #endif But this __sw_hweight64 assembly implementation is unused - and it's essentially doing the same thing that the inline wrapper does. Then we export this unused helper with no prototype. This went unnoticed in mainline, because mainline defines the prototype for the unused function. So I think the real solution is to remove the unused 32-bit variant - see the patch below. Thanks, Ingo ======================> From: Ingo Molnar <mingo@kernel.org> Date: Sat, 8 Jan 2022 12:33:58 +0100 Subject: [PATCH] x86/bitops: Remove unused __sw_hweight64() assembly implementation Header cleanups in the fast-headers tree highlighted that we have an unused assembly implementation for __sw_hweight64(): WARNING: modpost: EXPORT symbol "__sw_hweight64" [vmlinux] version ... __arch_hweight64() on x86-32 is defined in the arch/x86/include/asm/arch_hweight.h header as an inline, using __arch_hweight32(): #ifdef CONFIG_X86_32 static inline unsigned long __arch_hweight64(__u64 w) { return __arch_hweight32((u32)w) + __arch_hweight32((u32)(w >> 32)); } *But* there's also a __sw_hweight64() assembly implementation: arch/x86/lib/hweight.S SYM_FUNC_START(__sw_hweight64) #ifdef CONFIG_X86_64 ... #else /* CONFIG_X86_32 */ /* We're getting an u64 arg in (%eax,%edx): unsigned long hweight64(__u64 w) */ pushl %ecx call __sw_hweight32 movl %eax, %ecx # stash away result movl %edx, %eax # second part of input call __sw_hweight32 addl %ecx, %eax # result popl %ecx ret #endif But this __sw_hweight64 assembly implementation is unused - and it's essentially doing the same thing that the inline wrapper does. Remove the assembly version and add a comment about it.
Reported-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> --- arch/x86/lib/hweight.S | 20 ++++++-------------- 1 file changed, 6 insertions(+), 14 deletions(-) diff --git a/arch/x86/lib/hweight.S b/arch/x86/lib/hweight.S index dbf8cc97b7f5..585e2f1372d0 100644 --- a/arch/x86/lib/hweight.S +++ b/arch/x86/lib/hweight.S @@ -36,8 +36,12 @@ SYM_FUNC_START(__sw_hweight32) SYM_FUNC_END(__sw_hweight32) EXPORT_SYMBOL(__sw_hweight32) -SYM_FUNC_START(__sw_hweight64) +/* + * No 32-bit variant, because it's implemented as an inline wrapper + * on top of __arch_hweight32(): + */ #ifdef CONFIG_X86_64 +SYM_FUNC_START(__sw_hweight64) pushq %rdi pushq %rdx @@ -66,18 +70,6 @@ SYM_FUNC_START(__sw_hweight64) popq %rdx popq %rdi ret -#else /* CONFIG_X86_32 */ - /* We're getting an u64 arg in (%eax,%edx): unsigned long hweight64(__u64 w) */ - pushl %ecx - - call __sw_hweight32 - movl %eax, %ecx # stash away result - movl %edx, %eax # second part of input - call __sw_hweight32 - addl %ecx, %eax # result - - popl %ecx - ret -#endif SYM_FUNC_END(__sw_hweight64) EXPORT_SYMBOL(__sw_hweight64) +#endif ^ permalink raw reply related [flat|nested] 54+ messages in thread
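The equivalence this patch relies on, namely that a 64-bit population count on a 32-bit target is just the sum of two 32-bit counts, can be sketched in portable user-space C. This is only an illustration under stated assumptions, not the kernel code: the names hweight32_sketch(), hweight64_split_sketch(), hweight64_ref_sketch() and hweight_selfcheck_sketch() are hypothetical, and the simple bit-clearing loop stands in for whatever __arch_hweight32() / __sw_hweight32() actually do.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a 32-bit population count.
 * Each iteration clears the lowest set bit, so the loop runs
 * once per set bit in 'w'.
 */
unsigned int hweight32_sketch(uint32_t w)
{
	unsigned int count = 0;

	while (w) {
		w &= w - 1;	/* drop the lowest set bit */
		count++;
	}
	return count;
}

/*
 * Same shape as the x86-32 inline wrapper quoted in the message:
 * count the low and the high 32-bit halves separately and add them.
 */
unsigned int hweight64_split_sketch(uint64_t w)
{
	return hweight32_sketch((uint32_t)w) +
	       hweight32_sketch((uint32_t)(w >> 32));
}

/* Reference implementation: walk all 64 bits one at a time. */
unsigned int hweight64_ref_sketch(uint64_t w)
{
	unsigned int count = 0;

	for (; w; w >>= 1)
		count += (unsigned int)(w & 1);
	return count;
}

/* Checks that the split version agrees with the reference on a few inputs. */
void hweight_selfcheck_sketch(void)
{
	const uint64_t samples[] = {
		0x0ULL, 0x1ULL, 0x0000000100000001ULL,
		0xF0F0000000000001ULL, 0xFFFFFFFFFFFFFFFFULL,
	};
	unsigned int i;

	for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
		assert(hweight64_split_sketch(samples[i]) ==
		       hweight64_ref_sketch(samples[i]));
}
```

Since the split version is a two-line wrapper around the 32-bit primitive, a separate hand-written 32-bit assembly variant of the 64-bit helper adds nothing, which is the argument the commit message makes for removing it.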
* Re: [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" 2022-01-05 21:42 ` Nathan Chancellor ` (3 preceding siblings ...) 2022-01-08 11:38 ` [PATCH] x86/bitops: Remove unused __sw_hweight64() assembly implementation Ingo Molnar @ 2022-01-08 11:49 ` Ingo Molnar 2022-01-08 12:17 ` Ingo Molnar 2022-01-10 20:03 ` Nathan Chancellor 4 siblings, 2 replies; 54+ messages in thread From: Ingo Molnar @ 2022-01-08 11:49 UTC (permalink / raw) To: Nathan Chancellor Cc: Linus Torvalds, linux-kernel, linux-arch, Andrew Morton, Peter Zijlstra, Thomas Gleixner, Greg Kroah-Hartman, David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet, Al Viro, llvm * Nathan Chancellor <nathan@kernel.org> wrote: > 5. Build error in arch/arm64/kvm/hyp/nvhe with LTO > > With arm64 + CONFIG_LTO_CLANG_THIN=y, I see: > > $ make -skj"$(nproc)" ARCH=arm64 LLVM=1 defconfig > > $ scripts/config -e LTO_CLANG_THIN > > $ make -skj"$(nproc)" ARCH=arm64 LLVM=1 olddefconfig arch/arm64/kvm/hyp/nvhe/ > ld.lld: error: arch/arm64/kvm/hyp/nvhe/hyp.lds:2: unknown directive: .macro > >>> .macro __put, val, name > >>> ^ > make[5]: *** [arch/arm64/kvm/hyp/nvhe/Makefile:51: arch/arm64/kvm/hyp/nvhe/kvm_nvhe.tmp.o] Error 1 > > I was not able to figure out the exact include chain but CONFIG_LTO > causes asm/alternative-macros.h to be included in asm/rwonce.h, which > eventually gets included in either asm/cache.h or asm/memory.h. > > I managed to solve this with the following diff but I am not sure if > there is a better or cleaner way to do that. 
> > diff --git a/arch/arm64/include/asm/rwonce.h b/arch/arm64/include/asm/rwonce.h > index 1bce62fa908a..e19572a205d0 100644 > --- a/arch/arm64/include/asm/rwonce.h > +++ b/arch/arm64/include/asm/rwonce.h > @@ -5,7 +5,7 @@ > #ifndef __ASM_RWONCE_H > #define __ASM_RWONCE_H > > -#ifdef CONFIG_LTO > +#if defined(CONFIG_LTO) && !defined(LINKER_SCRIPT) > > #include <linux/compiler_types.h> > #include <asm/alternative-macros.h> > @@ -66,7 +66,7 @@ > }) > > #endif /* !BUILD_VDSO */ > -#endif /* CONFIG_LTO */ > +#endif /* CONFIG_LTO && !LINKER_SCRIPT */ So the error message suggests that the linker script somehow ends up including asm-generic/export.h: kepler:~/mingo.tip.git> git grep 'macro __put' include/asm-generic/export.h:.macro __put, val, name ? But I'd guess that similar to the __ASSEMBLY__ patterns we have in headers, not including the rwonce.h bits if LINKER_SCRIPT is defined is probably close to the right solution - but it would also be good to know how such a low level header ended up in a linker script. Might have been to pick up some offset or size definition somewhere? I.e. how did the build end up including asm/rwonce.h? You can generally debug such weird dependency chains by putting a debug #warning into the affected header - such as the patch below. This prints a stack of the header dependencies: CC kernel/sched/core.o In file included from ./include/linux/compiler.h:263, from ./include/linux/static_call_types.h:7, from ./include/linux/kernel.h:6, from ./include/linux/highmem.h:5, from kernel/sched/core.c:9: ./arch/arm64/include/asm/rwonce.h:8:2: warning: #warning debug [-Wcpp] 8 | #warning debug ... and should in principle also work in the linker script context.
Thanks, Ingo ===============> arch/arm64/include/asm/rwonce.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/arm64/include/asm/rwonce.h b/arch/arm64/include/asm/rwonce.h index 1bce62fa908a..5b3305381481 100644 --- a/arch/arm64/include/asm/rwonce.h +++ b/arch/arm64/include/asm/rwonce.h @@ -5,6 +5,8 @@ #ifndef __ASM_RWONCE_H #define __ASM_RWONCE_H +#warning debug + #ifdef CONFIG_LTO #include <linux/compiler_types.h> ^ permalink raw reply related [flat|nested] 54+ messages in thread
* Re: [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" 2022-01-08 11:49 ` [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" Ingo Molnar @ 2022-01-08 12:17 ` Ingo Molnar 2022-01-10 20:03 ` Nathan Chancellor 1 sibling, 0 replies; 54+ messages in thread From: Ingo Molnar @ 2022-01-08 12:17 UTC (permalink / raw) To: Nathan Chancellor Cc: Linus Torvalds, linux-kernel, linux-arch, Andrew Morton, Peter Zijlstra, Thomas Gleixner, Greg Kroah-Hartman, David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet, Al Viro, llvm * Ingo Molnar <mingo@kernel.org> wrote: > * Nathan Chancellor <nathan@kernel.org> wrote: > > > 5. Build error in arch/arm64/kvm/hyp/nvhe with LTO > > > > With arm64 + CONFIG_LTO_CLANG_THIN=y, I see: > > > > $ make -skj"$(nproc)" ARCH=arm64 LLVM=1 defconfig > > > > $ scripts/config -e LTO_CLANG_THIN > > > > $ make -skj"$(nproc)" ARCH=arm64 LLVM=1 olddefconfig arch/arm64/kvm/hyp/nvhe/ > > ld.lld: error: arch/arm64/kvm/hyp/nvhe/hyp.lds:2: unknown directive: .macro > > >>> .macro __put, val, name > > >>> ^ > > make[5]: *** [arch/arm64/kvm/hyp/nvhe/Makefile:51: arch/arm64/kvm/hyp/nvhe/kvm_nvhe.tmp.o] Error 1 > > > > I was not able to figure out the exact include chain but CONFIG_LTO > > causes asm/alternative-macros.h to be included in asm/rwonce.h, which > > eventually gets included in either asm/cache.h or asm/memory.h. > > > > I managed to solve this with the following diff but I am not sure if > > there is a better or cleaner way to do that. 
> > > > diff --git a/arch/arm64/include/asm/rwonce.h b/arch/arm64/include/asm/rwonce.h > > index 1bce62fa908a..e19572a205d0 100644 > > --- a/arch/arm64/include/asm/rwonce.h > > +++ b/arch/arm64/include/asm/rwonce.h > > @@ -5,7 +5,7 @@ > > #ifndef __ASM_RWONCE_H > > #define __ASM_RWONCE_H > > > > -#ifdef CONFIG_LTO > > +#if defined(CONFIG_LTO) && !defined(LINKER_SCRIPT) > > > > #include <linux/compiler_types.h> > > #include <asm/alternative-macros.h> > > @@ -66,7 +66,7 @@ > > }) > > > > #endif /* !BUILD_VDSO */ > > -#endif /* CONFIG_LTO */ > > +#endif /* CONFIG_LTO && !LINKER_SCRIPT */ In any case I've added your fix to the fast-headers tree, with a comment that this might just be a workaround. Thanks, Ingo ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" 2022-01-08 11:49 ` [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" Ingo Molnar 2022-01-08 12:17 ` Ingo Molnar @ 2022-01-10 20:03 ` Nathan Chancellor 2022-01-10 20:05 ` Nathan Chancellor 1 sibling, 1 reply; 54+ messages in thread From: Nathan Chancellor @ 2022-01-10 20:03 UTC (permalink / raw) To: Ingo Molnar Cc: Linus Torvalds, linux-kernel, linux-arch, Andrew Morton, Peter Zijlstra, Thomas Gleixner, Greg Kroah-Hartman, David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet, Al Viro, llvm On Sat, Jan 08, 2022 at 12:49:04PM +0100, Ingo Molnar wrote: > > * Nathan Chancellor <nathan@kernel.org> wrote: > > > 5. Build error in arch/arm64/kvm/hyp/nvhe with LTO > > > > With arm64 + CONFIG_LTO_CLANG_THIN=y, I see: > > > > $ make -skj"$(nproc)" ARCH=arm64 LLVM=1 defconfig > > > > $ scripts/config -e LTO_CLANG_THIN > > > > $ make -skj"$(nproc)" ARCH=arm64 LLVM=1 olddefconfig arch/arm64/kvm/hyp/nvhe/ > > ld.lld: error: arch/arm64/kvm/hyp/nvhe/hyp.lds:2: unknown directive: .macro > > >>> .macro __put, val, name > > >>> ^ > > make[5]: *** [arch/arm64/kvm/hyp/nvhe/Makefile:51: arch/arm64/kvm/hyp/nvhe/kvm_nvhe.tmp.o] Error 1 > > > > I was not able to figure out the exact include chain but CONFIG_LTO > > causes asm/alternative-macros.h to be included in asm/rwonce.h, which > > eventually gets included in either asm/cache.h or asm/memory.h. > > > > I managed to solve this with the following diff but I am not sure if > > there is a better or cleaner way to do that. 
> > > > diff --git a/arch/arm64/include/asm/rwonce.h b/arch/arm64/include/asm/rwonce.h > > index 1bce62fa908a..e19572a205d0 100644 > > --- a/arch/arm64/include/asm/rwonce.h > > +++ b/arch/arm64/include/asm/rwonce.h > > @@ -5,7 +5,7 @@ > > #ifndef __ASM_RWONCE_H > > #define __ASM_RWONCE_H > > > > -#ifdef CONFIG_LTO > > +#if defined(CONFIG_LTO) && !defined(LINKER_SCRIPT) > > > > #include <linux/compiler_types.h> > > #include <asm/alternative-macros.h> > > @@ -66,7 +66,7 @@ > > }) > > > > #endif /* !BUILD_VDSO */ > > -#endif /* CONFIG_LTO */ > > +#endif /* CONFIG_LTO && !LINKER_SCRIPT */ > > So the error message suggests that the linker script somehow ends up > including asm-generic/export.h: > > kepler:~/mingo.tip.git> git grep 'macro __put' > include/asm-generic/export.h:.macro __put, val, name > > ? Correct. > But I'd guess that similar to the __ASSEMBLY__ patterns we have in headers, > not including the rwonce.h bits if LINKER_SCRIPT is defined is probably > close to the right solution - but it would also know how such a low level > header ended up in a linker script. Might have been to pick up some offset > or size definition somewhere? > > I.e. how did the build end up including asm/rwonce.h? > > You can generally debug such weird dependency chains by putting a > debug #warning into the affected header - such as the patch below. > > This prints a stack of the header dependencies: > > CC kernel/sched/core.o > In file included from ./include/linux/compiler.h:263, > from ./include/linux/static_call_types.h:7, > from ./include/linux/kernel.h:6, > from ./include/linux/highmem.h:5, > from kernel/sched/core.c:9: > ./arch/arm64/include/asm/rwonce.h:8:2: warning: #warning debug [-Wcpp] > 8 | #warning debug > > ... and should in principle also work in the linker script context. Neat trick! 
I added #ifdef LINKER_SCRIPT #warning debug #endif to arch/arm64/include/asm/rwonce.h and built with ThinLTO, which reveals: $ make -skj"$(nproc)" ARCH=arm64 LLVM=1 defconfig $ scripts/config -d LTO_NONE -e LTO_CLANG_THIN $ make -skj"$(nproc)" ARCH=arm64 LLVM=1 olddefconfig arch/arm64/kvm/hyp/ In file included from arch/arm64/kvm/hyp/nvhe/hyp.lds.S:12: In file included from ./arch/arm64/include/asm/memory.h:18: In file included from ./arch/arm64/include/asm/thread_info.h:11: In file included from ./include/linux/compiler.h:263: ./arch/arm64/include/asm/rwonce.h:9:2: warning: debug [-W#warnings] #warning debug ^ 1 warning generated. I wonder if the compiler.h include could be broken up? I removed it altogether just to see what would break and defconfig, defconfig + CONFIG_LTO_CLANG_THIN=y, and allmodconfig all continue to build. Cheers, Nathan ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" 2022-01-10 20:03 ` Nathan Chancellor @ 2022-01-10 20:05 ` Nathan Chancellor 0 siblings, 0 replies; 54+ messages in thread From: Nathan Chancellor @ 2022-01-10 20:05 UTC (permalink / raw) To: Ingo Molnar Cc: Linus Torvalds, linux-kernel, linux-arch, Andrew Morton, Peter Zijlstra, Thomas Gleixner, Greg Kroah-Hartman, David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet, Al Viro, llvm On Mon, Jan 10, 2022 at 01:03:54PM -0700, Nathan Chancellor wrote: > On Sat, Jan 08, 2022 at 12:49:04PM +0100, Ingo Molnar wrote: > > > > * Nathan Chancellor <nathan@kernel.org> wrote: > > > > > 5. Build error in arch/arm64/kvm/hyp/nvhe with LTO > > > > > > With arm64 + CONFIG_LTO_CLANG_THIN=y, I see: > > > > > > $ make -skj"$(nproc)" ARCH=arm64 LLVM=1 defconfig > > > > > > $ scripts/config -e LTO_CLANG_THIN > > > > > > $ make -skj"$(nproc)" ARCH=arm64 LLVM=1 olddefconfig arch/arm64/kvm/hyp/nvhe/ > > > ld.lld: error: arch/arm64/kvm/hyp/nvhe/hyp.lds:2: unknown directive: .macro > > > >>> .macro __put, val, name > > > >>> ^ > > > make[5]: *** [arch/arm64/kvm/hyp/nvhe/Makefile:51: arch/arm64/kvm/hyp/nvhe/kvm_nvhe.tmp.o] Error 1 > > > > > > I was not able to figure out the exact include chain but CONFIG_LTO > > > causes asm/alternative-macros.h to be included in asm/rwonce.h, which > > > eventually gets included in either asm/cache.h or asm/memory.h. > > > > > > I managed to solve this with the following diff but I am not sure if > > > there is a better or cleaner way to do that. 
> > > > > > diff --git a/arch/arm64/include/asm/rwonce.h b/arch/arm64/include/asm/rwonce.h > > > index 1bce62fa908a..e19572a205d0 100644 > > > --- a/arch/arm64/include/asm/rwonce.h > > > +++ b/arch/arm64/include/asm/rwonce.h > > > @@ -5,7 +5,7 @@ > > > #ifndef __ASM_RWONCE_H > > > #define __ASM_RWONCE_H > > > > > > -#ifdef CONFIG_LTO > > > +#if defined(CONFIG_LTO) && !defined(LINKER_SCRIPT) > > > > > > #include <linux/compiler_types.h> > > > #include <asm/alternative-macros.h> > > > @@ -66,7 +66,7 @@ > > > }) > > > > > > #endif /* !BUILD_VDSO */ > > > -#endif /* CONFIG_LTO */ > > > +#endif /* CONFIG_LTO && !LINKER_SCRIPT */ > > > > So the error message suggests that the linker script somehow ends up > > including asm-generic/export.h: > > > > kepler:~/mingo.tip.git> git grep 'macro __put' > > include/asm-generic/export.h:.macro __put, val, name > > > > ? > > Correct. > > > But I'd guess that similar to the __ASSEMBLY__ patterns we have in headers, > > not including the rwonce.h bits if LINKER_SCRIPT is defined is probably > > close to the right solution - but it would also know how such a low level > > header ended up in a linker script. Might have been to pick up some offset > > or size definition somewhere? > > > > I.e. how did the build end up including asm/rwonce.h? > > > > You can generally debug such weird dependency chains by putting a > > debug #warning into the affected header - such as the patch below. > > > > This prints a stack of the header dependencies: > > > > CC kernel/sched/core.o > > In file included from ./include/linux/compiler.h:263, > > from ./include/linux/static_call_types.h:7, > > from ./include/linux/kernel.h:6, > > from ./include/linux/highmem.h:5, > > from kernel/sched/core.c:9: > > ./arch/arm64/include/asm/rwonce.h:8:2: warning: #warning debug [-Wcpp] > > 8 | #warning debug > > > > ... and should in principle also work in the linker script context. > > Neat trick! 
I added > > #ifdef LINKER_SCRIPT > #warning debug > #endif > > to arch/arm64/include/asm/rwonce.h and built with ThinLTO, which reveals: > > $ make -skj"$(nproc)" ARCH=arm64 LLVM=1 defconfig > > $ scripts/config -d LTO_NONE -e LTO_CLANG_THIN > > $ make -skj"$(nproc)" ARCH=arm64 LLVM=1 olddefconfig arch/arm64/kvm/hyp/ > In file included from arch/arm64/kvm/hyp/nvhe/hyp.lds.S:12: > In file included from ./arch/arm64/include/asm/memory.h:18: > In file included from ./arch/arm64/include/asm/thread_info.h:11: > In file included from ./include/linux/compiler.h:263: > ./arch/arm64/include/asm/rwonce.h:9:2: warning: debug [-W#warnings] > #warning debug > ^ > 1 warning generated. > > I wonder if the compiler.h include could be broken up? I removed it > altogether just to see what would break and defconfig, defconfig + > CONFIG_LTO_CLANG_THIN=y, and allmodconfig all continue to build. Sorry, got ahead of myself there and forgot to include the diff: diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h index f1bf6f6243ac..6da41eaa64bb 100644 --- a/arch/arm64/include/asm/thread_info.h +++ b/arch/arm64/include/asm/thread_info.h @@ -8,8 +8,6 @@ #ifndef __ASM_THREAD_INFO_H #define __ASM_THREAD_INFO_H -#include <linux/compiler.h> - #ifndef __ASSEMBLY__ struct task_struct; ^ permalink raw reply related [flat|nested] 54+ messages in thread
* Re: [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" 2022-01-05 0:40 ` [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" Ingo Molnar 2022-01-05 1:07 ` Ingo Molnar @ 2022-01-05 22:33 ` Nathan Chancellor 1 sibling, 0 replies; 54+ messages in thread From: Nathan Chancellor @ 2022-01-05 22:33 UTC (permalink / raw) To: Ingo Molnar Cc: Linus Torvalds, linux-kernel, linux-arch, Andrew Morton, Peter Zijlstra, Thomas Gleixner, Greg Kroah-Hartman, David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet, Al Viro, llvm On Wed, Jan 05, 2022 at 01:40:32AM +0100, Ingo Molnar wrote: > > * Nathan Chancellor <nathan@kernel.org> wrote: > > > Unfortunately, while the kernel now builds, it does not boot in QEMU. I > > tried to checkout at 9006a48618cc0cacd3f59ff053e6509a9af5cc18 to see if I > > could reproduce that breakage there but the build errors out at that > > change (I do see notes of bisection breakage in some of the commits) so I > > assume that is expected. > > Yeah, there's a breakage window on ARM64, I'll track down that > bisectability bug. > > Decoupling thread_info and task_struct incrementally, so that it bisects > cleanly on all architectures, was always a big challenge. :-/ > > > There is no output, even with earlycon, so it seems like something is > > going wrong in early boot code. I am not very familiar with the SCS code > > so I will see if I can debug this with gdb later (I'll try to see if it > > is reproducible with GCC as well; as Nick mentions, there is support > > being added to it and I don't mind building from source). > > Just to make sure: with SCS disabled the same kernel boots fine? Correct (thank you for making sure, I have definitely not tested that before...). $ make -skj"$(nproc)" ARCH=arm64 LLVM=1 O=.build/arm64 defconfig Image.gz $ boot-qemu.sh -a arm64 -k .build/arm64 -t 30s ... 
[ 0.000000] Linux version 5.16.0-rc8-798083-g1755441e323b (nathan@archlinux-ax161) (ClangBuiltLinux clang version 14.0.0 (https://github.com/llvm/llvm-project 4602f4169a21e75b82261ba1599046b157d1d021), LLD 14.0.0) #1 SMP PREEMPT Wed Jan 5 21:51:29 UTC 2022 ... $ make -skj"$(nproc)" ARCH=arm64 LLVM=1 O=.build/arm64.scs defconfig $ scripts/config --file .build/arm64.scs/.config -e SHADOW_CALL_STACK $ make -skj"$(nproc)" ARCH=arm64 LLVM=1 O=.build/arm64.scs olddefconfig Image.gz ... qemu-system-aarch64: terminating on signal 15 from pid 690472 (timeout) + RET=124 + set +x Going back to v5.16-rc8, everything works fine. $ boot-qemu.sh -a arm64 -k .build/arm64 -t 30s ... [ 0.000000] Linux version 5.16.0-rc8-795784-gc9e6606c7fe9 (nathan@archlinux-ax161) (ClangBuiltLinux clang version 14.0.0 (https://github.com/llvm/llvm-project 4602f4169a21e75b82261ba1599046b157d1d021), LLD 14.0.0) #1 SMP PREEMPT Wed Jan 5 22:27:39 UTC 2022 ... I don't think I will have time to look at this today but I will try tomorrow. Having the bisectability bug fixed would help narrow things down but I am almost certain it is something up with the new per_task infrastructure but I'll have to dig around and see if I can understand that first. Cheers, Nathan ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" 2022-01-04 17:50 ` Nathan Chancellor 2022-01-05 0:35 ` [PATCH] x86/kbuild: Enable CONFIG_KALLSYMS_ALL=y in the defconfigs Ingo Molnar 2022-01-05 0:40 ` [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" Ingo Molnar @ 2022-01-08 15:16 ` Ingo Molnar 2 siblings, 0 replies; 54+ messages in thread From: Ingo Molnar @ 2022-01-08 15:16 UTC (permalink / raw) To: Nathan Chancellor Cc: Linus Torvalds, linux-kernel, linux-arch, Andrew Morton, Peter Zijlstra, Thomas Gleixner, Greg Kroah-Hartman, David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet, Al Viro, llvm * Nathan Chancellor <nathan@kernel.org> wrote: > I tried to checkout at 9006a48618cc0cacd3f59ff053e6509a9af5cc18 to see if > I could reproduce that breakage there but the build errors out at that > change (I do see notes of bisection breakage in some of the commits) so I > assume that is expected. Yeah, so the underlying problem is that these two commits want to be a single commit: # Commit #117 headers/deps: Move task->thread_info to per_task() # Commit #106 headers/deps: Move thread_info APIs to <linux/sched/thread_info_api.h> As we can only switch ARM64's <asm/preempt.h> to use per_task() - which requires <linux/sched.h> - if we first fix & simplify <linux/sched.h>'s header dependencies, which is done to a sufficient level by: # Commit #556 headers/deps: Optimize <linux/sched.h> dependencies, remove <linux/sched/thread_info_api_lowlevel.h> inclusion So it's a catch-22, and quite a complication, and a bisection breakage distance of ~450 commits, with a lot of ordering assumptions & conflicts along the way, should we attempt to move the first two to later stages. :-/ But today I've restructured the tree, and the -v2-to-be tree is now fully bisectable on ARM64 too. 
:-) There's a single, late per_task() conversion commit, after the first phase of <linux/sched.h> simplifications: headers/deps: Move task->thread_info to per_task() I'd guess that either this one is what breaks SCS for you, or the ::thread conversion: headers/deps: per_task, arm64, x86: Convert task_struct::thread to a per_task() field I've pushed out these fixes to the sched/headers branch a couple of minutes ago, and this will be part of the -v2 release as well. Thanks, Ingo ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell"
  2022-01-04 10:47   ` Ingo Molnar
                       ` (4 preceding siblings ...)
  2022-01-04 17:50     ` Nathan Chancellor
@ 2022-01-07  0:29     ` Nathan Chancellor
  2022-01-08 11:54       ` Ingo Molnar
  5 siblings, 1 reply; 54+ messages in thread
From: Nathan Chancellor @ 2022-01-07  0:29 UTC
  To: Ingo Molnar
  Cc: Linus Torvalds, linux-kernel, linux-arch, Andrew Morton,
	Peter Zijlstra, Thomas Gleixner, Greg Kroah-Hartman,
	David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet,
	Al Viro, llvm

On Tue, Jan 04, 2022 at 11:47:30AM +0100, Ingo Molnar wrote:
> > > With the fast-headers kernel that's down to ~36,000 lines of code,
> > > almost a factor of 3 reduction:
> > >
> > >   # fast-headers-v1:
> > >   kepler:~/mingo.tip.git> wc -l kernel/pid.i
> > >   35941 kernel/pid.i
> >
> > Coming from someone who often has to reduce a preprocessed kernel source
> > file with creduce/cvise to report compiler bugs, this will be a very
> > welcomed change, as those tools will have to do less work, and I can get
> > my reports done faster.
>
> That's nice, didn't think of that side effect.
>
> Could you perhaps measure this too, to see how much of a benefit it is?

As it turns out, I got an opportunity to measure this sooner rather than
later [1]. Using cvise [2] with an identical set of toolchains and
interestingness test [3], reducing net/core/skbuff.c took significantly
less time with the version from the fast-headers tree.

v5.16-rc8:

$ wc -l skbuff.i
105135 skbuff.i

$ time cvise test.fish skbuff.i
...
________________________________________________________
Executed in  114.02 mins    fish           external
   usr time  1180.43 mins   69.29 millis  1180.43 mins
   sys time  229.80 mins   248.11 millis  229.79 mins

fast-headers:

$ wc -l skbuff.i
78765 skbuff.i

$ time cvise test.fish skbuff.i
...
________________________________________________________
Executed in   47.38 mins    fish           external
   usr time  620.17 mins   32.78 millis  620.17 mins
   sys time  123.70 mins  122.38 millis  123.70 mins

I was not expecting that much of a difference but it somewhat makes
sense, as the tool spends less time eliminating unused code and the
compiler invocations will be incrementally quicker as the input becomes
smaller.

[1]: https://github.com/ClangBuiltLinux/linux/issues/1563
[2]: https://github.com/marxin/cvise
[3]: https://github.com/nathanchance/creduce-files/tree/61056fd763ae3bfb53ff0ae4c1d95550c7c0a5b7/cbl-1563

Cheers,
Nathan
* Re: [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell"
  2022-01-07  0:29     ` Nathan Chancellor
@ 2022-01-08 11:54       ` Ingo Molnar
  0 siblings, 0 replies; 54+ messages in thread
From: Ingo Molnar @ 2022-01-08 11:54 UTC
  To: Nathan Chancellor
  Cc: Linus Torvalds, linux-kernel, linux-arch, Andrew Morton,
	Peter Zijlstra, Thomas Gleixner, Greg Kroah-Hartman,
	David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet,
	Al Viro, llvm

* Nathan Chancellor <nathan@kernel.org> wrote:

> On Tue, Jan 04, 2022 at 11:47:30AM +0100, Ingo Molnar wrote:
> > > > With the fast-headers kernel that's down to ~36,000 lines of code,
> > > > almost a factor of 3 reduction:
> > > >
> > > >   # fast-headers-v1:
> > > >   kepler:~/mingo.tip.git> wc -l kernel/pid.i
> > > >   35941 kernel/pid.i
> > >
> > > Coming from someone who often has to reduce a preprocessed kernel source
> > > file with creduce/cvise to report compiler bugs, this will be a very
> > > welcomed change, as those tools will have to do less work, and I can get
> > > my reports done faster.
> >
> > That's nice, didn't think of that side effect.
> >
> > Could you perhaps measure this too, to see how much of a benefit it is?
>
> As it turns out, I got an opportunity to measure this sooner rather than
> later [1]. Using cvise [2] with an identical set of toolchains and
> interestingness test [3], reducing net/core/skbuff.c took significantly
> less time with the version from the fast-headers tree.
>
> v5.16-rc8:
>
> $ wc -l skbuff.i
> 105135 skbuff.i
>
> $ time cvise test.fish skbuff.i
> ...
> ________________________________________________________
> Executed in  114.02 mins    fish           external
>    usr time  1180.43 mins   69.29 millis  1180.43 mins
>    sys time  229.80 mins   248.11 millis  229.79 mins
>
> fast-headers:
>
> $ wc -l skbuff.i
> 78765 skbuff.i
>
> $ time cvise test.fish skbuff.i
> ...
> ________________________________________________________
> Executed in   47.38 mins    fish           external
>    usr time  620.17 mins   32.78 millis  620.17 mins
>    sys time  123.70 mins  122.38 millis  123.70 mins
>
> I was not expecting that much of a difference but it somewhat makes
> sense, as the tool spends less time eliminating unused code and the
> compiler invocations will be incrementally quicker as the input becomes
> smaller.

Indeed, that's a +140% speedup in build performance, not bad. :-)

I also got around to testing Clang (12) myself, and with my 'reference
distro config' I got these results:

  #
  # v5.16-rc8
  #

  Performance counter stats for 'make -j96 vmlinux LLVM=1' (3 runs):

   55,638,543,274,254      instructions     #    0.77  insn per cycle   ( +-  0.01% )
   72,074,911,968,393      cycles           #    3.901 GHz              ( +-  0.04% )
        18,490,451.51 msec cpu-clock        #   54.740 CPUs utilized    ( +-  0.04% )

              337.788 +- 0.834 seconds time elapsed  ( +-  0.25% )

  #
  # -fast-headers-v2-rc3
  #

  Performance counter stats for 'make -j96 vmlinux LLVM=1' (3 runs):

   30,904,130,243,855      instructions     #    0.76  insn per cycle   ( +-  0.02% )
   40,703,482,733,690      cycles           #    3.898 GHz              ( +-  0.00% )
        10,443,670.86 msec cpu-clock        #   58.093 CPUs utilized    ( +-  0.00% )

              179.773 +- 0.829 seconds time elapsed  ( +-  0.46% )

That's a +88% build speedup on Clang - even better than the +78% speedup
on GCC(-10).

Thanks,

	Ingo
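The speedup percentages quoted in this thread follow from the elapsed times as (baseline / improved - 1) * 100. A minimal sketch of that arithmetic is below; the helper name speedup_percent() is hypothetical and chosen here only for illustration.

```c
#include <assert.h>

/*
 * Speedup of 'improved' relative to 'baseline', in percent.
 * Both arguments are elapsed times expressed in the same unit
 * (seconds for the builds, minutes for the cvise runs).
 */
double speedup_percent(double baseline, double improved)
{
	assert(baseline > 0.0 && improved > 0.0);
	return (baseline / improved - 1.0) * 100.0;
}
```

With the figures above, speedup_percent(114.02, 47.38) comes out at roughly 140.65 for the cvise reduction, and speedup_percent(337.788, 179.773) at roughly 87.9 for the Clang build, in line with the rounded +140% and +88% in the message.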
* Re: [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell"
       [not found] <YdIfz+LMewetSaEB@gmail.com>
                   ` (2 preceding siblings ...)
  2022-01-03 17:54 ` [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" Nathan Chancellor
@ 2022-01-04 12:36 ` Willy Tarreau
  2022-01-04 16:05 ` Andy Shevchenko
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 54+ messages in thread
From: Willy Tarreau @ 2022-01-04 12:36 UTC
  To: Ingo Molnar
  Cc: Linus Torvalds, linux-kernel, linux-arch, Andrew Morton,
	Peter Zijlstra, Thomas Gleixner, Greg Kroah-Hartman,
	David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet,
	Al Viro

Hi Ingo!

First, great work! I'm particularly interested in this work because I went
through a similar process about 6 months ago in haproxy and saved 40-45%
build time, and thought how well the same principles could apply to the
kernel if anyone had felt brave enough to engage in that. I do appreciate
how tedious a work it can be and do really sympathise with you on this!

A few comments below:

On Sun, Jan 02, 2022 at 10:57:35PM +0100, Ingo Molnar wrote:
>  - Uninlining: there's a number of unnecessary inline functions that also
>    couple otherwise unrelated headers to each other. The fast-headers tree
>    contains over 100 uninlining commits.
>
>  - Type & API header decoupling. This is one of the most effective techniques
>    to reduce size - but it can rarely be done in a straightforward fashion,
>    and has to be prepared by various decoupling measures, such as the moving
>    of inline functions or the creation of new headers for less frequently used
>    APIs and types.

These were the main two key points I went through as well and found them
to be extremely effective. The essential build time in my case came from
the same inline functions being built hundreds of times for nothing, just
because a header file was included for just one type. I had already
decoupled types and API long ago but that didn't stand long enough for a
few files that were included everywhere. What I noticed is that ideally
we'd need to have 3 layers:
  - types alone
  - function prototypes alone, depending on the former if needed
  - inline functions, depending on the two former ones, if needed

Most code doesn't need the inline functions, especially other headers,
and being able to only cross-include type definitions is extremely helpful.

In my case something that further improved this effectiveness was to use
a lot more incomplete types everywhere possible. There's no reason to
include foo.h just to have a definition of "struct foo" from "bar.h" if
you're only using it as a pointer in "struct bar". Just prepend
"struct foo;" before struct bar and be done with it. This showed me how
horrible typedefs are: there seems to be no way to create incomplete
definitions for them. So I had to create an even lower level tiny include
file for just the few ones I needed (mostly ints).

I hadn't found a perfect way to deal with macros. Sometimes you consider
them as inline functions and they seem to be better placed there, and
sometimes you figure they are used in type declarations and you have to
have them somewhere else. And when a macro is needed between multiple type
definitions (e.g. an array size), it becomes more delicate because you
quickly realize that a dedicated file for all such settings would make
sense, but it can complicate maintenance.

Another point I didn't feel brave enough to experiment with was to guard
include files around the #include directive in order to avoid opening the
files at all. In my case the C files are huge so such savings could have
been small. There are definitely savings to be made there but this looked
too complicated to maintain. And I don't think that #pragma once would be
an effective alternative.

>  - For the 'reference' subsystem of the scheduler, I also improved build speed by
>    consolidating .c files into roughly equal size build units. Instead of 20+
>    separate .o's, there's now just 4 .o's being built. Obviously this approach
>    does not scale to the over 30,000 .c files in the kernel, but I wanted to
>    demonstrate it because optimizing at that level brings the next level of build
>    performance, and it might be feasible for a handful of other core kernel subsystems.

I tried this as well for the sake of avoiding reprocessing the same header
files multiple times but it was too difficult and I gave up. I'd be
tempted to encourage developers to write slightly fewer but larger files,
but these can also become a maintenance nightmare, they tend to be much
slower to build when too big, and they do parallelize less well, so a
balance has to be found, and if the headers hell is better addressed, then
this becomes less important.

I noticed that you measured the number of includes per file. I did the
same by counting the references to the include files in the preprocessed
output, but ultimately found an easier metric: the total preprocessed
size. I simply replaced "-c" with "-E" in my makefile, and ran
"find . -name '*.o' | grep '^[^#]' | xargs cat | wc" to observe the
output, since in the end, that's what is really fed to the compiler. I
overall found that metric to be a relatively accurate representation of
an expected build time. It's particularly interesting because it's much
faster to obtain than a full build and can easily show you that some
optimizations have absolutely zero effect (typically because most includes
are guarded and what's not included at some place will be at another one).

In my project I noticed that the total preprocessed size was initially
around 50-60 times larger than the total C+H files. After optimizing it
went down to around 20 times, which is roughly in line with the build time
savings.

Just my two cents, kudos for working on this!

Willy
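The three-layer split and the incomplete-type trick from the message above can be sketched in a single C file. This is only an illustration under stated assumptions: the file names in the comments (bar_types.h, bar_api.h, bar_inline.h, foo_types.h) and every identifier are hypothetical and do not come from haproxy or from the fast-headers tree.

```c
#include <assert.h>
#include <stddef.h>

/* ---- layer 1: bar_types.h (hypothetical) - types alone ---- */

struct foo;			/* incomplete type, no #include "foo.h" needed */

struct bar {
	struct foo *owner;	/* only a pointer, so sizeof(struct foo) is never required */
	int id;
};

/* ---- layer 2: bar_api.h (hypothetical) - prototypes alone ---- */

void bar_init(struct bar *b, int id);
void bar_set_owner(struct bar *b, struct foo *owner);

/* ---- layer 3: bar_inline.h (hypothetical) - inline helpers ---- */

static inline int bar_has_owner(const struct bar *b)
{
	return b->owner != NULL;
}

/* ---- bar.c (hypothetical) - out-of-line implementation ---- */

void bar_init(struct bar *b, int id)
{
	assert(b != NULL);
	b->owner = NULL;
	b->id = id;
}

void bar_set_owner(struct bar *b, struct foo *owner)
{
	assert(b != NULL);
	b->owner = owner;	/* the pointer is stored but never dereferenced here */
}

/*
 * ---- foo_types.h (hypothetical) - the full definition, needed only
 *      by code that actually looks inside a struct foo ----
 */
struct foo {
	int refcount;
};
```

Everything above the final definition compiles without knowing the layout of struct foo. A header that merely stores or passes a struct foo pointer would need layer 1 only, while code that reads owner->refcount is the only place that has to pull in the full definition.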
* Re: [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell"
       [not found] <YdIfz+LMewetSaEB@gmail.com>
                   ` (3 preceding siblings ...)
  2022-01-04 12:36 ` Willy Tarreau
@ 2022-01-04 16:05 ` Andy Shevchenko
  2022-01-04 16:18 ` Andy Shevchenko
  2022-01-15  0:42 ` Paul E. McKenney
  6 siblings, 0 replies; 54+ messages in thread
From: Andy Shevchenko @ 2022-01-04 16:05 UTC
  To: Ingo Molnar
  Cc: Linus Torvalds, linux-kernel, linux-arch, Andrew Morton,
	Peter Zijlstra, Thomas Gleixner, Greg Kroah-Hartman,
	David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet,
	Al Viro

On Sun, Jan 02, 2022 at 10:57:35PM +0100, Ingo Molnar wrote:
>
> I'm pleased to announce the first public version of my new "Fast Kernel
> Headers" project that I've been working on since late 2020, which is a
> comprehensive rework of the Linux kernel's header hierarchy & header
> dependencies, with the dual goals of:
>
>  - speeding up the kernel build (both absolute and incremental build times)
>
>  - decoupling subsystem type & API definitions from each other
>
> The fast-headers tree consists of over 25 sub-trees internally, spanning
> over 2,200 commits, which can be found here:
>
>     git://git.kernel.org/pub/scm/linux/kernel/git/mingo/tip.git master
>
> As most kernel developers know, there's around ~10,000 main .h headers in
> the Linux kernel, in the include/ and arch/*/include/ hierarchies. Over the
> last 30+ years they have grown into a complicated & painful set of
> cross-dependencies we are affectionately calling 'Dependency Hell'.

In commit 64e013748e61 ("headers/deps: Optimize <linux/kernel.h>"), the
linux/container_of.h and linux/stdarg.h includes are moved around (within
linux/kernel.h) without any explanation in the commit message. Is it
necessary? If so, can you add a background note?

-- 
With Best Regards,
Andy Shevchenko
* Re: [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell"
       [not found] <YdIfz+LMewetSaEB@gmail.com>
                   ` (4 preceding siblings ...)
  2022-01-04 16:05 ` Andy Shevchenko
@ 2022-01-04 16:18 ` Andy Shevchenko
  2022-01-15  0:42 ` Paul E. McKenney
  6 siblings, 0 replies; 54+ messages in thread
From: Andy Shevchenko @ 2022-01-04 16:18 UTC
  To: Ingo Molnar
  Cc: Linus Torvalds, linux-kernel, linux-arch, Andrew Morton,
	Peter Zijlstra, Thomas Gleixner, Greg Kroah-Hartman,
	David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet,
	Al Viro

On Sun, Jan 02, 2022 at 10:57:35PM +0100, Ingo Molnar wrote:
>
> I'm pleased to announce the first public version of my new "Fast Kernel
> Headers" project that I've been working on since late 2020, which is a
> comprehensive rework of the Linux kernel's header hierarchy & header
> dependencies, with the dual goals of:
>
>  - speeding up the kernel build (both absolute and incremental build times)
>
>  - decoupling subsystem type & API definitions from each other
>
> The fast-headers tree consists of over 25 sub-trees internally, spanning
> over 2,200 commits, which can be found here:
>
>     git://git.kernel.org/pub/scm/linux/kernel/git/mingo/tip.git master
>
> As most kernel developers know, there's around ~10,000 main .h headers in
> the Linux kernel, in the include/ and arch/*/include/ hierarchies. Over the
> last 30+ years they have grown into a complicated & painful set of
> cross-dependencies we are affectionately calling 'Dependency Hell'.

$ git grep -n -w kernel.h mingo/sched/headers -- include/ | wc -l
138
$ git grep -n -w kernel.h next/master -- include/ | wc -l
96

Can we rather split kernel.h more? In some cases kernel.h is used just as
a bundle instead of ~2-3 headers.

And I don't get why kernel.h is back in the drm headers. AFAICT there are
no dependencies:

mingo/sched/headers:include/drm/drm_gem_ttm_helper.h:6:#include <linux/kernel.h>
mingo/sched/headers:include/drm/drm_gem_vram_helper.h:15:#include <linux/kernel.h> /* for container_of() */
mingo/sched/headers:include/drm/drm_mm.h:44:#include <linux/kernel.h>
mingo/sched/headers:include/drm/drm_property.h:28:#include <linux/kernel.h>
mingo/sched/headers:include/drm/intel-gtt.h:9:#include <linux/kernel.h>

Ah, it may be because the tree is based on vanilla rather than on next. It
would be nice to see this rebased on top of v5.17-rc1 when it's out.

-- 
With Best Regards,
Andy Shevchenko
* Re: [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell"
       [not found] <YdIfz+LMewetSaEB@gmail.com>
                   ` (5 preceding siblings ...)
  2022-01-04 16:18 ` Andy Shevchenko
@ 2022-01-15  0:42 ` Paul E. McKenney
  6 siblings, 0 replies; 54+ messages in thread
From: Paul E. McKenney @ 2022-01-15  0:42 UTC
  To: Ingo Molnar
  Cc: Linus Torvalds, linux-kernel, linux-arch, Andrew Morton,
	Peter Zijlstra, Thomas Gleixner, Greg Kroah-Hartman,
	David S. Miller, Ard Biesheuvel, Josh Poimboeuf, Jonathan Corbet,
	Al Viro

On Sun, Jan 02, 2022 at 10:57:35PM +0100, Ingo Molnar wrote:
>
> I'm pleased to announce the first public version of my new "Fast Kernel
> Headers" project that I've been working on since late 2020, which is a
> comprehensive rework of the Linux kernel's header hierarchy & header
> dependencies, with the dual goals of:
>
>  - speeding up the kernel build (both absolute and incremental build times)
>
>  - decoupling subsystem type & API definitions from each other

Yow!!! ;-)

[ . . . ]

>       headers/uninline: Uninline multi-use function: finish_rcuwait()

This one looks fine on its own merits, so I grabbed it from your git tree:

ecdadb5289d1 ("headers/uninline: Uninline multi-use function: finish_rcuwait()")

>       headers/deps: RCU: Remove __read_mostly annotations from externs

And same with this one:

1c8af2245fd7 ("headers/deps: RCU: Remove __read_mostly annotations from externs")

Of course, if you would rather keep these, please let me know and I will
drop them.

							Thanx, Paul
Thread overview: 54+ messages
[not found] <YdIfz+LMewetSaEB@gmail.com>
2022-01-03 10:11 ` [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" Greg Kroah-Hartman
2022-01-03 11:12 ` Ingo Molnar
2022-01-03 13:46 ` Greg Kroah-Hartman
2022-01-03 16:29 ` Ingo Molnar
2022-01-10 10:28 ` Peter Zijlstra
2022-01-04 14:10 ` [PATCH] per_task: Remove the PER_TASK_BYTES hard-coded constant Ingo Molnar
2022-01-04 15:14 ` Andy Shevchenko
2022-01-04 23:27 ` Ingo Molnar
2022-01-04 17:51 ` [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" Arnd Bergmann
2022-01-05 0:05 ` Ingo Molnar
2022-01-05 1:37 ` Arnd Bergmann
2022-01-05 9:37 ` Andy Shevchenko
2022-01-04 14:05 ` [PATCH] per_task: Implement single template to define 'struct task_struct_per_task' fields and offsets Ingo Molnar
2022-01-03 13:54 ` [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" Kirill A. Shutemov
2022-01-04 10:54 ` Ingo Molnar
2022-01-04 13:34 ` Greg Kroah-Hartman
2022-01-04 13:54 ` [PATCH] headers/uninline: Uninline single-use function: kobject_has_children() Ingo Molnar
2022-01-04 15:09 ` Greg Kroah-Hartman
2022-01-04 15:14 ` Greg Kroah-Hartman
2022-01-05 0:11 ` Ingo Molnar
2022-01-05 15:23 ` Greg Kroah-Hartman
2022-01-06 11:26 ` Ingo Molnar
2022-01-03 17:54 ` [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" Nathan Chancellor
2022-01-04 10:47 ` Ingo Molnar
2022-01-04 10:56 ` [DEBUG PATCH] DO NOT MERGE: Enable SHADOW_CALL_STACK on GCC builds, for build testing Ingo Molnar
2022-01-04 11:02 ` [PATCH] headers/deps: dcache: Move the ____cacheline_aligned attribute to the head of the definition Ingo Molnar
2022-01-04 15:05 ` kernel test robot
2022-01-04 17:51 ` Nathan Chancellor
2022-01-05 0:20 ` Ingo Molnar
2022-01-05 0:26 ` [PATCH] headers/deps: Attribute placement fixes for Clang & GCC Ingo Molnar
2022-01-04 11:19 ` [TREE] "Fast Kernel Headers" Tree WIP/development branch Ingo Molnar
2022-01-04 17:25 ` [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" Nick Desaulniers
2022-01-05 0:43 ` Ingo Molnar
2022-01-04 17:50 ` Nathan Chancellor
2022-01-05 0:35 ` [PATCH] x86/kbuild: Enable CONFIG_KALLSYMS_ALL=y in the defconfigs Ingo Molnar
2022-01-05 0:40 ` [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" Ingo Molnar
2022-01-05 1:07 ` Ingo Molnar
2022-01-05 21:42 ` Nathan Chancellor
2022-01-08 10:32 ` [PATCH] headers/deps: Add header dependencies to .c files: <linux/ptrace_api.h> Ingo Molnar
2022-01-08 11:08 ` [PATCH] FIX: headers/deps: uapi/headers: Create usr/include/uapi symbolic link Ingo Molnar
2022-01-08 11:18 ` [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" Ingo Molnar
2022-01-08 11:38 ` [PATCH] x86/bitops: Remove unused __sw_hweight64() assembly implementation Ingo Molnar
2022-01-08 11:49 ` [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell" Ingo Molnar
2022-01-08 12:17 ` Ingo Molnar
2022-01-10 20:03 ` Nathan Chancellor
2022-01-10 20:05 ` Nathan Chancellor
2022-01-05 22:33 ` Nathan Chancellor
2022-01-08 15:16 ` Ingo Molnar
2022-01-07 0:29 ` Nathan Chancellor
2022-01-08 11:54 ` Ingo Molnar
2022-01-04 12:36 ` Willy Tarreau
2022-01-04 16:05 ` Andy Shevchenko
2022-01-04 16:18 ` Andy Shevchenko
2022-01-15 0:42 ` Paul E. McKenney