* Re: [PATCH v4 7/7] kernel.h: drop trace_printk.h
From: Steven Rostedt @ 2025-12-27 21:27 UTC (permalink / raw)
To: Yury Norov
Cc: Andy Shevchenko, Andrew Morton, Masami Hiramatsu,
Mathieu Desnoyers, Christophe Leroy, Randy Dunlap, Ingo Molnar,
Jani Nikula, Joonas Lahtinen, David Laight, Petr Pavlu,
Andi Shyti, Rodrigo Vivi, Tvrtko Ursulin, Daniel Gomez,
Greg Kroah-Hartman, Rafael J. Wysocki, Danilo Krummrich,
linux-kernel, intel-gfx, dri-devel, linux-modules,
linux-trace-kernel
In-Reply-To: <aVA1GGfWAHSFdACF@yury>
On Sat, 27 Dec 2025 14:35:52 -0500
Yury Norov <yury.norov@gmail.com> wrote:
> The difference is that printk() is not a debugging tool.
Several developers will disagree with you. In fact, Linus has said he uses
printk() as his preferred debugging tool!
The only reason to have printk.h in kernel.h is because it *is* used for
debugging! If it wasn't used for debugging, then you could simply add
printk.h for those places that needed to use printk(). But because it is
one of the most common debugging tools, having it in kernel.h is useful, as
you don't want to have to add #include <printk.h> every time you add a
printk() for debugging purposes (same is true for trace_printk()).
Yes, it is also used for information. But if that's all it was used for,
then it wouldn't need to be in kernel.h. It could be a normal header file
that anything that needed to print information would have to include.
-- Steve
* Re: [PATCH v4 7/7] kernel.h: drop trace_printk.h
From: Yury Norov @ 2025-12-27 19:35 UTC (permalink / raw)
To: Steven Rostedt
Cc: Andy Shevchenko, Andrew Morton, Masami Hiramatsu,
Mathieu Desnoyers, Christophe Leroy, Randy Dunlap, Ingo Molnar,
Jani Nikula, Joonas Lahtinen, David Laight, Petr Pavlu,
Andi Shyti, Rodrigo Vivi, Tvrtko Ursulin, Daniel Gomez,
Greg Kroah-Hartman, Rafael J. Wysocki, Danilo Krummrich,
linux-kernel, intel-gfx, dri-devel, linux-modules,
linux-trace-kernel
In-Reply-To: <20251227105701.5cbeb47e@robin>
On Sat, Dec 27, 2025 at 10:57:01AM -0500, Steven Rostedt wrote:
> On Sat, 27 Dec 2025 16:45:47 +0200
> Andy Shevchenko <andriy.shevchenko@linux.intel.com> wrote:
>
> > > I'm fine for trying other ways to speed up the compilation, but removing
> > > full access to trace_printk() isn't one of them.
OK, then let's keep trace_printk() available for kernel.h users.
Andrew, can you take the first 6 patches of the series, if no other
objections?
> > I interpreted this as if the header inclusion should be moved from kernel.h
> > to printk.h as a compromise that satisfies all (?) stakeholders. Is it a possible
> > approach?
>
> I'm fine with putting the include of trace_printk.h into printk.h. If
> you remove printk.h from kernel.h I would expect a lot more people to
> complain about it. Including Linus himself.
The difference is that printk() is not a debugging tool. It is used
widely to report errors and info messages. Normally, I want to clean up
all debugging code from my module after finishing development. If
trace_printk.h becomes part of printk.h, there's always a chance of
missing a trace_printk() somewhere. I'd prefer to keep them separate.
Thanks,
Yury
* Re: [PATCH v4 7/7] kernel.h: drop trace_printk.h
From: Steven Rostedt @ 2025-12-27 15:57 UTC (permalink / raw)
To: Andy Shevchenko
Cc: Yury Norov (NVIDIA), Andrew Morton, Masami Hiramatsu,
Mathieu Desnoyers, Christophe Leroy, Randy Dunlap, Ingo Molnar,
Jani Nikula, Joonas Lahtinen, David Laight, Petr Pavlu,
Andi Shyti, Rodrigo Vivi, Tvrtko Ursulin, Daniel Gomez,
Greg Kroah-Hartman, Rafael J. Wysocki, Danilo Krummrich,
linux-kernel, intel-gfx, dri-devel, linux-modules,
linux-trace-kernel
In-Reply-To: <aU_xG7pK9iauff65@smile.fi.intel.com>
On Sat, 27 Dec 2025 16:45:47 +0200
Andy Shevchenko <andriy.shevchenko@linux.intel.com> wrote:
> > I'm fine for trying other ways to speed up the compilation, but removing
> > full access to trace_printk() isn't one of them.
>
> I interpreted this as if the header inclusion should be moved from kernel.h
> to printk.h as a compromise that satisfies all (?) stakeholders. Is it a possible
> approach?
I'm fine with putting the include of trace_printk.h into printk.h. If
you remove printk.h from kernel.h I would expect a lot more people to
complain about it. Including Linus himself.
-- Steve
* Re: [PATCH] software node: replace -EEXIST with -EBUSY
From: Andy Shevchenko @ 2025-12-27 15:23 UTC (permalink / raw)
To: Daniel Gomez
Cc: Daniel Scally, Heikki Krogerus, Sakari Ailus, Greg Kroah-Hartman,
Rafael J. Wysocki, Danilo Krummrich, Luis Chamberlain, Petr Pavlu,
Sami Tolvanen, Aaron Tomlin, Lucas De Marchi, linux-acpi,
linux-modules, linux-kernel, Daniel Gomez
In-Reply-To: <20251220-dev-module-init-eexists-linux-acpi-v1-1-af59b1a0e217@samsung.com>
On Sat, Dec 20, 2025 at 04:55:00AM +0100, Daniel Gomez wrote:
> From: Daniel Gomez <da.gomez@samsung.com>
>
> The -EEXIST error code is reserved by the module loading infrastructure
> to indicate that a module is already loaded. When a module's init
> function returns -EEXIST, userspace tools like kmod interpret this as
> "module already loaded" and treat the operation as successful, returning
> 0 to the user even though the module initialization actually failed.
>
> This follows the precedent set by commit 54416fd76770 ("netfilter:
> conntrack: helper: Replace -EEXIST by -EBUSY") which fixed the same
> issue in nf_conntrack_helper_register().
>
> Affected modules:
> * meraki_mx100 pcengines_apuv2
As I read the description, the problem is in kmod/do_module_init(). If you
need a clear way to distinguish the two cases, use a unique error code in the
kernel module loader. I fully agree with Greg that this is a slippery slope that
leads to -EEXIST being forbidden in drivers, which is a no-go.
NAK.
--
With Best Regards,
Andy Shevchenko
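
The behaviour described in the quoted commit message can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions: demo_init_eexist(), demo_init_ebusy() and demo_tool_load() are hypothetical names, and the loader logic is a simplification of what the commit message says kmod does, not kmod's actual code.

```c
#include <errno.h>

/* Hypothetical module init that fails and reports -EEXIST. */
int demo_init_eexist(void)
{
	return -EEXIST;
}

/* Hypothetical module init that fails and reports -EBUSY instead. */
int demo_init_ebusy(void)
{
	return -EBUSY;
}

/*
 * Simplified model (an assumption, not real kmod code) of a userspace
 * loader that treats -EEXIST as "module already loaded" and reports
 * success to the caller.
 */
int demo_tool_load(int (*init)(void))
{
	int ret = init();

	if (ret == -EEXIST)
		return 0;	/* the init failure is hidden from the user */

	return ret;		/* any other error is passed through */
}
```

In this model demo_tool_load(demo_init_eexist) reports success although init failed, while demo_tool_load(demo_init_ebusy) passes the error through. Whether the distinction belongs in the drivers or in the module loader is exactly the point under discussion.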
* Re: [PATCH v4 7/7] kernel.h: drop trace_printk.h
From: Andy Shevchenko @ 2025-12-27 14:50 UTC (permalink / raw)
To: Yury Norov (NVIDIA)
Cc: Steven Rostedt, Andrew Morton, Masami Hiramatsu,
Mathieu Desnoyers, Christophe Leroy, Randy Dunlap, Ingo Molnar,
Jani Nikula, Joonas Lahtinen, David Laight, Petr Pavlu,
Andi Shyti, Rodrigo Vivi, Tvrtko Ursulin, Daniel Gomez,
Greg Kroah-Hartman, Rafael J. Wysocki, Danilo Krummrich,
linux-kernel, intel-gfx, dri-devel, linux-modules,
linux-trace-kernel
In-Reply-To: <20251225170930.1151781-8-yury.norov@gmail.com>
On Thu, Dec 25, 2025 at 12:09:29PM -0500, Yury Norov (NVIDIA) wrote:
> The trace_printk.h header is debugging-only by nature, but now it's
> included by almost every compilation unit via kernel.h.
>
> Removing trace_printk.h saves 1.5-2% of compilation time on my
> Ubuntu-derived x86_64/localyesconfig.
>
> There's ~30 files in the codebase, requiring trace_printk.h for
> non-debugging reasons: mostly to disable tracing on panic or under
> similar conditions. Include the header for those explicitly.
>
> This implicitly decouples linux/kernel.h and linux/instruction_pointer.h
> as well, because it has been isolated to trace_printk.h early in the
> series.
...
> #include <linux/pagevec.h>
> #include <linux/scatterlist.h>
> #include <linux/workqueue.h>
> +#include <linux/trace_printk.h>
I believe 't' is followed by 'w' and not vice versa.
...
> index 20b3cb29cfff..549fdeaf4508 100644
> --- a/drivers/gpu/drm/i915/i915_gem.h
> +++ b/drivers/gpu/drm/i915/i915_gem.h
> @@ -27,6 +27,7 @@
>
> #include <linux/bug.h>
> #include <linux/types.h>
> +#include <linux/trace_printk.h>
Similarly, 'r' comes before 'y'.
...
Please, double check these and the rest.
--
With Best Regards,
Andy Shevchenko
* Re: [PATCH v4 7/7] kernel.h: drop trace_printk.h
From: Andy Shevchenko @ 2025-12-27 14:45 UTC (permalink / raw)
To: Steven Rostedt
Cc: Yury Norov (NVIDIA), Andrew Morton, Masami Hiramatsu,
Mathieu Desnoyers, Christophe Leroy, Randy Dunlap, Ingo Molnar,
Jani Nikula, Joonas Lahtinen, David Laight, Petr Pavlu,
Andi Shyti, Rodrigo Vivi, Tvrtko Ursulin, Daniel Gomez,
Greg Kroah-Hartman, Rafael J. Wysocki, Danilo Krummrich,
linux-kernel, intel-gfx, dri-devel, linux-modules,
linux-trace-kernel
In-Reply-To: <20251226115848.298465d4@gandalf.local.home>
On Fri, Dec 26, 2025 at 11:58:48AM -0500, Steven Rostedt wrote:
> On Thu, 25 Dec 2025 12:09:29 -0500
> "Yury Norov (NVIDIA)" <yury.norov@gmail.com> wrote:
>
> > The trace_printk.h header is debugging-only by nature, but now it's
> > included by almost every compilation unit via kernel.h.
> >
> > Removing trace_printk.h saves 1.5-2% of compilation time on my
> > Ubuntu-derived x86_64/localyesconfig.
> >
> > There's ~30 files in the codebase, requiring trace_printk.h for
> > non-debugging reasons: mostly to disable tracing on panic or under
> > similar conditions. Include the header for those explicitly.
> >
> > This implicitly decouples linux/kernel.h and linux/instruction_pointer.h
> > as well, because it has been isolated to trace_printk.h early in the
> > series.
> >
> > Signed-off-by: Yury Norov (NVIDIA) <yury.norov@gmail.com>
>
> I'm still against this patch. It means every time someone adds
> trace_printk() they need to add the header for it.
>
> trace_printk() should be as available to the kernel as printk() is. If
> there's a place that one can add printk() without adding a header, then
> they should be able to add trace_printk() to that same location without
> adding any header. If that's not the case, then I'm adding an official
>
> Nacked-by: Steven Rostedt <rostedt@goodmis.org>
>
> I'm fine for trying other ways to speed up the compilation, but removing
> full access to trace_printk() isn't one of them.
I interpreted this as if the header inclusion should be moved from kernel.h
to printk.h as a compromise that satisfies all (?) stakeholders. Is it a possible
approach?
--
With Best Regards,
Andy Shevchenko
* Re: [PATCH v4 7/7] kernel.h: drop trace_printk.h
From: Steven Rostedt @ 2025-12-26 16:58 UTC (permalink / raw)
To: Yury Norov (NVIDIA)
Cc: Andrew Morton, Masami Hiramatsu, Mathieu Desnoyers,
Andy Shevchenko, Christophe Leroy, Randy Dunlap, Ingo Molnar,
Jani Nikula, Joonas Lahtinen, David Laight, Petr Pavlu,
Andi Shyti, Rodrigo Vivi, Tvrtko Ursulin, Daniel Gomez,
Greg Kroah-Hartman, Rafael J. Wysocki, Danilo Krummrich,
linux-kernel, intel-gfx, dri-devel, linux-modules,
linux-trace-kernel
In-Reply-To: <20251225170930.1151781-8-yury.norov@gmail.com>
On Thu, 25 Dec 2025 12:09:29 -0500
"Yury Norov (NVIDIA)" <yury.norov@gmail.com> wrote:
> The trace_printk.h header is debugging-only by nature, but now it's
> included by almost every compilation unit via kernel.h.
>
> Removing trace_printk.h saves 1.5-2% of compilation time on my
> Ubuntu-derived x86_64/localyesconfig.
>
> There's ~30 files in the codebase, requiring trace_printk.h for
> non-debugging reasons: mostly to disable tracing on panic or under
> similar conditions. Include the header for those explicitly.
>
> This implicitly decouples linux/kernel.h and linux/instruction_pointer.h
> as well, because it has been isolated to trace_printk.h early in the
> series.
>
> Signed-off-by: Yury Norov (NVIDIA) <yury.norov@gmail.com>
I'm still against this patch. It means every time someone adds
trace_printk() they need to add the header for it.
trace_printk() should be as available to the kernel as printk() is. If
there's a place that one can add printk() without adding a header, then
they should be able to add trace_printk() to that same location without
adding any header. If that's not the case, then I'm adding an official
Nacked-by: Steven Rostedt <rostedt@goodmis.org>
I'm fine for trying other ways to speed up the compilation, but removing
full access to trace_printk() isn't one of them.
-- Steve
* Re: [RFC PATCH v1] module: Fix kernel panic when a symbol st_shndx is out of bounds
From: Yonghong Song @ 2025-12-26 5:04 UTC (permalink / raw)
To: Ihor Solodrai, Luis Chamberlain, Petr Pavlu, Daniel Gomez,
Nathan Chancellor, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman
Cc: linux-kernel, linux-modules, bpf, linux-kbuild, llvm
In-Reply-To: <9edd1395-8651-446b-b056-9428076cd830@linux.dev>
On 12/23/25 9:36 PM, Yonghong Song wrote:
>
>
> On 12/23/25 4:57 PM, Ihor Solodrai wrote:
>> I've been chasing down the following flaky splat, introduced by recent
>> changes in BTF generation [1]:
>>
>> ------------[ cut here ]------------
>> BUG: unable to handle page fault for address: ffa000000233d828
>> #PF: supervisor read access in kernel mode
>> #PF: error_code(0x0000) - not-present page
>> PGD 100000067 P4D 100253067 PUD 100258067 PMD 0
>> Oops: Oops: 0000 [#1] SMP NOPTI
>> CPU: 1 UID: 0 PID: 390 Comm: test_progs Tainted: G W
>> OE 6.19.0-rc1-gf785a31395d9 #331 PREEMPT(full)
>> Tainted: [W]=WARN, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
>> 1.16.3-4.el9 04/01/2014
>> RIP: 0010:simplify_symbols+0x2b2/0x480
>> Code: 85 f6 4d 89 f7 b8 01 00 00 00 4c 0f 44 f8 49 83
>> fd f0 4d 0f 44 fe 75 5b 4d 85 ff 0f 85 76 ff ff ff eb 50 49 8b 4e 20
>> c1 e0 06 <48> 8b 44 01 10 9 cf fd ff ff 49 89 c5 eb 36 49 c7 c5
>> RSP: 0018:ffa00000017afc40 EFLAGS: 00010216
>> RAX: 00000000003fffc0 RBX: 0000000000000002 RCX: ffa0000001f3d858
>> RDX: ffffffffc0218038 RSI: ffffffffc0218008 RDI: aaaaaaaaaaaaaaab
>> RBP: ffa00000017afd18 R08: 0000000000000072 R09: 0000000000000069
>> R10: ffffffff8160d6ca R11: 0000000000000000 R12: ffa0000001f3d577
>> R13: ffffffffc0214058 R14: ffa00000017afdc0 R15: ffa0000001f3e518
>> FS: 00007f1c638654c0(0000) GS:ff1100089b7bc000(0000)
>> knlGS:0000000000000000
>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: ffa000000233d828 CR3: 000000010ba1f001 CR4: 0000000000771ef0
>> PKRU: 55555554
>> Call Trace:
>> <TASK>
>> ? __kmalloc_node_track_caller_noprof+0x37f/0x740
>> ? __pfx_setup_modinfo_srcversion+0x10/0x10
>> ? srso_alias_return_thunk+0x5/0xfbef5
>> ? kstrdup+0x4a/0x70
>> ? srso_alias_return_thunk+0x5/0xfbef5
>> ? setup_modinfo_srcversion+0x1a/0x30
>> ? srso_alias_return_thunk+0x5/0xfbef5
>> ? setup_modinfo+0x12b/0x1e0
>> load_module+0x133a/0x1610
>> __x64_sys_finit_module+0x31b/0x450
>> ? entry_SYSCALL_64_after_hwframe+0x76/0x7e
>> do_syscall_64+0x80/0x2d0
>> ? srso_alias_return_thunk+0x5/0xfbef5
>> ? exc_page_fault+0x95/0xc0
>> entry_SYSCALL_64_after_hwframe+0x76/0x7e
>> RIP: 0033:0x7f1c63a2582d
>> Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e
>> fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24
>> 08 0f 05 <48> 3d 01 f0 ff 8 8b 0d bb 15 0f 00 f7 d8 64 89 01 48
>> RSP: 002b:00007ffe513df128 EFLAGS: 00000206 ORIG_RAX:
>> 0000000000000139
>> RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f1c63a2582d
>> RDX: 0000000000000000 RSI: 0000000000ee83f9 RDI: 0000000000000016
>> RBP: 00007ffe513df150 R08: 0000000000000000 R09: 0000000000000000
>> R10: 0000000000000000 R11: 0000000000000206 R12: 00007ffe513e3588
>> R13: 000000000088fad0 R14: 00000000014bddb0 R15: 00007f1c63ba7000
>> </TASK>
>> Modules linked in: bpf_testmod(OE)
>> CR2: ffa000000233d828
>> ---[ end trace 0000000000000000 ]---
>> RIP: 0010:simplify_symbols+0x2b2/0x480
>> Code: 85 f6 4d 89 f7 b8 01 00 00 00 4c 0f 44 f8 49 83
>> fd f0 4d 0f 44 fe 75 5b 4d 85 ff 0f 85 76 ff ff ff eb 50 49 8b 4e 20
>> c1 e0 06 <48> 8b 44 01 10 9 cf fd ff ff 49 89 c5 eb 36 49 c7 c5
>> RSP: 0018:ffa00000017afc40 EFLAGS: 00010216
>> RAX: 00000000003fffc0 RBX: 0000000000000002 RCX: ffa0000001f3d858
>> RDX: ffffffffc0218038 RSI: ffffffffc0218008 RDI: aaaaaaaaaaaaaaab
>> RBP: ffa00000017afd18 R08: 0000000000000072 R09: 0000000000000069
>> R10: ffffffff8160d6ca R11: 0000000000000000 R12: ffa0000001f3d577
>> R13: ffffffffc0214058 R14: ffa00000017afdc0 R15: ffa0000001f3e518
>> FS: 00007f1c638654c0(0000) GS:ff1100089b7bc000(0000)
>> knlGS:0000000000000000
>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: ffa000000233d828 CR3: 000000010ba1f001 CR4: 0000000000771ef0
>> PKRU: 55555554
>> Kernel panic - not syncing: Fatal exception
>> Kernel Offset: disabled
>>
>> This hasn't happened on BPF CI so far, for example, however I was able
>> to reproduce it on a particular x64 machine using a kernel built with
>> LLVM 20.
>>
>> The crash happens on attempt to load one of the BPF selftest modules
>> (tools/testing/selftests/bpf/test_kmods/bpf_test_modorder_x.ko) which
>> is used by kfunc_module_order test.
>>
>> The reason for the crash is that simplify_symbols() doesn't check for
>> bounds of the ELF section index:
>>
>> for (i = 1; i < symsec->sh_size / sizeof(Elf_Sym); i++) {
>> const char *name = info->strtab + sym[i].st_name;
>>
>> switch (sym[i].st_shndx) {
>> case SHN_COMMON:
>>
>> [...]
>>
>> default:
>> /* Divert to percpu allocation if a percpu var. */
>> if (sym[i].st_shndx == info->index.pcpu)
>> secbase = (unsigned long)mod_percpu(mod);
>> else
>> /** HERE --> **/ secbase =
>> info->sechdrs[sym[i].st_shndx].sh_addr;
>> sym[i].st_value += secbase;
>> break;
>> }
>> }
>>
>> And in the case I was able to reproduce, the value 0xffff
>> (SHN_HIRESERVE aka SHN_XINDEX [2]) fell through here.
>>
>> Now this code fragment is between 15 and 20 years old, so obviously
>> it's not expected for a kmodule symbol to have such st_shndx
>> value. Even so, the kernel probably should fail loading the module
>> instead of crashing, which is what this patch attempts to fix.
>>
>> Investigating further, I discovered that the module binary became
>> corrupted by `${OBJCOPY} --update-section` operation updating .BTF_ids
>> section data in scripts/gen-btf.sh. This explains how the bug has
>> surfaced after gen-btf.sh was introduced:
>>
>> $ llvm-readelf -s --wide bpf_test_modorder_x.ko | grep 'BTF_ID'
>> llvm-readelf: warning: 'bpf_test_modorder_x.ko': found an extended
>> symbol index (2), but unable to locate the extended symbol index table
>> llvm-readelf: warning: 'bpf_test_modorder_x.ko': found an extended
>> symbol index (3), but unable to locate the extended symbol index table
>> llvm-readelf: warning: 'bpf_test_modorder_x.ko': found an extended
>> symbol index (4), but unable to locate the extended symbol index table
>> 3: 0000000000000000 16 NOTYPE LOCAL DEFAULT RSV[0xffff]
>> __BTF_ID__set8__bpf_test_modorder_kfunc_x_ids
>> llvm-readelf: warning: 'bpf_test_modorder_x.ko': found an extended
>> symbol index (16), but unable to locate the extended symbol index table
>> 4: 0000000000000008 4 OBJECT LOCAL DEFAULT RSV[0xffff]
>> __BTF_ID__func__bpf_test_modorder_retx__44417
>>
>> vs expected
>>
>> $ llvm-readelf -s --wide bpf_test_modorder_x.ko | grep 'BTF_ID'
>> 3: 0000000000000000 16 NOTYPE LOCAL DEFAULT 6
>> __BTF_ID__set8__bpf_test_modorder_kfunc_x_ids
>> 4: 0000000000000008 4 OBJECT LOCAL DEFAULT 6
>> __BTF_ID__func__bpf_test_modorder_retx__44417
>>
>> But why? Updating section data without changing its size is not
>> supposed to affect section indices, right?
>>
>> With a bit more testing I confirmed that this is an LLVM-specific
>> issue (doesn't reproduce with GCC kbuild), and it's not stable,
>> because in link-vmlinux.sh we also do:
>>
>> ${OBJCOPY} --update-section .BTF_ids=${btfids_vmlinux} ${VMLINUX}
>>
>> However:
>>
>> $ llvm-readelf -s --wide ~/workspace/prog-aux/linux/vmlinux | grep
>> 0xffff
>> # no output, which is good
>>
>> So the suspect is the implementation of llvm-objcopy. As it turns out
>> there is a relevant known bug that explains the flakiness and isn't
>> fixed yet [3].
>>
>> [1]
>> https://lore.kernel.org/bpf/20251219181825.1289460-3-ihor.solodrai@linux.dev/
>> [2] https://man7.org/linux/man-pages/man5/elf.5.html
>> [3]
>> https://github.com/llvm/llvm-project/issues/168060#issuecomment-3533552952
>>
>> Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
>>
>> ---
>>
>> RFC
>>
>> While this llvm-objcopy bug is not fixed, we can not trust it in the
>> kernel build pipeline. In the short-term we have to come up with a
>> workaround for .BTF_ids section update and replace the calls to
>> ${OBJCOPY} --update-section with something else.
>>
>> One potential workaround is to force the use of the objcopy (from
>> binutils) instead of llvm-objcopy when updating .BTF_ids section.
>>
>> Alternatively, we could just dd the .BTF_ids data computed by
>> resolve_btfids at the right offset in the target ELF file.
>>
>> Surprisingly I couldn't find a good way to read a section offset and
>> size from the ELF with a specified format in a command line. Both
>> readelf and {llvm-}objdump give a human readable output, and it
>> appears we can't rely on the column order, for example.
>>
>> We could still try parsing readelf output with awk/grep, covering
>> output variants that appear in the kernel build.
>>
>> We can also do:
>>
>> llvm-readobj --elf-output-style=JSON --sections "$elf" | \
>> jq -r --arg name .BTF_ids '
>> .[0].Sections[] |
>> select(.Section.Name.Name == $name) |
>> "\(.Section.Offset) \(.Section.Size)"'
>>
>> ...but idk man, doesn't feel right.
>>
>> Most reliable way to determine the size and offset of .BTF_ids section
>> is probably reading them by a C program with libelf, such as
>> resolve_btfids. Which is quite ironic, given the recent
>> changes. Setting the irony aside, we could add smth like:
>> resolve_btfids --section-info=.BTF_ids $elf
>>
>> Reverting the gen-btf.sh patch is also a possible workaround, but I'd
>> really like to avoid it, given that BPF features/optimizations in
>> development depend on it.
>>
>> I'd appreciate comments and suggestions on this issue. Thank you!
>> ---
>> kernel/module/main.c | 7 +++++++
>> 1 file changed, 7 insertions(+)
>>
>> diff --git a/kernel/module/main.c b/kernel/module/main.c
>> index 710ee30b3bea..5bf456fad63e 100644
>> --- a/kernel/module/main.c
>> +++ b/kernel/module/main.c
>> @@ -1568,6 +1568,13 @@ static int simplify_symbols(struct module
>> *mod, const struct load_info *info)
>> break;
>> default:
>> + if (sym[i].st_shndx >= info->hdr->e_shnum) {
>> + pr_err("%s: Symbol %s has an invalid section index
>> %u (max %u)\n",
>> + mod->name, name, sym[i].st_shndx,
>> info->hdr->e_shnum - 1);
>> + ret = -ENOEXEC;
>> + break;
>> + }
>> +
>> /* Divert to percpu allocation if a percpu var. */
>> if (sym[i].st_shndx == info->index.pcpu)
>> secbase = (unsigned long)mod_percpu(mod);
>
> I tried both llvm21 and llvm22 (where llvm21 is used in bpf ci).
>
> Without KASAN, I can reproduce the failure for llvm19/llvm21/llvm22.
> I did not test llvm20 and I assume it may fail too.
>
> The following llvm patch
> https://github.com/llvm/llvm-project/pull/170462
> can fix the issue. Currently it is still in review stage. The actual
> diff is
>
> diff --git a/llvm/lib/ObjCopy/ELF/ELFObject.cpp
> b/llvm/lib/ObjCopy/ELF/ELFObject.cpp
> index e5de17e093df..cc1527d996e2 100644
> --- a/llvm/lib/ObjCopy/ELF/ELFObject.cpp
> +++ b/llvm/lib/ObjCopy/ELF/ELFObject.cpp
> @@ -2168,7 +2168,11 @@ Error Object::updateSectionData(SecPtr &Sec,
> ArrayRef<uint8_t> Data) {
> Data.size(), Sec->Name.c_str(), Sec->Size);
>
> if (!Sec->ParentSegment) {
> - Sec = std::make_unique<OwnedDataSection>(*Sec, Data);
> + SectionBase *Replaced = Sec.get();
> + SectionBase *Modified = &addSection<OwnedDataSection>(*Sec, Data);
> + DenseMap<SectionBase *, SectionBase *> Replacements{{Replaced,
> Modified}};
> + if (auto err = replaceSections(Replacements))
> + return err;
> } else {
> // The segment writer will be in charge of updating these contents.
> Sec->Size = Data.size();
>
> I applied the above patch to latest llvm21 and llvm22 and
> the crash is gone and the selftests can run properly.
>
> With KASAN, everything is okay for llvm21 and llvm22.
>
> I'm not sure whether the llvm patch
> https://github.com/llvm/llvm-project/pull/170462
> can make it into llvm21, as it looks like llvm21 intends to
> freeze for now. See
> https://github.com/llvm/llvm-project/pull/168314#issuecomment-3645797175
> llvm22 will branch into rc mode in January.
>
> I will try to see whether we can have a reasonable workaround
> for llvm21 llvm-objcopy (for builds without KASAN).
>
I commented on the llvm patch https://github.com/llvm/llvm-project/pull/170462
and hopefully the fix can land soon.
I didn't find a good solution. Currently, if there are kfuncs in the module,
a .BTF_ids section is created. Previously, resolve_btfids resolved .BTF_ids
in place, so the counts and btf ids were filled in by resolve_btfids itself.
With the current approach, resolve_btfids does not populate the *correct*
contents into the .BTF_ids section. Instead, it creates another file and the
build then tries to apply it with update-section. This should work, but it
may not because of the llvm bug.
One possible workaround is to have resolve_btfids populate the .BTF_ids section
with the correct contents and to remove the update-section step for .BTF_ids.
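
The bounds check proposed in the quoted RFC patch can be sketched in userspace as follows. This is a minimal illustration, assuming glibc's <elf.h> types. resolve_secbase() is a hypothetical helper, not a kernel function, and it models only the default branch of the switch in simplify_symbols(); the special indices handled by the other cases are not modelled here.

```c
#include <elf.h>
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical helper: look up the base address of the section a symbol
 * belongs to, refusing any section index that is outside the section
 * header table instead of reading past its end.
 */
int resolve_secbase(const Elf64_Shdr *sechdrs, uint16_t e_shnum,
		    const Elf64_Sym *sym, uint64_t *secbase)
{
	/* 0xffff (SHN_XINDEX) and every other index >= e_shnum is rejected. */
	if (sym->st_shndx >= e_shnum)
		return -ENOEXEC;

	*secbase = sechdrs[sym->st_shndx].sh_addr;
	return 0;
}
```

With a three-entry section table, a symbol whose st_shndx is 2 resolves normally, while one carrying 0xffff is rejected with -ENOEXEC rather than indexing far beyond the table.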
* [PATCH v4 5/7] tracing: Remove size parameter in __trace_puts()
From: Yury Norov (NVIDIA) @ 2025-12-25 17:09 UTC (permalink / raw)
To: Steven Rostedt, Andrew Morton, Masami Hiramatsu,
Mathieu Desnoyers, Andy Shevchenko, Christophe Leroy,
Randy Dunlap, Ingo Molnar, Jani Nikula, Joonas Lahtinen,
David Laight, Petr Pavlu, Andi Shyti, Rodrigo Vivi,
Tvrtko Ursulin, Daniel Gomez, Greg Kroah-Hartman,
Rafael J. Wysocki, Danilo Krummrich, linux-kernel, intel-gfx,
dri-devel, linux-modules, linux-trace-kernel
Cc: Yury Norov (NVIDIA)
In-Reply-To: <20251225170930.1151781-1-yury.norov@gmail.com>
From: Steven Rostedt <rostedt@goodmis.org>
The __trace_puts() function takes a string pointer and the size of the
string itself. All users currently simply pass in the strlen() of the
string it is also passing in. There's no reason to pass in the size.
Instead have the __trace_puts() function do the strlen() within the
function itself.
This fixes a header recursion issue where using strlen() in the macro
calling __trace_puts() requires adding #include <linux/string.h> in order
to use strlen(). Removing the use of strlen() from the header fixes the
recursion issue.
Link: https://lore.kernel.org/all/aUN8Hm377C5A0ILX@yury/
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Yury Norov (NVIDIA) <yury.norov@gmail.com>
---
include/linux/kernel.h | 4 ++--
kernel/trace/trace.c | 7 +++----
kernel/trace/trace.h | 2 +-
3 files changed, 6 insertions(+), 7 deletions(-)
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 5b879bfea948..4ee48fb10dec 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -329,10 +329,10 @@ int __trace_printk(unsigned long ip, const char *fmt, ...);
if (__builtin_constant_p(str)) \
__trace_bputs(_THIS_IP_, trace_printk_fmt); \
else \
- __trace_puts(_THIS_IP_, str, strlen(str)); \
+ __trace_puts(_THIS_IP_, str); \
})
extern int __trace_bputs(unsigned long ip, const char *str);
-extern int __trace_puts(unsigned long ip, const char *str, int size);
+extern int __trace_puts(unsigned long ip, const char *str);
extern void trace_dump_stack(int skip);
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 6f2148df14d9..57f24e2cd19c 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -1178,11 +1178,10 @@ EXPORT_SYMBOL_GPL(__trace_array_puts);
* __trace_puts - write a constant string into the trace buffer.
* @ip: The address of the caller
* @str: The constant string to write
- * @size: The size of the string.
*/
-int __trace_puts(unsigned long ip, const char *str, int size)
+int __trace_puts(unsigned long ip, const char *str)
{
- return __trace_array_puts(printk_trace, ip, str, size);
+ return __trace_array_puts(printk_trace, ip, str, strlen(str));
}
EXPORT_SYMBOL_GPL(__trace_puts);
@@ -1201,7 +1200,7 @@ int __trace_bputs(unsigned long ip, const char *str)
int size = sizeof(struct bputs_entry);
if (!printk_binsafe(tr))
- return __trace_puts(ip, str, strlen(str));
+ return __trace_puts(ip, str);
if (!(tr->trace_flags & TRACE_ITER(PRINTK)))
return 0;
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index b6d42fe06115..de4e6713b84e 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -2116,7 +2116,7 @@ extern void tracing_log_err(struct trace_array *tr,
* about performance). The internal_trace_puts() is for such
* a purpose.
*/
-#define internal_trace_puts(str) __trace_puts(_THIS_IP_, str, strlen(str))
+#define internal_trace_puts(str) __trace_puts(_THIS_IP_, str)
#undef FTRACE_ENTRY
#define FTRACE_ENTRY(call, struct_name, id, tstruct, print) \
--
2.43.0
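
The interface change in this patch can be illustrated outside the kernel with a small sketch. All names here (demo_array_puts(), demo_puts_old(), demo_puts_new()) are hypothetical stand-ins, and the "trace buffer" is reduced to returning the number of bytes it would record.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for the buffer writer: returns the byte count it would record. */
int demo_array_puts(const char *str, size_t size)
{
	(void)str;
	return (int)size;
}

/* Old shape: every caller computes the length and passes it in. */
int demo_puts_old(const char *str, int size)
{
	return demo_array_puts(str, (size_t)size);
}

/* New shape: the length is computed once, inside the function. */
int demo_puts_new(const char *str)
{
	return demo_array_puts(str, strlen(str));
}
```

A header macro that wraps demo_puts_new() no longer mentions strlen(), so the header itself does not need a string header. That is the dependency the patch removes.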
* [PATCH v4 7/7] kernel.h: drop trace_printk.h
From: Yury Norov (NVIDIA) @ 2025-12-25 17:09 UTC (permalink / raw)
To: Steven Rostedt, Andrew Morton, Masami Hiramatsu,
Mathieu Desnoyers, Andy Shevchenko, Christophe Leroy,
Randy Dunlap, Ingo Molnar, Jani Nikula, Joonas Lahtinen,
David Laight, Petr Pavlu, Andi Shyti, Rodrigo Vivi,
Tvrtko Ursulin, Daniel Gomez, Greg Kroah-Hartman,
Rafael J. Wysocki, Danilo Krummrich, linux-kernel, intel-gfx,
dri-devel, linux-modules, linux-trace-kernel
Cc: Yury Norov (NVIDIA)
In-Reply-To: <20251225170930.1151781-1-yury.norov@gmail.com>
The trace_printk.h header is debugging-only by nature, but now it's
included by almost every compilation unit via kernel.h.
Removing trace_printk.h saves 1.5-2% of compilation time on my
Ubuntu-derived x86_64/localyesconfig.
There's ~30 files in the codebase, requiring trace_printk.h for
non-debugging reasons: mostly to disable tracing on panic or under
similar conditions. Include the header for those explicitly.
This implicitly decouples linux/kernel.h and linux/instruction_pointer.h
as well, because it has been isolated to trace_printk.h early in the
series.
Signed-off-by: Yury Norov (NVIDIA) <yury.norov@gmail.com>
---
arch/powerpc/kvm/book3s_xics.c | 1 +
arch/powerpc/xmon/xmon.c | 1 +
arch/s390/kernel/ipl.c | 1 +
arch/s390/kernel/machine_kexec.c | 1 +
drivers/gpu/drm/i915/gt/intel_gtt.h | 1 +
drivers/gpu/drm/i915/i915_gem.h | 1 +
drivers/hwtracing/stm/dummy_stm.c | 1 +
drivers/infiniband/hw/hfi1/trace_dbg.h | 1 +
drivers/tty/sysrq.c | 1 +
drivers/usb/early/xhci-dbc.c | 1 +
fs/ext4/inline.c | 1 +
include/linux/kernel.h | 1 -
include/linux/sunrpc/debug.h | 1 +
kernel/debug/debug_core.c | 1 +
kernel/panic.c | 1 +
kernel/rcu/rcu.h | 1 +
kernel/rcu/rcutorture.c | 1 +
kernel/trace/error_report-traces.c | 1 +
kernel/trace/ring_buffer_benchmark.c | 1 +
kernel/trace/trace.c | 1 +
kernel/trace/trace_benchmark.c | 1 +
kernel/trace/trace_events_trigger.c | 1 +
kernel/trace/trace_functions.c | 1 +
kernel/trace/trace_printk.c | 1 +
kernel/trace/trace_selftest.c | 1 +
lib/sys_info.c | 1 +
samples/fprobe/fprobe_example.c | 1 +
samples/ftrace/ftrace-direct-modify.c | 1 +
samples/ftrace/ftrace-direct-multi-modify.c | 1 +
samples/ftrace/ftrace-direct-multi.c | 1 +
samples/ftrace/ftrace-direct-too.c | 1 +
samples/ftrace/ftrace-direct.c | 1 +
samples/trace_printk/trace-printk.c | 1 +
sound/hda/common/sysfs.c | 1 +
34 files changed, 33 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/kvm/book3s_xics.c b/arch/powerpc/kvm/book3s_xics.c
index 589a8f257120..8f8cfc8648c6 100644
--- a/arch/powerpc/kvm/book3s_xics.c
+++ b/arch/powerpc/kvm/book3s_xics.c
@@ -20,6 +20,7 @@
#include <asm/time.h>
#include <linux/seq_file.h>
+#include <linux/trace_printk.h>
#include "book3s_xics.h"
diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index cb3a3244ae6f..f5cf6d807aeb 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -27,6 +27,7 @@
#include <linux/highmem.h>
#include <linux/security.h>
#include <linux/debugfs.h>
+#include <linux/trace_printk.h>
#include <asm/ptrace.h>
#include <asm/smp.h>
diff --git a/arch/s390/kernel/ipl.c b/arch/s390/kernel/ipl.c
index dcdc7e274848..55ac9c9eeb36 100644
--- a/arch/s390/kernel/ipl.c
+++ b/arch/s390/kernel/ipl.c
@@ -20,6 +20,7 @@
#include <linux/gfp.h>
#include <linux/crash_dump.h>
#include <linux/debug_locks.h>
+#include <linux/trace_printk.h>
#include <linux/vmalloc.h>
#include <asm/asm-extable.h>
#include <asm/machine.h>
diff --git a/arch/s390/kernel/machine_kexec.c b/arch/s390/kernel/machine_kexec.c
index baeb3dcfc1c8..668d8444b02b 100644
--- a/arch/s390/kernel/machine_kexec.c
+++ b/arch/s390/kernel/machine_kexec.c
@@ -14,6 +14,7 @@
#include <linux/ftrace.h>
#include <linux/debug_locks.h>
#include <linux/cpufeature.h>
+#include <linux/trace_printk.h>
#include <asm/guarded_storage.h>
#include <asm/machine.h>
#include <asm/pfault.h>
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
index 9d3a3ad567a0..3f6d78a7ccea 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
@@ -22,6 +22,7 @@
#include <linux/pagevec.h>
#include <linux/scatterlist.h>
#include <linux/workqueue.h>
+#include <linux/trace_printk.h>
#include <drm/drm_mm.h>
diff --git a/drivers/gpu/drm/i915/i915_gem.h b/drivers/gpu/drm/i915/i915_gem.h
index 20b3cb29cfff..549fdeaf4508 100644
--- a/drivers/gpu/drm/i915/i915_gem.h
+++ b/drivers/gpu/drm/i915/i915_gem.h
@@ -27,6 +27,7 @@
#include <linux/bug.h>
#include <linux/types.h>
+#include <linux/trace_printk.h>
#include <drm/drm_drv.h>
diff --git a/drivers/hwtracing/stm/dummy_stm.c b/drivers/hwtracing/stm/dummy_stm.c
index 38528ffdc0b3..8464401756f3 100644
--- a/drivers/hwtracing/stm/dummy_stm.c
+++ b/drivers/hwtracing/stm/dummy_stm.c
@@ -12,6 +12,7 @@
#include <linux/module.h>
#include <linux/slab.h>
#include <linux/stm.h>
+#include <linux/trace_printk.h>
#include <uapi/linux/stm.h>
static ssize_t notrace
diff --git a/drivers/infiniband/hw/hfi1/trace_dbg.h b/drivers/infiniband/hw/hfi1/trace_dbg.h
index 58304b91380f..d7c08190d816 100644
--- a/drivers/infiniband/hw/hfi1/trace_dbg.h
+++ b/drivers/infiniband/hw/hfi1/trace_dbg.h
@@ -8,6 +8,7 @@
#include <linux/tracepoint.h>
#include <linux/trace_seq.h>
+#include <linux/trace_printk.h>
#include "hfi.h"
diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c
index 1f78b0db3b25..72b2555c2bb8 100644
--- a/drivers/tty/sysrq.c
+++ b/drivers/tty/sysrq.c
@@ -51,6 +51,7 @@
#include <linux/syscalls.h>
#include <linux/of.h>
#include <linux/rcupdate.h>
+#include <linux/trace_printk.h>
#include <asm/ptrace.h>
#include <asm/irq_regs.h>
diff --git a/drivers/usb/early/xhci-dbc.c b/drivers/usb/early/xhci-dbc.c
index 41118bba9197..dce1e2a3e180 100644
--- a/drivers/usb/early/xhci-dbc.c
+++ b/drivers/usb/early/xhci-dbc.c
@@ -22,6 +22,7 @@
#include <linux/delay.h>
#include <linux/kthread.h>
#include <linux/usb/xhci-dbgp.h>
+#include <linux/trace_printk.h>
#include "../host/xhci.h"
#include "xhci-dbc.h"
diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c
index 1f6bc05593df..d15faa78eb07 100644
--- a/fs/ext4/inline.c
+++ b/fs/ext4/inline.c
@@ -9,6 +9,7 @@
#include <linux/namei.h>
#include <linux/iversion.h>
#include <linux/sched/mm.h>
+#include <linux/trace_printk.h>
#include "ext4_jbd2.h"
#include "ext4.h"
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index a377335e01da..c48f7109bb2a 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -32,7 +32,6 @@
#include <linux/build_bug.h>
#include <linux/sprintf.h>
#include <linux/static_call_types.h>
-#include <linux/trace_printk.h>
#include <linux/util_macros.h>
#include <linux/wordpart.h>
diff --git a/include/linux/sunrpc/debug.h b/include/linux/sunrpc/debug.h
index 891f6173c951..db2b572505f5 100644
--- a/include/linux/sunrpc/debug.h
+++ b/include/linux/sunrpc/debug.h
@@ -9,6 +9,7 @@
#ifndef _LINUX_SUNRPC_DEBUG_H_
#define _LINUX_SUNRPC_DEBUG_H_
+#include <linux/trace_printk.h>
#include <uapi/linux/sunrpc/debug.h>
/*
diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
index 0b9495187fba..e9209afc78aa 100644
--- a/kernel/debug/debug_core.c
+++ b/kernel/debug/debug_core.c
@@ -53,6 +53,7 @@
#include <linux/rcupdate.h>
#include <linux/irq.h>
#include <linux/security.h>
+#include <linux/trace_printk.h>
#include <asm/cacheflush.h>
#include <asm/byteorder.h>
diff --git a/kernel/panic.c b/kernel/panic.c
index 0d52210a9e2b..b9e1ff90c637 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -37,6 +37,7 @@
#include <linux/context_tracking.h>
#include <linux/seq_buf.h>
#include <linux/sys_info.h>
+#include <linux/trace_printk.h>
#include <trace/events/error_report.h>
#include <asm/sections.h>
diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
index 9cf01832a6c3..1c8f5765ba8b 100644
--- a/kernel/rcu/rcu.h
+++ b/kernel/rcu/rcu.h
@@ -12,6 +12,7 @@
#include <linux/slab.h>
#include <trace/events/rcu.h>
+#include <linux/trace_printk.h>
/*
* Grace-period counter management.
diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
index 07e51974b06b..c2f859c20ca7 100644
--- a/kernel/rcu/rcutorture.c
+++ b/kernel/rcu/rcutorture.c
@@ -48,6 +48,7 @@
#include <linux/tick.h>
#include <linux/rcupdate_trace.h>
#include <linux/nmi.h>
+#include <linux/trace_printk.h>
#include "rcu.h"
diff --git a/kernel/trace/error_report-traces.c b/kernel/trace/error_report-traces.c
index f89792c25b11..6a3c59f39ea2 100644
--- a/kernel/trace/error_report-traces.c
+++ b/kernel/trace/error_report-traces.c
@@ -7,5 +7,6 @@
#define CREATE_TRACE_POINTS
#include <trace/events/error_report.h>
+#include <linux/trace_printk.h>
EXPORT_TRACEPOINT_SYMBOL_GPL(error_report_end);
diff --git a/kernel/trace/ring_buffer_benchmark.c b/kernel/trace/ring_buffer_benchmark.c
index 593e3b59e42e..b977ee0879c1 100644
--- a/kernel/trace/ring_buffer_benchmark.c
+++ b/kernel/trace/ring_buffer_benchmark.c
@@ -10,6 +10,7 @@
#include <uapi/linux/sched/types.h>
#include <linux/module.h>
#include <linux/ktime.h>
+#include <linux/trace_printk.h>
#include <asm/local.h>
struct rb_page {
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 57f24e2cd19c..0684cc6b17c5 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -53,6 +53,7 @@
#include <linux/sort.h>
#include <linux/io.h> /* vmap_page_range() */
#include <linux/fs_context.h>
+#include <linux/trace_printk.h>
#include <asm/setup.h> /* COMMAND_LINE_SIZE */
diff --git a/kernel/trace/trace_benchmark.c b/kernel/trace/trace_benchmark.c
index e19c32f2a938..740b49c493db 100644
--- a/kernel/trace/trace_benchmark.c
+++ b/kernel/trace/trace_benchmark.c
@@ -3,6 +3,7 @@
#include <linux/module.h>
#include <linux/kthread.h>
#include <linux/trace_clock.h>
+#include <linux/trace_printk.h>
#define CREATE_TRACE_POINTS
#include "trace_benchmark.h"
diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c
index 06b75bcfc7b8..1c1420a4c429 100644
--- a/kernel/trace/trace_events_trigger.c
+++ b/kernel/trace/trace_events_trigger.c
@@ -12,6 +12,7 @@
#include <linux/mutex.h>
#include <linux/slab.h>
#include <linux/rculist.h>
+#include <linux/trace_printk.h>
#include "trace.h"
diff --git a/kernel/trace/trace_functions.c b/kernel/trace/trace_functions.c
index c12795c2fb39..ec725f8b2343 100644
--- a/kernel/trace/trace_functions.c
+++ b/kernel/trace/trace_functions.c
@@ -16,6 +16,7 @@
#include <linux/ftrace.h>
#include <linux/slab.h>
#include <linux/fs.h>
+#include <linux/trace_printk.h>
#include "trace.h"
diff --git a/kernel/trace/trace_printk.c b/kernel/trace/trace_printk.c
index 29f6e95439b6..e49609c97496 100644
--- a/kernel/trace/trace_printk.c
+++ b/kernel/trace/trace_printk.c
@@ -16,6 +16,7 @@
#include <linux/ctype.h>
#include <linux/list.h>
#include <linux/slab.h>
+#include <linux/trace_printk.h>
#include "trace.h"
diff --git a/kernel/trace/trace_selftest.c b/kernel/trace/trace_selftest.c
index d88c44f1dfa5..b6aa5c92f079 100644
--- a/kernel/trace/trace_selftest.c
+++ b/kernel/trace/trace_selftest.c
@@ -6,6 +6,7 @@
#include <linux/kthread.h>
#include <linux/delay.h>
#include <linux/slab.h>
+#include <linux/trace_printk.h>
static inline int trace_valid_entry(struct trace_entry *entry)
{
diff --git a/lib/sys_info.c b/lib/sys_info.c
index f32a06ec9ed4..7ded4e7f3671 100644
--- a/lib/sys_info.c
+++ b/lib/sys_info.c
@@ -10,6 +10,7 @@
#include <linux/sched/debug.h>
#include <linux/string.h>
#include <linux/sysctl.h>
+#include <linux/trace_printk.h>
#include <linux/sys_info.h>
diff --git a/samples/fprobe/fprobe_example.c b/samples/fprobe/fprobe_example.c
index bfe98ce826f3..dfebb1cefb2c 100644
--- a/samples/fprobe/fprobe_example.c
+++ b/samples/fprobe/fprobe_example.c
@@ -17,6 +17,7 @@
#include <linux/fprobe.h>
#include <linux/sched/debug.h>
#include <linux/slab.h>
+#include <linux/trace_printk.h>
#define BACKTRACE_DEPTH 16
#define MAX_SYMBOL_LEN 4096
diff --git a/samples/ftrace/ftrace-direct-modify.c b/samples/ftrace/ftrace-direct-modify.c
index da3a9f2091f5..cb6989f52167 100644
--- a/samples/ftrace/ftrace-direct-modify.c
+++ b/samples/ftrace/ftrace-direct-modify.c
@@ -2,6 +2,7 @@
#include <linux/module.h>
#include <linux/kthread.h>
#include <linux/ftrace.h>
+#include <linux/trace_printk.h>
#if !defined(CONFIG_ARM64) && !defined(CONFIG_PPC32)
#include <asm/asm-offsets.h>
#endif
diff --git a/samples/ftrace/ftrace-direct-multi-modify.c b/samples/ftrace/ftrace-direct-multi-modify.c
index 8f7986d698d8..1b24d53c34c2 100644
--- a/samples/ftrace/ftrace-direct-multi-modify.c
+++ b/samples/ftrace/ftrace-direct-multi-modify.c
@@ -2,6 +2,7 @@
#include <linux/module.h>
#include <linux/kthread.h>
#include <linux/ftrace.h>
+#include <linux/trace_printk.h>
#if !defined(CONFIG_ARM64) && !defined(CONFIG_PPC32)
#include <asm/asm-offsets.h>
#endif
diff --git a/samples/ftrace/ftrace-direct-multi.c b/samples/ftrace/ftrace-direct-multi.c
index db326c81a27d..3c94ecdaf3d5 100644
--- a/samples/ftrace/ftrace-direct-multi.c
+++ b/samples/ftrace/ftrace-direct-multi.c
@@ -4,6 +4,7 @@
#include <linux/mm.h> /* for handle_mm_fault() */
#include <linux/ftrace.h>
#include <linux/sched/stat.h>
+#include <linux/trace_printk.h>
#if !defined(CONFIG_ARM64) && !defined(CONFIG_PPC32)
#include <asm/asm-offsets.h>
#endif
diff --git a/samples/ftrace/ftrace-direct-too.c b/samples/ftrace/ftrace-direct-too.c
index 3d0fa260332d..e4c26db202ce 100644
--- a/samples/ftrace/ftrace-direct-too.c
+++ b/samples/ftrace/ftrace-direct-too.c
@@ -3,6 +3,7 @@
#include <linux/mm.h> /* for handle_mm_fault() */
#include <linux/ftrace.h>
+#include <linux/trace_printk.h>
#if !defined(CONFIG_ARM64) && !defined(CONFIG_PPC32)
#include <asm/asm-offsets.h>
#endif
diff --git a/samples/ftrace/ftrace-direct.c b/samples/ftrace/ftrace-direct.c
index 956834b0d19a..01f3512aec50 100644
--- a/samples/ftrace/ftrace-direct.c
+++ b/samples/ftrace/ftrace-direct.c
@@ -3,6 +3,7 @@
#include <linux/sched.h> /* for wake_up_process() */
#include <linux/ftrace.h>
+#include <linux/trace_printk.h>
#if !defined(CONFIG_ARM64) && !defined(CONFIG_PPC32)
#include <asm/asm-offsets.h>
#endif
diff --git a/samples/trace_printk/trace-printk.c b/samples/trace_printk/trace-printk.c
index cfc159580263..4fc58844aff1 100644
--- a/samples/trace_printk/trace-printk.c
+++ b/samples/trace_printk/trace-printk.c
@@ -2,6 +2,7 @@
#include <linux/module.h>
#include <linux/kthread.h>
#include <linux/irq_work.h>
+#include <linux/trace_printk.h>
/* Must not be static to force gcc to consider these non constant */
char *trace_printk_test_global_str =
diff --git a/sound/hda/common/sysfs.c b/sound/hda/common/sysfs.c
index f8c8483fd5e5..ac382f7063dc 100644
--- a/sound/hda/common/sysfs.c
+++ b/sound/hda/common/sysfs.c
@@ -19,6 +19,7 @@
#include "hda_local.h"
#include <sound/hda_hwdep.h>
#include <sound/minors.h>
+#include <linux/trace_printk.h>
/* hint string pair */
struct hda_hint {
--
2.43.0
^ permalink raw reply related
* [PATCH v4 6/7] tracing: move tracing declarations from kernel.h to a dedicated header
From: Yury Norov (NVIDIA) @ 2025-12-25 17:09 UTC (permalink / raw)
To: Steven Rostedt, Andrew Morton, Masami Hiramatsu,
Mathieu Desnoyers, Andy Shevchenko, Christophe Leroy,
Randy Dunlap, Ingo Molnar, Jani Nikula, Joonas Lahtinen,
David Laight, Petr Pavlu, Andi Shyti, Rodrigo Vivi,
Tvrtko Ursulin, Daniel Gomez, Greg Kroah-Hartman,
Rafael J. Wysocki, Danilo Krummrich, linux-kernel, intel-gfx,
dri-devel, linux-modules, linux-trace-kernel
Cc: Yury Norov (NVIDIA)
In-Reply-To: <20251225170930.1151781-1-yury.norov@gmail.com>
The tracing declarations make up about half of kernel.h in terms of
LOC, even though they form a self-contained part. They are intended
for quick debugging and aren't used by the normal tracing utilities.
Move them to a separate header. If someone just needs to throw a
trace_printk() into their driver, they will not have to pull in all
the heavy tracing machinery.
This is a pure move.
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Yury Norov (NVIDIA) <yury.norov@gmail.com>
---
include/linux/kernel.h | 196 +--------------------------------
include/linux/trace_printk.h | 204 +++++++++++++++++++++++++++++++++++
2 files changed, 205 insertions(+), 195 deletions(-)
create mode 100644 include/linux/trace_printk.h
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 4ee48fb10dec..a377335e01da 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -32,7 +32,7 @@
#include <linux/build_bug.h>
#include <linux/sprintf.h>
#include <linux/static_call_types.h>
-#include <linux/instruction_pointer.h>
+#include <linux/trace_printk.h>
#include <linux/util_macros.h>
#include <linux/wordpart.h>
@@ -190,200 +190,6 @@ enum system_states {
};
extern enum system_states system_state;
-/*
- * General tracing related utility functions - trace_printk(),
- * tracing_on/tracing_off and tracing_start()/tracing_stop
- *
- * Use tracing_on/tracing_off when you want to quickly turn on or off
- * tracing. It simply enables or disables the recording of the trace events.
- * This also corresponds to the user space /sys/kernel/tracing/tracing_on
- * file, which gives a means for the kernel and userspace to interact.
- * Place a tracing_off() in the kernel where you want tracing to end.
- * From user space, examine the trace, and then echo 1 > tracing_on
- * to continue tracing.
- *
- * tracing_stop/tracing_start has slightly more overhead. It is used
- * by things like suspend to ram where disabling the recording of the
- * trace is not enough, but tracing must actually stop because things
- * like calling smp_processor_id() may crash the system.
- *
- * Most likely, you want to use tracing_on/tracing_off.
- */
-
-enum ftrace_dump_mode {
- DUMP_NONE,
- DUMP_ALL,
- DUMP_ORIG,
- DUMP_PARAM,
-};
-
-#ifdef CONFIG_TRACING
-void tracing_on(void);
-void tracing_off(void);
-int tracing_is_on(void);
-void tracing_snapshot(void);
-void tracing_snapshot_alloc(void);
-
-extern void tracing_start(void);
-extern void tracing_stop(void);
-
-static inline __printf(1, 2)
-void ____trace_printk_check_format(const char *fmt, ...)
-{
-}
-#define __trace_printk_check_format(fmt, args...) \
-do { \
- if (0) \
- ____trace_printk_check_format(fmt, ##args); \
-} while (0)
-
-/**
- * trace_printk - printf formatting in the ftrace buffer
- * @fmt: the printf format for printing
- *
- * Note: __trace_printk is an internal function for trace_printk() and
- * the @ip is passed in via the trace_printk() macro.
- *
- * This function allows a kernel developer to debug fast path sections
- * that printk is not appropriate for. By scattering in various
- * printk like tracing in the code, a developer can quickly see
- * where problems are occurring.
- *
- * This is intended as a debugging tool for the developer only.
- * Please refrain from leaving trace_printks scattered around in
- * your code. (Extra memory is used for special buffers that are
- * allocated when trace_printk() is used.)
- *
- * A little optimization trick is done here. If there's only one
- * argument, there's no need to scan the string for printf formats.
- * The trace_puts() will suffice. But how can we take advantage of
- * using trace_puts() when trace_printk() has only one argument?
- * By stringifying the args and checking the size we can tell
- * whether or not there are args. __stringify((__VA_ARGS__)) will
- * turn into "()\0" with a size of 3 when there are no args, anything
- * else will be bigger. All we need to do is define a string to this,
- * and then take its size and compare to 3. If it's bigger, use
- * do_trace_printk() otherwise, optimize it to trace_puts(). Then just
- * let gcc optimize the rest.
- */
-
-#define trace_printk(fmt, ...) \
-do { \
- char _______STR[] = __stringify((__VA_ARGS__)); \
- if (sizeof(_______STR) > 3) \
- do_trace_printk(fmt, ##__VA_ARGS__); \
- else \
- trace_puts(fmt); \
-} while (0)
-
-#define do_trace_printk(fmt, args...) \
-do { \
- static const char *trace_printk_fmt __used \
- __section("__trace_printk_fmt") = \
- __builtin_constant_p(fmt) ? fmt : NULL; \
- \
- __trace_printk_check_format(fmt, ##args); \
- \
- if (__builtin_constant_p(fmt)) \
- __trace_bprintk(_THIS_IP_, trace_printk_fmt, ##args); \
- else \
- __trace_printk(_THIS_IP_, fmt, ##args); \
-} while (0)
-
-extern __printf(2, 3)
-int __trace_bprintk(unsigned long ip, const char *fmt, ...);
-
-extern __printf(2, 3)
-int __trace_printk(unsigned long ip, const char *fmt, ...);
-
-/**
- * trace_puts - write a string into the ftrace buffer
- * @str: the string to record
- *
- * Note: __trace_bputs is an internal function for trace_puts and
- * the @ip is passed in via the trace_puts macro.
- *
- * This is similar to trace_printk() but is made for those really fast
- * paths that a developer wants the least amount of "Heisenbug" effects,
- * where the processing of the print format is still too much.
- *
- * This function allows a kernel developer to debug fast path sections
- * that printk is not appropriate for. By scattering in various
- * printk like tracing in the code, a developer can quickly see
- * where problems are occurring.
- *
- * This is intended as a debugging tool for the developer only.
- * Please refrain from leaving trace_puts scattered around in
- * your code. (Extra memory is used for special buffers that are
- * allocated when trace_puts() is used.)
- *
- * Returns: 0 if nothing was written, positive # if string was.
- * (1 when __trace_bputs is used, strlen(str) when __trace_puts is used)
- */
-
-#define trace_puts(str) ({ \
- static const char *trace_printk_fmt __used \
- __section("__trace_printk_fmt") = \
- __builtin_constant_p(str) ? str : NULL; \
- \
- if (__builtin_constant_p(str)) \
- __trace_bputs(_THIS_IP_, trace_printk_fmt); \
- else \
- __trace_puts(_THIS_IP_, str); \
-})
-extern int __trace_bputs(unsigned long ip, const char *str);
-extern int __trace_puts(unsigned long ip, const char *str);
-
-extern void trace_dump_stack(int skip);
-
-/*
- * The double __builtin_constant_p is because gcc will give us an error
- * if we try to allocate the static variable to fmt if it is not a
- * constant. Even with the outer if statement.
- */
-#define ftrace_vprintk(fmt, vargs) \
-do { \
- if (__builtin_constant_p(fmt)) { \
- static const char *trace_printk_fmt __used \
- __section("__trace_printk_fmt") = \
- __builtin_constant_p(fmt) ? fmt : NULL; \
- \
- __ftrace_vbprintk(_THIS_IP_, trace_printk_fmt, vargs); \
- } else \
- __ftrace_vprintk(_THIS_IP_, fmt, vargs); \
-} while (0)
-
-extern __printf(2, 0) int
-__ftrace_vbprintk(unsigned long ip, const char *fmt, va_list ap);
-
-extern __printf(2, 0) int
-__ftrace_vprintk(unsigned long ip, const char *fmt, va_list ap);
-
-extern void ftrace_dump(enum ftrace_dump_mode oops_dump_mode);
-#else
-static inline void tracing_start(void) { }
-static inline void tracing_stop(void) { }
-static inline void trace_dump_stack(int skip) { }
-
-static inline void tracing_on(void) { }
-static inline void tracing_off(void) { }
-static inline int tracing_is_on(void) { return 0; }
-static inline void tracing_snapshot(void) { }
-static inline void tracing_snapshot_alloc(void) { }
-
-static inline __printf(1, 2)
-int trace_printk(const char *fmt, ...)
-{
- return 0;
-}
-static __printf(1, 0) inline int
-ftrace_vprintk(const char *fmt, va_list ap)
-{
- return 0;
-}
-static inline void ftrace_dump(enum ftrace_dump_mode oops_dump_mode) { }
-#endif /* CONFIG_TRACING */
-
/* Rebuild everything on CONFIG_DYNAMIC_FTRACE */
#ifdef CONFIG_DYNAMIC_FTRACE
# define REBUILD_DUE_TO_DYNAMIC_FTRACE
diff --git a/include/linux/trace_printk.h b/include/linux/trace_printk.h
new file mode 100644
index 000000000000..bb5874097f24
--- /dev/null
+++ b/include/linux/trace_printk.h
@@ -0,0 +1,204 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_TRACE_PRINTK_H
+#define _LINUX_TRACE_PRINTK_H
+
+#include <linux/compiler_attributes.h>
+#include <linux/instruction_pointer.h>
+#include <linux/stddef.h>
+#include <linux/stringify.h>
+
+/*
+ * General tracing related utility functions - trace_printk(),
+ * tracing_on/tracing_off and tracing_start()/tracing_stop
+ *
+ * Use tracing_on/tracing_off when you want to quickly turn on or off
+ * tracing. It simply enables or disables the recording of the trace events.
+ * This also corresponds to the user space /sys/kernel/tracing/tracing_on
+ * file, which gives a means for the kernel and userspace to interact.
+ * Place a tracing_off() in the kernel where you want tracing to end.
+ * From user space, examine the trace, and then echo 1 > tracing_on
+ * to continue tracing.
+ *
+ * tracing_stop/tracing_start has slightly more overhead. It is used
+ * by things like suspend to ram where disabling the recording of the
+ * trace is not enough, but tracing must actually stop because things
+ * like calling smp_processor_id() may crash the system.
+ *
+ * Most likely, you want to use tracing_on/tracing_off.
+ */
+
+enum ftrace_dump_mode {
+ DUMP_NONE,
+ DUMP_ALL,
+ DUMP_ORIG,
+ DUMP_PARAM,
+};
+
+#ifdef CONFIG_TRACING
+void tracing_on(void);
+void tracing_off(void);
+int tracing_is_on(void);
+void tracing_snapshot(void);
+void tracing_snapshot_alloc(void);
+
+extern void tracing_start(void);
+extern void tracing_stop(void);
+
+static inline __printf(1, 2)
+void ____trace_printk_check_format(const char *fmt, ...)
+{
+}
+#define __trace_printk_check_format(fmt, args...) \
+do { \
+ if (0) \
+ ____trace_printk_check_format(fmt, ##args); \
+} while (0)
+
+/**
+ * trace_printk - printf formatting in the ftrace buffer
+ * @fmt: the printf format for printing
+ *
+ * Note: __trace_printk is an internal function for trace_printk() and
+ * the @ip is passed in via the trace_printk() macro.
+ *
+ * This function allows a kernel developer to debug fast path sections
+ * that printk is not appropriate for. By scattering in various
+ * printk like tracing in the code, a developer can quickly see
+ * where problems are occurring.
+ *
+ * This is intended as a debugging tool for the developer only.
+ * Please refrain from leaving trace_printks scattered around in
+ * your code. (Extra memory is used for special buffers that are
+ * allocated when trace_printk() is used.)
+ *
+ * A little optimization trick is done here. If there's only one
+ * argument, there's no need to scan the string for printf formats.
+ * The trace_puts() will suffice. But how can we take advantage of
+ * using trace_puts() when trace_printk() has only one argument?
+ * By stringifying the args and checking the size we can tell
+ * whether or not there are args. __stringify((__VA_ARGS__)) will
+ * turn into "()\0" with a size of 3 when there are no args, anything
+ * else will be bigger. All we need to do is define a string to this,
+ * and then take its size and compare to 3. If it's bigger, use
+ * do_trace_printk() otherwise, optimize it to trace_puts(). Then just
+ * let gcc optimize the rest.
+ */
+
+#define trace_printk(fmt, ...) \
+do { \
+ char _______STR[] = __stringify((__VA_ARGS__)); \
+ if (sizeof(_______STR) > 3) \
+ do_trace_printk(fmt, ##__VA_ARGS__); \
+ else \
+ trace_puts(fmt); \
+} while (0)
+
+#define do_trace_printk(fmt, args...) \
+do { \
+ static const char *trace_printk_fmt __used \
+ __section("__trace_printk_fmt") = \
+ __builtin_constant_p(fmt) ? fmt : NULL; \
+ \
+ __trace_printk_check_format(fmt, ##args); \
+ \
+ if (__builtin_constant_p(fmt)) \
+ __trace_bprintk(_THIS_IP_, trace_printk_fmt, ##args); \
+ else \
+ __trace_printk(_THIS_IP_, fmt, ##args); \
+} while (0)
+
+extern __printf(2, 3)
+int __trace_bprintk(unsigned long ip, const char *fmt, ...);
+
+extern __printf(2, 3)
+int __trace_printk(unsigned long ip, const char *fmt, ...);
+
+/**
+ * trace_puts - write a string into the ftrace buffer
+ * @str: the string to record
+ *
+ * Note: __trace_bputs is an internal function for trace_puts and
+ * the @ip is passed in via the trace_puts macro.
+ *
+ * This is similar to trace_printk() but is made for those really fast
+ * paths that a developer wants the least amount of "Heisenbug" effects,
+ * where the processing of the print format is still too much.
+ *
+ * This function allows a kernel developer to debug fast path sections
+ * that printk is not appropriate for. By scattering in various
+ * printk like tracing in the code, a developer can quickly see
+ * where problems are occurring.
+ *
+ * This is intended as a debugging tool for the developer only.
+ * Please refrain from leaving trace_puts scattered around in
+ * your code. (Extra memory is used for special buffers that are
+ * allocated when trace_puts() is used.)
+ *
+ * Returns: 0 if nothing was written, positive # if string was.
+ * (1 when __trace_bputs is used, strlen(str) when __trace_puts is used)
+ */
+
+#define trace_puts(str) ({ \
+ static const char *trace_printk_fmt __used \
+ __section("__trace_printk_fmt") = \
+ __builtin_constant_p(str) ? str : NULL; \
+ \
+ if (__builtin_constant_p(str)) \
+ __trace_bputs(_THIS_IP_, trace_printk_fmt); \
+ else \
+ __trace_puts(_THIS_IP_, str); \
+})
+extern int __trace_bputs(unsigned long ip, const char *str);
+extern int __trace_puts(unsigned long ip, const char *str);
+
+extern void trace_dump_stack(int skip);
+
+/*
+ * The double __builtin_constant_p is because gcc will give us an error
+ * if we try to allocate the static variable to fmt if it is not a
+ * constant. Even with the outer if statement.
+ */
+#define ftrace_vprintk(fmt, vargs) \
+do { \
+ if (__builtin_constant_p(fmt)) { \
+ static const char *trace_printk_fmt __used \
+ __section("__trace_printk_fmt") = \
+ __builtin_constant_p(fmt) ? fmt : NULL; \
+ \
+ __ftrace_vbprintk(_THIS_IP_, trace_printk_fmt, vargs); \
+ } else \
+ __ftrace_vprintk(_THIS_IP_, fmt, vargs); \
+} while (0)
+
+extern __printf(2, 0) int
+__ftrace_vbprintk(unsigned long ip, const char *fmt, va_list ap);
+
+extern __printf(2, 0) int
+__ftrace_vprintk(unsigned long ip, const char *fmt, va_list ap);
+
+extern void ftrace_dump(enum ftrace_dump_mode oops_dump_mode);
+#else
+static inline void tracing_start(void) { }
+static inline void tracing_stop(void) { }
+static inline void trace_dump_stack(int skip) { }
+
+static inline void tracing_on(void) { }
+static inline void tracing_off(void) { }
+static inline int tracing_is_on(void) { return 0; }
+static inline void tracing_snapshot(void) { }
+static inline void tracing_snapshot_alloc(void) { }
+
+static inline __printf(1, 2)
+int trace_printk(const char *fmt, ...)
+{
+ return 0;
+}
+static __printf(1, 0) inline int
+ftrace_vprintk(const char *fmt, va_list ap)
+{
+ return 0;
+}
+static inline void ftrace_dump(enum ftrace_dump_mode oops_dump_mode) { }
+#endif /* CONFIG_TRACING */
+
+#endif
--
2.43.0
^ permalink raw reply related
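The kernel-doc comment for trace_printk() in the moved header describes a
sizeof/stringify trick for detecting whether any arguments follow the format
string. A minimal userspace sketch of that idea is below. It is not the kernel
implementation: the DEMO_* names and the demo_* functions are made up for
illustration, and it assumes a GCC- or Clang-compatible compiler that accepts
a variadic macro invoked with no variadic arguments.

```c
#include <assert.h>

/* Hypothetical userspace stand-ins for the kernel's __stringify() helpers. */
#define DEMO_STRINGIFY_1(...) #__VA_ARGS__
#define DEMO_STRINGIFY(...) DEMO_STRINGIFY_1(__VA_ARGS__)

/* Which path the real macro would take; names are illustrative only. */
enum demo_path { DEMO_PATH_PUTS, DEMO_PATH_PRINTK };

/*
 * "(" args ")" is stringified. With no args the result is "()", whose
 * sizeof is 3 (two characters plus the terminating NUL). Anything
 * larger means at least one argument followed the format string.
 */
#define DEMO_PATH(fmt, ...) \
	(sizeof(DEMO_STRINGIFY((__VA_ARGS__))) > 3 ? \
	 DEMO_PATH_PRINTK : DEMO_PATH_PUTS)

/* Format string only: the cheaper puts-style path is selected. */
enum demo_path demo_path_no_args(void)
{
	return DEMO_PATH("hello\n");
}

/* Format string plus an argument: the printk-style path is selected. */
enum demo_path demo_path_with_args(void)
{
	return DEMO_PATH("x=%d\n", 42);
}

/* Sanity check of the size the whole trick depends on. */
void demo_check_empty_size(void)
{
	assert(sizeof(DEMO_STRINGIFY(())) == 3);
}
```

Because the comparison uses only sizeof on string literals, the compiler can
resolve the branch at build time, which is what the comment means by letting
gcc optimize the rest.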
* [PATCH v4 4/7] kernel.h: include linux/instruction_pointer.h explicitly
From: Yury Norov (NVIDIA) @ 2025-12-25 17:09 UTC (permalink / raw)
To: Steven Rostedt, Andrew Morton, Masami Hiramatsu,
Mathieu Desnoyers, Andy Shevchenko, Christophe Leroy,
Randy Dunlap, Ingo Molnar, Jani Nikula, Joonas Lahtinen,
David Laight, Petr Pavlu, Andi Shyti, Rodrigo Vivi,
Tvrtko Ursulin, Daniel Gomez, Greg Kroah-Hartman,
Rafael J. Wysocki, Danilo Krummrich, linux-kernel, intel-gfx,
dri-devel, linux-modules, linux-trace-kernel
Cc: Yury Norov (NVIDIA)
In-Reply-To: <20251225170930.1151781-1-yury.norov@gmail.com>
In preparation for decoupling linux/instruction_pointer.h and
linux/kernel.h, include instruction_pointer.h explicitly where needed.
Signed-off-by: Yury Norov (NVIDIA) <yury.norov@gmail.com>
---
arch/s390/include/asm/processor.h | 1 +
include/linux/ww_mutex.h | 1 +
2 files changed, 2 insertions(+)
diff --git a/arch/s390/include/asm/processor.h b/arch/s390/include/asm/processor.h
index 3affba95845b..cc187afa07b3 100644
--- a/arch/s390/include/asm/processor.h
+++ b/arch/s390/include/asm/processor.h
@@ -31,6 +31,7 @@
#include <linux/cpumask.h>
#include <linux/linkage.h>
#include <linux/irqflags.h>
+#include <linux/instruction_pointer.h>
#include <linux/bitops.h>
#include <asm/fpu-types.h>
#include <asm/cpu.h>
diff --git a/include/linux/ww_mutex.h b/include/linux/ww_mutex.h
index 45ff6f7a872b..9b30fa2ec508 100644
--- a/include/linux/ww_mutex.h
+++ b/include/linux/ww_mutex.h
@@ -17,6 +17,7 @@
#ifndef __LINUX_WW_MUTEX_H
#define __LINUX_WW_MUTEX_H
+#include <linux/instruction_pointer.h>
#include <linux/mutex.h>
#include <linux/rtmutex.h>
--
2.43.0
^ permalink raw reply related
* [PATCH v4 3/7] kernel.h: move VERIFY_OCTAL_PERMISSIONS() to sysfs.h
From: Yury Norov (NVIDIA) @ 2025-12-25 17:09 UTC (permalink / raw)
To: Steven Rostedt, Andrew Morton, Masami Hiramatsu,
Mathieu Desnoyers, Andy Shevchenko, Christophe Leroy,
Randy Dunlap, Ingo Molnar, Jani Nikula, Joonas Lahtinen,
David Laight, Petr Pavlu, Andi Shyti, Rodrigo Vivi,
Tvrtko Ursulin, Daniel Gomez, Greg Kroah-Hartman,
Rafael J. Wysocki, Danilo Krummrich, linux-kernel, intel-gfx,
dri-devel, linux-modules, linux-trace-kernel
Cc: Yury Norov (NVIDIA)
In-Reply-To: <20251225170930.1151781-1-yury.norov@gmail.com>
The macro is related to sysfs, but is defined in kernel.h. Move it to
the proper header, and slim down the generic kernel.h.
Now that the macro is removed from kernel.h, linux/moduleparam.h no
longer depends on it, and its kernel.h inclusion can be removed.
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Petr Pavlu <petr.pavlu@suse.com>
Signed-off-by: Yury Norov (NVIDIA) <yury.norov@gmail.com>
---
Documentation/filesystems/sysfs.rst | 2 +-
include/linux/kernel.h | 12 ------------
include/linux/moduleparam.h | 2 +-
include/linux/sysfs.h | 13 +++++++++++++
4 files changed, 15 insertions(+), 14 deletions(-)
diff --git a/Documentation/filesystems/sysfs.rst b/Documentation/filesystems/sysfs.rst
index 2703c04af7d0..ffcef4d6bc8d 100644
--- a/Documentation/filesystems/sysfs.rst
+++ b/Documentation/filesystems/sysfs.rst
@@ -120,7 +120,7 @@ is equivalent to doing::
.store = store_foo,
};
-Note as stated in include/linux/kernel.h "OTHER_WRITABLE? Generally
+Note as stated in include/linux/sysfs.h "OTHER_WRITABLE? Generally
considered a bad idea." so trying to set a sysfs file writable for
everyone will fail reverting to RO mode for "Others".
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 61d63c57bc2d..5b879bfea948 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -389,16 +389,4 @@ static inline void ftrace_dump(enum ftrace_dump_mode oops_dump_mode) { }
# define REBUILD_DUE_TO_DYNAMIC_FTRACE
#endif
-/* Permissions on a sysfs file: you didn't miss the 0 prefix did you? */
-#define VERIFY_OCTAL_PERMISSIONS(perms) \
- (BUILD_BUG_ON_ZERO((perms) < 0) + \
- BUILD_BUG_ON_ZERO((perms) > 0777) + \
- /* USER_READABLE >= GROUP_READABLE >= OTHER_READABLE */ \
- BUILD_BUG_ON_ZERO((((perms) >> 6) & 4) < (((perms) >> 3) & 4)) + \
- BUILD_BUG_ON_ZERO((((perms) >> 3) & 4) < ((perms) & 4)) + \
- /* USER_WRITABLE >= GROUP_WRITABLE */ \
- BUILD_BUG_ON_ZERO((((perms) >> 6) & 2) < (((perms) >> 3) & 2)) + \
- /* OTHER_WRITABLE? Generally considered a bad idea. */ \
- BUILD_BUG_ON_ZERO((perms) & 2) + \
- (perms))
#endif
diff --git a/include/linux/moduleparam.h b/include/linux/moduleparam.h
index 03a977168c52..281a006dc284 100644
--- a/include/linux/moduleparam.h
+++ b/include/linux/moduleparam.h
@@ -8,7 +8,7 @@
#include <linux/compiler.h>
#include <linux/init.h>
#include <linux/stringify.h>
-#include <linux/kernel.h>
+#include <linux/sysfs.h>
#include <linux/types.h>
/*
diff --git a/include/linux/sysfs.h b/include/linux/sysfs.h
index c33a96b7391a..99b775f3ff46 100644
--- a/include/linux/sysfs.h
+++ b/include/linux/sysfs.h
@@ -808,4 +808,17 @@ static inline void sysfs_put(struct kernfs_node *kn)
kernfs_put(kn);
}
+/* Permissions on a sysfs file: you didn't miss the 0 prefix did you? */
+#define VERIFY_OCTAL_PERMISSIONS(perms) \
+ (BUILD_BUG_ON_ZERO((perms) < 0) + \
+ BUILD_BUG_ON_ZERO((perms) > 0777) + \
+ /* USER_READABLE >= GROUP_READABLE >= OTHER_READABLE */ \
+ BUILD_BUG_ON_ZERO((((perms) >> 6) & 4) < (((perms) >> 3) & 4)) + \
+ BUILD_BUG_ON_ZERO((((perms) >> 3) & 4) < ((perms) & 4)) + \
+ /* USER_WRITABLE >= GROUP_WRITABLE */ \
+ BUILD_BUG_ON_ZERO((((perms) >> 6) & 2) < (((perms) >> 3) & 2)) + \
+ /* OTHER_WRITABLE? Generally considered a bad idea. */ \
+ BUILD_BUG_ON_ZERO((perms) & 2) + \
+ (perms))
+
#endif /* _SYSFS_H_ */
--
2.43.0
^ permalink raw reply related
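For readers unfamiliar with the macro moved by the patch above, its compile-time checks can be restated as a runtime predicate. This is only a userspace sketch: perms_ok() is a hypothetical helper that does not exist in the kernel, and the real VERIFY_OCTAL_PERMISSIONS() fails the build through BUILD_BUG_ON_ZERO() rather than returning a value.

```c
#include <stdbool.h>

/*
 * Hypothetical userspace helper, not a kernel API: returns true when
 * VERIFY_OCTAL_PERMISSIONS(perms) would be expected to compile.
 */
static bool perms_ok(int perms)
{
	int user = (perms >> 6) & 7;
	int group = (perms >> 3) & 7;
	int other = perms & 7;

	/* Must be a valid 9-bit mode; catches a missing 0 prefix such as 644. */
	if (perms < 0 || perms > 0777)
		return false;
	/* USER_READABLE >= GROUP_READABLE >= OTHER_READABLE */
	if ((user & 4) < (group & 4) || (group & 4) < (other & 4))
		return false;
	/* USER_WRITABLE >= GROUP_WRITABLE */
	if ((user & 2) < (group & 2))
		return false;
	/* OTHER_WRITABLE is rejected outright. */
	if (other & 2)
		return false;
	return true;
}
```

Under these rules 0644 and 0444 pass, while 0666 (other-writable) and decimal 644 (no 0 prefix, so larger than 0777) are rejected.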
* [PATCH v4 2/7] moduleparam: include required headers explicitly
From: Yury Norov (NVIDIA) @ 2025-12-25 17:09 UTC (permalink / raw)
To: Steven Rostedt, Andrew Morton, Masami Hiramatsu,
Mathieu Desnoyers, Andy Shevchenko, Christophe Leroy,
Randy Dunlap, Ingo Molnar, Jani Nikula, Joonas Lahtinen,
David Laight, Petr Pavlu, Andi Shyti, Rodrigo Vivi,
Tvrtko Ursulin, Daniel Gomez, Greg Kroah-Hartman,
Rafael J. Wysocki, Danilo Krummrich, linux-kernel, intel-gfx,
dri-devel, linux-modules, linux-trace-kernel
Cc: Yury Norov (NVIDIA)
In-Reply-To: <20251225170930.1151781-1-yury.norov@gmail.com>
The following patch drops the moduleparam.h dependency on kernel.h. In
preparation for it, list all the required headers explicitly.
Suggested-by: Petr Pavlu <petr.pavlu@suse.com>
Reviewed-by: Petr Pavlu <petr.pavlu@suse.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Yury Norov (NVIDIA) <yury.norov@gmail.com>
---
include/linux/moduleparam.h | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/include/linux/moduleparam.h b/include/linux/moduleparam.h
index 915f32f7d888..03a977168c52 100644
--- a/include/linux/moduleparam.h
+++ b/include/linux/moduleparam.h
@@ -2,9 +2,14 @@
#ifndef _LINUX_MODULE_PARAMS_H
#define _LINUX_MODULE_PARAMS_H
/* (C) Copyright 2001, 2002 Rusty Russell IBM Corporation */
+
+#include <linux/array_size.h>
+#include <linux/build_bug.h>
+#include <linux/compiler.h>
#include <linux/init.h>
#include <linux/stringify.h>
#include <linux/kernel.h>
+#include <linux/types.h>
/*
* The maximum module name length, including the NUL byte.
--
2.43.0
^ permalink raw reply related
* [PATCH v4 1/7] kernel.h: drop STACK_MAGIC macro
From: Yury Norov (NVIDIA) @ 2025-12-25 17:09 UTC (permalink / raw)
To: Steven Rostedt, Andrew Morton, Masami Hiramatsu,
Mathieu Desnoyers, Andy Shevchenko, Christophe Leroy,
Randy Dunlap, Ingo Molnar, Jani Nikula, Joonas Lahtinen,
David Laight, Petr Pavlu, Andi Shyti, Rodrigo Vivi,
Tvrtko Ursulin, Daniel Gomez, Greg Kroah-Hartman,
Rafael J. Wysocki, Danilo Krummrich, linux-kernel, intel-gfx,
dri-devel, linux-modules, linux-trace-kernel
Cc: Yury Norov (NVIDIA), Jani Nikula, Aaron Tomlin, Andi Shyti
In-Reply-To: <20251225170930.1151781-1-yury.norov@gmail.com>
The macro was introduced in 1994, v1.0.4, for stack protection. Since
then, people have found better ways to protect stacks, and now the macro
is only used by i915 selftests. Move it to a local header and drop it
from kernel.h.
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org>
Reviewed-by: Aaron Tomlin <atomlin@atomlin.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Yury Norov (NVIDIA) <yury.norov@gmail.com>
---
drivers/gpu/drm/i915/gt/selftest_ring_submission.c | 1 +
drivers/gpu/drm/i915/i915_selftest.h | 2 ++
include/linux/kernel.h | 2 --
3 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/selftest_ring_submission.c b/drivers/gpu/drm/i915/gt/selftest_ring_submission.c
index 87ceb0f374b6..600333ae6c8c 100644
--- a/drivers/gpu/drm/i915/gt/selftest_ring_submission.c
+++ b/drivers/gpu/drm/i915/gt/selftest_ring_submission.c
@@ -3,6 +3,7 @@
* Copyright © 2020 Intel Corporation
*/
+#include "i915_selftest.h"
#include "intel_engine_pm.h"
#include "selftests/igt_flush_test.h"
diff --git a/drivers/gpu/drm/i915/i915_selftest.h b/drivers/gpu/drm/i915/i915_selftest.h
index bdf3e22c0a34..72922028f4ba 100644
--- a/drivers/gpu/drm/i915/i915_selftest.h
+++ b/drivers/gpu/drm/i915/i915_selftest.h
@@ -26,6 +26,8 @@
#include <linux/types.h>
+#define STACK_MAGIC 0xdeadbeef
+
struct pci_dev;
struct drm_i915_private;
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 5b46924fdff5..61d63c57bc2d 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -40,8 +40,6 @@
#include <uapi/linux/kernel.h>
-#define STACK_MAGIC 0xdeadbeef
-
struct completion;
struct user;
--
2.43.0
^ permalink raw reply related
* [PATCH v4 0/7] Unload linux/kernel.h
From: Yury Norov (NVIDIA) @ 2025-12-25 17:09 UTC (permalink / raw)
To: Steven Rostedt, Andrew Morton, Masami Hiramatsu,
Mathieu Desnoyers, Andy Shevchenko, Christophe Leroy,
Randy Dunlap, Ingo Molnar, Jani Nikula, Joonas Lahtinen,
David Laight, Petr Pavlu, Andi Shyti, Rodrigo Vivi,
Tvrtko Ursulin, Daniel Gomez, Greg Kroah-Hartman,
Rafael J. Wysocki, Danilo Krummrich, linux-kernel, intel-gfx,
dri-devel, linux-modules, linux-trace-kernel
Cc: Yury Norov (NVIDIA)
kernel.h hosts declarations that can be placed better. This series
decouples kernel.h from some explicit and implicit dependencies; it also
moves tracing functionality to a new independent header.
My local build testing shows a ~2% performance improvement for GCC +
Ubuntu x86_64/localyesconfig.
v1: https://lore.kernel.org/all/20251129195304.204082-1-yury.norov@gmail.com/
v2: https://lore.kernel.org/all/20251203162329.280182-1-yury.norov@gmail.com/
v3: https://lore.kernel.org/all/20251205175237.242022-1-yury.norov@gmail.com/
v4:
- drop kernel.h dependency on linux/instruction_pointer.h (new patch #4);
- drop trace_printk.h dependency on string.h (new patch #5 - Steven);
- drop kernel.h dependency on trace_printk.h (new patch #7);
- explicitly tested CONFIG_FORTIFY x86_64 build with no issues.
0-DAY CI Kernel Test Service:
alpha allnoconfig gcc-15.1.0
alpha allyesconfig gcc-15.1.0
alpha defconfig gcc-15.1.0
arc allmodconfig clang-16
arc allmodconfig gcc-15.1.0
arc allnoconfig gcc-15.1.0
arc allyesconfig clang-22
arc allyesconfig gcc-15.1.0
arc defconfig gcc-15.1.0
arc randconfig-001-20251225 gcc-11.5.0
arc randconfig-002-20251225 gcc-11.5.0
arm allnoconfig clang-22
arm allnoconfig gcc-15.1.0
arm allyesconfig clang-16
arm allyesconfig gcc-15.1.0
arm defconfig gcc-15.1.0
arm exynos_defconfig gcc-15.1.0
arm randconfig-001-20251225 gcc-11.5.0
arm randconfig-002-20251225 gcc-11.5.0
arm randconfig-003-20251225 gcc-11.5.0
arm randconfig-004-20251225 gcc-11.5.0
arm spitz_defconfig gcc-15.1.0
arm64 allmodconfig clang-19
arm64 allmodconfig clang-22
arm64 allnoconfig gcc-15.1.0
arm64 defconfig gcc-15.1.0
arm64 randconfig-001-20251225 clang-18
arm64 randconfig-001-20251225 gcc-11.5.0
arm64 randconfig-002-20251225 gcc-11.5.0
arm64 randconfig-002-20251225 gcc-12.5.0
arm64 randconfig-003-20251225 clang-22
arm64 randconfig-003-20251225 gcc-11.5.0
arm64 randconfig-004-20251225 clang-22
arm64 randconfig-004-20251225 gcc-11.5.0
csky allmodconfig gcc-15.1.0
csky allnoconfig gcc-15.1.0
csky defconfig gcc-15.1.0
csky randconfig-001-20251225 gcc-11.5.0
csky randconfig-001-20251225 gcc-15.1.0
csky randconfig-002-20251225 gcc-11.5.0
hexagon allmodconfig clang-17
hexagon allmodconfig gcc-15.1.0
hexagon allnoconfig clang-22
hexagon allnoconfig gcc-15.1.0
hexagon defconfig gcc-15.1.0
hexagon randconfig-001-20251225 clang-22
hexagon randconfig-002-20251225 clang-22
i386 allmodconfig clang-20
i386 allmodconfig gcc-14
i386 allnoconfig gcc-14
i386 allnoconfig gcc-15.1.0
i386 allyesconfig clang-20
i386 allyesconfig gcc-14
i386 buildonly-randconfig-001-20251225 clang-20
i386 buildonly-randconfig-002-20251225 clang-20
i386 buildonly-randconfig-003-20251225 clang-20
i386 buildonly-randconfig-003-20251225 gcc-14
i386 buildonly-randconfig-004-20251225 clang-20
i386 buildonly-randconfig-005-20251225 clang-20
i386 buildonly-randconfig-006-20251225 clang-20
i386 randconfig-007-20251225 clang-20
i386 randconfig-011-20251225 clang-20
i386 randconfig-011-20251225 gcc-14
i386 randconfig-012-20251225 gcc-14
i386 randconfig-013-20251225 gcc-14
i386 randconfig-014-20251225 clang-20
i386 randconfig-014-20251225 gcc-14
i386 randconfig-015-20251225 gcc-14
i386 randconfig-016-20251225 clang-20
i386 randconfig-016-20251225 gcc-14
i386 randconfig-017-20251225 clang-20
i386 randconfig-017-20251225 gcc-14
loongarch allmodconfig clang-19
loongarch allmodconfig clang-22
loongarch allnoconfig clang-22
loongarch allnoconfig gcc-15.1.0
loongarch defconfig clang-19
loongarch randconfig-001-20251225 clang-22
loongarch randconfig-002-20251225 clang-22
loongarch randconfig-002-20251225 gcc-15.1.0
m68k allmodconfig gcc-15.1.0
m68k allnoconfig gcc-15.1.0
m68k allyesconfig clang-16
m68k allyesconfig gcc-15.1.0
m68k defconfig clang-19
microblaze allnoconfig gcc-15.1.0
microblaze allyesconfig gcc-15.1.0
microblaze defconfig clang-19
mips allmodconfig gcc-15.1.0
mips allnoconfig gcc-15.1.0
mips allyesconfig gcc-15.1.0
mips gcw0_defconfig gcc-15.1.0
nios2 allmodconfig clang-22
nios2 allmodconfig gcc-11.5.0
nios2 allnoconfig clang-22
nios2 allnoconfig gcc-11.5.0
nios2 defconfig clang-19
nios2 randconfig-001-20251225 clang-22
nios2 randconfig-001-20251225 gcc-9.5.0
nios2 randconfig-002-20251225 clang-22
nios2 randconfig-002-20251225 gcc-11.5.0
openrisc allmodconfig clang-22
openrisc allmodconfig gcc-15.1.0
openrisc allnoconfig clang-22
openrisc allnoconfig gcc-15.1.0
openrisc defconfig gcc-15.1.0
parisc allmodconfig gcc-15.1.0
parisc allnoconfig clang-22
parisc allnoconfig gcc-15.1.0
parisc allyesconfig clang-19
parisc allyesconfig gcc-15.1.0
parisc defconfig gcc-15.1.0
parisc randconfig-001-20251225 clang-22
parisc randconfig-002-20251225 clang-22
parisc64 defconfig clang-19
powerpc allmodconfig gcc-15.1.0
powerpc allnoconfig clang-22
powerpc allnoconfig gcc-15.1.0
powerpc cell_defconfig gcc-15.1.0
powerpc pmac32_defconfig gcc-15.1.0
powerpc randconfig-001-20251225 clang-22
powerpc randconfig-002-20251225 clang-22
powerpc64 randconfig-001-20251225 clang-22
powerpc64 randconfig-002-20251225 clang-22
riscv allmodconfig clang-22
riscv allnoconfig clang-22
riscv allnoconfig gcc-15.1.0
riscv allyesconfig clang-16
riscv defconfig clang-22
riscv defconfig gcc-15.1.0
riscv randconfig-001-20251225 clang-19
riscv randconfig-001-20251225 clang-22
riscv randconfig-002-20251225 clang-19
riscv randconfig-002-20251225 gcc-11.5.0
s390 allmodconfig clang-18
s390 allmodconfig clang-19
s390 allnoconfig clang-22
s390 allyesconfig gcc-15.1.0
s390 defconfig clang-22
s390 defconfig gcc-15.1.0
s390 randconfig-001-20251225 clang-19
s390 randconfig-001-20251225 gcc-14.3.0
s390 randconfig-002-20251225 clang-19
sh allmodconfig gcc-15.1.0
sh allnoconfig clang-22
sh allnoconfig gcc-15.1.0
sh allyesconfig clang-19
sh allyesconfig gcc-15.1.0
sh defconfig gcc-14
sh randconfig-001-20251225 clang-19
sh randconfig-001-20251225 gcc-15.1.0
sh randconfig-002-20251225 clang-19
sh randconfig-002-20251225 gcc-9.5.0
sh se7724_defconfig gcc-15.1.0
sparc allnoconfig clang-22
sparc allnoconfig gcc-15.1.0
sparc defconfig gcc-15.1.0
sparc randconfig-001-20251225 gcc-13
sparc randconfig-002-20251225 gcc-13
sparc64 allmodconfig clang-22
sparc64 defconfig gcc-14
sparc64 randconfig-001-20251225 gcc-13
sparc64 randconfig-002-20251225 gcc-13
um allmodconfig clang-19
um allnoconfig clang-22
um allyesconfig gcc-14
um allyesconfig gcc-15.1.0
um defconfig gcc-14
um i386_defconfig gcc-14
um randconfig-001-20251225 gcc-13
um randconfig-002-20251225 gcc-13
um x86_64_defconfig gcc-14
x86_64 allmodconfig clang-20
x86_64 allnoconfig clang-20
x86_64 allnoconfig clang-22
x86_64 allyesconfig clang-20
x86_64 buildonly-randconfig-001-20251225 clang-20
x86_64 buildonly-randconfig-001-20251225 gcc-14
x86_64 buildonly-randconfig-002-20251225 clang-20
x86_64 buildonly-randconfig-002-20251225 gcc-14
x86_64 buildonly-randconfig-003-20251225 gcc-14
x86_64 buildonly-randconfig-004-20251225 clang-20
x86_64 buildonly-randconfig-004-20251225 gcc-14
x86_64 buildonly-randconfig-005-20251225 gcc-14
x86_64 buildonly-randconfig-006-20251225 clang-20
x86_64 buildonly-randconfig-006-20251225 gcc-14
x86_64 defconfig gcc-14
x86_64 kexec clang-20
x86_64 randconfig-001-20251225 clang-20
x86_64 randconfig-002-20251225 clang-20
x86_64 randconfig-003-20251225 clang-20
x86_64 randconfig-004-20251225 clang-20
x86_64 randconfig-005-20251225 clang-20
x86_64 randconfig-006-20251225 clang-20
x86_64 randconfig-011-20251225 gcc-13
x86_64 randconfig-012-20251225 gcc-13
x86_64 randconfig-012-20251225 gcc-14
x86_64 randconfig-013-20251225 clang-20
x86_64 randconfig-013-20251225 gcc-13
x86_64 randconfig-014-20251225 clang-20
x86_64 randconfig-014-20251225 gcc-13
x86_64 randconfig-015-20251225 gcc-13
x86_64 randconfig-015-20251225 gcc-14
x86_64 randconfig-016-20251225 clang-20
x86_64 randconfig-016-20251225 gcc-13
x86_64 randconfig-071-20251225 clang-20
x86_64 randconfig-072-20251225 clang-20
x86_64 randconfig-073-20251225 clang-20
x86_64 randconfig-073-20251225 gcc-14
x86_64 randconfig-074-20251225 clang-20
x86_64 randconfig-075-20251225 clang-20
x86_64 randconfig-075-20251225 gcc-14
x86_64 randconfig-076-20251225 clang-20
x86_64 randconfig-076-20251225 gcc-14
x86_64 rhel-9.4 clang-20
x86_64 rhel-9.4-bpf gcc-14
x86_64 rhel-9.4-func clang-20
x86_64 rhel-9.4-kselftests clang-20
x86_64 rhel-9.4-kunit gcc-14
x86_64 rhel-9.4-ltp gcc-14
x86_64 rhel-9.4-rust clang-20
xtensa allnoconfig clang-22
xtensa allnoconfig gcc-15.1.0
xtensa allyesconfig clang-22
xtensa randconfig-001-20251225 gcc-13
xtensa randconfig-002-20251225 gcc-13
Merry Christmas everybody!
Steven Rostedt (1):
tracing: Remove size parameter in __trace_puts()
Yury Norov (NVIDIA) (6):
kernel.h: drop STACK_MAGIC macro
moduleparam: include required headers explicitly
kernel.h: move VERIFY_OCTAL_PERMISSIONS() to sysfs.h
kernel.h: include linux/instruction_pointer.h explicitly
tracing: move tracing declarations from kernel.h to a dedicated header
kernel.h: drop trace_printk.h
Documentation/filesystems/sysfs.rst | 2 +-
arch/powerpc/kvm/book3s_xics.c | 1 +
arch/powerpc/xmon/xmon.c | 1 +
arch/s390/include/asm/processor.h | 1 +
arch/s390/kernel/ipl.c | 1 +
arch/s390/kernel/machine_kexec.c | 1 +
drivers/gpu/drm/i915/gt/intel_gtt.h | 1 +
.../drm/i915/gt/selftest_ring_submission.c | 1 +
drivers/gpu/drm/i915/i915_gem.h | 1 +
drivers/gpu/drm/i915/i915_selftest.h | 2 +
drivers/hwtracing/stm/dummy_stm.c | 1 +
drivers/infiniband/hw/hfi1/trace_dbg.h | 1 +
drivers/tty/sysrq.c | 1 +
drivers/usb/early/xhci-dbc.c | 1 +
fs/ext4/inline.c | 1 +
include/linux/kernel.h | 209 ------------------
include/linux/moduleparam.h | 7 +-
include/linux/sunrpc/debug.h | 1 +
include/linux/sysfs.h | 13 ++
include/linux/trace_printk.h | 204 +++++++++++++++++
include/linux/ww_mutex.h | 1 +
kernel/debug/debug_core.c | 1 +
kernel/panic.c | 1 +
kernel/rcu/rcu.h | 1 +
kernel/rcu/rcutorture.c | 1 +
kernel/trace/error_report-traces.c | 1 +
kernel/trace/ring_buffer_benchmark.c | 1 +
kernel/trace/trace.c | 8 +-
kernel/trace/trace.h | 2 +-
kernel/trace/trace_benchmark.c | 1 +
kernel/trace/trace_events_trigger.c | 1 +
kernel/trace/trace_functions.c | 1 +
kernel/trace/trace_printk.c | 1 +
kernel/trace/trace_selftest.c | 1 +
lib/sys_info.c | 1 +
samples/fprobe/fprobe_example.c | 1 +
samples/ftrace/ftrace-direct-modify.c | 1 +
samples/ftrace/ftrace-direct-multi-modify.c | 1 +
samples/ftrace/ftrace-direct-multi.c | 1 +
samples/ftrace/ftrace-direct-too.c | 1 +
samples/ftrace/ftrace-direct.c | 1 +
samples/trace_printk/trace-printk.c | 1 +
sound/hda/common/sysfs.c | 1 +
43 files changed, 266 insertions(+), 216 deletions(-)
create mode 100644 include/linux/trace_printk.h
--
2.43.0
^ permalink raw reply
* Re: [RFC PATCH v1] module: Fix kernel panic when a symbol st_shndx is out of bounds
From: Yonghong Song @ 2025-12-24 5:36 UTC (permalink / raw)
To: Ihor Solodrai, Luis Chamberlain, Petr Pavlu, Daniel Gomez,
Nathan Chancellor, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman
Cc: linux-kernel, linux-modules, bpf, linux-kbuild, llvm
In-Reply-To: <20251224005752.201911-1-ihor.solodrai@linux.dev>
On 12/23/25 4:57 PM, Ihor Solodrai wrote:
> I've been chasing down the following flaky splat, introduced by recent
> changes in BTF generation [1]:
>
> ------------[ cut here ]------------
> BUG: unable to handle page fault for address: ffa000000233d828
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD 100000067 P4D 100253067 PUD 100258067 PMD 0
> Oops: Oops: 0000 [#1] SMP NOPTI
> CPU: 1 UID: 0 PID: 390 Comm: test_progs Tainted: G W OE 6.19.0-rc1-gf785a31395d9 #331 PREEMPT(full)
> Tainted: [W]=WARN, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-4.el9 04/01/2014
> RIP: 0010:simplify_symbols+0x2b2/0x480
> Code: 85 f6 4d 89 f7 b8 01 00 00 00 4c 0f 44 f8 49 83 fd f0 4d 0f 44 fe 75 5b 4d 85 ff 0f 85 76 ff ff ff eb 50 49 8b 4e 20 c1 e0 06 <48> 8b 44 01 10 9 cf fd ff ff 49 89 c5 eb 36 49 c7 c5
> RSP: 0018:ffa00000017afc40 EFLAGS: 00010216
> RAX: 00000000003fffc0 RBX: 0000000000000002 RCX: ffa0000001f3d858
> RDX: ffffffffc0218038 RSI: ffffffffc0218008 RDI: aaaaaaaaaaaaaaab
> RBP: ffa00000017afd18 R08: 0000000000000072 R09: 0000000000000069
> R10: ffffffff8160d6ca R11: 0000000000000000 R12: ffa0000001f3d577
> R13: ffffffffc0214058 R14: ffa00000017afdc0 R15: ffa0000001f3e518
> FS: 00007f1c638654c0(0000) GS:ff1100089b7bc000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: ffa000000233d828 CR3: 000000010ba1f001 CR4: 0000000000771ef0
> PKRU: 55555554
> Call Trace:
> <TASK>
> ? __kmalloc_node_track_caller_noprof+0x37f/0x740
> ? __pfx_setup_modinfo_srcversion+0x10/0x10
> ? srso_alias_return_thunk+0x5/0xfbef5
> ? kstrdup+0x4a/0x70
> ? srso_alias_return_thunk+0x5/0xfbef5
> ? setup_modinfo_srcversion+0x1a/0x30
> ? srso_alias_return_thunk+0x5/0xfbef5
> ? setup_modinfo+0x12b/0x1e0
> load_module+0x133a/0x1610
> __x64_sys_finit_module+0x31b/0x450
> ? entry_SYSCALL_64_after_hwframe+0x76/0x7e
> do_syscall_64+0x80/0x2d0
> ? srso_alias_return_thunk+0x5/0xfbef5
> ? exc_page_fault+0x95/0xc0
> entry_SYSCALL_64_after_hwframe+0x76/0x7e
> RIP: 0033:0x7f1c63a2582d
> Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff 8 8b 0d bb 15 0f 00 f7 d8 64 89 01 48
> RSP: 002b:00007ffe513df128 EFLAGS: 00000206 ORIG_RAX: 0000000000000139
> RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f1c63a2582d
> RDX: 0000000000000000 RSI: 0000000000ee83f9 RDI: 0000000000000016
> RBP: 00007ffe513df150 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000206 R12: 00007ffe513e3588
> R13: 000000000088fad0 R14: 00000000014bddb0 R15: 00007f1c63ba7000
> </TASK>
> Modules linked in: bpf_testmod(OE)
> CR2: ffa000000233d828
> ---[ end trace 0000000000000000 ]---
> RIP: 0010:simplify_symbols+0x2b2/0x480
> Code: 85 f6 4d 89 f7 b8 01 00 00 00 4c 0f 44 f8 49 83 fd f0 4d 0f 44 fe 75 5b 4d 85 ff 0f 85 76 ff ff ff eb 50 49 8b 4e 20 c1 e0 06 <48> 8b 44 01 10 9 cf fd ff ff 49 89 c5 eb 36 49 c7 c5
> RSP: 0018:ffa00000017afc40 EFLAGS: 00010216
> RAX: 00000000003fffc0 RBX: 0000000000000002 RCX: ffa0000001f3d858
> RDX: ffffffffc0218038 RSI: ffffffffc0218008 RDI: aaaaaaaaaaaaaaab
> RBP: ffa00000017afd18 R08: 0000000000000072 R09: 0000000000000069
> R10: ffffffff8160d6ca R11: 0000000000000000 R12: ffa0000001f3d577
> R13: ffffffffc0214058 R14: ffa00000017afdc0 R15: ffa0000001f3e518
> FS: 00007f1c638654c0(0000) GS:ff1100089b7bc000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: ffa000000233d828 CR3: 000000010ba1f001 CR4: 0000000000771ef0
> PKRU: 55555554
> Kernel panic - not syncing: Fatal exception
> Kernel Offset: disabled
>
> This hasn't happened on BPF CI so far, for example, however I was able
> to reproduce it on a particular x64 machine using a kernel built with
> LLVM 20.
>
> The crash happens on attempt to load one of the BPF selftest modules
> (tools/testing/selftests/bpf/test_kmods/bpf_test_modorder_x.ko) which
> is used by kfunc_module_order test.
>
> The reason for the crash is that simplify_symbols() doesn't check for
> bounds of the ELF section index:
>
> for (i = 1; i < symsec->sh_size / sizeof(Elf_Sym); i++) {
> const char *name = info->strtab + sym[i].st_name;
>
> switch (sym[i].st_shndx) {
> case SHN_COMMON:
>
> [...]
>
> default:
> /* Divert to percpu allocation if a percpu var. */
> if (sym[i].st_shndx == info->index.pcpu)
> secbase = (unsigned long)mod_percpu(mod);
> else
> /** HERE --> **/ secbase = info->sechdrs[sym[i].st_shndx].sh_addr;
> sym[i].st_value += secbase;
> break;
> }
> }
>
> And in the case I was able to reproduce, the value 0xffff
> (SHN_HIRESERVE aka SHN_XINDEX [2]) fell through here.
>
> Now this code fragment is between 15 and 20 years old, so obviously
> it's not expected for a kmodule symbol to have such st_shndx
> value. Even so, the kernel probably should fail loading the module
> instead of crashing, which is what this patch attempts to fix.
>
> Investigating further, I discovered that the module binary became
> corrupted by `${OBJCOPY} --update-section` operation updating .BTF_ids
> section data in scripts/gen-btf.sh. This explains how the bug has
> surfaced after gen-btf.sh was introduced:
>
> $ llvm-readelf -s --wide bpf_test_modorder_x.ko | grep 'BTF_ID'
> llvm-readelf: warning: 'bpf_test_modorder_x.ko': found an extended symbol index (2), but unable to locate the extended symbol index table
> llvm-readelf: warning: 'bpf_test_modorder_x.ko': found an extended symbol index (3), but unable to locate the extended symbol index table
> llvm-readelf: warning: 'bpf_test_modorder_x.ko': found an extended symbol index (4), but unable to locate the extended symbol index table
> 3: 0000000000000000 16 NOTYPE LOCAL DEFAULT RSV[0xffff] __BTF_ID__set8__bpf_test_modorder_kfunc_x_ids
> llvm-readelf: warning: 'bpf_test_modorder_x.ko': found an extended symbol index (16), but unable to locate the extended symbol index table
> 4: 0000000000000008 4 OBJECT LOCAL DEFAULT RSV[0xffff] __BTF_ID__func__bpf_test_modorder_retx__44417
>
> vs expected
>
> $ llvm-readelf -s --wide bpf_test_modorder_x.ko | grep 'BTF_ID'
> 3: 0000000000000000 16 NOTYPE LOCAL DEFAULT 6 __BTF_ID__set8__bpf_test_modorder_kfunc_x_ids
> 4: 0000000000000008 4 OBJECT LOCAL DEFAULT 6 __BTF_ID__func__bpf_test_modorder_retx__44417
>
> But why? Updating section data without changing its size is not
> supposed to affect section indices, right?
>
> With a bit more testing I confirmed that this is an LLVM-specific
> issue (doesn't reproduce with GCC kbuild), and it's not stable,
> because in link-vmlinux.sh we also do:
>
> ${OBJCOPY} --update-section .BTF_ids=${btfids_vmlinux} ${VMLINUX}
>
> However:
>
> $ llvm-readelf -s --wide ~/workspace/prog-aux/linux/vmlinux | grep 0xffff
> # no output, which is good
>
> So the suspect is the implementation of llvm-objcopy. As it turns out
> there is a relevant known bug that explains the flakiness and isn't
> fixed yet [3].
>
> [1] https://lore.kernel.org/bpf/20251219181825.1289460-3-ihor.solodrai@linux.dev/
> [2] https://man7.org/linux/man-pages/man5/elf.5.html
> [3] https://github.com/llvm/llvm-project/issues/168060#issuecomment-3533552952
>
> Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
>
> ---
>
> RFC
>
> Until this llvm-objcopy bug is fixed, we cannot trust it in the
> kernel build pipeline. In the short term we have to come up with a
> workaround for the .BTF_ids section update and replace the calls to
> ${OBJCOPY} --update-section with something else.
>
> One potential workaround is to force the use of the objcopy (from
> binutils) instead of llvm-objcopy when updating .BTF_ids section.
>
> Alternatively, we could just dd the .BTF_ids data computed by
> resolve_btfids at the right offset in the target ELF file.
>
> Surprisingly I couldn't find a good way to read a section offset and
> size from the ELF with a specified format in a command line. Both
> readelf and {llvm-}objdump give a human readable output, and it
> appears we can't rely on the column order, for example.
>
> We could still try parsing readelf output with awk/grep, covering
> output variants that appear in the kernel build.
>
> We can also do:
>
> llvm-readobj --elf-output-style=JSON --sections "$elf" | \
> jq -r --arg name .BTF_ids '
> .[0].Sections[] |
> select(.Section.Name.Name == $name) |
> "\(.Section.Offset) \(.Section.Size)"'
>
> ...but idk man, doesn't feel right.
>
> Most reliable way to determine the size and offset of .BTF_ids section
> is probably reading them by a C program with libelf, such as
> resolve_btfids. Which is quite ironic, given the recent
> changes. Setting the irony aside, we could add smth like:
> resolve_btfids --section-info=.BTF_ids $elf
>
> Reverting the gen-btf.sh patch is also a possible workaround, but I'd
> really like to avoid it, given that BPF features/optimizations in
> development depend on it.
>
> I'd appreciate comments and suggestions on this issue. Thank you!
> ---
> kernel/module/main.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/kernel/module/main.c b/kernel/module/main.c
> index 710ee30b3bea..5bf456fad63e 100644
> --- a/kernel/module/main.c
> +++ b/kernel/module/main.c
> @@ -1568,6 +1568,13 @@ static int simplify_symbols(struct module *mod, const struct load_info *info)
> break;
>
> default:
> + if (sym[i].st_shndx >= info->hdr->e_shnum) {
> + pr_err("%s: Symbol %s has an invalid section index %u (max %u)\n",
> + mod->name, name, sym[i].st_shndx, info->hdr->e_shnum - 1);
> + ret = -ENOEXEC;
> + break;
> + }
> +
> /* Divert to percpu allocation if a percpu var. */
> if (sym[i].st_shndx == info->index.pcpu)
> secbase = (unsigned long)mod_percpu(mod);
I tried both llvm21 and llvm22 (llvm21 is what bpf ci uses).
Without KASAN, I can reproduce the failure with llvm19/llvm21/llvm22.
I did not test llvm20, but I assume it fails too.
The following llvm patch
https://github.com/llvm/llvm-project/pull/170462
can fix the issue. It is still in the review stage. The actual diff is:
diff --git a/llvm/lib/ObjCopy/ELF/ELFObject.cpp b/llvm/lib/ObjCopy/ELF/ELFObject.cpp
index e5de17e093df..cc1527d996e2 100644
--- a/llvm/lib/ObjCopy/ELF/ELFObject.cpp
+++ b/llvm/lib/ObjCopy/ELF/ELFObject.cpp
@@ -2168,7 +2168,11 @@ Error Object::updateSectionData(SecPtr &Sec, ArrayRef<uint8_t> Data) {
Data.size(), Sec->Name.c_str(), Sec->Size);
if (!Sec->ParentSegment) {
- Sec = std::make_unique<OwnedDataSection>(*Sec, Data);
+ SectionBase *Replaced = Sec.get();
+ SectionBase *Modified = &addSection<OwnedDataSection>(*Sec, Data);
+ DenseMap<SectionBase *, SectionBase *> Replacements{{Replaced, Modified}};
+ if (auto err = replaceSections(Replacements))
+ return err;
} else {
// The segment writer will be in charge of updating these contents.
Sec->Size = Data.size();
I applied the above patch to the latest llvm21 and llvm22, and
the crash is gone and the selftests run properly.
With KASAN, everything is okay for llvm21 and llvm22.
I am not sure whether the llvm patch
https://github.com/llvm/llvm-project/pull/170462
can make it into llvm21, as it looks like llvm21 is frozen
for now. See
https://github.com/llvm/llvm-project/pull/168314#issuecomment-3645797175
llvm22 will branch into rc mode in January.
I will try to see whether we can have a reasonable workaround
for llvm21 llvm-objcopy (for the case without KASAN).
^ permalink raw reply related
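The bounds check proposed in the quoted RFC can be sketched in userspace as follows. The names struct sketch_sym and resolve_sym() are hypothetical and stand in for Elf_Sym and the default branch of simplify_symbols(); the kernel patch returns -ENOEXEC where this sketch returns -1. As a side note, the numbers in the quoted trace appear consistent with the diagnosis: on x86_64 sizeof(Elf64_Shdr) is 64 and sh_addr sits at offset 16, so an index of 0xffff gives 0xffff * 64 = 0x3fffc0 (the value in RAX), and RCX + RAX + 0x10 = 0xffa0000001f3d858 + 0x3fffc0 + 0x10 = 0xffa000000233d828, which matches CR2.

```c
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for Elf_Sym. */
struct sketch_sym {
	uint16_t st_shndx;	/* section index the symbol lives in */
	uint64_t st_value;	/* section-relative value, made absolute below */
};

/*
 * Hypothetical helper mirroring the default branch of simplify_symbols():
 * reject any section index that cannot address the e_shnum-entry table
 * before using it to fetch a section base address.
 */
static int resolve_sym(struct sketch_sym *sym, const uint64_t *sh_addr,
		       uint16_t e_shnum)
{
	if (sym->st_shndx >= e_shnum)
		return -1;	/* the RFC patch sets ret = -ENOEXEC here */

	sym->st_value += sh_addr[sym->st_shndx];
	return 0;
}
```

With a three-entry table, a symbol in section 2 is relocated normally, while a symbol carrying 0xffff is refused and left untouched instead of reading far past the end of the table.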
* [PATCH] ANDROID: gki: kallsyms: add kallsyms_lookup_address_and_size.
From: Yunjin Kim @ 2025-12-24 4:31 UTC (permalink / raw)
To: Luis Chamberlain, Petr Pavlu, Sami Tolvanen, Daniel Gomez
Cc: Yunjin Kim, linux-kernel, linux-modules
In-Reply-To: <CGME20251224043158epcas2p217889374e0ea4b1722371ca143741d85@epcas2p2.samsung.com>
These methods are used by AKKstub-ARM Kernel Kstub.
We need to implement an automatic kernel-method mock that streamlines the
mocking process during kernel-method testing and enables fully automated
operations. This mechanism must traverse the binary instructions of the
target function in memory, locate the appropriate instruction, and replace
it. To perform the traversal, it must know the function’s entry address and
the size of its instruction range.
Bug:
Change-Id: I5a318f762d4412e70b0c8dcf2dfed326312bdc65
Signed-off-by: Yunjin Kim <yunzhen.kim@samsung.com>
---
include/linux/kallsyms.h | 2 ++
include/linux/module.h | 2 ++
kernel/kallsyms.c | 38 ++++++++++++++++++++++++++
kernel/module/kallsyms.c | 58 ++++++++++++++++++++++++++++++++++++++++
4 files changed, 100 insertions(+)
diff --git a/include/linux/kallsyms.h b/include/linux/kallsyms.h
index 1c6a6c1704d8..ec59f25259f2 100644
--- a/include/linux/kallsyms.h
+++ b/include/linux/kallsyms.h
@@ -78,6 +78,8 @@ int kallsyms_on_each_match_symbol(int (*fn)(void *, unsigned long),
/* Lookup the address for a symbol. Returns 0 if not found. */
unsigned long kallsyms_lookup_name(const char *name);
+unsigned long kallsyms_lookup_address_and_size(const char *name, unsigned long *address, unsigned long * size);
+
extern int kallsyms_lookup_size_offset(unsigned long addr,
unsigned long *symbolsize,
unsigned long *offset);
diff --git a/include/linux/module.h b/include/linux/module.h
index 5beb39d56197..47fb46bd1b92 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -976,6 +976,8 @@ int module_get_kallsym(unsigned int symnum, unsigned long *value, char *type,
/* Look for this name: can be of form module:name. */
unsigned long module_kallsyms_lookup_name(const char *name);
+unsigned long module_kallsyms_lookup_address_and_size(const char *name, unsigned long *address, unsigned long *size);
+
unsigned long find_kallsyms_symbol_value(struct module *mod, const char *name);
#else /* CONFIG_MODULES && CONFIG_KALLSYMS */
diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
index a9a0ca605d4a..5533816794da 100644
--- a/kernel/kallsyms.c
+++ b/kernel/kallsyms.c
@@ -160,6 +160,22 @@ unsigned long kallsyms_sym_address(int idx)
return kallsyms_relative_base - 1 - kallsyms_offsets[idx];
}
+unsigned long kallsyms_sym_address_and_size(int idx, unsigned long *size)
+{
+ /* values are unsigned offsets if --absolute-percpu is not in effect */
+ *size = kallsyms_offsets[idx+1] - kallsyms_offsets[idx];
+
+ if (!IS_ENABLED(CONFIG_KALLSYMS_ABSOLUTE_PERCPU))
+ return kallsyms_relative_base + (u32)kallsyms_offsets[idx];
+
+ /* ...otherwise, positive offsets are absolute values */
+ if (kallsyms_offsets[idx] >= 0)
+ return kallsyms_offsets[idx];
+
+ /* ...and negative offsets are relative to kallsyms_relative_base - 1 */
+ return kallsyms_relative_base - 1 - kallsyms_offsets[idx];
+}
+
static unsigned int get_symbol_seq(int index)
{
unsigned int i, seq = 0;
@@ -242,6 +258,27 @@ unsigned long kallsyms_lookup_name(const char *name)
return module_kallsyms_lookup_name(name);
}
+EXPORT_SYMBOL(kallsyms_lookup_name);
+
+unsigned long kallsyms_lookup_address_and_size(const char *name, unsigned long *address, unsigned long * size)
+{
+ int ret;
+ unsigned int i;
+
+ /* Skip the search for empty string. */
+ if (!*name)
+ return 0;
+
+ ret = kallsyms_lookup_names(name, &i, NULL);
+ if (!ret){
+ *address = kallsyms_sym_address_and_size(get_symbol_seq(i), size);
+ return *address;
+ }
+
+ //return module_kallsyms_lookup_name(name);
+ return module_kallsyms_lookup_address_and_size(name, address, size);
+}
+EXPORT_SYMBOL(kallsyms_lookup_address_and_size);
/*
* Iterate over all symbols in vmlinux. For symbols from modules use
@@ -430,6 +467,7 @@ int lookup_symbol_name(unsigned long addr, char *symname)
/* See if it's in a module. */
return lookup_module_symbol_name(addr, symname);
}
+EXPORT_SYMBOL(lookup_symbol_name);
/* Look up a kernel symbol and return it in a text buffer. */
static int __sprint_symbol(char *buffer, unsigned long address,
diff --git a/kernel/module/kallsyms.c b/kernel/module/kallsyms.c
index bf65e0c3c86f..e8552f5e64c8 100644
--- a/kernel/module/kallsyms.c
+++ b/kernel/module/kallsyms.c
@@ -462,6 +462,64 @@ unsigned long module_kallsyms_lookup_name(const char *name)
return ret;
}
+static unsigned long __find_kallsyms_symbol_address_and_size_value(struct module *mod, const char *name, unsigned long *address, unsigned long *size)
+{
+ unsigned int i;
+ struct mod_kallsyms *kallsyms = rcu_dereference_sched(mod->kallsyms);
+ unsigned long ret = 0;
+
+ for (i = 0; i < kallsyms->num_symtab; i++) {
+ const Elf_Sym *sym = &kallsyms->symtab[i];
+
+ if (strcmp(name, kallsyms_symbol_name(kallsyms, i)) == 0 &&
+ sym->st_shndx != SHN_UNDEF){
+ ret = kallsyms_symbol_value(sym);
+ *address = ret;
+ *size = sym->st_size;
+ return ret;
+ }
+ }
+ return 0;
+}
+
+static unsigned long __module_kallsyms_lookup_address_and_size(const char *name, unsigned long *address, unsigned long *size)
+{
+ struct module *mod;
+ char *colon;
+
+ colon = strnchr(name, MODULE_NAME_LEN, ':');
+ if (colon) {
+ mod = find_module_all(name, colon - name, false);
+ if (mod)
+ return __find_kallsyms_symbol_address_and_size_value(mod, colon + 1, address, size);
+ return 0;
+ }
+
+ list_for_each_entry_rcu(mod, &modules, list) {
+ unsigned long ret;
+
+ if (mod->state == MODULE_STATE_UNFORMED)
+ continue;
+ ret = __find_kallsyms_symbol_address_and_size_value(mod, name, address, size);
+ if (ret)
+ return ret;
+ }
+ return 0;
+}
+
+/* Look for this name: can be of form module:name. */
+unsigned long module_kallsyms_lookup_address_and_size(const char *name, unsigned long *address, unsigned long *size)
+{
+ unsigned long ret;
+
+ /* Don't lock: we're in enough trouble already. */
+ preempt_disable();
+ ret = __module_kallsyms_lookup_address_and_size(name, address, size);
+ preempt_enable();
+ return ret;
+}
+
+
unsigned long find_kallsyms_symbol_value(struct module *mod, const char *name)
{
unsigned long ret;
--
2.34.1
^ permalink raw reply related
* [RFC PATCH v1] module: Fix kernel panic when a symbol st_shndx is out of bounds
From: Ihor Solodrai @ 2025-12-24 0:57 UTC (permalink / raw)
To: Luis Chamberlain, Petr Pavlu, Daniel Gomez, Nathan Chancellor,
Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Eduard Zingerman
Cc: linux-kernel, linux-modules, bpf, linux-kbuild, llvm
I've been chasing down the following flaky splat, introduced by recent
changes in BTF generation [1]:
------------[ cut here ]------------
BUG: unable to handle page fault for address: ffa000000233d828
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 100000067 P4D 100253067 PUD 100258067 PMD 0
Oops: Oops: 0000 [#1] SMP NOPTI
CPU: 1 UID: 0 PID: 390 Comm: test_progs Tainted: G W OE 6.19.0-rc1-gf785a31395d9 #331 PREEMPT(full)
Tainted: [W]=WARN, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-4.el9 04/01/2014
RIP: 0010:simplify_symbols+0x2b2/0x480
Code: 85 f6 4d 89 f7 b8 01 00 00 00 4c 0f 44 f8 49 83 fd f0 4d 0f 44 fe 75 5b 4d 85 ff 0f 85 76 ff ff ff eb 50 49 8b 4e 20 c1 e0 06 <48> 8b 44 01 10 9 cf fd ff ff 49 89 c5 eb 36 49 c7 c5
RSP: 0018:ffa00000017afc40 EFLAGS: 00010216
RAX: 00000000003fffc0 RBX: 0000000000000002 RCX: ffa0000001f3d858
RDX: ffffffffc0218038 RSI: ffffffffc0218008 RDI: aaaaaaaaaaaaaaab
RBP: ffa00000017afd18 R08: 0000000000000072 R09: 0000000000000069
R10: ffffffff8160d6ca R11: 0000000000000000 R12: ffa0000001f3d577
R13: ffffffffc0214058 R14: ffa00000017afdc0 R15: ffa0000001f3e518
FS: 00007f1c638654c0(0000) GS:ff1100089b7bc000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffa000000233d828 CR3: 000000010ba1f001 CR4: 0000000000771ef0
PKRU: 55555554
Call Trace:
<TASK>
? __kmalloc_node_track_caller_noprof+0x37f/0x740
? __pfx_setup_modinfo_srcversion+0x10/0x10
? srso_alias_return_thunk+0x5/0xfbef5
? kstrdup+0x4a/0x70
? srso_alias_return_thunk+0x5/0xfbef5
? setup_modinfo_srcversion+0x1a/0x30
? srso_alias_return_thunk+0x5/0xfbef5
? setup_modinfo+0x12b/0x1e0
load_module+0x133a/0x1610
__x64_sys_finit_module+0x31b/0x450
? entry_SYSCALL_64_after_hwframe+0x76/0x7e
do_syscall_64+0x80/0x2d0
? srso_alias_return_thunk+0x5/0xfbef5
? exc_page_fault+0x95/0xc0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f1c63a2582d
Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff 8 8b 0d bb 15 0f 00 f7 d8 64 89 01 48
RSP: 002b:00007ffe513df128 EFLAGS: 00000206 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f1c63a2582d
RDX: 0000000000000000 RSI: 0000000000ee83f9 RDI: 0000000000000016
RBP: 00007ffe513df150 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000206 R12: 00007ffe513e3588
R13: 000000000088fad0 R14: 00000000014bddb0 R15: 00007f1c63ba7000
</TASK>
Modules linked in: bpf_testmod(OE)
CR2: ffa000000233d828
---[ end trace 0000000000000000 ]---
RIP: 0010:simplify_symbols+0x2b2/0x480
Code: 85 f6 4d 89 f7 b8 01 00 00 00 4c 0f 44 f8 49 83 fd f0 4d 0f 44 fe 75 5b 4d 85 ff 0f 85 76 ff ff ff eb 50 49 8b 4e 20 c1 e0 06 <48> 8b 44 01 10 9 cf fd ff ff 49 89 c5 eb 36 49 c7 c5
RSP: 0018:ffa00000017afc40 EFLAGS: 00010216
RAX: 00000000003fffc0 RBX: 0000000000000002 RCX: ffa0000001f3d858
RDX: ffffffffc0218038 RSI: ffffffffc0218008 RDI: aaaaaaaaaaaaaaab
RBP: ffa00000017afd18 R08: 0000000000000072 R09: 0000000000000069
R10: ffffffff8160d6ca R11: 0000000000000000 R12: ffa0000001f3d577
R13: ffffffffc0214058 R14: ffa00000017afdc0 R15: ffa0000001f3e518
FS: 00007f1c638654c0(0000) GS:ff1100089b7bc000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffa000000233d828 CR3: 000000010ba1f001 CR4: 0000000000771ef0
PKRU: 55555554
Kernel panic - not syncing: Fatal exception
Kernel Offset: disabled
This hasn't happened on BPF CI so far. However, I was able to
reproduce it on a particular x64 machine using a kernel built with
LLVM 20.
The crash happens on an attempt to load one of the BPF selftest modules
(tools/testing/selftests/bpf/test_kmods/bpf_test_modorder_x.ko) which
is used by kfunc_module_order test.
The reason for the crash is that simplify_symbols() doesn't check for
bounds of the ELF section index:
for (i = 1; i < symsec->sh_size / sizeof(Elf_Sym); i++) {
const char *name = info->strtab + sym[i].st_name;
switch (sym[i].st_shndx) {
case SHN_COMMON:
[...]
default:
/* Divert to percpu allocation if a percpu var. */
if (sym[i].st_shndx == info->index.pcpu)
secbase = (unsigned long)mod_percpu(mod);
else
/** HERE --> **/ secbase = info->sechdrs[sym[i].st_shndx].sh_addr;
sym[i].st_value += secbase;
break;
}
}
And in the case I was able to reproduce, the value 0xffff
(SHN_HIRESERVE aka SHN_XINDEX [2]) fell through here.
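The register dump above looks consistent with this. What follows is a
minimal sketch of the address arithmetic, not part of the patch. It
assumes RCX holds info->sechdrs and RAX holds the index scaled by
sizeof(Elf64_Shdr); that is an interpretation of the "c1 e0 06" and
"<48> 8b 44 01 10" bytes in the Code: line, not something checked
against the actual build. The helper name sh_addr_slot is hypothetical.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Address that info->sechdrs[shndx].sh_addr would be read from, given
 * the base address of the section header array. Pure arithmetic;
 * nothing is dereferenced. (Hypothetical helper, for illustration only.)
 */
static uint64_t sh_addr_slot(uint64_t sechdrs_base, uint16_t shndx)
{
	return sechdrs_base + (uint64_t)shndx * sizeof(Elf64_Shdr) +
	       offsetof(Elf64_Shdr, sh_addr);
}
```

With sechdrs_base = 0xffa0000001f3d858 (RCX) and shndx = 0xffff, the
scaled index is 0x3fffc0 (RAX) and the result is 0xffa000000233d828,
which is the faulting address reported in CR2.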
Now this code fragment is between 15 and 20 years old, so a kernel
module symbol is clearly not expected to have such an st_shndx
value. Even so, the kernel should probably fail to load the module
instead of crashing, which is what this patch attempts to fix.
Investigating further, I discovered that the module binary became
corrupted by `${OBJCOPY} --update-section` operation updating .BTF_ids
section data in scripts/gen-btf.sh. This explains how the bug has
surfaced after gen-btf.sh was introduced:
$ llvm-readelf -s --wide bpf_test_modorder_x.ko | grep 'BTF_ID'
llvm-readelf: warning: 'bpf_test_modorder_x.ko': found an extended symbol index (2), but unable to locate the extended symbol index table
llvm-readelf: warning: 'bpf_test_modorder_x.ko': found an extended symbol index (3), but unable to locate the extended symbol index table
llvm-readelf: warning: 'bpf_test_modorder_x.ko': found an extended symbol index (4), but unable to locate the extended symbol index table
3: 0000000000000000 16 NOTYPE LOCAL DEFAULT RSV[0xffff] __BTF_ID__set8__bpf_test_modorder_kfunc_x_ids
llvm-readelf: warning: 'bpf_test_modorder_x.ko': found an extended symbol index (16), but unable to locate the extended symbol index table
4: 0000000000000008 4 OBJECT LOCAL DEFAULT RSV[0xffff] __BTF_ID__func__bpf_test_modorder_retx__44417
vs expected
$ llvm-readelf -s --wide bpf_test_modorder_x.ko | grep 'BTF_ID'
3: 0000000000000000 16 NOTYPE LOCAL DEFAULT 6 __BTF_ID__set8__bpf_test_modorder_kfunc_x_ids
4: 0000000000000008 4 OBJECT LOCAL DEFAULT 6 __BTF_ID__func__bpf_test_modorder_retx__44417
But why? Updating section data without changing its size is not
supposed to affect section indices, right?
With a bit more testing I confirmed that this is an LLVM-specific
issue (it doesn't reproduce with a GCC kbuild), and it's not stable,
because in link-vmlinux.sh we also do:
${OBJCOPY} --update-section .BTF_ids=${btfids_vmlinux} ${VMLINUX}
However:
$ llvm-readelf -s --wide ~/workspace/prog-aux/linux/vmlinux | grep 0xffff
# no output, which is good
So the suspect is the implementation of llvm-objcopy. As it turns out
there is a relevant known bug that explains the flakiness and isn't
fixed yet [3].
[1] https://lore.kernel.org/bpf/20251219181825.1289460-3-ihor.solodrai@linux.dev/
[2] https://man7.org/linux/man-pages/man5/elf.5.html
[3] https://github.com/llvm/llvm-project/issues/168060#issuecomment-3533552952
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
---
RFC
Until this llvm-objcopy bug is fixed, we cannot trust it in the
kernel build pipeline. In the short term we have to come up with a
workaround for the .BTF_ids section update and replace the calls to
${OBJCOPY} --update-section with something else.
One potential workaround is to force the use of objcopy (from
binutils) instead of llvm-objcopy when updating the .BTF_ids section.
Alternatively, we could just dd the .BTF_ids data computed by
resolve_btfids at the right offset in the target ELF file.
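For reference, the dd-style in-place update could be sketched in C
roughly as below. This is only an illustration under assumptions: the
function name overwrite_at is hypothetical, the caller is assumed to
already know the section's file offset and size, and nothing here is
taken from resolve_btfids or the kbuild scripts.

```c
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Overwrite len bytes at byte offset off in the file at path, without
 * truncating it or touching any other bytes. Returns 0 on success and
 * -1 on error. (Hypothetical helper, for illustration only.)
 */
static int overwrite_at(const char *path, off_t off, const void *buf,
			size_t len)
{
	const unsigned char *p = buf;
	int fd = open(path, O_WRONLY);

	if (fd < 0)
		return -1;

	/* pwrite() may write fewer bytes than requested, so loop. */
	while (len > 0) {
		ssize_t n = pwrite(fd, p, len, off);

		if (n <= 0) {
			close(fd);
			return -1;
		}
		p += n;
		off += n;
		len -= (size_t)n;
	}
	return close(fd);
}
```

Because the write stays within the existing section data, the section
header table and symbol table are never rewritten, which is the
property the workaround would depend on.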
Surprisingly, I couldn't find a good way to read a section's offset and
size from the ELF in a well-specified format on the command line. Both
readelf and {llvm-}objdump give human-readable output, and it
appears we can't rely on the column order, for example.
We could still try parsing readelf output with awk/grep, covering
output variants that appear in the kernel build.
We can also do:
llvm-readobj --elf-output-style=JSON --sections "$elf" | \
jq -r --arg name .BTF_ids '
.[0].Sections[] |
select(.Section.Name.Name == $name) |
"\(.Section.Offset) \(.Section.Size)"'
...but idk man, doesn't feel right.
The most reliable way to determine the size and offset of the .BTF_ids
section is probably to read them with a C program that uses libelf, such
as resolve_btfids. Which is quite ironic, given the recent
changes. Setting the irony aside, we could add something like:
resolve_btfids --section-info=.BTF_ids $elf
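To give an idea of what such a lookup involves, here is a minimal
sketch. It deliberately avoids libelf and walks the section headers
with the <elf.h> definitions only, and it assumes a native-endian ELF64
image already read into memory with fewer than SHN_LORESERVE sections.
The --section-info option above and the find_section name below are
both hypothetical; this is not how resolve_btfids is implemented.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Look up a section by name in an in-memory ELF64 image of native byte
 * order. On success store its file offset and size and return 0;
 * return -1 if the image looks malformed or the section is absent.
 * Extended section numbering (e_shnum == 0) is not handled.
 */
static int find_section(const unsigned char *img, size_t len,
			const char *name, uint64_t *offset, uint64_t *size)
{
	Elf64_Ehdr eh;
	Elf64_Shdr strsec, sh;
	size_t want = strlen(name);
	uint16_t i;

	if (len < sizeof(eh))
		return -1;
	memcpy(&eh, img, sizeof(eh));
	if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
	    eh.e_ident[EI_CLASS] != ELFCLASS64 ||
	    eh.e_shentsize != sizeof(Elf64_Shdr))
		return -1;

	/* The whole section header table must lie inside the image. */
	if (eh.e_shoff > len ||
	    (uint64_t)eh.e_shnum * sizeof(sh) > len - eh.e_shoff)
		return -1;

	/* Section names live in the string table at index e_shstrndx. */
	if (eh.e_shstrndx >= eh.e_shnum)
		return -1;
	memcpy(&strsec, img + eh.e_shoff + (size_t)eh.e_shstrndx * sizeof(sh),
	       sizeof(sh));
	if (strsec.sh_offset > len || strsec.sh_size > len - strsec.sh_offset)
		return -1;

	for (i = 0; i < eh.e_shnum; i++) {
		memcpy(&sh, img + eh.e_shoff + (size_t)i * sizeof(sh),
		       sizeof(sh));
		/* Need room for the name plus its terminating NUL. */
		if (sh.sh_name >= strsec.sh_size ||
		    want >= strsec.sh_size - sh.sh_name)
			continue;
		if (memcmp(img + strsec.sh_offset + sh.sh_name, name,
			   want + 1) != 0)
			continue;
		*offset = sh.sh_offset;
		*size = sh.sh_size;
		return 0;
	}
	return -1;
}
```

A caller would read or mmap the module file, call
find_section(img, len, ".BTF_ids", &off, &size), and print the two
values in whatever fixed format the build scripts expect.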
Reverting the gen-btf.sh patch is also a possible workaround, but I'd
really like to avoid it, given that BPF features/optimizations in
development depend on it.
I'd appreciate comments and suggestions on this issue. Thank you!
---
kernel/module/main.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/kernel/module/main.c b/kernel/module/main.c
index 710ee30b3bea..5bf456fad63e 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -1568,6 +1568,13 @@ static int simplify_symbols(struct module *mod, const struct load_info *info)
break;
default:
+ if (sym[i].st_shndx >= info->hdr->e_shnum) {
+ pr_err("%s: Symbol %s has an invalid section index %u (max %u)\n",
+ mod->name, name, sym[i].st_shndx, info->hdr->e_shnum - 1);
+ ret = -ENOEXEC;
+ break;
+ }
+
/* Divert to percpu allocation if a percpu var. */
if (sym[i].st_shndx == info->index.pcpu)
secbase = (unsigned long)mod_percpu(mod);
--
2.52.0
^ permalink raw reply related
* Re: [PATCH] bpf: crypto: replace -EEXIST with -EBUSY
From: Alexei Starovoitov @ 2025-12-23 19:23 UTC (permalink / raw)
To: Vadim Fedorenko
Cc: Daniel Gomez, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
Hao Luo, Jiri Olsa, Luis Chamberlain, Petr Pavlu, Sami Tolvanen,
Aaron Tomlin, Lucas De Marchi, bpf, linux-modules, LKML,
Daniel Gomez
In-Reply-To: <47165c76-d856-4c5d-bf2d-6d5a7fe08d43@linux.dev>
On Sat, Dec 20, 2025 at 8:55 AM Vadim Fedorenko
<vadim.fedorenko@linux.dev> wrote:
>
> On 20/12/2025 03:48, Daniel Gomez wrote:
> > From: Daniel Gomez <da.gomez@samsung.com>
> >
> > The -EEXIST error code is reserved by the module loading infrastructure
> > to indicate that a module is already loaded. When a module's init
> > function returns -EEXIST, userspace tools like kmod interpret this as
> > "module already loaded" and treat the operation as successful, returning
> > 0 to the user even though the module initialization actually failed.
> >
> > This follows the precedent set by commit 54416fd76770 ("netfilter:
> > conntrack: helper: Replace -EEXIST by -EBUSY") which fixed the same
> > issue in nf_conntrack_helper_register().
> >
> > This affects bpf_crypto_skcipher module. While the configuration
> > required to build it as a module is unlikely in practice, it is
> > technically possible, so fix it for correctness.
> >
> > Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
> > ---
> > The error code -EEXIST is reserved by the kernel module loader to
> > indicate that a module with the same name is already loaded. When a
> > module's init function returns -EEXIST, kmod interprets this as "module
> > already loaded" and reports success instead of failure [1].
> >
> > The kernel module loader will include a safety net that converts -EEXIST
> > to -EBUSY with a warning [2], and a documentation patch has been sent to
> > prevent future occurrences [3].
> >
> > These affected code paths were identified using a static analysis tool
> > [4] that traces -EEXIST returns to module_init(). The tool was developed
> > with AI assistance and all findings were manually validated.
> >
> > Link: https://lore.kernel.org/all/aKEVQhJpRdiZSliu@orbyte.nwl.cc/ [1]
> > Link: https://lore.kernel.org/all/20251013-module-warn-ret-v1-0-ab65b41af01f@intel.com/ [2]
> > Link: https://lore.kernel.org/all/20251218-dev-module-init-eexists-modules-docs-v1-0-361569aa782a@samsung.com/ [3]
> > Link: https://gitlab.com/-/snippets/4913469 [4]
>
> Even though I'm not quite sure that we should care once the core
> module loader can adjust the error, the change looks ok to me:
>
> Acked-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Applied to bpf-next.
^ permalink raw reply
* Re: [PATCH v3] module: replace use of system_wq with system_dfl_wq
From: Marco Crivellari @ 2025-12-23 14:27 UTC (permalink / raw)
To: Sami Tolvanen
Cc: linux-kernel, linux-modules, Tejun Heo, Lai Jiangshan,
Frederic Weisbecker, Sebastian Andrzej Siewior, Michal Hocko,
Luis Chamberlain, Petr Pavlu
In-Reply-To: <176643400575.1902051.11698155532364546867.b4-ty@google.com>
On Mon, Dec 22, 2025 at 9:24 PM Sami Tolvanen <samitolvanen@google.com> wrote:
> Applied to modules-next, thanks!
>
> [1/1] module: replace use of system_wq with system_dfl_wq
> commit: 581ac2d4a58b81669cc6abf645a558bce5cf14ab
>
Many thanks!
--
Marco Crivellari
L3 Support Engineer
^ permalink raw reply
* Re: [PATCH] modules: moduleparam.h: add kernel-doc comments
From: Sami Tolvanen @ 2025-12-22 20:26 UTC (permalink / raw)
To: linux-kernel, Randy Dunlap
Cc: Sami Tolvanen, Luis Chamberlain, Petr Pavlu, Daniel Gomez,
linux-modules
In-Reply-To: <20251214202357.2208303-1-rdunlap@infradead.org>
On Sun, 14 Dec 2025 12:23:57 -0800, Randy Dunlap wrote:
> Add missing kernel-doc comments to prevent kernel-doc warnings:
>
> Warning: include/linux/moduleparam.h:364 function parameter 'arg' not
> described in '__core_param_cb'
> Warning: include/linux/moduleparam.h:395 No description found for return
> value of 'parameq'
> Warning: include/linux/moduleparam.h:405 No description found for return
> value of 'parameqn'
>
> [...]
Applied to modules-next, thanks!
[1/1] modules: moduleparam.h: fix kernel-doc comments
commit: b68758e6f4307179247126b7641fa7ba7109c820
Best regards,
Sami
^ permalink raw reply
* Re: [PATCH] gendwarfksyms: Fix build on 32-bit hosts
From: Sami Tolvanen @ 2025-12-22 20:24 UTC (permalink / raw)
To: linux-modules, Sami Tolvanen
Cc: Luis Chamberlain, Petr Pavlu, Daniel Gomez, linux-kbuild,
linux-kernel, Michal Suchánek
In-Reply-To: <20251117203806.970840-2-samitolvanen@google.com>
On Mon, 17 Nov 2025 20:38:07 +0000, Sami Tolvanen wrote:
> We have interchangeably used unsigned long for some of the types
> defined in elfutils, assuming they're always 64-bit. This obviously
> fails when building gendwarfksyms on 32-bit hosts. Fix the types.
>
>
Applied to modules-next, thanks!
[1/1] gendwarfksyms: Fix build on 32-bit hosts
commit: ddc54f912a551f6eb0bbcfc3880f45fe27a252cb
Best regards,
Sami
^ permalink raw reply
* Re: [PATCH] params: Replace __modinit with __init_or_module
From: Sami Tolvanen @ 2025-12-22 20:24 UTC (permalink / raw)
To: Luis Chamberlain, Daniel Gomez, Petr Pavlu
Cc: Sami Tolvanen, Shyam Saini, Rasmus Villemoes, linux-modules,
linux-kernel
In-Reply-To: <20250819121248.460105-1-petr.pavlu@suse.com>
On Tue, 19 Aug 2025 14:12:09 +0200, Petr Pavlu wrote:
> Remove the custom __modinit macro from kernel/params.c and instead use the
> common __init_or_module macro from include/linux/module.h. Both provide the
> same functionality.
>
>
Applied to modules-next, thanks!
[1/1] params: Replace __modinit with __init_or_module
commit: 3cb0c3bdea5388519bc1bf575dca6421b133302b
Best regards,
Sami
^ permalink raw reply
* Re: [PATCH] module: Remove unused __INIT*_OR_MODULE macros
From: Sami Tolvanen @ 2025-12-22 20:24 UTC (permalink / raw)
To: Luis Chamberlain, Daniel Gomez, Petr Pavlu
Cc: Sami Tolvanen, linux-modules, linux-kernel
In-Reply-To: <20250819121423.460156-1-petr.pavlu@suse.com>
On Tue, 19 Aug 2025 14:13:37 +0200, Petr Pavlu wrote:
> Remove the __INIT_OR_MODULE, __INITDATA_OR_MODULE and
> __INITRODATA_OR_MODULE macros. These were introduced in commit 8b5a10fc6fd0
> ("x86: properly annotate alternatives.c"). Only __INITRODATA_OR_MODULE was
> ever used, in arch/x86/kernel/alternative.c. In 2011, commit dc326fca2b64
> ("x86, cpu: Clean up and unify the NOP selection infrastructure") removed
> this usage.
>
> [...]
Applied to modules-next, thanks!
[1/1] module: Remove unused __INIT*_OR_MODULE macros
commit: f13bff1b6d55de341f37a24781df5a1253377db3
Best regards,
Sami
^ permalink raw reply