public inbox for linux-modules@vger.kernel.org


* Re: [PATCH v4 7/7] kernel.h: drop trace_printk.h
From: Steven Rostedt @ 2025-12-30 16:46 UTC (permalink / raw)
  To: Yury Norov
  Cc: Mathieu Desnoyers, Andy Shevchenko, Andrew Morton,
	Masami Hiramatsu, Christophe Leroy, Randy Dunlap, Ingo Molnar,
	Jani Nikula, Joonas Lahtinen, David Laight, Petr Pavlu,
	Andi Shyti, Rodrigo Vivi, Tvrtko Ursulin, Daniel Gomez,
	Greg Kroah-Hartman, Rafael J. Wysocki, Danilo Krummrich,
	linux-kernel, intel-gfx, dri-devel, linux-modules,
	linux-trace-kernel
In-Reply-To: <aVP7XVtYwb4YV9gy@yury>

On Tue, 30 Dec 2025 11:18:37 -0500
Yury Norov <yury.norov@gmail.com> wrote:

> On Tue, Dec 30, 2025 at 09:21:00AM -0500, Mathieu Desnoyers wrote:
> > On 2025-12-30 03:55, Andy Shevchenko wrote:  
> > > On Mon, Dec 29, 2025 at 05:25:08PM -0500, Mathieu Desnoyers wrote:
> > > 
> > > ...
> > >   
> > > > One possible compromise would be to move it to its own header file,
> > > > and introduce a CONFIG_TRACE_PRINTK Kconfig option (default Y) that
> > > > would surround an include from linux/kernel.h with a preprocessor
> > > > conditional.  
> 
> We already have CONFIG_TRACING, and everything in the new
> trace_printk.h is conditional on it. We can protect the header in
> kernel.h with the same config.

Tracing is used in production all the time. So I think we can have a new
config just for trace_printk(). I was actually thinking of adding a
CONFIG_HIDE_TRACE_PRINTK, with the description of:

  trace_printk() is an extremely powerful utility to debug and develop
  kernel code. It is defined in kernel.h so that it can be easily accessed
  during development or when debugging existing code.

  But trace_printk() is not to be included in the final result, and having
  it in kernel.h during normal builds where the builder has no plans of
  debugging the kernel causes wasted cycles and time in compiling the kernel.

  By saying yes here, the include of trace_printk() macros will be hidden
  from kernel.h and help speed up the compile.

  If you do not plan on debugging this kernel, say Y

And then have in kernel.h:

#ifndef CONFIG_HIDE_TRACE_PRINTK
# include <linux/trace_printk.h>
#endif

This also means it gets set for allyesconfig builds, which I doubt anyone
wants to debug anyway.
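
A minimal userspace sketch of how such a gate behaves, assuming the option name proposed above. The macro body and the names TRACE_PRINTK_VISIBLE, trace_printk_visible() and debug_probe() are hypothetical stand-ins for illustration, not kernel code:

```c
#include <stdio.h>

/*
 * Userspace stand-in for the proposed gate in kernel.h. In the kernel the
 * guarded line would be "# include <linux/trace_printk.h>"; here a trivial
 * macro plays the role of the header's contents (assumption for this sketch).
 */
#ifndef CONFIG_HIDE_TRACE_PRINTK
# define trace_printk(fmt, ...) printf("[trace] " fmt, ##__VA_ARGS__)
# define TRACE_PRINTK_VISIBLE 1
#else
# define TRACE_PRINTK_VISIBLE 0
#endif

/* Returns 1 when trace_printk() is visible to code that includes the gate. */
int trace_printk_visible(void)
{
	return TRACE_PRINTK_VISIBLE;
}

/* A debug call site that only uses trace_printk() when it is visible. */
int debug_probe(int value)
{
#if TRACE_PRINTK_VISIBLE
	trace_printk("value=%d\n", value);
#endif
	return value;
}
```

Building the same file with -DCONFIG_HIDE_TRACE_PRINTK removes the macro, so an unguarded trace_printk() call would then fail to compile until the header is included explicitly.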

> 
> > > > But please make sure the default stays as it is today:
> > > > include the trace printk header by default.  
> > > 
> > > "by default" where exactly?  
> 
> Seemingly nowhere.
> 
> > > The problem is that kernel.h is a total mess and
> > > it's included in a lot of mysterious ways (indirectly),  
> 
> Yes!
> 
> > > and in C you _must_
> > > include a header anyway for a custom API, just define *which* one.  
> >
> > This patch series moves the guts of trace_printk into its own header
> > file, which reduces clutter. So that's already progress. :)
> >   
> > > 
> > > Based on the Steven's first replies I see a compromise in having it inside
> > > printk.h. If you want to debug something with printf() (in general) the same
> > > header should provide all species. Do you agree?  
>  
> It may sound logical, but I don't like this idea. Printk() is used
> for debugging by everyone, but its main goal is to communicate to
> userspace and between different parts of the kernel. Notice how all
> debugging and development API in linux/printk.h is protected with the
> corresponding ifdefery. 
> 
> Contrary to that, trace_printk() is a purely debugging feature. There's
> no use for it after the debugging is done. (Or I missed something?)

I actually agree with you here. I don't think adding trace_printk.h into
printk.h is appropriate. I only said that anywhere you can add a printk()
for debugging, you should also be able to add trace_printk(). I believe
kernel.h is the appropriate place for both.

> 
> Everyone admits that kernel.h is a mess. Particularly, it's a mess of
> development and production features. So, moving trace_printk() from an
> already messy kernel.h to a less messy printk.h - to me it looks like
> spreading the mess.
> 
> > I don't have a strong opinion about including trace_printk.h from either
> > kernel.h or printk.h. As long as it's still included by a default kernel
> > config the same way it has been documented/used since 2009.  
> 
> Can you please point to the documentation and quote the exact piece
> stating that? Git history points to the commit 40ada30f9621f from Ingo
> that decouples tracers from DEBUG_KERNEL, and the following 422d3c7a577
> from Kosaki that force-enables the new TRACING_SUPPORT regardless of
> the DEBUG_KERNEL state.
> 
> To me, decoupling tracing from DEBUG_KERNEL looks accidental rather than
> intentional. So maybe simply restore that dependency?

Absolutely not. Tracing is used to debug production kernels, and things
like live kernel patching also depend on it, not to mention BPF.

> 
> Currently, even with tinyconfig, DEBUG_KERNEL is enabled (via EXPERT).
> And even if EXPERT and DEBUG_KERNEL are off, tracers are still enabled.
> This doesn't look right...

Looks fine to me.

-- Steve

* Re: [PATCH v4 7/7] kernel.h: drop trace_printk.h
From: Yury Norov @ 2025-12-30 16:18 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Andy Shevchenko, Steven Rostedt, Andrew Morton, Masami Hiramatsu,
	Christophe Leroy, Randy Dunlap, Ingo Molnar, Jani Nikula,
	Joonas Lahtinen, David Laight, Petr Pavlu, Andi Shyti,
	Rodrigo Vivi, Tvrtko Ursulin, Daniel Gomez, Greg Kroah-Hartman,
	Rafael J. Wysocki, Danilo Krummrich, linux-kernel, intel-gfx,
	dri-devel, linux-modules, linux-trace-kernel
In-Reply-To: <71767aa7-0247-4bcc-8746-3338905197b3@efficios.com>

On Tue, Dec 30, 2025 at 09:21:00AM -0500, Mathieu Desnoyers wrote:
> On 2025-12-30 03:55, Andy Shevchenko wrote:
> > On Mon, Dec 29, 2025 at 05:25:08PM -0500, Mathieu Desnoyers wrote:
> > 
> > ...
> > 
> > > One possible compromise would be to move it to its own header file,
> > > and introduce a CONFIG_TRACE_PRINTK Kconfig option (default Y) that
> > > would surround an include from linux/kernel.h with a preprocessor
> > > conditional.

We already have CONFIG_TRACING, and everything in the new
trace_printk.h is conditional on it. We can protect the header in
kernel.h with the same config.

> > > But please make sure the default stays as it is today:
> > > include the trace printk header by default.
> > 
> > "by default" where exactly?

Seemingly nowhere.

> > The problem is that kernel.h is a total mess and
> > it's included in a lot of mysterious ways (indirectly),

Yes!

> > and in C you _must_
> > include a header anyway for a custom API, just define *which* one.
>
> This patch series moves the guts of trace_printk into its own header
> file, which reduces clutter. So that's already progress. :)
> 
> > 
> > Based on the Steven's first replies I see a compromise in having it inside
> > printk.h. If you want to debug something with printf() (in general) the same
> > header should provide all species. Do you agree?

It may sound logical, but I don't like this idea. Printk() is used
for debugging by everyone, but its main goal is to communicate to
userspace and between different parts of the kernel. Notice how all
debugging and development API in linux/printk.h is protected with the
corresponding ifdefery. 

Contrary to that, trace_printk() is a purely debugging feature. There's
no use for it after the debugging is done. (Or I missed something?)

Everyone admits that kernel.h is a mess. Particularly, it's a mess of
development and production features. So, moving trace_printk() from an
already messy kernel.h to a less messy printk.h - to me it looks like
spreading the mess.

> I don't have a strong opinion about including trace_printk.h from either
> kernel.h or printk.h. As long as it's still included by a default kernel
> config the same way it has been documented/used since 2009.

Can you please point to the documentation and quote the exact piece
stating that? Git history points to the commit 40ada30f9621f from Ingo
that decouples tracers from DEBUG_KERNEL, and the following 422d3c7a577
from Kosaki that force-enables the new TRACING_SUPPORT regardless of
the DEBUG_KERNEL state.

To me, decoupling tracing from DEBUG_KERNEL looks accidental rather than
intentional. So maybe simply restore that dependency?

Currently, even with tinyconfig, DEBUG_KERNEL is enabled (via EXPERT).
And even if EXPERT and DEBUG_KERNEL are off, tracers are still enabled.
This doesn't look right...

Thanks,
Yury

* Re: [PATCH] module: show module version directly in print_modules()
From: Aaron Tomlin @ 2025-12-30 16:10 UTC (permalink / raw)
  To: Yafang Shao; +Cc: Petr Pavlu, mcgrof, da.gomez, samitolvanen, linux-modules
In-Reply-To: <CALOAHbBF_Q02amBXKh_iGPepp_-CbF91a-HAXa3pSnO4qBnX4Q@mail.gmail.com>

On Tue, Dec 30, 2025 at 10:12:09PM +0800, Yafang Shao wrote:
> > As mentioned, most in-tree modules do not specify an explicit version,
> > so in terms of bloating the information about loaded modules, the patch
> > should have minimal impact in practice. Alternatively, the version
> > information could be printed only for external modules.
> 
> Good suggestion.
> I believe it’s sufficient to print only for external modules.
> 
> Does the following change look good to you?
> 
> --- a/kernel/module/main.c
> +++ b/kernel/module/main.c
> @@ -3901,7 +3901,10 @@ void print_modules(void)
>         list_for_each_entry_rcu(mod, &modules, list) {
>                 if (mod->state == MODULE_STATE_UNFORMED)
>                         continue;
> -               pr_cont(" %s%s", mod->name, module_flags(mod, buf, true));
> +               pr_cont(" %s", mod->name);
> +               if (mod->version && test_bit(TAINT_OOT_MUDLE, &mod->taints))
> +                       pr_cont("-%s", mod->version);
> +               pr_cont("%s", module_flags(mod, buf, true));
>         }
> 
>         print_unloaded_tainted_modules();
> 

Hi Yafang,


This refined approach is significantly more palatable and addresses the
primary concerns regarding log bloat. By gating the version output behind
the TAINT_OOT_MODULE bit, we strike an excellent balance between
operational necessity and kernel log cleanliness.

From a maintenance perspective, this is a much "tidier" solution. In-tree
modules are tied to the specific kernel version already, so printing their
versions is redundant. However, for external drivers (like proprietary
networking or GPU stacks), the version is the single most critical piece of
metadata for triage.

The logic is sound, though there is a minor typo in the bit name that will
cause a build failure. Here is the corrected implementation:

@@ -3901,7 +3901,10 @@ void print_modules(void)
 	list_for_each_entry_rcu(mod, &modules, list) {
 		if (mod->state == MODULE_STATE_UNFORMED)
 			continue;
-		pr_cont(" %s%s", mod->name, module_flags(mod, buf, true));
+		pr_cont(" %s", mod->name);
+		/* Only append version for out-of-tree modules */
+		if (mod->version && test_bit(TAINT_OOT_MODULE, &mod->taints))
+			pr_cont("-%s", mod->version);
+		pr_cont("%s", module_flags(mod, buf, true));
 	}
 
 	print_unloaded_tainted_modules();
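
The gating logic can be exercised outside the kernel with a small sketch. It assumes a simplified module record and omits the flags suffix; struct fake_module, format_module_entry() and the bit number chosen for TAINT_OOT_MODULE are illustrative assumptions, not the kernel's definitions:

```c
#include <stdio.h>

#define TAINT_OOT_MODULE 12	/* illustrative bit number for this sketch */

/* Hypothetical, simplified stand-in for the fields print_modules() reads. */
struct fake_module {
	const char *name;
	const char *version;	/* NULL when no MODULE_VERSION() was set */
	unsigned long taints;
};

/*
 * Formats " name" or " name-version" into buf. The version is appended only
 * when one exists and the out-of-tree taint bit is set. Returns the value
 * reported by snprintf().
 */
int format_module_entry(char *buf, size_t len, const struct fake_module *mod)
{
	int oot = (int)((mod->taints >> TAINT_OOT_MODULE) & 1UL);

	if (mod->version && oot)
		return snprintf(buf, len, " %s-%s", mod->name, mod->version);
	return snprintf(buf, len, " %s", mod->name);
}
```

With this sketch, an in-tree module that carries a version string still prints as a bare name, which matches the intent of limiting the extra output to external modules.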


Reviewed-by: Aaron Tomlin <atomlin@atomlin.com>


Kind regards,
-- 
Aaron Tomlin

* Re: [PATCH v4 7/7] kernel.h: drop trace_printk.h
From: Mathieu Desnoyers @ 2025-12-30 14:21 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Steven Rostedt, Andrew Morton, Yury Norov (NVIDIA),
	Masami Hiramatsu, Christophe Leroy, Randy Dunlap, Ingo Molnar,
	Jani Nikula, Joonas Lahtinen, David Laight, Petr Pavlu,
	Andi Shyti, Rodrigo Vivi, Tvrtko Ursulin, Daniel Gomez,
	Greg Kroah-Hartman, Rafael J. Wysocki, Danilo Krummrich,
	linux-kernel, intel-gfx, dri-devel, linux-modules,
	linux-trace-kernel
In-Reply-To: <aVOTbArAxmbT5LY9@smile.fi.intel.com>

On 2025-12-30 03:55, Andy Shevchenko wrote:
> On Mon, Dec 29, 2025 at 05:25:08PM -0500, Mathieu Desnoyers wrote:
> 
> ...
> 
>> One possible compromise would be to move it to its own header file,
>> and introduce a CONFIG_TRACE_PRINTK Kconfig option (default Y) that
>> would surround an include from linux/kernel.h with a preprocessor
>> conditional. But please make sure the default stays as it is today:
>> include the trace printk header by default.
> 
> "by default" where exactly? The problem is that kernel.h is a total mess and
> it's included in a lot of mysterious ways (indirectly), and in C you _must_
> include a header anyway for a custom API, just define *which* one.

This patch series moves the guts of trace_printk into its own header
file, which reduces clutter. So that's already progress. :)

> 
> Based on the Steven's first replies I see a compromise in having it inside
> printk.h. If you want to debug something with printf() (in general) the same
> header should provide all species. Do you agree?

I don't have a strong opinion about including trace_printk.h from either
kernel.h or printk.h. As long as it's still included by a default kernel
config the same way it has been documented/used since 2009.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com

* Re: [PATCH] module: show module version directly in print_modules()
From: Yafang Shao @ 2025-12-30 14:12 UTC (permalink / raw)
  To: Petr Pavlu; +Cc: mcgrof, da.gomez, samitolvanen, atomlin, linux-modules
In-Reply-To: <971b1fd7-5702-4cf7-ba84-aedde0296449@suse.com>

On Tue, Dec 30, 2025 at 8:41 PM Petr Pavlu <petr.pavlu@suse.com> wrote:
>
> On 12/29/25 3:45 AM, Yafang Shao wrote:
> > We maintain a vmcore analysis script on each server that automatically
> > parses /var/crash/XXXX/vmcore-dmesg.txt to categorize vmcores. This helps
> > us save considerable effort by avoiding analysis of known bugs.
> >
> > For vmcores triggered by a driver bug, the system calls print_modules() to
> > list the loaded modules. However, print_modules() does not output module
> > version information. Across a large fleet of servers, there are often many
> > different module versions running simultaneously, and we need to know which
> > driver version caused a given vmcore.
> >
> > Currently, the only reliable way to obtain the module version associated
> > with a vmcore is to analyze the /var/crash/XXXX/vmcore file itself—an
> > operation that is resource-intensive. Therefore, we propose printing the
> > driver version directly in the log, which is far more efficient.
> >
> > - Before this patch
> >
> >   Modules linked in: xfs nvidia(PO) nvme_core mlx_compat(O)
> >   Unloaded tainted modules: nvidia_peermem(PO):1
> >
> > - After this patch
> >
> >   Modules linked in: xfs nvidia-535.274.02(PO) nvme_core-1.0 mlx_compat(O)
> >   Unloaded tainted modules: nvidia_peermem(PO):1
>
> I feel that module versions are not particularly useful for in-tree
> modules nowadays. They rarely change and therefore provide little
> information about what code is actually running.
>
> This is supported by their limited use in the kernel. In v6.19-rc3,
> I see the following:
>
> $ git grep '^MODULE_LICENSE(.*);$' | wc -l
> 12481
> $ git grep '^MODULE_VERSION(.*);$' | wc -l
> 605
>
> Moreover, in the event of a crash, the log should contain the kernel
> version and usually also the vmlinux build ID, which should provide
> enough information to identify in-tree modules.
>
> However, based on the example in your patch description, it seems to me
> that your main interest is likely in identifying external modules. If
> that is correct, I see why it might be helpful to quickly identify their
> versions.

That's correct.
The motivation behind this change is that the external NVIDIA driver
[0] frequently causes kernel panics across our server fleet.
While we continuously upgrade to newer NVIDIA driver versions,
upgrading the entire fleet is time‑consuming.
Therefore, we need to identify which driver version is responsible for
each panic.

[0] https://github.com/NVIDIA/open-gpu-kernel-modules/tags

> This assumes that developers of external modules actually
> update MODULE_VERSION() in their releases, but I don't know if this is
> generally true.

The external modules we currently use include the NVIDIA GPU driver,
the mlx5 network driver, and related drivers such as gdrcopy.
All of them carry module versions, such as:

  gdrdrv-2.5(PO)
  nvidia-535.274.02(PO)
  nvidia_uvm-535.274.02(PO)
  nvidia_drm-535.274.02(PO)
  mlx5_core-5.8-2.0.3(O)
  nvidia_modeset-535.274.02(PO)
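
A triage script consuming entries in this format could split each token roughly as follows. This is a sketch under the assumption that module names never contain '-', so the first '-' starts the version; struct mod_token and parse_mod_token() are hypothetical names:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the parts of one "Modules linked in:" token. */
struct mod_token {
	char name[64];
	char version[64];
	char flags[16];
};

/*
 * Splits "name[-version][(flags)]" into its parts. Assumes the first '-'
 * before any '(' separates the name from the version; the version itself
 * may contain further '-' characters (e.g. "5.8-2.0.3").
 */
void parse_mod_token(const char *tok, struct mod_token *out)
{
	const char *paren = strchr(tok, '(');
	const char *end = paren ? paren : tok + strlen(tok);
	const char *dash = memchr(tok, '-', (size_t)(end - tok));
	const char *name_end = dash ? dash : end;

	snprintf(out->name, sizeof(out->name), "%.*s",
		 (int)(name_end - tok), tok);

	out->version[0] = '\0';
	if (dash)
		snprintf(out->version, sizeof(out->version), "%.*s",
			 (int)(end - dash - 1), dash + 1);

	out->flags[0] = '\0';
	if (paren) {
		const char *close = strchr(paren, ')');

		if (close)
			snprintf(out->flags, sizeof(out->flags), "%.*s",
				 (int)(close - paren - 1), paren + 1);
	}
}
```

Tokens without a version, such as a plain in-tree module name, come back with an empty version string, so existing name-based matching keeps working.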

>
> As mentioned, most in-tree modules do not specify an explicit version,
> so in terms of bloating the information about loaded modules, the patch
> should have minimal impact in practice. Alternatively, the version
> information could be printed only for external modules.

Good suggestion.
I believe it’s sufficient to print only for external modules.

Does the following change look good to you?

--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -3901,7 +3901,10 @@ void print_modules(void)
        list_for_each_entry_rcu(mod, &modules, list) {
                if (mod->state == MODULE_STATE_UNFORMED)
                        continue;
-               pr_cont(" %s%s", mod->name, module_flags(mod, buf, true));
+               pr_cont(" %s", mod->name);
+               if (mod->version && test_bit(TAINT_OOT_MUDLE, &mod->taints))
+                       pr_cont("-%s", mod->version);
+               pr_cont("%s", module_flags(mod, buf, true));
        }

        print_unloaded_tainted_modules();


-- 
Regards
Yafang

* Re: [PATCH] module: show module version directly in print_modules()
From: Petr Pavlu @ 2025-12-30 12:41 UTC (permalink / raw)
  To: Yafang Shao; +Cc: mcgrof, da.gomez, samitolvanen, atomlin, linux-modules
In-Reply-To: <20251229024556.25946-1-laoar.shao@gmail.com>

On 12/29/25 3:45 AM, Yafang Shao wrote:
> We maintain a vmcore analysis script on each server that automatically
> parses /var/crash/XXXX/vmcore-dmesg.txt to categorize vmcores. This helps
> us save considerable effort by avoiding analysis of known bugs.
> 
> For vmcores triggered by a driver bug, the system calls print_modules() to
> list the loaded modules. However, print_modules() does not output module
> version information. Across a large fleet of servers, there are often many
> different module versions running simultaneously, and we need to know which
> driver version caused a given vmcore.
> 
> Currently, the only reliable way to obtain the module version associated
> with a vmcore is to analyze the /var/crash/XXXX/vmcore file itself—an
> operation that is resource-intensive. Therefore, we propose printing the
> driver version directly in the log, which is far more efficient.
> 
> - Before this patch
> 
>   Modules linked in: xfs nvidia(PO) nvme_core mlx_compat(O)
>   Unloaded tainted modules: nvidia_peermem(PO):1
> 
> - After this patch
> 
>   Modules linked in: xfs nvidia-535.274.02(PO) nvme_core-1.0 mlx_compat(O)
>   Unloaded tainted modules: nvidia_peermem(PO):1

I feel that module versions are not particularly useful for in-tree
modules nowadays. They rarely change and therefore provide little
information about what code is actually running.

This is supported by their limited use in the kernel. In v6.19-rc3,
I see the following:

$ git grep '^MODULE_LICENSE(.*);$' | wc -l
12481
$ git grep '^MODULE_VERSION(.*);$' | wc -l
605

Moreover, in the event of a crash, the log should contain the kernel
version and usually also the vmlinux build ID, which should provide
enough information to identify in-tree modules.

However, based on the example in your patch description, it seems to me
that your main interest is likely in identifying external modules. If
that is correct, I see why it might be helpful to quickly identify their
versions. This assumes that developers of external modules actually
update MODULE_VERSION() in their releases, but I don't know if this is
generally true.

As mentioned, most in-tree modules do not specify an explicit version,
so in terms of bloating the information about loaded modules, the patch
should have minimal impact in practice. Alternatively, the version
information could be printed only for external modules.

-- 
Thanks,
Petr

* Re: [RFC PATCH v1] module: Fix kernel panic when a symbol st_shndx is out of bounds
From: Petr Pavlu @ 2025-12-30  9:14 UTC (permalink / raw)
  To: Ihor Solodrai
  Cc: Luis Chamberlain, Daniel Gomez, Sami Tolvanen, Nathan Chancellor,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, linux-kernel, linux-modules,
	bpf, linux-kbuild, llvm
In-Reply-To: <20251224005752.201911-1-ihor.solodrai@linux.dev>

On 12/24/25 1:57 AM, Ihor Solodrai wrote:
> [...]
> ---
>  kernel/module/main.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/kernel/module/main.c b/kernel/module/main.c
> index 710ee30b3bea..5bf456fad63e 100644
> --- a/kernel/module/main.c
> +++ b/kernel/module/main.c
> @@ -1568,6 +1568,13 @@ static int simplify_symbols(struct module *mod, const struct load_info *info)
>  			break;
>  
>  		default:
> +			if (sym[i].st_shndx >= info->hdr->e_shnum) {
> +				pr_err("%s: Symbol %s has an invalid section index %u (max %u)\n",
> +				       mod->name, name, sym[i].st_shndx, info->hdr->e_shnum - 1);
> +				ret = -ENOEXEC;
> +				break;
> +			}
> +
>  			/* Divert to percpu allocation if a percpu var. */
>  			if (sym[i].st_shndx == info->index.pcpu)
>  				secbase = (unsigned long)mod_percpu(mod);

The module loader should always at least get through the signature and
blacklist checks without crashing due to a corrupted ELF file. After
that point, the module content is to be trusted, but we try to error out
for most issues that would cause problems later on.

In this specific case, I think it is useful to add this check because
the code potentially crashes on a valid module that uses SHN_XINDEX. The
loader already rejects sh_link and sh_info values that are above e_shnum
in several places, so the patch is consistent with that behavior.
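
The check itself reduces to a single comparison, sketched here in userspace. The structs are simplified stand-ins that carry only the two fields involved, and check_sym_shndx() is a hypothetical helper name:

```c
#include <errno.h>
#include <stdint.h>

/* Simplified stand-ins for the ELF header and symbol fields used here. */
struct sketch_ehdr {
	uint16_t e_shnum;	/* number of section headers */
};

struct sketch_sym {
	uint16_t st_shndx;	/* section index the symbol refers to */
};

/*
 * Mirrors the default-case check from the patch: a section index at or
 * beyond e_shnum cannot be used to look up a section, so reject it.
 * Returns 0 when the index is usable, -ENOEXEC otherwise.
 */
int check_sym_shndx(const struct sketch_ehdr *hdr, const struct sketch_sym *sym)
{
	if (sym->st_shndx >= hdr->e_shnum)
		return -ENOEXEC;
	return 0;
}
```

A symbol whose st_shndx is 0xffff (the SHN_XINDEX escape value) is rejected by the same comparison, which covers the valid-module case mentioned above.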

I suggest adding a proper commit description and sending a non-RFC
version.

-- 
Thanks,
Petr

* Re: [PATCH v4 7/7] kernel.h: drop trace_printk.h
From: Andy Shevchenko @ 2025-12-30  8:55 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Steven Rostedt, Andrew Morton, Yury Norov (NVIDIA),
	Masami Hiramatsu, Christophe Leroy, Randy Dunlap, Ingo Molnar,
	Jani Nikula, Joonas Lahtinen, David Laight, Petr Pavlu,
	Andi Shyti, Rodrigo Vivi, Tvrtko Ursulin, Daniel Gomez,
	Greg Kroah-Hartman, Rafael J. Wysocki, Danilo Krummrich,
	linux-kernel, intel-gfx, dri-devel, linux-modules,
	linux-trace-kernel
In-Reply-To: <9833cb61-1ec5-4cc1-ad9d-3e07f3deff80@efficios.com>

On Mon, Dec 29, 2025 at 05:25:08PM -0500, Mathieu Desnoyers wrote:

...

> One possible compromise would be to move it to its own header file,
> and introduce a CONFIG_TRACE_PRINTK Kconfig option (default Y) that
> would surround an include from linux/kernel.h with a preprocessor
> conditional. But please make sure the default stays as it is today:
> include the trace printk header by default.

"by default" where exactly? The problem is that kernel.h is a total mess and
it's included in a lot of mysterious ways (indirectly), and in C you _must_
include a header anyway for a custom API, just define *which* one.

Based on the Steven's first replies I see a compromise in having it inside
printk.h. If you want to debug something with printf() (in general) the same
header should provide all species. Do you agree?

-- 
With Best Regards,
Andy Shevchenko



* Re: [PATCH] module: show module version directly in print_modules()
From: Yafang Shao @ 2025-12-30  3:58 UTC (permalink / raw)
  To: Aaron Tomlin; +Cc: mcgrof, petr.pavlu, da.gomez, samitolvanen, linux-modules
In-Reply-To: <bdp2iitkjdmhl4ycfiu6d4sri3mmsqn2dd26p67heilu33bosv@zmkzcnyayqbt>

On Tue, Dec 30, 2025 at 11:11 AM Aaron Tomlin <atomlin@atomlin.com> wrote:
>
> On Mon, Dec 29, 2025 at 10:45:56AM +0800, Yafang Shao wrote:
> > We maintain a vmcore analysis script on each server that automatically
> > parses /var/crash/XXXX/vmcore-dmesg.txt to categorize vmcores. This helps
> > us save considerable effort by avoiding analysis of known bugs.
> >
> > For vmcores triggered by a driver bug, the system calls print_modules() to
> > list the loaded modules. However, print_modules() does not output module
> > version information. Across a large fleet of servers, there are often many
> > different module versions running simultaneously, and we need to know which
> > driver version caused a given vmcore.
> >
> > Currently, the only reliable way to obtain the module version associated
> > with a vmcore is to analyze the /var/crash/XXXX/vmcore file itself—an
> > operation that is resource-intensive. Therefore, we propose printing the
> > driver version directly in the log, which is far more efficient.
> >
> > - Before this patch
> >
> > >   Modules linked in: xfs nvidia(PO) nvme_core mlx_compat(O)
> > >   Unloaded tainted modules: nvidia_peermem(PO):1
> > >
> > > - After this patch
> > >
> > >   Modules linked in: xfs nvidia-535.274.02(PO) nvme_core-1.0 mlx_compat(O)
> > >   Unloaded tainted modules: nvidia_peermem(PO):1
> >
> > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > ---
> >  kernel/module/main.c | 5 ++++-
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/module/main.c b/kernel/module/main.c
> > index 710ee30b3bea..1ad9afec8730 100644
> > --- a/kernel/module/main.c
> > +++ b/kernel/module/main.c
> > @@ -3901,7 +3901,10 @@ void print_modules(void)
> >       list_for_each_entry_rcu(mod, &modules, list) {
> >               if (mod->state == MODULE_STATE_UNFORMED)
> >                       continue;
> > -             pr_cont(" %s%s", mod->name, module_flags(mod, buf, true));
> > +             pr_cont(" %s", mod->name);
> > +             if (mod->version)
> > +                     pr_cont("-%s", mod->version);
> > +             pr_cont("%s", module_flags(mod, buf, true));
> >       }
> >
> >       print_unloaded_tainted_modules();
> > --
> > 2.43.5
> >
>
> Hi Yafang,
>
> While I certainly appreciate the operational burden of managing a
> large-scale fleet and the desire to automate crash triage, I am somewhat
> hesitant to support this change in its current form.
>
> Perhaps the more appropriate approach would be to extend the existing
> module information infrastructure to include the version only when it is
> explicitly requested: introduce print_module_versions().

Isn't that redundant since print_modules() already outputs module names?

>
> In my view, while the requirement for better version visibility is valid,
> we must ensure that the change does not compromise the readability of the
> crash report for the rest of the community.

I understand your concern, but could you elaborate on the potential
troubles? The extraction is straightforward with simple text
processing.

$ cat vmcore-dmesg.txt | awk -F': ' '/Modules linked in:/{gsub(/\([^)]*\)/, "", $2); n=split($2,a," "); for(i=1;i<=n;i++) if(a[i]!="") print a[i]}'

Besides, kernel logs aren't an ABI—developers are expected to adapt to
upstream changes. Otherwise, the kernel itself would become
unmaintainable.


>
> Nacked-by: Aaron Tomlin <atomlin@atomlin.com>


-- 
Regards
Yafang

* Re: [PATCH] module: show module version directly in print_modules()
From: Aaron Tomlin @ 2025-12-30  3:11 UTC (permalink / raw)
  To: Yafang Shao; +Cc: mcgrof, petr.pavlu, da.gomez, samitolvanen, linux-modules
In-Reply-To: <20251229024556.25946-1-laoar.shao@gmail.com>

On Mon, Dec 29, 2025 at 10:45:56AM +0800, Yafang Shao wrote:
> We maintain a vmcore analysis script on each server that automatically
> parses /var/crash/XXXX/vmcore-dmesg.txt to categorize vmcores. This helps
> us save considerable effort by avoiding analysis of known bugs.
> 
> For vmcores triggered by a driver bug, the system calls print_modules() to
> list the loaded modules. However, print_modules() does not output module
> version information. Across a large fleet of servers, there are often many
> different module versions running simultaneously, and we need to know which
> driver version caused a given vmcore.
> 
> Currently, the only reliable way to obtain the module version associated
> with a vmcore is to analyze the /var/crash/XXXX/vmcore file itself—an
> operation that is resource-intensive. Therefore, we propose printing the
> driver version directly in the log, which is far more efficient.
> 
> - Before this patch
> 
>   Modules linked in: xfs nvidia(PO) nvme_core mlx_compat(O)
>   Unloaded tainted modules: nvidia_peermem(PO):1
> 
> - After this patch
> 
>   Modules linked in: xfs nvidia-535.274.02(PO) nvme_core-1.0 mlx_compat(O)
>   Unloaded tainted modules: nvidia_peermem(PO):1
> 
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> ---
>  kernel/module/main.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/module/main.c b/kernel/module/main.c
> index 710ee30b3bea..1ad9afec8730 100644
> --- a/kernel/module/main.c
> +++ b/kernel/module/main.c
> @@ -3901,7 +3901,10 @@ void print_modules(void)
>  	list_for_each_entry_rcu(mod, &modules, list) {
>  		if (mod->state == MODULE_STATE_UNFORMED)
>  			continue;
> -		pr_cont(" %s%s", mod->name, module_flags(mod, buf, true));
> +		pr_cont(" %s", mod->name);
> +		if (mod->version)
> +			pr_cont("-%s", mod->version);
> +		pr_cont("%s", module_flags(mod, buf, true));
>  	}
>  
>  	print_unloaded_tainted_modules();
> -- 
> 2.43.5
> 

Hi Yafang,

While I certainly appreciate the operational burden of managing a
large-scale fleet and the desire to automate crash triage, I am somewhat
hesitant to support this change in its current form.

Perhaps the more appropriate approach would be to extend the existing
module information infrastructure to include the version only when it is
explicitly requested: introduce print_module_versions().

In my view, while the requirement for better version visibility is valid,
we must ensure that the change does not compromise the readability of the
crash report for the rest of the community.

Nacked-by: Aaron Tomlin <atomlin@atomlin.com>


Kind regards,
-- 
Aaron Tomlin

* Re: [RFC PATCH v1] module: Fix kernel panic when a symbol st_shndx is out of bounds
From: Ihor Solodrai @ 2025-12-30  0:59 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Nathan Chancellor, Yonghong Song, Luis Chamberlain, Petr Pavlu,
	Daniel Gomez, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, LKML,
	linux-modules, bpf, Linux Kbuild mailing list, clang-built-linux
In-Reply-To: <CAADnVQ+X-a92LEgcd-HjTJUcw2zR_jtUmD9U-Z6OtNnvpVwfiw@mail.gmail.com>

On 12/29/25 4:50 PM, Alexei Starovoitov wrote:
> On Mon, Dec 29, 2025 at 4:39 PM Ihor Solodrai <ihor.solodrai@linux.dev> wrote:
>>
>> [...]
>>
>>
>> From 7c3b9cce97cc76d0365d8948b1ca36c61faddde3 Mon Sep 17 00:00:00 2001
>> From: Ihor Solodrai <ihor.solodrai@linux.dev>
>> Date: Mon, 29 Dec 2025 15:49:51 -0800
>> Subject: [PATCH] BTF_OBJCOPY
>>
>> ---
>>  Makefile                             |  6 +++++-
>>  lib/Kconfig.debug                    |  1 +
>>  scripts/gen-btf.sh                   | 10 +++++-----
>>  scripts/link-vmlinux.sh              |  2 +-
>>  tools/testing/selftests/bpf/Makefile |  4 ++--
>>  5 files changed, 14 insertions(+), 9 deletions(-)
> 
> All the makefile hackery looks like overkill and wrong direction.
> 
> What's wrong with kernel/module/main.c change?
> 
> Module loading already does a bunch of sanity checks for ELF
> in elf_validity_cache_copy().
> 
> + if (sym[i].st_shndx >= info->hdr->e_shnum)
> is just one more.
> 
> Maybe it can be moved to elf_validity*() somewhere,
> but that's a minor detail.
> 
> iiuc llvm-objcopy affects only bpf testmod, so not a general
> issue that needs top level makefile changes.

AFAIU, the problem is that the llvm-objcopy bug is essentially a
use-after-free [1] that may (or may not) corrupt the st_shndx value of
some symbols when executing --update-section.

And so we can't trust this command anywhere in the kernel build, even
though it only manifested itself in a BPF test module.

With the gen-btf.sh changes, ${OBJCOPY} --update-section is called for
all binaries with .BTF_ids: vmlinux and all modules.

The fix in kernel/module/main.c addresses an independent kernel bug,
which is hopefully fixed by the st_shndx check.

[1] https://github.com/llvm/llvm-project/issues/168060#issuecomment-3533552952


* Re: [RFC PATCH v1] module: Fix kernel panic when a symbol st_shndx is out of bounds
From: Alexei Starovoitov @ 2025-12-30  0:50 UTC (permalink / raw)
  To: Ihor Solodrai
  Cc: Nathan Chancellor, Yonghong Song, Luis Chamberlain, Petr Pavlu,
	Daniel Gomez, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, LKML,
	linux-modules, bpf, Linux Kbuild mailing list, clang-built-linux
In-Reply-To: <6b87701b-98fb-4089-a201-a7b402e338f9@linux.dev>

On Mon, Dec 29, 2025 at 4:39 PM Ihor Solodrai <ihor.solodrai@linux.dev> wrote:
>
> On 12/29/25 1:29 PM, Nathan Chancellor wrote:
> > Hi Ihor,
> >
> > On Mon, Dec 29, 2025 at 12:40:10PM -0800, Ihor Solodrai wrote:
> >> I think the simplest workaround is this one: use objcopy from binutils
> >> instead of llvm-objcopy when doing --update-section.
> >>
> >> There are just 3 places where that happens, so the OBJCOPY
> >> substitution is going to be localized.
> >>
> >> Also binutils is a documented requirement for compiling the kernel,
> >> whether with clang or not [1].
> >>
> >> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/changes.rst?h=v6.18#n29
> >
> > This would necessitate always specifying a CROSS_COMPILE variable when
> > cross compiling with LLVM=1, which I would really like to avoid. The
> > LLVM variants have generally been drop in substitutes for several
> > versions now so some groups such as Android may not even have GNU
> > binutils installed in their build environment (see a recent build
> > fix [1]).
> >
> > I would much prefer detecting llvm-objcopy in Kconfig (such as by
> > creating CONFIG_OBJCOPY_IS_LLVM using the existing check for
> > llvm-objcopy in X86_X32_ABI in arch/x86/Kconfig) and requiring a working
> > copy (>= 22.0.0 presuming the fix is soon merged) or an explicit opt
> > into GNU objcopy via OBJCOPY=...objcopy for CONFIG_DEBUG_INFO_BTF to be
> > selectable.
>
> I like the idea of opting into GNU objcopy; however, I think we should
> avoid requiring kbuilds that want CONFIG_DEBUG_INFO_BTF to change any
> configuration (such as adding an explicit OBJCOPY= in a build command).
>
> I drafted a patch (pasted below), introducing BTF_OBJCOPY, which
> defaults to GNU objcopy. This implements the workaround, and should be
> easy to update with an LLVM version check later after the bug is fixed.
>
> This bit:
>
> @@ -391,6 +391,7 @@ config DEBUG_INFO_BTF
>         depends on PAHOLE_VERSION >= 122
>         # pahole uses elfutils, which does not have support for Hexagon relocations
>         depends on !HEXAGON
> +       depends on $(success,command -v $(BTF_OBJCOPY))
>
> Will turn off DEBUG_INFO_BTF if relevant GNU objcopy happens to not be
> installed.
>
> However I am not sure this is the right way to fail here. Because if
> the kernel really does need BTF (which is effectively all kernels
> using BPF), then we are breaking them anyways just downstream of the
> build.
>
> An "objcopy: command not found" might make some pipelines red, but it
> is very clear how to address.
>
> Thoughts?
>
>
> From 7c3b9cce97cc76d0365d8948b1ca36c61faddde3 Mon Sep 17 00:00:00 2001
> From: Ihor Solodrai <ihor.solodrai@linux.dev>
> Date: Mon, 29 Dec 2025 15:49:51 -0800
> Subject: [PATCH] BTF_OBJCOPY
>
> ---
>  Makefile                             |  6 +++++-
>  lib/Kconfig.debug                    |  1 +
>  scripts/gen-btf.sh                   | 10 +++++-----
>  scripts/link-vmlinux.sh              |  2 +-
>  tools/testing/selftests/bpf/Makefile |  4 ++--
>  5 files changed, 14 insertions(+), 9 deletions(-)

All the makefile hackery looks like overkill and the wrong direction.

What's wrong with kernel/module/main.c change?

Module loading already does a bunch of sanity checks for ELF
in elf_validity_cache_copy().

+ if (sym[i].st_shndx >= info->hdr->e_shnum)
is just one more.

Maybe it can be moved to elf_validity*() somewhere,
but that's a minor detail.

iiuc llvm-objcopy affects only bpf testmod, so not a general
issue that needs top level makefile changes.
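
The comparison discussed above can be illustrated outside the kernel. What follows is a minimal userspace sketch, assuming a 64-bit ELF symbol table that is already in memory; the helper name first_bad_shndx() is hypothetical and this is not the code from the patch. Like the proposed check, which sits in the default: branch of the switch in simplify_symbols(), it only tests indices that are meant to name a real section.

```c
#include <elf.h>
#include <stddef.h>

/*
 * Hypothetical helper (not kernel code): return the index of the first
 * symbol whose st_shndx does not name an existing section, or -1 if
 * every symbol passes.
 */
static long first_bad_shndx(const Elf64_Sym *sym, size_t nsyms,
			    Elf64_Half e_shnum)
{
	size_t i;

	/* Entry 0 of an ELF symbol table is the reserved null symbol. */
	for (i = 1; i < nsyms; i++) {
		switch (sym[i].st_shndx) {
		case SHN_UNDEF:
		case SHN_ABS:
		case SHN_COMMON:
			/* Special indices that do not refer to a section. */
			continue;
		default:
			/* Same comparison as the proposed kernel check. */
			if (sym[i].st_shndx >= e_shnum)
				return (long)i;
		}
	}
	return -1;
}
```

A caller would pass the symbol array, its entry count, and e_shnum from the ELF header, and refuse to go on if the result is not -1.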

* Re: [RFC PATCH v1] module: Fix kernel panic when a symbol st_shndx is out of bounds
From: Ihor Solodrai @ 2025-12-30  0:38 UTC (permalink / raw)
  To: Nathan Chancellor
  Cc: Yonghong Song, Luis Chamberlain, Petr Pavlu, Daniel Gomez,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, linux-kernel, linux-modules,
	bpf, linux-kbuild, llvm
In-Reply-To: <20251229212938.GA2701672@ax162>

On 12/29/25 1:29 PM, Nathan Chancellor wrote:
> Hi Ihor,
> 
> On Mon, Dec 29, 2025 at 12:40:10PM -0800, Ihor Solodrai wrote:
>> I think the simplest workaround is this one: use objcopy from binutils
>> instead of llvm-objcopy when doing --update-section.
>>
>> There are just 3 places where that happens, so the OBJCOPY
>> substitution is going to be localized.
>>
>> Also binutils is a documented requirement for compiling the kernel,
>> whether with clang or not [1].
>>
>> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/changes.rst?h=v6.18#n29
> 
> This would necessitate always specifying a CROSS_COMPILE variable when
> cross compiling with LLVM=1, which I would really like to avoid. The
> LLVM variants have generally been drop in substitutes for several
> versions now so some groups such as Android may not even have GNU
> binutils installed in their build environment (see a recent build
> fix [1]).
> 
> I would much prefer detecting llvm-objcopy in Kconfig (such as by
> creating CONFIG_OBJCOPY_IS_LLVM using the existing check for
> llvm-objcopy in X86_X32_ABI in arch/x86/Kconfig) and requiring a working
> copy (>= 22.0.0 presuming the fix is soon merged) or an explicit opt
> into GNU objcopy via OBJCOPY=...objcopy for CONFIG_DEBUG_INFO_BTF to be
> selectable.

I like the idea of opting into GNU objcopy; however, I think we should
avoid requiring kbuilds that want CONFIG_DEBUG_INFO_BTF to change any
configuration (such as adding an explicit OBJCOPY= in a build command).

I drafted a patch (pasted below), introducing BTF_OBJCOPY, which
defaults to GNU objcopy. This implements the workaround, and should be
easy to update with an LLVM version check later after the bug is fixed.

This bit:

@@ -391,6 +391,7 @@ config DEBUG_INFO_BTF
        depends on PAHOLE_VERSION >= 122
        # pahole uses elfutils, which does not have support for Hexagon relocations
        depends on !HEXAGON
+       depends on $(success,command -v $(BTF_OBJCOPY))

Will turn off DEBUG_INFO_BTF if relevant GNU objcopy happens to not be
installed.

However I am not sure this is the right way to fail here. Because if
the kernel really does need BTF (which is effectively all kernels
using BPF), then we are breaking them anyways just downstream of the
build.

An "objcopy: command not found" might make some pipelines red, but it
is very clear how to address.

Thoughts?


From 7c3b9cce97cc76d0365d8948b1ca36c61faddde3 Mon Sep 17 00:00:00 2001
From: Ihor Solodrai <ihor.solodrai@linux.dev>
Date: Mon, 29 Dec 2025 15:49:51 -0800
Subject: [PATCH] BTF_OBJCOPY

---
 Makefile                             |  6 +++++-
 lib/Kconfig.debug                    |  1 +
 scripts/gen-btf.sh                   | 10 +++++-----
 scripts/link-vmlinux.sh              |  2 +-
 tools/testing/selftests/bpf/Makefile |  4 ++--
 5 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/Makefile b/Makefile
index 18adf5502244..b7797a85b8c2 100644
--- a/Makefile
+++ b/Makefile
@@ -534,6 +534,9 @@ CLIPPY_DRIVER	= clippy-driver
 BINDGEN		= bindgen
 PAHOLE		= pahole
 RESOLVE_BTFIDS	= $(objtree)/tools/bpf/resolve_btfids/resolve_btfids
+# Always use GNU objcopy when manipulating BTF sections to work around
+# a bug in llvm-objcopy: https://github.com/llvm/llvm-project/issues/168060
+BTF_OBJCOPY	= $(CROSS_COMPILE)objcopy
 LEX		= flex
 YACC		= bison
 AWK		= awk
@@ -627,7 +630,8 @@ export CLIPPY_CONF_DIR := $(srctree)
 export ARCH SRCARCH CONFIG_SHELL BASH HOSTCC KBUILD_HOSTCFLAGS CROSS_COMPILE LD CC HOSTPKG_CONFIG
 export RUSTC RUSTDOC RUSTFMT RUSTC_OR_CLIPPY_QUIET RUSTC_OR_CLIPPY BINDGEN
 export HOSTRUSTC KBUILD_HOSTRUSTFLAGS
-export CPP AR NM STRIP OBJCOPY OBJDUMP READELF PAHOLE RESOLVE_BTFIDS LEX YACC AWK INSTALLKERNEL
+export CPP AR NM STRIP OBJCOPY OBJDUMP READELF LEX YACC AWK INSTALLKERNEL
+export PAHOLE RESOLVE_BTFIDS BTF_OBJCOPY
 export PERL PYTHON3 CHECK CHECKFLAGS MAKE UTS_MACHINE HOSTCXX
 export KGZIP KBZIP2 KLZOP LZMA LZ4 XZ ZSTD TAR
 export KBUILD_HOSTCXXFLAGS KBUILD_HOSTLDFLAGS KBUILD_HOSTLDLIBS KBUILD_PROCMACROLDFLAGS LDFLAGS_MODULE
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 60281c4f9e99..ec9e683244fa 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -391,6 +391,7 @@ config DEBUG_INFO_BTF
 	depends on PAHOLE_VERSION >= 122
 	# pahole uses elfutils, which does not have support for Hexagon relocations
 	depends on !HEXAGON
+	depends on $(success,command -v $(BTF_OBJCOPY))
 	help
 	  Generate deduplicated BTF type information from DWARF debug info.
 	  Turning this on requires pahole v1.22 or later, which will convert
diff --git a/scripts/gen-btf.sh b/scripts/gen-btf.sh
index 06c6d8becaa2..6ae671523edd 100755
--- a/scripts/gen-btf.sh
+++ b/scripts/gen-btf.sh
@@ -97,9 +97,9 @@ gen_btf_o()
 	# be redefined in the linker script.
 	info OBJCOPY "${btf_data}"
 	echo "" | ${CC} ${CLANG_FLAGS} -c -x c -o ${btf_data} -
-	${OBJCOPY} --add-section .BTF=${ELF_FILE}.BTF \
+	${BTF_OBJCOPY} --add-section .BTF=${ELF_FILE}.BTF \
 		--set-section-flags .BTF=alloc,readonly ${btf_data}
-	${OBJCOPY} --only-section=.BTF --strip-all ${btf_data}
+	${BTF_OBJCOPY} --only-section=.BTF --strip-all ${btf_data}
 
 	# Change e_type to ET_REL so that it can be used to link final vmlinux.
 	# GNU ld 2.35+ and lld do not allow an ET_EXEC input.
@@ -114,16 +114,16 @@ gen_btf_o()
 embed_btf_data()
 {
 	info OBJCOPY "${ELF_FILE}.BTF"
-	${OBJCOPY} --add-section .BTF=${ELF_FILE}.BTF ${ELF_FILE}
+	${BTF_OBJCOPY} --add-section .BTF=${ELF_FILE}.BTF ${ELF_FILE}
 
 	# a module might not have a .BTF_ids or .BTF.base section
 	local btf_base="${ELF_FILE}.BTF.base"
 	if [ -f "${btf_base}" ]; then
-		${OBJCOPY} --add-section .BTF.base=${btf_base} ${ELF_FILE}
+		${BTF_OBJCOPY} --add-section .BTF.base=${btf_base} ${ELF_FILE}
 	fi
 	local btf_ids="${ELF_FILE}.BTF_ids"
 	if [ -f "${btf_ids}" ]; then
-		${OBJCOPY} --update-section .BTF_ids=${btf_ids} ${ELF_FILE}
+		${BTF_OBJCOPY} --update-section .BTF_ids=${btf_ids} ${ELF_FILE}
 	fi
 }
 
diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
index e2207e612ac3..4ad04d31f8bc 100755
--- a/scripts/link-vmlinux.sh
+++ b/scripts/link-vmlinux.sh
@@ -266,7 +266,7 @@ vmlinux_link "${VMLINUX}"
 
 if is_enabled CONFIG_DEBUG_INFO_BTF; then
 	info OBJCOPY ${btfids_vmlinux}
-	${OBJCOPY} --update-section .BTF_ids=${btfids_vmlinux} ${VMLINUX}
+	${BTF_OBJCOPY} --update-section .BTF_ids=${btfids_vmlinux} ${VMLINUX}
 fi
 
 mksysmap "${VMLINUX}" System.map
diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index f28a32b16ff0..e998cac975c1 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -4,7 +4,7 @@ include ../../../scripts/Makefile.arch
 include ../../../scripts/Makefile.include
 
 CXX ?= $(CROSS_COMPILE)g++
-OBJCOPY ?= $(CROSS_COMPILE)objcopy
+BTF_OBJCOPY ?= $(CROSS_COMPILE)objcopy
 
 CURDIR := $(abspath .)
 TOOLSDIR := $(abspath ../../..)
@@ -657,7 +657,7 @@ $(TRUNNER_TEST_OBJS): $(TRUNNER_OUTPUT)/%.test.o:			\
 	$$(if $$(TEST_NEEDS_BTFIDS),						\
 		$$(call msg,BTFIDS,$(TRUNNER_BINARY),$$@)			\
 		$(RESOLVE_BTFIDS) --btf $(TRUNNER_OUTPUT)/btf_data.bpf.o $$@;	\
-		$(OBJCOPY) --update-section .BTF_ids=$$@.BTF_ids $$@)
+		$(BTF_OBJCOPY) --update-section .BTF_ids=$$@.BTF_ids $$@)
 
 $(TRUNNER_TEST_OBJS:.o=.d): $(TRUNNER_OUTPUT)/%.test.d:			\
 			    $(TRUNNER_TESTS_DIR)/%.c			\
-- 
2.47.3




> 
>> Patching llvm-objcopy would be great, it should be done. But we are
>> still going to be stuck with making sure older LLVMs can build the kernel.
>> So even if they backport the fix to v21, it won't help us much, unfortunately.
> 
> 21.1.8 was the last planned 21.x release [2] so I think it is unlikely
> that a 21.1.9 would be released for this but we won't know until it is
> merged into main. Much agreed on handling the old versions.
> 
> [1]: https://lore.kernel.org/20251218175824.3122690-1-cmllamas@google.com/
> [2]: https://discourse.llvm.org/t/llvm-21-1-8-released/89144
> 
> Cheers,
> Nathan


* Re: [PATCH v4 7/7] kernel.h: drop trace_printk.h
From: Mathieu Desnoyers @ 2025-12-29 22:25 UTC (permalink / raw)
  To: Steven Rostedt, Andrew Morton
  Cc: Yury Norov (NVIDIA), Masami Hiramatsu, Andy Shevchenko,
	Christophe Leroy, Randy Dunlap, Ingo Molnar, Jani Nikula,
	Joonas Lahtinen, David Laight, Petr Pavlu, Andi Shyti,
	Rodrigo Vivi, Tvrtko Ursulin, Daniel Gomez, Greg Kroah-Hartman,
	Rafael J. Wysocki, Danilo Krummrich, linux-kernel, intel-gfx,
	dri-devel, linux-modules, linux-trace-kernel
In-Reply-To: <20251229111748.3ba66311@gandalf.local.home>

On 2025-12-29 11:17, Steven Rostedt wrote:
> On Sun, 28 Dec 2025 13:31:50 -0800
> Andrew Morton <akpm@linux-foundation.org> wrote:
> 
>>> trace_printk() should be as available to the kernel as printk() is.
>>
>> um, why?  trace_printk is used 1% as often as is printk.  Seems
>> reasonable to include a header file to access such a rarely-used(!) and
>> specialized thing?
[...]
> Yes, it's not in your kernel, but it is in several other people's kernels
> as they develop it. And adding a requirement that they need to include a
> header file for every place they add it (and then have to remember to
> remove that header file when they are done debugging) is going to waste
> more precious time that kernel developers don't have much of.

I agree with Steven. trace_printk() needs to stay convenient to use for
kernel developers. Part of this convenience comes from not having to
include additional header files by hand. It has been around for
16 years and documented as such in kernel documentation [1],
LWN articles [2], and conference presentation material. Changing
this would lead to confusion for people trying to use the feature.

I personally use trace_printk() to sprinkle temporary debug-style
trace events in frequently executed kernel code I need to follow
carefully.

One possible compromise would be to move it to its own header file,
and introduce a CONFIG_TRACE_PRINTK Kconfig option (default Y) that
would surround an include from linux/kernel.h with a preprocessor
conditional. But please make sure the default stays as it is today:
include the trace printk header by default.

Thanks,

Mathieu

[1] Documentation/trace/ftrace.txt
[2] Debugging the kernel using Ftrace - part 1 https://lwn.net/Articles/365835/

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com

* Re: [RFC PATCH v1] module: Fix kernel panic when a symbol st_shndx is out of bounds
From: Nathan Chancellor @ 2025-12-29 21:29 UTC (permalink / raw)
  To: Ihor Solodrai
  Cc: Yonghong Song, Luis Chamberlain, Petr Pavlu, Daniel Gomez,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, linux-kernel, linux-modules,
	bpf, linux-kbuild, llvm
In-Reply-To: <af906e9e-8f94-41f5-9100-1a3b4526e220@linux.dev>

Hi Ihor,

On Mon, Dec 29, 2025 at 12:40:10PM -0800, Ihor Solodrai wrote:
> I think the simplest workaround is this one: use objcopy from binutils
> instead of llvm-objcopy when doing --update-section.
> 
> There are just 3 places where that happens, so the OBJCOPY
> substitution is going to be localized.
> 
> Also binutils is a documented requirement for compiling the kernel,
> whether with clang or not [1].
> 
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/changes.rst?h=v6.18#n29

This would necessitate always specifying a CROSS_COMPILE variable when
cross compiling with LLVM=1, which I would really like to avoid. The
LLVM variants have generally been drop in substitutes for several
versions now so some groups such as Android may not even have GNU
binutils installed in their build environment (see a recent build
fix [1]).

I would much prefer detecting llvm-objcopy in Kconfig (such as by
creating CONFIG_OBJCOPY_IS_LLVM using the existing check for
llvm-objcopy in X86_X32_ABI in arch/x86/Kconfig) and requiring a working
copy (>= 22.0.0 presuming the fix is soon merged) or an explicit opt
into GNU objcopy via OBJCOPY=...objcopy for CONFIG_DEBUG_INFO_BTF to be
selectable.

> Patching llvm-objcopy would be great, it should be done. But we are
> still going to be stuck with making sure older LLVMs can build the kernel.
> So even if they backport the fix to v21, it won't help us much, unfortunately.

21.1.8 was the last planned 21.x release [2] so I think it is unlikely
that a 21.1.9 would be released for this but we won't know until it is
merged into main. Much agreed on handling the old versions.

[1]: https://lore.kernel.org/20251218175824.3122690-1-cmllamas@google.com/
[2]: https://discourse.llvm.org/t/llvm-21-1-8-released/89144

Cheers,
Nathan

* Re: [RFC PATCH v1] module: Fix kernel panic when a symbol st_shndx is out of bounds
From: Ihor Solodrai @ 2025-12-29 20:40 UTC (permalink / raw)
  To: Yonghong Song, Luis Chamberlain, Petr Pavlu, Daniel Gomez,
	Nathan Chancellor, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman
  Cc: linux-kernel, linux-modules, bpf, linux-kbuild, llvm
In-Reply-To: <9edd1395-8651-446b-b056-9428076cd830@linux.dev>

On 12/23/25 9:36 PM, Yonghong Song wrote:
> 
> 
> On 12/23/25 4:57 PM, Ihor Solodrai wrote:
>> [...]
>>
>> While this llvm-objcopy bug is not fixed, we can not trust it in the
>> kernel build pipeline. In the short-term we have to come up with a
>> workaround for .BTF_ids section update and replace the calls to
>> ${OBJCOPY} --update-section with something else.
>>
>> One potential workaround is to force the use of the objcopy (from
>> binutils) instead of llvm-objcopy when updating .BTF_ids section.

I think the simplest workaround is this one: use objcopy from binutils
instead of llvm-objcopy when doing --update-section.

There are just 3 places where that happens, so the OBJCOPY
substitution is going to be localized.

Also binutils is a documented requirement for compiling the kernel,
whether with clang or not [1].

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/changes.rst?h=v6.18#n29

>>
>> Alternatively, we could just dd the .BTF_ids data computed by
>> resolve_btfids at the right offset in the target ELF file.
>>
>> Surprisingly I couldn't find a good way to read a section offset and
>> size from the ELF with a specified format in a command line. Both
>> readelf and {llvm-}objdump give a human readable output, and it
>> appears we can't rely on the column order, for example.
>>
>> We could still try parsing readelf output with awk/grep, covering
>> output variants that appear in the kernel build.
>>
>> We can also do:
>>
>>     llvm-readobj --elf-output-style=JSON --sections "$elf" | \
>>          jq -r --arg name .BTF_ids '
>>              .[0].Sections[] |
>>              select(.Section.Name.Name == $name) |
>>              "\(.Section.Offset) \(.Section.Size)"'
>>
>> ...but idk man, doesn't feel right.
>>
>> Most reliable way to determine the size and offset of .BTF_ids section
>> is probably reading them by a C program with libelf, such as
>> resolve_btfids. Which is quite ironic, given the recent
>> changes. Setting the irony aside, we could add smth like:
>>           resolve_btfids --section-info=.BTF_ids $elf
>>
>> Reverting the gen-btf.sh patch is also a possible workaround, but I'd
>> really like to avoid it, given that BPF features/optimizations in
>> development depend on it.
>>
>> I'd appreciate comments and suggestions on this issue. Thank you!
>> ---
>>   kernel/module/main.c | 7 +++++++
>>   1 file changed, 7 insertions(+)
>>
>> diff --git a/kernel/module/main.c b/kernel/module/main.c
>> index 710ee30b3bea..5bf456fad63e 100644
>> --- a/kernel/module/main.c
>> +++ b/kernel/module/main.c
>> @@ -1568,6 +1568,13 @@ static int simplify_symbols(struct module *mod, const struct load_info *info)
>>               break;
>>             default:
>> +            if (sym[i].st_shndx >= info->hdr->e_shnum) {
>> +                pr_err("%s: Symbol %s has an invalid section index %u (max %u)\n",
>> +                       mod->name, name, sym[i].st_shndx, info->hdr->e_shnum - 1);
>> +                ret = -ENOEXEC;
>> +                break;
>> +            }
>> +
>>               /* Divert to percpu allocation if a percpu var. */
>>               if (sym[i].st_shndx == info->index.pcpu)
>>                   secbase = (unsigned long)mod_percpu(mod);
> 
> I tried both llvm21 and llvm22 (where llvm21 is used in bpf ci).
> 
> Without KASAN, I can reproduce the failure for llvm19/llvm21/llvm22.
> I did not test llvm20 and I assume it may fail too.
> 
> The following llvm patch
>    https://github.com/llvm/llvm-project/pull/170462
> can fix the issue. Currently it is still in review stage. The actual diff is
> 
> diff --git a/llvm/lib/ObjCopy/ELF/ELFObject.cpp b/llvm/lib/ObjCopy/ELF/ELFObject.cpp
> index e5de17e093df..cc1527d996e2 100644
> --- a/llvm/lib/ObjCopy/ELF/ELFObject.cpp
> +++ b/llvm/lib/ObjCopy/ELF/ELFObject.cpp
> @@ -2168,7 +2168,11 @@ Error Object::updateSectionData(SecPtr &Sec, ArrayRef<uint8_t> Data) {
>                               Data.size(), Sec->Name.c_str(), Sec->Size);
>  
>    if (!Sec->ParentSegment) {
> -    Sec = std::make_unique<OwnedDataSection>(*Sec, Data);
> +    SectionBase *Replaced = Sec.get();
> +    SectionBase *Modified = &addSection<OwnedDataSection>(*Sec, Data);
> +    DenseMap<SectionBase *, SectionBase *> Replacements{{Replaced, Modified}};
> +    if (auto err = replaceSections(Replacements))
> +      return err;
>    } else {
>      // The segment writer will be in charge of updating these contents.
>      Sec->Size = Data.size();
> 
> I applied the above patch to latest llvm21 and llvm22 and
> the crash is gone and the selftests can run properly.

Hi Yonghong, thank you for confirming the issue.

Patching llvm-objcopy would be great, it should be done. But we are
still going to be stuck with making sure older LLVMs can build the kernel.
So even if they backport the fix to v21, it won't help us much, unfortunately.

> 
> With KASAN, everything is okay for llvm21 and llvm22.
> 
> Not sure whether the llvm patch
>    https://github.com/llvm/llvm-project/pull/170462
> can make it into llvm21 or not, as it looks like llvm21 intends to
> freeze for now. See
>    https://github.com/llvm/llvm-project/pull/168314#issuecomment-3645797175
> llvm22 will branch into rc mode in January.
> 
> I will try to see whether we can have a reasonable workaround
> for llvm21 llvm-objcopy (for without KASAN).
> 


* Re: [PATCH v4 7/7] kernel.h: drop trace_printk.h
From: Borislav Petkov @ 2025-12-29 17:19 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Andrew Morton, Yury Norov (NVIDIA), Masami Hiramatsu,
	Mathieu Desnoyers, Andy Shevchenko, Christophe Leroy,
	Randy Dunlap, Ingo Molnar, Jani Nikula, Joonas Lahtinen,
	David Laight, Petr Pavlu, Andi Shyti, Rodrigo Vivi,
	Tvrtko Ursulin, Daniel Gomez, Greg Kroah-Hartman,
	Rafael J. Wysocki, Danilo Krummrich, linux-kernel, intel-gfx,
	dri-devel, linux-modules, linux-trace-kernel
In-Reply-To: <20251229111748.3ba66311@gandalf.local.home>

On Mon, Dec 29, 2025 at 11:17:48AM -0500, Steven Rostedt wrote:
> But sure, if you want to save the few minutes that is added to "make
> allyesconfig"

Nah, it is

"Removing trace_printk.h saves 1.5-2% of compilation time on my
Ubuntu-derived x86_64/localyesconfig"

which is:

  localyesconfig  - Update current config converting local mods to core

and which makes me wonder - who does that?

What are we actually optimizing here?

And 1-2% at that.

I don't see how this outweighs the goodness of using trace_printk()
everywhere.

So that's a NO on that patch from me too.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

* Re: [PATCH v4 7/7] kernel.h: drop trace_printk.h
From: Danilo Krummrich @ 2025-12-29 16:41 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Andrew Morton, Yury Norov (NVIDIA), Masami Hiramatsu,
	Mathieu Desnoyers, Andy Shevchenko, Christophe Leroy,
	Randy Dunlap, Ingo Molnar, Jani Nikula, Joonas Lahtinen,
	David Laight, Petr Pavlu, Andi Shyti, Rodrigo Vivi,
	Tvrtko Ursulin, Daniel Gomez, Greg Kroah-Hartman,
	Rafael J. Wysocki, linux-kernel, intel-gfx, dri-devel,
	linux-modules, linux-trace-kernel
In-Reply-To: <20251229111748.3ba66311@gandalf.local.home>

On Mon Dec 29, 2025 at 5:17 PM CET, Steven Rostedt wrote:
> It will waste a lot of kernel developers' time. Go to conferences and talk
> with developers. trace_printk() is now one of the most common ways to debug
> your code. Having to add "#include <linux/trace_printk.h>" in every file
> that you use trace_printk() (and after your build fails because you forgot
> to include that file) **WILL** slow down kernel debugging for hundreds of
> developers! It *is* used more than printk() for debugging today, because
> it's fast and can be used in any context (NMI, interrupt handlers, etc).

I strongly agree with this. I have heavily used trace_printk() for debugging for
a long time and have recommended it to dozens of people, all of whom have been
very thankful for it -- especially when it comes to debugging race conditions
with tight timing, where a normal printk() simply "fixes" the race.

Having to include additional headers would be very painful, especially when
debugging large code bases with lots of files. For instance, one of the
components I maintain is the nouveau driver with 773 C files and 1390 files
overall.

I suppose it would be fair to argue that such codebases usually have their own
common header files that could be reused, but even in that case, I’d consider
the ergonomic cost too high.

- Danilo

* Re: [PATCH v4 7/7] kernel.h: drop trace_printk.h
From: Steven Rostedt @ 2025-12-29 16:17 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Yury Norov (NVIDIA), Masami Hiramatsu, Mathieu Desnoyers,
	Andy Shevchenko, Christophe Leroy, Randy Dunlap, Ingo Molnar,
	Jani Nikula, Joonas Lahtinen, David Laight, Petr Pavlu,
	Andi Shyti, Rodrigo Vivi, Tvrtko Ursulin, Daniel Gomez,
	Greg Kroah-Hartman, Rafael J. Wysocki, Danilo Krummrich,
	linux-kernel, intel-gfx, dri-devel, linux-modules,
	linux-trace-kernel
In-Reply-To: <20251228133150.1d5731d04bc1b685b0fe81c1@linux-foundation.org>

On Sun, 28 Dec 2025 13:31:50 -0800
Andrew Morton <akpm@linux-foundation.org> wrote:

> > trace_printk() should be as available to the kernel as printk() is.  
> 
> um, why?  trace_printk is used 1% as often as is printk.  Seems
> reasonable to include a header file to access such a rarely-used(!) and
> specialized thing?

It will waste a lot of kernel developers' time. Go to conferences and talk
with developers. trace_printk() is now one of the most common ways to debug
your code. Having to add "#include <linux/trace_printk.h>" in every file
that you use trace_printk() (and after your build fails because you forgot
to include that file) **WILL** slow down kernel debugging for hundreds of
developers! It *is* used more than printk() for debugging today, because
it's fast and can be used in any context (NMI, interrupt handlers, etc).

But sure, if you want to save the few minutes that are added to "make
allyesconfig" by sacrificing minutes of kernel developers' time, go ahead
and make this change.

I don't know how much you debug and develop today, but lots of people I
talk to at conferences thank me for trace_printk() because it makes
debugging their code so much easier.

The "shotgun" approach is very common. That is, you add:

	trace_printk("%s:%d\n", __func__, __LINE__);

all over your code to find out where things are going wrong. With the
persistent ring buffer, you can even extract that information after a
crash and reboot.
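
As an illustration of the shotgun approach only: trace_printk() exists only inside the kernel, so the sketch below uses a hypothetical userspace stand-in that formats into a small in-memory ring buffer. The buffer, its sizes, and the TRACE_HERE() macro are assumptions made for this example and are not kernel API.

```c
#include <stdio.h>

#define TRACE_SLOTS 8
#define TRACE_LEN   64

/* Tiny in-memory ring buffer standing in for the ftrace ring buffer. */
static char trace_buf[TRACE_SLOTS][TRACE_LEN];
static unsigned int trace_next;

/* Hypothetical userspace stand-in; the real trace_printk() is kernel-only. */
#define trace_printk(fmt, ...)						\
	do {								\
		snprintf(trace_buf[trace_next % TRACE_SLOTS], TRACE_LEN,	\
			 fmt, __VA_ARGS__);				\
		trace_next++;						\
	} while (0)

/* The "shotgun" marker: record which function and line was reached. */
#define TRACE_HERE() trace_printk("%s:%d\n", __func__, __LINE__)

static int step_two(int x)
{
	TRACE_HERE();
	return x * 2;
}

static int step_one(int x)
{
	TRACE_HERE();
	return step_two(x + 1);
}
```

After a call to step_one(), the buffer holds one entry per marker in the order the markers were reached, which is the information used to narrow down where a code path goes wrong.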

There are very few instances of it in the kernel because I made it that way.
If you add a trace_printk() in the kernel, you get the banner:

 **********************************************************
 **   NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE   **
 **                                                      **
 ** trace_printk() being used. Allocating extra memory.  **
 **                                                      **
 ** This means that this is a DEBUG kernel and it is     **
 ** unsafe for production use.                           **
 **                                                      **
 ** If you see this message and you are not debugging    **
 ** the kernel, report this immediately to your vendor!  **
 **                                                      **
 **   NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE   **
 **********************************************************

in your dmesg.

I've been recommending to people that if they find a "trace_printk()" useful,
they should convert it into a normal TRACE_EVENT() for submission, which
many developers have done.

Yes, it's not in your kernel, but it is in several other people's kernels
as they develop it. And adding a requirement that they need to include a
header file for every place they add it (and then have to remember to
remove that header file when they are done debugging) is going to waste
more precious time that kernel developers don't have much of.

-- Steve

* Re: [PATCH] ANDROID: gki: kallsyms: add kallsyms_lookup_address_and_size.
From: Petr Pavlu @ 2025-12-29 15:38 UTC (permalink / raw)
  To: Yunjin Kim
  Cc: Luis Chamberlain, Sami Tolvanen, Daniel Gomez, linux-kernel,
	linux-modules
In-Reply-To: <20251224043157.59289-1-yunzhen.kim@samsung.com>

On 12/24/25 5:31 AM, Yunjin Kim wrote:
> These methods are used by AKKstub-ARM Kernel Kstub.
> 
> We need to implement an automatic kernel-method mock that streamlines the
> mocking process during kernel-method testing and enables fully automated
> operations. This mechanism must traverse the binary instructions of the
> target function in memory, locate the appropriate instruction, and replace
> it. To perform the traversal, it must know the function’s entry address and
> the size of its instruction range.
> 
> Bug:
> Change-Id: I5a318f762d4412e70b0c8dcf2dfed326312bdc65
> Signed-off-by: Yunjin Kim <yunzhen.kim@samsung.com>

I'm confused by this patch. It seems like it should be sent for review
and inclusion in some Android-specific downstream kernel, rather than
the official Linux kernel. As it stands, without more context, the patch
only adds dead code to the kernel.

-- 
Thanks,
Petr

* Re: /proc/modules address+size bounds are inconsistent
From: Petr Pavlu @ 2025-12-29 13:57 UTC (permalink / raw)
  To: Tatsuyuki Ishi; +Cc: mcgrof, da.gomez, Sami Tolvanen, song, linux-modules
In-Reply-To: <CANqewP0+N0i8Ld+fGKQZbLg5yJhVkLTyvZKz_ZL0aV+noArsiQ@mail.gmail.com>

On 12/21/25 1:52 PM, Tatsuyuki Ishi wrote:
> Hi,
> 
> I noticed that /proc/modules reports inconsistent address and size
> values for modules. In m_show():
> 
>      size = module_total_size(mod);       // .text + .rodata + .data + ...
>      value = mod->mem[MOD_TEXT].base;     // only text base
> 
> Looking at kallsyms, .data symbols can come before .text symbols, so
> [addr, addr+size) is useless as a bound and can be overlapping.
> 
> I have a userspace frontend for perf [1] and the code currently
> expects non-overlapping regions. I can add a workaround to truncate
> any overlapping regions from /proc/modules. But is it possible to
> "fix" the kernel-side semantics here?
> 
> [1]: https://github.com/mstange/samply/pull/736

The initial code to show the module start address in /proc/modules was
added in 2003 by "[PATCH] Module state and address in /proc/modules."
[1].

I'm not entirely sure if the intention at that time was for the address
and (already present) size read from /proc/modules to provide a range
where the module is loaded. In particular, if a module had a separate
init region and was still in the process of being loaded, the size would
also include a non-zero value of mod->init_size, which means this could
result in overlapping ranges.

Nonetheless, I assume that providing a range was indeed the intention,
as I don't see how having just the address of a module would be
particularly useful.

The patch mentions that the module address was added for use by OProfile
and ksymoops. The OProfile code reads /proc/modules in the function
_record_module_info() [2] and appears to expect that the address and
size form a valid range. The old ksymoops code [3] reads the file in the
function read_lsmod() but doesn't seem to handle the added address.

More importantly, I notice that perf has the function modules__parse()
[4], which reads the /proc/modules data and is called in several places.
For instance, the machine__create_module() callback [5] then expects
that the address and size form a valid range.
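
To make that assumption concrete, here is a minimal userspace sketch (not
taken from perf, OProfile, or samply) of a consumer that treats each
/proc/modules line as the range [address, address + size) and checks two
modules for overlap. The struct and function names are hypothetical; the
field order (name, size, refcount, dependencies, state, address) is the
one /proc/modules prints.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical record for one /proc/modules entry. */
struct mod_range {
	char name[64];
	unsigned long long start;	/* address column (MOD_TEXT base) */
	unsigned long long size;	/* size column (total module size) */
};

/*
 * Parse one line of the form:
 *   name size refcount deps state address [taints]
 * Returns true only if name, size and address were all read.
 */
static bool parse_modules_line(const char *line, struct mod_range *r)
{
	return sscanf(line, "%63s %llu %*s %*s %*s %llx",
		      r->name, &r->size, &r->start) == 3;
}

/* True if the half-open ranges [start, start + size) intersect. */
static bool ranges_overlap(const struct mod_range *a,
			   const struct mod_range *b)
{
	return a->start < b->start + b->size &&
	       b->start < a->start + a->size;
}
```

With the current kernel behavior, ranges_overlap() can return true for
two unrelated modules, because the size covers all regions while the
address is only the text base.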

The original behavior was first broken in 2022 by commit 01dc0386efb7
("module: Add CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC"), which
introduced a third optional module region. A year later, commit
ac3b43283923 ("module: replace module_layout with module_memory")
further split the module into up to seven separate regions.

The separation of module regions was done for good reasons and should
not be reverted.

Instead, a simple and consistent approach could be for /proc/modules to
report only the size of MOD_TEXT, or to show it in an additional column.
I suspect this should be sufficient for debugging tools. However, this
requires careful checking to ensure that nothing else breaks. If more
accurate or complete information is necessary, the kernel could export
data about all module regions under something like
/sys/module/<modname>/segments/<segname>.

[1] https://lore.kernel.org/all/20030114025452.563462C374@lists.samba.org/
[2] https://sourceforge.net/p/oprofile/oprofile/ci/master/tree/libperf_events/operf_utils.cpp#l1327
[3] https://www.kernel.org/pub/linux/utils/kernel/ksymoops/v2.4/ksymoops-2.4.11.tar.gz
[4] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/util/symbol.c?h=v6.19-rc3#n668
[5] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/util/machine.c?h=v6.19-rc3#n1467

-- 
Cheers,
Petr

* Re: [syzbot] [mm?] INFO: rcu detected stall in finish_dput
From: syzbot @ 2025-12-29  4:10 UTC (permalink / raw)
  To: atomlin, da.gomez, gregkh, linux-kernel, linux-mm, linux-modules,
	mcgrof, petr.pavlu, samitolvanen, syzkaller-bugs, tj
In-Reply-To: <693f889b.a70a0220.104cf0.0335.GAE@google.com>

syzbot has found a reproducer for the following issue on:

HEAD commit:    cc3aa43b44bd Add linux-next specific files for 20251219
git tree:       linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=16918422580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=da1bc82c6189c463
dashboard link: https://syzkaller.appspot.com/bug?extid=d1b2c58262854b97eb1f
compiler:       Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=16937c9a580000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=11918422580000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/30bf539e6f28/disk-cc3aa43b.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/0e2f8b08e342/vmlinux-cc3aa43b.xz
kernel image: https://storage.googleapis.com/syzbot-assets/ec7ee6ece11f/bzImage-cc3aa43b.xz
mounted in repro: https://storage.googleapis.com/syzbot-assets/b44ad9245927/mount_16.gz
  fsck result: OK (log: https://syzkaller.appspot.com/x/fsck.log?x=139de49a580000)

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+d1b2c58262854b97eb1f@syzkaller.appspotmail.com

rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
rcu: 	Tasks blocked on level-0 rcu_node (CPUs 0-1): P5981/1:b..l P8410/1:b..l P5988/1:b..l
rcu: 	(detected by 1, t=10503 jiffies, g=18665, q=488 ncpus=2)
task:syz-executor    state:R  running task     stack:19496 pid:5988  tgid:5988  ppid:5980   task_flags:0x400140 flags:0x00080000
Call Trace:
 <TASK>
 context_switch kernel/sched/core.c:5258 [inline]
 __schedule+0x150e/0x5070 kernel/sched/core.c:6866
 preempt_schedule_irq+0xb5/0x150 kernel/sched/core.c:7193
 irqentry_exit+0x5d8/0x660 kernel/entry/common.c:216
 asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:697
RIP: 0010:lock_acquire+0x16c/0x340 kernel/locking/lockdep.c:5872
Code: 00 00 00 00 9c 8f 44 24 30 f7 44 24 30 00 02 00 00 0f 85 cd 00 00 00 f7 44 24 08 00 02 00 00 74 01 fb 65 48 8b 05 44 41 01 11 <48> 3b 44 24 58 0f 85 e5 00 00 00 48 83 c4 60 5b 41 5c 41 5d 41 5e
RSP: 0018:ffffc900041cf538 EFLAGS: 00000206
RAX: 32a75ba562c33400 RBX: 0000000000000000 RCX: 32a75ba562c33400
RDX: 000000005a44979c RSI: ffffffff8db7ed49 RDI: ffffffff8be07960
RBP: ffffffff81742f85 R08: ffffffff81742f85 R09: ffffffff8e13f2e0
R10: ffffc900041cf6f8 R11: ffffffff81ad9d50 R12: 0000000000000002
R13: ffffffff8e13f2e0 R14: 0000000000000000 R15: 0000000000000246
 rcu_lock_acquire include/linux/rcupdate.h:331 [inline]
 rcu_read_lock include/linux/rcupdate.h:867 [inline]
 class_rcu_constructor include/linux/rcupdate.h:1195 [inline]
 unwind_next_frame+0xc2/0x23d0 arch/x86/kernel/unwind_orc.c:495
 arch_stack_walk+0x11c/0x150 arch/x86/kernel/stacktrace.c:25
 stack_trace_save+0x9c/0xe0 kernel/stacktrace.c:122
 kasan_save_stack+0x3e/0x60 mm/kasan/common.c:57
 kasan_record_aux_stack+0xbd/0xd0 mm/kasan/generic.c:556
 __call_rcu_common kernel/rcu/tree.c:3119 [inline]
 call_rcu+0x157/0x9c0 kernel/rcu/tree.c:3239
 __destroy_inode+0x2da/0x670 fs/inode.c:371
 destroy_inode fs/inode.c:394 [inline]
 evict+0x87d/0xae0 fs/inode.c:861
 __dentry_kill+0x209/0x660 fs/dcache.c:670
 finish_dput+0xc9/0x480 fs/dcache.c:879
 __fput+0x68e/0xa70 fs/file_table.c:476
 fput_close_sync+0x113/0x220 fs/file_table.c:573
 __do_sys_close fs/open.c:1534 [inline]
 __se_sys_close fs/open.c:1519 [inline]
 __x64_sys_close+0x7f/0x110 fs/open.c:1519
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xfa/0xf80 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f53c018e3aa
RSP: 002b:00007fff1566c5c0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f53c018e3aa
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003
RBP: 00007fff1566c61c R08: 00007fff1566bf1c R09: 00007fff1566c327
R10: 00007fff1566bfa0 R11: 0000000000000293 R12: 000000000000003d
R13: 0000000000000059 R14: 00000000000789ec R15: 00007fff1566c670
 </TASK>
task:btrfs-cleaner   state:R  running task     stack:28144 pid:8410  tgid:8410  ppid:2      task_flags:0x208040 flags:0x00080000
Call Trace:
 <TASK>
 context_switch kernel/sched/core.c:5258 [inline]
 __schedule+0x150e/0x5070 kernel/sched/core.c:6866
 preempt_schedule_notrace+0xd1/0x110 kernel/sched/core.c:7143
 preempt_schedule_notrace_thunk+0x16/0x30 arch/x86/entry/thunk.S:13
 rcu_is_watching+0x7f/0xb0 kernel/rcu/tree.c:752
 trace_lock_release include/trace/events/lock.h:69 [inline]
 lock_release+0x4b/0x3b0 kernel/locking/lockdep.c:5879
 rcu_lock_release include/linux/rcupdate.h:341 [inline]
 rcu_read_unlock include/linux/rcupdate.h:897 [inline]
 class_rcu_destructor include/linux/rcupdate.h:1195 [inline]
 unwind_next_frame+0x1ab1/0x23d0 arch/x86/kernel/unwind_orc.c:695
 arch_stack_walk+0x11c/0x150 arch/x86/kernel/stacktrace.c:25
 stack_trace_save+0x9c/0xe0 kernel/stacktrace.c:122
 kasan_save_stack+0x3e/0x60 mm/kasan/common.c:57
 kasan_record_aux_stack+0xbd/0xd0 mm/kasan/generic.c:556
 __call_rcu_common kernel/rcu/tree.c:3119 [inline]
 call_rcu+0x157/0x9c0 kernel/rcu/tree.c:3239
 kernfs_put+0x18e/0x470 fs/kernfs/dir.c:591
 kernfs_remove_by_name_ns+0xb7/0x130 fs/kernfs/dir.c:1721
 kernfs_remove_by_name include/linux/kernfs.h:633 [inline]
 create_files fs/sysfs/group.c:66 [inline]
 internal_create_group+0x57b/0x1170 fs/sysfs/group.c:189
 btrfs_sysfs_feature_update+0x9b/0x1d0 fs/btrfs/sysfs.c:2689
 cleaner_kthread+0x302/0x400 fs/btrfs/disk-io.c:1482
 kthread+0x711/0x8a0 kernel/kthread.c:463
 ret_from_fork+0x599/0xb30 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:246
 </TASK>
task:syz-executor    state:R  running task     stack:19496 pid:5981  tgid:5981  ppid:5978   task_flags:0x400140 flags:0x00080000
Call Trace:
 <TASK>
 context_switch kernel/sched/core.c:5258 [inline]
 __schedule+0x150e/0x5070 kernel/sched/core.c:6866
 preempt_schedule_irq+0xb5/0x150 kernel/sched/core.c:7193
 irqentry_exit+0x5d8/0x660 kernel/entry/common.c:216
 asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:697
RIP: 0010:lock_release+0x2a6/0x3b0 kernel/locking/lockdep.c:5893
Code: 4d 48 c7 44 24 20 00 00 00 00 9c 8f 44 24 20 f7 44 24 20 00 02 00 00 75 52 f7 c3 00 02 00 00 74 01 fb 65 48 8b 05 2a 0f 01 11 <48> 3b 44 24 28 75 75 48 83 c4 30 5b 41 5c 41 5d 41 5e 41 5f 5d e9
RSP: 0018:ffffc900041ff430 EFLAGS: 00000206
RAX: 56b6579f03543c00 RBX: 0000000000000202 RCX: 56b6579f03543c00
RDX: 0000000000000001 RSI: ffffffff8db7ed49 RDI: ffffffff8be07960
RBP: ffff88801cb58b58 R08: ffffc900041ff808 R09: 0000000000000000
R10: ffffc900041ff5b8 R11: fffff5200083feb9 R12: 0000000000000001
R13: 0000000000000001 R14: ffffffff8e13f2e0 R15: ffff88801cb58000
 rcu_lock_release include/linux/rcupdate.h:341 [inline]
 rcu_read_unlock include/linux/rcupdate.h:897 [inline]
 class_rcu_destructor include/linux/rcupdate.h:1195 [inline]
 unwind_next_frame+0x1ab1/0x23d0 arch/x86/kernel/unwind_orc.c:695
 arch_stack_walk+0x11c/0x150 arch/x86/kernel/stacktrace.c:25
 stack_trace_save+0x9c/0xe0 kernel/stacktrace.c:122
 save_stack+0xf5/0x1f0 mm/page_owner.c:165
 __reset_page_owner+0x71/0x1f0 mm/page_owner.c:320
 reset_page_owner include/linux/page_owner.h:25 [inline]
 free_pages_prepare mm/page_alloc.c:1406 [inline]
 __free_frozen_pages+0xbc8/0xd30 mm/page_alloc.c:2943
 vfree+0x25a/0x400 mm/vmalloc.c:3504
 copy_entries_to_user net/ipv4/netfilter/ip_tables.c:866 [inline]
 get_entries net/ipv4/netfilter/ip_tables.c:1022 [inline]
 do_ipt_get_ctl+0xebc/0x1180 net/ipv4/netfilter/ip_tables.c:1668
 nf_getsockopt+0x26e/0x290 net/netfilter/nf_sockopt.c:116
 ip_getsockopt+0x1c4/0x220 net/ipv4/ip_sockglue.c:1777
 do_sock_getsockopt+0x2b4/0x3d0 net/socket.c:2398
 __sys_getsockopt net/socket.c:2427 [inline]
 __do_sys_getsockopt net/socket.c:2434 [inline]
 __se_sys_getsockopt net/socket.c:2431 [inline]
 __x64_sys_getsockopt+0x1a5/0x250 net/socket.c:2431
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xfa/0xf80 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fb026b9148a
RSP: 002b:00007fff93b01148 EFLAGS: 00000212 ORIG_RAX: 0000000000000037
RAX: ffffffffffffffda RBX: 00007fff93b011d0 RCX: 00007fb026b9148a
RDX: 0000000000000041 RSI: 0000000000000000 RDI: 0000000000000003
RBP: 0000000000000003 R08: 00007fff93b0116c R09: 00007fff93b01587
R10: 00007fff93b011d0 R11: 0000000000000212 R12: 00007fb026db8be0
R13: 00007fff93b0116c R14: 0000000000000000 R15: 00007fb026dba020
 </TASK>
rcu: rcu_preempt kthread starved for 10620 jiffies! g18665 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
rcu: 	Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
rcu: RCU grace-period kthread stack dump:
task:rcu_preempt     state:R  running task     stack:27728 pid:16    tgid:16    ppid:2      task_flags:0x208040 flags:0x00080000
Call Trace:
 <TASK>
 context_switch kernel/sched/core.c:5258 [inline]
 __schedule+0x150e/0x5070 kernel/sched/core.c:6866
 __schedule_loop kernel/sched/core.c:6948 [inline]
 schedule+0x165/0x360 kernel/sched/core.c:6963
 schedule_timeout+0x12b/0x270 kernel/time/sleep_timeout.c:99
 rcu_gp_fqs_loop+0x301/0x1540 kernel/rcu/tree.c:2083
 rcu_gp_kthread+0x99/0x390 kernel/rcu/tree.c:2285
 kthread+0x711/0x8a0 kernel/kthread.c:463
 ret_from_fork+0x599/0xb30 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:246
 </TASK>
rcu: Stack dump where RCU GP kthread last ran:
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0
CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted syzkaller #0 PREEMPT(full) 
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/25/2025
RIP: 0010:pv_native_safe_halt+0x13/0x20 arch/x86/kernel/paravirt.c:82
Code: cc cc cc cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 66 90 0f 00 2d 33 9e 2a 00 f3 0f 1e fa fb f4 <e9> 48 ee 02 00 cc cc cc cc cc cc cc cc 90 90 90 90 90 90 90 90 90
RSP: 0000:ffffffff8de07d80 EFLAGS: 000002c6
RAX: 741d0ec708556100 RBX: ffffffff819786ba RCX: 741d0ec708556100
RDX: 0000000000000001 RSI: ffffffff8d997e54 RDI: ffffffff8be07960
RBP: ffffffff8de07ea8 R08: ffff8880b86336db R09: 1ffff110170c66db
R10: dffffc0000000000 R11: ffffed10170c66dc R12: ffffffff8fa22f70
R13: 1ffffffff1bd29b8 R14: 0000000000000000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff888125c25000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000000755b0000 CR4: 00000000003526f0
Call Trace:
 <TASK>
 arch_safe_halt arch/x86/include/asm/paravirt.h:107 [inline]
 default_idle+0x13/0x20 arch/x86/kernel/process.c:767
 default_idle_call+0x73/0xb0 kernel/sched/idle.c:122
 cpuidle_idle_call kernel/sched/idle.c:191 [inline]
 do_idle+0x1ea/0x520 kernel/sched/idle.c:332
 cpu_startup_entry+0x44/0x60 kernel/sched/idle.c:430
 rest_init+0x2de/0x300 init/main.c:758
 start_kernel+0x3ac/0x400 init/main.c:1208
 x86_64_start_reservations+0x24/0x30 arch/x86/kernel/head64.c:310
 x86_64_start_kernel+0x143/0x1c0 arch/x86/kernel/head64.c:291
 common_startup_64+0x13e/0x147
 </TASK>


---
If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.

* [PATCH] module: show module version directly in print_modules()
From: Yafang Shao @ 2025-12-29  2:45 UTC (permalink / raw)
  To: mcgrof, petr.pavlu, da.gomez, samitolvanen, atomlin
  Cc: linux-modules, Yafang Shao

We maintain a vmcore analysis script on each server that automatically
parses /var/crash/XXXX/vmcore-dmesg.txt to categorize vmcores. This helps
us save considerable effort by avoiding analysis of known bugs.

For vmcores triggered by a driver bug, the system calls print_modules() to
list the loaded modules. However, print_modules() does not output module
version information. Across a large fleet of servers, there are often many
different module versions running simultaneously, and we need to know which
driver version caused a given vmcore.

Currently, the only reliable way to obtain the module version associated
with a vmcore is to analyze the /var/crash/XXXX/vmcore file itself—an
operation that is resource-intensive. Therefore, we propose printing the
driver version directly in the log, which is far more efficient.

- Before this patch

  Modules linked in: xfs nvidia(PO) nvme_core mlx_compat(O)
  Unloaded tainted modules: nvidia_peermem(PO):1

- After this patch

  Modules linked in: xfs nvidia-535.274.02(PO) nvme_core-1.0 mlx_compat(O)
  Unloaded tainted modules: nvidia_peermem(PO):1

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 kernel/module/main.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/kernel/module/main.c b/kernel/module/main.c
index 710ee30b3bea..1ad9afec8730 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -3901,7 +3901,10 @@ void print_modules(void)
 	list_for_each_entry_rcu(mod, &modules, list) {
 		if (mod->state == MODULE_STATE_UNFORMED)
 			continue;
-		pr_cont(" %s%s", mod->name, module_flags(mod, buf, true));
+		pr_cont(" %s", mod->name);
+		if (mod->version)
+			pr_cont("-%s", mod->version);
+		pr_cont("%s", module_flags(mod, buf, true));
 	}

 	print_unloaded_tainted_modules();
-- 
2.43.5
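
On the log-parsing side, here is a minimal sketch of how an analysis tool
might split one entry of the "Modules linked in:" line into name, version
and flags once the version is printed as name-version(flags). It is an
illustration only: the struct and function names are hypothetical, and it
assumes module names contain no '-', so the first '-' before the flags
starts the version.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical result of splitting one "Modules linked in:" entry. */
struct mod_token {
	char name[64];
	char version[64];	/* empty if no version was printed */
	char flags[16];		/* flags without parentheses, may be empty */
};

/* Split e.g. "nvidia-535.274.02(PO)" into "nvidia", "535.274.02", "PO". */
static bool split_module_token(const char *tok, struct mod_token *out)
{
	const char *end = tok + strlen(tok);
	const char *paren = strrchr(tok, '(');
	const char *dash, *name_end;

	memset(out, 0, sizeof(*out));

	/* Peel off a trailing "(...)" flags group first. */
	if (paren && end > paren + 1 && end[-1] == ')') {
		snprintf(out->flags, sizeof(out->flags), "%.*s",
			 (int)(end - paren - 2), paren + 1);
		end = paren;
	}

	/* Assumption: the first '-' before the flags starts the version. */
	dash = memchr(tok, '-', (size_t)(end - tok));
	name_end = dash ? dash : end;
	if (name_end == tok)
		return false;	/* empty module name */

	snprintf(out->name, sizeof(out->name), "%.*s",
		 (int)(name_end - tok), tok);
	if (dash)
		snprintf(out->version, sizeof(out->version), "%.*s",
			 (int)(end - dash - 1), dash + 1);
	return true;
}
```

The flags are stripped before searching for '-' because the flags group
itself may contain '-' when a module is being unloaded.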

* Re: [PATCH v4 7/7] kernel.h: drop trace_printk.h
From: Andrew Morton @ 2025-12-28 21:31 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Yury Norov (NVIDIA), Masami Hiramatsu, Mathieu Desnoyers,
	Andy Shevchenko, Christophe Leroy, Randy Dunlap, Ingo Molnar,
	Jani Nikula, Joonas Lahtinen, David Laight, Petr Pavlu,
	Andi Shyti, Rodrigo Vivi, Tvrtko Ursulin, Daniel Gomez,
	Greg Kroah-Hartman, Rafael J. Wysocki, Danilo Krummrich,
	linux-kernel, intel-gfx, dri-devel, linux-modules,
	linux-trace-kernel
In-Reply-To: <20251226115848.298465d4@gandalf.local.home>

On Fri, 26 Dec 2025 11:58:48 -0500 Steven Rostedt <rostedt@goodmis.org> wrote:

> On Thu, 25 Dec 2025 12:09:29 -0500
> "Yury Norov (NVIDIA)" <yury.norov@gmail.com> wrote:
> 
> > The trace_printk.h header is debugging-only by nature, but now it's
> > included by almost every compilation unit via kernel.h.
> > 
> > Removing trace_printk.h saves 1.5-2% of compilation time on my
> > Ubuntu-derived x86_64/localyesconfig.
> > 
> > There are ~30 files in the codebase that require trace_printk.h for
> > non-debugging reasons: mostly to disable tracing on panic or under
> > similar conditions. Include the header for those explicitly.
> > 
> > This implicitly decouples linux/kernel.h and linux/instruction_pointer.h
> > as well, because it has been isolated to trace_printk.h early in the
> > series.
> > 
> > Signed-off-by: Yury Norov (NVIDIA) <yury.norov@gmail.com>
> 
> I'm still against this patch. It means every time someone adds
> trace_printk() they need to add the header for it.
> 
> trace_printk() should be as available to the kernel as printk() is.

um, why?  trace_printk is used 1% as often as is printk.  Seems
reasonable to include a header file to access such a rarely-used(!) and
specialized thing?

* Re: [PATCH v4 7/7] kernel.h: drop trace_printk.h
From: Steven Rostedt @ 2025-12-27 21:27 UTC (permalink / raw)
  To: Yury Norov
  Cc: Andy Shevchenko, Andrew Morton, Masami Hiramatsu,
	Mathieu Desnoyers, Christophe Leroy, Randy Dunlap, Ingo Molnar,
	Jani Nikula, Joonas Lahtinen, David Laight, Petr Pavlu,
	Andi Shyti, Rodrigo Vivi, Tvrtko Ursulin, Daniel Gomez,
	Greg Kroah-Hartman, Rafael J. Wysocki, Danilo Krummrich,
	linux-kernel, intel-gfx, dri-devel, linux-modules,
	linux-trace-kernel
In-Reply-To: <aVA1GGfWAHSFdACF@yury>

On Sat, 27 Dec 2025 14:35:52 -0500
Yury Norov <yury.norov@gmail.com> wrote:

> The difference is that printk() is not a debugging tool.

Several developers will disagree with you. In fact, Linus has said he uses
printk() as his preferred debugging tool!

The only reason to have printk.h in kernel.h is because it *is* used for
debugging! If it wasn't used for debugging, then you could simply add
printk.h for those places that needed to use printk(). But because it is
one of the most common debugging tools, having it in kernel.h is useful, as
you don't want to have to add #include <printk.h> every time you add a
printk() for debugging purposes (the same is true for trace_printk()).

Yes, it is also used for information. But if that's all it was used for,
then it wouldn't need to be in kernel.h. It could be a normal header file
that anything that needed to print information would have to include.

-- Steve
