public inbox for linux-modules@vger.kernel.org
* [PATCH v3] module: print version for external modules in print_modules()
@ 2026-03-10  2:38 Yafang Shao
From: Yafang Shao @ 2026-03-10  2:38 UTC
  To: mcgrof, petr.pavlu, da.gomez, samitolvanen, atomlin
  Cc: linux-modules, Yafang Shao

We maintain a vmcore analysis script on each server that automatically
parses /var/crash/XXXX/vmcore-dmesg.txt to categorize vmcores. This saves
us considerable effort because vmcores caused by known bugs do not need to
be analyzed again.

For vmcores triggered by a driver bug, the kernel calls print_modules() to
list the loaded modules in the crash log. However, print_modules() does not
output module version information. Across a large fleet of servers, many
different module versions often run simultaneously, and we need to know
which driver version caused a given vmcore.

Currently, the only reliable way to obtain the module version associated
with a vmcore is to analyze the /var/crash/XXXX/vmcore file itself, which
is resource-intensive. We therefore propose printing the driver version
directly in the log, where it is far cheaper to extract.

The motivation behind this change is that the external NVIDIA driver
[0] frequently causes kernel panics across our server fleet. Although we
continuously upgrade to newer NVIDIA driver versions, rolling an upgrade
out to the entire fleet takes time, so we need to identify which driver
version is responsible for each panic.

In-tree modules are already tied to a specific kernel version, so
printing their versions is redundant. For external drivers (such as
proprietary networking or GPU stacks), however, the version is the single
most critical piece of metadata for triage. Therefore, to avoid bloating
the list of loaded modules, we only print the version for external
modules.

- Before this patch

  Modules linked in: mlx5_core(O) nvidia(PO) nvme_core

- After this patch

  Modules linked in: mlx5_core-5.8-2.0.3(O) nvidia-535.274.02(PO) nvme_core
                              ^^^^^^^^^^          ^^^^^^^^^^^

Note: nvme_core is an in-tree module [1], so its version is not printed.

As pointed out by Sami, we must ensure mod->version is valid in
print_modules():

 : We release the memory for mod->version in:
 :
 :   free_module
 :     -> module_remove_modinfo_attrs
 :     -> attr->free = free_modinfo_version
 :
 : And this happens before the module is removed from the list.
 : Couldn't there be a race condition where we read a non-NULL
 : mod->version here, but the buffer is being concurrently released
 : by another core that's unloading the module, resulting in a
 : use-after-free in the pr_cont call?
 :
 : In order to do this safely, we should presumably drop the attr->free
 : call from module_remove_modinfo_attrs and release the attributes
 : only after the synchronize_rcu call in free_module (there's already
 : free_modinfo we can use), so mod->version is valid for the entire
 : time the module is on the list.

Link: https://github.com/NVIDIA/open-gpu-kernel-modules/tags [0]
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/nvme/host/core.c?h=v6.19-rc3#n5448 [1]
Suggested-by: Petr Pavlu <petr.pavlu@suse.com>
Suggested-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Aaron Tomlin <atomlin@atomlin.com>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 kernel/module/main.c  | 29 +++++++++++++++++------------
 kernel/module/sysfs.c |  2 --
 2 files changed, 17 insertions(+), 14 deletions(-)

---
v2->v3:
- ensure mod->version is valid when printing it. (Sami)

v1->v2:
- print it for external module only (Petr, Aaron)
- add comment for it (Aaron)

diff --git a/kernel/module/main.c b/kernel/module/main.c
index 2bac4c7cd019..c8f41fa90f8a 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -1384,6 +1384,17 @@ static void free_mod_mem(struct module *mod)
 	module_memory_free(mod, MOD_DATA);
 }
 
+static void free_modinfo(struct module *mod)
+{
+	const struct module_attribute *attr;
+	int i;
+
+	for (i = 0; (attr = modinfo_attrs[i]); i++) {
+		if (attr->free)
+			attr->free(mod);
+	}
+}
+
 /* Free a module, remove from lists, etc. */
 static void free_module(struct module *mod)
 {
@@ -1422,6 +1433,7 @@ static void free_module(struct module *mod)
 	module_bug_cleanup(mod);
 	/* Wait for RCU synchronizing before releasing mod->list and buglist. */
 	synchronize_rcu();
+	free_modinfo(mod);
 	if (try_add_tainted_module(mod))
 		pr_err("%s: adding tainted module to the unloaded tainted modules list failed.\n",
 		       mod->name);
@@ -1779,17 +1791,6 @@ static int setup_modinfo(struct module *mod, struct load_info *info)
 	return 0;
 }
 
-static void free_modinfo(struct module *mod)
-{
-	const struct module_attribute *attr;
-	int i;
-
-	for (i = 0; (attr = modinfo_attrs[i]); i++) {
-		if (attr->free)
-			attr->free(mod);
-	}
-}
-
 bool __weak module_init_section(const char *name)
 {
 	return strstarts(name, ".init");
@@ -3901,7 +3902,11 @@ void print_modules(void)
 	list_for_each_entry_rcu(mod, &modules, list) {
 		if (mod->state == MODULE_STATE_UNFORMED)
 			continue;
-		pr_cont(" %s%s", mod->name, module_flags(mod, buf, true));
+		pr_cont(" %s", mod->name);
+		/* Only append version for out-of-tree modules */
+		if (mod->version && test_bit(TAINT_OOT_MODULE, &mod->taints))
+			pr_cont("-%s", mod->version);
+		pr_cont("%s", module_flags(mod, buf, true));
 	}
 
 	print_unloaded_tainted_modules();
diff --git a/kernel/module/sysfs.c b/kernel/module/sysfs.c
index 01c65d608873..17d1796d6dc7 100644
--- a/kernel/module/sysfs.c
+++ b/kernel/module/sysfs.c
@@ -278,8 +278,6 @@ static void module_remove_modinfo_attrs(struct module *mod, int end)
 		if (!attr->attr.name)
 			break;
 		sysfs_remove_file(&mod->mkobj.kobj, &attr->attr);
-		if (attr->free)
-			attr->free(mod);
 	}
 	kfree(mod->modinfo_attrs);
 }
-- 
2.47.3


Thread overview: 16+ messages
2026-03-10  2:38 [PATCH v3] module: print version for external modules in print_modules() Yafang Shao
2026-03-10  6:31 ` Christoph Hellwig
2026-03-10 13:04   ` Yafang Shao
2026-03-10 13:07     ` Christoph Hellwig
2026-03-10 13:11       ` Yafang Shao
2026-03-10 13:14         ` Christoph Hellwig
2026-03-10 13:19           ` Yafang Shao
2026-03-10 13:21             ` Christoph Hellwig
2026-03-10 13:30               ` Yafang Shao
2026-03-10 13:33                 ` Christoph Hellwig
2026-03-10 13:35                   ` Yafang Shao
2026-03-10 13:43                     ` Christoph Hellwig
2026-03-10 13:44                       ` Yafang Shao
2026-03-10 13:47                         ` Christoph Hellwig
2026-03-10 13:49                           ` Yafang Shao
2026-03-11 22:44   ` Sami Tolvanen