* [PATCH v2 0/3] module: Fix freeing of charp module parameters when CONFIG_SYSFS=n
From: Petr Pavlu @ 2026-03-13 13:48 UTC (permalink / raw)
To: Luis Chamberlain, Petr Pavlu, Daniel Gomez, Sami Tolvanen
Cc: Christophe Leroy, Aaron Tomlin, Greg Kroah-Hartman,
Rafael J. Wysocki, Danilo Krummrich, linux-modules, linux-kernel
Fix freeing of charp module parameters when CONFIG_SYSFS=n and, related to
this, update moduleparam.h to keep its coding style consistent.
Changes since v1 [1]:
* Remove the extern keyword from the declaration of module_destroy_params()
and update the type of its num parameter from `unsigned` to
`unsigned int`.
* Add a cleanup patch for parse_args() to similarly update its num
parameter to `unsigned int` and to synchronize the parameter names
between its prototype and definition.
* Add a cleanup patch to drop the unnecessary extern keyword for all
function declarations in moduleparam.h.
[1] https://lore.kernel.org/linux-modules/20260306125457.1377402-1-petr.pavlu@suse.com/
Petr Pavlu (3):
module: Fix freeing of charp module parameters when CONFIG_SYSFS=n
module: Clean up parse_args() arguments
module: Remove extern keyword from param prototypes
include/linux/moduleparam.h | 100 +++++++++++++++++-------------------
kernel/module/main.c | 4 +-
kernel/params.c | 29 +++++++----
3 files changed, 68 insertions(+), 65 deletions(-)
base-commit: 0257f64bdac7fdca30fa3cae0df8b9ecbec7733a
--
2.53.0
^ permalink raw reply
* [PATCH v2 1/3] module: Fix freeing of charp module parameters when CONFIG_SYSFS=n
From: Petr Pavlu @ 2026-03-13 13:48 UTC (permalink / raw)
To: Luis Chamberlain, Petr Pavlu, Daniel Gomez, Sami Tolvanen
Cc: Christophe Leroy, Aaron Tomlin, Greg Kroah-Hartman,
Rafael J. Wysocki, Danilo Krummrich, linux-modules, linux-kernel
In-Reply-To: <20260313134932.335275-1-petr.pavlu@suse.com>
When setting a charp module parameter, the param_set_charp() function
allocates memory to store a copy of the input value. Later, when the module
is potentially unloaded, the destroy_params() function is called to free
this allocated memory.
However, destroy_params() is available only when CONFIG_SYSFS=y, otherwise
only a dummy variant is present. In the unlikely case that the kernel is
configured with CONFIG_MODULES=y and CONFIG_SYSFS=n, this results in
a memory leak of charp values when a module is unloaded.
Fix this issue by making destroy_params() always available when
CONFIG_MODULES=y. Rename the function to module_destroy_params() to clarify
that it is intended for use by the module loader.
Fixes: e180a6b7759a ("param: fix charp parameters set via sysfs")
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
---
include/linux/moduleparam.h | 11 +++--------
kernel/module/main.c | 4 ++--
kernel/params.c | 27 ++++++++++++++++++---------
3 files changed, 23 insertions(+), 19 deletions(-)
diff --git a/include/linux/moduleparam.h b/include/linux/moduleparam.h
index 7d22d4c4ea2e..8667f72503d9 100644
--- a/include/linux/moduleparam.h
+++ b/include/linux/moduleparam.h
@@ -426,14 +426,9 @@ extern char *parse_args(const char *name,
void *arg, parse_unknown_fn unknown);
/* Called by module remove. */
-#ifdef CONFIG_SYSFS
-extern void destroy_params(const struct kernel_param *params, unsigned num);
-#else
-static inline void destroy_params(const struct kernel_param *params,
- unsigned num)
-{
-}
-#endif /* !CONFIG_SYSFS */
+#ifdef CONFIG_MODULES
+void module_destroy_params(const struct kernel_param *params, unsigned int num);
+#endif
/* All the helper functions */
/* The macros to do compile-time type checking stolen from Jakub
diff --git a/kernel/module/main.c b/kernel/module/main.c
index c3ce106c70af..ef2e2130972f 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -1408,7 +1408,7 @@ static void free_module(struct module *mod)
module_unload_free(mod);
/* Free any allocated parameters. */
- destroy_params(mod->kp, mod->num_kp);
+ module_destroy_params(mod->kp, mod->num_kp);
if (is_livepatch_module(mod))
free_module_elf(mod);
@@ -3519,7 +3519,7 @@ static int load_module(struct load_info *info, const char __user *uargs,
mod_sysfs_teardown(mod);
coming_cleanup:
mod->state = MODULE_STATE_GOING;
- destroy_params(mod->kp, mod->num_kp);
+ module_destroy_params(mod->kp, mod->num_kp);
blocking_notifier_call_chain(&module_notify_list,
MODULE_STATE_GOING, mod);
klp_module_going(mod);
diff --git a/kernel/params.c b/kernel/params.c
index 7188a12dbe86..c6a354d54213 100644
--- a/kernel/params.c
+++ b/kernel/params.c
@@ -745,15 +745,6 @@ void module_param_sysfs_remove(struct module *mod)
}
#endif
-void destroy_params(const struct kernel_param *params, unsigned num)
-{
- unsigned int i;
-
- for (i = 0; i < num; i++)
- if (params[i].ops->free)
- params[i].ops->free(params[i].arg);
-}
-
struct module_kobject * __init_or_module
lookup_or_create_module_kobject(const char *name)
{
@@ -985,3 +976,21 @@ static int __init param_sysfs_builtin_init(void)
late_initcall(param_sysfs_builtin_init);
#endif /* CONFIG_SYSFS */
+
+#ifdef CONFIG_MODULES
+
+/*
+ * module_destroy_params - free all parameters for one module
+ * @params: module parameters (array)
+ * @num: number of module parameters
+ */
+void module_destroy_params(const struct kernel_param *params, unsigned int num)
+{
+ unsigned int i;
+
+ for (i = 0; i < num; i++)
+ if (params[i].ops->free)
+ params[i].ops->free(params[i].arg);
+}
+
+#endif /* CONFIG_MODULES */
--
2.53.0
^ permalink raw reply related
* [PATCH v2 2/3] module: Clean up parse_args() arguments
From: Petr Pavlu @ 2026-03-13 13:48 UTC (permalink / raw)
To: Luis Chamberlain, Petr Pavlu, Daniel Gomez, Sami Tolvanen
Cc: Christophe Leroy, Aaron Tomlin, Greg Kroah-Hartman,
Rafael J. Wysocki, Danilo Krummrich, linux-modules, linux-kernel
In-Reply-To: <20260313134932.335275-1-petr.pavlu@suse.com>
* Use the preferred `unsigned int` over plain `unsigned` for the `num`
parameter.
* Synchronize the parameter names in moduleparam.h with the ones used by
the implementation in params.c.
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
---
include/linux/moduleparam.h | 8 ++++----
kernel/params.c | 2 +-
2 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/include/linux/moduleparam.h b/include/linux/moduleparam.h
index 8667f72503d9..604bc6e9f3a1 100644
--- a/include/linux/moduleparam.h
+++ b/include/linux/moduleparam.h
@@ -417,12 +417,12 @@ extern bool parameqn(const char *name1, const char *name2, size_t n);
typedef int (*parse_unknown_fn)(char *param, char *val, const char *doing, void *arg);
/* Called on module insert or kernel boot */
-extern char *parse_args(const char *name,
+extern char *parse_args(const char *doing,
char *args,
const struct kernel_param *params,
- unsigned num,
- s16 level_min,
- s16 level_max,
+ unsigned int num,
+ s16 min_level,
+ s16 max_level,
void *arg, parse_unknown_fn unknown);
/* Called by module remove. */
diff --git a/kernel/params.c b/kernel/params.c
index c6a354d54213..74d620bc2521 100644
--- a/kernel/params.c
+++ b/kernel/params.c
@@ -161,7 +161,7 @@ static int parse_one(char *param,
char *parse_args(const char *doing,
char *args,
const struct kernel_param *params,
- unsigned num,
+ unsigned int num,
s16 min_level,
s16 max_level,
void *arg, parse_unknown_fn unknown)
--
2.53.0
^ permalink raw reply related
* [PATCH v2 3/3] module: Remove extern keyword from param prototypes
From: Petr Pavlu @ 2026-03-13 13:48 UTC (permalink / raw)
To: Luis Chamberlain, Petr Pavlu, Daniel Gomez, Sami Tolvanen
Cc: Christophe Leroy, Aaron Tomlin, Greg Kroah-Hartman,
Rafael J. Wysocki, Danilo Krummrich, linux-modules, linux-kernel
In-Reply-To: <20260313134932.335275-1-petr.pavlu@suse.com>
The external function declarations do not need the "extern" keyword. Remove
it to align with the Linux kernel coding style and to silence the
associated checkpatch warnings.
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
---
include/linux/moduleparam.h | 89 ++++++++++++++++++-------------------
1 file changed, 44 insertions(+), 45 deletions(-)
diff --git a/include/linux/moduleparam.h b/include/linux/moduleparam.h
index 604bc6e9f3a1..075f28585074 100644
--- a/include/linux/moduleparam.h
+++ b/include/linux/moduleparam.h
@@ -317,8 +317,8 @@ struct kparam_array
name, &__param_ops_##name, arg, perm, -1, 0)
#ifdef CONFIG_SYSFS
-extern void kernel_param_lock(struct module *mod);
-extern void kernel_param_unlock(struct module *mod);
+void kernel_param_lock(struct module *mod);
+void kernel_param_unlock(struct module *mod);
#else
static inline void kernel_param_lock(struct module *mod)
{
@@ -398,7 +398,7 @@ static inline void kernel_param_unlock(struct module *mod)
* Returns: true if the two parameter names are equal.
* Dashes (-) are considered equal to underscores (_).
*/
-extern bool parameq(const char *name1, const char *name2);
+bool parameq(const char *name1, const char *name2);
/**
* parameqn - checks if two parameter names match
@@ -412,18 +412,18 @@ extern bool parameq(const char *name1, const char *name2);
* are equal.
* Dashes (-) are considered equal to underscores (_).
*/
-extern bool parameqn(const char *name1, const char *name2, size_t n);
+bool parameqn(const char *name1, const char *name2, size_t n);
typedef int (*parse_unknown_fn)(char *param, char *val, const char *doing, void *arg);
/* Called on module insert or kernel boot */
-extern char *parse_args(const char *doing,
- char *args,
- const struct kernel_param *params,
- unsigned int num,
- s16 min_level,
- s16 max_level,
- void *arg, parse_unknown_fn unknown);
+char *parse_args(const char *doing,
+ char *args,
+ const struct kernel_param *params,
+ unsigned int num,
+ s16 min_level,
+ s16 max_level,
+ void *arg, parse_unknown_fn unknown);
/* Called by module remove. */
#ifdef CONFIG_MODULES
@@ -437,78 +437,77 @@ void module_destroy_params(const struct kernel_param *params, unsigned int num);
static inline type __always_unused *__check_##name(void) { return(p); }
extern const struct kernel_param_ops param_ops_byte;
-extern int param_set_byte(const char *val, const struct kernel_param *kp);
-extern int param_get_byte(char *buffer, const struct kernel_param *kp);
+int param_set_byte(const char *val, const struct kernel_param *kp);
+int param_get_byte(char *buffer, const struct kernel_param *kp);
#define param_check_byte(name, p) __param_check(name, p, unsigned char)
extern const struct kernel_param_ops param_ops_short;
-extern int param_set_short(const char *val, const struct kernel_param *kp);
-extern int param_get_short(char *buffer, const struct kernel_param *kp);
+int param_set_short(const char *val, const struct kernel_param *kp);
+int param_get_short(char *buffer, const struct kernel_param *kp);
#define param_check_short(name, p) __param_check(name, p, short)
extern const struct kernel_param_ops param_ops_ushort;
-extern int param_set_ushort(const char *val, const struct kernel_param *kp);
-extern int param_get_ushort(char *buffer, const struct kernel_param *kp);
+int param_set_ushort(const char *val, const struct kernel_param *kp);
+int param_get_ushort(char *buffer, const struct kernel_param *kp);
#define param_check_ushort(name, p) __param_check(name, p, unsigned short)
extern const struct kernel_param_ops param_ops_int;
-extern int param_set_int(const char *val, const struct kernel_param *kp);
-extern int param_get_int(char *buffer, const struct kernel_param *kp);
+int param_set_int(const char *val, const struct kernel_param *kp);
+int param_get_int(char *buffer, const struct kernel_param *kp);
#define param_check_int(name, p) __param_check(name, p, int)
extern const struct kernel_param_ops param_ops_uint;
-extern int param_set_uint(const char *val, const struct kernel_param *kp);
-extern int param_get_uint(char *buffer, const struct kernel_param *kp);
+int param_set_uint(const char *val, const struct kernel_param *kp);
+int param_get_uint(char *buffer, const struct kernel_param *kp);
int param_set_uint_minmax(const char *val, const struct kernel_param *kp,
unsigned int min, unsigned int max);
#define param_check_uint(name, p) __param_check(name, p, unsigned int)
extern const struct kernel_param_ops param_ops_long;
-extern int param_set_long(const char *val, const struct kernel_param *kp);
-extern int param_get_long(char *buffer, const struct kernel_param *kp);
+int param_set_long(const char *val, const struct kernel_param *kp);
+int param_get_long(char *buffer, const struct kernel_param *kp);
#define param_check_long(name, p) __param_check(name, p, long)
extern const struct kernel_param_ops param_ops_ulong;
-extern int param_set_ulong(const char *val, const struct kernel_param *kp);
-extern int param_get_ulong(char *buffer, const struct kernel_param *kp);
+int param_set_ulong(const char *val, const struct kernel_param *kp);
+int param_get_ulong(char *buffer, const struct kernel_param *kp);
#define param_check_ulong(name, p) __param_check(name, p, unsigned long)
extern const struct kernel_param_ops param_ops_ullong;
-extern int param_set_ullong(const char *val, const struct kernel_param *kp);
-extern int param_get_ullong(char *buffer, const struct kernel_param *kp);
+int param_set_ullong(const char *val, const struct kernel_param *kp);
+int param_get_ullong(char *buffer, const struct kernel_param *kp);
#define param_check_ullong(name, p) __param_check(name, p, unsigned long long)
extern const struct kernel_param_ops param_ops_hexint;
-extern int param_set_hexint(const char *val, const struct kernel_param *kp);
-extern int param_get_hexint(char *buffer, const struct kernel_param *kp);
+int param_set_hexint(const char *val, const struct kernel_param *kp);
+int param_get_hexint(char *buffer, const struct kernel_param *kp);
#define param_check_hexint(name, p) param_check_uint(name, p)
extern const struct kernel_param_ops param_ops_charp;
-extern int param_set_charp(const char *val, const struct kernel_param *kp);
-extern int param_get_charp(char *buffer, const struct kernel_param *kp);
-extern void param_free_charp(void *arg);
+int param_set_charp(const char *val, const struct kernel_param *kp);
+int param_get_charp(char *buffer, const struct kernel_param *kp);
+void param_free_charp(void *arg);
#define param_check_charp(name, p) __param_check(name, p, char *)
/* We used to allow int as well as bool. We're taking that away! */
extern const struct kernel_param_ops param_ops_bool;
-extern int param_set_bool(const char *val, const struct kernel_param *kp);
-extern int param_get_bool(char *buffer, const struct kernel_param *kp);
+int param_set_bool(const char *val, const struct kernel_param *kp);
+int param_get_bool(char *buffer, const struct kernel_param *kp);
#define param_check_bool(name, p) __param_check(name, p, bool)
extern const struct kernel_param_ops param_ops_bool_enable_only;
-extern int param_set_bool_enable_only(const char *val,
- const struct kernel_param *kp);
+int param_set_bool_enable_only(const char *val, const struct kernel_param *kp);
/* getter is the same as for the regular bool */
#define param_check_bool_enable_only param_check_bool
extern const struct kernel_param_ops param_ops_invbool;
-extern int param_set_invbool(const char *val, const struct kernel_param *kp);
-extern int param_get_invbool(char *buffer, const struct kernel_param *kp);
+int param_set_invbool(const char *val, const struct kernel_param *kp);
+int param_get_invbool(char *buffer, const struct kernel_param *kp);
#define param_check_invbool(name, p) __param_check(name, p, bool)
/* An int, which can only be set like a bool (though it shows as an int). */
extern const struct kernel_param_ops param_ops_bint;
-extern int param_set_bint(const char *val, const struct kernel_param *kp);
+int param_set_bint(const char *val, const struct kernel_param *kp);
#define param_get_bint param_get_int
#define param_check_bint param_check_int
@@ -615,19 +614,19 @@ enum hwparam_type {
extern const struct kernel_param_ops param_array_ops;
extern const struct kernel_param_ops param_ops_string;
-extern int param_set_copystring(const char *val, const struct kernel_param *);
-extern int param_get_string(char *buffer, const struct kernel_param *kp);
+int param_set_copystring(const char *val, const struct kernel_param *kp);
+int param_get_string(char *buffer, const struct kernel_param *kp);
/* for exporting parameters in /sys/module/.../parameters */
struct module;
#if defined(CONFIG_SYSFS) && defined(CONFIG_MODULES)
-extern int module_param_sysfs_setup(struct module *mod,
- const struct kernel_param *kparam,
- unsigned int num_params);
+int module_param_sysfs_setup(struct module *mod,
+ const struct kernel_param *kparam,
+ unsigned int num_params);
-extern void module_param_sysfs_remove(struct module *mod);
+void module_param_sysfs_remove(struct module *mod);
#else
static inline int module_param_sysfs_setup(struct module *mod,
const struct kernel_param *kparam,
--
2.53.0
^ permalink raw reply related
* [PATCH] module: remove MODULE_VERSION()
From: Greg Kroah-Hartman @ 2026-03-13 14:20 UTC (permalink / raw)
To: linux-modules
Cc: linux-kernel, Greg Kroah-Hartman, Luis Chamberlain, Petr Pavlu,
Daniel Gomez, Sami Tolvanen, Aaron Tomlin, Shyam Saini, Kees Cook,
Thorsten Blum, Christoph Hellwig
Module "versions" do not make sense as the kernel is built all at once,
the "version" is the overall kernel version number, so modules can not
really be described as having a unique version given that they rely on
the infrastructure of the whole kernel.
For now, just make this an "empty" define, to keep existing code
building properly as the tree is slowly purged of the use of this over
time.
This macro will be removed entirely in the future when there are no
in-tree users.
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Petr Pavlu <petr.pavlu@suse.com>
Cc: Daniel Gomez <da.gomez@kernel.org>
Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: Aaron Tomlin <atomlin@atomlin.com>
Cc: Shyam Saini <shyamsaini@linux.microsoft.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Thorsten Blum <thorsten.blum@linux.dev>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
include/linux/module.h | 56 +++++++++---------------------------------
kernel/params.c | 30 ----------------------
2 files changed, 11 insertions(+), 75 deletions(-)
diff --git a/include/linux/module.h b/include/linux/module.h
index 14f391b186c6..37cb369b4c3a 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -62,15 +62,6 @@ struct module_attribute {
void (*free)(struct module *);
};
-struct module_version_attribute {
- struct module_attribute mattr;
- const char *module_name;
- const char *version;
-};
-
-extern ssize_t __modver_version_show(const struct module_attribute *,
- struct module_kobject *, char *);
-
extern const struct module_attribute module_uevent;
/* These are either module local, or the kernel's dummy ones. */
@@ -256,43 +247,18 @@ struct module_kobject *lookup_or_create_module_kobject(const char *name);
static typeof(name) __mod_device_table(type, name) \
__attribute__ ((used, alias(__stringify(name))))
-/* Version of form [<epoch>:]<version>[-<extra-version>].
- * Or for CVS/RCS ID version, everything but the number is stripped.
- * <epoch>: A (small) unsigned integer which allows you to start versions
- * anew. If not mentioned, it's zero. eg. "2:1.0" is after
- * "1:2.0".
-
- * <version>: The <version> may contain only alphanumerics and the
- * character `.'. Ordered by numeric sort for numeric parts,
- * ascii sort for ascii parts (as per RPM or DEB algorithm).
-
- * <extraversion>: Like <version>, but inserted for local
- * customizations, eg "rh3" or "rusty1".
-
- * Using this automatically adds a checksum of the .c files and the
- * local headers in "srcversion".
+/*
+ * Module "versions" do not make sense as the kernel is built all at once, the
+ * "version" is the overall kernel version number, so modules can not really be
+ * described as having a unique version given that they rely on the
+ * infrastructure of the whole kernel.
+ *
+ * For now, just make this an "empty" define, to keep existing code building
+ * properly as the tree is slowly purged of the use of this over time.
+ *
+ * It will be removed in the future when there are no in-tree users.
*/
-
-#if defined(MODULE) || !defined(CONFIG_SYSFS)
-#define MODULE_VERSION(_version) MODULE_INFO(version, _version)
-#else
-#define MODULE_VERSION(_version) \
- MODULE_INFO(version, _version); \
- static const struct module_version_attribute __modver_attr \
- __used __section("__modver") \
- __aligned(__alignof__(struct module_version_attribute)) \
- = { \
- .mattr = { \
- .attr = { \
- .name = "version", \
- .mode = S_IRUGO, \
- }, \
- .show = __modver_version_show, \
- }, \
- .module_name = KBUILD_MODNAME, \
- .version = _version, \
- }
-#endif
+#define MODULE_VERSION(_version)
/* Optional firmware file (or files) needed by the module
* format is simply firmware file name. Multiple firmware
diff --git a/kernel/params.c b/kernel/params.c
index 7188a12dbe86..1b14b1ab5fcb 100644
--- a/kernel/params.c
+++ b/kernel/params.c
@@ -846,35 +846,6 @@ static void __init param_sysfs_builtin(void)
}
}
-ssize_t __modver_version_show(const struct module_attribute *mattr,
- struct module_kobject *mk, char *buf)
-{
- const struct module_version_attribute *vattr =
- container_of_const(mattr, struct module_version_attribute, mattr);
-
- return scnprintf(buf, PAGE_SIZE, "%s\n", vattr->version);
-}
-
-extern const struct module_version_attribute __start___modver[];
-extern const struct module_version_attribute __stop___modver[];
-
-static void __init version_sysfs_builtin(void)
-{
- const struct module_version_attribute *vattr;
- struct module_kobject *mk;
- int err;
-
- for (vattr = __start___modver; vattr < __stop___modver; vattr++) {
- mk = lookup_or_create_module_kobject(vattr->module_name);
- if (mk) {
- err = sysfs_create_file(&mk->kobj, &vattr->mattr.attr);
- WARN_ON_ONCE(err);
- kobject_uevent(&mk->kobj, KOBJ_ADD);
- kobject_put(&mk->kobj);
- }
- }
-}
-
/* module-related sysfs stuff */
static ssize_t module_attr_show(struct kobject *kobj,
@@ -977,7 +948,6 @@ static int __init param_sysfs_builtin_init(void)
if (!module_kset)
return -ENOMEM;
- version_sysfs_builtin();
param_sysfs_builtin();
return 0;
--
2.53.0
^ permalink raw reply related
* Re: [PATCH] module: remove MODULE_VERSION()
From: Greg Kroah-Hartman @ 2026-03-13 15:46 UTC (permalink / raw)
To: linux-modules
Cc: linux-kernel, Luis Chamberlain, Petr Pavlu, Daniel Gomez,
Sami Tolvanen, Aaron Tomlin, Shyam Saini, Kees Cook,
Thorsten Blum, Christoph Hellwig
In-Reply-To: <2026031341-evolve-repeater-987b@gregkh>
On Fri, Mar 13, 2026 at 03:20:42PM +0100, Greg Kroah-Hartman wrote:
> Module "versions" do not make sense as the kernel is built all at once,
> the "version" is the overall kernel version number, so modules can not
> really be described as having a unique version given that they rely on
> the infrastructure of the whole kernel.
>
> For now, just make this an "empty" define, to keep existing code
> building properly as the tree is slowly purged of the use of this over
> time.
>
> This macro will be removed entirely in the future when there are no
> in-tree users.
>
> Cc: Luis Chamberlain <mcgrof@kernel.org>
> Cc: Petr Pavlu <petr.pavlu@suse.com>
> Cc: Daniel Gomez <da.gomez@kernel.org>
> Cc: Sami Tolvanen <samitolvanen@google.com>
> Cc: Aaron Tomlin <atomlin@atomlin.com>
> Cc: Shyam Saini <shyamsaini@linux.microsoft.com>
> Cc: Kees Cook <kees@kernel.org>
> Cc: Thorsten Blum <thorsten.blum@linux.dev>
> Cc: Christoph Hellwig <hch@infradead.org>
> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> ---
> include/linux/module.h | 56 +++++++++---------------------------------
> kernel/params.c | 30 ----------------------
> 2 files changed, 11 insertions(+), 75 deletions(-)
Sami just pointed out to me off-list that maybe I should also drop the
srcversion stuff too. I'll gladly do that too, does anyone know if
anyone even uses that anymore?
thanks,
greg k-h
>
> diff --git a/include/linux/module.h b/include/linux/module.h
> index 14f391b186c6..37cb369b4c3a 100644
> --- a/include/linux/module.h
> +++ b/include/linux/module.h
> @@ -62,15 +62,6 @@ struct module_attribute {
> void (*free)(struct module *);
> };
>
> -struct module_version_attribute {
> - struct module_attribute mattr;
> - const char *module_name;
> - const char *version;
> -};
> -
> -extern ssize_t __modver_version_show(const struct module_attribute *,
> - struct module_kobject *, char *);
> -
> extern const struct module_attribute module_uevent;
>
> /* These are either module local, or the kernel's dummy ones. */
> @@ -256,43 +247,18 @@ struct module_kobject *lookup_or_create_module_kobject(const char *name);
> static typeof(name) __mod_device_table(type, name) \
> __attribute__ ((used, alias(__stringify(name))))
>
> -/* Version of form [<epoch>:]<version>[-<extra-version>].
> - * Or for CVS/RCS ID version, everything but the number is stripped.
> - * <epoch>: A (small) unsigned integer which allows you to start versions
> - * anew. If not mentioned, it's zero. eg. "2:1.0" is after
> - * "1:2.0".
> -
> - * <version>: The <version> may contain only alphanumerics and the
> - * character `.'. Ordered by numeric sort for numeric parts,
> - * ascii sort for ascii parts (as per RPM or DEB algorithm).
> -
> - * <extraversion>: Like <version>, but inserted for local
> - * customizations, eg "rh3" or "rusty1".
> -
> - * Using this automatically adds a checksum of the .c files and the
> - * local headers in "srcversion".
> +/*
> + * Module "versions" do not make sense as the kernel is built all at once, the
> + * "version" is the overall kernel version number, so modules can not really be
> + * described as having a unique version given that they rely on the
> + * infrastructure of the whole kernel.
> + *
> + * For now, just make this an "empty" define, to keep existing code building
> + * properly as the tree is slowly purged of the use of this over time.
> + *
> + * It will be removed in the future when there are no in-tree users.
> */
> -
> -#if defined(MODULE) || !defined(CONFIG_SYSFS)
> -#define MODULE_VERSION(_version) MODULE_INFO(version, _version)
> -#else
> -#define MODULE_VERSION(_version) \
> - MODULE_INFO(version, _version); \
> - static const struct module_version_attribute __modver_attr \
> - __used __section("__modver") \
> - __aligned(__alignof__(struct module_version_attribute)) \
> - = { \
> - .mattr = { \
> - .attr = { \
> - .name = "version", \
> - .mode = S_IRUGO, \
> - }, \
> - .show = __modver_version_show, \
> - }, \
> - .module_name = KBUILD_MODNAME, \
> - .version = _version, \
> - }
> -#endif
> +#define MODULE_VERSION(_version)
>
> /* Optional firmware file (or files) needed by the module
> * format is simply firmware file name. Multiple firmware
> diff --git a/kernel/params.c b/kernel/params.c
> index 7188a12dbe86..1b14b1ab5fcb 100644
> --- a/kernel/params.c
> +++ b/kernel/params.c
> @@ -846,35 +846,6 @@ static void __init param_sysfs_builtin(void)
> }
> }
>
> -ssize_t __modver_version_show(const struct module_attribute *mattr,
> - struct module_kobject *mk, char *buf)
> -{
> - const struct module_version_attribute *vattr =
> - container_of_const(mattr, struct module_version_attribute, mattr);
> -
> - return scnprintf(buf, PAGE_SIZE, "%s\n", vattr->version);
> -}
> -
> -extern const struct module_version_attribute __start___modver[];
> -extern const struct module_version_attribute __stop___modver[];
> -
> -static void __init version_sysfs_builtin(void)
> -{
> - const struct module_version_attribute *vattr;
> - struct module_kobject *mk;
> - int err;
> -
> - for (vattr = __start___modver; vattr < __stop___modver; vattr++) {
> - mk = lookup_or_create_module_kobject(vattr->module_name);
> - if (mk) {
> - err = sysfs_create_file(&mk->kobj, &vattr->mattr.attr);
> - WARN_ON_ONCE(err);
> - kobject_uevent(&mk->kobj, KOBJ_ADD);
> - kobject_put(&mk->kobj);
> - }
> - }
> -}
> -
> /* module-related sysfs stuff */
>
> static ssize_t module_attr_show(struct kobject *kobj,
> @@ -977,7 +948,6 @@ static int __init param_sysfs_builtin_init(void)
> if (!module_kset)
> return -ENOMEM;
>
> - version_sysfs_builtin();
> param_sysfs_builtin();
>
> return 0;
> --
> 2.53.0
>
^ permalink raw reply
* Re: [PATCH] module: remove MODULE_VERSION()
From: Sami Tolvanen @ 2026-03-13 17:07 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: linux-modules, linux-kernel, Luis Chamberlain, Petr Pavlu,
Daniel Gomez, Aaron Tomlin, Shyam Saini, Kees Cook, Thorsten Blum,
Christoph Hellwig
In-Reply-To: <2026031341-evolve-repeater-987b@gregkh>
Hi Greg,
On Fri, Mar 13, 2026 at 03:20:42PM +0100, Greg Kroah-Hartman wrote:
> Module "versions" do not make sense as the kernel is built all at once,
> the "version" is the overall kernel version number, so modules can not
> really be described as having a unique version given that they rely on
> the infrastructure of the whole kernel.
>
> For now, just make this an "empty" define, to keep existing code
> building properly as the tree is slowly purged of the use of this over
> time.
>
> This macro will be removed entirely in the future when there are no
> in-tree users.
>
> Cc: Luis Chamberlain <mcgrof@kernel.org>
> Cc: Petr Pavlu <petr.pavlu@suse.com>
> Cc: Daniel Gomez <da.gomez@kernel.org>
> Cc: Sami Tolvanen <samitolvanen@google.com>
> Cc: Aaron Tomlin <atomlin@atomlin.com>
> Cc: Shyam Saini <shyamsaini@linux.microsoft.com>
> Cc: Kees Cook <kees@kernel.org>
> Cc: Thorsten Blum <thorsten.blum@linux.dev>
> Cc: Christoph Hellwig <hch@infradead.org>
> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> ---
> include/linux/module.h | 56 +++++++++---------------------------------
> kernel/params.c | 30 ----------------------
> 2 files changed, 11 insertions(+), 75 deletions(-)
This certainly solves our version problems! I noticed there are a
couple of __modver references left in the code base after this patch,
which can be dropped too, and you can also remove the version field
from struct module at the same time.
Sami
^ permalink raw reply
* Re: [PATCH] module: remove MODULE_VERSION()
From: Sami Tolvanen @ 2026-03-13 17:28 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: linux-modules, linux-kernel, Luis Chamberlain, Petr Pavlu,
Daniel Gomez, Aaron Tomlin, Shyam Saini, Kees Cook, Thorsten Blum,
Christoph Hellwig
In-Reply-To: <2026031303-prelaunch-creation-3fce@gregkh>
On Fri, Mar 13, 2026 at 04:46:06PM +0100, Greg Kroah-Hartman wrote:
> On Fri, Mar 13, 2026 at 03:20:42PM +0100, Greg Kroah-Hartman wrote:
> > Module "versions" do not make sense as the kernel is built all at once,
> > the "version" is the overall kernel version number, so modules can not
> > really be described as having a unique version given that they rely on
> > the infrastructure of the whole kernel.
> >
> > For now, just make this an "empty" define, to keep existing code
> > building properly as the tree is slowly purged of the use of this over
> > time.
> >
> > This macro will be removed entirely in the future when there are no
> > in-tree users.
> >
> > Cc: Luis Chamberlain <mcgrof@kernel.org>
> > Cc: Petr Pavlu <petr.pavlu@suse.com>
> > Cc: Daniel Gomez <da.gomez@kernel.org>
> > Cc: Sami Tolvanen <samitolvanen@google.com>
> > Cc: Aaron Tomlin <atomlin@atomlin.com>
> > Cc: Shyam Saini <shyamsaini@linux.microsoft.com>
> > Cc: Kees Cook <kees@kernel.org>
> > Cc: Thorsten Blum <thorsten.blum@linux.dev>
> > Cc: Christoph Hellwig <hch@infradead.org>
> > Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > ---
> > include/linux/module.h | 56 +++++++++---------------------------------
> > kernel/params.c | 30 ----------------------
> > 2 files changed, 11 insertions(+), 75 deletions(-)
>
>
> Sami just pointed out to me off-list that maybe I should also drop the
> srcversion stuff too. I'll gladly do that too, does anyone know if
> anyone even uses that anymore?
Looks like a lof of distributions enable MODULE_SRCVERSION_ALL, but
I'm not sure if they actually depend on the feature:
https://oracle.github.io/kconfigs/?config=UTS_RELEASE&config=MODULE_SRCVERSION_ALL
Sami
^ permalink raw reply
* Re: [PATCH 02/61] btrfs: Prefer IS_ERR_OR_NULL over manual NULL check
From: David Sterba @ 2026-03-13 19:22 UTC (permalink / raw)
To: Philipp Hahn
Cc: amd-gfx, apparmor, bpf, ceph-devel, cocci, dm-devel, dri-devel,
gfs2, intel-gfx, intel-wired-lan, iommu, kvm, linux-arm-kernel,
linux-block, linux-bluetooth, linux-btrfs, linux-cifs, linux-clk,
linux-erofs, linux-ext4, linux-fsdevel, linux-gpio, linux-hyperv,
linux-input, linux-kernel, linux-leds, linux-media, linux-mips,
linux-mm, linux-modules, linux-mtd, linux-nfs, linux-omap,
linux-phy, linux-pm, linux-rockchip, linux-s390, linux-scsi,
linux-sctp, linux-security-module, linux-sh, linux-sound,
linux-stm32, linux-trace-kernel, linux-usb, linux-wireless,
netdev, ntfs3, samba-technical, sched-ext, target-devel,
tipc-discussion, v9fs, Chris Mason, David Sterba
In-Reply-To: <20260310-b4-is_err_or_null-v1-2-bd63b656022d@avm.de>
On Tue, Mar 10, 2026 at 12:48:28PM +0100, Philipp Hahn wrote:
> Prefer using IS_ERR_OR_NULL() over using IS_ERR() and a manual NULL
> check.
>
> IS_ERR_OR_NULL() already uses likely(!ptr) internally. checkpatch does
> not like nesting it:
> > WARNING: nested (un)?likely() calls, IS_ERR_OR_NULL already uses
> > unlikely() internally
> Remove the explicit use of likely().
>
> Change generated with coccinelle.
>
> To: Chris Mason <clm@fb.com>
> To: David Sterba <dsterba@suse.com>
> Cc: linux-btrfs@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Philipp Hahn <phahn-oss@avm.de>
Added to for-next, we seem to be using IS_ERR_OR_NULL() already in a
few other places so this is makes sense for consistency. Thanks.
^ permalink raw reply
* Re: [PATCH] module: remove MODULE_VERSION()
From: Christophe Leroy (CS GROUP) @ 2026-03-14 10:22 UTC (permalink / raw)
To: Greg Kroah-Hartman, linux-modules
Cc: linux-kernel, Luis Chamberlain, Petr Pavlu, Daniel Gomez,
Sami Tolvanen, Aaron Tomlin, Shyam Saini, Kees Cook,
Thorsten Blum, Christoph Hellwig
In-Reply-To: <2026031303-prelaunch-creation-3fce@gregkh>
Le 13/03/2026 à 16:46, Greg Kroah-Hartman a écrit :
> On Fri, Mar 13, 2026 at 03:20:42PM +0100, Greg Kroah-Hartman wrote:
>> Module "versions" do not make sense as the kernel is built all at once,
>> the "version" is the overall kernel version number, so modules can not
>> really be described as having a unique version given that they rely on
>> the infrastructure of the whole kernel.
>>
>> For now, just make this an "empty" define, to keep existing code
>> building properly as the tree is slowly purged of the use of this over
>> time.
>>
>> This macro will be removed entirely in the future when there are no
>> in-tree users.
>>
>> Cc: Luis Chamberlain <mcgrof@kernel.org>
>> Cc: Petr Pavlu <petr.pavlu@suse.com>
>> Cc: Daniel Gomez <da.gomez@kernel.org>
>> Cc: Sami Tolvanen <samitolvanen@google.com>
>> Cc: Aaron Tomlin <atomlin@atomlin.com>
>> Cc: Shyam Saini <shyamsaini@linux.microsoft.com>
>> Cc: Kees Cook <kees@kernel.org>
>> Cc: Thorsten Blum <thorsten.blum@linux.dev>
>> Cc: Christoph Hellwig <hch@infradead.org>
>> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>> ---
>> include/linux/module.h | 56 +++++++++---------------------------------
>> kernel/params.c | 30 ----------------------
>> 2 files changed, 11 insertions(+), 75 deletions(-)
>
>
> Sami just pointed out to me off-list that maybe I should also drop the
> srcversion stuff too. I'll gladly do that too, does anyone know if
> anyone even uses that anymore?
If I understand correctly the text in kernel/module/Kconfig, srcversion
is added only for modules which contain a MODULE_VERSION.
So as you drop MODULE_VERSION, srcversion becomes completely useless
doesn't it ?
Christophe
^ permalink raw reply
* Re: [PATCH] module: remove MODULE_VERSION()
From: Christoph Hellwig @ 2026-03-16 8:57 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: linux-modules, linux-kernel, Luis Chamberlain, Petr Pavlu,
Daniel Gomez, Sami Tolvanen, Aaron Tomlin, Shyam Saini, Kees Cook,
Thorsten Blum, Christoph Hellwig
In-Reply-To: <2026031341-evolve-repeater-987b@gregkh>
On Fri, Mar 13, 2026 at 03:20:42PM +0100, Greg Kroah-Hartman wrote:
> Module "versions" do not make sense as the kernel is built all at once,
> the "version" is the overall kernel version number, so modules can not
> really be described as having a unique version given that they rely on
> the infrastructure of the whole kernel.
>
> For now, just make this an "empty" define, to keep existing code
> building properly as the tree is slowly purged of the use of this over
> time.
>
> This macro will be removed entirely in the future when there are no
> in-tree users.
Looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
The removal should be an easy scriptable one after the next -rc1.
^ permalink raw reply
* Re: [PATCH] module: remove MODULE_VERSION()
From: Christoph Hellwig @ 2026-03-16 8:58 UTC (permalink / raw)
To: Christophe Leroy (CS GROUP)
Cc: Greg Kroah-Hartman, linux-modules, linux-kernel, Luis Chamberlain,
Petr Pavlu, Daniel Gomez, Sami Tolvanen, Aaron Tomlin,
Shyam Saini, Kees Cook, Thorsten Blum, Christoph Hellwig
In-Reply-To: <f036410e-f53c-4284-b108-18bcdb2f0d28@kernel.org>
On Sat, Mar 14, 2026 at 11:22:22AM +0100, Christophe Leroy (CS GROUP) wrote:
> > Sami just pointed out to me off-list that maybe I should also drop the
> > srcversion stuff too. I'll gladly do that too, does anyone know if
> > anyone even uses that anymore?
>
> If I understand correctly the text in kernel/module/Kconfig, srcversion is
> added only for modules which contain a MODULE_VERSION.
>
> So as you drop MODULE_VERSION, srcversion becomes completely useless doesn't
> it ?
Looks like it.
^ permalink raw reply
* Re: [PATCH] module: remove MODULE_VERSION()
From: Petr Pavlu @ 2026-03-16 9:37 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: linux-modules, linux-kernel, Luis Chamberlain, Daniel Gomez,
Sami Tolvanen, Aaron Tomlin, Shyam Saini, Kees Cook,
Thorsten Blum, Christoph Hellwig
In-Reply-To: <2026031341-evolve-repeater-987b@gregkh>
On 3/13/26 3:20 PM, Greg Kroah-Hartman wrote:
> Module "versions" do not make sense as the kernel is built all at once,
> the "version" is the overall kernel version number, so modules can not
> really be described as having a unique version given that they rely on
> the infrastructure of the whole kernel.
>
> For now, just make this an "empty" define, to keep existing code
> building properly as the tree is slowly purged of the use of this over
> time.
>
> This macro will be removed entirely in the future when there are no
> in-tree users.
I share a similar sentiment that module versions set by MODULE_VERSION()
are not particularly useful for in-tree modules and the macro is often
used unnecessarily. However, I don't think this patch takes the best
approach to phase it out.
The file Documentation/ABI/stable/sysfs-module documents
/sys/module/<MODULENAME>/version as a stable ABI. Searching for
'^MODULE_VERSION' in v7.0-rc4 shows 600 uses of the macro. My concern is
that if any of these modules has a userspace part that checks the
version, this patch could potentially break users' systems.
I believe it would be safer to start by removing individual uses of
MODULE_VERSION(). That way, we can also learn if we're missing any use
cases for having module versions.
The original patch "Add a MODULE_VERSION macro" [1] from 2004 doesn't
say much about the motivation for adding module versions, but it does
mention that they should be accessible via sysfs. That was implemented
a year later in commit c988d2b28454 ("[PATCH] modules: add version and
srcversion to sysfs") [2], which primarily discusses use cases related
to DKMS, and to administrators + tech support needing to know what is
actually loaded on the system. For the latter, I believe srcversion (or
something similar) should be sufficient.
[1] https://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux-fullhistory.git/commit/?id=466ae11966ae380eb5e10cdf323e140d824fa10c
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c988d2b2845495373f666a381d354a7f80981d62
--
Thanks,
Petr
^ permalink raw reply
* Re: [PATCH] module: remove MODULE_VERSION()
From: Greg Kroah-Hartman @ 2026-03-16 10:03 UTC (permalink / raw)
To: Petr Pavlu
Cc: linux-modules, linux-kernel, Luis Chamberlain, Daniel Gomez,
Sami Tolvanen, Aaron Tomlin, Shyam Saini, Kees Cook,
Thorsten Blum, Christoph Hellwig
In-Reply-To: <d622c70a-f79a-4215-84fb-c5de0a8f6ce5@suse.com>
On Mon, Mar 16, 2026 at 10:37:38AM +0100, Petr Pavlu wrote:
> On 3/13/26 3:20 PM, Greg Kroah-Hartman wrote:
> > Module "versions" do not make sense as the kernel is built all at once,
> > the "version" is the overall kernel version number, so modules can not
> > really be described as having a unique version given that they rely on
> > the infrastructure of the whole kernel.
> >
> > For now, just make this an "empty" define, to keep existing code
> > building properly as the tree is slowly purged of the use of this over
> > time.
> >
> > This macro will be removed entirely in the future when there are no
> > in-tree users.
>
> I share a similar sentiment that module versions set by MODULE_VERSION()
> are not particularly useful for in-tree modules and the macro is often
> used unnecessarily. However, I don't think this patch takes the best
> approach to phase it out.
>
> The file Documentation/ABI/stable/sysfs-module documents
> /sys/module/<MODULENAME>/version as a stable ABI. Searching for
> '^MODULE_VERSION' in v7.0-rc4 shows 600 uses of the macro. My concern is
> that if any of these modules has a userspace part that checks the
> version, this patch could potentially break users' systems.
sysfs use is ALWAYS "if the file is not there, the userspace tool should
keep working". How would userspace every do something different if a
module version flag is not there, that is not how a kernel driver should
ever be attempting to communicate with userspace as to what the api is,
or is not.
And as the module version does not even work for any stable kernel
release, it's kind of proof that userspace does not rely on this.
> I believe it would be safer to start by removing individual uses of
> MODULE_VERSION(). That way, we can also learn if we're missing any use
> cases for having module versions.
Sure, I'll make up a patch to drop all 700 uses, but how is that much
different? :)
> The original patch "Add a MODULE_VERSION macro" [1] from 2004 doesn't
> say much about the motivation for adding module versions, but it does
> mention that they should be accessible via sysfs.
That was because we were just exporting all of the module information in
sysfs, not due to us attempting to do anything special with that info in
userspace other than "hey we have an attribute, let's export it!"
> That was implemented
> a year later in commit c988d2b28454 ("[PATCH] modules: add version and
> srcversion to sysfs") [2], which primarily discusses use cases related
> to DKMS, and to administrators + tech support needing to know what is
> actually loaded on the system. For the latter, I believe srcversion (or
> something similar) should be sufficient.
And does dkms actually do anything with this sysfs value? At quick
glance, I couldn't see anything.
thanks,
greg k-h
^ permalink raw reply
* Re: [PATCH] module: remove MODULE_VERSION()
From: Petr Pavlu @ 2026-03-16 10:48 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: linux-modules, linux-kernel, Luis Chamberlain, Daniel Gomez,
Sami Tolvanen, Aaron Tomlin, Shyam Saini, Kees Cook,
Thorsten Blum, Christoph Hellwig
In-Reply-To: <2026031303-prelaunch-creation-3fce@gregkh>
On 3/13/26 4:46 PM, Greg Kroah-Hartman wrote:
> On Fri, Mar 13, 2026 at 03:20:42PM +0100, Greg Kroah-Hartman wrote:
>> Module "versions" do not make sense as the kernel is built all at once,
>> the "version" is the overall kernel version number, so modules can not
>> really be described as having a unique version given that they rely on
>> the infrastructure of the whole kernel.
>>
>> For now, just make this an "empty" define, to keep existing code
>> building properly as the tree is slowly purged of the use of this over
>> time.
>>
>> This macro will be removed entirely in the future when there are no
>> in-tree users.
>>
>> Cc: Luis Chamberlain <mcgrof@kernel.org>
>> Cc: Petr Pavlu <petr.pavlu@suse.com>
>> Cc: Daniel Gomez <da.gomez@kernel.org>
>> Cc: Sami Tolvanen <samitolvanen@google.com>
>> Cc: Aaron Tomlin <atomlin@atomlin.com>
>> Cc: Shyam Saini <shyamsaini@linux.microsoft.com>
>> Cc: Kees Cook <kees@kernel.org>
>> Cc: Thorsten Blum <thorsten.blum@linux.dev>
>> Cc: Christoph Hellwig <hch@infradead.org>
>> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>> ---
>> include/linux/module.h | 56 +++++++++---------------------------------
>> kernel/params.c | 30 ----------------------
>> 2 files changed, 11 insertions(+), 75 deletions(-)
>
>
> Sami just pointed out to me off-list that maybe I should also drop the
> srcversion stuff too. I'll gladly do that too, does anyone know if
> anyone even uses that anymore?
Despite its name, I believe srcversion is primarily used to identify
binaries. Nowadays, modules contain build IDs, which is a standard
mechanism for this. The information is available already via
/sys/module/<module>/notes/.note.gnu.build-id, so removing the
srcversion data makes sense to me.
--
Thanks,
Petr
^ permalink raw reply
* Re: [PATCH 50/61] iommu: Prefer IS_ERR_OR_NULL over manual NULL check
From: Robin Murphy @ 2026-03-16 13:30 UTC (permalink / raw)
To: Philipp Hahn, amd-gfx, apparmor, bpf, ceph-devel, cocci, dm-devel,
dri-devel, gfs2, intel-gfx, intel-wired-lan, iommu, kvm,
linux-arm-kernel, linux-block, linux-bluetooth, linux-btrfs,
linux-cifs, linux-clk, linux-erofs, linux-ext4, linux-fsdevel,
linux-gpio, linux-hyperv, linux-input, linux-kernel, linux-leds,
linux-media, linux-mips, linux-mm, linux-modules, linux-mtd,
linux-nfs, linux-omap, linux-phy, linux-pm, linux-rockchip,
linux-s390, linux-scsi, linux-sctp, linux-security-module,
linux-sh, linux-sound, linux-stm32, linux-trace-kernel, linux-usb,
linux-wireless, netdev, ntfs3, samba-technical, sched-ext,
target-devel, tipc-discussion, v9fs
Cc: Joerg Roedel, Will Deacon
In-Reply-To: <20260310-b4-is_err_or_null-v1-50-bd63b656022d@avm.de>
On 2026-03-10 11:49 am, Philipp Hahn wrote:
> Prefer using IS_ERR_OR_NULL() over using IS_ERR() and a manual NULL
> check.
AFAICS it doesn't look possible for the argument to be anything other
than valid at both callsites, so *both* conditions here seem in fact to
be entirely redundant.
> Change generated with coccinelle.
Please use coccinelle responsibly. Mechanical changes are great for
scripted API updates, but for cleanup, whilst it's ideal for *finding*
areas of code that are worth looking at, the code then wants actually
looking at, in its whole context, because meaningful cleanup often goes
deeper than trivial replacement.
In particular, anywhere IS_ERR_OR_NULL() is genuinely relevant is
usually a sign of bad interface design, so if you're looking at this
then you really should be looking first and foremost to remove any
checks that are already unnecessary, and for the remainder, to see if
the thing being checked can be improved to not mix the two different
styles. That would be constructive and (usually) welcome cleanup. Simply
churning a bunch of code with this ugly macro that's arguably less
readable than what it replaces, not so much.
Thanks,
Robin.
> To: Joerg Roedel <joro@8bytes.org>
> To: Will Deacon <will@kernel.org>
> To: Robin Murphy <robin.murphy@arm.com>
> Cc: iommu@lists.linux.dev
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Philipp Hahn <phahn-oss@avm.de>
> ---
> drivers/iommu/omap-iommu.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/omap-iommu.c b/drivers/iommu/omap-iommu.c
> index 8231d7d6bb6a9202025643639a6b28e6faa84659..500a42b57a997696ff37c76f028a717ab71d01f9 100644
> --- a/drivers/iommu/omap-iommu.c
> +++ b/drivers/iommu/omap-iommu.c
> @@ -881,7 +881,7 @@ static int omap_iommu_attach(struct omap_iommu *obj, u32 *iopgd)
> **/
> static void omap_iommu_detach(struct omap_iommu *obj)
> {
> - if (!obj || IS_ERR(obj))
> + if (IS_ERR_OR_NULL(obj))
> return;
>
> spin_lock(&obj->iommu_lock);
>
^ permalink raw reply
* Re: [PATCH v2] module.lds,codetag: force 0 sh_addr for sections
From: Petr Pavlu @ 2026-03-16 14:23 UTC (permalink / raw)
To: Sami Tolvanen, Joe Lawrence
Cc: linux-modules, linux-kernel, Luis Chamberlain, Daniel Gomez,
Aaron Tomlin, Petr Mladek, Josh Poimboeuf
In-Reply-To: <20260311211207.GA2440964@google.com>
On 3/11/26 10:12 PM, Sami Tolvanen wrote:
> On Wed, Mar 04, 2026 at 08:52:37PM -0500, Joe Lawrence wrote:
>> Commit 1ba9f8979426 ("vmlinux.lds: Unify TEXT_MAIN, DATA_MAIN, and
>> related macros") added .text and made .data, .bss, and .rodata sections
>> unconditional in the module linker script, but without an explicit
>> address like the other sections in the same file.
>>
>> When linking modules with ld.bfd -r, sections defined without an address
>> inherit the location counter, resulting in non-zero sh_addr values in
>> the .ko. Relocatable objects are expected to have sh_addr=0 for these
>> sections and these non-zero addresses confuse elfutils and have been
>> reported to cause segmentation faults in SystemTap [1].
>>
>> Add the 0 address specifier to all sections in module.lds, including the
>> .codetag.* sections via MOD_SEPARATE_CODETAG_SECTIONS macro.
>>
>> Link: https://sourceware.org/bugzilla/show_bug.cgi?id=33958
>> Fixes: 1ba9f8979426 ("vmlinux.lds: Unify TEXT_MAIN, DATA_MAIN, and related macros")
>> Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com>
>> ---
>> include/asm-generic/codetag.lds.h | 2 +-
>> scripts/module.lds.S | 12 ++++++------
>> 2 files changed, 7 insertions(+), 7 deletions(-)
>>
>> v2:
>> - Update the MOD_SEPARATE_CODETAG_SECTION for .codetag.* as well [Petr]
>
> Do we also need similar changes in any of the architecture-specific module
> linker scripts (arch/*/include/asm/module.lds.h)?
I overlooked these architecture-specific files. I believe we should do
the same for them. For instance, riscv explicitly defines the .plt, .got
and .got.plt sections, and they have misleading addresses:
$ readelf -t fs/xfs/xfs.ko
[...]
Section Headers:
[Nr] Name
Type Address Offset Link
Size EntSize Info Align
Flags
[...]
[48] __versions
PROGBITS 0000000000000000 0000000000194e90 0
0000000000007900 0000000000000000 0 8
[0000000000000002]: ALLOC
[49] .plt
PROGBITS 0000000000007900 000000000019c790 0
0000000000000001 0000000000000000 0 1
[0000000000000006]: ALLOC, EXEC
[50] .got
PROGBITS 0000000000007901 000000000019c791 0
0000000000000001 0000000000000000 0 1
[0000000000000003]: WRITE, ALLOC
[51] .got.plt
PROGBITS 0000000000007902 000000000019c792 0
0000000000000001 0000000000000000 0 1
[0000000000000002]: ALLOC
[...]
Nonetheless, this can be done separately. I think fixes for these files
should better go through architecture-specific trees anyway.
I can check the individual architectures and prepare the necessary
patches, unless someone else is already looking into this or wants to
take a look.
--
Thanks,
Petr
^ permalink raw reply
* Re: [PATCH] module: remove MODULE_VERSION()
From: Sami Tolvanen @ 2026-03-16 17:25 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Christophe Leroy (CS GROUP), Greg Kroah-Hartman, linux-modules,
linux-kernel, Luis Chamberlain, Petr Pavlu, Daniel Gomez,
Aaron Tomlin, Shyam Saini, Kees Cook, Thorsten Blum
In-Reply-To: <abfGQUxgm8mEWlAz@infradead.org>
On Mon, Mar 16, 2026 at 01:58:41AM -0700, Christoph Hellwig wrote:
> On Sat, Mar 14, 2026 at 11:22:22AM +0100, Christophe Leroy (CS GROUP) wrote:
> > > Sami just pointed out to me off-list that maybe I should also drop the
> > > srcversion stuff too. I'll gladly do that too, does anyone know if
> > > anyone even uses that anymore?
> >
> > If I understand correctly the text in kernel/module/Kconfig, srcversion is
> > added only for modules which contain a MODULE_VERSION.
> >
> > So as you drop MODULE_VERSION, srcversion becomes completely useless doesn't
> > it ?
>
> Looks like it.
Looking at modpost, srcversions are added to all modules if
MODULE_SRCVERSION_ALL is enabled, whether they have MODULE_VERSION or
not. Doesn't mean it's not completely useless, of course.
Sami
^ permalink raw reply
* Re: [PATCH 46/61] vfio: Prefer IS_ERR_OR_NULL over manual NULL check
From: Alex Williamson @ 2026-03-16 22:10 UTC (permalink / raw)
To: Philipp Hahn
Cc: amd-gfx, apparmor, bpf, ceph-devel, cocci, dm-devel, dri-devel,
gfs2, intel-gfx, intel-wired-lan, iommu, kvm, linux-arm-kernel,
linux-block, linux-bluetooth, linux-btrfs, linux-cifs, linux-clk,
linux-erofs, linux-ext4, linux-fsdevel, linux-gpio, linux-hyperv,
linux-input, linux-kernel, linux-leds, linux-media, linux-mips,
linux-mm, linux-modules, linux-mtd, linux-nfs, linux-omap,
linux-phy, linux-pm, linux-rockchip, linux-s390, linux-scsi,
linux-sctp, linux-security-module, linux-sh, linux-sound,
linux-stm32, linux-trace-kernel, linux-usb, linux-wireless,
netdev, ntfs3, samba-technical, sched-ext, target-devel,
tipc-discussion, v9fs, alex
In-Reply-To: <20260310-b4-is_err_or_null-v1-46-bd63b656022d@avm.de>
On Tue, 10 Mar 2026 12:49:12 +0100
Philipp Hahn <phahn-oss@avm.de> wrote:
> Prefer using IS_ERR_OR_NULL() over using IS_ERR() and a manual NULL
> check.
>
> Change generated with coccinelle.
>
> To: Alex Williamson <alex@shazbot.org>
> Cc: kvm@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Philipp Hahn <phahn-oss@avm.de>
> ---
> drivers/vfio/vfio_main.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> index 742477546b15d4dbaf9ebcfb2e67627db71521e0..d71922dfde5885967398deddec3e9e04b05adfec 100644
> --- a/drivers/vfio/vfio_main.c
> +++ b/drivers/vfio/vfio_main.c
> @@ -923,7 +923,7 @@ vfio_ioctl_device_feature_mig_device_state(struct vfio_device *device,
>
> /* Handle the VFIO_DEVICE_FEATURE_SET */
> filp = device->mig_ops->migration_set_state(device, mig.device_state);
> - if (IS_ERR(filp) || !filp)
> + if (IS_ERR_OR_NULL(filp))
> goto out_copy;
>
> return vfio_ioct_mig_return_fd(filp, arg, &mig);
>
As others have expressed in general, this doesn't seem to be cleaner
and tends to mask that we consider IS_ERR() and NULL as separate cases
in the goto. This code looks like it could use some refactoring, and
likely that refactoring should handle the IS_ERR() and NULL cases
separately, but conflating them here is not an improvement. Thanks,
Alex
^ permalink raw reply
* Re: [PATCH v18 00/42] DEPT(DEPendency Tracker)
From: Byungchul Park @ 2026-03-17 4:44 UTC (permalink / raw)
To: linux-kernel
Cc: kernel_team, torvalds, damien.lemoal, linux-ide, adilger.kernel,
linux-ext4, mingo, peterz, will, tglx, rostedt, joel, sashal,
daniel.vetter, duyuyang, johannes.berg, tj, tytso, willy, david,
amir73il, gregkh, kernel-team, linux-mm, akpm, mhocko, minchan,
hannes, vdavydov.dev, sj, jglisse, dennis, cl, penberg, rientjes,
vbabka, ngupta, linux-block, josef, linux-fsdevel, jack, jlayton,
dan.j.williams, hch, djwong, dri-devel, rodrigosiqueiramelo,
melissa.srw, hamohammed.sa, harry.yoo, chris.p.wilson,
gwan-gyeong.mun, max.byungchul.park, boqun.feng, longman,
yunseong.kim, ysk, yeoreum.yun, netdev, matthew.brost, her0gyugyu,
corbet, catalin.marinas, bp, x86, hpa, luto, sumit.semwal,
gustavo, christian.koenig, andi.shyti, arnd, lorenzo.stoakes,
Liam.Howlett, rppt, surenb, mcgrof, petr.pavlu, da.gomez,
samitolvanen, paulmck, frederic, neeraj.upadhyay, joelagnelf,
josh, urezki, mathieu.desnoyers, jiangshanlai, qiang.zhang,
juri.lelli, vincent.guittot, dietmar.eggemann, bsegall, mgorman,
vschneid, chuck.lever, neil, okorniev, Dai.Ngo, tom, trondmy,
anna, kees, bigeasy, clrkwllms, mark.rutland, ada.coupriediaz,
kristina.martsenko, wangkefeng.wang, broonie, kevin.brodsky, dwmw,
shakeel.butt, ast, ziy, yuzhao, baolin.wang, usamaarif642,
joel.granados, richard.weiyang, geert+renesas, tim.c.chen, linux,
alexander.shishkin, lillian, chenhuacai, francesco,
guoweikang.kernel, link, jpoimboe, masahiroy, brauner,
thomas.weissschuh, oleg, mjguzik, andrii, wangfushuai, linux-doc,
linux-arm-kernel, linux-media, linaro-mm-sig, linux-i2c,
linux-arch, linux-modules, rcu, linux-nfs, linux-rt-devel,
2407018371, dakr, miguel.ojeda.sandonis, neilb, bagasdotme,
wsa+renesas, dave.hansen, geert, ojeda, alex.gaynor, gary,
bjorn3_gh, lossin, a.hindborg, aliceryhl, tmgross, rust-for-linux
In-Reply-To: <20251205071855.72743-1-byungchul@sk.com>
On Fri, Dec 05, 2025 at 04:18:13PM +0900, Byungchul Park wrote:
> I'm happy to see that DEPT reported real problems in practice:
>
> https://lore.kernel.org/lkml/6383cde5-cf4b-facf-6e07-1378a485657d@I-love.SAKURA.ne.jp/
> https://lore.kernel.org/lkml/1674268856-31807-1-git-send-email-byungchul.park@lge.com/
> https://lore.kernel.org/all/b6e00e77-4a8c-4e05-ab79-266bf05fcc2d@igalia.com/
>
> I’ve added documentation describing DEPT — this should help you
> understand what DEPT is and how it works. You can use DEPT simply by
> enabling CONFIG_DEPT and checking dmesg at runtime.
> ---
>
> Hi Linus and folks,
Hi Linus and Ingo,
Although the DEPT tool still has areas for improvement, I am confident
it employs the most appropriate method for tracking dependencies within
Linux kernel.
If it is to be maintained in the source tree, which subsystem would be
the most suitable for its management? Personally, I believe introducing
a dedicated dependency subsystem under 'kernel/' would be ideal, though
managing it within the already well-maintained scheduler or locking
subsystems might be more practical.
Since DEPT tracks not only locking mechanisms but also general sleep and
wake events, I have avoided placing it under the locking subsystem thus
far. If you have alternative or more fitting suggestions, I would
appreciate your input.
Thanks.
Byungchul
>
> I’ve been developing a tool to detect deadlock possibilities by tracking
> waits/events — rather than lock acquisition order — to cover all the
> synchronization mechanisms. To summarize the design rationale, starting
> from the problem statement, through analysis, to the solution:
>
> CURRENT STATUS
> --------------
> Lockdep tracks lock acquisition order to identify deadlock conditions.
> Additionally, it tracks IRQ state changes — via {en,dis}able — to
> detect cases where locks are acquired unintentionally during
> interrupt handling.
>
> PROBLEM
> -------
> Waits and their associated events that are never reachable can
> eventually lead to deadlocks. However, since Lockdep focuses solely
> on lock acquisition order, it has inherent limitations when handling
> waits and events.
>
> Moreover, by tracking only lock acquisition order, Lockdep cannot
> properly handle read locks or cross-event scenarios — such as
> wait_for_completion() and complete() — making it increasingly
> inadequate as a general-purpose deadlock detection tool.
>
> SOLUTION
> --------
> Once again, waits and their associated events that are never
> reachable can eventually lead to deadlocks. The new solution, DEPT,
> focuses directly on waits and events. DEPT monitors waits and events,
> and reports them when any become unreachable.
>
> DEPT provides:
>
> * Correct handling of read locks.
> * Support for general waits and events.
> * Continuous operation, even after multiple reports.
> * Simple, intuitive annotation APIs.
>
> There are still false positives, and some are already being worked on
> for suppression. Especially splitting the folio class into several
> appropriate classes e.g. block device mapping class and regular file
> mapping class, is currently under active development by me and Yeoreum
> Yun.
>
> Anyway, these efforts will need to continue for a while, as we’ve seen
> with lockdep over two decades. DEPT is tagged as EXPERIMENTAL in
> Kconfig — meaning it’s not yet suitable for use as an automation tool.
>
> However, for those who are interested in using DEPT to analyze complex
> synchronization patterns and extract dependency insights, DEPT would be
> a great tool for the purpose.
>
> Thanks for your support and contributions to:
>
> Harry Yoo <harry.yoo@oracle.com>
> Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
> Yunseong Kim <ysk@kzalloc.com>
> Yeoreum Yun <yeoreum.yun@arm.com>
>
> FAQ
> ---
> Q. Is this the first attempt to solve this problem?
> A. No. The cross-release feature (commit b09be676e0ff2) attempted to
> address it — as a Lockdep extension. It was merged, but quickly
> reverted, because:
>
> While it uncovered valuable hidden issues, it also introduced false
> positives. Since these false positives mask further real problems
> with Lockdep — and developers strongly dislike them — the feature was
> rolled back.
>
> Q. Why wasn’t DEPT built as a Lockdep extension?
> A. Lockdep is the result of years of work by kernel developers — and is
> now very stable. But I chose to build DEPT separately, because:
>
> While reusing BFS(Breadth First Search) and Lockdep’s hashing is
> beneficial, the rest of the system must be rebuilt from scratch to
> align with DEPT’s wait-event model — since Lockdep was originally
> designed for tracking lock acquisition orders, not wait-event
> dependencies.
>
> Q. Do you plan to replace Lockdep entirely?
> A. Not at all — Lockdep still plays a vital role in validating correct
> lock usage. While its dependency-checking logic should eventually be
> superseded by DEPT, the rest of its functionality should stay.
>
> Q. Should we replace the dependency check immediately?
> A. Absolutely not. Lockdep’s stability is the result of years of hard
> work by kernel developers. Lockdep and DEPT should run side by side
> until DEPT matures.
>
> Q. Stronger detection often leads to more false positives — which was a
> major pain point when cross-release was added. Is DEPT designed to
> handle this?
> A. Yes. DEPT’s simple, generalized design enables flexible reporting —
> so while false positives still need fixing, they’re far less
> disruptive than they were under the Lockdep extension, cross-release.
>
> Q. Why not fix all false positives out-of-tree before merging?
> A. Since the affected subsystems span the entire kernel, like Lockdep,
> which has relied on annotations to avoid false positives over the
> last two decades, DEPT too will require the annotation efforts.
>
> Performing annotation work within the mainline will help us add
> annotations more appropriately and will also make DEPT a useful tool
> for a wider range of users more quickly.
>
> CONFIG_DEPT is marked EXPERIMENTAL, so it’s opt-in. Some users are
> already interested in using DEPT to analyze complex synchronization
> patterns and extract dependency insights.
>
> Byungchul
> ---
> Changes from v17:
>
> 1. Rebase on the mainline as of 2025 Dec 5.
> 2. Convert the documents' format from txt to rst. (feedbacked
> by Jonathan Corbet and Bagas Sanjaya)
> 3. Move the documents from 'Documentation/dependency' to
> 'Documentation/dev-tools'. (feedbakced by Jonathan Corbet)
> 4. Improve the documentation. (feedbacked by NeilBrown)
> 5. Use a common function, enter_from_user_mode(), instead of
> arch specific code, to notice context switch from user mode.
> (feedbacked by Dave Hansen, Mark Rutland, and Mark Brown)
> 6. Resolve the header dependency issue by using dept's internal
> header, instead of relocating 'struct llist_{head,node}' to
> another header. (feedbacked by Greg KH)
> 7. Improve page(or folio) usage type APIs.
> 8. Add rust helper for wait_for_completion(). (feedbacked by
> Guangbo Cui, Boqun Feng, and Danilo Krummrich)
> 9. Refine some commit messages.
>
> Changes from v16:
>
> 1. Rebase on v6.17.
> 2. Fix a false positive from rcu (by Yunseong Kim)
> 3. Introduce APIs to set page's usage, dept_set_page_usage() and
> dept_reset_page_usage() to avoid false positives.
> 4. Consider lock_page() as a potential wait unconditionally.
> 5. Consider folio_lock_killable() as a potential wait
> unconditionally.
> 6. Add support for tracking PG_writeback waits and events.
> 7. Fix two build errors due to the additional debug information
> added by dept. (by Yunseong Kim)
>
> Changes from v15:
>
> 1. Fix typo and improve comments and commit messages (feedbacked
> by ALOK TIWARI, Waiman Long, and kernel test robot).
> 2. Do not stop dept on detection of cicular dependency of
> recover event, allowing to keep reporting.
> 3. Add SK hynix to copyright.
> 4. Consider folio_lock() as a potential wait unconditionally.
> 5. Fix Kconfig dependency bug (feedbacked by kernel test rebot).
> 6. Do not suppress reports that involve classes even that have
> already involved in other reports, allowing to keep
> reporting.
>
> Changes from v14:
>
> 1. Rebase on the current latest, v6.15-rc6.
> 2. Refactor dept code.
> 3. With multi event sites for a single wait, even if an event
> forms a circular dependency, the event can be recovered by
> other event(or wake up) paths. Even though informing the
> circular dependency is worthy but it should be suppressed
> once informing it, if it doesn't lead an actual deadlock. So
> introduce APIs to annotate the relationship between event
> site and recover site, that are, event_site() and
> dept_recover_event().
> 4. wait_for_completion() worked with dept map embedded in struct
> completion. However, it generates a few false positves since
> all the waits using the instance of struct completion, share
> the map and key. To avoid the false positves, make it not to
> share the map and key but each wait_for_completion() caller
> have its own key by default. Of course, external maps also
> can be used if needed.
> 5. Fix a bug about hardirq on/off tracing.
> 6. Implement basic unit test for dept.
> 7. Add more supports for dma fence synchronization.
> 8. Add emergency stop of dept e.g. on panic().
> 9. Fix false positives by mmu_notifier_invalidate_*().
> 10. Fix recursive call bug by DEPT_WARN_*() and DEPT_STOP().
> 11. Fix trivial bugs in DEPT_WARN_*() and DEPT_STOP().
> 12. Fix a bug that a spin lock, dept_pool_spin, is used in
> both contexts of irq disabled and enabled without irq
> disabled.
> 13. Suppress reports with classes, any of that already have
> been reported, even though they have different chains but
> being barely meaningful.
> 14. Print stacktrace of the wait that an event is now waking up,
> not only stacktrace of the event.
> 15. Make dept aware of lockdep_cmp_fn() that is used to avoid
> false positives in lockdep so that dept can also avoid them.
> 16. Do do_event() only if there are no ecxts have been
> delimited.
> 17. Fix a bug that was not synchronized for stage_m in struct
> dept_task, using a spin lock, dept_task()->stage_lock.
> 18. Fix a bug that dept didn't handle the case that multiple
> ttwus for a single waiter can be called at the same time
> e.i. a race issue.
> 19. Distinguish each kernel context from others, not only by
> system call but also by user oriented fault so that dept can
> work with more accuracy information about kernel context.
> That helps to avoid a few false positives.
> 20. Limit dept's working to x86_64 and arm64.
>
> Changes from v13:
>
> 1. Rebase on the current latest version, v6.9-rc7.
> 2. Add 'dept' documentation describing dept APIs.
>
> Changes from v12:
>
> 1. Refine the whole document for dept.
> 2. Add 'Interpret dept report' section in the document, using a
> deadlock report obtained in practice. Hope this version of
> document helps guys understand dept better.
>
> https://lore.kernel.org/lkml/6383cde5-cf4b-facf-6e07-1378a485657d@I-love.SAKURA.ne.jp/#t
> https://lore.kernel.org/lkml/1674268856-31807-1-git-send-email-byungchul.park@lge.com/
>
> Changes from v11:
>
> 1. Add 'dept' documentation describing the concept of dept.
> 2. Rewrite the commit messages of the following commits for
> using weaker lockdep annotation, for better description.
>
> fs/jbd2: Use a weaker annotation in journal handling
> cpu/hotplug: Use a weaker annotation in AP thread
>
> (feedbacked by Thomas Gleixner)
>
> Changes from v10:
>
> 1. Fix noinstr warning when building kernel source.
> 2. dept has been reporting some false positives due to the folio
> lock's unfairness. Reflect it and make dept work based on
> dept annotaions instead of just wait and wake up primitives.
> 3. Remove the support for PG_writeback while working on 2. I
> will add the support later if needed.
> 4. dept didn't print stacktrace for [S] if the participant of a
> deadlock is not lock mechanism but general wait and event.
> However, it made hard to interpret the report in that case.
> So add support to print stacktrace of the requestor who asked
> the event context to run - usually a waiter of the event does
> it just before going to wait state.
> 5. Give up tracking raw_local_irq_{disable,enable}() since it
> totally messed up dept's irq tracking. So make it work in the
> same way as lockdep does. I will consider it once any false
> positives by those are observed again.
> 6. Change the manual rwsem_acquire_read(->j_trans_commit_map)
> annotation in fs/jbd2/transaction.c to the try version so
> that it works as much as it exactly needs.
> 7. Remove unnecessary 'inline' keyword in dept.c and add
> '__maybe_unused' to a needed place.
>
> Changes from v9:
>
> 1. Fix a bug. SDT tracking didn't work well because of my big
> mistake that I should've used waiter's map to indentify its
> class but it had been working with waker's one. FYI,
> PG_locked and PG_writeback weren't affected. They still
> worked well. (reported by YoungJun)
>
> Changes from v8:
>
> 1. Fix build error by adding EXPORT_SYMBOL(PG_locked_map) and
> EXPORT_SYMBOL(PG_writeback_map) for kernel module build -
> appologize for that. (reported by kernel test robot)
> 2. Fix build error by removing header file's circular dependency
> that was caused by "atomic.h", "kernel.h" and "irqflags.h",
> which I introduced - appolgize for that. (reported by kernel
> test robot)
>
> Changes from v7:
>
> 1. Fix a bug that cannot track rwlock dependency properly,
> introduced in v7. (reported by Boqun and lockdep selftest)
> 2. Track wait/event of PG_{locked,writeback} more aggressively
> assuming that when a bit of PG_{locked,writeback} is cleared
> there might be waits on the bit. (reported by Linus, Hillf
> and syzbot)
> 3. Fix and clean bad style code e.i. unnecessarily introduced
> a randome pattern and so on. (pointed out by Linux)
> 4. Clean code for applying dept to wait_for_completion().
>
> Changes from v6:
>
> 1. Tie to task scheduler code to track sleep and try_to_wake_up()
> assuming sleeps cause waits, try_to_wake_up()s would be the
> events that those are waiting for, of course with proper dept
> annotations, sdt_might_sleep_weak(), sdt_might_sleep_strong()
> and so on. For these cases, class is classified at sleep
> entrance rather than the synchronization initialization code.
> Which would extremely reduce false alarms.
> 2. Remove the dept associated instance in each page struct for
> tracking dependencies by PG_locked and PG_writeback thanks to
> the 1. work above.
> 3. Introduce CONFIG_dept_AGGRESIVE_TIMEOUT_WAIT to suppress
> reports that waits with timeout set are involved, for those
> who don't like verbose reporting.
> 4. Add a mechanism to refill the internal memory pools on
> running out so that dept could keep working as long as free
> memory is available in the system.
> 5. Re-enable tracking hashed-waitqueue wait. That's going to no
> longer generate false positives because class is classified
> at sleep entrance rather than the waitqueue initailization.
> 6. Refactor to make it easier to port onto each new version of
> the kernel.
> 7. Apply dept to dma fence.
> 8. Do trivial optimizaitions.
>
> Changes from v5:
>
> 1. Use just pr_warn_once() rather than WARN_ONCE() on the lack
> of internal resources because WARN_*() printing stacktrace is
> too much for informing the lack. (feedback from Ted, Hyeonggon)
> 2. Fix trivial bugs like missing initializing a struct before
> using it.
> 3. Assign a different class per task when handling onstack
> variables for waitqueue or the like. Which makes dept
> distinguish between onstack variables of different tasks so
> as to prevent false positives. (reported by Hyeonggon)
> 4. Make dept aware of even raw_local_irq_*() to prevent false
> positives. (reported by Hyeonggon)
> 5. Don't consider dependencies between the events that might be
> triggered within __schedule() and the waits that requires
> __schedule(), real ones. (reported by Hyeonggon)
> 6. Unstage the staged wait that has prepare_to_wait_event()'ed
> *and* yet to get to __schedule(), if we encounter __schedule()
> in-between for another sleep, which is possible if e.g. a
> mutex_lock() exists in 'condition' of ___wait_event().
> 7. Turn on CONFIG_PROVE_LOCKING when CONFIG_DEPT is on, to rely
> on the hardirq and softirq entrance tracing to make dept more
> portable for now.
>
> Changes from v4:
>
> 1. Fix some bugs that produce false alarms.
> 2. Distinguish each syscall context from another *for arm64*.
> 3. Make it not warn it but just print it in case dept ring
> buffer gets exhausted. (feedback from Hyeonggon)
> 4. Explicitely describe "EXPERIMENTAL" and "dept might produce
> false positive reports" in Kconfig. (feedback from Ted)
>
> Changes from v3:
>
> 1. dept shouldn't create dependencies between different depths
> of a class that were indicated by *_lock_nested(). dept
> normally doesn't but it does once another lock class comes
> in. So fixed it. (feedback from Hyeonggon)
> 2. dept considered a wait as a real wait once getting to
> __schedule() even if it has been set to TASK_RUNNING by wake
> up sources in advance. Fixed it so that dept doesn't consider
> the case as a real wait. (feedback from Jan Kara)
> 3. Stop tracking dependencies with a map once the event
> associated with the map has been handled. dept will start to
> work with the map again, on the next sleep.
>
> Changes from v2:
>
> 1. Disable dept on bit_wait_table[] in sched/wait_bit.c
> reporting a lot of false positives, which is my fault.
> Wait/event for bit_wait_table[] should've been tagged in a
> higher layer for better work, which is a future work.
> (feedback from Jan Kara)
> 2. Disable dept on crypto_larval's completion to prevent a false
> positive.
>
> Changes from v1:
>
> 1. Fix coding style and typo. (feedback from Steven)
> 2. Distinguish each work context from another in workqueue.
> 3. Skip checking lock acquisition with nest_lock, which is about
> correct lock usage that should be checked by lockdep.
>
> Changes from RFC(v0):
>
> 1. Prevent adding a wait tag at prepare_to_wait() but __schedule().
> (feedback from Linus and Matthew)
> 2. Use try version at lockdep_acquire_cpus_lock() annotation.
> 3. Distinguish each syscall context from another.
>
> Byungchul Park (41):
> dept: implement DEPT(DEPendency Tracker)
> dept: add single event dependency tracker APIs
> dept: add lock dependency tracker APIs
> dept: tie to lockdep and IRQ tracing
> dept: add proc knobs to show stats and dependency graph
> dept: distinguish each kernel context from another
> dept: distinguish each work from another
> dept: add a mechanism to refill the internal memory pools on running
> out
> dept: record the latest one out of consecutive waits of the same class
> dept: apply sdt_might_sleep_{start,end}() to
> wait_for_completion()/complete()
> dept: apply sdt_might_sleep_{start,end}() to swait
> dept: apply sdt_might_sleep_{start,end}() to waitqueue wait
> dept: apply sdt_might_sleep_{start,end}() to hashed-waitqueue wait
> dept: apply sdt_might_sleep_{start,end}() to dma fence
> dept: track timeout waits separately with a new Kconfig
> dept: apply timeout consideration to wait_for_completion()/complete()
> dept: apply timeout consideration to swait
> dept: apply timeout consideration to waitqueue wait
> dept: apply timeout consideration to hashed-waitqueue wait
> dept: apply timeout consideration to dma fence wait
> dept: make dept able to work with an external wgen
> dept: track PG_locked with dept
> dept: print staged wait's stacktrace on report
> locking/lockdep: prevent various lockdep assertions when
> lockdep_off()'ed
> dept: add documents for dept
> cpu/hotplug: use a weaker annotation in AP thread
> dept: assign dept map to mmu notifier invalidation synchronization
> dept: assign unique dept_key to each distinct dma fence caller
> dept: make dept aware of lockdep_set_lock_cmp_fn() annotation
> dept: make dept stop from working on debug_locks_off()
> dept: assign unique dept_key to each distinct wait_for_completion()
> caller
> completion, dept: introduce init_completion_dmap() API
> dept: introduce a new type of dependency tracking between multi event
> sites
> dept: add module support for struct dept_event_site and
> dept_event_site_dep
> dept: introduce event_site() to disable event tracking if it's
> recoverable
> dept: implement a basic unit test for dept
> dept: call dept_hardirqs_off() in local_irq_*() regardless of irq
> state
> dept: introduce APIs to set page usage and use subclasses_evt for the
> usage
> dept: track PG_writeback with dept
> SUNRPC: relocate struct rcu_head to the first field of struct rpc_xprt
> mm: percpu: increase PERCPU_DYNAMIC_SIZE_SHIFT on DEPT and large
> PAGE_SIZE
>
> Yunseong Kim (1):
> rcu/update: fix same dept key collision between various types of RCU
>
> Documentation/dev-tools/dept.rst | 778 ++++++
> Documentation/dev-tools/dept_api.rst | 125 +
> drivers/dma-buf/dma-fence.c | 23 +-
> include/asm-generic/vmlinux.lds.h | 13 +-
> include/linux/completion.h | 124 +-
> include/linux/dept.h | 402 +++
> include/linux/dept_ldt.h | 78 +
> include/linux/dept_sdt.h | 68 +
> include/linux/dept_unit_test.h | 67 +
> include/linux/dma-fence.h | 74 +-
> include/linux/hardirq.h | 3 +
> include/linux/irq-entry-common.h | 4 +
> include/linux/irqflags.h | 21 +-
> include/linux/local_lock_internal.h | 1 +
> include/linux/lockdep.h | 105 +-
> include/linux/lockdep_types.h | 3 +
> include/linux/mm_types.h | 4 +
> include/linux/mmu_notifier.h | 26 +
> include/linux/module.h | 5 +
> include/linux/mutex.h | 1 +
> include/linux/page-flags.h | 217 +-
> include/linux/pagemap.h | 37 +-
> include/linux/percpu-rwsem.h | 2 +-
> include/linux/percpu.h | 4 +
> include/linux/rcupdate_wait.h | 13 +-
> include/linux/rtmutex.h | 1 +
> include/linux/rwlock_types.h | 1 +
> include/linux/rwsem.h | 1 +
> include/linux/sched.h | 118 +
> include/linux/seqlock.h | 2 +-
> include/linux/spinlock_types_raw.h | 3 +
> include/linux/srcu.h | 2 +-
> include/linux/sunrpc/xprt.h | 9 +-
> include/linux/swait.h | 3 +
> include/linux/wait.h | 3 +
> include/linux/wait_bit.h | 3 +
> init/init_task.c | 2 +
> init/main.c | 2 +
> kernel/Makefile | 1 +
> kernel/cpu.c | 2 +-
> kernel/dependency/Makefile | 5 +
> kernel/dependency/dept.c | 3499 ++++++++++++++++++++++++++
> kernel/dependency/dept_hash.h | 10 +
> kernel/dependency/dept_internal.h | 314 +++
> kernel/dependency/dept_object.h | 13 +
> kernel/dependency/dept_proc.c | 94 +
> kernel/dependency/dept_unit_test.c | 173 ++
> kernel/exit.c | 1 +
> kernel/fork.c | 2 +
> kernel/locking/lockdep.c | 33 +
> kernel/module/main.c | 19 +
> kernel/rcu/rcu.h | 1 +
> kernel/rcu/update.c | 5 +-
> kernel/sched/completion.c | 62 +-
> kernel/sched/core.c | 9 +
> kernel/workqueue.c | 3 +
> lib/Kconfig.debug | 48 +
> lib/debug_locks.c | 2 +
> lib/locking-selftest.c | 2 +
> mm/filemap.c | 38 +
> mm/mm_init.c | 3 +
> mm/mmu_notifier.c | 31 +-
> rust/helpers/completion.c | 5 +
> 63 files changed, 6602 insertions(+), 121 deletions(-)
> create mode 100644 Documentation/dev-tools/dept.rst
> create mode 100644 Documentation/dev-tools/dept_api.rst
> create mode 100644 include/linux/dept.h
> create mode 100644 include/linux/dept_ldt.h
> create mode 100644 include/linux/dept_sdt.h
> create mode 100644 include/linux/dept_unit_test.h
> create mode 100644 kernel/dependency/Makefile
> create mode 100644 kernel/dependency/dept.c
> create mode 100644 kernel/dependency/dept_hash.h
> create mode 100644 kernel/dependency/dept_internal.h
> create mode 100644 kernel/dependency/dept_object.h
> create mode 100644 kernel/dependency/dept_proc.c
> create mode 100644 kernel/dependency/dept_unit_test.c
>
>
> base-commit: 43dfc13ca972988e620a6edb72956981b75ab6b0
> --
> 2.17.1
^ permalink raw reply
* [PATCH] module/kallsyms: sort function symbols and use binary search
From: Stanislaw Gruszka @ 2026-03-17 11:04 UTC (permalink / raw)
To: linux-modules, Sami Tolvanen, Luis Chamberlain
Cc: linux-kernel, linux-trace-kernel, live-patching, Petr Pavlu,
Daniel Gomez, Aaron Tomlin, Steven Rostedt, Masami Hiramatsu,
Jordan Rome, Viktor Malik
Module symbol lookup via find_kallsyms_symbol() performs a linear scan
over the entire symtab when resolving an address. The number of symbols
in module symtabs has grown over the years, largely due to additional
metadata in non-standard sections, making this lookup very slow.
Improve this by separating function symbols during module load, placing
them at the beginning of the symtab, sorting them by address, and using
binary search when resolving addresses in module text.
This also should improve times for linear symbol name lookups, as valid
function symbols are now located at the beginning of the symtab.
The cost of sorting is small relative to module load time. In repeated
module load tests [1], depending on .config options, this change
increases load time between 2% and 4%. With cold caches, the difference
is not measurable, as memory access latency dominates.
The sorting theoretically could be done in compile time, but much more
complicated as we would have to simulate kernel addresses resolution
for symbols, and then correct relocation entries. That would be risky
if get out of sync.
The improvement can be observed when listing ftrace filter functions:
root@nano:~# time cat /sys/kernel/tracing/available_filter_functions | wc -l
74908
real 0m1.315s
user 0m0.000s
sys 0m1.312s
After:
root@nano:~# time cat /sys/kernel/tracing/available_filter_functions | wc -l
74911
real 0m0.167s
user 0m0.004s
sys 0m0.175s
(there are three more symbols introduced by the patch)
For livepatch modules, the symtab layout is preserved and the existing
linear search is used. For this case, it should be possible to keep
the original ELF symtab instead of copying it 1:1, but that is outside
the scope of this patch.
Link: https://gist.github.com/sgruszka/09f3fb1dad53a97b1aad96e1927ab117 [1]
Signed-off-by: Stanislaw Gruszka <stf_xl@wp.pl>
---
include/linux/module.h | 6 ++
kernel/module/internal.h | 1 +
kernel/module/kallsyms.c | 169 ++++++++++++++++++++++++++++-----------
3 files changed, 131 insertions(+), 45 deletions(-)
diff --git a/include/linux/module.h b/include/linux/module.h
index ac254525014c..4da0289fba02 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -379,8 +379,14 @@ struct module_memory {
struct mod_kallsyms {
Elf_Sym *symtab;
unsigned int num_symtab;
+ unsigned int num_func_syms;
char *strtab;
char *typetab;
+
+ unsigned int (*search)(struct mod_kallsyms *kallsyms,
+ struct module_memory *mod_mem,
+ unsigned long addr, unsigned long *bestval,
+ unsigned long *nextval);
};
#ifdef CONFIG_LIVEPATCH
diff --git a/kernel/module/internal.h b/kernel/module/internal.h
index 618202578b42..6a4d498619b1 100644
--- a/kernel/module/internal.h
+++ b/kernel/module/internal.h
@@ -73,6 +73,7 @@ struct load_info {
bool sig_ok;
#ifdef CONFIG_KALLSYMS
unsigned long mod_kallsyms_init_off;
+ unsigned long num_func_syms;
#endif
#ifdef CONFIG_MODULE_DECOMPRESS
#ifdef CONFIG_MODULE_STATS
diff --git a/kernel/module/kallsyms.c b/kernel/module/kallsyms.c
index 0fc11e45df9b..9e131baae441 100644
--- a/kernel/module/kallsyms.c
+++ b/kernel/module/kallsyms.c
@@ -10,6 +10,7 @@
#include <linux/kallsyms.h>
#include <linux/buildid.h>
#include <linux/bsearch.h>
+#include <linux/sort.h>
#include "internal.h"
/* Lookup exported symbol in given range of kernel_symbols */
@@ -103,6 +104,95 @@ static bool is_core_symbol(const Elf_Sym *src, const Elf_Shdr *sechdrs,
return true;
}
+static inline bool is_func_symbol(const Elf_Sym *sym)
+{
+ return sym->st_shndx != SHN_UNDEF && sym->st_size != 0 &&
+ ELF_ST_TYPE(sym->st_info) == STT_FUNC;
+}
+
+static unsigned int bsearch_kallsyms_symbol(struct mod_kallsyms *kallsyms,
+ struct module_memory *mod_mem,
+ unsigned long addr,
+ unsigned long *bestval,
+ unsigned long *nextval)
+
+{
+ unsigned int mid, low = 1, high = kallsyms->num_func_syms + 1;
+ unsigned int best = 0;
+ unsigned long thisval;
+
+ while (low < high) {
+ mid = low + (high - low) / 2;
+ thisval = kallsyms_symbol_value(&kallsyms->symtab[mid]);
+
+ if (thisval <= addr) {
+ *bestval = thisval;
+ best = mid;
+ low = mid + 1;
+ } else {
+ *nextval = thisval;
+ high = mid;
+ }
+ }
+
+ return best;
+}
+
+static const char *kallsyms_symbol_name(struct mod_kallsyms *kallsyms,
+ unsigned int symnum)
+{
+ return kallsyms->strtab + kallsyms->symtab[symnum].st_name;
+}
+
+static unsigned int search_kallsyms_symbol(struct mod_kallsyms *kallsyms,
+ struct module_memory *mod_mem,
+ unsigned long addr,
+ unsigned long *bestval,
+ unsigned long *nextval)
+{
+ unsigned int i, best = 0;
+
+ /*
+ * Scan for closest preceding symbol, and next symbol. (ELF
+ * starts real symbols at 1).
+ */
+ for (i = 1; i < kallsyms->num_symtab; i++) {
+ const Elf_Sym *sym = &kallsyms->symtab[i];
+ unsigned long thisval = kallsyms_symbol_value(sym);
+
+ if (sym->st_shndx == SHN_UNDEF)
+ continue;
+
+ /*
+ * We ignore unnamed symbols: they're uninformative
+ * and inserted at a whim.
+ */
+ if (*kallsyms_symbol_name(kallsyms, i) == '\0' ||
+ is_mapping_symbol(kallsyms_symbol_name(kallsyms, i)))
+ continue;
+
+ if (thisval <= addr && thisval > *bestval) {
+ best = i;
+ *bestval = thisval;
+ }
+ if (thisval > addr && thisval < *nextval)
+ *nextval = thisval;
+ }
+
+ return best;
+}
+
+static int elf_sym_cmp(const void *a, const void *b)
+{
+ const Elf_Sym *sym_a = a;
+ const Elf_Sym *sym_b = b;
+
+ if (sym_a->st_value < sym_b->st_value)
+ return -1;
+
+ return sym_a->st_value > sym_b->st_value;
+}
+
/*
* We only allocate and copy the strings needed by the parts of symtab
* we keep. This is simple, but has the effect of making multiple
@@ -115,9 +205,10 @@ void layout_symtab(struct module *mod, struct load_info *info)
Elf_Shdr *symsect = info->sechdrs + info->index.sym;
Elf_Shdr *strsect = info->sechdrs + info->index.str;
const Elf_Sym *src;
- unsigned int i, nsrc, ndst, strtab_size = 0;
+ unsigned int i, nsrc, ndst, nfunc, strtab_size = 0;
struct module_memory *mod_mem_data = &mod->mem[MOD_DATA];
struct module_memory *mod_mem_init_data = &mod->mem[MOD_INIT_DATA];
+ bool is_lp_mod = is_livepatch_module(mod);
/* Put symbol section at end of init part of module. */
symsect->sh_flags |= SHF_ALLOC;
@@ -129,12 +220,14 @@ void layout_symtab(struct module *mod, struct load_info *info)
nsrc = symsect->sh_size / sizeof(*src);
/* Compute total space required for the core symbols' strtab. */
- for (ndst = i = 0; i < nsrc; i++) {
- if (i == 0 || is_livepatch_module(mod) ||
+ for (ndst = nfunc = i = 0; i < nsrc; i++) {
+ if (i == 0 || is_lp_mod ||
is_core_symbol(src + i, info->sechdrs, info->hdr->e_shnum,
info->index.pcpu)) {
strtab_size += strlen(&info->strtab[src[i].st_name]) + 1;
ndst++;
+ if (!is_lp_mod && is_func_symbol(src + i))
+ nfunc++;
}
}
@@ -156,6 +249,7 @@ void layout_symtab(struct module *mod, struct load_info *info)
mod_mem_init_data->size = ALIGN(mod_mem_init_data->size,
__alignof__(struct mod_kallsyms));
info->mod_kallsyms_init_off = mod_mem_init_data->size;
+ info->num_func_syms = nfunc;
mod_mem_init_data->size += sizeof(struct mod_kallsyms);
info->init_typeoffs = mod_mem_init_data->size;
@@ -169,7 +263,7 @@ void layout_symtab(struct module *mod, struct load_info *info)
*/
void add_kallsyms(struct module *mod, const struct load_info *info)
{
- unsigned int i, ndst;
+ unsigned int i, di, nfunc, ndst;
const Elf_Sym *src;
Elf_Sym *dst;
char *s;
@@ -178,6 +272,7 @@ void add_kallsyms(struct module *mod, const struct load_info *info)
void *data_base = mod->mem[MOD_DATA].base;
void *init_data_base = mod->mem[MOD_INIT_DATA].base;
struct mod_kallsyms *kallsyms;
+ bool is_lp_mod = is_livepatch_module(mod);
kallsyms = init_data_base + info->mod_kallsyms_init_off;
@@ -186,6 +281,7 @@ void add_kallsyms(struct module *mod, const struct load_info *info)
/* Make sure we get permanent strtab: don't use info->strtab. */
kallsyms->strtab = (void *)info->sechdrs[info->index.str].sh_addr;
kallsyms->typetab = init_data_base + info->init_typeoffs;
+ kallsyms->search = search_kallsyms_symbol;
/*
* Now populate the cut down core kallsyms for after init
@@ -194,19 +290,30 @@ void add_kallsyms(struct module *mod, const struct load_info *info)
mod->core_kallsyms.symtab = dst = data_base + info->symoffs;
mod->core_kallsyms.strtab = s = data_base + info->stroffs;
mod->core_kallsyms.typetab = data_base + info->core_typeoffs;
+ mod->core_kallsyms.search = is_lp_mod ? search_kallsyms_symbol :
+ bsearch_kallsyms_symbol;
+
strtab_size = info->core_typeoffs - info->stroffs;
src = kallsyms->symtab;
- for (ndst = i = 0; i < kallsyms->num_symtab; i++) {
+ ndst = info->num_func_syms + 1;
+
+ for (nfunc = i = 0; i < kallsyms->num_symtab; i++) {
kallsyms->typetab[i] = elf_type(src + i, info);
- if (i == 0 || is_livepatch_module(mod) ||
+ if (i == 0 || is_lp_mod ||
is_core_symbol(src + i, info->sechdrs, info->hdr->e_shnum,
info->index.pcpu)) {
ssize_t ret;
- mod->core_kallsyms.typetab[ndst] =
- kallsyms->typetab[i];
- dst[ndst] = src[i];
- dst[ndst++].st_name = s - mod->core_kallsyms.strtab;
+ if (i == 0)
+ di = 0;
+ else if (!is_lp_mod && is_func_symbol(src + i))
+ di = 1 + nfunc++;
+ else
+ di = ndst++;
+
+ mod->core_kallsyms.typetab[di] = kallsyms->typetab[i];
+ dst[di] = src[i];
+ dst[di].st_name = s - mod->core_kallsyms.strtab;
ret = strscpy(s, &kallsyms->strtab[src[i].st_name],
strtab_size);
if (ret < 0)
@@ -216,9 +323,13 @@ void add_kallsyms(struct module *mod, const struct load_info *info)
}
}
+ WARN_ON_ONCE(nfunc != info->num_func_syms);
+ sort(dst + 1, nfunc, sizeof(Elf_Sym), elf_sym_cmp, NULL);
+
/* Set up to point into init section. */
rcu_assign_pointer(mod->kallsyms, kallsyms);
mod->core_kallsyms.num_symtab = ndst;
+ mod->core_kallsyms.num_func_syms = nfunc;
}
#if IS_ENABLED(CONFIG_STACKTRACE_BUILD_ID)
@@ -241,11 +352,6 @@ void init_build_id(struct module *mod, const struct load_info *info)
}
#endif
-static const char *kallsyms_symbol_name(struct mod_kallsyms *kallsyms, unsigned int symnum)
-{
- return kallsyms->strtab + kallsyms->symtab[symnum].st_name;
-}
-
/*
* Given a module and address, find the corresponding symbol and return its name
* while providing its size and offset if needed.
@@ -255,7 +361,7 @@ static const char *find_kallsyms_symbol(struct module *mod,
unsigned long *size,
unsigned long *offset)
{
- unsigned int i, best = 0;
+ unsigned int best;
unsigned long nextval, bestval;
struct mod_kallsyms *kallsyms = rcu_dereference(mod->kallsyms);
struct module_memory *mod_mem;
@@ -267,36 +373,9 @@ static const char *find_kallsyms_symbol(struct module *mod,
mod_mem = &mod->mem[MOD_TEXT];
nextval = (unsigned long)mod_mem->base + mod_mem->size;
+ bestval = kallsyms_symbol_value(&kallsyms->symtab[0]);
- bestval = kallsyms_symbol_value(&kallsyms->symtab[best]);
-
- /*
- * Scan for closest preceding symbol, and next symbol. (ELF
- * starts real symbols at 1).
- */
- for (i = 1; i < kallsyms->num_symtab; i++) {
- const Elf_Sym *sym = &kallsyms->symtab[i];
- unsigned long thisval = kallsyms_symbol_value(sym);
-
- if (sym->st_shndx == SHN_UNDEF)
- continue;
-
- /*
- * We ignore unnamed symbols: they're uninformative
- * and inserted at a whim.
- */
- if (*kallsyms_symbol_name(kallsyms, i) == '\0' ||
- is_mapping_symbol(kallsyms_symbol_name(kallsyms, i)))
- continue;
-
- if (thisval <= addr && thisval > bestval) {
- best = i;
- bestval = thisval;
- }
- if (thisval > addr && thisval < nextval)
- nextval = thisval;
- }
-
+ best = kallsyms->search(kallsyms, mod_mem, addr, &bestval, &nextval);
if (!best)
return NULL;
--
2.50.1
^ permalink raw reply related
* Re: [PATCH] module: remove MODULE_VERSION()
From: Petr Pavlu @ 2026-03-17 12:50 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: linux-modules, linux-kernel, Luis Chamberlain, Daniel Gomez,
Sami Tolvanen, Aaron Tomlin, Shyam Saini, Kees Cook,
Thorsten Blum, Christoph Hellwig
In-Reply-To: <2026031630-linseed-powdered-a0d1@gregkh>
On 3/16/26 11:03 AM, Greg Kroah-Hartman wrote:
> On Mon, Mar 16, 2026 at 10:37:38AM +0100, Petr Pavlu wrote:
>> On 3/13/26 3:20 PM, Greg Kroah-Hartman wrote:
>>> Module "versions" do not make sense as the kernel is built all at once,
>>> the "version" is the overall kernel version number, so modules can not
>>> really be described as having a unique version given that they rely on
>>> the infrastructure of the whole kernel.
>>>
>>> For now, just make this an "empty" define, to keep existing code
>>> building properly as the tree is slowly purged of the use of this over
>>> time.
>>>
>>> This macro will be removed entirely in the future when there are no
>>> in-tree users.
>>
>> I share a similar sentiment that module versions set by MODULE_VERSION()
>> are not particularly useful for in-tree modules and the macro is often
>> used unnecessarily. However, I don't think this patch takes the best
>> approach to phase it out.
>>
>> The file Documentation/ABI/stable/sysfs-module documents
>> /sys/module/<MODULENAME>/version as a stable ABI. Searching for
>> '^MODULE_VERSION' in v7.0-rc4 shows 600 uses of the macro. My concern is
>> that if any of these modules has a userspace part that checks the
>> version, this patch could potentially break users' systems.
>
> sysfs use is ALWAYS "if the file is not there, the userspace tool should
> keep working". How would userspace every do something different if a
> module version flag is not there, that is not how a kernel driver should
> ever be attempting to communicate with userspace as to what the api is,
> or is not.
>
> And as the module version does not even work for any stable kernel
> release, it's kind of proof that userspace does not rely on this.
Makes sense. I have now also reviewed updates of MODULE_VERSION()
invocations going back 5 years and my impression is that removing this
macro should be generally safe.
New instances of MODULE_VERSION() are typically introduced when a new
driver is added to the codebase.
There were numerous commits where MODULE_VERSION() is removed with the
argument that it is unnecessary for in-tree modules and that the main
kernel version should be used instead. This suggest that there is
already a broader consensus that module versions are not particularly
useful, at least not for in-tree modules.
There was a limited number of cases where a per-module version is
actually updated. Examples:
ea7af9454815 ("platform/x86: dell_rbu: Bump version")
93d09773d1a5 ("xz: add RISC-V BCJ filter")
98d5b61ef5fa ("coda: bump module version to 7.2")
8cb5d216ab33 ("char: xillybus: Move class-related functions to new xillybus_class.c")
74f46a0524f8 ("scsi: fnic: Turn off FDMI ACTIVE flags on link down")
6cd379f75f42 ("ata: pata_hpt3x2n: pass base DPLL frequency to hpt3x2n_pci_clock()")
47adef20e67d ("pata: ixp4xx: Rewrite to use device tree")
(The last three commits are indirect updates, where the module contains
MODULE_VERSION(DRV_VERSION); and DRV_VERSION is updated.)
Most of these version bumps do not seem particularly noteworthy. Only
commit 98d5b61ef5fa appears somewhat interesting with the description:
coda: bump module version to 7.2
Helps with tracking which patches have been propagated upstream and if
users are running the latest known version.
>
>> I believe it would be safer to start by removing individual uses of
>> MODULE_VERSION(). That way, we can also learn if we're missing any use
>> cases for having module versions.
>
> Sure, I'll make up a patch to drop all 700 uses, but how is that much
> different? :)
The end result is ultimately the same. The kernel will no longer have
any uses of MODULE_VERSION() or its implementation.
The difference is that by removing the uses of MODULE_VERSION() first,
the maintainers of the relevant code will be properly informed about
this change, rather than being told afterwards, when they will no longer
have a chance to provide any feedback.
Another aspect is that MODULE_VERSION() still appears useful for
external modules. Sure, we won't keep this macro if all uses in the
kernel are removed and these external modules will need to replace it
with a different mechanism. However, changing MODULE_VERSION() now to
only a dummy implementation can silently break version tracking for
these modules, which is not ideal.
If I end up being the only one who prefers removing the invocations of
this macro first, then please at least CC a few more lists on v2 of the
patch, such as driver-core and netdev, as well as the authors of the
recent version-bump commits that I identified above.
>
>> The original patch "Add a MODULE_VERSION macro" [1] from 2004 doesn't
>> say much about the motivation for adding module versions, but it does
>> mention that they should be accessible via sysfs.
>
> That was because we were just exporting all of the module information in
> sysfs, not due to us attempting to do anything special with that info in
> userspace other than "hey we have an attribute, let's export it!"
>
>> That was implemented
>> a year later in commit c988d2b28454 ("[PATCH] modules: add version and
>> srcversion to sysfs") [2], which primarily discusses use cases related
>> to DKMS, and to administrators + tech support needing to know what is
>> actually loaded on the system. For the latter, I believe srcversion (or
>> something similar) should be sufficient.
>
> And does dkms actually do anything with this sysfs value? At quick
> glance, I couldn't see anything.
I'm not familiar with DKMS. From a quick look, it parses both the
version and srcversion, although it calls modinfo and doesn't read the
values from sysfs. See get_module_verinfo() and compare_module_version()
in dkms.in [1].
[1] https://github.com/dkms-project/dkms/blob/2c35ae1d32eb6377ef8e8dd7e15427d56b63828d/dkms.in#L983
--
Thanks,
Petr
^ permalink raw reply
* Re: [PATCH v2] module.lds,codetag: force 0 sh_addr for sections
From: Josh Poimboeuf @ 2026-03-17 22:55 UTC (permalink / raw)
To: Petr Pavlu
Cc: Sami Tolvanen, Joe Lawrence, linux-modules, linux-kernel,
Luis Chamberlain, Daniel Gomez, Aaron Tomlin, Petr Mladek
In-Reply-To: <3969f887-4d3d-4fb8-870d-6af834d120e5@suse.com>
On Mon, Mar 16, 2026 at 03:23:20PM +0100, Petr Pavlu wrote:
> > Do we also need similar changes in any of the architecture-specific module
> > linker scripts (arch/*/include/asm/module.lds.h)?
>
> I overlooked these architecture-specific files. I believe we should do
> the same for them. For instance, riscv explicitly defines the .plt, .got
> and .got.plt sections, and they have misleading addresses:
>
> $ readelf -t fs/xfs/xfs.ko
> [...]
> Section Headers:
> [Nr] Name
> Type Address Offset Link
> Size EntSize Info Align
> Flags
> [...]
> [48] __versions
> PROGBITS 0000000000000000 0000000000194e90 0
> 0000000000007900 0000000000000000 0 8
> [0000000000000002]: ALLOC
> [49] .plt
> PROGBITS 0000000000007900 000000000019c790 0
> 0000000000000001 0000000000000000 0 1
> [0000000000000006]: ALLOC, EXEC
> [50] .got
> PROGBITS 0000000000007901 000000000019c791 0
> 0000000000000001 0000000000000000 0 1
> [0000000000000003]: WRITE, ALLOC
> [51] .got.plt
> PROGBITS 0000000000007902 000000000019c792 0
> 0000000000000001 0000000000000000 0 1
> [0000000000000002]: ALLOC
> [...]
>
> Nonetheless, this can be done separately. I think fixes for these files
> should better go through architecture-specific trees anyway.
>
> I can check the individual architectures and prepare the necessary
> patches, unless someone else is already looking into this or wants to
> take a look.
I agree those can be done separately. In the meantime do you plan to
take this patch in the module tree?
--
Josh
^ permalink raw reply
* Re: [PATCH v2] module.lds,codetag: force 0 sh_addr for sections
From: Sami Tolvanen @ 2026-03-17 23:10 UTC (permalink / raw)
To: Josh Poimboeuf
Cc: Petr Pavlu, Joe Lawrence, linux-modules, linux-kernel,
Luis Chamberlain, Daniel Gomez, Aaron Tomlin, Petr Mladek
In-Reply-To: <jl7b2iy44nakt6qiyytncemcym22aoddrjixblxcrfgjnnfc3k@vpmf2yfnrokr>
On Tue, Mar 17, 2026 at 03:55:57PM -0700, Josh Poimboeuf wrote:
> On Mon, Mar 16, 2026 at 03:23:20PM +0100, Petr Pavlu wrote:
> > Nonetheless, this can be done separately. I think fixes for these files
> > should better go through architecture-specific trees anyway.
> >
> > I can check the individual architectures and prepare the necessary
> > patches, unless someone else is already looking into this or wants to
> > take a look.
>
> I agree those can be done separately. In the meantime do you plan to
> take this patch in the module tree?
Yes, that's the plan.
Sami
^ permalink raw reply
* Re: [PATCH v18 00/42] DEPT(DEPendency Tracker)
From: Yunseong Kim @ 2026-03-18 5:58 UTC (permalink / raw)
To: torvalds, mingo, Byungchul Park, linux-kernel
Cc: kernel_team, damien.lemoal, linux-ide, adilger.kernel, linux-ext4,
peterz, will, tglx, rostedt, joel, sashal, daniel.vetter,
duyuyang, johannes.berg, tj, tytso, willy, david, amir73il,
gregkh, kernel-team, linux-mm, akpm, mhocko, minchan, hannes,
vdavydov.dev, sj, jglisse, dennis, cl, penberg, rientjes, vbabka,
ngupta, linux-block, josef, linux-fsdevel, jack, jlayton,
dan.j.williams, hch, djwong, dri-devel, rodrigosiqueiramelo,
melissa.srw, hamohammed.sa, harry.yoo, chris.p.wilson,
gwan-gyeong.mun, max.byungchul.park, boqun.feng, longman,
yunseong.kim, yeoreum.yun, netdev, matthew.brost, her0gyugyu,
corbet, catalin.marinas, bp, x86, hpa, luto, sumit.semwal,
gustavo, christian.koenig, andi.shyti, arnd, lorenzo.stoakes,
Liam.Howlett, rppt, surenb, mcgrof, petr.pavlu, da.gomez,
samitolvanen, paulmck, frederic, neeraj.upadhyay, joelagnelf,
josh, urezki, mathieu.desnoyers, jiangshanlai, qiang.zhang,
juri.lelli, vincent.guittot, dietmar.eggemann, bsegall, mgorman,
vschneid, chuck.lever, neil, okorniev, Dai.Ngo, tom, trondmy,
anna, kees, bigeasy, clrkwllms, mark.rutland, ada.coupriediaz,
kristina.martsenko, wangkefeng.wang, broonie, kevin.brodsky, dwmw,
shakeel.butt, ast, ziy, yuzhao, baolin.wang, usamaarif642,
joel.granados, richard.weiyang, geert+renesas, tim.c.chen, linux,
alexander.shishkin, lillian, chenhuacai, francesco,
guoweikang.kernel, link, jpoimboe, masahiroy, brauner,
thomas.weissschuh, oleg, mjguzik, andrii, wangfushuai, linux-doc,
linux-arm-kernel, linux-media, linaro-mm-sig, linux-i2c,
linux-arch, linux-modules, rcu, linux-nfs, linux-rt-devel,
2407018371, dakr, miguel.ojeda.sandonis, neilb, bagasdotme,
wsa+renesas, dave.hansen, geert, ojeda, alex.gaynor, gary,
bjorn3_gh, lossin, a.hindborg, aliceryhl, tmgross, rust-for-linux
In-Reply-To: <20260317044459.GA27353@system.software.com>
Hi all,
I would like to support this proposal by noting that we are currently tracking
a number of kernel bugs uniquely detectable by DEPT.
In my experience running kernel fuzzers, many of these issues were previously
flagged as hung task and left unresolved because they involved wait/event
dependencies for PG_locked and writeback that are difficult to track otherwise.
DEPT provides the clarity necessary to address these cases. I believe having
DEPT in the mainline tree will be a significant asset for identifying and fixing
these kinds of hidden, latent bugs.
Best regards,
Yunseong
On 3/17/26 1:44 PM, Byungchul Park wrote:
> On Fri, Dec 05, 2025 at 04:18:13PM +0900, Byungchul Park wrote:
>> I'm happy to see that DEPT reported real problems in practice:
>>
>> https://lore.kernel.org/lkml/6383cde5-cf4b-facf-6e07-1378a485657d@I-love.SAKURA.ne.jp/
>> https://lore.kernel.org/lkml/1674268856-31807-1-git-send-email-byungchul.park@lge.com/
>> https://lore.kernel.org/all/b6e00e77-4a8c-4e05-ab79-266bf05fcc2d@igalia.com/
>>
>> I’ve added documentation describing DEPT — this should help you
>> understand what DEPT is and how it works. You can use DEPT simply by
>> enabling CONFIG_DEPT and checking dmesg at runtime.
>> ---
>>
>> Hi Linus and folks,
>
> Hi Linus and Ingo,
>
> Although the DEPT tool still has areas for improvement, I am confident
> it employs the most appropriate method for tracking dependencies within
> Linux kernel.
>
> If it is to be maintained in the source tree, which subsystem would be
> the most suitable for its management? Personally, I believe introducing
> a dedicated dependency subsystem under 'kernel/' would be ideal, though
> managing it within the already well-maintained scheduler or locking
> subsystems might be more practical.
>
> Since DEPT tracks not only locking mechanisms but also general sleep and
> wake events, I have avoided placing it under the locking subsystem thus
> far. If you have alternative or more fitting suggestions, I would
> appreciate your input.
>
> Thanks.
>
> Byungchul
>>
>> I’ve been developing a tool to detect deadlock possibilities by tracking
>> waits/events — rather than lock acquisition order — to cover all the
>> synchronization mechanisms. To summarize the design rationale, starting
>> from the problem statement, through analysis, to the solution:
>>
>> CURRENT STATUS
>> --------------
>> Lockdep tracks lock acquisition order to identify deadlock conditions.
>> Additionally, it tracks IRQ state changes — via {en,dis}able — to
>> detect cases where locks are acquired unintentionally during
>> interrupt handling.
>>
>> PROBLEM
>> -------
>> Waits and their associated events that are never reachable can
>> eventually lead to deadlocks. However, since Lockdep focuses solely
>> on lock acquisition order, it has inherent limitations when handling
>> waits and events.
>>
>> Moreover, by tracking only lock acquisition order, Lockdep cannot
>> properly handle read locks or cross-event scenarios — such as
>> wait_for_completion() and complete() — making it increasingly
>> inadequate as a general-purpose deadlock detection tool.
>>
>> SOLUTION
>> --------
>> Once again, waits and their associated events that are never
>> reachable can eventually lead to deadlocks. The new solution, DEPT,
>> focuses directly on waits and events. DEPT monitors waits and events,
>> and reports them when any become unreachable.
>>
>> DEPT provides:
>>
>> * Correct handling of read locks.
>> * Support for general waits and events.
>> * Continuous operation, even after multiple reports.
>> * Simple, intuitive annotation APIs.
>>
>> There are still false positives, and some are already being worked on
>> for suppression. Especially splitting the folio class into several
>> appropriate classes e.g. block device mapping class and regular file
>> mapping class, is currently under active development by me and Yeoreum
>> Yun.
>>
>> Anyway, these efforts will need to continue for a while, as we’ve seen
>> with lockdep over two decades. DEPT is tagged as EXPERIMENTAL in
>> Kconfig — meaning it’s not yet suitable for use as an automation tool.
>>
>> However, for those who are interested in using DEPT to analyze complex
>> synchronization patterns and extract dependency insights, DEPT would be
>> a great tool for the purpose.
>>
>> Thanks for your support and contributions to:
>>
>> Harry Yoo <harry.yoo@oracle.com>
>> Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
>> Yunseong Kim <ysk@kzalloc.com>
>> Yeoreum Yun <yeoreum.yun@arm.com>
>>
>> FAQ
>> ---
>> Q. Is this the first attempt to solve this problem?
>> A. No. The cross-release feature (commit b09be676e0ff2) attempted to
>> address it — as a Lockdep extension. It was merged, but quickly
>> reverted, because:
>>
>> While it uncovered valuable hidden issues, it also introduced false
>> positives. Since these false positives mask further real problems
>> with Lockdep — and developers strongly dislike them — the feature was
>> rolled back.
>>
>> Q. Why wasn’t DEPT built as a Lockdep extension?
>> A. Lockdep is the result of years of work by kernel developers — and is
>> now very stable. But I chose to build DEPT separately, because:
>>
>> While reusing BFS(Breadth First Search) and Lockdep’s hashing is
>> beneficial, the rest of the system must be rebuilt from scratch to
>> align with DEPT’s wait-event model — since Lockdep was originally
>> designed for tracking lock acquisition orders, not wait-event
>> dependencies.
>>
>> Q. Do you plan to replace Lockdep entirely?
>> A. Not at all — Lockdep still plays a vital role in validating correct
>> lock usage. While its dependency-checking logic should eventually be
>> superseded by DEPT, the rest of its functionality should stay.
>>
>> Q. Should we replace the dependency check immediately?
>> A. Absolutely not. Lockdep’s stability is the result of years of hard
>> work by kernel developers. Lockdep and DEPT should run side by side
>> until DEPT matures.
>>
>> Q. Stronger detection often leads to more false positives — which was a
>> major pain point when cross-release was added. Is DEPT designed to
>> handle this?
>> A. Yes. DEPT’s simple, generalized design enables flexible reporting —
>> so while false positives still need fixing, they’re far less
>> disruptive than they were under the Lockdep extension, cross-release.
>>
>> Q. Why not fix all false positives out-of-tree before merging?
>> A. Since the affected subsystems span the entire kernel, like Lockdep,
>> which has relied on annotations to avoid false positives over the
>> last two decades, DEPT too will require the annotation efforts.
>>
>> Performing annotation work within the mainline will help us add
>> annotations more appropriately and will also make DEPT a useful tool
>> for a wider range of users more quickly.
>>
>> CONFIG_DEPT is marked EXPERIMENTAL, so it’s opt-in. Some users are
>> already interested in using DEPT to analyze complex synchronization
>> patterns and extract dependency insights.
>>
>> Byungchul
>> ---
>> Changes from v17:
>>
>> 1. Rebase on the mainline as of 2025 Dec 5.
>> 2. Convert the documents' format from txt to rst. (feedbacked
>> by Jonathan Corbet and Bagas Sanjaya)
>> 3. Move the documents from 'Documentation/dependency' to
>> 'Documentation/dev-tools'. (feedbakced by Jonathan Corbet)
>> 4. Improve the documentation. (feedbacked by NeilBrown)
>> 5. Use a common function, enter_from_user_mode(), instead of
>> arch specific code, to notice context switch from user mode.
>> (feedbacked by Dave Hansen, Mark Rutland, and Mark Brown)
>> 6. Resolve the header dependency issue by using dept's internal
>> header, instead of relocating 'struct llist_{head,node}' to
>> another header. (feedbacked by Greg KH)
>> 7. Improve page(or folio) usage type APIs.
>> 8. Add rust helper for wait_for_completion(). (feedbacked by
>> Guangbo Cui, Boqun Feng, and Danilo Krummrich)
>> 9. Refine some commit messages.
>>
>> Changes from v16:
>>
>> 1. Rebase on v6.17.
>> 2. Fix a false positive from rcu (by Yunseong Kim)
>> 3. Introduce APIs to set page's usage, dept_set_page_usage() and
>> dept_reset_page_usage() to avoid false positives.
>> 4. Consider lock_page() as a potential wait unconditionally.
>> 5. Consider folio_lock_killable() as a potential wait
>> unconditionally.
>> 6. Add support for tracking PG_writeback waits and events.
>> 7. Fix two build errors due to the additional debug information
>> added by dept. (by Yunseong Kim)
>>
>> Changes from v15:
>>
>> 1. Fix typo and improve comments and commit messages (feedbacked
>> by ALOK TIWARI, Waiman Long, and kernel test robot).
>> 2. Do not stop dept on detection of cicular dependency of
>> recover event, allowing to keep reporting.
>> 3. Add SK hynix to copyright.
>> 4. Consider folio_lock() as a potential wait unconditionally.
>> 5. Fix Kconfig dependency bug (feedbacked by kernel test rebot).
>> 6. Do not suppress reports that involve classes even that have
>> already involved in other reports, allowing to keep
>> reporting.
>>
>> Changes from v14:
>>
>> 1. Rebase on the current latest, v6.15-rc6.
>> 2. Refactor dept code.
>> 3. With multi event sites for a single wait, even if an event
>> forms a circular dependency, the event can be recovered by
>> other event(or wake up) paths. Even though informing the
>> circular dependency is worthy but it should be suppressed
>> once informing it, if it doesn't lead an actual deadlock. So
>> introduce APIs to annotate the relationship between event
>> site and recover site, that are, event_site() and
>> dept_recover_event().
>> 4. wait_for_completion() worked with dept map embedded in struct
>> completion. However, it generates a few false positves since
>> all the waits using the instance of struct completion, share
>> the map and key. To avoid the false positves, make it not to
>> share the map and key but each wait_for_completion() caller
>> have its own key by default. Of course, external maps also
>> can be used if needed.
>> 5. Fix a bug about hardirq on/off tracing.
>> 6. Implement basic unit test for dept.
>> 7. Add more supports for dma fence synchronization.
>> 8. Add emergency stop of dept e.g. on panic().
>> 9. Fix false positives by mmu_notifier_invalidate_*().
>> 10. Fix recursive call bug by DEPT_WARN_*() and DEPT_STOP().
>> 11. Fix trivial bugs in DEPT_WARN_*() and DEPT_STOP().
>> 12. Fix a bug that a spin lock, dept_pool_spin, is used in
>> both contexts of irq disabled and enabled without irq
>> disabled.
>> 13. Suppress reports with classes, any of that already have
>> been reported, even though they have different chains but
>> being barely meaningful.
>> 14. Print stacktrace of the wait that an event is now waking up,
>> not only stacktrace of the event.
>> 15. Make dept aware of lockdep_cmp_fn() that is used to avoid
>> false positives in lockdep so that dept can also avoid them.
>> 16. Do do_event() only if there are no ecxts have been
>> delimited.
>> 17. Fix a bug that was not synchronized for stage_m in struct
>> dept_task, using a spin lock, dept_task()->stage_lock.
>> 18. Fix a bug that dept didn't handle the case that multiple
>> ttwus for a single waiter can be called at the same time
>> e.i. a race issue.
>> 19. Distinguish each kernel context from others, not only by
>> system call but also by user oriented fault so that dept can
>> work with more accuracy information about kernel context.
>> That helps to avoid a few false positives.
>> 20. Limit dept's working to x86_64 and arm64.
>>
>> Changes from v13:
>>
>> 1. Rebase on the current latest version, v6.9-rc7.
>> 2. Add 'dept' documentation describing dept APIs.
>>
>> Changes from v12:
>>
>> 1. Refine the whole document for dept.
>> 2. Add 'Interpret dept report' section in the document, using a
>> deadlock report obtained in practice. Hope this version of
>> document helps guys understand dept better.
>>
>> https://lore.kernel.org/lkml/6383cde5-cf4b-facf-6e07-1378a485657d@I-love.SAKURA.ne.jp/#t
>> https://lore.kernel.org/lkml/1674268856-31807-1-git-send-email-byungchul.park@lge.com/
>>
>> Changes from v11:
>>
>> 1. Add 'dept' documentation describing the concept of dept.
>> 2. Rewrite the commit messages of the following commits for
>> using weaker lockdep annotation, for better description.
>>
>> fs/jbd2: Use a weaker annotation in journal handling
>> cpu/hotplug: Use a weaker annotation in AP thread
>>
>> (feedbacked by Thomas Gleixner)
>>
>> Changes from v10:
>>
>> 1. Fix noinstr warning when building kernel source.
>> 2. dept has been reporting some false positives due to the folio
>> lock's unfairness. Reflect it and make dept work based on
>> dept annotaions instead of just wait and wake up primitives.
>> 3. Remove the support for PG_writeback while working on 2. I
>> will add the support later if needed.
>> 4. dept didn't print stacktrace for [S] if the participant of a
>> deadlock is not lock mechanism but general wait and event.
>> However, it made hard to interpret the report in that case.
>> So add support to print stacktrace of the requestor who asked
>> the event context to run - usually a waiter of the event does
>> it just before going to wait state.
>> 5. Give up tracking raw_local_irq_{disable,enable}() since it
>> totally messed up dept's irq tracking. So make it work in the
>> same way as lockdep does. I will consider it once any false
>> positives by those are observed again.
>> 6. Change the manual rwsem_acquire_read(->j_trans_commit_map)
>> annotation in fs/jbd2/transaction.c to the try version so
>> that it works as much as it exactly needs.
>> 7. Remove unnecessary 'inline' keyword in dept.c and add
>> '__maybe_unused' to a needed place.
>>
>> Changes from v9:
>>
>> 1. Fix a bug. SDT tracking didn't work well because of my big
>> mistake that I should've used waiter's map to indentify its
>> class but it had been working with waker's one. FYI,
>> PG_locked and PG_writeback weren't affected. They still
>> worked well. (reported by YoungJun)
>>
>> Changes from v8:
>>
>> 1. Fix build error by adding EXPORT_SYMBOL(PG_locked_map) and
>> EXPORT_SYMBOL(PG_writeback_map) for kernel module build -
>> appologize for that. (reported by kernel test robot)
>> 2. Fix build error by removing header file's circular dependency
>> that was caused by "atomic.h", "kernel.h" and "irqflags.h",
>> which I introduced - appolgize for that. (reported by kernel
>> test robot)
>>
>> Changes from v7:
>>
>> 1. Fix a bug that cannot track rwlock dependency properly,
>> introduced in v7. (reported by Boqun and lockdep selftest)
>> 2. Track wait/event of PG_{locked,writeback} more aggressively
>> assuming that when a bit of PG_{locked,writeback} is cleared
>> there might be waits on the bit. (reported by Linus, Hillf
>> and syzbot)
>> 3. Fix and clean bad style code e.i. unnecessarily introduced
>> a randome pattern and so on. (pointed out by Linux)
>> 4. Clean code for applying dept to wait_for_completion().
>>
>> Changes from v6:
>>
>> 1. Tie to task scheduler code to track sleep and try_to_wake_up()
>> assuming sleeps cause waits, try_to_wake_up()s would be the
>> events that those are waiting for, of course with proper dept
>> annotations, sdt_might_sleep_weak(), sdt_might_sleep_strong()
>> and so on. For these cases, class is classified at sleep
>> entrance rather than the synchronization initialization code.
>> Which would extremely reduce false alarms.
>> 2. Remove the dept associated instance in each page struct for
>> tracking dependencies by PG_locked and PG_writeback thanks to
>> the 1. work above.
>> 3. Introduce CONFIG_dept_AGGRESIVE_TIMEOUT_WAIT to suppress
>> reports that waits with timeout set are involved, for those
>> who don't like verbose reporting.
>> 4. Add a mechanism to refill the internal memory pools on
>> running out so that dept could keep working as long as free
>> memory is available in the system.
>> 5. Re-enable tracking hashed-waitqueue wait. That's going to no
>> longer generate false positives because class is classified
>> at sleep entrance rather than the waitqueue initailization.
>> 6. Refactor to make it easier to port onto each new version of
>> the kernel.
>> 7. Apply dept to dma fence.
>> 8. Do trivial optimizaitions.
>>
>> Changes from v5:
>>
>> 1. Use just pr_warn_once() rather than WARN_ONCE() on the lack
>> of internal resources because WARN_*() printing stacktrace is
>> too much for informing the lack. (feedback from Ted, Hyeonggon)
>> 2. Fix trivial bugs like missing initializing a struct before
>> using it.
>> 3. Assign a different class per task when handling onstack
>> variables for waitqueue or the like. Which makes dept
>> distinguish between onstack variables of different tasks so
>> as to prevent false positives. (reported by Hyeonggon)
>> 4. Make dept aware of even raw_local_irq_*() to prevent false
>> positives. (reported by Hyeonggon)
>> 5. Don't consider dependencies between the events that might be
>> triggered within __schedule() and the waits that requires
>> __schedule(), real ones. (reported by Hyeonggon)
>> 6. Unstage the staged wait that has prepare_to_wait_event()'ed
>> *and* yet to get to __schedule(), if we encounter __schedule()
>> in-between for another sleep, which is possible if e.g. a
>> mutex_lock() exists in 'condition' of ___wait_event().
>> 7. Turn on CONFIG_PROVE_LOCKING when CONFIG_DEPT is on, to rely
>> on the hardirq and softirq entrance tracing to make dept more
>> portable for now.
>>
>> Changes from v4:
>>
>> 1. Fix some bugs that produce false alarms.
>> 2. Distinguish each syscall context from another *for arm64*.
>> 3. Make it not warn it but just print it in case dept ring
>> buffer gets exhausted. (feedback from Hyeonggon)
>> 4. Explicitely describe "EXPERIMENTAL" and "dept might produce
>> false positive reports" in Kconfig. (feedback from Ted)
>>
>> Changes from v3:
>>
>> 1. dept shouldn't create dependencies between different depths
>> of a class that were indicated by *_lock_nested(). dept
>> normally doesn't but it does once another lock class comes
>> in. So fixed it. (feedback from Hyeonggon)
>> 2. dept considered a wait as a real wait once getting to
>> __schedule() even if it has been set to TASK_RUNNING by wake
>> up sources in advance. Fixed it so that dept doesn't consider
>> the case as a real wait. (feedback from Jan Kara)
>> 3. Stop tracking dependencies with a map once the event
>> associated with the map has been handled. dept will start to
>> work with the map again, on the next sleep.
>>
>> Changes from v2:
>>
>> 1. Disable dept on bit_wait_table[] in sched/wait_bit.c
>> reporting a lot of false positives, which is my fault.
>> Wait/event for bit_wait_table[] should've been tagged in a
>> higher layer for better work, which is a future work.
>> (feedback from Jan Kara)
>> 2. Disable dept on crypto_larval's completion to prevent a false
>> positive.
>>
>> Changes from v1:
>>
>> 1. Fix coding style and typo. (feedback from Steven)
>> 2. Distinguish each work context from another in workqueue.
>> 3. Skip checking lock acquisition with nest_lock, which is about
>> correct lock usage that should be checked by lockdep.
>>
>> Changes from RFC(v0):
>>
>> 1. Prevent adding a wait tag at prepare_to_wait() but __schedule().
>> (feedback from Linus and Matthew)
>> 2. Use try version at lockdep_acquire_cpus_lock() annotation.
>> 3. Distinguish each syscall context from another.
>>
>> Byungchul Park (41):
>> dept: implement DEPT(DEPendency Tracker)
>> dept: add single event dependency tracker APIs
>> dept: add lock dependency tracker APIs
>> dept: tie to lockdep and IRQ tracing
>> dept: add proc knobs to show stats and dependency graph
>> dept: distinguish each kernel context from another
>> dept: distinguish each work from another
>> dept: add a mechanism to refill the internal memory pools on running
>> out
>> dept: record the latest one out of consecutive waits of the same class
>> dept: apply sdt_might_sleep_{start,end}() to
>> wait_for_completion()/complete()
>> dept: apply sdt_might_sleep_{start,end}() to swait
>> dept: apply sdt_might_sleep_{start,end}() to waitqueue wait
>> dept: apply sdt_might_sleep_{start,end}() to hashed-waitqueue wait
>> dept: apply sdt_might_sleep_{start,end}() to dma fence
>> dept: track timeout waits separately with a new Kconfig
>> dept: apply timeout consideration to wait_for_completion()/complete()
>> dept: apply timeout consideration to swait
>> dept: apply timeout consideration to waitqueue wait
>> dept: apply timeout consideration to hashed-waitqueue wait
>> dept: apply timeout consideration to dma fence wait
>> dept: make dept able to work with an external wgen
>> dept: track PG_locked with dept
>> dept: print staged wait's stacktrace on report
>> locking/lockdep: prevent various lockdep assertions when
>> lockdep_off()'ed
>> dept: add documents for dept
>> cpu/hotplug: use a weaker annotation in AP thread
>> dept: assign dept map to mmu notifier invalidation synchronization
>> dept: assign unique dept_key to each distinct dma fence caller
>> dept: make dept aware of lockdep_set_lock_cmp_fn() annotation
>> dept: make dept stop from working on debug_locks_off()
>> dept: assign unique dept_key to each distinct wait_for_completion()
>> caller
>> completion, dept: introduce init_completion_dmap() API
>> dept: introduce a new type of dependency tracking between multi event
>> sites
>> dept: add module support for struct dept_event_site and
>> dept_event_site_dep
>> dept: introduce event_site() to disable event tracking if it's
>> recoverable
>> dept: implement a basic unit test for dept
>> dept: call dept_hardirqs_off() in local_irq_*() regardless of irq
>> state
>> dept: introduce APIs to set page usage and use subclasses_evt for the
>> usage
>> dept: track PG_writeback with dept
>> SUNRPC: relocate struct rcu_head to the first field of struct rpc_xprt
>> mm: percpu: increase PERCPU_DYNAMIC_SIZE_SHIFT on DEPT and large
>> PAGE_SIZE
>>
>> Yunseong Kim (1):
>> rcu/update: fix same dept key collision between various types of RCU
>>
>> Documentation/dev-tools/dept.rst | 778 ++++++
>> Documentation/dev-tools/dept_api.rst | 125 +
>> drivers/dma-buf/dma-fence.c | 23 +-
>> include/asm-generic/vmlinux.lds.h | 13 +-
>> include/linux/completion.h | 124 +-
>> include/linux/dept.h | 402 +++
>> include/linux/dept_ldt.h | 78 +
>> include/linux/dept_sdt.h | 68 +
>> include/linux/dept_unit_test.h | 67 +
>> include/linux/dma-fence.h | 74 +-
>> include/linux/hardirq.h | 3 +
>> include/linux/irq-entry-common.h | 4 +
>> include/linux/irqflags.h | 21 +-
>> include/linux/local_lock_internal.h | 1 +
>> include/linux/lockdep.h | 105 +-
>> include/linux/lockdep_types.h | 3 +
>> include/linux/mm_types.h | 4 +
>> include/linux/mmu_notifier.h | 26 +
>> include/linux/module.h | 5 +
>> include/linux/mutex.h | 1 +
>> include/linux/page-flags.h | 217 +-
>> include/linux/pagemap.h | 37 +-
>> include/linux/percpu-rwsem.h | 2 +-
>> include/linux/percpu.h | 4 +
>> include/linux/rcupdate_wait.h | 13 +-
>> include/linux/rtmutex.h | 1 +
>> include/linux/rwlock_types.h | 1 +
>> include/linux/rwsem.h | 1 +
>> include/linux/sched.h | 118 +
>> include/linux/seqlock.h | 2 +-
>> include/linux/spinlock_types_raw.h | 3 +
>> include/linux/srcu.h | 2 +-
>> include/linux/sunrpc/xprt.h | 9 +-
>> include/linux/swait.h | 3 +
>> include/linux/wait.h | 3 +
>> include/linux/wait_bit.h | 3 +
>> init/init_task.c | 2 +
>> init/main.c | 2 +
>> kernel/Makefile | 1 +
>> kernel/cpu.c | 2 +-
>> kernel/dependency/Makefile | 5 +
>> kernel/dependency/dept.c | 3499 ++++++++++++++++++++++++++
>> kernel/dependency/dept_hash.h | 10 +
>> kernel/dependency/dept_internal.h | 314 +++
>> kernel/dependency/dept_object.h | 13 +
>> kernel/dependency/dept_proc.c | 94 +
>> kernel/dependency/dept_unit_test.c | 173 ++
>> kernel/exit.c | 1 +
>> kernel/fork.c | 2 +
>> kernel/locking/lockdep.c | 33 +
>> kernel/module/main.c | 19 +
>> kernel/rcu/rcu.h | 1 +
>> kernel/rcu/update.c | 5 +-
>> kernel/sched/completion.c | 62 +-
>> kernel/sched/core.c | 9 +
>> kernel/workqueue.c | 3 +
>> lib/Kconfig.debug | 48 +
>> lib/debug_locks.c | 2 +
>> lib/locking-selftest.c | 2 +
>> mm/filemap.c | 38 +
>> mm/mm_init.c | 3 +
>> mm/mmu_notifier.c | 31 +-
>> rust/helpers/completion.c | 5 +
>> 63 files changed, 6602 insertions(+), 121 deletions(-)
>> create mode 100644 Documentation/dev-tools/dept.rst
>> create mode 100644 Documentation/dev-tools/dept_api.rst
>> create mode 100644 include/linux/dept.h
>> create mode 100644 include/linux/dept_ldt.h
>> create mode 100644 include/linux/dept_sdt.h
>> create mode 100644 include/linux/dept_unit_test.h
>> create mode 100644 kernel/dependency/Makefile
>> create mode 100644 kernel/dependency/dept.c
>> create mode 100644 kernel/dependency/dept_hash.h
>> create mode 100644 kernel/dependency/dept_internal.h
>> create mode 100644 kernel/dependency/dept_object.h
>> create mode 100644 kernel/dependency/dept_proc.c
>> create mode 100644 kernel/dependency/dept_unit_test.c
>>
>>
>> base-commit: 43dfc13ca972988e620a6edb72956981b75ab6b0
>> --
>> 2.17.1
>
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox