public inbox for linux-modules@vger.kernel.org
 help / color / mirror / Atom feed
* Re: modprobe returns 0 upon -EEXIST from insmod
From: Lucas De Marchi @ 2025-10-09 14:13 UTC (permalink / raw)
  To: Petr Pavlu; +Cc: Phil Sutter, Christophe Leroy, linux-modules, Yi Chen
In-Reply-To: <ce7f293c-d9f9-4137-bcad-8cc492d34773@suse.com>

On Thu, Oct 09, 2025 at 03:47:42PM +0200, Petr Pavlu wrote:
>On 10/8/25 8:41 AM, Lucas De Marchi wrote:
>> On Tue, Aug 19, 2025 at 09:17:50AM -0500, Lucas De Marchi wrote:
>>> On Tue, Aug 19, 2025 at 10:52:16AM +0200, Petr Pavlu wrote:
>>>> On 8/18/25 11:34 AM, Phil Sutter wrote:
>>>>> On Sun, Aug 17, 2025 at 05:54:27PM +0200, Christophe Leroy wrote:
>>>>>> Le 17/08/2025 à 01:33, Phil Sutter a écrit :
>>>>>>> [Vous ne recevez pas souvent de courriers de phil@nwl.cc. D?couvrez pourquoi ceci est important ? https://aka.ms/LearnAboutSenderIdentification ]
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I admittedly didn't fully analyze the cause, but on my system a call to:
>>>>>>>
>>>>>>> # insmod /lib/module/$(uname -r)/kernel/net/netfilter/nf_conntrack_ftp.ko
>>>>>>>
>>>>>>> fails with -EEXIST (due to a previous call to 'nfct add helper ftp inet
>>>>>>> tcp'). A call to:
>>>>>>>
>>>>>>> # modprobe nf_conntrack_ftp
>>>>>>>
>>>>>>> though returns 0 even though module loading fails. Is there a bug in
>>>>>>> modprobe error status handling?
>>>>>>>
>>>>>>
>>>>>> Read the man page : https://linux.die.net/man/8/modprobe
>>>>>>
>>>>>> In the man page I see:
>>>>>>
>>>>>>            Normally, modprobe will succeed (and do nothing) if told to
>>>>>> insert a module which is already present or to remove a module which
>>>>>> isn't present.
>>>>>
>>>>> This is not a case of already inserted module, it is not loaded before
>>>>> the call to modprobe. It is the module_init callback
>>>>> nf_conntrack_ftp_init() which returns -EEXIST it received from
>>>>> nf_conntrack_helpers_register().
>>>
>>> is this a real failure condition or something benign like "if it's
>>> already registered, there's nothing to do"?
>>>
>>>>>
>>>>> Can't user space distinguish the two causes of -EEXIST? Or in other
>>>>> words, is use of -EEXIST in module_init callbacks problematic?
>>>>
>>>> Unfortunately, error return codes from (f)init_module cannot be reliably
>>>> depended upon. For instance, cpufreq drivers have similar behavior of
>>>> returning -EEXIST when another cpufreq driver is already registered.
>>>> Returning this code unexpectedly can then confuse kmod, as it interprets
>>>> -EEXIST to mean "the module is already loaded" [1].
>>>
>>> well, it's not that it can't be relied on. There's 1 exit code that is
>>> treated specially, EEXISTS, because that error is used by the module
>>> loading part, before the module_init call, to signify the module is
>>> already loaded.
>>>
>>>>
>>>> I have thought about this problem before. We might fix the main
>>>> problematic occurrences, but we can't really audit all the code that
>>>> module init functions can invoke. I then wonder if it would make sense
>>>> for the module loader to warn about any -EEXIST returned by a module's
>>>> init function and translate it to -EBUSY.
>>>
>>> If it's a failure condition then yes, -EBUSY looks appropriate.
>>
>> something like this:
>>
>>
>> diff --git a/kernel/module/main.c b/kernel/module/main.c
>> index c66b261849362..e5fb1a4ef3441 100644
>> --- a/kernel/module/main.c
>> +++ b/kernel/module/main.c
>> @@ -3038,6 +3038,11 @@ static noinline int do_init_module(struct module *mod)
>>      if (mod->init != NULL)
>>          ret = do_one_initcall(mod->init);
>>      if (ret < 0) {
>> +        if (ret == -EEXIST) {
>> +            pr_warn("%s: '%s'->init suspiciously returned %d: Overriding with %d\n",
>> +                __func__, mod->name, -EEXIST, -EBUSY);
>> +            ret = -EBUSY;
>> +        }
>>          goto fail_free_freeinit;
>>      }
>>      if (ret > 0) {
>
>Yes, that's what I had in mind. Could you please send this as a proper
>patch to the list?
>
>I only think we should include a hint to explain why this is a problem
>and simplify the message somewhat, something like:
>
>pr_warn("%s: init suspiciously returned -EEXIST (reserved for loaded modules), overriding with -EBUSY\n", mod->name);
>
>I realize you based the message on the later warning about the init
>function returning a >0 value but I think we should rather update that
>message as well. It should follow the usual style of
>"<mod-name>: <error-description>". I suggest simplifying it to:
>
>pr_warn("%s: init suspiciously returned %d, it should follow 0/-E convention\n", mod->name, ret);

will do and actually run some tests to make sure it's not only
build-tested.

Thanks,
Lucas De Marchi

>
>-- 
>Thanks,
>Petr

^ permalink raw reply

* Re: modprobe returns 0 upon -EEXIST from insmod
From: Petr Pavlu @ 2025-10-09 13:47 UTC (permalink / raw)
  To: Lucas De Marchi; +Cc: Phil Sutter, Christophe Leroy, linux-modules, Yi Chen
In-Reply-To: <hupl3hqym5ru3fr27s3elg6vti4fjtphdwvvyxmuvekc2w5mna@kilmmcgobw6x>

On 10/8/25 8:41 AM, Lucas De Marchi wrote:
> On Tue, Aug 19, 2025 at 09:17:50AM -0500, Lucas De Marchi wrote:
>> On Tue, Aug 19, 2025 at 10:52:16AM +0200, Petr Pavlu wrote:
>>> On 8/18/25 11:34 AM, Phil Sutter wrote:
>>>> On Sun, Aug 17, 2025 at 05:54:27PM +0200, Christophe Leroy wrote:
>>>>> Le 17/08/2025 à 01:33, Phil Sutter a écrit :
>>>>>> [Vous ne recevez pas souvent de courriers de phil@nwl.cc. D?couvrez pourquoi ceci est important ? https://aka.ms/LearnAboutSenderIdentification ]
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I admittedly didn't fully analyze the cause, but on my system a call to:
>>>>>>
>>>>>> # insmod /lib/module/$(uname -r)/kernel/net/netfilter/nf_conntrack_ftp.ko
>>>>>>
>>>>>> fails with -EEXIST (due to a previous call to 'nfct add helper ftp inet
>>>>>> tcp'). A call to:
>>>>>>
>>>>>> # modprobe nf_conntrack_ftp
>>>>>>
>>>>>> though returns 0 even though module loading fails. Is there a bug in
>>>>>> modprobe error status handling?
>>>>>>
>>>>>
>>>>> Read the man page : https://linux.die.net/man/8/modprobe
>>>>>
>>>>> In the man page I see:
>>>>>
>>>>>            Normally, modprobe will succeed (and do nothing) if told to
>>>>> insert a module which is already present or to remove a module which
>>>>> isn't present.
>>>>
>>>> This is not a case of already inserted module, it is not loaded before
>>>> the call to modprobe. It is the module_init callback
>>>> nf_conntrack_ftp_init() which returns -EEXIST it received from
>>>> nf_conntrack_helpers_register().
>>
>> is this a real failure condition or something benign like "if it's
>> already registered, there's nothing to do"?
>>
>>>>
>>>> Can't user space distinguish the two causes of -EEXIST? Or in other
>>>> words, is use of -EEXIST in module_init callbacks problematic?
>>>
>>> Unfortunately, error return codes from (f)init_module cannot be reliably
>>> depended upon. For instance, cpufreq drivers have similar behavior of
>>> returning -EEXIST when another cpufreq driver is already registered.
>>> Returning this code unexpectedly can then confuse kmod, as it interprets
>>> -EEXIST to mean "the module is already loaded" [1].
>>
>> well, it's not that it can't be relied on. There's 1 exit code that is
>> treated specially, EEXISTS, because that error is used by the module
>> loading part, before the module_init call, to signify the module is
>> already loaded.
>>
>>>
>>> I have thought about this problem before. We might fix the main
>>> problematic occurrences, but we can't really audit all the code that
>>> module init functions can invoke. I then wonder if it would make sense
>>> for the module loader to warn about any -EEXIST returned by a module's
>>> init function and translate it to -EBUSY.
>>
>> If it's a failure condition then yes, -EBUSY looks appropriate.
> 
> something like this:
> 
> 
> diff --git a/kernel/module/main.c b/kernel/module/main.c
> index c66b261849362..e5fb1a4ef3441 100644
> --- a/kernel/module/main.c
> +++ b/kernel/module/main.c
> @@ -3038,6 +3038,11 @@ static noinline int do_init_module(struct module *mod)
>      if (mod->init != NULL)
>          ret = do_one_initcall(mod->init);
>      if (ret < 0) {
> +        if (ret == -EEXIST) {
> +            pr_warn("%s: '%s'->init suspiciously returned %d: Overriding with %d\n",
> +                __func__, mod->name, -EEXIST, -EBUSY);
> +            ret = -EBUSY;
> +        }
>          goto fail_free_freeinit;
>      }
>      if (ret > 0) {

Yes, that's what I had in mind. Could you please send this as a proper
patch to the list?

I only think we should include a hint to explain why this is a problem
and simplify the message somewhat, something like:

pr_warn("%s: init suspiciously returned -EEXIST (reserved for loaded modules), overriding with -EBUSY\n", mod->name);

I realize you based the message on the later warning about the init
function returning a >0 value but I think we should rather update that
message as well. It should follow the usual style of
"<mod-name>: <error-description>". I suggest simplifying it to:

pr_warn("%s: init suspiciously returned %d, it should follow 0/-E convention\n", mod->name, ret);

-- 
Thanks,
Petr

^ permalink raw reply

* Re: [PATCH 10/10] module loader: enforce symbol import protection
From: Petr Pavlu @ 2025-10-08 15:35 UTC (permalink / raw)
  To: Siddharth Nayyar
  Cc: Nathan Chancellor, Luis Chamberlain, Sami Tolvanen,
	Nicolas Schier, Arnd Bergmann, linux-kbuild, linux-arch,
	linux-modules, linux-kernel
In-Reply-To: <20250829105418.3053274-11-sidnayyar@google.com>

On 8/29/25 12:54 PM, Siddharth Nayyar wrote:
> The module loader will reject unsigned modules from loading if such a
> module attempts to import a symbol which has the import protection bit
> set in the kflagstab entry for the symbol.
> 
> Signed-off-by: Siddharth Nayyar <sidnayyar@google.com>
> ---
> [...]
> diff --git a/kernel/module/main.c b/kernel/module/main.c
> index 4437c2a451ea..ece074a6ba7b 100644
> --- a/kernel/module/main.c
> +++ b/kernel/module/main.c
> @@ -380,6 +380,7 @@ static bool find_exported_symbol_in_section(const struct symsearch *syms,
>  	fsa->crc = symversion(syms->crcs, sym - syms->start);
>  	fsa->sym = sym;
>  	fsa->license = (sym_flags & KSYM_FLAG_GPL_ONLY) ? GPL_ONLY : NOT_GPL_ONLY;
> +	fsa->is_protected = sym_flags & KSYM_FLAG_PROTECTED;
>  
>  	return true;
>  }
> @@ -1273,6 +1274,11 @@ static const struct kernel_symbol *resolve_symbol(struct module *mod,
>  		goto getname;
>  	}
>  
> +	if (fsa.is_protected && !mod->sig_ok) {
> +		fsa.sym = ERR_PTR(-EACCES);
> +		goto getname;
> +	}
> +
>  getname:
>  	/* We must make copy under the lock if we failed to get ref. */
>  	strscpy(ownername, module_name(fsa.owner), MODULE_NAME_LEN);

The is_protected check should be moved before the ref_module() call.
Adding a reference to another module should be always the last step,
after all symbol checks have been performed.

> @@ -1550,8 +1556,12 @@ static int simplify_symbols(struct module *mod, const struct load_info *info)
>  				break;
>  
>  			ret = PTR_ERR(ksym) ?: -ENOENT;
> -			pr_warn("%s: Unknown symbol %s (err %d)\n",
> -				mod->name, name, ret);
> +			if (ret == -EACCES)
> +				pr_warn("%s: Protected symbol %s (err %d)\n",
> +					mod->name, name, ret);
> +			else
> +				pr_warn("%s: Unknown symbol %s (err %d)\n",
> +					mod->name, name, ret);
>  			break;
>  
>  		default:

I suggest moving the error message about the symbol being protected down
into resolve_symbol(), at the point where this issue is detected. This
approach is generally used for other checks, such as the CRC or
namespace check. Additionally, I think it would make sense to change the
current "Unknown symbol" warning here to "Unresolved symbol" to be more
accurate.

-- 
Thanks,
Petr

^ permalink raw reply

* Re: [PATCH 09/10] modpost: add symbol import protection flag to kflagstab
From: Petr Pavlu @ 2025-10-08 13:35 UTC (permalink / raw)
  To: Siddharth Nayyar
  Cc: Nathan Chancellor, Luis Chamberlain, Sami Tolvanen,
	Nicolas Schier, Arnd Bergmann, linux-kbuild, linux-arch,
	linux-modules, linux-kernel
In-Reply-To: <20250829105418.3053274-10-sidnayyar@google.com>

On 8/29/25 12:54 PM, Siddharth Nayyar wrote:
> When the unused exports whitelist is provided, the symbol protection bit
> is set for symbols not present in the unused exports whitelist.
> 
> The flag will be used in the following commit to prevent unsigned
> modules from the using symbols other than those explicitly declared by
> the such modules ahead of time.
> 
> Signed-off-by: Siddharth Nayyar <sidnayyar@google.com>
> ---
> [...]
> diff --git a/include/linux/module_symbol.h b/include/linux/module_symbol.h
> index 574609aced99..96fe3f4d7424 100644
> --- a/include/linux/module_symbol.h
> +++ b/include/linux/module_symbol.h
> @@ -3,8 +3,9 @@
>  #define _LINUX_MODULE_SYMBOL_H
>  
>  /* Kernel symbol flags bitset. */
> -enum ksym_flags {
> +enum symbol_flags {
>  	KSYM_FLAG_GPL_ONLY	= 1 << 0,
> +	KSYM_FLAG_PROTECTED	= 1 << 1,
>  };
>  

Nit: The ksym_flags enum is added in patch #1. If you prefer a different
name, you can change it in that patch.

-- 
Thanks,
Petr

^ permalink raw reply

* Re: [PATCH 08/10] remove references to *_gpl sections in documentation
From: Petr Pavlu @ 2025-10-08 13:24 UTC (permalink / raw)
  To: Siddharth Nayyar
  Cc: Nathan Chancellor, Luis Chamberlain, Sami Tolvanen,
	Nicolas Schier, Arnd Bergmann, linux-kbuild, linux-arch,
	linux-modules, linux-kernel
In-Reply-To: <20250829105418.3053274-9-sidnayyar@google.com>

On 8/29/25 12:54 PM, Siddharth Nayyar wrote:
> Signed-off-by: Siddharth Nayyar <sidnayyar@google.com>
> ---
>  Documentation/kbuild/modules.rst | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/kbuild/modules.rst b/Documentation/kbuild/modules.rst
> index d0703605bfa4..f2022fa2342f 100644
> --- a/Documentation/kbuild/modules.rst
> +++ b/Documentation/kbuild/modules.rst
> @@ -426,11 +426,11 @@ Symbols From the Kernel (vmlinux + modules)
>  Version Information Formats
>  ---------------------------
>  
> -	Exported symbols have information stored in __ksymtab or __ksymtab_gpl
> -	sections. Symbol names and namespaces are stored in __ksymtab_strings,
> +	Exported symbols have information stored in the __ksymtab section.
> +	Symbol names and namespaces are stored in __ksymtab_strings section,
>  	using a format similar to the string table used for ELF. If
>  	CONFIG_MODVERSIONS is enabled, the CRCs corresponding to exported
> -	symbols will be added to the __kcrctab or __kcrctab_gpl.
> +	symbols will be added to the __kcrctab section.
>  
>  	If CONFIG_BASIC_MODVERSIONS is enabled (default with
>  	CONFIG_MODVERSIONS), imported symbols will have their symbol name and

Nit: I realize this part of the document primarily discusses sections
related to modversions, but I think it would be good to briefly mention
also the existence of the __kflagstab section. The first sentence could
say:

Exported symbols have information stored in the __ksymtab and
__kflagstab sections.

-- 
Thanks,
Petr

^ permalink raw reply

* Re: [PATCH 06/10] module loader: remove references of *_gpl sections
From: Petr Pavlu @ 2025-10-08 13:22 UTC (permalink / raw)
  To: Siddharth Nayyar
  Cc: Nathan Chancellor, Luis Chamberlain, Sami Tolvanen,
	Nicolas Schier, Arnd Bergmann, linux-kbuild, linux-arch,
	linux-modules, linux-kernel
In-Reply-To: <20250829105418.3053274-7-sidnayyar@google.com>

On 8/29/25 12:54 PM, Siddharth Nayyar wrote:
> The *_gpl section are not being used populated by modpost anymore. Hence
> the module loader doesn't need to find and process these sections in
> modules.
> 
> Signed-off-by: Siddharth Nayyar <sidnayyar@google.com>
> ---
> [...]
> @@ -2601,10 +2590,6 @@ static int find_module_sections(struct module *mod, struct load_info *info)
>  	mod->syms = section_objs(info, "__ksymtab",
>  				 sizeof(*mod->syms), &mod->num_syms);
>  	mod->crcs = section_addr(info, "__kcrctab");
> -	mod->gpl_syms = section_objs(info, "__ksymtab_gpl",
> -				     sizeof(*mod->gpl_syms),
> -				     &mod->num_gpl_syms);
> -	mod->gpl_crcs = section_addr(info, "__kcrctab_gpl");
>  	mod->flagstab = section_addr(info, "__kflagstab");
>  

I suggest adding a check that the loaded module doesn't contain
a __ksymtab_gpl or __kcrctab_gpl section, similarly how the function
later checks if the old __obsparm section isn't present. Something like:

	if (section_addr(info, "__ksymtab_gpl"))
		pr_warn("%s: ignoring obsolete section __ksymtab_gpl\n", mod->name);
	if (section_addr(info, "__kcrctab_gpl"))
		pr_warn("%s: ignoring obsolete section __kcrctab_gpl\n", mod->name);

-- 
Thanks,
Petr

^ permalink raw reply

* Re: [PATCH 04/10] module loader: use kflagstab instead of *_gpl sections
From: Petr Pavlu @ 2025-10-08 13:19 UTC (permalink / raw)
  To: Siddharth Nayyar
  Cc: Nathan Chancellor, Luis Chamberlain, Sami Tolvanen,
	Nicolas Schier, Arnd Bergmann, linux-kbuild, linux-arch,
	linux-modules, linux-kernel
In-Reply-To: <20250829105418.3053274-5-sidnayyar@google.com>

On 8/29/25 12:54 PM, Siddharth Nayyar wrote:
> Read __kflagstab section for vmlinux and modules to determine whether
> kernel symbols are GPL only.
> 
> Signed-off-by: Siddharth Nayyar <sidnayyar@google.com>
> ---
> [...]
> @@ -2607,6 +2605,7 @@ static int find_module_sections(struct module *mod, struct load_info *info)
>  				     sizeof(*mod->gpl_syms),
>  				     &mod->num_gpl_syms);
>  	mod->gpl_crcs = section_addr(info, "__kcrctab_gpl");
> +	mod->flagstab = section_addr(info, "__kflagstab");
>  
>  #ifdef CONFIG_CONSTRUCTORS
>  	mod->ctors = section_objs(info, ".ctors",

The module loader should always at least get through the signature and
blacklist checks without crashing due to a corrupted ELF file. After
that point, the module content is to be trusted, but we try to error out
for most issues that would cause problems later on.

For __kflagstab, I believe it would be useful to check that the section
is present to prevent the code from potentially crashing due to a NULL
dereference deep in find_exported_symbol_in_section(). You can rename
check_export_symbol_versions() to check_export_symbol_sections() and add
the following:

	if (mod->num_syms && !mod->flagstab) {
		pr_err("%s: no flags for exported symbols\n", mod->name);
		return -ENOEXEC;
	}

-- 
Thanks,
Petr

^ permalink raw reply

* Re: [PATCH 3/3] module: Add compile-time check for embedded NUL characters
From: Petr Pavlu @ 2025-10-08  9:55 UTC (permalink / raw)
  To: Kees Cook
  Cc: Luis Chamberlain, Rusty Russell, Daniel Gomez, Sami Tolvanen,
	linux-modules, Malcolm Priestley, Mauro Carvalho Chehab,
	Hans Verkuil, Uwe Kleine-König, linux-kernel, linux-media,
	linux-hardening
In-Reply-To: <20251008035938.838263-3-kees@kernel.org>

On 10/8/25 5:59 AM, Kees Cook wrote:
> Long ago, the kernel module license checks were bypassed by embedding a
> NUL character in the MODULE_LICENSE() string[1]. By using a string like
> "GPL\0proprietary text", the kernel would only read "GPL" due to C string
> termination at the NUL byte, allowing proprietary modules to avoid kernel
> tainting and access GPL-only symbols.
> 
> The MODULE_INFO() macro stores these strings in the .modinfo ELF
> section, and get_next_modinfo() uses strcmp()-family functions
> which stop at the first NUL. This split the embedded string into two
> separate .modinfo entries, with only the first part being processed by
> license_is_gpl_compatible().
> 
> Add a compile-time check using _Static_assert that compares the full
> string length (sizeof - 1) against __builtin_strlen(), which stops at
> the first NUL. If they differ, compilation fails with a clear error
> message.
> 
> While this check can still be circumvented by modifying the ELF binary
> post-compilation, it prevents accidental embedded NULs and forces
> intentional abuse to require deliberate binary manipulation rather than
> simple source-level tricks.
> 
> Build tested with test modules containing both valid and invalid license
> strings. The check correctly rejects:
> 
>     MODULE_LICENSE("GPL\0proprietary")
> 
> while accepting normal declarations:
> 
>     MODULE_LICENSE("GPL")
> 
> Link: https://lwn.net/Articles/82305/ [1]
> Suggested-by: Rusty Russell <rusty@rustcorp.com.au>
> Signed-off-by: Kees Cook <kees@kernel.org>
> ---
> Cc: Luis Chamberlain <mcgrof@kernel.org>
> Cc: Petr Pavlu <petr.pavlu@suse.com>
> Cc: Daniel Gomez <da.gomez@kernel.org>
> Cc: Sami Tolvanen <samitolvanen@google.com>
> Cc: <linux-modules@vger.kernel.org>
> ---
>  include/linux/moduleparam.h | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/include/linux/moduleparam.h b/include/linux/moduleparam.h
> index 6907aedc4f74..160f1678fafa 100644
> --- a/include/linux/moduleparam.h
> +++ b/include/linux/moduleparam.h
> @@ -26,6 +26,9 @@
>  
>  /* Generic info of form tag = "info" */
>  #define MODULE_INFO(tag, info)					  \
> +	_Static_assert(						  \
> +		sizeof(info) - 1 == __builtin_strlen(info),	  \
> +		"MODULE_INFO(" #tag ", ...) contains embedded NUL byte"); \
>  	static const char __UNIQUE_ID(modinfo)[]			  \
>  		__used __section(".modinfo") __aligned(1)		  \
>  		= __MODULE_INFO_PREFIX __stringify(tag) "=" info

Nit: I think it is better to use static_assert() over _Static_assert()
for consistency. Note also that C23 [1, 2] introduces static_assert()
with the message being optional, which essentially matches the
static_assert() macro in include/linux/build_bug.h, and deprecates
_Static_assert().

[1] https://en.cppreference.com/w/c/language/_Static_assert.html
[2] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3220.pdf (C23 similar draft)

-- 
Thanks,
Petr

^ permalink raw reply

* Re: modprobe returns 0 upon -EEXIST from insmod
From: Lucas De Marchi @ 2025-10-08  6:41 UTC (permalink / raw)
  To: Petr Pavlu; +Cc: Phil Sutter, Christophe Leroy, linux-modules, Yi Chen
In-Reply-To: <i4ayzta5zgltyubg6bfr4mwqhl6goyh73lyc7j7m3vngvpooi3@boorlngxpi52>

On Tue, Aug 19, 2025 at 09:17:50AM -0500, Lucas De Marchi wrote:
>On Tue, Aug 19, 2025 at 10:52:16AM +0200, Petr Pavlu wrote:
>>On 8/18/25 11:34 AM, Phil Sutter wrote:
>>>On Sun, Aug 17, 2025 at 05:54:27PM +0200, Christophe Leroy wrote:
>>>>Le 17/08/2025 à 01:33, Phil Sutter a écrit :
>>>>>[Vous ne recevez pas souvent de courriers de phil@nwl.cc. D?couvrez pourquoi ceci est important ? https://aka.ms/LearnAboutSenderIdentification ]
>>>>>
>>>>>Hi,
>>>>>
>>>>>I admittedly didn't fully analyze the cause, but on my system a call to:
>>>>>
>>>>># insmod /lib/module/$(uname -r)/kernel/net/netfilter/nf_conntrack_ftp.ko
>>>>>
>>>>>fails with -EEXIST (due to a previous call to 'nfct add helper ftp inet
>>>>>tcp'). A call to:
>>>>>
>>>>># modprobe nf_conntrack_ftp
>>>>>
>>>>>though returns 0 even though module loading fails. Is there a bug in
>>>>>modprobe error status handling?
>>>>>
>>>>
>>>>Read the man page : https://linux.die.net/man/8/modprobe
>>>>
>>>>In the man page I see:
>>>>
>>>>            Normally, modprobe will succeed (and do nothing) if told to
>>>>insert a module which is already present or to remove a module which
>>>>isn't present.
>>>
>>>This is not a case of already inserted module, it is not loaded before
>>>the call to modprobe. It is the module_init callback
>>>nf_conntrack_ftp_init() which returns -EEXIST it received from
>>>nf_conntrack_helpers_register().
>
>is this a real failure condition or something benign like "if it's
>already registered, there's nothing to do"?
>
>>>
>>>Can't user space distinguish the two causes of -EEXIST? Or in other
>>>words, is use of -EEXIST in module_init callbacks problematic?
>>
>>Unfortunately, error return codes from (f)init_module cannot be reliably
>>depended upon. For instance, cpufreq drivers have similar behavior of
>>returning -EEXIST when another cpufreq driver is already registered.
>>Returning this code unexpectedly can then confuse kmod, as it interprets
>>-EEXIST to mean "the module is already loaded" [1].
>
>well, it's not that it can't be relied on. There's 1 exit code that is
>treated specially, EEXISTS, because that error is used by the module
>loading part, before the module_init call, to signify the module is
>already loaded.
>
>>
>>I have thought about this problem before. We might fix the main
>>problematic occurrences, but we can't really audit all the code that
>>module init functions can invoke. I then wonder if it would make sense
>>for the module loader to warn about any -EEXIST returned by a module's
>>init function and translate it to -EBUSY.
>
>If it's a failure condition then yes, -EBUSY looks appropriate.

something like this:


diff --git a/kernel/module/main.c b/kernel/module/main.c
index c66b261849362..e5fb1a4ef3441 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -3038,6 +3038,11 @@ static noinline int do_init_module(struct module *mod)
  	if (mod->init != NULL)
  		ret = do_one_initcall(mod->init);
  	if (ret < 0) {
+		if (ret == -EEXIST) {
+			pr_warn("%s: '%s'->init suspiciously returned %d: Overriding with %d\n",
+				__func__, mod->name, -EEXIST, -EBUSY);
+			ret = -EBUSY;
+		}
  		goto fail_free_freeinit;
  	}
  	if (ret > 0) {

Lucas De Marchi

>
>Lucas De Marchi
>
>>
>>Ensuring the reliability of the 0 and -EEXIST return codes from
>>(f)init_module should help user space.
>>
>>[1] https://github.com/kmod-project/kmod/blob/695fd084a727cf76f51b129b67d5a4be1d6db32e/libkmod/libkmod-module.c#L1087
>>
>>-- Petr

^ permalink raw reply related

* Re: [PATCH 0/3] module: Add compile-time check for embedded NUL characters
From: Hans Verkuil @ 2025-10-08  6:27 UTC (permalink / raw)
  To: Kees Cook, Luis Chamberlain
  Cc: Malcolm Priestley, Mauro Carvalho Chehab, Uwe Kleine-König,
	Rusty Russell, Petr Pavlu, Daniel Gomez, Sami Tolvanen,
	linux-kernel, linux-media, linux-modules, linux-hardening
In-Reply-To: <20251008033844.work.801-kees@kernel.org>

On 08/10/2025 05:59, Kees Cook wrote:
> Hi,
> 
> A long time ago we had an issue with embedded NUL bytes in MODULE_INFO
> strings[1]. While this stands out pretty strongly when you look at the
> code, and we can't do anything about a binary module that just plain lies,
> we never actually implemented the trivial compile-time check needed to
> detect it.
> 
> Add this check (and fix 2 instances of needless trailing semicolons that
> this change exposed).
> 
> Note that these patches were produced as part of another LLM exercise.
> This time I wanted to try "what happens if I ask an LLM to go read
> a specific LWN article and write a patch based on a discussion?" It
> pretty effortlessly chose and implemented a suggested solution, tested
> the change, and fixed new build warnings in the process.
> 
> Since this was a relatively short session, here's an overview of the
> prompts involved as I guided it through a clean change and tried to see
> how it would reason about static_assert vs _Static_assert. (It wanted
> to use what was most common, not what was the current style -- we may
> want to update the comment above the static_assert macro to suggest
> using _Static_assert directly these days...)
> 
>   I want to fix a weakness in the module info strings. Read about it
>   here: https://lwn.net/Articles/82305/
> 
>   Since it's only "info" that we need to check, can you reduce the checks
>   to just that instead of all the other stuff?
> 
>   I think the change to the comment is redundent, and that should be
>   in a commit log instead. Let's just keep the change to the static assert.
> 
>   Is "static_assert" the idiomatic way to use a static assert in this
>   code base? I've seen _Static_assert used sometimes.
> 
>   What's the difference between the two?
> 
>   Does Linux use C11 by default now?
> 
>   Then let's not use the wrapper any more.
> 
>   Do an "allmodconfig all -s" build to verify this works for all modules
>   in the kernel.
> 
> 
> Thanks!
> 
> -Kees
> 
> [1] https://lwn.net/Articles/82305/
> 
> Kees Cook (3):
>   media: dvb-usb-v2: lmedm04: Fix firmware macro definitions
>   media: radio: si470x: Fix DRIVER_AUTHOR macro definition
>   module: Add compile-time check for embedded NUL characters

I reviewed the two media patches. Feel free to take this series.
If you prefer that I take the two media patches, then let me know
but it makes more sense in this case that you take all three.

Regards,

	Hans

> 
>  include/linux/moduleparam.h                   |  3 +++
>  drivers/media/radio/si470x/radio-si470x-i2c.c |  2 +-
>  drivers/media/usb/dvb-usb-v2/lmedm04.c        | 12 ++++++------
>  3 files changed, 10 insertions(+), 7 deletions(-)
> 


^ permalink raw reply

* Re: [PATCH 2/3] media: radio: si470x: Fix DRIVER_AUTHOR macro definition
From: Hans Verkuil @ 2025-10-08  6:24 UTC (permalink / raw)
  To: Kees Cook, Luis Chamberlain
  Cc: Hans Verkuil, Mauro Carvalho Chehab, Uwe Kleine-König,
	linux-media, Malcolm Priestley, Rusty Russell, Petr Pavlu,
	Daniel Gomez, Sami Tolvanen, linux-kernel, linux-modules,
	linux-hardening
In-Reply-To: <20251008035938.838263-2-kees@kernel.org>

On 08/10/2025 05:59, Kees Cook wrote:
> The DRIVER_AUTHOR macro incorrectly included a semicolon in its
> string literal definition. Right now, this wasn't causing any real
> problem, but coming changes to the MODULE_INFO() macro make this more
> sensitive. Specifically, when used with MODULE_AUTHOR(), this created
> syntax errors during macro expansion:
> 
>     MODULE_AUTHOR(DRIVER_AUTHOR);
> 
> expands to:
> 
>     MODULE_INFO(author, "Joonyoung Shim <jy0922.shim@samsung.com>";)
>                                                                   ^
>                                                        syntax error
> 
> Remove the trailing semicolon from the DRIVER_AUTHOR definition.
> Semicolons should only appear at the point of use, not in the macro
> definition.
> 
> Signed-off-by: Kees Cook <kees@kernel.org>

Reviewed-by: Hans Verkuil <hverkuil+cisco@kernel.org>

> ---
> Cc: Hans Verkuil <hverkuil@kernel.org>
> Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
> Cc: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
> Cc: <linux-media@vger.kernel.org>
> ---
>  drivers/media/radio/si470x/radio-si470x-i2c.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/media/radio/si470x/radio-si470x-i2c.c b/drivers/media/radio/si470x/radio-si470x-i2c.c
> index cdd2ac198f2c..3932a449a1b1 100644
> --- a/drivers/media/radio/si470x/radio-si470x-i2c.c
> +++ b/drivers/media/radio/si470x/radio-si470x-i2c.c
> @@ -10,7 +10,7 @@
>  
>  
>  /* driver definitions */
> -#define DRIVER_AUTHOR "Joonyoung Shim <jy0922.shim@samsung.com>";
> +#define DRIVER_AUTHOR "Joonyoung Shim <jy0922.shim@samsung.com>"
>  #define DRIVER_CARD "Silicon Labs Si470x FM Radio"
>  #define DRIVER_DESC "I2C radio driver for Si470x FM Radio Receivers"
>  #define DRIVER_VERSION "1.0.2"


^ permalink raw reply

* Re: [PATCH 1/3] media: dvb-usb-v2: lmedm04: Fix firmware macro definitions
From: Hans Verkuil @ 2025-10-08  6:24 UTC (permalink / raw)
  To: Kees Cook, Luis Chamberlain
  Cc: Malcolm Priestley, Mauro Carvalho Chehab, linux-media,
	Hans Verkuil, Uwe Kleine-König, Rusty Russell, Petr Pavlu,
	Daniel Gomez, Sami Tolvanen, linux-kernel, linux-modules,
	linux-hardening
In-Reply-To: <20251008035938.838263-1-kees@kernel.org>

On 08/10/2025 05:59, Kees Cook wrote:
> The firmware filename macros incorrectly included semicolons in their
> string literal definitions. Right now, this wasn't causing any real
> problem, but coming changes to the MODULE_INFO() macro make this more
> sensitive. Specifically, when used with MODULE_FIRMWARE(), this
> created syntax errors during macro expansion:
> 
>     MODULE_FIRMWARE(LME2510_C_S7395);
> 
> expands to:
> 
>     MODULE_INFO(firmware, "dvb-usb-lme2510c-s7395.fw";)
>                                                      ^
>                                           syntax error
> 
> Remove the trailing semicolons from all six firmware filename macro
> definitions. Semicolons should only appear at the point of use, not in
> the macro definition.
> 
> Signed-off-by: Kees Cook <kees@kernel.org>

Reviewed-by: Hans Verkuil <hverkuil+cisco@kernel.org>

> ---
> Cc: Malcolm Priestley <tvboxspy@gmail.com>
> Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
> Cc: <linux-media@vger.kernel.org>
> ---
>  drivers/media/usb/dvb-usb-v2/lmedm04.c | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/media/usb/dvb-usb-v2/lmedm04.c b/drivers/media/usb/dvb-usb-v2/lmedm04.c
> index 0c510035805b..05c18b6de5c6 100644
> --- a/drivers/media/usb/dvb-usb-v2/lmedm04.c
> +++ b/drivers/media/usb/dvb-usb-v2/lmedm04.c
> @@ -70,12 +70,12 @@
>  #include "ts2020.h"
>  
>  
> -#define LME2510_C_S7395	"dvb-usb-lme2510c-s7395.fw";
> -#define LME2510_C_LG	"dvb-usb-lme2510c-lg.fw";
> -#define LME2510_C_S0194	"dvb-usb-lme2510c-s0194.fw";
> -#define LME2510_C_RS2000 "dvb-usb-lme2510c-rs2000.fw";
> -#define LME2510_LG	"dvb-usb-lme2510-lg.fw";
> -#define LME2510_S0194	"dvb-usb-lme2510-s0194.fw";
> +#define LME2510_C_S7395	"dvb-usb-lme2510c-s7395.fw"
> +#define LME2510_C_LG	"dvb-usb-lme2510c-lg.fw"
> +#define LME2510_C_S0194	"dvb-usb-lme2510c-s0194.fw"
> +#define LME2510_C_RS2000 "dvb-usb-lme2510c-rs2000.fw"
> +#define LME2510_LG	"dvb-usb-lme2510-lg.fw"
> +#define LME2510_S0194	"dvb-usb-lme2510-s0194.fw"
>  
>  /* debug */
>  static int dvb_usb_lme2510_debug;


^ permalink raw reply

* [PATCH 3/3] module: Add compile-time check for embedded NUL characters
From: Kees Cook @ 2025-10-08  3:59 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: Kees Cook, Rusty Russell, Petr Pavlu, Daniel Gomez, Sami Tolvanen,
	linux-modules, Malcolm Priestley, Mauro Carvalho Chehab,
	Hans Verkuil, Uwe Kleine-König, linux-kernel, linux-media,
	linux-hardening
In-Reply-To: <20251008033844.work.801-kees@kernel.org>

Long ago, the kernel module license checks were bypassed by embedding a
NUL character in the MODULE_LICENSE() string[1]. By using a string like
"GPL\0proprietary text", the kernel would only read "GPL" due to C string
termination at the NUL byte, allowing proprietary modules to avoid kernel
tainting and access GPL-only symbols.

The MODULE_INFO() macro stores these strings in the .modinfo ELF
section, and get_next_modinfo() uses strcmp()-family functions
which stop at the first NUL. This split the embedded string into two
separate .modinfo entries, with only the first part being processed by
license_is_gpl_compatible().

Add a compile-time check using _Static_assert that compares the full
string length (sizeof - 1) against __builtin_strlen(), which stops at
the first NUL. If they differ, compilation fails with a clear error
message.

While this check can still be circumvented by modifying the ELF binary
post-compilation, it prevents accidental embedded NULs and forces
intentional abuse to require deliberate binary manipulation rather than
simple source-level tricks.

Build tested with test modules containing both valid and invalid license
strings. The check correctly rejects:

    MODULE_LICENSE("GPL\0proprietary")

while accepting normal declarations:

    MODULE_LICENSE("GPL")

Link: https://lwn.net/Articles/82305/ [1]
Suggested-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Kees Cook <kees@kernel.org>
---
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Petr Pavlu <petr.pavlu@suse.com>
Cc: Daniel Gomez <da.gomez@kernel.org>
Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: <linux-modules@vger.kernel.org>
---
 include/linux/moduleparam.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/linux/moduleparam.h b/include/linux/moduleparam.h
index 6907aedc4f74..160f1678fafa 100644
--- a/include/linux/moduleparam.h
+++ b/include/linux/moduleparam.h
@@ -26,6 +26,9 @@
 
 /* Generic info of form tag = "info" */
 #define MODULE_INFO(tag, info)					  \
+	_Static_assert(						  \
+		sizeof(info) - 1 == __builtin_strlen(info),	  \
+		"MODULE_INFO(" #tag ", ...) contains embedded NUL byte"); \
 	static const char __UNIQUE_ID(modinfo)[]			  \
 		__used __section(".modinfo") __aligned(1)		  \
 		= __MODULE_INFO_PREFIX __stringify(tag) "=" info
-- 
2.34.1


^ permalink raw reply related

* [PATCH 2/3] media: radio: si470x: Fix DRIVER_AUTHOR macro definition
From: Kees Cook @ 2025-10-08  3:59 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: Kees Cook, Hans Verkuil, Mauro Carvalho Chehab,
	Uwe Kleine-König, linux-media, Malcolm Priestley,
	Rusty Russell, Petr Pavlu, Daniel Gomez, Sami Tolvanen,
	linux-kernel, linux-modules, linux-hardening
In-Reply-To: <20251008033844.work.801-kees@kernel.org>

The DRIVER_AUTHOR macro incorrectly included a semicolon in its
string literal definition. Right now, this wasn't causing any real
problem, but coming changes to the MODULE_INFO() macro make this more
sensitive. Specifically, when used with MODULE_AUTHOR(), this created
syntax errors during macro expansion:

    MODULE_AUTHOR(DRIVER_AUTHOR);

expands to:

    MODULE_INFO(author, "Joonyoung Shim <jy0922.shim@samsung.com>";)
                                                                  ^
                                                       syntax error

Remove the trailing semicolon from the DRIVER_AUTHOR definition.
Semicolons should only appear at the point of use, not in the macro
definition.

Signed-off-by: Kees Cook <kees@kernel.org>
---
Cc: Hans Verkuil <hverkuil@kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Cc: <linux-media@vger.kernel.org>
---
 drivers/media/radio/si470x/radio-si470x-i2c.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/media/radio/si470x/radio-si470x-i2c.c b/drivers/media/radio/si470x/radio-si470x-i2c.c
index cdd2ac198f2c..3932a449a1b1 100644
--- a/drivers/media/radio/si470x/radio-si470x-i2c.c
+++ b/drivers/media/radio/si470x/radio-si470x-i2c.c
@@ -10,7 +10,7 @@
 
 
 /* driver definitions */
-#define DRIVER_AUTHOR "Joonyoung Shim <jy0922.shim@samsung.com>";
+#define DRIVER_AUTHOR "Joonyoung Shim <jy0922.shim@samsung.com>"
 #define DRIVER_CARD "Silicon Labs Si470x FM Radio"
 #define DRIVER_DESC "I2C radio driver for Si470x FM Radio Receivers"
 #define DRIVER_VERSION "1.0.2"
-- 
2.34.1


^ permalink raw reply related

* [PATCH 1/3] media: dvb-usb-v2: lmedm04: Fix firmware macro definitions
From: Kees Cook @ 2025-10-08  3:59 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: Kees Cook, Malcolm Priestley, Mauro Carvalho Chehab, linux-media,
	Hans Verkuil, Uwe Kleine-König, Rusty Russell, Petr Pavlu,
	Daniel Gomez, Sami Tolvanen, linux-kernel, linux-modules,
	linux-hardening
In-Reply-To: <20251008033844.work.801-kees@kernel.org>

The firmware filename macros incorrectly included semicolons in their
string literal definitions. Right now, this wasn't causing any real
problem, but coming changes to the MODULE_INFO() macro make this more
sensitive. Specifically, when used with MODULE_FIRMWARE(), this
created syntax errors during macro expansion:

    MODULE_FIRMWARE(LME2510_C_S7395);

expands to:

    MODULE_INFO(firmware, "dvb-usb-lme2510c-s7395.fw";)
                                                     ^
                                          syntax error

Remove the trailing semicolons from all six firmware filename macro
definitions. Semicolons should only appear at the point of use, not in
the macro definition.

Signed-off-by: Kees Cook <kees@kernel.org>
---
Cc: Malcolm Priestley <tvboxspy@gmail.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: <linux-media@vger.kernel.org>
---
 drivers/media/usb/dvb-usb-v2/lmedm04.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/media/usb/dvb-usb-v2/lmedm04.c b/drivers/media/usb/dvb-usb-v2/lmedm04.c
index 0c510035805b..05c18b6de5c6 100644
--- a/drivers/media/usb/dvb-usb-v2/lmedm04.c
+++ b/drivers/media/usb/dvb-usb-v2/lmedm04.c
@@ -70,12 +70,12 @@
 #include "ts2020.h"
 
 
-#define LME2510_C_S7395	"dvb-usb-lme2510c-s7395.fw";
-#define LME2510_C_LG	"dvb-usb-lme2510c-lg.fw";
-#define LME2510_C_S0194	"dvb-usb-lme2510c-s0194.fw";
-#define LME2510_C_RS2000 "dvb-usb-lme2510c-rs2000.fw";
-#define LME2510_LG	"dvb-usb-lme2510-lg.fw";
-#define LME2510_S0194	"dvb-usb-lme2510-s0194.fw";
+#define LME2510_C_S7395	"dvb-usb-lme2510c-s7395.fw"
+#define LME2510_C_LG	"dvb-usb-lme2510c-lg.fw"
+#define LME2510_C_S0194	"dvb-usb-lme2510c-s0194.fw"
+#define LME2510_C_RS2000 "dvb-usb-lme2510c-rs2000.fw"
+#define LME2510_LG	"dvb-usb-lme2510-lg.fw"
+#define LME2510_S0194	"dvb-usb-lme2510-s0194.fw"
 
 /* debug */
 static int dvb_usb_lme2510_debug;
-- 
2.34.1


^ permalink raw reply related

* [PATCH 0/3] module: Add compile-time check for embedded NUL characters
From: Kees Cook @ 2025-10-08  3:59 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: Kees Cook, Malcolm Priestley, Mauro Carvalho Chehab, Hans Verkuil,
	Uwe Kleine-König, Rusty Russell, Petr Pavlu, Daniel Gomez,
	Sami Tolvanen, linux-kernel, linux-media, linux-modules,
	linux-hardening

Hi,

A long time ago we had an issue with embedded NUL bytes in MODULE_INFO
strings[1]. While this stands out pretty strongly when you look at the
code, and we can't do anything about a binary module that just plain lies,
we never actually implemented the trivial compile-time check needed to
detect it.

Add this check (and fix 2 instances of needless trailing semicolons that
this change exposed).

Note that these patches were produced as part of another LLM exercise.
This time I wanted to try "what happens if I ask an LLM to go read
a specific LWN article and write a patch based on a discussion?" It
pretty effortlessly chose and implemented a suggested solution, tested
the change, and fixed new build warnings in the process.

Since this was a relatively short session, here's an overview of the
prompts involved as I guided it through a clean change and tried to see
how it would reason about static_assert vs _Static_assert. (It wanted
to use what was most common, not what was the current style -- we may
want to update the comment above the static_assert macro to suggest
using _Static_assert directly these days...)

  I want to fix a weakness in the module info strings. Read about it
  here: https://lwn.net/Articles/82305/

  Since it's only "info" that we need to check, can you reduce the checks
  to just that instead of all the other stuff?

  I think the change to the comment is redundent, and that should be
  in a commit log instead. Let's just keep the change to the static assert.

  Is "static_assert" the idiomatic way to use a static assert in this
  code base? I've seen _Static_assert used sometimes.

  What's the difference between the two?

  Does Linux use C11 by default now?

  Then let's not use the wrapper any more.

  Do an "allmodconfig all -s" build to verify this works for all modules
  in the kernel.


Thanks!

-Kees

[1] https://lwn.net/Articles/82305/

Kees Cook (3):
  media: dvb-usb-v2: lmedm04: Fix firmware macro definitions
  media: radio: si470x: Fix DRIVER_AUTHOR macro definition
  module: Add compile-time check for embedded NUL characters

 include/linux/moduleparam.h                   |  3 +++
 drivers/media/radio/si470x/radio-si470x-i2c.c |  2 +-
 drivers/media/usb/dvb-usb-v2/lmedm04.c        | 12 ++++++------
 3 files changed, 10 insertions(+), 7 deletions(-)

-- 
2.34.1


^ permalink raw reply

* Re: [PATCH v8 7/8] modpost: Create modalias for builtin modules
From: Alexey Gladkov @ 2025-10-07 10:15 UTC (permalink / raw)
  To: Charles Mirabile
  Cc: da.gomez, linux-kbuild, linux-kernel, linux-modules, masahiroy,
	mcgrof, nathan, nicolas.schier, petr.pavlu, samitolvanen, sfr
In-Reply-To: <20251007011637.2512413-1-cmirabil@redhat.com>

On Mon, Oct 06, 2025 at 09:16:37PM -0400, Charles Mirabile wrote:
> On Thu, Sep 18, 2025 at 10:05:51AM +0200, Alexey Gladkov wrote:
> > For some modules, modalias is generated using the modpost utility and
> > the section is added to the module file.
> > 
> > When a module is added inside vmlinux, modpost does not generate
> > modalias for such modules and the information is lost.
> > 
> > As a result kmod (which uses modules.builtin.modinfo in userspace)
> > cannot determine that modalias is handled by a builtin kernel module.
> > 
> > $ cat /sys/devices/pci0000:00/0000:00:14.0/modalias
> > pci:v00008086d0000A36Dsv00001043sd00008694bc0Csc03i30
> > 
> > $ modinfo xhci_pci
> > name:           xhci_pci
> > filename:       (builtin)
> > license:        GPL
> > file:           drivers/usb/host/xhci-pci
> > description:    xHCI PCI Host Controller Driver
> > 
> > Missing modalias "pci:v*d*sv*sd*bc0Csc03i30*" which will be generated by
> > modpost if the module is built separately.
> > 
> > To fix this it is necessary to generate the same modalias for vmlinux as
> > for the individual modules. Fortunately '.vmlinux.export.o' is already
> > generated from which '.modinfo' can be extracted in the same way as for
> > vmlinux.o.
> 
> Hi -
> 
> This patch broke RISC-V builds for me. During the final objcopy where the new
> symbols are supposed to be stripped, an error occurs producing lots of error
> messages similar to this one:
> 
> riscv64-linux-gnu-objcopy: not stripping symbol `__mod_device_table__...'
> because it is named in a relocation
> 
> It does not occur using defconfig, but I was able to bisect my way to this
> commit and then reduce my config delta w.r.t defconfig until I landed on:
> 
> cat > .config <<'EOF'
> CONFIG_RELOCATABLE=y
> CONFIG_KASAN=y
> EOF
> ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- make olddefconfig
> ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- make -j $(nproc)
> ...
>   LD      vmlinux.unstripped
>   NM      System.map
>   SORTTAB vmlinux.unstripped
>   CHKREL  vmlinux.unstripped
>   OBJCOPY vmlinux
>   OBJCOPY modules.builtin.modinfo
>   GEN     modules.builtin
> riscv64-linux-gnu-objcopy: not stripping symbol `<long symbol name>'
> because it is named in a relocation
> <repeats with different symbol names about a dozen times>
> make[3]: *** [scripts/Makefile.vmlinux:97: vmlinux] Error 1
> make[3]: *** Deleting file 'vmlinux'
> make[2]: *** [Makefile:1242: vmlinux] Error 2
> make[1]: *** [/tmp/linux/Makefile:369: __build_one_by_one] Error 2
> make: *** [Makefile:248: __sub-make] Error 2
> 
> I confirmed that reverting this commit fixes the issue.

Hm. Indeed. I haven't found a good solution yet, but you can use the
following patch to unlock compilation. It won't solve the problem, it will
only hide it.

--- a/scripts/Makefile.vmlinux
+++ b/scripts/Makefile.vmlinux
@@ -84,7 +84,7 @@ endif
 remove-section-y                                   := .modinfo
 remove-section-$(CONFIG_ARCH_VMLINUX_NEEDS_RELOCS) += '.rel*'

-remove-symbols := -w --strip-symbol='__mod_device_table__*'
+remove-symbols := -w --strip-unneeded-symbol='__mod_device_table__*'

 # To avoid warnings: "empty loadable segment detected at ..." from GNU objcopy,
 # it is necessary to remove the PT_LOAD flag from the segment.

> > 
> > Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
> > Signed-off-by: Alexey Gladkov <legion@kernel.org>
> > Tested-by: Stephen Rothwell <sfr@canb.auug.org.au>
> > ---
> >  include/linux/module.h   |  4 ----
> >  scripts/Makefile.vmlinux |  4 +++-
> >  scripts/mksysmap         |  3 +++
> >  scripts/mod/file2alias.c | 19 ++++++++++++++++++-
> >  scripts/mod/modpost.c    | 15 +++++++++++++++
> >  scripts/mod/modpost.h    |  2 ++
> >  6 files changed, 41 insertions(+), 6 deletions(-)
> > 
> > diff --git a/include/linux/module.h b/include/linux/module.h
> > index e31ee29fac6b7..e135cc79aceea 100644
> > --- a/include/linux/module.h
> > +++ b/include/linux/module.h
> > @@ -256,14 +256,10 @@ struct module_kobject *lookup_or_create_module_kobject(const char *name);
> >  	__PASTE(type,			\
> >  	__PASTE(__, name)))))
> >  
> > -#ifdef MODULE
> >  /* Creates an alias so file2alias.c can find device table. */
> >  #define MODULE_DEVICE_TABLE(type, name)					\
> >  static typeof(name) __mod_device_table(type, name)			\
> >    __attribute__ ((used, alias(__stringify(name))))
> > -#else  /* !MODULE */
> > -#define MODULE_DEVICE_TABLE(type, name)
> > -#endif
> >  
> >  /* Version of form [<epoch>:]<version>[-<extra-version>].
> >   * Or for CVS/RCS ID version, everything but the number is stripped.
> > diff --git a/scripts/Makefile.vmlinux b/scripts/Makefile.vmlinux
> > index ce79461714979..1e5e37aadcd05 100644
> > --- a/scripts/Makefile.vmlinux
> > +++ b/scripts/Makefile.vmlinux
> > @@ -89,11 +89,13 @@ endif
> >  remove-section-y                                   := .modinfo
> >  remove-section-$(CONFIG_ARCH_VMLINUX_NEEDS_RELOCS) += '.rel*'
> >  
> > +remove-symbols := -w --strip-symbol='__mod_device_table__*'
> > +
> >  # To avoid warnings: "empty loadable segment detected at ..." from GNU objcopy,
> >  # it is necessary to remove the PT_LOAD flag from the segment.
> >  quiet_cmd_strip_relocs = OBJCOPY $@
> >        cmd_strip_relocs = $(OBJCOPY) $(patsubst %,--set-section-flags %=noload,$(remove-section-y)) $< $@; \
> > -                         $(OBJCOPY) $(addprefix --remove-section=,$(remove-section-y)) $@
> > +                         $(OBJCOPY) $(addprefix --remove-section=,$(remove-section-y)) $(remove-symbols) $@
> >  
> >  targets += vmlinux
> >  vmlinux: vmlinux.unstripped FORCE
> > diff --git a/scripts/mksysmap b/scripts/mksysmap
> > index a607a0059d119..c4531eacde202 100755
> > --- a/scripts/mksysmap
> > +++ b/scripts/mksysmap
> > @@ -59,6 +59,9 @@
> >  # EXPORT_SYMBOL (namespace)
> >  / __kstrtabns_/d
> >  
> > +# MODULE_DEVICE_TABLE (symbol name)
> > +/ __mod_device_table__/d
> > +
> >  # ---------------------------------------------------------------------------
> >  # Ignored suffixes
> >  #  (do not forget '$' after each pattern)
> > diff --git a/scripts/mod/file2alias.c b/scripts/mod/file2alias.c
> > index 1260bc2287fba..7da9735e7ab3e 100644
> > --- a/scripts/mod/file2alias.c
> > +++ b/scripts/mod/file2alias.c
> > @@ -1477,7 +1477,7 @@ void handle_moddevtable(struct module *mod, struct elf_info *info,
> >  	void *symval;
> >  	char *zeros = NULL;
> >  	const char *type, *name, *modname;
> > -	size_t typelen;
> > +	size_t typelen, modnamelen;
> >  	static const char *prefix = "__mod_device_table__";
> >  
> >  	/* We're looking for a section relative symbol */
> > @@ -1500,6 +1500,7 @@ void handle_moddevtable(struct module *mod, struct elf_info *info,
> >  	type = strstr(modname, "__");
> >  	if (!type)
> >  		return;
> > +	modnamelen = type - modname;
> >  	type += strlen("__");
> >  
> >  	name = strstr(type, "__");
> > @@ -1526,5 +1527,21 @@ void handle_moddevtable(struct module *mod, struct elf_info *info,
> >  		}
> >  	}
> >  
> > +	if (mod->is_vmlinux) {
> > +		struct module_alias *alias;
> > +
> > +		/*
> > +		 * If this is vmlinux, record the name of the builtin module.
> > +		 * Traverse the linked list in the reverse order, and set the
> > +		 * builtin_modname unless it has already been set in the
> > +		 * previous call.
> > +		 */
> > +		list_for_each_entry_reverse(alias, &mod->aliases, node) {
> > +			if (alias->builtin_modname)
> > +				break;
> > +			alias->builtin_modname = xstrndup(modname, modnamelen);
> > +		}
> > +	}
> > +
> >  	free(zeros);
> >  }
> > diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
> > index 5ca7c268294eb..47c8aa2a69392 100644
> > --- a/scripts/mod/modpost.c
> > +++ b/scripts/mod/modpost.c
> > @@ -2067,11 +2067,26 @@ static void write_if_changed(struct buffer *b, const char *fname)
> >  static void write_vmlinux_export_c_file(struct module *mod)
> >  {
> >  	struct buffer buf = { };
> > +	struct module_alias *alias, *next;
> >  
> >  	buf_printf(&buf,
> >  		   "#include <linux/export-internal.h>\n");
> >  
> >  	add_exported_symbols(&buf, mod);
> > +
> > +	buf_printf(&buf,
> > +		   "#include <linux/module.h>\n"
> > +		   "#undef __MODULE_INFO_PREFIX\n"
> > +		   "#define __MODULE_INFO_PREFIX\n");
> > +
> > +	list_for_each_entry_safe(alias, next, &mod->aliases, node) {
> > +		buf_printf(&buf, "MODULE_INFO(%s.alias, \"%s\");\n",
> > +			   alias->builtin_modname, alias->str);
> > +		list_del(&alias->node);
> > +		free(alias->builtin_modname);
> > +		free(alias);
> > +	}
> > +
> >  	write_if_changed(&buf, ".vmlinux.export.c");
> >  	free(buf.p);
> >  }
> > diff --git a/scripts/mod/modpost.h b/scripts/mod/modpost.h
> > index 9133e4c3803f0..2aecb8f25c87e 100644
> > --- a/scripts/mod/modpost.h
> > +++ b/scripts/mod/modpost.h
> > @@ -99,10 +99,12 @@ buf_write(struct buffer *buf, const char *s, int len);
> >   * struct module_alias - auto-generated MODULE_ALIAS()
> >   *
> >   * @node: linked to module::aliases
> > + * @modname: name of the builtin module (only for vmlinux)
> >   * @str: a string for MODULE_ALIAS()
> >   */
> >  struct module_alias {
> >  	struct list_head node;
> > +	char *builtin_modname;
> >  	char str[];
> >  };
> >  
> > -- 
> > 2.51.0
> > 
> 

-- 
Rgrds, legion


^ permalink raw reply

* Re: [PATCH v8 7/8] modpost: Create modalias for builtin modules
From: Charles Mirabile @ 2025-10-07  1:16 UTC (permalink / raw)
  To: legion
  Cc: da.gomez, linux-kbuild, linux-kernel, linux-modules, masahiroy,
	mcgrof, nathan, nicolas.schier, petr.pavlu, samitolvanen, sfr
In-Reply-To: <28d4da3b0e3fc8474142746bcf469e03752c3208.1758182101.git.legion@kernel.org>

On Thu, Sep 18, 2025 at 10:05:51AM +0200, Alexey Gladkov wrote:
> For some modules, modalias is generated using the modpost utility and
> the section is added to the module file.
> 
> When a module is added inside vmlinux, modpost does not generate
> modalias for such modules and the information is lost.
> 
> As a result kmod (which uses modules.builtin.modinfo in userspace)
> cannot determine that modalias is handled by a builtin kernel module.
> 
> $ cat /sys/devices/pci0000:00/0000:00:14.0/modalias
> pci:v00008086d0000A36Dsv00001043sd00008694bc0Csc03i30
> 
> $ modinfo xhci_pci
> name:           xhci_pci
> filename:       (builtin)
> license:        GPL
> file:           drivers/usb/host/xhci-pci
> description:    xHCI PCI Host Controller Driver
> 
> Missing modalias "pci:v*d*sv*sd*bc0Csc03i30*" which will be generated by
> modpost if the module is built separately.
> 
> To fix this it is necessary to generate the same modalias for vmlinux as
> for the individual modules. Fortunately '.vmlinux.export.o' is already
> generated from which '.modinfo' can be extracted in the same way as for
> vmlinux.o.

Hi -

This patch broke RISC-V builds for me. During the final objcopy where the new
symbols are supposed to be stripped, an error occurs producing lots of error
messages similar to this one:

riscv64-linux-gnu-objcopy: not stripping symbol `__mod_device_table__...'
because it is named in a relocation

It does not occur using defconfig, but I was able to bisect my way to this
commit and then reduce my config delta w.r.t defconfig until I landed on:

cat > .config <<'EOF'
CONFIG_RELOCATABLE=y
CONFIG_KASAN=y
EOF
ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- make olddefconfig
ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- make -j $(nproc)
...
  LD      vmlinux.unstripped
  NM      System.map
  SORTTAB vmlinux.unstripped
  CHKREL  vmlinux.unstripped
  OBJCOPY vmlinux
  OBJCOPY modules.builtin.modinfo
  GEN     modules.builtin
riscv64-linux-gnu-objcopy: not stripping symbol `<long symbol name>'
because it is named in a relocation
<repeats with different symbol names about a dozen times>
make[3]: *** [scripts/Makefile.vmlinux:97: vmlinux] Error 1
make[3]: *** Deleting file 'vmlinux'
make[2]: *** [Makefile:1242: vmlinux] Error 2
make[1]: *** [/tmp/linux/Makefile:369: __build_one_by_one] Error 2
make: *** [Makefile:248: __sub-make] Error 2

I confirmed that reverting this commit fixes the issue.

> 
> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
> Signed-off-by: Alexey Gladkov <legion@kernel.org>
> Tested-by: Stephen Rothwell <sfr@canb.auug.org.au>
> ---
>  include/linux/module.h   |  4 ----
>  scripts/Makefile.vmlinux |  4 +++-
>  scripts/mksysmap         |  3 +++
>  scripts/mod/file2alias.c | 19 ++++++++++++++++++-
>  scripts/mod/modpost.c    | 15 +++++++++++++++
>  scripts/mod/modpost.h    |  2 ++
>  6 files changed, 41 insertions(+), 6 deletions(-)
> 
> diff --git a/include/linux/module.h b/include/linux/module.h
> index e31ee29fac6b7..e135cc79aceea 100644
> --- a/include/linux/module.h
> +++ b/include/linux/module.h
> @@ -256,14 +256,10 @@ struct module_kobject *lookup_or_create_module_kobject(const char *name);
>  	__PASTE(type,			\
>  	__PASTE(__, name)))))
>  
> -#ifdef MODULE
>  /* Creates an alias so file2alias.c can find device table. */
>  #define MODULE_DEVICE_TABLE(type, name)					\
>  static typeof(name) __mod_device_table(type, name)			\
>    __attribute__ ((used, alias(__stringify(name))))
> -#else  /* !MODULE */
> -#define MODULE_DEVICE_TABLE(type, name)
> -#endif
>  
>  /* Version of form [<epoch>:]<version>[-<extra-version>].
>   * Or for CVS/RCS ID version, everything but the number is stripped.
> diff --git a/scripts/Makefile.vmlinux b/scripts/Makefile.vmlinux
> index ce79461714979..1e5e37aadcd05 100644
> --- a/scripts/Makefile.vmlinux
> +++ b/scripts/Makefile.vmlinux
> @@ -89,11 +89,13 @@ endif
>  remove-section-y                                   := .modinfo
>  remove-section-$(CONFIG_ARCH_VMLINUX_NEEDS_RELOCS) += '.rel*'
>  
> +remove-symbols := -w --strip-symbol='__mod_device_table__*'
> +
>  # To avoid warnings: "empty loadable segment detected at ..." from GNU objcopy,
>  # it is necessary to remove the PT_LOAD flag from the segment.
>  quiet_cmd_strip_relocs = OBJCOPY $@
>        cmd_strip_relocs = $(OBJCOPY) $(patsubst %,--set-section-flags %=noload,$(remove-section-y)) $< $@; \
> -                         $(OBJCOPY) $(addprefix --remove-section=,$(remove-section-y)) $@
> +                         $(OBJCOPY) $(addprefix --remove-section=,$(remove-section-y)) $(remove-symbols) $@
>  
>  targets += vmlinux
>  vmlinux: vmlinux.unstripped FORCE
> diff --git a/scripts/mksysmap b/scripts/mksysmap
> index a607a0059d119..c4531eacde202 100755
> --- a/scripts/mksysmap
> +++ b/scripts/mksysmap
> @@ -59,6 +59,9 @@
>  # EXPORT_SYMBOL (namespace)
>  / __kstrtabns_/d
>  
> +# MODULE_DEVICE_TABLE (symbol name)
> +/ __mod_device_table__/d
> +
>  # ---------------------------------------------------------------------------
>  # Ignored suffixes
>  #  (do not forget '$' after each pattern)
> diff --git a/scripts/mod/file2alias.c b/scripts/mod/file2alias.c
> index 1260bc2287fba..7da9735e7ab3e 100644
> --- a/scripts/mod/file2alias.c
> +++ b/scripts/mod/file2alias.c
> @@ -1477,7 +1477,7 @@ void handle_moddevtable(struct module *mod, struct elf_info *info,
>  	void *symval;
>  	char *zeros = NULL;
>  	const char *type, *name, *modname;
> -	size_t typelen;
> +	size_t typelen, modnamelen;
>  	static const char *prefix = "__mod_device_table__";
>  
>  	/* We're looking for a section relative symbol */
> @@ -1500,6 +1500,7 @@ void handle_moddevtable(struct module *mod, struct elf_info *info,
>  	type = strstr(modname, "__");
>  	if (!type)
>  		return;
> +	modnamelen = type - modname;
>  	type += strlen("__");
>  
>  	name = strstr(type, "__");
> @@ -1526,5 +1527,21 @@ void handle_moddevtable(struct module *mod, struct elf_info *info,
>  		}
>  	}
>  
> +	if (mod->is_vmlinux) {
> +		struct module_alias *alias;
> +
> +		/*
> +		 * If this is vmlinux, record the name of the builtin module.
> +		 * Traverse the linked list in the reverse order, and set the
> +		 * builtin_modname unless it has already been set in the
> +		 * previous call.
> +		 */
> +		list_for_each_entry_reverse(alias, &mod->aliases, node) {
> +			if (alias->builtin_modname)
> +				break;
> +			alias->builtin_modname = xstrndup(modname, modnamelen);
> +		}
> +	}
> +
>  	free(zeros);
>  }
> diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
> index 5ca7c268294eb..47c8aa2a69392 100644
> --- a/scripts/mod/modpost.c
> +++ b/scripts/mod/modpost.c
> @@ -2067,11 +2067,26 @@ static void write_if_changed(struct buffer *b, const char *fname)
>  static void write_vmlinux_export_c_file(struct module *mod)
>  {
>  	struct buffer buf = { };
> +	struct module_alias *alias, *next;
>  
>  	buf_printf(&buf,
>  		   "#include <linux/export-internal.h>\n");
>  
>  	add_exported_symbols(&buf, mod);
> +
> +	buf_printf(&buf,
> +		   "#include <linux/module.h>\n"
> +		   "#undef __MODULE_INFO_PREFIX\n"
> +		   "#define __MODULE_INFO_PREFIX\n");
> +
> +	list_for_each_entry_safe(alias, next, &mod->aliases, node) {
> +		buf_printf(&buf, "MODULE_INFO(%s.alias, \"%s\");\n",
> +			   alias->builtin_modname, alias->str);
> +		list_del(&alias->node);
> +		free(alias->builtin_modname);
> +		free(alias);
> +	}
> +
>  	write_if_changed(&buf, ".vmlinux.export.c");
>  	free(buf.p);
>  }
> diff --git a/scripts/mod/modpost.h b/scripts/mod/modpost.h
> index 9133e4c3803f0..2aecb8f25c87e 100644
> --- a/scripts/mod/modpost.h
> +++ b/scripts/mod/modpost.h
> @@ -99,10 +99,12 @@ buf_write(struct buffer *buf, const char *s, int len);
>   * struct module_alias - auto-generated MODULE_ALIAS()
>   *
>   * @node: linked to module::aliases
> + * @modname: name of the builtin module (only for vmlinux)
>   * @str: a string for MODULE_ALIAS()
>   */
>  struct module_alias {
>  	struct list_head node;
> +	char *builtin_modname;
>  	char str[];
>  };
>  
> -- 
> 2.51.0
> 


^ permalink raw reply

* Re: [PATCH 1/4] PCI: Support FIXUP quirks in modules
From: Brian Norris @ 2025-10-06 22:58 UTC (permalink / raw)
  To: Petr Pavlu
  Cc: Bjorn Helgaas, Luis Chamberlain, Daniel Gomez, linux-pci,
	David Gow, Rae Moar, linux-kselftest, linux-kernel, linux-modules,
	Johannes Berg, Sami Tolvanen, Richard Weinberger, Wei Liu,
	Brendan Higgins, kunit-dev, Anton Ivanov, linux-um
In-Reply-To: <2071b071-874c-4f85-8500-033c73dfaaab@suse.com>

Hi Petr,

On Wed, Sep 24, 2025 at 09:48:47AM +0200, Petr Pavlu wrote:
> On 9/23/25 7:42 PM, Brian Norris wrote:
> > Hi Petr,
> > 
> > On Tue, Sep 23, 2025 at 02:55:34PM +0200, Petr Pavlu wrote:
> >> On 9/13/25 12:59 AM, Brian Norris wrote:
> >>> @@ -259,6 +315,12 @@ void pci_fixup_device(enum pci_fixup_pass pass, struct pci_dev *dev)
> >>>  		return;
> >>>  	}
> >>>  	pci_do_fixups(dev, start, end);
> >>> +
> >>> +	struct pci_fixup_arg arg = {
> >>> +		.dev = dev,
> >>> +		.pass = pass,
> >>> +	};
> >>> +	module_for_each_mod(pci_module_fixup, &arg);
> >>
> >> The function module_for_each_mod() walks not only modules that are LIVE,
> >> but also those in the COMING and GOING states. This means that this code
> >> can potentially execute a PCI fixup from a module before its init
> >> function is invoked, and similarly, a fixup can be executed after the
> >> exit function has already run. Is this intentional?
> > 
> > Thanks for the callout. I didn't really give this part much thought
> > previously.
> > 
> > Per the comments, COMING means "Full formed, running module_init". I
> > believe that is a good thing, actually; specifically for controller
> > drivers, module_init() might be probing the controller and enumerating
> > child PCI devices to which we should apply these FIXUPs. That is a key
> > case to support.
> > 
> > GOING is not clearly defined in the header comments, but it seems like
> > it's a relatively narrow window between determining there are no module
> > refcounts (and transition to GOING) and starting to really tear it down
> > (transitioning to UNFORMED before any significant teardown).
> > module_exit() runs in the GOING phase.
> > 
> > I think it does not make sense to execute FIXUPs on a GOING module; I'll
> > make that change.
> 
> Note that when walking the modules list using module_for_each_mod(),
> the delete_module() operation can concurrently transition a module to
> MODULE_STATE_GOING. If you are thinking about simply having
> pci_module_fixup() check that mod->state isn't MODULE_STATE_GOING,
> I believe this won't quite work.

Good point. I think this at least suggests that this should hook into
some blocking point in the module-load sequence, such as the notifiers
or even module_init() as you suggest below.

> > Re-quoting one piece:
> >> This means that this code
> >> can potentially execute a PCI fixup from a module before its init
> >> function is invoked,
> > 
> > IIUC, this part is not true? A module is put into COMING state before
> > its init function is invoked.
> 
> When loading a module, the load_module() function calls
> complete_formation(), which puts the module into the COMING state. At
> this point, the new code in pci_fixup_device() can see the new module
> and potentially attempt to invoke its PCI fixups. However, such a module
> has still a bit of way to go before its init function is called from
> do_init_module(). The module hasn't yet had its arguments parsed, is not
> linked in sysfs, isn't fully registered with codetag support, and hasn't
> invoked its constructors (needed for gcov/kasan support).

It seems unlikely that sysfs, codetag, or arguments should matter much.
gcov and kasan might be nice to have though.

> I don't know enough about PCI fixups and what is allowable in them, but
> I suspect it would be better to ensure that no fixup can be invoked from
> the module during this period.

I don't know of general rules, but they generally do pretty minimal work
to adjust various fields in and around 'struct pci_dev', to account for
broken IDs. Sometimes they need to read a few PCI registers. They may
even tweak PM-related features. It varies based
on what kind of "quriky" devices need to be handled, but it's usually
pretty straightforward and well-contained -- not relying on any kind of
global state, or even all that much specific to the module in question
besides constant IDs.

(You can peruse drivers/pci/quirks.c or the various other files that use
DECLARE_PCI_FIXUP_*() macros, if you're curious.)

> If the above makes sense, I think using module_for_each_mod() might not
> be the right approach. Alternative options include registering a module
> notifier or having modules explicitly register their PCI fixups in their
> init function.

I agree module_for_each_mod() is probably not the right choice, but I'm
not sure what the right choice is.

register_module_notifier() + keying off MODULE_STATE_COMING before
pulling in the '.pci_fixup*' list seems attractive, but it still comes
before gcov/kasan.

It seems like "first thing in module_init()" would be the right choice,
but I don't know of a great way to do that. I could insert PCI-related
calls directly into do_init_module() / delete_module(), but that doesn't
seem very elegant. I could also mess with the module_{init,exit}()
macros, but that seems a bit strange too.

I'm open to suggestions. Or else maybe I'll just go with
register_module_notifier(), and accept that there may some small
downsides still.

Thanks,
Brian

^ permalink raw reply

* Re: [PATCH v17 35/47] i2c: rename wait_for_completion callback to wait_for_completion_cb
From: Wolfram Sang @ 2025-10-04 16:39 UTC (permalink / raw)
  To: Byungchul Park
  Cc: linux-kernel, kernel_team, torvalds, damien.lemoal, linux-ide,
	adilger.kernel, linux-ext4, mingo, peterz, will, tglx, rostedt,
	joel, sashal, daniel.vetter, duyuyang, johannes.berg, tj, tytso,
	willy, david, amir73il, gregkh, kernel-team, linux-mm, akpm,
	mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl,
	penberg, rientjes, vbabka, ngupta, linux-block, josef,
	linux-fsdevel, jack, jlayton, dan.j.williams, hch, djwong,
	dri-devel, rodrigosiqueiramelo, melissa.srw, hamohammed.sa,
	harry.yoo, chris.p.wilson, gwan-gyeong.mun, max.byungchul.park,
	boqun.feng, longman, yunseong.kim, ysk, yeoreum.yun, netdev,
	matthew.brost, her0gyugyu, corbet, catalin.marinas, bp,
	dave.hansen, x86, hpa, luto, sumit.semwal, gustavo,
	christian.koenig, andi.shyti, arnd, lorenzo.stoakes, Liam.Howlett,
	rppt, surenb, mcgrof, petr.pavlu, da.gomez, samitolvanen, paulmck,
	frederic, neeraj.upadhyay, joelagnelf, josh, urezki,
	mathieu.desnoyers, jiangshanlai, qiang.zhang, juri.lelli,
	vincent.guittot, dietmar.eggemann, bsegall, mgorman, vschneid,
	chuck.lever, neil, okorniev, Dai.Ngo, tom, trondmy, anna, kees,
	bigeasy, clrkwllms, mark.rutland, ada.coupriediaz,
	kristina.martsenko, wangkefeng.wang, broonie, kevin.brodsky, dwmw,
	shakeel.butt, ast, ziy, yuzhao, baolin.wang, usamaarif642,
	joel.granados, richard.weiyang, geert+renesas, tim.c.chen, linux,
	alexander.shishkin, lillian, chenhuacai, francesco,
	guoweikang.kernel, link, jpoimboe, masahiroy, brauner,
	thomas.weissschuh, oleg, mjguzik, andrii, wangfushuai, linux-doc,
	linux-arm-kernel, linux-media, linaro-mm-sig, linux-i2c,
	linux-arch, linux-modules, rcu, linux-nfs, linux-rt-devel
In-Reply-To: <20251002081247.51255-36-byungchul@sk.com>

On Thu, Oct 02, 2025 at 05:12:35PM +0900, Byungchul Park wrote:
> Functionally no change.  This patch is a preparation for DEPT(DEPendency
> Tracker) to track dependencies related to a scheduler API,
> wait_for_completion().
> 
> Unfortunately, struct i2c_algo_pca_data has a callback member named
> wait_for_completion, that is the same as the scheduler API, which makes
> it hard to change the scheduler API to a macro form because of the
> ambiguity.
> 
> Add a postfix _cb to the callback member to remove the ambiguity.
> 
> Signed-off-by: Byungchul Park <byungchul@sk.com>

This patch seems reasonable in any case. I'll pick it, so you have one
dependency less. Good luck with the series!

Applied to for-next, thanks!


^ permalink raw reply

* Re: [PATCH v17 09/47] arm64, dept: add support CONFIG_ARCH_HAS_DEPT_SUPPORT to arm64
From: Mark Rutland @ 2025-10-03 14:36 UTC (permalink / raw)
  To: Byungchul Park
  Cc: linux-kernel, kernel_team, torvalds, damien.lemoal, linux-ide,
	adilger.kernel, linux-ext4, mingo, peterz, will, tglx, rostedt,
	joel, sashal, daniel.vetter, duyuyang, johannes.berg, tj, tytso,
	willy, david, amir73il, gregkh, kernel-team, linux-mm, akpm,
	mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl,
	penberg, rientjes, vbabka, ngupta, linux-block, josef,
	linux-fsdevel, jack, jlayton, dan.j.williams, hch, djwong,
	dri-devel, rodrigosiqueiramelo, melissa.srw, hamohammed.sa,
	harry.yoo, chris.p.wilson, gwan-gyeong.mun, max.byungchul.park,
	boqun.feng, longman, yunseong.kim, ysk, yeoreum.yun, netdev,
	matthew.brost, her0gyugyu, corbet, catalin.marinas, bp,
	dave.hansen, x86, hpa, luto, sumit.semwal, gustavo,
	christian.koenig, andi.shyti, arnd, lorenzo.stoakes, Liam.Howlett,
	rppt, surenb, mcgrof, petr.pavlu, da.gomez, samitolvanen, paulmck,
	frederic, neeraj.upadhyay, joelagnelf, josh, urezki,
	mathieu.desnoyers, jiangshanlai, qiang.zhang, juri.lelli,
	vincent.guittot, dietmar.eggemann, bsegall, mgorman, vschneid,
	chuck.lever, neil, okorniev, Dai.Ngo, tom, trondmy, anna, kees,
	bigeasy, clrkwllms, ada.coupriediaz, kristina.martsenko,
	wangkefeng.wang, broonie, kevin.brodsky, dwmw, shakeel.butt, ast,
	ziy, yuzhao, baolin.wang, usamaarif642, joel.granados,
	richard.weiyang, geert+renesas, tim.c.chen, linux,
	alexander.shishkin, lillian, chenhuacai, francesco,
	guoweikang.kernel, link, jpoimboe, masahiroy, brauner,
	thomas.weissschuh, oleg, mjguzik, andrii, wangfushuai, linux-doc,
	linux-arm-kernel, linux-media, linaro-mm-sig, linux-i2c,
	linux-arch, linux-modules, rcu, linux-nfs, linux-rt-devel
In-Reply-To: <20251002081247.51255-10-byungchul@sk.com>

On Thu, Oct 02, 2025 at 05:12:09PM +0900, Byungchul Park wrote:
> dept needs to notice every entrance from user to kernel mode to treat
> every kernel context independently when tracking wait-event dependencies.
> Roughly, system call and user oriented fault are the cases.
> 
> Make dept aware of the entrances of arm64 and add support
> CONFIG_ARCH_HAS_DEPT_SUPPORT to arm64.
> 
> Signed-off-by: Byungchul Park <byungchul@sk.com>
> ---
>  arch/arm64/Kconfig          | 1 +
>  arch/arm64/kernel/syscall.c | 7 +++++++
>  arch/arm64/mm/fault.c       | 7 +++++++
>  3 files changed, 15 insertions(+)
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index e9bbfacc35a6..a8fab2c052dc 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -281,6 +281,7 @@ config ARM64
>  	select USER_STACKTRACE_SUPPORT
>  	select VDSO_GETRANDOM
>  	select VMAP_STACK
> +	select ARCH_HAS_DEPT_SUPPORT
>  	help
>  	  ARM 64-bit (AArch64) Linux support.
>  
> diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c
> index c442fcec6b9e..bbd306335179 100644
> --- a/arch/arm64/kernel/syscall.c
> +++ b/arch/arm64/kernel/syscall.c
> @@ -7,6 +7,7 @@
>  #include <linux/ptrace.h>
>  #include <linux/randomize_kstack.h>
>  #include <linux/syscalls.h>
> +#include <linux/dept.h>
>  
>  #include <asm/debug-monitors.h>
>  #include <asm/exception.h>
> @@ -96,6 +97,12 @@ static void el0_svc_common(struct pt_regs *regs, int scno, int sc_nr,
>  	 * (Similarly for HVC and SMC elsewhere.)
>  	 */
>  
> +	/*
> +	 * This is a system call from user mode.  Make dept work with a
> +	 * new kernel mode context.
> +	 */
> +	dept_update_cxt();

As Mark Brown pointed out in his replies, this patch is missing a whole
bunch of cases and does not work correctly as-is.

As Dave Hansen pointed out on the x86 patch, you shouldn't do this
piecemeal in architecture code, and should instead work with the
existing context tracking, e.g. by adding logic to
enter_from_user_mode() and exit_to_user_mode(), or by reusing some
existing context tracking logic that's called there.

Mark.

^ permalink raw reply

* Re: [PATCH v17 09/47] arm64, dept: add support CONFIG_ARCH_HAS_DEPT_SUPPORT to arm64
From: Mark Brown @ 2025-10-03 11:33 UTC (permalink / raw)
  To: Byungchul Park
  Cc: linux-kernel, kernel_team, torvalds, damien.lemoal, linux-ide,
	adilger.kernel, linux-ext4, mingo, peterz, will, tglx, rostedt,
	joel, sashal, daniel.vetter, duyuyang, johannes.berg, tj, tytso,
	willy, david, amir73il, gregkh, kernel-team, linux-mm, akpm,
	mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl,
	penberg, rientjes, vbabka, ngupta, linux-block, josef,
	linux-fsdevel, jack, jlayton, dan.j.williams, hch, djwong,
	dri-devel, rodrigosiqueiramelo, melissa.srw, hamohammed.sa,
	harry.yoo, chris.p.wilson, gwan-gyeong.mun, max.byungchul.park,
	boqun.feng, longman, yunseong.kim, ysk, yeoreum.yun, netdev,
	matthew.brost, her0gyugyu, corbet, catalin.marinas, bp,
	dave.hansen, x86, hpa, luto, sumit.semwal, gustavo,
	christian.koenig, andi.shyti, arnd, lorenzo.stoakes, Liam.Howlett,
	rppt, surenb, mcgrof, petr.pavlu, da.gomez, samitolvanen, paulmck,
	frederic, neeraj.upadhyay, joelagnelf, josh, urezki,
	mathieu.desnoyers, jiangshanlai, qiang.zhang, juri.lelli,
	vincent.guittot, dietmar.eggemann, bsegall, mgorman, vschneid,
	chuck.lever, neil, okorniev, Dai.Ngo, tom, trondmy, anna, kees,
	bigeasy, clrkwllms, mark.rutland, ada.coupriediaz,
	kristina.martsenko, wangkefeng.wang, kevin.brodsky, dwmw,
	shakeel.butt, ast, ziy, yuzhao, baolin.wang, usamaarif642,
	joel.granados, richard.weiyang, geert+renesas, tim.c.chen, linux,
	alexander.shishkin, lillian, chenhuacai, francesco,
	guoweikang.kernel, link, jpoimboe, masahiroy, brauner,
	thomas.weissschuh, oleg, mjguzik, andrii, wangfushuai, linux-doc,
	linux-arm-kernel, linux-media, linaro-mm-sig, linux-i2c,
	linux-arch, linux-modules, rcu, linux-nfs, linux-rt-devel
In-Reply-To: <20251003014641.GF75385@system.software.com>

[-- Attachment #1: Type: text/plain, Size: 1888 bytes --]

On Fri, Oct 03, 2025 at 10:46:41AM +0900, Byungchul Park wrote:
> On Thu, Oct 02, 2025 at 12:39:31PM +0100, Mark Brown wrote:
> > On Thu, Oct 02, 2025 at 05:12:09PM +0900, Byungchul Park wrote:
> > > dept needs to notice every entrance from user to kernel mode to treat
> > > every kernel context independently when tracking wait-event dependencies.
> > > Roughly, system call and user oriented fault are the cases.

> > > Make dept aware of the entrances of arm64 and add support
> > > CONFIG_ARCH_HAS_DEPT_SUPPORT to arm64.

> > The description of what needs to be tracked probably needs some
> > tightening up here, it's not clear to me for example why exceptions for
> > mops or the vector extensions aren't included here, or what the
> > distinction is with error faults like BTI or GCS not being tracked?

> Thanks for the feedback but I'm afraid I don't get you.  Can you explain
> in more detail with example?

Your commit log says we need to track every entrance from user mode to
kernel mode but the code only adds tracking to syscalls and some memory
faults.  The exception types listed above (and some others) also result
in entries to the kernel from userspace.

> JFYI, pairs of wait and its event need to be tracked to see if each
> event can be prevented from being reachable by other waits like:

>    context X				context Y
> 
>    lock L
>    ...
>    initiate event A context		start toward event A
>    ...					...
>    wait A // wait for event A and	lock L // wait for unlock L and
>           // prevent unlock L		       // prevent event A
>    ...					...
>    unlock L				unlock L
> 					...
> 					event A

> I meant things like this need to be tracked.

I don't think that's at all clear from the above context, and the
handling for some of the above exception types (eg, the vector
extensions) includes taking locks.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: [PATCH v17 28/47] dept: add documentation for dept
From: NeilBrown @ 2025-10-03  6:55 UTC (permalink / raw)
  To: Byungchul Park
  Cc: linux-kernel, kernel_team, torvalds, damien.lemoal, linux-ide,
	adilger.kernel, linux-ext4, mingo, peterz, will, tglx, rostedt,
	joel, sashal, daniel.vetter, duyuyang, johannes.berg, tj, tytso,
	willy, david, amir73il, gregkh, kernel-team, linux-mm, akpm,
	mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl,
	penberg, rientjes, vbabka, ngupta, linux-block, josef,
	linux-fsdevel, jack, jlayton, dan.j.williams, hch, djwong,
	dri-devel, rodrigosiqueiramelo, melissa.srw, hamohammed.sa,
	harry.yoo, chris.p.wilson, gwan-gyeong.mun, max.byungchul.park,
	boqun.feng, longman, yunseong.kim, ysk, yeoreum.yun, netdev,
	matthew.brost, her0gyugyu, corbet, catalin.marinas, bp,
	dave.hansen, x86, hpa, luto, sumit.semwal, gustavo,
	christian.koenig, andi.shyti, arnd, lorenzo.stoakes, Liam.Howlett,
	rppt, surenb, mcgrof, petr.pavlu, da.gomez, samitolvanen, paulmck,
	frederic, neeraj.upadhyay, joelagnelf, josh, urezki,
	mathieu.desnoyers, jiangshanlai, qiang.zhang, juri.lelli,
	vincent.guittot, dietmar.eggemann, bsegall, mgorman, vschneid,
	chuck.lever, okorniev, Dai.Ngo, tom, trondmy, anna, kees, bigeasy,
	clrkwllms, mark.rutland, ada.coupriediaz, kristina.martsenko,
	wangkefeng.wang, broonie, kevin.brodsky, dwmw, shakeel.butt, ast,
	ziy, yuzhao, baolin.wang, usamaarif642, joel.granados,
	richard.weiyang, geert+renesas, tim.c.chen, linux,
	alexander.shishkin, lillian, chenhuacai, francesco,
	guoweikang.kernel, link, jpoimboe, masahiroy, brauner,
	thomas.weissschuh, oleg, mjguzik, andrii, wangfushuai, linux-doc,
	linux-arm-kernel, linux-media, linaro-mm-sig, linux-i2c,
	linux-arch, linux-modules, rcu, linux-nfs, linux-rt-devel
In-Reply-To: <20251002081247.51255-29-byungchul@sk.com>

On Thu, 02 Oct 2025, Byungchul Park wrote:
> This document describes the concept and APIs of dept.
> 

Thanks for the documentation.  I've been trying to understand it.


> +How DEPT works
> +--------------
> +
> +Let's take a look how DEPT works with the 1st example in the section
> +'Limitation of lockdep'.
> +
> +   context X	   context Y	   context Z
> +
> +		   mutex_lock A
> +   folio_lock B
> +		   folio_lock B <- DEADLOCK
> +				   mutex_lock A <- DEADLOCK
> +				   folio_unlock B
> +		   folio_unlock B
> +		   mutex_unlock A
> +				   mutex_unlock A
> +
> +Adding comments to describe DEPT's view in terms of wait and event:
> +
> +   context X	   context Y	   context Z
> +
> +		   mutex_lock A
> +		   /* wait for A */
> +   folio_lock B
> +   /* wait for A */
> +   /* start event A context */
> +
> +		   folio_lock B
> +		   /* wait for B */ <- DEADLOCK
> +		   /* start event B context */
> +
> +				   mutex_lock A
> +				   /* wait for A */ <- DEADLOCK
> +				   /* start event A context */
> +
> +				   folio_unlock B
> +				   /* event B */
> +		   folio_unlock B
> +		   /* event B */
> +
> +		   mutex_unlock A
> +		   /* event A */
> +				   mutex_unlock A
> +				   /* event A */
> +

I can't see the value of the above section.
The first section with no comments is useful as it is easy to see the
deadlock being investigate.  The section below is useful as it add
comments to explain how DEPT sees the situation.  But the above section,
with some but not all of the comments, does seem (to me) to add anything
useful.

> +Adding more supplementary comments to describe DEPT's view in detail:
> +
> +   context X	   context Y	   context Z
> +
> +		   mutex_lock A
> +		   /* might wait for A */
> +		   /* start to take into account event A's context */

What do you mean precisely by "context".
You use the word in the heading "context X  context Y  context Z"
so it seems like "context" means "task" or "process".  But then as I
read on, I think - maybe it means something else.  If it does, then you
should use different words.  Maybe "task X ..." in the heading.

If the examples that follow It seems that the "context" for event A
starts at "mutex lock A" when it (possibly) waits for a mutex and ends
at "mutex unlock A" - which are both in the same process.  Clearly
various other events that happen between these two points in the same
process could be seen as the "context" for event A.

However event B starts in "context X" with "folio_lock B" and ends in
"context Z" or "context Y" with "folio_unlock B".  Is that right?

My question then is: how do you decide which, of all the event in all
the processes in all the system, between the start[S] and the end[E] are
considered to be part of the "context" of event A.

I think it would help me if you defined what a "context" is earlier.
What sorts of things appear in a context?

Thanks,
NeilBrown


> +		   /* 1 */
> +   folio_lock B
> +   /* might wait for B */
> +   /* start to take into account event B's context */
> +   /* 2 */
> +
> +		   folio_lock B
> +		   /* might wait for B */ <- DEADLOCK
> +		   /* start to take into account event B's context */
> +		   /* 3 */
> +
> +				   mutex_lock A
> +				   /* might wait for A */ <- DEADLOCK
> +				   /* start to take into account
> +				      event A's context */
> +				   /* 4 */
> +
> +				   folio_unlock B
> +				   /* event B that's been valid since 2 */
> +		   folio_unlock B
> +		   /* event B that's been valid since 3 */
> +
> +		   mutex_unlock A
> +		   /* event A that's been valid since 1 */
> +
> +				   mutex_unlock A
> +				   /* event A that's been valid since 4 */
> +
> +Let's build up dependency graph with this example. Firstly, context X:
> +
> +   context X
> +
> +   folio_lock B
> +   /* might wait for B */
> +   /* start to take into account event B's context */
> +   /* 2 */
> +
> +There are no events to create dependency. Next, context Y:
> +
> +   context Y
> +
> +   mutex_lock A
> +   /* might wait for A */
> +   /* start to take into account event A's context */
> +   /* 1 */
> +
> +   folio_lock B
> +   /* might wait for B */
> +   /* start to take into account event B's context */
> +   /* 3 */
> +
> +   folio_unlock B
> +   /* event B that's been valid since 3 */
> +
> +   mutex_unlock A
> +   /* event A that's been valid since 1 */
> +
> +There are two events. For event B, folio_unlock B, since there are no
> +waits between 3 and the event, event B does not create dependency. For
> +event A, there is a wait, folio_lock B, between 1 and the event. Which
> +means event A cannot be triggered if event B does not wake up the wait.
> +Therefore, we can say event A depends on event B, say, 'A -> B'. The
> +graph will look like after adding the dependency:
> +
> +   A -> B
> +
> +   where 'A -> B' means that event A depends on event B.
> +
> +Lastly, context Z:
> +
> +   context Z
> +
> +   mutex_lock A
> +   /* might wait for A */
> +   /* start to take into account event A's context */
> +   /* 4 */
> +
> +   folio_unlock B
> +   /* event B that's been valid since 2 */
> +
> +   mutex_unlock A
> +   /* event A that's been valid since 4 */
> +
> +There are also two events. For event B, folio_unlock B, there is a
> +wait, mutex_lock A, between 2 and the event - remind 2 is at a very
> +start and before the wait in timeline. Which means event B cannot be
> +triggered if event A does not wake up the wait. Therefore, we can say
> +event B depends on event A, say, 'B -> A'. The graph will look like
> +after adding the dependency:
> +
> +    -> A -> B -
> +   /           \
> +   \           /
> +    -----------
> +
> +   where 'A -> B' means that event A depends on event B.
> +
> +A new loop has been created. So DEPT can report it as a deadlock. For
> +event A, mutex_unlock A, since there are no waits between 4 and the
> +event, event A does not create dependency. That's it.
> +
> +CONCLUSION
> +
> +DEPT works well with any general synchronization mechanisms by focusing
> +on wait, event and its context.
> +

^ permalink raw reply

* Re: [PATCH v17 28/47] dept: add documentation for dept
From: Jonathan Corbet @ 2025-10-03  5:36 UTC (permalink / raw)
  To: Byungchul Park, linux-kernel
  Cc: kernel_team, torvalds, damien.lemoal, linux-ide, adilger.kernel,
	linux-ext4, mingo, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, duyuyang, johannes.berg, tj, tytso, willy, david,
	amir73il, gregkh, kernel-team, linux-mm, akpm, mhocko, minchan,
	hannes, vdavydov.dev, sj, jglisse, dennis, cl, penberg, rientjes,
	vbabka, ngupta, linux-block, josef, linux-fsdevel, jack, jlayton,
	dan.j.williams, hch, djwong, dri-devel, rodrigosiqueiramelo,
	melissa.srw, hamohammed.sa, harry.yoo, chris.p.wilson,
	gwan-gyeong.mun, max.byungchul.park, boqun.feng, longman,
	yunseong.kim, ysk, yeoreum.yun, netdev, matthew.brost, her0gyugyu,
	catalin.marinas, bp, dave.hansen, x86, hpa, luto, sumit.semwal,
	gustavo, christian.koenig, andi.shyti, arnd, lorenzo.stoakes,
	Liam.Howlett, rppt, surenb, mcgrof, petr.pavlu, da.gomez,
	samitolvanen, paulmck, frederic, neeraj.upadhyay, joelagnelf,
	josh, urezki, mathieu.desnoyers, jiangshanlai, qiang.zhang,
	juri.lelli, vincent.guittot, dietmar.eggemann, bsegall, mgorman,
	vschneid, chuck.lever, neil, okorniev, Dai.Ngo, tom, trondmy,
	anna, kees, bigeasy, clrkwllms, mark.rutland, ada.coupriediaz,
	kristina.martsenko, wangkefeng.wang, broonie, kevin.brodsky, dwmw,
	shakeel.butt, ast, ziy, yuzhao, baolin.wang, usamaarif642,
	joel.granados, richard.weiyang, geert+renesas, tim.c.chen, linux,
	alexander.shishkin, lillian, chenhuacai, francesco,
	guoweikang.kernel, link, jpoimboe, masahiroy, brauner,
	thomas.weissschuh, oleg, mjguzik, andrii, wangfushuai, linux-doc,
	linux-arm-kernel, linux-media, linaro-mm-sig, linux-i2c,
	linux-arch, linux-modules, rcu, linux-nfs, linux-rt-devel
In-Reply-To: <20251002081247.51255-29-byungchul@sk.com>

Byungchul Park <byungchul@sk.com> writes:

> This document describes the concept and APIs of dept.
>
> Signed-off-by: Byungchul Park <byungchul@sk.com>
> ---
>  Documentation/dependency/dept.txt     | 735 ++++++++++++++++++++++++++
>  Documentation/dependency/dept_api.txt | 117 ++++
>  2 files changed, 852 insertions(+)
>  create mode 100644 Documentation/dependency/dept.txt
>  create mode 100644 Documentation/dependency/dept_api.txt

As already suggested, this should be in RST; you're already 95% of the
way there.  Also, please put it under Documentation/dev-tools; we don't
need another top-level directory for this.

Thanks,

jon

^ permalink raw reply

* Re: [PATCH v17 28/47] dept: add documentation for dept
From: Bagas Sanjaya @ 2025-10-03  2:44 UTC (permalink / raw)
  To: Byungchul Park, linux-kernel
  Cc: kernel_team, torvalds, damien.lemoal, linux-ide, adilger.kernel,
	linux-ext4, mingo, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, duyuyang, johannes.berg, tj, tytso, willy, david,
	amir73il, gregkh, kernel-team, linux-mm, akpm, mhocko, minchan,
	hannes, vdavydov.dev, sj, jglisse, dennis, cl, penberg, rientjes,
	vbabka, ngupta, linux-block, josef, linux-fsdevel, jack, jlayton,
	dan.j.williams, hch, djwong, dri-devel, rodrigosiqueiramelo,
	melissa.srw, hamohammed.sa, harry.yoo, chris.p.wilson,
	gwan-gyeong.mun, max.byungchul.park, boqun.feng, longman,
	yunseong.kim, ysk, yeoreum.yun, netdev, matthew.brost, her0gyugyu,
	corbet, catalin.marinas, bp, dave.hansen, x86, hpa, luto,
	sumit.semwal, gustavo, christian.koenig, andi.shyti, arnd,
	lorenzo.stoakes, Liam.Howlett, rppt, surenb, mcgrof, petr.pavlu,
	da.gomez, samitolvanen, paulmck, frederic, neeraj.upadhyay,
	joelagnelf, josh, urezki, mathieu.desnoyers, jiangshanlai,
	qiang.zhang, juri.lelli, vincent.guittot, dietmar.eggemann,
	bsegall, mgorman, vschneid, chuck.lever, neil, okorniev, Dai.Ngo,
	tom, trondmy, anna, kees, bigeasy, clrkwllms, mark.rutland,
	ada.coupriediaz, kristina.martsenko, wangkefeng.wang, broonie,
	kevin.brodsky, dwmw, shakeel.butt, ast, ziy, yuzhao, baolin.wang,
	usamaarif642, joel.granados, richard.weiyang, geert+renesas,
	tim.c.chen, linux, alexander.shishkin, lillian, chenhuacai,
	francesco, guoweikang.kernel, link, jpoimboe, masahiroy, brauner,
	thomas.weissschuh, oleg, mjguzik, andrii, wangfushuai, linux-doc,
	linux-arm-kernel, linux-media, linaro-mm-sig, linux-i2c,
	linux-arch, linux-modules, rcu, linux-nfs, linux-rt-devel
In-Reply-To: <20251002081247.51255-29-byungchul@sk.com>

On Thu, Oct 02, 2025 at 05:12:28PM +0900, Byungchul Park wrote:
> This document describes the concept and APIs of dept.
> 
> Signed-off-by: Byungchul Park <byungchul@sk.com>
> ---
>  Documentation/dependency/dept.txt     | 735 ++++++++++++++++++++++++++
>  Documentation/dependency/dept_api.txt | 117 ++++
>  2 files changed, 852 insertions(+)
>  create mode 100644 Documentation/dependency/dept.txt
>  create mode 100644 Documentation/dependency/dept_api.txt

What about writing dept docs in reST (like the rest of kernel documentation)?

---- >8 ----
diff --git a/Documentation/dependency/dept.txt b/Documentation/locking/dept.rst
similarity index 92%
rename from Documentation/dependency/dept.txt
rename to Documentation/locking/dept.rst
index 5dd358b96734e6..7b90a0d95f0876 100644
--- a/Documentation/dependency/dept.txt
+++ b/Documentation/locking/dept.rst
@@ -8,7 +8,7 @@ How lockdep works
 
 Lockdep detects a deadlock by checking lock acquisition order. For
 example, a graph to track acquisition order built by lockdep might look
-like:
+like::
 
    A -> B -
            \
@@ -16,12 +16,12 @@ like:
            /
    C -> D -
 
-   where 'A -> B' means that acquisition A is prior to acquisition B
-   with A still held.
+where 'A -> B' means that acquisition A is prior to acquisition B
+with A still held.
 
 Lockdep keeps adding each new acquisition order into the graph in
 runtime. For example, 'E -> C' will be added when the two locks have
-been acquired in the order, E and then C. The graph will look like:
+been acquired in the order, E and then C. The graph will look like::
 
        A -> B -
                \
@@ -32,10 +32,10 @@ been acquired in the order, E and then C. The graph will look like:
    \                  /
     ------------------
 
-   where 'A -> B' means that acquisition A is prior to acquisition B
-   with A still held.
+where 'A -> B' means that acquisition A is prior to acquisition B
+with A still held.
 
-This graph contains a subgraph that demonstrates a loop like:
+This graph contains a subgraph that demonstrates a loop like::
 
                 -> E -
                /      \
@@ -67,6 +67,8 @@ mechanisms, lockdep doesn't work.
 
 Can lockdep detect the following deadlock?
 
+::
+
    context X	   context Y	   context Z
 
 		   mutex_lock A
@@ -80,6 +82,8 @@ Can lockdep detect the following deadlock?
 
 No. What about the following?
 
+::
+
    context X		   context Y
 
 			   mutex_lock A
@@ -101,7 +105,7 @@ What leads a deadlock
 ---------------------
 
 A deadlock occurs when one or multi contexts are waiting for events that
-will never happen. For example:
+will never happen. For example::
 
    context X	   context Y	   context Z
 
@@ -121,24 +125,24 @@ We call this *deadlock*.
 If an event occurrence is a prerequisite to reaching another event, we
 call it *dependency*. In this example:
 
-   Event A occurrence is a prerequisite to reaching event C.
-   Event C occurrence is a prerequisite to reaching event B.
-   Event B occurrence is a prerequisite to reaching event A.
+   * Event A occurrence is a prerequisite to reaching event C.
+   * Event C occurrence is a prerequisite to reaching event B.
+   * Event B occurrence is a prerequisite to reaching event A.
 
 In terms of dependency:
 
-   Event C depends on event A.
-   Event B depends on event C.
-   Event A depends on event B.
+   * Event C depends on event A.
+   * Event B depends on event C.
+   * Event A depends on event B.
 
-Dependency graph reflecting this example will look like:
+Dependency graph reflecting this example will look like::
 
     -> C -> A -> B -
    /                \
    \                /
     ----------------
 
-   where 'A -> B' means that event A depends on event B.
+where 'A -> B' means that event A depends on event B.
 
 A circular dependency exists. Such a circular dependency leads a
 deadlock since no waiters can have desired events triggered.
@@ -152,7 +156,7 @@ Introduce DEPT
 --------------
 
 DEPT(DEPendency Tracker) tracks wait and event instead of lock
-acquisition order so as to recognize the following situation:
+acquisition order so as to recognize the following situation::
 
    context X	   context Y	   context Z
 
@@ -165,18 +169,18 @@ acquisition order so as to recognize the following situation:
 				   event A
 
 and builds up a dependency graph in runtime that is similar to lockdep.
-The graph might look like:
+The graph might look like::
 
     -> C -> A -> B -
    /                \
    \                /
     ----------------
 
-   where 'A -> B' means that event A depends on event B.
+where 'A -> B' means that event A depends on event B.
 
 DEPT keeps adding each new dependency into the graph in runtime. For
 example, 'B -> D' will be added when event D occurrence is a
-prerequisite to reaching event B like:
+prerequisite to reaching event B like::
 
    |
    v
@@ -184,7 +188,7 @@ prerequisite to reaching event B like:
    .
    event B
 
-After the addition, the graph will look like:
+After the addition, the graph will look like::
 
                      -> D
                     /
@@ -209,6 +213,8 @@ How DEPT works
 Let's take a look how DEPT works with the 1st example in the section
 'Limitation of lockdep'.
 
+::
+
    context X	   context Y	   context Z
 
 		   mutex_lock A
@@ -220,7 +226,7 @@ Let's take a look how DEPT works with the 1st example in the section
 		   mutex_unlock A
 				   mutex_unlock A
 
-Adding comments to describe DEPT's view in terms of wait and event:
+Adding comments to describe DEPT's view in terms of wait and event::
 
    context X	   context Y	   context Z
 
@@ -248,7 +254,7 @@ Adding comments to describe DEPT's view in terms of wait and event:
 				   mutex_unlock A
 				   /* event A */
 
-Adding more supplementary comments to describe DEPT's view in detail:
+Adding more supplementary comments to describe DEPT's view in detail::
 
    context X	   context Y	   context Z
 
@@ -283,7 +289,7 @@ Adding more supplementary comments to describe DEPT's view in detail:
 				   mutex_unlock A
 				   /* event A that's been valid since 4 */
 
-Let's build up dependency graph with this example. Firstly, context X:
+Let's build up dependency graph with this example. Firstly, context X::
 
    context X
 
@@ -292,7 +298,7 @@ Let's build up dependency graph with this example. Firstly, context X:
    /* start to take into account event B's context */
    /* 2 */
 
-There are no events to create dependency. Next, context Y:
+There are no events to create dependency. Next, context Y::
 
    context Y
 
@@ -317,13 +323,13 @@ waits between 3 and the event, event B does not create dependency. For
 event A, there is a wait, folio_lock B, between 1 and the event. Which
 means event A cannot be triggered if event B does not wake up the wait.
 Therefore, we can say event A depends on event B, say, 'A -> B'. The
-graph will look like after adding the dependency:
+graph will look like after adding the dependency::
 
    A -> B
 
-   where 'A -> B' means that event A depends on event B.
+where 'A -> B' means that event A depends on event B.
 
-Lastly, context Z:
+Lastly, context Z::
 
    context Z
 
@@ -343,7 +349,7 @@ wait, mutex_lock A, between 2 and the event - remind 2 is at a very
 start and before the wait in timeline. Which means event B cannot be
 triggered if event A does not wake up the wait. Therefore, we can say
 event B depends on event A, say, 'B -> A'. The graph will look like
-after adding the dependency:
+after adding the dependency::
 
     -> A -> B -
    /           \
@@ -367,6 +373,8 @@ Interpret DEPT report
 
 The following is the example in the section 'How DEPT works'.
 
+::
+
    context X	   context Y	   context Z
 
 		   mutex_lock A
@@ -402,7 +410,7 @@ The following is the example in the section 'How DEPT works'.
 
 We can Simplify this by replacing each waiting point with [W], each
 point where its event's context starts with [S] and each event with [E].
-This example will look like after the replacement:
+This example will look like after the replacement::
 
    context X	   context Y	   context Z
 
@@ -419,6 +427,8 @@ This example will look like after the replacement:
 DEPT uses the symbols [W], [S] and [E] in its report as described above.
 The following is an example reported by DEPT for a real problem.
 
+::
+
    Link: https://lore.kernel.org/lkml/6383cde5-cf4b-facf-6e07-1378a485657d@I-love.SAKURA.ne.jp/#t
    Link: https://lore.kernel.org/lkml/1674268856-31807-1-git-send-email-byungchul.park@lge.com/
 
@@ -620,6 +630,8 @@ The following is an example reported by DEPT for a real problem.
 
 Let's take a look at the summary that is the most important part.
 
+::
+
    ---------------------------------------------------
    summary
    ---------------------------------------------------
@@ -639,7 +651,7 @@ Let's take a look at the summary that is the most important part.
    [W]: the wait blocked
    [E]: the event not reachable
 
-The summary shows the following scenario:
+The summary shows the following scenario::
 
    context A	   context B	   context ?(unknown)
 
@@ -652,7 +664,7 @@ The summary shows the following scenario:
 
    [E] unlock(&ni->ni_lock:0)
 
-Adding supplementary comments to describe DEPT's view in detail:
+Adding supplementary comments to describe DEPT's view in detail::
 
    context A	   context B	   context ?(unknown)
 
@@ -677,7 +689,7 @@ Adding supplementary comments to describe DEPT's view in detail:
    [E] unlock(&ni->ni_lock:0)
    /* event that's been valid since 2 */
 
-Let's build up dependency graph with this report. Firstly, context A:
+Let's build up dependency graph with this report. Firstly, context A::
 
    context A
 
@@ -697,13 +709,13 @@ wait, folio_lock(&f1), between 2 and the event. Which means
 unlock(&ni->ni_lock:0) is not reachable if folio_unlock(&f1) does not
 wake up the wait. Therefore, we can say unlock(&ni->ni_lock:0) depends
 on folio_unlock(&f1), say, 'unlock(&ni->ni_lock:0) -> folio_unlock(&f1)'.
-The graph will look like after adding the dependency:
+The graph will look like after adding the dependency::
 
    unlock(&ni->ni_lock:0) -> folio_unlock(&f1)
 
-   where 'A -> B' means that event A depends on event B.
+where 'A -> B' means that event A depends on event B.
 
-Secondly, context B:
+Secondly, context B::
 
    context B
 
@@ -719,14 +731,14 @@ very start and before the wait in timeline. Which means folio_unlock(&f1)
 is not reachable if unlock(&ni->ni_lock:0) does not wake up the wait.
 Therefore, we can say folio_unlock(&f1) depends on unlock(&ni->ni_lock:0),
 say, 'folio_unlock(&f1) -> unlock(&ni->ni_lock:0)'. The graph will look
-like after adding the dependency:
+like after adding the dependency::
 
     -> unlock(&ni->ni_lock:0) -> folio_unlock(&f1) -
    /                                                \
    \                                                /
     ------------------------------------------------
 
-   where 'A -> B' means that event A depends on event B.
+where 'A -> B' means that event A depends on event B.
 
 A new loop has been created. So DEPT can report it as a deadlock! Cool!
 
diff --git a/Documentation/dependency/dept_api.txt b/Documentation/locking/dept_api.rst
similarity index 97%
rename from Documentation/dependency/dept_api.txt
rename to Documentation/locking/dept_api.rst
index 8e0d5a118a460e..96c4d65f4a9a2d 100644
--- a/Documentation/dependency/dept_api.txt
+++ b/Documentation/locking/dept_api.rst
@@ -10,6 +10,8 @@ already applied into the existing synchronization primitives e.g.
 waitqueue, swait, wait_for_completion(), dma fence and so on.  The basic
 APIs of SDT are:
 
+.. code-block:: c
+
    /*
     * After defining 'struct dept_map map', initialize the instance.
     */
@@ -27,6 +29,8 @@ APIs of SDT are:
 
 The advanced APIs of SDT are:
 
+.. code-block:: c
+
    /*
     * After defining 'struct dept_map map', initialize the instance
     * using an external key.
@@ -83,6 +87,8 @@ Do not use these APIs directly.  These are the wrappers for typical
 locks, that have been already applied into major locks internally e.g.
 spin lock, mutex, rwlock and so on.  The APIs of LDT are:
 
+.. code-block:: c
+   
    ldt_init(map, key, sub, name);
    ldt_lock(map, sub_local, try, nest, ip);
    ldt_rlock(map, sub_local, try, nest, ip, queued);
@@ -96,6 +102,8 @@ Raw APIs
 --------
 Do not use these APIs directly.  The raw APIs of dept are:
 
+.. code-block:: c
+
    dept_free_range(start, size);
    dept_map_init(map, key, sub, name);
    dept_map_reinit(map, key, sub, name);
diff --git a/Documentation/locking/index.rst b/Documentation/locking/index.rst
index 6a9ea96c8bcb70..7ec3dce7fee425 100644
--- a/Documentation/locking/index.rst
+++ b/Documentation/locking/index.rst
@@ -24,6 +24,8 @@ Locking
     percpu-rw-semaphore
     robust-futexes
     robust-futex-ABI
+    dept
+    dept_api
 
 .. only::  subproject and html
 

> +Can lockdep detect the following deadlock?
> +
> +   context X	   context Y	   context Z
> +
> +		   mutex_lock A
> +   folio_lock B
> +		   folio_lock B <- DEADLOCK
> +				   mutex_lock A <- DEADLOCK
> +				   folio_unlock B
> +		   folio_unlock B
> +		   mutex_unlock A
> +				   mutex_unlock A
> +
> +No. What about the following?
> +
> +   context X		   context Y
> +
> +			   mutex_lock A
> +   mutex_lock A <- DEADLOCK
> +			   wait_for_complete B <- DEADLOCK
> +   complete B
> +			   mutex_unlock A
> +   mutex_unlock A

Can you explain how DEPT detects deadlock on the second example above (like
the first one being described in "How DEPT works" section)?

Confused...

-- 
An old man doll... just what I always wanted! - Clara

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox