* Re: [PATCH 0/4] PCI: Add support and tests for FIXUP quirks in modules
From: Brian Norris @ 2025-09-15 18:41 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Bjorn Helgaas, Luis Chamberlain, Petr Pavlu, Daniel Gomez,
linux-pci, David Gow, Rae Moar, linux-kselftest, linux-kernel,
linux-modules, Johannes Berg, Sami Tolvanen, Richard Weinberger,
Wei Liu, Brendan Higgins, kunit-dev, Anton Ivanov, linux-um
In-Reply-To: <aMgZJgU7p57KC0DL@infradead.org>
Hi Christoph,
On Mon, Sep 15, 2025 at 06:48:22AM -0700, Christoph Hellwig wrote:
> On Fri, Sep 12, 2025 at 03:59:31PM -0700, Brian Norris wrote:
> > This series primarily adds support for DECLARE_PCI_FIXUP_*() in modules.
> > There are a few drivers that already use this, and so they are
> > presumably broken when built as modules.
>
> That's a reall bad idea, because it allows random code to insert quirks
> not even bound to the hardware they support.
I see fixups in controller drivers here:
drivers/pci/controller/dwc/pci-imx6.c
drivers/pci/controller/dwc/pci-keystone.c
drivers/pci/controller/dwc/pcie-qcom.c
drivers/pci/controller/pci-loongson.c
drivers/pci/controller/pci-tegra.c
drivers/pci/controller/pcie-iproc-bcma.c
drivers/pci/controller/pcie-iproc.c
Are any of those somehow wrong?
And if they are not wrong, then is this a good reason to disallow making
these drivers modular? (Yes, few of them are currently modular; but I
don't see why that *must* be the case.)
I agree, as with many kernel features, there are plenty of ways to use
them incorrectly. But I'm just trying to patch over one rough edge about
how to use them incorrectly, and I don't really see why it's such a bad
idea.
> So no, modules should not allow quirks, but the kernel should probably
> be nice enough to fail compilation when someone is attemping that
> instead of silently ignoring the quirks.
Sure, if consensus says we should not support this, I'd definitely like
to make this failure mode more obvious -- likely a build error.
Thanks for your thoughts,
Brian
^ permalink raw reply
* Re: [PATCH 1/4] PCI: Support FIXUP quirks in modules
From: Brian Norris @ 2025-09-15 18:34 UTC (permalink / raw)
To: Johannes Berg
Cc: Bjorn Helgaas, Luis Chamberlain, Petr Pavlu, Daniel Gomez,
linux-pci, David Gow, Rae Moar, linux-kselftest, linux-kernel,
linux-modules, Sami Tolvanen, Richard Weinberger, Wei Liu,
Brendan Higgins, kunit-dev, Anton Ivanov, linux-um,
Manivannan Sadhasivam
In-Reply-To: <8e75d6cc3847899ba8d6a0cbd0ef3ac57eabf009.camel@sipsolutions.net>
Hi Johannes,
On Mon, Sep 15, 2025 at 08:33:08AM +0200, Johannes Berg wrote:
> On Fri, 2025-09-12 at 15:59 -0700, Brian Norris wrote:
> > The PCI framework supports "quirks" for PCI devices via several
> > DECLARE_PCI_FIXUP_*() macros. These macros allow arch or driver code to
> > match device IDs to provide customizations or workarounds for broken
> > devices.
> >
> > This mechanism is generally used in code that can only be built into the
> > kernel, but there are a few occasions where this mechanism is used in
> > drivers that can be modules. For example, see commit 574f29036fce ("PCI:
> > iproc: Apply quirk_paxc_bridge() for module as well as built-in").
> >
> > The PCI fixup mechanism only works for built-in code, however, because
> > pci_fixup_device() only scans the ".pci_fixup_*" linker sections found
> > in the main kernel; it never touches modules.
> >
> > Extend the fixup approach to modules.
>
> This _feels_ a bit odd to me - what if you reload a module, should the
> fixup be done twice? Right now this was not possible in a module, which
> is a bit of a gotcha, but at least that's only one for developers, not
> for users (unless someone messes up and puts it into modular code, as in
> the example you gave.)
My assumption was that FIXUPs in modules are only legitimate if they
apply to a dependency chain that involves the module they are built
into. So for example, the fixup could apply to a bridge that is
supported only by the module (driver) in question; or it could apply
only to devices that sit under the controller in question [1].
Everything I see that could potentially be in a module works like this
AFAICT.
To answer your question: no, the fixup should not be done twice, unless
the device is removed and recreated. More below.
[1] The quirks in drivers/pci/controller/dwc/pci-keystone.c look like
this. (Side note: pci-keystone.c cannot be built as a module today.)
> Although, come to think of it, you don't even apply the fixup when the
> module is loaded, so what I just wrote isn't really true. That almost
> seems like an oversight though, now the module has to be loaded before
> the PCI device is enumerated, which is unlikely to happen in practice?
> But then we get the next gotcha - the device is already enumerated, so
> the fixups cannot be applied at the various enumeration stages, and
> you're back to having to load the module before PCI enumeration, which
> could be tricky, or somehow forcing re-enumeration of a given device
> from userspace, but then you're firmly in "gotcha for the user"
> territory again ...
With my assumption above, none of this would really be needed. The
relevant device(s) will only exist after the module is loaded, and they
will go away when the module is gone.
Or am I misreading your problem statements?
> I don't really have any skin in this game, but overall I'd probably
> argue it's better to occasionally have to fix things such as in the
> commit you point out but have a predictable system, than apply things
> from modules.
FWIW, I believe some folks are working on making *more* controller
drivers modular. So this problem will bite more people. (Specifically, I
believe Manivannan was working on
drivers/pci/controller/dwc/pcie-qcom.c, and it has plenty of FIXUPs.)
I also don't think it makes things much less predictable, as long as
developers abide by my above assumption. I think that's a perfectly
reasonable assumption (it's not so different than, say,
MODULE_DEVICE_TABLE), but I could perhaps be convinced otherwise.
> Perhaps it'd be better to extend the section checking infrastructure to
> catch and error out on these sections in modules instead, so we catch it
> at build time, rather than finding things missing at runtime?
Maybe I'm missing something here, but it seems like it'd be pretty easy
to do something like:
#ifdef MODULE
#define DECLARE_PCI_FIXUP_SECTION...) BUILD_BUG()
#else
... real definitions ...
#endif
I'd prefer not doing this though, if we can help it, since I believe
(a) FIXUPs are useful in perfectly reasonable ways for controller
drivers and
(b) controller drivers can potentially be modules (yes, there are some
pitfalls besides $subject).
> And yeah, now I've totally ignored the kunit angle, but ... not sure how
> to combine the two requirements if they are, as I think, conflicting.
Right, either we support FIXUPs in modules, or we should outlaw them.
For kunit: we could still add tests, but just force them to be built-in.
It wouldn't be the first kernel subsystem to need that.
Brian
^ permalink raw reply
* Re: [PATCH v7 3/8] kbuild: keep .modinfo section in vmlinux.unstripped
From: Nicolas Schier @ 2025-09-15 5:56 UTC (permalink / raw)
To: Alexey Gladkov
Cc: Nathan Chancellor, Petr Pavlu, Luis Chamberlain, Sami Tolvanen,
Daniel Gomez, linux-kernel, linux-modules, linux-kbuild,
Masahiro Yamada
In-Reply-To: <4d53c72293d88b663257a0d723ebf3473a08b374.1755535876.git.legion@kernel.org>
On Mon, Aug 18, 2025 at 06:54:57PM +0200, Alexey Gladkov wrote:
> From: Masahiro Yamada <masahiroy@kernel.org>
>
> Keep the .modinfo section during linking, but strip it from the final
> vmlinux.
>
> Adjust scripts/mksysmap to exclude modinfo symbols from kallsyms.
>
> This change will allow the next commit to extract the .modinfo section
> from the vmlinux.unstripped intermediate.
>
> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
> ---
> include/asm-generic/vmlinux.lds.h | 2 +-
> scripts/Makefile.vmlinux | 2 +-
> scripts/mksysmap | 3 +++
> 3 files changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> index ae2d2359b79e9..cfa63860dfd4c 100644
> --- a/include/asm-generic/vmlinux.lds.h
> +++ b/include/asm-generic/vmlinux.lds.h
> @@ -831,6 +831,7 @@ defined(CONFIG_AUTOFDO_CLANG) || defined(CONFIG_PROPELLER_CLANG)
>
> /* Required sections not related to debugging. */
> #define ELF_DETAILS \
> + .modinfo : { *(.modinfo) } \
> .comment 0 : { *(.comment) } \
> .symtab 0 : { *(.symtab) } \
> .strtab 0 : { *(.strtab) } \
> @@ -1044,7 +1045,6 @@ defined(CONFIG_AUTOFDO_CLANG) || defined(CONFIG_PROPELLER_CLANG)
> *(.discard.*) \
> *(.export_symbol) \
> *(.no_trim_symbol) \
> - *(.modinfo) \
> /* ld.bfd warns about .gnu.version* even when not emitted */ \
> *(.gnu.version*) \
>
> diff --git a/scripts/Makefile.vmlinux b/scripts/Makefile.vmlinux
> index 4f2d4c3fb7372..e2ceeb9e168d4 100644
> --- a/scripts/Makefile.vmlinux
> +++ b/scripts/Makefile.vmlinux
> @@ -86,7 +86,7 @@ endif
> # vmlinux
> # ---------------------------------------------------------------------------
>
> -remove-section-y :=
> +remove-section-y := .modinfo
> remove-section-$(CONFIG_ARCH_VMLINUX_NEEDS_RELOCS) += '.rel*'
>
> quiet_cmd_strip_relocs = OBJCOPY $@
> diff --git a/scripts/mksysmap b/scripts/mksysmap
> index 3accbdb269ac7..a607a0059d119 100755
> --- a/scripts/mksysmap
> +++ b/scripts/mksysmap
> @@ -79,6 +79,9 @@
> / _SDA_BASE_$/d
> / _SDA2_BASE_$/d
>
> +# MODULE_INFO()
> +/ __UNIQUE_ID_modinfo[0-9]*$/d
> +
> # ---------------------------------------------------------------------------
> # Ignored patterns
> # (symbols that contain the pattern are ignored)
> --
> 2.50.1
>
Hi Alexey,
with this patch applied, I still get a warning from objcpy as Masahiro
and Stephen wrote [1,2]
SORTTAB vmlinux.unstripped
+ sorttable vmlinux.unstripped
+ nm -S vmlinux.unstripped
+ ./scripts/sorttable -s .tmp_vmlinux.nm-sort vmlinux.unstripped
+ is_enabled CONFIG_KALLSYMS
+ grep -q ^CONFIG_KALLSYMS=y include/config/auto.conf
+ cmp -s System.map .tmp_vmlinux2.syms
+ echo vmlinux.unstripped: ../scripts/link-vmlinux.sh
# OBJCOPY vmlinux
objcopy --remove-section=.modinfo vmlinux.unstripped vmlinux
objcopy: vmlinux.unstripped: warning: empty loadable segment detected at vaddr=0xffff8000807a0000, is this intentional?
(arm64, allnoconfig)
Kind regards,
Nicolas
[1]: https://lore.kernel.org/linux-kbuild/CAK7LNAR-gD2H6Kk-rZjo0R3weTHCGTm0a=u2tRH1WWW6Sx6=RQ@mail.gmail.com/
[2]: https://lore.kernel.org/lkml/20250730164047.7c4a731a@canb.auug.org.au/
^ permalink raw reply
* Re: [PATCH v7 6/8] modpost: Add modname to mod_device_table alias
From: Nicolas Schier @ 2025-09-14 20:07 UTC (permalink / raw)
To: Alexey Gladkov
Cc: Nathan Chancellor, Petr Pavlu, Luis Chamberlain, Sami Tolvanen,
Daniel Gomez, linux-kernel, linux-modules, linux-kbuild,
Miguel Ojeda, Andreas Hindborg, Danilo Krummrich, Alex Gaynor,
rust-for-linux
In-Reply-To: <66785b63b47878446a15bcb14a9ef42dc3bda092.1755535876.git.legion@kernel.org>
On Mon, Aug 18, 2025 at 06:55:00PM +0200, Alexey Gladkov wrote:
> At this point, if a symbol is compiled as part of the kernel,
> information about which module the symbol belongs to is lost.
>
> To save this it is possible to add the module name to the alias name.
> It's not very pretty, but it's possible for now.
>
> Cc: Miguel Ojeda <ojeda@kernel.org>
> Cc: Andreas Hindborg <a.hindborg@kernel.org>
> Cc: Danilo Krummrich <dakr@kernel.org>
> Cc: Alex Gaynor <alex.gaynor@gmail.com>
> Cc: rust-for-linux@vger.kernel.org
> Signed-off-by: Alexey Gladkov <legion@kernel.org>
> Acked-by: Danilo Krummrich <dakr@kernel.org>
> ---
> include/linux/module.h | 14 +++++++++++++-
> rust/kernel/device_id.rs | 8 ++++----
> scripts/mod/file2alias.c | 18 ++++++++++++++----
> 3 files changed, 31 insertions(+), 9 deletions(-)
>
...
> diff --git a/scripts/mod/file2alias.c b/scripts/mod/file2alias.c
> index 00586119a25b7..13021266a18f8 100644
> --- a/scripts/mod/file2alias.c
> +++ b/scripts/mod/file2alias.c
> @@ -1476,8 +1476,8 @@ void handle_moddevtable(struct module *mod, struct elf_info *info,
> {
> void *symval;
> char *zeros = NULL;
> - const char *type, *name;
> - size_t typelen;
> + const char *type, *name, *modname;
> + size_t typelen, modnamelen;
when applying the patch-set onto kbuild-next, gcc refuses to build this patch
(this possibly killing future bisecting):
../scripts/mod/file2alias.c: In function ‘handle_moddevtable’:
../scripts/mod/file2alias.c:1480:25: error: variable ‘modnamelen’ set but not used [-Werror=unused-but-set-variable]
1480 | size_t typelen, modnamelen;
| ^~~~~~~~~~
cc1: all warnings being treated as errors
make[2]: *** [../scripts/Makefile.host:131: scripts/mod/file2alias.o] Error
(-Werror is on for userprogs, since commit
27758d8c2583d10472b745a43ff86fef96c11ef7)
Introduction of modnamelen has to be moved to the next patch.
Kind regards
Nicolas
^ permalink raw reply
* Re: [RFC PATCH 00/10] scalable symbol flags with __kflagstab
From: Sid Nayyar @ 2025-09-15 15:53 UTC (permalink / raw)
To: Petr Pavlu
Cc: Nathan Chancellor, Luis Chamberlain, Sami Tolvanen,
Nicolas Schier, Arnd Bergmann, linux-kbuild, linux-arch,
linux-modules, linux-kernel, Giuliano Procida,
Matthias Männich
In-Reply-To: <409ddefc-24f8-465c-8872-17dc585626a6@suse.com>
On Mon, Sep 8, 2025 at 11:09 AM Petr Pavlu <petr.pavlu@suse.com> wrote:
> This sounds reasonable to me. Do you have any numbers on hand that would
> show the impact of extending __ksymtab?
I did performance analysis for module loading. The kflagstab
optimizes symbol search, which accounts for less than 2% of the
average module load time. Therefore, this change does not translate
into any meaningful gains (or losses) in module loading performance.
On the binary size side, the on-disk size for vmlinux is somewhat
inflated due to extra entries in .symtab and .strtab. Since these
sections are not part of the final Image, the only increase in the
in-memory size of the kernel is for the kflagstab itself. This new
table occupies 1 byte for each symbol in the ksymtab.
> > The Android Common Kernel source is compiled into what we call
> > GKI (Generic Kernel Image), which consists of a kernel and a
> > number of modules. We maintain a stable interface (based on CRCs and
> > types) between the GKI components and vendor-specific modules
> > (compiled by device manufacturers, e.g., for hardware-specific
> > drivers) for the lifetime of a given GKI version.
> >
> > This interface is intentionally restricted to the minimal set of
> > symbols required by the union of all vendor modules; our partners
> > declare their requirements in symbol lists. Any additions to these
> > lists are reviewed to ensure kernel internals are not overly exposed.
> > For example, we restrict drivers from having the ability to open and
> > read arbitrary files. This ABI boundary also allows us to evolve
> > internal kernel types that are not exposed to vendor modules, for
> > example, when a security fix requires a type to change.
> >
> > The mechanism we use for this is CONFIG_TRIM_UNUSED_KSYMS and
> > CONFIG_UNUSED_KSYMS_WHITELIST. This results in a ksymtab
> > containing two kinds of exported symbols: those explicitly required
> > by vendors ("vendor-listed") and those only required by GKI modules
> > ("GKI use only").
> >
> > On top of this, we have implemented symbol import protection
> > (covered in patches 9/10 and 10/10). This feature prevents vendor
> > modules from using symbols that are not on the vendor-listed
> > whitelist. It is built on top of CONFIG_MODULE_SIG. GKI modules are
> > signed with a specific key, while vendor modules are unsigned and thus
> > treated as untrusted. This distinction allows signed GKI modules to
> > use any symbol in the ksymtab, while unsigned vendor modules can only
> > access the declared subset. This provides a significant layer of
> > defense and security against potentially exploitable vendor module
> > code.
>
> If I understand correctly, this is similar to the recently introduced
> EXPORT_SYMBOL_FOR_MODULES() macro, but with a coarser boundary.
>
> I think that if the goal is to control the kABI scope and limit the use
> of certain symbols only to GKI modules, then having the protection
> depend on whether the module is signed is somewhat odd. It doesn't give
> me much confidence if vendor modules are unsigned in the Android
> ecosystem. I would expect that you want to improve this in the long
> term.
GKI modules are the only modules built in the same Kbuild as the
kernel image, which Google builds and provides to partners. In
contrast, vendor modules are built and packaged entirely by partners.
Google signs GKI modules with ephemeral keys. Since partners do
not have these keys, vendor modules are treated as unsigned by
the kernel.
To ensure the authenticity of these unsigned modules, partners
package them into a separate image that becomes one of the boot
partitions. This entire image is signed, and its signature is
verified by the bootloader at boot time.
> It would then make more sense to me if the protection was determined by
> whether the module is in-tree (the "intree" flag in modinfo) or,
> alternatively, if it is signed by a built-in trusted key. I feel this
> way the feature could be potentially useful for other distributions that
> care about the kABI scope and have ecosystems where vendor modules are
> properly signed with some key. However, I'm not sure if this would still
> work in your case.
Partners can produce both in-tree and out-of-tree modules. We do not
trust either type regarding symbol exposure, as there is no way to know
exactly what sources were used. Furthermore, symbols exported via
EXPORT_SYMBOL_FOR_MODULES can be accessed by any vendor module that
mimics the GKI module name.
Therefore, neither the in-tree flag nor the EXPORT_SYMBOL_FOR_MODULES
mechanism provides a strong enough guarantee for the Android kernel to
identify GKI modules.
Only module signatures are sufficient to allow a module to access the
full set of exported symbols. Unsigned vendor modules may only access
the symbol subset declared ahead of time by partners.
In case such symbol protection is not useful for the Linux community, I
am happy to keep this as an Android-specific feature. However, I would
urge you to at least accept the kflagstab, as it allows us (and
potentially other Linux distributions) to easily introduce additional
flags for symbols. It is also a simplification/clean-up of the module
loader code.
--
Thanks,
Siddharth Nayyar
^ permalink raw reply
* Re: [PATCH 0/4] PCI: Add support and tests for FIXUP quirks in modules
From: Christoph Hellwig @ 2025-09-15 13:48 UTC (permalink / raw)
To: Brian Norris
Cc: Bjorn Helgaas, Luis Chamberlain, Petr Pavlu, Daniel Gomez,
linux-pci, David Gow, Rae Moar, linux-kselftest, linux-kernel,
linux-modules, Johannes Berg, Sami Tolvanen, Richard Weinberger,
Wei Liu, Brendan Higgins, kunit-dev, Anton Ivanov, linux-um
In-Reply-To: <20250912230208.967129-1-briannorris@chromium.org>
On Fri, Sep 12, 2025 at 03:59:31PM -0700, Brian Norris wrote:
> This series primarily adds support for DECLARE_PCI_FIXUP_*() in modules.
> There are a few drivers that already use this, and so they are
> presumably broken when built as modules.
That's a reall bad idea, because it allows random code to insert quirks
not even bound to the hardware they support.
So no, modules should not allow quirks, but the kernel should probably
be nice enough to fail compilation when someone is attemping that
instead of silently ignoring the quirks.
^ permalink raw reply
* Re: [PATCH v2] kernel/module: avoid panic when loading corrupt module
From: Petr Pavlu @ 2025-09-15 12:27 UTC (permalink / raw)
To: Daniel v. Kirschten
Cc: mcgrof, Daniel Gomez, Sami Tolvanen, linux-modules, linux-kernel
In-Reply-To: <d1d1756a-2f6d-4b81-bd6d-50ddf7f39996@gmail.com>
On 9/9/25 6:46 PM, Daniel v. Kirschten wrote:
> If the kernel attempts loading a corrupted module where the
> .gnu.linkonce.this_module section is not marked as WRITE,
> the corresponding this_module struct will be remapped read-only
> in the module loading process. This causes a kernel panic later -
> specifically the first time that struct is being written to after the remap.
> (Currently, this happens in complete_formation at kernel/module/main.c:3265,
> when the module state is set to COMING,
> but this doesn't really matter and of course might also change in the future.)
>
> This panic also causes problems down the line:
> after this panic has occurred, all further attempts
> to add or remove modules will freeze the process attempting to do so.
> I did not investigate this further.
>
> The kernel's module building toolchain will not produce such module files.
> However, there's only a single bit difference on-disk
> between a correct module file and one which triggers this panic.
> Also, there are modules which aren't produced by the kernel's module toolchain.
> (Yes, we don't necessarily need to fully support such modules,
> but we shouldn't panic when loading them either.)
>
> Note that from a security point of view, this bug is irrelevant:
> the problematic remap of this_module as readonly
> only happens after all security checks have already passed.
> If an attacker is in the position to insert arbitrary modules into the kernel,
> then it doesn't matter anymore that it's possible to cause a panic too.
> And even in the hypothetical scenario where an attacker
> can trigger this panic but can't insert custom modules,
> the worst that could happen is a DoS
> caused by future module insertion / removal attempts.
>
> Signed-off-by: Daniel Kirschten <danielkirschten@gmail.com>
Nits:
I suggest making the subject of the patch more specific to avoid
potential confusion. For instance:
module: Check that .gnu.linkonce.this_module is writable
I think the description could also be a bit more straightforward and the
text should ideally be reflown to a 75-column boundary (see
Documentation/process/submitting-patches.rst).
> ---
>
> I hope that the wording is clear enough now about this not being a security bug.
> What do you think?
>
> In my first submisison of this patch (on 06/06/2024),
> I was told to add this check to userspace kmod too.
> If I find the time, I will do so, but I think that won't help as much
> because there are of course other ways for userspace to load a module,
> such as any alternative insmod implementation (for example busybox).
>
> Regarding your "next-level university assignment":
> I feel knowing whether the kernel toolchain can or cannot produce such modules
> is a bit beside the point: _if_ such a module is encountered,
> the kernel's going to panic, and it's not going to care where the module came from.
> Also I'm a bit stumped by your wording "university assignment" here.
> Still, I recognize that it would be goot to be certain
> that the official tools don't produce broken modules.
> So, I debugged the module build system a bit and found out the following:
>
> add_header in scripts/mod/modpost.c:1834-1843 is responsible
> for arranging for the .gnu.linkonce.this_module section to exist:
> The C code this function emits defines the symbol __this_module
> with two attributes: `visibility("default")` and `section(".gnu.linkonce.this_module")`.
> In particular, __this_module is not marked const or anything similar,
> and there definitely is no (supported) way
> for the user to add custom modifiers to this symbol.
> When gcc compiles that file, the resulting section is marked WRITE and ALLOC.
> This seems to be default behaviour of gcc / ld,
> but I couldn't find explicit documentation about this.
> I even tried digging in gcc's source code to find hard proof,
> but as expected gcc turns out to be quite convoluted, so eventually I gave up.
>
> kernel/module/main.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/kernel/module/main.c b/kernel/module/main.c
> index c66b26184936..c415c58b9462 100644
> --- a/kernel/module/main.c
> +++ b/kernel/module/main.c
> @@ -2092,6 +2092,12 @@ static int elf_validity_cache_index_mod(struct load_info *info)
> return -ENOEXEC;
> }
>
> + if (!(shdr->sh_flags & SHF_WRITE)) {
> + pr_err("module %s: .gnu.linkonce.this_module must be writable\n",
> + info->name ?: "(missing .modinfo section or name field)");
> + return -ENOEXEC;
> + }
> +
> if (shdr->sh_size != sizeof(struct module)) {
> pr_err("module %s: .gnu.linkonce.this_module section size must match the kernel's built struct module size at run time\n",
> info->name ?: "(missing .modinfo section or name field)");
This looks ok to me and in line with other checks in
elf_validity_cache_index_mod(), although we should be careful about the
number of ELF sanity checks performed by the module loader.
I think the important part is that the module loader should always at
least get through the signature and blacklist checks without crashing
due to a corrupted ELF file. After that point, the module content is
to be trusted, but we try to error out for most issues that would cause
problems later on. The added check is useful for the latter.
--
Thanks,
Petr
^ permalink raw reply
* Re: [PATCH 2/4] PCI: Add KUnit tests for FIXUP quirks
From: Tzung-Bi Shih @ 2025-09-15 8:06 UTC (permalink / raw)
To: Brian Norris
Cc: Bjorn Helgaas, Luis Chamberlain, Petr Pavlu, Daniel Gomez,
linux-pci, David Gow, Rae Moar, linux-kselftest, linux-kernel,
linux-modules, Johannes Berg, Sami Tolvanen, Richard Weinberger,
Wei Liu, Brendan Higgins, kunit-dev, Anton Ivanov, linux-um
In-Reply-To: <20250912230208.967129-3-briannorris@chromium.org>
On Fri, Sep 12, 2025 at 03:59:33PM -0700, Brian Norris wrote:
> +static int test_config_read(struct pci_bus *bus, unsigned int devfn, int where,
> + int size, u32 *val)
> +{
> + if (PCI_SLOT(devfn) > 0)
> + return PCIBIOS_DEVICE_NOT_FOUND;
> +
> + if (where + size > TEST_CONF_SIZE)
> + return PCIBIOS_BUFFER_TOO_SMALL;
> +
> + if (size == 1)
> + *val = test_readb(test_conf_space + where);
> + else if (size == 2)
> + *val = test_readw(test_conf_space + where);
> + else if (size == 4)
> + *val = test_readl(test_conf_space + where);
> +
> + return PCIBIOS_SUCCESSFUL;
To handle cases where size might be a value other than {1, 2, 4}, would a
switch statement with a default case be more robust here?
> +static int test_config_write(struct pci_bus *bus, unsigned int devfn, int where,
> + int size, u32 val)
> +{
> + if (PCI_SLOT(devfn) > 0)
> + return PCIBIOS_DEVICE_NOT_FOUND;
> +
> + if (where + size > TEST_CONF_SIZE)
> + return PCIBIOS_BUFFER_TOO_SMALL;
> +
> + if (size == 1)
> + test_writeb(test_conf_space + where, val);
> + else if (size == 2)
> + test_writew(test_conf_space + where, val);
> + else if (size == 4)
> + test_writel(test_conf_space + where, val);
> +
> + return PCIBIOS_SUCCESSFUL;
Same here.
> +static struct pci_dev *hook_device_early;
> +static struct pci_dev *hook_device_header;
> +static struct pci_dev *hook_device_final;
> +static struct pci_dev *hook_device_enable;
> +
> +static void pci_fixup_early_hook(struct pci_dev *pdev)
> +{
> + hook_device_early = pdev;
> +}
> +DECLARE_PCI_FIXUP_EARLY(TEST_VENDOR_ID, TEST_DEVICE_ID, pci_fixup_early_hook);
> [...]
> +static int pci_fixup_test_init(struct kunit *test)
> +{
> + hook_device_early = NULL;
> + hook_device_header = NULL;
> + hook_device_final = NULL;
> + hook_device_enable = NULL;
> +
> + return 0;
> +}
FWIW: if the probe is synchronous and the thread is the same task_struct,
the module level variables can be eliminated by using:
test->priv = kunit_kzalloc(...);
KUNIT_ASSERT_PTR_NE(...);
And in the hooks, kunit_get_current_test() returns the struct kunit *.
> +static void pci_fixup_match_test(struct kunit *test)
> +{
> + struct device *dev = kunit_device_register(test, DEVICE_NAME);
> +
> + KUNIT_ASSERT_PTR_NE(test, NULL, dev);
> +
> + test_conf_space = kunit_kzalloc(test, TEST_CONF_SIZE, GFP_KERNEL);
> + KUNIT_ASSERT_PTR_NE(test, NULL, test_conf_space);
The common initialization code can be moved to pci_fixup_test_init().
> + struct pci_host_bridge *bridge = devm_pci_alloc_host_bridge(dev, 0);
> +
> + KUNIT_ASSERT_PTR_NE(test, NULL, bridge);
> + bridge->ops = &test_ops;
The `bridge` allocation can be moved to .init() too.
> + KUNIT_EXPECT_PTR_EQ(test, NULL, hook_device_early);
> + KUNIT_EXPECT_PTR_EQ(test, NULL, hook_device_header);
> + KUNIT_EXPECT_PTR_EQ(test, NULL, hook_device_final);
> + KUNIT_EXPECT_PTR_EQ(test, NULL, hook_device_enable);
Does it really need to check them? They are just initialized by .init().
^ permalink raw reply
* Re: [PATCH 1/4] PCI: Support FIXUP quirks in modules
From: Johannes Berg @ 2025-09-15 6:33 UTC (permalink / raw)
To: Brian Norris, Bjorn Helgaas, Luis Chamberlain, Petr Pavlu,
Daniel Gomez
Cc: linux-pci, David Gow, Rae Moar, linux-kselftest, linux-kernel,
linux-modules, Sami Tolvanen, Richard Weinberger, Wei Liu,
Brendan Higgins, kunit-dev, Anton Ivanov, linux-um
In-Reply-To: <20250912230208.967129-2-briannorris@chromium.org>
On Fri, 2025-09-12 at 15:59 -0700, Brian Norris wrote:
> The PCI framework supports "quirks" for PCI devices via several
> DECLARE_PCI_FIXUP_*() macros. These macros allow arch or driver code to
> match device IDs to provide customizations or workarounds for broken
> devices.
>
> This mechanism is generally used in code that can only be built into the
> kernel, but there are a few occasions where this mechanism is used in
> drivers that can be modules. For example, see commit 574f29036fce ("PCI:
> iproc: Apply quirk_paxc_bridge() for module as well as built-in").
>
> The PCI fixup mechanism only works for built-in code, however, because
> pci_fixup_device() only scans the ".pci_fixup_*" linker sections found
> in the main kernel; it never touches modules.
>
> Extend the fixup approach to modules.
This _feels_ a bit odd to me - what if you reload a module, should the
fixup be done twice? Right now this was not possible in a module, which
is a bit of a gotcha, but at least that's only one for developers, not
for users (unless someone messes up and puts it into modular code, as in
the example you gave.)
Although, come to think of it, you don't even apply the fixup when the
module is loaded, so what I just wrote isn't really true. That almost
seems like an oversight though, now the module has to be loaded before
the PCI device is enumerated, which is unlikely to happen in practice?
But then we get the next gotcha - the device is already enumerated, so
the fixups cannot be applied at the various enumeration stages, and
you're back to having to load the module before PCI enumeration, which
could be tricky, or somehow forcing re-enumeration of a given device
from userspace, but then you're firmly in "gotcha for the user"
territory again ...
I don't really have any skin in this game, but overall I'd probably
argue it's better to occasionally have to fix things such as in the
commit you point out but have a predictable system, than apply things
from modules.
Perhaps it'd be better to extend the section checking infrastructure to
catch and error out on these sections in modules instead, so we catch it
at build time, rather than finding things missing at runtime?
And yeah, now I've totally ignored the kunit angle, but ... not sure how
to combine the two requirements if they are, as I think, conflicting.
johannes
^ permalink raw reply
* [PATCH 4/4] kunit: qemu_configs: Add PCI to arm, arm64
From: Brian Norris @ 2025-09-12 22:59 UTC (permalink / raw)
To: Bjorn Helgaas, Luis Chamberlain, Petr Pavlu, Daniel Gomez
Cc: linux-pci, David Gow, Rae Moar, linux-kselftest, linux-kernel,
linux-modules, Johannes Berg, Sami Tolvanen, Richard Weinberger,
Wei Liu, Brendan Higgins, kunit-dev, Anton Ivanov, linux-um,
Brian Norris
In-Reply-To: <20250912230208.967129-1-briannorris@chromium.org>
To get some more test coverage on PCI tests.
Signed-off-by: Brian Norris <briannorris@chromium.org>
---
tools/testing/kunit/qemu_configs/arm.py | 1 +
tools/testing/kunit/qemu_configs/arm64.py | 1 +
2 files changed, 2 insertions(+)
diff --git a/tools/testing/kunit/qemu_configs/arm.py b/tools/testing/kunit/qemu_configs/arm.py
index db2160200566..101d67e5157c 100644
--- a/tools/testing/kunit/qemu_configs/arm.py
+++ b/tools/testing/kunit/qemu_configs/arm.py
@@ -3,6 +3,7 @@ from ..qemu_config import QemuArchParams
QEMU_ARCH = QemuArchParams(linux_arch='arm',
kconfig='''
CONFIG_ARCH_VIRT=y
+CONFIG_PCI=y
CONFIG_SERIAL_AMBA_PL010=y
CONFIG_SERIAL_AMBA_PL010_CONSOLE=y
CONFIG_SERIAL_AMBA_PL011=y
diff --git a/tools/testing/kunit/qemu_configs/arm64.py b/tools/testing/kunit/qemu_configs/arm64.py
index 5c44d3a87e6d..ba2b4e660ba7 100644
--- a/tools/testing/kunit/qemu_configs/arm64.py
+++ b/tools/testing/kunit/qemu_configs/arm64.py
@@ -2,6 +2,7 @@ from ..qemu_config import QemuArchParams
QEMU_ARCH = QemuArchParams(linux_arch='arm64',
kconfig='''
+CONFIG_PCI=y
CONFIG_SERIAL_AMBA_PL010=y
CONFIG_SERIAL_AMBA_PL010_CONSOLE=y
CONFIG_SERIAL_AMBA_PL011=y
--
2.51.0.384.g4c02a37b29-goog
^ permalink raw reply related
* [PATCH 3/4] um: Select PCI_DOMAINS_GENERIC
From: Brian Norris @ 2025-09-12 22:59 UTC (permalink / raw)
To: Bjorn Helgaas, Luis Chamberlain, Petr Pavlu, Daniel Gomez
Cc: linux-pci, David Gow, Rae Moar, linux-kselftest, linux-kernel,
linux-modules, Johannes Berg, Sami Tolvanen, Richard Weinberger,
Wei Liu, Brendan Higgins, kunit-dev, Anton Ivanov, linux-um,
Brian Norris
In-Reply-To: <20250912230208.967129-1-briannorris@chromium.org>
This is useful especially for KUnit tests, where we may want to
dynamically add/remove PCI domains.
Signed-off-by: Brian Norris <briannorris@chromium.org>
---
arch/um/Kconfig | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/um/Kconfig b/arch/um/Kconfig
index 9083bfdb7735..7fccd63c3229 100644
--- a/arch/um/Kconfig
+++ b/arch/um/Kconfig
@@ -38,6 +38,7 @@ config UML
select HAVE_ARCH_TRACEHOOK
select HAVE_SYSCALL_TRACEPOINTS
select THREAD_INFO_IN_TASK
+ select PCI_DOMAINS_GENERIC if PCI
config MMU
bool
--
2.51.0.384.g4c02a37b29-goog
^ permalink raw reply related
* [PATCH 2/4] PCI: Add KUnit tests for FIXUP quirks
From: Brian Norris @ 2025-09-12 22:59 UTC (permalink / raw)
To: Bjorn Helgaas, Luis Chamberlain, Petr Pavlu, Daniel Gomez
Cc: linux-pci, David Gow, Rae Moar, linux-kselftest, linux-kernel,
linux-modules, Johannes Berg, Sami Tolvanen, Richard Weinberger,
Wei Liu, Brendan Higgins, kunit-dev, Anton Ivanov, linux-um,
Brian Norris
In-Reply-To: <20250912230208.967129-1-briannorris@chromium.org>
The PCI framework supports device quirks via a series of macros and
linker sections. This support previously did not work when used in
modules. Add a basic set of tests for matching/non-matching devices.
Example run:
$ ./tools/testing/kunit/kunit.py run 'pci_fixup*'
[...]
[15:31:30] ============ pci_fixup_test_cases (2 subtests) =============
[15:31:30] [PASSED] pci_fixup_match_test
[15:31:30] [PASSED] pci_fixup_nomatch_test
[15:31:30] ============== [PASSED] pci_fixup_test_cases ===============
[15:31:30] ============================================================
[15:31:30] Testing complete. Ran 2 tests: passed: 2
[15:31:30] Elapsed time: 11.197s total, 0.001s configuring, 9.870s building, 1.299s running
Signed-off-by: Brian Norris <briannorris@chromium.org>
---
drivers/pci/Kconfig | 11 +++
drivers/pci/Makefile | 1 +
drivers/pci/fixup-test.c | 197 +++++++++++++++++++++++++++++++++++++++
3 files changed, 209 insertions(+)
create mode 100644 drivers/pci/fixup-test.c
diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
index 9a249c65aedc..a4fa9be797e7 100644
--- a/drivers/pci/Kconfig
+++ b/drivers/pci/Kconfig
@@ -68,6 +68,17 @@ config PCI_QUIRKS
Disable this only if your target machine is unaffected by PCI
quirks.
+config PCI_FIXUP_KUNIT_TEST
+ tristate "KUnit tests for PCI fixup code" if !KUNIT_ALL_TESTS
+ depends on KUNIT
+ depends on PCI_DOMAINS_GENERIC
+ default KUNIT_ALL_TESTS
+ help
+ This builds unit tests for the PCI quirk/fixup framework. Recommended
+ only for kernel developers.
+
+ When in doubt, say N.
+
config PCI_DEBUG
bool "PCI Debugging"
depends on DEBUG_KERNEL
diff --git a/drivers/pci/Makefile b/drivers/pci/Makefile
index 67647f1880fb..ade400250ceb 100644
--- a/drivers/pci/Makefile
+++ b/drivers/pci/Makefile
@@ -20,6 +20,7 @@ endif
obj-$(CONFIG_OF) += of.o
obj-$(CONFIG_PCI_QUIRKS) += quirks.o
+obj-$(CONFIG_PCI_FIXUP_KUNIT_TEST) += fixup-test.o
obj-$(CONFIG_HOTPLUG_PCI) += hotplug/
obj-$(CONFIG_PCI_ATS) += ats.o
obj-$(CONFIG_PCI_IOV) += iov.o
diff --git a/drivers/pci/fixup-test.c b/drivers/pci/fixup-test.c
new file mode 100644
index 000000000000..54b895fc8f3e
--- /dev/null
+++ b/drivers/pci/fixup-test.c
@@ -0,0 +1,197 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2025 Google, Inc.
+ */
+
+#include <kunit/device.h>
+#include <kunit/test.h>
+#include <linux/cleanup.h>
+#include <linux/pci.h>
+
+#define DEVICE_NAME "pci_fixup_test_device"
+#define TEST_VENDOR_ID 0xdead
+#define TEST_DEVICE_ID 0xbeef
+
+#define TEST_CONF_SIZE 4096
+static u8 *test_conf_space;
+
+#define test_readb(addr) (*(u8 *)(addr))
+#define test_readw(addr) le16_to_cpu(*((__force __le16 *)(addr)))
+#define test_readl(addr) le32_to_cpu(*((__force __le32 *)(addr)))
+#define test_writeb(addr, v) (*(u8 *)(addr) = (v))
+#define test_writew(addr, v) (*((__force __le16 *)(addr)) = cpu_to_le16(v))
+#define test_writel(addr, v) (*((__force __le32 *)(addr)) = cpu_to_le32(v))
+
+static int test_config_read(struct pci_bus *bus, unsigned int devfn, int where,
+ int size, u32 *val)
+{
+ if (PCI_SLOT(devfn) > 0)
+ return PCIBIOS_DEVICE_NOT_FOUND;
+
+ if (where + size > TEST_CONF_SIZE)
+ return PCIBIOS_BUFFER_TOO_SMALL;
+
+ if (size == 1)
+ *val = test_readb(test_conf_space + where);
+ else if (size == 2)
+ *val = test_readw(test_conf_space + where);
+ else if (size == 4)
+ *val = test_readl(test_conf_space + where);
+
+ return PCIBIOS_SUCCESSFUL;
+}
+
+static int test_config_write(struct pci_bus *bus, unsigned int devfn, int where,
+ int size, u32 val)
+{
+ if (PCI_SLOT(devfn) > 0)
+ return PCIBIOS_DEVICE_NOT_FOUND;
+
+ if (where + size > TEST_CONF_SIZE)
+ return PCIBIOS_BUFFER_TOO_SMALL;
+
+ if (size == 1)
+ test_writeb(test_conf_space + where, val);
+ else if (size == 2)
+ test_writew(test_conf_space + where, val);
+ else if (size == 4)
+ test_writel(test_conf_space + where, val);
+
+ return PCIBIOS_SUCCESSFUL;
+}
+
+static struct pci_ops test_ops = {
+ .read = test_config_read,
+ .write = test_config_write,
+};
+
+static struct pci_dev *hook_device_early;
+static struct pci_dev *hook_device_header;
+static struct pci_dev *hook_device_final;
+static struct pci_dev *hook_device_enable;
+
+static void pci_fixup_early_hook(struct pci_dev *pdev)
+{
+ hook_device_early = pdev;
+}
+DECLARE_PCI_FIXUP_EARLY(TEST_VENDOR_ID, TEST_DEVICE_ID, pci_fixup_early_hook);
+
+static void pci_fixup_header_hook(struct pci_dev *pdev)
+{
+ hook_device_header = pdev;
+}
+DECLARE_PCI_FIXUP_HEADER(TEST_VENDOR_ID, TEST_DEVICE_ID, pci_fixup_header_hook);
+
+static void pci_fixup_final_hook(struct pci_dev *pdev)
+{
+ hook_device_final = pdev;
+}
+DECLARE_PCI_FIXUP_FINAL(TEST_VENDOR_ID, TEST_DEVICE_ID, pci_fixup_final_hook);
+
+static void pci_fixup_enable_hook(struct pci_dev *pdev)
+{
+ hook_device_enable = pdev;
+}
+DECLARE_PCI_FIXUP_ENABLE(TEST_VENDOR_ID, TEST_DEVICE_ID, pci_fixup_enable_hook);
+
+static int pci_fixup_test_init(struct kunit *test)
+{
+ hook_device_early = NULL;
+ hook_device_header = NULL;
+ hook_device_final = NULL;
+ hook_device_enable = NULL;
+
+ return 0;
+}
+
+static void pci_fixup_match_test(struct kunit *test)
+{
+ struct device *dev = kunit_device_register(test, DEVICE_NAME);
+
+ KUNIT_ASSERT_PTR_NE(test, NULL, dev);
+
+ test_conf_space = kunit_kzalloc(test, TEST_CONF_SIZE, GFP_KERNEL);
+ KUNIT_ASSERT_PTR_NE(test, NULL, test_conf_space);
+
+ /* Minimal configuration space: a stub vendor and device ID */
+ test_writew(test_conf_space + PCI_VENDOR_ID, TEST_VENDOR_ID);
+ test_writew(test_conf_space + PCI_DEVICE_ID, TEST_DEVICE_ID);
+
+ struct pci_host_bridge *bridge = devm_pci_alloc_host_bridge(dev, 0);
+
+ KUNIT_ASSERT_PTR_NE(test, NULL, bridge);
+ bridge->ops = &test_ops;
+
+ KUNIT_EXPECT_PTR_EQ(test, NULL, hook_device_early);
+ KUNIT_EXPECT_PTR_EQ(test, NULL, hook_device_header);
+ KUNIT_EXPECT_PTR_EQ(test, NULL, hook_device_final);
+ KUNIT_EXPECT_PTR_EQ(test, NULL, hook_device_enable);
+
+ KUNIT_EXPECT_EQ(test, 0, pci_host_probe(bridge));
+
+ struct pci_dev *pdev __free(pci_dev_put) =
+ pci_get_slot(bridge->bus, PCI_DEVFN(0, 0));
+ KUNIT_ASSERT_PTR_NE(test, NULL, pdev);
+
+ KUNIT_EXPECT_PTR_EQ(test, pdev, hook_device_early);
+ KUNIT_EXPECT_PTR_EQ(test, pdev, hook_device_header);
+ KUNIT_EXPECT_PTR_EQ(test, pdev, hook_device_final);
+ KUNIT_EXPECT_PTR_EQ(test, NULL, hook_device_enable);
+
+ KUNIT_EXPECT_EQ(test, 0, pci_enable_device(pdev));
+ KUNIT_EXPECT_PTR_EQ(test, pdev, hook_device_enable);
+}
+
+static void pci_fixup_nomatch_test(struct kunit *test)
+{
+ struct device *dev = kunit_device_register(test, DEVICE_NAME);
+
+ KUNIT_ASSERT_PTR_NE(test, NULL, dev);
+
+ test_conf_space = kunit_kzalloc(test, TEST_CONF_SIZE, GFP_KERNEL);
+ KUNIT_ASSERT_PTR_NE(test, NULL, test_conf_space);
+
+ /* Minimal configuration space: a stub vendor and device ID */
+ test_writew(test_conf_space + PCI_VENDOR_ID, TEST_VENDOR_ID + 1);
+ test_writew(test_conf_space + PCI_DEVICE_ID, TEST_DEVICE_ID);
+
+ struct pci_host_bridge *bridge = devm_pci_alloc_host_bridge(dev, 0);
+
+ KUNIT_ASSERT_PTR_NE(test, NULL, bridge);
+ bridge->ops = &test_ops;
+
+ KUNIT_EXPECT_PTR_EQ(test, NULL, hook_device_early);
+ KUNIT_EXPECT_PTR_EQ(test, NULL, hook_device_header);
+ KUNIT_EXPECT_PTR_EQ(test, NULL, hook_device_final);
+ KUNIT_EXPECT_PTR_EQ(test, NULL, hook_device_enable);
+
+ KUNIT_EXPECT_EQ(test, 0, pci_host_probe(bridge));
+
+ struct pci_dev *pdev __free(pci_dev_put) =
+ pci_get_slot(bridge->bus, PCI_DEVFN(0, 0));
+ KUNIT_ASSERT_PTR_NE(test, NULL, pdev);
+
+ KUNIT_EXPECT_PTR_EQ(test, NULL, hook_device_early);
+ KUNIT_EXPECT_PTR_EQ(test, NULL, hook_device_header);
+ KUNIT_EXPECT_PTR_EQ(test, NULL, hook_device_final);
+ KUNIT_EXPECT_PTR_EQ(test, NULL, hook_device_enable);
+
+ KUNIT_EXPECT_EQ(test, 0, pci_enable_device(pdev));
+ KUNIT_EXPECT_PTR_EQ(test, NULL, hook_device_enable);
+}
+
+static struct kunit_case pci_fixup_test_cases[] = {
+ KUNIT_CASE(pci_fixup_match_test),
+ KUNIT_CASE(pci_fixup_nomatch_test),
+ {}
+};
+
+static struct kunit_suite pci_fixup_test_suite = {
+ .name = "pci_fixup_test_cases",
+ .test_cases = pci_fixup_test_cases,
+ .init = pci_fixup_test_init,
+};
+
+kunit_test_suite(pci_fixup_test_suite);
+MODULE_DESCRIPTION("PCI fixups unit test suite");
+MODULE_LICENSE("GPL");
--
2.51.0.384.g4c02a37b29-goog
^ permalink raw reply related
* [PATCH 1/4] PCI: Support FIXUP quirks in modules
From: Brian Norris @ 2025-09-12 22:59 UTC (permalink / raw)
To: Bjorn Helgaas, Luis Chamberlain, Petr Pavlu, Daniel Gomez
Cc: linux-pci, David Gow, Rae Moar, linux-kselftest, linux-kernel,
linux-modules, Johannes Berg, Sami Tolvanen, Richard Weinberger,
Wei Liu, Brendan Higgins, kunit-dev, Anton Ivanov, linux-um,
Brian Norris
In-Reply-To: <20250912230208.967129-1-briannorris@chromium.org>
The PCI framework supports "quirks" for PCI devices via several
DECLARE_PCI_FIXUP_*() macros. These macros allow arch or driver code to
match device IDs to provide customizations or workarounds for broken
devices.
This mechanism is generally used in code that can only be built into the
kernel, but there are a few occasions where this mechanism is used in
drivers that can be modules. For example, see commit 574f29036fce ("PCI:
iproc: Apply quirk_paxc_bridge() for module as well as built-in").
The PCI fixup mechanism only works for built-in code, however, because
pci_fixup_device() only scans the ".pci_fixup_*" linker sections found
in the main kernel; it never touches modules.
Extend the fixup approach to modules.
I don't attempt to be clever here; the algorithm here scales with the
number of modules in the system.
Signed-off-by: Brian Norris <briannorris@chromium.org>
---
drivers/pci/quirks.c | 62 ++++++++++++++++++++++++++++++++++++++++++
include/linux/module.h | 18 ++++++++++++
kernel/module/main.c | 26 ++++++++++++++++++
3 files changed, 106 insertions(+)
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index d97335a40193..db5e0ac82ed7 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -207,6 +207,62 @@ extern struct pci_fixup __end_pci_fixups_suspend_late[];
static bool pci_apply_fixup_final_quirks;
+struct pci_fixup_arg {
+ struct pci_dev *dev;
+ enum pci_fixup_pass pass;
+};
+
+static int pci_module_fixup(struct module *mod, void *parm)
+{
+ struct pci_fixup_arg *arg = parm;
+ void *start;
+ unsigned int size;
+
+ switch (arg->pass) {
+ case pci_fixup_early:
+ start = mod->pci_fixup_early;
+ size = mod->pci_fixup_early_size;
+ break;
+ case pci_fixup_header:
+ start = mod->pci_fixup_header;
+ size = mod->pci_fixup_header_size;
+ break;
+ case pci_fixup_final:
+ start = mod->pci_fixup_final;
+ size = mod->pci_fixup_final_size;
+ break;
+ case pci_fixup_enable:
+ start = mod->pci_fixup_enable;
+ size = mod->pci_fixup_enable_size;
+ break;
+ case pci_fixup_resume:
+ start = mod->pci_fixup_resume;
+ size = mod->pci_fixup_resume_size;
+ break;
+ case pci_fixup_suspend:
+ start = mod->pci_fixup_suspend;
+ size = mod->pci_fixup_suspend_size;
+ break;
+ case pci_fixup_resume_early:
+ start = mod->pci_fixup_resume_early;
+ size = mod->pci_fixup_resume_early_size;
+ break;
+ case pci_fixup_suspend_late:
+ start = mod->pci_fixup_suspend_late;
+ size = mod->pci_fixup_suspend_late_size;
+ break;
+ default:
+ return 0;
+ }
+
+ if (!size)
+ return 0;
+
+ pci_do_fixups(arg->dev, start, start + size);
+
+ return 0;
+}
+
void pci_fixup_device(enum pci_fixup_pass pass, struct pci_dev *dev)
{
struct pci_fixup *start, *end;
@@ -259,6 +315,12 @@ void pci_fixup_device(enum pci_fixup_pass pass, struct pci_dev *dev)
return;
}
pci_do_fixups(dev, start, end);
+
+ struct pci_fixup_arg arg = {
+ .dev = dev,
+ .pass = pass,
+ };
+ module_for_each_mod(pci_module_fixup, &arg);
}
EXPORT_SYMBOL(pci_fixup_device);
diff --git a/include/linux/module.h b/include/linux/module.h
index 3319a5269d28..7faa8987b9eb 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -539,6 +539,24 @@ struct module {
int num_kunit_suites;
struct kunit_suite **kunit_suites;
#endif
+#ifdef CONFIG_PCI_QUIRKS
+ void *pci_fixup_early;
+ unsigned int pci_fixup_early_size;
+ void *pci_fixup_header;
+ unsigned int pci_fixup_header_size;
+ void *pci_fixup_final;
+ unsigned int pci_fixup_final_size;
+ void *pci_fixup_enable;
+ unsigned int pci_fixup_enable_size;
+ void *pci_fixup_resume;
+ unsigned int pci_fixup_resume_size;
+ void *pci_fixup_suspend;
+ unsigned int pci_fixup_suspend_size;
+ void *pci_fixup_resume_early;
+ unsigned int pci_fixup_resume_early_size;
+ void *pci_fixup_suspend_late;
+ unsigned int pci_fixup_suspend_late_size;
+#endif
#ifdef CONFIG_LIVEPATCH
diff --git a/kernel/module/main.c b/kernel/module/main.c
index c66b26184936..50a80c875adc 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -2702,6 +2702,32 @@ static int find_module_sections(struct module *mod, struct load_info *info)
sizeof(*mod->kunit_init_suites),
&mod->num_kunit_init_suites);
#endif
+#ifdef CONFIG_PCI_QUIRKS
+ mod->pci_fixup_early = section_objs(info, ".pci_fixup_early",
+ sizeof(*mod->pci_fixup_early),
+ &mod->pci_fixup_early_size);
+ mod->pci_fixup_header = section_objs(info, ".pci_fixup_header",
+ sizeof(*mod->pci_fixup_header),
+ &mod->pci_fixup_header_size);
+ mod->pci_fixup_final = section_objs(info, ".pci_fixup_final",
+ sizeof(*mod->pci_fixup_final),
+ &mod->pci_fixup_final_size);
+ mod->pci_fixup_enable = section_objs(info, ".pci_fixup_enable",
+ sizeof(*mod->pci_fixup_enable),
+ &mod->pci_fixup_enable_size);
+ mod->pci_fixup_resume = section_objs(info, ".pci_fixup_resume",
+ sizeof(*mod->pci_fixup_resume),
+ &mod->pci_fixup_resume_size);
+ mod->pci_fixup_suspend = section_objs(info, ".pci_fixup_suspend",
+ sizeof(*mod->pci_fixup_suspend),
+ &mod->pci_fixup_suspend_size);
+ mod->pci_fixup_resume_early = section_objs(info, ".pci_fixup_resume_early",
+ sizeof(*mod->pci_fixup_resume_early),
+ &mod->pci_fixup_resume_early_size);
+ mod->pci_fixup_suspend_late = section_objs(info, ".pci_fixup_suspend_late",
+ sizeof(*mod->pci_fixup_suspend_late),
+ &mod->pci_fixup_suspend_late_size);
+#endif
mod->extable = section_objs(info, "__ex_table",
sizeof(*mod->extable), &mod->num_exentries);
--
2.51.0.384.g4c02a37b29-goog
^ permalink raw reply related
* [PATCH 0/4] PCI: Add support and tests for FIXUP quirks in modules
From: Brian Norris @ 2025-09-12 22:59 UTC (permalink / raw)
To: Bjorn Helgaas, Luis Chamberlain, Petr Pavlu, Daniel Gomez
Cc: linux-pci, David Gow, Rae Moar, linux-kselftest, linux-kernel,
linux-modules, Johannes Berg, Sami Tolvanen, Richard Weinberger,
Wei Liu, Brendan Higgins, kunit-dev, Anton Ivanov, linux-um,
Brian Norris
This series primarily adds support for DECLARE_PCI_FIXUP_*() in modules.
There are a few drivers that already use this, and so they are
presumably broken when built as modules.
While at it, I wrote some unit tests that emulate a fake PCI device, and
let the PCI framework match/not-match its vendor/device IDs. This test
can be built into the kernel or built as a module.
I also include some infrastructure changes (patch 3 and 4), so that
ARCH=um (the default for kunit.py), ARCH=arm, and ARCH=arm64 will run
these tests by default. These patches have different maintainers and are
independent, so they can probably be picked up separately. I included
them because otherwise the tests in patch 2 aren't so easy to run.
Brian Norris (4):
PCI: Support FIXUP quirks in modules
PCI: Add KUnit tests for FIXUP quirks
um: Select PCI_DOMAINS_GENERIC
kunit: qemu_configs: Add PCI to arm, arm64
arch/um/Kconfig | 1 +
drivers/pci/Kconfig | 11 ++
drivers/pci/Makefile | 1 +
drivers/pci/fixup-test.c | 197 ++++++++++++++++++++++
drivers/pci/quirks.c | 62 +++++++
include/linux/module.h | 18 ++
kernel/module/main.c | 26 +++
tools/testing/kunit/qemu_configs/arm.py | 1 +
tools/testing/kunit/qemu_configs/arm64.py | 1 +
9 files changed, 318 insertions(+)
create mode 100644 drivers/pci/fixup-test.c
--
2.51.0.384.g4c02a37b29-goog
^ permalink raw reply
* Re: [RFC] crypto: support for a standalone FIPS 140 module
From: Herbert Xu @ 2025-09-11 5:53 UTC (permalink / raw)
To: Vegard Nossum
Cc: David S. Miller, linux-crypto, Luis Chamberlain, Petr Pavlu,
Daniel Gomez, Ard Biesheuvel, Eric Biggers, Jason A . Donenfeld,
Greg Kroah-Hartman, Wang, Jay, Nicolai Stange, Vladis Dronov,
Stephan Mueller, Sami Tolvanen, linux-modules
In-Reply-To: <20250904155216.460962-1-vegard.nossum@oracle.com>
On Thu, Sep 04, 2025 at 05:50:32PM +0200, Vegard Nossum wrote:
> Hi all,
>
> This patch set adds support for building and loading a standalone FIPS
> 140 module. This is mostly useful for distributions that want to certify
> their kernel's crypto code with NIST. Please see
> Documentation/crypto/fips140.rst for more details.
>
> I apologize for the large patch series. I could have squashed
> it down to fewer commits but it would really make it harder to see
> what's going on.
Perhaps we can divide this by layer? The public key crypto sits
on top of the Crypto API, which in turns sits on top of lib/crypto.
So it would seem natural to divide this into three parts. The
code in lib/crypto can be converted without affecting anything on
top of it, and then the Crypto API could be converted.
Cheers,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply
* Re: [PATCH RFC 005/104] crypto: hide crypto_default_rng variable
From: Herbert Xu @ 2025-09-11 5:48 UTC (permalink / raw)
To: Vegard Nossum
Cc: David S. Miller, linux-crypto, Luis Chamberlain, Petr Pavlu,
Daniel Gomez, Ard Biesheuvel, Eric Biggers, Jason A . Donenfeld,
Greg Kroah-Hartman, Wang, Jay, Nicolai Stange, Vladis Dronov,
Stephan Mueller, Sami Tolvanen, linux-modules,
Sriharsha Yadagudde
In-Reply-To: <20250904155216.460962-6-vegard.nossum@oracle.com>
On Thu, Sep 04, 2025 at 05:50:37PM +0200, Vegard Nossum wrote:
>
> -int crypto_get_default_rng(void)
> +int crypto_get_default_rng(struct crypto_rng **result)
The usual way to do this is
struct crypto_rng *crypto_get_default_rng(void)
and return the error value as an ERR_PTR().
Thanks,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply
* [PATCH v2] kernel/module: avoid panic when loading corrupt module
From: Daniel v. Kirschten @ 2025-09-09 16:46 UTC (permalink / raw)
To: mcgrof; +Cc: linux-modules, linux-kernel
If the kernel attempts loading a corrupted module where the
.gnu.linkonce.this_module section is not marked as WRITE,
the corresponding this_module struct will be remapped read-only
in the module loading process. This causes a kernel panic later -
specifically the first time that struct is being written to after the remap.
(Currently, this happens in complete_formation at kernel/module/main.c:3265,
when the module state is set to COMING,
but this doesn't really matter and of course might also change in the future.)
This panic also causes problems down the line:
after this panic has occurred, all further attempts
to add or remove modules will freeze the process attempting to do so.
I did not investigate this further.
The kernel's module building toolchain will not produce such module files.
However, there's only a single bit difference on-disk
between a correct module file and one which triggers this panic.
Also, there are modules which aren't produced by the kernel's module toolchain.
(Yes, we don't necessarily need to fully support such modules,
but we shouldn't panic when loading them either.)
Note that from a security point of view, this bug is irrelevant:
the problematic remap of this_module as readonly
only happens after all security checks have already passed.
If an attacker is in the position to insert arbitrary modules into the kernel,
then it doesn't matter anymore that it's possible to cause a panic too.
And even in the hypothetical scenario where an attacker
can trigger this panic but can't insert custom modules,
the worst that could happen is a DoS
caused by future module insertion / removal attempts.
Signed-off-by: Daniel Kirschten <danielkirschten@gmail.com>
---
I hope that the wording is clear enough now about this not being a security bug.
What do you think?
In my first submisison of this patch (on 06/06/2024),
I was told to add this check to userspace kmod too.
If I find the time, I will do so, but I think that won't help as much
because there are of course other ways for userspace to load a module,
such as any alternative insmod implementation (for example busybox).
Regarding your "next-level university assignment":
I feel knowing whether the kernel toolchain can or cannot produce such modules
is a bit beside the point: _if_ such a module is encountered,
the kernel's going to panic, and it's not going to care where the module came from.
Also I'm a bit stumped by your wording "university assignment" here.
Still, I recognize that it would be goot to be certain
that the official tools don't produce broken modules.
So, I debugged the module build system a bit and found out the following:
add_header in scripts/mod/modpost.c:1834-1843 is responsible
for arranging for the .gnu.linkonce.this_module section to exist:
The C code this function emits defines the symbol __this_module
with two attributes: `visibility("default")` and `section(".gnu.linkonce.this_module")`.
In particular, __this_module is not marked const or anything similar,
and there definitely is no (supported) way
for the user to add custom modifiers to this symbol.
When gcc compiles that file, the resulting section is marked WRITE and ALLOC.
This seems to be default behaviour of gcc / ld,
but I couldn't find explicit documentation about this.
I even tried digging in gcc's source code to find hard proof,
but as expected gcc turns out to be quite convoluted, so eventually I gave up.
kernel/module/main.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/kernel/module/main.c b/kernel/module/main.c
index c66b26184936..c415c58b9462 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -2092,6 +2092,12 @@ static int elf_validity_cache_index_mod(struct load_info *info)
return -ENOEXEC;
}
+ if (!(shdr->sh_flags & SHF_WRITE)) {
+ pr_err("module %s: .gnu.linkonce.this_module must be writable\n",
+ info->name ?: "(missing .modinfo section or name field)");
+ return -ENOEXEC;
+ }
+
if (shdr->sh_size != sizeof(struct module)) {
pr_err("module %s: .gnu.linkonce.this_module section size must match the kernel's built struct module size at run time\n",
info->name ?: "(missing .modinfo section or name field)");
--
2.39.5
^ permalink raw reply related
* Re: [PATCH] module : fix signature checker pointer arithmetic and bounds check
From: Luis Chamberlain @ 2025-09-09 16:45 UTC (permalink / raw)
To: Christophe Leroy
Cc: Fidal Palamparambil, linux-modules, petr.pavlu, da.gomez,
samitolvanen, linux-kernel
In-Reply-To: <d46498e5-db21-4a79-93b4-3869be3660d2@csgroup.eu>
On Tue, Sep 09, 2025 at 11:32:27AM +0200, Christophe Leroy wrote:
>
>
> Le 05/09/2025 à 17:45, Fidal Palamparambil a écrit :
> > From: Fidal palamparambil <rootuserhere@gmail.com>
> >
> > This patch fixes :
> > - invalid module_param type (bool_enable_only → bool)
>
> Can you explain what the problem is ? Why do you say bool_enable_only is
> invalid ? Was generalised by commit d19f05d8a8fa ("kernel/params.c:
> generalize bool_enable_only")
Christophe, I will try to save you time. You can ignore the submitter's patch.
They are a troll. I recommend you add them to your ignore list.
Luis
^ permalink raw reply
* Re: [PATCH 1/1] module: replace use of system_wq with system_percpu_wq
From: Marco Crivellari @ 2025-09-09 15:03 UTC (permalink / raw)
To: Petr Pavlu
Cc: Tejun Heo, Lai Jiangshan, Frederic Weisbecker,
Sebastian Andrzej Siewior, Michal Hocko, Luis Chamberlain,
linux-kernel, linux-modules
In-Reply-To: <3e007a66-419d-4933-942e-4e5cdfb06887@suse.com>
On Tue, Sep 9, 2025 at 12:37 PM Petr Pavlu <petr.pavlu@suse.com> wrote:
Hi Petr,
> If I understand the cover letter and its linked discussion correctly,
> the aim is to eventually move users to unbound workqueues unless they
> really need to use per-CPU workqueues.
Yes, correct. This first round is just a 1:1 conversion keeping the
old behavior.
But later yes, the aim is to let per-cpu just who needs to be per-cpu.
> The two work items queued by the dups.c code can run anywhere. I don't
> see a reason why they need to be bound to a specific CPU.
> If it helps, I believe you can already update this code to use the new
> system_dfl_wq.
Cool, I will send the v2 converting directly from system_wq to system_dfl_wq.
Thank you!
--
Marco Crivellari
L3 Support Engineer, Technology & Product
marco.crivellari@suse.com
^ permalink raw reply
* Re: [PATCH 1/1] module: replace use of system_wq with system_percpu_wq
From: Petr Pavlu @ 2025-09-09 10:37 UTC (permalink / raw)
To: Marco Crivellari
Cc: Tejun Heo, Lai Jiangshan, Frederic Weisbecker,
Sebastian Andrzej Siewior, Michal Hocko, Luis Chamberlain,
linux-kernel, linux-modules
In-Reply-To: <20250905090130.101724-2-marco.crivellari@suse.com>
On 9/5/25 11:01 AM, Marco Crivellari wrote:
> Currently if a user enqueue a work item using schedule_delayed_work() the
> used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use
> WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
> schedule_work() that is using system_wq and queue_work(), that makes use
> again of WORK_CPU_UNBOUND.
>
> This lack of consistentcy cannot be addressed without refactoring the API.
>
> system_wq is a per-CPU worqueue, yet nothing in its name tells about that
> CPU affinity constraint, which is very often not required by users. Make
> it clear by adding a system_percpu_wq.
>
> queue_work() / queue_delayed_work() mod_delayed_work() will now use the
> new per-cpu wq: whether the user still stick on the old name a warn will
> be printed along a wq redirect to the new one.
>
> This patch add the new system_percpu_wq except for mm, fs and net
> subsystem, whom are handled in separated patches.
>
> The old wq will be kept for a few release cylces.
>
> Suggested-by: Tejun Heo <tj@kernel.org>
> Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
> ---
> kernel/module/dups.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/module/dups.c b/kernel/module/dups.c
> index bd2149fbe117..e72fa393a2ec 100644
> --- a/kernel/module/dups.c
> +++ b/kernel/module/dups.c
> @@ -113,7 +113,7 @@ static void kmod_dup_request_complete(struct work_struct *work)
> * let this linger forever as this is just a boot optimization for
> * possible abuses of vmalloc() incurred by finit_module() thrashing.
> */
> - queue_delayed_work(system_wq, &kmod_req->delete_work, 60 * HZ);
> + queue_delayed_work(system_percpu_wq, &kmod_req->delete_work, 60 * HZ);
> }
>
> bool kmod_dup_request_exists_wait(char *module_name, bool wait, int *dup_ret)
> @@ -240,7 +240,7 @@ void kmod_dup_request_announce(char *module_name, int ret)
> * There is no rush. But we also don't want to hold the
> * caller up forever or introduce any boot delays.
> */
> - queue_work(system_wq, &kmod_req->complete_work);
> + queue_work(system_percpu_wq, &kmod_req->complete_work);
>
> out:
> mutex_unlock(&kmod_dup_mutex);
The two work items queued by the dups.c code can run anywhere. I don't
see a reason why they need to be bound to a specific CPU.
If I understand the cover letter and its linked discussion correctly,
the aim is to eventually move users to unbound workqueues unless they
really need to use per-CPU workqueues.
If it helps, I believe you can already update this code to use the new
system_dfl_wq.
--
Thanks,
Petr
^ permalink raw reply
* Re: [PATCH] tracing: Fix multiple issues in trace_printk module handling
From: Christophe Leroy @ 2025-09-09 9:46 UTC (permalink / raw)
To: Fidal Palamparambil, linux-modules
Cc: mcgrof, petr.pavlu, da.gomez, samitolvanen, linux-kernel
In-Reply-To: <20250906134148.55-1-rootuserhere@gmail.com>
Le 06/09/2025 à 15:41, Fidal Palamparambil a écrit :
> [Vous ne recevez pas souvent de courriers de rootuserhere@gmail.com. Découvrez pourquoi ceci est important à https://aka.ms/LearnAboutSenderIdentification ]
>
> From: Fidal palamparambil <rootuserhere@gmail.com>
>
> This commit addresses several bugs and potential issues in the
> trace_printk module format handling code:
>
> 1. Memory leak fix: In hold_module_trace_bprintk_format(), ensure
> proper cleanup when format string allocation fails by setting
> tb_fmt to NULL after freeing it.
Why is that needed ?
>
> 2. NULL pointer dereference prevention: Added NULL checks in
> t_show() function before dereferencing format pointers.
isn't it already checked by the caller ?
>
> 3. Input validation: Added NULL check in trace_is_tracepoint_string()
> to prevent potential NULL pointer dereference.
>
> 4. Type safety: Fixed type casting in t_show() to use proper
> unsigned long casting for pointer arithmetic.
>
> 5. Error handling: Improved error checking in
> init_trace_printk_function_export() by using IS_ERR() to check
> dentry pointer.
>
> 6. Code robustness: Added additional pointer validation throughout
> the code to handle potential edge cases.
>
> 7. Memory safety: Ensured consistent handling of format pointers
> when memory allocation failures occur.
All those points look pointless, please elaborate.
>
> These fixes improve the stability and reliability of the trace_printk
> infrastructure, particularly when dealing with module loading/unloading
> and format string management.
>
> Signed-off-by: Fidal palamparambil <rootuserhere@gmail.com>
> ---
> kernel/trace/trace_printk.c | 33 +--
> kernel/trace/trace_printk.c.orig | 400 +++++++++++++++++++++++++++++++
What is the point in adding that .orig file ?
> 2 files changed, 419 insertions(+), 14 deletions(-)
> create mode 100644 kernel/trace/trace_printk.c.orig
>
> diff --git a/kernel/trace/trace_printk.c b/kernel/trace/trace_printk.c
> index 29f6e95439b6..cb962c6c02f8 100644
> --- a/kernel/trace/trace_printk.c
> +++ b/kernel/trace/trace_printk.c
> @@ -76,10 +76,12 @@ void hold_module_trace_bprintk_format(const char **start, const char **end)
> list_add_tail(&tb_fmt->list, &trace_bprintk_fmt_list);
> strcpy(fmt, *iter);
> tb_fmt->fmt = fmt;
> - } else
> + } else {
> kfree(tb_fmt);
> + tb_fmt = NULL;
Why is that needed ? tb_fmt is not reused before being assigned again.
> + }
> }
> - *iter = fmt;
> + *iter = tb_fmt ? tb_fmt->fmt : NULL;
Why complicate it ? fmt is already NULL when needed so what's the point
here ?
>
> }
> mutex_unlock(&btrace_mutex);
> @@ -253,7 +255,10 @@ EXPORT_SYMBOL_GPL(__ftrace_vprintk);
>
> bool trace_is_tracepoint_string(const char *str)
> {
> - const char **ptr = __start___tracepoint_str;
> + const char **ptr;
> +
> + if (!str)
> + return false;
What is the problem here ? str is never dereferenced in this function.
>
> for (ptr = __start___tracepoint_str; ptr < __stop___tracepoint_str; ptr++) {
> if (str == *ptr)
> @@ -311,19 +316,19 @@ static void *t_next(struct seq_file *m, void * v, loff_t *pos)
> static int t_show(struct seq_file *m, void *v)
> {
> const char **fmt = v;
> - const char *str = *fmt;
> - int i;
> + const char *str;
>
> - if (!*fmt)
> + if (!fmt || !*fmt)
How can this happen ?
> return 0;
>
> - seq_printf(m, "0x%lx : \"", *(unsigned long *)fmt);
> + str = *fmt;
> + seq_printf(m, "0x%lx : \"", (unsigned long)fmt);
>
> /*
> * Tabs and new lines need to be converted.
> */
> - for (i = 0; str[i]; i++) {
> - switch (str[i]) {
> + for (; *str; str++) {
> + switch (*str) {
Why is this change needed ?
> case '\n':
> seq_puts(m, "\\n");
> break;
> @@ -337,7 +342,7 @@ static int t_show(struct seq_file *m, void *v)
> seq_puts(m, "\\\"");
> break;
> default:
> - seq_putc(m, str[i]);
> + seq_putc(m, *str);
> }
> }
> seq_puts(m, "\"\n");
> @@ -378,10 +383,10 @@ static const struct file_operations ftrace_formats_fops = {
>
> static __init int init_trace_printk_function_export(void)
> {
> - int ret;
> + struct dentry *dentry;
>
> - ret = tracing_init_dentry();
> - if (ret)
> + dentry = tracing_init_dentry();
tracing_init_dentry() returns an int, how can this change build ???!!!
> + if (IS_ERR(dentry))
> return 0;
>
> trace_create_file("printk_formats", TRACE_MODE_READ, NULL,
> @@ -397,4 +402,4 @@ static __init int init_trace_printk(void)
> return register_module_notifier(&module_trace_bprintk_format_nb);
> }
>
> -early_initcall(init_trace_printk);
> +early_initcall(init_trace_printk);
> \ No newline at end of file
> diff --git a/kernel/trace/trace_printk.c.orig b/kernel/trace/trace_printk.c.orig
> new file mode 100644
> index 000000000000..29f6e95439b6
> --- /dev/null
> +++ b/kernel/trace/trace_printk.c.orig
> @@ -0,0 +1,400 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * trace binary printk
> + *
> + * Copyright (C) 2008 Lai Jiangshan <laijs@cn.fujitsu.com>
> + *
> + */
> +#include <linux/seq_file.h>
> +#include <linux/security.h>
> +#include <linux/uaccess.h>
> +#include <linux/kernel.h>
> +#include <linux/ftrace.h>
> +#include <linux/string.h>
> +#include <linux/module.h>
> +#include <linux/mutex.h>
> +#include <linux/ctype.h>
> +#include <linux/list.h>
> +#include <linux/slab.h>
> +
> +#include "trace.h"
> +
> +#ifdef CONFIG_MODULES
> +
> +/*
> + * modules trace_printk()'s formats are autosaved in struct trace_bprintk_fmt
> + * which are queued on trace_bprintk_fmt_list.
> + */
> +static LIST_HEAD(trace_bprintk_fmt_list);
> +
> +/* serialize accesses to trace_bprintk_fmt_list */
> +static DEFINE_MUTEX(btrace_mutex);
> +
> +struct trace_bprintk_fmt {
> + struct list_head list;
> + const char *fmt;
> +};
> +
> +static inline struct trace_bprintk_fmt *lookup_format(const char *fmt)
> +{
> + struct trace_bprintk_fmt *pos;
> +
> + if (!fmt)
> + return ERR_PTR(-EINVAL);
> +
> + list_for_each_entry(pos, &trace_bprintk_fmt_list, list) {
> + if (!strcmp(pos->fmt, fmt))
> + return pos;
> + }
> + return NULL;
> +}
> +
> +static
> +void hold_module_trace_bprintk_format(const char **start, const char **end)
> +{
> + const char **iter;
> + char *fmt;
> +
> + /* allocate the trace_printk per cpu buffers */
> + if (start != end)
> + trace_printk_init_buffers();
> +
> + mutex_lock(&btrace_mutex);
> + for (iter = start; iter < end; iter++) {
> + struct trace_bprintk_fmt *tb_fmt = lookup_format(*iter);
> + if (tb_fmt) {
> + if (!IS_ERR(tb_fmt))
> + *iter = tb_fmt->fmt;
> + continue;
> + }
> +
> + fmt = NULL;
> + tb_fmt = kmalloc(sizeof(*tb_fmt), GFP_KERNEL);
> + if (tb_fmt) {
> + fmt = kmalloc(strlen(*iter) + 1, GFP_KERNEL);
> + if (fmt) {
> + list_add_tail(&tb_fmt->list, &trace_bprintk_fmt_list);
> + strcpy(fmt, *iter);
> + tb_fmt->fmt = fmt;
> + } else
> + kfree(tb_fmt);
> + }
> + *iter = fmt;
> +
> + }
> + mutex_unlock(&btrace_mutex);
> +}
> +
> +static int module_trace_bprintk_format_notify(struct notifier_block *self,
> + unsigned long val, void *data)
> +{
> + struct module *mod = data;
> + if (mod->num_trace_bprintk_fmt) {
> + const char **start = mod->trace_bprintk_fmt_start;
> + const char **end = start + mod->num_trace_bprintk_fmt;
> +
> + if (val == MODULE_STATE_COMING)
> + hold_module_trace_bprintk_format(start, end);
> + }
> + return NOTIFY_OK;
> +}
> +
> +/*
> + * The debugfs/tracing/printk_formats file maps the addresses with
> + * the ASCII formats that are used in the bprintk events in the
> + * buffer. For userspace tools to be able to decode the events from
> + * the buffer, they need to be able to map the address with the format.
> + *
> + * The addresses of the bprintk formats are in their own section
> + * __trace_printk_fmt. But for modules we copy them into a link list.
> + * The code to print the formats and their addresses passes around the
> + * address of the fmt string. If the fmt address passed into the seq
> + * functions is within the kernel core __trace_printk_fmt section, then
> + * it simply uses the next pointer in the list.
> + *
> + * When the fmt pointer is outside the kernel core __trace_printk_fmt
> + * section, then we need to read the link list pointers. The trick is
> + * we pass the address of the string to the seq function just like
> + * we do for the kernel core formats. To get back the structure that
> + * holds the format, we simply use container_of() and then go to the
> + * next format in the list.
> + */
> +static const char **
> +find_next_mod_format(int start_index, void *v, const char **fmt, loff_t *pos)
> +{
> + struct trace_bprintk_fmt *mod_fmt;
> +
> + if (list_empty(&trace_bprintk_fmt_list))
> + return NULL;
> +
> + /*
> + * v will point to the address of the fmt record from t_next
> + * v will be NULL from t_start.
> + * If this is the first pointer or called from start
> + * then we need to walk the list.
> + */
> + if (!v || start_index == *pos) {
> + struct trace_bprintk_fmt *p;
> +
> + /* search the module list */
> + list_for_each_entry(p, &trace_bprintk_fmt_list, list) {
> + if (start_index == *pos)
> + return &p->fmt;
> + start_index++;
> + }
> + /* pos > index */
> + return NULL;
> + }
> +
> + /*
> + * v points to the address of the fmt field in the mod list
> + * structure that holds the module print format.
> + */
> + mod_fmt = container_of(v, typeof(*mod_fmt), fmt);
> + if (mod_fmt->list.next == &trace_bprintk_fmt_list)
> + return NULL;
> +
> + mod_fmt = container_of(mod_fmt->list.next, typeof(*mod_fmt), list);
> +
> + return &mod_fmt->fmt;
> +}
> +
> +static void format_mod_start(void)
> +{
> + mutex_lock(&btrace_mutex);
> +}
> +
> +static void format_mod_stop(void)
> +{
> + mutex_unlock(&btrace_mutex);
> +}
> +
> +#else /* !CONFIG_MODULES */
> +__init static int
> +module_trace_bprintk_format_notify(struct notifier_block *self,
> + unsigned long val, void *data)
> +{
> + return NOTIFY_OK;
> +}
> +static inline const char **
> +find_next_mod_format(int start_index, void *v, const char **fmt, loff_t *pos)
> +{
> + return NULL;
> +}
> +static inline void format_mod_start(void) { }
> +static inline void format_mod_stop(void) { }
> +#endif /* CONFIG_MODULES */
> +
> +static bool __read_mostly trace_printk_enabled = true;
> +
> +void trace_printk_control(bool enabled)
> +{
> + trace_printk_enabled = enabled;
> +}
> +
> +__initdata_or_module static
> +struct notifier_block module_trace_bprintk_format_nb = {
> + .notifier_call = module_trace_bprintk_format_notify,
> +};
> +
> +int __trace_bprintk(unsigned long ip, const char *fmt, ...)
> +{
> + int ret;
> + va_list ap;
> +
> + if (unlikely(!fmt))
> + return 0;
> +
> + if (!trace_printk_enabled)
> + return 0;
> +
> + va_start(ap, fmt);
> + ret = trace_vbprintk(ip, fmt, ap);
> + va_end(ap);
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(__trace_bprintk);
> +
> +int __ftrace_vbprintk(unsigned long ip, const char *fmt, va_list ap)
> +{
> + if (unlikely(!fmt))
> + return 0;
> +
> + if (!trace_printk_enabled)
> + return 0;
> +
> + return trace_vbprintk(ip, fmt, ap);
> +}
> +EXPORT_SYMBOL_GPL(__ftrace_vbprintk);
> +
> +int __trace_printk(unsigned long ip, const char *fmt, ...)
> +{
> + int ret;
> + va_list ap;
> +
> + if (!trace_printk_enabled)
> + return 0;
> +
> + va_start(ap, fmt);
> + ret = trace_vprintk(ip, fmt, ap);
> + va_end(ap);
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(__trace_printk);
> +
> +int __ftrace_vprintk(unsigned long ip, const char *fmt, va_list ap)
> +{
> + if (!trace_printk_enabled)
> + return 0;
> +
> + return trace_vprintk(ip, fmt, ap);
> +}
> +EXPORT_SYMBOL_GPL(__ftrace_vprintk);
> +
> +bool trace_is_tracepoint_string(const char *str)
> +{
> + const char **ptr = __start___tracepoint_str;
> +
> + for (ptr = __start___tracepoint_str; ptr < __stop___tracepoint_str; ptr++) {
> + if (str == *ptr)
> + return true;
> + }
> + return false;
> +}
> +
> +static const char **find_next(void *v, loff_t *pos)
> +{
> + const char **fmt = v;
> + int start_index;
> + int last_index;
> +
> + start_index = __stop___trace_bprintk_fmt - __start___trace_bprintk_fmt;
> +
> + if (*pos < start_index)
> + return __start___trace_bprintk_fmt + *pos;
> +
> + /*
> + * The __tracepoint_str section is treated the same as the
> + * __trace_printk_fmt section. The difference is that the
> + * __trace_printk_fmt section should only be used by trace_printk()
> + * in a debugging environment, as if anything exists in that section
> + * the trace_prink() helper buffers are allocated, which would just
> + * waste space in a production environment.
> + *
> + * The __tracepoint_str sections on the other hand are used by
> + * tracepoints which need to map pointers to their strings to
> + * the ASCII text for userspace.
> + */
> + last_index = start_index;
> + start_index = __stop___tracepoint_str - __start___tracepoint_str;
> +
> + if (*pos < last_index + start_index)
> + return __start___tracepoint_str + (*pos - last_index);
> +
> + start_index += last_index;
> + return find_next_mod_format(start_index, v, fmt, pos);
> +}
> +
> +static void *
> +t_start(struct seq_file *m, loff_t *pos)
> +{
> + format_mod_start();
> + return find_next(NULL, pos);
> +}
> +
> +static void *t_next(struct seq_file *m, void * v, loff_t *pos)
> +{
> + (*pos)++;
> + return find_next(v, pos);
> +}
> +
> +static int t_show(struct seq_file *m, void *v)
> +{
> + const char **fmt = v;
> + const char *str = *fmt;
> + int i;
> +
> + if (!*fmt)
> + return 0;
> +
> + seq_printf(m, "0x%lx : \"", *(unsigned long *)fmt);
> +
> + /*
> + * Tabs and new lines need to be converted.
> + */
> + for (i = 0; str[i]; i++) {
> + switch (str[i]) {
> + case '\n':
> + seq_puts(m, "\\n");
> + break;
> + case '\t':
> + seq_puts(m, "\\t");
> + break;
> + case '\\':
> + seq_putc(m, '\\');
> + break;
> + case '"':
> + seq_puts(m, "\\\"");
> + break;
> + default:
> + seq_putc(m, str[i]);
> + }
> + }
> + seq_puts(m, "\"\n");
> +
> + return 0;
> +}
> +
> +static void t_stop(struct seq_file *m, void *p)
> +{
> + format_mod_stop();
> +}
> +
> +static const struct seq_operations show_format_seq_ops = {
> + .start = t_start,
> + .next = t_next,
> + .show = t_show,
> + .stop = t_stop,
> +};
> +
> +static int
> +ftrace_formats_open(struct inode *inode, struct file *file)
> +{
> + int ret;
> +
> + ret = security_locked_down(LOCKDOWN_TRACEFS);
> + if (ret)
> + return ret;
> +
> + return seq_open(file, &show_format_seq_ops);
> +}
> +
> +static const struct file_operations ftrace_formats_fops = {
> + .open = ftrace_formats_open,
> + .read = seq_read,
> + .llseek = seq_lseek,
> + .release = seq_release,
> +};
> +
> +static __init int init_trace_printk_function_export(void)
> +{
> + int ret;
> +
> + ret = tracing_init_dentry();
> + if (ret)
> + return 0;
> +
> + trace_create_file("printk_formats", TRACE_MODE_READ, NULL,
> + NULL, &ftrace_formats_fops);
> +
> + return 0;
> +}
> +
> +fs_initcall(init_trace_printk_function_export);
> +
> +static __init int init_trace_printk(void)
> +{
> + return register_module_notifier(&module_trace_bprintk_format_nb);
> +}
> +
> +early_initcall(init_trace_printk);
> --
> 2.50.1.windows.1
>
>
^ permalink raw reply
* Re: [PATCH] module : fix signature checker pointer arithmetic and bounds check
From: Christophe Leroy @ 2025-09-09 9:32 UTC (permalink / raw)
To: Fidal Palamparambil, linux-modules
Cc: mcgrof, petr.pavlu, da.gomez, samitolvanen, linux-kernel
In-Reply-To: <20250905154550.130-1-rootuserhere@gmail.com>
Le 05/09/2025 à 17:45, Fidal Palamparambil a écrit :
> From: Fidal palamparambil <rootuserhere@gmail.com>
>
> This patch fixes :
> - invalid module_param type (bool_enable_only → bool)
Can you explain what the problem is ? Why do you say bool_enable_only is
invalid ? Was generalised by commit d19f05d8a8fa ("kernel/params.c:
generalize bool_enable_only")
> - unsafe pointer arithmetic on void *
Why is it unsafe in Linux Kernel ? See https://lkml.org/lkml/2022/2/24/978
> - missing bounds check for sig_len, preventing underflow/OOB
This is checked by mod_check_sig(), why check it again ?
> - export set_module_sig_enforced for consistency
Consistency with what ? Can you tell which module needs it ?
>
> Signed-off-by : Fidal Palamparambil <rootuserhere@gmail.com>
> Signed-off-by: Fidal palamparambil <rootuserhere@gmail.com>
Why a double sob ?
> ---
> kernel/module/signing.c | 48 ++++++++------
> kernel/module/signing.orig | 125 +++++++++++++++++++++++++++++++++++++
Why adding this .orig file into the kernel at all ?
> 2 files changed, 155 insertions(+), 18 deletions(-)
> create mode 100644 kernel/module/signing.orig
>
> diff --git a/kernel/module/signing.c b/kernel/module/signing.c
> index a2ff4242e623..8dda6cd2fd73 100644
> --- a/kernel/module/signing.c
> +++ b/kernel/module/signing.c
> @@ -1,5 +1,6 @@
> // SPDX-License-Identifier: GPL-2.0-or-later
> -/* Module signature checker
> +/*
> + * Module signature checker
Don't mix cosmetic changes and real changes, you are making
bisectability more difficult.
> *
> * Copyright (C) 2012 Red Hat, Inc. All Rights Reserved.
> * Written by David Howells (dhowells@redhat.com)
> @@ -20,11 +21,11 @@
> #define MODULE_PARAM_PREFIX "module."
>
> static bool sig_enforce = IS_ENABLED(CONFIG_MODULE_SIG_FORCE);
> -module_param(sig_enforce, bool_enable_only, 0644);
> +module_param(sig_enforce, bool, 0644);
>
> /*
> - * Export sig_enforce kernel cmdline parameter to allow other subsystems rely
> - * on that instead of directly to CONFIG_MODULE_SIG_FORCE config.
> + * Export sig_enforce kernel cmdline parameter to allow other subsystems to
> + * rely on that instead of directly on CONFIG_MODULE_SIG_FORCE config.
> */
> bool is_module_sig_enforced(void)
> {
> @@ -36,6 +37,7 @@ void set_module_sig_enforced(void)
> {
> sig_enforce = true;
> }
> +EXPORT_SYMBOL(set_module_sig_enforced);
>
> /*
> * Verify the signature on a module.
> @@ -45,44 +47,55 @@ int mod_verify_sig(const void *mod, struct load_info *info)
> struct module_signature ms;
> size_t sig_len, modlen = info->len;
> int ret;
> + const unsigned char *data = mod;
Pointless change.
>
> pr_devel("==>%s(,%zu)\n", __func__, modlen);
>
> if (modlen <= sizeof(ms))
> return -EBADMSG;
>
> - memcpy(&ms, mod + (modlen - sizeof(ms)), sizeof(ms));
> + memcpy(&ms, data + (modlen - sizeof(ms)), sizeof(ms));
Pointless change
>
> ret = mod_check_sig(&ms, modlen, "module");
> if (ret)
> return ret;
>
> sig_len = be32_to_cpu(ms.sig_len);
> +
> + /* Ensure sig_len is valid to prevent underflow/oob */
> + if (sig_len > modlen - sizeof(ms))
> + return -EBADMSG;
Already verified by mod_check_sig()
> +
> modlen -= sig_len + sizeof(ms);
> info->len = modlen;
>
> - return verify_pkcs7_signature(mod, modlen, mod + modlen, sig_len,
> + return verify_pkcs7_signature(data, modlen, data + modlen, sig_len,
pointless change
> VERIFY_USE_SECONDARY_KEYRING,
> VERIFYING_MODULE_SIGNATURE,
> NULL, NULL);
> }
>
> +/*
> + * Check signature validity of a module during load.
> + */
> int module_sig_check(struct load_info *info, int flags)
> {
> int err = -ENODATA;
> const unsigned long markerlen = sizeof(MODULE_SIG_STRING) - 1;
> const char *reason;
> - const void *mod = info->hdr;
> + const unsigned char *mod = info->hdr;
info->hdr is not void*, how can this work without a cast ?
> bool mangled_module = flags & (MODULE_INIT_IGNORE_MODVERSIONS |
> MODULE_INIT_IGNORE_VERMAGIC);
> +
Unrelated cosmetic change
> /*
> - * Do not allow mangled modules as a module with version information
> - * removed is no longer the module that was signed.
> + * Do not allow mangled modules: a module with version info removed
> + * is no longer the module that was signed.
> */
> if (!mangled_module &&
> info->len > markerlen &&
> - memcmp(mod + info->len - markerlen, MODULE_SIG_STRING, markerlen) == 0) {
> - /* We truncate the module to discard the signature */
> + memcmp(mod + info->len - markerlen,
> + MODULE_SIG_STRING, markerlen) == 0) {
> + /* Truncate the module to discard the signature marker */
Cosmetic and pointless change.
> info->len -= markerlen;
> err = mod_verify_sig(mod, info);
> if (!err) {
> @@ -92,9 +105,8 @@ int module_sig_check(struct load_info *info, int flags)
> }
>
> /*
> - * We don't permit modules to be loaded into the trusted kernels
> - * without a valid signature on them, but if we're not enforcing,
> - * certain errors are non-fatal.
> + * Enforced mode: only allow modules with a valid signature.
> + * Non-enforced mode: certain errors are downgraded to warnings.
> */
> switch (err) {
> case -ENODATA:
> @@ -106,12 +118,12 @@ int module_sig_check(struct load_info *info, int flags)
> case -ENOKEY:
> reason = "module with unavailable key";
> break;
> -
Cosmetic
> default:
> /*
> - * All other errors are fatal, including lack of memory,
> - * unparseable signatures, and signature check failures --
> - * even if signatures aren't required.
> + * All other errors are fatal, including:
> + * - OOM
> + * - unparseable signatures
> + * - invalid signature failures
> */
> return err;
> }
> diff --git a/kernel/module/signing.orig b/kernel/module/signing.orig
> new file mode 100644
> index 000000000000..a2ff4242e623
> --- /dev/null
> +++ b/kernel/module/signing.orig
> @@ -0,0 +1,125 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/* Module signature checker
> + *
> + * Copyright (C) 2012 Red Hat, Inc. All Rights Reserved.
> + * Written by David Howells (dhowells@redhat.com)
> + */
> +
> +#include <linux/kernel.h>
> +#include <linux/errno.h>
> +#include <linux/module.h>
> +#include <linux/module_signature.h>
> +#include <linux/string.h>
> +#include <linux/verification.h>
> +#include <linux/security.h>
> +#include <crypto/public_key.h>
> +#include <uapi/linux/module.h>
> +#include "internal.h"
> +
> +#undef MODULE_PARAM_PREFIX
> +#define MODULE_PARAM_PREFIX "module."
> +
> +static bool sig_enforce = IS_ENABLED(CONFIG_MODULE_SIG_FORCE);
> +module_param(sig_enforce, bool_enable_only, 0644);
> +
> +/*
> + * Export sig_enforce kernel cmdline parameter to allow other subsystems rely
> + * on that instead of directly to CONFIG_MODULE_SIG_FORCE config.
> + */
> +bool is_module_sig_enforced(void)
> +{
> + return sig_enforce;
> +}
> +EXPORT_SYMBOL(is_module_sig_enforced);
> +
> +void set_module_sig_enforced(void)
> +{
> + sig_enforce = true;
> +}
> +
> +/*
> + * Verify the signature on a module.
> + */
> +int mod_verify_sig(const void *mod, struct load_info *info)
> +{
> + struct module_signature ms;
> + size_t sig_len, modlen = info->len;
> + int ret;
> +
> + pr_devel("==>%s(,%zu)\n", __func__, modlen);
> +
> + if (modlen <= sizeof(ms))
> + return -EBADMSG;
> +
> + memcpy(&ms, mod + (modlen - sizeof(ms)), sizeof(ms));
> +
> + ret = mod_check_sig(&ms, modlen, "module");
> + if (ret)
> + return ret;
> +
> + sig_len = be32_to_cpu(ms.sig_len);
> + modlen -= sig_len + sizeof(ms);
> + info->len = modlen;
> +
> + return verify_pkcs7_signature(mod, modlen, mod + modlen, sig_len,
> + VERIFY_USE_SECONDARY_KEYRING,
> + VERIFYING_MODULE_SIGNATURE,
> + NULL, NULL);
> +}
> +
> +int module_sig_check(struct load_info *info, int flags)
> +{
> + int err = -ENODATA;
> + const unsigned long markerlen = sizeof(MODULE_SIG_STRING) - 1;
> + const char *reason;
> + const void *mod = info->hdr;
> + bool mangled_module = flags & (MODULE_INIT_IGNORE_MODVERSIONS |
> + MODULE_INIT_IGNORE_VERMAGIC);
> + /*
> + * Do not allow mangled modules as a module with version information
> + * removed is no longer the module that was signed.
> + */
> + if (!mangled_module &&
> + info->len > markerlen &&
> + memcmp(mod + info->len - markerlen, MODULE_SIG_STRING, markerlen) == 0) {
> + /* We truncate the module to discard the signature */
> + info->len -= markerlen;
> + err = mod_verify_sig(mod, info);
> + if (!err) {
> + info->sig_ok = true;
> + return 0;
> + }
> + }
> +
> + /*
> + * We don't permit modules to be loaded into the trusted kernels
> + * without a valid signature on them, but if we're not enforcing,
> + * certain errors are non-fatal.
> + */
> + switch (err) {
> + case -ENODATA:
> + reason = "unsigned module";
> + break;
> + case -ENOPKG:
> + reason = "module with unsupported crypto";
> + break;
> + case -ENOKEY:
> + reason = "module with unavailable key";
> + break;
> +
> + default:
> + /*
> + * All other errors are fatal, including lack of memory,
> + * unparseable signatures, and signature check failures --
> + * even if signatures aren't required.
> + */
> + return err;
> + }
> +
> + if (is_module_sig_enforced()) {
> + pr_notice("Loading of %s is rejected\n", reason);
> + return -EKEYREJECTED;
> + }
> +
> + return security_locked_down(LOCKDOWN_MODULE_SIGNATURE);
> +}
> --
> 2.50.1.windows.1
>
>
^ permalink raw reply
* Re: [RFC PATCH 00/10] scalable symbol flags with __kflagstab
From: Petr Pavlu @ 2025-09-08 10:09 UTC (permalink / raw)
To: Sid Nayyar
Cc: Nathan Chancellor, Luis Chamberlain, Sami Tolvanen,
Nicolas Schier, Arnd Bergmann, linux-kbuild, linux-arch,
linux-modules, linux-kernel, Giuliano Procida,
Matthias Männich
In-Reply-To: <CA+OvW8ZY1D3ECy2vw_Nojm1Kc8NzJHCpqNJUF0n8z3MhLAQd8A@mail.gmail.com>
On 9/4/25 1:28 AM, Sid Nayyar wrote:
> On Mon, Sep 1, 2025 at 1:27 PM Petr Pavlu <petr.pavlu@suse.com> wrote:
>> Merging __ksymtab and __ksymtab_gpl into a single section looks ok to
>> me, and similarly for __kcrctab and __kcrtab_gpl. The __ksymtab_gpl
>> support originally comes from commit 3344ea3ad4 ("[PATCH] MODULE_LICENSE
>> and EXPORT_SYMBOL_GPL support") [1], where it was named __gpl_ksymtab.
>> The commit doesn't mention why the implementation opts for using
>> a separate section, but I suspect it was designed this way to reduce
>> memory/disk usage.
>>
>> A question is whether the symbol flags should be stored in a new
>> __kflagstab section, instead of adding a flag member to the existing
>> __ksymtab. As far as I'm aware, no userspace tool (particularly kmod)
>> uses the __ksymtab data, so we are free to update its format.
>>
>> Note that I believe that __kcrctab/__kcrtab_gpl is a separate section
>> because the CRC data is available only if CONFIG_MODVERSIONS=y.
>>
>> Including the flags as part of __ksymtab would be obviously a simpler
>> schema. On the other hand, an entry in __ksymtab has in the worst case
>> a size of 24 bytes with an 8-byte alignment requirement. This means that
>> adding a flag to it would require additional 8 bytes per symbol.
>
> Thanks for looking into the history of the _gpl split. We also noted
> that there were up to five separate arrays at one point.
>
> We explored three approaches to this problem: using the existing
> __ksymtab, packing flags as bit-vectors, and the proposed
> __kflagstab. We ruled out the bit-vector approach due to its
> complexity, which would only save a few bits per symbol. The
> __ksymtab approach, while the simplest, was too wasteful of space.
> The __kflagstab seems like a good compromise, offering a slight
> increase in complexity over the __ksymtab method but requiring only
> one extra byte per symbol.
This sounds reasonable to me. Do you have any numbers on hand that would
show the impact of extending __ksymtab?
>
>>>
>>> The motivation for this change comes from the Android kernel, which uses
>>> an additional symbol flag to restrict the use of certain exported
>>> symbols by unsigned modules, thereby enhancing kernel security. This
>>> __kflagstab can be implemented as a bitmap to efficiently manage which
>>> symbols are available for general use versus those restricted to signed
>>> modules only.
>>
>> I think it would be useful to explain in more detail how this protected
>> schema is used in practice and what problem it solves. Who is allowed to
>> provide these limited unsigned modules and if the concern is kernel
>> security, can't you enforce the use of only signed modules?
>
> The Android Common Kernel source is compiled into what we call
> GKI (Generic Kernel Image), which consists of a kernel and a
> number of modules. We maintain a stable interface (based on CRCs and
> types) between the GKI components and vendor-specific modules
> (compiled by device manufacturers, e.g., for hardware-specific
> drivers) for the lifetime of a given GKI version.
>
> This interface is intentionally restricted to the minimal set of
> symbols required by the union of all vendor modules; our partners
> declare their requirements in symbol lists. Any additions to these
> lists are reviewed to ensure kernel internals are not overly exposed.
> For example, we restrict drivers from having the ability to open and
> read arbitrary files. This ABI boundary also allows us to evolve
> internal kernel types that are not exposed to vendor modules, for
> example, when a security fix requires a type to change.
>
> The mechanism we use for this is CONFIG_TRIM_UNUSED_KSYMS and
> CONFIG_UNUSED_KSYMS_WHITELIST. This results in a ksymtab
> containing two kinds of exported symbols: those explicitly required
> by vendors ("vendor-listed") and those only required by GKI modules
> ("GKI use only").
>
> On top of this, we have implemented symbol import protection
> (covered in patches 9/10 and 10/10). This feature prevents vendor
> modules from using symbols that are not on the vendor-listed
> whitelist. It is built on top of CONFIG_MODULE_SIG. GKI modules are
> signed with a specific key, while vendor modules are unsigned and thus
> treated as untrusted. This distinction allows signed GKI modules to
> use any symbol in the ksymtab, while unsigned vendor modules can only
> access the declared subset. This provides a significant layer of
> defense and security against potentially exploitable vendor module
> code.
If I understand correctly, this is similar to the recently introduced
EXPORT_SYMBOL_FOR_MODULES() macro, but with a coarser boundary.
I think that if the goal is to control the kABI scope and limit the use
of certain symbols only to GKI modules, then having the protection
depend on whether the module is signed is somewhat odd. It doesn't give
me much confidence if vendor modules are unsigned in the Android
ecosystem. I would expect that you want to improve this in the long
term.
It would then make more sense to me if the protection was determined by
whether the module is in-tree (the "intree" flag in modinfo) or,
alternatively, if it is signed by a built-in trusted key. I feel this
way the feature could be potentially useful for other distributions that
care about the kABI scope and have ecosystems where vendor modules are
properly signed with some key. However, I'm not sure if this would still
work in your case.
>
> Finally, we have implemented symbol export protection, which prevents
> a vendor module from impersonating a GKI module by exporting any of
> the same symbols. Note that this protection is currently specific to
> the Android kernel and is not included in this patch series.
>
> We have several years of experience with older implementations of
> these protection mechanisms. The code in this series is a
> considerably cleaner and simpler version compared to what has been
> shipping in Android kernels since Android 14 (6.1 kernel).
I agree. I'm not aware of any other distribution that would immediately
need this feature, so it is good if the implementation is kept
reasonably straightforward and doesn't overly complicate the module
loader code.
--
Thanks,
Petr
^ permalink raw reply
* Re: [PATCH] tracing : Fix multiple issues in trace_printk module handling
From: Luis Chamberlain @ 2025-09-07 19:10 UTC (permalink / raw)
To: Fidal Palamparambil
Cc: linux-modules, petr.pavlu, da.gomez, samitolvanen, linux-kernel
In-Reply-To: <20250907135201.760-1-rootuserhere@gmail.com>
On Sun, Sep 07, 2025 at 05:52:01PM +0400, Fidal Palamparambil wrote:
> From: Fidal palamparambil <rootuserhere@gmail.com>
>
> This commit addresses several bugs and potential issues in the
> trace_printk module format handling code:
>
> 1. Memory leak fix: In hold_module_trace_bprintk_format(), ensure
> proper cleanup when format string allocation fails by setting
> tb_fmt to NULL after freeing it to prevent memory leaks.
>
> 2. NULL pointer dereference prevention: Added comprehensive NULL checks
> in t_show() function before dereferencing format pointers to prevent
> kernel crashes.
>
> 3. Input validation: Added NULL check in trace_is_tracepoint_string()
> to prevent potential NULL pointer dereference when called with
> invalid input.
>
> 4. Type safety: Fixed type casting in t_show() to use proper
> unsigned long casting for pointer arithmetic, ensuring correct
> pointer handling across different architectures.
>
> 5. Error handling: Fixed type mismatch in init_trace_printk_function_export()
> by properly handling struct dentry pointer return from tracing_init_dentry()
> and using IS_ERR_OR_NULL() for comprehensive error checking.
>
> 6. Code robustness: Added additional pointer validation throughout
> the code to handle potential edge cases and improve overall stability.
>
> 7. Memory safety: Ensured consistent handling of format pointers
> when memory allocation failures occur, preventing use-after-free
> and other memory corruption issues.
>
> These fixes improve the stability and reliability of the trace_printk
> infrastructure, particularly when dealing with module loading/unloading
> and format string management.
>
> Reported-by : kernel test robot <lkp@intel.com>
> Closes : https://lore.kernel.org/oe-kbuild-all/202509071540.GTxwwstz-lkp@intel.com/
> Signed-off-by: Fidal palamparambil <rootuserhere@gmail.com>
Stop, at this point after being told to stop, your intent is clear: to bug.
Go away. You're on my ignore list now.
Luis
^ permalink raw reply
* Re: [PATCH] Fixed the build warning in init_trace_printk_function_export():
From: kernel test robot @ 2025-09-07 18:33 UTC (permalink / raw)
To: Fidal Palamparambil, linux-modules
Cc: llvm, oe-kbuild-all, mcgrof, petr.pavlu, da.gomez, samitolvanen,
linux-kernel, Fidal palamparambil
In-Reply-To: <20250907140755.529-1-rootuserhere@gmail.com>
Hi Fidal,
kernel test robot noticed the following build errors:
[auto build test ERROR on trace/for-next]
[also build test ERROR on linus/master v6.17-rc4 next-20250905]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Fidal-Palamparambil/Fixed-the-build-warning-in-init_trace_printk_function_export/20250907-221041
base: https://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace for-next
patch link: https://lore.kernel.org/r/20250907140755.529-1-rootuserhere%40gmail.com
patch subject: [PATCH] Fixed the build warning in init_trace_printk_function_export():
config: x86_64-buildonly-randconfig-002-20250907 (https://download.01.org/0day-ci/archive/20250908/202509080203.SxCdQOOY-lkp@intel.com/config)
compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250908/202509080203.SxCdQOOY-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202509080203.SxCdQOOY-lkp@intel.com/
All errors (new ones prefixed by >>):
>> kernel/trace/trace_printk.c:369:18: error: passing 'const struct file *' to parameter of type 'struct file *' discards qualifiers [-Werror,-Wincompatible-pointer-types-discards-qualifiers]
369 | return seq_open(file, &show_format_seq_ops);
| ^~~~
include/linux/seq_file.h:108:27: note: passing argument to parameter here
108 | int seq_open(struct file *, const struct seq_operations *);
| ^
>> kernel/trace/trace_printk.c:373:10: error: incompatible function pointer types initializing 'int (*)(struct inode *, struct file *)' with an expression of type 'int (struct inode *, const struct file *)' [-Wincompatible-function-pointer-types]
373 | .open = ftrace_formats_open,
| ^~~~~~~~~~~~~~~~~~~
2 errors generated.
vim +369 kernel/trace/trace_printk.c
7975a2be16dd42 Steven Rostedt 2009-03-12 359
7975a2be16dd42 Steven Rostedt 2009-03-12 360 static int
66670b02cb828c Fidal palamparambil 2025-09-07 361 ftrace_formats_open(struct inode *inode, const struct file *file)
7975a2be16dd42 Steven Rostedt 2009-03-12 362 {
17911ff38aa58d Steven Rostedt (VMware 2019-10-11 363) int ret;
17911ff38aa58d Steven Rostedt (VMware 2019-10-11 364)
17911ff38aa58d Steven Rostedt (VMware 2019-10-11 365) ret = security_locked_down(LOCKDOWN_TRACEFS);
17911ff38aa58d Steven Rostedt (VMware 2019-10-11 366) if (ret)
17911ff38aa58d Steven Rostedt (VMware 2019-10-11 367) return ret;
17911ff38aa58d Steven Rostedt (VMware 2019-10-11 368)
c8961ec6da22ea Li Zefan 2009-06-24 @369 return seq_open(file, &show_format_seq_ops);
7975a2be16dd42 Steven Rostedt 2009-03-12 370 }
7975a2be16dd42 Steven Rostedt 2009-03-12 371
7975a2be16dd42 Steven Rostedt 2009-03-12 372 static const struct file_operations ftrace_formats_fops = {
7975a2be16dd42 Steven Rostedt 2009-03-12 @373 .open = ftrace_formats_open,
7975a2be16dd42 Steven Rostedt 2009-03-12 374 .read = seq_read,
7975a2be16dd42 Steven Rostedt 2009-03-12 375 .llseek = seq_lseek,
7975a2be16dd42 Steven Rostedt 2009-03-12 376 .release = seq_release,
7975a2be16dd42 Steven Rostedt 2009-03-12 377 };
7975a2be16dd42 Steven Rostedt 2009-03-12 378
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox