From: Jarkko Sakkinen <jarkko@kernel.org>
To: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: "Jarkko Sakkinen" <jarkko@profian.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"Nathaniel McCallum" <nathaniel@profian.com>,
"Russell King" <linux@armlinux.org.uk>,
"Catalin Marinas" <catalin.marinas@arm.com>,
"Will Deacon" <will@kernel.org>,
"Thomas Bogendoerfer" <tsbogend@alpha.franken.de>,
"James E.J. Bottomley" <James.Bottomley@hansenpartnership.com>,
"Helge Deller" <deller@gmx.de>,
"Michael Ellerman" <mpe@ellerman.id.au>,
"Benjamin Herrenschmidt" <benh@kernel.crashing.org>,
"Paul Mackerras" <paulus@samba.org>,
"Paul Walmsley" <paul.walmsley@sifive.com>,
"Palmer Dabbelt" <palmer@dabbelt.com>,
"Albert Ou" <aou@eecs.berkeley.edu>,
"Heiko Carstens" <hca@linux.ibm.com>,
"Vasily Gorbik" <gor@linux.ibm.com>,
"Alexander Gordeev" <agordeev@linux.ibm.com>,
"Christian Borntraeger" <borntraeger@linux.ibm.com>,
"Sven Schnelle" <svens@linux.ibm.com>,
"David S. Miller" <davem@davemloft.net>,
"Thomas Gleixner" <tglx@linutronix.de>,
"Ingo Molnar" <mingo@redhat.com>,
"Borislav Petkov" <bp@alien8.de>,
"Dave Hansen" <dave.hansen@linux.intel.com>,
"x86@kernel.org" <x86@kernel.org>,
"H. Peter Anvin" <hpa@zytor.com>,
"Naveen N. Rao" <naveen.n.rao@linux.ibm.com>,
"Anil S Keshavamurthy" <anil.s.keshavamurthy@intel.com>,
"Masami Hiramatsu" <mhiramat@kernel.org>,
"Luis Chamberlain" <mcgrof@kernel.org>,
"Steven Rostedt" <rostedt@goodmis.org>,
"Kees Cook" <keescook@chromium.org>,
"Peter Zijlstra (Intel)" <peterz@infradead.org>,
"Nathan Chancellor" <nathan@kernel.org>,
"Josh Poimboeuf" <jpoimboe@kernel.org>,
"Mark Rutland" <mark.rutland@arm.com>,
"Eric W. Biederman" <ebiederm@xmission.com>,
"Marco Elver" <elver@google.com>,
"Dan Li" <ashimida@linux.alibaba.com>,
"Sami Tolvanen" <samitolvanen@google.com>,
"Song Liu" <song@kernel.org>, "Ard Biesheuvel" <ardb@kernel.org>,
"Russell King (Oracle)" <rmk+kernel@armlinux.org.uk>,
"Nick Desaulniers" <ndesaulniers@google.com>,
"Linus Walleij" <linus.walleij@linaro.org>,
"Chen Zhongjin" <chenzhongjin@huawei.com>,
"Nicolas Pitre" <nico@fluxnic.net>,
"Mark Brown" <broonie@kernel.org>,
"Luis Machado" <luis.machado@linaro.org>,
"Geert Uytterhoeven" <geert@linux-m68k.org>,
"Joey Gouly" <joey.gouly@arm.com>,
"Masahiro Yamada" <masahiroy@kernel.org>,
"Andrew Morton" <akpm@linux-foundation.org>,
"Andrey Konovalov" <andreyknvl@gmail.com>,
"Kefeng Wang" <wangkefeng.wang@huawei.com>,
"Atsushi Nemoto" <anemo@mba.ocn.ne.jp>,
"Guenter Roeck" <linux@roeck-us.net>,
"Dave Anglin" <dave.anglin@bell.net>,
"Alexei Starovoitov" <ast@kernel.org>,
"Nicholas Piggin" <npiggin@gmail.com>,
"Daniel Axtens" <dja@axtens.net>,
"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>,
"Jordan Niethe" <jniethe5@gmail.com>,
"Guo Ren" <guoren@kernel.org>, "Anup Patel" <anup@brainfault.org>,
"Atish Patra" <atishp@atishpatra.org>,
"Changbin Du" <changbin.du@intel.com>,
"Heiko Stuebner" <heiko@sntech.de>,
"Liao Chang" <liaochang1@huawei.com>,
"Philipp Tomsich" <philipp.tomsich@vrull.eu>,
"Wu Caize" <zepan@sipeed.com>,
"Emil Renner Berthing" <kernel@esmil.dk>,
"Alexander Egorenkov" <egorenar@linux.ibm.com>,
"Thomas Richter" <tmricht@linux.ibm.com>,
"Tobias Huschle" <huschle@linux.ibm.com>,
"Ilya Leoshkevich" <iii@linux.ibm.com>,
"Tom Lendacky" <thomas.lendacky@amd.com>,
"Daniel Bristot de Oliveira" <bristot@redhat.com>,
"Michael Roth" <michael.roth@amd.com>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
"Javier Martinez Canillas" <javierm@redhat.com>,
"Miroslav Benes" <mbenes@suse.cz>,
"André Almeida" <andrealmeid@igalia.com>,
"Tiezhu Yang" <yangtiezhu@loongson.cn>,
"Dmitry Torokhov" <dmitry.torokhov@gmail.com>,
"Aaron Tomlin" <atomlin@redhat.com>,
"linux-arm-kernel@lists.infradead.org"
<linux-arm-kernel@lists.infradead.org>,
"linux-mips@vger.kernel.org" <linux-mips@vger.kernel.org>,
"linux-parisc@vger.kernel.org" <linux-parisc@vger.kernel.org>,
"linuxppc-dev@lists.ozlabs.org" <linuxppc-dev@lists.ozlabs.org>,
"linux-riscv@lists.infradead.org"
<linux-riscv@lists.infradead.org>,
"linux-s390@vger.kernel.org" <linux-s390@vger.kernel.org>,
"sparclinux@vger.kernel.org" <sparclinux@vger.kernel.org>,
"linux-modules@vger.kernel.org" <linux-modules@vger.kernel.org>
Subject: Re: [PATCH] kprobes: Enable tracing for mololithic kernel images
Date: Thu, 9 Jun 2022 15:57:54 +0300
Message-ID: <YqHuUsevcvaaunVq@iki.fi>
In-Reply-To: <f2030fb4-4978-068b-6250-5bd5b2746675@csgroup.eu>
On Thu, Jun 09, 2022 at 08:30:12AM +0000, Christophe Leroy wrote:
>
>
> Le 08/06/2022 à 01:59, Jarkko Sakkinen a écrit :
> >
> > Tracing with kprobes while running a monolithic kernel is currently
> > impossible because CONFIG_KPROBES depends on CONFIG_MODULES. This
> > dependency is a result of the kprobes code using the module allocator
> > for the trampoline code.
> >
> > Detaching kprobes from modules helps to squeeze down the user space,
> > e.g. when developing new core kernel features, while still having all
> > the nice tracing capabilities.
>
> Nice idea, could also be nice to have BPF without MODULES.
Yeah, for sure. You have to start somewhere :-) I'd guess this is
also a step forward for BPF.
> >
> > For kernel/ and arch/*, move module_alloc() and module_memfree() to
> > module_alloc.c, and compile as part of vmlinux when either CONFIG_MODULES
> > or CONFIG_KPROBES is enabled. In addition, flag kernel module specific
> > code with CONFIG_MODULES.
>
> Nice, but that's not enough. You have to audit every piece of code that
> depends on CONFIG_MODULES and see if it needs to be activated for your
> case as well. For instance some powerpc configurations don't honor exec
> page faults on kernel pages when CONFIG_MODULES is not selected.
Thanks for pointing this out. By "every piece of code" you are probably
referring to the 13 arch folders that support kprobes in the first
place (just checking)?
> > As a result, kprobes can be used with a monolithic kernel.
> >
> > Signed-off-by: Jarkko Sakkinen <jarkko@profian.com>
>
> I think this patch should be split into several patches: one (or even
> one per architecture?) to make module_alloc() independent of
> CONFIG_MODULES, then a patch to make CONFIG_KPROBES independent of
> CONFIG_MODULES.
Agreed, also because of your previous remark: each arch needs its own
assessment of the changes. I purposely did this first as a single
patch in order to get a better picture of the situation.
> > ---
> > Tested with the help of BuildRoot and QEMU:
> > - arm (function tracer)
> > - arm64 (function tracer)
> > - mips (function tracer)
> > - powerpc (function tracer)
> > - riscv (function tracer)
> > - s390 (function tracer)
> > - sparc (function tracer)
> > - x86 (function tracer)
> > - sh (function tracer, for the "pure" kernel/module_alloc.c path)
> > ---
> > arch/Kconfig | 1 -
> > arch/arm/kernel/Makefile | 5 +++
> > arch/arm/kernel/module.c | 32 ----------------
> > arch/arm/kernel/module_alloc.c | 42 ++++++++++++++++++++
> > arch/arm64/kernel/Makefile | 5 +++
> > arch/arm64/kernel/module.c | 47 -----------------------
> > arch/arm64/kernel/module_alloc.c | 57 ++++++++++++++++++++++++++++
> > arch/mips/kernel/Makefile | 5 +++
> > arch/mips/kernel/module.c | 9 -----
> > arch/mips/kernel/module_alloc.c | 18 +++++++++
> > arch/parisc/kernel/Makefile | 5 +++
> > arch/parisc/kernel/module.c | 11 ------
> > arch/parisc/kernel/module_alloc.c | 23 +++++++++++
> > arch/powerpc/kernel/Makefile | 5 +++
> > arch/powerpc/kernel/module.c | 37 ------------------
> > arch/powerpc/kernel/module_alloc.c | 47 +++++++++++++++++++++++
>
> You are missing necessary changes for powerpc.
>
> On powerpc 8xx or powerpc 603, the software TLB handlers don't honor
> instruction TLB misses when CONFIG_MODULES is not set; look into
> head_8xx.S and head_book3s_32.S.
>
> On powerpc book3s/32, all kernel space is set to NX except the module
> segment. When CONFIG_MODULES is not set, all space is set NX. See
> mmu_mark_initmem_nx() and is_module_segment().
Thank you! I'll go through this and also try to build an environment
with BuildRoot where I can test-run this configuration.
> > arch/riscv/kernel/Makefile | 5 +++
> > arch/riscv/kernel/module.c | 10 -----
> > arch/riscv/kernel/module_alloc.c | 19 ++++++++++
> > arch/s390/kernel/Makefile | 5 +++
> > arch/s390/kernel/module.c | 17 ---------
> > arch/s390/kernel/module_alloc.c | 33 ++++++++++++++++
> > arch/sparc/kernel/Makefile | 5 +++
> > arch/sparc/kernel/module.c | 30 ---------------
> > arch/sparc/kernel/module_alloc.c | 39 +++++++++++++++++++
> > arch/x86/kernel/Makefile | 5 +++
> > arch/x86/kernel/module.c | 50 ------------------------
> > arch/x86/kernel/module_alloc.c | 61 ++++++++++++++++++++++++++++++
> > kernel/Makefile | 5 +++
> > kernel/kprobes.c | 10 +++++
> > kernel/module/main.c | 17 ---------
> > kernel/module_alloc.c | 26 +++++++++++++
> > kernel/trace/trace_kprobe.c | 10 ++++-
> > 33 files changed, 434 insertions(+), 262 deletions(-)
> > create mode 100644 arch/arm/kernel/module_alloc.c
> > create mode 100644 arch/arm64/kernel/module_alloc.c
> > create mode 100644 arch/mips/kernel/module_alloc.c
> > create mode 100644 arch/parisc/kernel/module_alloc.c
> > create mode 100644 arch/powerpc/kernel/module_alloc.c
> > create mode 100644 arch/riscv/kernel/module_alloc.c
> > create mode 100644 arch/s390/kernel/module_alloc.c
> > create mode 100644 arch/sparc/kernel/module_alloc.c
> > create mode 100644 arch/x86/kernel/module_alloc.c
> > create mode 100644 kernel/module_alloc.c
> >
> > diff --git a/arch/Kconfig b/arch/Kconfig
> > index fcf9a41a4ef5..e8e3e7998a2e 100644
> > --- a/arch/Kconfig
> > +++ b/arch/Kconfig
> > @@ -39,7 +39,6 @@ config GENERIC_ENTRY
> >
> > config KPROBES
> > bool "Kprobes"
> > - depends on MODULES
> > depends on HAVE_KPROBES
> > select KALLSYMS
> > select TASKS_RCU if PREEMPTION
>
> > diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
> > index 2e2a2a9bcf43..5a811cdf230b 100644
> > --- a/arch/powerpc/kernel/Makefile
> > +++ b/arch/powerpc/kernel/Makefile
> > @@ -103,6 +103,11 @@ obj-$(CONFIG_HIBERNATION) += swsusp_$(BITS).o
> > endif
> > obj64-$(CONFIG_HIBERNATION) += swsusp_asm64.o
> > obj-$(CONFIG_MODULES) += module.o module_$(BITS).o
> > +ifeq ($(CONFIG_MODULES),y)
> > +obj-y += module_alloc.o
> > +else
> > +obj-$(CONFIG_KPROBES) += module_alloc.o
> > +endif
>
> Why not just do:
>
> obj-$(CONFIG_MODULES) += module_alloc.o
> obj-$(CONFIG_KPROBES) += module_alloc.o
>
> However, a new hidden config item (e.g. CONFIG_DYNAMIC_TEXT) selected by
> both CONFIG_MODULES and CONFIG_KPROBES would make life easier when
> you come to do the required changes.
I'll do this. Russell King also pointed out the same thing.
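Just to check that I read the suggestion right, here is a rough
userspace sketch of what a shared hidden symbol would buy on the C side.
This is illustration only: the DEMO_* macros and demo_* functions are
made-up stand-ins, and the preprocessor "or" merely imitates what
"select" statements in Kconfig would do for real.

```c
#include <stdbool.h>

/*
 * Userspace illustration only. The DEMO_* macros are hypothetical
 * stand-ins for Kconfig symbols; in the kernel the derived symbol would
 * come from "select" statements in Kconfig, not from the preprocessor.
 */
#define DEMO_CONFIG_KPROBES 1
/* DEMO_CONFIG_MODULES is deliberately left undefined: monolithic kernel. */

/* Either user of executable allocations turns on the derived symbol. */
#if defined(DEMO_CONFIG_MODULES) || defined(DEMO_CONFIG_KPROBES)
#define DEMO_CONFIG_DYNAMIC_TEXT 1
#endif

/* Code that needs runtime-allocated text tests one symbol only. */
static bool demo_dynamic_text_enabled(void)
{
#ifdef DEMO_CONFIG_DYNAMIC_TEXT
	return true;
#else
	return false;
#endif
}

/* Module-only code keeps testing the module symbol itself. */
static bool demo_module_loader_enabled(void)
{
#ifdef DEMO_CONFIG_MODULES
	return true;
#else
	return false;
#endif
}
```

Code that needs runtime-allocated text would then test the one derived
symbol, while module-only code keeps testing CONFIG_MODULES.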
> > obj-$(CONFIG_44x) += cpu_setup_44x.o
> > obj-$(CONFIG_PPC_FSL_BOOK3E) += cpu_setup_fsl_booke.o
> > obj-$(CONFIG_PPC_DOORBELL) += dbell.o
> > diff --git a/arch/powerpc/kernel/module.c b/arch/powerpc/kernel/module.c
> > index f6d6ae0a1692..b30e00964a60 100644
> > --- a/arch/powerpc/kernel/module.c
> > +++ b/arch/powerpc/kernel/module.c
> > @@ -88,40 +88,3 @@ int module_finalize(const Elf_Ehdr *hdr,
> >
> > return 0;
> > }
> > -
> > -static __always_inline void *
> > -__module_alloc(unsigned long size, unsigned long start, unsigned long end, bool nowarn)
> > -{
> > - pgprot_t prot = strict_module_rwx_enabled() ? PAGE_KERNEL : PAGE_KERNEL_EXEC;
> > - gfp_t gfp = GFP_KERNEL | (nowarn ? __GFP_NOWARN : 0);
> > -
> > - /*
> > - * Don't do huge page allocations for modules yet until more testing
> > - * is done. STRICT_MODULE_RWX may require extra work to support this
> > - * too.
> > - */
> > - return __vmalloc_node_range(size, 1, start, end, gfp, prot,
> > - VM_FLUSH_RESET_PERMS,
> > - NUMA_NO_NODE, __builtin_return_address(0));
> > -}
> > -
> > -void *module_alloc(unsigned long size)
> > -{
> > -#ifdef MODULES_VADDR
> > - unsigned long limit = (unsigned long)_etext - SZ_32M;
> > - void *ptr = NULL;
> > -
> > - BUILD_BUG_ON(TASK_SIZE > MODULES_VADDR);
> > -
> > - /* First try within 32M limit from _etext to avoid branch trampolines */
> > - if (MODULES_VADDR < PAGE_OFFSET && MODULES_END > limit)
> > - ptr = __module_alloc(size, limit, MODULES_END, true);
> > -
> > - if (!ptr)
> > - ptr = __module_alloc(size, MODULES_VADDR, MODULES_END, false);
> > -
> > - return ptr;
> > -#else
> > - return __module_alloc(size, VMALLOC_START, VMALLOC_END, false);
> > -#endif
> > -}
> > diff --git a/arch/powerpc/kernel/module_alloc.c b/arch/powerpc/kernel/module_alloc.c
> > new file mode 100644
> > index 000000000000..48541c27ce46
> > --- /dev/null
> > +++ b/arch/powerpc/kernel/module_alloc.c
> > @@ -0,0 +1,47 @@
> > +// SPDX-License-Identifier: GPL-2.0-or-later
> > +/*
> > + * Kernel module help for powerpc.
> > + * Copyright (C) 2001, 2003 Rusty Russell IBM Corporation.
> > + * Copyright (C) 2008 Freescale Semiconductor, Inc.
> > + */
> > +
> > +#include <linux/mm.h>
> > +#include <linux/moduleloader.h>
> > +#include <linux/vmalloc.h>
> > +
> > +static __always_inline void *
> > +__module_alloc(unsigned long size, unsigned long start, unsigned long end, bool nowarn)
> > +{
> > + pgprot_t prot = strict_module_rwx_enabled() ? PAGE_KERNEL : PAGE_KERNEL_EXEC;
> > + gfp_t gfp = GFP_KERNEL | (nowarn ? __GFP_NOWARN : 0);
> > +
> > + /*
> > + * Don't do huge page allocations for modules yet until more testing
> > + * is done. STRICT_MODULE_RWX may require extra work to support this
> > + * too.
> > + */
> > + return __vmalloc_node_range(size, 1, start, end, gfp, prot,
> > + VM_FLUSH_RESET_PERMS,
> > + NUMA_NO_NODE, __builtin_return_address(0));
> > +}
> > +
> > +void *module_alloc(unsigned long size)
> > +{
> > +#ifdef MODULES_VADDR
>
> Is MODULES_VADDR defined even when CONFIG_MODULES is not ?
Yes, by this in ppc's asm/pgtable.h:
#ifdef CONFIG_PPC_BOOK3S
#include <asm/book3s/pgtable.h>
#else
#include <asm/nohash/pgtable.h>
#endif /* !CONFIG_PPC_BOOK3S */
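For reference, the MODULES_VADDR branch quoted below boils down to a
two-step range fallback. Here is a rough userspace sketch of just that
logic, not the kernel code: the demo_* names, the page-granular toy
allocator and all addresses are made up for illustration, and
DEMO_NEAR_LIMIT merely stands in for _etext - SZ_32M.

```c
#include <stdbool.h>

/* Hypothetical layout: 8 pages of "module space", values are invented. */
#define DEMO_PAGE          0x1000UL
#define DEMO_NPAGES        8UL
#define DEMO_MODULES_VADDR 0x1000UL
#define DEMO_MODULES_END   (DEMO_MODULES_VADDR + DEMO_NPAGES * DEMO_PAGE)
#define DEMO_NEAR_LIMIT    0x7000UL /* stand-in for _etext - SZ_32M */

static bool demo_used[DEMO_NPAGES];

/* First-fit allocation of whole pages inside [start, end); 0 on failure. */
static unsigned long demo_range_alloc(unsigned long size,
				      unsigned long start, unsigned long end)
{
	unsigned long npages = (size + DEMO_PAGE - 1) / DEMO_PAGE;
	unsigned long first = (start - DEMO_MODULES_VADDR) / DEMO_PAGE;
	unsigned long last = (end - DEMO_MODULES_VADDR) / DEMO_PAGE;

	for (unsigned long i = first; i + npages <= last; i++) {
		unsigned long j;

		/* Count how many pages starting at i are still free. */
		for (j = 0; j < npages && !demo_used[i + j]; j++)
			;
		if (j == npages) {
			for (j = 0; j < npages; j++)
				demo_used[i + j] = true;
			return DEMO_MODULES_VADDR + i * DEMO_PAGE;
		}
	}
	return 0;
}

/* Try the window close to the kernel text first, then the full range. */
static unsigned long demo_module_alloc(unsigned long size)
{
	unsigned long addr;

	addr = demo_range_alloc(size, DEMO_NEAR_LIMIT, DEMO_MODULES_END);
	if (!addr)
		addr = demo_range_alloc(size, DEMO_MODULES_VADDR,
					DEMO_MODULES_END);
	return addr;
}
```

With this made-up layout the first two one-page requests land in the
near window (0x7000 and 0x8000), and the third falls back to the start
of the full range (0x1000).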
> > + unsigned long limit = (unsigned long)_etext - SZ_32M;
> > + void *ptr = NULL;
> > +
> > + BUILD_BUG_ON(TASK_SIZE > MODULES_VADDR);
> > +
> > + /* First try within 32M limit from _etext to avoid branch trampolines */
> > + if (MODULES_VADDR < PAGE_OFFSET && MODULES_END > limit)
> > + ptr = __module_alloc(size, limit, MODULES_END, true);
> > +
> > + if (!ptr)
> > + ptr = __module_alloc(size, MODULES_VADDR, MODULES_END, false);
> > +
> > + return ptr;
> > +#else
> > + return __module_alloc(size, VMALLOC_START, VMALLOC_END, false);
> > +#endif
> > +}
>
> > diff --git a/kernel/Makefile b/kernel/Makefile
> > index 318789c728d3..2981fe42060d 100644
> > --- a/kernel/Makefile
> > +++ b/kernel/Makefile
> > @@ -53,6 +53,11 @@ obj-y += livepatch/
> > obj-y += dma/
> > obj-y += entry/
> > obj-$(CONFIG_MODULES) += module/
> > +ifeq ($(CONFIG_MODULES),y)
> > +obj-y += module_alloc.o
> > +else
> > +obj-$(CONFIG_KPROBES) += module_alloc.o
> > +endif
>
> Same comment, could be:
>
> obj-$(CONFIG_MODULES) += module_alloc.o
> obj-$(CONFIG_KPROBES) += module_alloc.o
Ditto.
>
> >
> > obj-$(CONFIG_KCMP) += kcmp.o
> > obj-$(CONFIG_FREEZER) += freezer.o
> > diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> > index f214f8c088ed..3f9876374cd3 100644
> > --- a/kernel/kprobes.c
> > +++ b/kernel/kprobes.c
> > @@ -1569,6 +1569,7 @@ static int check_kprobe_address_safe(struct kprobe *p,
> > goto out;
> > }
> >
> > +#ifdef CONFIG_MODULES
> > /* Check if 'p' is probing a module. */
> > *probed_mod = __module_text_address((unsigned long) p->addr);
> > if (*probed_mod) {
> > @@ -1592,6 +1593,8 @@ static int check_kprobe_address_safe(struct kprobe *p,
> > ret = -ENOENT;
> > }
> > }
> > +#endif
> > +
> > out:
> > preempt_enable();
> > jump_label_unlock();
> > @@ -2475,6 +2478,7 @@ int kprobe_add_area_blacklist(unsigned long start, unsigned long end)
> > return 0;
> > }
> >
> > +#ifdef CONFIG_MODULES
> > /* Remove all symbols in given area from kprobe blacklist */
> > static void kprobe_remove_area_blacklist(unsigned long start, unsigned long end)
> > {
> > @@ -2492,6 +2496,7 @@ static void kprobe_remove_ksym_blacklist(unsigned long entry)
> > {
> > kprobe_remove_area_blacklist(entry, entry + 1);
> > }
> > +#endif /* CONFIG_MODULES */
> >
> > int __weak arch_kprobe_get_kallsym(unsigned int *symnum, unsigned long *value,
> > char *type, char *sym)
> > @@ -2557,6 +2562,7 @@ static int __init populate_kprobe_blacklist(unsigned long *start,
> > return ret ? : arch_populate_kprobe_blacklist();
> > }
> >
> > +#ifdef CONFIG_MODULES
> > static void add_module_kprobe_blacklist(struct module *mod)
> > {
> > unsigned long start, end;
> > @@ -2658,6 +2664,7 @@ static struct notifier_block kprobe_module_nb = {
> > .notifier_call = kprobes_module_callback,
> > .priority = 0
> > };
> > +#endif /* CONFIG_MODULES */
> >
> > void kprobe_free_init_mem(void)
> > {
> > @@ -2717,8 +2724,11 @@ static int __init init_kprobes(void)
> > err = arch_init_kprobes();
> > if (!err)
> > err = register_die_notifier(&kprobe_exceptions_nb);
> > +
> > +#ifdef CONFIG_MODULES
> > if (!err)
> > err = register_module_notifier(&kprobe_module_nb);
> > +#endif
> >
> > kprobes_initialized = (err == 0);
> > kprobe_sysctls_init();
> > diff --git a/kernel/module/main.c b/kernel/module/main.c
> > index fed58d30725d..7fa182b78550 100644
> > --- a/kernel/module/main.c
> > +++ b/kernel/module/main.c
> > @@ -1121,16 +1121,6 @@ resolve_symbol_wait(struct module *mod,
> > return ksym;
> > }
> >
> > -void __weak module_memfree(void *module_region)
> > -{
> > - /*
> > - * This memory may be RO, and freeing RO memory in an interrupt is not
> > - * supported by vmalloc.
> > - */
> > - WARN_ON(in_interrupt());
> > - vfree(module_region);
> > -}
> > -
> > void __weak module_arch_cleanup(struct module *mod)
> > {
> > }
> > @@ -1606,13 +1596,6 @@ static void dynamic_debug_remove(struct module *mod, struct _ddebug *debug)
> > ddebug_remove_module(mod->name);
> > }
> >
> > -void * __weak module_alloc(unsigned long size)
> > -{
> > - return __vmalloc_node_range(size, 1, VMALLOC_START, VMALLOC_END,
> > - GFP_KERNEL, PAGE_KERNEL_EXEC, VM_FLUSH_RESET_PERMS,
> > - NUMA_NO_NODE, __builtin_return_address(0));
> > -}
> > -
> > bool __weak module_init_section(const char *name)
> > {
> > return strstarts(name, ".init");
> > diff --git a/kernel/module_alloc.c b/kernel/module_alloc.c
> > new file mode 100644
> > index 000000000000..26a4c60998ad
> > --- /dev/null
> > +++ b/kernel/module_alloc.c
> > @@ -0,0 +1,26 @@
> > +// SPDX-License-Identifier: GPL-2.0-or-later
> > +/*
> > + * Copyright (C) 2002 Richard Henderson
> > + * Copyright (C) 2001 Rusty Russell, 2002, 2010 Rusty Russell IBM.
> > + */
> > +
> > +#include <linux/mm.h>
> > +#include <linux/moduleloader.h>
> > +#include <linux/vmalloc.h>
> > +
> > +void * __weak module_alloc(unsigned long size)
> > +{
> > + return __vmalloc_node_range(size, 1, VMALLOC_START, VMALLOC_END,
> > + GFP_KERNEL, PAGE_KERNEL_EXEC, VM_FLUSH_RESET_PERMS,
> > + NUMA_NO_NODE, __builtin_return_address(0));
> > +}
> > +
> > +void __weak module_memfree(void *module_region)
> > +{
> > + /*
> > + * This memory may be RO, and freeing RO memory in an interrupt is not
> > + * supported by vmalloc.
> > + */
> > + WARN_ON(in_interrupt());
> > + vfree(module_region);
> > +}
> > diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
> > index 93507330462c..050b2975332e 100644
> > --- a/kernel/trace/trace_kprobe.c
> > +++ b/kernel/trace/trace_kprobe.c
> > @@ -101,6 +101,7 @@ static nokprobe_inline bool trace_kprobe_has_gone(struct trace_kprobe *tk)
> > return kprobe_gone(&tk->rp.kp);
> > }
> >
> > +#ifdef CONFIG_MODULES
> > static nokprobe_inline bool trace_kprobe_within_module(struct trace_kprobe *tk,
> > struct module *mod)
> > {
> > @@ -109,11 +110,13 @@ static nokprobe_inline bool trace_kprobe_within_module(struct trace_kprobe *tk,
> >
> > return strncmp(module_name(mod), name, len) == 0 && name[len] == ':';
> > }
> > +#endif /* CONFIG_MODULES */
> >
> > static nokprobe_inline bool trace_kprobe_module_exist(struct trace_kprobe *tk)
> > {
> > + bool ret = false;
> > +#ifdef CONFIG_MODULES
> > char *p;
> > - bool ret;
> >
> > if (!tk->symbol)
> > return false;
> > @@ -125,6 +128,7 @@ static nokprobe_inline bool trace_kprobe_module_exist(struct trace_kprobe *tk)
> > ret = !!find_module(tk->symbol);
> > rcu_read_unlock_sched();
> > *p = ':';
> > +#endif /* CONFIG_MODULES */
> >
> > return ret;
> > }
> > @@ -668,6 +672,7 @@ static int register_trace_kprobe(struct trace_kprobe *tk)
> > return ret;
> > }
> >
> > +#ifdef CONFIG_MODULES
> > /* Module notifier call back, checking event on the module */
> > static int trace_kprobe_module_callback(struct notifier_block *nb,
> > unsigned long val, void *data)
> > @@ -702,6 +707,7 @@ static struct notifier_block trace_kprobe_module_nb = {
> > .notifier_call = trace_kprobe_module_callback,
> > .priority = 1 /* Invoked after kprobe module callback */
> > };
> > +#endif /* CONFIG_MODULES */
> >
> > static int __trace_kprobe_create(int argc, const char *argv[])
> > {
> > @@ -1896,8 +1902,10 @@ static __init int init_kprobe_trace_early(void)
> > if (ret)
> > return ret;
> >
> > +#ifdef CONFIG_MODULES
> > if (register_module_notifier(&trace_kprobe_module_nb))
> > return -EINVAL;
> > +#endif /* CONFIG_MODULES */
> >
> > return 0;
> > }
> > --
> > 2.36.1
> >
Thanks for the well-considered remarks!
BR, Jarkko
WARNING: multiple messages have this Message-ID (diff)
From: Jarkko Sakkinen <jarkko@kernel.org>
To: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: "Jarkko Sakkinen" <jarkko@profian.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"Nathaniel McCallum" <nathaniel@profian.com>,
"Russell King" <linux@armlinux.org.uk>,
"Catalin Marinas" <catalin.marinas@arm.com>,
"Will Deacon" <will@kernel.org>,
"Thomas Bogendoerfer" <tsbogend@alpha.franken.de>,
"James E.J. Bottomley" <James.Bottomley@hansenpartnership.com>,
"Helge Deller" <deller@gmx.de>,
"Michael Ellerman" <mpe@ellerman.id.au>,
"Benjamin Herrenschmidt" <benh@kernel.crashing.org>,
"Paul Mackerras" <paulus@samba.org>,
"Paul Walmsley" <paul.walmsley@sifive.com>,
"Palmer Dabbelt" <palmer@dabbelt.com>,
"Albert Ou" <aou@eecs.berkeley.edu>,
"Heiko Carstens" <hca@linux.ibm.com>,
"Vasily Gorbik" <gor@linux.ibm.com>,
"Alexander Gordeev" <agordeev@linux.ibm.com>,
"Christian Borntraeger" <borntraeger@linux.ibm.com>,
"Sven Schnelle" <svens@linux.ibm.com>,
"David S. Miller" <davem@davemloft.net>,
"Thomas Gleixner" <tglx@linutronix.de>,
"Ingo Molnar" <mingo@redhat.com>,
"Borislav Petkov" <bp@alien8.de>,
"Dave Hansen" <dave.hansen@linux.intel.com>,
"x86@kernel.org" <x86@kernel.org>,
"H. Peter Anvin" <hpa@zytor.com>,
"Naveen N. Rao" <naveen.n.rao@linux.ibm.com>,
"Anil S Keshavamurthy" <anil.s.keshavamurthy@intel.com>,
"Masami Hiramatsu" <mhiramat@kernel.org>,
"Luis Chamberlain" <mcgrof@kernel.org>,
"Steven Rostedt" <rostedt@goodmis.org>,
"Kees Cook" <keescook@chromium.org>,
"Peter Zijlstra (Intel)" <peterz@infradead.org>,
"Nathan Chancellor" <nathan@kernel.org>,
"Josh Poimboeuf" <jpoimboe@kernel.org>,
"Mark Rutland" <mark.rutland@arm.com>,
"Eric W. Biederman" <ebiederm@xmission.com>,
"Marco Elver" <elver@google.com>,
"Dan Li" <ashimida@linux.alibaba.com>,
"Sami Tolvanen" <samitolvanen@google.com>,
"Song Liu" <song@kernel.org>, "Ard Biesheuvel" <ardb@kernel.org>,
"Russell King (Oracle)" <rmk+kernel@armlinux.org.uk>,
"Nick Desaulniers" <ndesaulniers@google.com>,
"Linus Walleij" <linus.walleij@linaro.org>,
"Chen Zhongjin" <chenzhongjin@huawei.com>,
"Nicolas Pitre" <nico@fluxnic.net>,
"Mark Brown" <broonie@kernel.org>,
"Luis Machado" <luis.machado@linaro.org>,
"Geert Uytterhoeven" <geert@linux-m68k.org>,
"Joey Gouly" <joey.gouly@arm.com>,
"Masahiro Yamada" <masahiroy@kernel.org>,
"Andrew Morton" <akpm@linux-foundation.org>,
"Andrey Konovalov" <andreyknvl@gmail.com>,
"Kefeng Wang" <wangkefeng.wang@huawei.com>,
"Atsushi Nemoto" <anemo@mba.ocn.ne.jp>,
"Guenter Roeck" <linux@roeck-us.net>,
"Dave Anglin" <dave.anglin@bell.net>,
"Alexei Starovoitov" <ast@kernel.org>,
"Nicholas Piggin" <npiggin@gmail.com>,
"Daniel Axtens" <dja@axtens.net>,
"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>,
"Jordan Niethe" <jniethe5@gmail.com>,
"Guo Ren" <guoren@kernel.org>, "Anup Patel" <anup@brainfault.org>,
"Atish Patra" <atishp@atishpatra.org>,
"Changbin Du" <changbin.du@intel.com>,
"Heiko Stuebner" <heiko@sntech.de>,
"Liao Chang" <liaochang1@huawei.com>,
"Philipp Tomsich" <philipp.tomsich@vrull.eu>,
"Wu Caize" <zepan@sipeed.com>,
"Emil Renner Berthing" <kernel@esmil.dk>,
"Alexander Egorenkov" <egorenar@linux.ibm.com>,
"Thomas Richter" <tmricht@linux.ibm.com>,
"Tobias Huschle" <huschle@linux.ibm.com>,
"Ilya Leoshkevich" <iii@linux.ibm.com>,
"Tom Lendacky" <thomas.lendacky@amd.com>,
"Daniel Bristot de Oliveira" <bristot@redhat.com>,
"Michael Roth" <michael.roth@amd.com>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
"Javier Martinez Canillas" <javierm@redhat.com>,
"Miroslav Benes" <mbenes@suse.cz>,
"André Almeida" <andrealmeid@igalia.com>,
"Tiezhu Yang" <yangtiezhu@loongson.cn>,
"Dmitry Torokhov" <dmitry.torokhov@gmail.com>,
"Aaron Tomlin" <atomlin@redhat.com>,
"linux-arm-kernel@lists.infradead.org"
<linux-arm-kernel@lists.infradead.org>,
"linux-mips@vger.kernel.org" <linux-mips@vger.kernel.org>,
"linux-parisc@vger.kernel.org" <linux-parisc@vger.kernel.org>,
"linuxppc-dev@lists.ozlabs.org" <linuxppc-dev@lists.ozlabs.org>,
"linux-riscv@lists.infradead.org"
<linux-riscv@lists.infradead.org>,
"linux-s390@vger.kernel.org" <linux-s390@vger.kernel.org>,
"sparclinux@vger.kernel.org" <sparclinux@vger.kernel.org>,
"linux-modules@vger.kernel.org" <linux-modules@vger.kernel.org>
Subject: Re: [PATCH] kprobes: Enable tracing for mololithic kernel images
Date: Thu, 9 Jun 2022 15:57:54 +0300 [thread overview]
Message-ID: <YqHuUsevcvaaunVq@iki.fi> (raw)
In-Reply-To: <f2030fb4-4978-068b-6250-5bd5b2746675@csgroup.eu>
On Thu, Jun 09, 2022 at 08:30:12AM +0000, Christophe Leroy wrote:
>
>
> Le 08/06/2022 à 01:59, Jarkko Sakkinen a écrit :
> > [You don't often get email from jarkko@profian.com. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ]
> >
> > Tracing with kprobes while running a monolithic kernel is currently
> > impossible because CONFIG_KPROBES is dependent of CONFIG_MODULES. This
> > dependency is a result of kprobes code using the module allocator for the
> > trampoline code.
> >
> > Detaching kprobes from modules helps to squeeze down the user space,
> > e.g. when developing new core kernel features, while still having all
> > the nice tracing capabilities.
>
> Nice idea, could also be nice to have BPF without MODULES.
Yeah, for sure. You have to start from somewhere :-) I'd guess this
a step forward also for BPF.
> >
> > For kernel/ and arch/*, move module_alloc() and module_memfree() to
> > module_alloc.c, and compile as part of vmlinux when either CONFIG_MODULES
> > or CONFIG_KPROBES is enabled. In addition, flag kernel module specific
> > code with CONFIG_MODULES.
>
> Nice, but that's not enough. You have to audit every peace of code that
> depends on CONFIG_MODULES and see if it needs to be activated for your
> case as well. For instance some powerpc configurations don't honor exec
> page faults on kernel pages when CONFIG_MODULES is not selected.
Thanks for pointing this out. With "every peace of code" you probably
are referring to the 13 arch-folders, which support kprobes in the first
place (just checking)?
> > As the result, kprobes can be used with a monolithic kernel.
> >
> > Signed-off-by: Jarkko Sakkinen <jarkko@profian.com>
>
> I think this patch should be split in a several patches, one (or even
> one per architectures ?) to make modules_alloc() independant of
> CONFIG_MODULES, then a patch to make CONFIG_KPROBES independant on
> CONFIG_MOUDLES.
Agreed. And also because of your previous remark, i.e. each arch needs
it own conclusions of the changes. I purposely did this first as a one
patch in order to get a better picture of the situation.
> > ---
> > Tested with the help of BuildRoot and QEMU:
> > - arm (function tracer)
> > - arm64 (function tracer)
> > - mips (function tracer)
> > - powerpc (function tracer)
> > - riscv (function tracer)
> > - s390 (function tracer)
> > - sparc (function tracer)
> > - x86 (function tracer)
> > - sh (function tracer, for the "pure" kernel/modules_alloc.c path)
> > ---
> > arch/Kconfig | 1 -
> > arch/arm/kernel/Makefile | 5 +++
> > arch/arm/kernel/module.c | 32 ----------------
> > arch/arm/kernel/module_alloc.c | 42 ++++++++++++++++++++
> > arch/arm64/kernel/Makefile | 5 +++
> > arch/arm64/kernel/module.c | 47 -----------------------
> > arch/arm64/kernel/module_alloc.c | 57 ++++++++++++++++++++++++++++
> > arch/mips/kernel/Makefile | 5 +++
> > arch/mips/kernel/module.c | 9 -----
> > arch/mips/kernel/module_alloc.c | 18 +++++++++
> > arch/parisc/kernel/Makefile | 5 +++
> > arch/parisc/kernel/module.c | 11 ------
> > arch/parisc/kernel/module_alloc.c | 23 +++++++++++
> > arch/powerpc/kernel/Makefile | 5 +++
> > arch/powerpc/kernel/module.c | 37 ------------------
> > arch/powerpc/kernel/module_alloc.c | 47 +++++++++++++++++++++++
>
> You are missing necessary changes for powerpc.
>
> On powerpc 8xx or powerpc 603, software TLB handlers don't honor
> instruction TLB miss when CONFIG_MODULES are not set, look into
> head_8xx.S and head_book3s_32.S
>
> On powerpc book3s/32, all kernel space is set to NX except the module
> segment. When CONFIG_MODULES is all space is set NX. See
> mmu_mark_initmem_nx() and is_module_segment().
Thank you! I'll go this through and also try to build an environment
with BuildRoot where I can test-run this configuration.
> > arch/riscv/kernel/Makefile | 5 +++
> > arch/riscv/kernel/module.c | 10 -----
> > arch/riscv/kernel/module_alloc.c | 19 ++++++++++
> > arch/s390/kernel/Makefile | 5 +++
> > arch/s390/kernel/module.c | 17 ---------
> > arch/s390/kernel/module_alloc.c | 33 ++++++++++++++++
> > arch/sparc/kernel/Makefile | 5 +++
> > arch/sparc/kernel/module.c | 30 ---------------
> > arch/sparc/kernel/module_alloc.c | 39 +++++++++++++++++++
> > arch/x86/kernel/Makefile | 5 +++
> > arch/x86/kernel/module.c | 50 ------------------------
> > arch/x86/kernel/module_alloc.c | 61 ++++++++++++++++++++++++++++++
> > kernel/Makefile | 5 +++
> > kernel/kprobes.c | 10 +++++
> > kernel/module/main.c | 17 ---------
> > kernel/module_alloc.c | 26 +++++++++++++
> > kernel/trace/trace_kprobe.c | 10 ++++-
> > 33 files changed, 434 insertions(+), 262 deletions(-)
> > create mode 100644 arch/arm/kernel/module_alloc.c
> > create mode 100644 arch/arm64/kernel/module_alloc.c
> > create mode 100644 arch/mips/kernel/module_alloc.c
> > create mode 100644 arch/parisc/kernel/module_alloc.c
> > create mode 100644 arch/powerpc/kernel/module_alloc.c
> > create mode 100644 arch/riscv/kernel/module_alloc.c
> > create mode 100644 arch/s390/kernel/module_alloc.c
> > create mode 100644 arch/sparc/kernel/module_alloc.c
> > create mode 100644 arch/x86/kernel/module_alloc.c
> > create mode 100644 kernel/module_alloc.c
> >
> > diff --git a/arch/Kconfig b/arch/Kconfig
> > index fcf9a41a4ef5..e8e3e7998a2e 100644
> > --- a/arch/Kconfig
> > +++ b/arch/Kconfig
> > @@ -39,7 +39,6 @@ config GENERIC_ENTRY
> >
> > config KPROBES
> > bool "Kprobes"
> > - depends on MODULES
> > depends on HAVE_KPROBES
> > select KALLSYMS
> > select TASKS_RCU if PREEMPTION
>
> > diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
> > index 2e2a2a9bcf43..5a811cdf230b 100644
> > --- a/arch/powerpc/kernel/Makefile
> > +++ b/arch/powerpc/kernel/Makefile
> > @@ -103,6 +103,11 @@ obj-$(CONFIG_HIBERNATION) += swsusp_$(BITS).o
> > endif
> > obj64-$(CONFIG_HIBERNATION) += swsusp_asm64.o
> > obj-$(CONFIG_MODULES) += module.o module_$(BITS).o
> > +ifeq ($(CONFIG_MODULES),y)
> > +obj-y += module_alloc.o
> > +else
> > +obj-$(CONFIG_KPROBES) += module_alloc.o
> > +endif
>
> Why not just do:
>
> obj-$(CONFIG_MODULES) += module_alloc.o
> obj-$(CONFIG_KPROBES) += module_alloc.o
>
> However, a new hidden config item (e.g. CONFIG_DYNAMIC_TEXT) selected by
> both CONFIG_MODULES and CONFIG_KPROBES would make life easier when
> you come to make the required changes.
I'll do this. Russell King also pointed out the same thing.
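A minimal sketch of the hidden-symbol approach, under stated assumptions: DYNAMIC_TEXT is only the example name from the suggestion above, the Kconfig files where the symbol and the selects would live are not decided here, and only the relevant lines of each entry are shown.

```kconfig
# Hypothetical hidden symbol: it has no prompt string, so it is not
# user-visible and can only be enabled through "select".
config DYNAMIC_TEXT
	bool

config MODULES
	bool "Enable loadable module support"
	select DYNAMIC_TEXT

config KPROBES
	bool "Kprobes"
	depends on HAVE_KPROBES
	select DYNAMIC_TEXT
	select KALLSYMS
	select TASKS_RCU if PREEMPTION

# Each Makefile would then need one line instead of the ifeq block:
#   obj-$(CONFIG_DYNAMIC_TEXT) += module_alloc.o
```

With something like this, arch code that today checks CONFIG_MODULES to decide whether dynamically allocated text can exist could check the new symbol instead.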
> > obj-$(CONFIG_44x) += cpu_setup_44x.o
> > obj-$(CONFIG_PPC_FSL_BOOK3E) += cpu_setup_fsl_booke.o
> > obj-$(CONFIG_PPC_DOORBELL) += dbell.o
> > diff --git a/arch/powerpc/kernel/module.c b/arch/powerpc/kernel/module.c
> > index f6d6ae0a1692..b30e00964a60 100644
> > --- a/arch/powerpc/kernel/module.c
> > +++ b/arch/powerpc/kernel/module.c
> > @@ -88,40 +88,3 @@ int module_finalize(const Elf_Ehdr *hdr,
> >
> > return 0;
> > }
> > -
> > -static __always_inline void *
> > -__module_alloc(unsigned long size, unsigned long start, unsigned long end, bool nowarn)
> > -{
> > - pgprot_t prot = strict_module_rwx_enabled() ? PAGE_KERNEL : PAGE_KERNEL_EXEC;
> > - gfp_t gfp = GFP_KERNEL | (nowarn ? __GFP_NOWARN : 0);
> > -
> > - /*
> > - * Don't do huge page allocations for modules yet until more testing
> > - * is done. STRICT_MODULE_RWX may require extra work to support this
> > - * too.
> > - */
> > - return __vmalloc_node_range(size, 1, start, end, gfp, prot,
> > - VM_FLUSH_RESET_PERMS,
> > - NUMA_NO_NODE, __builtin_return_address(0));
> > -}
> > -
> > -void *module_alloc(unsigned long size)
> > -{
> > -#ifdef MODULES_VADDR
> > - unsigned long limit = (unsigned long)_etext - SZ_32M;
> > - void *ptr = NULL;
> > -
> > - BUILD_BUG_ON(TASK_SIZE > MODULES_VADDR);
> > -
> > - /* First try within 32M limit from _etext to avoid branch trampolines */
> > - if (MODULES_VADDR < PAGE_OFFSET && MODULES_END > limit)
> > - ptr = __module_alloc(size, limit, MODULES_END, true);
> > -
> > - if (!ptr)
> > - ptr = __module_alloc(size, MODULES_VADDR, MODULES_END, false);
> > -
> > - return ptr;
> > -#else
> > - return __module_alloc(size, VMALLOC_START, VMALLOC_END, false);
> > -#endif
> > -}
> > diff --git a/arch/powerpc/kernel/module_alloc.c b/arch/powerpc/kernel/module_alloc.c
> > new file mode 100644
> > index 000000000000..48541c27ce46
> > --- /dev/null
> > +++ b/arch/powerpc/kernel/module_alloc.c
> > @@ -0,0 +1,47 @@
> > +// SPDX-License-Identifier: GPL-2.0-or-later
> > +/*
> > + * Kernel module help for powerpc.
> > + * Copyright (C) 2001, 2003 Rusty Russell IBM Corporation.
> > + * Copyright (C) 2008 Freescale Semiconductor, Inc.
> > + */
> > +
> > +#include <linux/mm.h>
> > +#include <linux/moduleloader.h>
> > +#include <linux/vmalloc.h>
> > +
> > +static __always_inline void *
> > +__module_alloc(unsigned long size, unsigned long start, unsigned long end, bool nowarn)
> > +{
> > + pgprot_t prot = strict_module_rwx_enabled() ? PAGE_KERNEL : PAGE_KERNEL_EXEC;
> > + gfp_t gfp = GFP_KERNEL | (nowarn ? __GFP_NOWARN : 0);
> > +
> > + /*
> > + * Don't do huge page allocations for modules yet until more testing
> > + * is done. STRICT_MODULE_RWX may require extra work to support this
> > + * too.
> > + */
> > + return __vmalloc_node_range(size, 1, start, end, gfp, prot,
> > + VM_FLUSH_RESET_PERMS,
> > + NUMA_NO_NODE, __builtin_return_address(0));
> > +}
> > +
> > +void *module_alloc(unsigned long size)
> > +{
> > +#ifdef MODULES_VADDR
>
> Is MODULES_VADDR defined even when CONFIG_MODULES is not ?
Yes, by this in ppc's asm/pgtable.h:
#ifdef CONFIG_PPC_BOOK3S
#include <asm/book3s/pgtable.h>
#else
#include <asm/nohash/pgtable.h>
#endif /* !CONFIG_PPC_BOOK3S */
> > + unsigned long limit = (unsigned long)_etext - SZ_32M;
> > + void *ptr = NULL;
> > +
> > + BUILD_BUG_ON(TASK_SIZE > MODULES_VADDR);
> > +
> > + /* First try within 32M limit from _etext to avoid branch trampolines */
> > + if (MODULES_VADDR < PAGE_OFFSET && MODULES_END > limit)
> > + ptr = __module_alloc(size, limit, MODULES_END, true);
> > +
> > + if (!ptr)
> > + ptr = __module_alloc(size, MODULES_VADDR, MODULES_END, false);
> > +
> > + return ptr;
> > +#else
> > + return __module_alloc(size, VMALLOC_START, VMALLOC_END, false);
> > +#endif
> > +}
>
> > diff --git a/kernel/Makefile b/kernel/Makefile
> > index 318789c728d3..2981fe42060d 100644
> > --- a/kernel/Makefile
> > +++ b/kernel/Makefile
> > @@ -53,6 +53,11 @@ obj-y += livepatch/
> > obj-y += dma/
> > obj-y += entry/
> > obj-$(CONFIG_MODULES) += module/
> > +ifeq ($(CONFIG_MODULES),y)
> > +obj-y += module_alloc.o
> > +else
> > +obj-$(CONFIG_KPROBES) += module_alloc.o
> > +endif
>
> Same comment, could be:
>
> obj-$(CONFIG_MODULES) += module_alloc.o
> obj-$(CONFIG_KPROBES) += module_alloc.o
Ditto.
>
> >
> > obj-$(CONFIG_KCMP) += kcmp.o
> > obj-$(CONFIG_FREEZER) += freezer.o
> > diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> > index f214f8c088ed..3f9876374cd3 100644
> > --- a/kernel/kprobes.c
> > +++ b/kernel/kprobes.c
> > @@ -1569,6 +1569,7 @@ static int check_kprobe_address_safe(struct kprobe *p,
> > goto out;
> > }
> >
> > +#ifdef CONFIG_MODULES
> > /* Check if 'p' is probing a module. */
> > *probed_mod = __module_text_address((unsigned long) p->addr);
> > if (*probed_mod) {
> > @@ -1592,6 +1593,8 @@ static int check_kprobe_address_safe(struct kprobe *p,
> > ret = -ENOENT;
> > }
> > }
> > +#endif
> > +
> > out:
> > preempt_enable();
> > jump_label_unlock();
> > @@ -2475,6 +2478,7 @@ int kprobe_add_area_blacklist(unsigned long start, unsigned long end)
> > return 0;
> > }
> >
> > +#ifdef CONFIG_MODULES
> > /* Remove all symbols in given area from kprobe blacklist */
> > static void kprobe_remove_area_blacklist(unsigned long start, unsigned long end)
> > {
> > @@ -2492,6 +2496,7 @@ static void kprobe_remove_ksym_blacklist(unsigned long entry)
> > {
> > kprobe_remove_area_blacklist(entry, entry + 1);
> > }
> > +#endif /* CONFIG_MODULES */
> >
> > int __weak arch_kprobe_get_kallsym(unsigned int *symnum, unsigned long *value,
> > char *type, char *sym)
> > @@ -2557,6 +2562,7 @@ static int __init populate_kprobe_blacklist(unsigned long *start,
> > return ret ? : arch_populate_kprobe_blacklist();
> > }
> >
> > +#ifdef CONFIG_MODULES
> > static void add_module_kprobe_blacklist(struct module *mod)
> > {
> > unsigned long start, end;
> > @@ -2658,6 +2664,7 @@ static struct notifier_block kprobe_module_nb = {
> > .notifier_call = kprobes_module_callback,
> > .priority = 0
> > };
> > +#endif /* CONFIG_MODULES */
> >
> > void kprobe_free_init_mem(void)
> > {
> > @@ -2717,8 +2724,11 @@ static int __init init_kprobes(void)
> > err = arch_init_kprobes();
> > if (!err)
> > err = register_die_notifier(&kprobe_exceptions_nb);
> > +
> > +#ifdef CONFIG_MODULES
> > if (!err)
> > err = register_module_notifier(&kprobe_module_nb);
> > +#endif
> >
> > kprobes_initialized = (err == 0);
> > kprobe_sysctls_init();
> > diff --git a/kernel/module/main.c b/kernel/module/main.c
> > index fed58d30725d..7fa182b78550 100644
> > --- a/kernel/module/main.c
> > +++ b/kernel/module/main.c
> > @@ -1121,16 +1121,6 @@ resolve_symbol_wait(struct module *mod,
> > return ksym;
> > }
> >
> > -void __weak module_memfree(void *module_region)
> > -{
> > - /*
> > - * This memory may be RO, and freeing RO memory in an interrupt is not
> > - * supported by vmalloc.
> > - */
> > - WARN_ON(in_interrupt());
> > - vfree(module_region);
> > -}
> > -
> > void __weak module_arch_cleanup(struct module *mod)
> > {
> > }
> > @@ -1606,13 +1596,6 @@ static void dynamic_debug_remove(struct module *mod, struct _ddebug *debug)
> > ddebug_remove_module(mod->name);
> > }
> >
> > -void * __weak module_alloc(unsigned long size)
> > -{
> > - return __vmalloc_node_range(size, 1, VMALLOC_START, VMALLOC_END,
> > - GFP_KERNEL, PAGE_KERNEL_EXEC, VM_FLUSH_RESET_PERMS,
> > - NUMA_NO_NODE, __builtin_return_address(0));
> > -}
> > -
> > bool __weak module_init_section(const char *name)
> > {
> > return strstarts(name, ".init");
> > diff --git a/kernel/module_alloc.c b/kernel/module_alloc.c
> > new file mode 100644
> > index 000000000000..26a4c60998ad
> > --- /dev/null
> > +++ b/kernel/module_alloc.c
> > @@ -0,0 +1,26 @@
> > +// SPDX-License-Identifier: GPL-2.0-or-later
> > +/*
> > + * Copyright (C) 2002 Richard Henderson
> > + * Copyright (C) 2001 Rusty Russell, 2002, 2010 Rusty Russell IBM.
> > + */
> > +
> > +#include <linux/mm.h>
> > +#include <linux/moduleloader.h>
> > +#include <linux/vmalloc.h>
> > +
> > +void * __weak module_alloc(unsigned long size)
> > +{
> > + return __vmalloc_node_range(size, 1, VMALLOC_START, VMALLOC_END,
> > + GFP_KERNEL, PAGE_KERNEL_EXEC, VM_FLUSH_RESET_PERMS,
> > + NUMA_NO_NODE, __builtin_return_address(0));
> > +}
> > +
> > +void __weak module_memfree(void *module_region)
> > +{
> > + /*
> > + * This memory may be RO, and freeing RO memory in an interrupt is not
> > + * supported by vmalloc.
> > + */
> > + WARN_ON(in_interrupt());
> > + vfree(module_region);
> > +}
> > diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
> > index 93507330462c..050b2975332e 100644
> > --- a/kernel/trace/trace_kprobe.c
> > +++ b/kernel/trace/trace_kprobe.c
> > @@ -101,6 +101,7 @@ static nokprobe_inline bool trace_kprobe_has_gone(struct trace_kprobe *tk)
> > return kprobe_gone(&tk->rp.kp);
> > }
> >
> > +#ifdef CONFIG_MODULES
> > static nokprobe_inline bool trace_kprobe_within_module(struct trace_kprobe *tk,
> > struct module *mod)
> > {
> > @@ -109,11 +110,13 @@ static nokprobe_inline bool trace_kprobe_within_module(struct trace_kprobe *tk,
> >
> > return strncmp(module_name(mod), name, len) == 0 && name[len] == ':';
> > }
> > +#endif /* CONFIG_MODULES */
> >
> > static nokprobe_inline bool trace_kprobe_module_exist(struct trace_kprobe *tk)
> > {
> > + bool ret = false;
> > +#ifdef CONFIG_MODULES
> > char *p;
> > - bool ret;
> >
> > if (!tk->symbol)
> > return false;
> > @@ -125,6 +128,7 @@ static nokprobe_inline bool trace_kprobe_module_exist(struct trace_kprobe *tk)
> > ret = !!find_module(tk->symbol);
> > rcu_read_unlock_sched();
> > *p = ':';
> > +#endif /* CONFIG_MODULES */
> >
> > return ret;
> > }
> > @@ -668,6 +672,7 @@ static int register_trace_kprobe(struct trace_kprobe *tk)
> > return ret;
> > }
> >
> > +#ifdef CONFIG_MODULES
> > /* Module notifier call back, checking event on the module */
> > static int trace_kprobe_module_callback(struct notifier_block *nb,
> > unsigned long val, void *data)
> > @@ -702,6 +707,7 @@ static struct notifier_block trace_kprobe_module_nb = {
> > .notifier_call = trace_kprobe_module_callback,
> > .priority = 1 /* Invoked after kprobe module callback */
> > };
> > +#endif /* CONFIG_MODULES */
> >
> > static int __trace_kprobe_create(int argc, const char *argv[])
> > {
> > @@ -1896,8 +1902,10 @@ static __init int init_kprobe_trace_early(void)
> > if (ret)
> > return ret;
> >
> > +#ifdef CONFIG_MODULES
> > if (register_module_notifier(&trace_kprobe_module_nb))
> > return -EINVAL;
> > +#endif /* CONFIG_MODULES */
> >
> > return 0;
> > }
> > --
> > 2.36.1
> >
Thanks for the well-considered remarks!
BR, Jarkko