* Re: [PATCH v2 3/5] powerpc/lib: Use a temporary mm for code patching
From: Christopher M. Riedl @ 2020-07-14 19:43 UTC (permalink / raw)
To: Christophe Leroy, linuxppc-dev; +Cc: kernel-hardening
In-Reply-To: <9de00169-208f-5c94-0c29-1180364c9bd7@csgroup.eu>
On Thu Jul 9, 2020 at 4:02 AM CDT, Christophe Leroy wrote:
>
>
> Le 09/07/2020 à 06:03, Christopher M. Riedl a écrit :
> > Currently, code patching a STRICT_KERNEL_RWX exposes the temporary
> > mappings to other CPUs. These mappings should be kept local to the CPU
> > doing the patching. Use the pre-initialized temporary mm and patching
> > address for this purpose. Also add a check after patching to ensure the
> > patch succeeded.
>
> While trying the LKDTM test, I realised that this is useless for non
> SMP.
> Is it worth applying that change to non SMP systems ?
>
> Christophe
>
Hmm, I would say it's probably preferable to maintain a single
implementation of code-patching under STRICT_KERNEL_RWX instead of
two versions for SMP and non-SMP.
> >
> > Use the KUAP functions on non-BOOKS3_64 platforms since the temporary
> > mapping for patching uses a userspace address (to keep the mapping
> > local). On BOOKS3_64 platforms hash does not implement KUAP and on radix
> > the use of PAGE_KERNEL sets EAA[0] for the PTE which means the AMR
> > (KUAP) protection is ignored (see PowerISA v3.0b, Fig, 35).
> >
> > Based on x86 implementation:
> >
> > commit b3fd8e83ada0
> > ("x86/alternatives: Use temporary mm for text poking")
> >
> > Signed-off-by: Christopher M. Riedl <cmr@informatik.wtf>
> > ---
> > arch/powerpc/lib/code-patching.c | 152 +++++++++++--------------------
> > 1 file changed, 54 insertions(+), 98 deletions(-)
> >
> > diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
> > index 8ae1a9e5fe6e..80fe3864f377 100644
> > --- a/arch/powerpc/lib/code-patching.c
> > +++ b/arch/powerpc/lib/code-patching.c
> > @@ -19,6 +19,7 @@
> > #include <asm/code-patching.h>
> > #include <asm/setup.h>
> > #include <asm/inst.h>
> > +#include <asm/mmu_context.h>
> >
> > static int __patch_instruction(struct ppc_inst *exec_addr, struct ppc_inst instr,
> > struct ppc_inst *patch_addr)
> > @@ -77,106 +78,57 @@ void __init poking_init(void)
> > pte_unmap_unlock(ptep, ptl);
> > }
> >
> > -static DEFINE_PER_CPU(struct vm_struct *, text_poke_area);
> > -
> > -static int text_area_cpu_up(unsigned int cpu)
> > -{
> > - struct vm_struct *area;
> > -
> > - area = get_vm_area(PAGE_SIZE, VM_ALLOC);
> > - if (!area) {
> > - WARN_ONCE(1, "Failed to create text area for cpu %d\n",
> > - cpu);
> > - return -1;
> > - }
> > - this_cpu_write(text_poke_area, area);
> > -
> > - return 0;
> > -}
> > -
> > -static int text_area_cpu_down(unsigned int cpu)
> > -{
> > - free_vm_area(this_cpu_read(text_poke_area));
> > - return 0;
> > -}
> > -
> > -/*
> > - * Run as a late init call. This allows all the boot time patching to be done
> > - * simply by patching the code, and then we're called here prior to
> > - * mark_rodata_ro(), which happens after all init calls are run. Although
> > - * BUG_ON() is rude, in this case it should only happen if ENOMEM, and we judge
> > - * it as being preferable to a kernel that will crash later when someone tries
> > - * to use patch_instruction().
> > - */
> > -static int __init setup_text_poke_area(void)
> > -{
> > - BUG_ON(!cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
> > - "powerpc/text_poke:online", text_area_cpu_up,
> > - text_area_cpu_down));
> > -
> > - return 0;
> > -}
> > -late_initcall(setup_text_poke_area);
> > +struct patch_mapping {
> > + spinlock_t *ptl; /* for protecting pte table */
> > + pte_t *ptep;
> > + struct temp_mm temp_mm;
> > +};
> >
> > /*
> > * This can be called for kernel text or a module.
> > */
> > -static int map_patch_area(void *addr, unsigned long text_poke_addr)
> > +static int map_patch(const void *addr, struct patch_mapping *patch_mapping)
> > {
> > - unsigned long pfn;
> > - int err;
> > + struct page *page;
> > + pte_t pte;
> > + pgprot_t pgprot;
> >
> > if (is_vmalloc_addr(addr))
> > - pfn = vmalloc_to_pfn(addr);
> > + page = vmalloc_to_page(addr);
> > else
> > - pfn = __pa_symbol(addr) >> PAGE_SHIFT;
> > + page = virt_to_page(addr);
> >
> > - err = map_kernel_page(text_poke_addr, (pfn << PAGE_SHIFT), PAGE_KERNEL);
> > + if (radix_enabled())
> > + pgprot = PAGE_KERNEL;
> > + else
> > + pgprot = PAGE_SHARED;
> >
> > - pr_devel("Mapped addr %lx with pfn %lx:%d\n", text_poke_addr, pfn, err);
> > - if (err)
> > + patch_mapping->ptep = get_locked_pte(patching_mm, patching_addr,
> > + &patch_mapping->ptl);
> > + if (unlikely(!patch_mapping->ptep)) {
> > + pr_warn("map patch: failed to allocate pte for patching\n");
> > return -1;
> > + }
> > +
> > + pte = mk_pte(page, pgprot);
> > + pte = pte_mkdirty(pte);
> > + set_pte_at(patching_mm, patching_addr, patch_mapping->ptep, pte);
> > +
> > + init_temp_mm(&patch_mapping->temp_mm, patching_mm);
> > + use_temporary_mm(&patch_mapping->temp_mm);
> >
> > return 0;
> > }
> >
> > -static inline int unmap_patch_area(unsigned long addr)
> > +static void unmap_patch(struct patch_mapping *patch_mapping)
> > {
> > - pte_t *ptep;
> > - pmd_t *pmdp;
> > - pud_t *pudp;
> > - p4d_t *p4dp;
> > - pgd_t *pgdp;
> > -
> > - pgdp = pgd_offset_k(addr);
> > - if (unlikely(!pgdp))
> > - return -EINVAL;
> > -
> > - p4dp = p4d_offset(pgdp, addr);
> > - if (unlikely(!p4dp))
> > - return -EINVAL;
> > -
> > - pudp = pud_offset(p4dp, addr);
> > - if (unlikely(!pudp))
> > - return -EINVAL;
> > -
> > - pmdp = pmd_offset(pudp, addr);
> > - if (unlikely(!pmdp))
> > - return -EINVAL;
> > -
> > - ptep = pte_offset_kernel(pmdp, addr);
> > - if (unlikely(!ptep))
> > - return -EINVAL;
> > + /* In hash, pte_clear flushes the tlb */
> > + pte_clear(patching_mm, patching_addr, patch_mapping->ptep);
> > + unuse_temporary_mm(&patch_mapping->temp_mm);
> >
> > - pr_devel("clearing mm %p, pte %p, addr %lx\n", &init_mm, ptep, addr);
> > -
> > - /*
> > - * In hash, pte_clear flushes the tlb, in radix, we have to
> > - */
> > - pte_clear(&init_mm, addr, ptep);
> > - flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
> > -
> > - return 0;
> > + /* In radix, we have to explicitly flush the tlb (no-op in hash) */
> > + local_flush_tlb_mm(patching_mm);
> > + pte_unmap_unlock(patch_mapping->ptep, patch_mapping->ptl);
> > }
> >
> > static int do_patch_instruction(struct ppc_inst *addr, struct ppc_inst instr)
> > @@ -184,32 +136,36 @@ static int do_patch_instruction(struct ppc_inst *addr, struct ppc_inst instr)
> > int err;
> > struct ppc_inst *patch_addr = NULL;
> > unsigned long flags;
> > - unsigned long text_poke_addr;
> > - unsigned long kaddr = (unsigned long)addr;
> > + struct patch_mapping patch_mapping;
> >
> > /*
> > - * During early early boot patch_instruction is called
> > - * when text_poke_area is not ready, but we still need
> > - * to allow patching. We just do the plain old patching
> > + * The patching_mm is initialized before calling mark_rodata_ro. Prior
> > + * to this, patch_instruction is called when we don't have (and don't
> > + * need) the patching_mm so just do plain old patching.
> > */
> > - if (!this_cpu_read(text_poke_area))
> > + if (!patching_mm)
> > return raw_patch_instruction(addr, instr);
> >
> > local_irq_save(flags);
> >
> > - text_poke_addr = (unsigned long)__this_cpu_read(text_poke_area)->addr;
> > - if (map_patch_area(addr, text_poke_addr)) {
> > - err = -1;
> > + err = map_patch(addr, &patch_mapping);
> > + if (err)
> > goto out;
> > - }
> >
> > - patch_addr = (struct ppc_inst *)(text_poke_addr + (kaddr & ~PAGE_MASK));
> > + patch_addr = (struct ppc_inst *)(patching_addr | offset_in_page(addr));
> >
> > - __patch_instruction(addr, instr, patch_addr);
> > + if (!radix_enabled())
> > + allow_write_to_user(patch_addr, ppc_inst_len(instr));
> > + err = __patch_instruction(addr, instr, patch_addr);
> > + if (!radix_enabled())
> > + prevent_write_to_user(patch_addr, ppc_inst_len(instr));
> >
> > - err = unmap_patch_area(text_poke_addr);
> > - if (err)
> > - pr_warn("failed to unmap %lx\n", text_poke_addr);
> > + unmap_patch(&patch_mapping);
> > + /*
> > + * Something is wrong if what we just wrote doesn't match what we
> > + * think we just wrote.
> > + */
> > + WARN_ON(!ppc_inst_equal(ppc_inst_read(addr), instr));
> >
> > out:
> > local_irq_restore(flags);
> >
^ permalink raw reply
* Re: [PATCH v2 1/3] module: Rename module_alloc() to text_alloc() and move to kernel proper
From: Peter Zijlstra @ 2020-07-14 16:46 UTC (permalink / raw)
To: Jarkko Sakkinen
Cc: Catalin Marinas, Kefeng Wang, Paul Mackerras, Zong Li, Andi Kleen,
Paul Burton, Vincent Whitchurch, Petr Mladek, Brian Gerst,
Andy Lutomirski, Thomas Gleixner, Jiri Kosina, Anup Patel,
Linux Kernel Mailing List, Philipp Rudo, Torsten Duwe,
Masami Hiramatsu, Andrew Morton, Mark Rutland,
James E.J. Bottomley, Vincent Chen, Omar Sandoval, open list:S390,
Joe Lawrence, Helge Deller, John Fastabend, Anil S Keshavamurthy,
Yonghong Song, Iurii Zaikin, Andrii Nakryiko, Thomas Huth,
Vasily Gorbik, moderated list:ARM PORT, Daniel Axtens,
Damien Le Moal, Martin KaFai Lau, Song Liu, Paul Walmsley,
Heiko Carstens, Alexei Starovoitov, Atish Patra, Will Deacon,
Daniel Borkmann, Masahiro Yamada, Nayna Jain, Ley Foon Tan,
Christian Borntraeger, Sami Tolvanen, Naveen N. Rao, Mao Han,
Marco Elver, Steven Rostedt, Babu Moger, Borislav Petkov,
Greentime Hu, Ben Dooks, Guan Xuetao, Thomas Bogendoerfer,
open list:PARISC ARCHITECTURE, Jessica Yu,
open list:BPF JIT for MIPS (32-BIT AND 64-BIT), David S. Miller,
Thiago Jung Bauermann, David Howells,
open list:SPARC + UltraSPARC (sparc/sparc64), Sandipan Das,
H. Peter Anvin, Amit Daniel Kachhap, Tiezhu Yang, Miroslav Benes,
Jiri Olsa, Ard Biesheuvel, Vincenzo Frascino, Anders Roxell,
Sven Schnelle, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
Russell King, open list:RISC-V ARCHITECTURE, Mike Rapoport,
Ingo Molnar, Albert Ou, Paul E. McKenney, Josh Poimboeuf,
KP Singh, Dmitry Vyukov, Nick Hu,
open list:BPF JIT for MIPS (32-BIT AND 64-BIT), open list:MIPS,
Palmer Dabbelt, open list:LINUX FOR POWERPC (32-BIT AND 64-BIT)
In-Reply-To: <20200714163103.GA1472166@linux.intel.com>
On Tue, Jul 14, 2020 at 07:31:03PM +0300, Jarkko Sakkinen wrote:
> On Tue, Jul 14, 2020 at 03:01:09PM +0200, Peter Zijlstra wrote:
> > to help with text_alloc() usage in generic code, but I think
> > fundamentally, there's only these two options.
>
> There is one arch (nios2), which uses a regular kzalloc(). This means
> that both text_alloc() and text_memfree() need to be weaks symbols and
> nios2 needs to have overriding text.c to do its thing.
IMO nios2 is just broken.
^ permalink raw reply
* Re: [PATCH v2 1/3] module: Rename module_alloc() to text_alloc() and move to kernel proper
From: Russell King - ARM Linux admin @ 2020-07-14 16:42 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Catalin Marinas, Kefeng Wang, Paul Mackerras, Zong Li, Andi Kleen,
Paul Burton, Vincent Whitchurch, Petr Mladek, Brian Gerst,
Andy Lutomirski, Thomas Gleixner, Jiri Kosina, Anup Patel,
Linux Kernel Mailing List, Philipp Rudo, Torsten Duwe,
Masami Hiramatsu, Andrew Morton, Mark Rutland,
James E.J. Bottomley, Vincent Chen, Omar Sandoval, open list:S390,
Joe Lawrence, Helge Deller, John Fastabend, Anil S Keshavamurthy,
Yonghong Song, Iurii Zaikin, Andrii Nakryiko, Thomas Huth,
Vasily Gorbik, moderated list:ARM PORT, Daniel Axtens,
Damien Le Moal, Martin KaFai Lau, Song Liu, Paul Walmsley,
Heiko Carstens, Alexei Starovoitov, Jarkko Sakkinen, Atish Patra,
Will Deacon, Daniel Borkmann, Masahiro Yamada, Nayna Jain,
Ley Foon Tan, Christian Borntraeger, Sami Tolvanen, Naveen N. Rao,
Mao Han, Marco Elver, Steven Rostedt, Babu Moger, Borislav Petkov,
Greentime Hu, Ben Dooks, Guan Xuetao, Thomas Bogendoerfer,
open list:PARISC ARCHITECTURE, Jessica Yu,
open list:BPF JIT for MIPS (32-BIT AND 64-BIT), David S. Miller,
Thiago Jung Bauermann, David Howells,
open list:SPARC + UltraSPARC (sparc/sparc64), Sandipan Das,
H. Peter Anvin, Amit Daniel Kachhap, Tiezhu Yang, Miroslav Benes,
Jiri Olsa, Ard Biesheuvel, Vincenzo Frascino, Anders Roxell,
Sven Schnelle, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
open list:RISC-V ARCHITECTURE, Mike Rapoport, Ingo Molnar,
Albert Ou, Paul E. McKenney, Josh Poimboeuf, KP Singh,
Dmitry Vyukov, Nick Hu,
open list:BPF JIT for MIPS (32-BIT AND 64-BIT), open list:MIPS,
Palmer Dabbelt, open list:LINUX FOR POWERPC (32-BIT AND 64-BIT)
In-Reply-To: <20200714130109.GX10769@hirez.programming.kicks-ass.net>
On Tue, Jul 14, 2020 at 03:01:09PM +0200, Peter Zijlstra wrote:
> On Tue, Jul 14, 2020 at 03:19:24PM +0300, Ard Biesheuvel wrote:
> > So perhaps the answer is to have text_alloc() not with a 'where'
> > argument but with a 'why' argument. Or more simply, just have separate
> > alloc/free APIs for each case, with generic versions that can be
> > overridden by the architecture.
>
> Well, there only seem to be 2 cases here, either the pointer needs to
> fit in some immediate displacement, or not.
>
> On x86 we seem have the advantage of a fairly large immediate
> displacement as compared to many other architectures (due to our
> variable sized instructions). And thus have been fairly liberal with our
> usage of it (also our indirect jmps/calls suck, double so with
> RETCH-POLINE).
>
> Still, the indirect jump, as mentioned by Russel should work for
> arbitrarily placed code for us too.
>
>
> So I'm thinking that something like:
>
> enum ptr_type {
> immediate_displacement,
> absolute,
> };
>
> void *text_alloc(unsigned long size, enum ptr_type type)
> {
> unsigned long vstart = VMALLOC_START;
> unsigned long vend = VMALLOC_END;
>
> if (type == immediate_displacement) {
> vstart = MODULES_VADDR;
> vend = MODULES_END;
> }
>
> return __vmalloc_node_range(size, TEXT_ALIGN, vstart, vend,
> GFP_KERNEL, PAGE_KERNEL_EXEC, 0,
> NUMA_NO_NODE, _RET_IP_);
> }
>
> void text_free(void *ptr)
> {
> vfree(ptr);
> }
Beware, however, that on 32-bit ARM, if module PLTs are enabled,
we will try to place the module in the module region (which gives
best performance) but if that allocation fails, we will fall back
to placing it in the vmalloc region and using PLTs.
So, for a module allocation, we would need to make up to two calls
to text_alloc(), once with "immediate_displacement", and if that
fails and we have PLT support enabled, again with "absolute".
Hence, as other people have said, why module_alloc() would need to
stay separate.
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
^ permalink raw reply
* Re: [PATCH v2 1/3] module: Rename module_alloc() to text_alloc() and move to kernel proper
From: Jarkko Sakkinen @ 2020-07-14 16:37 UTC (permalink / raw)
To: Jessica Yu
Cc: Catalin Marinas, Kefeng Wang, Paul Mackerras, Zong Li, Andi Kleen,
Paul Burton, Vincent Whitchurch, Petr Mladek, Brian Gerst,
Andy Lutomirski, Thomas Gleixner, Jiri Kosina, Anup Patel,
linux-kernel, Philipp Rudo, Torsten Duwe, Masami Hiramatsu,
Andrew Morton, Mark Rutland, James E.J. Bottomley, Vincent Chen,
Omar Sandoval, open list:S390, Joe Lawrence, Helge Deller,
John Fastabend, Anil S Keshavamurthy, Yonghong Song, Iurii Zaikin,
Andrii Nakryiko, Thomas Huth, Vasily Gorbik,
moderated list:ARM PORT, Daniel Axtens, Damien Le Moal,
Martin KaFai Lau, Song Liu, Paul Walmsley, Heiko Carstens,
Alexei Starovoitov, Atish Patra, Will Deacon, Daniel Borkmann,
Masahiro Yamada, Nayna Jain, Ley Foon Tan, Christian Borntraeger,
Sami Tolvanen, Naveen N. Rao, Mao Han, Marco Elver,
Steven Rostedt, Babu Moger, Borislav Petkov, Greentime Hu,
Ben Dooks, Guan Xuetao, Thomas Bogendoerfer,
open list:PARISC ARCHITECTURE,
open list:BPF JIT for MIPS (32-BIT AND 64-BIT), David S. Miller,
Thiago Jung Bauermann, Peter Zijlstra, David Howells,
open list:SPARC + UltraSPARC (sparc/sparc64), Sandipan Das,
H. Peter Anvin, Amit Daniel Kachhap, Tiezhu Yang, Miroslav Benes,
Jiri Olsa, Ard Biesheuvel, Vincenzo Frascino, Anders Roxell,
Sven Schnelle, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
Russell King, open list:RISC-V ARCHITECTURE, Mike Rapoport,
Ingo Molnar, Albert Ou, Paul E. McKenney, Josh Poimboeuf,
KP Singh, Dmitry Vyukov, Nick Hu,
open list:BPF JIT for MIPS (32-BIT AND 64-BIT), open list:MIPS,
Palmer Dabbelt, open list:LINUX FOR POWERPC (32-BIT AND 64-BIT)
In-Reply-To: <20200714135651.GA27819@linux-8ccs>
On Tue, Jul 14, 2020 at 03:56:52PM +0200, Jessica Yu wrote:
> +++ Jarkko Sakkinen [14/07/20 12:45 +0300]:
> > Rename module_alloc() to text_alloc() and module_memfree() to
> > text_memfree(), and move them to kernel/text.c, which is unconditionally
> > compiled to the kernel proper. This allows kprobes, ftrace and bpf to
> > allocate space for executable code without requiring to compile the modules
> > support (CONFIG_MODULES=y) in.
> >
> > Cc: Andi Kleen <ak@linux.intel.com>
> > Suggested-by: Peter Zijlstra <peterz@infradead.org>
> > Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
>
> As Ard and Will have already explained, the major issue I'm having
> with this is that we're taking module_alloc(), an allocator that was
> originally specific to module loading, and turning it into a generic
> interface to be used by other subsystems. You're pulling in all the
> module loading semantics that vary by architecture and expecting it to
> work as a generic text allocator. I'm not against the existence of a
> generic text_alloc() but I would very much rather that module_alloc()
> be left alone to the module loader and instead work on introducing a
> *separate* generic text_alloc() interface that would work for its
> intended users (kprobes, bpf, etc) and have existing users of
> module_alloc() switch to that instead.
>
> Jessica
This is kind of patch set where you do not have any other chances than
to get it wrong in the first time, so I just did something that flys
in my environment. At the same time I believe that taking the bound
out of tracing and module loading is a generally accepted idea.
I'm refining the next version with CONFIG_HAS_TEXT_ALLOC, which I
explained in more details in my earlier response to this thread.
/Jarkko
^ permalink raw reply
* Re: [PATCH v2 1/3] module: Rename module_alloc() to text_alloc() and move to kernel proper
From: Jarkko Sakkinen @ 2020-07-14 16:31 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Catalin Marinas, Kefeng Wang, Paul Mackerras, Zong Li, Andi Kleen,
Paul Burton, Vincent Whitchurch, Petr Mladek, Brian Gerst,
Andy Lutomirski, Thomas Gleixner, Jiri Kosina, Anup Patel,
Linux Kernel Mailing List, Philipp Rudo, Torsten Duwe,
Masami Hiramatsu, Andrew Morton, Mark Rutland,
James E.J. Bottomley, Vincent Chen, Omar Sandoval, open list:S390,
Joe Lawrence, Helge Deller, John Fastabend, Anil S Keshavamurthy,
Yonghong Song, Iurii Zaikin, Andrii Nakryiko, Thomas Huth,
Vasily Gorbik, moderated list:ARM PORT, Daniel Axtens,
Damien Le Moal, Martin KaFai Lau, Song Liu, Paul Walmsley,
Heiko Carstens, Alexei Starovoitov, Atish Patra, Will Deacon,
Daniel Borkmann, Masahiro Yamada, Nayna Jain, Ley Foon Tan,
Christian Borntraeger, Sami Tolvanen, Naveen N. Rao, Mao Han,
Marco Elver, Steven Rostedt, Babu Moger, Borislav Petkov,
Greentime Hu, Ben Dooks, Guan Xuetao, Thomas Bogendoerfer,
open list:PARISC ARCHITECTURE, Jessica Yu,
open list:BPF JIT for MIPS (32-BIT AND 64-BIT), David S. Miller,
Thiago Jung Bauermann, David Howells,
open list:SPARC + UltraSPARC (sparc/sparc64), Sandipan Das,
H. Peter Anvin, Amit Daniel Kachhap, Tiezhu Yang, Miroslav Benes,
Jiri Olsa, Ard Biesheuvel, Vincenzo Frascino, Anders Roxell,
Sven Schnelle, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
Russell King, open list:RISC-V ARCHITECTURE, Mike Rapoport,
Ingo Molnar, Albert Ou, Paul E. McKenney, Josh Poimboeuf,
KP Singh, Dmitry Vyukov, Nick Hu,
open list:BPF JIT for MIPS (32-BIT AND 64-BIT), open list:MIPS,
Palmer Dabbelt, open list:LINUX FOR POWERPC (32-BIT AND 64-BIT)
In-Reply-To: <20200714130109.GX10769@hirez.programming.kicks-ass.net>
On Tue, Jul 14, 2020 at 03:01:09PM +0200, Peter Zijlstra wrote:
> On Tue, Jul 14, 2020 at 03:19:24PM +0300, Ard Biesheuvel wrote:
> > So perhaps the answer is to have text_alloc() not with a 'where'
> > argument but with a 'why' argument. Or more simply, just have separate
> > alloc/free APIs for each case, with generic versions that can be
> > overridden by the architecture.
>
> Well, there only seem to be 2 cases here, either the pointer needs to
> fit in some immediate displacement, or not.
>
> On x86 we seem have the advantage of a fairly large immediate
> displacement as compared to many other architectures (due to our
> variable sized instructions). And thus have been fairly liberal with our
> usage of it (also our indirect jmps/calls suck, double so with
> RETCH-POLINE).
>
> Still, the indirect jump, as mentioned by Russel should work for
> arbitrarily placed code for us too.
>
>
> So I'm thinking that something like:
>
> enum ptr_type {
> immediate_displacement,
> absolute,
> };
>
> void *text_alloc(unsigned long size, enum ptr_type type)
> {
> unsigned long vstart = VMALLOC_START;
> unsigned long vend = VMALLOC_END;
>
> if (type == immediate_displacement) {
> vstart = MODULES_VADDR;
> vend = MODULES_END;
> }
>
> return __vmalloc_node_range(size, TEXT_ALIGN, vstart, vend,
> GFP_KERNEL, PAGE_KERNEL_EXEC, 0,
> NUMA_NO_NODE, _RET_IP_);
> }
>
> void text_free(void *ptr)
> {
> vfree(ptr);
> }
>
> Should work for all cases. Yes, we might then want something like a per
> arch:
>
> {BPF,FTRACE,KPROBE}_TEXT_TYPE
>
> to help with text_alloc() usage in generic code, but I think
> fundamentally, there's only these two options.
There is one arch (nios2), which uses a regular kzalloc(). This means
that both text_alloc() and text_memfree() need to be weaks symbols and
nios2 needs to have overriding text.c to do its thing.
/Jarkko
^ permalink raw reply
* Re: [PATCH v2 1/3] module: Rename module_alloc() to text_alloc() and move to kernel proper
From: Steven Rostedt @ 2020-07-14 15:44 UTC (permalink / raw)
To: Jessica Yu
Cc: Catalin Marinas, Kefeng Wang, Paul Mackerras, Zong Li, Andi Kleen,
Paul Burton, Vincent Whitchurch, Petr Mladek, Brian Gerst,
Andy Lutomirski, Thomas Gleixner, Jiri Kosina, Anup Patel,
linux-kernel, Philipp Rudo, Torsten Duwe, Masami Hiramatsu,
Andrew Morton, Mark Rutland, James E.J. Bottomley, Vincent Chen,
Omar Sandoval, open list:S390, Joe Lawrence, Helge Deller,
John Fastabend, Anil S Keshavamurthy, Yonghong Song, Iurii Zaikin,
Andrii Nakryiko, Thomas Huth, Vasily Gorbik,
moderated list:ARM PORT, Daniel Axtens, Damien Le Moal,
Martin KaFai Lau, Song Liu, Paul Walmsley, Heiko Carstens,
Alexei Starovoitov, Jarkko Sakkinen, Atish Patra, Will Deacon,
Daniel Borkmann, Masahiro Yamada, Nayna Jain, Ley Foon Tan,
Christian Borntraeger, Sami Tolvanen, Naveen N. Rao, Mao Han,
Marco Elver, Babu Moger, Borislav Petkov, Greentime Hu, Ben Dooks,
Guan Xuetao, Thomas Bogendoerfer, open list:PARISC ARCHITECTURE,
open list:BPF JIT for MIPS (32-BIT AND 64-BIT), David S. Miller,
Thiago Jung Bauermann, Peter Zijlstra, David Howells,
open list:SPARC + UltraSPARC (sparc/sparc64), Sandipan Das,
H. Peter Anvin, Amit Daniel Kachhap, Tiezhu Yang, Miroslav Benes,
Jiri Olsa, Ard Biesheuvel, Vincenzo Frascino, Anders Roxell,
Sven Schnelle, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
Russell King, open list:RISC-V ARCHITECTURE, Mike Rapoport,
Ingo Molnar, Albert Ou, Paul E. McKenney, Josh Poimboeuf,
KP Singh, Dmitry Vyukov, Nick Hu,
open list:BPF JIT for MIPS (32-BIT AND 64-BIT), open list:MIPS,
Palmer Dabbelt, open list:LINUX FOR POWERPC (32-BIT AND 64-BIT)
In-Reply-To: <20200714135651.GA27819@linux-8ccs>
On Tue, 14 Jul 2020 15:56:52 +0200
Jessica Yu <jeyu@kernel.org> wrote:
> As Ard and Will have already explained, the major issue I'm having
> with this is that we're taking module_alloc(), an allocator that was
> originally specific to module loading, and turning it into a generic
> interface to be used by other subsystems. You're pulling in all the
> module loading semantics that vary by architecture and expecting it to
> work as a generic text allocator. I'm not against the existence of a
> generic text_alloc() but I would very much rather that module_alloc()
> be left alone to the module loader and instead work on introducing a
> *separate* generic text_alloc() interface that would work for its
> intended users (kprobes, bpf, etc) and have existing users of
> module_alloc() switch to that instead.
Looks like the consensus is to create a separate text_alloc() that can
be used when modules are not configured in for things like ftrace
dynamic trampolines, and keep module_alloc() untouched.
For those concerned about added unused code for architectures that
don't need text_alloc(), we can always create a config called:
CONFIG_ARCH_NEED_TEXT_ALLOC, and the arch can add that to its list of
kconfigs if it intends to use text_alloc().
-- Steve
^ permalink raw reply
* Re: [PATCH v2 1/3] module: Rename module_alloc() to text_alloc() and move to kernel proper
From: Steven Rostedt @ 2020-07-14 14:03 UTC (permalink / raw)
To: Mark Rutland
Cc: Catalin Marinas, Kefeng Wang, Paul Mackerras, Zong Li, Andi Kleen,
Paul Burton, Vincent Whitchurch, Petr Mladek, Brian Gerst,
Andy Lutomirski, Thomas Gleixner, Jiri Kosina, Anup Patel,
Linux Kernel Mailing List, Philipp Rudo, Torsten Duwe,
Masami Hiramatsu, Andrew Morton, James E.J. Bottomley,
Vincent Chen, Omar Sandoval, open list:S390, Joe Lawrence,
Helge Deller, John Fastabend, Anil S Keshavamurthy, Yonghong Song,
Iurii Zaikin, Andrii Nakryiko, Thomas Huth, Vasily Gorbik,
moderated list:ARM PORT, Daniel Axtens, Damien Le Moal,
Martin KaFai Lau, Song Liu, Paul Walmsley, Heiko Carstens,
Alexei Starovoitov, Jarkko Sakkinen, Atish Patra, Will Deacon,
Daniel Borkmann, Masahiro Yamada, Nayna Jain, Ley Foon Tan,
Christian Borntraeger, Sami Tolvanen, Naveen N. Rao, Mao Han,
Marco Elver, Babu Moger, Borislav Petkov, Greentime Hu, Ben Dooks,
Guan Xuetao, Thomas Bogendoerfer, open list:PARISC ARCHITECTURE,
Jessica Yu, open list:BPF JIT for MIPS (32-BIT AND 64-BIT),
David S. Miller, Thiago Jung Bauermann, Peter Zijlstra,
David Howells, open list:SPARC + UltraSPARC (sparc/sparc64),
Sandipan Das, H. Peter Anvin, Amit Daniel Kachhap, Tiezhu Yang,
Miroslav Benes, Jiri Olsa, Ard Biesheuvel, Vincenzo Frascino,
Anders Roxell, Sven Schnelle,
maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT), Russell King,
open list:RISC-V ARCHITECTURE, Mike Rapoport, Ingo Molnar,
Albert Ou, Paul E. McKenney, Josh Poimboeuf, KP Singh,
Dmitry Vyukov, Nick Hu,
open list:BPF JIT for MIPS (32-BIT AND 64-BIT), open list:MIPS,
Palmer Dabbelt, open list:LINUX FOR POWERPC (32-BIT AND 64-BIT)
In-Reply-To: <20200714133314.GA67386@C02TD0UTHF1T.local>
On Tue, 14 Jul 2020 14:33:14 +0100
Mark Rutland <mark.rutland@arm.com> wrote:
> > Should work for all cases. Yes, we might then want something like a per
> > arch:
> >
> > {BPF,FTRACE,KPROBE}_TEXT_TYPE
>
> ... at that point why not:
>
> text_alloc_ftrace();
> text_alloc_module();
> text_alloc_bpf();
> text_alloc_kprobe();
I don't know about bpf and kprobes, but for ftrace, the only place that
it allocates text happens to be in arch specific code.
If you want something special for ftrace, you could just add your own
function. But for x86, a text_alloc_immediate() would work.
(BTW, I like the function names over the enums)
-- Steve
^ permalink raw reply
* Re: [PATCH v2 1/3] module: Rename module_alloc() to text_alloc() and move to kernel proper
From: Jessica Yu @ 2020-07-14 13:56 UTC (permalink / raw)
To: Jarkko Sakkinen
Cc: Catalin Marinas, Kefeng Wang, Paul Mackerras, Zong Li, Andi Kleen,
Paul Burton, Vincent Whitchurch, Petr Mladek, Brian Gerst,
Andy Lutomirski, Thomas Gleixner, Jiri Kosina, Anup Patel,
linux-kernel, Philipp Rudo, Torsten Duwe, Masami Hiramatsu,
Andrew Morton, Mark Rutland, James E.J. Bottomley, Vincent Chen,
Omar Sandoval, open list:S390, Joe Lawrence, Helge Deller,
John Fastabend, Anil S Keshavamurthy, Yonghong Song, Iurii Zaikin,
Andrii Nakryiko, Thomas Huth, Vasily Gorbik,
moderated list:ARM PORT, Daniel Axtens, Damien Le Moal,
Martin KaFai Lau, Song Liu, Paul Walmsley, Heiko Carstens,
Alexei Starovoitov, Atish Patra, Will Deacon, Daniel Borkmann,
Masahiro Yamada, Nayna Jain, Ley Foon Tan, Christian Borntraeger,
Sami Tolvanen, Naveen N. Rao, Mao Han, Marco Elver,
Steven Rostedt, Babu Moger, Borislav Petkov, Greentime Hu,
Ben Dooks, Guan Xuetao, Thomas Bogendoerfer,
open list:PARISC ARCHITECTURE,
open list:BPF JIT for MIPS (32-BIT AND 64-BIT), David S. Miller,
Thiago Jung Bauermann, Peter Zijlstra, David Howells,
open list:SPARC + UltraSPARC (sparc/sparc64), Sandipan Das,
H. Peter Anvin, Amit Daniel Kachhap, Tiezhu Yang, Miroslav Benes,
Jiri Olsa, Ard Biesheuvel, Vincenzo Frascino, Anders Roxell,
Sven Schnelle, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
Russell King, open list:RISC-V ARCHITECTURE, Mike Rapoport,
Ingo Molnar, Albert Ou, Paul E. McKenney, Josh Poimboeuf,
KP Singh, Dmitry Vyukov, Nick Hu,
open list:BPF JIT for MIPS (32-BIT AND 64-BIT), open list:MIPS,
Palmer Dabbelt, open list:LINUX FOR POWERPC (32-BIT AND 64-BIT)
In-Reply-To: <20200714094625.1443261-2-jarkko.sakkinen@linux.intel.com>
+++ Jarkko Sakkinen [14/07/20 12:45 +0300]:
>Rename module_alloc() to text_alloc() and module_memfree() to
>text_memfree(), and move them to kernel/text.c, which is unconditionally
>compiled to the kernel proper. This allows kprobes, ftrace and bpf to
>allocate space for executable code without requiring to compile the modules
>support (CONFIG_MODULES=y) in.
>
>Cc: Andi Kleen <ak@linux.intel.com>
>Suggested-by: Peter Zijlstra <peterz@infradead.org>
>Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
As Ard and Will have already explained, the major issue I'm having
with this is that we're taking module_alloc(), an allocator that was
originally specific to module loading, and turning it into a generic
interface to be used by other subsystems. You're pulling in all the
module loading semantics that vary by architecture and expecting it to
work as a generic text allocator. I'm not against the existence of a
generic text_alloc() but I would very much rather that module_alloc()
be left alone to the module loader and instead work on introducing a
*separate* generic text_alloc() interface that would work for its
intended users (kprobes, bpf, etc) and have existing users of
module_alloc() switch to that instead.
Jessica
^ permalink raw reply
* Re: [PATCH v2 1/3] module: Rename module_alloc() to text_alloc() and move to kernel proper
From: Ard Biesheuvel @ 2020-07-14 13:47 UTC (permalink / raw)
To: Mark Rutland
Cc: Catalin Marinas, Kefeng Wang, Paul Mackerras, Zong Li, Andi Kleen,
Paul Burton, Vincent Whitchurch, Petr Mladek, Brian Gerst,
Andy Lutomirski, Thomas Gleixner, Jiri Kosina, Anup Patel,
Linux Kernel Mailing List, Philipp Rudo, Torsten Duwe,
Masami Hiramatsu, Andrew Morton, James E.J. Bottomley,
Vincent Chen, Omar Sandoval, open list:S390, Joe Lawrence,
Helge Deller, John Fastabend, Anil S Keshavamurthy, Yonghong Song,
Iurii Zaikin, Andrii Nakryiko, Thomas Huth, Vasily Gorbik,
moderated list:ARM PORT, Daniel Axtens, Damien Le Moal,
Martin KaFai Lau, Song Liu, Paul Walmsley, Heiko Carstens,
Alexei Starovoitov, Jarkko Sakkinen, Atish Patra, Will Deacon,
Daniel Borkmann, Masahiro Yamada, Nayna Jain, Ley Foon Tan,
Christian Borntraeger, Sami Tolvanen, Naveen N. Rao, Mao Han,
Marco Elver, Steven Rostedt, Babu Moger, Borislav Petkov,
Greentime Hu, Ben Dooks, Guan Xuetao, Thomas Bogendoerfer,
open list:PARISC ARCHITECTURE, Jessica Yu,
open list:BPF JIT for MIPS (32-BIT AND 64-BIT), David S. Miller,
Thiago Jung Bauermann, Peter Zijlstra, David Howells,
open list:SPARC + UltraSPARC (sparc/sparc64), Sandipan Das,
H. Peter Anvin, Amit Daniel Kachhap, Tiezhu Yang, Miroslav Benes,
Jiri Olsa, open list:RISC-V ARCHITECTURE, Vincenzo Frascino,
Anders Roxell, Sven Schnelle,
maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT), Russell King,
Mike Rapoport, Ingo Molnar, Albert Ou, Paul E. McKenney,
Josh Poimboeuf, KP Singh, Dmitry Vyukov, Nick Hu,
open list:BPF JIT for MIPS (32-BIT AND 64-BIT), open list:MIPS,
Palmer Dabbelt, open list:LINUX FOR POWERPC (32-BIT AND 64-BIT)
In-Reply-To: <20200714133314.GA67386@C02TD0UTHF1T.local>
On Tue, 14 Jul 2020 at 16:33, Mark Rutland <mark.rutland@arm.com> wrote:
>
> On Tue, Jul 14, 2020 at 03:01:09PM +0200, Peter Zijlstra wrote:
> > On Tue, Jul 14, 2020 at 03:19:24PM +0300, Ard Biesheuvel wrote:
> > > So perhaps the answer is to have text_alloc() not with a 'where'
> > > argument but with a 'why' argument. Or more simply, just have separate
> > > alloc/free APIs for each case, with generic versions that can be
> > > overridden by the architecture.
> >
> > Well, there only seem to be 2 cases here, either the pointer needs to
> > fit in some immediate displacement, or not.
>
> On some arches you have a few choices for immediates depending on
> compiler options, e.g. on arm64:
>
> * +/- 128M with B
> * +/-4G with ADRP+ADD+BR
> * +/- 48/64 bits with a series of MOVK* + BR
>
> ... and you might build core kernel one way and modules another, and
> either could depend on configuration.
>
> > On x86 we seem have the advantage of a fairly large immediate
> > displacement as compared to many other architectures (due to our
> > variable sized instructions). And thus have been fairly liberal with our
> > usage of it (also our indirect jmps/calls suck, double so with
> > RETCH-POLINE).
> >
> > Still, the indirect jump, as mentioned by Russel should work for
> > arbitrarily placed code for us too.
> >
> >
> > So I'm thinking that something like:
> >
> > enum ptr_type {
> > immediate_displacement,
> > absolute,
> > };
> >
> > void *text_alloc(unsigned long size, enum ptr_type type)
> > {
> > unsigned long vstart = VMALLOC_START;
> > unsigned long vend = VMALLOC_END;
> >
> > if (type == immediate_displacement) {
> > vstart = MODULES_VADDR;
> > vend = MODULES_END;
> > }
> >
> > return __vmalloc_node_range(size, TEXT_ALIGN, vstart, vend,
> > GFP_KERNEL, PAGE_KERNEL_EXEC, 0,
> > NUMA_NO_NODE, _RET_IP_);
> > }
> >
> > void text_free(void *ptr)
> > {
> > vfree(ptr);
> > }
>
> I think it'd be easier to read with separate functions, e.g.
>
> text_alloc_imm_offset(unsigned long size);
> text_alloc_absolute(unsigned long size);
>
On arm64, we have a 128M window close to the core kernel for modules,
and a separate 128m window for bpf programs, which are kept in
relative branching range of each other, but could be far away from
kernel+modules, and so having 'close' and 'far' as the only
distinction is insufficient.
> > Should work for all cases. Yes, we might then want something like a per
> > arch:
> >
> > {BPF,FTRACE,KPROBE}_TEXT_TYPE
>
> ... at that point why not:
>
> text_alloc_ftrace();
> text_alloc_module();
> text_alloc_bpf();
> text_alloc_kprobe();
>
> ... etc which an arch can alias however it wants? e.g. x86 can have
> those all go to a common text_alloc_generic(), and that could even be a
> generic option for arches that don't care to distinguish these cases.
>
That is basically what i meant with separate alloc/free APIs, which i
think is the sanest approach here.
> Then if there are new places that want to allocate text we have to
> consider their requirements when adding them, too.
>
> Thanks,
> Mark.
^ permalink raw reply
* Re: [PATCH v2 1/3] module: Rename module_alloc() to text_alloc() and move to kernel proper
From: Mark Rutland @ 2020-07-14 13:33 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Catalin Marinas, Kefeng Wang, Paul Mackerras, Zong Li, Andi Kleen,
Paul Burton, Vincent Whitchurch, Petr Mladek, Brian Gerst,
Andy Lutomirski, Thomas Gleixner, Jiri Kosina, Anup Patel,
Linux Kernel Mailing List, Philipp Rudo, Torsten Duwe,
Masami Hiramatsu, Andrew Morton, James E.J. Bottomley,
Vincent Chen, Omar Sandoval, open list:S390, Joe Lawrence,
Helge Deller, John Fastabend, Anil S Keshavamurthy, Yonghong Song,
Iurii Zaikin, Andrii Nakryiko, Thomas Huth, Vasily Gorbik,
moderated list:ARM PORT, Daniel Axtens, Damien Le Moal,
Martin KaFai Lau, Song Liu, Paul Walmsley, Heiko Carstens,
Alexei Starovoitov, Jarkko Sakkinen, Atish Patra, Will Deacon,
Daniel Borkmann, Masahiro Yamada, Nayna Jain, Ley Foon Tan,
Christian Borntraeger, Sami Tolvanen, Naveen N. Rao, Mao Han,
Marco Elver, Steven Rostedt, Babu Moger, Borislav Petkov,
Greentime Hu, Ben Dooks, Guan Xuetao, Thomas Bogendoerfer,
open list:PARISC ARCHITECTURE, Jessica Yu,
open list:BPF JIT for MIPS (32-BIT AND 64-BIT), David S. Miller,
Thiago Jung Bauermann, David Howells,
open list:SPARC + UltraSPARC (sparc/sparc64), Sandipan Das,
H. Peter Anvin, Amit Daniel Kachhap, Tiezhu Yang, Miroslav Benes,
Jiri Olsa, Ard Biesheuvel, Vincenzo Frascino, Anders Roxell,
Sven Schnelle, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
Russell King, open list:RISC-V ARCHITECTURE, Mike Rapoport,
Ingo Molnar, Albert Ou, Paul E. McKenney, Josh Poimboeuf,
KP Singh, Dmitry Vyukov, Nick Hu,
open list:BPF JIT for MIPS (32-BIT AND 64-BIT), open list:MIPS,
Palmer Dabbelt, open list:LINUX FOR POWERPC (32-BIT AND 64-BIT)
In-Reply-To: <20200714130109.GX10769@hirez.programming.kicks-ass.net>
On Tue, Jul 14, 2020 at 03:01:09PM +0200, Peter Zijlstra wrote:
> On Tue, Jul 14, 2020 at 03:19:24PM +0300, Ard Biesheuvel wrote:
> > So perhaps the answer is to have text_alloc() not with a 'where'
> > argument but with a 'why' argument. Or more simply, just have separate
> > alloc/free APIs for each case, with generic versions that can be
> > overridden by the architecture.
>
> Well, there only seem to be 2 cases here, either the pointer needs to
> fit in some immediate displacement, or not.
On some arches you have a few choices for immediates depending on
compiler options, e.g. on arm64:
* +/- 128M with B
* +/-4G with ADRP+ADD+BR
* +/- 48/64 bits with a series of MOVK* + BR
... and you might build core kernel one way and modules another, and
either could depend on configuration.
> On x86 we seem have the advantage of a fairly large immediate
> displacement as compared to many other architectures (due to our
> variable sized instructions). And thus have been fairly liberal with our
> usage of it (also our indirect jmps/calls suck, double so with
> RETCH-POLINE).
>
> Still, the indirect jump, as mentioned by Russel should work for
> arbitrarily placed code for us too.
>
>
> So I'm thinking that something like:
>
> enum ptr_type {
> immediate_displacement,
> absolute,
> };
>
> void *text_alloc(unsigned long size, enum ptr_type type)
> {
> unsigned long vstart = VMALLOC_START;
> unsigned long vend = VMALLOC_END;
>
> if (type == immediate_displacement) {
> vstart = MODULES_VADDR;
> vend = MODULES_END;
> }
>
> return __vmalloc_node_range(size, TEXT_ALIGN, vstart, vend,
> GFP_KERNEL, PAGE_KERNEL_EXEC, 0,
> NUMA_NO_NODE, _RET_IP_);
> }
>
> void text_free(void *ptr)
> {
> vfree(ptr);
> }
I think it'd be easier to read with separate functions, e.g.
text_alloc_imm_offset(unsigned long size);
text_alloc_absolute(unsigned long size);
> Should work for all cases. Yes, we might then want something like a per
> arch:
>
> {BPF,FTRACE,KPROBE}_TEXT_TYPE
... at that point why not:
text_alloc_ftrace();
text_alloc_module();
text_alloc_bpf();
text_alloc_kprobe();
... etc which an arch can alias however it wants? e.g. x86 can have
those all go to a common text_alloc_generic(), and that could even be a
generic option for arches that don't care to distinguish these cases.
Then if there are new places that want to allocate text we have to
consider their requirements when adding them, too.
Thanks,
Mark.
^ permalink raw reply
* Re: [PATCH v2 1/3] module: Rename module_alloc() to text_alloc() and move to kernel proper
From: Peter Zijlstra @ 2020-07-14 13:01 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: Catalin Marinas, Kefeng Wang, Paul Mackerras, Zong Li, Andi Kleen,
Paul Burton, Vincent Whitchurch, Petr Mladek, Brian Gerst,
Andy Lutomirski, Thomas Gleixner, Jiri Kosina, Anup Patel,
Linux Kernel Mailing List, Philipp Rudo, Torsten Duwe,
Masami Hiramatsu, Andrew Morton, Mark Rutland,
James E.J. Bottomley, Vincent Chen, Omar Sandoval, open list:S390,
Joe Lawrence, Helge Deller, John Fastabend, Anil S Keshavamurthy,
Yonghong Song, Iurii Zaikin, Andrii Nakryiko, Thomas Huth,
Vasily Gorbik, moderated list:ARM PORT, Daniel Axtens,
Damien Le Moal, Martin KaFai Lau, Song Liu, Paul Walmsley,
Heiko Carstens, Alexei Starovoitov, Jarkko Sakkinen, Atish Patra,
Will Deacon, Daniel Borkmann, Masahiro Yamada, Nayna Jain,
Ley Foon Tan, Christian Borntraeger, Sami Tolvanen, Naveen N. Rao,
Mao Han, Marco Elver, Steven Rostedt, Babu Moger, Borislav Petkov,
Greentime Hu, Ben Dooks, Guan Xuetao, Thomas Bogendoerfer,
open list:PARISC ARCHITECTURE, Jessica Yu,
open list:BPF JIT for MIPS (32-BIT AND 64-BIT), David S. Miller,
Thiago Jung Bauermann, David Howells,
open list:SPARC + UltraSPARC (sparc/sparc64), Sandipan Das,
H. Peter Anvin, Amit Daniel Kachhap, Tiezhu Yang, Miroslav Benes,
Jiri Olsa, open list:RISC-V ARCHITECTURE, Vincenzo Frascino,
Anders Roxell, Sven Schnelle,
maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT), Russell King,
Mike Rapoport, Ingo Molnar, Albert Ou, Paul E. McKenney,
Josh Poimboeuf, KP Singh, Dmitry Vyukov, Nick Hu,
open list:BPF JIT for MIPS (32-BIT AND 64-BIT), open list:MIPS,
Palmer Dabbelt, open list:LINUX FOR POWERPC (32-BIT AND 64-BIT)
In-Reply-To: <CAMj1kXGG4vxWrp1de1FxdU=8F4Jof00=T1x-7e+BW7_HP-oZMQ@mail.gmail.com>
On Tue, Jul 14, 2020 at 03:19:24PM +0300, Ard Biesheuvel wrote:
> So perhaps the answer is to have text_alloc() not with a 'where'
> argument but with a 'why' argument. Or more simply, just have separate
> alloc/free APIs for each case, with generic versions that can be
> overridden by the architecture.
Well, there only seem to be 2 cases here, either the pointer needs to
fit in some immediate displacement, or not.
On x86 we seem have the advantage of a fairly large immediate
displacement as compared to many other architectures (due to our
variable sized instructions). And thus have been fairly liberal with our
usage of it (also our indirect jmps/calls suck, double so with
RETCH-POLINE).
Still, the indirect jump, as mentioned by Russel should work for
arbitrarily placed code for us too.
So I'm thinking that something like:
enum ptr_type {
immediate_displacement,
absolute,
};
void *text_alloc(unsigned long size, enum ptr_type type)
{
unsigned long vstart = VMALLOC_START;
unsigned long vend = VMALLOC_END;
if (type == immediate_displacement) {
vstart = MODULES_VADDR;
vend = MODULES_END;
}
return __vmalloc_node_range(size, TEXT_ALIGN, vstart, vend,
GFP_KERNEL, PAGE_KERNEL_EXEC, 0,
NUMA_NO_NODE, _RET_IP_);
}
void text_free(void *ptr)
{
vfree(ptr);
}
Should work for all cases. Yes, we might then want something like a per
arch:
{BPF,FTRACE,KPROBE}_TEXT_TYPE
to help with text_alloc() usage in generic code, but I think
fundamentally, there's only these two options.
^ permalink raw reply
* Re: [RFC PATCH 00/35] Move all PCIBIOS* definitions into arch/x86
From: Bjorn Helgaas @ 2020-07-14 18:45 UTC (permalink / raw)
To: Arnd Bergmann
Cc: Keith Busch, Paul Mackerras, sparclinux, Toan Le, Greg Ungerer,
Marek Vasut, Rob Herring, Lorenzo Pieralisi, Sagi Grimberg,
Russell King, Ley Foon Tan, Christoph Hellwig, Geert Uytterhoeven,
Kevin Hilman, linux-pci, Jakub Kicinski, Matt Turner,
linux-kernel-mentees, Guenter Roeck, Ray Jui, Jens Axboe,
Ivan Kokshaysky, Shuah Khan, bjorn, Boris Ostrovsky,
Richard Henderson, Juergen Gross, Bjorn Helgaas,
Thomas Bogendoerfer, Scott Branden, Jingoo Han,
Saheed O. Bolarinwa, linux-kernel@vger.kernel.org, Philipp Zabel,
Greg Kroah-Hartman, Gustavo Pimentel, linuxppc-dev,
David S. Miller, Heiner Kallweit
In-Reply-To: <CAK8P3a3NWSZw6678k1O2eJ6-c5GuW7484PRvEzU9MEPPrCD-yw@mail.gmail.com>
[trimmed the cc list; it's still too large but maybe arch folks care]
On Mon, Jul 13, 2020 at 05:08:10PM +0200, Arnd Bergmann wrote:
> On Mon, Jul 13, 2020 at 3:22 PM Saheed O. Bolarinwa
> <refactormyself@gmail.com> wrote:
> > This goal of these series is to move the definition of *all*
> > PCIBIOS* from include/linux/pci.h to arch/x86 and limit their use
> > within there. All other tree specific definition will be left for
> > intact. Maybe they can be renamed.
> >
> > PCIBIOS* is an x86 concept as defined by the PCI spec. The
> > returned error codes of PCIBIOS* are positive values and this
> > introduces some complexities which other archs need not incur.
>
> I think the intention is good, but I find the series in its current
> form very hard to review, in particular the way you touch some
> functions three times with trivial changes. Instead of
>
> 1) replace PCIBIOS_SUCCESSFUL with 0
> 2) drop pointless 0-comparison
> 3) reformat whitespace
>
> I would suggest to combine the first two steps into one patch per
> subsystem and drop the third step.
I agree. BUT please don't just run out and post new patches to do
this. Let's talk about Arnd's further ideas below first.
> ...
> Maybe the work can be split up differently, with a similar end
> result but fewer and easier reviewed patches. The way I'd look at
> the problem, there are three main areas that can be dealt with one
> at a time:
>
> a) callers of the high-level config space accessors
> pci_{write,read}_config_{byte,word,dword}, mostly in device
> drivers.
> b) low-level implementation of the config space accessors
> through struct pci_ops
> c) all other occurrences of these constants
>
> Starting with a), my first question is whether any high-level
> drivers even need to care about errors from these functions. I see
> 4913 callers that ignore the return code, and 576 that actually
> check it, and almost none care about the specific error (as you
> found as well). Unless we conclude that most PCI drivers are wrong,
> could we just change the return type to 'void' and assume they never
> fail for valid arguments on a valid pci_device* ?
I really like this idea.
pci_write_config_*() has one return value, and only 100ish of 2500
callers check for errors. It's sometimes possible for config
accessors to detect PCI errors and return failure, e.g., device was
removed or didn't respond, but most of them don't, and detecting these
errors is not really that valuable.
pci_read_config_*() is much more interesting because it returns two
things, the function return value and the value read from the PCI
device, and it's complicated to check both.
Again it's sometimes possible for config read accessors to detect PCI
errors, but in most cases a PCI error means the accessor returns
success and the value from PCI is ~0.
Checking the function return value catches programming errors (bad
alignment, etc) but misses most of the interesting errors (device was
unplugged or reported a PCI error).
Checking the value returned from PCI is tricky because ~0 is a valid
value for some config registers, and only the driver knows for sure.
If the driver knows that ~0 is a possible value, it would have to do
something else, e.g., another config read of a register that *cannot*
be ~0, to see whether it's really an error.
I suspect that if we had a single value to look at it would be easier
to get right. Error checking with current interface would look like
this:
err = pci_read_config_word(dev, addr, &val);
if (err)
return -EINVAL;
if (PCI_POSSIBLE_ERROR(val)) {
/* if driver knows ~0 is invalid */
return -EINVAL;
/* if ~0 is potentially a valid value */
err = pci_read_config_word(dev, PCI_VENDOR_ID, &val2);
if (err)
return -EINVAL;
if (PCI_POSSIBLE_ERROR(val2))
return -EINVAL;
}
Error checking with a possible interface that returned only a single
value could look like this:
val = pci_config_read_word(dev, addr);
if (PCI_POSSIBLE_ERROR(val)) {
/* if driver knows ~0 is invalid */
return -EINVAL;
/* if ~0 is potentially a valid value */
val2 = pci_config_read_word(dev, PCI_VENDOR_ID);
if (PCI_POSSIBLE_ERROR(val2))
return -EINVAL;
}
Am I understanding you correctly?
> For b), it might be nice to also change other aspects of the
> interface, e.g. passing a pci_host_bridge pointer plus bus number
> instead of a pci_bus pointer, or having the callback in the
> pci_host_bridge structure.
I like this idea a lot, too. I think the fact that
pci_bus_read_config_word() requires a pci_bus * complicates things in
a few places.
I think it's completely separate, as you say, and we should defer it
for now because even part a) is a lot of work. I added it to my list
of possible future projects.
Bjorn
^ permalink raw reply
* Re: [PATCH v2] powerpc/pseries: detect secure and trusted boot state of the system.
From: Mimi Zohar @ 2020-07-14 15:07 UTC (permalink / raw)
To: Daniel Axtens, Nayna Jain, linuxppc-dev; +Cc: linux-kernel
In-Reply-To: <87y2nmtxce.fsf@dja-thinkpad.axtens.net>
On Tue, 2020-07-14 at 16:38 +1000, Daniel Axtens wrote:
> Hi Nayna,
>
> Thanks! Would you be able to fold in some of the information from my
> reply to v1 into the changelog? Until we have public PAPR release with
> it, that information is the extent of the public documentation. It would
> be good to get it into the git log rather than just floating around in
> the mail archives!
>
> A couple of small nits:
>
> > + if (enabled)
> > + goto out;
> > +
> > + if (!of_property_read_u32(of_root, "ibm,secure-boot", &secureboot)) {
> > + if (secureboot)
> > + enabled = (secureboot > 1) ? true : false;
>
> Your tests double up here - you don't need both the 'if' statement and
> the 'secureboot > 1' ternary operator.
>
> Just
>
> + if (!of_property_read_u32(of_root, "ibm,secure-boot", &secureboot)) {
> + enabled = (secureboot > 1) ? true : false;
>
> or even
>
> + if (!of_property_read_u32(of_root, "ibm,secure-boot", &secureboot)) {
> + enabled = (secureboot > 1);
>
> would work.
I haven't been following this thread, which might be the reason I'm
missing something here. The patch description should explain why the
test is for "(secureboot > 1)", rather than a fixed number.
thanks,
Mimi
^ permalink raw reply
* [PATCH 07/13] cpufreq: powernv-cpufreq: Fix a bunch of kerneldoc related issues
From: Lee Jones @ 2020-07-14 14:50 UTC (permalink / raw)
To: rjw, viresh.kumar
Cc: linux-pm, linuxppc-dev, linux-kernel, Paul Mackerras, Lee Jones,
linux-arm-kernel
In-Reply-To: <20200714145049.2496163-1-lee.jones@linaro.org>
Repair problems with formatting and missing attributes/parameters, and
demote header comments which do not meet the required standards
applicable to kerneldoc.
Fixes the following W=1 kernel build warning(s):
drivers/cpufreq/powernv-cpufreq.c:84: warning: Function parameter or member 'last_lpstate_idx' not described in 'global_pstate_info'
drivers/cpufreq/powernv-cpufreq.c:84: warning: Function parameter or member 'last_gpstate_idx' not described in 'global_pstate_info'
drivers/cpufreq/powernv-cpufreq.c:84: warning: Function parameter or member 'policy' not described in 'global_pstate_info'
drivers/cpufreq/powernv-cpufreq.c:182: warning: Function parameter or member 'i' not described in 'idx_to_pstate'
drivers/cpufreq/powernv-cpufreq.c:201: warning: Function parameter or member 'pstate' not described in 'pstate_to_idx'
drivers/cpufreq/powernv-cpufreq.c:670: warning: Function parameter or member 't' not described in 'gpstate_timer_handler'
drivers/cpufreq/powernv-cpufreq.c:670: warning: Excess function parameter 'data' description in 'gpstate_timer_handler'
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
---
drivers/cpufreq/powernv-cpufreq.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
index 068cc53abe320..2e5a8b8a4abaa 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -64,13 +64,14 @@
* highest_lpstate_idx
* @last_sampled_time: Time from boot in ms when global pstates were
* last set
- * @last_lpstate_idx, Last set value of local pstate and global
- * last_gpstate_idx pstate in terms of cpufreq table index
+ * @last_lpstate_idx: Last set value of local pstate and global
+ * @last_gpstate_idx: pstate in terms of cpufreq table index
* @timer: Is used for ramping down if cpu goes idle for
* a long time with global pstate held high
* @gpstate_lock: A spinlock to maintain synchronization between
* routines called by the timer handler and
* governer's target_index calls
+ * @policy: Associated CPUFreq policy
*/
struct global_pstate_info {
int highest_lpstate_idx;
@@ -170,7 +171,7 @@ static inline u8 extract_pstate(u64 pmsr_val, unsigned int shift)
/* Use following functions for conversions between pstate_id and index */
-/**
+/*
* idx_to_pstate : Returns the pstate id corresponding to the
* frequency in the cpufreq frequency table
* powernv_freqs indexed by @i.
@@ -188,7 +189,7 @@ static inline u8 idx_to_pstate(unsigned int i)
return powernv_freqs[i].driver_data;
}
-/**
+/*
* pstate_to_idx : Returns the index in the cpufreq frequencytable
* powernv_freqs for the frequency whose corresponding
* pstate id is @pstate.
@@ -660,7 +661,7 @@ static inline void queue_gpstate_timer(struct global_pstate_info *gpstates)
/**
* gpstate_timer_handler
*
- * @data: pointer to cpufreq_policy on which timer was queued
+ * @t: Timer context used to fetch global pstate info struct
*
* This handler brings down the global pstate closer to the local pstate
* according quadratic equation. Queues a new timer if it is still not equal
--
2.25.1
^ permalink raw reply related
* [PATCH 05/13] cpufreq/arch: powerpc: pasemi: Move prototypes to shared header
From: Lee Jones @ 2020-07-14 14:50 UTC (permalink / raw)
To: rjw, viresh.kumar
Cc: linux-pm, linuxppc-dev, linux-kernel, Paul Mackerras,
Olof Johansson, Lee Jones, linux-arm-kernel
In-Reply-To: <20200714145049.2496163-1-lee.jones@linaro.org>
If function callers and providers do not share the same prototypes the
compiler complains of missing prototypes. Fix this by moving the
already existing prototypes out to a mutually convenient location.
Fixes the following W=1 kernel build warning(s):
drivers/cpufreq/pasemi-cpufreq.c:109:5: warning: no previous prototype for ‘check_astate’ [-Wmissing-prototypes]
109 | int check_astate(void)
| ^~~~~~~~~~~~
drivers/cpufreq/pasemi-cpufreq.c:114:6: warning: no previous prototype for ‘restore_astate’ [-Wmissing-prototypes]
114 | void restore_astate(int cpu)
| ^~~~~~~~~~~~~~
Cc: Olof Johansson <olof@lixom.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
---
arch/powerpc/platforms/pasemi/pasemi.h | 15 ------------
arch/powerpc/platforms/pasemi/powersave.S | 2 ++
drivers/cpufreq/pasemi-cpufreq.c | 1 +
include/linux/platform_data/pasemi.h | 28 +++++++++++++++++++++++
4 files changed, 31 insertions(+), 15 deletions(-)
create mode 100644 include/linux/platform_data/pasemi.h
diff --git a/arch/powerpc/platforms/pasemi/pasemi.h b/arch/powerpc/platforms/pasemi/pasemi.h
index 70b56048ed1be..528d81ef748ad 100644
--- a/arch/powerpc/platforms/pasemi/pasemi.h
+++ b/arch/powerpc/platforms/pasemi/pasemi.h
@@ -15,21 +15,6 @@ extern void __init pasemi_map_registers(void);
extern void idle_spin(void);
extern void idle_doze(void);
-/* Restore astate to last set */
-#ifdef CONFIG_PPC_PASEMI_CPUFREQ
-extern int check_astate(void);
-extern void restore_astate(int cpu);
-#else
-static inline int check_astate(void)
-{
- /* Always return >0 so we never power save */
- return 1;
-}
-static inline void restore_astate(int cpu)
-{
-}
-#endif
-
extern struct pci_controller_ops pasemi_pci_controller_ops;
#endif /* _PASEMI_PASEMI_H */
diff --git a/arch/powerpc/platforms/pasemi/powersave.S b/arch/powerpc/platforms/pasemi/powersave.S
index d0215d5329ca7..7747b48963286 100644
--- a/arch/powerpc/platforms/pasemi/powersave.S
+++ b/arch/powerpc/platforms/pasemi/powersave.S
@@ -5,6 +5,8 @@
* Maintained by: Olof Johansson <olof@lixom.net>
*/
+#include <linux/platform_data/pasemi.h>
+
#include <asm/processor.h>
#include <asm/page.h>
#include <asm/ppc_asm.h>
diff --git a/drivers/cpufreq/pasemi-cpufreq.c b/drivers/cpufreq/pasemi-cpufreq.c
index c66f566a854cb..c6bb3ecc90ef3 100644
--- a/drivers/cpufreq/pasemi-cpufreq.c
+++ b/drivers/cpufreq/pasemi-cpufreq.c
@@ -15,6 +15,7 @@
#include <linux/timer.h>
#include <linux/module.h>
#include <linux/of_address.h>
+#include <linux/platform_data/pasemi.h>
#include <asm/hw_irq.h>
#include <asm/io.h>
diff --git a/include/linux/platform_data/pasemi.h b/include/linux/platform_data/pasemi.h
new file mode 100644
index 0000000000000..3fed0687fcc9a
--- /dev/null
+++ b/include/linux/platform_data/pasemi.h
@@ -0,0 +1,28 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2020 Linaro Ltd.
+ *
+ * Author: Lee Jones <lee.jones@linaro.org>
+ */
+
+#ifndef _LINUX_PLATFORM_DATA_PASEMI_H
+#define _LINUX_PLATFORM_DATA_PASEMI_H
+
+/* Restore astate to last set */
+#ifdef CONFIG_PPC_PASEMI_CPUFREQ
+int check_astate(void);
+void restore_astate(int cpu);
+#else
+static inline int check_astate(void)
+{
+ /* Always return >0 so we never power save */
+ return 1;
+}
+static inline void restore_astate(int cpu)
+{
+}
+#endif
+
+#endif /* _LINUX_PLATFORM_DATA_PASEMI_H */
+
+
--
2.25.1
^ permalink raw reply related
* [PATCH 06/13] cpufreq: powernv-cpufreq: Functions only used in call-backs should be static
From: Lee Jones @ 2020-07-14 14:50 UTC (permalink / raw)
To: rjw, viresh.kumar
Cc: linux-pm, linuxppc-dev, linux-kernel, Paul Mackerras, Lee Jones,
linux-arm-kernel
In-Reply-To: <20200714145049.2496163-1-lee.jones@linaro.org>
Fixes the following W=1 kernel build warning(s):
drivers/cpufreq/powernv-cpufreq.c:669:6: warning: no previous prototype for ‘gpstate_timer_handler’ [-Wmissing-prototypes]
drivers/cpufreq/powernv-cpufreq.c:902:6: warning: no previous prototype for ‘powernv_cpufreq_work_fn’ [-Wmissing-prototypes]
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
---
drivers/cpufreq/powernv-cpufreq.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
index 8646eb197cd96..068cc53abe320 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -666,7 +666,7 @@ static inline void queue_gpstate_timer(struct global_pstate_info *gpstates)
* according quadratic equation. Queues a new timer if it is still not equal
* to local pstate
*/
-void gpstate_timer_handler(struct timer_list *t)
+static void gpstate_timer_handler(struct timer_list *t)
{
struct global_pstate_info *gpstates = from_timer(gpstates, t, timer);
struct cpufreq_policy *policy = gpstates->policy;
@@ -899,7 +899,7 @@ static struct notifier_block powernv_cpufreq_reboot_nb = {
.notifier_call = powernv_cpufreq_reboot_notifier,
};
-void powernv_cpufreq_work_fn(struct work_struct *work)
+static void powernv_cpufreq_work_fn(struct work_struct *work)
{
struct chip *chip = container_of(work, struct chip, throttle);
struct cpufreq_policy *policy;
--
2.25.1
^ permalink raw reply related
* [PATCH -next] cpuidle/pseries: Make symbol 'pseries_idle_driver' static
From: Wei Yongjun @ 2020-07-14 14:24 UTC (permalink / raw)
To: Hulk Robot, Rafael J. Wysocki, Daniel Lezcano, Michael Ellerman
Cc: linuxppc-dev, Wei Yongjun, linux-pm
The sparse tool complains as follows:
drivers/cpuidle/cpuidle-pseries.c:25:23: warning:
symbol 'pseries_idle_driver' was not declared. Should it be static?
'pseries_idle_driver' is not used outside of this file, so marks
it static.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
---
drivers/cpuidle/cpuidle-pseries.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/cpuidle/cpuidle-pseries.c b/drivers/cpuidle/cpuidle-pseries.c
index 6513ef2af66a..3e058ad2bb51 100644
--- a/drivers/cpuidle/cpuidle-pseries.c
+++ b/drivers/cpuidle/cpuidle-pseries.c
@@ -22,7 +22,7 @@
#include <asm/idle.h>
#include <asm/plpar_wrappers.h>
-struct cpuidle_driver pseries_idle_driver = {
+static struct cpuidle_driver pseries_idle_driver = {
.name = "pseries_idle",
.owner = THIS_MODULE,
};
^ permalink raw reply related
* [PATCH -next] cpufreq: powernv: Make some symbols static
From: Wei Yongjun @ 2020-07-14 14:23 UTC (permalink / raw)
To: Hulk Robot, Rafael J. Wysocki, Viresh Kumar, Michael Ellerman
Cc: linuxppc-dev, Wei Yongjun, linux-pm
The sparse tool complains as follows:
drivers/cpufreq/powernv-cpufreq.c:88:1: warning:
symbol 'pstate_revmap' was not declared. Should it be static?
drivers/cpufreq/powernv-cpufreq.c:383:18: warning:
symbol 'cpufreq_freq_attr_cpuinfo_nominal_freq' was not declared. Should it be static?
drivers/cpufreq/powernv-cpufreq.c:669:6: warning:
symbol 'gpstate_timer_handler' was not declared. Should it be static?
drivers/cpufreq/powernv-cpufreq.c:902:6: warning:
symbol 'powernv_cpufreq_work_fn' was not declared. Should it be static?
Those symbols are not used outside of this file, so mark
them static.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
---
drivers/cpufreq/powernv-cpufreq.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
index 8646eb197cd9..cf118263ec65 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -85,7 +85,7 @@ struct global_pstate_info {
static struct cpufreq_frequency_table powernv_freqs[POWERNV_MAX_PSTATES+1];
-DEFINE_HASHTABLE(pstate_revmap, POWERNV_MAX_PSTATES_ORDER);
+static DEFINE_HASHTABLE(pstate_revmap, POWERNV_MAX_PSTATES_ORDER);
/**
* struct pstate_idx_revmap_data: Entry in the hashmap pstate_revmap
* indexed by a function of pstate id.
@@ -380,7 +380,7 @@ static ssize_t cpuinfo_nominal_freq_show(struct cpufreq_policy *policy,
powernv_freqs[powernv_pstate_info.nominal].frequency);
}
-struct freq_attr cpufreq_freq_attr_cpuinfo_nominal_freq =
+static struct freq_attr cpufreq_freq_attr_cpuinfo_nominal_freq =
__ATTR_RO(cpuinfo_nominal_freq);
#define SCALING_BOOST_FREQS_ATTR_INDEX 2
@@ -666,7 +666,7 @@ static inline void queue_gpstate_timer(struct global_pstate_info *gpstates)
* according quadratic equation. Queues a new timer if it is still not equal
* to local pstate
*/
-void gpstate_timer_handler(struct timer_list *t)
+static void gpstate_timer_handler(struct timer_list *t)
{
struct global_pstate_info *gpstates = from_timer(gpstates, t, timer);
struct cpufreq_policy *policy = gpstates->policy;
@@ -899,7 +899,7 @@ static struct notifier_block powernv_cpufreq_reboot_nb = {
.notifier_call = powernv_cpufreq_reboot_notifier,
};
-void powernv_cpufreq_work_fn(struct work_struct *work)
+static void powernv_cpufreq_work_fn(struct work_struct *work)
{
struct chip *chip = container_of(work, struct chip, throttle);
struct cpufreq_policy *policy;
^ permalink raw reply related
* Re: [RFC PATCH 7/7] lazy tlb: shoot lazies, a non-refcounting lazy tlb option
From: Peter Zijlstra @ 2020-07-14 13:23 UTC (permalink / raw)
To: Andy Lutomirski
Cc: linux-arch, Arnd Bergmann, X86 ML, LKML, Nicholas Piggin,
Linux-MM, Mathieu Desnoyers, Andy Lutomirski, linuxppc-dev
In-Reply-To: <6D3D1346-DB1E-43EB-812A-184918CCC16A@amacapital.net>
On Tue, Jul 14, 2020 at 05:46:05AM -0700, Andy Lutomirski wrote:
> x86 has this exact problem. At least no more than 64*8 CPUs share the cache line :)
I've seen patches for a 'sparse' bitmap to solve related problems.
It's basically the same code, except it multiplies everything (size,
bit-nr) by a constant to reduce the number of active bits per line.
This sadly doesn't take topology into account, but reducing contention
is still good ofcourse.
^ permalink raw reply
* Re: [RFC PATCH 7/7] lazy tlb: shoot lazies, a non-refcounting lazy tlb option
From: Andy Lutomirski @ 2020-07-14 12:46 UTC (permalink / raw)
To: Nicholas Piggin
Cc: linux-arch, Arnd Bergmann, Peter Zijlstra, X86 ML, LKML, Linux-MM,
Mathieu Desnoyers, Andy Lutomirski, linuxppc-dev
In-Reply-To: <1594708054.04iuyxuyb5.astroid@bobo.none>
> On Jul 13, 2020, at 11:31 PM, Nicholas Piggin <npiggin@gmail.com> wrote:
>
> Excerpts from Nicholas Piggin's message of July 14, 2020 3:04 pm:
>> Excerpts from Andy Lutomirski's message of July 14, 2020 4:18 am:
>>>
>>>> On Jul 13, 2020, at 9:48 AM, Nicholas Piggin <npiggin@gmail.com> wrote:
>>>>
>>>> Excerpts from Andy Lutomirski's message of July 14, 2020 1:59 am:
>>>>>> On Thu, Jul 9, 2020 at 6:57 PM Nicholas Piggin <npiggin@gmail.com> wrote:
>>>>>>
>>>>>> On big systems, the mm refcount can become highly contented when doing
>>>>>> a lot of context switching with threaded applications (particularly
>>>>>> switching between the idle thread and an application thread).
>>>>>>
>>>>>> Abandoning lazy tlb slows switching down quite a bit in the important
>>>>>> user->idle->user cases, so so instead implement a non-refcounted scheme
>>>>>> that causes __mmdrop() to IPI all CPUs in the mm_cpumask and shoot down
>>>>>> any remaining lazy ones.
>>>>>>
>>>>>> On a 16-socket 192-core POWER8 system, a context switching benchmark
>>>>>> with as many software threads as CPUs (so each switch will go in and
>>>>>> out of idle), upstream can achieve a rate of about 1 million context
>>>>>> switches per second. After this patch it goes up to 118 million.
>>>>>>
>>>>>
>>>>> I read the patch a couple of times, and I have a suggestion that could
>>>>> be nonsense. You are, effectively, using mm_cpumask() as a sort of
>>>>> refcount. You're saying "hey, this mm has no more references, but it
>>>>> still has nonempty mm_cpumask(), so let's send an IPI and shoot down
>>>>> those references too." I'm wondering whether you actually need the
>>>>> IPI. What if, instead, you actually treated mm_cpumask as a refcount
>>>>> for real? Roughly, in __mmdrop(), you would only free the page tables
>>>>> if mm_cpumask() is empty. And, in the code that removes a CPU from
>>>>> mm_cpumask(), you would check if mm_users == 0 and, if so, check if
>>>>> you just removed the last bit from mm_cpumask and potentially free the
>>>>> mm.
>>>>>
>>>>> Getting the locking right here could be a bit tricky -- you need to
>>>>> avoid two CPUs simultaneously exiting lazy TLB and thinking they
>>>>> should free the mm, and you also need to avoid an mm with mm_users
>>>>> hitting zero concurrently with the last remote CPU using it lazily
>>>>> exiting lazy TLB. Perhaps this could be resolved by having mm_count
>>>>> == 1 mean "mm_cpumask() is might contain bits and, if so, it owns the
>>>>> mm" and mm_count == 0 meaning "now it's dead" and using some careful
>>>>> cmpxchg or dec_return to make sure that only one CPU frees it.
>>>>>
>>>>> Or maybe you'd need a lock or RCU for this, but the idea would be to
>>>>> only ever take the lock after mm_users goes to zero.
>>>>
>>>> I don't think it's nonsense, it could be a good way to avoid IPIs.
>>>>
>>>> I haven't seen much problem here that made me too concerned about IPIs
>>>> yet, so I think the simple patch may be good enough to start with
>>>> for powerpc. I'm looking at avoiding/reducing the IPIs by combining the
>>>> unlazying with the exit TLB flush without doing anything fancy with
>>>> ref counting, but we'll see.
>>>
>>> I would be cautious with benchmarking here. I would expect that the
>>> nasty cases may affect power consumption more than performance — the
>>> specific issue is IPIs hitting idle cores, and the main effects are to
>>> slow down exit() a bit but also to kick the idle core out of idle.
>>> Although, if the idle core is in a deep sleep, that IPI could be
>>> *very* slow.
>>
>> It will tend to be self-limiting to some degree (deeper idle cores
>> would tend to have less chance of IPI) but we have bigger issues on
>> powerpc with that, like broadcast IPIs to the mm cpumask for THP
>> management. Power hasn't really shown up as an issue but powerpc
>> CPUs may have their own requirements and issues there, shall we say.
>>
>>> So I think it’s worth at least giving this a try.
>>
>> To be clear it's not a complete solution itself. The problem is of
>> course that mm cpumask gives you false negatives, so the bits
>> won't always clean up after themselves as CPUs switch away from their
>> lazy tlb mms.
>
> ^^
>
> False positives: CPU is in the mm_cpumask, but is not using the mm
> as a lazy tlb. So there can be bits left and never freed.
>
> If you closed the false positives, you're back to a shared mm cache
> line on lazy mm context switches.
x86 has this exact problem. At least no more than 64*8 CPUs share the cache line :)
Can your share your benchmark?
>
> Thanks,
> Nick
^ permalink raw reply
* Re: [PATCH v2 1/3] module: Rename module_alloc() to text_alloc() and move to kernel proper
From: Jarkko Sakkinen @ 2020-07-14 12:25 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Catalin Marinas, Kefeng Wang, Paul Mackerras, Zong Li, Andi Kleen,
Paul Burton, Vincent Whitchurch, Petr Mladek, Brian Gerst,
Andy Lutomirski, Thomas Gleixner, Jiri Kosina, Anup Patel,
linux-kernel, Philipp Rudo, Torsten Duwe, Masami Hiramatsu,
Andrew Morton, Mark Rutland, James E.J. Bottomley, Vincent Chen,
Omar Sandoval, open list:S390, Joe Lawrence, Helge Deller,
John Fastabend, Anil S Keshavamurthy, Yonghong Song, Iurii Zaikin,
Andrii Nakryiko, Thomas Huth, Vasily Gorbik,
moderated list:ARM PORT, Daniel Axtens, Damien Le Moal,
Martin KaFai Lau, Song Liu, Paul Walmsley, Heiko Carstens,
Alexei Starovoitov, Atish Patra, Will Deacon, Daniel Borkmann,
Masahiro Yamada, Nayna Jain, Ley Foon Tan, Christian Borntraeger,
Sami Tolvanen, Naveen N. Rao, Mao Han, Marco Elver,
Steven Rostedt, Babu Moger, Borislav Petkov, Greentime Hu,
Ben Dooks, Guan Xuetao, Thomas Bogendoerfer,
open list:PARISC ARCHITECTURE, Jessica Yu,
open list:BPF JIT for MIPS (32-BIT AND 64-BIT), David S. Miller,
Thiago Jung Bauermann, David Howells,
open list:SPARC + UltraSPARC (sparc/sparc64), Sandipan Das,
H. Peter Anvin, Amit Daniel Kachhap, Tiezhu Yang, Miroslav Benes,
Jiri Olsa, Ard Biesheuvel, Vincenzo Frascino, Anders Roxell,
Sven Schnelle, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
Russell King, open list:RISC-V ARCHITECTURE, Mike Rapoport,
Ingo Molnar, Albert Ou, Paul E. McKenney, Josh Poimboeuf,
KP Singh, Dmitry Vyukov, Nick Hu,
open list:BPF JIT for MIPS (32-BIT AND 64-BIT), open list:MIPS,
Palmer Dabbelt, open list:LINUX FOR POWERPC (32-BIT AND 64-BIT)
In-Reply-To: <20200714112927.GV10769@hirez.programming.kicks-ass.net>
On Tue, Jul 14, 2020 at 01:29:27PM +0200, Peter Zijlstra wrote:
> On Tue, Jul 14, 2020 at 11:28:27AM +0100, Will Deacon wrote:
>
> > As Ard says, module_alloc() _is_ special, in the sense that the virtual
> > memory it allocates wants to be close to the kernel text, whereas the
> > concept of allocating executable memory is broader and doesn't have these
> > restrictions. So, while I'm in favour of having a text_alloc() interface
> > that can be used by callers which only require an executable mapping, I'd
> > much prefer for the module_alloc() code to remain for, err, modules.
>
> So on x86 all those things (kprobes, bpf, ftrace) require that same
> closeness.
>
> An interface like the late vmalloc_exec() will simply not work for us.
>
> We recently talked about arm64-kprobes and how you're not doing any of
> the optimizations and fully rely on the exception return. And I see
> you're one of the few archs that has bpf_jit_alloc_exec() (also,
> shouldn't you be using PAGE_KERNEL_EXEC there?). But the BPF core seems
> to use module_alloc() as a default means of allocating text.
>
>
> So what should this look like? Have a text_alloc() with an argument that
> indicates where? But then I suppose we also need a means to manage PLT
> entries. Otherwise I don't exactly see how you're going to call BPF
> code, or how that BPF stuff is going to call back into its helpers.
To make this less of a havoc to arch maintainers what if:
void * __weak module_alloc(unsigned long size)
{
if (IS_ENABLED(HAS_TEXT_ALLOC))
return text_alloc(size);
return __vmalloc_node_range(size, 1, VMALLOC_START, VMALLOC_END,
GFP_KERNEL, PAGE_KERNEL_EXEC, VM_FLUSH_RESET_PERMS,
NUMA_NO_NODE, __builtin_return_address(0));
}
Then in arch/x86/Kconfig I could put:
config HAS_TEXT_ALLOC
def_bool y
This would scale down the patch set just to add kernel/text.c and
arch/x86/kernel/text.c, and allows gradual migration to other arch's.
/Jarkko
^ permalink raw reply
* Re: [PATCH 1/3] module: Rename module_alloc() to text_alloc() and move to kernel proper
From: Jarkko Sakkinen @ 2020-07-14 12:11 UTC (permalink / raw)
To: Russell King - ARM Linux admin
Cc: Catalin Marinas, Kefeng Wang, Paul Mackerras, Zong Li, Andi Kleen,
Paul Burton, Vincent Whitchurch, Petr Mladek, Brian Gerst,
Andy Lutomirski, Yonghong Song, Thomas Gleixner, Jiri Kosina,
Anup Patel, Linux Kernel Mailing List, Philipp Rudo, Torsten Duwe,
Masami Hiramatsu, Andrew Morton, Mark Rutland,
James E.J. Bottomley, Vincent Chen, open list:S390, Joe Lawrence,
Helge Deller, John Fastabend, Anil S Keshavamurthy,
Andrey Ryabinin, Iurii Zaikin, Andrii Nakryiko, Vasily Gorbik,
moderated list:ARM PORT, Daniel Axtens, Damien Le Moal,
Peter Oberparleiter, Sean Christopherson, Martin KaFai Lau,
Song Liu, Paul Walmsley, Heiko Carstens, Alexei Starovoitov,
Atish Patra, Will Deacon, Daniel Borkmann, Masahiro Yamada,
Nayna Jain, Ley Foon Tan, Christian Borntraeger, Dmitry Vyukov,
Sami Tolvanen, Naveen N. Rao, Mao Han, Marco Elver,
Steven Rostedt, Babu Moger, Borislav Petkov, Greentime Hu,
Ben Dooks, Guan Xuetao, Thomas Bogendoerfer,
open list:PARISC ARCHITECTURE, Jessica Yu,
open list:BPF JIT for MIPS (32-BIT AND 64-BIT), David S. Miller,
Thiago Jung Bauermann, Peter Zijlstra,
open list:SPARC + UltraSPARC (sparc/sparc64), Sandipan Das,
H. Peter Anvin, Amit Daniel Kachhap, Tiezhu Yang, Miroslav Benes,
Jiri Olsa, Ard Biesheuvel, Vincenzo Frascino, Anders Roxell,
Sven Schnelle, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
open list:RISC-V ARCHITECTURE, Mike Rapoport, Ingo Molnar,
Albert Ou, Paul E. McKenney, Josh Poimboeuf, KP Singh,
Gerald Schaefer, Nick Hu,
open list:BPF JIT for MIPS (32-BIT AND 64-BIT), open list:MIPS,
Sergey Senozhatsky, Palmer Dabbelt,
open list:LINUX FOR POWERPC (32-BIT AND 64-BIT)
In-Reply-To: <20200714103333.GB1551@shell.armlinux.org.uk>
On Tue, Jul 14, 2020 at 11:33:33AM +0100, Russell King - ARM Linux admin wrote:
> On Tue, Jul 14, 2020 at 01:17:22PM +0300, Ard Biesheuvel wrote:
> > On Tue, 14 Jul 2020 at 12:53, Jarkko Sakkinen
> > <jarkko.sakkinen@linux.intel.com> wrote:
> > >
> > > On Mon, Jul 13, 2020 at 10:49:48PM +0300, Ard Biesheuvel wrote:
> > > > This patch suggests that there are other reasons why conflating
> > > > allocation of module space and allocating text pages for other uses
> > > > is a bad idea, but switching all users to text_alloc() is a step in
> > > > the wrong direction. It would be better to stop using module_alloc()
> > > > in core code except in the module loader, and have a generic
> > > > text_alloc() that can be overridden by the arch if necessary. Note
> > > > that x86 and s390 are the only architectures that use module_alloc()
> > > > in ftrace code.
> > >
> > > This series essentially does this: introduces text_alloc() and
> > > text_memfree(), which have generic implementations in kernel/text.c.
> > > Those can be overriddent by arch specific implementations.
> > >
> > > What you think should be done differently than in my patch set?
> > >
> >
> > On arm64, module_alloc is only used by the module loader, and so
> > pulling it out and renaming it will cause unused code to be
> > incorporated into the kernel when building without module support,
> > which is the use case you claim to be addressing.
> >
> > Module_alloc has semantics that are intimately tied to the module
> > loader, but over the years, it ended up being (ab)used by other
> > subsystems, which don't require those semantics but just need n pages
> > of vmalloc space with executable permissions.
> >
> > So the correct approach is to make text_alloc() implement just that,
> > generically, and switch bpf etc to use it. Then, only on architectures
> > that need it, override it with an implementation that has the required
> > additional semantics.
> >
> > Refactoring 10+ architectures like this without any regard for how
> > text_alloc() deviates from module_alloc() just creates a lot of churn
> > that others will have to clean up after you.
>
> For 32-bit ARM, our bpf code uses "blx/bx" (or equivalent code
> sequences) rather than encoding a "bl" or "b", so BPF there doesn't
> care where the executable memory is mapped, and doesn't need any
> PLTs. Given that, should bpf always allocate from the vmalloc()
> region to preserve the module space for modules?
Most of the allocators use __vmalloc_node_range() but arch/nios2
uses just plain kmalloc():
/*
* Modules should NOT be allocated with kmalloc for (obvious) reasons.
* But we do it for now to avoid relocation issues. CALL26/PCREL26 cannot reach
* from 0x80000000 (vmalloc area) to 0xc00000000 (kernel) (kmalloc returns
* addresses in 0xc0000000)
*/
void *module_alloc(unsigned long size)
{
if (size == 0)
return NULL;
return kmalloc(size, GFP_KERNEL);
}
Also consider arch/x86 module_alloc():
void *module_alloc(unsigned long size)
{
void *p;
if (PAGE_ALIGN(size) > MODULES_LEN)
return NULL;
p = __vmalloc_node_range(size, MODULE_ALIGN,
MODULES_VADDR + get_module_load_offset(),
MODULES_END, GFP_KERNEL,
PAGE_KERNEL, 0, NUMA_NO_NODE,
__builtin_return_address(0));
if (p && (kasan_module_alloc(p, size) < 0)) {
vfree(p);
return NULL;
}
return p;
}
The generic version is
void * __weak module_alloc(unsigned long size)
{
return __vmalloc_node_range(size, 1, VMALLOC_START, VMALLOC_END,
GFP_KERNEL, PAGE_KERNEL_EXEC, VM_FLUSH_RESET_PERMS,
NUMA_NO_NODE, __builtin_return_address(0));
}
There is quite a lot of divergence from the generic version.
However, in other arch's it's mostly just divergence in vmalloc()
parameters and not as radical as in x86.
I could probably limit the total havoc to just nios2 and x86 if there
is a set of vmalloc parameters that work for all arch's. Then there
could be kernel/text.c and re-implementations for x86 and nios2.
I'm all for having separate text_alloc() and text_memfree() if these
issues can be somehow sorted out.
/Jarkko
^ permalink raw reply
* Re: [PATCH 1/3] module: Rename module_alloc() to text_alloc() and move to kernel proper
From: Jarkko Sakkinen @ 2020-07-14 11:55 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: Catalin Marinas, Kefeng Wang, Paul Mackerras, Zong Li, Andi Kleen,
Paul Burton, Vincent Whitchurch, Petr Mladek, Brian Gerst,
Andy Lutomirski, Yonghong Song, Thomas Gleixner, Jiri Kosina,
Anup Patel, Linux Kernel Mailing List, Philipp Rudo, Torsten Duwe,
Masami Hiramatsu, Andrew Morton, Mark Rutland,
James E.J. Bottomley, Vincent Chen, open list:S390, Joe Lawrence,
Helge Deller, John Fastabend, Anil S Keshavamurthy,
Andrey Ryabinin, Iurii Zaikin, Andrii Nakryiko, Vasily Gorbik,
moderated list:ARM PORT, Daniel Axtens, Damien Le Moal,
Peter Oberparleiter, Sean Christopherson, Martin KaFai Lau,
Song Liu, Paul Walmsley, Heiko Carstens, Alexei Starovoitov,
Atish Patra, Will Deacon, Daniel Borkmann, Masahiro Yamada,
Nayna Jain, Ley Foon Tan, Christian Borntraeger, Dmitry Vyukov,
Sami Tolvanen, Naveen N. Rao, Mao Han, Marco Elver,
Steven Rostedt, Babu Moger, Borislav Petkov, Greentime Hu,
Ben Dooks, Guan Xuetao, Thomas Bogendoerfer,
open list:PARISC ARCHITECTURE, Jessica Yu,
open list:BPF JIT for MIPS (32-BIT AND 64-BIT), David S. Miller,
Thiago Jung Bauermann, Peter Zijlstra,
open list:SPARC + UltraSPARC (sparc/sparc64), Sandipan Das,
H. Peter Anvin, Amit Daniel Kachhap, Tiezhu Yang, Miroslav Benes,
Jiri Olsa, open list:RISC-V ARCHITECTURE, Vincenzo Frascino,
Anders Roxell, Sven Schnelle,
maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT), Russell King,
Mike Rapoport, Ingo Molnar, Albert Ou, Paul E. McKenney,
Josh Poimboeuf, KP Singh, Gerald Schaefer, Nick Hu,
open list:BPF JIT for MIPS (32-BIT AND 64-BIT), open list:MIPS,
Sergey Senozhatsky, Palmer Dabbelt,
open list:LINUX FOR POWERPC (32-BIT AND 64-BIT)
In-Reply-To: <CAMj1kXGV_bWehdQvxaMBTOYHXUoFjifBWNpyVy3gaWKktko1mg@mail.gmail.com>
On Tue, Jul 14, 2020 at 01:17:22PM +0300, Ard Biesheuvel wrote:
> > This series essentially does this: introduces text_alloc() and
> > text_memfree(), which have generic implementations in kernel/text.c.
> > Those can be overriddent by arch specific implementations.
> >
> > What you think should be done differently than in my patch set?
> >
>
> On arm64, module_alloc is only used by the module loader, and so
> pulling it out and renaming it will cause unused code to be
> incorporated into the kernel when building without module support,
> which is the use case you claim to be addressing.
It certainly does not cause the full module loader to be bundle, only
the allocator.
> Module_alloc has semantics that are intimately tied to the module
> loader, but over the years, it ended up being (ab)used by other
> subsystems, which don't require those semantics but just need n pages
> of vmalloc space with executable permissions.
>
> So the correct approach is to make text_alloc() implement just that,
> generically, and switch bpf etc to use it. Then, only on architectures
> that need it, override it with an implementation that has the required
> additional semantics.
>
> Refactoring 10+ architectures like this without any regard for how
> text_alloc() deviates from module_alloc() just creates a lot of churn
> that others will have to clean up after you.
Using generic text_alloc() in kernel/kprobes.c would make it behave
differently in arch's that reimplement module_alloc(). That's the main
driver for my approach.
/Jarkko
^ permalink raw reply
* Re: [PATCH v2 1/3] module: Rename module_alloc() to text_alloc() and move to kernel proper
From: Ard Biesheuvel @ 2020-07-14 12:19 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Catalin Marinas, Kefeng Wang, Paul Mackerras, Zong Li, Andi Kleen,
Paul Burton, Vincent Whitchurch, Petr Mladek, Brian Gerst,
Andy Lutomirski, Thomas Gleixner, Jiri Kosina, Anup Patel,
Linux Kernel Mailing List, Philipp Rudo, Torsten Duwe,
Masami Hiramatsu, Andrew Morton, Mark Rutland,
James E.J. Bottomley, Vincent Chen, Omar Sandoval, open list:S390,
Joe Lawrence, Helge Deller, John Fastabend, Anil S Keshavamurthy,
Yonghong Song, Iurii Zaikin, Andrii Nakryiko, Thomas Huth,
Vasily Gorbik, moderated list:ARM PORT, Daniel Axtens,
Damien Le Moal, Martin KaFai Lau, Song Liu, Paul Walmsley,
Heiko Carstens, Alexei Starovoitov, Jarkko Sakkinen, Atish Patra,
Will Deacon, Daniel Borkmann, Masahiro Yamada, Nayna Jain,
Ley Foon Tan, Christian Borntraeger, Sami Tolvanen, Naveen N. Rao,
Mao Han, Marco Elver, Steven Rostedt, Babu Moger, Borislav Petkov,
Greentime Hu, Ben Dooks, Guan Xuetao, Thomas Bogendoerfer,
open list:PARISC ARCHITECTURE, Jessica Yu,
open list:BPF JIT for MIPS (32-BIT AND 64-BIT), David S. Miller,
Thiago Jung Bauermann, David Howells,
open list:SPARC + UltraSPARC (sparc/sparc64), Sandipan Das,
H. Peter Anvin, Amit Daniel Kachhap, Tiezhu Yang, Miroslav Benes,
Jiri Olsa, open list:RISC-V ARCHITECTURE, Vincenzo Frascino,
Anders Roxell, Sven Schnelle,
maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT), Russell King,
Mike Rapoport, Ingo Molnar, Albert Ou, Paul E. McKenney,
Josh Poimboeuf, KP Singh, Dmitry Vyukov, Nick Hu,
open list:BPF JIT for MIPS (32-BIT AND 64-BIT), open list:MIPS,
Palmer Dabbelt, open list:LINUX FOR POWERPC (32-BIT AND 64-BIT)
In-Reply-To: <20200714112927.GV10769@hirez.programming.kicks-ass.net>
On Tue, 14 Jul 2020 at 14:31, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Tue, Jul 14, 2020 at 11:28:27AM +0100, Will Deacon wrote:
>
> > As Ard says, module_alloc() _is_ special, in the sense that the virtual
> > memory it allocates wants to be close to the kernel text, whereas the
> > concept of allocating executable memory is broader and doesn't have these
> > restrictions. So, while I'm in favour of having a text_alloc() interface
> > that can be used by callers which only require an executable mapping, I'd
> > much prefer for the module_alloc() code to remain for, err, modules.
>
> So on x86 all those things (kprobes, bpf, ftrace) require that same
> closeness.
>
> An interface like the late vmalloc_exec() will simply not work for us.
>
Fair enough. So for x86, implementing text_alloc() as an alias of
module_alloc() makes sense. But that is not the case in general.
> We recently talked about arm64-kprobes and how you're not doing any of
> the optimizations and fully rely on the exception return. And I see
> you're one of the few archs that has bpf_jit_alloc_exec() (also,
> shouldn't you be using PAGE_KERNEL_EXEC there?). But the BPF core seems
> to use module_alloc() as a default means of allocating text.
>
Indeed. Which means it uses up module space which may be scarce,
especially on 32-bit ARM, and gets backed by kasan shadow pages, which
only makes sense for modules (if CONFIG_KASAN=y)
>
> So what should this look like? Have a text_alloc() with an argument that
> indicates where? But then I suppose we also need a means to manage PLT
> entries. Otherwise I don't exactly see how you're going to call BPF
> code, or how that BPF stuff is going to call back into its helpers.
>
If x86 chooses to back its implementation of text_alloc() by
module_alloc(), that is absolutely fine. But arm64 has no use for
text_alloc() at all today (bpf and kprobes don't use module_alloc(),
and ftrace does not implement dynamic trampoline allocation), and in
the general case, bpf, kprobes, ftrace and the module loader all have
different requirements that deviate subtly between architectures.
So perhaps the answer is to have text_alloc() not with a 'where'
argument but with a 'why' argument. Or more simply, just have separate
alloc/free APIs for each case, with generic versions that can be
overridden by the architecture.
^ permalink raw reply
* Re: [PATCH 1/3] module: Rename module_alloc() to text_alloc() and move to kernel proper
From: Peter Zijlstra @ 2020-07-14 11:45 UTC (permalink / raw)
To: Russell King - ARM Linux admin
Cc: Catalin Marinas, Kefeng Wang, Paul Mackerras, Zong Li, Andi Kleen,
Paul Burton, Vincent Whitchurch, Petr Mladek, Brian Gerst,
Andy Lutomirski, Yonghong Song, Thomas Gleixner, Jiri Kosina,
Anup Patel, Linux Kernel Mailing List, Philipp Rudo, Torsten Duwe,
Masami Hiramatsu, Andrew Morton, Mark Rutland,
James E.J. Bottomley, Vincent Chen, open list:S390, Joe Lawrence,
Helge Deller, John Fastabend, Anil S Keshavamurthy,
Andrey Ryabinin, Iurii Zaikin, Andrii Nakryiko, Vasily Gorbik,
moderated list:ARM PORT, Daniel Axtens, Damien Le Moal,
Peter Oberparleiter, Sean Christopherson, Martin KaFai Lau,
Song Liu, Paul Walmsley, Heiko Carstens, Alexei Starovoitov,
Jarkko Sakkinen, Atish Patra, Will Deacon, Daniel Borkmann,
Masahiro Yamada, Nayna Jain, Ley Foon Tan, Christian Borntraeger,
Dmitry Vyukov, Sami Tolvanen, Naveen N. Rao, Mao Han, Marco Elver,
Steven Rostedt, Babu Moger, Borislav Petkov, Greentime Hu,
Ben Dooks, Guan Xuetao, Thomas Bogendoerfer,
open list:PARISC ARCHITECTURE, Jessica Yu,
open list:BPF JIT for MIPS (32-BIT AND 64-BIT), David S. Miller,
Thiago Jung Bauermann,
open list:SPARC + UltraSPARC (sparc/sparc64), Sandipan Das,
H. Peter Anvin, Amit Daniel Kachhap, Tiezhu Yang, Miroslav Benes,
Jiri Olsa, Ard Biesheuvel, Vincenzo Frascino, Anders Roxell,
Sven Schnelle, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
open list:RISC-V ARCHITECTURE, Mike Rapoport, Ingo Molnar,
Albert Ou, Paul E. McKenney, Josh Poimboeuf, KP Singh,
Gerald Schaefer, Nick Hu,
open list:BPF JIT for MIPS (32-BIT AND 64-BIT), open list:MIPS,
Sergey Senozhatsky, Palmer Dabbelt,
open list:LINUX FOR POWERPC (32-BIT AND 64-BIT)
In-Reply-To: <20200714103333.GB1551@shell.armlinux.org.uk>
On Tue, Jul 14, 2020 at 11:33:33AM +0100, Russell King - ARM Linux admin wrote:
> For 32-bit ARM, our bpf code uses "blx/bx" (or equivalent code
> sequences) rather than encoding a "bl" or "b", so BPF there doesn't
> care where the executable memory is mapped, and doesn't need any
> PLTs. Given that, should bpf always allocate from the vmalloc()
> region to preserve the module space for modules?
Ah, okay, then I suspect arm64 does something similar there. Thanks!
> I'm more concerned about ftrace though, but only because I don't
> have the understanding of that subsystem to really say whether there
> are any side effects from having the allocations potentially be out
> of range of a "bl" or "b" instruction.
>
> If ftrace jumps both to and from the allocated page using a "load
> address to register, branch to register" approach like BPF does, then
> ftrace should be safe - and again, raises the issue that maybe it
> should always come from vmalloc() space.
I think the problem with ftrace is patching multiple instruction;
because it sounds like you'd need something to load the absolute address
in a register and then jump to that. And where it's relatively easy to
replace a single instruction, replace multiple instructions gets real
tricky real quick.
Which then leads to you being stuck with that 26bit displacement, IIRC.
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox