LinuxPPC-Dev Archive on lore.kernel.org

LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

* Re: [PATCH v6 1/4] powerpc/64s: Introduce temporary mm for Radix MMU
From: Christopher M. Riedl @ 2021-09-16  0:24 UTC (permalink / raw)
  To: Jordan Niethe; +Cc: linuxppc-dev, linux-hardening
In-Reply-To: <CACzsE9rThU0JBACJoeeHOyEOA8CbFwRExrOhTsySaOH44yJa6g@mail.gmail.com>

On Sat Sep 11, 2021 at 3:26 AM CDT, Jordan Niethe wrote:
> On Sat, Sep 11, 2021 at 12:35 PM Christopher M. Riedl
> <cmr@bluescreens.de> wrote:
> >
> > x86 supports the notion of a temporary mm which restricts access to
> > temporary PTEs to a single CPU. A temporary mm is useful for situations
> > where a CPU needs to perform sensitive operations (such as patching a
> > STRICT_KERNEL_RWX kernel) requiring temporary mappings without exposing
> > said mappings to other CPUs. Another benefit is that other CPU TLBs do
> > not need to be flushed when the temporary mm is torn down.
> >
> > Mappings in the temporary mm can be set in the userspace portion of the
> > address-space.
> >
> > Interrupts must be disabled while the temporary mm is in use. HW
> > breakpoints, which may have been set by userspace as watchpoints on
> > addresses now within the temporary mm, are saved and disabled when
> > loading the temporary mm. The HW breakpoints are restored when unloading
> > the temporary mm. All HW breakpoints are indiscriminately disabled while
> > the temporary mm is in use - this may include breakpoints set by perf.
>
> I had thought CPUs with a DAWR might not need to do this because the
> privilege level that breakpoints trigger on can be configured. But it
> turns out in ptrace, etc we use HW_BRK_TYPE_PRIV_ALL.

Thanks for double checking :)

>
> >
> > Based on x86 implementation:
> >
> > commit cefa929c034e
> > ("x86/mm: Introduce temporary mm structs")
> >
> > Signed-off-by: Christopher M. Riedl <cmr@bluescreens.de>
> >
> > ---
> >
> > v6:  * Use {start,stop}_using_temporary_mm() instead of
> >        {use,unuse}_temporary_mm() as suggested by Christophe.
> >
> > v5:  * Drop support for using a temporary mm on Book3s64 Hash MMU.
> >
> > v4:  * Pass the prev mm instead of NULL to switch_mm_irqs_off() when
> >        using/unusing the temp mm as suggested by Jann Horn to keep
> >        the context.active counter in-sync on mm/nohash.
> >      * Disable SLB preload in the temporary mm when initializing the
> >        temp_mm struct.
> >      * Include asm/debug.h header to fix build issue with
> >        ppc44x_defconfig.
> > ---
> >  arch/powerpc/include/asm/debug.h |  1 +
> >  arch/powerpc/kernel/process.c    |  5 +++
> >  arch/powerpc/lib/code-patching.c | 56 ++++++++++++++++++++++++++++++++
> >  3 files changed, 62 insertions(+)
> >
> > diff --git a/arch/powerpc/include/asm/debug.h b/arch/powerpc/include/asm/debug.h
> > index 86a14736c76c..dfd82635ea8b 100644
> > --- a/arch/powerpc/include/asm/debug.h
> > +++ b/arch/powerpc/include/asm/debug.h
> > @@ -46,6 +46,7 @@ static inline int debugger_fault_handler(struct pt_regs *regs) { return 0; }
> >  #endif
> >
> >  void __set_breakpoint(int nr, struct arch_hw_breakpoint *brk);
> > +void __get_breakpoint(int nr, struct arch_hw_breakpoint *brk);
> >  bool ppc_breakpoint_available(void);
> >  #ifdef CONFIG_PPC_ADV_DEBUG_REGS
> >  extern void do_send_trap(struct pt_regs *regs, unsigned long address,
> > diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> > index 50436b52c213..6aa1f5c4d520 100644
> > --- a/arch/powerpc/kernel/process.c
> > +++ b/arch/powerpc/kernel/process.c
> > @@ -865,6 +865,11 @@ static inline int set_breakpoint_8xx(struct arch_hw_breakpoint *brk)
> >         return 0;
> >  }
> >
> > +void __get_breakpoint(int nr, struct arch_hw_breakpoint *brk)
> > +{
> > +       memcpy(brk, this_cpu_ptr(&current_brk[nr]), sizeof(*brk));
> > +}
>
> The breakpoint code is already a little hard to follow. I'm worried
> doing this might spread breakpoint handling into more places in the
> future.
> What about something like having a breakpoint_pause() function which
> clears the hardware registers only and then a breakpoint_resume()
> function that copies from current_brk[] back to the hardware
> registers?
> Then we don't have to make another copy of the breakpoint state.

I think that sounds reasonable - I'll add those functions instead with
the next spin.

>
> > +
> >  void __set_breakpoint(int nr, struct arch_hw_breakpoint *brk)
> >  {
> >         memcpy(this_cpu_ptr(&current_brk[nr]), brk, sizeof(*brk));
> > diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
> > index f9a3019e37b4..8d61a7d35b89 100644
> > --- a/arch/powerpc/lib/code-patching.c
> > +++ b/arch/powerpc/lib/code-patching.c
>
> Sorry I might have missed it, but what was the reason for not putting
> this stuff in mmu_context.h?

x86 ended up moving this code into their code-patching file as well. I
suppose nobody has thought of another use for a temporary mm like this
yet :)

>
> > @@ -17,6 +17,9 @@
> >  #include <asm/code-patching.h>
> >  #include <asm/setup.h>
> >  #include <asm/inst.h>
> > +#include <asm/mmu_context.h>
> > +#include <asm/debug.h>
> > +#include <asm/tlb.h>
> >
> >  static int __patch_instruction(u32 *exec_addr, struct ppc_inst instr, u32 *patch_addr)
> >  {
> > @@ -45,6 +48,59 @@ int raw_patch_instruction(u32 *addr, struct ppc_inst instr)
> >  }
> >
> >  #ifdef CONFIG_STRICT_KERNEL_RWX
> > +
> > +struct temp_mm {
> > +       struct mm_struct *temp;
> > +       struct mm_struct *prev;
> > +       struct arch_hw_breakpoint brk[HBP_NUM_MAX];
> ^ Then we wouldn't need this.
> > +};
> > +
> > +static inline void init_temp_mm(struct temp_mm *temp_mm, struct mm_struct *mm)
> > +{
> > +       /* We currently only support temporary mm on the Book3s64 Radix MMU */
> > +       WARN_ON(!radix_enabled());
> > +
> > +       temp_mm->temp = mm;
> > +       temp_mm->prev = NULL;
> > +       memset(&temp_mm->brk, 0, sizeof(temp_mm->brk));
> > +}
> > +
> > +static inline void start_using_temporary_mm(struct temp_mm *temp_mm)
> > +{
> > +       lockdep_assert_irqs_disabled();
> > +
> > +       temp_mm->prev = current->active_mm;
> > +       switch_mm_irqs_off(temp_mm->prev, temp_mm->temp, current);
>
> Now that we only support radix it should be fine again to have it like
> this:
> switch_mm_irqs_off(NULL, temp_mm->temp, current);?
> It was changed from that because it would cause issues on nohash I
> thought.

That's true - but if we want to support the other MMUs in the future
I'd rather just keep it as-is. AFAICS there's no harm in passing
temp_mm->prev here instead of NULL.

>
> > +
> > +       WARN_ON(!mm_is_thread_local(temp_mm->temp));
> > +
> > +       if (ppc_breakpoint_available()) {
> > +               struct arch_hw_breakpoint null_brk = {0};
> > +               int i = 0;
> > +
> > +               for (; i < nr_wp_slots(); ++i) {
> > +                       __get_breakpoint(i, &temp_mm->brk[i]);
> > +                       if (temp_mm->brk[i].type != 0)
> > +                               __set_breakpoint(i, &null_brk);
> > +               }
> > +       }
> > +}
> > +
> > +static inline void stop_using_temporary_mm(struct temp_mm *temp_mm)
> > +{
> > +       lockdep_assert_irqs_disabled();
> > +
> > +       switch_mm_irqs_off(temp_mm->temp, temp_mm->prev, current);
> > +
> > +       if (ppc_breakpoint_available()) {
> > +               int i = 0;
> > +
> > +               for (; i < nr_wp_slots(); ++i)
> > +                       if (temp_mm->brk[i].type != 0)
> > +                               __set_breakpoint(i, &temp_mm->brk[i]);
> > +       }
> > +}
> > +
> >  static DEFINE_PER_CPU(struct vm_struct *, text_poke_area);
> >
> >  static int text_area_cpu_up(unsigned int cpu)
> > --
> > 2.32.0
> >
> Thanks,
> Jordan


^ permalink raw reply

* Re: [PATCH v6 4/4] powerpc/64s: Initialize and use a temporary mm for patching on Radix
From: Christopher M. Riedl @ 2021-09-16  0:29 UTC (permalink / raw)
  To: Jordan Niethe; +Cc: linuxppc-dev, linux-hardening
In-Reply-To: <CACzsE9rHnN9hY4b926r6Fc5tC0Tc7cvkF8cgVODunz7ZZYNFyA@mail.gmail.com>

On Sat Sep 11, 2021 at 4:14 AM CDT, Jordan Niethe wrote:
> On Sat, Sep 11, 2021 at 12:39 PM Christopher M. Riedl
> <cmr@bluescreens.de> wrote:
> >
> > When code patching a STRICT_KERNEL_RWX kernel the page containing the
> > address to be patched is temporarily mapped as writeable. Currently, a
> > per-cpu vmalloc patch area is used for this purpose. While the patch
> > area is per-cpu, the temporary page mapping is inserted into the kernel
> > page tables for the duration of patching. The mapping is exposed to CPUs
> > other than the patching CPU - this is undesirable from a hardening
> > perspective. Use a temporary mm instead which keeps the mapping local to
> > the CPU doing the patching.
> >
> > Use the `poking_init` init hook to prepare a temporary mm and patching
> > address. Initialize the temporary mm by copying the init mm. Choose a
> > randomized patching address inside the temporary mm userspace address
> > space. The patching address is randomized between PAGE_SIZE and
> > DEFAULT_MAP_WINDOW-PAGE_SIZE.
> >
> > Bits of entropy with 64K page size on BOOK3S_64:
> >
> >         bits of entropy = log2(DEFAULT_MAP_WINDOW_USER64 / PAGE_SIZE)
> >
> >         PAGE_SIZE=64K, DEFAULT_MAP_WINDOW_USER64=128TB
> >         bits of entropy = log2(128TB / 64K)
> >         bits of entropy = 31
> >
> > The upper limit is DEFAULT_MAP_WINDOW due to how the Book3s64 Hash MMU
> > operates - by default the space above DEFAULT_MAP_WINDOW is not
> > available. Currently the Hash MMU does not use a temporary mm so
> > technically this upper limit isn't necessary; however, a larger
> > randomization range does not further "harden" this overall approach and
> > future work may introduce patching with a temporary mm on Hash as well.
> >
> > Randomization occurs only once during initialization at boot for each
> > possible CPU in the system.
> >
> > Introduce two new functions, map_patch_mm() and unmap_patch_mm(), to
> > respectively create and remove the temporary mapping with write
> > permissions at patching_addr. Map the page with PAGE_KERNEL to set
> > EAA[0] for the PTE which ignores the AMR (so no need to unlock/lock
> > KUAP) according to PowerISA v3.0b Figure 35 on Radix.
> >
> > Based on x86 implementation:
> >
> > commit 4fc19708b165
> > ("x86/alternatives: Initialize temporary mm for patching")
> >
> > and:
> >
> > commit b3fd8e83ada0
> > ("x86/alternatives: Use temporary mm for text poking")
> >
> > Signed-off-by: Christopher M. Riedl <cmr@bluescreens.de>
> >
> > ---
> >
> > v6:  * Small clean-ups (naming, formatting, style, etc).
> >      * Call stop_using_temporary_mm() before pte_unmap_unlock() after
> >        patching.
> >      * Replace BUG_ON()s in poking_init() w/ WARN_ON()s.
> >
> > v5:  * Only support Book3s64 Radix MMU for now.
> >      * Use a per-cpu datastructure to hold the patching_addr and
> >        patching_mm to avoid the need for a synchronization lock/mutex.
> >
> > v4:  * In the previous series this was two separate patches: one to init
> >        the temporary mm in poking_init() (unused in powerpc at the time)
> >        and the other to use it for patching (which removed all the
> >        per-cpu vmalloc code). Now that we use poking_init() in the
> >        existing per-cpu vmalloc approach, that separation doesn't work
> >        as nicely anymore so I just merged the two patches into one.
> >      * Preload the SLB entry and hash the page for the patching_addr
> >        when using Hash on book3s64 to avoid taking an SLB and Hash fault
> >        during patching. The previous implementation was a hack which
> >        changed current->mm to allow the SLB and Hash fault handlers to
> >        work with the temporary mm since both of those code-paths always
> >        assume mm == current->mm.
> >      * Also (hmm - seeing a trend here) with the book3s64 Hash MMU we
> >        have to manage the mm->context.active_cpus counter and mm cpumask
> >        since they determine (via mm_is_thread_local()) if the TLB flush
> >        in pte_clear() is local or not - it should always be local when
> >        we're using the temporary mm. On book3s64's Radix MMU we can
> >        just call local_flush_tlb_mm().
> >      * Use HPTE_USE_KERNEL_KEY on Hash to avoid costly lock/unlock of
> >        KUAP.
> > ---
> >  arch/powerpc/lib/code-patching.c | 119 +++++++++++++++++++++++++++++--
> >  1 file changed, 112 insertions(+), 7 deletions(-)
> >
> > diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
> > index e802e42c2789..af8e2a02a9dd 100644
> > --- a/arch/powerpc/lib/code-patching.c
> > +++ b/arch/powerpc/lib/code-patching.c
> > @@ -11,6 +11,7 @@
> >  #include <linux/cpuhotplug.h>
> >  #include <linux/slab.h>
> >  #include <linux/uaccess.h>
> > +#include <linux/random.h>
> >
> >  #include <asm/tlbflush.h>
> >  #include <asm/page.h>
> > @@ -103,6 +104,7 @@ static inline void stop_using_temporary_mm(struct temp_mm *temp_mm)
> >
> >  static DEFINE_PER_CPU(struct vm_struct *, text_poke_area);
> >  static DEFINE_PER_CPU(unsigned long, cpu_patching_addr);
> > +static DEFINE_PER_CPU(struct mm_struct *, cpu_patching_mm);
> >
> >  static int text_area_cpu_up(unsigned int cpu)
> >  {
> > @@ -126,8 +128,48 @@ static int text_area_cpu_down(unsigned int cpu)
> >         return 0;
> >  }
> >
> > +static __always_inline void __poking_init_temp_mm(void)
> > +{
> > +       int cpu;
> > +       spinlock_t *ptl; /* for protecting pte table */
>
> ptl is just used so we don't have to open code allocating a pte in
> patching_mm isn't it?

Yup - I think that comment was a copy-pasta... I'll improve it.

>
> > +       pte_t *ptep;
> > +       struct mm_struct *patching_mm;
> > +       unsigned long patching_addr;
> > +
> > +       for_each_possible_cpu(cpu) {
> > +               patching_mm = copy_init_mm();
> > +               WARN_ON(!patching_mm);
>
> Would it be okay to just let the mmu handle null pointer dereferences?

In general I think yes; however, the NULL dereference wouldn't occur
until later during actual patching so I thought an early WARN here is
appropriate. 

>
> > +               per_cpu(cpu_patching_mm, cpu) = patching_mm;
> > +
> > +               /*
> > +                * Choose a randomized, page-aligned address from the range:
> > +                * [PAGE_SIZE, DEFAULT_MAP_WINDOW - PAGE_SIZE] The lower
> > +                * address bound is PAGE_SIZE to avoid the zero-page.  The
> > +                * upper address bound is DEFAULT_MAP_WINDOW - PAGE_SIZE to
> > +                * stay under DEFAULT_MAP_WINDOW with the Book3s64 Hash MMU.
> > +                */
> > +               patching_addr = PAGE_SIZE + ((get_random_long() & PAGE_MASK)
> > +                               % (DEFAULT_MAP_WINDOW - 2 * PAGE_SIZE));
> > +               per_cpu(cpu_patching_addr, cpu) = patching_addr;
>
> On x86 the randomization depends on CONFIG_RANDOMIZE_BASE. Should it
> be controllable here too?

IIRC CONFIG_RANDOMIZE_BASE is for KASLR which IMO doesn't really have
much to do with this.

>
> > +
> > +               /*
> > +                * PTE allocation uses GFP_KERNEL which means we need to
> > +                * pre-allocate the PTE here because we cannot do the
> > +                * allocation during patching when IRQs are disabled.
> > +                */
> > +               ptep = get_locked_pte(patching_mm, patching_addr, &ptl);
> > +               WARN_ON(!ptep);
> > +               pte_unmap_unlock(ptep, ptl);
> > +       }
> > +}
> > +
> >  void __init poking_init(void)
> >  {
> > +       if (radix_enabled()) {
> > +               __poking_init_temp_mm();
>
> Should this also be done with cpuhp_setup_state()?

I think I prefer doing the setup ahead of time during boot.

>
> > +               return;
> > +       }
> > +
> >         WARN_ON(cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
> >                 "powerpc/text_poke:online", text_area_cpu_up,
> >                 text_area_cpu_down) < 0);
> > @@ -197,30 +239,93 @@ static inline int unmap_patch_area(void)
> >         return 0;
> >  }
> >
> > +struct patch_mapping {
> > +       spinlock_t *ptl; /* for protecting pte table */
> > +       pte_t *ptep;
> > +       struct temp_mm temp_mm;
> > +};
> > +
> > +/*
> > + * This can be called for kernel text or a module.
> > + */
> > +static int map_patch_mm(const void *addr, struct patch_mapping *patch_mapping)
> > +{
> > +       struct page *page;
> > +       struct mm_struct *patching_mm = __this_cpu_read(cpu_patching_mm);
> > +       unsigned long patching_addr = __this_cpu_read(cpu_patching_addr);
> > +
> > +       if (is_vmalloc_or_module_addr(addr))
> > +               page = vmalloc_to_page(addr);
> > +       else
> > +               page = virt_to_page(addr);
> > +
> > +       patch_mapping->ptep = get_locked_pte(patching_mm, patching_addr,
> > +                                            &patch_mapping->ptl);
> > +       if (unlikely(!patch_mapping->ptep)) {
> > +               pr_warn("map patch: failed to allocate pte for patching\n");
> > +               return -1;
> > +       }
> > +
> > +       set_pte_at(patching_mm, patching_addr, patch_mapping->ptep,
> > +                  pte_mkdirty(mk_pte(page, PAGE_KERNEL)));
> > +
> > +       init_temp_mm(&patch_mapping->temp_mm, patching_mm);
> > +       start_using_temporary_mm(&patch_mapping->temp_mm);
> > +
> > +       return 0;
> > +}
> > +
> > +static int unmap_patch_mm(struct patch_mapping *patch_mapping)
> > +{
> > +       struct mm_struct *patching_mm = __this_cpu_read(cpu_patching_mm);
> > +       unsigned long patching_addr = __this_cpu_read(cpu_patching_addr);
> > +
> > +       pte_clear(patching_mm, patching_addr, patch_mapping->ptep);
> > +
> > +       local_flush_tlb_mm(patching_mm);
> > +       stop_using_temporary_mm(&patch_mapping->temp_mm);
> > +
> > +       pte_unmap_unlock(patch_mapping->ptep, patch_mapping->ptl);
> > +
> > +       return 0;
> > +}
> > +
> >  static int do_patch_instruction(u32 *addr, struct ppc_inst instr)
> >  {
> >         int err, rc = 0;
> >         u32 *patch_addr = NULL;
> >         unsigned long flags;
> > +       struct patch_mapping patch_mapping;
> >
> >         /*
> > -        * During early early boot patch_instruction is called
> > -        * when text_poke_area is not ready, but we still need
> > -        * to allow patching. We just do the plain old patching
> > +        * During early early boot patch_instruction is called when the
> > +        * patching_mm/text_poke_area is not ready, but we still need to allow
> > +        * patching. We just do the plain old patching.
> >          */
> > -       if (!this_cpu_read(text_poke_area))
> > -               return raw_patch_instruction(addr, instr);
> > +       if (radix_enabled()) {
> > +               if (!this_cpu_read(cpu_patching_mm))
> > +                       return raw_patch_instruction(addr, instr);
> > +       } else {
> > +               if (!this_cpu_read(text_poke_area))
> > +                       return raw_patch_instruction(addr, instr);
> > +       }
>
> Would testing cpu_patching_addr handler both of these cases?
>
> Then I think it might be clearer to do something like this:
> if (radix_enabled()) {
> return patch_instruction_mm(addr, instr);
> }
>
> patch_instruction_mm() would combine map_patch_mm(), then patching and
> unmap_patch_mm() into one function.
>
> IMO, a bit of code duplication would be cleaner than checking multiple
> times for radix_enabled() and having struct patch_mapping especially
> for maintaining state.

Hmm, I think it's a good idea - I'll give it a go for the next version.
Thanks for the suggestion!

>
> >
> >         local_irq_save(flags);
> >
> > -       err = map_patch_area(addr);
> > +       if (radix_enabled())
> > +               err = map_patch_mm(addr, &patch_mapping);
> > +       else
> > +               err = map_patch_area(addr);
> >         if (err)
> >                 goto out;
> >
> >         patch_addr = (u32 *)(__this_cpu_read(cpu_patching_addr) | offset_in_page(addr));
> >         rc = __patch_instruction(addr, instr, patch_addr);
> >
> > -       err = unmap_patch_area();
> > +       if (radix_enabled())
> > +               err = unmap_patch_mm(&patch_mapping);
> > +       else
> > +               err = unmap_patch_area();
> >
> >  out:
> >         local_irq_restore(flags);
> > --
> > 2.32.0
> >
> Thanks,
> Jordan


^ permalink raw reply

* Re: [PATCH v6 4/4] powerpc/64s: Initialize and use a temporary mm for patching on Radix
From: Christopher M. Riedl @ 2021-09-16  0:45 UTC (permalink / raw)
  To: Jordan Niethe; +Cc: linuxppc-dev, linux-hardening
In-Reply-To: <CACzsE9qr6QK_Xm6yVXT061sxR9SXaeFx5fkjiNAXFBFr6WDQOw@mail.gmail.com>

On Tue Sep 14, 2021 at 11:24 PM CDT, Jordan Niethe wrote:
> On Sat, Sep 11, 2021 at 12:39 PM Christopher M. Riedl
> <cmr@bluescreens.de> wrote:
> > ... 
> > +/*
> > + * This can be called for kernel text or a module.
> > + */
> > +static int map_patch_mm(const void *addr, struct patch_mapping *patch_mapping)
> > +{
> > +       struct page *page;
> > +       struct mm_struct *patching_mm = __this_cpu_read(cpu_patching_mm);
> > +       unsigned long patching_addr = __this_cpu_read(cpu_patching_addr);
> > +
> > +       if (is_vmalloc_or_module_addr(addr))
> > +               page = vmalloc_to_page(addr);
> > +       else
> > +               page = virt_to_page(addr);
> > +
> > +       patch_mapping->ptep = get_locked_pte(patching_mm, patching_addr,
> > +                                            &patch_mapping->ptl);
> > +       if (unlikely(!patch_mapping->ptep)) {
> > +               pr_warn("map patch: failed to allocate pte for patching\n");
> > +               return -1;
> > +       }
> > +
> > +       set_pte_at(patching_mm, patching_addr, patch_mapping->ptep,
> > +                  pte_mkdirty(mk_pte(page, PAGE_KERNEL)));
>
> I think because switch_mm_irqs_off() will not necessarily have a
> barrier so a ptesync would be needed.
> A spurious fault here from __patch_instruction() would not be handled
> correctly.

Sorry I don't quite follow - can you explain this to me in a bit more
detail?

^ permalink raw reply

* Re: [PATCH v6 4/4] powerpc/64s: Initialize and use a temporary mm for patching on Radix
From: Jordan Niethe @ 2021-09-16  1:52 UTC (permalink / raw)
  To: Christopher M. Riedl; +Cc: linuxppc-dev, linux-hardening
In-Reply-To: <CEAVVEORU7UL.1ZDGQIF33JSOX@wrwlf0000>

On Thu, Sep 16, 2021 at 10:38 AM Christopher M. Riedl
<cmr@bluescreens.de> wrote:
>
> On Sat Sep 11, 2021 at 4:14 AM CDT, Jordan Niethe wrote:
> > On Sat, Sep 11, 2021 at 12:39 PM Christopher M. Riedl
> > <cmr@bluescreens.de> wrote:
> > >
> > > When code patching a STRICT_KERNEL_RWX kernel the page containing the
> > > address to be patched is temporarily mapped as writeable. Currently, a
> > > per-cpu vmalloc patch area is used for this purpose. While the patch
> > > area is per-cpu, the temporary page mapping is inserted into the kernel
> > > page tables for the duration of patching. The mapping is exposed to CPUs
> > > other than the patching CPU - this is undesirable from a hardening
> > > perspective. Use a temporary mm instead which keeps the mapping local to
> > > the CPU doing the patching.
> > >
> > > Use the `poking_init` init hook to prepare a temporary mm and patching
> > > address. Initialize the temporary mm by copying the init mm. Choose a
> > > randomized patching address inside the temporary mm userspace address
> > > space. The patching address is randomized between PAGE_SIZE and
> > > DEFAULT_MAP_WINDOW-PAGE_SIZE.
> > >
> > > Bits of entropy with 64K page size on BOOK3S_64:
> > >
> > >         bits of entropy = log2(DEFAULT_MAP_WINDOW_USER64 / PAGE_SIZE)
> > >
> > >         PAGE_SIZE=64K, DEFAULT_MAP_WINDOW_USER64=128TB
> > >         bits of entropy = log2(128TB / 64K)
> > >         bits of entropy = 31
> > >
> > > The upper limit is DEFAULT_MAP_WINDOW due to how the Book3s64 Hash MMU
> > > operates - by default the space above DEFAULT_MAP_WINDOW is not
> > > available. Currently the Hash MMU does not use a temporary mm so
> > > technically this upper limit isn't necessary; however, a larger
> > > randomization range does not further "harden" this overall approach and
> > > future work may introduce patching with a temporary mm on Hash as well.
> > >
> > > Randomization occurs only once during initialization at boot for each
> > > possible CPU in the system.
> > >
> > > Introduce two new functions, map_patch_mm() and unmap_patch_mm(), to
> > > respectively create and remove the temporary mapping with write
> > > permissions at patching_addr. Map the page with PAGE_KERNEL to set
> > > EAA[0] for the PTE which ignores the AMR (so no need to unlock/lock
> > > KUAP) according to PowerISA v3.0b Figure 35 on Radix.
> > >
> > > Based on x86 implementation:
> > >
> > > commit 4fc19708b165
> > > ("x86/alternatives: Initialize temporary mm for patching")
> > >
> > > and:
> > >
> > > commit b3fd8e83ada0
> > > ("x86/alternatives: Use temporary mm for text poking")
> > >
> > > Signed-off-by: Christopher M. Riedl <cmr@bluescreens.de>
> > >
> > > ---
> > >
> > > v6:  * Small clean-ups (naming, formatting, style, etc).
> > >      * Call stop_using_temporary_mm() before pte_unmap_unlock() after
> > >        patching.
> > >      * Replace BUG_ON()s in poking_init() w/ WARN_ON()s.
> > >
> > > v5:  * Only support Book3s64 Radix MMU for now.
> > >      * Use a per-cpu datastructure to hold the patching_addr and
> > >        patching_mm to avoid the need for a synchronization lock/mutex.
> > >
> > > v4:  * In the previous series this was two separate patches: one to init
> > >        the temporary mm in poking_init() (unused in powerpc at the time)
> > >        and the other to use it for patching (which removed all the
> > >        per-cpu vmalloc code). Now that we use poking_init() in the
> > >        existing per-cpu vmalloc approach, that separation doesn't work
> > >        as nicely anymore so I just merged the two patches into one.
> > >      * Preload the SLB entry and hash the page for the patching_addr
> > >        when using Hash on book3s64 to avoid taking an SLB and Hash fault
> > >        during patching. The previous implementation was a hack which
> > >        changed current->mm to allow the SLB and Hash fault handlers to
> > >        work with the temporary mm since both of those code-paths always
> > >        assume mm == current->mm.
> > >      * Also (hmm - seeing a trend here) with the book3s64 Hash MMU we
> > >        have to manage the mm->context.active_cpus counter and mm cpumask
> > >        since they determine (via mm_is_thread_local()) if the TLB flush
> > >        in pte_clear() is local or not - it should always be local when
> > >        we're using the temporary mm. On book3s64's Radix MMU we can
> > >        just call local_flush_tlb_mm().
> > >      * Use HPTE_USE_KERNEL_KEY on Hash to avoid costly lock/unlock of
> > >        KUAP.
> > > ---
> > >  arch/powerpc/lib/code-patching.c | 119 +++++++++++++++++++++++++++++--
> > >  1 file changed, 112 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
> > > index e802e42c2789..af8e2a02a9dd 100644
> > > --- a/arch/powerpc/lib/code-patching.c
> > > +++ b/arch/powerpc/lib/code-patching.c
> > > @@ -11,6 +11,7 @@
> > >  #include <linux/cpuhotplug.h>
> > >  #include <linux/slab.h>
> > >  #include <linux/uaccess.h>
> > > +#include <linux/random.h>
> > >
> > >  #include <asm/tlbflush.h>
> > >  #include <asm/page.h>
> > > @@ -103,6 +104,7 @@ static inline void stop_using_temporary_mm(struct temp_mm *temp_mm)
> > >
> > >  static DEFINE_PER_CPU(struct vm_struct *, text_poke_area);
> > >  static DEFINE_PER_CPU(unsigned long, cpu_patching_addr);
> > > +static DEFINE_PER_CPU(struct mm_struct *, cpu_patching_mm);
> > >
> > >  static int text_area_cpu_up(unsigned int cpu)
> > >  {
> > > @@ -126,8 +128,48 @@ static int text_area_cpu_down(unsigned int cpu)
> > >         return 0;
> > >  }
> > >
> > > +static __always_inline void __poking_init_temp_mm(void)
> > > +{
> > > +       int cpu;
> > > +       spinlock_t *ptl; /* for protecting pte table */
> >
> > ptl is just used so we don't have to open code allocating a pte in
> > patching_mm isn't it?
>
> Yup - I think that comment was a copy-pasta... I'll improve it.
>
> >
> > > +       pte_t *ptep;
> > > +       struct mm_struct *patching_mm;
> > > +       unsigned long patching_addr;
> > > +
> > > +       for_each_possible_cpu(cpu) {
> > > +               patching_mm = copy_init_mm();
> > > +               WARN_ON(!patching_mm);
> >
> > Would it be okay to just let the mmu handle null pointer dereferences?
>
> In general I think yes; however, the NULL dereference wouldn't occur
> until later during actual patching so I thought an early WARN here is
> appropriate.
>
> >
> > > +               per_cpu(cpu_patching_mm, cpu) = patching_mm;
> > > +
> > > +               /*
> > > +                * Choose a randomized, page-aligned address from the range:
> > > +                * [PAGE_SIZE, DEFAULT_MAP_WINDOW - PAGE_SIZE] The lower
> > > +                * address bound is PAGE_SIZE to avoid the zero-page.  The
> > > +                * upper address bound is DEFAULT_MAP_WINDOW - PAGE_SIZE to
> > > +                * stay under DEFAULT_MAP_WINDOW with the Book3s64 Hash MMU.
> > > +                */
> > > +               patching_addr = PAGE_SIZE + ((get_random_long() & PAGE_MASK)
> > > +                               % (DEFAULT_MAP_WINDOW - 2 * PAGE_SIZE));
> > > +               per_cpu(cpu_patching_addr, cpu) = patching_addr;
> >
> > On x86 the randomization depends on CONFIG_RANDOMIZE_BASE. Should it
> > be controllable here too?
>
> IIRC CONFIG_RANDOMIZE_BASE is for KASLR which IMO doesn't really have
> much to do with this.
>
> >
> > > +
> > > +               /*
> > > +                * PTE allocation uses GFP_KERNEL which means we need to
> > > +                * pre-allocate the PTE here because we cannot do the
> > > +                * allocation during patching when IRQs are disabled.
> > > +                */
> > > +               ptep = get_locked_pte(patching_mm, patching_addr, &ptl);
> > > +               WARN_ON(!ptep);
> > > +               pte_unmap_unlock(ptep, ptl);
> > > +       }
> > > +}
> > > +
> > >  void __init poking_init(void)
> > >  {
> > > +       if (radix_enabled()) {
> > > +               __poking_init_temp_mm();
> >
> > Should this also be done with cpuhp_setup_state()?
>
> I think I prefer doing the setup ahead of time during boot.

It does lose the ability to free up memory after a cpu is hot
unplugged but I'm not sure if that's a big problem.

>
> >
> > > +               return;
> > > +       }
> > > +
> > >         WARN_ON(cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
> > >                 "powerpc/text_poke:online", text_area_cpu_up,
> > >                 text_area_cpu_down) < 0);
> > > @@ -197,30 +239,93 @@ static inline int unmap_patch_area(void)
> > >         return 0;
> > >  }
> > >
> > > +struct patch_mapping {
> > > +       spinlock_t *ptl; /* for protecting pte table */
> > > +       pte_t *ptep;
> > > +       struct temp_mm temp_mm;
> > > +};
> > > +
> > > +/*
> > > + * This can be called for kernel text or a module.
> > > + */
> > > +static int map_patch_mm(const void *addr, struct patch_mapping *patch_mapping)
> > > +{
> > > +       struct page *page;
> > > +       struct mm_struct *patching_mm = __this_cpu_read(cpu_patching_mm);
> > > +       unsigned long patching_addr = __this_cpu_read(cpu_patching_addr);
> > > +
> > > +       if (is_vmalloc_or_module_addr(addr))
> > > +               page = vmalloc_to_page(addr);
> > > +       else
> > > +               page = virt_to_page(addr);
> > > +
> > > +       patch_mapping->ptep = get_locked_pte(patching_mm, patching_addr,
> > > +                                            &patch_mapping->ptl);
> > > +       if (unlikely(!patch_mapping->ptep)) {
> > > +               pr_warn("map patch: failed to allocate pte for patching\n");
> > > +               return -1;
> > > +       }
> > > +
> > > +       set_pte_at(patching_mm, patching_addr, patch_mapping->ptep,
> > > +                  pte_mkdirty(mk_pte(page, PAGE_KERNEL)));
> > > +
> > > +       init_temp_mm(&patch_mapping->temp_mm, patching_mm);
> > > +       start_using_temporary_mm(&patch_mapping->temp_mm);
> > > +
> > > +       return 0;
> > > +}
> > > +
> > > +static int unmap_patch_mm(struct patch_mapping *patch_mapping)
> > > +{
> > > +       struct mm_struct *patching_mm = __this_cpu_read(cpu_patching_mm);
> > > +       unsigned long patching_addr = __this_cpu_read(cpu_patching_addr);
> > > +
> > > +       pte_clear(patching_mm, patching_addr, patch_mapping->ptep);
> > > +
> > > +       local_flush_tlb_mm(patching_mm);
> > > +       stop_using_temporary_mm(&patch_mapping->temp_mm);
> > > +
> > > +       pte_unmap_unlock(patch_mapping->ptep, patch_mapping->ptl);
> > > +
> > > +       return 0;
> > > +}
> > > +
> > >  static int do_patch_instruction(u32 *addr, struct ppc_inst instr)
> > >  {
> > >         int err, rc = 0;
> > >         u32 *patch_addr = NULL;
> > >         unsigned long flags;
> > > +       struct patch_mapping patch_mapping;
> > >
> > >         /*
> > > -        * During early early boot patch_instruction is called
> > > -        * when text_poke_area is not ready, but we still need
> > > -        * to allow patching. We just do the plain old patching
> > > +        * During early early boot patch_instruction is called when the
> > > +        * patching_mm/text_poke_area is not ready, but we still need to allow
> > > +        * patching. We just do the plain old patching.
> > >          */
> > > -       if (!this_cpu_read(text_poke_area))
> > > -               return raw_patch_instruction(addr, instr);
> > > +       if (radix_enabled()) {
> > > +               if (!this_cpu_read(cpu_patching_mm))
> > > +                       return raw_patch_instruction(addr, instr);
> > > +       } else {
> > > +               if (!this_cpu_read(text_poke_area))
> > > +                       return raw_patch_instruction(addr, instr);
> > > +       }
> >
> > Would testing cpu_patching_addr handler both of these cases?
> >
> > Then I think it might be clearer to do something like this:
> > if (radix_enabled()) {
> > return patch_instruction_mm(addr, instr);
> > }
> >
> > patch_instruction_mm() would combine map_patch_mm(), then patching and
> > unmap_patch_mm() into one function.
> >
> > IMO, a bit of code duplication would be cleaner than checking multiple
> > times for radix_enabled() and having struct patch_mapping especially
> > for maintaining state.
>
> Hmm, I think it's a good idea - I'll give it a go for the next version.
> Thanks for the suggestion!
>
> >
> > >
> > >         local_irq_save(flags);
> > >
> > > -       err = map_patch_area(addr);
> > > +       if (radix_enabled())
> > > +               err = map_patch_mm(addr, &patch_mapping);
> > > +       else
> > > +               err = map_patch_area(addr);
> > >         if (err)
> > >                 goto out;
> > >
> > >         patch_addr = (u32 *)(__this_cpu_read(cpu_patching_addr) | offset_in_page(addr));
> > >         rc = __patch_instruction(addr, instr, patch_addr);
> > >
> > > -       err = unmap_patch_area();
> > > +       if (radix_enabled())
> > > +               err = unmap_patch_mm(&patch_mapping);
> > > +       else
> > > +               err = unmap_patch_area();
> > >
> > >  out:
> > >         local_irq_restore(flags);
> > > --
> > > 2.32.0
> > >
> > Thanks,
> > Jordan
>

^ permalink raw reply

* Re: [PATCH v6 4/4] powerpc/64s: Initialize and use a temporary mm for patching on Radix
From: Jordan Niethe @ 2021-09-16  2:04 UTC (permalink / raw)
  To: Christopher M. Riedl; +Cc: linuxppc-dev, linux-hardening
In-Reply-To: <CEAW7GNXW96H.18ANPMC01JA2C@wrwlf0000>

On Thu, Sep 16, 2021 at 10:40 AM Christopher M. Riedl
<cmr@bluescreens.de> wrote:
>
> On Tue Sep 14, 2021 at 11:24 PM CDT, Jordan Niethe wrote:
> > On Sat, Sep 11, 2021 at 12:39 PM Christopher M. Riedl
> > <cmr@bluescreens.de> wrote:
> > > ...
> > > +/*
> > > + * This can be called for kernel text or a module.
> > > + */
> > > +static int map_patch_mm(const void *addr, struct patch_mapping *patch_mapping)
> > > +{
> > > +       struct page *page;
> > > +       struct mm_struct *patching_mm = __this_cpu_read(cpu_patching_mm);
> > > +       unsigned long patching_addr = __this_cpu_read(cpu_patching_addr);
> > > +
> > > +       if (is_vmalloc_or_module_addr(addr))
> > > +               page = vmalloc_to_page(addr);
> > > +       else
> > > +               page = virt_to_page(addr);
> > > +
> > > +       patch_mapping->ptep = get_locked_pte(patching_mm, patching_addr,
> > > +                                            &patch_mapping->ptl);
> > > +       if (unlikely(!patch_mapping->ptep)) {
> > > +               pr_warn("map patch: failed to allocate pte for patching\n");
> > > +               return -1;
> > > +       }
> > > +
> > > +       set_pte_at(patching_mm, patching_addr, patch_mapping->ptep,
> > > +                  pte_mkdirty(mk_pte(page, PAGE_KERNEL)));
> >
> > I think because switch_mm_irqs_off() will not necessarily have a
> > barrier so a ptesync would be needed.
> > A spurious fault here from __patch_instruction() would not be handled
> > correctly.
>
> Sorry I don't quite follow - can you explain this to me in a bit more
> detail?

radix__set_pte_at() skips calling ptesync as an optimization.
If there is no ordering between changing the pte and then accessing
the page with __patch_instruction(), a spurious fault could be raised.
I think such a fault would end up being causing bad_kernel_fault() ->
true and would not be fixed up.

I thought there might be a barrier in switch_mm_irqs_off() that would
provide this ordering but afaics that is not always the case.

So I think that we need to have a ptesync after set_pte_at().

^ permalink raw reply

* Re: [PATCH v4] lockdown,selinux: fix wrong subject in some SELinux lockdown checks
From: Paul Moore @ 2021-09-16  2:59 UTC (permalink / raw)
  To: Ondrej Mosnacek
  Cc: linux-efi, linux-pci, linux-cxl, Steffen Klassert, Herbert Xu,
	x86, James Morris, linux-acpi, Ingo Molnar, linux-serial,
	linux-pm, selinux, Steven Rostedt, Casey Schaufler, Dan Williams,
	netdev, Stephen Smalley, kexec, linux-kernel,
	linux-security-module, linux-fsdevel, bpf, linuxppc-dev,
	David S . Miller
In-Reply-To: <CAHC9VhRw-S+zZUFz5QFFLMBATjo+YbPAiR21jX6p7cT0T+MVLA@mail.gmail.com>

On Mon, Sep 13, 2021 at 5:05 PM Paul Moore <paul@paul-moore.com> wrote:
>
> On Mon, Sep 13, 2021 at 10:02 AM Ondrej Mosnacek <omosnace@redhat.com> wrote:
> >
> > Commit 59438b46471a ("security,lockdown,selinux: implement SELinux
> > lockdown") added an implementation of the locked_down LSM hook to
> > SELinux, with the aim to restrict which domains are allowed to perform
> > operations that would breach lockdown.
> >
> > However, in several places the security_locked_down() hook is called in
> > situations where the current task isn't doing any action that would
> > directly breach lockdown, leading to SELinux checks that are basically
> > bogus.
> >
> > To fix this, add an explicit struct cred pointer argument to
> > security_lockdown() and define NULL as a special value to pass instead
> > of current_cred() in such situations. LSMs that take the subject
> > credentials into account can then fall back to some default or ignore
> > such calls altogether. In the SELinux lockdown hook implementation, use
> > SECINITSID_KERNEL in case the cred argument is NULL.
> >
> > Most of the callers are updated to pass current_cred() as the cred
> > pointer, thus maintaining the same behavior. The following callers are
> > modified to pass NULL as the cred pointer instead:
> > 1. arch/powerpc/xmon/xmon.c
> >      Seems to be some interactive debugging facility. It appears that
> >      the lockdown hook is called from interrupt context here, so it
> >      should be more appropriate to request a global lockdown decision.
> > 2. fs/tracefs/inode.c:tracefs_create_file()
> >      Here the call is used to prevent creating new tracefs entries when
> >      the kernel is locked down. Assumes that locking down is one-way -
> >      i.e. if the hook returns non-zero once, it will never return zero
> >      again, thus no point in creating these files. Also, the hook is
> >      often called by a module's init function when it is loaded by
> >      userspace, where it doesn't make much sense to do a check against
> >      the current task's creds, since the task itself doesn't actually
> >      use the tracing functionality (i.e. doesn't breach lockdown), just
> >      indirectly makes some new tracepoints available to whoever is
> >      authorized to use them.
> > 3. net/xfrm/xfrm_user.c:copy_to_user_*()
> >      Here a cryptographic secret is redacted based on the value returned
> >      from the hook. There are two possible actions that may lead here:
> >      a) A netlink message XFRM_MSG_GETSA with NLM_F_DUMP set - here the
> >         task context is relevant, since the dumped data is sent back to
> >         the current task.
> >      b) When adding/deleting/updating an SA via XFRM_MSG_xxxSA, the
> >         dumped SA is broadcasted to tasks subscribed to XFRM events -
> >         here the current task context is not relevant as it doesn't
> >         represent the tasks that could potentially see the secret.
> >      It doesn't seem worth it to try to keep using the current task's
> >      context in the a) case, since the eventual data leak can be
> >      circumvented anyway via b), plus there is no way for the task to
> >      indicate that it doesn't care about the actual key value, so the
> >      check could generate a lot of "false alert" denials with SELinux.
> >      Thus, let's pass NULL instead of current_cred() here faute de
> >      mieux.
> >
> > Improvements-suggested-by: Casey Schaufler <casey@schaufler-ca.com>
> > Improvements-suggested-by: Paul Moore <paul@paul-moore.com>
> > Fixes: 59438b46471a ("security,lockdown,selinux: implement SELinux lockdown")
> > Acked-by: Dan Williams <dan.j.williams@intel.com>         [cxl]
> > Acked-by: Steffen Klassert <steffen.klassert@secunet.com> [xfrm]
> > Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
> > ---
> >
> > v4:
> > - rebase on top of TODO
> > - fix rebase conflicts:
> >   * drivers/cxl/pci.c
> >     - trivial: the lockdown reason was corrected in mainline
> >   * kernel/bpf/helpers.c, kernel/trace/bpf_trace.c
> >     - trivial: LOCKDOWN_BPF_READ was renamed to LOCKDOWN_BPF_READ_KERNEL
> >       in mainline
> >   * kernel/power/hibernate.c
> >     - trivial: !secretmem_active() was added to the condition in
> >       hibernation_available()
> > - cover new security_locked_down() call in kernel/bpf/helpers.c
> >   (LOCKDOWN_BPF_WRITE_USER in BPF_FUNC_probe_write_user case)
> >
> > v3: https://lore.kernel.org/lkml/20210616085118.1141101-1-omosnace@redhat.com/
> > - add the cred argument to security_locked_down() and adapt all callers
> > - keep using current_cred() in BPF, as the hook calls have been shifted
> >   to program load time (commit ff40e51043af ("bpf, lockdown, audit: Fix
> >   buggy SELinux lockdown permission checks"))
> > - in SELinux, don't ignore hook calls where cred == NULL, but use
> >   SECINITSID_KERNEL as the subject instead
> > - update explanations in the commit message
> >
> > v2: https://lore.kernel.org/lkml/20210517092006.803332-1-omosnace@redhat.com/
> > - change to a single hook based on suggestions by Casey Schaufler
> >
> > v1: https://lore.kernel.org/lkml/20210507114048.138933-1-omosnace@redhat.com/
>
> The changes between v3 and v4 all seem sane to me, but I'm going to
> let this sit for a few days in hopes that we can collect a few more
> Reviewed-bys and ACKs.  If I don't see any objections I'll merge it
> mid-week(ish) into selinux/stable-5.15 and plan on sending it to Linus
> after it goes through a build/test cycle.

Time's up, I just merged this into selinux/stable-5.15 and I'll send
this to Linus once it passes testing.

-- 
paul moore
www.paul-moore.com

^ permalink raw reply

* Re: [PATCH v4] lockdown,selinux: fix wrong subject in some SELinux lockdown checks
From: Ondrej Mosnacek @ 2021-09-16  6:57 UTC (permalink / raw)
  To: Paul Moore
  Cc: linux-efi, Linux PCI, linux-cxl, Steffen Klassert, Herbert Xu,
	X86 ML, James Morris, Linux ACPI, Ingo Molnar, linux-serial,
	Linux-pm mailing list, SElinux list, Steven Rostedt,
	Casey Schaufler, Dan Williams, network dev, Stephen Smalley,
	Kexec Mailing List, Linux kernel mailing list,
	Linux Security Module list, Linux FS Devel, bpf, linuxppc-dev,
	David S . Miller
In-Reply-To: <CAHC9VhQyejnmLn0NHQiWzikHs8ZdzAUdZ2WqNxgGM6xhJ4mvMQ@mail.gmail.com>

On Thu, Sep 16, 2021 at 4:59 AM Paul Moore <paul@paul-moore.com> wrote:
> On Mon, Sep 13, 2021 at 5:05 PM Paul Moore <paul@paul-moore.com> wrote:
> >
> > On Mon, Sep 13, 2021 at 10:02 AM Ondrej Mosnacek <omosnace@redhat.com> wrote:
> > >
> > > Commit 59438b46471a ("security,lockdown,selinux: implement SELinux
> > > lockdown") added an implementation of the locked_down LSM hook to
> > > SELinux, with the aim to restrict which domains are allowed to perform
> > > operations that would breach lockdown.
> > >
> > > However, in several places the security_locked_down() hook is called in
> > > situations where the current task isn't doing any action that would
> > > directly breach lockdown, leading to SELinux checks that are basically
> > > bogus.
> > >
> > > To fix this, add an explicit struct cred pointer argument to
> > > security_lockdown() and define NULL as a special value to pass instead
> > > of current_cred() in such situations. LSMs that take the subject
> > > credentials into account can then fall back to some default or ignore
> > > such calls altogether. In the SELinux lockdown hook implementation, use
> > > SECINITSID_KERNEL in case the cred argument is NULL.
> > >
> > > Most of the callers are updated to pass current_cred() as the cred
> > > pointer, thus maintaining the same behavior. The following callers are
> > > modified to pass NULL as the cred pointer instead:
> > > 1. arch/powerpc/xmon/xmon.c
> > >      Seems to be some interactive debugging facility. It appears that
> > >      the lockdown hook is called from interrupt context here, so it
> > >      should be more appropriate to request a global lockdown decision.
> > > 2. fs/tracefs/inode.c:tracefs_create_file()
> > >      Here the call is used to prevent creating new tracefs entries when
> > >      the kernel is locked down. Assumes that locking down is one-way -
> > >      i.e. if the hook returns non-zero once, it will never return zero
> > >      again, thus no point in creating these files. Also, the hook is
> > >      often called by a module's init function when it is loaded by
> > >      userspace, where it doesn't make much sense to do a check against
> > >      the current task's creds, since the task itself doesn't actually
> > >      use the tracing functionality (i.e. doesn't breach lockdown), just
> > >      indirectly makes some new tracepoints available to whoever is
> > >      authorized to use them.
> > > 3. net/xfrm/xfrm_user.c:copy_to_user_*()
> > >      Here a cryptographic secret is redacted based on the value returned
> > >      from the hook. There are two possible actions that may lead here:
> > >      a) A netlink message XFRM_MSG_GETSA with NLM_F_DUMP set - here the
> > >         task context is relevant, since the dumped data is sent back to
> > >         the current task.
> > >      b) When adding/deleting/updating an SA via XFRM_MSG_xxxSA, the
> > >         dumped SA is broadcasted to tasks subscribed to XFRM events -
> > >         here the current task context is not relevant as it doesn't
> > >         represent the tasks that could potentially see the secret.
> > >      It doesn't seem worth it to try to keep using the current task's
> > >      context in the a) case, since the eventual data leak can be
> > >      circumvented anyway via b), plus there is no way for the task to
> > >      indicate that it doesn't care about the actual key value, so the
> > >      check could generate a lot of "false alert" denials with SELinux.
> > >      Thus, let's pass NULL instead of current_cred() here faute de
> > >      mieux.
> > >
> > > Improvements-suggested-by: Casey Schaufler <casey@schaufler-ca.com>
> > > Improvements-suggested-by: Paul Moore <paul@paul-moore.com>
> > > Fixes: 59438b46471a ("security,lockdown,selinux: implement SELinux lockdown")
> > > Acked-by: Dan Williams <dan.j.williams@intel.com>         [cxl]
> > > Acked-by: Steffen Klassert <steffen.klassert@secunet.com> [xfrm]
> > > Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
> > > ---
> > >
> > > v4:
> > > - rebase on top of TODO
> > > - fix rebase conflicts:
> > >   * drivers/cxl/pci.c
> > >     - trivial: the lockdown reason was corrected in mainline
> > >   * kernel/bpf/helpers.c, kernel/trace/bpf_trace.c
> > >     - trivial: LOCKDOWN_BPF_READ was renamed to LOCKDOWN_BPF_READ_KERNEL
> > >       in mainline
> > >   * kernel/power/hibernate.c
> > >     - trivial: !secretmem_active() was added to the condition in
> > >       hibernation_available()
> > > - cover new security_locked_down() call in kernel/bpf/helpers.c
> > >   (LOCKDOWN_BPF_WRITE_USER in BPF_FUNC_probe_write_user case)
> > >
> > > v3: https://lore.kernel.org/lkml/20210616085118.1141101-1-omosnace@redhat.com/
> > > - add the cred argument to security_locked_down() and adapt all callers
> > > - keep using current_cred() in BPF, as the hook calls have been shifted
> > >   to program load time (commit ff40e51043af ("bpf, lockdown, audit: Fix
> > >   buggy SELinux lockdown permission checks"))
> > > - in SELinux, don't ignore hook calls where cred == NULL, but use
> > >   SECINITSID_KERNEL as the subject instead
> > > - update explanations in the commit message
> > >
> > > v2: https://lore.kernel.org/lkml/20210517092006.803332-1-omosnace@redhat.com/
> > > - change to a single hook based on suggestions by Casey Schaufler
> > >
> > > v1: https://lore.kernel.org/lkml/20210507114048.138933-1-omosnace@redhat.com/
> >
> > The changes between v3 and v4 all seem sane to me, but I'm going to
> > let this sit for a few days in hopes that we can collect a few more
> > Reviewed-bys and ACKs.  If I don't see any objections I'll merge it
> > mid-week(ish) into selinux/stable-5.15 and plan on sending it to Linus
> > after it goes through a build/test cycle.
>
> Time's up, I just merged this into selinux/stable-5.15 and I'll send
> this to Linus once it passes testing.

Thanks!

-- 
Ondrej Mosnacek
Software Engineer, Linux Security - SELinux kernel
Red Hat, Inc.


^ permalink raw reply

* Re: [PATCH] powerpc: warn on emulation of dcbz instruction
From: Benjamin Herrenschmidt @ 2021-09-16  7:15 UTC (permalink / raw)
  To: Christophe Leroy, Paul Mackerras, Michael Ellerman
  Cc: Finn Thain, linuxppc-dev, linux-kernel, Stan Johnson
In-Reply-To: <62b33ca839f3d1d7d4b64b6f56af0bbe4d2c9057.1631716292.git.christophe.leroy@csgroup.eu>

On Wed, 2021-09-15 at 16:31 +0200, Christophe Leroy wrote:
> dcbz instruction shouldn't be used on non-cached memory. Using
> it on non-cached memory can result in alignment exception and
> implies a heavy handling.
> 
> Instead of silentely emulating the instruction and resulting in high
> performance degradation, warn whenever an alignment exception is
> taken due to dcbz, so that the user is made aware that dcbz
> instruction has been used unexpectedly.
> 
> Reported-by: Stan Johnson <userm57@yahoo.com>
> Cc: Finn Thain <fthain@linux-m68k.org>
> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
> ---
>  arch/powerpc/kernel/align.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arch/powerpc/kernel/align.c
> b/arch/powerpc/kernel/align.c
> index bbb4181621dd..adc3a4a9c6e4 100644
> --- a/arch/powerpc/kernel/align.c
> +++ b/arch/powerpc/kernel/align.c
> @@ -349,6 +349,7 @@ int fix_alignment(struct pt_regs *regs)
>  		if (op.type != CACHEOP + DCBZ)
>  			return -EINVAL;
>  		PPC_WARN_ALIGNMENT(dcbz, regs);
> +		WARN_ON_ONCE(1);

This is heavy handed ... It will be treated as an oops by various
things uselessly spit out a kernel backtrace. Isn't PPC_WARN_ALIGNMENT
enough ?

Ben.



^ permalink raw reply

* Re: [PATCH] powerpc: warn on emulation of dcbz instruction
From: Benjamin Herrenschmidt @ 2021-09-16  7:16 UTC (permalink / raw)
  To: Christophe Leroy, Paul Mackerras, Michael Ellerman
  Cc: Finn Thain, linuxppc-dev, linux-kernel, Stan Johnson
In-Reply-To: <2c0fd775625c76c4dd09b3e923da4405a003f3bd.camel@kernel.crashing.org>

On Thu, 2021-09-16 at 17:15 +1000, Benjamin Herrenschmidt wrote:
> On Wed, 2021-09-15 at 16:31 +0200, Christophe Leroy wrote:
> > dcbz instruction shouldn't be used on non-cached memory. Using
> > it on non-cached memory can result in alignment exception and
> > implies a heavy handling.
> > 
> > Instead of silentely emulating the instruction and resulting in
> > high
> > performance degradation, warn whenever an alignment exception is
> > taken due to dcbz, so that the user is made aware that dcbz
> > instruction has been used unexpectedly.
> > 
> > Reported-by: Stan Johnson <userm57@yahoo.com>
> > Cc: Finn Thain <fthain@linux-m68k.org>
> > Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
> > ---
> >  arch/powerpc/kernel/align.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/arch/powerpc/kernel/align.c
> > b/arch/powerpc/kernel/align.c
> > index bbb4181621dd..adc3a4a9c6e4 100644
> > --- a/arch/powerpc/kernel/align.c
> > +++ b/arch/powerpc/kernel/align.c
> > @@ -349,6 +349,7 @@ int fix_alignment(struct pt_regs *regs)
> >  		if (op.type != CACHEOP + DCBZ)
> >  			return -EINVAL;
> >  		PPC_WARN_ALIGNMENT(dcbz, regs);
> > +		WARN_ON_ONCE(1);
> 
> This is heavy handed ... It will be treated as an oops by various
> things uselessly spit out a kernel backtrace. Isn't
> PPC_WARN_ALIGNMENT
> enough ?

Ah I saw your other one about fbdev...  Ok what about you do that in a
if (!user_mode(regs)) ?

Indeed the kernel should not do that.

Cheers,
Ben.



^ permalink raw reply

* Re: [PATCH] powerpc: warn on emulation of dcbz instruction
From: Christophe Leroy @ 2021-09-16  7:23 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
  Cc: Finn Thain, linuxppc-dev, linux-kernel, Stan Johnson
In-Reply-To: <eb1a39368401bf46e805ca64256604cc649f771e.camel@kernel.crashing.org>



Le 16/09/2021 à 09:16, Benjamin Herrenschmidt a écrit :
> On Thu, 2021-09-16 at 17:15 +1000, Benjamin Herrenschmidt wrote:
>> On Wed, 2021-09-15 at 16:31 +0200, Christophe Leroy wrote:
>>> dcbz instruction shouldn't be used on non-cached memory. Using
>>> it on non-cached memory can result in alignment exception and
>>> implies a heavy handling.
>>>
>>> Instead of silentely emulating the instruction and resulting in
>>> high
>>> performance degradation, warn whenever an alignment exception is
>>> taken due to dcbz, so that the user is made aware that dcbz
>>> instruction has been used unexpectedly.
>>>
>>> Reported-by: Stan Johnson <userm57@yahoo.com>
>>> Cc: Finn Thain <fthain@linux-m68k.org>
>>> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
>>> ---
>>>   arch/powerpc/kernel/align.c | 1 +
>>>   1 file changed, 1 insertion(+)
>>>
>>> diff --git a/arch/powerpc/kernel/align.c
>>> b/arch/powerpc/kernel/align.c
>>> index bbb4181621dd..adc3a4a9c6e4 100644
>>> --- a/arch/powerpc/kernel/align.c
>>> +++ b/arch/powerpc/kernel/align.c
>>> @@ -349,6 +349,7 @@ int fix_alignment(struct pt_regs *regs)
>>>   		if (op.type != CACHEOP + DCBZ)
>>>   			return -EINVAL;
>>>   		PPC_WARN_ALIGNMENT(dcbz, regs);
>>> +		WARN_ON_ONCE(1);
>>
>> This is heavy handed ... It will be treated as an oops by various
>> things uselessly spit out a kernel backtrace. Isn't
>> PPC_WARN_ALIGNMENT
>> enough ?


PPC_WARN_ALIGNMENT() only warns if explicitely activated, I want to 
catch uses on 'dcbz' on non-cached memory all the time as they are most 
often the result of using memset() instead of memset_io().

> 
> Ah I saw your other one about fbdev...  Ok what about you do that in a
> if (!user_mode(regs)) ?

Yes I can do WARN_ON_ONCE(!user_mode(regs)); instead.

> 
> Indeed the kernel should not do that.


Does userspace accesses non-cached memory directly ?

Christophe

^ permalink raw reply

* Re: [PATCH v3 4/8] powerpc/pseries/svm: Add a powerpc version of cc_platform_has()
From: Christoph Hellwig @ 2021-09-16  7:35 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: Sathyanarayanan Kuppuswamy, linux-efi, Brijesh Singh, kvm,
	dri-devel, platform-driver-x86, Paul Mackerras, linux-s390,
	Andi Kleen, Joerg Roedel, x86, amd-gfx, Christoph Hellwig,
	linux-graphics-maintainer, Tom Lendacky, Tianyu Lan,
	Borislav Petkov, kexec, linux-kernel, iommu, linux-fsdevel,
	linuxppc-dev
In-Reply-To: <f8388f18-5e90-5d0f-d681-0b17f8307dd4@csgroup.eu>

On Wed, Sep 15, 2021 at 07:18:34PM +0200, Christophe Leroy wrote:
> Could you please provide more explicit explanation why inlining such an
> helper is considered as bad practice and messy ?

Because now we get architectures to all subly differ.  Look at the mess
for ioremap and the ioremap* variant.

The only good reason to allow for inlines if if they are used in a hot
path.  Which cc_platform_has is not, especially not on powerpc.

^ permalink raw reply

* Re: [PATCH] serial: 8250: SERIAL_8250_FSL should not default to y when compile-testing
From: Johan Hovold @ 2021-09-16  8:46 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: linux-kernel, Greg Kroah-Hartman, linuxppc-dev, Li Yang,
	Scott Wood, linux-serial, Shawn Guo, Jiri Slaby, linux-arm-kernel
In-Reply-To: <c5f8aa5c081755f3c960b86fc61c2baaa33edcd9.1631710216.git.geert+renesas@glider.be>

On Wed, Sep 15, 2021 at 02:56:52PM +0200, Geert Uytterhoeven wrote:
> Commit b1442c55ce8977aa ("serial: 8250: extend compile-test coverage")
> added compile-test support to the Freescale 16550 driver.  However, as
> SERIAL_8250_FSL is an invisible symbol, merely enabling COMPILE_TEST now
> enables this driver.
> 
> Fix this by making SERIAL_8250_FSL visible.  Tighten the dependencies to
> prevent asking the user about this driver when configuring a kernel
> without appropriate Freescale SoC or ACPI support.

This tightening is arguable a separate change which risk introducing
regressions if you get it wrong and should go in a separate patch at
least.

> Fixes: b1442c55ce8977aa ("serial: 8250: extend compile-test coverage")
> Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
> ---
> Yes, it's ugly, but I see no better solution. Do you?
> 
>  drivers/tty/serial/8250/Kconfig | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/tty/serial/8250/Kconfig b/drivers/tty/serial/8250/Kconfig
> index 808268edd2e82a45..a2978b31144e94f2 100644
> --- a/drivers/tty/serial/8250/Kconfig
> +++ b/drivers/tty/serial/8250/Kconfig
> @@ -361,9 +361,13 @@ config SERIAL_8250_BCM2835AUX
>  	  If unsure, say N.
>  
>  config SERIAL_8250_FSL
> -	bool
> +	bool "Freescale 16550-style UART support (8250 based driver)"
>  	depends on SERIAL_8250_CONSOLE
> -	default PPC || ARM || ARM64 || COMPILE_TEST
> +	depends on FSL_SOC || ARCH_LAYERSCAPE || SOC_LS1021A || (ARM64 && ACPI) || COMPILE_TEST
> +	default FSL_SOC || ARCH_LAYERSCAPE || SOC_LS1021A || (ARM64 && ACPI)

I'd suggest just doing

	bool "Freescale 16550-style UART support (8250 based driver)"
	depends on SERIAL_8250_CONSOLE
	default PPC || ARM || ARM64

Since neither of the symbols you add to that "depends on" line is an
actual build or runtime dependency.

Then you can refine the "default" line in a follow up (or argue why you
think there should be a "depends on FSL_SOC || ...").

> +	help
> +	  Selecting this option will add support for the 16550-style serial
> +	  port hardware found on Freescale SoCs.
>  
>  config SERIAL_8250_DW
>  	tristate "Support for Synopsys DesignWare 8250 quirks"

Johan

^ permalink raw reply

* Re: [PATCH] serial: 8250: SERIAL_8250_FSL should not default to y when compile-testing
From: Geert Uytterhoeven @ 2021-09-16  8:55 UTC (permalink / raw)
  To: Johan Hovold
  Cc: Linux Kernel Mailing List, Greg Kroah-Hartman, linuxppc-dev,
	Li Yang, Scott Wood, open list:SERIAL DRIVERS, Shawn Guo,
	Jiri Slaby, Linux ARM
In-Reply-To: <YUMESxr907YHM3ZT@hovoldconsulting.com>

Hi Johan,

On Thu, Sep 16, 2021 at 10:46 AM Johan Hovold <johan@kernel.org> wrote:
> On Wed, Sep 15, 2021 at 02:56:52PM +0200, Geert Uytterhoeven wrote:
> > Commit b1442c55ce8977aa ("serial: 8250: extend compile-test coverage")
> > added compile-test support to the Freescale 16550 driver.  However, as
> > SERIAL_8250_FSL is an invisible symbol, merely enabling COMPILE_TEST now
> > enables this driver.
> >
> > Fix this by making SERIAL_8250_FSL visible.  Tighten the dependencies to
> > prevent asking the user about this driver when configuring a kernel
> > without appropriate Freescale SoC or ACPI support.
>
> This tightening is arguable a separate change which risk introducing
> regressions if you get it wrong and should go in a separate patch at
> least.

Getting it wrong would indeed be a regression, but not tightening
that at the same time would mean I have to send a separate patch with
a Fixes tag referring to this fix, following this template:

    foo should depend on bar

    The foo hardware is only present on bar SoCs.  Hence add a
    dependency on bar, to prevent asking the user about this driver
    when configuring a kernel without bar support.

> > Fixes: b1442c55ce8977aa ("serial: 8250: extend compile-test coverage")
> > Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
> > ---
> > Yes, it's ugly, but I see no better solution. Do you?
> >
> >  drivers/tty/serial/8250/Kconfig | 8 ++++++--
> >  1 file changed, 6 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/tty/serial/8250/Kconfig b/drivers/tty/serial/8250/Kconfig
> > index 808268edd2e82a45..a2978b31144e94f2 100644
> > --- a/drivers/tty/serial/8250/Kconfig
> > +++ b/drivers/tty/serial/8250/Kconfig
> > @@ -361,9 +361,13 @@ config SERIAL_8250_BCM2835AUX
> >         If unsure, say N.
> >
> >  config SERIAL_8250_FSL
> > -     bool
> > +     bool "Freescale 16550-style UART support (8250 based driver)"
> >       depends on SERIAL_8250_CONSOLE
> > -     default PPC || ARM || ARM64 || COMPILE_TEST
> > +     depends on FSL_SOC || ARCH_LAYERSCAPE || SOC_LS1021A || (ARM64 && ACPI) || COMPILE_TEST
> > +     default FSL_SOC || ARCH_LAYERSCAPE || SOC_LS1021A || (ARM64 && ACPI)
>
> I'd suggest just doing
>
>         bool "Freescale 16550-style UART support (8250 based driver)"
>         depends on SERIAL_8250_CONSOLE
>         default PPC || ARM || ARM64
>
> Since neither of the symbols you add to that "depends on" line is an
> actual build or runtime dependency.

They are.

> Then you can refine the "default" line in a follow up (or argue why you
> think there should be a "depends on FSL_SOC || ...").
>
> > +     help
> > +       Selecting this option will add support for the 16550-style serial
> > +       port hardware found on Freescale SoCs.
> >
> >  config SERIAL_8250_DW
> >       tristate "Support for Synopsys DesignWare 8250 quirks"

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply

* [PATCH v3 00/30]Change wildcards on ABI files
From: Mauro Carvalho Chehab @ 2021-09-16  8:59 UTC (permalink / raw)
  To: Linux Doc Mailing List, Greg Kroah-Hartman
  Cc: Heikki Krogerus, Kees Cook, Jonathan Corbet,
	Mauro Carvalho Chehab, netdev, Richard Cochran, Anton Vorontsov,
	linux-kernel, Johan Hovold, Tony Luck, Colin Cross, linux-usb,
	linuxppc-dev, Peter Rosin

The ABI files are meant to be parsed via a script (scripts/get_abi.pl).

A new improvement on it will allow it to help to detect if an ABI description
is missing, or if the What: field won't match the actual location of the symbol.

In order for get_abi.pl to convert What: into regex, changes are needed on
existing ABI files, as the conversion should not be ambiguous.

One alternative would be to convert everything into regexes, but this
would generate a huge amount of patches/changes. So, instead, let's
touch only the ABI files that aren't following the de-facto wildcard 
standards already found on most of the ABI files, e. g.:

	/.../
	*
	<foo>
	(option1|option2)
	X
	Y
	Z
	[0-9] (and variants)

---

v3:
   - Added a new patch for sysfs-class-rapidio;
   - sysfs-class-typec had a typo, instead of a wildcard;
   - sysfs-bus-soundwire-* had some additional What to be fixed;
   - added some reviewed-by/acked-by tags.

v2:
   - Added several patches to address uppercase "N" meaning
     as a wildcard.

Mauro Carvalho Chehab (30):
  ABI: sysfs-bus-usb: better document variable argument
  ABI: sysfs-tty: better document module name parameter
  ABI: sysfs-kernel-slab: use a wildcard for the cache name
  ABI: security: fix location for evm and ima_policy
  ABI: sysfs-class-tpm: use wildcards for pcr-* nodes
  ABI: sysfs-bus-rapidio: use wildcards on What definitions
  ABI: sysfs-class-cxl: place "not in a guest" at description
  ABI: sysfs-class-devfreq-event: use the right wildcards on What
  ABI: sysfs-class-mic: use the right wildcards on What definitions
  ABI: pstore: Fix What field
  ABI:  fix a typo on a What field
  ABI: sysfs-ata: use a proper wildcard for ata_*
  ABI: sysfs-class-infiniband: use wildcards on What definitions
  ABI: sysfs-bus-pci: use wildcards on What definitions
  ABI: -master: use wildcards on What definitions
  ABI: sysfs-bus-soundwire-slave: use wildcards on What definitions
  ABI: sysfs-class-gnss: use wildcards on What definitions
  ABI: sysfs-class-mei: use wildcards on What definitions
  ABI: sysfs-class-mux: use wildcards on What definitions
  ABI: sysfs-class-pwm: use wildcards on What definitions
  ABI: sysfs-class-rc: use wildcards on What definitions
  ABI: sysfs-class-rc-nuvoton: use wildcards on What definitions
  ABI: sysfs-class-uwb_rc: use wildcards on What definitions
  ABI: sysfs-class-uwb_rc-wusbhc: use wildcards on What definitions
  ABI: sysfs-devices-platform-dock: use wildcards on What definitions
  ABI: sysfs-devices-system-cpu: use wildcards on What definitions
  ABI: sysfs-firmware-efi-esrt: use wildcards on What definitions
  ABI: sysfs-platform-sst-atom: use wildcards on What definitions
  ABI: sysfs-ptp: use wildcards on What definitions
  ABI: sysfs-class-rapidio: use wildcards on What definitions

 .../ABI/stable/sysfs-class-infiniband         | 64 ++++++-------
 Documentation/ABI/stable/sysfs-class-tpm      |  2 +-
 Documentation/ABI/testing/evm                 |  4 +-
 Documentation/ABI/testing/ima_policy          |  2 +-
 Documentation/ABI/testing/pstore              |  3 +-
 Documentation/ABI/testing/sysfs-ata           |  2 +-
 Documentation/ABI/testing/sysfs-bus-pci       |  2 +-
 Documentation/ABI/testing/sysfs-bus-rapidio   | 32 +++----
 .../ABI/testing/sysfs-bus-soundwire-master    | 20 ++--
 .../ABI/testing/sysfs-bus-soundwire-slave     | 60 ++++++------
 Documentation/ABI/testing/sysfs-bus-usb       | 16 ++--
 Documentation/ABI/testing/sysfs-class-cxl     | 15 ++-
 .../ABI/testing/sysfs-class-devfreq-event     | 12 +--
 Documentation/ABI/testing/sysfs-class-gnss    |  2 +-
 Documentation/ABI/testing/sysfs-class-mei     | 18 ++--
 Documentation/ABI/testing/sysfs-class-mic     | 24 ++---
 Documentation/ABI/testing/sysfs-class-mux     |  2 +-
 Documentation/ABI/testing/sysfs-class-pwm     | 20 ++--
 Documentation/ABI/testing/sysfs-class-rapidio |  4 +-
 Documentation/ABI/testing/sysfs-class-rc      | 14 +--
 .../ABI/testing/sysfs-class-rc-nuvoton        |  2 +-
 Documentation/ABI/testing/sysfs-class-typec   |  2 +-
 Documentation/ABI/testing/sysfs-class-uwb_rc  | 26 ++---
 .../ABI/testing/sysfs-class-uwb_rc-wusbhc     | 10 +-
 .../ABI/testing/sysfs-devices-platform-dock   | 10 +-
 .../ABI/testing/sysfs-devices-system-cpu      | 16 ++--
 .../ABI/testing/sysfs-firmware-efi-esrt       | 16 ++--
 Documentation/ABI/testing/sysfs-kernel-slab   | 94 +++++++++----------
 .../ABI/testing/sysfs-platform-sst-atom       |  2 +-
 Documentation/ABI/testing/sysfs-ptp           | 30 +++---
 Documentation/ABI/testing/sysfs-tty           | 32 +++----
 31 files changed, 282 insertions(+), 276 deletions(-)

-- 
2.31.1



^ permalink raw reply

* [PATCH v3 07/30] ABI: sysfs-class-cxl: place "not in a guest" at description
From: Mauro Carvalho Chehab @ 2021-09-16  8:59 UTC (permalink / raw)
  To: Linux Doc Mailing List, Greg Kroah-Hartman
  Cc: Andrew Donnellan, Jonathan Corbet, Mauro Carvalho Chehab,
	linux-kernel, Frederic Barrat, linuxppc-dev
In-Reply-To: <cover.1631782432.git.mchehab+huawei@kernel.org>

The What: field should have just the location of the ABI.
Anything else should be inside the description.

This fixes its parsing by get_abi.pl script.

Acked-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 Documentation/ABI/testing/sysfs-class-cxl | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-class-cxl b/Documentation/ABI/testing/sysfs-class-cxl
index 818f55970efb..3c77677e0ca7 100644
--- a/Documentation/ABI/testing/sysfs-class-cxl
+++ b/Documentation/ABI/testing/sysfs-class-cxl
@@ -166,10 +166,11 @@ Description:    read only
                 Decimal value of the Per Process MMIO space length.
 Users:		https://github.com/ibm-capi/libcxl
 
-What:           /sys/class/cxl/<afu>m/pp_mmio_off (not in a guest)
+What:           /sys/class/cxl/<afu>m/pp_mmio_off
 Date:           September 2014
 Contact:        linuxppc-dev@lists.ozlabs.org
 Description:    read only
+                (not in a guest)
                 Decimal value of the Per Process MMIO space offset.
 Users:		https://github.com/ibm-capi/libcxl
 
@@ -190,28 +191,31 @@ Description:    read only
                 Identifies the revision level of the PSL.
 Users:		https://github.com/ibm-capi/libcxl
 
-What:           /sys/class/cxl/<card>/base_image (not in a guest)
+What:           /sys/class/cxl/<card>/base_image
 Date:           September 2014
 Contact:        linuxppc-dev@lists.ozlabs.org
 Description:    read only
+                (not in a guest)
                 Identifies the revision level of the base image for devices
                 that support loadable PSLs. For FPGAs this field identifies
                 the image contained in the on-adapter flash which is loaded
                 during the initial program load.
 Users:		https://github.com/ibm-capi/libcxl
 
-What:           /sys/class/cxl/<card>/image_loaded (not in a guest)
+What:           /sys/class/cxl/<card>/image_loaded
 Date:           September 2014
 Contact:        linuxppc-dev@lists.ozlabs.org
 Description:    read only
+                (not in a guest)
                 Will return "user" or "factory" depending on the image loaded
                 onto the card.
 Users:		https://github.com/ibm-capi/libcxl
 
-What:           /sys/class/cxl/<card>/load_image_on_perst (not in a guest)
+What:           /sys/class/cxl/<card>/load_image_on_perst
 Date:           December 2014
 Contact:        linuxppc-dev@lists.ozlabs.org
 Description:    read/write
+                (not in a guest)
                 Valid entries are "none", "user", and "factory".
                 "none" means PERST will not cause image to be loaded to the
                 card.  A power cycle is required to load the image.
@@ -235,10 +239,11 @@ Description:    write only
                 contexts on the card AFUs.
 Users:		https://github.com/ibm-capi/libcxl
 
-What:		/sys/class/cxl/<card>/perst_reloads_same_image (not in a guest)
+What:		/sys/class/cxl/<card>/perst_reloads_same_image
 Date:		July 2015
 Contact:	linuxppc-dev@lists.ozlabs.org
 Description:	read/write
+                (not in a guest)
 		Trust that when an image is reloaded via PERST, it will not
 		have changed.
 
-- 
2.31.1


^ permalink raw reply related

* Re: [PATCH] serial: 8250: SERIAL_8250_FSL should not default to y when compile-testing
From: Johan Hovold @ 2021-09-16  9:08 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Linux Kernel Mailing List, Greg Kroah-Hartman, linuxppc-dev,
	Li Yang, Scott Wood, open list:SERIAL DRIVERS, Shawn Guo,
	Jiri Slaby, Linux ARM
In-Reply-To: <CAMuHMdX7_AOuGEjvTHpQ-4KHMH+m800KTu7wads6UTfMZiu9BQ@mail.gmail.com>

On Thu, Sep 16, 2021 at 10:55:49AM +0200, Geert Uytterhoeven wrote:
> Hi Johan,
> 
> On Thu, Sep 16, 2021 at 10:46 AM Johan Hovold <johan@kernel.org> wrote:
> > On Wed, Sep 15, 2021 at 02:56:52PM +0200, Geert Uytterhoeven wrote:
> > > Commit b1442c55ce8977aa ("serial: 8250: extend compile-test coverage")
> > > added compile-test support to the Freescale 16550 driver.  However, as
> > > SERIAL_8250_FSL is an invisible symbol, merely enabling COMPILE_TEST now
> > > enables this driver.
> > >
> > > Fix this by making SERIAL_8250_FSL visible.  Tighten the dependencies to
> > > prevent asking the user about this driver when configuring a kernel
> > > without appropriate Freescale SoC or ACPI support.
> >
> > This tightening is arguable a separate change which risk introducing
> > regressions if you get it wrong and should go in a separate patch at
> > least.
> 
> Getting it wrong would indeed be a regression, but not tightening
> that at the same time would mean I have to send a separate patch with
> a Fixes tag referring to this fix, following this template:
> 
>     foo should depend on bar
> 
>     The foo hardware is only present on bar SoCs.  Hence add a
>     dependency on bar, to prevent asking the user about this driver
>     when configuring a kernel without bar support.

I know this is a pet peeve of yours, but asking users about one more
symbol when configuring their kernels is hardly something that requires
a Fixes tag.

Either way it's a pretty weak argument for not separating the change.

Johan

^ permalink raw reply

* Re: [PATCH] swiotlb: set IO TLB segment size via cmdline
From: Roman Skakun @ 2021-09-16  9:40 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: linux-doc, Peter Zijlstra, Viresh Kumar,
	Linux Kernel Mailing List, Paul Mackerras, Jan Beulich,
	Will Deacon, Boris Ostrovsky, Marek Szyprowski, Jonathan Corbet,
	Christoph Hellwig, xen-devel, Paul E. McKenney,
	Konrad Rzeszutek Wilk, Muchun Song, Thomas Gleixner,
	Juergen Gross, Thomas Bogendoerfer, Andrii Anisov, linuxppc-dev,
	Randy Dunlap, linux-mips, iommu, Roman Skakun, Andrew Morton,
	Maciej W. Rozycki, Robin Murphy, Mike Rapoport, Lu Baolu
In-Reply-To: <alpine.DEB.2.21.2109141838290.21985@sstabellini-ThinkPad-T480s>

Hi Stefano,

> Also, Option 1 listed in the webpage seems to be a lot better. Any
> reason you can't do that? Because that option both solves the problem
> and increases performance.

Yes, Option 1 is probably more efficient.
But I use another platform under Xen without DMA adjustment functionality.
I pinned this webpage only for example to describe the similar problem I had.

Cheers,
Roman

ср, 15 сент. 2021 г. в 04:51, Stefano Stabellini <sstabellini@kernel.org>:

>
> On Tue, 14 Sep 2021, Christoph Hellwig wrote:
> > On Tue, Sep 14, 2021 at 05:29:07PM +0200, Jan Beulich wrote:
> > > I'm not convinced the swiotlb use describe there falls under "intended
> > > use" - mapping a 1280x720 framebuffer in a single chunk? (As an aside,
> > > the bottom of this page is also confusing, as following "Then we can
> > > confirm the modified swiotlb size in the boot log:" there is a log
> > > fragment showing the same original size of 64Mb.
> >
> > It doesn't.  We also do not add hacks to the kernel for whacky out
> > of tree modules.
>
> Also, Option 1 listed in the webpage seems to be a lot better. Any
> reason you can't do that? Because that option both solves the problem
> and increases performance.



--
Best Regards, Roman.

^ permalink raw reply

* Re: [PATCH 1/3] perf: Add macros to specify onchip L2/L3 accesses
From: Michael Ellerman @ 2021-09-16 10:57 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mark.rutland, atrajeev, ak, daniel, rnsastry, alexander.shishkin,
	Kajol Jain, linux-kernel, acme, ast, linux-perf-users, yao.jin,
	mingo, paulus, maddy, jolsa, namhyung, songliubraving,
	linuxppc-dev, kan.liang
In-Reply-To: <YUCMUZbchMjD54eY@hirez.programming.kicks-ass.net>

Peter Zijlstra <peterz@infradead.org> writes:
> On Tue, Sep 14, 2021 at 08:40:38PM +1000, Michael Ellerman wrote:
>> Peter Zijlstra <peterz@infradead.org> writes:
>
>> > I'm thinking we ought to keep hops as steps along the NUMA fabric, with
>> > 0 hops being the local node. That only gets us:
>> >
>> >  L2, remote=0, hops=HOPS_0 -- our L2
>> >  L2, remote=1, hops=HOPS_0 -- L2 on the local node but not ours
>> >  L2, remote=1, hops!=HOPS_0 -- L2 on a remote node
>> 
>> Hmm. I'm not sure about tying it directly to NUMA hops. I worry we're
>> going to see more and more systems where there's a hierarchy within the
>> chip/package, in addition to the traditional NUMA hierarchy.
>> 
>> Although then I guess it becomes a question of what exactly is a NUMA
>> hop, maybe the answer is that on those future systems those
>> intra-chip/package hops should be represented as NUMA hops.
>> 
>> It's not like we have a hard definition of what a NUMA hop is?
>
> Not really, typically whatever the BIOS/DT/whatever tables tell us. I
> think in case of Power you're mostly making things up in software :-)

Firmware is software so yes :)

> But yeah, I think we have plenty wriggle room there.

OK.

cheers

^ permalink raw reply

* Re: [PATCH v3 4/8] powerpc/pseries/svm: Add a powerpc version of cc_platform_has()
From: Michael Ellerman @ 2021-09-16 11:51 UTC (permalink / raw)
  To: Christoph Hellwig, Christophe Leroy
  Cc: Sathyanarayanan Kuppuswamy, linux-efi, Brijesh Singh, kvm,
	dri-devel, platform-driver-x86, Paul Mackerras, linux-s390,
	Andi Kleen, Joerg Roedel, x86, amd-gfx, Christoph Hellwig,
	linux-graphics-maintainer, Tom Lendacky, Tianyu Lan,
	Borislav Petkov, kexec, linux-kernel, iommu, linux-fsdevel,
	linuxppc-dev
In-Reply-To: <YULztMRLJ55YLVU9@infradead.org>

Christoph Hellwig <hch@infradead.org> writes:
> On Wed, Sep 15, 2021 at 07:18:34PM +0200, Christophe Leroy wrote:
>> Could you please provide more explicit explanation why inlining such an
>> helper is considered as bad practice and messy ?
>
> Because now we get architectures to all subly differ.  Look at the mess
> for ioremap and the ioremap* variant.
>
> The only good reason to allow for inlines if if they are used in a hot
> path.  Which cc_platform_has is not, especially not on powerpc.

Yes I agree, it's not a hot path so it doesn't really matter, which is
why I Acked it.

I think it is possible to do both, share the declaration across arches
but also give arches flexibility to use an inline if they prefer, see
patch below.

I'm not suggesting we actually do that for this series now, but I think
it would solve the problem if we ever needed to in future.

cheers


diff --git a/arch/powerpc/platforms/pseries/cc_platform.c b/arch/powerpc/include/asm/cc_platform.h
similarity index 74%
rename from arch/powerpc/platforms/pseries/cc_platform.c
rename to arch/powerpc/include/asm/cc_platform.h
index e8021af83a19..6285c3c385a6 100644
--- a/arch/powerpc/platforms/pseries/cc_platform.c
+++ b/arch/powerpc/include/asm/cc_platform.h
@@ -7,13 +7,10 @@
  * Author: Tom Lendacky <thomas.lendacky@amd.com>
  */
 
-#include <linux/export.h>
 #include <linux/cc_platform.h>
-
-#include <asm/machdep.h>
 #include <asm/svm.h>
 
-bool cc_platform_has(enum cc_attr attr)
+static inline bool arch_cc_platform_has(enum cc_attr attr)
 {
 	switch (attr) {
 	case CC_ATTR_MEM_ENCRYPT:
@@ -23,4 +20,3 @@ bool cc_platform_has(enum cc_attr attr)
 		return false;
 	}
 }
-EXPORT_SYMBOL_GPL(cc_platform_has);
diff --git a/arch/x86/include/asm/cc_platform.h b/arch/x86/include/asm/cc_platform.h
new file mode 100644
index 000000000000..0a4220697043
--- /dev/null
+++ b/arch/x86/include/asm/cc_platform.h
@@ -0,0 +1,7 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_CC_PLATFORM_H_
+#define _ASM_X86_CC_PLATFORM_H_
+
+bool arch_cc_platform_has(enum cc_attr attr);
+
+#endif // _ASM_X86_CC_PLATFORM_H_
diff --git a/arch/x86/kernel/cc_platform.c b/arch/x86/kernel/cc_platform.c
index 3c9bacd3c3f3..77e8c3465979 100644
--- a/arch/x86/kernel/cc_platform.c
+++ b/arch/x86/kernel/cc_platform.c
@@ -11,11 +11,11 @@
 #include <linux/cc_platform.h>
 #include <linux/mem_encrypt.h>
 
-bool cc_platform_has(enum cc_attr attr)
+bool arch_cc_platform_has(enum cc_attr attr)
 {
 	if (sme_me_mask)
 		return amd_cc_platform_has(attr);
 
 	return false;
 }
-EXPORT_SYMBOL_GPL(cc_platform_has);
+EXPORT_SYMBOL_GPL(arch_cc_platform_has);
diff --git a/include/linux/cc_platform.h b/include/linux/cc_platform.h
index 253f3ea66cd8..f3306647c5d9 100644
--- a/include/linux/cc_platform.h
+++ b/include/linux/cc_platform.h
@@ -65,6 +65,8 @@ enum cc_attr {
 
 #ifdef CONFIG_ARCH_HAS_CC_PLATFORM
 
+#include <asm/cc_platform.h>
+
 /**
  * cc_platform_has() - Checks if the specified cc_attr attribute is active
  * @attr: Confidential computing attribute to check
@@ -77,7 +79,10 @@ enum cc_attr {
  * * TRUE  - Specified Confidential Computing attribute is active
  * * FALSE - Specified Confidential Computing attribute is not active
  */
-bool cc_platform_has(enum cc_attr attr);
+static inline bool cc_platform_has(enum cc_attr attr)
+{
+	return arch_cc_platform_has(attr);
+}
 
 #else	/* !CONFIG_ARCH_HAS_CC_PLATFORM */
 



^ permalink raw reply related

* Re: [PATCH] nvmem: NVMEM_NINTENDO_OTP should depend on WII
From: Emmanuel Gil Peyrot @ 2021-09-16 12:12 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Emmanuel Gil Peyrot, Greg Kroah-Hartman, linux-kernel,
	Srinivas Kandagatla, Paul Mackerras, linuxppc-dev
In-Reply-To: <01318920709dddc4d85fe895e2083ca0eee234d8.1631611652.git.geert+renesas@glider.be>

[-- Attachment #1: Type: text/plain, Size: 1220 bytes --]

Hi, thanks for this patch, once the Wii U platform will be added it will
need an additional test for WIIU, but for now this is:

Reviewed-by: Emmanuel Gil Peyrot <linkmauve@linkmauve.fr>

On Tue, Sep 14, 2021 at 11:29:49AM +0200, Geert Uytterhoeven wrote:
> The Nintendo Wii and Wii U OTP is only present on Nintendo Wii and Wii U
> consoles.  Hence add a dependency on WII, to prevent asking the user
> about this driver when configuring a kernel without Nintendo Wii and Wii
> U console support.
> 
> Fixes: 3683b761fe3a10ad ("nvmem: nintendo-otp: Add new driver for the Wii and Wii U OTP")
> Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
> ---
>  drivers/nvmem/Kconfig | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/nvmem/Kconfig b/drivers/nvmem/Kconfig
> index 39854d43758be3fb..da414617a54d4b99 100644
> --- a/drivers/nvmem/Kconfig
> +++ b/drivers/nvmem/Kconfig
> @@ -109,6 +109,7 @@ config MTK_EFUSE
>  
>  config NVMEM_NINTENDO_OTP
>  	tristate "Nintendo Wii and Wii U OTP Support"
> +	depends on WII || COMPILE_TEST
>  	help
>  	  This is a driver exposing the OTP of a Nintendo Wii or Wii U console.
>  
> -- 
> 2.25.1

-- 
Emmanuel Gil Peyrot

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* [PATCH] lib/zlib_inflate/inffast: Check config in C to avoid unused function warning
From: Paul Menzel @ 2021-09-16 14:22 UTC (permalink / raw)
  To: Nathan Chancellor, Nick Desaulniers
  Cc: Paul Menzel, llvm, Zhen Lei, linux-kernel, Paul Mackerras,
	Andrew Morton, linuxppc-dev

Building Linux for ppc64le with Ubuntu clang version 12.0.0-3ubuntu1~21.04.1
shows the warning below.

    arch/powerpc/boot/inffast.c:20:1: warning: unused function 'get_unaligned16' [-Wunused-function]
    get_unaligned16(const unsigned short *p)
    ^
    1 warning generated.

Fix it, by moving the check from the preprocessor to C, so the compiler
sees the use.

Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de>
---
 lib/zlib_inflate/inffast.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/lib/zlib_inflate/inffast.c b/lib/zlib_inflate/inffast.c
index f19c4fbe1be7..444ad3c3ccd3 100644
--- a/lib/zlib_inflate/inffast.c
+++ b/lib/zlib_inflate/inffast.c
@@ -254,11 +254,7 @@ void inflate_fast(z_streamp strm, unsigned start)
 			sfrom = (unsigned short *)(from);
 			loops = len >> 1;
 			do
-#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
-			    *sout++ = *sfrom++;
-#else
-			    *sout++ = get_unaligned16(sfrom++);
-#endif
+			    *sout++ = CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS ? *sfrom++ : get_unaligned16(sfrom++);
 			while (--loops);
 			out = (unsigned char *)sout;
 			from = (unsigned char *)sfrom;
-- 
2.33.0


^ permalink raw reply related

* RE: [PATCH] powerpc: warn on emulation of dcbz instruction
From: David Laight @ 2021-09-16 14:36 UTC (permalink / raw)
  To: 'Christophe Leroy', Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman
  Cc: Finn Thain, linuxppc-dev@lists.ozlabs.org,
	linux-kernel@vger.kernel.org, Stan Johnson
In-Reply-To: <43f736d4-8625-2848-786f-79b902d5c753@csgroup.eu>

From: Christophe Leroy
> Sent: 16 September 2021 08:24
> 
> Le 16/09/2021 à 09:16, Benjamin Herrenschmidt a écrit :
> > On Thu, 2021-09-16 at 17:15 +1000, Benjamin Herrenschmidt wrote:
> >> On Wed, 2021-09-15 at 16:31 +0200, Christophe Leroy wrote:
> >>> dcbz instruction shouldn't be used on non-cached memory. Using
> >>> it on non-cached memory can result in alignment exception and
> >>> implies a heavy handling.
> >>>
> >>> Instead of silentely emulating the instruction and resulting in
> >>> high
> >>> performance degradation, warn whenever an alignment exception is
> >>> taken due to dcbz, so that the user is made aware that dcbz
> >>> instruction has been used unexpectedly.
> >>>
> >>> Reported-by: Stan Johnson <userm57@yahoo.com>
> >>> Cc: Finn Thain <fthain@linux-m68k.org>
> >>> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
> >>> ---
> >>>   arch/powerpc/kernel/align.c | 1 +
> >>>   1 file changed, 1 insertion(+)
> >>>
> >>> diff --git a/arch/powerpc/kernel/align.c
> >>> b/arch/powerpc/kernel/align.c
> >>> index bbb4181621dd..adc3a4a9c6e4 100644
> >>> --- a/arch/powerpc/kernel/align.c
> >>> +++ b/arch/powerpc/kernel/align.c
> >>> @@ -349,6 +349,7 @@ int fix_alignment(struct pt_regs *regs)
> >>>   		if (op.type != CACHEOP + DCBZ)
> >>>   			return -EINVAL;
> >>>   		PPC_WARN_ALIGNMENT(dcbz, regs);
> >>> +		WARN_ON_ONCE(1);
> >>
> >> This is heavy handed ... It will be treated as an oops by various
> >> things uselessly spit out a kernel backtrace. Isn't
> >> PPC_WARN_ALIGNMENT
> >> enough ?
> 
> 
> PPC_WARN_ALIGNMENT() only warns if explicitely activated, I want to
> catch uses on 'dcbz' on non-cached memory all the time as they are most
> often the result of using memset() instead of memset_io().
> 
> >
> > Ah I saw your other one about fbdev...  Ok what about you do that in a
> > if (!user_mode(regs)) ?
> 
> Yes I can do WARN_ON_ONCE(!user_mode(regs)); instead.
> 
> > Indeed the kernel should not do that.
> 
> Does userspace accesses non-cached memory directly ?

It probably can if a driver mmaps PCI space directly into user space.
That certainly works on x86-64.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply

* Re: [RFC PATCH 1/8] arm64: add CPU field to struct thread_info
From: Catalin Marinas @ 2021-09-16 14:41 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Peter Zijlstra, Paul Mackerras, linux-riscv, Will Deacon,
	linux-s390, Arnd Bergmann, Russell King, Christian Borntraeger,
	Ingo Molnar, Albert Ou, Kees Cook, Vasily Gorbik, Heiko Carstens,
	Keith Packard, Borislav Petkov, Andy Lutomirski, Paul Walmsley,
	Thomas Gleixner, linux-arm-kernel, linuxppc-dev, linux-kernel,
	Palmer Dabbelt, Linus Torvalds
In-Reply-To: <20210914121036.3975026-2-ardb@kernel.org>

On Tue, Sep 14, 2021 at 02:10:29PM +0200, Ard Biesheuvel wrote:
> The CPU field will be moved back into thread_info even when
> THREAD_INFO_IN_TASK is enabled, so add it back to arm64's definition of
> struct thread_info.
> 
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>

Acked-by: Catalin Marinas <catalin.marinas@arm.com>

^ permalink raw reply

* [PATCH v2] powerpc: warn on emulation of dcbz instruction in kernel mode
From: Christophe Leroy @ 2021-09-16 14:52 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
  Cc: Stan Johnson, Finn Thain, linuxppc-dev, linux-kernel

dcbz instruction shouldn't be used on non-cached memory. Using
it on non-cached memory can result in alignment exception and
implies a heavy handling.

Instead of silentely emulating the instruction and resulting in high
performance degradation, warn whenever an alignment exception is
taken in kernel mode due to dcbz, so that the user is made aware that
dcbz instruction has been used unexpectedly by the kernel.

Reported-by: Stan Johnson <userm57@yahoo.com>
Cc: Finn Thain <fthain@linux-m68k.org>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
v2: Warn only when emulating kernel mode
---
 arch/powerpc/kernel/align.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kernel/align.c b/arch/powerpc/kernel/align.c
index bbb4181621dd..bf96b954a4eb 100644
--- a/arch/powerpc/kernel/align.c
+++ b/arch/powerpc/kernel/align.c
@@ -349,6 +349,7 @@ int fix_alignment(struct pt_regs *regs)
 		if (op.type != CACHEOP + DCBZ)
 			return -EINVAL;
 		PPC_WARN_ALIGNMENT(dcbz, regs);
+		WARN_ON_ONCE(!user_mode(regs));
 		r = emulate_dcbz(op.ea, regs);
 	} else {
 		if (type == LARX || type == STCX)
-- 
2.31.1


^ permalink raw reply related

* Re: [PATCH] lib/zlib_inflate/inffast: Check config in C to avoid unused function warning
From: Nathan Chancellor @ 2021-09-16 14:58 UTC (permalink / raw)
  To: Paul Menzel, Nick Desaulniers
  Cc: llvm, Zhen Lei, linux-kernel, Paul Mackerras, Andrew Morton,
	linuxppc-dev
In-Reply-To: <20210916142210.26722-1-pmenzel@molgen.mpg.de>

Hi Paul,

On 9/16/2021 7:22 AM, Paul Menzel wrote:
> Building Linux for ppc64le with Ubuntu clang version 12.0.0-3ubuntu1~21.04.1
> shows the warning below.
> 
>      arch/powerpc/boot/inffast.c:20:1: warning: unused function 'get_unaligned16' [-Wunused-function]
>      get_unaligned16(const unsigned short *p)
>      ^
>      1 warning generated.
> 
> Fix it, by moving the check from the preprocessor to C, so the compiler
> sees the use.
> 
> Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de>
> ---
>   lib/zlib_inflate/inffast.c | 6 +-----
>   1 file changed, 1 insertion(+), 5 deletions(-)
> 
> diff --git a/lib/zlib_inflate/inffast.c b/lib/zlib_inflate/inffast.c
> index f19c4fbe1be7..444ad3c3ccd3 100644
> --- a/lib/zlib_inflate/inffast.c
> +++ b/lib/zlib_inflate/inffast.c
> @@ -254,11 +254,7 @@ void inflate_fast(z_streamp strm, unsigned start)
>   			sfrom = (unsigned short *)(from);
>   			loops = len >> 1;
>   			do
> -#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
> -			    *sout++ = *sfrom++;
> -#else
> -			    *sout++ = get_unaligned16(sfrom++);
> -#endif
> +			    *sout++ = CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS ? *sfrom++ : get_unaligned16(sfrom++);
>   			while (--loops);
>   			out = (unsigned char *)sout;
>   			from = (unsigned char *)sfrom;
> 

Thanks for the patch. This should probably be

IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) ? ...

which matches the rest of the kernel tree, as certain CONFIG_... values 
are not guaranteed to always be defined.

Cheers,
Nathan

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox