* Re: [PATCH v9 5/6] powerpc/mm/kasan: rename kasan_init_32.c to init_32.c
From: Christophe Leroy @ 2020-12-01 16:56 UTC
To: Daniel Axtens, linux-kernel, linux-mm, linuxppc-dev, kasan-dev,
christophe.leroy, aneesh.kumar, bsingharora
In-Reply-To: <20201201161632.1234753-6-dja@axtens.net>
On 01/12/2020 at 17:16, Daniel Axtens wrote:
> kasan is already implied by the directory name, we don't need to
> repeat it.
>
> Suggested-by: Christophe Leroy <christophe.leroy@c-s.fr>
My new address is <christophe.leroy@csgroup.eu>
> Signed-off-by: Daniel Axtens <dja@axtens.net>
> ---
> arch/powerpc/mm/kasan/Makefile | 2 +-
> arch/powerpc/mm/kasan/{kasan_init_32.c => init_32.c} | 0
> 2 files changed, 1 insertion(+), 1 deletion(-)
> rename arch/powerpc/mm/kasan/{kasan_init_32.c => init_32.c} (100%)
>
> diff --git a/arch/powerpc/mm/kasan/Makefile b/arch/powerpc/mm/kasan/Makefile
> index bb1a5408b86b..42fb628a44fd 100644
> --- a/arch/powerpc/mm/kasan/Makefile
> +++ b/arch/powerpc/mm/kasan/Makefile
> @@ -2,6 +2,6 @@
>
> KASAN_SANITIZE := n
>
> -obj-$(CONFIG_PPC32) += kasan_init_32.o
> +obj-$(CONFIG_PPC32) += init_32.o
> obj-$(CONFIG_PPC_8xx) += 8xx.o
> obj-$(CONFIG_PPC_BOOK3S_32) += book3s_32.o
> diff --git a/arch/powerpc/mm/kasan/kasan_init_32.c b/arch/powerpc/mm/kasan/init_32.c
> similarity index 100%
> rename from arch/powerpc/mm/kasan/kasan_init_32.c
> rename to arch/powerpc/mm/kasan/init_32.c
>
* Re: [PATCH v9 6/6] powerpc: Book3S 64-bit outline-only KASAN support
From: Christophe Leroy @ 2020-12-01 17:26 UTC
To: Daniel Axtens, linux-kernel, linux-mm, linuxppc-dev, kasan-dev,
christophe.leroy, aneesh.kumar, bsingharora
In-Reply-To: <20201201161632.1234753-7-dja@axtens.net>
On 01/12/2020 at 17:16, Daniel Axtens wrote:
> Implement a limited form of KASAN for Book3S 64-bit machines running under
> the Radix MMU, supporting only outline mode.
>
> - Enable the compiler instrumentation to check addresses and maintain the
> shadow region. (This is the guts of KASAN which we can easily reuse.)
>
> - Require kasan-vmalloc support to handle modules and anything else in
> vmalloc space.
>
> - KASAN needs to be able to validate all pointer accesses, but we can't
> instrument all kernel addresses - only linear map and vmalloc. On boot,
> set up a single page of read-only shadow that marks all iomap and
> vmemmap accesses as valid.
>
> - Make our stack-walking code KASAN-safe by using READ_ONCE_NOCHECK -
> generic code, arm64, s390 and x86 all do this for similar sorts of
> reasons: when unwinding a stack, we might touch memory that KASAN has
> marked as being out-of-bounds. In our case we often get this when
> checking for an exception frame because we're checking an arbitrary
> offset into the stack frame.
>
> See commit 20955746320e ("s390/kasan: avoid false positives during stack
> unwind"), commit bcaf669b4bdb ("arm64: disable kasan when accessing
> frame->fp in unwind_frame"), commit 91e08ab0c851 ("x86/dumpstack:
> Prevent KASAN false positive warnings") and commit 6e22c8366416
> ("tracing, kasan: Silence Kasan warning in check_stack of stack_tracer")
>
> - Document KASAN in both generic and powerpc docs.
>
> Background
> ----------
>
> KASAN support on Book3S is a bit tricky to get right:
>
> - It would be good to support inline instrumentation so as to be able to
> catch stack issues that cannot be caught with outline mode.
>
> - Inline instrumentation requires a fixed offset.
>
> - Book3S runs code with translations off ("real mode") during boot,
> including a lot of generic device-tree parsing code which is used to
> determine MMU features.
>
> [ppc64 mm note: The kernel installs a linear mapping at effective
> address c000...-c008.... This is a one-to-one mapping with physical
> memory from 0000... onward. Because of how memory accesses work on
> powerpc 64-bit Book3S, a kernel pointer in the linear map accesses the
> same memory both with translations on (accessing as an 'effective
> address'), and with translations off (accessing as a 'real
> address'). This works in both guests and the hypervisor. For more
> details, see s5.7 of Book III of version 3 of the ISA, in particular
> the Storage Control Overview, s5.7.3, and s5.7.5 - noting that this
> KASAN implementation currently only supports Radix.]
>
> - Some code - most notably a lot of KVM code - also runs with translations
> off after boot.
>
> - Therefore any offset has to point to memory that is valid with
> translations on or off.
>
> One approach is just to give up on inline instrumentation. This way
> boot-time checks can be delayed until after the MMU is set is up, and we
> can just not instrument any code that runs with translations off after
> booting. Take this approach for now and require outline instrumentation.
>
> Previous attempts allowed inline instrumentation. However, they came with
> some unfortunate restrictions: only physically contiguous memory could be
> used and it had to be specified at compile time. Maybe we can do better in
> the future.
>
> Cc: Balbir Singh <bsingharora@gmail.com> # ppc64 out-of-line radix version
> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> # ppc64 hash version
> Cc: Christophe Leroy <christophe.leroy@c-s.fr> # ppc32 version
> Signed-off-by: Daniel Axtens <dja@axtens.net>
> ---
> Documentation/dev-tools/kasan.rst | 9 +-
> Documentation/powerpc/kasan.txt | 48 +++++++++-
> arch/powerpc/Kconfig | 4 +-
> arch/powerpc/Kconfig.debug | 2 +-
> arch/powerpc/include/asm/book3s/64/hash.h | 4 +
> arch/powerpc/include/asm/book3s/64/pgtable.h | 7 ++
> arch/powerpc/include/asm/book3s/64/radix.h | 13 ++-
> arch/powerpc/include/asm/kasan.h | 34 ++++++-
> arch/powerpc/kernel/Makefile | 5 +
> arch/powerpc/kernel/process.c | 16 ++--
> arch/powerpc/kvm/Makefile | 5 +
> arch/powerpc/mm/book3s64/Makefile | 8 ++
> arch/powerpc/mm/kasan/Makefile | 1 +
> arch/powerpc/mm/kasan/init_book3s_64.c | 98 ++++++++++++++++++++
> arch/powerpc/mm/ptdump/ptdump.c | 20 +++-
> arch/powerpc/platforms/Kconfig.cputype | 1 +
> arch/powerpc/platforms/powernv/Makefile | 6 ++
> arch/powerpc/platforms/pseries/Makefile | 3 +
> 18 files changed, 265 insertions(+), 19 deletions(-)
> create mode 100644 arch/powerpc/mm/kasan/init_book3s_64.c
>
> diff --git a/Documentation/dev-tools/kasan.rst b/Documentation/dev-tools/kasan.rst
> index eaf868094a8e..28f08959bd2e 100644
> --- a/Documentation/dev-tools/kasan.rst
> +++ b/Documentation/dev-tools/kasan.rst
> @@ -19,8 +19,9 @@ out-of-bounds accesses for global variables is only supported since Clang 11.
> Tag-based KASAN is only supported in Clang.
>
> Currently generic KASAN is supported for the x86_64, arm64, xtensa, s390 and
> -riscv architectures. It is also supported on 32-bit powerpc kernels. Tag-based
> -KASAN is supported only on arm64.
> +riscv architectures. It is also supported on powerpc, for 32-bit kernels, and
> +for 64-bit kernels running under the Radix MMU. Tag-based KASAN is supported
> +only on arm64.
>
> Usage
> -----
> @@ -257,8 +258,8 @@ CONFIG_KASAN_VMALLOC
>
> With ``CONFIG_KASAN_VMALLOC``, KASAN can cover vmalloc space at the
> cost of greater memory usage. Currently this supported on x86, s390
> -and 32-bit powerpc. It is optional, except on 32-bit powerpc kernels
> -with module support, where it is required.
> +and powerpc. It is optional, except on 64-bit powerpc kernels, and on
> +32-bit powerpc kernels with module support, where it is required.
>
> This works by hooking into vmalloc and vmap, and dynamically
> allocating real shadow memory to back the mappings.
> diff --git a/Documentation/powerpc/kasan.txt b/Documentation/powerpc/kasan.txt
> index 26bb0e8bb18c..f032b4eaf205 100644
> --- a/Documentation/powerpc/kasan.txt
> +++ b/Documentation/powerpc/kasan.txt
> @@ -1,4 +1,4 @@
> -KASAN is supported on powerpc on 32-bit only.
> +KASAN is supported on powerpc on 32-bit and Radix 64-bit only.
>
> 32 bit support
> ==============
> @@ -10,3 +10,49 @@ fixmap area and occupies one eighth of the total kernel virtual memory space.
>
> Instrumentation of the vmalloc area is optional, unless built with modules,
> in which case it is required.
> +
> +64 bit support
> +==============
> +
> +Currently, only the radix MMU is supported. There have been versions for hash
> +and Book3E processors floating around on the mailing list, but nothing has been
> +merged.
> +
> +KASAN support on Book3S is a bit tricky to get right:
> +
> + - It would be good to support inline instrumentation so as to be able to catch
> + stack issues that cannot be caught with outline mode.
> +
> + - Inline instrumentation requires a fixed offset.
> +
> + - Book3S runs code with translations off ("real mode") during boot, including a
> + lot of generic device-tree parsing code which is used to determine MMU
> + features.
> +
> + - Some code - most notably a lot of KVM code - also runs with translations off
> + after boot.
> +
> + - Therefore any offset has to point to memory that is valid with
> + translations on or off.
> +
> +One approach is just to give up on inline instrumentation. This way boot-time
> +checks can be delayed until after the MMU is set is up, and we can just not
> +instrument any code that runs with translations off after booting. This is the
> +current approach.
> +
> +To avoid this limitiation, the KASAN shadow would have to be placed inside the
> +linear mapping, using the same high-bits trick we use for the rest of the linear
> +mapping. This is tricky:
> +
> + - We'd like to place it near the start of physical memory. In theory we can do
> + this at run-time based on how much physical memory we have, but this requires
> + being able to arbitrarily relocate the kernel, which is basically the tricky
> + part of KASLR. Not being game to implement both tricky things at once, this
> + is hopefully something we can revisit once we get KASLR for Book3S.
> +
> + - Alternatively, we can place the shadow at the _end_ of memory, but this
> + requires knowing how much contiguous physical memory a system has _at compile
> + time_. This is a big hammer, and has some unfortunate consequences: inablity
> + to handle discontiguous physical memory, total failure to boot on machines
> + with less memory than specified, and that machines with more memory than
> + specified can't use it. This was deemed unacceptable.
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index e9f13fe08492..e6bd02af6ebd 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -180,7 +180,9 @@ config PPC
> select HAVE_ARCH_HUGE_VMAP if PPC_BOOK3S_64 && PPC_RADIX_MMU
> select HAVE_ARCH_JUMP_LABEL
> select HAVE_ARCH_KASAN if PPC32 && PPC_PAGE_SHIFT <= 14
> - select HAVE_ARCH_KASAN_VMALLOC if PPC32 && PPC_PAGE_SHIFT <= 14
> + select HAVE_ARCH_KASAN if PPC_BOOK3S_64 && PPC_RADIX_MMU
PPC_RADIX_MMU already depends on PPC_BOOK3S_64 so 'if PPC_RADIX_MMU' would be enough
> + select HAVE_ARCH_NO_KASAN_INLINE if PPC_BOOK3S_64 && PPC_RADIX_MMU
This list must be kept in alphabetical order.
> + select HAVE_ARCH_KASAN_VMALLOC if HAVE_ARCH_KASAN
> select HAVE_ARCH_KGDB
> select HAVE_ARCH_MMAP_RND_BITS
> select HAVE_ARCH_MMAP_RND_COMPAT_BITS if COMPAT
> diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug
> index b88900f4832f..60c1bba72a6f 100644
> --- a/arch/powerpc/Kconfig.debug
> +++ b/arch/powerpc/Kconfig.debug
> @@ -396,5 +396,5 @@ config PPC_FAST_ENDIAN_SWITCH
>
> config KASAN_SHADOW_OFFSET
> hex
> - depends on KASAN
> + depends on KASAN && PPC32
> default 0xe0000000
Instead of the above, why not do:
default 0xe0000000 if PPC32
default 0xa80e000000000000 if PPC_BOOK3S_64
> diff --git a/arch/powerpc/include/asm/book3s/64/hash.h b/arch/powerpc/include/asm/book3s/64/hash.h
> index 73ad038ed10b..105b90594a8a 100644
> --- a/arch/powerpc/include/asm/book3s/64/hash.h
> +++ b/arch/powerpc/include/asm/book3s/64/hash.h
> @@ -18,6 +18,10 @@
> #include <asm/book3s/64/hash-4k.h>
> #endif
>
> +#define H_PTRS_PER_PTE (1 << H_PTE_INDEX_SIZE)
> +#define H_PTRS_PER_PMD (1 << H_PMD_INDEX_SIZE)
> +#define H_PTRS_PER_PUD (1 << H_PUD_INDEX_SIZE)
> +
> /* Bits to set in a PMD/PUD/PGD entry valid bit*/
> #define HASH_PMD_VAL_BITS (0x8000000000000000UL)
> #define HASH_PUD_VAL_BITS (0x8000000000000000UL)
> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
> index a39886681629..767e239d75e3 100644
> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
> @@ -230,6 +230,13 @@ extern unsigned long __pmd_frag_size_shift;
> #define PTRS_PER_PUD (1 << PUD_INDEX_SIZE)
> #define PTRS_PER_PGD (1 << PGD_INDEX_SIZE)
>
> +#define MAX_PTRS_PER_PTE ((H_PTRS_PER_PTE > R_PTRS_PER_PTE) ? \
> + H_PTRS_PER_PTE : R_PTRS_PER_PTE)
Nowadays we allow 100 chars per line. Could this fit on a single line ?
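As a rough check, here is a minimal standalone sketch of the single-line form. The two index sizes are placeholder values chosen only for illustration (the real ones come from the hash and radix headers), and example_table is a hypothetical name. With a single space after the macro name the definition stays under 100 columns:

```c
#include <assert.h>

/* Placeholder index sizes, for illustration only (assumption, not the kernel's values). */
#define H_PTE_INDEX_SIZE 8
#define RADIX_PTE_INDEX_SIZE 5

#define H_PTRS_PER_PTE (1 << H_PTE_INDEX_SIZE)
#define R_PTRS_PER_PTE (1 << RADIX_PTE_INDEX_SIZE)

/* Same expression as in the patch, written on one line. */
#define MAX_PTRS_PER_PTE ((H_PTRS_PER_PTE > R_PTRS_PER_PTE) ? H_PTRS_PER_PTE : R_PTRS_PER_PTE)

/* The macro is a constant expression, so it can size a static array (hypothetical example). */
static unsigned long example_table[MAX_PTRS_PER_PTE];
```

The same layout would apply to the PMD and PUD variants.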
> +#define MAX_PTRS_PER_PMD ((H_PTRS_PER_PMD > R_PTRS_PER_PMD) ? \
> + H_PTRS_PER_PMD : R_PTRS_PER_PMD)
> +#define MAX_PTRS_PER_PUD ((H_PTRS_PER_PUD > R_PTRS_PER_PUD) ? \
> + H_PTRS_PER_PUD : R_PTRS_PER_PUD)
> +
> /* PMD_SHIFT determines what a second-level page table entry can map */
> #define PMD_SHIFT (PAGE_SHIFT + PTE_INDEX_SIZE)
> #define PMD_SIZE (1UL << PMD_SHIFT)
> diff --git a/arch/powerpc/include/asm/book3s/64/radix.h b/arch/powerpc/include/asm/book3s/64/radix.h
> index c7813dc628fc..b3492b80f858 100644
> --- a/arch/powerpc/include/asm/book3s/64/radix.h
> +++ b/arch/powerpc/include/asm/book3s/64/radix.h
> @@ -35,6 +35,11 @@
> #define RADIX_PMD_SHIFT (PAGE_SHIFT + RADIX_PTE_INDEX_SIZE)
> #define RADIX_PUD_SHIFT (RADIX_PMD_SHIFT + RADIX_PMD_INDEX_SIZE)
> #define RADIX_PGD_SHIFT (RADIX_PUD_SHIFT + RADIX_PUD_INDEX_SIZE)
> +
> +#define R_PTRS_PER_PTE (1 << RADIX_PTE_INDEX_SIZE)
> +#define R_PTRS_PER_PMD (1 << RADIX_PMD_INDEX_SIZE)
> +#define R_PTRS_PER_PUD (1 << RADIX_PUD_INDEX_SIZE)
> +
> /*
> * Size of EA range mapped by our pagetables.
> */
> @@ -68,11 +73,11 @@
> *
> *
> * 3rd quadrant expanded:
> - * +------------------------------+
> + * +------------------------------+ Highest address (0xc010000000000000)
> + * +------------------------------+ KASAN shadow end (0xc00fc00000000000)
> * | |
> * | |
> - * | |
> - * +------------------------------+ Kernel vmemmap end (0xc010000000000000)
> + * +------------------------------+ Kernel vmemmap end/shadow start (0xc00e000000000000)
> * | |
> * | 512TB |
> * | |
> @@ -126,6 +131,8 @@
> #define RADIX_VMEMMAP_SIZE RADIX_KERN_MAP_SIZE
> #define RADIX_VMEMMAP_END (RADIX_VMEMMAP_START + RADIX_VMEMMAP_SIZE)
>
> +/* For the sizes of the shadow area, see kasan.h */
> +
> #ifndef __ASSEMBLY__
> #define RADIX_PTE_TABLE_SIZE (sizeof(pte_t) << RADIX_PTE_INDEX_SIZE)
> #define RADIX_PMD_TABLE_SIZE (sizeof(pmd_t) << RADIX_PMD_INDEX_SIZE)
> diff --git a/arch/powerpc/include/asm/kasan.h b/arch/powerpc/include/asm/kasan.h
> index 7355ed05e65e..c72fd9281b44 100644
> --- a/arch/powerpc/include/asm/kasan.h
> +++ b/arch/powerpc/include/asm/kasan.h
> @@ -28,9 +28,41 @@
> #define KASAN_SHADOW_START (KASAN_SHADOW_OFFSET + \
> (KASAN_KERN_START >> KASAN_SHADOW_SCALE_SHIFT))
>
> +#ifdef CONFIG_KASAN_SHADOW_OFFSET
> #define KASAN_SHADOW_OFFSET ASM_CONST(CONFIG_KASAN_SHADOW_OFFSET)
> +#endif
>
> +#ifdef CONFIG_PPC32
> #define KASAN_SHADOW_END (-(-KASAN_SHADOW_START >> KASAN_SHADOW_SCALE_SHIFT))
> +#endif
> +
> +#ifdef CONFIG_PPC_BOOK3S_64
> +/*
> + * We define the offset such that the shadow of the linear map lives
> + * at the end of vmemmap space, that is, we choose offset such that
> + * shadow(c000_0000_0000_0000) = c00e_0000_0000_0000. This gives:
> + * c00e000000000000 - c000000000000000 >> 3 = a80e000000000000
> + */
> +#define KASAN_SHADOW_OFFSET ASM_CONST(0xa80e000000000000)
Why can't you use CONFIG_KASAN_SHADOW_OFFSET ?
> +
> +/*
> + * The shadow ends before the highest accessible address
> + * because we don't need a shadow for the shadow. Instead:
> + * c00e000000000000 << 3 + a80e000000000000000 = c00fc00000000000
> + */
> +#define KASAN_SHADOW_END 0xc00fc00000000000UL
I think we should be able to have a common formula for PPC32 and PPC64.
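To double-check the numbers in the two comments quoted above, here is a minimal userspace sketch of the generic KASAN mapping, shadow = (addr >> 3) + offset. The helper names mem_to_shadow() and offset_for() are hypothetical and exist only for this check. Note that the quoted values only work out if the second comment is read as ">> 3" rather than "<< 3":

```c
#include <assert.h>
#include <stdint.h>

#define KASAN_SHADOW_SCALE_SHIFT 3
#define KASAN_SHADOW_OFFSET 0xa80e000000000000ULL

/* Generic KASAN mapping: shadow = (addr >> scale shift) + offset. */
static uint64_t mem_to_shadow(uint64_t addr)
{
	return (addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET;
}

/* Offset that places the shadow of 'addr' at 'shadow' (hypothetical helper). */
static uint64_t offset_for(uint64_t addr, uint64_t shadow)
{
	return shadow - (addr >> KASAN_SHADOW_SCALE_SHIFT);
}
```

With these, offset_for(0xc000000000000000, 0xc00e000000000000) gives 0xa80e000000000000, and mem_to_shadow(0xc00e000000000000) gives 0xc00fc00000000000, matching the hard-coded KASAN_SHADOW_END. That suggests the end could be derived from the start with the same mapping instead of being a literal.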
> +
> +DECLARE_STATIC_KEY_FALSE(powerpc_kasan_enabled_key);
> +
> +static inline bool kasan_arch_is_ready_ppc64(void)
I'd make it __always_inline
> +{
> + if (static_branch_likely(&powerpc_kasan_enabled_key))
> + return true;
> + return false;
> +}
> +
> +#define kasan_arch_is_ready kasan_arch_is_ready_ppc64
Usually we keep the generic name, you don't need to have an arch specific name.
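Something along these lines is what I have in mind. This is only a userspace sketch: the plain bool kasan_enabled_flag stands in for the static key, ALWAYS_INLINE stands in for the kernel's __always_inline, and kasan_mark_ready() is a hypothetical helper. In the kernel the body would test the static key instead:

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-in for the kernel's __always_inline (assumption). */
#define ALWAYS_INLINE inline __attribute__((__always_inline__))

/* Stand-in for the static key; a plain flag is enough for the sketch. */
static bool kasan_enabled_flag;

/* Generic name, forced inline, and the condition returned directly. */
static ALWAYS_INLINE bool kasan_arch_is_ready(void)
{
	return kasan_enabled_flag;
}

/* Hypothetical helper playing the role of the static key being enabled at the end of init. */
static void kasan_mark_ready(void)
{
	kasan_enabled_flag = true;
}
```

Returning the condition directly also avoids the if/return true/return false pattern.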
> +#endif
>
> #ifdef CONFIG_KASAN
> void kasan_early_init(void);
> @@ -47,5 +79,5 @@ void kasan_update_early_region(unsigned long k_start, unsigned long k_end, pte_t
> int kasan_init_shadow_page_tables(unsigned long k_start, unsigned long k_end);
> int kasan_init_region(void *start, size_t size);
>
> -#endif /* __ASSEMBLY */
> +#endif /* !__ASSEMBLY__ */
This patch is already big. Is this comment cleanup worth including here?
> #endif
> diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
> index fe2ef598e2ea..cd58202459dd 100644
> --- a/arch/powerpc/kernel/Makefile
> +++ b/arch/powerpc/kernel/Makefile
> @@ -32,6 +32,11 @@ KASAN_SANITIZE_early_32.o := n
> KASAN_SANITIZE_cputable.o := n
> KASAN_SANITIZE_prom_init.o := n
> KASAN_SANITIZE_btext.o := n
> +KASAN_SANITIZE_paca.o := n
> +KASAN_SANITIZE_setup_64.o := n
The entire setup_64 ?
Can you split things out into an early_64.o, as was done for ppc32 ?
> +KASAN_SANITIZE_mce.o := n
> +KASAN_SANITIZE_traps.o := n
Why ? ppc32 doesn't need that.
> +KASAN_SANITIZE_mce_power.o := n
>
> ifdef CONFIG_KASAN
> CFLAGS_early_32.o += -DDISABLE_BRANCH_PROFILING
> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> index d421a2c7f822..f02b2766015c 100644
> --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -2151,8 +2151,8 @@ void show_stack(struct task_struct *tsk, unsigned long *stack,
> break;
>
> stack = (unsigned long *) sp;
> - newsp = stack[0];
> - ip = stack[STACK_FRAME_LR_SAVE];
> + newsp = READ_ONCE_NOCHECK(stack[0]);
> + ip = READ_ONCE_NOCHECK(stack[STACK_FRAME_LR_SAVE]);
> if (!firstframe || ip != lr) {
> printk("%s["REG"] ["REG"] %pS",
> loglvl, sp, ip, (void *)ip);
> @@ -2170,14 +2170,16 @@ void show_stack(struct task_struct *tsk, unsigned long *stack,
> * See if this is an exception frame.
> * We look for the "regshere" marker in the current frame.
> */
> - if (validate_sp(sp, tsk, STACK_INT_FRAME_SIZE)
> - && stack[STACK_FRAME_MARKER] == STACK_FRAME_REGS_MARKER) {
> + if (validate_sp(sp, tsk, STACK_INT_FRAME_SIZE) &&
> + (READ_ONCE_NOCHECK(stack[STACK_FRAME_MARKER]) ==
> + STACK_FRAME_REGS_MARKER)) {
> struct pt_regs *regs = (struct pt_regs *)
> (sp + STACK_FRAME_OVERHEAD);
> - lr = regs->link;
> + lr = READ_ONCE_NOCHECK(regs->link);
> printk("%s--- interrupt: %lx at %pS\n LR = %pS\n",
> - loglvl, regs->trap,
> - (void *)regs->nip, (void *)lr);
> + loglvl, READ_ONCE_NOCHECK(regs->trap),
> + (void *)READ_ONCE_NOCHECK(regs->nip),
> + (void *)READ_ONCE_NOCHECK(lr));
> firstframe = 1;
> }
>
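For readers unfamiliar with the idea behind the hunk above: the reads are routed through an accessor that the sanitizer does not instrument. The following is a minimal userspace analogue, not the kernel's READ_ONCE_NOCHECK implementation; read_word_nocheck() is a hypothetical helper, and it assumes a GCC- or Clang-compatible compiler that accepts the no_sanitize_address attribute:

```c
#include <assert.h>

/*
 * Hypothetical helper: one volatile load that AddressSanitizer (if enabled)
 * is told not to instrument. noinline keeps the uninstrumented load from
 * being merged into an instrumented caller.
 */
__attribute__((no_sanitize_address, noinline))
static unsigned long read_word_nocheck(const unsigned long *p)
{
	return *(const volatile unsigned long *)p;
}
```

A stack walker would then call read_word_nocheck(&stack[0]) rather than dereferencing stack[0] directly.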
> diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile
> index 2bfeaa13befb..7f1592dacbeb 100644
> --- a/arch/powerpc/kvm/Makefile
> +++ b/arch/powerpc/kvm/Makefile
> @@ -136,3 +136,8 @@ obj-$(CONFIG_KVM_BOOK3S_64_PR) += kvm-pr.o
> obj-$(CONFIG_KVM_BOOK3S_64_HV) += kvm-hv.o
>
> obj-y += $(kvm-book3s_64-builtin-objs-y)
> +
> +# KVM does a lot in real-mode, and 64-bit Book3S KASAN doesn't support that
> +ifdef CONFIG_PPC_BOOK3S_64
> +KASAN_SANITIZE := n
> +endif
> diff --git a/arch/powerpc/mm/book3s64/Makefile b/arch/powerpc/mm/book3s64/Makefile
> index fd393b8be14f..41a86d2c7da4 100644
> --- a/arch/powerpc/mm/book3s64/Makefile
> +++ b/arch/powerpc/mm/book3s64/Makefile
> @@ -21,3 +21,11 @@ obj-$(CONFIG_PPC_MEM_KEYS) += pkeys.o
>
> # Instrumenting the SLB fault path can lead to duplicate SLB entries
> KCOV_INSTRUMENT_slb.o := n
> +
> +# Parts of these can run in real mode and therefore are
> +# not safe with the current outline KASAN implementation
> +KASAN_SANITIZE_mmu_context.o := n
> +KASAN_SANITIZE_pgtable.o := n
> +KASAN_SANITIZE_radix_pgtable.o := n
> +KASAN_SANITIZE_radix_tlb.o := n
> +KASAN_SANITIZE_slb.o := n
> diff --git a/arch/powerpc/mm/kasan/Makefile b/arch/powerpc/mm/kasan/Makefile
> index 42fb628a44fd..07eef87abd6c 100644
> --- a/arch/powerpc/mm/kasan/Makefile
> +++ b/arch/powerpc/mm/kasan/Makefile
> @@ -5,3 +5,4 @@ KASAN_SANITIZE := n
> obj-$(CONFIG_PPC32) += init_32.o
> obj-$(CONFIG_PPC_8xx) += 8xx.o
> obj-$(CONFIG_PPC_BOOK3S_32) += book3s_32.o
> +obj-$(CONFIG_PPC_BOOK3S_64) += init_book3s_64.o
> diff --git a/arch/powerpc/mm/kasan/init_book3s_64.c b/arch/powerpc/mm/kasan/init_book3s_64.c
> new file mode 100644
> index 000000000000..b26ada73215d
> --- /dev/null
> +++ b/arch/powerpc/mm/kasan/init_book3s_64.c
> @@ -0,0 +1,98 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * KASAN for 64-bit Book3S powerpc
> + *
> + * Copyright (C) 2019-2020 IBM Corporation
> + * Author: Daniel Axtens <dja@axtens.net>
> + */
> +
> +#define DISABLE_BRANCH_PROFILING
> +
> +#include <linux/kasan.h>
> +#include <linux/printk.h>
> +#include <linux/sched/task.h>
> +#include <linux/memblock.h>
> +#include <asm/pgalloc.h>
> +
> +DEFINE_STATIC_KEY_FALSE(powerpc_kasan_enabled_key);
> +
> +static void __init kasan_init_phys_region(void *start, void *end)
> +{
> + unsigned long k_start, k_end, k_cur;
> + void *va;
> +
> + if (start >= end)
> + return;
> +
> + k_start = ALIGN_DOWN((unsigned long)kasan_mem_to_shadow(start), PAGE_SIZE);
> + k_end = ALIGN((unsigned long)kasan_mem_to_shadow(end), PAGE_SIZE);
> +
> + va = memblock_alloc(k_end - k_start, PAGE_SIZE);
> + for (k_cur = k_start; k_cur < k_end; k_cur += PAGE_SIZE) {
> + map_kernel_page(k_cur, __pa(va), PAGE_KERNEL);
> + va += PAGE_SIZE;
> + }
What about:
for (k_cur = k_start; k_cur < k_end; k_cur += PAGE_SIZE, va += PAGE_SIZE)
map_kernel_page(k_cur, __pa(va), PAGE_KERNEL);
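To show that the compact form walks the same pairs of addresses, here is a small userspace sketch. record_mapping() is a hypothetical stand-in for map_kernel_page() that only records its arguments, and SKETCH_PAGE_SIZE is a placeholder value:

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL	/* placeholder page size (assumption) */
#define MAX_CALLS 16

struct mapping {
	unsigned long shadow;	/* shadow address being mapped */
	unsigned long backing;	/* address of the backing page */
};

static struct mapping calls[MAX_CALLS];
static size_t nr_calls;

/* Hypothetical stand-in for map_kernel_page(): just records its arguments. */
static void record_mapping(unsigned long shadow, unsigned long backing)
{
	if (nr_calls < MAX_CALLS)
		calls[nr_calls++] = (struct mapping){ shadow, backing };
}

/* Compact form: both cursors advance in the for-loop increment. */
static void map_range(unsigned long k_start, unsigned long k_end, unsigned long va)
{
	unsigned long k_cur;

	for (k_cur = k_start; k_cur < k_end; k_cur += SKETCH_PAGE_SIZE, va += SKETCH_PAGE_SIZE)
		record_mapping(k_cur, va);
}
```

Calling map_range(0x10000, 0x13000, 0x80000) records three mappings, each backing address advancing in step with its shadow address.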
> +}
> +
> +void __init kasan_init(void)
> +{
> + /*
> + * We want to do the following things:
> + * 1) Map real memory into the shadow for all physical memblocks
> + * This takes us from c000... to c008...
> + * 2) Leave a hole over the shadow of vmalloc space. KASAN_VMALLOC
> + * will manage this for us.
> + * This takes us from c008... to c00a...
> + * 3) Map the 'early shadow'/zero page over iomap and vmemmap space.
> + * This takes us up to where we start at c00e...
> + */
> +
> + void *k_start = kasan_mem_to_shadow((void *)RADIX_VMALLOC_END);
> + void *k_end = kasan_mem_to_shadow((void *)RADIX_VMEMMAP_END);
> + phys_addr_t start, end;
> + u64 i;
> + pte_t zero_pte = pfn_pte(virt_to_pfn(kasan_early_shadow_page), PAGE_KERNEL);
> +
> + if (!early_radix_enabled())
> + panic("KASAN requires radix!");
> +
> + for_each_mem_range(i, &start, &end) {
> + kasan_init_phys_region((void *)start, (void *)end);
> + }
No need for { } around single-line loops. Check the kernel coding style.
> +
> + for (i = 0; i < PTRS_PER_PTE; i++)
> + __set_pte_at(&init_mm, (unsigned long)kasan_early_shadow_page,
> + &kasan_early_shadow_pte[i], zero_pte, 0);
> +
> + for (i = 0; i < PTRS_PER_PMD; i++)
> + pmd_populate_kernel(&init_mm, &kasan_early_shadow_pmd[i],
> + kasan_early_shadow_pte);
> +
> + for (i = 0; i < PTRS_PER_PUD; i++)
> + pud_populate(&init_mm, &kasan_early_shadow_pud[i],
> + kasan_early_shadow_pmd);
> +
> + /* map the early shadow over the iomap and vmemmap space */
> + kasan_populate_early_shadow(k_start, k_end);
> +
> + /* mark early shadow region as RO and wipe it */
> + zero_pte = pfn_pte(virt_to_pfn(kasan_early_shadow_page), PAGE_KERNEL_RO);
> + for (i = 0; i < PTRS_PER_PTE; i++)
> + __set_pte_at(&init_mm, (unsigned long)kasan_early_shadow_page,
> + &kasan_early_shadow_pte[i], zero_pte, 0);
> +
> + /*
> + * clear_page relies on some cache info that hasn't been set up yet.
> + * It ends up looping ~forever and blows up other data.
> + * Use memset instead.
> + */
> + memset(kasan_early_shadow_page, 0, PAGE_SIZE);
> +
> + static_branch_inc(&powerpc_kasan_enabled_key);
> +
> + /* Enable error messages */
> + init_task.kasan_depth = 0;
> + pr_info("KASAN init done (64-bit Book3S)\n");
> +}
> +
> +void __init kasan_late_init(void) { }
> diff --git a/arch/powerpc/mm/ptdump/ptdump.c b/arch/powerpc/mm/ptdump/ptdump.c
> index aca354fb670b..63672aa656e8 100644
> --- a/arch/powerpc/mm/ptdump/ptdump.c
> +++ b/arch/powerpc/mm/ptdump/ptdump.c
> @@ -20,6 +20,7 @@
> #include <linux/seq_file.h>
> #include <asm/fixmap.h>
> #include <linux/const.h>
> +#include <linux/kasan.h>
> #include <asm/page.h>
> #include <asm/hugetlb.h>
>
> @@ -317,6 +318,23 @@ static void walk_pud(struct pg_state *st, p4d_t *p4d, unsigned long start)
> unsigned long addr;
> unsigned int i;
>
> +#if defined(CONFIG_KASAN) && defined(CONFIG_PPC_BOOK3S_64)
> + /*
> + * On radix + KASAN, we want to check for the KASAN "early" shadow
> + * which covers huge quantities of memory with the same set of
> + * read-only PTEs. If it is, we want to note the first page (to see
> + * the status change), and then note the last page. This gives us good
> + * results without spending ages noting the exact same PTEs over 100s of
> + * terabytes of memory.
> + */
> + if (p4d_page(*p4d) == virt_to_page(lm_alias(kasan_early_shadow_pud))) {
> + walk_pmd(st, pud, start);
> + addr = start + (PTRS_PER_PUD - 1) * PUD_SIZE;
> + walk_pmd(st, pud, addr);
> + return;
> + }
> +#endif
Why do you need that ? When the PTEs all point to the same page, it should already appear as a
single line in []
> +
> for (i = 0; i < PTRS_PER_PUD; i++, pud++) {
> addr = start + i * PUD_SIZE;
> if (!pud_none(*pud) && !pud_is_leaf(*pud))
> @@ -387,11 +405,11 @@ static void populate_markers(void)
> #endif
> address_markers[i++].start_address = FIXADDR_START;
> address_markers[i++].start_address = FIXADDR_TOP;
> +#endif /* CONFIG_PPC64 */
> #ifdef CONFIG_KASAN
> address_markers[i++].start_address = KASAN_SHADOW_START;
> address_markers[i++].start_address = KASAN_SHADOW_END;
> #endif
> -#endif /* CONFIG_PPC64 */
> }
>
> static int ptdump_show(struct seq_file *m, void *v)
> diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
> index c194c4ae8bc7..b6eb8ec1e5ad 100644
> --- a/arch/powerpc/platforms/Kconfig.cputype
> +++ b/arch/powerpc/platforms/Kconfig.cputype
> @@ -92,6 +92,7 @@ config PPC_BOOK3S_64
> select ARCH_SUPPORTS_NUMA_BALANCING
> select IRQ_WORK
> select PPC_MM_SLICES
> + select KASAN_VMALLOC if KASAN
>
> config PPC_BOOK3E_64
> bool "Embedded processors"
> diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
> index 2eb6ae150d1f..f277e4793696 100644
> --- a/arch/powerpc/platforms/powernv/Makefile
> +++ b/arch/powerpc/platforms/powernv/Makefile
> @@ -1,4 +1,10 @@
> # SPDX-License-Identifier: GPL-2.0
> +
> +# nothing that deals with real mode is safe to KASAN
> +# in particular, idle code runs a bunch of things in real mode
> +KASAN_SANITIZE_idle.o := n
> +KASAN_SANITIZE_pci-ioda.o := n
> +
> obj-y += setup.o opal-call.o opal-wrappers.o opal.o opal-async.o
> obj-y += idle.o opal-rtc.o opal-nvram.o opal-lpc.o opal-flash.o
> obj-y += rng.o opal-elog.o opal-dump.o opal-sysparam.o opal-sensor.o
> diff --git a/arch/powerpc/platforms/pseries/Makefile b/arch/powerpc/platforms/pseries/Makefile
> index c8a2b0b05ac0..202199ef9e5c 100644
> --- a/arch/powerpc/platforms/pseries/Makefile
> +++ b/arch/powerpc/platforms/pseries/Makefile
> @@ -30,3 +30,6 @@ obj-$(CONFIG_PPC_SVM) += svm.o
> obj-$(CONFIG_FA_DUMP) += rtas-fadump.o
>
> obj-$(CONFIG_SUSPEND) += suspend.o
> +
> +# nothing that operates in real mode is safe for KASAN
> +KASAN_SANITIZE_ras.o := n
>
Christophe
* Re: powerpc32: BUG: KASAN: use-after-free in test_bpf_init+0x6f8/0xde8 [test_bpf]
From: Christophe Leroy @ 2020-12-01 17:43 UTC
To: netdev@vger.kernel.org, bpf, Naveen N. Rao, Sandipan Das,
linuxppc-dev@lists.ozlabs.org
In-Reply-To: <ccdc2bc3-ce71-1faa-c83f-cd3b0eaf963d@csgroup.eu>
On 01/12/2020 at 15:03, Christophe Leroy wrote:
> I've got the following KASAN error while running the test_bpf module on a powerpc 8xx (32-bit).
>
> It is reproducible and happens each time at the same test.
>
> Can someone help me to investigate and fix that ?
>
> [ 209.381037] test_bpf: #298 LD_IND byte frag
Without KASAN, this test and a few others fail:
[12493.832074] test_bpf: #298 LD_IND byte frag jited:1 ret 201 != 66 FAIL (1 times)
[12493.844921] test_bpf: #299 LD_IND halfword frag jited:1 ret 51509 != 17220 FAIL (1 times)
[12493.869990] test_bpf: #301 LD_IND halfword mixed head/frag jited:1 ret 51509 != 1305 FAIL (1 times)
[12493.897298] test_bpf: #303 LD_ABS byte frag jited:1 ret 201 != 66 FAIL (1 times)
[12493.911351] test_bpf: #304 LD_ABS halfword frag jited:1 ret 51509 != 17220 FAIL (1 times)
[12493.933244] test_bpf: #306 LD_ABS halfword mixed head/frag jited:1 ret 51509 != 1305 FAIL (1 times)
[12494.471983] test_bpf: Summary: 371 PASSED, 7 FAILED, [119/366 JIT'ed]
Christophe
> [ 209.383041] Pass 1: shrink = 0, seen = 0x30000
> [ 209.383284] Pass 2: shrink = 0, seen = 0x30000
> [ 209.383562] flen=3 proglen=104 pass=3 image=8166dc91 from=modprobe pid=380
> [ 209.383805] JIT code: 00000000: 7c 08 02 a6 90 01 00 04 91 c1 ff b8 91 e1 ff bc
> [ 209.384044] JIT code: 00000010: 94 21 ff 70 80 e3 00 58 81 e3 00 54 7d e7 78 50
> [ 209.384279] JIT code: 00000020: 81 c3 00 a0 38 a0 00 00 38 80 00 00 38 a0 00 40
> [ 209.384516] JIT code: 00000030: 3c e0 c0 02 60 e7 62 14 7c e8 03 a6 38 c5 00 00
> [ 209.384753] JIT code: 00000040: 4e 80 00 21 41 80 00 0c 60 00 00 00 7c 83 23 78
> [ 209.384990] JIT code: 00000050: 38 21 00 90 80 01 00 04 7c 08 03 a6 81 c1 ff b8
> [ 209.385207] JIT code: 00000060: 81 e1 ff bc 4e 80 00 20
> [ 209.385442] jited:1
> [ 209.385762] ==================================================================
> [ 209.386272] BUG: KASAN: use-after-free in test_bpf_init+0x6f8/0xde8 [test_bpf]
> [ 209.386503] Read of size 4 at addr c2de70c0 by task modprobe/380
> [ 209.386622]
> [ 209.386881] CPU: 0 PID: 380 Comm: modprobe Not tainted 5.10.0-rc5-s3k-dev-01341-g72d20eec3f8b #4178
> [ 209.387032] Call Trace:
> [ 209.387404] [cad6b878] [c020e0d4] print_address_description.constprop.0+0x70/0x4e0 (unreliable)
> [ 209.387920] [cad6b8f8] [c020dc98] kasan_report+0x118/0x1c0
> [ 209.388503] [cad6b938] [cb0e0c98] test_bpf_init+0x6f8/0xde8 [test_bpf]
> [ 209.388918] [cad6ba58] [c0004084] do_one_initcall+0xa4/0x33c
> [ 209.389377] [cad6bb28] [c00f9144] do_init_module+0x158/0x7f4
> [ 209.389820] [cad6bbc8] [c00fccb0] load_module+0x3394/0x38d8
> [ 209.390273] [cad6be38] [c00fd4e0] sys_finit_module+0x118/0x17c
> [ 209.390700] [cad6bf38] [c00170d0] ret_from_syscall+0x0/0x34
> [ 209.391020] --- interrupt: c01 at 0xfd5e7c0
> [ 209.395301]
> [ 209.395472] Allocated by task 276:
> [ 209.395767] __kasan_kmalloc.constprop.0+0xe8/0x134
> [ 209.396029] kmem_cache_alloc+0x150/0x290
> [ 209.396281] __alloc_skb+0x58/0x28c
> [ 209.396563] alloc_skb_with_frags+0x74/0x314
> [ 209.396872] sock_alloc_send_pskb+0x404/0x424
> [ 209.397205] unix_dgram_sendmsg+0x200/0xbf0
> [ 209.397473] __sys_sendto+0x17c/0x21c
> [ 209.397754] ret_from_syscall+0x0/0x34
> [ 209.397877]
> [ 209.398039] Freed by task 274:
> [ 209.398308] kasan_set_track+0x34/0x6c
> [ 209.398608] kasan_set_free_info+0x28/0x48
> [ 209.398878] __kasan_slab_free+0x10c/0x19c
> [ 209.399141] kmem_cache_free+0x68/0x390
> [ 209.399433] skb_free_datagram+0x20/0x8c
> [ 209.399759] unix_dgram_recvmsg+0x474/0x710
> [ 209.400084] sock_read_iter+0x17c/0x228
> [ 209.400348] vfs_read+0x3c8/0x4f4
> [ 209.400603] ksys_read+0x17c/0x1cc
> [ 209.400878] ret_from_syscall+0x0/0x34
> [ 209.401001]
> [ 209.401222] The buggy address belongs to the object at c2de70c0
> [ 209.401222] which belongs to the cache skbuff_head_cache of size 176
> [ 209.401462] The buggy address is located 0 bytes inside of
> [ 209.401462] 176-byte region [c2de70c0, c2de7170)
> [ 209.401604] The buggy address belongs to the page:
> [ 209.401867] page:464e6411 refcount:1 mapcount:0 mapping:00000000 index:0x0 pfn:0xb79
> [ 209.402080] flags: 0x200(slab)
> [ 209.402477] raw: 00000200 00000100 00000122 c2004a90 00000000 00440088 ffffffff 00000001
> [ 209.402646] page dumped because: kasan: bad access detected
> [ 209.402765]
> [ 209.402897] Memory state around the buggy address:
> [ 209.403142] c2de6f80: fb fb fc fc fc fc fc fc fc fc fa fb fb fb fb fb
> [ 209.403388] c2de7000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> [ 209.403639] >c2de7080: fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb fb
> [ 209.403798] ^
> [ 209.404048] c2de7100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fc fc
> [ 209.404304] c2de7180: fc fc fc fc fc fc fa fb fb fb fb fb fb fb fb fb
> [ 209.404456] ==================================================================
> [ 209.404591] Disabling lock debugging due to kernel taint
>
>
> Thanks
> Christophe
^ permalink raw reply
* Re: [PATCH net v3 0/2] ibmvnic: Bug fixes for queue descriptor processing
From: David Miller @ 2020-12-01 18:12 UTC (permalink / raw)
To: tlfalcon
Cc: cforno12, netdev, ljp, ricklind, dnbanerg, drt, brking, kuba,
sukadev, linuxppc-dev
In-Reply-To: <1606837931-22676-1-git-send-email-tlfalcon@linux.ibm.com>
From: Thomas Falcon <tlfalcon@linux.ibm.com>
Date: Tue, 1 Dec 2020 09:52:09 -0600
> This series resolves a few issues in the ibmvnic driver's
> RX buffer and TX completion processing. The first patch
> includes memory barriers to synchronize queue descriptor
> reads. The second patch fixes a memory leak that could
> occur if the device returns a TX completion with an error
> code in the descriptor, in which case the respective socket
> buffer and other relevant data structures may not be freed
> or updated properly.
>
> v3: Correct length of Fixes tags, requested by Jakub Kicinski
>
> v2: Provide more detailed comments explaining specifically what
> reads are being ordered, suggested by Michael Ellerman
Series applied, thanks!
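For readers following along, the ordering problem described for the first patch can be sketched in userspace C. This is only an analogue under stated assumptions: the `toy_desc` layout and the `toy_publish()`/`toy_consume()` helpers are hypothetical, and C11 acquire/release atomics stand in for whatever barrier primitives the driver actually uses. The point is that the consumer must observe the valid flag before it reads the rest of the descriptor.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical queue descriptor: a valid flag plus payload fields. */
struct toy_desc {
	_Atomic uint8_t valid;
	uint32_t len;
	uint64_t correlator;
};

/* Producer side: fill the payload first, then publish the valid flag. */
static void toy_publish(struct toy_desc *d, uint32_t len, uint64_t corr)
{
	assert(d);
	d->len = len;
	d->correlator = corr;
	/* Release: payload stores become visible before valid == 1. */
	atomic_store_explicit(&d->valid, 1, memory_order_release);
}

/* Consumer side: returns 1 and copies the payload if a descriptor is ready. */
static int toy_consume(struct toy_desc *d, uint32_t *len, uint64_t *corr)
{
	assert(d && len && corr);
	/* Acquire: payload loads below cannot be reordered before this check. */
	if (!atomic_load_explicit(&d->valid, memory_order_acquire))
		return 0;
	*len = d->len;
	*corr = d->correlator;
	/* Hand the slot back to the producer. */
	atomic_store_explicit(&d->valid, 0, memory_order_relaxed);
	return 1;
}
```

Without the acquire on the flag read, the payload loads could in principle be satisfied before the flag check, which is the class of bug the cover letter describes.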
^ permalink raw reply
* RE: [PATCH 1/5] ARM: configs: drop unused BACKLIGHT_GENERIC option
From: ZHIZHIKIN Andrey @ 2020-12-01 19:48 UTC (permalink / raw)
To: Arnd Bergmann, Alexandre Belloni
Cc: tony@atomide.com, linux-kernel@vger.kernel.org,
James.Bottomley@HansenPartnership.com, thierry.reding@gmail.com,
paulus@samba.org, sam@ravnborg.org, daniel.thompson@linaro.org,
linux-omap@vger.kernel.org, Arnd Bergmann, deller@gmx.de,
linux@armlinux.org.uk, Krzysztof Kozlowski, jonathanh@nvidia.com,
ludovic.desroches@microchip.com, arm-soc, Catalin Marinas,
linux-mips@vger.kernel.org, will@kernel.org, mripard@kernel.org,
linux-tegra@vger.kernel.org, lee.jones@linaro.org, wens@csie.org,
linux-arm-kernel@lists.infradead.org, jernej.skrabec@siol.net,
tsbogend@alpha.franken.de, linux-parisc@vger.kernel.org,
emil.l.velikov@gmail.com, nicolas.ferre@microchip.com,
Olof Johansson, linuxppc-dev@lists.ozlabs.org
In-Reply-To: <CAK8P3a0N24zuQ+CM-_t66CS8AprzdtdfirfLWwGpjgcXjWjn=Q@mail.gmail.com>
Hello Arnd,
> -----Original Message-----
> From: Arnd Bergmann <arnd@kernel.org>
> Sent: Tuesday, December 1, 2020 4:50 PM
> To: Alexandre Belloni <alexandre.belloni@bootlin.com>
> Cc: Catalin Marinas <catalin.marinas@arm.com>; ZHIZHIKIN Andrey
> <andrey.zhizhikin@leica-geosystems.com>; Krzysztof Kozlowski
> <krzk@kernel.org>; linux@armlinux.org.uk; nicolas.ferre@microchip.com;
> ludovic.desroches@microchip.com; tony@atomide.com;
> mripard@kernel.org; wens@csie.org; jernej.skrabec@siol.net;
> thierry.reding@gmail.com; jonathanh@nvidia.com; will@kernel.org;
> tsbogend@alpha.franken.de; James.Bottomley@HansenPartnership.com;
> deller@gmx.de; mpe@ellerman.id.au; benh@kernel.crashing.org;
> paulus@samba.org; lee.jones@linaro.org; sam@ravnborg.org;
> emil.l.velikov@gmail.com; daniel.thompson@linaro.org; linux-arm-
> kernel@lists.infradead.org; linux-kernel@vger.kernel.org; linux-
> omap@vger.kernel.org; linux-tegra@vger.kernel.org; linux-
> mips@vger.kernel.org; linux-parisc@vger.kernel.org; linuxppc-
> dev@lists.ozlabs.org; Arnd Bergmann <arnd@arndb.de>; Olof Johansson
> <olof@lixom.net>; arm-soc <arm@kernel.org>
> Subject: Re: [PATCH 1/5] ARM: configs: drop unused BACKLIGHT_GENERIC
> option
>
>
> On Tue, Dec 1, 2020 at 4:41 PM Alexandre Belloni
> <alexandre.belloni@bootlin.com> wrote:
> > On 01/12/2020 14:40:53+0000, Catalin Marinas wrote:
> > > On Mon, Nov 30, 2020 at 07:50:25PM +0000, ZHIZHIKIN Andrey wrote:
> > > > From Krzysztof Kozlowski <krzk@kernel.org>:
>
> > > I tried to convince them before, it didn't work. I guess they don't
> > > like to be spammed ;).
> >
> > The first rule of arm-soc is: you do not talk about arm@ and soc@
>
> I don't mind having the addresses documented better, but it needs to be
> done in a way that avoids having any patch for arch/arm*/boot/dts and
> arch/arm/*/configs Cc:d to soc@kernel.org.
>
> If anyone has suggestions for how to do that, let me know.
Just as a proposal:
Maybe those addresses should at least be included in the Documentation ("Select the recipients for your patch" section of "Submitting patches"), much like stable@ is. Contributors who familiarize themselves with it would then know which list to include in Cc: for such changes.
That should IMHO partially reduce the traffic on the list, since the address would not pop up in the output of get_maintainer.pl but would still be documented so contributors can follow the process.
>
> > > Or rather, SoC-specific patches, even to defconfig, should go
> > > through the specific SoC maintainers. However, there are occasional
> > > defconfig patches which are more generic or affecting multiple SoCs.
> > > I just ignore them as the arm64 defconfig is usually handled by the
> > > arm-soc folk (when I need a defconfig change, I go for
> > > arch/arm64/Kconfig directly ;)).
> >
> > IIRC, the plan was indeed to get defconfig changes through the
> > platform sub-trees. It is also supposed to be how multi_v5 and
> > multi_v7 are handled and they will take care of the merge.
>
> For cross-platform changes like this one, I'm definitely happy to pick up the
> patch directly from soc@kernel.org, or from mailing list if I know about it.
Should I collect all Acks and re-send this series including the list "nobody talks about" :), or can the series be picked up as-is?
Your advice would be very welcome here!
>
> We usually do the merges for the soc tree in batches and rely on patchwork
> to keep track of what I'm missing, so if Olof and I are just on Cc to a mail, we
> might have forgotten about it by the time we do the next merges.
>
> Arnd
Regards,
Andrey
^ permalink raw reply
* Re: [PATCH 1/5] ARM: configs: drop unused BACKLIGHT_GENERIC option
From: Arnd Bergmann @ 2020-12-01 20:44 UTC (permalink / raw)
To: ZHIZHIKIN Andrey
Cc: Alexandre Belloni, tony@atomide.com, linux-kernel@vger.kernel.org,
James.Bottomley@HansenPartnership.com, thierry.reding@gmail.com,
paulus@samba.org, sam@ravnborg.org, daniel.thompson@linaro.org,
linux-omap@vger.kernel.org, Arnd Bergmann, deller@gmx.de,
linux@armlinux.org.uk, Krzysztof Kozlowski, jonathanh@nvidia.com,
ludovic.desroches@microchip.com, arm-soc, Catalin Marinas,
linux-mips@vger.kernel.org, will@kernel.org, mripard@kernel.org,
linux-tegra@vger.kernel.org, lee.jones@linaro.org, wens@csie.org,
linux-arm-kernel@lists.infradead.org, jernej.skrabec@siol.net,
tsbogend@alpha.franken.de, linux-parisc@vger.kernel.org,
emil.l.velikov@gmail.com, nicolas.ferre@microchip.com,
Olof Johansson, linuxppc-dev@lists.ozlabs.org
In-Reply-To: <AM6PR06MB4691A5E1603BBE57F35F3B17A6F40@AM6PR06MB4691.eurprd06.prod.outlook.com>
On Tue, Dec 1, 2020 at 8:48 PM ZHIZHIKIN Andrey
<andrey.zhizhikin@leica-geosystems.com> wrote:
> Hello Arnd,
> > > > Or rather, SoC-specific patches, even to defconfig, should go
> > > > through the specific SoC maintainers. However, there are occasional
> > > > defconfig patches which are more generic or affecting multiple SoCs.
> > > > I just ignore them as the arm64 defconfig is usually handled by the
> > > > arm-soc folk (when I need a defconfig change, I go for
> > > > arch/arm64/Kconfig directly ;)).
> > >
> > > IIRC, the plan was indeed to get defconfig changes through the
> > > platform sub-trees. It is also supposed to be how multi_v5 and
> > > multi_v7 are handled and they will take care of the merge.
> >
> > For cross-platform changes like this one, I'm definitely happy to pick up the
> > patch directly from soc@kernel.org, or from mailing list if I know about it.
>
> Should I collect all Ack's and re-send this series including the list "nobody
> talks about" :), or the series can be picked up as-is?
>
> Your advice would be really welcomed here!
Yes, please do, that makes my life easier. I would apply the patches
for arch/arm and arch/arm64 when you send them to soc@kernel.org;
the others go to the respective architecture maintainers, unless they
want me to pick up the whole series.
Arnd
^ permalink raw reply
* Re: [PATCH v2 2/2] kbuild: Disable CONFIG_LD_ORPHAN_WARN for ld.lld 10.0.1
From: Kees Cook @ 2020-12-01 20:56 UTC (permalink / raw)
To: Masahiro Yamada
Cc: Michal Marek, kernelci . org bot, Linux Kbuild mailing list,
Catalin Marinas, Mark Brown,
maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT), Nick Desaulniers,
Russell King, LKML, linuxppc-dev, Arvind Sankar, Ingo Molnar,
Borislav Petkov, clang-built-linux, Nathan Chancellor,
Will Deacon, Thomas Gleixner, Linux ARM
In-Reply-To: <CAK7LNAR=_+1K7EtpvGzgyM+ans-iNOT0PBXdLRApnsyAzakQ3w@mail.gmail.com>
On Tue, Dec 01, 2020 at 10:31:37PM +0900, Masahiro Yamada wrote:
> On Wed, Nov 25, 2020 at 7:22 AM Kees Cook <keescook@chromium.org> wrote:
> >
> > On Thu, Nov 19, 2020 at 01:13:27PM -0800, Nick Desaulniers wrote:
> > > On Thu, Nov 19, 2020 at 12:57 PM Nathan Chancellor
> > > <natechancellor@gmail.com> wrote:
> > > >
> > > > ld.lld 10.0.1 spews a bunch of various warnings about .rela sections,
> > > > along with a few others. Newer versions of ld.lld do not have these
> > > > warnings. As a result, do not add '--orphan-handling=warn' to
> > > > LDFLAGS_vmlinux if ld.lld's version is not new enough.
> > > >
> > > > Link: https://github.com/ClangBuiltLinux/linux/issues/1187
> > > > Link: https://github.com/ClangBuiltLinux/linux/issues/1193
> > > > Reported-by: Arvind Sankar <nivedita@alum.mit.edu>
> > > > Reported-by: kernelci.org bot <bot@kernelci.org>
> > > > Reported-by: Mark Brown <broonie@kernel.org>
> > > > Reviewed-by: Kees Cook <keescook@chromium.org>
> > > > Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
> > >
> > > Thanks for the additions in v2.
> > > Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
> >
> > I'm going to carry this for a few days in -next, and if no one screams,
> > ask Linus to pull it for v5.10-rc6.
> >
> > Thanks!
> >
> > --
> > Kees Cook
>
>
> Sorry for the delay.
> Applied to linux-kbuild.
Great, thanks!
> But, I already see this in linux-next.
> Please let me know if I should drop it from my tree.
My intention was to get this to Linus this week. Do you want to do that
yourself, or Ack the patches in my tree and I'll send it?
-Kees
--
Kees Cook
^ permalink raw reply
* Re: CONFIG_PPC_VAS depends on 64k pages...?
From: Bulent Abali @ 2020-12-01 13:16 UTC (permalink / raw)
To: Sukadev Bhattiprolu
Cc: Tulio Magno Quites Machado Filho, daniel, haren, Will Springer,
Bulent Abali, linuxppc-dev, Raphael M Zinsly
In-Reply-To: <20201201055228.GA2213889@us.ibm.com>
I don't know anything about VAS page size requirements in the kernel. I
checked the user compression library and saw that we do a sysconf to get
the page size; so the library should be immune to page size by design.
But it wouldn't surprise me if a 64KB constant is inadvertently hardcoded
somewhere else in the library. Giving heads up to Tulio and Raphael who
are owners of the github repo.
https://github.com/libnxz/power-gzip/blob/master/lib/nx_zlib.c#L922
If we got this wrong in the library it might manifest itself as an error
message of the sort "excessive page faults". The library must touch pages
ahead to make them present in memory; occasional page faults are
acceptable. It will retry.
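As a rough illustration of the two points above (querying the page size at run time, and touching pages ahead of use), here is a minimal sketch. It is not the libnxz code: `runtime_page_size()` and `touch_pages()` are hypothetical helper names, and only `sysconf(_SC_PAGESIZE)` is a real interface.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Ask the system for its page size instead of hardcoding 64 KiB. */
static size_t runtime_page_size(void)
{
	long sz = sysconf(_SC_PAGESIZE);

	assert(sz > 0);
	return (size_t)sz;
}

/*
 * Touch one byte per page-sized stride of [buf, buf + len) so the pages
 * are present before the buffer is handed to an accelerator. Returns the
 * number of strides touched. The buffer contents are left unchanged.
 */
static size_t touch_pages(volatile unsigned char *buf, size_t len)
{
	size_t page = runtime_page_size();
	size_t touched = 0;

	for (size_t off = 0; off < len; off += page) {
		buf[off] = buf[off];	/* read then write back the same byte */
		touched++;
	}
	/* An unaligned buffer can spill into one more page; cover its tail. */
	if (len)
		buf[len - 1] = buf[len - 1];
	return touched;
}
```

Because the stride comes from `sysconf()`, the same loop works on 4K and 64K kernels; a stride hardcoded to 64K on a 4K kernel would skip most pages, which is the failure mode to look for.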
Bulent
From: "Sukadev Bhattiprolu" <sukadev@linux.ibm.com>
To: "Christophe Leroy" <christophe.leroy@csgroup.eu>
Cc: "Will Springer" <skirmisher@protonmail.com>,
linuxppc-dev@lists.ozlabs.org, daniel@octaforge.org, Bulent
Abali/Watson/IBM@IBM, haren@linux.ibm.com
Date: 12/01/2020 12:53 AM
Subject: Re: CONFIG_PPC_VAS depends on 64k pages...?
Christophe Leroy [christophe.leroy@csgroup.eu] wrote:
> Hi,
>
> On 19/11/2020 at 11:58, Will Springer wrote:
> > I learned about the POWER9 gzip accelerator a few months ago when the
> > support hit upstream Linux 5.8. However, for some reason the Kconfig
> > dictates that VAS depends on a 64k page size, which is problematic as I
> > run Void Linux, which uses a 4k-page kernel.
> >
> > Some early poking by others indicated there wasn't an obvious page size
> > dependency in the code, and suggested I try modifying the config to switch
> > it on. I did so, but was stopped by a minor complaint of an "unexpected DT
> > configuration" by the VAS code. I wasn't equipped to figure out exactly what
> > this meant, even after finding the offending condition, so after writing a
> > very drawn-out forum post asking for help, I dropped the subject.
> >
> > Fast forward to today, when I was reminded of the whole thing again, and
> > decided to debug a bit further. Apparently the VAS platform device
> > (derived from the DT node) has 5 resources on my 4k kernel, instead of 4
> > (which evidently works for others who have had success on 64k kernels). I
> > have no idea what this means in practice (I don't know how to introspect
> > it), but after making a tiny patch[1], everything came up smoothly and I
> > was doing blazing-fast gzip (de)compression in no time.
> >
> > Everything seems to work fine on 4k pages. So, what's up? Are there
> > pitfalls lurking around that I've yet to stumble over? More reasonably,
> > I'm curious as to why the feature supposedly depends on 64k pages, or if
> > there's anything else I should be concerned about.
Will,
The reason I put in that config check is because we were only able to
test 64K pages at that point.
It is interesting that it is working for you. Following code in skiboot
https://github.com/open-power/skiboot/blob/master/hw/vas.c should restrict
it to 64K pages. IIRC there is also a corresponding change in some NX
registers that should also be configured to allow 4K pages.
static int init_north_ctl(struct proc_chip *chip)
{
        uint64_t val = 0ULL;

        val = SETFIELD(VAS_64K_MODE_MASK, val, true);
        val = SETFIELD(VAS_ACCEPT_PASTE_MASK, val, true);
        val = SETFIELD(VAS_ENABLE_WC_MMIO_BAR, val, true);
        val = SETFIELD(VAS_ENABLE_UWC_MMIO_BAR, val, true);
        val = SETFIELD(VAS_ENABLE_RMA_MMIO_BAR, val, true);

        return vas_scom_write(chip, VAS_MISC_N_CTL, val);
}
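For readers unfamiliar with the mask-based field update used in the function above, the general pattern can be sketched as follows. This is an assumption-laden illustration, not skiboot's implementation: `set_field()`, `get_field()`, `mask_shift()` and the `EX_*` masks are hypothetical, and the real mask values and bit numbering live in the skiboot headers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical masks, for illustration only. */
#define EX_MODE_MASK	0x8000000000000000ULL	/* single top bit */
#define EX_WIDE_MASK	0x0000000000000f00ULL	/* 4-bit field, bits 8..11 */

/* Shift count for a field: index of the lowest set bit in its mask. */
static unsigned int mask_shift(uint64_t mask)
{
	unsigned int shift = 0;

	while (mask && !(mask & 1)) {
		mask >>= 1;
		shift++;
	}
	return shift;
}

/* Return reg with the field selected by mask replaced by val. */
static uint64_t set_field(uint64_t mask, uint64_t reg, uint64_t val)
{
	assert(mask != 0);
	return (reg & ~mask) | ((val << mask_shift(mask)) & mask);
}

/* Extract the field selected by mask from reg. */
static uint64_t get_field(uint64_t mask, uint64_t reg)
{
	assert(mask != 0);
	return (reg & mask) >> mask_shift(mask);
}
```

Each call clears only the bits under its mask, so successive calls on the same value accumulate independent fields, which is how the register value is built up before the single write.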
I am copying Bulent Abali and Haren Myneni, who have been working with
VAS/NX, for their thoughts/experience.
> >
>
> Maybe ask Sukadev who did the implementation and is maintaining it ?
>
> > I do have to say I'm quite satisfied with the results of the NX
> > accelerator, though. Being able to shuffle data to a RaptorCS box over gigE
> > and get compressed data back faster than most software gzip could ever
> > hope to achieve is no small feat, let alone the instantaneous results locally.
> > :)
> >
> > Cheers,
> > Will Springer [she/her]
> >
> > [1]: https://github.com/Skirmisher/void-packages/blob/vas-4k-pages/srcpkgs/linux5.9/patches/ppc-vas-on-4k.patch
> >
>
>
> Christophe
^ permalink raw reply
* Re: CONFIG_PPC_VAS depends on 64k pages...?
From: Carlos Eduardo de Paula @ 2020-12-01 21:05 UTC (permalink / raw)
To: Sukadev Bhattiprolu; +Cc: daniel, haren, Will Springer, abali, linuxppc-dev
In-Reply-To: <20201201055228.GA2213889@us.ibm.com>
On Tue, Dec 1, 2020 at 2:54 AM Sukadev Bhattiprolu <sukadev@linux.ibm.com>
wrote:
>
> Christophe Leroy [christophe.leroy@csgroup.eu] wrote:
> > Hi,
> >
> > On 19/11/2020 at 11:58, Will Springer wrote:
> > > I learned about the POWER9 gzip accelerator a few months ago when the
> > > support hit upstream Linux 5.8. However, for some reason the Kconfig
> > > dictates that VAS depends on a 64k page size, which is problematic as I
> > > run Void Linux, which uses a 4k-page kernel.
> > >
> > > Some early poking by others indicated there wasn't an obvious page size
> > > dependency in the code, and suggested I try modifying the config to switch
> > > it on. I did so, but was stopped by a minor complaint of an "unexpected DT
> > > configuration" by the VAS code. I wasn't equipped to figure out exactly what
> > > this meant, even after finding the offending condition, so after writing a
> > > very drawn-out forum post asking for help, I dropped the subject.
> > >
> > > Fast forward to today, when I was reminded of the whole thing again, and
> > > decided to debug a bit further. Apparently the VAS platform device
> > > (derived from the DT node) has 5 resources on my 4k kernel, instead of 4
> > > (which evidently works for others who have had success on 64k kernels). I
> > > have no idea what this means in practice (I don't know how to introspect
> > > it), but after making a tiny patch[1], everything came up smoothly and I
> > > was doing blazing-fast gzip (de)compression in no time.
> > >
> > > Everything seems to work fine on 4k pages. So, what's up? Are there
> > > pitfalls lurking around that I've yet to stumble over? More reasonably,
> > > I'm curious as to why the feature supposedly depends on 64k pages, or if
> > > there's anything else I should be concerned about.
>
> Will,
>
> The reason I put in that config check is because we were only able to
> test 64K pages at that point.
>
> It is interesting that it is working for you. Following code in skiboot
> https://github.com/open-power/skiboot/blob/master/hw/vas.c should restrict
> it to 64K pages. IIRC there is also a corresponding change in some NX
> registers that should also be configured to allow 4K pages.
>
>
> static int init_north_ctl(struct proc_chip *chip)
> {
> uint64_t val = 0ULL;
>
> val = SETFIELD(VAS_64K_MODE_MASK, val, true);
> val = SETFIELD(VAS_ACCEPT_PASTE_MASK, val, true);
> val = SETFIELD(VAS_ENABLE_WC_MMIO_BAR, val, true);
> val = SETFIELD(VAS_ENABLE_UWC_MMIO_BAR, val, true);
> val = SETFIELD(VAS_ENABLE_RMA_MMIO_BAR, val, true);
>
> return vas_scom_write(chip, VAS_MISC_N_CTL, val);
> }
>
> I am copying Bulent Abali and Haren Myneni who have been working with
> VAS/NX for their thoughts/experience.
>
> > >
> >
> > Maybe ask Sukadev who did the implementation and is maintaining it ?
> >
> > > I do have to say I'm quite satisfied with the results of the NX
> > > accelerator, though. Being able to shuffle data to a RaptorCS box over gigE
> > > and get compressed data back faster than most software gzip could ever
> > > hope to achieve is no small feat, let alone the instantaneous results locally.
> > > :)
> > >
> > > Cheers,
> > > Will Springer [she/her]
> > >
> > > [1]: https://github.com/Skirmisher/void-packages/blob/vas-4k-pages/srcpkgs/linux5.9/patches/ppc-vas-on-4k.patch
> > >
> >
> >
> > Christophe
>
Hi all, I'd like to report that with Will's patch, NX-Gzip works
perfectly for me on Linux 5.9.10 built with 4K pages, with no firmware
changes, on a Raptor Computing Blackbird workstation.
I'm running Debian 10.
Ref. https://twitter.com/carlosedp/status/1328424799216021511
Carlos
--
________________________________________
Carlos Eduardo de Paula
me@carlosedp.com
http://carlosedp.com
https://twitter.com/carlosedp
https://www.linkedin.com/in/carlosedp/
________________________________________
^ permalink raw reply
* Re: [PATCH 1/5] ARM: configs: drop unused BACKLIGHT_GENERIC option
From: Krzysztof Kozlowski @ 2020-12-01 21:18 UTC (permalink / raw)
To: Arnd Bergmann
Cc: Alexandre Belloni, tony@atomide.com, linux-kernel@vger.kernel.org,
James.Bottomley@HansenPartnership.com, thierry.reding@gmail.com,
paulus@samba.org, sam@ravnborg.org, daniel.thompson@linaro.org,
linux-omap@vger.kernel.org, Arnd Bergmann, deller@gmx.de,
linux@armlinux.org.uk, jonathanh@nvidia.com,
ludovic.desroches@microchip.com, arm-soc, Catalin Marinas,
linux-mips@vger.kernel.org, will@kernel.org, mripard@kernel.org,
ZHIZHIKIN Andrey, linux-tegra@vger.kernel.org,
lee.jones@linaro.org, wens@csie.org,
linux-arm-kernel@lists.infradead.org, jernej.skrabec@siol.net,
tsbogend@alpha.franken.de, linux-parisc@vger.kernel.org,
emil.l.velikov@gmail.com, nicolas.ferre@microchip.com,
Olof Johansson, linuxppc-dev@lists.ozlabs.org
In-Reply-To: <CAK8P3a0N24zuQ+CM-_t66CS8AprzdtdfirfLWwGpjgcXjWjn=Q@mail.gmail.com>
On Tue, Dec 01, 2020 at 04:50:22PM +0100, Arnd Bergmann wrote:
> On Tue, Dec 1, 2020 at 4:41 PM Alexandre Belloni
> <alexandre.belloni@bootlin.com> wrote:
> > On 01/12/2020 14:40:53+0000, Catalin Marinas wrote:
> > > On Mon, Nov 30, 2020 at 07:50:25PM +0000, ZHIZHIKIN Andrey wrote:
> > > > From Krzysztof Kozlowski <krzk@kernel.org>:
>
> > > I tried to convince them before, it didn't work. I guess they don't like
> > > to be spammed ;).
> >
> > The first rule of arm-soc is: you do not talk about arm@ and soc@
>
> I don't mind having the addresses documented better, but it needs to
> be done in a way that avoids having any patch for arch/arm*/boot/dts
> and arch/arm/*/configs Cc:d to soc@kernel.org.
>
> If anyone has suggestions for how to do that, let me know.
Not a perfect solution but something. How about:
https://lore.kernel.org/linux-arm-kernel/20201201211516.24921-2-krzk@kernel.org/T/#u
It would not work for defconfigs, but there is a chance someone will find
your addresses this way. It should not cause too much additional traffic.
Best regards,
Krzysztof
^ permalink raw reply
* Re: [PATCH 6/8] lazy tlb: shoot lazies, a non-refcounting lazy tlb option
From: Will Deacon @ 2020-12-01 21:27 UTC (permalink / raw)
To: Andy Lutomirski
Cc: linux-arch, Arnd Bergmann, Vasily Gorbik, Christian Borntraeger,
Peter Zijlstra, Catalin Marinas, Heiko Carstens, X86 ML, LKML,
Nicholas Piggin, Linux-MM, Dave Hansen, Mathieu Desnoyers,
linuxppc-dev
In-Reply-To: <CALCETrXAR_9EGaOF8ymVkZycxgZkYk0dR+NjEpTfVzdcS3sOVw@mail.gmail.com>
On Mon, Nov 30, 2020 at 10:31:51AM -0800, Andy Lutomirski wrote:
> other arch folk: there's some background here:
>
> https://lkml.kernel.org/r/CALCETrVXUbe8LfNn-Qs+DzrOQaiw+sFUg1J047yByV31SaTOZw@mail.gmail.com
>
> On Sun, Nov 29, 2020 at 12:16 PM Andy Lutomirski <luto@kernel.org> wrote:
> >
> > On Sat, Nov 28, 2020 at 7:54 PM Andy Lutomirski <luto@kernel.org> wrote:
> > >
> > > On Sat, Nov 28, 2020 at 8:02 AM Nicholas Piggin <npiggin@gmail.com> wrote:
> > > >
> > > > On big systems, the mm refcount can become highly contended when doing
> > > > a lot of context switching with threaded applications (particularly
> > > > switching between the idle thread and an application thread).
> > > >
> > > > Abandoning lazy tlb slows switching down quite a bit in the important
> > > > user->idle->user cases, so instead implement a non-refcounted scheme
> > > > that causes __mmdrop() to IPI all CPUs in the mm_cpumask and shoot down
> > > > any remaining lazy ones.
> > > >
> > > > Shootdown IPIs are some concern, but they have not been observed to be
> > > > a big problem with this scheme (the powerpc implementation generated
> > > > 314 additional interrupts on a 144 CPU system during a kernel compile).
> > > > There are a number of strategies that could be employed to reduce IPIs
> > > > if they turn out to be a problem for some workload.
> > >
> > > I'm still wondering whether we can do even better.
> > >
> >
> > Hold on a sec.. __mmput() unmaps VMAs, frees pagetables, and flushes
> > the TLB. On x86, this will shoot down all lazies as long as even a
> > single pagetable was freed. (Or at least it will if we don't have a
> > serious bug, but the code seems okay. We'll hit pmd_free_tlb, which
> > sets tlb->freed_tables, which will trigger the IPI.) So, on
> > architectures like x86, the shootdown approach should be free. The
> > only way it ought to have any excess IPIs is if we have CPUs in
> > mm_cpumask() that don't need IPI to free pagetables, which could
> > happen on paravirt.
>
> Indeed, on x86, we do this:
>
> [ 11.558844] flush_tlb_mm_range.cold+0x18/0x1d
> [ 11.559905] tlb_finish_mmu+0x10e/0x1a0
> [ 11.561068] exit_mmap+0xc8/0x1a0
> [ 11.561932] mmput+0x29/0xd0
> [ 11.562688] do_exit+0x316/0xa90
> [ 11.563588] do_group_exit+0x34/0xb0
> [ 11.564476] __x64_sys_exit_group+0xf/0x10
> [ 11.565512] do_syscall_64+0x34/0x50
>
> and we have info->freed_tables set.
>
> What are the architectures that have large systems like?
>
> x86: we already zap lazies, so it should cost basically nothing to do
> a little loop at the end of __mmput() to make sure that no lazies are
> left. If we care about paravirt performance, we could implement one
> of the optimizations I mentioned above to fix up the refcounts instead
> of sending an IPI to any remaining lazies.
>
> arm64: AFAICT arm64's flush uses magic arm64 hardware support for
> remote flushes, so any lazy mm references will still exist after
> exit_mmap(). (arm64 uses lazy TLB, right?) So this is kind of like
> the x86 paravirt case. Are there large enough arm64 systems that any
> of this matters?
Yes, there are large arm64 systems where performance of TLB invalidation
matters, but they're either niche (supercomputers) or not readily available
(NUMA boxes).
But anyway, we blow away the TLB for everybody in tlb_finish_mmu() after
freeing the page-tables. We have an optimisation to avoid flushing if
we're just unmapping leaf entries when the mm is going away, but we don't
have a choice once we get to actually reclaiming the page-tables.
One thing I probably should mention, though, is that we don't maintain
mm_cpumask() because we're not able to benefit from it and the atomic
update is a waste of time.
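To make the mm_cpumask point concrete, here is a toy userspace model of the shootdown idea discussed in this thread. It is not kernel code: `toy_mm`, `toy_cpus` and `toy_shoot_lazies()` are made-up names, and the "IPI" is just a counter. It only illustrates that, when no cpumask is maintained, a shootdown has to consider every CPU rather than the subset recorded in the mask.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NR_CPUS 8

/* Toy address space: only tracks which CPUs may have used it. */
struct toy_mm {
	uint32_t cpumask;
};

/* Toy per-CPU state: the mm whose page tables are currently loaded. */
struct toy_cpu {
	struct toy_mm *active_mm;
};

static struct toy_mm toy_init_mm;
static struct toy_cpu toy_cpus[TOY_NR_CPUS];

/*
 * Final teardown of mm: visit candidate CPUs and switch any that still
 * hold mm lazily over to toy_init_mm. With use_cpumask set, only CPUs
 * recorded in mm->cpumask are visited; otherwise every CPU is.
 * Returns the number of simulated IPIs sent.
 */
static int toy_shoot_lazies(struct toy_mm *mm, int use_cpumask)
{
	int ipis = 0;

	assert(mm != NULL);
	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
		if (use_cpumask && !(mm->cpumask & (1u << cpu)))
			continue;
		ipis++;		/* simulated IPI to this CPU */
		if (toy_cpus[cpu].active_mm == mm)
			toy_cpus[cpu].active_mm = &toy_init_mm;
	}
	return ipis;
}
```

In this model the mask only narrows the set of CPUs visited; the lazy references are dropped either way, at the cost of more interrupts when the mask is unavailable.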
Will
^ permalink raw reply
* Re: [PATCH 6/8] lazy tlb: shoot lazies, a non-refcounting lazy tlb option
From: Andy Lutomirski @ 2020-12-01 21:50 UTC (permalink / raw)
To: Will Deacon
Cc: linux-arch, Arnd Bergmann, Vasily Gorbik, Christian Borntraeger,
Peter Zijlstra, Catalin Marinas, Heiko Carstens, X86 ML, LKML,
Nicholas Piggin, Linux-MM, Dave Hansen, Mathieu Desnoyers,
Andy Lutomirski, linuxppc-dev
In-Reply-To: <20201201212758.GA28300@willie-the-truck>
On Tue, Dec 1, 2020 at 1:28 PM Will Deacon <will@kernel.org> wrote:
>
> On Mon, Nov 30, 2020 at 10:31:51AM -0800, Andy Lutomirski wrote:
> > other arch folk: there's some background here:
> >
> > https://lkml.kernel.org/r/CALCETrVXUbe8LfNn-Qs+DzrOQaiw+sFUg1J047yByV31SaTOZw@mail.gmail.com
> >
> > On Sun, Nov 29, 2020 at 12:16 PM Andy Lutomirski <luto@kernel.org> wrote:
> > >
> > > On Sat, Nov 28, 2020 at 7:54 PM Andy Lutomirski <luto@kernel.org> wrote:
> > > >
> > > > On Sat, Nov 28, 2020 at 8:02 AM Nicholas Piggin <npiggin@gmail.com> wrote:
> > > > >
> > > > > > On big systems, the mm refcount can become highly contended when doing
> > > > > > a lot of context switching with threaded applications (particularly
> > > > > > switching between the idle thread and an application thread).
> > > > > >
> > > > > > Abandoning lazy tlb slows switching down quite a bit in the important
> > > > > > user->idle->user cases, so instead implement a non-refcounted scheme
> > > > > that causes __mmdrop() to IPI all CPUs in the mm_cpumask and shoot down
> > > > > any remaining lazy ones.
> > > > >
> > > > > Shootdown IPIs are some concern, but they have not been observed to be
> > > > > a big problem with this scheme (the powerpc implementation generated
> > > > > 314 additional interrupts on a 144 CPU system during a kernel compile).
> > > > > There are a number of strategies that could be employed to reduce IPIs
> > > > > if they turn out to be a problem for some workload.
> > > >
> > > > I'm still wondering whether we can do even better.
> > > >
> > >
> > > Hold on a sec.. __mmput() unmaps VMAs, frees pagetables, and flushes
> > > the TLB. On x86, this will shoot down all lazies as long as even a
> > > single pagetable was freed. (Or at least it will if we don't have a
> > > serious bug, but the code seems okay. We'll hit pmd_free_tlb, which
> > > sets tlb->freed_tables, which will trigger the IPI.) So, on
> > > architectures like x86, the shootdown approach should be free. The
> > > only way it ought to have any excess IPIs is if we have CPUs in
> > > mm_cpumask() that don't need IPI to free pagetables, which could
> > > happen on paravirt.
> >
> > Indeed, on x86, we do this:
> >
> > [ 11.558844] flush_tlb_mm_range.cold+0x18/0x1d
> > [ 11.559905] tlb_finish_mmu+0x10e/0x1a0
> > [ 11.561068] exit_mmap+0xc8/0x1a0
> > [ 11.561932] mmput+0x29/0xd0
> > [ 11.562688] do_exit+0x316/0xa90
> > [ 11.563588] do_group_exit+0x34/0xb0
> > [ 11.564476] __x64_sys_exit_group+0xf/0x10
> > [ 11.565512] do_syscall_64+0x34/0x50
> >
> > and we have info->freed_tables set.
> >
> > What are the architectures that have large systems like?
> >
> > x86: we already zap lazies, so it should cost basically nothing to do
> > a little loop at the end of __mmput() to make sure that no lazies are
> > left. If we care about paravirt performance, we could implement one
> > of the optimizations I mentioned above to fix up the refcounts instead
> > of sending an IPI to any remaining lazies.
> >
> > arm64: AFAICT arm64's flush uses magic arm64 hardware support for
> > remote flushes, so any lazy mm references will still exist after
> > exit_mmap(). (arm64 uses lazy TLB, right?) So this is kind of like
> > the x86 paravirt case. Are there large enough arm64 systems that any
> > of this matters?
>
> Yes, there are large arm64 systems where performance of TLB invalidation
> matters, but they're either niche (supercomputers) or not readily available
> (NUMA boxes).
>
> But anyway, we blow away the TLB for everybody in tlb_finish_mmu() after
> freeing the page-tables. We have an optimisation to avoid flushing if
> we're just unmapping leaf entries when the mm is going away, but we don't
> have a choice once we get to actually reclaiming the page-tables.
>
> One thing I probably should mention, though, is that we don't maintain
> mm_cpumask() because we're not able to benefit from it and the atomic
> update is a waste of time.
Do you do anything special for lazy TLB or do you just use the generic
code? (i.e. where do your user pagetables point when you go from a
user task to idle or to a kernel thread?)
Do you end up with all cpus set in mm_cpumask or can you have the mm
loaded on a CPU that isn't in mm_cpumask?
--Andy
>
> Will
^ permalink raw reply
* [PATCH v2 0/5] drop unused BACKLIGHT_GENERIC option
From: Andrey Zhizhikin @ 2020-12-01 22:29 UTC (permalink / raw)
To: linux, nicolas.ferre, alexandre.belloni, ludovic.desroches, tony,
mripard, wens, jernej.skrabec, thierry.reding, jonathanh,
catalin.marinas, will, tsbogend, James.Bottomley, deller, mpe,
benh, paulus, lee.jones, sam, emil.l.velikov, daniel.thompson,
krzk, linux-arm-kernel, linux-kernel, linux-omap, linux-tegra,
linux-mips, linux-parisc, linuxppc-dev, soc
Since the removal of the generic_bl driver from the source tree in commit
7ecdea4a0226 ("backlight: generic_bl: Remove this driver as it is
unused"), the BACKLIGHT_GENERIC config option has become obsolete as well
and should therefore be cleaned up from all configuration files.
This series introduces patches to address this removal, split by
architecture in the kernel tree.
Changes in v2:
- Collect all Acked-by: and Reviewed-by: tags
- Add the ARM SoC maintainer list to the recipients
Andrey Zhizhikin (5):
ARM: configs: drop unused BACKLIGHT_GENERIC option
arm64: defconfig: drop unused BACKLIGHT_GENERIC option
MIPS: configs: drop unused BACKLIGHT_GENERIC option
parisc: configs: drop unused BACKLIGHT_GENERIC option
powerpc/configs: drop unused BACKLIGHT_GENERIC option
arch/arm/configs/at91_dt_defconfig | 1 -
arch/arm/configs/cm_x300_defconfig | 1 -
arch/arm/configs/colibri_pxa300_defconfig | 1 -
arch/arm/configs/jornada720_defconfig | 1 -
arch/arm/configs/magician_defconfig | 1 -
arch/arm/configs/mini2440_defconfig | 1 -
arch/arm/configs/omap2plus_defconfig | 1 -
arch/arm/configs/pxa3xx_defconfig | 1 -
arch/arm/configs/qcom_defconfig | 1 -
arch/arm/configs/sama5_defconfig | 1 -
arch/arm/configs/sunxi_defconfig | 1 -
arch/arm/configs/tegra_defconfig | 1 -
arch/arm/configs/u8500_defconfig | 1 -
arch/arm64/configs/defconfig | 1 -
arch/mips/configs/gcw0_defconfig | 1 -
arch/mips/configs/gpr_defconfig | 1 -
arch/mips/configs/lemote2f_defconfig | 1 -
arch/mips/configs/loongson3_defconfig | 1 -
arch/mips/configs/mtx1_defconfig | 1 -
arch/mips/configs/rs90_defconfig | 1 -
arch/parisc/configs/generic-64bit_defconfig | 1 -
arch/powerpc/configs/powernv_defconfig | 1 -
22 files changed, 22 deletions(-)
base-commit: b65054597872ce3aefbc6a666385eabdf9e288da
--
2.17.1
^ permalink raw reply
* [PATCH v2 1/5] ARM: configs: drop unused BACKLIGHT_GENERIC option
From: Andrey Zhizhikin @ 2020-12-01 22:29 UTC (permalink / raw)
To: linux, nicolas.ferre, alexandre.belloni, ludovic.desroches, tony,
mripard, wens, jernej.skrabec, thierry.reding, jonathanh,
catalin.marinas, will, tsbogend, James.Bottomley, deller, mpe,
benh, paulus, lee.jones, sam, emil.l.velikov, daniel.thompson,
krzk, linux-arm-kernel, linux-kernel, linux-omap, linux-tegra,
linux-mips, linux-parisc, linuxppc-dev, soc
In-Reply-To: <20201201222922.3183-1-andrey.zhizhikin@leica-geosystems.com>
Commit 7ecdea4a0226 ("backlight: generic_bl: Remove this driver as it is
unused") removed the generic_bl driver from the tree, together with the
corresponding config option.
Remove BACKLIGHT_GENERIC config item from all ARM configurations.
Fixes: 7ecdea4a0226 ("backlight: generic_bl: Remove this driver as it is unused")
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com>
Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org>
Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Acked-by: Daniel Thompson <daniel.thompson@linaro.org>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
---
arch/arm/configs/at91_dt_defconfig | 1 -
arch/arm/configs/cm_x300_defconfig | 1 -
arch/arm/configs/colibri_pxa300_defconfig | 1 -
arch/arm/configs/jornada720_defconfig | 1 -
arch/arm/configs/magician_defconfig | 1 -
arch/arm/configs/mini2440_defconfig | 1 -
arch/arm/configs/omap2plus_defconfig | 1 -
arch/arm/configs/pxa3xx_defconfig | 1 -
arch/arm/configs/qcom_defconfig | 1 -
arch/arm/configs/sama5_defconfig | 1 -
arch/arm/configs/sunxi_defconfig | 1 -
arch/arm/configs/tegra_defconfig | 1 -
arch/arm/configs/u8500_defconfig | 1 -
13 files changed, 13 deletions(-)
diff --git a/arch/arm/configs/at91_dt_defconfig b/arch/arm/configs/at91_dt_defconfig
index 4a0ba2ae1a25..6e52c9c965e6 100644
--- a/arch/arm/configs/at91_dt_defconfig
+++ b/arch/arm/configs/at91_dt_defconfig
@@ -132,7 +132,6 @@ CONFIG_DRM_ATMEL_HLCDC=y
CONFIG_DRM_PANEL_SIMPLE=y
CONFIG_FB_ATMEL=y
CONFIG_BACKLIGHT_ATMEL_LCDC=y
-# CONFIG_BACKLIGHT_GENERIC is not set
CONFIG_BACKLIGHT_PWM=y
CONFIG_FRAMEBUFFER_CONSOLE=y
CONFIG_LOGO=y
diff --git a/arch/arm/configs/cm_x300_defconfig b/arch/arm/configs/cm_x300_defconfig
index 2f7acde2d921..502a9d870ca4 100644
--- a/arch/arm/configs/cm_x300_defconfig
+++ b/arch/arm/configs/cm_x300_defconfig
@@ -87,7 +87,6 @@ CONFIG_FB=y
CONFIG_FB_PXA=y
CONFIG_LCD_CLASS_DEVICE=y
CONFIG_LCD_TDO24M=y
-# CONFIG_BACKLIGHT_GENERIC is not set
CONFIG_BACKLIGHT_DA903X=m
CONFIG_FRAMEBUFFER_CONSOLE=y
CONFIG_FRAMEBUFFER_CONSOLE_DETECT_PRIMARY=y
diff --git a/arch/arm/configs/colibri_pxa300_defconfig b/arch/arm/configs/colibri_pxa300_defconfig
index 0dae3b185284..26e5a67f8e2d 100644
--- a/arch/arm/configs/colibri_pxa300_defconfig
+++ b/arch/arm/configs/colibri_pxa300_defconfig
@@ -34,7 +34,6 @@ CONFIG_FB=y
CONFIG_FB_PXA=y
# CONFIG_LCD_CLASS_DEVICE is not set
CONFIG_BACKLIGHT_CLASS_DEVICE=y
-# CONFIG_BACKLIGHT_GENERIC is not set
# CONFIG_VGA_CONSOLE is not set
CONFIG_FRAMEBUFFER_CONSOLE=y
CONFIG_LOGO=y
diff --git a/arch/arm/configs/jornada720_defconfig b/arch/arm/configs/jornada720_defconfig
index 9f079be2b84b..069f60ffdcd8 100644
--- a/arch/arm/configs/jornada720_defconfig
+++ b/arch/arm/configs/jornada720_defconfig
@@ -48,7 +48,6 @@ CONFIG_FB=y
CONFIG_FB_S1D13XXX=y
CONFIG_LCD_CLASS_DEVICE=y
CONFIG_BACKLIGHT_CLASS_DEVICE=y
-# CONFIG_BACKLIGHT_GENERIC is not set
# CONFIG_VGA_CONSOLE is not set
CONFIG_FRAMEBUFFER_CONSOLE=y
CONFIG_FRAMEBUFFER_CONSOLE_DETECT_PRIMARY=y
diff --git a/arch/arm/configs/magician_defconfig b/arch/arm/configs/magician_defconfig
index d2e684f6565a..b4670d42f378 100644
--- a/arch/arm/configs/magician_defconfig
+++ b/arch/arm/configs/magician_defconfig
@@ -95,7 +95,6 @@ CONFIG_FB_PXA_OVERLAY=y
CONFIG_FB_W100=y
CONFIG_LCD_CLASS_DEVICE=y
CONFIG_BACKLIGHT_CLASS_DEVICE=y
-# CONFIG_BACKLIGHT_GENERIC is not set
CONFIG_BACKLIGHT_PWM=y
# CONFIG_VGA_CONSOLE is not set
CONFIG_FRAMEBUFFER_CONSOLE=y
diff --git a/arch/arm/configs/mini2440_defconfig b/arch/arm/configs/mini2440_defconfig
index 301f29a1fcc3..898490aaa39e 100644
--- a/arch/arm/configs/mini2440_defconfig
+++ b/arch/arm/configs/mini2440_defconfig
@@ -158,7 +158,6 @@ CONFIG_FB_S3C2410=y
CONFIG_LCD_CLASS_DEVICE=y
CONFIG_LCD_PLATFORM=y
CONFIG_BACKLIGHT_CLASS_DEVICE=y
-# CONFIG_BACKLIGHT_GENERIC is not set
CONFIG_BACKLIGHT_PWM=y
CONFIG_FRAMEBUFFER_CONSOLE=y
CONFIG_FRAMEBUFFER_CONSOLE_DETECT_PRIMARY=y
diff --git a/arch/arm/configs/omap2plus_defconfig b/arch/arm/configs/omap2plus_defconfig
index de3b7813a1ce..7eae097a75d2 100644
--- a/arch/arm/configs/omap2plus_defconfig
+++ b/arch/arm/configs/omap2plus_defconfig
@@ -388,7 +388,6 @@ CONFIG_FB_TILEBLITTING=y
CONFIG_LCD_CLASS_DEVICE=y
CONFIG_LCD_PLATFORM=y
CONFIG_BACKLIGHT_CLASS_DEVICE=y
-CONFIG_BACKLIGHT_GENERIC=m
CONFIG_BACKLIGHT_PWM=m
CONFIG_BACKLIGHT_PANDORA=m
CONFIG_BACKLIGHT_GPIO=m
diff --git a/arch/arm/configs/pxa3xx_defconfig b/arch/arm/configs/pxa3xx_defconfig
index 06bbc7a59b60..f0c34017f2aa 100644
--- a/arch/arm/configs/pxa3xx_defconfig
+++ b/arch/arm/configs/pxa3xx_defconfig
@@ -74,7 +74,6 @@ CONFIG_FB_PXA=y
CONFIG_LCD_CLASS_DEVICE=y
CONFIG_LCD_TDO24M=y
CONFIG_BACKLIGHT_CLASS_DEVICE=y
-# CONFIG_BACKLIGHT_GENERIC is not set
CONFIG_BACKLIGHT_DA903X=y
# CONFIG_VGA_CONSOLE is not set
CONFIG_FRAMEBUFFER_CONSOLE=y
diff --git a/arch/arm/configs/qcom_defconfig b/arch/arm/configs/qcom_defconfig
index c882167e1496..d6733e745b80 100644
--- a/arch/arm/configs/qcom_defconfig
+++ b/arch/arm/configs/qcom_defconfig
@@ -159,7 +159,6 @@ CONFIG_FB=y
CONFIG_FRAMEBUFFER_CONSOLE=y
# CONFIG_LCD_CLASS_DEVICE is not set
CONFIG_BACKLIGHT_CLASS_DEVICE=y
-# CONFIG_BACKLIGHT_GENERIC is not set
CONFIG_BACKLIGHT_LM3630A=y
CONFIG_BACKLIGHT_LP855X=y
CONFIG_SOUND=y
diff --git a/arch/arm/configs/sama5_defconfig b/arch/arm/configs/sama5_defconfig
index 037d3a718a60..0a167891eb05 100644
--- a/arch/arm/configs/sama5_defconfig
+++ b/arch/arm/configs/sama5_defconfig
@@ -161,7 +161,6 @@ CONFIG_DRM_ATMEL_HLCDC=y
CONFIG_DRM_PANEL_SIMPLE=y
CONFIG_LCD_CLASS_DEVICE=y
CONFIG_BACKLIGHT_CLASS_DEVICE=y
-# CONFIG_BACKLIGHT_GENERIC is not set
CONFIG_BACKLIGHT_PWM=y
CONFIG_FRAMEBUFFER_CONSOLE=y
CONFIG_SOUND=y
diff --git a/arch/arm/configs/sunxi_defconfig b/arch/arm/configs/sunxi_defconfig
index 244126172fd6..af6e80d1a0f2 100644
--- a/arch/arm/configs/sunxi_defconfig
+++ b/arch/arm/configs/sunxi_defconfig
@@ -111,7 +111,6 @@ CONFIG_DRM_SIMPLE_BRIDGE=y
CONFIG_DRM_LIMA=y
CONFIG_FB_SIMPLE=y
CONFIG_BACKLIGHT_CLASS_DEVICE=y
-# CONFIG_BACKLIGHT_GENERIC is not set
CONFIG_BACKLIGHT_PWM=y
CONFIG_SOUND=y
CONFIG_SND=y
diff --git a/arch/arm/configs/tegra_defconfig b/arch/arm/configs/tegra_defconfig
index fff5fae0db30..74739a52a8ad 100644
--- a/arch/arm/configs/tegra_defconfig
+++ b/arch/arm/configs/tegra_defconfig
@@ -205,7 +205,6 @@ CONFIG_DRM_PANEL_SIMPLE=y
CONFIG_DRM_LVDS_CODEC=y
# CONFIG_LCD_CLASS_DEVICE is not set
CONFIG_BACKLIGHT_CLASS_DEVICE=y
-# CONFIG_BACKLIGHT_GENERIC is not set
CONFIG_BACKLIGHT_PWM=y
CONFIG_FRAMEBUFFER_CONSOLE=y
CONFIG_FRAMEBUFFER_CONSOLE_ROTATION=y
diff --git a/arch/arm/configs/u8500_defconfig b/arch/arm/configs/u8500_defconfig
index 28dd7cf56048..24aacc255021 100644
--- a/arch/arm/configs/u8500_defconfig
+++ b/arch/arm/configs/u8500_defconfig
@@ -92,7 +92,6 @@ CONFIG_DRM_PANEL_SONY_ACX424AKP=y
CONFIG_DRM_LIMA=y
CONFIG_DRM_MCDE=y
CONFIG_BACKLIGHT_CLASS_DEVICE=y
-CONFIG_BACKLIGHT_GENERIC=m
CONFIG_BACKLIGHT_GPIO=y
CONFIG_LOGO=y
CONFIG_SOUND=y
--
2.17.1
^ permalink raw reply related
* [PATCH v2 2/5] arm64: defconfig: drop unused BACKLIGHT_GENERIC option
From: Andrey Zhizhikin @ 2020-12-01 22:29 UTC (permalink / raw)
To: linux, nicolas.ferre, alexandre.belloni, ludovic.desroches, tony,
mripard, wens, jernej.skrabec, thierry.reding, jonathanh,
catalin.marinas, will, tsbogend, James.Bottomley, deller, mpe,
benh, paulus, lee.jones, sam, emil.l.velikov, daniel.thompson,
krzk, linux-arm-kernel, linux-kernel, linux-omap, linux-tegra,
linux-mips, linux-parisc, linuxppc-dev, soc
In-Reply-To: <20201201222922.3183-1-andrey.zhizhikin@leica-geosystems.com>
Commit 7ecdea4a0226 ("backlight: generic_bl: Remove this driver as it is
unused") removed the generic_bl driver from the tree, together with the
corresponding config option.
Remove BACKLIGHT_GENERIC config item from arm64 configuration.
Fixes: 7ecdea4a0226 ("backlight: generic_bl: Remove this driver as it is unused")
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com>
Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org>
Acked-by: Daniel Thompson <daniel.thompson@linaro.org>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
---
arch/arm64/configs/defconfig | 1 -
1 file changed, 1 deletion(-)
diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
index 8e3f7ae71de5..280ed7404a1d 100644
--- a/arch/arm64/configs/defconfig
+++ b/arch/arm64/configs/defconfig
@@ -681,7 +681,6 @@ CONFIG_DRM_PANFROST=m
CONFIG_FB=y
CONFIG_FB_MODE_HELPERS=y
CONFIG_FB_EFI=y
-CONFIG_BACKLIGHT_GENERIC=m
CONFIG_BACKLIGHT_PWM=m
CONFIG_BACKLIGHT_LP855X=m
CONFIG_LOGO=y
--
2.17.1
^ permalink raw reply related
* [PATCH v2 3/5] MIPS: configs: drop unused BACKLIGHT_GENERIC option
From: Andrey Zhizhikin @ 2020-12-01 22:29 UTC (permalink / raw)
To: linux, nicolas.ferre, alexandre.belloni, ludovic.desroches, tony,
mripard, wens, jernej.skrabec, thierry.reding, jonathanh,
catalin.marinas, will, tsbogend, James.Bottomley, deller, mpe,
benh, paulus, lee.jones, sam, emil.l.velikov, daniel.thompson,
krzk, linux-arm-kernel, linux-kernel, linux-omap, linux-tegra,
linux-mips, linux-parisc, linuxppc-dev, soc
In-Reply-To: <20201201222922.3183-1-andrey.zhizhikin@leica-geosystems.com>
Commit 7ecdea4a0226 ("backlight: generic_bl: Remove this driver as it is
unused") removed the generic_bl driver from the tree, together with the
corresponding config option.
Remove BACKLIGHT_GENERIC config item from all MIPS configurations.
Fixes: 7ecdea4a0226 ("backlight: generic_bl: Remove this driver as it is unused")
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com>
Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org>
Acked-by: Daniel Thompson <daniel.thompson@linaro.org>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
---
arch/mips/configs/gcw0_defconfig | 1 -
arch/mips/configs/gpr_defconfig | 1 -
arch/mips/configs/lemote2f_defconfig | 1 -
arch/mips/configs/loongson3_defconfig | 1 -
arch/mips/configs/mtx1_defconfig | 1 -
arch/mips/configs/rs90_defconfig | 1 -
6 files changed, 6 deletions(-)
diff --git a/arch/mips/configs/gcw0_defconfig b/arch/mips/configs/gcw0_defconfig
index 7e28a4fe9d84..460683b52285 100644
--- a/arch/mips/configs/gcw0_defconfig
+++ b/arch/mips/configs/gcw0_defconfig
@@ -73,7 +73,6 @@ CONFIG_DRM_PANEL_NOVATEK_NT39016=y
CONFIG_DRM_INGENIC=y
CONFIG_DRM_ETNAVIV=y
CONFIG_BACKLIGHT_CLASS_DEVICE=y
-# CONFIG_BACKLIGHT_GENERIC is not set
CONFIG_BACKLIGHT_PWM=y
# CONFIG_VGA_CONSOLE is not set
CONFIG_FRAMEBUFFER_CONSOLE=y
diff --git a/arch/mips/configs/gpr_defconfig b/arch/mips/configs/gpr_defconfig
index 9085f4d6c698..87e20f3391ed 100644
--- a/arch/mips/configs/gpr_defconfig
+++ b/arch/mips/configs/gpr_defconfig
@@ -251,7 +251,6 @@ CONFIG_SSB_DRIVER_PCICORE=y
# CONFIG_VGA_ARB is not set
# CONFIG_LCD_CLASS_DEVICE is not set
CONFIG_BACKLIGHT_CLASS_DEVICE=y
-# CONFIG_BACKLIGHT_GENERIC is not set
# CONFIG_VGA_CONSOLE is not set
CONFIG_USB_HID=m
CONFIG_USB_HIDDEV=y
diff --git a/arch/mips/configs/lemote2f_defconfig b/arch/mips/configs/lemote2f_defconfig
index 3a9a453b1264..688c91918db2 100644
--- a/arch/mips/configs/lemote2f_defconfig
+++ b/arch/mips/configs/lemote2f_defconfig
@@ -145,7 +145,6 @@ CONFIG_FB_SIS_300=y
CONFIG_FB_SIS_315=y
# CONFIG_LCD_CLASS_DEVICE is not set
CONFIG_BACKLIGHT_CLASS_DEVICE=y
-CONFIG_BACKLIGHT_GENERIC=m
# CONFIG_VGA_CONSOLE is not set
CONFIG_FRAMEBUFFER_CONSOLE=y
CONFIG_FRAMEBUFFER_CONSOLE_ROTATION=y
diff --git a/arch/mips/configs/loongson3_defconfig b/arch/mips/configs/loongson3_defconfig
index 38a817ead8e7..9c5fadef38cb 100644
--- a/arch/mips/configs/loongson3_defconfig
+++ b/arch/mips/configs/loongson3_defconfig
@@ -286,7 +286,6 @@ CONFIG_DRM_VIRTIO_GPU=y
CONFIG_FB_RADEON=y
CONFIG_LCD_CLASS_DEVICE=y
CONFIG_LCD_PLATFORM=m
-CONFIG_BACKLIGHT_GENERIC=m
# CONFIG_VGA_CONSOLE is not set
CONFIG_FRAMEBUFFER_CONSOLE=y
CONFIG_FRAMEBUFFER_CONSOLE_ROTATION=y
diff --git a/arch/mips/configs/mtx1_defconfig b/arch/mips/configs/mtx1_defconfig
index 914af125a7fa..0ef2373404e5 100644
--- a/arch/mips/configs/mtx1_defconfig
+++ b/arch/mips/configs/mtx1_defconfig
@@ -450,7 +450,6 @@ CONFIG_WDT_MTX1=y
# CONFIG_VGA_ARB is not set
# CONFIG_LCD_CLASS_DEVICE is not set
CONFIG_BACKLIGHT_CLASS_DEVICE=y
-# CONFIG_BACKLIGHT_GENERIC is not set
# CONFIG_VGA_CONSOLE is not set
CONFIG_SOUND=m
CONFIG_SND=m
diff --git a/arch/mips/configs/rs90_defconfig b/arch/mips/configs/rs90_defconfig
index dfbb9fed9a42..4f540bb94628 100644
--- a/arch/mips/configs/rs90_defconfig
+++ b/arch/mips/configs/rs90_defconfig
@@ -97,7 +97,6 @@ CONFIG_DRM_FBDEV_OVERALLOC=300
CONFIG_DRM_PANEL_SIMPLE=y
CONFIG_DRM_INGENIC=y
CONFIG_BACKLIGHT_CLASS_DEVICE=y
-# CONFIG_BACKLIGHT_GENERIC is not set
CONFIG_BACKLIGHT_PWM=y
# CONFIG_VGA_CONSOLE is not set
CONFIG_FRAMEBUFFER_CONSOLE=y
--
2.17.1
^ permalink raw reply related
* [PATCH v2 4/5] parisc: configs: drop unused BACKLIGHT_GENERIC option
From: Andrey Zhizhikin @ 2020-12-01 22:29 UTC (permalink / raw)
To: linux, nicolas.ferre, alexandre.belloni, ludovic.desroches, tony,
mripard, wens, jernej.skrabec, thierry.reding, jonathanh,
catalin.marinas, will, tsbogend, James.Bottomley, deller, mpe,
benh, paulus, lee.jones, sam, emil.l.velikov, daniel.thompson,
krzk, linux-arm-kernel, linux-kernel, linux-omap, linux-tegra,
linux-mips, linux-parisc, linuxppc-dev, soc
In-Reply-To: <20201201222922.3183-1-andrey.zhizhikin@leica-geosystems.com>
Commit 7ecdea4a0226 ("backlight: generic_bl: Remove this driver as it is
unused") removed the generic_bl driver from the tree, together with the
corresponding config option.
Remove BACKLIGHT_GENERIC config item from generic-64bit_defconfig.
Fixes: 7ecdea4a0226 ("backlight: generic_bl: Remove this driver as it is unused")
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com>
Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org>
Acked-by: Daniel Thompson <daniel.thompson@linaro.org>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
---
arch/parisc/configs/generic-64bit_defconfig | 1 -
1 file changed, 1 deletion(-)
diff --git a/arch/parisc/configs/generic-64bit_defconfig b/arch/parisc/configs/generic-64bit_defconfig
index 7e2d7026285e..8f81fcbf04c4 100644
--- a/arch/parisc/configs/generic-64bit_defconfig
+++ b/arch/parisc/configs/generic-64bit_defconfig
@@ -191,7 +191,6 @@ CONFIG_DRM=y
CONFIG_DRM_RADEON=y
CONFIG_FIRMWARE_EDID=y
CONFIG_FB_MODE_HELPERS=y
-# CONFIG_BACKLIGHT_GENERIC is not set
CONFIG_FRAMEBUFFER_CONSOLE_ROTATION=y
CONFIG_HIDRAW=y
CONFIG_HID_PID=y
--
2.17.1
^ permalink raw reply related
* [PATCH v2 5/5] powerpc/configs: drop unused BACKLIGHT_GENERIC option
From: Andrey Zhizhikin @ 2020-12-01 22:29 UTC (permalink / raw)
To: linux, nicolas.ferre, alexandre.belloni, ludovic.desroches, tony,
mripard, wens, jernej.skrabec, thierry.reding, jonathanh,
catalin.marinas, will, tsbogend, James.Bottomley, deller, mpe,
benh, paulus, lee.jones, sam, emil.l.velikov, daniel.thompson,
krzk, linux-arm-kernel, linux-kernel, linux-omap, linux-tegra,
linux-mips, linux-parisc, linuxppc-dev, soc
In-Reply-To: <20201201222922.3183-1-andrey.zhizhikin@leica-geosystems.com>
Commit 7ecdea4a0226 ("backlight: generic_bl: Remove this driver as it is
unused") removed the generic_bl driver from the tree, together with the
corresponding config option.
Remove BACKLIGHT_GENERIC config item from powernv_defconfig.
Fixes: 7ecdea4a0226 ("backlight: generic_bl: Remove this driver as it is unused")
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com>
Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org>
Acked-by: Daniel Thompson <daniel.thompson@linaro.org>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
---
arch/powerpc/configs/powernv_defconfig | 1 -
1 file changed, 1 deletion(-)
diff --git a/arch/powerpc/configs/powernv_defconfig b/arch/powerpc/configs/powernv_defconfig
index cf30fc24413b..60a30fffeda0 100644
--- a/arch/powerpc/configs/powernv_defconfig
+++ b/arch/powerpc/configs/powernv_defconfig
@@ -208,7 +208,6 @@ CONFIG_FB_MATROX_G=y
CONFIG_FB_RADEON=m
CONFIG_FB_IBM_GXT4500=m
CONFIG_LCD_PLATFORM=m
-CONFIG_BACKLIGHT_GENERIC=m
# CONFIG_VGA_CONSOLE is not set
CONFIG_LOGO=y
CONFIG_HID_A4TECH=m
--
2.17.1
^ permalink raw reply related
* Re: [PATCH kernel v2] powerpc/pci: Remove LSI mappings on device teardown
From: Alexey Kardashevskiy @ 2020-12-01 22:45 UTC (permalink / raw)
To: Cédric Le Goater, linuxppc-dev
Cc: Frederic Barrat, Oliver O'Halloran
In-Reply-To: <350f6a85-77d8-c0bc-3ba5-f3fd3c50ffe1@kaod.org>
On 01/12/2020 20:31, Cédric Le Goater wrote:
> On 12/1/20 8:39 AM, Alexey Kardashevskiy wrote:
>> From: Oliver O'Halloran <oohall@gmail.com>
>>
>> When a passthrough IO adapter is removed from a pseries machine using hash
>> MMU and the XIVE interrupt mode, the POWER hypervisor expects the guest OS
>> to clear all page table entries related to the adapter. If some are still
>> present, the RTAS call which isolates the PCI slot returns error 9001
>> "valid outstanding translations" and the removal of the IO adapter fails.
>> This is because when the PHBs are scanned, Linux automatically maps the
>> INTx interrupts in the Linux interrupt number space but these are never
>> removed.
>>
>> This problem can be fixed by adding the corresponding unmap operation when
>> the device is removed. There's no pcibios_* hook for the remove case, but
>> the same effect can be achieved using a bus notifier.
>>
>> Because INTx are shared among PHBs (and potentially across the system),
>> this adds tracking of virq to unmap them only when the last user is gone.
>>
>> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
>> [aik: added refcounter]
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>
> Looks good to me and the system survives all the PCI hotplug tests I used
> to do on my first attempts to fix this issue.
>
> One comment below,
>
>> ---
>>
>>
>> Doing this in the generic irq code is just too much for my small brain :-/
>
> may be more cleanups are required in the PCI/MSI/IRQ PPC layers before
> considering your first approach. You think too much in advance !
>
>>
>> ---
>> arch/powerpc/kernel/pci-common.c | 71 ++++++++++++++++++++++++++++++++
>> 1 file changed, 71 insertions(+)
>>
>> diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
>> index be108616a721..0acf17f17253 100644
>> --- a/arch/powerpc/kernel/pci-common.c
>> +++ b/arch/powerpc/kernel/pci-common.c
>> @@ -353,6 +353,55 @@ struct pci_controller *pci_find_controller_for_domain(int domain_nr)
>> return NULL;
>> }
>>
>> +struct pci_intx_virq {
>> + int virq;
>> + struct kref kref;
>> + struct list_head list_node;
>> +};
>> +
>> +static LIST_HEAD(intx_list);
>> +static DEFINE_MUTEX(intx_mutex);
>> +
>> +static void ppc_pci_intx_release(struct kref *kref)
>> +{
>> + struct pci_intx_virq *vi = container_of(kref, struct pci_intx_virq, kref);
>> +
>> + list_del(&vi->list_node);
>> + irq_dispose_mapping(vi->virq);
>> + kfree(vi);
>> +}
>> +
>> +static int ppc_pci_unmap_irq_line(struct notifier_block *nb,
>> + unsigned long action, void *data)
>> +{
>> + struct pci_dev *pdev = to_pci_dev(data);
>> +
>> + if (action == BUS_NOTIFY_DEL_DEVICE) {
>> + struct pci_intx_virq *vi;
>> +
>> + mutex_lock(&intx_mutex);
>> + list_for_each_entry(vi, &intx_list, list_node) {
>> + if (vi->virq == pdev->irq) {
>> + kref_put(&vi->kref, ppc_pci_intx_release);
>> + break;
>> + }
>> + }
>> + mutex_unlock(&intx_mutex);
>> + }
>> +
>> + return NOTIFY_DONE;
>> +}
>> +
>> +static struct notifier_block ppc_pci_unmap_irq_notifier = {
>> + .notifier_call = ppc_pci_unmap_irq_line,
>> +};
>> +
>> +static int ppc_pci_register_irq_notifier(void)
>> +{
>> + return bus_register_notifier(&pci_bus_type, &ppc_pci_unmap_irq_notifier);
>> +}
>> +arch_initcall(ppc_pci_register_irq_notifier);
>> +
>> /*
>> * Reads the interrupt pin to determine if interrupt is use by card.
>> * If the interrupt is used, then gets the interrupt line from the
>> @@ -361,6 +410,12 @@ struct pci_controller *pci_find_controller_for_domain(int domain_nr)
>> static int pci_read_irq_line(struct pci_dev *pci_dev)
>> {
>> int virq;
>> + struct pci_intx_virq *vi, *vitmp;
>> +
>> + /* Preallocate vi as rewind is complex if this fails after mapping */
>
> AFAICT, we only need to call irq_dispose_mapping() if allocation fails.
Today - yes but in the future (hierarchical domains or whatever other
awesome thing we'll use from there) - not necessarily. Too much is
hidden under irq_create_fwspec_mapping(). Thanks,
> If so, it would be simpler to isolate the code in a pci_intx_register(virq)
> helper and call it from pci_read_irq_line().
>
>> + vi = kzalloc(sizeof(struct pci_intx_virq), GFP_KERNEL);
>> + if (!vi)
>> + return -1;
>>
>> pr_debug("PCI: Try to map irq for %s...\n", pci_name(pci_dev));
>>
>> @@ -401,6 +456,22 @@ static int pci_read_irq_line(struct pci_dev *pci_dev)
>>
>> pci_dev->irq = virq;
>>
>> + mutex_lock(&intx_mutex);
>> + list_for_each_entry(vitmp, &intx_list, list_node) {
>> + if (vitmp->virq == virq) {
>> + kref_get(&vitmp->kref);
>> + kfree(vi);
>> + vi = NULL;
>> + break;
>> + }
>> + }
>> + if (vi) {
>> + vi->virq = virq;
>> + kref_init(&vi->kref);
>> + list_add_tail(&vi->list_node, &intx_list);
>> + }
>> + mutex_unlock(&intx_mutex);
>> +
>> return 0;
>> }
>>
>>
>
--
Alexey
^ permalink raw reply
* Re: [PATCH 6/8] lazy tlb: shoot lazies, a non-refcounting lazy tlb option
From: Will Deacon @ 2020-12-01 23:04 UTC (permalink / raw)
To: Andy Lutomirski
Cc: linux-arch, Arnd Bergmann, Vasily Gorbik, Christian Borntraeger,
Peter Zijlstra, Catalin Marinas, Heiko Carstens, X86 ML, LKML,
Nicholas Piggin, Linux-MM, Dave Hansen, Mathieu Desnoyers,
linuxppc-dev
In-Reply-To: <CALCETrVP3qAQ50yHU-AzZQsiRB9JGO5FQf91kuk7DCvNY51EXQ@mail.gmail.com>
On Tue, Dec 01, 2020 at 01:50:38PM -0800, Andy Lutomirski wrote:
> On Tue, Dec 1, 2020 at 1:28 PM Will Deacon <will@kernel.org> wrote:
> >
> > On Mon, Nov 30, 2020 at 10:31:51AM -0800, Andy Lutomirski wrote:
> > > other arch folk: there's some background here:
> > >
> > > https://lkml.kernel.org/r/CALCETrVXUbe8LfNn-Qs+DzrOQaiw+sFUg1J047yByV31SaTOZw@mail.gmail.com
> > >
> > > On Sun, Nov 29, 2020 at 12:16 PM Andy Lutomirski <luto@kernel.org> wrote:
> > > >
> > > > On Sat, Nov 28, 2020 at 7:54 PM Andy Lutomirski <luto@kernel.org> wrote:
> > > > >
> > > > > On Sat, Nov 28, 2020 at 8:02 AM Nicholas Piggin <npiggin@gmail.com> wrote:
> > > > > >
> > > > > > > On big systems, the mm refcount can become highly contended when doing
> > > > > > a lot of context switching with threaded applications (particularly
> > > > > > switching between the idle thread and an application thread).
> > > > > >
> > > > > > Abandoning lazy tlb slows switching down quite a bit in the important
> > > > > > > user->idle->user cases, so instead implement a non-refcounted scheme
> > > > > > that causes __mmdrop() to IPI all CPUs in the mm_cpumask and shoot down
> > > > > > any remaining lazy ones.
> > > > > >
> > > > > > Shootdown IPIs are some concern, but they have not been observed to be
> > > > > > a big problem with this scheme (the powerpc implementation generated
> > > > > > 314 additional interrupts on a 144 CPU system during a kernel compile).
> > > > > > There are a number of strategies that could be employed to reduce IPIs
> > > > > > if they turn out to be a problem for some workload.
> > > > >
> > > > > I'm still wondering whether we can do even better.
> > > > >
> > > >
> > > > Hold on a sec.. __mmput() unmaps VMAs, frees pagetables, and flushes
> > > > the TLB. On x86, this will shoot down all lazies as long as even a
> > > > single pagetable was freed. (Or at least it will if we don't have a
> > > > serious bug, but the code seems okay. We'll hit pmd_free_tlb, which
> > > > sets tlb->freed_tables, which will trigger the IPI.) So, on
> > > > architectures like x86, the shootdown approach should be free. The
> > > > only way it ought to have any excess IPIs is if we have CPUs in
> > > > mm_cpumask() that don't need IPI to free pagetables, which could
> > > > happen on paravirt.
> > >
> > > Indeed, on x86, we do this:
> > >
> > > [ 11.558844] flush_tlb_mm_range.cold+0x18/0x1d
> > > [ 11.559905] tlb_finish_mmu+0x10e/0x1a0
> > > [ 11.561068] exit_mmap+0xc8/0x1a0
> > > [ 11.561932] mmput+0x29/0xd0
> > > [ 11.562688] do_exit+0x316/0xa90
> > > [ 11.563588] do_group_exit+0x34/0xb0
> > > [ 11.564476] __x64_sys_exit_group+0xf/0x10
> > > [ 11.565512] do_syscall_64+0x34/0x50
> > >
> > > and we have info->freed_tables set.
> > >
> > > What are the architectures that have large systems like?
> > >
> > > x86: we already zap lazies, so it should cost basically nothing to do
> > > a little loop at the end of __mmput() to make sure that no lazies are
> > > left. If we care about paravirt performance, we could implement one
> > > of the optimizations I mentioned above to fix up the refcounts instead
> > > of sending an IPI to any remaining lazies.
> > >
> > > arm64: AFAICT arm64's flush uses magic arm64 hardware support for
> > > remote flushes, so any lazy mm references will still exist after
> > > exit_mmap(). (arm64 uses lazy TLB, right?) So this is kind of like
> > > the x86 paravirt case. Are there large enough arm64 systems that any
> > > of this matters?
> >
> > Yes, there are large arm64 systems where performance of TLB invalidation
> > matters, but they're either niche (supercomputers) or not readily available
> > (NUMA boxes).
> >
> > But anyway, we blow away the TLB for everybody in tlb_finish_mmu() after
> > freeing the page-tables. We have an optimisation to avoid flushing if
> > we're just unmapping leaf entries when the mm is going away, but we don't
> > have a choice once we get to actually reclaiming the page-tables.
> >
> > One thing I probably should mention, though, is that we don't maintain
> > mm_cpumask() because we're not able to benefit from it and the atomic
> > update is a waste of time.
>
> Do you do anything special for lazy TLB or do you just use the generic
> code? (i.e. where do your user pagetables point when you go from a
> user task to idle or to a kernel thread?)
We don't do anything special (there's something funny with the PAN emulation
but you can ignore that); the page-table just points wherever it did before
for userspace. Switching explicitly to the init_mm, however, causes us to
unmap userspace entirely.
Since we have ASIDs, switch_mm() generally doesn't have to care about the
TLBs at all.
> Do you end up with all cpus set in mm_cpumask or can you have the mm
> loaded on a CPU that isn't in mm_cpumask?
I think the mask is always zero (we never set anything in there).
Will
^ permalink raw reply
* [PATCH kernel v3] powerpc/pci: Remove LSI mappings on device teardown
From: Alexey Kardashevskiy @ 2020-12-02 0:52 UTC (permalink / raw)
To: linuxppc-dev
Cc: Alexey Kardashevskiy, Frederic Barrat, Oliver O'Halloran,
Cédric Le Goater
From: Oliver O'Halloran <oohall@gmail.com>
When a passthrough IO adapter is removed from a pseries machine using hash
MMU and the XIVE interrupt mode, the POWER hypervisor expects the guest OS
to clear all page table entries related to the adapter. If some are still
present, the RTAS call which isolates the PCI slot returns error 9001
"valid outstanding translations" and the removal of the IO adapter fails.
This is because when the PHBs are scanned, Linux automatically maps the
INTx interrupts in the Linux interrupt number space but these are never
removed.
This problem can be fixed by adding the corresponding unmap operation when
the device is removed. There's no pcibios_* hook for the remove case, but
the same effect can be achieved using a bus notifier.
Because INTx are shared among PHBs (and potentially across the system),
this adds tracking of virq to unmap them only when the last user is gone.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
[aik: added refcounter]
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
Changes:
v3:
* free @vi on error path
v2:
* added refcounter
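
The "unmap only when the last user is gone" rule can be tried out in user space with the simplified sketch below. It is an illustration under stated assumptions, not the patch itself: struct intx_ref, intx_get(), intx_put() and disposed_count are hypothetical names, a plain counter stands in for struct kref, a singly linked list stands in for list_head, and there is no locking (the patch serialises with intx_mutex).

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical user-space stand-in for struct pci_intx_virq. */
struct intx_ref {
	int virq;
	unsigned int refs;
	struct intx_ref *next;
};

static struct intx_ref *intx_head;
static int disposed_count;	/* counts simulated irq_dispose_mapping() calls */

/* Take a reference on virq; the first user creates the tracking entry. */
static int intx_get(int virq)
{
	struct intx_ref *r;

	for (r = intx_head; r; r = r->next) {
		if (r->virq == virq) {
			r->refs++;
			return 0;
		}
	}

	r = calloc(1, sizeof(*r));
	if (!r)
		return -1;
	r->virq = virq;
	r->refs = 1;
	r->next = intx_head;
	intx_head = r;
	return 0;
}

/*
 * Drop a reference on virq. Returns 1 if this was the last user and the
 * (simulated) mapping was disposed, 0 otherwise or if virq is not tracked.
 */
static int intx_put(int virq)
{
	struct intx_ref **pp, *r;

	for (pp = &intx_head; (r = *pp) != NULL; pp = &r->next) {
		if (r->virq != virq)
			continue;
		assert(r->refs > 0);
		if (--r->refs)
			return 0;
		/* Last user gone: unlink, "dispose" and free the entry. */
		*pp = r->next;
		free(r);
		disposed_count++;
		return 1;
	}
	return 0;
}
```

Two devices sharing one virq each call intx_get(); the first intx_put() leaves the mapping in place and only the second one disposes it.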
---
arch/powerpc/kernel/pci-common.c | 82 ++++++++++++++++++++++++++++++--
1 file changed, 78 insertions(+), 4 deletions(-)
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index be108616a721..2b555997b295 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -353,6 +353,55 @@ struct pci_controller *pci_find_controller_for_domain(int domain_nr)
return NULL;
}
+struct pci_intx_virq {
+ int virq;
+ struct kref kref;
+ struct list_head list_node;
+};
+
+static LIST_HEAD(intx_list);
+static DEFINE_MUTEX(intx_mutex);
+
+static void ppc_pci_intx_release(struct kref *kref)
+{
+ struct pci_intx_virq *vi = container_of(kref, struct pci_intx_virq, kref);
+
+ list_del(&vi->list_node);
+ irq_dispose_mapping(vi->virq);
+ kfree(vi);
+}
+
+static int ppc_pci_unmap_irq_line(struct notifier_block *nb,
+ unsigned long action, void *data)
+{
+ struct pci_dev *pdev = to_pci_dev(data);
+
+ if (action == BUS_NOTIFY_DEL_DEVICE) {
+ struct pci_intx_virq *vi;
+
+ mutex_lock(&intx_mutex);
+ list_for_each_entry(vi, &intx_list, list_node) {
+ if (vi->virq == pdev->irq) {
+ kref_put(&vi->kref, ppc_pci_intx_release);
+ break;
+ }
+ }
+ mutex_unlock(&intx_mutex);
+ }
+
+ return NOTIFY_DONE;
+}
+
+static struct notifier_block ppc_pci_unmap_irq_notifier = {
+ .notifier_call = ppc_pci_unmap_irq_line,
+};
+
+static int ppc_pci_register_irq_notifier(void)
+{
+ return bus_register_notifier(&pci_bus_type, &ppc_pci_unmap_irq_notifier);
+}
+arch_initcall(ppc_pci_register_irq_notifier);
+
/*
* Reads the interrupt pin to determine if interrupt is use by card.
* If the interrupt is used, then gets the interrupt line from the
@@ -361,6 +410,12 @@ struct pci_controller *pci_find_controller_for_domain(int domain_nr)
static int pci_read_irq_line(struct pci_dev *pci_dev)
{
int virq;
+ struct pci_intx_virq *vi, *vitmp;
+
+ /* Preallocate vi as rewind is complex if this fails after mapping */
+ vi = kzalloc(sizeof(struct pci_intx_virq), GFP_KERNEL);
+ if (!vi)
+ return -1;
pr_debug("PCI: Try to map irq for %s...\n", pci_name(pci_dev));
@@ -377,12 +432,12 @@ static int pci_read_irq_line(struct pci_dev *pci_dev)
* function.
*/
if (pci_read_config_byte(pci_dev, PCI_INTERRUPT_PIN, &pin))
- return -1;
+ goto error_exit;
if (pin == 0)
- return -1;
+ goto error_exit;
if (pci_read_config_byte(pci_dev, PCI_INTERRUPT_LINE, &line) ||
line == 0xff || line == 0) {
- return -1;
+ goto error_exit;
}
pr_debug(" No map ! Using line %d (pin %d) from PCI config\n",
line, pin);
@@ -394,14 +449,33 @@ static int pci_read_irq_line(struct pci_dev *pci_dev)
if (!virq) {
pr_debug(" Failed to map !\n");
- return -1;
+ goto error_exit;
}
pr_debug(" Mapped to linux irq %d\n", virq);
pci_dev->irq = virq;
+ mutex_lock(&intx_mutex);
+ list_for_each_entry(vitmp, &intx_list, list_node) {
+ if (vitmp->virq == virq) {
+ kref_get(&vitmp->kref);
+ kfree(vi);
+ vi = NULL;
+ break;
+ }
+ }
+ if (vi) {
+ vi->virq = virq;
+ kref_init(&vi->kref);
+ list_add_tail(&vi->list_node, &intx_list);
+ }
+ mutex_unlock(&intx_mutex);
+
return 0;
+error_exit:
+ kfree(vi);
+ return -1;
}
/*
--
2.17.1
^ permalink raw reply related
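For readers skimming the diff above: the INTx bookkeeping keeps one list entry per virq and counts how many devices share it, so the mapping is disposed of only when the last user goes away. Below is a minimal userspace sketch of that idea. All names (virq_ref, virq_get, virq_put) are hypothetical, a plain counter stands in for struct kref, and locking is omitted. Unlike the patch, the sketch allocates on a list miss instead of preallocating before the mapping is made.

```c
#include <assert.h>
#include <stdlib.h>

/* One entry per mapped virq; 'refs' stands in for the kernel's struct kref. */
struct virq_ref {
    int virq;
    unsigned refs;
    struct virq_ref *next;
};

static struct virq_ref *virq_list;

/* Record one more user of 'virq'. Returns 0 on success, -1 on allocation failure. */
static int virq_get(int virq)
{
    struct virq_ref *v;

    assert(virq > 0); /* the patch treats virq 0 as "failed to map" */

    for (v = virq_list; v; v = v->next) {
        if (v->virq == virq) {
            v->refs++;          /* already tracked: share the entry */
            return 0;
        }
    }

    v = malloc(sizeof(*v));
    if (!v)
        return -1;
    v->virq = virq;
    v->refs = 1;
    v->next = virq_list;
    virq_list = v;
    return 0;
}

/* Drop one user of 'virq'. Returns 1 if this was the last user and the
 * entry was released (where the kernel code would dispose of the mapping),
 * 0 otherwise. */
static int virq_put(int virq)
{
    struct virq_ref **pp, *v;

    for (pp = &virq_list; (v = *pp) != NULL; pp = &v->next) {
        if (v->virq != virq)
            continue;
        if (--v->refs > 0)
            return 0;           /* other devices still use this virq */
        *pp = v->next;          /* unlink, then free */
        free(v);
        return 1;
    }
    return 0;                   /* unknown virq: nothing to do */
}
```

Two devices sharing a line call virq_get() twice; only the second virq_put() reports that the mapping can be released.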
* [PATCH v2 06/17] ibmvfc: add handlers to drain and complete Sub-CRQ responses
From: Tyrel Datwyler @ 2020-12-02 0:53 UTC (permalink / raw)
To: james.bottomley
Cc: Tyrel Datwyler, martin.petersen, linux-scsi, linux-kernel, brking,
linuxppc-dev
In-Reply-To: <20201202005329.4538-1-tyreld@linux.ibm.com>
The logic for iterating over the Sub-CRQ responses is similar to that
of the primary CRQ. Add the necessary handlers for processing those
responses.
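As a rough illustration of the drain logic this patch adds, here is a userspace sketch of the same pattern: consume every valid ring entry, re-enable the interrupt, then check once more for an entry that arrived in the window before the interrupt was re-enabled. The types and names (ring, ring_next, ring_drain) and the ring size are hypothetical; the interrupt toggle is modelled as a plain flag and the memory barriers are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE   4     /* hypothetical; the driver derives this from PAGE_SIZE */
#define ENTRY_VALID 0x80  /* top bit of 'valid' marks an entry the producer filled */

struct entry {
    uint8_t valid;
    uint64_t token;
};

struct ring {
    struct entry msgs[RING_SIZE];
    size_t cur;           /* next slot the consumer will look at */
    int irq_enabled;      /* stand-in for the hypervisor interrupt toggle */
    unsigned handled;     /* number of entries processed so far */
};

/* Return the next valid entry and advance the cursor, or NULL if none. */
static struct entry *ring_next(struct ring *r)
{
    struct entry *e = &r->msgs[r->cur];

    if (!(e->valid & ENTRY_VALID))
        return NULL;
    if (++r->cur == RING_SIZE)
        r->cur = 0;       /* wrap around */
    return e;
}

/* Process one entry, then hand the slot back to the producer. */
static void ring_consume(struct ring *r, struct entry *e)
{
    r->handled++;
    e->valid = 0;
}

static void ring_drain(struct ring *r)
{
    struct entry *e;
    int done = 0;

    assert(r != NULL);

    while (!done) {
        while ((e = ring_next(r)) != NULL)
            ring_consume(r, e);

        /* Re-enable the interrupt, then re-check: an entry may have been
         * posted after the last poll but before the interrupt was back on. */
        r->irq_enabled = 1;
        e = ring_next(r);
        if (e != NULL) {
            r->irq_enabled = 0;
            ring_consume(r, e);
        } else {
            done = 1;
        }
    }
}
```

The second check after re-enabling the interrupt is what closes the race; without it, a response posted in that window would wait for the next unrelated interrupt.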
Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
---
drivers/scsi/ibmvscsi/ibmvfc.c | 77 ++++++++++++++++++++++++++++++++++
1 file changed, 77 insertions(+)
diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index 97f00fefa809..e9da3f60c793 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -3381,6 +3381,83 @@ static int ibmvfc_toggle_scrq_irq(struct ibmvfc_sub_queue *scrq, int enable)
return rc;
}
+static void ibmvfc_handle_scrq(struct ibmvfc_crq *crq, struct ibmvfc_host *vhost)
+{
+ struct ibmvfc_event *evt = (struct ibmvfc_event *)be64_to_cpu(crq->ioba);
+ unsigned long flags;
+
+ switch (crq->valid) {
+ case IBMVFC_CRQ_CMD_RSP:
+ break;
+ case IBMVFC_CRQ_XPORT_EVENT:
+ return;
+ default:
+ dev_err(vhost->dev, "Got an invalid message type 0x%02x\n", crq->valid);
+ return;
+ }
+
+ /* The only kind of payload CRQs we should get are responses to
+ * things we send. Make sure this response is to something we
+ * actually sent
+ */
+ if (unlikely(!ibmvfc_valid_event(&vhost->pool, evt))) {
+ dev_err(vhost->dev, "Returned correlation_token 0x%08llx is invalid!\n",
+ crq->ioba);
+ return;
+ }
+
+ if (unlikely(atomic_read(&evt->free))) {
+ dev_err(vhost->dev, "Received duplicate correlation_token 0x%08llx!\n",
+ crq->ioba);
+ return;
+ }
+
+ spin_lock_irqsave(vhost->host->host_lock, flags);
+ del_timer(&evt->timer);
+ list_del(&evt->queue);
+ ibmvfc_trc_end(evt);
+ spin_unlock_irqrestore(vhost->host->host_lock, flags);
+ evt->done(evt);
+}
+
+static struct ibmvfc_crq *ibmvfc_next_scrq(struct ibmvfc_sub_queue *scrq)
+{
+ struct ibmvfc_crq *crq;
+
+ crq = &scrq->msgs[scrq->cur].crq;
+ if (crq->valid & 0x80) {
+ if (++scrq->cur == scrq->size)
+ scrq->cur = 0;
+ rmb();
+ } else
+ crq = NULL;
+
+ return crq;
+}
+
+static void ibmvfc_drain_sub_crq(struct ibmvfc_sub_queue *scrq)
+{
+ struct ibmvfc_crq *crq;
+ int done = 0;
+
+ while (!done) {
+ while ((crq = ibmvfc_next_scrq(scrq)) != NULL) {
+ ibmvfc_handle_scrq(crq, scrq->vhost);
+ crq->valid = 0;
+ wmb();
+ }
+
+ ibmvfc_toggle_scrq_irq(scrq, 1);
+ if ((crq = ibmvfc_next_scrq(scrq)) != NULL) {
+ ibmvfc_toggle_scrq_irq(scrq, 0);
+ ibmvfc_handle_scrq(crq, scrq->vhost);
+ crq->valid = 0;
+ wmb();
+ } else
+ done = 1;
+ }
+}
+
/**
* ibmvfc_init_tgt - Set the next init job step for the target
* @tgt: ibmvfc target struct
--
2.27.0
^ permalink raw reply related
* [PATCH v2 04/17] ibmvfc: add alloc/dealloc routines for SCSI Sub-CRQ Channels
From: Tyrel Datwyler @ 2020-12-02 0:53 UTC (permalink / raw)
To: james.bottomley
Cc: Tyrel Datwyler, martin.petersen, linux-scsi, linux-kernel, brking,
linuxppc-dev
In-Reply-To: <20201202005329.4538-1-tyreld@linux.ibm.com>
Allocate a set of Sub-CRQs in advance. During channel setup the client
and VIOS negotiate the number of queues the VIOS supports and the number
that the client desires to request. It's possible that fewer channel
resources are allocated than requested, but the client is still
responsible for sending handles for every queue it is hoping for.
Also, provide deallocation cleanup routines.
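The allocate-everything-or-roll-back shape of the init routine can be sketched in userspace as follows. This is only a model under stated assumptions: NUM_QUEUES stands in for IBMVFC_SCSI_HW_QUEUES, calloc() stands in for the page allocation, DMA mapping and hypercall, and the fail_at parameter is a hypothetical hook for injecting a registration failure.

```c
#include <assert.h>
#include <stdlib.h>

#define NUM_QUEUES 4  /* hypothetical stand-in for IBMVFC_SCSI_HW_QUEUES */

struct queue {
    void *msgs;
};

struct channels {
    struct queue *queues;
    int live;         /* number of currently registered queues */
};

/* Register one queue; 'fail_at' injects a failure at that index. */
static int queue_register(struct channels *c, int index, int fail_at)
{
    if (index == fail_at)
        return -1;
    c->queues[index].msgs = calloc(1, 4096);
    if (!c->queues[index].msgs)
        return -1;
    c->live++;
    return 0;
}

static void queue_deregister(struct channels *c, int index)
{
    free(c->queues[index].msgs);
    c->queues[index].msgs = NULL;
    c->live--;
}

/* Allocate all queues up front; on any failure undo the ones already
 * registered, in reverse order, and leave the pointer NULL. */
static int channels_init(struct channels *c, int fail_at)
{
    int i, j;

    assert(c != NULL);

    c->live = 0;
    c->queues = calloc(NUM_QUEUES, sizeof(*c->queues));
    if (!c->queues)
        return -1;

    for (i = 0; i < NUM_QUEUES; i++) {
        if (queue_register(c, i, fail_at)) {
            for (j = i; j > 0; j--)
                queue_deregister(c, j - 1);
            free(c->queues);
            c->queues = NULL;
            return -1;
        }
    }
    return 0;
}

/* Safe to call when init failed or release already ran. */
static void channels_release(struct channels *c)
{
    int i;

    if (!c->queues)
        return;
    for (i = 0; i < NUM_QUEUES; i++)
        queue_deregister(c, i);
    free(c->queues);
    c->queues = NULL;
}
```

Setting the array pointer back to NULL after freeing is what lets the release path (and the CRQ reset path) test the pointer instead of tracking a separate "initialized" flag.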
Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
---
drivers/scsi/ibmvscsi/ibmvfc.c | 128 +++++++++++++++++++++++++++++++++
drivers/scsi/ibmvscsi/ibmvfc.h | 1 +
2 files changed, 129 insertions(+)
diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index 64674054dbae..4860487c6779 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -793,6 +793,8 @@ static int ibmvfc_reset_crq(struct ibmvfc_host *vhost)
unsigned long flags;
struct vio_dev *vdev = to_vio_dev(vhost->dev);
struct ibmvfc_crq_queue *crq = &vhost->crq;
+ struct ibmvfc_sub_queue *scrq;
+ int i;
/* Close the CRQ */
do {
@@ -809,6 +811,14 @@ static int ibmvfc_reset_crq(struct ibmvfc_host *vhost)
memset(crq->msgs, 0, PAGE_SIZE);
crq->cur = 0;
+ if (vhost->scsi_scrqs.scrqs) {
+ for (i = 0; i < IBMVFC_SCSI_HW_QUEUES; i++) {
+ scrq = &vhost->scsi_scrqs.scrqs[i];
+ memset(scrq->msgs, 0, PAGE_SIZE);
+ scrq->cur = 0;
+ }
+ }
+
/* And re-open it again */
rc = plpar_hcall_norets(H_REG_CRQ, vdev->unit_address,
crq->msg_token, PAGE_SIZE);
@@ -4983,6 +4993,117 @@ static int ibmvfc_init_crq(struct ibmvfc_host *vhost)
return retrc;
}
+static int ibmvfc_register_scsi_channel(struct ibmvfc_host *vhost,
+ int index)
+{
+ struct device *dev = vhost->dev;
+ struct vio_dev *vdev = to_vio_dev(dev);
+ struct ibmvfc_sub_queue *scrq = &vhost->scsi_scrqs.scrqs[index];
+ int rc = -ENOMEM;
+
+ ENTER;
+
+ scrq->msgs = (struct ibmvfc_sub_crq *)get_zeroed_page(GFP_KERNEL);
+ if (!scrq->msgs)
+ return rc;
+
+ scrq->size = PAGE_SIZE / sizeof(*scrq->msgs);
+ scrq->msg_token = dma_map_single(dev, scrq->msgs, PAGE_SIZE,
+ DMA_BIDIRECTIONAL);
+
+ if (dma_mapping_error(dev, scrq->msg_token))
+ goto dma_map_failed;
+
+ rc = h_reg_sub_crq(vdev->unit_address, scrq->msg_token, PAGE_SIZE,
+ &scrq->cookie, &scrq->hw_irq);
+
+ if (rc) {
+ dev_warn(dev, "Error registering sub-crq: %d\n", rc);
+ dev_warn(dev, "Firmware may not support MQ\n");
+ goto reg_failed;
+ }
+
+ scrq->hwq_id = index;
+ scrq->vhost = vhost;
+
+ LEAVE;
+ return 0;
+
+reg_failed:
+ dma_unmap_single(dev, scrq->msg_token, PAGE_SIZE, DMA_BIDIRECTIONAL);
+dma_map_failed:
+ free_page((unsigned long)scrq->msgs);
+ LEAVE;
+ return rc;
+}
+
+static void ibmvfc_deregister_scsi_channel(struct ibmvfc_host *vhost, int index)
+{
+ struct device *dev = vhost->dev;
+ struct vio_dev *vdev = to_vio_dev(dev);
+ struct ibmvfc_sub_queue *scrq = &vhost->scsi_scrqs.scrqs[index];
+ long rc;
+
+ ENTER;
+
+ do {
+ rc = plpar_hcall_norets(H_FREE_SUB_CRQ, vdev->unit_address,
+ scrq->cookie);
+ } while (rc == H_BUSY || H_IS_LONG_BUSY(rc));
+
+ if (rc)
+ dev_err(dev, "Failed to free sub-crq[%d]: rc=%ld\n", index, rc);
+
+ dma_unmap_single(dev, scrq->msg_token, PAGE_SIZE, DMA_BIDIRECTIONAL);
+ free_page((unsigned long)scrq->msgs);
+ LEAVE;
+}
+
+static int ibmvfc_init_sub_crqs(struct ibmvfc_host *vhost)
+{
+ int i, j;
+
+ ENTER;
+
+ vhost->scsi_scrqs.scrqs = kcalloc(IBMVFC_SCSI_HW_QUEUES,
+ sizeof(*vhost->scsi_scrqs.scrqs),
+ GFP_KERNEL);
+ if (!vhost->scsi_scrqs.scrqs)
+ return -1;
+
+ for (i = 0; i < IBMVFC_SCSI_HW_QUEUES; i++) {
+ if (ibmvfc_register_scsi_channel(vhost, i)) {
+ for (j = i; j > 0; j--)
+ ibmvfc_deregister_scsi_channel(vhost, j - 1);
+ kfree(vhost->scsi_scrqs.scrqs);
+ vhost->scsi_scrqs.scrqs = NULL;
+ vhost->scsi_scrqs.active_queues = 0;
+ LEAVE;
+ return -1;
+ }
+ }
+
+ LEAVE;
+ return 0;
+}
+
+static void ibmvfc_release_sub_crqs(struct ibmvfc_host *vhost)
+{
+ int i;
+
+ ENTER;
+ if (!vhost->scsi_scrqs.scrqs)
+ return;
+
+ for (i = 0; i < IBMVFC_SCSI_HW_QUEUES; i++)
+ ibmvfc_deregister_scsi_channel(vhost, i);
+
+ kfree(vhost->scsi_scrqs.scrqs);
+ vhost->scsi_scrqs.scrqs = NULL;
+ vhost->scsi_scrqs.active_queues = 0;
+ LEAVE;
+}
+
/**
* ibmvfc_free_mem - Free memory for vhost
* @vhost: ibmvfc host struct
@@ -5239,6 +5360,12 @@ static int ibmvfc_probe(struct vio_dev *vdev, const struct vio_device_id *id)
goto remove_shost;
}
+ if (vhost->mq_enabled) {
+ rc = ibmvfc_init_sub_crqs(vhost);
+ if (rc)
+ dev_warn(dev, "Failed to allocate Sub-CRQs. rc=%d\n", rc);
+ }
+
if (shost_to_fc_host(shost)->rqst_q)
blk_queue_max_segments(shost_to_fc_host(shost)->rqst_q, 1);
dev_set_drvdata(dev, vhost);
@@ -5296,6 +5423,7 @@ static int ibmvfc_remove(struct vio_dev *vdev)
ibmvfc_purge_requests(vhost, DID_ERROR);
spin_unlock_irqrestore(vhost->host->host_lock, flags);
ibmvfc_free_event_pool(vhost);
+ ibmvfc_release_sub_crqs(vhost);
ibmvfc_free_mem(vhost);
spin_lock(&ibmvfc_driver_lock);
diff --git a/drivers/scsi/ibmvscsi/ibmvfc.h b/drivers/scsi/ibmvscsi/ibmvfc.h
index b3cd35cbf067..986ce4530382 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.h
+++ b/drivers/scsi/ibmvscsi/ibmvfc.h
@@ -838,6 +838,7 @@ struct ibmvfc_host {
mempool_t *tgt_pool;
struct ibmvfc_crq_queue crq;
struct ibmvfc_async_crq_queue async_crq;
+ struct ibmvfc_scsi_channels scsi_scrqs;
struct ibmvfc_npiv_login login_info;
union ibmvfc_npiv_login_data *login_buf;
dma_addr_t login_buf_dma;
--
2.27.0
^ permalink raw reply related
* [PATCH v2 00/17] ibmvfc: initial MQ development
From: Tyrel Datwyler @ 2020-12-02 0:53 UTC (permalink / raw)
To: james.bottomley
Cc: Tyrel Datwyler, martin.petersen, linux-scsi, linux-kernel, brking,
linuxppc-dev
Recent updates in pHyp Firmware and VIOS releases provide new infrastructure
towards enabling Subordinate Command Response Queues (Sub-CRQs) such that each
Sub-CRQ is a channel backed by an actual hardware queue in the FC stack on the
partner VIOS. Sub-CRQs are registered with the firmware via hypercalls and then
negotiated with the VIOS via new Management Datagrams (MADs) for channel setup.
This initial implementation adds the necessary Sub-CRQ framework and implements
the new MADs for negotiating and assigning a set of Sub-CRQs to associated VIOS
HW backed channels. The event pool and locking still leverage the legacy single
queue implementation, and as such lock contention is problematic when increasing
the number of queues. However, this initial work demonstrates a 1.2x
increase in IOPS when configured with two HW queues despite lock contention.
changes in v2:
* Patch 4: NULL'd scsi_scrq reference after deallocation [brking]
* Patch 6: Added switch case to handle XPORT event [brking]
* Patch 9: fixed ibmvfc_event leak and double free [brking]
* added support for cancel command with MQ
* added parameter toggles for MQ settings
Tyrel Datwyler (17):
ibmvfc: add vhost fields and defaults for MQ enablement
ibmvfc: define hcall wrapper for registering a Sub-CRQ
ibmvfc: add Subordinate CRQ definitions
ibmvfc: add alloc/dealloc routines for SCSI Sub-CRQ Channels
ibmvfc: add Sub-CRQ IRQ enable/disable routine
ibmvfc: add handlers to drain and complete Sub-CRQ responses
ibmvfc: define Sub-CRQ interrupt handler routine
ibmvfc: map/request irq and register Sub-CRQ interrupt handler
ibmvfc: implement channel enquiry and setup commands
ibmvfc: advertise client support for using hardware channels
ibmvfc: set and track hw queue in ibmvfc_event struct
ibmvfc: send commands down HW Sub-CRQ when channelized
ibmvfc: register Sub-CRQ handles with VIOS during channel setup
ibmvfc: add cancel mad initialization helper
ibmvfc: send Cancel MAD down each hw scsi channel
ibmvfc: enable MQ and set reasonable defaults
ibmvfc: provide modules parameters for MQ settings
drivers/scsi/ibmvscsi/ibmvfc.c | 706 +++++++++++++++++++++++++++++----
drivers/scsi/ibmvscsi/ibmvfc.h | 41 +-
2 files changed, 675 insertions(+), 72 deletions(-)
--
2.27.0
^ permalink raw reply
* [PATCH v2 12/17] ibmvfc: send commands down HW Sub-CRQ when channelized
From: Tyrel Datwyler @ 2020-12-02 0:53 UTC (permalink / raw)
To: james.bottomley
Cc: Tyrel Datwyler, martin.petersen, linux-scsi, linux-kernel,
Brian King, brking, linuxppc-dev
In-Reply-To: <20201202005329.4538-1-tyreld@linux.ibm.com>
When the client has negotiated the use of channels, all vfcFrames are
required to go down a Sub-CRQ channel; anything else is a protocol
violation. If the adapter state is channelized, submit vfcFrames to the
appropriate Sub-CRQ via the h_send_sub_crq() helper.
Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Reviewed-by: Brian King <brking@linux.vnet.ibm.com>
---
drivers/scsi/ibmvscsi/ibmvfc.c | 32 +++++++++++++++++++++++++++-----
1 file changed, 27 insertions(+), 5 deletions(-)
diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index 4555775ea74b..3bb20bfdaf4b 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -701,6 +701,15 @@ static int ibmvfc_send_crq(struct ibmvfc_host *vhost, u64 word1, u64 word2)
return plpar_hcall_norets(H_SEND_CRQ, vdev->unit_address, word1, word2);
}
+static int ibmvfc_send_sub_crq(struct ibmvfc_host *vhost, u64 cookie, u64 word1,
+ u64 word2, u64 word3, u64 word4)
+{
+ struct vio_dev *vdev = to_vio_dev(vhost->dev);
+
+ return plpar_hcall_norets(H_SEND_SUB_CRQ, vdev->unit_address, cookie,
+ word1, word2, word3, word4);
+}
+
/**
* ibmvfc_send_crq_init - Send a CRQ init message
* @vhost: ibmvfc host struct
@@ -1513,15 +1522,19 @@ static int ibmvfc_send_event(struct ibmvfc_event *evt,
struct ibmvfc_host *vhost, unsigned long timeout)
{
__be64 *crq_as_u64 = (__be64 *) &evt->crq;
+ int channel_cmd = 0;
int rc;
/* Copy the IU into the transfer area */
*evt->xfer_iu = evt->iu;
- if (evt->crq.format == IBMVFC_CMD_FORMAT)
+ if (evt->crq.format == IBMVFC_CMD_FORMAT) {
evt->xfer_iu->cmd.tag = cpu_to_be64((u64)evt);
- else if (evt->crq.format == IBMVFC_MAD_FORMAT)
+ channel_cmd = 1;
+ } else if (evt->crq.format == IBMVFC_MAD_FORMAT) {
evt->xfer_iu->mad_common.tag = cpu_to_be64((u64)evt);
- else
+ if (evt->xfer_iu->mad_common.opcode == IBMVFC_TMF_MAD)
+ channel_cmd = 1;
+ } else
BUG();
list_add_tail(&evt->queue, &vhost->sent);
@@ -1534,8 +1547,17 @@ static int ibmvfc_send_event(struct ibmvfc_event *evt,
mb();
- if ((rc = ibmvfc_send_crq(vhost, be64_to_cpu(crq_as_u64[0]),
- be64_to_cpu(crq_as_u64[1])))) {
+ if (vhost->using_channels && channel_cmd)
+ rc = ibmvfc_send_sub_crq(vhost,
+ vhost->scsi_scrqs.scrqs[evt->hwq].vios_cookie,
+ be64_to_cpu(crq_as_u64[0]),
+ be64_to_cpu(crq_as_u64[1]),
+ 0, 0);
+ else
+ rc = ibmvfc_send_crq(vhost, be64_to_cpu(crq_as_u64[0]),
+ be64_to_cpu(crq_as_u64[1]));
+
+ if (rc) {
list_del(&evt->queue);
del_timer(&evt->timer);
--
2.27.0
^ permalink raw reply related