* [PATCH v10 10/18] KVM: arm64: Optimise FPSIMD handling to reduce guest/host thrashing
From: Alex Bennée @ 2018-05-24 10:09 UTC
To: linux-arm-kernel
In-Reply-To: <1527005119-6842-11-git-send-email-Dave.Martin@arm.com>
Dave Martin <Dave.Martin@arm.com> writes:
> This patch refactors KVM to align the host and guest FPSIMD
> save/restore logic with each other for arm64. This reduces the
> number of redundant save/restore operations that must occur, and
> reduces the common-case IRQ blackout time during guest exit storms
> by saving the host state lazily and optimising away the need to
> restore the host state before returning to the run loop.
>
> Four hooks are defined in order to enable this:
>
> * kvm_arch_vcpu_run_map_fp():
> Called on PID change to map necessary bits of current to Hyp.
>
> * kvm_arch_vcpu_load_fp():
> Set up FP/SIMD for entering the KVM run loop (parse as
> "vcpu_load fp").
>
> * kvm_arch_vcpu_ctxsync_fp():
> Get FP/SIMD into a safe state for re-enabling interrupts after a
> guest exit back to the run loop.
>
> For arm64 specifically, this involves updating the host kernel's
> FPSIMD context tracking metadata so that kernel-mode NEON use
> will cause the vcpu's FPSIMD state to be saved back correctly
> into the vcpu struct. This must be done before re-enabling
> interrupts because kernel-mode NEON may be used by softirqs.
>
> * kvm_arch_vcpu_put_fp():
> Save guest FP/SIMD state back to memory and dissociate from the
> CPU ("vcpu_put fp").
>
> Also, the arm64 FPSIMD context switch code is updated to enable it
> to save back FPSIMD state for a vcpu, not just current. A few
> helpers drive this:
>
> * fpsimd_bind_state_to_cpu(struct user_fpsimd_state *fp):
> mark this CPU as having context fp (which may belong to a vcpu)
> currently loaded in its registers. This is the non-task
> equivalent of the static function fpsimd_bind_to_cpu() in
> fpsimd.c.
>
> * task_fpsimd_save():
> exported to allow KVM to save the guest's FPSIMD state back to
> memory on exit from the run loop.
>
> * fpsimd_flush_state():
> invalidate any context's FPSIMD state that is currently loaded.
> Used to disassociate the vcpu from the CPU regs on run loop exit.
>
> These changes allow the run loop to enable interrupts (and thus
> softirqs that may use kernel-mode NEON) without having to save the
> guest's FPSIMD state eagerly.
>
> Some new vcpu_arch fields are added to make all this work. Because
> host FPSIMD state can now be saved back directly into current's
> thread_struct as appropriate, host_cpu_context is no longer used
> for preserving the FPSIMD state. However, it is still needed for
> preserving other things such as the host's system registers. To
> avoid ABI churn, the redundant storage space in host_cpu_context is
> not removed for now.
>
> arch/arm is not addressed by this patch and continues to use its
> current save/restore logic. It could provide implementations of
> the helpers later if desired.
>
> Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
> Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
> Acked-by: Catalin Marinas <catalin.marinas@arm.com>
>
> ---
>
> Reviewers note: tags retained because this delta is straightforward by
> itself. Please shout if you're not happy!
>
> Changes since v9:
>
> * Remove redundant set_thread_flag(TIF_FOREIGN_FPSTATE) that is now
> implicit in fpsimd_flush_cpu_state().
> ---
> arch/arm/include/asm/kvm_host.h | 8 +++
> arch/arm64/include/asm/fpsimd.h | 6 +++
> arch/arm64/include/asm/kvm_host.h | 21 ++++++++
> arch/arm64/kernel/fpsimd.c | 17 ++++--
> arch/arm64/kvm/Kconfig | 1 +
> arch/arm64/kvm/Makefile | 2 +-
> arch/arm64/kvm/fpsimd.c | 111 ++++++++++++++++++++++++++++++++++++++
> arch/arm64/kvm/hyp/switch.c | 51 +++++++++---------
> virt/kvm/arm/arm.c | 4 ++
> 9 files changed, 191 insertions(+), 30 deletions(-)
> create mode 100644 arch/arm64/kvm/fpsimd.c
>
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index c7c28c8..ac870b2 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -303,6 +303,14 @@ int kvm_arm_vcpu_arch_get_attr(struct kvm_vcpu *vcpu,
> int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu,
> struct kvm_device_attr *attr);
>
> +/*
> + * VFP/NEON switching is all done by the hyp switch code, so no need to
> + * coordinate with host context handling for this state:
> + */
> +static inline void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu) {}
> +static inline void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu) {}
> +static inline void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu) {}
> +
> /* All host FP/SIMD state is restored on guest exit, so nothing to save: */
> static inline void kvm_fpsimd_flush_cpu_state(void) {}
>
> diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
> index aa7162a..3e00f70 100644
> --- a/arch/arm64/include/asm/fpsimd.h
> +++ b/arch/arm64/include/asm/fpsimd.h
> @@ -41,6 +41,8 @@ struct task_struct;
> extern void fpsimd_save_state(struct user_fpsimd_state *state);
> extern void fpsimd_load_state(struct user_fpsimd_state *state);
>
> +extern void fpsimd_save(void);
> +
> extern void fpsimd_thread_switch(struct task_struct *next);
> extern void fpsimd_flush_thread(void);
>
> @@ -49,7 +51,11 @@ extern void fpsimd_preserve_current_state(void);
> extern void fpsimd_restore_current_state(void);
> extern void fpsimd_update_current_state(struct user_fpsimd_state const *state);
>
> +extern void fpsimd_bind_task_to_cpu(void);
> +extern void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *state);
> +
> extern void fpsimd_flush_task_state(struct task_struct *target);
> +extern void fpsimd_flush_cpu_state(void);
> extern void sve_flush_cpu_state(void);
>
> /* Maximum VL that SVE VL-agnostic software can transparently support */
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 146c167..b3fe730 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -30,6 +30,7 @@
> #include <asm/kvm.h>
> #include <asm/kvm_asm.h>
> #include <asm/kvm_mmio.h>
> +#include <asm/thread_info.h>
>
> #define __KVM_HAVE_ARCH_INTC_INITIALIZED
>
> @@ -238,6 +239,10 @@ struct kvm_vcpu_arch {
>
> /* Pointer to host CPU context */
> kvm_cpu_context_t *host_cpu_context;
> +
> + struct thread_info *host_thread_info; /* hyp VA */
> + struct user_fpsimd_state *host_fpsimd_state; /* hyp VA */
> +
> struct {
> /* {Break,watch}point registers */
> struct kvm_guest_debug_arch regs;
> @@ -295,6 +300,9 @@ struct kvm_vcpu_arch {
>
> /* vcpu_arch flags field values: */
> #define KVM_ARM64_DEBUG_DIRTY (1 << 0)
> +#define KVM_ARM64_FP_ENABLED (1 << 1) /* guest FP regs loaded */
> +#define KVM_ARM64_FP_HOST (1 << 2) /* host FP regs loaded */
I may be descending into bike-shedding territory here but it seems a
little incongruous to have _ENABLED = guest FP state when we have _HOST
for host FP state. Why not KVM_ARM64_FP_GUEST?
> +#define KVM_ARM64_HOST_SVE_IN_USE (1 << 3) /* backup for host TIF_SVE */
>
> #define vcpu_gp_regs(v) (&(v)->arch.ctxt.gp_regs)
>
> @@ -423,6 +431,19 @@ static inline void __cpu_init_stage2(void)
> "PARange is %d bits, unsupported configuration!", parange);
> }
>
> +/* Guest/host FPSIMD coordination helpers */
> +int kvm_arch_vcpu_run_map_fp(struct kvm_vcpu *vcpu);
> +void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu);
> +void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu);
> +void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu);
> +
> +#ifdef CONFIG_KVM /* Avoid conflicts with core headers if CONFIG_KVM=n */
> +static inline int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu)
> +{
> + return kvm_arch_vcpu_run_map_fp(vcpu);
> +}
> +#endif
> +
> /*
> * All host FP/SIMD state is restored on guest exit, so nothing needs
> * doing here except in the SVE case:
> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> index ba9e7df..ded7ffd 100644
> --- a/arch/arm64/kernel/fpsimd.c
> +++ b/arch/arm64/kernel/fpsimd.c
> @@ -265,7 +265,7 @@ static void task_fpsimd_load(void)
> *
> * Softirqs (and preemption) must be disabled.
> */
> -static void fpsimd_save(void)
> +void fpsimd_save(void)
> {
> struct user_fpsimd_state *st = __this_cpu_read(fpsimd_last_state.st);
>
> @@ -981,7 +981,7 @@ void fpsimd_signal_preserve_current_state(void)
> * Associate current's FPSIMD context with this cpu
> * Preemption must be disabled when calling this function.
> */
> -static void fpsimd_bind_task_to_cpu(void)
> +void fpsimd_bind_task_to_cpu(void)
> {
> struct fpsimd_last_state_struct *last =
> this_cpu_ptr(&fpsimd_last_state);
> @@ -1001,6 +1001,17 @@ static void fpsimd_bind_task_to_cpu(void)
> }
> }
>
> +void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *st)
> +{
> + struct fpsimd_last_state_struct *last =
> + this_cpu_ptr(&fpsimd_last_state);
> +
> + WARN_ON(!in_softirq() && !irqs_disabled());
> +
> + last->st = st;
> + last->sve_in_use = false;
> +}
> +
> /*
> * Load the userland FPSIMD state of 'current' from memory, but only if the
> * FPSIMD state already held in the registers is /not/ the most recent FPSIMD
> @@ -1053,7 +1064,7 @@ void fpsimd_flush_task_state(struct task_struct *t)
> t->thread.fpsimd_cpu = NR_CPUS;
> }
>
> -static inline void fpsimd_flush_cpu_state(void)
> +void fpsimd_flush_cpu_state(void)
> {
> __this_cpu_write(fpsimd_last_state.st, NULL);
> set_thread_flag(TIF_FOREIGN_FPSTATE);
> diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
> index a2e3a5a..47b23bf 100644
> --- a/arch/arm64/kvm/Kconfig
> +++ b/arch/arm64/kvm/Kconfig
> @@ -39,6 +39,7 @@ config KVM
> select HAVE_KVM_IRQ_ROUTING
> select IRQ_BYPASS_MANAGER
> select HAVE_KVM_IRQ_BYPASS
> + select HAVE_KVM_VCPU_RUN_PID_CHANGE
> ---help---
> Support hosting virtualized guest machines.
> We don't support KVM with 16K page tables yet, due to the multiple
> diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
> index 93afff9..0f2a135 100644
> --- a/arch/arm64/kvm/Makefile
> +++ b/arch/arm64/kvm/Makefile
> @@ -19,7 +19,7 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/psci.o $(KVM)/arm/perf.o
> kvm-$(CONFIG_KVM_ARM_HOST) += inject_fault.o regmap.o va_layout.o
> kvm-$(CONFIG_KVM_ARM_HOST) += hyp.o hyp-init.o handle_exit.o
> kvm-$(CONFIG_KVM_ARM_HOST) += guest.o debug.o reset.o sys_regs.o sys_regs_generic_v8.o
> -kvm-$(CONFIG_KVM_ARM_HOST) += vgic-sys-reg-v3.o
> +kvm-$(CONFIG_KVM_ARM_HOST) += vgic-sys-reg-v3.o fpsimd.o
> kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/aarch32.o
>
> kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic/vgic.o
> diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c
> new file mode 100644
> index 0000000..365933a
> --- /dev/null
> +++ b/arch/arm64/kvm/fpsimd.c
> @@ -0,0 +1,111 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * arch/arm64/kvm/fpsimd.c: Guest/host FPSIMD context coordination helpers
> + *
> + * Copyright 2018 Arm Limited
> + * Author: Dave Martin <Dave.Martin@arm.com>
> + */
> +#include <linux/bottom_half.h>
> +#include <linux/sched.h>
> +#include <linux/thread_info.h>
> +#include <linux/kvm_host.h>
> +#include <asm/kvm_asm.h>
> +#include <asm/kvm_host.h>
> +#include <asm/kvm_mmu.h>
> +
> +/*
> + * Called on entry to KVM_RUN unless this vcpu previously ran at least
> + * once and the most recent prior KVM_RUN for this vcpu was called from
> + * the same task as current (highly likely).
> + *
> + * This is guaranteed to execute before kvm_arch_vcpu_load_fp(vcpu),
> + * such that on entering hyp the relevant parts of current are already
> + * mapped.
> + */
> +int kvm_arch_vcpu_run_map_fp(struct kvm_vcpu *vcpu)
> +{
> + int ret;
> +
> + struct thread_info *ti = &current->thread_info;
> + struct user_fpsimd_state *fpsimd = &current->thread.uw.fpsimd_state;
> +
> + /*
> + * Make sure the host task thread flags and fpsimd state are
> + * visible to hyp:
> + */
> + ret = create_hyp_mappings(ti, ti + 1, PAGE_HYP);
> + if (ret)
> + goto error;
> +
> + ret = create_hyp_mappings(fpsimd, fpsimd + 1, PAGE_HYP);
> + if (ret)
> + goto error;
> +
> + vcpu->arch.host_thread_info = kern_hyp_va(ti);
> + vcpu->arch.host_fpsimd_state = kern_hyp_va(fpsimd);
> +error:
> + return ret;
> +}
> +
> +/*
> + * Prepare vcpu for saving the host's FPSIMD state and loading the guest's.
> + * The actual loading is done by the FPSIMD access trap taken to hyp.
> + *
> + * Here, we just set the correct metadata to indicate that the FPSIMD
> + * state in the cpu regs (if any) belongs to current on the host.
> + *
> + * TIF_SVE is backed up here, since it may get clobbered with guest state.
> + * This flag is restored by kvm_arch_vcpu_put_fp(vcpu).
> + */
> +void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu)
> +{
> + BUG_ON(system_supports_sve());
> + BUG_ON(!current->mm);
> +
> + vcpu->arch.flags &= ~(KVM_ARM64_FP_ENABLED | KVM_ARM64_HOST_SVE_IN_USE);
> + vcpu->arch.flags |= KVM_ARM64_FP_HOST;
> + if (test_thread_flag(TIF_SVE))
> + vcpu->arch.flags |= KVM_ARM64_HOST_SVE_IN_USE;
> +}
> +
> +/*
> + * If the guest FPSIMD state was loaded, update the host's context
> + * tracking data mark the CPU FPSIMD regs as dirty and belonging to vcpu
> + * so that they will be written back if the kernel clobbers them due to
> + * kernel-mode NEON before re-entry into the guest.
> + */
> +void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu)
> +{
> + WARN_ON_ONCE(!irqs_disabled());
> +
> + if (vcpu->arch.flags & KVM_ARM64_FP_ENABLED) {
> + fpsimd_bind_state_to_cpu(&vcpu->arch.ctxt.gp_regs.fp_regs);
> + clear_thread_flag(TIF_FOREIGN_FPSTATE);
> + clear_thread_flag(TIF_SVE);
> + }
> +}
> +
> +/*
> + * Write back the vcpu FPSIMD regs if they are dirty, and invalidate the
> + * cpu FPSIMD regs so that they can't be spuriously reused if this vcpu
> + * disappears and another task or vcpu appears that recycles the same
> + * struct fpsimd_state.
> + */
> +void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu)
> +{
> + local_bh_disable();
> +
> + update_thread_flag(TIF_SVE,
> + vcpu->arch.flags & KVM_ARM64_HOST_SVE_IN_USE);
> +
> + if (vcpu->arch.flags & KVM_ARM64_FP_ENABLED) {
> + /* Clean guest FP state to memory and invalidate cpu view */
> + fpsimd_save();
> + fpsimd_flush_cpu_state();
> + } else if (!test_thread_flag(TIF_FOREIGN_FPSTATE)) {
> + /* Ensure user trap controls are correctly restored */
> + fpsimd_bind_task_to_cpu();
> + }
> +
> + local_bh_enable();
> +}
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index c0796c4..118f300 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -23,19 +23,21 @@
>
> #include <asm/kvm_asm.h>
> #include <asm/kvm_emulate.h>
> +#include <asm/kvm_host.h>
> #include <asm/kvm_hyp.h>
> #include <asm/kvm_mmu.h>
> #include <asm/fpsimd.h>
> #include <asm/debug-monitors.h>
> +#include <asm/thread_info.h>
>
> -static bool __hyp_text __fpsimd_enabled_nvhe(void)
> +/* Check whether the FP regs were dirtied while in the host-side run loop: */
> +static bool __hyp_text update_fp_enabled(struct kvm_vcpu *vcpu)
> {
> - return !(read_sysreg(cptr_el2) & CPTR_EL2_TFP);
> -}
> + if (vcpu->arch.host_thread_info->flags & _TIF_FOREIGN_FPSTATE)
> + vcpu->arch.flags &= ~(KVM_ARM64_FP_ENABLED |
> + KVM_ARM64_FP_HOST);
>
> -static bool fpsimd_enabled_vhe(void)
> -{
> - return !!(read_sysreg(cpacr_el1) & CPACR_EL1_FPEN);
> + return !!(vcpu->arch.flags & KVM_ARM64_FP_ENABLED);
> }
>
> /* Save the 32-bit only FPSIMD system register state */
> @@ -92,7 +94,10 @@ static void activate_traps_vhe(struct kvm_vcpu *vcpu)
>
> val = read_sysreg(cpacr_el1);
> val |= CPACR_EL1_TTA;
> - val &= ~(CPACR_EL1_FPEN | CPACR_EL1_ZEN);
> + val &= ~CPACR_EL1_ZEN;
> + if (!update_fp_enabled(vcpu))
> + val &= ~CPACR_EL1_FPEN;
> +
> write_sysreg(val, cpacr_el1);
>
> write_sysreg(kvm_get_hyp_vector(), vbar_el1);
> @@ -105,7 +110,10 @@ static void __hyp_text __activate_traps_nvhe(struct kvm_vcpu *vcpu)
> __activate_traps_common(vcpu);
>
> val = CPTR_EL2_DEFAULT;
> - val |= CPTR_EL2_TTA | CPTR_EL2_TFP | CPTR_EL2_TZ;
> + val |= CPTR_EL2_TTA | CPTR_EL2_TZ;
> + if (!update_fp_enabled(vcpu))
> + val |= CPTR_EL2_TFP;
> +
> write_sysreg(val, cptr_el2);
> }
>
> @@ -321,8 +329,6 @@ static bool __hyp_text __skip_instr(struct kvm_vcpu *vcpu)
> void __hyp_text __hyp_switch_fpsimd(u64 esr __always_unused,
> struct kvm_vcpu *vcpu)
> {
> - kvm_cpu_context_t *host_ctxt;
> -
> if (has_vhe())
> write_sysreg(read_sysreg(cpacr_el1) | CPACR_EL1_FPEN,
> cpacr_el1);
> @@ -332,14 +338,19 @@ void __hyp_text __hyp_switch_fpsimd(u64 esr __always_unused,
>
> isb();
>
> - host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> - __fpsimd_save_state(&host_ctxt->gp_regs.fp_regs);
> + if (vcpu->arch.flags & KVM_ARM64_FP_HOST) {
> + __fpsimd_save_state(vcpu->arch.host_fpsimd_state);
> + vcpu->arch.flags &= ~KVM_ARM64_FP_HOST;
> + }
> +
> __fpsimd_restore_state(&vcpu->arch.ctxt.gp_regs.fp_regs);
>
> /* Skip restoring fpexc32 for AArch64 guests */
> if (!(read_sysreg(hcr_el2) & HCR_RW))
> write_sysreg(vcpu->arch.ctxt.sys_regs[FPEXC32_EL2],
> fpexc32_el2);
> +
> + vcpu->arch.flags |= KVM_ARM64_FP_ENABLED;
> }
>
> /*
> @@ -418,7 +429,6 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
> {
> struct kvm_cpu_context *host_ctxt;
> struct kvm_cpu_context *guest_ctxt;
> - bool fp_enabled;
> u64 exit_code;
>
> host_ctxt = vcpu->arch.host_cpu_context;
> @@ -440,19 +450,14 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
> /* And we're baaack! */
> } while (fixup_guest_exit(vcpu, &exit_code));
>
> - fp_enabled = fpsimd_enabled_vhe();
> -
> sysreg_save_guest_state_vhe(guest_ctxt);
>
> __deactivate_traps(vcpu);
>
> sysreg_restore_host_state_vhe(host_ctxt);
>
> - if (fp_enabled) {
> - __fpsimd_save_state(&guest_ctxt->gp_regs.fp_regs);
> - __fpsimd_restore_state(&host_ctxt->gp_regs.fp_regs);
> + if (vcpu->arch.flags & KVM_ARM64_FP_ENABLED)
> __fpsimd_save_fpexc32(vcpu);
> - }
>
> __debug_switch_to_host(vcpu);
>
> @@ -464,7 +469,6 @@ int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
> {
> struct kvm_cpu_context *host_ctxt;
> struct kvm_cpu_context *guest_ctxt;
> - bool fp_enabled;
> u64 exit_code;
>
> vcpu = kern_hyp_va(vcpu);
> @@ -496,8 +500,6 @@ int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
> /* And we're baaack! */
> } while (fixup_guest_exit(vcpu, &exit_code));
>
> - fp_enabled = __fpsimd_enabled_nvhe();
> -
> __sysreg_save_state_nvhe(guest_ctxt);
> __sysreg32_save_state(vcpu);
> __timer_disable_traps(vcpu);
> @@ -508,11 +510,8 @@ int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
>
> __sysreg_restore_state_nvhe(host_ctxt);
>
> - if (fp_enabled) {
> - __fpsimd_save_state(&guest_ctxt->gp_regs.fp_regs);
> - __fpsimd_restore_state(&host_ctxt->gp_regs.fp_regs);
> + if (vcpu->arch.flags & KVM_ARM64_FP_ENABLED)
> __fpsimd_save_fpexc32(vcpu);
> - }
>
> /*
> * This must come after restoring the host sysregs, since a non-VHE
> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> index a4c1b76..bee226c 100644
> --- a/virt/kvm/arm/arm.c
> +++ b/virt/kvm/arm/arm.c
> @@ -363,10 +363,12 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
> kvm_vgic_load(vcpu);
> kvm_timer_vcpu_load(vcpu);
> kvm_vcpu_load_sysregs(vcpu);
> + kvm_arch_vcpu_load_fp(vcpu);
> }
>
> void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
> {
> + kvm_arch_vcpu_put_fp(vcpu);
> kvm_vcpu_put_sysregs(vcpu);
> kvm_timer_vcpu_put(vcpu);
> kvm_vgic_put(vcpu);
> @@ -778,6 +780,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> if (static_branch_unlikely(&userspace_irqchip_in_use))
> kvm_timer_sync_hwstate(vcpu);
>
> + kvm_arch_vcpu_ctxsync_fp(vcpu);
> +
> /*
> * We may have taken a host interrupt in HYP mode (ie
> * while executing the guest). This interrupt is still
Minor bike-shedding aside:
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
--
Alex Bennée
* [PATCH v10 11/18] arm64/sve: Move read_zcr_features() out of cpufeature.h
From: Alex Bennée @ 2018-05-24 10:12 UTC
To: linux-arm-kernel
In-Reply-To: <1527005119-6842-12-git-send-email-Dave.Martin@arm.com>
Dave Martin <Dave.Martin@arm.com> writes:
> Having read_zcr_features() inline in cpufeature.h results in that
> header requiring #includes which make it hard to include
> <asm/fpsimd.h> elsewhere without triggering header inclusion
> cycles.
>
> This is not a hot-path function and arguably should not be in
> cpufeature.h in the first place, so this patch moves it to
> fpsimd.c, compiled conditionally if CONFIG_ARM64_SVE=y.
>
> This allows some SVE-related #includes to be dropped from
> cpufeature.h, which will ease future maintenance.
>
> A couple of missing #includes of <asm/fpsimd.h> are exposed by this
> change under arch/arm64/. This patch adds the missing #includes as
> necessary.
>
> No functional change.
>
> Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> Acked-by: Catalin Marinas <catalin.marinas@arm.com>
> Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
> ---
> arch/arm64/include/asm/cpufeature.h | 29 -----------------------------
> arch/arm64/include/asm/fpsimd.h | 2 ++
> arch/arm64/include/asm/processor.h | 1 +
> arch/arm64/kernel/fpsimd.c | 28 ++++++++++++++++++++++++++++
> arch/arm64/kernel/ptrace.c | 1 +
> 5 files changed, 32 insertions(+), 29 deletions(-)
>
> diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
> index 09b0f2a..0a6b713 100644
> --- a/arch/arm64/include/asm/cpufeature.h
> +++ b/arch/arm64/include/asm/cpufeature.h
> @@ -11,9 +11,7 @@
>
> #include <asm/cpucaps.h>
> #include <asm/cputype.h>
> -#include <asm/fpsimd.h>
> #include <asm/hwcap.h>
> -#include <asm/sigcontext.h>
> #include <asm/sysreg.h>
>
> /*
> @@ -510,33 +508,6 @@ static inline bool system_supports_sve(void)
> cpus_have_const_cap(ARM64_SVE);
> }
>
> -/*
> - * Read the pseudo-ZCR used by cpufeatures to identify the supported SVE
> - * vector length.
> - *
> - * Use only if SVE is present.
> - * This function clobbers the SVE vector length.
> - */
> -static inline u64 read_zcr_features(void)
> -{
> - u64 zcr;
> - unsigned int vq_max;
> -
> - /*
> - * Set the maximum possible VL, and write zeroes to all other
> - * bits to see if they stick.
> - */
> - sve_kernel_enable(NULL);
> - write_sysreg_s(ZCR_ELx_LEN_MASK, SYS_ZCR_EL1);
> -
> - zcr = read_sysreg_s(SYS_ZCR_EL1);
> - zcr &= ~(u64)ZCR_ELx_LEN_MASK; /* find sticky 1s outside LEN field */
> - vq_max = sve_vq_from_vl(sve_get_vl());
> - zcr |= vq_max - 1; /* set LEN field to maximum effective value */
> -
> - return zcr;
> -}
> -
> #endif /* __ASSEMBLY__ */
>
> #endif
> diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
> index 3e00f70..fb60b22 100644
> --- a/arch/arm64/include/asm/fpsimd.h
> +++ b/arch/arm64/include/asm/fpsimd.h
> @@ -69,6 +69,8 @@ extern unsigned int sve_get_vl(void);
> struct arm64_cpu_capabilities;
> extern void sve_kernel_enable(const struct arm64_cpu_capabilities *__unused);
>
> +extern u64 read_zcr_features(void);
> +
> extern int __ro_after_init sve_max_vl;
>
> #ifdef CONFIG_ARM64_SVE
> diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
> index 7675989..f902b6d 100644
> --- a/arch/arm64/include/asm/processor.h
> +++ b/arch/arm64/include/asm/processor.h
> @@ -40,6 +40,7 @@
>
> #include <asm/alternative.h>
> #include <asm/cpufeature.h>
> +#include <asm/fpsimd.h>
> #include <asm/hw_breakpoint.h>
> #include <asm/lse.h>
> #include <asm/pgtable-hwdef.h>
> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> index ded7ffd..5152bbc 100644
> --- a/arch/arm64/kernel/fpsimd.c
> +++ b/arch/arm64/kernel/fpsimd.c
> @@ -37,6 +37,7 @@
> #include <linux/sched/task_stack.h>
> #include <linux/signal.h>
> #include <linux/slab.h>
> +#include <linux/stddef.h>
> #include <linux/sysctl.h>
>
> #include <asm/esr.h>
> @@ -754,6 +755,33 @@ void sve_kernel_enable(const struct arm64_cpu_capabilities *__always_unused p)
> isb();
> }
>
> +/*
> + * Read the pseudo-ZCR used by cpufeatures to identify the supported SVE
> + * vector length.
> + *
> + * Use only if SVE is present.
> + * This function clobbers the SVE vector length.
> + */
> +u64 read_zcr_features(void)
> +{
> + u64 zcr;
> + unsigned int vq_max;
> +
> + /*
> + * Set the maximum possible VL, and write zeroes to all other
> + * bits to see if they stick.
> + */
> + sve_kernel_enable(NULL);
> + write_sysreg_s(ZCR_ELx_LEN_MASK, SYS_ZCR_EL1);
> +
> + zcr = read_sysreg_s(SYS_ZCR_EL1);
> + zcr &= ~(u64)ZCR_ELx_LEN_MASK; /* find sticky 1s outside LEN field */
> + vq_max = sve_vq_from_vl(sve_get_vl());
> + zcr |= vq_max - 1; /* set LEN field to maximum effective value */
> +
> + return zcr;
> +}
> +
> void __init sve_setup(void)
> {
> u64 zcr;
> diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
> index 7ff81fe..78889c4 100644
> --- a/arch/arm64/kernel/ptrace.c
> +++ b/arch/arm64/kernel/ptrace.c
> @@ -44,6 +44,7 @@
> #include <asm/compat.h>
> #include <asm/cpufeature.h>
> #include <asm/debug-monitors.h>
> +#include <asm/fpsimd.h>
> #include <asm/pgtable.h>
> #include <asm/stacktrace.h>
> #include <asm/syscall.h>
--
Alex Bennée
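As an aside on what the moved function computes, the bit manipulation in read_zcr_features() can be sketched in userspace as below. This is an assumption-laden model rather than the kernel code: the system register accesses are replaced by plain arguments, model_zcr_features() is a hypothetical name, and the values 0x1ff for the LEN mask and 16 bytes per vector quadword are assumed to match the kernel's ZCR_ELx_LEN_MASK and SVE_VQ_BYTES.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match the kernel's ZCR_ELx_LEN_MASK and SVE_VQ_BYTES */
#define MODEL_ZCR_LEN_MASK 0x1ffULL
#define MODEL_SVE_VQ_BYTES 16U

/*
 * zcr_readback: value read from ZCR_EL1 after writing the LEN mask to it
 * vl_bytes:     effective vector length reported by the CPU, in bytes
 */
static uint64_t model_zcr_features(uint64_t zcr_readback, unsigned int vl_bytes)
{
	unsigned int vq_max;

	/* A valid SVE vector length is a non-zero multiple of 16 bytes */
	assert(vl_bytes >= MODEL_SVE_VQ_BYTES);
	assert(vl_bytes % MODEL_SVE_VQ_BYTES == 0);

	/* Keep only sticky 1s outside the LEN field */
	zcr_readback &= ~MODEL_ZCR_LEN_MASK;

	/* Encode the maximum effective length into the LEN field */
	vq_max = vl_bytes / MODEL_SVE_VQ_BYTES;
	return zcr_readback | (vq_max - 1);
}
```

For example, a CPU that reads back 0x1ff but reports an effective vector length of 64 bytes yields a pseudo-ZCR of 3 in this model, i.e. four quadwords.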
* [PATCH v10 12/18] arm64/sve: Switch sve_pffr() argument from task to thread
From: Alex Bennée @ 2018-05-24 10:12 UTC
To: linux-arm-kernel
In-Reply-To: <1527005119-6842-13-git-send-email-Dave.Martin@arm.com>
Dave Martin <Dave.Martin@arm.com> writes:
> sve_pffr(), which is used to derive the base address used for
> low-level SVE save/restore routines, currently takes the relevant
> task_struct as an argument.
>
> The only accessed fields are actually part of thread_struct, so
> this patch changes the argument type accordingly. This is done in
> preparation for moving this function to a header, where we do not
> want to have to include <linux/sched.h> due to the consequent
> circular #include problems.
>
> No functional change.
>
> Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> Acked-by: Catalin Marinas <catalin.marinas@arm.com>
> Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
> ---
> arch/arm64/kernel/fpsimd.c | 10 +++++-----
> 1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> index 5152bbc..c4e9762 100644
> --- a/arch/arm64/kernel/fpsimd.c
> +++ b/arch/arm64/kernel/fpsimd.c
> @@ -44,6 +44,7 @@
> #include <asm/fpsimd.h>
> #include <asm/cpufeature.h>
> #include <asm/cputype.h>
> +#include <asm/processor.h>
> #include <asm/simd.h>
> #include <asm/sigcontext.h>
> #include <asm/sysreg.h>
> @@ -167,10 +168,9 @@ static size_t sve_ffr_offset(int vl)
> return SVE_SIG_FFR_OFFSET(sve_vq_from_vl(vl)) - SVE_SIG_REGS_OFFSET;
> }
>
> -static void *sve_pffr(struct task_struct *task)
> +static void *sve_pffr(struct thread_struct *thread)
> {
> - return (char *)task->thread.sve_state +
> - sve_ffr_offset(task->thread.sve_vl);
> + return (char *)thread->sve_state + sve_ffr_offset(thread->sve_vl);
> }
>
> static void change_cpacr(u64 val, u64 mask)
> @@ -253,7 +253,7 @@ static void task_fpsimd_load(void)
> WARN_ON(!in_softirq() && !irqs_disabled());
>
> if (system_supports_sve() && test_thread_flag(TIF_SVE))
> - sve_load_state(sve_pffr(current),
> + sve_load_state(sve_pffr(&current->thread),
> &current->thread.uw.fpsimd_state.fpsr,
> sve_vq_from_vl(current->thread.sve_vl) - 1);
> else
> @@ -284,7 +284,7 @@ void fpsimd_save(void)
> return;
> }
>
> - sve_save_state(sve_pffr(current), &st->fpsr);
> + sve_save_state(sve_pffr(&current->thread), &st->fpsr);
> } else
> fpsimd_save_state(st);
> }
--
Alex Bennée
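To make the pointer arithmetic behind sve_pffr() concrete, here is a small userspace sketch that takes a thread-like struct rather than a task, as the patch does. The struct model_thread type and the model_*() names are hypothetical. The FFR offset is written out by hand on the assumption that the signal-frame register layout is 32 Z registers of VL bytes followed by 16 P registers of VL/8 bytes, which is what SVE_SIG_FFR_OFFSET(vq) - SVE_SIG_REGS_OFFSET is understood to evaluate to.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two thread_struct fields that are used */
struct model_thread {
	void *sve_state;     /* base of the saved SVE register block */
	unsigned int sve_vl; /* vector length in bytes */
};

/* Offset of FFR from the start of the register block (assumed layout) */
static size_t model_sve_ffr_offset(unsigned int vl)
{
	size_t vq;

	assert(vl % 16 == 0); /* VL is a multiple of one quadword */
	vq = vl / 16;

	/* 32 Z registers of vq * 16 bytes, then 16 P registers of vq * 2 bytes */
	return 32 * (vq * 16) + 16 * (vq * 2);
}

/* Same shape as the patched sve_pffr(): no task_struct needed */
static void *model_sve_pffr(struct model_thread *thread)
{
	return (char *)thread->sve_state +
		model_sve_ffr_offset(thread->sve_vl);
}
```

Because only sve_state and sve_vl are read, nothing here depends on the full task definition, which is the property the patch relies on to avoid pulling <linux/sched.h> into a header later.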
* [PATCH v10 10/18] KVM: arm64: Optimise FPSIMD handling to reduce guest/host thrashing
From: Dave Martin @ 2018-05-24 10:18 UTC
To: linux-arm-kernel
In-Reply-To: <877enttlfl.fsf@linaro.org>
On Thu, May 24, 2018 at 11:09:02AM +0100, Alex Bennée wrote:
>
> Dave Martin <Dave.Martin@arm.com> writes:
>
> > This patch refactors KVM to align the host and guest FPSIMD
> > save/restore logic with each other for arm64. This reduces the
> > number of redundant save/restore operations that must occur, and
> > reduces the common-case IRQ blackout time during guest exit storms
> > by saving the host state lazily and optimising away the need to
> > restore the host state before returning to the run loop.
> >
> > Four hooks are defined in order to enable this:
> >
> > * kvm_arch_vcpu_run_map_fp():
> > Called on PID change to map necessary bits of current to Hyp.
> >
> > * kvm_arch_vcpu_load_fp():
> > Set up FP/SIMD for entering the KVM run loop (parse as
> > "vcpu_load fp").
> >
> > * kvm_arch_vcpu_ctxsync_fp():
> > Get FP/SIMD into a safe state for re-enabling interrupts after a
> > guest exit back to the run loop.
> >
> > For arm64 specifically, this involves updating the host kernel's
> > FPSIMD context tracking metadata so that kernel-mode NEON use
> > will cause the vcpu's FPSIMD state to be saved back correctly
> > into the vcpu struct. This must be done before re-enabling
> > interrupts because kernel-mode NEON may be used by softirqs.
> >
> > * kvm_arch_vcpu_put_fp():
> > Save guest FP/SIMD state back to memory and dissociate from the
> > CPU ("vcpu_put fp").
> >
> > Also, the arm64 FPSIMD context switch code is updated to enable it
> > to save back FPSIMD state for a vcpu, not just current. A few
> > helpers drive this:
> >
> > * fpsimd_bind_state_to_cpu(struct user_fpsimd_state *fp):
> > mark this CPU as having context fp (which may belong to a vcpu)
> > currently loaded in its registers. This is the non-task
> > equivalent of the static function fpsimd_bind_to_cpu() in
> > fpsimd.c.
> >
> > * task_fpsimd_save():
> > exported to allow KVM to save the guest's FPSIMD state back to
> > memory on exit from the run loop.
> >
> > * fpsimd_flush_state():
> > invalidate any context's FPSIMD state that is currently loaded.
> > Used to disassociate the vcpu from the CPU regs on run loop exit.
> >
> > These changes allow the run loop to enable interrupts (and thus
> > softirqs that may use kernel-mode NEON) without having to save the
> > guest's FPSIMD state eagerly.
> >
> > Some new vcpu_arch fields are added to make all this work. Because
> > host FPSIMD state can now be saved back directly into current's
> > thread_struct as appropriate, host_cpu_context is no longer used
> > for preserving the FPSIMD state. However, it is still needed for
> > preserving other things such as the host's system registers. To
> > avoid ABI churn, the redundant storage space in host_cpu_context is
> > not removed for now.
> >
> > arch/arm is not addressed by this patch and continues to use its
> > current save/restore logic. It could provide implementations of
> > the helpers later if desired.
> >
> > Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> > Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
> > Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
> > Acked-by: Catalin Marinas <catalin.marinas@arm.com>
> >
> > ---
> >
> > Reviewers note: tags retained because this delta is straightforward by
> > itself. Please shout if you're not happy!
> >
> > Changes since v9:
> >
> > * Remove redundant set_thread_flag(TIF_FOREIGN_FPSTATE) that is now
> > implicit in fpsimd_flush_cpu_state().
> > ---
> > arch/arm/include/asm/kvm_host.h | 8 +++
> > arch/arm64/include/asm/fpsimd.h | 6 +++
> > arch/arm64/include/asm/kvm_host.h | 21 ++++++++
> > arch/arm64/kernel/fpsimd.c | 17 ++++--
> > arch/arm64/kvm/Kconfig | 1 +
> > arch/arm64/kvm/Makefile | 2 +-
> > arch/arm64/kvm/fpsimd.c | 111 ++++++++++++++++++++++++++++++++++++++
> > arch/arm64/kvm/hyp/switch.c | 51 +++++++++---------
> > virt/kvm/arm/arm.c | 4 ++
> > 9 files changed, 191 insertions(+), 30 deletions(-)
> > create mode 100644 arch/arm64/kvm/fpsimd.c
> >
> > diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> > index c7c28c8..ac870b2 100644
> > --- a/arch/arm/include/asm/kvm_host.h
> > +++ b/arch/arm/include/asm/kvm_host.h
> > @@ -303,6 +303,14 @@ int kvm_arm_vcpu_arch_get_attr(struct kvm_vcpu *vcpu,
> > int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu,
> > struct kvm_device_attr *attr);
> >
> > +/*
> > + * VFP/NEON switching is all done by the hyp switch code, so no need to
> > + * coordinate with host context handling for this state:
> > + */
> > +static inline void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu) {}
> > +static inline void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu) {}
> > +static inline void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu) {}
> > +
> > /* All host FP/SIMD state is restored on guest exit, so nothing to save: */
> > static inline void kvm_fpsimd_flush_cpu_state(void) {}
> >
> > diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
> > index aa7162a..3e00f70 100644
> > --- a/arch/arm64/include/asm/fpsimd.h
> > +++ b/arch/arm64/include/asm/fpsimd.h
> > @@ -41,6 +41,8 @@ struct task_struct;
> > extern void fpsimd_save_state(struct user_fpsimd_state *state);
> > extern void fpsimd_load_state(struct user_fpsimd_state *state);
> >
> > +extern void fpsimd_save(void);
> > +
> > extern void fpsimd_thread_switch(struct task_struct *next);
> > extern void fpsimd_flush_thread(void);
> >
> > @@ -49,7 +51,11 @@ extern void fpsimd_preserve_current_state(void);
> > extern void fpsimd_restore_current_state(void);
> > extern void fpsimd_update_current_state(struct user_fpsimd_state const *state);
> >
> > +extern void fpsimd_bind_task_to_cpu(void);
> > +extern void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *state);
> > +
> > extern void fpsimd_flush_task_state(struct task_struct *target);
> > +extern void fpsimd_flush_cpu_state(void);
> > extern void sve_flush_cpu_state(void);
> >
> > /* Maximum VL that SVE VL-agnostic software can transparently support */
> > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > index 146c167..b3fe730 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -30,6 +30,7 @@
> > #include <asm/kvm.h>
> > #include <asm/kvm_asm.h>
> > #include <asm/kvm_mmio.h>
> > +#include <asm/thread_info.h>
> >
> > #define __KVM_HAVE_ARCH_INTC_INITIALIZED
> >
> > @@ -238,6 +239,10 @@ struct kvm_vcpu_arch {
> >
> > /* Pointer to host CPU context */
> > kvm_cpu_context_t *host_cpu_context;
> > +
> > + struct thread_info *host_thread_info; /* hyp VA */
> > + struct user_fpsimd_state *host_fpsimd_state; /* hyp VA */
> > +
> > struct {
> > /* {Break,watch}point registers */
> > struct kvm_guest_debug_arch regs;
> > @@ -295,6 +300,9 @@ struct kvm_vcpu_arch {
> >
> > /* vcpu_arch flags field values: */
> > #define KVM_ARM64_DEBUG_DIRTY (1 << 0)
> > +#define KVM_ARM64_FP_ENABLED (1 << 1) /* guest FP regs loaded */
> > +#define KVM_ARM64_FP_HOST (1 << 2) /* host FP regs loaded */
>
> I may be descending into bike-shedding territory here but it seems a
> little incongruous to have _ENABLED = guest FP state when we have _HOST
> for host FP state. Why not KVM_ARM64_FP_GUEST?
I thought about this, but wanted to retain the clear relationship
between the _ENABLED flag and the state of the FPSIMD trap controls.
The HOST flag has no direct relationship with trap controls, so these
seemed different enough things to justify different names, though the
inconsistency was a bit annoying.
[...]
> Minor bike-shedding aside:
>
> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Thanks. I'll probably leave it as is, but shout if you're unhappy with
this.
Cheers
---Dave
^ permalink raw reply
* [PATCH v10 13/18] arm64/sve: Move sve_pffr() to fpsimd.h and make inline
From: Alex Bennée @ 2018-05-24 10:20 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <1527005119-6842-14-git-send-email-Dave.Martin@arm.com>
Dave Martin <Dave.Martin@arm.com> writes:
> In order to make sve_save_state()/sve_load_state() more easily
> reusable and to get rid of a potential branch on context switch
> critical paths, this patch makes sve_pffr() inline and moves it to
> fpsimd.h.
>
> <asm/processor.h> must be included in fpsimd.h in order to make
> this work, and this creates an #include cycle that is tricky to
> avoid without modifying core code, due to the way the PR_SVE_*()
> prctl helpers are included in the core prctl implementation.
>
> Instead of breaking the cycle, this patch defers inclusion of
> <asm/fpsimd.h> in <asm/processor.h> until the point where it is
> actually needed: i.e., immediately before the prctl definitions.
>
> No functional change.
>
> Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> Acked-by: Catalin Marinas <catalin.marinas@arm.com>
> Acked-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
> arch/arm64/include/asm/fpsimd.h | 13 +++++++++++++
> arch/arm64/include/asm/processor.h | 3 ++-
> arch/arm64/kernel/fpsimd.c | 12 ------------
> 3 files changed, 15 insertions(+), 13 deletions(-)
>
> diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
> index fb60b22..fa92747 100644
> --- a/arch/arm64/include/asm/fpsimd.h
> +++ b/arch/arm64/include/asm/fpsimd.h
> @@ -18,6 +18,8 @@
>
> #include <asm/ptrace.h>
> #include <asm/errno.h>
> +#include <asm/processor.h>
> +#include <asm/sigcontext.h>
>
> #ifndef __ASSEMBLY__
>
> @@ -61,6 +63,17 @@ extern void sve_flush_cpu_state(void);
> /* Maximum VL that SVE VL-agnostic software can transparently support */
> #define SVE_VL_ARCH_MAX 0x100
>
> +/* Offset of FFR in the SVE register dump */
> +static inline size_t sve_ffr_offset(int vl)
> +{
> + return SVE_SIG_FFR_OFFSET(sve_vq_from_vl(vl)) - SVE_SIG_REGS_OFFSET;
> +}
> +
> +static inline void *sve_pffr(struct thread_struct *thread)
> +{
> + return (char *)thread->sve_state + sve_ffr_offset(thread->sve_vl);
> +}
> +
> extern void sve_save_state(void *state, u32 *pfpsr);
> extern void sve_load_state(void const *state, u32 const *pfpsr,
> unsigned long vq_minus_1);
> diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
> index f902b6d..ebaadb1 100644
> --- a/arch/arm64/include/asm/processor.h
> +++ b/arch/arm64/include/asm/processor.h
> @@ -40,7 +40,6 @@
>
> #include <asm/alternative.h>
> #include <asm/cpufeature.h>
> -#include <asm/fpsimd.h>
> #include <asm/hw_breakpoint.h>
> #include <asm/lse.h>
> #include <asm/pgtable-hwdef.h>
> @@ -245,6 +244,8 @@ void cpu_enable_pan(const struct arm64_cpu_capabilities *__unused);
> void cpu_enable_cache_maint_trap(const struct arm64_cpu_capabilities *__unused);
> void cpu_clear_disr(const struct arm64_cpu_capabilities *__unused);
>
> +#include <asm/fpsimd.h>
> +
You really need a one-liner comment to note why the include is in a
funny place to save someone just moving it back and then getting really
confused. Maybe:
/* included just in time to avoid circular inclusion issues */
#include <asm/fpsimd.h>
It still seems weird to me though :-/
Otherwise:
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
--
Alex Bennée
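
For readers following the sve_ffr_offset() arithmetic in the quoted hunk, here is a user-space sketch. It assumes the usual SVE register dump layout: 32 Z registers of vl bytes each, then 16 P registers of vl/8 bytes each, then FFR. All ex_* and EX_* names are local stand-ins for illustration, not the kernel's SVE_SIG_* definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins (assumptions), not the kernel's SVE_SIG_* macros. */
#define EX_SVE_VQ_BYTES   16u  /* one quadword = 128 bits */
#define EX_SVE_NUM_ZREGS  32u
#define EX_SVE_NUM_PREGS  16u

/* Number of quadwords in a vector that is vl bytes long. */
static inline unsigned int ex_vq_from_vl(unsigned int vl)
{
	return vl / EX_SVE_VQ_BYTES;
}

/* Offset of FFR from the start of the register dump. */
static inline size_t ex_ffr_offset(unsigned int vl)
{
	unsigned int vq = ex_vq_from_vl(vl);
	size_t zreg_size = (size_t)vq * EX_SVE_VQ_BYTES;     /* vl bytes   */
	size_t preg_size = (size_t)vq * EX_SVE_VQ_BYTES / 8; /* vl/8 bytes */

	return EX_SVE_NUM_ZREGS * zreg_size + EX_SVE_NUM_PREGS * preg_size;
}

/* Same shape as sve_pffr(): base pointer plus the FFR offset. */
static inline void *ex_pffr(void *sve_state, unsigned int vl)
{
	return (char *)sve_state + ex_ffr_offset(vl);
}

/* Usage: with vl = 16 (vq = 1), FFR sits 32*16 + 16*2 = 544 bytes in. */
void ex_ffr_demo(void)
{
	static char dump[1024];

	assert(ex_ffr_offset(16) == 544);
	assert((char *)ex_pffr(dump, 16) == dump + 544);
}
```

Because the offset depends only on vl, keeping the helper inline lets the compiler fold it into the caller, which matches the patch's stated aim of avoiding a branch on the context-switch path.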
^ permalink raw reply
* [PATCHv2] arm64: Make sure permission updates happen for pmd/pud
From: Will Deacon @ 2018-05-24 10:27 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <20180523184346.487-1-labbott@redhat.com>
Hi Laura,
On Wed, May 23, 2018 at 11:43:46AM -0700, Laura Abbott wrote:
> Commit 15122ee2c515 ("arm64: Enforce BBM for huge IO/VMAP mappings")
> disallowed block mappings for ioremap since that code does not honor
> break-before-make. The same APIs are also used for permission updating
> though and the extra checks prevent the permission updates from happening,
> even though this should be permitted. This results in read-only permissions
> not being fully applied. Visibly, this can occasionally be seen as a failure
> on the built in rodata test when the test data ends up in a section or
> as an odd RW gap on the page table dump. Fix this by using
> pgattr_change_is_safe instead of p*d_present for determining if the
> change is permitted.
>
> Reported-by: Peter Robinson <pbrobinson@gmail.com>
> Fixes: 15122ee2c515 ("arm64: Enforce BBM for huge IO/VMAP mappings")
> Signed-off-by: Laura Abbott <labbott@redhat.com>
> ---
> v2: Switch to using pgattr_change_is_safe per suggestion of Will
> ---
Thanks for re-spinning so quickly. I'll queue as a fix with the relevant
tags.
Will
^ permalink raw reply
* [PATCH v3 0/3] Renesas R9A06G032 SMP enabler
From: Michel Pollet @ 2018-05-24 10:30 UTC (permalink / raw)
To: linux-arm-kernel
*WARNING -- this requires the base R9A06G032 support patches
already posted
This patch series is for enabling the second CA7 of the R9A06G032.
It's based on a spin_table method, and it reuses the same binding
property as that driver.
v3:
+ Removed mentions of rz/n1d
+ Rebased on base patch v7
v2:
+ Added suggestions from Florian Fainelli
+ Use __pa_symbol()
+ Simplified logic in prepare_cpu()
+ Reordered the patches
+ Rebased on RZN1 Base patch v5
Michel Pollet (3):
dt-bindings: cpu: Add Renesas R9A06G032 SMP enable method.
arm: shmobile: Add the R9A06G032 SMP enabler driver
ARM: dts: Renesas R9A06G032 SMP enable method
Documentation/devicetree/bindings/arm/cpus.txt | 1 +
arch/arm/boot/dts/r9a06g032.dtsi | 2 +
arch/arm/mach-shmobile/Makefile | 1 +
arch/arm/mach-shmobile/smp-r9a06g032.c | 85 ++++++++++++++++++++++++++
4 files changed, 89 insertions(+)
create mode 100644 arch/arm/mach-shmobile/smp-r9a06g032.c
--
2.7.4
^ permalink raw reply
* [PATCH v3 1/3] dt-bindings: cpu: Add Renesas R9A06G032 SMP enable method.
From: Michel Pollet @ 2018-05-24 10:30 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <1527157834-7747-1-git-send-email-michel.pollet@bp.renesas.com>
Add a special enable method for the second CA7 of the R9A06G032.
Signed-off-by: Michel Pollet <michel.pollet@bp.renesas.com>
Reviewed-by: Rob Herring <robh@kernel.org>
---
Documentation/devicetree/bindings/arm/cpus.txt | 1 +
1 file changed, 1 insertion(+)
diff --git a/Documentation/devicetree/bindings/arm/cpus.txt b/Documentation/devicetree/bindings/arm/cpus.txt
index 29e1dc5..b395d107 100644
--- a/Documentation/devicetree/bindings/arm/cpus.txt
+++ b/Documentation/devicetree/bindings/arm/cpus.txt
@@ -219,6 +219,7 @@ described below.
"qcom,kpss-acc-v1"
"qcom,kpss-acc-v2"
"renesas,apmu"
+ "renesas,r9a06g032-smp"
"rockchip,rk3036-smp"
"rockchip,rk3066-smp"
"ste,dbx500-smp"
--
2.7.4
^ permalink raw reply related
* [PATCH v3 2/3] arm: shmobile: Add the R9A06G032 SMP enabler driver
From: Michel Pollet @ 2018-05-24 10:30 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <1527157834-7747-1-git-send-email-michel.pollet@bp.renesas.com>
The Renesas R9A06G032 second CA7 is parked in a ROM pen at boot time; it
requires a special enable method to get it started.
Signed-off-by: Michel Pollet <michel.pollet@bp.renesas.com>
---
arch/arm/mach-shmobile/Makefile | 1 +
arch/arm/mach-shmobile/smp-r9a06g032.c | 85 ++++++++++++++++++++++++++++++++++
2 files changed, 86 insertions(+)
create mode 100644 arch/arm/mach-shmobile/smp-r9a06g032.c
diff --git a/arch/arm/mach-shmobile/Makefile b/arch/arm/mach-shmobile/Makefile
index 1939f52..d7fc98f 100644
--- a/arch/arm/mach-shmobile/Makefile
+++ b/arch/arm/mach-shmobile/Makefile
@@ -34,6 +34,7 @@ smp-$(CONFIG_ARCH_SH73A0) += smp-sh73a0.o headsmp-scu.o platsmp-scu.o
smp-$(CONFIG_ARCH_R8A7779) += smp-r8a7779.o headsmp-scu.o platsmp-scu.o
smp-$(CONFIG_ARCH_R8A7790) += smp-r8a7790.o
smp-$(CONFIG_ARCH_R8A7791) += smp-r8a7791.o
+smp-$(CONFIG_ARCH_R9A06G032) += smp-r9a06g032.o
smp-$(CONFIG_ARCH_EMEV2) += smp-emev2.o headsmp-scu.o platsmp-scu.o
# PM objects
diff --git a/arch/arm/mach-shmobile/smp-r9a06g032.c b/arch/arm/mach-shmobile/smp-r9a06g032.c
new file mode 100644
index 0000000..a536e89
--- /dev/null
+++ b/arch/arm/mach-shmobile/smp-r9a06g032.c
@@ -0,0 +1,85 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * RZ/N1D Second CA7 enabler.
+ *
+ * Copyright (C) 2018 Renesas Electronics Europe Limited
+ *
+ * Michel Pollet <michel.pollet@bp.renesas.com>, <buserror@gmail.com>
+ * Derived from action,s500-smp
+ */
+
+#include <linux/delay.h>
+#include <linux/io.h>
+#include <linux/of.h>
+#include <linux/of_address.h>
+#include <linux/smp.h>
+#include <asm/cacheflush.h>
+#include <asm/smp_plat.h>
+#include <asm/smp_scu.h>
+
+/*
+ * The second CPU is parked in ROM at boot time. It requires waking it after
+ * writing an address into the BOOTADDR register of sysctrl.
+ *
+ * So the default value of the "cpu-release-addr" corresponds to BOOTADDR...
+ *
+ * *However* the BOOTADDR register is not available when the kernel
+ * starts in NONSEC mode.
+ *
+ * So for NONSEC mode, the bootloader re-parks the second CPU into a pen
+ * in SRAM, and changes the "cpu-release-addr" of linux's DT to a SRAM address,
+ * which is not restricted.
+ */
+
+static void __iomem *cpu_bootaddr;
+
+static DEFINE_SPINLOCK(cpu_lock);
+
+static int rzn1_smp_boot_secondary(unsigned int cpu, struct task_struct *idle)
+{
+ if (!cpu_bootaddr)
+ return -ENODEV;
+
+ spin_lock(&cpu_lock);
+
+ writel(__pa_symbol(secondary_startup), cpu_bootaddr);
+ arch_send_wakeup_ipi_mask(cpumask_of(cpu));
+
+ spin_unlock(&cpu_lock);
+
+ return 0;
+}
+
+static void __init rzn1_smp_prepare_cpus(unsigned int max_cpus)
+{
+ struct device_node *dn;
+ int ret;
+ u32 bootaddr;
+
+ dn = of_get_cpu_node(1, NULL);
+ if (!dn) {
+ pr_err("CPU#1: missing device tree node\n");
+ return;
+ }
+ /*
+ * Determine the address from which the CPU is polling.
+ * The bootloader *does* change this property
+ */
+ ret = of_property_read_u32(dn, "cpu-release-addr", &bootaddr);
+ of_node_put(dn);
+ if (ret) {
+ pr_err("CPU#1: invalid cpu-release-addr property\n");
+ return;
+ }
+ pr_info("CPU#1: cpu-release-addr %08x\n", (u32)bootaddr);
+
+ cpu_bootaddr = ioremap(bootaddr, sizeof(bootaddr));
+ if (!cpu_bootaddr)
+ pr_err("CPU#1: cpu-release-addr map failed\n");
+}
+
+static const struct smp_operations rzn1_smp_ops __initconst = {
+ .smp_prepare_cpus = rzn1_smp_prepare_cpus,
+ .smp_boot_secondary = rzn1_smp_boot_secondary,
+};
+CPU_METHOD_OF_DECLARE(rzn1_smp, "renesas,r9a06g032-smp", &rzn1_smp_ops);
--
2.7.4
^ permalink raw reply related
* [PATCH v3 3/3] ARM: dts: Renesas R9A06G032 SMP enable method
From: Michel Pollet @ 2018-05-24 10:39 UTC (permalink / raw)
To: linux-arm-kernel
Add a special enable method for the second CA7 of the R9A06G032
as well as the default value for the "cpu-release-addr" property.
Signed-off-by: Michel Pollet <michel.pollet@bp.renesas.com>
---
arch/arm/boot/dts/r9a06g032.dtsi | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/arm/boot/dts/r9a06g032.dtsi b/arch/arm/boot/dts/r9a06g032.dtsi
index 9534f1b..d7b5414 100644
--- a/arch/arm/boot/dts/r9a06g032.dtsi
+++ b/arch/arm/boot/dts/r9a06g032.dtsi
@@ -30,6 +30,8 @@
compatible = "arm,cortex-a7";
reg = <1>;
clocks = <&sysctrl R9A06G032_DIV_CA7>;
+ enable-method = "renesas,r9a06g032-smp";
+ cpu-release-addr = <0x4000c204>;
};
};
--
2.7.4
^ permalink raw reply related
* [PATCH v4 1/2] arm64/sve: Thin out initialisation sanity-checks for sve_max_vl
From: Will Deacon @ 2018-05-24 10:40 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <1527097616-25214-2-git-send-email-Dave.Martin@arm.com>
On Wed, May 23, 2018 at 06:46:55PM +0100, Dave Martin wrote:
> Now that the kernel SVE support is reasonably mature, it is
> excessive to default sve_max_vl to the invalid value -1 and then
> sprinkle WARN_ON()s around the place to make sure it has been
> initialised before use. The cpufeatures code already runs pretty
> early, and will ensure sve_max_vl gets initialised.
>
> This patch initialises sve_max_vl to something sane that will be
> supported by every SVE implementation, and removes most of the
> sanity checks.
>
> The checks in find_supported_vector_length() are retained for now.
> If anything goes horribly wrong, we are likely to trip a check here
> sooner or later.
>
> Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> ---
> arch/arm64/kernel/fpsimd.c | 17 ++++-------------
> arch/arm64/kernel/ptrace.c | 3 ---
> 2 files changed, 4 insertions(+), 16 deletions(-)
Thanks, this makes the code a bit more readable imo:
Acked-by: Will Deacon <will.deacon@arm.com>
Will
^ permalink raw reply
* [PATCH 02/14] arm64: Call ARCH_WORKAROUND_2 on transitions between EL0 and EL1
From: Mark Rutland @ 2018-05-24 10:52 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <0b03ce12-510e-464b-63de-cd481d7cb1c5@arm.com>
On Wed, May 23, 2018 at 10:23:20AM +0100, Julien Grall wrote:
> Hi Marc,
>
> On 05/22/2018 04:06 PM, Marc Zyngier wrote:
> > diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
> > index ec2ee720e33e..f33e6aed3037 100644
> > --- a/arch/arm64/kernel/entry.S
> > +++ b/arch/arm64/kernel/entry.S
> > @@ -18,6 +18,7 @@
> > * along with this program. If not, see <http://www.gnu.org/licenses/>.
> > */
> > +#include <linux/arm-smccc.h>
> > #include <linux/init.h>
> > #include <linux/linkage.h>
> > @@ -137,6 +138,18 @@ alternative_else_nop_endif
> > add \dst, \dst, #(\sym - .entry.tramp.text)
> > .endm
> > + // This macro corrupts x0-x3. It is the caller's duty
> > + // to save/restore them if required.
>
> NIT: Shouldn't you use /* ... */ for multi-line comments?
There's no requirement to do so, and IIRC even Torvalds prefers '//'
comments for multi-line things these days.
Mark.
^ permalink raw reply
* [PATCH 01/14] arm/arm64: smccc: Add SMCCC-specific return codes
From: Mark Rutland @ 2018-05-24 10:55 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <20180522150648.28297-2-marc.zyngier@arm.com>
On Tue, May 22, 2018 at 04:06:35PM +0100, Marc Zyngier wrote:
> We've so far used the PSCI return codes for SMCCC because they
> were extremely similar. But with the new ARM DEN 0070A specification,
> "NOT_REQUIRED" (-2) is clashing with PSCI's "PSCI_RET_INVALID_PARAMS".
>
> Let's bite the bullet and add SMCCC specific return codes. Users
> can be repainted as and when required.
>
> Acked-by: Will Deacon <will.deacon@arm.com>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Mark.
> ---
> include/linux/arm-smccc.h | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/include/linux/arm-smccc.h b/include/linux/arm-smccc.h
> index a031897fca76..c89da86de99f 100644
> --- a/include/linux/arm-smccc.h
> +++ b/include/linux/arm-smccc.h
> @@ -291,5 +291,10 @@ asmlinkage void __arm_smccc_hvc(unsigned long a0, unsigned long a1,
> */
> #define arm_smccc_1_1_hvc(...) __arm_smccc_1_1(SMCCC_HVC_INST, __VA_ARGS__)
>
> +/* Return codes defined in ARM DEN 0070A */
> +#define SMCCC_RET_SUCCESS 0
> +#define SMCCC_RET_NOT_SUPPORTED -1
> +#define SMCCC_RET_NOT_REQUIRED -2
> +
> #endif /*__ASSEMBLY__*/
> #endif /*__LINUX_ARM_SMCCC_H*/
> --
> 2.14.2
>
> _______________________________________________
> kvmarm mailing list
> kvmarm at lists.cs.columbia.edu
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
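
A minimal sketch of the clash described in the commit message above. The SMCCC values are copied from the quoted hunk. The PSCI value is taken from the commit message's statement that -2 is PSCI_RET_INVALID_PARAMS. ex_smccc_ret_name() is a hypothetical helper for illustration only.

```c
#include <assert.h>
#include <string.h>

/* Values copied from the quoted hunk. */
#define SMCCC_RET_SUCCESS        0
#define SMCCC_RET_NOT_SUPPORTED  -1
#define SMCCC_RET_NOT_REQUIRED   -2

/* PSCI value named in the commit message. */
#define PSCI_RET_INVALID_PARAMS  -2

/* Hypothetical helper: name a return code in the SMCCC namespace. */
const char *ex_smccc_ret_name(long ret)
{
	switch (ret) {
	case SMCCC_RET_SUCCESS:
		return "SUCCESS";
	case SMCCC_RET_NOT_SUPPORTED:
		return "NOT_SUPPORTED";
	case SMCCC_RET_NOT_REQUIRED:
		return "NOT_REQUIRED";
	default:
		return "UNKNOWN";
	}
}

/* Usage: the same number means different things in the two namespaces. */
void ex_smccc_demo(void)
{
	assert(SMCCC_RET_NOT_REQUIRED == PSCI_RET_INVALID_PARAMS);
	assert(strcmp(ex_smccc_ret_name(-2), "NOT_REQUIRED") == 0);
}
```

A caller that interpreted -2 with the PSCI names would report "invalid parameters" where the firmware meant "not required", which is why separate SMCCC constants are useful.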
^ permalink raw reply
* [PATCH 0/9] Rewrite asm-generic/bitops/{atomic, lock}.h and use on arm64
From: Will Deacon @ 2018-05-24 10:59 UTC (permalink / raw)
To: linux-arm-kernel
Hi all,
This patch series has previously been posted in RFC form here:
RFCv1: https://www.spinics.net/lists/arm-kernel/msg634719.html
RFCv2: https://www.spinics.net/lists/arm-kernel/msg636875.html
Changes since RFCv2 include:
* Rebased onto v4.17-rc4, which allowed me to drop some patches from
the series which were merged in 4.16.
* Moved bits.h to linux/bits.h instead of asm-generic/bits.h
Thanks,
Will
--->8
Will Deacon (9):
h8300: Don't include linux/kernel.h in asm/atomic.h
m68k: Don't use asm-generic/bitops/lock.h
asm-generic: Move some macros from linux/bitops.h to a new bits.h file
openrisc: Don't pull in all of linux/bitops.h in asm/cmpxchg.h
sh: Don't pull in all of linux/bitops.h in asm/cmpxchg-xchg.h
asm-generic/bitops/atomic.h: Rewrite using atomic_fetch_*
asm-generic/bitops/lock.h: Rewrite using atomic_fetch_*
arm64: Replace our atomic/lock bitop implementations with asm-generic
arm64: bitops: Include <asm-generic/bitops/ext2-atomic-setbit.h>
arch/arm64/include/asm/bitops.h | 21 +---
arch/arm64/lib/Makefile | 2 +-
arch/arm64/lib/bitops.S | 76 ---------------
arch/h8300/include/asm/atomic.h | 4 +-
arch/m68k/include/asm/bitops.h | 6 +-
arch/openrisc/include/asm/cmpxchg.h | 3 +-
arch/sh/include/asm/cmpxchg-xchg.h | 3 +-
include/asm-generic/bitops/atomic.h | 188 +++++++-----------------------------
include/asm-generic/bitops/lock.h | 68 ++++++++++---
include/linux/bitops.h | 22 +----
include/linux/bits.h | 26 +++++
11 files changed, 131 insertions(+), 288 deletions(-)
delete mode 100644 arch/arm64/lib/bitops.S
create mode 100644 include/linux/bits.h
--
2.1.4
^ permalink raw reply
* [PATCH 1/9] h8300: Don't include linux/kernel.h in asm/atomic.h
From: Will Deacon @ 2018-05-24 10:59 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <1527159586-8578-1-git-send-email-will.deacon@arm.com>
linux/kernel.h isn't needed by asm/atomic.h and will result in circular
dependencies when the asm-generic atomic bitops are built around the
atomic_long_t interface.
Remove the broad include and replace it with linux/compiler.h for
READ_ONCE etc and asm/irqflags.h for arch_local_irq_save etc.
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
arch/h8300/include/asm/atomic.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/h8300/include/asm/atomic.h b/arch/h8300/include/asm/atomic.h
index 941e7554e886..b174dec099bf 100644
--- a/arch/h8300/include/asm/atomic.h
+++ b/arch/h8300/include/asm/atomic.h
@@ -2,8 +2,10 @@
#ifndef __ARCH_H8300_ATOMIC__
#define __ARCH_H8300_ATOMIC__
+#include <linux/compiler.h>
#include <linux/types.h>
#include <asm/cmpxchg.h>
+#include <asm/irqflags.h>
/*
* Atomic operations that C can't guarantee us. Useful for
@@ -15,8 +17,6 @@
#define atomic_read(v) READ_ONCE((v)->counter)
#define atomic_set(v, i) WRITE_ONCE(((v)->counter), (i))
-#include <linux/kernel.h>
-
#define ATOMIC_OP_RETURN(op, c_op) \
static inline int atomic_##op##_return(int i, atomic_t *v) \
{ \
--
2.1.4
^ permalink raw reply related
* [PATCH 2/9] m68k: Don't use asm-generic/bitops/lock.h
From: Will Deacon @ 2018-05-24 10:59 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <1527159586-8578-1-git-send-email-will.deacon@arm.com>
asm-generic/bitops/lock.h is shortly going to be built on top of the
atomic_long_* API, which introduces a nasty circular dependency for
m68k where linux/atomic.h pulls in linux/bitops.h via:
linux/atomic.h
asm/atomic.h
linux/irqflags.h
asm/irqflags.h
linux/preempt.h
asm/preempt.h
asm-generic/preempt.h
linux/thread_info.h
asm/thread_info.h
asm/page.h
asm-generic/getorder.h
linux/log2.h
linux/bitops.h
Since m68k isn't SMP and doesn't support ACQUIRE/RELEASE barriers, we
can just define the lock bitops in terms of the atomic bitops in the
asm/bitops.h header.
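
As a rough illustration of what the lock bitops add over the plain atomic ones, here is a user-space sketch using C11 atomics rather than the kernel API. The ex_* names are hypothetical and the sketch only handles bits within a single word. On a uniprocessor target the acquire/release orderings make no practical difference, which is the reasoning behind simply aliasing the lock variants to the plain bitops.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Try to take the lock bit. Returns the previous value of the bit:
 * 0 means the caller now holds the lock, 1 means it was already held.
 * Assumes nr is smaller than the number of bits in an unsigned long.
 */
static inline int ex_test_and_set_bit_lock(unsigned int nr, atomic_ulong *p)
{
	unsigned long mask = 1UL << nr;
	unsigned long old;

	old = atomic_fetch_or_explicit(p, mask, memory_order_acquire);
	return !!(old & mask);
}

/* Drop the lock bit with release ordering. */
static inline void ex_clear_bit_unlock(unsigned int nr, atomic_ulong *p)
{
	atomic_fetch_and_explicit(p, ~(1UL << nr), memory_order_release);
}

/* Usage: take, fail to retake, release, take again. */
void ex_bit_lock_demo(void)
{
	atomic_ulong word = 0;

	assert(ex_test_and_set_bit_lock(0, &word) == 0);
	assert(ex_test_and_set_bit_lock(0, &word) == 1);
	ex_clear_bit_unlock(0, &word);
	assert(ex_test_and_set_bit_lock(0, &word) == 0);
}
```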
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
arch/m68k/include/asm/bitops.h | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/arch/m68k/include/asm/bitops.h b/arch/m68k/include/asm/bitops.h
index 93b47b1f6fb4..18193419f97d 100644
--- a/arch/m68k/include/asm/bitops.h
+++ b/arch/m68k/include/asm/bitops.h
@@ -515,12 +515,16 @@ static inline int __fls(int x)
#endif
+/* Simple test-and-set bit locks */
+#define test_and_set_bit_lock test_and_set_bit
+#define clear_bit_unlock clear_bit
+#define __clear_bit_unlock clear_bit_unlock
+
#include <asm-generic/bitops/ext2-atomic.h>
#include <asm-generic/bitops/le.h>
#include <asm-generic/bitops/fls64.h>
#include <asm-generic/bitops/sched.h>
#include <asm-generic/bitops/hweight.h>
-#include <asm-generic/bitops/lock.h>
#endif /* __KERNEL__ */
#endif /* _M68K_BITOPS_H */
--
2.1.4
^ permalink raw reply related
* [PATCH 3/9] asm-generic: Move some macros from linux/bitops.h to a new bits.h file
From: Will Deacon @ 2018-05-24 10:59 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <1527159586-8578-1-git-send-email-will.deacon@arm.com>
In preparation for implementing the asm-generic atomic bitops in terms
of atomic_long_*, we need to prevent asm/atomic.h implementations from
pulling in linux/bitops.h. A common reason for this include is for the
BITS_PER_BYTE definition, so move this and some other BIT and masking
macros into a new header file, linux/bits.h
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
include/linux/bitops.h | 22 +---------------------
include/linux/bits.h | 26 ++++++++++++++++++++++++++
2 files changed, 27 insertions(+), 21 deletions(-)
create mode 100644 include/linux/bits.h
diff --git a/include/linux/bitops.h b/include/linux/bitops.h
index 4cac4e1a72ff..af419012d77d 100644
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -2,29 +2,9 @@
#ifndef _LINUX_BITOPS_H
#define _LINUX_BITOPS_H
#include <asm/types.h>
+#include <linux/bits.h>
-#ifdef __KERNEL__
-#define BIT(nr) (1UL << (nr))
-#define BIT_ULL(nr) (1ULL << (nr))
-#define BIT_MASK(nr) (1UL << ((nr) % BITS_PER_LONG))
-#define BIT_WORD(nr) ((nr) / BITS_PER_LONG)
-#define BIT_ULL_MASK(nr) (1ULL << ((nr) % BITS_PER_LONG_LONG))
-#define BIT_ULL_WORD(nr) ((nr) / BITS_PER_LONG_LONG)
-#define BITS_PER_BYTE 8
#define BITS_TO_LONGS(nr) DIV_ROUND_UP(nr, BITS_PER_BYTE * sizeof(long))
-#endif
-
-/*
- * Create a contiguous bitmask starting at bit position @l and ending at
- * position @h. For example
- * GENMASK_ULL(39, 21) gives us the 64bit vector 0x000000ffffe00000.
- */
-#define GENMASK(h, l) \
- (((~0UL) - (1UL << (l)) + 1) & (~0UL >> (BITS_PER_LONG - 1 - (h))))
-
-#define GENMASK_ULL(h, l) \
- (((~0ULL) - (1ULL << (l)) + 1) & \
- (~0ULL >> (BITS_PER_LONG_LONG - 1 - (h))))
extern unsigned int __sw_hweight8(unsigned int w);
extern unsigned int __sw_hweight16(unsigned int w);
diff --git a/include/linux/bits.h b/include/linux/bits.h
new file mode 100644
index 000000000000..2b7b532c1d51
--- /dev/null
+++ b/include/linux/bits.h
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __LINUX_BITS_H
+#define __LINUX_BITS_H
+#include <asm/bitsperlong.h>
+
+#define BIT(nr) (1UL << (nr))
+#define BIT_ULL(nr) (1ULL << (nr))
+#define BIT_MASK(nr) (1UL << ((nr) % BITS_PER_LONG))
+#define BIT_WORD(nr) ((nr) / BITS_PER_LONG)
+#define BIT_ULL_MASK(nr) (1ULL << ((nr) % BITS_PER_LONG_LONG))
+#define BIT_ULL_WORD(nr) ((nr) / BITS_PER_LONG_LONG)
+#define BITS_PER_BYTE 8
+
+/*
+ * Create a contiguous bitmask starting at bit position @l and ending at
+ * position @h. For example
+ * GENMASK_ULL(39, 21) gives us the 64bit vector 0x000000ffffe00000.
+ */
+#define GENMASK(h, l) \
+ (((~0UL) - (1UL << (l)) + 1) & (~0UL >> (BITS_PER_LONG - 1 - (h))))
+
+#define GENMASK_ULL(h, l) \
+ (((~0ULL) - (1ULL << (l)) + 1) & \
+ (~0ULL >> (BITS_PER_LONG_LONG - 1 - (h))))
+
+#endif /* __LINUX_BITS_H */
--
2.1.4
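
A small user-space check of the macros moved by this patch, for anyone who wants to confirm the example in the GENMASK comment. The macro bodies are copied from the hunk above. BITS_PER_LONG and BITS_PER_LONG_LONG are defined locally here as an assumption, since <asm/bitsperlong.h> is not available outside the kernel.

```c
#include <assert.h>
#include <limits.h>

/* Local stand-ins for <asm/bitsperlong.h> (assumption). */
#define BITS_PER_LONG       ((int)(sizeof(long) * CHAR_BIT))
#define BITS_PER_LONG_LONG  64

/* Copied from the new include/linux/bits.h in the patch. */
#define BIT_ULL(nr)   (1ULL << (nr))
#define BIT_MASK(nr)  (1UL << ((nr) % BITS_PER_LONG))
#define BIT_WORD(nr)  ((nr) / BITS_PER_LONG)

#define GENMASK_ULL(h, l) \
	(((~0ULL) - (1ULL << (l)) + 1) & \
	 (~0ULL >> (BITS_PER_LONG_LONG - 1 - (h))))

/* Usage: the example from the comment, plus word/mask splitting. */
void ex_bits_demo(void)
{
	/* Bits 21..39 set, as stated in the GENMASK comment. */
	assert(GENMASK_ULL(39, 21) == 0x000000ffffe00000ULL);

	/* A bit number just past the first word lands in word 1. */
	assert(BIT_WORD(BITS_PER_LONG + 3) == 1);
	assert(BIT_MASK(BITS_PER_LONG + 3) == 8UL);
}
```

The first term of GENMASK_ULL clears every bit below @l and the second clears every bit above @h, so their intersection is the contiguous run from @l to @h.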
^ permalink raw reply related
* [PATCH 4/9] openrisc: Don't pull in all of linux/bitops.h in asm/cmpxchg.h
From: Will Deacon @ 2018-05-24 10:59 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <1527159586-8578-1-git-send-email-will.deacon@arm.com>
The openrisc implementation of asm/cmpxchg.h pulls in linux/bitops.h
so that it can refer to BITS_PER_BYTE. It also transitively relies on
this pulling in linux/compiler.h for READ_ONCE.
Replace the #include with linux/bits.h and linux/compiler.h
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
arch/openrisc/include/asm/cmpxchg.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/openrisc/include/asm/cmpxchg.h b/arch/openrisc/include/asm/cmpxchg.h
index d29f7db53906..f9cd43a39d72 100644
--- a/arch/openrisc/include/asm/cmpxchg.h
+++ b/arch/openrisc/include/asm/cmpxchg.h
@@ -16,8 +16,9 @@
#ifndef __ASM_OPENRISC_CMPXCHG_H
#define __ASM_OPENRISC_CMPXCHG_H
+#include <linux/bits.h>
+#include <linux/compiler.h>
#include <linux/types.h>
-#include <linux/bitops.h>
#define __HAVE_ARCH_CMPXCHG 1
--
2.1.4
^ permalink raw reply related
* [PATCH 5/9] sh: Don't pull in all of linux/bitops.h in asm/cmpxchg-xchg.h
From: Will Deacon @ 2018-05-24 10:59 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <1527159586-8578-1-git-send-email-will.deacon@arm.com>
The sh implementation of asm/cmpxchg-xchg.h pulls in linux/bitops.h
so that it can refer to BITS_PER_BYTE. It also transitively relies on
this pulling in linux/compiler.h for READ_ONCE.
Replace the #include with linux/bits.h and linux/compiler.h
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
arch/sh/include/asm/cmpxchg-xchg.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/sh/include/asm/cmpxchg-xchg.h b/arch/sh/include/asm/cmpxchg-xchg.h
index 1e881f5db659..593a9704782b 100644
--- a/arch/sh/include/asm/cmpxchg-xchg.h
+++ b/arch/sh/include/asm/cmpxchg-xchg.h
@@ -8,7 +8,8 @@
* This work is licensed under the terms of the GNU GPL, version 2. See the
* file "COPYING" in the main directory of this archive for more details.
*/
-#include <linux/bitops.h>
+#include <linux/bits.h>
+#include <linux/compiler.h>
#include <asm/byteorder.h>
/*
--
2.1.4
^ permalink raw reply related
* [PATCH 6/9] asm-generic/bitops/atomic.h: Rewrite using atomic_fetch_*
From: Will Deacon @ 2018-05-24 10:59 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <1527159586-8578-1-git-send-email-will.deacon@arm.com>
The atomic bitops can actually be implemented pretty efficiently using
the atomic_fetch_* ops, rather than explicit use of spinlocks.
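
The idea can be sketched in user space with C11 atomics standing in for the kernel's atomic_long_fetch_*() ops. This is an illustration under that assumption, not the patch's code: the ex_* and EX_* names are hypothetical, and ex_test_and_set_bit() is an extra shown only to make the "fetch" part visible.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

#define EX_BITS_PER_LONG  (sizeof(unsigned long) * CHAR_BIT)
#define EX_BIT_MASK(nr)   (1UL << ((nr) % EX_BITS_PER_LONG))
#define EX_BIT_WORD(nr)   ((nr) / EX_BITS_PER_LONG)

/* Set bit nr in the bitmap at p; no ordering guarantee (relaxed). */
static inline void ex_set_bit(unsigned int nr, atomic_ulong *p)
{
	p += EX_BIT_WORD(nr);
	atomic_fetch_or_explicit(p, EX_BIT_MASK(nr), memory_order_relaxed);
}

/* Clear bit nr; C11 has no fetch_andnot, so AND with the inverted mask. */
static inline void ex_clear_bit(unsigned int nr, atomic_ulong *p)
{
	p += EX_BIT_WORD(nr);
	atomic_fetch_and_explicit(p, ~EX_BIT_MASK(nr), memory_order_relaxed);
}

/* Toggle bit nr. */
static inline void ex_change_bit(unsigned int nr, atomic_ulong *p)
{
	p += EX_BIT_WORD(nr);
	atomic_fetch_xor_explicit(p, EX_BIT_MASK(nr), memory_order_relaxed);
}

/* Set bit nr and report whether it was already set (fully ordered). */
static inline int ex_test_and_set_bit(unsigned int nr, atomic_ulong *p)
{
	unsigned long mask = EX_BIT_MASK(nr);

	p += EX_BIT_WORD(nr);
	return !!(atomic_fetch_or(p, mask) & mask);
}

/* Usage: operate on a two-word bitmap. */
void ex_bitops_demo(void)
{
	atomic_ulong map[2] = { 0, 0 };

	ex_set_bit((unsigned int)EX_BITS_PER_LONG + 1, map);
	assert(atomic_load(&map[0]) == 0UL);
	assert(atomic_load(&map[1]) == 2UL);
	assert(ex_test_and_set_bit(3, map) == 0);
	assert(ex_test_and_set_bit(3, map) == 1);
}
```

Each operation is a single atomic read-modify-write on the word holding the bit, so no hashed spinlock or interrupt masking is needed.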
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
include/asm-generic/bitops/atomic.h | 188 +++++++-----------------------------
1 file changed, 33 insertions(+), 155 deletions(-)
diff --git a/include/asm-generic/bitops/atomic.h b/include/asm-generic/bitops/atomic.h
index 04deffaf5f7d..bca92586c2f6 100644
--- a/include/asm-generic/bitops/atomic.h
+++ b/include/asm-generic/bitops/atomic.h
@@ -2,189 +2,67 @@
#ifndef _ASM_GENERIC_BITOPS_ATOMIC_H_
#define _ASM_GENERIC_BITOPS_ATOMIC_H_
-#include <asm/types.h>
-#include <linux/irqflags.h>
-
-#ifdef CONFIG_SMP
-#include <asm/spinlock.h>
-#include <asm/cache.h> /* we use L1_CACHE_BYTES */
-
-/* Use an array of spinlocks for our atomic_ts.
- * Hash function to index into a different SPINLOCK.
- * Since "a" is usually an address, use one spinlock per cacheline.
- */
-# define ATOMIC_HASH_SIZE 4
-# define ATOMIC_HASH(a) (&(__atomic_hash[ (((unsigned long) a)/L1_CACHE_BYTES) & (ATOMIC_HASH_SIZE-1) ]))
-
-extern arch_spinlock_t __atomic_hash[ATOMIC_HASH_SIZE] __lock_aligned;
-
-/* Can't use raw_spin_lock_irq because of #include problems, so
- * this is the substitute */
-#define _atomic_spin_lock_irqsave(l,f) do { \
- arch_spinlock_t *s = ATOMIC_HASH(l); \
- local_irq_save(f); \
- arch_spin_lock(s); \
-} while(0)
-
-#define _atomic_spin_unlock_irqrestore(l,f) do { \
- arch_spinlock_t *s = ATOMIC_HASH(l); \
- arch_spin_unlock(s); \
- local_irq_restore(f); \
-} while(0)
-
-
-#else
-# define _atomic_spin_lock_irqsave(l,f) do { local_irq_save(f); } while (0)
-# define _atomic_spin_unlock_irqrestore(l,f) do { local_irq_restore(f); } while (0)
-#endif
+#include <linux/atomic.h>
+#include <linux/compiler.h>
+#include <asm/barrier.h>
/*
- * NMI events can occur at any time, including when interrupts have been
- * disabled by *_irqsave(). So you can get NMI events occurring while a
- * *_bit function is holding a spin lock. If the NMI handler also wants
- * to do bit manipulation (and they do) then you can get a deadlock
- * between the original caller of *_bit() and the NMI handler.
- *
- * by Keith Owens
+ * Implementation of atomic bitops using atomic-fetch ops.
+ * See Documentation/atomic_bitops.txt for details.
*/
-/**
- * set_bit - Atomically set a bit in memory
- * @nr: the bit to set
- * @addr: the address to start counting from
- *
- * This function is atomic and may not be reordered. See __set_bit()
- * if you do not require the atomic guarantees.
- *
- * Note: there are no guarantees that this function will not be reordered
- * on non x86 architectures, so if you are writing portable code,
- * make sure not to rely on its reordering guarantees.
- *
- * Note that @nr may be almost arbitrarily large; this function is not
- * restricted to acting on a single-word quantity.
- */
-static inline void set_bit(int nr, volatile unsigned long *addr)
+static inline void set_bit(unsigned int nr, volatile unsigned long *p)
{
- unsigned long mask = BIT_MASK(nr);
- unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
- unsigned long flags;
-
- _atomic_spin_lock_irqsave(p, flags);
- *p |= mask;
- _atomic_spin_unlock_irqrestore(p, flags);
+ p += BIT_WORD(nr);
+ atomic_long_fetch_or_relaxed(BIT_MASK(nr), (atomic_long_t *)p);
}
-/**
- * clear_bit - Clears a bit in memory
- * @nr: Bit to clear
- * @addr: Address to start counting from
- *
- * clear_bit() is atomic and may not be reordered. However, it does
- * not contain a memory barrier, so if it is used for locking purposes,
- * you should call smp_mb__before_atomic() and/or smp_mb__after_atomic()
- * in order to ensure changes are visible on other processors.
- */
-static inline void clear_bit(int nr, volatile unsigned long *addr)
+static inline void clear_bit(unsigned int nr, volatile unsigned long *p)
{
- unsigned long mask = BIT_MASK(nr);
- unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
- unsigned long flags;
-
- _atomic_spin_lock_irqsave(p, flags);
- *p &= ~mask;
- _atomic_spin_unlock_irqrestore(p, flags);
+ p += BIT_WORD(nr);
+ atomic_long_fetch_andnot_relaxed(BIT_MASK(nr), (atomic_long_t *)p);
}
-/**
- * change_bit - Toggle a bit in memory
- * @nr: Bit to change
- * @addr: Address to start counting from
- *
- * change_bit() is atomic and may not be reordered. It may be
- * reordered on other architectures than x86.
- * Note that @nr may be almost arbitrarily large; this function is not
- * restricted to acting on a single-word quantity.
- */
-static inline void change_bit(int nr, volatile unsigned long *addr)
+static inline void change_bit(unsigned int nr, volatile unsigned long *p)
{
- unsigned long mask = BIT_MASK(nr);
- unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
- unsigned long flags;
-
- _atomic_spin_lock_irqsave(p, flags);
- *p ^= mask;
- _atomic_spin_unlock_irqrestore(p, flags);
+ p += BIT_WORD(nr);
+ atomic_long_fetch_xor_relaxed(BIT_MASK(nr), (atomic_long_t *)p);
}
-/**
- * test_and_set_bit - Set a bit and return its old value
- * @nr: Bit to set
- * @addr: Address to count from
- *
- * This operation is atomic and cannot be reordered.
- * It may be reordered on other architectures than x86.
- * It also implies a memory barrier.
- */
-static inline int test_and_set_bit(int nr, volatile unsigned long *addr)
+static inline int test_and_set_bit(unsigned int nr, volatile unsigned long *p)
{
+ long old;
unsigned long mask = BIT_MASK(nr);
- unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
- unsigned long old;
- unsigned long flags;
- _atomic_spin_lock_irqsave(p, flags);
- old = *p;
- *p = old | mask;
- _atomic_spin_unlock_irqrestore(p, flags);
+ p += BIT_WORD(nr);
+ if (READ_ONCE(*p) & mask)
+ return 1;
- return (old & mask) != 0;
+ old = atomic_long_fetch_or(mask, (atomic_long_t *)p);
+ return !!(old & mask);
}
-/**
- * test_and_clear_bit - Clear a bit and return its old value
- * @nr: Bit to clear
- * @addr: Address to count from
- *
- * This operation is atomic and cannot be reordered.
- * It can be reorderdered on other architectures other than x86.
- * It also implies a memory barrier.
- */
-static inline int test_and_clear_bit(int nr, volatile unsigned long *addr)
+static inline int test_and_clear_bit(unsigned int nr, volatile unsigned long *p)
{
+ long old;
unsigned long mask = BIT_MASK(nr);
- unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
- unsigned long old;
- unsigned long flags;
- _atomic_spin_lock_irqsave(p, flags);
- old = *p;
- *p = old & ~mask;
- _atomic_spin_unlock_irqrestore(p, flags);
+ p += BIT_WORD(nr);
+ if (!(READ_ONCE(*p) & mask))
+ return 0;
- return (old & mask) != 0;
+ old = atomic_long_fetch_andnot(mask, (atomic_long_t *)p);
+ return !!(old & mask);
}
-/**
- * test_and_change_bit - Change a bit and return its old value
- * @nr: Bit to change
- * @addr: Address to count from
- *
- * This operation is atomic and cannot be reordered.
- * It also implies a memory barrier.
- */
-static inline int test_and_change_bit(int nr, volatile unsigned long *addr)
+static inline int test_and_change_bit(unsigned int nr, volatile unsigned long *p)
{
+ long old;
unsigned long mask = BIT_MASK(nr);
- unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
- unsigned long old;
- unsigned long flags;
-
- _atomic_spin_lock_irqsave(p, flags);
- old = *p;
- *p = old ^ mask;
- _atomic_spin_unlock_irqrestore(p, flags);
- return (old & mask) != 0;
+ p += BIT_WORD(nr);
+ old = atomic_long_fetch_xor(mask, (atomic_long_t *)p);
+ return !!(old & mask);
}
#endif /* _ASM_GENERIC_BITOPS_ATOMIC_H */
--
2.1.4
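
For readers following along outside the kernel tree, the shape of the new
helpers can be approximated in userspace with C11 <stdatomic.h>. This is
only a sketch: the sketch_* names are made up for illustration,
memory_order_seq_cst is used as a rough stand-in for the kernel's fully
ordered atomic_long_fetch_or(), and the BIT_MASK()/BIT_WORD() macros are
local re-definitions rather than the kernel's own.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace stand-ins for the kernel's BIT_MASK()/BIT_WORD() helpers. */
#define BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)
#define BIT_MASK(nr)	(1UL << ((nr) % BITS_PER_LONG))
#define BIT_WORD(nr)	((nr) / BITS_PER_LONG)

/* Analogue of set_bit(): relaxed atomic OR, old value discarded. */
static inline void sketch_set_bit(unsigned int nr, atomic_ulong *p)
{
	atomic_fetch_or_explicit(p + BIT_WORD(nr), BIT_MASK(nr),
				 memory_order_relaxed);
}

/* Analogue of test_and_set_bit(): returns the previous state of the bit. */
static inline bool sketch_test_and_set_bit(unsigned int nr, atomic_ulong *p)
{
	unsigned long old;
	unsigned long mask = BIT_MASK(nr);

	p += BIT_WORD(nr);
	/* Early exit: if the bit is already set, skip the atomic RMW. */
	if (atomic_load_explicit(p, memory_order_relaxed) & mask)
		return true;

	/* seq_cst approximates the kernel's fully ordered fetch_or. */
	old = atomic_fetch_or_explicit(p, mask, memory_order_seq_cst);
	return !!(old & mask);
}
```

Note that the early-exit path performs only a plain load, so a caller that
finds the bit already set gets no ordering from this sketch on that path.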
* [PATCH 7/9] asm-generic/bitops/lock.h: Rewrite using atomic_fetch_*
From: Will Deacon @ 2018-05-24 10:59 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <1527159586-8578-1-git-send-email-will.deacon@arm.com>
The lock bitops can be implemented more efficiently using the atomic_fetch_*
ops, which provide finer-grained control over the memory ordering semantics
than the bitops.
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
include/asm-generic/bitops/lock.h | 68 ++++++++++++++++++++++++++++++++-------
1 file changed, 56 insertions(+), 12 deletions(-)
diff --git a/include/asm-generic/bitops/lock.h b/include/asm-generic/bitops/lock.h
index 67ab280ad134..3ae021368f48 100644
--- a/include/asm-generic/bitops/lock.h
+++ b/include/asm-generic/bitops/lock.h
@@ -2,6 +2,10 @@
#ifndef _ASM_GENERIC_BITOPS_LOCK_H_
#define _ASM_GENERIC_BITOPS_LOCK_H_
+#include <linux/atomic.h>
+#include <linux/compiler.h>
+#include <asm/barrier.h>
+
/**
* test_and_set_bit_lock - Set a bit and return its old value, for lock
* @nr: Bit to set
@@ -11,7 +15,20 @@
* the returned value is 0.
* It can be used to implement bit locks.
*/
-#define test_and_set_bit_lock(nr, addr) test_and_set_bit(nr, addr)
+static inline int test_and_set_bit_lock(unsigned int nr,
+ volatile unsigned long *p)
+{
+ long old;
+ unsigned long mask = BIT_MASK(nr);
+
+ p += BIT_WORD(nr);
+ if (READ_ONCE(*p) & mask)
+ return 1;
+
+ old = atomic_long_fetch_or_acquire(mask, (atomic_long_t *)p);
+ return !!(old & mask);
+}
+
/**
* clear_bit_unlock - Clear a bit in memory, for unlock
@@ -20,11 +37,11 @@
*
* This operation is atomic and provides release barrier semantics.
*/
-#define clear_bit_unlock(nr, addr) \
-do { \
- smp_mb__before_atomic(); \
- clear_bit(nr, addr); \
-} while (0)
+static inline void clear_bit_unlock(unsigned int nr, volatile unsigned long *p)
+{
+ p += BIT_WORD(nr);
+ atomic_long_fetch_andnot_release(BIT_MASK(nr), (atomic_long_t *)p);
+}
/**
* __clear_bit_unlock - Clear a bit in memory, for unlock
@@ -37,11 +54,38 @@ do { \
*
* See for example x86's implementation.
*/
-#define __clear_bit_unlock(nr, addr) \
-do { \
- smp_mb__before_atomic(); \
- clear_bit(nr, addr); \
-} while (0)
+static inline void __clear_bit_unlock(unsigned int nr,
+ volatile unsigned long *p)
+{
+ unsigned long old;
-#endif /* _ASM_GENERIC_BITOPS_LOCK_H_ */
+ p += BIT_WORD(nr);
+ old = READ_ONCE(*p);
+ old &= ~BIT_MASK(nr);
+ atomic_long_set_release((atomic_long_t *)p, old);
+}
+
+/**
+ * clear_bit_unlock_is_negative_byte - Clear a bit in memory and test if bottom
+ * byte is negative, for unlock.
+ * @nr: the bit to clear
+ * @addr: the address to start counting from
+ *
+ * This is a bit of a one-trick-pony for the filemap code, which clears
+ * PG_locked and tests PG_waiters,
+ */
+#ifndef clear_bit_unlock_is_negative_byte
+static inline bool clear_bit_unlock_is_negative_byte(unsigned int nr,
+ volatile unsigned long *p)
+{
+ long old;
+ unsigned long mask = BIT_MASK(nr);
+
+ p += BIT_WORD(nr);
+ old = atomic_long_fetch_andnot_release(mask, (atomic_long_t *)p);
+ return !!(old & BIT(7));
+}
+#define clear_bit_unlock_is_negative_byte clear_bit_unlock_is_negative_byte
+#endif
+#endif /* _ASM_GENERIC_BITOPS_LOCK_H_ */
--
2.1.4
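
To illustrate the "finer-grained control over the memory ordering" point,
here is a rough userspace analogue of the lock bitops using C11 atomics.
It is a sketch under assumptions, not the kernel code: the sketch_* names
are hypothetical, BIT_MASK()/BIT_WORD() are local re-definitions, and C11
acquire/release orderings are used in place of the kernel's _acquire and
_release atomic_long variants.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace stand-ins for the kernel's BIT_MASK()/BIT_WORD() helpers. */
#define BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)
#define BIT_MASK(nr)	(1UL << ((nr) % BITS_PER_LONG))
#define BIT_WORD(nr)	((nr) / BITS_PER_LONG)

/* Try to take the bit lock; returns true if it was already held. */
static inline bool sketch_test_and_set_bit_lock(unsigned int nr,
						atomic_ulong *p)
{
	unsigned long old;
	unsigned long mask = BIT_MASK(nr);

	p += BIT_WORD(nr);
	/* Already locked: report failure without an atomic RMW. */
	if (atomic_load_explicit(p, memory_order_relaxed) & mask)
		return true;

	/* Acquire ordering is enough for taking a lock. */
	old = atomic_fetch_or_explicit(p, mask, memory_order_acquire);
	return !!(old & mask);
}

/* Drop the bit lock with release ordering. */
static inline void sketch_clear_bit_unlock(unsigned int nr, atomic_ulong *p)
{
	atomic_fetch_and_explicit(p + BIT_WORD(nr), ~BIT_MASK(nr),
				  memory_order_release);
}

/* Drop the bit lock and report whether bit 7 was set in the old word. */
static inline bool sketch_clear_bit_unlock_is_negative_byte(unsigned int nr,
							     atomic_ulong *p)
{
	unsigned long old;

	old = atomic_fetch_and_explicit(p + BIT_WORD(nr), ~BIT_MASK(nr),
					memory_order_release);
	return !!(old & (1UL << 7));
}
```

A caller would loop on sketch_test_and_set_bit_lock() until it returns
false, run the critical section, and then call sketch_clear_bit_unlock().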
* [PATCH 8/9] arm64: Replace our atomic/lock bitop implementations with asm-generic
From: Will Deacon @ 2018-05-24 10:59 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <1527159586-8578-1-git-send-email-will.deacon@arm.com>
The asm-generic/bitops/{atomic,lock}.h implementations are built around
the atomic-fetch ops, which we implement efficiently for both LSE and
LL/SC systems. Use that instead of our hand-rolled, out-of-line bitops.S.
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
arch/arm64/include/asm/bitops.h | 14 ++------
arch/arm64/lib/Makefile | 2 +-
arch/arm64/lib/bitops.S | 76 -----------------------------------------
3 files changed, 3 insertions(+), 89 deletions(-)
delete mode 100644 arch/arm64/lib/bitops.S
diff --git a/arch/arm64/include/asm/bitops.h b/arch/arm64/include/asm/bitops.h
index 9c19594ce7cb..13501460be6b 100644
--- a/arch/arm64/include/asm/bitops.h
+++ b/arch/arm64/include/asm/bitops.h
@@ -17,22 +17,11 @@
#define __ASM_BITOPS_H
#include <linux/compiler.h>
-#include <asm/barrier.h>
#ifndef _LINUX_BITOPS_H
#error only <linux/bitops.h> can be included directly
#endif
-/*
- * Little endian assembly atomic bitops.
- */
-extern void set_bit(int nr, volatile unsigned long *p);
-extern void clear_bit(int nr, volatile unsigned long *p);
-extern void change_bit(int nr, volatile unsigned long *p);
-extern int test_and_set_bit(int nr, volatile unsigned long *p);
-extern int test_and_clear_bit(int nr, volatile unsigned long *p);
-extern int test_and_change_bit(int nr, volatile unsigned long *p);
-
#include <asm-generic/bitops/builtin-__ffs.h>
#include <asm-generic/bitops/builtin-ffs.h>
#include <asm-generic/bitops/builtin-__fls.h>
@@ -44,8 +33,9 @@ extern int test_and_change_bit(int nr, volatile unsigned long *p);
#include <asm-generic/bitops/sched.h>
#include <asm-generic/bitops/hweight.h>
-#include <asm-generic/bitops/lock.h>
+#include <asm-generic/bitops/atomic.h>
+#include <asm-generic/bitops/lock.h>
#include <asm-generic/bitops/non-atomic.h>
#include <asm-generic/bitops/le.h>
diff --git a/arch/arm64/lib/Makefile b/arch/arm64/lib/Makefile
index 137710f4dac3..68755fd70dcf 100644
--- a/arch/arm64/lib/Makefile
+++ b/arch/arm64/lib/Makefile
@@ -1,5 +1,5 @@
# SPDX-License-Identifier: GPL-2.0
-lib-y := bitops.o clear_user.o delay.o copy_from_user.o \
+lib-y := clear_user.o delay.o copy_from_user.o \
copy_to_user.o copy_in_user.o copy_page.o \
clear_page.o memchr.o memcpy.o memmove.o memset.o \
memcmp.o strcmp.o strncmp.o strlen.o strnlen.o \
diff --git a/arch/arm64/lib/bitops.S b/arch/arm64/lib/bitops.S
deleted file mode 100644
index 43ac736baa5b..000000000000
--- a/arch/arm64/lib/bitops.S
+++ /dev/null
@@ -1,76 +0,0 @@
-/*
- * Based on arch/arm/lib/bitops.h
- *
- * Copyright (C) 2013 ARM Ltd.
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program. If not, see <http://www.gnu.org/licenses/>.
- */
-
-#include <linux/linkage.h>
-#include <asm/assembler.h>
-#include <asm/lse.h>
-
-/*
- * x0: bits 5:0 bit offset
- * bits 31:6 word offset
- * x1: address
- */
- .macro bitop, name, llsc, lse
-ENTRY( \name )
- and w3, w0, #63 // Get bit offset
- eor w0, w0, w3 // Clear low bits
- mov x2, #1
- add x1, x1, x0, lsr #3 // Get word offset
-alt_lse " prfm pstl1strm, [x1]", "nop"
- lsl x3, x2, x3 // Create mask
-
-alt_lse "1: ldxr x2, [x1]", "\lse x3, [x1]"
-alt_lse " \llsc x2, x2, x3", "nop"
-alt_lse " stxr w0, x2, [x1]", "nop"
-alt_lse " cbnz w0, 1b", "nop"
-
- ret
-ENDPROC(\name )
- .endm
-
- .macro testop, name, llsc, lse
-ENTRY( \name )
- and w3, w0, #63 // Get bit offset
- eor w0, w0, w3 // Clear low bits
- mov x2, #1
- add x1, x1, x0, lsr #3 // Get word offset
-alt_lse " prfm pstl1strm, [x1]", "nop"
- lsl x4, x2, x3 // Create mask
-
-alt_lse "1: ldxr x2, [x1]", "\lse x4, x2, [x1]"
- lsr x0, x2, x3
-alt_lse " \llsc x2, x2, x4", "nop"
-alt_lse " stlxr w5, x2, [x1]", "nop"
-alt_lse " cbnz w5, 1b", "nop"
-alt_lse " dmb ish", "nop"
-
- and x0, x0, #1
- ret
-ENDPROC(\name )
- .endm
-
-/*
- * Atomic bit operations.
- */
- bitop change_bit, eor, steor
- bitop clear_bit, bic, stclr
- bitop set_bit, orr, stset
-
- testop test_and_change_bit, eor, ldeoral
- testop test_and_clear_bit, bic, ldclral
- testop test_and_set_bit, orr, ldsetal
--
2.1.4
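
As a reading aid for the deleted bitops.S, the address and mask setup at
the top of the bitop/testop macros can be written out in C roughly as
follows. The struct and function names are invented for this sketch; only
the arithmetic mirrors the assembly (64-bit words, so bits 5:0 of nr select
the bit and the remaining bits select the word).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the two values the assembly computes. */
struct sketch_bit_location {
	size_t byte_offset;	/* added to the base address in x1 */
	uint64_t mask;		/* single-bit mask within the 64-bit word */
};

static inline struct sketch_bit_location sketch_locate_bit(uint32_t nr)
{
	uint32_t bit = nr & 63;		/* and w3, w0, #63 */
	uint32_t word_bits = nr ^ bit;	/* eor w0, w0, w3 (clear low bits) */
	struct sketch_bit_location loc = {
		/* add x1, x1, x0, lsr #3: bit count -> byte offset */
		.byte_offset = word_bits >> 3,
		/* lsl x3, x2, x3: 1 << bit offset */
		.mask = UINT64_C(1) << bit,
	};

	return loc;
}
```

This is the same result that p += BIT_WORD(nr) and BIT_MASK(nr) produce in
the asm-generic versions when unsigned long is 64 bits wide.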
* [PATCH 9/9] arm64: bitops: Include <asm-generic/bitops/ext2-atomic-setbit.h>
From: Will Deacon @ 2018-05-24 10:59 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <1527159586-8578-1-git-send-email-will.deacon@arm.com>
asm-generic/bitops/ext2-atomic-setbit.h provides the ext2 atomic bitop
definitions, so we don't need to define our own.
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
arch/arm64/include/asm/bitops.h | 7 +------
1 file changed, 1 insertion(+), 6 deletions(-)
diff --git a/arch/arm64/include/asm/bitops.h b/arch/arm64/include/asm/bitops.h
index 13501460be6b..10d536b1af74 100644
--- a/arch/arm64/include/asm/bitops.h
+++ b/arch/arm64/include/asm/bitops.h
@@ -38,11 +38,6 @@
#include <asm-generic/bitops/lock.h>
#include <asm-generic/bitops/non-atomic.h>
#include <asm-generic/bitops/le.h>
-
-/*
- * Ext2 is defined to use little-endian byte ordering.
- */
-#define ext2_set_bit_atomic(lock, nr, p) test_and_set_bit_le(nr, p)
-#define ext2_clear_bit_atomic(lock, nr, p) test_and_clear_bit_le(nr, p)
+#include <asm-generic/bitops/ext2-atomic-setbit.h>
#endif /* __ASM_BITOPS_H */
--
2.1.4
* [PATCH 02/14] arm64: Call ARCH_WORKAROUND_2 on transitions between EL0 and EL1
From: Mark Rutland @ 2018-05-24 11:00 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <20180522150648.28297-3-marc.zyngier@arm.com>
On Tue, May 22, 2018 at 04:06:36PM +0100, Marc Zyngier wrote:
> In order for the kernel to protect itself, let's call the SSBD mitigation
> implemented by the higher exception level (either hypervisor or firmware)
> on each transition between userspace and kernel.
>
> We must take the PSCI conduit into account in order to target the
> right exception level, hence the introduction of a runtime patching
> callback.
>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
> arch/arm64/kernel/cpu_errata.c | 18 ++++++++++++++++++
> arch/arm64/kernel/entry.S | 22 ++++++++++++++++++++++
> include/linux/arm-smccc.h | 5 +++++
> 3 files changed, 45 insertions(+)
>
> diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
> index a900befadfe8..46b3aafb631a 100644
> --- a/arch/arm64/kernel/cpu_errata.c
> +++ b/arch/arm64/kernel/cpu_errata.c
> @@ -232,6 +232,24 @@ enable_smccc_arch_workaround_1(const struct arm64_cpu_capabilities *entry)
> }
> #endif /* CONFIG_HARDEN_BRANCH_PREDICTOR */
>
> +#ifdef CONFIG_ARM64_SSBD
> +void __init arm64_update_smccc_conduit(struct alt_instr *alt,
> + __le32 *origptr, __le32 *updptr,
> + int nr_inst)
> +{
> + u32 insn;
> +
> + BUG_ON(nr_inst != 1);
> +
> + if (psci_ops.conduit == PSCI_CONDUIT_HVC)
> + insn = aarch64_insn_get_hvc_value();
> + else
> + insn = aarch64_insn_get_smc_value();
Shouldn't this also handle the case where there is no conduit?
See my comment on apply_ssbd below for the rationale.
> +
> + *updptr = cpu_to_le32(insn);
> +}
> +#endif /* CONFIG_ARM64_SSBD */
> +
> #define CAP_MIDR_RANGE(model, v_min, r_min, v_max, r_max) \
> .matches = is_affected_midr_range, \
> .midr_range = MIDR_RANGE(model, v_min, r_min, v_max, r_max)
> diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
> index ec2ee720e33e..f33e6aed3037 100644
> --- a/arch/arm64/kernel/entry.S
> +++ b/arch/arm64/kernel/entry.S
> @@ -18,6 +18,7 @@
> * along with this program. If not, see <http://www.gnu.org/licenses/>.
> */
>
> +#include <linux/arm-smccc.h>
> #include <linux/init.h>
> #include <linux/linkage.h>
>
> @@ -137,6 +138,18 @@ alternative_else_nop_endif
> add \dst, \dst, #(\sym - .entry.tramp.text)
> .endm
>
> + // This macro corrupts x0-x3. It is the caller's duty
> + // to save/restore them if required.
> + .macro apply_ssbd, state
> +#ifdef CONFIG_ARM64_SSBD
> + mov w0, #ARM_SMCCC_ARCH_WORKAROUND_2
> + mov w1, #\state
> +alternative_cb arm64_update_smccc_conduit
> + nop // Patched to SMC/HVC #0
> +alternative_cb_end
> +#endif
> + .endm
If my system doesn't have SMCCC 1.1, or the FW doesn't implement
ARCH_WORKAROUND_2, does this stay as a NOP?
It looks like this would be patched to an SMC, which would be fatal on
systems without EL3 FW.
> +
> .macro kernel_entry, el, regsize = 64
> .if \regsize == 32
> mov w0, w0 // zero upper 32 bits of x0
> @@ -163,6 +176,13 @@ alternative_else_nop_endif
> ldr x19, [tsk, #TSK_TI_FLAGS] // since we can unmask debug
> disable_step_tsk x19, x20 // exceptions when scheduling.
>
> + apply_ssbd 1
... and thus kernel_entry would be fatal.
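
To make the concern concrete, here is a standalone sketch of a selection
that leaves the NOP in place when there is no usable conduit or no
ARCH_WORKAROUND_2 implementation. This is not part of the posted patch and
not a proposal for the final code: the enum, the function name and the
have_workaround_2 flag are all hypothetical, and the constants are the A64
encodings of NOP, HVC #0 and SMC #0.

```c
#include <stdint.h>

/* Hypothetical stand-in for the PSCI conduit state. */
enum sketch_conduit {
	SKETCH_CONDUIT_NONE,
	SKETCH_CONDUIT_SMC,
	SKETCH_CONDUIT_HVC,
};

#define SKETCH_A64_NOP		UINT32_C(0xd503201f)
#define SKETCH_A64_HVC_0	UINT32_C(0xd4000002)
#define SKETCH_A64_SMC_0	UINT32_C(0xd4000003)

/*
 * Pick the instruction to patch into the apply_ssbd slot.
 * have_workaround_2 is an assumed flag meaning "firmware or hypervisor
 * advertises ARCH_WORKAROUND_2"; how that is discovered is out of scope.
 */
static inline uint32_t sketch_pick_ssbd_insn(enum sketch_conduit conduit,
					     int have_workaround_2)
{
	if (!have_workaround_2)
		return SKETCH_A64_NOP;	/* nothing to call: keep the NOP */

	switch (conduit) {
	case SKETCH_CONDUIT_HVC:
		return SKETCH_A64_HVC_0;
	case SKETCH_CONDUIT_SMC:
		return SKETCH_A64_SMC_0;
	default:
		return SKETCH_A64_NOP;	/* no conduit: never emit an SMC */
	}
}
```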
Thanks,
Mark.
* Nokia N900: refcount_t underflow, use after free
From: Pavel Machek @ 2018-05-24 11:05 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <5aa1e2de-96d1-0e88-01ca-49df9ae25457@ti.com>
Hi!
> >>
> >> OK, see if the following fixes the issue for you, only build tested.
> >
> > Word-wrapped, so I applied by hand. And yes, the oops at boot is
> > gone. Thanks!
>
> Sorry about that, have to check my mail settings. Anyway will post the
> patch again, glad that it fixed your issue.
Any news here? AFAICT the bug has crept into v4.17-rcX in the
meantime...
Pavel
> regards
> Suman
>
> >
> > (Camera still does not work in -next... kills system. Oh well. Lets
> > debug that some other day.)
> >
> >> 8< ---------------------
> >> From bac9a48fb646dc51f2030d676a0dbe3298c3b134 Mon Sep 17 00:00:00 2001
> >> From: Suman Anna <s-anna@ti.com>
> >> Date: Fri, 9 Mar 2018 16:39:59 -0600
> >> Subject: [PATCH] media: omap3isp: fix unbalanced dma_iommu_mapping
> >>
> >> The OMAP3 ISP driver manages its MMU mappings through the IOMMU-aware
> >> ARM DMA backend. The current code creates a dma_iommu_mapping and
> >> attaches this to the ISP device, but never detaches the mapping in
> >> either the probe failure paths or the driver remove path resulting
> >> in an unbalanced mapping refcount and a memory leak. Fix this properly.
> >>
> >> Reported-by: Pavel Machek <pavel@ucw.cz>
> >> Signed-off-by: Suman Anna <s-anna@ti.com>
> >
> > Tested-by: Pavel Machek <pavel@ucw.cz>
> > Pavel
> >
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html