* [PATCH v4 0/2] riscv: Optimize bitops with Zbb extension @ 2023-10-30 6:39 Xiao Wang 2023-10-30 6:39 ` [PATCH v4 1/2] riscv: Rearrange hwcap.h and cpufeature.h Xiao Wang 2023-10-30 6:39 ` [PATCH v4 2/2] riscv: Optimize bitops with Zbb extension Xiao Wang 0 siblings, 2 replies; 7+ messages in thread From: Xiao Wang @ 2023-10-30 6:39 UTC (permalink / raw) To: paul.walmsley, palmer, aou, ardb Cc: anup, haicheng.li, ajones, yujie.liu, charlie, linux-riscv, linux-efi, linux-kernel, Xiao Wang Bitops optimization with specialized instructions is common practice in popular ISAs, this patch set uses RISC-V Zbb extension to optimize four bitops: __ffs, __fls, ffs and fls. The first patch rearranges the content in hwcap.h and cpufeature.h, it helps to avoid a cyclic header including issue for patch 2. The second patch leverages the alternative mechanism to dynamically apply this optimization. Thanks, Xiao v4: - Simplify the asm code in ffs() and fls() by moving general logic into C implementation. (Charlie) - Add a comment to decorating the large #ifdef block. (Charlie) - Link to v3: https://lore.kernel.org/all/20230926094655.3102758-1-xiao.w.wang@intel.com/ v3: - Fix riscv32 build issue reported by kernel test robot. V3 changes "hwcap.h" to "cpufeature.h" for files where cpu feature detection APIs are used. (Yujie) - Link to v2: https://lore.kernel.org/all/20230920074653.2509631-1-xiao.w.wang@intel.com/ v2: - Remove the "EFI_" prefix from macro name "EFI_NO_ALTERNATIVE" to make it generic. (Ard) - patch-1 is added, it's based on "RISC-V: Enable cbo.zero in usermode". (Andrew) - Link to v1: https://lore.kernel.org/all/20230806024715.3061589-1-xiao.w.wang@intel.com/ Xiao Wang (2): riscv: Rearrange hwcap.h and cpufeature.h riscv: Optimize bitops with Zbb extension arch/riscv/include/asm/bitops.h | 255 +++++++++++++++++++++++++- arch/riscv/include/asm/cpufeature.h | 83 +++++++++ arch/riscv/include/asm/elf.h | 2 +- arch/riscv/include/asm/hwcap.h | 91 --------- arch/riscv/include/asm/pgtable.h | 1 + arch/riscv/include/asm/switch_to.h | 2 +- arch/riscv/include/asm/vector.h | 2 +- arch/riscv/kvm/aia.c | 2 +- arch/riscv/kvm/main.c | 2 +- arch/riscv/kvm/tlb.c | 2 +- arch/riscv/kvm/vcpu_fp.c | 2 +- arch/riscv/kvm/vcpu_onereg.c | 2 +- arch/riscv/kvm/vcpu_vector.c | 2 +- drivers/bitopstest/Kconfig | 1 + drivers/clocksource/timer-riscv.c | 2 +- drivers/firmware/efi/libstub/Makefile | 2 +- drivers/perf/riscv_pmu_sbi.c | 2 +- 17 files changed, 349 insertions(+), 106 deletions(-) -- 2.25.1 ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH v4 1/2] riscv: Rearrange hwcap.h and cpufeature.h 2023-10-30 6:39 [PATCH v4 0/2] riscv: Optimize bitops with Zbb extension Xiao Wang @ 2023-10-30 6:39 ` Xiao Wang 2023-10-30 6:39 ` [PATCH v4 2/2] riscv: Optimize bitops with Zbb extension Xiao Wang 1 sibling, 0 replies; 7+ messages in thread From: Xiao Wang @ 2023-10-30 6:39 UTC (permalink / raw) To: paul.walmsley, palmer, aou, ardb Cc: anup, haicheng.li, ajones, yujie.liu, charlie, linux-riscv, linux-efi, linux-kernel, Xiao Wang Now hwcap.h and cpufeature.h are mutually including each other, and most of the variable/API declarations in hwcap.h are implemented in cpufeature.c, so, it's better to move them into cpufeature.h and leave only macros for ISA extension logical IDs in hwcap.h. BTW, the riscv_isa_extension_mask macro is not used now, so this patch removes it. Suggested-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Xiao Wang <xiao.w.wang@intel.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> --- arch/riscv/include/asm/cpufeature.h | 83 ++++++++++++++++++++++++++ arch/riscv/include/asm/elf.h | 2 +- arch/riscv/include/asm/hwcap.h | 91 ----------------------------- arch/riscv/include/asm/pgtable.h | 1 + arch/riscv/include/asm/switch_to.h | 2 +- arch/riscv/include/asm/vector.h | 2 +- arch/riscv/kvm/aia.c | 2 +- arch/riscv/kvm/main.c | 2 +- arch/riscv/kvm/tlb.c | 2 +- arch/riscv/kvm/vcpu_fp.c | 2 +- arch/riscv/kvm/vcpu_onereg.c | 2 +- arch/riscv/kvm/vcpu_vector.c | 2 +- drivers/clocksource/timer-riscv.c | 2 +- drivers/perf/riscv_pmu_sbi.c | 2 +- 14 files changed, 95 insertions(+), 102 deletions(-) diff --git a/arch/riscv/include/asm/cpufeature.h b/arch/riscv/include/asm/cpufeature.h index 13b7d35648a9..3061d33abc2f 100644 --- a/arch/riscv/include/asm/cpufeature.h +++ b/arch/riscv/include/asm/cpufeature.h @@ -7,7 +7,10 @@ #define _ASM_CPUFEATURE_H #include <linux/bitmap.h> +#include <linux/jump_label.h> #include <asm/hwcap.h> +#include <asm/alternative-macros.h> +#include <asm/errno.h> /* * These are probed via a device_initcall(), via either the SBI or directly @@ -33,4 +36,84 @@ extern struct riscv_isainfo hart_isa[NR_CPUS]; void check_unaligned_access(int cpu); void riscv_user_isa_enable(void); +unsigned long riscv_get_elf_hwcap(void); + +struct riscv_isa_ext_data { + const unsigned int id; + const char *name; + const char *property; +}; + +extern const struct riscv_isa_ext_data riscv_isa_ext[]; +extern const size_t riscv_isa_ext_count; +extern bool riscv_isa_fallback; + +unsigned long riscv_isa_extension_base(const unsigned long *isa_bitmap); + +bool __riscv_isa_extension_available(const unsigned long *isa_bitmap, int bit); +#define riscv_isa_extension_available(isa_bitmap, ext) \ + __riscv_isa_extension_available(isa_bitmap, RISCV_ISA_EXT_##ext) + +static __always_inline bool +riscv_has_extension_likely(const unsigned long ext) +{ + compiletime_assert(ext < RISCV_ISA_EXT_MAX, + "ext must be < RISCV_ISA_EXT_MAX"); + + if (IS_ENABLED(CONFIG_RISCV_ALTERNATIVE)) { + asm_volatile_goto( + ALTERNATIVE("j %l[l_no]", "nop", 0, %[ext], 1) + : + : [ext] "i" (ext) + : + : l_no); + } else { + if (!__riscv_isa_extension_available(NULL, ext)) + goto l_no; + } + + return true; +l_no: + return false; +} + +static __always_inline bool +riscv_has_extension_unlikely(const unsigned long ext) +{ + compiletime_assert(ext < RISCV_ISA_EXT_MAX, + "ext must be < RISCV_ISA_EXT_MAX"); + + if (IS_ENABLED(CONFIG_RISCV_ALTERNATIVE)) { + asm_volatile_goto( + ALTERNATIVE("nop", "j %l[l_yes]", 0, %[ext], 1) + : + : [ext] "i" (ext) + : + : l_yes); + } else { + if (__riscv_isa_extension_available(NULL, ext)) + goto l_yes; + } + + return false; +l_yes: + return true; +} + +static __always_inline bool riscv_cpu_has_extension_likely(int cpu, const unsigned long ext) +{ + if (IS_ENABLED(CONFIG_RISCV_ALTERNATIVE) && riscv_has_extension_likely(ext)) + return true; + + return __riscv_isa_extension_available(hart_isa[cpu].isa, ext); +} + +static __always_inline bool riscv_cpu_has_extension_unlikely(int cpu, const unsigned long ext) +{ + if (IS_ENABLED(CONFIG_RISCV_ALTERNATIVE) && riscv_has_extension_unlikely(ext)) + return true; + + return __riscv_isa_extension_available(hart_isa[cpu].isa, ext); +} + #endif diff --git a/arch/riscv/include/asm/elf.h b/arch/riscv/include/asm/elf.h index b3b2dfbdf945..06c236bfab53 100644 --- a/arch/riscv/include/asm/elf.h +++ b/arch/riscv/include/asm/elf.h @@ -14,7 +14,7 @@ #include <asm/auxvec.h> #include <asm/byteorder.h> #include <asm/cacheinfo.h> -#include <asm/hwcap.h> +#include <asm/cpufeature.h> /* * These are used to set parameters in the core dumps. diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h index 31774bcdf1c6..141b7109c25c 100644 --- a/arch/riscv/include/asm/hwcap.h +++ b/arch/riscv/include/asm/hwcap.h @@ -8,9 +8,6 @@ #ifndef _ASM_RISCV_HWCAP_H #define _ASM_RISCV_HWCAP_H -#include <asm/alternative-macros.h> -#include <asm/errno.h> -#include <linux/bits.h> #include <uapi/asm/hwcap.h> #define RISCV_ISA_EXT_a ('a' - 'a') @@ -67,92 +64,4 @@ #define RISCV_ISA_EXT_SxAIA RISCV_ISA_EXT_SSAIA #endif -#ifndef __ASSEMBLY__ - -#include <linux/jump_label.h> -#include <asm/cpufeature.h> - -unsigned long riscv_get_elf_hwcap(void); - -struct riscv_isa_ext_data { - const unsigned int id; - const char *name; - const char *property; -}; - -extern const struct riscv_isa_ext_data riscv_isa_ext[]; -extern const size_t riscv_isa_ext_count; -extern bool riscv_isa_fallback; - -unsigned long riscv_isa_extension_base(const unsigned long *isa_bitmap); - -#define riscv_isa_extension_mask(ext) BIT_MASK(RISCV_ISA_EXT_##ext) - -bool __riscv_isa_extension_available(const unsigned long *isa_bitmap, int bit); -#define riscv_isa_extension_available(isa_bitmap, ext) \ - __riscv_isa_extension_available(isa_bitmap, RISCV_ISA_EXT_##ext) - -static __always_inline bool -riscv_has_extension_likely(const unsigned long ext) -{ - compiletime_assert(ext < RISCV_ISA_EXT_MAX, - "ext must be < RISCV_ISA_EXT_MAX"); - - if (IS_ENABLED(CONFIG_RISCV_ALTERNATIVE)) { - asm_volatile_goto( - ALTERNATIVE("j %l[l_no]", "nop", 0, %[ext], 1) - : - : [ext] "i" (ext) - : - : l_no); - } else { - if (!__riscv_isa_extension_available(NULL, ext)) - goto l_no; - } - - return true; -l_no: - return false; -} - -static __always_inline bool -riscv_has_extension_unlikely(const unsigned long ext) -{ - compiletime_assert(ext < RISCV_ISA_EXT_MAX, - "ext must be < RISCV_ISA_EXT_MAX"); - - if (IS_ENABLED(CONFIG_RISCV_ALTERNATIVE)) { - asm_volatile_goto( - ALTERNATIVE("nop", "j %l[l_yes]", 0, %[ext], 1) - : - : [ext] "i" (ext) - : - : l_yes); - } else { - if (__riscv_isa_extension_available(NULL, ext)) - goto l_yes; - } - - return false; -l_yes: - return true; -} - -static __always_inline bool riscv_cpu_has_extension_likely(int cpu, const unsigned long ext) -{ - if (IS_ENABLED(CONFIG_RISCV_ALTERNATIVE) && riscv_has_extension_likely(ext)) - return true; - - return __riscv_isa_extension_available(hart_isa[cpu].isa, ext); -} - -static __always_inline bool riscv_cpu_has_extension_unlikely(int cpu, const unsigned long ext) -{ - if (IS_ENABLED(CONFIG_RISCV_ALTERNATIVE) && riscv_has_extension_unlikely(ext)) - return true; - - return __riscv_isa_extension_available(hart_isa[cpu].isa, ext); -} -#endif - #endif /* _ASM_RISCV_HWCAP_H */ diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h index c8e8867c42f6..294044429e8e 100644 --- a/arch/riscv/include/asm/pgtable.h +++ b/arch/riscv/include/asm/pgtable.h @@ -291,6 +291,7 @@ static inline pte_t pud_pte(pud_t pud) } #ifdef CONFIG_RISCV_ISA_SVNAPOT +#include <asm/cpufeature.h> static __always_inline bool has_svnapot(void) { diff --git a/arch/riscv/include/asm/switch_to.h b/arch/riscv/include/asm/switch_to.h index a727be723c56..f90d8e42f3c7 100644 --- a/arch/riscv/include/asm/switch_to.h +++ b/arch/riscv/include/asm/switch_to.h @@ -9,7 +9,7 @@ #include <linux/jump_label.h> #include <linux/sched/task_stack.h> #include <asm/vector.h> -#include <asm/hwcap.h> +#include <asm/cpufeature.h> #include <asm/processor.h> #include <asm/ptrace.h> #include <asm/csr.h> diff --git a/arch/riscv/include/asm/vector.h b/arch/riscv/include/asm/vector.h index c5ee07b3df07..87aaef656257 100644 --- a/arch/riscv/include/asm/vector.h +++ b/arch/riscv/include/asm/vector.h @@ -15,7 +15,7 @@ #include <linux/sched.h> #include <linux/sched/task_stack.h> #include <asm/ptrace.h> -#include <asm/hwcap.h> +#include <asm/cpufeature.h> #include <asm/csr.h> #include <asm/asm.h> diff --git a/arch/riscv/kvm/aia.c b/arch/riscv/kvm/aia.c index 74bb27440527..a944294f6f23 100644 --- a/arch/riscv/kvm/aia.c +++ b/arch/riscv/kvm/aia.c @@ -14,7 +14,7 @@ #include <linux/kvm_host.h> #include <linux/percpu.h> #include <linux/spinlock.h> -#include <asm/hwcap.h> +#include <asm/cpufeature.h> #include <asm/kvm_aia_imsic.h> struct aia_hgei_control { diff --git a/arch/riscv/kvm/main.c b/arch/riscv/kvm/main.c index 48ae0d4b3932..225a435d9c9a 100644 --- a/arch/riscv/kvm/main.c +++ b/arch/riscv/kvm/main.c @@ -11,7 +11,7 @@ #include <linux/module.h> #include <linux/kvm_host.h> #include <asm/csr.h> -#include <asm/hwcap.h> +#include <asm/cpufeature.h> #include <asm/sbi.h> long kvm_arch_dev_ioctl(struct file *filp, diff --git a/arch/riscv/kvm/tlb.c b/arch/riscv/kvm/tlb.c index 44bc324aeeb0..23c0e82b5103 100644 --- a/arch/riscv/kvm/tlb.c +++ b/arch/riscv/kvm/tlb.c @@ -12,7 +12,7 @@ #include <linux/kvm_host.h> #include <asm/cacheflush.h> #include <asm/csr.h> -#include <asm/hwcap.h> +#include <asm/cpufeature.h> #include <asm/insn-def.h> #define has_svinval() riscv_has_extension_unlikely(RISCV_ISA_EXT_SVINVAL) diff --git a/arch/riscv/kvm/vcpu_fp.c b/arch/riscv/kvm/vcpu_fp.c index 08ba48a395aa..030904d82b58 100644 --- a/arch/riscv/kvm/vcpu_fp.c +++ b/arch/riscv/kvm/vcpu_fp.c @@ -11,7 +11,7 @@ #include <linux/err.h> #include <linux/kvm_host.h> #include <linux/uaccess.h> -#include <asm/hwcap.h> +#include <asm/cpufeature.h> #ifdef CONFIG_FPU void kvm_riscv_vcpu_fp_reset(struct kvm_vcpu *vcpu) diff --git a/arch/riscv/kvm/vcpu_onereg.c b/arch/riscv/kvm/vcpu_onereg.c index 1b7e9fa265cb..b03e0c879dab 100644 --- a/arch/riscv/kvm/vcpu_onereg.c +++ b/arch/riscv/kvm/vcpu_onereg.c @@ -13,7 +13,7 @@ #include <linux/uaccess.h> #include <linux/kvm_host.h> #include <asm/cacheflush.h> -#include <asm/hwcap.h> +#include <asm/cpufeature.h> #include <asm/kvm_vcpu_vector.h> #include <asm/vector.h> diff --git a/arch/riscv/kvm/vcpu_vector.c b/arch/riscv/kvm/vcpu_vector.c index b430cbb69521..b339a2682f25 100644 --- a/arch/riscv/kvm/vcpu_vector.c +++ b/arch/riscv/kvm/vcpu_vector.c @@ -11,7 +11,7 @@ #include <linux/err.h> #include <linux/kvm_host.h> #include <linux/uaccess.h> -#include <asm/hwcap.h> +#include <asm/cpufeature.h> #include <asm/kvm_vcpu_vector.h> #include <asm/vector.h> diff --git a/drivers/clocksource/timer-riscv.c b/drivers/clocksource/timer-riscv.c index 9c8f3e2decc2..e0333142c18c 100644 --- a/drivers/clocksource/timer-riscv.c +++ b/drivers/clocksource/timer-riscv.c @@ -25,7 +25,7 @@ #include <linux/limits.h> #include <clocksource/timer-riscv.h> #include <asm/smp.h> -#include <asm/hwcap.h> +#include <asm/cpufeature.h> #include <asm/sbi.h> #include <asm/timex.h> diff --git a/drivers/perf/riscv_pmu_sbi.c b/drivers/perf/riscv_pmu_sbi.c index 9a51053b1f99..b97d3f1abcb1 100644 --- a/drivers/perf/riscv_pmu_sbi.c +++ b/drivers/perf/riscv_pmu_sbi.c @@ -22,7 +22,7 @@ #include <asm/errata_list.h> #include <asm/sbi.h> -#include <asm/hwcap.h> +#include <asm/cpufeature.h> #define SYSCTL_NO_USER_ACCESS 0 #define SYSCTL_USER_ACCESS 1 -- 2.25.1 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH v4 2/2] riscv: Optimize bitops with Zbb extension 2023-10-30 6:39 [PATCH v4 0/2] riscv: Optimize bitops with Zbb extension Xiao Wang 2023-10-30 6:39 ` [PATCH v4 1/2] riscv: Rearrange hwcap.h and cpufeature.h Xiao Wang @ 2023-10-30 6:39 ` Xiao Wang 2023-10-30 20:36 ` Charlie Jenkins 1 sibling, 1 reply; 7+ messages in thread From: Xiao Wang @ 2023-10-30 6:39 UTC (permalink / raw) To: paul.walmsley, palmer, aou, ardb Cc: anup, haicheng.li, ajones, yujie.liu, charlie, linux-riscv, linux-efi, linux-kernel, Xiao Wang This patch leverages the alternative mechanism to dynamically optimize bitops (including __ffs, __fls, ffs, fls) with Zbb instructions. When Zbb ext is not supported by the runtime CPU, legacy implementation is used. If Zbb is supported, then the optimized variants will be selected via alternative patching. The legacy bitops support is taken from the generic C implementation as fallback. If the parameter is a build-time constant, we leverage compiler builtin to calculate the result directly, this approach is inspired by x86 bitops implementation. EFI stub runs before the kernel, so alternative mechanism should not be used there, this patch introduces a macro NO_ALTERNATIVE for this purpose. Signed-off-by: Xiao Wang <xiao.w.wang@intel.com> --- arch/riscv/include/asm/bitops.h | 255 +++++++++++++++++++++++++- drivers/bitopstest/Kconfig | 1 + drivers/firmware/efi/libstub/Makefile | 2 +- 3 files changed, 254 insertions(+), 4 deletions(-) diff --git a/arch/riscv/include/asm/bitops.h b/arch/riscv/include/asm/bitops.h index 3540b690944b..ef35c9ebc2ed 100644 --- a/arch/riscv/include/asm/bitops.h +++ b/arch/riscv/include/asm/bitops.h @@ -15,13 +15,262 @@ #include <asm/barrier.h> #include <asm/bitsperlong.h> +#if !defined(CONFIG_RISCV_ISA_ZBB) || defined(NO_ALTERNATIVE) #include <asm-generic/bitops/__ffs.h> -#include <asm-generic/bitops/ffz.h> -#include <asm-generic/bitops/fls.h> #include <asm-generic/bitops/__fls.h> +#include <asm-generic/bitops/ffs.h> +#include <asm-generic/bitops/fls.h> + +#else +#include <asm/alternative-macros.h> +#include <asm/hwcap.h> + +#if (BITS_PER_LONG == 64) +#define CTZW "ctzw " +#define CLZW "clzw " +#elif (BITS_PER_LONG == 32) +#define CTZW "ctz " +#define CLZW "clz " +#else +#error "Unexpected BITS_PER_LONG" +#endif + +static __always_inline unsigned long variable__ffs(unsigned long word) +{ + int num; + + asm_volatile_goto( + ALTERNATIVE("j %l[legacy]", "nop", 0, RISCV_ISA_EXT_ZBB, 1) + : : : : legacy); + + asm volatile ( + ".option push\n" + ".option arch,+zbb\n" + "ctz %0, %1\n" + ".option pop\n" + : "=r" (word) : "r" (word) :); + + return word; + +legacy: + num = 0; +#if BITS_PER_LONG == 64 + if ((word & 0xffffffff) == 0) { + num += 32; + word >>= 32; + } +#endif + if ((word & 0xffff) == 0) { + num += 16; + word >>= 16; + } + if ((word & 0xff) == 0) { + num += 8; + word >>= 8; + } + if ((word & 0xf) == 0) { + num += 4; + word >>= 4; + } + if ((word & 0x3) == 0) { + num += 2; + word >>= 2; + } + if ((word & 0x1) == 0) + num += 1; + return num; +} + +/** + * __ffs - find first set bit in a long word + * @word: The word to search + * + * Undefined if no set bit exists, so code should check against 0 first. + */ +#define __ffs(word) \ + (__builtin_constant_p(word) ? \ + (unsigned long)__builtin_ctzl(word) : \ + variable__ffs(word)) + +static __always_inline unsigned long variable__fls(unsigned long word) +{ + int num; + + asm_volatile_goto( + ALTERNATIVE("j %l[legacy]", "nop", 0, RISCV_ISA_EXT_ZBB, 1) + : : : : legacy); + + asm volatile ( + ".option push\n" + ".option arch,+zbb\n" + "clz %0, %1\n" + ".option pop\n" + : "=r" (word) : "r" (word) :); + + return BITS_PER_LONG - 1 - word; + +legacy: + num = BITS_PER_LONG - 1; +#if BITS_PER_LONG == 64 + if (!(word & (~0ul << 32))) { + num -= 32; + word <<= 32; + } +#endif + if (!(word & (~0ul << (BITS_PER_LONG-16)))) { + num -= 16; + word <<= 16; + } + if (!(word & (~0ul << (BITS_PER_LONG-8)))) { + num -= 8; + word <<= 8; + } + if (!(word & (~0ul << (BITS_PER_LONG-4)))) { + num -= 4; + word <<= 4; + } + if (!(word & (~0ul << (BITS_PER_LONG-2)))) { + num -= 2; + word <<= 2; + } + if (!(word & (~0ul << (BITS_PER_LONG-1)))) + num -= 1; + return num; +} + +/** + * __fls - find last set bit in a long word + * @word: the word to search + * + * Undefined if no set bit exists, so code should check against 0 first. + */ +#define __fls(word) \ + (__builtin_constant_p(word) ? \ + (unsigned long)(BITS_PER_LONG - 1 - __builtin_clzl(word)) : \ + variable__fls(word)) + +static __always_inline int variable_ffs(int x) +{ + int r; + + if (!x) + return 0; + + asm_volatile_goto( + ALTERNATIVE("j %l[legacy]", "nop", 0, RISCV_ISA_EXT_ZBB, 1) + : : : : legacy); + + asm volatile ( + ".option push\n" + ".option arch,+zbb\n" + CTZW "%0, %1\n" + ".option pop\n" + : "=r" (r) : "r" (x) :); + + return r + 1; + +legacy: + r = 1; + if (!(x & 0xffff)) { + x >>= 16; + r += 16; + } + if (!(x & 0xff)) { + x >>= 8; + r += 8; + } + if (!(x & 0xf)) { + x >>= 4; + r += 4; + } + if (!(x & 3)) { + x >>= 2; + r += 2; + } + if (!(x & 1)) { + x >>= 1; + r += 1; + } + return r; +} + +/** + * ffs - find first set bit in a word + * @x: the word to search + * + * This is defined the same way as the libc and compiler builtin ffs routines. + * + * ffs(value) returns 0 if value is 0 or the position of the first set bit if + * value is nonzero. The first (least significant) bit is at position 1. + */ +#define ffs(x) (__builtin_constant_p(x) ? __builtin_ffs(x) : variable_ffs(x)) + +static __always_inline int variable_fls(unsigned int x) +{ + int r; + + if (!x) + return 0; + + asm_volatile_goto( + ALTERNATIVE("j %l[legacy]", "nop", 0, RISCV_ISA_EXT_ZBB, 1) + : : : : legacy); + + asm volatile ( + ".option push\n" + ".option arch,+zbb\n" + CLZW "%0, %1\n" + ".option pop\n" + : "=r" (r) : "r" (x) :); + + return 32 - r; + +legacy: + r = 32; + if (!(x & 0xffff0000u)) { + x <<= 16; + r -= 16; + } + if (!(x & 0xff000000u)) { + x <<= 8; + r -= 8; + } + if (!(x & 0xf0000000u)) { + x <<= 4; + r -= 4; + } + if (!(x & 0xc0000000u)) { + x <<= 2; + r -= 2; + } + if (!(x & 0x80000000u)) { + x <<= 1; + r -= 1; + } + return r; +} + +/** + * fls - find last set bit in a word + * @x: the word to search + * + * This is defined in a similar way as ffs, but returns the position of the most + * significant set bit. + * + * fls(value) returns 0 if value is 0 or the position of the last set bit if + * value is nonzero. The last (most significant) bit is at position 32. + */ +#define fls(x) \ + (__builtin_constant_p(x) ? \ + (int)(((x) != 0) ? \ + (sizeof(unsigned int) * 8 - __builtin_clz(x)) : 0) : \ + variable_fls(x)) + +#endif /* !defined(CONFIG_RISCV_ISA_ZBB) || defined(NO_ALTERNATIVE) */ + +#include <asm-generic/bitops/ffz.h> #include <asm-generic/bitops/fls64.h> #include <asm-generic/bitops/sched.h> -#include <asm-generic/bitops/ffs.h> #include <asm-generic/bitops/hweight.h> diff --git a/drivers/bitopstest/Kconfig b/drivers/bitopstest/Kconfig index d0e2af4b801e..6ef6dcd41d49 100644 --- a/drivers/bitopstest/Kconfig +++ b/drivers/bitopstest/Kconfig @@ -1,6 +1,7 @@ # SPDX-License-Identifier: GPL-2.0-only menuconfig BITOPSTEST tristate "self test for bitops optimization" + default y help Enable this to test the bitops APIs. diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile index a1157c2a7170..d68cacd4e3af 100644 --- a/drivers/firmware/efi/libstub/Makefile +++ b/drivers/firmware/efi/libstub/Makefile @@ -28,7 +28,7 @@ cflags-$(CONFIG_ARM) += -DEFI_HAVE_STRLEN -DEFI_HAVE_STRNLEN \ -DEFI_HAVE_MEMCHR -DEFI_HAVE_STRRCHR \ -DEFI_HAVE_STRCMP -fno-builtin -fpic \ $(call cc-option,-mno-single-pic-base) -cflags-$(CONFIG_RISCV) += -fpic +cflags-$(CONFIG_RISCV) += -fpic -DNO_ALTERNATIVE cflags-$(CONFIG_LOONGARCH) += -fpie cflags-$(CONFIG_EFI_PARAMS_FROM_FDT) += -I$(srctree)/scripts/dtc/libfdt -- 2.25.1 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH v4 2/2] riscv: Optimize bitops with Zbb extension 2023-10-30 6:39 ` [PATCH v4 2/2] riscv: Optimize bitops with Zbb extension Xiao Wang @ 2023-10-30 20:36 ` Charlie Jenkins 2023-10-31 1:53 ` Wang, Xiao W 0 siblings, 1 reply; 7+ messages in thread From: Charlie Jenkins @ 2023-10-30 20:36 UTC (permalink / raw) To: Xiao Wang Cc: paul.walmsley, palmer, aou, ardb, anup, haicheng.li, ajones, yujie.liu, linux-riscv, linux-efi, linux-kernel On Mon, Oct 30, 2023 at 02:39:04PM +0800, Xiao Wang wrote: > This patch leverages the alternative mechanism to dynamically optimize > bitops (including __ffs, __fls, ffs, fls) with Zbb instructions. When > Zbb ext is not supported by the runtime CPU, legacy implementation is > used. If Zbb is supported, then the optimized variants will be selected > via alternative patching. > > The legacy bitops support is taken from the generic C implementation as > fallback. > > If the parameter is a build-time constant, we leverage compiler builtin to > calculate the result directly, this approach is inspired by x86 bitops > implementation. > > EFI stub runs before the kernel, so alternative mechanism should not be > used there, this patch introduces a macro NO_ALTERNATIVE for this purpose. > > Signed-off-by: Xiao Wang <xiao.w.wang@intel.com> > --- > arch/riscv/include/asm/bitops.h | 255 +++++++++++++++++++++++++- > drivers/bitopstest/Kconfig | 1 + > drivers/firmware/efi/libstub/Makefile | 2 +- > 3 files changed, 254 insertions(+), 4 deletions(-) > > diff --git a/arch/riscv/include/asm/bitops.h b/arch/riscv/include/asm/bitops.h > index 3540b690944b..ef35c9ebc2ed 100644 > --- a/arch/riscv/include/asm/bitops.h > +++ b/arch/riscv/include/asm/bitops.h > @@ -15,13 +15,262 @@ > #include <asm/barrier.h> > #include <asm/bitsperlong.h> > > +#if !defined(CONFIG_RISCV_ISA_ZBB) || defined(NO_ALTERNATIVE) > #include <asm-generic/bitops/__ffs.h> > -#include <asm-generic/bitops/ffz.h> > -#include <asm-generic/bitops/fls.h> > #include <asm-generic/bitops/__fls.h> > +#include <asm-generic/bitops/ffs.h> > +#include <asm-generic/bitops/fls.h> > + > +#else > +#include <asm/alternative-macros.h> > +#include <asm/hwcap.h> > + > +#if (BITS_PER_LONG == 64) > +#define CTZW "ctzw " > +#define CLZW "clzw " > +#elif (BITS_PER_LONG == 32) > +#define CTZW "ctz " > +#define CLZW "clz " > +#else > +#error "Unexpected BITS_PER_LONG" > +#endif > + > +static __always_inline unsigned long variable__ffs(unsigned long word) > +{ > + int num; > + > + asm_volatile_goto( > + ALTERNATIVE("j %l[legacy]", "nop", 0, RISCV_ISA_EXT_ZBB, 1) > + : : : : legacy); > + On this and following asm blocks, checkpatch outputs: "Lines should not end with a '('". > + asm volatile ( > + ".option push\n" > + ".option arch,+zbb\n" > + "ctz %0, %1\n" > + ".option pop\n" > + : "=r" (word) : "r" (word) :); > + > + return word; > + > +legacy: > + num = 0; > +#if BITS_PER_LONG == 64 > + if ((word & 0xffffffff) == 0) { > + num += 32; > + word >>= 32; > + } > +#endif > + if ((word & 0xffff) == 0) { > + num += 16; > + word >>= 16; > + } > + if ((word & 0xff) == 0) { > + num += 8; > + word >>= 8; > + } > + if ((word & 0xf) == 0) { > + num += 4; > + word >>= 4; > + } > + if ((word & 0x3) == 0) { > + num += 2; > + word >>= 2; > + } > + if ((word & 0x1) == 0) > + num += 1; > + return num; > +} > + > +/** > + * __ffs - find first set bit in a long word > + * @word: The word to search > + * > + * Undefined if no set bit exists, so code should check against 0 first. > + */ > +#define __ffs(word) \ > + (__builtin_constant_p(word) ? \ > + (unsigned long)__builtin_ctzl(word) : \ > + variable__ffs(word)) > + > +static __always_inline unsigned long variable__fls(unsigned long word) > +{ > + int num; > + > + asm_volatile_goto( > + ALTERNATIVE("j %l[legacy]", "nop", 0, RISCV_ISA_EXT_ZBB, 1) > + : : : : legacy); > + > + asm volatile ( > + ".option push\n" > + ".option arch,+zbb\n" > + "clz %0, %1\n" > + ".option pop\n" > + : "=r" (word) : "r" (word) :); > + > + return BITS_PER_LONG - 1 - word; > + > +legacy: > + num = BITS_PER_LONG - 1; > +#if BITS_PER_LONG == 64 > + if (!(word & (~0ul << 32))) { > + num -= 32; > + word <<= 32; > + } > +#endif > + if (!(word & (~0ul << (BITS_PER_LONG-16)))) { > + num -= 16; > + word <<= 16; > + } > + if (!(word & (~0ul << (BITS_PER_LONG-8)))) { > + num -= 8; > + word <<= 8; > + } > + if (!(word & (~0ul << (BITS_PER_LONG-4)))) { > + num -= 4; > + word <<= 4; > + } > + if (!(word & (~0ul << (BITS_PER_LONG-2)))) { > + num -= 2; > + word <<= 2; > + } > + if (!(word & (~0ul << (BITS_PER_LONG-1)))) > + num -= 1; > + return num; > +} > + > +/** > + * __fls - find last set bit in a long word > + * @word: the word to search > + * > + * Undefined if no set bit exists, so code should check against 0 first. > + */ > +#define __fls(word) \ > + (__builtin_constant_p(word) ? \ > + (unsigned long)(BITS_PER_LONG - 1 - __builtin_clzl(word)) : \ > + variable__fls(word)) > + > +static __always_inline int variable_ffs(int x) > +{ > + int r; > + > + if (!x) > + return 0; > + > + asm_volatile_goto( > + ALTERNATIVE("j %l[legacy]", "nop", 0, RISCV_ISA_EXT_ZBB, 1) > + : : : : legacy); > + > + asm volatile ( > + ".option push\n" > + ".option arch,+zbb\n" > + CTZW "%0, %1\n" > + ".option pop\n" > + : "=r" (r) : "r" (x) :); > + > + return r + 1; > + > +legacy: > + r = 1; > + if (!(x & 0xffff)) { > + x >>= 16; > + r += 16; > + } > + if (!(x & 0xff)) { > + x >>= 8; > + r += 8; > + } > + if (!(x & 0xf)) { > + x >>= 4; > + r += 4; > + } > + if (!(x & 3)) { > + x >>= 2; > + r += 2; > + } > + if (!(x & 1)) { > + x >>= 1; > + r += 1; > + } > + return r; > +} > + > +/** > + * ffs - find first set bit in a word > + * @x: the word to search > + * > + * This is defined the same way as the libc and compiler builtin ffs routines. > + * > + * ffs(value) returns 0 if value is 0 or the position of the first set bit if > + * value is nonzero. The first (least significant) bit is at position 1. > + */ > +#define ffs(x) (__builtin_constant_p(x) ? __builtin_ffs(x) : variable_ffs(x)) > + > +static __always_inline int variable_fls(unsigned int x) > +{ > + int r; > + > + if (!x) > + return 0; > + > + asm_volatile_goto( > + ALTERNATIVE("j %l[legacy]", "nop", 0, RISCV_ISA_EXT_ZBB, 1) > + : : : : legacy); > + > + asm volatile ( > + ".option push\n" > + ".option arch,+zbb\n" > + CLZW "%0, %1\n" > + ".option pop\n" > + : "=r" (r) : "r" (x) :); > + > + return 32 - r; > + > +legacy: > + r = 32; > + if (!(x & 0xffff0000u)) { > + x <<= 16; > + r -= 16; > + } > + if (!(x & 0xff000000u)) { > + x <<= 8; > + r -= 8; > + } > + if (!(x & 0xf0000000u)) { > + x <<= 4; > + r -= 4; > + } > + if (!(x & 0xc0000000u)) { > + x <<= 2; > + r -= 2; > + } > + if (!(x & 0x80000000u)) { > + x <<= 1; > + r -= 1; > + } > + return r; > +} > + > +/** > + * fls - find last set bit in a word > + * @x: the word to search > + * > + * This is defined in a similar way as ffs, but returns the position of the most > + * significant set bit. > + * > + * fls(value) returns 0 if value is 0 or the position of the last set bit if > + * value is nonzero. The last (most significant) bit is at position 32. > + */ > +#define fls(x) \ > + (__builtin_constant_p(x) ? \ > + (int)(((x) != 0) ? \ > + (sizeof(unsigned int) * 8 - __builtin_clz(x)) : 0) : \ > + variable_fls(x)) > + Checkpath complains: "Macro argument reuse 'x' - possible side-effects" > +#endif /* !defined(CONFIG_RISCV_ISA_ZBB) || defined(NO_ALTERNATIVE) */ > + > +#include <asm-generic/bitops/ffz.h> > #include <asm-generic/bitops/fls64.h> > #include <asm-generic/bitops/sched.h> > -#include <asm-generic/bitops/ffs.h> > > #include <asm-generic/bitops/hweight.h> > > diff --git a/drivers/bitopstest/Kconfig b/drivers/bitopstest/Kconfig > index d0e2af4b801e..6ef6dcd41d49 100644 > --- a/drivers/bitopstest/Kconfig > +++ b/drivers/bitopstest/Kconfig > @@ -1,6 +1,7 @@ > # SPDX-License-Identifier: GPL-2.0-only > menuconfig BITOPSTEST > tristate "self test for bitops optimization" > + default y > help > Enable this to test the bitops APIs. Is this a test you wanted to add? The source code isn't included. - Charlie > > diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile > index a1157c2a7170..d68cacd4e3af 100644 > --- a/drivers/firmware/efi/libstub/Makefile > +++ b/drivers/firmware/efi/libstub/Makefile > @@ -28,7 +28,7 @@ cflags-$(CONFIG_ARM) += -DEFI_HAVE_STRLEN -DEFI_HAVE_STRNLEN \ > -DEFI_HAVE_MEMCHR -DEFI_HAVE_STRRCHR \ > -DEFI_HAVE_STRCMP -fno-builtin -fpic \ > $(call cc-option,-mno-single-pic-base) > -cflags-$(CONFIG_RISCV) += -fpic > +cflags-$(CONFIG_RISCV) += -fpic -DNO_ALTERNATIVE > cflags-$(CONFIG_LOONGARCH) += -fpie > > cflags-$(CONFIG_EFI_PARAMS_FROM_FDT) += -I$(srctree)/scripts/dtc/libfdt > -- > 2.25.1 > ^ permalink raw reply [flat|nested] 7+ messages in thread
* RE: [PATCH v4 2/2] riscv: Optimize bitops with Zbb extension 2023-10-30 20:36 ` Charlie Jenkins @ 2023-10-31 1:53 ` Wang, Xiao W 2023-10-31 2:02 ` Charlie Jenkins 0 siblings, 1 reply; 7+ messages in thread From: Wang, Xiao W @ 2023-10-31 1:53 UTC (permalink / raw) To: Charlie Jenkins Cc: paul.walmsley@sifive.com, palmer@dabbelt.com, aou@eecs.berkeley.edu, ardb@kernel.org, anup@brainfault.org, Li, Haicheng, ajones@ventanamicro.com, Liu, Yujie, linux-riscv@lists.infradead.org, linux-efi@vger.kernel.org, linux-kernel@vger.kernel.org > -----Original Message----- > From: Charlie Jenkins <charlie@rivosinc.com> > Sent: Tuesday, October 31, 2023 4:37 AM > To: Wang, Xiao W <xiao.w.wang@intel.com> > Cc: paul.walmsley@sifive.com; palmer@dabbelt.com; > aou@eecs.berkeley.edu; ardb@kernel.org; anup@brainfault.org; Li, Haicheng > <haicheng.li@intel.com>; ajones@ventanamicro.com; Liu, Yujie > <yujie.liu@intel.com>; linux-riscv@lists.infradead.org; linux- > efi@vger.kernel.org; linux-kernel@vger.kernel.org > Subject: Re: [PATCH v4 2/2] riscv: Optimize bitops with Zbb extension > > On Mon, Oct 30, 2023 at 02:39:04PM +0800, Xiao Wang wrote: > > This patch leverages the alternative mechanism to dynamically optimize > > bitops (including __ffs, __fls, ffs, fls) with Zbb instructions. When > > Zbb ext is not supported by the runtime CPU, legacy implementation is > > used. If Zbb is supported, then the optimized variants will be selected > > via alternative patching. > > > > The legacy bitops support is taken from the generic C implementation as > > fallback. > > > > If the parameter is a build-time constant, we leverage compiler builtin to > > calculate the result directly, this approach is inspired by x86 bitops > > implementation. > > > > EFI stub runs before the kernel, so alternative mechanism should not be > > used there, this patch introduces a macro NO_ALTERNATIVE for this > purpose. > > > > Signed-off-by: Xiao Wang <xiao.w.wang@intel.com> > > --- > > arch/riscv/include/asm/bitops.h | 255 +++++++++++++++++++++++++- > > drivers/bitopstest/Kconfig | 1 + > > drivers/firmware/efi/libstub/Makefile | 2 +- > > 3 files changed, 254 insertions(+), 4 deletions(-) > > > > diff --git a/arch/riscv/include/asm/bitops.h > b/arch/riscv/include/asm/bitops.h > > index 3540b690944b..ef35c9ebc2ed 100644 > > --- a/arch/riscv/include/asm/bitops.h > > +++ b/arch/riscv/include/asm/bitops.h > > @@ -15,13 +15,262 @@ > > #include <asm/barrier.h> > > #include <asm/bitsperlong.h> > > > > +#if !defined(CONFIG_RISCV_ISA_ZBB) || defined(NO_ALTERNATIVE) > > #include <asm-generic/bitops/__ffs.h> > > -#include <asm-generic/bitops/ffz.h> > > -#include <asm-generic/bitops/fls.h> > > #include <asm-generic/bitops/__fls.h> > > +#include <asm-generic/bitops/ffs.h> > > +#include <asm-generic/bitops/fls.h> > > + > > +#else > > +#include <asm/alternative-macros.h> > > +#include <asm/hwcap.h> > > + > > +#if (BITS_PER_LONG == 64) > > +#define CTZW "ctzw " > > +#define CLZW "clzw " > > +#elif (BITS_PER_LONG == 32) > > +#define CTZW "ctz " > > +#define CLZW "clz " > > +#else > > +#error "Unexpected BITS_PER_LONG" > > +#endif > > + > > +static __always_inline unsigned long variable__ffs(unsigned long word) > > +{ > > + int num; > > + > > + asm_volatile_goto( > > + ALTERNATIVE("j %l[legacy]", "nop", 0, RISCV_ISA_EXT_ZBB, 1) > > + : : : : legacy); > > + > > On this and following asm blocks, checkpatch outputs: "Lines should not > end with a '('". I did below check, but I got no warning. # ./scripts/checkpatch.pl v4-0002-riscv-Optimize-bitops-with-Zbb-extension.patch total: 0 errors, 0 warnings, 280 lines checked May I know how you do the check? BTW, I see arch/riscv/include/asm/jump_label.h and arch/riscv/include/asm/cpufeature.h have similar code. > > > + asm volatile ( > > + ".option push\n" > > + ".option arch,+zbb\n" > > + "ctz %0, %1\n" > > + ".option pop\n" > > + : "=r" (word) : "r" (word) :); > > + > > + return word; > > + > > +legacy: > > + num = 0; > > +#if BITS_PER_LONG == 64 > > + if ((word & 0xffffffff) == 0) { > > + num += 32; > > + word >>= 32; > > + } > > +#endif > > + if ((word & 0xffff) == 0) { > > + num += 16; > > + word >>= 16; > > + } > > + if ((word & 0xff) == 0) { > > + num += 8; > > + word >>= 8; > > + } > > + if ((word & 0xf) == 0) { > > + num += 4; > > + word >>= 4; > > + } > > + if ((word & 0x3) == 0) { > > + num += 2; > > + word >>= 2; > > + } > > + if ((word & 0x1) == 0) > > + num += 1; > > + return num; > > +} > > + > > +/** > > + * __ffs - find first set bit in a long word > > + * @word: The word to search > > + * > > + * Undefined if no set bit exists, so code should check against 0 first. > > + */ > > +#define __ffs(word) \ > > + (__builtin_constant_p(word) ? \ > > + (unsigned long)__builtin_ctzl(word) : \ > > + variable__ffs(word)) > > + > > +static __always_inline unsigned long variable__fls(unsigned long word) > > +{ > > + int num; > > + > > + asm_volatile_goto( > > + ALTERNATIVE("j %l[legacy]", "nop", 0, RISCV_ISA_EXT_ZBB, 1) > > + : : : : legacy); > > + > > + asm volatile ( > > + ".option push\n" > > + ".option arch,+zbb\n" > > + "clz %0, %1\n" > > + ".option pop\n" > > + : "=r" (word) : "r" (word) :); > > + > > + return BITS_PER_LONG - 1 - word; > > + > > +legacy: > > + num = BITS_PER_LONG - 1; > > +#if BITS_PER_LONG == 64 > > + if (!(word & (~0ul << 32))) { > > + num -= 32; > > + word <<= 32; > > + } > > +#endif > > + if (!(word & (~0ul << (BITS_PER_LONG-16)))) { > > + num -= 16; > > + word <<= 16; > > + } > > + if (!(word & (~0ul << (BITS_PER_LONG-8)))) { > > + num -= 8; > > + word <<= 8; > > + } > > + if (!(word & (~0ul << (BITS_PER_LONG-4)))) { > > + num -= 4; > > + word <<= 4; > > + } > > + if (!(word & (~0ul << (BITS_PER_LONG-2)))) { > > + num -= 2; > > + word <<= 2; > > + } > > + if (!(word & (~0ul << (BITS_PER_LONG-1)))) > > + num -= 1; > > + return num; > > +} > > + > > +/** > > + * __fls - find last set bit in a long word > > + * @word: the word to search > > + * > > + * Undefined if no set bit exists, so code should check against 0 first. > > + */ > > +#define __fls(word) \ > > + (__builtin_constant_p(word) ? \ > > + (unsigned long)(BITS_PER_LONG - 1 - __builtin_clzl(word)) : \ > > + variable__fls(word)) > > + > > +static __always_inline int variable_ffs(int x) > > +{ > > + int r; > > + > > + if (!x) > > + return 0; > > + > > + asm_volatile_goto( > > + ALTERNATIVE("j %l[legacy]", "nop", 0, RISCV_ISA_EXT_ZBB, 1) > > + : : : : legacy); > > + > > + asm volatile ( > > + ".option push\n" > > + ".option arch,+zbb\n" > > + CTZW "%0, %1\n" > > + ".option pop\n" > > + : "=r" (r) : "r" (x) :); > > + > > + return r + 1; > > + > > +legacy: > > + r = 1; > > + if (!(x & 0xffff)) { > > + x >>= 16; > > + r += 16; > > + } > > + if (!(x & 0xff)) { > > + x >>= 8; > > + r += 8; > > + } > > + if (!(x & 0xf)) { > > + x >>= 4; > > + r += 4; > > + } > > + if (!(x & 3)) { > > + x >>= 2; > > + r += 2; > > + } > > + if (!(x & 1)) { > > + x >>= 1; > > + r += 1; > > + } > > + return r; > > +} > > + > > +/** > > + * ffs - find first set bit in a word > > + * @x: the word to search > > + * > > + * This is defined the same way as the libc and compiler builtin ffs routines. > > + * > > + * ffs(value) returns 0 if value is 0 or the position of the first set bit if > > + * value is nonzero. The first (least significant) bit is at position 1. > > + */ > > +#define ffs(x) (__builtin_constant_p(x) ? __builtin_ffs(x) : variable_ffs(x)) > > + > > +static __always_inline int variable_fls(unsigned int x) > > +{ > > + int r; > > + > > + if (!x) > > + return 0; > > + > > + asm_volatile_goto( > > + ALTERNATIVE("j %l[legacy]", "nop", 0, RISCV_ISA_EXT_ZBB, 1) > > + : : : : legacy); > > + > > + asm volatile ( > > + ".option push\n" > > + ".option arch,+zbb\n" > > + CLZW "%0, %1\n" > > + ".option pop\n" > > + : "=r" (r) : "r" (x) :); > > + > > + return 32 - r; > > + > > +legacy: > > + r = 32; > > + if (!(x & 0xffff0000u)) { > > + x <<= 16; > > + r -= 16; > > + } > > + if (!(x & 0xff000000u)) { > > + x <<= 8; > > + r -= 8; > > + } > > + if (!(x & 0xf0000000u)) { > > + x <<= 4; > > + r -= 4; > > + } > > + if (!(x & 0xc0000000u)) { > > + x <<= 2; > > + r -= 2; > > + } > > + if (!(x & 0x80000000u)) { > > + x <<= 1; > > + r -= 1; > > + } > > + return r; > > +} > > + > > +/** > > + * fls - find last set bit in a word > > + * @x: the word to search > > + * > > + * This is defined in a similar way as ffs, but returns the position of the most > > + * significant set bit. > > + * > > + * fls(value) returns 0 if value is 0 or the position of the last set bit if > > + * value is nonzero. The last (most significant) bit is at position 32. > > + */ > > +#define fls(x) > \ > > + (__builtin_constant_p(x) ? \ > > + (int)(((x) != 0) ? \ > > + (sizeof(unsigned int) * 8 - __builtin_clz(x)) : 0) : \ > > + variable_fls(x)) > > + > > Checkpath complains: "Macro argument reuse 'x' - possible side-effects" > Ditto. > > +#endif /* !defined(CONFIG_RISCV_ISA_ZBB) || defined(NO_ALTERNATIVE) > */ > > + > > +#include <asm-generic/bitops/ffz.h> > > #include <asm-generic/bitops/fls64.h> > > #include <asm-generic/bitops/sched.h> > > -#include <asm-generic/bitops/ffs.h> > > > > #include <asm-generic/bitops/hweight.h> > > > > diff --git a/drivers/bitopstest/Kconfig b/drivers/bitopstest/Kconfig > > index d0e2af4b801e..6ef6dcd41d49 100644 > > --- a/drivers/bitopstest/Kconfig > > +++ b/drivers/bitopstest/Kconfig > > @@ -1,6 +1,7 @@ > > # SPDX-License-Identifier: GPL-2.0-only > > menuconfig BITOPSTEST > > tristate "self test for bitops optimization" > > + default y > > help > > Enable this to test the bitops APIs. > > Is this a test you wanted to add? The source code isn't included. Sorry, I mistakenly did a "git add" for my local test. Will drop it. BRs, Xiao > > - Charlie > > > > > diff --git a/drivers/firmware/efi/libstub/Makefile > b/drivers/firmware/efi/libstub/Makefile > > index a1157c2a7170..d68cacd4e3af 100644 > > --- a/drivers/firmware/efi/libstub/Makefile > > +++ b/drivers/firmware/efi/libstub/Makefile > > @@ -28,7 +28,7 @@ cflags-$(CONFIG_ARM) += - > DEFI_HAVE_STRLEN -DEFI_HAVE_STRNLEN \ > > -DEFI_HAVE_MEMCHR - > DEFI_HAVE_STRRCHR \ > > -DEFI_HAVE_STRCMP -fno-builtin -fpic \ > > $(call cc-option,-mno-single-pic-base) > > -cflags-$(CONFIG_RISCV) += -fpic > > +cflags-$(CONFIG_RISCV) += -fpic -DNO_ALTERNATIVE > > cflags-$(CONFIG_LOONGARCH) += -fpie > > > > cflags-$(CONFIG_EFI_PARAMS_FROM_FDT) += - > I$(srctree)/scripts/dtc/libfdt > > -- > > 2.25.1 > > ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v4 2/2] riscv: Optimize bitops with Zbb extension 2023-10-31 1:53 ` Wang, Xiao W @ 2023-10-31 2:02 ` Charlie Jenkins 2023-10-31 7:52 ` Conor Dooley 0 siblings, 1 reply; 7+ messages in thread From: Charlie Jenkins @ 2023-10-31 2:02 UTC (permalink / raw) To: Wang, Xiao W Cc: paul.walmsley@sifive.com, palmer@dabbelt.com, aou@eecs.berkeley.edu, ardb@kernel.org, anup@brainfault.org, Li, Haicheng, ajones@ventanamicro.com, Liu, Yujie, linux-riscv@lists.infradead.org, linux-efi@vger.kernel.org, linux-kernel@vger.kernel.org On Tue, Oct 31, 2023 at 01:53:55AM +0000, Wang, Xiao W wrote: > > > > -----Original Message----- > > From: Charlie Jenkins <charlie@rivosinc.com> > > Sent: Tuesday, October 31, 2023 4:37 AM > > To: Wang, Xiao W <xiao.w.wang@intel.com> > > Cc: paul.walmsley@sifive.com; palmer@dabbelt.com; > > aou@eecs.berkeley.edu; ardb@kernel.org; anup@brainfault.org; Li, Haicheng > > <haicheng.li@intel.com>; ajones@ventanamicro.com; Liu, Yujie > > <yujie.liu@intel.com>; linux-riscv@lists.infradead.org; linux- > > efi@vger.kernel.org; linux-kernel@vger.kernel.org > > Subject: Re: [PATCH v4 2/2] riscv: Optimize bitops with Zbb extension > > > > On Mon, Oct 30, 2023 at 02:39:04PM +0800, Xiao Wang wrote: > > > This patch leverages the alternative mechanism to dynamically optimize > > > bitops (including __ffs, __fls, ffs, fls) with Zbb instructions. When > > > Zbb ext is not supported by the runtime CPU, legacy implementation is > > > used. If Zbb is supported, then the optimized variants will be selected > > > via alternative patching. > > > > > > The legacy bitops support is taken from the generic C implementation as > > > fallback. > > > > > > If the parameter is a build-time constant, we leverage compiler builtin to > > > calculate the result directly, this approach is inspired by x86 bitops > > > implementation. > > > > > > EFI stub runs before the kernel, so alternative mechanism should not be > > > used there, this patch introduces a macro NO_ALTERNATIVE for this > > purpose. > > > > > > Signed-off-by: Xiao Wang <xiao.w.wang@intel.com> > > > --- > > > arch/riscv/include/asm/bitops.h | 255 +++++++++++++++++++++++++- > > > drivers/bitopstest/Kconfig | 1 + > > > drivers/firmware/efi/libstub/Makefile | 2 +- > > > 3 files changed, 254 insertions(+), 4 deletions(-) > > > > > > diff --git a/arch/riscv/include/asm/bitops.h > > b/arch/riscv/include/asm/bitops.h > > > index 3540b690944b..ef35c9ebc2ed 100644 > > > --- a/arch/riscv/include/asm/bitops.h > > > +++ b/arch/riscv/include/asm/bitops.h > > > @@ -15,13 +15,262 @@ > > > #include <asm/barrier.h> > > > #include <asm/bitsperlong.h> > > > > > > +#if !defined(CONFIG_RISCV_ISA_ZBB) || defined(NO_ALTERNATIVE) > > > #include <asm-generic/bitops/__ffs.h> > > > -#include <asm-generic/bitops/ffz.h> > > > -#include <asm-generic/bitops/fls.h> > > > #include <asm-generic/bitops/__fls.h> > > > +#include <asm-generic/bitops/ffs.h> > > > +#include <asm-generic/bitops/fls.h> > > > + > > > +#else > > > +#include <asm/alternative-macros.h> > > > +#include <asm/hwcap.h> > > > + > > > +#if (BITS_PER_LONG == 64) > > > +#define CTZW "ctzw " > > > +#define CLZW "clzw " > > > +#elif (BITS_PER_LONG == 32) > > > +#define CTZW "ctz " > > > +#define CLZW "clz " > > > +#else > > > +#error "Unexpected BITS_PER_LONG" > > > +#endif > > > + > > > +static __always_inline unsigned long variable__ffs(unsigned long word) > > > +{ > > > + int num; > > > + > > > + asm_volatile_goto( > > > + ALTERNATIVE("j %l[legacy]", "nop", 0, RISCV_ISA_EXT_ZBB, 1) > > > + : : : : legacy); > > > + > > > > On this and following asm blocks, checkpatch outputs: "Lines should not > > end with a '('". > > I did below check, but I got no warning. > # ./scripts/checkpatch.pl v4-0002-riscv-Optimize-bitops-with-Zbb-extension.patch > total: 0 errors, 0 warnings, 280 lines checked > May I know how you do the check? > BTW, I see arch/riscv/include/asm/jump_label.h and arch/riscv/include/asm/cpufeature.h have similar code. I normally use the --strict flag since that is what the Patchwork server uses. > > > > > > + asm volatile ( > > > + ".option push\n" > > > + ".option arch,+zbb\n" > > > + "ctz %0, %1\n" > > > + ".option pop\n" > > > + : "=r" (word) : "r" (word) :); > > > + > > > + return word; > > > + > > > +legacy: > > > + num = 0; > > > +#if BITS_PER_LONG == 64 > > > + if ((word & 0xffffffff) == 0) { > > > + num += 32; > > > + word >>= 32; > > > + } > > > +#endif > > > + if ((word & 0xffff) == 0) { > > > + num += 16; > > > + word >>= 16; > > > + } > > > + if ((word & 0xff) == 0) { > > > + num += 8; > > > + word >>= 8; > > > + } > > > + if ((word & 0xf) == 0) { > > > + num += 4; > > > + word >>= 4; > > > + } > > > + if ((word & 0x3) == 0) { > > > + num += 2; > > > + word >>= 2; > > > + } > > > + if ((word & 0x1) == 0) > > > + num += 1; > > > + return num; > > > +} > > > + > > > +/** > > > + * __ffs - find first set bit in a long word > > > + * @word: The word to search > > > + * > > > + * Undefined if no set bit exists, so code should check against 0 first. > > > + */ > > > +#define __ffs(word) \ > > > + (__builtin_constant_p(word) ? \ > > > + (unsigned long)__builtin_ctzl(word) : \ > > > + variable__ffs(word)) > > > + > > > +static __always_inline unsigned long variable__fls(unsigned long word) > > > +{ > > > + int num; > > > + > > > + asm_volatile_goto( > > > + ALTERNATIVE("j %l[legacy]", "nop", 0, RISCV_ISA_EXT_ZBB, 1) > > > + : : : : legacy); > > > + > > > + asm volatile ( > > > + ".option push\n" > > > + ".option arch,+zbb\n" > > > + "clz %0, %1\n" > > > + ".option pop\n" > > > + : "=r" (word) : "r" (word) :); > > > + > > > + return BITS_PER_LONG - 1 - word; > > > + > > > +legacy: > > > + num = BITS_PER_LONG - 1; > > > +#if BITS_PER_LONG == 64 > > > + if (!(word & (~0ul << 32))) { > > > + num -= 32; > > > + word <<= 32; > > > + } > > > +#endif > > > + if (!(word & (~0ul << (BITS_PER_LONG-16)))) { > > > + num -= 16; > > > + word <<= 16; > > > + } > > > + if (!(word & (~0ul << (BITS_PER_LONG-8)))) { > > > + num -= 8; > > > + word <<= 8; > > > + } > > > + if (!(word & (~0ul << (BITS_PER_LONG-4)))) { > > > + num -= 4; > > > + word <<= 4; > > > + } > > > + if (!(word & (~0ul << (BITS_PER_LONG-2)))) { > > > + num -= 2; > > > + word <<= 2; > > > + } > > > + if (!(word & (~0ul << (BITS_PER_LONG-1)))) > > > + num -= 1; > > > + return num; > > > +} > > > + > > > +/** > > > + * __fls - find last set bit in a long word > > > + * @word: the word to search > > > + * > > > + * Undefined if no set bit exists, so code should check against 0 first. > > > + */ > > > +#define __fls(word) \ > > > + (__builtin_constant_p(word) ? \ > > > + (unsigned long)(BITS_PER_LONG - 1 - __builtin_clzl(word)) : \ > > > + variable__fls(word)) > > > + > > > +static __always_inline int variable_ffs(int x) > > > +{ > > > + int r; > > > + > > > + if (!x) > > > + return 0; > > > + > > > + asm_volatile_goto( > > > + ALTERNATIVE("j %l[legacy]", "nop", 0, RISCV_ISA_EXT_ZBB, 1) > > > + : : : : legacy); > > > + > > > + asm volatile ( > > > + ".option push\n" > > > + ".option arch,+zbb\n" > > > + CTZW "%0, %1\n" > > > + ".option pop\n" > > > + : "=r" (r) : "r" (x) :); > > > + > > > + return r + 1; > > > + > > > +legacy: > > > + r = 1; > > > + if (!(x & 0xffff)) { > > > + x >>= 16; > > > + r += 16; > > > + } > > > + if (!(x & 0xff)) { > > > + x >>= 8; > > > + r += 8; > > > + } > > > + if (!(x & 0xf)) { > > > + x >>= 4; > > > + r += 4; > > > + } > > > + if (!(x & 3)) { > > > + x >>= 2; > > > + r += 2; > > > + } > > > + if (!(x & 1)) { > > > + x >>= 1; > > > + r += 1; > > > + } > > > + return r; > > > +} > > > + > > > +/** > > > + * ffs - find first set bit in a word > > > + * @x: the word to search > > > + * > > > + * This is defined the same way as the libc and compiler builtin ffs routines. > > > + * > > > + * ffs(value) returns 0 if value is 0 or the position of the first set bit if > > > + * value is nonzero. The first (least significant) bit is at position 1. > > > + */ > > > +#define ffs(x) (__builtin_constant_p(x) ? __builtin_ffs(x) : variable_ffs(x)) > > > + > > > +static __always_inline int variable_fls(unsigned int x) > > > +{ > > > + int r; > > > + > > > + if (!x) > > > + return 0; > > > + > > > + asm_volatile_goto( > > > + ALTERNATIVE("j %l[legacy]", "nop", 0, RISCV_ISA_EXT_ZBB, 1) > > > + : : : : legacy); > > > + > > > + asm volatile ( > > > + ".option push\n" > > > + ".option arch,+zbb\n" > > > + CLZW "%0, %1\n" > > > + ".option pop\n" > > > + : "=r" (r) : "r" (x) :); > > > + > > > + return 32 - r; > > > + > > > +legacy: > > > + r = 32; > > > + if (!(x & 0xffff0000u)) { > > > + x <<= 16; > > > + r -= 16; > > > + } > > > + if (!(x & 0xff000000u)) { > > > + x <<= 8; > > > + r -= 8; > > > + } > > > + if (!(x & 0xf0000000u)) { > > > + x <<= 4; > > > + r -= 4; > > > + } > > > + if (!(x & 0xc0000000u)) { > > > + x <<= 2; > > > + r -= 2; > > > + } > > > + if (!(x & 0x80000000u)) { > > > + x <<= 1; > > > + r -= 1; > > > + } > > > + return r; > > > +} > > > + > > > +/** > > > + * fls - find last set bit in a word > > > + * @x: the word to search > > > + * > > > + * This is defined in a similar way as ffs, but returns the position of the most > > > + * significant set bit. > > > + * > > > + * fls(value) returns 0 if value is 0 or the position of the last set bit if > > > + * value is nonzero. The last (most significant) bit is at position 32. > > > + */ > > > +#define fls(x) > > \ > > > + (__builtin_constant_p(x) ? \ > > > + (int)(((x) != 0) ? \ > > > + (sizeof(unsigned int) * 8 - __builtin_clz(x)) : 0) : \ > > > + variable_fls(x)) > > > + > > > > Checkpath complains: "Macro argument reuse 'x' - possible side-effects" > > > > Ditto. > > > > +#endif /* !defined(CONFIG_RISCV_ISA_ZBB) || defined(NO_ALTERNATIVE) > > */ > > > + > > > +#include <asm-generic/bitops/ffz.h> > > > #include <asm-generic/bitops/fls64.h> > > > #include <asm-generic/bitops/sched.h> > > > -#include <asm-generic/bitops/ffs.h> > > > > > > #include <asm-generic/bitops/hweight.h> > > > > > > diff --git a/drivers/bitopstest/Kconfig b/drivers/bitopstest/Kconfig > > > index d0e2af4b801e..6ef6dcd41d49 100644 > > > --- a/drivers/bitopstest/Kconfig > > > +++ b/drivers/bitopstest/Kconfig > > > @@ -1,6 +1,7 @@ > > > # SPDX-License-Identifier: GPL-2.0-only > > > menuconfig BITOPSTEST > > > tristate "self test for bitops optimization" > > > + default y > > > help > > > Enable this to test the bitops APIs. > > > > Is this a test you wanted to add? The source code isn't included. > > Sorry, I mistakenly did a "git add" for my local test. Will drop it. > > BRs, > Xiao > > > > > - Charlie In the next version after you remove this Kconfig file you can add my reviewed-by signature to this series. Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> > > > > > > > > diff --git a/drivers/firmware/efi/libstub/Makefile > > b/drivers/firmware/efi/libstub/Makefile > > > index a1157c2a7170..d68cacd4e3af 100644 > > > --- a/drivers/firmware/efi/libstub/Makefile > > > +++ b/drivers/firmware/efi/libstub/Makefile > > > @@ -28,7 +28,7 @@ cflags-$(CONFIG_ARM) += - > > DEFI_HAVE_STRLEN -DEFI_HAVE_STRNLEN \ > > > -DEFI_HAVE_MEMCHR - > > DEFI_HAVE_STRRCHR \ > > > -DEFI_HAVE_STRCMP -fno-builtin -fpic \ > > > $(call cc-option,-mno-single-pic-base) > > > -cflags-$(CONFIG_RISCV) += -fpic > > > +cflags-$(CONFIG_RISCV) += -fpic -DNO_ALTERNATIVE > > > cflags-$(CONFIG_LOONGARCH) += -fpie > > > > > > cflags-$(CONFIG_EFI_PARAMS_FROM_FDT) += - > > I$(srctree)/scripts/dtc/libfdt > > > -- > > > 2.25.1 > > > ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v4 2/2] riscv: Optimize bitops with Zbb extension 2023-10-31 2:02 ` Charlie Jenkins @ 2023-10-31 7:52 ` Conor Dooley 0 siblings, 0 replies; 7+ messages in thread From: Conor Dooley @ 2023-10-31 7:52 UTC (permalink / raw) To: Charlie Jenkins Cc: Wang, Xiao W, paul.walmsley@sifive.com, palmer@dabbelt.com, aou@eecs.berkeley.edu, ardb@kernel.org, anup@brainfault.org, Li, Haicheng, ajones@ventanamicro.com, Liu, Yujie, linux-riscv@lists.infradead.org, linux-efi@vger.kernel.org, linux-kernel@vger.kernel.org [-- Attachment #1: Type: text/plain, Size: 1006 bytes --] > > > > +static __always_inline unsigned long variable__ffs(unsigned long word) > > > > +{ > > > > + int num; > > > > + > > > > + asm_volatile_goto( > > > > + ALTERNATIVE("j %l[legacy]", "nop", 0, RISCV_ISA_EXT_ZBB, 1) > > > > + : : : : legacy); > > > > + > > > > > > On this and following asm blocks, checkpatch outputs: "Lines should not > > > end with a '('". > > > > I did below check, but I got no warning. > > # ./scripts/checkpatch.pl v4-0002-riscv-Optimize-bitops-with-Zbb-extension.patch > > total: 0 errors, 0 warnings, 280 lines checked > > May I know how you do the check? > > BTW, I see arch/riscv/include/asm/jump_label.h and arch/riscv/include/asm/cpufeature.h have similar code. > > I normally use the --strict flag since that is what the Patchwork server > uses. FWIW, checkpatch output does require human interpretation. It is only a "dumb" perl script, and particularly for asm stuff the complaints can often be ignored. IMO, this is one of those times. [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 228 bytes --] ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2023-10-31 7:54 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-10-30 6:39 [PATCH v4 0/2] riscv: Optimize bitops with Zbb extension Xiao Wang 2023-10-30 6:39 ` [PATCH v4 1/2] riscv: Rearrange hwcap.h and cpufeature.h Xiao Wang 2023-10-30 6:39 ` [PATCH v4 2/2] riscv: Optimize bitops with Zbb extension Xiao Wang 2023-10-30 20:36 ` Charlie Jenkins 2023-10-31 1:53 ` Wang, Xiao W 2023-10-31 2:02 ` Charlie Jenkins 2023-10-31 7:52 ` Conor Dooley
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox