* [PATCH v2 0/3] powerpc: use jump label for cpu/mmu_has_feature @ 2013-09-02 5:45 Kevin Hao 2013-09-02 5:45 ` [PATCH v2 1/3] powerpc: move the cpu_has_feature to a separate file Kevin Hao ` (2 more replies) 0 siblings, 3 replies; 6+ messages in thread From: Kevin Hao @ 2013-09-02 5:45 UTC (permalink / raw) To: Benjamin Herrenschmidt; +Cc: linuxppc v2: - Rebase on git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master since these patches have a dependence on patch 851cf6e7 (jump_label: Split jumplabel ratelimit) - Drop the first two patches in v1 since they are not needed due to the patch 851cf6e7. v1: Inspired by Benjamin Herrenschmidt, this patch series try to reduce the cpu/mmu feature checking overhead by using jump label. The following is the difference of the run path of cpu_has_feature between before and after applying these patches: before after addis r10,r2,1 b xxx addi r9,r10,-2280 b xxx (This will also be omitted if the ld r9,0(r9) feature is not set) ld r9,16(r9) rldicl. r8,r9,55,63 beq c000000000037c94 This patch series passed the build test for almost all the defconfig of ppc. There does have some broken for some configs. But they are not related to this change. This also passed allyesconfig for x86. Boot test on p2020rdb and p5020ds boards. Kevin Hao (3): powerpc: move the cpu_has_feature to a separate file powerpc: use the jump label for cpu_has_feature powerpc: use jump label for mmu_has_feature arch/powerpc/include/asm/cacheflush.h | 1 + arch/powerpc/include/asm/cpufeatures.h | 41 +++++++++++++++++++++++++++++++ arch/powerpc/include/asm/cputable.h | 8 ------ arch/powerpc/include/asm/cputime.h | 1 + arch/powerpc/include/asm/dbell.h | 1 + arch/powerpc/include/asm/dcr-native.h | 1 + arch/powerpc/include/asm/mman.h | 1 + arch/powerpc/include/asm/mmu.h | 19 +++++++++++++++ arch/powerpc/include/asm/time.h | 1 + arch/powerpc/kernel/align.c | 1 + arch/powerpc/kernel/cputable.c | 43 +++++++++++++++++++++++++++++++++ arch/powerpc/kernel/irq.c | 1 + arch/powerpc/kernel/process.c | 1 + arch/powerpc/kernel/setup-common.c | 1 + arch/powerpc/kernel/setup_32.c | 1 + arch/powerpc/kernel/smp.c | 1 + arch/powerpc/oprofile/op_model_rs64.c | 1 + arch/powerpc/platforms/cell/pervasive.c | 1 + arch/powerpc/xmon/ppc-dis.c | 1 + 19 files changed, 118 insertions(+), 8 deletions(-) create mode 100644 arch/powerpc/include/asm/cpufeatures.h -- 1.8.3.1 ^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH v2 1/3] powerpc: move the cpu_has_feature to a separate file 2013-09-02 5:45 [PATCH v2 0/3] powerpc: use jump label for cpu/mmu_has_feature Kevin Hao @ 2013-09-02 5:45 ` Kevin Hao 2013-09-02 5:45 ` [PATCH v2 2/3] powerpc: use the jump label for cpu_has_feature Kevin Hao 2013-09-02 5:45 ` [PATCH v2 3/3] powerpc: use jump label for mmu_has_feature Kevin Hao 2 siblings, 0 replies; 6+ messages in thread From: Kevin Hao @ 2013-09-02 5:45 UTC (permalink / raw) To: Benjamin Herrenschmidt; +Cc: linuxppc We plan to use jump label for cpu_has_feature. In order to implement this we need to include the linux/jump_label.h in asm/cputable.h. But it seems that asm/cputable.h is so basic header file for ppc that it is almost included by all the other header files. The including of the linux/jump_label.h will introduces various recursive inclusion. And it is very hard to fix that. So we choose to move the function cpu_has_feature to a separate header file before using the jump label for it. No functional change. Signed-off-by: Kevin Hao <haokexin@gmail.com> --- v2: Update the commit log to replace the jump_label_base.h with jump_label.h. arch/powerpc/include/asm/cacheflush.h | 1 + arch/powerpc/include/asm/cpufeatures.h | 14 ++++++++++++++ arch/powerpc/include/asm/cputable.h | 8 -------- arch/powerpc/include/asm/cputime.h | 1 + arch/powerpc/include/asm/dbell.h | 1 + arch/powerpc/include/asm/dcr-native.h | 1 + arch/powerpc/include/asm/mman.h | 1 + arch/powerpc/include/asm/time.h | 1 + arch/powerpc/kernel/align.c | 1 + arch/powerpc/kernel/irq.c | 1 + arch/powerpc/kernel/process.c | 1 + arch/powerpc/kernel/setup-common.c | 1 + arch/powerpc/kernel/setup_32.c | 1 + arch/powerpc/kernel/smp.c | 1 + arch/powerpc/oprofile/op_model_rs64.c | 1 + arch/powerpc/platforms/cell/pervasive.c | 1 + arch/powerpc/xmon/ppc-dis.c | 1 + 17 files changed, 29 insertions(+), 8 deletions(-) create mode 100644 arch/powerpc/include/asm/cpufeatures.h diff --git a/arch/powerpc/include/asm/cacheflush.h b/arch/powerpc/include/asm/cacheflush.h index 5b93122..8d2d4c3 100644 --- a/arch/powerpc/include/asm/cacheflush.h +++ b/arch/powerpc/include/asm/cacheflush.h @@ -11,6 +11,7 @@ #include <linux/mm.h> #include <asm/cputable.h> +#include <asm/cpufeatures.h> /* * No cache flushing is required when address mappings are changed, diff --git a/arch/powerpc/include/asm/cpufeatures.h b/arch/powerpc/include/asm/cpufeatures.h new file mode 100644 index 0000000..37650db --- /dev/null +++ b/arch/powerpc/include/asm/cpufeatures.h @@ -0,0 +1,14 @@ +#ifndef __ASM_POWERPC_CPUFEATURES_H +#define __ASM_POWERPC_CPUFEATURES_H + +#include <asm/cputable.h> + +static inline int cpu_has_feature(unsigned long feature) +{ + return (CPU_FTRS_ALWAYS & feature) || + (CPU_FTRS_POSSIBLE + & cur_cpu_spec->cpu_features + & feature); +} + +#endif /* __ASM_POWERPC_CPUFEATURE_H */ diff --git a/arch/powerpc/include/asm/cputable.h b/arch/powerpc/include/asm/cputable.h index 0d4939b..ca177b2 100644 --- a/arch/powerpc/include/asm/cputable.h +++ b/arch/powerpc/include/asm/cputable.h @@ -546,14 +546,6 @@ enum { }; #endif /* __powerpc64__ */ -static inline int cpu_has_feature(unsigned long feature) -{ - return (CPU_FTRS_ALWAYS & feature) || - (CPU_FTRS_POSSIBLE - & cur_cpu_spec->cpu_features - & feature); -} - #define HBP_NUM 1 #endif /* !__ASSEMBLY__ */ diff --git a/arch/powerpc/include/asm/cputime.h b/arch/powerpc/include/asm/cputime.h index 607559a..15481e2 100644 --- a/arch/powerpc/include/asm/cputime.h +++ b/arch/powerpc/include/asm/cputime.h @@ -28,6 +28,7 @@ static inline void setup_cputime_one_jiffy(void) { } #include <asm/div64.h> #include <asm/time.h> #include <asm/param.h> +#include <asm/cpufeatures.h> typedef u64 __nocast cputime_t; typedef u64 __nocast cputime64_t; diff --git a/arch/powerpc/include/asm/dbell.h b/arch/powerpc/include/asm/dbell.h index 5fa6b20..2d9eae3 100644 --- a/arch/powerpc/include/asm/dbell.h +++ b/arch/powerpc/include/asm/dbell.h @@ -16,6 +16,7 @@ #include <linux/threads.h> #include <asm/ppc-opcode.h> +#include <asm/cpufeatures.h> #define PPC_DBELL_MSG_BRDCAST (0x04000000) #define PPC_DBELL_TYPE(x) (((x) & 0xf) << (63-36)) diff --git a/arch/powerpc/include/asm/dcr-native.h b/arch/powerpc/include/asm/dcr-native.h index 7d2e623..3372650 100644 --- a/arch/powerpc/include/asm/dcr-native.h +++ b/arch/powerpc/include/asm/dcr-native.h @@ -24,6 +24,7 @@ #include <linux/spinlock.h> #include <asm/cputable.h> +#include <asm/cpufeatures.h> typedef struct { unsigned int base; diff --git a/arch/powerpc/include/asm/mman.h b/arch/powerpc/include/asm/mman.h index 8565c25..74922ad 100644 --- a/arch/powerpc/include/asm/mman.h +++ b/arch/powerpc/include/asm/mman.h @@ -13,6 +13,7 @@ #include <asm/cputable.h> #include <linux/mm.h> +#include <asm/cpufeatures.h> /* * This file is included by linux/mman.h, so we can't use cacl_vm_prot_bits() diff --git a/arch/powerpc/include/asm/time.h b/arch/powerpc/include/asm/time.h index c1f2676..20e6ee9 100644 --- a/arch/powerpc/include/asm/time.h +++ b/arch/powerpc/include/asm/time.h @@ -18,6 +18,7 @@ #include <linux/percpu.h> #include <asm/processor.h> +#include <asm/cpufeatures.h> /* time.c */ extern unsigned long tb_ticks_per_jiffy; diff --git a/arch/powerpc/kernel/align.c b/arch/powerpc/kernel/align.c index a27ccd5..4f49cb1 100644 --- a/arch/powerpc/kernel/align.c +++ b/arch/powerpc/kernel/align.c @@ -25,6 +25,7 @@ #include <asm/cputable.h> #include <asm/emulated_ops.h> #include <asm/switch_to.h> +#include <asm/cpufeatures.h> struct aligninfo { unsigned char len; diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c index c69440c..164a9ad 100644 --- a/arch/powerpc/kernel/irq.c +++ b/arch/powerpc/kernel/irq.c @@ -75,6 +75,7 @@ #endif #define CREATE_TRACE_POINTS #include <asm/trace.h> +#include <asm/cpufeatures.h> DEFINE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat); EXPORT_PER_CPU_SYMBOL(irq_stat); diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index 6f428da..cc65650 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -57,6 +57,7 @@ #endif #include <linux/kprobes.h> #include <linux/kdebug.h> +#include <asm/cpufeatures.h> /* Transactional Memory debug */ #ifdef TM_DEBUG_SW diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c index 3d261c0..6e3e595 100644 --- a/arch/powerpc/kernel/setup-common.c +++ b/arch/powerpc/kernel/setup-common.c @@ -61,6 +61,7 @@ #include <asm/cputhreads.h> #include <mm/mmu_decl.h> #include <asm/fadump.h> +#include <asm/cpufeatures.h> #include "setup.h" diff --git a/arch/powerpc/kernel/setup_32.c b/arch/powerpc/kernel/setup_32.c index a4bbcae..304a85b 100644 --- a/arch/powerpc/kernel/setup_32.c +++ b/arch/powerpc/kernel/setup_32.c @@ -39,6 +39,7 @@ #include <asm/udbg.h> #include <asm/mmu_context.h> #include <asm/epapr_hcalls.h> +#include <asm/cpufeatures.h> #include "setup.h" diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c index 442d8e2..e793c0d 100644 --- a/arch/powerpc/kernel/smp.c +++ b/arch/powerpc/kernel/smp.c @@ -50,6 +50,7 @@ #endif #include <asm/vdso.h> #include <asm/debug.h> +#include <asm/cpufeatures.h> #ifdef DEBUG #include <asm/udbg.h> diff --git a/arch/powerpc/oprofile/op_model_rs64.c b/arch/powerpc/oprofile/op_model_rs64.c index 9b801b8..924b66c8 100644 --- a/arch/powerpc/oprofile/op_model_rs64.c +++ b/arch/powerpc/oprofile/op_model_rs64.c @@ -14,6 +14,7 @@ #include <asm/processor.h> #include <asm/cputable.h> #include <asm/oprofile_impl.h> +#include <asm/cpufeatures.h> #define dbg(args...) diff --git a/arch/powerpc/platforms/cell/pervasive.c b/arch/powerpc/platforms/cell/pervasive.c index d17e98b..036215b 100644 --- a/arch/powerpc/platforms/cell/pervasive.c +++ b/arch/powerpc/platforms/cell/pervasive.c @@ -35,6 +35,7 @@ #include <asm/pgtable.h> #include <asm/reg.h> #include <asm/cell-regs.h> +#include <asm/cpufeatures.h> #include "pervasive.h" diff --git a/arch/powerpc/xmon/ppc-dis.c b/arch/powerpc/xmon/ppc-dis.c index 89098f32..a642155 100644 --- a/arch/powerpc/xmon/ppc-dis.c +++ b/arch/powerpc/xmon/ppc-dis.c @@ -24,6 +24,7 @@ Software Foundation, 51 Franklin Street - Fifth Floor, Boston, MA 02110-1301, US #include "ansidecl.h" #include "ppc.h" #include "dis-asm.h" +#include <asm/cpufeatures.h> /* Print a PowerPC or POWER instruction. */ -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH v2 2/3] powerpc: use the jump label for cpu_has_feature 2013-09-02 5:45 [PATCH v2 0/3] powerpc: use jump label for cpu/mmu_has_feature Kevin Hao 2013-09-02 5:45 ` [PATCH v2 1/3] powerpc: move the cpu_has_feature to a separate file Kevin Hao @ 2013-09-02 5:45 ` Kevin Hao 2013-11-27 2:45 ` Benjamin Herrenschmidt 2013-09-02 5:45 ` [PATCH v2 3/3] powerpc: use jump label for mmu_has_feature Kevin Hao 2 siblings, 1 reply; 6+ messages in thread From: Kevin Hao @ 2013-09-02 5:45 UTC (permalink / raw) To: Benjamin Herrenschmidt; +Cc: linuxppc The cpu features are fixed once the probe of cpu features are done. And the function cpu_has_feature() does be used in some hot path. The checking of the cpu features for each time of invoking of cpu_has_feature() seems suboptimal. This tries to reduce this overhead of this check by using jump label. But we can only use the jump label for this check only after the execution of jump_label_init(), so we introduce another jump label to still do the feature check by default before all the cpu feature jump labels are initialized. Signed-off-by: Kevin Hao <haokexin@gmail.com> --- v2: Include the jump_label.h instead of jump_label_base.h. arch/powerpc/include/asm/cpufeatures.h | 27 +++++++++++++++++++++++++++ arch/powerpc/kernel/cputable.c | 23 +++++++++++++++++++++++ 2 files changed, 50 insertions(+) diff --git a/arch/powerpc/include/asm/cpufeatures.h b/arch/powerpc/include/asm/cpufeatures.h index 37650db..2631287 100644 --- a/arch/powerpc/include/asm/cpufeatures.h +++ b/arch/powerpc/include/asm/cpufeatures.h @@ -2,7 +2,33 @@ #define __ASM_POWERPC_CPUFEATURES_H #include <asm/cputable.h> +#ifdef CONFIG_JUMP_LABEL +#include <linux/jump_label.h> +#ifdef __powerpc64__ +#define MAX_CPU_FEATURES 64 +#else +#define MAX_CPU_FEATURES 32 +#endif +extern struct static_key cpu_feat_keys[MAX_CPU_FEATURES]; +extern struct static_key cpu_feat_keys_enabled; + +static inline int cpu_has_feature(unsigned long feature) +{ + if (CPU_FTRS_ALWAYS & feature) + return 1; + + if (!(CPU_FTRS_POSSIBLE | feature)) + return 0; + + if (static_key_false(&cpu_feat_keys_enabled)) { + int i = __builtin_ctzl(feature); + + return static_key_false(&cpu_feat_keys[i]); + } else + return !!(cur_cpu_spec->cpu_features & feature); +} +#else static inline int cpu_has_feature(unsigned long feature) { return (CPU_FTRS_ALWAYS & feature) || @@ -10,5 +36,6 @@ static inline int cpu_has_feature(unsigned long feature) & cur_cpu_spec->cpu_features & feature); } +#endif #endif /* __ASM_POWERPC_CPUFEATURE_H */ diff --git a/arch/powerpc/kernel/cputable.c b/arch/powerpc/kernel/cputable.c index 597d954..50bd216 100644 --- a/arch/powerpc/kernel/cputable.c +++ b/arch/powerpc/kernel/cputable.c @@ -21,6 +21,7 @@ #include <asm/prom.h> /* for PTRRELOC on ARCH=ppc */ #include <asm/mmu.h> #include <asm/setup.h> +#include <asm/cpufeatures.h> struct cpu_spec* cur_cpu_spec = NULL; EXPORT_SYMBOL(cur_cpu_spec); @@ -2258,3 +2259,25 @@ struct cpu_spec * __init identify_cpu(unsigned long offset, unsigned int pvr) return NULL; } + +#ifdef CONFIG_JUMP_LABEL +struct static_key cpu_feat_keys[MAX_CPU_FEATURES]; +struct static_key cpu_feat_keys_enabled; + +static __init int cpu_feat_keys_init(void) +{ + int i; + + for (i = 0; i < MAX_CPU_FEATURES; i++) { + unsigned long f = 1 << i; + + if (cur_cpu_spec->cpu_features & f) + static_key_slow_inc(&cpu_feat_keys[i]); + } + + static_key_slow_inc(&cpu_feat_keys_enabled); + + return 0; +} +early_initcall(cpu_feat_keys_init); +#endif -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH v2 2/3] powerpc: use the jump label for cpu_has_feature 2013-09-02 5:45 ` [PATCH v2 2/3] powerpc: use the jump label for cpu_has_feature Kevin Hao @ 2013-11-27 2:45 ` Benjamin Herrenschmidt 2013-12-25 5:58 ` Kevin Hao 0 siblings, 1 reply; 6+ messages in thread From: Benjamin Herrenschmidt @ 2013-11-27 2:45 UTC (permalink / raw) To: Kevin Hao; +Cc: linuxppc On Mon, 2013-09-02 at 13:45 +0800, Kevin Hao wrote: > The cpu features are fixed once the probe of cpu features are done. > And the function cpu_has_feature() does be used in some hot path. > The checking of the cpu features for each time of invoking of > cpu_has_feature() seems suboptimal. This tries to reduce this > overhead of this check by using jump label. But we can only use > the jump label for this check only after the execution of > jump_label_init(), so we introduce another jump label to > still do the feature check by default before all the cpu > feature jump labels are initialized. So I was looking at these and ... > +static inline int cpu_has_feature(unsigned long feature) > +{ > + if (CPU_FTRS_ALWAYS & feature) > + return 1; > + > + if (!(CPU_FTRS_POSSIBLE | feature)) > + return 0; > + > + if (static_key_false(&cpu_feat_keys_enabled)) { > + int i = __builtin_ctzl(feature); > + > + return static_key_false(&cpu_feat_keys[i]); > + } else > + return !!(cur_cpu_spec->cpu_features & feature); > +} This is gross :-) Have you checked the generated code ? I'm worried that we end up hitting at least 2 branches every time, which might be enough to defeat the purposes even if they are unconditional in term of performance and code size... Cheers, Ben. > +#else > static inline int cpu_has_feature(unsigned long feature) > { > return (CPU_FTRS_ALWAYS & feature) || > @@ -10,5 +36,6 @@ static inline int cpu_has_feature(unsigned long feature) > & cur_cpu_spec->cpu_features > & feature); > } > +#endif > > #endif /* __ASM_POWERPC_CPUFEATURE_H */ > diff --git a/arch/powerpc/kernel/cputable.c b/arch/powerpc/kernel/cputable.c > index 597d954..50bd216 100644 > --- a/arch/powerpc/kernel/cputable.c > +++ b/arch/powerpc/kernel/cputable.c > @@ -21,6 +21,7 @@ > #include <asm/prom.h> /* for PTRRELOC on ARCH=ppc */ > #include <asm/mmu.h> > #include <asm/setup.h> > +#include <asm/cpufeatures.h> > > struct cpu_spec* cur_cpu_spec = NULL; > EXPORT_SYMBOL(cur_cpu_spec); > @@ -2258,3 +2259,25 @@ struct cpu_spec * __init identify_cpu(unsigned long offset, unsigned int pvr) > > return NULL; > } > + > +#ifdef CONFIG_JUMP_LABEL > +struct static_key cpu_feat_keys[MAX_CPU_FEATURES]; > +struct static_key cpu_feat_keys_enabled; > + > +static __init int cpu_feat_keys_init(void) > +{ > + int i; > + > + for (i = 0; i < MAX_CPU_FEATURES; i++) { > + unsigned long f = 1 << i; > + > + if (cur_cpu_spec->cpu_features & f) > + static_key_slow_inc(&cpu_feat_keys[i]); > + } > + > + static_key_slow_inc(&cpu_feat_keys_enabled); > + > + return 0; > +} > +early_initcall(cpu_feat_keys_init); > +#endif ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v2 2/3] powerpc: use the jump label for cpu_has_feature 2013-11-27 2:45 ` Benjamin Herrenschmidt @ 2013-12-25 5:58 ` Kevin Hao 0 siblings, 0 replies; 6+ messages in thread From: Kevin Hao @ 2013-12-25 5:58 UTC (permalink / raw) To: Benjamin Herrenschmidt; +Cc: linuxppc [-- Attachment #1: Type: text/plain, Size: 6479 bytes --] On Wed, Nov 27, 2013 at 01:45:41PM +1100, Benjamin Herrenschmidt wrote: > On Mon, 2013-09-02 at 13:45 +0800, Kevin Hao wrote: > > The cpu features are fixed once the probe of cpu features are done. > > And the function cpu_has_feature() does be used in some hot path. > > The checking of the cpu features for each time of invoking of > > cpu_has_feature() seems suboptimal. This tries to reduce this > > overhead of this check by using jump label. But we can only use > > the jump label for this check only after the execution of > > jump_label_init(), so we introduce another jump label to > > still do the feature check by default before all the cpu > > feature jump labels are initialized. > > So I was looking at these and ... Sorry for the delayed response. > > > +static inline int cpu_has_feature(unsigned long feature) > > +{ > > + if (CPU_FTRS_ALWAYS & feature) > > + return 1; > > + > > + if (!(CPU_FTRS_POSSIBLE | feature)) > > + return 0; > > + > > + if (static_key_false(&cpu_feat_keys_enabled)) { > > + int i = __builtin_ctzl(feature); > > + > > + return static_key_false(&cpu_feat_keys[i]); > > + } else > > + return !!(cur_cpu_spec->cpu_features & feature); > > +} > > This is gross :-) > > Have you checked the generated code ? I'm worried that we end up hitting > at least 2 branches every time, No, we would get 2 unconditional branches at worst. The following is the disassemble of part code in switch_mm() when jump label is enabled. 68 /* We must stop all altivec streams before changing the HW 69 * context 70 */ 71 #ifdef CONFIG_ALTIVEC 72 if (cpu_has_feature(CPU_FTR_ALTIVEC)) 73 asm volatile ("dssall"); 74 #endif /* CONFIG_ALTIVEC */ c0000000005c42f4: 60 00 00 00 nop c0000000005c42f8: 3d 02 00 01 addis r8,r2,1 c0000000005c42fc: 39 28 f6 b8 addi r9,r8,-2376 c0000000005c4300: e9 29 00 00 ld r9,0(r9) c0000000005c4304: e9 29 00 10 ld r9,16(r9) c0000000005c4308: 79 2a ef e3 rldicl. r10,r9,61,63 c0000000005c430c: 41 82 00 08 beq c0000000005c4314 <.__schedule+0x27c> c0000000005c4310: 7e 00 06 6c dssall c0000000005c4314: 7f 43 d3 78 mr r3,r26 c0000000005c4318: 7f a4 eb 78 mr r4,r29 c0000000005c431c: 4b a5 ff 71 bl c00000000002428c <.switch_mmu_context> c0000000005c4320: 60 00 00 00 nop .... c0000000005c4400: 60 00 00 00 nop c0000000005c4404: 7f 43 d3 78 mr r3,r26 c0000000005c4408: 7f a4 eb 78 mr r4,r29 c0000000005c440c: 4b a5 fe 81 bl c00000000002428c <.switch_mmu_context> c0000000005c4410: 60 00 00 00 nop On a p5020 board which doesn't support altivec, the code would change to the following after jump label init. c0000000005c42f4: b c0000000005c4400 The final instruction sequence should be just a branch. On a t4240 board which does have altivec support, the code would change to: c0000000005c42f4: b c0000000005c4400 ... c0000000005c4400: b c0000000005c4310 The final instruction sequence should be two unconditional branches. The following is the disassemble code when jump label is disabled. c0000000005c26fc: 60 00 00 00 nop c0000000005c2700: 3d 02 00 01 addis r8,r2,1 c0000000005c2704: 39 28 24 b8 addi r9,r8,9400 c0000000005c2708: e9 29 00 00 ld r9,0(r9) c0000000005c270c: e9 29 00 10 ld r9,16(r9) c0000000005c2710: 79 2a ef e3 rldicl. r10,r9,61,63 c0000000005c2714: 40 82 00 d4 bne c0000000005c27e8 <.__schedule+0x360> c0000000005c2718: 7f 43 d3 78 mr r3,r26 c0000000005c271c: 7f a4 eb 78 mr r4,r29 c0000000005c2720: 4b a6 0a 75 bl c000000000023194 <.switch_mmu_context> c0000000005c2724: 60 00 00 00 nop .... c0000000005c27e8: 7e 00 06 6c dssall c0000000005c27ec: 4b ff ff 2c b c0000000005c2718 <.__schedule+0x290> c0000000005c27f0: e9 3e 00 00 ld r9,0(r30) c0000000005c27f4: 71 2a 00 81 andi. r10,r9,129 c0000000005c27f8: 40 82 00 94 bne c0000000005c288c <.__schedule+0x404> The final instruction sequence is following: addis r8,r2,1 addi r9,r8,9400 ld r9,0(r9) ld r9,16(r9) rldicl. r10,r9,61,63 bne c0000000005c27e8 > which might be enough to defeat the > purposes even if they are unconditional in term of performance and > code size... It does result in an increase in the code size due to the enable of jump label. before: .text .data 005c4ff4 0005ede8 after: .text .data 005c6c04 0005fe68 Thanks, Kevin > > Cheers, > Ben. > > > +#else > > static inline int cpu_has_feature(unsigned long feature) > > { > > return (CPU_FTRS_ALWAYS & feature) || > > @@ -10,5 +36,6 @@ static inline int cpu_has_feature(unsigned long feature) > > & cur_cpu_spec->cpu_features > > & feature); > > } > > +#endif > > > > #endif /* __ASM_POWERPC_CPUFEATURE_H */ > > diff --git a/arch/powerpc/kernel/cputable.c b/arch/powerpc/kernel/cputable.c > > index 597d954..50bd216 100644 > > --- a/arch/powerpc/kernel/cputable.c > > +++ b/arch/powerpc/kernel/cputable.c > > @@ -21,6 +21,7 @@ > > #include <asm/prom.h> /* for PTRRELOC on ARCH=ppc */ > > #include <asm/mmu.h> > > #include <asm/setup.h> > > +#include <asm/cpufeatures.h> > > > > struct cpu_spec* cur_cpu_spec = NULL; > > EXPORT_SYMBOL(cur_cpu_spec); > > @@ -2258,3 +2259,25 @@ struct cpu_spec * __init identify_cpu(unsigned long offset, unsigned int pvr) > > > > return NULL; > > } > > + > > +#ifdef CONFIG_JUMP_LABEL > > +struct static_key cpu_feat_keys[MAX_CPU_FEATURES]; > > +struct static_key cpu_feat_keys_enabled; > > + > > +static __init int cpu_eat_keys_init(void) > > +{ > > + int i; > > + > > + for (i = 0; i < MAX_CPU_FEATURES; i++) { > > + unsigned long f = 1 << i; > > + > > + if (cur_cpu_spec->cpu_features & f) > > + static_key_slow_inc(&cpu_feat_keys[i]); > > + } > > + > > + static_key_slow_inc(&cpu_feat_keys_enabled); > > + > > + return 0; > > +} > > +early_initcall(cpu_feat_keys_init); > > +#endif > > [-- Attachment #2: Type: application/pgp-signature, Size: 490 bytes --] ^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH v2 3/3] powerpc: use jump label for mmu_has_feature 2013-09-02 5:45 [PATCH v2 0/3] powerpc: use jump label for cpu/mmu_has_feature Kevin Hao 2013-09-02 5:45 ` [PATCH v2 1/3] powerpc: move the cpu_has_feature to a separate file Kevin Hao 2013-09-02 5:45 ` [PATCH v2 2/3] powerpc: use the jump label for cpu_has_feature Kevin Hao @ 2013-09-02 5:45 ` Kevin Hao 2 siblings, 0 replies; 6+ messages in thread From: Kevin Hao @ 2013-09-02 5:45 UTC (permalink / raw) To: Benjamin Herrenschmidt; +Cc: linuxppc The mmu features are fixed once the probe of mmu features are done. And the function mmu_has_feature() does be used in some hot path. The checking of the mmu features for each time of invoking of mmu_has_feature() seems suboptimal. This tries to reduce this overhead of this check by using jump label. But we can only use the jump label for this check only after the execution of jump_label_init(), so we introduce another jump label to still do the feature check by default before all the mmu feature jump labels are initialized. Signed-off-by: Kevin Hao <haokexin@gmail.com> --- v2: Include the jump_label.h instead of jump_label_base.h. arch/powerpc/include/asm/mmu.h | 19 +++++++++++++++++++ arch/powerpc/kernel/cputable.c | 20 ++++++++++++++++++++ 2 files changed, 39 insertions(+) diff --git a/arch/powerpc/include/asm/mmu.h b/arch/powerpc/include/asm/mmu.h index 691fd8a..b7f049e 100644 --- a/arch/powerpc/include/asm/mmu.h +++ b/arch/powerpc/include/asm/mmu.h @@ -121,10 +121,29 @@ DECLARE_PER_CPU(int, next_tlbcam_idx); #endif +#ifdef CONFIG_JUMP_LABEL +#include <linux/jump_label.h> + +#define MAX_MMU_FEATURES 32 + +extern struct static_key mmu_feat_keys[MAX_MMU_FEATURES]; +extern struct static_key mmu_feat_keys_enabled; + +static inline int mmu_has_feature(unsigned long feature) +{ + if (static_key_false(&mmu_feat_keys_enabled)) { + int i = __builtin_ctzl(feature); + + return static_key_false(&mmu_feat_keys[i]); + } else + return !!(cur_cpu_spec->mmu_features & feature); +} +#else static inline int mmu_has_feature(unsigned long feature) { return (cur_cpu_spec->mmu_features & feature); } +#endif static inline void mmu_clear_feature(unsigned long feature) { diff --git a/arch/powerpc/kernel/cputable.c b/arch/powerpc/kernel/cputable.c index 50bd216..785370b 100644 --- a/arch/powerpc/kernel/cputable.c +++ b/arch/powerpc/kernel/cputable.c @@ -2280,4 +2280,24 @@ static __init int cpu_feat_keys_init(void) return 0; } early_initcall(cpu_feat_keys_init); + +struct static_key mmu_feat_keys[MAX_MMU_FEATURES]; +struct static_key mmu_feat_keys_enabled; + +static __init int mmu_feat_keys_init(void) +{ + int i; + + for (i = 0; i < MAX_MMU_FEATURES; i++) { + unsigned long f = 1 << i; + + if (cur_cpu_spec->mmu_features & f) + static_key_slow_inc(&mmu_feat_keys[i]); + } + + static_key_slow_inc(&mmu_feat_keys_enabled); + + return 0; +} +early_initcall(mmu_feat_keys_init); #endif -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 6+ messages in thread
end of thread, other threads:[~2013-12-25 5:58 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2013-09-02 5:45 [PATCH v2 0/3] powerpc: use jump label for cpu/mmu_has_feature Kevin Hao 2013-09-02 5:45 ` [PATCH v2 1/3] powerpc: move the cpu_has_feature to a separate file Kevin Hao 2013-09-02 5:45 ` [PATCH v2 2/3] powerpc: use the jump label for cpu_has_feature Kevin Hao 2013-11-27 2:45 ` Benjamin Herrenschmidt 2013-12-25 5:58 ` Kevin Hao 2013-09-02 5:45 ` [PATCH v2 3/3] powerpc: use jump label for mmu_has_feature Kevin Hao
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).