* [PATCH v2 0/3] powerpc: use jump label for cpu/mmu_has_feature
@ 2013-09-02 5:45 Kevin Hao
2013-09-02 5:45 ` [PATCH v2 1/3] powerpc: move the cpu_has_feature to a separate file Kevin Hao
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: Kevin Hao @ 2013-09-02 5:45 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linuxppc
v2:
- Rebase on git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
since these patches have a dependence on patch 851cf6e7 (jump_label:
Split jumplabel ratelimit)
- Drop the first two patches in v1 since they are not needed due to
the patch 851cf6e7.
v1:
Inspired by Benjamin Herrenschmidt, this patch series try to reduce the
cpu/mmu feature checking overhead by using jump label. The following is
the difference of the run path of cpu_has_feature between before and after
applying these patches:
before after
addis r10,r2,1 b xxx
addi r9,r10,-2280 b xxx (This will also be omitted if the
ld r9,0(r9) feature is not set)
ld r9,16(r9)
rldicl. r8,r9,55,63
beq c000000000037c94
This patch series passed the build test for almost all the defconfig of ppc.
There does have some broken for some configs. But they are not related to this
change. This also passed allyesconfig for x86. Boot test on p2020rdb and
p5020ds boards.
Kevin Hao (3):
powerpc: move the cpu_has_feature to a separate file
powerpc: use the jump label for cpu_has_feature
powerpc: use jump label for mmu_has_feature
arch/powerpc/include/asm/cacheflush.h | 1 +
arch/powerpc/include/asm/cpufeatures.h | 41 +++++++++++++++++++++++++++++++
arch/powerpc/include/asm/cputable.h | 8 ------
arch/powerpc/include/asm/cputime.h | 1 +
arch/powerpc/include/asm/dbell.h | 1 +
arch/powerpc/include/asm/dcr-native.h | 1 +
arch/powerpc/include/asm/mman.h | 1 +
arch/powerpc/include/asm/mmu.h | 19 +++++++++++++++
arch/powerpc/include/asm/time.h | 1 +
arch/powerpc/kernel/align.c | 1 +
arch/powerpc/kernel/cputable.c | 43 +++++++++++++++++++++++++++++++++
arch/powerpc/kernel/irq.c | 1 +
arch/powerpc/kernel/process.c | 1 +
arch/powerpc/kernel/setup-common.c | 1 +
arch/powerpc/kernel/setup_32.c | 1 +
arch/powerpc/kernel/smp.c | 1 +
arch/powerpc/oprofile/op_model_rs64.c | 1 +
arch/powerpc/platforms/cell/pervasive.c | 1 +
arch/powerpc/xmon/ppc-dis.c | 1 +
19 files changed, 118 insertions(+), 8 deletions(-)
create mode 100644 arch/powerpc/include/asm/cpufeatures.h
--
1.8.3.1
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH v2 1/3] powerpc: move the cpu_has_feature to a separate file
2013-09-02 5:45 [PATCH v2 0/3] powerpc: use jump label for cpu/mmu_has_feature Kevin Hao
@ 2013-09-02 5:45 ` Kevin Hao
2013-09-02 5:45 ` [PATCH v2 2/3] powerpc: use the jump label for cpu_has_feature Kevin Hao
2013-09-02 5:45 ` [PATCH v2 3/3] powerpc: use jump label for mmu_has_feature Kevin Hao
2 siblings, 0 replies; 6+ messages in thread
From: Kevin Hao @ 2013-09-02 5:45 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linuxppc
We plan to use jump label for cpu_has_feature. In order to implement
this we need to include the linux/jump_label.h in asm/cputable.h.
But it seems that asm/cputable.h is so basic header file for ppc that
it is almost included by all the other header files. The including of
the linux/jump_label.h will introduces various recursive inclusion.
And it is very hard to fix that. So we choose to move the function
cpu_has_feature to a separate header file before using the jump label
for it. No functional change.
Signed-off-by: Kevin Hao <haokexin@gmail.com>
---
v2: Update the commit log to replace the jump_label_base.h with jump_label.h.
arch/powerpc/include/asm/cacheflush.h | 1 +
arch/powerpc/include/asm/cpufeatures.h | 14 ++++++++++++++
arch/powerpc/include/asm/cputable.h | 8 --------
arch/powerpc/include/asm/cputime.h | 1 +
arch/powerpc/include/asm/dbell.h | 1 +
arch/powerpc/include/asm/dcr-native.h | 1 +
arch/powerpc/include/asm/mman.h | 1 +
arch/powerpc/include/asm/time.h | 1 +
arch/powerpc/kernel/align.c | 1 +
arch/powerpc/kernel/irq.c | 1 +
arch/powerpc/kernel/process.c | 1 +
arch/powerpc/kernel/setup-common.c | 1 +
arch/powerpc/kernel/setup_32.c | 1 +
arch/powerpc/kernel/smp.c | 1 +
arch/powerpc/oprofile/op_model_rs64.c | 1 +
arch/powerpc/platforms/cell/pervasive.c | 1 +
arch/powerpc/xmon/ppc-dis.c | 1 +
17 files changed, 29 insertions(+), 8 deletions(-)
create mode 100644 arch/powerpc/include/asm/cpufeatures.h
diff --git a/arch/powerpc/include/asm/cacheflush.h b/arch/powerpc/include/asm/cacheflush.h
index 5b93122..8d2d4c3 100644
--- a/arch/powerpc/include/asm/cacheflush.h
+++ b/arch/powerpc/include/asm/cacheflush.h
@@ -11,6 +11,7 @@
#include <linux/mm.h>
#include <asm/cputable.h>
+#include <asm/cpufeatures.h>
/*
* No cache flushing is required when address mappings are changed,
diff --git a/arch/powerpc/include/asm/cpufeatures.h b/arch/powerpc/include/asm/cpufeatures.h
new file mode 100644
index 0000000..37650db
--- /dev/null
+++ b/arch/powerpc/include/asm/cpufeatures.h
@@ -0,0 +1,14 @@
+#ifndef __ASM_POWERPC_CPUFEATURES_H
+#define __ASM_POWERPC_CPUFEATURES_H
+
+#include <asm/cputable.h>
+
+static inline int cpu_has_feature(unsigned long feature)
+{
+ return (CPU_FTRS_ALWAYS & feature) ||
+ (CPU_FTRS_POSSIBLE
+ & cur_cpu_spec->cpu_features
+ & feature);
+}
+
+#endif /* __ASM_POWERPC_CPUFEATURE_H */
diff --git a/arch/powerpc/include/asm/cputable.h b/arch/powerpc/include/asm/cputable.h
index 0d4939b..ca177b2 100644
--- a/arch/powerpc/include/asm/cputable.h
+++ b/arch/powerpc/include/asm/cputable.h
@@ -546,14 +546,6 @@ enum {
};
#endif /* __powerpc64__ */
-static inline int cpu_has_feature(unsigned long feature)
-{
- return (CPU_FTRS_ALWAYS & feature) ||
- (CPU_FTRS_POSSIBLE
- & cur_cpu_spec->cpu_features
- & feature);
-}
-
#define HBP_NUM 1
#endif /* !__ASSEMBLY__ */
diff --git a/arch/powerpc/include/asm/cputime.h b/arch/powerpc/include/asm/cputime.h
index 607559a..15481e2 100644
--- a/arch/powerpc/include/asm/cputime.h
+++ b/arch/powerpc/include/asm/cputime.h
@@ -28,6 +28,7 @@ static inline void setup_cputime_one_jiffy(void) { }
#include <asm/div64.h>
#include <asm/time.h>
#include <asm/param.h>
+#include <asm/cpufeatures.h>
typedef u64 __nocast cputime_t;
typedef u64 __nocast cputime64_t;
diff --git a/arch/powerpc/include/asm/dbell.h b/arch/powerpc/include/asm/dbell.h
index 5fa6b20..2d9eae3 100644
--- a/arch/powerpc/include/asm/dbell.h
+++ b/arch/powerpc/include/asm/dbell.h
@@ -16,6 +16,7 @@
#include <linux/threads.h>
#include <asm/ppc-opcode.h>
+#include <asm/cpufeatures.h>
#define PPC_DBELL_MSG_BRDCAST (0x04000000)
#define PPC_DBELL_TYPE(x) (((x) & 0xf) << (63-36))
diff --git a/arch/powerpc/include/asm/dcr-native.h b/arch/powerpc/include/asm/dcr-native.h
index 7d2e623..3372650 100644
--- a/arch/powerpc/include/asm/dcr-native.h
+++ b/arch/powerpc/include/asm/dcr-native.h
@@ -24,6 +24,7 @@
#include <linux/spinlock.h>
#include <asm/cputable.h>
+#include <asm/cpufeatures.h>
typedef struct {
unsigned int base;
diff --git a/arch/powerpc/include/asm/mman.h b/arch/powerpc/include/asm/mman.h
index 8565c25..74922ad 100644
--- a/arch/powerpc/include/asm/mman.h
+++ b/arch/powerpc/include/asm/mman.h
@@ -13,6 +13,7 @@
#include <asm/cputable.h>
#include <linux/mm.h>
+#include <asm/cpufeatures.h>
/*
* This file is included by linux/mman.h, so we can't use cacl_vm_prot_bits()
diff --git a/arch/powerpc/include/asm/time.h b/arch/powerpc/include/asm/time.h
index c1f2676..20e6ee9 100644
--- a/arch/powerpc/include/asm/time.h
+++ b/arch/powerpc/include/asm/time.h
@@ -18,6 +18,7 @@
#include <linux/percpu.h>
#include <asm/processor.h>
+#include <asm/cpufeatures.h>
/* time.c */
extern unsigned long tb_ticks_per_jiffy;
diff --git a/arch/powerpc/kernel/align.c b/arch/powerpc/kernel/align.c
index a27ccd5..4f49cb1 100644
--- a/arch/powerpc/kernel/align.c
+++ b/arch/powerpc/kernel/align.c
@@ -25,6 +25,7 @@
#include <asm/cputable.h>
#include <asm/emulated_ops.h>
#include <asm/switch_to.h>
+#include <asm/cpufeatures.h>
struct aligninfo {
unsigned char len;
diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index c69440c..164a9ad 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -75,6 +75,7 @@
#endif
#define CREATE_TRACE_POINTS
#include <asm/trace.h>
+#include <asm/cpufeatures.h>
DEFINE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat);
EXPORT_PER_CPU_SYMBOL(irq_stat);
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 6f428da..cc65650 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -57,6 +57,7 @@
#endif
#include <linux/kprobes.h>
#include <linux/kdebug.h>
+#include <asm/cpufeatures.h>
/* Transactional Memory debug */
#ifdef TM_DEBUG_SW
diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
index 3d261c0..6e3e595 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -61,6 +61,7 @@
#include <asm/cputhreads.h>
#include <mm/mmu_decl.h>
#include <asm/fadump.h>
+#include <asm/cpufeatures.h>
#include "setup.h"
diff --git a/arch/powerpc/kernel/setup_32.c b/arch/powerpc/kernel/setup_32.c
index a4bbcae..304a85b 100644
--- a/arch/powerpc/kernel/setup_32.c
+++ b/arch/powerpc/kernel/setup_32.c
@@ -39,6 +39,7 @@
#include <asm/udbg.h>
#include <asm/mmu_context.h>
#include <asm/epapr_hcalls.h>
+#include <asm/cpufeatures.h>
#include "setup.h"
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 442d8e2..e793c0d 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -50,6 +50,7 @@
#endif
#include <asm/vdso.h>
#include <asm/debug.h>
+#include <asm/cpufeatures.h>
#ifdef DEBUG
#include <asm/udbg.h>
diff --git a/arch/powerpc/oprofile/op_model_rs64.c b/arch/powerpc/oprofile/op_model_rs64.c
index 9b801b8..924b66c8 100644
--- a/arch/powerpc/oprofile/op_model_rs64.c
+++ b/arch/powerpc/oprofile/op_model_rs64.c
@@ -14,6 +14,7 @@
#include <asm/processor.h>
#include <asm/cputable.h>
#include <asm/oprofile_impl.h>
+#include <asm/cpufeatures.h>
#define dbg(args...)
diff --git a/arch/powerpc/platforms/cell/pervasive.c b/arch/powerpc/platforms/cell/pervasive.c
index d17e98b..036215b 100644
--- a/arch/powerpc/platforms/cell/pervasive.c
+++ b/arch/powerpc/platforms/cell/pervasive.c
@@ -35,6 +35,7 @@
#include <asm/pgtable.h>
#include <asm/reg.h>
#include <asm/cell-regs.h>
+#include <asm/cpufeatures.h>
#include "pervasive.h"
diff --git a/arch/powerpc/xmon/ppc-dis.c b/arch/powerpc/xmon/ppc-dis.c
index 89098f32..a642155 100644
--- a/arch/powerpc/xmon/ppc-dis.c
+++ b/arch/powerpc/xmon/ppc-dis.c
@@ -24,6 +24,7 @@ Software Foundation, 51 Franklin Street - Fifth Floor, Boston, MA 02110-1301, US
#include "ansidecl.h"
#include "ppc.h"
#include "dis-asm.h"
+#include <asm/cpufeatures.h>
/* Print a PowerPC or POWER instruction. */
--
1.8.3.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH v2 2/3] powerpc: use the jump label for cpu_has_feature
2013-09-02 5:45 [PATCH v2 0/3] powerpc: use jump label for cpu/mmu_has_feature Kevin Hao
2013-09-02 5:45 ` [PATCH v2 1/3] powerpc: move the cpu_has_feature to a separate file Kevin Hao
@ 2013-09-02 5:45 ` Kevin Hao
2013-11-27 2:45 ` Benjamin Herrenschmidt
2013-09-02 5:45 ` [PATCH v2 3/3] powerpc: use jump label for mmu_has_feature Kevin Hao
2 siblings, 1 reply; 6+ messages in thread
From: Kevin Hao @ 2013-09-02 5:45 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linuxppc
The cpu features are fixed once the probe of cpu features are done.
And the function cpu_has_feature() does be used in some hot path.
The checking of the cpu features for each time of invoking of
cpu_has_feature() seems suboptimal. This tries to reduce this
overhead of this check by using jump label. But we can only use
the jump label for this check only after the execution of
jump_label_init(), so we introduce another jump label to
still do the feature check by default before all the cpu
feature jump labels are initialized.
Signed-off-by: Kevin Hao <haokexin@gmail.com>
---
v2: Include the jump_label.h instead of jump_label_base.h.
arch/powerpc/include/asm/cpufeatures.h | 27 +++++++++++++++++++++++++++
arch/powerpc/kernel/cputable.c | 23 +++++++++++++++++++++++
2 files changed, 50 insertions(+)
diff --git a/arch/powerpc/include/asm/cpufeatures.h b/arch/powerpc/include/asm/cpufeatures.h
index 37650db..2631287 100644
--- a/arch/powerpc/include/asm/cpufeatures.h
+++ b/arch/powerpc/include/asm/cpufeatures.h
@@ -2,7 +2,33 @@
#define __ASM_POWERPC_CPUFEATURES_H
#include <asm/cputable.h>
+#ifdef CONFIG_JUMP_LABEL
+#include <linux/jump_label.h>
+#ifdef __powerpc64__
+#define MAX_CPU_FEATURES 64
+#else
+#define MAX_CPU_FEATURES 32
+#endif
+extern struct static_key cpu_feat_keys[MAX_CPU_FEATURES];
+extern struct static_key cpu_feat_keys_enabled;
+
+static inline int cpu_has_feature(unsigned long feature)
+{
+ if (CPU_FTRS_ALWAYS & feature)
+ return 1;
+
+ if (!(CPU_FTRS_POSSIBLE | feature))
+ return 0;
+
+ if (static_key_false(&cpu_feat_keys_enabled)) {
+ int i = __builtin_ctzl(feature);
+
+ return static_key_false(&cpu_feat_keys[i]);
+ } else
+ return !!(cur_cpu_spec->cpu_features & feature);
+}
+#else
static inline int cpu_has_feature(unsigned long feature)
{
return (CPU_FTRS_ALWAYS & feature) ||
@@ -10,5 +36,6 @@ static inline int cpu_has_feature(unsigned long feature)
& cur_cpu_spec->cpu_features
& feature);
}
+#endif
#endif /* __ASM_POWERPC_CPUFEATURE_H */
diff --git a/arch/powerpc/kernel/cputable.c b/arch/powerpc/kernel/cputable.c
index 597d954..50bd216 100644
--- a/arch/powerpc/kernel/cputable.c
+++ b/arch/powerpc/kernel/cputable.c
@@ -21,6 +21,7 @@
#include <asm/prom.h> /* for PTRRELOC on ARCH=ppc */
#include <asm/mmu.h>
#include <asm/setup.h>
+#include <asm/cpufeatures.h>
struct cpu_spec* cur_cpu_spec = NULL;
EXPORT_SYMBOL(cur_cpu_spec);
@@ -2258,3 +2259,25 @@ struct cpu_spec * __init identify_cpu(unsigned long offset, unsigned int pvr)
return NULL;
}
+
+#ifdef CONFIG_JUMP_LABEL
+struct static_key cpu_feat_keys[MAX_CPU_FEATURES];
+struct static_key cpu_feat_keys_enabled;
+
+static __init int cpu_feat_keys_init(void)
+{
+ int i;
+
+ for (i = 0; i < MAX_CPU_FEATURES; i++) {
+ unsigned long f = 1 << i;
+
+ if (cur_cpu_spec->cpu_features & f)
+ static_key_slow_inc(&cpu_feat_keys[i]);
+ }
+
+ static_key_slow_inc(&cpu_feat_keys_enabled);
+
+ return 0;
+}
+early_initcall(cpu_feat_keys_init);
+#endif
--
1.8.3.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH v2 3/3] powerpc: use jump label for mmu_has_feature
2013-09-02 5:45 [PATCH v2 0/3] powerpc: use jump label for cpu/mmu_has_feature Kevin Hao
2013-09-02 5:45 ` [PATCH v2 1/3] powerpc: move the cpu_has_feature to a separate file Kevin Hao
2013-09-02 5:45 ` [PATCH v2 2/3] powerpc: use the jump label for cpu_has_feature Kevin Hao
@ 2013-09-02 5:45 ` Kevin Hao
2 siblings, 0 replies; 6+ messages in thread
From: Kevin Hao @ 2013-09-02 5:45 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linuxppc
The mmu features are fixed once the probe of mmu features are done.
And the function mmu_has_feature() does be used in some hot path.
The checking of the mmu features for each time of invoking of
mmu_has_feature() seems suboptimal. This tries to reduce this
overhead of this check by using jump label. But we can only use
the jump label for this check only after the execution of
jump_label_init(), so we introduce another jump label to
still do the feature check by default before all the mmu
feature jump labels are initialized.
Signed-off-by: Kevin Hao <haokexin@gmail.com>
---
v2: Include the jump_label.h instead of jump_label_base.h.
arch/powerpc/include/asm/mmu.h | 19 +++++++++++++++++++
arch/powerpc/kernel/cputable.c | 20 ++++++++++++++++++++
2 files changed, 39 insertions(+)
diff --git a/arch/powerpc/include/asm/mmu.h b/arch/powerpc/include/asm/mmu.h
index 691fd8a..b7f049e 100644
--- a/arch/powerpc/include/asm/mmu.h
+++ b/arch/powerpc/include/asm/mmu.h
@@ -121,10 +121,29 @@
DECLARE_PER_CPU(int, next_tlbcam_idx);
#endif
+#ifdef CONFIG_JUMP_LABEL
+#include <linux/jump_label.h>
+
+#define MAX_MMU_FEATURES 32
+
+extern struct static_key mmu_feat_keys[MAX_MMU_FEATURES];
+extern struct static_key mmu_feat_keys_enabled;
+
+static inline int mmu_has_feature(unsigned long feature)
+{
+ if (static_key_false(&mmu_feat_keys_enabled)) {
+ int i = __builtin_ctzl(feature);
+
+ return static_key_false(&mmu_feat_keys[i]);
+ } else
+ return !!(cur_cpu_spec->mmu_features & feature);
+}
+#else
static inline int mmu_has_feature(unsigned long feature)
{
return (cur_cpu_spec->mmu_features & feature);
}
+#endif
static inline void mmu_clear_feature(unsigned long feature)
{
diff --git a/arch/powerpc/kernel/cputable.c b/arch/powerpc/kernel/cputable.c
index 50bd216..785370b 100644
--- a/arch/powerpc/kernel/cputable.c
+++ b/arch/powerpc/kernel/cputable.c
@@ -2280,4 +2280,24 @@ static __init int cpu_feat_keys_init(void)
return 0;
}
early_initcall(cpu_feat_keys_init);
+
+struct static_key mmu_feat_keys[MAX_MMU_FEATURES];
+struct static_key mmu_feat_keys_enabled;
+
+static __init int mmu_feat_keys_init(void)
+{
+ int i;
+
+ for (i = 0; i < MAX_MMU_FEATURES; i++) {
+ unsigned long f = 1 << i;
+
+ if (cur_cpu_spec->mmu_features & f)
+ static_key_slow_inc(&mmu_feat_keys[i]);
+ }
+
+ static_key_slow_inc(&mmu_feat_keys_enabled);
+
+ return 0;
+}
+early_initcall(mmu_feat_keys_init);
#endif
--
1.8.3.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH v2 2/3] powerpc: use the jump label for cpu_has_feature
2013-09-02 5:45 ` [PATCH v2 2/3] powerpc: use the jump label for cpu_has_feature Kevin Hao
@ 2013-11-27 2:45 ` Benjamin Herrenschmidt
2013-12-25 5:58 ` Kevin Hao
0 siblings, 1 reply; 6+ messages in thread
From: Benjamin Herrenschmidt @ 2013-11-27 2:45 UTC (permalink / raw)
To: Kevin Hao; +Cc: linuxppc
On Mon, 2013-09-02 at 13:45 +0800, Kevin Hao wrote:
> The cpu features are fixed once the probe of cpu features are done.
> And the function cpu_has_feature() does be used in some hot path.
> The checking of the cpu features for each time of invoking of
> cpu_has_feature() seems suboptimal. This tries to reduce this
> overhead of this check by using jump label. But we can only use
> the jump label for this check only after the execution of
> jump_label_init(), so we introduce another jump label to
> still do the feature check by default before all the cpu
> feature jump labels are initialized.
So I was looking at these and ...
> +static inline int cpu_has_feature(unsigned long feature)
> +{
> + if (CPU_FTRS_ALWAYS & feature)
> + return 1;
> +
> + if (!(CPU_FTRS_POSSIBLE | feature))
> + return 0;
> +
> + if (static_key_false(&cpu_feat_keys_enabled)) {
> + int i = __builtin_ctzl(feature);
> +
> + return static_key_false(&cpu_feat_keys[i]);
> + } else
> + return !!(cur_cpu_spec->cpu_features & feature);
> +}
This is gross :-)
Have you checked the generated code ? I'm worried that we end up hitting
at least 2 branches every time, which might be enough to defeat the
purposes even if they are unconditional in term of performance and
code size...
Cheers,
Ben.
> +#else
> static inline int cpu_has_feature(unsigned long feature)
> {
> return (CPU_FTRS_ALWAYS & feature) ||
> @@ -10,5 +36,6 @@ static inline int cpu_has_feature(unsigned long feature)
> & cur_cpu_spec->cpu_features
> & feature);
> }
> +#endif
>
> #endif /* __ASM_POWERPC_CPUFEATURE_H */
> diff --git a/arch/powerpc/kernel/cputable.c b/arch/powerpc/kernel/cputable.c
> index 597d954..50bd216 100644
> --- a/arch/powerpc/kernel/cputable.c
> +++ b/arch/powerpc/kernel/cputable.c
> @@ -21,6 +21,7 @@
> #include <asm/prom.h> /* for PTRRELOC on ARCH=ppc */
> #include <asm/mmu.h>
> #include <asm/setup.h>
> +#include <asm/cpufeatures.h>
>
> struct cpu_spec* cur_cpu_spec = NULL;
> EXPORT_SYMBOL(cur_cpu_spec);
> @@ -2258,3 +2259,25 @@ struct cpu_spec * __init identify_cpu(unsigned long offset, unsigned int pvr)
>
> return NULL;
> }
> +
> +#ifdef CONFIG_JUMP_LABEL
> +struct static_key cpu_feat_keys[MAX_CPU_FEATURES];
> +struct static_key cpu_feat_keys_enabled;
> +
> +static __init int cpu_feat_keys_init(void)
> +{
> + int i;
> +
> + for (i = 0; i < MAX_CPU_FEATURES; i++) {
> + unsigned long f = 1 << i;
> +
> + if (cur_cpu_spec->cpu_features & f)
> + static_key_slow_inc(&cpu_feat_keys[i]);
> + }
> +
> + static_key_slow_inc(&cpu_feat_keys_enabled);
> +
> + return 0;
> +}
> +early_initcall(cpu_feat_keys_init);
> +#endif
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v2 2/3] powerpc: use the jump label for cpu_has_feature
2013-11-27 2:45 ` Benjamin Herrenschmidt
@ 2013-12-25 5:58 ` Kevin Hao
0 siblings, 0 replies; 6+ messages in thread
From: Kevin Hao @ 2013-12-25 5:58 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linuxppc
[-- Attachment #1: Type: text/plain, Size: 6479 bytes --]
On Wed, Nov 27, 2013 at 01:45:41PM +1100, Benjamin Herrenschmidt wrote:
> On Mon, 2013-09-02 at 13:45 +0800, Kevin Hao wrote:
> > The cpu features are fixed once the probe of cpu features are done.
> > And the function cpu_has_feature() does be used in some hot path.
> > The checking of the cpu features for each time of invoking of
> > cpu_has_feature() seems suboptimal. This tries to reduce this
> > overhead of this check by using jump label. But we can only use
> > the jump label for this check only after the execution of
> > jump_label_init(), so we introduce another jump label to
> > still do the feature check by default before all the cpu
> > feature jump labels are initialized.
>
> So I was looking at these and ...
Sorry for the delayed response.
>
> > +static inline int cpu_has_feature(unsigned long feature)
> > +{
> > + if (CPU_FTRS_ALWAYS & feature)
> > + return 1;
> > +
> > + if (!(CPU_FTRS_POSSIBLE | feature))
> > + return 0;
> > +
> > + if (static_key_false(&cpu_feat_keys_enabled)) {
> > + int i = __builtin_ctzl(feature);
> > +
> > + return static_key_false(&cpu_feat_keys[i]);
> > + } else
> > + return !!(cur_cpu_spec->cpu_features & feature);
> > +}
>
> This is gross :-)
>
> Have you checked the generated code ? I'm worried that we end up hitting
> at least 2 branches every time,
No, we would get 2 unconditional branches at worst. The following is the
disassemble of part code in switch_mm() when jump label is enabled.
68 /* We must stop all altivec streams before changing the HW
69 * context
70 */
71 #ifdef CONFIG_ALTIVEC
72 if (cpu_has_feature(CPU_FTR_ALTIVEC))
73 asm volatile ("dssall");
74 #endif /* CONFIG_ALTIVEC */
c0000000005c42f4: 60 00 00 00 nop
c0000000005c42f8: 3d 02 00 01 addis r8,r2,1
c0000000005c42fc: 39 28 f6 b8 addi r9,r8,-2376
c0000000005c4300: e9 29 00 00 ld r9,0(r9)
c0000000005c4304: e9 29 00 10 ld r9,16(r9)
c0000000005c4308: 79 2a ef e3 rldicl. r10,r9,61,63
c0000000005c430c: 41 82 00 08 beq c0000000005c4314 <.__schedule+0x27c>
c0000000005c4310: 7e 00 06 6c dssall
c0000000005c4314: 7f 43 d3 78 mr r3,r26
c0000000005c4318: 7f a4 eb 78 mr r4,r29
c0000000005c431c: 4b a5 ff 71 bl c00000000002428c <.switch_mmu_context>
c0000000005c4320: 60 00 00 00 nop
....
c0000000005c4400: 60 00 00 00 nop
c0000000005c4404: 7f 43 d3 78 mr r3,r26
c0000000005c4408: 7f a4 eb 78 mr r4,r29
c0000000005c440c: 4b a5 fe 81 bl c00000000002428c <.switch_mmu_context>
c0000000005c4410: 60 00 00 00 nop
On a p5020 board which doesn't support altivec, the code would change
to the following after jump label init.
c0000000005c42f4: b c0000000005c4400
The final instruction sequence should be just a branch.
On a t4240 board which does have altivec support, the code would change to:
c0000000005c42f4: b c0000000005c4400
...
c0000000005c4400: b c0000000005c4310
The final instruction sequence should be two unconditional branches.
The following is the disassemble code when jump label is disabled.
c0000000005c26fc: 60 00 00 00 nop
c0000000005c2700: 3d 02 00 01 addis r8,r2,1
c0000000005c2704: 39 28 24 b8 addi r9,r8,9400
c0000000005c2708: e9 29 00 00 ld r9,0(r9)
c0000000005c270c: e9 29 00 10 ld r9,16(r9)
c0000000005c2710: 79 2a ef e3 rldicl. r10,r9,61,63
c0000000005c2714: 40 82 00 d4 bne c0000000005c27e8 <.__schedule+0x360>
c0000000005c2718: 7f 43 d3 78 mr r3,r26
c0000000005c271c: 7f a4 eb 78 mr r4,r29
c0000000005c2720: 4b a6 0a 75 bl c000000000023194 <.switch_mmu_context>
c0000000005c2724: 60 00 00 00 nop
....
c0000000005c27e8: 7e 00 06 6c dssall
c0000000005c27ec: 4b ff ff 2c b c0000000005c2718 <.__schedule+0x290>
c0000000005c27f0: e9 3e 00 00 ld r9,0(r30)
c0000000005c27f4: 71 2a 00 81 andi. r10,r9,129
c0000000005c27f8: 40 82 00 94 bne c0000000005c288c <.__schedule+0x404>
The final instruction sequence is following:
addis r8,r2,1
addi r9,r8,9400
ld r9,0(r9)
ld r9,16(r9)
rldicl. r10,r9,61,63
bne c0000000005c27e8
> which might be enough to defeat the
> purposes even if they are unconditional in term of performance and
> code size...
It does result in an increase in the code size due to the enable of jump label.
before:
.text .data
005c4ff4 0005ede8
after:
.text .data
005c6c04 0005fe68
Thanks,
Kevin
>
> Cheers,
> Ben.
>
> > +#else
> > static inline int cpu_has_feature(unsigned long feature)
> > {
> > return (CPU_FTRS_ALWAYS & feature) ||
> > @@ -10,5 +36,6 @@ static inline int cpu_has_feature(unsigned long feature)
> > & cur_cpu_spec->cpu_features
> > & feature);
> > }
> > +#endif
> >
> > #endif /* __ASM_POWERPC_CPUFEATURE_H */
> > diff --git a/arch/powerpc/kernel/cputable.c b/arch/powerpc/kernel/cputable.c
> > index 597d954..50bd216 100644
> > --- a/arch/powerpc/kernel/cputable.c
> > +++ b/arch/powerpc/kernel/cputable.c
> > @@ -21,6 +21,7 @@
> > #include <asm/prom.h> /* for PTRRELOC on ARCH=ppc */
> > #include <asm/mmu.h>
> > #include <asm/setup.h>
> > +#include <asm/cpufeatures.h>
> >
> > struct cpu_spec* cur_cpu_spec = NULL;
> > EXPORT_SYMBOL(cur_cpu_spec);
> > @@ -2258,3 +2259,25 @@ struct cpu_spec * __init identify_cpu(unsigned long offset, unsigned int pvr)
> >
> > return NULL;
> > }
> > +
> > +#ifdef CONFIG_JUMP_LABEL
> > +struct static_key cpu_feat_keys[MAX_CPU_FEATURES];
> > +struct static_key cpu_feat_keys_enabled;
> > +
> > +static __init int cpu_eat_keys_init(void)
> > +{
> > + int i;
> > +
> > + for (i = 0; i < MAX_CPU_FEATURES; i++) {
> > + unsigned long f = 1 << i;
> > +
> > + if (cur_cpu_spec->cpu_features & f)
> > + static_key_slow_inc(&cpu_feat_keys[i]);
> > + }
> > +
> > + static_key_slow_inc(&cpu_feat_keys_enabled);
> > +
> > + return 0;
> > +}
> > +early_initcall(cpu_feat_keys_init);
> > +#endif
>
>
[-- Attachment #2: Type: application/pgp-signature, Size: 490 bytes --]
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2013-12-25 5:58 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-09-02 5:45 [PATCH v2 0/3] powerpc: use jump label for cpu/mmu_has_feature Kevin Hao
2013-09-02 5:45 ` [PATCH v2 1/3] powerpc: move the cpu_has_feature to a separate file Kevin Hao
2013-09-02 5:45 ` [PATCH v2 2/3] powerpc: use the jump label for cpu_has_feature Kevin Hao
2013-11-27 2:45 ` Benjamin Herrenschmidt
2013-12-25 5:58 ` Kevin Hao
2013-09-02 5:45 ` [PATCH v2 3/3] powerpc: use jump label for mmu_has_feature Kevin Hao
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).