* [kvm-ppc-devel] [PATCH] [1/3] kvmppc: add detailed instruction
From: ehrhardt @ 2008-04-14 12:23 UTC
To: kvm-ppc
From: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
This extends the kvm_stat reports for KVM on embedded PowerPC. Since kvmppc
relies on instruction emulation (there is no hardware support yet), this gives
people interested in performance a detailed breakdown of the already available
emulated-instruction counter (inst_emu). These statistics do not record, e.g.,
the operands of the instructions, which keeps the performance impact small
(don't perturb what you want to measure). The feature can be enabled in .config
under the kvmppc virtualization menu (CONFIG_KVM_POWERPC_440_INSTRUCTION_STAT).
Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
---
[diffstat]
arch/powerpc/kvm/Kconfig | 8 +++
arch/powerpc/kvm/emulate.c | 61 +++++++++++++++++++++++++++++
arch/powerpc/kvm/powerpc.c | 84 +++++++++++++++++++++++++++++++++++++++++
include/asm-powerpc/kvm_host.h | 40 +++++++++++++++++++
include/asm-powerpc/kvm_ppc.h | 50 ++++++++++++++++++++++++
5 files changed, 243 insertions(+)
diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -41,6 +41,14 @@ config KVM_POWERPC_440
---help---
Provides support for KVM on 440 processors.
+config KVM_POWERPC_440_INSTRUCTION_STAT
+ bool "ppc440 instruction emulation statistics"
+ depends on KVM && 44x && KVM_POWERPC_440
+ ---help---
+ Count how often each emulated instruction type is executed and export
+ the counters via debugfs (kvm_stat). This helps debug performance
+ issues but adds a slight runtime overhead. If unsure, say N.
+
config KVM_PPC_VIRTIO
bool "Virtio Support"
select VIRTIO
diff --git a/arch/powerpc/kvm/emulate.c b/arch/powerpc/kvm/emulate.c
--- a/arch/powerpc/kvm/emulate.c
+++ b/arch/powerpc/kvm/emulate.c
@@ -87,6 +87,14 @@ static inline unsigned int get_d(u32 ins
static inline unsigned int get_d(u32 inst)
{
return inst & 0xffff;
+}
+
+static inline void emulinstr_stat(struct kvm_vcpu *vcpu,
+ enum kvmppc_emulated_instructions kvmppc_emulinstr)
+{
+#ifdef CONFIG_KVM_POWERPC_440_INSTRUCTION_STAT
+ (*(u32 *)((void *)vcpu + kvmppc_emulinstr_offset[kvmppc_emulinstr]))++;
+#endif
}
static int tlbe_is_host_safe(const struct kvm_vcpu *vcpu,
@@ -229,6 +237,7 @@ int kvmppc_emulate_instruction(struct kv
switch (get_op(inst)) {
case 3: /* trap */
+ emulinstr_stat(vcpu, KVMPPC_EMULATED_INSTRUCTION_TRAP);
if (get_d(inst) == 1) {
/* FIXME port to final hypercall API when defined */
printk(KERN_INFO"Guest requested shutdown\n");
@@ -242,6 +251,7 @@ int kvmppc_emulate_instruction(struct kv
case 19:
switch (get_xop(inst)) {
case 50: /* rfi */
+ emulinstr_stat(vcpu, KVMPPC_EMULATED_INSTRUCTION_RFI);
kvmppc_emul_rfi(vcpu);
advance = 0;
break;
@@ -256,32 +266,39 @@ int kvmppc_emulate_instruction(struct kv
switch (get_xop(inst)) {
case 83: /* mfmsr */
+ emulinstr_stat(vcpu, KVMPPC_EMULATED_INSTRUCTION_MFMSR);
rt = get_rt(inst);
vcpu->arch.gpr[rt] = vcpu->arch.msr;
break;
case 87: /* lbzx */
+ emulinstr_stat(vcpu, KVMPPC_EMULATED_INSTRUCTION_LBZX);
rt = get_rt(inst);
emulated = kvmppc_handle_load(run, vcpu, rt, 1, 1);
break;
case 131: /* wrtee */
+ emulinstr_stat(vcpu, KVMPPC_EMULATED_INSTRUCTION_WRTEE);
rs = get_rs(inst);
vcpu->arch.msr = (vcpu->arch.msr & ~MSR_EE)
| (vcpu->arch.gpr[rs] & MSR_EE);
break;
case 146: /* mtmsr */
+ emulinstr_stat(vcpu, KVMPPC_EMULATED_INSTRUCTION_MTMSR);
rs = get_rs(inst);
kvmppc_set_msr(vcpu, vcpu->arch.gpr[rs]);
break;
case 163: /* wrteei */
+ emulinstr_stat(vcpu,
+ KVMPPC_EMULATED_INSTRUCTION_WRTEEI);
vcpu->arch.msr = (vcpu->arch.msr & ~MSR_EE)
| (inst & MSR_EE);
break;
case 215: /* stbx */
+ emulinstr_stat(vcpu, KVMPPC_EMULATED_INSTRUCTION_STBX);
rs = get_rs(inst);
emulated = kvmppc_handle_store(run, vcpu,
vcpu->arch.gpr[rs],
@@ -289,6 +306,7 @@ int kvmppc_emulate_instruction(struct kv
break;
case 247: /* stbux */
+ emulinstr_stat(vcpu, KVMPPC_EMULATED_INSTRUCTION_STBUX);
rs = get_rs(inst);
ra = get_ra(inst);
rb = get_rb(inst);
@@ -304,11 +322,13 @@ int kvmppc_emulate_instruction(struct kv
break;
case 279: /* lhzx */
+ emulinstr_stat(vcpu, KVMPPC_EMULATED_INSTRUCTION_LHZX);
rt = get_rt(inst);
emulated = kvmppc_handle_load(run, vcpu, rt, 2, 1);
break;
case 311: /* lhzux */
+ emulinstr_stat(vcpu, KVMPPC_EMULATED_INSTRUCTION_LHZUX);
rt = get_rt(inst);
ra = get_ra(inst);
rb = get_rb(inst);
@@ -322,6 +342,7 @@ int kvmppc_emulate_instruction(struct kv
break;
case 323: /* mfdcr */
+ emulinstr_stat(vcpu, KVMPPC_EMULATED_INSTRUCTION_MFDCR);
dcrn = get_dcrn(inst);
rt = get_rt(inst);
@@ -349,6 +370,7 @@ int kvmppc_emulate_instruction(struct kv
break;
case 339: /* mfspr */
+ emulinstr_stat(vcpu, KVMPPC_EMULATED_INSTRUCTION_MFSPR);
sprn = get_sprn(inst);
rt = get_rt(inst);
@@ -438,6 +460,7 @@ int kvmppc_emulate_instruction(struct kv
break;
case 407: /* sthx */
+ emulinstr_stat(vcpu, KVMPPC_EMULATED_INSTRUCTION_STHX);
rs = get_rs(inst);
ra = get_ra(inst);
rb = get_rb(inst);
@@ -448,6 +471,7 @@ int kvmppc_emulate_instruction(struct kv
break;
case 439: /* sthux */
+ emulinstr_stat(vcpu, KVMPPC_EMULATED_INSTRUCTION_STHUX);
rs = get_rs(inst);
ra = get_ra(inst);
rb = get_rb(inst);
@@ -463,6 +487,7 @@ int kvmppc_emulate_instruction(struct kv
break;
case 451: /* mtdcr */
+ emulinstr_stat(vcpu, KVMPPC_EMULATED_INSTRUCTION_MTDCR);
dcrn = get_dcrn(inst);
rs = get_rs(inst);
@@ -482,6 +507,7 @@ int kvmppc_emulate_instruction(struct kv
break;
case 467: /* mtspr */
+ emulinstr_stat(vcpu, KVMPPC_EMULATED_INSTRUCTION_MTSPR);
sprn = get_sprn(inst);
rs = get_rs(inst);
switch (sprn) {
@@ -588,6 +614,7 @@ int kvmppc_emulate_instruction(struct kv
break;
case 470: /* dcbi */
+ emulinstr_stat(vcpu, KVMPPC_EMULATED_INSTRUCTION_DCBI);
/* Do nothing. The guest is performing dcbi because
* hardware DMA is not snooped by the dcache, but
* emulated DMA either goes through the dcache as
@@ -596,14 +623,19 @@ int kvmppc_emulate_instruction(struct kv
break;
case 534: /* lwbrx */
+ emulinstr_stat(vcpu, KVMPPC_EMULATED_INSTRUCTION_LWBRX);
rt = get_rt(inst);
emulated = kvmppc_handle_load(run, vcpu, rt, 4, 0);
break;
case 566: /* tlbsync */
+ emulinstr_stat(vcpu,
+ KVMPPC_EMULATED_INSTRUCTION_TLBSYNC);
break;
case 662: /* stwbrx */
+ emulinstr_stat(vcpu,
+ KVMPPC_EMULATED_INSTRUCTION_STWBRX);
rs = get_rs(inst);
ra = get_ra(inst);
rb = get_rb(inst);
@@ -614,6 +646,7 @@ int kvmppc_emulate_instruction(struct kv
break;
case 978: /* tlbwe */
+ emulinstr_stat(vcpu, KVMPPC_EMULATED_INSTRUCTION_TLBWE);
emulated = kvmppc_emul_tlbwe(vcpu, inst);
break;
@@ -621,6 +654,7 @@ int kvmppc_emulate_instruction(struct kv
int index;
unsigned int as = get_mmucr_sts(vcpu);
unsigned int pid = get_mmucr_stid(vcpu);
+ emulinstr_stat(vcpu, KVMPPC_EMULATED_INSTRUCTION_TLBSX);
rt = get_rt(inst);
ra = get_ra(inst);
@@ -644,11 +678,14 @@ int kvmppc_emulate_instruction(struct kv
break;
case 790: /* lhbrx */
+ emulinstr_stat(vcpu, KVMPPC_EMULATED_INSTRUCTION_LHBRX);
rt = get_rt(inst);
emulated = kvmppc_handle_load(run, vcpu, rt, 2, 0);
break;
case 918: /* sthbrx */
+ emulinstr_stat(vcpu,
+ KVMPPC_EMULATED_INSTRUCTION_STHBRX);
rs = get_rs(inst);
ra = get_ra(inst);
rb = get_rb(inst);
@@ -659,6 +696,7 @@ int kvmppc_emulate_instruction(struct kv
break;
case 966: /* iccci */
+ emulinstr_stat(vcpu, KVMPPC_EMULATED_INSTRUCTION_ICCCI);
break;
default:
@@ -670,11 +708,14 @@ int kvmppc_emulate_instruction(struct kv
break;
case 32: /* lwz */
+ emulinstr_stat(vcpu, KVMPPC_EMULATED_INSTRUCTION_LWZ);
rt = get_rt(inst);
emulated = kvmppc_handle_load(run, vcpu, rt, 4, 1);
break;
case 33: /* lwzu */
+ emulinstr_stat(vcpu, KVMPPC_EMULATED_INSTRUCTION_LWZU);
+ rt = get_rt(inst);
ra = get_ra(inst);
rt = get_rt(inst);
emulated = kvmppc_handle_load(run, vcpu, rt, 4, 1);
@@ -682,11 +723,15 @@ int kvmppc_emulate_instruction(struct kv
break;
case 34: /* lbz */
+ emulinstr_stat(vcpu, KVMPPC_EMULATED_INSTRUCTION_LBZ);
+ rt = get_rt(inst);
rt = get_rt(inst);
emulated = kvmppc_handle_load(run, vcpu, rt, 1, 1);
break;
case 35: /* lbzu */
+ emulinstr_stat(vcpu, KVMPPC_EMULATED_INSTRUCTION_LBZU);
+ rt = get_rt(inst);
ra = get_ra(inst);
rt = get_rt(inst);
emulated = kvmppc_handle_load(run, vcpu, rt, 1, 1);
@@ -694,12 +739,16 @@ int kvmppc_emulate_instruction(struct kv
break;
case 36: /* stw */
+ emulinstr_stat(vcpu, KVMPPC_EMULATED_INSTRUCTION_STW);
+ rt = get_rt(inst);
rs = get_rs(inst);
emulated = kvmppc_handle_store(run, vcpu, vcpu->arch.gpr[rs],
4, 1);
break;
case 37: /* stwu */
+ emulinstr_stat(vcpu, KVMPPC_EMULATED_INSTRUCTION_STWU);
+ rt = get_rt(inst);
ra = get_ra(inst);
rs = get_rs(inst);
emulated = kvmppc_handle_store(run, vcpu, vcpu->arch.gpr[rs],
@@ -708,12 +757,16 @@ int kvmppc_emulate_instruction(struct kv
break;
case 38: /* stb */
+ emulinstr_stat(vcpu, KVMPPC_EMULATED_INSTRUCTION_STB);
+ rt = get_rt(inst);
rs = get_rs(inst);
emulated = kvmppc_handle_store(run, vcpu, vcpu->arch.gpr[rs],
1, 1);
break;
case 39: /* stbu */
+ emulinstr_stat(vcpu, KVMPPC_EMULATED_INSTRUCTION_STBU);
+ rt = get_rt(inst);
ra = get_ra(inst);
rs = get_rs(inst);
emulated = kvmppc_handle_store(run, vcpu, vcpu->arch.gpr[rs],
@@ -722,11 +775,15 @@ int kvmppc_emulate_instruction(struct kv
break;
case 40: /* lhz */
+ emulinstr_stat(vcpu, KVMPPC_EMULATED_INSTRUCTION_LHZ);
+ rt = get_rt(inst);
rt = get_rt(inst);
emulated = kvmppc_handle_load(run, vcpu, rt, 2, 1);
break;
case 41: /* lhzu */
+ emulinstr_stat(vcpu, KVMPPC_EMULATED_INSTRUCTION_LHZU);
+ rt = get_rt(inst);
ra = get_ra(inst);
rt = get_rt(inst);
emulated = kvmppc_handle_load(run, vcpu, rt, 2, 1);
@@ -734,12 +791,16 @@ int kvmppc_emulate_instruction(struct kv
break;
case 44: /* sth */
+ emulinstr_stat(vcpu, KVMPPC_EMULATED_INSTRUCTION_STH);
+ rt = get_rt(inst);
rs = get_rs(inst);
emulated = kvmppc_handle_store(run, vcpu, vcpu->arch.gpr[rs],
2, 1);
break;
case 45: /* sthu */
+ emulinstr_stat(vcpu, KVMPPC_EMULATED_INSTRUCTION_STHU);
+ rt = get_rt(inst);
ra = get_ra(inst);
rs = get_rs(inst);
emulated = kvmppc_handle_store(run, vcpu, vcpu->arch.gpr[rs],
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -56,8 +56,92 @@ struct kvm_stats_debugfs_item debugfs_en
{ "inst_emu", VCPU_STAT(emulated_inst_exits) },
{ "dec", VCPU_STAT(dec_exits) },
{ "ext_intr", VCPU_STAT(ext_intr_exits) },
+#ifdef CONFIG_KVM_POWERPC_440_INSTRUCTION_STAT
+ { "instr_trap", VCPU_STAT(instr_trap) },
+ { "instr_rfi", VCPU_STAT(instr_rfi) },
+ { "instr_mfmsr", VCPU_STAT(instr_mfmsr) },
+ { "instr_lbzx", VCPU_STAT(instr_lbzx) },
+ { "instr_wrtee", VCPU_STAT(instr_wrtee) },
+ { "instr_mtmsr", VCPU_STAT(instr_mtmsr) },
+ { "instr_wrteei", VCPU_STAT(instr_wrteei) },
+ { "instr_stbx", VCPU_STAT(instr_stbx) },
+ { "instr_stbux", VCPU_STAT(instr_stbux) },
+ { "instr_lhzx", VCPU_STAT(instr_lhzx) },
+ { "instr_lhzux", VCPU_STAT(instr_lhzux) },
+ { "instr_mfdcr", VCPU_STAT(instr_mfdcr) },
+ { "instr_mfspr", VCPU_STAT(instr_mfspr) },
+ { "instr_sthx", VCPU_STAT(instr_sthx) },
+ { "instr_sthux", VCPU_STAT(instr_sthux) },
+ { "instr_mtdcr", VCPU_STAT(instr_mtdcr) },
+ { "instr_mtspr", VCPU_STAT(instr_mtspr) },
+ { "instr_dcbi", VCPU_STAT(instr_dcbi) },
+ { "instr_lwbrx", VCPU_STAT(instr_lwbrx) },
+ { "instr_tlbsync", VCPU_STAT(instr_tlbsync) },
+ { "instr_stwbrx", VCPU_STAT(instr_stwbrx) },
+ { "instr_tlbwe", VCPU_STAT(instr_tlbwe) },
+ { "instr_tlbsx", VCPU_STAT(instr_tlbsx) },
+ { "instr_lhbrx", VCPU_STAT(instr_lhbrx) },
+ { "instr_sthbrx", VCPU_STAT(instr_sthbrx) },
+ { "instr_iccci", VCPU_STAT(instr_iccci) },
+ { "instr_lwz", VCPU_STAT(instr_lwz) },
+ { "instr_lwzu", VCPU_STAT(instr_lwzu) },
+ { "instr_lbz", VCPU_STAT(instr_lbz) },
+ { "instr_lbzu", VCPU_STAT(instr_lbzu) },
+ { "instr_stw", VCPU_STAT(instr_stw) },
+ { "instr_stwu", VCPU_STAT(instr_stwu) },
+ { "instr_stb", VCPU_STAT(instr_stb) },
+ { "instr_stbu", VCPU_STAT(instr_stbu) },
+ { "instr_lhz", VCPU_STAT(instr_lhz) },
+ { "instr_lhzu", VCPU_STAT(instr_lhzu) },
+ { "instr_sth", VCPU_STAT(instr_sth) },
+ { "instr_sthu", VCPU_STAT(instr_sthu) },
+#endif /* CONFIG_KVM_POWERPC_440_INSTRUCTION_STAT */
{ NULL }
};
+
+#ifdef CONFIG_KVM_POWERPC_440_INSTRUCTION_STAT
+#define INST_STAT_OFFSET(x) offsetof(struct kvm_vcpu, stat.x)
+const int kvmppc_emulinstr_offset[] = {
+ [KVMPPC_EMULATED_INSTRUCTION_TRAP] = INST_STAT_OFFSET(instr_trap),
+ [KVMPPC_EMULATED_INSTRUCTION_RFI] = INST_STAT_OFFSET(instr_rfi),
+ [KVMPPC_EMULATED_INSTRUCTION_MFMSR] = INST_STAT_OFFSET(instr_mfmsr),
+ [KVMPPC_EMULATED_INSTRUCTION_LBZX] = INST_STAT_OFFSET(instr_lbzx),
+ [KVMPPC_EMULATED_INSTRUCTION_WRTEE] = INST_STAT_OFFSET(instr_wrtee),
+ [KVMPPC_EMULATED_INSTRUCTION_MTMSR] = INST_STAT_OFFSET(instr_mtmsr),
+ [KVMPPC_EMULATED_INSTRUCTION_WRTEEI] = INST_STAT_OFFSET(instr_wrteei),
+ [KVMPPC_EMULATED_INSTRUCTION_STBX] = INST_STAT_OFFSET(instr_stbx),
+ [KVMPPC_EMULATED_INSTRUCTION_STBUX] = INST_STAT_OFFSET(instr_stbux),
+ [KVMPPC_EMULATED_INSTRUCTION_LHZX] = INST_STAT_OFFSET(instr_lhzx),
+ [KVMPPC_EMULATED_INSTRUCTION_LHZUX] = INST_STAT_OFFSET(instr_lhzux),
+ [KVMPPC_EMULATED_INSTRUCTION_MFDCR] = INST_STAT_OFFSET(instr_mfdcr),
+ [KVMPPC_EMULATED_INSTRUCTION_MFSPR] = INST_STAT_OFFSET(instr_mfspr),
+ [KVMPPC_EMULATED_INSTRUCTION_STHX] = INST_STAT_OFFSET(instr_sthx),
+ [KVMPPC_EMULATED_INSTRUCTION_STHUX] = INST_STAT_OFFSET(instr_sthux),
+ [KVMPPC_EMULATED_INSTRUCTION_MTDCR] = INST_STAT_OFFSET(instr_mtdcr),
+ [KVMPPC_EMULATED_INSTRUCTION_MTSPR] = INST_STAT_OFFSET(instr_mtspr),
+ [KVMPPC_EMULATED_INSTRUCTION_DCBI] = INST_STAT_OFFSET(instr_dcbi),
+ [KVMPPC_EMULATED_INSTRUCTION_LWBRX] = INST_STAT_OFFSET(instr_lwbrx),
+ [KVMPPC_EMULATED_INSTRUCTION_TLBSYNC] = INST_STAT_OFFSET(instr_tlbsync),
+ [KVMPPC_EMULATED_INSTRUCTION_STWBRX] = INST_STAT_OFFSET(instr_stwbrx),
+ [KVMPPC_EMULATED_INSTRUCTION_TLBWE] = INST_STAT_OFFSET(instr_tlbwe),
+ [KVMPPC_EMULATED_INSTRUCTION_TLBSX] = INST_STAT_OFFSET(instr_tlbsx),
+ [KVMPPC_EMULATED_INSTRUCTION_LHBRX] = INST_STAT_OFFSET(instr_lhbrx),
+ [KVMPPC_EMULATED_INSTRUCTION_STHBRX] = INST_STAT_OFFSET(instr_sthbrx),
+ [KVMPPC_EMULATED_INSTRUCTION_ICCCI] = INST_STAT_OFFSET(instr_iccci),
+ [KVMPPC_EMULATED_INSTRUCTION_LWZ] = INST_STAT_OFFSET(instr_lwz),
+ [KVMPPC_EMULATED_INSTRUCTION_LWZU] = INST_STAT_OFFSET(instr_lwzu),
+ [KVMPPC_EMULATED_INSTRUCTION_LBZ] = INST_STAT_OFFSET(instr_lbz),
+ [KVMPPC_EMULATED_INSTRUCTION_LBZU] = INST_STAT_OFFSET(instr_lbzu),
+ [KVMPPC_EMULATED_INSTRUCTION_STW] = INST_STAT_OFFSET(instr_stw),
+ [KVMPPC_EMULATED_INSTRUCTION_STWU] = INST_STAT_OFFSET(instr_stwu),
+ [KVMPPC_EMULATED_INSTRUCTION_STB] = INST_STAT_OFFSET(instr_stb),
+ [KVMPPC_EMULATED_INSTRUCTION_STBU] = INST_STAT_OFFSET(instr_stbu),
+ [KVMPPC_EMULATED_INSTRUCTION_LHZ] = INST_STAT_OFFSET(instr_lhz),
+ [KVMPPC_EMULATED_INSTRUCTION_LHZU] = INST_STAT_OFFSET(instr_lhzu),
+ [KVMPPC_EMULATED_INSTRUCTION_STH] = INST_STAT_OFFSET(instr_sth),
+ [KVMPPC_EMULATED_INSTRUCTION_STHU] = INST_STAT_OFFSET(instr_sthu)
+};
+#endif /* CONFIG_KVM_POWERPC_440_INSTRUCTION_STAT */
static int kvmppc_debug;
diff --git a/include/asm-powerpc/kvm_host.h b/include/asm-powerpc/kvm_host.h
--- a/include/asm-powerpc/kvm_host.h
+++ b/include/asm-powerpc/kvm_host.h
@@ -59,6 +59,46 @@ struct kvm_vcpu_stat {
u32 emulated_inst_exits;
u32 dec_exits;
u32 ext_intr_exits;
+#ifdef CONFIG_KVM_POWERPC_440_INSTRUCTION_STAT
+ u32 instr_trap;
+ u32 instr_rfi;
+ u32 instr_mfmsr;
+ u32 instr_lbzx;
+ u32 instr_wrtee;
+ u32 instr_mtmsr;
+ u32 instr_wrteei;
+ u32 instr_stbx;
+ u32 instr_stbux;
+ u32 instr_lhzx;
+ u32 instr_lhzux;
+ u32 instr_mfdcr;
+ u32 instr_mfspr;
+ u32 instr_sthx;
+ u32 instr_sthux;
+ u32 instr_mtdcr;
+ u32 instr_mtspr;
+ u32 instr_dcbi;
+ u32 instr_lwbrx;
+ u32 instr_tlbsync;
+ u32 instr_stwbrx;
+ u32 instr_tlbwe;
+ u32 instr_tlbsx;
+ u32 instr_lhbrx;
+ u32 instr_sthbrx;
+ u32 instr_iccci;
+ u32 instr_lwz;
+ u32 instr_lwzu;
+ u32 instr_lbz;
+ u32 instr_lbzu;
+ u32 instr_stw;
+ u32 instr_stwu;
+ u32 instr_stb;
+ u32 instr_stbu;
+ u32 instr_lhz;
+ u32 instr_lhzu;
+ u32 instr_sth;
+ u32 instr_sthu;
+#endif /* CONFIG_KVM_POWERPC_440_INSTRUCTION_STAT */
};
struct tlbe {
diff --git a/include/asm-powerpc/kvm_ppc.h b/include/asm-powerpc/kvm_ppc.h
--- a/include/asm-powerpc/kvm_ppc.h
+++ b/include/asm-powerpc/kvm_ppc.h
@@ -41,6 +41,56 @@ enum emulation_result {
EMULATE_FAIL, /* can't emulate this instruction */
EMULATE_SHUTDOWN, /* shutdown guest requested (no kvm_run data) */
};
+
+enum kvmppc_emulated_instructions {
+ KVMPPC_EMULATED_INSTRUCTION_TRAP,
+ KVMPPC_EMULATED_INSTRUCTION_RFI,
+ KVMPPC_EMULATED_INSTRUCTION_MFMSR,
+ KVMPPC_EMULATED_INSTRUCTION_LBZX,
+ KVMPPC_EMULATED_INSTRUCTION_WRTEE,
+ KVMPPC_EMULATED_INSTRUCTION_MTMSR,
+ KVMPPC_EMULATED_INSTRUCTION_WRTEEI,
+ KVMPPC_EMULATED_INSTRUCTION_STBX,
+ KVMPPC_EMULATED_INSTRUCTION_STBUX,
+ KVMPPC_EMULATED_INSTRUCTION_LHZX,
+ KVMPPC_EMULATED_INSTRUCTION_LHZUX,
+ KVMPPC_EMULATED_INSTRUCTION_MFDCR,
+ KVMPPC_EMULATED_INSTRUCTION_MFSPR,
+ KVMPPC_EMULATED_INSTRUCTION_STHX,
+ KVMPPC_EMULATED_INSTRUCTION_STHUX,
+ KVMPPC_EMULATED_INSTRUCTION_MTDCR,
+ KVMPPC_EMULATED_INSTRUCTION_MTSPR,
+ KVMPPC_EMULATED_INSTRUCTION_DCBI,
+ KVMPPC_EMULATED_INSTRUCTION_LWBRX,
+ KVMPPC_EMULATED_INSTRUCTION_TLBSYNC,
+ KVMPPC_EMULATED_INSTRUCTION_STWBRX,
+ KVMPPC_EMULATED_INSTRUCTION_TLBWE,
+ KVMPPC_EMULATED_INSTRUCTION_TLBSX,
+ KVMPPC_EMULATED_INSTRUCTION_LHBRX,
+ KVMPPC_EMULATED_INSTRUCTION_STHBRX,
+ KVMPPC_EMULATED_INSTRUCTION_ICCCI,
+ KVMPPC_EMULATED_INSTRUCTION_LWZ,
+ KVMPPC_EMULATED_INSTRUCTION_LWZU,
+ KVMPPC_EMULATED_INSTRUCTION_LBZ,
+ KVMPPC_EMULATED_INSTRUCTION_LBZU,
+ KVMPPC_EMULATED_INSTRUCTION_STW,
+ KVMPPC_EMULATED_INSTRUCTION_STWU,
+ KVMPPC_EMULATED_INSTRUCTION_STB,
+ KVMPPC_EMULATED_INSTRUCTION_STBU,
+ KVMPPC_EMULATED_INSTRUCTION_LHZ,
+ KVMPPC_EMULATED_INSTRUCTION_LHZU,
+ KVMPPC_EMULATED_INSTRUCTION_STH,
+ KVMPPC_EMULATED_INSTRUCTION_STHU,
+};
+
+#ifdef CONFIG_KVM_POWERPC_440_INSTRUCTION_STAT
+/*
+ * Offsets of the per-instruction counters in struct kvm_vcpu, used to update
+ * vcpu->stat directly with low overhead; no extra op/xop -> counter mapping is
+ * needed since kvmppc_emulate_instruction()'s existing switch/case selects it.
+ */
+extern const int kvmppc_emulinstr_offset[];
+#endif /* CONFIG_KVM_POWERPC_440_INSTRUCTION_STAT */
extern const unsigned char exception_priority[];
extern const unsigned char priority_exception[];
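A minimal user-space sketch of the counter mechanism used in this patch: an enum value indexes a table of offsetof() values, and the u32 counter at that offset inside struct kvm_vcpu is incremented from the existing switch/case. The structures are cut down to three instructions for illustration (an assumption, not the real kvm_host.h layout), the offsets are size_t rather than the patch's const int, and char * arithmetic replaces the patch's void * arithmetic, which relies on a GNU C extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel structures: only the fields needed
 * to show the mechanism are modelled (assumption, not the real layout). */
struct kvm_vcpu_stat {
	uint32_t emulated_inst_exits;
	uint32_t instr_trap;
	uint32_t instr_rfi;
	uint32_t instr_mfmsr;
};

struct kvm_vcpu {
	int vcpu_id;                /* placeholder for the rest of the vcpu state */
	struct kvm_vcpu_stat stat;
};

enum kvmppc_emulated_instructions {
	KVMPPC_EMULATED_INSTRUCTION_TRAP,
	KVMPPC_EMULATED_INSTRUCTION_RFI,
	KVMPPC_EMULATED_INSTRUCTION_MFMSR,
};

/* Byte offset of a per-instruction counter inside struct kvm_vcpu. */
#define INST_STAT_OFFSET(x) offsetof(struct kvm_vcpu, stat.x)

/* Enum value -> counter offset, computed at compile time. */
static const size_t kvmppc_emulinstr_offset[] = {
	[KVMPPC_EMULATED_INSTRUCTION_TRAP]  = INST_STAT_OFFSET(instr_trap),
	[KVMPPC_EMULATED_INSTRUCTION_RFI]   = INST_STAT_OFFSET(instr_rfi),
	[KVMPPC_EMULATED_INSTRUCTION_MFMSR] = INST_STAT_OFFSET(instr_mfmsr),
};

#define NUM_EMULINSTR_OFFSETS \
	(sizeof(kvmppc_emulinstr_offset) / sizeof(kvmppc_emulinstr_offset[0]))

/* Called from a case of the emulation switch: a table lookup plus an
 * increment.  char * arithmetic is standard C, unlike void * arithmetic. */
static inline void emulinstr_stat(struct kvm_vcpu *vcpu,
				  enum kvmppc_emulated_instructions instr)
{
	uint32_t *counter;

	assert((size_t)instr < NUM_EMULINSTR_OFFSETS);
	counter = (uint32_t *)((char *)vcpu + kvmppc_emulinstr_offset[instr]);
	(*counter)++;
}
```

For example, emulating rfi twice and mfmsr once leaves stat.instr_rfi at 2, stat.instr_mfmsr at 1 and stat.instr_trap at 0. The trade-off is that the enum, the stat fields, the debugfs table and the offset table all have to be kept in sync by hand.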
-------------------------------------------------------------------------
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference
Don't miss this year's exciting event. There's still time to save $100.
Use priority code J8TL2D2.
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
_______________________________________________
kvm-ppc-devel mailing list
kvm-ppc-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-ppc-devel
* Re: [kvm-ppc-devel] [PATCH] [1/3] kvmppc: add detailed instruction
From: Hollis Blanchard @ 2008-04-14 21:33 UTC
To: kvm-ppc
On Monday 14 April 2008 07:23:56 ehrhardt@linux.vnet.ibm.com wrote:
> From: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
>
> This extends the kvm_stat reports for kvm on embedded powerpc. Since kvmppc
> is using emulation (no hardware support yet) this gives people interested
> in performance a detailed split of the emul_instruction counter already
> available. This statistic does not cover e.g. operands of the commands, but
> that way it should have only a small perf impact (never break what you want
> to measure). This feature is configurable in .config under the kvmppc
> virtualization itself.
This array-based approach seems to add a lot of lines of code, and it's also
copy/paste stuff that is just begging for a typo (e.g. miscounting STHX in
the STHUX bucket) or being forgotten entirely when adding new emulation code.
A more general approach would be to record just the opcode/extended opcode in
a variable-sized structure, allocating a new bucket as we encounter a
previously unseen instruction. I'm thinking of something like this:
log_instruction(inst) {
bucket = hash_u32(inst, 5);
list_for_each(l, vcpu->instlog[bucket]) {
struct instlog = list_entry(l);
if (instlog->inst == inst) {
instlog->count++;
break;
}
}
}
emulate(inst) {
log = get_op(inst);
switch (get_op(inst)) {
...
case 31:
log |= get_xop(inst);
switch (inst) {
...
}
}
log_instruction(log);
}
It looks like we could build a hash table pretty easily with hash_long() and
list_entry stuff (see fs/mbcache.c for example). So far you've found 17
different instruction types emulated in the "boot" workload, so 32 entries
would probably be a reasonable hash size. The same approach could be used for
SPR accesses, where you've hit 31 different registers. Really I think those
numbers won't vary much by workload, but rather by guest kernel...
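The bucket-based counting proposed above can be sketched as self-contained user-space C. This is an illustration under several assumptions, not code from the thread: a plain singly linked chain and malloc() stand in for the kernel's list helpers and allocator, a golden-ratio multiplicative hash stands in for hash_long(), and the key places the primary opcode above the 10-bit extended opcode instead of OR-ing the two together, since with a plain OR different extended opcodes under opcode 31 can collide (31 | 83 == 31 | 95 == 95). The 5 hash bits give the 32 buckets suggested above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define INSTLOG_HASH_BITS 5                        /* 2^5 = 32 buckets */
#define INSTLOG_BUCKETS   (1u << INSTLOG_HASH_BITS)

/* One entry per distinct (opcode, extended opcode) pair seen so far. */
struct instlog {
	uint32_t key;
	unsigned long count;
	struct instlog *next;     /* next entry in the same bucket */
};

/* Hypothetical per-vcpu table; all buckets start out NULL. */
struct vcpu_instlog {
	struct instlog *bucket[INSTLOG_BUCKETS];
};

/* Primary opcode: top 6 bits.  Extended opcode (X/XL-form): 10 bits
 * starting at bit 1, counting from the least significant bit. */
static uint32_t get_op(uint32_t inst)  { return inst >> 26; }
static uint32_t get_xop(uint32_t inst) { return (inst >> 1) & 0x3ff; }

/* Opcode in bits 10..15, extended opcode in bits 0..9, so no two distinct
 * (op, xop) pairs produce the same key. */
static uint32_t instlog_key(uint32_t inst)
{
	uint32_t op = get_op(inst);
	uint32_t key = op << 10;

	if (op == 19 || op == 31)  /* opcodes whose cases switch on get_xop() */
		key |= get_xop(inst);
	return key;
}

/* Multiplicative hash: keep the top INSTLOG_HASH_BITS bits of the product. */
static unsigned int instlog_hash(uint32_t key)
{
	return (uint32_t)(key * 0x9e3779b9u) >> (32 - INSTLOG_HASH_BITS);
}

/* Count one emulated instruction; returns 0, or -1 if allocation fails. */
static int log_instruction(struct vcpu_instlog *ilog, uint32_t inst)
{
	uint32_t key = instlog_key(inst);
	unsigned int b = instlog_hash(key);
	struct instlog *e;

	assert(b < INSTLOG_BUCKETS);
	for (e = ilog->bucket[b]; e; e = e->next) {
		if (e->key == key) {
			e->count++;
			return 0;
		}
	}

	/* Previously unseen instruction: add a new entry at the chain head. */
	e = malloc(sizeof(*e));
	if (!e)
		return -1;
	e->key = key;
	e->count = 1;
	e->next = ilog->bucket[b];
	ilog->bucket[b] = e;
	return 0;
}

/* How often an instruction with the same key as inst has been logged. */
static unsigned long instlog_count(const struct vcpu_instlog *ilog, uint32_t inst)
{
	uint32_t key = instlog_key(inst);
	const struct instlog *e;

	for (e = ilog->bucket[instlog_hash(key)]; e; e = e->next)
		if (e->key == key)
			return e->count;
	return 0;
}

/* Free every entry and reset the table to empty. */
static void instlog_free(struct vcpu_instlog *ilog)
{
	for (unsigned int b = 0; b < INSTLOG_BUCKETS; b++) {
		struct instlog *e = ilog->bucket[b];

		while (e) {
			struct instlog *next = e->next;

			free(e);
			e = next;
		}
		ilog->bucket[b] = NULL;
	}
}
```

Repeated rfi instructions (0x4C000064) land in one entry, and mfmsr r3 and mfmsr r0 share one entry because only the opcode fields form the key. Each logged instruction costs a hash, a short chain walk and a compare, which is the overhead debated in this thread.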
--
Hollis Blanchard
IBM Linux Technology Center
* Re: [kvm-ppc-devel] [PATCH] [1/3] kvmppc: add detailed instruction
From: Christian Ehrhardt @ 2008-04-15 7:34 UTC
To: kvm-ppc
Hollis Blanchard wrote:
> On Monday 14 April 2008 07:23:56 ehrhardt@linux.vnet.ibm.com wrote:
>> From: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
>>
>> This extends the kvm_stat reports for kvm on embedded powerpc. Since kvmppc
>> is using emulation (no hardware support yet) this gives people interested
>> in performance a detailed split of the emul_instruction counter already
>> available. This statistic does not cover e.g. operands of the commands, but
>> that way it should have only a small perf impact (never break what you want
>> to measure). This feature is configurable in .config under the kvmppc
>> virtualization itself.
>
> This array-based approach seems to add a lot of lines of code, and it's also
> copy/paste stuff that is just begging for a typo (e.g. miscounting STHX in
> the STHUX bucket) or being forgotten entirely when adding new emulation code.
>
> A more general approach would be to record just the opcode/extended opcode in
> a variable-sized structure, allocating a new bucket as we encounter a
> previously unseen instruction. I'm thinking of something like this:
I had already considered bucket-style counting, but here I put low runtime overhead above everything else by piggybacking on the existing switch/case and avoiding any new ifs and function calls.
I agree that the array-based solution is messy in terms of line count and more error-prone, since it is easy to forget a case.
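Not something proposed in this thread, but one common C way to keep the low-overhead switch/case approach while removing most of the copy/paste risk is an X-macro: the instruction list is written once and expanded into the enum, the stat fields, the debugfs names and the offset table, so no table can silently miss an entry. A minimal user-space sketch, with a shortened instruction list and simplified structures (assumptions, not the patch's real definitions; the _MAX sentinel is hypothetical):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Single list of emulated instructions (shortened for illustration). */
#define KVMPPC_EMULATED_INSTRUCTIONS(X) \
	X(TRAP, trap)                   \
	X(RFI, rfi)                     \
	X(MFMSR, mfmsr)                 \
	X(STHX, sthx)                   \
	X(STHUX, sthux)

/* Enum: KVMPPC_EMULATED_INSTRUCTION_TRAP, ..._RFI, ... */
enum kvmppc_emulated_instructions {
#define X(up, low) KVMPPC_EMULATED_INSTRUCTION_##up,
	KVMPPC_EMULATED_INSTRUCTIONS(X)
#undef X
	KVMPPC_EMULATED_INSTRUCTION_MAX /* hypothetical sentinel: entry count */
};

/* Counter fields: instr_trap, instr_rfi, ... */
struct kvm_vcpu_stat {
	uint32_t emulated_inst_exits;
#define X(up, low) uint32_t instr_##low;
	KVMPPC_EMULATED_INSTRUCTIONS(X)
#undef X
};

struct kvm_vcpu {
	int vcpu_id;                /* stand-in for the rest of the vcpu state */
	struct kvm_vcpu_stat stat;
};

/* debugfs-style names: "instr_trap", "instr_rfi", ... */
static const char *const kvmppc_emulinstr_name[] = {
#define X(up, low) [KVMPPC_EMULATED_INSTRUCTION_##up] = "instr_" #low,
	KVMPPC_EMULATED_INSTRUCTIONS(X)
#undef X
};

/* Counter offsets inside struct kvm_vcpu, indexed by the enum. */
static const size_t kvmppc_emulinstr_offset[] = {
#define X(up, low) \
	[KVMPPC_EMULATED_INSTRUCTION_##up] = offsetof(struct kvm_vcpu, stat.instr_##low),
	KVMPPC_EMULATED_INSTRUCTIONS(X)
#undef X
};

static void emulinstr_stat(struct kvm_vcpu *vcpu,
			   enum kvmppc_emulated_instructions instr)
{
	assert(instr < KVMPPC_EMULATED_INSTRUCTION_MAX);
	(*(uint32_t *)((char *)vcpu + kvmppc_emulinstr_offset[instr]))++;
}

/* Map a debugfs-style name back to its enum value; -1 if unknown. */
static int emulinstr_by_name(const char *name)
{
	for (int i = 0; i < KVMPPC_EMULATED_INSTRUCTION_MAX; i++)
		if (strcmp(kvmppc_emulinstr_name[i], name) == 0)
			return i;
	return -1;
}
```

Adding X(LWZ, lwz) to the list would create the enum value, stat.instr_lwz, the "instr_lwz" name and its offset in one step. The emulinstr_stat() call in each case of the switch still has to be written by hand, so a miscounted bucket such as STHX vs. STHUX remains possible.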
> log_instruction(inst) {
> bucket = hash_u32(inst, 5);
> list_for_each(l, vcpu->instlog[bucket]) {
> struct instlog = list_entry(l);
> if (instlog->inst == inst) {
> instlog->count++;
> break;
> }
> }
> }
>
> emulate(inst) {
> log = get_op(inst);
> switch (get_op(inst)) {
> ...
> case 31:
> log |= get_xop(inst);
> switch (inst) {
> ...
> }
> }
> log_instruction(log);
> }
I agree that this looks better, but it has per instruction:
1x hash function
a loop which runs ~1-2 times
an if (do we actually have long pipelines that suffer from branch misprediction?)
When we later add more functions here, like mapping sprn's or whatever else comes to mind, this might get even more complex, while these statistics are meant to be the low-overhead ones.
The point I want to make is that tracing via the relay channel (speaking of the reduced version with only 3 integers discussed in the other thread) also only has to move 3 integers into a buffer, plus some overhead to select the right buffer. The userspace application reads 4096 bytes, i.e. 341 records, each time the reading app is scheduled, and we might even increase that size. I really like the "full trace" feeling of the relay-based patch, so I would rather not increase the functionality and overhead of this patch, which was intended to be low-overhead.
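The 341 figure follows from the record size: three 32-bit integers make a 12-byte record, and 4096 / 12 = 341 whole records with 4 bytes left over. A small C check of that arithmetic, assuming 32-bit integers and a hypothetical record layout (the field names below are placeholders, not taken from the relay patch):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reduced trace record: three 32-bit integers.  Field names
 * are placeholders, not taken from the relay patch. */
struct kvmppc_trace_rec {
	uint32_t event;
	uint32_t arg0;
	uint32_t arg1;
};

/* Three uint32_t members need no padding on common ABIs: 12 bytes. */
static_assert(sizeof(struct kvmppc_trace_rec) == 12, "expected 12-byte records");

#define TRACE_READ_BYTES 4096u   /* read size quoted in the mail */

/* Whole records that fit into one read of read_bytes bytes. */
static size_t records_per_read(size_t read_bytes)
{
	return read_bytes / sizeof(struct kvmppc_trace_rec);
}

/* Bytes of a read that do not make up a whole record. */
static size_t leftover_bytes(size_t read_bytes)
{
	return read_bytes % sizeof(struct kvmppc_trace_rec);
}
```

Doubling the read size to 8192 bytes would give 682 whole records per read.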
As stated above, I agree with you about the line count and error-proneness of this patch. How about this:
- for upstream integration, only the relay-based approach is considered; it gives a full trace at the cost of some overhead
- we keep the low-overhead but error-prone kvm_stat-based patch in our private queues and use it only for our own measurements, until we either need it or change it once we have more experience with the relay-based tracing (e.g. once we know its performance overhead better)
- that way we would have two very similar solutions that share code and are configurable, so developers can use whichever they prefer
- additionally, I will reconsider Arnd's comments on whether we need or want a runtime switch, e.g. a /proc or /sys interface, to enable/disable these tracing functions on the fly
Comments welcome.
> It looks like we could build a hash table pretty easily with hash_long() and
> list_entry stuff (see fs/mbcache.c for example). So far you've found 17
> different instruction types emulated in the "boot" workload, so 32 entries
> would probably be a reasonable hash size. The same approach could be used for
> SPR accesses, where you've hit 31 different registers. Really I think those
> numbers won't vary much by workload, but rather by guest kernel...
>
--
Grüsse / regards,
Christian Ehrhardt
IBM Linux Technology Center, Open Virtualization
* Re: [kvm-ppc-devel] [PATCH] [1/3] kvmppc: add detailed instruction
From: Hollis Blanchard @ 2008-04-15 14:41 UTC
To: kvm-ppc
Could you please use a blank line when you intend to start a new paragraph in
your email? The lack of whitespace here hurts readability.
On Tuesday 15 April 2008 02:34:47 Christian Ehrhardt wrote:
> Hollis Blanchard wrote:
> > On Monday 14 April 2008 07:23:56 ehrhardt@linux.vnet.ibm.com wrote:
> >> From: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
> >>
> >> This extends the kvm_stat reports for kvm on embedded powerpc. Since
> >> kvmppc is using emulation (no hardware support yet) this gives people
> >> interested in performance a detailed split of the emul_instruction
> >> counter already available. This statistic does not cover e.g. operands
> >> of the commands, but that way it should have only a small perf impact
> >> (never break what you want to measure). This feature is configurable in
> >> .config under the kvmppc virtualization itself.
> >
> > This array-based approach seems to add a lot of lines of code, and it's
> > also copy/paste stuff that is just begging for a typo (e.g. miscounting
> > STHX in the STHUX bucket) or being forgotten entirely when adding new
> > emulation code.
> >
> > A more general approach would be to record just the opcode/extended
> > opcode in a variable-sized structure, allocating a new bucket as we
> > encounter a previously unseen instruction. I'm thinking of something like
> > this:
>
> I already thought about some bucket like counting, but I focused small
> runtime overhead above everything else here by hijacking the already
> existent switch/case and avoiding any new if's & function calls. I agree
> that the array like solution is messy in the aspect of lines and is more
> error prone by forgetting something.
>
> > log_instruction(inst) {
> > bucket = hash_u32(inst, 5);
> > list_for_each(l, vcpu->instlog[bucket]) {
> > struct instlog = list_entry(l);
> > if (instlog->inst == inst) {
> > instlog->count++;
> > break;
> > }
> > }
> > }
> >
> > emulate(inst) {
> > log = get_op(inst);
> > switch (get_op(inst)) {
> > ...
> > case 31:
> > log |= get_xop(inst);
> > switch (inst) {
> > ...
> > }
> > }
> > log_instruction(log);
> > }
>
> I agree that this looks better, but it has per instruction:
> 1x hash function
This is a shift.
> a loop which runs ~1-2 times
A couple of additions and memory accesses.
> an if (do we actually have long pipelines that suffer from wrong branch
> prediction?)
I believe we have many "if" statements. :)
> When we add more functions here like mapping sprn's and
> whatever else comes in mind later this might get even more complex, while
> this statistics should be the low overhead stats.
I don't know what "mapping SPRNs" means.
In general though, I'm not so worried about adding some memory references or
branches in this path. First, there are much bigger problems to worry about
(need I remind you about our TLB faults?). Second, it's the trace path: if we
care that much, we would just disable tracing. I think the benefits of the
more general approach are well worth it.
--
Hollis Blanchard
IBM Linux Technology Center