* [RFC PATCH 0/2] Emulator speedups - avoid initializations where possible @ 2014-04-03 22:27 Bandan Das 2014-04-03 22:27 ` [RFC PATCH 1/2] KVM: emulate: move init_decode_cache to emulate.c Bandan Das 2014-04-03 22:27 ` [RFC PATCH 2/2] KVM: emulate: clean up initializations in init_decode_cache Bandan Das 0 siblings, 2 replies; 4+ messages in thread From: Bandan Das @ 2014-04-03 22:27 UTC (permalink / raw) To: kvm; +Cc: linux-kernel, Paolo Bonzini While initializing emulation context structure, kvm memsets to 0 a number of fields some of which are redundant since they get set eventually. This patch attempts at avoiding some of them. Here are some before/after numbers on a Haswell host Note : The before numbers already include Paolo's RFC posted here : http://comments.gmane.org/gmane.linux.kernel/1676101 Before: 667 cycles/emulated jump instruction 839 cycles/emulated move instruction 892 cycles/emulated arithmetic instruction 995 cycles/emulated memory load instruction 1026 cycles/emulated memory store instruction 1022 cycles/emulated memory RMW instruction After: 639 cycles/emulated jump instruction 786 cycles/emulated move instruction 802 cycles/emulated arithmetic instruction 936 cycles/emulated memory load instruction 970 cycles/emulated memory store instruction 1000 cycles/emulated memory RMW instruction Bandan Das (2): KVM: emulate: move init_decode_cache to emulate.c KVM: emulate: clean up initializations in init_decode_cache arch/x86/include/asm/kvm_emulate.h | 17 +++++++------ arch/x86/kvm/emulate.c | 49 ++++++++++++++++++++++++++++++++++++++ arch/x86/kvm/x86.c | 13 ---------- 3 files changed, 59 insertions(+), 20 deletions(-) -- 1.8.3.1 ^ permalink raw reply [flat|nested] 4+ messages in thread
* [RFC PATCH 1/2] KVM: emulate: move init_decode_cache to emulate.c 2014-04-03 22:27 [RFC PATCH 0/2] Emulator speedups - avoid initializations where possible Bandan Das @ 2014-04-03 22:27 ` Bandan Das 2014-04-03 22:27 ` [RFC PATCH 2/2] KVM: emulate: clean up initializations in init_decode_cache Bandan Das 1 sibling, 0 replies; 4+ messages in thread From: Bandan Das @ 2014-04-03 22:27 UTC (permalink / raw) To: kvm; +Cc: linux-kernel, Paolo Bonzini Core emulator functions all belong in emulator.c, x86 should have no knowledge of emulator internals Signed-off-by: Bandan Das <bsd@redhat.com> --- arch/x86/include/asm/kvm_emulate.h | 1 + arch/x86/kvm/emulate.c | 13 +++++++++++++ arch/x86/kvm/x86.c | 13 ------------- 3 files changed, 14 insertions(+), 13 deletions(-) diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h index d085f73..ad4cca8 100644 --- a/arch/x86/include/asm/kvm_emulate.h +++ b/arch/x86/include/asm/kvm_emulate.h @@ -436,6 +436,7 @@ bool x86_page_table_writing_insn(struct x86_emulate_ctxt *ctxt); #define EMULATION_OK 0 #define EMULATION_RESTART 1 #define EMULATION_INTERCEPTED 2 +void init_decode_cache(struct x86_emulate_ctxt *ctxt); int x86_emulate_insn(struct x86_emulate_ctxt *ctxt); int emulator_task_switch(struct x86_emulate_ctxt *ctxt, u16 tss_selector, int idt_index, int reason, diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 8ca292c..8e2b866 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -4550,6 +4550,19 @@ static int fastop(struct x86_emulate_ctxt *ctxt, void (*fop)(struct fastop *)) return X86EMUL_CONTINUE; } +void init_decode_cache(struct x86_emulate_ctxt *ctxt) +{ + memset(&ctxt->opcode_len, 0, + (void *)&ctxt->_regs - (void *)&ctxt->opcode_len); + + ctxt->fetch.start = 0; + ctxt->fetch.end = 0; + ctxt->io_read.pos = 0; + ctxt->io_read.end = 0; + ctxt->mem_read.pos = 0; + ctxt->mem_read.end = 0; +} + int x86_emulate_insn(struct x86_emulate_ctxt *ctxt) { const struct x86_emulate_ops *ops = ctxt->ops; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 4fad05d..122410d 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4951,19 +4951,6 @@ static void inject_emulated_exception(struct kvm_vcpu *vcpu) kvm_queue_exception(vcpu, ctxt->exception.vector); } -static void init_decode_cache(struct x86_emulate_ctxt *ctxt) -{ - memset(&ctxt->opcode_len, 0, - (void *)&ctxt->_regs - (void *)&ctxt->opcode_len); - - ctxt->fetch.start = 0; - ctxt->fetch.end = 0; - ctxt->io_read.pos = 0; - ctxt->io_read.end = 0; - ctxt->mem_read.pos = 0; - ctxt->mem_read.end = 0; -} - static void init_emulate_ctxt(struct kvm_vcpu *vcpu) { struct x86_emulate_ctxt *ctxt = &vcpu->arch.emulate_ctxt; -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 4+ messages in thread
* [RFC PATCH 2/2] KVM: emulate: clean up initializations in init_decode_cache 2014-04-03 22:27 [RFC PATCH 0/2] Emulator speedups - avoid initializations where possible Bandan Das 2014-04-03 22:27 ` [RFC PATCH 1/2] KVM: emulate: move init_decode_cache to emulate.c Bandan Das @ 2014-04-03 22:27 ` Bandan Das 2014-04-04 9:47 ` Paolo Bonzini 1 sibling, 1 reply; 4+ messages in thread From: Bandan Das @ 2014-04-03 22:27 UTC (permalink / raw) To: kvm; +Cc: linux-kernel, Paolo Bonzini A lot of initializations are unnecessary as they get set to appropriate values before actually being used. Remove some of them and rework some others if the conditions that set them are not true Signed-off-by: Bandan Das <bsd@redhat.com> --- arch/x86/include/asm/kvm_emulate.h | 16 +++++++------ arch/x86/kvm/emulate.c | 46 +++++++++++++++++++++++++++++++++----- 2 files changed, 50 insertions(+), 12 deletions(-) diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h index ad4cca8..ccb7911 100644 --- a/arch/x86/include/asm/kvm_emulate.h +++ b/arch/x86/include/asm/kvm_emulate.h @@ -315,30 +315,32 @@ struct x86_emulate_ctxt { u8 opcode_len; u8 b; u8 intercept; - u8 lock_prefix; - u8 rep_prefix; u8 op_bytes; u8 ad_bytes; u8 rex_prefix; struct operand src; struct operand src2; struct operand dst; + int (*execute)(struct x86_emulate_ctxt *ctxt); + int (*check_perm)(struct x86_emulate_ctxt *ctxt); + u8 lock_prefix; + u8 rep_prefix; bool has_seg_override; u8 seg_override; u64 d; - int (*execute)(struct x86_emulate_ctxt *ctxt); - int (*check_perm)(struct x86_emulate_ctxt *ctxt); + bool rip_relative; + /* bitmaps of registers in _regs[] that can be read */ + u32 regs_valid; + /* bitmaps of registers in _regs[] that have been written */ + u32 regs_dirty; /* modrm */ u8 modrm; u8 modrm_mod; u8 modrm_reg; u8 modrm_rm; u8 modrm_seg; - bool rip_relative; unsigned long _eip; struct operand memop; - u32 regs_valid; /* bitmaps of registers in _regs[] that can be read */ - u32 regs_dirty; /* bitmaps of registers in _regs[] that have been written */ /* Fields above regs are cleared together. */ unsigned long _regs[NR_VCPU_REGS]; struct operand *memopp; diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 8e2b866..eac488b 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -1072,6 +1072,9 @@ static int decode_modrm(struct x86_emulate_ctxt *ctxt, ctxt->modrm_reg = (ctxt->rex_prefix & 4) << 1; /* REX.R */ index_reg = (ctxt->rex_prefix & 2) << 2; /* REX.X */ ctxt->modrm_rm = base_reg = (ctxt->rex_prefix & 1) << 3; /* REG.B */ + } else { + ctxt->modrm_reg = 0; + ctxt->modrm_rm = 0; } ctxt->modrm_mod |= (ctxt->modrm & 0xc0) >> 6; @@ -4357,6 +4360,8 @@ done_prefixes: if (ctxt->d & ModRM) ctxt->modrm = insn_fetch(u8, ctxt); + else + ctxt->modrm = 0; while (ctxt->d & GroupMask) { switch (ctxt->d & GroupMask) { @@ -4435,10 +4440,14 @@ done_prefixes: ctxt->op_bytes = 16; else if (ctxt->d & Mmx) ctxt->op_bytes = 8; + } else { + ctxt->intercept = 0; + ctxt->check_perm = NULL; } /* ModRM and SIB bytes. */ if (ctxt->d & ModRM) { + ctxt->modrm_mod = 0; rc = decode_modrm(ctxt, &ctxt->memop); if (!ctxt->has_seg_override) set_seg_override(ctxt, ctxt->modrm_seg); @@ -4552,14 +4561,41 @@ static int fastop(struct x86_emulate_ctxt *ctxt, void (*fop)(struct fastop *)) void init_decode_cache(struct x86_emulate_ctxt *ctxt) { - memset(&ctxt->opcode_len, 0, - (void *)&ctxt->_regs - (void *)&ctxt->opcode_len); - ctxt->fetch.start = 0; - ctxt->fetch.end = 0; + /* + * Variables that don't require initializing to 0 + * opcode_len - set in x86_decode_insn + * b - set in x86_decode_insn + * intercept - conditionally set in x86_decode_insn, added + * else set to 0 + * op_bytes - initialized in x86_decode_insn + * ad_bytes - initialized in x86_decode_insn + * rex_prefix - conditionally set in x86_decode_isn + * struct operands src,src2,dst - set by calling decode_operand + * in x86_decode_insn, + * default.type = OP_NONE + * (*execute) - set in x86_decode_insn + * (*check_perm) - conditionally set in x86_decode_insn, added + * else set to 0 + * d - set in x86_decode_insn + * modrm - conditionally set in x86_decode_insn, added else set to 0 + * modrm_mod - or'ed in decode_modrm which is conditionally called in + * in x86_decode_insn, added initialization to 0 before call + * modrm_reg - set in decode_modrm or else decode_register_operand + * modrm_rm - set in decode_modrm, added else set to 0 + * modrm_seg - set in decode_modrm + * _eip - set in x86_decode_insn + * memop - .type set to OP_NONE in x86_decode_insn + * ctxt->fetch.start - set in x86_decode_insn + * ctxt->fetch.end + * ctxt->mem_read.pos - set in x86_emulate_insn + */ + + memset(&ctxt->lock_prefix, 0, + (void *)&ctxt->modrm - (void *)&ctxt->lock_prefix); + ctxt->io_read.pos = 0; ctxt->io_read.end = 0; - ctxt->mem_read.pos = 0; ctxt->mem_read.end = 0; } -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [RFC PATCH 2/2] KVM: emulate: clean up initializations in init_decode_cache 2014-04-03 22:27 ` [RFC PATCH 2/2] KVM: emulate: clean up initializations in init_decode_cache Bandan Das @ 2014-04-04 9:47 ` Paolo Bonzini 0 siblings, 0 replies; 4+ messages in thread From: Paolo Bonzini @ 2014-04-04 9:47 UTC (permalink / raw) To: Bandan Das, kvm; +Cc: linux-kernel Hi Bandan. > diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c > index 8e2b866..eac488b 100644 > --- a/arch/x86/kvm/emulate.c > +++ b/arch/x86/kvm/emulate.c > @@ -1072,6 +1072,9 @@ static int decode_modrm(struct x86_emulate_ctxt *ctxt, > ctxt->modrm_reg = (ctxt->rex_prefix & 4) << 1; /* REX.R */ > index_reg = (ctxt->rex_prefix & 2) << 2; /* REX.X */ > ctxt->modrm_rm = base_reg = (ctxt->rex_prefix & 1) << 3; /* REG.B */ > + } else { > + ctxt->modrm_reg = 0; > + ctxt->modrm_rm = 0; > } You can drop the if altogether here (and also the initialization in "int index_reg = 0, base_reg = 0, scale;"). Also, getting down to micro-micro-optimization, the following will give slightly better code: ctxt->modrm_reg = (ctxt->rex_prefix << 1) & 8; /* REX.R */ index_reg = (ctxt->rex_prefix << 2) & 8; /* REX.X */ ctxt->modrm_rm = base_reg = (ctxt->rex_prefix << 3) & 8; /* REG.B */ because x86 can do a three-operand shift (using the LEA instruction), but not three-operand AND. > ctxt->modrm_mod |= (ctxt->modrm & 0xc0) >> 6; > @@ -4357,6 +4360,8 @@ done_prefixes: > > if (ctxt->d & ModRM) > ctxt->modrm = insn_fetch(u8, ctxt); > + else > + ctxt->modrm = 0; I think in the "else" case modrm won't be read at all, but it is indeed a bit safer this way. You can just leave it in the "zeroed" part of ctxt. There is padding in the struct, so you get the initialization for free. > while (ctxt->d & GroupMask) { > switch (ctxt->d & GroupMask) { > @@ -4435,10 +4440,14 @@ done_prefixes: > ctxt->op_bytes = 16; > else if (ctxt->d & Mmx) > ctxt->op_bytes = 8; > + } else { > + ctxt->intercept = 0; You can add a preparatory patch that checks (ctxt->d & Intercept) in x86_emulate_insn instead of ctxt->intercept and skip this initialization. > + ctxt->check_perm = NULL; Similarly you can check (ctxt->d & CheckPerm) and skip this initialization. > } Same here: the common case will zero them, might as well leave it to the memset. > > /* ModRM and SIB bytes. */ > if (ctxt->d & ModRM) { > + ctxt->modrm_mod = 0; What about instead changing ctxt->modrm_mod |= (ctxt->modrm & 0xc0) >> 6; from "|=" to "="? > rc = decode_modrm(ctxt, &ctxt->memop); > if (!ctxt->has_seg_override) > set_seg_override(ctxt, ctxt->modrm_seg); > @@ -4552,14 +4561,41 @@ static int fastop(struct x86_emulate_ctxt *ctxt, void (*fop)(struct fastop *)) > > void init_decode_cache(struct x86_emulate_ctxt *ctxt) > { > - memset(&ctxt->opcode_len, 0, > - (void *)&ctxt->_regs - (void *)&ctxt->opcode_len); > > - ctxt->fetch.start = 0; > - ctxt->fetch.end = 0; > + /* > + * Variables that don't require initializing to 0 > + * opcode_len - set in x86_decode_insn > + * b - set in x86_decode_insn > + * intercept - conditionally set in x86_decode_insn, added > + * else set to 0 > + * op_bytes - initialized in x86_decode_insn > + * ad_bytes - initialized in x86_decode_insn > + * rex_prefix - conditionally set in x86_decode_isn I think rex_prefix must be in the zeroed area. Again, with careful field placement you get that for free. > + * struct operands src,src2,dst - set by calling decode_operand > + * in x86_decode_insn, > + * default.type = OP_NONE > + * (*execute) - set in x86_decode_insn > + * (*check_perm) - conditionally set in x86_decode_insn, added > + * else set to 0 > + * d - set in x86_decode_insn > + * modrm - conditionally set in x86_decode_insn, added else set to 0 > + * modrm_mod - or'ed in decode_modrm which is conditionally called in > + * in x86_decode_insn, added initialization to 0 before call > + * modrm_reg - set in decode_modrm or else decode_register_operand > + * modrm_rm - set in decode_modrm, added else set to 0 > + * modrm_seg - set in decode_modrm > + * _eip - set in x86_decode_insn > + * memop - .type set to OP_NONE in x86_decode_insn > + * ctxt->fetch.start - set in x86_decode_insn > + * ctxt->fetch.end > + * ctxt->mem_read.pos - set in x86_emulate_insn Apart from the above suggestions, I agree with this patch. Also, memopp is initialized to NULL and can be moved to the zeroed area. But perhaps we can remove memopp, see below. You now have: + u8 lock_prefix; + u8 rep_prefix; bool has_seg_override; u8 seg_override; u64 d; + bool rip_relative; + /* bitmaps of registers in _regs[] that can be read */ + u32 regs_valid; + /* bitmaps of registers in _regs[] that have been written */ + u32 regs_dirty; Even if we add back a couple of fields, that's pretty good! It's 32 bytes (four stores). rip_relative introduces a lot of padding; move it together with other u8 and bool fields, and that makes 24. And ctxt->d is in the zeroed area by mistake, which makes it 16. There is a little more that you can do in this field. seg_override can be moved out of the zeroed area, and the seg_override() calls can be changed to direct accesses of the field. Please double check though. :) Actually, if the above is true, has_seg_override can also be eliminated and moved to x86_decode_insn as a local variable. We could also get rid completely of rip_relative and move it to a local variable in decode_modrm. Currently it is in x86_decode_insn because of this earlier check: if (ctxt->memop.type == OP_MEM && ctxt->ad_bytes != 8) ctxt->memop.addr.mem.ea = (u32)ctxt->memop.addr.mem.ea; ... if (ctxt->memopp && ctxt->memopp->type == OP_MEM && ctxt->rip_relative) ctxt->memopp->addr.mem.ea += ctxt->_eip; but I think the earlier check can be moved to decode_modrm as well. This is my reasoning: - the first "if" can only trigger for modrm (decode_abs will not set mem.ea to an out-of-range value if ctxt->ad_bytes != 8) - ctxt->memopp != NULL means that *ctxt->memopp was copied from ctxt->memop (so it must be either ModRM or MemAbs), but rip_relative is set to 1 only if the instruction is ModRM, never for MemAbs. And once you do this, memopp is never read anymore and can be dropped. This gives the following series: - patch 1 from here - replace ->intercept and ->check_perm checks with ctxt->d checks - cleanups and code changes from this patch, except struct reorganization and memset change - struct reorganization and memset change; instead of the large comment in init_decode_cache, please add comments in front of each field or group of fields - removing has_seg_override and moving seg_override out of the zeroed area - removing memopp and rip_relative (BTW, I found a bug in my second series while studying your patch. It ignores the mem_read cache in prepare_memory_operand, but it shouldn't). Paolo ^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2014-04-04 9:47 UTC | newest] Thread overview: 4+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2014-04-03 22:27 [RFC PATCH 0/2] Emulator speedups - avoid initializations where possible Bandan Das 2014-04-03 22:27 ` [RFC PATCH 1/2] KVM: emulate: move init_decode_cache to emulate.c Bandan Das 2014-04-03 22:27 ` [RFC PATCH 2/2] KVM: emulate: clean up initializations in init_decode_cache Bandan Das 2014-04-04 9:47 ` Paolo Bonzini
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).