public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] KVM: emulator: do not needlesly sync registers from emulator ctxt to vcpu
@ 2011-03-29 12:08 Gleb Natapov
  2011-03-30 10:38 ` Avi Kivity
  0 siblings, 1 reply; 16+ messages in thread
From: Gleb Natapov @ 2011-03-29 12:08 UTC (permalink / raw)
  To: avi, mtosatti; +Cc: kvm

Currently we sync registers back and forth before/after exiting
to userspace for IO, but during IO device model shouldn't need to
read/write the registers, so we can as well skip those sync points. The
only exaception is broken vmware backdor interface. The new code sync
registers content during IO only if registers are read from/written to
by userspace in the middle of the IO operation and this almost never
happens in practise.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 35f81b1..3bf77c2 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -369,6 +369,8 @@ struct kvm_vcpu_arch {
 	/* emulate context */
 
 	struct x86_emulate_ctxt emulate_ctxt;
+	bool emulate_regs_need_sync_to_vcpu;
+	bool emulate_regs_need_sync_from_vcpu;
 
 	gpa_t time;
 	struct pvclock_vcpu_time_info hv_clock;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index bfd7763..274c916 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4321,6 +4321,7 @@ static void init_emulate_ctxt(struct kvm_vcpu *vcpu)
 		? X86EMUL_MODE_PROT32 : X86EMUL_MODE_PROT16;
 	memset(c, 0, sizeof(struct decode_cache));
 	memcpy(c->regs, vcpu->arch.regs, sizeof c->regs);
+	vcpu->arch.emulate_regs_need_sync_from_vcpu = false;
 }
 
 int kvm_inject_realmode_interrupt(struct kvm_vcpu *vcpu, int irq)
@@ -4403,6 +4404,7 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu,
 {
 	int r;
 	struct decode_cache *c = &vcpu->arch.emulate_ctxt.decode;
+	bool writeback = true;
 
 	kvm_clear_exception_queue(vcpu);
 	vcpu->arch.mmio_fault_cr2 = cr2;
@@ -4443,9 +4445,12 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu,
 		return EMULATE_DONE;
 	}
 
-	/* this is needed for vmware backdor interface to work since it
+	/* this is needed for vmware backdoor interface to work since it
 	   changes registers values  during IO operation */
-	memcpy(c->regs, vcpu->arch.regs, sizeof c->regs);
+	if (vcpu->arch.emulate_regs_need_sync_from_vcpu) {
+		vcpu->arch.emulate_regs_need_sync_from_vcpu = false;
+		memcpy(c->regs, vcpu->arch.regs, sizeof c->regs);
+	}
 
 restart:
 	r = x86_emulate_insn(&vcpu->arch.emulate_ctxt);
@@ -4463,21 +4468,30 @@ restart:
 	} else if (vcpu->arch.pio.count) {
 		if (!vcpu->arch.pio.in)
 			vcpu->arch.pio.count = 0;
+		else
+			writeback = false;	
 		r = EMULATE_DO_MMIO;
 	} else if (vcpu->mmio_needed) {
 		if (vcpu->mmio_is_write)
 			vcpu->mmio_needed = 0;
+		else
+			writeback = false;	
 		r = EMULATE_DO_MMIO;
 	} else if (r == EMULATION_RESTART)
 		goto restart;
 	else
 		r = EMULATE_DONE;
 
-	toggle_interruptibility(vcpu, vcpu->arch.emulate_ctxt.interruptibility);
-	kvm_set_rflags(vcpu, vcpu->arch.emulate_ctxt.eflags);
-	kvm_make_request(KVM_REQ_EVENT, vcpu);
-	memcpy(vcpu->arch.regs, c->regs, sizeof c->regs);
-	kvm_rip_write(vcpu, vcpu->arch.emulate_ctxt.eip);
+	if (writeback) {
+		toggle_interruptibility(vcpu,
+				vcpu->arch.emulate_ctxt.interruptibility);
+		kvm_set_rflags(vcpu, vcpu->arch.emulate_ctxt.eflags);
+		kvm_make_request(KVM_REQ_EVENT, vcpu);
+		memcpy(vcpu->arch.regs, c->regs, sizeof c->regs);
+		vcpu->arch.emulate_regs_need_sync_to_vcpu = false;
+		kvm_rip_write(vcpu, vcpu->arch.emulate_ctxt.eip);
+	} else
+		vcpu->arch.emulate_regs_need_sync_to_vcpu = true;
 
 	return r;
 }
@@ -5437,6 +5451,18 @@ out:
 
 int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
 {
+	if (vcpu->arch.emulate_regs_need_sync_to_vcpu) {
+		/*
+		 * We are here if userspace calls get_regs() in the middle of
+		 * instruction emulation. Registers state needs to be copied
+		 * back from emulation context to vcpu. Usrapace shouldn't do
+		 * that usually, but some bad designed PV devices (vmware
+		 * backdoor interface) need this to work
+		 */
+		struct decode_cache *c = &vcpu->arch.emulate_ctxt.decode;
+		memcpy(vcpu->arch.regs, c->regs, sizeof c->regs);
+		vcpu->arch.emulate_regs_need_sync_to_vcpu = false;
+	}
 	regs->rax = kvm_register_read(vcpu, VCPU_REGS_RAX);
 	regs->rbx = kvm_register_read(vcpu, VCPU_REGS_RBX);
 	regs->rcx = kvm_register_read(vcpu, VCPU_REGS_RCX);
@@ -5464,6 +5490,9 @@ int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
 
 int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
 {
+	vcpu->arch.emulate_regs_need_sync_from_vcpu = true;
+	vcpu->arch.emulate_regs_need_sync_to_vcpu = false;
+
 	kvm_register_write(vcpu, VCPU_REGS_RAX, regs->rax);
 	kvm_register_write(vcpu, VCPU_REGS_RBX, regs->rbx);
 	kvm_register_write(vcpu, VCPU_REGS_RCX, regs->rcx);
--
			Gleb.

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH] KVM: emulator: do not needlesly sync registers from emulator ctxt to vcpu
  2011-03-29 12:08 [PATCH] KVM: emulator: do not needlesly sync registers from emulator ctxt to vcpu Gleb Natapov
@ 2011-03-30 10:38 ` Avi Kivity
  2011-03-30 10:47   ` Gleb Natapov
  0 siblings, 1 reply; 16+ messages in thread
From: Avi Kivity @ 2011-03-30 10:38 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: mtosatti, kvm

On 03/29/2011 02:08 PM, Gleb Natapov wrote:
> Currently we sync registers back and forth before/after exiting
> to userspace for IO, but during IO device model shouldn't need to
> read/write the registers, so we can as well skip those sync points. The
> only exaception is broken vmware backdor interface. The new code sync
> registers content during IO only if registers are read from/written to
> by userspace in the middle of the IO operation and this almost never
> happens in practise.

While this is a nice idea, how much does it save in practice?  It does 
introduce more complexity.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] KVM: emulator: do not needlesly sync registers from emulator ctxt to vcpu
  2011-03-30 10:38 ` Avi Kivity
@ 2011-03-30 10:47   ` Gleb Natapov
  2011-03-30 10:50     ` Avi Kivity
  0 siblings, 1 reply; 16+ messages in thread
From: Gleb Natapov @ 2011-03-30 10:47 UTC (permalink / raw)
  To: Avi Kivity; +Cc: mtosatti, kvm

On Wed, Mar 30, 2011 at 12:38:53PM +0200, Avi Kivity wrote:
> On 03/29/2011 02:08 PM, Gleb Natapov wrote:
> >Currently we sync registers back and forth before/after exiting
> >to userspace for IO, but during IO device model shouldn't need to
> >read/write the registers, so we can as well skip those sync points. The
> >only exaception is broken vmware backdor interface. The new code sync
> >registers content during IO only if registers are read from/written to
> >by userspace in the middle of the IO operation and this almost never
> >happens in practise.
> 
> While this is a nice idea, how much does it save in practice?  It
> does introduce more complexity.
> 

I haven't measured, but can try to do so. It eliminates two copies of
all registers on each MMIO/PIO read. I expect this to be measurable in
workloads that do many such reads.

--
			Gleb.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] KVM: emulator: do not needlesly sync registers from emulator ctxt to vcpu
  2011-03-30 10:47   ` Gleb Natapov
@ 2011-03-30 10:50     ` Avi Kivity
  2011-03-30 11:22       ` Gleb Natapov
  0 siblings, 1 reply; 16+ messages in thread
From: Avi Kivity @ 2011-03-30 10:50 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: mtosatti, kvm

On 03/30/2011 12:47 PM, Gleb Natapov wrote:
> On Wed, Mar 30, 2011 at 12:38:53PM +0200, Avi Kivity wrote:
> >  On 03/29/2011 02:08 PM, Gleb Natapov wrote:
> >  >Currently we sync registers back and forth before/after exiting
> >  >to userspace for IO, but during IO device model shouldn't need to
> >  >read/write the registers, so we can as well skip those sync points. The
> >  >only exaception is broken vmware backdor interface. The new code sync
> >  >registers content during IO only if registers are read from/written to
> >  >by userspace in the middle of the IO operation and this almost never
> >  >happens in practise.
> >
> >  While this is a nice idea, how much does it save in practice?  It
> >  does introduce more complexity.
> >
>
> I haven't measured, but can try to do so. It eliminates two copies of
> all registers on each MMIO/PIO read. I expect this to be measurable in
> workloads that do many such reads.
>

I don't, especially if these are mmios to userspace.  Perhaps it's 
better to remove the copy on kernel mmio, since it's much faster, if the 
result is simpler (there can be no KVM_SET_REGS in that context).

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] KVM: emulator: do not needlesly sync registers from emulator ctxt to vcpu
  2011-03-30 10:50     ` Avi Kivity
@ 2011-03-30 11:22       ` Gleb Natapov
  2011-03-30 11:43         ` Gleb Natapov
  0 siblings, 1 reply; 16+ messages in thread
From: Gleb Natapov @ 2011-03-30 11:22 UTC (permalink / raw)
  To: Avi Kivity; +Cc: mtosatti, kvm

On Wed, Mar 30, 2011 at 12:50:53PM +0200, Avi Kivity wrote:
> On 03/30/2011 12:47 PM, Gleb Natapov wrote:
> >On Wed, Mar 30, 2011 at 12:38:53PM +0200, Avi Kivity wrote:
> >>  On 03/29/2011 02:08 PM, Gleb Natapov wrote:
> >>  >Currently we sync registers back and forth before/after exiting
> >>  >to userspace for IO, but during IO device model shouldn't need to
> >>  >read/write the registers, so we can as well skip those sync points. The
> >>  >only exaception is broken vmware backdor interface. The new code sync
> >>  >registers content during IO only if registers are read from/written to
> >>  >by userspace in the middle of the IO operation and this almost never
> >>  >happens in practise.
> >>
> >>  While this is a nice idea, how much does it save in practice?  It
> >>  does introduce more complexity.
> >>
> >
> >I haven't measured, but can try to do so. It eliminates two copies of
> >all registers on each MMIO/PIO read. I expect this to be measurable in
> >workloads that do many such reads.
> >
> 
> I don't, especially if these are mmios to userspace.  Perhaps it's
> better to remove the copy on kernel mmio, since it's much faster, if
> the result is simpler (there can be no KVM_SET_REGS in that
> context).
> 

The patch saves copying of 256 bytes on each MMIO/PIO read. It may not
save a lot comparing to time it takes to do one MMIO to userspace, but
do 1 million of those and you saved a lot of CPU cycles. I do not think
we should abandon the optimization so easily. Unfortunately I can't run
perf on 2.6.38 kernel right now. It gives me strange errors and when it
doesn't it makes kernel OOPS.

--
			Gleb.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] KVM: emulator: do not needlesly sync registers from emulator ctxt to vcpu
  2011-03-30 11:22       ` Gleb Natapov
@ 2011-03-30 11:43         ` Gleb Natapov
  2011-03-30 12:12           ` Avi Kivity
  2011-03-30 12:17           ` Avi Kivity
  0 siblings, 2 replies; 16+ messages in thread
From: Gleb Natapov @ 2011-03-30 11:43 UTC (permalink / raw)
  To: Avi Kivity; +Cc: mtosatti, kvm

On Wed, Mar 30, 2011 at 01:22:43PM +0200, Gleb Natapov wrote:
> On Wed, Mar 30, 2011 at 12:50:53PM +0200, Avi Kivity wrote:
> > On 03/30/2011 12:47 PM, Gleb Natapov wrote:
> > >On Wed, Mar 30, 2011 at 12:38:53PM +0200, Avi Kivity wrote:
> > >>  On 03/29/2011 02:08 PM, Gleb Natapov wrote:
> > >>  >Currently we sync registers back and forth before/after exiting
> > >>  >to userspace for IO, but during IO device model shouldn't need to
> > >>  >read/write the registers, so we can as well skip those sync points. The
> > >>  >only exaception is broken vmware backdor interface. The new code sync
> > >>  >registers content during IO only if registers are read from/written to
> > >>  >by userspace in the middle of the IO operation and this almost never
> > >>  >happens in practise.
> > >>
> > >>  While this is a nice idea, how much does it save in practice?  It
> > >>  does introduce more complexity.
> > >>
> > >
> > >I haven't measured, but can try to do so. It eliminates two copies of
> > >all registers on each MMIO/PIO read. I expect this to be measurable in
> > >workloads that do many such reads.
> > >
> > 
> > I don't, especially if these are mmios to userspace.  Perhaps it's
> > better to remove the copy on kernel mmio, since it's much faster, if
> > the result is simpler (there can be no KVM_SET_REGS in that
> > context).
> > 
> 
> The patch saves copying of 256 bytes on each MMIO/PIO read. It may not
> save a lot comparing to time it takes to do one MMIO to userspace, but
> do 1 million of those and you saved a lot of CPU cycles. I do not think
> we should abandon the optimization so easily. Unfortunately I can't run
> perf on 2.6.38 kernel right now. It gives me strange errors and when it
> doesn't it makes kernel OOPS.
> 
After reboot perf started to work. I ran modified emulator.flat unit
test. It was modified to run test_cmps() in an endless loop.

Without patch:
1.71%  qemu-system-x86  [kvm]                 [k] x86_emulate_instruction
1.51%  qemu-system-x86  [kvm]                 [k] x86_emulate_instruction
1.68%  qemu-system-x86  [kvm]                 [k] x86_emulate_instruction

With patch:
0.84%  qemu-system-x86  [kvm]                 [k] x86_emulate_instruction
0.96%  qemu-system-x86  [kvm]                 [k] x86_emulate_instruction
0.89%  qemu-system-x86  [kvm]                 [k] x86_emulate_instruction

--
			Gleb.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] KVM: emulator: do not needlesly sync registers from emulator ctxt to vcpu
  2011-03-30 11:43         ` Gleb Natapov
@ 2011-03-30 12:12           ` Avi Kivity
  2011-03-30 12:12             ` Avi Kivity
  2011-03-30 12:17           ` Avi Kivity
  1 sibling, 1 reply; 16+ messages in thread
From: Avi Kivity @ 2011-03-30 12:12 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: mtosatti, kvm

On 03/30/2011 01:43 PM, Gleb Natapov wrote:
> >
> >  The patch saves copying of 256 bytes on each MMIO/PIO read. It may not
> >  save a lot comparing to time it takes to do one MMIO to userspace, but
> >  do 1 million of those and you saved a lot of CPU cycles. I do not think
> >  we should abandon the optimization so easily. Unfortunately I can't run
> >  perf on 2.6.38 kernel right now. It gives me strange errors and when it
> >  doesn't it makes kernel OOPS.
> >
> After reboot perf started to work. I ran modified emulator.flat unit
> test. It was modified to run test_cmps() in an endless loop.
>
> Without patch:
> 1.71%  qemu-system-x86  [kvm]                 [k] x86_emulate_instruction
> 1.51%  qemu-system-x86  [kvm]                 [k] x86_emulate_instruction
> 1.68%  qemu-system-x86  [kvm]                 [k] x86_emulate_instruction
>
> With patch:
> 0.84%  qemu-system-x86  [kvm]                 [k] x86_emulate_instruction
> 0.96%  qemu-system-x86  [kvm]                 [k] x86_emulate_instruction
> 0.89%  qemu-system-x86  [kvm]                 [k] x86_emulate_instruction
>

Well, that is probably significant.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] KVM: emulator: do not needlesly sync registers from emulator ctxt to vcpu
  2011-03-30 12:12           ` Avi Kivity
@ 2011-03-30 12:12             ` Avi Kivity
  0 siblings, 0 replies; 16+ messages in thread
From: Avi Kivity @ 2011-03-30 12:12 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: mtosatti, kvm

On 03/30/2011 02:12 PM, Avi Kivity wrote:
> On 03/30/2011 01:43 PM, Gleb Natapov wrote:
>> >
>> >  The patch saves copying of 256 bytes on each MMIO/PIO read. It may 
>> not
>> >  save a lot comparing to time it takes to do one MMIO to userspace, 
>> but
>> >  do 1 million of those and you saved a lot of CPU cycles. I do not 
>> think
>> >  we should abandon the optimization so easily. Unfortunately I 
>> can't run
>> >  perf on 2.6.38 kernel right now. It gives me strange errors and 
>> when it
>> >  doesn't it makes kernel OOPS.
>> >
>> After reboot perf started to work. I ran modified emulator.flat unit
>> test. It was modified to run test_cmps() in an endless loop.
>>
>> Without patch:
>> 1.71%  qemu-system-x86  [kvm]                 [k] 
>> x86_emulate_instruction
>> 1.51%  qemu-system-x86  [kvm]                 [k] 
>> x86_emulate_instruction
>> 1.68%  qemu-system-x86  [kvm]                 [k] 
>> x86_emulate_instruction
>>
>> With patch:
>> 0.84%  qemu-system-x86  [kvm]                 [k] 
>> x86_emulate_instruction
>> 0.96%  qemu-system-x86  [kvm]                 [k] 
>> x86_emulate_instruction
>> 0.89%  qemu-system-x86  [kvm]                 [k] 
>> x86_emulate_instruction
>>
>
> Well, that is probably significant.
>

Please rebase the patch after sse-mmio goes in.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] KVM: emulator: do not needlesly sync registers from emulator ctxt to vcpu
  2011-03-30 11:43         ` Gleb Natapov
  2011-03-30 12:12           ` Avi Kivity
@ 2011-03-30 12:17           ` Avi Kivity
  2011-03-30 12:48             ` Gleb Natapov
  1 sibling, 1 reply; 16+ messages in thread
From: Avi Kivity @ 2011-03-30 12:17 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: mtosatti, kvm

On 03/30/2011 01:43 PM, Gleb Natapov wrote:
> After reboot perf started to work. I ran modified emulator.flat unit
> test. It was modified to run test_cmps() in an endless loop.
>
> Without patch:
> 1.71%  qemu-system-x86  [kvm]                 [k] x86_emulate_instruction
> 1.51%  qemu-system-x86  [kvm]                 [k] x86_emulate_instruction
> 1.68%  qemu-system-x86  [kvm]                 [k] x86_emulate_instruction
>
> With patch:
> 0.84%  qemu-system-x86  [kvm]                 [k] x86_emulate_instruction
> 0.96%  qemu-system-x86  [kvm]                 [k] x86_emulate_instruction
> 0.89%  qemu-system-x86  [kvm]                 [k] x86_emulate_instruction
>

The cause might be kvm_rip_write() using vmwrite.  Can you use perf to 
see where the hits are in x86_emulate_instruction?

If that's the case, we may be able to do local optimizations to 
kvm_rip_write(), kvm_set_rflags(), and toggle_interruptiblity() instead 
of this global change.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] KVM: emulator: do not needlesly sync registers from emulator ctxt to vcpu
  2011-03-30 12:17           ` Avi Kivity
@ 2011-03-30 12:48             ` Gleb Natapov
  2011-03-30 13:26               ` Gleb Natapov
  0 siblings, 1 reply; 16+ messages in thread
From: Gleb Natapov @ 2011-03-30 12:48 UTC (permalink / raw)
  To: Avi Kivity; +Cc: mtosatti, kvm

On Wed, Mar 30, 2011 at 02:17:55PM +0200, Avi Kivity wrote:
> On 03/30/2011 01:43 PM, Gleb Natapov wrote:
> >After reboot perf started to work. I ran modified emulator.flat unit
> >test. It was modified to run test_cmps() in an endless loop.
> >
> >Without patch:
> >1.71%  qemu-system-x86  [kvm]                 [k] x86_emulate_instruction
> >1.51%  qemu-system-x86  [kvm]                 [k] x86_emulate_instruction
> >1.68%  qemu-system-x86  [kvm]                 [k] x86_emulate_instruction
> >
> >With patch:
> >0.84%  qemu-system-x86  [kvm]                 [k] x86_emulate_instruction
> >0.96%  qemu-system-x86  [kvm]                 [k] x86_emulate_instruction
> >0.89%  qemu-system-x86  [kvm]                 [k] x86_emulate_instruction
> >
> 
> The cause might be kvm_rip_write() using vmwrite.  Can you use perf
> to see where the hits are in x86_emulate_instruction?
> 
> If that's the case, we may be able to do local optimizations to
> kvm_rip_write(), kvm_set_rflags(), and toggle_interruptiblity()
> instead of this global change.
> 
I can leave copying there and eliminate only kvm_rip_write and see
perf data.

--
			Gleb.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] KVM: emulator: do not needlesly sync registers from emulator ctxt to vcpu
  2011-03-30 12:48             ` Gleb Natapov
@ 2011-03-30 13:26               ` Gleb Natapov
  2011-03-30 13:29                 ` Avi Kivity
  0 siblings, 1 reply; 16+ messages in thread
From: Gleb Natapov @ 2011-03-30 13:26 UTC (permalink / raw)
  To: Avi Kivity; +Cc: mtosatti, kvm

On Wed, Mar 30, 2011 at 02:48:28PM +0200, Gleb Natapov wrote:
> On Wed, Mar 30, 2011 at 02:17:55PM +0200, Avi Kivity wrote:
> > On 03/30/2011 01:43 PM, Gleb Natapov wrote:
> > >After reboot perf started to work. I ran modified emulator.flat unit
> > >test. It was modified to run test_cmps() in an endless loop.
> > >
> > >Without patch:
> > >1.71%  qemu-system-x86  [kvm]                 [k] x86_emulate_instruction
> > >1.51%  qemu-system-x86  [kvm]                 [k] x86_emulate_instruction
> > >1.68%  qemu-system-x86  [kvm]                 [k] x86_emulate_instruction
> > >
> > >With patch:
> > >0.84%  qemu-system-x86  [kvm]                 [k] x86_emulate_instruction
> > >0.96%  qemu-system-x86  [kvm]                 [k] x86_emulate_instruction
> > >0.89%  qemu-system-x86  [kvm]                 [k] x86_emulate_instruction
> > >
> > 
> > The cause might be kvm_rip_write() using vmwrite.  Can you use perf
> > to see where the hits are in x86_emulate_instruction?
> > 
> > If that's the case, we may be able to do local optimizations to
> > kvm_rip_write(), kvm_set_rflags(), and toggle_interruptiblity()
> > instead of this global change.
> > 
> I can leave copying there and eliminate only kvm_rip_write and see
> perf data.
> 

1.75%  qemu-system-x86  [kvm]                 [k] x86_emulate_instruction
1.60%  qemu-system-x86  [kvm]                 [k] x86_emulate_instruction
1.42%  qemu-system-x86  [kvm]                 [k] x86_emulate_instruction

This is with copy in place, but those are under if (writeback):
                toggle_interruptibility(vcpu,
                                vcpu->arch.emulate_ctxt.interruptibility);
                kvm_set_rflags(vcpu, vcpu->arch.emulate_ctxt.eflags);
                kvm_make_request(KVM_REQ_EVENT, vcpu);
                vcpu->arch.emulate_regs_need_sync_to_vcpu = false;
                kvm_rip_write(vcpu, vcpu->arch.emulate_ctxt.eip);

--
			Gleb.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] KVM: emulator: do not needlesly sync registers from emulator ctxt to vcpu
  2011-03-30 13:26               ` Gleb Natapov
@ 2011-03-30 13:29                 ` Avi Kivity
  2011-03-30 13:36                   ` Gleb Natapov
  0 siblings, 1 reply; 16+ messages in thread
From: Avi Kivity @ 2011-03-30 13:29 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: mtosatti, kvm

On 03/30/2011 03:26 PM, Gleb Natapov wrote:
> On Wed, Mar 30, 2011 at 02:48:28PM +0200, Gleb Natapov wrote:
> >  On Wed, Mar 30, 2011 at 02:17:55PM +0200, Avi Kivity wrote:
> >  >  On 03/30/2011 01:43 PM, Gleb Natapov wrote:
> >  >  >After reboot perf started to work. I ran modified emulator.flat unit
> >  >  >test. It was modified to run test_cmps() in an endless loop.
> >  >  >
> >  >  >Without patch:
> >  >  >1.71%  qemu-system-x86  [kvm]                 [k] x86_emulate_instruction
> >  >  >1.51%  qemu-system-x86  [kvm]                 [k] x86_emulate_instruction
> >  >  >1.68%  qemu-system-x86  [kvm]                 [k] x86_emulate_instruction
> >  >  >
> >  >  >With patch:
> >  >  >0.84%  qemu-system-x86  [kvm]                 [k] x86_emulate_instruction
> >  >  >0.96%  qemu-system-x86  [kvm]                 [k] x86_emulate_instruction
> >  >  >0.89%  qemu-system-x86  [kvm]                 [k] x86_emulate_instruction
> >  >  >
> >  >
> >  >  The cause might be kvm_rip_write() using vmwrite.  Can you use perf
> >  >  to see where the hits are in x86_emulate_instruction?
> >  >
> >  >  If that's the case, we may be able to do local optimizations to
> >  >  kvm_rip_write(), kvm_set_rflags(), and toggle_interruptiblity()
> >  >  instead of this global change.
> >  >
> >  I can leave copying there and eliminate only kvm_rip_write and see
> >  perf data.
> >
>
> 1.75%  qemu-system-x86  [kvm]                 [k] x86_emulate_instruction
> 1.60%  qemu-system-x86  [kvm]                 [k] x86_emulate_instruction
> 1.42%  qemu-system-x86  [kvm]                 [k] x86_emulate_instruction
>
> This is with copy in place, but those are under if (writeback):
>                  toggle_interruptibility(vcpu,
>                                  vcpu->arch.emulate_ctxt.interruptibility);
>                  kvm_set_rflags(vcpu, vcpu->arch.emulate_ctxt.eflags);
>                  kvm_make_request(KVM_REQ_EVENT, vcpu);
>                  vcpu->arch.emulate_regs_need_sync_to_vcpu = false;
>                  kvm_rip_write(vcpu, vcpu->arch.emulate_ctxt.eip);
>

It's wierd.  Do you get perf hits in the copying?

Copying a couple of hot cache lines shouldn't take any measurable time 
compared to a heavyweight exit.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] KVM: emulator: do not needlesly sync registers from emulator ctxt to vcpu
  2011-03-30 13:29                 ` Avi Kivity
@ 2011-03-30 13:36                   ` Gleb Natapov
  2011-03-30 13:41                     ` Avi Kivity
  0 siblings, 1 reply; 16+ messages in thread
From: Gleb Natapov @ 2011-03-30 13:36 UTC (permalink / raw)
  To: Avi Kivity; +Cc: mtosatti, kvm

On Wed, Mar 30, 2011 at 03:29:02PM +0200, Avi Kivity wrote:
> On 03/30/2011 03:26 PM, Gleb Natapov wrote:
> >On Wed, Mar 30, 2011 at 02:48:28PM +0200, Gleb Natapov wrote:
> >>  On Wed, Mar 30, 2011 at 02:17:55PM +0200, Avi Kivity wrote:
> >>  >  On 03/30/2011 01:43 PM, Gleb Natapov wrote:
> >>  >  >After reboot perf started to work. I ran modified emulator.flat unit
> >>  >  >test. It was modified to run test_cmps() in an endless loop.
> >>  >  >
> >>  >  >Without patch:
> >>  >  >1.71%  qemu-system-x86  [kvm]                 [k] x86_emulate_instruction
> >>  >  >1.51%  qemu-system-x86  [kvm]                 [k] x86_emulate_instruction
> >>  >  >1.68%  qemu-system-x86  [kvm]                 [k] x86_emulate_instruction
> >>  >  >
> >>  >  >With patch:
> >>  >  >0.84%  qemu-system-x86  [kvm]                 [k] x86_emulate_instruction
> >>  >  >0.96%  qemu-system-x86  [kvm]                 [k] x86_emulate_instruction
> >>  >  >0.89%  qemu-system-x86  [kvm]                 [k] x86_emulate_instruction
> >>  >  >
> >>  >
> >>  >  The cause might be kvm_rip_write() using vmwrite.  Can you use perf
> >>  >  to see where the hits are in x86_emulate_instruction?
> >>  >
> >>  >  If that's the case, we may be able to do local optimizations to
> >>  >  kvm_rip_write(), kvm_set_rflags(), and toggle_interruptiblity()
> >>  >  instead of this global change.
> >>  >
> >>  I can leave copying there and eliminate only kvm_rip_write and see
> >>  perf data.
> >>
> >
> >1.75%  qemu-system-x86  [kvm]                 [k] x86_emulate_instruction
> >1.60%  qemu-system-x86  [kvm]                 [k] x86_emulate_instruction
> >1.42%  qemu-system-x86  [kvm]                 [k] x86_emulate_instruction
> >
> >This is with copy in place, but those are under if (writeback):
> >                 toggle_interruptibility(vcpu,
> >                                 vcpu->arch.emulate_ctxt.interruptibility);
> >                 kvm_set_rflags(vcpu, vcpu->arch.emulate_ctxt.eflags);
> >                 kvm_make_request(KVM_REQ_EVENT, vcpu);
> >                 vcpu->arch.emulate_regs_need_sync_to_vcpu = false;
> >                 kvm_rip_write(vcpu, vcpu->arch.emulate_ctxt.eip);
> >
> 
> It's wierd.  Do you get perf hits in the copying?
> 
How can I check. The memcpy is inlined.

> Copying a couple of hot cache lines shouldn't take any measurable
> time compared to a heavyweight exit.
> 
The whole function takes only 1.5% CPU. Perf measures how much this
function become faster and heavyweight exit is not part of the function.

--
			Gleb.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] KVM: emulator: do not needlesly sync registers from emulator ctxt to vcpu
  2011-03-30 13:36                   ` Gleb Natapov
@ 2011-03-30 13:41                     ` Avi Kivity
  2011-03-30 13:43                       ` Gleb Natapov
  0 siblings, 1 reply; 16+ messages in thread
From: Avi Kivity @ 2011-03-30 13:41 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: mtosatti, kvm

On 03/30/2011 03:36 PM, Gleb Natapov wrote:
> >
> >  It's wierd.  Do you get perf hits in the copying?
> >
> How can I check. The memcpy is inlined.
>

perf annotate x86_emulate_instruction

(newer perf allows you to get there interactively from 'perf report')

> >  Copying a couple of hot cache lines shouldn't take any measurable
> >  time compared to a heavyweight exit.
> >
> The whole function takes only 1.5% CPU. Perf measures how much this
> function become faster and heavyweight exit is not part of the function.

It's still relative to exit cost.  If the total exit was 2 us, then a 1% 
decrease in cost translates to 40 ns.

(well, that's not unlikely for a 256 byte memcpy, but let's be sure).

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] KVM: emulator: do not needlesly sync registers from emulator ctxt to vcpu
  2011-03-30 13:41                     ` Avi Kivity
@ 2011-03-30 13:43                       ` Gleb Natapov
  2011-03-30 13:46                         ` Avi Kivity
  0 siblings, 1 reply; 16+ messages in thread
From: Gleb Natapov @ 2011-03-30 13:43 UTC (permalink / raw)
  To: Avi Kivity; +Cc: mtosatti, kvm

On Wed, Mar 30, 2011 at 03:41:09PM +0200, Avi Kivity wrote:
> On 03/30/2011 03:36 PM, Gleb Natapov wrote:
> >>
> >>  It's wierd.  Do you get perf hits in the copying?
> >>
> >How can I check. The memcpy is inlined.
> >
> 
> perf annotate x86_emulate_instruction
> 
> (newer perf allows you to get there interactively from 'perf report')
> 
> >>  Copying a couple of hot cache lines shouldn't take any measurable
Ah, forgot about it:

First one:
27.71 :           1179f:       f3 a5                   rep movsl %ds:(%rsi),%es:(%rdi)

Second one:
32.68 :           11888:       f3 a5                   rep movsl %ds:(%rsi),%es:(%rdi)

> >>  time compared to a heavyweight exit.
> >>
> >The whole function takes only 1.5% CPU. Perf measures how much this
> >function become faster and heavyweight exit is not part of the function.
> 
> It's still relative to exit cost.  If the total exit was 2 us, then
> a 1% decrease in cost translates to 40 ns.
> 
> (well, that's not unlikely for a 256 byte memcpy, but let's be sure).
> 
> -- 
> error compiling committee.c: too many arguments to function

--
			Gleb.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] KVM: emulator: do not needlesly sync registers from emulator ctxt to vcpu
  2011-03-30 13:43                       ` Gleb Natapov
@ 2011-03-30 13:46                         ` Avi Kivity
  0 siblings, 0 replies; 16+ messages in thread
From: Avi Kivity @ 2011-03-30 13:46 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: mtosatti, kvm

On 03/30/2011 03:43 PM, Gleb Natapov wrote:
> On Wed, Mar 30, 2011 at 03:41:09PM +0200, Avi Kivity wrote:
> >  On 03/30/2011 03:36 PM, Gleb Natapov wrote:
> >  >>
> >  >>   It's wierd.  Do you get perf hits in the copying?
> >  >>
> >  >How can I check. The memcpy is inlined.
> >  >
> >
> >  perf annotate x86_emulate_instruction
> >
> >  (newer perf allows you to get there interactively from 'perf report')
> >
> >  >>   Copying a couple of hot cache lines shouldn't take any measurable
> Ah, forgot about it:
>
> First one:
> 27.71 :           1179f:       f3 a5                   rep movsl %ds:(%rsi),%es:(%rdi)
>
> Second one:
> 32.68 :           11888:       f3 a5                   rep movsl %ds:(%rsi),%es:(%rdi)
>

Okay, I'm convinced.

(looks like a missed optimization, should use rep movsq there)

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2011-03-30 13:47 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-03-29 12:08 [PATCH] KVM: emulator: do not needlesly sync registers from emulator ctxt to vcpu Gleb Natapov
2011-03-30 10:38 ` Avi Kivity
2011-03-30 10:47   ` Gleb Natapov
2011-03-30 10:50     ` Avi Kivity
2011-03-30 11:22       ` Gleb Natapov
2011-03-30 11:43         ` Gleb Natapov
2011-03-30 12:12           ` Avi Kivity
2011-03-30 12:12             ` Avi Kivity
2011-03-30 12:17           ` Avi Kivity
2011-03-30 12:48             ` Gleb Natapov
2011-03-30 13:26               ` Gleb Natapov
2011-03-30 13:29                 ` Avi Kivity
2011-03-30 13:36                   ` Gleb Natapov
2011-03-30 13:41                     ` Avi Kivity
2011-03-30 13:43                       ` Gleb Natapov
2011-03-30 13:46                         ` Avi Kivity

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox