* Re: [RFC] fix VMX TSC synchronicity @ 2008-01-12 3:11 Will Trives 2008-01-12 12:28 ` Marcelo Tosatti 0 siblings, 1 reply; 28+ messages in thread From: Will Trives @ 2008-01-12 3:11 UTC (permalink / raw) To: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f Hello, Just thought i'd mention that this patch caused disk corruption in my virtual machines, particularly when using scsi emulation. I got few errors like this as well: lsi_scsi: error: ORDERED queue not implemented lsi_scsi: error: IO with unknown tag 65577 Regards, Will ------------------------------------------------------------------------- Check out the new SourceForge.net Marketplace. It's the best place to buy or sell services for just about anything Open Source. http://ad.doubleclick.net/clk;164216239;13503038;w?http://sf.net/marketplace ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC] fix VMX TSC synchronicity 2008-01-12 3:11 [RFC] fix VMX TSC synchronicity Will Trives @ 2008-01-12 12:28 ` Marcelo Tosatti 2008-01-12 13:48 ` Will Trives 0 siblings, 1 reply; 28+ messages in thread From: Marcelo Tosatti @ 2008-01-12 12:28 UTC (permalink / raw) To: Will Trives; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f On Sat, Jan 12, 2008 at 02:11:30PM +1100, Will Trives wrote: > Hello, > > Just thought i'd mention that this patch caused disk corruption in my > virtual machines, particularly when using scsi emulation. > > I got few errors like this as well: > > lsi_scsi: error: ORDERED queue not implemented > lsi_scsi: error: IO with unknown tag 65577 Hi Will, Can you please be more descriptive of your environment? What guest OS ? How many CPU's do you have assigned to it ? Are you sure you're talking about the "VMX TSC" patch in the Subject: line? Thanks ------------------------------------------------------------------------- Check out the new SourceForge.net Marketplace. It's the best place to buy or sell services for just about anything Open Source. http://ad.doubleclick.net/clk;164216239;13503038;w?http://sf.net/marketplace ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC] fix VMX TSC synchronicity 2008-01-12 12:28 ` Marcelo Tosatti @ 2008-01-12 13:48 ` Will Trives 2008-01-12 20:51 ` Avi Kivity 0 siblings, 1 reply; 28+ messages in thread From: Will Trives @ 2008-01-12 13:48 UTC (permalink / raw) To: Marcelo Tosatti; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f Hello, 64 bit environment Intel Q6600 processor. I did hand apply the patch so I might have got something wrong. Guest was a 32 bit kubuntu 7.10. Basically I just mounted a few disk images with " -drive file=/whatever,if=scsi " and used gparted to copy a partition from one to the other. It's probably something I have done wrong, i'll test it again in a few more days. Regards, Will On Sat, 2008-01-12 at 10:28 -0200, Marcelo Tosatti wrote: > On Sat, Jan 12, 2008 at 02:11:30PM +1100, Will Trives wrote: > > Hello, > > > > Just thought i'd mention that this patch caused disk corruption in my > > virtual machines, particularly when using scsi emulation. > > > > I got few errors like this as well: > > > > lsi_scsi: error: ORDERED queue not implemented > > lsi_scsi: error: IO with unknown tag 65577 > > Hi Will, > > Can you please be more descriptive of your environment? What guest OS ? > How many CPU's do you have assigned to it ? > > Are you sure you're talking about the "VMX TSC" patch in the Subject: > line? > > Thanks ------------------------------------------------------------------------- Check out the new SourceForge.net Marketplace. It's the best place to buy or sell services for just about anything Open Source. http://ad.doubleclick.net/clk;164216239;13503038;w?http://sf.net/marketplace ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC] fix VMX TSC synchronicity 2008-01-12 13:48 ` Will Trives @ 2008-01-12 20:51 ` Avi Kivity [not found] ` <4789284E.7000907-atKUWr5tajBWk0Htik3J/w@public.gmane.org> 0 siblings, 1 reply; 28+ messages in thread From: Avi Kivity @ 2008-01-12 20:51 UTC (permalink / raw) To: Will Trives; +Cc: Marcelo Tosatti, kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f Will Trives wrote: > Hello, > > 64 bit environment Intel Q6600 processor. > > I did hand apply the patch so I might have got something wrong. > > Guest was a 32 bit kubuntu 7.10. > > Basically I just mounted a few disk images with " -drive > file=/whatever,if=scsi " and used gparted to copy a partition from one > to the other. > > It's probably something I have done wrong, i'll test it again in a few > more days. > > I've trashed more than one image by running multiple guests with the same image, can this be the case here? -- Any sufficiently difficult bug is indistinguishable from a feature. ------------------------------------------------------------------------- Check out the new SourceForge.net Marketplace. It's the best place to buy or sell services for just about anything Open Source. http://ad.doubleclick.net/clk;164216239;13503038;w?http://sf.net/marketplace ^ permalink raw reply [flat|nested] 28+ messages in thread
[parent not found: <4789284E.7000907-atKUWr5tajBWk0Htik3J/w@public.gmane.org>]
* Re: [PATCH] kvm: qemu: remove SIGUSR2 from io_sigset [not found] ` <4789284E.7000907-atKUWr5tajBWk0Htik3J/w@public.gmane.org> @ 2008-01-19 18:58 ` Will Trives 2008-01-20 9:34 ` Avi Kivity 0 siblings, 1 reply; 28+ messages in thread From: Will Trives @ 2008-01-19 18:58 UTC (permalink / raw) To: Avi Kivity; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f Hello, This patch : kvm: qemu: remove SIGUSR2 from io_sigset Applied 15th Jan 2008 looks to kill I/O performance for me. As soon as I revert it my VMs are much faster. I don't really use SMP for my guests, perhaps this patch made uniprocessor VMs slower ? Regards, Will Trives ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] kvm: qemu: remove SIGUSR2 from io_sigset 2008-01-19 18:58 ` [PATCH] kvm: qemu: remove SIGUSR2 from io_sigset Will Trives @ 2008-01-20 9:34 ` Avi Kivity [not found] ` <4793158A.2080509-atKUWr5tajBWk0Htik3J/w@public.gmane.org> 0 siblings, 1 reply; 28+ messages in thread From: Avi Kivity @ 2008-01-20 9:34 UTC (permalink / raw) To: Will Trives; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f Will Trives wrote: > Hello, > > This patch : > > kvm: qemu: remove SIGUSR2 from io_sigset > > Applied 15th Jan 2008 looks to kill I/O performance for me. As soon as I > revert it my VMs are much faster. I don't really use SMP for my guests, > perhaps this patch made uniprocessor VMs slower ? > > > How are you measuring performance? -- error compiling committee.c: too many arguments to function ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ ^ permalink raw reply [flat|nested] 28+ messages in thread
[parent not found: <4793158A.2080509-atKUWr5tajBWk0Htik3J/w@public.gmane.org>]
* Re: [PATCH] kvm: qemu: remove SIGUSR2 from io_sigset [not found] ` <4793158A.2080509-atKUWr5tajBWk0Htik3J/w@public.gmane.org> @ 2008-01-20 10:45 ` Avi Kivity 0 siblings, 0 replies; 28+ messages in thread From: Avi Kivity @ 2008-01-20 10:45 UTC (permalink / raw) To: Will Trives; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f Avi Kivity wrote: > Will Trives wrote: >> Hello, >> >> This patch : >> >> kvm: qemu: remove SIGUSR2 from io_sigset >> >> Applied 15th Jan 2008 looks to kill I/O performance for me. As soon as I >> revert it my VMs are much faster. I don't really use SMP for my guests, >> perhaps this patch made uniprocessor VMs slower ? >> >> >> > > How are you measuring performance? > Never mind, fio shows it. Will investigate. -- error compiling committee.c: too many arguments to function ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ ^ permalink raw reply [flat|nested] 28+ messages in thread
* [RFC] fix VMX TSC synchronicity @ 2008-01-11 20:49 Marcelo Tosatti 2008-01-13 12:19 ` Avi Kivity 0 siblings, 1 reply; 28+ messages in thread From: Marcelo Tosatti @ 2008-01-11 20:49 UTC (permalink / raw) To: kvm-devel The boot TSC sync check is failing on recent Linux SMP guests on TSC stable hosts. Following patch attempts to address the problems, which are: 1) APIC_DM_STARTUP, which is only called for vcpu's other than vcpu0, will trigger ->vcpu_reset(), setting the TSC to 0. Fix that by moving the guest_write_tsc(0) to vcpu_load. 2) vcpu's are initialized at different times by QEMU (vcpu0 init happens way earlier than the rest). Fix that by initializing the TSC of vcpu's > 0 with reference to vcpu0 init tsc value. This way TSC synchronization is kept (apparently Xen does something similar). 3) The code which adjusts the TSC of a VCPU on physical CPU switch is meant to guarantee that the guest sees a monotonically increasing value. However there is a large gap, in terms of clocks, between the time which the source CPU TSC is read and the time the destination CPU TSC is read. So move those two reads to happen at vcpu_clear. I believe that 3) is the reason for the -no-kvm-irqchip problems reported by Avi on RHEL5, with kernels < 2.6.21 (where vcpu->cpu pinning would fix the problem). Unfortunately I could reproduce that problem. 4-way guest with constant tick at 250Hz now reliably sees the TSC's synchronized, and idle guest CPU consumption is reduced by 50% (3-4% instead of 8%, the latter with acpi_pm_good boot parameter). diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 4741806..e1c8cf4 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -47,6 +47,8 @@ struct vcpu_vmx { struct kvm_vcpu vcpu; int launched; u8 fail; + u64 first_tsc; + u64 tsc_this; u32 idt_vectoring_info; struct kvm_msr_entry *guest_msrs; struct kvm_msr_entry *host_msrs; @@ -254,6 +256,7 @@ static void vcpu_clear(struct vcpu_vmx *vmx) if (vmx->vcpu.cpu == -1) return; smp_call_function_single(vmx->vcpu.cpu, __vcpu_clear, vmx, 0, 1); + rdtscll(vmx->tsc_this); vmx->launched = 0; } @@ -480,6 +483,8 @@ static void vmx_load_host_state(struct vcpu_vmx *vmx) reload_host_efer(vmx); } +static void guest_write_tsc(u64 host_tsc, u64 guest_tsc); + /* * Switches to specified vcpu, until a matching vcpu_put(), but assumes * vcpu mutex is already taken. @@ -488,7 +493,7 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu) { struct vcpu_vmx *vmx = to_vmx(vcpu); u64 phys_addr = __pa(vmx->vmcs); - u64 tsc_this, delta; + u64 delta; if (vcpu->cpu != cpu) { vcpu_clear(vmx); @@ -511,6 +516,19 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu) struct descriptor_table dt; unsigned long sysenter_esp; + if (unlikely(vcpu->cpu == -1)) { + rdtscll(vcpu->arch.host_tsc); + vmx->tsc_this = vcpu->arch.host_tsc; + if (vcpu->vcpu_id == 0) { + guest_write_tsc(vcpu->arch.host_tsc, 0); + vmx->first_tsc = vcpu->arch.host_tsc; + } else { + struct vcpu_vmx *cpu0; + cpu0 = to_vmx(vcpu->kvm->vcpus[0]); + guest_write_tsc(cpu0->first_tsc, 0); + } + } + vcpu->cpu = cpu; /* * Linux uses per-cpu TSS and GDT, so set these when switching @@ -526,8 +544,7 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu) /* * Make sure the time stamp counter is monotonous. */ - rdtscll(tsc_this); - delta = vcpu->arch.host_tsc - tsc_this; + delta = vcpu->arch.host_tsc - vmx->tsc_this; vmcs_write64(TSC_OFFSET, vmcs_read64(TSC_OFFSET) + delta); } } @@ -690,11 +707,8 @@ static u64 guest_read_tsc(void) * writes 'guest_tsc' into guest's timestamp counter "register" * guest_tsc = host_tsc + tsc_offset ==> tsc_offset = guest_tsc - host_tsc */ -static void guest_write_tsc(u64 guest_tsc) +static void guest_write_tsc(u64 host_tsc, u64 guest_tsc) { - u64 host_tsc; - - rdtscll(host_tsc); vmcs_write64(TSC_OFFSET, guest_tsc - host_tsc); } @@ -758,6 +772,7 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data) { struct vcpu_vmx *vmx = to_vmx(vcpu); struct kvm_msr_entry *msr; + u64 host_tsc; int ret = 0; switch (msr_index) { @@ -786,7 +801,8 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data) vmcs_writel(GUEST_SYSENTER_ESP, data); break; case MSR_IA32_TIME_STAMP_COUNTER: - guest_write_tsc(data); + rdtscll(host_tsc); + guest_write_tsc(host_tsc, data); break; default: msr = find_msr_entry(vmx, msr_index); @@ -1684,7 +1700,6 @@ static int vmx_vcpu_reset(struct kvm_vcpu *vcpu) vmcs_write32(GUEST_INTERRUPTIBILITY_INFO, 0); vmcs_write32(GUEST_PENDING_DBG_EXCEPTIONS, 0); - guest_write_tsc(0); /* Special registers */ vmcs_write64(GUEST_IA32_DEBUGCTL, 0); ------------------------------------------------------------------------- Check out the new SourceForge.net Marketplace. It's the best place to buy or sell services for just about anything Open Source. http://ad.doubleclick.net/clk;164216239;13503038;w?http://sf.net/marketplace ^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [RFC] fix VMX TSC synchronicity 2008-01-11 20:49 [RFC] fix VMX TSC synchronicity Marcelo Tosatti @ 2008-01-13 12:19 ` Avi Kivity [not found] ` <478A01D1.7000402-atKUWr5tajBWk0Htik3J/w@public.gmane.org> 0 siblings, 1 reply; 28+ messages in thread From: Avi Kivity @ 2008-01-13 12:19 UTC (permalink / raw) To: Marcelo Tosatti; +Cc: kvm-devel Marcelo Tosatti wrote: > The boot TSC sync check is failing on recent Linux SMP guests on TSC > stable hosts. > > What about tsc unstable hosts? If your patch convinces the guest its tsc is table, while the host tsc is not, then it may cause confusion later on. > Following patch attempts to address the problems, which are: > > 1) APIC_DM_STARTUP, which is only called for vcpu's other than vcpu0, > will trigger ->vcpu_reset(), setting the TSC to 0. Fix that by moving > the guest_write_tsc(0) to vcpu_load. > > 2) vcpu's are initialized at different times by QEMU (vcpu0 init happens > way earlier than the rest). Fix that by initializing the TSC of vcpu's > > 0 with reference to vcpu0 init tsc value. This way TSC synchronization > is kept (apparently Xen does something similar). > > 3) The code which adjusts the TSC of a VCPU on physical CPU switch is > meant to guarantee that the guest sees a monotonically increasing value. > However there is a large gap, in terms of clocks, between the time which > the source CPU TSC is read and the time the destination CPU TSC is read. > So move those two reads to happen at vcpu_clear. > > I believe that 3) is the reason for the -no-kvm-irqchip problems > reported by Avi on RHEL5, with kernels < 2.6.21 (where vcpu->cpu pinning > would fix the problem). Unfortunately I could reproduce that problem. > > 4-way guest with constant tick at 250Hz now reliably sees the TSC's > synchronized, and idle guest CPU consumption is reduced by 50% (3-4% > instead of 8%, the latter with acpi_pm_good boot parameter). > > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c > index 4741806..e1c8cf4 100644 > --- a/arch/x86/kvm/vmx.c > +++ b/arch/x86/kvm/vmx.c > @@ -47,6 +47,8 @@ struct vcpu_vmx { > struct kvm_vcpu vcpu; > int launched; > u8 fail; > + u64 first_tsc; > + u64 tsc_this; > u32 idt_vectoring_info; > struct kvm_msr_entry *guest_msrs; > struct kvm_msr_entry *host_msrs; > @@ -254,6 +256,7 @@ static void vcpu_clear(struct vcpu_vmx *vmx) > if (vmx->vcpu.cpu == -1) > return; > smp_call_function_single(vmx->vcpu.cpu, __vcpu_clear, vmx, 0, 1); > + rdtscll(vmx->tsc_this); > vmx->launched = 0; > } > > @@ -480,6 +483,8 @@ static void vmx_load_host_state(struct vcpu_vmx *vmx) > reload_host_efer(vmx); > } > > +static void guest_write_tsc(u64 host_tsc, u64 guest_tsc); > + > /* > * Switches to specified vcpu, until a matching vcpu_put(), but assumes > * vcpu mutex is already taken. > @@ -488,7 +493,7 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu) > { > struct vcpu_vmx *vmx = to_vmx(vcpu); > u64 phys_addr = __pa(vmx->vmcs); > - u64 tsc_this, delta; > + u64 delta; > > if (vcpu->cpu != cpu) { > vcpu_clear(vmx); > @@ -511,6 +516,19 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu) > struct descriptor_table dt; > unsigned long sysenter_esp; > > + if (unlikely(vcpu->cpu == -1)) { > This can happen after migration, I believe. > + rdtscll(vcpu->arch.host_tsc); > + vmx->tsc_this = vcpu->arch.host_tsc; > + if (vcpu->vcpu_id == 0) { > + guest_write_tsc(vcpu->arch.host_tsc, 0); > + vmx->first_tsc = vcpu->arch.host_tsc; > + } else { > + struct vcpu_vmx *cpu0; > + cpu0 = to_vmx(vcpu->kvm->vcpus[0]); > + guest_write_tsc(cpu0->first_tsc, 0); > + } > + } > + > Depending on vcpu_id == 0 can cause problems (for example, if vcpu 0 is somehow not the first to run). We might initialize the tsc base on vm creation, and if the host tsc is synced, then the guest tsc should also be. > vcpu->cpu = cpu; > /* > * Linux uses per-cpu TSS and GDT, so set these when switching > @@ -526,8 +544,7 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu) > /* > * Make sure the time stamp counter is monotonous. > */ > - rdtscll(tsc_this); > - delta = vcpu->arch.host_tsc - tsc_this; > + delta = vcpu->arch.host_tsc - vmx->tsc_this; > vmcs_write64(TSC_OFFSET, vmcs_read64(TSC_OFFSET) + delta); > This is a little roundabout, how about moving the delta calculation immediately after the call to vcpu_clear()? I don't think this is the cause of the problem, it can't account for more than a few hundred cycles, compared to the much greater vmentry cost. Anyway it should be in a separate patch. -- error compiling committee.c: too many arguments to function ------------------------------------------------------------------------- Check out the new SourceForge.net Marketplace. It's the best place to buy or sell services for just about anything Open Source. http://ad.doubleclick.net/clk;164216239;13503038;w?http://sf.net/marketplace ^ permalink raw reply [flat|nested] 28+ messages in thread
[parent not found: <478A01D1.7000402-atKUWr5tajBWk0Htik3J/w@public.gmane.org>]
* Re: [RFC] fix VMX TSC synchronicity [not found] ` <478A01D1.7000402-atKUWr5tajBWk0Htik3J/w@public.gmane.org> @ 2008-01-14 16:06 ` Marcelo Tosatti 2008-01-14 20:46 ` Marcelo Tosatti 2008-01-15 14:33 ` Avi Kivity 0 siblings, 2 replies; 28+ messages in thread From: Marcelo Tosatti @ 2008-01-14 16:06 UTC (permalink / raw) To: Avi Kivity; +Cc: Marcelo Tosatti, kvm-devel Hi Avi, On Sun, Jan 13, 2008 at 02:19:29PM +0200, Avi Kivity wrote: > Marcelo Tosatti wrote: > >The boot TSC sync check is failing on recent Linux SMP guests on TSC > >stable hosts. > > > > > > What about tsc unstable hosts? If your patch convinces the guest its > tsc is table, while the host tsc is not, then it may cause confusion > later on. The adjustment to zero won't fool the guest because it assumes that the TSC's are synchronized. It will simply set the guest TSC to zero on all VCPUs based on the time VCPU0 was initialized. That is, setting -(vcpu[0].first_tsc) on all VCPU's will not correct any synchronization problem. > >Following patch attempts to address the problems, which are: > > > >1) APIC_DM_STARTUP, which is only called for vcpu's other than vcpu0, > >will trigger ->vcpu_reset(), setting the TSC to 0. Fix that by moving > >the guest_write_tsc(0) to vcpu_load. > > > >2) vcpu's are initialized at different times by QEMU (vcpu0 init happens > >way earlier than the rest). Fix that by initializing the TSC of vcpu's > > >0 with reference to vcpu0 init tsc value. This way TSC synchronization > >is kept (apparently Xen does something similar). > > > >3) The code which adjusts the TSC of a VCPU on physical CPU switch is > >meant to guarantee that the guest sees a monotonically increasing value. > >However there is a large gap, in terms of clocks, between the time which > >the source CPU TSC is read and the time the destination CPU TSC is read. > >So move those two reads to happen at vcpu_clear. > > > >I believe that 3) is the reason for the -no-kvm-irqchip problems > >reported by Avi on RHEL5, with kernels < 2.6.21 (where vcpu->cpu pinning > >would fix the problem). Unfortunately I could reproduce that problem. > > > >4-way guest with constant tick at 250Hz now reliably sees the TSC's > >synchronized, and idle guest CPU consumption is reduced by 50% (3-4% > >instead of 8%, the latter with acpi_pm_good boot parameter). > > > > > >diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c > >index 4741806..e1c8cf4 100644 > >--- a/arch/x86/kvm/vmx.c > >+++ b/arch/x86/kvm/vmx.c > >@@ -47,6 +47,8 @@ struct vcpu_vmx { > > struct kvm_vcpu vcpu; > > int launched; > > u8 fail; > >+ u64 first_tsc; > >+ u64 tsc_this; > > u32 idt_vectoring_info; > > struct kvm_msr_entry *guest_msrs; > > struct kvm_msr_entry *host_msrs; > >@@ -254,6 +256,7 @@ static void vcpu_clear(struct vcpu_vmx *vmx) > > if (vmx->vcpu.cpu == -1) > > return; > > smp_call_function_single(vmx->vcpu.cpu, __vcpu_clear, vmx, 0, 1); > >+ rdtscll(vmx->tsc_this); > > vmx->launched = 0; > > } > > > >@@ -480,6 +483,8 @@ static void vmx_load_host_state(struct vcpu_vmx *vmx) > > reload_host_efer(vmx); > > } > > > >+static void guest_write_tsc(u64 host_tsc, u64 guest_tsc); > >+ > > /* > > * Switches to specified vcpu, until a matching vcpu_put(), but assumes > > * vcpu mutex is already taken. > >@@ -488,7 +493,7 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int > >cpu) > > { > > struct vcpu_vmx *vmx = to_vmx(vcpu); > > u64 phys_addr = __pa(vmx->vmcs); > >- u64 tsc_this, delta; > >+ u64 delta; > > > > if (vcpu->cpu != cpu) { > > vcpu_clear(vmx); > >@@ -511,6 +516,19 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int > >cpu) > > struct descriptor_table dt; > > unsigned long sysenter_esp; > > > >+ if (unlikely(vcpu->cpu == -1)) { > > > > This can happen after migration, I believe. Right now the destination host will do ->vcpu_reset() on all VCPU's on startup... so its already broken. This patch is not introducing any issues, just changing where it happens :) Perhaps fixing migration should be the subject of a separate patch? > >+ rdtscll(vcpu->arch.host_tsc); > >+ vmx->tsc_this = vcpu->arch.host_tsc; > >+ if (vcpu->vcpu_id == 0) { > >+ guest_write_tsc(vcpu->arch.host_tsc, 0); > >+ vmx->first_tsc = vcpu->arch.host_tsc; > >+ } else { > >+ struct vcpu_vmx *cpu0; > >+ cpu0 = to_vmx(vcpu->kvm->vcpus[0]); > >+ guest_write_tsc(cpu0->first_tsc, 0); > >+ } > >+ } > >+ > > > > Depending on vcpu_id == 0 can cause problems (for example, if vcpu 0 is > somehow not the first to run). But QEMU will always run kvm_create() first (which does kvm_create_vcpu(0)), then start the other threads later. So I don't see how that can happen. > We might initialize the tsc base on vm creation, and if the host tsc is > synced, then the guest tsc should also be. > > > vcpu->cpu = cpu; > > /* > > * Linux uses per-cpu TSS and GDT, so set these when > > switching > >@@ -526,8 +544,7 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int > >cpu) > > /* > > * Make sure the time stamp counter is monotonous. > > */ > >- rdtscll(tsc_this); > >- delta = vcpu->arch.host_tsc - tsc_this; > >+ delta = vcpu->arch.host_tsc - vmx->tsc_this; > > vmcs_write64(TSC_OFFSET, vmcs_read64(TSC_OFFSET) + delta); > > > > This is a little roundabout, how about moving the delta calculation > immediately after the call to vcpu_clear()? > > I don't think this is the cause of the problem, it can't account for > more than a few hundred cycles, compared to the much greater vmentry cost. Actually it accounts for several thousand cycles warp by the time the kernel checks the TSC synchronization between CPU's, which happens very early on boot. You saw the hang on userspace init, by then there could be many VCPU->CPU switchs. > > Anyway it should be in a separate patch. How does the following look (minus proper changelog): Index: kvm.quilt/arch/x86/kvm/vmx.c =================================================================== --- kvm.quilt.orig/arch/x86/kvm/vmx.c +++ kvm.quilt/arch/x86/kvm/vmx.c @@ -47,6 +47,7 @@ struct vcpu_vmx { struct kvm_vcpu vcpu; int launched; u8 fail; + u64 first_tsc; u32 idt_vectoring_info; struct kvm_msr_entry *guest_msrs; struct kvm_msr_entry *host_msrs; @@ -480,6 +481,7 @@ static void vmx_load_host_state(struct v reload_host_efer(vmx); } +static void guest_write_tsc(u64 host_tsc, u64 guest_tsc); /* * Switches to specified vcpu, until a matching vcpu_put(), but assumes * vcpu mutex is already taken. @@ -511,6 +513,18 @@ static void vmx_vcpu_load(struct kvm_vcp struct descriptor_table dt; unsigned long sysenter_esp; + if (unlikely(vcpu->cpu == -1)) { + rdtscll(vcpu->arch.host_tsc); + if (vcpu->vcpu_id == 0) { + guest_write_tsc(vcpu->arch.host_tsc, 0); + vmx->first_tsc = vcpu->arch.host_tsc; + } else { + struct vcpu_vmx *cpu0; + cpu0 = to_vmx(vcpu->kvm->vcpus[0]); + guest_write_tsc(cpu0->first_tsc, 0); + } + } + vcpu->cpu = cpu; /* * Linux uses per-cpu TSS and GDT, so set these when switching @@ -690,11 +704,8 @@ static u64 guest_read_tsc(void) * writes 'guest_tsc' into guest's timestamp counter "register" * guest_tsc = host_tsc + tsc_offset ==> tsc_offset = guest_tsc - host_tsc */ -static void guest_write_tsc(u64 guest_tsc) +static void guest_write_tsc(u64 host_tsc, u64 guest_tsc) { - u64 host_tsc; - - rdtscll(host_tsc); vmcs_write64(TSC_OFFSET, guest_tsc - host_tsc); } @@ -758,6 +769,7 @@ static int vmx_set_msr(struct kvm_vcpu * { struct vcpu_vmx *vmx = to_vmx(vcpu); struct kvm_msr_entry *msr; + u64 host_tsc; int ret = 0; switch (msr_index) { @@ -786,7 +798,8 @@ static int vmx_set_msr(struct kvm_vcpu * vmcs_writel(GUEST_SYSENTER_ESP, data); break; case MSR_IA32_TIME_STAMP_COUNTER: - guest_write_tsc(data); + rdtscll(host_tsc); + guest_write_tsc(host_tsc, data); break; default: msr = find_msr_entry(vmx, msr_index); @@ -1684,8 +1697,6 @@ static int vmx_vcpu_reset(struct kvm_vcp vmcs_write32(GUEST_INTERRUPTIBILITY_INFO, 0); vmcs_write32(GUEST_PENDING_DBG_EXCEPTIONS, 0); - guest_write_tsc(0); - /* Special registers */ vmcs_write64(GUEST_IA32_DEBUGCTL, 0); Index: kvm.quilt/arch/x86/kvm/vmx.c =================================================================== --- kvm.quilt.orig/arch/x86/kvm/vmx.c +++ kvm.quilt/arch/x86/kvm/vmx.c @@ -238,26 +238,6 @@ static void vmcs_clear(struct vmcs *vmcs vmcs, phys_addr); } -static void __vcpu_clear(void *arg) -{ - struct vcpu_vmx *vmx = arg; - int cpu = raw_smp_processor_id(); - - if (vmx->vcpu.cpu == cpu) - vmcs_clear(vmx->vmcs); - if (per_cpu(current_vmcs, cpu) == vmx->vmcs) - per_cpu(current_vmcs, cpu) = NULL; - rdtscll(vmx->vcpu.arch.host_tsc); -} - -static void vcpu_clear(struct vcpu_vmx *vmx) -{ - if (vmx->vcpu.cpu == -1) - return; - smp_call_function_single(vmx->vcpu.cpu, __vcpu_clear, vmx, 0, 1); - vmx->launched = 0; -} - static unsigned long vmcs_readl(unsigned long field) { unsigned long value; @@ -348,6 +328,34 @@ static void update_exception_bitmap(stru vmcs_write32(EXCEPTION_BITMAP, eb); } +static void __vcpu_clear(void *arg) +{ + struct vcpu_vmx *vmx = arg; + int cpu = raw_smp_processor_id(); + + if (vmx->vcpu.cpu == cpu) + vmcs_clear(vmx->vmcs); + if (per_cpu(current_vmcs, cpu) == vmx->vmcs) + per_cpu(current_vmcs, cpu) = NULL; + rdtscll(vmx->vcpu.arch.host_tsc); +} + +static void vcpu_clear(struct vcpu_vmx *vmx) +{ + u64 tsc_this, delta; + + if (vmx->vcpu.cpu == -1) + return; + smp_call_function_single(vmx->vcpu.cpu, __vcpu_clear, vmx, 0, 1); + /* + * Make sure the time stamp counter is monotonous. + */ + rdtscll(tsc_this); + delta = vmx->vcpu.arch.host_tsc - tsc_this; + vmcs_write64(TSC_OFFSET, vmcs_read64(TSC_OFFSET) + delta); + vmx->launched = 0; +} + static void reload_tss(void) { #ifndef CONFIG_X86_64 @@ -490,7 +498,6 @@ static void vmx_vcpu_load(struct kvm_vcp { struct vcpu_vmx *vmx = to_vmx(vcpu); u64 phys_addr = __pa(vmx->vmcs); - u64 tsc_this, delta; if (vcpu->cpu != cpu) { vcpu_clear(vmx); @@ -536,13 +543,6 @@ static void vmx_vcpu_load(struct kvm_vcp rdmsrl(MSR_IA32_SYSENTER_ESP, sysenter_esp); vmcs_writel(HOST_IA32_SYSENTER_ESP, sysenter_esp); /* 22.2.3 */ - - /* - * Make sure the time stamp counter is monotonous. - */ - rdtscll(tsc_this); - delta = vcpu->arch.host_tsc - tsc_this; - vmcs_write64(TSC_OFFSET, vmcs_read64(TSC_OFFSET) + delta); } } ------------------------------------------------------------------------- Check out the new SourceForge.net Marketplace. It's the best place to buy or sell services for just about anything Open Source. http://ad.doubleclick.net/clk;164216239;13503038;w?http://sf.net/marketplace ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC] fix VMX TSC synchronicity 2008-01-14 16:06 ` Marcelo Tosatti @ 2008-01-14 20:46 ` Marcelo Tosatti 2008-01-15 14:59 ` Avi Kivity 2008-01-15 14:33 ` Avi Kivity 1 sibling, 1 reply; 28+ messages in thread From: Marcelo Tosatti @ 2008-01-14 20:46 UTC (permalink / raw) To: Avi Kivity; +Cc: kvm-devel On Mon, Jan 14, 2008 at 02:06:47PM -0200, Marcelo Tosatti wrote: > Hi Avi, > > On Sun, Jan 13, 2008 at 02:19:29PM +0200, Avi Kivity wrote: > > Marcelo Tosatti wrote: > > >The boot TSC sync check is failing on recent Linux SMP guests on TSC > > >stable hosts. > > > > > > > > > > What about tsc unstable hosts? If your patch convinces the guest its > > tsc is table, while the host tsc is not, then it may cause confusion > > later on. > > The adjustment to zero won't fool the guest because it assumes that the > TSC's are synchronized. It will simply set the guest TSC to zero on all > VCPUs based on the time VCPU0 was initialized. > > That is, setting -(vcpu[0].first_tsc) on all VCPU's will not correct any > synchronization problem. > > > >Following patch attempts to address the problems, which are: > > > > > >1) APIC_DM_STARTUP, which is only called for vcpu's other than vcpu0, > > >will trigger ->vcpu_reset(), setting the TSC to 0. Fix that by moving > > >the guest_write_tsc(0) to vcpu_load. > > > > > >2) vcpu's are initialized at different times by QEMU (vcpu0 init happens > > >way earlier than the rest). Fix that by initializing the TSC of vcpu's > > > >0 with reference to vcpu0 init tsc value. This way TSC synchronization > > >is kept (apparently Xen does something similar). > > > > > >3) The code which adjusts the TSC of a VCPU on physical CPU switch is > > >meant to guarantee that the guest sees a monotonically increasing value. > > >However there is a large gap, in terms of clocks, between the time which > > >the source CPU TSC is read and the time the destination CPU TSC is read. > > >So move those two reads to happen at vcpu_clear. > > > > > >I believe that 3) is the reason for the -no-kvm-irqchip problems > > >reported by Avi on RHEL5, with kernels < 2.6.21 (where vcpu->cpu pinning > > >would fix the problem). Unfortunately I could reproduce that problem. > > > > > >4-way guest with constant tick at 250Hz now reliably sees the TSC's > > >synchronized, and idle guest CPU consumption is reduced by 50% (3-4% > > >instead of 8%, the latter with acpi_pm_good boot parameter). > > > > > > > > >diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c > > >index 4741806..e1c8cf4 100644 > > >--- a/arch/x86/kvm/vmx.c > > >+++ b/arch/x86/kvm/vmx.c > > >@@ -47,6 +47,8 @@ struct vcpu_vmx { > > > struct kvm_vcpu vcpu; > > > int launched; > > > u8 fail; > > >+ u64 first_tsc; > > >+ u64 tsc_this; > > > u32 idt_vectoring_info; > > > struct kvm_msr_entry *guest_msrs; > > > struct kvm_msr_entry *host_msrs; > > >@@ -254,6 +256,7 @@ static void vcpu_clear(struct vcpu_vmx *vmx) > > > if (vmx->vcpu.cpu == -1) > > > return; > > > smp_call_function_single(vmx->vcpu.cpu, __vcpu_clear, vmx, 0, 1); > > >+ rdtscll(vmx->tsc_this); > > > vmx->launched = 0; > > > } > > > > > >@@ -480,6 +483,8 @@ static void vmx_load_host_state(struct vcpu_vmx *vmx) > > > reload_host_efer(vmx); > > > } > > > > > >+static void guest_write_tsc(u64 host_tsc, u64 guest_tsc); > > >+ > > > /* > > > * Switches to specified vcpu, until a matching vcpu_put(), but assumes > > > * vcpu mutex is already taken. > > >@@ -488,7 +493,7 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int > > >cpu) > > > { > > > struct vcpu_vmx *vmx = to_vmx(vcpu); > > > u64 phys_addr = __pa(vmx->vmcs); > > >- u64 tsc_this, delta; > > >+ u64 delta; > > > > > > if (vcpu->cpu != cpu) { > > > vcpu_clear(vmx); > > >@@ -511,6 +516,19 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int > > >cpu) > > > struct descriptor_table dt; > > > unsigned long sysenter_esp; > > > > > >+ if (unlikely(vcpu->cpu == -1)) { > > > > > > > This can happen after migration, I believe. > > Right now the destination host will do ->vcpu_reset() on all VCPU's on > startup... so its already broken. This patch is not introducing any > issues, just changing where it happens :) > > Perhaps fixing migration should be the subject of a separate patch? > > > >+ rdtscll(vcpu->arch.host_tsc); > > >+ vmx->tsc_this = vcpu->arch.host_tsc; > > >+ if (vcpu->vcpu_id == 0) { > > >+ guest_write_tsc(vcpu->arch.host_tsc, 0); > > >+ vmx->first_tsc = vcpu->arch.host_tsc; > > >+ } else { > > >+ struct vcpu_vmx *cpu0; > > >+ cpu0 = to_vmx(vcpu->kvm->vcpus[0]); > > >+ guest_write_tsc(cpu0->first_tsc, 0); > > >+ } > > >+ } > > >+ > > > > > > > Depending on vcpu_id == 0 can cause problems (for example, if vcpu 0 is > > somehow not the first to run). > > But QEMU will always run kvm_create() first (which does > kvm_create_vcpu(0)), then start the other threads later. So I don't see > how that can happen. > > > We might initialize the tsc base on vm creation, and if the host tsc is > > synced, then the guest tsc should also be. > > > > > vcpu->cpu = cpu; > > > /* > > > * Linux uses per-cpu TSS and GDT, so set these when > > > switching > > >@@ -526,8 +544,7 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int > > >cpu) > > > /* > > > * Make sure the time stamp counter is monotonous. > > > */ > > >- rdtscll(tsc_this); > > >- delta = vcpu->arch.host_tsc - tsc_this; > > >+ delta = vcpu->arch.host_tsc - vmx->tsc_this; > > > vmcs_write64(TSC_OFFSET, vmcs_read64(TSC_OFFSET) + delta); > > > > > > > This is a little roundabout, how about moving the delta calculation > > immediately after the call to vcpu_clear()? > > > > I don't think this is the cause of the problem, it can't account for > > more than a few hundred cycles, compared to the much greater vmentry cost. > > Actually it accounts for several thousand cycles warp by the time the > kernel checks the TSC synchronization between CPU's, which happens very > early on boot. > > You saw the hang on userspace init, by then there could be many > VCPU->CPU switchs. > > > > > Anyway it should be in a separate patch. > > How does the following look (minus proper changelog): Actually setting the TSC_OFFSET at vcpu_clear() seems a bit early: vmwrite error: reg 2010 value 82470677 (err -2109274505) Pid: 5590, comm: qemu-system-x86 Not tainted 2.6.24-rc6-ge7706133 #26 Call Trace: [<ffffffff880a5983>] :kvm_intel:vmwrite_error+0x22/0x28 [<ffffffff880a5a99>] :kvm_intel:vcpu_clear+0x54/0x60 [<ffffffff880a5ad0>] :kvm_intel:vmx_vcpu_load+0x29/0x12b [<ffffffff8024d803>] get_futex_value_locked+0x1d/0x38 [<ffffffff8808fcf1>] :kvm:kvm_arch_vcpu_ioctl_run+0x13/0x4b9 [<ffffffff8808c3df>] :kvm:kvm_vcpu_ioctl+0xda/0x2dd So the first patch seemed alright (taking into account the comments from my previous email, that migration should probably be fixed in a separate patch since its broken already and that its guaranteed that vcpu0 is the first to hit vcpu_load). Avi? ------------------------------------------------------------------------- Check out the new SourceForge.net Marketplace. It's the best place to buy or sell services for just about anything Open Source. http://ad.doubleclick.net/clk;164216239;13503038;w?http://sf.net/marketplace ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC] fix VMX TSC synchronicity 2008-01-14 20:46 ` Marcelo Tosatti @ 2008-01-15 14:59 ` Avi Kivity 0 siblings, 0 replies; 28+ messages in thread From: Avi Kivity @ 2008-01-15 14:59 UTC (permalink / raw) To: Marcelo Tosatti; +Cc: kvm-devel Marcelo Tosatti wrote: > So the first patch seemed alright (taking into account the comments from > my previous email, that migration should probably be fixed in a separate > patch since its broken already and that its guaranteed that vcpu0 is the > first to hit vcpu_load). > I really would like not to have this dependency. It's easy to write userspace that doesn't do this and there's no reason not to. -- error compiling committee.c: too many arguments to function ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC] fix VMX TSC synchronicity 2008-01-14 16:06 ` Marcelo Tosatti 2008-01-14 20:46 ` Marcelo Tosatti @ 2008-01-15 14:33 ` Avi Kivity [not found] ` <478CC448.1030901-atKUWr5tajBWk0Htik3J/w@public.gmane.org> 1 sibling, 1 reply; 28+ messages in thread From: Avi Kivity @ 2008-01-15 14:33 UTC (permalink / raw) To: Marcelo Tosatti; +Cc: kvm-devel Marcelo Tosatti wrote: > Hi Avi, > > On Sun, Jan 13, 2008 at 02:19:29PM +0200, Avi Kivity wrote: > >> Marcelo Tosatti wrote: >> >>> The boot TSC sync check is failing on recent Linux SMP guests on TSC >>> stable hosts. >>> >>> >>> >> What about tsc unstable hosts? If your patch convinces the guest its >> tsc is table, while the host tsc is not, then it may cause confusion >> later on. >> > > The adjustment to zero won't fool the guest because it assumes that the > TSC's are synchronized. It will simply set the guest TSC to zero on all > VCPUs based on the time VCPU0 was initialized. > > That is, setting -(vcpu[0].first_tsc) on all VCPU's will not correct any > synchronization problem. > > What I mean is, right now we present really broken tscs to the guest. After your patch, we present less-broken tscs (at boot, they will closely resemble stable tscs). But after the machine idles a bit and cpufreq takes over, the tscs won't be stable any more. >>> +static void guest_write_tsc(u64 host_tsc, u64 guest_tsc); >>> + >>> /* >>> * Switches to specified vcpu, until a matching vcpu_put(), but assumes >>> * vcpu mutex is already taken. >>> @@ -488,7 +493,7 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int >>> cpu) >>> { >>> struct vcpu_vmx *vmx = to_vmx(vcpu); >>> u64 phys_addr = __pa(vmx->vmcs); >>> - u64 tsc_this, delta; >>> + u64 delta; >>> >>> if (vcpu->cpu != cpu) { >>> vcpu_clear(vmx); >>> @@ -511,6 +516,19 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int >>> cpu) >>> struct descriptor_table dt; >>> unsigned long sysenter_esp; >>> >>> + if (unlikely(vcpu->cpu == -1)) { >>> >>> >> This can happen after migration, I believe. >> > > Right now the destination host will do ->vcpu_reset() on all VCPU's on > startup... so its already broken. This patch is not introducing any > issues, just changing where it happens :) > No, after the reset qemu will set the guest msrs including the tsc msr, overriding the reset value. However, by that time vcpu->cpu will no longer be -1, so there's no issue there. On the other hand, it may happen during cpu hotunplug (see decache_vcpus_on_vcpu). > > >>> + rdtscll(vcpu->arch.host_tsc); >>> + vmx->tsc_this = vcpu->arch.host_tsc; >>> + if (vcpu->vcpu_id == 0) { >>> + guest_write_tsc(vcpu->arch.host_tsc, 0); >>> + vmx->first_tsc = vcpu->arch.host_tsc; >>> + } else { >>> + struct vcpu_vmx *cpu0; >>> + cpu0 = to_vmx(vcpu->kvm->vcpus[0]); >>> + guest_write_tsc(cpu0->first_tsc, 0); >>> + } >>> + } >>> + >>> >>> >> Depending on vcpu_id == 0 can cause problems (for example, if vcpu 0 is >> somehow not the first to run). >> > > But QEMU will always run kvm_create() first (which does > kvm_create_vcpu(0)), then start the other threads later. So I don't see > how that can happen. > > We're not developing purely for qemu (there's xenner, for example) and I don't want to embed hidden assumptions into the API. >>> vcpu->cpu = cpu; >>> /* >>> * Linux uses per-cpu TSS and GDT, so set these when >>> switching >>> @@ -526,8 +544,7 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int >>> cpu) >>> /* >>> * Make sure the time stamp counter is monotonous. >>> */ >>> - rdtscll(tsc_this); >>> - delta = vcpu->arch.host_tsc - tsc_this; >>> + delta = vcpu->arch.host_tsc - vmx->tsc_this; >>> vmcs_write64(TSC_OFFSET, vmcs_read64(TSC_OFFSET) + delta); >>> >>> >> This is a little roundabout, how about moving the delta calculation >> immediately after the call to vcpu_clear()? >> >> I don't think this is the cause of the problem, it can't account for >> more than a few hundred cycles, compared to the much greater vmentry cost. >> > > Actually it accounts for several thousand cycles warp by the time the > kernel checks the TSC synchronization between CPU's, which happens very > early on boot. > > You saw the hang on userspace init, by then there could be many > VCPU->CPU switchs. > > Okay. >> Anyway it should be in a separate patch. >> > > How does the following look (minus proper changelog): > > > Index: kvm.quilt/arch/x86/kvm/vmx.c > =================================================================== > --- kvm.quilt.orig/arch/x86/kvm/vmx.c > +++ kvm.quilt/arch/x86/kvm/vmx.c > @@ -47,6 +47,7 @@ struct vcpu_vmx { > struct kvm_vcpu vcpu; > int launched; > u8 fail; > + u64 first_tsc; > u32 idt_vectoring_info; > struct kvm_msr_entry *guest_msrs; > struct kvm_msr_entry *host_msrs; > @@ -480,6 +481,7 @@ static void vmx_load_host_state(struct v > reload_host_efer(vmx); > } > > +static void guest_write_tsc(u64 host_tsc, u64 guest_tsc); > /* > * Switches to specified vcpu, until a matching vcpu_put(), but assumes > * vcpu mutex is already taken. > @@ -511,6 +513,18 @@ static void vmx_vcpu_load(struct kvm_vcp > struct descriptor_table dt; > unsigned long sysenter_esp; > > + if (unlikely(vcpu->cpu == -1)) { > > + rdtscll(vcpu->arch.host_tsc); > + if (vcpu->vcpu_id == 0) { > + guest_write_tsc(vcpu->arch.host_tsc, 0); > + vmx->first_tsc = vcpu->arch.host_tsc; > + } else { > + struct vcpu_vmx *cpu0; > + cpu0 = to_vmx(vcpu->kvm->vcpus[0]); > + guest_write_tsc(cpu0->first_tsc, 0); > + } > + } > + > I don't like the dependency on vcpu 0. Also see comments above. > > +static void __vcpu_clear(void *arg) > +{ > + struct vcpu_vmx *vmx = arg; > + int cpu = raw_smp_processor_id(); > + > + if (vmx->vcpu.cpu == cpu) > + vmcs_clear(vmx->vmcs); > + if (per_cpu(current_vmcs, cpu) == vmx->vmcs) > + per_cpu(current_vmcs, cpu) = NULL; > + rdtscll(vmx->vcpu.arch.host_tsc); > +} > + > +static void vcpu_clear(struct vcpu_vmx *vmx) > +{ > + u64 tsc_this, delta; > + > + if (vmx->vcpu.cpu == -1) > + return; > + smp_call_function_single(vmx->vcpu.cpu, __vcpu_clear, vmx, 0, 1); > + /* > + * Make sure the time stamp counter is monotonous. > + */ > + rdtscll(tsc_this); > + delta = vmx->vcpu.arch.host_tsc - tsc_this; > + vmcs_write64(TSC_OFFSET, vmcs_read64(TSC_OFFSET) + delta); > + vmx->launched = 0; > +} > This part is fine. -- error compiling committee.c: too many arguments to function ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ ^ permalink raw reply [flat|nested] 28+ messages in thread
[parent not found: <478CC448.1030901-atKUWr5tajBWk0Htik3J/w@public.gmane.org>]
* Re: [RFC] fix VMX TSC synchronicity [not found] ` <478CC448.1030901-atKUWr5tajBWk0Htik3J/w@public.gmane.org> @ 2008-01-15 14:50 ` Alexander Graf [not found] ` <478CC819.3040106-r27SGEef+tmzQB+pC5nmwQ@public.gmane.org> 0 siblings, 1 reply; 28+ messages in thread From: Alexander Graf @ 2008-01-15 14:50 UTC (permalink / raw) To: Avi Kivity; +Cc: Marcelo Tosatti, kvm-devel Avi Kivity wrote: > Marcelo Tosatti wrote: > >> Hi Avi, >> >> On Sun, Jan 13, 2008 at 02:19:29PM +0200, Avi Kivity wrote: >> >> >>> Marcelo Tosatti wrote: >>> >>> >>>> The boot TSC sync check is failing on recent Linux SMP guests on TSC >>>> stable hosts. >>>> >>>> >>>> >>>> >>> What about tsc unstable hosts? If your patch convinces the guest its >>> tsc is table, while the host tsc is not, then it may cause confusion >>> later on. >>> >>> >> The adjustment to zero won't fool the guest because it assumes that the >> TSC's are synchronized. It will simply set the guest TSC to zero on all >> VCPUs based on the time VCPU0 was initialized. >> >> That is, setting -(vcpu[0].first_tsc) on all VCPU's will not correct any >> synchronization problem. >> >> >> > > What I mean is, right now we present really broken tscs to the guest. > After your patch, we present less-broken tscs (at boot, they will > closely resemble stable tscs). But after the machine idles a bit and > cpufreq takes over, the tscs won't be stable any more. > > Why would the TSC break due to cpufreq? This patchset was against VMX, which is only available on current Intel CPUs. All of those guarantee a constant TSC increase at the maximum frequency. Also see the Intel Documentation: Vol. 3 18-37 For Pentium 4 processors, Intel Xeon processors (family [0FH], models [03H and higher]); for Intel Core Solo and Intel Core Duo processors (family [06H], model [0EH]); for the Intel Xeon processor 5100 series and Intel Core 2 Duo processors (family [06H], model [0FH]): the time-stamp counter increments at a constant rate. That rate may be set by the maximum core-clock to bus-clock ratio of the processor or may be set by the maximum resolved frequency at which the processor is booted. The maximum resolved frequency may differ from the maximum qualified frequency of the processor, see Section 18.17.5 for more detail. The specific processor configuration determines the behavior. Constant TSC behavior ensures that the duration of each clock tick is uniform and supports the use of the TSC as a wall clock timer even if the processor core changes frequency. This is the architectural behavior moving forward. Alex ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ ^ permalink raw reply [flat|nested] 28+ messages in thread
[parent not found: <478CC819.3040106-r27SGEef+tmzQB+pC5nmwQ@public.gmane.org>]
* Re: [RFC] fix VMX TSC synchronicity [not found] ` <478CC819.3040106-r27SGEef+tmzQB+pC5nmwQ@public.gmane.org> @ 2008-01-15 15:09 ` Avi Kivity [not found] ` <478CCCA9.2080300-atKUWr5tajBWk0Htik3J/w@public.gmane.org> 0 siblings, 1 reply; 28+ messages in thread From: Avi Kivity @ 2008-01-15 15:09 UTC (permalink / raw) To: Alexander Graf; +Cc: Marcelo Tosatti, kvm-devel Alexander Graf wrote: >>> >>> >>> >> What I mean is, right now we present really broken tscs to the guest. >> After your patch, we present less-broken tscs (at boot, they will >> closely resemble stable tscs). But after the machine idles a bit and >> cpufreq takes over, the tscs won't be stable any more. >> >> >> > > Why would the TSC break due to cpufreq? This patchset was against VMX, > which is only available on current Intel CPUs. All of those guarantee a > constant TSC increase at the maximum frequency. > > Also see the Intel Documentation: > > Vol. 3 18-37 > > For Pentium 4 processors, Intel Xeon processors (family [0FH], models > [03H and higher]); for Intel Core Solo and Intel Core Duo processors > (family [06H], model [0EH]); for the Intel Xeon processor 5100 series > and Intel Core 2 Duo processors (family [06H], model [0FH]): the > time-stamp counter increments at a constant rate. That rate may be set > by the maximum core-clock to bus-clock ratio of the processor or may be > set by the maximum resolved frequency at which the processor is booted. > The maximum resolved frequency may differ from the maximum qualified > frequency of the processor, see Section 18.17.5 for more > detail. > The specific processor configuration determines the behavior. Constant > TSC behavior ensures that the duration of each clock tick is uniform and > supports the use of the TSC as a wall clock timer even if the processor > core changes frequency. > This is the architectural behavior moving forward. Thanks; that's reassuring to know that it will work (at least on Intel). -- error compiling committee.c: too many arguments to function ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ ^ permalink raw reply [flat|nested] 28+ messages in thread
[parent not found: <478CCCA9.2080300-atKUWr5tajBWk0Htik3J/w@public.gmane.org>]
* Re: [RFC] fix VMX TSC synchronicity [not found] ` <478CCCA9.2080300-atKUWr5tajBWk0Htik3J/w@public.gmane.org> @ 2008-01-15 16:42 ` Alexander Graf [not found] ` <478CE277.9010109-r27SGEef+tmzQB+pC5nmwQ@public.gmane.org> 2008-01-16 5:51 ` Andi Kleen 1 sibling, 1 reply; 28+ messages in thread From: Alexander Graf @ 2008-01-15 16:42 UTC (permalink / raw) To: Avi Kivity; +Cc: Marcelo Tosatti, Alexander Graf, kvm-devel Avi Kivity wrote: > Alexander Graf wrote: > >>>> >>>> >>>> >>>> >>> What I mean is, right now we present really broken tscs to the guest. >>> After your patch, we present less-broken tscs (at boot, they will >>> closely resemble stable tscs). But after the machine idles a bit and >>> cpufreq takes over, the tscs won't be stable any more. >>> >>> >>> >>> >> Why would the TSC break due to cpufreq? This patchset was against VMX, >> which is only available on current Intel CPUs. All of those guarantee a >> constant TSC increase at the maximum frequency. >> >> Also see the Intel Documentation: >> >> Vol. 3 18-37 >> >> For Pentium 4 processors, Intel Xeon processors (family [0FH], models >> [03H and higher]); for Intel Core Solo and Intel Core Duo processors >> (family [06H], model [0EH]); for the Intel Xeon processor 5100 series >> and Intel Core 2 Duo processors (family [06H], model [0FH]): the >> time-stamp counter increments at a constant rate. That rate may be set >> by the maximum core-clock to bus-clock ratio of the processor or may be >> set by the maximum resolved frequency at which the processor is booted. >> The maximum resolved frequency may differ from the maximum qualified >> frequency of the processor, see Section 18.17.5 for more >> detail. >> The specific processor configuration determines the behavior. Constant >> TSC behavior ensures that the duration of each clock tick is uniform and >> supports the use of the TSC as a wall clock timer even if the processor >> core changes frequency. >> This is the architectural behavior moving forward. >> > > Thanks; that's reassuring to know that it will work (at least on Intel). > > If I remember correctly, there was a statement on LKML by AMD which said that the TSC is completely broken on AMD systems, so you should not use it there anyway. ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ ^ permalink raw reply [flat|nested] 28+ messages in thread
[parent not found: <478CE277.9010109-r27SGEef+tmzQB+pC5nmwQ@public.gmane.org>]
* Re: [RFC] fix VMX TSC synchronicity [not found] ` <478CE277.9010109-r27SGEef+tmzQB+pC5nmwQ@public.gmane.org> @ 2008-01-15 17:46 ` Avi Kivity [not found] ` <478CF186.5030304-atKUWr5tajBWk0Htik3J/w@public.gmane.org> 0 siblings, 1 reply; 28+ messages in thread From: Avi Kivity @ 2008-01-15 17:46 UTC (permalink / raw) To: Alexander Graf; +Cc: Marcelo Tosatti, kvm-devel Alexander Graf wrote: > If I remember correctly, there was a statement on LKML by AMD which said > that the TSC is completely broken on AMD systems, so you should not use > it there anyway. > Well, that's up to the guest to decide. Looks like older Linux guests make the wrong decision. -- error compiling committee.c: too many arguments to function ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ ^ permalink raw reply [flat|nested] 28+ messages in thread
[parent not found: <478CF186.5030304-atKUWr5tajBWk0Htik3J/w@public.gmane.org>]
* Re: [RFC] fix VMX TSC synchronicity [not found] ` <478CF186.5030304-atKUWr5tajBWk0Htik3J/w@public.gmane.org> @ 2008-01-15 18:29 ` Amit Shah 0 siblings, 0 replies; 28+ messages in thread From: Amit Shah @ 2008-01-15 18:29 UTC (permalink / raw) To: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f Cc: Marcelo Tosatti, Alexander Graf, Avi Kivity On Tuesday 15 January 2008 23:16:46 Avi Kivity wrote: > Alexander Graf wrote: > > If I remember correctly, there was a statement on LKML by AMD which said > > that the TSC is completely broken on AMD systems, so you should not use > > it there anyway. > > Well, that's up to the guest to decide. Looks like older Linux guests > make the wrong decision. http://lkml.org/lkml/2005/11/4/173 That talks about AMD's TSC plans (dated 2005), but it's possible to find out whether it's a monotonically-increasing source. Basically some revisions of AMD family F onwards should support this feature: Future TSC Directions and Solutions =================================== Future AMD processors will provide a TSC that is P-state and C-State invariant and unaffected by STPCLK-throttling. This will make the TSC immune to drift. Because using the TSC for fast timer APIs is a desirable feature that helps performance, AMD has defined a CPUID feature bit that software can test to determine if the TSC is invariant. Issuing a CPUID instruction with an %eax register value of 0x8000_0007, on a processor whose base family is 0xF, returns "Advanced Power Management Information" in the %eax, %ebx, %ecx, and %edx registers. Bit 8 of the return %edx is the "TscInvariant" feature flag which is set when TSC is P-state, C-state, and STPCLK-throttling invariant; it is clear otherwise. The rate of the invariant TSC is implementation-dependent and will likely *not* be the frequency of the processor core; however, its period should be short enough such that it is not possible for two back-to-back rdtsc instructions to return the same value. Software which is trying to measure actual processor frequency or cycle-performance should use Performance Event 76h, CPU Clocks not Halted, rather than the TSC to count CPU cycles. ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC] fix VMX TSC synchronicity [not found] ` <478CCCA9.2080300-atKUWr5tajBWk0Htik3J/w@public.gmane.org> 2008-01-15 16:42 ` Alexander Graf @ 2008-01-16 5:51 ` Andi Kleen [not found] ` <p73ir1ul3ls.fsf-KvMlXPVkKihbpigZmTR7Iw@public.gmane.org> 1 sibling, 1 reply; 28+ messages in thread From: Andi Kleen @ 2008-01-16 5:51 UTC (permalink / raw) To: Avi Kivity; +Cc: Marcelo Tosatti, Alexander Graf, kvm-devel Avi Kivity <avi-atKUWr5tajBWk0Htik3J/w@public.gmane.org> writes: > Thanks; that's reassuring to know that it will work (at least on Intel). Actually there are modern Intel systems which still have instable TSCs; e.g. IBM Summit multi node systems and some others. So you should still handle that case. -Andi ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ ^ permalink raw reply [flat|nested] 28+ messages in thread
[parent not found: <p73ir1ul3ls.fsf-KvMlXPVkKihbpigZmTR7Iw@public.gmane.org>]
* Re: [RFC] fix VMX TSC synchronicity [not found] ` <p73ir1ul3ls.fsf-KvMlXPVkKihbpigZmTR7Iw@public.gmane.org> @ 2008-01-16 8:46 ` Avi Kivity [not found] ` <478DC453.1000404-atKUWr5tajBWk0Htik3J/w@public.gmane.org> 0 siblings, 1 reply; 28+ messages in thread From: Avi Kivity @ 2008-01-16 8:46 UTC (permalink / raw) To: Andi Kleen; +Cc: Marcelo Tosatti, Alexander Graf, kvm-devel [fixing gmane emails, urgfhsz] Andi Kleen wrote: > Avi Kivity <avi-atKUWr5tajBWk0Htik3J/w@public.gmane.org> writes: > > >> Thanks; that's reassuring to know that it will work (at least on Intel). >> > > Actually there are modern Intel systems which still have instable TSCs; > e.g. IBM Summit multi node systems and some others. So you should > still handle that case. > I really don't see any way we could. If the guest assumes tscs are synchronous, and they really are not, there's nothing we can do. [well, we could trap and emulate rdtsc, but performance would tank] You might taskset guests into a single node on such systems, which is a good idea anyway. -- error compiling committee.c: too many arguments to function ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ ^ permalink raw reply [flat|nested] 28+ messages in thread
[parent not found: <478DC453.1000404-atKUWr5tajBWk0Htik3J/w@public.gmane.org>]
* Re: [RFC] fix VMX TSC synchronicity [not found] ` <478DC453.1000404-atKUWr5tajBWk0Htik3J/w@public.gmane.org> @ 2008-01-16 13:34 ` Andi Kleen [not found] ` <478E08E5.2030507@qumranet.com> 0 siblings, 1 reply; 28+ messages in thread From: Andi Kleen @ 2008-01-16 13:34 UTC (permalink / raw) To: Avi Kivity; +Cc: Marcelo Tosatti, Alexander Graf, Andi Kleen, kvm-devel On Wed, Jan 16, 2008 at 10:46:11AM +0200, Avi Kivity wrote: > [fixing gmane emails, urgfhsz] > > Andi Kleen wrote: > >Avi Kivity <avi-atKUWr5tajBWk0Htik3J/w@public.gmane.org> writes: > > > > > >>Thanks; that's reassuring to know that it will work (at least on Intel). > >> > > > >Actually there are modern Intel systems which still have instable TSCs; > >e.g. IBM Summit multi node systems and some others. So you should > >still handle that case. > > > > I really don't see any way we could. If the guest assumes tscs are > synchronous, and they really are not, there's nothing we can do. Linux checks a couple of things: e.g. if there are no deep C states and if there are no clustered nodes in the APIC etc. It might be reasonable to check the clock source of the kernel and if it's not TSC force one of these in the emulated firmware environment > You might taskset guests into a single node on such systems, which is a > good idea anyway. Ah pushing the problem to the user. An easy, but typically wrong, solution. -Andi ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ ^ permalink raw reply [flat|nested] 28+ messages in thread
[parent not found: <478E08E5.2030507@qumranet.com>]
[parent not found: <478E08E5.2030507-atKUWr5tajBWk0Htik3J/w@public.gmane.org>]
* Re: [RFC] fix VMX TSC synchronicity [not found] ` <478E08E5.2030507-atKUWr5tajBWk0Htik3J/w@public.gmane.org> @ 2008-01-16 13:54 ` Andi Kleen [not found] ` <20080116135415.GA14664-qrUzlfsMFqo/4alezvVtWx2eb7JE58TQ@public.gmane.org> 0 siblings, 1 reply; 28+ messages in thread From: Andi Kleen @ 2008-01-16 13:54 UTC (permalink / raw) To: Avi Kivity; +Cc: Marcelo Tosatti, Alexander Graf, Andi Kleen, kvm-devel On Wed, Jan 16, 2008 at 03:38:45PM +0200, Avi Kivity wrote: > Andi Kleen wrote: > >Linux checks a couple of things: e.g. if there are no deep C states > >and if there are no clustered nodes in the APIC etc. > > > >It might be reasonable to check the clock source of the kernel > >and if it's not TSC force one of these in the emulated firmware > >environment > > > > > > The problems are with older guests which assume the tsc is okay. Newer > guests check the tsc and conclude that it isn't usable. If the guest would get it wrong running natively on the host I guess it would be reasonable to require an option that forces TSC off. Disabling the TSC bit unfortunately won't work for 64bit guests, but for probably most 32bit guests. But for non broken guests they can only do that if the guest has the same visibility into the firmware state as the host. For the easy cases Linux will check it anyways becaused on standard the TSC synchronicity check, but there are cases where the TSCs only drift apart slowly over a longer time [I finally fixed the clocksource watchdog now to catch this case, but it will be only in .25] I think it would be better to fake at least some of the usual firmware cues for bad TSC if the host does not use it. > > >>You might taskset guests into a single node on such systems, which is a > >>good idea anyway. > >> > > > >Ah pushing the problem to the user. An easy, but typically wrong, solution. > > > > If you have other suggestions I'll be happy to hear them. I don't like > this either. Check if host is using TSC source and if not force a clustered APIC mode (only works for 64bit unfortunately) or fake a C3 state in ACPI and on AMD clear the synchronous TSC bit. -Andi ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ ^ permalink raw reply [flat|nested] 28+ messages in thread
[parent not found: <20080116135415.GA14664-qrUzlfsMFqo/4alezvVtWx2eb7JE58TQ@public.gmane.org>]
* Re: [RFC] fix VMX TSC synchronicity [not found] ` <20080116135415.GA14664-qrUzlfsMFqo/4alezvVtWx2eb7JE58TQ@public.gmane.org> @ 2008-01-16 13:56 ` Avi Kivity 2008-01-17 18:43 ` Marcelo Tosatti 1 sibling, 0 replies; 28+ messages in thread From: Avi Kivity @ 2008-01-16 13:56 UTC (permalink / raw) To: Andi Kleen; +Cc: Marcelo Tosatti, Alexander Graf, kvm-devel Andi Kleen wrote: > Check if host is using TSC source and if not force a clustered > APIC mode (only works for 64bit unfortunately) or fake a C3 state > in ACPI and on AMD clear the synchronous TSC bit. > Yes, I got similar suggestions from Thomas. But it looks like older guests will need a boot option. -- error compiling committee.c: too many arguments to function ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC] fix VMX TSC synchronicity [not found] ` <20080116135415.GA14664-qrUzlfsMFqo/4alezvVtWx2eb7JE58TQ@public.gmane.org> 2008-01-16 13:56 ` Avi Kivity @ 2008-01-17 18:43 ` Marcelo Tosatti 2008-01-17 18:56 ` Andi Kleen ` (2 more replies) 1 sibling, 3 replies; 28+ messages in thread From: Marcelo Tosatti @ 2008-01-17 18:43 UTC (permalink / raw) To: Andi Kleen; +Cc: Marcelo Tosatti, Alexander Graf, Avi Kivity, kvm-devel On Wed, Jan 16, 2008 at 02:54:15PM +0100, Andi Kleen wrote: > On Wed, Jan 16, 2008 at 03:38:45PM +0200, Avi Kivity wrote: > > Andi Kleen wrote: > > >Linux checks a couple of things: e.g. if there are no deep C states > > >and if there are no clustered nodes in the APIC etc. > > > > > >It might be reasonable to check the clock source of the kernel > > >and if it's not TSC force one of these in the emulated firmware > > >environment > > > > > > > > > > The problems are with older guests which assume the tsc is okay. Newer > > guests check the tsc and conclude that it isn't usable. > > If the guest would get it wrong running natively on the host > I guess it would be reasonable to require an option that forces > TSC off. Disabling the TSC bit unfortunately won't work for 64bit > guests, but for probably most 32bit guests. > > But for non broken guests they can only do that if the guest has the same > visibility into the firmware state as the host. For the easy cases Linux will > check it anyways becaused on standard the TSC synchronicity check, but > there are cases where the TSCs only drift apart slowly over a longer time > > [I finally fixed the clocksource watchdog now to catch this case, but > it will be only in .25] > > I think it would be better to fake at least some of the usual > firmware cues for bad TSC if the host does not use it. > > > > > >>You might taskset guests into a single node on such systems, which is a > > >>good idea anyway. > > >> > > > > > >Ah pushing the problem to the user. An easy, but typically wrong, solution. > > > > > > > If you have other suggestions I'll be happy to hear them. I don't like > > this either. > > Check if host is using TSC source and if not force a clustered > APIC mode (only works for 64bit unfortunately) or fake a C3 state > in ACPI and on AMD clear the synchronous TSC bit. Forcing clustered APIC mode works only on SMP, and there were high CPU consumption on Windows SMP guests due to C3 state being reported (fixed in kvm-30 something). So perhaps: - Faking clustered APIC on SMP - Faking C3 on UP And turning of the TSC bit (for 32-bit guests). Is the way to go? Avi, do you understand why C3 was causing the Windows SMP problems ? /* Common C-state entry for C2, C3, .. */ static void acpi_cstate_enter(struct acpi_processor_cx *cstate) { if (cstate->space_id == ACPI_CSTATE_FFH) { /* Call into architectural FFH based C-state */ acpi_processor_ffh_cstate_enter(cstate); } else { int unused; /* IO port based C-state */ inb(cstate->address); /* Dummy wait op - must do something useless after P_LVL2 read because chipsets cannot guarantee that STPCLK# signal gets asserted in time to freeze execution properly. */ unused = inl(acpi_gbl_FADT.xpm_timer_block.address); } } Clearly that inb() won't actually idle under QEMU. So the question is, if C3 stated is reported, that port read should be emulated... But how? http://www.mail-archive.com/kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org/msg05360.html Zhao, Yunfeng wrote: >> I ran the same image under qemu 0.9.0, uniprocessor, and got the same >> results: high host cpu usage while the guest is idle. Most likely the >> BIOS ACPI tables are misconfigured and the Windows idle loop is >> > confused. ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC] fix VMX TSC synchronicity 2008-01-17 18:43 ` Marcelo Tosatti @ 2008-01-17 18:56 ` Andi Kleen 2008-01-20 14:59 ` Avi Kivity 2008-05-04 15:40 ` Avi Kivity 2 siblings, 0 replies; 28+ messages in thread From: Andi Kleen @ 2008-01-17 18:56 UTC (permalink / raw) To: Marcelo Tosatti; +Cc: kvm-devel, Alexander Graf, Andi Kleen, Avi Kivity > And turning of the TSC bit (for 32-bit guests). Turning off the TSC bit will break 64bit Linux (it checks if the cpuid bits have minimum supported features) and you don't know in advantage if a guest is 32bit or not. Ok there is a special BIOS call that the kernel issues to tell the BIOS that it is 64bit -- in theory you could intercept that one and then disable the TSC bit, but it doesn't sound very clean. Also that call wasn't in the very first 64bit kernels. -Andi ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC] fix VMX TSC synchronicity 2008-01-17 18:43 ` Marcelo Tosatti 2008-01-17 18:56 ` Andi Kleen @ 2008-01-20 14:59 ` Avi Kivity 2008-05-04 15:40 ` Avi Kivity 2 siblings, 0 replies; 28+ messages in thread From: Avi Kivity @ 2008-01-20 14:59 UTC (permalink / raw) To: Marcelo Tosatti; +Cc: kvm-devel, Alexander Graf, Andi Kleen Marcelo Tosatti wrote: > Avi, do you understand why C3 was causing the Windows SMP problems ? > > It may be that the latency was advertised as so low that Windows switched into C3 and back too often. Also, the problem possibly wasn't SMP related but rather TPR related -- when the idle loop called acpi to do the mode switches, that caused many tpr hits. Perhaps with a FlexPriority enabled processor it would work better. -- error compiling committee.c: too many arguments to function ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC] fix VMX TSC synchronicity 2008-01-17 18:43 ` Marcelo Tosatti 2008-01-17 18:56 ` Andi Kleen 2008-01-20 14:59 ` Avi Kivity @ 2008-05-04 15:40 ` Avi Kivity 2008-05-06 13:55 ` Avi Kivity 2 siblings, 1 reply; 28+ messages in thread From: Avi Kivity @ 2008-05-04 15:40 UTC (permalink / raw) To: Marcelo Tosatti; +Cc: kvm-devel, Alexander Graf, Andi Kleen [Resurrecting post from the dead] Marcelo Tosatti wrote: > Forcing clustered APIC mode works only on SMP, and there were high CPU > consumption on Windows SMP guests due to C3 state being reported (fixed > in kvm-30 something). > > So perhaps: > - Faking clustered APIC on SMP > - Faking C3 on UP > > And turning of the TSC bit (for 32-bit guests). > > Is the way to go? > > Avi, do you understand why C3 was causing the Windows SMP problems ? > > It's probably inb()ing on the port in a loop. It's not SMP causing the problems, but the ACPI HAL. I'll check this. > /* Common C-state entry for C2, C3, .. */ > static void acpi_cstate_enter(struct acpi_processor_cx *cstate) > { > if (cstate->space_id == ACPI_CSTATE_FFH) { > /* Call into architectural FFH based C-state */ > acpi_processor_ffh_cstate_enter(cstate); > } else { > int unused; > /* IO port based C-state */ > inb(cstate->address); > /* Dummy wait op - must do something useless after P_LVL2 read > because chipsets cannot guarantee that STPCLK# signal > gets asserted in time to freeze execution properly. */ > unused = inl(acpi_gbl_FADT.xpm_timer_block.address); > } > } > > Clearly that inb() won't actually idle under QEMU. So the question is, > if C3 stated is reported, that port read should be emulated... But how? > We can add now use the KVM_SET_MPSTATE ioctl to halt the vcpu if we see the port read. Since not all hosts support setting mpstate, the bios should only report C3 if the host supports it. -- error compiling committee.c: too many arguments to function ------------------------------------------------------------------------- This SF.net email is sponsored by the 2008 JavaOne(SM) Conference Don't miss this year's exciting event. There's still time to save $100. Use priority code J8TL2D2. http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC] fix VMX TSC synchronicity 2008-05-04 15:40 ` Avi Kivity @ 2008-05-06 13:55 ` Avi Kivity 0 siblings, 0 replies; 28+ messages in thread From: Avi Kivity @ 2008-05-06 13:55 UTC (permalink / raw) To: Marcelo Tosatti; +Cc: kvm-devel, Alexander Graf, Andi Kleen Avi Kivity wrote: > [Resurrecting post from the dead] > > > Marcelo Tosatti wrote: >> Forcing clustered APIC mode works only on SMP, and there were high CPU >> consumption on Windows SMP guests due to C3 state being reported (fixed >> in kvm-30 something). >> >> So perhaps: >> - Faking clustered APIC on SMP - Faking C3 on UP >> >> And turning of the TSC bit (for 32-bit guests). >> >> Is the way to go? >> Avi, do you understand why C3 was causing the Windows SMP problems ? >> >> > > It's probably inb()ing on the port in a loop. It's not SMP causing > the problems, but the ACPI HAL. I'll check this. > Yes, it's reading 0xb010 and 0xb014, which ought to place the cpu in sleep mode, but don't. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. ------------------------------------------------------------------------- This SF.net email is sponsored by the 2008 JavaOne(SM) Conference Don't miss this year's exciting event. There's still time to save $100. Use priority code J8TL2D2. http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone ^ permalink raw reply [flat|nested] 28+ messages in thread
end of thread, other threads:[~2008-05-06 13:55 UTC | newest]
Thread overview: 28+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-01-12 3:11 [RFC] fix VMX TSC synchronicity Will Trives
2008-01-12 12:28 ` Marcelo Tosatti
2008-01-12 13:48 ` Will Trives
2008-01-12 20:51 ` Avi Kivity
[not found] ` <4789284E.7000907-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2008-01-19 18:58 ` [PATCH] kvm: qemu: remove SIGUSR2 from io_sigset Will Trives
2008-01-20 9:34 ` Avi Kivity
[not found] ` <4793158A.2080509-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2008-01-20 10:45 ` Avi Kivity
-- strict thread matches above, loose matches on Subject: below --
2008-01-11 20:49 [RFC] fix VMX TSC synchronicity Marcelo Tosatti
2008-01-13 12:19 ` Avi Kivity
[not found] ` <478A01D1.7000402-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2008-01-14 16:06 ` Marcelo Tosatti
2008-01-14 20:46 ` Marcelo Tosatti
2008-01-15 14:59 ` Avi Kivity
2008-01-15 14:33 ` Avi Kivity
[not found] ` <478CC448.1030901-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2008-01-15 14:50 ` Alexander Graf
[not found] ` <478CC819.3040106-r27SGEef+tmzQB+pC5nmwQ@public.gmane.org>
2008-01-15 15:09 ` Avi Kivity
[not found] ` <478CCCA9.2080300-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2008-01-15 16:42 ` Alexander Graf
[not found] ` <478CE277.9010109-r27SGEef+tmzQB+pC5nmwQ@public.gmane.org>
2008-01-15 17:46 ` Avi Kivity
[not found] ` <478CF186.5030304-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2008-01-15 18:29 ` Amit Shah
2008-01-16 5:51 ` Andi Kleen
[not found] ` <p73ir1ul3ls.fsf-KvMlXPVkKihbpigZmTR7Iw@public.gmane.org>
2008-01-16 8:46 ` Avi Kivity
[not found] ` <478DC453.1000404-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2008-01-16 13:34 ` Andi Kleen
[not found] ` <478E08E5.2030507@qumranet.com>
[not found] ` <478E08E5.2030507-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2008-01-16 13:54 ` Andi Kleen
[not found] ` <20080116135415.GA14664-qrUzlfsMFqo/4alezvVtWx2eb7JE58TQ@public.gmane.org>
2008-01-16 13:56 ` Avi Kivity
2008-01-17 18:43 ` Marcelo Tosatti
2008-01-17 18:56 ` Andi Kleen
2008-01-20 14:59 ` Avi Kivity
2008-05-04 15:40 ` Avi Kivity
2008-05-06 13:55 ` Avi Kivity
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox