* Bug in KVM clock backwards compensation
@ 2011-04-28 6:59 Zachary Amsden
2011-04-28 7:06 ` Jan Kiszka
2011-04-28 7:13 ` Roedel, Joerg
0 siblings, 2 replies; 12+ messages in thread
From: Zachary Amsden @ 2011-04-28 6:59 UTC (permalink / raw)
To: kvm, Jan Kiszka, Joerg Roedel
So I've been going over the new code changes to the TSC related code and
I don't like one particular set of changes. In particular, here:
kvm_x86_ops->vcpu_load(vcpu, cpu);
if (unlikely(vcpu->cpu != cpu) || check_tsc_unstable()) {
/* Make sure TSC doesn't go backwards */
s64 tsc_delta;
u64 tsc;
kvm_get_msr(vcpu, MSR_IA32_TSC, &tsc);
tsc_delta = !vcpu->arch.last_guest_tsc ? 0 :
tsc - vcpu->arch.last_guest_tsc;
if (tsc_delta < 0)
mark_tsc_unstable("KVM discovered backwards TSC");
if (check_tsc_unstable()) {
kvm_x86_ops->adjust_tsc_offset(vcpu, -tsc_delta);
vcpu->arch.tsc_catchup = 1;
}
The point of this code fragment is to test the host clock to see if it
is stable, because we may have just come back from an idle phase which
stopped the TSC, switched CPUs, or come back from a deep sleep state
which reset the host TSC.
However, the above code is fetching the guest TSC instead of the host
TSC, which isn't the way it is supposed to work.
I saw a patch floating around that touched this code recently, but I
think there's a definite issue here that needs addressing.
Zach
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: Bug in KVM clock backwards compensation 2011-04-28 6:59 Bug in KVM clock backwards compensation Zachary Amsden @ 2011-04-28 7:06 ` Jan Kiszka 2011-04-28 7:22 ` Roedel, Joerg 2011-04-28 17:48 ` Zachary Amsden 2011-04-28 7:13 ` Roedel, Joerg 1 sibling, 2 replies; 12+ messages in thread From: Jan Kiszka @ 2011-04-28 7:06 UTC (permalink / raw) To: Zachary Amsden; +Cc: kvm, Joerg Roedel [-- Attachment #1: Type: text/plain, Size: 1869 bytes --] On 2011-04-28 08:59, Zachary Amsden wrote: > So I've been going over the new code changes to the TSC related code and > I don't like one particular set of changes. In particular, here: > > kvm_x86_ops->vcpu_load(vcpu, cpu); > if (unlikely(vcpu->cpu != cpu) || check_tsc_unstable()) { > /* Make sure TSC doesn't go backwards */ > s64 tsc_delta; > u64 tsc; > > kvm_get_msr(vcpu, MSR_IA32_TSC, &tsc); > tsc_delta = !vcpu->arch.last_guest_tsc ? 0 : > tsc - vcpu->arch.last_guest_tsc; > > if (tsc_delta < 0) > mark_tsc_unstable("KVM discovered backwards TSC"); > if (check_tsc_unstable()) { > kvm_x86_ops->adjust_tsc_offset(vcpu, -tsc_delta); > vcpu->arch.tsc_catchup = 1; > } > > > The point of this code fragment is to test the host clock to see if it > is stable, because we may have just come back from an idle phase which > stopped the TSC, switched CPUs, or come back from a deep sleep state > which reset the host TSC. > > However, the above code is fetching the guest TSC instead of the host > TSC, which isn't the way it is supposed to work. > > I saw a patch floating around that touched this code recently, but I > think there's a definite issue here that needs addressing. And /me still wonders (like I did when this first popped up) if the proper place of determining TSC stability really have to be KVM. If the Linux core fails to detect some instability and KVM has to jump in, shouldn't we better improve the core's detection abilities and make use of them in KVM? Conceptually this looks like we are currently just working around a core deficit in KVM. Jan [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 259 bytes --] ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Bug in KVM clock backwards compensation 2011-04-28 7:06 ` Jan Kiszka @ 2011-04-28 7:22 ` Roedel, Joerg 2011-04-28 19:06 ` Zachary Amsden 2011-04-28 17:48 ` Zachary Amsden 1 sibling, 1 reply; 12+ messages in thread From: Roedel, Joerg @ 2011-04-28 7:22 UTC (permalink / raw) To: Jan Kiszka; +Cc: Zachary Amsden, kvm On Thu, Apr 28, 2011 at 03:06:01AM -0400, Jan Kiszka wrote: > And /me still wonders (like I did when this first popped up) if the > proper place of determining TSC stability really have to be KVM. > > If the Linux core fails to detect some instability and KVM has to jump > in, shouldn't we better improve the core's detection abilities and make > use of them in KVM? Conceptually this looks like we are currently just > working around a core deficit in KVM. Yes, good question. Has this ever triggered on a real machine (not counting the suspend/resume issue in)? Regards, Joerg -- AMD Operating System Research Center Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach General Managers: Alberto Bozzo, Andrew Bowd Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632 ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Bug in KVM clock backwards compensation 2011-04-28 7:22 ` Roedel, Joerg @ 2011-04-28 19:06 ` Zachary Amsden 2011-04-28 22:38 ` Jan Kiszka 0 siblings, 1 reply; 12+ messages in thread From: Zachary Amsden @ 2011-04-28 19:06 UTC (permalink / raw) To: Roedel, Joerg; +Cc: Jan Kiszka, kvm On 04/28/2011 12:22 AM, Roedel, Joerg wrote: > On Thu, Apr 28, 2011 at 03:06:01AM -0400, Jan Kiszka wrote: > >> And /me still wonders (like I did when this first popped up) if the >> proper place of determining TSC stability really have to be KVM. >> >> If the Linux core fails to detect some instability and KVM has to jump >> in, shouldn't we better improve the core's detection abilities and make >> use of them in KVM? Conceptually this looks like we are currently just >> working around a core deficit in KVM. >> > Yes, good question. Has this ever triggered on a real machine (not > counting the suspend/resume issue in)? > Yes... some platforms don't go haywire until you start using power mangement, TSC is stable before that, but not afterwards, and depending on the version of the kernel, KVM might detect this before the kernel does. Honestly, the code is obsolete, but still useful for those who build KVM as an external module on older kernels using the kvm-kmod system. Zach ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Bug in KVM clock backwards compensation 2011-04-28 19:06 ` Zachary Amsden @ 2011-04-28 22:38 ` Jan Kiszka 0 siblings, 0 replies; 12+ messages in thread From: Jan Kiszka @ 2011-04-28 22:38 UTC (permalink / raw) To: Zachary Amsden; +Cc: Roedel, Joerg, kvm [-- Attachment #1: Type: text/plain, Size: 1219 bytes --] On 2011-04-28 21:06, Zachary Amsden wrote: > On 04/28/2011 12:22 AM, Roedel, Joerg wrote: >> On Thu, Apr 28, 2011 at 03:06:01AM -0400, Jan Kiszka wrote: >> >>> And /me still wonders (like I did when this first popped up) if the >>> proper place of determining TSC stability really have to be KVM. >>> >>> If the Linux core fails to detect some instability and KVM has to jump >>> in, shouldn't we better improve the core's detection abilities and make >>> use of them in KVM? Conceptually this looks like we are currently just >>> working around a core deficit in KVM. >>> >> Yes, good question. Has this ever triggered on a real machine (not >> counting the suspend/resume issue in)? >> > > Yes... some platforms don't go haywire until you start using power > mangement, TSC is stable before that, but not afterwards, and depending > on the version of the kernel, KVM might detect this before the kernel does. > > Honestly, the code is obsolete, but still useful for those who build KVM > as an external module on older kernels using the kvm-kmod system. I'll happily accept patches that migrate any logic to kvm-kmod that the current kernel does not need it anymore. Jan [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 259 bytes --] ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Bug in KVM clock backwards compensation 2011-04-28 7:06 ` Jan Kiszka 2011-04-28 7:22 ` Roedel, Joerg @ 2011-04-28 17:48 ` Zachary Amsden 1 sibling, 0 replies; 12+ messages in thread From: Zachary Amsden @ 2011-04-28 17:48 UTC (permalink / raw) To: Jan Kiszka; +Cc: kvm, Joerg Roedel On 04/28/2011 12:06 AM, Jan Kiszka wrote: > On 2011-04-28 08:59, Zachary Amsden wrote: > >> So I've been going over the new code changes to the TSC related code and >> I don't like one particular set of changes. In particular, here: >> >> kvm_x86_ops->vcpu_load(vcpu, cpu); >> if (unlikely(vcpu->cpu != cpu) || check_tsc_unstable()) { >> /* Make sure TSC doesn't go backwards */ >> s64 tsc_delta; >> u64 tsc; >> >> kvm_get_msr(vcpu, MSR_IA32_TSC,&tsc); >> tsc_delta = !vcpu->arch.last_guest_tsc ? 0 : >> tsc - vcpu->arch.last_guest_tsc; >> >> if (tsc_delta< 0) >> mark_tsc_unstable("KVM discovered backwards TSC"); >> if (check_tsc_unstable()) { >> kvm_x86_ops->adjust_tsc_offset(vcpu, -tsc_delta); >> vcpu->arch.tsc_catchup = 1; >> } >> >> >> The point of this code fragment is to test the host clock to see if it >> is stable, because we may have just come back from an idle phase which >> stopped the TSC, switched CPUs, or come back from a deep sleep state >> which reset the host TSC. >> >> However, the above code is fetching the guest TSC instead of the host >> TSC, which isn't the way it is supposed to work. >> >> I saw a patch floating around that touched this code recently, but I >> think there's a definite issue here that needs addressing. >> > And /me still wonders (like I did when this first popped up) if the > proper place of determining TSC stability really have to be KVM. > No, it's not. I have a patch which fixes that. Was in the process of merging it into my local tree when I found this. > If the Linux core fails to detect some instability and KVM has to jump > in, shouldn't we better improve the core's detection abilities and make > use of them in KVM? Conceptually this looks like we are currently just > working around a core deficit in KVM. > 100% correct you are. Zach ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Bug in KVM clock backwards compensation 2011-04-28 6:59 Bug in KVM clock backwards compensation Zachary Amsden 2011-04-28 7:06 ` Jan Kiszka @ 2011-04-28 7:13 ` Roedel, Joerg 2011-04-28 18:34 ` Zachary Amsden 1 sibling, 1 reply; 12+ messages in thread From: Roedel, Joerg @ 2011-04-28 7:13 UTC (permalink / raw) To: Zachary Amsden; +Cc: kvm, Jan Kiszka On Thu, Apr 28, 2011 at 02:59:57AM -0400, Zachary Amsden wrote: > So I've been going over the new code changes to the TSC related code and > I don't like one particular set of changes. In particular, here: > > kvm_x86_ops->vcpu_load(vcpu, cpu); > if (unlikely(vcpu->cpu != cpu) || check_tsc_unstable()) { > /* Make sure TSC doesn't go backwards */ > s64 tsc_delta; > u64 tsc; > > kvm_get_msr(vcpu, MSR_IA32_TSC, &tsc); > tsc_delta = !vcpu->arch.last_guest_tsc ? 0 : > tsc - vcpu->arch.last_guest_tsc; > > if (tsc_delta < 0) > mark_tsc_unstable("KVM discovered backwards TSC"); > if (check_tsc_unstable()) { > kvm_x86_ops->adjust_tsc_offset(vcpu, -tsc_delta); > vcpu->arch.tsc_catchup = 1; > } > > > The point of this code fragment is to test the host clock to see if it > is stable, because we may have just come back from an idle phase which > stopped the TSC, switched CPUs, or come back from a deep sleep state > which reset the host TSC. I see it different. This code wants to check if the _guest_ tsc moves forwared (or at least not backwards). So it is fully legitimate to just do this by reading the guest-tsc and compare it to the last one the guest had. > I saw a patch floating around that touched this code recently, but I > think there's a definite issue here that needs addressing. In fact, this change was done to address one of your concerns. You mentioned that the values passed to adjust_tsc_offset() were in unconsistent units in my first version of tsc-scaling. This was a right objection because one call-site used guest-tsc-units while the other used host-tsc-units. This change intended to fix that by using guest-tsc-units always for adjust_tsc_offset(). Not that the guest and the host tsc have the same units on current machines. But with tsc-scaling these units are different. Regards, Joerg -- AMD Operating System Research Center Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach General Managers: Alberto Bozzo, Andrew Bowd Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632 ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Bug in KVM clock backwards compensation 2011-04-28 7:13 ` Roedel, Joerg @ 2011-04-28 18:34 ` Zachary Amsden 2011-04-28 20:20 ` Joerg Roedel 0 siblings, 1 reply; 12+ messages in thread From: Zachary Amsden @ 2011-04-28 18:34 UTC (permalink / raw) To: Roedel, Joerg; +Cc: kvm, Jan Kiszka On 04/28/2011 12:13 AM, Roedel, Joerg wrote: > On Thu, Apr 28, 2011 at 02:59:57AM -0400, Zachary Amsden wrote: > >> So I've been going over the new code changes to the TSC related code and >> I don't like one particular set of changes. In particular, here: >> >> kvm_x86_ops->vcpu_load(vcpu, cpu); >> if (unlikely(vcpu->cpu != cpu) || check_tsc_unstable()) { >> /* Make sure TSC doesn't go backwards */ >> s64 tsc_delta; >> u64 tsc; >> >> kvm_get_msr(vcpu, MSR_IA32_TSC,&tsc); >> tsc_delta = !vcpu->arch.last_guest_tsc ? 0 : >> tsc - vcpu->arch.last_guest_tsc; >> >> if (tsc_delta< 0) >> mark_tsc_unstable("KVM discovered backwards TSC"); >> if (check_tsc_unstable()) { >> kvm_x86_ops->adjust_tsc_offset(vcpu, -tsc_delta); >> vcpu->arch.tsc_catchup = 1; >> } >> >> >> The point of this code fragment is to test the host clock to see if it >> is stable, because we may have just come back from an idle phase which >> stopped the TSC, switched CPUs, or come back from a deep sleep state >> which reset the host TSC. >> > I see it different. This code wants to check if the _guest_ tsc moves > forwared (or at least not backwards). So it is fully legitimate to just > do this by reading the guest-tsc and compare it to the last one the > guest had. > That wasn't the intention when I wrote that code. It's simply there to detect backwards motion of the host TSC. The guest TSC can legally go backwards whenever the guest decides to change it, so checking the guest TSC doesn't make sense here. >> I saw a patch floating around that touched this code recently, but I >> think there's a definite issue here that needs addressing. >> > In fact, this change was done to address one of your concerns. You > mentioned that the values passed to adjust_tsc_offset() were in > unconsistent units in my first version of tsc-scaling. This was a right > objection because one call-site used guest-tsc-units while the other > used host-tsc-units. This change intended to fix that by using > guest-tsc-units always for adjust_tsc_offset(). > > Not that the guest and the host tsc have the same units on current > machines. But with tsc-scaling these units are different. > Yes, with tsc-scaling, the machines already have stable TSCs - the above test is for older hardware which could have problems, and can be reverted back to the original code without worrying about switching units. Zach ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Bug in KVM clock backwards compensation 2011-04-28 18:34 ` Zachary Amsden @ 2011-04-28 20:20 ` Joerg Roedel 2011-04-29 3:00 ` Zachary Amsden 0 siblings, 1 reply; 12+ messages in thread From: Joerg Roedel @ 2011-04-28 20:20 UTC (permalink / raw) To: Zachary Amsden; +Cc: Roedel, Joerg, kvm, Jan Kiszka On Thu, Apr 28, 2011 at 11:34:44AM -0700, Zachary Amsden wrote: > On 04/28/2011 12:13 AM, Roedel, Joerg wrote: >> I see it different. This code wants to check if the _guest_ tsc moves >> forwared (or at least not backwards). So it is fully legitimate to just >> do this by reading the guest-tsc and compare it to the last one the >> guest had. > > That wasn't the intention when I wrote that code. It's simply there to > detect backwards motion of the host TSC. The guest TSC can legally go > backwards whenever the guest decides to change it, so checking the guest > TSC doesn't make sense here. This code checks how many guest tsc cycles have passed since this vCPU was de-scheduled last time (and before it is running again). So since the vCPU hasn't run in the meantime it had no chance to change its TSC. Further, the other parameters like the TSC offset and the scaling multiplier havn't changed too, so the only variable in the guest-tsc calculation is the host-tsc. So this calculation using the guest-tsc can detect backwards going host-tsc as good as the old one. The benefit here is that we can feed consistent values into adjust_tsc_offset(). > Yes, with tsc-scaling, the machines already have stable TSCs - the above > test is for older hardware which could have problems, and can be > reverted back to the original code without worrying about switching > units. This is the case pratically. But architecturally the tsc-scaling feature does not depend on a constant tsc, so we can make no such assumtions. Additionally, it may happen that Linux mis-detects an unstable tsc for some reason (broken BIOS, bug in the code, ...). Therefore I think it is dangerous to assume that this code will never run on tsc-scaling capable hosts. And if it does and we don't manage the tsc-offset units right, we may see very weird behavior. Regards, Joerg ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Bug in KVM clock backwards compensation 2011-04-28 20:20 ` Joerg Roedel @ 2011-04-29 3:00 ` Zachary Amsden 2011-04-29 8:40 ` Joerg Roedel 0 siblings, 1 reply; 12+ messages in thread From: Zachary Amsden @ 2011-04-29 3:00 UTC (permalink / raw) To: Joerg Roedel; +Cc: Roedel, Joerg, kvm, Jan Kiszka On 04/28/2011 01:20 PM, Joerg Roedel wrote: > On Thu, Apr 28, 2011 at 11:34:44AM -0700, Zachary Amsden wrote: > >> On 04/28/2011 12:13 AM, Roedel, Joerg wrote: >> > >>> I see it different. This code wants to check if the _guest_ tsc moves >>> forwared (or at least not backwards). So it is fully legitimate to just >>> do this by reading the guest-tsc and compare it to the last one the >>> guest had. >>> >> That wasn't the intention when I wrote that code. It's simply there to >> detect backwards motion of the host TSC. The guest TSC can legally go >> backwards whenever the guest decides to change it, so checking the guest >> TSC doesn't make sense here. >> > This code checks how many guest tsc cycles have passed since this vCPU > was de-scheduled last time (and before it is running again). So since > the vCPU hasn't run in the meantime it had no chance to change its TSC. > Further, the other parameters like the TSC offset and the scaling > multiplier havn't changed too, so the only variable in the guest-tsc > calculation is the host-tsc. > So this calculation using the guest-tsc can detect backwards going > host-tsc as good as the old one. The benefit here is that we can feed > consistent values into adjust_tsc_offset(). > While true, this is more complex than the original code. The original code here doesn't try to actually compensate for the guest TSC difference - instead what it does is NULL any discovered host TSC delta: if (tsc_delta < 0) mark_tsc_unstable("KVM discovered backwards TSC"); if (check_tsc_unstable()) { kvm_x86_ops->adjust_tsc_offset(vcpu, -tsc_delta); vcpu->arch.tsc_catchup = 1; } kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu); Erasing that delta also erases elapsed time since the VCPU has last been run, which isn't desirable, so it then sets tsc_catchup mode, which will restore the proper TSC. The request here triggers code which later updates the TSC offset again. To avoid complexity, I think it's simplest to do the first computation in terms of the host TSC. >> Yes, with tsc-scaling, the machines already have stable TSCs - the above >> test is for older hardware which could have problems, and can be >> reverted back to the original code without worrying about switching >> units. >> > This is the case pratically. But architecturally the tsc-scaling feature > does not depend on a constant tsc, so we can make no such assumtions. > Additionally, it may happen that Linux mis-detects an unstable tsc for > some reason (broken BIOS, bug in the code, ...). Therefore I think it > is dangerous to assume that this code will never run on tsc-scaling > capable hosts. And if it does and we don't manage the tsc-offset units > right, we may see very weird behavior. > I agree, it is best to handle this case - hardware can and will change - but the TSC adjustment in terms of guest rate should be done under the atomic protection right before entering hardware virtualized mode - here: I left compute_guest_tsc in place to recompute time in guest units here, even if the underlying hardware rate changes. /* * We may have to catch up the TSC to match elapsed wall clock * time for two reasons, even if kvmclock is used. * 1) CPU could have been running below the maximum TSC rate * 2) Broken TSC compensation resets the base at each VCPU * entry to avoid unknown leaps of TSC even when running * again on the same CPU. This may cause apparent elapsed * time to disappear, and the guest to stand still or run * very slowly. */ if (vcpu->tsc_catchup) { u64 tsc = compute_guest_tsc(v, kernel_ns); if (tsc > tsc_timestamp) { kvm_x86_ops->adjust_tsc_offset(v, tsc - tsc_timestamp); tsc_timestamp = tsc; } } So yeah, the code is getting pretty complex but we'd like to avoid that as much as possible - so I would prefer to have the hardware backwards compensation separate from the guest rate computation by doing this: step 1) remove any backwards hardware TSC delta (in hardware units) step 2) recompute guest TSC from a stable clock (gotten from kernel_ns) and apply adjustment (in guest units) So it appears you can just share most of the logic of guest TSC catchup mode. What do you think? Zach ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Bug in KVM clock backwards compensation 2011-04-29 3:00 ` Zachary Amsden @ 2011-04-29 8:40 ` Joerg Roedel 2011-04-29 18:17 ` Zachary Amsden 0 siblings, 1 reply; 12+ messages in thread From: Joerg Roedel @ 2011-04-29 8:40 UTC (permalink / raw) To: Zachary Amsden; +Cc: Roedel, Joerg, kvm, Jan Kiszka On Thu, Apr 28, 2011 at 08:00:57PM -0700, Zachary Amsden wrote: > On 04/28/2011 01:20 PM, Joerg Roedel wrote: >> This code checks how many guest tsc cycles have passed since this vCPU >> was de-scheduled last time (and before it is running again). So since >> the vCPU hasn't run in the meantime it had no chance to change its TSC. >> Further, the other parameters like the TSC offset and the scaling >> multiplier havn't changed too, so the only variable in the guest-tsc >> calculation is the host-tsc. >> So this calculation using the guest-tsc can detect backwards going >> host-tsc as good as the old one. The benefit here is that we can feed >> consistent values into adjust_tsc_offset(). >> > > While true, this is more complex than the original code. The original > code here doesn't try to actually compensate for the guest TSC > difference - instead what it does is NULL any discovered host TSC delta: Why should KVM care about the host-tsc going backwards when the guest-tsc moves forward? I think the bottom-line of this code is to make sure the guest-tsc does not jump around (or even jumps backwards). I also don't agree that this is additional complexity. With these changes we were able to get rid of the last_host_tsc variable which is actually a simplification. As I see it, there are two situations here: 1) On hosts without tsc-scaling the value of tsc_delta calculated from the guest-tsc values is the same as when calculated with host-tsc values, because the guest-tsc only differs by an offset from the host-tsc. 2) On hosts with tsc-scaling these two tsc_delta values may be different. If the guest has a different tsc-freq as the host, we can't simply adjust the tsc by an offset calculated from the host-tsc values. So tsc_delta has to be calculated using the guest-tsc. > > if (tsc_delta < 0) > mark_tsc_unstable("KVM discovered backwards TSC"); > if (check_tsc_unstable()) { > kvm_x86_ops->adjust_tsc_offset(vcpu, -tsc_delta); > vcpu->arch.tsc_catchup = 1; > } > kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu); > > Erasing that delta also erases elapsed time since the VCPU has last been > run, which isn't desirable, so it then sets tsc_catchup mode, which will > restore the proper TSC. The request here triggers code which later > updates the TSC offset again. I just looked again at the tsc_catchup thing. Since the clock-update is forced in the code above, and the clock-update-code adjusts the tsc itself if necessary, is it really necessary to do this here at all? > I left compute_guest_tsc in place to recompute time in guest units here, > even if the underlying hardware rate changes. > > /* > * We may have to catch up the TSC to match elapsed wall clock > * time for two reasons, even if kvmclock is used. > * 1) CPU could have been running below the maximum TSC rate > * 2) Broken TSC compensation resets the base at each VCPU > * entry to avoid unknown leaps of TSC even when running > * again on the same CPU. This may cause apparent elapsed > * time to disappear, and the guest to stand still or run > * very slowly. > */ > if (vcpu->tsc_catchup) { > u64 tsc = compute_guest_tsc(v, kernel_ns); > if (tsc > tsc_timestamp) { > kvm_x86_ops->adjust_tsc_offset(v, tsc - > tsc_timestamp); > tsc_timestamp = tsc; > } > } > > So yeah, the code is getting pretty complex but we'd like to avoid that > as much as possible - so I would prefer to have the hardware backwards > compensation separate from the guest rate computation by doing this: > > step 1) remove any backwards hardware TSC delta (in hardware units) > step 2) recompute guest TSC from a stable clock (gotten from kernel_ns) > and apply adjustment (in guest units) > > So it appears you can just share most of the logic of guest TSC catchup > mode. > > What do you think? With the compensation done in kvm_guest_time_update I see no reason anymore to adjust the tsc in vcpu_load at all. So if we can remove the adjustment from there we can switch back to host-tsc there. The question still is whether this needs to be done in KVM. For older kernels Jan will take patches that handle this in kvm-kmod. So this code can probably be removed from vcpu_load. The benefit is that we don't need to re-introduce the last_host_tsc. If we still need to call adjust_tsc_offset() in vcpu_load then we have to do the calculation with the guest_tsc because this gives us a tsc_delta in units that adjust_tsc_offset expects. Regards, Joerg ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Bug in KVM clock backwards compensation 2011-04-29 8:40 ` Joerg Roedel @ 2011-04-29 18:17 ` Zachary Amsden 0 siblings, 0 replies; 12+ messages in thread From: Zachary Amsden @ 2011-04-29 18:17 UTC (permalink / raw) To: Joerg Roedel; +Cc: Roedel, Joerg, kvm, Jan Kiszka On 04/29/2011 01:40 AM, Joerg Roedel wrote: > On Thu, Apr 28, 2011 at 08:00:57PM -0700, Zachary Amsden wrote: > >> On 04/28/2011 01:20 PM, Joerg Roedel wrote: >> > >>> This code checks how many guest tsc cycles have passed since this vCPU >>> was de-scheduled last time (and before it is running again). So since >>> the vCPU hasn't run in the meantime it had no chance to change its TSC. >>> Further, the other parameters like the TSC offset and the scaling >>> multiplier havn't changed too, so the only variable in the guest-tsc >>> calculation is the host-tsc. >>> So this calculation using the guest-tsc can detect backwards going >>> host-tsc as good as the old one. The benefit here is that we can feed >>> consistent values into adjust_tsc_offset(). >>> >>> >> While true, this is more complex than the original code. The original >> code here doesn't try to actually compensate for the guest TSC >> difference - instead what it does is NULL any discovered host TSC delta: >> > Why should KVM care about the host-tsc going backwards when the > guest-tsc moves forward? I think the bottom-line of this code is to make > sure the guest-tsc does not jump around (or even jumps backwards). > > I also don't agree that this is additional complexity. With these > changes we were able to get rid of the last_host_tsc variable which is > actually a simplification. > > As I see it, there are two situations here: > > 1) On hosts without tsc-scaling the value of tsc_delta calculated from > the guest-tsc values is the same as when calculated with host-tsc > values, because the guest-tsc only differs by an offset from the > host-tsc. > > 2) On hosts with tsc-scaling these two tsc_delta values may be > different. If the guest has a different tsc-freq as the host, we > can't simply adjust the tsc by an offset calculated from the host-tsc > values. So tsc_delta has to be calculated using the guest-tsc. > >> if (tsc_delta< 0) >> mark_tsc_unstable("KVM discovered backwards TSC"); >> if (check_tsc_unstable()) { >> kvm_x86_ops->adjust_tsc_offset(vcpu, -tsc_delta); >> vcpu->arch.tsc_catchup = 1; >> } >> kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu); >> >> Erasing that delta also erases elapsed time since the VCPU has last been >> run, which isn't desirable, so it then sets tsc_catchup mode, which will >> restore the proper TSC. The request here triggers code which later >> updates the TSC offset again. >> > I just looked again at the tsc_catchup thing. Since the clock-update is > forced in the code above, and the clock-update-code adjusts the tsc > itself if necessary, is it really necessary to do this here at all? > Unfortunately, yes, it is. The code in the second or catchup phase can only correct an under adjusted clock, or we risk setting the guest clock backwards from what has been observed. So to guarantee there is not a huge jump due to over-adjustment, we must eliminate the TSC delta when switching CPUs, and that needs to happen in the preempt notifier, not the clock update handler. I agree it is simpler to do everything in terms of guest clock rate, but there is one variable which is thrown in to complicate things here - the host clock rate may have changed during this whole process. Let me look at the code again and see if it is possible to get rid of the first, host based adjustment entirely. Ideally we would just track things in terms of the guest clock and do all the computation inside the clock update request handler instead of having one adjustment in the preempt notifier and a separate one later when the clock update happens. I had originally tried something like that but ran into issues where the rate computation got exceedingly difficult. Maybe the code has been restructured enough now that it will work (the clock update didn't used to happen unless you had KVM clock...) Zach ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2011-04-29 18:17 UTC | newest] Thread overview: 12+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2011-04-28 6:59 Bug in KVM clock backwards compensation Zachary Amsden 2011-04-28 7:06 ` Jan Kiszka 2011-04-28 7:22 ` Roedel, Joerg 2011-04-28 19:06 ` Zachary Amsden 2011-04-28 22:38 ` Jan Kiszka 2011-04-28 17:48 ` Zachary Amsden 2011-04-28 7:13 ` Roedel, Joerg 2011-04-28 18:34 ` Zachary Amsden 2011-04-28 20:20 ` Joerg Roedel 2011-04-29 3:00 ` Zachary Amsden 2011-04-29 8:40 ` Joerg Roedel 2011-04-29 18:17 ` Zachary Amsden
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).