* gettimeofday "slow" in RHEL4 guests
@ 2008-11-24 17:47 David S. Ahern
From: David S. Ahern @ 2008-11-24 17:47 UTC
To: kvm-devel
I noticed that gettimeofday in RHEL4.6 guests is taking much longer than
with RHEL3.8 guests. I wrote a simple program (see below) to call
gettimeofday in a loop 1,000,000 times and then used time to measure how
long it took.
For the RHEL3.8 guest:
time -p ./timeofday_bench
real 0.99
user 0.12
sys 0.24
For the RHEL4.6 guest with the default clock source (pmtmr):
time -p ./timeofday_bench
real 15.65
user 0.18
sys 15.46
and RHEL4.6 guest with PIT as the clock source (clock=pit kernel parameter):
time -p ./timeofday_bench
real 13.67
user 0.21
sys 13.45
So, basically gettimeofday() takes about 50 times as long on a RHEL4 guest.
Host is a DL380G5, 2 dual-core Xeon 5140 processors, 4 GB of RAM. It's
running kvm.git tree as of 11/18/08 with kvm-75 userspace. Guest in both
RHEL3 and RHEL4 cases has 4 vcpus, 3.5GB of RAM.
david
----------
timeofday_bench.c:
#include <sys/time.h>
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[])
{
    int rc = 0, n;
    struct timeval tv;
    int iter = 1000000;     /* number of times to call gettimeofday */

    if (argc > 1)
        iter = atoi(argv[1]);

    if (iter <= 0) {
        fprintf(stderr, "invalid number of iterations\n");
        return 1;
    }

    printf("starting.... ");
    fflush(stdout);         /* no newline above, so flush before the loop */

    for (n = 0; n < iter; ++n) {
        if (gettimeofday(&tv, NULL) != 0) {
            fprintf(stderr, "\ngettimeofday failed\n");
            rc = 1;
            break;
        }
    }

    if (!rc)
        printf("done\n");

    return rc;
}
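
A self-timing variant of the same loop can also be useful: it reports the
per-call cost directly and avoids attributing shell and process-startup
overhead to gettimeofday itself. This is a rough, untested sketch, not part
of the measurements quoted above:

#include <sys/time.h>
#include <stdio.h>

/* Time the gettimeofday loop from inside the process and report the
 * average cost per call in nanoseconds. */
int main(void)
{
    struct timeval tv, start, end;
    long iter = 1000000, n;
    double elapsed_us;

    gettimeofday(&start, NULL);
    for (n = 0; n < iter; ++n)
        gettimeofday(&tv, NULL);
    gettimeofday(&end, NULL);

    elapsed_us = (end.tv_sec - start.tv_sec) * 1e6 +
                 (end.tv_usec - start.tv_usec);
    printf("%.1f ns per call\n", elapsed_us * 1000.0 / iter);
    return 0;
}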
* Re: gettimeofday "slow" in RHEL4 guests
From: David S. Ahern @ 2008-11-25 4:41 UTC
To: kvm-devel; +Cc: Marcelo Tosatti, Glauber de Oliveira Costa, Avi Kivity

Some more data on this overhead.

RHEL3 (which is based on the 2.4.21 kernel) gets microsecond resolutions
by reading the TSC. Reading the TSC from within a guest is very fast
on kvm.

RHEL4 (which is based on the 2.6.9 kernel) allows multiple time sources:
pmtmr (ACPI power management timer which is the default), pit, hpet
and TSC.

The pmtmr and pit both do ioport reads to get microsecond resolutions
(see read_pmtmr and get_offset_pit, respectively). For the tsc as the
timer source gettimeofday is *very* lightweight, but time drifts very
badly and ntpd cannot acquire a sync.

I believe someone is working on the HPET for guests, and I know from
bare metal performance that it is a much lighter weight time source,
but with RHEL4 the HPET breaks the ability to use the RTC.

So, I'm running out of options for reliable and lightweight time
sources. Any chance the pit or pmtmr options can be optimized a bit?

thanks,
david

PS. yes, I did try the userspace pit and its performance is worse than
the in-kernel PIT.

David S. Ahern wrote:
> I noticed that gettimeofday in RHEL4.6 guests is taking much longer than
> with RHEL3.8 guests. I wrote a simple program (see below) to call
> gettimeofday in a loop 1,000,000 times and then used time to measure how
> long it took.
[...]

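For reference, the read_pmtmr path mentioned above is why the default pmtmr
source is so expensive in a guest: every gettimeofday ends up doing port
I/O, and each inl() traps out of the guest. A paraphrased sketch of the
2.6.9-era code (see arch/i386/kernel/timers/timer_pm.c for the real thing;
details may differ slightly):

/* Sketch, not verbatim. The triple read works around chipsets where the
 * ACPI PM timer is not latched; the point here is that a single timer
 * read costs three port reads, i.e. three vmexits. */
static u32 read_pmtmr(void)
{
    u32 v1, v2, v3;

    do {
        v1 = inl(pmtmr_ioport);
        v2 = inl(pmtmr_ioport);
        v3 = inl(pmtmr_ioport);
    } while ((v1 > v2 && v1 < v3) || (v2 > v3 && v2 < v1) ||
             (v3 > v1 && v3 < v2));

    return v2 & ACPI_PM_MASK;    /* the counter is 24 bits wide */
}
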
* Re: gettimeofday "slow" in RHEL4 guests
From: Andi Kleen @ 2008-11-25 10:14 UTC
To: David S. Ahern
Cc: kvm-devel, Marcelo Tosatti, Glauber de Oliveira Costa, Avi Kivity

"David S. Ahern" <daahern@cisco.com> writes:
>
> Any chance the pit or pmtmr options can be optimized a bit?

They both will require vmexits and never be really fast. Same
for HPET. If you want fast gtod you really need to make TSC work.

-Andi

--
ak@linux.intel.com

* Re: gettimeofday "slow" in RHEL4 guests
From: Alexander Graf @ 2008-11-25 11:17 UTC
To: Andi Kleen
Cc: David S. Ahern, kvm-devel, Marcelo Tosatti, Glauber de Oliveira Costa, Avi Kivity

On 25.11.2008, at 11:14, Andi Kleen <andi@firstfloor.org> wrote:

> "David S. Ahern" <daahern@cisco.com> writes:
>>
>> Any chance the pit or pmtmr options can be optimized a bit?
>
> They both will require vmexits and never be really fast. Same
> for HPET. If you want fast gtod you really need to make TSC work.

Why does hpet need to be slow? Can't you just 1:1 pass through one of
the hpet timers if you only have a limited amount of vms? If done
cleverly this might even work if #hpet > #cpu.

Alex

* Re: gettimeofday "slow" in RHEL4 guests
From: Andi Kleen @ 2008-11-25 11:48 UTC
To: Alexander Graf
Cc: Andi Kleen, David S. Ahern, kvm-devel, Marcelo Tosatti, Glauber de Oliveira Costa, Avi Kivity

> Why does hpet need to be slow? Can't you just 1:1 pass through one of
> the hpet timers if you only have a limited amount of vms?

HPET is not a truly virtualizable device, it's all the counters
in one block that cannot be really mapped to different people.
Also most systems have very little counters and Linux typically needs
two at least (system timer and /dev/hpet).

> If done cleverly this might even work if #hpet > #cpu.

Sure with a device model, but that needs vmexits.

-Andi

* Re: gettimeofday "slow" in RHEL4 guests
From: Alexander Graf @ 2008-11-25 12:13 UTC
To: Andi Kleen
Cc: David S. Ahern, kvm-devel, Marcelo Tosatti, Glauber de Oliveira Costa, Avi Kivity

On 25.11.2008, at 12:48, Andi Kleen wrote:

>> Why does hpet need to be slow? Can't you just 1:1 pass through one of
>> the hpet timers if you only have a limited amount of vms?
>
> HPET is not a truly virtualizable device, it's all the counters
> in one block that cannot be really mapped to different people.

Right, you'd have to remap stuff on non-page boundaries which is
probably pretty hard to do. Otherwise you wouldn't gain anything, since
you'd still have exits to reprogram the hpet.

> Also most systems have very little counters and Linux typically needs
> two at least (system timer and /dev/hpet)

Well, IIRC 3 is a pretty normal amount and blocking /dev/hpet if the
hpet is in use shouldn't be a problem. But yeah - the remapping of HPET
timers to virtual HPET timers sounds pretty tough. I wonder if one
could overcome that with a little hardware support though ...

Alex

* Re: gettimeofday "slow" in RHEL4 guests
From: Andi Kleen @ 2008-11-25 12:52 UTC
To: Alexander Graf
Cc: Andi Kleen, David S. Ahern, kvm-devel, Marcelo Tosatti, Glauber de Oliveira Costa, Avi Kivity

> But yeah - the remapping of HPET timers to virtual HPET timers sounds
> pretty tough. I wonder if one could overcome that with a little
> hardware support though ...

For gettimeofday better make TSC work. Even in the best case (no
virtualization) it is much faster than HPET because it sits in the CPU,
while HPET is far away on the external south bridge.

For other HPET usages (interval timer etc.) which are less performance
critical I suppose vmexits are not a serious problem, so a standard
software device model should work.

-Andi

* Re: gettimeofday "slow" in RHEL4 guests
From: Marcelo Tosatti @ 2008-12-28 18:38 UTC
To: Yang, Sheng
Cc: Alexander Graf, David S. Ahern, kvm-devel, Glauber de Oliveira Costa, Avi Kivity, Gleb Natapov, Dor Laor

On Tue, Nov 25, 2008 at 01:52:59PM +0100, Andi Kleen wrote:
> > But yeah - the remapping of HPET timers to virtual HPET timers sounds
> > pretty tough. I wonder if one could overcome that with a little
> > hardware support though ...
>
> For gettimeofday better make TSC work. Even in the best case (no
> virtualization) it is much faster than HPET because it sits in the CPU,
> while HPET is far away on the external south bridge.

The tsc clock on older Linux 2.6 kernels compensates for lost ticks.
The algorithm uses the PIT count (latched) to measure the delay between
interrupt generation and handling, and sums that value, on the next
interrupt, to the TSC delta.

Sheng investigated this problem in the discussions before the in-kernel
PIT was merged:

http://www.mail-archive.com/kvm-devel@lists.sourceforge.net/msg13873.html

The algorithm overcompensates for lost ticks and the guest time runs
faster than the host's.

There are two issues:

1) A bug in the in-kernel PIT which miscalculates the count value.

2) For the case where more than one interrupt is lost, and later
reinjected, the value read from the PIT count is meaningless for the
purpose of the tsc algorithm. The count is interpreted as the delay
until the next interrupt, which is not the case with reinjection.

As Sheng mentioned in the thread above, Xen pulls back the TSC value
when reinjecting interrupts. VMware ESX has a notion of "virtual TSC",
which I believe is similar in this context.

For KVM I believe the best immediate solution (for now) is to provide an
option to disable reinjection, behaving similarly to real hardware. The
advantage is simplicity compared to virtualizing the time sources.

The QEMU PIT emulation has a limit on the rate of interrupt reinjection;
perhaps something similar should be investigated in the future.

The following patch (which contains the bugfix for 1) and disables
reinjection) fixes the severe time drift on RHEL4 with "clock=tsc".
What I'm proposing is to condition reinjection with an option
(-kvm-pit-no-reinject or something).

Comments or better ideas?


diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index e665d1c..608af7b 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -201,13 +201,16 @@ static int __pit_timer_fn(struct kvm_kpit_state *ps)
 	if (!atomic_inc_and_test(&pt->pending))
 		set_bit(KVM_REQ_PENDING_TIMER, &vcpu0->requests);
 
+	if (atomic_read(&pt->pending) > 1)
+		atomic_set(&pt->pending, 1);
+
 	if (vcpu0 && waitqueue_active(&vcpu0->wq))
 		wake_up_interruptible(&vcpu0->wq);
 
 	hrtimer_add_expires_ns(&pt->timer, pt->period);
 	pt->scheduled = hrtimer_get_expires_ns(&pt->timer);
 	if (pt->period)
-		ps->channels[0].count_load_time = hrtimer_get_expires(&pt->timer);
+		ps->channels[0].count_load_time = ktime_get();
 
 	return (pt->period == 0 ? 0 : 1);
 }

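For readers unfamiliar with the guest-side algorithm Marcelo describes, the
2.6.9-era "tsc" time source works roughly as follows. This is a heavily
paraphrased sketch; variable and helper names only approximate the real
timer_tsc.c/timer_pit.c code:

/* On each timer interrupt, latch the PIT; the residual count says how
 * long ago the tick actually fired, i.e. how late it was delivered. */
static void mark_offset_tsc(void)
{
    int count;

    rdtscll(last_tsc);                  /* TSC when the tick is handled */

    outb_p(0x00, PIT_MODE);             /* latch channel 0 */
    count  = inb_p(PIT_CH0);
    count |= inb_p(PIT_CH0) << 8;

    /* Time already elapsed in the current tick period, in usecs: */
    count = ((LATCH - 1) - count) * TICK_SIZE;
    delay_at_last_interrupt = (count + LATCH / 2) / LATCH;

    /* ...lost-tick detection and compensation happen here... */
}

/* gettimeofday() then interpolates from the last tick: */
static unsigned long get_offset_tsc(void)
{
    unsigned long long now;

    rdtscll(now);
    return delay_at_last_interrupt + cycles_to_usecs(now - last_tsc);
}

When a backlog of ticks is reinjected back to back, the latched PIT count
no longer measures how late the tick was delivered, so the compensation
over-advances guest time -- the overcompensation described above.
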
* Re: gettimeofday "slow" in RHEL4 guests
From: Yang, Sheng @ 2008-12-29 12:37 UTC
To: Marcelo Tosatti
Cc: Alexander Graf, David S. Ahern, kvm-devel, Glauber de Oliveira Costa, Avi Kivity, Gleb Natapov, Dor Laor

On Monday 29 December 2008 02:38:07 Marcelo Tosatti wrote:
[...]
> For KVM I believe the best immediate solution (for now) is to provide an
> option to disable reinjection, behaving similarly to real hardware. The
> advantage is simplicity compared to virtualizing the time sources.
[...]
> What I'm proposing is to condition reinjection with an option
> (-kvm-pit-no-reinject or something).

I agree that it should go with a user space option to disable
reinjection, as it's hard to overcome the problem that we delayed
interrupt injection...

--
regards
Yang, Sheng

* Re: gettimeofday "slow" in RHEL4 guests
From: Avi Kivity @ 2008-12-29 13:11 UTC
To: Marcelo Tosatti
Cc: Yang, Sheng, Alexander Graf, David S. Ahern, kvm-devel, Glauber de Oliveira Costa, Gleb Natapov, Dor Laor

Marcelo Tosatti wrote:
> The tsc clock on older Linux 2.6 kernels compensates for lost ticks.
> The algorithm uses the PIT count (latched) to measure the delay between
> interrupt generation and handling, and sums that value, on the next
> interrupt, to the TSC delta.
[...]
> @@ -201,13 +201,16 @@ static int __pit_timer_fn(struct kvm_kpit_state *ps)
> 	if (!atomic_inc_and_test(&pt->pending))
> 		set_bit(KVM_REQ_PENDING_TIMER, &vcpu0->requests);
>
> +	if (atomic_read(&pt->pending) > 1)
> +		atomic_set(&pt->pending, 1);
> +
>

Replace the atomic_inc() with atomic_set(, 1) instead? One less test,
and more important, the logic is scattered less around the source.

> 	if (vcpu0 && waitqueue_active(&vcpu0->wq))
> 		wake_up_interruptible(&vcpu0->wq);
>
> 	hrtimer_add_expires_ns(&pt->timer, pt->period);
> 	pt->scheduled = hrtimer_get_expires_ns(&pt->timer);
> 	if (pt->period)
> -		ps->channels[0].count_load_time = hrtimer_get_expires(&pt->timer);
> +		ps->channels[0].count_load_time = ktime_get();
>
> 	return (pt->period == 0 ? 0 : 1);
> }

I don't like the idea of punting to the user but looks like we don't
have a choice. Hopefully vendors will port kvmclock to these kernels
and release them as updates -- time simply doesn't work well with
virtualization, especially Linux guests.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

* Re: gettimeofday "slow" in RHEL4 guests
From: Dor Laor @ 2008-12-29 16:12 UTC
To: Avi Kivity
Cc: Marcelo Tosatti, Yang, Sheng, Alexander Graf, David S. Ahern, kvm-devel, Glauber de Oliveira Costa, Gleb Natapov

Avi Kivity wrote:
> Marcelo Tosatti wrote:
>> The tsc clock on older Linux 2.6 kernels compensates for lost ticks.
[...]
>> +	if (atomic_read(&pt->pending) > 1)
>> +		atomic_set(&pt->pending, 1);
>> +
>>
>
> Replace the atomic_inc() with atomic_set(, 1) instead? One less test,
> and more important, the logic is scattered less around the source.

But having only a pending bit instead of a counter will cause kvm to
drop pit irqs on rare high load situations. The disable reinjection
option is better.

[...]
> I don't like the idea of punting to the user but looks like we don't
> have a choice. Hopefully vendors will port kvmclock to these kernels
> and release them as updates -- time simply doesn't work well with
> virtualization, especially Linux guests.

Except for these 'tsc compensate' guests, what are the occasions where
the guest writes his tsc? If this is the only case we can disable
reinjection once we trap tsc writes.

* Re: gettimeofday "slow" in RHEL4 guests
From: Avi Kivity @ 2008-12-29 16:27 UTC
To: dlaor
Cc: Marcelo Tosatti, Yang, Sheng, Alexander Graf, David S. Ahern, kvm-devel, Glauber de Oliveira Costa, Gleb Natapov

Dor Laor wrote:
>>>
>>> +	if (atomic_read(&pt->pending) > 1)
>>> +		atomic_set(&pt->pending, 1);
>>> +
>>>
>>
>> Replace the atomic_inc() with atomic_set(, 1) instead? One less test,
>> and more important, the logic is scattered less around the source.
> But having only a pending bit instead of a counter will cause kvm to
> drop pit irqs on rare high load situations.
> The disable reinjection option is better.

Both variants disable reinjection. Forcing a counter to 1 every time it
exceeds 1 is equivalent to maintaining a bit. In both variants, there is
a missing 'if (disable_reinjection)' (Marcelo mentioned this in the
original message).

> Except for these 'tsc compensate' guests, what are the occasions where
> the guest writes his tsc?
> If this is the only case we can disable reinjection once we trap tsc
> writes.

I don't think these guests write to the tsc. Rather, they read the tsc
and the pit counters and try to correlate. And fail.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

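To make the clamp in the patch above conditional, one hypothetical shape is
a module parameter gating it; the parameter name and plumbing below are
invented purely for illustration (Marcelo's suggested -kvm-pit-no-reinject
flag would instead have to reach the kernel from userspace, e.g. through an
ioctl):

/* Illustrative sketch only -- not a real kvm option. */
static int pit_no_reinject;
module_param(pit_no_reinject, bool, 0444);

	...
	if (!atomic_inc_and_test(&pt->pending))
		set_bit(KVM_REQ_PENDING_TIMER, &vcpu0->requests);

	/* When reinjection is disabled, cap the backlog at one tick. */
	if (pit_no_reinject && atomic_read(&pt->pending) > 1)
		atomic_set(&pt->pending, 1);
	...
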
* Re: gettimeofday "slow" in RHEL4 guests
From: Hollis Blanchard @ 2008-11-25 17:20 UTC
To: David S. Ahern
Cc: kvm-devel, Marcelo Tosatti, Glauber de Oliveira Costa, Avi Kivity

On Mon, 2008-11-24 at 21:41 -0700, David S. Ahern wrote:
>
> RHEL3 (which is based on the 2.4.21 kernel) gets microsecond resolutions
> by reading the TSC. Reading the TSC from within a guest is very fast
> on kvm.
>
> RHEL4 (which is based on the 2.6.9 kernel) allows multiple time sources:
> pmtmr (ACPI power management timer which is the default), pit, hpet
> and TSC.
>
> The pmtmr and pit both do ioport reads to get microsecond resolutions
> (see read_pmtmr and get_offset_pit, respectively). For the tsc as the
> timer source gettimeofday is *very* lightweight, but time drifts very
> badly and ntpd cannot acquire a sync.

Why aren't you seeing severe time drift when using RHEL3 guests with
the TSC time source?

--
Hollis Blanchard
IBM Linux Technology Center

* Re: gettimeofday "slow" in RHEL4 guests
From: David S. Ahern @ 2008-11-25 19:09 UTC
To: Hollis Blanchard
Cc: kvm-devel, Marcelo Tosatti, Glauber de Oliveira Costa, Avi Kivity

Hollis Blanchard wrote:
> On Mon, 2008-11-24 at 21:41 -0700, David S. Ahern wrote:
[...]
> Why aren't you seeing severe time drift when using RHEL3 guests with
> the TSC time source?

With RHEL3 it's a PIT time source, and the PIT counter is only read on
interrupts. For gettimeofday requests only the tsc is read; the
algorithm for microsecond resolution uses the pit count and its tsc
timestamp from the last interrupt.

In RHEL4, the PIT counter is read for each gettimeofday request when it
is the timer source. That's the cause of the extra overhead, and
consequently, worse performance.

david

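The difference David describes is visible in the code paths: the 2.6.9
"pit" time source latches and reads the PIT counter on every gettimeofday
call, so each call costs a port write plus two port reads -- three vmexits
-- before any arithmetic happens. A paraphrased sketch (see get_offset_pit
in that kernel for the real code):

/* Sketch, not verbatim. */
static unsigned long get_offset_pit(void)
{
    int count;
    unsigned long flags;

    spin_lock_irqsave(&i8253_lock, flags);

    outb_p(0x00, PIT_MODE);          /* latch the count */
    count  = inb_p(PIT_CH0);         /* low byte  */
    count |= inb_p(PIT_CH0) << 8;    /* high byte */

    spin_unlock_irqrestore(&i8253_lock, flags);

    /* Convert "ticks remaining" into microseconds elapsed since the
     * last timer interrupt. */
    count = ((LATCH - 1) - count) * TICK_SIZE;
    return (count + LATCH / 2) / LATCH;
}

The RHEL3 (2.4) path only touches the PIT from the timer interrupt itself
and uses a plain rdtsc in gettimeofday, which the guest can execute without
trapping -- hence the large gap in the numbers at the top of the thread.
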