Marcelo Tosatti wrote:
> Hi David,
>
> On Fri, Jul 11, 2008 at 03:18:54PM -0600, David S. Ahern wrote:
>> What's the status with this for full virt guests?
>
> The consensus seems to be that fullvirt guests need assistance from the
> management app (libvirt) to have boosted priority during their boot
> stage, so loops_per_jiffy calibration can be performed safely. As Daniel
> pointed out this is tricky because you can't know for sure how long the
> boot up will take, if for example PXE is used.

I boosted the priority of the guest to investigate an idea that maybe
some startup calibration in the guest was off slightly, leading to
systematic drifting.

I was on vacation last week and am still catching up with traffic on
this list. I just happened to see your first message with the panic,
which aligned with one of my tests.

> Glauber is working on some paravirt patches to remedy the situation.
>
> But loops_per_jiffy is not directly related to clock drifts, so this
> is a separate problem.
>
>> I am still seeing systematic time drifts in RHEL 3 and RHEL4 guests
>> which I've been digging into it the past few days.
>
> All time drift issues we were aware of are fixed in kvm-70. Can you
> please provide more details on how you see the time drifting with
> RHEL3/4 guests? It slowly but continually drifts or there are large
> drifts at once? Are they using TSC or ACPIPM as clocksource?

The attached file shows one example of the drift I am seeing. It is for
a 4-way RHEL3 guest started with 'nice -20'. After startup each vcpu was
pinned to a physical cpu using taskset. The only activity on the host is
this one guest, and the guest itself is relatively idle -- about 4%
activity (~1% user, ~3% system time). The host is synchronized to an NTP
server; the guest is not. The guest is started with the -localtime
parameter. From the file you can see the guest gains about 1-2 seconds
every 5 minutes.
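For reference, a back-of-the-envelope check of what that rate means for
NTP (the 1-2 s per 5 minutes figures are from the log above; the ~500 ppm
slew cap is ntpd's documented maximum frequency correction, not something
measured here):

```python
# Rough drift-rate check: the guest gains 1 to 2 seconds every
# 5 minutes according to the attached log.
def drift_ppm(seconds_gained, interval_seconds):
    """Drift expressed in parts per million of the interval."""
    return seconds_gained / interval_seconds * 1e6

low = drift_ppm(1, 5 * 60)   # ~3333 ppm
high = drift_ppm(2, 5 * 60)  # ~6667 ppm
print(low, high)

# ntpd can only slew away roughly 500 ppm of frequency error, so a
# 3000-6700 ppm drift cannot be disciplined by running ntpd in the guest.
```

So even if I did run ntpd inside the guest, it could not keep up with a
drift this large.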
Since it's a RHEL3 guest I believe the PIT is the only choice (how do I
confirm that?), though it does read the TSC (i.e., use_tsc is 1).

> Also, most issues we've seen could only be replicated with dyntick
> guests.
>
> I'll try to reproduce it locally.
>
>> In the course of it I have been launching guests with boosted priority
>> (both nice -20 and realtime priority (RR 1)) on a nearly 100% idle
>> host.
>
> Can you also see wacked bogomips without boosting the guest priority?

The whacked bogomips only show up when the guest is started with
real-time priority. With 'nice -20' the value is sane and close to what
the host shows.

As another data point, I restarted the RHEL3 guest using the -no-kvm-pit
and -tdf options (with the 'nice -20' priority boost). After 22 hours of
uptime the guest is 29 seconds *behind* the host. With the in-kernel PIT
the guest time is always fast compared to the host.

I've seen similar drifting in RHEL4 guests, but I have not spent as much
time investigating it yet. On ESX, adding clock=pit to the boot
parameters for RHEL4 guests helps immensely.

david
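For comparison, the same arithmetic on the -no-kvm-pit run (the 29 s /
22 h figures are from this message; the ~500 ppm limit is again ntpd's
documented slew cap, assumed here for context):

```python
# -no-kvm-pit -tdf run: the guest ends up 29 s behind after 22 h of uptime.
offset_s = -29.0
interval_s = 22 * 3600  # 79200 s

rate_ppm = offset_s / interval_s * 1e6
print(round(rate_ppm))  # about -366 ppm

# Unlike the several-thousand-ppm gain seen with the in-kernel PIT, a
# ~366 ppm error is inside ntpd's ~500 ppm slew range, so ntpd in the
# guest could plausibly discipline this configuration.
```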