Date: Wed, 5 Nov 2008 14:48:55 +0000
From: Jamie Lokier
To: Dor Laor
Cc: Beth Kon, qemu-devel@nongnu.org, kvm@vger.kernel.org, Alexander Graf
Subject: Re: [Qemu-devel] [PATCH 1/2] Add HPET emulation to qemu (v3)
Message-ID: <20081105144855.GH13630@shareable.org>
In-Reply-To: <491197BB.6080502@redhat.com>
References: <1224245854.3399.7.camel@beth-laptop>
 <20081017154932.GA14229@shareable.org>
 <1224529724.3399.27.camel@beth-laptop>
 <49059CA9.30007@il.qumranet.com>
 <1225389430.2933.2.camel@beth-laptop>
 <491197BB.6080502@redhat.com>

Dor Laor wrote:
> Right, I think if this time drift fix approach is accepted, it should
> also be implemented for qemu_irq_pulse too.

I don't think simply injecting timer interrupts is the right approach.
I suspect doing that to compensate for lost ticks can _cause_ drift in
some guests.

Some guest kernels have lost-timer-interrupt detection.  For example,
by reading the local TSC, local APIC timer, PM timer, and/or HPET
counter, they can determine (on a real machine) when some timer
interrupts were missed, and compensate for it.

If QEMU sends a burst of timer interrupts to compensate for ones lost
to host scheduling delays, then on servicing the first interrupt of
the burst the guest may read a timer value which indicates that time
has jumped forward, and adjust its clock to compensate for the missing
interrupts.  On servicing the remaining interrupts injected by QEMU,
the guest will read timer values which have barely increased; but
since _extra_ timer interrupts don't occur on real hardware, the guest
may not implement the reverse compensation.  The net result: QEMU
sends a burst of timer interrupts, and the guest clock moves _forward_
by a few ticks.

I think a better way to handle this in QEMU is to still inject those
"lost" interrupts, but also to modify the values returned when the
guest reads its timers, so that they appear to increment by a normal
amount between each interrupt.  In other words, rather than counting
ticks, modify the flow of virtual time: stretch it when the virtual
CPU is supposed to run but is delayed, and compress it afterwards to
resync with real time.

On some architectures this may be even more important with "tickless"
guest kernels, which are busily running high-resolution timers with a
different delay between each event.
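To make that concrete, here is a rough sketch in C of the kind of
counter-read warping I mean.  None of these names exist in QEMU; it is
only the shape of the idea, under the assumption that QEMU tracks how
many interrupts it has actually delivered to the guest:

    /* Hypothetical sketch only -- not actual QEMU code.  QEMU injects
     * "catch-up" interrupts for periods the guest missed, and counter
     * reads are warped so the value advances by one nominal period per
     * delivered interrupt, instead of jumping ahead and tripping the
     * guest's lost-tick detection. */

    #include <stdint.h>

    typedef struct {
        uint64_t period;         /* counter ticks per interrupt (nominal) */
        uint64_t irqs_delivered; /* interrupts actually injected so far */
        uint64_t last_read;      /* last value reported, kept monotonic */
    } WarpTimer;

    /* Call whenever an interrupt (normal or catch-up) is injected. */
    static void warp_timer_irq_injected(WarpTimer *t)
    {
        t->irqs_delivered++;
    }

    /* 'real_count' is the free-running counter derived from host time;
     * returns the value the guest should see instead. */
    static uint64_t warp_timer_read(WarpTimer *t, uint64_t real_count)
    {
        /* Stretched time: one period per delivered interrupt.  During
         * a catch-up burst this advances at the normal apparent rate
         * even though almost no real time has passed. */
        uint64_t warped = t->irqs_delivered * t->period;

        /* Compression: catch-up interrupts arrive faster than real
         * time, so warped eventually reaches real_count, after which
         * the guest simply sees the real counter again.  Never run
         * ahead of real time, and never go backwards. */
        uint64_t v = (warped < real_count) ? warped : real_count;
        if (v < t->last_read) {
            v = t->last_read;
        }
        t->last_read = v;
        return v;
    }

(A real implementation would also add the fractional progress within
the current period, so the counter doesn't appear frozen between two
interrupts; that's left out to keep the sketch short.)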
Simply counting missed interrupts and injecting a burst doesn't work
if the guest depends on reprogramming the timer chip to a different
delay before each interrupt event (like the x86 PIT), but it's fine
with timer chips (like the HPET) which have a free-running counter and
an alarm register.

The time-warping method should work with all guests, whatever timers
they use and whatever their interrupt handlers do, and it results in
_zero_ long-term drift as long as the virtual CPU is not persistently
starved.  In the case of timer chips like the HPET, which have a
specified nominal frequency, it may mean you don't have to run NTP on
the guest at all.

-- 
Jamie