From: "Yang, Sheng" <sheng.yang@intel.com>
To: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Alexander Graf <agraf@suse.de>,
	"David S. Ahern" <daahern@cisco.com>,
	"kvm-devel" <kvm@vger.kernel.org>,
	Glauber de Oliveira Costa <gcosta@redhat.com>,
	Avi Kivity <avi@redhat.com>, Gleb Natapov <gleb@redhat.com>,
	Dor Laor <dor.laor@qumranet.com>
Subject: Re: gettimeofday "slow" in RHEL4 guests
Date: Mon, 29 Dec 2008 20:37:32 +0800
Message-ID: <200812292037.33531.sheng.yang@intel.com>
In-Reply-To: <20081228183807.GA3883@amt.cnet>

On Monday 29 December 2008 02:38:07 Marcelo Tosatti wrote:
> On Tue, Nov 25, 2008 at 01:52:59PM +0100, Andi Kleen wrote:
> > > But yeah - the remapping of HPET timers to virtual HPET timers sounds
> > > pretty tough. I wonder if one could overcome that with a little
> > > hardware support though ...
> >
> > For gettimeofday better make TSC work. Even in the best case (no
> > virtualization) it is much faster than HPET because it sits in the CPU,
> > while HPET is far away on the external south bridge.
>
> The tsc clock on older Linux 2.6 kernels compensates for lost ticks.
> The algorithm uses the PIT count (latched) to measure the delay between
> interrupt generation and handling, and sums that value, on the next
> interrupt, to the TSC delta.
>
> Sheng investigated this problem in the discussions before in-kernel PIT
> was merged:
>
> http://www.mail-archive.com/kvm-devel@lists.sourceforge.net/msg13873.html
>
> The algorithm overcompensates for lost ticks and the guest time runs
> faster than the host's.
>
> There are two issues:
>
> 1) A bug in the in-kernel PIT which miscalculates the count value.
>
> 2) For the case where more than one interrupt is lost, and later
> reinjected, the value read from PIT count is meaningless for the purpose
> of the tsc algorithm. The count is interpreted as the delay until the
> next interrupt, which is not the case with reinjection.
>
> As Sheng mentioned in the thread above, Xen pulls back the TSC value
> when reinjecting interrupts. VMware ESX has a notion of "virtual TSC",
> which I believe is similar in this context.
>
> For KVM I believe the best immediate solution (for now) is to provide an
> option to disable reinjection, behaving similarly to real hardware. The
> advantage is simplicity compared to virtualizing the time sources.
>
> The QEMU PIT emulation has a limit on the rate of interrupt reinjection;
> perhaps something similar should be investigated in the future.
>
> The following patch (which contains the bugfix for 1) and disables
> reinjection) fixes the severe time drift on RHEL4 with "clock=tsc".
> What I'm proposing is to make reinjection conditional on an option
> (-kvm-pit-no-reinject or something).

I agree that it should go with a user space option to disable reinjection,
since it is hard to overcome the problem caused by delayed interrupt
injection.
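
As a rough illustration of why reinjection and guest-side lost-tick
compensation interact badly, here is a deliberately simplified model. It is
not the RHEL4 kernel code: the names guest_clock, guest_tick and run are
hypothetical, time is measured in whole tick periods, and the latched PIT
count term described above is left out. It only assumes that the guest adds
one tick per interrupt plus any ticks it judges lost from the TSC delta, and
that all pending interrupts are delivered back-to-back after a stall.

```c
#include <stdio.h>

/* Hypothetical, simplified guest clock state (not kernel code). */
struct guest_clock {
    long jiffies;   /* ticks the guest believes have elapsed */
    long last_tsc;  /* TSC at the previous interrupt, in tick periods */
};

/* One timer interrupt as seen by the guest. */
static void guest_tick(struct guest_clock *g, long tsc_now)
{
    long elapsed = tsc_now - g->last_tsc;

    g->last_tsc = tsc_now;
    g->jiffies += 1;                /* normal per-interrupt increment */
    if (elapsed >= 2)
        g->jiffies += elapsed - 1;  /* compensate for ticks judged lost */
}

/*
 * The guest is stalled for stalled_ticks periods, then every pending
 * interrupt is delivered at that same instant. With reinject == 0 the
 * pending count is coalesced to a single interrupt, which mirrors the
 * clamp on pt->pending in the quoted patch.
 */
static long run(long stalled_ticks, int reinject)
{
    struct guest_clock g = { 0, 0 };
    long pending = stalled_ticks;

    if (!reinject && pending > 1)
        pending = 1;

    while (pending-- > 0)
        guest_tick(&g, stalled_ticks);

    return g.jiffies;
}
```

In this model, run(10, 1) returns 19 while run(10, 0) returns 10: after a
stall of 10 tick periods the guest with reinjection counts the lost ticks
twice (once through the TSC delta, once through the reinjected interrupts),
whereas the coalescing case matches the real elapsed time.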

-- 
regards
Yang, Sheng

> Comments or better ideas?
>
>
> diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
> index e665d1c..608af7b 100644
> --- a/arch/x86/kvm/i8254.c
> +++ b/arch/x86/kvm/i8254.c
> @@ -201,13 +201,16 @@ static int __pit_timer_fn(struct kvm_kpit_state *ps)
>  	if (!atomic_inc_and_test(&pt->pending))
>  		set_bit(KVM_REQ_PENDING_TIMER, &vcpu0->requests);
>
> +	if (atomic_read(&pt->pending) > 1)
> +		atomic_set(&pt->pending, 1);
> +
>  	if (vcpu0 && waitqueue_active(&vcpu0->wq))
>  		wake_up_interruptible(&vcpu0->wq);
>
>  	hrtimer_add_expires_ns(&pt->timer, pt->period);
>  	pt->scheduled = hrtimer_get_expires_ns(&pt->timer);
>  	if (pt->period)
> -		ps->channels[0].count_load_time = hrtimer_get_expires(&pt->timer);
> +		ps->channels[0].count_load_time = ktime_get();
>
>  	return (pt->period == 0 ? 0 : 1);
>  }

