From mboxrd@z Thu Jan  1 00:00:00 1970
From: Radim Krcmar <rkrcmar@redhat.com>
Subject: Re: [patch 2/3] KVM: x86: add option to advance tscdeadline hrtimer
 expiration
Date: Mon, 5 Jan 2015 19:12:36 +0100
Message-ID: <20150105181235.GA5462@potion.brq.redhat.com>
References: <20141223205841.410988818@redhat.com>
 <20141223210046.824105975@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: QUOTED-PRINTABLE
Cc: kvm@vger.kernel.org, Luiz Capitulino <lcapitulino@redhat.com>,
	Rik van Riel <riel@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>
To: Marcelo Tosatti <mtosatti@redhat.com>
Return-path: <kvm-owner@vger.kernel.org>
Content-Disposition: inline
In-Reply-To: <20141223210046.824105975@redhat.com>
Sender: kvm-owner@vger.kernel.org
List-ID: <kvm.vger.kernel.org>

2014-12-23 15:58-0500, Marcelo Tosatti:
> For the hrtimer which emulates the tscdeadline timer in the guest,
> add an option to advance expiration, and busy spin on VM-entry waiting
> for the actual expiration time to elapse.
>
> This allows achieving low latencies in cyclictest (or any scenario
> which requires strict timing regarding timer expiration).
>
> Reduces average cyclictest latency from 12us to 8us
> on Core i5 desktop.
>
> Note: this option requires tuning to find the appropriate value
> for a particular hardware/guest combination. One method is to measure the
> average delay between apic_timer_fn and VM-entry.
> Another method is to start with 1000ns, and increase the value
> in say 500ns increments until avg cyclictest numbers stop decreasing.
>
> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
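
To make the mechanism in the commit message concrete, here is a minimal
userspace sketch of the arithmetic, not the patch's code.  The helper names
advanced_expiry_ns() and spin_needed_ns() are hypothetical, and all times are
assumed to be in host nanoseconds on one clock.

```c
#include <stdint.h>

/*
 * Hypothetical helper: when to fire the host hrtimer for a guest deadline.
 * advance_ns stands in for lapic_timer_advance_ns.
 */
static uint64_t advanced_expiry_ns(uint64_t deadline_ns, uint64_t advance_ns)
{
	/* Clamp so that a large advance cannot wrap below zero. */
	return deadline_ns > advance_ns ? deadline_ns - advance_ns : 0;
}

/*
 * Hypothetical helper: how long to busy spin once VM-entry is reached.
 * A result of 0 means entry happened at or after the deadline, i.e. the
 * advance was too small to hide the injection delay.
 */
static uint64_t spin_needed_ns(uint64_t deadline_ns, uint64_t now_ns)
{
	return now_ns < deadline_ns ? deadline_ns - now_ns : 0;
}
```

For example, with a deadline at 100000, an advance of 1500 and an assumed
1200 ns between the timer firing and VM-entry, the timer fires at 98500,
entry is reached at 99700, and the remaining spin is 300 ns.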

Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>

(Other patches weren't touched, so my previous Reviewed-by holds.)
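
The second tuning method from the commit message (start at 1000ns, step by
500ns until the average stops decreasing) can be sketched as a small search
loop.  This is only an illustration: measure_fn is a hypothetical callback
that would set the parameter and return the average cyclictest latency,
max_ns is an assumed safety bound, and toy_measure() is a made-up model
rather than measured data.

```c
#include <stdint.h>

/* Hypothetical callback: average latency in ns for a given advance. */
typedef uint64_t (*measure_fn)(unsigned int advance_ns);

/*
 * Start at 1000ns and step by 500ns until the average stops decreasing.
 * max_ns is an assumed upper bound, not something taken from the patch.
 */
static unsigned int tune_advance_ns(measure_fn measure, unsigned int max_ns)
{
	unsigned int best = 1000;
	uint64_t best_avg = measure(best);

	for (unsigned int adv = best + 500; adv <= max_ns; adv += 500) {
		uint64_t avg = measure(adv);

		if (avg >= best_avg)	/* no further improvement: stop */
			break;
		best = adv;
		best_avg = avg;
	}
	return best;
}

/* Toy model for illustration only: latency bottoms out at 2000ns advance. */
static uint64_t toy_measure(unsigned int advance_ns)
{
	return advance_ns < 2000 ? 12000 - 2 * advance_ns : 8000;
}
```

With the toy model, tune_advance_ns(toy_measure, 10000) settles on 2000.
Real averages are noisy, so a practical version would probably want several
runs per step before deciding that the numbers stopped decreasing.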

> +++ kvm/arch/x86/kvm/x86.c
> @@ -108,6 +108,10 @@ EXPORT_SYMBOL_GPL(kvm_max_guest_tsc_khz)
>  static u32 tsc_tolerance_ppm = 250;
>  module_param(tsc_tolerance_ppm, uint, S_IRUGO | S_IWUSR);
>
> +/* lapic timer advance (tscdeadline mode only) in nanoseconds */
> +unsigned int lapic_timer_advance_ns = 0;
> +module_param(lapic_timer_advance_ns, uint, S_IRUGO | S_IWUSR);
> +
>  static bool backwards_tsc_observed = false;
>
>  #define KVM_NR_SHARED_MSRS 16
> @@ -5625,6 +5629,10 @@ static void kvm_timer_init(void)
>  	__register_hotcpu_notifier(&kvmclock_cpu_notifier_block);
>  	cpu_notifier_register_done();
>
> +	if (check_tsc_unstable() && lapic_timer_advance_ns) {
> +		pr_info("kvm: unstable TSC, disabling lapic_timer_advance_ns\n");
> +		lapic_timer_advance_ns = 0;

Does an unstable TSC invalidate this feature?
(lapic_timer_advance_ns can be overridden at runtime, so the check does not
 catch workflows that start with 0 and calibrate afterwards.)
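
A minimal userspace model of that concern, assuming the check runs only once
at init.  The struct and the model_*() helpers are hypothetical stand-ins,
not kernel code; only the condition itself is taken from the hunk above.

```c
#include <stdbool.h>

/* Userspace model; field names mirror the patch, nothing else does. */
struct kvm_model {
	bool tsc_unstable;			/* stands in for check_tsc_unstable() */
	unsigned int lapic_timer_advance_ns;	/* the module parameter */
};

/* Models the hunk in kvm_timer_init(): a one-time check at init. */
static void model_timer_init(struct kvm_model *m)
{
	if (m->tsc_unstable && m->lapic_timer_advance_ns)
		m->lapic_timer_advance_ns = 0;
}

/* Models a later write to the parameter (S_IWUSR): nothing re-checks. */
static void model_param_write(struct kvm_model *m, unsigned int ns)
{
	m->lapic_timer_advance_ns = ns;
}
```

Loading with a non-zero value on an unstable TSC gets the value zeroed by
model_timer_init(), but a model_param_write() afterwards leaves it non-zero,
so the two workflows end up treated differently.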

Also, the cover letter is a bit misleading: the condition does nothing to
guarantee a TSC-based __delay() loop.  (Right now, __delay() = delay_tsc()
whenever the hardware has a TSC, regardless of stability, thus always.)