All of lore.kernel.org
 help / color / mirror / Atom feed
From: Paolo Bonzini <pbonzini@redhat.com>
To: Roman Kagan <rkagan@virtuozzo.com>,
	kvm@vger.kernel.org, Andrey Smetanin <asmetanin@virtuozzo.com>,
	"Denis V. Lunev" <den@openvz.org>,
	Marcelo Tosatti <mtosatti@redhat.com>
Subject: Re: [PATCH kvm-unit-tests] KVM: x86: add hyperv clock test case
Date: Wed, 3 Feb 2016 17:37:08 +0100	[thread overview]
Message-ID: <56B22CB4.9090404@redhat.com> (raw)
In-Reply-To: <20160128162206.GA29344@rkaganb.sw.ru>



On 28/01/2016 17:22, Roman Kagan wrote:
> On Thu, Jan 28, 2016 at 03:04:58PM +0100, Paolo Bonzini wrote:
>> The test checks the relative precision of the reference TSC page
>> and the time reference counter.
>>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> ---
>> 	Andrey, the test has a relative error of approximately 3 parts
>> 	per million on my machine.  In other words it drifts by 3
>> 	microseconds every second, which I don't think is acceptable.
>> 	The problem is that virtual_tsc_khz is 2593993 while the actual
>> 	frequency is more like 2594001 kHz.
> 
> Hmm, how come?  I thought virtual_tsc_khz could only diverge from
> tsc_khz on migration.
> 
> [Or maybe I just misunderstand what you mean by "the actual frequency".]

Talking to Marcelo helped me understanding it. :)

The issue is that the TSC kHz is not the correct value to use for the  
the TSC page.  Instead, we want to pass to the guest a scale value 
corresponding to the kernel's timekeeping multiplier.  This is what 
both kvmclock and the time reference counter use.

The bit that I was missing last week was how the guest could perceive
updates to the TSC page parameters as atomic.  We actually _can_
emulate seqlock-like behavior in Hyper-V by writing 0 to seq during
an update.  Instead of looping like seqlocks do, the guest will simply
use the time reference counter and take a hypervisor exit.  The result
however is still valid, because we want the time reference counter to
be perfectly synchronized with the Hyper-V clock and lets you handle
any kvmclock update scenario safely.

Therefore the algorithm would be like patch 2/2 I sent, but with
important differences.  You'd still

	if (hv->hv_tsc_page & HV_X64_MSR_TSC_REFERENCE_ENABLE)
		kvm_make_request(KVM_REQ_MASTERCLOCK_UPDATE, vcpu);

on MSR writes, but then hook into kvm_guest_time_update rather than
kvm_gen_update_masterclock, like:

	if (v == kvm_get_vcpu(v->kvm, 0))
	        kvm_write_hyperv_tsc_page(v->kvm, &vcpu->hv_clock);

and in kvm_write_hyperv_tsc_page the calculation would be based on
the kvmclock parameters:

        {
                write 0 to seq;
                if (!(hv_clock->flags & PVCLOCK_TSC_STABLE_BIT))
                        return;

                compute scale and offset from hv_clock mul and shift;
                write scale and offset;
		write sequence
        }

all of scale, offset and sequence can be computed from kvmclock parameters.

For sequence we have to convert "even values between 0 and 0xFFFFFFFE"
into "values between "1 and 0xFFFFFFFE".  For example:

	hyper-v sequence = (kvmclock sequence >> 1) + 1

will do it.  Computing the scale and offset is something like:

        // KVM formula:
	//    nsec = (ticks - tsc_timestamp) * tsc_to_system_mul << tsc_shift + system_time
	//
	// hyper-v formula:
	//    nsec/100 = ticks * scale / 2^32 + offset
	//
	// when tsc_timestamp, offset is zero so remove them both:
        //    ticks * tsc_to_system_mul << tsc_shift / 100 = ticks * scale / 2^32
	//
	// multiply both sides by 2^32 / ticks and you get scale:
	//    scale = tsc_to_system_mul << (32 + tsc_shift) / 100
	//
	// check if it would overflow, and then use the time ref counter
	//    tsc_to_system_mul << (32 + tsc_shift) / 100 >= 2^32
	//    tsc_to_system_mul << 32 >= 100 * 2^32 << -tsc_shift
	//    tsc_to_system_mul >= 100 << -tsc_shift

	if (shift < 0)
		rhs = 100 << -shift;
	else
		rhs = 100 >> shift;

	if (tsc_to_system_mul >= rhs)
		return;

        scale = mul_u64_u32_div(1ULL << (32 + tsc_shift), tsc_to_system_mul, 100);

        // now expand the kvmclock formula:
        //    nsec = ticks * tsc_to_system_mul << tsc_shift - (tsc_timestamp * tsc_to_system_mul << tsc_shift) + system_time 
	// divide by 100:
	//    nsec/100 = ticks * tsc_to_system_mul << tsc_shift /100 - (tsc_timestamp * tsc_to_system_mul << tsc_shift) /100 + system_time /100
	// replace tsc_to_system_mul << tsc_shift /100 by scale / 2^32:
        //    nsec/100 = ticks * scale / 2^32 - (tsc_timestamp * scale / 2^32) + system_time / 100
	// comparing with the Hyper-V formula:
	//    offset = system_time / 100 - (tsc_timestamp * scale / 2^32)

	offset = system_time;
	do_div(offset, 100);

	timestamp_offset = tsc_timestamp;
	do_div32_shl32(timestamp_offset, scale);
        offset = offset - timestamp_offset;

The remaining part is _further_ adjusting the offset to
Would you like to finish implementing this and test it with the unit test?
kvm/queue already has the patch to introduce do_div32_shl32.

Paolo

  reply	other threads:[~2016-02-03 16:37 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-01-28 14:04 [PATCH kvm-unit-tests] KVM: x86: add hyperv clock test case Paolo Bonzini
2016-01-28 14:04 ` Paolo Bonzini
2016-01-28 14:25 ` Andrey Smetanin
2016-01-28 14:50   ` Paolo Bonzini
2016-01-28 15:53     ` Paolo Bonzini
2016-01-28 18:45       ` Roman Kagan
2016-01-28 18:53     ` Roman Kagan
2016-01-28 21:28       ` Paolo Bonzini
2016-01-28 16:22 ` Roman Kagan
2016-02-03 16:37   ` Paolo Bonzini [this message]
2016-02-04  9:33     ` Roman Kagan
2016-02-04 10:13       ` Paolo Bonzini
2016-02-04 11:12         ` Roman Kagan
2016-04-21 17:01     ` Roman Kagan
2016-04-22 13:32       ` Roman Kagan
2016-04-22 18:08         ` Paolo Bonzini
2016-04-25  8:47           ` Roman Kagan
2016-04-26 10:34             ` Roman Kagan
2016-05-25 18:33               ` Roman Kagan
2016-05-26 14:47                 ` Roman Kagan
2016-05-29 22:34                 ` Marcelo Tosatti
  -- strict thread matches above, loose matches on Subject: below --
2016-02-08 15:18 [PATCH 0/4] kvmclock: improve accuracy Paolo Bonzini
2016-02-08 15:18 ` [PATCH 1/4] KVM: x86: rename argument to kvm_set_tsc_khz Paolo Bonzini
2016-02-11 15:01   ` Marcelo Tosatti
2016-02-08 15:18 ` [PATCH 2/4] KVM: x86: rewrite handling of scaled TSC for kvmclock Paolo Bonzini
2016-02-11 15:23   ` Marcelo Tosatti
2016-02-08 15:18 ` [PATCH 3/4] KVM: x86: pass kvm_get_time_scale arguments in hertz Paolo Bonzini
2016-02-08 15:18 ` [PATCH 4/4] KVM: x86: track actual TSC frequency from the timekeeper struct Paolo Bonzini
2016-02-09 18:41   ` Owen Hofmann
2016-02-10 13:57     ` Paolo Bonzini
2016-02-16 13:48   ` Marcelo Tosatti
2016-02-16 14:25     ` Marcelo Tosatti
2016-02-16 16:59       ` Paolo Bonzini
2016-02-19 14:12         ` Marcelo Tosatti
2016-02-19 15:53           ` Paolo Bonzini
2016-02-16 14:00 ` [PATCH 0/4] kvmclock: improve accuracy Marcelo Tosatti
2016-08-31 14:13 [PATCH kvm-unit-tests] KVM: x86: add hyperv clock test case Roman Kagan
2016-09-01 16:07 ` Paolo Bonzini
2016-09-01 16:07 Paolo Bonzini

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=56B22CB4.9090404@redhat.com \
    --to=pbonzini@redhat.com \
    --cc=asmetanin@virtuozzo.com \
    --cc=den@openvz.org \
    --cc=kvm@vger.kernel.org \
    --cc=mtosatti@redhat.com \
    --cc=rkagan@virtuozzo.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.