From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1750927AbdAPSBw (ORCPT <rfc822;w@1wt.eu>);
        Mon, 16 Jan 2017 13:01:52 -0500
Received: from mx1.redhat.com ([209.132.183.28]:50744 "EHLO mx1.redhat.com"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1750774AbdAPSBv (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
        Mon, 16 Jan 2017 13:01:51 -0500
Date: Mon, 16 Jan 2017 19:01:48 +0100
From: Radim Krcmar <rkrcmar@redhat.com>
To: Marcelo Tosatti <mtosatti@redhat.com>
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
        Paolo Bonzini <pbonzini@redhat.com>,
        Richard Cochran <richardcochran@gmail.com>,
        Miroslav Lichvar <mlichvar@redhat.com>
Subject: Re: [patch 3/3] PTP: add kvm PTP driver
Message-ID: <20170116180147.GD31452@potion>
References: <20170113120131.086634482@redhat.com>
 <20170113120319.777765254@redhat.com>
 <20170113155657.GD22440@potion>
 <20170113174014.GA9310@amt.cnet>
 <20170116162653.GA32097@potion>
 <20170116165411.GA2386@potion>
 <20170116170827.GB2501@amt.cnet>
 <20170116172758.GB31452@potion>
 <20170116173909.GA4639@amt.cnet>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20170116173909.GA4639@amt.cnet>
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.39]); Mon, 16 Jan 2017 18:01:51 +0000 (UTC)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

2017-01-16 15:39-0200, Marcelo Tosatti:
> On Mon, Jan 16, 2017 at 06:27:58PM +0100, Radim Krcmar wrote:
>> 2017-01-16 15:08-0200, Marcelo Tosatti:
>> > On Mon, Jan 16, 2017 at 05:54:11PM +0100, Radim Krcmar wrote:
>> >> 2017-01-16 17:26+0100, Radim Krcmar:
>> >> > 2017-01-13 15:40-0200, Marcelo Tosatti:
>> >> >> On Fri, Jan 13, 2017 at 04:56:58PM +0100, Radim Krcmar wrote:
>> >> >> > 2017-01-13 10:01-0200, Marcelo Tosatti:
>> >> >>> > +		version = pvclock_read_begin(src);
>> >> >>> > +
>> >> >>> > +		ret = kvm_hypercall2(KVM_HC_CLOCK_OFFSET,
>> >> >>> > +				     clock_off_gpa,
>> >> >>> > +				     KVM_CLOCK_OFFSET_WALLCLOCK);
>> >> >>> > +		if (ret != 0) {
>> >> >>> > +			pr_err("clock offset hypercall ret %lu\n", ret);
>> >> >>> > +			spin_unlock(&kvm_ptp_lock);
>> >> >>> > +			preempt_enable_notrace();
>> >> >>> > +			return -EOPNOTSUPP;
>> >> >>> > +		}
>> >> >>> > +
>> >> >>> > +		tspec.tv_sec = clock_off.sec;
>> >> >>> > +		tspec.tv_nsec = clock_off.nsec;
>> >> >>> > +
>> >> >>> > +		delta = rdtsc_ordered() - clock_off.tsc;
>> >> >>> > +
>> >> >>> > +		offset = pvclock_scale_delta(delta, src->tsc_to_system_mul,
>> >> >>> > +					     src->tsc_shift);
>> >> >>> > +
>> >> >>> > +	} while (pvclock_read_retry(src, version));
>> >> >>> > +
>> >> >>> > +	preempt_enable_notrace();
>> >> >>> > +
>> >> >>> > +	tspec.tv_nsec = tspec.tv_nsec + offset;
>> >> >>> > +
>> >> >>> > +	spin_unlock(&kvm_ptp_lock);
>> >> >>> > +
>> >> >>> > +	if (tspec.tv_nsec >= NSEC_PER_SEC) {
>> >> >>> > +		u64 secs = tspec.tv_nsec;
>> >> >>> > +
>> >> >>> > +		tspec.tv_nsec = do_div(secs, NSEC_PER_SEC);
>> >> >>> > +		tspec.tv_sec += secs;
>> >> >>> > +	}
>> >> >>> > +
>> >> >>> > +	memcpy(ts, &tspec, sizeof(struct timespec64));
>> >> >>> 
>> >> >>> But the whole idea of improving the time by reading tsc a bit later
>> >> >>> is just weird ... why is it better to provide
>> >> >>> 
>> >> >>>   tsc + x, time + tsc_delta_to_time(x)
>> >> >>> 
>> >> >>> than just
>> >> >>> 
>> >> >>>  tsc, time
>> >> >>> 
>> >> >>> ?
>> >> >> 
>> >> >> Because you want to calculate the value of the host realtime clock 
>> >> >> at the moment of ptp_kvm_gettime.
>> >> >> 
>> >> >> We do:
>> >> >> 
>> >> >> 	1. kvm_hypercall.
>> >> >> 	2. get {sec, nsec, guest_tsc}.
>> >> >> 	3. kvm_hypercall returns.
>> >> >> 	4. delay = rdtsc() - guest_tsc.
>> >> >> 
>> >> >> Where delay is the delta (measured with the TSC) between points 2 and 4.
>> >> > 
>> >> > I see now ... the PTP interface is just not good for our purposes.
>> >> 
>> >> There is getcrosststamp() callback in PTP, which seems to be exactly
>> >> what we want when pairing with TSC, so the pvclock delay fixup can be
>> >> dropped when using it.
>> > 
>> > Which pvclock delay fixup do you refer to? The "rdtsc() - clock_offset.tsc"
>> > part?
>> 
>> Yes.
>> 
>> >       You can't drop it, because if you do then your "host realtime
>> > clock read" will be behind by "rdtsc() - clock_offset.tsc" TSC cycles.
>> 
>> The TSC read will be some cycles old when the hypercall ends, but that
>> doesn't matter, because we will pass {sec, nsec, guest_tsc} to PTP and
>> PTP should plug them into kernel's realtime clock roughly like this:
>> 
>>   sec/nsec + (rdtsc() - guest_tsc) / tsc_freq
>> 
>> Adding delay to guest_tsc and sec/nsec cannot improve precision.
>> (And will likely degrade it as kvmclock's frequency is incorrect.)
>> 
>> > We want the highest precision possible.
>> 
>> I agree, which is why we don't want to lose precision in the delay
>> guesswork because of gettime64().
> 
> Sorry, the clock difference is 10ns now. So the guest clock is off by _10 ns_
> from the host clock.

That is pretty good.

> You are suggesting to use getcrosststamp instead, to drop the (rdtsc() -
> guest_tsc) part ?

Yes, it results in simpler code, doesn't create a dependency on the
dreaded kvmclock, and is the best we can currently do wrt. precision.
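
To illustrate the point about the delay fixup, here is a minimal
user-space sketch (not kernel code).  The names xtstamp, cycles_to_ns,
extrapolate_ns and apply_delay_fixup are made up for this example, and a
constant TSC frequency is assumed.  It models a consumer that
extrapolates realtime from a {time, tsc} pair, and compares the raw pair
with one that was advanced by the hypercall delay:

```c
#include <assert.h>
#include <stdint.h>

/* One sample: host realtime and the guest TSC value read at the same instant. */
struct xtstamp {
	uint64_t tsc;		/* guest TSC, in cycles */
	uint64_t time_ns;	/* host realtime, in nanoseconds */
};

/* Convert a cycle count to nanoseconds; split to avoid 64-bit overflow. */
static uint64_t cycles_to_ns(uint64_t cycles, uint64_t tsc_hz)
{
	assert(tsc_hz > 0);
	return (cycles / tsc_hz) * 1000000000ull +
	       (cycles % tsc_hz) * 1000000000ull / tsc_hz;
}

/* Consumer side: realtime at now_tsc, extrapolated from a sample. */
static uint64_t extrapolate_ns(struct xtstamp s, uint64_t now_tsc,
			       uint64_t tsc_hz)
{
	return s.time_ns + cycles_to_ns(now_tsc - s.tsc, tsc_hz);
}

/*
 * Producer-side fixup: move the sample forward to later_tsc, scaling
 * the delay with fixup_hz (which may differ from the real frequency).
 */
static struct xtstamp apply_delay_fixup(struct xtstamp s, uint64_t later_tsc,
					uint64_t fixup_hz)
{
	struct xtstamp r;

	r.tsc = later_tsc;
	r.time_ns = s.time_ns + cycles_to_ns(later_tsc - s.tsc, fixup_hz);
	return r;
}
```

With a 2 GHz TSC and a sample taken 4000 cycles before the hypercall
returns, the raw pair and the fixed-up pair extrapolate to the same
realtime (up to rounding in general).  Doing the same fixup with a wrong
1 GHz rate shifts the result by 2000 ns.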

Thanks.