All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Joao Martins <joao.m.martins@oracle.com>, xen-devel@lists.xen.org
Cc: Haozhong Zhang <haozhong.zhang@intel.com>,
	Keir Fraser <keir@xen.org>, Jan Beulich <jbeulich@suse.com>
Subject: Re: [PATCH RFC 0/6] x86/time: PVCLOCK_TSC_STABLE_BIT support
Date: Tue, 29 Dec 2015 14:58:51 +0000	[thread overview]
Message-ID: <56829FAB.1060008@citrix.com> (raw)
In-Reply-To: <1451321985-13728-1-git-send-email-joao.m.martins@oracle.com>

On 28/12/2015 16:59, Joao Martins wrote:
> Hey!
>
> I've been working on pvclock vdso support on Linux guests, and came
> across Xen lacking support for PVCLOCK_TSC_STABLE_BIT flag which is
> required for vdso as of the latest pvclock ABI shared with KVM.

, and originally borrowed from Xen.

Please be aware that all of this was originally the Xen ABI (c/s 
1271b793, 2005) and was added to Linux (c/s 7af192c, 2008) for join use 
with KVM.  In particular, Linux c/s 424c32f1a (which introduces 'flags') 
and every subsequent change in pvclock-abi.h violates the comment at the 
top, reminding people that the hypervisors must be kept in sync.

By the looks of things, the structures are still compatible, and having 
the two in sync is in everyones best interest. The first steps here need 
to be Linux upstreaming their local modifications, and further efforts 
made to ensuring that ABI changes don't go unnoticed as far as Xen is 
concerned (entry in the maintainers file with xen-devel listed?)

> In addition, I've found some problems which aren't necessarily visible
> to kernel as the pvclock algorithm in linux keeps the highest pvti
> time read among all cpus. But as is, a process using vdso gettimeofday
> observes a significant amount of warps (i.e. time going backwards) and
> it could be due to 1) time calibration skew in xen rendezvous
> algorithm, 2) xen clocksource not in sync with TSC and 3) in
> situations when guests unaware of VCPUS moving to a different PCPU.
> The warps are seen more frequently on PV guests (potentially because
> vcpu time infos are only updated when guest is in kernel mode, and
> perhaps lack of tsc_offset?), and in rare ocasions on HVM guests. And
> it is worth noting that with guests VCPUs pinned, only PV guests see
> these warps. But on HVM guests specifically: such warps only occur
> when one of guest VCPUs is pinned to CPU0.

These are all concerning findings (especially the pinning on cpu0). 
Which version of Xen have you been developing on?

Haozhong Zhang (CC'd) found and fixed several timing related bugs as 
part of his VMX TSC Scaling support series (Message root at 
<1449435529-12989-1-git-send-email-haozhong.zhang@intel.com>). I would 
be surprised if your identified bugs and his identified bugs didn't at 
least partially overlap.  (Note that most of the series has yet to be 
applied).

As to the specific options you identify, the time calibration rendezvous 
is (or should be) a nop on modern boxes with constant/invarient/etc 
TSCs.  I have a low priority TODO item to investigating conditionally 
disabling it, as it is a scalability concern on larger boxes.  Option 2 
seems more likely.

As for option 3, on a modern system it shouldn't make any difference.  
On an older system, it can presumably only be fixed by a guest 
performing its rdtsc between the two version checks, to ensure that it 
sees a consistent timestamp and scale, along with the hypervisor bumping 
version on deschedule, and against on schedule. However, this might be 
contrary to proposed plan to have one global pv wallclock.

>
> PVCLOCK_TSC_STABLE_BIT is the flag telling the guest that the
> vcpu_time_info (pvti) are monotonic as seen by any CPU,
> and supporting it could help fixing the issue mentioned above.

Surely fixing some of these bugs are prerequisites to supporting 
TSC_STABLE_BIT ?  Either way, we should see about doing both.

>   This
> series aims to propose a solution to that and it's divided as
> following:
>
> 	* Patch 1: Adds the missing flag field to vcpu_time_info.
> 	* Patch 2: Adds a new clocksource based on TSC
> 	* Patch 3, 4: Cleanups for patch 5
> 	* Patch 5: Implements the PVCLOCK_TSC_STABLE_BIT.
> 	* Patch 6: Same as 5 before but for other platform timers
>
> I have some doubts on the correctness of Patch 6 but was the only
> solution I found for supporting PVCLOCK_TSC_STABLE_BIT when using
> other clocksources (i.e. != tsc). The test was running time-warp-test,
> that constantly calls clock_gettime/gettimeofday on every CPU. It
> measures a delta with the previous returned value and mark a warp if
> it's negative. I measured it during periods of 1h and 6h and check how
> many warps and their values (alongside the amount of time skew).
> Measurements are included in individual patches.
>
> Note that most of the testing has been done with Linux 4.4 in which
> these warps/skew could be easily observed as vdso would use each vCPU
> pvti. Though Linux 4.5 will change this behaviour and use only the
> vCPU0 pvti though still requiring PVCLOCK_TSC_STABLE_BIT usage.
>
> I've been using it for a while in the past weeks with no issues so far
> though still requires testing on bad TSC machines. But I would like to
> get folks comments/suggestions if this is the correct approach in
> implementing this flag.

If you have some test instructions, I can probably find some bad TSC 
machines amongst the selection of older hardware in the XenServer test pool.

~Andrew

  parent reply	other threads:[~2015-12-29 14:58 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-12-28 16:59 [PATCH RFC 0/6] x86/time: PVCLOCK_TSC_STABLE_BIT support Joao Martins
2015-12-28 16:59 ` [PATCH RFC 1/6] public/xen.h: add flags field to vcpu_time_info Joao Martins
2016-01-25 20:11   ` Konrad Rzeszutek Wilk
2016-01-26 10:31     ` Joao Martins
2015-12-28 16:59 ` [PATCH RFC 2/6] x86/time: implement tsc as clocksource Joao Martins
2015-12-29 15:11   ` Andrew Cooper
2015-12-29 17:37     ` Joao Martins
2015-12-28 16:59 ` [PATCH RFC 3/6] x86/time: streamline platform time init on plt_init() Joao Martins
2016-01-25 20:26   ` Konrad Rzeszutek Wilk
2016-01-26 10:31     ` Joao Martins
2015-12-28 16:59 ` [PATCH RFC 4/6] x86/time: refactor read_platform_stime() Joao Martins
2015-12-28 16:59 ` [PATCH RFC 5/6] x86/time: implement PVCLOCK_TSC_STABLE_BIT Joao Martins
2015-12-28 16:59 ` [PATCH RFC 6/6] x86/time: convert counter to tsc for non-tsc clocksource Joao Martins
2015-12-29 14:58 ` Andrew Cooper [this message]
2015-12-29 17:37   ` [PATCH RFC 0/6] x86/time: PVCLOCK_TSC_STABLE_BIT support Joao Martins
2016-02-22 21:14     ` Joao Martins
2016-02-23  8:09       ` Jan Beulich
2016-02-23 12:13         ` Joao Martins
2015-12-30  3:47   ` Haozhong Zhang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=56829FAB.1060008@citrix.com \
    --to=andrew.cooper3@citrix.com \
    --cc=haozhong.zhang@intel.com \
    --cc=jbeulich@suse.com \
    --cc=joao.m.martins@oracle.com \
    --cc=keir@xen.org \
    --cc=xen-devel@lists.xen.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.