From mboxrd@z Thu Jan  1 00:00:00 1970
From: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Subject: Re: [PATCH 0/2] Time-related fixes for migration
Date: Mon, 31 Mar 2014 11:30:25 -0400
Message-ID: <53398A11.5000001@oracle.com>
References: <1396148751-6918-1-git-send-email-boris.ostrovsky@oracle.com>
	<AADFC41AFE54684AB9EE6CBC0274A5D125DCE226@SHSMSX101.ccr.corp.intel.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
Return-path: <xen-devel-bounces@lists.xen.org>
In-Reply-To: <AADFC41AFE54684AB9EE6CBC0274A5D125DCE226@SHSMSX101.ccr.corp.intel.com>
List-Unsubscribe: <http://lists.xen.org/cgi-bin/mailman/options/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xen.org>
List-Help: <mailto:xen-devel-request@lists.xen.org?subject=help>
List-Subscribe: <http://lists.xen.org/cgi-bin/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=subscribe>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: "Tian, Kevin" <kevin.tian@intel.com>
Cc: "ian.campbell@citrix.com" <ian.campbell@citrix.com>, "stefano.stabellini@eu.citrix.com" <stefano.stabellini@eu.citrix.com>, "Nakajima, Jun" <jun.nakajima@intel.com>, "Dong,
	Eddie" <eddie.dong@intel.com>, "ian.jackson@eu.citrix.com" <ian.jackson@eu.citrix.com>, "xen-devel@lists.xen.org" <xen-devel@lists.xen.org>, "jbeulich@suse.com" <jbeulich@suse.com>, "suravee.suthikulpanit@amd.com" <suravee.suthikulpanit@amd.com>
List-Id: xen-devel@lists.xenproject.org

On 03/31/2014 10:41 AM, Tian, Kevin wrote:
>> * The second patch keeps TSCs synchronized across VCPUs after save/restore.
>> Currently TSC values diverge after migration because during both save and
>> restore
>> we calculate them separately for each VCPU and base each calculation on
>> newly-read host's TSC.
>>
>> The problem can be easily demonstrated with this program for a 2-VCPU guest
>> (I am assuming here invariant TSC so, for example,
>> tsc_mode="always_emulate" (*)):
>>
>> int
>> main(int argc, char* argv[])
>> {
>>
>>    unsigned long long h = 0LL;
>>    int proc = 0;
>>    cpu_set_t set;
>>
>>    for(;;) {
>>      unsigned long long n = __native_read_tsc();
>>      if(h && n < h)
>>          printf("prev 0x%llx cur 0x%llx\n", h, n);
>>      CPU_ZERO(&set);
>>      proc = (proc + 1) & 1;
>>      CPU_SET(proc, &set);
>>      if (sched_setaffinity(0, sizeof(cpu_set_t), &set)) {
>>          perror("sched_setaffinity");
>>          exit(1);
>>      }
>>
>>      h = n;
>>    }
>> }
>>
> what's the backward drift range from above program? dozens of cycles?
> hundreds of cycles?

For the "raw" difference (i.e. between the TSC registers themselves)
it's usually tens of thousands of cycles, sometimes more (and sometimes
*much* more).

For example, here are the outputs of 'xl debug-key v' before and after a
migration of a 2-VCPU guest (first two lines before, last two after):

root@haswell> xl dmesg |grep Offset
(XEN) TSC Offset = ffffff54e63f1cab
(XEN) TSC Offset = ffffff54e63f1cab
(XEN) TSC Offset = ffffff6cb0a59ea9
(XEN) TSC Offset = ffffff6cb0a566ae
root@haswell>
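
To put a number on the output above: since a VCPU's guest TSC is the host
TSC plus its offset, the skew between two VCPUs is just the difference of
their offsets (assuming the host TSCs themselves are in sync). A minimal
sketch; tsc_offset_skew() is only an illustrative helper, not something in
Xen:

```c
#include <stdint.h>

/*
 * Absolute difference, in cycles, between the TSC offsets of two VCPUs.
 * Hypothetical helper for illustration only.
 */
uint64_t tsc_offset_skew(uint64_t off_a, uint64_t off_b)
{
    return off_a > off_b ? off_a - off_b : off_b - off_a;
}
```

For the offsets shown, this gives 0 before the migration (both are
ffffff54e63f1cab) and 0x37fb, i.e. 14331 cycles, afterwards
(ffffff6cb0a59ea9 vs. ffffff6cb0a566ae).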

From the guest's point of view it's a few hundred cycles: the call to
sched_setaffinity() between the two reads, for example, takes quite some
time and hides part of the raw difference.

And obviously as you increase the number of VCPUs (and, I think, guest
memory as well) the largest difference grows further.

-boris