From mboxrd@z Thu Jan 1 00:00:00 1970
From: Keir Fraser
Subject: Re: Performance difference between Xen versions
Date: Fri, 29 Apr 2011 15:58:55 +0100
References: <4DBABEAB.6090906@ts.fujitsu.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit
In-Reply-To: <4DBABEAB.6090906@ts.fujitsu.com>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Juergen Gross
Cc: "xen-devel@lists.xensource.com"
List-Id: xen-devel@lists.xenproject.org

On 29/04/2011 14:35, "Juergen Gross" wrote:

> On 04/29/11 15:28, Keir Fraser wrote:
>> Are you sure TSC runs at the same rate in the guest on both hypervisor
>> versions? Xen 4.0 might trap and emulate a more consistent but slower rate
>> TSC by default. 'tsc_mode=2' in your domain config file on 4.0 might be a
>> quick fix.
> Already done :-), so yes, I am sure the tsc rate is the same. The debug key
> 's' (softTSC stats) shows that no tsc is emulated.
>
> BTW: different tsc rate is improbable as the memory access loop shows
> nearly the same tsc difference...

Then I'm not sure. Maybe something got added to the VMEXIT/VMENTRY path that
is unexpectedly slow. You'll have to do a bit of digging.

 -- Keir

> Juergen
>
>> -- Keir
>>
>> On 29/04/2011 13:32, "Juergen Gross" wrote:
>>
>>> Hi,
>>>
>>> comparing performance of different Xen versions with BS2000 as HVM guest
>>> showed some weird data I'd like to understand.
>>>
>>> All measurements were done on an Intel Xeon E7220 box. We used a
>>> disk-benchmark and found the cpu utilization was much higher with Xen 4.0
>>> compared to Xen 3.3. I did some more investigation and narrowed things
>>> down to calls of the hypervisor (implicit or explicit).
>>>
>>> Following is a table with timing data for different low-level functions, all
>>> timing values are tsc ticks obtained via rdtsc:
>>>
>>>  Xen 3.3    Xen 4.0   Function
>>>       88        165   just the measurement overhead
>>>      176        330   rdtsc-instruction + cli/sti
>>>     5896      11044   lapic timer query
>>>     7381      13519   setting lapic timer
>>>     4653       8987   reload of cr3
>>>     3124       5709   invlpg instruction
>>>   792253     792264   wbinvd instruction
>>>      748       1375   int + iret
>>>     5203       9317   hypervisor yield call
>>> 12598102   12597882   memory access loop
>>>
>>> All operations involving the hypervisor take nearly twice the time on 4.0.
>>> Operations not involving the hypervisor (wbinvd and memory access loop) are
>>> the same on both systems (this rules out the possibility of different rdtsc
>>> behavior).
>>>
>>> Is there any easy explanation for this? Both Xen versions are from SLES
>>> (SLES11 or SLES11 SP1).
>>>
>>>
>>> Juergen
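
For reference, the per-row slowdown implied by the quoted table can be worked out directly. The following is only a minimal sketch in Python, not part of the original measurements: the tick counts are copied from the table, while the names TIMINGS, slowdown and ratios are hypothetical and chosen for illustration.

```python
# Timings in TSC ticks, copied from the table in the quoted message.
# Each row: (function, ticks on Xen 3.3, ticks on Xen 4.0)
TIMINGS = [
    ("just the measurement overhead", 88, 165),
    ("rdtsc-instruction + cli/sti", 176, 330),
    ("lapic timer query", 5896, 11044),
    ("setting lapic timer", 7381, 13519),
    ("reload of cr3", 4653, 8987),
    ("invlpg instruction", 3124, 5709),
    ("wbinvd instruction", 792253, 792264),
    ("int + iret", 748, 1375),
    ("hypervisor yield call", 5203, 9317),
    ("memory access loop", 12598102, 12597882),
]


def slowdown(ticks_33, ticks_40):
    """Return the Xen 4.0 / Xen 3.3 tick ratio for one row."""
    return ticks_40 / ticks_33


def ratios(rows=TIMINGS):
    """Map each function name to its 4.0 / 3.3 tick ratio."""
    return {name: slowdown(t33, t40) for name, t33, t40 in rows}


if __name__ == "__main__":
    # Print one line per row: ratio first, then the function name.
    for name, ratio in ratios().items():
        print(f"{ratio:6.3f}  {name}")
```

With these numbers, wbinvd and the memory access loop come out at about 1.00, and the other eight rows fall between roughly 1.79 and 1.93, which matches the "nearly twice the time" description.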