From mboxrd@z Thu Jan 1 00:00:00 1970
From: Juergen Gross
Subject: Re: Poor HVM performance with 8 vcpus
Date: Wed, 14 Oct 2009 10:16:25 +0200
Message-ID: <4AD588D9.4040104@ts.fujitsu.com>
References: <4ACC3B49.4060500@ts.fujitsu.com> <4ACD907F.7030505@ts.fujitsu.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
In-Reply-To: <4ACD907F.7030505@ts.fujitsu.com>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Juergen Gross
Cc: Gianluca Guida, "xen-devel@lists.xensource.com"
List-Id: xen-devel@lists.xenproject.org

Gianluca,

as the performance of BS2000 seems to be hit by OOS optimization, I'm thinking
of making a patch to disable this feature by a domain parameter.

Is there a way to do this without having to change all places where the
#if statements are placed?
I think there should be some central routines where adding an "if" could be
enough (setting oos_active to 0 seems not to be enough, I fear).

Do you have any hint?


Juergen

Juergen Gross wrote:
> Hi,
>
> Gianluca Guida wrote:
>> Hi,
>>
>> On Wed, Oct 7, 2009 at 8:55 AM, Juergen Gross wrote:
>>> we've got massive performance problems running an 8 vcpu HVM-guest (BS2000)
>>> under XEN (xen 3.3.1).
>>>
>>> With a specific benchmark producing a rather high load on memory management
>>> operations (lots of process creation/deletion and memory allocation) the 8
>>> vcpu performance was worse than the 4 vcpu performance. On other platforms
>>> (/390, MIPS, SPARC) this benchmark scaled rather well with the number of cpus.
>>>
>>> The result of the usage of the software performance counters of XEN seemed
>>> to point to the shadow lock being the reason. I modified the Hypervisor to
>>> gather some lock statistics (patch will be sent soon) and found that the
>>> shadow lock is really the bottleneck. On average 4 vcpus are waiting to get
>>> the lock!
>>>
>>> Is this a known issue?
>> Actually, I think so. The OOS optimization is widely known not to be
>> too scalable at 8 vcpus in the current state, since its weak point is
>> the CR3 switching time increasing linearly with the number of cpus. If
>> you have a lot of process switches together with a lot of PTE writes
>> (as it seems to be the case for your benchmark) then that's probably
>> the cause.
>>
>> Could you try disabling the OOS optimization from the
>> SHADOW_OPTIMIZATIONS definition?
>
> Great!
> First performance data looks okay!
> We will have to run different benchmarks in different configurations, but I
> think you gave an excellent hint. :-)

-- 
Juergen Gross                 Principal Developer Operating Systems
TSP ES&S SWE OS6                       Telephone: +49 (0) 89 636 47950
Fujitsu Technology Solutions              e-mail: juergen.gross@ts.fujitsu.com
Otto-Hahn-Ring 6                        Internet: ts.fujitsu.com
D-81739 Muenchen                 Company details: ts.fujitsu.com/imprint.html