From mboxrd@z Thu Jan 1 00:00:00 1970
From: Juergen Gross
Subject: Re: Poor HVM performance with 8 vcpus
Date: Wed, 14 Oct 2009 12:44:35 +0200
Message-ID: <4AD5AB93.7050909@ts.fujitsu.com>
References: <4AD588D9.4040104@ts.fujitsu.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Gianluca Guida
Cc: Tim Deegan, "xen-devel@lists.xensource.com", Keir Fraser
List-Id: xen-devel@lists.xenproject.org

Gianluca Guida wrote:
> Ah, those good old OOS talks. I fear I am going to fail on my attempt
> to be laconic. :-)
>
> On Wed, Oct 14, 2009 at 10:35 AM, Keir Fraser wrote:
>> On 14/10/2009 09:16, "Juergen Gross" wrote:
>>
>>> as the performance of BS2000 seems to be hit by OOS optimization, I'm
>>> thinking of making a patch to disable this feature by a domain parameter.
>>>
>>> Is there a way to do this without having to change all places where the
>>> #if statements are placed?
>>> I think there should be some central routines where adding an "if" could
>>> be enough (setting oos_active to 0 seems not to be enough, I fear).
>>>
>>> Do you have any hint?
>> How about disabling it for domains with more than four VCPUs? Have you
>> measured performance with OOS for 1-4 VCPU guests? This is perhaps not
>> something that needs to be baked into guest configs.
>
> In general, shadow code loses performances as the vcpus increase (>=4)
> because of the single shadow lock (and getting rid of the shadow lock,
> i.e. having per-vcpu shadows wouldn't help, since it would make much
> slower the most common operation, that is removing writable access of
> guest pages).
> But the two algorithms (always in-sync vs. OOS) will show their
> performance penalties in two different areas: in a scenario where
> guests do lot of PTE writes (read Windows in most of its operations)
> the in-sync approach will be more penalizing, because emulation is
> slow and needs the shadow lock, while scenarios were guests tend to
> have many dirty CR3 switches (that is CR3 switches with freshly
> written PTEs, as in the case with Juergen benchmark and the famous
> Windows parallel ddk build) will be penalized more by the OOS
> algorithm.
>
> Disabling OOS for domains more than 4 vcpus might be a good idea, but
> not necessarily optimal. Furthermore, I always understood that a good
> practice for VM performance is to have many small VMs instead of a VM
> eating all of the host's CPUs, at least when shadow code is on. With
> big VMs, EPT/NPT has always been the best approach, since even with
> lot of TLB misses, the system was definitely lock-free in most of the
> VM's life.
>
> Creating a per-domain switch should be a good idea, but a more generic
> (and correct) approach would be to have a dynamic policy for OOSing
> pages, in which we would stop putting OOS pages when we realize that
> we are resynch'ing too many pages in CR3 switches. This was taken in
> consideration during the development of the OOS, but it was finally
> discarded because performance were decent and big VMs were not in the
> interest range.
>
> Yes, definitely away from spartan wit. But I hope this clarifies the issue.

It really does. I think I'll start with a per-domain switch and leave the
generic approach to the specialists. ;-)

If, however, Keir rejects such a switch, I could try the generic solution,
but I think that solution would need a lot of work to find the correct
parameters.
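
For illustration only, the two options discussed above (a static per-domain
switch, and a dynamic policy that stops unsyncing pages when CR3 switches
resync too many of them) could be combined roughly as sketched below. This is
not Xen code: struct oos_policy, its fields, the function names and the
low/high thresholds are all hypothetical, and the moving-average weight of 1/8
is an arbitrary assumption. Choosing real thresholds is exactly the tuning
work mentioned above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-domain policy state; none of these names exist in Xen. */
struct oos_policy {
    bool     admin_enabled;  /* static per-domain switch (e.g. from the guest config) */
    bool     allow_unsync;   /* current dynamic decision */
    unsigned avg_resync_x8;  /* moving average of resyncs per CR3 switch, scaled by 8 */
    unsigned low_mark;       /* resume unsyncing once the average drops below this */
    unsigned high_mark;      /* stop unsyncing once the average rises above this */
};

static void oos_policy_init(struct oos_policy *p, bool admin_enabled,
                            unsigned low_mark, unsigned high_mark)
{
    assert(low_mark <= high_mark);
    p->admin_enabled = admin_enabled;
    p->allow_unsync  = admin_enabled;
    p->avg_resync_x8 = 0;
    p->low_mark      = low_mark;
    p->high_mark     = high_mark;
}

/* Call once per CR3 switch with the number of pages that had to be resynced. */
static void oos_policy_note_cr3_switch(struct oos_policy *p, unsigned resynced)
{
    /* Exponential moving average with weight 1/8, kept in fixed point. */
    p->avg_resync_x8 = p->avg_resync_x8 - p->avg_resync_x8 / 8 + resynced;

    unsigned avg = p->avg_resync_x8 / 8;

    /* Two thresholds give hysteresis, so the decision does not flap. */
    if (avg > p->high_mark)
        p->allow_unsync = false;
    else if (avg < p->low_mark)
        p->allow_unsync = p->admin_enabled;
}

/* The single central check a caller would make before letting a page go OOS. */
static bool oos_policy_may_unsync(const struct oos_policy *p)
{
    return p->admin_enabled && p->allow_unsync;
}
```

With admin_enabled set to false the check always fails, which corresponds to
the plain per-domain switch. With it set to true, a run of CR3 switches that
each resync many pages turns unsyncing off until the average falls below
low_mark again.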

Juergen

-- 
Juergen Gross                 Principal Developer Operating Systems
TSP ES&S SWE OS6                       Telephone: +49 (0) 89 636 47950
Fujitsu Technology Solutions              e-mail: juergen.gross@ts.fujitsu.com
Otto-Hahn-Ring 6                        Internet: ts.fujitsu.com
D-81739 Muenchen                 Company details: ts.fujitsu.com/imprint.html