Date: Fri, 14 Nov 2008 23:13:09 +0000
From: Jamie Lokier
To: Anthony Liguori
Cc: Carsten Otte, Anthony Liguori, Hollis Blanchard, kvm-devel, qemu-devel@nongnu.org, Paul Brook
Subject: Re: [Qemu-devel] [PATCH][RFC] Split non-TCG bits out of exec.c
Message-ID: <20081114231309.GD19384@shareable.org>
In-Reply-To: <491D8624.50800@codemonkey.ws>

Anthony Liguori wrote:
> Jamie Lokier wrote:
> >Also, an earlier thread pointed out that loops doing a lot of MMIO are
> >_slower_ with KVM than without - this manifested as very slow VGA
> >output for some guests. Having KVM pass control to TCG for short runs
> >of guest instructions which do MMIO, or other instructions which need
> >to be emulated, would accelerate KVM in this respect.
...
> An MMIO exit to userspace typically costs around 6k cycles.
> On the other hand, a TB translation tends to average closer to 300k,
> often times reaching much higher. This was with dyngen, so TCG may be
> more or less expensive.
>
> An in-kernel MMIO exit, on the other hand, will cost around 3k cycles.
...
> To make up the cost of TCG translation for just one TB, you need to have
> a tight loop of at least 50 iterations.

Firstly: That doesn't make sense. Why would you do an expensive TCG
translation every time you hit the same code? After the first
encounter, if the code page hasn't been modified, it should be a TB
cache lookup to already-translated code. I'm guessing the cost of a TB
cache lookup is much closer to 3k than 300k cycles, maybe even lower...

Secondly: In these cases, you can use a special fast translation (when
it's not cached) which just copies the instructions 1:1 from the guest,
simply converting the special instructions (MMIO, anything else needing
it) to helper calls. That's possible because you know the host
architecture is compatible with the guest, as it's running KVM.

> If you also consider all the potential locking issues with SMP guests, I
> think it's pretty likely that there are few cases where dropping to TCG
> is going to be a net performance win.

VMware claimed otherwise when Intel first brought out CPU support for
virtualisation. SMP works fine if you map guest instructions 1:1 to
host instructions with helper calls for special cases. Even atomics,
load-locked sequences and complex weak memory ordering things would
behave correctly.

Oops, I believe I just argued for keeping the TB cache and code
translation but not using TCG :-)

-- 
Jamie