From: Fabrice Bellard
Date: Mon, 02 Jun 2008 11:02:09 +0200
Subject: Re: [Qemu-devel] Re: KQEMU code organization
To: qemu-devel@nongnu.org

Anthony Liguori wrote:
> Fabrice Bellard wrote:
>> Anthony Liguori wrote:
>>
>>> [...]
>>> FWIW, the l1_phys_map table is a current hurdle in getting
>>> performance. When we use proper accessors to access the virtio_ring,
>>> we end up taking a significant performance hit (around 20% on iperf).
>>> I have some simple patches that implement a page_desc cache that
>>> caches the RAM regions in a linear array. That helps get most of it
>>> back.
>>>
>>> I'd really like to remove the l1_phys_map entirely and replace it
>>> with a sorted list of regions. I think this would have an overall
>>> performance improvement since it is much more cache friendly. One
>>> thing keeping this from happening is the fact that the data structure
>>> is passed up to the kernel for kqemu. Eliminating that dependency
>>> would be a very good thing!
>>
>> If the l1_phys_map is a performance bottleneck, it means that the
>> internals of QEMU are not properly used. In QEMU/kqemu, it is not
>> accessed to do I/Os: a cache is used through tlb_table[]. I don't see
>> why KVM cannot use a similar system.
>
> This is for device emulation. KVM doesn't use l1_phys_map() for things
> like shadow page table accesses.
>
> In the device emulation, we're currently using stl_phys() and friends.
> This goes through a full lookup in l1_phys_map.
>
> Looking at other devices, some use phys_ram_base + PA and stl_raw(),
> which is broken but faster. A few places call
> cpu_get_physical_page_desc(), then use phys_ram_base and stl_raw().
> This is okay, but it still requires at least one l1_phys_map lookup per
> operation in the device (packet receive, io notification, etc.). I
> don't think that's going to help much because in our fast paths, we're
> only doing 2 or 3 stl_phys() operations.
>
> At least on x86, there are very few regions of RAM. That makes it very
> easy to cache.
> A TLB style cache seems wrong to me because there are so few RAM
> regions. I don't see a better way to do this with the existing APIs.

I see your point. st/ldx_phys() were never optimized, in fact.

A first solution would be to use a cache similar to the TLBs. It has
the advantage of being quite generic and fast.

Another solution would be to compute a few intervals which are tested
before the generic case. These intervals would correspond to the main
RAM area and would be updated each time a new device region is
registered. A rough sketch of that idea follows below.

Does your remark imply that KVM switches back to the QEMU process for
each I/O? If so, the l1_phys_map access time should be negligible
compared to the SVM-VMX/kernel/user context switch!

Fabrice.
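As an illustration of the interval idea (and of the linear-array cache
Anthony mentions), here is a minimal, self-contained sketch. It is not
the actual patch; the names ram_interval_t, ram_interval_add,
stl_phys_fast and stl_phys_slow are invented for this example, and
stl_phys_slow merely stands in for the existing generic
stl_phys()/l1_phys_map path. Endianness handling and region
unregistration are ignored for brevity.

/* Sketch only: a tiny array of RAM intervals checked before the
 * generic physical-memory lookup. All names here are hypothetical. */

#include <stdint.h>
#include <string.h>

#define MAX_RAM_INTERVALS 16

typedef struct {
    uint64_t start;      /* guest physical start address */
    uint64_t end;        /* exclusive guest physical end address */
    uint8_t *host_base;  /* host pointer corresponding to 'start' */
} ram_interval_t;

static ram_interval_t ram_intervals[MAX_RAM_INTERVALS];
static int nb_ram_intervals;

/* To be called each time a new RAM region is registered, so the
 * interval table stays in sync with the memory map. */
static void ram_interval_add(uint64_t start, uint64_t size,
                             uint8_t *host_base)
{
    if (nb_ram_intervals < MAX_RAM_INTERVALS) {
        ram_intervals[nb_ram_intervals].start = start;
        ram_intervals[nb_ram_intervals].end = start + size;
        ram_intervals[nb_ram_intervals].host_base = host_base;
        nb_ram_intervals++;
    }
}

/* Stand-in for the existing generic path (l1_phys_map lookup plus I/O
 * dispatch); declared only, since it is outside the scope of the sketch. */
void stl_phys_slow(uint64_t addr, uint32_t val);

/* Fast path: on x86 there are very few RAM intervals, so a linear scan
 * of this small array is cache friendly and almost always hits.
 * Assumes host and target share the same byte order. */
static inline void stl_phys_fast(uint64_t addr, uint32_t val)
{
    int i;

    for (i = 0; i < nb_ram_intervals; i++) {
        ram_interval_t *ri = &ram_intervals[i];
        if (addr >= ri->start && addr + 4 <= ri->end) {
            memcpy(ri->host_base + (addr - ri->start), &val, 4);
            return;
        }
    }
    /* Not plain RAM (or crosses an interval boundary): fall back. */
    stl_phys_slow(addr, val);
}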