From: Anthony Liguori
Date: Sun, 01 Jun 2008 17:58:37 -0500
Subject: Re: [Qemu-devel] Re: KQEMU code organization
To: qemu-devel@nongnu.org
Message-ID: <4843299D.6050902@codemonkey.ws>
In-Reply-To: <483F25A2.1090108@bellard.org>

Fabrice Bellard wrote:
> Anthony Liguori wrote:
>> [...]
>> FWIW, the l1_phys_map table is a current hurdle for performance. When
>> we use the proper accessors to access the virtio ring, we take a
>> significant performance hit (around 20% on iperf). I have some simple
>> patches that implement a page_desc cache, caching the RAM regions in
>> a linear array; that gets most of the loss back.
>>
>> I'd really like to remove l1_phys_map entirely and replace it with a
>> sorted list of regions. I think this would be an overall performance
>> improvement, since it's much more cache friendly. One thing keeping
>> this from happening is that the data structure is passed up to the
>> kernel for kqemu. Eliminating that dependency would be a very good
>> thing!
>
> If l1_phys_map is a performance bottleneck, it means that the
> internals of QEMU are not being used properly. In QEMU/kqemu it is not
> accessed to do I/O: a cache is used through tlb_table[]. I don't see
> why KVM cannot use a similar system.

This is for device emulation; KVM doesn't use l1_phys_map() for things
like shadow page table accesses. In device emulation we currently use
stl_phys() and friends, which go through a full lookup in l1_phys_map.
Looking at other devices, some use phys_ram_base + PA with stl_raw(),
which is faster but broken (it bypasses dirty tracking and MMIO
handling). A few places call cpu_get_physical_page_desc() and then use
phys_ram_base with stl_raw(). That is okay, but it still requires at
least one l1_phys_map lookup per device operation (packet receive, I/O
notification, etc.).

I don't think a tlb_table[]-style cache is going to help much, because
in our fast paths we only do 2 or 3 stl_phys() operations. At least on
x86 there are very few regions of RAM, which makes them very easy to
cache; a TLB-style cache seems wrong to me precisely because there are
so few. I don't see a better way to do this with the existing APIs. To
make the idea concrete, sketches of the region table and of how a
device fast path would use it follow below.
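Here is roughly the shape I have in mind for the region table. This is
only a sketch, not the code from my patches: the names (phys_region,
phys_region_lookup) and the fixed table size are made up for
illustration, and target_phys_addr_t is redeclared locally so the
snippet stands alone.

#include <stdint.h>
#include <stddef.h>

/* Stand-in for QEMU's own typedef; local so the sketch compiles alone. */
typedef uint64_t target_phys_addr_t;

typedef struct phys_region {
    target_phys_addr_t start;  /* guest-physical base of the region */
    target_phys_addr_t size;   /* length in bytes */
    uint8_t *host;             /* host pointer backing the RAM */
} phys_region;

/* Kept sorted by start.  On x86 there are only a handful of RAM
 * regions, so the whole table fits in a cache line or two. */
static phys_region regions[16];
static int nb_regions;

/* Binary search over the sorted table.  O(log n), but with n this
 * small the real win is that the table is contiguous and cache
 * friendly, unlike the multi-level l1_phys_map walk. */
static uint8_t *phys_region_lookup(target_phys_addr_t pa)
{
    int lo = 0, hi = nb_regions - 1;

    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        phys_region *r = &regions[mid];

        if (pa < r->start)
            hi = mid - 1;
        else if (pa - r->start >= r->size)
            lo = mid + 1;
        else
            return r->host + (pa - r->start);
    }
    return NULL;  /* not RAM: caller falls back to the slow path */
}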
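And a sketch of how a device fast path could use it. stl_phys() and
stl_raw() are the existing helpers; the function and the status word it
writes are invented for illustration. A real version would also have to
mark the page dirty (e.g. via cpu_physical_memory_set_dirty()) --
skipping that is part of what makes the bare phys_ram_base + stl_raw()
pattern broken.

/* Provided by QEMU (cpu-all.h / exec.c); declared here so the sketch
 * stands alone. */
void stl_phys(target_phys_addr_t addr, uint32_t val);
void stl_raw(void *ptr, uint32_t v);

/* A device writing a 32-bit status word into guest RAM.  Today this
 * would be a single stl_phys(), paying the l1_phys_map walk on every
 * call. */
static void notify_guest(target_phys_addr_t status_pa, uint32_t val)
{
    uint8_t *p = phys_region_lookup(status_pa);

    if (p) {
        /* Fast path: one probe of the small table, one raw store.
         * A real version must also update dirty tracking here. */
        stl_raw(p, val);
    } else {
        /* MMIO or unmapped: keep the existing slow path. */
        stl_phys(status_pa, val);
    }
}

The point is that one cheap probe replaces the per-access table walk,
while anything that isn't plain RAM still funnels through the code we
already have.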
Regards,

Anthony Liguori