From: Fabrice Bellard
Date: Mon, 02 Jun 2008 11:02:09 +0200
Subject: Re: [Qemu-devel] Re: KQEMU code organization
To: qemu-devel@nongnu.org

Anthony Liguori wrote:
> Fabrice Bellard wrote:
>> Anthony Liguori wrote:
>>
>>> [...]
>>> FWIW, the l1_phys_map table is a current hurdle in getting
>>> performance. When we use proper accessors to access the virtio_ring,
>>> we end up taking a significant performance hit (around 20% on iperf).
>>> I have some simple patches that implement a page_desc cache that
>>> caches the RAM regions in a linear array. That helps get most of it
>>> back.
>>>
>>> I'd really like to remove the l1_phys_map entirely and replace it
>>> with a sorted list of regions. I think this would have an overall
>>> performance improvement since it is much more cache friendly. One
>>> thing keeping this from happening is the fact that the data structure
>>> is passed up to the kernel for kqemu. Eliminating that dependency
>>> would be a very good thing!
>>
>> If the l1_phys_map is a performance bottleneck, it means that the
>> internals of QEMU are not properly used. In QEMU/kqemu, it is not
>> accessed to do I/Os: a cache is used through tlb_table[]. I don't see
>> why KVM cannot use a similar system.
>
> This is for device emulation. KVM doesn't use l1_phys_map() for things
> like shadow page table accesses.
>
> In the device emulation, we're currently using stl_phys() and friends.
> This goes through a full lookup in l1_phys_map.
>
> Looking at other devices, some use phys_ram_base + PA and stl_raw(),
> which is broken but faster. A few places call
> cpu_get_physical_page_desc(), then use phys_ram_base and stl_raw().
> This is okay, but it still requires at least one l1_phys_map lookup per
> operation in the device (packet receive, io notification, etc.). I
> don't think that's going to help much because in our fast paths, we're
> only doing 2 or 3 stl_phys() operations.
>
> At least on x86, there are very few regions of RAM. That makes it very
> easy to cache.
> A TLB style cache seems wrong to me because there are so few RAM
> regions. I don't see a better way to do this with the existing APIs.

I see your point. st/ldx_phys() were never optimized, in fact.

A first solution would be to use a cache similar to the TLBs. It has
the advantage of being quite generic and fast.

Another solution would be to compute a few intervals which are tested
before the generic case. These intervals would correspond to the main
RAM area and would be updated each time a new device region is
registered. A rough sketch of that idea follows below.

Does your remark imply that KVM switches back to the QEMU process for
each I/O? If so, the l1_phys_map access time should be negligible
compared to the SVM-VMX/kernel/user context switch!

Fabrice.
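As an illustration of the interval idea (and of the linear-array cache
Anthony mentions), here is a minimal, self-contained sketch. It is not
the actual patch; the names ram_interval_t, ram_interval_add,
stl_phys_fast and stl_phys_slow are invented for this example, and
stl_phys_slow merely stands in for the existing generic
stl_phys()/l1_phys_map path. Endianness handling and region
unregistration are ignored for brevity.

/* Sketch only: a tiny array of RAM intervals checked before the
 * generic physical-memory lookup. All names here are hypothetical. */

#include <stdint.h>
#include <string.h>

#define MAX_RAM_INTERVALS 16

typedef struct {
    uint64_t start;      /* guest physical start address */
    uint64_t end;        /* exclusive guest physical end address */
    uint8_t *host_base;  /* host pointer corresponding to 'start' */
} ram_interval_t;

static ram_interval_t ram_intervals[MAX_RAM_INTERVALS];
static int nb_ram_intervals;

/* To be called each time a new RAM region is registered, so the
 * interval table stays in sync with the memory map. */
static void ram_interval_add(uint64_t start, uint64_t size,
                             uint8_t *host_base)
{
    if (nb_ram_intervals < MAX_RAM_INTERVALS) {
        ram_intervals[nb_ram_intervals].start = start;
        ram_intervals[nb_ram_intervals].end = start + size;
        ram_intervals[nb_ram_intervals].host_base = host_base;
        nb_ram_intervals++;
    }
}

/* Stand-in for the existing generic path (l1_phys_map lookup plus I/O
 * dispatch); declared only, since it is outside the scope of the sketch. */
void stl_phys_slow(uint64_t addr, uint32_t val);

/* Fast path: on x86 there are very few RAM intervals, so a linear scan
 * of this small array is cache friendly and almost always hits.
 * Assumes host and target share the same byte order. */
static inline void stl_phys_fast(uint64_t addr, uint32_t val)
{
    int i;

    for (i = 0; i < nb_ram_intervals; i++) {
        ram_interval_t *ri = &ram_intervals[i];
        if (addr >= ri->start && addr + 4 <= ri->end) {
            memcpy(ri->host_base + (addr - ri->start), &val, 4);
            return;
        }
    }
    /* Not plain RAM (or crosses an interval boundary): fall back. */
    stl_phys_slow(addr, val);
}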