From mboxrd@z Thu Jan 1 00:00:00 1970
From: "David S. Ahern" <daahern@cisco.com>
Subject: Re: [kvm-devel] performance with guests running 2.4 kernels (specifically RHEL3)
Date: Mon, 19 May 2008 10:25:47 -0600
Message-ID: <4831AA0B.30700@cisco.com>
References: <48054518.3000104@cisco.com> <4805BCF1.6040605@qumranet.com> <4807BD53.6020304@cisco.com> <48085485.3090205@qumranet.com> <480C188F.3020101@cisco.com> <480C5C39.4040300@qumranet.com> <480E492B.3060500@cisco.com> <480EEDA0.3080209@qumranet.com> <480F546C.2030608@cisco.com> <481215DE.3000302@cisco.com> <20080428181550.GA3965@dmt> <4816617F.3080403@cisco.com> <4817F30C.6050308@cisco.com> <48184228.2020701@qumranet.com> <481876A9.1010806@cisco.com> <48187903.2070409@qumranet.com> <4826E744.1080107@qumranet.com> <4826F668.6030305@qumranet.com> <48290FC2.4070505@cisco.com> <48294272.5020801@qumranet.com> <482B4D29.7010202@cisco.com> <482C1633.5070302@qumranet.com> <482E5F9C.6000207@cisco.com> <482FCEE1.5040306@qumranet.com> <4830F90A.1020809@cisco.com> <4830FE8D.6010006@cisco.com> <48318E64.8090706@qumranet.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: kvm@vger.kernel.org
To: Avi Kivity
Return-path: <kvm-owner@vger.kernel.org>
Received: from sj-iport-3.cisco.com ([171.71.176.72]:2777 "EHLO sj-iport-3.cisco.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753487AbYESQ0J (ORCPT ); Mon, 19 May 2008 12:26:09 -0400
In-Reply-To: <48318E64.8090706@qumranet.com>
Sender: kvm-owner@vger.kernel.org
List-ID: <kvm.vger.kernel.org>

Does the fact that the hugemem kernel works just fine have any bearing
on your options? Or rather, is there something unique about the way
kscand works in the hugemem kernel that makes its performance ok? I
mentioned last month (i.e., without your first patch) that running the
hugemem kernel showed a remarkable improvement in performance compared
to the standard SMP kernel.
Over the weekend I ran a test with your first patch and with the flood
detector at 3 (I have not run a case with the detector at 5), and
performance with the hugemem kernel was even better, in the sense that
1-minute averages of guest system time show no noticeable spikes.

In an earlier post I showed a diff of the config files for the standard
SMP and hugemem kernels. See:
http://article.gmane.org/gmane.comp.emulators.kvm.devel/16944/

david

Avi Kivity wrote:
> David S. Ahern wrote:
>>> [dsa] No. I saw the same problem with the flood count at 5. The
>>> attachment in the last email shows kvm_stat data during a kscand
>>> event. The data was collected with the patch you posted. With the
>>> flood count at 3, the mmu cache/flood counters are in the
>>> 18,000/sec range, pte updates at ~50,000/sec, and writes at
>>> 70,000/sec. With the flood count at 5, mmu_cache/flood drops to 0
>>> and pte updates and writes both hit 180,000+/sec. In both cases
>>> these rates last for 30 seconds or more. I only included data for
>>> the onset as it's pretty flat during the kscand activity.
>>>
>
> It makes sense. We removed a flooding false positive, and introduced a
> false negative.
>
> The guest access sequence is:
> - point the kmap pte at a page table
> - use the new pte to access the page table
>
> Prior to the patch, the mmu didn't see the 'use' part, so it concluded
> the kmap pte would be better off unshadowed. This shows up as a high
> flood count.
>
> After the patch, this no longer happens, so the sequence can repeat
> for long periods. However, the pte that is the result of the 'use'
> part is never accessed, so it should be detected as flooded! But our
> flood detection mechanism looks at one page at a time (per vcpu),
> while there are two pages involved here.
>
> There are (at least) three options available:
> - detect and special-case this scenario
> - change the flood detector to be per page table instead of per vcpu
> - change the flood detector to look at a list of recently used page
>   tables instead of the last page table
>
> I'm having a hard time trying to pick between the second and third
> options.
>