From: Avi Kivity
Subject: Re: [patch 00/13] RFC: out of sync shadow
Date: Sun, 07 Sep 2008 14:22:47 +0300
Message-ID: <48C3B987.6020803@qumranet.com>
References: <20080906184822.560099087@localhost.localdomain>
In-Reply-To: <20080906184822.560099087@localhost.localdomain>
To: Marcelo Tosatti
Cc: kvm@vger.kernel.org

Marcelo Tosatti wrote:
> Keep shadow pages temporarily out of sync, allowing more efficient guest
> PTE updates in comparison to trap-emulate + unprotect heuristics. Stolen
> from Xen :)
>
> This version only allows leaf pagetables to go out of sync, for
> simplicity, but can be enhanced.
>
> VMX "bypass_guest_pf" feature on prefetch_page breaks it (since new
> PTE writes need no TLB flush, I assume). Not sure if it's worthwhile to
> convert notrap_nonpresent -> trap_nonpresent on unshadow or just go
> for unconditional nonpaging_prefetch_page.

Doesn't it kill bypass_guest_pf completely? As soon as we unsync a page, we
can't have nontrapping nonpresent ptes in it. We can try conversion on
unsync; it does speed up demand paging.

> * Kernel builds on 4-way 64-bit guest improve 10% (+ 3.7% for
> get_user_pages_fast).
>
> * lmbench's "lat_proc fork" microbenchmark latency is 40% lower (a
> shadow worst-case scenario test).
>
> * The RHEL3 highpte kscand hangs go from 5+ seconds to < 1 second.
>
> * Windows 2003 Server, 32-bit PAE, DDK build (build -cPzM 3):
>
> Windows 2003 Checked 64 Bit Build Environment, 256M RAM
>
> 1-vcpu:
> vanilla + gup_fast     oos
> 0:04:37.375            0:03:28.047 (- 25%)
>
> 2-vcpus:
> vanilla + gup_fast     oos
> 0:02:32.000            0:01:56.031 (- 23%)
>
> Windows 2003 Checked Build Environment, 1GB RAM
>
> 2-vcpus:
> vanilla + fast_gup     oos
> 0:02:26.078            0:01:50.110 (- 24%)
>
> 4-vcpus:
> vanilla + fast_gup     oos
> 0:01:59.266            0:01:29.625 (- 25%)

Impressive results.

> And I think other optimizations are possible now, for example the guest
> can be responsible for remote TLB flushing on kvm_mmu_pte_write().

But kvm_mmu_pte_write() is no longer called, since we unsync?

-- 
error compiling committee.c: too many arguments to function