From: Avi Kivity
Subject: Re: [patch 00/13] RFC: out of sync shadow
Date: Sun, 07 Sep 2008 14:22:47 +0300
Message-ID: <48C3B987.6020803@qumranet.com>
References: <20080906184822.560099087@localhost.localdomain>
In-Reply-To: <20080906184822.560099087@localhost.localdomain>
To: Marcelo Tosatti
Cc: kvm@vger.kernel.org

Marcelo Tosatti wrote:
> Keep shadow pages temporarily out of sync, allowing more efficient guest
> PTE updates in comparison to trap-emulate + unprotect heuristics. Stolen
> from Xen :)
>
> This version only allows leaf pagetables to go out of sync, for
> simplicity, but can be enhanced.
>
> VMX "bypass_guest_pf" feature on prefetch_page breaks it (since new
> PTE writes need no TLB flush, I assume). Not sure if it's worthwhile to
> convert notrap_nonpresent -> trap_nonpresent on unshadow or just go
> for unconditional nonpaging_prefetch_page.

Doesn't it kill bypass_guest_pf completely? As soon as we unsync a page, we
can't have nontrapping nonpresent ptes in it. We can try conversion on
unsync; it does speed up demand paging.

> * Kernel builds on 4-way 64-bit guest improve 10% (+ 3.7% for
> get_user_pages_fast).
>
> * lmbench's "lat_proc fork" microbenchmark latency is 40% lower (a
> shadow worst-case scenario test).
>
> * The RHEL3 highpte kscand hangs go from 5+ seconds to < 1 second.
>
> * Windows 2003 Server, 32-bit PAE, DDK build (build -cPzM 3):
>
> Windows 2003 Checked 64 Bit Build Environment, 256M RAM
>
> 1-vcpu:
> vanilla + gup_fast     oos
> 0:04:37.375            0:03:28.047 (- 25%)
>
> 2-vcpus:
> vanilla + gup_fast     oos
> 0:02:32.000            0:01:56.031 (- 23%)
>
> Windows 2003 Checked Build Environment, 1GB RAM
>
> 2-vcpus:
> vanilla + fast_gup     oos
> 0:02:26.078            0:01:50.110 (- 24%)
>
> 4-vcpus:
> vanilla + fast_gup     oos
> 0:01:59.266            0:01:29.625 (- 25%)

Impressive results.

> And I think other optimizations are possible now, for example the guest
> can be responsible for remote TLB flushing on kvm_mmu_pte_write().

But kvm_mmu_pte_write() is no longer called, since we unsync?

-- 
error compiling committee.c: too many arguments to function