From: Ingo Molnar
To: Avi Kivity
Cc: kvm-devel
Subject: Re: [RFT] mmu optimizations branch
Date: Tue, 2 Jan 2007 17:11:17 +0100
Message-ID: <20070102161117.GA3306@elte.hu>
In-Reply-To: <4598E33B.608@qumranet.com>
References: <4598E33B.608@qumranet.com>
List-Id: kvm.vger.kernel.org

* Avi Kivity wrote:

> This is a request for testing of the mmu optimizations branch.
>
> Currently the shadow page tables are discarded every time the guest
> performs a context switch. The mmu branch allows shadow page tables
> to be cached across context switches, greatly reducing the cpu
> utilization on multi process workloads. It is now stable enough for
> testing (though perhaps not for general use).

i have tested it with a Fedora Core 6 guest (32-bit, nopae), under an
FC6 host (32-bit Core 2 Duo, nopae, enough RAM), and it's working
great! Here are some quick numbers.

Context-switch overhead with lmbench "lat_ctx -s 0" (zero memory
footprint), in microseconds:

 -------------------------------------------------
  #tasks     native    kvm-r4204    kvm-r4232(mmu)
 -------------------------------------------------
      2:       2.02       180.91          9.19
     20:       4.04       183.21         10.01
     50:       4.30       185.95         11.27

so here it's a /massive/, almost 20-fold speedup!

Context-switch overhead with "-s 1000" (1 MB memory footprint), in
microseconds:

 -------------------------------------------------
  #tasks     native    kvm-r4204    kvm-r4232(mmu)
 -------------------------------------------------
      2:      150.5      1032.97        295.16
     20:      216.6      1020.34        393.01
     50:      218.1      1015.58       2335.99 [*]

the speedup is nice here too. Note the [*] outlier at 50 tasks: it's
consistently reproducible. Could KVM be thrashing the pagetable cache
due to some sort of internal limit? It's not due to guest size.

The -mmu FC6 guest is visibly faster, so it's not just microbenchmarks
that benefit from this change. KVM got /massively/ faster in every
aspect - kudos Avi!

(Note that r4204 already included the interactivity IRQ fixes, so the
improvements are, i think, purely due to the pagetable caching
speedups.)

on a related note, i also got:

   vmwrite error: reg 6802 value cfd3c4a4 (err 17408)

and:

   kvm: unhandled wrmsr: 0xc1
   inject_general_protection: rip 0xc011f7f3
   kvm: unhandled wrmsr: 0x186
   inject_general_protection: rip 0xc011f7f3
   kvm: unhandled wrmsr: 0xc1
   inject_general_protection: rip 0xc011f7f3
   kvm: unhandled wrmsr: 0x186
   inject_general_protection: rip 0xc011f7f3

unfortunately 0xc011f7f3 is in native_write_msr(), which isn't very
helpful. (i have CONFIG_PARAVIRT enabled in the -rt guest and host
kernels.) But the MSR values suggest that this is the NMI watchdog
thing again, trying to program MSR_ARCH_PERFMON_EVENTSEL0 and
MSR_ARCH_PERFMON_PERFCTR0 - but this time Linux recovered, due to
more robust MSR handling. The guest disabled the NMI watchdog with:

   Testing NMI watchdog ... CPU#0: NMI appears to be stuck (0->0)!
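( a hypothetical illustration of what the "unhandled wrmsr" messages
above imply: writes to MSRs that the hypervisor does not emulate are
reflected back into the guest as a #GP fault. The standalone sketch
below mocks up that exit path - the names and structure here are made
up for illustration, this is not KVM's actual code: )

#include <stdio.h>
#include <stdint.h>

#define MSR_ARCH_PERFMON_PERFCTR0	0xc1
#define MSR_ARCH_PERFMON_EVENTSEL0	0x186
#define MSR_IA32_SYSENTER_CS		0x174

struct vcpu {
	uint64_t sysenter_cs;
	int pending_gp;
};

static void inject_general_protection(struct vcpu *v)
{
	/* the guest's wrmsr instruction faults with #GP */
	v->pending_gp = 1;
}

static void handle_wrmsr(struct vcpu *v, uint32_t msr, uint64_t data)
{
	switch (msr) {
	case MSR_IA32_SYSENTER_CS:
		/* an example of an MSR the hypervisor does emulate */
		v->sysenter_cs = data;
		break;
	default:
		/* the perfmon MSRs of the NMI watchdog land here */
		printf("kvm: unhandled wrmsr: 0x%x\n", msr);
		inject_general_protection(v);
	}
}

int main(void)
{
	struct vcpu v = { 0, 0 };

	/* what the guest's NMI watchdog setup effectively triggers: */
	handle_wrmsr(&v, MSR_ARCH_PERFMON_PERFCTR0, 0);
	handle_wrmsr(&v, MSR_ARCH_PERFMON_EVENTSEL0, 0);
	return v.pending_gp;
}

( presumably the guest's wrmsr exception fixup ate the #GP, the
counter never ticked, and the watchdog test thus saw a stuck NMI
count and turned itself off. )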
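btw, in case someone wants to eyeball the context-switch numbers above
without installing lmbench: lat_ctx bounces a token between processes
via pipes and reports the per-switch latency in microseconds. Below is
a minimal standalone approximation of the 2-task, zero-footprint case
(a rough sketch only - real lat_ctx also touches the -s KB working set
on every switch and subtracts the measured pipe overhead):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/time.h>
#include <sys/wait.h>

int main(void)
{
	struct timeval t0, t1;
	int ab[2], ba[2];
	char tok = 'x';
	long i, n = 100000;
	double usecs;

	if (pipe(ab) || pipe(ba)) {
		perror("pipe");
		exit(1);
	}
	if (!fork()) {
		/* child: echo the token back, n times */
		for (i = 0; i < n; i++) {
			read(ab[0], &tok, 1);
			write(ba[1], &tok, 1);
		}
		_exit(0);
	}
	gettimeofday(&t0, NULL);
	for (i = 0; i < n; i++) {
		/* parent: send the token, wait for the echo */
		write(ab[1], &tok, 1);
		read(ba[0], &tok, 1);
	}
	gettimeofday(&t1, NULL);
	wait(NULL);

	/* each round trip is roughly two context switches */
	usecs = (t1.tv_sec - t0.tv_sec) * 1e6 + (t1.tv_usec - t0.tv_usec);
	printf("%.2f usecs/switch\n", usecs / (2.0 * n));
	return 0;
}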
the FC6 installer hang that i saw with earlier MMU-branch snapshots is
fixed.

	Ingo