From mboxrd@z Thu Jan 1 00:00:00 1970
From: Avi Kivity
Subject: Re: [kvm-devel] performance with guests running 2.4 kernels (specifically RHEL3)
Date: Wed, 28 May 2008 18:37:41 +0300
Message-ID: <483D7C45.5020300@qumranet.com>
References: <482C1633.5070302@qumranet.com> <482E5F9C.6000207@cisco.com> <482FCEE1.5040306@qumranet.com> <4830F90A.1020809@cisco.com> <4830FE8D.6010006@cisco.com> <48318E64.8090706@qumranet.com> <4832DDEB.4000100@qumranet.com> <4835EEF5.9010600@cisco.com> <483D391F.7050007@qumranet.com> <483D6898.2050605@cisco.com> <20080528144850.GX27375@duo.random>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: "David S. Ahern" , kvm@vger.kernel.org
To: Andrea Arcangeli
Return-path: 
Received: from bzq-179-150-194.static.bezeqint.net ([212.179.150.194]:53068 "EHLO il.qumranet.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751303AbYE1Phq (ORCPT ); Wed, 28 May 2008 11:37:46 -0400
In-Reply-To: <20080528144850.GX27375@duo.random>
Sender: kvm-owner@vger.kernel.org
List-ID: 

Andrea Arcangeli wrote:
>
> So I never found a relation to the symptom reported of VM kernel
> threads going weird, with KVM optimal handling of kmap ptes.
>

The problem is this code:

static int scan_active_list(struct zone_struct * zone, int age,
		struct list_head * list)
{
	struct list_head *page_lru, *next;
	struct page * page;
	int over_rsslimit;

	/* Take the lock while messing with the list... */
	lru_lock(zone);
	list_for_each_safe(page_lru, next, list) {
		page = list_entry(page_lru, struct page, lru);
		pte_chain_lock(page);
		if (page_referenced(page, &over_rsslimit) && !over_rsslimit)
			age_page_up_nolock(page, age);
		pte_chain_unlock(page);
	}
	lru_unlock(zone);
	return 0;
}

If the pages in the list are in the same order as in the ptes (which is
very likely), then we have the following access pattern:

- set up kmap to point at pte
- test_and_clear_bit(pte)
- kunmap

From kvm's point of view this looks like:

- several accesses to set up the kmap - if these accesses trigger
  flooding, we will have to tear down the shadow for this page, only to
  set it up again soon
- an access to the pte (emulated) - if this access _doesn't_ trigger
  flooding, we will have 512 unneeded emulations. The pte is worthless
  anyway since the accessed bit is clear (so we can't set up a shadow
  pte for it) - this bug was fixed
- an access to tear down the kmap

[btw, am I reading this right? the entire list is scanned each time? if
you have 1G of active HIGHMEM, that's a quarter of a million pages,
which would take at least a second no matter what we do. VMware can
probably special-case kmaps, but we can't]

-- 
Do not meddle in the internals of kernels, for they are subtle and
quick to panic.