From mboxrd@z Thu Jan 1 00:00:00 1970
From: "David S. Ahern"
Subject: Re: [kvm-devel] performance with guests running 2.4 kernels
 (specifically RHEL3)
Date: Wed, 28 May 2008 09:43:09 -0600
Message-ID: <483D7D8D.3030309@cisco.com>
References: <482C1633.5070302@qumranet.com> <482E5F9C.6000207@cisco.com>
 <482FCEE1.5040306@qumranet.com> <4830F90A.1020809@cisco.com>
 <4830FE8D.6010006@cisco.com> <48318E64.8090706@qumranet.com>
 <4832DDEB.4000100@qumranet.com> <4835EEF5.9010600@cisco.com>
 <483D391F.7050007@qumranet.com> <483D6898.2050605@cisco.com>
 <20080528144850.GX27375@duo.random> <483D7C45.5020300@qumranet.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: Andrea Arcangeli , kvm@vger.kernel.org
To: Avi Kivity
Return-path:
Received: from sj-iport-3.cisco.com ([171.71.176.72]:11293 "EHLO
 sj-iport-3.cisco.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with
 ESMTP id S1751346AbYE1Pnz (ORCPT ); Wed, 28 May 2008 11:43:55 -0400
In-Reply-To: <483D7C45.5020300@qumranet.com>
Sender: kvm-owner@vger.kernel.org
List-ID:

This is the code in the RHEL3.8 kernel:

static int scan_active_list(struct zone_struct * zone, int age,
			    struct list_head * list, int count)
{
	struct list_head *page_lru, *next;
	struct page *page;
	int over_rsslimit;

	count = count * kscand_work_percent / 100;

	/* Take the lock while messing with the list... */
	lru_lock(zone);
	while (count-- > 0 && !list_empty(list)) {
		page = list_entry(list->prev, struct page, lru);
		pte_chain_lock(page);
		if (page_referenced(page, &over_rsslimit) && !over_rsslimit &&
		    check_mapping_inuse(page))
			age_page_up_nolock(page, age);
		else {
			list_del(&page->lru);
			list_add(&page->lru, list);
		}
		pte_chain_unlock(page);
	}
	lru_unlock(zone);
	return 0;
}

My previous email shows examples of the number of pages in the list and
the scanning that happens.
david

Avi Kivity wrote:
> Andrea Arcangeli wrote:
>>
>> So I never found a relation to the symptom reported of VM kernel
>> threads going weird, with KVM optimal handling of kmap ptes.
>>
>
> The problem is this code:
>
> static int scan_active_list(struct zone_struct * zone, int age,
>                             struct list_head * list)
> {
>         struct list_head *page_lru, *next;
>         struct page *page;
>         int over_rsslimit;
>
>         /* Take the lock while messing with the list... */
>         lru_lock(zone);
>         list_for_each_safe(page_lru, next, list) {
>                 page = list_entry(page_lru, struct page, lru);
>                 pte_chain_lock(page);
>                 if (page_referenced(page, &over_rsslimit) && !over_rsslimit)
>                         age_page_up_nolock(page, age);
>                 pte_chain_unlock(page);
>         }
>         lru_unlock(zone);
>         return 0;
> }
>
> If the pages in the list are in the same order as in the ptes (which is
> very likely), then we have the following access pattern:
>
> - set up kmap to point at pte
> - test_and_clear_bit(pte)
> - kunmap
>
> From kvm's point of view this looks like:
>
> - several accesses to set up the kmap
> - if these accesses trigger flooding, we will have to tear down the
> shadow for this page, only to set it up again soon
> - an access to the pte (emulated)
> - if this access _doesn't_ trigger flooding, we will have 512 unneeded
> emulations. The pte is worthless anyway since the accessed bit is clear
> (so we can't set up a shadow pte for it)
> - this bug was fixed
> - an access to tear down the kmap
>
> [btw, am I reading this right? the entire list is scanned each time?
>
> if you have 1G of active HIGHMEM, that's a quarter of a million pages,
> which would take at least a second no matter what we do. VMware can
> probably special-case kmaps, but we can't]
>