From mboxrd@z Thu Jan 1 00:00:00 1970
From: "David S. Ahern"
Subject: Re: [kvm-devel] performance with guests running 2.4 kernels
 (specifically RHEL3)
Date: Wed, 28 May 2008 09:43:09 -0600
Message-ID: <483D7D8D.3030309@cisco.com>
References: <482C1633.5070302@qumranet.com> <482E5F9C.6000207@cisco.com>
 <482FCEE1.5040306@qumranet.com> <4830F90A.1020809@cisco.com>
 <4830FE8D.6010006@cisco.com> <48318E64.8090706@qumranet.com>
 <4832DDEB.4000100@qumranet.com> <4835EEF5.9010600@cisco.com>
 <483D391F.7050007@qumranet.com> <483D6898.2050605@cisco.com>
 <20080528144850.GX27375@duo.random> <483D7C45.5020300@qumranet.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: Andrea Arcangeli , kvm@vger.kernel.org
To: Avi Kivity
Return-path:
Received: from sj-iport-3.cisco.com ([171.71.176.72]:11293 "EHLO
 sj-iport-3.cisco.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with
 ESMTP id S1751346AbYE1Pnz (ORCPT ); Wed, 28 May 2008 11:43:55 -0400
In-Reply-To: <483D7C45.5020300@qumranet.com>
Sender: kvm-owner@vger.kernel.org
List-ID:

This is the code in the RHEL3.8 kernel:

static int scan_active_list(struct zone_struct * zone, int age,
			    struct list_head * list, int count)
{
	struct list_head *page_lru, *next;
	struct page *page;
	int over_rsslimit;

	count = count * kscand_work_percent / 100;

	/* Take the lock while messing with the list... */
	lru_lock(zone);
	while (count-- > 0 && !list_empty(list)) {
		page = list_entry(list->prev, struct page, lru);
		pte_chain_lock(page);
		if (page_referenced(page, &over_rsslimit) && !over_rsslimit &&
		    check_mapping_inuse(page))
			age_page_up_nolock(page, age);
		else {
			list_del(&page->lru);
			list_add(&page->lru, list);
		}
		pte_chain_unlock(page);
	}
	lru_unlock(zone);
	return 0;
}

My previous email shows examples of the number of pages in the list and
the scanning that happens.
david

Avi Kivity wrote:
> Andrea Arcangeli wrote:
>>
>> So I never found a relation to the symptom reported of VM kernel
>> threads going weird, with KVM optimal handling of kmap ptes.
>>
>
> The problem is this code:
>
> static int scan_active_list(struct zone_struct * zone, int age,
>                             struct list_head * list)
> {
>         struct list_head *page_lru, *next;
>         struct page *page;
>         int over_rsslimit;
>
>         /* Take the lock while messing with the list... */
>         lru_lock(zone);
>         list_for_each_safe(page_lru, next, list) {
>                 page = list_entry(page_lru, struct page, lru);
>                 pte_chain_lock(page);
>                 if (page_referenced(page, &over_rsslimit) && !over_rsslimit)
>                         age_page_up_nolock(page, age);
>                 pte_chain_unlock(page);
>         }
>         lru_unlock(zone);
>         return 0;
> }
>
> If the pages in the list are in the same order as in the ptes (which is
> very likely), then we have the following access pattern:
>
> - set up kmap to point at pte
> - test_and_clear_bit(pte)
> - kunmap
>
> From kvm's point of view this looks like:
>
> - several accesses to set up the kmap
> - if these accesses trigger flooding, we will have to tear down the
> shadow for this page, only to set it up again soon
> - an access to the pte (emulated)
> - if this access _doesn't_ trigger flooding, we will have 512 unneeded
> emulations. The pte is worthless anyway since the accessed bit is clear
> (so we can't set up a shadow pte for it)
> - this bug was fixed
> - an access to tear down the kmap
>
> [btw, am I reading this right? the entire list is scanned each time?
>
> if you have 1G of active HIGHMEM, that's a quarter of a million pages,
> which would take at least a second no matter what we do. VMware can
> probably special-case kmaps, but we can't]
>