From mboxrd@z Thu Jan 1 00:00:00 1970
From: Avi Kivity
Subject: Re: [kvm-devel] performance with guests running 2.4 kernels (specifically RHEL3)
Date: Wed, 28 May 2008 18:37:41 +0300
Message-ID: <483D7C45.5020300@qumranet.com>
References: <482C1633.5070302@qumranet.com> <482E5F9C.6000207@cisco.com> <482FCEE1.5040306@qumranet.com> <4830F90A.1020809@cisco.com> <4830FE8D.6010006@cisco.com> <48318E64.8090706@qumranet.com> <4832DDEB.4000100@qumranet.com> <4835EEF5.9010600@cisco.com> <483D391F.7050007@qumranet.com> <483D6898.2050605@cisco.com> <20080528144850.GX27375@duo.random>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: "David S. Ahern" , kvm@vger.kernel.org
To: Andrea Arcangeli
Return-path: 
Received: from bzq-179-150-194.static.bezeqint.net ([212.179.150.194]:53068 "EHLO il.qumranet.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751303AbYE1Phq (ORCPT ); Wed, 28 May 2008 11:37:46 -0400
In-Reply-To: <20080528144850.GX27375@duo.random>
Sender: kvm-owner@vger.kernel.org
List-ID: 

Andrea Arcangeli wrote:
>
> So I never found a relation to the symptom reported of VM kernel
> threads going weird, with KVM optimal handling of kmap ptes.
>

The problem is this code:

static int scan_active_list(struct zone_struct * zone, int age,
		struct list_head * list)
{
	struct list_head *page_lru, *next;
	struct page * page;
	int over_rsslimit;

	/* Take the lock while messing with the list... */
	lru_lock(zone);
	list_for_each_safe(page_lru, next, list) {
		page = list_entry(page_lru, struct page, lru);
		pte_chain_lock(page);
		if (page_referenced(page, &over_rsslimit) && !over_rsslimit)
			age_page_up_nolock(page, age);
		pte_chain_unlock(page);
	}
	lru_unlock(zone);
	return 0;
}

If the pages in the list are in the same order as in the ptes (which is
very likely), then we have the following access pattern:

- set up kmap to point at pte
- test_and_clear_bit(pte)
- kunmap

From kvm's point of view this looks like:

- several accesses to set up the kmap - if these accesses trigger
  flooding, we will have to tear down the shadow for this page, only to
  set it up again soon
- an access to the pte (emulated) - if this access _doesn't_ trigger
  flooding, we will have 512 unneeded emulations. The pte is worthless
  anyway since the accessed bit is clear (so we can't set up a shadow
  pte for it) - this bug was fixed
- an access to tear down the kmap

[btw, am I reading this right? the entire list is scanned each time? if
you have 1G of active HIGHMEM, that's a quarter of a million pages,
which would take at least a second no matter what we do. VMware can
probably special-case kmaps, but we can't]

-- 
Do not meddle in the internals of kernels, for they are subtle and
quick to panic.