From mboxrd@z Thu Jan  1 00:00:00 1970
From: Avi Kivity <avi@redhat.com>
Subject: Re: RFC: shadow page table reclaim
Date: Mon, 31 Aug 2009 12:55:24 +0300
Message-ID: <4A9B9E0C.2080701@redhat.com>
References: <200908280431.04960.max@laiers.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: kvm@vger.kernel.org
To: Max Laier <max@laiers.net>
Return-path: <kvm-owner@vger.kernel.org>
Received: from mx1.redhat.com ([209.132.183.28]:13742 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1750715AbZHaJz1 (ORCPT <rfc822;kvm@vger.kernel.org>);
	Mon, 31 Aug 2009 05:55:27 -0400
In-Reply-To: <200908280431.04960.max@laiers.net>
Sender: kvm-owner@vger.kernel.org
List-ID: <kvm.vger.kernel.org>

On 08/28/2009 05:31 AM, Max Laier wrote:
> Hello,
>
> it seems to me that the reclaim mechanism for shadow page table pages is
> suboptimal.  The arch.active_mmu_pages list that is used for reclaiming does
> not move up parent shadow page tables when a child is added, so when we need
> a new shadow page we zap the oldest, which can well be a directory level page
> holding a just added table level page.
>
> Attached is a proof-of-concept diff and two plots before and after.  The plots
> show referenced guest pages over time.

What do you mean by referenced guest pages?  The total number of
populated sptes?

> As you can see there is less sawtoothing in the after plot and also fewer
> changes overall (because we don't zap mappings that are still in use as
> often).  This is with vmx/ept and a limit of 64 for the shadow page table,
> to increase the effect.
>
> I realize that the list_move and parent walk are quite expensive and that
> kvm_mmu_alloc_page is only half the story.  It should really be done every
> time a new guest page table is mapped - maybe via rmap_add.  This would
> obviously kill performance, though.
>
> Another idea would be to improve the reclaim logic in a way that it prefers
> "old" PT_PAGE_TABLE_LEVEL over directories.  Though I'm not sure how to code
> that up sensibly, either.
>
> As I said, this is proof-of-concept and RFC.  So any comments welcome.  For my
> use case the proof-of-concept diff seems to do well enough, though.
>    

Given that reclaim is fairly rare, we should try to move the cost into
the reclaim path rather than the allocation path.  So how about this:

- add an 'accessed' flag to struct kvm_mmu_page
- when reclaiming, try to evict pages that were not recently accessed 
(but don't overscan - if you scan many recently accessed pages, evict 
some of them anyway)
- when scanning, update the accessed flag with the accessed bit of all 
parent_ptes
- when dropping an spte, fold its accessed bit into the accessed flag of
the kvm_mmu_page it points to
- when reloading cr3, mark the page as accessed (since it has no 
parent_ptes)

This should introduce some LRU-ness that depends not only on fault 
behaviour but also on long-term guest access behaviour (which is 
important for long-running processes and kernel pages).
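
To make the steps above concrete, here is a rough userspace model of the
scan, not kernel code.  Everything in it is an assumption made for
illustration: struct shadow_page stands in for struct kvm_mmu_page, the
parent_accessed[] array stands in for the accessed bits of the
parent_ptes, and the function names are hypothetical.  Clearing the flag
after a page is scanned (second chance) is also my assumption; the list
above does not say when the flag should be reset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PARENTS 4

/* Hypothetical stand-in for struct kvm_mmu_page. */
struct shadow_page {
    bool accessed;                     /* the proposed per-page flag */
    bool parent_accessed[MAX_PARENTS]; /* models accessed bits of parent sptes */
    size_t nr_parents;
};

/* Scan step: fold the parents' accessed bits into the page flag. */
static void refresh_accessed(struct shadow_page *sp)
{
    for (size_t i = 0; i < sp->nr_parents; i++) {
        if (sp->parent_accessed[i]) {
            sp->accessed = true;
            sp->parent_accessed[i] = false; /* test-and-clear */
        }
    }
}

/* Dropping a parent spte: keep its accessed bit before it is lost. */
static void drop_parent(struct shadow_page *sp, size_t idx)
{
    assert(idx < sp->nr_parents);
    if (sp->parent_accessed[idx])
        sp->accessed = true;
    /* Remove the slot by moving the last entry into it. */
    sp->parent_accessed[idx] = sp->parent_accessed[sp->nr_parents - 1];
    sp->nr_parents--;
}

/* cr3 reload: a root has no parent sptes, so mark it directly. */
static void mark_root_accessed(struct shadow_page *sp)
{
    sp->accessed = true;
}

/*
 * Pick a victim from pages[0..n), ordered oldest first.  Prefer the
 * first page that was not recently accessed.  After scan_limit pages,
 * stop scanning and evict the oldest page anyway (no overscan).
 */
static size_t pick_victim(struct shadow_page *pages, size_t n,
                          size_t scan_limit)
{
    assert(n > 0);
    size_t limit = scan_limit < n ? scan_limit : n;

    for (size_t i = 0; i < limit; i++) {
        refresh_accessed(&pages[i]);
        if (!pages[i].accessed)
            return i;
        pages[i].accessed = false; /* assumed: give it a second chance */
    }
    return 0;
}
```

In this model all of the work happens inside pick_victim(), so the
fault and allocation paths pay nothing extra beyond setting a flag on
cr3 reload and on spte drop.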

-- 
error compiling committee.c: too many arguments to function