From: Avi Kivity
Subject: Re: RFC: shadow page table reclaim
Date: Mon, 31 Aug 2009 12:55:24 +0300
Message-ID: <4A9B9E0C.2080701@redhat.com>
In-Reply-To: <200908280431.04960.max@laiers.net>
References: <200908280431.04960.max@laiers.net>
To: Max Laier
Cc: kvm@vger.kernel.org

On 08/28/2009 05:31 AM, Max Laier wrote:
> Hello,
>
> it seems to me that the reclaim mechanism for shadow page table pages is
> suboptimal. The arch.active_mmu_pages list that is used for reclaiming does
> not move parent shadow page tables up when a child is added, so when we need
> a new shadow page we zap the oldest one, which may well be a directory-level
> page holding a just-added table-level page.
>
> Attached are a proof-of-concept diff and two plots, before and after. The
> plots show referenced guest pages over time.

What do you mean by referenced guest pages? The total number of populated
sptes?

> As you can see, there is less sawtoothing in the after plot and also fewer
> changes overall (because we don't zap mappings that are still in use as
> often). This is with the shadow page table limit set to 64, to amplify the
> effect, on vmx/ept.
>
> I realize that the list_move and parent walk are quite expensive and that
> kvm_mmu_alloc_page is only half the story. It should really be done every
> time a new guest page table is mapped, maybe via rmap_add. That would
> obviously kill performance, though.
>
> Another idea would be to improve the reclaim logic so that it prefers "old"
> PT_PAGE_TABLE_LEVEL pages over directories. I'm not sure how to code that up
> sensibly, either.
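The problem described above, and the "move the parents up on allocation" idea from the proof of concept, can be modelled with a toy list. This is a userspace simplification, not KVM code: struct sp, sp_alloc(), lru_add_front() and oldest() are hypothetical names, each page has a single parent (a real shadow page can be referenced from several parent sptes, which is part of why the parent walk is costly), and the tail of the list merely stands in for the oldest entry of arch.active_mmu_pages that reclaim would zap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model, not KVM code: all names here are hypothetical. */
struct sp {
	int level;              /* 1 = PT_PAGE_TABLE_LEVEL-like leaf, >1 = directory */
	struct sp *parent;      /* simplification: a single parent per page */
	struct sp *prev, *next; /* links on the active list */
};

/* Circular list with a sentinel: head->next is newest, head->prev is oldest. */
static void lru_init(struct sp *head)
{
	head->next = head->prev = head;
}

static void lru_del(struct sp *s)
{
	s->prev->next = s->next;
	s->next->prev = s->prev;
}

static void lru_add_front(struct sp *head, struct sp *s)
{
	s->next = head->next;
	s->prev = head;
	head->next->prev = s;
	head->next = s;
}

/*
 * Add a newly allocated page at the front. With move_parents set, also
 * move every ancestor to the front (the proof-of-concept idea), so a
 * directory that just gained a child no longer looks old.
 */
static void sp_alloc(struct sp *head, struct sp *s, struct sp *parent,
		     bool move_parents)
{
	s->parent = parent;
	lru_add_front(head, s);
	if (!move_parents)
		return;
	for (struct sp *p = parent; p != NULL; p = p->parent) {
		lru_del(p);
		lru_add_front(head, p);
	}
}

/* Reclaim zaps the oldest page, i.e. the tail of the list. */
static struct sp *oldest(struct sp *head)
{
	return head->prev == head ? NULL : head->prev;
}
```

Allocating a directory `root` and then two leaves `x` and `y` under it leaves `root` as the oldest entry when move_parents is false, even though `y` was just added beneath it; with move_parents true, the older leaf `x` becomes the victim instead. The ancestor walk runs on every allocation, which is the cost concern raised above.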
>
> As I said, this is a proof of concept and an RFC, so any comments are
> welcome. For my use case the proof-of-concept diff seems to do well enough,
> though.

Given that reclaim is fairly rare, we should try to move the cost there. So
how about this:

- add an 'accessed' flag to struct kvm_mmu_page
- when reclaiming, try to evict pages that were not recently accessed (but
  don't overscan: if you scan many recently accessed pages, evict some of
  them anyway)
- when scanning, update the accessed flag with the accessed bit of all
  parent_ptes
- when dropping an spte, update the accessed flag of the kvm_mmu_page it
  points to with the spte's accessed bit
- when reloading cr3, mark the root page as accessed (since it has no
  parent_ptes)

This should introduce some LRU-ness that depends not only on fault behaviour
but also on long-term guest access behaviour (which is important for
long-running processes and kernel pages).

-- 
error compiling committee.c: too many arguments to function
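The proposal above amounts to a second-chance (clock-style) scan at reclaim time, with the overscan limit bounding its cost. What follows is a minimal userspace sketch under stated assumptions, not KVM code: struct toy_sp, pick_victim(), on_spte_drop(), on_cr3_reload(), max_scan and the parent_young field (a stand-in for the OR of the accessed bits of all parent_ptes) are hypothetical. Clearing the flag of a skipped page, as a classic second-chance scan does, and clearing the harvested parent bits are also assumptions; the message does not specify either.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, not KVM code: all names here are hypothetical. */
struct toy_sp {
	bool accessed;      /* the proposed per-page 'accessed' flag */
	bool parent_young;  /* stand-in: OR of the accessed bits of all parent_ptes */
};

/* Dropping an spte: fold its accessed bit into the page it points to. */
static void on_spte_drop(struct toy_sp *child, bool spte_accessed)
{
	child->accessed |= spte_accessed;
}

/* Reloading cr3: a root page has no parent_ptes, so mark it directly. */
static void on_cr3_reload(struct toy_sp *root)
{
	root->accessed = true;
}

/*
 * Reclaim: scan from the oldest page (index 0). Recently accessed pages get
 * a second chance (assumption: their flag is cleared), but after max_scan of
 * them we evict anyway to bound the scan.
 * Returns the victim's index, or -1 if there are no pages.
 */
static int pick_victim(struct toy_sp *pages, int n, int max_scan)
{
	int skipped = 0;

	for (int i = 0; i < n; i++) {
		struct toy_sp *p = &pages[i];

		/* Harvest the parents' accessed bits (assumed cleared afterwards). */
		p->accessed |= p->parent_young;
		p->parent_young = false;
		if (!p->accessed)
			return i;		/* not recently accessed: evict */
		p->accessed = false;	/* second chance */
		if (++skipped >= max_scan)
			return i;		/* don't overscan: evict anyway */
	}
	return n > 0 ? 0 : -1;		/* every page was young: evict the oldest */
}
```

All of the bookkeeping here runs either at reclaim time or on events that already touch the page (spte drop, cr3 reload), so the allocation and fault paths stay untouched, which is the point of moving the cost to reclaim. A fuller version would presumably also rotate skipped pages to the front of the active list so that the next scan starts with older ones; the toy array leaves that out.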