From mboxrd@z Thu Jan 1 00:00:00 1970
From: Avi Kivity
Subject: Re: RFC: shadow page table reclaim
Date: Mon, 31 Aug 2009 15:40:29 +0300
Message-ID: <4A9BC4BD.2010308@redhat.com>
References: <200908280431.04960.max@laiers.net> <4A9B9E0C.2080701@redhat.com> <200908311409.09346.max@laiers.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: kvm@vger.kernel.org
To: Max Laier
Return-path:
Received: from mx1.redhat.com ([209.132.183.28]:57667 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751386AbZHaMkc (ORCPT ); Mon, 31 Aug 2009 08:40:32 -0400
In-Reply-To: <200908311409.09346.max@laiers.net>
Sender: kvm-owner@vger.kernel.org
List-ID:

On 08/31/2009 03:09 PM, Max Laier wrote:
>
>>> As you can see there is less saw-toothing in the after plot and also
>>> fewer changes overall (because we don't zap mappings that are still in
>>> use as often). This is with a limit of 64 for the shadow page table to
>>> increase the effect and vmx/ept.
>>>
>>> I realize that the list_move and parent walk are quite expensive and that
>>> kvm_mmu_alloc_page is only half the story. It should really be done
>>> every time a new guest page table is mapped - maybe via rmap_add. This
>>> would obviously completely kill performance-wise, though.
>>>
>>> Another idea would be to improve the reclaim logic in a way that it
>>> prefers "old" PT_PAGE_TABLE_LEVEL over directories. Though I'm not sure
>>> how to code that up sensibly, either.
>>>
>>> As I said, this is proof-of-concept and RFC. So any comments welcome.
>>> For my use case the proof-of-concept diff seems to do well enough,
>>> though.
>>>
>> Given that reclaim is fairly rare, we should try to move the cost
>> there. So how about this:
>>
>> - add an 'accessed' flag to struct kvm_mmu_page
>> - when reclaiming, try to evict pages that were not recently accessed
>> (but don't overscan - if you scan many recently accessed pages, evict
>> some of them anyway)
>>
> - prefer page table level pages over directory level pages in the face of
> overscan.
>

I'm hoping that overscan will only occur when we start to feel memory
pressure, and that once we do a full scan we'll get accurate recency
information.

>> - when scanning, update the accessed flag with the accessed bit of all
>> parent_ptes
>>
> I might be misunderstanding, but I think it should be the other way 'round.
> i.e. a page is accessed if any of its children have been accessed.
>

They're both true, but looking at the parents is much more efficient.
Note we need to look at the accessed bit of the parent_ptes, not parent
kvm_mmu_pages.

>> - when dropping an spte, update the accessed flag of the kvm_mmu_page it
>> points to
>> - when reloading cr3, mark the page as accessed (since it has no
>> parent_ptes)
>>
>> This should introduce some LRU-ness that depends not only on fault
>> behaviour but also on long-term guest access behaviour (which is
>> important for long-running processes and kernel pages).
>>
> I'll try to come up with a patch for this, later tonight. Unless you already
> have something in the making. Thanks.
>

Please do, it's an area that needs attention.

--
error compiling committee.c: too many arguments to function
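
[Editor's sketch, not part of the original message.] The reclaim policy
proposed in this thread could be modelled in user space roughly as below.
This is only an illustration under stated assumptions, not KVM code:
struct shadow_page, harvest_parent_bits(), note_spte_dropped(),
note_cr3_reload(), pick_victim() and the scan_limit parameter are all
hypothetical names. Each parent pte's accessed bit is modelled as a plain
bool, and clearing the flag of a skipped page (second-chance style) is an
assumption the thread does not spell out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PARENTS 4

/* Hypothetical stand-in for struct kvm_mmu_page; not the kernel layout. */
struct shadow_page {
    bool accessed;      /* the proposed 'accessed' flag */
    bool is_leaf;       /* page table level (true) vs. directory (false) */
    size_t nr_parents;  /* a root page has none */
    bool *parent_pte_accessed[MAX_PARENTS]; /* models each parent pte's A bit */
};

/* When scanning: fold the accessed bits of all parent ptes into the flag. */
void harvest_parent_bits(struct shadow_page *sp)
{
    for (size_t i = 0; i < sp->nr_parents; i++) {
        if (*sp->parent_pte_accessed[i]) {
            sp->accessed = true;
            *sp->parent_pte_accessed[i] = false; /* assumption: test-and-clear */
        }
    }
}

/* When dropping an spte: carry its accessed bit over to the page it pointed to. */
void note_spte_dropped(bool spte_accessed, struct shadow_page *child)
{
    if (spte_accessed)
        child->accessed = true;
}

/* When reloading cr3: a root has no parent_ptes, so mark it directly. */
void note_cr3_reload(struct shadow_page *root)
{
    root->accessed = true;
}

/*
 * Scan candidates oldest-first and return the index of the first page that
 * was not recently accessed. At most scan_limit pages are examined; if all
 * of them were recently accessed (overscan), evict one anyway, preferring a
 * page-table-level page over a directory.
 */
size_t pick_victim(struct shadow_page *pages, size_t n, size_t scan_limit)
{
    assert(n > 0 && scan_limit > 0);
    size_t limit = n < scan_limit ? n : scan_limit;

    for (size_t i = 0; i < limit; i++) {
        harvest_parent_bits(&pages[i]);
        if (!pages[i].accessed)
            return i;
        pages[i].accessed = false; /* assumption: must be touched again to survive */
    }

    /* Overscan: everything scanned was hot. Prefer a leaf-level page. */
    for (size_t i = 0; i < limit; i++) {
        if (pages[i].is_leaf)
            return i;
    }
    return 0; /* no leaf among the scanned pages: evict the oldest */
}
```

The point of the sketch is where the cost lands: nothing is updated on the
fault path, and the accessed information is collected only when reclaim
actually scans, which matches the goal of moving the cost to the rare
reclaim case.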