From mboxrd@z Thu Jan 1 00:00:00 1970 From: Andrea Arcangeli Subject: Re: [PATCH] kvm swapping with mmu notifiers + age_page Date: Tue, 22 Jan 2008 15:41:49 +0100 Message-ID: <20080122144149.GD7331@v2.random> References: <20080121124124.GG6970@v2.random> <4795F8D0.30102@qumranet.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org To: Avi Kivity Return-path: Content-Disposition: inline In-Reply-To: <4795F8D0.30102-atKUWr5tajBWk0Htik3J/w@public.gmane.org> List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: kvm-devel-bounces-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org Errors-To: kvm-devel-bounces-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org List-Id: kvm.vger.kernel.org On Tue, Jan 22, 2008 at 04:08:16PM +0200, Avi Kivity wrote: > Andrea Arcangeli wrote: >> This is the same as before but it uses the age_page callback to >> prevent the guest OS working set to be swapped out. It works well here >> so far. This depends on the memslot locking with mmu lock patch and on >> the mmu notifiers #v3 patch that I'll post in CC with linux-mm shortly >> that implements the age_page callback and that changes follow_page to >> set the young bit in the pte instead of setting the referenced bit (so >> the age_page will be called again later when the VM clears the young >> bit). >> >> +static void unmap_spte(struct kvm *kvm, u64 *spte) >> +{ >> + struct page *page = pfn_to_page((*spte & PT64_BASE_ADDR_MASK) >> >> PAGE_SHIFT); >> + get_page(page); >> + rmap_remove(kvm, spte); >> + set_shadow_pte(spte, shadow_trap_nonpresent_pte); >> + kvm_flush_remote_tlbs(kvm); >> + __free_page(page); >> +} >> > > Why is get_page()/__free_page() needed here? Isn't kvm_release_page_*() > sufficient? The other-cpus-tlb have to be flushed _before_ the page is visible in the host kernel freelist, otherwise other host-cpus with tlbs still mapping the page with write-access would be able to modify the page even after it's queued in the freelist. The mmu_notifier are called in places like munmap where the __free_page will not be a put_page but a real __free_page. Furthermore kvm_release_page_ aren't calling __free_page but put_page that would leak ram in those paths (mostly invalidate_range). I'd rather not depend on the mmu_notifiers always being invoked with an additional reference count on the page (in addition to the spte reference count). The ->invalidate_* methods might be the ones that put the page in the freelist. ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/