From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andrea Arcangeli
Subject: Re: mmu notifier spte pinning
Date: Tue, 22 Jul 2008 17:36:27 +0200
Message-ID: <20080722153627.GU13826@duo.random>
References:
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: kvm@vger.kernel.org
To: Sukanto Ghosh
Return-path:
Received: from host36-195-149-62.serverdedicati.aruba.it ([62.149.195.36]:40408
	"EHLO mx.cpushare.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1750869AbYGVPg2 (ORCPT );
	Tue, 22 Jul 2008 11:36:28 -0400
Content-Disposition: inline
In-Reply-To:
Sender: kvm-owner@vger.kernel.org
List-ID:

On Sun, Jul 20, 2008 at 09:43:53AM +0530, Sukanto Ghosh wrote:
> Hi all,
>
> I was going through the slides "Integrating KVM with the linux memory
> management" on MMU notifiers.
>
> can anyone tell me ... if spte is invisible to linux vm, then why does
> page->_mapcount increments when a spte mapping is created ? is it done
> explicitly by the kvm code (if so, why) ?

It's not the mapcount but the page_count (= page->_count) that is
incremented. The mapcount tells how many userland virtual addresses are
mapping the page (i.e. the number of mm/rmap.c reversible mappings). The
page_count is guaranteed >= mapcount, and it tells how many users are
using the page. So it's the page_count that has to account for shadow
pagetables too.

The moment the page is mapped by a spte, the page_count is increased by
the kvm spte establishment code (i.e. gfn_to_pfn). The page_count
increase is done inside get_user_pages, called by the kvm gfn_to_pfn
function, so it's pretty much explicitly done by kvm to prevent the page
from being swapped out while it's mapped by sptes. And it's also
invisible to the linux vm: all the Linux VM eventually sees is a
swapcache page that has no mappings (mapcount == 0) but still has a
page_count > 1, and in turn can't be swapped out (because it has more
users than just the swapcache).
> do references to any page from a kernel data-structure (spte in this
> case) don't involve access though page table entries (hence
> try_to_unmap() can't unmap those references) ? which makes any page
> referenced through kernel structures unswappable ?

Yes, the rmap layer has no way to know how to unshadow those pages with
a pinned page count. The mmu notifiers' whole objective is to keep the
sptes (i.e. the secondary mmu) in sync with the pte mappings (i.e. the
primary mmu). So when Linux decides to unmap a pte, all the sptes
associated with that pte are torn down at the same time. In turn the
page pinning isn't required anymore and all pages can be swapped out as
if they had never been pointed to by any spte.