From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andrea Arcangeli
Subject: Re: mmu notifier spte pinning
Date: Tue, 22 Jul 2008 17:36:27 +0200
Message-ID: <20080722153627.GU13826@duo.random>
References:
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: kvm@vger.kernel.org
To: Sukanto Ghosh
Return-path:
Received: from host36-195-149-62.serverdedicati.aruba.it ([62.149.195.36]:40408
	"EHLO mx.cpushare.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1750869AbYGVPg2 (ORCPT );
	Tue, 22 Jul 2008 11:36:28 -0400
Content-Disposition: inline
In-Reply-To:
Sender: kvm-owner@vger.kernel.org
List-ID:

On Sun, Jul 20, 2008 at 09:43:53AM +0530, Sukanto Ghosh wrote:
> Hi all,
>
> I was going through the slides "Integrating KVM with the linux memory
> management" on MMU notifiers.
>
> can anyone tell me ... if spte is invisible to linux vm, then why does
> page->_mapcount increments when a spte mapping is created ? is it done
> explicitly by the kvm code (if so, why) ?

It's not the mapcount but the page_count (= page->_count) that is
incremented. The mapcount tells how many userland virtual addresses are
mapping the page (i.e. the number of mm/rmap.c reversible mappings). The
page_count is guaranteed >= mapcount, and it tells how many users are
using the page. So it's the page_count that has to account for shadow
pagetables too.

The moment the page is mapped by a spte, the page_count is increased by
the kvm spte establishment code (i.e. gfn_to_pfn). The page_count
increase is done inside get_user_pages, called by the kvm gfn_to_pfn
function, so it's pretty much explicitly done by kvm to prevent the page
from being swapped out while it's mapped by sptes. And it's also
invisible to the linux vm: all the Linux VM eventually sees is a
swapcache page that has no mappings (mapcount == 0) but still has a
page_count > 1, and in turn can't be swapped out (because it has more
users than just the swapcache).
> do references to any page from a kernel data-structure (spte in this
> case) don't involve access though page table entries (hence
> try_to_unmap() can't unmap those references) ? which makes any page
> referenced through kernel structures unswappable ?

Yes, the rmap layer has no way to know how to unshadow those pages with
a pinned page count. The mmu notifiers' whole objective is to keep the
sptes (i.e. the secondary mmu) in sync with the pte mappings (i.e. the
primary mmu). So when Linux decides to unmap a pte, all the sptes
associated with that pte are torn down at the same time. In turn the
page pinning isn't required anymore and all pages can be swapped out as
if they had never been pointed to by any spte.