From mboxrd@z Thu Jan  1 00:00:00 1970
From: Paolo Bonzini
Subject: Re: Demand paging for VM on KVM
Date: Thu, 20 Mar 2014 14:18:50 +0100
Message-ID: <532AEABA.2070000@redhat.com>
References:
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Eric Northup, Andrea Arcangeli
To: Grigory Makarevich, kvm@vger.kernel.org, gleb@redhat.com
Return-path:
Received: from mx1.redhat.com ([209.132.183.28]:59982 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1756225AbaCTNS5 (ORCPT ); Thu, 20 Mar 2014 09:18:57 -0400
In-Reply-To:
Sender: kvm-owner@vger.kernel.org
List-ID:

On 20/03/2014 00:27, Grigory Makarevich wrote:
> Hi All,
>
> I have been exploring different ways to implement on-demand paging for
> VMs running in KVM.
>
> The core of the idea is to introduce an additional exit
> KVM_EXIT_MEMORY_NOT_PRESENT to inform VMM's user space to process
> access to "not yet present" guest's page.
> Each memory slot may be instructed to keep track of ondemand bit per
> page. If the page is marked as "ondemand", page fault will generate
> exit to the host's
> user-space with the information about the faulting page. Once the page
> is filled, VMM instructs the KVM to clear "ondemand" bit for the page.
>
> I have working prototype and would like to consider upstreaming
> corresponding KVM changes.
>
> To start up the discussion before sending the actual patch-set, I'd like
> to send the patch for the kvm's api.txt. Please, let me know what you
> think.

Hi,

Andrea Arcangeli is considering a similar infrastructure at the generic
mm level. Last time I discussed it with him, his idea was roughly to
have:

* a "userfaultfd" syscall that would take a memory range and return a
file descriptor; the file descriptor becomes readable when the first
access happens on a page in the region, and the read gives the address
of the access. Any thread that accesses a still-unmapped region remains
blocked until the address of the faulting page is written back to the
userfaultfd, or gets a SIGBUS if the userfaultfd is closed.

* a remap_anon_pages syscall that would be used in the userfaultfd I/O
handler to make the page accessible. The handler would build the page
in a "shadow" area with the actual contents of guest memory, and then
remap the shadow area onto the actual guest memory.

Andrea, please correct me if I got any of this wrong.

QEMU would use this infrastructure for post-copy migration and possibly
also for live snapshotting of the guests. The advantage of making this
generic rather than KVM-based is that QEMU could also use it in
system-emulation mode (and of course anything else needing a read
barrier could use it too).

Paolo