From: Izik Eidus
Subject: Re: [RFC] Expose infrastructure for unpinning guest memory
Date: Thu, 11 Oct 2007 23:59:18 +0200
Message-ID: <470E9CB6.4030107@qumranet.com>
In-Reply-To: <1192138344500-git-send-email-aliguori-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
References: <1192138344500-git-send-email-aliguori@us.ibm.com>
To: Anthony Liguori
Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org, Avi Kivity
List-Id: kvm.vger.kernel.org

Anthony Liguori wrote:
> Now that we have userspace memory allocation, I wanted to play with
> ballooning.  The idea is that when a guest "balloons" down, we simply
> unpin the underlying physical memory and the host kernel may or may
> not swap it.  To reclaim ballooned memory, the guest can just start
> using it and we'll pin it on demand.
>
> The following patch is a stab at providing the right infrastructure
> for pinning and automatic repinning.  I don't have a lot of comfort
> in the MMU code, so I thought I'd get some feedback before going much
> further.
>
> gpa_to_hpa is a little awkward to hook, but it seems like the right
> place in the code.  I'm most uncertain about the SMP safety of the
> unpinning.  Presumably, I have to hold the kvm lock around the
> mmu_unshadow and page_cache_release calls to ensure that another VCPU
> doesn't fault the page back in after mmu_unshadow?
>
> Feedback would be greatly appreciated!
>
> diff --git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h
> index 4a52d6e..8abe770 100644
> --- a/drivers/kvm/kvm.h
> +++ b/drivers/kvm/kvm.h
> @@ -409,6 +409,7 @@ struct kvm_memory_slot {
>  	unsigned long *rmap;
>  	unsigned long *dirty_bitmap;
>  	int user_alloc; /* user allocated memory */
> +	unsigned long userspace_addr;
>  };
>
>  struct kvm {
> @@ -652,6 +653,7 @@ int kvm_mmu_unprotect_page_virt(struct kvm_vcpu *vcpu, gva_t gva);
>  void __kvm_mmu_free_some_pages(struct kvm_vcpu *vcpu);
>  int kvm_mmu_load(struct kvm_vcpu *vcpu);
>  void kvm_mmu_unload(struct kvm_vcpu *vcpu);
> +int kvm_mmu_unpin(struct kvm *kvm, gfn_t gfn);
>
>  int kvm_emulate_hypercall(struct kvm_vcpu *vcpu);
>
> diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
> index a0f8366..74105d1 100644
> --- a/drivers/kvm/kvm_main.c
> +++ b/drivers/kvm/kvm_main.c
> @@ -774,6 +774,7 @@ static int kvm_vm_ioctl_set_memory_region(struct kvm *kvm,
>  		unsigned long pages_num;
>
>  		new.user_alloc = 1;
> +		new.userspace_addr = mem->userspace_addr;
>  		down_read(&current->mm->mmap_sem);
>
>  		pages_num = get_user_pages(current, current->mm,
> @@ -1049,12 +1050,36 @@ struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, gfn_t gfn)
>  struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn)
>  {
>  	struct kvm_memory_slot *slot;
> +	struct page *page;
> +	uint64_t slot_index;
>
>  	gfn = unalias_gfn(kvm, gfn);
>  	slot = __gfn_to_memslot(kvm, gfn);
>  	if (!slot)
>  		return NULL;
> -	return slot->phys_mem[gfn - slot->base_gfn];
> +
> +	slot_index = gfn - slot->base_gfn;
> +	page = slot->phys_mem[slot_index];
> +	if (unlikely(page == NULL)) {
> +		unsigned long pages_num;
> +
> +		down_read(&current->mm->mmap_sem);
> +
> +		pages_num = get_user_pages(current, current->mm,
> +					   slot->userspace_addr +
> +					   (slot_index << PAGE_SHIFT),
> +					   1, 1, 0,
> +					   &slot->phys_mem[slot_index],
> +					   NULL);
> +
> +		up_read(&current->mm->mmap_sem);
> +
> +		if (pages_num != 1)
> +			page = NULL;
> +		else
> +			page = slot->phys_mem[slot_index];
> +	}
> +
> +	return page;
>  }
>  EXPORT_SYMBOL_GPL(gfn_to_page);
>
> diff --git a/drivers/kvm/mmu.c b/drivers/kvm/mmu.c
> index f52604a..1820816 100644
> --- a/drivers/kvm/mmu.c
> +++ b/drivers/kvm/mmu.c
> @@ -25,6 +25,7 @@
>  #include
>  #include
>  #include
> +#include
>
>  #include
>  #include
> @@ -820,6 +821,33 @@ static void mmu_unshadow(struct kvm *kvm, gfn_t gfn)
>  	}
>  }
>
> +int kvm_mmu_unpin(struct kvm *kvm, gfn_t gfn)
> +{
> +	struct kvm_memory_slot *slot;
> +	struct page *page;
> +
> +	/* FIXME for each active vcpu */
> +
> +	gfn = unalias_gfn(kvm, gfn);
> +	slot = gfn_to_memslot(kvm, gfn);
> +	if (!slot)
> +		return -EINVAL;
> +
> +	/* FIXME: do we need to hold a lock here? */
> +
> +	/* Remove page from shadow MMU and unpin page */
> +	mmu_unshadow(kvm, gfn);
> +	page = slot->phys_mem[gfn - slot->base_gfn];
> +	if (page) {
> +		if (!PageReserved(page))
> +			SetPageDirty(page);
> +		page_cache_release(page);
> +		slot->phys_mem[gfn - slot->base_gfn] = NULL;
> +	}
> +
> +	return 0;
> +}
> +
>  static void page_header_update_slot(struct kvm *kvm, void *pte, gpa_t gpa)
>  {
>  	int slot = memslot_id(kvm, gfn_to_memslot(kvm, gpa >> PAGE_SHIFT));

heh, I am working on a similar patch, and our gfn_to_page and
kvm_memory_slot changes match even down to the variable names :)

A few things you still have to do to make this work:

- make gfn_to_page a function that is always safe to call (have it
  return a global bad_page on failure; I have a patch for this if you
  want it).  A rough sketch is below.

- hack kvm_read_guest_page / kvm_write_guest_page / kvm_clear_guest_page
  to do put_page() after they are done using the page; see the second
  sketch below.

- hack the rmap to keep a reverse mapping for every present pte, and
  put_page() the pages at rmap_remove(); see the last sketch below.

And that's about all it takes to make this work.
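
For the first point, here is a rough, untested sketch of the bad_page
idea.  bad_page and __gfn_to_page are names I am making up here;
__gfn_to_page stands for the lookup-and-repin logic your patch already
has in gfn_to_page, factored out:

	/* one zeroed fallback page, allocated once at module init */
	static struct page *bad_page;

	static int kvm_alloc_bad_page(void)
	{
		bad_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
		if (!bad_page)
			return -ENOMEM;
		return 0;
	}

	struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn)
	{
		struct page *page = __gfn_to_page(kvm, gfn);

		if (!page) {
			/*
			 * Hand out an extra reference so the caller's
			 * eventual put_page() stays balanced.
			 */
			get_page(bad_page);
			return bad_page;
		}
		return page;
	}

With that in place, callers never see NULL and can unconditionally
kmap() and put_page() whatever they get back.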
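
For the accessors, the shape I have in mind is that every
gfn_to_page() is paired with a put_page() once the copy is done.
Something like this (only kvm_read_guest_page is shown, the
write/clear variants are the same shape; this assumes the bad_page
change above, so there is no NULL check, and error handling is
elided):

	int kvm_read_guest_page(struct kvm *kvm, gfn_t gfn, void *data,
				int offset, int len)
	{
		struct page *page;
		void *kaddr;

		page = gfn_to_page(kvm, gfn);	/* pins the page (or bad_page) */
		kaddr = kmap_atomic(page, KM_USER0);
		memcpy(data, kaddr + offset, len);
		kunmap_atomic(kaddr, KM_USER0);
		put_page(page);			/* drop the pin taken above */
		return 0;
	}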
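
And for the rmap side: once every present spte has an rmap entry,
rmap_remove() is the one place that knows a translation is going away,
so it can transfer the dirty state and drop the page reference.
Roughly like this (PT64_BASE_ADDR_MASK and is_writeble_pte() are the
existing mmu.c helpers; the rest is only the shape I have in mind, not
a final patch):

	static void rmap_remove(struct kvm *kvm, u64 *spte)
	{
		struct page *page;

		/* ... existing code that unlinks *spte from the rmap chain ... */

		page = pfn_to_page((*spte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT);
		if (!PageReserved(page) && is_writeble_pte(*spte))
			SetPageDirty(page);	/* a writable mapping may have dirtied it */
		put_page(page);			/* drop the get_user_pages() reference */
	}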