From: Marc Zyngier
Subject: Re: [PATCH 4/5] add 2nd stage page fault handling during live migration
Date: Thu, 17 Apr 2014 09:17:53 +0100
Message-ID: <8761m8xtce.fsf@approximate.cambridge.arm.com>
References: <534F2FAF.5090009@samsung.com>
To: Mario Smarduch
Cc: kvmarm@lists.cs.columbia.edu, christoffer.dall@linaro.org, kvm@vger.kernel.org, 이정석, 정성진
In-Reply-To: <534F2FAF.5090009@samsung.com> (Mario Smarduch's message of "Thu, 17 Apr 2014 02:34:39 +0100")

On Thu, Apr 17 2014 at 2:34:39 am BST, Mario Smarduch wrote:
> Additional logic to handle second-stage page faults during migration.
> Primarily, page faults are prevented from creating huge pages.
>
> Signed-off-by: Mario Smarduch
> ---
>  arch/arm/kvm/mmu.c | 33 +++++++++++++++++++++++++++++++--
>  1 file changed, 31 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index 47bec1c..ebec33c 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -839,6 +839,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
>  	struct vm_area_struct *vma;
>  	pfn_t pfn;
> +	bool migration_active;
>
>  	write_fault = kvm_is_write_fault(kvm_vcpu_get_hsr(vcpu));
>  	if (fault_status == FSC_PERM && !write_fault) {
> @@ -890,12 +891,22 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  		return -EFAULT;
>
>  	spin_lock(&kvm->mmu_lock);
> +	/* Read inside the lock to prevent a race while the whole VM is
> +	 * initially being write-protected; prevent a pmd update if it is
> +	 * split up.
> +	 */
> +	migration_active = vcpu->kvm->arch.migration_in_progress;
> +
>  	if (mmu_notifier_retry(kvm, mmu_seq))
>  		goto out_unlock;
> -	if (!hugetlb && !force_pte)
> +
> +	/* During migration, don't rebuild huge pages */
> +	if (!hugetlb && !force_pte && !migration_active)
>  		hugetlb = transparent_hugepage_adjust(&pfn, &fault_ipa);
>
> -	if (hugetlb) {
> +	/* Steer away from installing PMDs if migration is in progress or has
> +	 * failed, or if this is an initial page fault. Migrating huge pages
> +	 * is too slow.
> +	 */
> +	if (!migration_active && hugetlb) {
>  		pmd_t new_pmd = pfn_pmd(pfn, PAGE_S2);
>  		new_pmd = pmd_mkhuge(new_pmd);
>  		if (writable) {
> @@ -907,6 +918,22 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	} else {
>  		pte_t new_pte = pfn_pte(pfn, PAGE_S2);
>  		if (writable) {
> +			/* First, while migration is in progress, convert a
> +			 * huge page pfn to a normal 4k page pfn.
> +			 * Second, in migration mode, in the rare case where
> +			 * splitting of huge pages fails, check whether the
> +			 * pmd maps a huge page; if so, clear it so that
> +			 * stage2_set_pte() can map in a small page.
> +			 */
> +			if (migration_active && hugetlb) {
> +				pmd_t *pmd;
> +				pfn += (fault_ipa >> PAGE_SHIFT) &
> +					(PTRS_PER_PMD-1);

Shouldn't that be "pfn += pte_index(fault_ipa);"?

> +				new_pte = pfn_pte(pfn, PAGE_S2);
> +				pmd = stage2_get_pmd(kvm, NULL, fault_ipa);
> +				if (pmd && kvm_pmd_huge(*pmd))
> +					clear_pmd_entry(kvm, pmd, fault_ipa);
> +			}
>  			kvm_set_s2pte_writable(&new_pte);
>  			kvm_set_pfn_dirty(pfn);
>  		}
> @@ -914,6 +941,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  		ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte, false);
>  	}
>
> +	if (writable)

Shouldn't that be done only when migration is active?

> +		mark_page_dirty(kvm, gfn);
>
>  out_unlock:
>  	spin_unlock(&kvm->mmu_lock);

-- 
Jazz is not dead. It just smells funny.