[PATCH v6 4/4] add 2nd stage page fault handling during live migration

All of lore.kernel.org
 help / color / mirror / Atom feed

From: m.smarduch@samsung.com (Mario Smarduch)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH v6 4/4] add 2nd stage page fault handling during live migration
Date: Tue, 27 May 2014 18:30:23 -0700	[thread overview]
Message-ID: <53853C2F.8080003@samsung.com> (raw)
In-Reply-To: <20140527201945.GD16428@lvm>

On 05/27/2014 01:19 PM, Christoffer Dall wrote:
> On Thu, May 15, 2014 at 11:27:31AM -0700, Mario Smarduch wrote:
>> This patch adds support for handling 2nd stage page faults during migration,
>> it disables faulting in huge pages, and splits up existing huge pages.
>>
>> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
>> ---
>>  arch/arm/kvm/mmu.c |   36 ++++++++++++++++++++++++++++++++++--
>>  1 file changed, 34 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>> index b939312..10e7bf6 100644
>> --- a/arch/arm/kvm/mmu.c
>> +++ b/arch/arm/kvm/mmu.c
>> @@ -1002,6 +1002,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>  	struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
>>  	struct vm_area_struct *vma;
>>  	pfn_t pfn;
>> +	bool migration_active;
>>  
>>  	write_fault = kvm_is_write_fault(kvm_vcpu_get_hsr(vcpu));
>>  	if (fault_status == FSC_PERM && !write_fault) {
>> @@ -1053,12 +1054,23 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>  		return -EFAULT;
>>  
>>  	spin_lock(&kvm->mmu_lock);
>> +
>> +	/*
>> +	 * Place inside lock to prevent race condition when whole VM is being
>> +	 * write proteced. Prevent race of huge page install when migration is
>> +	 * active.
>> +	 */
>> +	migration_active = vcpu->kvm->arch.migration_in_progress;
>> +
>>  	if (mmu_notifier_retry(kvm, mmu_seq))
>>  		goto out_unlock;
>> -	if (!hugetlb && !force_pte)
>> +
>> +	/* When migrating don't spend cycles coalescing huge pages */
>> +	if (!hugetlb && !force_pte && !migration_active)
>>  		hugetlb = transparent_hugepage_adjust(&pfn, &fault_ipa);
>>  
>> -	if (hugetlb) {
>> +	/* During migration don't install huge pages */
> 
> again, all this is not about migration per se, it's about when logging
> dirty pages, (which may be commonly used for migration).
> 

Yes that's true , I'll update but until recently (new RFC on qemu list) where
dirty logging is used for getting VM RSS or hot memory regions, I don't see any
other use case.

>> +	if (hugetlb && !migration_active) {
>>  		pmd_t new_pmd = pfn_pmd(pfn, PAGE_S2);
>>  		new_pmd = pmd_mkhuge(new_pmd);
>>  		if (writable) {
>> @@ -1069,6 +1081,23 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>  		ret = stage2_set_pmd_huge(kvm, memcache, fault_ipa, &new_pmd);
>>  	} else {
>>  		pte_t new_pte = pfn_pte(pfn, PAGE_S2);
>> +
>> +		/*
>> +		 * If pmd is  mapping a huge page then split it up into
>> +		 * small pages, when doing live migration.
>> +		 */
>> +		if (migration_active) {
>> +			pmd_t *pmd;
>> +			if (hugetlb) {
>> +				pfn += pte_index(fault_ipa);
>> +				gfn = fault_ipa >> PAGE_SHIFT;
>> +			}
> 
> how can you have hugetlb when we entered this else-clause conditional on
> having !hugetlb?
> 
- if(hugetlb && !migration_active)

forces all page faults to enter here while in migration. Huge page entries
are cleared and stage2_set_pte() splits the huge page, and installs the pte
for the fault_ipa. I placed that there since it flows with installing 
a pte as well as splitting a huge page. But your comment on performance
split up huge page vs. deferred  page faulting should move it out of here. 


>> +			new_pte = pfn_pte(pfn, PAGE_S2);
>> +			pmd = stage2_get_pmd(kvm, NULL, fault_ipa);
>> +			if (pmd && kvm_pmd_huge(*pmd))
>> +				clear_pmd_entry(kvm, pmd, fault_ipa);
> 
> If we have a huge pmd entry, how did we take a fault on there?  Would
> that be if a different CPU inserted a huge page entry since we got here,
> is this what you're trying to handle?
> 
> I'm confused.
> 

I thing this related to the above.

>> +		}
>> +
>>  		if (writable) {
>>  			kvm_set_s2pte_writable(&new_pte);
>>  			kvm_set_pfn_dirty(pfn);
>> @@ -1077,6 +1106,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>  		ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte, false);
>>  	}
>>  
>> +	/* Assuming 4k pages, set one bit/page in memslot dirty_bitmap[] */
> 
> Assuming? this makes me nervous.  The point is probably that it's
> harmless if we're not logging dirty pages, because then nobody reads teh
> data structure, and if we are logging, then we are mapping everything
> using 4K pages?
> 
> It's probably clearer code-wise to condition this on whether or not we
> are logging dirty page, and the branch is also likely to be much faster
> than the function call to mark_page_dirty.
> 

I'm not sure I get the point. The call is always safe, you either 
have old copy or new copy of memory slot with dirty_bitmap set or not set.
The log read is done while holding kvm slots_lock.

Is the comment related to performance, not supporting multiple page sizes,
or it's unsafe to call mark_page_dirty() under all circumstances, or 
something else? 


>> +	if (writable)
>> +		mark_page_dirty(kvm, gfn);
>>  
>>  out_unlock:
>>  	spin_unlock(&kvm->mmu_lock);
>> -- 
>> 1.7.9.5
>>
> 
> -Christoffer
>

WARNING: multiple messages have this Message-ID (diff)

From: Mario Smarduch <m.smarduch@samsung.com>
To: Christoffer Dall <christoffer.dall@linaro.org>
Cc: kvmarm@lists.cs.columbia.edu, marc.zyngier@arm.com,
	steve.capper@arm.com, kvm@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, gavin.guo@canonical.com,
	peter.maydell@linaro.org, jays.lee@samsung.com,
	sungjinn.chung@samsung.com
Subject: Re: [PATCH v6 4/4] add 2nd stage page fault handling during live migration
Date: Tue, 27 May 2014 18:30:23 -0700	[thread overview]
Message-ID: <53853C2F.8080003@samsung.com> (raw)
In-Reply-To: <20140527201945.GD16428@lvm>

On 05/27/2014 01:19 PM, Christoffer Dall wrote:
> On Thu, May 15, 2014 at 11:27:31AM -0700, Mario Smarduch wrote:
>> This patch adds support for handling 2nd stage page faults during migration,
>> it disables faulting in huge pages, and splits up existing huge pages.
>>
>> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
>> ---
>>  arch/arm/kvm/mmu.c |   36 ++++++++++++++++++++++++++++++++++--
>>  1 file changed, 34 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>> index b939312..10e7bf6 100644
>> --- a/arch/arm/kvm/mmu.c
>> +++ b/arch/arm/kvm/mmu.c
>> @@ -1002,6 +1002,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>  	struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
>>  	struct vm_area_struct *vma;
>>  	pfn_t pfn;
>> +	bool migration_active;
>>  
>>  	write_fault = kvm_is_write_fault(kvm_vcpu_get_hsr(vcpu));
>>  	if (fault_status == FSC_PERM && !write_fault) {
>> @@ -1053,12 +1054,23 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>  		return -EFAULT;
>>  
>>  	spin_lock(&kvm->mmu_lock);
>> +
>> +	/*
>> +	 * Place inside lock to prevent race condition when whole VM is being
>> +	 * write proteced. Prevent race of huge page install when migration is
>> +	 * active.
>> +	 */
>> +	migration_active = vcpu->kvm->arch.migration_in_progress;
>> +
>>  	if (mmu_notifier_retry(kvm, mmu_seq))
>>  		goto out_unlock;
>> -	if (!hugetlb && !force_pte)
>> +
>> +	/* When migrating don't spend cycles coalescing huge pages */
>> +	if (!hugetlb && !force_pte && !migration_active)
>>  		hugetlb = transparent_hugepage_adjust(&pfn, &fault_ipa);
>>  
>> -	if (hugetlb) {
>> +	/* During migration don't install huge pages */
> 
> again, all this is not about migration per se, it's about when logging
> dirty pages, (which may be commonly used for migration).
> 

Yes that's true , I'll update but until recently (new RFC on qemu list) where
dirty logging is used for getting VM RSS or hot memory regions, I don't see any
other use case.

>> +	if (hugetlb && !migration_active) {
>>  		pmd_t new_pmd = pfn_pmd(pfn, PAGE_S2);
>>  		new_pmd = pmd_mkhuge(new_pmd);
>>  		if (writable) {
>> @@ -1069,6 +1081,23 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>  		ret = stage2_set_pmd_huge(kvm, memcache, fault_ipa, &new_pmd);
>>  	} else {
>>  		pte_t new_pte = pfn_pte(pfn, PAGE_S2);
>> +
>> +		/*
>> +		 * If pmd is  mapping a huge page then split it up into
>> +		 * small pages, when doing live migration.
>> +		 */
>> +		if (migration_active) {
>> +			pmd_t *pmd;
>> +			if (hugetlb) {
>> +				pfn += pte_index(fault_ipa);
>> +				gfn = fault_ipa >> PAGE_SHIFT;
>> +			}
> 
> how can you have hugetlb when we entered this else-clause conditional on
> having !hugetlb?
> 
- if(hugetlb && !migration_active)

forces all page faults to enter here while in migration. Huge page entries
are cleared and stage2_set_pte() splits the huge page, and installs the pte
for the fault_ipa. I placed that there since it flows with installing 
a pte as well as splitting a huge page. But your comment on performance
split up huge page vs. deferred  page faulting should move it out of here. 


>> +			new_pte = pfn_pte(pfn, PAGE_S2);
>> +			pmd = stage2_get_pmd(kvm, NULL, fault_ipa);
>> +			if (pmd && kvm_pmd_huge(*pmd))
>> +				clear_pmd_entry(kvm, pmd, fault_ipa);
> 
> If we have a huge pmd entry, how did we take a fault on there?  Would
> that be if a different CPU inserted a huge page entry since we got here,
> is this what you're trying to handle?
> 
> I'm confused.
> 

I thing this related to the above.

>> +		}
>> +
>>  		if (writable) {
>>  			kvm_set_s2pte_writable(&new_pte);
>>  			kvm_set_pfn_dirty(pfn);
>> @@ -1077,6 +1106,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>  		ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte, false);
>>  	}
>>  
>> +	/* Assuming 4k pages, set one bit/page in memslot dirty_bitmap[] */
> 
> Assuming? this makes me nervous.  The point is probably that it's
> harmless if we're not logging dirty pages, because then nobody reads teh
> data structure, and if we are logging, then we are mapping everything
> using 4K pages?
> 
> It's probably clearer code-wise to condition this on whether or not we
> are logging dirty page, and the branch is also likely to be much faster
> than the function call to mark_page_dirty.
> 

I'm not sure I get the point. The call is always safe, you either 
have old copy or new copy of memory slot with dirty_bitmap set or not set.
The log read is done while holding kvm slots_lock.

Is the comment related to performance, not supporting multiple page sizes,
or it's unsafe to call mark_page_dirty() under all circumstances, or 
something else? 


>> +	if (writable)
>> +		mark_page_dirty(kvm, gfn);
>>  
>>  out_unlock:
>>  	spin_unlock(&kvm->mmu_lock);
>> -- 
>> 1.7.9.5
>>
> 
> -Christoffer
>

next prev parent reply	other threads:[~2014-05-28  1:30 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-05-15 18:27 [PATCH v6 0/4] live migration dirty bitmap support for ARMv7 Mario Smarduch
2014-05-15 18:27 ` Mario Smarduch
2014-05-15 18:27 ` [PATCH v6 1/4] add ARMv7 HYP API to flush VM TLBs without address param Mario Smarduch
2014-05-15 18:27   ` Mario Smarduch
2014-05-27 19:51   ` Christoffer Dall
2014-05-27 19:51     ` Christoffer Dall
2014-05-15 18:27 ` [PATCH v6 2/4] live migration support for initial write protect of VM Mario Smarduch
2014-05-15 18:27   ` Mario Smarduch
2014-05-27 19:58   ` Christoffer Dall
2014-05-27 19:58     ` Christoffer Dall
2014-05-27 20:15     ` Mario Smarduch
2014-05-27 20:15       ` Mario Smarduch
2014-05-27 20:20       ` Christoffer Dall
2014-05-27 20:20         ` Christoffer Dall
2014-05-15 18:27 ` [PATCH v6 3/4] live migration support for VM dirty log management Mario Smarduch
2014-05-15 18:27   ` Mario Smarduch
2014-05-27 20:12   ` Christoffer Dall
2014-05-27 20:12     ` Christoffer Dall
2014-05-27 21:55     ` Mario Smarduch
2014-05-27 21:55       ` Mario Smarduch
2014-05-28  9:08       ` Christoffer Dall
2014-05-28  9:08         ` Christoffer Dall
2014-05-28 17:59         ` Mario Smarduch
2014-05-28 17:59           ` Mario Smarduch
2014-05-15 18:27 ` [PATCH v6 4/4] add 2nd stage page fault handling during live migration Mario Smarduch
2014-05-15 18:27   ` Mario Smarduch
2014-05-27 20:19   ` Christoffer Dall
2014-05-27 20:19     ` Christoffer Dall
2014-05-28  1:30     ` Mario Smarduch [this message]
2014-05-28  1:30       ` Mario Smarduch
2014-05-28  8:09       ` Christoffer Dall
2014-05-28  8:09         ` Christoffer Dall
2014-05-28 17:55         ` Mario Smarduch
2014-05-28 17:55           ` Mario Smarduch
2014-05-28 18:42           ` Mario Smarduch
2014-05-28 18:42             ` Mario Smarduch
2014-05-29  2:02             ` Mario Smarduch
2014-05-29  2:02               ` Mario Smarduch
2014-05-29  8:42             ` Christoffer Dall
2014-05-29  8:42               ` Christoffer Dall
2014-05-29  8:51           ` Christoffer Dall
2014-05-29  8:51             ` Christoffer Dall
2014-05-29 17:08             ` Mario Smarduch
2014-05-29 17:08               ` Mario Smarduch
2014-05-29 17:57               ` Christoffer Dall
2014-05-29 17:57                 ` Christoffer Dall
2014-05-29 19:10                 ` Mario Smarduch
2014-05-29 19:10                   ` Mario Smarduch
2014-05-15 18:51 ` [PATCH v6 0/4] live migration dirty bitmap support for ARMv7 Christoffer Dall
2014-05-15 18:51   ` Christoffer Dall
2014-05-15 22:53   ` Mario Smarduch
2014-05-15 22:53     ` Mario Smarduch

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=53853C2F.8080003@samsung.com \
    --to=m.smarduch@samsung.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.