All of lore.kernel.org
 help / color / mirror / Atom feed
From: m.smarduch@samsung.com (Mario Smarduch)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH RESEND v15 07/10] KVM: arm: page logging 2nd stage fault handling
Date: Tue, 13 Jan 2015 09:42:34 -0800	[thread overview]
Message-ID: <54B5590A.5060602@samsung.com> (raw)
In-Reply-To: <20150112194318.GA26222@cbox>

On 01/12/2015 11:43 AM, Christoffer Dall wrote:
> On Mon, Jan 12, 2015 at 11:04:45AM -0800, Mario Smarduch wrote:
> 
> [...]
> 
>>>>>> @@ -1059,12 +1104,35 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>>>>>  	if (is_error_pfn(pfn))
>>>>>>  		return -EFAULT;
>>>>>>  
>>>>>> -	if (kvm_is_device_pfn(pfn))
>>>>>> +	if (kvm_is_device_pfn(pfn)) {
>>>>>>  		mem_type = PAGE_S2_DEVICE;
>>>>>> +		set_pte_flags = KVM_S2PTE_FLAG_IS_IOMAP;
>>>>>> +	}
>>>>>>  
>>>>>>  	spin_lock(&kvm->mmu_lock);
>>>>>>  	if (mmu_notifier_retry(kvm, mmu_seq))
>>>>>>  		goto out_unlock;
>>>>>> +
>>>>>> +	/*
>>>>>> +	 * When logging is enabled general page fault handling changes:
>>>>>> +	 * -  Writable huge pages are dissolved on a read or write fault.
>>>>>
>>>>> why dissolve huge pages on a read fault?
>>>>
>>>> What I noticed on write you would dissolve, on read you
>>>> rebuild THPs, flip back and forth like that, performance
>>>> & convergence was really bad.
>>>
>>> ah, that makes sense, we should probably indicate that reasoning
>>> somehow.  In fact, what threw me off was the use of the word "dissolve
>>> huge pages" which is not really what you're doing on a read fault, there
>>> you are just never adjusting to huge pages.
>>>
>>> I'm wondering why that would slow things down much though, the only cost
>>> would be the extra tlb invalidation and replacing the PMD on a
>>> subsequent write fault, but I trust your numbers nevertheless.
>>
>> If I understand correctly -
>> you do few writes, dissolve a huge page insert pte TLB entries,
>> then a read page fault installs a pmd clears the TLB cache
>> for that range, and it repeats over. Appears like you
>> need to constantly re-fault pte TLBs on writes to huge
>> page range.
> 
> that makes good sense, thanks for the explanation.
> 
> [...]
> 
>>>>>  	} else {
>>>>> +		unsigned long flags = 0;
>>>>>  		pte_t new_pte = pfn_pte(pfn, mem_type);
>>>>> +
>>>>>  		if (writable) {
>>>>>  			kvm_set_s2pte_writable(&new_pte);
>>>>>  			kvm_set_pfn_dirty(pfn);
>>>>>  		}
>>>>>  		coherent_cache_guest_page(vcpu, hva, PAGE_SIZE,
>>>>>  					  fault_ipa_uncached);
>>>>> -		ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte,
>>>>> -			pgprot_val(mem_type) == pgprot_val(PAGE_S2_DEVICE));
>>>>> +
>>>>> +		if (pgprot_val(mem_type) == pgprot_val(PAGE_S2_DEVICE))
>>>>> +			flags |= KVM_S2PTE_FLAG_IS_IOMAP;
>>>>> +
>>>>> +		if (memslot_is_logging(memslot))
>>>>> +			flags |= KVM_S2_FLAG_LOGGING_ACTIVE;
>>>> Now that it either IOMAP or LOGGING_ACTIVE do we need to acumulate flags?
>>>> Although we don't know if device mappings will be handled here.
>>>>
>>>
>>> so forget all I said about this in the past, I confused the code
>>> checking for !cache with the iomap flag.
>>>
>>> So, I think you can always safeful assume that stage2_get_pmd() gives you
>>> something valid back when you have the LOGGING flag set, because you
>>> always call the function with a valid cache when the LOGGING flag is
>>> set.  It could be worth adding the following to stage2_set_pte():
>>>
>>> VM_BUG_ON((flags & KVM_S2_FLAG_LOGGING_ACTIVE) && !cache)
>>
>> I see ok, thanks for clearing that up.
>>
>>>
>>> As for this code, the IOMAP flag's only effect is that we return -EFAULT
>>> if we are seeing an existing PTE for the faulting address.  This would
>>> no longer be valid if we allow logging dirty device memory pages, so we
>> Sorry, do you mean allow or disallow?
> 
> if we (by these patches) allow logging dirty pages for device memory,
> then we...
> 
>>
>>> really need to think about if there's any conceivable use case for this?

No I can't think of any use case to log Device address space.

So I could move forward - drop the IOMAP flag, and add the
VM_BUG_ON to stage2_set_pte().

Thanks.

>>>
>>> It doesn't really make sense to me, so I would suggest that we never
>>> enable logging for pages that return kvm_is_device_pfn().
>>>
>>> Thanks,
>>> -Christoffer
>>>
>>

WARNING: multiple messages have this Message-ID (diff)
From: Mario Smarduch <m.smarduch@samsung.com>
To: Christoffer Dall <christoffer.dall@linaro.org>
Cc: marc.zyngier@arm.com, pbonzini@redhat.com,
	kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH RESEND v15 07/10] KVM: arm: page logging 2nd stage fault handling
Date: Tue, 13 Jan 2015 09:42:34 -0800	[thread overview]
Message-ID: <54B5590A.5060602@samsung.com> (raw)
In-Reply-To: <20150112194318.GA26222@cbox>

On 01/12/2015 11:43 AM, Christoffer Dall wrote:
> On Mon, Jan 12, 2015 at 11:04:45AM -0800, Mario Smarduch wrote:
> 
> [...]
> 
>>>>>> @@ -1059,12 +1104,35 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>>>>>  	if (is_error_pfn(pfn))
>>>>>>  		return -EFAULT;
>>>>>>  
>>>>>> -	if (kvm_is_device_pfn(pfn))
>>>>>> +	if (kvm_is_device_pfn(pfn)) {
>>>>>>  		mem_type = PAGE_S2_DEVICE;
>>>>>> +		set_pte_flags = KVM_S2PTE_FLAG_IS_IOMAP;
>>>>>> +	}
>>>>>>  
>>>>>>  	spin_lock(&kvm->mmu_lock);
>>>>>>  	if (mmu_notifier_retry(kvm, mmu_seq))
>>>>>>  		goto out_unlock;
>>>>>> +
>>>>>> +	/*
>>>>>> +	 * When logging is enabled general page fault handling changes:
>>>>>> +	 * -  Writable huge pages are dissolved on a read or write fault.
>>>>>
>>>>> why dissolve huge pages on a read fault?
>>>>
>>>> What I noticed on write you would dissolve, on read you
>>>> rebuild THPs, flip back and forth like that, performance
>>>> & convergence was really bad.
>>>
>>> ah, that makes sense, we should probably indicate that reasoning
>>> somehow.  In fact, what threw me off was the use of the word "dissolve
>>> huge pages" which is not really what you're doing on a read fault, there
>>> you are just never adjusting to huge pages.
>>>
>>> I'm wondering why that would slow things down much though, the only cost
>>> would be the extra tlb invalidation and replacing the PMD on a
>>> subsequent write fault, but I trust your numbers nevertheless.
>>
>> If I understand correctly -
>> you do few writes, dissolve a huge page insert pte TLB entries,
>> then a read page fault installs a pmd clears the TLB cache
>> for that range, and it repeats over. Appears like you
>> need to constantly re-fault pte TLBs on writes to huge
>> page range.
> 
> that makes good sense, thanks for the explanation.
> 
> [...]
> 
>>>>>  	} else {
>>>>> +		unsigned long flags = 0;
>>>>>  		pte_t new_pte = pfn_pte(pfn, mem_type);
>>>>> +
>>>>>  		if (writable) {
>>>>>  			kvm_set_s2pte_writable(&new_pte);
>>>>>  			kvm_set_pfn_dirty(pfn);
>>>>>  		}
>>>>>  		coherent_cache_guest_page(vcpu, hva, PAGE_SIZE,
>>>>>  					  fault_ipa_uncached);
>>>>> -		ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte,
>>>>> -			pgprot_val(mem_type) == pgprot_val(PAGE_S2_DEVICE));
>>>>> +
>>>>> +		if (pgprot_val(mem_type) == pgprot_val(PAGE_S2_DEVICE))
>>>>> +			flags |= KVM_S2PTE_FLAG_IS_IOMAP;
>>>>> +
>>>>> +		if (memslot_is_logging(memslot))
>>>>> +			flags |= KVM_S2_FLAG_LOGGING_ACTIVE;
>>>> Now that it either IOMAP or LOGGING_ACTIVE do we need to acumulate flags?
>>>> Although we don't know if device mappings will be handled here.
>>>>
>>>
>>> so forget all I said about this in the past, I confused the code
>>> checking for !cache with the iomap flag.
>>>
>>> So, I think you can always safeful assume that stage2_get_pmd() gives you
>>> something valid back when you have the LOGGING flag set, because you
>>> always call the function with a valid cache when the LOGGING flag is
>>> set.  It could be worth adding the following to stage2_set_pte():
>>>
>>> VM_BUG_ON((flags & KVM_S2_FLAG_LOGGING_ACTIVE) && !cache)
>>
>> I see ok, thanks for clearing that up.
>>
>>>
>>> As for this code, the IOMAP flag's only effect is that we return -EFAULT
>>> if we are seeing an existing PTE for the faulting address.  This would
>>> no longer be valid if we allow logging dirty device memory pages, so we
>> Sorry, do you mean allow or disallow?
> 
> if we (by these patches) allow logging dirty pages for device memory,
> then we...
> 
>>
>>> really need to think about if there's any conceivable use case for this?

No I can't think of any use case to log Device address space.

So I could move forward - drop the IOMAP flag, and add the
VM_BUG_ON to stage2_set_pte().

Thanks.

>>>
>>> It doesn't really make sense to me, so I would suggest that we never
>>> enable logging for pages that return kvm_is_device_pfn().
>>>
>>> Thanks,
>>> -Christoffer
>>>
>>


  reply	other threads:[~2015-01-13 17:42 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-01-10  4:17 [PATCH RESEND v15 07/10] KVM: arm: page logging 2nd stage fault handling Mario Smarduch
2015-01-10  4:17 ` Mario Smarduch
2015-01-11 14:00 ` Christoffer Dall
2015-01-11 14:00   ` Christoffer Dall
2015-01-12 16:27   ` Mario Smarduch
2015-01-12 16:27     ` Mario Smarduch
2015-01-12 17:49     ` Christoffer Dall
2015-01-12 17:49       ` Christoffer Dall
2015-01-12 19:04       ` Mario Smarduch
2015-01-12 19:04         ` Mario Smarduch
2015-01-12 19:43         ` Christoffer Dall
2015-01-12 19:43           ` Christoffer Dall
2015-01-13 17:42           ` Mario Smarduch [this message]
2015-01-13 17:42             ` Mario Smarduch
2015-01-14 10:33             ` Christoffer Dall
2015-01-14 10:33               ` Christoffer Dall
2015-01-13 23:14       ` Mario Smarduch
2015-01-13 23:14         ` Mario Smarduch
2015-01-14 10:32         ` Christoffer Dall
2015-01-14 10:32           ` Christoffer Dall
2015-01-14 23:10           ` Mario Smarduch
2015-01-14 23:10             ` Mario Smarduch
2015-01-15 10:20             ` Christoffer Dall
2015-01-15 10:20               ` Christoffer Dall
  -- strict thread matches above, loose matches on Subject: below --
2015-01-15  2:51 Mario Smarduch
2015-01-15  2:51 ` Mario Smarduch
2015-01-15 10:55 ` Christoffer Dall
2015-01-15 10:55   ` Christoffer Dall
2015-01-09  1:42 Mario Smarduch
2015-01-09  1:42 ` Mario Smarduch

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=54B5590A.5060602@samsung.com \
    --to=m.smarduch@samsung.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.