From: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
To: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Avi Kivity <avi@redhat.com>, LKML <linux-kernel@vger.kernel.org>,
	KVM <kvm@vger.kernel.org>
Subject: Re: [PATCH v4 06/10] KVM: MMU: fast path of handling guest page fault
Date: Fri, 27 Apr 2012 13:53:09 +0800
Message-ID: <4F9A3445.2060305@linux.vnet.ibm.com>
In-Reply-To: <20120426234535.GA5057@amt.cnet>

On 04/27/2012 07:45 AM, Marcelo Tosatti wrote:


>> +static bool
>> +fast_pf_fix_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
>> +		  u64 *sptep, u64 spte)
>> +{
>> +	gfn_t gfn;
>> +
>> +	spin_lock(&vcpu->kvm->mmu_lock);
>> +
>> +	/* The spte has been changed. */
>> +	if (*sptep != spte)
>> +		goto exit;
>> +
>> +	gfn = kvm_mmu_page_get_gfn(sp, sptep - sp->spt);
>> +
>> +	*sptep = spte | PT_WRITABLE_MASK;
>> +	mark_page_dirty(vcpu->kvm, gfn);
>> +
>> +exit:
>> +	spin_unlock(&vcpu->kvm->mmu_lock);
> 
> There was a misunderstanding. 


Sorry, Marcelo! It is still not clear to me. :(

> The suggestion is to change the codepaths that today assume, as a side
> effect of holding mmu_lock, that sptes are not updated or read
> concurrently, before attempting to introduce any lockless spte
> updates.
> 
> example)
> 
>         u64 mask, old_spte = *sptep;
> 
>         if (!spte_has_volatile_bits(old_spte) || (new_spte & mask) == mask)
>                 __update_clear_spte_fast(sptep, new_spte);
>         else
>                 old_spte = __update_clear_spte_slow(sptep, new_spte);
> 
> The local old_spte copy might be stale by the
> time "spte_has_volatile_bits(old_spte)" reads the writable bit.
> 
> example)
> 
> 
> VCPU0                                       VCPU1
> set_spte
> mmu_spte_update decides fast write 
> mov newspte to sptep
> (non-locked write instruction)
> newspte in store buffer
> 
>                                             lockless spte read path reads stale value
> 
> spin_unlock(mmu_lock) 
> 
> Depending on what stale value CPU1 read and what decision it took,
> this can be a problem, say with the new bits (and we do not want to
> verify every single case). Should the spte write from inside mmu_lock
> always be atomic now?
> 
> So these details must be exposed to the reader; they are hidden now
> (note that mmu_spte_update is already a maze, and it's getting too
> deep).
> 


Actually, in this patch, all spte updates are done under mmu-lock; we
read the spte locklessly, but the spte is verified again after mmu-lock
is held:

+	spin_lock(&vcpu->kvm->mmu_lock);
+
+	/* The spte has been changed. */
+	if (*sptep != spte)
+		goto exit;
+
+	gfn = kvm_mmu_page_get_gfn(sp, sptep - sp->spt);
+
+	*sptep = spte | PT_WRITABLE_MASK;
+	mark_page_dirty(vcpu->kvm, gfn);
+
+exit:
+	spin_unlock(&vcpu->kvm->mmu_lock);

Is that not the same as reading and updating the spte both under
mmu-lock?
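
For comparison, here is the same pattern as a minimal standalone
userspace sketch (illustrative names only, not kernel code): the entry
is read locklessly, and the update is committed only if the entry still
holds that value once the lock is taken.

#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * 'snapshot' was read without the lock; apply the update only if the
 * entry is unchanged after the lock is acquired, otherwise abandon it.
 */
static bool fix_entry(uint64_t *entryp, uint64_t snapshot,
		      uint64_t set_bits)
{
	bool fixed = false;

	pthread_mutex_lock(&lock);
	if (*entryp == snapshot) {
		*entryp = snapshot | set_bits;
		fixed = true;
	}
	pthread_mutex_unlock(&lock);

	return fixed;
}

If the entry changed between the lockless read and taking the lock, the
update is simply dropped, so the lockless read never feeds a stale
value into the final write.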

Hmm, is this what you want?

+static bool
+fast_pf_fix_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
+		  u64 *sptep, u64 spte)
+{
+	gfn_t gfn;
+
+	gfn = kvm_mmu_page_get_gfn(sp, sptep - sp->spt);
+	mmu_spte_update(sptep, spte | PT_WRITABLE_MASK);
+	mark_page_dirty(vcpu->kvm, gfn);
+
+	return true;
+}
+
+/*
+ * Return value:
+ * - true: let the vcpu access the same address again.
+ * - false: let the real page fault path fix it.
+ */
+static bool fast_page_fault(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn,
+			    int level, u32 error_code)
+{
+	struct kvm_shadow_walk_iterator iterator;
+	struct kvm_mmu_page *sp;
+	bool ret = false;
+	u64 spte = 0ull;
+
+	if (!page_fault_can_be_fast(vcpu, gfn, error_code))
+		return false;
+
+	spin_lock(&vcpu->kvm->mmu_lock);
+
+	for_each_shadow_entry(vcpu, gva, iterator) {
+		spte = *iterator.sptep;
+
+		if (!is_shadow_present_pte(spte) || iterator.level < level)
+			break;
+	}
+
+	/*
+	 * If the mapping has been changed, let the vcpu fault on the
+	 * same address again.
+	 */
+	if (!is_rmap_spte(spte)) {
+		ret = true;
+		goto exit;
+	}
+
+	if (!is_last_spte(spte, level))
+		goto exit;
+
+	/*
+	 * Check if it is a spurious fault caused by the TLB being
+	 * lazily flushed.
+	 *
+	 * There is no need to check the access bits of upper-level
+	 * table entries since they are always ACC_ALL.
+	 */
+	if (is_writable_pte(spte)) {
+		ret = true;
+		goto exit;
+	}
+
+	/*
+	 * Currently, to simplify the code, only sptes write-protected
+	 * by dirty logging can be fast-fixed.
+	 */
+	if (!spte_wp_by_dirty_log(spte))
+		goto exit;
+
+	sp = page_header(__pa(iterator.sptep));
+
+	ret = fast_pf_fix_spte(vcpu, sp, iterator.sptep, spte);
+
+exit:
+	spin_unlock(&vcpu->kvm->mmu_lock);
+
+	return ret;
+}
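
If the remaining concern is that the write itself must be atomic with
respect to lockless readers, a compare-and-swap would make that
explicit. Again a hedged userspace sketch, not the patch itself; it
assumes a reader only ever needs to observe either the old or the new
value, never a torn one:

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Atomically set bits in the entry, succeeding only if it still holds
 * the value read locklessly.  A concurrent lockless reader observes
 * the old or the new value, never a partial store.
 */
static bool set_bits_atomic(_Atomic uint64_t *entryp,
			    uint64_t snapshot, uint64_t set_bits)
{
	uint64_t expected = snapshot;

	return atomic_compare_exchange_strong(entryp, &expected,
					      snapshot | set_bits);
}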
