From: Avi Kivity <avi@redhat.com>
To: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>,
LKML <linux-kernel@vger.kernel.org>, KVM <kvm@vger.kernel.org>
Subject: Re: [PATCH v5 6/9] KVM: MMU: fast path of handling guest page fault
Date: Wed, 23 May 2012 14:34:16 +0300 [thread overview]
Message-ID: <4FBCCB38.9080302@redhat.com> (raw)
In-Reply-To: <4FBCA60F.5040701@linux.vnet.ibm.com>
On 05/23/2012 11:55 AM, Xiao Guangrong wrote:
> If the the present bit of page fault error code is set, it indicates
> the shadow page is populated on all levels, it means what we do is
> only modify the access bit which can be done out of mmu-lock
>
> Currently, in order to simplify the code, we only fix the page fault
> caused by write-protect on the fast path
>
> +
> static bool spte_has_volatile_bits(u64 spte)
> {
> + /*
> + * Always atomicly update spte if it can be updated
> + * out of mmu-lock.
> + */
> + if (spte_can_lockless_update(spte))
> + return true;
This is a really subtle point, but is it really needed?
Lockless spte updates should always set the dirty and accessed bits, so
we won't be overwriting any volatile bits there.
> +
> if (!shadow_accessed_mask)
> return false;
>
> @@ -498,13 +517,7 @@ static bool mmu_spte_update(u64 *sptep, u64 new_spte)
> return ret;
> }
>
> - new_spte |= old_spte & shadow_dirty_mask;
> -
> - mask = shadow_accessed_mask;
> - if (is_writable_pte(old_spte))
> - mask |= shadow_dirty_mask;
> -
> - if (!spte_has_volatile_bits(old_spte) || (new_spte & mask) == mask)
> + if (!spte_has_volatile_bits(old_spte))
> __update_clear_spte_fast(sptep, new_spte);
> else
> old_spte = __update_clear_spte_slow(sptep, new_spte);
It looks like the old code is bad.. why can we ignore volatile bits in
the old spte? Suppose pfn is changing?
> +
> +static bool
> +fast_pf_fix_indirect_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
> + u64 *sptep, u64 spte, gfn_t gfn)
> +{
> + pfn_t pfn;
> + bool ret = false;
> +
> + /*
> + * For the indirect spte, it is hard to get a stable gfn since
> + * we just use a cmpxchg to avoid all the races which is not
> + * enough to avoid the ABA problem: the host can arbitrarily
> + * change spte and the mapping from gfn to pfn.
> + *
> + * What we do is call gfn_to_pfn_atomic to bind the gfn and the
> + * pfn because after the call:
> + * - we have held the refcount of pfn that means the pfn can not
> + * be freed and be reused for another gfn.
> + * - the pfn is writable that means it can not be shared by different
> + * gfn.
> + */
> + pfn = gfn_to_pfn_atomic(vcpu->kvm, gfn);
> +
> + /* The host page is swapped out or merged. */
> + if (mmu_invalid_pfn(pfn))
> + goto exit;
> +
> + ret = true;
> +
> + if (pfn != spte_to_pfn(spte))
> + goto exit;
> +
> + if (cmpxchg64(sptep, spte, spte | PT_WRITABLE_MASK) == spte)
> + mark_page_dirty(vcpu->kvm, gfn);
Isn't it better to kvm_release_pfn_dirty() here?
> +
> +exit:
> + kvm_release_pfn_clean(pfn);
> + return ret;
> +}
> +
> +> +
> +/*
> + * Return value:
> + * - true: let the vcpu to access on the same address again.
> + * - false: let the real page fault path to fix it.
> + */
> +static bool fast_page_fault(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn,
> + int level, u32 error_code)
> +{
> + struct kvm_shadow_walk_iterator iterator;
> + struct kvm_mmu_page *sp;
> + bool ret = false;
> + u64 spte = 0ull;
> +
> + if (!page_fault_can_be_fast(vcpu, gfn, error_code))
> + return false;
> +
No need to pass gfn here.
> + walk_shadow_page_lockless_begin(vcpu);
> + for_each_shadow_entry_lockless(vcpu, gva, iterator, spte)
> + if (!is_shadow_present_pte(spte) || iterator.level < level)
> + break;
> +
> + /*
> + * If the mapping has been changed, let the vcpu fault on the
> + * same address again.
> + */
> + if (!is_rmap_spte(spte)) {
> + ret = true;
> + goto exit;
> + }
> +
> + if (!is_last_spte(spte, level))
> + goto exit;
> +
> + /*
> + * Check if it is a spurious fault caused by TLB lazily flushed.
> + *
> + * Need not check the access of upper level table entries since
> + * they are always ACC_ALL.
> + */
> + if (is_writable_pte(spte)) {
> + ret = true;
> + goto exit;
> + }
> +
> + /*
> + * Currently, to simplify the code, only the spte write-protected
> + * by dirty-log can be fast fixed.
> + */
> + if (!spte_can_be_writable(spte))
> + goto exit;
> +
> + sp = page_header(__pa(iterator.sptep));
> +
> + if (sp->role.direct)
> + ret = fast_pf_fix_direct_spte(vcpu, sp, iterator.sptep, spte);
> + else
> + ret = fast_pf_fix_indirect_spte(vcpu, sp, iterator.sptep,
> + spte, gfn);
> +
> +exit:
> + walk_shadow_page_lockless_end(vcpu);
> +
> + return ret;
> +}
> +
--
error compiling committee.c: too many arguments to function
next prev parent reply other threads:[~2012-05-23 11:34 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-05-23 8:51 [PATCH v5 0/9] KVM: fast page fault Xiao Guangrong
2012-05-23 8:51 ` [PATCH v5 1/9] KVM: MMU: return bool in __rmap_write_protect Xiao Guangrong
2012-05-23 8:52 ` [PATCH v5 2/9] KVM: MMU: abstract spte write-protect Xiao Guangrong
2012-05-23 8:53 ` [PATCH v5 3/9] KVM: VMX: export PFEC.P bit on ept Xiao Guangrong
2012-05-23 8:53 ` [PATCH v5 4/9] KVM: MMU: fold tlb flush judgement into mmu_spte_update Xiao Guangrong
2012-05-23 8:55 ` [PATCH v5 5/9] KVM: MMU: introduce SPTE_MMU_WRITEABLE bit Xiao Guangrong
2012-05-23 8:55 ` [PATCH v5 6/9] KVM: MMU: fast path of handling guest page fault Xiao Guangrong
2012-05-23 11:34 ` Avi Kivity [this message]
2012-05-24 6:26 ` Xiao Guangrong
2012-05-24 8:25 ` Avi Kivity
2012-05-24 9:03 ` Xiao Guangrong
2012-05-23 8:56 ` [PATCH v5 7/9] KVM: MMU: trace fast " Xiao Guangrong
2012-05-23 8:57 ` [PATCH v5 8/9] KVM: MMU: fix kvm_mmu_pagetable_walk tracepoint Xiao Guangrong
2012-05-23 8:57 ` [PATCH v5 9/9] KVM: MMU: document mmu-lock and fast page fault Xiao Guangrong
2012-05-23 11:37 ` [PATCH v5 0/9] KVM: " Avi Kivity
2012-05-24 6:31 ` Xiao Guangrong
2012-05-24 7:19 ` Avi Kivity
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4FBCCB38.9080302@redhat.com \
--to=avi@redhat.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mtosatti@redhat.com \
--cc=xiaoguangrong@linux.vnet.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.