From: Marcelo Tosatti <mtosatti@redhat.com>
To: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Cc: Avi Kivity <avi@redhat.com>, LKML <linux-kernel@vger.kernel.org>,
KVM <kvm@vger.kernel.org>
Subject: Re: [PATCH v2 21/22] KVM: MMU: mmio page fault support
Date: Thu, 23 Jun 2011 11:21:34 -0300 [thread overview]
Message-ID: <20110623142134.GA12181@amt.cnet> (raw)
In-Reply-To: <4E02B0BE.7070003@cn.fujitsu.com>
On Thu, Jun 23, 2011 at 11:19:26AM +0800, Xiao Guangrong wrote:
> Marcelo,
>
> Thanks for your review!
>
> On 06/23/2011 05:59 AM, Marcelo Tosatti wrote:
>
> >> static int is_large_pte(u64 pte)
> >> @@ -2123,6 +2158,9 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
> >> u64 spte, entry = *sptep;
> >> int ret = 0;
> >>
> >> + if (set_mmio_spte(sptep, gfn, pfn, pte_access))
> >> + return 0;
> >> +
> >
> >
> > Should zap all mmio sptes when creating new memory slots (current qemu
> > never exhibits that pattern, but it's not an unsupported scenario).
> > Easier to zap the whole mmu in that case.
> >
>
> OK, will do it in the next version.
>
> > For shadow you need to remove mmio sptes on invlpg and all mmio sptes
> > under CR3 (mmu_sync_roots), other than clearing the gva/gpa variables.
> >
>
> Oh, kvm_mmu_pte_write and invlpg do not zap mmio sptes; I will
> fix these, thanks for pointing it out.
>
> For mmu_sync_roots, we already handle it properly in FNAME(sync_page) in this patch:
>
> static bool sync_mmio_spte(u64 *sptep, gfn_t gfn, unsigned access,
> int *nr_present)
> {
> if (unlikely(is_mmio_spte(*sptep))) {
> if (gfn != get_mmio_spte_gfn(*sptep)) {
> mmu_spte_clear_no_track(sptep);
> return true;
> }
>
> (*nr_present)++;
> mark_mmio_spte(sptep, gfn, access);
> return true;
> }
>
> return false;
> }
>
> > Otherwise you can move the mmio info from an mmio spte back to
> > mmio_gva/mmio_gfn after a TLB flush, without rereading the guest
> > pagetable.
> >
>
> We do not read the guest page table when an mmio page fault occurs;
> we do it just as you say:
> - First, walk the shadow page table to get the mmio spte
> - Then, cache the mmio spte info into mmio_gva/mmio_gfn
>
> Did I miss your meaning?
I meant that guest pagetables must be read again on CR3 switch or invlpg
(GVA->GPA can change in that case), which only happens if mmio sptes are
zapped.
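To make the invalidation requirement concrete, here is a minimal sketch of such a per-vcpu mmio fast-path cache. The names (mmio_cache, cache_mmio_info, etc.) are illustrative, not the kernel's actual structures; the point is only that the cached gva->gfn translation must be dropped whenever the mapping may have changed (CR3 switch, invlpg), forcing the guest pagetable to be re-read:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-vcpu cache of the last emulated mmio access.
 * Field and function names are made up for illustration. */
struct mmio_cache {
    uint64_t mmio_gva;   /* guest virtual address of the mmio access */
    uint64_t mmio_gfn;   /* guest frame number it resolved to */
    bool     valid;
};

/* Remember the translation after an mmio fault has been resolved. */
static void cache_mmio_info(struct mmio_cache *c, uint64_t gva, uint64_t gfn)
{
    c->mmio_gva = gva;
    c->mmio_gfn = gfn;
    c->valid = true;
}

/* On CR3 switch or invlpg the gva->gpa mapping can change, so the
 * cached translation must be discarded and the guest pagetable
 * walked again on the next fault. */
static void invalidate_mmio_info(struct mmio_cache *c)
{
    c->valid = false;
}

/* Fast path: reuse the cached gfn only while the cache is valid. */
static bool mmio_info_hit(const struct mmio_cache *c, uint64_t gva)
{
    return c->valid && c->mmio_gva == gva;
}
```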
> >> + struct kvm_shadow_walk_iterator iterator;
> >> + u64 spte, ret = 0ull;
> >> +
> >> + walk_shadow_page_lockless_begin(vcpu);
> >> + for_each_shadow_entry_lockless(vcpu, addr, iterator, spte) {
> >> + if (iterator.level == 1) {
> >> + ret = spte;
> >> + break;
> >> + }
> >
> > Have you verified it is safe to walk sptes with parallel modifications
> > to them? Other than shadow page deletion which is in theory covered by
> > RCU. It would be good to have all cases written down.
> >
>
> Um, I think it is safe: while an spte is being written, we can see either
> the old spte or the new spte, and both cases are OK for us. It is just like
> the paging-structure cache on a real machine: the CPU can fetch the old spte
> even if the page structures are changed by other CPUs.
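The argument can be sketched in portable C11 terms (illustrative only, not the kernel's actual accessors): as long as each spte is read with a single aligned 64-bit load, a concurrent writer can only make the reader observe the complete old value or the complete new value, never a torn half-and-half spte:

```c
#include <stdatomic.h>
#include <stdint.h>

/* Sketch of the lockless-walk safety property. Each spte read is one
 * atomic 64-bit load, so it returns either the full old value or the
 * full new value of a racing update, analogous to a CPU's
 * paging-structure cache serving a stale entry until a TLB flush. */
static inline uint64_t spte_load_lockless(const _Atomic uint64_t *sptep)
{
    return atomic_load_explicit(sptep, memory_order_relaxed);
}

/* Writer side: a single atomic store, so no reader can see a mix. */
static inline void spte_store(_Atomic uint64_t *sptep, uint64_t spte)
{
    atomic_store_explicit(sptep, spte, memory_order_relaxed);
}
```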
> >> +
> >> + if (!is_shadow_present_pte(spte))
> >> + break;
> >> + }
> >> + walk_shadow_page_lockless_end(vcpu);
> >> +
> >> + return ret;
> >> +}
> >> +
> >> +/*
> >> + * If it is a real mmio page fault, return 1 and emulate the instruction
> >> + * directly; return 0 if it needs the page fault path to fix it; -1 is
> >> + * returned if a bug is detected.
> >> + */
> >> +int handle_mmio_page_fault_common(struct kvm_vcpu *vcpu, u64 addr, bool direct)
> >> +{
> >> + u64 spte;
> >> +
> >> + if (quickly_check_mmio_pf(vcpu, addr, direct))
> >> + return 1;
> >> +
> >> + spte = walk_shadow_page_get_mmio_spte(vcpu, addr);
> >> +
> >> + if (is_mmio_spte(spte)) {
> >> + gfn_t gfn = get_mmio_spte_gfn(spte);
> >> + unsigned access = get_mmio_spte_access(spte);
> >> +
> >> + if (direct)
> >> + addr = 0;
> >> + vcpu_cache_mmio_info(vcpu, addr, gfn, access);
> >> + return 1;
> >> + }
> >> +
> >> + /*
> >> + * It's OK if the gva is remapped by other cpus on a shadow guest;
> >> + * it's a BUG if the gfn is not an mmio page.
> >> + */
> >
> > If the gva is remapped there should be no mmio page fault in the first
> > place.
> >
>
> For example, as follow case:
>
> VCPU 0 VCPU 1
> gva is mapped to a mmio region
> access gva
> remap gva to other page
> emulate mmio access by kvm
> (*the point we are discussing*)
> flush all tlb
>
> In theory, it can happen.
>
> >> + if (direct && is_shadow_present_pte(spte))
> >> + return -1;
> >
> > An spte does not have to contain the present bit to generate a valid EPT
> > misconfiguration (and an spte dump is still required in that case).
> > Use !is_mmio_spte() instead.
> >
>
> We cannot use !is_mmio_spte() here, since the shadow page can be zapped
> at any time, for example:
>
> sp.spt[i] = mmio-spte
>
> VCPU 0 VCPU 1
> Access sp.spt[i], an EPT misconfig occurs
> delete sp
> (if the number of shadow pages exceeds the limit,
> page shrinking is required, or other events...)
>
> Walk the shadow page table outside the lock and get the
> non-present spte
> (*the point we are discussing*)
Then is_mmio_spte(non-present spte) == false, right? The point is that
only sptes with the precise mmio spte pattern should be considered mmio
sptes; otherwise, consider it a genuine EPT misconfiguration error (which
must be reported).
What about using the fault code instead of the spte, as Avi suggested?
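The "precise mmio spte pattern" requirement can be illustrated with a small sketch. The magic value, shifts, and field widths below are hypothetical (a real implementation puts the pattern in bits guaranteed to cause an EPT misconfiguration); what matters is that is_mmio_spte() matches the pattern exactly, so a non-present (zero) spte is never mistaken for an mmio spte:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mmio spte encoding: gfn and access bits packed behind a
 * magic low-bit pattern. Values are illustrative, not the kernel's. */
#define MMIO_SPTE_MAGIC     0x3ull   /* hypothetical pattern in bits 0-1 */
#define MMIO_SPTE_ACC_SHIFT 2
#define MMIO_SPTE_ACC_MASK  0x3ffull
#define MMIO_SPTE_GFN_SHIFT 12

static inline uint64_t make_mmio_spte(uint64_t gfn, unsigned access)
{
    return (gfn << MMIO_SPTE_GFN_SHIFT) |
           (((uint64_t)access & MMIO_SPTE_ACC_MASK) << MMIO_SPTE_ACC_SHIFT) |
           MMIO_SPTE_MAGIC;
}

static inline bool is_mmio_spte(uint64_t spte)
{
    /* Exact pattern match: a zero (non-present) spte is not mmio. */
    return (spte & 0x3ull) == MMIO_SPTE_MAGIC;
}

static inline uint64_t get_mmio_spte_gfn(uint64_t spte)
{
    return spte >> MMIO_SPTE_GFN_SHIFT;
}

static inline unsigned get_mmio_spte_access(uint64_t spte)
{
    return (spte >> MMIO_SPTE_ACC_SHIFT) & MMIO_SPTE_ACC_MASK;
}
```

With this encoding, a zapped spte (0) fails the pattern test, so it is treated as neither an mmio spte nor a misconfiguration, and the fault can simply be retried.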
> So, the bug we can detect is: it is an mmio access but the spte points
> to a normal page.
>
> >
> >> +
> >> + /*
> >> + * If the page table is zapped by other cpus, let the page
> >> + * fault path to fix it.
> >> + */
> >> + return 0;
> >> +}
> >
> > I don't understand when would this happen, can you please explain?
> >
>
> The case is above :-)
No need to jump to the page fault handler; we can let the CPU fault again
on the non-present spte.