From: Avi Kivity <avi@redhat.com>
To: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>,
LKML <linux-kernel@vger.kernel.org>,
KVM list <kvm@vger.kernel.org>
Subject: Re: [PATCH v4 5/6] KVM: MMU: combine guest pte read between walk and pte prefetch
Date: Mon, 05 Jul 2010 11:23:53 +0300 [thread overview]
Message-ID: <4C319699.9000104@redhat.com> (raw)
In-Reply-To: <4C3148FF.3030209@cn.fujitsu.com>
On 07/05/2010 05:52 AM, Xiao Guangrong wrote:
>
> Avi Kivity wrote:
>
>
>>> Umm, if we move the check before the judgment, it'll check every level,
>>> actually, only the opened mapping and the laster level need checked, so
>>> for the performance reason, maybe it's better to keep two check-point.
>>>
>>>
>> What exactly are the conditions when you want to check?
>>
>> Perhaps we do need to check every level. A write to a PDE (or higher
>> level) will clear the corresponding spte, but a fault immediately
>> afterwards can re-establish the spte to point to the new page.
>>
>>
> Looks into the code more carefully, maybe this code is wrong:
>
>
> if (!direct) {
> r = kvm_read_guest_atomic(vcpu->kvm,
> - gw->pte_gpa[level - 2],
> + gw->pte_gpa[level - 1],
> &curr_pte, sizeof(curr_pte));
> - if (r || curr_pte != gw->ptes[level - 2]) {
> + if (r || curr_pte != gw->ptes[level - 1]) {
> kvm_mmu_put_page(sp, sptep);
> kvm_release_pfn_clean(pfn);
> sptep = NULL;
>
> It should check the 'level' mapping not 'level - 1', in the later description
> i'll explain it.
>
Right, this fixes the check for the top level, but it removes a check
from the bottom level.
We need to move this to the top of the loop so we check all levels. I
guess this is why you needed to add a new check point. But since we
loop at least glevels times, we don't need two check points.
> Since the 'walk_addr' functions is out of mmu_lock protection, we should
> handle the guest modify its mapping between 'walk_addr' and 'fetch'.
> Now, let's consider every case may be happened while handle guest #PF.
>
> One case is host handle guest written after 'fetch', just like this order:
>
> VCPU 0 VCPU 1
> walk_addr
> guest modify its mapping
> fetch
> host handle this written(pte_write or invlpg)
>
> This case is not broken anything, even if 'fetch' setup the wrong mapping, the
> later written handler will fix it.
>
Ok.
> Another case is host handle guest written before 'fetch', like this:
>
> CPU 0 VCPU 1
> walk_addr
> guest modify its mapping
> host handle this written(pte_write or invlpg)
> fetch
>
> We should notice this case, since the later 'fetch' will setup the wrong mapping.
>
> For example, the guest mapping which 'walk_addr' got is:
>
> GPML4E -> GPDPE -> GPDE -> GPTE -> GFNA
> (Just take small page for example, other mapping way is also applied)
>
> And, for good to describe, we abstract 'fetch''s work:
>
> for_each_shadow_entry(vcpu, addr, iterator) {
> if (iterator.level == hlevel)
> Mapping the later level
>
> if (is_shadow_present_pte(*sptep)&&
> !is_large_pte(*sptep)) <------ Point A
> continue;
>
> /* handle the non-present/wrong middle level */
>
> find/create the corresponding sp<----- Point B
>
> if (the guest mapping is modified)<----- Point C
> break;
> setup this level's mapping<----- Point D
>
> }
>
> [
> Note: the later level means PTE, PDE if 2M page size, PDPE if 1G page size
> the middle level means PDE, PDPE if not using large page size / PML4E
> ]
>
> There are two cases:
>
> 1: Guest modify the middle level, for example, guest modify the GPDE.
> a: the GPDE has corresponding sp entry, after VCPU1 handle this written,
> the corresponding host mapping is like this:
> HPML4E -> HPDPE -> HPDE
> [ HPDE.P = 0 since VCPU1 written handler zapped it in pte_wirte ]
>
> Under this case, it can broke Point A's judgment and Point C can detect
> the guest mapping is modified, so it exits this function without setup the
> mapping.
>
> And, we should check the guest mapping at GPDE not GPTE since it's GPDE
> modified not GPTE, it's the explanation for the front fix.
>
Agree.
> b: the GPDE not has corresponding sp entry(the area is firstly accessed),
> corresponding host mapping is like this:
> HPML4E -> HPDPE
> [ HPDPE.P = 0]
>
> under this case, VCPU1 happily write GPDE without #PF since the GPDE not has shadow
> page, it's not write-protected.
>
> while we handle HPDPE, we will create the shadow page for GPDE
> at Point B, then the host mapping is like this:
> HPML4E -> HPDPE -> HPDE
> [ HPDE.P = 0 ]
> it's the same as 1.a, so do the same work as 1.a
>
> Note: there is a trick: we should put 'Point C' behind of 'Point B', since after we
> create sp for GPDE, the later modify GPDE will cause #PF, it can ensure later
> check is valid
>
> 2: Guest modify the later level.
> Form 'fetch''s abstract, we can see the late level is not checked by 'Point C', if
> guest modified the GPTE's mapping, the wrong-mapping will be setup by 'fetch', this
> is just this path does
>
>
Ok. So moving the check to before point A, and s/level - 2/level - 1/
should work, yes?
Should be slightly simpler since we don't need to kvm_mmu_put_page(sp,
sptep) any more.
Thanks for the detailed explanation.
--
error compiling committee.c: too many arguments to function
next prev parent reply other threads:[~2010-07-05 8:23 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-07-01 13:53 [PATCH v4 1/6] KVM: MMU: introduce gfn_to_pfn_atomic() function Xiao Guangrong
2010-07-01 13:53 ` [PATCH v4 2/6] KVM: MMU: introduce gfn_to_page_many_atomic() function Xiao Guangrong
2010-07-01 13:54 ` [PATCH v4 3/6] KVM: MMU: introduce pte_prefetch_topup_memory_cache() Xiao Guangrong
2010-07-01 13:55 ` [PATCH v4 4/6] KVM: MMU: prefetch ptes when intercepted guest #PF Xiao Guangrong
2010-07-02 16:54 ` Marcelo Tosatti
2010-07-03 8:08 ` Xiao Guangrong
2010-07-05 12:01 ` Marcelo Tosatti
2010-07-06 0:50 ` Xiao Guangrong
2010-07-01 13:55 ` [PATCH v4 5/6] KVM: MMU: combine guest pte read between walk and pte prefetch Xiao Guangrong
2010-07-02 17:03 ` Marcelo Tosatti
2010-07-03 10:31 ` Xiao Guangrong
2010-07-03 12:08 ` Avi Kivity
2010-07-03 12:16 ` Xiao Guangrong
2010-07-03 12:26 ` Avi Kivity
2010-07-03 12:31 ` Xiao Guangrong
2010-07-03 12:44 ` Avi Kivity
2010-07-03 12:49 ` Avi Kivity
2010-07-03 13:03 ` Xiao Guangrong
2010-07-04 14:30 ` Avi Kivity
2010-07-05 2:52 ` Xiao Guangrong
2010-07-05 8:23 ` Avi Kivity [this message]
2010-07-05 8:45 ` Xiao Guangrong
2010-07-05 9:05 ` Avi Kivity
2010-07-05 9:09 ` Xiao Guangrong
2010-07-05 9:20 ` Avi Kivity
2010-07-05 9:31 ` Xiao Guangrong
2010-07-03 12:57 ` Xiao Guangrong
2010-07-04 14:32 ` Avi Kivity
2010-07-03 11:48 ` Avi Kivity
2010-07-01 13:56 ` [PATCH v4 6/6] KVM: MMU: trace " Xiao Guangrong
2010-07-02 16:47 ` [PATCH v4 1/6] KVM: MMU: introduce gfn_to_pfn_atomic() function Marcelo Tosatti
2010-07-03 3:13 ` Nick Piggin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4C319699.9000104@redhat.com \
--to=avi@redhat.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mtosatti@redhat.com \
--cc=xiaoguangrong@cn.fujitsu.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).