Re: [PATCH v4 5/6] KVM: MMU: combine guest pte read between walk and pte prefetch

kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Avi Kivity <avi@redhat.com>
To: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>,
	LKML <linux-kernel@vger.kernel.org>,
	KVM list <kvm@vger.kernel.org>
Subject: Re: [PATCH v4 5/6] KVM: MMU: combine guest pte read between walk and pte prefetch
Date: Mon, 05 Jul 2010 11:23:53 +0300	[thread overview]
Message-ID: <4C319699.9000104@redhat.com> (raw)
In-Reply-To: <4C3148FF.3030209@cn.fujitsu.com>

On 07/05/2010 05:52 AM, Xiao Guangrong wrote:
>
> Avi Kivity wrote:
>
>    
>>> Umm, if we move the check before the judgment, it'll check every level,
>>> actually, only the opened mapping and the laster level need checked, so
>>> for the performance reason, maybe it's better to keep two check-point.
>>>
>>>        
>> What exactly are the conditions when you want to check?
>>
>> Perhaps we do need to check every level.  A write to a PDE (or higher
>> level) will clear the corresponding spte, but a fault immediately
>> afterwards can re-establish the spte to point to the new page.
>>
>>      
> Looks into the code more carefully, maybe this code is wrong:
>
>
>               if (!direct) {
>                       r = kvm_read_guest_atomic(vcpu->kvm,
> -                                          gw->pte_gpa[level - 2],
> +                                          gw->pte_gpa[level - 1],
>                                        &curr_pte, sizeof(curr_pte));
> -                    if (r || curr_pte != gw->ptes[level - 2]) {
> +                    if (r || curr_pte != gw->ptes[level - 1]) {
>                                  kvm_mmu_put_page(sp, sptep);
>                                  kvm_release_pfn_clean(pfn);
>                                  sptep = NULL;
>
> It should check the 'level' mapping not 'level - 1', in the later description
> i'll explain it.
>    

Right, this fixes the check for the top level, but it removes a check 
from the bottom level.

We need to move this to the top of the loop so we check all levels.  I 
guess this is why you needed to add a new check point.  But since we 
loop at least glevels times, we don't need two check points.

> Since the 'walk_addr' functions is out of mmu_lock protection, we should
> handle the guest modify its mapping between 'walk_addr' and 'fetch'.
> Now, let's consider every case may be happened while handle guest #PF.
>
> One case is host handle guest written after 'fetch', just like this order:
>
> VCPU 0		VCPU 1
> walk_addr
>                  guest modify its mapping
> fetch
>                  host handle this written(pte_write or invlpg)
>
> This case is not broken anything, even if 'fetch' setup the wrong mapping, the
> later written handler will fix it.
>    

Ok.

> Another case is host handle guest written before 'fetch', like this:
>
> CPU 0		VCPU 1
> walk_addr
>                  guest modify its mapping
>                  host handle this written(pte_write or invlpg)
> fetch
>
> We should notice this case, since the later 'fetch' will setup the wrong mapping.
>
> For example, the guest mapping which 'walk_addr' got is:
>
> GPML4E ->  GPDPE ->  GPDE ->  GPTE ->  GFNA
> (Just take small page for example, other mapping way is also applied)
>
> And, for good to describe, we abstract 'fetch''s work:
>
> for_each_shadow_entry(vcpu, addr, iterator) {
> 	if (iterator.level == hlevel)
> 		Mapping the later level
>
> 	if (is_shadow_present_pte(*sptep)&&
> 		!is_large_pte(*sptep))		<------ Point A
> 		continue;
>
> 	/* handle the non-present/wrong  middle level */
>
> 	find/create the corresponding sp<----- Point B
>
> 	if (the guest mapping is modified)<----- Point C
> 		break;
> 	setup this level's mapping<----- Point D
>
> }
>
> [
>   Note: the later level means PTE, PDE if 2M page size, PDPE if 1G page size
>   the middle level means PDE, PDPE if not using large page size / PML4E
> ]
>
> There are two cases:
>
> 1: Guest modify the middle level, for example, guest modify the GPDE.
>     a: the GPDE has corresponding sp entry, after VCPU1 handle this written,
>        the corresponding host mapping is like this:
>        HPML4E ->  HPDPE ->  HPDE
>        [ HPDE.P = 0 since VCPU1 written handler zapped it in pte_wirte ]
>
>        Under this case, it can broke Point A's judgment and Point C can detect
>        the guest mapping is modified, so it exits this function without setup the
>        mapping.
>
>        And, we should check the guest mapping at GPDE not GPTE since it's GPDE
>        modified not GPTE, it's the explanation for the front fix.
>    

Agree.

>    b: the GPDE not has corresponding sp entry(the area is firstly accessed),
>       corresponding host mapping is like this:
>       HPML4E ->  HPDPE
>       [ HPDPE.P = 0]
>
>       under this case, VCPU1 happily write GPDE without #PF since the GPDE not has shadow
>       page, it's not write-protected.
>
>       while we handle HPDPE, we will create the shadow page for GPDE
>       at Point B, then the host mapping is like this:
>       HPML4E ->  HPDPE ->  HPDE
>       [ HPDE.P = 0 ]
>       it's the same as 1.a, so do the same work as 1.a
>
>       Note: there is a trick: we should put 'Point C' behind of 'Point B', since after we
>             create sp for GPDE, the later modify GPDE will cause #PF, it can ensure later
>             check is valid
>
> 2: Guest modify the later level.
>     Form 'fetch''s abstract, we can see the late level is not checked by 'Point C', if
>     guest modified the GPTE's mapping, the wrong-mapping will be setup by 'fetch', this
>     is just this path does
>
>    

Ok.   So moving the check to before point A, and s/level - 2/level - 1/ 
should work, yes?

Should be slightly simpler since we don't need to kvm_mmu_put_page(sp, 
sptep) any more.

Thanks for the detailed explanation.

-- 
error compiling committee.c: too many arguments to function

next prev parent reply	other threads:[~2010-07-05  8:23 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-07-01 13:53 [PATCH v4 1/6] KVM: MMU: introduce gfn_to_pfn_atomic() function Xiao Guangrong
2010-07-01 13:53 ` [PATCH v4 2/6] KVM: MMU: introduce gfn_to_page_many_atomic() function Xiao Guangrong
2010-07-01 13:54 ` [PATCH v4 3/6] KVM: MMU: introduce pte_prefetch_topup_memory_cache() Xiao Guangrong
2010-07-01 13:55 ` [PATCH v4 4/6] KVM: MMU: prefetch ptes when intercepted guest #PF Xiao Guangrong
2010-07-02 16:54   ` Marcelo Tosatti
2010-07-03  8:08     ` Xiao Guangrong
2010-07-05 12:01       ` Marcelo Tosatti
2010-07-06  0:50         ` Xiao Guangrong
2010-07-01 13:55 ` [PATCH v4 5/6] KVM: MMU: combine guest pte read between walk and pte prefetch Xiao Guangrong
2010-07-02 17:03   ` Marcelo Tosatti
2010-07-03 10:31     ` Xiao Guangrong
2010-07-03 12:08       ` Avi Kivity
2010-07-03 12:16         ` Xiao Guangrong
2010-07-03 12:26           ` Avi Kivity
2010-07-03 12:31             ` Xiao Guangrong
2010-07-03 12:44               ` Avi Kivity
2010-07-03 12:49                 ` Avi Kivity
2010-07-03 13:03                   ` Xiao Guangrong
2010-07-04 14:30                     ` Avi Kivity
2010-07-05  2:52                       ` Xiao Guangrong
2010-07-05  8:23                         ` Avi Kivity [this message]
2010-07-05  8:45                           ` Xiao Guangrong
2010-07-05  9:05                             ` Avi Kivity
2010-07-05  9:09                               ` Xiao Guangrong
2010-07-05  9:20                                 ` Avi Kivity
2010-07-05  9:31                                   ` Xiao Guangrong
2010-07-03 12:57                 ` Xiao Guangrong
2010-07-04 14:32                   ` Avi Kivity
2010-07-03 11:48     ` Avi Kivity
2010-07-01 13:56 ` [PATCH v4 6/6] KVM: MMU: trace " Xiao Guangrong
2010-07-02 16:47 ` [PATCH v4 1/6] KVM: MMU: introduce gfn_to_pfn_atomic() function Marcelo Tosatti
2010-07-03  3:13   ` Nick Piggin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4C319699.9000104@redhat.com \
    --to=avi@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mtosatti@redhat.com \
    --cc=xiaoguangrong@cn.fujitsu.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).