public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
To: Avi Kivity <avi@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>,
	LKML <linux-kernel@vger.kernel.org>,
	KVM list <kvm@vger.kernel.org>
Subject: Re: [PATCH v4 5/6] KVM: MMU: combine guest pte read between walk and pte prefetch
Date: Mon, 05 Jul 2010 10:52:47 +0800	[thread overview]
Message-ID: <4C3148FF.3030209@cn.fujitsu.com> (raw)
In-Reply-To: <4C309B23.9060808@redhat.com>



Avi Kivity wrote:

>> Umm, if we move the check before the judgment, it'll check every level,
>> actually, only the opened mapping and the laster level need checked, so
>> for the performance reason, maybe it's better to keep two check-point.
>>    
> 
> What exactly are the conditions when you want to check?
> 
> Perhaps we do need to check every level.  A write to a PDE (or higher
> level) will clear the corresponding spte, but a fault immediately
> afterwards can re-establish the spte to point to the new page.
> 

Looks into the code more carefully, maybe this code is wrong:


             if (!direct) {
                     r = kvm_read_guest_atomic(vcpu->kvm,
-                                          gw->pte_gpa[level - 2],
+                                          gw->pte_gpa[level - 1],
                                      &curr_pte, sizeof(curr_pte));
-                    if (r || curr_pte != gw->ptes[level - 2]) {
+                    if (r || curr_pte != gw->ptes[level - 1]) {
                                kvm_mmu_put_page(sp, sptep);
                                kvm_release_pfn_clean(pfn);
                                sptep = NULL;

It should check the 'level' mapping not 'level - 1', in the later description
i'll explain it.

Since the 'walk_addr' functions is out of mmu_lock protection, we should
handle the guest modify its mapping between 'walk_addr' and 'fetch'.
Now, let's consider every case may be happened while handle guest #PF.

One case is host handle guest written after 'fetch', just like this order:

VCPU 0		VCPU 1
walk_addr
                guest modify its mapping
fetch
                host handle this written(pte_write or invlpg)

This case is not broken anything, even if 'fetch' setup the wrong mapping, the
later written handler will fix it.

Another case is host handle guest written before 'fetch', like this:

CPU 0		VCPU 1
walk_addr
                guest modify its mapping
                host handle this written(pte_write or invlpg)
fetch

We should notice this case, since the later 'fetch' will setup the wrong mapping.

For example, the guest mapping which 'walk_addr' got is:

GPML4E -> GPDPE -> GPDE -> GPTE -> GFNA
(Just take small page for example, other mapping way is also applied)

And, for good to describe, we abstract 'fetch''s work:

for_each_shadow_entry(vcpu, addr, iterator) {
	if (iterator.level == hlevel)
		Mapping the later level

	if (is_shadow_present_pte(*sptep) &&
		!is_large_pte(*sptep))		   <------ Point A
		continue;

	/* handle the non-present/wrong  middle level */

	find/create the corresponding sp            <----- Point B

	if (the guest mapping is modified)          <----- Point C
		break;
	setup this level's mapping                  <----- Point D

}

[
 Note: the later level means PTE, PDE if 2M page size, PDPE if 1G page size
 the middle level means PDE, PDPE if not using large page size / PML4E
]

There are two cases:

1: Guest modify the middle level, for example, guest modify the GPDE.
   a: the GPDE has corresponding sp entry, after VCPU1 handle this written,
      the corresponding host mapping is like this:
      HPML4E -> HPDPE -> HPDE
      [ HPDE.P = 0 since VCPU1 written handler zapped it in pte_wirte ]

      Under this case, it can broke Point A's judgment and Point C can detect
      the guest mapping is modified, so it exits this function without setup the
      mapping.

      And, we should check the guest mapping at GPDE not GPTE since it's GPDE
      modified not GPTE, it's the explanation for the front fix.

  b: the GPDE not has corresponding sp entry(the area is firstly accessed),
     corresponding host mapping is like this:
     HPML4E -> HPDPE
     [ HPDPE.P = 0]

     under this case, VCPU1 happily write GPDE without #PF since the GPDE not has shadow
     page, it's not write-protected.

     while we handle HPDPE, we will create the shadow page for GPDE
     at Point B, then the host mapping is like this:
     HPML4E -> HPDPE -> HPDE
     [ HPDE.P = 0 ]
     it's the same as 1.a, so do the same work as 1.a

     Note: there is a trick: we should put 'Point C' behind of 'Point B', since after we
           create sp for GPDE, the later modify GPDE will cause #PF, it can ensure later
           check is valid

2: Guest modify the later level.
   Form 'fetch''s abstract, we can see the late level is not checked by 'Point C', if
   guest modified the GPTE's mapping, the wrong-mapping will be setup by 'fetch', this
   is just this path does




     

  reply	other threads:[~2010-07-05  2:56 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-07-01 13:53 [PATCH v4 1/6] KVM: MMU: introduce gfn_to_pfn_atomic() function Xiao Guangrong
2010-07-01 13:53 ` [PATCH v4 2/6] KVM: MMU: introduce gfn_to_page_many_atomic() function Xiao Guangrong
2010-07-01 13:54 ` [PATCH v4 3/6] KVM: MMU: introduce pte_prefetch_topup_memory_cache() Xiao Guangrong
2010-07-01 13:55 ` [PATCH v4 4/6] KVM: MMU: prefetch ptes when intercepted guest #PF Xiao Guangrong
2010-07-02 16:54   ` Marcelo Tosatti
2010-07-03  8:08     ` Xiao Guangrong
2010-07-05 12:01       ` Marcelo Tosatti
2010-07-06  0:50         ` Xiao Guangrong
2010-07-01 13:55 ` [PATCH v4 5/6] KVM: MMU: combine guest pte read between walk and pte prefetch Xiao Guangrong
2010-07-02 17:03   ` Marcelo Tosatti
2010-07-03 10:31     ` Xiao Guangrong
2010-07-03 12:08       ` Avi Kivity
2010-07-03 12:16         ` Xiao Guangrong
2010-07-03 12:26           ` Avi Kivity
2010-07-03 12:31             ` Xiao Guangrong
2010-07-03 12:44               ` Avi Kivity
2010-07-03 12:49                 ` Avi Kivity
2010-07-03 13:03                   ` Xiao Guangrong
2010-07-04 14:30                     ` Avi Kivity
2010-07-05  2:52                       ` Xiao Guangrong [this message]
2010-07-05  8:23                         ` Avi Kivity
2010-07-05  8:45                           ` Xiao Guangrong
2010-07-05  9:05                             ` Avi Kivity
2010-07-05  9:09                               ` Xiao Guangrong
2010-07-05  9:20                                 ` Avi Kivity
2010-07-05  9:31                                   ` Xiao Guangrong
2010-07-03 12:57                 ` Xiao Guangrong
2010-07-04 14:32                   ` Avi Kivity
2010-07-03 11:48     ` Avi Kivity
2010-07-01 13:56 ` [PATCH v4 6/6] KVM: MMU: trace " Xiao Guangrong
2010-07-02 16:47 ` [PATCH v4 1/6] KVM: MMU: introduce gfn_to_pfn_atomic() function Marcelo Tosatti
2010-07-03  3:13   ` Nick Piggin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4C3148FF.3030209@cn.fujitsu.com \
    --to=xiaoguangrong@cn.fujitsu.com \
    --cc=avi@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mtosatti@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox