Re: [PATCH 11/13] KVM: MMU: fast path of handling guest page fault

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Xiao Guangrong <xiaoguangrong.eric@gmail.com>
To: Avi Kivity <avi@redhat.com>
Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>,
	Marcelo Tosatti <mtosatti@redhat.com>,
	LKML <linux-kernel@vger.kernel.org>, KVM <kvm@vger.kernel.org>
Subject: Re: [PATCH 11/13] KVM: MMU: fast path of handling guest page fault
Date: Fri, 06 Apr 2012 03:39:23 +0800	[thread overview]
Message-ID: <4F7DF4EB.8040708@gmail.com> (raw)
In-Reply-To: <4F788119.6090401@redhat.com>

On 04/02/2012 12:23 AM, Avi Kivity wrote:

> On 03/29/2012 11:27 AM, Xiao Guangrong wrote:
>> If the the present bit of page fault error code is set, it indicates
>> the shadow page is populated on all levels, it means what we do is only
>> modify the access bit which can be done out of mmu-lock
>>
>> The tricks in this patch is avoiding the race between fast page fault
>> path and write-protect path, write-protect path is a read-check-modify
>> path:
>> read spte, check W bit, then clear W bit. What we do is populating a
>> identification in spte, if write-protect meets it, it modify the spte
>> even if the spte is readonly. See the comment in the code to get more
>> information
>>
>> +
>> +static bool page_fault_can_be_fast(struct kvm_vcpu *vcpu, gfn_t gfn,
>> +				   u32 error_code)
>> +{
>> +	unsigned long *rmap;
>> +	bool write = error_code & PFERR_WRITE_MASK;
>> +
>> +	/*
>> +	 * #PF can be fast only if the shadow page table is present, that
>> +	 * means we just need change the access bits (e.g: R/W, U/S...)
>> +	 * which can be done out of mmu-lock.
>> +	 */
>> +	if (!(error_code & PFERR_PRESENT_MASK))
>> +		return false;
>> +
>> +	if (unlikely(vcpu->vcpu_id > max_vcpu_spte()))
>> +		return false;
>> +
>> +	rmap = gfn_to_rmap(vcpu->kvm, gfn, PT_PAGE_TABLE_LEVEL);
> 
> What prevents sp->gfns from being freed while this is going on?  Did I
> miss the patch that makes mmu pages freed by rcu?  


sp->gfns is not used here, what we are using is rmap which came from
memslot protected by srcu.

In this patch, we called kvm_mmu_page_get_gfn in fast_pf_fetch_direct_spte
which only used for direct sp, sp->gfns is also not touched in this case.

But, i want to use it in the next version, it should be freed in the
rcu context, see:
http://thread.gmane.org/gmane.comp.emulators.kvm.devel/88934

> Also need barriers in
> kvm_mmu_get_page() to make sure sp->gfns is made visible before the page
> is hashed in.
> 


We do not walk the hash tables of shadow page, we walk shadow page table
which currently is being used by vcpu, so we just need care the order
of setting spte and setting sp->gfns, but it has data dependence in
currently code:

set spte
if (is_shadow_present_pte(*sptep))
   set sp->gfns[]

setting sp->gfns can not be reordered to the head of set spte


> + smp_rmb()
> 


Actually, i do not want to use barrier here since a read barrier is
not a complier barrier on X86. And we just quickly check the page
can be writable before shadow page table walking. A real barrier is
used before updating spte.

>> +
>> +	/* Quickly check the page can be writable. */
>> +	if (write && (ACCESS_ONCE(*rmap) & PTE_LIST_WRITE_PROTECT))
>> +		return false;
>> +
>> +	return true;
>> +}
>> +
>> +typedef bool (*fast_pf_fetch_spte)(struct kvm_vcpu *vcpu, u64 *sptep,
>> +				   u64 *new_spte, gfn_t gfn, u32 expect_access,
>> +				   u64 spte);
>> +
>> +static bool
>> +fast_pf_fetch_direct_spte(struct kvm_vcpu *vcpu, u64 *sptep, u64 *new_spte,
>> +			  gfn_t gfn, u32 expect_access, u64 spte)
>> +{
>> +	struct kvm_mmu_page *sp = page_header(__pa(sptep));
>> +
> 
> Why is this stable?  Another cpu could have removed it.
> 


It is ok for the removed sp since it is not freed on this path. And if the
sp is remove, cmpxchg can see this change since the spte is zapped.

>>
>> +	WARN_ON(!sp->role.direct);
>> +
>> +	if (kvm_mmu_page_get_gfn(sp, sptep - sp->spt) != gfn)
>> +		return false;
> 
> Simpler to check if the page is sync; if it is, gfn has not changed.
> 
> (but the check for a sync page is itself unstable).
> 


Checking sp.sync is unsafe since there has a window between guest page write
emulation and update spte:

At the Beginning:
gpte = gfn1
spte = gfn1's pfn
(gfn1 is write-protect, gfn2 is write-free)

VCPU 1                          VCPU 2

modify gpte and let it point to gfn2
page fault
write-emulation: gpte = gfn2

                             page fault on gpte:
                                 gfn = gfn2
                             fast page fault:
                                 sett gfn2 is write free and update spte:
                             spte = gfn1's pfn + W
                                 OOPS!!!
zap spte

>>
>> +
>> +	set_spte(vcpu, new_spte, sp->role.access,
>> +		 expect_access & ACC_USER_MASK, expect_access & ACC_WRITE_MASK,
>> +		 sp->role.level, gfn, spte_to_pfn(spte), false, false,
>> +		 spte & SPTE_HOST_WRITEABLE, true);
>> +
> 
> What if the conditions have changed meanwhile?  Should we set the spte
> atomically via set_spte?  For example an mmu notifier clearing the spte
> in parallel.
> 


All of these case are ok, we do not directly update spte and it is updated like
this:

new_spte = 0x0ull /* set it to nonpresent, so rmap path can not be touched. */
set_spte(vcpu, &new_spte, ......) /* the expected spte is temporarily saved to new_spte.*/
cmpxchg(spte, spte_old, new_spte)

It depends on cmpxchg.

> What if another code path has read *spte, and did not re-read it because
> it holds the mmu lock and thinks only the guest can access it?
> 


We just change access bits here and the mapped pfn is not changed. All the read
paths are ok except page write-protect path since its work depends on the W bit.

That it is why we introduce spte.identification.

>>
>> +	return true;
>> +}
>> +
>> +static bool
>> +fast_page_fault_fix_spte(struct kvm_vcpu *vcpu, u64 *sptep, u64 spte,
>> +			 gfn_t gfn, u32 expect_access,
>> +			 fast_pf_fetch_spte fn)
>> +{
>> +	u64 new_spte = 0ull;
>> +	int vcpu_id = vcpu->vcpu_id;
>> +
>> +	spte = mark_vcpu_id_spte(sptep, spte, vcpu_id);
> 
> What's the point of reading spte?  It can change any minute, so the
> value is stale and we can't use it.  We have to read and lock it
> atomically, no?
> 


No. :)

> Your patches read the spte, then set the identity.  In between some
> other path can change the spte.
> 


It can set the new identity (vcpu id) whatever the current identity is, cmpxchg can
see the change: it updates spte only when the identity is matched.

next prev parent reply	other threads:[~2012-04-05 19:39 UTC|newest]

Thread overview: 83+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-03-29  9:20 [PATCH 00/13] KVM: MMU: fast page fault Xiao Guangrong
2012-03-29  9:20 ` [PATCH 01/13] KVM: MMU: properly assert spte on rmap_next path Xiao Guangrong
2012-03-29  9:21 ` [PATCH 02/13] KVM: MMU: abstract spte write-protect Xiao Guangrong
2012-03-29 11:11   ` Avi Kivity
2012-03-29 11:51     ` Xiao Guangrong
2012-03-29  9:22 ` [PATCH 03/13] KVM: MMU: split FNAME(prefetch_invalid_gpte) Xiao Guangrong
2012-03-29 13:00   ` Avi Kivity
2012-03-30  3:51     ` Xiao Guangrong
2012-03-29  9:22 ` [PATCH 04/13] KVM: MMU: introduce FNAME(get_sp_gpa) Xiao Guangrong
2012-03-29 13:07   ` Avi Kivity
2012-03-30  5:01     ` Xiao Guangrong
2012-04-01 12:42       ` Avi Kivity
2012-03-29  9:23 ` [PATCH 05/13] KVM: MMU: reset shadow_mmio_mask Xiao Guangrong
2012-03-29 13:10   ` Avi Kivity
2012-03-29 15:28     ` Avi Kivity
2012-03-29 16:24       ` Avi Kivity
2012-03-29  9:23 ` [PATCH 06/13] KVM: VMX: export PFEC.P bit on ept Xiao Guangrong
2012-03-29  9:24 ` [PATCH 07/13] KVM: MMU: store more bits in rmap Xiao Guangrong
2012-03-29  9:25 ` [PATCH 08/13] KVM: MMU: fask check whether page is writable Xiao Guangrong
2012-03-29 15:49   ` Avi Kivity
2012-03-30  5:10     ` Xiao Guangrong
2012-04-01 15:52   ` Avi Kivity
2012-04-05 17:54     ` Xiao Guangrong
2012-04-12 23:08       ` Marcelo Tosatti
2012-04-13 10:26         ` Xiao Guangrong
2012-03-29  9:25 ` [PATCH 09/13] KVM: MMU: get expected spte out of mmu-lock Xiao Guangrong
2012-04-01 15:53   ` Avi Kivity
2012-04-05 18:25     ` Xiao Guangrong
2012-04-09 12:28       ` Avi Kivity
2012-04-09 13:16         ` Takuya Yoshikawa
2012-04-09 13:21           ` Avi Kivity
2012-03-29  9:26 ` [PATCH 10/13] KVM: MMU: store vcpu id in spte to notify page write-protect path Xiao Guangrong
2012-03-29  9:27 ` [PATCH 11/13] KVM: MMU: fast path of handling guest page fault Xiao Guangrong
2012-03-31 12:24   ` Xiao Guangrong
2012-04-01 16:23   ` Avi Kivity
2012-04-03 13:04     ` Avi Kivity
2012-04-05 19:39     ` Xiao Guangrong [this message]
2012-03-29  9:27 ` [PATCH 12/13] KVM: MMU: trace fast " Xiao Guangrong
2012-03-29  9:28 ` [PATCH 13/13] KVM: MMU: fix kvm_mmu_pagetable_walk tracepoint Xiao Guangrong
2012-03-29 10:18 ` [PATCH 00/13] KVM: MMU: fast page fault Avi Kivity
2012-03-29 11:40   ` Xiao Guangrong
2012-03-29 12:57     ` Avi Kivity
2012-03-30  9:18       ` Xiao Guangrong
2012-03-31 13:12         ` Xiao Guangrong
2012-04-01 12:58         ` Avi Kivity
2012-04-05 21:57           ` Xiao Guangrong
2012-04-06  5:24             ` Xiao Guangrong
2012-04-09 13:20               ` Avi Kivity
2012-04-09 13:59                 ` Xiao Guangrong
2012-04-09 13:12 ` Avi Kivity
2012-04-09 13:55   ` Xiao Guangrong
2012-04-09 14:01     ` Xiao Guangrong
2012-04-09 14:25     ` Avi Kivity
2012-04-09 17:58   ` Marcelo Tosatti
2012-04-09 18:13     ` Xiao Guangrong
2012-04-09 19:31       ` Marcelo Tosatti
2012-04-09 18:26     ` Xiao Guangrong
2012-04-09 19:46       ` Marcelo Tosatti
2012-04-10  3:06         ` Xiao Guangrong
2012-04-10 10:04         ` Avi Kivity
2012-04-11  1:47           ` Marcelo Tosatti
2012-04-11  9:15             ` Avi Kivity
2012-04-10 10:39         ` Avi Kivity
2012-04-10 11:40           ` Takuya Yoshikawa
2012-04-10 11:58             ` Xiao Guangrong
2012-04-11 12:15               ` Takuya Yoshikawa
2012-04-11 12:38                 ` Xiao Guangrong
2012-04-11 14:14                   ` Takuya Yoshikawa
2012-04-11 14:21                     ` Avi Kivity
2012-04-11 22:26                       ` Takuya Yoshikawa
2012-04-13 14:25                     ` Takuya Yoshikawa
2012-04-15  9:32                       ` Avi Kivity
2012-04-16 15:49                         ` Takuya Yoshikawa
2012-04-16 16:02                           ` Avi Kivity
2012-04-17  6:26                           ` Xiao Guangrong
2012-04-17  7:51                             ` Avi Kivity
2012-04-17 12:37                               ` Takuya Yoshikawa
2012-04-17 12:41                                 ` Avi Kivity
2012-04-17 14:54                                   ` Takuya Yoshikawa
2012-04-17 14:56                                     ` Avi Kivity
2012-04-18 13:42                                       ` Takuya Yoshikawa
2012-04-17  6:16                         ` Xiao Guangrong
2012-04-10 10:10       ` Avi Kivity

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4F7DF4EB.8040708@gmail.com \
    --to=xiaoguangrong.eric@gmail.com \
    --cc=avi@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mtosatti@redhat.com \
    --cc=xiaoguangrong@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).