From: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
To: Marcelo Tosatti <mtosatti@redhat.com>
Cc: gleb@kernel.org, avi.kivity@gmail.com, pbonzini@redhat.com,
linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Subject: Re: [PATCH 5/5] KVM: MMU: flush tlb out of mmu lock when write-protect the sptes
Date: Tue, 15 Apr 2014 19:11:24 +0800 [thread overview]
Message-ID: <534D13DC.7040400@linux.vnet.ibm.com> (raw)
In-Reply-To: <20140409145137.GA23449@amt.cnet>
Hi Marcelo,
Thanks your time to review it.
On 04/09/2014 10:51 PM, Marcelo Tosatti wrote:
>> +/*
>> + * Please note PT_WRITABLE_MASK is not stable since
>> + * 1) fast_page_fault() sets spte from readonly to writable out of mmu-lock or
>> + * 2) kvm_mmu_slot_remove_write_access() and kvm_vm_ioctl_get_dirty_log()
>> + * can write protect sptes but flush tlb out mmu-lock that means we may use
>> + * the corrupt tlb entries which depend on this bit.
>> + *
>> + * Both cases do not modify the status of spte_is_locklessly_modifiable() so
>> + * if you want to check whether the spte is writable on MMU you can check
>> + * SPTE_MMU_WRITEABLE instead. If you want to update spte without losing
>> + * A/D bits and tlb flush, you can check spte_is_locklessly_modifiable()
>> + * instead. See the comments in spte_has_volatile_bits() and
>> + * mmu_spte_update().
>> + */
>> static inline int is_writable_pte(unsigned long pte)
>> {
>
> Xiao,
>
> Can't get the SPTE_MMU_WRITEABLE part.
>
> So assume you are writing code to perform some action after guest memory
> has been write protected. You would
>
> spin_lock(mmu_lock);
>
> if (writeable spte bit is set)
> remove writeable spte bit from spte
> remote TLB flush (*)
> action
>
> spin_unlock(mmu_lock);
>
> (*) is necessary because reading the writeable spte bit as zero
> does not guarantee remote TLBs have been flushed.
>
> Now what SPTE_MMU_WRITEABLE has to do with it ?
It is contained in "remove writeable spte bit from spte" which
is done by mmu_spte_update():
if (spte_is_locklessly_modifiable(old_spte) &&
!is_writable_pte(new_spte))
ret = true;
>
> Perhaps a recipe like that (or just the rules) would be useful.
Okay, i am considering to improve this comments, how about like
this:
Currently, we have two sorts of write-protection, a) the first
one write-protects guest page to sync the guest modification,
b) another one is used to sync dirty bitmap when we do
KVM_GET_DIRTY_LOG. The differences between these two sorts are:
1) the first case clears SPTE_MMU_WRITEABLE bit.
2) the first case requires flushing tlb immediately avoiding
corrupting shadow page table between all vcpus so it should
be in the protection of mmu-lock. And the another case does
not need to flush tlb until returning the dirty bitmap to
userspace since it only write-protects the page logged in
the bitmap, that means the page in the dirty bitmap is not
missed, so it can flush tlb out of mmu-lock.
So, there is the problem: the first case can meet the corrupted
tlb caused by another case which write-protects pages but without
flush tlb immediately. In order to making the first case be aware
this problem we let it flush tlb if we try to write-protect
a spte whose SPTE_MMU_WRITEABLE bit is set, it works since another
case never touches SPTE_MMU_WRITEABLE bit.
Anyway, whenever a spte is updated (only permission and status bits
are changed) we need to check whether the spte with SPTE_MMU_WRITEABLE
becomes readonly, if that happens, we need to flush tlb. Fortunately,
mmu_spte_update() has already handled it perfectly.
The rules to use SPTE_MMU_WRITEABLE and PT_WRITABLE_MASK:
- if we want to see if it has writable tlb entry or if the spte can
be writable on the mmu mapping, check SPTE_MMU_WRITEABLE, this is
the most case, otherwise
- if we fix page fault on the spte or do write-protection by dirty logging,
check PT_WRITABLE_MASK.
TODO: introduce APIs to split these two cases.
>
> The remaining patches look good.
Thank you.
next prev parent reply other threads:[~2014-04-15 11:11 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-03-10 14:01 [PATCH v4 0/5] KVM: x86: flush tlb out of mmu-lock after write protection Xiao Guangrong
2014-03-10 14:01 ` [PATCH 1/5] Revert "KVM: Simplify kvm->tlbs_dirty handling" Xiao Guangrong
2014-03-10 14:01 ` [PATCH 2/5] KVM: MMU: properly check last spte in fast_page_fault() Xiao Guangrong
2014-03-10 14:01 ` [PATCH 3/5] KVM: MMU: lazily drop large spte Xiao Guangrong
2014-03-10 14:01 ` [PATCH 4/5] KVM: MMU: flush tlb if the spte can be locklessly modified Xiao Guangrong
2014-03-10 14:01 ` [PATCH 5/5] KVM: MMU: flush tlb out of mmu lock when write-protect the sptes Xiao Guangrong
2014-04-09 14:51 ` Marcelo Tosatti
2014-04-15 11:11 ` Xiao Guangrong [this message]
2014-03-25 3:29 ` [PATCH v4 0/5] KVM: x86: flush tlb out of mmu-lock after write protection Xiao Guangrong
2014-03-25 10:22 ` Paolo Bonzini
2014-03-25 16:25 ` Hu Yaohui
2014-03-26 4:40 ` Hu Yaohui
2014-03-26 5:07 ` Xiao Guangrong
2014-03-26 5:30 ` Hu Yaohui
2014-03-26 4:54 ` Xiao Guangrong
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=534D13DC.7040400@linux.vnet.ibm.com \
--to=xiaoguangrong@linux.vnet.ibm.com \
--cc=avi.kivity@gmail.com \
--cc=gleb@kernel.org \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mtosatti@redhat.com \
--cc=pbonzini@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).