From: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
To: Marcelo Tosatti <mtosatti@redhat.com>
Cc: gleb@redhat.com, avi.kivity@gmail.com, pbonzini@redhat.com,
linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Subject: Re: [PATCH v3 04/15] KVM: MMU: flush tlb out of mmu lock when write-protect the sptes
Date: Fri, 15 Nov 2013 15:09:13 +0800 [thread overview]
Message-ID: <5285C899.6040005@linux.vnet.ibm.com> (raw)
In-Reply-To: <20131114183959.GA27064@amt.cnet>
On 11/15/2013 02:39 AM, Marcelo Tosatti wrote:
> On Thu, Nov 14, 2013 at 01:15:24PM +0800, Xiao Guangrong wrote:
>>
>> Hi Marcelo,
>>
>> On 11/14/2013 08:36 AM, Marcelo Tosatti wrote:
>>
>>>
>>> Any code location which reads the writable bit in the spte and assumes if its not
>>> set, that the translation which the spte refers to is not cached in a
>>> remote CPU's TLB can become buggy. (*)
>>>
>>> It might be the case that now its not an issue, but its so subtle that
>>> it should be improved.
>>>
>>> Can you add a fat comment on top of is_writeable_bit describing this?
>>> (and explain why is_writable_pte users do not make an assumption
>>> about (*).
>>>
>>> "Writeable bit of locklessly modifiable sptes might be cleared
>>> but TLBs not flushed: so whenever reading locklessly modifiable sptes
>>> you cannot assume TLBs are flushed".
>>>
>>> For example this one is unclear:
>>>
>>> if (!can_unsync && is_writable_pte(*sptep))
>>> goto set_pte;
>>> And:
>>>
>>> if (!is_writable_pte(spte) &&
>>> !(pt_protect && spte_is_locklessly_modifiable(spte)))
>>> return false;
>>>
>>> This is safe because get_dirty_log/kvm_mmu_slot_remove_write_access are
>>> serialized by a single mutex (if there were two mutexes, it would not be
>>> safe). Can you add an assert to both
>>> kvm_mmu_slot_remove_write_access/kvm_vm_ioctl_get_dirty_log
>>> for (slots_lock) is locked, and explain?
>>>
>>> So just improve the comments please, thanks (no need to resend whole
>>> series).
>>
>> Thank you very much for your time to review it and really appreciate
>> for you detailed the issue so clearly to me.
>>
>> I will do it on the top of this patchset or after it is merged
>> (if it's possiable).
>
> Ok, can you explain why every individual caller of is_writable_pte have
> no such assumption now? (the one mentioned above is not clear to me for
> example, should explain all of them).
Okay.
Generally speak, we 1) needn't care readonly spte too much since it
can not be locklessly write-protected and 2) if is_writable_pte() is used
to check mmu-mode's state we can check SPTE_MMU_WRITEABLE instead.
There are the places is_writable_pte is used:
1) in spte_has_volatile_bits():
527 static bool spte_has_volatile_bits(u64 spte)
528 {
529 /*
530 * Always atomicly update spte if it can be updated
531 * out of mmu-lock, it can ensure dirty bit is not lost,
532 * also, it can help us to get a stable is_writable_pte()
533 * to ensure tlb flush is not missed.
534 */
535 if (spte_is_locklessly_modifiable(spte))
536 return true;
537
538 if (!shadow_accessed_mask)
539 return false;
540
541 if (!is_shadow_present_pte(spte))
542 return false;
543
544 if ((spte & shadow_accessed_mask) &&
545 (!is_writable_pte(spte) || (spte & shadow_dirty_mask)))
546 return false;
547
548 return true;
549 }
this path is not broken since any spte can be lockless modifiable will do
lockless update (will always return 'true' in the line 536).
2): in mmu_spte_update()
594 /*
595 * For the spte updated out of mmu-lock is safe, since
596 * we always atomicly update it, see the comments in
597 * spte_has_volatile_bits().
598 */
599 if (spte_is_locklessly_modifiable(old_spte) &&
600 !is_writable_pte(new_spte))
601 ret = true;
The new_spte is a temp value that can not be fetched by lockless
write-protection and !is_writable_pte() is stable enough (can not be
locklessly write-protected).
3) in spte_write_protect()
1368 if (!is_writable_pte(spte) &&
1369 !spte_is_locklessly_modifiable(spte))
1370 return false;
1371
It always do write-protection if the spte is lockelss modifiable.
(This code is the aspect after applying the whole pachset, the code is safe too
before patch "[PATCH v3 14/15] KVM: MMU: clean up spte_write_protect" since
the lockless write-protection path is serialized by a single lock.).
4) in set_spte()
2690 /*
2691 * Optimization: for pte sync, if spte was writable the hash
2692 * lookup is unnecessary (and expensive). Write protection
2693 * is responsibility of mmu_get_page / kvm_sync_page.
2694 * Same reasoning can be applied to dirty page accounting.
2695 */
2696 if (!can_unsync && is_writable_pte(*sptep))
2697 goto set_pte;
It is used for a optimization and the worst case is the optimization is disabled
(walking the shadow pages in the hast table) when the spte has been locklessly
write-protected. It does not hurt anything since it is a rare event. And the
optimization can be back if we check SPTE_MMU_WRITEABLE instead.
5) fast_page_fault()
3110 /*
3111 * Check if it is a spurious fault caused by TLB lazily flushed.
3112 *
3113 * Need not check the access of upper level table entries since
3114 * they are always ACC_ALL.
3115 */
3116 if (is_writable_pte(spte)) {
3117 ret = true;
3118 goto exit;
3119 }
Since kvm_vm_ioctl_get_dirty_log() firstly get-and-clear dirty-bitmap before
do write-protect, the dirty-bitmap will be properly set again when fast_page_fault
fix the spte who is write-protected by lockless write-protection.
6) in fast_page_fault's tracepoint:
244 #define __spte_satisfied(__spte) \
245 (__entry->retry && is_writable_pte(__entry->__spte))
It causes the tracepoint reports the wrong result when fast_page_fault
and tdp_page_fault/lockless-write-protect run concurrently, i guess it's
okay since it's only used for trace.
7) in audit_write_protection():
202 if (is_writable_pte(*sptep))
203 audit_printk(kvm, "shadow page has writable "
204 "mappings: gfn %llx role %x\n",
205 sp->gfn, sp->role.word);
It's okay since lockless-write-protection does not update the readonly sptes.
>
> OK to improve comments later.
Thank you, Marcelo!
next prev parent reply other threads:[~2013-11-15 7:09 UTC|newest]
Thread overview: 69+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-10-23 13:29 [PATCH v3 00/15] KVM: MMU: locklessly write-protect Xiao Guangrong
2013-10-23 13:29 ` [PATCH v3 01/15] KVM: MMU: properly check last spte in fast_page_fault() Xiao Guangrong
2013-11-12 0:25 ` Marcelo Tosatti
2013-10-23 13:29 ` [PATCH v3 02/15] KVM: MMU: lazily drop large spte Xiao Guangrong
2013-11-12 22:44 ` Marcelo Tosatti
2013-10-23 13:29 ` [PATCH v3 03/15] KVM: MMU: flush tlb if the spte can be locklessly modified Xiao Guangrong
2013-11-13 0:10 ` Marcelo Tosatti
2013-10-23 13:29 ` [PATCH v3 04/15] KVM: MMU: flush tlb out of mmu lock when write-protect the sptes Xiao Guangrong
2013-11-14 0:36 ` Marcelo Tosatti
2013-11-14 5:15 ` Xiao Guangrong
2013-11-14 18:39 ` Marcelo Tosatti
2013-11-15 7:09 ` Xiao Guangrong [this message]
2013-11-19 0:19 ` Marcelo Tosatti
2013-10-23 13:29 ` [PATCH v3 05/15] KVM: MMU: update spte and add it into rmap before dirty log Xiao Guangrong
2013-11-15 0:08 ` Marcelo Tosatti
2013-10-23 13:29 ` [PATCH v3 06/15] KVM: MMU: redesign the algorithm of pte_list Xiao Guangrong
2013-11-19 0:48 ` Marcelo Tosatti
2013-10-23 13:29 ` [PATCH v3 07/15] KVM: MMU: introduce nulls desc Xiao Guangrong
2013-11-22 19:14 ` Marcelo Tosatti
2013-11-25 6:11 ` Xiao Guangrong
2013-11-25 6:29 ` Xiao Guangrong
2013-11-25 18:12 ` Marcelo Tosatti
2013-11-26 3:21 ` Xiao Guangrong
2013-11-26 10:12 ` Gleb Natapov
2013-11-26 19:31 ` Marcelo Tosatti
2013-11-28 8:53 ` Xiao Guangrong
2013-12-03 7:10 ` Xiao Guangrong
2013-12-05 13:50 ` Marcelo Tosatti
2013-12-05 15:30 ` Xiao Guangrong
2013-12-06 0:15 ` Marcelo Tosatti
2013-12-06 0:22 ` Marcelo Tosatti
2013-12-10 6:58 ` Xiao Guangrong
2013-11-25 10:19 ` Gleb Natapov
2013-11-25 10:25 ` Xiao Guangrong
2013-11-25 12:48 ` Avi Kivity
2013-11-25 14:23 ` Marcelo Tosatti
2013-11-25 14:29 ` Gleb Natapov
2013-11-25 18:06 ` Marcelo Tosatti
2013-11-26 3:10 ` Xiao Guangrong
2013-11-26 10:15 ` Gleb Natapov
2013-11-26 19:58 ` Marcelo Tosatti
2013-11-28 8:32 ` Xiao Guangrong
2013-11-25 14:08 ` Marcelo Tosatti
2013-11-26 3:02 ` Xiao Guangrong
2013-11-25 9:31 ` Peter Zijlstra
2013-11-25 10:59 ` Xiao Guangrong
2013-11-25 11:05 ` Peter Zijlstra
2013-11-25 11:29 ` Peter Zijlstra
2013-10-23 13:29 ` [PATCH v3 08/15] KVM: MMU: introduce pte-list lockless walker Xiao Guangrong
2013-10-23 13:29 ` [PATCH v3 09/15] KVM: MMU: initialize the pointers in pte_list_desc properly Xiao Guangrong
2013-10-23 13:29 ` [PATCH v3 10/15] KVM: MMU: allocate shadow pages from slab Xiao Guangrong
2013-10-24 9:19 ` Gleb Natapov
2013-10-24 9:29 ` Xiao Guangrong
2013-10-24 9:52 ` Gleb Natapov
2013-10-24 10:10 ` Xiao Guangrong
2013-10-24 10:39 ` Gleb Natapov
2013-10-24 11:01 ` Xiao Guangrong
2013-10-24 12:32 ` Gleb Natapov
2013-10-28 3:16 ` Xiao Guangrong
2013-10-23 13:29 ` [PATCH v3 11/15] KVM: MMU: locklessly access shadow page under rcu protection Xiao Guangrong
2013-10-23 13:29 ` [PATCH v3 12/15] KVM: MMU: check last spte with unawareness of mapping level Xiao Guangrong
2013-10-23 13:29 ` [PATCH v3 13/15] KVM: MMU: locklessly write-protect the page Xiao Guangrong
2013-10-24 9:17 ` Gleb Natapov
2013-10-24 9:24 ` Xiao Guangrong
2013-10-24 9:32 ` Gleb Natapov
2013-10-23 13:29 ` [PATCH v3 14/15] KVM: MMU: clean up spte_write_protect Xiao Guangrong
2013-10-23 13:29 ` [PATCH v3 15/15] KVM: MMU: use rcu functions to access the pointer Xiao Guangrong
2013-11-03 12:29 ` [PATCH v3 00/15] KVM: MMU: locklessly write-protect Gleb Natapov
2013-11-11 5:33 ` Xiao Guangrong
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5285C899.6040005@linux.vnet.ibm.com \
--to=xiaoguangrong@linux.vnet.ibm.com \
--cc=avi.kivity@gmail.com \
--cc=gleb@redhat.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mtosatti@redhat.com \
--cc=pbonzini@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.