From: Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [PATCH v5 5/5] KVM: MMU: flush tlb out of mmu lock when write-protect
 the sptes
Date: Mon, 28 Apr 2014 13:30:50 +0200
Message-ID: <535E3BEA.4050704@redhat.com>
References: <1397725576-6617-1-git-send-email-xiaoguangrong@linux.vnet.ibm.com> <1397725576-6617-6-git-send-email-xiaoguangrong@linux.vnet.ibm.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-15
Content-Transfer-Encoding: 7bit
Cc: avi.kivity@gmail.com, mtosatti@redhat.com,
	linux-kernel@vger.kernel.org, kvm@vger.kernel.org
To: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>, gleb@kernel.org
Return-path: <linux-kernel-owner@vger.kernel.org>
In-Reply-To: <1397725576-6617-6-git-send-email-xiaoguangrong@linux.vnet.ibm.com>
Sender: linux-kernel-owner@vger.kernel.org
List-Id: kvm.vger.kernel.org

What about some editing of the big comment...

/*
 * Currently, shadow PTEs are write-protected in two cases: 1) write-protecting
 * guest page tables, and 2) resetting dirty tracking after KVM_GET_DIRTY_LOG.
 * The differences between these two cases are:
 *
 * a) only the first case clears the SPTE_MMU_WRITEABLE bit.
 *
 * b) the first case requires flushing the TLB immediately to avoid corruption
 *    of the shadow page table on other VCPUs.  In order to synchronize with
 *    other VCPUs, the flush is done under the MMU lock.
 *
 *    The second case, instead, can delay flushing the TLB until just before
 *    the dirty bitmap is returned to userspace; this is because it only
 *    write-protects pages that are set in the bitmap, and further writes to
 *    those pages can be safely ignored until userspace examines the bitmap.
 *    We rely on this to flush the TLB outside the MMU lock.
 *
 * A problem arises when these two cases occur concurrently.  Userspace can
 * call KVM_GET_DIRTY_LOG, which write-protects pages but does not immediately
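 * flush the TLB; meanwhile, KVM wants to write-protect a guest page table,
 * sees that the spte is already write-protected, and skips the flush.  The
 * result is a corrupted TLB: other VCPUs can keep writing to the guest page
 * table through stale, writable TLB entries.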
 *
 * To avoid this problem, when write-protecting guest page tables we *always*
 * flush the TLB if the spte has the SPTE_MMU_WRITEABLE bit set, even if the
 * spte was already write-protected.  This works since case 2 never touches
 * the SPTE_MMU_WRITEABLE bit.  More generally, whenever an spte is updated
 * (only permission and status bits are changed) we need to check whether an
 * spte with SPTE_MMU_WRITEABLE set becomes read-only.  If that happens, we
 * flush the TLB.  mmu_spte_update() handles this.
 *
 * The rules for using SPTE_MMU_WRITEABLE and PT_WRITABLE_MASK are as follows:
 *
 * a) if you want to know whether the spte may have a writable TLB entry, or
 *    whether it can become writable in the MMU mapping, check
 *    SPTE_MMU_WRITEABLE.  This is the most common case; otherwise
 *
 * b) when fixing a page fault on the spte or doing write-protection for
 *    dirty logging, check PT_WRITABLE_MASK.


Is the above accurate?
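A toy user-space model of the flush rule may make the race easier to check.  It is only a sketch under stated assumptions: the model_* names and the bit positions are made up for illustration and are not the kernel's definitions, and neither locking nor mmu_spte_update() itself is modeled.  Dirty logging clears only the writable bit, while page-table write-protection decides whether to flush based on SPTE_MMU_WRITEABLE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions for the model; not the kernel's values. */
#define MODEL_PT_WRITABLE        (1ULL << 1)
#define MODEL_SPTE_MMU_WRITEABLE (1ULL << 11)

typedef uint64_t model_spte;

/*
 * Case 2: dirty logging (KVM_GET_DIRTY_LOG).  Clears only the hardware
 * writable bit and requests no flush here; the caller is assumed to
 * flush once, outside the MMU lock, before the bitmap reaches userspace.
 */
static void model_wrprot_dirty_log(model_spte *sptep)
{
	*sptep &= ~MODEL_PT_WRITABLE;
}

/*
 * Case 1: write-protecting a guest page table.  Clears both bits and
 * returns true if a TLB flush is needed.  The decision is keyed on
 * SPTE_MMU_WRITEABLE, so an spte already made read-only by dirty logging
 * (whose flush may still be pending) still triggers a flush.
 */
static bool model_wrprot_page_table(model_spte *sptep)
{
	bool need_flush = (*sptep & MODEL_SPTE_MMU_WRITEABLE) != 0;

	*sptep &= ~(MODEL_PT_WRITABLE | MODEL_SPTE_MMU_WRITEABLE);
	return need_flush;
}

/* The rule the comment warns against: flush only if PT_WRITABLE was set. */
static bool model_wrprot_page_table_naive(model_spte *sptep)
{
	bool need_flush = (*sptep & MODEL_PT_WRITABLE) != 0;

	*sptep &= ~(MODEL_PT_WRITABLE | MODEL_SPTE_MMU_WRITEABLE);
	return need_flush;
}
```

After a dirty-log write-protect, the SPTE_MMU_WRITEABLE-based rule still requests a flush, while a rule keyed on the writable bit would not; that is the corrupted-TLB case from the comment.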

>  * TODO: introduce APIs to split these two cases.

What do you mean exactly?

Paolo