From mboxrd@z Thu Jan 1 00:00:00 1970
From: Marcelo Tosatti
Subject: Re: [PATCH] KVM: MMU: Fix rmap_remove() race
Date: Wed, 26 Mar 2008 16:01:31 -0300
Message-ID: <20080326190131.GA22963@dmt>
References: <1206543773-26386-1-git-send-email-avi@qumranet.com> <20080326175128.GA21800@dmt> <20080326181225.GA11130@v2.random>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Cc: Marcelo Tosatti, kvm-devel@lists.sourceforge.net, Avi Kivity
To: Andrea Arcangeli
Content-Disposition: inline
In-Reply-To: <20080326181225.GA11130@v2.random>
Sender: kvm-devel-bounces@lists.sourceforge.net
Errors-To: kvm-devel-bounces@lists.sourceforge.net
List-Id: kvm.vger.kernel.org

On Wed, Mar 26, 2008 at 07:12:25PM +0100, Andrea Arcangeli wrote:
> On Wed, Mar 26, 2008 at 02:51:28PM -0300, Marcelo Tosatti wrote:
> > Nope. If a physical CPU has page translations cached it _must_ be
> > running in the context of a qemu thread (it does not matter whether
> > it's in userspace or executing guest code). The bits corresponding
> > to such CPUs will be set in mm->vm_cpu_mask, and flush_tlb_page()
> > will take care of flushing the TLBs appropriately.
>
> That would require the tlb to look up any SVM/VMX virtualized address
> that could point to the same physical page that is pointed to by the
> invlpg linux virtual address. So it might be feasible if the hardware
> is using a physical tag on each spte translation, but it'd require the
> tlb to do a lot more work than we need it to do, so I hope you're
> wrong on the hardware side of things. That would be a hardware
> complexity or slowdown that would provide no benefit to software.
> But regardless, even if this was the case and you're right that
> invalidating a linux userland address invalidates atomically all other
> virtualized addresses in the svm/vmx tlbs (all asn included, not just
> one), the spte is still instantiated when flush_tlb_page runs on all
> cpus.

You're right, there is no guarantee that a full TLB flush will happen,
and invlpg for the qemu task virtual address is not enough.

> So just after the global tlb flush, a guest spte tlb-miss (no page
> fault required, no get_user_pages required) can happen that will
> re-instantiate the spte contents inside the tlb before flush_tlb_page
> returns.
>
>   CPU0                                     CPU 1
>
>   pte_clear() inside ptep_clear_flush
>   flush_tlb_page inside ptep_clear_flush
>     inside rmap
>     page_count = 1
>                                            guest tlb miss
>                                            tlb entry is back!
>                                            ioctl()
>                                            mark spte nonpresent
>                                            rmap_remove -> put_page
>                                            tlb flush
>
> Any tlb flush happening before clearing the shadow-pte entry is
> totally useless.
>
> Avi's patch is a great fix, and it will need further changes to
> properly fix this race, because many set_shadow_pte() / "pt[i] =
> nonpresent" updates are executed _after_ rmap_remove (they must be
> executed first, in case the array is full and we have to flush the
> tlb and free the page).

One comment on the patch:

+static void kvm_release_page_clean_deferred(struct kvm *kvm,
+					    struct page *page)
+{
+	kvm_release_page_deferred(kvm, page, true);
+}

This should pass false.