From mboxrd@z Thu Jan  1 00:00:00 1970
From: Andrea Arcangeli
Subject: Re: [PATCH] KVM: MMU: Fix rmap_remove() race
Date: Wed, 26 Mar 2008 20:22:31 +0100
Message-ID: <20080326192231.GC11130@v2.random>
References: <1206543773-26386-1-git-send-email-avi@qumranet.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Cc: kvm-devel@lists.sourceforge.net, Marcelo Tosatti
To: Avi Kivity
Return-path: 
Content-Disposition: inline
In-Reply-To: <1206543773-26386-1-git-send-email-avi@qumranet.com>
Sender: kvm-devel-bounces@lists.sourceforge.net
Errors-To: kvm-devel-bounces@lists.sourceforge.net
List-Id: kvm.vger.kernel.org

On Wed, Mar 26, 2008 at 05:02:53PM +0200, Avi Kivity wrote:
> Andrea notes that freeing the page before flushing the tlb is a race, as the
> guest can sneak in one last write before the tlb is flushed, writing to a
> page that may belong to someone else.
>
> Fix by reversing the order of freeing and flushing the tlb. Since the tlb
> flush is expensive, queue the pages to be freed so we need to flush just once.

The problem is that the *spte = nonpresent must now happen _before_ the page
is released in rmap_remove for this race to be fixed. So we either remove the
is_rmap_pte() check from the top of rmap_remove() and reorder all the callers,
or we apply this.

I'm not sure anymore that the mmu notifiers are enough to fix this either:
what happens if invalidate_page runs after rmap_remove has returned (so the
spte is no longer visible to the rmap code, and in turn to invalidate_page)
but before set_shadow_pte(nonpresent) runs?
Index: arch/x86/kvm/mmu.c
--- arch/x86/kvm/mmu.c.~1~	2008-03-26 16:55:14.000000000 +0100
+++ arch/x86/kvm/mmu.c	2008-03-26 19:57:53.000000000 +0100
@@ -582,7 +582,7 @@ static void kvm_release_page_clean_defer
 	kvm_release_page_deferred(kvm, page, true);
 }
 
-static void rmap_remove(struct kvm *kvm, u64 *spte)
+static void rmap_remove(struct kvm *kvm, u64 *sptep)
 {
 	struct kvm_rmap_desc *desc;
 	struct kvm_rmap_desc *prev_desc;
@@ -590,35 +590,37 @@ static void rmap_remove(struct kvm *kvm,
 	struct page *page;
 	unsigned long *rmapp;
 	int i;
+	u64 spte = *sptep;
 
-	if (!is_rmap_pte(*spte))
+	if (!is_rmap_pte(spte))
 		return;
-	sp = page_header(__pa(spte));
-	page = spte_to_page(*spte);
+	sp = page_header(__pa(sptep));
+	page = spte_to_page(spte);
 	mark_page_accessed(page);
-	if (is_writeble_pte(*spte))
+	set_shadow_pte(sptep, shadow_trap_nonpresent_pte);
+	if (is_writeble_pte(spte))
 		kvm_release_page_dirty_deferred(kvm, page);
 	else
 		kvm_release_page_clean_deferred(kvm, page);
-	rmapp = gfn_to_rmap(kvm, sp->gfns[spte - sp->spt], is_large_pte(*spte));
+	rmapp = gfn_to_rmap(kvm, sp->gfns[sptep - sp->spt], is_large_pte(spte));
 	if (!*rmapp) {
-		printk(KERN_ERR "rmap_remove: %p %llx 0->BUG\n", spte, *spte);
+		printk(KERN_ERR "rmap_remove: %p %llx 0->BUG\n", sptep, spte);
 		BUG();
 	} else if (!(*rmapp & 1)) {
-		rmap_printk("rmap_remove: %p %llx 1->0\n", spte, *spte);
-		if ((u64 *)*rmapp != spte) {
+		rmap_printk("rmap_remove: %p %llx 1->0\n", sptep, spte);
+		if ((u64 *)*rmapp != sptep) {
			printk(KERN_ERR "rmap_remove: %p %llx 1->BUG\n",
-			       spte, *spte);
+			       sptep, spte);
			BUG();
		}
		*rmapp = 0;
	} else {
-		rmap_printk("rmap_remove: %p %llx many->many\n", spte, *spte);
+		rmap_printk("rmap_remove: %p %llx many->many\n", sptep, spte);
		desc = (struct kvm_rmap_desc *)(*rmapp & ~1ul);
		prev_desc = NULL;
		while (desc) {
			for (i = 0; i < RMAP_EXT && desc->shadow_ptes[i]; ++i)
-				if (desc->shadow_ptes[i] == spte) {
+				if (desc->shadow_ptes[i] == sptep) {
					rmap_desc_remove_entry(rmapp, desc, i,
							       prev_desc);