From mboxrd@z Thu Jan 1 00:00:00 1970 From: Avi Kivity Subject: Re: KVM: MMU: nuke shadowed pgtable pages and pte's on memslot destruction Date: Thu, 10 Jul 2008 17:42:19 +0300 Message-ID: <48761FCB.4080307@qumranet.com> References: <20080704010618.GA25834@dmt.cnet> <486FAE8A.2050308@qumranet.com> <20080705192344.GA16596@dmt.cnet> <486FE48C.7030002@qumranet.com> <20080707173155.GB10372@dmt.cnet> <20080707195822.GA16787@dmt.cnet> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: "Richard W.M. Jones" , kvm-devel , Hollis Blanchard , "Zhang, Xiantao" To: Marcelo Tosatti Return-path: Received: from il.qumranet.com ([212.179.150.194]:34985 "EHLO il.qumranet.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754156AbYGJOmV (ORCPT ); Thu, 10 Jul 2008 10:42:21 -0400 In-Reply-To: <20080707195822.GA16787@dmt.cnet> Sender: kvm-owner@vger.kernel.org List-ID: Marcelo Tosatti wrote: > On Mon, Jul 07, 2008 at 02:31:55PM -0300, Marcelo Tosatti wrote: > >> On Sun, Jul 06, 2008 at 12:15:56AM +0300, Avi Kivity wrote: >> >>> Marcelo Tosatti wrote: >>> >>>> On Sat, Jul 05, 2008 at 08:25:30PM +0300, Avi Kivity wrote: >>>> >>>> >>>>>> @@ -1955,6 +1955,22 @@ void kvm_mmu_slot_remove_write_access(st >>>>>> } >>>>>> } >>>>>> +int kvm_mmu_slot_has_shadowed_page(struct kvm *kvm, int slot) >>>>>> +{ >>>>>> + struct kvm_mmu_page *sp; >>>>>> + int ret = 0; >>>>>> + >>>>>> + spin_lock(&kvm->mmu_lock); >>>>>> + list_for_each_entry(sp, &kvm->arch.active_mmu_pages, link) { >>>>>> + if (test_bit(slot, &sp->slot_bitmap)) { >>>>>> + ret = -EINVAL; >>>>>> + break; >>>>>> + } >>>>>> + } >>>>>> + spin_unlock(&kvm->mmu_lock); >>>>>> + return ret; >>>>>> +} >>>>>> + >>>>>> >>>>>> >>>>> I don't like the guest influencing host actions in this way. It's >>>>> just a guest. >>>>> >>>>> But I think it's unneeded. kvm_mmu_zap_page() will mark a root >>>>> shadow page invalid and force all vcpus to reload it, so all that's >>>>> needed is to keep the mmu spinlock held while removing the slot. >>>>> >>>>> >>>> You're still keeping a shadowed page around with sp->gfn pointing to >>>> non-existant memslot. The code generally makes the assumption that >>>> gfn_to_memslot(gfn) on shadowed info will not fail. >>>> >>>> kvm_mmu_zap_page -> unaccount_shadowed, for example. >>>> >>>> >>>> >>> The page has already been zapped, so we might as well >>> unaccount_shadowed() on the first run. It needs to be moved until after >>> the reload_remote_mmus() call, though. >>> > > Oops, previous patch was unaccounting multiple times for invalid pages. > This should be better: > > > During RH6.2 graphical installation the following oops is triggered: > > BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 > IP: [] :kvm:gfn_to_rmap+0x3e/0x61 > Pid: 4559, comm: qemu-system-x86 Not tainted > > The problem is that KVM allows shadow pagetable entries that > point to a removed memslot to exist. In this case the cirrus vram > mapping was removed, and the NULL dereference happened during > kvm_set_memory_alias()'s zap_all_pages(). > > So nuke all shadowed pages before memslot removal. > > Signed-off-by: Marcelo Tosatti > > > diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c > index a4cf4a2..76259da 100644 > --- a/arch/ia64/kvm/kvm-ia64.c > +++ b/arch/ia64/kvm/kvm-ia64.c > @@ -1455,6 +1455,10 @@ int kvm_arch_set_memory_region(struct kvm *kvm, > return 0; > } > > +int kvm_arch_destroy_memory_region(struct kvm *kvm, int slot) > +{ > + return 0; > +} > > This (and its friends) ought to be static inlines. On the other hand, don't the other arches have to flush their tlbs? Xiantao/Hollis? So maybe this function needs to be renamed kvm_flush_shadow() and implemented across the board. > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c > index b90da0b..5ef3a5e 100644 > --- a/virt/kvm/kvm_main.c > +++ b/virt/kvm/kvm_main.c > @@ -405,6 +405,12 @@ int __kvm_set_memory_region(struct kvm *kvm, > if (mem->slot >= kvm->nmemslots) > kvm->nmemslots = mem->slot + 1; > > + if (!npages) { > + r = kvm_arch_destroy_memory_region(kvm, mem->slot); > + if (r) > + goto out_free; > + } > + > Destructors should never fail, since there is no possible recovery. And indeed you have 'return 0' in the actual implementation. So I think the function better return void. -- error compiling committee.c: too many arguments to function