From mboxrd@z Thu Jan 1 00:00:00 1970
From: Joerg Rodel
Subject: Re: [PATCH] KVM: SVM: fix random segfaults with NPT enabled
Date: Thu, 28 Aug 2008 16:58:38 +0200
Message-ID: <20080828145838.GA4971@amd.com>
References: <1219839523-25677-1-git-send-email-joerg.roedel@amd.com>
 <48B55266.4000300@qumranet.com>
 <48B55C56.2060503@qumranet.com>
 <20080827135731.GC26059@amd.com>
 <48B57126.7000603@qumranet.com>
 <20080827153550.GB3801@8bytes.org>
 <48B577C3.3050302@qumranet.com>
 <20080827162715.GA28498@amd.com>
 <48B58599.1040509@qumranet.com>
 <48B587EC.7020606@qumranet.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Cc: Joerg Roedel , kvm@vger.kernel.org, stable@kernel.org, Alexander Graf
To: Avi Kivity
Return-path:
Received: from outbound-wa4.frontbridge.com ([216.32.181.16]:5665 "EHLO WA4EHSOBE001.bigfish.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752993AbYH1O67 (ORCPT ); Thu, 28 Aug 2008 10:58:59 -0400
Content-Disposition: inline
In-Reply-To: <48B587EC.7020606@qumranet.com>
Sender: kvm-owner@vger.kernel.org
List-ID:

On Wed, Aug 27, 2008 at 07:59:24PM +0300, Avi Kivity wrote:
> Avi Kivity wrote:
> >Joerg Rodel wrote:
> >>>Meanwhile, I applied the patch, but I'm very worried about this.
> >>>
> >>
> >>Yes, we are also worried. Another question is why this only happens with
> >>NPT. The SoftMMU code should also fail with shadow paging if there is a
> >>bug.
> >>
> >
> >Slightly different paths -- direct_map vs page_fault. Also, with npt, all cpus will access the same pte
> >that's being modified; without npt, faults on the same page will result in different ptes being instantiated,
> >as each access will be from a different guest pte.
> >
> >Maybe we should turn on the dirty bit in the instantiated ptes -- that will reduce the processor's mucking
> >about with them.
> >
>
> I meant the accessed bit. The dirty bit is always set, but the accessed bit it not, due to a bug. Fixing it
> doesn't help, though.

I did a bit of meditation on the softmmu code today. In the NPT fault
path, kvm_mmu_free_some_pages() is called, which in turn calls
kvm_mmu_zap_page(). That function calls both
kvm_mmu_page_unlink_children() and kvm_mmu_unlink_parents(). Both of
them call mmu_page_remove_parent_pte(), which modifies ptes. But only
the first one, kvm_mmu_page_unlink_children(), flushes remote TLBs;
kvm_mmu_unlink_parents() does not.

Is this correct? If yes, why?

Joerg

-- 
           |           AMD Saxony Limited Liability Company & Co. KG
 Operating |         Wilschdorfer Landstr. 101, 01109 Dresden, Germany
 System    |                  Register Court Dresden: HRA 4896
 Research  |              General Partner authorized to represent:
 Center    |             AMD Saxony LLC (Wilmington, Delaware, US)
           | General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy