From mboxrd@z Thu Jan 1 00:00:00 1970
From: Joerg Rodel
Subject: Re: [PATCH] KVM: SVM: fix random segfaults with NPT enabled
Date: Wed, 27 Aug 2008 18:27:15 +0200
Message-ID: <20080827162715.GA28498@amd.com>
References: <1219839523-25677-1-git-send-email-joerg.roedel@amd.com>
 <48B55266.4000300@qumranet.com>
 <48B55C56.2060503@qumranet.com>
 <20080827135731.GC26059@amd.com>
 <48B57126.7000603@qumranet.com>
 <20080827153550.GB3801@8bytes.org>
 <48B577C3.3050302@qumranet.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Cc: Joerg Roedel, kvm@vger.kernel.org, stable@kernel.org, Alexander Graf
To: Avi Kivity
Return-path:
Received: from outbound-dub.frontbridge.com ([213.199.154.16]:60692
 "EHLO IE1EHSOBE003.bigfish.com" rhost-flags-OK-OK-OK-OK)
 by vger.kernel.org with ESMTP id S1754923AbYH0Q2R (ORCPT );
 Wed, 27 Aug 2008 12:28:17 -0400
Content-Disposition: inline
In-Reply-To: <48B577C3.3050302@qumranet.com>
Sender: kvm-owner@vger.kernel.org
List-ID:

On Wed, Aug 27, 2008 at 06:50:27PM +0300, Avi Kivity wrote:
> Joerg Roedel wrote:
> >On Wed, Aug 27, 2008 at 06:22:14PM +0300, Avi Kivity wrote:
> >
> >>Joerg Rodel wrote:
> >>
> >>>I will test it. Is the fix in your latest kernel.org tree?
> >>It is now. It doesn't fix the problem.
> >>
> >>
> >>>Reproduce it
> >>>with a KVM guest and start tbench in it with around 100 clients
> >>>configured. The tbench-process will crash when the bug is hit.
> >>>
> >>Does it reproduce with uniprocessor guests?
> >>
> >
> >Don't know yet. We will try that.
> >
> >
>
> It didn't reproduce here on uniprocessor, but I hadn't tried for long.

We are still testing. At the moment it does not reproduce very quickly,
for whatever reason...

>
> Some observations:
>
> - tbench triggers many cases where we have concurrent faults on the same address.
>   These are serialized by mmu_lock. I tried to have direct_map_entry() return if
>   it detects a race. That didn't help.
> - I instrumented set_shadow_pte() to warn if changing the pfn or writeable bit.
>   Didn't trip.
>
> Are there any rules for touching npt ptes concurrently?

Hmm, not that I am aware of. I will ask the silicon guys if they know
something. But I don't think so.

> Meanwhile, I applied the patch, but I'm very worried about this.

Yes, we are also worried. Another question is why this only happens with
NPT. The SoftMMU code should also fail with shadow paging if there is a
bug.

Joerg

-- 
           |           AMD Saxony Limited Liability Company & Co. KG
 Operating |         Wilschdorfer Landstr. 101, 01109 Dresden, Germany
 System    |                  Register Court Dresden: HRA 4896
 Research  |              General Partner authorized to represent:
 Center    |             AMD Saxony LLC (Wilmington, Delaware, US)
           | General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy
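
For readers following the thread, the kind of check described in the quoted observations (warn when an update to an existing pte changes the pfn or the writable bit) might look roughly like the user-space sketch below. This is not the instrumentation that was actually added to set_shadow_pte(), which is not shown in the thread. The bit layout, the helper names pte_change_is_suspicious() and set_pte_checked(), and the choice to check only when both the old and new entries are present are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed bit layout, loosely modelled on x86-64 page-table entries. */
#define PTE_PRESENT  (1ULL << 0)
#define PTE_WRITABLE (1ULL << 1)
#define PTE_PFN_MASK 0x000ffffffffff000ULL /* bits 12..51 */

/*
 * Hypothetical helper: returns true if replacing old_pte with new_pte
 * changes the pfn or the writable bit while both entries are present.
 * Filling an empty entry or clearing an entry is not flagged.
 */
static bool pte_change_is_suspicious(uint64_t old_pte, uint64_t new_pte)
{
    if (!(old_pte & PTE_PRESENT) || !(new_pte & PTE_PRESENT))
        return false;
    return ((old_pte ^ new_pte) & (PTE_PFN_MASK | PTE_WRITABLE)) != 0;
}

/*
 * Hypothetical stand-in for an instrumented pte setter: prints a warning
 * for a suspicious change, then performs the store. Returns whether it
 * warned so that callers (or tests) can observe the result.
 */
static bool set_pte_checked(uint64_t *ptep, uint64_t new_pte)
{
    bool suspicious = pte_change_is_suspicious(*ptep, new_pte);

    if (suspicious)
        fprintf(stderr,
                "warning: pte %#llx -> %#llx changes pfn or writable bit\n",
                (unsigned long long)*ptep, (unsigned long long)new_pte);
    *ptep = new_pte;
    return suspicious;
}
```

A check like this only sees the values passed through the setter, so it staying silent (as reported above) suggests the software updates themselves look consistent. It cannot rule out an ordering problem between a software update and a concurrent hardware page walk.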