From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jeremy Fitzhardinge
Subject: Re: Kernel bug from 3.0 (was phy disks and vifs timing out in DomU)
Date: Thu, 01 Sep 2011 10:32:07 -0700
Message-ID: <4E5FC197.7040004@goop.org>
References: <4E31820C.5030200@overnetdata.com>
 <1311870512.24408.153.camel@cthulhu.hellion.org.uk>
 <4E3266DE.9000606@overnetdata.com>
 <20110803152841.GA2860@dumpdata.com>
 <4E4E3957.1040007@overnetdata.com>
 <20110819125615.GA26558@dumpdata.com>
 <4E56B132.9050708@overnetdata.com>
 <20110826142606.GA25511@dumpdata.com>
 <20110826144438.GA24836@dumpdata.com>
 <4E5E6843.7050206@citrix.com>
 <20110831170711.GB13642@dumpdata.com>
 <1314862972.28989.74.camel@zakaz.uk.xensource.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To: <1314862972.28989.74.camel@zakaz.uk.xensource.com>
List-Unsubscribe: ,
List-Post:
List-Help:
List-Subscribe: ,
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Ian Campbell
Cc: Todd Deshane, "xen-devel@lists.xensource.com", David Vrabel, Anthony Wright, Konrad Rzeszutek Wilk
List-Id: xen-devel@lists.xenproject.org

On 09/01/2011 12:42 AM, Ian Campbell wrote:
> On Wed, 2011-08-31 at 18:07 +0100, Konrad Rzeszutek Wilk wrote:
>> On Wed, Aug 31, 2011 at 05:58:43PM +0100, David Vrabel wrote:
>>> On 26/08/11 15:44, Konrad Rzeszutek Wilk wrote:
>>>> So while I am still looking at the hypervisor code to figure out why
>>>> it would give me [when trying to map a grant page]:
>>>>
>>>> (XEN) mm.c:3846:d0 Could not find L1 PTE for address fbb42000
>>> It is failing in guest_map_l1e() because the page for the vmalloc'd
>>> virtual address PTEs is not present.
>>>
>>> The test that fails is:
>>>
>>> (l2e_get_flags(l2e) & (_PAGE_PRESENT | _PAGE_PSE)) != _PAGE_PRESENT
>>>
>>> I think this is because the GNTTABOP_map_grant_ref hypercall is done
>>> when task->active_mm != &init_mm and alloc_vm_area() only adds PTEs into
>>> init_mm so when Xen looks in the page tables it doesn't find the entries
>>> because they're not there yet.
>>>
>>> Putting a call to vmalloc_sync_all() after create_vm_area() and before
>>> the hypercall makes it work for me. Classic Xen kernels used to have
>>> such a call.
>> That sounds quite reasonable.
> I was wondering why upstream was missing the vmalloc_sync_all() in
> alloc_vm_area() since the out-of-tree kernels did have it and the
> function was added by us. I found this:
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=ef691947d8a3d479e67652312783aedcf629320a
>
> commit ef691947d8a3d479e67652312783aedcf629320a
> Author: Jeremy Fitzhardinge
> Date: Wed Dec 1 15:45:48 2010 -0800
>
> vmalloc: remove vmalloc_sync_all() from alloc_vm_area()
>
> There's no need for it: it will get faulted into the current pagetable
> as needed.
>
> Signed-off-by: Jeremy Fitzhardinge
>
> The flaw in the reasoning here is that you cannot take a kernel fault
> while processing a hypercall, so hypercall arguments must have been
> faulted in beforehand and that is what the sync_all was for.

That's a good point. (Maybe Xen should have generated pagefaults when
hypercall arg pointers are bad...)

> It's probably fair to say that the Xen specific caller should take care
> of that Xen-specific requirement rather than pushing it into common
> code. On the other hand Xen is the only user and creating a Xen specific
> helper/wrapper seems a bit pointless.

There's already a wrapper: xen_alloc_vm_area(), which is just a #define.
But we could easily add a sync_all to it (and use it in netback, like we
do in grant-table and xenbus).

    J
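
For illustration only, here is a userspace toy model of the ordering
problem discussed above. It is not kernel code and not a patch: every
toy_* name is hypothetical, and the "page tables" are reduced to one
present bit per slot. It assumes only what the thread states:
alloc_vm_area() populates init_mm, the hypercall looks at the active
page tables and cannot fault anything in, and a sync_all between the
two makes the entry visible.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_SLOTS 8 /* hypothetical size of the toy vmalloc area */

/* Toy page table: one "present" bit per slot (assumption, not kernel layout). */
struct toy_mm {
    bool present[TOY_NR_SLOTS];
};

static struct toy_mm toy_init_mm;   /* stands in for init_mm */
static struct toy_mm toy_active_mm; /* stands in for task->active_mm */

/* Models alloc_vm_area() as described in the thread: only init_mm gets the PTE. */
static int toy_alloc_vm_area(void)
{
    for (int i = 0; i < TOY_NR_SLOTS; i++) {
        if (!toy_init_mm.present[i]) {
            toy_init_mm.present[i] = true;
            return i;
        }
    }
    return -1; /* toy area exhausted */
}

/* Models vmalloc_sync_all(): propagate init_mm entries to the active mm. */
static void toy_vmalloc_sync_all(void)
{
    for (int i = 0; i < TOY_NR_SLOTS; i++) {
        if (toy_init_mm.present[i])
            toy_active_mm.present[i] = true;
    }
}

/*
 * Models the hypervisor side of the map hypercall: it inspects the
 * active page tables only and has no way to fault a missing entry in.
 */
static int toy_map_grant_ref(int slot)
{
    assert(slot >= 0 && slot < TOY_NR_SLOTS);
    return toy_active_mm.present[slot] ? 0 : -1;
}

/* Models the suggestion: a Xen-specific wrapper that allocates, then syncs. */
static int toy_xen_alloc_vm_area(void)
{
    int slot = toy_alloc_vm_area();

    if (slot >= 0)
        toy_vmalloc_sync_all();
    return slot;
}
```

In this model, toy_map_grant_ref() fails on a slot obtained from
toy_alloc_vm_area() until toy_vmalloc_sync_all() has run, and succeeds
right away on a slot obtained from toy_xen_alloc_vm_area(). That is the
argument for putting the sync in the Xen wrapper rather than in common
code.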