From mboxrd@z Thu Jan 1 00:00:00 1970
From: Konrad Rzeszutek Wilk
Subject: Re: [BUG] Xen vm kernel crash in get_free_entries.
Date: Tue, 12 Nov 2013 10:56:14 -0500
Message-ID: <20131112155614.GA11354@phenom.dumpdata.com>
References: <1382089564.18128.103.camel@kazak.uk.xensource.com>
 <20131018141554.GN2924@reaktio.net>
 <1382105942.18128.124.camel@kazak.uk.xensource.com>
 <1746041225.20131019011459@eikelenboom.it>
 <5262642A.2060609@rat.ru>
 <1382180597.28188.9.camel@dagon.hellion.org.uk>
 <267039552.20131019135850@eikelenboom.it>
 <20131021105510.GB12019@u109add4315675089e695.ant.amazon.com>
 <527B2311.2030605@rat.ru>
 <1383832023.32399.39.camel@kazak.uk.xensource.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
In-Reply-To: <1383832023.32399.39.camel@kazak.uk.xensource.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Ian Campbell
Cc: Steven Noonan, Stefano Stabellini, Matt Wilson, xen-devel@lists.xen.org, Sander Eikelenboom, astarta@rat.ru, David Vrabel, Matt Wilson
List-Id: xen-devel@lists.xenproject.org

On Thu, Nov 07, 2013 at 01:47:03PM +0000, Ian Campbell wrote:
> On Thu, 2013-11-07 at 09:20 +0400, Astarta wrote:
> > Hello,
> >
> > Let me bring some new life to this discussion.
> >
> > I've investigated a bit and found another way to make kernels starting
> > from 3.8.x to boot on the VMs with platform device_id 0002.
> > Reverting of xen-grant-table-correctly-initialize-grant-table-version-1
> > patch is not necessary.
> >
> > We can simply modify struct pci_device_id platform_pci_tbl[] (in
> > drivers/xen/platform-pci.c) to respect 0002 and 0000 device ids.
> > That makes the kernel (3.8.x and 3.11.6) to boot correctly, disks and
> > network are also recognized.
>
> I think this is just working around the problem, by avoiding the
> situation where the error occurs.
> You could just as well switch to
> platform device id < 2.

I am a bit late to this discussion - but shouldn't there be something
in the kernel to deal with this?

>
> > IMO, there is no need to add new fields with device id 0002 and device
> > id 0000 to platform_pci_tbl[] , we can modify the existing one to use
> > PCI_ANY_ID instead of PCI_DEVICE_ID_XEN_PLATFORM (which is 0001), so if
> > we have PCI_VENDOR_ID_XEN there is no need to pay attention on device id.
>
> That omits the possibility that a future rev might differ in some
> meaningful way though.
>
> Ian.
>
> >
> > So the patch is more than simple. See attached. I've tested the resulted
> > kernel in my environment (with device ids 0002, 0001 and 0000) and it
> > seems to work well.
> >
> >
> > --
> > Marina
> >
> > On 10/21/2013 02:55 PM, Matt Wilson wrote:
> > > On Sat, Oct 19, 2013 at 01:58:50PM +0200, Sander Eikelenboom wrote:
> > >> Saturday, October 19, 2013, 1:03:17 PM, you wrote:
> > >>
> > >>> On Sat, 2013-10-19 at 14:51 +0400, Astarta wrote:
> > >>>> On 10/19/2013 03:14 AM, Sander Eikelenboom wrote:
> > >>>>> makes a HVM guest (qemu-xen-traditional) with xen_platform_pci=0 boot again using xl, haven't tested it with xend.
> > >>>>>
> > >>>> Great catch!
> > >>>> I also confirm that 3.11.5 kernel boots just fine after reverting of
> > >>>> 'correctly initialize grant table version 1' patch.
> > >>> This could just be down to that patch adding some BUG_ONs to catch bad
> > >>> things going on, e.g. the one in gnttab_expand which I think is being
> > >>> hit here.
> > >>> I have a feeling that it is still wrong (but just more benign) to be
> > >>> hitting that call chain in a configuration where there is no platform
> > >>> device driver running. IOW reverting that patch removes the obvious
> > >>> symptom (blowing up) but not the root cause, i.e. the patch is doing its
> > >>> job.
> > >> That was my suspicion too, but at least it seems like some starting point
> > >> of further debugging.
> > >> (and indication of the kernels affected since this commit went to stable as well)
> > >>
> > >> Since I was still seeing the "Booting PV enabled guest on Xen HVM" I was wondering
> > >> what is supposed to happen when there are some combinations ....
> > > This is the enlightenment code noticing that it's running in a HVM
> > > guest under Xen via the hypervisor cpuid leaf (cpuid leaf
> > > 0x40000000).
> > >
> > >> xen HVM xen_platform_pci=0 + guest kernel without PV guest support and without xen pv drivers (net + block)
> > > This should work.
> > >
> > >> xen HVM xen_platform_pci=0 + guest kernel with PV guest support but without xen pv drivers (net + block)
> > > This should work.
> > >
> > >> xen HVM xen_platform_pci=0 + guest kernel with PV guest support and with xen pv drivers (net + block)
> > >> -- This is the configuration that hits the bug described here.
> > > I don't see how this can be expected to work - the PV net and block
> > > devices need the facilities that are initialized by the Xen platform
> > > PCI device to operate. Of course it shouldn't crash either, it should
> > > just use emulated devices instead of xen-netfront/xen-blkfront.
> > >
> > >> xen HVM xen_platform_pci=1 + guest kernel without PV guest support and without xen pv drivers (net + block)
> > > This should work.
> > >
> > >> xen HVM xen_platform_pci=1 + guest kernel with PV guest support and without xen pv drivers (net + block)
> > > This should work.
> > >
> > >> xen HVM xen_platform_pci=1 + guest kernel with PV guest support and with xen pv drivers (net + block)
> > > This should work.
> > >
> > >> Booting a guest kernel with PV support as HVM but without using PV doesn't seem possible with a .cfg option ?
> > >> (yes it's a hypothetical option (performance wise), as is running with a guest kernel which supports PV drivers,
> > >> but not using them with xen_platform_pci=0 ..
> > >> but it is useful for debugging )
> > > AFAICT the expected behavior would be for the guest kernel to use
> > > basic enlightenment for CPU operations (hotplug, timers) but no PV IO
> > > support (net + block). But perhaps I'm missing something since you
> > > theoretically don't need the PCI device if you have event channel
> > > callback support in the guest kernel and sufficient support in the
> > > hypervisor.
> > >
> > > --msw