From mboxrd@z Thu Jan 1 00:00:00 1970 From: Andrew Cooper Subject: Re: Xen-unstable-staging: Xen BUG at iommu_map.c:455 Date: Sat, 11 Apr 2015 18:35:57 +0100 Message-ID: <55295B7D.4020107@citrix.com> References: <1864736142.20150331231153@eikelenboom.it> <551B2FFA.6080001@citrix.com> <10810464855.20150410122433@eikelenboom.it> <55281C9F.7080409@citrix.com> <467745093.20150411161154@eikelenboom.it> <55292E04.9000302@citrix.com> <1246180833.20150411182125@eikelenboom.it> <55294CA8.2070208@citrix.com> <55294DF9.9040207@citrix.com> <1189087521.20150411192519@eikelenboom.it> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Return-path: Received: from mail6.bemta3.messagelabs.com ([195.245.230.39]) by lists.xen.org with esmtp (Exim 4.72) (envelope-from ) id 1YgzK7-00005A-GN for xen-devel@lists.xenproject.org; Sat, 11 Apr 2015 17:36:03 +0000 In-Reply-To: <1189087521.20150411192519@eikelenboom.it> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org To: Sander Eikelenboom Cc: xen-devel , Paul Durrant , Jan Beulich List-Id: xen-devel@lists.xenproject.org On 11/04/15 18:25, Sander Eikelenboom wrote: > Saturday, April 11, 2015, 6:38:17 PM, you wrote: > >> On 11/04/15 17:32, Andrew Cooper wrote: >>> On 11/04/15 17:21, Sander Eikelenboom wrote: >>>> Saturday, April 11, 2015, 4:21:56 PM, you wrote: >>>> >>>>> On 11/04/15 15:11, Sander Eikelenboom wrote: >>>>>> Friday, April 10, 2015, 8:55:27 PM, you wrote: >>>>>> >>>>>>> On 10/04/15 11:24, Sander Eikelenboom wrote: >>>>>>>> Hi Andrew, >>>>>>>> >>>>>>>> Finally got some time to figure this out .. and i have narrowed it down to: >>>>>>>> git://xenbits.xen.org/staging/qemu-upstream-unstable.git >>>>>>>> commit 7665d6ba98e20fb05c420de947c1750fd47e5c07 "Xen: Use the ioreq-server API when available" >>>>>>>> A straight revert of this commit prevents the issue from happening. 
>>>>>>>> >>>>>>>> The reason i had a hard time figuring this out was: >>>>>>>> - I wasn't aware of this earlier, since git pulling the main xen tree, doesn't >>>>>>>> auto update the qemu-* trees. >>>>>>> This has caught me out so many times. It is very non-obvious behaviour. >>>>>>>> - So i happen to get this when i cloned a fresh tree to try to figure out the >>>>>>>> other issue i was seeing. >>>>>>>> - After that checking out previous versions of the main xen tree didn't resolve >>>>>>>> this new issue, because the qemu tree doesn't get auto updated and is set >>>>>>>> "master". >>>>>>>> - Cloning a xen-stable-4.5.0 made it go away .. because that has a specific >>>>>>>> git://xenbits.xen.org/staging/qemu-upstream-unstable.git tag which is not >>>>>>>> master. >>>>>>>> >>>>>>>> *sigh* >>>>>>>> >>>>>>>> This is tested with xen main tree at last commit 3a28f760508fb35c430edac17a9efde5aff6d1d5 >>>>>>>> (normal xen-unstable, not the staging branch) >>>>>>>> >>>>>>>> Ok so i have added some extra debug info (see attached diff) and this is the >>>>>>>> output when it crashes due to something the commit above triggered, the >>>>>>>> level is out of bounds and the pfn looks fishy too. >>>>>>>> Complete serial log from both bad and good (specific commit reverted) are >>>>>>>> attached. >>>>>>> Just to confirm, you are positively identifying a qemu changeset as >>>>>>> causing this crash? >>>>>>> If so, the qemu change has discovered a pre-existing issue in the >>>>>>> toolstack pci-passthrough interface. Whatever qemu is or isn't doing, >>>>>>> it should not be able to cause a crash like this. >>>>>>> With this in mind, I need to brush up on my AMD-Vi details. >>>>>>> In the meantime, can you run with the following patch to identify what >>>>>>> is going on, domctl wise? I assume it is the assign_device which is >>>>>>> failing, but it will be nice to observe the differences between the >>>>>>> working and failing case, which might offer a hint. 
>>>>>> Hrrm with your patch i end up with a fatal page fault in iommu_do_pci_domctl: >>>>>> >>>>>> (XEN) [2015-04-11 14:03:31.833] ----[ Xen-4.6-unstable x86_64 debug=y Tainted: C ]---- >>>>>> (XEN) [2015-04-11 14:03:31.857] CPU: 5 >>>>>> (XEN) [2015-04-11 14:03:31.868] RIP: e008:[] iommu_do_pci_domctl+0x2dc/0x740 >>>>>> (XEN) [2015-04-11 14:03:31.894] RFLAGS: 0000000000010256 CONTEXT: hypervisor >>>>>> (XEN) [2015-04-11 14:03:31.915] rax: 0000000000000008 rbx: 0000000000000800 rcx: ffffffffffebe5ed >>>>>> (XEN) [2015-04-11 14:03:31.942] rdx: 0000000000000800 rsi: 0000000000000000 rdi: ffff830256ef7e38 >>>>>> (XEN) [2015-04-11 14:03:31.968] rbp: ffff830256ef7c98 rsp: ffff830256ef7c08 r8: 00000000deadbeef >>>>>> (XEN) [2015-04-11 14:03:31.995] r9: 00000000deadbeef r10: ffff82d08024e500 r11: 0000000000000282 >>>>>> (XEN) [2015-04-11 14:03:32.022] r12: 0000000000000000 r13: 0000000000000008 r14: 0000000000000000 >>>>>> (XEN) [2015-04-11 14:03:32.049] r15: 0000000000000000 cr0: 0000000080050033 cr4: 00000000000006f0 >>>>>> (XEN) [2015-04-11 14:03:32.076] cr3: 00000002336a6000 cr2: 0000000000000000 >>>>>> (XEN) [2015-04-11 14:03:32.096] ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e010 cs: e008 >>>>>> (XEN) [2015-04-11 14:03:32.121] Xen stack trace from rsp=ffff830256ef7c08: >>>>>> (XEN) [2015-04-11 14:03:32.141] ffff830256ef7c78 ffff82d08012c178 ffff830256ef7c28 ffff830256ef7c28 >>>>>> (XEN) [2015-04-11 14:03:32.168] 0000000000000010 0000000000000000 0000000000000000 0000000000000000 >>>>>> (XEN) [2015-04-11 14:03:32.195] 00000000000006f0 00007fe300000000 ffff830256eb7790 ffff83025cc6d300 >>>>>> (XEN) [2015-04-11 14:03:32.222] ffff82d080330c60 00007fe396bab004 0000000000000000 00007fe396bab004 >>>>>> (XEN) [2015-04-11 14:03:32.249] 0000000000000000 0000000000000005 ffff830256ef7ca8 ffff82d08014900b >>>>>> (XEN) [2015-04-11 14:03:32.276] ffff830256ef7d98 ffff82d080161f2d 0000000000000010 0000000000000000 >>>>>> (XEN) [2015-04-11 14:03:32.303] 0000000000000000 
ffff830256ef7ce8 ffff82d08018b655 ffff830256ef7d48 >>>>>> (XEN) [2015-04-11 14:03:32.330] ffff830256ef7cf8 ffff82d08018b66a ffff830256ef7d38 ffff82d08012925e >>>>>> (XEN) [2015-04-11 14:03:32.357] ffff830256efc068 0000000800000001 800000022e12c167 0000000000000000 >>>>>> (XEN) [2015-04-11 14:03:32.384] 0000000000000002 ffff830256ef7e38 0000000800000000 800000022e12c167 >>>>>> (XEN) [2015-04-11 14:03:32.411] 0000000000000003 ffff830256ef7db8 0000000000000000 00007fe396780eb0 >>>>>> (XEN) [2015-04-11 14:03:32.439] 0000000000000202 ffffffffffffffff 0000000000000000 00007fe396bab004 >>>>>> (XEN) [2015-04-11 14:03:32.466] 0000000000000000 0000000000000005 ffff830256ef7ef8 ffff82d08010497f >>>>>> (XEN) [2015-04-11 14:03:32.493] 0000000000000001 0000000000100001 800000022e12c167 ffff88001f7ecc00 >>>>>> (XEN) [2015-04-11 14:03:32.520] 00007fe396780eb0 ffff88001c849508 0000000e00000007 ffffffff8105594a >>>>>> (XEN) [2015-04-11 14:03:32.547] 000000000000e033 0000000000000202 ffff88001ece3d40 000000000000e02b >>>>>> (XEN) [2015-04-11 14:03:32.574] ffff830256ef7e28 ffff82d080194933 000000000000beef ffffffff81bd6c85 >>>>>> (XEN) [2015-04-11 14:03:32.601] ffff830256ef7f08 ffff82d080193edd 0000000b0000002d 0000000000000001 >>>>>> (XEN) [2015-04-11 14:03:32.628] 0000000100000800 00007fe3962abbd0 ffff000a81050001 00007fe39656ce6e >>>>>> (XEN) [2015-04-11 14:03:32.655] 00007ffdf2a654f0 00007fe39656d0c9 00007fe39656ce6e 00007fe3969a9a55 >>>>>> (XEN) [2015-04-11 14:03:32.682] Xen call trace: >>>>>> (XEN) [2015-04-11 14:03:32.695] [] iommu_do_pci_domctl+0x2dc/0x740 >>>>>> (XEN) [2015-04-11 14:03:32.718] [] iommu_do_domctl+0x17/0x1a >>>>>> (XEN) [2015-04-11 14:03:32.739] [] arch_do_domctl+0x2469/0x26e1 >>>>>> (XEN) [2015-04-11 14:03:32.762] [] do_domctl+0x1a1f/0x1d60 >>>>>> (XEN) [2015-04-11 14:03:32.783] [] syscall_enter+0xeb/0x145 >>>>>> (XEN) [2015-04-11 14:03:32.804] >>>>>> (XEN) [2015-04-11 14:03:32.813] Pagetable walk from 0000000000000000: >>>>>> (XEN) [2015-04-11 14:03:32.831] 
L4[0x000] = 0000000234075067 000000000001f2a8 >>>>>> (XEN) [2015-04-11 14:03:32.852] L3[0x000] = 0000000229ad4067 0000000000014c49 >>>>>> (XEN) [2015-04-11 14:03:32.873] L2[0x000] = 0000000000000000 ffffffffffffffff >>>>>> (XEN) [2015-04-11 14:03:32.894] >>>>>> (XEN) [2015-04-11 14:03:32.903] **************************************** >>>>>> (XEN) [2015-04-11 14:03:32.922] Panic on CPU 5: >>>>>> (XEN) [2015-04-11 14:03:32.935] FATAL PAGE FAULT >>>>>> (XEN) [2015-04-11 14:03:32.948] [error_code=0000] >>>>>> (XEN) [2015-04-11 14:03:32.961] Faulting linear address: 0000000000000000 >>>>>> (XEN) [2015-04-11 14:03:32.981] **************************************** >>>>>> (XEN) [2015-04-11 14:03:33.000] >>>>>> (XEN) [2015-04-11 14:03:33.009] Reboot in five seconds... >>>>>> >>>>>> The RIP resolves to the prink added by your patch in: >>>>>> >>>>>> case XEN_DOMCTL_test_assign_device: >>>>>> ret = xsm_test_assign_device(XSM_HOOK, domctl->u.assign_device.machine_sbdf); >>>>>> if ( ret ) >>>>>> break; >>>>>> >>>>>> seg = domctl->u.assign_device.machine_sbdf >> 16; >>>>>> bus = (domctl->u.assign_device.machine_sbdf >> 8) & 0xff; >>>>>> devfn = domctl->u.assign_device.machine_sbdf & 0xff; >>>>>> >>>>>> printk("*** %pv->d%d: test_assign_device({%04x:%02x:%02x.%u})\n", >>>>>> current, d->domain_id, >>>>>> seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn)); >>>>>> >>>>>> if ( device_assigned(seg, bus, devfn) ) >>>>>> { >>>>>> printk(XENLOG_G_INFO >>>>>> "%04x:%02x:%02x.%u already assigned, or non-existent\n", >>>>>> seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn)); >>>>>> ret = -EINVAL; >>>>>> } >>>>>> break; >>>>> hmm - 'd' is NULL. This ought to work better. 
>>>>> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c >>>>> index 9f3413c..85ff1fc 100644 >>>>> --- a/xen/drivers/passthrough/pci.c >>>>> +++ b/xen/drivers/passthrough/pci.c >>>>> @@ -1532,6 +1532,11 @@ int iommu_do_pci_domctl( >>>>> max_sdevs = domctl->u.get_device_group.max_sdevs; >>>>> sdevs = domctl->u.get_device_group.sdev_array; >>>>> >>>>> + printk("*** %pv->d%d: get_device_group({%04x:%02x:%02x.%u, %u})\n", >>>>> + current, d->domain_id, >>>>> + seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn), >>>>> + max_sdevs); >>>>> + >>>>> ret = iommu_get_device_group(d, seg, bus, devfn, sdevs, max_sdevs); >>>>> if ( ret < 0 ) >>>>> { >>>>> @@ -1558,6 +1563,9 @@ int iommu_do_pci_domctl( >>>>> bus = (domctl->u.assign_device.machine_sbdf >> 8) & 0xff; >>>>> devfn = domctl->u.assign_device.machine_sbdf & 0xff; >>>>> >>>>> + printk("*** %pv: test_assign_device({%04x:%02x:%02x.%u})\n", >>>>> + current, seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn)); >>>>> + >>>>> if ( device_assigned(seg, bus, devfn) ) >>>>> { >>>>> printk(XENLOG_G_INFO >>>>> @@ -1582,6 +1590,10 @@ int iommu_do_pci_domctl( >>>>> bus = (domctl->u.assign_device.machine_sbdf >> 8) & 0xff; >>>>> devfn = domctl->u.assign_device.machine_sbdf & 0xff; >>>>> >>>>> + printk("*** %pv->d%d: assign_device({%04x:%02x:%02x.%u})\n", >>>>> + current, d->domain_id, >>>>> + seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn)); >>>>> + >>>>> ret = device_assigned(seg, bus, devfn) ?: >>>>> assign_device(d, seg, bus, devfn); >>>>> if ( ret == -ERESTART ) >>>>> @@ -1604,6 +1616,10 @@ int iommu_do_pci_domctl( >>>>> bus = (domctl->u.assign_device.machine_sbdf >> 8) & 0xff; >>>>> devfn = domctl->u.assign_device.machine_sbdf & 0xff; >>>>> >>>>> + printk("*** %pv->d%d: deassign_device({%04x:%02x:%02x.%u})\n", >>>>> + current, d->domain_id, >>>>> + seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn)); >>>>> + >>>>> spin_lock(&pcidevs_lock); >>>>> ret = deassign_device(d, seg, bus, devfn); >>>>> spin_unlock(&pcidevs_lock); >>>> Hi 
Andrew,
>>>>
>>>> Attached are the serial logs good (with revert) and bad (without):
>>>>
>>>> Some things that seems strange to me:
>>>> - The numerous calls to get the device 08:00.0 assigned ... for 0a:00.0 there
>>>>   was only one call to both test assign and assign.
>>>> - However these numerous calls are there in both the good and the bad case,
>>>>   so perhaps it's strange and wrong .. but not the cause ..
>>>> - I had a hunch it could be due to the 08:00.0 using MSI-X, but when only
>>>>   passing through 0a:00.0, i get the same numerous calls but now for the
>>>>   0a:00.0 which uses IntX, so I think that is more related to being the *first*
>>>>   device to be passed through to a guest.
>>> I have also observed this behaviour, but not had time to investigate.
>>> It doesn't appear problematic in the longrun but it probably a toolstack
>>> issue which wants fixing (if only in the name of efficiency).
>> And just after I sent this email, I have realised why.
>> The first assign device will have to build IO pagetables, which is a
>> long operation and subject to hypercall continuations. The second
>> device will reused the same pagetables, so is quick to complete.
> So .. is the ioreq patch from Paul involved in providing something used in building
> the pagetables .. and could it have say some off by one resulting in the
> 0xffffffffffff .. which could lead to the pagetable building going beserk,
> requiring a paging_mode far greater than normally would be required .. which
> get's set .. since that isn't checked properly .. leading to things breaking
> a bit further when it does get checked ?

A -1 is slipping in somewhere and ending up in the gfn field. The result is that update_paging_mode() attempts to construct IO pagetables to cover a 76-bit address space, which is how level ends up at 8. (Note that a level of 7 is reserved, and a level of anything greater than 4 is implausible on your system.)
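To make the arithmetic concrete, here is a rough sketch of how a gfn of -1 turns into a level of 8. This is not the actual update_paging_mode() code. Every name below (sketch_levels_for_gfn(), sketch_addr_bits(), sketch_gfn_fits() and the SKETCH_* constants) is made up for illustration, and it assumes 4K pages with 9 index bits per pagetable level.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions for this sketch: 4K pages, 512 entries per pagetable. */
#define SKETCH_PAGE_SHIFT      12
#define SKETCH_PTE_TABLE_SHIFT 9
#define SKETCH_PTE_TABLE_SIZE  (1u << SKETCH_PTE_TABLE_SHIFT)

/*
 * Hypothetical helper: starting from 'level' levels, keep adding a new
 * top level until the tables can index 'gfn'.  Returns the final level.
 */
static inline unsigned int sketch_levels_for_gfn(uint64_t gfn,
                                                 unsigned int level)
{
    uint64_t offset;

    /* Keep the shift below 64 bits; anything else is undefined in C. */
    assert(level >= 1 && level <= 8);

    /* Index that 'gfn' would need in the current top-level table. */
    offset = gfn >> (SKETCH_PTE_TABLE_SHIFT * (level - 1));

    while ( offset >= SKETCH_PTE_TABLE_SIZE )
    {
        offset >>= SKETCH_PTE_TABLE_SHIFT;
        level++;
    }

    return level;
}

/* Hypothetical helper: address bits implied by 'gfn' (frame bits + 12). */
static inline unsigned int sketch_addr_bits(uint64_t gfn)
{
    unsigned int bits = 0;

    for ( ; gfn; gfn >>= 1 )
        bits++;

    return bits + SKETCH_PAGE_SHIFT;
}

/* Hypothetical sanity check: would 'gfn' fit within 'max_level' levels? */
static inline int sketch_gfn_fits(uint64_t gfn, unsigned int max_level)
{
    return sketch_levels_for_gfn(gfn, 1) <= max_level;
}
```

With these assumptions, a 64-bit gfn of all ones needs 64 + 12 = 76 address bits. Seven levels index only 63 bits of frame number, so the loop stops at 8. A check along the lines of sketch_gfn_fits() before growing the tables is the sort of input sanitising that appears to be missing.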
I think the crash is collateral damage following on from update_paging_mode() not properly sanitising its input, but that there is still some other issue causing -1 to be passed in the first place.

I am still trying to locate where a -1 might plausibly be coming from.

~Andrew