From mboxrd@z Thu Jan 1 00:00:00 1970 From: Konrad Rzeszutek Wilk Subject: Dealing with non-existent BDF devices in VT-d and in the hardware. Date: Tue, 11 Mar 2014 13:30:39 -0400 Message-ID: <20140311173039.GB14684@phenom.dumpdata.com> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="k+w/mQv8wyuph6w0" Return-path: Content-Disposition: inline List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org To: gordan@bobich.net, xen-devel@lists.xensource.com List-Id: xen-devel@lists.xenproject.org --k+w/mQv8wyuph6w0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Hey, I am one of those lucky folks who had purchased a motherboard that has bugs. I figured I would post this email as way for a starting point for some discussion on this - and perhaps have a similar as 'pci-phantom' way of instructing the hypervisor what to do with them. The problem I am seeing is that this device: 08:03.0 FireWire (IEEE 1394): Texas Instruments TSB43AB22A IEEE-1394a-2000 Controller (PHY/Link) [iOHCI-Lynx] Can't be passed in the guest. Or rather it can - but everytime the guest (or domain0) tries to access I see: (XEN) [VT-D]iommu.c:885: iommu_fault_status: Fault Overflow (XEN) [VT-D]iommu.c:887: iommu_fault_status: Primary Pending Fault (XEN) [VT-D]iommu.c:865: DMAR:[DMA Write] Request device [0000:08:00.0] fault addr 0, iommu reg = ffff82c3ffd53000 (XEN) DMAR:[fault reason 02h] Present bit in context entry is clear (XEN) print_vtd_entries: iommu ffff83043dca99b0 dev 0000:08:00.0 gmfn 0 (XEN) root_entry = ffff83043dc6b000 (XEN) root_entry[8] = 3326b5001 (XEN) context = ffff8303326b5000 (XEN) context[0] = 0_0 (XEN) ctxt_entry[0] not present Of course the '08:00.0' device does not exist. It is rather this chipset: 07:00.0 PCI bridge: Tundra Semiconductor Corp. Device 8113 (rev 01) that is buggy and using the wrong BDF when forwarding DMA requests from devices underneath it (like this Firewire chip). The hack I came up with was to create in the Xen code that deals with PCI passthrough a copy of the bridge (so 07:00.0) but with a new BDF: 08:00.0. And link it to the PCI device that I am passing to the guest (so 08:03.0). The end result is that when loading the driver (hack.c) one should see: (XEN) 0000:08:00.0 linked with 08:03.0 (XEN) [VT-D]iommu.c:1456: d0:PCI: map 0000:08:00.0 (XEN) [VT-D]iommu.c:1476: d0:PCI: map 0000:08:03.0 (XEN) PCI add link 0000:08:00.0 And when launching a guest with the BDF: pci = ["08:03.0"] the hypervisor will automatically also create an VT-d context for the 08:00.0 device. To use this hack, apply the 0001-xen-pci-Introduce-a-way-to-deal-with-buggy-hardware-.patch to your hypervisor, compile and install. And also compile the 'hack.c' module. There is an attached 'Makefile' that will do it for you. Make sure you edit it to set the right BDF entries in it. Once done install your new hypervisor, and insmod ./hack.ko and try passing in the device to your guest (or use it normally). The 'DMAR:[DMA Write]' error should go away. This should be generic enough for most devices. It needn't be a bridge that is spewing out these DMAR errors. --k+w/mQv8wyuph6w0 Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename="0001-xen-pci-Introduce-a-way-to-deal-with-buggy-hardware-.patch" >>From cb165429726978952f5b9e75bece1dcb5630667f Mon Sep 17 00:00:00 2001 From: Konrad Rzeszutek Wilk Date: Wed, 19 Feb 2014 10:58:19 -0500 Subject: [PATCH] xen/pci: Introduce a way to deal with buggy hardware with "hidden" PCI buses. Signed-off-by: Konrad Rzeszutek Wilk --- xen/arch/x86/physdev.c | 14 +++++- xen/drivers/passthrough/pci.c | 89 +++++++++++++++++++++++++++++++++---- xen/drivers/passthrough/vtd/iommu.c | 31 +++++++++++++ xen/include/public/physdev.h | 1 + xen/include/xen/pci.h | 3 ++ 5 files changed, 128 insertions(+), 10 deletions(-) diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c index bc0634c..f843c49 100644 --- a/xen/arch/x86/physdev.c +++ b/xen/arch/x86/physdev.c @@ -609,7 +609,11 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg) if ( copy_from_guest(&add, arg, 1) != 0 ) break; - pdev_info.is_extfn = !!(add.flags & XEN_PCI_DEV_EXTFN); + if ( add.flags & XEN_PCI_DEV_EXTFN) + pdev_info.is_extfn = 1; + else + pdev_info.is_extfn = 0; + if ( add.flags & XEN_PCI_DEV_VIRTFN ) { pdev_info.is_virtfn = 1; @@ -618,6 +622,14 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg) } else pdev_info.is_virtfn = 0; + + if ( add.flags & XEN_PCI_DEV_LINK ) + { + pdev_info.is_link = 1; + pdev_info.physfn.bus = add.physfn.bus; + pdev_info.physfn.devfn = add.physfn.devfn; + } else + pdev_info.is_link = 0; ret = pci_add_device(add.seg, add.bus, add.devfn, &pdev_info); break; } diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c index 2a6eaa4..0e59216 100644 --- a/xen/drivers/passthrough/pci.c +++ b/xen/drivers/passthrough/pci.c @@ -153,7 +153,8 @@ static void __init parse_phantom_dev(char *str) { } custom_param("pci-phantom", parse_phantom_dev); -static struct pci_dev *alloc_pdev(struct pci_seg *pseg, u8 bus, u8 devfn) +static struct pci_dev *alloc_pdev(struct pci_seg *pseg, u8 bus, u8 devfn, + int link, u8 orig_bus, u8 orig_devfn) { struct pci_dev *pdev; @@ -169,8 +170,38 @@ static struct pci_dev *alloc_pdev(struct pci_seg *pseg, u8 bus, u8 devfn) *((u8*) &pdev->bus) = bus; *((u8*) &pdev->devfn) = devfn; pdev->domain = NULL; + pdev->link = NULL; INIT_LIST_HEAD(&pdev->msi_list); + if ( link ) + { + struct pci_dev *dev; + list_for_each_entry ( dev, &pseg->alldevs_list, alldevs_list ) + { + if ( dev->bus == orig_bus && dev->devfn == orig_devfn ) + { + /* N.B. The 'bus' passed is 'new' one, while 'orig_bus' are + * the ones we expect to exist. We over-write 'bus' and + * 'devfn' with the original one so that this new device + * will be created with the original device properties. + */ + if ( dev->link ) + { + xfree (pdev); + return NULL; + } + bus = dev->bus; + devfn = dev->devfn; + dev->link = pdev; + pdev->link = dev; + pdev->info.is_link = 1; + printk("%04x:%02x:%02x.%u linked with %02x:%02x.%u\n", + pseg->nr, pdev->bus, PCI_SLOT(pdev->devfn), + PCI_FUNC(pdev->devfn), dev->bus, PCI_SLOT(dev->devfn), + PCI_FUNC(dev->devfn)); + } + } + } if ( pci_find_cap_offset(pseg->nr, bus, PCI_SLOT(devfn), PCI_FUNC(devfn), PCI_CAP_ID_MSIX) ) { @@ -201,12 +232,32 @@ static struct pci_dev *alloc_pdev(struct pci_seg *pseg, u8 bus, u8 devfn) sub_bus = pci_conf_read8(pseg->nr, bus, PCI_SLOT(devfn), PCI_FUNC(devfn), PCI_SUBORDINATE_BUS); + if ( pdev->info.is_link ) + { + if ( sec_bus >= pdev->bus && pdev->bus <= sub_bus ) + { +#if 0 + u8 i = sec_bus; + /* We can create an loop in bus2bridge by pointing to ourselves. + * Hence destroy sec_bus up to pdev_bus values */ + spin_lock(&pseg->bus2bridge_lock); + for ( ; i <= pdev->bus; i++ ) + pseg->bus2bridge[sec_bus].map = 0; + spin_unlock(&pseg->bus2bridge_lock); + /* And increment it so it won't cover us again*/ + sec_bus = pdev->bus + 1; + printk("Link corrected [%02x:%02x:%u] spanning %x->%x\n", pdev->bus, + PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn),sec_bus, sub_bus); +#endif + break; + } + } spin_lock(&pseg->bus2bridge_lock); for ( ; sec_bus <= sub_bus; sec_bus++ ) { pseg->bus2bridge[sec_bus].map = 1; - pseg->bus2bridge[sec_bus].bus = bus; - pseg->bus2bridge[sec_bus].devfn = devfn; + pseg->bus2bridge[sec_bus].bus = pdev->bus; + pseg->bus2bridge[sec_bus].devfn = pdev->devfn; } spin_unlock(&pseg->bus2bridge_lock); break; @@ -299,7 +350,7 @@ int __init pci_hide_device(int bus, int devfn) int rc = -ENOMEM; spin_lock(&pcidevs_lock); - pdev = alloc_pdev(get_pseg(0), bus, devfn); + pdev = alloc_pdev(get_pseg(0), bus, devfn, 0, 0, 0); if ( pdev ) { _pci_hide_device(pdev); @@ -317,7 +368,7 @@ int __init pci_ro_device(int seg, int bus, int devfn) if ( !pseg ) return -ENOMEM; - pdev = alloc_pdev(pseg, bus, devfn); + pdev = alloc_pdev(pseg, bus, devfn, 0, 0, 0); if ( !pdev ) return -ENOMEM; @@ -458,6 +509,7 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn, const struct pci_dev_info *info) struct pci_seg *pseg; struct pci_dev *pdev; unsigned int slot = PCI_SLOT(devfn), func = PCI_FUNC(devfn); + u8 bus_link = 0, devfn_link = 0; const char *pdev_type; int ret; @@ -474,6 +526,12 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn, const struct pci_dev_info *info) pci_add_device(seg, info->physfn.bus, info->physfn.devfn, NULL); pdev_type = "virtual function"; } + else if (info->is_link) + { + bus_link = info->physfn.bus; + devfn_link = info->physfn.devfn; + pdev_type = "link"; + } else { info = NULL; @@ -490,7 +548,7 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn, const struct pci_dev_info *info) pseg = alloc_pseg(seg); if ( !pseg ) goto out; - pdev = alloc_pdev(pseg, bus, devfn); + pdev = alloc_pdev(pseg, bus, devfn, (info && info->is_link), bus_link, devfn_link); if ( !pdev ) goto out; @@ -604,7 +662,7 @@ out: int pci_remove_device(u16 seg, u8 bus, u8 devfn) { struct pci_seg *pseg = get_pseg(seg); - struct pci_dev *pdev; + struct pci_dev *pdev, *link = NULL; int ret; ret = xsm_resource_unplug_pci(XSM_PRIV, (seg << 16) | (bus << 8) | devfn); @@ -617,16 +675,29 @@ int pci_remove_device(u16 seg, u8 bus, u8 devfn) return -ENODEV; spin_lock(&pcidevs_lock); +retry: list_for_each_entry ( pdev, &pseg->alldevs_list, alldevs_list ) if ( pdev->bus == bus && pdev->devfn == devfn ) { ret = iommu_remove_device(pdev); if ( pdev->domain ) list_del(&pdev->domain_list); - pci_cleanup_msi(pdev); + if ( !pdev->info.is_link ) /* If we are not the 'fake device' */ + pci_cleanup_msi(pdev); + if ( pdev->link ) { + /* Can be NULL if the other device was removed first. */ + link = pdev->link; + } free_pdev(pseg, pdev); printk(XENLOG_DEBUG "PCI remove device %04x:%02x:%02x.%u\n", seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn)); + if ( link ) + { + bus = link->bus; + devfn = link->devfn; + link->link = NULL; + goto retry; + } break; } @@ -838,7 +909,7 @@ static int __init _scan_pci_devices(struct pci_seg *pseg, void *arg) continue; } - pdev = alloc_pdev(pseg, bus, PCI_DEVFN(dev, func)); + pdev = alloc_pdev(pseg, bus, PCI_DEVFN(dev, func), 0, 0, 0); if ( !pdev ) { printk("%s: alloc_pdev failed.\n", __func__); diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c index 5f10034..a5a4664 100644 --- a/xen/drivers/passthrough/vtd/iommu.c +++ b/xen/drivers/passthrough/vtd/iommu.c @@ -1468,6 +1468,25 @@ static int domain_context_mapping( if ( ret ) break; + if ( pdev->link ) + { + u8 bus_link, devfn_link; + struct pci_dev *link_dev = pdev->link; + + ASSERT ( link_dev ); + + bus_link = link_dev->bus; + devfn_link = link_dev->devfn; + + if ( iommu_verbose ) + dprintk(VTDPREFIX, "d%d:PCI: map %04x:%02x:%02x.%u\n", + domain->domain_id, seg, bus_link, + PCI_SLOT(devfn_link), PCI_FUNC(devfn_link)); + + + ret = domain_context_mapping_one(domain, drhd->iommu, bus_link, devfn_link, + pci_get_pdev(seg, bus, devfn)); + } if ( find_upstream_bridge(seg, &bus, &devfn, &secbus) < 1 ) break; @@ -1603,6 +1622,18 @@ static int domain_context_unmap( if ( ret ) break; + if ( pdev->link ) + { + struct pci_dev *link = pdev->link; + + ASSERT(link->link == pdev); + tmp_bus = link->bus; + tmp_devfn = link->devfn; + if ( iommu_verbose ) + dprintk(VTDPREFIX, "d%d:PCI: unmap %04x:%02x:%02x.%u\n", + domain->domain_id, seg, tmp_bus, PCI_SLOT(tmp_devfn), PCI_FUNC(tmp_devfn)); + ret = domain_context_unmap_one(domain, iommu, tmp_bus, tmp_devfn); + } tmp_bus = bus; tmp_devfn = devfn; if ( find_upstream_bridge(seg, &tmp_bus, &tmp_devfn, &secbus) < 1 ) diff --git a/xen/include/public/physdev.h b/xen/include/public/physdev.h index d547928..c476175 100644 --- a/xen/include/public/physdev.h +++ b/xen/include/public/physdev.h @@ -281,6 +281,7 @@ DEFINE_XEN_GUEST_HANDLE(physdev_pci_mmcfg_reserved_t); #define XEN_PCI_DEV_EXTFN 0x1 #define XEN_PCI_DEV_VIRTFN 0x2 #define XEN_PCI_DEV_PXM 0x4 +#define XEN_PCI_DEV_LINK 0x8 #define PHYSDEVOP_pci_device_add 25 struct physdev_pci_device_add { diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h index cadb525..b883c28 100644 --- a/xen/include/xen/pci.h +++ b/xen/include/xen/pci.h @@ -39,6 +39,7 @@ struct pci_dev_info { u8 bus; u8 devfn; } physfn; + bool_t is_link; }; struct pci_dev { @@ -75,6 +76,8 @@ struct pci_dev { #define PT_FAULT_THRESHOLD 10 } fault; u64 vf_rlen[6]; + + struct pci_dev *link; }; #define for_each_pdev(domain, pdev) \ -- 1.8.5.3 --k+w/mQv8wyuph6w0 Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename="hack.c" #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #define LSI_HACK "0.1" MODULE_AUTHOR("Konrad Rzeszutek Wilk "); MODULE_DESCRIPTION("lsi hack"); MODULE_LICENSE("GPL"); MODULE_VERSION(LSI_HACK); /* Want to link to the device passed in the guest */ int bus = 8; module_param(bus, int, 0644); MODULE_PARM_DESC(bus, "bus"); int slot = 3; module_param(slot, int, 0644); MODULE_PARM_DESC(slot, "slot"); int func = 0; module_param(func, int, 0644); MODULE_PARM_DESC(func, "slot"); #define XEN_PCI_DEV_LINK 0x8 static int __init lsi_hack_init(void) { int r = 0; struct physdev_pci_device_add add = { .seg = 0, .bus = 0x8, /* The phantom bridge */ .devfn = PCI_DEVFN(0,0), /* And its slot and function */ .physfn.bus = bus, /* The device we want to link too. */ .physfn.devfn = PCI_DEVFN(slot,func), .flags = XEN_PCI_DEV_LINK, }; printk("%s: %02x:%02x.%u, %02x:%02x.%u, %x\n", __func__, add.bus, PCI_SLOT(add.devfn), PCI_FUNC(add.devfn), add.physfn.bus, PCI_SLOT(add.physfn.devfn), PCI_FUNC(add.physfn.devfn), add.flags); r = HYPERVISOR_physdev_op(PHYSDEVOP_pci_device_add, &add); return r; } static void __exit lsi_hack_exit(void) { int r = 0; struct physdev_manage_pci manage_pci; manage_pci.bus = 0x7; manage_pci.devfn = PCI_DEVFN(0,0); r = HYPERVISOR_physdev_op(PHYSDEVOP_manage_pci_remove, &manage_pci); if (r) printk(KERN_ERR "%s: %d\n", __FUNCTION__, r); } module_init(lsi_hack_init); module_exit(lsi_hack_exit); --k+w/mQv8wyuph6w0 Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename=Makefile # Comment/uncomment the following line to disable/enable debugging #DEBUG = y # Add your debugging flag (or not) to CFLAGS ifeq ($(DEBUG),y) DEBFLAGS = -O -g # "-O" is needed to expand inlines else DEBFLAGS = -O2 endif EXTRA_CFLAGS += $(DEBFLAGS) -I$(LDDINCDIR) ifneq ($(KERNELRELEASE),) # call from kernel build system obj-m := hack.o else KERNELDIR ?= /lib/modules/$(shell uname -r)/build PWD := $(shell pwd) default: $(MAKE) -C $(KERNELDIR) M=$(PWD) LDDINCDIR=$(PWD)/../include modules endif clean: rm -rf *.o *~ core .depend .*.cmd *.ko *.mod.c .tmp_versions depend .depend dep: $(CC) $(CFLAGS) -M *.c > .depend ifeq (.depend,$(wildcard .depend)) include .depend endif --k+w/mQv8wyuph6w0 Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Disposition: inline _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel --k+w/mQv8wyuph6w0--