xen-devel.lists.xenproject.org archive mirror
* Dealing with non-existent BDF devices in VT-d and in the hardware.
@ 2014-03-11 17:30 Konrad Rzeszutek Wilk
  2014-03-11 17:36 ` Andrew Cooper
  2014-03-12  9:17 ` Jan Beulich
  0 siblings, 2 replies; 22+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-03-11 17:30 UTC (permalink / raw)
  To: gordan, xen-devel

[-- Attachment #1: Type: text/plain, Size: 2558 bytes --]

Hey,

I am one of those lucky folks who purchased a motherboard that has bugs.

I figured I would post this email as a starting point for some
discussion on this - and perhaps we could have a way, similar to
'pci-phantom', of instructing the hypervisor what to do with them.

The problem I am seeing is that this device:

08:03.0 FireWire (IEEE 1394): Texas Instruments TSB43AB22A IEEE-1394a-2000 Controller (PHY/Link) [iOHCI-Lynx]

can't be passed through to the guest. Or rather it can - but every time
the guest (or domain0) tries to access it I see:

(XEN) [VT-D]iommu.c:885: iommu_fault_status: Fault Overflow
(XEN) [VT-D]iommu.c:887: iommu_fault_status: Primary Pending Fault
(XEN) [VT-D]iommu.c:865: DMAR:[DMA Write] Request device [0000:08:00.0] fault addr 0, iommu reg = ffff82c3ffd53000
(XEN) DMAR:[fault reason 02h] Present bit in context entry is clear
(XEN) print_vtd_entries: iommu ffff83043dca99b0 dev 0000:08:00.0 gmfn 0
(XEN)     root_entry = ffff83043dc6b000
(XEN)     root_entry[8] = 3326b5001
(XEN)     context = ffff8303326b5000
(XEN)     context[0] = 0_0
(XEN)     ctxt_entry[0] not present


Of course the '08:00.0' device does not exist. It is rather this chipset:
07:00.0 PCI bridge: Tundra Semiconductor Corp. Device 8113 (rev 01)

that is buggy and using the wrong BDF when forwarding DMA requests from
devices underneath it (like this Firewire chip).
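
To make the fault above easier to read, here is a minimal user-space sketch of how a 16-bit requester ID splits into bus:slot.func and how the two-level root/context lookup ends in fault reason 02h. The helper names (make_sid, sid_bus, lookup_fault, and so on) are made up for illustration and are not Xen functions; the only facts assumed are the standard bus/devfn packing and that bit 0 of a root or context entry is the Present bit.

```c
#include <assert.h>
#include <stdint.h>

/* A requester ID (VT-d "source-id") packs bus:slot.func into 16 bits. */
static inline uint16_t make_sid(unsigned bus, unsigned slot, unsigned func)
{
    assert(bus < 256 && slot < 32 && func < 8);
    return (uint16_t)((bus << 8) | (slot << 3) | func);
}

static inline unsigned sid_bus(uint16_t sid)  { return sid >> 8; }
static inline unsigned sid_slot(uint16_t sid) { return (sid >> 3) & 0x1f; }
static inline unsigned sid_func(uint16_t sid) { return sid & 0x7; }

/* Bit 0 of the low word of a root or context entry is the Present bit. */
static inline int entry_present(uint64_t lo) { return (int)(lo & 1); }

/*
 * Toy model of the lookup: root[bus] must be present, then context[devfn]
 * must be present. Returns 0 on success, otherwise the fault reason
 * (1 = root entry not present, 2 = context entry not present).
 */
static inline int lookup_fault(uint64_t root_lo, uint64_t ctxt_lo)
{
    if (!entry_present(root_lo))
        return 1;
    if (!entry_present(ctxt_lo))
        return 2;
    return 0;
}
```

With the values from the log, root_entry[8] = 3326b5001 has bit 0 set while context[0] is zero, so the sketch returns 2, matching "fault reason 02h". The point is that the IOMMU looks up the requester ID it actually sees (08:00.0), not the BDF of the device that issued the DMA (08:03.0).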

The hack I came up with was to create, in the Xen code that deals with
PCI passthrough, a copy of the bridge (so 07:00.0) but with a new
BDF: 08:00.0, and to link it to the PCI device that I am passing to the
guest (so 08:03.0).

The end result is that when loading the driver (hack.c) one should
see:

(XEN) 0000:08:00.0 linked with 08:03.0
(XEN) [VT-D]iommu.c:1456: d0:PCI: map 0000:08:00.0
(XEN) [VT-D]iommu.c:1476: d0:PCI: map 0000:08:03.0
(XEN) PCI add link 0000:08:00.0

And when launching a guest with the BDF:
pci = ["08:03.0"]

the hypervisor will automatically also create a VT-d context for the
08:00.0 device.
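
As a rough user-space illustration of the linking idea (not the hypervisor code itself), the sketch below models two device records joined by a symmetric 'link' pointer, where mapping one also maps its partner. The names toy_dev, toy_link and toy_map are hypothetical, and 'mapped' merely stands in for "has a VT-d context entry".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_dev {
    uint8_t bus, devfn;
    int mapped;              /* stands in for "has a VT-d context entry" */
    struct toy_dev *link;    /* partner device, or NULL */
};

/* Link two devices symmetrically; refuse if either is already linked. */
static inline int toy_link(struct toy_dev *phantom, struct toy_dev *real)
{
    if (phantom->link || real->link)
        return -1;
    phantom->link = real;
    real->link = phantom;
    return 0;
}

/*
 * Map a device and, if it has one, its linked partner.
 * Returns the number of contexts newly created.
 */
static inline int toy_map(struct toy_dev *dev)
{
    int n = 0;

    if (!dev->mapped) {
        dev->mapped = 1;
        n++;
    }
    if (dev->link && !dev->link->mapped) {
        dev->link->mapped = 1;
        n++;
    }
    return n;
}
```

In this model, mapping 08:03.0 after linking it with a phantom 08:00.0 creates two contexts, which is the behaviour the two "map" lines in the log above show.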

To use this hack, apply the 0001-xen-pci-Introduce-a-way-to-deal-with-buggy-hardware-.patch
to your hypervisor, compile and install.

Also compile the 'hack.c' module. There is an attached 'Makefile'
that will do it for you. Make sure you edit hack.c to set the right BDF
entries in it.

Once done, install your new hypervisor, insmod ./hack.ko, and try
passing in the device to your guest (or use it normally). The
'DMAR:[DMA Write]' error should go away.

This should be generic enough for most devices. It needn't be a bridge
that is spewing out these DMAR errors.

[-- Attachment #2: 0001-xen-pci-Introduce-a-way-to-deal-with-buggy-hardware-.patch --]
[-- Type: text/plain, Size: 11683 bytes --]

From cb165429726978952f5b9e75bece1dcb5630667f Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Date: Wed, 19 Feb 2014 10:58:19 -0500
Subject: [PATCH] xen/pci: Introduce a way to deal with buggy hardware with
 "hidden" PCI buses.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 xen/arch/x86/physdev.c              | 14 +++++-
 xen/drivers/passthrough/pci.c       | 89 +++++++++++++++++++++++++++++++++----
 xen/drivers/passthrough/vtd/iommu.c | 31 +++++++++++++
 xen/include/public/physdev.h        |  1 +
 xen/include/xen/pci.h               |  3 ++
 5 files changed, 128 insertions(+), 10 deletions(-)

diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c
index bc0634c..f843c49 100644
--- a/xen/arch/x86/physdev.c
+++ b/xen/arch/x86/physdev.c
@@ -609,7 +609,11 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         if ( copy_from_guest(&add, arg, 1) != 0 )
             break;
 
-        pdev_info.is_extfn = !!(add.flags & XEN_PCI_DEV_EXTFN);
+        if ( add.flags & XEN_PCI_DEV_EXTFN)
+            pdev_info.is_extfn = 1;
+        else
+            pdev_info.is_extfn = 0;
+
         if ( add.flags & XEN_PCI_DEV_VIRTFN )
         {
             pdev_info.is_virtfn = 1;
@@ -618,6 +622,14 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         }
         else
             pdev_info.is_virtfn = 0;
+
+        if ( add.flags & XEN_PCI_DEV_LINK )
+        {
+            pdev_info.is_link = 1;
+            pdev_info.physfn.bus = add.physfn.bus;
+            pdev_info.physfn.devfn = add.physfn.devfn;
+        } else
+            pdev_info.is_link = 0;
         ret = pci_add_device(add.seg, add.bus, add.devfn, &pdev_info);
         break;
     }
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index 2a6eaa4..0e59216 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -153,7 +153,8 @@ static void __init parse_phantom_dev(char *str) {
 }
 custom_param("pci-phantom", parse_phantom_dev);
 
-static struct pci_dev *alloc_pdev(struct pci_seg *pseg, u8 bus, u8 devfn)
+static struct pci_dev *alloc_pdev(struct pci_seg *pseg, u8 bus, u8 devfn,
+                                  int link, u8 orig_bus, u8 orig_devfn)
 {
     struct pci_dev *pdev;
 
@@ -169,8 +170,38 @@ static struct pci_dev *alloc_pdev(struct pci_seg *pseg, u8 bus, u8 devfn)
     *((u8*) &pdev->bus) = bus;
     *((u8*) &pdev->devfn) = devfn;
     pdev->domain = NULL;
+    pdev->link = NULL;
     INIT_LIST_HEAD(&pdev->msi_list);
 
+    if ( link )
+    {
+        struct pci_dev *dev;
+        list_for_each_entry ( dev, &pseg->alldevs_list, alldevs_list )
+        {
+            if ( dev->bus == orig_bus && dev->devfn == orig_devfn )
+            {
+                /* N.B. The 'bus' passed is 'new' one, while 'orig_bus' are
+                 * the ones we expect to exist. We over-write 'bus' and
+                 * 'devfn' with the original one so that this new device
+                 * will be created with the original device properties.
+                 */
+                if ( dev->link )
+                {
+                    xfree (pdev);
+                    return NULL;
+                }
+                bus = dev->bus;
+                devfn = dev->devfn;
+                dev->link = pdev;
+                pdev->link = dev;
+                pdev->info.is_link = 1;
+                printk("%04x:%02x:%02x.%u linked with %02x:%02x.%u\n",
+                       pseg->nr, pdev->bus, PCI_SLOT(pdev->devfn),
+                       PCI_FUNC(pdev->devfn), dev->bus, PCI_SLOT(dev->devfn),
+                       PCI_FUNC(dev->devfn));
+            }
+        }
+    }
     if ( pci_find_cap_offset(pseg->nr, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
                              PCI_CAP_ID_MSIX) )
     {
@@ -201,12 +232,32 @@ static struct pci_dev *alloc_pdev(struct pci_seg *pseg, u8 bus, u8 devfn)
             sub_bus = pci_conf_read8(pseg->nr, bus, PCI_SLOT(devfn),
                                      PCI_FUNC(devfn), PCI_SUBORDINATE_BUS);
 
+            if ( pdev->info.is_link )
+            {
+                if ( sec_bus <= pdev->bus && pdev->bus <= sub_bus )
+                {
+#if 0
+                    u8 i = sec_bus;
+                    /* We can create a loop in bus2bridge by pointing to ourselves.
+                     * Hence destroy sec_bus up to pdev_bus values */
+                    spin_lock(&pseg->bus2bridge_lock);
+                    for ( ; i <= pdev->bus; i++ )
+                        pseg->bus2bridge[i].map = 0;
+                    spin_unlock(&pseg->bus2bridge_lock);
+                    /* And increment it so it won't cover us again*/
+                    sec_bus = pdev->bus + 1;
+                    printk("Link corrected [%02x:%02x:%u] spanning %x->%x\n", pdev->bus,
+                           PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn),sec_bus, sub_bus);
+#endif
+                    break;
+                }
+            }
             spin_lock(&pseg->bus2bridge_lock);
             for ( ; sec_bus <= sub_bus; sec_bus++ )
             {
                 pseg->bus2bridge[sec_bus].map = 1;
-                pseg->bus2bridge[sec_bus].bus = bus;
-                pseg->bus2bridge[sec_bus].devfn = devfn;
+                pseg->bus2bridge[sec_bus].bus = pdev->bus;
+                pseg->bus2bridge[sec_bus].devfn = pdev->devfn;
             }
             spin_unlock(&pseg->bus2bridge_lock);
             break;
@@ -299,7 +350,7 @@ int __init pci_hide_device(int bus, int devfn)
     int rc = -ENOMEM;
 
     spin_lock(&pcidevs_lock);
-    pdev = alloc_pdev(get_pseg(0), bus, devfn);
+    pdev = alloc_pdev(get_pseg(0), bus, devfn, 0, 0, 0);
     if ( pdev )
     {
         _pci_hide_device(pdev);
@@ -317,7 +368,7 @@ int __init pci_ro_device(int seg, int bus, int devfn)
 
     if ( !pseg )
         return -ENOMEM;
-    pdev = alloc_pdev(pseg, bus, devfn);
+    pdev = alloc_pdev(pseg, bus, devfn, 0, 0, 0);
     if ( !pdev )
         return -ENOMEM;
 
@@ -458,6 +509,7 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn, const struct pci_dev_info *info)
     struct pci_seg *pseg;
     struct pci_dev *pdev;
     unsigned int slot = PCI_SLOT(devfn), func = PCI_FUNC(devfn);
+    u8 bus_link = 0, devfn_link = 0;
     const char *pdev_type;
     int ret;
 
@@ -474,6 +526,12 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn, const struct pci_dev_info *info)
             pci_add_device(seg, info->physfn.bus, info->physfn.devfn, NULL);
         pdev_type = "virtual function";
     }
+    else if (info->is_link)
+    {
+        bus_link = info->physfn.bus;
+        devfn_link = info->physfn.devfn;
+        pdev_type = "link";
+    }
     else
     {
         info = NULL;
@@ -490,7 +548,7 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn, const struct pci_dev_info *info)
     pseg = alloc_pseg(seg);
     if ( !pseg )
         goto out;
-    pdev = alloc_pdev(pseg, bus, devfn);
+    pdev = alloc_pdev(pseg, bus, devfn, (info && info->is_link), bus_link, devfn_link);
     if ( !pdev )
         goto out;
 
@@ -604,7 +662,7 @@ out:
 int pci_remove_device(u16 seg, u8 bus, u8 devfn)
 {
     struct pci_seg *pseg = get_pseg(seg);
-    struct pci_dev *pdev;
+    struct pci_dev *pdev, *link = NULL;
     int ret;
 
     ret = xsm_resource_unplug_pci(XSM_PRIV, (seg << 16) | (bus << 8) | devfn);
@@ -617,16 +675,29 @@ int pci_remove_device(u16 seg, u8 bus, u8 devfn)
         return -ENODEV;
 
     spin_lock(&pcidevs_lock);
+retry:
     list_for_each_entry ( pdev, &pseg->alldevs_list, alldevs_list )
         if ( pdev->bus == bus && pdev->devfn == devfn )
         {
             ret = iommu_remove_device(pdev);
             if ( pdev->domain )
                 list_del(&pdev->domain_list);
-            pci_cleanup_msi(pdev);
+            if ( !pdev->info.is_link ) /* If we are not the 'fake device' */
+                pci_cleanup_msi(pdev);
+            if ( pdev->link ) {
+                /* Can be NULL if the other device was removed first. */
+                link = pdev->link;
+            }
             free_pdev(pseg, pdev);
             printk(XENLOG_DEBUG "PCI remove device %04x:%02x:%02x.%u\n",
                    seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn));
+            if ( link )
+            {
+                bus = link->bus;
+                devfn = link->devfn;
+                link->link = NULL;
+                goto retry;
+            }
             break;
         }
 
@@ -838,7 +909,7 @@ static int __init _scan_pci_devices(struct pci_seg *pseg, void *arg)
                     continue;
                 }
 
-                pdev = alloc_pdev(pseg, bus, PCI_DEVFN(dev, func));
+                pdev = alloc_pdev(pseg, bus, PCI_DEVFN(dev, func), 0, 0, 0);
                 if ( !pdev )
                 {
                     printk("%s: alloc_pdev failed.\n", __func__);
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index 5f10034..a5a4664 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1468,6 +1468,25 @@ static int domain_context_mapping(
         if ( ret )
             break;
 
+        if ( pdev->link )
+        {
+            u8 bus_link, devfn_link;
+            struct pci_dev *link_dev = pdev->link;
+
+            ASSERT ( link_dev );
+
+            bus_link = link_dev->bus;
+            devfn_link = link_dev->devfn;
+
+            if ( iommu_verbose )
+                dprintk(VTDPREFIX, "d%d:PCI: map %04x:%02x:%02x.%u\n",
+                        domain->domain_id, seg, bus_link,
+                        PCI_SLOT(devfn_link), PCI_FUNC(devfn_link));
+
+
+            ret = domain_context_mapping_one(domain, drhd->iommu, bus_link, devfn_link,
+                                             pci_get_pdev(seg, bus, devfn));
+        }
         if ( find_upstream_bridge(seg, &bus, &devfn, &secbus) < 1 )
             break;
 
@@ -1603,6 +1622,18 @@ static int domain_context_unmap(
         if ( ret )
             break;
 
+        if ( pdev->link )
+        {
+            struct pci_dev *link = pdev->link;
+
+            ASSERT(link->link == pdev);
+            tmp_bus = link->bus;
+            tmp_devfn = link->devfn;
+            if ( iommu_verbose )
+                dprintk(VTDPREFIX, "d%d:PCI: unmap %04x:%02x:%02x.%u\n",
+                        domain->domain_id, seg, tmp_bus, PCI_SLOT(tmp_devfn), PCI_FUNC(tmp_devfn));
+            ret = domain_context_unmap_one(domain, iommu, tmp_bus, tmp_devfn);
+        }
         tmp_bus = bus;
         tmp_devfn = devfn;
         if ( find_upstream_bridge(seg, &tmp_bus, &tmp_devfn, &secbus) < 1 )
diff --git a/xen/include/public/physdev.h b/xen/include/public/physdev.h
index d547928..c476175 100644
--- a/xen/include/public/physdev.h
+++ b/xen/include/public/physdev.h
@@ -281,6 +281,7 @@ DEFINE_XEN_GUEST_HANDLE(physdev_pci_mmcfg_reserved_t);
 #define XEN_PCI_DEV_EXTFN              0x1
 #define XEN_PCI_DEV_VIRTFN             0x2
 #define XEN_PCI_DEV_PXM                0x4
+#define XEN_PCI_DEV_LINK               0x8
 
 #define PHYSDEVOP_pci_device_add        25
 struct physdev_pci_device_add {
diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
index cadb525..b883c28 100644
--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -39,6 +39,7 @@ struct pci_dev_info {
         u8 bus;
         u8 devfn;
     } physfn;
+    bool_t is_link;
 };
 
 struct pci_dev {
@@ -75,6 +76,8 @@ struct pci_dev {
 #define PT_FAULT_THRESHOLD 10
     } fault;
     u64 vf_rlen[6];
+
+    struct pci_dev *link;
 };
 
 #define for_each_pdev(domain, pdev) \
-- 
1.8.5.3


[-- Attachment #3: hack.c --]
[-- Type: text/plain, Size: 2052 bytes --]


#include <linux/module.h>
#include <linux/string.h>
#include <linux/types.h>
#include <linux/init.h>
#include <linux/stat.h>
#include <linux/err.h>
#include <linux/ctype.h>
#include <linux/slab.h>
#include <linux/limits.h>
#include <linux/device.h>
#include <linux/pci.h>

#include <xen/interface/xen.h>
#include <xen/interface/physdev.h>

#include <asm/xen/hypervisor.h>
#include <asm/xen/hypercall.h>

#define LSI_HACK  "0.1"

MODULE_AUTHOR("Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>");
MODULE_DESCRIPTION("lsi hack");
MODULE_LICENSE("GPL");
MODULE_VERSION(LSI_HACK);

/* Want to link to the device passed in the guest */
int bus = 8;
module_param(bus, int, 0644);
MODULE_PARM_DESC(bus, "bus");
int slot = 3;
module_param(slot, int, 0644);
MODULE_PARM_DESC(slot, "slot");
int func = 0;
module_param(func, int, 0644);
MODULE_PARM_DESC(func, "func");

#define XEN_PCI_DEV_LINK 0x8
static int __init lsi_hack_init(void)
{
        int r = 0;

        struct physdev_pci_device_add add = {
                .seg          = 0,
                .bus          = 0x8,             /* The phantom bridge */
                .devfn        = PCI_DEVFN(0, 0), /* And its slot and function */
                .physfn.bus   = bus,             /* The device we want to link to. */
                .physfn.devfn = PCI_DEVFN(slot, func),
                .flags        = XEN_PCI_DEV_LINK,
        };

        printk(KERN_INFO "%s: %02x:%02x.%u, %02x:%02x.%u, %x\n",
               __func__, add.bus, PCI_SLOT(add.devfn),
               PCI_FUNC(add.devfn), add.physfn.bus,
               PCI_SLOT(add.physfn.devfn), PCI_FUNC(add.physfn.devfn),
               add.flags);
        r = HYPERVISOR_physdev_op(PHYSDEVOP_pci_device_add, &add);

        return r;
}

static void __exit lsi_hack_exit(void)
{
        int r;
        struct physdev_manage_pci manage_pci = {
                .bus   = 0x8,             /* The phantom bridge added in lsi_hack_init() */
                .devfn = PCI_DEVFN(0, 0),
        };

        r = HYPERVISOR_physdev_op(PHYSDEVOP_manage_pci_remove,
                                  &manage_pci);
        if (r)
                printk(KERN_ERR "%s: %d\n", __func__, r);
}

module_init(lsi_hack_init);
module_exit(lsi_hack_exit);

[-- Attachment #4: Makefile --]
[-- Type: text/plain, Size: 700 bytes --]

# Comment/uncomment the following line to disable/enable debugging
#DEBUG = y

# Add your debugging flag (or not) to CFLAGS
ifeq ($(DEBUG),y)
  DEBFLAGS = -O -g # "-O" is needed to expand inlines
else
  DEBFLAGS = -O2
endif

EXTRA_CFLAGS += $(DEBFLAGS) -I$(LDDINCDIR)

ifneq ($(KERNELRELEASE),)
# call from kernel build system

obj-m   := hack.o

else

KERNELDIR ?= /lib/modules/$(shell uname -r)/build
PWD       := $(shell pwd)

default:
	$(MAKE) -C $(KERNELDIR) M=$(PWD) LDDINCDIR=$(PWD)/../include modules

endif

clean:
	rm -rf *.o *~ core .depend .*.cmd *.ko *.mod.c .tmp_versions

depend .depend dep:
	$(CC) $(CFLAGS) -M *.c > .depend


ifeq (.depend,$(wildcard .depend))
include .depend
endif

[-- Attachment #5: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* Re: Dealing with non-existent BDF devices in VT-d and in the hardware.
  2014-03-11 17:30 Konrad Rzeszutek Wilk
@ 2014-03-11 17:36 ` Andrew Cooper
  2014-03-11 17:49   ` Konrad Rzeszutek Wilk
  2014-03-12  9:17 ` Jan Beulich
  1 sibling, 1 reply; 22+ messages in thread
From: Andrew Cooper @ 2014-03-11 17:36 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: gordan, xen-devel

On 11/03/14 17:30, Konrad Rzeszutek Wilk wrote:
> Hey,
>
> I am one of those lucky folks who had purchased a motherboard that has bugs.

You say this as if you expect someone has managed to find a bugfree
motherboard :)

>
> I figured I would post this email as way for a starting point
> for some discussion on this - and perhaps have a similar as 'pci-phantom'
> way of instructing the hypervisor what to do with them.
>
> The problem I am seeing is that this device:
>
> 08:03.0 FireWire (IEEE 1394): Texas Instruments TSB43AB22A IEEE-1394a-2000 Controller (PHY/Link) [iOHCI-Lynx]
>
> Can't be passed in the guest. Or rather it can - but everytime
> the guest (or domain0) tries to access I see:
>
> (XEN) [VT-D]iommu.c:885: iommu_fault_status: Fault Overflow
> (XEN) [VT-D]iommu.c:887: iommu_fault_status: Primary Pending Fault
> (XEN) [VT-D]iommu.c:865: DMAR:[DMA Write] Request device [0000:08:00.0] fault addr 0, iommu reg = ffff82c3ffd53000
> (XEN) DMAR:[fault reason 02h] Present bit in context entry is clear
> (XEN) print_vtd_entries: iommu ffff83043dca99b0 dev 0000:08:00.0 gmfn 0
> (XEN)     root_entry = ffff83043dc6b000
> (XEN)     root_entry[8] = 3326b5001
> (XEN)     context = ffff8303326b5000
> (XEN)     context[0] = 0_0
> (XEN)     ctxt_entry[0] not present
>
>
> Of course the '08:00.0' device does not exist. It is rather this chipset:
> 07:00.0 PCI bridge: Tundra Semiconductor Corp. Device 8113 (rev 01)
>
> that is buggy and using the wrong BDF when forwarding DMA requests from
> devices underneath it (like this Firewire chip).
>
> The hack I came up with was to create in the Xen code that deals with
> PCI passthrough a copy of the bridge (so 07:00.0) but with a new
> BDF: 08:00.0. And link it to the PCI device that I am passing to the
> guest (so 08:03.0).
>
> The end result is that when loading the driver (hack.c) one should
> see:
>
> (XEN) 0000:08:00.0 linked with 08:03.0
> (XEN) [VT-D]iommu.c:1456: d0:PCI: map 0000:08:00.0
> (XEN) [VT-D]iommu.c:1476: d0:PCI: map 0000:08:03.0
> (XEN) PCI add link 0000:08:00.0
>
> And when launching a guest with the BDF:
> pci = ["08:03.0"]
>
> the hypervisor will automatically also create an VT-d context for the
> 08:00.0 device.
>
> To use this hack, apply the 0001-xen-pci-Introduce-a-way-to-deal-with-buggy-hardware-.patch
> to your hypervisor, compile and install.
>
> And also compile the 'hack.c' module. There is an attached 'Makefile'
> that will do it for you. Make sure you edit it to set the right BDF
> entries in it.
>
> Once done install your new hypervisor, and insmod ./hack.ko and try
> passing in the device to your guest (or use it normally). The
> 'DMAR:[DMA Write]' error should go away.
>
> This should be generic enough for most devices. It needn't be a bridge
> that is spewing out these DMAR errors.


Do you have an lspci -tv for the system?

Is it genuinely the case that the bridge doesn't exist, or simply that
it is not correctly attributed in the DMAR table?

If the latter, Xen can probably gain some DMAR[$FOO]=$BAR command
line workarounds similar to the IVRS ones for AMD systems.

~Andrew


* Re: Dealing with non-existent BDF devices in VT-d and in the hardware.
  2014-03-11 17:36 ` Andrew Cooper
@ 2014-03-11 17:49   ` Konrad Rzeszutek Wilk
  2014-03-14  2:18     ` Zhang, Yang Z
  0 siblings, 1 reply; 22+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-03-11 17:49 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: gordan, xen-devel

On Tue, Mar 11, 2014 at 05:36:36PM +0000, Andrew Cooper wrote:
> On 11/03/14 17:30, Konrad Rzeszutek Wilk wrote:
> > Hey,
> >
> > I am one of those lucky folks who had purchased a motherboard that has bugs.
> 
> You say this as if you expect someone has managed to find a bugfree
> motherboard :)

One can dream :-)
> 
> >
> > I figured I would post this email as way for a starting point
> > for some discussion on this - and perhaps have a similar as 'pci-phantom'
> > way of instructing the hypervisor what to do with them.
> >
> > The problem I am seeing is that this device:
> >
> > 08:03.0 FireWire (IEEE 1394): Texas Instruments TSB43AB22A IEEE-1394a-2000 Controller (PHY/Link) [iOHCI-Lynx]
> >
> > Can't be passed in the guest. Or rather it can - but everytime
> > the guest (or domain0) tries to access I see:
> >
> > (XEN) [VT-D]iommu.c:885: iommu_fault_status: Fault Overflow
> > (XEN) [VT-D]iommu.c:887: iommu_fault_status: Primary Pending Fault
> > (XEN) [VT-D]iommu.c:865: DMAR:[DMA Write] Request device [0000:08:00.0] fault addr 0, iommu reg = ffff82c3ffd53000
> > (XEN) DMAR:[fault reason 02h] Present bit in context entry is clear
> > (XEN) print_vtd_entries: iommu ffff83043dca99b0 dev 0000:08:00.0 gmfn 0
> > (XEN)     root_entry = ffff83043dc6b000
> > (XEN)     root_entry[8] = 3326b5001
> > (XEN)     context = ffff8303326b5000
> > (XEN)     context[0] = 0_0
> > (XEN)     ctxt_entry[0] not present
> >
> >
> > Of course the '08:00.0' device does not exist. It is rather this chipset:
> > 07:00.0 PCI bridge: Tundra Semiconductor Corp. Device 8113 (rev 01)
> >
> > that is buggy and using the wrong BDF when forwarding DMA requests from
> > devices underneath it (like this Firewire chip).
> >
> > The hack I came up with was to create in the Xen code that deals with
> > PCI passthrough a copy of the bridge (so 07:00.0) but with a new
> > BDF: 08:00.0. And link it to the PCI device that I am passing to the
> > guest (so 08:03.0).
> >
> > The end result is that when loading the driver (hack.c) one should
> > see:
> >
> > (XEN) 0000:08:00.0 linked with 08:03.0
> > (XEN) [VT-D]iommu.c:1456: d0:PCI: map 0000:08:00.0
> > (XEN) [VT-D]iommu.c:1476: d0:PCI: map 0000:08:03.0
> > (XEN) PCI add link 0000:08:00.0
> >
> > And when launching a guest with the BDF:
> > pci = ["08:03.0"]
> >
> > the hypervisor will automatically also create an VT-d context for the
> > 08:00.0 device.
> >
> > To use this hack, apply the 0001-xen-pci-Introduce-a-way-to-deal-with-buggy-hardware-.patch
> > to your hypervisor, compile and install.
> >
> > And also compile the 'hack.c' module. There is an attached 'Makefile'
> > that will do it for you. Make sure you edit it to set the right BDF
> > entries in it.
> >
> > Once done install your new hypervisor, and insmod ./hack.ko and try
> > passing in the device to your guest (or use it normally). The
> > 'DMAR:[DMA Write]' error should go away.
> >
> > This should be generic enough for most devices. It needn't be a bridge
> > that is spewing out these DMAR errors.
> 
> 
> Do you have an lspci -tv for the system?

Yes of course:

-[0000:00]-+-00.0  Intel Corporation Xeon E3-1200 v3 Processor DRAM Controller
           +-01.0-[01]--+-00.0  Intel Corporation 82576 Gigabit Network Connection
           |            \-00.1  Intel Corporation 82576 Gigabit Network Connection
           +-01.1-[02]----00.0  LSI Logic / Symbios Logic SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon]
           +-02.0  Intel Corporation Xeon E3-1200 v3 Processor Integrated Graphics Controller
           +-03.0  Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor HD Audio Controller
           +-14.0  Intel Corporation 8 Series/C220 Series Chipset Family USB xHCI
           +-16.0  Intel Corporation 8 Series/C220 Series Chipset Family MEI Controller #1
           +-19.0  Intel Corporation Ethernet Connection I217-LM
           +-1a.0  Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #2
           +-1b.0  Intel Corporation 8 Series/C220 Series Chipset High Definition Audio Controller
           +-1c.0-[03]----00.0  Intel Corporation 82574L Gigabit Network Connection
           +-1c.1-[04]----00.0  Intel Corporation 82574L Gigabit Network Connection
           +-1c.3-[05]----00.0  Intel Corporation I210 Gigabit Network Connection
           +-1c.4-[06]--+-00.0  Intel Corporation 82571EB Gigabit Ethernet Controller
           |            \-00.1  Intel Corporation 82571EB Gigabit Ethernet Controller
           +-1c.5-[07-09]----00.0-[08-09]--+-01.0-[09]--+-08.0  Brooktree Corporation Bt878 Video Capture
           |                               |            +-08.1  Brooktree Corporation Bt878 Audio Capture
           |                               |            +-09.0  Brooktree Corporation Bt878 Video Capture
           |                               |            +-09.1  Brooktree Corporation Bt878 Audio Capture
           |                               |            +-0a.0  Brooktree Corporation Bt878 Video Capture
           |                               |            +-0a.1  Brooktree Corporation Bt878 Audio Capture
           |                               |            +-0b.0  Brooktree Corporation Bt878 Video Capture
           |                               |            \-0b.1  Brooktree Corporation Bt878 Audio Capture
           |                               \-03.0  Texas Instruments TSB43AB22A IEEE-1394a-2000 Controller (PHY/Link) [iOHCI-Lynx]
           +-1c.6-[0a]----00.0  Renesas Technology Corp. uPD720202 USB 3.0 Host Controller
           +-1c.7-[0b]----00.0  ASMedia Technology Inc. ASM1062 Serial ATA Controller
           +-1d.0  Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #1
           +-1f.0  Intel Corporation C226 Series Chipset Family Server Advanced SKU LPC Controller
           +-1f.2  Intel Corporation 8 Series/C220 Series Chipset Family 6-port SATA Controller 1 [AHCI mode]
           +-1f.3  Intel Corporation 8 Series/C220 Series Chipset Family SMBus Controller
           \-1f.6  Intel Corporation 8 Series Chipset Family Thermal Management Controller
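
Reading the tree above, the Tundra bridge at 07:00.0 forwards buses 08-09, so the phantom 08:00.0 sits on a bus that the real bridge itself covers. That is the situation the bus2bridge range check in the patch is guarding against. A minimal sketch of that containment test, with made-up names (toy_bridge, bridge_covers) and the bus numbers taken from the lspci output:

```c
#include <assert.h>
#include <stdint.h>

struct toy_bridge {
    uint8_t bus, devfn;        /* where the bridge itself sits */
    uint8_t sec_bus, sub_bus;  /* secondary and subordinate bus numbers */
};

/* Non-zero if 'bus' lies in the range the bridge forwards downstream. */
static inline int bridge_covers(const struct toy_bridge *br, uint8_t bus)
{
    assert(br->sec_bus <= br->sub_bus);
    return br->sec_bus <= bus && bus <= br->sub_bus;
}
```

For the bridge here (sec_bus 08, sub_bus 09) the check is true for buses 08 and 09 and false for 07, the bus the bridge itself lives on.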

> 
> It is genuinely the case that the bridge doesn't exist, or simply that
> it is not correctly attributed in the DMAR table?

It does not exist. The DMAR looks correct.

(XEN) [VT-D]dmar.c:778: Host address width 39
(XEN) [VT-D]dmar.c:792: found ACPI_DMAR_DRHD:
(XEN) [VT-D]dmar.c:472:   dmaru->address = fed90000
(XEN) [VT-D]iommu.c:1158: drhd->address = fed90000 iommu->reg = ffff82c3ffd54000
(XEN) [VT-D]iommu.c:1160: cap = c0000020660462 ecap = f0101a
(XEN) [VT-D]dmar.c:383:  endpoint: 0000:00:02.0
(XEN) [VT-D]dmar.c:792: found ACPI_DMAR_DRHD:
(XEN) [VT-D]dmar.c:472:   dmaru->address = fed91000
(XEN) [VT-D]iommu.c:1158: drhd->address = fed91000 iommu->reg = ffff82c3ffd53000
(XEN) [VT-D]iommu.c:1160: cap = d2008020660462 ecap = f010da
(XEN) [VT-D]dmar.c:397:  IOAPIC: 0000:f0:1f.0
(XEN) [VT-D]dmar.c:361:  MSI HPET: 0000:f0:0f.0
(XEN) [VT-D]dmar.c:486:   flags: INCLUDE_ALL
(XEN) [VT-D]dmar.c:797: found ACPI_DMAR_RMRR:
(XEN) [VT-D]dmar.c:383:  endpoint: 0000:00:1d.0
(XEN) [VT-D]dmar.c:383:  endpoint: 0000:00:1a.0
(XEN) [VT-D]dmar.c:383:  endpoint: 0000:00:14.0
(XEN) [VT-D]dmar.c:666:   RMRR region: base_addr b7530000 end_address b753cfff
(XEN) [VT-D]dmar.c:797: found ACPI_DMAR_RMRR:
(XEN) [VT-D]dmar.c:383:  endpoint: 0000:00:02.0
(XEN) [VT-D]dmar.c:666:   RMRR region: base_addr bc000000 end_address be1fffff

As it has the INCLUDE_ALL flag.
> 
> If the latter, it Xen can probably gain some DMAR[$FOO]=$BAR command
> line workarounds similar to the IVRS ones for AMD systems.
> 
> ~Andrew


* Re: Dealing with non-existent BDF devices in VT-d and in the hardware.
  2014-03-11 17:30 Konrad Rzeszutek Wilk
  2014-03-11 17:36 ` Andrew Cooper
@ 2014-03-12  9:17 ` Jan Beulich
  2014-03-12 14:22   ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 22+ messages in thread
From: Jan Beulich @ 2014-03-12  9:17 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel, gordan

>>> On 11.03.14 at 18:30, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> To use this hack, apply the 
> 0001-xen-pci-Introduce-a-way-to-deal-with-buggy-hardware-.patch
> to your hypervisor, compile and install.

I'm still rather hesitant to consider such a pretty involved
workaround for general inclusion. Did you investigate whether
leveraging the grouping functionality (iommu_get_device_group())
might be possible instead? We're talking about a legacy PCI bridge
after all, and if done that way also covering the AMD IOMMU case
might be more straightforward (after all that case is missing from
your already large patch).

While looking over this, I found that iommu_get_device_group() is only used by xend -
another xl deficiency? And only for checking purposes, rather
than to enforce the assignment of all (non-bridge?) devices in
the group...

Jan


* Re: Dealing with non-existent BDF devices in VT-d and in the hardware.
  2014-03-12  9:17 ` Jan Beulich
@ 2014-03-12 14:22   ` Konrad Rzeszutek Wilk
  2014-03-12 17:10     ` Gordan Bobic
  0 siblings, 1 reply; 22+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-03-12 14:22 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, gordan

On Wed, Mar 12, 2014 at 09:17:59AM +0000, Jan Beulich wrote:
> >>> On 11.03.14 at 18:30, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> > To use this hack, apply the 
> > 0001-xen-pci-Introduce-a-way-to-deal-with-buggy-hardware-.patch
> > to your hypervisor, compile and install.
> 
> I'm still rather hesitant to consider such a pretty involved
> workaround for general inclusion. Did you investigate whether
> leveraging the grouping functionality (iommu_get_device_group())
> might be possible instead? We're talking about a legacy PCI bridge
> after all, and if done that way also covering the AMD IOMMU case
> might be more straightforward (after all that case is missing from
> your already large patch).

<nods>

I think other people have experienced other non-bridge issues.
I have CC-ed Gordan on this as he had a LSI card that was misbehaving.

I am curious to see what his lspci and lspci -vt output looks like for his culprit.

> 
> While looking over this, I found that this only has a use in xend -
> another xl deficiency? And only for checking purposes, rather

Gosh.
> than to enforce the assignment of all (non-bridge?) devices in
> the group...
> 
> Jan
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Dealing with non-existent BDF devices in VT-d and in the hardware.
  2014-03-12 14:22   ` Konrad Rzeszutek Wilk
@ 2014-03-12 17:10     ` Gordan Bobic
  0 siblings, 0 replies; 22+ messages in thread
From: Gordan Bobic @ 2014-03-12 17:10 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, Jan Beulich; +Cc: xen-devel

On 03/12/2014 02:22 PM, Konrad Rzeszutek Wilk wrote:
> On Wed, Mar 12, 2014 at 09:17:59AM +0000, Jan Beulich wrote:
>>>>> On 11.03.14 at 18:30, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
>>> To use this hack, apply the
>>> 0001-xen-pci-Introduce-a-way-to-deal-with-buggy-hardware-.patch
>>> to your hypervisor, compile and install.
>>
>> I'm still rather hesitant to consider such a pretty involved
>> workaround for general inclusion. Did you investigate whether
>> leveraging the grouping functionality (iommu_get_device_group())
>> might be possible instead? We're talking about a legacy PCI bridge
>> after all, and if done that way also covering the AMD IOMMU case
>> might be more straightforward (after all that case is missing from
>> your already large patch).
>
> <nods>
>
> I think other people have experienced other non-bridge issues.
> I have CC-ed Gordan on this as he had a LSI card that was misbehaving.

Not one but two different LSI 8-port SAS cards, and an Adaptec 16-port 
SAS card. As far as I can tell this is related to the cards not being 
native PCIe but being bridged. The very latest LSI cards are native PCIe 
and don't seem to suffer from the same problem. I only did 5 minutes of 
testing with a borrowed recent LSI card, but the problem is very obvious 
(disks don't show up!), so I'm quite confident the problem doesn't 
manifest on the new one.

> I am curious to see what his lspci and lspci -vt looks for his culprit.

I'll try to put this together in a few days when I have the test machine 
set up.

>> While looking over this, I found that this only has a use in xend -
>> another xl deficiency? And only for checking purposes, rather
>
> Gosh.

That could make it rather difficult to test with latest 4.4.x code...

Gordan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Dealing with non-existent BDF devices in VT-d and in the hardware.
  2014-03-11 17:49   ` Konrad Rzeszutek Wilk
@ 2014-03-14  2:18     ` Zhang, Yang Z
  2014-03-14 17:51       ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 22+ messages in thread
From: Zhang, Yang Z @ 2014-03-14  2:18 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, Andrew Cooper
  Cc: gordan@bobich.net, xen-devel@lists.xensource.com

Konrad Rzeszutek Wilk wrote on 2014-03-12:
> On Tue, Mar 11, 2014 at 05:36:36PM +0000, Andrew Cooper wrote:
> > On 11/03/14 17:30, Konrad Rzeszutek Wilk wrote:
> > > Hey,
> > >
> > > I am one of those lucky folks who had purchased a motherboard that has
> bugs.
> >
> > You say this as if you expect someone has managed to find a bugfree
> > motherboard :)
> 
> One can dream :-)
> >
> > >
> > > I figured I would post this email as way for a starting point for
> > > some discussion on this - and perhaps have a similar as 'pci-phantom'
> > > way of instructing the hypervisor what to do with them.
> > >
> > > The problem I am seeing is that this device:
> > >
> > > 08:03.0 FireWire (IEEE 1394): Texas Instruments TSB43AB22A
> > > IEEE-1394a-2000 Controller (PHY/Link) [iOHCI-Lynx]
> > >
> > > Can't be passed in the guest. Or rather it can - but everytime the
> > > guest (or domain0) tries to access I see:
> > >
> > > (XEN) [VT-D]iommu.c:885: iommu_fault_status: Fault Overflow
> > > (XEN) [VT-D]iommu.c:887: iommu_fault_status: Primary Pending Fault
> > > (XEN) [VT-D]iommu.c:865: DMAR:[DMA Write] Request device
> > > [0000:08:00.0] fault addr 0, iommu reg = ffff82c3ffd53000
> > > (XEN) DMAR:[fault reason 02h] Present bit in context entry is clear
> > > (XEN) print_vtd_entries: iommu ffff83043dca99b0 dev 0000:08:00.0 gmfn 0
> > > (XEN)     root_entry = ffff83043dc6b000
> > > (XEN)     root_entry[8] = 3326b5001
> > > (XEN)     context = ffff8303326b5000
> > > (XEN)     context[0] = 0_0
> > > (XEN)     ctxt_entry[0] not present
> > >
> > >
> > > Of course the '08:00.0' device does not exist. It is rather this chipset:
> > > 07:00.0 PCI bridge: Tundra Semiconductor Corp. Device 8113 (rev 01)
> > >
> > > that is buggy and using the wrong BDF when forwarding DMA requests
> > > from devices underneath it (like this Firewire chip).
> > >
> > > The hack I came up with was to create in the Xen code that deals
> > > with PCI passthrough a copy of the bridge (so 07:00.0) but with a
> > > new
> > > BDF: 08:00.0. And link it to the PCI device that I am passing to the
> > > guest (so 08:03.0).
> > >
> > > The end result is that when loading the driver (hack.c) one should
> > > see:
> > >
> > > (XEN) 0000:08:00.0 linked with 08:03.0
> > > (XEN) [VT-D]iommu.c:1456: d0:PCI: map 0000:08:00.0
> > > (XEN) [VT-D]iommu.c:1476: d0:PCI: map 0000:08:03.0
> > > (XEN) PCI add link 0000:08:00.0
> > >
> > > And when launching a guest with the BDF:
> > > pci = ["08:03.0"]
> > >
> > > the hypervisor will automatically also create an VT-d context for
> > > the
> > > 08:00.0 device.
> > >
> > > To use this hack, apply the
> > > 0001-xen-pci-Introduce-a-way-to-deal-with-buggy-hardware-.patch
> > > to your hypervisor, compile and install.
> > >
> > > And also compile the 'hack.c' module. There is an attached 'Makefile'
> > > that will do it for you. Make sure you edit it to set the right BDF
> > > entries in it.
> > >
> > > Once done install your new hypervisor, and insmod ./hack.ko and try
> > > passing in the device to your guest (or use it normally). The
> > > 'DMAR:[DMA Write]' error should go away.
> > >
> > > This should be generic enough for most devices. It needn't be a
> > > bridge that is spewing out these DMAR errors.
> >
> >
> > Do you have an lspci -tv for the system?
> 
> Yes of course:
> 
> -[0000:00]-+-00.0  Intel Corporation Xeon E3-1200 v3 Processor DRAM
> Controller
>            +-01.0-[01]--+-00.0  Intel Corporation 82576 Gigabit Network
> Connection
>            |            \-00.1  Intel Corporation 82576 Gigabit Network
> Connection
>            +-01.1-[02]----00.0  LSI Logic / Symbios Logic SAS2008
> PCI-Express Fusion-MPT SAS-2 [Falcon]
>            +-02.0  Intel Corporation Xeon E3-1200 v3 Processor Integrated
> Graphics Controller
>            +-03.0  Intel Corporation Xeon E3-1200 v3/4th Gen Core
> Processor HD Audio Controller
>            +-14.0  Intel Corporation 8 Series/C220 Series Chipset Family
> USB xHCI
>            +-16.0  Intel Corporation 8 Series/C220 Series Chipset Family
> MEI Controller #1
>            +-19.0  Intel Corporation Ethernet Connection I217-LM
>            +-1a.0  Intel Corporation 8 Series/C220 Series Chipset Family
> USB EHCI #2
>            +-1b.0  Intel Corporation 8 Series/C220 Series Chipset High
> Definition Audio Controller
>            +-1c.0-[03]----00.0  Intel Corporation 82574L Gigabit Network
> Connection
>            +-1c.1-[04]----00.0  Intel Corporation 82574L Gigabit Network
> Connection
>            +-1c.3-[05]----00.0  Intel Corporation I210 Gigabit Network
> Connection
>            +-1c.4-[06]--+-00.0  Intel Corporation 82571EB Gigabit Ethernet
> Controller
>            |            \-00.1  Intel Corporation 82571EB Gigabit
> Ethernet Controller
>            +-1c.5-[07-09]----00.0-[08-09]--+-01.0-[09]--+-08.0  Brooktree
> Corporation Bt878 Video Capture
>            |                               |            +-08.1
> Brooktree Corporation Bt878 Audio Capture
>            |                               |            +-09.0
> Brooktree Corporation Bt878 Video Capture
>            |                               |            +-09.1
> Brooktree Corporation Bt878 Audio Capture
>            |                               |            +-0a.0
> Brooktree Corporation Bt878 Video Capture
>            |                               |            +-0a.1
> Brooktree Corporation Bt878 Audio Capture
>            |                               |            +-0b.0
> Brooktree Corporation Bt878 Video Capture
>            |                               |            \-0b.1
> Brooktree Corporation Bt878 Audio Capture
>            |                               \-03.0  Texas
> Instruments TSB43AB22A IEEE-1394a-2000 Controller (PHY/Link) [iOHCI-Lynx]
>            +-1c.6-[0a]----00.0  Renesas Technology Corp. uPD720202 USB
> 3.0 Host Controller
>            +-1c.7-[0b]----00.0  ASMedia Technology Inc. ASM1062 Serial ATA
> Controller
>            +-1d.0  Intel Corporation 8 Series/C220 Series Chipset Family
> USB EHCI #1
>            +-1f.0  Intel Corporation C226 Series Chipset Family Server
> Advanced SKU LPC Controller
>            +-1f.2  Intel Corporation 8 Series/C220 Series Chipset Family
> 6-port SATA Controller 1 [AHCI mode]
>            +-1f.3  Intel Corporation 8 Series/C220 Series Chipset Family
> SMBus Controller
>            \-1f.6  Intel Corporation 8 Series Chipset Family Thermal
> Management Controller
> 

What happens if you assign the devices under bus 09 to another guest?
Would it be better to add a Xen command line option that puts such
devices into a group, and then assign the whole group to a guest
whenever one device of the group is assigned to it?

> >
> > It is genuinely the case that the bridge doesn't exist, or simply that
> > it is not correctly attributed in the DMAR table?
> 
> It does not exist. The DMAR looks correct.
> 
> (XEN) [VT-D]dmar.c:778: Host address width 39
> (XEN) [VT-D]dmar.c:792: found ACPI_DMAR_DRHD:
> (XEN) [VT-D]dmar.c:472:   dmaru->address = fed90000
> (XEN) [VT-D]iommu.c:1158: drhd->address = fed90000 iommu->reg =
> ffff82c3ffd54000
> (XEN) [VT-D]iommu.c:1160: cap = c0000020660462 ecap = f0101a
> (XEN) [VT-D]dmar.c:383:  endpoint: 0000:00:02.0
> (XEN) [VT-D]dmar.c:792: found ACPI_DMAR_DRHD:
> (XEN) [VT-D]dmar.c:472:   dmaru->address = fed91000
> (XEN) [VT-D]iommu.c:1158: drhd->address = fed91000 iommu->reg =
> ffff82c3ffd53000
> (XEN) [VT-D]iommu.c:1160: cap = d2008020660462 ecap = f010da
> (XEN) [VT-D]dmar.c:397:  IOAPIC: 0000:f0:1f.0
> (XEN) [VT-D]dmar.c:361:  MSI HPET: 0000:f0:0f.0
> (XEN) [VT-D]dmar.c:486:   flags: INCLUDE_ALL
> (XEN) [VT-D]dmar.c:797: found ACPI_DMAR_RMRR:
> (XEN) [VT-D]dmar.c:383:  endpoint: 0000:00:1d.0
> (XEN) [VT-D]dmar.c:383:  endpoint: 0000:00:1a.0
> (XEN) [VT-D]dmar.c:383:  endpoint: 0000:00:14.0
> (XEN) [VT-D]dmar.c:666:   RMRR region: base_addr b7530000 end_address
> b753cfff
> (XEN) [VT-D]dmar.c:797: found ACPI_DMAR_RMRR:
> (XEN) [VT-D]dmar.c:383:  endpoint: 0000:00:02.0
> (XEN) [VT-D]dmar.c:666:   RMRR region: base_addr bc000000 end_address
> be1fffff
> 
> As it has the INCLUDE_ALL flag.
> >
> > If the latter, it Xen can probably gain some DMAR[$FOO]=$BAR command
> > line workarounds similar to the IVRS ones for AMD systems.



> >
> > ~Andrew
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel


Best regards,
Yang

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Dealing with non-existent BDF devices in VT-d and in the hardware.
  2014-03-14  2:18     ` Zhang, Yang Z
@ 2014-03-14 17:51       ` Konrad Rzeszutek Wilk
  2014-03-17  1:03         ` Zhang, Yang Z
  0 siblings, 1 reply; 22+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-03-14 17:51 UTC (permalink / raw)
  To: Zhang, Yang Z
  Cc: Andrew Cooper, gordan@bobich.net, xen-devel@lists.xensource.com

On Fri, Mar 14, 2014 at 02:18:52AM +0000, Zhang, Yang Z wrote:
> Konrad Rzeszutek Wilk wrote on 2014-03-12:
> > On Tue, Mar 11, 2014 at 05:36:36PM +0000, Andrew Cooper wrote:
> > > On 11/03/14 17:30, Konrad Rzeszutek Wilk wrote:
> > > > Hey,
> > > >
> > > > I am one of those lucky folks who had purchased a motherboard that has
> > bugs.
> > >
> > > You say this as if you expect someone has managed to find a bugfree
> > > motherboard :)
> > 
> > One can dream :-)
> > >
> > > >
> > > > I figured I would post this email as way for a starting point for
> > > > some discussion on this - and perhaps have a similar as 'pci-phantom'
> > > > way of instructing the hypervisor what to do with them.
> > > >
> > > > The problem I am seeing is that this device:
> > > >
> > > > 08:03.0 FireWire (IEEE 1394): Texas Instruments TSB43AB22A
> > > > IEEE-1394a-2000 Controller (PHY/Link) [iOHCI-Lynx]
> > > >
> > > > Can't be passed in the guest. Or rather it can - but everytime the
> > > > guest (or domain0) tries to access I see:
> > > >
> > > > (XEN) [VT-D]iommu.c:885: iommu_fault_status: Fault Overflow
> > > > (XEN) [VT-D]iommu.c:887: iommu_fault_status: Primary Pending Fault
> > > > (XEN) [VT-D]iommu.c:865: DMAR:[DMA Write] Request device
> > > > [0000:08:00.0] fault addr 0, iommu reg = ffff82c3ffd53000
> > > > (XEN) DMAR:[fault reason 02h] Present bit in context entry is clear
> > > > (XEN) print_vtd_entries: iommu ffff83043dca99b0 dev 0000:08:00.0 gmfn 0
> > > > (XEN)     root_entry = ffff83043dc6b000
> > > > (XEN)     root_entry[8] = 3326b5001
> > > > (XEN)     context = ffff8303326b5000
> > > > (XEN)     context[0] = 0_0
> > > > (XEN)     ctxt_entry[0] not present
> > > >
> > > >
> > > > Of course the '08:00.0' device does not exist. It is rather this chipset:
> > > > 07:00.0 PCI bridge: Tundra Semiconductor Corp. Device 8113 (rev 01)
> > > >
> > > > that is buggy and using the wrong BDF when forwarding DMA requests
> > > > from devices underneath it (like this Firewire chip).
> > > >
> > > > The hack I came up with was to create in the Xen code that deals
> > > > with PCI passthrough a copy of the bridge (so 07:00.0) but with a
> > > > new
> > > > BDF: 08:00.0. And link it to the PCI device that I am passing to the
> > > > guest (so 08:03.0).
> > > >
> > > > The end result is that when loading the driver (hack.c) one should
> > > > see:
> > > >
> > > > (XEN) 0000:08:00.0 linked with 08:03.0
> > > > (XEN) [VT-D]iommu.c:1456: d0:PCI: map 0000:08:00.0
> > > > (XEN) [VT-D]iommu.c:1476: d0:PCI: map 0000:08:03.0
> > > > (XEN) PCI add link 0000:08:00.0
> > > >
> > > > And when launching a guest with the BDF:
> > > > pci = ["08:03.0"]
> > > >
> > > > the hypervisor will automatically also create an VT-d context for
> > > > the
> > > > 08:00.0 device.
> > > >
> > > > To use this hack, apply the
> > > > 0001-xen-pci-Introduce-a-way-to-deal-with-buggy-hardware-.patch
> > > > to your hypervisor, compile and install.
> > > >
> > > > And also compile the 'hack.c' module. There is an attached 'Makefile'
> > > > that will do it for you. Make sure you edit it to set the right BDF
> > > > entries in it.
> > > >
> > > > Once done install your new hypervisor, and insmod ./hack.ko and try
> > > > passing in the device to your guest (or use it normally). The
> > > > 'DMAR:[DMA Write]' error should go away.
> > > >
> > > > This should be generic enough for most devices. It needn't be a
> > > > bridge that is spewing out these DMAR errors.
> > >
> > >
> > > Do you have an lspci -tv for the system?
> > 
> > Yes of course:
> > 
> > -[0000:00]-+-00.0  Intel Corporation Xeon E3-1200 v3 Processor DRAM
> > Controller
> >            +-01.0-[01]--+-00.0  Intel Corporation 82576 Gigabit Network
> > Connection
> >            |            \-00.1  Intel Corporation 82576 Gigabit Network
> > Connection
> >            +-01.1-[02]----00.0  LSI Logic / Symbios Logic SAS2008
> > PCI-Express Fusion-MPT SAS-2 [Falcon]
> >            +-02.0  Intel Corporation Xeon E3-1200 v3 Processor Integrated
> > Graphics Controller
> >            +-03.0  Intel Corporation Xeon E3-1200 v3/4th Gen Core
> > Processor HD Audio Controller
> >            +-14.0  Intel Corporation 8 Series/C220 Series Chipset Family
> > USB xHCI
> >            +-16.0  Intel Corporation 8 Series/C220 Series Chipset Family
> > MEI Controller #1
> >            +-19.0  Intel Corporation Ethernet Connection I217-LM
> >            +-1a.0  Intel Corporation 8 Series/C220 Series Chipset Family
> > USB EHCI #2
> >            +-1b.0  Intel Corporation 8 Series/C220 Series Chipset High
> > Definition Audio Controller
> >            +-1c.0-[03]----00.0  Intel Corporation 82574L Gigabit Network
> > Connection
> >            +-1c.1-[04]----00.0  Intel Corporation 82574L Gigabit Network
> > Connection
> >            +-1c.3-[05]----00.0  Intel Corporation I210 Gigabit Network
> > Connection
> >            +-1c.4-[06]--+-00.0  Intel Corporation 82571EB Gigabit Ethernet
> > Controller
> >            |            \-00.1  Intel Corporation 82571EB Gigabit
> > Ethernet Controller
> >            +-1c.5-[07-09]----00.0-[08-09]--+-01.0-[09]--+-08.0  Brooktree
> > Corporation Bt878 Video Capture
> >            |                               |            +-08.1
> > Brooktree Corporation Bt878 Audio Capture
> >            |                               |            +-09.0
> > Brooktree Corporation Bt878 Video Capture
> >            |                               |            +-09.1
> > Brooktree Corporation Bt878 Audio Capture
> >            |                               |            +-0a.0
> > Brooktree Corporation Bt878 Video Capture
> >            |                               |            +-0a.1
> > Brooktree Corporation Bt878 Audio Capture
> >            |                               |            +-0b.0
> > Brooktree Corporation Bt878 Video Capture
> >            |                               |            \-0b.1
> > Brooktree Corporation Bt878 Audio Capture
> >            |                               \-03.0  Texas
> > Instruments TSB43AB22A IEEE-1394a-2000 Controller (PHY/Link) [iOHCI-Lynx]
> >            +-1c.6-[0a]----00.0  Renesas Technology Corp. uPD720202 USB
> > 3.0 Host Controller
> >            +-1c.7-[0b]----00.0  ASMedia Technology Inc. ASM1062 Serial ATA
> > Controller
> >            +-1d.0  Intel Corporation 8 Series/C220 Series Chipset Family
> > USB EHCI #1
> >            +-1f.0  Intel Corporation C226 Series Chipset Family Server
> > Advanced SKU LPC Controller
> >            +-1f.2  Intel Corporation 8 Series/C220 Series Chipset Family
> > 6-port SATA Controller 1 [AHCI mode]
> >            +-1f.3  Intel Corporation 8 Series/C220 Series Chipset Family
> > SMBus Controller
> >            \-1f.6  Intel Corporation 8 Series Chipset Family Thermal
> > Management Controller
> > 
> 
> What happens if you assign the devices under bus 09 to another guest?

Hadn't tried that. I think it would all blow up, as the non-existent
bridge is now assigned to one guest and the phantom DMA requests from
bus 09 would show up under the 08:00.0 device. I think I would corrupt
the guest memory with random DMA writes.
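
To make that concrete, here is a toy model of the lookup, keyed on the
requester ID carried by the DMA request. It is a sketch in Python, not
Xen code: the ContextTables class and the guest names are made up, and
only the bus/devfn indexing mirrors the root_entry[8] / context[0]
lines in the fault log. That bus 09 traffic gets tagged as 08:00.0 is
an assumption; I have not tested it.

```python
# Toy model of VT-d context lookup keyed by requester ID (bus, devfn).
# Not Xen code: ContextTables and the domain names are illustrative only.

def parse_bdf(bdf):
    """Split 'bb:dd.f' into (bus, devfn), with devfn = dev << 3 | fn."""
    bus, rest = bdf.split(":")
    dev, fn = rest.split(".")
    return int(bus, 16), (int(dev, 16) << 3) | int(fn, 16)


class ContextTables:
    def __init__(self):
        # root[bus][devfn] -> owning domain; a missing entry means "not present".
        self.root = {}

    def map(self, bdf, domain):
        bus, devfn = parse_bdf(bdf)
        self.root.setdefault(bus, {})[devfn] = domain

    def lookup(self, requester_bdf):
        """Return the domain whose page tables a DMA request would use."""
        bus, devfn = parse_bdf(requester_bdf)
        return self.root.get(bus, {}).get(devfn)  # None models fault reason 02h


tables = ContextTables()
tables.map("08:03.0", "guestA")

# The bridge tags the FireWire DMA as 08:00.0, which has no context entry.
fault = tables.lookup("08:00.0")

# With the fake device in place, 08:00.0 resolves to guestA ...
tables.map("08:00.0", "guestA")
tables.map("09:08.0", "guestB")
# ... and so would DMA from a bus 09 device owned by guestB, if the bridge
# tags it with the same 08:00.0 requester ID (an assumption, untested).
aliased = tables.lookup("08:00.0")
```

In this model the second lookup lands in guestA's tables even though
the device doing the DMA belongs to guestB, which is the corruption
scenario described above.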

> Is it better to add Xen command line to add such devices to a group and assign the whole group to a guest when trying to assign a device of the group to guest?

Or implement the group assignment in QEMU or libxl so that nobody
tries doing it.
> 
> > >
> > > It is genuinely the case that the bridge doesn't exist, or simply that
> > > it is not correctly attributed in the DMAR table?
> > 
> > It does not exist. The DMAR looks correct.
> > 
> > (XEN) [VT-D]dmar.c:778: Host address width 39
> > (XEN) [VT-D]dmar.c:792: found ACPI_DMAR_DRHD:
> > (XEN) [VT-D]dmar.c:472:   dmaru->address = fed90000
> > (XEN) [VT-D]iommu.c:1158: drhd->address = fed90000 iommu->reg =
> > ffff82c3ffd54000
> > (XEN) [VT-D]iommu.c:1160: cap = c0000020660462 ecap = f0101a
> > (XEN) [VT-D]dmar.c:383:  endpoint: 0000:00:02.0
> > (XEN) [VT-D]dmar.c:792: found ACPI_DMAR_DRHD:
> > (XEN) [VT-D]dmar.c:472:   dmaru->address = fed91000
> > (XEN) [VT-D]iommu.c:1158: drhd->address = fed91000 iommu->reg =
> > ffff82c3ffd53000
> > (XEN) [VT-D]iommu.c:1160: cap = d2008020660462 ecap = f010da
> > (XEN) [VT-D]dmar.c:397:  IOAPIC: 0000:f0:1f.0
> > (XEN) [VT-D]dmar.c:361:  MSI HPET: 0000:f0:0f.0
> > (XEN) [VT-D]dmar.c:486:   flags: INCLUDE_ALL
> > (XEN) [VT-D]dmar.c:797: found ACPI_DMAR_RMRR:
> > (XEN) [VT-D]dmar.c:383:  endpoint: 0000:00:1d.0
> > (XEN) [VT-D]dmar.c:383:  endpoint: 0000:00:1a.0
> > (XEN) [VT-D]dmar.c:383:  endpoint: 0000:00:14.0
> > (XEN) [VT-D]dmar.c:666:   RMRR region: base_addr b7530000 end_address
> > b753cfff
> > (XEN) [VT-D]dmar.c:797: found ACPI_DMAR_RMRR:
> > (XEN) [VT-D]dmar.c:383:  endpoint: 0000:00:02.0
> > (XEN) [VT-D]dmar.c:666:   RMRR region: base_addr bc000000 end_address
> > be1fffff
> > 
> > As it has the INCLUDE_ALL flag.
> > >
> > > If the latter, it Xen can probably gain some DMAR[$FOO]=$BAR command
> > > line workarounds similar to the IVRS ones for AMD systems.
> 
> 
> 
> > >
> > > ~Andrew
> 
> 
> Best regards,
> Yang
> 
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Dealing with non-existent BDF devices in VT-d and in the hardware.
  2014-03-14 17:51       ` Konrad Rzeszutek Wilk
@ 2014-03-17  1:03         ` Zhang, Yang Z
  2014-03-17 20:00           ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 22+ messages in thread
From: Zhang, Yang Z @ 2014-03-17  1:03 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Andrew Cooper, gordan@bobich.net, xen-devel@lists.xensource.com

Konrad Rzeszutek Wilk wrote on 2014-03-15:
>> 
>> What happens if you assign the devices under bus 09 to another guest?
> 
> Hadn't tried that. I think it would all blow up as the the
> non-existent bridge is now assigned to one guest and the phantom DMA
> requests for the
> 09 would show up under the 08 device. I think I would corrupt the
> guest memory with random DMA writes.
> 
>> Is it better to add Xen command line to add such devices to a group
>> and
> assign the whole group to a guest when trying to assign a device of
> the group to guest?
> 
> Or implement the group assigment in QEMU or libxl so that nobody tries
> doing it.

But I think the user would still need to say manually which device is
buggy; I don't think QEMU or libxl can work that out.

Best regards,
Yang

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Dealing with non-existent BDF devices in VT-d and in the hardware.
  2014-03-17  1:03         ` Zhang, Yang Z
@ 2014-03-17 20:00           ` Konrad Rzeszutek Wilk
  2014-03-19  0:32             ` Zhang, Yang Z
  0 siblings, 1 reply; 22+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-03-17 20:00 UTC (permalink / raw)
  To: Zhang, Yang Z
  Cc: Andrew Cooper, gordan@bobich.net, xen-devel@lists.xensource.com

On Mon, Mar 17, 2014 at 01:03:00AM +0000, Zhang, Yang Z wrote:
> Konrad Rzeszutek Wilk wrote on 2014-03-15:
> >> 
> >> What happens if you assign the devices under bus 09 to another guest?
> > 
> > Hadn't tried that. I think it would all blow up as the the
> > non-existent bridge is now assigned to one guest and the phantom DMA
> > requests for the
> > 09 would show up under the 08 device. I think I would corrupt the
> > guest memory with random DMA writes.
> > 
> >> Is it better to add Xen command line to add such devices to a group
> >> and
> > assign the whole group to a guest when trying to assign a device of
> > the group to guest?
> > 
> > Or implement the group assigment in QEMU or libxl so that nobody tries
> > doing it.
> 
> But I think user still need to tell which device is buggy manually and I don't think QEMU or libxl can do it.

I think there are two issues here: 

a) Missing device assignment via groups. That should be done regardless
   of whether the device / hardware is buggy.

b) Buggy devices like the IDT bridge that I see. That is a separate issue - and
   we are just discussing what the mechanism would be if we want to inject
   that in the VT-d (or AMD-Vi) code.

> 
> Best regards,
> Yang
> 
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Dealing with non-existent BDF devices in VT-d and in the hardware.
  2014-03-17 20:00           ` Konrad Rzeszutek Wilk
@ 2014-03-19  0:32             ` Zhang, Yang Z
  2014-03-19 12:57               ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 22+ messages in thread
From: Zhang, Yang Z @ 2014-03-19  0:32 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Andrew Cooper, gordan@bobich.net, xen-devel@lists.xensource.com

Konrad Rzeszutek Wilk wrote on 2014-03-18:
> On Mon, Mar 17, 2014 at 01:03:00AM +0000, Zhang, Yang Z wrote:
>> Konrad Rzeszutek Wilk wrote on 2014-03-15:
>>>> 
>>>> What happens if you assign the devices under bus 09 to another guest?
>>> 
>>> Hadn't tried that. I think it would all blow up as the the
>>> non-existent bridge is now assigned to one guest and the phantom
>>> DMA requests for the
>>> 09 would show up under the 08 device. I think I would corrupt the
>>> guest memory with random DMA writes.
>>> 
>>>> Is it better to add Xen command line to add such devices to a
>>>> group and
>>> assign the whole group to a guest when trying to assign a device
>>> of the group to guest?
>>> 
>>> Or implement the group assigment in QEMU or libxl so that nobody
>>> tries doing it.
>> 
>> But I think user still need to tell which device is buggy manually
>> and I don't
> think QEMU or libxl can do it.
> 
> I think there are two issues here:
> 
> a) Missing device assigments via groups. That should be done irregardless
>    if the device / hardware is buggy.
>

Yes, this is missing.

> b) Buggy devices like the IDT bridge that I see. That is a seperate issue - and
>    we just discussion if we want to inject that in the VT-d (or AMD-VI) what
>    would be the mechanism to do that.

The problem is that device 08:00.0 doesn't exist on your platform; you only saw its BDF in the DMA transaction. How can you add a non-existent device to a group?

>> 
>> Best regards,
>> Yang
>> 
>>


Best regards,
Yang

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Dealing with non-existent BDF devices in VT-d and in the hardware.
  2014-03-19  0:32             ` Zhang, Yang Z
@ 2014-03-19 12:57               ` Konrad Rzeszutek Wilk
  2014-03-19 14:24                 ` Jan Beulich
  0 siblings, 1 reply; 22+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-03-19 12:57 UTC (permalink / raw)
  To: Zhang, Yang Z
  Cc: Andrew Cooper, gordan@bobich.net, xen-devel@lists.xensource.com

On Wed, Mar 19, 2014 at 12:32:31AM +0000, Zhang, Yang Z wrote:
> Konrad Rzeszutek Wilk wrote on 2014-03-18:
> > On Mon, Mar 17, 2014 at 01:03:00AM +0000, Zhang, Yang Z wrote:
> >> Konrad Rzeszutek Wilk wrote on 2014-03-15:
> >>>> 
> >>>> What happens if you assign the devices under bus 09 to another guest?
> >>> 
> >>> Hadn't tried that. I think it would all blow up as the the
> >>> non-existent bridge is now assigned to one guest and the phantom
> >>> DMA requests for the
> >>> 09 would show up under the 08 device. I think I would corrupt the
> >>> guest memory with random DMA writes.
> >>> 
> >>>> Is it better to add Xen command line to add such devices to a
> >>>> group and
> >>> assign the whole group to a guest when trying to assign a device
> >>> of the group to guest?
> >>> 
> >>> Or implement the group assigment in QEMU or libxl so that nobody
> >>> tries doing it.
> >> 
> >> But I think user still need to tell which device is buggy manually
> >> and I don't
> > think QEMU or libxl can do it.
> > 
> > I think there are two issues here:
> > 
> > a) Missing device assigments via groups. That should be done irregardless
> >    if the device / hardware is buggy.
> >
> 
> Yes, this is missing.
> 
> > b) Buggy devices like the IDT bridge that I see. That is a seperate issue - and
> >    we just discussion if we want to inject that in the VT-d (or AMD-VI) what
> >    would be the mechanism to do that.
> 
> The question is that device 08:00.0 doesn't exist in your platform, you only saw the BDF in the DMA transaction. How can you add a non-exist device to a group? 

Why do I need to add it to a group? The patch I posted (see first email in this thread)
just made a fake PCI device in the Xen hypervisor. But I don't see libxl nor
QEMU doing any group operations - so why are they required? If I just bundle
all of the PCI devices underneath that bridge to the guest it should be OK, shouldn't it?
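
For this box, bundling would mean listing every non-bridge function
below 07:00.0 in the guest config. A rough sketch in Python of what
that list looks like; the topology is typed in by hand from the
lspci -tv output earlier in the thread, and endpoints_below() is a
made-up helper, not something libxl or QEMU provides:

```python
# Hand-copied from the lspci -tv output earlier in the thread:
# bus -> list of (dev.fn, secondary bus or None for an endpoint).
TOPOLOGY = {
    0x07: [("00.0", 0x08)],
    0x08: [("01.0", 0x09), ("03.0", None)],
    0x09: [("08.0", None), ("08.1", None), ("09.0", None), ("09.1", None),
           ("0a.0", None), ("0a.1", None), ("0b.0", None), ("0b.1", None)],
}


def endpoints_below(bus):
    """Collect every non-bridge function reachable from `bus` (hypothetical helper)."""
    found = []
    for devfn, secondary in TOPOLOGY.get(bus, []):
        if secondary is None:
            found.append("%02x:%s" % (bus, devfn))
        else:
            found.extend(endpoints_below(secondary))
    return found


devices = endpoints_below(0x07)
config_line = "pci = [%s]" % ", ".join('"%s"' % d for d in devices)
print(config_line)
```

That gives the eight Bt878 functions on bus 09 plus the FireWire
controller at 08:03.0, nine entries in total.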

> 
> >> 
> >> Best regards,
> >> Yang
> >> 
> >>
> 
> 
> Best regards,
> Yang
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Dealing with non-existent BDF devices in VT-d and in the hardware.
  2014-03-19 12:57               ` Konrad Rzeszutek Wilk
@ 2014-03-19 14:24                 ` Jan Beulich
  2014-03-20  0:48                   ` Zhang, Yang Z
  0 siblings, 1 reply; 22+ messages in thread
From: Jan Beulich @ 2014-03-19 14:24 UTC (permalink / raw)
  To: Yang Z Zhang, Konrad Rzeszutek Wilk
  Cc: Andrew Cooper, gordan@bobich.net, xen-devel@lists.xensource.com

>>> On 19.03.14 at 13:57, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> On Wed, Mar 19, 2014 at 12:32:31AM +0000, Zhang, Yang Z wrote:
>> Konrad Rzeszutek Wilk wrote on 2014-03-18:
>> > On Mon, Mar 17, 2014 at 01:03:00AM +0000, Zhang, Yang Z wrote:
>> > I think there are two issues here:
>> > 
>> > a) Missing device assigments via groups. That should be done irregardless
>> >    if the device / hardware is buggy.
>> >
>> 
>> Yes, this is missing.
>> 
>> > b) Buggy devices like the IDT bridge that I see. That is a seperate issue - 
> and
>> >    we just discussion if we want to inject that in the VT-d (or AMD-VI) what
>> >    would be the mechanism to do that.
>> 
>> The question is that device 08:00.0 doesn't exist in your platform, you only 
> saw the BDF in the DMA transaction. How can you add a non-exist device to a 
> group? 
> 
> Why do I need to add it to a group? The patch I posted (see first email in 
> this thread)
> just made a fake PCI device in the Xen hypervisor. But I don't see libxl nor
> QEMU doing any group operations - so why are they required? If I just bundle
> all of the PCI devices underneath that bridge to the guest it should be OK, 
> shouldn't it?

It should. You're in trouble if (by mistake) you don't pass them all,
and to avoid that is what the grouping seems to have been intended
for. The fact that only xend used it (and even then only for checking
rather than to enforce the grouping) doesn't help it of course. But that
grouping issue is orthogonal to your issue, it's just that the group
assignment (if it were there) could take care of the assignment part
of your issue - the create-a-fake-device part would remain.
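
The difference between checking and enforcing can be sketched as
follows. This is Python for illustration only: the group membership is
hard-coded rather than obtained from iommu_get_device_group(), and both
function names are hypothetical.

```python
def missing_from_assignment(group, requested):
    """Return group members the guest config forgot, sorted for stable output."""
    return sorted(set(group) - set(requested))


def enforce_group(group, requested):
    """Return the requested list extended with every missing group member."""
    return list(requested) + missing_from_assignment(group, requested)


# Illustrative group: a subset of the devices behind the buggy bridge.
group = ["08:03.0", "09:08.0", "09:08.1"]
requested = ["08:03.0"]

# Checking only reports the problem ...
missing = missing_from_assignment(group, requested)
# ... whereas enforcing widens the assignment to the whole group.
assigned = enforce_group(group, requested)
```

A checking-only tool stack would stop at the first call and complain;
an enforcing one would go on to assign the widened list.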

Jan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Dealing with non-existent BDF devices in VT-d and in the hardware.
  2014-03-19 14:24                 ` Jan Beulich
@ 2014-03-20  0:48                   ` Zhang, Yang Z
  2014-03-20  7:14                     ` Gordan Bobic
  2014-03-20  9:58                     ` Jan Beulich
  0 siblings, 2 replies; 22+ messages in thread
From: Zhang, Yang Z @ 2014-03-20  0:48 UTC (permalink / raw)
  To: Jan Beulich, Konrad Rzeszutek Wilk
  Cc: Andrew Cooper, gordan@bobich.net, xen-devel@lists.xensource.com

Jan Beulich wrote on 2014-03-19:
>>>> On 19.03.14 at 13:57, Konrad Rzeszutek Wilk
>>>> <konrad.wilk@oracle.com>
> wrote:
>> On Wed, Mar 19, 2014 at 12:32:31AM +0000, Zhang, Yang Z wrote:
>>> Konrad Rzeszutek Wilk wrote on 2014-03-18:
>>>> On Mon, Mar 17, 2014 at 01:03:00AM +0000, Zhang, Yang Z wrote:
>>>> I think there are two issues here:
>>>> 
>>>> a) Missing device assigments via groups. That should be done irregardless
>>>>    if the device / hardware is buggy.
>>> 
>>> Yes, this is missing.
>>> 
>>>> b) Buggy devices like the IDT bridge that I see. That is a
>>>>    seperate issue - and we just discussion if we want to inject
>>>>    that in the VT-d (or AMD-VI) what would be the mechanism to
>>>>    do that.
>>>
>>> The question is that device 08:00.0 doesn't exist in your platform,
>>> you only saw the BDF in the DMA transaction. How can you add a
>>> non-exist device to a group?
>>
>> Why do I need to add it to a group? The patch I posted (see first
>> email in this thread) just made a fake PCI device in the Xen
>> hypervisor. But I don't see libxl nor QEMU doing any group
>> operations - so why are they required? If I just bundle all of the
>> PCI devices underneath that bridge to the guest it should be OK,
>> shouldn't it?
> 
> It should. You're in trouble if (by mistake) you don't pass them all,
> and to avoid that is what the grouping seems to have been intended
> for. The fact that only xend used it (and even then only for checking
> rather to enforce the grouping) doesn't help it of course. But that
> grouping issue is orthogonal to your issue, it's just that the group
> assignment (if it were there) could take care of the assignment part of your issue - the create-a-fake-device part would remain.

Faking a device is one solution. But I am thinking (maybe I am wrong): why
not set up all VT-d entries under a bridge when passing through a PCI device
under that bridge? When a PCI device under a bridge is passed through, all
devices under the bridge should be assigned to the guest too. What Xen
currently does is set up only the entries that have a device, so why not
extend it to set up all entries? In that case, no user input is required.
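
A minimal sketch of the difference between the two behaviours, using a toy model rather than Xen's real data structures: a root table indexed by bus, each pointing at a 256-slot context table indexed by devfn. The class and method names (`ContextTables`, `map_entry`, `map_whole_bus`, `lookup`) and the domain id are assumptions made for illustration only.

```python
from typing import Dict, List, Optional

DEVFN_PER_BUS = 256  # 32 devices x 8 functions per bus


def devfn(dev: int, fn: int) -> int:
    """Pack device and function numbers into the 8-bit devfn index."""
    return (dev << 3) | fn


class ContextTables:
    """Toy model: root[bus] -> context table; a slot holds a domain id or None."""

    def __init__(self) -> None:
        self.root: Dict[int, List[Optional[int]]] = {}

    def map_entry(self, bus: int, dfn: int, domid: int) -> None:
        # Mark a single context entry as present for the given domain.
        table = self.root.setdefault(bus, [None] * DEVFN_PER_BUS)
        table[dfn] = domid

    def map_whole_bus(self, bus: int, domid: int) -> None:
        # The proposal: populate every entry on the bus behind the bridge.
        for dfn in range(DEVFN_PER_BUS):
            self.map_entry(bus, dfn, domid)

    def lookup(self, bus: int, dfn: int) -> Optional[int]:
        # None models "present bit in context entry is clear" (a fault).
        table = self.root.get(bus)
        return None if table is None else table[dfn]


# Current behaviour: only the existing device 08:03.0 gets an entry.
current = ContextTables()
current.map_entry(0x08, devfn(3, 0), domid=1)
print(current.lookup(0x08, devfn(0, 0)))   # None: a request from 08:00.0 faults

# Proposed behaviour: every devfn on bus 08 gets an entry.
proposed = ContextTables()
proposed.map_whole_bus(0x08, domid=1)
print(proposed.lookup(0x08, devfn(0, 0)))  # 1: the same request is translated
```

The sketch says nothing about whether populating the unused entries is safe; it only shows why the fault would no longer occur.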

> 
> Jan


Best regards,
Yang

* Re: Dealing with non-existent BDF devices in VT-d and in the hardware.
@ 2014-03-20  1:34 Konrad Rzeszutek Wilk
  2014-03-20 10:02 ` Jan Beulich
  0 siblings, 1 reply; 22+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-03-20  1:34 UTC (permalink / raw)
  To: Zhang, Yang Z; +Cc: andrew.cooper3, gordan@bobich.net, Xen Devel, Jan Beulich


On Mar 19, 2014 8:48 PM, "Zhang, Yang Z" <yang.z.zhang@intel.com> wrote:
>
> Jan Beulich wrote on 2014-03-19: 
> >>>> On 19.03.14 at 13:57, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> >> On Wed, Mar 19, 2014 at 12:32:31AM +0000, Zhang, Yang Z wrote: 
> >>> Konrad Rzeszutek Wilk wrote on 2014-03-18: 
> >>>> On Mon, Mar 17, 2014 at 01:03:00AM +0000, Zhang, Yang Z wrote: 
> >>>> I think there are two issues here: 
> >>>> 
> >>>> a) Missing device assigments via groups. That should be done irregardless 
> >>>>    if the device / hardware is buggy. 
> >>> 
> >>> Yes, this is missing. 
> >>> 
> >>>> b) Buggy devices like the IDT bridge that I see. That is a
> >>>>    seperate issue - and we just discussion if we want to inject
> >>>>    that in the VT-d (or AMD-VI) what would be the mechanism to
> >>>>    do that.
> >>>
> >>> The question is that device 08:00.0 doesn't exist in your platform,
> >>> you only saw the BDF in the DMA transaction. How can you add a
> >>> non-exist device to a group?
> >>
> >> Why do I need to add it to a group? The patch I posted (see first
> >> email in this thread) just made a fake PCI device in the Xen
> >> hypervisor. But I don't see libxl nor QEMU doing any group
> >> operations - so why are they required? If I just bundle all of the
> >> PCI devices underneath that bridge to the guest it should be OK,
> >> shouldn't it?
> > 
> > It should. You're in trouble if (by mistake) you don't pass them all, 
> > and to avoid that is what the grouping seems to have been intended 
> > for. The fact that only xend used it (and even then only for checking 
> > rather to enforce the grouping) doesn't help it of course. But that 
> > grouping issue is orthogonal to your issue, it's just that the group 
> > assignment (if it were there) could take care of the assignment part of your issue - the create-a-fake-device part would remain. 
>
> fake a device is a solution. But I am thinking (maybe I am wrong) why not setup all VT-d entries under a bridge if passing a PCI device under a bridge. Because when passing a PCI device under a bridge, all devices under bridge should be assigned to the guest too. What current Xen dose is only set the entry which has device, so why not extend it to setup all entries? In this case, there is no user input is required.

We are talking about two different things here.

Your idea of passing in all of the devices that are under a bridge (or at
least checking for that) is sensible. We can't just pass them in without
checking that the devices have been deassigned from the dom0 device drivers.
But if they are all 'floating' and not in use - sure (though maybe provide
an option in the toolstack - we needn't pass all of them if nobody else is
using them).

The original issue in this thread was: what option, if any, should we come up
with to work around broken firmware that uses non-existent BDFs? Should it be
a boot parameter, or should we alter a hypercall to allow creation of
phantom devices under a bridge? Then later the phantom device can be assigned
as part of a group to a guest.
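
Whichever mechanism is picked, it has to name the non-existent BDF. A minimal sketch of parsing the `seg:bus:dev.fn` notation seen in the fault log into the 16-bit source-id (bus in bits 15:8, device in bits 7:3, function in bits 2:0). The names `Bdf` and `parse_bdf` are hypothetical and are not taken from Xen or its toolstack.

```python
import re
from typing import NamedTuple


class Bdf(NamedTuple):
    seg: int
    bus: int
    dev: int
    fn: int

    @property
    def source_id(self) -> int:
        """16-bit requester ID: bus[15:8] | device[7:3] | function[2:0]."""
        return (self.bus << 8) | (self.dev << 3) | self.fn


# Optional 1-4 hex digit segment, then bus:dev.fn in hex.
_BDF_RE = re.compile(
    r"^(?:([0-9a-fA-F]{1,4}):)?([0-9a-fA-F]{1,2}):([0-9a-fA-F]{1,2})\.([0-7])$"
)


def parse_bdf(text: str) -> Bdf:
    """Parse '[seg:]bus:dev.fn'; the segment defaults to 0 when omitted."""
    m = _BDF_RE.match(text)
    if m is None:
        raise ValueError(f"not a BDF: {text!r}")
    seg = int(m.group(1) or "0", 16)
    bus = int(m.group(2), 16)
    dev = int(m.group(3), 16)
    fn = int(m.group(4), 16)
    if dev > 0x1F:
        raise ValueError(f"device number out of range in {text!r}")
    return Bdf(seg, bus, dev, fn)


print(hex(parse_bdf("0000:08:00.0").source_id))  # 0x800
print(hex(parse_bdf("08:03.0").source_id))       # 0x818
```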
>
> > 
> > Jan 
>
>
> Best regards, 
> Yang 
>

* Re: Dealing with non-existent BDF devices in VT-d and in the hardware.
  2014-03-20  0:48                   ` Zhang, Yang Z
@ 2014-03-20  7:14                     ` Gordan Bobic
  2014-03-20 10:04                       ` Jan Beulich
  2014-03-20  9:58                     ` Jan Beulich
  1 sibling, 1 reply; 22+ messages in thread
From: Gordan Bobic @ 2014-03-20  7:14 UTC (permalink / raw)
  To: Zhang, Yang Z, Jan Beulich, Konrad Rzeszutek Wilk
  Cc: Andrew Cooper, xen-devel@lists.xensource.com

On 03/20/2014 12:48 AM, Zhang, Yang Z wrote:
> Jan Beulich wrote on 2014-03-19:
>>>>> On 19.03.14 at 13:57, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
>>> On Wed, Mar 19, 2014 at 12:32:31AM +0000, Zhang, Yang Z wrote:
>>>> Konrad Rzeszutek Wilk wrote on 2014-03-18:
>>>>> On Mon, Mar 17, 2014 at 01:03:00AM +0000, Zhang, Yang Z wrote:
>>>>> I think there are two issues here:
>>>>>
>>>>> a) Missing device assigments via groups. That should be done irregardless
>>>>>     if the device / hardware is buggy.
>>>>
>>>> Yes, this is missing.
>>>>
>>>>> b) Buggy devices like the IDT bridge that I see. That is a
>>>>>     seperate issue - and we just discussion if we want to inject
>>>>>     that in the VT-d (or AMD-VI) what would be the mechanism to
>>>>>     do that.
>>>>
>>>> The question is that device 08:00.0 doesn't exist in your platform,
>>>> you only saw the BDF in the DMA transaction. How can you add a
>>>> non-exist device to a group?
>>>
>>> Why do I need to add it to a group? The patch I posted (see first
>>> email in this thread) just made a fake PCI device in the Xen
>>> hypervisor. But I don't see libxl nor QEMU doing any group
>>> operations - so why are they required? If I just bundle all of the
>>> PCI devices underneath that bridge to the guest it should be OK,
>>> shouldn't it?
>>
>> It should. You're in trouble if (by mistake) you don't pass them all,
>> and to avoid that is what the grouping seems to have been intended
>> for. The fact that only xend used it (and even then only for checking
>> rather to enforce the grouping) doesn't help it of course. But that
>> grouping issue is orthogonal to your issue, it's just that the group
>> assignment (if it were there) could take care of the assignment part
>> of your issue - the create-a-fake-device part would remain.
>
> fake a device is a solution. But I am thinking (maybe I am wrong) why
> not setup all VT-d entries under a bridge if passing a PCI device under
> a bridge. Because when passing a PCI device under a bridge, all devices
> under bridge should be assigned to the guest too. What current Xen dose
> is only set the entry which has device, so why not extend it to setup
> all entries? In this case, there is no user input is required.

I'm not sure if I'm reading this right, but you wouldn't necessarily be 
passing all devices under a particular bridge to guests (and certainly 
not necessarily to the same guest). You could have multiple levels of 
bridges to provide extra PCIe links. One obvious example is dual GPU 
cards where both GPUs are under a bridge, but you wouldn't necessarily 
be passing both devices to a guest (one might be the primary GPU for the 
host).

Gordan

* Re: Dealing with non-existent BDF devices in VT-d and in the hardware.
  2014-03-20  0:48                   ` Zhang, Yang Z
  2014-03-20  7:14                     ` Gordan Bobic
@ 2014-03-20  9:58                     ` Jan Beulich
  2014-03-24  2:37                       ` Zhang, Yang Z
  1 sibling, 1 reply; 22+ messages in thread
From: Jan Beulich @ 2014-03-20  9:58 UTC (permalink / raw)
  To: Yang Z Zhang, Konrad Rzeszutek Wilk
  Cc: Andrew Cooper, gordan@bobich.net, xen-devel@lists.xensource.com

>>> On 20.03.14 at 01:48, "Zhang, Yang Z" <yang.z.zhang@intel.com> wrote:
> fake a device is a solution. But I am thinking (maybe I am wrong) why not 
> setup all VT-d entries under a bridge if passing a PCI device under a bridge. 
> Because when passing a PCI device under a bridge, all devices under bridge 
> should be assigned to the guest too. What current Xen dose is only set the 
> entry which has device, so why not extend it to setup all entries? In this 
> case, there is no user input is required. 

You'd have to prove that this doesn't impact isolation/security.
Just look at xend: It checks that all devices in a group are owned
by pciback/pci-stub, but it doesn't enforce assignment of all of
them. This might be intentional (namely for any intermediate
bridges).

But yes, I think this would address Konrad's problem.
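
To make the distinction concrete, here is a rough sketch of the two checks: one that every device in a group is bound to pciback/pci-stub (what xend is described as doing), and one that every device in the group is also part of the assignment (what it does not enforce). The function names and the example group, including the second device 08:04.0, are invented for illustration and do not mirror xend's code.

```python
from typing import Dict, List, Set

# Drivers that hold a device for passthrough, as named in the message above.
PASSTHROUGH_DRIVERS = {"pciback", "pci-stub"}


def not_owned_by_stub(group_drivers: Dict[str, str]) -> List[str]:
    """Group members whose bound driver is not pciback/pci-stub."""
    return sorted(
        bdf for bdf, drv in group_drivers.items()
        if drv not in PASSTHROUGH_DRIVERS
    )


def not_assigned(group_drivers: Dict[str, str], assigned: Set[str]) -> List[str]:
    """Group members that are left out of the guest's assignment."""
    return sorted(set(group_drivers) - assigned)


# Hypothetical group behind one bridge: both devices are held by pciback,
# but only one of them is listed for the guest.
group = {"0000:08:03.0": "pciback", "0000:08:04.0": "pciback"}
assigned = {"0000:08:03.0"}

print(not_owned_by_stub(group))       # []: the ownership check passes
print(not_assigned(group, assigned))  # ['0000:08:04.0']: nothing enforces this
```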

Jan

* Re: Dealing with non-existent BDF devices in VT-d and in the hardware.
  2014-03-20  1:34 Dealing with non-existent BDF devices in VT-d and in the hardware Konrad Rzeszutek Wilk
@ 2014-03-20 10:02 ` Jan Beulich
  2014-03-20 19:44   ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 22+ messages in thread
From: Jan Beulich @ 2014-03-20 10:02 UTC (permalink / raw)
  To: Yang Z Zhang, Konrad Rzeszutek Wilk
  Cc: andrew.cooper3, gordan@bobich.net, Xen Devel

>>> On 20.03.14 at 02:34, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> On Mar 19, 2014 8:48 PM, "Zhang, Yang Z" <yang.z.zhang@intel.com> wrote:
>> fake a device is a solution. But I am thinking (maybe I am wrong) why not 
> setup all VT-d entries under a bridge if passing a PCI device under a bridge. 
> Because when passing a PCI device under a bridge, all devices under bridge 
> should be assigned to the guest too. What current Xen dose is only set the 
> entry which has device, so why not extend it to setup all entries? In this 
> case, there is no user input is required.
> 
> We are talking about two different things here.

Not really.

> To your idea of passing in all of the devices along that are under a bridge 
> (or at least check for that) is sensible. We can't just pass in it in without 
> checking that the devices have been deassigned from the dom0 device drivers. 

That's what xend is doing, but xl isn't.

> But if they are all 'floating' and not being in use - sure (thought maybe 
> provide an option in the tool stack- we needn't to pass all of them if nobody 
> else is using them).

Aiui he wasn't suggesting to pass them all to the guest, just to put all
IDs in the IOMMU tables, such that no faults would arise if an ID other
than the root-most PCI bridge's one or that of any _existing_ device
were used.

> The original issue in this thread was - what if any option should we come up 
> to work around broken firmware that uses non-existent BDFs. Should it be some 
> boot up parameter or should we alter an hyper call to allow creation of 
> phantom devices under a bridge. Then later it can be assigned as part of a 
> group to a guest.

With Yang's proposal - provided its security can be proven - you
wouldn't need any command line override.

Jan

* Re: Dealing with non-existent BDF devices in VT-d and in the hardware.
  2014-03-20  7:14                     ` Gordan Bobic
@ 2014-03-20 10:04                       ` Jan Beulich
  0 siblings, 0 replies; 22+ messages in thread
From: Jan Beulich @ 2014-03-20 10:04 UTC (permalink / raw)
  To: Gordan Bobic, Yang Z Zhang, Konrad Rzeszutek Wilk
  Cc: Andrew Cooper, xen-devel@lists.xensource.com

>>> On 20.03.14 at 08:14, Gordan Bobic <gordan@bobich.net> wrote:
> On 03/20/2014 12:48 AM, Zhang, Yang Z wrote:
>> fake a device is a solution. But I am thinking (maybe I am wrong) why
>> not setup all VT-d entries under a bridge if passing a PCI device under
>> a bridge. Because when passing a PCI device under a bridge, all devices
>> under bridge should be assigned to the guest too. What current Xen dose
>> is only set the entry which has device, so why not extend it to setup
>> all entries? In this case, there is no user input is required.
> 
> I'm not sure if I'm reading this right, but you wouldn't necessarily be 
> passing all devices under a particular bridge to guests (and certainly 
> not necessarily to the same guest). You could have multiple levels of 
> bridges to provide extra PCIe links. One obvious example is dual GPU 
> cards where both GPUs are under a bridge, but you wouldn't necessarily 
> be passing both devices to a guest (one might be the primary GPU for the 
> host).

PCIe devices behind PCI bridges still need to be treated as PCI ones
(i.e. not necessarily presenting their own IDs in transactions, due to
the intermediate non-express bridge). And it's only the non-PCIe
case we're talking about here - no problems of this kind are known for
PCIe devices.
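
As an illustration of "not necessarily presenting their own IDs", the sketch below lists the source-ids a request from a device behind a non-express bridge might carry: the device's own, the bridge's, and the bridge's secondary bus with devfn 0. The third is an assumption drawn from the fault reported in this thread (08:00.0 behind bridge 07:00.0); the helper names are hypothetical.

```python
from typing import Set, Tuple

Bdf = Tuple[int, int, int]  # (bus, device, function)


def source_id(bdf: Bdf) -> int:
    """16-bit requester ID: bus[15:8] | device[7:3] | function[2:0]."""
    bus, dev, fn = bdf
    return (bus << 8) | (dev << 3) | fn


def candidate_source_ids(device: Bdf, bridge: Bdf, secondary_bus: int) -> Set[int]:
    """IDs an IOMMU may see for DMA from a device behind a PCI bridge."""
    return {
        source_id(device),                 # the device's own ID
        source_id(bridge),                 # the bridge's ID
        source_id((secondary_bus, 0, 0)),  # assumed alias: secondary bus, devfn 0
    }


# FireWire controller 08:03.0 behind bridge 07:00.0 (secondary bus 08).
ids = candidate_source_ids((0x08, 3, 0), (0x07, 0, 0), 0x08)
print(sorted(f"{i:04x}" for i in ids))  # ['0700', '0800', '0818']
```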

Jan

* Re: Dealing with non-existent BDF devices in VT-d and in the hardware.
  2014-03-20 10:02 ` Jan Beulich
@ 2014-03-20 19:44   ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 22+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-03-20 19:44 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Yang Z Zhang, andrew.cooper3, gordan@bobich.net, Xen Devel

On Thu, Mar 20, 2014 at 10:02:01AM +0000, Jan Beulich wrote:
> >>> On 20.03.14 at 02:34, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> > On Mar 19, 2014 8:48 PM, "Zhang, Yang Z" <yang.z.zhang@intel.com> wrote:
> >> fake a device is a solution. But I am thinking (maybe I am wrong) why not 
> > setup all VT-d entries under a bridge if passing a PCI device under a bridge. 
> > Because when passing a PCI device under a bridge, all devices under bridge 
> > should be assigned to the guest too. What current Xen dose is only set the 
> > entry which has device, so why not extend it to setup all entries? In this 
> > case, there is no user input is required.
> > 
> > We are talking about two different things here.
> 
> Not really.
> 
> > To your idea of passing in all of the devices along that are under a bridge 
> > (or at least check for that) is sensible. We can't just pass in it in without 
> > checking that the devices have been deassigned from the dom0 device drivers. 
> 
> That's what xend is doing, but xl isn't.
> 
> > But if they are all 'floating' and not being in use - sure (thought maybe 
> > provide an option in the tool stack- we needn't to pass all of them if nobody 
> > else is using them).
> 
> Aiui he wasn't suggesting to pass them all to the guest, just to put all
> IDs in the IOMMU tables, such that no faults would arise if an ID other
> then the root-most PCI bridge's one or that of any _existing_ device.

Aha! Thank you for explaining. That would certainly make it easier.
> 
> > The original issue in this thread was - what if any option should we come up 
> > to work around broken firmware that uses non-existent BDFs. Should it be some 
> > boot up parameter or should we alter an hyper call to allow creation of 
> > phantom devices under a bridge. Then later it can be assigned as part of a 
> > group to a guest.
> 
> With Yang's proposal - provided its security can be proven - you
> wouldn't need any command line override.

Right.
> 
> Jan
> 

* Re: Dealing with non-existent BDF devices in VT-d and in the hardware.
  2014-03-20  9:58                     ` Jan Beulich
@ 2014-03-24  2:37                       ` Zhang, Yang Z
  2014-03-24  7:25                         ` Jan Beulich
  0 siblings, 1 reply; 22+ messages in thread
From: Zhang, Yang Z @ 2014-03-24  2:37 UTC (permalink / raw)
  To: Jan Beulich, Konrad Rzeszutek Wilk
  Cc: Andrew Cooper, gordan@bobich.net, xen-devel@lists.xensource.com

Jan Beulich wrote on 2014-03-20:
>>>> On 20.03.14 at 01:48, "Zhang, Yang Z" <yang.z.zhang@intel.com> wrote:
>> fake a device is a solution. But I am thinking (maybe I am wrong) why
>> not setup all VT-d entries under a bridge if passing a PCI device under
>> a bridge. Because when passing a PCI device under a bridge, all devices
>> under bridge should be assigned to the guest too. What current Xen dose
>> is only set the entry which has device, so why not extend it to setup
>> all entries? In this case, there is no user input is required.
> 
> You'd have to prove that this doesn't impact isolation/security.

Yes, this needs more thought.

BTW, do you see any potential issue with doing this?

> Just look at xend: It checks that all devices in a group are owned by
> pciback/pci-stub, but it doesn't enforce assignment of all of them.
> This might be intentional (namely for any intermediate bridges).
> 
> But yes, I think this would address Konrad's problem.
> 
> Jan


Best regards,
Yang

* Re: Dealing with non-existent BDF devices in VT-d and in the hardware.
  2014-03-24  2:37                       ` Zhang, Yang Z
@ 2014-03-24  7:25                         ` Jan Beulich
  0 siblings, 0 replies; 22+ messages in thread
From: Jan Beulich @ 2014-03-24  7:25 UTC (permalink / raw)
  To: Yang Z Zhang
  Cc: Andrew Cooper, gordan@bobich.net, xen-devel@lists.xensource.com

>>> On 24.03.14 at 03:37, <yang.z.zhang@intel.com> wrote:
> Jan Beulich wrote on 2014-03-20:
>>>>> On 20.03.14 at 01:48, "Zhang, Yang Z" <yang.z.zhang@intel.com> wrote:
>>> fake a device is a solution. But I am thinking (maybe I am wrong) why
>>> not setup all VT-d entries under a bridge if passing a PCI device under
>>> a bridge. Because when passing a PCI device under a bridge, all devices
>>> under bridge should be assigned to the guest too. What current Xen dose
>>> is only set the entry which has device, so why not extend it to setup
>>> all entries? In this case, there is no user input is required.
>> 
>> You'd have to prove that this doesn't impact isolation/security.
> 
> Yes, this need more deeply think. 
> 
> BTW, do you see any potential issue with doing this?

Not a concrete one - I simply think that security/isolation guarantees
are easier to validate if permissions for a guest are kept to the smallest
possible set. So without full proof of the security of the above concept I
think we should accept the new behavior only as an opt-in (via
command line or domain config option).
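
A small sketch of what "opt-in" could mean in practice: by default only the IDs of existing devices are mapped, and the whole-bus mapping is added only when a flag is set. The flag name `map_whole_bus` and the function are hypothetical; no such Xen option is implied.

```python
from typing import Iterable, Set


def source_ids_to_map(existing: Iterable[int], bus: int,
                      map_whole_bus: bool = False) -> Set[int]:
    """Return the source-ids to grant to a guest.

    Default: only the IDs of devices that exist (smallest possible set).
    Opt-in:  additionally every devfn on the bus behind the bridge.
    """
    ids = set(existing)
    if map_whole_bus:
        ids.update((bus << 8) | dfn for dfn in range(256))
    return ids


existing = {0x0818}  # 08:03.0
print(len(source_ids_to_map(existing, 0x08)))                      # 1
print(len(source_ids_to_map(existing, 0x08, map_whole_bus=True)))  # 256
```

Keeping the default at the minimal set matches the concern that a smaller permission set is easier to validate.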

Jan

Thread overview: 22+ messages
2014-03-20  1:34 Dealing with non-existent BDF devices in VT-d and in the hardware Konrad Rzeszutek Wilk
2014-03-20 10:02 ` Jan Beulich
2014-03-20 19:44   ` Konrad Rzeszutek Wilk
  -- strict thread matches above, loose matches on Subject: below --
2014-03-11 17:30 Konrad Rzeszutek Wilk
2014-03-11 17:36 ` Andrew Cooper
2014-03-11 17:49   ` Konrad Rzeszutek Wilk
2014-03-14  2:18     ` Zhang, Yang Z
2014-03-14 17:51       ` Konrad Rzeszutek Wilk
2014-03-17  1:03         ` Zhang, Yang Z
2014-03-17 20:00           ` Konrad Rzeszutek Wilk
2014-03-19  0:32             ` Zhang, Yang Z
2014-03-19 12:57               ` Konrad Rzeszutek Wilk
2014-03-19 14:24                 ` Jan Beulich
2014-03-20  0:48                   ` Zhang, Yang Z
2014-03-20  7:14                     ` Gordan Bobic
2014-03-20 10:04                       ` Jan Beulich
2014-03-20  9:58                     ` Jan Beulich
2014-03-24  2:37                       ` Zhang, Yang Z
2014-03-24  7:25                         ` Jan Beulich
2014-03-12  9:17 ` Jan Beulich
2014-03-12 14:22   ` Konrad Rzeszutek Wilk
2014-03-12 17:10     ` Gordan Bobic
