xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/6] AER unrecoverable error containment
@ 2017-06-27 17:14 Venu Busireddy
  2017-06-27 17:14 ` [PATCH 1/6] xen: Add support for hiding and unhiding pcie passthrough devices Venu Busireddy
                   ` (5 more replies)
  0 siblings, 6 replies; 25+ messages in thread
From: Venu Busireddy @ 2017-06-27 17:14 UTC (permalink / raw)
  To: venu.busireddy, Jan Beulich, Daniel De Graaf, Ian Jackson,
	Wei Liu, Marek Marczykowski-Górecki
  Cc: xen-devel

This patch set is part of a set of patchs that together allow containment
of unrecoverable AER errors from PCIe devices assigned to guests in
passthrough mode. The containment is achieved by killing the guest and
hiding the device upon receiving the fatal AER error.

The overall approach is:

1. Change the BIOS settings such that the AER error handling is delegated
   to the host.

2. Change the xen_pciback driver to store the name (SBDF) of the erring
   device in xenstore.

3. At the time of creating the guest, setup a watcher for such writes to
   the xenstore.

4. When the watcher is kicked off due to errors, *destroy* the guest and
   mark the erring device as unassignable until administrative intervention.

5. Provide command line tools to manage the erring devices.

This part implements the support for automatically destroying the guest
that has the device assigned to it, and marking the device as hidden.
Also implemented are the command line tools to manage the hidden devices.

Note:
When unrecoverable AER errors are detected from the PCIe devices
assigned to guests in passthrough mode, BIOS's bring down the server,
thus bringing down the entire hypervisor. For this patch set to work,
the AER error handling needs to be delegated to the host operating system.

Venu Busireddy (6):
  xen: Add support for hiding and unhiding pcie passthrough devices
  xl: Add commands for hiding and unhiding pcie passthrough devices
  libxc: Add wrappers for new commands
  libxl: Add wrappers for new commands and add AER error handler
  tools/python/xc: Update pyxc_methods with new commands
  docs: Document the new commands.

 docs/man/xl.pod.1.in                |  24 ++++++
 tools/libxc/include/xenctrl.h       |   4 +
 tools/libxc/xc_domain.c             |  38 +++++++++
 tools/libxl/libxl.h                 |   3 +
 tools/libxl/libxl_event.h           |   2 +
 tools/libxl/libxl_pci.c             | 150 +++++++++++++++++++++++++++++++++
 tools/python/xen/lowlevel/xc/xc.c   |  84 +++++++++++++++++++
 tools/xl/xl.h                       |   3 +
 tools/xl/xl_cmdtable.c              |  17 ++++
 tools/xl/xl_pci.c                   | 125 +++++++++++++++++++++++++++-
 tools/xl/xl_vmcontrol.c             |  11 +++
 xen/common/domctl.c                 |   6 ++
 xen/drivers/passthrough/pci.c       | 161 ++++++++++++++++++++++++++++++++++--
 xen/include/public/domctl.h         |   3 +
 xen/include/xsm/dummy.h             |  18 ++++
 xen/include/xsm/xsm.h               |  16 ++++
 xen/xsm/dummy.c                     |   3 +
 xen/xsm/flask/hooks.c               |  19 +++++
 xen/xsm/flask/policy/access_vectors |   6 +-
 19 files changed, 679 insertions(+), 14 deletions(-)


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 1/6] xen: Add support for hiding and unhiding pcie passthrough devices
  2017-06-27 17:14 [PATCH 0/6] AER unrecoverable error containment Venu Busireddy
@ 2017-06-27 17:14 ` Venu Busireddy
  2017-07-04 15:46   ` Jan Beulich
  2017-06-27 17:14 ` [PATCH 2/6] xl: Add commands " Venu Busireddy
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 25+ messages in thread
From: Venu Busireddy @ 2017-06-27 17:14 UTC (permalink / raw)
  To: venu.busireddy, Jan Beulich, Daniel De Graaf; +Cc: Elena Ufimtseva, xen-devel

xen: Add support for hiding and unhiding pcie passthrough devices

Add support for hiding and unhiding (by introducing two new hypercall
subops) pci devices that trigger AER fatal errors while assigned to
guests in passthrough mode. Hiding of the device is done by assigning
it to dom_xen dummy domain.

XEN_DOMCTL_hide_device subop is used after the domain is being destroyed
and iommu pages and context are mapped to dom0. Unhiding is a reverse
operation and done with XEN_DOMCTL_unhide_device called by the toolstack.

XSM hooks used for device hide, unhide, test_hidden use respectively
assign_device, deassign_device, and test_assign_device hooks.

Signed-off-by: Venu Busireddy <venu.busireddy@oracle.com>
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
---
 xen/common/domctl.c                 |   6 ++
 xen/drivers/passthrough/pci.c       | 161 ++++++++++++++++++++++++++++++++++--
 xen/include/public/domctl.h         |   3 +
 xen/include/xsm/dummy.h             |  18 ++++
 xen/include/xsm/xsm.h               |  16 ++++
 xen/xsm/dummy.c                     |   3 +
 xen/xsm/flask/hooks.c               |  19 +++++
 xen/xsm/flask/policy/access_vectors |   6 +-
 8 files changed, 220 insertions(+), 12 deletions(-)

diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index 951a5dc..5e0f123 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -393,9 +393,15 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
     {
     case XEN_DOMCTL_createdomain:
     case XEN_DOMCTL_test_assign_device:
+    case XEN_DOMCTL_test_hidden_device:
     case XEN_DOMCTL_gdbsx_guestmemio:
         d = NULL;
         break;
+    case XEN_DOMCTL_hide_device:
+    case XEN_DOMCTL_unhide_device:
+        rcu_lock_domain(dom_xen);
+        d = dom_xen;
+        break;
     default:
         d = rcu_lock_domain_by_id(op->domain);
         if ( !d && op->cmd != XEN_DOMCTL_getdomaininfo )
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index c8e2d2d..ef4681a 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -31,6 +31,7 @@
 #include <xen/softirq.h>
 #include <xen/tasklet.h>
 #include <xsm/xsm.h>
+#include <xen/mm.h>
 #include <asm/msi.h>
 #include "ats.h"
 
@@ -1333,19 +1334,31 @@ int iommu_remove_device(struct pci_dev *pdev)
     return hd->platform_ops->remove_device(pdev->devfn, pci_to_dev(pdev));
 }
 
+static bool device_assigned_to_domain(struct domain *d, u16 seg, u8 bus, u8 devfn)
+{
+    bool rc = false;
+
+    pcidevs_lock();
+
+    if ( pci_get_pdev_by_domain(d, seg, bus, devfn) )
+        rc = true;
+
+    pcidevs_unlock();
+    return rc;
+}
+
 /*
  * If the device isn't owned by the hardware domain, it means it already
  * has been assigned to other domain, or it doesn't exist.
  */
 static int device_assigned(u16 seg, u8 bus, u8 devfn)
 {
-    struct pci_dev *pdev;
-
-    pcidevs_lock();
-    pdev = pci_get_pdev_by_domain(hardware_domain, seg, bus, devfn);
-    pcidevs_unlock();
+    return device_assigned_to_domain(hardware_domain, seg, bus, devfn) ? 0 : -EBUSY;
+}
 
-    return pdev ? 0 : -EBUSY;
+static int device_hidden(u16 seg, u8 bus, u8 devfn)
+{
+    return device_assigned_to_domain(dom_xen, seg, bus, devfn) ? -EBUSY : 0;
 }
 
 static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
@@ -1354,6 +1367,22 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
     struct pci_dev *pdev;
     int rc = 0;
 
+    if ( device_hidden(seg, bus, devfn) )
+        return -EINVAL;
+
+    if ( d == dom_xen )
+    {
+        pdev = pci_get_pdev(seg, bus, devfn);
+        if ( pdev )
+        {
+            list_move(&pdev->domain_list, &dom_xen->arch.pdev_list);
+            pdev->domain = dom_xen;
+            return rc;
+        }
+        else
+            return -ENODEV;
+    }
+
     if ( !iommu_enabled || !hd->platform_ops )
         return 0;
 
@@ -1417,10 +1446,23 @@ int deassign_device(struct domain *d, u16 seg, u8 bus, u8 devfn)
     struct pci_dev *pdev = NULL;
     int ret = 0;
 
+    ASSERT(pcidevs_locked());
+
+    if ( d == dom_xen )
+    {
+        pdev = pci_get_pdev(seg, bus, devfn);
+        if ( pdev )
+        {
+            list_move(&pdev->domain_list, &hardware_domain->arch.pdev_list);
+            pdev->domain = hardware_domain;
+            return ret;
+        }
+        else return -ENODEV;
+    }
+
     if ( !iommu_enabled || !hd->platform_ops )
         return -EINVAL;
 
-    ASSERT(pcidevs_locked());
     pdev = pci_get_pdev_by_domain(d, seg, bus, devfn);
     if ( !pdev )
         return -ENODEV;
@@ -1600,6 +1642,15 @@ int iommu_do_pci_domctl(
                    seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn));
             ret = -EINVAL;
         }
+
+        if ( device_hidden(seg, bus, devfn) )
+        {
+            printk(XENLOG_G_INFO
+                   "%04x:%02x:%02x.%u device is hidden\n",
+                   seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn));
+            ret = -EINVAL;
+        }
+
         break;
 
     case XEN_DOMCTL_assign_device:
@@ -1636,8 +1687,15 @@ int iommu_do_pci_domctl(
             break;
         }
 
-        ret = device_assigned(seg, bus, devfn) ?:
-              assign_device(d, seg, bus, devfn, flag);
+        if ( device_hidden(seg, bus, devfn) )
+        {
+            ret = -EINVAL;
+            break;
+        }
+
+        if ( !device_assigned(seg, bus, devfn) )
+            ret = assign_device(d, seg, bus, devfn, flag);
+
         if ( ret == -ERESTART )
             ret = hypercall_create_continuation(__HYPERVISOR_domctl,
                                                 "h", u_domctl);
@@ -1671,6 +1729,12 @@ int iommu_do_pci_domctl(
         bus = PCI_BUS(machine_sbdf);
         devfn = PCI_DEVFN2(machine_sbdf);
 
+        if ( device_hidden(seg, bus, devfn) )
+        {
+            ret = -EINVAL;
+            break;
+        }
+
         pcidevs_lock();
         ret = deassign_device(d, seg, bus, devfn);
         pcidevs_unlock();
@@ -1679,7 +1743,86 @@ int iommu_do_pci_domctl(
                    "deassign %04x:%02x:%02x.%u from dom%d failed (%d)\n",
                    seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
                    d->domain_id, ret);
+        break;
+
+    case XEN_DOMCTL_hide_device:
+        machine_sbdf = domctl->u.assign_device.u.pci.machine_sbdf;
+        ret = xsm_hide_device(XSM_HOOK, d, machine_sbdf);
+        if ( ret )
+            break;
+
+        if ( unlikely(d->is_dying) )
+        {
+            ret = -EAGAIN;
+            break;
+        }
+
+        seg = machine_sbdf >> 16;
+        bus = PCI_BUS(machine_sbdf);
+        devfn = PCI_DEVFN2(machine_sbdf);
+        flag = domctl->u.assign_device.flag;
+
+        if ( device_hidden(seg, bus, devfn) )
+        {
+            ret = -EINVAL;
+            break;
+        }
+
+        pcidevs_lock();
+        ret = assign_device(dom_xen, seg, bus, devfn, flag);
+        pcidevs_unlock();
+        if ( ret == -ERESTART )
+            ret = hypercall_create_continuation(__HYPERVISOR_domctl,
+                                                "h", u_domctl);
+        else if ( ret )
+            printk(XENLOG_G_ERR "XEN_DOMCTL_hide_device: "
+                   "hide %04x:%02x:%02x.%u failed (%d)\n",
+                   seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn), ret);
+        break;
+
+    case XEN_DOMCTL_unhide_device:
+        machine_sbdf = domctl->u.assign_device.u.pci.machine_sbdf;
+        ret = xsm_unhide_device(XSM_HOOK, d, machine_sbdf);
+        if ( ret )
+            break;
+
+        if ( unlikely(d->is_dying) )
+        {
+            ret = -EINVAL;
+            break;
+        }
+
+        seg = machine_sbdf >> 16;
+        bus = PCI_BUS(machine_sbdf);
+        devfn = PCI_DEVFN2(machine_sbdf);
+
+        if ( !device_hidden(seg, bus, devfn) )
+        {
+            ret = -EINVAL;
+            break;
+        }
+
+        pcidevs_lock();
+        ret = deassign_device(dom_xen, seg, bus, devfn);
+        pcidevs_unlock();
+
+        if ( ret == -ERESTART )
+            ret = hypercall_create_continuation(__HYPERVISOR_domctl,
+                                                "h", u_domctl);
+        else if ( ret )
+            printk(XENLOG_G_ERR "XEN_DOMCTL_unhide_device: "
+                   "assign %04x:%02x:%02x.%u to dom%d failed (%d)\n",
+                   seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
+                   d->domain_id, ret);
+        break;
+
+    case XEN_DOMCTL_test_hidden_device:
+        machine_sbdf = domctl->u.assign_device.u.pci.machine_sbdf;
+        seg = machine_sbdf >> 16;
+        bus = PCI_BUS(machine_sbdf);
+        devfn = PCI_DEVFN2(machine_sbdf);
 
+        ret = device_hidden(seg, bus, devfn);
         break;
 
     default:
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index e6cf211..1b043ea 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -1222,6 +1222,9 @@ struct xen_domctl {
 #define XEN_DOMCTL_gdbsx_pausevcpu             1001
 #define XEN_DOMCTL_gdbsx_unpausevcpu           1002
 #define XEN_DOMCTL_gdbsx_domstatus             1003
+#define XEN_DOMCTL_hide_device                 2001
+#define XEN_DOMCTL_unhide_device               2002
+#define XEN_DOMCTL_test_hidden_device          2003
     uint32_t interface_version; /* XEN_DOMCTL_INTERFACE_VERSION */
     domid_t  domain;
     union {
diff --git a/xen/include/xsm/dummy.h b/xen/include/xsm/dummy.h
index 62fcea6..0b820e1 100644
--- a/xen/include/xsm/dummy.h
+++ b/xen/include/xsm/dummy.h
@@ -355,6 +355,24 @@ static XSM_INLINE int xsm_deassign_device(XSM_DEFAULT_ARG struct domain *d, uint
     return xsm_default_action(action, current->domain, d);
 }
 
+static XSM_INLINE int xsm_hide_device(XSM_DEFAULT_ARG struct domain *d, uint32_t machine_bdf)
+{
+    XSM_ASSERT_ACTION(XSM_HOOK);
+    return xsm_default_action(action, current->domain, d);
+}
+
+static XSM_INLINE int xsm_unhide_device(XSM_DEFAULT_ARG struct domain *d, uint32_t machine_bdf)
+{
+    XSM_ASSERT_ACTION(XSM_HOOK);
+    return xsm_default_action(action, current->domain, d);
+}
+
+static XSM_INLINE int xsm_test_hidden_device(XSM_DEFAULT_ARG uint32_t machine_bdf)
+{
+    XSM_ASSERT_ACTION(XSM_HOOK);
+    return xsm_default_action(action, current->domain, NULL);
+}
+
 #endif /* HAS_PASSTHROUGH && HAS_PCI */
 
 #if defined(CONFIG_HAS_PASSTHROUGH) && defined(CONFIG_HAS_DEVICE_TREE)
diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h
index 60c0fd6..03dbeff 100644
--- a/xen/include/xsm/xsm.h
+++ b/xen/include/xsm/xsm.h
@@ -479,6 +479,22 @@ static inline int xsm_deassign_device(xsm_default_t def, struct domain *d, uint3
 {
     return xsm_ops->deassign_device(d, machine_bdf);
 }
+
+static inline int xsm_hide_device(xsm_default_t def, struct domain *d, uint32_t machine_bdf)
+{
+    return xsm_ops->hide_device(d, machine_bdf);
+}
+
+static inline int xsm_unhide_device(xsm_default_t def, struct domain *d, uint32_t machine_bdf)
+{
+    return xsm_ops->unhide_device(d, machine_bdf);
+}
+
+static inline int xsm_test_hidden_device(xsm_default_t def, uint32_t machine_bdf)
+{
+    return xsm_ops->test_hidden_device(machine_bdf);
+}
+
 #endif /* HAS_PASSTHROUGH && HAS_PCI) */
 
 #if defined(CONFIG_HAS_PASSTHROUGH) && defined(CONFIG_HAS_DEVICE_TREE)
diff --git a/xen/xsm/dummy.c b/xen/xsm/dummy.c
index 3cb5492..78111bb 100644
--- a/xen/xsm/dummy.c
+++ b/xen/xsm/dummy.c
@@ -94,6 +94,9 @@ void __init xsm_fixup_ops (struct xsm_operations *ops)
     set_to_dummy_if_null(ops, test_assign_device);
     set_to_dummy_if_null(ops, assign_device);
     set_to_dummy_if_null(ops, deassign_device);
+    set_to_dummy_if_null(ops, hide_device);
+    set_to_dummy_if_null(ops, unhide_device);
+    set_to_dummy_if_null(ops, test_hidden_device);
 #endif
 
 #if defined(CONFIG_HAS_PASSTHROUGH) && defined(CONFIG_HAS_DEVICE_TREE)
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index fd84ac0..3695768 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -1311,6 +1311,22 @@ static int flask_deassign_device(struct domain *d, uint32_t machine_bdf)
 
     return avc_current_has_perm(rsid, SECCLASS_RESOURCE, RESOURCE__REMOVE_DEVICE, NULL);
 }
+
+static int flask_unhide_device(struct domain *d, uint32_t machine_bdf)
+{
+    return flask_deassign_device(d, machine_bdf);
+}
+
+static int flask_hide_device(struct domain *d, uint32_t machine_bdf)
+{
+    return flask_assign_device(d, machine_bdf);
+}
+
+static int flask_test_hidden_device(struct domain *d, uint32_t machine_bdf)
+{
+    return flask_test_assign_device(d, machine_bdf);
+}
+
 #endif /* HAS_PASSTHROUGH && HAS_PCI */
 
 #if defined(CONFIG_HAS_PASSTHROUGH) && defined(CONFIG_HAS_DEVICE_TREE)
@@ -1783,6 +1799,9 @@ static struct xsm_operations flask_ops = {
     .test_assign_device = flask_test_assign_device,
     .assign_device = flask_assign_device,
     .deassign_device = flask_deassign_device,
+    .hide_device = flask_hide_device,
+    .unhide_device = flask_unhide_device,
+    .test_hidden_device = flask_test_hidden_device,
 #endif
 
 #if defined(CONFIG_HAS_PASSTHROUGH) && defined(CONFIG_HAS_DEVICE_TREE)
diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors
index 1f7eb35..873df59 100644
--- a/xen/xsm/flask/policy/access_vectors
+++ b/xen/xsm/flask/policy/access_vectors
@@ -437,13 +437,13 @@ class resource
 # XEN_DOMCTL_iomem_permission, XEN_DOMCTL_memory_mapping
     add_iomem
     remove_iomem
-# XEN_DOMCTL_get_device_group, XEN_DOMCTL_test_assign_device:
+# XEN_DOMCTL_get_device_group, XEN_DOMCTL_test_assign_device, XEN_DOMCTL_test_hidden_device:
 #  source = domain making the hypercall
 #  target = device being queried
     stat_device
-# XEN_DOMCTL_assign_device
+# XEN_DOMCTL_assign_device, XEN_DOMCTL_hide_device
     add_device
-# XEN_DOMCTL_deassign_device
+# XEN_DOMCTL_deassign_device, XEN_DOMCTL_unhide_device
     remove_device
 # checked for PCI hot and cold-plug hypercalls, with target as the PCI device
 # checked for CPU and memory hotplug with xen_t as the target

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 2/6] xl: Add commands for hiding and unhiding pcie passthrough devices
  2017-06-27 17:14 [PATCH 0/6] AER unrecoverable error containment Venu Busireddy
  2017-06-27 17:14 ` [PATCH 1/6] xen: Add support for hiding and unhiding pcie passthrough devices Venu Busireddy
@ 2017-06-27 17:14 ` Venu Busireddy
  2017-06-30 10:18   ` Wei Liu
  2017-06-27 17:14 ` [PATCH 3/6] libxc: Add wrappers for new commands Venu Busireddy
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 25+ messages in thread
From: Venu Busireddy @ 2017-06-27 17:14 UTC (permalink / raw)
  To: venu.busireddy, Ian Jackson, Wei Liu; +Cc: xen-devel

xl: Add commands for hiding and unhiding pcie passthrough devices

Introduce three subcommands: 'xl pci-assignable-hide <s:b:d.f>' to hide a
device, 'xl pci-assignable-unhide <s:b:d.f>' to unhide a previously hidden
device, and 'xl pci-assignable-list-hidden' to list the hidden devices.

Changed create_domain() to register a handler for unrecoverable AER
errors.

Signed-off-by: Venu Busireddy <venu.busireddy@oracle.com>
---
 tools/xl/xl.h           |   3 ++
 tools/xl/xl_cmdtable.c  |  17 +++++++
 tools/xl/xl_pci.c       | 125 +++++++++++++++++++++++++++++++++++++++++++++++-
 tools/xl/xl_vmcontrol.c |  11 +++++
 4 files changed, 154 insertions(+), 2 deletions(-)

diff --git a/tools/xl/xl.h b/tools/xl/xl.h
index aa95b77..915fe86 100644
--- a/tools/xl/xl.h
+++ b/tools/xl/xl.h
@@ -121,9 +121,12 @@ int main_vncviewer(int argc, char **argv);
 int main_pcilist(int argc, char **argv);
 int main_pcidetach(int argc, char **argv);
 int main_pciattach(int argc, char **argv);
+int main_pciassignable_hide(int argc, char **argv);
+int main_pciassignable_unhide(int argc, char **argv);
 int main_pciassignable_add(int argc, char **argv);
 int main_pciassignable_remove(int argc, char **argv);
 int main_pciassignable_list(int argc, char **argv);
+int main_pciassignable_list_hidden(int argc, char **argv);
 #ifndef LIBXL_HAVE_NO_SUSPEND_RESUME
 int main_restore(int argc, char **argv);
 int main_migrate_receive(int argc, char **argv);
diff --git a/tools/xl/xl_cmdtable.c b/tools/xl/xl_cmdtable.c
index 30eb93c..e23bd15 100644
--- a/tools/xl/xl_cmdtable.c
+++ b/tools/xl/xl_cmdtable.c
@@ -119,6 +119,23 @@ struct cmd_spec cmd_table[] = {
       "List all the assignable pci devices",
       "",
     },
+    { "pci-assignable-list-hidden",
+      &main_pciassignable_list_hidden, 0, 0,
+      "List all the pci devices hidden due to AER errors",
+      "",
+    },
+    { "pci-assignable-hide",
+      &main_pciassignable_hide, 0, 1,
+      "Hide a PCI device",
+      "<BDF>",
+      "-h                      Print this help.\n"
+    },
+    { "pci-assignable-unhide",
+      &main_pciassignable_unhide, 0, 1,
+      "Unhide a PCI device",
+      "<BDF>",
+      "-h                      Print this help.\n"
+    },
     { "pause",
       &main_pause, 0, 1,
       "Pause execution of a domain",
diff --git a/tools/xl/xl_pci.c b/tools/xl/xl_pci.c
index 58345bd..f48c469 100644
--- a/tools/xl/xl_pci.c
+++ b/tools/xl/xl_pci.c
@@ -163,8 +163,9 @@ static void pciassignable_list(void)
     if ( pcidevs == NULL )
         return;
     for (i = 0; i < num; i++) {
-        printf("%04x:%02x:%02x.%01x\n",
-               pcidevs[i].domain, pcidevs[i].bus, pcidevs[i].dev, pcidevs[i].func);
+        if (!libxl_device_pci_assignable_is_hidden(ctx, &pcidevs[i]))
+            printf("%04x:%02x:%02x.%01x\n",
+                   pcidevs[i].domain, pcidevs[i].bus, pcidevs[i].dev, pcidevs[i].func);
         libxl_device_pci_dispose(&pcidevs[i]);
     }
     free(pcidevs);
@@ -182,6 +183,126 @@ int main_pciassignable_list(int argc, char **argv)
     return 0;
 }
 
+static void pciassignable_list_hidden(void)
+{
+    libxl_device_pci *pcidevs;
+    int num, i;
+
+    pcidevs = libxl_device_pci_assignable_list(ctx, &num);
+
+    if ( pcidevs == NULL )
+        return;
+    for (i = 0; i < num; i++) {
+        if (libxl_device_pci_assignable_is_hidden(ctx, &pcidevs[i]))
+            printf("%04x:%02x:%02x.%01x\n",
+                   pcidevs[i].domain, pcidevs[i].bus, pcidevs[i].dev, pcidevs[i].func);
+        libxl_device_pci_dispose(&pcidevs[i]);
+    }
+    free(pcidevs);
+}
+
+int main_pciassignable_list_hidden(int argc, char **argv)
+{
+    int opt;
+
+    SWITCH_FOREACH_OPT(opt, "", NULL, "pci-assignable-list-hidden", 0) {
+        /* No options */
+    }
+
+    pciassignable_list_hidden();
+    return 0;
+}
+
+static int pciassignable_hide(const char *bdf)
+{
+    libxl_device_pci pcidev;
+    XLU_Config *config;
+    int r = EXIT_SUCCESS;
+
+    libxl_device_pci_init(&pcidev);
+
+    config = xlu_cfg_init(stderr, "command line");
+    if (!config) {
+        perror("xlu_cfg_init");
+        exit(-1);
+    }
+
+    if (xlu_pci_parse_bdf(config, &pcidev, bdf)) {
+        fprintf(stderr, "pci-assignable-hide: malformed BDF specification \"%s\"\n", bdf);
+        exit(2);
+    }
+
+    if (libxl_device_pci_assignable_hide(ctx, &pcidev))
+        r = EXIT_FAILURE;
+
+    libxl_device_pci_dispose(&pcidev);
+    xlu_cfg_destroy(config);
+
+    return r;
+}
+
+int main_pciassignable_hide(int argc, char **argv)
+{
+    int opt;
+    const char *bdf = NULL;
+
+    SWITCH_FOREACH_OPT(opt, "", NULL, "main_pciassignable_hide", 1) {
+        /* No options */
+    }
+
+    bdf = argv[optind];
+
+    if (pciassignable_hide(bdf))
+        return EXIT_FAILURE;
+
+    return EXIT_SUCCESS;
+}
+
+static int pciassignable_unhide(const char *bdf)
+{
+    libxl_device_pci pcidev;
+    XLU_Config *config;
+    int r = EXIT_SUCCESS;
+
+    libxl_device_pci_init(&pcidev);
+
+    config = xlu_cfg_init(stderr, "command line");
+    if (!config) {
+        perror("xlu_cfg_init");
+        exit(-1);
+    }
+
+    if (xlu_pci_parse_bdf(config, &pcidev, bdf)) {
+        fprintf(stderr, "pci-assignable-unhide: malformed BDF specification \"%s\"\n", bdf);
+        exit(2);
+    }
+
+    if (libxl_device_pci_assignable_unhide(ctx, &pcidev))
+        r = EXIT_FAILURE;
+
+    libxl_device_pci_dispose(&pcidev);
+    xlu_cfg_destroy(config);
+
+    return r;
+}
+
+int main_pciassignable_unhide(int argc, char **argv)
+{
+    int opt;
+    const char *bdf = NULL;
+
+    SWITCH_FOREACH_OPT(opt, "", NULL, "main_pciassignable_unhide", 1) {
+        /* No options */
+    }
+
+    bdf = argv[optind];
+
+    if (pciassignable_unhide(bdf))
+        return EXIT_FAILURE;
+
+    return EXIT_SUCCESS;
+}
+
 static int pciassignable_add(const char *bdf, int rebind)
 {
     libxl_device_pci pcidev;
diff --git a/tools/xl/xl_vmcontrol.c b/tools/xl/xl_vmcontrol.c
index 89c2b25..10a48a9 100644
--- a/tools/xl/xl_vmcontrol.c
+++ b/tools/xl/xl_vmcontrol.c
@@ -966,6 +966,15 @@ start:
     LOG("Waiting for domain %s (domid %u) to die [pid %ld]",
         d_config.c_info.name, domid, (long)getpid());
 
+    ret = libxl_reg_aer_events_handler(ctx, domid);
+    if (ret) {
+        /*
+         * This error may not be severe enough to fail the creation of the VM.
+         * Log the error, and continue with the creation.
+         */
+        LOG("libxl_reg_aer_events_handler() failed, ret = 0x%08x", ret);
+    }
+
     ret = libxl_evenable_domain_death(ctx, domid, 0, &deathw);
     if (ret) goto out;
 
@@ -993,6 +1002,7 @@ start:
             LOG("Domain %u has shut down, reason code %d 0x%x", domid,
                 event->u.domain_shutdown.shutdown_reason,
                 event->u.domain_shutdown.shutdown_reason);
+            libxl_unreg_aer_events_handler(ctx, domid);
             switch (handle_domain_death(&domid, event, &d_config)) {
             case DOMAIN_RESTART_SOFT_RESET:
                 domid_soft_reset = domid;
@@ -1059,6 +1069,7 @@ start:
 
         case LIBXL_EVENT_TYPE_DOMAIN_DEATH:
             LOG("Domain %u has been destroyed.", domid);
+            libxl_unreg_aer_events_handler(ctx, domid);
             libxl_event_free(ctx, event);
             ret = 0;
             goto out;

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 3/6] libxc: Add wrappers for new commands
  2017-06-27 17:14 [PATCH 0/6] AER unrecoverable error containment Venu Busireddy
  2017-06-27 17:14 ` [PATCH 1/6] xen: Add support for hiding and unhiding pcie passthrough devices Venu Busireddy
  2017-06-27 17:14 ` [PATCH 2/6] xl: Add commands " Venu Busireddy
@ 2017-06-27 17:14 ` Venu Busireddy
  2017-06-29 17:52   ` Wei Liu
  2017-06-27 17:14 ` [PATCH 4/6] libxl: Add wrappers for new commands and add AER error handler Venu Busireddy
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 25+ messages in thread
From: Venu Busireddy @ 2017-06-27 17:14 UTC (permalink / raw)
  To: venu.busireddy, Ian Jackson, Wei Liu; +Cc: xen-devel

libxc: Add wrappers for new commands

Add wrappers for the newly introduced commands "pci-assignable-hide",
"pci-assignable-unhide", and "pci-assignable-list-hidden".

Signed-off-by: Venu Busireddy <venu.busireddy@oracle.com>
---
 tools/libxc/include/xenctrl.h |  4 ++++
 tools/libxc/xc_domain.c       | 38 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 42 insertions(+)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 1629f41..9730285 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -1670,6 +1670,10 @@ int xc_assign_device(xc_interface *xch,
                      uint32_t machine_sbdf,
                      uint32_t flag);
 
+int xc_hide_device(xc_interface *xch, uint32_t machine_bdf);
+int xc_unhide_device(xc_interface *xch, uint32_t machine_bdf);
+int xc_test_hidden_device(xc_interface *xch, uint32_t machine_bdf);
+
 int xc_get_device_group(xc_interface *xch,
                      uint32_t domid,
                      uint32_t machine_sbdf,
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index 00909ad4..714d632 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -1501,6 +1501,44 @@ int xc_assign_device(
     return do_domctl(xch, &domctl);
 }
 
+int xc_hide_device(
+    xc_interface *xch,
+    uint32_t machine_sbdf)
+{
+    DECLARE_DOMCTL;
+
+    domctl.cmd = XEN_DOMCTL_hide_device;
+    domctl.domain = DOMID_XEN;
+    domctl.u.assign_device.u.pci.machine_sbdf = machine_sbdf;
+
+    return do_domctl(xch, &domctl);
+}
+
+int xc_unhide_device(
+    xc_interface *xch,
+    uint32_t machine_sbdf)
+{
+    DECLARE_DOMCTL;
+
+    domctl.cmd = XEN_DOMCTL_unhide_device;
+    domctl.domain = DOMID_XEN;
+    domctl.u.assign_device.u.pci.machine_sbdf = machine_sbdf;
+
+    return do_domctl(xch, &domctl);
+}
+
+int xc_test_hidden_device(
+    xc_interface *xch,
+    uint32_t machine_sbdf)
+{
+    DECLARE_DOMCTL;
+
+    domctl.cmd = XEN_DOMCTL_test_hidden_device;
+    domctl.u.assign_device.u.pci.machine_sbdf = machine_sbdf;
+
+    return do_domctl(xch, &domctl);
+}
+
 int xc_get_device_group(
     xc_interface *xch,
     uint32_t domid,

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 4/6] libxl: Add wrappers for new commands and add AER error handler
  2017-06-27 17:14 [PATCH 0/6] AER unrecoverable error containment Venu Busireddy
                   ` (2 preceding siblings ...)
  2017-06-27 17:14 ` [PATCH 3/6] libxc: Add wrappers for new commands Venu Busireddy
@ 2017-06-27 17:14 ` Venu Busireddy
  2017-06-30 10:18   ` Wei Liu
  2017-06-27 17:14 ` [PATCH 5/6] tools/python/xc: Update pyxc_methods with new commands Venu Busireddy
  2017-06-27 17:14 ` [PATCH 6/6] docs: Document the " Venu Busireddy
  5 siblings, 1 reply; 25+ messages in thread
From: Venu Busireddy @ 2017-06-27 17:14 UTC (permalink / raw)
  To: venu.busireddy, Ian Jackson, Wei Liu; +Cc: xen-devel

libxl: Add wrappers for new commands and add AER error handler

Add wrappers for the newly introduced commands "pci-assignable-hide",
"pci-assignable-unhide", and "pci-assignable-list-hidden".

Implement the callback function to handle unrecoverable AER errors.

Signed-off-by: Venu Busireddy <venu.busireddy@oracle.com>
---
 tools/libxl/libxl.h       |   3 +
 tools/libxl/libxl_event.h |   2 +
 tools/libxl/libxl_pci.c   | 150 ++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 155 insertions(+)

diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index cf8687a..5a5bd14 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -1944,6 +1944,9 @@ int libxl_device_events_handler(libxl_ctx *ctx,
 int libxl_device_pci_assignable_add(libxl_ctx *ctx, libxl_device_pci *pcidev, int rebind);
 int libxl_device_pci_assignable_remove(libxl_ctx *ctx, libxl_device_pci *pcidev, int rebind);
 libxl_device_pci *libxl_device_pci_assignable_list(libxl_ctx *ctx, int *num);
+int libxl_device_pci_assignable_hide(libxl_ctx *ctx, libxl_device_pci *pcidev);
+int libxl_device_pci_assignable_unhide(libxl_ctx *ctx, libxl_device_pci *pcidev);
+int libxl_device_pci_assignable_is_hidden(libxl_ctx *ctx, libxl_device_pci *pcidev);
 
 /* CPUID handling */
 int libxl_cpuid_parse_config(libxl_cpuid_policy_list *cpuid, const char* str);
diff --git a/tools/libxl/libxl_event.h b/tools/libxl/libxl_event.h
index 1ea789e..4c78798 100644
--- a/tools/libxl/libxl_event.h
+++ b/tools/libxl/libxl_event.h
@@ -178,6 +178,8 @@ void libxl_event_register_callbacks(libxl_ctx *ctx,
 typedef struct libxl__evgen_domain_death libxl_evgen_domain_death;
 int libxl_evenable_domain_death(libxl_ctx *ctx, uint32_t domid,
                          libxl_ev_user, libxl_evgen_domain_death **evgen_out);
+int libxl_reg_aer_events_handler(libxl_ctx *, uint32_t) LIBXL_EXTERNAL_CALLERS_ONLY;
+void libxl_unreg_aer_events_handler(libxl_ctx *, uint32_t);
 void libxl_evdisable_domain_death(libxl_ctx *ctx, libxl_evgen_domain_death*);
   /* Arranges for the generation of DOMAIN_SHUTDOWN and DOMAIN_DEATH
    * events.  A domain which is destroyed before it shuts down
diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
index b14df16..e6996e5 100644
--- a/tools/libxl/libxl_pci.c
+++ b/tools/libxl/libxl_pci.c
@@ -874,6 +874,42 @@ int libxl_device_pci_assignable_add(libxl_ctx *ctx, libxl_device_pci *pcidev,
     return rc;
 }
 
+int libxl_device_pci_assignable_hide(libxl_ctx *ctx, libxl_device_pci *pcidev)
+{
+    GC_INIT(ctx);
+    int rc;
+
+    rc = xc_hide_device(ctx->xch, pcidev_encode_bdf(pcidev));
+    if (rc < 0)
+        LOGD(ERROR, 0, "xc_hide_device failed");
+
+    GC_FREE;
+    return rc;
+}
+
+int libxl_device_pci_assignable_unhide(libxl_ctx *ctx, libxl_device_pci *pcidev)
+{
+    GC_INIT(ctx);
+    int rc;
+
+    rc = xc_unhide_device(ctx->xch, pcidev_encode_bdf(pcidev));
+    if (rc < 0)
+        LOGD(ERROR, 0, "xc_unhide_device failed");
+
+    GC_FREE;
+    return rc;
+}
+
+int libxl_device_pci_assignable_is_hidden(libxl_ctx *ctx, libxl_device_pci *pcidev)
+{
+    GC_INIT(ctx);
+    int rc;
+
+    rc = xc_test_hidden_device(ctx->xch, pcidev_encode_bdf(pcidev));
+
+    GC_FREE;
+    return rc;
+}
 
 int libxl_device_pci_assignable_remove(libxl_ctx *ctx, libxl_device_pci *pcidev,
                                        int rebind)
@@ -1292,6 +1328,120 @@ out:
     return rc;
 }
 
+static void domain_destroy_callback(libxl__egc *egc,
+                                    libxl__domain_destroy_state *dds,
+                                    int rc)
+{
+    STATE_AO_GC(dds->ao);
+
+    if (rc)
+        LOGD(ERROR, dds->domid, "Destruction of domain failed, rc = %d", rc);
+
+    libxl__nested_ao_free(ao);
+}
+
+
+typedef struct {
+    uint32_t domid;
+    libxl__ao *ao;
+    libxl__ev_xswatch watch;
+} libxl_aer_watch;
+static libxl_aer_watch aer_watch;
+
+static void aer_backend_watch_callback(libxl__egc *egc,
+                                       libxl__ev_xswatch *watch,
+                                       const char *watch_path,
+                                       const char *event_path)
+{
+    libxl_aer_watch *l_aer_watch = CONTAINER_OF(watch, *l_aer_watch, watch);
+    libxl__ao *nested_ao = libxl__nested_ao_create(l_aer_watch->ao);
+    STATE_AO_GC(nested_ao);
+    libxl_ctx *ctx = libxl__gc_owner(gc);
+    uint32_t domid = l_aer_watch->domid;
+    uint32_t seg, bus, dev, fn;
+    int rc;
+    char *p, *path, *dst_path;
+    const char *aerFailedSBDF;
+    struct xs_permissions rwperm[1];
+    libxl__domain_destroy_state *dds;
+    GCNEW(dds);
+
+    /* Extract the backend directory. */
+    path = libxl__strdup(gc, event_path);
+    p = strrchr(path, '/');
+    if (p == NULL)
+        goto skip;
+    if (strcmp(p, "/aerFailedSBDF") != 0)
+        goto skip;
+    /* Truncate the string so it points to the backend directory. */
+    *p = '\0';
+
+    /* Fetch the value of the failed PCI device. */
+    rc = libxl__xs_read_checked(gc, XBT_NULL,
+            GCSPRINTF("%s/aerFailedSBDF", path), &aerFailedSBDF);
+    if (rc || !aerFailedSBDF)
+        goto skip;
+    sscanf(aerFailedSBDF, "%x:%x:%x.%x", &seg, &bus, &dev, &fn);
+
+    libxl_unreg_aer_events_handler(ctx, domid);
+
+    dds->ao = nested_ao;
+    dds->domid = domid;
+    dds->callback = domain_destroy_callback;
+    libxl__domain_destroy(egc, dds);
+
+    rc = xc_hide_device(ctx->xch, seg << 16 | bus << 8 | dev << 3 | fn);
+    if (rc)
+        LOGD(ERROR, domid, " xc_hide_device() failed, rc = %d", rc);
+
+    rwperm[0].id = 0;
+    rwperm[0].perms = XS_PERM_READ | XS_PERM_WRITE;
+    dst_path = GCSPRINTF("/local/domain/0/backend/pci/0/0/%s", "aerFailedPCIs");
+    rc = libxl__xs_mknod(gc, XBT_NULL, dst_path, rwperm, 1);
+    if (rc) {
+        LOGD(ERROR, domid, " libxl__xs_mknod() failed, rc = %d", rc);
+        goto skip;
+    }
+
+    rc = libxl__xs_write_checked(gc, XBT_NULL, dst_path, aerFailedSBDF);
+    if (rc)
+        LOGD(ERROR, domid, " libxl__xs_write_checked() failed, rc = %d", rc);
+
+skip:
+    return;
+}
+
+/* Handler of events for device driver domains */
+int libxl_reg_aer_events_handler(libxl_ctx *ctx, uint32_t domid)
+{
+    AO_CREATE(ctx, 0, 0);
+    int rc;
+    char *be_path;
+
+    /*
+     * We use absolute paths because we want xswatch to also return
+     * absolute paths that can be parsed by libxl__parse_backend_path.
+     */
+    aer_watch.ao = ao;
+    aer_watch.domid = domid;
+    be_path = GCSPRINTF("/local/domain/0/backend/pci/%u/0/aerFailedSBDF", domid);
+    rc = libxl__ev_xswatch_register(gc, &aer_watch.watch,
+                                    aer_backend_watch_callback, be_path);
+    if (rc)
+        return AO_CREATE_FAIL(rc);
+
+    return AO_INPROGRESS;
+}
+
+/* Handler of events for device driver domains */
+void libxl_unreg_aer_events_handler(libxl_ctx *ctx, uint32_t domid)
+{
+    GC_INIT(ctx);
+
+    libxl__ev_xswatch_deregister(gc, &aer_watch.watch);
+    return;
+}
+
 static void libxl__add_pcidevs(libxl__egc *egc, libxl__ao *ao, uint32_t domid,
                                libxl_domain_config *d_config,
                                libxl__multidev *multidev)

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 5/6] tools/python/xc: Update pyxc_methods with new commands
  2017-06-27 17:14 [PATCH 0/6] AER unrecoverable error containment Venu Busireddy
                   ` (3 preceding siblings ...)
  2017-06-27 17:14 ` [PATCH 4/6] libxl: Add wrappers for new commands and add AER error handler Venu Busireddy
@ 2017-06-27 17:14 ` Venu Busireddy
  2017-06-27 17:14 ` [PATCH 6/6] docs: Document the " Venu Busireddy
  5 siblings, 0 replies; 25+ messages in thread
From: Venu Busireddy @ 2017-06-27 17:14 UTC (permalink / raw)
  To: venu.busireddy, Ian Jackson, Wei Liu,
	Marek Marczykowski-Górecki
  Cc: xen-devel

tools/python/xc: Update pyxc_methods with new commands

Add pyxc_unhide_device() and pyxc_hide_device(), and update pyxc_methods.

Signed-off-by: Venu Busireddy <venu.busireddy@oracle.com>
---
 tools/python/xen/lowlevel/xc/xc.c | 84 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 84 insertions(+)

diff --git a/tools/python/xen/lowlevel/xc/xc.c b/tools/python/xen/lowlevel/xc/xc.c
index 5d112af..ad822df 100644
--- a/tools/python/xen/lowlevel/xc/xc.c
+++ b/tools/python/xen/lowlevel/xc/xc.c
@@ -609,6 +609,76 @@ static PyObject *pyxc_deassign_device(XcObject *self,
     return Py_BuildValue("i", sbdf);
 }
 
+static PyObject *pyxc_hide_device(XcObject *self,
+                                  PyObject *args,
+                                  PyObject *kwds)
+{
+    uint32_t sbdf = 0;
+    char *pci_str;
+    int seg, bus, dev, func;
+    static char *kwd_list[] = { "pci", NULL };
+
+    if ( !PyArg_ParseTupleAndKeywords(args, kwds, "is", kwd_list, &pci_str) )
+    {
+        sbdf = -1;
+        goto end_hide;
+    }
+
+    while ( next_bdf(&pci_str, &seg, &bus, &dev, &func) )
+    {
+        sbdf = seg << 16;
+        sbdf |= (bus & 0xff) << 8;
+        sbdf |= (dev & 0x1f) << 3;
+        sbdf |= (func & 0x7);
+
+        if ( xc_hide_device(self->xc_handle, sbdf) != 0 )
+        {
+            if ( errno == ENOSYS )
+                sbdf = -1;
+            break;
+        }
+        sbdf = 0;
+    }
+
+end_hide:
+    return Py_BuildValue("i", sbdf);
+}
+
+static PyObject *pyxc_unhide_device(XcObject *self,
+                                    PyObject *args,
+                                    PyObject *kwds)
+{
+    uint32_t sbdf = 0;
+    char *pci_str;
+    int seg, bus, dev, func;
+    static char *kwd_list[] = { "pci", NULL };
+
+    if ( !PyArg_ParseTupleAndKeywords(args, kwds, "is", kwd_list, &pci_str) )
+    {
+        sbdf = -1;
+        goto end_unhide;
+    }
+
+    while ( next_bdf(&pci_str, &seg, &bus, &dev, &func) )
+    {
+        sbdf = seg << 16;
+        sbdf |= (bus & 0xff) << 8;
+        sbdf |= (dev & 0x1f) << 3;
+        sbdf |= (func & 0x7);
+
+        if ( xc_unhide_device(self->xc_handle, sbdf) != 0 )
+        {
+            if ( errno == ENOSYS )
+                sbdf = -1;
+            break;
+        }
+        sbdf = 0;
+    }
+
+end_unhide:
+    return Py_BuildValue("i", sbdf);
+}
+
 static PyObject *pyxc_get_device_group(XcObject *self,
                                          PyObject *args)
 {
@@ -2233,7 +2303,21 @@ static PyMethodDef pyxc_methods[] = {
        " dom     [int]:      Domain to deassign device from.\n"
        " pci_str [str]:      PCI devices.\n"
        "Returns: [int] 0 on success, or device bdf that can't be deassigned.\n" },
+
+     { "hide_device",
+       (PyCFunction)pyxc_hide_device,
+       METH_VARARGS | METH_KEYWORDS, "\n"
+       "Hide device after AER fatal error trigger.\n"
+       " pci_str [str]:      PCI devices.\n"
+       "Returns: [int] device bdf on success or -1 if it cant be hidden.\n" },
   
+     { "unhide_device",
+       (PyCFunction)pyxc_unhide_device,
+       METH_VARARGS | METH_KEYWORDS, "\n"
+       "Unhide hidden device after AER fatal error trigger.\n"
+       " pci_str [str]:      PCI devices.\n"
+       "Returns: [int] device bdf on success or -1 if it cant be unhidden.\n" },
+
     { "sched_id_get",
       (PyCFunction)pyxc_sched_id_get,
       METH_NOARGS, "\n"

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 6/6] docs: Document the new commands.
  2017-06-27 17:14 [PATCH 0/6] AER unrecoverable error containment Venu Busireddy
                   ` (4 preceding siblings ...)
  2017-06-27 17:14 ` [PATCH 5/6] tools/python/xc: Update pyxc_methods with new commands Venu Busireddy
@ 2017-06-27 17:14 ` Venu Busireddy
  5 siblings, 0 replies; 25+ messages in thread
From: Venu Busireddy @ 2017-06-27 17:14 UTC (permalink / raw)
  To: venu.busireddy, Ian Jackson, Wei Liu; +Cc: xen-devel

docs: Document the new commands.

Add documentation for the newly added commands "pci-assignable-hide",
"pci-assignable-unhide", and "pci-assignable-list-hidden".

Signed-off-by: Venu Busireddy <venu.busireddy@oracle.com>
---
 docs/man/xl.pod.1.in | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/docs/man/xl.pod.1.in b/docs/man/xl.pod.1.in
index 78bf884..86f7089 100644
--- a/docs/man/xl.pod.1.in
+++ b/docs/man/xl.pod.1.in
@@ -1462,6 +1462,13 @@ These are devices in the system which are configured to be
 available for passthrough and are bound to a suitable PCI
 backend driver in domain 0 rather than a real driver.
 
+=item B<pci-assignable-list-hidden>
+
+List all the assignable PCI devices that are hidden.
+When a PCI device assigned to a guest in passthrough mode causes
+unrecoverable AER errors, the hypervisor shuts down the guest and hides
+the device from being assignable to the guests.
+
 =item B<pci-assignable-add> I<BDF>
 
 Make the device at PCI Bus/Device/Function BDF assignable to guests.
@@ -1484,6 +1491,23 @@ it will also attempt to re-bind the device to its original driver, making it
 usable by Domain 0 again.  If the device is not bound to pciback, it will
 return success.
 
+=item B<pci-assignable-hide> I<BDF>
+
+Hide the device at PCI Bus/Device/Function BDF from being assignable
+to guests, similar to the way the hypervisor would hide the device that
+caused unrecoverable AER errors.
+When a PCI device assigned to a guest in passthrough mode causes
+unrecoverable AER errors, the hypervisor shuts down the guest and hides
+the device from being assignable to the guests.
+
+=item B<pci-assignable-unhide> I<BDF>
+
+Unhide the device at PCI Bus/Device/Function BDF that was previously
+hidden by the hypervisor due to unrecoverable AER errors.
+When a PCI device assigned to a guest in passthrough mode causes
+unrecoverable AER errors, the hypervisor shuts down the guest and hides
+the device from being assignable to the guests.
+
 =item B<pci-attach> I<domain-id> I<BDF>
 
 Hot-plug a new pass-through pci device to the specified domain.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/6] libxc: Add wrappers for new commands
  2017-06-27 17:14 ` [PATCH 3/6] libxc: Add wrappers for new commands Venu Busireddy
@ 2017-06-29 17:52   ` Wei Liu
  0 siblings, 0 replies; 25+ messages in thread
From: Wei Liu @ 2017-06-29 17:52 UTC (permalink / raw)
  To: Venu Busireddy; +Cc: Wei Liu, Ian Jackson, xen-devel

On Tue, Jun 27, 2017 at 12:14:55PM -0500, Venu Busireddy wrote:
> libxc: Add wrappers for new commands

Extraneous line here.

> 
> Add wrappers for the newly introduced commands "pci-assignable-hide",
> "pci-assignable-unhide", and "pci-assignable-list-hidden".
> 
> Signed-off-by: Venu Busireddy <venu.busireddy@oracle.com>
> ---
>  tools/libxc/include/xenctrl.h |  4 ++++
>  tools/libxc/xc_domain.c       | 38 ++++++++++++++++++++++++++++++++++++++
>  2 files changed, 42 insertions(+)
> 
> diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
> index 1629f41..9730285 100644
> --- a/tools/libxc/include/xenctrl.h
> +++ b/tools/libxc/include/xenctrl.h
> @@ -1670,6 +1670,10 @@ int xc_assign_device(xc_interface *xch,
>                       uint32_t machine_sbdf,
>                       uint32_t flag);
>  
> +int xc_hide_device(xc_interface *xch, uint32_t machine_bdf);
> +int xc_unhide_device(xc_interface *xch, uint32_t machine_bdf);
> +int xc_test_hidden_device(xc_interface *xch, uint32_t machine_bdf);
> +
>  int xc_get_device_group(xc_interface *xch,
>                       uint32_t domid,
>                       uint32_t machine_sbdf,
> diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
> index 00909ad4..714d632 100644
> --- a/tools/libxc/xc_domain.c
> +++ b/tools/libxc/xc_domain.c
> @@ -1501,6 +1501,44 @@ int xc_assign_device(
>      return do_domctl(xch, &domctl);
>  }
>  
> +int xc_hide_device(
> +    xc_interface *xch,
> +    uint32_t machine_sbdf)


  int xc_hide_device(xc_interface *xch, uint32_t machine_sbdf)

Otherwise this patch looks good.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/6] xl: Add commands for hiding and unhiding pcie passthrough devices
  2017-06-27 17:14 ` [PATCH 2/6] xl: Add commands " Venu Busireddy
@ 2017-06-30 10:18   ` Wei Liu
  2017-07-05 19:52     ` Venu Busireddy
  0 siblings, 1 reply; 25+ messages in thread
From: Wei Liu @ 2017-06-30 10:18 UTC (permalink / raw)
  To: Venu Busireddy; +Cc: Wei Liu, Ian Jackson, xen-devel

I haven't reviewed the code in detail, but I have some questions
regarding the design. See the end of this email.

On Tue, Jun 27, 2017 at 12:14:54PM -0500, Venu Busireddy wrote:
>  
> +static void pciassignable_list_hidden(void)
> +{
> +    libxl_device_pci *pcidevs;
> +    int num, i;
> +
> +    pcidevs = libxl_device_pci_assignable_list(ctx, &num);
> +
> +    if ( pcidevs == NULL )

Coding style.

> +        return;
> +    for (i = 0; i < num; i++) {
> +        if (libxl_device_pci_assignable_is_hidden(ctx, &pcidevs[i]))
> +            printf("%04x:%02x:%02x.%01x\n",
> +                   pcidevs[i].domain, pcidevs[i].bus, pcidevs[i].dev, pcidevs[i].func);
> +        libxl_device_pci_dispose(&pcidevs[i]);
> +    }
> +    free(pcidevs);
> +}
> +
> +int main_pciassignable_list_hidden(int argc, char **argv)
> +{
> +    int opt;
> +
> +    SWITCH_FOREACH_OPT(opt, "", NULL, "pci-assignable-list-hidden", 0) {
> +        /* No options */
> +    }
> +
> +    pciassignable_list_hidden();
> +    return 0;
> +}
> +
> +static int pciassignable_hide(const char *bdf)
> +{
> +    libxl_device_pci pcidev;
> +    XLU_Config *config;
> +    int r = EXIT_SUCCESS;
> +
> +    libxl_device_pci_init(&pcidev);
> +
> +    config = xlu_cfg_init(stderr, "command line");
> +    if (!config) {
> +        perror("xlu_cfg_init");
> +        exit(-1);

If you don't want EXIT_FAILURE, please document these exit values
somewhere -- manpage would be a good place.

> +    }
> +
> +    if (xlu_pci_parse_bdf(config, &pcidev, bdf)) {
> +        fprintf(stderr, "pci-assignable-hide: malformed BDF specification \"%s\"\n", bdf);
> +        exit(2);
> +    }
> +
> +    if (libxl_device_pci_assignable_hide(ctx, &pcidev))
> +        r = EXIT_FAILURE;
> +
> +    libxl_device_pci_dispose(&pcidev);
> +    xlu_cfg_destroy(config);
> +
> +    return r;
> +}
> +
> +int main_pciassignable_hide(int argc, char **argv)
> +{
> +    int opt;
> +    const char *bdf = NULL;
> +
> +    SWITCH_FOREACH_OPT(opt, "", NULL, "main_pciassignable_hide", 1) {
> +        /* No options */
> +    }
> +
> +    bdf = argv[optind];
> +
> +    if (pciassignable_hide(bdf))
> +        return EXIT_FAILURE;
> +
> +    return EXIT_SUCCESS;
> +}
[...]
> diff --git a/tools/xl/xl_vmcontrol.c b/tools/xl/xl_vmcontrol.c
> index 89c2b25..10a48a9 100644
> --- a/tools/xl/xl_vmcontrol.c
> +++ b/tools/xl/xl_vmcontrol.c
> @@ -966,6 +966,15 @@ start:
>      LOG("Waiting for domain %s (domid %u) to die [pid %ld]",
>          d_config.c_info.name, domid, (long)getpid());
>  
> +    ret = libxl_reg_aer_events_handler(ctx, domid);
> +    if (ret) {
> +        /*
> +         * This error may not be severe enough to fail the creation of the VM.
> +         * Log the error, and continue with the creation.
> +         */
> +        LOG("libxl_reg_aer_events_handler() failed, ret = 0x%08x", ret);
> +    }
> +

First thing this suggests the ordering of this patch series is wrong --
you need to put the patch that implements the new function before this.

The other thing you need to be aware is that if the user chooses to not
use a daemonised xl, he / she doesn't get a chance to handle these
events.

This is potentially problematic for driver domains. You probably want to
also modify xl devd command. Also on the subject, what's your thought on
driver domain? I'm not sure if a driver domain has the permission to
kill the guest.

>      ret = libxl_evenable_domain_death(ctx, domid, 0, &deathw);
>      if (ret) goto out;
>  
> @@ -993,6 +1002,7 @@ start:
>              LOG("Domain %u has shut down, reason code %d 0x%x", domid,
>                  event->u.domain_shutdown.shutdown_reason,
>                  event->u.domain_shutdown.shutdown_reason);
> +            libxl_unreg_aer_events_handler(ctx, domid);
>              switch (handle_domain_death(&domid, event, &d_config)) {
>              case DOMAIN_RESTART_SOFT_RESET:
>                  domid_soft_reset = domid;
> @@ -1059,6 +1069,7 @@ start:
>  
>          case LIBXL_EVENT_TYPE_DOMAIN_DEATH:
>              LOG("Domain %u has been destroyed.", domid);
> +            libxl_unreg_aer_events_handler(ctx, domid);
>              libxl_event_free(ctx, event);
>              ret = 0;
>              goto out;

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/6] libxl: Add wrappers for new commands and add AER error handler
  2017-06-27 17:14 ` [PATCH 4/6] libxl: Add wrappers for new commands and add AER error handler Venu Busireddy
@ 2017-06-30 10:18   ` Wei Liu
  2017-07-05 20:06     ` Venu Busireddy
  0 siblings, 1 reply; 25+ messages in thread
From: Wei Liu @ 2017-06-30 10:18 UTC (permalink / raw)
  To: Venu Busireddy; +Cc: Wei Liu, Ian Jackson, xen-devel

On Tue, Jun 27, 2017 at 12:14:56PM -0500, Venu Busireddy wrote:
> libxl: Add wrappers for new commands and add AER error handler

Extraneous line.

> 
> Add wrappers for the newly introduced commands "pci-assignable-hide",
> "pci-assignable-unhide", and "pci-assignable-list-hidden".
> 
> Implement the callback function to handle unrecoverable AER errors.
> 
> Signed-off-by: Venu Busireddy <venu.busireddy@oracle.com>
> ---
>  tools/libxl/libxl.h       |   3 +
>  tools/libxl/libxl_event.h |   2 +
>  tools/libxl/libxl_pci.c   | 150 ++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 155 insertions(+)
> 
> diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
> index cf8687a..5a5bd14 100644
> --- a/tools/libxl/libxl.h
> +++ b/tools/libxl/libxl.h
> @@ -1944,6 +1944,9 @@ int libxl_device_events_handler(libxl_ctx *ctx,
>  int libxl_device_pci_assignable_add(libxl_ctx *ctx, libxl_device_pci *pcidev, int rebind);
>  int libxl_device_pci_assignable_remove(libxl_ctx *ctx, libxl_device_pci *pcidev, int rebind);
>  libxl_device_pci *libxl_device_pci_assignable_list(libxl_ctx *ctx, int *num);
> +int libxl_device_pci_assignable_hide(libxl_ctx *ctx, libxl_device_pci *pcidev);
> +int libxl_device_pci_assignable_unhide(libxl_ctx *ctx, libxl_device_pci *pcidev);
> +int libxl_device_pci_assignable_is_hidden(libxl_ctx *ctx, libxl_device_pci *pcidev);
>  
>  /* CPUID handling */
>  int libxl_cpuid_parse_config(libxl_cpuid_policy_list *cpuid, const char* str);
> diff --git a/tools/libxl/libxl_event.h b/tools/libxl/libxl_event.h
> index 1ea789e..4c78798 100644
> --- a/tools/libxl/libxl_event.h
> +++ b/tools/libxl/libxl_event.h
> @@ -178,6 +178,8 @@ void libxl_event_register_callbacks(libxl_ctx *ctx,
>  typedef struct libxl__evgen_domain_death libxl_evgen_domain_death;
>  int libxl_evenable_domain_death(libxl_ctx *ctx, uint32_t domid,
>                           libxl_ev_user, libxl_evgen_domain_death **evgen_out);
> +int libxl_reg_aer_events_handler(libxl_ctx *, uint32_t) LIBXL_EXTERNAL_CALLERS_ONLY;
> +void libxl_unreg_aer_events_handler(libxl_ctx *, uint32_t);
>  void libxl_evdisable_domain_death(libxl_ctx *ctx, libxl_evgen_domain_death*);
>    /* Arranges for the generation of DOMAIN_SHUTDOWN and DOMAIN_DEATH
>     * events.  A domain which is destroyed before it shuts down
> diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
> index b14df16..e6996e5 100644
> --- a/tools/libxl/libxl_pci.c
> +++ b/tools/libxl/libxl_pci.c
> @@ -874,6 +874,42 @@ int libxl_device_pci_assignable_add(libxl_ctx *ctx, libxl_device_pci *pcidev,
>      return rc;
>  }
>  
> +int libxl_device_pci_assignable_hide(libxl_ctx *ctx, libxl_device_pci *pcidev)
> +{
> +    GC_INIT(ctx);
> +    int rc;

int r;

> +
> +    rc = xc_hide_device(ctx->xch, pcidev_encode_bdf(pcidev));

r = xc_...

> +    if (rc < 0)

if (r < 0) { LOG(...); rc = ERROR_???; }

> +        LOGD(ERROR, 0, "xc_hide_device failed");
> +
> +    GC_FREE;
> +    return rc;
> +}
> +
[...]
>  
>  int libxl_device_pci_assignable_remove(libxl_ctx *ctx, libxl_device_pci *pcidev,
>                                         int rebind)
> @@ -1292,6 +1328,120 @@ out:
>      return rc;
>  }
>  
> +static void domain_destroy_callback(libxl__egc *egc,
> +                                    libxl__domain_destroy_state *dds,
> +                                    int rc)
> +{
> +    STATE_AO_GC(dds->ao);
> +
> +    if (rc)
> +        LOGD(ERROR, dds->domid, "Destruction of domain failed, rc = %d", rc);
> +
> +    libxl__nested_ao_free(ao);
> +}
> +
> +
> +typedef struct {
> +    uint32_t domid;
> +    libxl__ao *ao;
> +    libxl__ev_xswatch watch;
> +} libxl_aer_watch;
> +static libxl_aer_watch aer_watch;
> +
> +static void aer_backend_watch_callback(libxl__egc *egc,
> +                                       libxl__ev_xswatch *watch,
> +                                       const char *watch_path,
> +                                       const char *event_path)
> +{
> +    libxl_aer_watch *l_aer_watch = CONTAINER_OF(watch, *l_aer_watch, watch);
> +    libxl__ao *nested_ao = libxl__nested_ao_create(l_aer_watch->ao);
> +    STATE_AO_GC(nested_ao);
> +    libxl_ctx *ctx = libxl__gc_owner(gc);
> +    uint32_t domid = l_aer_watch->domid;
> +    uint32_t seg, bus, dev, fn;
> +    int rc;
> +    char *p, *path, *dst_path;
> +    const char *aerFailedSBDF;
> +    struct xs_permissions rwperm[1];
> +    libxl__domain_destroy_state *dds;
> +    GCNEW(dds);
> +
> +    /* Extract the backend directory. */
> +    path = libxl__strdup(gc, event_path);
> +    p = strrchr(path, '/');
> +    if (p == NULL)
> +        goto skip;
> +    if (strcmp(p, "/aerFailedSBDF") != 0)
> +        goto skip;
> +    /* Truncate the string so it points to the backend directory. */
> +    *p = '\0';
> +
> +    /* Fetch the value of the failed PCI device. */
> +    rc = libxl__xs_read_checked(gc, XBT_NULL,
> +            GCSPRINTF("%s/aerFailedSBDF", path), &aerFailedSBDF);
> +    if (rc || !aerFailedSBDF)
> +        goto skip;
> +    sscanf(aerFailedSBDF, "%x:%x:%x.%x", &seg, &bus, &dev, &fn);
> +
> +    libxl_unreg_aer_events_handler(ctx, domid);

You need to be careful about calling a public API. In this case I think
either calling libxl__ev_xswatch_deregister directly or having something
like libxl__unregister_aer_events_handler(libxl__gc *gc, ...).

> +
> +    dds->ao = nested_ao;
> +    dds->domid = domid;
> +    dds->callback = domain_destroy_callback;
> +    libxl__domain_destroy(egc, dds);
> +
> +    rc = xc_hide_device(ctx->xch, seg << 16 | bus << 8 | dev << 3 | fn);

Please provide a helper function or macro to return sbdf form instead of
open coding.

> +    if (rc)
> +        LOGD(ERROR, domid, " xc_hide_device() failed, rc = %d", rc);
> +
> +    rwperm[0].id = 0;
> +    rwperm[0].perms = XS_PERM_READ | XS_PERM_WRITE;
> +    dst_path = GCSPRINTF("/local/domain/0/backend/pci/0/0/%s", "aerFailedPCIs");
> +    rc = libxl__xs_mknod(gc, XBT_NULL, dst_path, rwperm, 1);
> +    if (rc) {
> +        LOGD(ERROR, domid, " libxl__xs_mknod() failed, rc = %d", rc);
> +        goto skip;
> +    }
> +
> +    rc = libxl__xs_write_checked(gc, XBT_NULL, dst_path, aerFailedSBDF);
> +    if (rc)
> +        LOGD(ERROR, domid, " libxl__xs_write_checked() failed, rc = %d", rc);
> +
> +skip:
> +    return;
> +}
> +
> +/* Handler of events for device driver domains */
> +int libxl_reg_aer_events_handler(libxl_ctx *ctx, uint32_t domid)
> +{
> +    AO_CREATE(ctx, 0, 0);
> +    int rc;
> +    char *be_path;
> +
> +    /*
> +     * We use absolute paths because we want xswatch to also return
> +     * absolute paths that can be parsed by libxl__parse_backend_path.
> +     */
> +    aer_watch.ao = ao;
> +    aer_watch.domid = domid;
> +    be_path = GCSPRINTF("/local/domain/0/backend/pci/%u/0/aerFailedSBDF", domid);

Again, what about driver domain?

> +    rc = libxl__ev_xswatch_register(gc, &aer_watch.watch,
> +                                    aer_backend_watch_callback, be_path);
> +    if (rc)
> +        return AO_CREATE_FAIL(rc);
> +
> +    return AO_INPROGRESS;
> +}
> +
> +/* Handler of events for device driver domains */
> +void libxl_unreg_aer_events_handler(libxl_ctx *ctx, uint32_t domid)

unregister

> +{
> +    GC_INIT(ctx);
> +
> +    libxl__ev_xswatch_deregister(gc, &aer_watch.watch);
>

GC_FREE

> +    return;
> +}
> +
>  static void libxl__add_pcidevs(libxl__egc *egc, libxl__ao *ao, uint32_t domid,
>                                 libxl_domain_config *d_config,
>                                 libxl__multidev *multidev)
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> https://lists.xen.org/xen-devel

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/6] xen: Add support for hiding and unhiding pcie passthrough devices
  2017-06-27 17:14 ` [PATCH 1/6] xen: Add support for hiding and unhiding pcie passthrough devices Venu Busireddy
@ 2017-07-04 15:46   ` Jan Beulich
  2017-07-05 19:38     ` Venu Busireddy
  0 siblings, 1 reply; 25+ messages in thread
From: Jan Beulich @ 2017-07-04 15:46 UTC (permalink / raw)
  To: Venu Busireddy; +Cc: Elena Ufimtseva, Daniel De Graaf, xen-devel

>>> On 27.06.17 at 19:14, <venu.busireddy@oracle.com> wrote:

First of all, please Cc all maintainers of code you modify.

> xen: Add support for hiding and unhiding pcie passthrough devices

Please don't repeat the subject in the body of the mail.

> Add support for hiding and unhiding (by introducing two new hypercall
> subops) pci devices that trigger AER fatal errors while assigned to
> guests in passthrough mode. Hiding of the device is done by assigning
> it to dom_xen dummy domain.

Would you mind explaining why simply de-assigning the device
(with an existing operation) isn't suitable here? (This explanation
would presumably belong either in the description here or in the
cover letter.)

> --- a/xen/common/domctl.c
> +++ b/xen/common/domctl.c
> @@ -393,9 +393,15 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
>      {
>      case XEN_DOMCTL_createdomain:
>      case XEN_DOMCTL_test_assign_device:
> +    case XEN_DOMCTL_test_hidden_device:
>      case XEN_DOMCTL_gdbsx_guestmemio:
>          d = NULL;
>          break;
> +    case XEN_DOMCTL_hide_device:
> +    case XEN_DOMCTL_unhide_device:
> +        rcu_lock_domain(dom_xen);
> +        d = dom_xen;
> +        break;

I'm opposed to the introduction of new operations which ignore the
input domain ID. See my recent patch eliminating this for
XEN_DOMCTL_test_assign_device [1]. If these really are domain
independent operations, they ought to be sysctls.

> @@ -1333,19 +1334,31 @@ int iommu_remove_device(struct pci_dev *pdev)
>      return hd->platform_ops->remove_device(pdev->devfn, pci_to_dev(pdev));
>  }
>  
> +static bool device_assigned_to_domain(struct domain *d, u16 seg, u8 bus, u8 devfn)
> +{
> +    bool rc = false;
> +
> +    pcidevs_lock();
> +
> +    if ( pci_get_pdev_by_domain(d, seg, bus, devfn) )
> +        rc = true;
> +
> +    pcidevs_unlock();
> +    return rc;
> +}
> +
>  /*
>   * If the device isn't owned by the hardware domain, it means it already
>   * has been assigned to other domain, or it doesn't exist.
>   */
>  static int device_assigned(u16 seg, u8 bus, u8 devfn)
>  {
> -    struct pci_dev *pdev;
> -
> -    pcidevs_lock();
> -    pdev = pci_get_pdev_by_domain(hardware_domain, seg, bus, devfn);
> -    pcidevs_unlock();
> +    return device_assigned_to_domain(hardware_domain, seg, bus, devfn) ? 0 : -EBUSY;
> +}
>  
> -    return pdev ? 0 : -EBUSY;
> +static int device_hidden(u16 seg, u8 bus, u8 devfn)
> +{
> +    return device_assigned_to_domain(dom_xen, seg, bus, devfn) ? -EBUSY : 0;
>  }

At least this new function you add wants to return bool. I cannot
see how -EBUSY could be an appropriate return value for meaning
"yes".

> @@ -1354,6 +1367,22 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
>      struct pci_dev *pdev;
>      int rc = 0;
>  
> +    if ( device_hidden(seg, bus, devfn) )
> +        return -EINVAL;
> +
> +    if ( d == dom_xen )
> +    {
> +        pdev = pci_get_pdev(seg, bus, devfn);
> +        if ( pdev )
> +        {
> +            list_move(&pdev->domain_list, &dom_xen->arch.pdev_list);
> +            pdev->domain = dom_xen;
> +            return rc;
> +        }
> +        else
> +            return -ENODEV;
> +    }
> +
>      if ( !iommu_enabled || !hd->platform_ops )
>          return 0;

Your addition appears to be misplaced (would belong below the
checks seen above). Additionally you fail to acquire the pcidevs
lock. And the code would likely read better if you inverted the
inner if()'s condition and omitted the "else" and the braces.
Finally I'd prefer if you used d instead of dom_xen everywhere
inside the outer if().

> @@ -1679,7 +1743,86 @@ int iommu_do_pci_domctl(
>                     "deassign %04x:%02x:%02x.%u from dom%d failed (%d)\n",
>                     seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
>                     d->domain_id, ret);
> +        break;
> +
> +    case XEN_DOMCTL_hide_device:
> +        machine_sbdf = domctl->u.assign_device.u.pci.machine_sbdf;
> +        ret = xsm_hide_device(XSM_HOOK, d, machine_sbdf);
> +        if ( ret )
> +            break;
> +
> +        if ( unlikely(d->is_dying) )
> +        {
> +            ret = -EAGAIN;
> +            break;
> +        }
> +
> +        seg = machine_sbdf >> 16;
> +        bus = PCI_BUS(machine_sbdf);
> +        devfn = PCI_DEVFN2(machine_sbdf);
> +        flag = domctl->u.assign_device.flag;
> +
> +        if ( device_hidden(seg, bus, devfn) )
> +        {
> +            ret = -EINVAL;
> +            break;
> +        }
> +
> +        pcidevs_lock();
> +        ret = assign_device(dom_xen, seg, bus, devfn, flag);
> +        pcidevs_unlock();
> +        if ( ret == -ERESTART )
> +            ret = hypercall_create_continuation(__HYPERVISOR_domctl,
> +                                                "h", u_domctl);
> +        else if ( ret )
> +            printk(XENLOG_G_ERR "XEN_DOMCTL_hide_device: "
> +                   "hide %04x:%02x:%02x.%u failed (%d)\n",
> +                   seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn), ret);
> +        break;
> +
> +    case XEN_DOMCTL_unhide_device:
> +        machine_sbdf = domctl->u.assign_device.u.pci.machine_sbdf;
> +        ret = xsm_unhide_device(XSM_HOOK, d, machine_sbdf);
> +        if ( ret )
> +            break;
> +
> +        if ( unlikely(d->is_dying) )
> +        {
> +            ret = -EINVAL;
> +            break;
> +        }
> +
> +        seg = machine_sbdf >> 16;
> +        bus = PCI_BUS(machine_sbdf);
> +        devfn = PCI_DEVFN2(machine_sbdf);
> +
> +        if ( !device_hidden(seg, bus, devfn) )
> +        {
> +            ret = -EINVAL;
> +            break;
> +        }
> +
> +        pcidevs_lock();
> +        ret = deassign_device(dom_xen, seg, bus, devfn);
> +        pcidevs_unlock();
> +
> +        if ( ret == -ERESTART )
> +            ret = hypercall_create_continuation(__HYPERVISOR_domctl,
> +                                                "h", u_domctl);
> +        else if ( ret )
> +            printk(XENLOG_G_ERR "XEN_DOMCTL_unhide_device: "
> +                   "assign %04x:%02x:%02x.%u to dom%d failed (%d)\n",
> +                   seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
> +                   d->domain_id, ret);
> +        break;

As it looks you're duplicating a whole lot of code here, with just
minor variations to the original. This is not a good idea
maintenance wise, so you'd have to have a good reason for
doing so.

> --- a/xen/include/public/domctl.h
> +++ b/xen/include/public/domctl.h
> @@ -1222,6 +1222,9 @@ struct xen_domctl {
>  #define XEN_DOMCTL_gdbsx_pausevcpu             1001
>  #define XEN_DOMCTL_gdbsx_unpausevcpu           1002
>  #define XEN_DOMCTL_gdbsx_domstatus             1003
> +#define XEN_DOMCTL_hide_device                 2001
> +#define XEN_DOMCTL_unhide_device               2002
> +#define XEN_DOMCTL_test_hidden_device          2003

Why these strange numbers?

> @@ -1783,6 +1799,9 @@ static struct xsm_operations flask_ops = {
>      .test_assign_device = flask_test_assign_device,
>      .assign_device = flask_assign_device,
>      .deassign_device = flask_deassign_device,
> +    .hide_device = flask_hide_device,
> +    .unhide_device = flask_unhide_device,
> +    .test_hidden_device = flask_test_hidden_device,
>  #endif

This is contrary to what you say in the description, and without
respective fields being added to struct xsm_operations I can't
see how this would build.

Jan

[1] https://lists.xenproject.org/archives/html/xen-devel/2017-06/msg02871.html

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/6] xen: Add support for hiding and unhiding pcie passthrough devices
  2017-07-04 15:46   ` Jan Beulich
@ 2017-07-05 19:38     ` Venu Busireddy
  2017-07-05 20:48       ` Konrad Rzeszutek Wilk
  2017-07-06  8:45       ` Jan Beulich
  0 siblings, 2 replies; 25+ messages in thread
From: Venu Busireddy @ 2017-07-05 19:38 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Elena Ufimtseva, Tim Deegan, Stefano Stabellini, Wei Liu,
	George Dunlap, Andrew Cooper, Ian Jackson, xen-devel,
	Daniel De Graaf

On 2017-07-04 09:46:58 -0600, Jan Beulich wrote:
> >>> On 27.06.17 at 19:14, <venu.busireddy@oracle.com> wrote:
> 
> First of all, please Cc all maintainers of code you modify.

I was using the names spit out by the scripts/get_maintainer.pl script
for the patch file. I didn't know that the script had a "-f" option, and
without it, the script spits out only two names, which I included. I now
have Cc'ed all the names that the "-f" option produced. Interestingly,
Daniel's name is not in the "-f" output, and hence, I am still confused
what the correct list is!

> > xen: Add support for hiding and unhiding pcie passthrough devices
> 
> Please don't repeat the subject in the body of the mail.

This is a mistake. Will fix it.

> > Add support for hiding and unhiding (by introducing two new hypercall
> > subops) pci devices that trigger AER fatal errors while assigned to
> > guests in passthrough mode. Hiding of the device is done by assigning
> > it to dom_xen dummy domain.
> 
> Would you mind explaining why simply de-assigning the device
> (with an existing operation) isn't suitable here? (This explanation
> would presumably belong either in the description here or in the
> cover letter.)

My initial thinking (for the first revision) was that the guest and
the device together are party to the evil things, and hence the guest
should be killed. But I agree that unassigning the device should be
sufficient. Once the device is removed, the guest can't do much that
any other guest can't. Therefore, I will change this patchset to simply
unassign the device from the guest.

Is that acceptable?

> > --- a/xen/common/domctl.c
> > +++ b/xen/common/domctl.c
> > @@ -393,9 +393,15 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
> >      {
> >      case XEN_DOMCTL_createdomain:
> >      case XEN_DOMCTL_test_assign_device:
> > +    case XEN_DOMCTL_test_hidden_device:
> >      case XEN_DOMCTL_gdbsx_guestmemio:
> >          d = NULL;
> >          break;
> > +    case XEN_DOMCTL_hide_device:
> > +    case XEN_DOMCTL_unhide_device:
> > +        rcu_lock_domain(dom_xen);
> > +        d = dom_xen;
> > +        break;
> 
> I'm opposed to the introduction of new operations which ignore the
> input domain ID. See my recent patch eliminating this for
> XEN_DOMCTL_test_assign_device [1]. If these really are domain
> independent operations, they ought to be sysctls.

Do you think there is a need to hide the device after unassigning it
from the guest? If the answer is "no", then this code will go away. If
the answer is "yes", then I will change it as you did in your reference
[1]. Please let me know.

> > @@ -1333,19 +1334,31 @@ int iommu_remove_device(struct pci_dev *pdev)
> >      return hd->platform_ops->remove_device(pdev->devfn, pci_to_dev(pdev));
> >  }
> >  
> > +static bool device_assigned_to_domain(struct domain *d, u16 seg, u8 bus, u8 devfn)
> > +{
> > +    bool rc = false;
> > +
> > +    pcidevs_lock();
> > +
> > +    if ( pci_get_pdev_by_domain(d, seg, bus, devfn) )
> > +        rc = true;
> > +
> > +    pcidevs_unlock();
> > +    return rc;
> > +}
> > +
> >  /*
> >   * If the device isn't owned by the hardware domain, it means it already
> >   * has been assigned to other domain, or it doesn't exist.
> >   */
> >  static int device_assigned(u16 seg, u8 bus, u8 devfn)
> >  {
> > -    struct pci_dev *pdev;
> > -
> > -    pcidevs_lock();
> > -    pdev = pci_get_pdev_by_domain(hardware_domain, seg, bus, devfn);
> > -    pcidevs_unlock();
> > +    return device_assigned_to_domain(hardware_domain, seg, bus, devfn) ? 0 : -EBUSY;
> > +}
> >  
> > -    return pdev ? 0 : -EBUSY;
> > +static int device_hidden(u16 seg, u8 bus, u8 devfn)
> > +{
> > +    return device_assigned_to_domain(dom_xen, seg, bus, devfn) ? -EBUSY : 0;
> >  }
> 
> At least this new function you add wants to return bool. I cannot
> see how -EBUSY could be an appropriate return value for meaning
> "yes".

Will change, if this code stays (depends on the answer to question above).

> 
> > @@ -1354,6 +1367,22 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
> >      struct pci_dev *pdev;
> >      int rc = 0;
> >  
> > +    if ( device_hidden(seg, bus, devfn) )
> > +        return -EINVAL;
> > +
> > +    if ( d == dom_xen )
> > +    {
> > +        pdev = pci_get_pdev(seg, bus, devfn);
> > +        if ( pdev )
> > +        {
> > +            list_move(&pdev->domain_list, &dom_xen->arch.pdev_list);
> > +            pdev->domain = dom_xen;
> > +            return rc;
> > +        }
> > +        else
> > +            return -ENODEV;
> > +    }
> > +
> >      if ( !iommu_enabled || !hd->platform_ops )
> >          return 0;
> 
> Your addition appears to be misplaced (would belong below the

Will change, if this code stays...

> checks seen above). Additionally you fail to acquire the pcidevs

I am acquiring the lock in iommu_do_pci_domctl() in the case
"XEN_DOMCTL_hide_device." Is that not sufficient?

> lock. And the code would likely read better if you inverted the
> inner if()'s condition and omitted the "else" and the braces.
> Finally I'd prefer if you used d instead of dom_xen everywhere
> inside the outer if().

Will change, if this code stays...

> > @@ -1679,7 +1743,86 @@ int iommu_do_pci_domctl(
> >                     "deassign %04x:%02x:%02x.%u from dom%d failed (%d)\n",
> >                     seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
> >                     d->domain_id, ret);
> > +        break;
> > +
> > +    case XEN_DOMCTL_hide_device:
> > +        machine_sbdf = domctl->u.assign_device.u.pci.machine_sbdf;
> > +        ret = xsm_hide_device(XSM_HOOK, d, machine_sbdf);
> > +        if ( ret )
> > +            break;
> > +
> > +        if ( unlikely(d->is_dying) )
> > +        {
> > +            ret = -EAGAIN;
> > +            break;
> > +        }
> > +
> > +        seg = machine_sbdf >> 16;
> > +        bus = PCI_BUS(machine_sbdf);
> > +        devfn = PCI_DEVFN2(machine_sbdf);
> > +        flag = domctl->u.assign_device.flag;
> > +
> > +        if ( device_hidden(seg, bus, devfn) )
> > +        {
> > +            ret = -EINVAL;
> > +            break;
> > +        }
> > +
> > +        pcidevs_lock();
> > +        ret = assign_device(dom_xen, seg, bus, devfn, flag);
> > +        pcidevs_unlock();
> > +        if ( ret == -ERESTART )
> > +            ret = hypercall_create_continuation(__HYPERVISOR_domctl,
> > +                                                "h", u_domctl);
> > +        else if ( ret )
> > +            printk(XENLOG_G_ERR "XEN_DOMCTL_hide_device: "
> > +                   "hide %04x:%02x:%02x.%u failed (%d)\n",
> > +                   seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn), ret);
> > +        break;
> > +
> > +    case XEN_DOMCTL_unhide_device:
> > +        machine_sbdf = domctl->u.assign_device.u.pci.machine_sbdf;
> > +        ret = xsm_unhide_device(XSM_HOOK, d, machine_sbdf);
> > +        if ( ret )
> > +            break;
> > +
> > +        if ( unlikely(d->is_dying) )
> > +        {
> > +            ret = -EINVAL;
> > +            break;
> > +        }
> > +
> > +        seg = machine_sbdf >> 16;
> > +        bus = PCI_BUS(machine_sbdf);
> > +        devfn = PCI_DEVFN2(machine_sbdf);
> > +
> > +        if ( !device_hidden(seg, bus, devfn) )
> > +        {
> > +            ret = -EINVAL;
> > +            break;
> > +        }
> > +
> > +        pcidevs_lock();
> > +        ret = deassign_device(dom_xen, seg, bus, devfn);
> > +        pcidevs_unlock();
> > +
> > +        if ( ret == -ERESTART )
> > +            ret = hypercall_create_continuation(__HYPERVISOR_domctl,
> > +                                                "h", u_domctl);
> > +        else if ( ret )
> > +            printk(XENLOG_G_ERR "XEN_DOMCTL_unhide_device: "
> > +                   "assign %04x:%02x:%02x.%u to dom%d failed (%d)\n",
> > +                   seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
> > +                   d->domain_id, ret);
> > +        break;
> 
> As it looks you're duplicating a whole lot of code here, with just
> minor variations to the original. This is not a good idea
> maintenance wise, so you'd have to have a good reason for
> doing so.

The only good reason I have is that this is much easier to read. If you
prefer to code the way you coded in your reference [1], I can change
it. Do you want me to?

> > --- a/xen/include/public/domctl.h
> > +++ b/xen/include/public/domctl.h
> > @@ -1222,6 +1222,9 @@ struct xen_domctl {
> >  #define XEN_DOMCTL_gdbsx_pausevcpu             1001
> >  #define XEN_DOMCTL_gdbsx_unpausevcpu           1002
> >  #define XEN_DOMCTL_gdbsx_domstatus             1003
> > +#define XEN_DOMCTL_hide_device                 2001
> > +#define XEN_DOMCTL_unhide_device               2002
> > +#define XEN_DOMCTL_test_hidden_device          2003
> 
> Why these strange numbers?

I saw the numbers jump from 79 to 1000 thru 1003, and likewise used
different starting numbers. Would you prefer 80 thru 82, or 1004 thru
1006? Of course, depends on whether we support the hide/unhide operations.

> > @@ -1783,6 +1799,9 @@ static struct xsm_operations flask_ops = {
> >      .test_assign_device = flask_test_assign_device,
> >      .assign_device = flask_assign_device,
> >      .deassign_device = flask_deassign_device,
> > +    .hide_device = flask_hide_device,
> > +    .unhide_device = flask_unhide_device,
> > +    .test_hidden_device = flask_test_hidden_device,
> >  #endif
> 
> This is contrary to what you say in the description, and without
> respective fields being added to struct xsm_operations I can't
> see how this would build.

My mistake! I did not have XSM enabled in my build, and hence the
build went through, and I didn't realize that I forgot to change the
xsm_operations structure! Will fix it!

Venu

> [1] https://lists.xenproject.org/archives/html/xen-devel/2017-06/msg02871.html

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/6] xl: Add commands for hiding and unhiding pcie passthrough devices
  2017-06-30 10:18   ` Wei Liu
@ 2017-07-05 19:52     ` Venu Busireddy
  2017-07-07 10:56       ` Wei Liu
  0 siblings, 1 reply; 25+ messages in thread
From: Venu Busireddy @ 2017-07-05 19:52 UTC (permalink / raw)
  To: Wei Liu
  Cc: Tim Deegan, Stefano Stabellini, George Dunlap, Andrew Cooper,
	Ian Jackson, xen-devel, Jan Beulich


On 2017-06-30 11:18:10 +0100, Wei Liu wrote:
> I haven't reviewed the code in detail, but I have some questions
> regarding the design. See the end of this email.
> 
> On Tue, Jun 27, 2017 at 12:14:54PM -0500, Venu Busireddy wrote:
> >  
> > +static void pciassignable_list_hidden(void)
> > +{
> > +    libxl_device_pci *pcidevs;
> > +    int num, i;
> > +
> > +    pcidevs = libxl_device_pci_assignable_list(ctx, &num);
> > +
> > +    if ( pcidevs == NULL )
> 
> Coding style.

Will fix.

> > +        return;
> > +    for (i = 0; i < num; i++) {
> > +        if (libxl_device_pci_assignable_is_hidden(ctx, &pcidevs[i]))
> > +            printf("%04x:%02x:%02x.%01x\n",
> > +                   pcidevs[i].domain, pcidevs[i].bus, pcidevs[i].dev, pcidevs[i].func);
> > +        libxl_device_pci_dispose(&pcidevs[i]);
> > +    }
> > +    free(pcidevs);
> > +}
> > +
> > +int main_pciassignable_list_hidden(int argc, char **argv)
> > +{
> > +    int opt;
> > +
> > +    SWITCH_FOREACH_OPT(opt, "", NULL, "pci-assignable-list-hidden", 0) {
> > +        /* No options */
> > +    }
> > +
> > +    pciassignable_list_hidden();
> > +    return 0;
> > +}
> > +
> > +static int pciassignable_hide(const char *bdf)
> > +{
> > +    libxl_device_pci pcidev;
> > +    XLU_Config *config;
> > +    int r = EXIT_SUCCESS;
> > +
> > +    libxl_device_pci_init(&pcidev);
> > +
> > +    config = xlu_cfg_init(stderr, "command line");
> > +    if (!config) {
> > +        perror("xlu_cfg_init");
> > +        exit(-1);
> 
> If you don't want EXIT_FAILURE, please document these exit values
> somewhere -- manpage would be a good place.

I was following the semantics that other similar functions in that file
(such as pciassignable_add(), etc.) were following, and hence the exit
value of '-1'. I will change this to exit with EXIT_FAILURE.

> > +    }
> > +
> > +    if (xlu_pci_parse_bdf(config, &pcidev, bdf)) {
> > +        fprintf(stderr, "pci-assignable-hide: malformed BDF specification \"%s\"\n", bdf);
> > +        exit(2);
> > +    }
> > +
> > +    if (libxl_device_pci_assignable_hide(ctx, &pcidev))
> > +        r = EXIT_FAILURE;
> > +
> > +    libxl_device_pci_dispose(&pcidev);
> > +    xlu_cfg_destroy(config);
> > +
> > +    return r;
> > +}
> > +
> > +int main_pciassignable_hide(int argc, char **argv)
> > +{
> > +    int opt;
> > +    const char *bdf = NULL;
> > +
> > +    SWITCH_FOREACH_OPT(opt, "", NULL, "main_pciassignable_hide", 1) {
> > +        /* No options */
> > +    }
> > +
> > +    bdf = argv[optind];
> > +
> > +    if (pciassignable_hide(bdf))
> > +        return EXIT_FAILURE;
> > +
> > +    return EXIT_SUCCESS;
> > +}
> [...]
> > diff --git a/tools/xl/xl_vmcontrol.c b/tools/xl/xl_vmcontrol.c
> > index 89c2b25..10a48a9 100644
> > --- a/tools/xl/xl_vmcontrol.c
> > +++ b/tools/xl/xl_vmcontrol.c
> > @@ -966,6 +966,15 @@ start:
> >      LOG("Waiting for domain %s (domid %u) to die [pid %ld]",
> >          d_config.c_info.name, domid, (long)getpid());
> >  
> > +    ret = libxl_reg_aer_events_handler(ctx, domid);
> > +    if (ret) {
> > +        /*
> > +         * This error may not be severe enough to fail the creation of the VM.
> > +         * Log the error, and continue with the creation.
> > +         */
> > +        LOG("libxl_reg_aer_events_handler() failed, ret = 0x%08x", ret);
> > +    }
> > +
> 
> First thing this suggests the ordering of this patch series is wrong --
> you need to put the patch that implements the new function before this.

I will change the order in the next revision.

> The other thing you need to be aware is that if the user chooses to not
> use a daemonised xl, he / she doesn't get a chance to handle these
> events.
> 
> This is potentially problematic for driver domains. You probably want to
> also modify xl devd command. Also on the subject, what's your thought on
> driver domain? I'm not sure if a driver domain has the permission to
> kill the guest.

I don't know if I understood your question correctly, but it is not the
driver domain that is killing another guest. It is Dom0 that is killing
the guest to which the device is assigned in passthrough mode. That guest
should still be killable by Dom0, even if it is a driver domain. Right?

However, I have been asked by Jan Beulich (and many others) on the
need to kill the guest, and why the device can't be unassigned from
that guest! My initial thinking (for the first revision) was that the
guest and the device together are party to evil things, and hence the
guest should be killed. But I agree that unassigning the device should
be sufficient. Once the device is removed, the guest can't do much that
any other guest can't. Therefore, I plan to change this patchset to
simply unassign the device from the guest. This aspect is also covered
in the thread:

https://lists.xenproject.org/archives/html/xen-devel/2017-07/msg00552.html

May I request you review that thread and post your thoughts?

And if we go with that approach, some of the questions related to
hide/unhide operations will be obviated!

Thanks,

Venu


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/6] libxl: Add wrappers for new commands and add AER error handler
  2017-06-30 10:18   ` Wei Liu
@ 2017-07-05 20:06     ` Venu Busireddy
  0 siblings, 0 replies; 25+ messages in thread
From: Venu Busireddy @ 2017-07-05 20:06 UTC (permalink / raw)
  To: Wei Liu
  Cc: Stefano Stabellini, George Dunlap, Andrew Cooper, Ian Jackson,
	Tim Deegan, Jan Beulich, xen-devel

On 2017-06-30 11:18:15 +0100, Wei Liu wrote:
> On Tue, Jun 27, 2017 at 12:14:56PM -0500, Venu Busireddy wrote:
> > libxl: Add wrappers for new commands and add AER error handler
> 
> Extraneous line.

My mistake. Will remove it.

> > 
> > Add wrappers for the newly introduced commands "pci-assignable-hide",
> > "pci-assignable-unhide", and "pci-assignable-list-hidden".
> > 
> > Implement the callback function to handle unrecoverable AER errors.
> > 
> > Signed-off-by: Venu Busireddy <venu.busireddy@oracle.com>
> > ---
> >  tools/libxl/libxl.h       |   3 +
> >  tools/libxl/libxl_event.h |   2 +
> >  tools/libxl/libxl_pci.c   | 150 ++++++++++++++++++++++++++++++++++++++++++++++
> >  3 files changed, 155 insertions(+)
> > 
> > diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
> > index cf8687a..5a5bd14 100644
> > --- a/tools/libxl/libxl.h
> > +++ b/tools/libxl/libxl.h
> > @@ -1944,6 +1944,9 @@ int libxl_device_events_handler(libxl_ctx *ctx,
> >  int libxl_device_pci_assignable_add(libxl_ctx *ctx, libxl_device_pci *pcidev, int rebind);
> >  int libxl_device_pci_assignable_remove(libxl_ctx *ctx, libxl_device_pci *pcidev, int rebind);
> >  libxl_device_pci *libxl_device_pci_assignable_list(libxl_ctx *ctx, int *num);
> > +int libxl_device_pci_assignable_hide(libxl_ctx *ctx, libxl_device_pci *pcidev);
> > +int libxl_device_pci_assignable_unhide(libxl_ctx *ctx, libxl_device_pci *pcidev);
> > +int libxl_device_pci_assignable_is_hidden(libxl_ctx *ctx, libxl_device_pci *pcidev);
> >  
> >  /* CPUID handling */
> >  int libxl_cpuid_parse_config(libxl_cpuid_policy_list *cpuid, const char* str);
> > diff --git a/tools/libxl/libxl_event.h b/tools/libxl/libxl_event.h
> > index 1ea789e..4c78798 100644
> > --- a/tools/libxl/libxl_event.h
> > +++ b/tools/libxl/libxl_event.h
> > @@ -178,6 +178,8 @@ void libxl_event_register_callbacks(libxl_ctx *ctx,
> >  typedef struct libxl__evgen_domain_death libxl_evgen_domain_death;
> >  int libxl_evenable_domain_death(libxl_ctx *ctx, uint32_t domid,
> >                           libxl_ev_user, libxl_evgen_domain_death **evgen_out);
> > +int libxl_reg_aer_events_handler(libxl_ctx *, uint32_t) LIBXL_EXTERNAL_CALLERS_ONLY;
> > +void libxl_unreg_aer_events_handler(libxl_ctx *, uint32_t);
> >  void libxl_evdisable_domain_death(libxl_ctx *ctx, libxl_evgen_domain_death*);
> >    /* Arranges for the generation of DOMAIN_SHUTDOWN and DOMAIN_DEATH
> >     * events.  A domain which is destroyed before it shuts down
> > diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
> > index b14df16..e6996e5 100644
> > --- a/tools/libxl/libxl_pci.c
> > +++ b/tools/libxl/libxl_pci.c
> > @@ -874,6 +874,42 @@ int libxl_device_pci_assignable_add(libxl_ctx *ctx, libxl_device_pci *pcidev,
> >      return rc;
> >  }
> >  
> > +int libxl_device_pci_assignable_hide(libxl_ctx *ctx, libxl_device_pci *pcidev)
> > +{
> > +    GC_INIT(ctx);
> > +    int rc;
> 
> int r;
> 
> > +
> > +    rc = xc_hide_device(ctx->xch, pcidev_encode_bdf(pcidev));
> 
> r = xc_...
> 
> > +    if (rc < 0)
> 
> if (r < 0) { LOG(...); rc = ERROR_???; }
> 
> > +        LOGD(ERROR, 0, "xc_hide_device failed");
> > +
> > +    GC_FREE;
> > +    return rc;
> > +}
> > +
> [...]

Will make these changes, if deemed necessary, depending on what we agree
on the final solution. This code may go away if our approach changes.
Please see [1] and [2] below.

> >  
> >  int libxl_device_pci_assignable_remove(libxl_ctx *ctx, libxl_device_pci *pcidev,
> >                                         int rebind)
> > @@ -1292,6 +1328,120 @@ out:
> >      return rc;
> >  }
> >  
> > +static void domain_destroy_callback(libxl__egc *egc,
> > +                                    libxl__domain_destroy_state *dds,
> > +                                    int rc)
> > +{
> > +    STATE_AO_GC(dds->ao);
> > +
> > +    if (rc)
> > +        LOGD(ERROR, dds->domid, "Destruction of domain failed, rc = %d", rc);
> > +
> > +    libxl__nested_ao_free(ao);
> > +}
> > +
> > +
> > +typedef struct {
> > +    uint32_t domid;
> > +    libxl__ao *ao;
> > +    libxl__ev_xswatch watch;
> > +} libxl_aer_watch;
> > +static libxl_aer_watch aer_watch;
> > +
> > +static void aer_backend_watch_callback(libxl__egc *egc,
> > +                                       libxl__ev_xswatch *watch,
> > +                                       const char *watch_path,
> > +                                       const char *event_path)
> > +{
> > +    libxl_aer_watch *l_aer_watch = CONTAINER_OF(watch, *l_aer_watch, watch);
> > +    libxl__ao *nested_ao = libxl__nested_ao_create(l_aer_watch->ao);
> > +    STATE_AO_GC(nested_ao);
> > +    libxl_ctx *ctx = libxl__gc_owner(gc);
> > +    uint32_t domid = l_aer_watch->domid;
> > +    uint32_t seg, bus, dev, fn;
> > +    int rc;
> > +    char *p, *path, *dst_path;
> > +    const char *aerFailedSBDF;
> > +    struct xs_permissions rwperm[1];
> > +    libxl__domain_destroy_state *dds;
> > +    GCNEW(dds);
> > +
> > +    /* Extract the backend directory. */
> > +    path = libxl__strdup(gc, event_path);
> > +    p = strrchr(path, '/');
> > +    if (p == NULL)
> > +        goto skip;
> > +    if (strcmp(p, "/aerFailedSBDF") != 0)
> > +        goto skip;
> > +    /* Truncate the string so it points to the backend directory. */
> > +    *p = '\0';
> > +
> > +    /* Fetch the value of the failed PCI device. */
> > +    rc = libxl__xs_read_checked(gc, XBT_NULL,
> > +            GCSPRINTF("%s/aerFailedSBDF", path), &aerFailedSBDF);
> > +    if (rc || !aerFailedSBDF)
> > +        goto skip;
> > +    sscanf(aerFailedSBDF, "%x:%x:%x.%x", &seg, &bus, &dev, &fn);
> > +
> > +    libxl_unreg_aer_events_handler(ctx, domid);
> 
> You need to be careful about calling a public API. In this case I think
> either calling libxl__ev_xswatch_deregister directly or having something
> like libxl__unregister_aer_events_handler(libxl__gc *gc, ...).

Will switch to using libxl__ev_xswatch_deregister().

> > +
> > +    dds->ao = nested_ao;
> > +    dds->domid = domid;
> > +    dds->callback = domain_destroy_callback;
> > +    libxl__domain_destroy(egc, dds);
> > +
> > +    rc = xc_hide_device(ctx->xch, seg << 16 | bus << 8 | dev << 3 | fn);
> 
> Please provide a helper function or macro to return sbdf form instead of
> open coding.

Will do.

> > +    if (rc)
> > +        LOGD(ERROR, domid, " xc_hide_device() failed, rc = %d", rc);
> > +
> > +    rwperm[0].id = 0;
> > +    rwperm[0].perms = XS_PERM_READ | XS_PERM_WRITE;
> > +    dst_path = GCSPRINTF("/local/domain/0/backend/pci/0/0/%s", "aerFailedPCIs");
> > +    rc = libxl__xs_mknod(gc, XBT_NULL, dst_path, rwperm, 1);
> > +    if (rc) {
> > +        LOGD(ERROR, domid, " libxl__xs_mknod() failed, rc = %d", rc);
> > +        goto skip;
> > +    }
> > +
> > +    rc = libxl__xs_write_checked(gc, XBT_NULL, dst_path, aerFailedSBDF);
> > +    if (rc)
> > +        LOGD(ERROR, domid, " libxl__xs_write_checked() failed, rc = %d", rc);
> > +
> > +skip:
> > +    return;
> > +}
> > +
> > +/* Handler of events for device driver domains */
> > +int libxl_reg_aer_events_handler(libxl_ctx *ctx, uint32_t domid)
> > +{
> > +    AO_CREATE(ctx, 0, 0);
> > +    int rc;
> > +    char *be_path;
> > +
> > +    /*
> > +     * We use absolute paths because we want xswatch to also return
> > +     * absolute paths that can be parsed by libxl__parse_backend_path.
> > +     */
> > +    aer_watch.ao = ao;
> > +    aer_watch.domid = domid;
> > +    be_path = GCSPRINTF("/local/domain/0/backend/pci/%u/0/aerFailedSBDF", domid);
> 
> Again, what about driver domain?

Please see [1] and [2] below:

> 
> > +    rc = libxl__ev_xswatch_register(gc, &aer_watch.watch,
> > +                                    aer_backend_watch_callback, be_path);
> > +    if (rc)
> > +        return AO_CREATE_FAIL(rc);
> > +
> > +    return AO_INPROGRESS;
> > +}
> > +
> > +/* Handler of events for device driver domains */
> > +void libxl_unreg_aer_events_handler(libxl_ctx *ctx, uint32_t domid)
> 
> unregister

Will get rid of this wrapper.

> > +{
> > +    GC_INIT(ctx);
> > +
> > +    libxl__ev_xswatch_deregister(gc, &aer_watch.watch);
> >
> 
> GC_FREE
> 
> > +    return;
> > +}
> > +
> >  static void libxl__add_pcidevs(libxl__egc *egc, libxl__ao *ao, uint32_t domid,
> >                                 libxl_domain_config *d_config,
> >                                 libxl__multidev *multidev)
> > 


[1] https://lists.xenproject.org/archives/html/xen-devel/2017-07/msg00552.html
[2] https://lists.xenproject.org/archives/html/xen-devel/2017-07/msg00555.html


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/6] xen: Add support for hiding and unhiding pcie passthrough devices
  2017-07-05 19:38     ` Venu Busireddy
@ 2017-07-05 20:48       ` Konrad Rzeszutek Wilk
  2017-07-06  8:45       ` Jan Beulich
  1 sibling, 0 replies; 25+ messages in thread
From: Konrad Rzeszutek Wilk @ 2017-07-05 20:48 UTC (permalink / raw)
  To: Venu Busireddy
  Cc: Elena Ufimtseva, Tim Deegan, Stefano Stabellini, Wei Liu,
	George Dunlap, Andrew Cooper, Ian Jackson, xen-devel, Jan Beulich,
	Daniel De Graaf

> > > --- a/xen/include/public/domctl.h
> > > +++ b/xen/include/public/domctl.h
> > > @@ -1222,6 +1222,9 @@ struct xen_domctl {
> > >  #define XEN_DOMCTL_gdbsx_pausevcpu             1001
> > >  #define XEN_DOMCTL_gdbsx_unpausevcpu           1002
> > >  #define XEN_DOMCTL_gdbsx_domstatus             1003
> > > +#define XEN_DOMCTL_hide_device                 2001
> > > +#define XEN_DOMCTL_unhide_device               2002
> > > +#define XEN_DOMCTL_test_hidden_device          2003
> > 
> > Why these strange numbers?
> 
> I saw the numbers jump from 79 to 1000 thru 1003, and likewise used
> different starting numbers. Would you prefer 80 thru 82, or 1004 thru

1000 are got gdbsx and the 2000 are reserved for OEM vendors.

So use 80 range.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/6] xen: Add support for hiding and unhiding pcie passthrough devices
  2017-07-05 19:38     ` Venu Busireddy
  2017-07-05 20:48       ` Konrad Rzeszutek Wilk
@ 2017-07-06  8:45       ` Jan Beulich
  2017-07-07 11:00         ` Wei Liu
  2017-07-07 18:11         ` Venu Busireddy
  1 sibling, 2 replies; 25+ messages in thread
From: Jan Beulich @ 2017-07-06  8:45 UTC (permalink / raw)
  To: Venu Busireddy
  Cc: Elena Ufimtseva, Tim Deegan, Stefano Stabellini, Wei Liu,
	George Dunlap, Andrew Cooper, Ian Jackson, xen-devel,
	Daniel De Graaf

>>> On 05.07.17 at 21:38, <venu.busireddy@oracle.com> wrote:
> On 2017-07-04 09:46:58 -0600, Jan Beulich wrote:
>> >>> On 27.06.17 at 19:14, <venu.busireddy@oracle.com> wrote:
>> 
>> First of all, please Cc all maintainers of code you modify.
> 
> I was using the names spit out by the scripts/get_maintainer.pl script
> for the patch file. I didn't know that the script had a "-f" option, and
> without it, the script spits out only two names, which I included. I now
> have Cc'ed all the names that the "-f" option produced. Interestingly,
> Daniel's name is not in the "-f" output, and hence, I am still confused
> what the correct list is!

I can't talk about the script, except that it is known to have
limitations. Generally, changes to the public interface should be
Cc-ed to all REST maintainers.

>> > Add support for hiding and unhiding (by introducing two new hypercall
>> > subops) pci devices that trigger AER fatal errors while assigned to
>> > guests in passthrough mode. Hiding of the device is done by assigning
>> > it to dom_xen dummy domain.
>> 
>> Would you mind explaining why simply de-assigning the device
>> (with an existing operation) isn't suitable here? (This explanation
>> would presumably belong either in the description here or in the
>> cover letter.)
> 
> My initial thinking (for the first revision) was that the guest and
> the device together are party to the evil things, and hence the guest
> should be killed. But I agree that unassigning the device should be
> sufficient. Once the device is removed, the guest can't do much that
> any other guest can't. Therefore, I will change this patchset to simply
> unassign the device from the guest.
> 
> Is that acceptable?

I think so, but I may be missing parts of your reasoning as to why
hiding the device may be a good thing.

>> > @@ -1354,6 +1367,22 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
>> >      struct pci_dev *pdev;
>> >      int rc = 0;
>> >  
>> > +    if ( device_hidden(seg, bus, devfn) )
>> > +        return -EINVAL;
>> > +
>> > +    if ( d == dom_xen )
>> > +    {
>> > +        pdev = pci_get_pdev(seg, bus, devfn);
>> > +        if ( pdev )
>> > +        {
>> > +            list_move(&pdev->domain_list, &dom_xen->arch.pdev_list);
>> > +            pdev->domain = dom_xen;
>> > +            return rc;
>> > +        }
>> > +        else
>> > +            return -ENODEV;
>> > +    }
>> > +
>> >      if ( !iommu_enabled || !hd->platform_ops )
>> >          return 0;
>> 
>> Your addition appears to be misplaced (would belong below the
> 
> Will change, if this code stays...
> 
>> checks seen above). Additionally you fail to acquire the pcidevs
> 
> I am acquiring the lock in iommu_do_pci_domctl() in the case
> "XEN_DOMCTL_hide_device." Is that not sufficient?

Oh, I did overlook this further asymmetry to
XEN_DOMCTL_assign_device handling.

>> > --- a/xen/include/public/domctl.h
>> > +++ b/xen/include/public/domctl.h
>> > @@ -1222,6 +1222,9 @@ struct xen_domctl {
>> >  #define XEN_DOMCTL_gdbsx_pausevcpu             1001
>> >  #define XEN_DOMCTL_gdbsx_unpausevcpu           1002
>> >  #define XEN_DOMCTL_gdbsx_domstatus             1003
>> > +#define XEN_DOMCTL_hide_device                 2001
>> > +#define XEN_DOMCTL_unhide_device               2002
>> > +#define XEN_DOMCTL_test_hidden_device          2003
>> 
>> Why these strange numbers?
> 
> I saw the numbers jump from 79 to 1000 thru 1003, and likewise used
> different starting numbers. Would you prefer 80 thru 82, or 1004 thru
> 1006? Of course, depends on whether we support the hide/unhide operations.

The gdbsx ones were chosen this way long ago, perhaps to have
them out of the way from all "normal" ones.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/6] xl: Add commands for hiding and unhiding pcie passthrough devices
  2017-07-05 19:52     ` Venu Busireddy
@ 2017-07-07 10:56       ` Wei Liu
  2017-07-07 14:00         ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 25+ messages in thread
From: Wei Liu @ 2017-07-07 10:56 UTC (permalink / raw)
  To: Venu Busireddy
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, George Dunlap,
	Andrew Cooper, Ian Jackson, xen-devel, Jan Beulich

On Wed, Jul 05, 2017 at 02:52:41PM -0500, Venu Busireddy wrote:
> > [...]
> > > diff --git a/tools/xl/xl_vmcontrol.c b/tools/xl/xl_vmcontrol.c
> > > index 89c2b25..10a48a9 100644
> > > --- a/tools/xl/xl_vmcontrol.c
> > > +++ b/tools/xl/xl_vmcontrol.c
> > > @@ -966,6 +966,15 @@ start:
> > >      LOG("Waiting for domain %s (domid %u) to die [pid %ld]",
> > >          d_config.c_info.name, domid, (long)getpid());
> > >  
> > > +    ret = libxl_reg_aer_events_handler(ctx, domid);
> > > +    if (ret) {
> > > +        /*
> > > +         * This error may not be severe enough to fail the creation of the VM.
> > > +         * Log the error, and continue with the creation.
> > > +         */
> > > +        LOG("libxl_reg_aer_events_handler() failed, ret = 0x%08x", ret);
> > > +    }
> > > +
> > 
> > First thing this suggests the ordering of this patch series is wrong --
> > you need to put the patch that implements the new function before this.
> 
> I will change the order in the next revision.
> 
> > The other thing you need to be aware is that if the user chooses to not
> > use a daemonised xl, he / she doesn't get a chance to handle these
> > events.
> > 
> > This is potentially problematic for driver domains. You probably want to
> > also modify xl devd command. Also on the subject, what's your thought on
> > driver domain? I'm not sure if a driver domain has the permission to
> > kill the guest.
> 
> I don't know if I understood your question correctly, but it is not the
> driver domain that is killing another guest. It is Dom0 that is killing
> the guest to which the device is assigned in passthrough mode. That guest
> should still be killable by Dom0, even if it is a driver domain. Right?

OK. I'm not sure my understanding of how PCI passthrough works is
correct, so please correct me if I'm wrong.

First, let's split the two concepts: toolstack domain and driver domain.
They are mostly the same one (Dom0), but they don't have to.

A driver domain drives the underlying hardware and provides virtualised
devices to a DomU.

AIUI (again, I could be very wrong about this):

1. PV PCI passthrough is done via pciback, which means the physical
   device is assigned to the driver domain. All events to / from the
   guest / device are handled by the driver domain -- which includes
   the AER error you're trying to handle.

2. HVM PCI passthrough is done via QEMU, but you also need to pre-assign
   the device to the driver domain in which QEMU runs. All events are only
   visible to the driver domain.

Yes, a guest is going to be always killable by Dom0 (the toolstack
domain), even if some devices of the guest are handled by a driver
domain.

But Dom0 now can't see the AER event so it won't be able to issue the
"kill" or whatever action you want it to do. Is this not the case? Do
you expect the event to be always delivered to Dom0?

> 
> However, I have been asked by Jan Beulich (and many others) on the
> need to kill the guest, and why the device can't be unassigned from
> that guest! My initial thinking (for the first revision) was that the
> guest and the device together are party to evil things, and hence the
> guest should be killed. But I agree that unassigning the device should
> be sufficient. Once the device is removed, the guest can't do much that
> any other guest can't. Therefore, I plan to change this patchset to
> simply unassign the device from the guest. This aspect is also covered
> in the thread:
> 
> https://lists.xenproject.org/archives/html/xen-devel/2017-07/msg00552.html
> 
> May I request you review that thread and post your thoughts?
> 

Sure. But that's orthogonal to the problem we have here. I will reply to
that thread.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/6] xen: Add support for hiding and unhiding pcie passthrough devices
  2017-07-06  8:45       ` Jan Beulich
@ 2017-07-07 11:00         ` Wei Liu
  2017-07-07 18:22           ` Venu Busireddy
  2017-07-07 18:11         ` Venu Busireddy
  1 sibling, 1 reply; 25+ messages in thread
From: Wei Liu @ 2017-07-07 11:00 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Elena Ufimtseva, Tim Deegan, Stefano Stabellini, Wei Liu,
	George Dunlap, Andrew Cooper, Ian Jackson, xen-devel,
	Venu Busireddy, Daniel De Graaf

On Thu, Jul 06, 2017 at 02:45:18AM -0600, Jan Beulich wrote:
> >>> On 05.07.17 at 21:38, <venu.busireddy@oracle.com> wrote:
> > On 2017-07-04 09:46:58 -0600, Jan Beulich wrote:
> >> >>> On 27.06.17 at 19:14, <venu.busireddy@oracle.com> wrote:
> >> 
> >> First of all, please Cc all maintainers of code you modify.
> > 
> > I was using the names spit out by the scripts/get_maintainer.pl script
> > for the patch file. I didn't know that the script had a "-f" option, and
> > without it, the script spits out only two names, which I included. I now
> > have Cc'ed all the names that the "-f" option produced. Interestingly,
> > Daniel's name is not in the "-f" output, and hence, I am still confused
> > what the correct list is!
> 
> I can't talk about the script, except that it is known to have
> limitations. Generally, changes to the public interface should be
> Cc-ed to all REST maintainers.
> 
> >> > Add support for hiding and unhiding (by introducing two new hypercall
> >> > subops) pci devices that trigger AER fatal errors while assigned to
> >> > guests in passthrough mode. Hiding of the device is done by assigning
> >> > it to dom_xen dummy domain.
> >> 
> >> Would you mind explaining why simply de-assigning the device
> >> (with an existing operation) isn't suitable here? (This explanation
> >> would presumably belong either in the description here or in the
> >> cover letter.)
> > 
> > My initial thinking (for the first revision) was that the guest and
> > the device together are party to the evil things, and hence the guest
> > should be killed. But I agree that unassigning the device should be
> > sufficient. Once the device is removed, the guest can't do much that
> > any other guest can't. Therefore, I will change this patchset to simply
> > unassign the device from the guest.
> > 
> > Is that acceptable?
> 
> I think so, but I may be missing parts of your reasoning as to why
> hiding the device may be a good thing.

My thought exactly.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/6] xl: Add commands for hiding and unhiding pcie passthrough devices
  2017-07-07 10:56       ` Wei Liu
@ 2017-07-07 14:00         ` Konrad Rzeszutek Wilk
  2017-07-18 13:38           ` Wei Liu
  0 siblings, 1 reply; 25+ messages in thread
From: Konrad Rzeszutek Wilk @ 2017-07-07 14:00 UTC (permalink / raw)
  To: Wei Liu
  Cc: Tim Deegan, Stefano Stabellini, George Dunlap, Andrew Cooper,
	Ian Jackson, xen-devel, Venu Busireddy, Jan Beulich

On Fri, Jul 07, 2017 at 11:56:43AM +0100, Wei Liu wrote:
> On Wed, Jul 05, 2017 at 02:52:41PM -0500, Venu Busireddy wrote:
> > > [...]
> > > > diff --git a/tools/xl/xl_vmcontrol.c b/tools/xl/xl_vmcontrol.c
> > > > index 89c2b25..10a48a9 100644
> > > > --- a/tools/xl/xl_vmcontrol.c
> > > > +++ b/tools/xl/xl_vmcontrol.c
> > > > @@ -966,6 +966,15 @@ start:
> > > >      LOG("Waiting for domain %s (domid %u) to die [pid %ld]",
> > > >          d_config.c_info.name, domid, (long)getpid());
> > > >  
> > > > +    ret = libxl_reg_aer_events_handler(ctx, domid);
> > > > +    if (ret) {
> > > > +        /*
> > > > +         * This error may not be severe enough to fail the creation of the VM.
> > > > +         * Log the error, and continue with the creation.
> > > > +         */
> > > > +        LOG("libxl_reg_aer_events_handler() failed, ret = 0x%08x", ret);
> > > > +    }
> > > > +
> > > 
> > > First thing this suggests the ordering of this patch series is wrong --
> > > you need to put the patch that implements the new function before this.
> > 
> > I will change the order in the next revision.
> > 
> > > The other thing you need to be aware is that if the user chooses to not
> > > use a daemonised xl, he / she doesn't get a chance to handle these
> > > events.
> > > 
> > > This is potentially problematic for driver domains. You probably want to
> > > also modify xl devd command. Also on the subject, what's your thought on
> > > driver domain? I'm not sure if a driver domain has the permission to
> > > kill the guest.
> > 
> > I don't know if I understood your question correctly, but it is not the
> > driver domain that is killing another guest. It is Dom0 that is killing
> > the guest to which the device is assigned in passthrough mode. That guest
> > should still be killable by Dom0, even if it is a driver domain. Right?
> 
> OK. I'm not sure my understanding of how PCI passthrough works is
> correct, so please correct me if I'm wrong.
> 
> First, let's split the two concepts: toolstack domain and driver domain.
> They are mostly the same one (Dom0), but they don't have to.
> 
> A driver domain drives the underlying hardware and provides virtualised
> devices to a DomU.
> 
> AIUI (again, I could be very wrong about this):
> 
> 1. PV PCI passthrough is done via pciback, which means the physical
>    device is assigned to the driver domain. All events to / from the
>    guest / device are handled by the driver domain -- which includes
>    the AER error you're trying to handle.
> 
> 2. HVM PCI passthrough is done via QEMU, but you also need to pre-assign
>    the device to the driver domain in which QEMU runs. All events are only
>    visible to the driver domain.
> 
> Yes, a guest is going to be always killable by Dom0 (the toolstack
> domain), even if some devices of the guest are handled by a driver
> domain.
> 
> But Dom0 now can't see the AER event so it won't be able to issue the
> "kill" or whatever action you want it to do. Is this not the case? Do

It can. That is how it works right now - the AER errors are sent to the
PCIe bridge which is a device driver in domain0. Then the kernel
sends it to pciback (which owns the device) to deal with.

> you expect the event to be always delivered to Dom0?

Yes.
> 
> > 
> > However, I have been asked by Jan Beulich (and many others) on the
> > need to kill the guest, and why the device can't be unassigned from
> > that guest! My initial thinking (for the first revision) was that the
> > guest and the device together are party to evil things, and hence the
> > guest should be killed. But I agree that unassigning the device should
> > be sufficient. Once the device is removed, the guest can't do much that
> > any other guest can't. Therefore, I plan to change this patchset to
> > simply unassign the device from the guest. This aspect is also covered
> > in the thread:
> > 
> > https://lists.xenproject.org/archives/html/xen-devel/2017-07/msg00552.html
> > 
> > May I request you review that thread and post your thoughts?
> > 
> 
> Sure. But that's orthogonal to the problem we have here. I will reply to
> that thread.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/6] xen: Add support for hiding and unhiding pcie passthrough devices
  2017-07-06  8:45       ` Jan Beulich
  2017-07-07 11:00         ` Wei Liu
@ 2017-07-07 18:11         ` Venu Busireddy
  2017-07-10  7:52           ` Jan Beulich
  1 sibling, 1 reply; 25+ messages in thread
From: Venu Busireddy @ 2017-07-07 18:11 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Elena Ufimtseva, Tim Deegan, Stefano Stabellini, Wei Liu,
	George Dunlap, Andrew Cooper, Ian Jackson, xen-devel,
	Daniel De Graaf

On 2017-07-06 02:45:18 -0600, Jan Beulich wrote:
> >>> On 05.07.17 at 21:38, <venu.busireddy@oracle.com> wrote:
> > On 2017-07-04 09:46:58 -0600, Jan Beulich wrote:
> >> >>> On 27.06.17 at 19:14, <venu.busireddy@oracle.com> wrote:
> >> 
> >> First of all, please Cc all maintainers of code you modify.
> > 
> > I was using the names spit out by the scripts/get_maintainer.pl script
> > for the patch file. I didn't know that the script had a "-f" option, and
> > without it, the script spits out only two names, which I included. I now
> > have Cc'ed all the names that the "-f" option produced. Interestingly,
> > Daniel's name is not in the "-f" output, and hence, I am still confused
> > what the correct list is!
> 
> I can't talk about the script, except that it is known to have
> limitations. Generally, changes to the public interface should be
> Cc-ed to all REST maintainers.

Understood. I assume that when in doubt, it is better to send to REST
maintainers than fewer people? I mean, is it better to send to more
people than omit some?

> >> > Add support for hiding and unhiding (by introducing two new hypercall
> >> > subops) pci devices that trigger AER fatal errors while assigned to
> >> > guests in passthrough mode. Hiding of the device is done by assigning
> >> > it to dom_xen dummy domain.
> >> 
> >> Would you mind explaining why simply de-assigning the device
> >> (with an existing operation) isn't suitable here? (This explanation
> >> would presumably belong either in the description here or in the
> >> cover letter.)
> > 
> > My initial thinking (for the first revision) was that the guest and
> > the device together are party to the evil things, and hence the guest
> > should be killed. But I agree that unassigning the device should be
> > sufficient. Once the device is removed, the guest can't do much that
> > any other guest can't. Therefore, I will change this patchset to simply
> > unassign the device from the guest.
> > 
> > Is that acceptable?
> 
> I think so, but I may be missing parts of your reasoning as to why
> hiding the device may be a good thing.

Here is the rationale behind hiding the erring device.

If a device is misbehaving, one of the following two things could be
happening:

a) The error is caused by the misconfiguration of the guest driver or
   the firmware. This may not be a big problem.

b) The error is caused by the owner of the domain re-flashing the firmware
   of the device and inserting a rogue firmware. This is a big problem.

And the problem is that we can't differentiate between a) and b).

If it is case b), then we certainly need to investigate and make sure
that the firmware is the correct version and/or reload a new firmware to
over-write the old one (just to be safe). Either way, the device needs to
be unassignable until the root cause is investigated. Hiding the device
is the safest way to ensure that the device is unassignable. Otherwise,
the administrator may inadvertently reboot the domain to which the
device was assigned, or, the domain itself may reboot upon errors, and in
either case, the device gets reassigned to the domain upon reboot! Hiding
the device prevents this.

However, if you think that all of this is too much paranoia, I am fine
with not hiding the device, and we simply de-assign the device from the
domain. I leave the decision to you.

[snip]
> >> > --- a/xen/include/public/domctl.h
> >> > +++ b/xen/include/public/domctl.h
> >> > @@ -1222,6 +1222,9 @@ struct xen_domctl {
> >> >  #define XEN_DOMCTL_gdbsx_pausevcpu             1001
> >> >  #define XEN_DOMCTL_gdbsx_unpausevcpu           1002
> >> >  #define XEN_DOMCTL_gdbsx_domstatus             1003
> >> > +#define XEN_DOMCTL_hide_device                 2001
> >> > +#define XEN_DOMCTL_unhide_device               2002
> >> > +#define XEN_DOMCTL_test_hidden_device          2003
> >> 
> >> Why these strange numbers?
> > 
> > I saw the numbers jump from 79 to 1000 thru 1003, and likewise used
> > different starting numbers. Would you prefer 80 thru 82, or 1004 thru
> > 1006? Of course, depends on whether we support the hide/unhide operations.
> 
> The gdbsx ones were chosen this way long ago, perhaps to have
> them out of the way from all "normal" ones.

As also suggested by Konrad, I will use 80 thru 82.

Venu
 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/6] xen: Add support for hiding and unhiding pcie passthrough devices
  2017-07-07 11:00         ` Wei Liu
@ 2017-07-07 18:22           ` Venu Busireddy
  0 siblings, 0 replies; 25+ messages in thread
From: Venu Busireddy @ 2017-07-07 18:22 UTC (permalink / raw)
  To: Wei Liu
  Cc: Elena Ufimtseva, Tim Deegan, Stefano Stabellini, George Dunlap,
	Andrew Cooper, Ian Jackson, xen-devel, Jan Beulich,
	Daniel De Graaf

On 2017-07-07 12:00:26 +0100, Wei Liu wrote:
> On Thu, Jul 06, 2017 at 02:45:18AM -0600, Jan Beulich wrote:
> > >>> On 05.07.17 at 21:38, <venu.busireddy@oracle.com> wrote:
> > > On 2017-07-04 09:46:58 -0600, Jan Beulich wrote:
> > >> >>> On 27.06.17 at 19:14, <venu.busireddy@oracle.com> wrote:
> > >> 
> > >> First of all, please Cc all maintainers of code you modify.
> > > 
> > > I was using the names spit out by the scripts/get_maintainer.pl script
> > > for the patch file. I didn't know that the script had a "-f" option, and
> > > without it, the script spits out only two names, which I included. I now
> > > have Cc'ed all the names that the "-f" option produced. Interestingly,
> > > Daniel's name is not in the "-f" output, and hence, I am still confused
> > > what the correct list is!
> > 
> > I can't talk about the script, except that it is known to have
> > limitations. Generally, changes to the public interface should be
> > Cc-ed to all REST maintainers.
> > 
> > >> > Add support for hiding and unhiding (by introducing two new hypercall
> > >> > subops) pci devices that trigger AER fatal errors while assigned to
> > >> > guests in passthrough mode. Hiding of the device is done by assigning
> > >> > it to dom_xen dummy domain.
> > >> 
> > >> Would you mind explaining why simply de-assigning the device
> > >> (with an existing operation) isn't suitable here? (This explanation
> > >> would presumably belong either in the description here or in the
> > >> cover letter.)
> > > 
> > > My initial thinking (for the first revision) was that the guest and
> > > the device together are party to the evil things, and hence the guest
> > > should be killed. But I agree that unassigning the device should be
> > > sufficient. Once the device is removed, the guest can't do much that
> > > any other guest can't. Therefore, I will change this patchset to simply
> > > unassign the device from the guest.
> > > 
> > > Is that acceptable?
> > 
> > I think so, but I may be missing parts of your reasoning as to why
> > hiding the device may be a good thing.
> 
> My thought exactly.

Answered this in
  https://lists.xenproject.org/archives/html/xen-devel/2017-07/msg00925.html
because there were some aditional questions answered in that thread.

Venu


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/6] xen: Add support for hiding and unhiding pcie passthrough devices
  2017-07-07 18:11         ` Venu Busireddy
@ 2017-07-10  7:52           ` Jan Beulich
  2017-07-10 16:58             ` Venu Busireddy
  0 siblings, 1 reply; 25+ messages in thread
From: Jan Beulich @ 2017-07-10  7:52 UTC (permalink / raw)
  To: Venu Busireddy
  Cc: Elena Ufimtseva, Tim Deegan, Stefano Stabellini, Wei Liu,
	George Dunlap, Andrew Cooper, Ian Jackson, xen-devel,
	Daniel De Graaf

>>> On 07.07.17 at 20:11, <venu.busireddy@oracle.com> wrote:
> On 2017-07-06 02:45:18 -0600, Jan Beulich wrote:
>> I think so, but I may be missing parts of your reasoning as to why
>> hiding the device may be a good thing.
> 
> Here is the rationale behind hiding the erring device.
> 
> If a device is misbehaving, one of the following two things could be
> happening:
> 
> a) The error is caused by the misconfiguration of the guest driver or
>    the firmware. This may not be a big problem.
> 
> b) The error is caused by the owner of the domain re-flashing the firmware
>    of the device and inserting a rogue firmware. This is a big problem.
> 
> And the problem is that we can't differentiate between a) and b).
> 
> If it is case b), then we certainly need to investigate and make sure
> that the firmware is the correct version and/or reload a new firmware to
> over-write the old one (just to be safe). Either way, the device needs to
> be unassignable until the root cause is investigated. Hiding the device
> is the safest way to ensure that the device is unassignable. Otherwise,
> the administrator may inadvertently reboot the domain to which the
> device was assigned, or, the domain itself may reboot upon errors, and in
> either case, the device gets reassigned to the domain upon reboot! Hiding
> the device prevents this.
> 
> However, if you think that all of this is too much paranoia, I am fine
> with not hiding the device, and we simply de-assign the device from the
> domain. I leave the decision to you.

Well, what if the firmware being installed is rogue, but doesn't cause
behavior that would result in us noticing right away? Passing through
non-SR-IOV devices isn't entirely secure anyway, and I don't think
SR-IOV VFs would permit firmware updates (I'd expect that to be
possible via the PF only). So I'm afraid hiding the devices won't buy
us much.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/6] xen: Add support for hiding and unhiding pcie passthrough devices
  2017-07-10  7:52           ` Jan Beulich
@ 2017-07-10 16:58             ` Venu Busireddy
  0 siblings, 0 replies; 25+ messages in thread
From: Venu Busireddy @ 2017-07-10 16:58 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Elena Ufimtseva, Tim Deegan, Stefano Stabellini, Wei Liu,
	George Dunlap, Andrew Cooper, Ian Jackson, xen-devel,
	Daniel De Graaf

On 2017-07-10 01:52:27 -0600, Jan Beulich wrote:
> >>> On 07.07.17 at 20:11, <venu.busireddy@oracle.com> wrote:
> > On 2017-07-06 02:45:18 -0600, Jan Beulich wrote:
> >> I think so, but I may be missing parts of your reasoning as to why
> >> hiding the device may be a good thing.
> > 
> > Here is the rationale behind hiding the erring device.
> > 
> > If a device is misbehaving, one of the following two things could be
> > happening:
> > 
> > a) The error is caused by the misconfiguration of the guest driver or
> >    the firmware. This may not be a big problem.
> > 
> > b) The error is caused by the owner of the domain re-flashing the firmware
> >    of the device and inserting a rogue firmware. This is a big problem.
> > 
> > And the problem is that we can't differentiate between a) and b).
> > 
> > If it is case b), then we certainly need to investigate and make sure
> > that the firmware is the correct version and/or reload a new firmware to
> > over-write the old one (just to be safe). Either way, the device needs to
> > be unassignable until the root cause is investigated. Hiding the device
> > is the safest way to ensure that the device is unassignable. Otherwise,
> > the administrator may inadvertently reboot the domain to which the
> > device was assigned, or, the domain itself may reboot upon errors, and in
> > either case, the device gets reassigned to the domain upon reboot! Hiding
> > the device prevents this.
> > 
> > However, if you think that all of this is too much paranoia, I am fine
> > with not hiding the device, and we simply de-assign the device from the
> > domain. I leave the decision to you.
> 
> Well, what if the firmware being installed is rogue, but doesn't cause
> behavior that would result in us noticing right away? Passing through
> non-SR-IOV devices isn't entirely secure anyway, and I don't think
> SR-IOV VFs would permit firmware updates (I'd expect that to be
> possible via the PF only). So I'm afraid hiding the devices won't buy
> us much.

Okay. In a week, I will send v2 of this patch without hiding the device,
unless we hear form others within that time-frame with other thoughts
that change the approach.

Venu


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/6] xl: Add commands for hiding and unhiding pcie passthrough devices
  2017-07-07 14:00         ` Konrad Rzeszutek Wilk
@ 2017-07-18 13:38           ` Wei Liu
  2017-07-18 15:38             ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 25+ messages in thread
From: Wei Liu @ 2017-07-18 13:38 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, George Dunlap,
	Andrew Cooper, Ian Jackson, xen-devel, Venu Busireddy,
	Jan Beulich

On Fri, Jul 07, 2017 at 10:00:01AM -0400, Konrad Rzeszutek Wilk wrote:
> > 
> > 1. PV PCI passthrough is done via pciback, which means the physical
> >    device is assigned to the driver domain. All events to / from the
> >    guest / device are handled by the driver domain -- which includes
> >    the AER error you're trying to handle.
> > 
> > 2. HVM PCI passthrough is done via QEMU, but you also need to pre-assign
> >    the device to the driver domain in which QEMU runs. All events are only
> >    visible to the driver domain.
> > 
> > Yes, a guest is going to be always killable by Dom0 (the toolstack
> > domain), even if some devices of the guest are handled by a driver
> > domain.
> > 
> > But Dom0 now can't see the AER event so it won't be able to issue the
> > "kill" or whatever action you want it to do. Is this not the case? Do
> 
> It can. That is how it works right now - the AER errors are sent to the
> PCIe bridge which is a device driver in domain0. Then the kernel
> sends it to pciback (which owns the device) to deal with.
> 

So pciback will have to run in Dom0?

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/6] xl: Add commands for hiding and unhiding pcie passthrough devices
  2017-07-18 13:38           ` Wei Liu
@ 2017-07-18 15:38             ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 25+ messages in thread
From: Konrad Rzeszutek Wilk @ 2017-07-18 15:38 UTC (permalink / raw)
  To: Wei Liu
  Cc: Tim Deegan, Stefano Stabellini, George Dunlap, Andrew Cooper,
	Ian Jackson, xen-devel, Venu Busireddy, Jan Beulich

On Tue, Jul 18, 2017 at 02:38:14PM +0100, Wei Liu wrote:
> On Fri, Jul 07, 2017 at 10:00:01AM -0400, Konrad Rzeszutek Wilk wrote:
> > > 
> > > 1. PV PCI passthrough is done via pciback, which means the physical
> > >    device is assigned to the driver domain. All events to / from the
> > >    guest / device are handled by the driver domain -- which includes
> > >    the AER error you're trying to handle.
> > > 
> > > 2. HVM PCI passthrough is done via QEMU, but you also need to pre-assign
> > >    the device to the driver domain in which QEMU runs. All events are only
> > >    visible to the driver domain.
> > > 
> > > Yes, a guest is going to be always killable by Dom0 (the toolstack
> > > domain), even if some devices of the guest are handled by a driver
> > > domain.
> > > 
> > > But Dom0 now can't see the AER event so it won't be able to issue the
> > > "kill" or whatever action you want it to do. Is this not the case? Do
> > 
> > It can. That is how it works right now - the AER errors are sent to the
> > PCIe bridge which is a device driver in domain0. Then the kernel
> > sends it to pciback (which owns the device) to deal with.
> > 
> 
> So pciback will have to run in Dom0?

You need a kernel piece to deal with AER (as it is an normal interrupt to
the PCIe bridge and the PCIe bridge picks this up and does its thing).

The AER mechanism (in Linux) then walks through all the devices underneath
the PCIe bridge and gives it options to do things - and in case of pciback
we can do the right thing.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2017-07-18 15:38 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-06-27 17:14 [PATCH 0/6] AER unrecoverable error containment Venu Busireddy
2017-06-27 17:14 ` [PATCH 1/6] xen: Add support for hiding and unhiding pcie passthrough devices Venu Busireddy
2017-07-04 15:46   ` Jan Beulich
2017-07-05 19:38     ` Venu Busireddy
2017-07-05 20:48       ` Konrad Rzeszutek Wilk
2017-07-06  8:45       ` Jan Beulich
2017-07-07 11:00         ` Wei Liu
2017-07-07 18:22           ` Venu Busireddy
2017-07-07 18:11         ` Venu Busireddy
2017-07-10  7:52           ` Jan Beulich
2017-07-10 16:58             ` Venu Busireddy
2017-06-27 17:14 ` [PATCH 2/6] xl: Add commands " Venu Busireddy
2017-06-30 10:18   ` Wei Liu
2017-07-05 19:52     ` Venu Busireddy
2017-07-07 10:56       ` Wei Liu
2017-07-07 14:00         ` Konrad Rzeszutek Wilk
2017-07-18 13:38           ` Wei Liu
2017-07-18 15:38             ` Konrad Rzeszutek Wilk
2017-06-27 17:14 ` [PATCH 3/6] libxc: Add wrappers for new commands Venu Busireddy
2017-06-29 17:52   ` Wei Liu
2017-06-27 17:14 ` [PATCH 4/6] libxl: Add wrappers for new commands and add AER error handler Venu Busireddy
2017-06-30 10:18   ` Wei Liu
2017-07-05 20:06     ` Venu Busireddy
2017-06-27 17:14 ` [PATCH 5/6] tools/python/xc: Update pyxc_methods with new commands Venu Busireddy
2017-06-27 17:14 ` [PATCH 6/6] docs: Document the " Venu Busireddy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).